CDH Parcels Offline Installation

Source: Internet  Editor: 程序博客网  Time: 2024/05/27 10:43
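Installation Path A - Automated Installation by Cloudera Manager
Requires every machine to have Internet access, and the overseas download sites are not very stable. If the installation fails, reinstalling is very painful.
Installation Path B - Manual Installation Using Cloudera Manager Packages
Set up RedHat/CentOS or Debian/Ubuntu and install from downloaded system packages; a large number of packages has to be downloaded.
Installation Path C - Manual Installation Using Tarballs and Parcels (the installation steps below)
This method is the least intrusive to the system. Its biggest advantage is that it allows a fully offline installation, and reinstalling is very convenient. Later cluster-wide package upgrades are also easy.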

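(1) Prerequisites (all nodes)
  • Disable the firewall:
    service iptables stop (stops it immediately, until the next reboot)
    chkconfig iptables off (keeps it off after a reboot)
  • Disable SELinux:
    setenforce 0 (temporary; switches to permissive mode until the next reboot)
    Set SELINUX=disabled in /etc/selinux/config (permanent after a reboot)
    sestatus (checks the current status)
  • Set up passwordless SSH login between the Cloudera Manager Server and the Cloudera Manager Agents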
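The permanent SELinux change above is a one-line edit, so it is easy to script across nodes. The following is a minimal sketch, assuming the config file uses the usual `SELINUX=<mode>` line; the function name `set_selinux_disabled` and its optional path argument are illustrative, not part of any system tool.

```shell
# Sketch only: set_selinux_disabled is a hypothetical helper, not a system tool.
# Usage: set_selinux_disabled [path-to-config]   (defaults to /etc/selinux/config)
set_selinux_disabled() {
    selinux_conf="${1:-/etc/selinux/config}"
    selinux_tmp="$(mktemp)" || return 1
    # Rewrite only the active SELINUX= line; comment lines and SELINUXTYPE= do not match.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$selinux_conf" > "$selinux_tmp" \
        && cat "$selinux_tmp" > "$selinux_conf"   # overwrite in place to keep owner and mode
    rc=$?
    rm -f "$selinux_tmp"
    return "$rc"
}
```

Run it as root on each node; the change only takes effect after a reboot, which is why `setenforce 0` is still needed for the current session.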
(2) Base components (all nodes)
  • Install JDK 1.7
  • Install Python 2.6 or 2.7
(3) Create the installation user and directories (all nodes)
$ mkdir /opt/cloudera-manager
$ useradd --system --home=/opt/cloudera-manager/cm-5.0.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
$ chown -R cloudera-scm:cloudera-scm /opt/cloudera-manager
$ mkdir -p /opt/cloudera/parcel-repo
$ chown cloudera-scm:cloudera-scm /opt/cloudera

(4) Download and configure Cloudera Manager Server and Cloudera Manager Agent (all nodes)
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Version-and-Download-Information/Cloudera-Manager-Version-and-Download-Information.html?scroll=cmvd_topic_1
Tarball file for RHEL6/CentOS6: cloudera-manager-el6-cm5.0.0_x86_64.tar.gz
$ tar -xzvf cloudera-manager*.tar.gz -C /opt/cloudera-manager
(/opt/cloudera-manager/cm-5.0.0 becomes the installation root directory)
$ vim /opt/cloudera-manager/cm-5.0.0/etc/cloudera-scm-agent/config.ini
  Set server_host=hadoop-1 (the hostname or IP of the master node)
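Because every node needs the same `server_host` edit, it can be scripted instead of done in vim. This is a minimal sketch, assuming the agent's config.ini already contains a `server_host=` line; `set_server_host` is a hypothetical helper name, and the file path is passed in rather than assumed.

```shell
# Sketch only: set_server_host is a hypothetical helper.
# Usage: set_server_host <path-to-config.ini> <server-hostname-or-ip>
set_server_host() {
    agent_ini="$1"
    cm_host="$2"
    ini_tmp="$(mktemp)" || return 1
    # Replace the whole server_host= line. Hostnames and IPs contain no '/' or '&',
    # so they are safe to use directly in the sed replacement.
    sed "s/^server_host=.*/server_host=${cm_host}/" "$agent_ini" > "$ini_tmp" \
        && cat "$ini_tmp" > "$agent_ini"
    rc=$?
    rm -f "$ini_tmp"
    return "$rc"
}
```

For example, `set_server_host "$config_ini" hadoop-1`, where `$config_ini` is the agent's config.ini on that node.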

(5) Download and configure the local Parcel repository (master node)
$ yum install httpd

Download the CDH parcel
http://archive-primary.cloudera.com/cdh5/parcels/ The latest for RHEL6/CentOS6 is: CDH-5.0.0-1.cdh5.0.0.p0.47-el6.parcel
$ mkdir /var/www/html/cdh5.0
$ mv CDH-5.0.0-1.cdh5.0.0.p0.47-el6.parcel /var/www/html/cdh5.0

Download the CDH manifest.json
$ mv manifest.json /var/www/html/cdh5.0

$ chmod -R ugo+rX /var/www/html/cdh5.0

$ service httpd start
Browse to http://hostname/cdh5.0 to verify that Apache httpd serves the packages
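Before pointing Cloudera Manager at the repository, it can help to confirm that the directory really holds what step (5) put there. The sketch below only checks that manifest.json and at least one .parcel file exist and are readable; `check_parcel_repo` is a hypothetical name, and the check says nothing about whether the files are intact.

```shell
# Sketch only: check_parcel_repo is a hypothetical helper.
# Usage: check_parcel_repo <repo-directory>
check_parcel_repo() {
    repo_dir="$1"
    if [ ! -r "$repo_dir/manifest.json" ]; then
        echo "missing or unreadable: $repo_dir/manifest.json" >&2
        return 1
    fi
    parcel_found=0
    for parcel in "$repo_dir"/*.parcel; do
        # An unmatched glob stays literal, so skip anything that is not a real file.
        [ -f "$parcel" ] || continue
        parcel_found=1
        if [ ! -r "$parcel" ]; then
            echo "unreadable: $parcel" >&2
            return 1
        fi
    done
    if [ "$parcel_found" -ne 1 ]; then
        echo "no .parcel file in $repo_dir" >&2
        return 1
    fi
    return 0
}
```

With the layout from this step, the call would be `check_parcel_repo /var/www/html/cdh5.0`.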

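(6) Configure the Cloudera Configuration Service database (MySQL on the master node)
  • Install an external database (Oracle/MySQL/PostgreSQL). The RDBMS character set must support UTF-8; on Oracle, for example, set it to AL32UTF8
  • Download and install the database driver (master node)
$ cp /tmp/mysql-connector-java-5.1.30.jar /usr/share/java/mysql-connector-java.jar
  • Create the Cloudera CDH configuration database user and grant privileges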
$ mysql -hhadoop-1 -uroot -p
mysql> CREATE USER cdhusr IDENTIFIED BY 'cdhpwd';
mysql> CREATE DATABASE cdh_cfg DEFAULT CHARACTER SET utf8;
mysql> GRANT ALL ON cdh_cfg.* TO 'cdhusr'@'%' IDENTIFIED BY 'cdhpwd';
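  • Run the script that creates the Cloudera CDH configuration database automatically
$ /opt/cloudera-manager/cm-5.0.0/share/cmf/schema/scm_prepare_database.sh -hhadoop-1 -uroot -proot --scm-host hadoop-10 mysql cdh_cfg cdhusr cdhpwd
  • Create the databases and user accounts for the Cloudera CDH services
  1. Reports Manager (required)
  2. Hive Metastore (required)
  3. Activity Monitor (only needed for MRv1)
  4. Cloudera Navigator (available with the Data Hub Edition Trial or Cloudera Enterprise)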
$ mysql -hhadoop-1 -uroot -p
mysql> CREATE USER reports IDENTIFIED BY 'reports';
mysql> CREATE DATABASE cdh_reports DEFAULT CHARACTER SET utf8;
mysql> GRANT ALL ON cdh_reports.* TO 'reports'@'%' IDENTIFIED BY 'reports';
mysql> CREATE USER hive IDENTIFIED BY 'hive';
mysql> CREATE DATABASE cdh_hive DEFAULT CHARACTER SET utf8;
mysql> GRANT ALL ON cdh_hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
mysql> CREATE USER 'activity'@'172.16.36.191' IDENTIFIED BY 'activity';
mysql> CREATE DATABASE cdh_activity DEFAULT CHARACTER SET utf8;
mysql> GRANT ALL ON cdh_activity.* TO 'activity'@'%' IDENTIFIED BY 'activity';
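Each service database above follows the same three-statement pattern, so the SQL can be generated rather than typed. This is a minimal sketch, assuming MySQL 5.x syntax as used in this step (`GRANT ... IDENTIFIED BY` is not accepted by MySQL 8) and access from any host (`'%'`); `service_db_sql` is a hypothetical helper that only prints statements and never connects to a server.

```shell
# Sketch only: service_db_sql is a hypothetical helper that prints SQL to stdout.
# Usage: service_db_sql <database> <user> <password>
# Assumes the names and password contain no quotes or other characters needing escaping.
service_db_sql() {
    db_name="$1"
    db_user="$2"
    db_pass="$3"
    cat <<EOF
CREATE USER '${db_user}'@'%' IDENTIFIED BY '${db_pass}';
CREATE DATABASE ${db_name} DEFAULT CHARACTER SET utf8;
GRANT ALL ON ${db_name}.* TO '${db_user}'@'%' IDENTIFIED BY '${db_pass}';
EOF
}
```

The output can be reviewed first and then piped into the client, for example `service_db_sql cdh_reports reports reports | mysql -hhadoop-1 -uroot -p`.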


(7) Start Cloudera Manager Server and Cloudera Manager Agent
Start Cloudera Manager Server (master node)
$ vim /opt/cloudera-manager/cm-5.0.0/etc/init.d/cloudera-scm-server
Change CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to /opt/cloudera-manager/cm-5.0.0/etc/default
$ cp /opt/cloudera-manager/cm-5.0.0/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
$ /etc/init.d/cloudera-scm-server start
$ chkconfig cloudera-scm-server on
Start Cloudera Manager Agent (worker nodes)
$ vim /opt/cloudera-manager/cm-5.0.0/etc/init.d/cloudera-scm-agent
Change CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to /opt/cloudera-manager/cm-5.0.0/etc/default
$ cp /opt/cloudera-manager/cm-5.0.0/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
$ /etc/init.d/cloudera-scm-agent start
$ chkconfig cloudera-scm-agent on
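The CMF_DEFAULTS edit in the init scripts is the same on every node and can be done with sed instead of vim. The sketch below assumes the script contains the literal text `${CMF_DEFAULTS:-/etc/default}`, as quoted in this step; `set_cmf_defaults` is a hypothetical helper, and the replacement directory is passed as an argument rather than hard-coded.

```shell
# Sketch only: set_cmf_defaults is a hypothetical helper.
# Usage: set_cmf_defaults <init-script> <defaults-directory>
set_cmf_defaults() {
    init_script="$1"
    defaults_dir="$2"
    init_tmp="$(mktemp)" || return 1
    # '|' is the sed delimiter because both sides contain '/'.
    # '\$' matches a literal dollar sign; '{' and '}' are literal in basic regex.
    sed 's|\${CMF_DEFAULTS:-/etc/default}|'"$defaults_dir"'|' "$init_script" > "$init_tmp" \
        && cat "$init_tmp" > "$init_script"
    rc=$?
    rm -f "$init_tmp"
    return "$rc"
}
```

Running it on a copy of the script first and diffing the result against the original is a cheap way to confirm that exactly one line changed.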

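(8) Install with the wizard
Open the Cloudera Manager Admin Console at http://hadoop-10:7180 (login admin/admin) and choose Cloudera Express for the installation.

Add the local Parcel repository, http://hadoop-1/cdh5.0/, and select the components to install as needed. Apart from ZooKeeper, which should be changed to deploy on three hosts, installing with the default configuration is recommended unless there is a specific reason not to. Keep the default data directories, and mount the disks that will hold the HDFS, Hive warehouse, and ZooKeeper data directories before installing.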

BTW: after a successful installation, the local Parcel repository can also be configured or updated as follows:
  1. Do one of the following to open the parcel settings page:
    • From the navigation bar:
      1. Click the parcel icon in the top navigation bar.
      2. Click the Edit Settings button.
    • From the menu:
      1. Select Administration > Settings.
      2. Click the Parcels category.
    • From the Hosts tab:
      1. Click the Hosts tab.
      2. Select Configuration > View and Edit.
      3. Click the Parcels category.
      4. Click the Edit Settings button.
  2. In the Remote Parcel Repository URLs list, click the add (+) icon to open an additional row.
  3. Enter the path to the parcel. For example, http://hostname:80/cdh5.0/.
  4. Click Save Changes to commit the changes.
(9) Summary of the default Cloudera installation

Installed component versions

CDH Packaging and Tarball Information

Component                   Version
Apache Hadoop               2.3.0-cdh5.0.0
Apache Hadoop MRv1          2.3.0-mr1-cdh5.0.0
Apache Hive                 0.12.0-cdh5.0.0
Apache HBase                0.96.1.1-cdh5.0.0
Apache ZooKeeper            3.4.5-cdh5.0.0
Apache Sqoop 1              1.4.4-cdh5.0.0
Apache Sqoop2               1.99.3-cdh5.0.0
Apache Pig                  0.12.0-cdh5.0.0
Apache Flume                1.4.0-cdh5.0.0
Apache Oozie                4.0.0-cdh5.0.0
Apache Mahout               0.8-cdh5.0.0
Apache Whirr                0.9.0-cdh5.0.0
DataFu                      1.1.0-cdh5.0.0
Apache Sentry (incubating)  1.2.0-cdh5.0.0
Parquet                     1.2.5-cdh5.0.0
Llama                       1.0.0-cdh5.0.0
Apache Spark                0.9.0-cdh5.0.0
Apache Crunch               0.9.0-cdh5.0.0
Apache Avro                 1.7.5-cdh5.0.0
Kite SDK                    0.10.0-cdh5.0.0
Apache Solr                 4.4.0-cdh5.0.0
Cloudera Search             1.0.0-cdh5.0.0
Lily HBase Indexer          1.3-cdh5.0.0

Cloudera Manager

Service           Instance  Description          Path
cloudera-manager  Server    Component directory  /opt/cloudera-manager/cm-5.0.0/lib/cloudera-scm-server
cloudera-manager  Server    Startup script       /opt/cloudera-manager/cm-5.0.0/etc/init.d/cloudera-scm-server
cloudera-manager  Server    Log directory        /opt/cloudera-manager/cm-5.0.0/log/cloudera-scm-server
cloudera-manager  Agent     Component directory  /opt/cloudera-manager/cm-5.0.0/lib/cloudera-scm-agent
cloudera-manager  Agent     Startup script       /opt/cloudera-manager/cm-5.0.0/etc/init.d/cloudera-scm-agent
cloudera-manager  Agent     Log directory        /opt/cloudera-manager/cm-5.0.0/log/cloudera-scm-agent

Management Service

Service  Instance        Description                          Path
mgmt     alertpublisher  Alert Publisher component directory  /var/lib/cloudera-scm-alertpublisher
mgmt     alertpublisher  Alert Publisher log directory        /var/log/cloudera-scm-alertpublisher
mgmt     eventserver     Event Server component directory     /var/lib/cloudera-scm-eventserver
mgmt     eventserver     Event Server log directory           /var/log/cloudera-scm-eventserver
mgmt     hostmonitor     Host Monitor component directory     /var/lib/cloudera-host-monitor
mgmt     hostmonitor     Host Monitor log directory           /var/log/cloudera-scm-firehose
mgmt     servicemonitor  Service Monitor storage directory    /var/lib/cloudera-service-monitor
mgmt     servicemonitor  Service Monitor log directory        /var/log/cloudera-scm-firehose
mgmt     headlamp        Headlamp storage directory           /var/lib/cloudera-scm-headlamp
mgmt     headlamp        Headlamp log directory               /var/log/cloudera-scm-headlamp

Component Lib

Service           Path
zookeeper         /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/
hadoop            /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop
hadoop-hdfs       /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop-hdfs
hadoop-mapreduce  /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop-mapreduce
hadoop-yarn       /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop-yarn
hbase             /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase
hive              /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hive
spark             /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark

Component Config

Service           Path
zookeeper         /etc/zookeeper/conf
hadoop            /etc/hadoop/conf
hadoop-hdfs       /etc/hadoop/conf.cloudera.hdfs
hadoop-mapreduce  /etc/hadoop/conf.cloudera.mapreduce1
hadoop-yarn       /etc/hadoop/conf.cloudera.yarn
hbase             /etc/hbase/conf
hive              /etc/hive/conf
spark             /etc/spark/conf

Component Shell

Service        Path
zookeeper      /usr/bin/zookeeper-client -> /etc/alternatives/zookeeper-client
zookeeper      /usr/bin/zookeeper-server -> /etc/alternatives/zookeeper-server
hadoop         /usr/bin/hadoop -> /etc/alternatives/hadoop
hadoop-hdfs    /usr/bin/hdfs -> /etc/alternatives/hdfs
hadoop-mapred  /usr/bin/mapred -> /etc/alternatives/mapred
hadoop-yarn    /usr/bin/yarn -> /etc/alternatives/yarn
spark          /usr/bin/spark-shell -> /etc/alternatives/spark-shell
spark          /usr/bin/spark-executor -> /etc/alternatives/spark-executor
hbase          /usr/bin/hbase -> /etc/alternatives/hbase
hive           /usr/bin/hive -> /etc/alternatives/hive

Component Log

Service        Path
zookeeper      /var/log/zookeeper/
hadoop-hdfs    /var/log/hadoop-hdfs
hadoop-mapred  /var/log/hadoop-mapreduce
hadoop-yarn    /var/log/hadoop-yarn
spark          /var/log/spark/
hbase          /var/log/hbase
hive           /var/log/hive

(10) Verify the installation
After installation, first confirm which framework runs MapReduce jobs; it can be switched by changing the Linux alternatives.
[root@hadoop-2 hadoop]# ll /etc/hadoop/conf
lrwxrwxrwx 1 root root 29 Apr 30 17:23 /etc/hadoop/conf -> /etc/alternatives/hadoop-conf
[root@hadoop-2 hadoop]# ll /etc/alternatives/hadoop-conf
lrwxrwxrwx 1 root root 30 Apr 30 17:23 /etc/alternatives/hadoop-conf -> /etc/hadoop/conf.cloudera.yarn
This shows that the cluster currently runs MapReduce on YARN (MRv2).
[root@hadoop-9 hadoop]# ll /etc/hadoop/conf
lrwxrwxrwx 1 root root 29 Apr 30 17:23 /etc/hadoop/conf -> /etc/alternatives/hadoop-conf
[root@hadoop-9 hadoop]# ll /etc/alternatives/hadoop-conf
lrwxrwxrwx 1 root root 36 Apr 30 17:23 /etc/alternatives/hadoop-conf -> /etc/hadoop/conf.cloudera.mapreduce1
This shows that the cluster currently runs MapReduce on MRv1.
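The symlink check above can be wrapped in a small function so it is easy to repeat on every host. This is a minimal sketch that reads one level of the alternatives symlink and classifies it by the two directory names shown in the listings; `mr_framework` is a hypothetical name.

```shell
# Sketch only: mr_framework is a hypothetical helper.
# Usage: mr_framework [symlink]   (defaults to /etc/alternatives/hadoop-conf)
mr_framework() {
    conf_link="${1:-/etc/alternatives/hadoop-conf}"
    # readlink prints the immediate target of the symlink and fails if it is not a symlink.
    conf_target="$(readlink "$conf_link")" || return 1
    case "$conf_target" in
        *conf.cloudera.yarn*)       echo "YARN (MRv2)" ;;
        *conf.cloudera.mapreduce1*) echo "MRv1" ;;
        *)                          echo "unknown: $conf_target" ;;
    esac
}
```

Called without arguments on a cluster node, it reports the framework that client commands on that host will use.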

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
Depending on whether MapReduce jobs are configured to run on YARN or on the MapReduce service, log in to the corresponding console page to check the job:
  • Clusters > ClusterName > yarn Applications
  • Clusters > ClusterName > mapreduce Activities

(11) Ports Used by Components of CDH5
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Installation-Guide/cm5ig_ports_cdh5.html

Upgrade issue:
   After upgrading from CDH4 to CDH5, the configuration shows that hadoop still runs MRv1:
# ll /etc/alternatives/hadoop-conf
lrwxrwxrwx 1 root root 36 May 9 14:02 /etc/alternatives/hadoop-conf -> /etc/hadoop/conf.cloudera.mapreduce1
CDH manages the MapReduce framework through Linux alternatives, so adjust it as follows:
# alternatives --set hadoop-conf /etc/hadoop/conf.cloudera.yarn
# alternatives --remove hadoop-conf /etc/hadoop/conf.cloudera.mapreduce1
# alternatives --remove hadoop-conf /etc/hadoop/conf.cloudera.hdfs1
# rm -rf /etc/hadoop/conf*1
Go back to the Cloudera Manager Admin Console, redeploy the client configuration, and check again:
# ll /etc/alternatives/hadoop-conf
lrwxrwxrwx 1 root root 30 May 12 09:17 /etc/alternatives/hadoop-conf -> /etc/hadoop/conf.cloudera.yarn

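Configuration issues:
1) Cloudera recommends setting /proc/sys/vm/swappiness to 0. The current setting is 60. Use the sysctl command to change the setting at runtime, and edit /etc/sysctl.conf so that it persists after a reboot. You can continue with the installation, but you may run into problems, with Cloudera Manager reporting that your hosts are unhealthy because of swapping. The following hosts are affected:
Solution:
$ echo 0 > /proc/sys/vm/swappiness
$ vim /etc/sysctl.conf
Add vm.swappiness = 0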
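The sysctl.conf edit can be made repeatable so that running it twice does not leave duplicate lines. A minimal sketch, assuming the file uses the usual `key = value` layout; `persist_swappiness` is a hypothetical helper name.

```shell
# Sketch only: persist_swappiness is a hypothetical helper.
# Usage: persist_swappiness [path-to-sysctl.conf]   (defaults to /etc/sysctl.conf)
persist_swappiness() {
    sysctl_conf="${1:-/etc/sysctl.conf}"
    sysctl_tmp="$(mktemp)" || return 1
    # Drop any existing vm.swappiness line; grep exits 1 when nothing is left, which is fine.
    grep -v '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$sysctl_conf" > "$sysctl_tmp" || true
    # Append the desired setting exactly once.
    echo 'vm.swappiness = 0' >> "$sysctl_tmp"
    cat "$sysctl_tmp" > "$sysctl_conf"
    rc=$?
    rm -f "$sysctl_tmp"
    return "$rc"
}
```

This only covers persistence across reboots; the runtime value still has to be changed with the echo command above or with sysctl.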
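2) Running in non-interactive mode, and data appears to exist in Storage Directory /dfs/nn. Not formatting
Solution:
Check whether the NameNode, SecondaryNameNode, and DataNode data directories are empty. If they contain files, back them up and then clear them. The leftovers are most likely junk from repeated installations or historical data left behind by an upgrade.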
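Checking several data directories by hand is tedious, so here is a minimal sketch that reports which of the given directories still contain files. `report_nonempty_dirs` is a hypothetical helper; it only reports and never deletes anything.

```shell
# Sketch only: report_nonempty_dirs is a hypothetical helper.
# Usage: report_nonempty_dirs <dir> [<dir> ...]
# Prints one line per non-empty directory and returns 1 if any was found.
report_nonempty_dirs() {
    any_nonempty=0
    for data_dir in "$@"; do
        # Directories that do not exist yet cannot hold stale data.
        [ -d "$data_dir" ] || continue
        # ls -A lists hidden entries too but omits . and ..
        if [ -n "$(ls -A "$data_dir")" ]; then
            echo "not empty: $data_dir"
            any_nonempty=1
        fi
    done
    return "$any_nonempty"
}
```

A call might look like `report_nonempty_dirs /dfs/nn /dfs/snn /dfs/dn`; only /dfs/nn appears in the error message above, and the other two paths are assumptions that should be replaced with the directories configured for the cluster.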
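3) Clock offset
Solution: set up NTP time synchronization. Install the NTP service on all nodes with yum install ntp:
  • Master node
  1. Set the time zone by following the prompts: $ tzselect [5) Asia -> 9) China -> 1) east China -> 1) Yes]
  2. Check the system time with $ date, and set it with $ date --set "04/25/09 10:19" (month/day/year hour:minute)
  3. Check the hardware clock: $ hwclock --show
  4. Synchronize the clocks: $ clock -w writes the system time to the hardware clock; $ hwclock --hctosys sets the system time from the hardware clock (hc is the hardware clock, sys is the system time)
  5. Edit the NTP configuration with $ vim /etc/ntp.conf and add:
     restrict 172.16.66.0 mask 255.255.255.0
     server 127.127.1.0
     fudge 127.127.1.0
  6. Restart ntpd and enable it at boot: $ service ntpd restart, then $ chkconfig ntpd on
  • Worker nodes
  1. Edit the NTP configuration with $ vim /etc/ntp.conf: comment out the other server lines and add a line that synchronizes with the master node:
     server 172.16.66.138
  2. Restart ntpd and enable it at boot: $ service ntpd restart, then $ chkconfig ntpd on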
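The worker-node edit (comment out the existing server lines, add the master) is identical on every worker, so it lends itself to a script. A minimal sketch, assuming server entries start at the beginning of a line as in the stock ntp.conf; `point_ntp_at_master` is a hypothetical helper and is not idempotent, so run it once per node.

```shell
# Sketch only: point_ntp_at_master is a hypothetical helper.
# Usage: point_ntp_at_master <path-to-ntp.conf> <master-ip-or-hostname>
point_ntp_at_master() {
    ntp_conf="$1"
    ntp_master="$2"
    ntp_tmp="$(mktemp)" || return 1
    # Prefix every active "server ..." line with '#'; '&' re-inserts the matched text.
    if sed 's/^[[:space:]]*server[[:space:]]/#&/' "$ntp_conf" > "$ntp_tmp"; then
        echo "server ${ntp_master}" >> "$ntp_tmp"
        cat "$ntp_tmp" > "$ntp_conf"
        rc=$?
    else
        rc=1
    fi
    rm -f "$ntp_tmp"
    return "$rc"
}
```

With the addresses from this section, the call would be `point_ntp_at_master /etc/ntp.conf 172.16.66.138`, followed by the ntpd restart.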
4) 401 Unauthorized: ERROR Failed to connect to newly launched supervisor. Agent will exit
Solution:
$ /etc/init.d/cloudera-scm-agent hard_stop
$ kill -9 $(pgrep -f supervisord)
$ /etc/init.d/cloudera-scm-agent start
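5) "Transparent Huge Pages" is enabled and can cause significant performance problems. The kernel with release "Red Hat Enterprise Linux Server release 6.4 (Santiago)" and version "2.6.32-358.el6.x86_64" has enabled set to "[always] never" and defrag set to "[always] never". Run "echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag" to disable this setting, then add the same command to an init script such as /etc/rc.local so that it is applied when the system reboots. Alternatively, upgrade to RHEL 6.5 or later, which does not have this bug.
Solution: follow the instructions in the message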
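In the warning above, the value in square brackets is the active one, so "[always] never" means defrag is currently set to always. A minimal sketch for reading that value from a script follows; `thp_active_value` is a hypothetical helper that just extracts the bracketed word from the given file.

```shell
# Sketch only: thp_active_value is a hypothetical helper.
# Usage: thp_active_value <file>
# Prints the word enclosed in [ ] on lines such as "[always] never".
thp_active_value() {
    # \[ matches a literal '[', the group captures everything up to the next ']'.
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}
```

For example, `thp_active_value /sys/kernel/mm/redhat_transparent_hugepage/defrag` should print `never` once the echo command from the message has been applied.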
6) The host inspector failed to check the hosts: Inspector failed on the following hosts...
Solution:
Edit /etc/hosts along these lines:
127.0.0.1 localhost
::1 localhost6
172.16.66.129 hadoop-1.certus.com hadoop-1
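A quick way to see what a hosts file maps a name to is to look the name up directly in the file. The sketch below prints the address of the first non-comment entry that lists the given name; `hosts_ip_for` is a hypothetical helper, and whether a particular mapping satisfies the host inspector is an assumption to verify on the cluster.

```shell
# Sketch only: hosts_ip_for is a hypothetical helper.
# Usage: hosts_ip_for <name> [hosts-file]   (defaults to /etc/hosts)
hosts_ip_for() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            for (i = 2; i <= NF; i++)            # fields after the address are names
                if ($i == name) { print $1; exit }
        }
    ' "${2:-/etc/hosts}"
}
```

With the example file above, `hosts_ip_for hadoop-1` prints 172.16.66.129; if it printed 127.0.0.1 instead, the node's own name would be bound to the loopback entry.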
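7) java.io.IOException: the path component: '/' is world-writable. Its permissions are 0777. Please fix this or select a different socket path
Solution:
The root directory (/) on the DataNode host has permissions 0777, which is too permissive and therefore insecure. Change it to 755 or the default permissions.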
8) java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/root"): error=13, Permission denied
Most likely the /root directory lacks execute permission; try chmod +x /root

Uninstalling Cloudera:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Installation-Guide/cm5ig_uninstall_cm.html#cmig_topic_18