Hadoop 2.4.1 Cluster Setup



This article walks through building a Hadoop 2.4.1 distributed cluster with NameNode and ResourceManager HA; the Web Application Proxy and Job History Server are left out.

I. Overview

(I) HDFS


1) Basic Architecture


(1) NameNode (Master)

  • Namespace management: the namespace supports the usual file-system operations on HDFS directories, files, and blocks, such as create, modify, delete, and list.
  • Block storage management



(2) DataNode (Slave)

Stores and retrieves blocks as instructed by the NameNode and by clients, and periodically reports to the NameNode which blocks of which files it holds.


2) HA Architecture

Two nodes, an Active NameNode and a Standby NameNode, remove the single point of failure. The two nodes share state through JournalNodes, and ZKFC elects the Active node, monitors its health, and fails over automatically.


(1) Active NameNode:

Accepts and handles RPC requests from clients, writes both its own edit log and the edit log on the shared storage, and receives block reports, block location updates, and heartbeats from the DataNodes.


(2) Standby NameNode:

      Also receives block reports, block location updates, and heartbeats from the DataNodes, and at the same time reads and replays the edit log from the shared storage, so that its own metadata (namespace information plus the block locations map) stays in sync with the Active NameNode's. The Standby NameNode is therefore a hot standby: as soon as it is switched to Active mode it can serve NameNode requests immediately.


(3) JournalNode:

Used by the Active and Standby NameNodes to synchronize data. It consists of a group of JournalNode nodes, an odd number of them, running a Paxos-style quorum protocol for high availability. It is the only shared-edits mechanism supported by CDH5 (CDH4 also offered NFS shared storage).


(4) ZKFC:

Monitors the NameNode process and performs automatic failover.


(II) YARN



1) Basic Architecture


(1) ResourceManager (RM)

Receives job requests from clients, receives and monitors resource reports from the NodeManagers (NM), is responsible for resource allocation and scheduling, and starts and monitors the ApplicationMasters (AM).


(2) NodeManager

Manages the resources on its node, launches Containers to run tasks, reports resource and container status to the RM, and reports task progress to the AM.


(3) ApplicationMaster

Manages and schedules the tasks of a single application (job), requests resources from the RM, tells the NMs to launch Containers, and receives task status reports from the NMs.


(4) Web Application Proxy

Protects YARN from web-based attacks. It is part of the ResourceManager by default but can be configured to run as a standalone process. Access to the ResourceManager web UI assumes trusted users; when an ApplicationMaster runs as an untrusted user, the links it hands to the ResourceManager may be untrusted, and the Web Application Proxy keeps such connections away from the RM.
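
This article does not configure the proxy, but for reference, running it as its own process only takes one yarn-site.xml property (the host and port below are illustrative, not part of this cluster's configuration):

  <!-- Run the Web Application Proxy as a standalone process (illustrative host/port) -->
  <property>
       <name>yarn.web-proxy.address</name>
       <value>drh02:8888</value>
  </property>

It can then be started on that host with sbin/yarn-daemon.sh start proxyserver; if the property is absent, the proxy keeps running inside the ResourceManager.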


(5) Job History Server

When a NodeManager starts it initializes the LogAggregationService, which collects the container logs produced on that machine (when each container finishes) and stores them in a configured HDFS directory. The ApplicationMaster writes job history information to a temporary job-history directory in HDFS and moves it to the final directory when the job finishes, which also enables job recovery. The History Server exposes web and RPC services through which users can query job information.
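
The Job History Server is also skipped in this article. As a sketch of the usual settings (the hostname below is illustrative), the server address goes in mapred-site.xml and log aggregation is switched on in yarn-site.xml:

  <!-- mapred-site.xml (sketch): where the history server listens -->
  <property>
       <name>mapreduce.jobhistory.address</name>
       <value>drh02:10020</value>
  </property>
  <property>
       <name>mapreduce.jobhistory.webapp.address</name>
       <value>drh02:19888</value>
  </property>

  <!-- yarn-site.xml (sketch): let NodeManagers aggregate container logs to HDFS -->
  <property>
       <name>yarn.log-aggregation-enable</name>
       <value>true</value>
  </property>

The server itself is started with sbin/mr-jobhistory-daemon.sh start historyserver.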


2) HA Architecture

      ResourceManager HA consists of an Active node and a Standby node, and persists its internal data and the key application data and markers in an RMStateStore. The RMStateStore implementations currently available are the in-memory MemoryRMStateStore, the filesystem-based FileSystemRMStateStore, and the ZooKeeper-based ZKRMStateStore.

The ResourceManager HA architecture follows the NameNode HA architecture closely, except that shared state goes through the RMStateStore and the ZKFC role runs as a service inside the ResourceManager process rather than as a separate daemon.
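
The yarn-site.xml used later in this article only enables RM HA and points at ZooKeeper; it does not turn on state-store recovery. As a sketch, recovery through the ZKRMStateStore is normally enabled with two extra yarn-site.xml properties:

  <!-- yarn-site.xml (sketch): persist RM state in ZooKeeper so a failover can recover it -->
  <property>
       <name>yarn.resourcemanager.recovery.enabled</name>
       <value>true</value>
  </property>
  <property>
       <name>yarn.resourcemanager.store.class</name>
       <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>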



II. Planning



(I) Versions

Component    Version                   Notes
JDK          jdk1.7.0_71
Hadoop       hadoop-2.4.1.tar.gz       Main distribution package
Zookeeper    zookeeper-3.4.5.tar.gz    Coordination service used for hot failover and for storing YARN state



(II) Host Plan

IP              Host    Deployed modules / Processes
192.168.68.102  drh02   Modules: NameNode, ResourceManager, DataNode, NodeManager, Zookeeper
                        Processes: NameNode, DFSZKFailoverController, ResourceManager, DataNode, NodeManager, JournalNode, QuorumPeerMain
192.168.68.103  drh03   Modules: NameNode, ResourceManager, DataNode, NodeManager, Zookeeper
                        Processes: NameNode, DFSZKFailoverController, ResourceManager, DataNode, NodeManager, JournalNode, QuorumPeerMain
192.168.68.104  drh04   Modules: DataNode, NodeManager, Zookeeper
                        Processes: DataNode, NodeManager, JournalNode, QuorumPeerMain
Because I was working in virtual machines, I used the plan above, which needs the fewest machines.
A couple of alternatives:

IP        Host                        Deployed modules / Processes
8.8.8.11  Hadoop-NN-01                Modules: NameNode, ResourceManager
                                      Processes: NameNode, DFSZKFailoverController, ResourceManager
8.8.8.13  Hadoop-NN-02                Modules: NameNode, ResourceManager
                                      Processes: NameNode, DFSZKFailoverController, ResourceManager
8.8.8.13  Hadoop-DN-01, Zookeeper-01  Modules: DataNode, NodeManager, Zookeeper
                                      Processes: DataNode, NodeManager, JournalNode, QuorumPeerMain
8.8.8.14  Hadoop-DN-02, Zookeeper-02  Modules: DataNode, NodeManager, Zookeeper
                                      Processes: DataNode, NodeManager, JournalNode, QuorumPeerMain
8.8.8.15  Hadoop-DN-03, Zookeeper-03  Modules: DataNode, NodeManager, Zookeeper
                                      Processes: DataNode, NodeManager, JournalNode, QuorumPeerMain
Or give the NameNodes and the ResourceManagers two machines each:
     Host      IP                 Installed software           Running processes
     drh01     192.168.68.201     jdk, hadoop                  NameNode, DFSZKFailoverController (zkfc)
     drh02     192.168.68.202     jdk, hadoop                  NameNode, DFSZKFailoverController (zkfc)
     drh03     192.168.68.203     jdk, hadoop                  ResourceManager
     drh04     192.168.68.204     jdk, hadoop                  ResourceManager
     drh05     192.168.68.205     jdk, hadoop, zookeeper       DataNode, NodeManager, JournalNode, QuorumPeerMain
     drh06     192.168.68.206     jdk, hadoop, zookeeper       DataNode, NodeManager, JournalNode, QuorumPeerMain
     drh07     192.168.68.207     jdk, hadoop, zookeeper       DataNode, NodeManager, JournalNode, QuorumPeerMain

What each process is:

  • NameNode: the HDFS master
  • ResourceManager: the YARN master
  • DFSZKFC: DFS Zookeeper Failover Controller; monitors the NameNodes and promotes the Standby NameNode to Active
  • DataNode: the HDFS worker (slave)
  • NodeManager: the per-node YARN worker
  • JournalNode: the node service that shares the NameNode edit log (if NFS shared storage is used instead, this process and all of its related configuration can be omitted)
  • QuorumPeerMain: the main Zookeeper server process

Notes
     1. In hadoop 2.0 HDFS normally has two NameNodes, one active and one standby. The Active NameNode serves clients; the Standby NameNode does not, it only mirrors the active NameNode's state so it can take over quickly if the active one fails.
     hadoop 2.0 officially offers two HDFS HA solutions, NFS and QJM. We use the simpler QJM here. In this scheme the active and standby NameNodes synchronize metadata through a group of JournalNodes; a write counts as successful once a majority of the JournalNodes have it. An odd number of JournalNodes is usually configured.
     A zookeeper cluster is also set up for ZKFC (DFSZKFailoverController) failover: when the Active NameNode dies, the Standby NameNode is automatically switched to active.
     2. hadoop-2.2.0 still had a single ResourceManager and therefore a single point of failure; hadoop-2.4.1 fixes this with two ResourceManagers, one Active and one Standby, whose state is coordinated through zookeeper.



(III) Directory Plan

Note: I created a /hadoop directory directly under the filesystem root and keep everything Hadoop-related in it. Only root can create a directory under /, so after creating it you have to hand ownership over to the user who builds the cluster; mine is the drh user. Do not run everything as root unless you really know nothing about Linux permissions, in which case it is better to learn that first (personal opinion).

as root:  mkdir /hadoop
as root:  chown drh.drh /hadoop

III. Environment Preparation
1) Set the IP address, hostname, and hosts mapping, and disable the firewall


    1.1 Change the hostname
          vim /etc/sysconfig/network
         
          NETWORKING=yes
          HOSTNAME=drh01   

     1.2 Change the IP address
          Two ways:
          Option 1: through the Linux desktop (strongly recommended)
               Open the Linux desktop -> right-click the network icon in the top-right corner -> Edit connections -> select the current connection, System eth0 -> Edit -> IPv4 -> set Method to Manual -> Add -> enter IP 192.168.68.102, netmask 255.255.255.0, gateway 192.168.68.1 -> Apply

          Option 2: edit the configuration file (for command-line die-hards)
               vim /etc/sysconfig/network-scripts/ifcfg-eth0
              
               DEVICE="eth0"
               BOOTPROTO="static"               ###
               ........
               IPADDR="192.168.68.102"           ###
               NETMASK="255.255.255.0"          ###
               GATEWAY="192.168.68.1"            ###
              
     1.3 Map hostnames to IP addresses
          vim /etc/hosts
          192.168.68.102     drh02
          192.168.68.103     drh03
          192.168.68.104     drh04
Note: leave out the machine itself. For example, when editing hosts on drh02 you only need to add drh03 and drh04; adding more entries made my HDFS startup fail.
  [drh@drh02 hadoop-2.4.1]$ service iptables status   (shows the firewall status; if it is running, run the commands below -- mine was already stopped, so this can be skipped)
  [drh@drh02 hadoop-2.4.1]$ service iptables stop   (stop the firewall)
  [drh@drh02 hadoop-2.4.1]$ chkconfig iptables --list   (check whether the firewall starts on boot; mine was already disabled)
  iptables            0:off     1:off     2:off     3:off     4:off     5:off     6:off
  [drh@drh02 hadoop-2.4.1]$ chkconfig iptables off
    You do not have enough permissions to perform this operation.
    [drh@drh02 hadoop-2.4.1]$ su root   (switch to root)
    Password:
    [root@drh02 hadoop-2.4.1]# chkconfig iptables off   (disable the firewall on boot)
    [root@drh02 hadoop-2.4.1]# 

2) Install the JDK:

     2.1 Upload the JDK archive
    
     2.2 Unpack the JDK
          # create the directory
          mkdir /home/drh/java
          # unpack
          tar -zxvf jdk-7u71-linux-i586.tar.gz -C /home/drh/java
         
     2.3 Add java to the environment variables
          vim /etc/profile   (remember to switch to the root user)
          # append at the end of the file
          export JAVA_HOME=/home/drh/java/jdk1.7.0_71
          export PATH=$PATH:$JAVA_HOME/bin
          # reload the configuration
          source /etc/profile
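
A quick check that a fresh shell picks up the JDK:

  java -version        # should report 1.7.0_71
  echo $JAVA_HOME      # should print /home/drh/java/jdk1.7.0_71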

*************************** Repeat the steps above on all three machines ************************

3) Install Zookeeper:
Installation steps:
 1. Install and configure the zookeeper cluster (on drh02)
          1.1 Unpack
               tar -zxvf zookeeper-3.4.5.tar.gz -C /hadoop/
          1.2 Edit the configuration
               cd /hadoop/zookeeper-3.4.5/conf/
               cp zoo_sample.cfg zoo.cfg
               vim zoo.cfg
               change: dataDir=/hadoop/zookeeper-3.4.5/tmp
               append at the end:
               server.1=drh02:2888:3888
               server.2=drh03:2888:3888
               server.3=drh04:2888:3888
               save and quit
               then create the tmp directory
               mkdir /hadoop/zookeeper-3.4.5/tmp
               then create an empty file
               touch /hadoop/zookeeper-3.4.5/tmp/myid
               and finally write the id into that file
               echo 1 > /hadoop/zookeeper-3.4.5/tmp/myid
          1.3 Copy the configured zookeeper to the other nodes (first create the /hadoop directory on drh03 and drh04: mkdir /hadoop)
               scp -r /hadoop/zookeeper-3.4.5/ drh03:/hadoop/
               scp -r /hadoop/zookeeper-3.4.5/ drh04:/hadoop/
               
               Note: change the content of /hadoop/zookeeper-3.4.5/tmp/myid on drh03 and drh04 accordingly
               drh03:
                    echo 2 > /hadoop/zookeeper-3.4.5/tmp/myid
               drh04:
                    echo 3 > /hadoop/zookeeper-3.4.5/tmp/myid
  [drh@drh02 hadoop]$ tar -zxvf zookeeper-3.4.5.tar.gz -C /hadoop   (# unpack)
  [drh@drh02 hadoop]$ cd zookeeper-3.4.5/conf/   (# go into zookeeper's conf directory)
  [drh@drh02 conf]$ cp zoo_sample.cfg zoo.cfg   (# copy the sample configuration rather than renaming it)
  [drh@drh02 conf]$ vim zoo.cfg   (edit)

  (append the dataDir and server.X entries shown above at the end of the file)

  [drh@drh02 conf]$ mkdir /hadoop/zookeeper-3.4.5/tmp
  [drh@drh02 conf]$ touch /hadoop/zookeeper-3.4.5/tmp/myid
  [drh@drh02 conf]$ echo 1 > /hadoop/zookeeper-3.4.5/tmp/myid
  [drh@drh02 conf]$ scp -r /hadoop/zookeeper-3.4.5/ drh03:/hadoop/
  [drh@drh02 conf]$ scp -r /hadoop/zookeeper-3.4.5/ drh04:/hadoop/
  [drh@drh03 hadoop]$ echo 2 > /hadoop/zookeeper-3.4.5/tmp/myid   (*** on drh03 ***)
  [drh@drh04 hadoop]$ echo 3 > /hadoop/zookeeper-3.4.5/tmp/myid   (*** on drh04 ***)





4) Set up passwordless SSH:

(1) Generate a key pair on drh02:

  [drh@drh02 .ssh]$ ssh-keygen -t rsa


(2) Distribute the public key:

  [drh@drh02 .ssh]$ ssh-copy-id drh03
  [drh@drh02 .ssh]$ ssh-copy-id drh04


(3) Verify:

  [drh@drh02 .ssh]$ ssh drh03
    Last login: Fri Feb  6 13:37:49 2015 from drh02
  [drh@drh03 ~]$ exit
    logout
    Connection to drh03 closed.
  [drh@drh02 .ssh]$ ssh drh04
    Last login: Fri Feb  6 13:38:32 2015 from drh02
  [drh@drh04 ~]$ 



5) Configure /etc/hosts and distribute it (edit as root, or use sudo):
  [drh@drh02 .ssh]$ su root
    Password:
  [root@drh02 .ssh]# vim /etc/hosts
  [root@drh02 .ssh]#
  127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.68.101  drh01
    192.168.68.102  drh02
    192.168.68.103  drh03
    192.168.68.104  drh04

************************ All three machines need this configuration (and it must be identical) *****************************






6) Configure the environment variables. I edited /etc/profile; you can also use vi ~/.bashrc followed by source ~/.bashrc

  [drh@drh02 .ssh]$ su root
    Password:
  [root@drh02 etc]# vim /etc/profile   (append at the end of the file)
  export JAVA_HOME=/home/drh/java/jdk1.7.0_71
    export HADOOP_HOME=/hadoop/hadoop-2.4.1
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
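
Once the Hadoop tarball has been unpacked in the next section, a quick way to confirm the new PATH works:

  source /etc/profile
  hadoop version       # should report Hadoop 2.4.1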


IV. Installation



1) Unpack
  [drh@drh02 hadoop]$ tar -zxvf hadoop-2.4.1.tar.gz -C /hadoop/


2) Edit the configuration files

Note: the files below live in $HADOOP_HOME/etc/hadoop (cd /hadoop/hadoop-2.4.1/etc/hadoop)

Configuration file            Type              Purpose
hadoop-env.sh                 Bash script       Hadoop runtime environment variables
core-site.xml                 xml               Hadoop core settings, e.g. I/O
hdfs-site.xml                 xml               HDFS daemon settings: NN, JN, DN
yarn-env.sh                   Bash script       Yarn runtime environment variables
yarn-site.xml                 xml               Yarn framework settings
mapred-site.xml               xml               MapReduce settings
capacity-scheduler.xml        xml               Yarn scheduler settings
container-executor.cfg        Cfg               Yarn Container settings
mapred-queues.xml             xml               MapReduce queue settings
hadoop-metrics.properties     Java properties   Hadoop Metrics settings
hadoop-metrics2.properties    Java properties   Hadoop Metrics settings
slaves                        Plain text        DataNode (slave) node list
exclude                       Plain text        Decommissioned DataNode list
log4j.properties              Java properties   System logging settings
configuration.xsl





(1) Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh:

  export JAVA_HOME=${JAVA_HOME}   (the default)
  Change the line above to the following; the value is your JAVA_HOME installation directory:
  export JAVA_HOME="/home/drh/java/jdk1.7.0_71"

(2) Edit $HADOOP_HOME/etc/hadoop/core-site.xml



  <configuration>
       <!-- Set the HDFS nameservice to ns1 -->
       <property>
            <name>fs.defaultFS</name>
            <value>hdfs://ns1</value>
       </property>
       <!-- Hadoop temporary directory -->
       <property>
            <name>hadoop.tmp.dir</name>
            <value>/hadoop/hadoop-2.4.1/tmp</value>
       </property>
       <!-- Zookeeper quorum address -->
       <property>
            <name>ha.zookeeper.quorum</name>
            <value>drh02:2181,drh03:2181,drh04:2181</value>
       </property>
  </configuration>



(3) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml



  <?xml version="1.0" encoding="UTF-8"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
       <!-- Set the HDFS nameservice to ns1; must match core-site.xml -->
       <property>
            <name>dfs.nameservices</name>
            <value>ns1</value>
       </property>
       <!-- ns1 has two NameNodes, nn1 and nn2 -->
       <property>
            <name>dfs.ha.namenodes.ns1</name>
            <value>nn1,nn2</value>
       </property>
       <!-- RPC address of nn1 -->
       <property>
            <name>dfs.namenode.rpc-address.ns1.nn1</name>
            <value>drh02:9000</value>
       </property>
       <!-- HTTP address of nn1 -->
       <property>
            <name>dfs.namenode.http-address.ns1.nn1</name>
            <value>drh02:50070</value>
       </property>
       <!-- RPC address of nn2 -->
       <property>
            <name>dfs.namenode.rpc-address.ns1.nn2</name>
            <value>drh03:9000</value>
       </property>
       <!-- HTTP address of nn2 -->
       <property>
            <name>dfs.namenode.http-address.ns1.nn2</name>
            <value>drh03:50070</value>
       </property>
       <!-- Where the NameNode metadata (edits) is stored on the JournalNodes -->
       <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://drh02:8485;drh03:8485;drh04:8485/ns1</value>
       </property>
       <!-- Where each JournalNode stores its data on local disk -->
       <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/hadoop/hadoop-2.4.1/journal</value>
       </property>
       <!-- Enable automatic NameNode failover -->
       <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
       </property>
       <!-- Failover proxy provider used by clients -->
       <property>
            <name>dfs.client.failover.proxy.provider.ns1</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
       </property>
       <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
       <property>
            <name>dfs.ha.fencing.methods</name>
            <value>
                 sshfence
                 shell(/bin/true)
            </value>
       </property>
       <!-- sshfence needs passwordless SSH -->
       <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/home/drh/.ssh/id_rsa</value>
       </property>
       <!-- sshfence connection timeout -->
       <property>
            <name>dfs.ha.fencing.ssh.connect-timeout</name>
            <value>30000</value>
       </property>
  </configuration>



(4) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml   (hadoop 2.4.1 does not ship this file; it provides mapred-site.xml.template instead)
[drh@drh02 hadoop]$ cp mapred-site.xml.template mapred-site.xml

  <configuration>
      <!-- Run the MapReduce framework on yarn -->
      <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
      </property>
  </configuration>



(5) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
  <configuration>
            <!-- Enable ResourceManager HA -->
            <property>
               <name>yarn.resourcemanager.ha.enabled</name>
               <value>true</value>
            </property>
            <!-- Cluster id of the RM pair -->
            <property>
               <name>yarn.resourcemanager.cluster-id</name>
               <value>yrc</value>
            </property>
            <!-- Logical ids of the RMs -->
            <property>
               <name>yarn.resourcemanager.ha.rm-ids</name>
               <value>rm1,rm2</value>
            </property>
            <!-- Hostname of each RM -->
            <property>
               <name>yarn.resourcemanager.hostname.rm1</name>
               <value>drh02</value>
            </property>
            <property>
               <name>yarn.resourcemanager.hostname.rm2</name>
               <value>drh03</value>
            </property>
            <!-- Zookeeper quorum address -->
            <property>
               <name>yarn.resourcemanager.zk-address</name>
               <value>drh02:2181,drh03:2181,drh04:2181</value>
            </property>
            <property>
               <name>yarn.nodemanager.aux-services</name>
               <value>mapreduce_shuffle</value>
            </property>
  </configuration>





(6) Edit slaves



  [drh@drh02 hadoop]$ vim slaves
    add:
  drh02
  drh03
  drh04


3) Distribute the Hadoop package

  [drh@drh02 hadoop]$ scp -r /hadoop/hadoop-2.4.1/ drh@drh03:/hadoop/
  [drh@drh02 hadoop]$ scp -r /hadoop/hadoop-2.4.1/ drh@drh04:/hadoop/




4) Start HDFS

(1) Start the zookeeper cluster (start zk on drh02, drh03, and drh04)
             
  [drh@drh02 hadoop]$ cd /hadoop/zookeeper-3.4.5/bin/
  [drh@drh02 bin]$ ./zkServer.sh start
  JMX enabled by default
  Using config: /hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
  Starting zookeeper ... STARTED
  [drh@drh03 bin]$ ./zkServer.sh start
    JMX enabled by default
    Using config: /hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
  [drh@drh04 bin]$ ./zkServer.sh start
    JMX enabled by default
    Using config: /hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
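
Once all three servers are up you can ask each node for its role; zkServer.sh status should report Mode: leader on one node and Mode: follower on the other two. A quick check, run on every node:

  /hadoop/zookeeper-3.4.5/bin/zkServer.sh status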






(2) Start the JournalNodes (on drh02, drh03, and drh04)

The JournalNode process must be running on the JournalNode nodes before the NameNode is formatted:

  [zero@CentOS-Cluster-03 hadoop-2.3.0-cdh5.0.1]$ hadoop-daemon.sh start journalnode

  starting journalnode, logging to /home/puppet/hadoop/cdh4.4/hadoop-2.0.0-cdh4.4.0/logs/hadoop-puppet-journalnode-BigData-03.out


Verify the JournalNode:

  [drh@drh02 hadoop-2.4.1]$ jps
    3212 QuorumPeerMain
    4307 Jps
    3841 JournalNode




(3) Format the NameNode:

On node drh02: hdfs namenode -format

  [drh@drh02 hadoop-2.4.1]$ hdfs namenode -format
    15/02/06 17:14:07 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = drh02/192.168.68.102
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 2.4.1

An error I ran into while formatting:
15/02/07 20:56:51 FATAL namenode.NameNode: Exception in namenode join
java.lang.IllegalArgumentException: Unable to construct journal, qjournal://drh02:8485;drh03:8485;drh04:8485/ns1
     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1523)
     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:264)
     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:230)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:893)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1310)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1424)
Caused by: java.lang.reflect.InvocationTargetException
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1521)
     ... 5 more
Caused by: java.lang.NullPointerException
     at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
     at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
     at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:166)
     at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:146)
     at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:367)
     at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149)
     at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:116)
     at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
     ... 10 more
15/02/07 20:56:51 INFO util.ExitUtil: Exiting with status 1
15/02/07 20:56:51 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: drh02: drh02

Fix for the error above: the journalnode logs showed that the hostname mapped to 127.0.0.1 in the hosts file was wrong.

(4) Synchronize the NameNode metadata:

Copy the metadata from drh02 to drh03.

This mainly concerns dfs.namenode.name.dir and dfs.namenode.edits.dir; you should also make sure the shared edits directory (dfs.namenode.shared.edits.dir) contains all of the NameNode's metadata.

  [drh@drh02 hadoop-2.4.1]$ scp -r tmp/ drh@drh03:/hadoop/hadoop-2.4.1/
  fsimage_0000000000000000000.md5               100%   62     0.1KB/s   00:00   
    fsimage_0000000000000000000                   100%  350     0.3KB/s   00:00   
    seen_txid                                     100%    2     0.0KB/s   00:00   
    VERSION                                       100%  204     0.2KB/s   00:00   
    [drh@drh02 hadoop-2.4.1]$ 
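
Instead of copying tmp/ by hand, the standby NameNode can normally fetch the metadata itself; a sketch (run on drh03, with the formatted NameNode on drh02 already started):

  [drh@drh03 hadoop-2.4.1]$ hdfs namenode -bootstrapStandby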
Note: when starting HDFS I got this error:
drh04: /hadoop/hadoop-2.4.1/sbin/hadoop-daemon.sh: line 157: /tmp/hadoop-drh-datanode.pid: Permission denied
On drh04 the owner and group of the tmp directory showed as root root; they need to be changed to drh drh with: chown drh.drh tmp/
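
As a sketch of an alternative, the pid files can be kept out of /tmp altogether by pointing the standard HADOOP_PID_DIR variable in hadoop-env.sh at a directory the drh user owns (the path below is only an example):

  export HADOOP_PID_DIR=/hadoop/hadoop-2.4.1/pids
  mkdir -p /hadoop/hadoop-2.4.1/pids    # create it on every node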



(5) Initialize ZKFC:

This creates the znode in Zookeeper that records the failover state.

On node drh02: hdfs zkfc -formatZK

  [drh@drh02 hadoop-2.4.1]$ hdfs zkfc -formatZK
    15/02/06 17:21:46 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at drh02/192.168.68.102:9000
    15/02/06 17:21:46 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT.......................................................






(6) Start HDFS

Cluster start, from drh02:

First attempt: it failed. The cause was the entry "drh02    192.168.68.102" I had put in the hosts file.
Fix: the entry needs to be removed.
  [drh@drh02 sbin]$ ./start-dfs.sh
    Starting namenodes on [drh02 drh03]
    Warning: the RSA host key for 'drh02' differs from the key for the IP address '192.168.68.102'
    Offending key for IP in /home/drh/.ssh/known_hosts:4
    Matching host key in /home/drh/.ssh/known_hosts:7
    Are you sure you want to continue connecting (yes/no)? drh03: starting namenode, logging to /hadoop/hadoop-2.4.1/logs/hadoop-drh-namenode-drh03.out
Starting again showed the same problem, so I executed: ssh-keygen -R 192.168.68.102
 Second attempt:
  [drh@drh02 sbin]$ ./start-dfs.sh
    15/02/06 17:28:49 WARN hdfs.DFSUtil: Namenode for ns1 remains unresolved for ID nn1.  Check your hdfs-site.xml file to ensure namenodes are configured properly.
    Starting namenodes on [drh02 drh03]
    drh02: ssh: Could not resolve hostname drh02: Temporary failure in name resolution
    drh03: namenode running as process 9318. Stop it first.
    drh02: ssh: Could not resolve hostname drh02: Temporary failure in name resolution
    localhost: datanode running as process 4813. Stop it first.
    drh03: datanode running as process 9387. Stop it first.
    drh01: ssh: connect to host drh01 port 22: No route to host
    Starting journal nodes [drh02 drh03 drh04]
    drh02: ssh: Could not resolve hostname drh02: Temporary failure in name resolution
    drh03: journalnode running as process 8805. Stop it first.
    drh04: starting journalnode, logging to /hadoop/hadoop-2.4.1/logs/hadoop-drh-journalnode-drh04.out
    drh04: /hadoop/hadoop-2.4.1/sbin/hadoop-daemon.sh: line 157: /tmp/hadoop-drh-journalnode.pid: Permission denied
    Starting ZK Failover Controllers on NN hosts [drh02 drh03]
    drh02: ssh: Could not resolve hostname drh02: Temporary failure in name resolution
    drh03: starting zkfc, logging to /hadoop/hadoop-2.4.1/logs/hadoop-drh-zkfc-drh03.out






After fixing the issues above, start HDFS (drh02):


  [drh@drh02 sbin]$ ./start-dfs.sh
    Starting namenodes on [drh02 drh03]
    drh03: starting namenode, logging to /hadoop/hadoop-2.4.1/logs/hadoop-drh-namenode-drh03.out
    drh02: starting namenode, logging to /hadoop/hadoop-2.4.1/logs/hadoop-drh-namenode-drh02.out
    localhost: ssh: Could not resolve hostname localhost: Temporary failure in name resolution
    drh02: starting datanode, logging to /hadoop/hadoop-2.4.1/logs/hadoop-drh-datanode-drh02.out
    drh03: starting datanode, logging to /hadoop/hadoop-2.4.1/logs/hadoop-drh-datanode-drh03.out
    drh01: ssh: connect to host drh01 port 22: No route to host
    Starting journal nodes [drh02 drh03 drh04]
    drh02: journalnode running as process 4050. Stop it first.
    drh04: journalnode running as process 3944. Stop it first.
    drh03: journalnode running as process 4000. Stop it first.
    Starting ZK Failover Controllers on NN hosts [drh02 drh03]
    drh02: starting zkfc, logging to /hadoop/hadoop-2.4.1/logs/hadoop-drh-zkfc-drh02.out
    drh03: starting zkfc, logging to /hadoop/hadoop-2.4.1/logs/hadoop-drh-zkfc-drh03.out










5) Start YARN (on drh03):



  [drh@drh03 sbin]$ ./start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /hadoop/hadoop-2.4.1/logs/yarn-drh-resourcemanager-drh03.out
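
start-yarn.sh only starts a ResourceManager on the node it is run from; with ResourceManager HA the second RM (rm1 on drh02 in the yarn-site.xml above) normally has to be started by hand, for example:

  [drh@drh02 sbin]$ ./yarn-daemon.sh start resourcemanager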




6) Verify



<1> Processes


drh03

  [drh@drh03 sbin]$ jps
    3080 QuorumPeerMain
    10836 ResourceManager
    9317 DFSZKFailoverController
    9004 NameNode
    10945 NodeManager
    9185 JournalNode
    9085 DataNode
    10982 Jps


drh02

  [drh@drh02 hadoop-2.4.1]$ jps
    13095 JournalNode
    3165 QuorumPeerMain
    12800 NameNode
    14194 NodeManager
    14236 Jps
    13281 DFSZKFailoverController
    12910 DataNode


drh04

[root@drh04 tmp]# jps
9802 JournalNode
9905 NodeManager
10319 Jps
6304 DataNode
3121 QuorumPeerMain



<2> Web UIs:

Active node: http://drh02:50070






Standby node: http://drh03:50070




At this point you can test HA: kill the active NameNode and you will see the NameNode that was standby switch to active.
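
The HA state can also be checked from the command line; nn1/nn2 and rm1/rm2 are the ids configured in hdfs-site.xml and yarn-site.xml above, and the kill line is just one way to simulate the failure (run it on the node that is currently active):

  hdfs haadmin -getServiceState nn1        # expected: active (before the test)
  hdfs haadmin -getServiceState nn2        # expected: standby
  kill -9 $(jps | awk '/^[0-9]+ NameNode$/{print $1}')    # kill the active NameNode
  hdfs haadmin -getServiceState nn2        # should now report active
  yarn rmadmin -getServiceState rm2        # ResourceManager HA state works the same way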




Parts of this article drew on: http://www.aboutyun.com/thread-9115-1-1.html

Following the steps above, everything worked for the author!
 
 
  