Setting Up a Fully Distributed Hadoop 2 Cluster: HDFS Federation, Automatic HA, and YARN

  • Having spent quite a while on Hadoop 1, and with the system now in production and my workload a bit lighter, I took the time to walk through a complete fully distributed setup of Hadoop 2: HDFS Federation, automatic high availability (HA), and YARN. This post records the process for reference and mutual learning; without further ado, on to the main content. Many thanks to the online resources that helped along the way — sharing makes technology better.

    This post works in reverse: the final results come first, as follows:

    Result 1: the cluster daemons, as listed by jps


    Result 2: cluster status, viewed through the web UI


    Result 3: the wordcount example that ships with Hadoop 2, viewed through the web UI.

    The Application Type column shows MapReduce; the next step will be getting Storm running on YARN as well.


    OK, with those three screenshots shown, the rest of this post covers the cluster environment first and then the detailed setup steps.


    Only the key points of the installation are covered here; some steps are not described in detail. Questions and comments are welcome.

    I. Cluster Environment

    Four CentOS (x86_64) nodes: hadoop0, hadoop1, hadoop2, hadoop3. Software: a self-compiled 64-bit Hadoop 2.2.0, ZooKeeper 3.4.5, and the JDK.

    After unpacking, the software lives under /usr/local.

    II. Detailed Steps

    Preparation

    Check the CentOS system architecture

    arch or uname -a should report x86_64 (32-bit systems report i386 or i686)

    Set the hostname (takes effect after a reboot)

    vi /etc/sysconfig/network
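    On CentOS 6 that file looks roughly like this — a minimal sketch, with HOSTNAME set per node:

    NETWORKING=yes
    HOSTNAME=hadoop0    # hadoop1, hadoop2, hadoop3 on the other machines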

    Assign a static IP address

    Edit the hosts mapping file

    vi /etc/hosts

    202.196.37.240 hadoop0

    202.196.37.241 hadoop1

    202.196.37.242 hadoop2

    202.196.37.243 hadoop3

    Configure passwordless SSH

    On hadoop0, generate a key pair (ssh-keygen -t rsa) and seed the authorized keys; in /root/.ssh:

    cp id_rsa.pub authorized_keys

    On every node other than hadoop0, push that node's key over to hadoop0 (gather):

    ssh-copy-id -i hadoop0

    Back on hadoop0, distribute the combined file (likewise for hadoop2 and hadoop3):

    scp authorized_keys hadoop1:/root/.ssh/
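    A quick check from hadoop0 that passwordless login works:

    ssh hadoop1 hostname    # should print hadoop1 without asking for a password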

    Install and configure the JDK
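    The post does not spell this step out. A minimal sketch, assuming the JDK is unpacked to /usr/local/jdk (that path is an assumption) and the following is appended to /etc/profile:

    export JAVA_HOME=/usr/local/jdk    # adjust to the actual unpack location
    export PATH=$JAVA_HOME/bin:$PATH

    Run source /etc/profile afterwards and confirm with java -version.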

    Install ZooKeeper

    Edit the core configuration file, zoo.cfg:

    dataDir=/usr/local/zookeeper-3.4.5/data

    dataLogDir=/usr/local/zookeeper-3.4.5/log

    server.0=hadoop0:2887:3887

    server.1=hadoop1:2887:3887

    server.2=hadoop2:2887:3887
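    Each node also needs a myid file under dataDir whose number matches its server.N entry above:

    mkdir -p /usr/local/zookeeper-3.4.5/data
    echo 0 > /usr/local/zookeeper-3.4.5/data/myid    # 1 on hadoop1, 2 on hadoop2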

    Start and verify the ZooKeeper ensemble

    zkServer.sh start / zkServer.sh status
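    Run both commands on each of hadoop0, hadoop1, and hadoop2. As a rough check (the exact wording may vary by version), status should report one leader and two followers across the ensemble:

    zkServer.sh status    # expect "Mode: leader" on one node, "Mode: follower" on the other two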

    Install Hadoop 2

    Place the self-compiled 64-bit hadoop-2.2.0-src under /usr/local, then:

    cp -R /usr/local/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0 /usr/local/

    mv /usr/local/hadoop-2.2.0 /usr/local/hadoop

    All of the XML configuration files in this post live under /usr/local/hadoop/etc/hadoop.

    Every configuration file shown has been tested; after a little reformatting, they can be copied and used directly.

    The configuration falls into two parts: the HDFS Federation and HA settings, and the YARN settings.


    Now for the configuration files themselves.


    Configure everything on hadoop0 (in cluster1) first, then scp it to the other nodes.

    core-site.xml

    <configuration>

    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster1</value>
    <description>The default HDFS path: cluster1 on nodes hadoop0 and hadoop1, cluster2 on nodes hadoop2 and hadoop3</description>
    </property>

    <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    </property>

    <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop0:2181,hadoop1:2181,hadoop2:2181</value>
    <description>The ZooKeeper ensemble</description>
    </property>

    </configuration>

    hdfs-site.xml

    <configuration>

    <!-- 1. Settings for cluster1 -->

    <property>

    <name>dfs.replication</name>

    <value>2</value>

    </property>

    <property>

    <name>dfs.nameservices</name>

    <value>cluster1,cluster2</value>

    </property>

    <property>

    <name>dfs.ha.namenodes.cluster1</name>

    <value>hadoop0,hadoop1</value>

    </property>

    <property>

    <name>dfs.namenode.rpc-address.cluster1.hadoop0</name>

    <value>hadoop0:9000</value>

    </property>

    <property>

    <name>dfs.namenode.http-address.cluster1.hadoop0</name>

    <value>hadoop0:50070</value>

    </property>

    <property>

    <name>dfs.namenode.rpc-address.cluster1.hadoop1</name>

    <value>hadoop1:9000</value>

    </property>

    <property>

    <name>dfs.namenode.http-address.cluster1.hadoop1</name>

    <value>hadoop1:50070</value>

    </property>

    <!-- This block is left uncommented on cluster1 nodes; on cluster2 nodes it is commented out and the mirrored block below is used instead -->

    <property>

    <name>dfs.namenode.shared.edits.dir</name>

    <value>qjournal://hadoop0:8485;hadoop1:8485;hadoop2:8485/cluster1</value>

    <description>The shared edits directory of cluster1's two NameNodes, maintained by the JournalNode ensemble</description>

    </property>

    <property>

    <name>dfs.ha.automatic-failover.enabled.cluster1</name>

    <value>true</value>

    </property>

    <property>

    <name>dfs.client.failover.proxy.provider.cluster1</name>

    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

    </property>

    <!-- 2. Settings for cluster2 -->

    <property>

    <name>dfs.ha.namenodes.cluster2</name>

    <value>hadoop2,hadoop3</value>

    </property>

    <property>

    <name>dfs.namenode.rpc-address.cluster2.hadoop2</name>

    <value>hadoop2:9000</value>

    </property>

    <property>

    <name>dfs.namenode.http-address.cluster2.hadoop2</name>

    <value>hadoop2:50070</value>

    </property>

    <property>

    <name>dfs.namenode.rpc-address.cluster2.hadoop3</name>

    <value>hadoop3:9000</value>

    </property>

    <property>

    <name>dfs.namenode.http-address.cluster2.hadoop3</name>

    <value>hadoop3:50070</value>

    </property>

    <!-- This block stays commented out on cluster1 nodes; on cluster2 nodes, uncomment it and comment out the cluster1 block above

    <property>

    <name>dfs.namenode.shared.edits.dir</name>

    <value>qjournal://hadoop0:8485;hadoop1:8485;hadoop2:8485/cluster2</value>

    <description>The shared edits directory of cluster2's two NameNodes, maintained by the JournalNode ensemble</description>

    </property>

    -->

    <property>

    <name>dfs.ha.automatic-failover.enabled.cluster2</name>

    <value>true</value>

    </property>

    <property>

    <name>dfs.client.failover.proxy.provider.cluster2</name>

    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

    </property>

    <!-- 3. Settings common to cluster1 and cluster2 -->

    <property>

    <name>dfs.journalnode.edits.dir</name>

    <value>/usr/local/hadoop/tmp/journal</value>

    </property>

    <property>

    <name>dfs.ha.fencing.methods</name>

    <value>sshfence</value>

    </property>

    <property>

    <name>dfs.ha.fencing.ssh.private-key-files</name>

    <value>/root/.ssh/id_rsa</value>

    </property>

    </configuration>

    With the above configuration in place, distribute it with scp:

    scp -rq hadoop hadoop1:/usr/local/

    scp -rq hadoop hadoop2:/usr/local/

    scp -rq hadoop hadoop3:/usr/local/

    Things to watch when editing on the other nodes:

    hadoop-env.sh: no changes needed

    slaves: no changes needed

    core-site.xml

    1. <property>

    <name>fs.defaultFS</name>

    <value>hdfs://cluster1</value>

    </property>

    On cluster1 nodes the value is hdfs://cluster1

    On cluster2 nodes the value is hdfs://cluster2

    hdfs-site.xml

    <property>

    <name>dfs.namenode.shared.edits.dir</name>

    <value>qjournal://hadoop0:8485;hadoop1:8485;hadoop2:8485/cluster2</value>

    </property>

    On cluster1 nodes the value is qjournal://hadoop0:8485;hadoop1:8485;hadoop2:8485/cluster1

    On cluster2 nodes the value is qjournal://hadoop0:8485;hadoop1:8485;hadoop2:8485/cluster2

    The essence here: the JournalNode ensemble maintains the shared edits directory for each nameservice's pair of NameNodes. Understand it rather than copying blindly.

    These two places are the only per-node changes needed.
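    As a convenience, the two swaps can be scripted on hadoop2 and hadoop3; a rough sketch (it rewrites the active shared.edits.dir value in place, so the commented-out block can be left alone), followed by a sanity check with getconf:

    sed -i 's#hdfs://cluster1#hdfs://cluster2#' /usr/local/hadoop/etc/hadoop/core-site.xml
    sed -i 's#8485/cluster1#8485/cluster2#' /usr/local/hadoop/etc/hadoop/hdfs-site.xml
    bin/hdfs getconf -confKey fs.defaultFS    # should print hdfs://cluster2 on these nodes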

    Test Startup

    1. Start ZooKeeper

    On hadoop0, hadoop1, and hadoop2, run zkServer.sh start

    2. Start the JournalNodes

    On hadoop0, hadoop1, and hadoop2, run sbin/hadoop-daemon.sh start journalnode

    3. Format the ZooKeeper failover state

    On hadoop0 and hadoop2, run bin/hdfs zkfc -formatZK

    (the ZooKeeper ensemble is what carries out the automatic HA failover)

    For cluster1:

    4.1 Format and start the NameNode on hadoop0:

    bin/hdfs namenode -format

    sbin/hadoop-daemon.sh start namenode

    5.1 Bootstrap and start the standby NameNode on hadoop1:

    bin/hdfs namenode -bootstrapStandby

    sbin/hadoop-daemon.sh start namenode

    6.1 Start zkfc on hadoop0 and hadoop1:

    sbin/hadoop-daemon.sh start zkfc

    Afterwards, one of hadoop0 and hadoop1 becomes active.

    For cluster2:

    4.2 Format and start the NameNode on hadoop2:

    bin/hdfs namenode -format

    sbin/hadoop-daemon.sh start namenode

    5.2 Bootstrap and start the standby NameNode on hadoop3:

    bin/hdfs namenode -bootstrapStandby

    sbin/hadoop-daemon.sh start namenode

    6.2 Start zkfc on hadoop2 and hadoop3:

    sbin/hadoop-daemon.sh start zkfc

    Afterwards, one of hadoop2 and hadoop3 becomes active.

    7. Start the DataNodes; on hadoop0 run:

    sbin/hadoop-daemons.sh start datanode

    For the running state of the cluster, see the screenshots at the start of this post. At this point the HDFS side of the Hadoop 2 cluster is fully operational.
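    To exercise the automatic failover by hand — a sketch assuming the Hadoop 2.2 haadmin syntax, where -ns selects the nameservice:

    bin/hdfs haadmin -ns cluster1 -getServiceState hadoop0
    bin/hdfs haadmin -ns cluster1 -getServiceState hadoop1
    sbin/hadoop-daemon.sh stop namenode    # run on whichever node reported "active"
    bin/hdfs haadmin -ns cluster1 -getServiceState hadoop1    # the former standby should now report "active"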

    Next up is the YARN configuration; once it is in place, MapReduce jobs can run on YARN.

    Configure YARN

    The following configuration files are, again, under /usr/local/hadoop/etc/hadoop.

    mapred-site.xml

    <property>

    <name>mapreduce.framework.name</name>

    <value>yarn</value>

    </property>
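    Note that the stock Hadoop 2.2.0 tarball ships this file only as a template; if mapred-site.xml does not exist yet, create it first:

    cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml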

    yarn-site.xml

    <property>

    <name>yarn.resourcemanager.hostname</name>

    <value>hadoop0</value>

    </property>

    <property>

    <name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

    </property>

    Test YARN

    Start YARN; on hadoop0, run:

    sbin/start-yarn.sh
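    The wordcount run below reads its input from /testFile, so put some data there first; a minimal sketch using a file that exists on any Linux node:

    bin/hdfs dfs -put /etc/profile /testFile

    bin/hdfs dfs -ls /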

    Run the test job:

    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /testFile /out
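    The counts can also be inspected straight from the shell; the part file name below assumes the default single reducer:

    bin/hdfs dfs -cat /out/part-r-00000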

    For the test results, see the screenshots at the start of this post.
