Setting Up a Hadoop 2.7.3 Cluster on CentOS 7

Source: Internet. Editor: 程序博客网. Time: 2024/05/16 02:27

Prepare 6 virtual machines

Account: root, password: 123456. Configure passwordless SSH login between the machines.
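
The passwordless login mentioned above could be set up roughly as follows. This is a minimal sketch, assuming the six hostnames from the hosts mapping in the next step already resolve. `HOSTS`, `run`, and `DRY_RUN` are illustrative names that do not come from the original; with `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: distribute root's public key to every node for passwordless SSH.
# HOSTS, run, and DRY_RUN are illustrative names (assumptions, not from the original).
HOSTS="itcast01 itcast02 itcast03 itcast04 itcast05 itcast06"

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_passwordless_ssh() {
  local h
  # Generate a key pair without a passphrase if none exists yet.
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  fi
  # Install the public key on each node (prompts for the password once per host).
  for h in $HOSTS; do
    run ssh-copy-id "root@$h"
  done
}

# Dry run: shows what would be executed.
setup_passwordless_ssh
```

Running it again with `DRY_RUN=0` executes the commands for real. The NameNode hosts in particular need key-based access to each other, since the `sshfence` method configured later relies on `/root/.ssh/id_rsa`.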

Configure the IP-to-hostname mapping
vim /etc/hosts
Add:
10.31.18.81 itcast01
10.31.18.82 itcast02
10.31.18.83 itcast03
10.31.18.84 itcast04
10.31.18.85 itcast05
10.31.18.86 itcast06
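
Instead of editing the file by hand on all six machines, the entries can be appended by a small script. The sketch below assumes the IP and hostname scheme listed above; `add_cluster_hosts` is an illustrative helper name, and the example runs against a scratch file rather than `/etc/hosts`.

```shell
#!/usr/bin/env bash
# Sketch: append the cluster's host entries to a hosts file,
# skipping hostnames that are already mapped.
# add_cluster_hosts is an illustrative helper name (an assumption).
add_cluster_hosts() {
  local hosts_file="$1"
  local i ip name
  for i in 1 2 3 4 5 6; do
    ip="10.31.18.8$i"
    name="itcast0$i"
    # Only append when no existing line already ends in or contains this hostname.
    if ! grep -qE "[[:space:]]$name([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
      echo "$ip $name" >> "$hosts_file"
    fi
  done
}

# Example: try it on a scratch file before touching /etc/hosts.
tmp_hosts="$(mktemp)"
add_cluster_hosts "$tmp_hosts"
add_cluster_hosts "$tmp_hosts"   # the second run adds nothing
cat "$tmp_hosts"
```

On a real node the call would be `add_cluster_hosts /etc/hosts`, run as root.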

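Installing the ZooKeeper cluster
I. Download from http://archive.apache.org/dist/zookeeper/
#mkdir -p /itcast
#tar -zxvf zookeeper-3.4.5.tar.gz -C /itcast

#cd /itcast/zookeeper-3.4.5/conf
#mv zoo_sample.cfg zoo.cfg
#vim zoo.cfg
II. Edit the following settings
1. tickTime: client-server heartbeat interval
The interval, in milliseconds, at which heartbeats are exchanged between ZooKeeper servers and between clients and servers.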
tickTime=2000

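2. initLimit: leader-follower initial connection limit
The maximum number of heartbeats (multiples of tickTime) a follower (F) is allowed while it first connects to the leader (L).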
initLimit=10

3. syncLimit: leader-follower sync limit
The maximum number of heartbeats (multiples of tickTime) allowed between a request and its reply between a follower (F) and the leader (L).
syncLimit=5

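4. dataDir: data directory
The directory where ZooKeeper stores its data. The sample configuration points it at a directory under /tmp, where the data can be lost after a reboot, so use a permanent location.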
dataDir=/itcast/zookeeper-3.4.5/data

5. clientPort: client connection port
The port on which clients connect to the ZooKeeper server.
clientPort=2181

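6. Server list: cluster membership (server ID, server address, leader-follower communication port, election port)
This entry has its own format, where N is the server ID, YYY the server address, A the port followers use to talk to the leader, and B the leader election port:
server.N=YYY:A:B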

server.1=itcast04:2888:3888
server.2=itcast05:2888:3888
server.3=itcast06:2888:3888

itcast04, itcast05, and itcast06 can each be replaced with an IP address here; if the cluster will not come up, try switching to IP addresses.
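
Put together, the settings above form the complete zoo.cfg. The sketch below writes that file to a scratch location and works out the two timeouts the limits imply, since initLimit and syncLimit are counted in ticks rather than milliseconds. `write_zoo_cfg` and `cfg_value` are illustrative helper names, not part of ZooKeeper.

```shell
#!/usr/bin/env bash
# Sketch: write the zoo.cfg described above and derive the timeouts it implies.
# write_zoo_cfg and cfg_value are illustrative helper names (assumptions).
write_zoo_cfg() {
  cat > "$1" <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/itcast/zookeeper-3.4.5/data
clientPort=2181
server.1=itcast04:2888:3888
server.2=itcast05:2888:3888
server.3=itcast06:2888:3888
EOF
}

# Print the value of key $2 from the key=value file $1.
cfg_value() {
  sed -n "s/^$2=//p" "$1"
}

cfg="$(mktemp)"
write_zoo_cfg "$cfg"

tick=$(cfg_value "$cfg" tickTime)
init_limit=$(cfg_value "$cfg" initLimit)
sync_limit=$(cfg_value "$cfg" syncLimit)

# Both limits are counted in ticks, so multiply by tickTime (milliseconds).
echo "initial connect timeout: $((tick * init_limit)) ms"   # 2000 * 10
echo "sync timeout:            $((tick * sync_limit)) ms"   # 2000 * 5
```

With the values from this guide, a follower gets 20 seconds to connect to the leader initially and 10 seconds for each request and reply afterwards.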

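III. Create the myid file
#mkdir -p /itcast/zookeeper-3.4.5/data
#cd /itcast/zookeeper-3.4.5/data
#touch myid
#vim myid
The file holds a single number: the N of this host's server.N line in zoo.cfg
server.1=itcast04:2888:3888
server.2=itcast05:2888:3888
server.3=itcast06:2888:3888
For example, myid is 1 on itcast04, 2 on itcast05, and 3 on itcast06.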
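
Because the ID is already recorded in the server.N lines, it can be looked up instead of typed. A minimal sketch, assuming the server list format shown above; `myid_for_host` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Sketch: look up a host's myid from the server.N lines of zoo.cfg.
# myid_for_host is an illustrative helper name (an assumption).

# Print the N of the "server.N=<host>:..." line for host $2 in file $1.
myid_for_host() {
  sed -n "s/^server\.\([0-9][0-9]*\)=$2:.*/\1/p" "$1"
}

# Example with a scratch copy of the server list.
cfg="$(mktemp)"
printf '%s\n' \
  'server.1=itcast04:2888:3888' \
  'server.2=itcast05:2888:3888' \
  'server.3=itcast06:2888:3888' > "$cfg"

for h in itcast04 itcast05 itcast06; do
  echo "$h -> myid $(myid_for_host "$cfg" "$h")"
done

# On a real node one might run:
#   myid_for_host /itcast/zookeeper-3.4.5/conf/zoo.cfg "$(hostname)" \
#     > /itcast/zookeeper-3.4.5/data/myid
```

The lookup only works when the name in zoo.cfg matches what `hostname` prints; if IP addresses are used in the server list, pass the IP instead.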

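IV. Copy the data to itcast05 and itcast06
#scp -r /itcast/ root@itcast05:/
#scp -r /itcast/ root@itcast06:/
The copy carries itcast04's myid with it, so change myid to 2 on itcast05 and to 3 on itcast06 afterwards.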

V. Start ZooKeeper on all three hosts
#cd /itcast/zookeeper-3.4.5/bin
#./zkServer.sh start
Check the status
#./zkServer.sh status
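
A healthy three-node ensemble should report one leader and two followers. The sketch below pulls the role out of the status output and checks that rule. It assumes the status output contains a `Mode: ...` line; the sample text in the example is illustrative, not a captured log, and `zk_mode` and `has_one_leader` are illustrative helper names.

```shell
#!/usr/bin/env bash
# Sketch: read the role reported by `zkServer.sh status` and sanity-check the ensemble.
# zk_mode and has_one_leader are illustrative helper names (assumptions).

# Extract the role from status output read on stdin.
zk_mode() {
  sed -n 's/^Mode: //p'
}

# Succeed when the given list of roles contains exactly one leader.
has_one_leader() {
  local n=0 m
  for m in "$@"; do
    if [ "$m" = "leader" ]; then
      n=$((n + 1))
    fi
  done
  [ "$n" -eq 1 ]
}

# Illustrative sample of status output (an assumption, not a real log).
sample='Using config: /itcast/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower'
printf '%s\n' "$sample" | zk_mode

if has_one_leader follower leader follower; then
  echo "ensemble has exactly one leader"
fi

# On a real node one might run:
#   /itcast/zookeeper-3.4.5/bin/zkServer.sh status 2>&1 | zk_mode
```

If a node reports no mode at all, check that the other two servers are running: a single node cannot form a quorum on its own.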

Installing the Hadoop cluster
Process layout across the hosts
Hostname  IP           Installed software      Running processes
itcast01  10.31.18.81  jdk, hadoop             NameNode, DFSZKFailoverController
itcast02  10.31.18.82  jdk, hadoop             NameNode, DFSZKFailoverController
itcast03  10.31.18.83  jdk, hadoop             ResourceManager
itcast04  10.31.18.84  jdk, hadoop, Zookeeper  DataNode, NodeManager, JournalNode, QuorumPeerMain
itcast05  10.31.18.85  jdk, hadoop, Zookeeper  DataNode, NodeManager, JournalNode, QuorumPeerMain
itcast06  10.31.18.86  jdk, hadoop, Zookeeper  DataNode, NodeManager, JournalNode, QuorumPeerMain

I. Install Hadoop 2.7.3
#mkdir

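II. Edit the configuration files (in each XML file, the properties go inside the <configuration> element)
On host itcast01
1. hadoop-env.sh
Set JAVA_HOME to an absolute path
export JAVA_HOME=/usr/latest/jdk1.8.0_121
2. core-site.xml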

<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1</value>
</property>

<property>
<name>hadoop.tmp.dir</name>
<value>/itcast/hadoop-2.7.3/tmp</value>
</property>

<property>
<name>ha.zookeeper.quorum</name>
<value>itcast04:2181,itcast05:2181,itcast06:2181</value>
</property>
3. hdfs-site.xml

<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>

<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>

<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>itcast01:9000</value>
</property>

<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>itcast01:50070</value>
</property>

<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>itcast02:9000</value>
</property>

<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>itcast02:50070</value>
</property>

<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://itcast04:8485;itcast05:8485;itcast06:8485/mycluster</value>
</property>

<property>
<name>dfs.journalnode.edits.dir</name>
<value>/itcast/hadoop-2.7.3/journal</value>
</property>

<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>

<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>

<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>

<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
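
The nameservice ID has to agree across files: fs.defaultFS in core-site.xml points at `ns1`, and every `ns1` key in hdfs-site.xml hangs off dfs.nameservices. The sketch below reads a property from a site file and checks that agreement. It assumes one tag per line, as in the listings above, and does not handle multi-line values; `get_prop` is an illustrative helper name, and the example uses scratch files.

```shell
#!/usr/bin/env bash
# Sketch: cross-check the nameservice ID between core-site.xml and hdfs-site.xml.
# get_prop is an illustrative helper name (an assumption).

# Print the <value> that follows <name>$2</name> in Hadoop site file $1.
# Assumes one tag per line and a single-line value.
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      gsub(/.*<value>|<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Scratch files holding just the two properties being compared.
core="$(mktemp)"
hdfs="$(mktemp)"
cat > "$core" <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1</value>
</property>
</configuration>
EOF
cat > "$hdfs" <<'EOF'
<configuration>
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
</configuration>
EOF

default_fs=$(get_prop "$core" fs.defaultFS)
nameservice=$(get_prop "$hdfs" dfs.nameservices)

# Strip the hdfs:// scheme and compare with the declared nameservice.
if [ "${default_fs#hdfs://}" = "$nameservice" ]; then
  echo "OK: fs.defaultFS points at nameservice $nameservice"
else
  echo "MISMATCH: fs.defaultFS=$default_fs, dfs.nameservices=$nameservice"
fi
```

On a configured node, `hdfs getconf -confKey dfs.nameservices` reports the value Hadoop itself resolves, which is the more authoritative check once the installation is in place.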

4. mapred-site.xml (copy it from mapred-site.xml.template if the file does not exist)

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

5. yarn-site.xml

<property>
<name>yarn.resourcemanager.hostname</name>
<value>itcast03</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

6. slaves: add
itcast04
itcast05
itcast06

III. Start the cluster (ZooKeeper must already be running)
1. Copy the configured Hadoop to the other nodes. To speed up the copy, you can first delete the share/doc folder under /itcast/hadoop-2.7.3.
#scp -r /itcast/ root@itcast02:/
#scp -r /itcast/ root@itcast03:/
#scp -r /itcast/hadoop-2.7.3 root@itcast04:/itcast
#scp -r /itcast/hadoop-2.7.3 root@itcast05:/itcast
#scp -r /itcast/hadoop-2.7.3 root@itcast06:/itcast

2. Configure the environment variables
#vim /etc/profile
Add JAVA_HOME and HADOOP_HOME
#scp /etc/profile itcast02:/etc
#scp /etc/profile itcast03:/etc
#scp /etc/profile itcast04:/etc
#scp /etc/profile itcast05:/etc
#scp /etc/profile itcast06:/etc
Apply the changes (on each host)
#source /etc/profile
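
The lines to add to /etc/profile are not spelled out above. A minimal sketch, assuming the JDK path from hadoop-env.sh and the Hadoop install path used throughout this guide; the `PATH` line and the helper name `append_hadoop_env` are assumptions, and the example writes to a scratch file.

```shell
#!/usr/bin/env bash
# Sketch: append the JAVA_HOME / HADOOP_HOME exports to a profile file once.
# append_hadoop_env is an illustrative helper name (an assumption).
append_hadoop_env() {
  local profile="$1"
  # Do nothing if HADOOP_HOME is already exported in this file.
  if grep -q '^export HADOOP_HOME=' "$profile" 2>/dev/null; then
    return 0
  fi
  cat >> "$profile" <<'EOF'
export JAVA_HOME=/usr/latest/jdk1.8.0_121
export HADOOP_HOME=/itcast/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
}

# Example on a scratch file; on a real node the argument would be /etc/profile.
profile="$(mktemp)"
append_hadoop_env "$profile"
append_hadoop_env "$profile"   # the second call changes nothing
cat "$profile"

# The five copies above can also be written as a loop (printed here, not executed).
for h in itcast02 itcast03 itcast04 itcast05 itcast06; do
  echo "scp /etc/profile $h:/etc"
done
```

Having `$HADOOP_HOME/bin` and `$HADOOP_HOME/sbin` on the `PATH` is what lets later steps call `hdfs` without a full path.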

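The JournalNodes must be started by hand before formatting; this is not needed in normal operation.
3. Start the JournalNode on itcast04, itcast05, and itcast06
#cd /itcast/hadoop-2.7.3/sbin
#./hadoop-daemon.sh start journalnode

4. Format the NameNode on itcast01, which creates the tmp directory, then copy that tmp directory to itcast02
#hdfs namenode -format
#scp -r /itcast/hadoop-2.7.3/tmp/ itcast02:/itcast/hadoop-2.7.3/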

5. Format the HA state in ZooKeeper (running it on itcast01 is enough)
#hdfs zkfc -formatZK
After formatting, a hadoop-ha znode appears in ZooKeeper.

6. Start HDFS (run on itcast01, from /itcast/hadoop-2.7.3)
#sbin/start-dfs.sh

7. Start YARN (run on itcast03, from /itcast/hadoop-2.7.3)
#sbin/start-yarn.sh
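
Once everything is started, each host should be running the daemons listed in the process layout table. The sketch below compares `jps` output against that table. `expected_daemons` and `missing_daemons` are illustrative helper names, and the sample input in the example is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report which expected Hadoop/ZooKeeper daemons are missing on a host.
# expected_daemons and missing_daemons are illustrative helper names (assumptions).

# Print the daemons the process layout table expects on host $1.
expected_daemons() {
  case "$1" in
    itcast01|itcast02)
      echo "NameNode DFSZKFailoverController" ;;
    itcast03)
      echo "ResourceManager" ;;
    itcast04|itcast05|itcast06)
      echo "DataNode NodeManager JournalNode QuorumPeerMain" ;;
    *)
      return 1 ;;
  esac
}

# Read `jps` output on stdin and print each expected daemon that is not running.
missing_daemons() {
  local host="$1" jps_out d
  jps_out="$(cat)"
  for d in $(expected_daemons "$host"); do
    # jps prints "<pid> <MainClass>", so match the class name exactly.
    if ! printf '%s\n' "$jps_out" | grep -qE "^[0-9]+ $d\$"; then
      echo "$d"
    fi
  done
}

# Example with made-up jps output for itcast04 (illustrative, not a real log).
printf '%s\n' '2101 DataNode' '2102 JournalNode' '2150 Jps' | missing_daemons itcast04

# On a real node one might run:
#   jps | missing_daemons "$(hostname)"
```

An empty result means every expected daemon was found; any name printed is a process to look for in the logs under /itcast/hadoop-2.7.3/logs.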

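Troubleshooting:
1. File uploads fail: first check whether the firewall on the DataNode hosts is turned off; it is best to turn it off on all hosts.
2. A DataNode will not start: each format gives the NameNode a new clusterID, which no longer matches the one the DataNodes stored earlier. If a configuration error led you to format more than once, first delete tmp/dfs/data/current/VERSION on the DataNode hosts (itcast04, itcast05, itcast06), start the JournalNodes, then run the format again on itcast01.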
[root@itcast05 sbin]#rm -rf /itcast/hadoop-2.7.3/tmp/dfs/data/current/VERSION
[root@itcast05 sbin]#cd /itcast/hadoop-2.7.3/sbin
[root@itcast05 sbin]#./hadoop-daemon.sh start journalnode
[root@itcast01 sbin]#hdfs namenode -format
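
Before deleting anything, it can help to confirm that a clusterID mismatch is really the cause. A minimal sketch, assuming the VERSION files contain a `clusterID=` line; `cluster_id` and `same_cluster` are illustrative helper names, and the IDs in the example are made up.

```shell
#!/usr/bin/env bash
# Sketch: compare the clusterID recorded by a NameNode and a DataNode.
# cluster_id and same_cluster are illustrative helper names (assumptions).

# Print the clusterID recorded in the VERSION file $1.
cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# Succeed when both VERSION files carry the same, non-empty clusterID.
same_cluster() {
  local a b
  a="$(cluster_id "$1")"
  b="$(cluster_id "$2")"
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example with scratch files and made-up IDs.
nn_version="$(mktemp)"
dn_version="$(mktemp)"
echo 'clusterID=CID-example-new' > "$nn_version"
echo 'clusterID=CID-example-old' > "$dn_version"

if same_cluster "$nn_version" "$dn_version"; then
  echo "clusterIDs match"
else
  echo "clusterID mismatch: NameNode=$(cluster_id "$nn_version") DataNode=$(cluster_id "$dn_version")"
fi

# With this guide's paths, the files to compare would be:
#   itcast01: /itcast/hadoop-2.7.3/tmp/dfs/name/current/VERSION
#   itcast04: /itcast/hadoop-2.7.3/tmp/dfs/data/current/VERSION
```

If the IDs already match, the DataNode failure has another cause and the DataNode log is the next place to look.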

0 0