Hadoop Environment Setup


Preface

I have recently found some time to tinker with Hadoop, so I plan to study it systematically and keep notes, starting with environment setup. I am using Hadoop 2.4.0 on Ubuntu 13.04.

Virtual Machine Installation

I only have two computers, which is not nearly enough, so virtual machines have to fill the gap. Some people use ESXi, which can reportedly host many virtual machines on one physical box with much better performance, but I don't have the means to use it, so I installed several Ubuntu VMs in VMware by hand. The VM installation itself is straightforward and outside the scope of this article.

The architecture I use has one NameNode, one standby NameNode (labelled snn in the HA configuration below), one ResourceManager, and three DataNodes; all six VMs run on a single Mac. In fact, it is enough to install and configure one VM and then clone it for the rest; not realizing that cost me a lot of time...

After the VMs are installed, some basic configuration is needed. First, edit the Ubuntu package sources (/etc/apt/sources.list):
1. If the file defaults to the US mirrors, switch them to the Chinese mirrors by changing every "us" in the file to "cn" (see the sketch after this list).
2. Add a few domestic mirrors to speed up package installation. (Annoyingly, CSDN seems to strip these links, so they are not reproduced here.)
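A minimal sketch of the mirror switch, assuming the default entries use the us.archive.ubuntu.com hostname (adjust the pattern to whatever your sources.list actually contains):

# Back up the original list, then point the US mirror entries at the Chinese mirror
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
sudo sed -i 's/us\.archive\.ubuntu\.com/cn.archive.ubuntu.com/g' /etc/apt/sources.list
sudo apt-get update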

Then change each host's name so the machines can be told apart, via /etc/hostname and /etc/hosts: the hostname file holds the current host name, while the hosts file holds the IP-to-hostname mappings, so change both together. My six VMs use the following mappings:

172.16.112.133     hadoop-lion
172.16.112.136     hadoop-tiger
172.16.112.129     hadoop-eagle
172.16.112.134     hadoop-rabbit
172.16.112.132     hadoop-snake
172.16.112.135     hadoop-cat
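A minimal sketch of applying the name change on one machine (hadoop-lion is just the example; use the matching name on each VM, then reboot or re-login so every service picks it up):

echo "hadoop-lion" | sudo tee /etc/hostname    # persist the new host name
sudo hostname hadoop-lion                      # apply it to the running system
# append the six mappings above to /etc/hosts on every VM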

In addition, this Ubuntu installation seems to come without a lot of common software, so install a few packages by hand now to avoid having to install them manually on every VM later.

sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install vim
sudo apt-get install rsync

SSH Setup

To make management easier, the machines need to be able to SSH into one another. First install the OpenSSH server:

sudo apt-get install openssh-server
Then generate a key pair on every machine and copy the public key into the .ssh/authorized_keys file of every other machine as well as of the machine itself (a distribution sketch follows the commands):
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
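A minimal sketch of pushing the public key to the rest of the cluster, assuming the hadoop account exists on every host and the host names above resolve; ssh-copy-id appends the local public key to the remote authorized_keys:

# run on each machine after ssh-keygen; prompts for the hadoop password once per host
for host in hadoop-lion hadoop-tiger hadoop-eagle hadoop-rabbit hadoop-snake hadoop-cat; do
    ssh-copy-id hadoop@"$host"
done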

Creating a Dedicated hadoop Account and Group

To keep things cleanly separated, add a dedicated group and user account for Hadoop:

sudo mkdir /home/hadoop
sudo groupadd hadoop
sudo useradd -s /bin/bash -d /home/hadoop -g hadoop -G hadoop,sudo hadoop
sudo passwd hadoop    # set the hadoop account's password, here "hadoop"
sudo chown hadoop /home/hadoop
sudo chgrp hadoop /home/hadoop

JDK Installation

Download JDK 8 from the Oracle website:

http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Unpack the archive:
$ sudo mkdir /usr/lib/jvm
$ sudo tar zxvf jdk-8u5-linux-x64.gz -C /usr/lib/jvm
$ cd /usr/lib/jvm
$ sudo ln -s jdk1.8.0_05 java
Add the following environment variables to ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib/tools.jar:${JAVA_HOME}/lib/dt.jar
export PATH=${JAVA_HOME}/bin:${PATH}

Configure the default JDK version:

sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java/bin/java 300
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/java/bin/javac 300
sudo update-alternatives --install /usr/bin/jar jar /usr/lib/jvm/java/bin/jar 300
sudo update-alternatives --install /usr/bin/javah javah /usr/lib/jvm/java/bin/javah 300
sudo update-alternatives --install /usr/bin/javap javap /usr/lib/jvm/java/bin/javap 300
sudo update-alternatives --config java
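A quick check that the new JDK is actually the one being picked up (the exact version strings depend on the JDK build installed):

source ~/.bashrc
java -version      # should report the 1.8.0_05 JDK unpacked above
javac -version
echo $JAVA_HOME    # should print /usr/lib/jvm/java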

ZooKeeper Installation

Unpack zookeeper-3.4.6.tar.gz into /opt and create a symlink from /opt/zookeeper to /opt/zookeeper-3.4.6. Rename /opt/zookeeper/conf/zoo_sample.cfg to /opt/zookeeper/conf/zoo.cfg and modify/add the following settings:

dataDir=/home/hadoop/zookeeper
server.1=hadoop-rabbit:2888:3888
server.2=hadoop-cat:2888:3888
server.3=hadoop-snake:2888:3888
In the /home/hadoop/zookeeper/ directory, add a myid file whose content is the id configured for that machine: 1, 2 or 3 (see the sketch below).
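A minimal sketch of writing the myid files, with the ids matching the server.N entries in zoo.cfg above:

mkdir -p /home/hadoop/zookeeper && echo 1 > /home/hadoop/zookeeper/myid    # on hadoop-rabbit
mkdir -p /home/hadoop/zookeeper && echo 2 > /home/hadoop/zookeeper/myid    # on hadoop-cat
mkdir -p /home/hadoop/zookeeper && echo 3 > /home/hadoop/zookeeper/myid    # on hadoop-snake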

Start ZooKeeper and check its status:
./zkServer.sh start
./zkServer.sh status
jps
#Result:
Mode: follower/leader
1170 QuorumPeerMain


Hadoop Installation

As with the JDK, installing Hadoop only requires unpacking the archive to the target location and setting the environment variables; here Hadoop is installed under /opt/.
sudo tar zxvf hadoop-2.4.0.tar.gz -C /opt/
cd /opt && sudo ln -s hadoop-2.4.0 hadoop
export HADOOP_HOME=/opt/hadoop
export PATH=${HADOOP_HOME}/bin:${PATH}
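A quick sanity check after adding the two variables to ~/.bashrc (reload the shell profile first):

source ~/.bashrc
hadoop version    # should report Hadoop 2.4.0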

Cluster Architecture

Host            Installed software        Running processes
hadoop-lion     jdk, hadoop               NameNode, DFSZKFailoverController
hadoop-tiger    jdk, hadoop               NameNode, DFSZKFailoverController
hadoop-eagle    jdk, hadoop               ResourceManager
hadoop-rabbit   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
hadoop-snake    jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
hadoop-cat      jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain

Configuration Changes

In Hadoop 2.4.0 all configuration lives under ${HADOOP_HOME}/etc/hadoop. We generally need to edit hadoop-env.sh, slaves, core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and so on.

#hadoop-env.sh changes
export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_WORK=/home/hadoop
export HADOOP_LOG_DIR=${HADOOP_WORK}/logs
export HADOOP_PID_DIR=${HADOOP_WORK}/pid
#yarn-env.sh changes
export YARN_LOG_DIR=/home/hadoop/logs/yarn
On each NameNode and ResourceManager host (hadoop-lion, hadoop-tiger, hadoop-eagle), configure its slaves by editing ${HADOOP_HOME}/etc/hadoop/slaves:
hadoop-rabbit
hadoop-snake
hadoop-cat
core-site.xml configuration (all nodes):
  <property>
    <!-- must match dfs.nameservices in hdfs-site.xml -->
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop-rabbit:2181,hadoop-cat:2181,hadoop-snake:2181</value>
  </property>
hdfs-site.xml configuration:
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn,snn</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn</name>
    <value>hadoop-lion:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn</name>
    <value>hadoop-lion:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.snn</name>
    <value>hadoop-tiger:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.snn</name>
    <value>hadoop-tiger:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop-rabbit:8485;hadoop-cat:8485;hadoop-snake:8485/journal</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- the key suffix must be the nameservice name, mycluster here -->
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
</configuration>

mapred-site.xml configuration:

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

yarn-site.xml configuration:
  <property> <!-- ResourceManager/NodeManager nodes -->
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-eagle</value>
  </property>
  <property> <!-- all nodes -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

Starting HDFS and YARN

#Start the JournalNodes on all DataNodes; run from the NameNode (hadoop-lion), in ${HADOOP_HOME}/sbin/
./hadoop-daemons.sh start journalnode
#Format the NameNode; run on the NameNode host (hadoop-lion)
hadoop namenode -format
#Copy the name directory formatted on hadoop-lion to the NameNode on hadoop-tiger
scp -r /home/hadoop/name/* hadoop-tiger:/home/hadoop/name/
#Format the HA state in ZooKeeper:
hdfs zkfc -formatZK
#Verify on hadoop-rabbit with ${ZOOKEEPER_HOME}/bin/zkCli.sh:
ls /
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper, hadoop-ha]
#Start HDFS; run from the NameNode (hadoop-lion), in ${HADOOP_HOME}/sbin/
./start-dfs.sh
#Start YARN; run from the ResourceManager node (hadoop-eagle)
${HADOOP_HOME}/sbin/start-yarn.sh

Startup Verification

Run the jps command on each server; the expected results are:

#hadoop-lion/hadoop-tiger
NameNode
DFSZKFailoverController
#hadoop-eagle
ResourceManager
#hadoop-rabbit/hadoop-cat/hadoop-snake
NodeManager
DataNode
QuorumPeerMain
JournalNode

Add a file to the cluster:
#${HADOOP_HOME}/bin
./hdfs dfs -put <your-file> hdfs://hadoop-lion:9001/<dest-path>

List files in the cluster:
#${HADOOP_HOME}/bin
./hdfs dfs -ls hdfs://hadoop-lion:9001/<dest-path>
#Result:
Found 1 items
-rw-r--r--   3 hadoop supergroup  138943699 2014-06-24 12:04 hdfs://hadoop-lion:9001/hadoop-2.4.0.tar.gz

Delete a file from the cluster:
#${HADOOP_HOME}/bin
./hdfs dfs -rm hdfs://hadoop-lion:9001/<dest-path>

List all running NodeManagers:
#${HADOOP_HOME}/bin
./yarn node -list
#Result:
Total Nodes:3
          Node-Id    Node-State    Node-Http-Address    Number-of-Running-Containers
 hadoop-cat:35488       RUNNING      hadoop-cat:8042                               0
hadoop-snake:41088      RUNNING    hadoop-snake:8042                               0
hadoop-rabbit:60470     RUNNING   hadoop-rabbit:8042                               0


Check the status of a specific NodeManager:

#${HADOOP_HOME}/bin
./yarn node -status hadoop-rabbit:60470
#Result:
Node Report :
        Node-Id : hadoop-rabbit:60470
        Rack : /default-rack
        Node-State : RUNNING
        Node-Http-Address : hadoop-rabbit:8042
        Last-Health-Update : Tue 24/Jun/14 03:25:51:193PDT
        Health-Report :
        Containers : 0
        Memory-Used : 0MB
        Memory-Capacity : 8192MB
        CPU-Used : 0 vcores
        CPU-Capacity : 8 vcores


Troubleshooting

Problem 1 - the bundled Hadoop native libraries are compiled for 32-bit systems; on a 64-bit machine you will see messages like:
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /opt/hadoop-2.4.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/06/24 15:02:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Solution: rebuild Hadoop on a 64-bit machine and replace the bundled native libraries with the newly built ones (a build sketch follows).
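A minimal sketch of the rebuild, assuming the Hadoop 2.4.0 source tarball and the usual build prerequisites (Maven, protobuf 2.5, cmake, zlib/openssl dev headers) are available; the file names and output path are illustrative:

tar zxvf hadoop-2.4.0-src.tar.gz && cd hadoop-2.4.0-src
mvn package -Pdist,native -DskipTests -Dtar                                  # builds the 64-bit native libraries
cp -r hadoop-dist/target/hadoop-2.4.0/lib/native/* /opt/hadoop/lib/native/   # replace the shipped 32-bit ones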

Problem 2 - HDFS's default port is 8020; here it has been changed to 9001, so if the port is not specified you get an error like:
ls: Call From hadoop-rabbit/172.16.112.134 to hadoop-lion:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
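The commands earlier in this article therefore always spell out the port in the URI, for example:

./hdfs dfs -ls hdfs://hadoop-lion:9001/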

Problem 3 - the DataNodes cannot connect to the NameNode at startup:
2014-05-04 10:43:33,970 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-lion/172.16.112.133:9000
2014-05-04 10:43:55,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-lion/172.16.112.133:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Solution: the NameNode's /etc/hosts file contained a "127.0.0.1 hadoop-lion" mapping, so the NameNode bound its port on 127.0.0.1 when it started (netstat: tcp 0 0 127.0.0.1:37072 127.0.0.1:9000 TIME_WAIT); remove that mapping from /etc/hosts (see the check below). (Reference: http://blog.csdn.net/renfengjun/article/details/25320043)
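A minimal sketch of the check and fix on the NameNode host (the port and host name are just the values used in this setup; the exact /etc/hosts line may differ):

sudo netstat -antp | grep 9000                        # confirm which address the NameNode RPC port is bound to
sudo sed -i '/127\.0\.0\.1.*hadoop-lion/d' /etc/hosts # drop the loopback mapping, then restart HDFS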


Problem 4 - when running a MapReduce job, a clock-skew YarnException occurs, as shown below:

14/06/24 17:47:56 INFO mapreduce.Job: Job job_1403655403913_0001 failed with state FAILED due to: Application application_1403655403913_0001 failed 2 times due to Error launching appattempt_1403655403913_0001_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1403656180363 found 1403656025286
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
This is caused by the machines having different time and time-zone settings; all machines must agree on the clock: 1. make sure they use the same time zone; 2. periodically synchronize the time with a network time server.
1. Set the content of /etc/timezone to: Asia/Shanghai
2. Replace /etc/localtime with /usr/share/zoneinfo/Asia/Shanghai:
sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
3. Set the TZ variable in .bashrc:
TZ='Asia/Shanghai'; export TZ
4. Create a crontab entry that syncs the clock against a time server at 23:00 every day (sudo crontab -e):
0 23 * * * ntpdate time.asia.apple.com >> /var/log/ntpdate.log
To be able to check the crontab log in case the command silently fails, enable cron logging:
#uncomment the "cron.* /var/log/cron.log" line
sudo vi /etc/rsyslog.d/50-default.conf

Problem 5 - the following error appears when running a MapReduce program:
14/06/28 11:28:33 INFO mapreduce.Job: Task Id : attempt_1403925887413_0001_m_000000_1, Status : FAILED
Container launch failed for container_1403925887413_0001_01_000003 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
Solution: the "yarn.nodemanager.aux-services" property is missing from yarn-site.xml; add it to that configuration file:
  <property> <!-- all nodes -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>


Problem 6 - the following error appears when running the format command:

14/06/30 00:13:33 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:710)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:654)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:838)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1256)
Solution: configure dfs.ha.namenodes.<mycluster>; HA is only considered enabled once the list of NameNodes is defined for the nameservice.


Example Program

echo "This is just a test for hadoop! Hello world." > /tmp/words.txthdfs dfs -mkdir -p hdfs://hadoop-lion:9000/sample/wordcounthdfs dfs -put /tmp/words.txt hdfs://hadoop-lion:9000/sample/wordcount/words.txthadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount hdfs://hadoop-lion:9000/sample/wordcount/words.txt hdfs://hadoop-lion:9000/sample/wordcount/output
View the output:
#hdfs dfs -cat hdfs://hadoop-lion:9001/sample/wordcount/output/part-r-00000
Hello   1
This    1
World.  1
a       1
for     1
hadoop! 1
is      1
just    1
test    1






