Hadoop 2.6 Installation


Hadoop 2.6.0 Cluster Setup

===============

Set up a Hadoop 2.6.0 cluster on multiple servers running 64-bit CentOS/RHEL, building a 64-bit version of Hadoop 2.6.0 from the source distribution.

===============

Notes:

1. Environment: a laptop running two CentOS 6.6 virtual machines, each with 1 GB of RAM and a 20 GB disk. The XML configuration files were modified substantially; see the installation and configuration guide: http://blog.csdn.net/licongcong_0224/article/details/12972889

2. Run the commands as the hadoop user. Do not take the shortcut of running them as root, or errors will occur.

3. Distinguish between the root and hadoop users (prompts # and $ respectively).

4. /etc/profile is system-wide and should be edited by root; ~/.bash_profile lives in each user's home directory (e.g. /home/hadoop, /home/hongyuqin).

5. If the configuration is wrong and has to be redone, it is recommended to delete ~/hdfs and recreate it with mkdir, delete ~/hadoop-2.6.0/logs and recreate it, and then run hadoop namenode -format again (a sketch of this reset appears after these notes).

6. A wordcount test example has been added.
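
As a reference, a minimal sketch of the reset described in note 5. It assumes the directory layout used later in this guide (~/hdfs with name, data, checkpoint and tmp subdirectories, logs under ~/hadoop-2.6.0), and it destroys any existing HDFS data:

$ rm -rf ~/hdfs ~/hadoop-2.6.0/logs
$ mkdir -p ~/hdfs/name ~/hdfs/data ~/hdfs/checkpoint ~/hdfs/tmp ~/hadoop-2.6.0/logs
$ hadoop namenode -format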

===============

Step 1: Create a user


Add a hadoop user on CentOS, set its password, and add it to /etc/sudoers (this requires chmod u+w first).

# useradd hadoop
# passwd hadoop

If you want to make it a sudo user:

# chmod u+w /etc/sudoers
# vi /etc/sudoers    (add: xxx ALL=(ALL) ALL)
# chmod u-w /etc/sudoers

Addendum: change the hostname

# vim /etc/hosts    (back the file up, clear its contents, then add:)
192.168.136.131 master
192.168.136.130 slave
# vim /etc/selinux/config    (set SELINUX=disabled)
# setenforce 0
# vim /etc/sysconfig/network    (on the slave, set HOSTNAME=slave)

Step 2: Install Java


Java was already installed on the original Linux system; after installing the new version, select it as the default.

Download jdk-7u71-linux-x64.rpm

$ sudo rpm -ivh jdk-7u71-linux-x64.rpm

Edit the profile with sudo vim /etc/profile and add:

export JAVA_HOME=/usr/java/jdk1.7.0_71
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

Register the new JDK as an alternative:

$ sudo update-alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_71/bin/java 300
$ sudo update-alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_71/bin/javac 300

Change the default version:

$ sudo update-alternatives --config java  

A list of alternatives will appear; choose the version you just added.

Verify:

# java -version
# source /etc/profile
# echo $PATH

Step 3: Configure SSH keys


$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ chmod 700 ~/.ssh/

Configure and restart the sshd service:

# vi /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile      .ssh/authorized_keys
# service sshd restart

Check that the local machine works:

$ ssh localhost 

For each slave machine, run:

#ssh-copy-id hadoop@slave

Test the login:

 $ ssh slave

Step 4: Build Hadoop


To run 64-bit Hadoop on a 64-bit machine you have to download the source and compile it yourself, because Hadoop only ships 32-bit binaries. A build toolchain is required: Maven, Findbugs, protobuf, CMake, zlib, and so on.

# yum -y update
# yum -y install svn ncurses-devel gcc*
# yum -y install lzo-devel zlib-devel autoconf automake libtool cmake openssl-devel
# yum -y install gcc make cmake zlib zlib-devel openssl glibc-headers gcc-c++

Then check that the basics are all in place:

# rpm -ql zlib-devel
/usr/include/zconf.h
/usr/include/zlib.h
/usr/lib64/libz.so
/usr/lib64/pkgconfig/zlib.pc
/usr/share/doc/zlib-devel-1.2.3
/usr/share/doc/zlib-devel-1.2.3/README
/usr/share/doc/zlib-devel-1.2.3/algorithm.txt
/usr/share/doc/zlib-devel-1.2.3/example.c
/usr/share/doc/zlib-devel-1.2.3/minigzip.c
/usr/share/man/man3/zlib.3.gz

# rpm -ql zlib
/lib64/libz.so.1
/lib64/libz.so.1.2.3
/usr/share/doc/zlib-1.2.3
/usr/share/doc/zlib-1.2.3/ChangeLog
/usr/share/doc/zlib-1.2.3/FAQ
/usr/share/doc/zlib-1.2.3/README

# vi + /etc/profile    (add the following)
export LD_LIBRARY_PATH=/lib64:/usr/lib64
export ZLIB_INCLUDE_DIR=/lib64:/usr/lib64

1. Install Maven

wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.tar.gz

Extract it under /usr/local (so it ends up in /usr/local/apache-maven-3.2.5), for example as sketched below.
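
A minimal sketch of the extraction, assuming the tarball was downloaded to the current directory and unpacks to apache-maven-3.2.5:

$ tar xvzf apache-maven-3.2.5-bin.tar.gz
$ sudo mv apache-maven-3.2.5 /usr/local/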

# vi + /etc/profile    and add:

export MAVEN_HOME=/usr/local/apache-maven-3.2.5
export PATH=$PATH:$MAVEN_HOME/bin

Run source /etc/profile to apply the changes, then test:

$ echo $MAVEN_HOME
$ mvn -v

2. Install Ant

Download from Baidu Cloud: http://pan.baidu.com/s/1c0vjhBy

$ tar xvzf apache-ant-1.9.4-bin.tar.gz
$ sudo mv apache-ant-1.9.4 /usr/local

Then add the environment variables to /etc/profile:

export ANT_HOME=/usr/local/apache-ant-1.9.4
export PATH=$PATH:$ANT_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ANT_HOME/lib

3. Install Protocol Buffers

Download from Baidu Cloud: http://pan.baidu.com/s/1pJlZubT

$ cd protobuf-2.5.0
$ ./configure
$ sudo make
$ sudo make check
$ sudo make install

By default it installs to /usr/local/bin/protoc and /usr/local/lib/*.so. Finally, check:

protoc --version 

The output should be:

      libprotoc 2.5.0

4. Install Findbugs

$ wget ...
$ tar xvzf findbugs-3.0.0.tar.gz
$ mv findbugs-3.0.0 findbugs
$ sudo mv findbugs /usr/local/
$ vim /etc/profile

export FINDBUGS_HOME=/usr/local/findbugs
export PATH=$PATH:$FINDBUGS_HOME/bin

(This adds $FINDBUGS_HOME/bin to PATH.)

5. Build Hadoop

# cd ~
Download from the official site: http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.6.0/hadoop-2.6.0-src.tar.gz
# tar zxvf hadoop-2.6.0-src.tar.gz -C /home/hadoop/
# chown -R hadoop /home/hadoop/
# source /etc/profile

Enter the root of the Hadoop source tree:

# cd /home/hadoop/hadoop-2.6.0-src
# mvn clean package -Pdist,native -DskipTests -Dtar
or, to also build the docs while skipping the tests:
# mvn package -Pdist,native,docs -DskipTests -Dtar -Dmaven.test.skip=true

After a successful build, hadoop-dist/target/hadoop-2.6.0.tar.gz under the source directory is the file you need.
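
For example, a sketch of copying the built tarball into the hadoop user's home directory before distributing it (assuming the source tree is at /home/hadoop/hadoop-2.6.0-src as above):

# cp /home/hadoop/hadoop-2.6.0-src/hadoop-dist/target/hadoop-2.6.0.tar.gz /home/hadoop/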

Copy the built file to each slave machine:

# scp hadoop-2.6.0.tar.gz hadoop@slave:~

On both the master and slave machines:

# tar -xzvf hadoop-2.6.0.tar.gz -C /home/hadoop
# chown -R hadoop:hadoop /home/hadoop

Step 5: Configure Hadoop


First configure environment variables for the current hadoop user.

Set the default environment variables for the hadoop user (if JAVA_HOME was not configured in /etc/profile, it can also be set in .bash_profile):

$ cd ~
$ vi + .bash_profile

#Hadoop variables
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export HADOOP_INSTALL=/home/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
alias h='cd $HADOOP_HOME'
alias etc='cd $HADOOP_HOME/etc/hadoop'

$ source .bash_profile

In hadoop-env.sh and yarn-env.sh, add the following:

$ vim hadoop-env.sh
# modify JAVA_HOME
export JAVA_HOME=/usr/java/jdk1.7.0_71

.bash_profile must be edited as the hadoop user and copied to the slaves; it lives in /home/hadoop/:

$scp .bash_profile slave:~

If there are additional slave machines, copy it to each of them in the same way.

Directory layout

  • HDFS NameNode metadata: dfs.namenode.name.dir, /home/hadoop/hdfs/name

  • HDFS DataNode data files: dfs.datanode.data.dir, /home/hadoop/hdfs/data

  • HDFS NameNode checkpoint (backup) directory: fs.checkpoint.dir, /home/hadoop/hdfs/checkpoint

  • Temporary files: hadoop.tmp.dir, /home/hadoop/hdfs/tmp

Create the directories:

$ mkdir -p /home/hadoop/hdfs/name /home/hadoop/hdfs/data /home/hadoop/hdfs/checkpoint /home/hadoop/hdfs/tmp /home/hadoop/hdfs/tmp/nodemanager/local /home/hadoop/hdfs/tmp/nodemanager/remote /home/hadoop/hdfs/tmp/nodemanager/logs

Check that the ownership is hadoop:

$ ll /home/hadoop/hdfs
drwxrwxr-x 2 hadoop hadoop 4096 Aug 12 09:25 checkpoint
drwxrwxr-x 2 hadoop hadoop 4096 Aug 12 09:25 data
drwxrwxr-x 3 hadoop hadoop 4096 Aug 13 00:23 name
drwxrwxr-x 4 hadoop hadoop 4096 Aug 12 23:13 tmp

Run the above commands on each slave machine as well.
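
For example, a sketch of creating the same directories on a slave over SSH instead of logging in interactively (it assumes the passwordless SSH access set up in Step 3 and a bash shell on the slave):

$ ssh hadoop@slave 'mkdir -p /home/hadoop/hdfs/{name,data,checkpoint,tmp/nodemanager/{local,remote,logs}}'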

On the master, enter the configuration directory: cd /home/hadoop/hadoop-2.6.0/etc/hadoop

Eight files need to be configured: yarn-env.sh and hadoop-env.sh (JAVA_HOME was added to these two scripts earlier), slaves, masters, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.

1. Configure the masters and slaves files

In the etc/hadoop directory:

echo "master" > mastersecho -ne "slave\n" > slaves

2. Configure core-site.xml

The temporary directory (hadoop.tmp.dir) must exist first; it was created above as /home/hadoop/hdfs/tmp.

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.logfile.size</name>
        <value>104857600</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/hdfs/tmp</value>
    </property>
</configuration>

3. Configure hdfs-site.xml

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>file:/home/hadoop/hdfs/name</value>
        <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>file:/home/hadoop/hdfs/data</value>
        <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

4. Configure mapred-site.xml

First copy mapred-site.xml.template to mapred-site.xml:

$ cp mapred-site.xml.template mapred-site.xml 

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework set to Hadoop YARN.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
    <description>MapReduce JobHistory Server host:port, default port is 10020.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port, default port is 19888.</description>
  </property>
</configuration>

5. Configure yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
    <description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
    <description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
    <description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.</description>
  </property>
</configuration>

Finally, copy the configuration files to the slave:

$ scp mapred-site.xml core-site.xml hdfs-site.xml yarn-site.xml masters slaves hadoop-env.sh yarn-env.sh slave:~/hadoop-2.6.0/etc/hadoop/

Step 6: Start the Hadoop services


Go to /home/hadoop/hadoop-2.6.0/bin/ and format the NameNode:

$ hadoop namenode -format
or
$ hdfs namenode -format

Format only once; if DFS already holds data, formatting will lose it.

To start the Hadoop services you only need to run, on the master:

start-all.sh
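
As an optional sanity check (not part of the original steps), jps can confirm the daemons started, provided jps is on the PATH of each node: with this layout the master would typically show NameNode, SecondaryNameNode and ResourceManager, and the slave would show DataNode and NodeManager.

$ jps
$ ssh slave jps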

You can view DataNode information at master:50070;

or from the command line run:

$ hdfs dfsadmin -report

to check. The output looks like:

Configured Capacity: 18645180416 (17.36 GB)
Present Capacity: 12578476032 (11.71 GB)
DFS Remaining: 12578250752 (11.71 GB)
DFS Used: 225280 (220 KB)
DFS Used%: 0.00%
Under replicated blocks: 6
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (1):
Name: 192.168.136.130:50010 (slave)
Hostname: slave
Decommission Status : Normal
Configured Capacity: 18645180416 (17.36 GB)
DFS Used: 225280 (220 KB)
Non DFS Used: 6066704384 (5.65 GB)
DFS Remaining: 12578250752 (11.71 GB)
DFS Used%: 0.00%
DFS Remaining%: 67.46%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Aug 13 16:39:54 CST 2015

Step 7: Test and access the Hadoop services


On the master machine.

Correctly configuring core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml is the key.

Run start-all.sh on the master.

$ start-all.sh

1. Create a local directory and files:
$ mkdir /home/hadoop/input
$ cd /home/hadoop/input
$ touch wordcount1.txt
$ touch wordcount2.txt

2. Add content:
$ echo "Hello World" > wordcount1.txt
$ echo "Hello Hadoop" > wordcount2.txt

3. Create an input directory on HDFS:
$ hadoop fs -mkdir /input

4. Copy the files into /input:
$ hadoop fs -put /home/hadoop/input/* /input

5. Run the example job:
$ hadoop jar /home/hadoop/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output

6. When it finishes, list the output directory:
$ hadoop fs -ls /output

7. Check the result; if everything worked it looks like:
$ hadoop fs -cat /output/part-r-00000
Hadoop 1
Hello 2
World 1

Appendix:

Contents of /etc/profile:

export JAVA_HOME=/usr/java/jdk1.7.0_71
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export LD_LIBRARY_PATH=/lib64:/usr/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export ZLIB_INCLUDE_DIR=/lib64:/usr/lib64
export MAVEN_HOME=/usr/local/apache-maven-3.2.5
export PATH=$PATH:$MAVEN_HOME/bin
export ANT_HOME=/usr/local/apache-ant-1.9.4
export PATH=$PATH:$ANT_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ANT_HOME/lib
export FINDBUGS_HOME=/usr/local/findbugs
export PATH=$PATH:$FINDBUGS_HOME/bin

Contents of ~/.bash_profile:

#Hadoop variables
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export HADOOP_INSTALL=/home/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
alias h='cd $HADOOP_HOME'
alias etc='cd $HADOOP_HOME/etc/hadoop'

In the directory /home/hadoop/hadoop-2.6.0/etc/hadoop:

Contents of slaves:

slave

Contents of masters:

master

References:

Installation and configuration guide: http://blog.csdn.net/licongcong_0224/article/details/12972889

A complete Hadoop 2.3 setup walkthrough: http://www.debugo.com/hadoop2-3-install/
