Installing a Hadoop 2.7 Cluster



 

 


Contents

1  Preparation
2  Hosts
3  SSH
4  Directories
5  Installing and Configuring Hadoop
  5.1  Unpacking
  5.2  Environment Variables
  5.3  Hadoop Configuration
    5.3.1  core-site.xml
    5.3.2  hdfs-site.xml
    5.3.3  mapred-site.xml (new file)
    5.3.4  yarn-site.xml
    5.3.5  Configure hadoop-env.sh
    5.3.6  Configure slaves
    5.3.7  Copying with scp
  5.4  Running Hadoop
    5.4.1  Running HDFS
      5.4.1.1  Format the NameNode
      5.4.1.2  Start the NameNode
      5.4.1.3  Start the DataNodes
    5.4.2  Verify on SLAVE2
    5.4.3  Verify on SLAVE3
    5.4.4  Running HDFS with start-dfs.sh
    5.4.5  Running YARN
    5.4.6  Running YARN with start-yarn.sh
  5.5  Starting All of Hadoop at Once
  5.6  Testing Hadoop
    5.6.1  Testing HDFS
    5.6.2  Testing YARN
    5.6.3  Testing MapReduce

 


1      Preparation

1. JDK 7; see 《CentOS7下安装JDK7.docx》 for details

2. The installation package hadoop-2.7.2.tar.gz

 

Three machines:

10.1.1.241 MASTER1

10.1.1.242 SLAVE2

10.1.1.243 SLAVE3

 

2     Hosts

Since this Hadoop cluster consists of three machines, the hosts file on each machine must be updated accordingly.

Add the following entries on all three servers:

[root@MASTER1 bin]# vi /etc/hosts

10.1.1.241 MASTER1

10.1.1.242 SLAVE2

10.1.1.243 SLAVE3
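
To confirm that the names resolve, a quick sanity check such as the following can be run from each node (any of the three hostnames works):

[root@MASTER1 bin]# ping -c 1 SLAVE2
[root@MASTER1 bin]# ping -c 1 SLAVE3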

 

3     SSH

Because the NameNode and the DataNodes communicate over SSH, passwordless login must be configured.

First log in to the master machine and generate an SSH key pair:

[root@MASTER1 bin]# ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

/root/.ssh/id_rsa already exists.

Overwrite (y/n)? y

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

ba:78:58:05:bf:36:a6:68:6a:b1:36:18:a1:b2:35:f8 root@MASTER1

The key's randomart image is:

+--[ RSA 2048]----+

|                 |

|     .          |

|      o         |

|.      o        |

|.o    .S.       |

|= +  ..=        |

|.* + +.+ .       |

|o E +.o.         |

| o.+...          |

+-----------------+

This command creates a .ssh directory in the current user's home directory. Change into it and append id_rsa.pub to the authorized_keys file:

 

[root@MASTER1 bin]# cd ~/.ssh

[root@MASTER1 .ssh]# cat id_rsa.pub >> authorized_keys

Finally, copy the authorized_keys file to the other nodes:

[root@MASTER1 .ssh]# scp authorized_keys root@SLAVE2:/root/.ssh/

[root@MASTER1 .ssh]# scp authorized_keys root@SLAVE3:/root/.ssh/

The authenticity of host 'slave3 (10.1.1.243)' can't be established.

ECDSA key fingerprint is 3e:77:b7:27:eb:c7:6c:d8:50:b1:1d:d2:8f:78:ee:2e.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'slave3,10.1.1.243' (ECDSA) to the list of known hosts.

root@slave3's password:   # enter the root user's password here

scp: /root/.ssh/: Is a directory
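
Once the key is distributed, passwordless login can be verified with a quick check like the one below (ssh-copy-id root@SLAVE2 is an equivalent shortcut for distributing the key, if it is available on your system):

[root@MASTER1 .ssh]# ssh SLAVE2 hostname
[root@MASTER1 .ssh]# ssh SLAVE3 hostname

If each command prints the remote hostname without prompting for a password, SSH is configured correctly.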

 

4     Directories

# Run on all three servers

[root@MASTER1 .ssh]# mkdir -p /data/program/hdfs/{name,data,tmp}

 

5     Installing and Configuring Hadoop

5.1    Unpacking

[root@MASTER1 ~]# cd /data/software

[root@MASTER1 software]# mkdir /data/program/hadoop

[root@MASTER1 software]# tar zxvf hadoop-2.7.2.tar.gz -C /data/program/hadoop

 

 

5.2    Environment Variables

[root@MASTER1 software]# vi /etc/profile   # append the following verbatim

#hadoop
export HADOOP_DEV_HOME=/data/program/hadoop/hadoop-2.7.2
export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop

# make the changes take effect immediately

[root@MASTER1 software]# . /etc/profile
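
To confirm the variables are in effect, the hadoop command should now resolve on the PATH:

[root@MASTER1 software]# hadoop version

This prints the release (Hadoop 2.7.2) along with build information.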

 

 

5.3    Hadoop Configuration

Change into Hadoop's configuration directory:

[root@MASTER1 software]# cd /data/program/hadoop/hadoop-2.7.2/etc/hadoop

 

Edit the core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml files in turn.

 

 

5.3.1       core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/data/program/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://MASTER1:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
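
Note that fs.default.name is the deprecated Hadoop 1.x name for this property; it still works in 2.7.2, but fs.defaultFS is the current key.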

 

 

 

5.3.2       hdfs-site.xml

 

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/program/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/program/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>MASTER1:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

 

5.3.3       mapred-site.xml (new file)
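
mapred-site.xml does not ship with Hadoop 2.7.2; create it from the bundled template first:

[root@MASTER1 hadoop]# cp mapred-site.xml.template mapred-site.xml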

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

 

5.3.4       yarn-site.xml

<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>MASTER1:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>MASTER1:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>MASTER1:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>MASTER1:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>MASTER1:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
  </property>
</configuration>
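
These addresses override YARN's defaults (8032 for the ResourceManager address, 8030 for the scheduler, 8088 for the web UI, 8031 for the resource tracker, and 8033 for admin), which is why the web UI in section 5.6.2 is reached on port 18088 rather than 8088.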

 

5.3.5       Configure hadoop-env.sh

[root@MASTER1 .ssh]# vi /data/program/hadoop/hadoop-2.7.2/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_40

5.3.6       Configure slaves

[root@MASTER1 program]# cd /data/program/hadoop/hadoop-2.7.2/etc/hadoop

[root@MASTER1 hadoop]# vi slaves

# Remove the default localhost entry and add the two slave nodes:

10.1.1.242

10.1.1.243
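
Since /etc/hosts maps the names, the hostnames SLAVE2 and SLAVE3 could be listed here instead of the raw IP addresses; either form works.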

5.3.7       Copying with scp

Copy the entire hadoop directory tree to the same location on both slaves with scp:

[root@MASTER1 hadoop]# cd /data/program

[root@MASTER1 program]# scp -r hadoop root@SLAVE2:/data/program/

[root@MASTER1 program]# scp -r hadoop root@SLAVE3:/data/program/

 

5.4    Running Hadoop

 

5.4.1      Running HDFS

5.4.1.1    Format the NameNode

 

Run:

[root@MASTER1 program]# hadoop namenode -format
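
hadoop namenode is the deprecated form of this command in 2.x; hdfs namenode -format does the same thing under the current name. Formatting should normally be done only once: reformatting assigns the NameNode a new clusterID, and DataNodes holding data from the old clusterID will refuse to register until their data directories (/data/program/hdfs/data) are cleared.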

 

5.4.1.2    Start the NameNode

[root@MASTER1 program]# hadoop-daemon.sh start namenode

starting namenode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-namenode-MASTER1.out

Running jps on MASTER1 now shows:

[root@MASTER1 program]# jps

15056 NameNode

15129 Jps

 

5.4.1.3    Start the DataNodes

Run the following on the master; hadoop-daemons.sh reads the slaves file and starts a DataNode on each listed node:

[root@MASTER1 hadoop]# hadoop-daemons.sh start datanode

10.1.1.243: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE3.out

10.1.1.242: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE2.out

5.4.2       Verify on SLAVE2

[root@SLAVE2 hadoop]# ps -ef | grep hadoop

root     7610     1 18 22:50 ?        00:00:07 /usr/java/jdk1.7.0_40/bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/program/hadoop/hadoop-2.7.2/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/program/hadoop/hadoop-2.7.2 -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,console -Djava.library.path=/data/program/hadoop/hadoop-2.7.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/program/hadoop/hadoop-2.7.2/logs -Dhadoop.log.file=hadoop-root-datanode-SLAVE2.log -Dhadoop.home.dir=/data/program/hadoop/hadoop-2.7.2 -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/data/program/hadoop/hadoop-2.7.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode

root     7737  3729  0 22:50 pts/0    00:00:00 grep --color=auto hadoop

5.4.3       Verify on SLAVE3

[root@SLAVE3 hadoop]# ps -ef | grep hadoop

root     5469     1 12 22:50 ?        00:00:07 /usr/java/jdk1.7.0_40/bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/program/hadoop/hadoop-2.7.2/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/program/hadoop/hadoop-2.7.2 -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,console -Djava.library.path=/data/program/hadoop/hadoop-2.7.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/program/hadoop/hadoop-2.7.2/logs -Dhadoop.log.file=hadoop-root-datanode-SLAVE3.log -Dhadoop.home.dir=/data/program/hadoop/hadoop-2.7.2 -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/data/program/hadoop/hadoop-2.7.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode

root     5644  3333  0 22:50 pts/0    00:00:00 grep --color=auto hadoop

 

5.4.4       Running HDFS with start-dfs.sh

Instead of starting the NameNode and DataNodes individually as above, the single start-dfs.sh script can be used:

[root@MASTER1 hadoop]# start-dfs.sh

Starting namenodes on [MASTER1]

MASTER1: starting namenode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-namenode-MASTER1.out

10.1.1.242: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE2.out

10.1.1.243: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE3.out

Starting secondary namenodes [MASTER1]

MASTER1: starting secondarynamenode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-MASTER1.out

 

Stop with stop-dfs.sh:

[root@MASTER1 hadoop]# stop-dfs.sh

Stopping namenodes on [MASTER1]

MASTER1: stopping namenode

10.1.1.242: stopping datanode

10.1.1.243: stopping datanode

Stopping secondary namenodes [MASTER1]

MASTER1: stopping secondarynamenode

 

5.4.5      Running YARN

 

YARN can be run in much the same way as HDFS. Start the ResourceManager with:

[root@MASTER1 hadoop]# yarn-daemon.sh start resourcemanager

starting resourcemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-resourcemanager-MASTER1.out

 

Batch-start the NodeManagers on all slaves with:

[root@MASTER1 hadoop]# yarn-daemons.sh start nodemanager

10.1.1.242: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE2.out

10.1.1.243: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE3.out

 

Running jps on the two slaves shows the NodeManager running as well:

[root@SLAVE2 hadoop]# jps

14504 NodeManager

14680 Jps

11887 DataNode

 

5.4.6       Running YARN with start-yarn.sh

Rather than repeating the per-daemon commands, start-yarn.sh starts everything in one step:

[root@MASTER1 hadoop]# start-yarn.sh

starting yarn daemons

starting resourcemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-resourcemanager-MASTER1.out

10.1.1.242: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE2.out

10.1.1.243: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE3.out

 

Stop with stop-yarn.sh:

[root@MASTER1 hadoop]# stop-yarn.sh

stopping yarn daemons

stopping resourcemanager

10.1.1.242: no nodemanager to stop

10.1.1.243: no nodemanager to stop

 

Run jps on the master:

[root@MASTER1 hadoop]# jps

11729 ResourceManager

10933 NameNode

11999 Jps

This shows the ResourceManager is running normally.

 

 

5.5    Starting All of Hadoop at Once

# start

[root@MASTER1 mapreduce]# start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Starting namenodes on [MASTER1]

MASTER1: starting namenode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-namenode-MASTER1.out

10.1.1.243: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE3.out

10.1.1.242: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE2.out

Starting secondary namenodes [MASTER1]

MASTER1: starting secondarynamenode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-MASTER1.out

starting yarn daemons

starting resourcemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-resourcemanager-MASTER1.out

10.1.1.243: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE3.out

10.1.1.242: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE2.out

# stop

[root@MASTER1 mapreduce]# stop-all.sh

This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh

Stopping namenodes on [MASTER1]

MASTER1: stopping namenode

10.1.1.242: stopping datanode

10.1.1.243: stopping datanode

Stopping secondary namenodes [MASTER1]

MASTER1: stopping secondarynamenode

stopping yarn daemons

stopping resourcemanager

10.1.1.242: stopping nodemanager

10.1.1.243: stopping nodemanager

no proxyserver to stop

5.6    Testing Hadoop

5.6.1       Testing HDFS
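
A minimal smoke test exercises HDFS end to end (the directory and file names here are arbitrary examples):

[root@MASTER1 hadoop]# hdfs dfs -mkdir -p /test
[root@MASTER1 hadoop]# hdfs dfs -put /etc/hosts /test
[root@MASTER1 hadoop]# hdfs dfs -ls /test

If the uploaded file is listed, HDFS is working. The NameNode web UI at http://10.1.1.241:50070 (the default port) also shows the live DataNodes.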

5.6.2       Testing YARN

http://10.1.1.241:18088/cluster

Opening this URL brings up the YARN management UI, which verifies that YARN is running.

 

5.6.3      Testing MapReduce

I am too lazy to write MapReduce code myself; fortunately, the Hadoop distribution ships with ready-made examples under share/hadoop/mapreduce. Run one of them:

[root@MASTER1 hadoop]# cd /data/program/hadoop/hadoop-2.7.2/share/hadoop/mapreduce

[root@MASTER1 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar pi 5 10
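
The pi example's two arguments are the number of map tasks and the number of samples per map; when the job completes it prints an estimated value of Pi. Another bundled example worth trying is wordcount (this sketch assumes the /test/hosts file uploaded in 5.6.1; the output path must not already exist):

[root@MASTER1 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /test/hosts /test/wc-out
[root@MASTER1 mapreduce]# hdfs dfs -cat /test/wc-out/part-r-00000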

 

 
