Cluster Installation: Installing YARN (Test)


1. Preparation: configure the network interfaces and hostnames (perform on all three nodes)

1) Edit network interface 1:

Note: a single NIC is sufficient here; the private IP (10.10.10.1) does not need to be configured.

vi /etc/sysconfig/network-scripts/ifcfg-eth0

 

Edit network interface 2:

 

 

After configuring, restart the network service:

 

View the interface configuration:
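The three steps above (edit, restart, verify) can be sketched as follows; all addresses and device names are illustrative and must be adapted to your own environment:

# /etc/sysconfig/network-scripts/ifcfg-eth0 -- illustrative static-IP settings
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.6.51
NETMASK=255.255.255.0
ONBOOT=yes

# Restart networking, then confirm the address took effect
service network restart
/sbin/ifconfig eth0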

 

 

2) Set the hostnames (edit the /etc/hosts file) (perform on all three nodes)

[root@hadoop2 ~]# vi /etc/hosts

 

# Do not remove the following line, or various programs

# that require network functionality will fail.

127.0.0.1                localhost

192.168.6.52 hadoop2

10.10.10.2   hadoop2priv

 

192.168.6.51 hadoop1

10.10.10.1   hadoop1priv

 

192.168.6.53 hadoop3

10.10.10.3   hadoop3priv

 

Note: this file can simply be copied to node 2 and node 3.

 

After editing the file, remember to run hostname <new-name> on the corresponding machine, e.g. hostname master, hostname slave1, and so on. (This step can be skipped.)

[root@hadoop2 ~]# vi /etc/hostname

hadoop2

[root@hadoop2 ~]# hostname hadoop2
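On older RHEL/CentOS releases (the ones using chkconfig and ifcfg-eth0, as elsewhere in this guide) the persistent hostname lives in /etc/sysconfig/network rather than /etc/hostname; a sketch, assuming that layout:

# /etc/sysconfig/network -- persistent hostname on RHEL/CentOS 5/6
NETWORKING=yes
HOSTNAME=hadoop2

# Apply immediately without rebooting
hostname hadoop2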

 

3) Disable the firewall

SELinux must be disabled; run: /usr/sbin/setenforce 0

Note: it is best to disable it permanently by hand.

Also: disable the firewall on every server, otherwise later steps will fail at runtime.

Linux firewall commands:
1) Permanent (survives a reboot) -- enable: chkconfig iptables on; disable: chkconfig iptables off
2) Immediate (reverts after a reboot) -- start: service iptables start; stop: service iptables stop
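Note that setenforce 0 only lasts until the next reboot. A sketch of disabling SELinux permanently, assuming the standard /etc/selinux/config location:

# Turn SELinux off for good (fully effective after a reboot)
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
# And put it into permissive mode for the current session
/usr/sbin/setenforce 0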

 

 

2. Create the user and group, and set up trust relationships for passwordless login

1) Create the user and group (perform on all three nodes)

[root@hadoop ~]# groupadd -g 200 hadoop

[root@hadoop ~]# useradd -u 200 -g hadoop hadoop

[root@hadoop ~]# passwd hadoop

Changing password for user hadoop.

New UNIX password:

BAD PASSWORD: it is based on a dictionary word

Retype new UNIX password:

passwd: all authentication tokens updated successfully.

[root@hadoop ~]# su - hadoop

 

2) Set up the trust relationships

On node 1:

Generate an RSA key pair under the hadoop user, accepting all the defaults:

[hadoop@hadoop ~]$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

1a:d9:48:f8:de:5b:be:e7:1f:5b:fd:48:df:59:59:94 hadoop@hadoop

[hadoop@hadoop ~]$ cd .ssh

[hadoop@hadoop .ssh]$ ls

id_rsa id_rsa.pub

 

Append the public key to the authorized keys file:

[hadoop@hadoop ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys

 

# Fix the permissions on the master's authorized_keys; this is a very common source of errors.

chmod go-rwx /home/hadoop/.ssh/authorized_keys

 

[hadoop@h1 .ssh]$ ll

total 24

-rw------- 1 hadoop hadoop  391 Jun  7 17:07 authorized_keys   (note: permissions are 600)

-rw------- 1 hadoop hadoop 1675 Jun  7 17:06 id_rsa

-rw-r--r-- 1 hadoop hadoop  391 Jun 7 17:06 id_rsa.pub
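sshd is just as strict about the directory as about the file; a quick sketch of the expected permissions, run as the hadoop user on each node:

# ~/.ssh must be 700 and authorized_keys 600, or sshd silently ignores the key
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys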

 

 

On node 2 (note: as the hadoop user):

[root@hadoop2 ~]# su - hadoop

[hadoop@hadoop2 ~]$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):

Created directory '/home/hadoop/.ssh'.

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

43:c8:05:b9:6d:44:2c:b9:f3:c9:da:2d:64:b7:e9:83 hadoop@hadoop2

[hadoop@hadoop2 ~]$ cd .ssh

[hadoop@hadoop2 .ssh]$ ls

id_rsa id_rsa.pub

[hadoop@hadoop2 .ssh]$

 

 

On node 3 (note: as the hadoop user):

[root@hadoop3 ~]# su - hadoop

[hadoop@hadoop3 ~]$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):

Created directory '/home/hadoop/.ssh'.

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

be:2b:89:28:99:e2:46:1d:86:3c:cf:01:78:eb:5d:f3 hadoop@hadoop3

[hadoop@hadoop3 ~]$ cd .ssh

[hadoop@hadoop3 .ssh]$ ls

id_rsa id_rsa.pub

[hadoop@hadoop3 .ssh]$

 

 

On node 1, append node 2's and node 3's public keys to node 1's authorized_keys:

[hadoop@hadoop1 ~]$ ssh hadoop2 cat .ssh/id_rsa.pub >> ~/.ssh/authorized_keys

The authenticity of host 'hadoop2 (192.168.6.52)' can't be established.

RSA key fingerprint is be:ac:97:91:50:9c:63:b6:4d:35:3f:60:be:e1:ab:3d.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'hadoop2,192.168.6.52' (RSA) to the list of known hosts.

hadoop@hadoop2's password:

 

[hadoop@hadoop1 ~]$ ssh hadoop3 cat .ssh/id_rsa.pub >> ~/.ssh/authorized_keys

The authenticity of host 'hadoop3 (192.168.6.53)' can't be established.

RSA key fingerprint is be:ac:97:91:50:9c:63:b6:4d:35:3f:60:be:e1:ab:3d.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'hadoop3,192.168.6.53' (RSA) to the list of known hosts.

hadoop@hadoop3's password:

 

 

Copy node 1's authorized_keys file to node 2 and node 3:

[hadoop@hadoop1 ~]$ scp .ssh/authorized_keys hadoop2:~/.ssh

hadoop@hadoop2's password:

authorized_keys                                                   100%  792    0.8KB/s   00:00

 

[hadoop@hadoop1 ~]$ scp .ssh/authorized_keys hadoop3:~/.ssh

hadoop@hadoop3's password:

authorized_keys                                                  100% 1188     1.2KB/s   00:00

Note: the authorized_keys file copied to node 2 and node 3 keeps the permissions set on node 1, so there is nothing further to change on those nodes.

[hadoop@hadoop2 .ssh]$ ll

total 32

-rw------- 1 hadoop hadoop 1188 Dec 19 14:20 authorized_keys
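Where it is available, ssh-copy-id is a shortcut that appends a key to the remote authorized_keys and fixes the permissions in one step; a sketch, run from each node as the hadoop user (it prompts for the password once per target):

ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop1
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop2
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop3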

 

 

Verify that node 1 can reach node 1, node 2, and node 3:

[hadoop@hadoop1 ~]$ ssh hadoop1

Last login: Thu Dec 19 14:14:24 2013 from hadoop1

[hadoop@hadoop1 ~]$ ssh hadoop2 date

Thu Dec 19 14:16:09 CST 2013

[hadoop@hadoop1 ~]$ ssh hadoop3 date

Thu Dec 19 14:22:31 CST 2013

 

Verify that node 2 can reach node 1, node 2, and node 3:

[hadoop@hadoop2 ~]$ ssh hadoop2

Last login: Thu Dec 19 14:15:42 2013 from hadoop2

[hadoop@hadoop2 ~]$ ssh hadoop1 date

Thu Dec 19 14:23:12 CST 2013

[hadoop@hadoop2 ~]$ ssh hadoop3 date

The authenticity of host 'hadoop3 (192.168.6.53)' can't be established.

RSA key fingerprint is be:ac:97:91:50:9c:63:b6:4d:35:3f:60:be:e1:ab:3d.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'hadoop3,192.168.6.53' (RSA) to the list of known hosts.

Thu Dec 19 14:23:18 CST 2013

 

Verify that node 3 can reach node 1, node 2, and node 3:

[hadoop@hadoop3 .ssh]$ ssh hadoop3

The authenticity of host 'hadoop3 (192.168.6.53)' can't be established.

RSA key fingerprint is be:ac:97:91:50:9c:63:b6:4d:35:3f:60:be:e1:ab:3d.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'hadoop3,192.168.6.53' (RSA) to the list of known hosts.

Last login: Thu Dec 19 14:22:03 2013 from hadoop1

[hadoop@hadoop3 ~]$ ssh hadoop2 date

The authenticity of host 'hadoop2 (192.168.6.52)' can't be established.

RSA key fingerprint is be:ac:97:91:50:9c:63:b6:4d:35:3f:60:be:e1:ab:3d.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'hadoop2,192.168.6.52' (RSA) to the list of known hosts.

Thu Dec 19 14:24:08 CST 2013

[hadoop@hadoop3 ~]$ ssh hadoop2 date

Thu Dec 19 14:24:16 CST 2013

 

 

3. Download and install the Java JDK

(Note: install as the root user, otherwise you will get errors.)

Refer to the usual procedure for installing a JDK on Linux (perform on all three nodes).

# Download the JDK

wget http://60.28.110.228/source/package/jdk-6u21-linux-i586-rpm.bin

# Install the JDK

chmod +x jdk-6u21-linux-i586-rpm.bin       (don't forget to make it executable)

./jdk-6u21-linux-i586-rpm.bin

 

To install from an RPM instead, run:

[root@hn ~]# rpm -ivh jdk-6u17-linux-i586.rpm

(the JDK's default path is /usr/java/jdk1.6.0_17)
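A quick sanity check after installing; the exact strings depend on the JDK that was installed, so this output is only illustrative:

[root@hn ~]# java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)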

 

 

 

# Configure environment variables

Note: you can modify .bash_profile, /etc/profile, or /etc/profile.d/java.sh. .bash_profile is the most common choice and is used in the example below. (Here it is preferable to modify /etc/profile.)

[root@linux64 ~]# vi .bash_profile

# .bash_profile

 

# Get the aliases and functions

if [ -f ~/.bashrc ]; then

       . ~/.bashrc

fi

 

# User specific environment and startup programs

 

PATH=$PATH:$HOME/bin

 

export PATH

unset USERNAME

 

Add the Java settings:

export JAVA_HOME=/home/hadoop/java/jdk1.7.0_45

export HADOOP_HOME=/home/hadoop/hadoop/hadoop-1.1.2

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH

 

 

vi /etc/profile

# Copy and paste the following into vi.

export JAVA_HOME=/home/hadoop/java/jdk1.8.0_25

export HADOOP_HOME=/home/hadoop/hadoop/hadoop-2.2.0

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_YARN_HOME=$HADOOP_HOME

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

 

 

# Apply the changes immediately

source /etc/profile

 

# Test: both of the following commands may report the error below

[hadoop@ha2 ~]$ java

[hadoop@ha2 ~]$ jps

Error: dl failure on line 863

Error: failed /home/hadoop/java/jdk1.7.0_45/jre/lib/i386/client/libjvm.so, because /home/hadoop/java/jdk1.7.0_45/jre/lib/i386/client/libjvm.so: cannot restore segment prot after reloc: Permission denied

 

 

This is caused by SELinux. Disable it by running: /usr/sbin/setenforce 0

Note: it is best to disable it permanently by hand.

Also: remember to disable the firewall on every server, otherwise later steps will fail at runtime.

4. Check the base environment

/sbin/ifconfig

[hadoop@master root]$ /sbin/ifconfig

eth0      Link encap:Ethernet  HWaddr 00:0C:29:7A:DE:12 

          inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0

          inet6 addr: fe80::20c:29ff:fe7a:de12/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:14 errors:0 dropped:0 overruns:0 frame:0

          TX packets:821 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:1591 (1.5 KiB)  TX bytes:81925 (80.0 KiB)

          Interrupt:67 Base address:0x2024

 

ping master

ssh master

jps

echo $JAVA_HOME

echo $HADOOP_HOME

hadoop

5. Unpack and install Hadoop on node 1

Create the Hadoop installation directory (perform on all three nodes):

[root@hadoop1 hadoop]# mkdir -p /opt/hadoop

[root@hadoop1 hadoop]# chown -R hadoop:hadoop /opt/hadoop/

[root@hadoop1 hadoop]# chmod 755 *

[root@hadoop1 hadoop]# ll

total 61036

-rwxr-xr-x 1 hadoop hadoop 62428860 Dec 17 22:48 hadoop-1.0.3.tar.gz

 

Unpack the Hadoop tarball (it only needs to be installed on node 1; after configuration, copy it to the other nodes):

tar -xzvf hadoop-1.0.3.tar.gz

 

6. Install Hadoop 2.2 and set up the cluster

Install Hadoop on hadoopMaster.

First download the Hadoop 2.2 tarball from the Apache website, unpack it into the current user's home directory (home/fyzwjd/), and rename the unpacked directory to hadoop:

 

$ sudo mv hadoop-2.2.0 hadoop

Before configuring, create the following directories on the local filesystem: ~/hadoop/tmp, ~/dfs/data, ~/dfs/name. Seven configuration files are involved, all under /hadoop/etc/hadoop; they can be edited with gedit:

 

~/hadoop/etc/hadoop/hadoop-env.sh 

~/hadoop/etc/hadoop/yarn-env.sh 

~/hadoop/etc/hadoop/slaves 

~/hadoop/etc/hadoop/core-site.xml 

~/hadoop/etc/hadoop/hdfs-site.xml 

~/hadoop/etc/hadoop/mapred-site.xml 

~/hadoop/etc/hadoop/yarn-site.xml 

        (1)     Configuration file 1: hadoop-env.sh

              Set the JAVA_HOME value (export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51)

 

        (2)     Configuration file 2: yarn-env.sh

              Set the JAVA_HOME value (export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51)

 

        (3)     Configuration file 3: slaves

hadoopSlave1

hadoopSlave2

 

        (4)    Configuration file 4: core-site.xml

Create the tmp directory:

[hadoop@had1 hadoop-2.2.0]$ mkdir tmp

 

<configuration> 

<property> 

<name>fs.defaultFS</name> 

<value>hdfs://had1:9000</value> 

</property> 

<property> 

<name>io.file.buffer.size</name> 

<value>131072</value> 

</property> 

<property> 

<name>hadoop.tmp.dir</name> 

<value>file:/home/hadoop/hadoop/hadoop-2.2.0/tmp</value> 

<description>A base for other temporary directories.</description>

</property> 

<property> 

<name>hadoop.proxyuser.fyzwjd.hosts</name> 

<value>*</value> 

</property> 

<property> 

<name>hadoop.proxyuser.fyzwjd.groups</name> 

<value>*</value> 

</property> 

</configuration> 

 

        (5)     Configuration file 5: hdfs-site.xml

Create the dfs directory:

[hadoop@had1 hadoop-2.2.0]$ mkdir dfs

 

<configuration> 

<property> 

<name>dfs.namenode.secondary.http-address</name> 

<value>had1:9001</value> 

</property> 

<property> 

<name>dfs.namenode.name.dir</name> 

<value>file:/home/hadoop/hadoop/hadoop-2.2.0/dfs/name</value> 

</property> 

<property> 

<name>dfs.datanode.data.dir</name> 

<value>file:/home/hadoop/hadoop/hadoop-2.2.0/dfs/data</value> 

</property> 

<property> 

<name>dfs.replication</name> 

<value>3</value> 

</property> 

<property> 

<name>dfs.webhdfs.enabled</name> 

<value>true</value> 

</property> 

</configuration> 

 

         (6)    Configuration file 6: mapred-site.xml

<configuration> 

<property> 

<name>mapreduce.framework.name</name> 

<value>yarn</value> 

</property> 

<property> 

<name>mapreduce.jobhistory.address</name> 

<value>had1:10020</value> 

</property> 

<property> 

<name>mapreduce.jobhistory.webapp.address</name> 

<value>had1:19888</value> 

</property> 

</configuration> 

 

         (7)    Configuration file 7: yarn-site.xml

<configuration> 

<property> 

<name>yarn.nodemanager.aux-services</name> 

<value>mapreduce_shuffle</value> 

</property> 

<property> 

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> 

<value>org.apache.hadoop.mapred.ShuffleHandler</value> 

</property> 

<property> 

<name>yarn.resourcemanager.address</name> 

<value>had1:8032</value> 

</property> 

<property> 

<name>yarn.resourcemanager.scheduler.address</name> 

<value>had1:8030</value> 

</property> 

<property> 

<name>yarn.resourcemanager.resource-tracker.address</name> 

<value>had1:8035</value> 

</property> 

<property> 

<name>yarn.resourcemanager.admin.address</name> 

<value>had1:8033</value> 

</property> 

<property> 

<name>yarn.resourcemanager.webapp.address</name> 

<value>had1:8088</value> 

</property> 

</configuration> 

 

    Copy the hadoop directory to hadoopSlave1 and hadoopSlave2:

scp -r /home/fyzwjd/hadoop fyzwjd@hadoopSlave1:~/

scp -r /home/fyzwjd/hadoop fyzwjd@hadoopSlave2:~/

 

7. Verify and run

    All component start and stop scripts live under /hadoop/sbin; the namenode is normally formatted before starting Hadoop for the first time. The commands are as follows:

 

Enter the installation directory: cd ~/hadoop/

Format the namenode: ./bin/hdfs namenode -format

 

Start HDFS: ./sbin/start-dfs.sh

Processes now running on hadoopMaster: namenode, secondarynamenode

Processes running on hadoopSlave1 and hadoopSlave2: datanode

 

Start YARN: ./sbin/start-yarn.sh

Processes now running on hadoopMaster: namenode, secondarynamenode, resourcemanager

Processes running on hadoopSlave1 and hadoopSlave2: datanode, nodemanager

 

Check cluster status: ./bin/hdfs dfsadmin -report

Inspect file blocks: ./bin/hdfs fsck / -files -blocks

Browse HDFS: http://hadoopMaster:50070

Browse the ResourceManager: http://hadoopMaster:8088
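If everything started cleanly, jps on hadoopMaster should list the daemons named above; a sketch of the expected output (the PIDs are illustrative):

$ jps
3045 NameNode
3280 SecondaryNameNode
3430 ResourceManager
3721 Jps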

 

8. Configure system parameters:

# Configure hadoop-env.sh environment variables (perform on all three nodes)

1) Configure Hadoop's maximum HADOOP_HEAPSIZE; the default is 1000 MB.

[hadoop@hadoop1 hadoop]$ vi /opt/hadoop/hadoop-1.0.3/conf/hadoop-env.sh

# The maximum amount of heap to use, in MB. Default is 1000.

  export HADOOP_HEAPSIZE=1000   (uncomment this line to set the heap explicitly; you can also leave it unchanged)

 

2) Add the JAVA_HOME path to hadoop-env.sh; this file configures the variables for Hadoop's runtime environment.

# The java implementation to use.  Required.

# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

export JAVA_HOME=/usr/java/jdk1.6.0_21

9. Set the master and slave nodes

Edit the masters and slaves files.

Configure the master node:

[hadoop@hadoop1 conf]$ vi masters

hadoop1

 

Configure the other nodes:

[hadoop@hadoop1 conf]$ vi slaves

hadoop2

hadoop3

 

 

10. As root, create the Hadoop mapred and HDFS namenode/datanode directories

(perform on all three nodes)

mkdir -p /opt/data/hadoop/

chown -R hadoop:hadoop  /opt/data/*

 

# Switch to the hadoop user

su hadoop

 

# Create the MapReduce directories

mkdir -p /opt/data/hadoop/mapred/mrlocal

mkdir -p /opt/data/hadoop/mapred/mrsystem

 

mkdir -p /opt/data/hadoop/hdfs/name

mkdir -p /opt/data/hadoop/hdfs/data

mkdir -p /opt/data/hadoop/hdfs/var

mkdir -p /opt/data/hadoop/hdfs/namesecondary

# Hadoop Common configuration: core-site.xml (the filesystem entry point)

# Edit the core-site.xml file

vi /opt/modules/hadoop/hadoop-1.0.3/conf/core-site.xml

<property>

<name>fs.default.name</name>

<value>hdfs://hadoop1:9000</value>

</property>

 

<property>

<name>hadoop.tmp.dir</name>

<value>/home/hadoop/hadoop-1.0.3/tmp</value>

</property>

(Note: do not leave this property out, and do not forget to create the temporary directory itself on all three nodes.)

hadoop.tmp.dir is Hadoop's default temporary path, and it is best to configure it explicitly. If a DataNode inexplicably fails to start after adding nodes or in similar situations, deleting this tmp directory usually fixes it. If you delete this directory on the NameNode machine, however, you must rerun the NameNode format command.

 

 

Note: this is the minimal configuration; at a minimum, make sure the Hadoop temporary directory property shown above (hadoop.tmp.dir) is included.

 

 

# HDFS NameNode and DataNode configuration: hdfs-site.xml

vi /opt/modules/hadoop/hadoop-1.0.3/conf/hdfs-site.xml

 

<property>

<name>dfs.name.dir</name>

<value>/home/hadoop/hadoop/hdfs/name</value>

</property>

 

<property>

<name>dfs.data.dir</name>

<value>/home/hadoop/hadoop/hdfs/data</value>

</property>

(dfs.data.dir accepts a comma-separated list of directories, e.g. /home/hadoop/hadoop/hdfs/data/data1,/home/hadoop/hadoop/hdfs/data/data2; see the sketch after this file.)

 

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

This sets the HDFS replication factor.
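A sketch of the comma-separated multi-directory form mentioned above (paths are illustrative; each directory is typically placed on a different physical disk):

<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hadoop/hdfs/data/data1,/home/hadoop/hadoop/hdfs/data/data2</value>
</property>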

 

# MapReduce configuration: JobTracker and TaskTracker

vi /opt/modules/hadoop/hadoop-1.0.3/conf/mapred-site.xml

<property>

<name>mapred.job.tracker</name>

<value>hadoop1:9001</value>

</property>

 

 <property>

  <name>mapred.local.dir</name>

 <value>/home/hadoop/hadoop_home/var</value>

 </property>

This sets the JobTracker address and port.

 

 

11. Copy the Hadoop directory

Copy the hadoop directory on node 1 to node 2 and node 3.   Note: the masters and slaves configuration files do not need to be copied to node 2 and node 3; they only matter on node 1.

# Switch to the hadoop user

su hadoop

scp -r /opt/hadoop/hadoop-1.0.3/    hadoop2:/opt/hadoop/

scp -r /opt/hadoop/hadoop-1.0.3/    hadoop3:/opt/hadoop/
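A quick check that the copy arrived, run from node 1 as the hadoop user:

ssh hadoop2 "ls /opt/hadoop"
ssh hadoop3 "ls /opt/hadoop"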

 

Note: before proceeding, make sure the firewall and SELinux are disabled, preferably via the graphical tool; the commands occasionally fail to disable them fully, which makes later steps fail.

 

service iptables status     check the iptables status

service iptables restart    restart the iptables service

service iptables stop       stop the iptables service

 

There are three ways to disable SELinux on Linux:

a. In the graphical interface:

   Desktop --> Administration --> Security Level and Firewall; set it to disabled.

b. From the command line:

   Edit /etc/selinux/config, setting SELINUX=disabled, then reboot the system.

 

c. Run the setup command, go to "Firewall configuration", and under the SELinux entry choose "disabled". (This method is rarely used.)

12. Initial configuration

Perform on node 1, i.e. the master node.

Format the HDFS filesystem: from the bin directory of the Hadoop installation, run:

hadoop namenode -format

[hadoop@hadoop1 conf]$ hadoop namenode -format

13/12/19 17:39:34 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = hadoop1/192.168.6.51

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 1.0.3

STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May  8 20:31:25 UTC 2012

************************************************************/

Re-format filesystem in /opt/data/hadoop/hdfs/name ? (Y or N) Y

13/12/19 17:42:35 INFO util.GSet: VM type       = 32-bit

13/12/19 17:42:35 INFO util.GSet: 2% max memory = 19.33375 MB

13/12/19 17:42:35 INFO util.GSet: capacity      = 2^22 = 4194304 entries

13/12/19 17:42:35 INFO util.GSet: recommended=4194304, actual=4194304

13/12/19 17:42:35 INFO namenode.FSNamesystem: fsOwner=hadoop

13/12/19 17:42:35 INFO namenode.FSNamesystem: supergroup=supergroup

13/12/19 17:42:35 INFO namenode.FSNamesystem: isPermissionEnabled=false

13/12/19 17:42:35 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100

13/12/19 17:42:35 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)

13/12/19 17:42:35 INFO namenode.NameNode: Caching file names occuring more than 10 times

13/12/19 17:42:36 INFO common.Storage: Image file of size 112 saved in 0 seconds.

13/12/19 17:42:36 INFO common.Storage: Storage directory /opt/data/hadoop/hdfs/name has been successfully formatted.

13/12/19 17:42:36 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at hadoop1/192.168.6.51

************************************************************/

 

 

 

13. Start and stop the whole cluster

Perform on node 1, i.e. the master node.

1) Start everything:

/opt/modules/hadoop/hadoop-1.0.3/bin/start-all.sh

Starting Hadoop on the master node hadoop1 also starts Hadoop on all slave nodes.

After starting, jps should show

DataNode, Jps, NameNode, JobTracker, TaskTracker, and SecondaryNameNode if everything is healthy.

[hadoop@hadoop1 hadoop-1.0.3]$ jps

10560 JobTracker

10474 SecondaryNameNode

10184 NameNode

10705 TaskTracker

10896 Jps

[hadoop@hadoop2 ~]$ jps

6053 Jps

5826 DataNode

5938 TaskTracker

[root@hadoop3 hadoop-1.0.3]# jps

5724 DataNode

5944 Jps

5835 TaskTracker

 

 

# Stop everything

/opt/modules/hadoop/hadoop-1.0.3/bin/stop-all.sh

 

Stopping Hadoop on the master node also stops Hadoop on all slave nodes.

 

2) From the Hadoop bin directory (note: perform this step only after the format and startup steps above have completed),

run:  hadoop fs -ls /

[hadoop@hadoop1 conf]$ hadoop fs -ls /

Found 1 items

drwxr-xr-x   - hadoop supergroup          0 2013-12-19 17:45 /opt

 

If the console returns a listing, initialization succeeded and data can be loaded into the filesystem.

 

 

14. Run the classic examples

Verify that the cluster works by running the WordCount example:

[hadoop@hadoop1 hadoop-1.0.3]$ hadoop fs -ls /

Found 2 items

drwxr-xr-x   - hadoop supergroup          0 2014-05-11 10:16 /home

drwxr-xr-x   - hadoop supergroup          0 2014-05-11 10:41 /user

[hadoop@hadoop1 hadoop-1.0.3]$ hadoop fs -mkdir /input

[hadoop@hadoop1 hadoop-1.0.3]$ hadoop fs -ls /

Found 3 items

drwxr-xr-x   - hadoop supergroup          0 2014-05-11 10:16 /home

drwxr-xr-x   - hadoop supergroup          0 2014-05-11 10:48 /input

drwxr-xr-x   - hadoop supergroup          0 2014-05-11 10:41 /user

[hadoop@hadoop1 bin]$ hadoop fs -put *.sh /input

[hadoop@hadoop1 bin]$ hadoop fs -ls /input

Found 14 items

-rw-r--r--   3 hadoop supergroup       2373 2014-05-11 10:49 /input/hadoop-config.sh

-rw-r--r--   3 hadoop supergroup       4336 2014-05-11 10:49 /input/hadoop-daemon.sh

-rw-r--r--   3 hadoop supergroup       1329 2014-05-11 10:49 /input/hadoop-daemons.sh

-rw-r--r--   3 hadoop supergroup       2143 2014-05-11 10:49 /input/slaves.sh

-rw-r--r--   3 hadoop supergroup       1166 2014-05-11 10:49 /input/start-all.sh

-rw-r--r--   3 hadoop supergroup       1065 2014-05-11 10:49 /input/start-balancer.sh

-rw-r--r--   3 hadoop supergroup       1745 2014-05-11 10:49 /input/start-dfs.sh

-rw-r--r--   3 hadoop supergroup       1145 2014-05-11 10:49 /input/start-jobhistoryserver.sh

-rw-r--r--   3 hadoop supergroup       1259 2014-05-11 10:49 /input/start-mapred.sh

-rw-r--r--   3 hadoop supergroup       1119 2014-05-11 10:49 /input/stop-all.sh

-rw-r--r--   3 hadoop supergroup       1116 2014-05-11 10:49 /input/stop-balancer.sh

-rw-r--r--   3 hadoop supergroup       1246 2014-05-11 10:49 /input/stop-dfs.sh

-rw-r--r--   3 hadoop supergroup       1131 2014-05-11 10:49 /input/stop-jobhistoryserver.sh

-rw-r--r--   3 hadoop supergroup       1168 2014-05-11 10:49 /input/stop-mapred.sh

 

[hadoop@hadoop1 hadoop-1.0.3]$ hadoop jar hadoop-examples-1.0.3.jarwordcount /input /output

14/05/11 10:50:34 INFO input.FileInputFormat: Total input paths to process : 14

14/05/11 10:50:34 INFO util.NativeCodeLoader: Loaded the native-hadoop library

14/05/11 10:50:34 WARN snappy.LoadSnappy: Snappy native library not loaded

14/05/11 10:50:34 INFO mapred.JobClient: Running job: job_201405111029_0002

14/05/11 10:50:35 INFO mapred.JobClient:  map 0% reduce 0%

14/05/11 10:50:48 INFO mapred.JobClient:  map 7% reduce 0%

14/05/11 10:50:51 INFO mapred.JobClient:  map 28% reduce 0%

14/05/11 10:50:52 INFO mapred.JobClient:  map 42% reduce 0%

14/05/11 10:50:59 INFO mapred.JobClient:  map 57% reduce 0%

14/05/11 10:51:00 INFO mapred.JobClient:  map 71% reduce 0%

14/05/11 10:51:01 INFO mapred.JobClient:  map 85% reduce 0%

14/05/11 10:51:08 INFO mapred.JobClient:  map 100% reduce 0%

14/05/11 10:51:11 INFO mapred.JobClient:  map 100% reduce 28%

14/05/11 10:51:22 INFO mapred.JobClient:  map 100% reduce 100%

14/05/11 10:51:27 INFO mapred.JobClient: Job complete: job_201405111029_0002

14/05/11 10:51:27 INFO mapred.JobClient: Counters: 30

14/05/11 10:51:27 INFO mapred.JobClient:   Job Counters

14/05/11 10:51:27 INFO mapred.JobClient:     Launched reduce tasks=1

14/05/11 10:51:27 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=105264

14/05/11 10:51:27 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/05/11 10:51:27 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/05/11 10:51:27 INFO mapred.JobClient:     Rack-local map tasks=4

14/05/11 10:51:27 INFO mapred.JobClient:     Launched map tasks=14

14/05/11 10:51:27 INFO mapred.JobClient:     Data-local map tasks=10

14/05/11 10:51:27 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=31039

14/05/11 10:51:27 INFO mapred.JobClient:   File Output Format Counters

14/05/11 10:51:27 INFO mapred.JobClient:     Bytes Written=6173

14/05/11 10:51:27 INFO mapred.JobClient:   FileSystemCounters

14/05/11 10:51:27 INFO mapred.JobClient:     FILE_BYTES_READ=28724

14/05/11 10:51:27 INFO mapred.JobClient:     HDFS_BYTES_READ=23830

14/05/11 10:51:27 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=382189

14/05/11 10:51:27 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=6173

14/05/11 10:51:27 INFO mapred.JobClient:   File Input Format Counters

14/05/11 10:51:27 INFO mapred.JobClient:     Bytes Read=22341

14/05/11 10:51:27 INFO mapred.JobClient:   Map-Reduce Framework

14/05/11 10:51:27 INFO mapred.JobClient:     Map output materialized bytes=28802

14/05/11 10:51:27 INFO mapred.JobClient:     Map input records=691

14/05/11 10:51:27 INFO mapred.JobClient:     Reduce shuffle bytes=28802

14/05/11 10:51:27 INFO mapred.JobClient:     Spilled Records=4018

14/05/11 10:51:27 INFO mapred.JobClient:     Map output bytes=34161

14/05/11 10:51:27 INFO mapred.JobClient:     Total committed heap usage (bytes)=2266947584

14/05/11 10:51:27 INFO mapred.JobClient:     CPU time spent (ms)=7070

14/05/11 10:51:27 INFO mapred.JobClient:     Combine input records=3137

14/05/11 10:51:27 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1489

14/05/11 10:51:27 INFO mapred.JobClient:     Reduce input records=2009

14/05/11 10:51:27 INFO mapred.JobClient:     Reduce input groups=497

14/05/11 10:51:27 INFO mapred.JobClient:     Combine output records=2009

14/05/11 10:51:27 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2002677760

14/05/11 10:51:27 INFO mapred.JobClient:     Reduce output records=497

14/05/11 10:51:27 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=5151145984

14/05/11 10:51:27 INFO mapred.JobClient:     Map output records=3137
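To see the actual word counts once the job completes, read the reducer output back from HDFS (the /output path comes from the run above; part-r-00000 is the default output file name for this example):

[hadoop@hadoop1 hadoop-1.0.3]$ hadoop fs -cat /output/part-r-00000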

 

Verify the cluster by running the Hadoop pi example:

[hadoop@hadoop1 hadoop-1.0.3]$ hadoop jar /home/hadoop/hadoop-1.0.3/hadoop-examples-1.0.3.jar  pi 10 100

Number of Maps  = 10

Samples per Map = 100

Wrote input for Map #0

Wrote input for Map #1

Wrote input for Map #2

Wrote input for Map #3

Wrote input for Map #4

Wrote input for Map #5

Wrote input for Map #6

Wrote input for Map #7

Wrote input for Map #8

Wrote input for Map #9

Starting Job

14/05/11 10:41:40 INFO mapred.FileInputFormat: Total input paths to process : 10

14/05/11 10:41:40 INFO mapred.JobClient: Running job: job_201405111029_0001

14/05/11 10:41:41 INFO mapred.JobClient:  map 0% reduce 0%

14/05/11 10:41:54 INFO mapred.JobClient:  map 20% reduce 0%

14/05/11 10:42:00 INFO mapred.JobClient:  map 40% reduce 0%

14/05/11 10:42:01 INFO mapred.JobClient:  map 60% reduce 0%

14/05/11 10:42:03 INFO mapred.JobClient:  map 80% reduce 0%

14/05/11 10:42:05 INFO mapred.JobClient:  map 100% reduce 0%

14/05/11 10:42:11 INFO mapred.JobClient:  map 100% reduce 26%

14/05/11 10:42:17 INFO mapred.JobClient:  map 100% reduce 100%

14/05/11 10:42:22 INFO mapred.JobClient: Job complete: job_201405111029_0001

14/05/11 10:42:22 INFO mapred.JobClient: Counters: 31

14/05/11 10:42:22 INFO mapred.JobClient:   Job Counters

14/05/11 10:42:22 INFO mapred.JobClient:     Launched reduce tasks=1

14/05/11 10:42:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=70846

14/05/11 10:42:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/05/11 10:42:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/05/11 10:42:22 INFO mapred.JobClient:     Rack-local map tasks=2

14/05/11 10:42:22 INFO mapred.JobClient:     Launched map tasks=10

14/05/11 10:42:22 INFO mapred.JobClient:     Data-local map tasks=8

14/05/11 10:42:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=22725

14/05/11 10:42:22 INFO mapred.JobClient:   File Input Format Counters

14/05/11 10:42:22 INFO mapred.JobClient:     Bytes Read=1180

14/05/11 10:42:22 INFO mapred.JobClient:   File Output Format Counters

14/05/11 10:42:22 INFO mapred.JobClient:     Bytes Written=97

14/05/11 10:42:22 INFO mapred.JobClient:   FileSystemCounters

14/05/11 10:42:22 INFO mapred.JobClient:     FILE_BYTES_READ=226

14/05/11 10:42:22 INFO mapred.JobClient:     HDFS_BYTES_READ=2390

14/05/11 10:42:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=239428

14/05/11 10:42:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215

14/05/11 10:42:22 INFO mapred.JobClient:   Map-Reduce Framework

14/05/11 10:42:22 INFO mapred.JobClient:     Map output materialized bytes=280

14/05/11 10:42:22 INFO mapred.JobClient:     Map input records=10

14/05/11 10:42:22 INFO mapred.JobClient:     Reduce shuffle bytes=280

14/05/11 10:42:22 INFO mapred.JobClient:     Spilled Records=40

14/05/11 10:42:22 INFO mapred.JobClient:     Map output bytes=180

14/05/11 10:42:22 INFO mapred.JobClient:     Total committed heap usage (bytes)=1623891968

14/05/11 10:42:22 INFO mapred.JobClient:     CPU time spent (ms)=4990

14/05/11 10:42:22 INFO mapred.JobClient:     Map input bytes=240

14/05/11 10:42:22 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1210

14/05/11 10:42:22 INFO mapred.JobClient:     Combine input records=0

14/05/11 10:42:22 INFO mapred.JobClient:     Reduce input records=20

14/05/11 10:42:22 INFO mapred.JobClient:     Reduce input groups=20

14/05/11 10:42:22 INFO mapred.JobClient:     Combine output records=0

14/05/11 10:42:22 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1441292288

14/05/11 10:42:22 INFO mapred.JobClient:     Reduce output records=0

14/05/11 10:42:22 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3777863680

14/05/11 10:42:22 INFO mapred.JobClient:     Map output records=20

Job Finished in 42.086 seconds

Estimated value of Pi is 3.14800000000000000000

 

 

 

15. View Hadoop via the web UI

Cluster status:

http://192.168.3.131:50070/dfshealth.jsp

Job status:

http://192.168.3.131:50030/jobtracker.jsp

# The web UI confirms the cluster was deployed successfully

# Check that the namenode and datanodes are healthy

http://master:50070/

 

# Check that the jobtracker and tasktrackers are healthy

http://master:50030/

 

hadoop fs -ls /

hadoop fs -mkdir /data/

 

# Verify the cluster by running the Hadoop pi example

cd /opt/modules/hadoop/hadoop-1.0.3

bin/hadoop jar hadoop-examples-1.0.3.jar pi 10 100

 

# On a healthy cluster the output looks like this:

12/07/15 10:50:48 INFO mapred.FileInputFormat: Total input paths to process : 10

12/07/15 10:50:48 INFO mapred.JobClient: Running job: job_201207151041_0001

12/07/15 10:50:49 INFO mapred.JobClient:  map 0% reduce 0%

12/07/15 10:51:42 INFO mapred.JobClient:  map 40% reduce 0%

12/07/15 10:52:07 INFO mapred.JobClient:  map 70% reduce 13%

12/07/15 10:52:10 INFO mapred.JobClient:  map 80% reduce 16%

12/07/15 10:52:11 INFO mapred.JobClient:  map 90% reduce 16%

12/07/15 10:52:22 INFO mapred.JobClient:  map 100% reduce 100%

.....................

12/07/15 10:52:28 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2155343872

12/07/15 10:52:28 INFO mapred.JobClient:     Map output records=20

Job Finished in 100.608 seconds

Estimated value of Pi is 3.14800000000000000000

 
