[Hadoop Deployment on Linux] Deploying Hadoop 2.2.0 on 64-bit CentOS 6.4

This assumes you have already built Hadoop as described in the previous article; with that done, you can move on to a multi-node deployment.
We configure the management (master) node first; once it is set up, the Hadoop binaries can simply be scp'd to the other compute nodes.
The master-node configuration steps follow:

1. Create the hadoop user
Running everything as root would probably work too, but most people use a dedicated hadoop user, and I am no exception.
Create the user: useradd hadoop
To switch to the hadoop user: su - hadoop



2. Change the hostname
Why change the hostname? Because several Hadoop configuration files refer to nodes by alias, such as master and slave.
vim /etc/sysconfig/network 
If the network file does not exist, create it with vim and set the HOSTNAME entry, e.g. HOSTNAME=master.
For the change to take effect permanently, reboot the system (the sketch below shows the file and how to apply the name to the current session without rebooting).
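For reference, a minimal sketch of the change on CentOS 6; the HOSTNAME line in the file is what persists across reboots, while the hostname command applies the new name to the running session (both as root):

# /etc/sysconfig/network -- read at boot
NETWORKING=yes
HOSTNAME=master

# apply to the current session without rebooting
hostname master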



3. Edit the hosts file
vim /etc/hosts (this file normally already exists)

Append the following:

10.40.3.53 master
10.40.3.54 slave1


4. Configure passwordless SSH login
This is essentially mutual trust between the nodes; there are ready-made scripts for it online, so here I only explain how the configuration works.
On Red Hat and CentOS, setting up the trust requires disabling SELinux: in /etc/selinux/config,
change SELINUX=enforcing to SELINUX=disabled and reboot.
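The same change from the shell, as a rough sketch (run as root); setenforce 0 drops SELinux to permissive immediately so you can carry on before the reboot:

# make the change permanent
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# stop enforcing for the current boot
setenforce 0
# confirm the current mode
getenforce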

4.1 Check that the ssh packages are installed
Run: rpm -qa | grep ssh
openssh-server-5.3p1-84.1.el6.x86_64
openssh-5.3p1-84.1.el6.x86_64
libssh2-1.4.2-1.el6.x86_64
openssh-clients-5.3p1-84.1.el6.x86_64
trilead-ssh2-213-6.2.el6.noarch
openssh-askpass-5.3p1-84.1.el6.x86_64
If you see roughly this set of packages, you are fine; if openssh-clients is missing, install it with yum install openssh-clients.
4.2 Edit sshd_config
vim /etc/ssh/sshd_config
Uncomment the following three lines (remove the leading #):
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile      .ssh/authorized_keys
Then restart the SSH service: service sshd restart
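Before restarting, you can confirm that the three options are now active; note that on CentOS the service is named sshd:

grep -E '^(RSAAuthentication|PubkeyAuthentication|AuthorizedKeysFile)' /etc/ssh/sshd_config
service sshd restart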
4.3 Generate the hadoop user's key pair

Switch to the hadoop user at this point. Why? Because we are setting up passwordless SSH from the hadoop user to the other nodes, so the key must belong to hadoop.

If you would rather configure passwordless SSH for root, generate the key as root instead; every other step is the same.
1) Switch to the hadoop user
su - hadoop
2) Generate the key pair
cd /home/hadoop/ (this directory was created along with the user)
ssh-keygen -t rsa     (press Enter at every prompt)
3) Create authorized_keys
[hadoop@master ~]$ cd .ssh/  
[hadoop@master .ssh]$ cp id_rsa.pub authorized_keys  
[hadoop@master .ssh]$ chmod 600 authorized_keys 
4) Copy authorized_keys to the other nodes, here slave1
This assumes the hadoop user has already been created on slave1 and that ssh-keygen -t rsa has been run there, so the .ssh directory exists.
The copy does not have to be done with scp; if you do use scp, connect as root@ip, because the hadoop user has no password yet and therefore cannot scp by itself.
[hadoop@master .ssh]$ scp authorized_keys root@10.40.3.54:/home/hadoop/.ssh/ 
After copying, change the owner of authorized_keys on slave1 back to the hadoop user:
chown hadoop:hadoop authorized_keys   (run as root)
Then, on slave1, run cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys to append slave1's own key as well (an optional ssh-copy-id shortcut is sketched below).
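As an optional shortcut, not part of the procedure above: if you give the hadoop user on slave1 a password with passwd, ssh-copy-id from the openssh-clients package pushes the key and sets the permissions in one step:

# run on master as the hadoop user; appends ~/.ssh/id_rsa.pub
# to /home/hadoop/.ssh/authorized_keys on slave1
ssh-copy-id hadoop@slave1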

5) Verify passwordless SSH login
On master, run ssh slave1; if all went well, you are logged in to slave1 without being asked for a password.



5. Cluster configuration
5.1 First, create three directories on master to hold Hadoop data
[hadoop@master ~]$mkdir -p dfs/name  
[hadoop@master ~]$mkdir -p dfs/data  
[hadoop@master ~]$mkdir -p temp  


5.2 Copy the previously built hadoop directory into /home/hadoop
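For example (run as root; /opt/hadoop-build is only a placeholder for wherever your build output actually lives):

cp -r /opt/hadoop-build/hadoop-2.2.0 /home/hadoop/
chown -R hadoop:hadoop /home/hadoop/hadoop-2.2.0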

5.3 Next, edit the following seven files with vim:

~/hadoop-2.2.0/etc/hadoop/hadoop-env.sh
~/hadoop-2.2.0/etc/hadoop/yarn-env.sh
~/hadoop-2.2.0/etc/hadoop/slaves
~/hadoop-2.2.0/etc/hadoop/core-site.xml
~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml
~/hadoop-2.2.0/etc/hadoop/mapred-site.xml
~/hadoop-2.2.0/etc/hadoop/yarn-site.xml

1)~/hadoop-2.2.0/etc/hadoop/hadoop-env.sh

Change JAVA_HOME to your actual JDK path:
export JAVA_HOME=/opt/jdk1.7.0_15


2)~/hadoop-2.2.0/etc/hadoop/yarn-env.sh
Change JAVA_HOME in the same way (a quick way to locate the JDK path is sketched below):
export JAVA_HOME=/opt/jdk1.7.0_15
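If you are unsure where the JDK is installed, one way to check (assuming java is on the PATH) is:

readlink -f $(which java)
# prints something like /opt/jdk1.7.0_15/bin/java; JAVA_HOME is everything before /bin/java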


3)~/hadoop-2.2.0/etc/hadoop/slaves
Append slave1; as more nodes join the cluster later, add them here as well, one per line (as shown below).
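With a single slave the file therefore contains just:

slave1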


4)~/hadoop-2.2.0/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/temp</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
</configuration>

5)~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <!-- dfs.replication is the number of data replicas; the default is 3, which fails with fewer than 3 slaves -->
        <value>1</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>

6)~/hadoop-2.2.0/etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.job.tracker</name>
        <value>hdfs://master:9001</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>

7)~/hadoop-2.2.0/etc/hadoop/yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>

6. Start Hadoop
It is advisable to turn off the firewall first (a quick status check is sketched after the list).
(1) Permanent, takes effect after reboot:
Enable: chkconfig iptables on  
Disable: chkconfig iptables off  
(2) Immediate, lost after reboot:
Start: service iptables start  
Stop: service iptables stop 
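To double-check the firewall state before starting the daemons:

service iptables status
chkconfig --list iptables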


6.1 Format the NameNode
[hadoop@master hadoop]$ cd /home/hadoop/hadoop-2.2.0/bin/  
[hadoop@master bin]$ ./hdfs namenode -format 


6.2 Start HDFS
[hadoop@master bin]$ cd ../sbin/  
[hadoop@master sbin]$ ./start-dfs.sh
At this point, running jps on master should show:
NameNode
SecondaryNameNode
and running jps on the slave should show:
DataNode


6.3 Start YARN
[hadoop@master bin]$ cd ../sbin/  
[hadoop@master sbin]$ ./start-yarn.sh 
jps on master should now also show:
ResourceManager
and jps on the slave should show:
NodeManager


6.4 Inspect the cluster
Cluster status:        ./bin/hdfs dfsadmin -report
File block report:     ./bin/hdfs fsck / -files -blocks
Node status web UI:    http://10.40.3.53:50070
ResourceManager cluster web UI: http://10.40.3.53:8088 (the webapp address configured above is master:8088)


6.5 Advice
The first start-up is quite likely to fail. Read the logs carefully, do not rush to re-format the NameNode, and search online for the cause; most problems can be resolved that way.
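The daemon logs live under the logs directory of the installation; for example, the NameNode log on master (file names follow the hadoop-&lt;user&gt;-&lt;daemon&gt;-&lt;hostname&gt;.log pattern) can be inspected with something like:

tail -n 100 /home/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-namenode-master.log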


7. Test Hadoop with a small example
1) Download a sample file:
wget http://www.gutenberg.org/cache/epub/20417/pg20417.txt

2) Run the following:

[hadoop@master hadoop-2.2.0]$ bin/hdfs dfs -mkdir /home/hadoop/input
[hadoop@master hadoop-2.2.0]$ bin/hdfs dfs -copyFromLocal /home/hadoop/pg20417.txt /home/hadoop/input
[hadoop@master hadoop-2.2.0]$ bin/hdfs dfs -ls /home/hadoop/input
Found 1 items
-rw-r--r--   1 hadoop supergroup     674570 2014-05-07 00:42 /home/hadoop/input/pg20417.txt
Note: directories created in HDFS cannot be browsed with the local ls command; use hdfs dfs -ls instead.

3) Run the job:

[hadoop@master hadoop-2.2.0]$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /home/hadoop/input /home/hadoop/output
4) Job output:
14/05/07 00:45:28 INFO client.RMProxy: Connecting to ResourceManager at master/10.40.3.53:8032
14/05/07 00:45:29 INFO input.FileInputFormat: Total input paths to process : 1
14/05/07 00:45:29 INFO mapreduce.JobSubmitter: number of splits:1
14/05/07 00:45:29 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/05/07 00:45:29 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/05/07 00:45:29 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/05/07 00:45:29 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/05/07 00:45:29 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/05/07 00:45:29 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/05/07 00:45:29 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/05/07 00:45:29 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/05/07 00:45:29 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/05/07 00:45:29 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/05/07 00:45:29 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/05/07 00:45:29 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/05/07 00:45:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1399392543103_0002
14/05/07 00:45:30 INFO impl.YarnClientImpl: Submitted application application_1399392543103_0002 to ResourceManager at master/10.40.3.53:8032
14/05/07 00:45:30 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1399392543103_0002/
14/05/07 00:45:30 INFO mapreduce.Job: Running job: job_1399392543103_0002
14/05/07 00:45:38 INFO mapreduce.Job: Job job_1399392543103_0002 running in uber mode : false
14/05/07 00:45:38 INFO mapreduce.Job:  map 0% reduce 0%
14/05/07 00:45:45 INFO mapreduce.Job:  map 100% reduce 0%
14/05/07 00:45:53 INFO mapreduce.Job:  map 100% reduce 100%
14/05/07 00:45:53 INFO mapreduce.Job: Job job_1399392543103_0002 completed successfully
14/05/07 00:45:54 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=267026
FILE: Number of bytes written=693051
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=674683
HDFS: Number of bytes written=196192
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters 
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5118
Total time spent by all reduces in occupied slots (ms)=5211
Map-Reduce Framework
Map input records=12760
Map output records=109844
Map output bytes=1086547
Map output materialized bytes=267026
Input split bytes=113
Combine input records=109844
Combine output records=18040
Reduce input groups=18040
Reduce shuffle bytes=267026
Reduce input records=18040
Reduce output records=18040
Spilled Records=36080
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=111
CPU time spent (ms)=5890
Physical memory (bytes) snapshot=413360128
Virtual memory (bytes) snapshot=1758056448
Total committed heap usage (bytes)=308150272
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters 
Bytes Read=674570
File Output Format Counters 
Bytes Written=196192
[hadoop@master hadoop-2.2.0]$ 
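To read the actual word counts, list the output directory and cat the reducer's part file (with a single reducer this is normally part-r-00000):

[hadoop@master hadoop-2.2.0]$ bin/hdfs dfs -ls /home/hadoop/output
[hadoop@master hadoop-2.2.0]$ bin/hdfs dfs -cat /home/hadoop/output/part-r-00000 | head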