Hadoop 2.2.0 Cluster Mode Installation, Configuration, and Testing
This article documents, step by step, how to install and configure a Hadoop 2.2.0 cluster and then runs a simple demonstration job. The outline:
- Environment preparation
- Hadoop installation and configuration
- Startup and demonstration
[1] Environment Preparation
All cluster nodes in this article run CentOS 6.0 64-bit; physical machines and virtual machines both work, so they are referred to uniformly as "instances" here. Four instances make up the demonstration cluster, laid out as follows:

| hostname | IP | Role |
| --- | --- | --- |
| Master.Hadoop | 192.168.13.33 | NameNode / ResourceManager |
| Slave0.Hadoop | 192.168.13.30 | DataNode / NodeManager |
| Slave1.Hadoop | 192.168.13.31 | DataNode / NodeManager |
| Slave2.Hadoop | 192.168.13.32 | DataNode / NodeManager |

PS: with virtual machines you can configure one instance and then clone it; remember to fix hostname/hosts on every clone and to disable the firewall on all machines (service iptables stop).
1. vi /etc/hosts

Add an entry for every node in the cluster plan above:

```
192.168.13.33 Master.Hadoop
192.168.13.30 Slave0.Hadoop
192.168.13.31 Slave1.Hadoop
192.168.13.32 Slave2.Hadoop
```
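If you cloned the instances, a minimal per-clone checklist looks like the sketch below (standard CentOS 6 commands; the hostname is just an example):

```bash
# set this instance's own hostname (persisted in /etc/sysconfig/network)
$ vi /etc/sysconfig/network      # HOSTNAME=Slave0.Hadoop
$ hostname Slave0.Hadoop         # apply without a reboot
# disable the firewall on every node, now and across reboots
$ service iptables stop
$ chkconfig iptables off
```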
2. JDK

Download the 64-bit JDK 6 from the official Java site; a basic installation is enough. Since CentOS 6 ships with OpenJDK, this article simply uses OpenJDK (note: OpenJDK normally lives under /usr/lib/jvm/). JAVA_HOME on this system is configured as:

```
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64
```
3. SSHD service

Make sure the sshd service is installed and running (CentOS installs and starts it by default).
4. Create a user (optional; the whole setup and test also work when done as root)

Create a dedicated account, hadoop:

```
$ useradd hadoop
```
5. Configure passwordless SSH login

The Master must be able to SSH to every Slave without a password (the Slaves do not need passwordless SSH back to the Master); a sketch follows below. For a detailed walkthrough see: Linux (CentOS) configure OpenSSH passwordless login.
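One common way to set this up, assuming the hadoop user and the hostnames from the table above:

```bash
# on Master.Hadoop, as the hadoop user: generate an RSA key pair with an empty passphrase
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# append the public key to each slave's authorized_keys
$ ssh-copy-id hadoop@Slave0.Hadoop
$ ssh-copy-id hadoop@Slave1.Hadoop
$ ssh-copy-id hadoop@Slave2.Hadoop
# verify: this should log in without prompting for a password
$ ssh hadoop@Slave0.Hadoop
```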
6. Time synchronization

Keep the clocks of all nodes in sync. The original setup does this with a cron job that runs ntpdate every two minutes against an internal time server:

```
$ crontab -e
*/2 * * * * /usr/sbin/ntpdate 192.168.9.2
```

[2] Hadoop Installation and Configuration

1. Download the source and build the native libraries (you can also use a prebuilt release package directly)
The native libraries bundled with the official release are 32-bit, which does not meet our needs, so we build them ourselves. The process is much the same as the one described in: Hadoop 2.x build native library on Mac OS X. Once the build finishes, replace the files under <HADOOP_HOME>/lib/native/, taking care to keep the library file names.
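A quick way to check whether the bundled libraries match your platform (the path below is the standard location inside the extracted release):

```bash
# inspect the bundled native library's word size
$ file lib/native/libhadoop.so.1.0.0
# "ELF 32-bit LSB shared object" on a 64-bit OS means you need the rebuild above
```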
2. Download the release package

Open the official download page http://hadoop.apache.org/releases.html#Download, download the 2.2.0 release, and extract it to the target path:

```
$ tar -zxf hadoop-2.2.0.tar.gz
```

So in this article, HADOOP_HOME = /home/hadoop/.
3. Configure the hadoop user's environment variables: vi ~/.bash_profile and add the following:

```
# set java environment
export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
# Hadoop
export HADOOP_PREFIX=/home/hadoop/
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
```
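Reload the profile so the variables take effect in the current shell (a standard step, implied by the original):

```bash
$ source ~/.bash_profile
$ echo $HADOOP_PREFIX   # should print /home/hadoop/
```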
4. Edit <HADOOP_HOME>/etc/hadoop/hadoop-env.sh and update the JAVA_HOME setting:

```
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9
```

5. Edit <HADOOP_HOME>/etc/hadoop/yarn-env.sh
Update its JAVA_HOME setting in the same way:

```
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9
```

6. Edit <HADOOP_HOME>/etc/hadoop/core-site.xml
Add or update the following inside the <configuration> element:
```xml
<!-- the new property fs.defaultFS replaces the deprecated fs.default.name | micmiu.com -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://Master.Hadoop:9000</value>
  <description>The name of the default file system.</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <!-- make sure this directory exists -->
  <value>/usr/local/hadoop/temp</value>
  <description>A base for other temporary directories.</description>
</property>
```
7. Edit <HADOOP_HOME>/etc/hadoop/hdfs-site.xml

Add or update the following inside the <configuration> element:
```xml
<property>
  <name>dfs.replication</name>
  <!-- must not exceed the actual number of DataNodes; 3 in this article -->
  <value>3</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <!-- make sure this directory exists -->
  <value>file:/usr/local/hadoop/dfs/name</value>
  <final>true</final>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- make sure this directory exists -->
  <value>file:/usr/local/hadoop/dfs/data</value>
</property>
```
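The hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir values above all point under /usr/local/hadoop; create those directories up front on every node. A sketch, assuming the hadoop account should own them:

```bash
# run on every node; the paths match the config values above
$ mkdir -p /usr/local/hadoop/temp /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data
$ chown -R hadoop:hadoop /usr/local/hadoop
```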
8. Edit <HADOOP_HOME>/etc/hadoop/yarn-site.xml

Add or update the following inside the <configuration> element:
```xml
<!-- micmiu.com -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<!-- hostname or IP address of the resourcemanager -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>Master.Hadoop</value>
</property>
```
9. Edit <HADOOP_HOME>/etc/hadoop/mapred-site.xml

There is no mapred-site.xml by default; copy mapred-site.xml.template to mapred-site.xml, as shown below.
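The copy itself (paths follow the layout described above):

```bash
$ cd $HADOOP_PREFIX/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml
```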
Then add or update the following inside the <configuration> element:
```xml
<!-- micmiu.com -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <final>true</final>
</property>
```
10. Edit <HADOOP_HOME>/etc/hadoop/slaves, listing every slave node's hostname on its own line; a sketch follows.
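Based on the cluster plan in section [1], the slaves file would contain:

```
Slave0.Hadoop
Slave1.Hadoop
Slave2.Hadoop
```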
[3] Startup and Testing
1. Start Hadoop

1.1 On the first startup, format HDFS by running hdfs namenode -format on Master.Hadoop:
```
[hadoop@Master ~]$ hdfs namenode -format
14/01/22 15:43:10 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master.Hadoop/192.168.6.77
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.2.0
STARTUP_MSG:   classpath =
........................................
............micmiu.com.............
........................................
STARTUP_MSG:   java = 1.6.0_20
************************************************************/
14/01/22 15:43:10 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
Formatting using clusterid: CID-645f2ed2-6f02-4c24-8cbc-82b09eca963d
14/01/22 15:43:11 INFO namenode.HostFileManager: read includes:
HostSet(
)
14/01/22 15:43:11 INFO namenode.HostFileManager: read excludes:
HostSet(
)
14/01/22 15:43:11 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
14/01/22 15:43:11 INFO util.GSet: Computing capacity for map BlocksMap
14/01/22 15:43:11 INFO util.GSet: VM type       = 64-bit
14/01/22 15:43:11 INFO util.GSet: 2.0% max memory = 888.9 MB
14/01/22 15:43:11 INFO util.GSet: capacity      = 2^21 = 2097152 entries
14/01/22 15:43:11 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
14/01/22 15:43:11 INFO blockmanagement.BlockManager: defaultReplication         = 3
14/01/22 15:43:11 INFO blockmanagement.BlockManager: maxReplication             = 512
14/01/22 15:43:11 INFO blockmanagement.BlockManager: minReplication             = 1
14/01/22 15:43:11 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
14/01/22 15:43:11 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
14/01/22 15:43:11 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
14/01/22 15:43:11 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
14/01/22 15:43:11 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
14/01/22 15:43:11 INFO namenode.FSNamesystem: supergroup          = supergroup
14/01/22 15:43:11 INFO namenode.FSNamesystem: isPermissionEnabled = true
14/01/22 15:43:11 INFO namenode.FSNamesystem: HA Enabled: false
14/01/22 15:43:11 INFO namenode.FSNamesystem: Append Enabled: true
14/01/22 15:43:11 INFO util.GSet: Computing capacity for map INodeMap
14/01/22 15:43:11 INFO util.GSet: VM type       = 64-bit
14/01/22 15:43:11 INFO util.GSet: 1.0% max memory = 888.9 MB
14/01/22 15:43:11 INFO util.GSet: capacity      = 2^20 = 1048576 entries
14/01/22 15:43:11 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/01/22 15:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
14/01/22 15:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
14/01/22 15:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
14/01/22 15:43:11 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/01/22 15:43:11 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
14/01/22 15:43:11 INFO util.GSet: Computing capacity for map Namenode Retry Cache
14/01/22 15:43:11 INFO util.GSet: VM type       = 64-bit
14/01/22 15:43:11 INFO util.GSet: 0.029999999329447746% max memory = 888.9 MB
14/01/22 15:43:11 INFO util.GSet: capacity      = 2^15 = 32768 entries
14/01/22 15:43:11 INFO common.Storage: Storage directory /usr/local/hadoop/dfs/name has been successfully formatted.
14/01/22 15:43:11 INFO namenode.FSImage: Saving image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
14/01/22 15:43:11 INFO namenode.FSImage: Image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 198 bytes saved in 0 seconds.
14/01/22 15:43:11 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/01/22 15:43:11 INFO util.ExitUtil: Exiting with status 0
14/01/22 15:43:11 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/192.168.6.77
************************************************************/
```
1.2 On Master.Hadoop, run start-dfs.sh:
```
[hadoop@Master ~]$ start-dfs.sh
Starting namenodes on [Master.Hadoop]
Master.Hadoop: starting namenode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-namenode-Master.Hadoop.out
Slave7.Hadoop: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-datanode-Slave7.Hadoop.out
Slave5.Hadoop: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-datanode-Slave5.Hadoop.out
Slave6.Hadoop: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-datanode-Slave6.Hadoop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-secondarynamenode-Master.Hadoop.out
```
Verify the running processes on Master.Hadoop:
```
[hadoop@Master ~]$ jps
7695 Jps
7589 SecondaryNameNode
7403 NameNode
```
Verify the running processes on each SlaveX.Hadoop:
```
[hadoop@Slave5 ~]$ jps
8724 DataNode
8815 Jps
```
1.3 On Master.Hadoop, run start-yarn.sh:
```
[hadoop@Master ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-resourcemanager-Master.Hadoop.out
Slave7.Hadoop: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-Slave7.Hadoop.out
Slave5.Hadoop: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-Slave5.Hadoop.out
Slave6.Hadoop: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-Slave6.Hadoop.out
```
Verify the running processes on Master.Hadoop:
```
[hadoop@Master ~]$ jps
8071 Jps
7589 SecondaryNameNode
7821 ResourceManager
7403 NameNode
```
Verify the running processes on each SlaveX.Hadoop:
```
[hadoop@Slave5 ~]$ jps
9013 Jps
8724 DataNode
8882 NodeManager
```
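The daemons can also be sanity-checked through the web UIs on Hadoop 2.x's default ports; this optional check is not part of the original walkthrough:

```bash
# NameNode web UI (HDFS overview)
$ curl -s http://Master.Hadoop:50070 | head
# ResourceManager web UI (YARN applications)
$ curl -s http://Master.Hadoop:8088 | head
```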
2. Demonstration

2.1 Demonstrate a few common hdfs commands, which also prepares the input directory for the wordcount demo:
```
[hadoop@Master ~]$ hdfs dfs -ls /
[hadoop@Master ~]$ hdfs dfs -mkdir /user
[hadoop@Master ~]$ hdfs dfs -mkdir -p /user/micmiu/wordcount/in
[hadoop@Master ~]$ hdfs dfs -ls /user/micmiu/wordcount
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2014-01-22 16:01 /user/micmiu/wordcount/in
```
2.2 Locally create three files, micmiu-01.txt, micmiu-02.txt, and micmiu-03.txt, with the following contents (one way to create them is sketched right after this list):

- micmiu-01.txt: Hi Michael welcome to Hadoop more see micmiu.com
- micmiu-02.txt: Hi Michael welcome to BigData more see micmiu.com
- micmiu-03.txt: Hi Michael welcome to Spark more see micmiu.com
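For example (any editor works just as well):

```bash
$ echo "Hi Michael welcome to Hadoop more see micmiu.com"  > micmiu-01.txt
$ echo "Hi Michael welcome to BigData more see micmiu.com" > micmiu-02.txt
$ echo "Hi Michael welcome to Spark more see micmiu.com"   > micmiu-03.txt
```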
Upload the three micmiu-prefixed files to hdfs:
```
[hadoop@Master ~]$ hdfs dfs -put micmiu*.txt /user/micmiu/wordcount/in
[hadoop@Master ~]$ hdfs dfs -ls /user/micmiu/wordcount/in
Found 3 items
-rw-r--r--   3 hadoop supergroup         50 2014-01-22 16:06 /user/micmiu/wordcount/in/micmiu-01.txt
-rw-r--r--   3 hadoop supergroup         50 2014-01-22 16:06 /user/micmiu/wordcount/in/micmiu-02.txt
-rw-r--r--   3 hadoop supergroup         49 2014-01-22 16:06 /user/micmiu/wordcount/in/micmiu-03.txt
```
2.3 Then cd to the Hadoop root directory and run:

```
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/micmiu/wordcount/in /user/micmiu/wordcount/out
```

PS: the hdfs directory /user/micmiu/wordcount/out must not exist beforehand, or the job fails with an error; delete it first if needed (see the command below).
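If a previous run left that directory behind, remove it first with the standard hdfs command:

```bash
[hadoop@Master hadoop]$ hdfs dfs -rm -r /user/micmiu/wordcount/out
```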
You will then see log output similar to the following:
```
[hadoop@Master hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/micmiu/wordcount/in /user/micmiu/wordcount/out
14/01/22 16:36:28 INFO client.RMProxy: Connecting to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/01/22 16:36:29 INFO input.FileInputFormat: Total input paths to process : 3
14/01/22 16:36:29 INFO mapreduce.JobSubmitter: number of splits:3
............................
.....micmiu.com........
............................
        File System Counters
                FILE: Number of bytes read=297
                FILE: Number of bytes written=317359
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=536
                HDFS: Number of bytes written=83
                HDFS: Number of read operations=12
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=3
                Launched reduce tasks=1
                Data-local map tasks=3
                Total time spent by all maps in occupied slots (ms)=55742
                Total time spent by all reduces in occupied slots (ms)=3933
        Map-Reduce Framework
                Map input records=6
                Map output records=24
                Map output bytes=243
                Map output materialized bytes=309
                Input split bytes=387
                Combine input records=24
                Combine output records=24
                Reduce input groups=10
                Reduce shuffle bytes=309
                Reduce input records=24
                Reduce output records=10
                Spilled Records=48
                Shuffled Maps =3
                Failed Shuffles=0
                Merged Map outputs=3
                GC time elapsed (ms)=1069
                CPU time spent (ms)=12390
                Physical memory (bytes) snapshot=846753792
                Virtual memory (bytes) snapshot=5155561472
                Total committed heap usage (bytes)=499580928
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=149
        File Output Format Counters
                Bytes Written=83
```
At this point the wordcount job has completed; run the following commands to inspect its output:
```
[hadoop@Master hadoop]$ hdfs dfs -ls /user/micmiu/wordcount/out
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2014-01-22 16:38 /user/micmiu/wordcount/out/_SUCCESS
-rw-r--r--   3 hadoop supergroup         83 2014-01-22 16:38 /user/micmiu/wordcount/out/part-r-00000
[hadoop@Master hadoop]$ hdfs dfs -cat /user/micmiu/wordcount/out/part-r-00000
BigData     1
Hadoop      1
Hi          3
Michael     3
Spark       1
micmiu.com  3
more        3
see         3
to          3
welcome     3
```