Building a Hadoop-1.2.1 Cluster on Ubuntu 14 under VMware 10
I've been learning Hadoop recently and am recording the cluster setup process here for later reference. The walkthrough covers a lot of small details and may be a bit verbose, but that should make it all the more useful for newcomers. Enough chatter; let's get to it.
Building a 5-node Hadoop cluster
1. Environment
Five Ubuntu virtual machines are created with VMware; the details are as follows:
Virtualization:    VMWare Workstation 10
Operating system:  ubuntukylin-14.10-desktop-i386
JDK:               java-7-openjdk-i386
Hadoop:            hadoop-1.2.1
Hostname   IP address     VM name              Roles
master     192.168.1.30   Ubuntu32-Master      namenode, jobtracker
secondary  192.168.1.39   Ubuntu32-Secondary   secondarynamenode
slaver1    192.168.1.31   Ubuntu32-slaver1     datanode, tasktracker
slaver2    192.168.1.32   Ubuntu32-slaver2     datanode, tasktracker
slaver3    192.168.1.33   Ubuntu32-slaver3     datanode, tasktracker
2. Setting up the virtual machines
Download the 32-bit ubuntukylin-14.10-desktop-i386 ISO, which is convenient to install under VMware.
Each VM gets one dual-core CPU, 1 GB RAM and a 20 GB disk; a shared folder named ShareFolder is configured so the Windows host can pass installation packages to the VMs.
Install Ubuntu with the Easy Install option and create a hadoop user; hadoop, zookeeper and hbase will all be deployed later under this user.
You can install a single machine first, using master as the template; once it is configured, use VMware's clone feature to copy out the other machines, then adjust their IPs and hostnames.
Creating the user
First create the group:
sudo addgroup hadoop
Then create the user in that group:
sudo adduser --ingroup hadoop hadoop
Updating the package sources
First back up the stock sources list (we are logged in as the hadoop user, hence sudo):
sudo cp /etc/apt/sources.list /etc/apt/sources.list.backup
Edit the sources list:
sudo vi /etc/apt/sources.list
Paste in a sources list found online. Note that the entries below use the quantal (12.10) release codename; for 14.10 you should substitute your own release's codename (utopic), or simply keep the stock sources.
##Official Ubuntu archive (Europe; official source, slow from mainland China but never out of sync; usable by Telecom, Mobile/Tietong and Unicom users):
deb http://archive.ubuntu.com/ubuntu/ quantal main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ quantal-security main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ quantal-updates main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ quantal-proposed main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ quantal-backports main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ quantal main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ quantal-security main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ quantal-updates main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ quantal-proposed main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ quantal-backports main restricted universe multiverse
##Other software provided by Ubuntu (third-party closed-source software, etc.):
deb http://archive.canonical.com/ubuntu/ quantal partner
deb http://extras.ubuntu.com/ubuntu/ quantal main
##Community mirror built and maintained by "Gutou" (hosted in a China Telecom data center in Hangzhou, Zhejiang, on 100 Mbps shared bandwidth), also carries Deepin and other mirrors:
deb http://ubuntu.srt.cn/ubuntu/ quantal main restricted universe multiverse
deb http://ubuntu.srt.cn/ubuntu/ quantal-security main restricted universe multiverse
deb http://ubuntu.srt.cn/ubuntu/ quantal-updates main restricted universe multiverse
deb http://ubuntu.srt.cn/ubuntu/ quantal-proposed main restricted universe multiverse
deb http://ubuntu.srt.cn/ubuntu/ quantal-backports main restricted universe multiverse
deb-src http://ubuntu.srt.cn/ubuntu/ quantal main restricted universe multiverse
deb-src http://ubuntu.srt.cn/ubuntu/ quantal-security main restricted universe multiverse
deb-src http://ubuntu.srt.cn/ubuntu/ quantal-updates main restricted universe multiverse
deb-src http://ubuntu.srt.cn/ubuntu/ quantal-proposed main restricted universe multiverse
deb-src http://ubuntu.srt.cn/ubuntu/ quantal-backports main restricted universe multiverse
##Sohu mirror (Shandong Unicom, gigabit uplink; the official mainland-China mirror redirects here), also carries other open-source mirrors:
deb http://mirrors.sohu.com/ubuntu/ quantal main restricted universe multiverse
deb http://mirrors.sohu.com/ubuntu/ quantal-security main restricted universe multiverse
deb http://mirrors.sohu.com/ubuntu/ quantal-updates main restricted universe multiverse
deb http://mirrors.sohu.com/ubuntu/ quantal-proposed main restricted universe multiverse
deb http://mirrors.sohu.com/ubuntu/ quantal-backports main restricted universe multiverse
deb-src http://mirrors.sohu.com/ubuntu/ quantal main restricted universe multiverse
deb-src http://mirrors.sohu.com/ubuntu/ quantal-security main restricted universe multiverse
deb-src http://mirrors.sohu.com/ubuntu/ quantal-updates main restricted universe multiverse
deb-src http://mirrors.sohu.com/ubuntu/ quantal-proposed main restricted universe multiverse
deb-src http://mirrors.sohu.com/ubuntu/ quantal-backports main restricted universe multiverse
Run the update command so the system picks up the new sources:
sudo apt-get update
Installing vim
I still can't get used to plain vi, so install vim as a replacement:
sudo apt-get install vim
Configuring the IP address
On Ubuntu the IP address can be changed by editing /etc/network/interfaces directly:
sudo vim /etc/network/interfaces
Taking the master host as an example, change it to the following:
# The primary network interface
auto eth0
iface eth0 inet static
address 192.168.1.30
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255
gateway 192.168.1.1
# dns-* options are implemented by the resolvconf package, if installed
dns-nameservers 8.8.8.8
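The new settings can be applied without a reboot; a small sketch, assuming the classic ifupdown tooling that Ubuntu 14.10 ships with (on a desktop install, NetworkManager may also need to be told to leave eth0 alone):

```shell
# Restart the eth0 interface so the static address takes effect
sudo ifdown eth0 && sudo ifup eth0
# Verify the new address
ip addr show eth0
```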
Configuring the hostname
On Ubuntu the hostname is stored in /etc/hostname, while /etc/hosts maps hostnames to IP addresses.
First set the hostname:
sudo vim /etc/hostname
Set its contents to:
master
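The new name only takes effect at the next boot; as a small convenience (not in the original walkthrough) it can also be applied to the running system immediately:

```shell
# Set the running hostname to match /etc/hostname
sudo hostname master
```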
Then configure the hostname-to-IP mapping for all machines:
sudo vim /etc/hosts
Set it to the following (list every server in the cluster in one go, so it never needs touching again):
127.0.0.1 localhost
192.168.1.30 master
192.168.1.31 slaver1
192.168.1.32 slaver2
192.168.1.33 slaver3
192.168.1.39 secondary
The format of an entry in the hosts file is:
ip-address hostname aliases (zero or more, separated by spaces)
Cloning the system
Clone the configured Ubuntu installation into multiple copies to build the small 5-machine cluster, then change the IP and hostname on each clone.
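The per-clone adjustments boil down to editing the two files above; a sketch for slaver1, using the hostnames and addresses from the table in section 1:

```shell
# Give the clone its own identity (master -> slaver1, .30 -> .31)
sudo sed -i 's/^master$/slaver1/' /etc/hostname
sudo sed -i 's/address 192.168.1.30/address 192.168.1.31/' /etc/network/interfaces
sudo reboot   # simplest way to apply both changes at once
```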
3. Installing and configuring SSH
Installing SSH
Install via apt-get, which is quick and painless:
sudo apt-get install openssh-server
Check whether the ssh service is running:
ps -ef | grep ssh
Output like the following means it is running:
hadoop 2147 2105 0 13:11 ? 00:00:00 /usr/bin/ssh-agent /usr/bin/dbus-launch --exit-with-session gnome-session --session=ubuntu
root 7226 1 0 23:31 ? 00:00:00 /usr/sbin/sshd -D
hadoop 7287 6436 0 23:33 pts/0 00:00:00 grep --color=auto ssh
SSH consists of a client and a server: the client logs in to other servers, while the server provides the ssh service for incoming remote logins. Ubuntu installs the ssh client by default, so only the server needs to be installed.
Generating an RSA key pair
As the hadoop user, generate a key pair with ssh-keygen:
ssh-keygen -t rsa
If ssh-keygen complains about the output file, specify it explicitly; the original note suggested ssh-keygen -t rsa -C "filename", but -C only sets a comment, and -f <file> is the option that actually names the key file.
You will be asked whether to protect the key with a passphrase; leave it empty. If all goes well, the key pair (id_rsa and id_rsa.pub) appears under hadoop's .ssh directory. id_rsa is the private key, which the server keeps to itself; id_rsa.pub is the public key, which is handed out to every server that should allow passwordless access.
Note: there is no space between ssh and -keygen. ssh-keygen -t rsa -P "" generates the key pair with an empty passphrase without prompting.
Enter the .ssh directory and append the public key to the authorization file (authorized_keys), which stores the public keys of all servers:
cat id_rsa.pub >> authorized_keys
In authorized_keys each public key starts with ssh-rsa and ends with user@host; the keys of multiple servers are stored one after another, for example:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDs5A9sjk+44DtptGw4fXm5n0qbpnSnFsqRQJnbyD4DGMG7AOpfrEZrMmRiNJA8GZUIcrN71pHEgQimoQGD5CWyVgi1ctWFrULOnGksgixJj167m+FPdpcCFJwfAS34bD6DoVXJgyjWIDT5UFz+RnElNC14s8F0f/w44EYM49y2dmP8gGmzDQ0jfIgPSknUSGoL7fSFJ7PcnRrqWjQ7iq3B0gwyfCvWnq7OmzO8VKabUnzGYST/lXCaSBC5WD2Hvqep8C9+dZRukaa00g2GZVH3UqWO4ExSTefyUMjsal41YVARMGLEfyZzvcFQ8LR0MWhx2WMSkYp6Z6ARbdHZB4MN hadoop@master
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2Hb6mCi6sd6IczIn/pBbj8L9PMS1ac0tlalex/vlSRj2E6kzUrw/urEUVeO76zFcZgjUgKvoZsNAHGrr1Bfw8FiiDcxPtlIREl2L9Qg8Vd0ozgE22bpuxBTn1Yed/bbJ/VxGJsYbOyRB/mBCvEI4ECy/EEPf5CRMDgiTL9XP86MNJ/kgG3odR6hhSE3Ik/NMARTZySXE90cFB0ELr/Io4SaINy7b7m6ssaP16bO8aPbOmsyY2W2AT/+O726Py6tcxwhe2d9y2tnJiELfrMLUPCYGEx0Z/SvEqWhEvvoGn8qnpPJCGg6AxYaXy8jzSqWNZwP3EcFqmVrg9I5v8mvDd hadoop@slaver1
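One pitfall worth guarding against (not covered in the original text): sshd refuses key-based login when ~/.ssh or authorized_keys is group- or world-writable, so it is worth pinning the permissions down on every machine:

```shell
# sshd requires these modes before it will honor authorized_keys
mkdir -p ~/.ssh
chmod 700 ~/.ssh
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```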
Distributing the public keys
Each server hands its public key to the others so that it can log in to them without a password. The scheme here is to gather all the public keys on one server and then distribute the combined file back out; that way all 5 servers can freely ssh to one another without passwords.
Distribution uses scp, which requires the ssh service to be running on both ends; the first scp to a host will ask for a password.
On every server except master, run the corresponding command to copy the public key over (each machine runs only its own line):
cd .ssh
scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver1
scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver2
scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver3
scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.secondary
On master, gather all the keys into the authorization file:
cd .ssh
cat id_rsa.pub.slaver1 >> authorized_keys
cat id_rsa.pub.slaver2 >> authorized_keys
cat id_rsa.pub.slaver3 >> authorized_keys
cat id_rsa.pub.secondary >> authorized_keys
Then, still on master, distribute the combined file:
scp authorized_keys hadoop@slaver1:/home/hadoop/.ssh/authorized_keys
scp authorized_keys hadoop@slaver2:/home/hadoop/.ssh/authorized_keys
scp authorized_keys hadoop@slaver3:/home/hadoop/.ssh/authorized_keys
scp authorized_keys hadoop@secondary:/home/hadoop/.ssh/authorized_keys
Test passwordless access with:
ssh slaver1
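To confirm the whole mesh at once, a quick loop over every host (hostnames as configured above) should print each machine's name without ever prompting for a password:

```shell
# Each iteration should print the remote hostname with no password prompt
for h in master secondary slaver1 slaver2 slaver3; do
    ssh "$h" hostname
done
```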
4. Installing and configuring the JDK
Deploying the JDK
Extract the JDK tarball under /usr/lib/ (writing there needs sudo):
sudo tar -zxvf jdk-7u51-linux-x64.tar.gz -C /usr/lib/
Note: this tarball is the 64-bit Oracle build, while the VMs here are 32-bit and the environment variables below point at the apt-installed OpenJDK at /usr/lib/jvm/java-7-openjdk-i386. Whichever JDK you use, make sure JAVA_HOME matches where it actually lives.
Configuring the environment variables
Add the JDK to the global environment:
sudo vim /etc/profile
Append the following at the bottom:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export JRE_HOME=/usr/lib/jvm/java-7-openjdk-i386/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
Note: on Linux the path separator in these variables is a colon ':', whereas Windows uses a semicolon ';'. CLASSPATH must contain '.'.
Reload the environment with:
source /etc/profile
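A quick check that the variables took effect:

```shell
# Should print the Java 7 version and the JAVA_HOME configured above
java -version
echo "$JAVA_HOME"
```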
Distributing the JDK
Distribute the installed JDK with scp; copying a directory requires the -r flag:
scp -r /usr/lib/jvm/java-7-openjdk-i386 hadoop@slaver1:/usr/lib/jvm/
Distributing the environment variables
/etc/profile is owned by root, so the scp needs sudo and must write to slaver1 as root:
sudo scp /etc/profile root@slaver1:/etc/profile
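Rather than repeating the two scp lines for each machine, a loop over the remaining four hosts (hostnames as configured earlier) keeps things tidy:

```shell
# Push the JDK and the profile to every other node
for h in secondary slaver1 slaver2 slaver3; do
    scp -r /usr/lib/jvm/java-7-openjdk-i386 "hadoop@$h:/usr/lib/jvm/"
    sudo scp /etc/profile "root@$h:/etc/profile"
done
```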
5. Installing and configuring Hadoop
Deploying Hadoop
Extract hadoop-1.2.1.tar.gz under /usr/local/ (sudo is needed to write there; afterwards hand the tree over to the hadoop user):
sudo tar -zxvf hadoop-1.2.1.tar.gz -C /usr/local/
sudo chown -R hadoop:hadoop /usr/local/hadoop-1.2.1
The environment variables and hadoop-env.sh below refer to /usr/local/hadoop, so link the extracted directory to that name:
sudo ln -s /usr/local/hadoop-1.2.1 /usr/local/hadoop
Configuring the environment variables
Add Hadoop to the global environment:
sudo vim /etc/profile
Append the following at the bottom:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
Reload the environment:
source /etc/profile
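A quick sanity check that the hadoop command is now on the PATH:

```shell
# Should print Hadoop 1.2.1 plus build information
hadoop version
```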
conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_TASKTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_TASKTRACKER_OPTS"
export HADOOP_LOG_DIR=/usr/local/hadoop/logs
export HADOOP_MASTER=master:/usr/local/hadoop
export HADOOP_SLAVE_SLEEP=0.1
HADOOP_MASTER names the host:directory from which hadoop rsyncs its configuration: when a node starts, it pulls the configuration from the master.
HADOOP_SLAVE_SLEEP=0.1 sets the delay (in seconds) between the slaves' sync requests, so that many nodes syncing at once don't overload the master.
conf/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop_home/tmp</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>10080</value>
<description>Number of minutes between trash checkpoints. If zero, the trash feature is disabled.</description>
</property>
<property>
<name>fs.checkpoint.period</name>
<value>600</value>
<description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
<description>The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired.</description>
</property>
</configuration>
conf/hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hadoop_home/name1,/home/hadoop/hadoop_home/name2</value>
<description> </description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hadoop_home/data1,/home/hadoop/hadoop_home/data2</value>
<description> </description>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/home/hadoop/hadoop_home/namesecondary1,/home/hadoop/hadoop_home/namesecondary2</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.http.address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.https.address</name>
<value>master:50470</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>secondary:50090</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50010</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50020</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50475</value>
</property>
</configuration>
conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/hadoop_home/local</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/hadoop/hadoop_home/system</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>5</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>5</value>
</property>
<property>
<name>mapred.job.tracker.http.address</name>
<value>0.0.0.0:50030</value>
</property>
<property>
<name>mapred.task.tracker.http.address</name>
<value>0.0.0.0:50060</value>
</property>
</configuration>
conf/masters
secondary
Despite its name, conf/masters holds the hostname of the secondarynamenode, not the namenode; in this setup the secondarynamenode has its own server, separate from the namenode.
conf/slaves
slaver1
slaver2
slaver3
Distributing the Hadoop tree
Copy the configured tree to every node; the destination must match HADOOP_HOME (/usr/local/hadoop), and the hadoop user needs write access under /usr/local on the targets:
scp -r /usr/local/hadoop hadoop@slaver1:/usr/local/
Distributing the environment variables
As before, /etc/profile is owned by root, so use sudo and write to the remote machine's root account:
sudo scp /etc/profile root@slaver1:/etc/profile
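One step the walkthrough skips: before the very first start, HDFS must be formatted on master (as the hadoop user), or the namenode will not come up. Run this once only; re-formatting wipes the HDFS metadata:

```shell
# One-time initialization of the namenode's storage directories
hadoop namenode -format
```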
6. Starting and testing Hadoop
Starting the Hadoop cluster
The hadoop start/stop commands are:
Command          Effect
start-all.sh     start the hdfs and mapreduce daemons: namenode, secondarynamenode, datanode, jobtracker, tasktracker
stop-all.sh      stop the hdfs and mapreduce daemons listed above
start-dfs.sh     start the hdfs daemons: namenode, secondarynamenode, datanode
stop-dfs.sh      stop the hdfs daemons
start-mapred.sh  start the mapreduce daemons: jobtracker, tasktracker
stop-mapred.sh   stop the mapreduce daemons
Individual daemons can also be started and stopped on their own. Note: hadoop-daemon.sh (singular) acts on the local node, while hadoop-daemons.sh (plural, the form in the original table) runs the subcommand on the slave nodes; for the master-side daemons the singular form is the one you want:
hadoop-daemon.sh start|stop namenode
hadoop-daemon.sh start|stop secondarynamenode
hadoop-daemon.sh start|stop jobtracker
hadoop-daemons.sh start|stop datanode
hadoop-daemons.sh start|stop tasktracker
Start the cluster:
start-all.sh
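With the cluster up, jps on each node should show the daemons expected from the role table in section 1: NameNode and JobTracker on master, SecondaryNameNode on secondary, DataNode and TaskTracker on each slaverN. For example:

```shell
# On master: expect NameNode and JobTracker (plus Jps itself)
jps
# On a slave: expect DataNode and TaskTracker
ssh slaver1 jps
```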
When stopping hadoop with stop-all.sh and checking the logs afterwards, the datanodes always show an error like this:
2014-06-10 15:52:20,216 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/192.168.1.30:9000 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1150)
at org.apache.hadoop.ipc.Client.call(Client.java:1118)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy5.sendHeartbeat(Unknown Source)
at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1031)
at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1588)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:845)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:790)
The cause is that the datanodes stop after the namenode, so their connection to it fails and the exception above is logged. Looking into the stop scripts, stop-dfs.sh stops the namenode first and the datanodes afterwards, which seems (to me) the wrong way round; reordering it so that the namenode stops last should avoid the connection warning.
After the adjustment the script reads:
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR stop datanode
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters stop secondarynamenode
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop namenode
After this change, the datanode exception no longer appeared in testing. I'm not sure whether reordering the script has any other effect on hadoop; corrections are welcome.
View the HDFS web UI at http://master:50070.
View the MapReduce web UI at http://master:50030.
Check HDFS from the command line:
hadoop dfsadmin -report
hadoop fsck /
Testing the cluster
Run the bundled wordcount example to verify that the cluster works.
First create two input files:
echo "Hello World Bye World" > text1.txt
echo "Hello Hadoop Goodbye Hadoop" > text2.txt
Upload them to hdfs:
hadoop fs -put text1.txt hdfs://master:9000/user/hadoop/input/text1.txt
hadoop fs -put text2.txt hdfs://master:9000/user/hadoop/input/text2.txt
Run wordcount on the input directory (the original used the glob input/file*.txt, which would not match the files uploaded above):
hadoop jar hadoop-examples-1.2.1.jar wordcount input output-0
The job log:
14/06/12 01:55:21 INFO input.FileInputFormat: Total input paths to process : 2
14/06/12 01:55:21 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/06/12 01:55:21 WARN snappy.LoadSnappy: Snappy native library not loaded
14/06/12 01:55:21 INFO mapred.JobClient: Running job: job_201406111818_0001
14/06/12 01:55:22 INFO mapred.JobClient: map 0% reduce 0%
14/06/12 01:55:28 INFO mapred.JobClient: map 50% reduce 0%
14/06/12 01:55:30 INFO mapred.JobClient: map 100% reduce 0%
14/06/12 01:55:36 INFO mapred.JobClient: map 100% reduce 33%
14/06/12 01:55:37 INFO mapred.JobClient: map 100% reduce 100%
14/06/12 01:55:38 INFO mapred.JobClient: Job complete: job_201406111818_0001
14/06/12 01:55:38 INFO mapred.JobClient: Counters: 29
14/06/12 01:55:38 INFO mapred.JobClient: Job Counters
14/06/12 01:55:38 INFO mapred.JobClient: Launched reduce tasks=1
14/06/12 01:55:38 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=8281
14/06/12 01:55:38 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/06/12 01:55:38 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/06/12 01:55:38 INFO mapred.JobClient: Launched map tasks=2
14/06/12 01:55:38 INFO mapred.JobClient: Data-local map tasks=2
14/06/12 01:55:38 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8860
14/06/12 01:55:38 INFO mapred.JobClient: File Output Format Counters
14/06/12 01:55:38 INFO mapred.JobClient: Bytes Written=41
14/06/12 01:55:38 INFO mapred.JobClient: FileSystemCounters
14/06/12 01:55:38 INFO mapred.JobClient: FILE_BYTES_READ=79
14/06/12 01:55:38 INFO mapred.JobClient: HDFS_BYTES_READ=272
14/06/12 01:55:38 INFO mapred.JobClient: FILE_BYTES_WRITTEN=166999
14/06/12 01:55:38 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=41
14/06/12 01:55:38 INFO mapred.JobClient: File Input Format Counters
14/06/12 01:55:38 INFO mapred.JobClient: Bytes Read=50
14/06/12 01:55:38 INFO mapred.JobClient: Map-Reduce Framework
14/06/12 01:55:38 INFO mapred.JobClient: Map output materialized bytes=85
14/06/12 01:55:38 INFO mapred.JobClient: Map input records=2
14/06/12 01:55:38 INFO mapred.JobClient: Reduce shuffle bytes=85
14/06/12 01:55:38 INFO mapred.JobClient: Spilled Records=12
14/06/12 01:55:38 INFO mapred.JobClient: Map output bytes=82
14/06/12 01:55:38 INFO mapred.JobClient: Total committed heap usage (bytes)=336338944
14/06/12 01:55:38 INFO mapred.JobClient: CPU time spent (ms)=3010
14/06/12 01:55:38 INFO mapred.JobClient: Combine input records=8
14/06/12 01:55:38 INFO mapred.JobClient: SPLIT_RAW_BYTES=222
14/06/12 01:55:38 INFO mapred.JobClient: Reduce input records=6
14/06/12 01:55:38 INFO mapred.JobClient: Reduce input groups=5
14/06/12 01:55:38 INFO mapred.JobClient: Combine output records=6
14/06/12 01:55:38 INFO mapred.JobClient: Physical memory (bytes) snapshot=394276864
14/06/12 01:55:38 INFO mapred.JobClient: Reduce output records=5
14/06/12 01:55:38 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2918625280
14/06/12 01:55:38 INFO mapred.JobClient: Map output records=8
Listing the output directory shows a _SUCCESS file, meaning the job completed and the cluster setup is basically sound:
hadoop fs -ls output-0
Found 3 items
-rw-r--r-- 3 hadoop supergroup 0 2014-06-12 01:55 /user/hadoop/output-0/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-06-12 01:55 /user/hadoop/output-0/_logs
-rw-r--r-- 3 hadoop supergroup 41 2014-06-12 01:55 /user/hadoop/output-0/part-r-00000
View the result:
hadoop fs -cat output-0/part-r-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
This matches the expected counts for the input data.