Building a Hadoop 1.2.1 Cluster on Ubuntu 14 under VMware 10


I have been learning Hadoop recently, so I am writing down the process of building a Hadoop cluster for later reference. The walkthrough goes into a lot of small details and may feel a bit verbose, but that should make it all the more useful for newcomers. Without further ado, let's get to it.

Goal: a five-node Hadoop cluster.

1. Environment

Five Ubuntu virtual machines are created with VMware. The details of the environment are as follows:

Virtualization:    VMware Workstation 10
Operating system:  ubuntukylin-14.10-desktop-i386
JDK:               java-7-openjdk-i386
Hadoop:            hadoop-1.2.1

Hostname    IP address     VM name             Roles
master      192.168.1.30   Ubuntu32-Master     namenode, jobtracker
secondary   192.168.1.39   Ubuntu32-Secondary  secondarynamenode
slaver1     192.168.1.31   Ubuntu32-slaver1    datanode, tasktracker
slaver2     192.168.1.32   Ubuntu32-slaver2    datanode, tasktracker
slaver3     192.168.1.33   Ubuntu32-slaver3    datanode, tasktracker

2. Setting up the virtual machines

Download the 32-bit ubuntukylin-14.10-desktop-i386 ISO image; an ISO is the easiest thing to install in VMware.

Give each virtual machine one dual-core CPU, 1 GB of RAM and a 20 GB disk, and configure a shared folder (ShareFolder) so the Windows host can conveniently pass installation packages to the VMs.

Install Ubuntu using VMware's Easy Install and create a hadoop user; Hadoop, ZooKeeper and HBase will all be deployed later under this hadoop user.

You can fully install and configure a single machine first, treat master as the template, clone the remaining machines with VMware's clone feature, and then adjust each clone's IP address and hostname.

Create the user

First create the group:

sudo addgroup hadoop

Then create the user:

sudo adduser --ingroup hadoop hadoop

Update the package sources

First back up the sources file that ships with the system (we are logged in as the hadoop user, hence the sudo):

sudo cp /etc/apt/sources.list /etc/apt/sources.list.backup

Then edit the sources:

sudo vi /etc/apt/sources.list

Below is a set of source entries found online; paste them into vi. Note that they use the quantal (Ubuntu 12.10) release codename; for ubuntukylin 14.10 the matching codename is utopic, so adjust accordingly.

## Official Ubuntu update servers (located in Europe; the official source, slow from inside China but never out of sync; usable from China Telecom, China Mobile/Tietong and China Unicom networks):

deb http://archive.ubuntu.com/ubuntu/ quantal main restricted universe multiverse

deb http://archive.ubuntu.com/ubuntu/ quantal-security main restricted universe multiverse

deb http://archive.ubuntu.com/ubuntu/ quantal-updates main restricted universe multiverse

deb http://archive.ubuntu.com/ubuntu/ quantal-proposed main restricted universe multiverse

deb http://archive.ubuntu.com/ubuntu/ quantal-backports main restricted universe multiverse

deb-src http://archive.ubuntu.com/ubuntu/ quantal main restricted universe multiverse

deb-src http://archive.ubuntu.com/ubuntu/ quantal-security main restricted universe multiverse

deb-src http://archive.ubuntu.com/ubuntu/ quantal-updates main restricted universe multiverse

deb-src http://archive.ubuntu.com/ubuntu/ quantal-proposed main restricted universe multiverse

deb-src http://archive.ubuntu.com/ubuntu/ quantal-backports main restricted universe multiverse

## Additional software provided officially by Ubuntu (third-party closed-source software, etc.):

deb http://archive.canonical.com/ubuntu/ quantal partner

deb http://extras.ubuntu.com/ubuntu/ quantal main

## An Ubuntu mirror built and maintained by 骨头兄 (hosted in a China Telecom data center in Hangzhou, Zhejiang, on a 100 Mbit shared line); also carries Deepin and other images:

deb http://ubuntu.srt.cn/ubuntu/ quantal main restricted universe multiverse

deb http://ubuntu.srt.cn/ubuntu/ quantal-security main restricted universe multiverse

deb http://ubuntu.srt.cn/ubuntu/ quantal-updates main restricted universe multiverse

deb http://ubuntu.srt.cn/ubuntu/ quantal-proposed main restricted universe multiverse

deb http://ubuntu.srt.cn/ubuntu/ quantal-backports main restricted universe multiverse

deb-src http://ubuntu.srt.cn/ubuntu/ quantal main restricted universe multiverse

deb-src http://ubuntu.srt.cn/ubuntu/ quantal-security main restricted universe multiverse

deb-src http://ubuntu.srt.cn/ubuntu/ quantal-updates main restricted universe multiverse

deb-src http://ubuntu.srt.cn/ubuntu/ quantal-proposed main restricted universe multiverse

deb-src http://ubuntu.srt.cn/ubuntu/ quantal-backports main restricted universe multiverse

## Sohu update servers (gigabit link at Shandong Unicom; the official mainland-China mirror redirects here); also carries other open-source mirrors:

deb http://mirrors.sohu.com/ubuntu/ quantal main restricted universe multiverse

deb http://mirrors.sohu.com/ubuntu/ quantal-security main restricted universe multiverse

deb http://mirrors.sohu.com/ubuntu/ quantal-updates main restricted universe multiverse

deb http://mirrors.sohu.com/ubuntu/ quantal-proposed main restricted universe multiverse

deb http://mirrors.sohu.com/ubuntu/ quantal-backports main restricted universe multiverse

deb-src http://mirrors.sohu.com/ubuntu/ quantal main restricted universe multiverse

deb-src http://mirrors.sohu.com/ubuntu/ quantal-security main restricted universe multiverse

deb-src http://mirrors.sohu.com/ubuntu/ quantal-updates main restricted universe multiverse

deb-src http://mirrors.sohu.com/ubuntu/ quantal-proposed main restricted universe multiverse

deb-src http://mirrors.sohu.com/ubuntu/ quantal-backports main restricted universe multiverse

Run the update command so the system actually refreshes the package lists:

sudo apt-get update

Install vim

I never got used to plain vi, so install vim as a replacement:

sudo apt-get install vim

Configure the IP address

On Ubuntu, a static IP address is configured by editing /etc/network/interfaces:

sudo vim /etc/network/interfaces

Taking the master host as an example, change it to the following:

# The primary network interface

auto eth0

iface eth0 inet static

address 192.168.1.30

netmask 255.255.255.0

network 192.168.1.0

broadcast 192.168.1.255

gateway 192.168.1.1

# dns-* options are implemented by the resolvconf package, if installed

dns-nameservers 8.8.8.8
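
The static address only takes effect once the interface is brought up again; a reboot works too. A minimal sketch, assuming the interface is managed by the classic ifupdown tooling that /etc/network/interfaces belongs to (not NetworkManager):

# re-read /etc/network/interfaces for eth0
sudo ifdown eth0 && sudo ifup eth0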

Configure the hostname

On Ubuntu the hostname lives in /etc/hostname, while /etc/hosts maps hostnames to IP addresses.

First set the hostname:

sudo vim /etc/hostname

Set it to:

master

Then configure the hostname-to-IP mapping for all hosts:

sudo vim /etc/hosts

Configure it as follows (add every server in the cluster in one go, so you never have to touch it again):

127.0.0.1 localhost

192.168.1.30 master

192.168.1.31 slaver1

192.168.1.32 slaver2

192.168.1.33 slaver3

192.168.1.39 secondary

An entry in the hosts file has the format:

IP-address hostname [alias ...]   (zero or more aliases, separated by spaces)
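
For example, a line for master with one alias would look like this (the alias name here is purely illustrative):

192.168.1.30 master master.cluster   # "master.cluster" is a made-up alias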

Clone the system

Clone the fully configured Ubuntu VM into the remaining copies to build the small five-node cluster, then adjust the IP address and hostname on each clone, as summarized below.
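
A short per-clone checklist, reusing the commands shown above (values given for slaver1 as an example):

sudo vim /etc/network/interfaces   # change "address" to 192.168.1.31
sudo vim /etc/hostname             # change the name to slaver1
sudo reboot                        # apply the new IP and hostname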

3. Installing and configuring SSH

Install SSH

Install it with apt-get, the most convenient way:

sudo apt-get install openssh-server

Check whether the ssh service is running:

ps -ef | grep ssh

Output like the following means it is running:

hadoop 2147 2105 0 13:11 ? 00:00:00 /usr/bin/ssh-agent /usr/bin/dbus-launch --exit-with-session gnome-session --session=ubuntu

root 7226 1 0 23:31 ? 00:00:00 /usr/sbin/sshd -D

hadoop 7287 6436 0 23:33 pts/0 00:00:00 grep --color=auto ssh

SSH consists of a client and a server: the client is used to log in to other machines, while the server provides the ssh service that remote users log in to. Ubuntu installs the ssh client by default, which is why only the ssh server (openssh-server) has to be installed.
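
If sshd does not show up in the process list, it can be started by hand with the standard Ubuntu service command (shown only as a sketch):

sudo service ssh start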

Generate an RSA key pair

As the hadoop user, generate a key pair with ssh-keygen:

ssh-keygen -t rsa

If the command reports an error at this point (the original post showed a screenshot here), run it as ssh-keygen -t rsa -C "comment" instead; -C merely attaches a comment (for example your e-mail address) to the key.

You will be asked whether to protect the key with a passphrase; just leave it empty. If all goes well, a key pair (id_rsa and id_rsa.pub) appears in the hadoop user's .ssh directory: id_rsa is the private key, which stays on the server and must never leak, while id_rsa.pub is the public key, which is handed out to every server that should allow passwordless access.

Note: there must be no space between ssh and -keygen, and ssh-keygen -t rsa -P "" skips the passphrase prompt entirely.

Go into the .ssh directory and append the public key to the authorization file (authorized_keys), which stores the public keys of all servers:

cat id_rsa.pub >> authorized_keys

In authorized_keys each public key starts with ssh-rsa and ends with username@hostname; the keys of multiple servers are simply stored one after another, for example:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDs5A9sjk+44DtptGw4fXm5n0qbpnSnFsqRQJnbyD4DGMG7AOpfrEZrMmRiNJA8GZUIcrN71pHEgQimoQGD5CWyVgi1ctWFrULOnGksgixJj167m+FPdpcCFJwfAS34bD6DoVXJgyjWIDT5UFz+RnElNC14s8F0f/w44EYM49y2dmP8gGmzDQ0jfIgPSknUSGoL7fSFJ7PcnRrqWjQ7iq3B0gwyfCvWnq7OmzO8VKabUnzGYST/lXCaSBC5WD2Hvqep8C9+dZRukaa00g2GZVH3UqWO4ExSTefyUMjsal41YVARMGLEfyZzvcFQ8LR0MWhx2WMSkYp6Z6ARbdHZB4MN hadoop@master

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2Hb6mCi6sd6IczIn/pBbj8L9PMS1ac0tlalex/vlSRj2E6kzUrw/urEUVeO76zFcZgjUgKvoZsNAHGrr1Bfw8FiiDcxPtlIREl2L9Qg8Vd0ozgE22bpuxBTn1Yed/bbJ/VxGJsYbOyRB/mBCvEI4ECy/EEPf5CRMDgiTL9XP86MNJ/kgG3odR6hhSE3Ik/NMARTZySXE90cFB0ELr/Io4SaINy7b7m6ssaP16bO8aPbOmsyY2W2AT/+O726Py6tcxwhe2d9y2tnJiELfrMLUPCYGEx0Z/SvEqWhEvvoGn8qnpPJCGg6AxYaXy8jzSqWNZwP3EcFqmVrg9I5v8mvDd hadoop@slaver1

Distribute the public keys

Each server hands its public key to the others so that it can log in to them without a password. The simplest approach is to collect all public keys on one server and then distribute the merged file to everyone else; that way all five servers can freely log in to one another without passwords.

Distribution is done with scp, which requires the ssh service to be running on both ends; the very first scp connection still asks for a password.

On every server except master, copy the public key to master (each host runs only the line that matches its own name):

cd .ssh

scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver1

scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver2

scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver3

scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.secondary

On master, merge the collected keys:

cd .ssh

cat id_rsa.pub.slaver1 >> authorized_keys

cat id_rsa.pub.slaver2 >> authorized_keys

cat id_rsa.pub.slaver3 >> authorized_keys

cat id_rsa.pub.secondary >> authorized_keys

On master, distribute the merged file back to the other servers:

scp authorized_keys hadoop@slaver1:/home/hadoop/.ssh/authorized_keys

scp authorized_keys hadoop@slaver2:/home/hadoop/.ssh/authorized_keys

scp authorized_keys hadoop@slaver3:/home/hadoop/.ssh/authorized_keys

scp authorized_keys hadoop@secondary:/home/hadoop/.ssh/authorized_keys

Test passwordless access:

ssh slaver1
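
If ssh still asks for a password, the usual culprit is directory permissions: sshd ignores authorized_keys unless ~/.ssh and the file itself are private to the user. A small fix to run on every server if needed:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys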

4. Installing and configuring the JDK

Deploy the JDK

Extract the JDK package under /usr/lib/. (Note: the environment table and the JAVA_HOME below actually use the 32-bit OpenJDK package java-7-openjdk-i386 under /usr/lib/jvm/; if you use an Oracle JDK tarball such as jdk-7u51-linux-x64.tar.gz instead, point JAVA_HOME at the directory it extracts to and make sure its architecture matches the 32-bit guest.)

sudo tar -zxvf jdk-7u51-linux-x64.tar.gz -C /usr/lib/

Configure environment variables

Add the JDK to the global environment variables:

sudo vim /etc/profile

Append the following at the bottom:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export JRE_HOME=/usr/lib/jvm/java-7-openjdk-i386/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Note: on Linux the separator in these variables is a colon ':' (on Windows it is a semicolon ';'), and CLASSPATH must contain '.'.

Reload the environment variables with:

source /etc/profile
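
A quick check that the variables are picked up (the exact version string depends on the OpenJDK build installed):

echo $JAVA_HOME
java -version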

Distribute the JDK

Distribute the installed JDK with scp; copying a directory requires the -r flag:

scp -r /usr/lib/jvm/java-7-openjdk-i386 hadoop@slaver1:/usr/lib/jvm/

Distribute the environment variables

/etc/profile is owned by root, so the scp needs sudo and the file has to be sent to the root user on slaver1:

sudo scp /etc/profile root@slaver1:/etc/profile
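
The two scp commands above only show slaver1; the same files have to reach every other node. A small loop sketch using the hostnames from /etc/hosts (afterwards, run source /etc/profile, or log in again, on each node):

for h in slaver1 slaver2 slaver3 secondary; do
  # same commands as above, repeated per host
  scp -r /usr/lib/jvm/java-7-openjdk-i386 hadoop@$h:/usr/lib/jvm/
  sudo scp /etc/profile root@$h:/etc/profile
done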

5. Installing and configuring Hadoop

Deploy Hadoop

Extract hadoop-1.2.1.tar.gz to /home/hadoop/hadoop-1.2.1:

tar -zxvf hadoop-1.2.1.tar.gz -C /home/hadoop/

Configure environment variables

Add Hadoop to the global environment variables:

sudo vim /etc/profile

Append the following at the bottom (HADOOP_HOME must point at the directory Hadoop was extracted to):

export HADOOP_HOME=/home/hadoop/hadoop-1.2.1

export PATH=$PATH:$HADOOP_HOME/bin

Reload the environment variables:

source /etc/profile
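
Verify that the hadoop command is now on the PATH:

hadoop version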

conf/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386

export HADOOP_TASKTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_TASKTRACKER_OPTS"

export HADOOP_LOG_DIR=/home/hadoop/hadoop-1.2.1/logs

export HADOOP_MASTER=master:/home/hadoop/hadoop-1.2.1

export HADOOP_SLAVE_SLEEP=0.1

HADOOP_MASTER tells Hadoop which host:directory to rsync the configuration from; when Hadoop starts, the slaves sync their configuration from this master directory.

HADOOP_SLAVE_SLEEP sets the sleep time (in seconds) between the slaves' configuration-sync requests, so that many nodes requesting a sync at the same time do not overload the master.

conf/core-site.xml

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://master:9000</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/home/hadoop/hadoop_home/tmp</value>

</property>

<property>

<name>fs.trash.interval</name>

<value>10080</value>

<description>Number of minutes between trash checkpoints. If zero, the trash feature is disabled.</description>

</property>

<property>

<name>fs.checkpoint.period</name>

<value>600</value>

<description>The number of seconds between two periodic checkpoints.</description>

</property>

<property>

<name>fs.checkpoint.size</name>

<value>67108864</value>

<description>The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired.</description>

</property>

</configuration>

conf/hdfs-site.xml

<configuration>

<property>

<name>dfs.name.dir</name>

<value>/home/hadoop/hadoop_home/name1,/home/hadoop/hadoop_home/name2</value>

<description> </description>

</property>

<property>

<name>dfs.data.dir</name>

<value>/home/hadoop/hadoop_home/data1,/home/hadoop/hadoop_home/data2</value>

<description> </description>

</property>

<property>

<name>fs.checkpoint.dir</name>

<value>/home/hadoop/hadoop_home/namesecondary1,/home/hadoop/hadoop_home/namesecondary2</value>

</property>

<property>

<name>dfs.replication</name>

<value>3</value>

</property>

<property>

<name>dfs.http.address</name>

<value>master:50070</value>

</property>

<property>

<name>dfs.https.address</name>

<value>master:50470</value>

</property>

<property>

<name>dfs.secondary.http.address</name>

<value>secondary:50090</value>

</property>

<property>

<name>dfs.datanode.address</name>

<value>0.0.0.0:50010</value>

</property>

<property>

<name>dfs.datanode.ipc.address</name>

<value>0.0.0.0:50020</value>

</property>

<property>

<name>dfs.datanode.http.address</name>

<value>0.0.0.0:50075</value>

</property>

<property>

<name>dfs.datanode.https.address</name>

<value>0.0.0.0:50475</value>

</property>

</configuration>

conf/mapred-site.xml

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>master:9001</value>

</property>

<property>

<name>mapred.local.dir</name>

<value>/home/hadoop/hadoop_home/local</value>

</property>

<property>

<name>mapred.system.dir</name>

<value>/home/hadoop/hadoop_home/system</value>

</property>

<property>

<name>mapred.tasktracker.map.tasks.maximum</name>

<value>5</value>

</property>

<property>

<name>mapred.tasktracker.reduce.tasks.maximum</name>

<value>5</value>

</property>

<property>

<name>mapred.job.tracker.http.address</name>

<value>0.0.0.0:50030</value>

</property>

<property>

<name>mapred.task.tracker.http.address</name>

<value>0.0.0.0:50060</value>

</property>

</configuration>
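
The local directories referenced in the configs above (hadoop.tmp.dir, dfs.name.dir, dfs.data.dir, fs.checkpoint.dir, mapred.local.dir) can be created up front so the daemons do not stumble over missing parents or wrong ownership; a sketch to run as the hadoop user (strictly speaking, each node only needs the directories for its own role):

mkdir -p /home/hadoop/hadoop_home/{tmp,name1,name2,data1,data2,namesecondary1,namesecondary2,local}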

conf/masters

secondary

conf/masters holds the hostname of the secondarynamenode. In this setup the secondarynamenode runs on its own dedicated server; despite the file name, this file has nothing to do with the namenode.

conf/slaves

slaver1

slaver2

slaver3

Distribute the Hadoop installation

scp -r /home/hadoop/hadoop-1.2.1 hadoop@slaver1:/home/hadoop/

Distribute the environment variables

As before, /etc/profile is owned by root, so the scp needs sudo and the file is sent to the root user on slaver1:

sudo scp /etc/profile root@slaver1:/etc/profile
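
Again, the copy has to reach every node, not just slaver1; a loop sketch with the same commands:

for h in slaver1 slaver2 slaver3 secondary; do
  scp -r /home/hadoop/hadoop-1.2.1 hadoop@$h:/home/hadoop/
  sudo scp /etc/profile root@$h:/etc/profile
done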

6. Starting and testing Hadoop

Start the Hadoop cluster

The Hadoop start and stop commands are listed below:

Command                                          Purpose
start-all.sh                                     start the HDFS and MapReduce daemons: namenode, secondarynamenode, datanode, jobtracker, tasktracker
stop-all.sh                                      stop the HDFS and MapReduce daemons: namenode, secondarynamenode, datanode, jobtracker, tasktracker
start-dfs.sh                                     start the HDFS daemons: namenode, secondarynamenode, datanode
stop-dfs.sh                                      stop the HDFS daemons: namenode, secondarynamenode, datanode
start-mapred.sh                                  start the MapReduce daemons: jobtracker, tasktracker
stop-mapred.sh                                   stop the MapReduce daemons: jobtracker, tasktracker
hadoop-daemon.sh start|stop namenode             start/stop only the namenode (run on the namenode host)
hadoop-daemons.sh start|stop datanode            start/stop only the datanodes (on all hosts in conf/slaves)
hadoop-daemon.sh start|stop secondarynamenode    start/stop only the secondarynamenode (run on the secondary host)
hadoop-daemon.sh start|stop jobtracker           start/stop only the jobtracker (run on the jobtracker host)
hadoop-daemons.sh start|stop tasktracker         start/stop only the tasktrackers (on all hosts in conf/slaves)

Start the cluster
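
Before the very first start, HDFS has to be formatted once on master. This is a standard Hadoop 1.x step that is easy to forget; run it only once, since reformatting wipes the namenode metadata:

hadoop namenode -format

Then start everything from master: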

start-all.sh
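
A quick way to see which daemons actually came up is jps (shipped with the JDK) on each node: master should show NameNode and JobTracker, secondary should show SecondaryNameNode, and each slaver should show DataNode and TaskTracker.

jps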

When stopping Hadoop with stop-all.sh and then checking the logs, the datanodes always contain an error like the following:

2014-06-10 15:52:20,216 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/192.168.1.30:9000 failed on local exception: java.io.EOFException

at org.apache.hadoop.ipc.Client.wrapException(Client.java:1150)

at org.apache.hadoop.ipc.Client.call(Client.java:1118)

at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)

at com.sun.proxy.$Proxy5.sendHeartbeat(Unknown Source)

at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1031)

at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1588)

at java.lang.Thread.run(Thread.java:744)

Caused by: java.io.EOFException

at java.io.DataInputStream.readInt(DataInputStream.java:392)

at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:845)

at org.apache.hadoop.ipc.Client$Connection.run(Client.java:790)

The cause is that the datanodes are stopped after the namenode, so their connection to the namenode fails and the exception above is logged. Looking at the stop scripts, the order in stop-dfs.sh seems (in my opinion) slightly unfortunate: it stops the namenode first and the datanodes afterwards. Reordering it so that the namenode is stopped last should avoid the connection warning.

After the adjustment the relevant lines look like this:

"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR stop datanode

"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters stop secondarynamenode

"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop namenode

In my tests the datanode exception no longer appears after this change. I am not sure whether reordering the script has any other side effects on Hadoop, so corrections are welcome.

The HDFS web UI is available at http://master:50070.

The MapReduce web UI is available at http://master:50030.

Check HDFS from the command line:

hadoop dfsadmin -report

hadoop fsck /

Test the Hadoop cluster

Run the wordcount example that ships with Hadoop to check that the cluster works.

First create two small input files:

echo "Hello World Bye World" > text1.txt

echo "Hello Hadoop Goodbye Hadoop" > text2.txt

Upload the data files to HDFS:

hadoop fs -put text1.txt hdfs://master:9000/user/hadoop/input/text1.txt

hadoop fs -put text2.txt hdfs://master:9000/user/hadoop/input/text2.txt

Run the wordcount job:

hadoop jar hadoop-examples-1.2.1.jar wordcount input/text*.txt output-0

The job log looks like this:

14/06/12 01:55:21 INFO input.FileInputFormat: Total input paths to process : 2

14/06/12 01:55:21 INFO util.NativeCodeLoader: Loaded the native-hadoop library

14/06/12 01:55:21 WARN snappy.LoadSnappy: Snappy native library not loaded

14/06/12 01:55:21 INFO mapred.JobClient: Running job: job_201406111818_0001

14/06/12 01:55:22 INFO mapred.JobClient: map 0% reduce 0%

14/06/12 01:55:28 INFO mapred.JobClient: map 50% reduce 0%

14/06/12 01:55:30 INFO mapred.JobClient: map 100% reduce 0%

14/06/12 01:55:36 INFO mapred.JobClient: map 100% reduce 33%

14/06/12 01:55:37 INFO mapred.JobClient: map 100% reduce 100%

14/06/12 01:55:38 INFO mapred.JobClient: Job complete: job_201406111818_0001

14/06/12 01:55:38 INFO mapred.JobClient: Counters: 29

14/06/12 01:55:38 INFO mapred.JobClient: Job Counters

14/06/12 01:55:38 INFO mapred.JobClient: Launched reduce tasks=1

14/06/12 01:55:38 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=8281

14/06/12 01:55:38 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

14/06/12 01:55:38 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

14/06/12 01:55:38 INFO mapred.JobClient: Launched map tasks=2

14/06/12 01:55:38 INFO mapred.JobClient: Data-local map tasks=2

14/06/12 01:55:38 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8860

14/06/12 01:55:38 INFO mapred.JobClient: File Output Format Counters

14/06/12 01:55:38 INFO mapred.JobClient: Bytes Written=41

14/06/12 01:55:38 INFO mapred.JobClient: FileSystemCounters

14/06/12 01:55:38 INFO mapred.JobClient: FILE_BYTES_READ=79

14/06/12 01:55:38 INFO mapred.JobClient: HDFS_BYTES_READ=272

14/06/12 01:55:38 INFO mapred.JobClient: FILE_BYTES_WRITTEN=166999

14/06/12 01:55:38 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=41

14/06/12 01:55:38 INFO mapred.JobClient: File Input Format Counters

14/06/12 01:55:38 INFO mapred.JobClient: Bytes Read=50

14/06/12 01:55:38 INFO mapred.JobClient: Map-Reduce Framework

14/06/12 01:55:38 INFO mapred.JobClient: Map output materialized bytes=85

14/06/12 01:55:38 INFO mapred.JobClient: Map input records=2

14/06/12 01:55:38 INFO mapred.JobClient: Reduce shuffle bytes=85

14/06/12 01:55:38 INFO mapred.JobClient: Spilled Records=12

14/06/12 01:55:38 INFO mapred.JobClient: Map output bytes=82

14/06/12 01:55:38 INFO mapred.JobClient: Total committed heap usage (bytes)=336338944

14/06/12 01:55:38 INFO mapred.JobClient: CPU time spent (ms)=3010

14/06/12 01:55:38 INFO mapred.JobClient: Combine input records=8

14/06/12 01:55:38 INFO mapred.JobClient: SPLIT_RAW_BYTES=222

14/06/12 01:55:38 INFO mapred.JobClient: Reduce input records=6

14/06/12 01:55:38 INFO mapred.JobClient: Reduce input groups=5

14/06/12 01:55:38 INFO mapred.JobClient: Combine output records=6

14/06/12 01:55:38 INFO mapred.JobClient: Physical memory (bytes) snapshot=394276864

14/06/12 01:55:38 INFO mapred.JobClient: Reduce output records=5

14/06/12 01:55:38 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2918625280

14/06/12 01:55:38 INFO mapred.JobClient: Map output records=8

List the output directory; the presence of the _SUCCESS file means the job completed successfully, which confirms that the cluster is basically set up correctly:

hadoop fs -ls output-0

Found 3 items

-rw-r--r-- 3 hadoop supergroup 0 2014-06-12 01:55 /user/hadoop/output-0/_SUCCESS

drwxr-xr-x - hadoop supergroup 0 2014-06-12 01:55 /user/hadoop/output-0/_logs

-rw-r--r-- 3 hadoop supergroup 41 2014-06-12 01:55 /user/hadoop/output-0/part-r-00000

View the result:

hadoop fs -cat output-0/part-r-00000

Bye 1

Goodbye 1

Hadoop 2

Hello 2

World 2

This matches the expected counts for the test data.
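
MapReduce refuses to run a job whose output directory already exists, so to rerun wordcount either pick a fresh directory (output-1, ...) or delete the old one first (with fs.trash.interval set as above, the delete goes to the trash):

hadoop fs -rmr output-0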

