Deploying a Single-Node Pseudo-Distributed Hadoop Cluster on RedHat 5.4


2010-06-0

I. Required Software Versions
RedHat Enterprise Linux 5.4 (kernel 2.6.18); JDK 1.6 update 16 (.rpm.bin for Linux); Hadoop 0.20.2; VMware Workstation 7.0.1 build-227600 as the virtual machine.
II. Setting Up RedHat
RPM source: http://rpm.pbone.net/ (note: a fairly complete site; the RPM packages are well collected and easy to download)
1. Fixing garbled Chinese text
(1) First find the required RPM packages:
Chinese language support:
fonts-chinese-3.02-12.el5.noarch.rpm
m17n-db-common-cjk-1.3.3-46.el5.noarch.rpm
m17n-db-chinese-1.3.3-46.el5.noarch.rpm
(Note: once these packages are installed, Chinese text displays correctly.)
Then edit /etc/sysconfig/i18n:
LANG="zh_CN.GB18030"
SUPPORTED="zh_CN.GB18030:zh_CN:zh:zh_TW.Big5:zh_TW:zh:en_US.iso885915:en_US:en"
SYSFONT="lat0-sun16"
SYSFONTACM="iso15"
Then reboot the system.
// The packages below add a Chinese input method; the configuration above is already enough to fix garbled Chinese in the system.
Chinese input method:
scim-libs-1.4.4-39.el5.i386.rpm
scim-1.4.4-39.el5.i386.rpm
scim-chinese-standard-0.0.2-1.el5.i386.rpm
scim-tables-0.5.6-7.i386.rpm
scim-tables-chinese-0.5.6-7.i386.rpm
scim-pinyin-0.5.91-15.el5.i386.rpm
(2) Mount the CD-ROM:
1) Create a directory under /mnt to use as the mount point: mkdir /mnt/cdrom (the new directory is named cdrom).
2) Mount the CD-ROM: mount -t auto /dev/cdrom /mnt/cdrom
Output: mount: block device /dev/cdrom is write-protected, mounting read-only -- the mount succeeded.
(3) The Chinese packages are in the Server directory on the CD; install them in the order listed above.
Installation commands:
rpm -ivh <path-to-rpm> // install from the given path and show progress
rpm -ivh --test <path-to-rpm> // only check dependencies; does not actually install
2. Changing the update sources
RHEL 5 already comes with yum installed, but unless you bought RHEL 5, updating from the official site prompts you to register, so simply point yum's update sources at open mirrors instead. The files that control yum's update sources live in /etc/yum.repos.d/. First turn the existing files into backups by appending .bak:
[root@killgoogle ~]# mv /etc/yum.repos.d/rhel-debuginfo.repo /etc/yum.repos.d/rhel-debuginfo.repo.bak
[root@killgoogle ~]# mv /etc/yum.repos.d/rpmforge.repo.rpmnew /etc/yum.repos.d/rpmforge.repo.rpmnew.bak
Create the new configuration files:
[root@killgoogle ~]# cd /etc/yum.repos.d
[root@killgoogle ~]# touch rhel-debuginfo.repo
[root@killgoogle ~]# touch mirrors-rpmforge
[root@killgoogle ~]# touch rpmforge.repo
Write the following into the new configuration files:
[root@killgoogle ~]# vi rhel-debuginfo.repo
[base]
name=CentOS-5 - Base
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
baseurl=http://ftp.sjtu.edu.cn/centos/5/os/$basearch/
gpgcheck=1
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos5
#released updates
[update]
name=CentOS-5 - Updates
#mirrorlist=http://mirrorlist.centos.org/?release=4&arch=$basearch&repo=updates
baseurl=http://ftp.sjtu.edu.cn/centos/5/updates/$basearch/
gpgcheck=1
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos5
#packages used/produced in the build but not released
[addons]
name=CentOS-5 - Addons
#mirrorlist=http://mirrorlist.centos.org/?release=4&arch=$basearch&repo=addons
baseurl=http://ftp.sjtu.edu.cn/centos/5/addons/$basearch/
gpgcheck=1
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos5
#additional packages that may be useful
[extras]
name=CentOS-5 - Extras
#mirrorlist=http://mirrorlist.centos.org/?release=4&arch=$basearch&repo=extras
baseurl=http://ftp.sjtu.edu.cn/centos/5/extras/$basearch/
gpgcheck=1
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos5

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-5 - Plus
#mirrorlist=http://mirrorlist.centos.org/?release=4&arch=$basearch&repo=centosplus
baseurl=http://ftp.sjtu.edu.cn/centos/5/centosplus/$basearch/
gpgcheck=1
enabled=0
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos5
#contrib - packages by Centos Users
[contrib]
name=CentOS-5 - Contrib
#mirrorlist=http://mirrorlist.centos.org/?release=4&arch=$basearch&repo=contrib
baseurl=http://ftp.sjtu.edu.cn/centos/5/contrib/$basearch/
gpgcheck=1
enabled=0
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos5
# vi dag.repo
[dag]
name=Dag RPM Repository for RHEL5
baseurl=http://ftp.riken.jp/Linux/dag/redhat/el5/en/$basearch/dag/
enabled=1
gpgcheck=1
gpgkey=http://ftp.riken.jp/Linux/dag/packages/RPM-GPG-KEY.dag.txt
Edit the second configuration file:
[root@killgoogle ~]# vi mirrors-rpmforge
http://apt.sw.be/redhat/el5/en/$ARCH/dag
http://archive.cs.uu.nl/mirror/dag.wieers/redhat/el5/en/$ARCH/dag
http://ftp2.lcpe.uni-sofia.bg/freshrpms/pub/dag/redhat/el5/en/$ARCH/dag
#http://ftp.heanet.ie/pub/freshrpms/pub/dag/redhat/el5/en/$ARCH/dag
http://ftp-stud.fht-esslingen.de/dag/redhat/el5/en/$ARCH/dag
http://mirror.cpsc.ucalgary.ca/mirror/dag/redhat/el5/en/$ARCH/dag
http://mirrors.ircam.fr/pub/dag/redhat/el5/en/$ARCH/dag
http://rh-mirror.linux.iastate.edu/pub/dag/redhat/el5/en/$ARCH/dag
http://rpmfind.net/linux/dag/redhat/el5/en/$ARCH/dag
http://wftp.tu-chemnitz.de/pub/linux/dag/redhat/el5/en/$ARCH/dag
http://www.mirrorservice.org/sites/apt.sw.be/redhat/el5/en/$ARCH/dag
Edit the third configuration file:
[root@killgoogle ~]# vi rpmforge.repo
# Name: RPMforge RPM Repository for Red Hat Enterprise 5 - dag
# URL: http://rpmforge.net/
[rpmforge]
name = Red Hat Enterprise $releasever - RPMforge.net - dag
#baseurl = http://apt.sw.be/redhat/el5/en/$basearch/dag
mirrorlist = http://apt.sw.be/redhat/el5/en/mirrors-rpmforge
#mirrorlist = file:///etc/yum.repos.d/mirrors-rpmforge
enabled = 1
protect = 0
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag
gpgcheck = 1
Note: also open /etc/yum.conf with vi and add the line timeout=120; if you are using a CentOS-Base.repo file, change every 5.0 in it to 5.2 as well.
Raising the timeout helps when your network is slow, so yum does not keep aborting on timeouts:
[root@killgoogle ~]# vi /etc/yum.conf
Add this line: timeout=120
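The line belongs in the [main] section of /etc/yum.conf; a minimal sketch of the result (existing settings omitted):
[main]
timeout=120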
At this point the configuration is mostly done. One more package still needs to be installed: rpmforge-release-0.3.6-1.el5.rf.i386.rpm
If it is not installed, you may hit the following error: GPG key retrieval failed: [Errno 5] OSError: [Errno 2] No such file or directory: '/etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag'
Find the package at http://rpmfind.net/linux/RPM/, then run:
[root@killgoogle ~]# rpm -ivh rpmforge-release-0.3.6-1.el5.rf.i386.rpm
# The download URL I used:
#wget ftp://rpmfind.net/linux/dag/redhat/el5/en/i386/dag/RPMS/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
Next, import the GPG key:
[root@killgoogle ~]# rpm --import http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-5
With that, yum is basically usable. If you want, it can be tuned further:
Speed up yum:
[root@killgoogle ~]# yum install yum-fastestmirror yum-presto
Pin or trim the mirrors for a repository:
You can remove the slower mirrors from the baseurl entries.
yum records its mirror speed tests in:
/var/cache/yum/timedhosts.txt
Fixing the yum "Existing lock" error
If yum reports "Existing lock /var/run/yum.pid: another copy is running as pid 3380. Aborting." at system startup, it can be resolved as follows:
[root@killgoogle ~]# /etc/init.d/yum-updatesd stop
Alternatively:
[root@killgoogle ~]# rm -f /var/run/yum.pid
The root cause is simply that yum's automatic updater is running; stopping it is enough.
Done. Now test that yum actually works:
[root@killgoogle ~]# yum install mplayer
That installs mplayer; to remove it again:
[root@killgoogle ~]# yum remove mplayer
III. Installing the JDK
1. The JDK version used is JDK 1.6 update 16. It can be downloaded from http://ajava.org/tool/; for some reason the download from Sun's official site did not work.
2. Installation steps
a. Download the latest J2SE package: jdk-6u16-linux-x64-rpm.bin or jdk-6u16-linux-i586-rpm.bin.
b. Copy the installer (jdk-6u16-linux-x64-rpm.bin or jdk-6u16-linux-i586-rpm.bin) to any directory on the RedHat system, for example /opt/jdk (create the jdk directory by hand).
c. Run chmod +x on jdk-6u16-linux-x64-rpm.bin or jdk-6u16-linux-i586-rpm.bin to make the installer executable.
d. Run ./jdk-6u16-linux-x64-rpm.bin or ./jdk-6u16-linux-i586-rpm.bin. The JDK license agreement is displayed; you can page through it with Enter, or press Ctrl+C to jump straight to the prompt "Do you agree to the above license terms? [yes or no]".
e. Type yes to accept the license. The installer then unpacks jdk-6u16-linux-x64-rpm.bin or jdk-6u16-linux-i586-rpm.bin.
f. After unpacking, go back to /opt/jdk and list its contents with ls: you will find a newly extracted installer, jdk-6u16-linux-x64.rpm or jdk-6u16-linux-i586.rpm.
g. Run rpm -ivh jdk-6u16-linux-x64.rpm or jdk-6u16-linux-i586.rpm. The system installs the JDK; when it finishes, a new folder named java appears under /usr, which is the installed JDK directory.
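Taken together, steps b through g look roughly like this for the i586 installer (a sketch; where the .bin file was copied from is an assumption, and the license prompts are abbreviated):
[root@localhost ~]# mkdir -p /opt/jdk
[root@localhost ~]# cp jdk-6u16-linux-i586-rpm.bin /opt/jdk/
[root@localhost ~]# cd /opt/jdk
[root@localhost jdk]# chmod +x jdk-6u16-linux-i586-rpm.bin
[root@localhost jdk]# ./jdk-6u16-linux-i586-rpm.bin    // page through the license, then type yes
[root@localhost jdk]# rpm -ivh jdk-6u16-linux-i586.rpm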
3. Setting environment variables
a. Open the file /etc/profile: vi /etc/profile [note: profile is a file, not a directory]
b. Add the following lines just above the last statement in /etc/profile:
JAVA_HOME=/usr/java/jdk1.6.0_16
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME PATH CLASSPATH
[Note: there must be no spaces on either side of the "=".]
c. Save the file. After restarting RedHat, log in to a terminal and type java -version; if you see something like java version "1.6.0_16", the JDK has been installed successfully. Congratulations!
IV. Related Settings
1. User setup
Method 1
[root@localhost ~]# useradd hadoop // add the user hadoop with useradd <username>
[root@localhost ~]# passwd hadoop // set and unlock the user's password with passwd <username>
Changing password for user hadoop.
New UNIX password: // type the password, then retype it to confirm; it is not echoed
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
[root@localhost ~]# groupadd hadoop // add the group hadoop with groupadd <groupname>
groupadd: group hadoop already exists
[root@localhost ~]# gpasswd -a hadoop hadoop // add user hadoop to group hadoop with gpasswd -a <username> <groupname>
Adding user hadoop to group hadoop
Method 2
Add the user and group through the graphical administration tools; this is straightforward and not covered further.
2. User permissions
(The following configuration comes from a set of Hadoop study notes; it works around the "XX is not in the sudoers file" error when that appears. I have not tested whether it is necessary on RedHat, its purpose is not entirely clear, and the security implications deserve consideration.)
[root@localhost ~]# chmod u+w /etc/sudoers // add write permission to the file
[root@localhost ~]# vi /etc/sudoers // edit the /etc/sudoers configuration
After the line root ALL=(ALL) ALL, add hadoop ALL=(ALL) ALL so that the hadoop user can run commands as root via sudo.
[root@localhost ~]# chmod u-w /etc/sudoers // remove the write permission again
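For reference, the relevant part of /etc/sudoers then looks like this (only the hadoop line is new; the spacing follows the stock file):
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL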
V. SSH Configuration
1. Installing OpenSSH
RedHat 5.4 ships with OpenSSH; run rpm -qa | grep openssh to check whether it is installed.
If it is not, download the RPM packages from the OpenSSH home page, http://www.openssh.com, and install them yourself. The main packages are openssh-4.3p2-16.el5.i386, openssh-server-4.3p2-16.el5.i386, openssh-clients-4.3p2-16.el5.i386 and openssh-askpass-4.3p2-16.el5.i386.
After installation, start sshd with service sshd start. You can also run netstat -nutap | grep sshd; ps -ef | grep sshd to check that the process and listening port look normal.
Once that is done, test with ssh localhost.
An actual run looks like this:
[hadoop@localhost ~]# rpm -qa | grep openssh
openssh-askpass-4.3p2-36.el5
openssh-4.3p2-36.el5
openssh-server-4.3p2-36.el5
openssh-clients-4.3p2-36.el5
[hadoop@localhost ~]# service sshd start
Starting sshd: [ OK ]
[hadoop@localhost ~]# netstat -nutap |grep sshd; ps -ef |grep sshd
tcp 0 0 :::22 :::* LISTEN 3127/sshd
root 3127 1 0 22:19 ? 00:00:00 /usr/sbin/sshd
root 10075 9946 0 22:53 pts/0 00:00:00 grep sshd
[hadoop@localhost ~]# ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 62:b5:1f:37:49:95:63:da:9e:f3:88:ea:9b:a1:82:35.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
hadoop@localhost's password:
Last login: Wed Apr 7 22:50:12 2010
[hadoop@localhost ~]#
2. Configuring OpenSSH
To avoid having to type a password every time ssh, scp, or sftp connects to a remote host, you can generate an authorized key pair for each user. Generating an RSA key pair for SSH protocol version 2 goes as follows:
[hadoop@localhost ~]$ ssh-keygen -t rsa -P "" // generate an RSA key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
15:9c:92:a7:93:f0:2a:ff:08:a1:36:5f:b5:14:7d:84 hadoop@localhost.localdomain
[hadoop@localhost ~]$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys // append the public key to authorized_keys
[hadoop@localhost ~]$ chmod 755 ~/.ssh // fix permissions
[hadoop@localhost ~]$ chmod 644 ~/.ssh/authorized_keys
[hadoop@localhost ~]$ sudo /etc/init.d/sshd reload // reload sshd so the settings above take effect
We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:
#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.
Password:
Reloading sshd: [ OK ]
[hadoop@localhost ~]$ ssh localhost // first test
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 62:b5:1f:37:49:95:63:da:9e:f3:88:ea:9b:a1:82:35.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts. // the host key is recorded permanently; from now on no password is needed
Last login: Wed Apr 7 23:09:45 2010 from localhost.localdomain
[hadoop@localhost ~]$ exit
Connection to localhost closed. // log out and test again; unlike the first time, no password is required
[hadoop@localhost ~]$ ssh localhost
Last login: Wed Apr 7 23:14:21 2010 from localhost.localdomain
VI. Installing and Configuring Hadoop
1. Installing Hadoop
We use the latest Hadoop release, Hadoop 0.20.2, which can be downloaded from the Apache Foundation's site at http://www.apache.org/dyn/closer.cgi/hadoop/core. Extract it with tar (or by other means) into /home/hadoop; after extraction you get a hadoop-0.20.2 folder.
If it was not extracted into the hadoop user's home directory, change the owner of hadoop-0.20.2 to hadoop with: chown <username>:<groupname> <files>
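A minimal sketch of the extraction and ownership change (the tarball name and its location are assumptions):
[hadoop@localhost ~]$ tar -zxvf hadoop-0.20.2.tar.gz -C /home/hadoop
[root@localhost ~]# chown -R hadoop:hadoop /home/hadoop/hadoop-0.20.2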
2. Configuring Hadoop
(1) Configure $HADOOP_HOME/conf/hadoop-env.sh
Go to the Hadoop installation path, find conf/hadoop-env.sh under hadoop-0.20.2, and open it with vi or a graphical editor.

Change the line # export JAVA_HOME=/usr/lib/j2sdk1.5-sun to:
export JAVA_HOME=/usr/java/jdk1.6.0_16
(2) Configure $HADOOP_HOME/conf/core-site.xml
Go to the Hadoop installation path, find conf/core-site.xml under hadoop-0.20.2, and open it with vi or a graphical editor.
The contents are as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
Note: if the hadoop.tmp.dir parameter is not set, the default temporary directory is /tmp/hadoop-hadoop, which is wiped on every reboot; you would then have to re-run the format step each time or errors will follow.
(3) Configure $HADOOP_HOME/conf/hdfs-site.xml
Go to the Hadoop installation path, find conf/hdfs-site.xml under hadoop-0.20.2, and open it with vi or a graphical editor.
The contents are as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/name</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
(4) Configure $HADOOP_HOME/conf/mapred-site.xml
Go to the Hadoop installation path, find conf/mapred-site.xml under hadoop-0.20.2, and open it with vi or a graphical editor.
The contents are as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/temp</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>40</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>
</configuration>
VII. Testing
1. Format the namenode
From the hadoop-0.20.2 directory, run ./bin/hadoop namenode -format to format the namenode.
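For example (assuming Hadoop was extracted into the hadoop user's home directory, as described above):
[hadoop@localhost ~]$ cd ~/hadoop-0.20.2
[hadoop@localhost hadoop-0.20.2]$ ./bin/hadoop namenode -format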
2. Starting and stopping Hadoop
From the hadoop-0.20.2 directory, run ./bin/start-all.sh to start Hadoop:
[hadoop@localhost hadoop-0.20.2]$ ./bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-localhost.localdomain.out
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-localhost.localdomain.out
localhost: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-localhost.localdomain.out
Likewise, run ./bin/stop-all.sh from the hadoop-0.20.2 directory to stop Hadoop:
[hadoop@localhost hadoop-0.20.2]$ ./bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
3. Verification
First start Hadoop.
Method 1: jps
[hadoop@localhost hadoop]$ jps // this run shows the full set of daemons; long-term stability still needs testing
8383 TaskTracker
7943 NameNode
8049 DataNode
8271 JobTracker
8191 SecondaryNameNode
8434 Jps
Method 2:
[hadoop@localhost hadoop]$ ./bin/hadoop dfsadmin -report
Configured Capacity: 14644453376 (13.64 GB)
Present Capacity: 13557112832 (12.63 GB)
DFS Remaining: 13557035008 (12.63 GB)
DFS Used: 77824 (76 KB)
DFS Used%: 0%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 14644453376 (13.64 GB)
DFS Used: 77824 (76 KB)
Non DFS Used: 1087340544 (1.01 GB)
DFS Remaining: 13557035008(12.63 GB)
DFS Used%: 0%
DFS Remaining%: 92.57%
Last contact: Wed Apr 07 07:17:51 PDT 2010
This all looks basically normal; you can also inspect the logs under $HADOOP_HOME/logs.
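For example, to look at the namenode log (assuming you are in the installation directory; the file naming follows the startup output above, with .log files holding the detailed log and .out files the daemon's stdout):
[hadoop@localhost hadoop-0.20.2]$ ls logs/
[hadoop@localhost hadoop-0.20.2]$ tail -n 50 logs/hadoop-hadoop-namenode-localhost.localdomain.log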
VIII. Running WordCount
1. Preparation
Prepare two text files and copy them into the DFS; the commands are as follows:
[hadoop@localhost hadoop-0.20.2]$ echo "hello hadoop woeld." > /tmp/test_file1.txt
[hadoop@localhost hadoop-0.20.2]$ echo "hello world hadoop.i'm wangzhenfei." > /tmp/test_file2.txt
[hadoop@localhost hadoop-0.20.2]$ ./bin/hadoop dfs -mkdir test-in
[hadoop@localhost hadoop-0.20.2]$ ./bin/hadoop dfs -copyFromLocal /tmp/test*.txt test-in
[hadoop@localhost hadoop-0.20.2]$ ./bin/hadoop dfs -ls test-in
Found 2 items
-rw-r--r-- 2 hadoop supergroup 20 2010-04-08 08:03 /user/hadoop/test-in/test_file1.txt
-rw-r--r-- 2 hadoop supergroup 36 2010-04-08 08:03 /user/hadoop/test-in/test_file2.txt
Here test-in is actually a directory in HDFS; its absolute path is "hdfs://localhost:54310/user/hadoop/test-in".
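The same listing can therefore be done with the absolute URI (a quick check, assuming the fs.default.name configured above):
[hadoop@localhost hadoop-0.20.2]$ ./bin/hadoop dfs -ls hdfs://localhost:54310/user/hadoop/test-in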
2. Run the job
[hadoop@localhost hadoop-0.20.2]$ ./bin/hadoop jar hadoop-0.20.2-examples.jar wordcount test-in test-out
10/04/08 08:05:26 INFO input.FileInputFormat: Total input paths to process : 2
10/04/08 08:05:28 INFO mapred.JobClient: Running job: job_201004080756_0001
10/04/08 08:05:29 INFO mapred.JobClient: map 0% reduce 0%
10/04/08 08:06:06 INFO mapred.JobClient: map 100% reduce 0%
10/04/08 08:06:18 INFO mapred.JobClient: map 100% reduce 8%
10/04/08 08:06:21 INFO mapred.JobClient: map 100% reduce 33%
10/04/08 08:06:24 INFO mapred.JobClient: map 100% reduce 50%
10/04/08 08:06:33 INFO mapred.JobClient: map 100% reduce 75%
10/04/08 08:06:37 INFO mapred.JobClient: map 100% reduce 100%
10/04/08 08:06:41 INFO mapred.JobClient: Job complete: job_201004080756_0001
10/04/08 08:06:41 INFO mapred.JobClient: Counters: 17
10/04/08 08:06:41 INFO mapred.JobClient: Job Counters
10/04/08 08:06:42 INFO mapred.JobClient: Launched reduce tasks=4
10/04/08 08:06:42 INFO mapred.JobClient: Launched map tasks=2
10/04/08 08:06:42 INFO mapred.JobClient: Data-local map tasks=2
10/04/08 08:06:42 INFO mapred.JobClient: FileSystemCounters
10/04/08 08:06:42 INFO mapred.JobClient: FILE_BYTES_READ=122
10/04/08 08:06:42 INFO mapred.JobClient: HDFS_BYTES_READ=56
10/04/08 08:06:42 INFO mapred.JobClient: FILE_BYTES_WRITTEN=476
10/04/08 08:06:42 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=62
10/04/08 08:06:42 INFO mapred.JobClient: Map-Reduce Framework
10/04/08 08:06:42 INFO mapred.JobClient: Reduce input groups=6
10/04/08 08:06:42 INFO mapred.JobClient: Combine output records=7
10/04/08 08:06:42 INFO mapred.JobClient: Map input records=2
10/04/08 08:06:42 INFO mapred.JobClient: Reduce shuffle bytes=146
10/04/08 08:06:42 INFO mapred.JobClient: Reduce output records=6
10/04/08 08:06:42 INFO mapred.JobClient: Spilled Records=14
10/04/08 08:06:42 INFO mapred.JobClient: Map output bytes=84
10/04/08 08:06:42 INFO mapred.JobClient: Combine input records=7
10/04/08 08:06:42 INFO mapred.JobClient: Map output records=7
10/04/08 08:06:42 INFO mapred.JobClient: Reduce input records=7
3. Viewing the results
[hadoop@localhost hadoop-0.20.2]$ ./bin/hadoop dfs -ls test-out
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2010-04-08 08:05 /user/hadoop/test-out/_logs
-rw-r--r-- 2 hadoop supergroup 0 2010-04-08 08:06 /user/hadoop/test-out/part-r-00000
-rw-r--r-- 2 hadoop supergroup 16 2010-04-08 08:06 /user/hadoop/test-out/part-r-00001
-rw-r--r-- 2 hadoop supergroup 18 2010-04-08 08:06 /user/hadoop/test-out/part-r-00002
-rw-r--r-- 2 hadoop supergroup 28 2010-04-08 08:06 /user/hadoop/test-out/part-r-00003
[hadoop@localhost hadoop-0.20.2]$ ./bin/hadoop dfs -cat test-out/part-r-00000
[hadoop@localhost hadoop-0.20.2]$ ./bin/hadoop dfs -cat test-out/part-r-00001
hello 2
world 1
[hadoop@localhost hadoop-0.20.2]$ ./bin/hadoop dfs -cat test-out/part-r-00002
hadoop 1
woeld. 1
[hadoop@localhost hadoop-0.20.2]$ ./bin/hadoop dfs -cat test-out/part-r-00003
hadoop.i'm 1
wangzhenfei. 1
IX. Problems Encountered and Their Solutions
1. jps command not found
After the installation, running jps failed with bash: jps: command not found. A closer look shows that jps is an executable in the JDK's bin directory, just like javac, so the error is easy to pin down: if the shell cannot find it, the search path is wrong, which points straight at the environment variables. Sure enough, the environment variables configured earlier were the problem; with the configuration given above, the error does not occur.
Environment variables in Linux tell the system, and the applications that call into it, where executables live. The settings in /etc/profile provide system-wide paths for all users (root included), while the hidden files .bash_profile and .bashrc in each user's home directory (/home/username) set that user's own environment variables. To be safe and efficient, set the variables in all three files, and weigh the possible (security and other) issues of that choice later.
Changes to these files normally take effect only after a reboot or re-login; alternatively, run source /etc/profile so they take effect without rebooting or logging in again.
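A minimal sketch of the per-user variant, appending the same variables to ~/.bash_profile and applying them without logging out (the JDK path matches the one used above):
[hadoop@localhost ~]$ echo 'export JAVA_HOME=/usr/java/jdk1.6.0_16' >> ~/.bash_profile
[hadoop@localhost ~]$ echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bash_profile
[hadoop@localhost ~]$ source ~/.bash_profile
[hadoop@localhost ~]$ jps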

Notes: errors I ran into during configuration:
1. When adding the user and group, the system reported that the useradd and groupadd commands could not be found. The fix is to run whereis useradd and whereis groupadd, which print the paths of the two commands; prefix the commands with those paths when running them (see the sketch after this list).
2. During formatting an error was reported, something like JAVA_HOME is not set. The cause was that, when editing conf/hadoop-env.sh under hadoop-0.20.2, the # in front of the JAVA_HOME line had not been removed, so the line was still a comment and had no effect. Removing the # fixes it.
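A minimal sketch of the whereis workaround (the /usr/sbin path is the usual location; use whatever path whereis reports on your system):
[root@localhost ~]# whereis useradd
[root@localhost ~]# /usr/sbin/useradd hadoop
[root@localhost ~]# whereis groupadd
[root@localhost ~]# /usr/sbin/groupadd hadoop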

 
