Running Hadoop-0.19.0 on a Single Linux Machine  http://blog.csdn.net/shirdrn/article/details/5781776
The Hadoop-0.19.0 release can be downloaded from Apache at http://archive.apache.org/dist/hadoop/core/hadoop-0.19.0/. The Linux machine I used runs RHEL 5, with Java 1.6.0_16 installed and JAVA_HOME=/usr/java/jdk1.6.0_16.
Walkthrough
1. Passwordless ssh login to localhost
Make sure the ssh service is running on the Linux system and that you can log in to the local machine without being prompted for a password. If you cannot, set it up as follows:
(1) Open a terminal and run:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
(2) ssh to localhost:
$ ssh localhost
On the first login, ssh cannot verify the authenticity of 127.0.0.1 and asks whether you want to continue connecting; just type yes. A successful passwordless login then looks like this:
[root@localhost hadoop-0.19.0]# ssh localhost
Last login: Sun Aug 1 18:35:37 2010 from 192.168.0.104
[root@localhost ~]#
2. Configuring Hadoop-0.19.0
Download hadoop-0.19.0.tar.gz (about 40.3 MB) and extract it to a directory of your choice on the Linux system; mine is /root/hadoop-0.19.0.
The configuration steps, in order:
(1) Edit conf/hadoop-env.sh
Uncomment the JAVA_HOME line (remove the leading "#") and point it at your Java installation; after the change the line reads:
export JAVA_HOME=/usr/java/jdk1.6.0_16
(2) Edit conf/hadoop-site.xml
Add three properties between <configuration> and </configuration>: fs.default.name points the default filesystem at the HDFS NameNode on localhost:9000, mapred.job.tracker gives the JobTracker address, and dfs.replication is set to 1 because a single-machine setup has only one DataNode. The file after the change:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
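If you want to double-check that client code actually picks these values up, the snippet below reads them back through the Hadoop 0.19 Configuration API. ConfCheck is a hypothetical class I wrote for illustration, not part of the distribution, and it assumes the conf directory containing hadoop-site.xml is on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ConfCheck {
    public static void main(String[] args) throws Exception {
        // In 0.19, new Configuration() loads hadoop-default.xml and
        // hadoop-site.xml from the classpath.
        Configuration conf = new Configuration();
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
        System.out.println("dfs.replication    = " + conf.get("dfs.replication"));
        // With the settings above, this returns an HDFS client bound to localhost:9000.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("default FileSystem = " + fs.getUri());
    }
}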
3. Running the wordcount example
The wordcount example ships with the Hadoop distribution; running it is a good way to see, and start to understand, how Hadoop executes a MapReduce job. Following the official "Hadoop Quick Start" guide makes this straightforward; below is my own run, preceded by a sketch of the program itself.
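Before running it, it helps to know what the example actually does. Here is a minimal sketch of the classic WordCount program written against the old org.apache.hadoop.mapred API that Hadoop 0.19 uses; the class WordCountSketch is my own illustration and may differ in detail from the bundled source. The mapper emits (word, 1) pairs and the reducer sums them, with the reducer doubling as a combiner:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountSketch {

    // Mapper: tokenize each input line and emit (word, 1).
    public static class MapClass extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts for each word.
    public static class ReduceClass extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountSketch.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(MapClass.class);
        // Running the reducer as a combiner is why the job counters later
        // show far fewer reduce input records than map output records.
        conf.setCombinerClass(ReduceClass.class);
        conf.setReducerClass(ReduceClass.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}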
Change into the Hadoop directory; mine is /root/hadoop-0.19.0.
(1) Format HDFS
Run the HDFS format command:
[root@localhost hadoop-0.19.0]# bin/hadoop namenode -format
The formatting output is shown below. Note that the (Y or N) prompt is case-sensitive: the NameNode only accepts an uppercase Y, so answering with a lowercase y, as in this transcript, aborts the re-format (harmless in this run, since the prompt itself shows the filesystem had already been formatted once):
10/08/01 19:04:02 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.19.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r 713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
************************************************************/
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y
Format aborted in /tmp/hadoop-root/dfs/name
10/08/01 19:04:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
(2) Start the Hadoop daemons
Run:
[root@localhost hadoop-0.19.0]# bin/start-all.sh
The startup output:
starting namenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-namenode-localhost.out
localhost: starting datanode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-datanode-localhost.out
localhost: starting secondarynamenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-secondarynamenode-localhost.out
starting jobtracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-jobtracker-localhost.out
localhost: starting tasktracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-tasktracker-localhost.out
(3) Prepare the input data for the wordcount job
First, create a local directory named input and copy some files into it:
[root@localhost hadoop-0.19.0]# mkdir input
[root@localhost hadoop-0.19.0]# cp CHANGES.txt LICENSE.txt NOTICE.txt README.txt input/
Then upload the local input directory to HDFS:
[root@localhost hadoop-0.19.0]# bin/hadoop fs -put input/ input
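The same upload can also be done programmatically. The sketch below (a hypothetical PutInput class, assuming as before that the Hadoop conf directory is on the classpath so HDFS is the default filesystem) uses FileSystem.copyFromLocalFile, which is roughly what fs -put does:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutInput {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Equivalent of "bin/hadoop fs -put input/ input": the relative
        // destination resolves against the user's HDFS home, /user/root here.
        fs.copyFromLocalFile(new Path("input"), new Path("input"));
    }
}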
(4) Launch the wordcount job
Run:
[root@localhost hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output
The source data directory is input and the output directory is output.
The job output is shown below. In the counters, note how the combiner shrinks the 39,193 map output records down to 10,860 reduce input records:
10/08/01 19:06:15 INFO mapred.FileInputFormat: Total input paths to process : 4
10/08/01 19:06:15 INFO mapred.JobClient: Running job: job_201008011904_0002
10/08/01 19:06:16 INFO mapred.JobClient: map 0% reduce 0%
10/08/01 19:06:22 INFO mapred.JobClient: map 20% reduce 0%
10/08/01 19:06:24 INFO mapred.JobClient: map 40% reduce 0%
10/08/01 19:06:25 INFO mapred.JobClient: map 60% reduce 0%
10/08/01 19:06:27 INFO mapred.JobClient: map 80% reduce 0%
10/08/01 19:06:28 INFO mapred.JobClient: map 100% reduce 0%
10/08/01 19:06:38 INFO mapred.JobClient: map 100% reduce 26%
10/08/01 19:06:40 INFO mapred.JobClient: map 100% reduce 100%
10/08/01 19:06:41 INFO mapred.JobClient: Job complete: job_201008011904_0002
10/08/01 19:06:41 INFO mapred.JobClient: Counters: 16
10/08/01 19:06:41 INFO mapred.JobClient: File Systems
10/08/01 19:06:41 INFO mapred.JobClient: HDFS bytes read=301489
10/08/01 19:06:41 INFO mapred.JobClient: HDFS bytes written=113098
10/08/01 19:06:41 INFO mapred.JobClient: Local bytes read=174004
10/08/01 19:06:41 INFO mapred.JobClient: Local bytes written=348172
10/08/01 19:06:41 INFO mapred.JobClient: Job Counters
10/08/01 19:06:41 INFO mapred.JobClient: Launched reduce tasks=1
10/08/01 19:06:41 INFO mapred.JobClient: Launched map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient: Data-local map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient: Map-Reduce Framework
10/08/01 19:06:41 INFO mapred.JobClient: Reduce input groups=8997
10/08/01 19:06:41 INFO mapred.JobClient: Combine output records=10860
10/08/01 19:06:41 INFO mapred.JobClient: Map input records=7363
10/08/01 19:06:41 INFO mapred.JobClient: Reduce output records=8997
10/08/01 19:06:41 INFO mapred.JobClient: Map output bytes=434077
10/08/01 19:06:41 INFO mapred.JobClient: Map input bytes=299871
10/08/01 19:06:41 INFO mapred.JobClient: Combine input records=39193
10/08/01 19:06:41 INFO mapred.JobClient: Map output records=39193
10/08/01 19:06:41 INFO mapred.JobClient: Reduce input records=10860
(5) Inspect the job results
You can view them with:
bin/hadoop fs -cat output/*
A partial excerpt of the output:
vijayarenu 20
violations. 1
virtual 3
vis-a-vis 1
visible 1
visit 1
volume 1
volume, 1
volumes 2
volumes. 1
w.r.t 2
wait 9
waiting 6
waiting. 1
waits 3
want 1
warning 7
warning, 1
warnings 12
warnings. 3
warranties 1
warranty 1
warranty, 1
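The same listing can be pulled back in code. Here is a minimal sketch (a hypothetical CatOutput class, with the same classpath assumption as earlier) that opens each part file the reducer wrote under output and prints it:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CatOutput {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // The single reducer writes output/part-00000; skip any
        // subdirectories (such as _logs) the framework may create.
        for (FileStatus st : fs.listStatus(new Path("output"))) {
            if (st.isDir()) continue;
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(st.getPath())));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }
}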
(6) Stop the Hadoop daemons
Run:
[root@localhost hadoop-0.19.0]# bin/stop-all.sh
The output:
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
This stops the five daemons listed above: jobtracker, tasktracker, namenode, datanode, and secondarynamenode.
Troubleshooting
You may run into a few exceptions while working through the steps above; a rough analysis of each follows.
1. "Call to localhost/127.0.0.1:9000 failed on local exception"
(1) Symptoms
This can appear when you run:
[root@localhost hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output
The error output looks like this:
10/08/01 19:50:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
10/08/01 19:50:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
10/08/01 19:50:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
10/08/01 19:50:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
10/08/01 19:50:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
10/08/01 19:51:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
10/08/01 19:51:01 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
10/08/01 19:51:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
10/08/01 19:51:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
10/08/01 19:51:04 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
java.lang.RuntimeException: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:323)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:295)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:268)
    at org.apache.hadoop.examples.WordCount.run(WordCount.java:146)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused
    at org.apache.hadoop.ipc.Client.call(Client.java:699)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:319)
    ... 21 more
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:685)
    ... 33 more
(2) Analysis
The key line in the output above is:
Retrying connect to server: localhost/127.0.0.1:9000.
All ten attempts to connect to the server failed, which means the communication path to it simply is not there. We configured the namenode address in hadoop-site.xml as follows:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
So the client definitely cannot reach the server, which almost certainly means the namenode process was never started (the JDK's jps command will show which Hadoop daemons are actually running), let alone able to serve a job.
I reproduced this exception as follows:
I formatted HDFS but skipped bin/start-all.sh and launched the wordcount job directly, which produced the exception above.
So run bin/start-all.sh before launching the wordcount job. A quick way to check the NameNode port directly is sketched below.
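When you are unsure whether the daemon is up, a plain socket probe settles it before you dig into Hadoop itself. This sketch (a hypothetical ProbeNameNode class, using only the JDK) tries to open a TCP connection to the NameNode port configured in hadoop-site.xml:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ProbeNameNode {
    public static void main(String[] args) {
        Socket s = new Socket();
        try {
            // 9000 is the port from fs.default.name; wait at most 2 seconds.
            s.connect(new InetSocketAddress("localhost", 9000), 2000);
            System.out.println("Something is listening on localhost:9000.");
        } catch (IOException e) {
            System.out.println("Connection failed (" + e.getMessage()
                    + "): the NameNode is probably not running; try bin/start-all.sh.");
        } finally {
            try { s.close(); } catch (IOException ignored) { }
        }
    }
}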
2. "Input path does not exist"
(1) Symptoms
You create an input directory under the Hadoop directory, cp some files into it, and run:
[root@localhost hadoop-0.19.0]# bin/hadoop namenode -format
[root@localhost hadoop-0.19.0]# bin/start-all.sh
At this point you assume input exists and that the wordcount job should run:
[root@localhost hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output
Instead it throws a pile of exceptions:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:782)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
    at org.apache.hadoop.examples.WordCount.run(WordCount.java:149)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
I reproduced this exception as follows:
[root@localhost hadoop-0.19.0]# bin/hadoop fs -rmr input
Deleted hdfs://localhost:9000/user/root/input
[root@localhost hadoop-0.19.0]# bin/hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/root/output
(I had to delete both directories because I had already run the example successfully once before.)
(2) Analysis
This one hardly needs explanation: the local input directory was never uploaded to HDFS, which is why you get org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/root/input.
As I recall, with hadoop-0.16.4 the job would run as long as the local input directory existed, without an explicit upload; later versions do require it.
The fix is simply to run the upload command:
[root@localhost hadoop-0.19.0]# bin/hadoop fs -put input/ input
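If you want to fail fast instead of waiting for the InvalidInputException, you can test for the path before submitting. Here is a sketch (a hypothetical CheckInput class, with the same classpath assumption as earlier) that mirrors the existence check FileInputFormat performs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckInput {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Relative paths resolve against the HDFS home directory,
        // /user/root in this walkthrough.
        Path input = new Path("input");
        if (fs.exists(input)) {
            System.out.println(fs.makeQualified(input) + " exists; safe to submit.");
        } else {
            System.err.println(fs.makeQualified(input)
                    + " does not exist; upload it first with bin/hadoop fs -put input/ input");
        }
    }
}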