Running the Hadoop-0.20.0 Example on a Single Linux Machine


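In fact, getting started with Hadoop-0.20.0 is very similar to getting started with Hadoop-0.19.0, and the basic steps are the same. The difference is that the contents of the Hadoop-0.19.0 configuration file hadoop-site.xml have been split across three configuration files in Hadoop-0.20.0:

1. The core-site.xml configuration file

Its contents are configured as follows:

2. The hdfs-site.xml configuration file

Its contents are configured as follows:

3. The mapred-site.xml configuration file

Its contents are configured as follows: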
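The configuration listings themselves are not included in this copy of the post. As a rough sketch only, the script below writes the three files with the values commonly used for a single-node (pseudo-distributed) Hadoop 0.20 setup. The host name, the ports 9000 and 9001, the replication factor of 1, and the write_site_configs helper are all assumptions, not the author's actual settings.

```shell
#!/bin/sh
# Hypothetical helper: write minimal single-node configs into a conf directory.
# Every value below is an assumption based on the usual pseudo-distributed setup.
write_site_configs() {
    conf_dir="$1"
    mkdir -p "$conf_dir" || return 1

    # core-site.xml: the address clients use to reach the NameNode
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # hdfs-site.xml: one replica is enough when there is a single DataNode
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

    # mapred-site.xml: the address of the JobTracker
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}

# Example call (the path is an assumption based on the prompt shown in this post):
#   write_site_configs /root/hadoop-0.20.0/conf
```

Each property lives in the file for the subsystem it belongs to, which is the split described above: core settings, HDFS settings, and MapReduce settings.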

 

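Apart from this one difference, the basic procedure is the same as for Hadoop-0.19.0, and the wordcount example can be run very easily.

The main purpose here is to discuss the exceptions that came up while running the Hadoop-0.20.0 example, together with their causes and solutions.

Exception Analysis

1. The "could only be replicated to 0 nodes, instead of 1" exception

(1) Exception description

Assume the configuration above is correct and the following steps have already been completed: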

[root@localhost hadoop-0.20.0]# bin/hadoop namenode -format

[root@localhost hadoop-0.20.0]# bin/start-all.sh

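At this point the five daemons (jobtracker, tasktracker, namenode, datanode, and secondarynamenode) have all printed messages indicating that they started, but running the jps command to check the processes shows otherwise: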

4281 Jps
4007 SecondaryNameNode
3771 NameNode

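As you can see, only two of the daemons actually started; the others did not. If you carry on anyway and, in preparation for running the wordcount example, execute the command that uploads the input files: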

[root@localhost hadoop-0.20.0]# bin/hadoop fs -put input in

a pile of exceptions is thrown, as shown below:

10/08/02 15:36:04 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/in/LICENSE.txt could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        at org.apache.hadoop.ipc.Client.call(Client.java:739)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy0.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy0.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2873)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2755)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
10/08/02 15:36:04 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /user/root/in/LICENSE.txt retries left 4
10/08/02 15:36:04 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/in/LICENSE.txt could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        at org.apache.hadoop.ipc.Client.call(Client.java:739)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy0.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy0.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2873)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2755)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
10/08/02 15:36:04 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /user/root/in/LICENSE.txt retries left 3
10/08/02 15:36:05 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/in/LICENSE.txt could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        at org.apache.hadoop.ipc.Client.call(Client.java:739)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy0.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy0.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2873)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2755)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
10/08/02 15:36:05 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /user/root/in/LICENSE.txt retries left 2
10/08/02 15:36:07 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/in/LICENSE.txt could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        at org.apache.hadoop.ipc.Client.call(Client.java:739)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy0.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy0.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2873)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2755)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
10/08/02 15:36:07 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /user/root/in/LICENSE.txt retries left 1
10/08/02 15:36:10 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/in/LICENSE.txt could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        at org.apache.hadoop.ipc.Client.call(Client.java:739)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy0.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy0.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2873)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2755)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
10/08/02 15:36:10 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
10/08/02 15:36:10 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/root/in/LICENSE.txt" - Aborting...
put: java.io.IOException: File /user/root/in/LICENSE.txt could only be replicated to 0 nodes, instead of 1

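On seeing "could only be replicated to 0 nodes, instead of 1", your first thought may be that the dfs.replication property in hdfs-site.xml is misconfigured, but that is not the case.

The next step is to check the startup logs. Mine are under /root/hadoop-0.20.0/logs, which contains the following files: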

hadoop-root-datanode-localhost.log    hadoop-root-namenode-localhost.log           hadoop-root-tasktracker-localhost.log
hadoop-root-datanode-localhost.out    hadoop-root-namenode-localhost.out           hadoop-root-tasktracker-localhost.out
hadoop-root-jobtracker-localhost.log  hadoop-root-secondarynamenode-localhost.log  history
hadoop-root-jobtracker-localhost.out  hadoop-root-secondarynamenode-localhost.out
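Rather than opening each file in turn, the logs can be scanned for ERROR entries first. The helper below is a minimal sketch: logs_with_errors is a hypothetical name, and it assumes the log lines carry the level as a space-delimited word, as in the Hadoop log excerpts quoted in this post.

```shell
#!/bin/sh
# Hypothetical helper: print every *.log file in a directory
# that contains at least one ERROR-level entry.
logs_with_errors() {
    log_dir="$1"
    for f in "$log_dir"/*.log; do
        # Skip the unexpanded pattern when the directory has no .log files
        [ -f "$f" ] || continue
        if grep -q ' ERROR ' "$f"; then
            echo "$f"
        fi
    done
}

# Example call (path taken from this post):
#   logs_with_errors /root/hadoop-0.20.0/logs
```

Any file this prints is a good place to start reading.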

The hadoop-root-datanode-localhost.log file contains the following exception:

2010-08-02 15:38:34,642 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr  9 05:18:40 UTC 2009
************************************************************/
2010-08-02 15:38:35,381 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-root/dfs/data: namenode namespaceID = 409052671; datanode namespaceID = 769845957
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:233)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:148)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)
2010-08-02 15:38:35,382 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at localhost/127.0.0.1
************************************************************/

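The message "Incompatible namespaceIDs in /tmp/hadoop-root/dfs/data" shows that the namespaceID stored in /tmp/hadoop-root/dfs/data does not match the NameNode's. Formatting the NameNode generates a new namespaceID, while the DataNode's data directory still holds the ID from an earlier run. In other words, incompatible data was most likely left behind in /tmp/hadoop-root/dfs/data by a previous run of Hadoop, possibly a different version. That is exactly what happened in my case: I had just tried running Hadoop-0.19.0 and had not cleaned up its data afterwards.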
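The mismatch can also be confirmed directly on disk, since each storage directory records its namespaceID in current/VERSION. The sketch below compares the two values. The function names are hypothetical, and the NameNode path /tmp/hadoop-root/dfs/name in the example call is an assumption inferred from the DataNode path in the log (the default location when dfs.name.dir is not set).

```shell
#!/bin/sh
# Print the namespaceID recorded in a Hadoop storage VERSION file.
namespace_id() {
    sed -n 's/^namespaceID=//p' "$1"
}

# Hypothetical helper: compare the IDs of a NameNode and a DataNode storage directory.
# Returns 0 when both IDs are readable and equal, 1 otherwise.
check_namespace_ids() {
    name_id=$(namespace_id "$1/current/VERSION" 2>/dev/null)
    data_id=$(namespace_id "$2/current/VERSION" 2>/dev/null)
    echo "namenode namespaceID = ${name_id:-unknown}; datanode namespaceID = ${data_id:-unknown}"
    [ -n "$name_id" ] && [ "$name_id" = "$data_id" ]
}

# Example call (the name directory is an assumed default):
#   check_namespace_ids /tmp/hadoop-root/dfs/name /tmp/hadoop-root/dfs/data
```

A non-zero return status here corresponds to the condition that makes the DataNode shut down at startup.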

(2) Solution

After cleaning out the data in that directory, everything runs normally. Once the daemons have been started again, jps shows the following:

5386 JobTracker
5253 DataNode
5529 Jps
4874 SecondaryNameNode
5489 TaskTracker
4649 NameNode

All five daemons are now running, so files can be uploaded to HDFS and the wordcount example can be run.
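The cleanup and the follow-up check can be sketched as two small shell functions. Both names are hypothetical. Removing the data directory deletes any HDFS blocks stored there, so this only suits a throwaway test setup like the one described here.

```shell
#!/bin/sh
# Hypothetical helper: remove a stale DataNode data directory.
# Refuses obviously dangerous arguments such as an empty path or "/".
reset_datanode_dir() {
    data_dir="$1"
    case "$data_dir" in
        ""|"/") echo "refusing to remove '$data_dir'" >&2; return 1 ;;
    esac
    rm -rf "$data_dir"
}

# Read jps output on stdin and print the expected daemons that are not listed.
# Prints an empty line when all five are present.
missing_daemons() {
    jps_output=$(cat)
    missing=""
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Match the second column exactly, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$jps_output" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
            missing="$missing $d"
        fi
    done
    echo "${missing# }"
}

# Example sequence, run from the Hadoop directory (paths taken from this post):
#   bin/stop-all.sh
#   reset_datanode_dir /tmp/hadoop-root/dfs/data
#   bin/start-all.sh
#   jps | missing_daemons
```

Checking jps output this way catches the case described earlier, where the start script reports every daemon as starting but some of them exit immediately.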

 
