Summary of problems encountered while configuring Hadoop 2.2.0


1. DataNode times out connecting to the NameNode

Errors in datanode.log:

2014-02-24 16:10:36,194 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to node2/10.103.243.23:9010 starting to offer service
2014-02-24 16:10:36,230 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-02-24 16:10:36,234 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2014-02-24 16:10:37,373 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:38,374 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:39,375 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:40,376 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:41,377 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:42,379 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:43,380 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:44,381 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:45,383 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:46,384 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:46,390 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: node2/10.103.243.23:9010

Errors in nodemanager.log:

2014-02-24 16:10:55,925 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2014-02-24 16:10:56,678 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2014-02-24 16:10:56,734 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at node2/10.103.243.23:8031
2014-02-24 16:10:57,788 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:58,789 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:59,790 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:00,791 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:01,792 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:02,793 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:03,794 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:04,796 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:05,797 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:06,798 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:37,810 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

Searching online turned up plenty of similar reports but no working fix. Passwordless SSH login worked and the configuration files checked out, which was puzzling. The problem turned out to be the hosts file: on the master node, the entry for its own hostname must map to the machine's real IP address, not 127.0.0.1. After changing every node's own hostname entry to its real IP, the cluster started successfully.
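As a runnable sketch of the check, the snippet below writes a hypothetical copy of the bad `/etc/hosts` (hostname `node2` and IP `10.103.243.23` are taken from the logs above; the file is written under `/tmp` so nothing on the real system is touched) and verifies what the node's own hostname resolves to:

```shell
#!/bin/sh
# Hypothetical bad /etc/hosts from the master node, written to /tmp.
cat > /tmp/hosts.bad <<'EOF'
127.0.0.1   localhost
127.0.1.1   node2
EOF

# If the node's own hostname maps to a loopback address, the NameNode
# binds its RPC ports to 127.x.x.x and remote DataNodes can never reach it.
ip=$(awk '$2 == "node2" { print $1 }' /tmp/hosts.bad)
case "$ip" in
  127.*) echo "node2 resolves to $ip: replace with the real IP (10.103.243.23)" ;;
  *)     echo "node2 resolves to $ip: OK" ;;
esac
```

The fix is then a one-line edit, `10.103.243.23   node2`; each node's hosts file should map that node's own hostname to its real address.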

 

2. DataNode fails to start

Error in the log:

2006-01-01 23:19:23,737 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1873323460-127.0.1.1-1393249411040 (storage id DS-738355263-127.0.0.1-50010-1393191985983) service to node2/10.103.243.23:9010

java.io.IOException: Incompatible clusterIDs in /home/tseg637/hadoop-2.2.0/dfsdata/data: namenode clusterID = CID-ea0e9701-33a3-4f77-999f-a0da13b502f1; datanode clusterID = CID-9b4729d5-bb23-4609-9a85-8e6047efd956

        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:662)

The cause, found online: every `hadoop namenode -format` generates a new clusterID, but /home/tseg637/hadoop-2.2.0/dfsdata/data still holds the ID from the previous format. The format wipes the name directory without touching the data directory, so the DataNode fails at startup with the mismatch above. The fix is to clear everything under /home/tseg637/hadoop-2.2.0/dfsdata/data after each format; the DataNode will then start successfully.
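HDFS records the clusterID in a `VERSION` file under each storage directory's `current/` subdirectory, so the mismatch can be confirmed before restarting anything. A minimal sketch of that check, using the clusterIDs from the log above; the `/tmp/dfsdemo` paths are stand-ins created here so the check is runnable, whereas on a real cluster you would compare the NameNode's `name/current/VERSION` against the DataNode's `data/current/VERSION`:

```shell
#!/bin/sh
# Simulate the two VERSION files with the clusterIDs from the log above.
mkdir -p /tmp/dfsdemo/name/current /tmp/dfsdemo/data/current
echo "clusterID=CID-ea0e9701-33a3-4f77-999f-a0da13b502f1" > /tmp/dfsdemo/name/current/VERSION
echo "clusterID=CID-9b4729d5-bb23-4609-9a85-8e6047efd956" > /tmp/dfsdemo/data/current/VERSION

# Extract each side's clusterID and compare.
nn_id=$(sed -n 's/^clusterID=//p' /tmp/dfsdemo/name/current/VERSION)
dn_id=$(sed -n 's/^clusterID=//p' /tmp/dfsdemo/data/current/VERSION)

if [ "$nn_id" = "$dn_id" ]; then
  echo "clusterIDs match"
else
  echo "clusterID mismatch: clear the DataNode data directory and restart"
fi
```

On a real node the remedy matches the text above: stop the DataNode, delete the contents of the data directory (here `dfsdata/data`), and start it again so it re-registers under the NameNode's new clusterID.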

 

3. Error running the bundled wordcount example on Hadoop 2.2.0

Running the bundled example jar hadoop-mapreduce-examples-2.2.0.jar fails with:

14/02/24 15:27:36 INFO mapreduce.Job: Job job_1393225741554_0003 failed with state FAILED due to: Application application_1393225741554_0003 failed 2 times due to Error launching appattempt_1393225741554_0003_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.

This token is expired. current time is 1393313243534 found 1393227455640

       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
       at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
       at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
       at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
       at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
       at java.lang.Thread.run(Thread.java:662)
. Failing the application.

This turned out to be clock skew between nodes: one node's date was simply wrong. Bring the clocks into agreement, either by setting the time manually with `date -s` or by syncing against an NTP server with `ntpdate time-a.nist.gov`. Once the times matched, the job ran.
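The two numbers in the "token is expired" message are epoch timestamps in milliseconds, so the skew can be read straight off the log. A quick sketch using the values above:

```shell
#!/bin/sh
# Timestamps copied from the "This token is expired" message (epoch ms).
current_ms=1393313243534   # clock on the node rejecting the container
token_ms=1393227455640     # clock on the node that issued the token
skew_s=$(( (current_ms - token_ms) / 1000 ))
echo "clock skew: ${skew_s} seconds (~$(( skew_s / 3600 )) hours)"
# → clock skew: 85787 seconds (~23 hours)
```

Nearly a day of skew like this expires YARN container tokens on arrival; keeping every node within a few seconds of the others (e.g. via ntpd, or ntpdate in a cron job) prevents it.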
