Summary of problems encountered while configuring Hadoop 2.2.0


1. DataNode times out connecting to the NameNode

Errors in datanode.log:

2014-02-24 16:10:36,194 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to node2/10.103.243.23:9010 starting to offer service
2014-02-24 16:10:36,230 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-02-24 16:10:36,234 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2014-02-24 16:10:37,373 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:38,374 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:39,375 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:40,376 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:41,377 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:42,379 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:43,380 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:44,381 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:45,383 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:46,384 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:46,390 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: node2/10.103.243.23:9010

Errors in nodemanager.log:

2014-02-24 16:10:55,925 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2014-02-24 16:10:56,678 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2014-02-24 16:10:56,734 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at node2/10.103.243.23:8031
2014-02-24 16:10:57,788 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:58,789 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:59,790 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:00,791 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:01,792 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:02,793 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:03,794 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:04,796 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:05,797 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:06,798 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:37,810 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

Searching online turned up plenty of similar reports but no working fix. Passwordless SSH login worked and the configuration files checked out, which was puzzling. The problem turned out to be the hosts file: on the master node, the entry for its own hostname must map to the machine's real IP address, not 127.0.0.1. After changing every node's own hostname entry to its real IP, the cluster started successfully.
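As a runnable sketch of the check, the snippet below writes a hypothetical copy of the bad `/etc/hosts` (hostname `node2` and IP `10.103.243.23` are taken from the logs above; the file is written under `/tmp` so nothing on the real system is touched) and verifies what the node's own hostname resolves to:

```shell
#!/bin/sh
# Hypothetical bad /etc/hosts from the master node, written to /tmp.
cat > /tmp/hosts.bad <<'EOF'
127.0.0.1   localhost
127.0.1.1   node2
EOF

# If the node's own hostname maps to a loopback address, the NameNode
# binds its RPC ports to 127.x.x.x and remote DataNodes can never reach it.
ip=$(awk '$2 == "node2" { print $1 }' /tmp/hosts.bad)
case "$ip" in
  127.*) echo "node2 resolves to $ip: replace with the real IP (10.103.243.23)" ;;
  *)     echo "node2 resolves to $ip: OK" ;;
esac
```

The fix is then a one-line edit, `10.103.243.23   node2`; each node's hosts file should map that node's own hostname to its real address.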

 

2. DataNode fails to start

Error in the log:

2006-01-01 23:19:23,737 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1873323460-127.0.1.1-1393249411040 (storage id DS-738355263-127.0.0.1-50010-1393191985983) service to node2/10.103.243.23:9010

java.io.IOException: Incompatible clusterIDs in /home/tseg637/hadoop-2.2.0/dfsdata/data: namenode clusterID = CID-ea0e9701-33a3-4f77-999f-a0da13b502f1; datanode clusterID = CID-9b4729d5-bb23-4609-9a85-8e6047efd956

        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:662)

The cause, found online: every `hadoop namenode -format` generates a new clusterID, but /home/tseg637/hadoop-2.2.0/dfsdata/data still holds the ID from the previous format. The format wipes the name directory without touching the data directory, so the DataNode fails at startup with the mismatch above. The fix is to clear everything under /home/tseg637/hadoop-2.2.0/dfsdata/data after each format; the DataNode will then start successfully.
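HDFS records the clusterID in a `VERSION` file under each storage directory's `current/` subdirectory, so the mismatch can be confirmed before restarting anything. A minimal sketch of that check, using the clusterIDs from the log above; the `/tmp/dfsdemo` paths are stand-ins created here so the check is runnable, whereas on a real cluster you would compare the NameNode's `name/current/VERSION` against the DataNode's `data/current/VERSION`:

```shell
#!/bin/sh
# Simulate the two VERSION files with the clusterIDs from the log above.
mkdir -p /tmp/dfsdemo/name/current /tmp/dfsdemo/data/current
echo "clusterID=CID-ea0e9701-33a3-4f77-999f-a0da13b502f1" > /tmp/dfsdemo/name/current/VERSION
echo "clusterID=CID-9b4729d5-bb23-4609-9a85-8e6047efd956" > /tmp/dfsdemo/data/current/VERSION

# Extract each side's clusterID and compare.
nn_id=$(sed -n 's/^clusterID=//p' /tmp/dfsdemo/name/current/VERSION)
dn_id=$(sed -n 's/^clusterID=//p' /tmp/dfsdemo/data/current/VERSION)

if [ "$nn_id" = "$dn_id" ]; then
  echo "clusterIDs match"
else
  echo "clusterID mismatch: clear the DataNode data directory and restart"
fi
```

On a real node the remedy matches the text above: stop the DataNode, delete the contents of the data directory (here `dfsdata/data`), and start it again so it re-registers under the NameNode's new clusterID.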

 

3. Error running the bundled wordcount example on Hadoop 2.2.0

Running the bundled example jar hadoop-mapreduce-examples-2.2.0.jar fails with:

14/02/24 15:27:36 INFO mapreduce.Job: Job job_1393225741554_0003 failed with state FAILED due to: Application application_1393225741554_0003 failed 2 times due to Error launching appattempt_1393225741554_0003_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.

This token is expired. current time is 1393313243534 found 1393227455640

       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
       at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
       at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
       at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
       at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
       at java.lang.Thread.run(Thread.java:662)
. Failing the application.

This turned out to be clock skew between nodes: one node's date was simply wrong. Bring the clocks into agreement, either by setting the time manually with `date -s` or by syncing against an NTP server with `ntpdate time-a.nist.gov`. Once the times matched, the job ran.
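The two numbers in the "token is expired" message are epoch timestamps in milliseconds, so the skew can be read straight off the log. A quick sketch using the values above:

```shell
#!/bin/sh
# Timestamps copied from the "This token is expired" message (epoch ms).
current_ms=1393313243534   # clock on the node rejecting the container
token_ms=1393227455640     # clock on the node that issued the token
skew_s=$(( (current_ms - token_ms) / 1000 ))
echo "clock skew: ${skew_s} seconds (~$(( skew_s / 3600 )) hours)"
# → clock skew: 85787 seconds (~23 hours)
```

Nearly a day of skew like this expires YARN container tokens on arrival; keeping every node within a few seconds of the others (e.g. via ntpd, or ntpdate in a cron job) prevents it.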
