NodeManager startup errors

Source: Internet | Editor: 程序博客网 | Date: 2024/06/05 16:45

1. NodeManager does not start

2013-07-25 20:06:22,266 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:196)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:329)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:351)
Caused by: org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.start(ContainerManagerImpl.java:248)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    ... 3 more
Caused by: org.apache.hadoop.yarn.YarnException: Failed to check for existence of remoteLogDir [/yarn/apps]
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:179)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.start(LogAggregationService.java:132)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    ... 5 more

The /yarn/apps directory does in fact exist.

Yet after a restart, NodeManager came up fine, for no apparent reason.

This sometimes happens because the IP address is wrong:

SHUTDOWN_MSG: Shutting down NodeManager at localhost.localdomain/192.168.1.109

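The log shows an address that is not the machine's current IP. Once the IP has been configured correctly, manually or automatically, restart NodeManager.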
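To spot this kind of mismatch before (re)starting NodeManager, one can compare what the node's hostname resolves to with the address the machine actually holds. Below is a minimal sketch using only Python's standard `socket` module; the helper names `resolve` and `hostname_ip_mismatch` are hypothetical, and the current IP is assumed to be supplied by the caller (for example, read off `ip addr`).

```python
import socket


def resolve(hostname: str) -> str:
    """Return the IPv4 address the system resolver (hosts file, DNS) gives for hostname."""
    return socket.gethostbyname(hostname)


def hostname_ip_mismatch(hostname: str, current_ip: str) -> bool:
    """True if hostname resolves to an address other than current_ip.

    current_ip is supplied by the caller (hypothetical workflow: copy it
    from `ip addr`); this helper does not try to discover it itself.
    """
    return resolve(hostname) != current_ip
```

For example, calling `hostname_ip_mismatch("localhost.localdomain", current_ip)` and getting True would suggest the hosts file or DNS still points the NodeManager's hostname at an old address.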



2. NodeManager fails to start again; this error is more common

Caused by: java.net.ConnectException: Call From localhost.localdomain/192.168.1.109 to localhost.localdomain:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
SHUTDOWN_MSG: Shutting down NodeManager at localhost.localdomain/192.168.1.109
************************************************************/
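"Connection refused" means the TCP connection attempt to that host and port was actively rejected, typically because nothing is listening at that address. A small probe makes it easy to test the same endpoint the NM dials, outside of Hadoop. This is a hypothetical helper (`port_is_open` is not a Hadoop or OS tool), sketched with Python's standard library:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP connect to host:port; True if something accepts it."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (nothing listening at host:port), timeouts,
        # unreachable networks and name-resolution failures all end up here
        return False
```

Running `port_is_open("localhost.localdomain", 8031)` on the NM host and getting False reproduces, outside Hadoop, what the NM sees when it tries to reach the RM.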

Check the hosts file:

192.168.1.109 localhost localhost.localdomain
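Each hosts-file line maps one IP address to one or more names, so the line above sends both `localhost` and `localhost.localdomain` to 192.168.1.109 rather than to 127.0.0.1. The sketch below makes that mapping explicit; `parse_hosts` is a hypothetical helper that assumes the usual whitespace-separated format with `#` comments and treats the first line naming a host as the one that wins.

```python
from typing import Dict


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname in hosts-file text to the IP on the first line that lists it."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first mapping seen for a name
    return mapping
```

Feeding it the line above yields `{"localhost": "192.168.1.109", "localhost.localdomain": "192.168.1.109"}`, which shows that any client dialing `localhost` on this machine is sent to the LAN address.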

Checking the YARN web UI: http://192.168.1.109:8088/ cannot be reached.

Yet the system does show a running RM (ResourceManager) process.

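The RM log contains no startup entries. Every time I add debug parameters to the RM process, it stops writing logs altogether; apparently the parameters were not set up correctly.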


After adjusting the parameters, restarting, and attaching Eclipse to the debug port, jps no longer reports the "cannot sync .." error.

However, NodeManager still did not start. Its log shows the same error as above, again on port 8031:

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>127.0.0.1:8031</value>
  <description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.</description>
</property>
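Per its description, `yarn.resourcemanager.resource-tracker.address` is the host and port NodeManagers use to contact the ResourceManager, so it is worth confirming which value a node actually has. Below is a minimal sketch that pulls one property out of a Hadoop-style `<configuration>` XML file with Python's `xml.etree.ElementTree`; `get_property` and `split_address` are hypothetical helpers, and the sketch ignores Hadoop's layering of default and site files as well as variable substitution.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the <property> whose <name> matches, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def split_address(addr: str) -> Tuple[str, int]:
    """Split "host:port" into (host, port); rpartition keeps any earlier colons in host."""
    host, _, port = addr.rpartition(":")
    return host, int(port)
```

Passing the contents of the node's yarn-site.xml and the key above returns the address the NM will dial, for instance `("127.0.0.1", 8031)` after `split_address`.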

The YARN web UI is now reachable, and port 8031 is in fact being listened on:

[root@localhost yuming]# netstat -tln | grep 8031
tcp6       0      0 192.168.1.109:8031      :::*                    LISTEN
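Note that the listener is bound to 192.168.1.109 specifically, while the property above shows 127.0.0.1:8031. A TCP socket bound to one specific address only accepts connections addressed to that IP, whereas a wildcard bind (0.0.0.0 or ::) accepts on every local address. The hypothetical helpers below parse `netstat -tln` output, assuming the standard Linux column layout (local address in the fourth column, state last), and check whether such a listener would accept a connection to a given IP:

```python
from typing import List


def listening_addresses(netstat_output: str, port: int) -> List[str]:
    """Return the local addresses shown as LISTEN on `port` in `netstat -tln` output."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) >= 6 and fields[-1] == "LISTEN":
            host, _, p = fields[3].rpartition(":")
            if p == str(port):
                found.append(host)
    return found


def accepts_connections_to(bound_host: str, target_ip: str) -> bool:
    """Would a listener bound to bound_host accept a TCP connection addressed to target_ip?"""
    # wildcard binds accept on every local address
    # (for "::" this assumes the Linux default of dual-stack IPv6 sockets)
    if bound_host in ("0.0.0.0", "::"):
        return True
    # a specific bind only matches connections sent to exactly that address
    return bound_host == target_ip
```

With the output above, `listening_addresses(output, 8031)` gives `["192.168.1.109"]`, and `accepts_connections_to("192.168.1.109", "127.0.0.1")` is False.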


3. After adding debug parameters to the RM, the NM fails to start yet again:

RM log:

2013-07-29 09:40:48,750 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8031

NM log:

2013-07-29 09:36:17,783 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
Caused by: java.net.ConnectException: Call From localhost.localdomain/192.168.0.137 to localhost.localdomain:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

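The timestamps show that when the NM tried to connect to port 8031, the RM had not opened it yet: the RM's socket reader for 8031 started roughly four and a half minutes after the NM failed (09:36:17 vs. 09:40:48), because the RM was waiting for a debugger to attach.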
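The gap can be read straight off the two log lines. A small sketch, assuming the log4j-style timestamp (`yyyy-MM-dd HH:mm:ss,SSS`) that occupies the first 23 characters of each line shown here; `log_time` is a hypothetical helper:

```python
from datetime import datetime

# timestamp layout of the log lines above, e.g. "2013-07-29 09:40:48,750"
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_time(line: str) -> datetime:
    """Parse the leading 23-character timestamp of a log line."""
    return datetime.strptime(line[:23], LOG_TS_FORMAT)


rm = log_time("2013-07-29 09:40:48,750 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8031")
nm = log_time("2013-07-29 09:36:17,783 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager")
gap = (rm - nm).total_seconds()
print(f"RM started listening on 8031 {gap:.3f} s after the NM failed")
```

This prints a gap of 270.967 seconds, i.e. about 4.5 minutes.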

Starting the NM again on its own is enough to fix it.
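Rather than restarting the NM by hand, a start-up script could wait for the RM's resource-tracker port to accept connections before launching the NM. The following is a minimal polling sketch in Python; `wait_for_port` and its default timeout and interval are illustrative choices, and how the NM is then actually started is left to the existing scripts.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connect succeeds (True) or `timeout` seconds pass (False)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # refused or timed out: the RM is not listening yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

When the RM is suspended waiting for a debugger, polling `("localhost.localdomain", 8031)` with a generous timeout before starting the NM avoids the race seen in the logs above.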


