Hadoop cluster: crontab jobs not executing
Source: Internet | Editor: 程序博客网 | Date: 2024/05/17 07:41
Today our Hadoop cluster's crontab jobs stopped executing. Running a job manually produced the following error:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException): org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot
delete /user/hdfs/.staging/job_1441592436807_1892. Name node is in safe mode.
The reported blocks 4710619 needs additional 51773 blocks to reach the threshold 1.0000 of total blocks 4762391.
The number of live datanodes 34 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1211)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3354)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3314)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3298)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:733)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete
(ClientNamenodeProtocolServerSideTranslatorPB.java:547)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod
(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
Caused by: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /user/hdfs/.staging/job_1441592436807_1892. Name node is in safe
mode.
The reported blocks 4710619 needs additional 51773 blocks to reach the threshold 1.0000 of total blocks 4762391.
The number of live datanodes 34 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1207)
... 14 more
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
The error says the NameNode is in safe mode. I checked on the namenode with hadoop dfsadmin -safemode get,
but the reported status was OFF, which was strange.
Could the NameNode process have died?
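In an HA cluster, the safe-mode query can return different answers depending on which NameNode the client happens to reach, so it is worth asking each NameNode explicitly. A minimal sketch, assuming the newer hdfs dfsadmin form and hypothetical NameNode addresses (nn1-host/nn2-host and port 8020 are placeholders for your own hdfs-site.xml values):

```shell
# Query safe-mode status on each NameNode explicitly.
# (hdfs dfsadmin is the current form; hadoop dfsadmin is deprecated.)
hdfs dfsadmin -fs hdfs://nn1-host:8020 -safemode get
hdfs dfsadmin -fs hdfs://nn2-host:8020 -safemode get

# If safe mode stays ON even after all DataNodes have reported their
# blocks, it can be left manually -- use with care, since forcing it
# hides whatever kept the block threshold from being reached:
hdfs dfsadmin -safemode leave
```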
I tried transitioning the other NameNode to active:
hdfs haadmin -transitionToActive --forcemanual nn2
nn2 became active. I then checked nn1 with
hdfs haadmin -getServiceState nn1
and, surprisingly, it was still active as well.
I tried manually transitioning it to standby: hdfs haadmin -transitionToStandby --forcemanual nn1
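The sequence above can be sketched as follows. The service IDs nn1/nn2 are whatever dfs.ha.namenodes.&lt;nameservice&gt; defines in your hdfs-site.xml; --forcemanual bypasses the configured fencing checks, so verify afterwards that exactly one node ends up active:

```shell
# Check the HA state of both NameNodes first.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Force nn2 active, then demote nn1 to standby.
# A split-brain (two actives) must be resolved before running jobs.
hdfs haadmin -transitionToActive --forcemanual nn2
hdfs haadmin -transitionToStandby --forcemanual nn1
```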
Sometimes this fails with: forcefence and forceactive flags not supported with auto-failover enabled.
This means automatic failover is enabled, so the state cannot be switched manually; instead, stop the NameNode service on that node and start it again.
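When manual transitions are rejected because automatic failover (ZKFC) is enabled, restarting the NameNode process lets the failover controller re-elect the active/standby roles. A sketch, assuming the scripts are on PATH and you are running as the hdfs user on the affected node (the exact command depends on your Hadoop version):

```shell
# Hadoop 2.x style daemon scripts:
hadoop-daemon.sh stop namenode
hadoop-daemon.sh start namenode

# Hadoop 3.x style:
hdfs --daemon stop namenode
hdfs --daemon start namenode
```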
After some fiddling I ran an MR job and it finally succeeded, so I am writing this down in case it helps anyone who hits the same problem.
As for the root cause, it was most likely the recent combination of heavy MapReduce activity with concurrent -put uploads running at the same time.