[Cluster Troubleshooting] Resolving HBase's "Failed deleting my ephemeral node" Error


One of our HBase RegionServers went down for no apparent reason. I read through the log below and still could not tell why.

-----------------------------------------------------------------------------------------------------
5:18:27.795 PM    WARN    org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper    

Node /hbase/rs/0.0.0.0,60020,1411723106922 already deleted, retry=false

5:18:27.795 PM    WARN    org.apache.hadoop.hbase.regionserver.HRegionServer    

Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/rs/0.0.0.0,60020,1411723106922
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:156)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1265)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1254)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1212)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:942)
    at java.lang.Thread.run(Thread.java:744)

5:18:27.816 PM    INFO    org.apache.zookeeper.ZooKeeper    

Session: 0x148b13fd9000018 closed

5:18:27.816 PM    INFO    org.apache.zookeeper.ClientCnxn    

EventThread shut down

5:18:27.816 PM    INFO    org.apache.hadoop.hbase.regionserver.HRegionServer    

stopping server null; zookeeper connection closed.

5:18:27.816 PM    INFO    org.apache.hadoop.hbase.regionserver.HRegionServer    

regionserver60020 exiting

5:18:27.816 PM    ERROR    org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine    

Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:66)
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:85)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2320)
-----------------------------------------------------------------------------------------------------


Just as this was starting to give me a headache, I scrolled further back in the log and found the answer.

-----------------------------------------------------------------------------------------------------
5:18:27.634 PM    FATAL    org.apache.hadoop.hbase.regionserver.HRegionServer    

Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server bdtest04,60020,1411723106922 has been rejected; Reported time is too far out of sync with master.  Time difference of 39663ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
    at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1291)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1935)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:781)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ClockOutOfSyncException): org.apache.hadoop.hbase.ClockOutOfSyncException: Server bdtest04,60020,1411723106922 has been rejected; Reported time is too far out of sync with master.  Time difference of 39663ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
    at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1291)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1449)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:5402)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1933)
    ... 2 more

5:18:27.650 PM    FATAL    org.apache.hadoop.hbase.regionserver.HRegionServer    

ABORTING region server 0.0.0.0,60020,1411723106922: Unhandled: org.apache.hadoop.hbase.ClockOutOfSyncException: Server bdtest04,60020,1411723106922 has been rejected; Reported time is too far out of sync with master.  Time difference of 39663ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
    at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1291)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server bdtest04,60020,1411723106922 has been rejected; Reported time is too far out of sync with master.  Time difference of 39663ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
    at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1291)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1935)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:781)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ClockOutOfSyncException): org.apache.hadoop.hbase.ClockOutOfSyncException: Server bdtest04,60020,1411723106922 has been rejected; Reported time is too far out of sync with master.  Time difference of 39663ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
    at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1291)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1449)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:5402)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1933)
    ... 2 more

5:18:27.652 PM    FATAL    org.apache.hadoop.hbase.regionserver.HRegionServer    

RegionServer abort: loaded coprocessors are: []
-----------------------------------------------------------------------------------------------------


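The cause is clearly clock synchronization: the Master rejected the RegionServer at startup because its reported time was 39663 ms off, above the 30000 ms limit. The "Failed deleting my ephemeral node" warning that comes later is only a side effect of the resulting abort. I checked the clocks on the RegionServer and Master machines, and sure enough they did not agree.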
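
To make the numbers in the rejection message easier to pull out when scanning a long log, here is a minimal sketch in Python. It assumes the message keeps the wording seen in the excerpt above ("Time difference of ...ms > max allowed of ...ms"); the function name parse_clock_skew is hypothetical and not part of HBase.

```python
import re

# Pattern for the rejection message as it appears in the log excerpt above.
# Assumption: other HBase versions may word this message differently.
SKEW_RE = re.compile(r"Time difference of (\d+)ms > max allowed of (\d+)ms")


def parse_clock_skew(log_line):
    """Return (skew_ms, max_allowed_ms) from a ClockOutOfSyncException line, or None."""
    match = SKEW_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = (
    "org.apache.hadoop.hbase.ClockOutOfSyncException: Server "
    "bdtest04,60020,1411723106922 has been rejected; Reported time is too far "
    "out of sync with master.  Time difference of 39663ms > max allowed of 30000ms"
)
skew_ms, max_ms = parse_clock_skew(line)
print(f"skew {skew_ms} ms exceeds the limit by {skew_ms - max_ms} ms")
```

For the line from this incident, the sketch reports a skew of 39663 ms against a limit of 30000 ms.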

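My guess is that although every machine in the cluster syncs against the same external server, that server is a public time server located overseas, so the syncs are presumably unreliable and the machines' clocks drift apart.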

The fix is to have only the Master machine sync against the external server, and have each RegionServer sync against the Master.
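
After re-syncing, it helps to confirm that the two clocks really are within the limit before restarting the RegionServer. The sketch below mirrors the comparison implied by the log message (absolute difference against a maximum); it is not HBase's actual code, and the names is_clock_in_sync and MAX_SKEW_MS are hypothetical. The 30000 ms value is taken from the log; in HBase this limit is, as far as I know, controlled by the hbase.master.maxclockskew setting, so check the documentation for your version.

```python
import time

# Limit reported in the log above. Assumption: it comes from the
# hbase.master.maxclockskew setting on the Master.
MAX_SKEW_MS = 30000


def is_clock_in_sync(master_ms, regionserver_ms, max_skew_ms=MAX_SKEW_MS):
    """True when the absolute clock difference is within the allowed skew."""
    return abs(master_ms - regionserver_ms) <= max_skew_ms


# Simulated readings in milliseconds; on a real cluster these would be
# taken on each machine at (nearly) the same moment.
master_now = int(time.time() * 1000)
print(is_clock_in_sync(master_now, master_now + 39663))  # skew from the log
print(is_clock_in_sync(master_now, master_now + 500))    # small skew after re-sync
```

The first call uses the 39663 ms skew from this incident and is rejected; the second stands in for a RegionServer that has been re-synced against the Master.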
