Exceptions thrown by HBase and Curator Recipes when ZooKeeper fails


1. fps instance 1 (fetching file information):
2015-03-13 04:00:00,656 ERROR HBaseSystem(?) -
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
Fri Mar 13 04:00:00 CST 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@30202e03, java.net.NoRouteToHostException: No route to host
Fri Mar 13 04:00:00 CST 2015, org.apache.hadoop.hbase.client.RpcRetryingCaller@30202e03, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: bdp-dev-2/10.12.75.73:60020

        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:129)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:283)
        at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:188)
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:183)
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:110)
        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:738)
        at cn.***.adapter.HBaseSystem.scaner(HBaseSystem.java:339)
        at cn.***.service.impl.UserServiceImpl.getUserList(UserServiceImpl.java:94)
        at cn.***.service.impl.TaskServiceImpl.executeTask(TaskServiceImpl.java:79)
        at cn.***.service.impl.TaskServiceImpl.executeDelFileTask(TaskServiceImpl.java:73)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65)
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
        at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:81)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: bdp-dev-2/10.12.75.73:60020
        at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
        at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:29966)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1508)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1488)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1231)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1105)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:1082)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:903)
        at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:72)
        at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:125)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:113)
        ... 24 more
2015-03-13 04:00:00,658 ERROR TaskServiceImpl(?) - executeTask para is null

2. fps instance 2 (deleting expired files, which requires acquiring a distributed lock from ZooKeeper)
2015-03-13 04:03:28,172 INFO appMonitor(?) - {"s":"cn.***.controller.FileUploadController.download","b":"null","t":"454","r":"0"}
2015-03-13 04:03:54,183 INFO ClientCnxn(?) - Client session timed out, have not heard from server in 20000ms for sessionid 0x14b591ca4e30910, closing socket connection and attempting reconnect
2015-03-13 04:03:54,283 INFO ConnectionStateManager(?) - State change: SUSPENDED
2015-03-13 04:03:54,301 ERROR DistributedLock(?) - SUSPENDED
2015-03-13 04:03:54,301 ERROR DistributedLock(?) - SUSPENDED
..........
2015-03-13 04:03:54,394 ERROR DistributedLock(?) - SUSPENDED
2015-03-13 04:03:54,394 ERROR DistributedLock(?) - SUSPENDED
2015-03-13 04:03:55,591 INFO ClientCnxn(?) - Opening socket connection to server bdp-dev-3/10.12.75.74:2181. Will not attempt to authenticate using SASL (unknown error)
2015-03-13 04:03:55,633 INFO ClientCnxn(?) - Socket connection established to bdp-dev-3/10.12.75.74:2181, initiating session
2015-03-13 04:03:55,678 INFO ClientCnxn(?) - Session establishment complete on server bdp-dev-3/10.12.75.74:2181, sessionid = 0x14b591ca4e30910, negotiated timeout = 30000
2015-03-13 04:03:55,679 INFO ConnectionStateManager(?) - State change: RECONNECTED
2015-03-13 04:03:55,679 ERROR DistributedLock(?) - RECONNECTED
2015-03-13 04:03:55,679 ERROR DistributedLock(?) - RECONNECTED
2015-03-13 04:03:55,679 ERROR DistributedLock(?) - RECONNECTED
........
2015-03-13 04:03:55,768 ERROR DistributedLock(?) - RECONNECTED
2015-03-13 04:03:55,768 ERROR DistributedLock(?) - RECONNECTED
2015-03-13 04:05:19,567 INFO FileServiceImpl(?) - download, fileId:80000000142659175894500-83548895000000014259005589450980,clientVer:null

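The clocks on these two machines differ by about 4 minutes, which is why the same incident shows up at 04:00:00 in the first log and at 04:03:54 in the second.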

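Cause: ZooKeeper went down for about 1 s for some reason. Because HBase's hbase.client.retries.number was set to only 2, the client exhausted its retries before ZooKeeper was back to normal (even though ZooKeeper needed only about 1 s to recover).
The fix for fps instance 1 is to set HBase's hbase.client.retries.number back to the default value of 30 (the exact default depends on the HBase version).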
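
To illustrate why two attempts were not enough, here is a minimal sketch of a retry budget under a short outage. It does not call HBase. The class name RetryBudget, the 100 ms base pause, and the backoff multipliers are assumptions chosen for illustration; they are not taken from the original setup and are not claimed to match the HBase client's real backoff table.

```java
public class RetryBudget {
    // Hypothetical backoff multipliers, for illustration only.
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100};

    /**
     * Simulates a client that retries with increasing pauses while a
     * dependency is unavailable for outageMs milliseconds.
     * Returns the 1-based attempt that succeeds, or -1 if all
     * maxAttempts attempts are made while the outage is still ongoing.
     */
    public static int firstSuccessfulAttempt(long outageMs, int maxAttempts, long pauseMs) {
        long now = 0; // simulated time elapsed since the first attempt
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (now >= outageMs) {
                return attempt; // the outage is over, this attempt succeeds
            }
            // The attempt failed: sleep before the next one.
            int idx = Math.min(attempt - 1, BACKOFF.length - 1);
            now += pauseMs * BACKOFF[idx];
        }
        return -1; // retries exhausted
    }

    public static void main(String[] args) {
        // 1 s outage, 2 attempts, assumed 100 ms base pause
        System.out.println(firstSuccessfulAttempt(1000, 2, 100));  // prints -1
        // Same outage with a budget of 30 attempts
        System.out.println(firstSuccessfulAttempt(1000, 30, 100)); // prints 5
    }
}
```

Under these assumed numbers, a budget of 2 attempts is used up after roughly 100 ms of waiting, well inside the 1 s outage, while a larger budget lets a later attempt land after recovery.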

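The errors logged by fps instance 2 can simply be ignored: as long as the reconnection succeeds, the distributed lock can be acquired and released normally again. Note that the log shows the session being re-established with the same session id (0x14b591ca4e30910), so the session did not expire during the outage.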
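
The state handling described above can be sketched as follows. This is only an illustration, not the DistributedLock class from the logs: LockStateTracker and its State enum are hypothetical stand-ins that mirror the state names printed in the log (plus LOST), so that the example runs without the Curator library. In a real application the same logic would live in an implementation of Curator's ConnectionStateListener.

```java
import java.util.ArrayList;
import java.util.List;

public class LockStateTracker {
    /** Hypothetical stand-in for the connection states seen in the log, plus LOST. */
    public enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private boolean canProceed = true;     // false while the connection is suspended or lost
    private boolean mustReacquire = false; // true once the connection was reported LOST
    private final List<String> log = new ArrayList<>();

    public void onStateChanged(State newState) {
        switch (newState) {
            case SUSPENDED:
                // Connection interrupted, outcome not known yet: pause lock-protected work.
                canProceed = false;
                log.add("WARN " + newState);
                break;
            case RECONNECTED:
                // Connection restored: lock-protected work can continue.
                canProceed = true;
                log.add("INFO " + newState);
                break;
            case LOST:
                // Connection given up: the lock must be acquired again.
                canProceed = false;
                mustReacquire = true;
                log.add("ERROR " + newState);
                break;
            default:
                log.add("INFO " + newState);
        }
    }

    public boolean canProceed() { return canProceed; }
    public boolean mustReacquire() { return mustReacquire; }
    public List<String> getLog() { return log; }

    public static void main(String[] args) {
        LockStateTracker tracker = new LockStateTracker();
        // Replay the sequence from the log of fps instance 2.
        tracker.onStateChanged(State.SUSPENDED);
        tracker.onStateChanged(State.RECONNECTED);
        System.out.println(tracker.getLog());        // [WARN SUSPENDED, INFO RECONNECTED]
        System.out.println(tracker.canProceed());    // true
        System.out.println(tracker.mustReacquire()); // false
    }
}
```

One design choice in this sketch is logging SUSPENDED and RECONNECTED below ERROR level, so that a short ZooKeeper hiccup like the one above does not flood the error log, while LOST remains the only state that forces the lock to be reacquired.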