Troubleshooting frequent phoenix-hbase service crashes

The monitoring system was recently rebuilt on Phoenix + HBase, and monitoring in the demo environment has been failing frequently; a first look showed that HBase was down.
Digging through the logs revealed the chain of causes: CentOS 7.0 does not ship the fuser command by default, so Hadoop HA failover (fencing) failed, the Hadoop cluster went down, and HBase went down with it.
The NameNode itself crashed because the ZooKeeper-related timeout was set too small.

The detailed troubleshooting steps follow.

1. First, check the hbase-master log. It shows that HBase shut down because it could not connect to the Hadoop (HDFS) cluster.

2017-08-01 11:46:46,304 INFO  [master/hadoop171/172.16.31.171:60000] regionserver.HRegionServer: stopping server hadoop171,60000,1501497159832; zookeeper connection closed.
2017-08-01 11:46:46,304 INFO  [master/hadoop171/172.16.31.171:60000] regionserver.HRegionServer: master/hadoop171/172.16.31.171:60000 exiting
2017-08-01 11:46:46,307 ERROR [Thread-7] hdfs.DFSClient: Failed to close inode 63944
java.net.ConnectException: Call From hadoop171/172.16.31.171 to hadoop171:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
        at org.apache.hadoop.ipc.Client.call(Client.java:1475)
        at org.apache.hadoop.ipc.Client.call(Client.java:1408)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:404)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
        at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
        at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1704)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1500)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:668)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)

2. Next, check the hadoop-namenode log. It shows the NameNode shut down after timing out (20000 ms) waiting for a quorum of JournalNodes to respond.
The cluster is configured for HA, so when one NameNode dies the other should normally take over; a quick way to check which NameNode is currently active is sketched after the log below.

2017-08-03 05:31:30,999 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for startLogSegment(562081). Succeeded so far: [172.16.31.171:8485]
2017-08-03 05:31:31,984 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: starting log segment 562081 failed for required journal (JournalAndStream(mgr=QJM to [172.16.31.171:8485, 172.16.31.172:8485, 172.16.31.173:8485], stream=null))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.startLogSegment(QuorumJournalManager.java:403)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$3.apply(JournalSet.java:222)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:219)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:1206)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1175)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1249)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6422)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1003)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
        at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
2017-08-03 05:31:31,987 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-08-03 05:31:31,996 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop171/172.16.31.171
************************************************************/
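To confirm what the HA state actually is after an event like this, the NameNode service states can be queried with hdfs haadmin. A minimal check, assuming the HA service IDs are named nn1 and nn2 (substitute the values of dfs.ha.namenodes.<nameservice> from your hdfs-site.xml):

[root@hadoop171 ~]# hdfs haadmin -getServiceState nn1    # prints active or standby; fails if that NameNode is down
[root@hadoop171 ~]# hdfs haadmin -getServiceState nn2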

3. Next, examine the zkfc logs.
The zkfc log on hadoop171 shows that 171 quit the master election; the relevant log is below.

2017-08-03 05:31:32,700 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at hadoop171/172.16.31.171:9000: java.io.EOFException End of File Exception between local host is: "hadoop171/172.16.31.171"; destination host is: "hadoop171":9000; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
2017-08-03 05:31:32,701 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_NOT_RESPONDING
2017-08-03 05:31:32,701 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop171/172.16.31.171:9000 entered state: SERVICE_NOT_RESPONDING
2017-08-03 05:31:32,704 WARN org.apache.hadoop.hdfs.tools.DFSZKFailoverController: Can't get local NN thread dump due to Connection refused
2017-08-03 05:31:32,704 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hadoop171/172.16.31.171:9000 and marking that fencing is necessary
2017-08-03 05:31:32,704 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2017-08-03 05:31:32,756 INFO org.apache.zookeeper.ZooKeeper: Session: 0x35d9be43dc1019b closed
2017-08-03 05:31:32,756 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x35d9be43dc1019b
2017-08-03 05:31:32,756 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2017-08-03 05:31:34,758 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop171/172.16.31.171:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)

The zkfc log on hadoop172 shows that, to complete the failover, HA must first fence hadoop171.
During the fencing attempt the remote command reports "fuser: command not found": CentOS 7 does not ship fuser by default, so fencing fails every time and the failover is stuck in an endless retry loop. A manual check that reproduces the failing step is sketched after the log below.
Reference: http://f.dataguru.cn/hadoop-707120-1-1.html

2017-08-03 05:31:32,812 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2017-08-03 05:31:32,813 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0362656812036e6e311a096861646f6f7031373120a84628d33e
2017-08-03 05:31:32,816 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at hadoop171/172.16.31.171:9000
2017-08-03 05:31:33,822 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop171/172.16.31.171:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2017-08-03 05:31:33,921 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at hadoop171/172.16.31.171:9000 standby (unable to connect)
java.net.ConnectException: Call From hadoop172/172.16.31.172 to hadoop171:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
        at org.apache.hadoop.ipc.Client.call(Client.java:1475)
        at org.apache.hadoop.ipc.Client.call(Client.java:1408)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
        at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:511)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:502)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:60)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:888)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:909)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:808)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:417)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524)
        at org.apache.hadoop.ipc.Client.call(Client.java:1447)
        ... 14 more
2017-08-03 05:31:33,927 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2017-08-03 05:31:33,927 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2017-08-03 05:31:34,092 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop171...
2017-08-03 05:31:34,096 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop171 port 22
2017-08-03 05:31:34,104 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
2017-08-03 05:31:34,122 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_6.6.1
2017-08-03 05:31:34,122 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
2017-08-03 05:31:34,123 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
2017-08-03 05:31:35,373 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
2017-08-03 05:31:35,376 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
2017-08-03 05:31:35,376 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
2017-08-03 05:31:35,377 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
2017-08-03 05:31:35,377 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
2017-08-03 05:31:35,430 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
2017-08-03 05:31:35,431 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
2017-08-03 05:31:35,447 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
2017-08-03 05:31:35,450 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop171' (RSA) to the list of known hosts.
2017-08-03 05:31:35,451 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
2017-08-03 05:31:35,451 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
2017-08-03 05:31:35,456 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
2017-08-03 05:31:35,457 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
2017-08-03 05:31:35,459 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
2017-08-03 05:31:35,460 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
2017-08-03 05:31:35,468 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
2017-08-03 05:31:35,468 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
2017-08-03 05:31:35,628 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentication succeeded (publickey).
2017-08-03 05:31:35,629 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connected to hadoop171
2017-08-03 05:31:35,629 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Looking for process running on port 9000
2017-08-03 05:31:35,840 WARN org.apache.hadoop.ha.SshFenceByTcpPort: PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp 9000 via ssh: bash: fuser: command not found
2017-08-03 05:31:35,844 INFO org.apache.hadoop.ha.SshFenceByTcpPort: rc: 127
2017-08-03 05:31:35,844 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop171 port 22
2017-08-03 05:31:35,847 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2017-08-03 05:31:35,847 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2017-08-03 05:31:35,847 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed
2017-08-03 05:31:35,905 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at hadoop171/172.16.31.171:9000
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:530)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:502)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:60)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:888)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:909)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:808)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:417)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2017-08-03 05:31:35,906 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2017-08-03 05:31:35,967 INFO org.apache.zookeeper.ZooKeeper: Session: 0x35d9be43dc1019c closed
2017-08-03 05:31:36,968 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop171:2181,hadoop172:2181,hadoop173:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@6562a9e9
2017-08-03 05:31:36,973 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop173/172.16.31.173:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-03 05:31:37,731 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /172.16.31.172:52192, server: hadoop173/172.16.31.173:2181
2017-08-03 05:31:37,952 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop173/172.16.31.173:2181, sessionid = 0x35d9be43dc1021b, negotiated timeout = 5000
2017-08-03 05:31:37,955 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2017-08-03 05:31:37,956 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2017-08-03 05:31:38,047 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2017-08-03 05:31:38,054 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0362656812036e6e311a096861646f6f7031373120a84628d33e
2017-08-03 05:31:38,056 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at hadoop171/172.16.31.171:9000
2017-08-03 05:31:39,061 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop171/172.16.31.171:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2017-08-03 05:31:39,064 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at hadoop171/172.16.31.171:9000 standby (unable to connect)
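The failing step can be reproduced by hand from hadoop172 before changing anything. This is the same lookup SshFenceByTcpPort runs (taken from the WARN line above), but without the -k flag so nothing gets killed:

[root@hadoop172 ~]# ssh hadoop171 'PATH=$PATH:/sbin:/usr/sbin fuser -v -n tcp 9000'
bash: fuser: command not found
# exits with code 127 -- the same rc: 127 that zkfc logs when fencing fails

Once fuser is installed (next step), the same command should instead list the process bound to port 9000.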

4. At this point the root cause is clear; the fix is to install the package that provides fuser.

Install fuser on both the active and standby NameNode hosts (it is not needed on the DataNode hosts):

[root@server101 ~]# yum -y install psmisc
[root@server102 ~]# yum -y install psmisc
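
After installing psmisc, it is worth confirming that fuser now resolves over ssh, since that is exactly how ZKFC will invoke it the next time it has to fence:

[root@server101 ~]# ssh server102 'command -v fuser'    # expect /usr/sbin/fuser (or /sbin/fuser)
[root@server102 ~]# ssh server101 'command -v fuser'

For reference, the sshfence setup implied by the zkfc log looks roughly like the sketch below in hdfs-site.xml; the key path is an assumption, not copied from this cluster. Some deployments also append shell(/bin/true) as a last-resort fencing method so a single broken fencer cannot block failover forever, but that is a separate trade-off and not part of the fix described here.

<!-- sketch of the sshfence configuration implied by the logs; adjust key path to your setup -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>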

To also remove the timeout risk, increase the timeout from 20000 ms to 50000 ms; the sketch below shows which settings this maps to.
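
Which property this corresponds to depends on which timeout is meant: the 20000 ms in the NameNode FATAL is the JournalNode quorum timeout, while the zkfc log shows a 5000 ms ZooKeeper session timeout. A hedged sketch of both (standard Hadoop property names; verify the defaults for your version, and note that ZooKeeper caps the negotiated session timeout at its maxSessionTimeout, 20 x tickTime by default):

<!-- hdfs-site.xml: QJM timeouts (the 20000 ms seen in the startLogSegment FATAL above) -->
<property>
  <name>dfs.qjournal.start-segment.timeout.ms</name>
  <value>50000</value>
</property>
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>50000</value>
</property>

<!-- core-site.xml: ZKFC session timeout with ZooKeeper (sessionTimeout=5000 in the zkfc log above) -->
<property>
  <name>ha.zookeeper.session-timeout.ms</name>
  <value>50000</value>
</property>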

5. Later, a closer reading of the logs showed that, just before the Hadoop problem started, HBase had executed a balance operation (a note on temporarily disabling the balancer follows the log below).
For background on the HBase balancer, see http://openinx.github.io/2016/06/21/hbase-balance/

2017-08-03 05:30:04,345 TRACE [hadoop171,60000,1501568949531_ChoreService_2] access.AccessController: Access allowed for user hadoop; reason: Global check allowed; remote address: ; request: balance; context: (user=hadoop, scope=GLOBAL, action=ADMIN)
2017-08-03 05:30:04,349 DEBUG [htable-pool260-t1] ipc.RpcClientImpl: Use SIMPLE authentication for service ClientService, sasl=false
2017-08-03 05:30:04,349 DEBUG [htable-pool260-t1] ipc.RpcClientImpl: Connecting to hadoop172/172.16.31.172:60020
2017-08-03 05:30:05,994 DEBUG [hadoop171,60000,1501568949531_ChoreService_2] balancer.StochasticLoadBalancer: Finished computing new load balance plan.  Computation took 1646ms to try 73600 different iterations.  Found a solution that moves 16 regions; Going from a computed cost of 402.57591499759764 to a new cost of 87.37110284715531
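
If the working theory is that this balancer run added region-move load at exactly the wrong moment, the balancer can be switched off from the hbase shell while investigating and switched back on afterwards (balance_switch prints the previous setting). This is only an investigative aid, not part of the fix above:

hbase(main):001:0> balance_switch false
hbase(main):002:0> balance_switch true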

Done.