HBase RegionServer exits abnormally

Source: Internet | Editor: 程序博客网 | Time: 2024/05/23 10:39
2017-09-23 09:20:54,223 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 28836ms
No GCs detected
2017-09-23 09:20:54,250 INFO  [regionserver60020-SendThread(bis-hadoop-datanode-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 40327ms for sessionid 0x505e897ce7f900e8, closing socket connection and attempting reconnect
2017-09-23 09:20:54,237 WARN  [regionserver60020] util.Sleeper: We slept 31841ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-09-23 09:20:54,238 INFO  [regionserver60020-SendThread(bis-backup-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 41588ms for sessionid 0x875e9aa7d1a10050, closing socket connection and attempting reconnect
2017-09-23 09:20:54,238 INFO  [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 37261ms for sessionid 0x525cd4f74e5cfcab, closing socket connection and attempting reconnect
2017-09-23 09:20:54,237 WARN  [regionserver60020.periodicFlusher] util.Sleeper: We slept 29502ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-09-23 09:21:01,815 INFO  [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Opening socket connection to server bis-hadoop-namenode-s-01/10.10.10.82:2181. Will not attempt to authenticate using SASL (unknown error)
2017-09-23 09:21:01,816 INFO  [regionserver60020-SendThread(bis-hadoop-datanode-s-01:2181)] zookeeper.ClientCnxn: Opening socket connection to server bis-hadoop-datanode-s-01/10.10.10.80:2181. Will not attempt to authenticate using SASL (unknown error)
2017-09-23 09:21:01,816 INFO  [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Socket connection established to bis-hadoop-namenode-s-01/10.10.10.82:2181, initiating session
2017-09-23 09:21:01,816 INFO  [regionserver60020-SendThread(bis-hadoop-datanode-s-01:2181)] zookeeper.ClientCnxn: Socket connection established to bis-hadoop-datanode-s-01/10.10.10.80:2181, initiating session
2017-09-23 09:21:01,966 WARN  [regionserver60020-EventThread] client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:401)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2017-09-23 09:21:02,006 FATAL [regionserver60020-EventThread] regionserver.HRegionServer: ABORTING region server bis-hadoop-datanode-s2d-129,60020,1506121204494: regionserver:60020-0x525cd4f74e5cfcab, quorum=bis-backup-s-01:2181,bis-hadoop-namenode-s-01:2181,bis-hadoop-datanode-s-01:2181, baseZNode=/hbase regionserver:60020-0x525cd4f74e5cfcab received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:401)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
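The symptoms above can be picked out of a long RegionServer log mechanically. The following is a minimal shell sketch. It assumes the JvmPauseMonitor, Sleeper and ClientCnxn messages are worded exactly as in this excerpt. `scan_rs_log` is a hypothetical helper name and is not part of HBase.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of HBase): list the stall-related durations found in a
# RegionServer log as "<kind> <milliseconds>" lines.
# Assumes the JvmPauseMonitor, Sleeper and ClientCnxn messages are worded as in the excerpt.
scan_rs_log() {
  local log=$1
  grep -oE 'pause of approximately [0-9]+ms|We slept [0-9]+ms|have not heard from server in [0-9]+ms' "$log" |
    awk '{
      ms = $0
      gsub(/[^0-9]/, "", ms)                     # keep only the digits of the duration
      if ($0 ~ /^pause/)    kind = "jvm-pause"   # util.JvmPauseMonitor
      else if ($0 ~ /^We/)  kind = "overslept"   # util.Sleeper
      else                  kind = "zk-silence"  # zookeeper.ClientCnxn
      print kind, ms
    }'
}
```

To list the longest stalls, run `scan_rs_log hbase-regionserver.log | sort -k2 -n | tail`. The log file name here is only a placeholder.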



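Analysis: the JVM was paused for about 28836 ms, most likely by a long GC. JvmPauseMonitor itself printed "No GCs detected", so a host-level stall such as swapping is also worth ruling out. A pause like this blocks every thread, including the ZooKeeper client thread that sends heartbeats. The ZooKeeper servers therefore stopped hearing from the RegionServer and expired its session. When the client reconnected, it was told the session had expired, and the RegionServer aborted itself (the FATAL line above).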
The problem can be addressed from two directions:
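1. Increase the ZooKeeper session timeout in hbase-site.xml. Note that the ZooKeeper server still caps the negotiated value at its own maxSessionTimeout: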
<property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>
</property>
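Raising this setting only changes the timeout the RegionServer requests. The ZooKeeper server clamps that request into [minSessionTimeout, maxSessionTimeout], and these default to 2 and 20 times tickTime. The shell sketch below models that rule. It also includes a helper that reads the configured maximum from ZooKeeper's `conf` four-letter command, which assumes four-letter commands are enabled on the server and `nc` is installed. Both function names are hypothetical, and the 2000 ms default tickTime used here is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: how a ZooKeeper server settles the session timeout a client requests.
# The request is clamped into [minSessionTimeout, maxSessionTimeout]; when those are not
# set in zoo.cfg they default to 2 * tickTime and 20 * tickTime.
# Usage (hypothetical helper): negotiated_timeout REQUESTED_MS [TICK_MS] [MIN_MS] [MAX_MS]
negotiated_timeout() {
  local requested=$1 tick=${2:-2000}   # tickTime of 2000 ms is an assumed default
  local min=${3:-$((2 * tick))} max=${4:-$((20 * tick))}
  if [ "$requested" -lt "$min" ]; then
    echo "$min"
  elif [ "$requested" -gt "$max" ]; then
    echo "$max"
  else
    echo "$requested"
  fi
}

# Extract one key from the key=value output of ZooKeeper's "conf" four-letter command, e.g.
#   echo conf | nc bis-hadoop-namenode-s-01 2181 | zk_conf_value maxSessionTimeout
zk_conf_value() {
  awk -F= -v key="$1" '$1 == key { print $2 }'
}
```

With a tickTime of 2000 ms, `negotiated_timeout 120000` prints 40000. In that case the 120000 ms setting has no effect until maxSessionTimeout is also raised on every ZooKeeper server. Set it in zoo.cfg, or through HBase's hbase.zookeeper.property.* passthrough when HBase manages ZooKeeper.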
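2. Keep GC pauses from breaking the ZooKeeper session by tuning HBASE_REGIONSERVER_OPTS in hbase-env.sh. The settings below use a fixed 6000 MB heap, the ParNew + CMS collectors, and a GC log: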
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Xmx6000m -Xms6000m -Xmn2250m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -Xloggc:/home/hadoop/hbase-0.96.2-hadoop2/logs/gc.log"
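Here is what these heap flags mean in numbers. -Xmn carves the young generation out of the 6000 MB heap. CMSInitiatingOccupancyFraction is a percentage of the remaining old generation, so a concurrent collection starts well before the old generation fills up. The small shell calculation below works this out. It takes its values from the export line above and ignores PermGen/Metaspace, which HotSpot sizes outside -Xmx.

```bash
#!/usr/bin/env bash
# Worked arithmetic for the heap flags above (values in MB, copied from the export line).
heap_mb=6000        # -Xmx6000m / -Xms6000m
young_mb=2250       # -Xmn2250m: young generation (eden + survivor spaces)
cms_fraction=70     # -XX:CMSInitiatingOccupancyFraction=70

old_mb=$(( heap_mb - young_mb ))                   # tenured (old) generation size
cms_trigger_mb=$(( old_mb * cms_fraction / 100 ))  # old-gen occupancy at which CMS starts

echo "old generation: ${old_mb} MB"
echo "CMS starts at:  ${cms_trigger_mb} MB"
```

This prints an old generation of 3750 MB and a CMS start point of 2625 MB. The intent is that CMS starts early enough to finish concurrently. If it falls behind (a "concurrent mode failure"), the JVM falls back to a stop-the-world full collection, which is the kind of long pause this post is about. Without -XX:+UseCMSInitiatingOccupancyOnly, HotSpot treats the fraction only as a starting point and may pick later start points on its own.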

Restart the cluster so the changes take effect.
