hbase hregion server 异常退出排查
来源:互联网 发布:mac beta 编辑:程序博客网 时间:2024/06/16 04:47
---HBASE日志
2017-10-25 23:01:15,788 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 2907ms
GC pool 'ParNew' had collection(s): count=1 time=3312ms
2017-10-25 23:01:29,859 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 4569ms
GC pool 'ParNew' had collection(s): count=1 time=4746ms
2017-10-25 23:01:37,146 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 2285ms
GC pool 'ParNew' had collection(s): count=1 time=2530ms
2017-10-25 23:01:44,387 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 5240ms
GC pool 'ParNew' had collection(s): count=1 time=5417ms
2017-10-25 23:02:06,587 WARN [regionserver60020] util.Sleeper: We slept 22194ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:06,588 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 20200ms
GC pool 'ParNew' had collection(s): count=1 time=20520ms
2017-10-25 23:02:06,587 WARN [regionserver60020] util.Sleeper: We slept 22194ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:06,588 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 20200ms
GC pool 'ParNew' had collection(s): count=1 time=20520ms
2017-10-25 23:02:06,580 WARN [regionserver60020.compactionChecker] util.Sleeper: We slept 22193ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:06,594 WARN [regionserver60020.periodicFlusher] util.Sleeper: We slept 22197ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:06,596 INFO [regionserver60020-SendThread(bis-backup-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 29326ms for sessionid 0x505e897ce7f9032d, closing socket connection and attempting reconnect
2017-10-25 23:02:06,595 INFO [regionserver60020-SendThread(bis-hadoop-datanode-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 29374ms for sessionid 0x505e897ce7f9032e, closing socket connection and attempting reconnect
2017-10-25 23:02:06,659 WARN [RpcServer.handler=13,port=60020] ipc.RpcServer: (responseTooSlow): {"processingtimems":20589,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"10.10.10.51:34518","starttimems":1508943706054,"queuetimems":22,"class":"HRegionServer","responsesize":791,"method":"Multi"}
2017-10-25 23:02:06,962 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Opening socket connection to server bis-hadoop-namenode-s-01/10.10.10.82:2181. Will not attempt to authenticate using SASL (unknown error)
2017-10-25 23:02:06,963 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Socket connection established to bis-hadoop-namenode-s-01/10.10.10.82:2181, initiating session
2017-10-25 23:02:06,964 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Session establishment complete on server bis-hadoop-namenode-s-01/10.10.10.82:2181, sessionid = 0x505e897ce7f9032e, negotiated timeout = 40000
2017-10-25 23:02:07,014 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Opening socket connection to server bis-hadoop-namenode-s-01/10.10.10.82:2181. Will not attempt to authenticate using SASL (unknown error)
2017-10-25 23:02:07,020 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Socket connection established to bis-hadoop-namenode-s-01/10.10.10.82:2181, initiating session
2017-10-25 23:02:07,021 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Session establishment complete on server bis-hadoop-namenode-s-01/10.10.10.82:2181, sessionid = 0x505e897ce7f9032d, negotiated timeout = 40000
2017-10-25 23:02:58,896 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 51874ms for sessionid 0x505e897ce7f9032d, closing socket connection and attempting reconnect
2017-10-25 23:02:58,896 WARN [regionserver60020.periodicFlusher] util.Sleeper: We slept 52300ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:58,896 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 51932ms for sessionid 0x505e897ce7f9032e, closing socket connection and attempting reconnect
2017-10-25 23:02:58,898 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 50309ms
GC pool 'ParNew' had collection(s): count=1 time=50566ms
2017-10-25 23:02:58,898 WARN [regionserver60020] util.Sleeper: We slept 52273ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:58,907 WARN [regionserver60020.compactionChecker] util.Sleeper: We slept 52311ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:58,910 WARN [ResponseProcessor for block BP-267413815-16.6.10.82-1400466369233:blk_1317254303_243853019] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-267413815-16.6.10.82-1400466369233:blk_1317254303_243853019
java.io.EOFException: Premature EOF: no length prefix available
2017-10-25 23:02:59,088 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server bis-hadoop-datanode-s-01,60020,1508493905842: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing bis-hadoop-datanode-s-01,60020,1508493905842 as dead server
2017-10-25 23:02:59,369 FATAL [regionserver60020.logRoller] regionserver.HRegionServer: ABORTING region server bis-hadoop-datanode-s-01,60020,1508493905842: IOE in log roller
java.io.IOException: cannot get log writer
Caused by: java.io.FileNotFoundException: Parent directory doesn't exist: /hbase/WALs/bis-hadoop-datanode-s-01,60020,1508493905842
2017-10-25 23:02:59,424 FATAL [RpcServer.handler=26,port=60020] wal.FSHLog: Could not sync. Requesting roll of hlog
org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-267413815-16.6.10.82-1400466369233:blk_1317254303_243853019 does not exist or is not under Constructionblk_1317254303_243853123
----zookeeper server 日志
2017-10-25 23:02:07,022 [myid:82] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x505e897ce7f9032d with negotiated timeout 40000 for client /10.10.10.80:40602
2017-10-25 23:02:48,000 [myid:82] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x505e897ce7f9032d, timeout of 40000ms exceeded
2017-10-25 23:02:48,000 [myid:82] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x505e897ce7f9032e, timeout of 40000ms exceeded
2017-10-25 23:02:48,001 [myid:82] - INFO [ProcessThread(sid:82 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x505e897ce7f9032d
2017-10-25 23:02:48,004 [myid:82] - INFO [ProcessThread(sid:82 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x505e897ce7f9032e
2017-10-25 23:02:48,013 [myid:82] - INFO [CommitProcessor:82:NIOServerCnxn@1001] - Closed socket connection for client /10.10.10.80:40602 which had sessionid 0x505e897ce7f9032d
2017-10-25 23:01:15,788 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 2907ms
GC pool 'ParNew' had collection(s): count=1 time=3312ms
2017-10-25 23:01:29,859 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 4569ms
GC pool 'ParNew' had collection(s): count=1 time=4746ms
2017-10-25 23:01:37,146 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 2285ms
GC pool 'ParNew' had collection(s): count=1 time=2530ms
2017-10-25 23:01:44,387 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 5240ms
GC pool 'ParNew' had collection(s): count=1 time=5417ms
2017-10-25 23:02:06,587 WARN [regionserver60020] util.Sleeper: We slept 22194ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:06,588 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 20200ms
GC pool 'ParNew' had collection(s): count=1 time=20520ms
2017-10-25 23:02:06,587 WARN [regionserver60020] util.Sleeper: We slept 22194ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:06,588 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 20200ms
GC pool 'ParNew' had collection(s): count=1 time=20520ms
2017-10-25 23:02:06,580 WARN [regionserver60020.compactionChecker] util.Sleeper: We slept 22193ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:06,594 WARN [regionserver60020.periodicFlusher] util.Sleeper: We slept 22197ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:06,596 INFO [regionserver60020-SendThread(bis-backup-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 29326ms for sessionid 0x505e897ce7f9032d, closing socket connection and attempting reconnect
2017-10-25 23:02:06,595 INFO [regionserver60020-SendThread(bis-hadoop-datanode-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 29374ms for sessionid 0x505e897ce7f9032e, closing socket connection and attempting reconnect
2017-10-25 23:02:06,659 WARN [RpcServer.handler=13,port=60020] ipc.RpcServer: (responseTooSlow): {"processingtimems":20589,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"10.10.10.51:34518","starttimems":1508943706054,"queuetimems":22,"class":"HRegionServer","responsesize":791,"method":"Multi"}
2017-10-25 23:02:06,962 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Opening socket connection to server bis-hadoop-namenode-s-01/10.10.10.82:2181. Will not attempt to authenticate using SASL (unknown error)
2017-10-25 23:02:06,963 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Socket connection established to bis-hadoop-namenode-s-01/10.10.10.82:2181, initiating session
2017-10-25 23:02:06,964 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Session establishment complete on server bis-hadoop-namenode-s-01/10.10.10.82:2181, sessionid = 0x505e897ce7f9032e, negotiated timeout = 40000
2017-10-25 23:02:07,014 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Opening socket connection to server bis-hadoop-namenode-s-01/10.10.10.82:2181. Will not attempt to authenticate using SASL (unknown error)
2017-10-25 23:02:07,020 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Socket connection established to bis-hadoop-namenode-s-01/10.10.10.82:2181, initiating session
2017-10-25 23:02:07,021 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Session establishment complete on server bis-hadoop-namenode-s-01/10.10.10.82:2181, sessionid = 0x505e897ce7f9032d, negotiated timeout = 40000
2017-10-25 23:02:58,896 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 51874ms for sessionid 0x505e897ce7f9032d, closing socket connection and attempting reconnect
2017-10-25 23:02:58,896 WARN [regionserver60020.periodicFlusher] util.Sleeper: We slept 52300ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:58,896 INFO [regionserver60020-SendThread(bis-hadoop-namenode-s-01:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 51932ms for sessionid 0x505e897ce7f9032e, closing socket connection and attempting reconnect
2017-10-25 23:02:58,898 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 50309ms
GC pool 'ParNew' had collection(s): count=1 time=50566ms
2017-10-25 23:02:58,898 WARN [regionserver60020] util.Sleeper: We slept 52273ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:58,907 WARN [regionserver60020.compactionChecker] util.Sleeper: We slept 52311ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2017-10-25 23:02:58,910 WARN [ResponseProcessor for block BP-267413815-16.6.10.82-1400466369233:blk_1317254303_243853019] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-267413815-16.6.10.82-1400466369233:blk_1317254303_243853019
java.io.EOFException: Premature EOF: no length prefix available
2017-10-25 23:02:59,088 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server bis-hadoop-datanode-s-01,60020,1508493905842: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing bis-hadoop-datanode-s-01,60020,1508493905842 as dead server
2017-10-25 23:02:59,369 FATAL [regionserver60020.logRoller] regionserver.HRegionServer: ABORTING region server bis-hadoop-datanode-s-01,60020,1508493905842: IOE in log roller
java.io.IOException: cannot get log writer
Caused by: java.io.FileNotFoundException: Parent directory doesn't exist: /hbase/WALs/bis-hadoop-datanode-s-01,60020,1508493905842
2017-10-25 23:02:59,424 FATAL [RpcServer.handler=26,port=60020] wal.FSHLog: Could not sync. Requesting roll of hlog
org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-267413815-16.6.10.82-1400466369233:blk_1317254303_243853019 does not exist or is not under Constructionblk_1317254303_243853123
----zookeeper server 日志
2017-10-25 23:02:07,022 [myid:82] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x505e897ce7f9032d with negotiated timeout 40000 for client /10.10.10.80:40602
2017-10-25 23:02:48,000 [myid:82] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x505e897ce7f9032d, timeout of 40000ms exceeded
2017-10-25 23:02:48,000 [myid:82] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x505e897ce7f9032e, timeout of 40000ms exceeded
2017-10-25 23:02:48,001 [myid:82] - INFO [ProcessThread(sid:82 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x505e897ce7f9032d
2017-10-25 23:02:48,004 [myid:82] - INFO [ProcessThread(sid:82 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x505e897ce7f9032e
2017-10-25 23:02:48,013 [myid:82] - INFO [CommitProcessor:82:NIOServerCnxn@1001] - Closed socket connection for client /10.10.10.80:40602 which had sessionid 0x505e897ce7f9032d
2017-10-25 23:02:48,013 [myid:82] - INFO [CommitProcessor:82:NIOServerCnxn@1001] - Closed socket connection for client /10.10.10.80:40601 which had sessionid 0x505e897ce7f9032e
分析hbase和zk日志:
发现23点01-03分左右出现好几次大的gc,时间都在20s,最后一次在52s左右,期间服务全被堵塞了,无法响应外部请求;由于超出了zk允许的最大中断时间(40s),zk服务自动剔除该hbase节点。hbase gc后试图继续与zk连续,但得不到响应,故退出服务。
阅读全文
0 0
- hbase hregion server 异常退出排查
- Hbase HregionServer Hregion Hmaser
- HBase source code. HRegion
- hbase regionserver异常退出
- aix下进程异常退出问题排查
- hbase.hregion.max.filesize 默认值 到底是多少
- HBase源代码调试(6)-HRegion的启动
- hbase日志出现Session expired异常排查方法
- hbase参数调优 hbase.hregion.max.filesize
- traffic server回源异常故障的排查
- hbase 异常 Server is not running yet
- 一次软件异常退出的排查(IE、Adobe Reader无法打开问题)
- Oracle定时JOB任务异常退出排查查看Oracle日志文件
- tomcat 启动异常排查
- kafka ZookeeperConsumerConnector 异常排查
- 线上异常排查总结
- 流量异常排查
- SecondaryNameNode异常排查
- Java+MySQL:插入一条数据并删除上一条
- [剑指offer]算法4 重建二叉树
- Spark安装搭建与使用
- Js中new操作符做了什么?
- eclipse 项目出现红色感叹号
- hbase hregion server 异常退出排查
- 科大讯飞胡郁:再过四到五年时间,机器语音识别就能达到人类水准
- 赫尔曼·赫尔曼:全球机器人顶尖专家中的财神爷
- 电视猫:创新思维打造内容营销新模式
- 为什么说新兴市场有着数十亿用户的市场潜力,但硅谷都看不见
- uva10000
- Statement和PreparedStatement之间的区别
- 洛谷10月月赛R2·浴谷八连测R4
- 面向接口编程