HBase exceptions: RegionServer crashes
Source: Internet  Editor: 程序博客网  Time: 2024/05/22 09:45
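1. Crashes outright, with no exception in the logs
We installed HBase from Cloudera's distribution and everything looked normal, but as soon as we started writing data to HBase it died within moments, without leaving any exception in the logs.
We later found that Cloudera had allocated only 50 MB of memory to HBase. It ran into an OOM almost immediately, and there was not even enough memory left to print an error log, so the process simply died.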
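A guard against this failure mode can be sketched as a small check on the JVM options a RegionServer is started with. The snippet is an illustrative sketch, not part of the original setup: the function names and the 1 GiB floor are assumptions, and it only inspects the `-Xmx` flag in whatever options string it is given.

```python
import re

# Multipliers for the size suffixes accepted by the -Xmx flag.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def max_heap_bytes(jvm_opts):
    """Return the value of the last -Xmx flag in bytes, or None if absent."""
    matches = re.findall(r"-Xmx(\d+)([kKmMgGtT]?)(?=\s|$)", jvm_opts)
    if not matches:
        return None
    # When the flag is repeated, HotSpot generally uses the last occurrence.
    number, unit = matches[-1]
    return int(number) * _UNITS[unit.lower()]


def heap_is_too_small(jvm_opts, minimum_bytes=1024 ** 3):
    """True when -Xmx is set below minimum_bytes (assumed 1 GiB floor)."""
    heap = max_heap_bytes(jvm_opts)
    return heap is not None and heap < minimum_bytes
```

For an options string such as `-Xms50m -Xmx50m`, `heap_is_too_small` returns `True`, which is the situation described above.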
2. A cluster of 20-plus nodes, with one or two going down occasionally
9:45:45.455 PM  WARN  org.apache.hadoop.hbase.util.Sleeper  We slept 44340ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
9:45:45.455 PM  WARN  org.apache.hadoop.hbase.util.Sleeper  We slept 44338ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
9:45:45.455 PM  INFO  org.apache.zookeeper.ClientCnxn  Client session timed out, have not heard from server in 54340ms for sessionid 0x34859dbfa060904, closing socket connection and attempting reconnect
9:45:45.454 PM  WARN  org.apache.hadoop.ipc.RpcServer  RpcServer.handler=18,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
9:45:59.606 PM  INFO  org.apache.zookeeper.ClientCnxn  Client session timed out, have not heard from server in 54339ms for sessionid 0x1485cee1d5a03ec, closing socket connection and attempting reconnect
9:46:14.061 PM  WARN  org.apache.hadoop.ipc.RpcServer  RpcServer.respondercallId: 18 service: ClientService methodName: Get size: 89 connection: 10.0.2.182:50259: output error
9:46:14.061 PM  WARN  org.apache.hadoop.hbase.util.JvmPauseMonitor  Detected pause in JVM or host machine (eg GC): pause of approximately 13957ms
    No GCs detected
9:46:14.061 PM  WARN  org.apache.hadoop.hdfs.DFSClient  DFSOutputStream ResponseProcessor exception for block BP-1813023907-10.0.2.161-1384842743529:blk_1112934385_1099559139217
    java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1987)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:796)
9:46:14.063 PM  WARN  org.apache.hadoop.ipc.RpcServer  RpcServer.handler=16,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
9:46:14.062 PM  WARN  org.apache.hadoop.ipc.RpcServer  RpcServer.respondercallId: 18 service: ClientService methodName: Get size: 89 connection: 10.0.2.182:50298: output error
9:46:14.065 PM  WARN  org.apache.hadoop.hdfs.DFSClient  Error Recovery for block BP-1813023907-10.0.2.161-1384842743529:blk_1112934385_1099559139217 in pipeline 10.0.2.182:50010, 10.0.2.172:50010: bad datanode 10.0.2.182:50010
9:46:14.065 PM  WARN  org.apache.hadoop.ipc.RpcServer  RpcServer.handler=5,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
9:46:14.062 PM  FATAL  org.apache.hadoop.hbase.regionserver.HRegionServer  ABORTING region server stat182,60020,1410352355225: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing stat182,60020,1410352355225 as dead server
        at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
        at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
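The stall durations in a log like the one above can be pulled out with a few regular expressions, which makes them easier to compare with the ZooKeeper session timeout. This is a minimal sketch: the function name and dictionary keys are made up for illustration, and the patterns cover only the three message shapes that appear in this log.

```python
import re

# Patterns for the three stall-related messages seen in the log.
SLEEPER = re.compile(r"We slept (\d+)ms instead of (\d+)ms")
ZK_SILENCE = re.compile(r"have not heard from server in (\d+)ms")
JVM_PAUSE = re.compile(r"pause of approximately (\d+)ms")


def summarize_pauses(log_text):
    """Collect the stall durations (in ms) reported by each message type."""
    return {
        # How far each sleep overran its intended duration.
        "sleeper_overrun_ms": [
            int(actual) - int(expected)
            for actual, expected in SLEEPER.findall(log_text)
        ],
        # How long the ZooKeeper client went without hearing from the server.
        "zk_silence_ms": [int(v) for v in ZK_SILENCE.findall(log_text)],
        # Pauses reported by JvmPauseMonitor.
        "jvm_pause_ms": [int(v) for v in JVM_PAUSE.findall(log_text)],
    }


# Message excerpts copied from the log above.
sample = "\n".join([
    "We slept 44340ms instead of 10000ms, this is likely due to a long garbage collecting pause",
    "Client session timed out, have not heard from server in 54340ms for sessionid 0x34859dbfa060904",
    "Detected pause in JVM or host machine (eg GC): pause of approximately 13957ms",
])

print(summarize_pauses(sample))
```

If the ZooKeeper silence exceeds the configured session timeout, the session expires, and the master later rejects the server's report with the YouAreDeadException seen at the end of the log.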
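Background:
25 RegionServers, 5 ZooKeeper nodes, and a Master with a hot standby.
A real-time processing program writes to HBase tables continuously. The HBase cluster shares its machines with HDFS, Hive, and Spark, and analysis jobs run on the cluster for about 12 hours every day.
Suspected cause 1: compactions and splits are too frequent:
Because the maximum HFile size is configured as 1 GB, compactions and splits happen frequently and consume a lot of resources. This leads to overly long GC pauses and errors when writing to HDFS, and the RegionServer goes down.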
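As rough arithmetic for why a 1 GB limit keeps the cluster busy: a region is split roughly when its store files grow past the configured maximum, so a table needs at least (table size / maximum file size) regions, and each of them was produced by a split. The sketch below uses a hypothetical 10 TiB table, a figure that does not come from this cluster, and ignores the details of the split policy.

```python
GIB = 1024 ** 3


def min_region_count(table_bytes, max_filesize_bytes):
    """Rough lower bound on the number of regions (ceiling division)."""
    return -(-table_bytes // max_filesize_bytes)


# Hypothetical table size, chosen only to illustrate the ratio.
table_bytes = 10 * 1024 * GIB

for limit_gib in (1, 100):
    regions = min_region_count(table_bytes, limit_gib * GIB)
    print(f"{limit_gib} GiB limit: at least {regions} regions")
```

Under these assumptions the larger limit cuts the minimum region count by about two orders of magnitude, and the number of splits with it.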
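Suspected cause 2: HDFS is under too much pressure and the DataNodes are overloaded:
Because the cluster runs all kinds of jobs, HDFS read/write pressure is heavy and the load on the DataNodes is high, so RegionServer writes to HDFS fail and the RegionServer goes down.
Solutions: 1. Raise the HFile size to 100 GB and stop the system from running major compactions on its own. 2. Give the DataNodes more memory and tune the number of RPC threads.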
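The first fix can be sketched as an hbase-site.xml fragment. The snippet below generates one, assuming the standard property names `hbase.hregion.max.filesize` (in bytes) and `hbase.hregion.majorcompaction` (where 0 turns off the periodic, time-based major compaction); the function name is made up for illustration. The DataNode memory and RPC thread settings are left out because no values are given for them here.

```python
import xml.etree.ElementTree as ET

GIB = 1024 ** 3


def hbase_site_fragment(max_filesize_gib=100):
    """Build a <configuration> fragment with the two HBase settings."""
    settings = {
        # Maximum store file size before a region is split, in bytes.
        "hbase.hregion.max.filesize": str(max_filesize_gib * GIB),
        # 0 disables the periodic, time-based major compaction.
        "hbase.hregion.majorcompaction": "0",
    }
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(hbase_site_fragment())
```

With periodic major compactions turned off, a major compaction has to be triggered by hand, for example during a quiet window outside the daily analysis jobs.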