hbase Invalid HFile block magic on hdfs system
HBase suddenly broke yesterday: the front-end application's HBase connections hung and piled up. The HBase log showed the following error:
java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://master:9000/hbase/metadata/7d30805699ab98a11cf1f3f4945d9609/meta/5042f025c06c45819cc5c3821e6298cf, compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=001100220736330/meta:refvalue/20140922/Put, lastKey=001100301928568/meta:refvalue/20140922/Put, avgKeyLen=39, avgValueLen=8, entries=483817, length=27616128, cur=001100220819798/meta:C001/20140922/Maximum/vlen=0/ts=0] to key 001100220819798/meta:C001/LATEST_TIMESTAMP/Maximum/vlen=0/ts=0
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:158)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:333)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:291)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:256)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:519)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4066)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4039)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1944)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3346)
    at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376)
Caused by: java.io.IOException: Invalid HFile block magic: \x00\x00\x00\x00\x00\x00\x00\x00
    at org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:153)
    at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:164)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:254)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1779)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1637)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:327)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:213)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:455)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:475)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
    ... 19 more
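What "Invalid HFile block magic" means: every HFile v2 block starts with an 8-byte magic identifying the block type, and BlockType.parse throws when the bytes read from disk (here, all zeros) match none of the known magics. A rough Python sketch of that check follows; this is an illustration, not HBase code, and apart from DATABLK* (the data-block magic) the magic strings below are from memory and should be checked against the BlockType source:

```python
import io

# A small subset of the 8-byte HFile block magics. DATABLK* is the
# data-block magic; the other two entries are illustrative and taken
# from memory, not verified against the HBase 0.94 source.
KNOWN_MAGICS = {
    b"DATABLK*": "DATA",
    b"IDXLEAF2": "LEAF_INDEX",
    b"METABLKc": "META",
}

MAGIC_LENGTH = 8

def parse_block_magic(stream):
    """Read the 8-byte magic that starts every HFile block and classify
    it, raising IOError on an unknown magic -- the same failure mode as
    BlockType.parse in the stack trace above."""
    magic = stream.read(MAGIC_LENGTH)
    if magic in KNOWN_MAGICS:
        return KNOWN_MAGICS[magic]
    raise IOError("Invalid HFile block magic: %r" % magic)

# A block whose header was zeroed on disk, as in the log, fails the check:
try:
    parse_block_magic(io.BytesIO(b"\x00" * 8 + b"payload"))
except IOError as e:
    print(e)
```

The point is that the parser never sees "corrupt data" as such; it only sees a header that names no known block type, which is why the error surfaces this far down in the read path.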
There is a matching issue on the HBase JIRA, against the same version I run: https://issues.apache.org/jira/i#browse/HBASE-5885. That report, however, is about the local filesystem, and I found little else about this problem anywhere on the web. Comparing the patch against my deployed code showed they matched, so the patch had already been applied; I set that lead aside.
Back to my own problem. Yesterday afternoon I was loading data into HBase when the front-end monitoring started reporting HBase connection errors, and eventually some of the front-end machines hung. I spent half the day hunting for the cause without success, even suspecting that the data I had loaded contained some dirty characters.
I first shut down the region server on the affected machine, but the region servers on the other machines kept throwing the same error.
My next idea was to turn off HBase's own checksum verification. The switch lives in hbase-site: the property is hbase.regionserver.checksum.verify; false skips verification, true enables it. The parameter is documented in org.apache.hadoop.hbase.HConstants as follows:
/**
 * If this parameter is set to true, then hbase will read
 * data and then verify checksums. Checksum verification
 * inside hdfs will be switched off. However, if the hbase-checksum
 * verification fails, then it will switch back to using
 * hdfs checksums for verifiying data that is being read from storage.
 *
 * If this parameter is set to false, then hbase will not
 * verify any checksums, instead it will depend on checksum verification
 * being done in the hdfs client.
 */
public static final String HBASE_CHECKSUM_VERIFICATION =
    "hbase.regionserver.checksum.verify";
The value is loaded in HRegionServer. In 0.94.0 the default is true when the property is absent from hbase-site; in 0.94.2 the default is false. I don't know why the default changed.
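Given that the default varies between releases, it is safer to pin the behavior explicitly in hbase-site.xml. A minimal fragment (the property name is the one documented in HConstants; the region servers need a restart for it to take effect):

```xml
<property>
  <name>hbase.regionserver.checksum.verify</name>
  <value>false</value>
</property>
```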
I set it to false and restarted the HBase region servers, but errors still appeared, now like this:
2014-09-24 16:02:46,492 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: b[5136, 5648]=0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000org.apache.hadoop.fs.ChecksumException: Checksum error: /blk_9091742659655150538:of:/hbase/log/d7ed8a849af9d171b8f76a766b8a5857/log/070f0e7ddbcc413a893397eb3b3141f3 at 0
So the HDFS-level checksum verification fails as well.
To keep data consistent, Hadoop generates a checksum file for every file it writes and verifies the checksums on reads and writes, making sure the data is intact.
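Concretely, HDFS computes one CRC per fixed-size chunk of the stream (io.bytes.per.checksum, 512 bytes by default) when a block is written, and recomputes and compares it on read; the log above shows exactly one 512-byte range, b[5136, 5648], failing. A simplified Python sketch of the idea, using zlib.crc32 purely for illustration:

```python
import zlib

BYTES_PER_CHECKSUM = 512  # HDFS default io.bytes.per.checksum

def chunk_checksums(data):
    """Compute one CRC32 per 512-byte chunk, as HDFS does when the
    block is written."""
    return [zlib.crc32(data[i:i + BYTES_PER_CHECKSUM])
            for i in range(0, len(data), BYTES_PER_CHECKSUM)]

def verify(data, checksums):
    """Re-compute the CRCs on read and report the byte range of the
    first mismatching chunk, loosely mimicking FSInputChecker."""
    for idx, crc in enumerate(chunk_checksums(data)):
        if crc != checksums[idx]:
            lo = idx * BYTES_PER_CHECKSUM
            hi = min(lo + BYTES_PER_CHECKSUM, len(data))
            raise IOError("Checksum error: b[%d, %d]" % (lo, hi))

# Simulate a disk that silently zeroes a chunk after the CRCs were stored:
data = bytearray(b"x" * 2048)
sums = chunk_checksums(bytes(data))
data[512:1024] = b"\x00" * 512
try:
    verify(bytes(data), sums)
except IOError as e:
    print(e)  # Checksum error: b[512, 1024]
```

Because the checksums were derived from the bytes originally handed to the datanode, a mismatch on read can only mean the stored bytes changed afterwards, which is what points at the hardware.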
That leaves only one plausible cause for this class of problem: a failing disk. The checksum file is generated from whatever bytes were actually written (complete or not), so the checksums and the file blocks are one-to-one and mutually consistent; verification passes only when the bytes read back match the bytes that were written. The block has three replicas, but when the local replica fails verification, HBase does not go and read one of the other, redundant replicas. So the right fix is not merely to stop the region server on this node, but to shut down both the datanode and the region server; once the cluster detects that the node is gone, it will restore the block's replication on its own.
After shutting down the datanode and region server on that node, I started checking its disks.
The check turned up one disk with bad sectors. I pulled that disk, restored the HBase checksum setting, and the service returned to normal.
A Hadoop cluster can tolerate a whole machine going down, but in practice it does not really tolerate the failure of a single disk. Disk monitoring is clearly worth setting up.