HBase: HLog files written in compressed format cannot be read with seek


While testing HBase's replication feature recently, I found that turning on the following option causes errors when the HLog file is parsed.

Configuration:

<property>
<name>hbase.regionserver.wal.enablecompression</name>
<value>true</value>
</property>

Error log:

java.lang.IndexOutOfBoundsException: index (2) must be less than size (1)
at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:111)
at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:321)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1851)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1891)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:235)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:206)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:435)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:311)

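This is caused by the dictionary compression that the HLog uses. The first time a value (regionName, tablename, keyVal) appears in an HLog, it is written to the HLog in full, together with a NOT_IN_DICTIONARY marker, and is also stored in a buffer inside the compressionContext. Every later write first looks the value up in that buffer. If the value is not found, it is written in the same way as a first occurrence. If the same value has already appeared, the HLog records only its dictionary index, dictIdx.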
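On read, if the reader finds the NOT_IN_DICTIONARY marker, it reads the full value directly from the file and adds it to its own dictionary. If the field is not NOT_IN_DICTIONARY, it is a dictIdx, so the reader looks it up in the compressionContext described above and returns the value from the in-memory buffer. This is how the compression is achieved.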
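The write and read paths described above can be sketched as a toy codec. This is a minimal illustration, not HBase's actual Compressor or LRUDictionary: the class name DictCodecSketch, the use of a 2-byte marker, the value -1 for NOT_IN_DICTIONARY, and the unbounded (non-LRU) dictionary are all assumptions made to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy dictionary codec (hypothetical; not the HBase implementation). */
final class DictCodecSketch {
    static final short NOT_IN_DICTIONARY = -1; // illustrative marker value

    private final Map<String, Short> indexOf = new HashMap<>();
    private final List<String> entries = new ArrayList<>();

    private void remember(String value) {
        indexOf.put(value, (short) entries.size());
        entries.add(value);
    }

    /** First occurrence: marker + full value. Repeat: dictionary index only. */
    void write(DataOutputStream out, String value) throws IOException {
        Short idx = indexOf.get(value);
        if (idx == null) {
            out.writeShort(NOT_IN_DICTIONARY);
            out.writeUTF(value);
            remember(value);
        } else {
            out.writeShort(idx);
        }
    }

    /** Mirrors write(): the reader rebuilds the same dictionary as it goes. */
    String read(DataInputStream in) throws IOException {
        short marker = in.readShort();
        if (marker == NOT_IN_DICTIONARY) {
            String value = in.readUTF();
            remember(value);
            return value;
        }
        return entries.get(marker); // only valid if the entry was read earlier
    }

    /** Encodes all values with one fresh dictionary. */
    static byte[] encode(String... values) {
        DictCodecSketch codec = new DictCodecSketch();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String v : values) {
                codec.write(out, v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes sequentially from the first byte with one fresh dictionary. */
    static List<String> decode(byte[] data) {
        DictCodecSketch codec = new DictCodecSketch();
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                result.add(codec.read(in));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

With this sketch, `DictCodecSketch.decode(DictCodecSketch.encode("t1", "t1", "t2", "t1"))` returns the original four values, and only the first occurrences of "t1" and "t2" are stored in full. The round trip works because the decoder starts at the first byte and sees every first occurrence before any index that refers to it.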
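Consequently, if a reader has just opened the file and reads from an arbitrary position (using seek), it may read only a dictIdx. Its compressionContext buffer has never seen the earlier entries, so the real value cannot be found and parsing fails, which is consistent with the IndexOutOfBoundsException thrown from LRUDictionary in the log above.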
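The failure can be reproduced in miniature. The sketch below is a simplified model, not HBase code: SeekFailureSketch, sampleLog and readFrom are hypothetical names, the dictionary is a plain list, and the marker value -1 is an assumption. It writes the same value twice, then decodes once from the start and once from the offset of the second record with an empty dictionary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Toy model of seeking into a dictionary-compressed log (hypothetical). */
final class SeekFailureSketch {
    static final short NOT_IN_DICTIONARY = -1; // illustrative marker value

    /** Two records for "tableA": the first in full, the second as index 0. */
    static byte[] sampleLog() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(NOT_IN_DICTIONARY);
            out.writeUTF("tableA");
            out.writeShort(0); // dictIdx pointing at "tableA"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes all records from startOffset, starting with an empty dictionary. */
    static List<String> readFrom(byte[] log, int startOffset) {
        List<String> dictionary = new ArrayList<>();
        List<String> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(log, startOffset, log.length - startOffset))) {
            while (in.available() > 0) {
                short marker = in.readShort();
                if (marker == NOT_IN_DICTIONARY) {
                    String value = in.readUTF();
                    dictionary.add(value);
                    values.add(value);
                } else {
                    // Throws IndexOutOfBoundsException if the entry was skipped.
                    values.add(dictionary.get(marker));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] log = sampleLog();
        System.out.println(readFrom(log, 0)); // prints [tableA, tableA]
        try {
            readFrom(log, log.length - 2); // "seek" straight to the second record
        } catch (IndexOutOfBoundsException e) {
            System.out.println("read after seek failed: " + e.getMessage());
        }
    }
}
```

In this model the only way to decode the second record is to have decoded the first one beforehand, so a reader that needs to resume at an offset would have to rebuild the dictionary by reading from the start of the file. Whether and how HBase itself does that is not covered here.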