Hive 0.8.1 Index Internals (BitMapIndex)

Source: Internet · Editor: 程序博客网 · Date: 2024/05/21 08:57

Let's start with an experiment, using table03 under 0.8.1 to try out the bitmap index.

hive> dfs -ls /user/hive/warehouse/table03;
Found 6 items
-rw-r--r--   1 allen supergroup   67109134 2012-03-12 21:48 /user/hive/warehouse/table03/000000_0
-rw-r--r--   1 allen supergroup   67108860 2012-03-12 21:48 /user/hive/warehouse/table03/000001_0
-rw-r--r--   1 allen supergroup   67108860 2012-03-12 21:48 /user/hive/warehouse/table03/000002_0
-rw-r--r--   1 allen supergroup   67108860 2012-03-12 21:48 /user/hive/warehouse/table03/000003_0
-rw-r--r--   1 allen supergroup   67108860 2012-03-12 21:49 /user/hive/warehouse/table03/000004_0
-rw-r--r--   1 allen supergroup   21344316 2012-03-12 21:49 /user/hive/warehouse/table03/000005_0
hive> create index bitmap_index on table table03(id)
    > as 'org.apache.hadoop.hive.ql.index.bitmap.BitmapIndexHandler'
    > with deferred rebuild;
OK
Time taken: 0.715 seconds
hive> alter index bitmap_index on table03 rebuild;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201203141051_0004, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203141051_0004
Kill Command = /home/allen/Hadoop/hadoop-0.20.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201203141051_0004
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
...
2012-03-14 13:49:33,749 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201203141051_0004
Loading data to table default.default__table03_bitmap_index__
Deleted hdfs://localhost:9000/user/hive/warehouse/default__table03_bitmap_index__
Table default.default__table03_bitmap_index__ stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 95701985, raw_data_size: 0]
MapReduce Jobs Launched:
Job 0: Map: 2  Reduce: 1   HDFS Read: 356889161 HDFS Write: 95701985 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 283.695 seconds
hive>
Now let's see what changed on HDFS:

hive> dfs -ls /user/hive/warehouse/;
Found 5 items
drwxr-xr-x   - allen supergroup          0 2012-03-12 17:26 /user/hive/warehouse/default__table02_compact_index__
drwxr-xr-x   - allen supergroup          0 2012-03-14 13:49 /user/hive/warehouse/default__table03_bitmap_index__
drwxr-xr-x   - allen supergroup          0 2012-03-04 22:22 /user/hive/warehouse/table01
drwxr-xr-x   - allen supergroup          0 2012-03-04 22:33 /user/hive/warehouse/table02
drwxr-xr-x   - allen supergroup          0 2012-03-12 21:49 /user/hive/warehouse/table03
hive> dfs -du /user/hive/warehouse/;
Found 5 items
74701985    hdfs://localhost:9000/user/hive/warehouse/default__table02_compact_index__
95701985    hdfs://localhost:9000/user/hive/warehouse/default__table03_bitmap_index__
356888890   hdfs://localhost:9000/user/hive/warehouse/table01
356888890   hdfs://localhost:9000/user/hive/warehouse/table02
356888890   hdfs://localhost:9000/user/hive/warehouse/table03
hive> dfs -ls /user/hive/warehouse/default__table03_bitmap_index__;
Found 1 items
-rw-r--r--   1 allen supergroup   95701985 2012-03-14 13:47 /user/hive/warehouse/default__table03_bitmap_index__/000000_0
hive> exit;
allen@allen-laptop:~/Desktop/hive-0.8.1$ hadoop fs -cat /user/hive/warehouse/default__table03_bitmap_index__/000000_0 | head
12/03/14 14:22:45 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
0    hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    0       124858993459210
1    hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    352     124858993459210
2    hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    704     124858993459210
3    hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    1056    124858993459210
4    hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    1408    124858993459210
5    hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    1760    124858993459210
6    hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    2112    124858993459210
7    hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    2464    124858993459210
8    hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    2816    124858993459210
9    hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    3168    124858993459210
cat: Unable to write to output stream.
allen@allen-laptop:~/Desktop/hive-0.8.1$ hadoop fs -text /user/hive/warehouse/default__table03_bitmap_index__/000000_0 | head
12/03/14 14:23:10 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
(same ten rows as above)
text: Unable to write to output stream.

(The raw file delimits fields and array elements with non-printing characters, which is why the columns run together when pasted; they are shown above as whitespace. The last column is the six-element _bitmaps array printed with no visible separators.)
For comparison, the compact index's contents:

allen@allen-laptop:~/Desktop/hive-0.8.1$ hadoop fs -text /user/hive/warehouse/default__table02_compact_index__/000000_0 | head
12/03/14 14:23:41 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
0    hdfs://localhost:9000/user/hive/warehouse/table02/000000_0    0
1    hdfs://localhost:9000/user/hive/warehouse/table02/000000_0    352
2    hdfs://localhost:9000/user/hive/warehouse/table02/000000_0    704
3    hdfs://localhost:9000/user/hive/warehouse/table02/000000_0    1056
4    hdfs://localhost:9000/user/hive/warehouse/table02/000000_0    1408
5    hdfs://localhost:9000/user/hive/warehouse/table02/000000_0    1760
6    hdfs://localhost:9000/user/hive/warehouse/table02/000000_0    2112
7    hdfs://localhost:9000/user/hive/warehouse/table02/000000_0    2464
8    hdfs://localhost:9000/user/hive/warehouse/table02/000000_0    2816
9    hdfs://localhost:9000/user/hive/warehouse/table02/000000_0    3168
text: Unable to write to output stream.
allen@allen-laptop:~/Desktop/hive-0.8.1$
hive> show tables;
OK
default__table02_compact_index__
default__table03_bitmap_index__
table02
table03
Time taken: 3.38 seconds
hive> desc default__table03_bitmap_index__;
OK
id    int
_bucketname    string
_offset    bigint
_bitmaps    array<bigint>
Time taken: 0.546 seconds
hive> desc default__table02_compact_index__;
OK
id    int
_bucketname    string
_offsets    array<bigint>
Time taken: 0.3 seconds
hive>
In the compact index, each (id, bucket) pair forms a unique key, and _offsets lists all the positions at which that id occurs within that bucket.

In the bitmap index, a record is (id, bucket, offset, bitmap), where the bitmap is an array<bigint>.

allen@allen-laptop:~/Desktop/hive-0.8.1$ hadoop fs -text /user/hive/warehouse/default__table03_bitmap_index__/000000_0 | tail
12/03/14 14:34:53 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
999990    hdfs://localhost:9000/user/hive/warehouse/table03/000005_0    21340746    124858993459210
999991    hdfs://localhost:9000/user/hive/warehouse/table03/000005_0    21341103    124858993459210
999992    hdfs://localhost:9000/user/hive/warehouse/table03/000005_0    21341460    124858993459210
999993    hdfs://localhost:9000/user/hive/warehouse/table03/000005_0    21341817    124858993459210
999994    hdfs://localhost:9000/user/hive/warehouse/table03/000005_0    21342174    124858993459210
999995    hdfs://localhost:9000/user/hive/warehouse/table03/000005_0    21342531    124858993459210
999996    hdfs://localhost:9000/user/hive/warehouse/table03/000005_0    21342888    124858993459210
999997    hdfs://localhost:9000/user/hive/warehouse/table03/000005_0    21343245    124858993459210
999998    hdfs://localhost:9000/user/hive/warehouse/table03/000005_0    21343602    124858993459210
999999    hdfs://localhost:9000/user/hive/warehouse/table03/000005_0    21343959    124858993459210
This shows that every bitmap in the entire index table is identical, with value 124858993459210.

Decoding one record:

0 (id)
hdfs://localhost:9000/user/hive/warehouse/table03/000000_0 (bucket)
0 (offset within the bucket)
1 2 4 8589934592 1 0 (a BIGINT array of length 6)
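A quick sanity check (plain Java, nothing Hive-specific): these six array values, concatenated with their non-printing separators dropped, do reproduce the constant seen in every dumped row.

```java
public class ConcatCheck {
    public static void main(String[] args) {
        // The six _bitmaps values of one index row, printed back-to-back
        // (the array separator is non-printing), give the constant seen
        // in every row of the index table dump.
        long[] bitmaps = {1L, 2L, 4L, 8589934592L, 1L, 0L};
        StringBuilder sb = new StringBuilder();
        for (long w : bitmaps) sb.append(w);
        System.out.println(sb); // 124858993459210
    }
}
```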

Let's take a closer look at this bitmap:

Like the compact index, this implementation uses an index table. The index table on a column "key" has four or more columns: first, the columns that are being indexed, then _bucketname, _offset, and _bitmaps. _bucketname is a string pointing to the Hadoop file in the table that stores this block, _offset is the block offset of a block, and _bitmaps is an uncompressed bitmap encoding (an array of bytes) of the bitmap for this column value, bucketname, and row offset. Each bit in the bitmap corresponds to one row in the block. The bit is 1 if that row contains the indexed value, and 0 if not. If a key value does not appear in a block at all, the value is not stored in the map.
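The per-block bitmap described above can be sketched like this (illustrative Java, not Hive's implementation; a plain BitSet stands in for the EWAH-encoded array):

```java
import java.util.BitSet;

public class BlockBitmap {
    // Build the bitmap for one value within one block: bit i is 1 iff
    // row i of the block holds the indexed value.
    static BitSet bitmapFor(int[] blockRows, int value) {
        BitSet bits = new BitSet(blockRows.length);
        for (int i = 0; i < blockRows.length; i++) {
            if (blockRows[i] == value) bits.set(i);
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] block = {7, 5, 7, 9, 7};           // hypothetical column values in one block
        System.out.println(bitmapFor(block, 7)); // {0, 2, 4}
        // A value absent from the block yields an empty bitmap and,
        // per the text above, simply isn't stored in the index.
        System.out.println(bitmapFor(block, 3).isEmpty()); // true
    }
}
```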

When querying this index, if boolean AND or OR operations are applied to predicates with bitmap indexes, we can use bitwise operations to eliminate blocks as well: blocks that cannot contain the value combinations of interest are dropped. The result can then be emitted in the same filename + array-of-block-offsets format the compact index handler uses, and reused in the bitmap index query.
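Block elimination via bitwise operations can be sketched as follows (hypothetical values; one 64-bit word represents one block's bitmap):

```java
public class BlockElimination {
    public static void main(String[] args) {
        // Hypothetical bitmaps for the same block, one per predicate value.
        long rowsWithA = 0b0101L;  // value "a" appears in rows 0 and 2
        long rowsWithB = 0b1000L;  // value "b" appears in row 3

        // WHERE key = 'a' AND key = 'b': intersection empty -> skip the block.
        boolean keepForAnd = (rowsWithA & rowsWithB) != 0;
        // WHERE key = 'a' OR key = 'b': union non-empty -> scan the block.
        boolean keepForOr  = (rowsWithA | rowsWithB) != 0;

        System.out.println(keepForAnd); // false: block eliminated
        System.out.println(keepForOr);  // true: block kept
    }
}
```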

The generated index query looks like this:

INSERT OVERWRITE DIRECTORY "file:/tmp/allen/hive_2012-03-14_16-16-59_568_3238998201582552709/-mr-10002"
SELECT bucketname AS `_bucketname`, COLLECT_SET(offset) AS `_offsets`
FROM (
  SELECT `_bucketname` AS bucketname, `_offset` AS offset
  FROM (SELECT * FROM `default__table03_bitmap_index__` WHERE (id = 5)) ind0
  WHERE NOT EWAH_BITMAP_EMPTY(ind0.`_bitmaps`)   -- not empty
) tmp_index
GROUP BY bucketname

The resulting index file:

allen@allen-laptop:~/Desktop/hive-0.8.1$ cat /tmp/allen/hive_2012-03-14_16-16-59_568_3238998201582552709/-mr-10002/000000_0
hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    1760

This is the same file shape the compact index produces: bucketname + offsets.
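For illustration, one line of that output file could be parsed along these lines (a sketch; Hive's text output delimits fields with the non-printing '\u0001' and collection elements with '\u0002' by default, which is assumed here):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PruneFileLine {
    // Parse one line of the index-query output: bucketname, then offsets.
    // Delimiters '\u0001' (field) and '\u0002' (array element) are Hive's
    // text-format defaults; adjust if your output differs.
    static Map.Entry<String, List<Long>> parse(String line) {
        String[] fields = line.split("\u0001");
        List<Long> offsets = new ArrayList<>();
        for (String o : fields[1].split("\u0002")) offsets.add(Long.parseLong(o));
        return new AbstractMap.SimpleEntry<>(fields[0], offsets);
    }

    public static void main(String[] args) {
        String line = "hdfs://localhost:9000/user/hive/warehouse/table03/000000_0"
                    + "\u0001" + "1760";
        Map.Entry<String, List<Long>> e = parse(line);
        System.out.println(e.getKey());
        System.out.println(e.getValue()); // [1760]
    }
}
```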

Next, the statement Hive runs when rebuilding (populating) the index table:

INSERT OVERWRITE TABLE `default__table03_bitmap_index__`
SELECT `id`, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, EWAH_BITMAP(ROW__OFFSET__INSIDE__BLOCK)
FROM `table03`
GROUP BY INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, `id`
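The effect of that GROUP BY can be simulated like this (illustrative only; a plain BitSet stands in for the EWAH_BITMAP aggregate, and the rows and offsets are hypothetical):

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class RebuildSim {
    // Each source row contributes {id, blockOffset, rowOffsetInBlock};
    // the file name is fixed here for brevity. Mirrors the rebuild query's
    // GROUP BY file, blockOffset, id with EWAH_BITMAP(rowOffset).
    static Map<String, BitSet> buildIndex(String file, int[][] rows) {
        Map<String, BitSet> index = new LinkedHashMap<>();
        for (int[] r : rows) {
            String key = r[0] + "|" + file + "|" + r[1]; // id | bucket | block offset
            index.computeIfAbsent(key, k -> new BitSet()).set(r[2]);
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical rows: {id, blockOffset, rowOffsetInBlock}
        int[][] rows = { {5, 1760, 0}, {6, 2112, 0} };
        System.out.println(buildIndex("000000_0", rows));
        // {5|000000_0|1760={0}, 6|000000_0|2112={0}}
    }
}
```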

So the picture is roughly this:

At build time, the ROW__OFFSET__INSIDE__BLOCK values are compressed into _bitmaps by the EWAH_BITMAP UDAF.

At query time, if _bitmaps is not all zeros, the row is kept rather than filtered out (checked with the UDF EWAH_BITMAP_EMPTY).

From there, how the index drives the second-stage MapReduce job is exactly the same as for the compact index.

 

OK, now let's focus on EWAH_BITMAP (GenericUDAFEWAHBitmap) and GenericUDFEWAHBitmapEmpty.

 

Select INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, ROW__OFFSET__INSIDE__BLOCK from test;

This fails:

 

To understand what the bitmap represents, let's debug "alter index bitmap_index on test rebuild;" and see what INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, and ROW__OFFSET__INSIDE__BLOCK actually are!

As for EWAH, it's just a bitmap compression scheme; we can set it aside for now...

hive> Select INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, ROW__OFFSET__INSIDE__BLOCK from test;
FAILED: Error in semantic analysis: Line 1:54 Invalid table alias or column reference 'ROW__OFFSET__INSIDE__BLOCK': (possible column names are: class, name)

hive> set hive.exec.rowoffset;
hive.exec.rowoffset=false
hive> set hive.exec.rowoffset=true;
hive> Select `class`, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, ROW__OFFSET__INSIDE__BLOCK from test;

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201203181950_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203181950_0003
Kill Command = /home/allen/Hadoop/hadoop-0.20.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201203181950_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-03-18 23:23:33,667 Stage-1 map = 0%,  reduce = 0%
2012-03-18 23:23:36,727 Stage-1 map = 100%,  reduce = 0%
2012-03-18 23:23:39,797 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201203181950_0003
MapReduce Jobs Launched:
Job 0: Map: 1   HDFS Read: 57 HDFS Write: 453 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1        hdfs://localhost:9000/user/hive/warehouse/test/class-name        0        0
2        hdfs://localhost:9000/user/hive/warehouse/test/class-name        8        0
3        hdfs://localhost:9000/user/hive/warehouse/test/class-name        18        0
2        hdfs://localhost:9000/user/hive/warehouse/test/class-name        27        0
1        hdfs://localhost:9000/user/hive/warehouse/test/class-name        37        0
2        hdfs://localhost:9000/user/hive/warehouse/test/class-name        42        0
1        hdfs://localhost:9000/user/hive/warehouse/test/class-name        51        0
Time taken: 11.465 seconds
hive>

Back to:

INSERT OVERWRITE TABLE `default__table03_bitmap_index__`
SELECT `id`, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, EWAH_BITMAP(ROW__OFFSET__INSIDE__BLOCK)
FROM `table03`
GROUP BY INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, `id`

ROW__OFFSET__INSIDE__BLOCK is 0 for every record, because each block contains exactly one row, so its offset within the block is of course 0.

So all the scary EWAH_BITMAP() UDAF does here is turn bitset{1} (a single bit at position 0, with value 1) into an intimidating-looking compressed format.
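A minimal sketch of that observation (plain Java, with a BitSet in place of the EWAH-compressed form): since every "block" in this text-backed table holds a single row, every stored bitmap is just {0}, and an emptiness test on it can never succeed.

```java
import java.util.BitSet;

public class SingleRowBlock {
    public static void main(String[] args) {
        // Each "block" is a single row, so ROW__OFFSET__INSIDE__BLOCK is
        // always 0 and every aggregated bitmap is exactly {0}.
        BitSet bitmap = new BitSet();
        bitmap.set(0);

        // EWAH_BITMAP_EMPTY returns true only for an all-zero bitmap,
        // which can never happen here: every stored bitmap has bit 0 set.
        System.out.println(bitmap.isEmpty()); // false
    }
}
```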

And as for:

INSERT OVERWRITE DIRECTORY "file:/tmp/allen/hive_2012-03-14_16-16-59_568_3238998201582552709/-mr-10002"
SELECT bucketname AS `_bucketname`, COLLECT_SET(offset) AS `_offsets`
FROM (
  SELECT `_bucketname` AS bucketname, `_offset` AS offset
  FROM (SELECT * FROM `default__table03_bitmap_index__` WHERE (id = 5)) ind0
  WHERE NOT EWAH_BITMAP_EMPTY(ind0.`_bitmaps`)   -- not empty
) tmp_index
GROUP BY bucketname
the EWAH_BITMAP_EMPTY() UDF in it returns true if and only if the bitmap is all zeros:

  @Override   // EWAH_BITMAP_EMPTY()
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    assert (arguments.length == 1);
    Object b = arguments[0].get();
    ListObjectInspector lloi = (ListObjectInspector) bitmapOI;
    int length = lloi.getListLength(b);
    ArrayList<LongWritable> bitmapArray = new ArrayList<LongWritable>();
    for (int i = 0; i < length; i++) {
      long l = PrimitiveObjectInspectorUtils.getLong(
          lloi.getListElement(b, i),
          (PrimitiveObjectInspector) lloi.getListElementObjectInspector());
      bitmapArray.add(new LongWritable(l));
    }
    BitmapObjectInput bitmapObjIn = new BitmapObjectInput(bitmapArray);
    EWAHCompressedBitmap bitmap = new EWAHCompressedBitmap();
    try {
      bitmap.readExternal(bitmapObjIn);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    // Returns true only if the bitmap is all zeros.
    return new BooleanWritable(!bitmap.iterator().hasNext());
  }
Which means that, given how this index is built, the WHERE NOT EWAH_BITMAP_EMPTY(ind0.`_bitmaps`) condition is always true.


So what is the bitmap index actually buying us?

First, its basic function is the same as the compact index's: for a query like "select * from table03 where id=;", the contents of the compact file become:

hdfs://localhost:9000/user/hive/warehouse/table03/000000_0    1760

The same file shape as the compact index: bucketname + offsets.

On with the experiment:

hive> select * from table03 where id=4;
INSERT OVERWRITE DIRECTORY "file:/tmp/allen/hive_2012-03-19_15-48-58_581_2498868060985106973/-mr-10002"
SELECT bucketname AS `_bucketname`, COLLECT_SET(offset) AS `_offsets`
FROM (
  SELECT `_bucketname` AS bucketname, `_offset` AS offset
  FROM (SELECT * FROM `default__table03_bitmap_index__` WHERE (id = 4)) ind0
  WHERE NOT EWAH_BITMAP_EMPTY(ind0.`_bitmaps`)
) tmp_index
GROUP BY bucketname


hive> select * from table03 where id=4 or id=500000;

This time, BitmapIndexHandler simply bails out early. What?!

Map<Index, ExprNodeDesc> indexPredicates = decomposePredicate(predicate, indexes, queryContext);
if (indexPredicates == null) {
  LOG.info("No decomposed predicate found");
  queryContext.setQueryTasks(null);
  return; // abort if we couldn't pull out anything from the predicate
}
Here the predicate is: {GenericUDFOPEqual(Column[id], Const int 4), GenericUDFOPEqual(Column[id], Const int 500000)}.

With two predicates it just gives up? So where did the claim go that a bitmap index can optimize AND/OR predicates over indexed columns?

Here's why:

ExprNodeDesc residualPredicate = analyzer.analyzePredicate(predicate, searchConditions);

public ExprNodeDesc analyzePredicate(
    ExprNodeDesc predicate,
    final List<IndexSearchCondition> searchConditions) {
  Map<Rule, NodeProcessor> opRules = new LinkedHashMap<Rule, NodeProcessor>();
  NodeProcessor nodeProcessor = new NodeProcessor() {
    @Override
    public Object process(Node nd, Stack<Node> stack,
      NodeProcessorCtx procCtx, Object... nodeOutputs)
      throws SemanticException {
      // We can only push down stuff which appears as part of
      // a pure conjunction:  reject OR, CASE, etc.
      for (Node ancestor : stack) {
        if (nd == ancestor) {
          break;
        }
        if (!FunctionRegistry.isOpAnd((ExprNodeDesc) ancestor)) {
          return nd;
        }
      }
      return analyzeExpr((ExprNodeDesc) nd, searchConditions, nodeOutputs);
    }
  };
  Dispatcher disp = new DefaultRuleDispatcher(nodeProcessor, opRules, null);
  GraphWalker ogw = new DefaultGraphWalker(disp);
  ArrayList<Node> topNodes = new ArrayList<Node>();
  topNodes.add(predicate);
  HashMap<Node, Object> nodeOutput = new HashMap<Node, Object>();
  try {
    ogw.startWalking(topNodes, nodeOutput);
  } catch (SemanticException ex) {
    throw new RuntimeException(ex);
  }
  ExprNodeDesc residualPredicate = (ExprNodeDesc) nodeOutput.get(predicate);
  return residualPredicate;
}

The comment says it plainly: only predicates inside a pure conjunction (AND) are pushed down! For our query, the walked stack contains a GenericUDFOPOr above the two GenericUDFOPEqual(Column[id], ...) nodes, so each equality is rejected.
So the index query ends up null, only a single MapReduce job is generated, and the optimization never kicks in...
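The ancestor walk that rejects OR can be sketched in isolation (a hypothetical tree of String operators stands in for ExprNodeDesc and FunctionRegistry.isOpAnd):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class ConjunctionCheck {
    // The stack holds the current node's ancestors (root first) plus the
    // node itself; the node may be pushed down to the index only if every
    // ancestor above it is an AND.
    static boolean insidePureConjunction(Deque<String> stack, String node) {
        for (String ancestor : stack) {
            if (ancestor.equals(node)) break;          // reached the node itself
            if (!ancestor.equals("AND")) return false; // reject OR, CASE, etc.
        }
        return true;
    }

    public static void main(String[] args) {
        // "id = 4" sitting under an OR: rejected, left as residual predicate.
        System.out.println(insidePureConjunction(
            new ArrayDeque<>(List.of("OR", "id = 4")), "id = 4"));  // false
        // "id = 4" sitting under an AND: accepted for index pushdown.
        System.out.println(insidePureConjunction(
            new ArrayDeque<>(List.of("AND", "id = 4")), "id = 4")); // true
    }
}
```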

Now look at select * from table03 where id=4 and id=500000;
INSERT OVERWRITE DIRECTORY "file:/tmp/allen/hive_2012-03-19_17-42-44_798_4048884567562256402/-mr-10002"
SELECT bucketname AS `_bucketname`, COLLECT_SET(offset) AS `_offsets`
FROM (
  SELECT `_bucketname` AS bucketname, `_offset` AS offset
  FROM (SELECT * FROM `default__table03_bitmap_index__` WHERE ((id = 4) and (id = 500000))) ind0
  WHERE NOT EWAH_BITMAP_EMPTY(ind0.`_bitmaps`)
) tmp_index
GROUP BY bucketname

This obviously can never match anything, yet Hive still goes ahead and runs the index job!

allen@allen-laptop:~$ cat /tmp/allen/hive_2012-03-19_17-42-44_798_4048884567562256402/-mr-10002/000000_0
allen@allen-laptop:~$
Nothing at all... On the bright side, the third-stage MapReduce job then scans nothing: the mapper input is 0. But notice that the bitmap itself is never really used! The only thing ever done with it is EWAH_BITMAP_EMPTY(ind0.`_bitmaps`), and that always returns false...

Finally, let's see what the official documentation says:

Example

Suppose we have a bitmap index on a key where, on the first block, value "a" appears in rows 5, 12, and 64, and value "b" appears in rows 7, 8, and 9. Then, for the preliminary implementation, the first entry in the index table will be:


The values in the array represent the bitmap for each block, where each 32-bit BigInt value stores 32 rows.

For the second iteration, the first entry will be:


This one uses 1-byte array entries, so each value in the array stores 8 rows. If an entry is 0x00 or 0xFF, it represents 1 or more consecutive bytes of zeros (in this case 5 and 4, respectively).