详解oracle bitmap位图索引

来源:互联网 发布:龙薇传媒 知乎 编辑:程序博客网 时间:2024/05/22 15:41

位图索引是oracle中非常重要的一种索引形式。本文通过总结有关位图索引的资料,尝试回答如下几个问题:

1:什么是位图索引?

2:位图索引适合什么场景,不适合什么场景?

3:位图索引的性能如何?


什么是位图索引?

位图索引,顾名思义,与“位”有关。大家都知道,计算机中的所有信息最终都是通过“位bit”来运算的, 二进制位运算在计算机中是非常高效的。每一个二进制位都可以取值0或者1,而取值的确切含义是由具体的上下文环境决定的。在oracle位图索引中,每一个二进制位代表了某一行中索引列的取值情况。例如,学生表中性别列的位图索引结构如下:

男:0101001101

女:1010110010

在上面的位图结构中,存储了10条学生记录的性别分布情况,以“男”性别为例,从左到右的第n个二进制位代表了第n条记录是否性别为男,如果二进制位为1,代表true即性别为男,0代表false即性别不为男。以此类推,从图中可以看出,第一条记录的性别为女,第二条记录的性别为男,...第九条记录的性别为女,第十条记录的性别为男。

大家都知道,在oracle中是根据rowid来定位记录的,因此,我们需要引入start rowid和end rowid,通过start rowid ,end rowid 和二进制位的偏移,我们就可以非常快速的计算出二进制位所代表的表记录rowid。位图索引的最终逻辑结构如下图:


位图索引适合什么场景,不适合什么场景?

现在我们已经了解了位图索引的逻辑结构,我们称每一单元的<key ,startrowid,end rowid,bitmap>为一个位图片段。当我们修改某一行数据的时候,我们需要锁定该行列值所对应的位图片段,如果我们进行的是更新操作,同时还会锁定更新后新值所在的位图片段。例如我们将列值从01修改为03,就需要同时锁定01和03位图片段,此时如果有其他用户需要修改与01或者03关联的表记录上的索引字段,就会被阻塞,因此位图索引不适合并发环境,在并发环境下可能会造成大量事务的阻塞。

从位图索引的逻辑结构也可以看出,当索引列的distinct cardinality较大时,索引所占用的存储空间也会成比例扩大。下面我们测试一下位图索引占用空间与distinct cardinality的关系:
数据库环境:oracle 11g  
数据块大小:8k
CREATE or replace FUNCTION ind_spc_test(rn NUMBER) RETURN NUMBER  AS      v_j     NUMBER;      v_dis   NUMBER;      v_bm_sp NUMBER;      v_bt_sp NUMBER;    BEGIN      FOR i IN 1 .. 10LOOP        EXECUTE immediate 'truncate table t_easy1';        EXECUTE immediate 'truncate table t_easy2';        SELECT floor(rn/(11-i)) INTO v_j FROM dual;        FOR j IN 1 .. rn LOOP          INSERT INTO t_easy1 VALUES (mod(j,v_j));          INSERT INTO t_easy2 VALUES (mod(j,v_j));        END LOOP;        commit;        select count(distinct id) into v_j from t_easy1;        EXECUTE immediate 'analyze index i_easy1 COMPUTE STATISTICS';        SELECT lEAF_BLOCKS INTO v_bt_sp FROM user_indexes where index_name='I_EASY1';        EXECUTE immediate 'analyze index i_easy2 COMPUTE STATISTICS';        SELECT LEAF_BLOCKS INTO v_bm_sp FROM user_indexes where index_name='I_EASY2';        INSERT INTO bitmap_ind_space VALUES (v_j,v_bm_sp,v_bt_sp,rn );        COMMIT;      END LOOP;      RETURN 0;    END;

SQL> select * from bitmap_ind_space order by 1;DISTINCT_VAL BITMAP_IND_BLKS BTREE_IND_BLKS    ROW_NUM------------ --------------- -------------- ----------       10000 139300100000       11111  79335100000       12500  89285100000       14285 103220100000       16666 120257100000       20000 146310100000       25000 183293100000       33333 246262100000       50000 371296100000      100000 408200100000

这里的测试比较简单,下面看看大师的实验结果:


从这里可以看出,随着distinct columns值的增加,位图索引占用空间逐步增大,但即便在最坏的情况下,位图索引占用的空间也仅仅是普通索引的2~3倍,在存储日益普遍的今天,这恐怕并不是很大的问题。

位图索引的查询性能如何?

下面我们看一下位图索引的查询性能如何。
在很多资料中,都可以看到这样的论述:位图索引适合于 low distict cardinality的列。实际上,对于high distinct cardinality 的列,位图索引的查询性能也是非常不错的。下面我们来验证这个结论。
首先我们创建两张表:emp_normal和emp_random.
SQL> create table emp_normal(empno number(10), ename varchar2(30), sal number(10));表已创建。BeginFor i in 1..1000000Loop   Insert into emp_normal    values(i, dbms_random.string('U',30), dbms_random.value(1000,7000));   If mod(i, 10000) = 0 then   Commit;  End if;End loop; 10  End; 11  /PL/SQL 过程已成功完成。SQL> create table emp_random as select /* +append */ * from emp_normal order by dbms_random.random;
emp_random由于其记录是随机分布的,因此该表上索引的CLUSTERING_FACTOR要高一些。
我们首先看一下emp_normal表等值查询情况下,索引的效率如何:
SQL> create bitmap index bm_normal on emp_normal(empno);索引已创建。SQL> analyze table emp_normal compute statistics for table for all indexes for all indexed columns;表已分析。SQL> select index_name,clustering_factor from user_indexes;INDEX_NAME       CLUSTERING_FACTOR------------------------------ -----------------BM_NORMAL 1000000SQL> set autot traceonlySQL> select * from emp_normal where empno=&empno;输入 empno 的值:  1000原值    1: select * from emp_normal where empno=&empno新值    1: select * from emp_normal where empno=1000执行计划----------------------------------------------------------Plan hash value: 1526426521-------------------------------------------------------------------------------------------| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time  |-------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT     |  |1 |    34 |3   (0)| 00:00:01 ||   1 |  TABLE ACCESS BY INDEX ROWID | EMP_NORMAL |1 |    34 |3   (0)| 00:00:01 ||   2 |   BITMAP CONVERSION TO ROWIDS|  |  |  |       |  ||*  3 |    BITMAP INDEX SINGLE VALUE | BM_NORMAL  |  |  |       |  |-------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   3 - access("EMPNO"=1000)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets  5  consistent gets  0  physical reads  0  redo size702  bytes sent via SQL*Net to client520  bytes received via SQL*Net from client  2  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)  1  rows processedSQL> select * from emp_normal where empno=&empno;输入 empno 的值:  2398原值    1: select * from emp_normal where empno=&empno新值    1: select * from emp_normal where empno=2398执行计划----------------------------------------------------------Plan hash value: 1526426521-------------------------------------------------------------------------------------------| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time  |-------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT     |  |1 |    34 |3   (0)| 00:00:01 ||   1 |  TABLE ACCESS BY INDEX ROWID | EMP_NORMAL |1 |    34 |3   (0)| 00:00:01 ||   2 |   BITMAP CONVERSION TO ROWIDS|  |  |  |       |  ||*  3 |    BITMAP INDEX SINGLE VALUE | BM_NORMAL  |  |  |       |  |-------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   3 - access("EMPNO"=2398)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets  5  consistent gets  0  physical reads  0  redo size703  bytes sent via SQL*Net to client520  bytes received via SQL*Net from client  2  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)  1  rows processedSQL> select * from emp_normal where empno=&empno;输入 empno 的值:  8545原值    1: select * from emp_normal where empno=&empno新值    1: select * from emp_normal where empno=8545执行计划----------------------------------------------------------Plan hash value: 1526426521-------------------------------------------------------------------------------------------| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time  |-------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT     |  |1 |    34 |3   (0)| 00:00:01 ||   1 |  TABLE ACCESS BY INDEX ROWID | EMP_NORMAL |1 |    34 |3   (0)| 00:00:01 ||   2 |   BITMAP CONVERSION TO ROWIDS|  |  |  |       |  ||*  3 |    BITMAP INDEX SINGLE VALUE | BM_NORMAL  |  |  |       |  |-------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   3 - access("EMPNO"=8545)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets  5  consistent gets  0  physical reads  0  redo size703  bytes sent via SQL*Net to client520  bytes received via SQL*Net from client  2  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)  1  rows processedSQL> select * from emp_normal where empno=&empno;输入 empno 的值:  128444原值    1: select * from emp_normal where empno=&empno新值    1: select * from emp_normal where empno=128444执行计划----------------------------------------------------------Plan hash value: 1526426521-------------------------------------------------------------------------------------------| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time  |-------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT     |  |1 |    34 |3   (0)| 00:00:01 ||   1 |  TABLE ACCESS BY INDEX ROWID | EMP_NORMAL |1 |    34 |3   (0)| 00:00:01 ||   2 |   BITMAP CONVERSION TO ROWIDS|  |  |  |       |  ||*  3 |    BITMAP INDEX SINGLE VALUE | BM_NORMAL  |  |  |       |  |-------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   3 - access("EMPNO"=128444)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets  5  consistent gets  0  physical reads  0  redo size704  bytes sent via SQL*Net to client520  bytes received via SQL*Net from client  2  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)  1  rows processedSQL> drop index bm_normal;索引已删除。SQL> create index bt_normal on emp_normal(empno);索引已创建。SQL> analyze table emp_normal compute statistics for table for all indexes for all indexed columns;表已分析。SQL> select index_name,clustering_factor from user_indexes;INDEX_NAME       CLUSTERING_FACTOR------------------------------ -----------------BT_NORMAL    6210SYS_IL0000076897C00002$$PK_EMP       1PK_DEPT        1SQL> set autot traceonlySQL> select * from emp_normal where empno=&empno;输入 empno 的值:  1000原值    1: select * from emp_normal where empno=&empno新值    1: select * from emp_normal where empno=1000执行计划----------------------------------------------------------Plan hash value: 733975378------------------------------------------------------------------------------------------| Id  | Operation    | Name | Rows  | Bytes | Cost (%CPU)| Time |------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT    | |     1 |    34 |     4   (0)| 00:00:01 ||   1 |  TABLE ACCESS BY INDEX ROWID| EMP_NORMAL |     1 |    34 |     4   (0)| 00:00:01 ||*  2 |   INDEX RANGE SCAN    | BT_NORMAL  |     1 | |     3   (0)| 00:00:01 |------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   2 - access("EMPNO"=1000)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets  5  consistent gets  0  physical reads  0  redo size702  bytes sent via SQL*Net to client520  bytes received via SQL*Net from client  2  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)  1  rows processedSQL> select * from emp_normal where empno=&empno;输入 empno 的值:  128444原值    1: select * from emp_normal where empno=&empno新值    1: select * from emp_normal where empno=128444执行计划----------------------------------------------------------Plan hash value: 733975378------------------------------------------------------------------------------------------| Id  | Operation    | Name | Rows  | Bytes | Cost (%CPU)| Time |------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT    | |     1 |    34 |     4   (0)| 00:00:01 ||   1 |  TABLE ACCESS BY INDEX ROWID| EMP_NORMAL |     1 |    34 |     4   (0)| 00:00:01 ||*  2 |   INDEX RANGE SCAN    | BT_NORMAL  |     1 | |     3   (0)| 00:00:01 |------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   2 - access("EMPNO"=128444)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets  5  consistent gets  0  physical reads  0  redo size704  bytes sent via SQL*Net to client520  bytes received via SQL*Net from client  2  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)  1  rows processedSQL> select * from emp_normal where empno=&empno;输入 empno 的值:  2398原值    1: select * from emp_normal where empno=&empno新值    1: select * from emp_normal where empno=2398执行计划----------------------------------------------------------Plan hash value: 733975378------------------------------------------------------------------------------------------| Id  | Operation    | Name | Rows  | Bytes | Cost (%CPU)| Time |------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT    | |     1 |    34 |     4   (0)| 00:00:01 ||   1 |  TABLE ACCESS BY INDEX ROWID| EMP_NORMAL |     1 |    34 |     4   (0)| 00:00:01 ||*  2 |   INDEX RANGE SCAN    | BT_NORMAL  |     1 | |     3   (0)| 00:00:01 |------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   2 - access("EMPNO"=2398)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets  5  consistent gets  0  physical reads  0  redo size703  bytes sent via SQL*Net to client520  bytes received via SQL*Net from client  2  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)  1  rows processed
总结如下:
BITMAPEMPNOB-TREEConsistent ReadsPhysical ReadsConsistent ReadsPhysical Reads50100050502398505085455050980085050853425050128444505085850

对emp_random表进行实验,得出的结果与之类似,这里不再獒述。从这里可以看出,在唯一列上的等值查询,位图索引与btree索引的效率相当。

下面,我们在针对范围查询来进行测试。

SQL> create bitmap index bm_random  on emp_random(empno);索引已创建。SQL> analyze table emp_random compute statistics for table for all indexes for all columns;表已分析。SQL> select index_name,clustering_factor from user_indexes;INDEX_NAME       CLUSTERING_FACTOR------------------------------ -----------------BM_RANDOM 1000000SQL> set autot traceonlySQL> select * from emp_random where empno between &range1 and &range2;输入 range1 的值:  1输入 range2 的值:  2300原值    1: select * from emp_random where empno between &range1 and &range2新值    1: select * from emp_random where empno between 1 and 2300已选择2300行。执行计划----------------------------------------------------------Plan hash value: 811843605-------------------------------------------------------------------------------------------| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time  |-------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT     |  |  2299 | 85063 |   418   (1)| 00:00:06 ||   1 |  TABLE ACCESS BY INDEX ROWID | EMP_RANDOM |  2299 | 85063 |   418   (1)| 00:00:06 ||   2 |   BITMAP CONVERSION TO ROWIDS|  |  |  |       |  ||*  3 |    BITMAP INDEX RANGE SCAN   | BM_RANDOM  |  |  |       |  |-------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   3 - access("EMPNO">=1 AND "EMPNO"<=2300)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets       2463  consistent gets  0  physical reads  0  redo size     130225  bytes sent via SQL*Net to client       2203  bytes received via SQL*Net from client155  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)       2300  rows processedSQL> select * from emp_random where empno between &range1 and &range2;输入 range1 的值:  8输入 range2 的值:  1980原值    1: select * from emp_random where empno between &range1 and &range2新值    1: select * from emp_random where empno between 8 and 1980已选择1973行。执行计划----------------------------------------------------------Plan hash value: 811843605-------------------------------------------------------------------------------------------| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time  |-------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT     |  |  1972 | 72964 |   366   (0)| 00:00:05 ||   1 |  TABLE ACCESS BY INDEX ROWID | EMP_RANDOM |  1972 | 72964 |   366   (0)| 00:00:05 ||   2 |   BITMAP CONVERSION TO ROWIDS|  |  |  |       |  ||*  3 |    BITMAP INDEX RANGE SCAN   | BM_RANDOM  |  |  |       |  |-------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   3 - access("EMPNO">=8 AND "EMPNO"<=1980)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets       2114  consistent gets  0  physical reads  0  redo size     111758  bytes sent via SQL*Net to client       1961  bytes received via SQL*Net from client133  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)       1973  rows processedSQL> select * from emp_random where empno between &range1 and &range2;输入 range1 的值:  28888输入 range2 的值:  31850原值    1: select * from emp_random where empno between &range1 and &range2新值    1: select * from emp_random where empno between 28888 and 31850已选择2963行。执行计划----------------------------------------------------------Plan hash value: 811843605-------------------------------------------------------------------------------------------| Id  | Operation     | Name  | Rows  | Bytes | Cost (%CPU)| Time  |-------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT     |  |  2962 |   107K|   513   (0)| 00:00:07 ||   1 |  TABLE ACCESS BY INDEX ROWID | EMP_RANDOM |  2962 |   107K|   513   (0)| 00:00:07 ||   2 |   BITMAP CONVERSION TO ROWIDS|  |  |  |       |  ||*  3 |    BITMAP INDEX RANGE SCAN   | BM_RANDOM  |  |  |       |  |-------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   3 - access("EMPNO">=28888 AND "EMPNO"<=31850)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets       3172  consistent gets  0  physical reads  0  redo size     170625  bytes sent via SQL*Net to client       2687  bytes received via SQL*Net from client199  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)       2963  rows processedSQL> drop index bm_random;索引已删除。SQL> create index bt_random on emp_random(empno);索引已创建。SQL> analyze table emp_random compute statistics for table for all indexes for all columns;表已分析。SQL> set autot offSQL> select index_name,clustering_factor from user_indexes;INDEX_NAME       CLUSTERING_FACTOR------------------------------ -----------------BT_RANDOM  999834SQL> set autot traceonlySQL> select * from emp_random where empno between &range1 and &range2;输入 range1 的值:  1输入 range2 的值:  2300原值    1: select * from emp_random where empno between &range1 and &range2新值    1: select * from emp_random where empno between 1 and 2300已选择2300行。执行计划----------------------------------------------------------Plan hash value: 731629521--------------------------------------------------------------------------------| Id  | Operation  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |--------------------------------------------------------------------------------|   0 | SELECT STATEMENT  |       |  2299 | 85063 |  1735 (1)| 00:00:21 ||*  1 |  TABLE ACCESS FULL| EMP_RANDOM |  2299 | 85063 |  1735 (1)| 00:00:21 |--------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   1 - filter("EMPNO"<=2300 AND "EMPNO">=1)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets       6410  consistent gets  0  physical reads  0  redo size     121081  bytes sent via SQL*Net to client       2203  bytes received via SQL*Net from client155  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)       2300  rows processedSQL> select * from emp_random where empno between &range1 and &range2;输入 range1 的值:  8输入 range2 的值:  1980原值    1: select * from emp_random where empno between &range1 and &range2新值    1: select * from emp_random where empno between 8 and 1980已选择1973行。执行计划----------------------------------------------------------Plan hash value: 731629521--------------------------------------------------------------------------------| Id  | Operation  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |--------------------------------------------------------------------------------|   0 | SELECT STATEMENT  |       |  1972 | 72964 |  1735 (1)| 00:00:21 ||*  1 |  TABLE ACCESS FULL| EMP_RANDOM |  1972 | 72964 |  1735 (1)| 00:00:21 |--------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------   1 - filter("EMPNO"<=1980 AND "EMPNO">=8)统计信息----------------------------------------------------------  1  recursive calls  0  db block gets       6388  consistent gets  0  physical reads  0  redo size     103922  bytes sent via SQL*Net to client       1961  bytes received via SQL*Net from client133  SQL*Net roundtrips to/from client  0  sorts (memory)  0  sorts (disk)       1973  rows processed

归纳如下,

BITMAPEMPNO (Range)B-TREEConsistent ReadsPhysical ReadsConsistent ReadsPhysical Reads246301-230064100211408-198063880257201850-4250641803172028888-31850645602762082900-854786431072540984888-100000072540

从这里可以看出,位图索引要优于btree索引,这是因为btree索引的cluster  factor 较大,从而优化器选择了全表扫描。即便在emp_normal 表下,即clustering factor较小时,位图索引btree索引相当的。因此在distinct cardinality 较大的情况下,范围扫描的效率位图索引也是不逊色与btree索引。

总结如下:

位图索引的查询性能经常是优于btree索引的,即便在distinct cardinality较大的情况下

位图索引不适合与dml频繁的环境

位图索引适用于DSS系统

位图索引可以进行逻辑运算,多个索引和同时在查询语句中发挥作用,这是一个非常重要的地方



原创粉丝点击