索引易错点:索引与max(),min()

来源:互联网 发布:龙腾网络传奇 编辑:程序博客网 时间:2024/06/13 13:51

前面说完了索引与count(*)的关系,现在来讨论另外一种聚合查询max(),min()与索引的关系,大家觉得这个聚合查询能用的到索引吗?

通过上一小节的学习后,可能有人会回答:“可以用得上,但是索引列必须要建主键或者要写where column is not null就可以用到了。”对于这样的回答应该值得肯定,非常正确!看来前面没白讲了。不过用上了什么样的索引扫描方式呢?上一小节的方式是INDEX FULL SCAN,大家一定有印象,现在如果是要让max()和min()利用上索引,也是走这个INDEX FULL SCAN扫描方式吗?

大家想一想索引的结构是什么样的?索引结构是从root到branch最后到leaf,好象一个金字塔。最下面的叶子层(也就是金字塔的底部)其实是有序的,比如从左到右值是从小到大,或者从大到小。这样一来大家认为取max()或者 min()还需要INDEX FULL SCAN吗,找到头或尾不就找到最大或最小值,还需要遍历leaf吗?

于是ORACLE的另一种索引扫描类型就横空出世了index full scan(max/min)。多了(max/min)的关键字!index full scan(max/min)蕴含着stopkey的机制,从最左边或者最右边的叶子节点开始扫描,读到第一个值后就停止扫描。

查看max()的查询,发现果然是走  INDEX FULL SCAN (MIN/MAX)

SQL> explain plan for select max(object_id) from ljb_test where object_id is not null;

Explained

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------------------------

Plan hash value: 613051030

----------------------------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)

----------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |              |     1 |    13 |     2   (0)

|   1 |  SORT AGGREGATE             |              |     1 |    13 |

|   2 |   FIRST ROW                 |              | 49190 |   624K|     2   (0)

|*  3 |    INDEX FULL SCAN (MIN/MAX)| IDX_LJB_TEST | 49190 |   624K|     2   (0)

----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   3 - filter("OBJECT_ID" IS NOT NULL)

Note

   - dynamic sampling used for this statement

19 rows selected

查看min()的查询,发现也走了INDEX FULL SCAN (MIN/MAX)

SQL> explain plan for select min(object_id) from ljb_test where object_id is not null;

Explained

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

-----------------------------------------------------------------------------------------------------------------------

Plan hash value: 613051030

-----------------------------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)

-----------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |              |     1 |    13 |     2   (0)

|   1 |  SORT AGGREGATE             |              |     1 |    13 |

|   2 |   FIRST ROW                 |              | 49190 |   624K|     2   (0)

|*  3 |    INDEX FULL SCAN (MIN/MAX)| IDX_LJB_TEST | 49190 |   624K|     2   (0)

-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   3 - filter("OBJECT_ID" IS NOT NULL)

Note

   - dynamic sampling used for this statement

19 rows selected

到此大家应该完全明白了max()和min()的时候,执行计划中会走INDEX FULL SCAN (MIN/MAX)的原因了吧,在获取正确的信息后,ORACLE对此类查询自然就会选择这样的扫描方式,希望大家能理解其中选择这样方式扫描的原理!也许有人说,知道这个也没用,ORACLE自己就会选怎么走索引吧,这个NDEX FULL SCAN (MIN/MAX)的知识点知道也没意义。其实我认为,多理解点东西总是有用的,尤其是原理性方面,比如我现在再问这样一个问题:select min(object_id),max(object_id) fromljb_test where object_id is not null 这个语句ORACLE怎么处理?大家怎么回答?

让我实验一下吧(很多人猜还是INDEX FULL SCAN (MIN/MAX)):

下面执行结果出来了,走的索引扫描类型是INDEX FULL SCAN,看不到(MIN/MAX)的关键字,咋回事?

 SQL> explain plan for select min(object_id),max(object_id) from ljb_test where object_id is not null;

 

Explained

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------

Plan hash value: 1341606234

-------------------------------------------------------------------------------------------------------------------

| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time

-------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT      |              |     1 |    13 |    61   (4)| 00:0

|   1 |  SORT AGGREGATE       |              |     1 |    13 |            |

|*  2 |   INDEX FAST FULL SCAN| IDX_LJB_TEST | 49190 |   624K|    61   (4)| 00:0

-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - filter("OBJECT_ID" IS NOT NULL)

Note

-----

   - dynamic sampling used for this statement

18 rows selected

原来这样的SQL语句是表示ORACLE要利用该索引同时取到这两个值,INDEX FULL SCAN (MIN/MAX)是无法一次取到两个值的,所以ORACLE不得不选择了INDEX FULL SCAN ,把叶子的索引扫了个遍,同时取到了两个值。

明白了原理,处理起问题就简单了,改写代码如下:

select (select max(object_id) from test1) c, (select min(object_id) from test1) b from dual;

现在终于走了INDEX FULL SCAN (MIN/MAX)索引了,大家看到这个INDEX FULL SCAN (MIN/MAX)威力还是非常大的,走了两次INDEX FULL SCAN (MIN/MAX),居然代价才4,远远低于一次INDEX FULL SCAN的代价61

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

-----------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 3189180828

-----------------------------------------------------------------------------------------------------------------------------------------

| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |

-----------------------------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT              |              |     1 |    26 |     4   (0)| 00:00:01 |

|   1 |  NESTED LOOPS                 |              |     1 |    26 |     4   (0)| 00:00:01 |

|   2 |   VIEW                        |              |     1 |    13 |     2   (0)| 00:00:01 |

|   3 |    SORT AGGREGATE             |              |     1 |    13 |            |          |

|   4 |     FIRST ROW                 |              | 49190 |   624K|     2   (0)| 00:00:01 |

|*  5 |      INDEX FULL SCAN (MIN/MAX)| IDX_LJB_TEST | 49190 |   624K|     2   (0)| 00:00:01 |

|   6 |   VIEW                        |              |     1 |    13 |     2   (0)| 00:00:01 |

|   7 |    SORT AGGREGATE             |              |     1 |    13 |            |          |

|   8 |     FIRST ROW                 |              | 49190 |   624K|     2   (0)| 00:00:01 |

|*  9 |      INDEX FULL SCAN (MIN/MAX)| IDX_LJB_TEST | 49190 |   624K|     2   (0)| 00:00:01 |

-----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   5 - filter("OBJECT_ID" IS NOT NULL)

   9 - filter("OBJECT_ID" IS NOT NULL)

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------

Note

   - dynamic sampling used for this statement

总结:max() 和 min() 是大家常用的使用频率很高的sql写法,计费项目各种报表中需要这样编写的地方比比皆是!希望大家能对这样的查询建立索引,在保证该列不空的情况下,就有可能利用到INDEX FULL SCAN (MIN/MAX)这个索引扫描方式,能为查询性能带来很大的提高,另外只要善于思考,还可以通过改写SQL的方式,将原本利用不到INDEX FULL SCAN (MIN/MAX)查询方式的语句select min(object_id),max(object_id) from ljb_test where object_id is not null改造后,利用上INDEX FULL SCAN (MIN/MAX)。希望这个能启发开发人员多利用现有的SQL知识,编写出高效的SQL语句。

引申联想:大家记得前面我有提到index full scan(max/min)蕴含着stopkey的机制,有优化基础的朋友一定认识这个stopkey,经常在分页查询的执行计划中,看到有这样的关键字,基本上可以认为这个查询的执行计划是正确的。

比如select * from (select * from table where id= order by name desc) where rownum<11;这样的语句具体的意思就是id为某个值的时候,根据name做排序,然后取前10行.这个语句存在2个部分:id为某个值,name降序。假设我现在存在这一个索引(id,name desc)这个索引的结构也是id相同的情况下按照name的降序排列,这个索引同时满足前面的两个条件,因此就能提高速度,只要从索引中读取出10个rowid,然后根据这10个rowid来回表,这时候速度肯定很快的,因此类似这类的分页语句可以根据sql语句的原意来创建索引,就能提高速度,但是如果where条件里出现非等于的条件,那么不管怎么创建索引都无法满足前面的2个条件(根据索引的结构就很容易明白这点),就必须根据字段的选择性来创建合适的索引.

原创粉丝点击