(4)选择率(selectivity)的相关问题
来源:互联网 发布:内衣取名知乎 编辑:程序博客网 时间:2024/05/24 02:37
1、前导零
创建一个包含2000000行数据的表,它有一个id列,采用序号并利用0来进行填充其他位。因此典型的存储值将是A00000000000000001,系统中绝大部分使用这种策略的查询,类似于where id={string constant}的形式;但是,如果它们使用了基于区间的谓词,可能将出现一些奇怪的性能问题。
SQL> create table t1 2 nologging 3 pctfree 0 4 as 5 with generator as ( 6 select 7 rownum id 8 from all_objects 9 where rownum <= 2000 10 ) 11 select 12 /*+ ordered use_nl(v2) */ 13 trunc((rownum-1)/10000) grp_id, 14 'A' || lpad(rownum, 17, '0') id 15 from 16 generator v1, 17 generator v2 18 where 19 rownum <= 2000000 20 ;表已创建。SQL> begin 2 dbms_stats.gather_table_stats( 3 ownname=> user, 4 tabname=> 't1', 5 cascade=> true, 6 estimate_percent=> null, 7 method_opt=>'for all columns size 1' 8 ); 9 end; 10 /PL/SQL 过程已成功完成。
SQL> set autotrace traceonly;SQL> select * from t1 where id between 'A00000000000060000' and 'A00000000000070000';已选择10001行。执行计划----------------------------------------------------------Plan hash value: 3617692013--------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 17 | 374 | 1190 (3)| 00:00:15 ||* 1 | TABLE ACCESS FULL| T1 | 17 | 374 | 1190 (3)| 00:00:15 |--------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 1 - filter("ID"<='A00000000000070000' AND "ID">='A00000000000060000')统计信息---------------------------------------------------------- 1 recursive calls 0 db block gets 7520 consistent gets 6849 physical reads 0 redo size 325111 bytes sent via SQL*Net to client 7741 bytes received via SQL*Net from client 668 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10001 rows processed
针对该查询得到的基数是非常低的(17),但是很明显却返回了10001行。下面重新创建直方图,直方图的默认值为75个桶。
SQL> set autotrace off;SQL> begin 2 dbms_stats.gather_table_stats( 3 ownname=> user, 4 tabname=> 't1', 5 cascade=> true, 6 estimate_percent=> null, 7 method_opt=>'for all columns size 75' 8 ); 9 end; 10 /PL/SQL 过程已成功完成。SQL> set autotrace traceonly;SQL> select * from t1 where id between 'A00000000000060000' and 'A00000000000070000';已选择10001行。执行计划----------------------------------------------------------Plan hash value: 3617692013--------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 8924 | 191K| 1190 (3)| 00:00:15 ||* 1 | TABLE ACCESS FULL| T1 | 8924 | 191K| 1190 (3)| 00:00:15 |--------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 1 - filter("ID"<='A00000000000070000' AND "ID">='A00000000000060000')统计信息---------------------------------------------------------- 0 recursive calls 0 db block gets 7520 consistent gets 6849 physical reads 0 redo size 325111 bytes sent via SQL*Net to client 7741 bytes received via SQL*Net from client 668 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10001 rows processed
直方图的默认值为75个桶,执行计划显示估计的基数为8924——这个结果至少大体上是正确的。创建直方图是一种方法,如果id列的值没有字符,是可以转换为数字的字符串(00000000000000001),那么在id列上面创建一个函数索引也是一种办法:create index t1_i1 on t1(grp_id, to_number(id));
2、致命的默认值
即时是数据库应用程序正确的使用了日期类型,也仍然需要避免null值。为了不让任何列为null,每个可空的列都有一个默认值。因此,大部分独立于数据库的开发人员会选择一个什么样的值来表示null日期呢?如果表示很久以后的日期呢?比如4000年12月31日。
SQL> create table t1 2 as 3 with generator as ( 4 select 5 rownum id 6 from all_objects 7 where rownum <= 2000 8 ) 9 select 10 /*+ ordered use_nl(v2) */ 11 decode( 12 mod(rownum - 1,1000), 13 0,to_date('4000-12-31','yyyy-mm-dd'), 14 to_date('2000-01-01','yyyy-mm-dd') + trunc((rownum - 1)/100) 15 ) date_closed 16 from 17 generator v1, 18 generator v2 19 where rownum <= 1827 * 100;表已创建。SQL> begin 2 dbms_stats.gather_table_stats( 3 ownname=> user, 4 tabname=> 't1', 5 cascade=> true, 6 estimate_percent=> null, 7 method_opt=>'for all columns size 1' --直方图的默认值为1桶。 8 ); 9 end; 10 /PL/SQL 过程已成功完成。SQL> set autotrace traceonly;SQL> select * 2 from t1 3 where date_closed between to_date('2003-01-01', 'yyyy-mm-dd') and 4 to_date('2003-12-31', 'yyyy-mm-dd');已选择36463行。执行计划----------------------------------------------------------Plan hash value: 3617692013--------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 291 | 2328 | 61 (4)| 00:00:01 ||* 1 | TABLE ACCESS FULL| T1 | 291 | 2328 | 61 (4)| 00:00:01 |--------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 1 - filter("DATE_CLOSED"<=TO_DATE(' 2003-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "DATE_CLOSED">=TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))统计信息---------------------------------------------------------- 1 recursive calls 0 db block gets 2759 consistent gets 0 physical reads 0 redo size 494301 bytes sent via SQL*Net to client 27145 bytes received via SQL*Net from client 2432 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 36463 rows processed --这里实际放回了36463行记录,但是oracle却计算错误了(291)。SQL> set autotrace off;SQL> begin 2 dbms_stats.gather_table_stats( 3 ownname=> user, 4 tabname=> 't1', 5 cascade=> true, 6 estimate_percent=> null, 7 method_opt=>'for all columns size 11' --直方图的默认值为11桶。 8 ); 9 end; 10 /PL/SQL 过程已成功完成。SQL> set autotrace traceonly;SQL> select * 2 from t1 3 where date_closed between to_date('2003-01-01', 'yyyy-mm-dd') and 4 to_date('2003-12-31', 'yyyy-mm-dd');已选择36463行。执行计划----------------------------------------------------------Plan hash value: 3617692013--------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 36320 | 283K| 61 (4)| 00:00:01 ||* 1 | TABLE ACCESS FULL| T1 | 36320 | 283K| 61 (4)| 00:00:01 |--------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 1 - filter("DATE_CLOSED">=TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "DATE_CLOSED"<=TO_DATE(' 2003-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))统计信息---------------------------------------------------------- 0 recursive calls 0 db block gets 2759 consistent gets 0 physical reads 0 redo size 494301 bytes sent via SQL*Net to client 27145 bytes received via SQL*Net from client 2432 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 36463 rows processed --36463:36320这次oracle计算得比较准确了。
可以看见加大了直方图的桶数之后,CBO估算的行数就比较接近真实值了,那我们再加大直方图的桶数试一下呐!
SQL> set autotrace off;SQL> begin 2 dbms_stats.gather_table_stats( 3 ownname=> user, 4 tabname=> 't1', 5 cascade=> true, 6 estimate_percent=> null, 7 method_opt=>'for all columns size 75' 8 ); 9 end; 10 /PL/SQL 过程已成功完成。SQL> set autotrace trace;SQL> select * 2 from t1 3 where date_closed between to_date('2003-01-01', 'yyyy-mm-dd') and 4 to_date('2003-12-31', 'yyyy-mm-dd');已选择36463行。执行计划----------------------------------------------------------Plan hash value: 3617692013--------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 36345 | 283K| 61 (4)| 00:00:01 ||* 1 | TABLE ACCESS FULL| T1 | 36345 | 283K| 61 (4)| 00:00:01 |--------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 1 - filter("DATE_CLOSED">=TO_DATE(' 2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "DATE_CLOSED"<=TO_DATE(' 2003-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))统计信息---------------------------------------------------------- 0 recursive calls 0 db block gets 2759 consistent gets 0 physical reads 0 redo size 494301 bytes sent via SQL*Net to client 27145 bytes received via SQL*Net from client 2432 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 36463 rows processed --36463:36345 加大了直方图的桶数之后,CBO估算返回的行数误差更小了。
3、离散数据的风险
考虑一个包含period列的计数系统——存储1~12月的数据,并额外再加一个月,其对应的数值为99(同时包含了第二种选择,即这个特殊的月给定的值为13)。
SQL> create table t1 2 as 3 with generator as ( 4 select 5 rownum id 6 from all_objects 7 where rownum <= 1000 8 ) 9 select 10 /*+ ordered use_nl(v2) */ 11 mod(rownum-1,13) period_01, 12 mod(rownum-1,13) period_02 13 from 14 generator v1, 15 generator v2 16 where 17 rownum <= 13000 18 ;表已创建。SQL> update t1 set 2 period_01 = 99, 3 period_02 = 13 4 where 5 period_01 = 0;已更新1000行。SQL> commit;提交完成。SQL> begin 2 dbms_stats.gather_table_stats( 3 user, 4 't1', 5 cascade => true, 6 estimate_percent => null, 7 method_opt => 'for all columns size 1' 8 ); 9 end; 10 /PL/SQL 过程已成功完成。SQL> set autotrace on exp;SQL> select count(*) from t1 where period_01 between 4 and 6; COUNT(*)---------- 3000 --这里实际有3000行,但是oracle估算的是1663行,不准确!执行计划----------------------------------------------------------Plan hash value: 3724264953---------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |---------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 || 1 | SORT AGGREGATE | | 1 | 3 | | ||* 2 | TABLE ACCESS FULL| T1 | 1663 | 4989 | 6 (0)| 00:00:01 |---------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("PERIOD_01"<=6 AND "PERIOD_01">=4)SQL> select count(*) from t1 where period_02 between 4 and 6; COUNT(*)---------- 3000 --这里实际有3000行,但是oracle估算的是4167行,不准确!执行计划----------------------------------------------------------Plan hash value: 3724264953---------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |---------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 || 1 | SORT AGGREGATE | | 1 | 3 | | ||* 2 | TABLE ACCESS FULL| T1 | 4167 | 12501 | 6 (0)| 00:00:01 |---------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("PERIOD_02"<=6 AND "PERIOD_02">=4)SQL> set autotrace off;SQL> begin 2 dbms_stats.gather_table_stats( 3 user, 4 't1', 5 cascade => true, 6 estimate_percent => null, 7 method_opt => 'for all columns size 254' --重新收集直方图。 8 ); 9 end; 10 /PL/SQL 过程已成功完成。SQL> set autotrace on exp;SQL> select count(*) from t1 where period_01 between 4 and 6; COUNT(*)---------- 3000执行计划----------------------------------------------------------Plan hash value: 3724264953---------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |---------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 || 1 | SORT AGGREGATE | | 1 | 3 | | ||* 2 | TABLE ACCESS FULL| T1 | 3000 | 9000 | 6 (0)| 00:00:01 |---------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("PERIOD_01"<=6 AND "PERIOD_01">=4)SQL> select count(*) from t1 where period_02 between 4 and 6; COUNT(*)---------- 3000执行计划----------------------------------------------------------Plan hash value: 3724264953---------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |---------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 3 | 6 (0)| 00:00:01 || 1 | SORT AGGREGATE | | 1 | 3 | | ||* 2 | TABLE ACCESS FULL| T1 | 3000 | 9000 | 6 (0)| 00:00:01 |---------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - filter("PERIOD_02"<=6 AND "PERIOD_02">=4)
4、函数索引
SQL> create index idx_t1_PERIOD_01 on t1(upper(PERIOD_01));索引已创建。SQL> select i.index_name,i.index_type from user_indexes i;INDEX_NAME INDEX_TYPE------------------------------ ---------------------------IDX_T1_PERIOD_01 FUNCTION-BASED NORMALSQL> select c.TABLE_NAME,c.COLUMN_NAME from user_tab_cols c;TABLE_NAME COLUMN_NAME------------------------------ ------------------------------T1 PERIOD_01T1 PERIOD_02T1 SYS_NC00003$
需要牢记的是,如果创建了一个基于函数的索引,那么实际上是在虚拟列上创建了索引,当收集关于该表及其索引的统计信息时,同时也就收集了虚拟列上的统计信息。这在类似情况下,诸如upper(PERIOD_01)='xxx'的谓词将被优化为:SYS_NC00003$='xxx'。
5、相互关联的列
如果在谓词中使用相互依赖(相关)的列容易将问题复杂化,前提是这些相关联的列同时出现在where子句中。
SQL> create table t1 2 nologging 3 as 4 select 5 trunc(dbms_random.value(0,25))n1, 6 rpad('x',40)ind_pad, 7 trunc(dbms_random.value(0,20))n2, 8 lpad(rownum,10,'0')small_vc, 9 rpad('x',200)padding 10 from 11 all_objects 12 where 13 rownum <= 10000 14 ;表已创建。SQL> update t1 set n2 = n1;已更新10000行。SQL> commit;提交完成。SQL> begin 2 dbms_stats.gather_table_stats( 3 ownname => user, 4 tabname=> 'T1', 5 cascade=> true, 6 estimate_percent => null, 7 method_opt => 'for all columns size 1' 8 ); 9 end; 10 /PL/SQL 过程已成功完成。SQL> set autotrace traceonly;SQL> select small_vc 2 from t1 3 where n1 = 2 4 and n2 = 2;已选择420行。执行计划----------------------------------------------------------Plan hash value: 3617692013--------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 16 | 272 | 66 (0)| 00:00:01 ||* 1 | TABLE ACCESS FULL| T1 | 16 | 272 | 66 (0)| 00:00:01 |--------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 1 - filter("N1"=2 AND "N2"=2)统计信息---------------------------------------------------------- 1 recursive calls 0 db block gets 401 consistent gets 0 physical reads 0 redo size 9059 bytes sent via SQL*Net to client 712 bytes received via SQL*Net from client 29 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 420 rows processed
根据以上执行计划,CBO估算将会返回16行数据,实际确实返回了420行,如果这个表要与多表关联,基数一旦算错,必然导致整个SQL的执行计划全部出错,从而导致SQL性能下降。
接下来再次在相关列上执行查询,不过sql语句中包含了dynamic_sampling提示。
SQL> select /*+ dynamic_sampling(t1 1) */ 2 small_vc 3 from t1 4 where n1 = 2 5 and n2 = 2;已选择420行。执行计划----------------------------------------------------------Plan hash value: 3617692013--------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 370 | 6290 | 66 (0)| 00:00:01 ||* 1 | TABLE ACCESS FULL| T1 | 370 | 6290 | 66 (0)| 00:00:01 |--------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 1 - filter("N1"=2 AND "N2"=2)Note----- - dynamic sampling used for this statement (level=2)统计信息---------------------------------------------------------- 0 recursive calls 0 db block gets 401 consistent gets 0 physical reads 0 redo size 9059 bytes sent via SQL*Net to client 712 bytes received via SQL*Net from client 29 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 420 rows processed
- (4)选择率(selectivity)的相关问题
- CBO学习----03--选择率(Selectivity)
- (1)oracle单表选择率(selectivity)——计算执行计划的基数
- 索引的Selectivity
- How to measure Index Selectivity (索引区分度)
- 选择矩阵(选择矩阵是稀疏的)相关计算
- 选择排序相关问题
- xml相关知识,xml与properties的选择问题
- lunx的相关知识点(权限问题)
- 地址转译的相关问题(一)
- 地址转译的相关问题(二)
- 地址转译的相关问题(三)
- 地址转译的相关问题(四)
- javascript学习笔录1(【问题】JavaScript的相关问题)
- ubuntu的相关问题~4
- 活动选择问题(贪心)
- 任务选择问题(贪心)
- 活动选择问题(贪心)
- 前端设计必会技能-gif动画图片制作
- Hadoop之父勾勒大数据平台的未来
- addsubview 的内存管理机制
- 跟我一起玩Win32开发(15):ListView控件
- 有50级台阶,每次走一阶或两阶,有多少种走法?
- (4)选择率(selectivity)的相关问题
- Chrome浏览器内部协议Chrome://收集
- 获取文件夹里的所有文件名称
- 用 c 语言实现类的继承,并且可用父类指针操作子类对像的一个测试例子
- 垃圾回收器 —— 获取内存
- jQuery 1.9,jQuery2.0 beta, Migrate插件的一些注意事項
- SelectMany
- ora-01031:insufficient privileges(权限不足)解决方法
- Android WebView缓存策略详解