5 统计信息--优化主题系列

来源:互联网 发布:舒尔se315 知乎 编辑:程序博客网 时间:2024/05/22 15:47

请记住,高级DBA 一定要收集统计信息,而且数据库自带的JOB一定要关闭

并且要自己定制统计信息收集策略

 

OLTP系统中,一定要收集统计信息

很多的 OLAP 系统都不收集,直接用HINT固定我们现在就不收集

因为OLAP系统I/O压力大,表也非常大,收集统计信息很耗费资源

并且OLAP系统表的数据随时都在变化(因为实时在入库),那么就需要经常收集统计信息

所以OLAP系统收集统计信息不现实。

 

如何判断一个系统是OLAP 还是OLTP??

典型的归档是否开启 OLTP一般都开启 OLAP不开

 

为什么OLTP系统一定要收集统计信息??

OLTP系统业务非常复杂

不同查询条件很可能访问路径就会发生变化

走错了执行计划很可能数据库会崩溃

 

OLTP系统中要优化一个SQL语句,首先要保证SQL语句中所有的表统计信息是准确的。如果统计信息都不准确,那么SQL优化就无从谈起。在OLAP系统中,如果数据量特别大,一般不会去收集统计信息,会直接利用HINT把执行计划给固定住。

 

对非分区表收集统计信息:

BEGIN

 DBMS_STATS.GATHER_TABLE_STATS(ownname         => 'SCOTT',

                               tabname          => 'DEPT',

                               estimate_percent => 30,

                               method_opt       => 'for all columns sizerepeat',

                               no_invalidate    => FALSE,

                               degree           => 8,

                               cascade          => TRUE);

END;

/

 

对分区表收集统计信息

BEGIN

 DBMS_STATS.GATHER_TABLE_STATS(ownname         => 'SCOTT',

                               tabname          => 'P_TEST',

                               estimate_percent => 30,

                               method_opt       => 'for all columns sizerepeat',

                               no_invalidate    => FALSE,

                               degree           => 8,

                               granularity      => 'ALL',

                               cascade          => TRUE);

END;

/

 

estimate_percent表示采样率,采样率设置太大,也没必要,如果表非常大,采样率过高会导致收集统计信息跑很长,增加了系统压力。采样率设置过小,统计的信息就不能很完整的体现表中数据的分布,这样CBO在进行执行计划的选择上,很可能选择错误的执行计划。

根据我的工作经验:

表小于1GB 采样率可以设置50%-100%

表大于1GB小于5GB可以设置30%

表大于5GB 这类表都应该进行分区,采样率可以设置为30%

 

提问:为什么要根据segment_size来计算一个表是大表还是小表??

提问:统计信息最佳采样率多大??

原厂最佳实践 30%是最佳的采样率

 

method_opt表示收集的方法,一个稳定的系统收集统计信息的时候推荐使用

method_opt=> 'for all columns size repeat'

repeat表示以前收集过直方图,现在收集统计信息的时候就收集直方图,如果以前没收集过直方图,现在收集统计信息的时候就不收集。

method_opt => 'for all columns size auto',

auto表示Oracle根据谓词过滤信息(前文讲解直方图的时候提到过的where条件过滤),自动判断该列是否收集直方图。一个稳定的系统,不应该让Oracle去自动判断,自动判断很可能就会出事,比如某列不该收集直方图,设置auto过后它自己去收集直方图了,从而导致系统不稳定。

degree表示收集统计信息的时候并行度,并行度根据你系统配置以及当前系统可用资源自行设置。一般degree设置4--8

一般来说设置为4

 

cascade表示收集表的统计信息时候同时收集索引的统计信息。

no_invalidate表示收集统计信息之后在共享池中引用了相关表的SQL游标是否失效。这个一定要设置为FALSE,不然你可能在做SQL优化的时候,你发现明明更新了统计信息,但是执行计划还是没改变。

FALSE 表示说立即失效否则游标15分钟再失效

 

granularity 分区的粒度默认是'ALL'对所有分区都收集可设置为partition

method_opt => 'for all columns size skewonly' 表示让ORACLE自动判断哪个列去收集统计信息

下面做个试验:

 

grant dba to scott;

conn scott/tiger

drop table test purge;

create table test as select * from dba_objects;

BEGIN

 DBMS_STATS.GATHER_TABLE_STATS(ownname         => 'SCOTT',

                               tabname          => 'TEST',

                               estimate_percent => 100,

                               method_opt      => 'for all columns size skewonly',

                               no_invalidate    => FALSE,

                               degree           => 1,

                               cascade          => TRUE);

END;

/

 

col column_name format a25

col density 99999999

select a.column_name,

       b.num_rows,

       a.num_distinct Cardinality,

       round(a.num_distinct /b.num_rows * 100, 2) selectivity,

       NUM_NULLS,

       density,

       a.histogram,

       a.num_buckets

  from dba_tab_col_statistics a, dba_tables b

 where a.owner = b.owner

   and a.table_name = b.table_name

   and a.owner = upper('scott')

   and a.table_name = upper('test');


method_opt => 'for all columns size skewonly'

就是本来不应该对某列收集直方图但它收集了

一旦收集了直方图就有可能导致绑定变量窥探

select owner from test where object_id=10;

是不是owner这个列不会出现在where条件中??

owner列收集直方图是不是浪费CPU资源??

 

做个试验:method_opt=> 'for all columns size auto'

drop table test purge;

create table test as select * from dba_objects;

BEGIN

 DBMS_STATS.GATHER_TABLE_STATS(ownname         => 'SCOTT',

                               tabname          => 'TEST',

                               estimate_percent => 100,

                               method_opt       => 'for all columns sizeauto',

                               no_invalidate   => FALSE,

                               degree           => 1,

                               cascade          => TRUE);

END;

/

 

col column_name format a25

col density 99999999

select a.column_name,

       b.num_rows,

       a.num_distinct Cardinality,

       round(a.num_distinct /b.num_rows * 100, 2) selectivity,

       NUM_NULLS,

       density,

       a.histogram,

       a.num_buckets

  from dba_tab_col_statistics a, dba_tables b

 where a.owner = b.owner

   and a.table_name = b.table_name

   and a.owner = upper('scott')

   and a.table_name = upper('test');


发现method_opt=> 'for all columns size auto'收集直方图所有列都没收集然后

 

select * from test where owner='SCOTT';

BEGIN

 DBMS_STATS.GATHER_TABLE_STATS(ownname         => 'SCOTT',

                               tabname          => 'TEST',

                               estimate_percent => 100,

                               method_opt       => 'for all columns sizeauto',

                               no_invalidate    => FALSE,

                               degree           => 1,

                               cascade          => TRUE);

END;

/

 

col column_name format a25

col density 99999999

select a.column_name,

       b.num_rows,

       a.num_distinct Cardinality,

       round(a.num_distinct /b.num_rows * 100, 2) selectivity,

       NUM_NULLS,

       density,

       a.histogram,

       a.num_buckets

  from dba_tab_col_statistics a, dba_tables b

 where a.owner = b.owner

   and a.table_name = b.table_name

   and a.owner = upper('scott')

   and a.table_name = upper('test');


现在发现OWNER列已经收集了直方图

提问:auto方式收集直方图保险不??

比如说某个列我不希望它收集直方图

CBO自作聪明的收集了是不是很坑爹

生产上能不收集直方图千万别收集避免绑定变量窥探

 

某个列来收集skewonly

method_opt => 'for columns owner size skewonly'

method_opt => 'for columns owner size 25'

 

某个列有函数索引要用hidden

method_opt => 'for all hidden columns'

 

做个试验:method_opt=> 'for all columns size repeat'

drop table test purge;

create table test as select * from dba_objects;

BEGIN

 DBMS_STATS.GATHER_TABLE_STATS(ownname         => 'SCOTT',

                               tabname          => 'TEST',

                               estimate_percent => 30,

                               method_opt       => 'for all columns sizerepeat',

                               no_invalidate    => FALSE,

                               degree           => 1,

                               cascade          => TRUE);

END;

/

 

col column_name format a25

col density 99999999

select a.column_name,

       b.num_rows,

       a.num_distinct Cardinality,

       round(a.num_distinct /b.num_rows * 100, 2) selectivity,

       NUM_NULLS,

       density,

       a.histogram,

       a.num_buckets

  from dba_tab_col_statistics a, dba_tables b

 where a.owner = b.owner

   and a.table_name = b.table_name

   and a.owner = upper('scott')

   and a.table_name = upper('test');


然后指定 OWNER这个列收集直方图

BEGIN

 DBMS_STATS.GATHER_TABLE_STATS(ownname         => 'SCOTT',

                               tabname          => 'TEST',

                               estimate_percent => 30,

                               method_opt       => 'for columns owner sizeskewonly',

                               no_invalidate    => FALSE,

                               degree           => 1,

                               cascade          => TRUE);

END;

/

 

col column_name format a25

col density 99999999

select a.column_name,

       b.num_rows,

       a.num_distinct Cardinality,

       round(a.num_distinct /b.num_rows * 100, 2) selectivity,

       NUM_NULLS,

       density,

       a.histogram,

       a.num_buckets

  from dba_tab_col_statistics a, dba_tables b

 where a.owner = b.owner

   and a.table_name = b.table_name

   and a.owner = upper('scott')

   and a.table_name = upper('test');


BEGIN

 DBMS_STATS.GATHER_TABLE_STATS(ownname         => 'SCOTT',

                               tabname          => 'TEST',

                               estimate_percent => 30,

                               method_opt      => 'for all columns size repeat',

                               no_invalidate    => FALSE,

                               degree           => 1,

                               cascade          => TRUE);

END;

/

 

col column_name format a25

col density 99999999

select a.column_name,

       b.num_rows,

       a.num_distinct Cardinality,

       round(a.num_distinct /b.num_rows * 100, 2) selectivity,

       NUM_NULLS,

       density,

       a.histogram,

       a.num_buckets

  from dba_tab_col_statistics a, dba_tables b

 where a.owner = b.owner

   and a.table_name = b.table_name

   and a.owner = upper('scott')

   and a.table_name = upper('test');


method_opt => 'for all columns size repeat'方式最保险

 

SQL优化最开始就应该查看表统计信息是否过期了如下:


exec dbms_stats.flush_database_monitoring_info;

 

select owner, table_name name, object_type, stale_stats,last_analyzed

  from dba_tab_statistics

 where table_name in ('TEST')

   and owner = 'SCOTT'

   and (stale_stats = 'YES' or last_analyzed is null);

 

select count(*) from test;     --72834

 

数据量变化超过10% 统计信息就会过期

delete from test where rownum <= (select count(*) from test)* 0.11;  --8011 rows deleted.

 

exec dbms_stats.flush_database_monitoring_info;

 

select owner, table_name name, object_type, stale_stats,last_analyzed

  from dba_tab_statistics

 where table_name in ('TEST')

   and owner = 'SCOTT'

   and (stale_stats = 'YES' or last_analyzed is null);


下面SQL是查询TOP 50INSERTS,UPDATES,DELETES的表

select * from

(

select * from

(

select * from

(

select u.name owner, o.name table_name, null partition_name,null subpartition_name,

       m.inserts, m.updates,m.deletes, m.timestamp,

      decode(bitand(m.flags,1),1,'YES','NO') truncated,

       m.drop_segments

from sys.mon_mods_all$ m, sys.obj$ o, sys.tab$ t, sys.user$ u

where o.obj# = m.obj# and o.obj# = t.obj# and o.owner# = u.user#

union all

select u.name, o.name, o.subname, null,

       m.inserts, m.updates,m.deletes, m.timestamp,

      decode(bitand(m.flags,1),1,'YES','NO'),

       m.drop_segments

from sys.mon_mods_all$ m, sys.obj$ o, sys.user$ u

where o.owner# = u.user# and o.obj# = m.obj# and o.type#=19

union all

select u.name,o.name,o2.subname, o.subname,

       m.inserts, m.updates, m.deletes,m.timestamp,

      decode(bitand(m.flags,1),1,'YES','NO'),

       m.drop_segments

from sys.mon_mods_all$ m, sys.obj$ o, sys.tabsubpart$ tsp,sys.obj$ o2,

     sys.user$ u

where o.obj# = m.obj# and o.owner# = u.user# and

      o.obj# = tsp.obj# and o2.obj# =tsp.pobj#

) where owner not like '%SYS%' and owner not like'XDB'    

union all

select * from

(   

select u.name owner, o.name table_name, null partition_name,null subpartition_name,

       m.inserts, m.updates,m.deletes, m.timestamp,

      decode(bitand(m.flags,1),1,'YES','NO') truncated,

       m.drop_segments

from sys.mon_mods$ m, sys.obj$ o, sys.tab$ t, sys.user$ u

where o.obj# = m.obj# and o.obj# = t.obj# and o.owner# = u.user#

union all

select u.name, o.name, o.subname, null,

       m.inserts, m.updates,m.deletes, m.timestamp,

      decode(bitand(m.flags,1),1,'YES','NO'),

       m.drop_segments

from sys.mon_mods$ m, sys.obj$ o, sys.user$ u

where o.owner# = u.user# and o.obj# = m.obj# and o.type#=19

union all

select u.name, o.name, o2.subname, o.subname,

       m.inserts, m.updates,m.deletes, m.timestamp,

      decode(bitand(m.flags,1),1,'YES','NO'),

       m.drop_segments

from sys.mon_mods$ m, sys.obj$ o, sys.tabsubpart$ tsp, sys.obj$o2,

     sys.user$ u

where o.obj# = m.obj# and o.owner# = u.user# and

      o.obj# = tsp.obj# and o2.obj# =tsp.pobj#

) where owner not like '%SYS%' and owner not like '%XDB%'

) order by inserts desc

) where rownum<=50;


上面的脚本是不是可以知道DML的热点表??

 

假如发现经常做DELETE操作的表该如果操作??

moveshrinkmove注:避免enq-lock

 

sys.mon_mods$ 记录还没FLUSHmon_mods_all$的信息(注意:该表不是实时的,需要等一会儿才会记录DML数据,但是它的刷新与sys.mon_mods_all$不一样)

sys.mon_mods_all$ DBA_TAB_MODIFICATIONS的基表

收集统计信息会清空上面2个表数据。

 

提问:

我在 10收集了统计信息,

105分钟执行了一个大批量的 update操作

你在 1010执行查询,但是我发现查询变慢了怎么办?

也就是说,某个表会突然发生大批的DML操作怎么办?

 

及时对相应的表收集统计信息保证统计信息正确

还是慢??而且统计信息没过期那就要去查看表采样率是否正确??

 

SELECT owner,

       table_name,

       num_rows,

       sample_size,

       trunc(sample_size /num_rows * 100) estimate_percent

  FROM DBA_TAB_STATISTICS

 WHERE owner = 'SCOTT'

   AND table_name = 'TEST';


作为高级DBA 保证统计信息正确非过期很重要如何设置采样率??

假如一张100GB的表采样率用100是不是要死??

 

DBA的任务之一:确保统计信息的准确性

DECLARE

  CURSOR STALE_TABLE IS

    SELECT OWNER,

          SEGMENT_NAME,

          CASE

            WHEN SIZE_GB < 0.5 THEN

             30

            WHEN SIZE_GB >= 0.5 AND SIZE_GB < 1 THEN

             20

            WHEN SIZE_GB >= 1 AND SIZE_GB < 5 THEN

             10

            WHEN SIZE_GB >= 5 AND SIZE_GB < 10 THEN

             5

            WHEN SIZE_GB >= 10 THEN

             1

           ENDAS PERCENT,

           8AS DEGREE

      FROM (SELECT OWNER,

                  SEGMENT_NAME,

                  SUM(BYTES / 1024 / 1024 / 1024) SIZE_GB

             FROM DBA_SEGMENTS

            WHERE OWNER = 'ADWU_OPTIMA_AP11'

              AND SEGMENT_NAME IN

                  (SELECT /*+ UNNEST */

                   DISTINCT TABLE_NAME

                     FROM DBA_TAB_STATISTICS

                    WHERE (LAST_ANALYZED IS NULL OR STALE_STATS = 'YES')

                      AND OWNER = 'ADWU_OPTIMA_AP11')

            GROUP BY OWNER, SEGMENT_NAME);

 

BEGIN

  DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

  FOR STALE IN STALE_TABLE LOOP

   DBMS_STATS.GATHER_TABLE_STATS(OWNNAME         => STALE.OWNER,

                                 TABNAME          =>STALE.SEGMENT_NAME,

                                 ESTIMATE_PERCENT => STALE.PERCENT,

                                 METHOD_OPT      => 'for all columns size repeat',

                                 DEGREE           => 8,

                                 GRANULARITY      => 'ALL',

                                 CASCADE          => TRUE);

  END LOOP;

END;

/

 

假如遇到一个SQL里设计了10多个表要单独对10多个表查看统计信息是否过期是不是很坑如下脚本:

 

exec dbms_stats.flush_database_monitoring_info; ----刷新信息到磁盘

 

--------------EXPLAIN PLAN FOR SQL, 然后执行下面语句得到表名字---------------------------

select '''' || object_owner || '''', '''' || object_name ||''','

  from plan_table

 where object_type = 'TABLE'

union

---table in the index---------

select '''' || table_owner || '''', '''' || table_name || '*'','

  from dba_indexes

 where owner in

       (select distinctobject_owner from plan_table where rownum > 0)

   and index_name in

       (select distinctobject_name from plan_table where rownum > 0)

order by 2;

 

*的表是根据索引查询出来的

 

--------------再把上面的结果贴入下面代码-------------------------------------------------

select owner, table_name name, object_type, stale_stats,last_analyzed

  from dba_tab_statistics

 where table_name in

(

table_name

)

and owner='owner'

   and (stale_stats = 'YES' or last_analyzed is null);

 

--------------如果stale_stats=YES,想要知道是怎么过期的,运行下面代码-------------------

select *

  from all_tab_modifications

 where table_owner='&owner'

   and table_name in

(

table_name

)

   and (inserts > 0 or updates > 0 or deletes> 0)

 order by table_name;

 

如果一个系统中有临时表怎么办?

global temporary 这种table咋搞

这种表没办法收集统计信息

你在存储过程里面用了临时表 你往临时表里面插入数据

假如你插入了 50W行数据因为你没收集统计信息那么 CBO认为1对不对

CBO 认为它只有1行数据是不是有可能走NL了?

那这样的问题咋搞?这种就用动态采样 dynamic_sampling


原创粉丝点击