What ANALYZE TABLE ... COMPUTE STATISTICS actually gathers statistics on, and ENDPOINT_VALUE in user_tab_histograms

Source: Internet | Editor: 程序博客网 | Date: 2024/05/01 17:00

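analyze table t compute statistics = analyze table t compute statistics for table for all indexes for all columns;
It therefore gathers more than analyze table t compute statistics for table for all indexes for all indexed columns, which collects column statistics only for the indexed columns.
That said, this is not a recommendation to use ANALYZE TABLE for gathering statistics.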

SQL> create table t as select * from all_objects;
Table created.

SQL> create index t_idx on t(object_id);
Index created.

SQL> analyze table t compute statistics for table
  for all indexes for all indexed columns;

Table analyzed.

SQL> select t.num_rows, i.num_rows, c.cnt
  from (select num_rows from user_tables where table_name = 'T') t,
       (select num_rows from user_indexes where table_name = 'T') i,
       (select count(distinct column_name) cnt from user_tab_histograms where table_name = 'T') c
/

  NUM_ROWS   NUM_ROWS        CNT
---------- ---------- ----------
     31213      31213          1

CNT is 1 because only one column is indexed, so only the data distribution of object_id was gathered. Since object_id is unique, its values are evenly distributed.

SQL> analyze table t delete statistics;

Table analyzed.

SQL> select t.num_rows, i.num_rows, c.cnt
  from (select num_rows from user_tables where table_name = 'T') t,
       (select num_rows from user_indexes where table_name = 'T') i,
       (select count(distinct column_name) cnt from user_tab_histograms where table_name = 'T') c;

  NUM_ROWS   NUM_ROWS        CNT
---------- ---------- ----------
                               0

SQL> analyze table t compute statistics;

Table analyzed.

SQL> select t.num_rows, i.num_rows, c.cnt
  from (select num_rows from user_tables where table_name = 'T') t,
       (select num_rows from user_indexes where table_name = 'T') i,
       (select count(distinct column_name) cnt from user_tab_histograms where table_name = 'T') c;

  NUM_ROWS   NUM_ROWS        CNT
---------- ---------- ----------
     31213      31213         13

This time every column was analyzed (CNT = 13), but not all of these columns ever appear in predicates such as where col = 'X', so much of this information has no practical value.

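Combining DBMS_STATS with table monitoring (the MONITORING attribute of the table) makes it possible to re-gather statistics automatically once roughly 10% of a table's data has changed.
For histograms, I usually prefer to gather with SIZE SKEWONLY.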
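
The two points above could look roughly like the following. This is only a sketch, assuming the test table T from earlier and a 9i-style setup; the cascade setting and the use of the current schema (user) are illustrative choices, not something the text prescribes.

```sql
-- Enable DML monitoring on T. This is needed in 9i; from 10g on,
-- monitoring is governed by STATISTICS_LEVEL and this clause is deprecated.
alter table t monitoring;

-- Re-gather statistics only for tables in the current schema whose
-- monitored changes have made their statistics stale.
begin
  dbms_stats.gather_schema_stats(
    ownname => user,
    options => 'GATHER STALE',
    cascade => true);
end;
/

-- Gather table, index and column statistics for T, building histograms
-- only on columns whose value distribution is skewed.
begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'T',
    method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY',
    cascade    => true);
end;
/
```

The DML counts that monitoring records can be inspected in user_tab_modifications, which helps when judging whether a table is close to the staleness threshold.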

==========================================================================
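analyze table t compute statistics = analyze table t compute statistics for table for all indexes for all columns
Statistics from "for table" are stored in the views user_tables, all_tables, dba_tables
Statistics from "for all indexes" are stored in the views user_indexes, all_indexes, dba_indexes
Statistics from "for all columns" are stored in the views user_tab_columns, all_tab_columns, dba_tab_columns (the histogram endpoints appear in user_tab_histograms and its all_/dba_ counterparts)
Running analyze table t delete statistics removes all of these statistics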
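
To see where each kind of statistic ends up, the three views can be queried directly. The queries below are a small sketch against the test table T. The columns selected are a subset chosen for illustration.

```sql
-- Table-level statistics (FOR TABLE)
select num_rows, blocks, avg_row_len, last_analyzed
  from user_tables
 where table_name = 'T';

-- Index-level statistics (FOR ALL INDEXES)
select index_name, num_rows, distinct_keys, clustering_factor, last_analyzed
  from user_indexes
 where table_name = 'T';

-- Column-level statistics (FOR ALL COLUMNS)
select column_name, num_distinct, num_nulls, density, num_buckets
  from user_tab_columns
 where table_name = 'T'
 order by column_id;
```

After analyze table t delete statistics, these statistic columns should come back null for T.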

===========================================================================

[Q] How do I analyze a table or an index?
[A] From the command line you can use the ANALYZE command,
for example: Analyze table tablename compute statistics;
Analyze index|cluster indexname estimate statistics;
ANALYZE TABLE tablename COMPUTE STATISTICS
FOR TABLE
FOR ALL [LOCAL] INDEXES
FOR ALL [INDEXED] COLUMNS;
ANALYZE TABLE tablename DELETE STATISTICS
ANALYZE TABLE tablename VALIDATE REF UPDATE
ANALYZE TABLE tablename VALIDATE STRUCTURE
[CASCADE]|[INTO TableName]
ANALYZE TABLE tablename LIST CHAINED ROWS [INTO TableName]
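and so on.
To analyze an entire schema or database, you can also use the supplied packages, which can run in parallel:
Dbms_utility (the package available before 8i)
Dbms_stats (the package provided from 8i onward)
For example: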
dbms_stats.gather_schema_stats(User,estimate_percent=>100,cascade=> TRUE);
dbms_stats.gather_table_stats(User,TableName,degree => 4,cascade => true);
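Some notes comparing the command with the package:
1. For partitioned tables, use DBMS_STATS rather than the ANALYZE statement. DBMS_STATS:
a) can run in parallel, across multiple users and multiple tables
b) can produce statistics for the partitioned table as a whole as well as for individual partitions
c) can compute statistics at different levels: a single partition, subpartitions, the whole table, or all partitions
d) can export statistics
e) can be used to gather statistics automatically
2. Drawbacks of DBMS_STATS
a) It cannot Validate Structure
b) It cannot collect CHAINED ROWS information or information on CLUSTER TABLEs; both still require the ANALYZE statement.
c) By default DBMS_STATS does not analyze indexes, because Cascade defaults to False; it has to be set to True explicitly. (This describes 8i/9i; from 10g on the default is DBMS_STATS.AUTO_CASCADE.)
3. For External Tables in Oracle 9, ANALYZE cannot be used; statistics can only be gathered with DBMS_STATS.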
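
A few of the points in this list can be sketched as follows. SALES, SALES_2024 and STATS_BACKUP are hypothetical names introduced purely for illustration, and the parameter values are assumptions rather than recommendations.

```sql
-- Gather statistics for a single partition of a partitioned table.
-- SALES and SALES_2024 are hypothetical names.
begin
  dbms_stats.gather_table_stats(
    ownname     => user,
    tabname     => 'SALES',
    partname    => 'SALES_2024',
    granularity => 'PARTITION',
    degree      => 4,
    cascade     => true);   -- also gather index statistics
end;
/

-- Export the table's statistics into a user statistics table.
-- STATS_BACKUP is a hypothetical name.
begin
  dbms_stats.create_stat_table(ownname => user, stattab => 'STATS_BACKUP');
  dbms_stats.export_table_stats(
    ownname => user,
    tabname => 'SALES',
    stattab => 'STATS_BACKUP');
end;
/

-- Chained-row listing still goes through ANALYZE. The target table
-- CHAINED_ROWS must already exist (Oracle ships utlchain.sql to create it).
analyze table sales list chained rows into chained_rows;
```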

==================================================================

Source: http://bianxq.iteye.com/blog/464679

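ENDPOINT_VALUE in the histogram view user_tab_histograms

For heavily skewed columns, histograms are an important basis for the CBO to choose the right execution plan.
Histograms are usually queried through user_tab_histograms, all_tab_histograms, or dba_tab_histograms.
Taking user_tab_histograms as the example, here is what each column of the view means.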

SQL> desc user_tab_histograms
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------
 TABLE_NAME                                         VARCHAR2(30)
 COLUMN_NAME                                        VARCHAR2(4000)
 ENDPOINT_NUMBER                                    NUMBER
 ENDPOINT_VALUE                                     NUMBER
 ENDPOINT_ACTUAL_VALUE                              VARCHAR2(1000)

TABLE_NAME: table name
COLUMN_NAME: column name
ENDPOINT_NUMBER: endpoint sequence number
ENDPOINT_VALUE: endpoint value
ENDPOINT_ACTUAL_VALUE: actual endpoint value (usually null)

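When I used histograms in the past I never paid much attention to the ENDPOINT_VALUE column; I had always assumed it stored the actual value of the endpoint.
Today I found out that this is not the case.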

SQL> create table sunwg (id number,grade varchar2(1));

Table created.

SQL> insert into sunwg values(1,'a');

1 row created.

SQL> insert into sunwg values(2,'b');

1 row created.

SQL> commit;

Commit complete.

SQL> begin
  dbms_stats.gather_table_stats(
    ownname => 'SUNWG',
    tabname => 'SUNWG',
    method_opt => 'FOR TABLE FOR ALL COLUMNS SIZE AUTO');
end;
/

PL/SQL procedure successfully completed.

SQL> select column_name,endpoint_number,endpoint_value
  from user_tab_histograms
 where table_name = 'SUNWG'
 order by 1,2;

COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
---------- --------------- --------------
GRADE                    0     5.0365E+35
GRADE                    1     5.0885E+35
ID                       0              1
ID                       1              2

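The test shows that for column ID the stored value is the actual value, whereas for column GRADE it is not.
Looking at the base table underneath user_tab_histograms explains why: its comments describe ENDPOINT_VALUE as a hashed form of the actual value.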

create table histgrm$ /* histogram table */
( obj# number not null, /* object number */
col# number not null, /* column number */
row# number, /* row number (in row cache) */
bucket number not null, /* bucket number */
endpoint number not null, /* endpoint hashed value */
intcol# number not null, /* internal column number */
epvalue varchar2(1000), /* endpoint value information */
spare1 number,
spare2 number)
cluster c_obj#_intcol#(obj#, intcol#)
/

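endpoint number not null, /* endpoint hashed value */
That settles it: 5.0365E+35 is the "hashed" value of 'a'. It is not a hash in the cryptographic sense, though. The number matches the character code scaled into a fixed-width number: 97 (the code of 'a') × 256^14 ≈ 5.0365E+35, and 98 × 256^14 ≈ 5.0885E+35 for 'b'.
Oracle probably does this for generality; otherwise it would need a whole family of endpoint columns: one for character data, one for numeric data, one for dates, and so on.
With this encoding there is a single uniform representation: just a number.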
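
The two GRADE values in the query output above can be reproduced with simple arithmetic. The following is a minimal sketch, assuming a character set in which 'a' is 97 and 'b' is 98, and assuming the number is formed by treating the character as the most significant of 15 bytes. That assumption is an observation consistent with the output shown here, not documented behavior.

```sql
-- Treat each one-character string as the most significant of 15 bytes:
-- character code * 256^14. Compare with ENDPOINT_VALUE for GRADE above.
select 'a' as grade, ascii('a') * power(256, 14) as approx_endpoint from dual
union all
select 'b', ascii('b') * power(256, 14) from dual;
```

The results should come out at about 5.0365E+35 and 5.0885E+35. Under this scheme, strings that share the same leading bytes would map to the same number, which is presumably why the view also carries ENDPOINT_ACTUAL_VALUE.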

Source: http://www.oratea.net/?p=296