MySQL单表多索引上分组操作的执行计划分析

来源：互联网发布：兰大网络继续教育学院编辑：程序博客网时间：2024/06/14 00:03

提示:

1 需要先仔细研究问题,然后再查看答复

2 需要仔细研究文件,最好diff这些文件(MySQL--order_goods-SQL1.json MySQL--order_goods-SQL2.json MySQL--order_goods-SQL3.json),明白差异所在

背景:

1、版本：5.6.14-log MySQL Community Server (GPL)
2、表DDL：
CREATE TABLE `order_goods` (
`id` int(20) unsigned NOT NULL COMMENT '订单中一样商品的ID',
`goods_id` int(20) unsigned DEFAULT '0' COMMENT '物品ID',
`goods_name` varchar(64) DEFAULT '' COMMENT '物品名称',
`goods_type` int(4) DEFAULT '0' COMMENT '物品类型',
`goods_number` decimal(16,2) DEFAULT '0.00' COMMENT '物品数量',
`goods_price` int(11) DEFAULT '0' COMMENT '物品价格',
`is_gift` int(2) DEFAULT '0' COMMENT '是否是礼物（可能是购买附送的物品）',
`create_time` timestamp NULL DEFAULT NULL COMMENT '创建时间',
`order_id` int(20) unsigned NOT NULL COMMENT '订单ID',
`uid` int(20) unsigned NOT NULL COMMENT '用户ID',
PRIMARY KEY (`id`),
KEY `INDEX_CREATE_TIME` (`create_time`),
KEY `INDEX_GOODS_ID_TYPE_TIME` (`goods_id`,`goods_type`,`create_time`),
KEY `INDEX_TYPE_CTIME` (`goods_type`,`create_time`),
KEY `IDX_GOODSTYPE_ORDERID_TOG` (`goods_type`,`order_id`),
KEY `index_order_id` (`order_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
3、表统计信息：
           Name: order_goods
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 34783720
Avg_row_length: 176
    Data_length: 6127861760
Max_data_length: 0
   Index_length: 6069026816
      Data_free: 4194304
Auto_increment: NULL
    Create_time: 2014-12-25 09:20:49
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
Create_options:
        Comment:
4、表索引统计信息：

+-------------+------------+---------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name                  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+---------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| order_goods |          0 | PRIMARY                   |            1 | id          | A         |    34787722 |     NULL | NULL   |      | BTREE      |         |               | 
| order_goods |          1 | INDEX_CREATE_TIME         |            1 | create_time | A         |    17393861 |     NULL | NULL   | YES  | BTREE      |         |               | 
| order_goods |          1 | INDEX_GOODS_ID_TYPE_TIME  |            1 | goods_id    | A         |        8534 |     NULL | NULL   | YES  | BTREE      |         |               | 
| order_goods |          1 | INDEX_GOODS_ID_TYPE_TIME  |            2 | goods_type  | A         |        8536 |     NULL | NULL   | YES  | BTREE      |         |               | 
| order_goods |          1 | INDEX_GOODS_ID_TYPE_TIME  |            3 | create_time | A         |    34787722 |     NULL | NULL   | YES  | BTREE      |         |               | 
| order_goods |          1 | INDEX_TYPE_CTIME          |            1 | goods_type  | A         |        2485 |     NULL | NULL   | YES  | BTREE      |         |               | 
| order_goods |          1 | INDEX_TYPE_CTIME          |            2 | create_time | A         |    17393861 |     NULL | NULL   | YES  | BTREE      |         |               | 
| order_goods |          1 | IDX_GOODSTYPE_ORDERID_TOG |            1 | goods_type  | A         |        1486 |     NULL | NULL   | YES  | BTREE      |         |               | 
| order_goods |          1 | IDX_GOODSTYPE_ORDERID_TOG |            2 | order_id    | A         |    34787722 |     NULL | NULL   |      | BTREE      |         |               | 
| order_goods |          1 | index_order_id            |            1 | order_id    | A         |    34787722 |     NULL | NULL   |      | BTREE      |         |               | 
+-------------+------------+---------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

5、有3个不同的SQL执行计划，看着不是很明白，附件是相应的optimizer_trace，请帮忙看看什么问题

SQL1:
desc extended
SELECT goods_id,create_time,sum(goods_number),sum(goods_price)
FROM order_goods
where goods_type=0 and
create_time >= '2014-08-09 00:00:00' and
create_time <='2014-08-09 23:59:59'
group by goods_id\G

*************************** 1. row ***************************
           id: 1
select_type: SIMPLE
        table: order_goods
         type: ref
possible_keys: INDEX_CREATE_TIME,INDEX_GOODS_ID_TYPE_TIME,INDEX_TYPE_CTIME,IDX_GOODSTYPE_ORDERID_TOG
          key: INDEX_TYPE_CTIME
      key_len: 5
          ref: const
         rows: 22964
     filtered: 100.00
        Extra: Using index condition; Using where; Using temporary; Using filesort


SQL2:
desc extended
SELECT goods_id,create_time,sum(goods_number),sum(goods_price)
FROM order_goods force index (INDEX_TYPE_CTIME)
where goods_type=0 and
    create_time >= '2014-08-09 00:00:00' and
    create_time <='2014-08-09 23:59:59'
group by goods_id\G
*************************** 1. row ***************************
           id: 1
select_type: SIMPLE
        table: order_goods
         type: range
possible_keys: INDEX_GOODS_ID_TYPE_TIME,INDEX_TYPE_CTIME
          key: INDEX_TYPE_CTIME
      key_len: 10
          ref: NULL
         rows: 22964
     filtered: 100.00
        Extra: Using index condition; Using temporary; Using filesort

SQL3:
desc extended
SELECT goods_id,create_time,sum(goods_number),sum(goods_price)
FROM order_goods
where goods_type=0 and
    create_time >= '2014-08-10 00:00:00' and
    create_time <='2014-08-10 23:59:59'
group by goods_id\G
*************************** 1. row ***************************
           id: 1
select_type: SIMPLE
        table: order_goods
         type: range
possible_keys: INDEX_CREATE_TIME,INDEX_GOODS_ID_TYPE_TIME,INDEX_TYPE_CTIME,IDX_GOODSTYPE_ORDERID_TOG
          key: INDEX_TYPE_CTIME
      key_len: 10
          ref: NULL
         rows: 27750
     filtered: 100.00
        Extra: Using index condition; Using temporary; Using filesort

提出问题:

Q1：SQL2和SQL1相比，这里的type是 range，key_len = 10，同样都是选择了INDEX_TYPE_CTIME索引，为何2者的结果不一样。
Q2：同理，SQL3和SQL1相比，只是日期不一样，时间跨度都是1天，也都同样选择了INDEX_TYPE_CTIME索引，但type和key_len的结果也是不一样的。
Q3：SQL1中的INDEX_TYPE_CTIME只用了最左部分索引，而后面2个SQL都可以用到整个索引。

回答:

Q1：SQL2和SQL1相比，这里的type是 range，key_len = 10，同样都是选择了INDEX_TYPE_CTIME索引，为何2者的结果不一样。
A1：
1 SQL2强制使用索引，使得其他索引失去被判断优劣的机会
2 所以，SQL1有机会判断各个索引中的哪个更优。
这点，可以从order_goods-SQL1.json文件中的“analyzing_range_alternatives”看出，存在多个索引被判断。
这时，可以看到如下索引是最优的（"chosen": true，表示这个索引被选定。
但注意，此时选定的是INDEX_CREATE_TIME不是INDEX_TYPE_CTIME索引。如下）：
    "index": "INDEX_CREATE_TIME",\
    "ranges": [\
      "2014-08-09 00:00:00 <= create_time <= 2014-08-09 23:59:59"\
    ] /* ranges */,\
    "index_dives_for_eq_ranges": true,\
    "rowid_ordered": false,\
    "using_mrr": false,\
    "index_only": false,\
    "rows": 22950,\
    "cost": 27541,\
    "chosen": true\
3 察看order_goods-SQL2.json文件，可知在多表连接优化阶段（此例中单表是多表特例），推知最佳的索引如下：
    "considered_execution_plans": [\
      {\
        "plan_prefix": [\
        ] /* plan_prefix */,\
        "table": "`order_goods` FORCE INDEX (`INDEX_TYPE_CTIME`)",\
        "best_access_path": {\
          "considered_access_paths": [\
            {\
              "access_type": "ref",\
              "index": "INDEX_TYPE_CTIME",\
              "rows": 22964,\
              "cost": 27557,\
              "chosen": true\
            },\
从中可以看到，访问方式"access_type"是"ref"。
4 对于SQL2，观察order_goods-SQL2.json文件，可知其访问类型由"ref"变更为"range"，这是MySQL的一个优化点。
    "access_type_changed": {\
      "table": "`order_goods` FORCE INDEX (`INDEX_TYPE_CTIME`)",\
      "index": "INDEX_TYPE_CTIME",\
      "old_type": "ref",\
      "new_type": "range",\
      "cause": "uses_more_keyparts"\
    } /* access_type_changed */\
而观察order_goods-SQL1.json文件如下内容，可知其访问类型为"ref"，这点同SQL2（通过diff文件order_goods-SQL1.json和order_goods-SQL2.json更容易看出差别），只是SQL1不符合本条所谈到的优化点（原因参看第5条）
    "best_access_path": {\
      "considered_access_paths": [\
        {\
          "access_type": "ref",\
          "index": "INDEX_TYPE_CTIME",\
          "rows": 22964,\
          "cost": 27557,\
          "chosen": true\
5 MySQL认为在如下5个条件下，由此段调用best_access_path()->can_switch_from_ref_to_range()完成4条中所说的优化这个优化。
备注：以上best_access_path()->can_switch_from_ref_to_range()代码属于5.7.6版本。
      5.6.x版本中，如下注释属于make_join_select()函数。
/** can_switch_from_ref_to_range()
A helper function to check whether it's better to use range than ref.

@details
Heuristic: Switch from 'ref' to 'range' access if 'range'
access can utilize more keyparts than 'ref' access. Conditions
for doing switching:

1) Range access is possible    ---存在范围优化的可能
2) 'ref' access and 'range' access uses the same index
---SQL1没有符合这一点（查看order_goods-SQL1.json文件，可知"ref"对应的是INDEX_TYPE_CTIME索引，
---查阅{"considered_access_paths"}，注意判定的条件是："chosen": true，值不为true的表示没有被选定。
---"range" 的对应的是索引INDEX_CREATE_TIME，{查阅"potential_range_indices"}。）
---额外补充一点：查看order_goods-SQL1.json文件，没有"access_type_changed"这样的内容，这是因为此条件不符合。
3) Used parts of key shouldn't have nullable parts, i.e we're
     going to use 'ref' access, not ref_or_null.
4) 'ref' access depends on a constant, not a value read from a    ---这点可以说明单表排序这种情况下，ref为什么不适合使用了（原因就在于需要知道具体的常量才能通过索引快速定位元组，单表且条件中没有提供常量值与索引键比较，显然ref不适合使用）
     table earlier in the join sequence.

     Rationale: if 'ref' depends on a value from another table,
     the join condition is not used to limit the rows read by
     'range' access (that would require dynamic range - 'Range
     checked for each record'). In other words, if 'ref' depends
     on a value from another table, we have a query with
     conditions of the form

      this_table.idx_col1 = other_table.col AND   <<- used by 'ref'    ---SQL1和SQL2的条件部分，不符合这种形式，所以选range是合适的
      this_table.idx_col1 OP <const> AND          <<- used by 'range'
      this_table.idx_col2 OP <const> AND ...      <<- used by 'range'

     and an index on (idx_col1,idx_col2,...). But the fact that
     'range' access uses more keyparts does not mean that it is
     more selective than 'ref' access because these access types
     utilize different parts of the query condition. We
     therefore trust the cost based choice made by
     best_access_path() instead of forcing a heuristic choice
     here.
5) 'range' access uses more keyparts than 'ref' access     ---可以解释为什么”这里的type是 range，key_len = 10“ （这样利用索引获取数据的”准确性“更高，实际是：选择率更低因而用索引更有效）
*/

Q2：同理，SQL3和SQL1相比，只是日期不一样，时间跨度都是1天，也都同样选择了INDEX_TYPE_CTIME索引，但type和key_len的结果也是不一样的。
A2：
Q1种应该表明了这个问题。在此再整理一下。SQL1因2个阶段选定的索引不同，导致不象SQL3一样进行过
best_access_path()->can_switch_from_ref_to_range()优化，这点参见Q1第5条条件2。

Q3：SQL1中的INDEX_TYPE_CTIME只用了最左部分索引，而后面2个SQL都可以用到整个索引。
A3: 首先，类似A2回答方式，对于SQL3有下表：
----+----------------------------+----------------------------+----------------
SQL | 查阅项                     |值   "chosen": true         | 说明
----+----------------------------+----------------------------+----------------
SQL3| chosen_range_access_summary|"index": "INDEX_TYPE_CTIME" |<----+
----+----------------------------+----------------------------+     |<--索引相同
SQL3| considered_access_paths    |"access_type": "ref",       |     |
                                 |"index": "INDEX_TYPE_CTIME" |<----+
----+----------------------------+----------------------------+----------------
另外，为什么SQL3的如上表所得索引相同呢？这应该是基于代价的计算结果，SQL1和SQL3执行时，其数据量不同（总元组数不同+符合条件的记录数不同）。

其次：查阅项为 attached_conditions_summary 时：
----+----------------------------+----------------------------——----------------
SQL | 值 "chosen": true
----+----------------------------+---------------------------------------------
SQL1| "attached": "((`order_goods`.`create_time` >= '2014-08-09 00:00:00') and (`order_goods`.`create_time` <= '2014-08-09 23:59:59'))"\
----+----------------------------+----------------------------------------------
SQL3| "attached": "((`order_goods`.`goods_type` = 0) and (`order_goods`.`create_time` >= '2014-08-10 00:00:00') and (`order_goods`.`create_time` <= '2014-08-10 23:59:59'))"\
----+----------------------------+----------------------------+----------------
显然，SQL1此时比SQL3少了(`order_goods`.`goods_type` = 0)这个条件。所以其key_len的值为5而不是SQL3中的10。
另外，从order_goods-SQL3.json文件的attached_conditions_computation项看SQL3："cause": "uses_more_keyparts"，
这也说明是因为Q2中第五条的条件成立，使得从ref变为range的优化可行，所以可以使用索引中的更多的keypart。
这也是SLQL3中的key_len的值比SQL1的key_len大的原因。

Q：但是，为什么一个索引，不优先使用全部索引组成键，而是优先使用一个索引的一部分呢？比如SQL1中为什么使用create_time列而不使用goods_type？
A：理论上讲，使用索引的全部键比使用前缀部分更好。使用索引的前缀部分是一种退而求其次的方式。
所以，SQL1中的得到的ref访问方式下使用索引的一部分键实则不合适，算作是bug。

0 0