MySQL优化之SQL优化(一)
来源:互联网 发布:数据分析职业发展方向 编辑:程序博客网 时间:2024/05/16 08:17
最近在线上发现很多性能有问题的sql,开发写sql语句的时候,没充分考虑是否用上索引了,所以这个坑得DBA来填,好了,废话不多说,把一些线上的优化经验跟大家分享。
由于是线上的表,所以就不公开具体的表结构了,请大家体谅,我会模拟一个类似的表来说明当时的性能问题:
当时的表结构类似此表:
mysql> show create table test\G *************************** 1. row *************************** Table: testCreate Table: CREATE TABLE `test` ( `id` int(11) NOT NULL AUTO_INCREMENT, `aa_id` int(11) DEFAULT NULL, `dealername` varchar(45) DEFAULT NULL, `dealertype` int(2) DEFAULT NULL, `bb_id` int(11) NOT NULL, `membername` varchar(45) DEFAULT NULL, `createat` datetime DEFAULT NULL, `creator_id` int(11) DEFAULT NULL, `name` varchar(45) DEFAULT NULL, `comp_id` int(11) DEFAULT NULL, `companyname` varchar(45) DEFAULT NULL, `cc_id` int(11) DEFAULT NULL, `level_id` int(2) NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `shopmember_unique` (`aa_id`,`bb_id`,`cc_id`) USING BTREE) ENGINE=MyISAM AUTO_INCREMENT=301554 DEFAULT CHARSET=utf81 row in set (0.00 sec)mysql>
当时查看Lepus 的慢查询监控,看到大量的这类SQL语句,而且消耗时长有点长:
大量类似以下的SQL语句:
select aa_id,dealername,dealertype,membername from test where level_id <=4 order by aa_id limit 243000, 100;
下面我们看一下SQL语句的执行计划:
mysql> explain select aa_id,dealername,dealertype,membername from test where level_id <=4 order by aa_id limit 243000, 100; +----+-------------+-------+------+---------------+------+---------+------+--------+-----------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+---------------+------+---------+------+--------+-----------------------------+| 1 | SIMPLE | test | ALL | NULL | NULL | NULL | NULL | 301508 | Using where; Using filesort |+----+-------------+-------+------+---------------+------+---------+------+--------+-----------------------------+1 row in set (0.00 sec)
很多人一看表结构,发现在列level_id没索引,添加一个普通索引就完事啦!下面我们来试试:
mysql> alter table test add key (level_id);Query OK, 301508 rows affected (3.71 sec)Records: 301508 Duplicates: 0 Warnings: 0mysql> explain select * from test where level_id <=4 order by aa_id limit 243000, 100;+----+-------------+-------+-------+---------------+----------+---------+------+--------+-----------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+-------+---------------+----------+---------+------+--------+-----------------------------+| 1 | SIMPLE | test | range | level_id | level_id | 4 | NULL | 301393 | Using where; Using filesort |+----+-------------+-------+-------+---------------+----------+---------+------+--------+-----------------------------+1 row in set (0.00 sec)mysql>
添加索引后,可以看到用上索引了,但效果相对之前并没有很大的提升,还有些人可能会说,在level_id和aa_id添加组合索引,性能可能就好了,我们再来看下:
mysql> alter table test add key (level_id,aa_id); Query OK, 301508 rows affected (3.75 sec)Records: 301508 Duplicates: 0 Warnings: 0mysql> explain select * from test where level_id <=4 order by aa_id limit 243000, 100;+----+-------------+-------+-------+---------------+----------+---------+------+--------+-----------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+-------+---------------+----------+---------+------+--------+-----------------------------+| 1 | SIMPLE | test | range | level_id | level_id | 4 | NULL | 301218 | Using where; Using filesort |+----+-------------+-------+-------+---------------+----------+---------+------+--------+-----------------------------+1 row in set (0.00 sec)mysql>
可以看到,效果还是一样的差。为什么显示用上了索引,却还扫描了几十万行呢?
我们回顾一下不会用上索引的几种情况:(可以参考我的之前写的常用SQL语句优化)
• 两个表关联字段类型不一样(也包括长度不一样)
• 通过索引扫描的记录数超过30%,变成全表扫描
• 联合索引中,第一个索引列使用范围查询,且返回数据超表总数据的30%
• 联合索引中,第一个查询条件不是最左索引列
• 模糊查询条件列最左以通配符 % 开始
• 内存表(HEAP 表)使用HASH索引时,使用范围检索或者ORDER BY
• 两个独立索引,其中一个用于检索,一个用于排序(只能用到部份)
• 使用了不同的 ORDER BY 和 GROUP BY 表达式
上面的SQL语句,符合了上面的联合索引中,第一个索引使用范围查询所以用不上索引,我们直接查询看看用时为多少:
mysql> select * from test where level_id <=4 order by aa_id limit 243000, 100;100 rows in set (1.63 sec)
没用上索引,那我们应该怎么优化它呢?我们应该用延迟关联的思想,把sql语句修改为如下:
写法一:
mysql> reset query cache;Query OK, 0 rows affected (0.00 sec)mysql> explain SELECT a.* FROM test a,(select id from test where level_id <=4 order by aa_id limit 243000, 100) b where a.id=b.id ;+----+-------------+------------+--------+---------------+----------+---------+------+--------+-----------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+------------+--------+---------------+----------+---------+------+--------+-----------------------------+| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 100 | || 1 | PRIMARY | a | eq_ref | PRIMARY | PRIMARY | 4 | b.id | 1 | || 2 | DERIVED | test | range | level_id | level_id | 4 | NULL | 301218 | Using where; Using filesort |+----+-------------+------------+--------+---------------+----------+---------+------+--------+-----------------------------+3 rows in set (0.30 sec)mysql> SELECT a.* FROM test a,(select id from test where level_id <=4 order by aa_id limit 243000, 100) b where a.id=b.id ;100 rows in set (0.30 sec)
可以看到速度快了几倍,现在数据量只有几十万,如果几百万,效果会更明显,为什么这样写会比之前的效果好呢?因为延迟关联通过覆盖索引返回所需数据行的主键,再根据主键关联原表获得需要的数据,所以速度比之前快上不少。
写法二:
mysql> EXPLAIN SELECT * FROM `test` INNER JOIN ( SELECT id FROM `test` where level_id <=4 order by aa_id limit 243000, 100) t2 USING (id);+----+-------------+------------+--------+---------------+----------+---------+-------+--------+-----------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+------------+--------+---------------+----------+---------+-------+--------+-----------------------------+| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 100 | || 1 | PRIMARY | test | eq_ref | PRIMARY | PRIMARY | 4 | t2.id | 1 | || 2 | DERIVED | test | range | level_id | level_id | 4 | NULL | 301218 | Using where; Using filesort |+----+-------------+------------+--------+---------------+----------+---------+-------+--------+-----------------------------+3 rows in set (0.30 sec)mysql>
优化案例二:
表结构是:
mysql> show create table test2\G*************************** 1. row *************************** Table: test2Create Table: CREATE TABLE `test2` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(45) DEFAULT NULL, `code` varchar(32) DEFAULT NULL, `url` varchar(255) DEFAULT NULL, `status` int(2) DEFAULT '1', `createat` datetime DEFAULT NULL, `write_id` int(11) DEFAULT NULL, `creator_id` int(11) DEFAULT NULL, `dealer_id` int(11) DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `code_index` (`code`) USING BTREE, KEY `dealer_id` (`dealer_id`)) ENGINE=InnoDB AUTO_INCREMENT=7014142 DEFAULT CHARSET=utf81 row in set (0.00 sec)mysql>
slowlog里有大量这样的查询:
select count( id ) from `test2` where createat between '2015-05-26 00:00:00' and '2015-05-26 23:59:59' and status not in(7) and creator_id=8774 and write_id=925;
查看下执行计划:
mysql> explain select count( id ) from `test2` where createat between '2015-05-26 00:00:00' and '2015-05-26 23:59:59' and status not in(7) and creator_id=8774 and write_id=925;+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+| 1 | SIMPLE | test2 | ALL | NULL | NULL | NULL | NULL | 5135067 | Using where |+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+1 row in set (0.00 sec)mysql>
没有索引,做了全表扫描,有些开发人员创建表的时候考虑得不周到,导致频繁出现影响性能的sql,我们添加组合索引看看效果(这里要注意一下,在线上如果是5.6以下的版本,对于一些大数据的表,别直接添加索引,因为这个过程会阻塞DML操作的,如果添加索引需要的时间是几个小时或者更多,这是很悲剧的一件事情,个人经验,小数据的表发现没索引,或者索引设置的不合理,直接alter修改,大数据的表,就要用pt工具了。5.6版本的MySQL虽然支持了Online DDL,但也添加索引的时候,要考虑是否处于业务的高峰期,尽量选择业务量不繁忙的时候添加):
mysql> alter table test2 add key (createat,status);Query OK, 0 rows affected (1 min 36.27 sec)Records: 0 Duplicates: 0 Warnings: 0mysql> explain select count( id ) from `test2` where createat between '2015-05-26 00:00:00' and '2015-05-26 23:59:59' and status not in(7) and creator_id=8774 and write_id=925;+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+| 1 | SIMPLE | test2 | range | createat | createat | 14 | NULL | 1 | Using where; Using index |+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+1 row in set (0.00 sec)mysql>
可以看到已经用上索引了,Using index说明用上了覆盖索引,覆盖索引(只访问索引的查询,即查询只需要访问索引,而无须访问数据行,最简单的理解,比如翻开一本书,从目录页查找某些内容,但是目录就写的比较详细,我们在目录就找到了自己想看的内容)
而且返回在的行数,差距有多大,相信大家一眼就看到了。
例如:SELECT * FROM t WHERE col1 = ? AND col2 >= ? AND col3 = ?; 这时候,只能用到 idx 索引的最左2列进行检索,而col3条件则无法利用索引进行检索,所以上面只添加组合索引为前两列的值。
案例三
多表联接并且有排序时,排序字段必须是驱动表里的,否则排序列无法用到索引;
表os_diskio_history的表结构:
mysql> show create table os_diskio_history \G*************************** 1. row *************************** Table: os_diskio_historyCreate Table: CREATE TABLE `os_diskio_history` ( `id` int(10) NOT NULL AUTO_INCREMENT, `ip` varchar(50) NOT NULL, `tags` varchar(100) DEFAULT NULL, `fdisk` varchar(50) NOT NULL DEFAULT '0', `disk_io_reads` bigint(18) NOT NULL DEFAULT '0', `disk_io_writes` bigint(18) NOT NULL DEFAULT '0', `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `YmdHi` bigint(10) NOT NULL DEFAULT '0', PRIMARY KEY (`id`), KEY `idx_ymdhi` (`YmdHi`) USING BTREE, KEY `idx_ip_ymdhi` (`ip`,`YmdHi`), KEY `idx_io_reads` (`disk_io_reads`), KEY `idx_io_writes` (`disk_io_writes`)) ENGINE=InnoDB AUTO_INCREMENT=3550043 DEFAULT CHARSET=utf81 row in set (0.00 sec)
表os_diskio的表结构:
mysql> show create table os_diskio\G*************************** 1. row *************************** Table: os_diskioCreate Table: CREATE TABLE `os_diskio` ( `id` int(10) NOT NULL AUTO_INCREMENT, `ip` varchar(50) NOT NULL, `tags` varchar(100) DEFAULT NULL, `fdisk` varchar(50) NOT NULL DEFAULT '0', `disk_io_reads` bigint(18) NOT NULL DEFAULT '0', `disk_io_writes` bigint(18) NOT NULL DEFAULT '0', `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`)) ENGINE=InnoDB AUTO_INCREMENT=3550092 DEFAULT CHARSET=utf81 row in set (0.00 sec)mysql>
优化前的执行计划如下:
mysql> explain select a.* from os_diskio_history a inner join os_diskio b on a.id=b.id order by a.id desc limit 10;+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------+| 1 | SIMPLE | b | index | PRIMARY | PRIMARY | 4 | NULL | 19 | Using index; Using temporary; Using filesort || 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | lepus.b.id | 1 | |+----+-------------+-------+--------+---------------+---------+---------+------------+------+----------------------------------------------+2 rows in set (0.00 sec)mysql>
sql重构后,把连接方式改成把连接方式改成了「STRAIGHT_JOIN」的执行计划如下:
mysql> explain select a.* from os_diskio_history a straight_join os_diskio b on a.id=b.id order by a.id desc limit 10;+----+-------------+-------+--------+---------------+---------+---------+------------+------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+--------+---------------+---------+---------+------------+------+-------------+| 1 | SIMPLE | a | index | PRIMARY | PRIMARY | 4 | NULL | 10 | || 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | lepus.a.id | 1 | Using index |+----+-------------+-------+--------+---------------+---------+---------+------------+------+-------------+2 rows in set (0.00 sec)mysql>
可以看到性能是提升的,由于数据量不大,这里只能举个例子说明,嘻嘻,期待大家探讨
总结:一、定期采集slow query log,用pt-query-digest工具进行分析,可结合Anemometer系统进行slow query管理以便分析slow query并进行后续优化工作;也可以用开源的Lepus监控系统,可以让你清楚明了的发现那些sql执行用的时长,执行的次数等
二、当发现表没有适当的索引时,要添加索引的时候,要考虑数据里的大小及业务的繁忙程度再添加
三、给开发适当的MySQL培训,让开发慢慢改善,毕竟DBA对业务逻辑这块没开发熟悉
参考资料:http://blog.itpub.net/22664653/viewspace-1176153/
http://imysql.com/2014/07/26/mysql-optimization-case-paging-optimize.shtml
http://t.cn/R2IcTMi
作者:陆炫志
出处:xuanzhi的博客 http://www.cnblogs.com/xuanzhi201111
您的支持是对博主最大的鼓励,感谢您的认真阅读。本文版权归作者所有,欢迎转载,但请保留该声明。
- MySQL优化之SQL优化(一)
- mysql之sql优化
- MySQL之SQL优化
- mysql之sql优化
- Mysql 千万以上数据优化方法(一,SQL优化),月薪30K之路系列
- Mysql的一条SQL优化(一)
- Mysql优化(一)—Sql语句
- Mysql 千万以上数据优化方法(一,SQL优化)
- MySQL 数据库性能优化之(四)SQL优化
- mysql优化之二(SQL语句优化)
- 性能优化之MySQL优化(二)- explain分析SQL
- 【Day62】MySQL数据库性能优化之四(SQL优化)
- MySQL 数据库性能优化之SQL优化
- MySQL 数据库性能优化之SQL优化
- MySQL 数据库性能优化之SQL优化
- MySQL 数据库性能优化之SQL优化
- MySQL 数据库性能优化之SQL优化
- MySQL 数据库性能优化之SQL优化
- python学习:运算符
- 基于二维数组的六边形地图数据结构的实现(蜂窝拓补)
- Failed to execute 'importScripts' on 'WorkerGlobalScope'
- 【XSY1536】【BZOJ3522】【BZOJ4543】【POI2014】Hotel 树形DP 长链剖分 启发式合并
- mybatis之用户订单商品数据模型分析(九)
- MySQL优化之SQL优化(一)
- VBA处理文件框架代码 【第六部分:目录结构,框架功能介绍】
- 类的加载
- 命令模式
- 从太原马拉松未中签聊起
- html中的元数据和全局属性
- golang 各版本下载地址
- lintcode --寻找缺失的数
- C之有趣-数组中不改变次序的排名