2017-04-05

MySQL查询优化技术之非SPJ优化

摘要：结合实例探讨mysql查询优化技术中的非SPJ优化

SPJ：Select（查询）、Project（投影）、Join（连接）
非SPJ：SPJ+[Group By、Order By、Distinct、Limit等]

Group By的优化

创建测试数据

CREATE TABLE t_god (a INT, b INT, c INT, d INT, e INT);
CREATE INDEX t_god_idx_1 ON t_god (a,b,c);
CREATE INDEX t_god_idx_2 ON t_god (d);

1、在索引列上执行GROUP BY，支持GROUP BY优化（没有使用“Using file sort”类似的操作进行排序）：

root@testdb 10:44:23>EXPLAIN EXTENDED SELECT a FROM t_god GROUP BY a;
+----+-------------+-------+-------+---------------+-------------+---------+------+------+----------+-------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t_god | index | t_god_idx_1   | t_god_idx_1 | 15      | NULL |    1 |   100.00 | Using index |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

2、在索引列上执行ORDERBY，MySQL支持ORDERBY优化：

root@testdb 10:44:33>EXPLAIN EXTENDED SELECT a FROM t_god ORDER BY a;     
+----+-------------+-------+-------+---------------+-------------+---------+------+------+----------+-------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t_god | index | NULL          | t_god_idx_1 | 15      | NULL |    1 |   100.00 | Using index |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

3、在索引列上执行ORDERBY、GROUPBY，MySQL支持ORDERBY优化也支持GROUPBY优化：

root@testdb 11:08:54>EXPLAIN EXTENDED SELECT a FROM t_god GROUP BY a ORDER BY a;
+----+-------------+-------+-------+---------------+-------------+---------+------+------+----------+-------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t_god | index | t_god_idx_1   | t_god_idx_1 | 15      | NULL |    1 |   100.00 | Using index |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

4、在索引列上执行带有聚集操作的GROUPBY，MySQL支持GROUPBY优化：

root@testdb 11:10:14>EXPLAIN SELECT a, MIN(b) FROM t_god WHERE c>2 GROUP BY a;
+----+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t_god | index | t_god_idx_1   | t_god_idx_1 | 15      | NULL |    1 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

Order By的优化

创建测试数据

CREATE TABLE t_o1 (a1 INT UNIQUE, b1 INT);
CREATE TABLE t_o2 (a2 INT UNIQUE, b2 INT);

1、在索引列上进行排序操作， MySQL支持利用索引进行排序优化：

root@testdb 11:13:04>EXPLAIN SELECT a1 FROM t_o1 ORDER BY a1;
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t_o1  | index | NULL          | a1   | 5       | NULL |    1 | Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

2、排序下推，MySQL不支持。在非索引列上执行连接，然后排序：

root@testdb 11:16:32>EXPLAIN EXTENDED SELECT * FROM t_o1, t_o2 WHERE b1=b2 ORDER BY b1;
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | t_o1  | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using temporary; Using filesort                    |
|  1 | SIMPLE      | t_o2  | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
可以看到执行计划中使用了“Using file sort”临时表操作进行排序

group by、order by对同一个Btree索引的前缀部分可以用到索引进行优化

Distinct的优化

1、MySQL支持对于DISTINCT消除的优化技术。在主键的a1列上执行DISTINCT是可以去除distinct的，查询执行计划如下：

root@testdb 11:36:32>explain select distinct a1 from t1;
+----+-------------+-------+-------+----------------+--------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys  | key    | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+----------------+--------+---------+------+------+-------------+
|  1 | SIMPLE      | t1    | index | PRIMARY,idx_b1 | idx_b1 | 21      | NULL | 9977 | Using index |
+----+-------------+-------+-------+----------------+--------+---------+------+------+-------------+
1 row in set (0.00 sec)

2、MySQL不支持对于DISTINCT推入的优化技术。
a2列是唯一列，又处于反半连接的语义（NOT EXISTS），完全可以把DISTINCT下推到表t_o2中先执行，然后再执行反半连接操作：

root@testdb 11:39:21>EXPLAIN EXTENDED SELECT DISTINCT b1 FROM t_o1 WHERE NOT EXISTS (SELECT 1 FROM t_o2 WHERE b1=a2);
+----+--------------------+-------+------+---------------+------+---------+----------------+------+----------+------------------------------+
| id | select_type        | table | type | possible_keys | key  | key_len | ref            | rows | filtered | Extra                        |
+----+--------------------+-------+------+---------------+------+---------+----------------+------+----------+------------------------------+
|  1 | PRIMARY            | t_o1  | ALL  | NULL          | NULL | NULL    | NULL           |    1 |   100.00 | Using where; Using temporary |
|  2 | DEPENDENT SUBQUERY | t_o2  | ref  | a2            | a2   | 5       | testdb.t_o1.b1 |    1 |   100.00 | Using index                  |
+----+--------------------+-------+------+---------------+------+---------+----------------+------+----------+------------------------------+
2 rows in set, 2 warnings (0.00 sec)
通过优化器处理后的执行计划可以看出，distinct关键字仍然存在：
root@testdb 11:39:33>show warnings;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1276 | Field or reference 'testdb.t_o1.b1' of SELECT #2 was resolved in SELECT #1                                                                                                                             |
| Note  | 1003 | /* select#1 */ select distinct `testdb`.`t_o1`.`b1` AS `b1` from `testdb`.`t_o1` where (not(exists(/* select#2 */ select 1 from `testdb`.`t_o2` where (`testdb`.`t_o1`.`b1` = `testdb`.`t_o2`.`a2`)))) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

LIMIT的优化

1、LIMIT对单表扫描的影响：如果索引扫描可用且花费低于全表扫描，则用索引扫描实现LIMIT（LIMIT取很少量的行，否则优化器更倾向于使用全表扫描）。
2、LIMIT对排序的影响：如果LIMIT和ORDERBY子句协同使用，当取到LIMIT设定个数的有序元组数后，后续的排序操作将不再进行。
3、LIMIT对去重的影响：如果LIMIT和DISTINCT子句协同使用，当取到LIMIT设定个数的唯一的元组数后，后续的去重操作将不再进行。
4、LIMIT受分组的影响：如果LIMIT和GROUPBY子句协同使用，GROUPBY按索引有序计算每个组的总数的过程中，LIMIT操作不必计数直到下一个分组开始计算。
5、LIMIT 0：直接返回空结果集。
6、MySQL支持对不带HAVING子句的LIMIT进行优化。

SET的优化

与集合操作相关的优化：ORDER BY子句去除
union本身就是无序的合并结果集，在子句中加order by操作是无意义的，因此从下面执行计划中可以看出是没有排序操作的：

root@testdb 01:39:50>explain extended (select * from t1 where b1 < 20 order by b1) union (select * from t2 where b2 < 20 order by b2);
+----+--------------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------+
| id | select_type  | table      | type  | possible_keys | key    | key_len | ref  | rows | filtered | Extra                    |
+----+--------------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------+
|  1 | PRIMARY      | t1         | range | idx_b1        | idx_b1 | 5       | NULL |   18 |   100.00 | Using where; Using index |
|  2 | UNION        | t2         | ALL   | NULL          | NULL   | NULL    | NULL | 9999 |   100.00 | Using where              |
| NULL | UNION RESULT | <union1,2> | ALL   | NULL          | NULL   | NULL    | NULL | NULL |     NULL | Using temporary          |
+----+--------------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------+
3 rows in set, 1 warning (0.01 sec)

如果在order by后面加上limit就无法去除order by子句了。从下面执行计划可以看出存在排序操作（Using filesort）。

root@testdb 01:40:53>explain extended (select * from t1 where b1 < 20 order by b1 limit 5) union (select * from t2 where b2 < 20 order by b2 limit 5);
+----+--------------+------------+-------+---------------+--------+---------+------+------+----------+-----------------------------+
| id | select_type  | table      | type  | possible_keys | key    | key_len | ref  | rows | filtered | Extra                       |
+----+--------------+------------+-------+---------------+--------+---------+------+------+----------+-----------------------------+
|  1 | PRIMARY      | t1         | range | idx_b1        | idx_b1 | 5       | NULL |   18 |   100.00 | Using where; Using index    |
|  2 | UNION        | t2         | ALL   | NULL          | NULL   | NULL    | NULL | 9999 |   100.00 | Using where; Using filesort |
| NULL | UNION RESULT | <union1,2> | ALL   | NULL          | NULL   | NULL    | NULL | NULL |     NULL | Using temporary             |
+----+--------------+------------+-------+---------------+--------+---------+------+------+----------+-----------------------------+
3 rows in set, 1 warning (0.01 sec)

其他优化:

1、SELECT DISTINCT a FROM t1 LIMIT 1;
2、SELECT DISTICT MAX(a) FROM t1;

本文标题:MySQL查询优化技术之非SPJ优化

文章作者:一条大河

发布时间:2017-04-05, 10:35:51

最后更新:2017-04-05, 14:28:08

原始链接:http://riverdba.github.io/2017/04/05/Non-SPJ-Optimization-of-mysql/

许可协议: "署名-非商用-相同方式共享 4.0" 转载请保留原文链接及作者。