MYSQL 索引以及优化

MYSQL 索引的实用案例:
假设数据库中一个表有10^6条记录,DBMS的页面大小为4K,并存储100条记录。如果没有索引,查询将对整个表进行扫描,最坏的情况下,如果所有数据页都不在内存,需要读取10^4个页面,如果这10^4个页面在磁盘上随机分布,需要进行10^4次I/O,假设磁盘每次I/O时间为10ms(忽略数据传输时间),则总共需要100s(但实际上要好很多很多)。如果对之建立B-Tree索引,则只需要进行log100(10^6)=3次页面读取,最坏情况下耗时30ms。这就是索引带来的效果,很多时候,当你的应用程序进行SQL查询速度很慢时,应该想想是否可以建索引

数据库表,有以下索引

  KEY `s_a_c` (`status`,`answer_cnt`,`cid2`,`cid3`),
  KEY `s_c_c` (`status`,`coins`,`cid2`,`cid3`),
  KEY `s_c` (`status`,`cid2`,`cid3`),
  KEY `idx_status_add` (`status`,`add_coins`,`user_update_time`),
  KEY `o_t` (`top_time`),
  KEY `s_o` (`status`,`top_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

一、实验背景list_10这个表中有9百万

mysql> select count(1) from list_10 \G
*************************** 1. row ***************************
count(1): 9185382
1 row in set (1.39 sec)

二、测试索引字段的顺序对查询结果的影响

2.1

mysql> explain select * from list_10 where status=10 and cid2=0 and cid3=0 ;
+----+-------------+---------+------+------------------------------------+------+---------+-------------------+-------+-------+
| id | select_type | table   | type | possible_keys                      | key  | key_len | ref               | rows  | Extra |
+----+-------------+---------+------+------------------------------------+------+---------+-------------------+-------+-------+
|  1 | SIMPLE      | list_10 | ref  | s_a_c,s_c_c,s_c,idx_status_add,s_o | s_c  | 5       | const,const,const | 21254 |       |
+----+-------------+---------+------+------------------------------------+------+---------+-------------------+-------+-------+

2.2

mysql> explain select * from list_10 where cid2=0 and cid3=0 and status=10 ;
+----+-------------+---------+------+------------------------------------+------+---------+-------------------+-------+-------+
| id | select_type | table   | type | possible_keys                      | key  | key_len | ref               | rows  | Extra |
+----+-------------+---------+------+------------------------------------+------+---------+-------------------+-------+-------+
|  1 | SIMPLE      | list_10 | ref  | s_a_c,s_c_c,s_c,idx_status_add,s_o | s_c  | 5       | const,const,const | 21254 |       |
+----+-------------+---------+------+------------------------------------+------+---------+-------------------+-------+-------+

2.1 和 2.2 故意把索引字段的顺序做了改变,发现结果type都是ref,ref 都是 const,结论索引字段顺序不会影响查询效果

三、测试缺失组合索引的字段来观察影响

3.1结论索引缺失后并没有影响索引的使用ref列可以看到,但是组合索引首列缺失是无法用到的

mysql> explain select * from list_10 where cid2=0 and status=10;
+----+-------------+---------+------+------------------------------------+------+---------+-------------+-------+-------+
| id | select_type | table   | type | possible_keys                      | key  | key_len | ref         | rows  | Extra |
+----+-------------+---------+------+------------------------------------+------+---------+-------------+-------+-------+
|  1 | SIMPLE      | list_10 | ref  | s_a_c,s_c_c,s_c,idx_status_add,s_o | s_c  | 3       | const,const | 21260 |       |
+----+-------------+---------+------+------------------------------------+------+---------+-------------+-------+-------+

首列缺失

mysql> explain select * from list_10 where cid2=0 and cid3=0;
+----+-------------+---------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+---------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | list_10 | ALL  | NULL          | NULL | NULL    | NULL | 10589505 | Using where |
+----+-------------+---------+------+---------------+------+---------+------+----------+-------------+

3.2 以下两个图可以看出在组合索引中只要有字段采用范围符号,显示用到索引,但是实际上用不到索引列

mysql> explain select * from list_10 where cid2>0 and status=10 ;
+----+-------------+---------+-------+------------------------------------+------+---------+------+---------+-------------+
| id | select_type | table   | type  | possible_keys                      | key  | key_len | ref  | rows    | Extra       |
+----+-------------+---------+-------+------------------------------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | list_10 | range | s_a_c,s_c_c,s_c,idx_status_add,s_o | s_c  | 3       | NULL | 1366888 | Using where |
+----+-------------+---------+-------+------------------------------------+------+---------+------+---------+-------------+

mysql> explain select * from list_10 where status =10 and cid2>0;
+----+-------------+---------+-------+------------------------------------+------+---------+------+---------+-------------+
| id | select_type | table   | type  | possible_keys                      | key  | key_len | ref  | rows    | Extra       |
+----+-------------+---------+-------+------------------------------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | list_10 | range | s_a_c,s_c_c,s_c,idx_status_add,s_o | s_c  | 3       | NULL | 1366888 | Using where |
+----+-------------+---------+-------+------------------------------------+------+---------+------+---------+-------------+

四:在组合索引外增加个order by 主键索引和无索引字段. 真是发现无索引的字段毁所有啊. 当然用了 force index 强制一下效果能好多了.

mysql> explain select * from list_10 where cid2=0 and cid3=0 and status=10 order by ask_id desc limit 5;
+----+-------------+---------+------+------------------------------------+------+---------+-------------------+-------+-------------+
| id | select_type | table   | type | possible_keys                      | key  | key_len | ref               | rows  | Extra       |
+----+-------------+---------+------+------------------------------------+------+---------+-------------------+-------+-------------+
|  1 | SIMPLE      | list_10 | ref  | s_a_c,s_c_c,s_c,idx_status_add,s_o | s_c  | 5       | const,const,const | 21262 | Using where |
+----+-------------+---------+------+------------------------------------+------+---------+-------------------+-------+-------------+
1 row in set (0.00 sec)

mysql> 
mysql> explain select * from list_10 where cid2=0 and cid3=0 and status=10 order by top_time desc limit 5;
+----+-------------+---------+-------+------------------------------------+------+---------+------+------+-------------+
| id | select_type | table   | type  | possible_keys                      | key  | key_len | ref  | rows | Extra       |
+----+-------------+---------+-------+------------------------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | list_10 | index | s_a_c,s_c_c,s_c,idx_status_add,s_o | o_t  | 4       | NULL | 2203 | Using where |
+----+-------------+---------+-------+------------------------------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from list_10 FORCE INDEX (s_c) where cid2=0 and cid3=0 and status=10 order by top_time desc limit 5;
+----+-------------+---------+------+---------------+------+---------+-------------------+-------+-----------------------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref               | rows  | Extra                       |
+----+-------------+---------+------+---------------+------+---------+-------------------+-------+-----------------------------+
|  1 | SIMPLE      | list_10 | ref  | s_c           | s_c  | 5       | const,const,const | 21262 | Using where; Using filesort |
+----+-------------+---------+------+---------------+------+---------+-------------------+-------+-----------------------------+

EXPLAIN列的解释:

table:显示这一行的数据是关于哪张表的

type:这是重要的列,显示连接使用了何种类型。从最好到最差的连接类型为const、eq_reg、ref、range、indexhe和ALL

possible_keys:显示可能应用在这张表中的索引。如果为空,没有可能的索引。可以为相关的域从WHERE语句中选择一个合适的语句

key: 实际使用的索引。如果为NULL,则没有使用索引。很少的情况下,MYSQL会选择优化不足的索引。这种情况下,可以在SELECT语句中使用USE INDEX(indexname)来强制使用一个索引或者用IGNORE INDEX(indexname)来强制MYSQL忽略索引

key_len:使用的索引的长度。在不损失精确性的情况下,长度越短越好

ref:显示索引的哪一列被使用了,如果可能的话,是一个常数

rows:MYSQL认为必须检查的用来返回请求数据的行数

Extra:这里可以看到的坏的例子是Using temporary和Using filesort,意思MYSQL根本不能使用索引,结果是检索会很慢
(1)using where 使用了where子句来过滤元组
(2)using temporary 用临时表保存中间结果,常用于GROUP BY 和 ORDER BY操作中
(3)using filesort 使用文件完成排序操作,这是可能是ordery by,group by语句的结果,这可能是一个CPU密集型的过程,可以通过选择合适的索引来改进性能,用索引来为查询结果排序。
(4)using index 不需要读取数据文件,从索引树(索引文件)中即可获得信息。如果同时出现using where,表明索引被用来执行索引键值的查找,没有using where,表明索引用来读取数据而非执行查找动作。
(5)using join buffer使用了连接缓存:Block Nested Loop,连接算法是块嵌套循环连接
(6)impossible where where子句的值总是false,不能用来获取任何元组
(7)select tables optimized away在没有GROUP BY子句的情况下,基于索引优化MIN/MAX操作,或者对于MyISAM存储引擎优化COUNT(*)操作,不必等到执行阶段再进行计算,查询执行计划生成的阶段即完成优化。
(8)distinct优化distinct操作,在找到第一匹配的元组后即停止找同样值的动作