首页 mysql性能优化器使用指南
文章
取消

mysql性能优化器使用指南

关于本文

本文由The Unofficial MySQL 8.0 Optimizer Guide翻译得来,虽然原文标题为《MYSQL 8.0非官方性能优化器指南》,但实际上大量内容可以运用在MySQL 5.6MySQL 5.7上。所以我并不打算按原文一字不动翻译,会做部分修改,并添加一些本人的见解,添加一些更细致的说明。

虽然本文说是“非官方”,但是作者却是MySQL项目经理,而且得到了MySQL性能优化器团队的支持,可以认为依然属于官方的文档,只是没有放在MySQL官方文档站点上。

阅读本文需要基本的MySQL的知识,可以查阅官方文档。同时建议阅读《高性能MYSQL》,本文可以说是这本书的补充。

本文数据库基于InnoDB引擎。

本文中diff代码块不代表增删,仅仅用于颜色标记。

顺便提一下,MySQL优化不是万能的,特别是包含count()+group by之类的sql。MySQL是OLTP(联机事务处理)数据库,对于OLAP(联机分析处理)的需求,建议从业务角度解决,或者使用其他数据库。

欢迎

关于作者

Morgan Tocker是Oracle的MySQL服务器产品经理。他之前曾担任过各种角色,包括支持、培训和社区。摩根总部位于加拿大多伦多。联系原作者twitter @morgo

致谢

  • 感谢MySQL Optimizer团队提供服务器架构、成本模型和查询优化的图表。
  • 感谢Chrissy Cutting的复制编辑。

本文使用的数据库

示例基于MySQL官网的world sample数据库,它可以从http://dev.MySQL.com/doc/index-other.html下载。进入页面后,下载Example Databases中的world database库导入到本地数据库中,即可同步实验。由于本文的sql语句表名首字母都是大写,所以最后需要重命名表名。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
liuxu@liuxu:~ $ ls -lh world-db.tar.gz
-rw-r--r-- 1 vagrant vagrant 91K Feb 15 04:28 world-db.tar.gz
liuxu@liuxu:~ $ tar -xvf world-db.tar.gz
liuxu@liuxu:~ $ cd world-db
liuxu@liuxu:~ $ MySQL -u你的数据库用户名 -p你的数据库密码 < world.sql
liuxu@liuxu:~ $ MySQL -u你的数据库用户名 -p你的数据库密码
MySQL> show databases;
+---------------------+
| Database            |
+---------------------+
| information_schema  |
| lost+found          |
| MySQL               |
| performance_schema  |
| sys                 |
| world               |
+---------------------+
17 rows in set (0.00 sec)

MySQL> use world;
Database changed

MySQL> show tables;
+-----------------+
| Tables_in_world |
+-----------------+
| city            |
| country         |
| countrylanguage |
+-----------------+
3 rows in set (0.00 sec)

mysql> rename table city to City;
Query OK, 0 rows affected (0.01 sec)

mysql> rename table country to Country;
Query OK, 0 rows affected (0.01 sec)

mysql> rename table countrylanguage to CountryLanguage;
Query OK, 0 rows affected (0.00 sec)

介绍

查询优化器将查询(queries)作为输入,并通过本章中描述的过程产生一个执行计划(execution plan)作为输出。我喜欢将查询优化描述为类似于GPS导航:

  1. 你输入一个地址作为目的地:
    • 大街3294号
  2. 它告诉你如何以最有效的方式到达那里:
    • 继续直行2英里
    • 在万锦街左转
    • 继续500英尺
    • 右转
    • 继续1000英尺
    • 你的目的地在右边

地址是目的地。你没有指定如何到达那里,但你希望导航系统评估潜在路线并建议你最有效的路线。

SQL查询类似于地址,因为SQL语言是声明性的。它传达了最终状态,而不是如何到达那里的过程。同样,由于一个数据库系统会有很多索引(并且可以join更多表),所以也有很多路由可以达到相同的结果。

作为这个类比的最终用途,就像GPS导航并不总是将你引导到绝对最快的路线上一样,查询优化也是如此。正如并非总是有所有街道的交通数据一样,优化器只能在不完整的模型下工作。所以对于有经验的操作员,这可能会需要重写和调整查询语句的情况。

服务器架构

在高层次上,MySQL服务器由两个不同的部分组成:服务器层和存储引擎层。查询优化发生在存储引擎API之上的服务器层,并且在语义上分为四个阶段:

  • 逻辑转换(Logical Transformations)
  • 准备开始基于成本的优化(Preparation for Cost-based Optimization)
  • 基于成本的优化(Cost-based Optimization)
  • 计划细化(Plan Refinement)

将优化器划分为以上定义的多层是长期发展过来的。在MySQL的早期,优化器的查询优化较少,出于性能原因,查询优化的阶段没有很好地分离或定义。

由于新特性降低了可维护性,这就产生了重构的需求,重构已经在MySQL的多个版本(5.6、5.7和8.0)上实现。将执行分成多个阶段也为新功能铺平了道路。例如,如果没有明确的分离,计划缓存(plan caching)将难以实现,因为连接之间的重用变得困难。

B+树索引

虽然还有其他可用的索引类型,但添加索引通常意味着添加B+树索引,无论它是主键、唯一索引还是普通索引。通过很好地理解B+树,你不仅可以提高单个查询的性能,还可以减少必须在内存中的数据(工作集)。这反过来又有助于提高数据库的整体可扩展性。

为了解释B+树的工作原理,我想先解释二叉树的工作原理,然后描述它们的不同之处:

二叉树的主要特性是它具有二叉搜索(binary search)。这意味着要在有100万个条目的(平衡)二叉树中查找829813这个条目,应该不会超过20跳。与扫描整个列表中的100万个条目相比,这是一个相当大的改进:

但我还是想说二叉树是有不足的,如果它进行20次磁盘访问,那么20跳实际上是一个很高的数字。即使每次访问磁盘仅为了获取一行极少的数据,这些IO导致的资源消耗依然很昂贵。读取4K和500B的成本通常相似,B+树有助于分摊IO成本(即使对于SSD也是如此)。

之所以成本相似,是因为我们的文件系统创建时(分区时)通常会将block大小设置为4K。而不管是4K还是小于4K,操作系统读取数据时都会一次读取4K数据并缓存到内存中,为的是避免多次磁盘IO影响性能。如果你的系统有大量小于4K的文件,例如几百万500B的文件,可以在创建文件系统时选择更小的block,以避免磁盘浪费(每一个文件,至少占用1个block)。

相比之下,这是在仅使用2跳的B+tree中进行的相同搜索:

显示来自使用jeremycole/innodb_ruby分析的InnoDB B+tree的实际值

上图显示根索引页(root index page)指向25个不同的索引页(index page),这些索引页本身均匀的指向1538个叶页(leaf page)。查找行829813的路径做了高亮显示:

  • 根页面(root page)开始查找:800788 <= 值 < 829908在页面16386
  • 从页面16386查找:829804 <= 值 <829830 在叶子页面32012
  • 最后在叶子页面32012找到行829813

所以想对于二叉树来说,我们在B+树有2个特别的增强功能:

  1. 将数据组织成页面。这是为从存储中读取和写入而组织在一起的基本数据单元。这种组织方式在数据库术语中称为集群(不要与数据库集群混淆)。
  2. 树本身宽而不深。当每个索引页可以容纳1000+个键时,而每个键又可以指向另一个具有另外1000个键的索引页,在叶子页面之前有一个结构深度超过3层的B+树并不常见(而且InnoDB数据库主键包含整行数据)。另一种表述方式是B+树比二叉树能更快的定位获取数据。

将根页面加上25个内部索引页面保留在内存中只需要400KiB的内存(每个页面为16KiB)。38463个叶子页面将需要600MiB的存储空间,但不一定必须同时加载全部存储到内存中。大多数数据集都只有一部分热点页面,其他数据的基本不会使用到。InnoDB在内部使用修改后的LRU算法跟踪页面访问,如果需要腾出空间,将驱逐不常访问的页面。

通常,总数据大小可能远远超过内存,我们将需要保存在内存中的数据部分称为工作集。每个数据集都有不同的工作集要求,数据库管理员的目标是找到可以减少它的优化。

一个有300GB工作集的500GB数据库,优化难度比只有100GB工作集的1TB数据库更难优化,优化难度往往和工作集大小关系更密切,工作集越大越难优化。

Explain

EXPLAIN描述MySQL计划如何执行给定的查询。这就是说它并不真正执行查询,只是给出MySQL优化器的结果。不要与profiling混淆,profiling是真正执行查询后给出的查询分析数据。

EXPLAIN在评估了数千种可以执行的查询计划后,打印出它认为的最佳计划。还有一个关于其他计划选择的简要介绍possible_keys,但一般来说,你需要从OPTIMIZER_TRACE中查看这些信息。

我建议始终使用EXPLAIN FORMAT=JSON,因为它提供查询成本(默认格式不提供)。成本(cost)很重要,因为它使我们从数量上思考和说话。我是说我听过数据库从业者说“创建临时表(temporary tables)不好”或“join不好”。很难直接回应这些评论,因为它们缺乏重要的背景。过量饮水对你也有害,不谈数据量的优化是个伪命题,只有几十行的表不需要优化。

除了在sql语句前加上EXPLAIN FORMAT=JSON之外,你还可以执行EXPLAIN FORMAT=JSON FOR CONNECTION <connection_id>,这就可以对正在运行的连接查看它是如何对查询进行优化的。这对于诊断瞬时错误很有用,因为数据(和基础统计信息)的更改会影响计划选择。

示例1:Explain显示表扫描操作

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
EXPLAIN FORMAT=JSON SELECT
* FROM Country WHERE continent='Asia' and population > 5000000;
{
  "query_block": {
   "select_id": 1,
   "cost_info": {
+  "query_cost": "53.80"            # 这个sql的成本是53.8
   },
   "table": {
   "table_name": "Country",
+  "access_type": "ALL",            # 访问类型是ALL,表示表扫描
+  "rows_examined_per_scan": 239,   # 访问的表一共239行
   "rows_produced_per_join": 11,
   "filtered": "4.76",
+  "cost_info": {                   # 这里详细说明成本消耗的细节
+     "read_cost": "51.52",         # 读成本51.52
+     "eval_cost": "2.28",          # 评估成本2.28,优化器消耗掉的成本,如果这里消耗太高可以用Hint提示
+     "prefix_cost": "53.80",       # 这个字段在这个例子其实没用,它用在join语句后面子句记录前面子句的成本,由于没有join,此例记录了整个查询的成本
+     "data_read_per_join": "2K"    # 每个join读取的数据量,由于没join,在此例是总评估需要读取的数据量
   },
   "used_columns": [
      "Code",
      "Name",
      "Continent",
      "Region",
      "SurfaceArea",
      "IndepYear",
      "Population",
      "LifeExpectancy",
      "GNP",
      "GNPOld",
      "LocalName",
      "GovernmentForm",
      "HeadOfState",
      "Capital",
      "Code2"
   ],
+  "attached_condition": "((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 5000000))"
   }
  }
}

示例1中EXPLAIN显示查询将作为表扫描(access_type ALL)执行,成本为53.80个成本单位。没有列出可用的索引,因为缺少possible_keys输出部分(它通常出现在access_type下面)。

rows_examined_per_scan计数等于表中的行数(239),这大致说明了世界上有多少个国家。需要指出的是,这些只是估计值,不需要100%准确,因为这样做可能会影响性能。

attach_condition意味着在读取行时将应用过滤器:((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 5000000))。要是有索引的话,attached_conditions中的附加条件就可能在读取行之前生效,过滤不需要的行。但遗憾的是,这里列出意味着情况并非如此。

我喜欢将rows_examined_per_scan与执行查询后发送给客户端的行数进行比较。

发送给客户端的行数也可在表performance_schema.events_statements_history_long中找到。

只有真正执行了查询才能获取到行数,我执行后看到有32行。这意味着该查询检查的行数比发送的行数多7.5倍(239/32约等于7.5),这表明这个查询语句应该可以优化。

一个非常小的比率(即使是1:1)并不表示优化是不可能的。非要优化的话,仍是有机会的,例如分区或更改页面大小。

提醒一下,总共239行数据不算多,所有行都可以加载到内存里。但是表扫描只在内存中很快,随着表越来越大,性能可能会断崖式下降。添加索引有助于提高性能,并随着数据的增长保持一致。

优化器跟踪(Optimizer Trace)

EXPLAIN仅显示查询的预期执行计划,而没有显示为什么没有选择其他可选的执行策略。理解为什么没有选择可选方案可能会令人困惑:

  • 是因为其他方案不合适吗(例如,一些优化只能应用于特定的用例)?
  • 是因为人们认为其他方案成本更高吗?
  • 如果另一种选择是更高的成本,是多少?

OPTIMIZER_TRACE为这些问题提供了答案。它的设计目的是提供更多关于优化器的诊断数据,但它对于实际故障排除以及了解优化器成本模型的工作方式非常有用。

示例2:EXPLAIN显示未使用新添加的索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
ALTER TABLE Country ADD INDEX p (population);
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE continent='Asia' and population > 5000000;
{
  "query_block": {
   "select_id": 1,
   "cost_info": {
   "query_cost": "53.80"
   },
   "table": {
   "table_name": "Country",
+  "access_type": "ALL",       # 虽然经过优化器的处理,但此查询还是将进行表扫描
+  "possible_keys": [
+     "p"                      # 识别到可用的索引
+  ],
   "rows_examined_per_scan": 239,
   "rows_produced_per_join": 15,
   "filtered": "6.46",
   "cost_info": {
      "read_cost": "50.71",
+     "eval_cost": "3.09",    # 相对于实例1,这里评估成本有所增加,是因为评估了索引
      "prefix_cost": "53.80",
      "data_read_per_join": "3K"
   },
   "used_columns": [
      "Code",
      "Name",
      "Continent",
      "Region",
      "SurfaceArea",
      "IndepYear",
      "Population",
      "LifeExpectancy",
      "GNP",
      "GNPOld",
      "LocalName",
      "GovernmentForm",
      "HeadOfState",
      "Capital",
      "Code2"
   ],
   "attached_condition": "((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 5000000))"
   }
  }
}

在示例2中,我们可以看到索引p (Population)在被添加到表后并没有被选中。EXPLAIN显示它是一个候选,但它没有描述为什么它没有被选中。要找到原因,我们需要求助于OPTIMIZER_TRACE

示例3:OPTIMIZER_TRACE显示了为什么没有使用索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
SET optimizer_trace="enabled=on"
SELECT * FROM Country WHERE continent='Asia' and population > 5000000;
SELECT * FROM information_schema.optimizer_trace\G
{
  "steps": [
   {
+  "join_preparation": { # "*"转换成了全部列名,明确所有列名前都加了表名
      "select#": 1,
      "steps": [
         {
         "expanded_query": "/* select#1 */ select `Country`.`Code` AS `Code`,`Country`.`Name` AS `Name`,`Country`.`Continent` AS `Continent`,`Country`.`Region` AS `Region`,`Country`.`SurfaceArea` AS `SurfaceArea`,`Country`.`IndepYear` AS `IndepYear`,`Country`.`Population` AS `Population`,`Country`.`LifeExpectancy` AS `LifeExpectancy`,`Country`.`GNP` AS `GNP`,`Country`.`GNPOld` AS `GNPOld`,`Country`.`LocalName` AS `LocalName`,`Country`.`GovernmentForm` AS `GovernmentForm`,`Country`.`HeadOfState` AS `HeadOfState`,`Country`.`Capital` AS `Capital`,`Country`.`Code2` AS `Code2` from `Country` where ((`Country`.`Continent` = 'Asia') and (`Country`.`Population` > 5000000))"
         }
      ]
   }
   },
   {
   "join_optimization": {
      "select#": 1,
      "steps": [
         {
+        "condition_processing": {     # where语句处理,调整查询条件顺序,计算数学公式改为常量,移除多余的查询条件
            "condition": "WHERE",
            "original_condition": "((`Country`.`Continent` = 'Asia') and (`Country`.`Population` > 5000000))",
            "steps": [
               {
               "transformation": "equality_propagation",
               "resulting_condition": "((`Country`.`Population` > 5000000) and multiple equal('Asia', `Country`.`Continent`))"
               },
               {
               "transformation": "constant_propagation",
               "resulting_condition": "((`Country`.`Population` > 5000000) and multiple equal('Asia', `Country`.`Continent`))"
               },
               {
               "transformation": "trivial_condition_removal",
               "resulting_condition": "((`Country`.`Population` > 5000000) and multiple equal('Asia', `Country`.`Continent`))"
               }
            ]
         }
         },
         {
         "substitute_generated_columns": {}
         },
         {
         "table_dependencies": [
            {
               "table": "`Country`",
               "row_may_be_null": false,
               "map_bit": 0,
               "depends_on_map_bits": []
            }
         ]
         },
         {
         "ref_optimizer_key_uses": []
         },
         {
         "rows_estimation": [
            {
               "table": "`Country`",
+              "range_analysis": {     # 这里说明表区域扫描审计
+              "table_scan": {         # 扫239行,成本55.9
                  "rows": 239,
                  "cost": 55.9
               },
+              "potential_range_indexes": [     # 可以用的索引
                  {
                     "index": "PRIMARY",
                     "usable": false,
+                    "cause": "not_applicable"    # 主键不适用
                  },
                  {
                     "index": "p",
+                    "usable": true,              # 新加的普通索引p可以使用
                     "key_parts": [
                     "Population",
                     "Code"
                     ]
                  }
               ],
               "setup_range_conditions": [],
               "group_index_range": {
                  "chosen": false,
                  "cause": "not_group_by_or_distinct"
               },
+              "analyzing_range_alternatives": {
+                 "range_scan_alternatives": [   # 表区域扫描可选方案
+                    {
+                    "index": "p",
+                    "ranges": [
+                       "5000000 < Population"   # 使用索引p (Population)扫描
+                    ],
+                    "index_dives_for_eq_ranges": true,
+                    "rowid_ordered": false,
+                    "using_mrr": false,
+                    "index_only": false,
+                    "rows": 108,
+                    "cost": 130.61,             # 此方案成本130.61
+                    "chosen": false,            # 但是没有选择此方案
+                    "cause": "cost"             # 因为它消耗的成本太多
+                    }
+                 ],
                  "analyzing_roworder_intersect": {
                     "usable": false,
                     "cause": "too_few_roworder_scans"
                  }
               }
               }
            }
         ]
         },
         {
+        "considered_execution_plans": [       # 最后选择的方案
            {
               "plan_prefix": [],
               "table": "`Country`",
               "best_access_path": {
+              "considered_access_paths": [    # 直接扫表
                  {
                     "rows_to_scan": 239,
                     "access_type": "scan",
                     "resulting_rows": 239,
+                    "cost": 53.8,             # 扫表成本53.8,成本比可选的用索引p做区域扫描的130.61少很多
+                    "chosen": true            # 选择了这个方案
                  }
               ]
               },
               "condition_filtering_pct": 100,
               "rows_for_plan": 239,
               "cost_for_plan": 53.8,
               "chosen": true
            }
         ]
         },
         {
+        "attaching_conditions_to_tables": {   # 查询后进行条件过滤
            "original_condition": "((`Country`.`Continent` = 'Asia') and (`Country`.`Population` > 5000000))",
            "attached_conditions_computation": [],
            "attached_conditions_summary": [   # 最终使用的过滤条件
               {
               "table": "`Country`",
               "attached": "((`Country`.`Continent` = 'Asia') and (`Country`.`Population` > 5000000))"
               }
            ]
         }
         },
         {
         "refine_plan": [
            {
               "table": "`Country`"
            }
         ]
         }
      ]
   }
   },
   {
   "join_execution": {
      "select#": 1,
      "steps": []
   }
   }
  ]
}

在示例3中,range_scan_alternatives列出了可选的索引p (Population),但是由于成本太高被淘汰了("chosen": false and "cause": "cost")。输出甚至提供了使用该索引的估计成本:130.61个成本单位。相比之下,表扫描的成本单位为55.9。成本越低越好,因此首选表扫描。

要说明为什么会这样,我们首先需要了解索引的排除工作原理。在这个数据集中,Country表中大多数行的population都大于5000000,优化器判断直接行扫描比用索引搜索行更快。因为不是InnoDB的覆盖索引和主键,所以使用这种普通索引的原理是查询到键,然后回表取数据,然后再回到索引查找下一个键,再回表取数据,如此反复,所以使用索引反而导致成本更高。当然高级用户可以配置如何做出这些决策的成本。

逻辑转换

MySQL优化器可能会以不影响结果的方式对你的查询进行更改。这些转换背后的目标是尝试以一种可以消除多余的工作,并且能更快执行查询来重写查询语句。例如,考虑以下查询:

1
SELECT * FROM Country WHERE population > 5000000 AND continent='Asia' AND 1=1;

因为1总是等于1,而且这个条件是AND(而不是OR),所以它是完全多余的。在执行查询时检查每一行的1是否仍然等于1并没有任何好处,并且删除条件仍然会返回相同的结果。我们可以看到MySQL在OPTIMIZER_TRACE中应用了这种转换,以及许多其他转换:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
..
{
  "join_optimization": {
    "select#": 1,
    "steps": [
      {
        "condition_processing": {
          "condition": "WHERE",
          "original_condition": "((`Country`.`Population` > 5000000) and (1 = 1))",
          "steps": [
            {
              "transformation": "equality_propagation",
              "resulting_condition": "((`Country`.`Population` > 5000000) and (1 = 1))"
            },
            {
              "transformation": "constant_propagation",
              "resulting_condition": "((`Country`.`Population` > 5000000) and (1 = 1))"
            },
            {
              "transformation": "trivial_condition_removal",
              "resulting_condition": "(`Country`.`Population` > 5000000)"
            }
          ]
        }
      },
..

在执行了EXPLAIN之后,在应用了所有转换后的重写的查询也可以在SHOW WARNINGS中使用。上述语句的改写形式如下:

1
2
3
EXPLAIN FORMAT=JSON SELECT * FROM Country WHERE population > 5000000 AND 1=1;
SHOW WARNINGS;
/* select#1 */ select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` AS `Name`,`world`.`Country`.`Continent` AS `Continent`,`world`.`Country`.`Region` AS `Region`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`.`IndepYear` AS `IndepYear`,`world`.`Country`.`Population` AS `Population`,`world`.`Country`.`LifeExpectancy` AS `LifeExpectancy`,`world`.`Country`.`GNP` AS `GNP`,`world`.`Country`.`GNPOld` AS `GNPOld`,`world`.`Country`.`LocalName` AS `LocalName`,`world`.`Country`.`GovernmentForm` AS `GovernmentForm`,`world`.`Country`.`HeadOfState` AS `HeadOfState`,`world`.`Country`.`Capital` AS `Capital`,`world`.`Country`.`Code2` AS `Code2` from `world`.`Country` where (`world`.`Country`.`Population` > 5000000)

示例转换

下面是一些经过逻辑转换的查询示例。请注意,在几个示例中PRIMARYUNIQUE索引在查询执行阶段之前被永久转换。由于这部分查询应该已经是最有效的形式,优化器在考虑执行计划之前会执行永久转换。这是另一个正在应用的优化,以减少需要考虑的计划数量。

code是主键,将值转换成常量重写查询。

  
原始形式SELECT * FROM Country WHERE code='CAN'
重写形式/* select#1 */ select 'CAN' AS `Code`,'Canada' AS `Name`,'North America' AS `Continent`, 'North America' AS `Region`,'9970610.00' AS `SurfaceArea`,'1867' AS `IndepYear`, '31147000' AS `Population`,'79.4' AS `LifeExpectancy`,'598862.00' AS `GNP`,'625626.00' AS `GNPOld`, 'Canada' AS `LocalName`,'Constitutional Monarchy, Federation' AS `GovernmentForm`, 'Elisabeth II' AS `HeadOfState`,'1822' AS `Capital`,'CA' AS `Code2` from `world`.`Country` where 1

因为主键里保存了所有字段值,所以不需要回表查询,直接从主键里面提取出来即可。

code是一个主键,但是这个主键在表中不存在(where条件无效)。

  
原始形式SELECT * FROM Country WHERE code='XYZ'
重写形式/* select#1 */ select NULL AS `Code`,NULL AS `Name`,NULL AS `Continent`,NULL AS `Region`, NULL AS `SurfaceArea`,NULL AS `IndepYear`,NULL AS `Population`,NULL AS `LifeExpectancy`,NULL AS `GNP`, NULL AS `GNPOld`,NULL AS `LocalName`,NULL AS `GovernmentForm`,NULL AS `HeadOfState`,NULL AS `Capital`, NULL AS `Code2` from `world`.`Country` where multiple equal('XYZ', NULL)

由于没有找到这个主键,所以所有字段值返回NULL。

另一种无法成立的where条件,虽然code值存在,但1=0是一个不可能的条件

  
原始形式SELECT * FROM Country WHERE code='CAN' AND 1=0
重写形式/* select#1 */ select world.Country.Code AS Code,world.Country.Name AS Name, world.Country.Continent AS Continent,world.Country.Region AS Region, world.Country.SurfaceArea AS SurfaceArea,world.Country.IndepYear AS IndepYear, world.Country.Population AS Population,world.Country.LifeExpectancy AS LifeExpectancy, world.Country.GNP AS GNP,world.Country.GNPOld AS GNPOld, world.Country.LocalName AS LocalName,world.Country.GovernmentForm AS GovernmentForm, world.Country.HeadOfState AS HeadOfState,world.Country.Capital AS Capital, world.Country.Code2 AS Code2 from world.Country where 0

虽然有这个主键,能找到其他字段的值,但是where为0所以不会返回什么。

对Country表的子查询派生表被合并成直接join Country表

  
原始形式SELECT City.* FROM City, (SELECT * FROM Country WHERE continent='Asia') as Country WHERE Country.code=City.CountryCode AND Country.population > 5000000;
重写形式/* select#1 */ select `world`.`City`.`ID` AS `ID`,`world`.`City`.`Name` AS `Name`, `world`.`City`.`CountryCode` AS `CountryCode`,`world`.`City`.`District` AS `District`, `world`.`City`.`Population` AS `Population` from `world`.`City` join `world`.`Country` where ((`world`.`Country`.`Continent` = 'Asia') and (`world`.`City`.`CountryCode` = ` world`.`Country`.`Code`) and (`world`.`Country`.`Population` > 5000000))

重写形式:此行为取决于optimizer_switch derived_merge(默认情况下)

视图的定义与用于查询它的查询条件合并。

  
原始形式CREATE VIEW Countries_in_asia AS SELECT * FROM Country WHERE continent='Asia'; SELECT * FROM Countries_in_asia WHERE population > 5000000;
重写形式/* select#1 */ select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` AS `Name`, `world`.`Country`.`Continent` AS `Continent`,`world`.`Country`.`Region` AS `Region`, `world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`.`IndepYear` AS `IndepYear`, `world`.`Country`.`Population` AS `Population`,`world`.`Country`.`LifeExpectancy` AS `LifeExpectancy`, `world`.`Country`.`GNP` AS `GNP`,`world`.`Country`.`GNPOld` AS `GNPOld`, `world`.`Country`.`LocalName` AS `LocalName`,`world`.`Country`.`GovernmentForm` AS `GovernmentForm`, `world`.`Country`.`HeadOfState` AS `HeadOfState`,`world`.`Country`.`Capital` AS `Capital`, `world`.`Country`.`Code2` AS `Code2` from `world`.`Country` where ((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 5000000))

将view里自带的where条件和查询这个view的语句中的where条件合并。

基于成本的优化

优化器决定如何执行查询的方式是基于一种称为基于成本的优化的方法。这一过程的简化如下:

  1. 为每个操作分配成本。
  2. 评估每个可能的计划将采取多少行动。
  3. 把总数加起来。
  4. 选择总成本最低的计划。

为什么我说上面是一个简化,是因为优化器不会穷尽搜索每个可能的执行计划。如果有5个表要join,每个表有5个可能的索引,那么可能超过5! * 5! = 14400种查询的执行方式:

  • 每个索引可能有一个以上的潜在访问方法。(例如,索引扫描,范围扫描,或索引查找)。此外,每个表都可能使用表扫描。
  • 对于一个INNER JOIN查询,可以以任何顺序连接表(指定表的顺序无关紧要)。
  • 当join时,可能有多个join缓存方法或子查询策略可用。

优化器评估每个潜在的执行计划是不可行的。例如,考虑这样一种情况:优化花费的时间比执行的时间长。为此,优化器将在默认情况下跳过对某些计划的评估。

默认optimizer_prune_level=1。要禁用此启发式并评估所有计划,请将其设置为零。http://dev.mysql.com/doc/refman/5.7/en/controlling-query-plan-evaluation.html

也存在配置选项optimizer_search_depth来限制查询计划的搜索深度,但默认情况下没有启用该选项。

默认optimizer_search_depth=64。较低的值可能会减少计划评估时间,因为可能会降低最优计划的成本。

修改成本(cost)常数

每个操作的成本(cost)可以通过mysql库中的server_costengine_cost表进行配置。下面是MySQL 8.0中使用的默认值:

  
40disk_temptable_create_cost
1disk_temptable_row_cost
2memory_temptable_create_cost
0.2memory_temptable_row_cost
0.1key_compare_cost
0.2row_evaluate_cost
1io_block_read_cost
1memory_block_read_cost

MySQL 8.0引入了一个新特性,成本模型可以根据索引在内存中的百分比来调整。在以前版本的MySQL中,成本模型总是假设访问一个页面需要IO。

成本本身是一个代表资源使用情况的逻辑单位。一个单位不再有确切的含义,但它的起源可以追溯到20世纪90年代硬盘驱动器上的一个随机IO。

随着硬件的改进,它可能不会以一致的速率对所有组件进行处理(例如,SSD的存储延迟已大大改善)。类似地,当软件处理硬件的变化(具有压缩等特性)时,资源消耗也会发生变化。拥有可配置的成本常数可以细化处理这些情况。

示例4显示,将row_evaluate_cost更改为原来的5倍会导致表扫描的成本大大增加(与通过索引消除工作相比)。这将导致优化器选择使用在示例2中创建的p (Population)索引。

示例4:增加行评估的成本会使表扫描的成本更大

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
# 将成本从0.2改到1.0
UPDATE mysql.server_cost SET cost_value=1 WHERE cost_name='row_evaluate_cost';
FLUSH OPTIMIZER_COSTS;

# 在一个新的会话中执行查询
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE continent='Asia' and population > 5000000;
{
   "select_id": 1,
+  "cost_info": {          # 由于行评估高出5倍
+  "query_cost": "325.01"  # 查询的总成本上升了
   },
   "table": {
   "table_name": "Country",
+  "access_type": "range",  # 但它将作为一个范围扫描执行
   "possible_keys": [
      "p"
   ],
+  "key": "p",              # 使用了索引p (Population)
   "used_key_parts": [
      "Population"
   ],
   "key_length": "4",
   "rows_examined_per_scan": 108,
   "rows_produced_per_join": 15,
   "filtered": "14.29",
   "index_condition": "(`world`.`Country`.`Population` > 5000000)",
   "cost_info": {
      "read_cost": "309.58",
      "eval_cost": "15.43",
      "prefix_cost": "325.01",
      "data_read_per_join": "3K"
   },
   "used_columns": [
      "Code",
      "Name",
      "Continent",
      "Region",
      "SurfaceArea",
      "IndepYear",
      "Population",
      "LifeExpectancy",
      "GNP",
      "GNPOld",
      "LocalName",
      "GovernmentForm",
      "HeadOfState",
      "Capital",
      "Code2"
   ],
   "attached_condition": "(`world`.`Country`.`Continent` = 'Asia')"
   }
  }
}

修改成本常量时要小心,因为许多查询计划可能会变得更糟!这里显示的是演示目的。在大多数生产环境中,添加查询提示(hint)会更好。

在继续讨论其他例子之前,要记得重置成本(cost): UPDATE mysql.server_cost SET cost_value=NULL WHERE cost_name='row_evaluate_cost'; FLUSH OPTIMIZER_COSTS; # 然后关闭会话

元数据和统计信息

如示例3所示,数据的分布会影响执行计划的成本。优化器利用数据字典和统计信息作为其决策过程的一部分。

元数据

 索引信息唯一性(Uniqueness)是否可NULL(Nullability)
描述字典为每个表提供索引列表。如果索引是唯一的,则可以将其用作永久转换的一部分,从而缩短计划的某些部分。优化器需要正确处理潜在的NULL值。列的是否可NULL会影响某些执行计划的使用。

统计信息

 表的大小基数范围估计(Range Estimates)
描述提供表总大小的估计值。对少量随机页面(默认为20)进行抽样,并推断出索引列中唯一值的数量。优化器为InnoDB提供了一个最小值和一个最大值,并返回该范围内的行数估计值。
用于所有列索引列索引列
计算预先预先按需求
自动更新正常运行期间[1]表的变化超过10%后N/A
手动更新ANALYZE TABLEANALYZE TABLEN/A
配置选项N/A样本的页数[2]index dive最大值[3]和最大内存使用量[4]
精度最不准确的可能会受到分布偏斜的影响最精确的
常用的确定表扫描成本。当缺少索引时,可以用于join顺序(最大表优先)。确定join顺序。当range扫描估值超过eq_range_index_dive_limit最大数时也用基数。用于计算查询条件(例如,它查看可能使用的索引,并估计有多少行匹配)。用范围估计来确定population > 5000000不应该使用索引。

由于统计信息的原因,QA和产品中看似相同的查询执行起来可能非常不同。即使是生产环境中的查询计划也会随着时间的推移而改变,就像数据分发一样。

[1] 统计信息在常规操作期间更新,但不能保证准确。

[2] 可以使用innodb_stats_persistent_sample_pages更改采样的页面数。较高的值可能会产生更准确的估计(在一定程度上增加成本产生)。

[3] 当IN列表超过eq_range_index_dive_limit项(默认值:200)时,优化器将切换到基数估计。

[4] 范围优化器限制为range_optimizer_max_mem_size(默认值:8M)。使用多个IN列表的查询可能会超出此范围,因为选项组合可能会在内部扩展。

查询提示(Hints)

除了在mysql系统数据库中调整成本常量外,mysql还提供了一种方法来指定如何选择计划。使用查询提示有两个不同的原因:

  1. 为了调试。EXPLAIN显示了基于可用元数据做出的决策,添加提示可以强制执行一些其他的执行计划,并比较实际的执行时间。
  2. 为了生产环境的性能。如果在调试时发现另一种执行计划更快,那么可以选择添加提示到生产环境来加速执行查询。

我想说的是,虽然我在调试中经常使用提示,但在生产中引入提示时,我总是很小心。这可能会造成维护负担,因为随着数据分布和索引的变化,提示可能仅仅是在某个时间点选择的特定计划中,添加的提示可能让查询变得更慢。

最佳实践建议,在MySQL主版本升级(例如,升级到MySQL 9.0)之后,应该重新查看提示有什么变化。你可能会发现,随着引入新的优化,你可以取消一些提示。

老版本提示

MySQL的早期版本仅支持一些直接扩展SQL语法的提示。例如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# 按指定顺序连接表
SELECT STRAIGHT_JOIN Country.Name as CountryName, City.Name AS City
FROM Country INNER JOIN City ON City.CountryCode=Country.Code;

# 强制使用某个索引
SELECT * FROM Country FORCE INDEX (p)
WHERE continent='Asia' and population > 5000000;

# 忽略某个索引
SELECT * FROM Country IGNORE INDEX (p)
WHERE continent='Asia' and population > 5000000;

# 建议使用某个索引
SELECT * FROM Country USE INDEX (p)
WHERE continent='Asia' and population > 5000000;

虽然这些提示在MySQL 8.0中继续被支持,但它们被部分替换为注释风格的提示。在示例5中我们可以看到,由于使用了FORCE INDEX提示,示例2中没被选中的p (population)索引被选中了。尽管有FORCE INDEX, EXPLAIN仍然显示其真实成本为152.21,而表扫描成本约为53。

示例5:不管成本如何,强制使用索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
EXPLAIN FORMAT=JSON
SELECT * FROM Country FORCE INDEX (p) WHERE continent='Asia' and population > 5000000;
{
  "query_block": {
     "select_id": 1,
     "cost_info": {
+    "query_cost": "152.21"   # 成本大于表扫描
     },
     "table": {
     "table_name": "Country",
     "access_type": "range",
     "possible_keys": [
     "p"
     ],
     "key": "p",
     "used_key_parts": [
     "Population"
     ],
     "key_length": "4",
     "rows_examined_per_scan": 108,
     "rows_produced_per_join": 15,
     "filtered": "14.29",
     "index_condition": "(`world`.`Country`.`Population` > 5000000)",
     "cost_info": {
     "read_cost": "149.12",
     "eval_cost": "3.09",
     "prefix_cost": "152.21",
     "data_read_per_join": "3K"
     },
     "used_columns": [
     "Code",
     "Name",
     "Continent",
     "Region",
     "SurfaceArea",
     "IndepYear",
     "Population",
     "LifeExpectancy",
     "GNP",
     "GNPOld",
     "LocalName",
     "GovernmentForm",
     "HeadOfState",
     "Capital",
     "Code2"
     ],
     "attached_condition": "(`world`.`Country`.`Continent` = 'Asia')"
     }
  }
}

新Comment-Style提示

MySQL 8.0扩展了MySQL 5.7中引入的注释风格提示的新风格,具有控制表连接顺序的能力(类似于STRAIGHT_JOIN)。比起旧的SQL语法扩展提示,我更喜欢注释风格,原因有三:

  1. 通过将SQL的声明性本质与暗示如何执行的内容分开,它们易于读写。
  2. 它们具有明确的语义提示而不是指令。这意味着,如果提示不能操作,它将导致语句警告,而不是错误。这与FORCE INDEX等提示形成对比,其中缺少索引将导致错误。例如,这种行为类似于Oracle数据库中的提示。
  3. 它们的控制粒度更细。这赋予了DBA在使用提示时更多的灵活性。
提示名称描述
BKA, NO_BKA为特定的表启用或禁用批量键访问优化(该优化在默认情况下是关闭的,但可以由optimizer_switch控制)。
BNL, NO_BNL为指定的表启用或禁用块嵌套循环优化。
MAX_EXECUTION_TIME设置查询运行的最大执行时间(以毫秒为单位)。此提示当前仅影响SELECT语句。
MRR, NO_MRR影响多范围读优化
NO_ICP影响索引条件下推优化
NO_RANGE_OPTIMIZATION禁用指定表或索引的范围优化。
QB_NAME为查询块分配一个名称
SEMIJOIN, NO_SEMIJOIN控制子查询的半连接策略(选项包括DUPSWEEDOUT,FIRSTMATCH,LOOSESCAN,MATERIALIZATION)。
SUBQUERY类似于SEMIJOIN/NO_SEMIJOIN。用于控制子查询策略,包括IN-to-EXISTS优化。

在示例6中,禁用了特定表的范围优化。这导致p (Population)索引被忽略,尽管它的选择性(selective)非常高。在这个表中,只有两行数据population > 1000000000

示例6:禁用范围优化意味着索引不能被使用

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
EXPLAIN FORMAT=JSON
SELECT /*+NO_RANGE_OPTIMIZATION(Country) */  * FROM Country
WHERE continent='Asia' AND population > 1000000000;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "56.80"
    },
    "table": {
      "table_name": "Country",
+     "access_type": "ALL",       # 访问方式成了表扫描
+     "possible_keys": [
+       "p"                       # 可以做范围扫描的索引被禁用了
+     ],
      "rows_examined_per_scan": 239,
      "rows_produced_per_join": 11,
      "filtered": "4.76",
      "cost_info": {
        "read_cost": "54.52",
        "eval_cost": "2.28",
        "prefix_cost": "56.80",
        "data_read_per_join": "2K"
      },
      "used_columns": [
        "Code",
        "Name",
        "Continent",
        "Region",
        "SurfaceArea",
        "IndepYear",
        "Population",
        "LifeExpectancy",
        "GNP",
        "GNPOld",
        "LocalName",
        "GovernmentForm",
        "HeadOfState",
        "Capital",
        "Code2"
      ],
      "attached_condition": "((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 1000000000))"
    }
  }
}

比较计划

简单回顾一下:优化器的作用是从成千上万的选择中选择最佳的执行计划。在这些选择中,可能有几个不同的索引而且每个索引有不同的访问方法。到目前为止,我们已经演示了一个关于p (population)的索引和两个不同的执行计划:

  1. 范围扫描p (population)
  2. 表扫描

由于p (population)索引被证明选择性不高,我现在将在c (continent)上添加一个索引。在典型的生产环境中,你现在可能希望删除p (population)索引,因为它没有提供价值。但我将把它留在这里,因为我希望演示优化器可以成功地评估许多选择。

示例7:在continent上添加索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
ALTER TABLE Country ADD INDEX c (continent);
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE continent='Asia' AND population > 5000000;
{
  "query_block": {
     "select_id": 1,
     "cost_info": {
     "query_cost": "28.20"
     },
     "table": {
     "table_name": "Country",
+    "access_type": "ref",       # 访问方法是ref
+    "possible_keys": [          # 使用非唯一索引
+    "p",
+    "c"
+    ],
+    "key": "c",                 # continent索引被选中
     "used_key_parts": [
     "Continent"
     ],
     "key_length": "1",
     "ref": [
     "const"
     ],
     "rows_examined_per_scan": 51,
     "rows_produced_per_join": 23,
     "filtered": "45.19",
     "cost_info": {
     "read_cost": "18.00",
     "eval_cost": "4.61",
     "prefix_cost": "28.20",
     "data_read_per_join": "5K"
     },
     "used_columns": [
     "Code",
     "Name",
     "Continent",
     "Region",
     "SurfaceArea",
     "IndepYear",
     "Population",
     "LifeExpectancy",
     "GNP",
     "GNPOld",
     "LocalName",
     "GovernmentForm",
     "HeadOfState",
     "Capital",
     "Code2"
     ],
     "attached_condition": "(`world`.`Country`.`Population` > 5000000)"
     }
  }
}

在示例7中,我们可以看到c (Continent)索引优于p (Population)索引和表扫描。它以索引的形式执行其工作,即消除工作。优化器估计在使用索引之后,只需要检查51行(rows_examined_per_scan)。另一种解释是,亚洲的国家比较少,因此该索引的选择性高。

在示例8中,我们可以看到,通过稍微修改查询,where条件使用population > 500000000而不是population > 5000000过滤数据,索引选择变成了p (population)。原因就是世界上只有两个国家的人口达到5亿,因此该索引变得更具选择性。

示例8:在人口众多的情况下,p的范围扫描优于c

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE continent='Asia' and population > 500000000;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "7.01"
    },
    "table": {
      "table_name": "Country",
+      "access_type": "range",    # 访问方法是range扫描,而不是使用c做ref查询
+      "possible_keys": [
+        "p",
+        "c"
+      ],
+      "key": "p",                # key已经从例7的c切换到p了
      "used_key_parts": [
        "Population"
      ],
      "key_length": "4",
      "rows_examined_per_scan": 2,
      "rows_produced_per_join": 0,
      "filtered": "21.34",
      "index_condition": "(`world`.`Country`.`Population` > 500000000)",
      "cost_info": {
        "read_cost": "6.58",
        "eval_cost": "0.43",
        "prefix_cost": "7.01",
        "data_read_per_join": "112"
      },
      "used_columns": [
        "Code",
        "Name",
        "Continent",
        "Region",
        "SurfaceArea",
        "IndepYear",
        "Population",
        "LifeExpectancy",
        "GNP",
        "GNPOld",
        "LocalName",
        "GovernmentForm",
        "HeadOfState",
        "Capital",
        "Code2"
      ],
      "attached_condition": "(`world`.`Country`.`Continent` = 'Asia')"
    }
  }
}

随着p (population)索引增加,我们现在至少有四个可能的执行计划:

  1. 范围(range)扫描,使用p (population): "500000000 < Population"
  2. 表扫描
  3. ref查询,使用c (continent)
  4. 范围(range)扫描,使用c (continent): "Asia <= Continent <= Asia"

除了这些计划,我们还可以使用索引合并,结合使用索引p (population)c (continent),在示例9的OPTIMIZER_TRACE输出中,我们可以看到对PRIMARY索引的范围扫描也被评估但被拒绝了。

我已经描述了rows_examined_per_scan来显示索引的选择性,但是还有另外两个统计信息要指出示例7和示例8之间的区别:

  1. key_length。用于continent的数据类型是1字节(enum),而用于population的数据类型是4字节。如果选择性相等,则更短的键长度更好,因为你可以在每个页面中容纳更多的键,从而让索引更好的在内存中安置。
  2. access_type。在所有条件相同的情况下,ref的访问类型比range的成本要小。

示例9:优化器跟踪将索引p和c与更多的population进行比较

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
SET optimizer_trace="enabled=on";
SELECT * FROM Country WHERE continent='Asia' AND population > 500000000;
SELECT * FROM information_schema.optimizer_trace;
{
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `Country`.`Code` AS `Code`,`Country`.`Name` AS `Name`,`Country`.`Continent` AS `Continent`,`Country`.`Region` AS `Region`,`Country`.`SurfaceArea` AS `SurfaceArea`,`Country`.`IndepYear` AS `IndepYear`,`Country`.`Population` AS `Population`,`Country`.`LifeExpectancy` AS `LifeExpectancy`,`Country`.`GNP` AS `GNP`,`Country`.`GNPOld` AS `GNPOld`,`Country`.`LocalName` AS `LocalName`,`Country`.`GovernmentForm` AS `GovernmentForm`,`Country`.`HeadOfState` AS `HeadOfState`,`Country`.`Capital` AS `Capital`,`Country`.`Code2` AS `Code2` from `Country` where ((`Country`.`Continent` = 'Asia') and (`Country`.`Population` > 500000000))"
          }
        ]
      }
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "((`Country`.`Continent` = 'Asia') and (`Country`.`Population` > 500000000))",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "((`Country`.`Population` > 500000000) and multiple equal('Asia', `Country`.`Continent`))"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "((`Country`.`Population` > 500000000) and multiple equal('Asia', `Country`.`Continent`))"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "((`Country`.`Population` > 500000000) and multiple equal('Asia', `Country`.`Continent`))"
                }
              ]
            }
          },
          {
            "substitute_generated_columns": {
            }
          },
          {
            "table_dependencies": [
              {
                "table": "`Country`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ]
              }
            ]
          },
          {
            "ref_optimizer_key_uses": [
              {
                "table": "`Country`",
                "field": "Continent",
                "equals": "'Asia'",
                "null_rejecting": false
              }
            ]
          },
          {
            "rows_estimation": [
              {
                "table": "`Country`",
                "range_analysis": {
                  "table_scan": {
                    "rows": 239,
                    "cost": 247.1
                  },
+                  "potential_range_indexes": [
+                    {
+                      "index": "PRIMARY",
+                      "usable": false,            # 在主键上进行范围扫描
+                      "cause": "not_applicable"   # 不适用
+                    },
+                    {
+                      "index": "p",
+                      "usable": true,   # 在p上使用range扫描是可能的
+                      "key_parts": [    # 后面"analyzing_range_alternatives"有评估,成本是5.01
+                        "Population",
+                        "Code"
+                      ]
+                    },
+                    {
+                      "index": "c",     # 在c上使用range扫描是可能的
+                      "usable": true,   # 后面"analyzing_range_alternatives"有评估,成本是103.01
+                      "key_parts": [
+                        "Continent",
+                        "Code"
+                      ]
+                    }
                  ],
                  "setup_range_conditions": [
                  ],
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  },
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "p",
                        "ranges": [
                          "500000000 < Population"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 2,
                        "cost": 5.01,
+                       "chosen": true              # 索引p被选中
                      },
                      {
                        "index": "c",
                        "ranges": [
                          "Asia <= Continent <= Asia"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": true,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 51,
                        "cost": 103.01,
+                       "chosen": false,           # 索引c没被选中,因为成本比p高
                        "cause": "cost"
                      }
                    ],
+                   "analyzing_roworder_intersect": {
+                     "usable": false,                 # 拒绝合并索引
+                     "cause": "too_few_roworder_scans"
+                   }
+                 },
+                 "chosen_range_access_summary": {
+                   "range_access_plan": {
+                     "type": "range_scan",          # 优化器选择了在p上进行范围扫描
+                     "index": "p",
+                     "rows": 2,
+                     "ranges": [
+                       "500000000 < Population"
+                     ]
+                   },
+                   "rows_for_plan": 2,
+                   "cost_for_plan": 5.01,
+                   "chosen": true
                  }
                }
              }
            ]
          },
          {
+            "considered_execution_plans": [
+              {
+                "plan_prefix": [
+                ],
+                "table": "`Country`",
+                "best_access_path": {          # 优化器汇总出几个最好的可能的访问策略(ref和range)
+                  "considered_access_paths": [
+                    {
+                      "access_type": "ref",    # 在c上做ref索引
+                      "index": "c",
+                      "rows": 51,
+                      "cost": 69,
+                      "chosen": true
+                    },
+                    {
+                      "rows_to_scan": 2,
+                      "access_type": "range",  # 在p上做range扫描
+                      "range_details": {
+                        "used_index": "p"
+                      },
+                      "resulting_rows": 2,
+                      "cost": 7.01,
+                      "chosen": true
+                    }
+                  ]
+                },
+                "condition_filtering_pct": 100,
+                "rows_for_plan": 2,
+                "cost_for_plan": 7.01,         # 7.01成本更低,选p做range扫描更好
+                "chosen": true
+              }
+            ]
+          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "((`Country`.`Continent` = 'Asia') and (`Country`.`Population` > 500000000))",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
                {
                  "table": "`Country`",
                  "attached": "((`Country`.`Continent` = 'Asia') and (`Country`.`Population` > 500000000))"
                }
              ]
            }
          },
          {
            "refine_plan": [
              {
                "table": "`Country`",
                "pushed_index_condition": "(`Country`.`Population` > 500000000)",
                "table_condition_attached": "(`Country`.`Continent` = 'Asia')"
              }
            ]
          }
        ]
      }
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
}

复合索引

由于不是所有population > 5000000的国家都在Asia,这两个查询条件的结合应该能提高指标消除工作的能力。也就是说,对于这个数据集,复合索引将提高选择性。

复合索引有两种可能的选择:

  1. 索引p_c (Population, Continent)
  2. 索引c_p (Continent, Population)

复合索引排序的差异非常重要。事实上,因为population是一个范围,所以优化器只能让使用p_c (population, continent)索引的第一部分。因此,在这个查询中,p_c (population, continent)对于p (population)并没有什么改进。在强制使用索引之后,我们可以清楚地看到这一点。

对于p_c (population, continent),如果是population的条件是population IN (1,2,3,4,5),IN里面是固定值的范围搜索,它能够在使用完整索引。

示例10:复合索引(population, continent)没有改善查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
ALTER TABLE Country ADD INDEX p_c (Population, Continent);
EXPLAIN FORMAT=JSON
SELECT * FROM Country FORCE INDEX (p_c) WHERE continent='Asia' and population > 5000000;
{
  "query_block": {
   "select_id": 1,
+   "cost_info": {           # 索引已强制使用
+   "query_cost": "152.21"   # 成本高于表扫描
   },
   "table": {
   "table_name": "Country",
   "access_type": "range",
   "possible_keys": [
      "p_c"
   ],
   "key": "p_c",
+   "used_key_parts": [
+      "Population"      # 只有population列被使用
+   ],
+   "key_length": "4",   # population=4字节
   "rows_examined_per_scan": 108,
   "rows_produced_per_join": 15,
   "filtered": "14.29",
   "index_condition": "((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 5000000))",
   "cost_info": {
      "read_cost": "149.12",
      "eval_cost": "3.09",
      "prefix_cost": "152.21",
      "data_read_per_join": "3K"
   },
   "used_columns": [
      "Code",
      "Name",
      "Continent",
      "Region",
      "SurfaceArea",
      "IndepYear",
      "Population",
      "LifeExpectancy",
      "GNP",
      "GNPOld",
      "LocalName",
      "GovernmentForm",
      "HeadOfState",
      "Capital",
      "Code2"
   ]
   }
  }
}

这个限制是由于B+树的索引结构导致的。一种简单的理解方式是复合索引中的“范围扫描的列放到复合索引的右边”。考虑到这一点,在示例11中演示了索引c_p (continent, population)。这两列组合建立的复合索引比仅在continent上建立的索引选择性更高,因此成本从28.20(示例7)下降到24.83。由于这两个列在索引中被有效地连接起来,所以访问方法被认为是一个范围。

示例11:效果更好的复合索引(Continent, Population)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
ALTER TABLE Country ADD INDEX c_p (Continent, Population);
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE continent='Asia' and population > 5000000;
{
  "query_block": {
   "select_id": 1,
+   "cost_info": {
+   "query_cost": "24.83"    # 成本比复合索引p,c (152.21)要低得多
+   },
   "table": {
   "table_name": "Country",
   "access_type": "range",
   "possible_keys": [
      "p",
      "c",
      "p_c",
      "c_p"
   ],
   "key": "c_p",
+   "used_key_parts": [  # 列都使用到了
+      "Continent",      # ENUM为1字节
+      "Population"      # INT为4字节
+   ],
+   "key_length": "5",   # 一共5字节
   "rows_examined_per_scan": 32,
   "rows_produced_per_join": 15,
   "filtered": "100.00",
   "index_condition": "((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 5000000))",
   "cost_info": {
      "read_cost": "18.00",
      "eval_cost": "3.09",
      "prefix_cost": "24.83",
      "data_read_per_join": "3K"
   },
   "used_columns": [
      "Code",
      "Name",
      "Continent",
      "Region",
      "SurfaceArea",
      "IndepYear",
      "Population",
      "LifeExpectancy",
      "GNP",
      "GNPOld",
      "LocalName",
      "GovernmentForm",
      "HeadOfState",
      "Capital",
      "Code2"
   ]
   }
  }
}

确定复合索引的顺序

确定复合索引中列的正确顺序是一件棘手的事情。以下是一些需要注意的事项:

  1. 最左规则。一个索引(a, b)还可以用于需要索引(a)的查询,但不能用在需要索引(b)的查询。尝试以这样一种方式设计复合索引,使它们能够被尽可能多的查询复用。
  2. 范围在右。一个索引(a, b)不能用于满足查询WHERE a BETWEEN x AND y AND B = 'z'。或者更具体地说:复合索引的其余部分将不会在第一个范围条件之后使用。更通俗的说:a如果是个范围查询,那么b就无效了。
  3. 高选择性的列在左边。想想如何尽快消除不需要的行。这通常还可以降低内存的占用,因为需要访问的索引页更少。
  4. 更改索引顺序时要小心。混合ASCDESC可能会影响可以使用多少复合索引。

第2条中同前面说明的,IN(1, 2, 3)这种IN里面是固定值的查询可以用到复合索引的全部列。

覆盖索引

覆盖索引是一种特殊类型的复合索引,其中所有列都存在于索引中。在这种情况下,MySQL能够应用一种优化,它只从索引中返回数据而不访问回表获取行。

研究一个案例,我们使用条件population > 5000000 and continent='Asia',而且只想查询Country表的Name列,所以不要使用SELECT * FROM Country。如示例12所示,复合索引c_p_n (Continent,Population,Name)可以使用前两列过滤行,并从第三列返回值。

示例12:覆盖索引c_p_n

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
ALTER TABLE Country ADD INDEX c_p_n (Continent,Population,Name);
EXPLAIN FORMAT=JSON
SELECT Name FROM Country WHERE continent='Asia' and population > 5000000;
{
  "query_block": {
     "select_id": 1,
     "cost_info": {
     "query_cost": "8.07"   # 查询成本只有8.07,减少了67%
     },
     "table": {
     "table_name": "Country",
     "access_type": "range",
     "possible_keys": [
     "p",
     "c",
     "p_c",
     "c_p",
     "c_p_n"
     ],
     "key": "c_p_n",
     "used_key_parts": [
     "Continent",
     "Population"
     ],
     "key_length": "5",
     "rows_examined_per_scan": 32,
     "rows_produced_per_join": 15,
     "filtered": "100.00",
     "using_index": true,      # “using_index”代表使用了“覆盖索引”
     "cost_info": {
     "read_cost": "1.24",
     "eval_cost": "3.09",
     "prefix_cost": "8.07",
     "data_read_per_join": "3K"
     },
     "used_columns": [
     "Name",
     "Continent",
     "Population"
     ],
     "attached_condition": "((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 5000000))"
     }
  }
}

EXPLAIN输出"using_index":true表示使用了覆盖索引。覆盖索引是一个被低估的优化,许多从业者错误地认为覆盖索引只会“成本减半”,因为使用了索引,但没有触及表行。我们可以看到,示例12与示例11中的非覆盖索引相比,成本缩减近2/3

在生产环境中,由于索引的聚集效应,用覆盖索引做查询可能比用其他的做查询具有更好的内存利用。也就是说,如果要访问的二级索引与聚集索引(主键)不相关[1],则可能需要访问更多数量的聚集键页。

[1]:通过前面的内容,我的意思是“大致遵循相同的顺序”。例如,为列inserted_timestamp创建二级索引与auto_increment主键高度相关,而为populationcontinent创建的二级索引则不太可能与主键相关,主键是三个字母的国家代码。

Visual Explain

MySQL Workbench包含Visual Explain,它可以帮助你更容易理解复杂的执行计划。这个特性由内部EXPLAIN FORMAT=JSON提供支持的,因此需要注意的是,Visual EXPLAIN在常规EXPLAIN FORMAT=JSON之上不会有任何额外的输出。事实上,为了简单起见,它省略了一些输出,比如覆盖索引的使用。

Visual Explain访问方法用颜色区分:

  • ref是绿色
  • range是橙色
  • ALL(表扫描)和INDEX(索引扫描)为红色

正如我们在目前的示例中所看到的,这可能是一个小的简化,因为高选择性范围比低选择性ref访问更可取。

临时计划

有时类似的查询可能有非常不同的执行计划。到目前为止,我们已经通过修改总体查询条件看到了这一点:在示例8中,通过分别指定population > 500000000population > 5000000搜索条件,最后两者使用了不同的索引。

这在生产环境中非常常见,其中查询的部分可能由用户输入生成:

  • 在包含数月数据的数据集上指定从昨天到现在的日期范围可能会使用日期列上的索引,指定从去年到现在可能不会。需要查询的行更多,可能导致使用表扫描。
  • 查找记录WHERE is_deleted=1可能会过滤掉一些记录,非常适合索引。同样,WHERE is_deleted=0可能不会,因为全表数据都满足这个条件。

根据输入的查询的范围,以上两个示例也可用于分区(partitioning)章节说明。

这是预期的行为,并且是收集统计信息的结果(在前面的“元数据和统计信息”中提到过)。下表显示了索引的相对成本如何随着populationcontinent的变化而变化。(1M = 1000000, c:continent,p:population)

 p>5M, c=’Asia’p>5M, c=’Antarctica’p>50M, c=’Asia’p>50M, c=’Antarctica’p>500M, c=’Asia’p>500M, c=’Antarctica’
p152.21152.2134.6134.613.813.81
c28.206.0028.206.0028.206.00
c,p24.832.4116.412.413.812.41
p,c152.21152.2134.6134.613.813.81
表扫描53.8053.8053.8053.8053.8053.80
  • 表扫描有一个相对固定的成本,因为它总是查看表中的每一行。
  • 随着输入值能够更有效地过滤,使用pc索引的成本会发生变化(Antarctica的国家更少,population更多的国家更少)。
  • 复合索引(c,p)在降低大多数查询的成本方面是相对有效的。
  • 由于前面解释的原因,p(p,c)索引的成本是相同的。由于左侧范围,(p,c)索引将仅使用索引的p部分。
  • 在这个数据集中,Antarctica有5个国家,人口都为零。所以(c,p)上的复合索引导致成本最低的查询,因为结果为零。

这些信息也可以用视图表示出来。注意,在continent的表扫描ref访问具有固定的成本,并且当population选择性不高时,(continent, population)上的复合索引与continent上的ref访问的成本相同。随着population数量的增加,它变得更具有选择性。

如果复合索引不是很有效,你会看到一种情况,即它会更接近单个索引(population或continent)的成本。最后,由于所有population超过3亿的国家都在Asia,因此选择性并不比仅针对population的索引好。

下面的视图显示了执行100次时的中值,强制使用索引p并改变population范围(从WHERE population > 1MWHERE population > 500M)。执行时间取自performance_schema.Events_statements_history_long,并转换为微秒(μs)。我怀疑这是因为数据集很小,而且因为population很少的国家很少,所以在访问同一组不同的页面时,执行时间存在一些集群。但尽管如此,两者还是有一定的相关性。

这些可视化高度依赖于数据分布。在检查示例数据集时,你可能会注意到有一些不准确的地方。数据来自芬兰统计局(1990年前后),并没有统一更新。

子查询

优化器有许多可用于优化子查询的执行策略,包括将查询重写为连接(join)、半连接和物化。使用的策略取决于子查询的类型和位置。

标量子查询

标量子查询的子查询只返回一行,可以在执行期间进行优化和缓存。在示例13中,我们可以看到一个标量子查询的示例,它试图找到TorontoCountryCode。重要的是要确定优化器将此视为两个查询,成本分别为1.20和4862.60。

第二个查询(select_id: 2)缺少索引,因此正在执行表扫描。我可以看到这一点,因为在attached_condition (`City`.`Name`)中提到的列没有索引。添加索引之后,查询得到了优化。

示例13:标量子查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE Code = (SELECT CountryCode FROM City WHERE name='Toronto');
{
  "query_block": {
+   "select_id": 1,
    "cost_info": {
      "query_cost": "1.20"
    },
    "table": {
      "table_name": "Country",
      "access_type": "ref",
      "possible_keys": [
        "PRIMARY"
      ],
      "key": "PRIMARY",
      "used_key_parts": [
        "Code"
      ],
      "key_length": "3",
      "ref": [
        "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "0.20",
        "prefix_cost": "1.20",
        "data_read_per_join": "264"
      },
      "used_columns": [
        "Code",
        "Name",
        "Continent",
        "Region",
        "SurfaceArea",
        "IndepYear",
        "Population",
        "LifeExpectancy",
        "GNP",
        "GNPOld",
        "LocalName",
        "GovernmentForm",
        "HeadOfState",
        "Capital",
        "Code2"
      ],
      "attached_condition": "(`world`.`Country`.`Code` = (/* select#2 */ select `world`.`City`.`CountryCode` from `world`.`City` where (`world`.`City`.`Name` = 'Toronto')))",
      "attached_subqueries": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
+           "select_id": 2,
            "cost_info": {
+             "query_cost": "862.60"          # 查询成本862.60
            },
            "table": {
+             "table_name": "City",
+             "access_type": "ALL",
              "rows_examined_per_scan": 4188,
              "rows_produced_per_join": 418,
              "filtered": "10.00",
              "cost_info": {
                "read_cost": "778.84",
                "eval_cost": "83.76",
                "prefix_cost": "862.60",
                "data_read_per_join": "29K"
              },
              "used_columns": [
                "Name",
                "CountryCode"
              ],
              "attached_condition": "(`world`.`City`.`Name` = 'Toronto')"
            }
          }
        }
      ]
    }
  }
}

示例14:添加索引以改进标量子查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
# 添加索引
ALTER TABLE City ADD INDEX n (Name);
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE Code = (SELECT CountryCode FROM City WHERE name='Toronto');
{
"query_block": {
"select_id": 1,
"cost_info": {
   "query_cost": "1.00"
},
"table": {
   "table_name": "Country",
   "access_type": "const",
   "possible_keys": [
   "PRIMARY"
   ],
   "key": "PRIMARY",
   "used_key_parts": [
   "Code"
   ],
   "key_length": "3",
   "ref": [
   "const"
   ],
   "rows_examined_per_scan": 1,
   "rows_produced_per_join": 1,
   "filtered": "100.00",
   "cost_info": {
   "read_cost": "0.00",
   "eval_cost": "1.00",
   "prefix_cost": "0.00",
   "data_read_per_join": "264"
   },
   "used_columns": [
   "Code",
   "Name",
   "Continent",
   "Region",
   "SurfaceArea",
   "IndepYear",
   "Population",
   "LifeExpectancy",
   "GNP",
   "GNPOld",
   "LocalName",
   "GovernmentForm",
   "HeadOfState",
   "Capital",
   "Code2"
   ]
},
"optimized_away_subqueries": [
   {
   "dependent": false,
   "cacheable": true,
   "query_block": {
   "select_id": 2,
   "cost_info": {
+     "query_cost": "2.00"        # 查询成本只有2.00了
   },
   "table": {
+     "table_name": "City",
+     "access_type": "ref",
      "possible_keys": [
         "n"
      ],
      "key": "n",
      "used_key_parts": [
         "Name"
      ],
      "key_length": "35",
      "ref": [
         "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "cost_info": {
         "read_cost": "1.00",
         "eval_cost": "1.00",
         "prefix_cost": "2.00",
         "data_read_per_join": "72"
      },
      "used_columns": [
         "Name",
         "CountryCode"
      ]
   }
   }
   }
]
}
}

IN子查询(Unique)

示例15显示了一个子查询,其中返回的是主键,所以可以确定每一个结果都是唯一的。因此,可以安全地将该子查询转换为INNER JOIN查询并返回相同的结果。通过在EXPLAIN之后执行SHOW WARNINGS,我们可以看到情况确实如此:

1
2
/* select#1 */ select `world`.`City`.`ID` AS `ID`,`world`.`City`.`Name` AS `Name`,`world`.`City`.`CountryCode` AS `CountryCode`,`world`.`City`.`District` AS `District`,`world`.`City`.`Population` AS `Population` from `world`.`Country` join `world`.`City` where ((`world`.`City`.`CountryCode` = `world`.`Country`.`Code`) and (`world`.`Country`.`Continent` = 'Asia'))
1 row in set (0.00 sec)

这个子查询的效率比较高。我们可以看到它首先访问Country表(只使用了覆盖索引),然后对于匹配的每一行,使用索引c (CountryCode)在City表中查找一组行。

示例15:一个可转换的IN子查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode IN (SELECT Code FROM Country WHERE Continent = 'Asia');
{
 "query_block": {
   "select_id": 1,
   "cost_info": {
   "query_cost": "1893.30"
   },
   "nested_loop": [
   {
      "table": {
      "table_name": "Country",
      "access_type": "ref",
      "possible_keys": [
         "PRIMARY",
         "c"
      ],
      "key": "c",
      "used_key_parts": [
         "Continent"
      ],
      "key_length": "1",
      "ref": [
         "const"
      ],
      "rows_examined_per_scan": 51,
      "rows_produced_per_join": 51,
      "filtered": "100.00",
+     "using_index": true,
      "cost_info": {
         "read_cost": "1.02",
         "eval_cost": "51.00",
         "prefix_cost": "52.02",
         "data_read_per_join": "13K"
      },
      "used_columns": [
         "Code",
         "Continent"
      ]
      }
   },
   {
      "table": {
      "table_name": "City",
      "access_type": "ref",
      "possible_keys": [
         "CountryCode"
      ],
      "key": "CountryCode",
      "used_key_parts": [
         "CountryCode"
      ],
      "key_length": "3",
      "ref": [
         "world.Country.Code"
      ],
      "rows_examined_per_scan": 18,
      "rows_produced_per_join": 920,
      "filtered": "100.00",
      "cost_info": {
         "read_cost": "920.64",
         "eval_cost": "920.64",
         "prefix_cost": "1893.30",
         "data_read_per_join": "64K"
      },
      "used_columns": [
         "ID",
         "Name",
         "CountryCode",
         "District",
         "Population"
      ]
      }
   }
   ]
 }
}

IN子查询(非Unique)

在示例15中,子查询被重写为INNER JOIN,因为它返回了一组值都是唯一的数据。当子查询不是唯一的时候,MySQL优化器必须选择不同的策略。

在示例16中,子查询试图找到拥有至少一种官方语言(CountryCode)的国家。因为有些国家有不止一种官方语言,所以子查询的结果不是唯一的。

OPTIMIZER_TRACE(示例17)的输出表明,优化器识别出查询不能直接重写为JOIN语句,而需要一个“半连接”。优化器有几种策略来执行半连接(FirstMatch, MaterializeLookup, duplatesweedout)。在本例中,它认为materializing(创建缓冲区来存储临时结果)是执行该查询的成本最低的方法。

示例16:不能重写到INNER JOIN的子查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE Code IN (SELECT CountryCode FROM CountryLanguage WHERE isOfficial=1);
{
 "query_block": {
   "select_id": 1,
   "cost_info": {
     "query_cost": "407.80"
   },
   "nested_loop": [
     {
       "table": {
         "table_name": "Country",
         "access_type": "ALL",
         "possible_keys": [
           "PRIMARY"
         ],
         "rows_examined_per_scan": 239,
         "rows_produced_per_join": 239,
         "filtered": "100.00",
         "cost_info": {
           "read_cost": "9.00",
           "eval_cost": "47.80",
           "prefix_cost": "56.80",
           "data_read_per_join": "61K"
         },
         "used_columns": [
           "Code",
           "Name",
           "Continent",
           "Region",
           "SurfaceArea",
           "IndepYear",
           "Population",
           "LifeExpectancy",
           "GNP",
           "GNPOld",
           "LocalName",
           "GovernmentForm",
           "HeadOfState",
           "Capital",
           "Code2"
         ],
         "attached_condition": "(`world`.`Country`.`Code` is not null)"
       }
     },
     {
       "table": {
+        "table_name": "<subquery2>",
+        "access_type": "eq_ref",
+        "key": "<auto_key>",
+        "key_length": "3",
         "ref": [
           "world.Country.Code"
         ],
         "rows_examined_per_scan": 1,
+        "materialized_from_subquery": {
+          "using_temporary_table": true,
           "query_block": {
             "table": {
               "table_name": "CountryLanguage",
               "access_type": "ALL",
               "possible_keys": [
                 "PRIMARY",
                 "CountryCode"
               ],
               "rows_examined_per_scan": 984,
               "rows_produced_per_join": 492,
               "filtered": "50.00",
               "cost_info": {
                 "read_cost": "104.40",
                 "eval_cost": "98.40",
                 "prefix_cost": "202.80",
                 "data_read_per_join": "19K"
               },
               "used_columns": [
                 "CountryCode",
                 "IsOfficial"
               ],
               "attached_condition": "(`world`.`CountryLanguage`.`IsOfficial` = 1)"
             }
           }
         }
       }
     }
   ]
 }
}

示例17:分析子查询的半连接策略

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
SET OPTIMIZER_TRACE="enabled=on";
SET optimizer_trace_max_mem_size = 1024 * 1024;
SELECT * FROM Country WHERE Code IN (SELECT CountryCode FROM CountryLanguage WHERE isOfficial=1);
SELECT * FROM information_schema.optimizer_trace;
{
 "steps": [
   {
     "join_preparation": {
       "select#": 1,
       "steps": [
         {
           "join_preparation": {
             "select#": 2,
             "steps": [
               {
                 "expanded_query": "/* select#2 */ select `CountryLanguage`.`CountryCode` from `CountryLanguage` where (`CountryLanguage`.`IsOfficial` = 1)"
               },
               {
                 "transformation": {
                   "select#": 2,
                   "from": "IN (SELECT)",
                   "to": "semijoin",
                   "chosen": true
                 }
               }
             ]
           }
         },
         {
           "expanded_query": "/* select#1 */ select `Country`.`Code` AS `Code`,`Country`.`Name` AS `Name`,`Country`.`Continent` AS `Continent`,`Country`.`Region` AS `Region`,`Country`.`SurfaceArea` AS `SurfaceArea`,`Country`.`IndepYear` AS `IndepYear`,`Country`.`Population` AS `Population`,`Country`.`LifeExpectancy` AS `LifeExpectancy`,`Country`.`GNP` AS `GNP`,`Country`.`GNPOld` AS `GNPOld`,`Country`.`LocalName` AS `LocalName`,`Country`.`GovernmentForm` AS `GovernmentForm`,`Country`.`HeadOfState` AS `HeadOfState`,`Country`.`Capital` AS `Capital`,`Country`.`Code2` AS `Code2` from `Country` where `Country`.`Code` in (/* select#2 */ select `CountryLanguage`.`CountryCode` from `CountryLanguage` where (`CountryLanguage`.`IsOfficial` = 1))"
         },
         {
           "transformation": {
             "select#": 2,
             "from": "IN (SELECT)",
             "to": "semijoin",
             "chosen": true,
             "evaluating_constant_semijoin_conditions": [
             ]
           }
         },
         {
           "transformations_to_nested_joins": {
             "transformations": [
               "semijoin"
             ],
             "expanded_query": "/* select#1 */ select `Country`.`Code` AS `Code`,`Country`.`Name` AS `Name`,`Country`.`Continent` AS `Continent`,`Country`.`Region` AS `Region`,`Country`.`SurfaceArea` AS `SurfaceArea`,`Country`.`IndepYear` AS `IndepYear`,`Country`.`Population` AS `Population`,`Country`.`LifeExpectancy` AS `LifeExpectancy`,`Country`.`GNP` AS `GNP`,`Country`.`GNPOld` AS `GNPOld`,`Country`.`LocalName` AS `LocalName`,`Country`.`GovernmentForm` AS `GovernmentForm`,`Country`.`HeadOfState` AS `HeadOfState`,`Country`.`Capital` AS `Capital`,`Country`.`Code2` AS `Code2` from `Country` semi join (`CountryLanguage`) where (1 and (`CountryLanguage`.`IsOfficial` = 1) and (`Country`.`Code` = `CountryLanguage`.`CountryCode`))"
           }
         }
       ]
     }
   },
   {
     "join_optimization": {
       "select#": 1,
       "steps": [
         {
           "condition_processing": {
             "condition": "WHERE",
             "original_condition": "(1 and (`CountryLanguage`.`IsOfficial` = 1) and (`Country`.`Code` = `CountryLanguage`.`CountryCode`))",
             "steps": [
               {
                 "transformation": "equality_propagation",
                 "resulting_condition": "(1 and (`CountryLanguage`.`IsOfficial` = 1) and multiple equal(`Country`.`Code`, `CountryLanguage`.`CountryCode`))"
               },
               {
                 "transformation": "constant_propagation",
                 "resulting_condition": "(1 and (`CountryLanguage`.`IsOfficial` = 1) and multiple equal(`Country`.`Code`, `CountryLanguage`.`CountryCode`))"
               },
               {
                 "transformation": "trivial_condition_removal",
                 "resulting_condition": "((`CountryLanguage`.`IsOfficial` = 1) and multiple equal(`Country`.`Code`, `CountryLanguage`.`CountryCode`))"
               }
             ]
           }
         },
         {
           "substitute_generated_columns": {
           }
         },
         {
           "table_dependencies": [
             {
               "table": "`Country`",
               "row_may_be_null": false,
               "map_bit": 0,
               "depends_on_map_bits": [
               ]
             },
             {
               "table": "`CountryLanguage`",
               "row_may_be_null": false,
               "map_bit": 1,
               "depends_on_map_bits": [
               ]
             }
           ]
         },
         {
           "ref_optimizer_key_uses": [
             {
               "table": "`Country`",
               "field": "Code",
               "equals": "`CountryLanguage`.`CountryCode`",
               "null_rejecting": false
             },
             {
               "table": "`CountryLanguage`",
               "field": "CountryCode",
               "equals": "`Country`.`Code`",
               "null_rejecting": false
             },
             {
               "table": "`CountryLanguage`",
               "field": "CountryCode",
               "equals": "`Country`.`Code`",
               "null_rejecting": false
             }
           ]
         },
         {
           "pulled_out_semijoin_tables": [
           ]
         },
         {
           "rows_estimation": [
             {
               "table": "`Country`",
               "table_scan": {
                 "rows": 239,
                 "cost": 9
               }
             },
             {
               "table": "`CountryLanguage`",
               "table_scan": {
                 "rows": 984,
                 "cost": 6
               }
             }
           ]
         },
         {
           "execution_plan_for_potential_materialization": {
             "steps": [
               {
                 "considered_execution_plans": [
                   {
                     "plan_prefix": [
                     ],
                     "table": "`CountryLanguage`",
                     "best_access_path": {
                       "considered_access_paths": [
                         {
                           "access_type": "ref",
                           "index": "PRIMARY",
                           "usable": false,
                           "chosen": false
                         },
                         {
                           "access_type": "ref",
                           "index": "CountryCode",
                           "usable": false,
                           "chosen": false
                         },
                         {
                           "rows_to_scan": 984,
                           "access_type": "scan",
                           "resulting_rows": 492,
                           "cost": 202.8,
                           "chosen": true
                         }
                       ]
                     },
                     "condition_filtering_pct": 100,
                     "rows_for_plan": 492,
                     "cost_for_plan": 202.8,
                     "chosen": true
                   }
                 ]
               }
             ]
           }
         },
         {
           "considered_execution_plans": [
             {
               "plan_prefix": [
               ],
               "table": "`Country`",
               "best_access_path": {
                 "considered_access_paths": [
                   {
                     "access_type": "ref",
                     "index": "PRIMARY",
                     "usable": false,
                     "chosen": false
                   },
                   {
                     "rows_to_scan": 239,
                     "access_type": "scan",
                     "resulting_rows": 239,
                     "cost": 56.8,
                     "chosen": true
                   }
                 ]
               },
               "condition_filtering_pct": 100,
               "rows_for_plan": 239,
               "cost_for_plan": 56.8,
               "semijoin_strategy_choice": [
               ],
               "rest_of_plan": [
                 {
                   "plan_prefix": [
                     "`Country`"
                   ],
                   "table": "`CountryLanguage`",
                   "best_access_path": {
                     "considered_access_paths": [
                       {
                         "access_type": "ref",
                         "index": "PRIMARY",
                         "rows": 4.2232,
                         "cost": 442.83,
                         "chosen": true
                       },
                       {
                         "access_type": "ref",
                         "index": "CountryCode",
                         "rows": 4.2232,
                         "cost": 1211.2,
                         "chosen": false
                       },
                       {
                         "rows_to_scan": 984,
                         "access_type": "scan",
                         "using_join_cache": true,
                         "buffers_needed": 1,
                         "resulting_rows": 492,
                         "cost": 23647,
                         "chosen": false
                       }
                     ]
                   },
                   "condition_filtering_pct": 50,
                   "rows_for_plan": 504.67,
                   "cost_for_plan": 499.63,
+                  "semijoin_strategy_choice": [
+                    {
+                      "strategy": "FirstMatch",
+                      "recalculate_access_paths_and_cost": {
+                        "tables": [
+                        ]
+                      },
+                      "cost": 499.63,
+                      "rows": 239,
+                      "chosen": true
+                    },
+                    {
+                      "strategy": "MaterializeLookup",
+                      "cost": 407.8,                          # Materialize成本最低
+                      "rows": 239,
+                      "duplicate_tables_left": false,
+                      "chosen": true
+                    },
+                    {
+                      "strategy": "DuplicatesWeedout",
+                      "cost": 650.36,
+                      "rows": 239,
+                      "duplicate_tables_left": false,
+                      "chosen": false
+                    }
+                  ],
+                  "chosen": true
+                }
               ]
             },
             {
               "plan_prefix": [
               ],
               "table": "`CountryLanguage`",
               "best_access_path": {
                 "considered_access_paths": [
                   {
                     "access_type": "ref",
                     "index": "PRIMARY",
                     "usable": false,
                     "chosen": false
                   },
                   {
                     "access_type": "ref",
                     "index": "CountryCode",
                     "usable": false,
                     "chosen": false
                   },
                   {
                     "rows_to_scan": 984,
                     "access_type": "scan",
                     "resulting_rows": 492,
                     "cost": 202.8,
                     "chosen": true
                   }
                 ]
               },
               "condition_filtering_pct": 100,
               "rows_for_plan": 492,
               "cost_for_plan": 202.8,
               "semijoin_strategy_choice": [
                 {
                   "strategy": "MaterializeScan",
                   "choice": "deferred"
                 }
               ],
               "rest_of_plan": [
                 {
                   "plan_prefix": [
                     "`CountryLanguage`"
                   ],
                   "table": "`Country`",
                   "best_access_path": {
                     "considered_access_paths": [
                       {
                         "access_type": "ref",
                         "index": "PRIMARY",
                         "rows": 1,
                         "cost": 590.4,
                         "chosen": true
                       },
                       {
                         "rows_to_scan": 239,
                         "access_type": "scan",
                         "using_join_cache": true,
                         "buffers_needed": 1,
                         "resulting_rows": 239,
                         "cost": 23527,
                         "chosen": false
                       }
                     ]
                   },
                   "condition_filtering_pct": 100,
                   "rows_for_plan": 492,
                   "cost_for_plan": 793.2,
                   "semijoin_strategy_choice": [
                     {
                       "strategy": "LooseScan",
                       "recalculate_access_paths_and_cost": {
                         "tables": [
                           {
                             "table": "`CountryLanguage`",
                             "best_access_path": {
                               "considered_access_paths": [
                                 {
                                   "access_type": "ref",
                                   "index": "PRIMARY",
                                   "usable": false,
                                   "chosen": false
                                 },
                                 {
                                   "access_type": "ref",
                                   "index": "CountryCode",
                                   "usable": false,
                                   "chosen": false
                                 },
                                 {
                                   "rows_to_scan": 984,
                                   "access_type": "scan",
                                   "resulting_rows": 492,
                                   "cost": 202.8,
                                   "chosen": true
                                 }
                               ]
                             },
                             "unknown_key_1": {
                               "searching_loose_scan_index": {
                                 "indexes": [
                                   {
                                     "index": "PRIMARY",
                                     "ref_possible": false,
                                     "covering_scan_possible": false
                                   },
                                   {
                                     "index": "CountryCode",
                                     "ref_possible": false,
                                     "covering_scan_possible": false
                                   }
                                 ]
                               }
                             }
                           }
                         ]
                       },
                       "chosen": false
                     },
                     {
                       "strategy": "MaterializeScan",
                       "recalculate_access_paths_and_cost": {
                         "tables": [
                           {
                             "table": "`Country`",
                             "best_access_path": {
                               "considered_access_paths": [
                                 {
                                   "access_type": "ref",
                                   "index": "PRIMARY",
                                   "rows": 1,
                                   "cost": 590.4,
                                   "chosen": true
                                 },
                                 {
                                   "rows_to_scan": 239,
                                   "access_type": "scan",
                                   "using_join_cache": true,
                                   "buffers_needed": 1,
                                   "resulting_rows": 239,
                                   "cost": 23527,
                                   "chosen": false
                                 }
                               ]
                             }
                           }
                         ]
                       },
                       "cost": 992,
                       "rows": 1,
                       "duplicate_tables_left": true,
                       "chosen": true
                     },
                     {
                       "strategy": "DuplicatesWeedout",
                       "cost": 941.4,
                       "rows": 239,
                       "duplicate_tables_left": false,
                       "chosen": true
                     }
                   ],
                   "pruned_by_cost": true
                 }
               ]
             },
+            {
+              "final_semijoin_strategy": "MaterializeLookup"       # Materialize最终被选中
+            }
           ]
         },
         {
           "creating_tmp_table": {
             "tmp_table_info": {
               "row_length": 4,
               "key_length": 3,
               "unique_constraint": false,
               "location": "memory (heap)",
               "row_limit_estimate": 4194304
             }
           }
         },
         {
           "attaching_conditions_to_tables": {
             "original_condition": "((`<subquery2>`.`CountryCode` = `Country`.`Code`) and (`CountryLanguage`.`IsOfficial` = 1))",
             "attached_conditions_computation": [
               {
                 "table": "`CountryLanguage`",
                 "rechecking_index_usage": {
                   "recheck_reason": "not_first_table",
                   "range_analysis": {
                     "table_scan": {
                       "rows": 984,
                       "cost": 204.9
                     },
                     "potential_range_indexes": [
                       {
                         "index": "PRIMARY",
                         "usable": true,
                         "key_parts": [
                           "CountryCode",
                           "Language"
                         ]
                       },
                       {
                         "index": "CountryCode",
                         "usable": true,
                         "key_parts": [
                           "CountryCode",
                           "Language"
                         ]
                       }
                     ],
                     "setup_range_conditions": [
                     ],
                     "group_index_range": {
                       "chosen": false,
                       "cause": "not_single_table"
                     }
                   }
                 }
               }
             ],
             "attached_conditions_summary": [
               {
                 "table": "`Country`",
                 "attached": "(`Country`.`Code` is not null)"
               },
               {
                 "table": "``.`<subquery2>`",
                 "attached": null
               },
               {
                 "table": "`CountryLanguage`",
                 "attached": "(`CountryLanguage`.`IsOfficial` = 1)"
               }
             ]
           }
         },
         {
           "refine_plan": [
             {
               "table": "`Country`"
             },
             {
               "table": "``.`<subquery2>`"
             },
             {
               "table": "`CountryLanguage`"
             }
           ]
         }
       ]
     }
   },
   {
     "join_execution": {
       "select#": 1,
       "steps": [
       ]
     }
   }
 ]
}

NOT IN的查询

NOT IN子查询既可以使用materialization策略,也可以使用已存在的策略。为了描述两者之间的区别,考虑以下两个例子:

  1. SELECT * FROM City WHERE CountryCode NOT IN (SELECT code FROM Country);
  2. SELECT * FROM City WHERE CountryCode NOT IN (SELECT code FROM Country WHERE continent IN (‘Asia’, ‘Europe’, ‘North America’));

在第一个查询中,内部子查询或多或少处于其最佳形式。code列是表的主键,可以通过索引扫描检索一组唯一的值。唯一的缺点(如果有的话)是索引扫描主键可能比索引扫描二级键慢,因为主键距离行值很远。EXPLAIN确认优化器确实为这个查询选择了一个materialization策略,但是这个策略可以被一个提示覆盖:

1
SELECT * FROM City WHERE CountryCode NOT IN (SELECT /*+ SUBQUERY(INTOEXISTS) */ code FROM Country);

在第二个查询中,有一个额外的查询条件:continent In ('Asia', 'Europe', 'North America'))。由于City表中的每一行都需要与not in中的查询进行比较,因此实现和缓存匹配的行是有意义的。这样就不必重复为City表中的每一行检查查询条件的过程。

示例18:使用物化的NOT IN子查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode NOT IN (SELECT code FROM Country WHERE continent IN ('Asia', 'Europe', 'North America'));
{
 "query_block": {
   "select_id": 1,
   "cost_info": {
     "query_cost": "862.60"
   },
   "table": {
     "table_name": "City",
     "access_type": "ALL",
     "rows_examined_per_scan": 4188,
     "rows_produced_per_join": 4188,
     "filtered": "100.00",
     "cost_info": {
       "read_cost": "25.00",
       "eval_cost": "837.60",
       "prefix_cost": "862.60",
       "data_read_per_join": "294K"
     },
     "used_columns": [
       "ID",
       "Name",
       "CountryCode",
       "District",
       "Population"
     ],
     "attached_condition": "(not(<in_optimizer>(`world`.`City`.`CountryCode`,`world`.`City`.`CountryCode` in ( <materialize> (/* select#2 */ select `world`.`Country`.`Code` from `world`.`Country` where (`world`.`Country`.`Continent` in ('Asia','Europe','North America')) ), <primary_index_lookup>(`world`.`City`.`CountryCode` in <temporary table> on <auto_key> where ((`world`.`City`.`CountryCode` = `materialized-subquery`.`code`)))))))",
     "attached_subqueries": [
       {
         "table": {
+          "table_name": "<materialized_subquery>",
+          "access_type": "eq_ref",
+          "key": "<auto_key>",
+          "key_length": "3",
           "rows_examined_per_scan": 1,
           "materialized_from_subquery": {
             "using_temporary_table": true,
             "dependent": true,
             "cacheable": false,
             "query_block": {
               "select_id": 2,
               "cost_info": {
                 "query_cost": "54.67"
               },
               "table": {
                 "table_name": "Country",
                 "access_type": "range",
                 "possible_keys": [
                   "PRIMARY",
                   "c",
                   "c_p"
                 ],
                 "key": "c",
                 "used_key_parts": [
                   "Continent"
                 ],
                 "key_length": "1",
                 "rows_examined_per_scan": 134,
                 "rows_produced_per_join": 134,
                 "filtered": "100.00",
                 "using_index": true,
                 "cost_info": {
                   "read_cost": "27.87",
                   "eval_cost": "26.80",
                   "prefix_cost": "54.67",
                   "data_read_per_join": "34K"
                 },
                 "used_columns": [
                   "Code",
                   "Continent"
                 ],
                 "attached_condition": "(`world`.`Country`.`Continent` in ('Asia','Europe','North America'))"
               }
             }
           }
         }
       }
     ]
   }
 }
}

派生表

from子句中的子查询不需要materialized。MySQL通常可以将它“合并”回去,就像视图将它的定义与你将在查询中使用的查询条件合并一样。示例19显示了将派生查询合并到外部查询中的情况。

示例19:被“合并”回的派生表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
EXPLAIN FORMAT=JSON
SELECT * FROM Country, (SELECT * FROM City WHERE CountryCode ='CAN' ) as CityTmp WHERE Country.code=CityTmp.CountryCode
AND CityTmp.name ='Toronto';
{
 "query_block": {
   "select_id": 1,
   "cost_info": {
   "query_cost": "2.00"
   },
   "nested_loop": [
   {
      "table": {
      "table_name": "Country",
      "access_type": "const",
      "possible_keys": [
         "PRIMARY"
      ],
      "key": "PRIMARY",
      "used_key_parts": [
         "Code"
      ],
      "key_length": "3",
      "ref": [
         "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "cost_info": {
         "read_cost": "0.00",
         "eval_cost": "1.00",
         "prefix_cost": "0.00",
         "data_read_per_join": "264"
      },
      "used_columns": [
         "Code",
         "Name",
         "Continent",
         "Region",
         "SurfaceArea",
         "IndepYear",
         "Population",
         "LifeExpectancy",
         "GNP",
         "GNPOld",
         "LocalName",
         "GovernmentForm",
         "HeadOfState",
         "Capital",
         "Code2"
      ]
      }
   },
   {
+     "table": {
+     "table_name": "City",
+     "access_type": "ref",
+     "possible_keys": [
+        "CountryCode",
+        "n"
+     ],
+     "key": "n",
      "used_key_parts": [
         "Name"
      ],
      "key_length": "35",
      "ref": [
         "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 0,
      "filtered": "5.00",
      "cost_info": {
         "read_cost": "1.00",
         "eval_cost": "0.05",
         "prefix_cost": "2.00",
         "data_read_per_join": "3"
      },
      "used_columns": [
         "ID",
         "Name",
         "CountryCode",
         "District",
         "Population"
      ],
      "attached_condition": "(`world`.`City`.`CountryCode` = 'CAN')"
      }
   }
   ]
 }
}

这种“合并”的潜在缺点是,它使一些查询不再合法。如果升级时收到警告,可以选择禁用derived_merge优化。这将导致查询成本的增加,因为materialization可能很昂贵。

示例20:禁用合并和触发物化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
SET optimizer_switch='derived_merge=off';
EXPLAIN FORMAT=JSON
SELECT * FROM Country, (SELECT * FROM City WHERE CountryCode ='CAN' ) as CityTmp WHERE Country.code=CityTmp.CountryCode
AND CityTmp.name ='Toronto';
{
  "query_block": {
   "select_id": 1,
   "cost_info": {
   "query_cost": "19.90"
   },
   "nested_loop": [
   {
      "table": {
         "table_name": "CityTmp",
         "access_type": "ref",
         "possible_keys": [
         "<auto_key0>"
         ],
         "key": "<auto_key0>",
         "used_key_parts": [
         "Name"
         ],
         "key_length": "35",
         "ref": [
         "const"
         ],
         "rows_examined_per_scan": 5,
         "rows_produced_per_join": 5,
         "filtered": "100.00",
         "cost_info": {
         "read_cost": "4.90",
         "eval_cost": "5.00",
         "prefix_cost": "9.90",
         "data_read_per_join": "360"
         },
         "used_columns": [
         "ID",
         "Name",
         "CountryCode",
         "District",
         "Population"
         ],
+        "materialized_from_subquery": {
+        "using_temporary_table": true,
+        "dependent": false,
+        "cacheable": true,
         "query_block": {
            "select_id": 2,
            "cost_info": {
               "query_cost": "98.00"
            },
            "table": {
               "table_name": "City",
               "access_type": "ref",
               "possible_keys": [
               "CountryCode"
               ],
               "key": "CountryCode",
               "used_key_parts": [
               "CountryCode"
               ],
               "key_length": "3",
               "ref": [
               "const"
               ],
               "rows_examined_per_scan": 49,
               "rows_produced_per_join": 49,
               "filtered": "100.00",
               "cost_info": {
               "read_cost": "49.00",
               "eval_cost": "49.00",
               "prefix_cost": "98.00",
               "data_read_per_join": "3K"
               },
               "used_columns": [
               "ID",
               "Name",
               "CountryCode",
               "District",
               "Population"
               ]
            }
         }
         }
      }
   },
   {
      "table": {
         "table_name": "Country",
         "access_type": "eq_ref",
         "possible_keys": [
         "PRIMARY"
         ],
         "key": "PRIMARY",
         "used_key_parts": [
         "Code"
         ],
         "key_length": "3",
         "ref": [
         "CityTmp.CountryCode"
         ],
         "rows_examined_per_scan": 1,
         "rows_produced_per_join": 5,
         "filtered": "100.00",
         "cost_info": {
         "read_cost": "5.00",
         "eval_cost": "5.00",
         "prefix_cost": "19.90",
         "data_read_per_join": "1K"
         },
         "used_columns": [
         "Code",
         "Name",
         "Continent",
         "Region",
         "SurfaceArea",
         "IndepYear",
         "Population",
         "LifeExpectancy",
         "GNP",
         "GNPOld",
         "LocalName",
         "GovernmentForm",
         "HeadOfState",
         "Capital",
         "Code2"
         ]
      }
   }
   ]
  }
}

CTE和VIEW

VIEW是一种保存查询以供以后重用的方法,它们对应用程序来说就像一个表一样,这允许将复杂的SQL查询分解并分阶段简化。CTE本身与VIEW非常相似,不同之处在于它们的生命周期较短,只与单个语句绑定。

MySQL优化器有两种主要的策略来执行CTE和VIEW:

  1. 合并(Merge)。转换查询,以便CTE或VIEW的定义与查询的其余部分合并。通过在运行EXPLAIN执行SHOW WARNINGS可以看到合并的结果。
  2. 物化(Materialize)。执行CTE或VIEW,并将结果存储在临时表中,查询的其余部分将对临时表执行。物化选项通常是较慢的方法,并且在合并选项不合适的时候被选择。也有例外,尽早物化可以缩减工作并导致更快的执行。

示例21:对合并视图的查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
CREATE VIEW vCountry_Asia AS SELECT * FROM Country WHERE Continent='Asia';
EXPLAIN FORMAT=JSON
SELECT * FROM vCountry_Asia WHERE Name='China';
{
 "query_block": {
   "select_id": 1,
   "cost_info": {
     "query_cost": "1.20"
   },
   "table": {
+    "table_name": "Country",  # 表是原始表名,而不是视图名
     "access_type": "ref",
     "possible_keys": [
       "c",
       "c_p",
       "Name"
     ],
     "key": "Name",
     "used_key_parts": [
       "Name",
       "Continent"
     ],
     "key_length": "53",
     "ref": [
       "const",
       "const"
     ],
     "rows_examined_per_scan": 1,
     "rows_produced_per_join": 1,
     "filtered": "100.00",
     "cost_info": {
       "read_cost": "1.00",
       "eval_cost": "0.20",
       "prefix_cost": "1.20",
       "data_read_per_join": "264"
     },
     "used_columns": [
       "Code",
       "Name",
       "Continent",
       "Region",
       "SurfaceArea",
       "IndepYear",
       "Population",
       "LifeExpectancy",
       "GNP",
       "GNPOld",
       "LocalName",
       "GovernmentForm",
       "HeadOfState",
       "Capital",
       "Code2"
     ]
   }
 }
}

SHOW WARNINGS;
/* select#1 */ select `world`.`Country`.`Code` AS `Code`,`world`.`Country`.`Name` AS `Name`,`world`.`Country`.`Continent` AS `Continent`,`world`.`Country`.`Region` AS `Region`,`world`.`Country`.`SurfaceArea` AS `SurfaceArea`,`world`.`Country`.`IndepYear` AS `IndepYear`,`world`.`Country`.`Population` AS `Population`,`world`.`Country`.`LifeExpectancy` AS `LifeExpectancy`,`world`.`Country`.`GNP` AS `GNP`,`world`.`Country`.`GNPOld` AS `GNPOld`,`world`.`Country`.`LocalName` AS `LocalName`,`world`.`Country`.`GovernmentForm` AS `GovernmentForm`,`world`.`Country`.`HeadOfState` AS `HeadOfState`,`world`.`Country`.`Capital` AS `Capital`,`world`.`Country`.`Code2` AS `Code2` from `world`.`Country` where ((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Name` = 'China'))

示例22:视图上的查询不能合并

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
CREATE VIEW vCountrys_Per_Continent AS SELECT Continent, COUNT(*) as Count FROM Country
GROUP BY Continent;
EXPLAIN FORMAT=JSON
SELECT * FROM vCountrys_Per_Continent WHERE Continent='Asia';
{
 "query_block": {
   "select_id": 1,
   "cost_info": {
     "query_cost": "12.47"
   },
   "table": {
+    "table_name": "vCountrys_Per_Continent",  # 是视图名
     "access_type": "ref",
     "possible_keys": [
       "<auto_key0>"
     ],
     "key": "<auto_key0>",
     "used_key_parts": [
       "Continent"
     ],
     "key_length": "1",
     "ref": [
       "const"
     ],
     "rows_examined_per_scan": 10,
     "rows_produced_per_join": 10,
     "filtered": "100.00",
     "cost_info": {
       "read_cost": "10.39",
       "eval_cost": "2.08",
       "prefix_cost": "12.47",
       "data_read_per_join": "166"
     },
     "used_columns": [
       "Continent",
       "Count"
     ],
     "materialized_from_subquery": {
       "using_temporary_table": true,
       "dependent": false,
       "cacheable": true,
       "query_block": {
         "select_id": 2,
         "cost_info": {
           "query_cost": "56.80"
         },
         "grouping_operation": {
           "using_filesort": false,
           "table": {
             "table_name": "Country",
             "access_type": "index",
             "possible_keys": [
               "PRIMARY",
               "p",
               "c",
               "p_c",
               "c_p",
               "Name"
             ],
             "key": "c",
             "used_key_parts": [
               "Continent"
             ],
             "key_length": "1",
             "rows_examined_per_scan": 239,
             "rows_produced_per_join": 239,
             "filtered": "100.00",
             "using_index": true,
             "cost_info": {
               "read_cost": "9.00",
               "eval_cost": "47.80",
               "prefix_cost": "56.80",
               "data_read_per_join": "61K"
             },
             "used_columns": [
               "Code",
               "Continent"
             ]
           }
         }
       }
     }
   }
 }
}

SHOW WARNINGS;
/* select#1 */ select `vCountrys_Per_Continent`.`Continent` AS `Continent`,`vCountrys_Per_Continent`.`Count` AS `Count` from `world`.`vCountrys_Per_Continent` where (`vCountrys_Per_Continent`.`Continent` = 'Asia')

连接(JOIN)

MySQL使用嵌套循环连接算法执行连接。它不支持其他一些数据库中可能提供的哈希或排序合并连接(sort merge joins),这使得它不太适合某些分析/数据仓库风格的查询。不过,MySQL优化器确实有缓冲策略,可以减少嵌套循环连接的最坏情况。

嵌套循环连接

示例23显示了CountryCityCountryLanguage表之间的3表JOIN。执行该查询的完整步骤如下:

  1. 优化器必须首先决定一个驱动表(Country),以及用以连接其余表的索引(City, CountryLanguage)。
  2. 执行从遍历第一个表开始(Country)一次一行。对于匹配过滤条件的每一行(Country.Continent=’Asia’),则会查询下一个表(City)。
  3. 对于City表中匹配的每一行,将在最终的表(CountryLanguage)中进行查找。CountryLanguage表将应用一个额外的过滤器:检查IsOfficial = 1

当工作可以在连接开始之前消除时,嵌套循环算法工作得最好。这意味着“最好的情况”是驱动表有许多高度选择性的查询条件。

最坏的情况是,查询条件分散在许多表中,在需要连接所有表之前,索引不能消除足够的工作。在这种情况下,对数据库进行反规范化设计是很常见的。

反规范化很有帮助,因为通过在驱动表上冗余地维护一组列,可以添加复合索引,并且可以在必须访问行或连接到其他表之前应用过滤。

示例23:在三个表上的嵌套循环连接

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
EXPLAIN FORMAT=JSON
SELECT
  Country.Name as Country, City.Name as Capital, Language
FROM
  City
  INNER JOIN Country ON Country.Capital=City.id
  INNER JOIN CountryLanguage ON CountryLanguage.CountryCode=Country.code
WHERE
  Country.Continent='Asia' and CountryLanguage.IsOfficial='T';
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3.42"
    },
    "nested_loop": [
      {
        "table": {
+         "table_name": "Country",
          "access_type": "const",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "Code"
          ],
          "key_length": "3",
          "ref": [
            "const"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 1,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "0.00",
            "eval_cost": "0.20",
            "prefix_cost": "0.00",
            "data_read_per_join": "264"
          },
          "used_columns": [
            "Code",
            "Name",
            "Capital"
          ]
        }
      },
      {
        "table": {
+         "table_name": "City",
          "access_type": "const",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "ID"
          ],
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 1,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "0.00",
            "eval_cost": "0.20",
            "prefix_cost": "0.00",
            "data_read_per_join": "72"
          },
          "used_columns": [
            "ID",
            "Name"
          ]
        }
      },
      {
        "table": {
+         "table_name": "CountryLanguage",
          "access_type": "ref",
          "possible_keys": [
            "PRIMARY",
            "CountryCode"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "CountryCode"
          ],
          "key_length": "3",
          "ref": [
            "const"
          ],
          "rows_examined_per_scan": 12,
          "rows_produced_per_join": 6,
          "filtered": "50.00",
          "cost_info": {
            "read_cost": "1.02",
            "eval_cost": "1.20",
            "prefix_cost": "3.42",
            "data_read_per_join": "240"
          },
          "used_columns": [
            "CountryCode",
            "Language",
            "IsOfficial"
          ],
          "attached_condition": "(`world`.`CountryLanguage`.`IsOfficial` = 'T')"
        }
      }
    ]
  }
}

INNER JOIN

INNER JOIN在语义上要求在连接的左边和右边都存在一行。有了这种语义,就意味着MySQL能够以任意一种顺序连接两个表。选择成本最低的顺序是优化器的责任。

LEFT and RIGHT JOIN

LEFT JOIN在语义上意味着连接右侧的一行是可选的(因此,RIGHT JOIN也意味着左侧是可选的)。因为连接的一端是可选的,所以执行被强制从首先存在的一端开始访问。因为强制了顺序,所以优化器不能考虑所有可以应用于INNER JOIN的潜在连接顺序(执行计划)。因此,这导致了简化,LEFT JOIN较慢。

条件展开过滤器

从MySQL 5.7开始,优化器现在考虑除了驱动表之外的条件的过滤效果。这个condition_filtering用于细化后续表的连接顺序。

条件过滤非常适合于直方图(histograms),因为如果没有它,它将使用一种更简单的启发式,这种启发式在数据倾斜或不平衡时可能不准确。它还将利用索引提供的统计信息(索引的成本更高)。

在MySQL 5.7中,大量的查询回归是由于条件扇出过滤器和倾斜的数据集。在MySQL 8.0中,可以考虑将直方图添加到用于连接条件中使用的表的列中。

聚合

GROUP BY

GROUP BY操作需要按照排序顺序读取行,或者需要一个临时表来缓冲中间结果以进行聚合。也就是说,MySQL可以通过以下方式为GROUP BY使用索引:

  1. 宽松的索引扫描。如果GROUP BY条件被索引,MySQL可能会选择从头到尾扫描索引,避免出现中间结果。如果没有非常有选择性的查询条件,并且要创建的临时表非常大,那么这种操作是首选的。

  2. 过滤行。索引可用于标识行,然后将这些行存储在临时表中。然后将结果汇总到临时表中,默认情况下按GROUP BY条件排序。

    此行为已弃用,将来可能会删除。如果不需要排序,则建议显式排序GROUP BY x ORDER BY NULL。

  3. 过滤和排序的组合。此优化适用于过滤行的索引已经具有GROUP BY操作的正确顺序的数据。

示例24:使用松散索引扫描的GROUP BY

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
EXPLAIN FORMAT=JSON
SELECT count(*) as c, continent FROM Country GROUP BY continent;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "56.80"
    },
    "grouping_operation": {
+     "using_filesort": false,
      "table": {
        "table_name": "Country",
+       "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "p",
          "c",
          "p_c",
          "c_p",
          "Name"
        ],
        "key": "c",
        "used_key_parts": [
          "Continent"
        ],
        "key_length": "1",
        "rows_examined_per_scan": 239,
        "rows_produced_per_join": 239,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "9.00",
          "eval_cost": "47.80",
          "prefix_cost": "56.80",
          "data_read_per_join": "61K"
        },
        "used_columns": [
          "Code",
          "Continent"
        ]
      }
    }
  }
}

示例25:使用索引然后排序的GROUP BY

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
EXPLAIN FORMAT=JSON
SELECT count(*) as c, continent FROM Country WHERE population > 500000000 GROUP BY continent;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3.81"
    },
    "grouping_operation": {
+     "using_temporary_table": true,
+     "using_filesort": true,
      "cost_info": {
        "sort_cost": "2.00"
      },
      "table": {
        "table_name": "Country",
+       "access_type": "range",
        "possible_keys": [
          "PRIMARY",
          "p",
          "c",
          "p_c",
          "c_p",
          "Name"
        ],
        "key": "p",
        "used_key_parts": [
          "Population"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 2,
        "rows_produced_per_join": 2,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "1.41",
          "eval_cost": "0.40",
          "prefix_cost": "1.81",
          "data_read_per_join": "528"
        },
        "used_columns": [
          "Code",
          "Continent",
          "Population"
        ],
        "attached_condition": "(`world`.`Country`.`Population` > 500000000)"
      }
    }
  }
}

示例26:使用索引进行过滤和排序的GROUP BY

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
EXPLAIN FORMAT=JSON
SELECT count(*) as c, continent FROM Country WHERE continent='Asia' GROUP BY continent;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "11.23"
    },
    "grouping_operation": {
+     "using_filesort": false,
      "table": {
        "table_name": "Country",
+       "access_type": "ref",
        "possible_keys": [
          "PRIMARY",
          "p",
          "c",
          "p_c",
          "c_p",
          "Name"
        ],
        "key": "c",
        "used_key_parts": [
          "Continent"
        ],
        "key_length": "1",
        "ref": [
          "const"
        ],
        "rows_examined_per_scan": 51,
        "rows_produced_per_join": 51,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "1.03",
          "eval_cost": "10.20",
          "prefix_cost": "11.23",
          "data_read_per_join": "13K"
        },
        "used_columns": [
          "Continent"
        ]
      }
    }
  }
}

UNION

MySQL没有对UNION应用任何特殊优化,它在语义上将两个查询的结果组合在一起并删除重复项。如示例27所示,重复数据删除是在中间临时表中执行的。临时表用于所有UNION查询,因此没有分配成本(或可能基于成本的优化)。

简单的UNION的例子:

1
2
3
SELECT * FROM City WHERE CountryCode = 'CAN'
UNION
SELECT * FROM City WHERE CountryCode = 'USA'

假想的优化:

1
SELECT * FROM City WHERE CountryCode IN ('CAN', 'USA')

与子查询和视图不同(对同一个表的多次访问可以在内部合并为一次访问),MySQL不会对UNION查询执行类似的优化。它也不会发现任何不可能重复的情况,并且可以将UNION重写为UNION ALL。这留下了许多情况,熟练的操作员可以对查询进行手动修改(在应用程序中或通过查询重写)并提高性能。

示例27:一个需要临时表的联合查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode = 'CAN'
UNION
SELECT * FROM City WHERE CountryCode = 'USA'
{
  "query_block": {
    "union_result": {
+     "using_temporary_table": true,  # 临时表
+     "table_name": "<union1,2>",     # 合并两个查询的结果
+     "access_type": "ALL",
      "query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
            "cost_info": {
              "query_cost": "58.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 49,
              "rows_produced_per_join": 49,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "49.00",
                "eval_cost": "9.80",
                "prefix_cost": "58.80",
                "data_read_per_join": "3K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        },
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "129.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 274,
              "rows_produced_per_join": 274,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "75.00",
                "eval_cost": "54.80",
                "prefix_cost": "129.80",
                "data_read_per_join": "19K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        }
      ]
    }
  }
}

UNION ALL

UNION ALL在语义上与UNION相似,但有一个重要的区别:不需要删除重复数据。这意味着在某些情况下,MySQL能够直接传递UNION ALL查询的结果,而不必在中间表中物化和去重复行。

在内部,总是为UNION ALL查询创建一个临时表,但是是否需要使用它来物化行可以在EXPLAIN中看到。示例28显示了一个使用UNION ALL的查询示例。在添加ORDER BY之后,查询现在需要使用中间临时表。

示例28:没有临时表的UNION ALL

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode = 'CAN'
UNION ALL
SELECT * FROM City WHERE CountryCode = 'USA';
{
  "query_block": {
+   "union_result": {
+     "using_temporary_table": false,   # 不需要临时表
      "query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
            "cost_info": {
              "query_cost": "58.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 49,
              "rows_produced_per_join": 49,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "49.00",
                "eval_cost": "9.80",
                "prefix_cost": "58.80",
                "data_read_per_join": "3K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        },
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "129.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 274,
              "rows_produced_per_join": 274,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "75.00",
                "eval_cost": "54.80",
                "prefix_cost": "129.80",
                "data_read_per_join": "19K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        }
      ]
    }
  }
}

示例29:由于ORDER BY的原因,需要一个临时表的UNION ALL

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode = 'CAN'
UNION ALL
SELECT * FROM City WHERE CountryCode = 'USA' ORDER BY Name;
{
  "query_block": {
    "union_result": {
+     "using_temporary_table": true,  # 因为ORDER BY,所以需要一个临时表
      "table_name": "<union1,2>",
      "access_type": "ALL",
      "query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
            "cost_info": {
              "query_cost": "58.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 49,
              "rows_produced_per_join": 49,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "49.00",
                "eval_cost": "9.80",
                "prefix_cost": "58.80",
                "data_read_per_join": "3K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        },
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "129.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 274,
              "rows_produced_per_join": 274,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "75.00",
                "eval_cost": "54.80",
                "prefix_cost": "129.80",
                "data_read_per_join": "19K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        }
      ]
    }
  }
}

排序

MySQL有四种方法以排序的顺序返回结果,这可能需要作为ORDER BYGROUP BY的一部分(没有ORDER BY NULL)。EXPLAIN将显示数据是否需要排序操作,但它不会列出使用的排序算法,该信息可以在OPTIMIZER_TRACE中找到。返回排序结果的四种方法如下:

  1. 通过一个索引。B+树索引是按排序顺序维护的,因此一些ORDER BY查询根本不需要排序。

  2. 通过优先队列。带有少量LIMITORDER BY可以将完整的结果集存储在一个临时缓冲区中。例如,考虑以下查询:
    1
    
     SELECT * FROM Country IGNORE INDEX (p, p_c) ORDER BY population LIMIT 10;
    

    此查询将进行表扫描,并保留具有10个最高人口(population)的行的缓冲区。由于较新的行被识别为具有较高的人口(population),较早的行可以从优先级队列推掉。

  3. 通过备选排序算法。如果没有TEXT或BLOB列,则使用此算法。它在MySQL手册中定义为:
    1. 读取WHERE子句匹配的行。
    2. 对于每一行,记录由排序键值和查询引用的附加字段组成的值元组。
    3. 当排序缓冲区满时,按内存中的排序键值对元组进行排序,并将其写入一个临时文件。
    4. 在对临时文件进行合并排序之后,按排序顺序检索行,直接从排序的元组中读取所需的列。
  4. 通过原始的排序算法。当出现TEXT或BLOB列时,将使用此算法。它在MySQL手册中定义为:
    1. 根据键或通过表扫描读取所有行。跳过不匹配WHERE子句的行。
    2. 对于每一行,在排序缓冲区中存储一个由一对值(排序键值和行ID)组成的元组。
    3. 如果每一对元组都适合排序缓冲区,则不会创建临时文件。否则,当排序缓冲区满时,在内存中对其运行qsort(快速排序),并将其写入一个临时文件。保存一个指向排序块的指针。
    4. 重复上述步骤,直到读取所有行。
    5. 将多个区域合并到另一个临时文件中的一个块中。重复操作,直到第一个文件中的所有块都在第二个文件中。
    6. 重复以下操作,直到剩下的块少于MERGEBUFF2(15)。
    7. 在最后一次多合并中,只有行ID(元组的最后一部分)被写入结果文件。
    8. 使用结果文件中的行ID按排序顺序读取行。要优化这一点,可以读取大块的行ID,对它们进行排序,然后使用它们按顺序将行读取到行缓冲区中。行缓冲区大小为read_rnd_buffer_size系统变量的值。该步骤的代码在sql/records.cc源文件。

示例30:由索引提供的排序

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE continent='Asia' ORDER BY population;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "34.10"
    },
    "ordering_operation": {
+     "using_filesort": false,  # 由索引c_p提供排序,而不是使用文件排序
      "table": {
        "table_name": "Country",
        "access_type": "ref",
        "possible_keys": [
          "c",
          "c_p"
        ],
        "key": "c_p",
        "used_key_parts": [
          "Continent"
        ],
        "key_length": "1",
        "ref": [
          "const"
        ],
        "rows_examined_per_scan": 51,
        "rows_produced_per_join": 51,
        "filtered": "100.00",
        "index_condition": "(`world`.`Country`.`Continent` <=> 'Asia')",
        "cost_info": {
          "read_cost": "23.90",
          "eval_cost": "10.20",
          "prefix_cost": "34.10",
          "data_read_per_join": "13K"
        },
        "used_columns": [
          "Code",
          "Name",
          "Continent",
          "Region",
          "SurfaceArea",
          "IndepYear",
          "Population",
          "LifeExpectancy",
          "GNP",
          "GNPOld",
          "LocalName",
          "GovernmentForm",
          "HeadOfState",
          "Capital",
          "Code2"
        ]
      }
    }
  }
}

示例31:OPTIMIZER_TRACE显示优先级队列的使用

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
SET OPTIMIZER_TRACE="ENABLED=on";
SELECT * FROM Country IGNORE INDEX (p, p_c) ORDER BY population LIMIT 10;
SELECT * FROM information_schema.optimizer_trace\G
{
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `Country`.`Code` AS `Code`,`Country`.`Name` AS `Name`,`Country`.`Continent` AS `Continent`,`Country`.`Region` AS `Region`,`Country`.`SurfaceArea` AS `SurfaceArea`,`Country`.`IndepYear` AS `IndepYear`,`Country`.`Population` AS `Population`,`Country`.`LifeExpectancy` AS `LifeExpectancy`,`Country`.`GNP` AS `GNP`,`Country`.`GNPOld` AS `GNPOld`,`Country`.`LocalName` AS `LocalName`,`Country`.`GovernmentForm` AS `GovernmentForm`,`Country`.`HeadOfState` AS `HeadOfState`,`Country`.`Capital` AS `Capital`,`Country`.`Code2` AS `Code2` from `Country` IGNORE INDEX (`p_c`) IGNORE INDEX (`p`) order by `Country`.`Population` limit 10"
          }
        ]
      }
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "substitute_generated_columns": {
            }
          },
          {
            "table_dependencies": [
              {
                "table": "`Country` IGNORE INDEX (`p_c`) IGNORE INDEX (`p`)",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ]
              }
            ]
          },
          {
            "rows_estimation": [
              {
                "table": "`Country` IGNORE INDEX (`p_c`) IGNORE INDEX (`p`)",
                "table_scan": {
                  "rows": 239,
                  "cost": 9
                }
              }
            ]
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`Country` IGNORE INDEX (`p_c`) IGNORE INDEX (`p`)",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 239,
                      "access_type": "scan",
                      "resulting_rows": 239,
                      "cost": 56.8,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 239,
                "cost_for_plan": 56.8,
                "chosen": true
              }
            ]
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": null,
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
                {
                  "table": "`Country` IGNORE INDEX (`p_c`) IGNORE INDEX (`p`)",
                  "attached": null
                }
              ]
            }
          },
          {
            "clause_processing": {
              "clause": "ORDER BY",
              "original_clause": "`Country`.`Population`",
              "items": [
                {
                  "item": "`Country`.`Population`"
                }
              ],
              "resulting_clause_is_simple": true,
              "resulting_clause": "`Country`.`Population`"
            }
          },
          {
            "reconsidering_access_paths_for_index_ordering": {
              "clause": "ORDER BY",
              "index_order_summary": {
                "table": "`Country` IGNORE INDEX (`p_c`) IGNORE INDEX (`p`)",
                "index_provides_order": false,
                "order_direction": "undefined",
                "index": "unknown",
                "plan_changed": false
              }
            }
          },
          {
            "refine_plan": [
              {
                "table": "`Country` IGNORE INDEX (`p_c`) IGNORE INDEX (`p`)"
              }
            ]
          }
        ]
      }
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "filesort_information": [
              {
                "direction": "asc",
                "table": "`Country` IGNORE INDEX (`p_c`) IGNORE INDEX (`p`)",
                "field": "Population"
              }
            ],
+           "filesort_priority_queue_optimization": {
+             "limit": 10,
+             "rows_estimate": 939,
+             "row_size": 272,
+             "memory_available": 262144,
+             "chosen": true                     # 优先队列优化
+           },
            "filesort_execution": [
            ],
            "filesort_summary": {
              "rows": 11,
              "examined_rows": 239,
              "number_of_tmp_files": 0,
              "sort_buffer_size": 3080,
              "sort_mode": "<sort_key, additional_fields>"
            }
          }
        ]
      }
    }
  ]
}

分区

优化器能够操作分区划分。这意味着它能够分析传入的查询,将其与字典信息进行比较,然后只访问所需的表分区。

你可以将分区视为类似于视图,因为它是一个表的逻辑表示,下面是一组表。在所有查询都遵循一个通用模式的情况下,分区比索引更适合。例如:

  • 软删除。许多应用程序实现了软删除(例如:is_deleted),而通常只访问已删除或未删除的数据。
  • 多版本的模式[1]。一些应用程序具有永不删除数据的策略,为了历史目的保留较老的几代行。例如,你更新了一个用户的地址,但保留了所有以前的地址。根据地址是否过期进行分区可以帮助提高内存。
  • 时间取向。例如,按季度或财务周期划分发票。
  • 位置。例如,按可以store_idregion分区。

[1]在数据仓库中,这被称为缓慢变化的维度https://en.wikipedia.org/wiki/Slowly_changing_dimension

当大多数查询一次只需要访问分区的一个或一个小子集时,分区性能最好。如果考虑我们的示例模式,可能会出现这样的情况:对CountryLanguage表的所有查询都只针对官方支持的语言。如果为true,则按isOfficial进行分区,如下所示:

1
2
3
4
5
6
7
8
9
# IsOfficial is an ENUM, but can be converted to a CHAR at the same space
# ENUM types only support KEY partitioning
# The partitioning key must be part of the primary and all unique keys

ALTER TABLE CountryLanguage MODIFY IsOfficial CHAR(1) NOT NULL DEFAULT 'F', DROP PRIMARY KEY, ADD PRIMARY KEY(CountryCode, Language, IsOfficial);
ALTER TABLE CountryLanguage PARTITION BY LIST COLUMNS (IsOfficial) (
 PARTITION pUnofficial VALUES IN ('F'),
 PARTITION pOfficial VALUES IN ('T')
);

示例32:一个分区表被优化器修剪

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
EXPLAIN FORMAT=JSON
SELECT * FROM CountryLanguage WHERE isOfficial='T' AND CountryCode='CAN';
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.40"
    },
    "table": {
+     "table_name": "CountryLanguage",
+     "partitions": [
+       "pOfficial"                       # 仅包含官方分区语言被访问
+     ],
      "access_type": "ref",
      "possible_keys": [
        "PRIMARY",
        "CountryCode"
      ],
      "key": "PRIMARY",
      "used_key_parts": [
        "CountryCode"
      ],
      "key_length": "3",
      "ref": [
        "const"
      ],
      "rows_examined_per_scan": 2,
      "rows_produced_per_join": 0,
      "filtered": "10.00",
      "cost_info": {
        "read_cost": "2.00",
        "eval_cost": "0.04",
        "prefix_cost": "2.40",
        "data_read_per_join": "8"
      },
      "used_columns": [
        "CountryCode",
        "Language",
        "IsOfficial",
        "Percentage"
      ],
      "attached_condition": "(`world`.`CountryLanguage`.`IsOfficial` = 'T')"
    }
  }
}

除了分区划分之外,MySQL还支持仅针对选定数量的分区的语法。这可以由应用程序以类似的方式用于查询提示:SELECT * FROM CountryLanguage PARTITION (pOfficial) WHERE CountryCode='CAN';

查询重写

MySQL支持在服务器端重写语句的能力。你可以将其视为类似于服务器端正则表达式,以便能够找到匹配特定模式的语句,并将它们重写为新模式。

该特性的设计目标之一是允许dba向语句中插入查询提示。在应用程序本身不能被修改的情况下(比如使用ORM的情况),或者应用程序是专有的情况下,这就提供了缓解。

应该注意的是,虽然我将查询重写描述为类似于正则表达式,但它的内部效率要高得多。在解析语句时,要根据应该重写的语句的内部哈希表来检查摘要(也称为预处理语句形式)。如果确定某个语句需要重写,服务器将执行此步骤,然后重新解析查询。这意味着不需要重写的查询成本很小,而需要重写的查询只需要解析两次。

示例33:使用查询重写更改查询服务器端

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# 从命令行安装查询重写
mysql -u root -p < install_rewriter.sql

# 向MySQL添加规则并刷新重写规则
INSERT INTO query_rewrite.rewrite_rules(pattern_database, pattern, replacement) VALUES (
"world",
"SELECT * FROM Country WHERE population > ? AND continent=?",
"SELECT * FROM Country WHERE population > ? AND continent=? LIMIT 1"
);
CALL query_rewrite.flush_rewrite_rules();

 # 每次发生重写事件时,查询重写都会触发警告
SELECT * FROM Country WHERE population > 5000000 AND continent='Asia';
SHOW WARNINGS;
Query 'SELECT * FROM Country WHERE population > 5000000 AND continent='Asia'' rewritten to 'SELECT * FROM Country WHERE population > 5000000 AND continent='Asia' LIMIT 1' by a query rewrite plugin

SELECT * FROM query_rewrite.rewrite_rules\G
*************************** 1. row ***************************
                id: 2
           pattern: SELECT * FROM Country WHERE population > ? AND continent=?
  pattern_database: world
       replacement: SELECT * FROM Country WHERE population > ? AND continent=? LIMIT 1
           enabled: YES
           message: NULL
    pattern_digest: 88876bbb502cef6efddcc661cce77deb
normalized_pattern: select `*` from `world`.`Country` where ((`population` > ?) and (`continent` = ?))

MySQL服务器支持查询重写插件。这里展示的例子是一个解析后查询重写插件(称为rewrite),它包含在MySQL服务器分发版中。某些查询可能需要预解析插件,这也是支持的。更多信息请参见MySQL手册:https://dev.mysql.com/doc/refman/5.7/en/rewriter-query-rewrite-plugin.html

隐形索引

隐形索引是MySQL 8.0中的一个新特性,它提供了将索引标记为不可用的能力。这意味着在修改数据时,索引仍将被维护并保持最新,但是不允许任何查询使用该索引(即使查询使用FORCE INDEX index_name)。

隐形索引不应该与禁用索引混淆,禁用索引是MyISAM存储引擎实现的(禁用索引会停止索引的维护)。对于隐形索引,有两个值得注意的用例:

  1. 软删除。无论何时在生产中执行破坏性操作,都希望能够在更改永久之前进行观察。把它想象成一个索引“回收站”如果你弄错了,并且正在使用索引,那么只需修改元数据就可以使索引再次可见,这比从备份中重新创建或恢复索引要快得多。例如:

    1
    
     ALTER TABLE Country ALTER INDEX c INVISIBLE;
    
  2. 分阶段上线。每当添加索引时,一定要考虑到它们可能会改变你现有的查询计划,有时是以不希望看到的方式。隐形索引提供了一个机会,可以在合适的时间上线索引,可能会远离峰值负载,并且在你能够积极观察系统的时候推出索引。例如:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    
     SELECT * FROM information_schema.statistics WHERE is_visible='NO';
     *************************** 1. row ***************************
     TABLE_CATALOG: def
     TABLE_SCHEMA: world
       TABLE_NAME: Country
       NON_UNIQUE: 1
     INDEX_SCHEMA: world
       INDEX_NAME: c
     SEQ_IN_INDEX: 1
       COLUMN_NAME: Continent
         COLLATION: A
       CARDINALITY: 7
         SUB_PART: NULL
           PACKED: NULL
         NULLABLE:
       INDEX_TYPE: BTREE
           COMMENT: disabled
     INDEX_COMMENT:
       IS_VISIBLE: NO
    

删除不需要的索引是个好主意。虽然大多数实践者都知道索引会损害修改(插入、更新)的性能,但这本身就是一种简化。它们还会损害读取的性能,因为优化器需要对它们进行评估以选择计划。

剖析查询(Profiling)

EXPLAIN只显示预执行查询成本的分析,它不提供关于查询真正执行的任何统计信息,而profile可以显示更完整的分析。例如,当优化器不能从索引中删除行(并在EXPLAIN中向表中添加一个attached_condition)时,它不知道将删除多少行。这可能会在JOIN条件中产生涓滴效应,因为对后续表的查找可能非常多,也可能很少。

通过performance_schema支持对每个执行阶段花费的时间进行初步分析,它取代了之前的SHOW PROFILES命令(该命令在MySQL中已经被弃用)。例如,我使用基于SYS数据库的SHOW PROFILES替换,可以如下安装:

1
2
3
# 使用bash执行
wget http://www.tocker.ca/files/ps-show-profiles.sql
mysql -u root -p < ps-show-profiles.sql

此文件以无法下载,但依然可以使用performance_schema分析,更多章节见MySQL官方文档

示例34:使用性能模式分析查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# 使用MySQL执行
CALL sys.enable_profiling();
CALL sys.show_profiles;
*************************** 1. row ***************************
Event_ID: 22
Duration: 495.02 us
   Query: SELECT * FROM Country WHERE co ... 'Asia' and population > 5000000
1 row in set (0.00 sec)

CALL sys.show_profile_for_event_id(22);
+----------------------+-----------+
| Status               | Duration  |
+----------------------+-----------+
| starting             | 64.82 us  |
| checking permissions | 4.10 us   |
| Opening tables       | 11.87 us  |
| init                 | 29.74 us  |
| System lock          | 5.63 us   |
| optimizing           | 8.74 us   |
| statistics           | 139.38 us |
| preparing            | 11.94 us  |
| executing            | 348.00 ns |
| Sending data         | 192.59 us |
| end                  | 1.17 us   |
| query end            | 4.60 us   |
| closing tables       | 4.07 us   |
| freeing items        | 13.60 us  |
| cleaning up          | 734.00 ns |
+----------------------+-----------+
15 rows in set (0.00 sec)

可以使用SLEEP()函数来演示在执行的特定阶段花费的大量时间。在这个查询中,MySQL将在每一行匹配continent='Antarctica'时休眠5秒:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
SELECT * FROM Country WHERE Continent='Antarctica' and SLEEP(5);
CALL sys.show_profiles();
CALL sys.show_profile_for_event_id(<event_id>);
+----------------------+-----------+
| Status               | Duration  |
+----------------------+-----------+
| starting             | 103.89 us |
| checking permissions | 4.48 us   |
| Opening tables       | 17.78 us  |
| init                 | 45.75 us  |
| System lock          | 8.37 us   |
| optimizing           | 11.98 us  |
| statistics           | 144.78 us |
| preparing            | 15.78 us  |
| executing            | 634.00 ns |
| Sending data         | 116.15 us |
| User sleep           | 5.00 s    |
| User sleep           | 5.00 s    |
| User sleep           | 5.00 s    |
| User sleep           | 5.00 s    |
| User sleep           | 5.00 s    |
| end                  | 2.05 us   |
| query end            | 5.63 us   |
| closing tables       | 7.30 us   |
| freeing items        | 20.19 us  |
| cleaning up          | 1.20 us   |
+----------------------+-----------+
20 rows in set (0.01 sec)

您可能会发现分析输出并不总是像它可能的那样分阶段细化。例如,状态Sending Data只是意味着在存储引擎和服务器之间传输行。重要的是,临时表和排序的执行时间被分解了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
SELECT region, count(*) as c FROM Country GROUP BY region;
CALL sys.show_profiles();
CALL sys.show_profile_for_event_id(<event_id>);
 +----------------------+-----------+
 | Status               | Duration  |
 +----------------------+-----------+
 | starting             | 87.43 us  |
 | checking permissions | 4.93 us   |
 | Opening tables       | 17.35 us  |
 | init                 | 25.81 us  |
 | System lock          | 9.04 us   |
 | optimizing           | 3.37 us   |
 | statistics           | 18.31 us  |
 | preparing            | 10.94 us  |
+| Creating tmp table   | 35.57 us  |
+| Sorting result       | 2.38 us   |
 | executing            | 741.00 ns |
+| Sending data         | 446.03 us |
+| Creating sort index  | 49.45 us  |
 | end                  | 1.71 us   |
 | query end            | 4.85 us   |
 | removing tmp table   | 4.71 us   |
 | closing tables       | 6.12 us   |
 | freeing items        | 17.17 us  |
 | cleaning up          | 1.00 us   |
 +----------------------+-----------+
19 rows in set (0.01 sec)

除了这些帮助程序公开的分析信息之外,performance_schema还具有关于需要排序的实际行数以及发送的行数的附加统计信息。此执行级别分析补充了EXPLAIN中看到的执行前分析:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
SELECT * FROM performance_schema.events_statements_history_long
WHERE event_id=<event_id>\G
 *************************** 1. row ***************************
               THREAD_ID: 3062
                EVENT_ID: 1566
            END_EVENT_ID: 1585
              EVENT_NAME: statement/sql/select
                  SOURCE: init_net_server_extension.cc:80
             TIMER_START: 588883869566277000
               TIMER_END: 588883870317683000
              TIMER_WAIT: 751406000
               LOCK_TIME: 132000000
                SQL_TEXT: SELECT region, count(*) as c FROM Country GROUP BY region
                  DIGEST: d3a04b346fe48da4f1f5c2e06628a245
             DIGEST_TEXT: SELECT `region` , COUNT ( * ) AS `c` FROM `Country` GROUP BY `region`
          CURRENT_SCHEMA: world
             OBJECT_TYPE: NULL
           OBJECT_SCHEMA: NULL
             OBJECT_NAME: NULL
   OBJECT_INSTANCE_BEGIN: NULL
             MYSQL_ERRNO: 0
       RETURNED_SQLSTATE: NULL
            MESSAGE_TEXT: NULL
                  ERRORS: 0
                WARNINGS: 0
           ROWS_AFFECTED: 0
+              ROWS_SENT: 25
+          ROWS_EXAMINED: 289
 CREATED_TMP_DISK_TABLES: 0
      CREATED_TMP_TABLES: 1
        SELECT_FULL_JOIN: 0
  SELECT_FULL_RANGE_JOIN: 0
            SELECT_RANGE: 0
      SELECT_RANGE_CHECK: 0
             SELECT_SCAN: 1
       SORT_MERGE_PASSES: 0
              SORT_RANGE: 0
+              SORT_ROWS: 25
               SORT_SCAN: 1
           NO_INDEX_USED: 1
      NO_GOOD_INDEX_USED: 0
        NESTING_EVENT_ID: NULL
      NESTING_EVENT_TYPE: NULL
     NESTING_EVENT_LEVEL: 0

JSON和生成的列

MySQL服务器支持无模式数据存储,并具有以下特性:

  1. JSON数据类型。JSON值在插入/更新、验证时进行解析,然后以二进制优化格式存储。在读取值时,JSON数据类型不需要任何解析或验证。
  2. JSON函数。一组20多个SQL函数,用于搜索、操作和创建JSON值。
  3. 生成的列。虽然不是特别绑定到JSON,生成的列工作类似函数索引并允许提取JSON文档的一部分并建立索引。

当查询JSON数据[1]时,优化器将自动从生成的列中查找匹配的索引。在示例35中,用户首选项存储在JSON列中。最初,对请求更新通知(notify_on_updates)的用户进行查询会导致表扫描操作。通过添加带有索引的虚拟生成列,EXPLAIN表明现在可以使用索引。

[1]: 简写就是一个例子JSON_EXTRACT操作符(->)。提取字符串时,应该使用简写提取和反引号操作符(-»)。

示例35:用户偏好的无模式表示

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
CREATE TABLE users (
  id INT NOT NULL auto_increment,
  username VARCHAR(32) NOT NULL,
  preferences JSON NOT NULL,
  PRIMARY KEY (id),
  UNIQUE (username)
);

INSERT INTO users
 (id,username,preferences)
VALUES
 (NULL, 'morgan', '{"layout": "horizontal", "warn_before_delete": false, "notify_on_updates": true}'),
 (NULL, 'wes', '{"layout": "horizontal", "warn_before_delete": false, "notify_on_updates": false}'),
 (NULL, 'jasper', '{"layout": "horizontal", "warn_before_delete": false, "notify_on_updates": false}'),
 (NULL, 'gus', '{"layout": "horizontal", "warn_before_delete": false, "notify_on_updates": false}'),
 (NULL, 'olive', '{"layout": "horizontal", "warn_before_delete": false, "notify_on_updates": false}');

EXPLAIN FORMAT=JSON
SELECT * FROM users WHERE preferences->"$.notify_on_updates" = true;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.00"
    },
    "table": {
+     "table_name": "users",
+     "access_type": "ALL",
      "rows_examined_per_scan": 5,
      "rows_produced_per_join": 5,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "1.00",
        "prefix_cost": "2.00",
        "data_read_per_join": "280"
      },
      "used_columns": [
        "id",
        "username",
        "preferences"
      ],
      "attached_condition": "(json_extract(`test`.`users`.`preferences`,'$.notify_on_updates') = TRUE)"
    }
  }
}

示例36:添加带有索引的虚拟生成列

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
ALTER TABLE users ADD notify_on_updates TINYINT AS (preferences->"$.notify_on_updates"),
 ADD INDEX(notify_on_updates);

EXPLAIN FORMAT=JSON SELECT * FROM users WHERE preferences->"$.notify_on_updates" = true;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.20"
    },
    "table": {
+     "table_name": "users",
+     "access_type": "ref",
      "possible_keys": [
        "notify_on_updates"
      ],
      "key": "notify_on_updates",
      "used_key_parts": [
        "notify_on_updates"
      ],
      "key_length": "2",
      "ref": [
        "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "0.20",
        "prefix_cost": "1.20",
        "data_read_per_join": "56"
      },
      "used_columns": [
        "id",
        "username",
        "preferences",
        "notify_on_updates"
      ]
    }
  }
}

字符集

MySQL 8.0支持最新的Unicode 9.0版本,其名称为utf8mb4utf8mb4是可变长度的,每个字符占用1到4个字节。在字节和字符之间有许多命名,它们是可变长度的:

  1. 将列创建为VARCHAR(n)时,n表示字符长度。字节存储要求可能高达4倍(但通常更少)。
  2. InnoDB存储引擎在内部始终[1]将utf8mb4存储为VARCHARCHARTEXT数据类型的可变长度(在索引和表行中)。
  3. 在内存中用作物化的一部分临时表是固定长度的。这可能会导致临时表更大,或者在使用utf8mb4字符集时更早溢出到磁盘的情况。
  4. 用于对数据进行排序的缓冲区是可变长度的(从 MySQL 5.7 开始)。
  5. EXPLAIN将始终显示可变长度索引的最大长度(字节长度)。通常存储要求要低得多。

[1]:在使用DYNAMICCOMPACTCOMPRESSED行格式时。使用早期的REDUNDANT行格式通常没有实际用途。

示例37显示了EXPLAIN报告使用拉丁字符集CHAR(52)类型的列上的索引。在将表转换为utf8mb4之后,表的存储需求没有增加,但是EXPLAIN现在显示key_length增加了。

示例37:EXPLAIN显示索引的最大键长度(拉丁字符集)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE name='Canada';
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.20"
    },
    "table": {
      "table_name": "Country",
      "access_type": "ref",
      "possible_keys": [
        "Name"
      ],
      "key": "Name",
      "used_key_parts": [
        "Name"
      ],
+     "key_length": "52",  # CHAR(52)
      "ref": [
        "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "0.20",
        "prefix_cost": "1.20",
        "data_read_per_join": "264"
      },
      "used_columns": [
        "Code",
        "Name",
        "Continent",
        "Region",
        "SurfaceArea",
        "IndepYear",
        "Population",
        "LifeExpectancy",
        "GNP",
        "GNPOld",
        "LocalName",
        "GovernmentForm",
        "HeadOfState",
        "Capital",
        "Code2"
      ]
    }
  }
}

示例38:EXPLAIN显示索引的最大键长度(utf8mb4字符集)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
ALTER TABLE Country CONVERT TO CHARACTER SET utf8mb4;
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE name='Canada';
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.20"
    },
    "table": {
      "table_name": "Country",
      "access_type": "ref",
      "possible_keys": [
        "Name"
      ],
      "key": "Name",
      "used_key_parts": [
        "Name"
      ],
+     "key_length": "208",            # CHAR(52) * 4 = 208
      "ref": [
        "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "0.20",
        "prefix_cost": "1.20",
        "data_read_per_join": "968"
      },
      "used_columns": [
        "Code",
        "Name",
        "Continent",
        "Region",
        "SurfaceArea",
        "IndepYear",
        "Population",
        "LifeExpectancy",
        "GNP",
        "GNPOld",
        "LocalName",
        "GovernmentForm",
        "HeadOfState",
        "Capital",
        "Code2"
      ]
    }
  }
}
知识共享许可协议 本文由作者按照 CC BY-SA 4.0 进行授权