黄东旭解析 TiDB 的核心优势
442
2024-03-10
MySQL 与TiDB 都是开源数据库里面使用比较广泛的数据库,在日常的使用中,也会习惯性的对一些语句的执行过程和执行效率做下对比,接下来笔者就以一些特定的语句来对比下两款数据库优化器的执行过程和效率
备注:笔者测试语句的orders表数据来源均来自tpch,可以由tiup bench tpch ...导入生成这些数据
测试的TiDB版本为6.5.3,mysql 版本为8.0.30
假设有这么一条语句,非常简单就是对O_TOTALPRICE 字段进行去重然后聚合(O_TOTALPRICE 字段有索引)
select count(distinctO_TOTALPRICE) from orders;我们看下在MySQL上的执行计划(O_TOTALPRICE 字段有索引)
mysql> explain select count(distinct O_TOTALPRICE) from orders; +----+-------------+--------+------------+-------+---------------------------------------------------+--------------------+---------+------+---------+----------+-------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------+------------+-------+---------------------------------------------------+--------------------+---------+------+---------+----------+-------------------------------------+ | 1 | SIMPLE | orders | NULL | range | index_O_ORDERDATE_O_TOTALPRICE,index_O_TOTALPRICE | index_O_TOTALPRICE | 7 | NULL | 1462376 | 100.00 | Using index for group-by (scanning) | +----+-------------+--------+------------+-------+---------------------------------------------------+--------------------+---------+------+---------+----------+-------------------------------------+ 1 row in set, 1 warning (0.01 sec) mysql> explain analyze select count(distinct O_TOTALPRICE) from orders; +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Aggregate: count(distinct orders.O_TOTALPRICE) (cost=292475.20 rows=1) (actual time=1611.925..1611.926 rows=1 loops=1) -> Covering index skip scan for deduplication onordersusing index_O_TOTALPRICE (cost=146237.60 rows=1462376) (actual time=17.200..1563.704 rows=1464556 loops=1) | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (1.61 sec)通过上述的执行计划分析,我们可以看到索引的级别是range 级别,并且 Extra 出现了Using index for group-by (scanning)
接下来我们再对比下TiDB的执行计划(O_TOTALPRICE 字段有索引)
mysql> explain select count(distinct O_TOTALPRICE) from orders; +--------------------------+------------+-----------+------------------------------------------------------+------------------------------------------------------------+ | id | estRows | task | access object | operator info | +--------------------------+------------+-----------+------------------------------------------------------+------------------------------------------------------------+ | StreamAgg_6 | 1.00 | root | | funcs:count(distinct tpch2.orders.o_totalprice)->Column#10 | | └─IndexReader_14 | 1498900.00 | root | | index:IndexFullScan_13 | | └─IndexFullScan_13 | 1498900.00 | cop[tikv] | table:orders, index:index_O_TOTALPRICE(O_TOTALPRICE) | keep order:false | +--------------------------+------------+-----------+------------------------------------------------------+------------------------------------------------------------+ 3 rows in set (0.01 sec) mysql> explain analyze select count(distinct O_TOTALPRICE) from orders; +--------------------------+------------+---------+-----------+------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +--------------------------+------------+---------+-----------+------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+ | StreamAgg_6 | 1.00 | 1 | root | | time:1.32s, loops:2 | funcs:count(distincttpch2.orders.o_totalprice)->Column#10 | 64.0 MB | N/A | | └─IndexReader_14 | 1498900.00 | 1498900 | root | | time:764.6ms, loops:1470, cop_task: {num: 56, max: 74.9ms, min: 1.26ms, avg: 30.3ms, p95: 60.3ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 728ms, tot_wait: 5ms, rpc_num: 59, rpc_time: 1.7s, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}, backoff{regionMiss: 2ms} | index:IndexFullScan_13 | 3.83 MB | N/A | | └─IndexFullScan_13 | 1498900.00 | 1498900 | cop[tikv] | table:orders, index:index_O_TOTALPRICE(O_TOTALPRICE) | tikv_task:{proc max:26ms, min:0s, avg: 12ms, p80:23ms, p95:26ms, iters:1680, tasks:56}, scan_detail: {total_process_keys: 1498900, total_process_keys_size: 68949400, total_keys: 1498956, get_snapshot_time: 6.56ms, rocksdb: {key_skipped_count: 1498900, block: {cache_hit_count: 348, read_count: 1105, read_byte: 9.34 MB, read_time: 6.72ms}}} | keep order:false | N/A | N/A | +--------------------------+------------+---------+-----------+------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+ 3 rows in set (1.33 sec) mysql> select count( O_TOTALPRICE) from orders; +----------------------+ | count( O_TOTALPRICE) | +----------------------+ | 1498900 | +----------------------+ 1 row in set (0.51 sec)我们发现TiDB和MySQL一样虽然说都用到了O_TOTALPRICE 字段的索引,但是TiDB却把整个索引的数据全扫描了一遍了,而MySQL的却只对索引进行range 级别的索引扫描,从这一点上,笔者认为TiDB 这一点上应该可以改进
对于count(distinct O_TOTALPRICE) 这种语句,其实MySQL是有自己的优化策略的,这种优化策略叫做
Loose Index Scan,如果用了Loose Index Scan,那么执行计划会显示 Using index for group-by (scanning)。
其实对于count(distinct O_TOTALPRICE) 这种语句其实本质是要统计这个字段去重后还剩下多少个值,那也就是说在执行过程中如果是遇到不同的值加1就行了,也就是说对于相同的值扫描1个就行了,没必要每个值都进行扫描,而基于此MySQL的Loose Index Scan刚好可以利用这个特性,来达到跳跃扫描的目的,只需要扫描索引的部分数据就可以达到目的。
官方文档对此也有说明:
https://dev.mysql.com/doc/refman/8.0/en/group-by-optimization.html#loose-index-scan
假设有这么一条语句
select count(*) from (select * from orders order by O_TOTALPRICE)t;当然这条语句有些地方写的是多余的,count(*) 这种聚合反正只返回一条数据,根本没必要order by xxx,所以直接select count(*) from orders 就可以了,但是这样的语句谁也没法说就一定不会出现,如果出现了那么我们希望在优化器这一层直接把语句改写了,毕竟order by在不能用到索引的情况下,代价还是很昂贵的,下面我们来看下,这2条语句在MySQL和TiDB 上的执行效果、
MySQL上的执行效果
mysql> explain select count(*) from (select * from orders order by O_TOTALPRICE)t; +----+-------------+--------+------------+-------+---------------+--------------------------------+---------+------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------+------------+-------+---------------+--------------------------------+---------+------+---------+----------+-------------+ | 1 | SIMPLE | orders | NULL | index | NULL | index_O_ORDERDATE_O_TOTALPRICE | 10 | NULL | 1139400 | 100.00 | Using index | +----+-------------+--------+------------+-------+---------------+--------------------------------+---------+------+---------+----------+-------------+ 1 row in set, 1 warning (0.01 sec) mysql> show warnings; +-------+------+--------------------------------------------------------------------+ | Level | Code | Message | +-------+------+--------------------------------------------------------------------+ | Note | 1003 | /* select#1 */ select count(0) AS `count(*)` from `tpch2`.`orders` | +-------+------+--------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> explain analyze select count(*) from (select * from orders order byO_TOTALPRICE)t; +-------------------------------------------------------------------------+ | EXPLAIN | +-------------------------------------------------------------------------+ | -> Count rows in orders (actual time=490.758..490.758 rows=1 loops=1) | +-------------------------------------------------------------------------+ 1 row in set (0.49 sec)在MySQL上我们可以看到优化器直接将这条语句改写为了
select count(0) AS `count(*)` from `tpch2`.`orders`这种改写是符合我们的预期的,避免了由于排序和子查询的带来的额外开销
TiDB上的执行效果
mysql> explain select count(*) from (select * from orders order by O_TOTALPRICE)t; +----------------------------+------------+-----------+---------------+---------------------------+ | id | estRows | task | access object | operator info | +----------------------------+------------+-----------+---------------+---------------------------+ | StreamAgg_8 | 1.00 | root | | funcs:count(1)->Column#10 | | └─Sort_13 | 1498900.00 | root | | tpch2.orders.o_totalprice | | └─TableReader_12 | 1498900.00 | root | | data:TableFullScan_11| | └─TableFullScan_11 | 1498900.00 | cop[tikv] | table:orders | keep order:false | +----------------------------+------------+-----------+---------------+---------------------------+ 4 rows in set (0.00 sec) mysql> explain analyze select count(*) from (select * from orders order by O_TOTALPRICE)t -> ; +----------------------------+------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------+---------+---------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +----------------------------+------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------+---------+---------+ |StreamAgg_8| 1.00 | 1 | root | | time:3.22s, loops:2 | funcs:count(1)->Column#10 | 40.3 KB | N/A | | └─Sort_13 | 1498900.00 | 1498900 | root | | time:3.2s, loops:1465 | tpch2.orders.o_totalprice | 69.1 MB | 0 Bytes | | └─TableReader_12 | 1498900.00 | 1498900 | root | | time:611ms, loops:1468, cop_task: {num: 55, max: 86.3ms, min: 1.13ms, avg: 26.5ms,p95:73.3ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 864ms, tot_wait: 9ms, rpc_num: 55, rpc_time: 1.45s, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15} | data:TableFullScan_11 | 3.83 MB | N/A | | └─TableFullScan_11 | 1498900.00 | 1498900 | cop[tikv] | table:orders | tikv_task:{proc max:52ms, min:0s, avg: 14.4ms, p80:25ms, p95:48ms, iters:1682, tasks:55}, scan_detail: {total_process_keys: 1498900, total_process_keys_size: 227047584, total_keys: 1500055, get_snapshot_time: 4.45ms, rocksdb: {key_skipped_count: 1500000, block: {cache_hit_count: 4139}}} | keep order:false | N/A | N/A | +----------------------------+------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------+---------+---------+ 4 rows in set (3.22 sec)然后在TiDB上从执行计划来看我们并没有看到这条语句被改写了,执行了额外的排序,笔者个人认为在一些特定的场景下面,TiDB优化器的改写功能还是要增强的
语句
select a.C_NAME,b.price from customer a left join (select o_custkey,sum(O_TOTALPRICE) as price from orders group by o_custkey) b on a.C_CUSTKEY=b.O_CUSTKEY where a.C_PHONE=23-768-687-3665;MySQL的执行计划,可以看到MySQL是可以走index join的
mysql> explain select a.C_NAME,b.price from customer a left join (select o_custkey,sum(O_TOTALPRICE) as price from orders group by o_custkey) b on a.C_CUSTKEY=b.O_CUSTKEY where a.C_PHONE=23-768-687-3665; +----+-------------+------------+------------+-------+-----------------+-----------------+---------+-------------------+---------+--------版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。