TiDB与MySQL特定语句执行效果对比研究之一

网友投稿 693 2024-03-26

一、引言

MySQL 与TiDB 都是开源数据库里面使用比较广泛的数据库，在日常的使用中，也会习惯性的对一些语句的执行过程和执行效率做下对比，接下来笔者就以一些特定的语句来对比下两款数据库优化器的执行过程和效率

备注:笔者测试语句的orders表数据来源均来自tpch，可以由tiup bench tpch ...导入生成这些数据

测试的TiDB版本为6.5.3,mysql 版本为8.0.30

二、count(distinct) 语句执行效果

假设有这么一条语句,非常简单就是对O_TOTALPRICE 字段进行去重然后聚合(O_TOTALPRICE 字段有索引)

select count(distinctO_TOTALPRICE) from orders;

我们看下在MySQL上的执行计划(O_TOTALPRICE 字段有索引)

mysql> explain select count(distinct O_TOTALPRICE) from orders; +----+-------------+--------+------------+-------+---------------------------------------------------+--------------------+---------+------+---------+----------+-------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------+------------+-------+---------------------------------------------------+--------------------+---------+------+---------+----------+-------------------------------------+ | 1 | SIMPLE | orders | NULL | range | index_O_ORDERDATE_O_TOTALPRICE,index_O_TOTALPRICE | index_O_TOTALPRICE | 7 | NULL | 1462376 | 100.00 | Using index for group-by (scanning) | +----+-------------+--------+------------+-------+---------------------------------------------------+--------------------+---------+------+---------+----------+-------------------------------------+ 1 row in set, 1 warning (0.01 sec) mysql> explain analyze select count(distinct O_TOTALPRICE) from orders; +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Aggregate: count(distinct orders.O_TOTALPRICE) (cost=292475.20 rows=1) (actual time=1611.925..1611.926 rows=1 loops=1) -> Covering index skip scan for deduplication onordersusing index_O_TOTALPRICE (cost=146237.60 rows=1462376) (actual time=17.200..1563.704 rows=1464556 loops=1) | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (1.61 sec)

通过上述的执行计划分析，我们可以看到索引的级别是range 级别，并且 Extra 出现了Using index for group-by (scanning)

接下来我们再对比下TiDB的执行计划(O_TOTALPRICE 字段有索引)

mysql> explain select count(distinct O_TOTALPRICE) from orders; +--------------------------+------------+-----------+------------------------------------------------------+------------------------------------------------------------+ | id | estRows | task | access object | operator info | +--------------------------+------------+-----------+------------------------------------------------------+------------------------------------------------------------+ | StreamAgg_6 | 1.00 | root | | funcs:count(distinct tpch2.orders.o_totalprice)->Column#10 | | └─IndexReader_14 | 1498900.00 | root | | index:IndexFullScan_13 | | └─IndexFullScan_13 | 1498900.00 | cop[tikv] | table:orders, index:index_O_TOTALPRICE(O_TOTALPRICE) | keep order:false | +--------------------------+------------+-----------+------------------------------------------------------+------------------------------------------------------------+ 3 rows in set (0.01 sec) mysql> explain analyze select count(distinct O_TOTALPRICE) from orders; +--------------------------+------------+---------+-----------+------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +--------------------------+------------+---------+-----------+------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+ | StreamAgg_6 | 1.00 | 1 | root | | time:1.32s, loops:2 | funcs:count(distincttpch2.orders.o_totalprice)->Column#10 | 64.0 MB | N/A | | └─IndexReader_14 | 1498900.00 | 1498900 | root | | time:764.6ms, loops:1470, cop_task: {num: 56, max: 74.9ms, min: 1.26ms, avg: 30.3ms, p95: 60.3ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 728ms, tot_wait: 5ms, rpc_num: 59, rpc_time: 1.7s, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}, backoff{regionMiss: 2ms} | index:IndexFullScan_13 | 3.83 MB | N/A | | └─IndexFullScan_13 | 1498900.00 | 1498900 | cop[tikv] | table:orders, index:index_O_TOTALPRICE(O_TOTALPRICE) | tikv_task:{proc max:26ms, min:0s, avg: 12ms, p80:23ms, p95:26ms, iters:1680, tasks:56}, scan_detail: {total_process_keys: 1498900, total_process_keys_size: 68949400, total_keys: 1498956, get_snapshot_time: 6.56ms, rocksdb: {key_skipped_count: 1498900, block: {cache_hit_count: 348, read_count: 1105, read_byte: 9.34 MB, read_time: 6.72ms}}} | keep order:false | N/A | N/A | +--------------------------+------------+---------+-----------+------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+ 3 rows in set (1.33 sec) mysql> select count( O_TOTALPRICE) from orders; +----------------------+ | count( O_TOTALPRICE) | +----------------------+ | 1498900 | +----------------------+ 1 row in set (0.51 sec)

我们发现TiDB和MySQL一样虽然说都用到了O_TOTALPRICE 字段的索引，但是TiDB却把整个索引的数据全扫描了一遍了，而MySQL的却只对索引进行range 级别的索引扫描，从这一点上,笔者认为TiDB 这一点上应该可以改进

对于count(distinct O_TOTALPRICE) 这种语句，其实MySQL是有自己的优化策略的，这种优化策略叫做

Loose Index Scan,如果用了Loose Index Scan，那么执行计划会显示 Using index for group-by (scanning)。

其实对于count(distinct O_TOTALPRICE) 这种语句其实本质是要统计这个字段去重后还剩下多少个值，那也就是说在执行过程中如果是遇到不同的值加1就行了，也就是说对于相同的值扫描1个就行了，没必要每个值都进行扫描，而基于此MySQL的Loose Index Scan刚好可以利用这个特性，来达到跳跃扫描的目的，只需要扫描索引的部分数据就可以达到目的。

官方文档对此也有说明:

https://dev.mysql.com/doc/refman/8.0/en/group-by-optimization.html#loose-index-scan

三、count(*) from (.... order by)优化器的改写效果

假设有这么一条语句

select count(*) from (select * from orders order by O_TOTALPRICE)t;

当然这条语句有些地方写的是多余的，count(*) 这种聚合反正只返回一条数据，根本没必要order by xxx，所以直接select count(*) from orders 就可以了，但是这样的语句谁也没法说就一定不会出现，如果出现了那么我们希望在优化器这一层直接把语句改写了，毕竟order by在不能用到索引的情况下,代价还是很昂贵的,下面我们来看下，这2条语句在MySQL和TiDB 上的执行效果、

MySQL上的执行效果

mysql> explain select count(*) from (select * from orders order by O_TOTALPRICE)t; +----+-------------+--------+------------+-------+---------------+--------------------------------+---------+------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+--------+------------+-------+---------------+--------------------------------+---------+------+---------+----------+-------------+ | 1 | SIMPLE | orders | NULL | index | NULL | index_O_ORDERDATE_O_TOTALPRICE | 10 | NULL | 1139400 | 100.00 | Using index | +----+-------------+--------+------------+-------+---------------+--------------------------------+---------+------+---------+----------+-------------+ 1 row in set, 1 warning (0.01 sec) mysql> show warnings; +-------+------+--------------------------------------------------------------------+ | Level | Code | Message | +-------+------+--------------------------------------------------------------------+ | Note | 1003 | /* select#1 */ select count(0) AS `count(*)` from `tpch2`.`orders` | +-------+------+--------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> explain analyze select count(*) from (select * from orders order byO_TOTALPRICE)t; +-------------------------------------------------------------------------+ | EXPLAIN | +-------------------------------------------------------------------------+ | -> Count rows in orders (actual time=490.758..490.758 rows=1 loops=1) | +-------------------------------------------------------------------------+ 1 row in set (0.49 sec)

在MySQL上我们可以看到优化器直接将这条语句改写为了

select count(0) AS `count(*)` from `tpch2`.`orders`

这种改写是符合我们的预期的，避免了由于排序和子查询的带来的额外开销

TiDB上的执行效果

mysql> explain select count(*) from (select * from orders order by O_TOTALPRICE)t; +----------------------------+------------+-----------+---------------+---------------------------+ | id | estRows | task | access object | operator info | +----------------------------+------------+-----------+---------------+---------------------------+ | StreamAgg_8 | 1.00 | root | | funcs:count(1)->Column#10 | | └─Sort_13 | 1498900.00 | root | | tpch2.orders.o_totalprice | | └─TableReader_12 | 1498900.00 | root | | data:TableFullScan_11| | └─TableFullScan_11 | 1498900.00 | cop[tikv] | table:orders | keep order:false | +----------------------------+------------+-----------+---------------+---------------------------+ 4 rows in set (0.00 sec) mysql> explain analyze select count(*) from (select * from orders order by O_TOTALPRICE)t -> ; +----------------------------+------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------+---------+---------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +----------------------------+------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------+---------+---------+ |StreamAgg_8| 1.00 | 1 | root | | time:3.22s, loops:2 | funcs:count(1)->Column#10 | 40.3 KB | N/A | | └─Sort_13 | 1498900.00 | 1498900 | root | | time:3.2s, loops:1465 | tpch2.orders.o_totalprice | 69.1 MB | 0 Bytes | | └─TableReader_12 | 1498900.00 | 1498900 | root | | time:611ms, loops:1468, cop_task: {num: 55, max: 86.3ms, min: 1.13ms, avg: 26.5ms,p95:73.3ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 864ms, tot_wait: 9ms, rpc_num: 55, rpc_time: 1.45s, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15} | data:TableFullScan_11 | 3.83 MB | N/A | | └─TableFullScan_11 | 1498900.00 | 1498900 | cop[tikv] | table:orders | tikv_task:{proc max:52ms, min:0s, avg: 14.4ms, p80:25ms, p95:48ms, iters:1682, tasks:55}, scan_detail: {total_process_keys: 1498900, total_process_keys_size: 227047584, total_keys: 1500055, get_snapshot_time: 4.45ms, rocksdb: {key_skipped_count: 1500000, block: {cache_hit_count: 4139}}} | keep order:false | N/A | N/A | +----------------------------+------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------+---------+---------+ 4 rows in set (3.22 sec)

然后在TiDB上从执行计划来看我们并没有看到这条语句被改写了，执行了额外的排序,笔者个人认为在一些特定的场景下面，TiDB优化器的改写功能还是要增强的

四、index join 的执行差异

语句

select a.C_NAME,b.price from customer a left join (select o_custkey,sum(O_TOTALPRICE) as price from orders group by o_custkey) b on a.C_CUSTKEY=b.O_CUSTKEY where a.C_PHONE=23-768-687-3665;

MySQL的执行计划，可以看到MySQL是可以走index join的

mysql> explain select a.C_NAME,b.price from customer a left join (select o_custkey,sum(O_TOTALPRICE) as price from orders group by o_custkey) b on a.C_CUSTKEY=b.O_CUSTKEY where a.C_PHONE=23-768-687-3665; +----+-------------+------------+------------+-------+-----------------+-----------------+---------+-------------------+---------+--------

标签：TiDB 数据库开源数据库

TiDB与MySQL特定语句执行效果对比研究之一

一、引言

二、count(distinct) 语句执行效果

三、count(*) from (.... order by)优化器的改写效果

四、index join 的执行差异

黄东旭解析 TiDB 的核心优势

黄东旭关于基础软件产品价值的思考

麒麟v10 上部署 TiDB v5.1.2 生产环境优化实践

推荐文章

HTAP 还可以这么玩？丨TiDB 在 IoT 智慧园区的应用

新特性解析丨TiDB 资源管控的设计思路与场景解析

TiDB赋能保险业-首个全栈自主核心保单系统成功投产

首个云原生、分布式、全栈国产化银行核心业务系统投产上线丨TiDB × 杭州银行

TiDB 在社交场景的解决方案实践

电商数据技术栈，在海量数据增长下如何实现实时与全量兼得？

金融行业数据库的选择

TiDB 在智能制造中的应用实践

TiDB 在全球头部物流企业计费管理系统的应用实践

PingCAP与教育部教育管理信息中心合作，推动普惠教育数字化转型

友情链接

热评文章

TiDB 中标杭州银行核心系统数据库项目

TiDB 首批通过信通院 HTAP 数据库基础能力评

PingCAP 与 Wisconsin-Madiso

PingCAP 成为中国唯一入选 Forrester

TiDB 走进东软集团，共建医疗数字化基石

共享开源技术，共建开放生态丨平凯星辰余梦杰出席 20