Optimizing a Multi-Table Join over Hundred-Million-Row Tables on TiFlash: From No Result at All to 2.59 Seconds

Community contribution, 2024-03-04



A while ago I came across a SQL test case, a typical OLAP analytical query: a four-table join in which two of the tables hold on the order of 100 million rows and the result set is around one million rows. The goal was to get a rough sense of how it would perform on TiDB. Since real data could not be provided, test data was generated from the table schemas. The TiFlash engine was chosen for the test, and the query went from initially returning no result at all to finishing in 2.59 seconds after optimization. This article walks through the testing process and the optimization experience in detail.


Preparation

Test environment

The test environment is a three-node TiDB cluster. The cluster resources and component deployment are as follows:

Adjusting default parameters

Since we need to generate test data quickly in bulk, and the SQL uses GROUP_CONCAT for string concatenation, the following system variables need to be adjusted in advance.
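The exact values are not listed here, so the following is a plausible adjustment, assuming the standard TiDB/MySQL system variables for recursive-CTE depth and GROUP_CONCAT result length:

-- Assumed settings (variable names are standard TiDB system variables; values are illustrative):
SET GLOBAL cte_max_recursion_depth = 1000000;  -- allow deep recursive CTEs for bulk data generation
SET GLOBAL group_concat_max_len = 1048576;     -- prevent GROUP_CONCAT results from being truncated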

Table schemas and data generation

The test statement uses four tables: two of them are populated with 100 million simulated rows each, the other two with 1,000 rows each, and the query result set is roughly one million rows. For simplicity, the test data is generated in bulk with TiDB-compatible recursive CTE statements. To avoid the impact of large transactions, the BATCH statement is used so that the large INSERT is internally split into multiple transactions.
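A minimal sketch of this approach (not the exact script used in the test; the seed table nums, the row counts, the generated values, and the za database name are assumptions, and BATCH ... INSERT INTO ... SELECT requires a recent TiDB version):

-- 1. Generate a sequence of row numbers with a recursive CTE.
SET SESSION cte_max_recursion_depth = 1000000;
CREATE TABLE nums (n BIGINT PRIMARY KEY);
INSERT INTO nums
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 1000000
)
SELECT n FROM seq;

-- 2. Use a non-transactional BATCH statement so the large INSERT ... SELECT is split
--    into many smaller transactions instead of one huge one.
BATCH ON za.nums.n LIMIT 100000
INSERT INTO c_t1 (c1, c2)
SELECT CONCAT('NO', n), CONCAT('VIN', n) FROM nums;

Larger volumes (the test uses 100 million rows for the two big tables) can be produced by enlarging the seed table or repeating the insert with shifted offsets.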

SQL statement

The SQL is a four-table join: the two hundred-million-row tables are joined on a condition, while the two 1,000-row tables are each collapsed into a single row with GROUP_CONCAT and then matched as JSON against the result of the large-table join. The overall result set is on the order of one million rows. The initial SQL statement is as follows:

SELECT vehicle_no, policy_expire_date, city_code
FROM (
    SELECT vehicle_no, policy_expire_date, city_code,
           IFNULL(
               JSON_EXTRACT(day_info, concat('$.C', city_code)),
               JSON_EXTRACT(day_info, concat('$.C', substring(city_code, 1, 2), '0000'))
           ) AS biz_begin_date_str
    FROM (
        SELECT t1.vehicle_no AS vehicle_no,
               t2.policy_expire_date AS policy_expire_date,
               IFNULL(
                   JSON_EXTRACT(t3.city_info, concat('$.', substring(t1.vehicle_no, 1, 2))),
                   JSON_EXTRACT(t3.city_info, concat('$.', substring(t1.vehicle_no, 1, 1), '_'))
               ) AS city_code,
               day_info
        FROM (SELECT c1 AS vehicle_no, c2 AS vin_no FROM c_t1) t1
        JOIN (SELECT c1 AS vehicle_no, c2 AS policy_expire_date FROM c_t4) t2
            ON t1.vehicle_no = t2.vehicle_no OR t1.vin_no = t2.vehicle_no
        JOIN (SELECT CONCAT('{', GROUP_CONCAT('"', replace(c9, '*', '_'), '":', c1), '}') AS city_info
              FROM c_t2) t3 ON 1 = 1
        JOIN (SELECT CONCAT('{', GROUP_CONCAT('"', concat('C', c1), '": ', c2), '}') AS day_info
              FROM c_t3) t4 ON 1 = 1
    ) t
    WHERE IFNULL(city_code, '') != ''
) tt
WHERE now() BETWEEN date_add(policy_expire_date, interval -60 day) AND policy_expire_date
  AND IF(biz_begin_date_str is null OR biz_begin_date_str = '', 0, biz_begin_date_str) is NOT null

SQL optimization process

Initial execution: no result

Running the original statement, no result came back even after waiting several hours.

Looking at the execution plan with EXPLAIN:

| id | estRows | task | access object | operator info |
| Projection_26 | 6400000000000000.00 | root | | za.c_t1.c1, za.c_t4.c2, Column#42 |
| └─Selection_28 | 6400000000000000.00 | root | | not(isnull(if(or(isnull(cast(Column#43, var_string(16777216))), eq(Column#43, cast("", json BINARY))), "0", cast(Column#43, var_string(16777216))))) |
|   └─Projection_30 | 8000000000000000.00 | root | | za.c_t1.c1, za.c_t4.c2, Column#42, ifnull(json_extract(cast(Column#41, json BINARY), concat($.C, cast(Column#42, var_string(16777216)))), json_extract(cast(Column#41, json BINARY), concat($.C, substring(cast(Column#42, var_string(16777216)), 1, 2), 0000)))->Column#43 |
|     └─Projection_32 | 8000000000000000.00 | root | | za.c_t1.c1, za.c_t4.c2, ifnull(json_extract(cast(Column#30, json BINARY), concat($., substring(za.c_t1.c1, 1, 2))), json_extract(cast(Column#30, json BINARY), concat($., substring(za.c_t1.c1, 1, 1), _)))->Column#42, Column#41 |
|       └─Projection_34 | 8000000000000000.00 | root | | za.c_t1.c1, za.c_t4.c2, Column#30, Column#41 |
|         └─HashJoin_38 | 8000000000000000.00 | root | | CARTESIAN inner join, other cond:ne(ifnull(cast(ifnull(json_extract(cast(Column#30, json BINARY), concat("$.", substring(za.c_t1.c1, 1, 2))), json_extract(cast(Column#30, json BINARY), concat("$.", substring(za.c_t1.c1, 1, 1), "_"))), var_string(16777216)), ""), "") |
|           ├─HashJoin_48(Build) | 1.00 | root | | CARTESIAN inner join |
|           │ ├─Projection_77(Build) | 1.00 | root | | concat({, Column#40, })->Column#41 |
|           │ │ └─HashAgg_94 | 1.00 | root | | funcs:group_concat(Column#53 separator ",")->Column#40 |
|           │ │   └─TableReader_96 | 1.00 | root | | MppVersion: 2, data:ExchangeSender_95 |
|           │ │     └─ExchangeSender_95 | 1.00 | mpp[tiflash] | | ExchangeType: PassThrough |
|           │ │       └─HashAgg_82 | 1.00 | mpp[tiflash] | | funcs:group_concat(" "", Column#57, "": ", Column#58, "" separator ",")->Column#53 |
|           │ │         └─Projection_130 | 1001.00 | mpp[tiflash] | | concat(C, za.c_t3.c1)->Column#57, cast(za.c_t3.c2, var_string(20))->Column#58 |
|           │ │           └─TableFullScan_93 | 1001.00 | mpp[tiflash] | table:c_t3 | keep order:false |
|           │ └─Projection_50(Probe) | 1.00 | root | | concat({, Column#29, })->Column#30 |
|           │   └─HashAgg_66 | 1.00 | root | | funcs:group_concat(Column#52 separator ",")->Column#29 |
|           │     └─TableReader_68 | 1.00 | root | | MppVersion: 2, data:ExchangeSender_67 |
|           │       └─ExchangeSender_67 | 1.00 | mpp[tiflash] | | ExchangeType: PassThrough |
|           │         └─HashAgg_55 | 1.00 | mpp[tiflash] | | funcs:group_concat(""", Column#55, "":", Column#56, "" separator ",")->Column#52 |
|           │           └─Projection_129 | 1000.00 | mpp[tiflash] | | replace(za.c_t2.c9, *, _)->Column#55, cast(za.c_t2.c1, var_string(20))->Column#56 |
|           │             └─TableFullScan_43 | 1000.00 | mpp[tiflash] | table:c_t2 | keep order:false |
|           └─HashJoin_110(Probe) | 8000000000000000.00 | root | | CARTESIAN inner join, other cond:or(eq(za.c_t1.c1, za.c_t4.c1), eq(za.c_t1.c2, za.c_t4.c1)) |
|             ├─TableReader_118(Build) | 80000000.00 | root | | MppVersion: 2, data:ExchangeSender_117 |
|             │ └─ExchangeSender_117 | 80000000.00 | mpp[tiflash] | | ExchangeType: PassThrough |
|             │   └─Selection_116 | 80000000.00 | mpp[tiflash] | | ge(2024-02-02 16:45:26, cast(date_add(za.c_t4.c2, -60, "DAY"), datetime(6) BINARY)), le(2024-02-02 16:45:26, cast(za.c_t4.c2, datetime(6) BINARY)) |
|             │     └─TableFullScan_115 | 100000000.00 | mpp[tiflash] | table:c_t4 | pushed down filter:empty, keep order:false |
|             └─TableReader_123(Probe) | 100000000.00 | root | | MppVersion: 2, data:ExchangeSender_122 |
|               └─ExchangeSender_122 | 100000000.00 | mpp[tiflash] | | ExchangeType: PassThrough |
|                 └─TableFullScan_121 | 100000000.00 | mpp[tiflash] | table:c_t1 | keep order:false |
29 rows in set, 5 warnings (0.01 sec)

The main problems revealed by the execution plan are:

The join between the two hundred-million-row tables produces a Cartesian product, leading to an enormous estimated row count.

The statement's warnings indicate that group_concat and date_add are not pushed down.

mysql> show warnings;
| Level   | Code | Message |
| Warning | 1105 | Aggregation can not be pushed to tikv because AggFunc `group_concat` is not supported now |
| Warning | 1105 | Aggregation can not be pushed to tikv because AggFunc `group_concat` is not supported now |
| Warning | 1105 | Aggregation can not be pushed to tikv because AggFunc `group_concat` is not supported now |
| Warning | 1105 | Aggregation can not be pushed to tikv because AggFunc `group_concat` is not supported now |
| Warning | 1105 | Scalar function 'date_add'(signature: AddDateStringInt, return type: var_string(29)) is not supported to push down to tikv now. |
5 rows in set (0.00 sec)

Optimizing the large-table Cartesian product: execution time 8 min 34 s

The Cartesian product between the two hundred-million-row tables is caused by the join condition t1.vehicle_no = t2.vehicle_no OR t1.vin_no = t2.vehicle_no; presumably the optimizer has no special handling for joins whose condition contains an OR. Based on past experience, such a join can be rewritten equivalently as the UNION of two SELECT subqueries, as sketched below.
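A minimal sketch of the equivalent rewrite for the t1/t2 part of the query (not necessarily the exact statement used; aliases follow the original query, and UNION also deduplicates rows that match both conditions):

-- Replace the OR join with the UNION of two equi-joins, each of which can use a hash join.
SELECT t1.vehicle_no, t2.policy_expire_date
FROM (SELECT c1 AS vehicle_no, c2 AS vin_no FROM c_t1) t1
JOIN (SELECT c1 AS vehicle_no, c2 AS policy_expire_date FROM c_t4) t2
    ON t1.vehicle_no = t2.vehicle_no
UNION
SELECT t1.vehicle_no, t2.policy_expire_date
FROM (SELECT c1 AS vehicle_no, c2 AS vin_no FROM c_t1) t1
JOIN (SELECT c1 AS vehicle_no, c2 AS policy_expire_date FROM c_t4) t2
    ON t1.vin_no = t2.vehicle_no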

After this change, the SQL went from never finishing to completing in 8 minutes 34 seconds. The new execution plan no longer shows the huge estimated row counts, confirming that the equivalent rewrite works.

Optimizing the JSON conversion that slows down HashJoin: execution time 19 s

Looking closely at the EXPLAIN ANALYZE output of the statement above, most of the time is spent in the outer HashJoin operator, which computes the Cartesian product of the large-table join result (about one million rows) with the two JSON strings (one row each). The size of the joined result set has not changed, yet this step takes 8 minutes, which makes no sense; we suspected the problem lay in the million VARCHAR-to-JSON type conversions.

In the execution plan above, the HashJoin's operator info shows two cast(.., json BINARY) calls for every joined row. Because city_info and day_info produced by the t3 and t4 subqueries are character types, this implicit conversion cannot be avoided.

Our optimization idea is therefore to convert the concatenated strings to the JSON type inside the subqueries, which in theory avoids the million cast(.., json BINARY) calls. The change is sketched below.
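A minimal sketch of the idea, building the JSON value once inside each subquery with CAST(... AS JSON) (the exact modified statement is not reproduced here):

-- t3 subquery: city_info is now a JSON column, so the outer join no longer casts VARCHAR -> JSON per row.
SELECT CAST(CONCAT('{', GROUP_CONCAT('"', replace(c9, '*', '_'), '":', c1), '}') AS JSON) AS city_info
FROM c_t2;

-- t4 subquery: same treatment for day_info.
SELECT CAST(CONCAT('{', GROUP_CONCAT('"', concat('C', c1), '": ', c2), '}') AS JSON) AS day_info
FROM c_t3;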

With this change, execution time drops to 19 seconds. The execution plan still shows cast(.., json BINARY) in the HashJoin's operator info, but since the corresponding columns are already of JSON type, no actual implicit conversion happens internally, and performance improves dramatically.

优化date_add条件下推问题—执行耗时13秒

Although we are down to 19 seconds, the warnings are still unresolved; let us deal with date_add first. The related predicate is: now() BETWEEN date_add(policy_expire_date, interval -60 day) AND policy_expire_date. TiFlash supports rough set indexes: for numeric and date types it builds a built-in min-max index for every 8K rows by default. Because this filter wraps the column in the date_add function, the min-max index cannot be used, so the filter condition needs to be rewritten as follows.
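A sketch of the equivalent rewrite, keeping the bare column on one side of the comparison (this is the form quoted later in the article):

-- Before: now() BETWEEN date_add(policy_expire_date, interval -60 day) AND policy_expire_date
-- After: the column is no longer wrapped in date_add, so TiFlash's min-max index can prune data.
WHERE policy_expire_date BETWEEN now() AND date_add(now(), INTERVAL 60 DAY)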

After this change, the statement no longer produces the date_add warning, but the execution plan shows that the filter condition is still not pushed down.

On inspection, the reason is that the column behind policy_expire_date (c_t4.c2) is a varchar, so the condition policy_expire_date between now() and date_add(now(), interval 60 day) still involves an implicit type conversion internally. We therefore changed the c2 column to the datetime type; for ease of comparison, we created a new table and synchronized the data into the new schema.
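A hedged sketch of that comparison setup (the new table name c_t4_new and the za database name are assumptions, the original DDL is not shown, and the BATCH copy assumes c1 is an indexed column):

-- New table with c2 stored as datetime instead of varchar.
CREATE TABLE c_t4_new LIKE c_t4;
ALTER TABLE c_t4_new MODIFY COLUMN c2 DATETIME;
ALTER TABLE c_t4_new SET TIFLASH REPLICA 1;   -- keep a TiFlash replica, as on the original table

-- Copy the 100 million rows in smaller transactions rather than one huge one.
BATCH ON za.c_t4.c1 LIMIT 100000
INSERT INTO c_t4_new SELECT * FROM c_t4;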

The SQL now runs in 13 seconds, and the execution plan shows that the filter is pushed down correctly.

Optimizing group_concat pushdown: execution time 2.59 s

With the date_add warning resolved, one group_concat warning remains. According to the official TiDB documentation, the TiFlash storage engine does support pushing down group_concat, so why is it not pushed down here? With guidance from the relevant experts, we learned that the group_concat in this SQL sits in an aggregate query without a GROUP BY, a case TiFlash currently cannot push down; this will be improved in a later version, see:

final mode scalar agg function does not push down tiflash, and without warnings · Issue #50583 · pingcap/tidb · GitHub

Although TiFlash cannot push this case down yet, there are workarounds: adding ORDER BY or DISTINCT, where it does not affect result correctness, allows the query to be pushed down. Here we test by adding DISTINCT, as sketched below.
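A minimal sketch of the change on the t3 subquery (the t4 subquery is modified the same way; assuming each concatenated key/value pair is unique, DISTINCT does not change the result):

-- Adding DISTINCT inside GROUP_CONCAT lets TiFlash push down this no-GROUP BY aggregation.
SELECT CAST(CONCAT('{', GROUP_CONCAT(DISTINCT '"', replace(c9, '*', '_'), '":', c1), '}') AS JSON) AS city_info
FROM c_t2;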

The execution plan after this change is shown below: every operator except the root TableReader is now pushed down to TiFlash, and the execution time drops to 2.59 seconds.

| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
| TableReader_213 | 1842063.40 | 1028305 | root | | time:2.57s, loops:1484, RU:0.000000, cop_task: {num: 801, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00} | MppVersion: 2, data:ExchangeSender_212 | 160.6 KB | N/A |
| └─ExchangeSender_212 | 1842063.40 | 1028305 | mpp[tiflash] | | tiflash_task:{proc max:2.56s, min:898.8ms, avg: 1.83s, p80:2.49s, p95:2.56s, iters:1652, tasks:6, threads:240} | ExchangeType: PassThrough | N/A | N/A |
|   └─Projection_37 | 1842063.40 | 1028305 | mpp[tiflash] | | tiflash_task:{proc max:2.56s, min:898.8ms, avg: 1.82s, p80:2.49s, p95:2.56s, iters:1652, tasks:6, threads:240}
