Huang Dongxu Explains the Core Advantages of TiDB
2024-03-27
Our company has a particular data-analysis scenario: the big-data team periodically extracts business data for analysis and refreshes reports on a T+1 schedule. These extraction jobs regularly interfere with normal business access. After some research we found that TiFlash could address the problems this scenario causes.
However, while using the TiFlash engine recently we ran into some issues that did not match our expectations, and this post summarizes them.
TiDB version: 4.0.13
According to the official documentation, TiFlash has a clear advantage for aggregation queries such as count(*), so that is the scenario we tested first, and the result matched the description.
(1) Prepare the test data
mysql> show create table tb_monitor_disk_info\G
*************************** 1. row ***************************
       Table: tb_monitor_disk_info
Create Table: CREATE TABLE `tb_monitor_disk_info` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `rshost` varchar(20) NOT NULL DEFAULT '' COMMENT 'host address',
  `part` varchar(50) NOT NULL DEFAULT '' COMMENT 'partition info',
  `disk_info` json DEFAULT NULL COMMENT 'disk info as a JSON string, in GB',
  `a_time` datetime NOT NULL DEFAULT '2022-01-01 00:00:00',
  PRIMARY KEY (`id`),
  KEY `idx_rshost` (`rshost`),
  KEY `idx_a_time` (`a_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=307494024
1 row in set (0.00 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.07 sec)

mysql> select * from tb_monitor_disk_info limit 10;
+-----------+----------------+-----------+---------------------------------------------------+---------------------+
| id        | rshost         | part      | disk_info                                         | a_time              |
+-----------+----------------+-----------+---------------------------------------------------+---------------------+
| 260383923 | 192.168.64.160 | /dev/sdb1 | {"free": "3900", "total": "5361", "used": "1461"} | 2023-02-01 03:39:28 |
| 260383924 | 192.168.64.77  | /dev/sda1 | {"free": "2617", "total": "2748", "used": "19"}   | 2023-02-01 03:39:28 |
| 260383925 | 192.168.66.224 | /dev/sdb1 | {"free": "1147", "total": "3574", "used": "2426"} | 2023-02-01 03:39:27 |
| 260383926 | 192.168.64.158 | /dev/sdb1 | {"free": "2746", "total": "5361", "used": "2614"} | 2023-02-01 03:39:28 |
| 260383927 | 192.168.65.33  | /dev/sdb1 | {"free": "2328", "total": "5361", "used": "3033"} | 2023-02-01 03:39:28 |
| 260383928 | 192.168.65.93  | /dev/dfa1 | {"free": "2947", "total": "2978", "used": "31"}   | 2023-02-01 03:39:28 |
| 260383929 | 192.168.64.68  | /dev/sda1 | {"free": "2607", "total": "2748", "used": "28"}   | 2023-02-01 03:39:28 |
| 260383930 | 192.168.65.83  | /dev/sdb1 | {"free": "3346", "total": "3574", "used": "227"}  | 2023-02-01 03:39:27 |
| 260383931 | 192.168.64.245 | /dev/sdb1 | {"free": "3819", "total": "5321", "used": "1233"} | 2023-02-01 03:39:29 |
| 260383932 | 192.168.64.249 | /dev/sdb1 | {"free": "3972", "total": "5321", "used": "1080"} | 2023-02-01 03:39:29 |
+-----------+----------------+-----------+---------------------------------------------------+---------------------+
10 rows in set (0.02 sec)

mysql> select * from information_schema.tiflash_replica;
+--------------+----------------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME           | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+----------------------+----------+---------------+-----------------+-----------+----------+
| dbzz_monitor | tb_monitor_disk_info | 455      | 2             |                 | 1         | 1        |
+--------------+----------------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.01 sec)

All the demonstrations use monitoring test data from our production environment, roughly 50 million rows in total, and the TiFlash replica for the table is already in place.
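For reference, a TiFlash replica like the one shown in tiflash_replica above is normally created with a DDL statement of the following form. This is a sketch of the standard procedure, not the exact commands we ran on our cluster; the replica count of 2 is simply taken from the REPLICA_COUNT column above.

-- Request 2 TiFlash replicas for the table.
alter table dbzz_monitor.tb_monitor_disk_info set tiflash replica 2;

-- Watch the sync progress; the table can serve TiFlash reads once AVAILABLE = 1 and PROGRESS = 1.
select TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
from information_schema.tiflash_replica
where TABLE_SCHEMA = 'dbzz_monitor' and TABLE_NAME = 'tb_monitor_disk_info';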
(2) Test TiFlash
First, run the count query with the TiKV engine:
mysql> set SESSION tidb_isolation_read_engines = "tikv";
Query OK, 0 rows affected (0.00 sec)

mysql> desc select count(*) from tb_monitor_disk_info;
+----------------------------+-------------+-----------+------------------------------------------------------+----------------------------------+
| id                         | estRows     | task      | access object                                        | operator info                    |
+----------------------------+-------------+-----------+------------------------------------------------------+----------------------------------+
| StreamAgg_24               | 1.00        | root      |                                                      | funcs:count(Column#14)->Column#6 |
| └─IndexReader_25           | 1.00        | root      |                                                      | index:StreamAgg_8                |
|   └─StreamAgg_8            | 1.00        | cop[tikv] |                                                      | funcs:count(1)->Column#14        |
|     └─IndexFullScan_22     | 53220210.00 | cop[tikv] | table:tb_monitor_disk_info, index:idx_rshost(rshost) | keep order:false                 |
+----------------------------+-------------+-----------+------------------------------------------------------+----------------------------------+
4 rows in set (0.00 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (2.12 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (2.32 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (2.10 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (2.14 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (2.07 sec)

Then run the same count query with the TiFlash engine:
mysql> set SESSION tidb_isolation_read_engines = "tiflash";
Query OK, 0 rows affected (0.00 sec)

mysql> desc select count(*) from tb_monitor_disk_info;
+----------------------------+-------------+-------------------+----------------------------+---------------------------------+
| id                         | estRows     | task              | access object              | operator info                   |
+----------------------------+-------------+-------------------+----------------------------+---------------------------------+
| StreamAgg_16               | 1.00        | root              |                            | funcs:count(Column#8)->Column#6 |
| └─TableReader_17           | 1.00        | root              |                            | data:StreamAgg_8                |
|   └─StreamAgg_8            | 1.00        | batchCop[tiflash] |                            | funcs:count(1)->Column#8        |
|     └─TableFullScan_15     | 53220210.00 | batchCop[tiflash] | table:tb_monitor_disk_info | keep order:false                |
+----------------------------+-------------+-------------------+----------------------------+---------------------------------+
4 rows in set (0.00 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.07 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.05 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.06 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.05 sec)

mysql> select count(*) from tb_monitor_disk_info;
+----------+
| count(*) |
+----------+
| 52916325 |
+----------+
1 row in set (0.06 sec)

As you can see, a full-table row count with the TiKV engine takes a bit over 2 seconds, while the same count with the TiFlash engine finishes within roughly 100 ms, which is a substantial improvement.
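Note that the tests above pin the session to a single engine with tidb_isolation_read_engines. To hand the choice back to the optimizer afterwards, the variable can be reset; the sketch below assumes the documented default value of "tikv,tiflash,tidb":

-- Restore the default so the optimizer may pick TiKV or TiFlash per query.
set SESSION tidb_isolation_read_engines = "tikv,tiflash,tidb";

-- desc/explain then shows which engine a query will actually hit
-- (cop[tikv] vs. batchCop[tiflash] in the task column).
desc select count(*) from tb_monitor_disk_info;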
Before adopting the TiFlash component we only verified the performance gain for count queries; we did not do any targeted testing, such as simulating the queries our business actually issues.
This is a cautionary example: we strongly recommend thorough testing and verification before going live, and, where possible, testing against a simulation of the production workload as well.
As a result, after we enabled the TiFlash engine, the SQL issued by our extraction jobs did not actually use TiFlash. Testing showed that as soon as the query condition can use an index, the query is not routed to TiFlash. So in terms of day-to-day experience, TiFlash is not as effortless as the online hype suggests. The tests are as follows:
mysql> desc select count(*) from tb_monitor_disk_info where a_time >= '2023-02-10' and a_time < '2023-02-11';
+-----------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
| id                          | estRows    | task      | access object                                        | operator info                                                     |
+-----------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
| StreamAgg_24                | 1.00       | root      |                                                      | funcs:count(Column#8)->Column#6                                   |
| └─IndexReader_25            | 1.00       | root      |                                                      | index:StreamAgg_9                                                 |
|   └─StreamAgg_9             | 1.00       | cop[tikv] |                                                      | funcs:count(1)->Column#8                                          |
|     └─IndexRangeScan_23     | 1513586.15 | cop[tikv] | table:tb_monitor_disk_info, index:idx_a_time(a_time) | range:[2023-02-10 00:00:00,2023-02-11 00:00:00), keep order:false |
+-----------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
4 rows in set (0.00 sec)

mysql> select count(*) from tb_monitor_disk_info where a_time >= '2023-02-10' and a_time < '2023-02-11';
+----------+
| count(*) |
+----------+
|  1775971 |
+----------+
1 row in set (0.25 sec)

mysql> desc select * from tb_monitor_disk_info where a_time >= '2023-02-10' and a_time < '2023-02-11';
+--------------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
| id                             | estRows    | task      | access object                                        | operator info                                                     |
+--------------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
| IndexLookUp_13                 | 1513586.15 | root      |                                                      |                                                                   |
| ├─IndexRangeScan_11(Build)     | 1513586.15 | cop[tikv] | table:tb_monitor_disk_info, index:idx_a_time(a_time) | range:[2023-02-10 00:00:00,2023-02-11 00:00:00), keep order:false |
| └─TableRowIDScan_12(Probe)     | 1513586.15 | cop[tikv] | table:tb_monitor_disk_info                           | keep order:false                                                  |
+--------------------------------+------------+-----------+------------------------------------------------------+-------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> desc select * from tb_monitor_disk_info ignore index(idx_a_time) where a_time >= '2023-02-10' and a_time < '2023-02-11';
+-------------------------+-------------+--------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| id                      | estRows     | task         | access object              | operator info                                                                                                                                       |
+-------------------------+-------------+--------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| TableReader_10          | 1513586.15  | root         |                            | data:Selection_9                                                                                                                                    |
| └─Selection_9           | 1513586.15  | cop[tiflash] |                            | ge(dbzz_monitor.tb_monitor_disk_info.a_time, 2023-02-10 00:00:00.000000), lt(dbzz_monitor.tb_monitor_disk_info.a_time, 2023-02-11 00:00:00.000000) |
|   └─TableFullScan_8     | 53220210.00 | cop[tiflash] | table:tb_monitor_disk_info | keep order:false                                                                                                                                    |
+-------------------------+-------------+--------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

Given the behavior above, the only way we found to route the query to TiFlash is the ignore index syntax, so that the index is not used.
Alternatively, you can use set SESSION tidb_isolation_read_engines = "tiflash"; to force the session onto TiFlash.
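Besides ignore index and the session variable, TiDB also offers a per-query READ_FROM_STORAGE optimizer hint (introduced in the 4.0 series). We did not use it in the tests above, so take the following as a sketch of the syntax rather than something verified on our cluster:

-- Ask the optimizer to read tb_monitor_disk_info from TiFlash for this query only,
-- leaving other statements in the same session untouched.
select /*+ READ_FROM_STORAGE(TIFLASH[tb_monitor_disk_info]) */ count(*)
from tb_monitor_disk_info
where a_time >= '2023-02-10' and a_time < '2023-02-11';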
Data extraction is an important business scenario for our company and is somewhat special: the jobs essentially traverse incremental or full data by a time column, which consumes a lot of resources. Against this background we brought the TiFlash component online to serve this kind of extraction workload, but in practice it did not deliver the expected results:
(1) Our extraction SQL is not an aggregation query, and because it uses an index, TiDB presumably treats it as a low-cost query and still pushes it down to TiKV, so normal business requests are still affected.
(2) To address the problem in (1), there are two options.
Modify the business logic.
This option requires changing the business logic: rewrite the extraction statements with ignore index, or set tidb_isolation_read_engines when connecting to TiDB. Overall it is not very friendly to the business side, so whether it is acceptable depends on the actual situation; a sketch of both variants is shown below.
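The sketch below uses a simplified stand-in for our real extraction SQL (the actual production statement is not shown here):

-- Variant 1: skip the time index so the optimizer falls back to the TiFlash scan.
select * from tb_monitor_disk_info ignore index (idx_a_time)
where a_time >= '2023-02-10' and a_time < '2023-02-11';

-- Variant 2: pin the extraction connection to TiFlash right after it connects.
set SESSION tidb_isolation_read_engines = "tiflash";
select * from tb_monitor_disk_info
where a_time >= '2023-02-10' and a_time < '2023-02-11';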
Modify the TiDB configuration.
The other option is to add a new TiDB node and force that node to read only from the TiFlash engine:
[isolation-read]
engines = ["tiflash"]

Note, however, that tables without a TiFlash replica cannot be queried through this node, so the new node must not be exposed to regular business traffic, otherwise queries may fail.
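Before pointing the extraction jobs at such a dedicated node, it is worth confirming that every table they touch already has an available TiFlash replica. A quick check along these lines can help (note that tables that have never been given a replica do not appear in this view at all):

-- Tables whose TiFlash replicas exist but are not yet available or fully synced;
-- queries against them through the TiFlash-only node would fail.
select TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
from information_schema.tiflash_replica
where AVAILABLE = 0;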
To sum up, every environment has its own complex business scenarios and its own flavors of slow queries, and TiFlash comes with usage restrictions of its own, so if you plan to use TiFlash to solve query-performance problems in production, be sure to test first.
This post is only a brief summary of our tests of the TiFlash engine against production data. Every company's scenarios and requirements differ, and you may hit other issues we have not seen, so everything here is for reference only.