黄东旭解析 TiDB 的核心优势
646
2023-11-24
记得在某个6的版本发布宣传里面看到,新版本建索引速度提升了10倍,一直没抽时间亲自测试验证下,前段时间V7.1发布,趁着这次机会,把我的一个V6.1.0的环境做下升级,顺便测试了下各版本创建索引速度对比下。
不测不知道,一测吓一跳,满满的惊喜。一个3kw的表新索引速度从v6.1.0的21min,到v7.1.0只要1min8s。提升了20倍!
以下来自官方文档相关版本关于索引加速功能的Release Notes:
TiDB v6.3.0 支持开启添加索引加速功能,提升了创建索引回填过程的速度。开启该功能后,TiDB 添加索引的性能提升约为原来的 3 倍。 从 v6.5.0 起,添加索引加速功能默认开启。
单节点TiDB环境,物理机。配置:40c512G,7T *** TiDB初始版本:v6.1.0,逐步升级到v6.5.0和v7.1.0
直接使用的tpcc里面的customer表,3000w,大小约18G
[root@127.0.0.1][tpcc][05:17:48]> show create table customer\G *************************** 1. row *************************** Table: customer Create Table: CREATE TABLE `customer` ( `c_id` int(11) NOT NULL, `c_d_id` int(11) NOT NULL, `c_w_id` int(11) NOT NULL, `c_first` varchar(16) DEFAULT NULL, `c_middle` char(2) DEFAULT NULL, `c_last` varchar(16) DEFAULT NULL, `c_street_1` varchar(20) DEFAULT NULL, `c_street_2` varchar(20) DEFAULT NULL, `c_city` varchar(20) DEFAULT NULL, `c_state` char(2) DEFAULT NULL, `c_zip` char(9) DEFAULT NULL, `c_phone` char(16) DEFAULT NULL, `c_since` datetime DEFAULT NULL, `c_credit` char(2) DEFAULT NULL, `c_credit_lim` decimal(12,2) DEFAULT NULL, `c_discount` decimal(4,4) DEFAULT NULL, `c_balance` decimal(12,2) DEFAULT NULL, `c_ytd_payment` decimal(12,2) DEFAULT NULL, `c_payment_cnt` int(11) DEFAULT NULL, `c_delivery_cnt` int(11) DEFAULT NULL, `c_data` varchar(500) DEFAULT NULL, PRIMARY KEY (`c_w_id`,`c_d_id`,`c_id`) /*T![clustered_index] NONCLUSTERED */, KEY `idx_customer` (`c_w_id`,`c_d_id`,`c_last`,`c_first`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin 1 row in set (0.01 sec) [root@127.0.0.1][tpcc][05:21:48]> show variables like tidb_ddl%; +----------------------------+--------------+ | Variable_name | Value | +----------------------------+--------------+ | tidb_ddl_error_count_limit | 512 | | tidb_ddl_reorg_batch_size | 256 | | tidb_ddl_reorg_priority | PRIORITY_LOW | | tidb_ddl_reorg_worker_cnt | 4 | +----------------------------+--------------+ 4 rows in set (0.00 sec)分别测试三个索引,一个是varchar列、一个是char列,一个组合索引
alter table tpcc.customer add index idx_01(c_city); alter table tpcc.customer add index idx_02(c_phone); alter table tpcc.customer add index idx_03(c_phone,c_since);经测试,时间差不多,后面就主要以组合索引idx_03为主来做对比。
--默认4时 [root@127.0.0.1][tpcc][05:25:40]> alter table tpcc.customer add index idx_01(c_city); Query OK, 0 rows affected (21 min 4.48 sec) [root@127.0.0.1][tpcc][05:46:48]> alter table tpcc.customer add index idx_02(c_phone); Query OK, 0 rows affected (21 min 8.08 sec)[root@127.0.0.1][tpcc][06:08:50]> set global tidb_ddl_reorg_worker_cnt=8; Query OK, 0 rows affected (0.02 sec) [root@127.0.0.1][tpcc][06:08:58]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (10 min 44.26 sec) [root@127.0.0.1][tpcc][06:22:43]> set global tidb_ddl_reorg_worker_cnt=16; Query OK, 0 rows affected (0.02 sec) [root@127.0.0.1][tpcc][06:23:01]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (9 min 6.01 sec) [root@127.0.0.1][tpcc][08:38:03]> alter table tpcc.customer drop index idx_03; Query OK, 0 rows affected (0.33 sec) [root@127.0.0.1][tpcc][08:38:18]> set global tidb_ddl_reorg_worker_cnt=32; Query OK, 0 rows affected (0.02 sec) [root@127.0.0.1][tpcc][08:38:29]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (8 min 35.48 sec)tidb日志中可以看到:当tidb_ddl_reorg_worker_cnt=4时,创建索引的读取数据的速度。单线程每秒5k-7k行左右。 当tidb_ddl_reorg_worker_cnt=32时,创建索引的读取数据的速度。单线程每秒只有1600行左右。
[2023/06/19 17:35:06.521 +08:00] [INFO] [backfilling.go:271] ["[ddl] backfill worker back fill index"] [workerID=1] [addedCount=90624] [scanCount=90624] [nextHandle=12317094.next] [speed(rows/s)=6611.8594994214745] [2023/06/19 17:35:06.521 +08:00] [INFO] [backfilling.go:271] ["[ddl] backfill worker back fill index"] [workerID=3] [addedCount=90624] [scanCount=90624] [nextHandle=12609699.next] [speed(rows/s)=6681.175296206864] [2023/06/19 17:35:06.689 +08:00] [INFO] [backfilling.go:271] ["[ddl] backfill worker back fill index"] [workerID=2] [addedCount=90624] [scanCount=90624] [nextHandle=12463112.next] [speed(rows/s)=6661.681631740738] [2023/06/19 17:35:06.904 +08:00] [INFO] [backfilling.go:271] ["[ddl] backfill worker back fill index"] [workerID=0] [addedCount=90624] [scanCount=90624] [nextHandle=12171054.next] [speed(rows/s)=6364.5596641745815] .... [2023/06/20 08:45:14.810 +08:00] [INFO] [backfilling.go:271] ["[ddl] backfill worker back fill index"] [workerID=28] [addedCount=30208] [scanCount=30208] [nextHandle=27312608.next] [speed(rows/s)=1657.5438087595428]默认:tidb_ddl_reorg_worker_cnt=4
[root@127.0.0.1][tpcc][09:19:36]> show variables like tidb_ddl_reorg_worker_cnt; +---------------------------+-------+ | Variable_name | Value | +---------------------------+-------+ | tidb_ddl_reorg_worker_cnt | 4 | +---------------------------+-------+ 1 row in set (0.00 sec) [root@127.0.0.1][tpcc][09:19:40]> alter table tpcc.customer add index idx_01(c_city); Query OK, 0 rows affected (1 min 19.36 sec) [root@127.0.0.1][tpcc][09:21:03]> alter table tpcc.customer add index idx_02(c_phone); Query OK, 0 rows affected (1 min 19.49 sec) [root@127.0.0.1][tpcc][09:23:51]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (1 min 16.22 sec)分别测试tidb_ddl_reorg_worker_cnt=8、16、32
set global tidb_ddl_reorg_worker_cnt=8; alter table tpcc.customer drop index idx_03; alter table tpcc.customer add index idx_03(c_phone,c_since); set global tidb_ddl_reorg_worker_cnt=16; alter table tpcc.customer drop index idx_03; alter table tpcc.customer add index idx_03(c_phone,c_since); set global tidb_ddl_reorg_worker_cnt=32; alter table tpcc.customer drop index idx_03; alter table tpcc.customer add index idx_03(c_phone,c_since);操作日志:
[root@127.0.0.1][tpcc][09:39:18]> set global tidb_ddl_reorg_worker_cnt=8; Query OK, 0 rows affected (0.02 sec) [root@127.0.0.1][tpcc][09:40:07]> alter table tpcc.customer drop index idx_03; Query OK, 0 rows affected (0.73 sec) [root@127.0.0.1][tpcc][09:40:08]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (53.23 sec) [root@127.0.0.1][tpcc][09:41:03]> set global tidb_ddl_reorg_worker_cnt=16; Query OK, 0 rows affected (0.02 sec) [root@127.0.0.1][tpcc][09:41:41]> alter table tpcc.customer drop index idx_03; Query OK, 0 rows affected (0.49 sec) [root@127.0.0.1][tpcc][09:41:41]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (43.99 sec) [root@127.0.0.1][tpcc][09:42:27]> set global tidb_ddl_reorg_worker_cnt=32; Query OK, 0 rows affected (0.24 sec) [root@127.0.0.1][tpcc][09:43:24]> alter table tpcc.customer drop index idx_03; Query OK, 0 rows affected (0.68 sec) [root@127.0.0.1][tpcc][09:43:25]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (38.04 sec)执行 ADMIN SHOW DDL JOBS 语句查看 JOB_TYPE 一列中是否含有 ingest 字样
[root@127.0.0.1][tpcc][09:17:35]> admin show ddl jobs; +--------+---------+------------+------------------------+----------------------+-----------+----------+-----------+---------------------+---------------------+---------------------+---------+ | JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME | START_TIME | END_TIME | STATE | +--------+---------+------------+------------------------+----------------------+-----------+----------+-----------+---------------------+---------------------+---------------------+---------+ | 23464 | tpcc | customer | add index /* ingest */ | write reorganization | 23386 | 23447 | 25853098 | 2023-06-21 09:22:31 | 2023-06-21 09:22:32 | NULL | running | | 23463 | tpcc | customer | add index /* ingest */ | public | 23386 | 23447 | 30000000 | 2023-06-21 09:19:44 | 2023-06-21 09:19:44 | 2023-06-21 09:21:03 | synced | | 23462 | tpcc | customer | drop index | none | 23386 | 23447 | 0 | 2023-06-21 09:19:34 | 2023-06-21 09:19:34 | 2023-06-21 09:19:34 | synced | | 23461 | tpcc | customer | add index /* ingest */ | public | 23386 | 23447 | 30000000 | 2023-06-21 09:16:40 | 2023-06-21 09:16:40 | 2023-06-21 09:17:44 | synced |相关参数:
tidb_ddl_enable_fast_reorg tidb_ddl_disk_quotahttps://docs.pingcap.com/zh/tidb/stable/system-variables#tidb_ddl_enable_fast_reorg-span-classversion-mark%E4%BB%8E-v630-%E7%89%88%E6%9C%AC%E5%BC%80%E5%A7%8B%E5%BC%95%E5%85%A5span
在升级到 v6.5.0 及以上版本时,建议你检查 TiDB 的 temp-dir 路径是否正确挂载了 *** 磁盘。该参数是 TiDB 的配置参数,设置后需要重启 TiDB 才能生效。因此,在升级前提前进行设置,可以避免再次重启。
使用快速索引功能,建索引的时候,数据会回填到temp-dir指定目录,过程中可以看到该目录空间在持续增长。
[tidb@host_130 tmp_ddl-4000]$ du -sh /tmp/tidb/tmp_ddl-4000/ 0 /tmp/tidb/tmp_ddl-4000/ [tidb@host_130 tmp_ddl-4000]$ du -sh /tmp/tidb/tmp_ddl-4000/ 88M /tmp/tidb/tmp_ddl-4000/ [tidb@host_130 tmp_ddl-4000]$ du -sh /tmp/tidb/tmp_ddl-4000/ 193M /tmp/tidb/tmp_ddl-4000/ [tidb@host_130 tmp_ddl-4000]$ tree /tmp/tidb/tmp_ddl-4000/ /tmp/tidb/tmp_ddl-4000/ └── 23464 ├── d4cacd68-ac06-558d-8dca-9bd229f322d2 │ ├── 000002.log │ ├── CURRENT │ ├── LOCK │ ├── MANIFEST-000001 │ └── OPTIONS-000003 └── d4cacd68-ac06-558d-8dca-9bd229f322d2.sst ├── 010ff049-964c-4f42-8880-47d69a44fede.sst ├── 078d2401-a319-4a62-928a-9af030faf742.sst ├── 2497f448-a7fe-4e6f-a9be-cea2bd4413c4.sst ├── 26ac2d83-2ca9-4962-80a3-2dcdf556e3f1.sst ├── 39549396-c9f2-4f06-8edf-7b325376e6a4.sst ├── 69be99c9-cc7c-42bf-97cc-f67547d72865.sst ├── 70b137b4-0fcf-4139-8c25-d03a4a1ef932.sst ├── 735fe13f-cf00-468c-91f9-2d9578064d60.sst ├── 7c50175e-f97f-49e5-8a5b-eb7ea4acbfb1.sst ├── 86e551b8-93fb-4862-82cb-a3a0c34ce280.sst ├── 913eef57-b90b-45df-b1d5-85ee52a95471.sst ├── 9d736cbb-fc5b-4df2-a203-aab03afdedea.sst ├── def3a75f-bee9-4e45-b1ae-a8085f7daa57.sst ├── e0151d63-ff16-4f09-946f-c18f35fb1e94.sst └── f32bbfe0-9672-4f25-a20f-af391e7c50d6.sst 3 directories, 20 files默认:tidb_ddl_reorg_worker_cnt=4
[root@127.0.0.1][tpcc][08:40:55]> set global tidb_ddl_reorg_worker_cnt=4; Query OK, 0 rows affected (0.04 sec) [root@127.0.0.1][tpcc][08:41:20]> show variables like tidb_ddl_%; +--------------------------------+--------------+ | Variable_name | Value | +--------------------------------+--------------+ | tidb_ddl_disk_quota | 107374182400 | | tidb_ddl_enable_fast_reorg | ON | | tidb_ddl_error_count_limit | 512 | | tidb_ddl_flashback_concurrency | 64 | | tidb_ddl_reorg_batch_size | 256 | | tidb_ddl_reorg_priority | PRIORITY_LOW | | tidb_ddl_reorg_worker_cnt | 4 | +--------------------------------+--------------+ 7 rows in set (0.01 sec) [root@127.0.0.1][tpcc][10:37:58]> alter table tpcc.customer add index idx_01(c_city); Query OK, 0 rows affected (1 min 8.36 sec) [root@127.0.0.1][tpcc][10:29:31]> alter table tpcc.customer add index idx_02(c_phone); Query OK, 0 rows affected (1 min 7.93 sec) [root@127.0.0.1][tpcc][10:36:00]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (1 min 7.64 sec)分别测试tidb_ddl_reorg_worker_cnt=8、16、32
set global tidb_ddl_reorg_worker_cnt=8; alter table tpcc.customer drop index idx_03; alter table tpcc.customer add index idx_03(c_phone,c_since); set global tidb_ddl_reorg_worker_cnt=16; alter table tpcc.customer drop index idx_03; alter table tpcc.customer add index idx_03(c_phone,c_since); set global tidb_ddl_reorg_worker_cnt=32; alter table tpcc.customer drop index idx_03; alter table tpcc.customer add index idx_03(c_phone,c_since);执行记录:
[root@127.0.0.1][tpcc][10:43:58]> set global tidb_ddl_reorg_worker_cnt=8; Query OK, 0 rows affected (0.03 sec) [root@127.0.0.1][tpcc][10:44:00]> alter table tpcc.customer drop index idx_03; Query OK, 0 rows affected (0.47 sec) [root@127.0.0.1][tpcc][10:44:01]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (44.50 sec) [root@127.0.0.1][tpcc][10:44:47]> set global tidb_ddl_reorg_worker_cnt=16; Query OK, 0 rows affected (0.01 sec) [root@127.0.0.1][tpcc][10:46:04]> alter table tpcc.customer drop index idx_03; Query OK, 0 rows affected (0.47 sec) [root@127.0.0.1][tpcc][10:46:05]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (36.20 sec) [root@127.0.0.1][tpcc][10:46:42]> set global tidb_ddl_reorg_worker_cnt=32; Query OK, 0 rows affected (0.02 sec) [root@127.0.0.1][tpcc][10:46:46]> alter table tpcc.customer drop index idx_03; Query OK, 0 rows affected (0.41 sec) [root@127.0.0.1][tpcc][10:46:46]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (31.45 sec)tidb_ddl_reorg_batch_size 默认256.
set global tidb_ddl_reorg_worker_cnt=4; set global tidb_ddl_reorg_batch_size=512; alter table tpcc.customer drop index idx_03; alter table tpcc.customer add index idx_03(c_phone,c_since); set global tidb_ddl_reorg_batch_size=1024; alter table tpcc.customer drop index idx_03; alter table tpcc.customer add index idx_03(c_phone,c_since); set global tidb_ddl_reorg_batch_size=2048; alter table tpcc.customer drop index idx_03; alter table tpcc.customer add index idx_03(c_phone,c_since);操作日志:
[root@127.0.0.1][tpcc][05:23:46]> set global tidb_ddl_reorg_batch_size=512; Query OK, 0 rows affected (0.02 sec) [root@127.0.0.1][tpcc][05:23:53]> alter table tpcc.customer drop index idx_03; Query OK, 0 rows affected (0.44 sec) [root@127.0.0.1][tpcc][05:23:54]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (1 min 5.60 sec) [root@127.0.0.1][tpcc][05:25:01]> show variables like tidb_ddl_%; +--------------------------------+--------------+ | Variable_name | Value | +--------------------------------+--------------+ | tidb_ddl_disk_quota | 107374182400 | | tidb_ddl_enable_fast_reorg | ON | | tidb_ddl_error_count_limit | 512 | | tidb_ddl_flashback_concurrency | 64 | | tidb_ddl_reorg_batch_size | 512 | | tidb_ddl_reorg_priority | PRIORITY_LOW | | tidb_ddl_reorg_worker_cnt | 4 | +--------------------------------+--------------+ 7 rows in set (0.00 sec) [root@127.0.0.1][tpcc][05:25:26]> set global tidb_ddl_reorg_batch_size=1024; Query OK, 0 rows affected (0.02 sec) [root@127.0.0.1][tpcc][05:25:31]> alter table tpcc.customer drop index idx_03; Query OK, 0 rows affected (0.47 sec) [root@127.0.0.1][tpcc][05:25:32]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (58.42 sec) [root@127.0.0.1][tpcc][05:26:30]> [root@127.0.0.1][tpcc][05:26:30]> set global tidb_ddl_reorg_batch_size=2048; Query OK, 0 rows affected (0.02 sec) [root@127.0.0.1][tpcc][05:26:57]> alter table tpcc.customer drop index idx_03; Query OK, 0 rows affected (0.42 sec) [root@127.0.0.1][tpcc][05:26:58]> alter table tpcc.customer add index idx_03(c_phone,c_since); Query OK, 0 rows affected (56.39 sec)从上面的数据我们可以看到,在V6.5.0版本中,默认4个worker下,建索引速度对比v6.1.0提升了约20倍,随着worker个数增加速度会进一步提升,但是越到后面提升越少,可能跟数据量有关系哈,测试结果仅供参考。
V7.1.0,相比V6.5.0也有一定提升,在worker为8和16的情况下,大约提升了16%((53-44)/53=0.1698,(43-36)/43=0.1628)。
另外,在V7.1.0下,测试了不同tidb_ddl_reorg_batch_size下的影响,也是会有一定提升。
版本表数据量tidb_ddl_reorg_batch_size=512(默认)tidb_ddl_reorg_batch_size=1024tidb_ddl_reorg_batch_size=2048v7.1.03000w1min5s58s56s以上就是在同一台机器上测试不同TiDB版本建索引速度的对比。V6.3开始新增的索引加速功能确实提升非常大,还在用老版本的同学,快升级测试下吧。
版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。