TiDB 7.1.0资源管控特性试用心得

网友投稿 342 2024-02-25



一.背景

印象里 2022 年初的时候就有小伙伴在 asktug 提出 TiDB 未来是否会有多租户功能的问题了,没想到这么快就已经 GA 了。资源管控特性(Resource Control)可以说是 TiDB 7.1 中最亮眼的功能之一,使得 TiDB 具备了多租户隔离的能力。合理地利用资源管控特性可以减少集群数量,降低运维难度及管理成本,即拥有了更低的门槛和更高的收益。

TiDB 7.1.0资源管控特性试用心得

我们这边的 TiDB 集群都是使用物理机,为了防止系统负载突然增高在采购硬件的时候要做超量采购,一次就要采购至少 10 几台物理机。对于单租户数据库来说,这一部分预留的资源在很长的一段时间都是闲置的,会有很大的资源浪费。近期因为 MySQL 数据归档问题忙了好几个通宵,而现有的 TiDB 集群平均负载并不高,于是产生了将多套 MySQL 的业务迁移至一套 TiDB 集群的想法,但是又怕多个业务会产生资源争抢,影响核心业务。Resource Control 恰恰可以解决这个问题,趁这个机会对该特性来进行一下简单的测试。

二.测试环境准备

1.硬件配置及集群拓扑规划

RoleHostPortsalertmanager10.0.0.19093/9094grafana10.0.0.13000pd10.0.0.12379/2380tidb10.0.0.14000/10080tikv10.0.0.220160/20180tikv10.0.0.320160/20180tikv10.0.0.420160/20180

由于硬件条件有限, 使用 4 台云主机,配置为 8C 32G 200G 普通 SAS 硬盘。不过本次测试仅针对资源管控的特性进行测试,对硬件的需求不是特别严格。TiDB 7.1 默认会将资源组流控(tidb_enable_resource_control)和基于资源组配额的请求调度(resource-control.enabled)的参数打开,因此也未对参数进行特别调整。

2. 资源管控配置

实现原理

首先,要知道什么是 Request Unit (RU)

Request Unit (RU) 是 TiDB 对 CPU、IO 等系统资源(目前包括 CPU、IOPS 和 IO 带宽)的统一抽象的单位

资源RU 权重CPU1/3 RU / 毫秒读 IO1/64 RU / KB写 IO1 RU / KB一次读请求的基本开销0.25 RU一次写请求的基本开销1.5 RU

基于上表,假设某个资源组消耗的 TiKV 时间是 c 毫秒,r1 次请求读取了 r2 KB 数据,w1 次写请求写入了 w2 KB 数据,集群中非 witness TiKV 节点数是 n,则该资源组消耗的总 RU 的公式如下:

c* 1/3 + (r1 * 0.25 + r2 * 1/64) + (w1 * 1.5 + w2 * 1 * n)

工作原理

TiDB 资源管控特性提供了两层资源管理能力,包括在 TiDB 层的流控能力和 TiKV 层的优先级调度的能力。

首先,要看资源组是否超过了 RU 限制,没有超过可以直接到 TiKV 里进行调度。如果超过了 RU 限制,再看是否设置了 BURSTABLE,即是否允许这个资源组的应用超额占用资源,如果设置了并且还有空闲的资源,可以继续访问数据库,如果没有设置,就需要等待或者超时。

到 TiKV 读取这一步的时候,就涉及优先级PRIORITY 的概念,如果 TikV 出现了资源争抢的情况,高优先级的就可以优先响应,对于资源使用率不高但是需要快速响应的业务来说就适合设置高优先级。

1.预估集群容量

由于是测试集群,并没有真实的业务流量,因此我们使用基于硬件部署估算容量,而没有使用可能更准确的根据实际负载估算容量。

MySQL [(none)]> CALIBRATE RESOURCE WORKLOAD TPCC; +-------+ | QUOTA | +-------+ | 69768 | +-------+ 1 row in set (0.01 sec) MySQL [(none)]> CALIBRATE RESOURCE WORKLOAD OLTP_WRITE_ONLY; +-------+ | QUOTA | +-------+ | 73184 | +-------+ 1 row in set (0.00 sec) MySQL [(none)]>CALIBRATE RESOURCE WORKLOAD OLTP_READ_WRITE; +-------+ | QUOTA | +-------+ | 29772 | +-------+ 1 row in set (0.01 sec) MySQL [(none)]> CALIBRATE RESOURCE WORKLOAD OLTP_READ_ONLY; +-------+ | QUOTA | +-------+ | 13970 | +-------+ 1 row in set (0.00 sec)

当然,在dashboard中也可以直观地看到

2.创建资源组

创建 cc1 资源组,限额是每秒 10000 RU,并且允许这个资源组的应用超额占用资源,设置绝对优先级为 HIGH

CREATE RESOURCE GROUP IF NOT EXISTS cc1 RU_PER_SEC = 10000 BURSTABLE HIGH PRIORITY = HIGH;

创建 cc2 资源组,限额是每秒 2000 RU,在系统资源充足的时候,不允许这个资源组的应用超额占用资源,设置绝对优先级为 LOW

CREATE RESOURCE GROUP IF NOT EXISTS cc2 RU_PER_SEC = 2000 PRIORITY = LOW;

查看资源组

MySQL [(none)]> select * from information_schema.resource_groups; +---------+------------+----------+-----------+ | NAME | RU_PER_SEC | PRIORITY | BURSTABLE | +---------+------------+----------+-----------+ | cc1 | 10000 | HIGH | YES | | cc2 | 2000 | LOW | NO | +---------+------------+----------+-----------+ 2 rows in set (0.00 sec)

3.绑定资源组

将用户绑定到资源组MySQL [(none)]> CREATE USER cc1@% IDENTIFIED BY test RESOURCE GROUP cc1; Query OK, 0 rows affected (0.54 sec) MySQL [(none)]> CREATE USER cc2@% IDENTIFIED BY test RESOURCE GROUP cc2; Query OK, 0 rows affected (0.02 sec) MySQL [(none)]> grant all on cctest1.* to cc1@%; Query OK, 0 rows affected (0.28 sec) MySQL [(none)]> grant all on cctest2.* to cc2@%; Query OK, 0 rows affected (0.13 sec) MySQL [(none)]> select Host,user, User_attributes from mysql.user; +------+-----------+---------------------------+ | Host | user | User_attributes | +------+-----------+---------------------------+ | % | root | NULL | | % | cc1 | {"resource_group": "cc1"} | | % | cc2 | {"resource_group": "cc2"} | +------+-----------+---------------------------+ 3 rows in set (0.01 sec)

4.其他可能会用到的命令

将当前会话绑定到资源组 SET RESOURCE GROUP rg1;

将语句绑定到资源组 SELECT /*+ RESOURCE_GROUP(rg1) */ * FROM t limit 10;

更改资源组 ALTER RESOURCE GROUP rg1 RU_PER_SEC = 200 PRIORITY = LOW;

删除资源组DROP RESOURCE GROUPIF EXISTS rg1;

已存在用户绑定资源组 ALTER USER usr1 RESOURCE GROUP rg1;

注意:DROP RESOURCE GROUP 和ALTER RESOURCE GROUP 语句只能在全局变量 tidb_enable_resource_control 设置为 ON 时才能执行

三.性能测试

1.未做资源管控

先关闭资源管控

SET GLOBAL tidb_enable_resource_control = OFF;

生成 10 张 1kw的表选取 oltp_read_write 的场景进行测试

使用sysbench对库cctest1进行压测,模拟业务1

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest1 --mysql-user=cc1 --mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 32 tps: 1218.30 qps: 24404.43 (r/w/o: 17088.15/4800.41/2515.87) lat (ms,95%): 36.89 err/s: 0.00 reconn/s: 0.00 [ 20s ] thds: 32 tps: 1175.40 qps: 23512.93 (r/w/o: 16456.85/4632.38/2423.70) lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00 [ 30s ] thds: 32 tps: 1206.80 qps: 24140.37 (r/w/o: 16901.68/4750.89/2487.80) lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00 [ 40s ] thds: 32 tps: 1211.18 qps: 24219.65 (r/w/o: 16952.28/4768.21/2499.15) lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00 [ 50s ] thds: 32 tps: 1180.30 qps: 23605.52 (r/w/o: 16524.24/4649.48/2431.79) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00 [ 60s ] thds: 32 tps: 1227.51 qps: 24545.73 (r/w/o: 17181.29/4837.82/2526.61) lat (ms,95%): 36.24 err/s: 0.00 reconn/s: 0.00SQL statistics: queries performed: read:1011220 write: 284490 other: 148890 total: 1444600 transactions: 72230 (1203.28 per sec.) queries: 1444600 (24065.53 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 60.0264s total number of events:72230 Latency (ms): min: 14.43 avg: 26.59 max: 90.19 95th percentile: 38.94 sum: 1920326.91 Threads fairness: events (avg/stddev): 2257.1875/10.55 execution time (avg/stddev): 60.0102/0.01

平均 QPS 24065.53

使用 sysbench 对库 cctest1 及 cctest2 进行压测,模拟业务 1 和业务 2 同时进行

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest1 --mysql-user=cc1 --mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 32 tps: 728.88 qps: 14619.00 (r/w/o: 10237.65/2878.31/1503.04) lat (ms,95%): 80.03 err/s: 0.00 reconn/s: 0.00 [ 20s ] thds: 32 tps: 670.78 qps: 13424.65 (r/w/o: 9398.36/2645.13/1381.17) lat (ms,95%): 87.56 err/s: 0.00 reconn/s: 0.00 [ 30s ] thds: 32 tps: 673.99 qps: 13461.74 (r/w/o: 9421.59/2654.79/1385.36) lat (ms,95%): 87.56 err/s: 0.00 reconn/s: 0.00 [ 40s ] thds: 32 tps: 700.84 qps: 14023.75 (r/w/o: 9817.53/2761.22/1445.00)lat(ms,95%): 80.03 err/s: 0.00 reconn/s: 0.00 [ 50s ] thds: 32 tps: 670.72 qps: 13412.90 (r/w/o: 9389.71/2642.46/1380.73) lat (ms,95%): 90.78 err/s: 0.00 reconn/s: 0.00 [ 60s ] thds: 32 tps: 748.85 qps: 14977.25 (r/w/o: 10484.06/2953.79/1539.39) lat (ms,95%): 68.05 err/s: 0.00 reconn/s: 0.00SQL statistics: queries performed: read:587636 write: 165458 other: 86386 total: 839480 transactions: 41974 (699.05 per sec.) queries: 839480 (13981.06 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 60.0427s total number of events: 41974 Latency (ms): min:14.99 avg: 45.75 max: 214.63 95th percentile: 82.96 sum: 1920514.27 Threads fairness: events (avg/stddev): 1311.6875/11.52 execution time (avg/stddev): 60.0161/0.01

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest2 --mysql-user=cc2--mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 32 tps: 702.80 qps: 14081.23 (r/w/o: 9858.82/2784.22/1438.19) lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00 [ 20s ] thds: 32 tps: 716.73 qps: 14350.91 (r/w/o: 10048.43/2839.72/1462.76) lat (ms,95%): 81.48 err/s: 0.00 reconn/s: 0.00 [ 30s ] thds: 32 tps: 705.69 qps: 14105.48 (r/w/o: 9872.32/2791.28/1441.89) lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00 [ 40s ] thds: 32 tps: 760.04 qps: 15205.28 (r/w/o: 10647.74/3003.17/1554.37) lat (ms,95%): 74.46 err/s: 0.00 reconn/s: 0.00 [ 50s ] thds: 32 tps: 725.75 qps: 14508.59 (r/w/o: 10152.46/2875.01/1481.12) lat (ms,95%): 81.48 err/s: 0.00 reconn/s: 0.00 [ 60s ] thds: 32 tps: 840.27 qps: 16812.45 (r/w/o: 11771.21/3322.29/1718.94) lat (ms,95%): 64.47 err/s: 0.00 reconn/s: 0.00 SQL statistics: queries performed: read: 623644 write: 176266 other: 91010 total: 890920 transactions: 44546 (741.82per sec.) queries: 890920 (14836.39 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 60.0483s total number of events: 44546 Latency (ms): min: 14.30 avg: 43.12 max: 294.21 95th percentile: 78.60 sum: 1920991.21Threads fairness: events(avg/stddev): 1392.0625/14.39 execution time (avg/stddev): 60.0310/0.01

可以看到业务 2 QPS 为 14836.39 ,抢占了业务 1 的资源,业务 1 的 QPS 也下降为 13981.06

2.使用资源管控

打开资源管控

SET GLOBAL tidb_enable_resource_control = ON;

由于资源不足,本次测试主要模拟的场景如下,如果资源充足可以尝试更多的租户以及资源预留。

使用 sysbench 对库 cctest1 及 cctest2 进行压测,模拟业务 1 和业务 2 同时进行

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest1 --mysql-user=cc1 --mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 32 tps: 1149.18 qps: 23022.02 (r/w/o: 16120.63/4540.04/2361.35) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00 [ 20s ] thds: 32 tps: 1192.25 qps: 23843.04 (r/w/o: 16689.56/4706.29/2447.20) lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00 [ 30s ] thds: 32 tps: 1198.40 qps: 23980.96 (r/w/o: 16787.47/4730.29/2463.20) lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00 [ 40s ] thds: 32 tps: 1224.59 qps: 24479.44 (r/w/o: 17134.82/4835.05/2509.57) lat (ms,95%): 36.89 err/s: 0.00 reconn/s: 0.00 [ 50s ] thds: 32 tps: 1213.27 qps: 24266.29 (r/w/o: 16986.95/4790.90/2488.45) lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00 [ 60s ] thds: 32 tps: 1178.54 qps: 23566.47 (r/w/o: 16495.84/4652.36/2418.27) lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00SQL statistics: queries performed: read:1002344 write: 282657 other: 146919 total: 1431920 transactions: 71596 (1192.73 per sec.) queries: 1431920 (23854.51 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 60.0257s total number of events:71596 Latency (ms): min: 15.95 avg: 26.82 max: 230.65 95th percentile: 38.94 sum: 1920204.43 Threads fairness: events (avg/stddev): 2237.3750/10.36 execution time (avg/stddev): 60.0064/0.01

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest2 --mysql-user=cc2--mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 32 tps: 121.44 qps: 2448.37 (r/w/o:

版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。

上一篇:PostgreSQL中Tuple可见性判断的基本逻辑是什么
下一篇:TiDB 7.1资源管控与Oceanbase 4.0多租户功能对比
相关文章