TiDB Cloud 新体验:Tiflash 列存功能探索

网友投稿 495 2024-03-19



简介:

TiDB Cloud 新体验:Tiflash 列存功能探索

相比于行存,TiFlash 根据强 Schema 按列式存储结构化数据,借助 *** 的向量化计算引擎,带来读取和计算双重性能优势。相较于普通列存,TiFlash 则具有实时更新、分布式自动扩展、SI(Snapshot Isolation)隔离级别读取等优势。

一:TiFlash 对 OLAP 查询加速

OLAP 类的查询通常具有以下几个特点:

· 每次查询读取大量的行,但是仅需要少量的列

· 宽表,即每个表包含着大量的列

· 查询通过一张或多张小表关联一张大表,并对大表上的列做聚合

二:TiFlash 列存引擎针对这类查询有较好的优化效果:

(1) I/O 优化

· 每次查询可以只读取需要的列,减少了 I/O 资源的使用

· 同列数据类型相同,相较于行存可以获得更高的压缩比

· 整体的 I/O 减少,令内存的使用更加高效

(2) CPU 优化

· 列式存储可以很方便地按批处理字段,充分利用 CPU Cache 取得更好的局部性

· 利用向量化处理指令并行处理部分计算

三:实验过程:

(1)创建tidb cloud账号

(2)创建tidb 集群

(3)通过tidb cloud import导入1000条和100万条基础测试数据。

(4)创建客户、订单、产品表,插入测试数据

mysql> CREATE TABLE customers (

->     id INT AUTO_INCREMENT PRIMARY KEY,

    ->     name VARCHAR(100),

    ->     email VARCHAR(100),

    ->     created_at DATETIME

    -> ) ;

Query OK, 0 rows affected (0.71 sec)

mysql> CREATE TABLE orders (

    ->     id INT AUTO_INCREMENT PRIMARY KEY,

    ->     customer_id INT,

    ->     product_id INT,

    ->     quantity INT,

    ->     price DECIMAL(10, 2),

->     timestamp DATETIME

    -> ) ;

Query OK, 0 rows affected (0.46 sec)

mysql> CREATE TABLE products (

    ->     id INT AUTO_INCREMENT PRIMARY KEY,

    ->     name VARCHAR(100),

    ->     category VARCHAR(100),

    ->     price DECIMAL(10, 2)

    -> ) ;

Query OK, 0 rows affected (0.56 sec)

-- 顾客表插入100万行数据

mysql> INSERT INTO customers (name, email, created_at)

    -> SELECT 

    ->     CONCAT(Customer, seq),

    ->     CONCAT(customer, seq, @example.com),

    ->     NOW()

    -> FROM 

    ->     (SELECT seq FROM seq_1_to_1000000) s;

Query OK, 1000000 rows affected (11.96 sec)

Records: 1000000  Duplicates: 0  Warnings: 0

-- 产品表插入1000行数据

mysql> INSERT INTO products (name, category, price)

    -> SELECT 

    ->     CONCAT(Product, seq),

    ->     CONCAT(Category, FLOOR(RAND()*(10-1+1))+1),

    ->     ROUND(RAND()*(1000-1+1), 2)

    -> FROM 

    ->     (SELECT seq FROM seq_1_to_1000) s;

Query OK, 1000 rows affected (0.91 sec)

Records: 1000  Duplicates: 0  Warnings: 0

mysql> 

(5)通过上图可以看出数据已入库,并且按照行存储。并执行关联,求和操作,耗时1229ms。

(6)构建TIFLASH 副本

mysql> ALTER TABLE test.customers SET TIFLASH REPLICA 1;

Query OK, 0 rows affected, 1 warning (0.87 sec)

mysql> ALTER TABLE test.orders SET TIFLASH REPLICA 1;

Query OK, 0 rows affected, 1 warning (1.65 sec)

mysql> ALTER TABLE test.products SET TIFLASH REPLICA 1;

Query OK, 0 rows affected, 1 warning (0.85 sec)

查看表同步进度

mysql>  SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = test and TABLE_NAME = customers;

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| test         | customers  |      123 |             2 |                 |         0 |        0 |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

1 row in set (0.64 sec)

mysql> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = test and TABLE_NAME = orders;

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| test         | orders     |      134 |             2 |                 |         0 |        0 |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

1 row in set (0.49 sec)

mysql> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = test and TABLE_NAME = products;

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| test         | products   |      128 |             2 |                 |         0 |        0 |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

1 row in set (0.52 sec)

AVAILABLE 字段表示该表的 TiFlash 副本是否可用。1 代表可用,0 代表不可用。副本状态变为可用之后就不再改变,如果通过 DDL 命令修改副本数则会重新计算同步进度。

PROGRESS 字段代表同步进度,在 0.0~1.0 之间,1 代表至少 1 个副本已经完成同步。刚才查询lineitem表副本不可用,再次查询已经可用

mysql>  SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = test and TABLE_NAME = customers;

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| test         | customers  |      123 |             2 |                 |         1 |        1 |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

1 row in set (1.94 sec)

mysql> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = test and TABLE_NAME = products;

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| test         | products   |      128 |             2 |                 |         1 |        1 |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

1 row in set (0.77 sec)

mysql> SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = test and TABLE_NAME = orders;

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

| test         | orders     |      134 |             2 |                 |         1 |        1 |

+--------------+------------+----------+---------------+-----------------+-----------+----------+

1 row in set (0.66 sec)

(7)如图所示,发送进行了列存储。执行相同的SQL查询。

结论:如图所示相同SQL,执行计划使用了TiFlash ,执行时间118ms;相比未使用TiFlash 执行时间1229ms,速度提升了10倍。

备注:一条select 查询中,可以同时使用TIKV和TIFLSH,tidb默认会根据select查询进行判断。

版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。

上一篇:分享TIKV 源码学习笔记--分布式事务接口 CheckTxnStatus/ ResolveLock
下一篇:TiDB Contributor 资料汇总分享
相关文章