Lessons from Upgrading TiSpark v2.4.x to v2.5.x

Community submission, 2024-03-28



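1. Background

When I installed TiDB v6.0 and added a TiSpark cluster through TiUP scale-out, the highest version TiUP offered was TiSpark v2.4.1; the latest release, TiSpark v2.5.1, was not available. In addition, TiSpark v2.5.0 and later implement part of the authentication and authorization functionality.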


This walkthrough covers two things:

Upgrading TiSpark v2.4.1 to TiSpark v2.5.1

Trying out the authentication and authorization features of TiSpark v2.5.1

2. Environment Preparation

2.1 Install Cluster111 (v6.0.0)

2.1.1 Cluster111 topology

    # cluster111.yml
    server_configs:
      tidb:
        log.slow-threshold: 300
        binlog.enable: false
        binlog.ignore-error: false
      tikv:
        readpool.storage.use-unified-pool: false
        readpool.coprocessor.use-unified-pool: true
      pd:
        schedule.leader-schedule-limit: 4
        schedule.region-schedule-limit: 2048
        schedule.replica-sch
    # [... truncated in the original ...]
    tikv_servers:
      - host: 10.0.2.15
        # ssh_port: 22
        port: 20160
        status_port: 20180
        config:
          # [... truncated in the original ...]
          server.gager_servers:
      - host: 10.0.2.15

2.1.2 Install Cluster111

2.2 TiSpark v2.4.1

2.2.1 Topology

    # cluster111-v6.0.0-tispark.yml
    tispark_masters:
      - host: 10.0.2.15
        ssh_port: 22
        port: 7077
    # NOTE: multiple worker nodes on the same host is not supported by Spark
    tispark_workers:
      - host: 10.0.2.15

2.2.2 Install TiSpark

Install OpenJDK 8 (steps omitted).
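Because the OpenJDK 8 installation is skipped here, a quick check that the host really runs Java 8 may save a failed scale-out. The following is a minimal sketch, assuming the usual `java -version` banner format; `java_major` is a hypothetical helper name, not part of TiUP or Spark.

```shell
# Extract the Java major version from one line of `java -version` output.
# Java 8 reports itself as "1.8.0_xxx"; Java 9 and later report "11.0.x" etc.
# (java_major is a hypothetical helper introduced for illustration.)
java_major() {
  # $1: a banner line such as: openjdk version "1.8.0_332"
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;  # legacy 1.x numbering
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;  # modern numbering
  esac
}

# Usage on a real host (`java -version` writes its banner to stderr):
#   java_major "$(java -version 2>&1 | head -n 1)"
```

On a host prepared for this walkthrough the helper should print 8.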

Install TiSpark by scaling out the cluster:

    tiup cluster scale-out cluster111 ./cluster111-v6.0.0-tispark.yml -uroot -p

2.3 Test Spark v2.4.3 Standalone

Add the following settings to spark-defaults.conf:

    # SQL extension class
    spark.sql.extensions org.apache.spark.sql.TiExtensions
    # Master node
    spark.master spark://10.0.2.15:7077
    # PD nodes; separate multiple PD addresses with commas, e.g. 10.16.20.1:2379,10.16.20.2:2379,10.16.20.3:2379
    spark.tispark.pd.addresses 10.0.2.15:2379
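A missing or commented-out key in spark-defaults.conf tends to surface only later as a confusing runtime error, so it can help to check the file first. The sketch below assumes the whitespace-separated `key value` layout used in this article; `check_tispark_conf` is a hypothetical helper name.

```shell
# Report which of the three settings used in this article are missing from a
# spark-defaults.conf-style file. Prints each missing key and returns
# non-zero if any is absent.
# (check_tispark_conf is a hypothetical helper; it assumes "key value" lines.)
check_tispark_conf() {
  conf=$1
  missing=0
  for key in spark.sql.extensions spark.master spark.tispark.pd.addresses; do
    # The first field must equal the key exactly and a value must follow.
    # Commented-out lines start with '#', so they never match.
    if ! awk -v k="$key" '$1 == k && NF >= 2 { found = 1 } END { exit !found }' "$conf"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tispark_conf /tidb-deploy/tispark-master-7077/conf/spark-defaults.conf
```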

Start the Spark cluster

/tidb-deploy/tispark-master-7077/sbin/start-all.sh

Start spark-shell

Start spark-sql

    # Start spark-sql
    /tidb-deploy/tispark-master-7077/bin/spark-sql
    # Run
    select ti_version();

3. Upgrade TiSpark

3.1 Download the upgrade packages

    # Download Spark v3.1.3
    curl -L "https://dlcdn.apache.org/spark/spark-3.1.3/spark-3.1.3-bin-hadoop3.2.t
    # [... truncated in the original ...]

3.2 Back up

    \cp -rf /tidb-deploy/tispark-master-7077 /tidb-deploy/tispark-master-7077-bak2.4.1

3.3 Upgrade

    # Replace Spark
    mkdir -p /usr/local0/webserver/tispark && tar -zxvf spark-3.1.3-bin-hadoop3.2.tgz -C /usr/local0/webserver/tispark/
    # NOTE: mv replaces the deployment only if /tidb-deploy/tispark-master-7077
    # no longer exists at this point; if the old directory is still present,
    # mv places the new tree inside it.
    mv /usr/local0/webserver/tispark/spark-3.1.3-bin-hadoop3.2 /tidb-deploy/tispark-master-7077
    chown -R tidb:tidb /tidb-deploy/tispark-master-7077
    # Replace the TiSpark package
    cp -rf tispark-assembly-3.1-2.5.1.jar /tidb-deploy/tispark-master-7077/jars/
    # Configuration files
    cp -rf /tidb-deploy/tispark-master-7077-bak2.4.1/conf/* /tidb-deploy/tispark-master-7077/conf/

3.4 Test
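Before starting the cluster, it may be worth confirming that the swap produced the expected directory layout. The sketch below is an illustration only: `check_spark_home` is a hypothetical helper, and the paths it looks for are taken from the commands used in this article.

```shell
# Sanity-check a Spark home after the swap: the launcher scripts and the
# TiSpark assembly jar should sit directly under it, not one level down.
# (check_spark_home is a hypothetical helper introduced for illustration.)
check_spark_home() {
  home=$1   # e.g. /tidb-deploy/tispark-master-7077
  jar=$2    # e.g. tispark-assembly-3.1-2.5.1.jar
  bad=0
  [ -f "$home/bin/spark-sql" ]     || { echo "missing: bin/spark-sql"; bad=1; }
  [ -f "$home/sbin/start-all.sh" ] || { echo "missing: sbin/start-all.sh"; bad=1; }
  [ -f "$home/jars/$jar" ]         || { echo "missing: jars/$jar"; bad=1; }
  # A nested spark-*-bin-* directory suggests that mv moved the new tree
  # into an existing directory instead of replacing it.
  for d in "$home"/spark-*-bin-*; do
    if [ -d "$d" ]; then
      echo "nested Spark directory: $d"
      bad=1
    fi
  done
  return "$bad"
}

# Usage:
#   check_spark_home /tidb-deploy/tispark-master-7077 tispark-assembly-3.1-2.5.1.jar
```

A silent, zero-status return means all three files were found and no nested Spark directory exists.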

Start the Spark cluster

/tidb-deploy/tispark-master-7077/sbin/start-all.sh

Start spark-shell

    # Start spark-shell
    /tidb-deploy/tispark-master-7077/bin/spark-s
    # [... truncated in the original ...]

Start spark-sql

    # Start spark-sql
    /tidb-deploy/tispark-master-7077/bin/spark-sql
    # Run
    select ti_version();

4. Test TiSpark v2.5.1 Authentication

Reference: https://github.com/pingcap/tispark/blob/master/docs/authorization_userguide.md

Authorization and authentication through TiDB server

The database user account must have the PROCESS privilege.

TiSpark version >= 2.5.0

Spark version = 3.0.x or 3.1.x
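The Spark version requirement above can be checked mechanically before touching any configuration. This is a minimal sketch; `spark_supports_tispark_auth` is a hypothetical helper, and the version string is assumed to come from wherever you track it (for example the tarball name `spark-3.1.3-bin-hadoop3.2`).

```shell
# Return success if a Spark version string is in the 3.0.x or 3.1.x line,
# the range listed above for the authentication feature.
# (spark_supports_tispark_auth is a hypothetical helper for illustration.)
spark_supports_tispark_auth() {
  case "$1" in
    3.0.*|3.1.*) return 0 ;;
    *)           return 1 ;;
  esac
}

# Usage:
#   spark_supports_tispark_auth 3.1.3 && echo "Spark version is in range"
```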

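4.1 Add settings to spark-defaults.conf

    spark.sql.tidb.addr   10.0.2.15
    [... truncated in the original ...] rue
    # in seconds. Values range from 5 to 3600
    spark.sql.tidb.auth.refreshInterval 30

4.2 Configure a wrong password

    # this is the wrong password
    spark.sql.tidb.password abc

After starting spark-sql, executing any SQL statement fails with an error.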
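When an authentication error like this appears, it helps to see exactly which `spark.sql.tidb.*` values Spark will pick up, without leaking the password into a terminal log. The sketch below assumes whitespace-separated `key value` lines as used in this article; `show_tidb_auth_conf` is a hypothetical helper name.

```shell
# Print the spark.sql.tidb.* settings from a spark-defaults.conf-style file.
# The password value is masked; a key with no value is shown as "(empty)".
# (show_tidb_auth_conf is a hypothetical helper introduced for illustration.)
show_tidb_auth_conf() {
  awk '
    $1 ~ /^spark\.sql\.tidb\./ {
      if ($1 == "spark.sql.tidb.password")
        print $1, (NF >= 2 ? "******" : "(empty)")
      else
        print $1, $2
    }
  ' "$1"
}

# Usage:
#   show_tidb_auth_conf /tidb-deploy/tispark-master-7077/conf/spark-defaults.conf
```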

4.3 Correct the password

    # empty password
    spark.sql.t
    # [... truncated in the original ...]

Start spark-sql

    /tidb-deploy/tispark-master-7077/bin/spark-sql

    use tidb_catalog;
    show databases;
    select 'CUSTOMER' tablename, count(*) ct from tidb_catalog.TPCH_001.CUSTOMER
    union all
    select 'NATION' tablename, count(*) ct from tidb_catalog.TPCH_001.NATION
    union all
    select 'REGION' tablename, count(*) ct from tidb_catalog.TPCH_001.REGION
    union all
    select 'PART' tablename, count(*) ct from tidb_catalog.TPCH_001.PART
    union all
    select 'SUPPLIER' tablename, count(*) ct from tidb_catalog.TPCH_001.SUPPLIER
    union all
    select 'PARTSUPP' tablename, count(*) ct from tidb_catalog.TPCH_001.PARTSUPP
    union all
    select 'ORDERS' tablename, count(*) ct from tidb_catalog.TPCH_001.ORDERS
    union all
    select 'LINEITEM' tablename, count(*) ct from tidb_catalog.TPCH_001.LINEITEM
    order by ct desc;

4.4 Configure the password in SparkSession

    spark.sqlContext.setConf("spark.sql.tidb.addr", your_tidb_server_address)
    spark.sqlContext.setConf("spark.sql.tidb.port", your_tidb_server_port)
    spark.sqlContext.setConf("spark.sql.tidb.user", your_tidb_server_user)
    spark.sqlContext.setConf("spark.sql.tidb.password", your_tidb_server_password)

4.5 Limitations

It cannot work together with data sources other than TiDB.

Role-based privileges are not supported.

The TiDB Data Source API, for example TiBatchWrite, is not supported.

5. Summary

This article walked through upgrading to TiSpark v2.5.1 when tiup list tispark --all offers no TiSpark v2.5.x release;

it also tried out the authentication feature newly supported in TiSpark v2.5.x.

Thanks!

References

https://tidb.net/blog/19eeb447 (Spark Standalone cluster upgrade steps)
https://tidb.net/blog/b8f902a9 (Migrating from TiSpark 2.4.1 (Spark 2.4.5) to TiSpark 2.5.0 (Spark 3.0.x/3.1.x))
https://github.com/pingcap/tispark/blob/master/docs/authorization_userguide.md
