TiKV-CDC 实现 rawkv 集群的两地三中心架构

网友投稿 485 2024-03-28



一、背景

业务的连续性和稳定性对企业来说至关重要,因此对数据库侧的容灾能力要求也越来越高,传统的高可用架构已无法满足高要求的容灾要求,两地三中心架构已成为主流架构。

TiKV-CDC 实现 rawkv 集群的两地三中心架构

对于rawkv集群来说,需要使用TiKV-CDC来实现在两个集群间的数据同步

二、环境准备

安装介质

tikv-cdc***:

https://github.com/tikv/migration/releases/tag/cdc-v1.1.1

(注:TiKV-CDC是专门同步rawkv集群开发的,跟TiCDC不同)

tikv-cdc使用手册:https://tikv.org/docs/6.5/concepts/explore-tikv-features/cdc/cdc-cn/

三、集群拓扑

两地三中心架构下,总共有3套集群,分别为:主生产集群、同城应急集群、异地灾备集群。

主生产集群承担读写业务,通过TiKV-CDC同步数据至同城应急集群、异地灾备集群,同步链路为2条独立的链路,互不影响。

TiKV-CDC同步链路可以定期进行切换演练,验证灾备环境的可用性。

同城应急集群可以承担读流量。

故障情况说明

单个AZ级故障:

1、集群中同城A机房2个AZ中任意单AZ级故障,集群不受影响。

2、集群中同城B机房AZ故障,集群不受影响。

区域级故障:

1、同城A、B机房同时发生故障,异地C机房的灾备集群可以直接承载流量。

集群级故障:

1、主生产集群发生问题后,可以切换至同城应急集群继续提供服务。

读写分离:

1、同城应急库可以根据不同业务类型提供读能力。

四、创建两地三中心同步链路

4.1 备份主集群数据

rawkv备份使用的工具为tikv-br

备份流水索引集群 $ mkdir -p /data/br $ ./tikv-br backup raw --gcttl=600m --pd "http://PDIP:2379" --storage "local:///data/br" --dst-api-version v2 --log-file="/tmp/tikv-br.log"

4.2 还原数据至灾备集群

将备份数据scp到灾备集群上

$ ./tikv-br restore raw --pd "http://PDIP:2379" --storage "local:///data/br" --log-file /tmp/restore.log"

4.3 创建同步链路

在主集群的tikv-cdc机器上创建同步链路changefeed-id

(PD_MASTER_IP为主集群的PD地址,PD_SLAVE_IP为灾备集群的PD地址)

$ cd /home/tidb/tidb_deploy/cdc-8300/bin ./tikv-cdc cli changefeed create --pd "http://PD_MASTER_IP:2379" --sink-uri tikv://PD_SLAVE_IP:2379 --start-ts<前边记录的BackupTs> --changefeed-id test-cdc

4.4 查看同步链路状态

$ ./tikv-cdc cli changefeed --pd http://PD_MASTER_IP:2379 list

五、主备集群切换

5.1 主集群切换到灾备集群

5.1.1 主集群停止TICDC同步,检查主集群grpc流量为0

观察tikv-details下的grpc流量,若为0,则说明业务已经停止。

5.1.2 执行remove_cdc.sh脚本

remove_cdc.sh需要在主集群的tikv-cdc节点执行。

sh remove_cdc.sh

remove_cdc.sh脚本如下:

注意:CHANGEFEED_ID,pd_master这几个变量需要按照实际情况修改。

#!/bin/bash export PATH=/home/ap/cdacs/.tiup/bin:$PATH TIME=`(date "+%Y-%m-%d_%H-%M-%S")` #CHANGEFEED_ID,pdipdport are variables CHANGEFEED_ID=test-cdc-bak ##pd_master is up stream cdc pd info. pd_master=IP:2379 ticdc_dir=`ps -ef |grep tikv-[c]dc |awk -F" " {print $(NF)}|awk -F"log" {print $1}` ##new add 20230106 cdcstate=`${ticdc_dir}bin/tikv-cdc cli changefeed query -s --changefeed-id $CHANGEFEED_ID --pd http://$pd_master |grep -E "state" |awk -F: {print $2} |awk -F, {print $1} |sed s/[^a-z]//g` logpath="${ticdc_dir}log" scripts_log="$logpath/remove_changefeed_$TIME.log" cdc_change(){ echo "cdc changefeed remove." ${ticdc_dir}bin/tikv-cdc cli changefeed remove --changefeed-id $CHANGEFEED_ID --pd=http://$pd_master if [ -eq 0 ]; then echo "’$TIME‘ remove changefeed-id successed!!!" sleep 5 ${ticdc_dir}bin/tikv-cdc cli changefeed --pd http://$pd_master list else echo "‘$TIME’ remove changefeed-id failed!!!" fi } tikvcdcstate() { echo "tikvcdc replication state check." if [ $cdcstate = "normal" ];then echo "tikvcdc replication state is normal." else echo "tikvcdc replication state is $cdcstate." exit 1 fi } tikvcdcreplication() { tsoa=`${ticdc_dir}bin/tikv-cdc cli tso query --pd=http://$pd_master` echo "tikv cdc replication progress check." checknums=10 seconds=2 nums=0 while true do tsob=`${ticdc_dir}bin/tikv-cdc cli changefeed query -s --changefeed-id $CHANGEFEED-ID --pd http://$pd_master |grep -E "tso" |awk -F" " {print $2}|awk -F"," {print $1}` let nums++ if [ $tsob -gt $tsoa ] && [ $nums -le $checknums ];then echo "tikvcdc replication is done,this is $nums times replication check." break elif [ $tsob -lt $tsoa ] && [ $nums -lt $checknums ];then echo "tikvcdc replication is not done,this is $nums times replication check." sleep $seconds else echo "tikvcdc replication is not done,this is$nums times and last replication check." break fi done } main() { tikvcdcstate tikvcdcreplication cdc_change } main| tee ${scripts_log}5.1.3 观察日志输出

检查日志${scripts_log}有无报错,若无报错,则说明执行成功。

5.1.4 灾备集群启动TICDC同步

执行create_cdc.sh脚本,create_cdc.sh需要在tikv-cdc节点执行。

sh create_cdc.sh

create_cdc.sh脚本如下:

注意:CHANGEFEED_ID,pd_master,pd_slave这几个变量需要按照实际情况修改。

#!/bin/bash export PATH=/home/ap/cdacs/.tiup/bin:$PATH TIME=`(date "+%Y-%m-%d_%H-%M-%S")` #CHANGEFEED_ID,pd_master,pd_slave are variables CHANGEFEED_ID=xxx-cdc-bak ##pd_master is up stream cdc pd info. pd_master=IP:2379 #pd_slave is down stream cdc pd info. pd_slave=IP:2379 ticdc_dir=`ps -ef |grep tikv-[c]dc |awk -F" " {print $(NF)}|awk -F"log" {print $1}` logpath="${ticdc_dir}log" scripts_log="$logpath/create_changefeed_$TIME.log" tikvcdcstate() { echo "tikvcdc replication state check." if [ $cdcstate = "normal" ];then echo "tikvcdc replication state is normal." else echo "tikvcdc replication state is $cdcstate." exit 1 fi } cdc_change(){ echo "cdc changefeed-id create." .${ticdc_dir}bin/tikv-cdc cli changefeed create --pd "http://$pd_master" --sink-uri tikv://$pd_slave --changefeed-id $CHANGEFEED_ID if [ -eq 0 ]; then echo "$TIME Create changefeed-id successed!!!" cdcstate=`.${ticdc_dir}bin/tikv-cdc cli changefeed --pd "http://$pd_master" list |grep -E "state" |awk -F: {print $2} |awk -F, {print $1} |sed s/\"//g` tikvcdcstate else echo "$TIME Create changefeed-id failed!!!" exit 1 fi } cdc_change | tee ${scripts_log}5.1.5 观察日志输出

观察日志${scripts_log}有无报错,若无报错,则说明执行成功。

5.2 灾备集群回切到主集群

5.2.1 灾备集群停止TICDC同步,检查灾备集群grpc流量为0

观察tikv-details下的grpc流量,若为0,则说明业务已经停止。

5.2.2 执行remove_cdc.sh脚本

remove_cdc.sh需要在tikv-cdc节点执行

sh remove_cdc.sh

remove_cdc.sh脚本如下:

注意:CLUSTER_NAME,CHANGEFEED_ID,pdipdport这几个变量需要按照实际情况修改。

#!/bin/bash export PATH=/home/ap/cdacs/.tiup/bin:$PATH TIME=`(date "+%Y-%m-%d_%H-%M-%S")` #CHANGEFEED_ID,pdipdport are variables CHANGEFEED_ID=test-cdc-bak ##pd_master is up stream cdc pd info. pd_master=172.16.11.115:2379 ticdc_dir=`ps -ef |grep tikv-[c]dc |awk -F" " {print $(NF)}|awk -F"log" {print $1}` ##new add 20230106 cdcstate=`${ticdc_dir}bin/tikv-cdc cli changefeed query -s --changefeed-id $CHANGEFEED_ID --pd http://$pd_master|grep -E "state" |awk -F: {print $2} |awk -F, {print $1} |sed s/[^a-z]//g` logpath="${ticdc_dir}log" scripts_log="$logpath/remove_changefeed_$TIME.log" cdc_change(){ echo "cdc changefeed remove." ${ticdc_dir}bin/tikv-cdc cli changefeed remove --changefeed-id $CHANGEFEED_ID --pd=http://$pd_master if [ -eq 0 ]; then echo "’$TIME‘ remove changefeed-id successed!!!" sleep 5 ${ticdc_dir}bin/tikv-cdc cli changefeed --pd http://$pd_master list else echo "‘$TIME’ remove changefeed-id failed!!!" fi } ##new add 20230106 tikvcdcstate() { echo "tikvcdc replication state check." if [ $cdcstate = "normal" ];then echo "tikvcdc replication state is normal." else echo "tikvcdc replication state is $cdcstate." exit 1 fi } ##new add 20230106 tikvcdcreplication() { tsoa=`${ticdc_dir}bin/tikv-cdc cli tso query --pd=http://$pd_master` echo "tikv cdc replication progress check." checknums=10 seconds=2 nums=0 while true do tsob=`${ticdc_dir}bin/tikv-cdc cli changefeed query -s --changefeed-id $CHANGEFEED-ID --pd http://$pd_master|grep -E "tso" |awk -F" " {print $2}|awk -F"," {print $1}` let nums++ if [ $tsob -gt $tsoa ] && [ $nums -le $checknums ];then echo "tikvcdc replication is done,this is $nums times replication check." break elif [ $tsob -lt $tsoa ] && [ $nums -lt $checknums ];then echo "tikvcdc replication is not done,this is$nums times replication check." sleep $seconds else echo "tikvcdc replication is not done,this is $nums times and last replication check." break fi done } main() { tikvcdcstate tikvcdcreplication cdc_change } main| tee ${scripts_log}5.2.3 观察日志输出

检查$ticdc_dir/log/$logfile有无报错,若无报错,则说明执行成功。

5.2.4 主集群启动TICDC同步

执行create_cdc.sh脚本

create_cdc.sh需要在tikv-cdc节点执行

sh create_cdc.sh

create_cdc.sh脚本如下:

注意:CLUSTER_NAME,CHANGEFEED_ID,pd_master,pd_slave这几个变量需要按照实际情况修改。

#!/bin/bash set -e export PATH=/home/cdacs/.tiup/bin:$PATH TIME=`(date "+%Y-%m-%d_%H-%M-%S")` #CHANGEFEED_ID,pd_master,pd_slave are variables CHANGEFEED_ID=test-cdc-bak ##pd_master is up stream cdc pd info. pd_master=IP:2379 #pd_slave is down stream cdc pd info. pd_slave=IP:2379 ticdc_dir=`ps -ef |grep tikv-[c]dc |awk -F" " {print $(NF)}|awk -F"log" {print $1}` logpath="${ticdc_dir}log" scripts_log="$logpath/create_changefeed_$TIME.log" #cdcstate=`${ticdc_dir}bin/tikv-cdc cli changefeed --pd "http://$pd_master" list |grep -E "state" |awk -F: {print $2}|awk -F, {print $1}|sed s/[^a-z]//g` ##new add 20230106 tikvcdcstate() { cdcstate=`${ticdc_dir}bin/tikv-cdc cli changefeed query -s --changefeed-id $CHANGEFEED_ID --pd http://$pd_master|grep -E "state" |awk -F: {print $2} |awk -F, {print $1} |sed s/[^a-z]//g` echo "tikvcdc replication state check." if [ $cdcstate = normal ];then echo "tikvcdc replication state is normal." else echo "tikvcdc replication state is $cdcstate." exit 1 fi } cdc_change(){ echo "cdc changefeed-id start create." ${ticdc_dir}bin/tikv-cdc cli changefeed create --pd"http://$pd_master" --sink-uri tikv://$pd_slave --changefeed-id $CHANGEFEED_ID if [ -eq 0 ]; then echo "$TIME Create changefeed-id successed!!!" tikvcdcstate else echo "$TIME Create changefeed-id failed!!!" exit 1 fi } cdc_change | tee ${scripts_log}5.2.5 观察日志输出

观察日志${scripts_log}有无报错,若无报错,则说明执行成功。

六、总结

1、两地三中心架构给业务连续性提供了强有力的保障;

2、定期进行灾备切换演练。

版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。

上一篇:TiKV 缩容难题的解决方法
下一篇:TiKV主要内存结构与OOM排查总结
相关文章