Data Migration High-Availability Drill Guide

User submission, 2024-02-22



To make sure DM runs stably in production, we plan to drill its high-availability mechanisms. The drill covers the following items:


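Item: dm-worker HA
Checkpoints: verify a dm-worker failure
    Whether the replication task is transferred to another dm-worker
    State of the replication task (delay, status, and so on)
    Whether the failed dm-worker starts automatically and rejoins the cluster once it is brought back up
Steps: see below
Conclusion: see below

Item: dm-master HA
Checkpoints: verify a dm-master leader failure
    Whether a new leader is elected normally
    State of the replication tasks during the election (delay, status, and so on)
    Whether dm-master starts automatically and rejoins the cluster once its host is brought back up
Steps: see below
Conclusion: see below

Item: Rolling upgrade
Checkpoints: upgrade DM to v2.0.6
    Whether a new leader is elected normally
    State of the replication tasks
Steps: see below
Conclusion: see below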

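Steps and Conclusions

dm-worker HA

Simulate a dm-worker failure

date; kill -9 <pid>; mv <deploy dir> <deploy dir>-1 # force-kill the dm-worker process, then rename its deploy directory so it cannot restart automatically

Observe the task failover

Record the relevant data: failover time, task status, and replication delay

Conclusion:

Is the replication task transferred?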

[2021/08/17 13:28:04.712 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
...
[2021/08/17 13:28:51.576 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
[2021/08/17 13:28:54.876 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
[2021/08/17 13:28:57.913 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
[2021/08/17 13:28:58.159 +08:00] [INFO] [keepalive.go:216] ["receive dm-worker keep alive event"] [operation=DELETE] [kv=/dm-worker/a/646d2d3137322e31372e3230312e3131352d38323632]
[2021/08/17 13:28:58.163 +08:00] [INFO] [scheduler.go:1506] ["receive worker status change event"] [component=scheduler] [delete=true] [event="{\"worker-name\":\"dm-172.17.201.115-8262\",\"join-time\":\"0001-01-01T00:00:00Z\"}"]
[2021/08/17 13:28:58.165 +08:00] [INFO] [scheduler.go:1662] ["unbound the worker for source"] [component=scheduler] [bound="{\"source\":\"ds-mysql_report\",\"worker\":\"dm-172.17.201.115-8262\"}"] [event="{\"worker-name\":\"dm-172.17.201.115-8262\",\"join-time\":\"0001-01-01T00:00:00Z\"}"]
[2021/08/17 13:28:58.165 +08:00] [INFO] [scheduler.go:1838] ["found free worker when source bound"] [component=scheduler] [worker=dm-172.18.78.254-8265] [source=ds-mysql_report]
[2021/08/17 13:28:58.168 +08:00] [INFO] [scheduler.go:1876] ["bound the source to worker"] [component=scheduler] [bound="{\"source\":\"ds-mysql_report\",\"worker\":\"dm-172.18.78.254-8265\"}"]

After roughly 60 s, a new dm-worker took over the replication task, and query-status showed the replication state as normal.
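The takeover time can also be read off the dm-master log timestamps. The following is a minimal sketch, assuming the first "connection refused" warning in the excerpt is a reasonable stand-in for the moment the worker was killed and that the "bound the source to worker" entry marks the takeover. The helper name `parse_ts` is ours and is not part of DM.

```python
from datetime import datetime


def parse_ts(log_line: str) -> datetime:
    """Parse the leading "[YYYY/MM/DD HH:MM:SS.mmm +08:00]" field of a DM log line."""
    stamp = log_line[1:log_line.index("]")]
    return datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S.%f %z")


# Two entries from the dm-master log above, shortened to the fields used here.
first_refused = '[2021/08/17 13:28:04.712 +08:00] [WARN] [grpclog.go:60]'
rebound = '[2021/08/17 13:28:58.168 +08:00] [INFO] [scheduler.go:1876] ["bound the source to worker"]'

elapsed = (parse_ts(rebound) - parse_ts(first_refused)).total_seconds()
print(f"takeover took about {elapsed:.3f} s")
```

For these two entries the difference is 53.456 s, which is in line with the "roughly 60 s" observed during the drill. The real kill time is whatever `date` printed in the step above, so that value should be preferred when it is available.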

State of the replication task

[2021/08/17 13:28:58.168 +08:00] [INFO] [server.go:581] ["receive source bound"] [bound="{\"source\":\"ds-mysql_report\",\"worker\":\"dm-172.18.78.254-8265\"}"] ["is deleted"=false]
[2021/08/17 13:28:58.170 +08:00] [WARN] [task.go:826] ["session variable time_zone is overwritten by default UTC timezone."] [time_zone=+00:00]
[2021/08/17 13:28:58.170 +08:00] [INFO] [server.go:836] ["will start a new worker"] [sourceID=ds-mysql_report]
[2021/08/17 13:28:58.170 +08:00] [INFO] [worker.go:120] [initialized] [component="worker controller"] [cfg="{\"enable-gtid\":true,\"auto-fix-gtid\":false,\"relay-dir\":\"relay-dir\",\"meta-dir\":\"\",\"flavor\":\"mysql\",\"charset\":\"\",\"enable-relay\":false,\"relay-binlog-name\":\"\",\"relay-binlog-gtid\":\"\",\"source-id\":\"ds-mysql_report\",\"from\":{\"host\":\"172.16.150.53\",\"port\":15381,\"user\":\"dm_sync\",\"max-allowed-packet\":null,\"session\":{\"time_zone\":\"+00:00\"},\"security\":null},\"purge\":{\"interval\":3600,\"expires\":0,\"remain-space\":15},\"checker\":{\"check-enable\":true,\"backoff-rollback\":{\"Duration\":\"5m0s\"},\"backoff-max\":{\"Duration\":\"5m0s\"}},\"server-id\":429548349,\"case-sensitive\":false,\"filters\":null}"]
[2021/08/17 13:28:58.170 +08:00] [INFO] [worker.go:135] ["start running"] [component="worker controller"]
[2021/08/17 13:28:58.270 +08:00] [INFO] [worker.go:310] ["enter EnableHandleSubtasks"] [component="worker controller"]
[2021/08/17 13:28:58.272 +08:00] [WARN] [task.go:826] ["session variable time_zone is overwritten by default UTC timezone."] [time_zone=+00:00]
[2021/08/17 13:28:58.272 +08:00] [WARN] [task.go:826] ["session variable time_zone is overwritten by default UTC timezone."] [time_zone=+00:00]
[2021/08/17 13:28:58.273 +08:00] [INFO] [worker.go:326] ["starting to handle mysql source"] [component="worker controller"] [sourceCfg="{\"enable-gtid\":true,\"auto-fix-gtid\":false,\"relay-dir\":\"relay-dir\",\"meta-dir\":\"\",\"flavor\":\"mysql\",\"charset\":\"\",\"enable-relay\":false,\"relay-binlog-name\":\"\",\"relay-binlog-gtid\":\"\",\"source-id\":\"ds-mysql_report\",\"from\":{\"host\":\"172.16.150.53\",\"port\":15381,\"user\":\"dm_sync\",\"max-allowed-packet\":null,\"session\":{\"time_zone\":\"+00:00\"},\"security\":null},\"purge\":{\"interval\":3600,\"expires\":0,\"remain-space\":15},\"checker\":{\"check-enable\":true,\"backoff-rollback\":{\"Duration\":\"5m0s\"},\"backoff-max\":{\"Duration\":\"5m0s\"}},\"server-id\":429548349,\"case-sensitive\":false,\"filters\":null}"] [subTasks="{\"dm-mysql_report\":{\"is-sharding\":false,\"shard-mode\":\"\",\"online-ddl-scheme\":\"gh-ost\",\"case-sensitive\":false,\"name\":\"dm-mysql_report\",\"mode\":\"incremental\",\"ignore-checking-items\":[\"dump_privilege\"],\"source-id\":\"ds-mysql_report\",\"server-id\":429548349,\"flavor\":\"mysql\",\"meta-schema\":\"dm_meta\",\"heartbeat-update-interval\":1,\"heartbeat-report-interval\":10,\"enable-heartbeat\":false,\"meta\":{\"BinLogName\":\"\",\"BinLogPos\":0,\"BinLogGTID\":\"34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-168290280,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207\"},\"timezone\":\"\",\"relay-dir\":\"relay-dir\",\"use-relay\":false,\"from\":{\"host\":\"172.16.150.53\",\"port\":15381,\"user\":\"dm_sync\",\"max-allowed-packet\":null,\"session\":{\"time_zone\":\"+00:00\"},\"security\":null},\"to\":{\"host\":\"172.21.35.233\",\"port\":15381,\"user\":\"dm_load\",\"max-allowed-packet\":null,\"session\":{\"tidb_txn_mode\":\"optimistic\",\"time_zone\":\"+00:00\"},\"security\":null},\"route-rules\":[{\"schema-pattern\":\"reverse_flow\",\"table-pattern\":\"\",\"target-schema\":\"reverse_center\",\"target-table\":\"\"}],\"filter-rules\":[],\"mapping-rule\":[],\"black-white-list\":null,\"block-allow-list\":{\"do-tables\":[{\"db-name\":\"reverse_flow\",\"tbl-name\":\"rc_reverse_record_integration\"}],\"do-dbs\":[\"reverse_flow\"],\"ignore-tables\":null,\"ignore-dbs\":null},\"mydumper-path\":\"./bin/mydumper\",\"threads\":1,\"chunk-filesize\":\"64\",\"statement-size\":0,\"rows\":1000,\"where\":\"\",\"skip-tz-utc\":true,\"extra-args\":\"--consistency none\",\"pool-size\":8,\"dir\":\"./dm-mysql_report.dm-mysql_report\",\"meta-file\":\"\",\"worker-count\":128,\"batch\":100,\"queue-size\":1024,\"checkpoint-flush-interval\":30,\"max-retry\":0,\"auto-fix-gtid\":false,\"enable-gtid\":true,\"disable-detect\":false,\"safe-mode\":false,\"enable-ansi-quotes\":false,\"log-level\":\"\",\"log-file\":\"\",\"log-format\":\"\",\"log-rotate\":\"\",\"pprof-addr\":\"\",\"status-addr\":\"\",\"config-file\":\"\",\"clean-dump-file\":false,\"ansi-quotes\":false}}"]
[2021/08/17 13:28:58.273 +08:00] [INFO] [worker.go:333] ["start to create subtask"] [component="worker controller"] [sourceID=ds-mysql_report] [task=dm-mysql_report]
[2021/08/17 13:28:58.273 +08:00] [INFO] [worker.go:426] ["subtask created"] [component="worker controller"] [config="{\"is-sharding\":false,\"shard-mode\":\"\",\"online-ddl-scheme\":\"gh-ost\",\"case-sensitive\":false,\"name\":\"dm-mysql_report\",\"mode\":\"incremental\",\"ignore-checking-items\":[\"dump_privilege\"],\"source-id\":\"ds-mysql_report\",\"server-id\":429548349,\"flavor\":\"mysql\",\"meta-schema\":\"dm_meta\",\"heartbeat-update-interval\":1,\"heartbeat-report-interval\":10,\"enable-heartbeat\":false,\"meta\":{\"BinLogName\":\"\",\"BinLogPos\":0,\"BinLogGTID\":\"34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-168290280,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207\"},\"timezone\":\"\",\"relay-dir\":\"relay-dir\",\"use-relay\":false,\"from\":{\"host\":\"172.16.150.53\",\"port\":15381,\"user\":\"dm_sync\",\"max-allowed-packet\":null,\"session\":{\"time_zone\":\"+00:00\"},\"security\":null},\"to\":{\"host\":\"172.21.35.233\",\"port\":15381,\"user\":\"dm_load\",\"max-allowed-packet\":null,\"session\":{\"tidb_txn_mode\":\"optimistic\",\"time_zone\":\"+00:00\"},\"security\":null},\"route-rules\":[{\"schema-pattern\":\"reverse_flow\",\"table-pattern\":\"\",\"target-schema\":\"reverse_center\",\"target-table\":\"\"}],\"filter-rules\":[],\"mapping-rule\":[],\"black-white-list\":null,\"block-allow-list\":{\"do-tables\":[{\"db-name\":\"reverse_flow\",\"tbl-name\":\"rc_reverse_record_integration\"}],\"do-dbs\":[\"reverse_flow\"],\"ignore-tables\":null,\"ignore-dbs\":null},\"mydumper-path\":\"./bin/mydumper\",\"threads\":1,\"chunk-filesize\":\"64\",\"statement-size\":0,\"rows\":1000,\"where\":\"\",\"skip-tz-utc\":true,\"extra-args\":\"--consistency none\",\"pool-size\":8,\"dir\":\"./dm-mysql_report.dm-mysql_report\",\"meta-file\":\"\",\"worker-count\":128,\"batch\":100,\"queue-size\":1024,\"checkpoint-flush-interval\":30,\"max-retry\":0,\"auto-fix-gtid\":false,\"enable-gtid\":true,\"disable-detect\":false,\"safe-mode\":false,\"enable-ansi-quotes\":false,\"log-level\":\"\",\"log-file\":\"\",\"log-format\":\"\",\"log-rotate\":\"\",\"pprof-addr\":\"\",\"status-addr\":\"\",\"config-file\":\"\",\"clean-dump-file\":false,\"ansi-quotes\":false}"]
[2021/08/17 13:28:58.273 +08:00] [INFO] [syncer.go:3024] ["use timezone"] [task=dm-mysql_report] [unit="binlog replication"] [location=UTC]
[2021/08/17 13:28:58.891 +08:00] [INFO] [config.go:599] ["detect server type"] [task=dm-mysql_report] [unit="binlog replication"] [scope=upstream] [type=MySQL]
[2021/08/17 13:28:58.891 +08:00] [INFO] [config.go:618] ["detect server version"] [task=dm-mysql_report] [unit="binlog replication"] [scope=upstream] [version=5.7.20-log]
[2021/08/17 13:28:58.894 +08:00] [INFO] [config.go:599] ["detect server type"] [task=dm-mysql_report] [unit="binlog replication"] [scope=downstream] [type=TiDB]
[2021/08/17 13:28:58.894 +08:00] [INFO] [config.go:618] ["detect server version"] [task=dm-mysql_report] [unit="binlog replication"] [scope=downstream] [version=4.0.13]
[2021/08/17 13:28:59.422 +08:00] [INFO] [checkpoint.go:699] ["create checkpoint schema"] [task=dm-mysql_report] [unit="binlog replication"] [component="remote checkpoint"] [statement="CREATE SCHEMA IF NOT EXISTS `dm_meta`"]
[2021/08/17 13:28:59.426 +08:00] [INFO] [checkpoint.go:723] ["create checkpoint table"] [task=dm-mysql_report] [unit="binlog replication"] [component="remote checkpoint"] [statements="[\"CREATE TABLE IF NOT EXISTS `dm_meta`.`dm-mysql_report_syncer_checkpoint` (\\n\\t\\t\\tid VARCHAR(32) NOT NULL,\\n\\t\\t\\tcp_schema VARCHAR(128) NOT NULL,\\n\\t\\t\\tcp_table VARCHAR(128) NOT NULL,\\n\\t\\t\\tbinlog_name VARCHAR(128),\\n\\t\\t\\tbinlog_pos INT UNSIGNED,\\n\\t\\t\\tbinlog_gtid TEXT,\\n\\t\\t\\texit_safe_binlog_name VARCHAR(128) DEFAULT ,\\n\\t\\t\\texit_safe_binlog_pos INT UNSIGNED DEFAULT 0,\\n\\t\\t\\texit_safe_binlog_gtid TEXT,\\n\\t\\t\\ttable_info JSON NOT NULL,\\n\\t\\t\\tis_global BOOLEAN,\\n\\t\\t\\tcreate_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,\\n\\t\\t\\tupdate_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\\n\\t\\t\\tUNIQUE KEY uk_id_schema_table (id, cp_schema, cp_table)\\n\\t\\t)\"]"]
[2021/08/17 13:28:59.429 +08:00] [INFO] [checkpoint.go:785] ["fetch global checkpoint from DB"] [task=dm-mysql_report] [unit="binlog replication"] [component="remote checkpoint"] ["global checkpoint"="position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207(flushed position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207)"]
[2021/08/17 13:28:59.431 +08:00] [INFO] [subtask.go:226] ["start to run"] [subtask=dm-mysql_report] [unit=Sync]
[2021/08/17 13:28:59.431 +08:00] [INFO] [worker.go:351] ["handling subtask enabled"] [component="worker controller"]
[2021/08/17 13:28:59.432 +08:00] [INFO] [syncer.go:1342] ["replicate binlog from checkpoint"] [task=dm-mysql_report] [unit="binlog replication"] [checkpoint="position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207"]
[2021/08/17 13:28:59.440 +08:00] [INFO] [streamer_controller.go:72] ["last slave connection"] [task=dm-mysql_report] [unit="binlog replication"] ["connection ID"=31610609]
[2021/08/17 13:28:59.440 +08:00] [INFO] [mode.go:100] ["change count"] [task=dm-mysql_report] [unit="binlog replication"] ["previous count"=0] ["new count"=0]
[2021/08/17 13:28:59.440 +08:00] [INFO] [mode.go:100] ["change count"] [task=dm-mysql_report] [unit="binlog replication"] ["previous count"=0] ["new count"=1]
[2021/08/17 13:28:59.440 +08:00] [INFO] [mode.go:59] ["enable safe-mode because of task initialization"] [task=dm-mysql_report] [unit="binlog replication"] ["duration in seconds"=60]
[2021/08/17 13:29:00.075 +08:00] [INFO] [syncer.go:1690] ["meet heartbeat event and then flush jobs"] [task=dm-mysql_report] [unit="binlog replication"]
[2021/08/17 13:29:00.075 +08:00] [INFO] [syncer.go:2746] ["flush all jobs"] [task=dm-mysql_report] [unit="binlog replication"] ["global checkpoint"="position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207(flushed position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207)"]
[2021/08/17 13:29:00.080 +08:00] [INFO] [syncer.go:1003] ["flushed checkpoint"] [task=dm-mysql_report] [unit="binlog replication"] [checkpoint="position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207(flushed position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207)"]
[2021/08/17 13:29:13.098 +08:00] [INFO] [server.go:753] [request=QueryStatus] [payload="name:\"dm-mysql_report\" "]
[2021/08/17 13:29:13.098 +08:00] [INFO] [worker.go:509] ["will open a connection to get master status"] [component="worker controller"] ["upstream config"="{\"host\":\"172.16.150.53\",\"port\":15381,\"user\":\"dm_sync\",\"max-allowed-packet\":null,\"session\":{\"time_zone\":\"+00:00\"},\"security\":null}"]
[2021/08/17 13:29:29.443 +08:00] [INFO] [syncer.go:2627] ["binlog replication progress"] [task=dm-mysql_report] [unit="binlog replication"] ["total binlog size"=12632410] ["last binlog size"=0] ["cost time"=30] [bytes/Second=421080] ["unsynced binlog size"=0] ["estimate time to catch up"=0]

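After the new dm-worker took over, the replication task ran normally. Because the switchover takes about 60 s, the replication delay during a failover is at least about 60 s.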
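One way to check the task state after the takeover is to parse the JSON that `query-status` prints. The following is a rough sketch only. The sample payload is hand-written for illustration and was not captured during this drill, and the field names (`sources`, `sourceStatus`, `subTaskStatus`, `stage`, `sync.secondsBehindMaster`) are assumptions about the dmctl output that should be checked against real output from your DM version. The function name `summarize` is hypothetical.

```python
import json

# Hypothetical query-status payload; field names are assumptions, not captured output.
SAMPLE = """
{
  "result": true,
  "sources": [
    {
      "sourceStatus": {"source": "ds-mysql_report", "worker": "dm-172.18.78.254-8265"},
      "subTaskStatus": [
        {"name": "dm-mysql_report", "stage": "Running", "unit": "Sync",
         "sync": {"synced": true, "secondsBehindMaster": "0"}}
      ]
    }
  ]
}
"""


def summarize(status_json: str) -> list[tuple[str, str, str, int]]:
    """Return (source, worker, stage, lag in seconds) for every subtask; lag is -1 if absent."""
    rows = []
    for src in json.loads(status_json).get("sources", []):
        info = src.get("sourceStatus", {})
        for sub in src.get("subTaskStatus", []):
            lag = int(sub.get("sync", {}).get("secondsBehindMaster", -1))
            rows.append((info.get("source", ""), info.get("worker", ""), sub.get("stage", ""), lag))
    return rows


for row in summarize(SAMPLE):
    print(row)
```

Running a check like this every few seconds during the drill gives a simple record of which worker holds the source, the task stage, and the reported lag.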

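After the failed dm-worker is brought back up, does it start automatically and rejoin the cluster?

Yes. Once the process is started again it registers with dm-master and rejoins the cluster automatically. In the meantime the dm-master leader keeps trying to reach the failed dm-worker.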

[2021/08/17 13:30:28.796 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
[2021/08/17 13:30:31.625 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
[2021/08/17 13:30:35.190 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
[2021/08/17 13:30:37.523 +08:00] [INFO] [server.go:2206] [payload="name:\"dm-172.17.201.115-8262\" address:\"172.17.201.115:8262\" "] [request=RegisterWorker]
[2021/08/17 13:30:37.523 +08:00] [WARN] [scheduler.go:836] ["add the same worker again"] [component=scheduler] ["worker info"="{\"name\":\"dm-172.17.201.115-8262\",\"addr\":\"172.17.201.115:8262\"}"]
[2021/08/17 13:30:37.523 +08:00] [INFO] [server.go:309] ["register worker successfully"] [name=dm-172.17.201.115-8262] [address=172.17.201.115:8262]
[2021/08/17 13:30:37.529 +08:00] [INFO] [keepalive.go:216] ["receive dm-worker keep alive event"] [operation=PUT] [kv=/dm-worker/a/646d2d3137322e31372e3230312e3131352d38323632]
[2021/08/17 13:30:37.529 +08:00] [INFO] [scheduler.go:1506] ["receive worker status change event"] [component=scheduler] [delete=false] [event="{\"worker-name\":\"dm-172.17.201.115-8262\",\"join-time\":\"2021-08-17T13:30:37.524837339+08:00\"}"]
[2021/08/17 13:30:37.529 +08:00] [INFO] [scheduler.go:1739] ["no unbound sources need to bound"] [component=scheduler] [worker="{\"name\":\"dm-172.17.201.115-8262\",\"addr\":\"172.17.201.115:8262\"}"]
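The keepalive entries identify the worker by a hex string in the etcd key. The sketch below decodes that string and measures how long the worker was missing from dm-master's point of view, using the DELETE and PUT keepalive events from the logs above. That the last key segment is the hex-encoded worker name is inferred from these log excerpts (the decoded value matches the `worker-name` field in the neighbouring entries) and not taken from DM documentation. `decode_worker_key` is a hypothetical helper name.

```python
from datetime import datetime


def decode_worker_key(etcd_key: str) -> str:
    """Decode the hex-encoded last path segment of a keepalive key."""
    return bytes.fromhex(etcd_key.rsplit("/", 1)[1]).decode("utf-8")


key = "/dm-worker/a/646d2d3137322e31372e3230312e3131352d38323632"
worker = decode_worker_key(key)

fmt = "%Y/%m/%d %H:%M:%S.%f %z"
deleted_at = datetime.strptime("2021/08/17 13:28:58.159 +08:00", fmt)  # operation=DELETE
put_at = datetime.strptime("2021/08/17 13:30:37.529 +08:00", fmt)      # operation=PUT
absent = (put_at - deleted_at).total_seconds()

print(f"{worker} was absent for {absent:.3f} s")
```

For this drill the key decodes to dm-172.17.201.115-8262, and the gap between the two keepalive events is 99.370 s. When the worker returned, the scheduler logged "no unbound sources need to bound", so the source stayed on the worker that had taken it over.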

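dm-master HA

Simulate a dm-master failure

date; kill -9 <pid>; mv <deploy dir> <deploy dir>-1 # force-kill the dm-master process, then rename its deploy directory so it cannot restart automatically

Observe the leader switchover

Record the relevant data: leader switchover time, status of all tasks, and replication delay

Conclusion:

Is a new leader elected normally?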

[2021/08/17 14:17:36.240 +08:00] [WARN] [stream.go:436] ["lost TCP streaming connection with remote peer"] [component="embed etcd"] [stream-reader-type="stream MsgApp v2"] [local-member-id=db326cb7fba547f5] [remote-peer-id=201495974e8233cd] [error="unexpected EOF"]
[2021/08/17 14:17:36.240 +08:00] [WARN] [peer_status.go:68] ["peer became inactive (message send to peer failed)"] [component="embed etcd"] [peer-id=201495974e8233cd] [error="failed to read 201495974e8233cd on stream MsgApp v2 (unexpected EOF)"]
[2021/08/17 14:17:36.240 +08:00] [WARN] [stream.go:436] ["lost TCP streaming connection with remote peer"] [component="embed etcd"] [stream-reader-type="stream Message"] [local-member-id=db326cb7fba547f5] [remote-peer-id=201495974e8233cd] [error="unexpected EOF"]
[2021/08/17 14:17:36.241 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
[2021/08/17 14:17:37.241 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
[2021/08/17 14:17:38.097 +08:00] [WARN] [stream.go:193] ["lost TCP streaming connection with remote peer"] [component="embed etcd"] [stream-writer-type="stream Message"] [local-member-id=db326cb7fba547f5] [remote-peer-id=201495974e8233cd]
[2021/08/17 14:17:38.855 +08:00] [WARN] [util.go:163] ["apply request took too long"] [component="embed etcd"] [took=2.096232825s] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/dm-master/bound-worker/646d2d3137322e31372e3230312e3131362d38323634\" "] [response=] [error="etcdserver: leader changed"]
[2021/08/17 14:17:38.857 +08:00] [WARN] [cluster_util.go:315] ["failed to reach the peer URL"] [component="embed etcd"] [address=http://172.17.201.115:8291/version] [remote-member-id=201495974e8233cd] [error="Get http://172.17.201.115:8291/version: dial tcp 172.17.201.115:8291: connect: connection refused"]
[2021/08/17 14:17:38.857 +08:00] [WARN] [cluster_util.go:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get http://172.17.201.115:8291/version: dial tcp 172.17.201.115:8291: connect: connection refused"]
[2021/08/17 14:17:38.958 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
[2021/08/17 14:17:39.763 +08:00] [WARN] [stream.go:193] ["lost TCP streaming connection with remote peer"] [component="embed etcd"] [stream-writer-type="stream MsgApp v2"] [local-member-id=db326cb7fba547f5] [remote-peer-id=201495974e8233cd]
[2021/08/17 14:17:41.719 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
[2021/08/17 14:17:42.859 +08:00] [WARN] [cluster_util.go:315] ["failed to reach the peer URL"] [component="embed etcd"] [address=http://172.17.201.115:8291/version] [remote-member-id=201495974e8233cd] [error="Get http://172.17.201.115:8291/version: dial tcp 172.17.201.115:8291: connect: connection refused"]
[2021/08/17 14:17:42.859 +08:00] [WARN] [cluster_util.go:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get http://172.17.201.115:8291/version: dial tcp 172.17.201.115:8291: connect: connection refused"]
[2021/08/17 14:17:45.053 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused\". Reconnecting..."] [component="embed etcd"]
[2021/08/17 14:17:45.855 +08:00] [WARN] [v3_server.go:746] ["timed out waiting for read index response (local node might have slow network)"] [component="embed etcd"] [timeout=7s]
[2021/08/17 14:17:45.855 +08:00] [WARN] [util.go:163] ["apply request took too long"] [component="embed etcd"] [took=9.033455931s] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/dm-master/bound-worker/646d2d3137322e31372e3230312e3131352d38323633\" "] [response=] [error="etcdserver: request timed out"]
[2021/08/17 14:17:45.855 +08:00] [WARN] [util.go:163] ["apply request took too long"] [component="embed etcd"] [took=9.085679831s] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/dm-master/bound-worker/646d2d3137322e31382e37382e3235342d38323636\" "] [response=] [error="etcdserver: request timed out"]
[2021/08/17 14:17:45.855 +08:00] [WARN] [util.go:163] ["apply request took too long"] [component="embed etcd"] [took=9.085819911s] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/dm-master/bound-worker/646d2d3137322e31382e37382e3235342d38323632\" "] [response=] [error="etcdserver: request timed out"]
[2021/08/17 14:17:45.856 +08:00] [WARN] [util.go:163] ["apply request took too long"] [component="embed etcd"] [took=6.99962841s] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/dm-master/relay-worker/646d2d3137322e31372e3230312e3131362d38323634\" "] [response="range_response_count:0 size:5"] []
[2021/08/17 14:17:46.860 +08:00] [WARN] [cluster_util.go:315] ["failed to reach the peer URL"] [component="embed etcd"] [address=http://172.17.201.115:8291/version] [remote-member-id=201495974e8233cd] [error="Get http://172.17.201.115:8291/version: dial tcp 172.17.201.115:8291: connect: connection refused"]
[2021/08/17 14:17:46.860 +08:00] [WARN] [cluster_util.go:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201
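The log excerpt is cut off before the election completes, so it cannot be used to measure the leader switchover time. What it does show is how long etcd reads stalled on the surviving member. The following is a small sketch that pulls the `took=` values out of the "apply request took too long" warnings. The entries in `EXCERPT` are shortened copies of the lines above with unneeded fields omitted, and `stall_seconds` is a hypothetical helper name.

```python
import re

# Shortened copies of the "apply request took too long" entries from the log above.
EXCERPT = '''
[2021/08/17 14:17:38.855 +08:00] [WARN] [util.go:163] [took=2.096232825s] [error="etcdserver: leader changed"]
[2021/08/17 14:17:45.855 +08:00] [WARN] [util.go:163] [took=9.033455931s] [error="etcdserver: request timed out"]
[2021/08/17 14:17:45.855 +08:00] [WARN] [util.go:163] [took=9.085679831s] [error="etcdserver: request timed out"]
[2021/08/17 14:17:45.855 +08:00] [WARN] [util.go:163] [took=9.085819911s] [error="etcdserver: request timed out"]
[2021/08/17 14:17:45.856 +08:00] [WARN] [util.go:163] [took=6.99962841s]
'''

# Only plain second and millisecond values are handled; other duration forms are ignored.
TOOK = re.compile(r"\[took=([0-9.]+)(ms|s)\]")


def stall_seconds(log_text: str) -> list[float]:
    """Return every took= duration in the text, converted to seconds."""
    return [float(value) / (1000 if unit == "ms" else 1) for value, unit in TOOK.findall(log_text)]


stalls = stall_seconds(EXCERPT)
print(f"{len(stalls)} slow requests, longest {max(stalls):.3f} s")
```

In this excerpt the longest stalled read took about 9.086 s, and the first slow request already carries the error "etcdserver: leader changed".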
