Huang Dongxu Analyzes the Core Advantages of TiDB
2024-03-11
The team had been planning the TiDB version upgrade for a long time, but we kept postponing it because we were not sure we could handle every unexpected problem that might come up.
This article was written at the end of August; I ran across it today while writing a summary and am sharing it as a reference for anyone planning a version upgrade.
Our business clusters hold a large amount of data (tens of TB in a single cluster), and the business is critical, so we follow the principle of not upgrading unless necessary.
Still, as TiDB versions iterated, new features, performance improvements, and fixes for bugs we had actually hit kept nudging us toward the decision to upgrade.
This article records three problems encountered during the planned upgrade of 16 TiDB clusters; the clusters under heavy performance pressure that met the upgrade criteria were upgraded to v6.1.5.
Problem analysis:
Errors reported on the application side:

1. uncategorized SQLException; SQL state [HY000]; error code [1105]; other error for mpp stream: DB::Exception: Exchange receiver meet error : DB::Exception: Exchange receiver meet error : send async stream request fail; nested exception is java.sql.SQLException: other error for mpp stream: DB::Exception: Exchange receiver meet error : DB::Exception: Exchange receiver meet error : send async stream request fail

Error log on the tidb node:

2. [ERROR] 2023-08-23 ****** [http-nio-8080-exec-6] [com.*****.man.web.framework.exception.GlobalErrHandler] [7c25fb69df2800c5] - General exception: [ ### Error querying database. Cause: java.sql.SQLException: TiFlash server timeout ### The error may exist in URL [jar:file:/web/servers/tomcat/webapps/ROOT/WEB-INF/lib/***-man-dao-2.0.0-RELEASE.jar!/mapper/tidb/SheetListTidbMapper.xml] ### The error may involve *****.man.dao.tidb.SheetListTidbMapper.selectTidbPageByQuery-Inline ### The error occurred while setting parameters ### SQL: SELECT ****** ; ### Cause: java.sql.SQLException: TiFlash server timeout ; uncategorized SQLException; SQL state [HY000]; error code [9012]; TiFlash server timeout; nested exception is java.sql.SQLException: TiFlash server timeout] org.springframework.jdbc.UncategorizedSQLException: ### Error querying database. Cause: java.sql.SQLException: TiFlash server timeout ; uncategorized SQLException; SQL state [HY000]; error code [9012]; TiFlash server timeout; nested exception is java.sql.SQLException: TiFlash server timeout

The query plan showed that no suitable index allowed the query to be served from TiKV; it could only go through TiFlash.
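You can confirm this by inspecting the plan yourself. A minimal sketch (the table and column names below are hypothetical stand-ins for the masked SQL above); if every access path in the output is a TableFullScan on the tiflash engine, the statement can only be served by TiFlash:

EXPLAIN SELECT vender_id, COUNT(*)
FROM sheet_list
WHERE created >= '2023-08-01'
GROUP BY vender_id;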
The first error is an MPP bug from older versions that has already been fixed, so in theory it should not appear anymore.
We later confirmed that TiFlash had crashed after the upgrade and could not start normally. So although the cluster had two TiFlash nodes, it was effectively running with a single replica, which is why the errors kept coming.
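Whether the TiFlash replicas are actually serving can be checked from any TiDB node via the standard system table; a quick check, assuming nothing beyond stock TiDB:

-- AVAILABLE = 0 or PROGRESS < 1 means the TiFlash replica is not fully usable,
-- which matches the single-replica situation described above.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica;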
Analysis and handling recommendations:
Upgrading TiFlash from v5.1 to v6.1.5 comes with no guarantee that nothing else will go wrong along the way. On this machine, the recovery phase of the upgrade temporarily needed more memory than the machine's 16 GB, so the upgrade could not complete in place.
When you run a version upgrade: first, make sure the hardware resources are adequate; second, make sure the components being upgraded run with multiple replicas.
That way a failure during the upgrade does not force you to sit through the data migration caused by scaling TiFlash in and back out to resync its data.
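For reference, TiFlash replica counts are set per table, so running with two replicas before an upgrade is a one-statement change; a sketch with a hypothetical database name:

-- Keep two TiFlash replicas so one node can be upgraded while the other keeps serving.
ALTER TABLE mydb.voucher_detail SET TIFLASH REPLICA 2;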
As for why this happened, we did not investigate further. What we did confirm in the end: tiup had shut the TiFlash component down and showed its status as stopped, but systemctl status tiflash-9000 on the server showed it as running, so the component had never actually stopped. This is easy enough to understand, since the component is shut down asynchronously.
When you hit this problem, the fix is simply to go to the TiFlash server and stop the service there.
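Roughly, that manual cleanup looks like the following (the cluster name is hypothetical; the systemd unit name follows tiup's <component>-<port> convention used above):

# On the TiFlash host: confirm the process really is still running, then stop it.
systemctl status tiflash-9000
sudo systemctl stop tiflash-9000
# Back on the control machine, bring the component up again under tiup's control.
tiup cluster start prod-tidb -R tiflash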
At that point the service was simply unavailable, so let's look at the logs of each component.
During troubleshooting we usually grep the logs for errors. The error logs were concentrated on PD, so at first we suspected a PD problem.
Relevant PD logs:
[2023/08/24 22:51:27.992 +08:00] [ERROR] [client.go:847] ["[pd] getTS error"] [dc-location=global] [stream-addr=http://10.16.57.62:2379] [error="[PD:client:ErrClientGetTSO]rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster: rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster"]
[2023/08/24 22:51:27.993 +08:00] [INFO] [base_client.go:275] ["[pd] cannot update member from this address"] [address=http://10.16.57.106:2379] [error="[PD:client:ErrClientGetLeader]get leader from leader address dont exist error"]
[2023/08/24 22:51:27.994 +08:00] [INFO] [base_client.go:275] ["[pd] cannot update member from this address"] [address=http://10.16.57.62:2379] [error="[PD:client:ErrClientGetLeader]get leader from leader address dont exist error"]
[2023/08/24 22:51:27.995 +08:00] [INFO] [base_client.go:275] ["[pd] cannot update member from this address"] [address=http://10.16.57.82:2379] [error="[PD:client:ErrClientGetLeader]get leader from leader address dont exist error"]
[2023/08/24 22:51:27.995 +08:00] [ERROR] [base_client.go:144] ["[pd] failed updateMember"] [error="[PD:client:ErrClientGetLeader]get leader from [http://10.16.57.106:2379 http://10.16.57.62:2379 http://10.16.57.82:2379] error"]
[2023/08/24 22:51:28.265 +08:00] [INFO] [base_client.go:275] ["[pd] cannot update member from this address"] [address=http://10.16.57.62:2379] [error="[PD:client:ErrClientGetLeader]get leader from leader address dont exist error"]
[2023/08/24 22:51:28.266 +08:00] [INFO] [base_client.go:275] ["[pd] cannot update member from this address"] [address=http://10.16.57.82:2379] [error="[PD:client:ErrClientGetLeader]get leader from leader address dont exist error"]
[2023/08/24 22:51:28.266 +08:00] [ERROR] [base_client.go:144] ["[pd] failed updateMember"] [error="[PD:client:ErrClientGetLeader]get leader from [http://10.16.57.106:2379 http://10.16.57.62:2379 http://10.16.57.82:2379] error"]
[2023/08/24 22:51:28.476 +08:00] [ERROR] [client.go:847] ["[pd] getTS error"] [dc-location=global] [stream-addr=http://10.16.57.62:2379] [error="[PD:client:ErrClientGetTSO]rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster: rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster"]
In short, every request failed with an error that no PD leader was available to handle it.
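When everything fails with "no PD leader", the first check is PD membership and leadership. A minimal check with pd-ctl (the address comes from the logs above; the version tag just needs to match the cluster):

tiup ctl:v6.1.5 pd -u http://10.16.57.62:2379 member
tiup ctl:v6.1.5 pd -u http://10.16.57.62:2379 member leader show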
TiDB logs:
[2023/08/24 23:16:29.989 +08:00] [INFO] [base_client.go:275] ["[pd] cannot update member from this address"] [address=http://10.16.57.106:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unknown desc = [PD:server:ErrServerNotStarted]server not started target:10.16.57.106:2379 status:READY: error:rpc error: code = Unknown desc = [PD:server:ErrServerNotStarted]server not started target:10.16.57.106:2379 status:READY"]
[2023/08/24 23:16:29.990 +08:00] [INFO] [base_client.go:275] ["[pd] cannot update member from this address"] [address=http://10.16.57.62:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unknown desc = [PD:server:ErrServerNotStarted]server not started target:10.16.57.62:2379 status:READY: error:rpc error: code = Unknown desc = [PD:server:ErrServerNotStarted]server not started target:10.16.57.62:2379 status:READY"]
[2023/08/24 23:16:29.990 +08:00] [INFO] [base_client.go:275] ["[pd] cannot update member from this address"] [address=http://10.16.57.82:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unknown desc = [PD:server:ErrServerNotStarted]server not started target:10.16.57.82:2379 status:READY: error:rpc error: code = Unknown desc = [PD:server:ErrServerNotStarted]server not started target:10.16.57.82:2379 status:READY"]
[2023/08/25 01:07:49.023 +08:00] [INFO] [printer.go:34] ["Welcome to TiDB."] ["Release Version"=v6.1.5] [Edition=Community] ["Git Commit Hash"=73d82e330b02a39f74073d98daefbadd7deab9b9] ["Git Branch"=heads/refs/tags/v6.1.5] ["UTC Build Time"="2023-02-22 06:21:24"] [GoVersion=go1.19.5] ["Race Enabled"=false] ["Check Table Before Drop"=false] ["TiKV Min Version"=v3.0.0-60965b006877ca7234adaced7890d7b029ed1306]
[2023/08/25 01:07:49.024 +08:00] [INFO] [printer.go:48] ["loaded config"] [config="{\"host\":\"0.0.0.0\",\"advertise-address\":\"10.16.57.82\",\"port\":4000,\"cors\":\"\",\"store\":\"tikv\",\"path\":\"10.16.57.82:2379,10.16.57.106:2379,10.16.57.62:2379\",\"socket\":\"/tmp/tidb-4000.sock\",\"lease\":\"45s\",\"run-ddl\":true,\"split-table\":true,\"token-limit\":1000,\"oom-use-tmp-storage\":true,\"tmp-storage-path\":\"/tmp/0_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage\",\"tmp-storage-quota\":-1,\"server-version\":\"\",\"version-comment\":\"\",\"tidb-edition\":\"\",\"tidb-release-version\":\"\",\"log\":{\"level\":\"info\",\"format\":\"text\",\"disable-timestamp\":null,\"enable-timestamp\":null,\"disable-error-stack\":null,\"enable-error-stack\":null,\"file\":{\"filename\":\"/data/tidb-deploy/tidb-4000/log/tidb.log\",\"max-size\":300,\"max-days\":0,\"max-backups\":0},\"enable-slow-log\":true,\"slow-query-file\":\"/data/tidb-deploy/tidb-4000/log/tidb_slow_query.log\",\"slow-threshold\":1000,\"expensive-threshold\":10000,\"record-plan-in-slow-log\":1,\"query-log-max-len\":4096},\"instance\":{\"tidb_general_log\":false,\"tidb_pprof_sql_cpu\":false,\"ddl_slow_threshold\":300,\"tidb_expensive_query_time_threshold\":60,\"tidb_enable_slow_log\":true,\"tidb_slow_log_threshold\":1000,\"tidb_record_plan_in_slow_log\":1,\"tidb_check_mb4_value_in_utf8\":true,\"tidb_force_priority\":\"NO_PRIORITY\",\"tidb_memory_usage_alarm_ratio\":0.8,\"tidb_enable_collect_execution_info\":true,\"plugin_dir\":\"/data/deploy/plugin\",\"plugin_load\":\"\"},\"security\":{\"skip-grant-table\":false,\"ssl-ca\":\"\",\"ssl-cert\":\"\",\"ssl-key\":\"\",\"cluster-ssl-ca\":\"\",\"cluster-ssl-cert\":\"\",\"cluster-ssl-key\":\"\",\"cluster-verify-cn\":null,\"spilled-file-encryption-method\":\"plaintext\",\"enable-sem\":false,\"auto-tls\":false,\"tls-version\":\"\",\"rsa-key-size\":4096,\"secure-bootstrap\":false},\"status\":{\"status-host\":\"0.0.0.0\",\"metrics-addr\":\"\",\"status-port\":10080,\"metrics-interval\":15,\"report-status\":true,\"record-db-qps\":false,\"grpc-keepalive-time\":10,\"grpc-keepalive-timeout\":3,\"grpc-concurrent-streams\":1024,\"grpc-initial-window-size\":2097152,\"grpc-max-send-msg-size\":2147483647},\"performance\":{\"max-procs\":0,\"max-memory\":0,\"server-memory-quota\":0,\"memory-usage-alarm-ratio\":0.8,\"stats-lease\":\"3s\",\"stmt-count-limit\":5000,\"feedback-probability\":0,\"query-feedback-limit\":512,\"pseudo-estimate-ratio\":0.8,\"force-priority\":\"NO_PRIORITY\",\"bind-info-lease\":\"3s\",\"txn-entry-size-limit\":6291456,\"txn-total-size-limit\":104857600,\"tcp-keep-alive\":true,\"tcp-no-delay\":true,\"cross-join\":true,\"distinct-agg-push-down\":false,\"projection-push-down\":false,\"max-txn-ttl\":3600000,\"index-usage-sync-lease\":\"0s\",\"plan-replayer-gc-lease\":\"10m\",\"gogc\":100,\"enforce-mpp\":false,\"stats-load-concurrency\":5,\"stats-load-queue-size\":1000,\"enable-stats-cache-mem-quota\":false,\"committer-concurrency\":128,\"run-auto-analyze\":true},\"prepared-plan-cache\":{\"enabled\":true,\"capacity\":100,\"memory-guard-ratio\":0.1},\"opentracing\":{\"enable\":false,\"rpc-metrics\":false,\"sampler\":{\"type\":\"const\",\"param\":1,\"sampling-server-url\":\"\",\"max-operations\":0,\"sampling-refresh-interval\":0},\"reporter\":{\"queue-size\":0,\"buffer-flush-interval\":0,\"log-spans\":false,\"local-agent-host-port\":\"\"}},\"proxy-protocol\":{\"networks\":\"\",\"header-timeout\":5},\"pd-client\":{\"pd-server-timeout\":3},\"tikv-client\":{\"grpc-connection-count\":4,\"grpc-keepalive-time\":10,\"grpc-keepalive-timeout\":3,\"grpc-compression-type\":\"none\",\"commit-timeout\":\"41s\",\"async-commit\":{\"keys-limit\":256,\"total-key-size-limit\":4096,\"safe-window\":2000000000,\"allowed-clock-drift\":500000000},\"max-batch-size\":128,\"overload-threshold\":200,\"max-batch-wait-time\":0,\"batch-wait-size\":8,\"enable-chunk-rpc\":true,\"region-cache-ttl\":600,\"store-limit\":0,\"store-liveness-timeout\":\"1s\",\"copr-cache\":{\"capacity-mb\":1000},\"ttl-refreshed-txn-size\":33554432,\"resolve-lock-lite-threshold\":16},\"binlog\":{\"enable\":false,\"ignore-error\":false,\"write-timeout\":\"15s\",\"binlog-socket\":\"\",\"strategy\":\"range\"},\"compatible-kill-query\":false,\"plugin\":{\"dir\":\"/data/deploy/plugin\",\"load\":\"\"},\"pessimistic-txn\":{\"max-retry-count\":256,\"deadlock-history-capacity\":10,\"deadlock-history-collect-retryable\":false,\"pessimistic-auto-commit\":false},\"check-mb4-value-in-utf8\":true,\"max-index-length\":3072,\"index-limit\":64,\"table-column-count-limit\":1017,\"graceful-wait-before-shutdown\":0,\"alter-primary-key\":false,\"treat-old-version-utf8-as-utf8mb4\":true,\"enable-table-lock\":false,\"delay-clean-table-lock\":0,\"split-region-max-num\":1000,\"top-sql\":{\"receiver-address\":\"\"},\"repair-mode\":false,\"repair-table-list\":[],\"isolation-read\":{\"engines\":[\"tikv\",\"tiflash\",\"tidb\"]},\"max-server-connections\":1000,\"new_collations_enabled_on_first_bootstrap\":true,\"experimental\":{\"allow-expression-index\":false},\"enable-collect-execution-info\":true,\"skip-register-to-dashboard\":false,\"enable-telemetry\":false,\"labels\":{},\"enable-global-index\":false,\"deprecate-integer-display-length\":false,\"enable-enum-length-limit\":true,\"stores-refresh-interval\":60,\"enable-tcp4-only\":false,\"enable-forwarding\":false,\"max-ballast-object-size\":0,\"ballast-object-size\":0,\"enable-global-kill\":true,\"enable-batch-dml\":false,\"mem-quota-query\":2147483648,\"oom-action\":\"cancel\"}"]
[2023/08/25 01:07:49.024 +08:00] [INFO] [main.go:343] ["disable Prometheus push client"]
The logs here show that the TiDB node loaded its configuration and started normally, yet it never opened port 4000 to serve traffic.
[2023/08/25 01:07:51.410 +08:00] [INFO] [session.go:3311] ["CRUCIAL OPERATION"] [conn=0] [schemaVersion=3242] [cur_db=] [sql="ALTER TABLE mysql.stats_meta ADD COLUMN snapshot BIGINT(64) UNSIGNED NOT NULL DEFAULT 0"] [user=]
[2023/08/25 01:07:51.442 +08:00] [INFO] [ddl_worker.go:339] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:4591, Type:add column, State:queueing, SchemaState:none, SchemaID:3, TableID:21, RowCount:0, ArgLen:3, start time: 2023-08-25 01:07:51.419 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0; "]
[2023/08/25 01:07:51.444 +08:00] [INFO] [ddl.go:632] ["[ddl] start DDL job"] [job="ID:4591, Type:add column, State:queueing, SchemaState:none, SchemaID:3, TableID:21, RowCount:0, ArgLen:3, start time: 2023-08-25 01:07:51.419 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [query="ALTER TABLE mysql.stats_meta ADD COLUMN snapshot BIGINT(64) UNSIGNED NOT NULL DEFAULT 0"]
We suspected the upgrade had not executed properly. Since the upgrade was run according to plan and we had not executed any DDL on TiDB ourselves, we never suspected that DDL would be the key to the problem. But the crux often lies exactly where you are not looking: we checked TiKV and it was healthy, which indirectly proved that PD should also be fine, since otherwise the multi-replica storage service could not have been working either.
Turning back to TiDB itself, we suddenly noticed a physical add-index DDL that had been replicated into this TiDB cluster from MySQL.
The picture became clear. MySQL is sharded into multiple databases and tables and keeps only about six months of hot data, so the DDL there finished quickly. TiDB aggregates the full history into a single table, and the same physical add-index DDL was replicated to TiDB automatically, where it ran for a very long time.
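The queued and running DDL jobs can be listed from any TiDB node, which is how a long-running replicated add-index job like this one shows up:

ADMIN SHOW DDL JOBS;
-- Or narrow it down to jobs that have not finished syncing:
ADMIN SHOW DDL JOBS WHERE state != 'synced';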
With this overall picture of the problem, we went back to the TiDB logs:
[2023/08/24 22:00:27.008 +08:00] [WARN] [session.go:1533] ["run statement failed"] [conn=5897899] [schemaVersion=3240] [error="[ddl:1061]index already exist idx_create; a background job is trying to add the same index, please check by `ADMIN SHOW DDL JOBS`"] [session="{\n \"currDBName\": \"***_voucher\",\n \"id\": 5897899,\n \"status\": 2,\n \"strictMode\": true,\n \"user\": {\n \"Username\": \"tidb_cold_w\",\n \"Hostname\": \"10.16.56.178\",\n \"CurrentUser\": false,\n \"AuthUsername\": \"tidb_cold_w\",\n \"AuthHostname\": \"10.%.%.%\"\n }\n}"]
[2023/08/24 22:00:27.008 +08:00] [INFO] [conn.go:877] ["command dispatched failed"] [conn=5897899] [connInfo="id:5897899, addr:10.16.56.178:28898 status:10, collation:utf8_general_ci, user:tidb_cold_w"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="ALTER TABLE `voucher_detail` ADD INDEX idx_create(created, vender_id)"] [txn_mode=PESSIMISTIC] [err="[ddl:1061]index already exist idx_create; a background job is trying to add the same index, please check by `ADMIN SHOW DDL JOBS`\ngithub.com/pingcap/errors.AddStack\n\t/nfs/cache/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/

# cat tidb.log | grep "run reorg job wait timeout"
[2023/08/24 22:09:51.308 +08:00] [INFO] [reorg.go:284] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=7104213] [doneKey=6774522551193595979] []
[2023/08/24 22:09:56.356 +08:00] [INFO] [reorg.go:284] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=7183573] [doneKey=6774522551193595979] []
[2023/08/24 22:10:01.391 +08:00] [INFO] [reorg.go:284] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=7262165] [doneKey=6774522551193595979] []
[2023/08/24 22:10:06.481 +08:00] [INFO] [reorg.go:284] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=7336405] [doneKey=6774522551193595979] []
[2023/08/24 22:10:11.570 +08:00] [INFO] [reorg.go:284] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=7419143] [doneKey=6774522551193595979] []
[2023/08/24 22:10:16.640 +08:00] [INFO] [reorg.go:284] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=7464902] [doneKey=6774522551193595979] []
[2023/08/24 22:10:21.804 +08:00] [INFO] [reorg.go:284] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=7544549] [doneKey=6774692334375633788.next] []
[2023/08/24 22:11:43.519 +08:00] [INFO] [reorg.go:284] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=8656519] [doneKey=6775048577556492528.next] []
[2023/08/24 22:11:48.593 +08:00] [INFO] [reorg.go:284] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=8737159] [doneKey=6775048577556492528.next] []
[2023/08/24 22:11:53.648 +08:00] [INFO] [reorg.go:284] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=8792281] [doneKey=6775048577556492528.next] []
[2023/08/24 22:11:58.740 +08:00] [INFO] [reorg.go:284] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=8826499] [doneKey=6775048577556492528.next] []
[2023/08/25 04:37:44.221 +08:00] [INFO] [reorg.go:301] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=256292839] [doneKey=6858402545246070082.next] []
[2023/08/25 04:37:49.308 +08:00] [INFO] [reorg.go:301] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=256353399] [doneKey=6858402545246070082.next] []
[2023/08/25 04:37:54.366 +08:00] [INFO] [reorg.go:301] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=256386509] [doneKey=6858402545246070082.next] []
[2023/08/25 04:37:59.438 +08:00] [INFO] [reorg.go:301] ["[ddl] run reorg job wait timeout"] [waitTime=5s] [elementType=_idx_] [elementID=11] [totalAddedRowCount=256442291] [doneKey=6858463469667393591.next] []
[2023/08/25 01:07:51.410 +08:00] [INFO] [session.go:3311] ["CRUCIAL OPERATION"] [conn=0] [schemaVersion=3242] [cur_db=] [sql="ALTER TABLE mysql.stats_meta ADD COLUMN snapshot BIGINT(64) UNSIGNED NOT NULL DEFAULT 0"] [user=]
[2023/08/25 01:07:51.444 +08:00] [INFO] [ddl.go:632] ["[ddl] start DDL job"] [job="ID:4591, Type:add column, State:queueing, SchemaState:none, SchemaID:3, TableID:21, RowCount:0, ArgLen:3, start time: 2023-08-25 01:07:51.419 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [query="ALTER TABLE mysql.stats_meta ADD COLUMN snapshot BIGINT(64) UNSIGNED NOT NULL DEFAULT 0"]
Because of the upgrade, the internally executed "ALTER TABLE mysql.stats_meta ADD COLUMN snapshot BIGINT(64) UNSIGNED NOT NULL DEFAULT 0"
was queued behind the physical DDL "ALTER TABLE `voucher_detail` ADD INDEX idx_create(created, vender_id)" in the DDL execution queue. As a result the upgrade never actually finished, and the step that opens port 4000 for service was stuck behind it.
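If a replicated physical DDL is blocking the queue like this and it is acceptable to re-run it later, one option is to cancel it by job ID; a sketch where the job ID is hypothetical and must be taken from ADMIN SHOW DDL JOBS first (whether a job can still be cancelled depends on its current state):

ADMIN SHOW DDL JOBS;
ADMIN CANCEL DDL JOBS 4590;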
The strangest part: after the physical DDL finished and the tidb server finally started serving, we went into the database and found that the index had not actually been added.
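A quick way to verify this is to list the table's indexes directly:

-- idx_create should appear in the output if the DDL really took effect.
SHOW INDEX FROM voucher_detail;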