麒麟v10 上部署 TiDB v5.1.2 生产环境优化实践
1086
2023-05-19
MySQL DBA如何利用strace/pstack/gdb来定位问题
strace简介
strace是Linux环境下的一款程序调试工具,用来监察一个应用程序所使用的系统调用。
Strace是一个简单的跟踪系统调用执行的工具。在其最简单的形式中,它可以从开始到结束跟踪二进制的执行,并在进程的生命周期中输出一行具有系统调用名称,每个系统调用的参数和返回值的文本行。
常用选项
-T:strace输出显示时间 -t:可以在每行的输出之前添加时间戳,-tt可以精确到微妙级别,-ttt也可以精确到微妙级,但是它并不是打印当前时间,而是显示自从epoch以来的所经过的秒数 -s:指定输出的字符串的最大长度,默认为32,如果输出到文件的话会全部输出 -o:指定将strace输出到文件 -ff:如果提供-o filename,则所有进程的跟踪结果输出到相应的filename.pid中,pid是各进程的进程号. -e:指定跟踪某个行为,例如-e trace=open指定只跟踪open行为 -y:将文件句柄用文件路径代替显示
正确姿势
简单使用
strace -T -tt -o /tmp/strace.log CMD strace -T -tt CMD 2>&1 |tee /tmp/strace.log strace -T -tt -s 100 -o /tmp/strace.log CMD strace -T -tt -s 100 -ff -o /tmp/strace.log CMD strace -T -tt -s 100 -e trace=XXXX -o /tmp/strace.log CMD
使用案例
利用strace观察客户端client执行SQL
需要提醒一下的是从客户端连接到MySQL的时候需要注意客户端的版本,例如我用mariadb的客户端连接官方版本8.0就无法获取到pid
查看strace文件
.... 20:54:16.901980 poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000006> 20:54:16.902001 write(3, "\21\0\0\0\3select * from t1", 21) = 21 <0.000011> 20:54:16.902026 read(3, "\1\0\0\1\2\"\0\0\2\3def\4xucl\2t1\2t1\2id\2id\f?\0\v\0\0\0\3\3P\0\0\0\"\0\0\3\3def\4xucl\2t1\2t1\2c1\2c1\f!\0\36\0\0\0\375\t@\0\0\0\5\0\0\4\376\0\0\2\0\3\0\0\5\0011\0\5\0\0"..., 16384) = 106 <0.000355> 20:54:16.902405 times({tms_utime=0, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 429630696 <0.000005> 20:54:16.902441 write(1, "+----+----+\n", 12) = 12 <0.000026> 20:54:16.902482 write(1, "| id | c1 |\n", 12) = 12 <0.000018> 20:54:16.902514 write(1, "+----+----+\n", 12) = 12 <0.000018> 20:54:16.902549 write(1, "| 1 | |\n", 12) = 12 <0.000019> 20:54:16.902582 write(1, "+----+----+\n", 12) = 12 <0.000019> 20:54:16.902631 write(1, "\33(B\33[0;1m1 row in set (0.00 sec)\n", 33) = 33 <0.000019> 20:54:16.902666 write(1, "\33(B\33[m\33(B\33[0;1m\n", 16) = 16 <0.000019> 20:54:16.902702 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=528, ...}) = 0 <0.000006> 20:54:16.902752 rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0 <0.000004> 20:54:16.902771 ioctl(0, TIOCGWINSZ, {ws_row=42, ws_col=163, ws_xpixel=0, ws_ypixel=0}) = 0 <0.000005> 20:54:16.902789 ioctl(0, TIOCSWINSZ, {ws_row=42, ws_col=163, ws_xpixel=0, ws_ypixel=0}) = 0 <0.000004> 20:54:16.902806 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000005> 20:54:16.902824 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0 <0.000006> 20:54:16.902841 write(1, "\33(B\33[m", 6) = 6 <0.000022> 20:54:16.902877 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000005> 20:54:16.902895 rt_sigprocmask(SIG_BLOCK, [INT QUIT ALRM TSTP TTIN TTOU], [], 8) = 0 <0.000005> 20:54:16.902912 rt_sigaction(SIGINT, {0x55e19ff48a00, [], SA_RESTORER, 0x7fc3e63585f0}, {0x55e19fee21d0, [INT], SA_RESTORER|SA_RESTART, 0x7fc3e63585f0}, 8) = 0 <0.000005> 20:54:16.902931 rt_sigaction(SIGTERM, {0x55e19ff48a00, [], SA_RESTORER, 0x7fc3e63585f0}, {SIG_DFL, [], SA_RESTORER, 0x7fc3e63585f0}, 8) = 0 <0.000005> 20:54:16.902948 rt_sigaction(SIGQUIT, {0x55e19ff48a00, [], SA_RESTORER, 0x7fc3e63585f0}, {0x55e19fee1fa0, [QUIT], SA_RESTORER|SA_RESTART, 0x7fc3e63585f0}, 8) = 0 <0.000005> 20:54:16.902966 rt_sigaction(SIGALRM, {0x55e19ff48a00, [], SA_RESTORER, 0x7fc3e63585f0}, {SIG_DFL, [], SA_RESTORER, 0x7fc3e63585f0}, 8) = 0 <0.000005> 20:54:16.902983 rt_sigaction(SIGTSTP, {0x55e19ff48a00, [], SA_RESTORER, 0x7fc3e63585f0}, {SIG_DFL, [], SA_RESTORER, 0x7fc3e63585f0}, 8) = 0 <0.000005> 20:54:16.903001 rt_sigaction(SIGTTOU, {0x55e19ff48a00, [], SA_RESTORER, 0x7fc3e63585f0}, {SIG_DFL, [], SA_RESTORER, 0x7fc3e63585f0}, 8) = 0 <0.000004> 20:54:16.903018 rt_sigaction(SIGTTIN, {0x55e19ff48a00, [], SA_RESTORER, 0x7fc3e63585f0}, {SIG_DFL, [], SA_RESTORER, 0x7fc3e63585f0}, 8) = 0 <0.000006> 20:54:16.903037 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000005> 20:54:16.903055 rt_sigaction(SIGWINCH, {0x55e19ff481a0, [], SA_RESTORER|SA_RESTART, 0x7fc3e63585f0}, {0x55e19fee0760, [WINCH], SA_RESTORER|SA_RESTART, 0x7fc3e63585f0}, 8) = 0 <0.000006> 20:54:16.903088 write(1, "MySQL [xucl]> ", 14) = 14 <0.000032> 20:54:16.903139 read(0,
利用strace观察server端执行
#命令 strace -o /tmp/strace.log -T -tt -f -p `pidof mysqld` #假设你想看跟IO相关的行为,可以用-e trace来指定 # mysql相关的IO操作: # mysql: read, write, open # innodb: pread64, pwrite64 strace -o /tmp/strace.log -T -tt -f -e trace=read,open,write,pwrite64,pread64 -p `pidof mysqld`
示例:
strace会在/tmp文件夹下生成strace.log.xxx,xxx对应的就是THREAD_OS_ID,我们直接看strace.log.8208文件
t@VM_0_9_centos tmp]# cat strace.log.8208 16:44:14.447709 restart_syscall(<... resuming interrupted restart_syscall ...>) = 1 <3.268331> 16:44:17.716076 recvfrom(134
可以清楚地看到有对应的SQL输出
pstack简介
pstack用来跟踪进程栈,这个命令在排查进程问题时非常有用,比如我们发现一个服务一直处于work状态(如假死状态,好似死循环),使用这个命令就能轻松定位问题所在;
可以在一段时间内,多执行几次pstack,若发现代码栈总是停在同一个位置,那个位置就需要重点关注,很可能就是出问题的地方;
正确使用姿势
pstack使用起来非常简单,直接pstack -pid即可
[root@VM_0_9_centos mysql3306]# netstat -lntp|grep 3306 tcp6 0 0 :::33060 :::* LISTEN 13499/mysqld tcp6 0 0 :::3306 :::* LISTEN 13499/mysqld [root@VM_0_9_centos mysql3306]# ps -Lf 13499 UID PID PPID LWP C NLWP STIME TTY STAT TIME CMD mysql 13499 11316 13499 0 53 20:31 pts/0 Sl 0:00 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/mysql3306/my3306.cnf --user=mysql mysql 13499 11316 13506 0 53 20:31 pts/0 Sl 0:00 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/mysql3306/my3306.cnf --user=mysql mysql 13499 11316 13507 0 53 20:31 pts/0 Sl 0:00 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/mysql3306/my3306.cnf --user=mysql mysql 13499 11316 13508 0 53 20:31 pts/0 Sl 0:00 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/mysql3306/my3306.cnf --user=mysql mysql 13499 11316 13509 0 53 20:31 pts/0 Sl 0:00 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/mysql3306/my3306.cnf --user=mysql mysql 13499 11316 13510 0 53 20:31 pts/0 Sl 0:00 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/mysql3306/my3306.cnf --user=mysql mysql 13499 11316 13511 0 53 20:31 pts/0 Sl 0:00 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/mysql3306/my3306.cnf --user=mysql ....
mysqld的pid为13499,fork了几十个线程,有的是用户连接,有的是MySQL自己的线程,例如read_thread、write_thread等等,这里的LWP对应的就是performance_schema.threads表的THREAD_OS_ID。
全部抓取
假如发生了mysqld hang住的情况,可以直接pstack mysqldpid来抓取所有的线程堆栈,这也是通常使用的方式,例如
pstack 13499 >> /tmp/pstack.log
这里简单随便看下抓取到的pstack
Thread 27 (Thread 0x7ff1bc29f700 (LWP 13535)): #0 0x00007ff1f6368da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00000000020ace25 in os_event::timed_wait(timespec const*) () at ../../../mysql-8.0.19/storage/innobase/include/sync0types.h:540 #2 0x00000000020ad9e1 in os_event::wait_time_low (this=0x7ff1dc022458, time_in_usec=
从堆栈可以看出这其实就是MySQL的error monitor线程
单个抓取
假设你运气好,你现在可以通过performance_schema.threads可以查到对应的线程的thread_os_id,你可以指定单个线程的lwp进行抓取
MySQL [performance_schema]> show processlist; +----+-----------------+-----------+--------------------+---------+------+------------------------+---------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-----------------+-----------+--------------------+---------+------+------------------------+---------------------------+ | 4 | event_scheduler | localhost | NULL | Daemon | 4319 | Waiting on empty queue | NULL | | 23 | root | localhost | performance_schema | Query | 0 | starting | show processlist | | 24 | root | localhost | xucl | Query | 22 | User sleep | select sleep(600) from t1 | +----+-----------------+-----------+--------------------+---------+------+------------------------+---------------------------+ 3 rows in set (0.00 sec) MySQL [performance_schema]> select THREAD_OS_ID from threads where processlist_id=24\G *************************** 1. row *************************** THREAD_OS_ID: 15020 1 row in set (0.00 sec)
你可以这样操作
pstack 15020 > /tmp/single_pstack.log
查看pstack结果,只抓取了15020线程的堆栈
gdb简介
GDB 是 linux 环境下的一般功能强大的调试器,用来调试 C 或 C++ 写的程序。它可以做这些事情
Start your program, specifying anything that might affect its behavior. Make your program stop on specified conditions. Examine what has happened, when your program has stopped. Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.
常用命令
info threads:查看全部线程 thread n:指定某个线程 b:在某处打断点 c:继续往下走 s:执行一行代码,如果代码函数调用,则进入函数 n:执行一行代码,函数调用不进入 p:打印某个变量值 list:打印代码的文本信息 bt:查看某个线程的栈帧 info b:查看当前所有断点信息
实际案例
进入gdb界面
[root@VM_0_9_centos ~]# gdb /usr/local/mysql5.7/bin/mysqld
进入调试
run --defaults-file=/data/mysql/mysql3308/my3308.cnf --user=mysql --gdb
通过info thread查看所有线程,这个时候我从另外一个会话连接到了MySQL,我从pfs.threads查寻到这个会话的LWP值为513,那么我可以通过thread 41切换到这个线程
(gdb) info thread Id Target Id Frame 41 Thread 0x7fff819cf700 (LWP 513) "mysqld" 0x00007ffff61abbed in poll () from /lib64/libc.so.6 40 Thread 0x7fff5affd700 (LWP 416) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 39 Thread 0x7fff81a51700 (LWP 415) "mysqld" 0x00007ffff7bce381 in sigwait () from /lib64/libpthread.so.0 38 Thread 0x7fff5b7fe700 (LWP 414) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 37 Thread 0x7fff5bfff700 (LWP 413) "mysqld" 0x00007ffff7bcada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 36 Thread 0x7fff6cff9700 (LWP 412) "mysqld" 0x00007ffff7bcada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 35 Thread 0x7fff6d7fa700 (LWP 411) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 34 Thread 0x7fff6dffb700 (LWP 410) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 33 Thread 0x7fff6e7fc700 (LWP 409) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 32 Thread 0x7fff6effd700 (LWP 408) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 31 Thread 0x7fff6f7fe700 (LWP 407) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 30 Thread 0x7fff6ffff700 (LWP 406) "mysqld" 0x00007ffff7bcde5d in nanosleep () from /lib64/libpthread.so.0 29 Thread 0x7fff8084e700 (LWP 405) "mysqld" 0x00007ffff7bcada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 28 Thread 0x7fff8104f700 (LWP 404) "mysqld" 0x00007ffff7bcada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 27 Thread 0x7fff81850700 (LWP 403) "mysqld" 0x00007ffff7bcada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 24 Thread 0x7fff74c24700 (LWP 398) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 23 Thread 0x7fff75425700 (LWP 397) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 22 Thread 0x7fff75c26700 (LWP 396) "mysqld" 0x00007ffff7bca9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 21 Thread 0x7fff76427700 (LWP 395) "mysqld" 0x00007ffff7bcada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 20 Thread 0x7fff76c28700 (LWP 394) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 19 Thread 0x7fff77429700 (LWP 393) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 18 Thread 0x7fff77c2a700 (LWP 392) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 17 Thread 0x7fff7842b700 (LWP 391) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 16 Thread 0x7fff78c2c700 (LWP 390) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 15 Thread 0x7fff7942d700 (LWP 389) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 14 Thread 0x7fff79c2e700 (LWP 388) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 13 Thread 0x7fff7a42f700 (LWP 387) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 12 Thread 0x7fff7ac30700 (LWP 386) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 11 Thread 0x7fff7b431700 (LWP 385) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 10 Thread 0x7fff7bc32700 (LWP 384) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 9 Thread 0x7fff7c433700 (LWP 383) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 8 Thread 0x7fff7cc34700 (LWP 382) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 7 Thread 0x7fff7d435700 (LWP 381) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 6 Thread 0x7fff7dc36700 (LWP 380) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 5 Thread 0x7fff7e437700 (LWP 379) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 4 Thread 0x7fff7ec38700 (LWP 378) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 3 Thread 0x7fff7f439700 (LWP 377) "mysqld" 0x00007ffff79bd644 in __io_getevents_0_4 () from /lib64/libaio.so.1 2 Thread 0x7fffec8c4700 (LWP 376) "mysqld" 0x00007ffff60ef53a in sigwaitinfo () from /lib64/libc.so.6 * 1 Thread 0x7ffff7fe4880 (LWP 371) "mysqld" 0x00007ffff61abbed in poll () from /lib64/libc.so.6 (gdb) thread 41 [Switching to thread 41 (Thread 0x7fff819cf700 (LWP 513))] #0 0x00007ffff61abbed in poll () from /lib64/libc.so.6
演示一下打断点:
我们可以在调试过程中随时打印某个变量的值,例如:
(gdb) p pkt_len $1 = 22
甚至gdb可以在紧急情况下救你一命,例如,当MySQL数据库连接打满又没有后台线程可以连接到MySQL的时候,你可以通过gdb来修改MySQL的连接数,例如
gdb -p $(pidof mysqld) -ex "set max_connections=1500" -batch
总结一下:
版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。