How to debug a deadlock


When a program is not running properly, pstack can show its current state. Run it several times against the process and compare the results:

[tangliang]$ pstack 31859
Thread 3 (Thread 0x7f69b59d2700 (LWP 31860)):
#0  0x000000380220e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003802209508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00000038022093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000040097e in writeTest(void*) ()
#4  0x00000038022079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003801ae89dd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f69b4fd1700 (LWP 31861)):
#0  0x000000380220e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003802209508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00000038022093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a2c in readTest(void*) ()
#4  0x00000038022079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003801ae89dd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f69b59d4720 (LWP 31859)):
#0  0x000000380220822d in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000400b13 in main ()
[tangliang]$
[tangliang]$
[tangliang]$ pstack 31859
Thread 3 (Thread 0x7f69b59d2700 (LWP 31860)):
#0  0x000000380220e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003802209508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00000038022093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000040097e in writeTest(void*) ()
#4  0x00000038022079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003801ae89dd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f69b4fd1700 (LWP 31861)):
#0  0x000000380220e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003802209508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00000038022093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a2c in readTest(void*) ()
#4  0x00000038022079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003801ae89dd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f69b59d4720 (LWP 31859)):
#0  0x000000380220822d in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000400b13 in main ()


In both runs, two threads are stopped at the same place inside pthread_mutex_lock(), which suggests a deadlock. gdb can confirm it.

Attach gdb to the running process with gdb -p 31859, then run:

(gdb) thread apply all bt

Thread 3 (Thread 0x7f69b59d2700 (LWP 31860)):
#0  0x000000380220e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003802209508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00000038022093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000040097e in writeTest (temp=0x0) at dead_lock_two_thread.cc:15
#4  0x00000038022079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003801ae89dd in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f69b4fd1700 (LWP 31861)):
#0  0x000000380220e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003802209508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00000038022093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a2c in readTest (temp=0x0) at dead_lock_two_thread.cc:35
#4  0x00000038022079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003801ae89dd in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f69b59d4720 (LWP 31859)):
#0  0x000000380220822d in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000400b13 in main () at dead_lock_two_thread.cc:60

Switch to thread 3:

(gdb) t 3
[Switching to thread 3 (Thread 0x7f69b59d2700 (LWP 31860))]
#0  0x000000380220e264 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) f 2
#2  0x00000038022093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) i r
rax            0xfffffffffffffe00  -512
rbx            0x0                 0
rcx            0xffffffffffffffff  -1
rdx            0x0                 0
rsi            0x80                128
rdi            0x6012c0            6296256
rbp            0x7f69b59d1e90      0x7f69b59d1e90
rsp            0x7f69b59d1e58      0x7f69b59d1e58
r8             0x6012c0            6296256
r9             0x7c74              31860
r10            0x8                 8
r11            0x202               514
r12            0x380241c360        240556032864
r13            0x7f69b59d29c0      140091995269568
r14            0x0                 0
r15            0x3                 3
rip            0x38022093d7        0x38022093d7 <pthread_mutex_lock+55>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) p *(pthread_mutex_t *)0x6012c0
$1 = {__data = {__lock = 2, __count = 0, __owner = 31861, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\000\000\000\000u|\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) set print pretty on
(gdb) p *(pthread_mutex_t *)0x6012c0
$2 = {
  __data = {
    __lock = 2,
    __count = 0,
    __owner = 31861,
    __nusers = 1,
    __kind = 0,
    __spins = 0,
    __list = {
      __prev = 0x0,
      __next = 0x0
    }
  },
  __size = "\002\000\000\000\000\000\000\000u|\000\000\001", '\000' <repeats 26 times>,
  __align = 2
}

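The session above inspects the argument passed to pthread_mutex_lock. The program runs on x86_64 Linux, where the first function argument is passed in the rdi register, so rdi (0x6012c0) is taken as the address of the mutex and cast to pthread_mutex_t * to print its fields. rdi is not preserved across calls, so this only works as long as the inner frames have left it untouched, which appears to be the case here.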

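Note that where function arguments are stored differs between operating systems and calling conventions. The Wikipedia page listed in the references tabulates the differences.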

The mutex that thread 3 (LWP 31860) is waiting for is owned by thread 31861.


Switch to thread 2 (LWP 31861):

(gdb) t 2
[Switching to thread 2 (Thread 0x7f69b4fd1700 (LWP 31861))]
#0  0x000000380220e264 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) f 2
#2  0x00000038022093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) i r
rax            0xfffffffffffffe00  -512
rbx            0x0                 0
rcx            0xffffffffffffffff  -1
rdx            0x0                 0
rsi            0x80                128
rdi            0x601300            6296320
rbp            0x7f69b4fd0e90      0x7f69b4fd0e90
rsp            0x7f69b4fd0e58      0x7f69b4fd0e58
r8             0x601300            6296320
r9             0x7c75              31861
r10            0x8                 8
r11            0x202               514
r12            0x380241c360        240556032864
r13            0x7f69b4fd19c0      140091984779712
r14            0x0                 0
r15            0x3                 3
rip            0x38022093d7        0x38022093d7 <pthread_mutex_lock+55>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) p *(pthread_mutex_t *)0x601300
$3 = {
  __data = {
    __lock = 2,
    __count = 0,
    __owner = 31860,
    __nusers = 1,
    __kind = 0,
    __spins = 0,
    __list = {
      __prev = 0x0,
      __next = 0x0
    }
  },
  __size = "\002\000\000\000\000\000\000\000t|\000\000\001", '\000' <repeats 26 times>,
  __align = 2
}

The mutex that thread 2 (LWP 31861) is waiting for is owned by thread 31860.

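Each thread is waiting for a mutex that the other one holds, so neither can make progress: this is a deadlock.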


References:

debugging hacks -- 深入调试的技术和工具 (In-Depth Debugging Techniques and Tools)

https://en.wikipedia.org/wiki/X86_calling_conventions#List_of_x86_calling_conventions