Debugging Deadlocks with WinDbg
Posted 2013/9/6 9:28:11, 833 views
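Operating-system textbooks describe deadlock as follows:
A deadlock is a situation in which two or more processes, while executing, end up waiting on each other because they compete for resources. Without outside intervention, none of them can make further progress.
Why do deadlocks occur?
1. System resources are insufficient.
2. The processes advance in an unsuitable order.
3. Resources are allocated improperly.
Four conditions must hold for a deadlock to arise:
1. Mutual exclusion: a process has exclusive use of a resource for some period of time.
2. Hold and wait: a process that blocks while requesting a resource keeps the resources it has already acquired.
3. No preemption: a resource that a process has acquired cannot be taken away by force before the process has finished using it.
4. Circular wait: several processes form a head-to-tail chain in which each one waits for a resource held by the next.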
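In WinDbg, a deadlock on locks can be tracked down in a few steps:
1) First run !locks to list the locks held by all threads.
The output shows three threads waiting on three locks. The first thread waits for lock 0043a620, which is held by thread 5e4. The second waits for lock 0043a844, which is held by thread 5dc. The third waits for lock 031d40d4, which is also held by thread 5dc.
2) Next, find the debugger thread numbers that correspond to threads 5e4 and 5dc, for example in the Processes and Threads window opened from the toolbar. There, thread 5e4 is number 53 and thread 5dc is number 51.
3) Run ~53kb and ~51kb to view the call stacks of these two threads.
The stacks show that thread 5e4 is waiting for lock 0043a844, while thread 5dc is waiting for lock 0043a620.
4) Combine this with the information from step 1: thread 5e4 wants lock 0043a844, which 5dc already holds, and 5dc wants lock 0043a620, which 5e4 already holds. That closes the loop, so the two threads are deadlocked.
These are the key steps in analyzing a deadlock.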
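The reasoning in steps 1 to 4 amounts to following "waits for" and "owned by" links until a thread shows up twice. Below is a minimal C++ sketch of that walk, using the thread IDs and lock addresses quoted above as sample data. The function name find_deadlock_cycle, the two map parameters, and the kWaitsFor / kOwnedBy tables are illustrative assumptions; none of them is a WinDbg API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using ThreadId = std::uint32_t;  // OS thread ID as printed by !locks (hex)
using LockAddr = std::uint32_t;  // lock address on a 32-bit target

// Hypothetical helper (not a WinDbg API): starting at `start`, repeatedly hop
// from a thread to the lock it waits for, then to that lock's owner.
// Returns the threads that form a cycle, or an empty vector if the chain
// reaches a thread that is not waiting or a lock with no recorded owner.
std::vector<ThreadId> find_deadlock_cycle(
    ThreadId start,
    const std::map<ThreadId, LockAddr>& waits_for,
    const std::map<LockAddr, ThreadId>& owned_by)
{
    std::vector<ThreadId> path;           // threads visited, in order
    std::map<ThreadId, std::size_t> pos;  // thread -> its index in path
    ThreadId cur = start;
    for (;;) {
        auto seen = pos.find(cur);
        if (seen != pos.end()) {
            // Back at a visited thread: everything from there on is the cycle.
            return std::vector<ThreadId>(
                path.begin() + static_cast<std::ptrdiff_t>(seen->second),
                path.end());
        }
        pos[cur] = path.size();
        path.push_back(cur);

        auto wait = waits_for.find(cur);
        if (wait == waits_for.end()) return {};  // cur is not blocked
        auto owner = owned_by.find(wait->second);
        if (owner == owned_by.end()) return {};  // owner of the lock unknown
        cur = owner->second;
    }
}

// Sample data taken from the walkthrough above.
const std::map<ThreadId, LockAddr> kWaitsFor = {
    {0x5e4, 0x0043a844},  // 5e4 waits for 0043a844
    {0x5dc, 0x0043a620},  // 5dc waits for 0043a620
};
const std::map<LockAddr, ThreadId> kOwnedBy = {
    {0x0043a620, 0x5e4},
    {0x0043a844, 0x5dc},
    {0x031d40d4, 0x5dc},
};
```

Calling find_deadlock_cycle(0x5e4, kWaitsFor, kOwnedBy) returns the two threads 5e4 and 5dc, which matches step 4. A thread that merely waits on 031d40d4 would lead into the same cycle without being a member of it.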
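Here is some code to start with, something I threw together:

    #include <windows.h>
    #include <stdlib.h>   // system

    CRITICAL_SECTION cs1;
    CRITICAL_SECTION cs2;

    // Takes cs1, then cs2. Neither is ever released, on purpose.
    DWORD __stdcall thread1(LPVOID lp)
    {
        EnterCriticalSection(&cs1);
        Sleep(10);                   // give thread2 time to take cs2
        EnterCriticalSection(&cs2);  // blocks once thread2 owns cs2
        return 0;
    }

    // Takes the same two locks in the opposite order.
    DWORD __stdcall thread2(LPVOID lp)
    {
        EnterCriticalSection(&cs2);
        Sleep(10);                   // give thread1 time to take cs1
        EnterCriticalSection(&cs1);  // blocks once thread1 owns cs1
        return 0;
    }

    int main()
    {
        InitializeCriticalSection(&cs1);
        InitializeCriticalSection(&cs2);
        CreateThread(NULL, 0, thread1, 0, 0, NULL);
        CreateThread(NULL, 0, thread2, 0, 0, NULL);
        system("pause");             // keep the process alive for the debugger
        return 0;
    }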
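Build the release configuration, remove the .pdb file, and run the program. It hangs. Attach WinDbg to the process.
First, use ~*kb to dump the stacks of all threads: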
    0:003> ~*kb
    0 Id: 1a98.24c Suspend: 1 Teb: 7ffdf000 Unfrozen
    ChildEBP RetAddr Args to Child
    0012fddc 7c92df5a 7c8025db 00000044 00000000 ntdll!KiFastSystemCallRet
    0012fde0 7c8025db 00000044 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
    0012fe44 7c802542 00000044 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xa8
    0012fe58 7854bd40 00000044 ffffffff 00000000 kernel32!WaitForSingleObject+0x12
    0012fedc 7854c702 00000000 00392b98 00392de0 MSVCR90!_dospawn+0x1d1 [f:\dd\vctools\crt_bld\self_x86\crt\src\dospawn.c @ 215]
    0012ff00 7854c84b 00000000 00392b98 0012ff5c MSVCR90!comexecmd+0x60 [f:\dd\vctools\crt_bld\self_x86\crt\src\spawnve.c @ 137]
    0012ff38 7854cc71 00000000 00392b98 0012ff5c MSVCR90!_spawnve+0x12a [f:\dd\vctools\crt_bld\self_x86\crt\src\spawnve.c @ 273]
    0012ff70 004010a8 004020f4 00000001 00401218 MSVCR90!system+0x8e [f:\dd\vctools\crt_bld\self_x86\crt\src\system.c @ 87]
    WARNING: Stack unwind information not available. Following frames may be wrong.
    0012ffc0 7c817077 00300031 0032002d 7ffdc000 test2+0x10a8
    0012fff0 00000000 00401360 00000000 78746341 kernel32!BaseProcessStart+0x23

    1 Id: 1a98.1588 Suspend: 1 Teb: 7ffde000 Unfrozen
    ChildEBP RetAddr Args to Child
    0050ff14 7c92df5a 7c939b23 0000002c 00000000 ntdll!KiFastSystemCallRet
    0050ff18 7c939b23 0000002c 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
    0050ffa0 7c921046 00403370 0040101d 00403370 ntdll!RtlpWaitForCriticalSection+0x132
    0050ffa8 0040101d 00403370 000203a8 7c80b729 ntdll!RtlEnterCriticalSection+0x46
    WARNING: Stack unwind information not available. Following frames may be wrong.
    0050ffec 00000000 00401000 00000000 00000000 test2+0x101d

    2 Id: 1a98.185c Suspend: 1 Teb: 7ffdd000 Unfrozen
    ChildEBP RetAddr Args to Child
    0060ff14 7c92df5a 7c939b23 00000034 00000000 ntdll!KiFastSystemCallRet
    0060ff18 7c939b23 00000034 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
    0060ffa0 7c921046 00403388 0040104d 00403388 ntdll!RtlpWaitForCriticalSection+0x132
    0060ffa8 0040104d 00403388 000203a8 7c80b729 ntdll!RtlEnterCriticalSection+0x46
    WARNING: Stack unwind information not available. Following frames may be wrong.
    0060ffec 00000000 00401030 00000000 00000000 test2+0x104d

    # 3 Id: 1a98.159c Suspend: 1 Teb: 7ffdb000 Unfrozen
    ChildEBP RetAddr Args to Child
    003dffc8 7c972119 00000005 00000004 00000001 ntdll!DbgBreakPoint
    003dfff4 00000000 00000000 00000000 00000000 ntdll!DbgUiRemoteBreakin+0x2d
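Notice that thread 1's stack leaves test2 and enters the system at ntdll!RtlEnterCriticalSection. Which API is that the entry point for? The first guess is EnterCriticalSection, which is exported by kernel32.dll. To check the guess, dump the export table of kernel32.dll (for example with dumpbin /exports):
Sure enough, the EnterCriticalSection export is forwarded to NTDLL.RtlEnterCriticalSection.
1. !cs
The !cs extension displays one or more critical sections, or the entire critical section tree.
The first argument of ntdll!RtlEnterCriticalSection is the address of the critical section. Disassembling the function with uf shows that it returns with ret 4, which under the x86 stdcall convention means it takes exactly one 4-byte parameter.
So we can take the first argument from thread 1's stack and pass it to !cs: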
    0:003> ~1kb
    ChildEBP RetAddr Args to Child
    0050ff14 7c92df5a 7c939b23 0000002c 00000000 ntdll!KiFastSystemCallRet
    0050ff18 7c939b23 0000002c 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
    0050ffa0 7c921046 00403370 0040101d 00403370 ntdll!RtlpWaitForCriticalSection+0x132
    0050ffa8 0040101d 00403370 000203a8 7c80b729 ntdll!RtlEnterCriticalSection+0x46
    WARNING: Stack unwind information not available. Following frames may be wrong.
    0050ffec 00000000 00401000 00000000 00000000 test2+0x101d
    0:003> !cs 00403370
    -----------------------------------------
    Critical section = 0x00403370 (test2+0x3370)
    DebugInfo = 0x7c99e9e0
    LOCKED
    LockCount = 0x1
    OwningThread = 0x0000185c
    RecursionCount = 0x1
    LockSemaphore = 0x2C
    SpinCount = 0x00000000
Here a LockCount of 1 means that, besides the one thread that owns the critical section, one more thread is waiting for it. The value is incremented by EnterCriticalSection and decremented by LeaveCriticalSection. To see this, add a little more code:

    // Added to the program above; cs1, cs2, thread1 and thread2 are unchanged.
    DWORD __stdcall thread3(LPVOID lp)
    {
        EnterCriticalSection(&cs2);
        Sleep(10);
        EnterCriticalSection(&cs1);
        return 0;
    }

    int main()
    {
        InitializeCriticalSection(&cs1);
        InitializeCriticalSection(&cs2);
        CreateThread(NULL, 0, thread1, 0, 0, NULL);
        CreateThread(NULL, 0, thread2, 0, 0, NULL);
        CreateThread(NULL, 0, thread3, 0, 0, NULL);  // a second contender for cs2
        system("pause");
        return 0;
    }
Attaching WinDbg to this version gives:
    0:004> ~1kb
    ChildEBP RetAddr Args to Child
    0051fe48 7c92df5a 7c939b23 00000034 00000000 ntdll!KiFastSystemCallRet
    0051fe4c 7c939b23 00000034 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
    0051fed4 7c921046 00417140 00411420 00417140 ntdll!RtlpWaitForCriticalSection+0x132
    *** WARNING: Unable to verify checksum for D:\Project1\test2\Debug\test2.exe
    0051fedc 00411420 00417140 00000000 00000000 ntdll!RtlEnterCriticalSection+0x46
    0051ffb4 7c80b729 00000000 00000000 00000000 test2!thread1+0x50 [d:\project1\test2\test2\test2.cpp @ 10]
    0051ffec 00000000 00411122 00000000 00000000 kernel32!BaseThreadStart+0x37
    0:004> !cs 00417140
    -----------------------------------------
    Critical section = 0x00417140 (test2!cs2+0x0)
    DebugInfo = 0x7c99ea00
    LOCKED
    LockCount = 0x2
    OwningThread = 0x00001f60
    RecursionCount = 0x1
    LockSemaphore = 0x34
    SpinCount = 0x00000000
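LockCount is now 2: one owner plus two waiting threads. If nobody holds the critical section, !cs shows NOT LOCKED and LockCount is -1.
OwningThread is the ID of the thread that owns the critical section. RecursionCount is the number of times the owning thread has called EnterCriticalSection. That also affects LockCount: every additional EnterCriticalSection call by the owner increments LockCount by 1, because on the Windows version used here LockCount is the number of outstanding EnterCriticalSection requests on this critical section from any thread, minus 1 (so with no requests, 0 - 1 = -1, which is NOT LOCKED). Likewise, every LeaveCriticalSection call decrements LockCount by 1.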
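The bookkeeping described above can be sketched as a tiny single-threaded model in C++. This is only an illustration of the counting rule (requests minus 1), not the real ntdll code: the CsModel type and its members are hypothetical names, and the hand-off to a waiting thread on release is not modeled. Also note that, according to Microsoft's debugger documentation, Windows Server 2003 SP1 and later encode LockCount as a bit field, so this model applies only to the older layout that the output in this post reflects.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the legacy LockCount accounting; not the ntdll code.
struct CsModel {
    long lock_count = -1;      // -1 means NOT LOCKED
    long recursion_count = 0;  // nested Enter calls by the owner
    std::uint32_t owner = 0;   // 0 means no owning thread

    // Models EnterCriticalSection. Returns true if `tid` owns the section
    // after the call, false if the real thread would block and wait.
    bool enter(std::uint32_t tid) {
        ++lock_count;                 // every request is counted
        if (lock_count == 0) {        // the section was free
            owner = tid;
            recursion_count = 1;
            return true;
        }
        if (owner == tid) {           // recursive acquisition by the owner
            ++recursion_count;
            return true;
        }
        return false;                 // contended: the caller would wait
    }

    // Models LeaveCriticalSection called by the owner.
    void leave(std::uint32_t tid) {
        assert(owner == tid && recursion_count > 0);
        --lock_count;
        if (--recursion_count == 0) {
            owner = 0;                // wake-up of a waiter is not modeled
        }
    }
};
```

With one owner and two blocked callers the model ends at lock_count == 2, the value !cs printed for cs2 in the three-thread run. Two nested enter calls by the same thread give lock_count == 1 with recursion_count == 2.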
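Now back to the original two-thread program.
~~[TID] selects the thread whose thread ID is TID. (The square brackets are required, and there must be no space between the second ~ and the opening bracket.)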
    0:003> ~~[0x0000185c]
    2 Id: 1a98.185c Suspend: 1 Teb: 7ffdd000 Unfrozen
    Start: test2+0x1030 (00401030)
    Priority: 0 Priority class: 32 Affinity: f
In other words, the critical section that thread 1 is waiting for is owned by thread 2. Now analyze thread 2 in the same way:
    0:003> ~2kb
    ChildEBP RetAddr Args to Child
    0060ff14 7c92df5a 7c939b23 00000034 00000000 ntdll!KiFastSystemCallRet
    0060ff18 7c939b23 00000034 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
    0060ffa0 7c921046 00403388 0040104d 00403388 ntdll!RtlpWaitForCriticalSection+0x132
    0060ffa8 0040104d 00403388 000203a8 7c80b729 ntdll!RtlEnterCriticalSection+0x46
    WARNING: Stack unwind information not available. Following frames may be wrong.
    0060ffec 00000000 00401030 00000000 00000000 test2+0x104d
    0:003> !cs 00403388
    -----------------------------------------
    Critical section = 0x00403388 (test2+0x3388)
    DebugInfo = 0x7c99e9c0
    LOCKED
    LockCount = 0x1
    OwningThread = 0x00001588
    RecursionCount = 0x1
    LockSemaphore = 0x34
    SpinCount = 0x00000000
    0:003> ~~[0x00001588]
    1 Id: 1a98.1588 Suspend: 1 Teb: 7ffde000 Unfrozen
    Start: test2+0x1000 (00401000)
    Priority: 0 Priority class: 32 Affinity: f
So the critical section that thread 2 is waiting for is owned by thread 1, while thread 1 waits for one owned by thread 2. This is the classic deadlock.
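For comparison, here is a hedged sketch of one way to break the circular wait found above. It swaps CRITICAL_SECTION for the portable std::mutex (a substitution made for this illustration, not what the post's program uses) and acquires both locks as a group with std::lock, which uses a deadlock-avoidance algorithm. The names m1, m2, worker1, worker2, shared_counter and run_workers are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex m1;           // stands in for cs1
std::mutex m2;           // stands in for cs2
int shared_counter = 0;  // only touched while both mutexes are held

// Both workers need both locks. std::lock takes them together without
// deadlocking, so the order of its arguments does not matter.
void worker1() {
    std::lock(m1, m2);
    std::lock_guard<std::mutex> g1(m1, std::adopt_lock);  // released on scope exit
    std::lock_guard<std::mutex> g2(m2, std::adopt_lock);
    ++shared_counter;
}

void worker2() {
    std::lock(m2, m1);   // deliberately the opposite order from worker1
    std::lock_guard<std::mutex> g2(m2, std::adopt_lock);
    std::lock_guard<std::mutex> g1(m1, std::adopt_lock);
    ++shared_counter;
}

// Runs the two workers side by side `rounds` times and returns the total.
int run_workers(int rounds) {
    shared_counter = 0;
    for (int i = 0; i < rounds; ++i) {
        std::thread t1(worker1);
        std::thread t2(worker2);
        t1.join();
        t2.join();
    }
    return shared_counter;
}
```

With plain critical sections the same effect can be had by making every thread enter cs1 before cs2 and by pairing each EnterCriticalSection with a LeaveCriticalSection; either change removes the circular-wait condition.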
Here are a few more forms of the !cs extension. !cs -l lists only the locked critical sections:
    0:003> !cs -l
    -----------------------------------------
    DebugInfo = 0x7c99e9c0
    Critical section = 0x00403388 (test2+0x3388)
    LOCKED
    LockCount = 0x1
    OwningThread = 0x00001588
    RecursionCount = 0x1
    LockSemaphore = 0x34
    SpinCount = 0x00000000
    -----------------------------------------
    DebugInfo = 0x7c99e9e0
    Critical section = 0x00403370 (test2+0x3370)
    LOCKED
    LockCount = 0x1
    OwningThread = 0x0000185c
    RecursionCount = 0x1
    LockSemaphore = 0x2C
    SpinCount = 0x00000000
!cs StartAddress EndAddress restricts the search for critical sections to the given address range:
    0:003> !cs 0x00400000 0x00500000
    -----------------------------------------
    DebugInfo = 0x7c99e9c0
    Critical section = 0x00403388 (test2+0x3388)
    LOCKED
    LockCount = 0x1
    OwningThread = 0x00001588
    RecursionCount = 0x1
    LockSemaphore = 0x34
    SpinCount = 0x00000000
    -----------------------------------------
    DebugInfo = 0x7c99e9e0
    Critical section = 0x00403370 (test2+0x3370)
    LOCKED
    LockCount = 0x1
    OwningThread = 0x0000185c
    RecursionCount = 0x1
    LockSemaphore = 0x2C
    SpinCount = 0x00000000
!cs -? displays the help text for the command:
    0:003> !cs -?
    !cs [-s] [-l] [-o] - dump all the active critical sections in the current process.
    !cs [-s] [-o] address - dump critical section at this address.
    !cs [-s] [-l] [-o] address1 address2 - dump all the active critical sections in this range.
    !cs [-s] [-o] -d address - dump critical section corresponding to DebugInfo at this address.
    "-s" will dump the critical section initialization stack trace if it is available.
    "-l" will dump only the locked critical sections.
    "-o" will dump the critical section owner's stack.