Move another blog here


Thursday, August 12, 2010

Case study: crash in unloaded module

 

Some time ago our application crashed inside an unloaded module. The development team spent a lot of time analyzing the fault before we finally noticed that the faulting address belonged to an UNLOADED module.

This happens when a DLL is unloaded while some of its resources, i.e. threads and/or memory allocated by the DLL, have not yet been released. After the unload, the process still tries to use those resources, which are no longer available. Such a crash is hard to understand because the code at the crash site looks perfectly fine and should not fail there. In fact, the crash has nothing to do with the code at the faulting address: remember, the whole module has already been unloaded.

For this kind of fault, WinDbg reports an access violation exception in an unloaded module. The crash dump also records the unloaded modules, which makes it easier for WinDbg to map the exception address to the address range of a specific module.

 

Tool used: WinDbg.
Problem: the application crashes several seconds after the system starts up.
Debugger output (for confidentiality, the module name has been replaced with XXX and the image name with MyApp):


0:019>!analyze -v
FAULTING_IP:
XXX+9d2f
017a9d2f 0000            add byte ptr [eax],al
EXCEPTION_RECORD:  ffffffff--(.exr 0xffffffffffffffff)
ExceptionAddress: 017a9d2f (<Unloaded_XXX.DLL>+0x00009d2f)
  ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
  Parameter[0]: 00000000
  Parameter[1]: 0x17a9d2f
Attempt to read from address 017a9d2f
DEFAULT_BUCKET_ID:  WRONG_SYMBOLS
PROCESS_NAME: MyAPP.exe
MODULE_NAME: XXX
FAULTING_MODULE: 7c900000 ntdll
DEBUG_FLR_IMAGE_TIMESTAMP: 48a9711f
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
READ_ADDRESS:   017a9d2f
BUGCHECK_STR: ACCESS_VIOLATION
LAST_CONTROL_TRANSFER:    from 00000000 to 00000000
STACK_TEXT:
0018a168  00000000 00000000 00000000 00000000 0x0
FAULTING_THREAD:    000006c4
FAILED_INSTRUCTION_ADDRESS:
XXX+9d2f
017a9d2f    0000        add byte ptr [eax],al
FOLLOWING_IP:
XXX+9d2f
017a9d2f    0000       add byte ptr [eax],al
SYMBOL_NAME:    XXX+9d2f
FOLLOWUP_NAME:   MachineOwner
IMAGE_NAME: XXX.DLL
STACK_COMMAND:    ~19s;  .ecxr ; kb
BUCKET_ID:    WRONG_SYMBOLS
FAILURE_BUCKET_ID:   XXX.DLL!base_address_c0000005_WRONG_SYMBOLS
Followup: MachineOwner
-------------
0:019>lm
start         end                  module name
00340000 0035b000        MODULE1  (private pdb symbols) D:/symbol/Module1.pdb
... ...
01680000 0168e000        MODULE2   (private pdb symbols) D:/symbol/Module2.pdb
01aa0000 01abd000       MODULE3 (private pdb symbols) D:/symbol/Module3.pdb
... ...
Unloaded modules:
017a0000 017c6000        XXX.DLL
017a0000 017c6000        XXX.DLL
02c50000 02c76000        XXX.DLL
... ...

From the <Unloaded_XXX.DLL> marker on the ExceptionAddress line and the "Unloaded modules" list printed by lm, we know that the exception occurred in a module that had already been unloaded from the process.
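The same check can be done by hand: the exception address 017a9d2f lies inside the unloaded range 017a0000 to 017c6000, at offset 0x9d2f from its start. The following is only a small sketch of that arithmetic. The names ModuleRange, unloadedModulesFromDump and findModule are made up for illustration, the ranges are copied from the lm output above, and the end address is assumed to be exclusive.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One entry of the "Unloaded modules" list (hypothetical helper type).
struct ModuleRange {
    std::uint32_t start;  // first address of the module
    std::uint32_t end;    // assumed to be one past the last address
    std::string name;
};

// The three unloaded entries shown in the lm output of this case.
std::vector<ModuleRange> unloadedModulesFromDump() {
    return {
        {0x017a0000, 0x017c6000, "XXX.DLL"},
        {0x017a0000, 0x017c6000, "XXX.DLL"},
        {0x02c50000, 0x02c76000, "XXX.DLL"},
    };
}

// Returns the first range that contains the address, or nullptr if none does.
const ModuleRange* findModule(const std::vector<ModuleRange>& modules,
                              std::uint32_t address) {
    for (const ModuleRange& m : modules) {
        if (address >= m.start && address < m.end) {
            return &m;
        }
    }
    return nullptr;
}
```

Calling findModule with 0x017a9d2f returns the first XXX.DLL entry, and subtracting its start address gives 0x9d2f, which matches the XXX+9d2f reported by !analyze -v.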

The cause is a thread belonging to the unloaded DLL that has not yet fully stopped. This happens especially when the CPU is heavily loaded: the time slice given to that thread is not long enough for it to run its shutdown code before the process unloads the DLL (through a call to the function CloseModule).

To fix the problem, the call to CloseModule must block until all threads in this DLL have stopped.
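A minimal sketch of this idea, written with portable std::thread rather than the Win32 thread API: the close call signals the worker to stop and then waits for it to finish before returning. This is not the code of the real module. The class Module, its members and the 1 ms polling loop are assumptions for illustration, and only the name CloseModule comes from the case above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the state owned by the DLL.
class Module {
public:
    // Starts the worker thread (corresponds to the call that opens the module).
    void OpenModule() {
        stop_ = false;
        worker_ = std::thread([this] { run(); });
    }

    // Blocks until the worker has really finished, so the caller
    // can safely unload the DLL after this function returns.
    void CloseModule() {
        stop_ = true;
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    bool workerRunning() const { return running_; }

    ~Module() { CloseModule(); }

private:
    void run() {
        running_ = true;
        while (!stop_) {
            // Placeholder for the real work of the thread.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        running_ = false;  // shutdown code runs before the thread exits
    }

    std::atomic<bool> stop_{false};
    std::atomic<bool> running_{false};
    std::thread worker_;
};
```

The wait belongs in an ordinary exported function such as CloseModule, not in DllMain, because waiting for a thread while the loader lock is held can deadlock.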

Note: In our case the DLL is used with load-time dynamic linking, i.e. the DLL is loaded by an explicit function call into this DLL and unloaded by another function call, here called CloseModule. If your DLL is loaded by an explicit library call to ::LoadLibrary, Windows automatically increments the module reference counter, and the module stays loaded until matching ::FreeLibrary calls bring the counter back to zero. The situation in this article does not arise as long as that reference is held while the DLL's threads are still running.

For guidance on developing DLLs, please refer to the MSDN DLL best practices:
 
http://www.microsoft.com/whdc/driver/kernel/DLL_bestproc.mspx.

 
