Debugging Memory Errors in C/C++


http://scottmcpeak.com/memory-errors/

This page describes a few key techniques I've learned about how to debug programs that are suspected of containing memory errors. Principally, this includes using memory after it has been freed, and writing beyond the end of an array. Memory leaks are considered briefly at the end.

It's of course rather presumptuous to even write these up, since so much has already been written. I'm not intending to write the be-all and end-all article, just to write up a few of the techniques I use, since I recently had the opportunity to help a friend debug such an error. There are also some links at the end to other resources.

Note that I'm only interested here in memory errors that trash part of the heap. Overwriting the stack may be a cracker's favorite technique, but when it happens in front of the programmer it's usually very easy to track down.

Why are memory errors hard to debug?

The first thing to understand about memory errors is why they're different from other bugs. I claim the main reason they are harder to debug is that they are fragile. By fragile, I mean the bug will often only show up under certain conditions, and that attempts to isolate the bug by changing the program or its input often mask its effects. Since the programmer is then forced to find the needle in the haystack, and cannot use techniques to cut down on the size of the haystack, locating the cause of the problem is very difficult.

Consequently, the first priority when tracking down suspected memory errors is to make the bug more robust. There is a bug in your code, but you need to do something so that the bug's effects cannot be masked by other actions of the program.

Making the bug more robust

I know of two main techniques for reducing the fragility of a memory bug:

  • Don't re-use memory.
  • Put empty space between memory blocks.

Why do these techniques help? First, by not re-using memory, we can eliminate temporal dependencies between the bug and the surrounding program. That is, if memory is not re-used, then it no longer matters in what order the relevant blocks are allocated and deallocated.

Second, by putting empty space between blocks, writing past the end of one block (or before its beginning) won't corrupt another. Thus, we break spatial dependencies involving the bug. The space between the blocks should be filled with a known value, and the space should be periodically checked (at least when free is called on that block) to see if the known value has been changed.

With temporal and spatial dependencies reduced, it's less likely that a change to the program or its input will disturb the evidence of the bug's presence.

Of course, your machine must have enough spare memory to run the experiment. But, by making the bug more robust, we can now cut down on the input size! Thus, in the end, using more space in the short term can lead to using less space in the final, minimized input test case.

The above two techniques are easily implemented in any debug heap implementation. I've modified Doug Lea's malloc to implement the features; my modified version is here: malloc.c, ckheap.h. To compile with the debug features described, set the preprocessor variables DEBUG and DEBUG_HEAP. But of course you can use any implementation, and the debug versions can simply be wrappers around the real malloc.
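
To illustrate the wrapper approach, here is a minimal sketch. It is not the code from the modified malloc.c. The names guarded_malloc, guarded_check and guarded_free, and the 128-byte zone size, are assumptions made for this example. The block is surrounded by zones filled with a known value, the zones can be checked at any time, and "freeing" a block checks its zones but never hands the memory back, so it is never re-used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ZONE_SIZE 128   /* hypothetical: guard bytes on each side of a block */
#define ZONE_FILL 0xAA  /* known value written into the guard zones */

/* Layout of one allocation:
 *   [ size_t n | left zone fill ][ n user bytes ][ right zone fill ]
 *   |<------ ZONE_SIZE ------->|                |<-- ZONE_SIZE -->|
 * ZONE_SIZE is a multiple of 16, so the user pointer keeps malloc's
 * alignment. */

/* Allocate n user bytes with a filled guard zone on both sides. */
void *guarded_malloc(size_t n)
{
    unsigned char *raw;

    if (n > SIZE_MAX - 2 * ZONE_SIZE)
        return NULL;                        /* total size would overflow */
    raw = malloc(ZONE_SIZE + n + ZONE_SIZE);
    if (raw == NULL)
        return NULL;

    memset(raw, ZONE_FILL, ZONE_SIZE);                  /* left zone */
    memcpy(raw, &n, sizeof n);                          /* remember the size */
    memset(raw + ZONE_SIZE + n, ZONE_FILL, ZONE_SIZE);  /* right zone */
    return raw + ZONE_SIZE;
}

/* Return the number of guard bytes that no longer hold ZONE_FILL. */
size_t guarded_check(const void *user)
{
    const unsigned char *raw;
    size_t n, i, trashed = 0;

    assert(user != NULL);
    raw = (const unsigned char *)user - ZONE_SIZE;
    memcpy(&n, raw, sizeof n);

    for (i = sizeof n; i < ZONE_SIZE; i++)      /* left zone, after the size */
        if (raw[i] != ZONE_FILL)
            trashed++;
    for (i = 0; i < ZONE_SIZE; i++)             /* right zone */
        if (raw[ZONE_SIZE + n + i] != ZONE_FILL)
            trashed++;
    return trashed;
}

/* "Free" a block: check its zones, but never release the memory,
 * so that no later allocation can re-use it. */
size_t guarded_free(void *user)
{
    if (user == NULL)
        return 0;
    return guarded_check(user);
}
```

A program under test would call guarded_malloc and guarded_free in place of malloc and free, and treat any nonzero result from guarded_check or guarded_free as evidence of a trashed zone. The cost is exactly the one described above: memory use only grows for the duration of the experiment.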

Using hardware watchpoints

Intel-compatible x86 processors include debug registers capable of watching up to four addresses. Whenever a read or write to any of the watched addresses happens, the program traps, and the debugger gets control. The debug registers offer a powerful way to find out what line of code is overwriting a given byte, once you know which byte is being overwritten.

In gdb, the notation for using hardware watchpoints is a little odd, because gdb likes to think of its input as a C expression. If you want to stop when address 0xABCDEF is written to, then at the gdb prompt type

  (gdb) watch *((int*)0xABCDEF)

One difficulty is that you can't begin watching an address until the memory it refers to has been mapped (requested from the operating system for use by the program). The usual solution is to step through the program at a rather coarse granularity (skipping over most function calls) until you find a point in time where the address is mapped but has not yet been trashed. Add the watchpoint, then let the program run until the address is written.

An example

Suppose I have a program with a suspected memory error. I compile it with the debug malloc.c, and when I run it I see:

  $ ./tmalloc
  trashed 1 bytes
  tmalloc: malloc.c:1591: checkZones: Assertion `!"right allocated zone trashed"' failed.
  Aborted

I first run the program in the debugger to find the offending address:

  (gdb) run
  Starting program: /home/scott/wrk/cplr/smbase/tmalloc
  trashed 1 bytes
  tmalloc: malloc.c:1591: checkZones: Assertion `!"right allocated zone trashed"' failed.
  Program received signal SIGABRT, Aborted.
  0x400539f1 in __kill () from /lib/libc.so.6
  (gdb) up
  #1  0x400536d4 in raise (sig=6) at ../sysdeps/posix/raise.c:27
  27      ../sysdeps/posix/raise.c: No such file or directory.
  (gdb) up
  #2  0x40054e31 in abort () at ../sysdeps/generic/abort.c:88
  88      ../sysdeps/generic/abort.c: No such file or directory.
  (gdb) up
  #3  0x4004dfd2 in __assert_fail () at assert.c:60
  60      assert.c: No such file or directory.
  (gdb) up
  #4  0x8048d55 in checkZones (p=0x8050838 "\016\001", bytes=270)
      at malloc.c:1591
  (gdb) print p[bytes-1-i]
  $1 = 7 '\a'                 <----- trashed! should be 0xAA
  (gdb) print p+bytes-1-i
  $2 = (unsigned char *) 0x80508c6 "\a", '\252' <repeats 127 times>
  (gdb)                  ^^^^^^^^^
                         this is the trashed address

Now I restart the program and attempt to set a hardware watchpoint:

  (gdb) break main
  Breakpoint 1 at 0x8048b91: file tmalloc.c, line 81.
  (gdb) run
  The program being debugged has been started already.
  Start it from the beginning? (y or n) y
  Starting program: /home/scott/wrk/cplr/smbase/tmalloc
  Breakpoint 1, main () at tmalloc.c:81
  (gdb) watch *((int*)0x80508c6)
  Cannot access memory at address 0x80508c6
  (gdb)

Ok, the memory isn't mapped yet. Single-stepping through main a few times, I find a place where I can insert the watchpoint but the memory in question hasn't yet been trashed. When I then continue the program, the debugger next stops at the bug.

  (gdb) watch *((int*)0x80508c6)
  Hardware watchpoint 3: *(int *) 134547654
  (gdb) c
  Continuing.
  Hardware watchpoint 3: *(int *) 134547654
  Old value = -1431655766
  New value = -1431655929
  offEnd () at tmalloc.c:33
  (gdb) print /x -1431655766
  $1 = 0xaaaaaaaa              <--- what it should be
  (gdb) print /x -1431655929
  $2 = 0xaaaaaa07              <--- what it became after trashing
  (gdb) list
  28
  29      void offEnd()
  30      {
  31        char *p = malloc(10);
  32        p[10] = 7;    // oops       <--- the bug
  33        free(p);
  34      }
  35
  36      void offEndCheck()
  37      {
  (gdb)

In this small program the bug would have been obvious upon inspection, but the technique of course generalizes to cases that are much more complicated.

Dangling references

As mentioned above, a debug heap shouldn't re-use memory. Going one step further, my debug malloc.c overwrites free()'d memory with another known pattern (but does not actually free it). Then, if the program continues to use the memory the mistake will become clear, especially if it tries to interpret the values it finds as pointers (they'll segfault). Double-deallocation is also easy to identify with this scheme.
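
A minimal sketch of that scheme follows, again as an illustration rather than the code from the modified malloc.c. The names poison_malloc and poison_free, the 0xDD fill value and the two state constants are all assumptions made for this example. Each block carries a small header recording its size and whether it is live; freeing overwrites the user bytes, flips the state, and keeps the memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FREED_FILL  0xDD          /* hypothetical pattern for freed memory */
#define STATE_LIVE  0x4C495645u   /* hypothetical marker: block is in use */
#define STATE_FREED 0x46524545u   /* hypothetical marker: block was freed */

/* Header placed in front of every block. The union with max_align_t
 * (C11) keeps the user pointer suitably aligned. */
typedef union {
    struct { size_t size; unsigned state; } info;
    max_align_t align;
} block_header;

/* Allocate n user bytes preceded by a header marked as live. */
void *poison_malloc(size_t n)
{
    block_header *h;

    if (n > SIZE_MAX - sizeof *h)
        return NULL;                    /* total size would overflow */
    h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->info.size = n;
    h->info.state = STATE_LIVE;
    return h + 1;
}

/* Overwrite the block with FREED_FILL and mark it freed, but keep the
 * memory. Returns 0 on success, -1 if the block was already freed. */
int poison_free(void *user)
{
    block_header *h;

    if (user == NULL)
        return 0;
    h = (block_header *)user - 1;
    if (h->info.state == STATE_FREED)
        return -1;                      /* double deallocation detected */
    assert(h->info.state == STATE_LIVE);  /* else: not one of our blocks */
    memset(user, FREED_FILL, h->info.size);
    h->info.state = STATE_FREED;
    return 0;
}
```

Because the memory is never returned to the real heap, reading a block after poison_free is still well-defined here, and a dangling reference shows up as a run of 0xDD bytes instead of whatever a later allocation happened to store there.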

Memory leaks

I usually debug memory leaks by printing statistics about calls to malloc and free before and after certain sections of code. If there are more calls to malloc, but the code isn't supposed to be creating long-lived data, then that points to a potential problem. This doesn't easily generalize to long-running programs, but if the program can be broken into units and the leak properties of each unit checked in isolation, most leaks can be found relatively easily.
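
The counting can be sketched with a pair of wrappers. The names below (counted_malloc, counted_free, heap_snapshot, heap_net_blocks, heap_report) are hypothetical and chosen for this example; they are not the statistics interface of the modified malloc.c. The idea is to take a snapshot before and after a section of code and compare the number of malloc and free calls in between.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Running totals of successful allocations and of frees. */
typedef struct {
    unsigned long mallocs;
    unsigned long frees;
} heap_stats;

static heap_stats current_stats = { 0, 0 };

/* malloc wrapper that counts successful allocations. */
void *counted_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        current_stats.mallocs++;
    return p;
}

/* free wrapper that counts frees of non-NULL pointers. */
void counted_free(void *p)
{
    if (p != NULL)
        current_stats.frees++;
    free(p);
}

/* Copy of the counters at this moment. */
heap_stats heap_snapshot(void)
{
    return current_stats;
}

/* Blocks allocated but not freed between two snapshots.
 * A positive result for code that should not create long-lived
 * data points to a potential leak. */
long heap_net_blocks(heap_stats before, heap_stats after)
{
    assert(after.mallocs >= before.mallocs);  /* snapshots in order */
    assert(after.frees >= before.frees);
    return (long)(after.mallocs - before.mallocs)
         - (long)(after.frees - before.frees);
}

/* Print the statistics for one section of code to stderr. */
void heap_report(const char *label, heap_stats before, heap_stats after)
{
    fprintf(stderr, "%s: %lu mallocs, %lu frees, net %ld blocks\n",
            label,
            after.mallocs - before.mallocs,
            after.frees - before.frees,
            heap_net_blocks(before, after));
}
```

Typical use is to call heap_snapshot before a unit of work, call it again afterwards, and pass both snapshots to heap_report. A net count of zero does not prove the unit is leak-free (it could free blocks it did not allocate), but a persistently positive count is a good place to start looking.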

Conclusion

The C and C++ languages are much-maligned for lack of memory safety, but too often this is seen as a greater problem than it really is (setting security issues aside for the moment). Debugging memory errors requires a different approach than debugging other kinds of errors, but with a little practice they can actually be easier and faster to find, simply because the same techniques (and tools!) can be used over and over.

Some links

I'm not the first or last to write about methods for debugging memory errors. Here are some links to other people who aren't the first or last either (actually only the first link really matches this description...).

  • Debugging Tools for Dynamic Storage Allocation and Memory Management: Ben Zorn's long list of tools people have written to help debug memory errors.
  • Doug Lea's malloc: Doug Lea's implementation of malloc.
  • malloc.c: My modified version of Doug Lea's malloc, version 2.7.0. I've added:
    • -DDEBUG_HEAP: don't re-use memory, put empty zones on both sides of allocated space, overwrite deallocated space
    • Statistics to track the number of calls to malloc and free.
    • A heap walker interface.
    • -DTRACE_MALLOC_CALLS: print a message to stderr on every malloc and free
  • The above malloc.c also needs the header ckheap.h. That's an oversight I plan to correct, but in the meantime this should be enough to compile malloc.c.
  • gdb: The GNU debugger. The de-facto standard on Linux, for better or worse.
  • Rational: The makers of Purify, one of the best-known tools for finding memory errors. Purify doesn't require recompiling the program, which certainly has its advantages, but as such it is limited in the ways it can make memory bugs more robust. I think sometimes people reach for a heavyweight solution like Purify when a simple debug heap would be faster and easier.
  • CCured: I'd be remiss if I didn't mention CCured, a research project I've done quite a bit of work on. CCured instruments the entire program so it can catch a wide variety of bugs, in a way that is sound: if CCured does not report a problem, then no problem occurred during that run of the program. I can't recommend it as the first solution to reach for during debugging, since it takes a fair bit of time and effort to get a program working under CCured. But in the long run, if you can use CCured, it provides a level of assurance well beyond that of any other current technique.
