Repost: 2005 Malloc Maleficarum (pending translation)

Date: Tue, 11 Oct 2005 10:14:02 -0700
From: Phantasmal Phantasmagoria <phantasmal@hush.ai>
To: bugtraq@securityfocus.com
Subject: The Malloc Maleficarum

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[--------------------------------

The Malloc Maleficarum
Glibc Malloc Exploitation Techniques

by Phantasmal Phantasmagoria
phantasmal@hush.ai

[--------------------------------

In late 2001, "Vudo Malloc Tricks" and "Once Upon A free()" defined
the exploitation of overflowed dynamic memory chunks on Linux. In
late 2004, a series of patches to GNU libc malloc implemented over
a dozen mandatory integrity assertions, effectively rendering the
existing techniques obsolete.

It is for this reason, a small suggestion of impossibility, that I
present the Malloc Maleficarum.

[--------------------------------

          The House of Prime
          The House of Mind
          The House of Force
          The House of Lore
          The House of Spirit
          The House of Chaos

[--------------------------------

          The House of Prime

An artist has the first brush stroke of a painting. A writer has
the first line of a poem. I have the House of Prime. It was the
first breakthrough, the indication of everything that was to come.
It was the rejection of impossibility. And it was also the most
difficult to derive. For these reasons I feel obliged to give Prime
the position it deserves as the first House of the Malloc
Maleficarum.

From a purely technical perspective the House of Prime is perhaps
the least useful of the collection. It is almost invariably better
to use the House of Mind or Spirit when the conditions allow it.
In order to successfully apply the House of Prime it must be
possible to free() two different chunks with designer controlled
size fields and then trigger a call to malloc().

The general idea of the technique is to corrupt the fastbin maximum
size variable, which under certain uncontrollable circumstances
(discussed below) allows the designer to hijack the arena structure
used by calls to malloc(), which in turn allows either the return
of an arbitrary memory chunk, or the direct modification of
execution control data.

As previously stated, the technique starts with a call to free() on
an area of memory that is under control of the designer. A call to
free() actually invokes a wrapper, called public_fREe(), to the
internal function _int_free(). For the House of Prime, the details
of public_fREe() are relatively unimportant. So attention moves,
instead, to _int_free(). From the glibc-2.3.5 source code:

void
_int_free(mstate av, Void_t* mem)
{
    mchunkptr       p;           /* chunk corresponding to mem */
    INTERNAL_SIZE_T size;        /* its size */
    mfastbinptr*    fb;          /* associated fastbin */
    ...

    p = mem2chunk(mem);
    size = chunksize(p);

    if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)
        || __builtin_expect ((uintptr_t) p & MALLOC_ALIGN_MASK, 0))
    {
        errstr = "free(): invalid pointer";
      errout:
        malloc_printerr (check_action, errstr, mem);
        return;
    }

Almost immediately one of the much vaunted integrity tests appears.
The __builtin_expect() construct is used for optimization purposes,
and does not in any way affect the conditions it contains. The
designer must ensure that both of the tests fail in order to
continue execution. At this stage, however, doing so is not
difficult.

Note that the designer does not control the value of p. It can
therefore be assumed that the test for misalignment will fail. On
the other hand, the designer does control the value of size.
In fact, it is the most important aspect of control that the
designer possesses, yet its range is already being limited. For the
House of Prime the exact upper limit of size is not important. The
lower limit, however, is crucial in the correct execution of this
technique. The chunksize() macro is defined as follows:

#define SIZE_BITS (PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA)
#define chunksize(p)         ((p)->size & ~(SIZE_BITS))

The PREV_INUSE, IS_MMAPPED and NON_MAIN_ARENA definitions
correspond to the three least significant bits of the size entry in
a malloc chunk. The chunksize() macro clears these three bits,
meaning the lowest possible value of the designer controlled size
value is 8. Continuing with _int_free() it will soon become clear
why this is important:

    if ((unsigned long)(size) <= (unsigned long)(av->max_fast))
    {
      if (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ
          || __builtin_expect (chunksize (chunk_at_offset (p, size))
                               >= av->system_mem, 0))
        {
          errstr = "free(): invalid next size (fast)";
          goto errout;
        }

      set_fastchunks(av);
      fb = &(av->fastbins[fastbin_index(size)]);
      if (__builtin_expect (*fb == p, 0))
        {
          errstr = "double free or corruption (fasttop)";
          goto errout;
        }
      p->fd = *fb;
      *fb = p;
    }

This is the fastbin code. Exactly what a fastbin is and why they
are used is beyond the scope of this document, but remember that
the first step in the House of Prime is to overwrite the fastbin
maximum size variable, av->max_fast. In order to do this the
designer must first provide a chunk with the lower limit size,
which was derived above. Given that the default value of
av->max_fast is 72 it is clear that the fastbin code will be used
for such a small size. However, exactly why this results in the
corruption of av->max_fast is not immediately apparent.

It should be mentioned that av is the arena pointer.
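
The size arithmetic above can be checked in isolation. The following
is a minimal sketch, assuming a 32-bit size field. The helpers
model_chunksize() and takes_fastbin_path() are hypothetical names
written for illustration and are not part of glibc. The sketch shows
that clearing the three flag bits leaves 8 as the smallest non-zero
size, and that such a size is handled by the fastbin code under the
default av->max_fast of 72.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits kept in the three least significant bits of the size field. */
#define PREV_INUSE      0x1u
#define IS_MMAPPED      0x2u
#define NON_MAIN_ARENA  0x4u
#define SIZE_BITS       (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* Hypothetical model of chunksize() applied to a raw 32-bit size field. */
static uint32_t model_chunksize(uint32_t raw_size)
{
    return raw_size & ~SIZE_BITS;
}

/* Hypothetical model of the comparison that selects the fastbin code. */
static int takes_fastbin_path(uint32_t raw_size, uint32_t max_fast)
{
    return model_chunksize(raw_size) <= max_fast;
}

/* Worked values for the lower limit discussed in the text. */
static void check_lower_limit(void)
{
    assert(model_chunksize(0x07u) == 0u);  /* flag bits only: size is 0 */
    assert(model_chunksize(0x0fu) == 8u);  /* smallest non-zero size    */
    assert(takes_fastbin_path(8u | PREV_INUSE, 72u));
    assert(!takes_fastbin_path(80u, 72u));
}
```

Calling check_lower_limit() from a test program exercises all four
cases. A size of 0 is not usable, since the first integrity test
compares p against -size.
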
The arena is a control structure that contains, amongst other
things, the maximum size of a fastbin and an array of pointers to
the fastbins themselves. In fact, av->max_fast and av->fastbins are
contiguous:

    ...
    INTERNAL_SIZE_T  max_fast;
    mfastbinptr      fastbins[NFASTBINS];
    mchunkptr        top;
    ...

Assuming that the nextsize integrity check fails, the fb pointer is
set to the address of the relevant fastbin for the given size. This
is computed as an index from the zeroth entry of av->fastbins. The
zeroth entry, however, is designed to hold chunks of a minimum size
of 16 (the minimum size of a malloc chunk including prev_size and
size values). So what happens when the designer supplies the lower
limit size of 8? An analysis of fastbin_index() is needed:

#define fastbin_index(sz)        ((((unsigned int)(sz)) >> 3) - 2)

Simple arithmetic shows that 8 >> 3 = 1, and 1 - 2 = -1. Therefore
fastbin_index(8) is -1, and thus fb is set to the address of
av->fastbins[-1]. Since av->max_fast is contiguous to av->fastbins
it is evident that the fb pointer is set to &av->max_fast.
Furthermore, the second integrity test fails (since fb definitely
does not point to p) and the final two lines of the fastbin code
are reached. Thus the forward pointer of the designer's chunk p is
set to av->max_fast, and av->max_fast is set to the value of p.

An assumption was made above that the nextsize integrity check
fails. In reality it often takes a bit of work to get this to fall
together. If the overflow is capable of writing null bytes, then
the solution is simple. However, if the overflow terminates on a
null byte, then the solution becomes application specific. If the
tests fail because of the natural memory layout at overflow, which
they often will, then there is no problem.
Otherwise some memory layout manipulation may be needed to ensure
that the nextsize value is designer controlled.

The challenging part of the House of Prime, however, is not how to
overwrite av->max_fast, but how to leverage the overwrite into
arbitrary code execution. The House of Prime does this by
overwriting a thread specific data variable called arena_key. This
is where the biggest condition of the House of Prime arises.
Firstly, arena_key only exists if glibc malloc is compiled with
USE_ARENAS defined (this is the default setting). Furthermore, and
most significantly, arena_key must be at a higher address than the
actual arena:

0xb7f00000 <main_arena>:        0x00000000
0xb7f00004 <main_arena+4>:      0x00000049      <-- max_fast
0xb7f00008 <main_arena+8>:      0x00000000      <-- fastbin[0]
0xb7f0000c <main_arena+12>:     0x00000000      <-- fastbin[1]
....
0xb7f00488 <mp_+40>:            0x0804a000      <-- mp_.sbrk_base
0xb7f0048c <arena_key>:         0xb7f00000

Due to the fact that the arena structure and the arena_key come
from different source files, exactly when this does and doesn't
happen depends on how the target libc was compiled and linked. I
have seen the cards fall both ways, so it is an important point to
make. For now it will be assumed that the arena_key is at a higher
address, and is thus over-writable by the fastbin code.

The arena_key is thread specific data, which simply means that
every thread of execution has its own arena_key independent of
other threads. This may have to be considered when applying the
House of Prime to a threaded program, but otherwise arena_key can
safely be treated as normal data.

The arena_key is an interesting target because it is used by the
arena_get() macro to find the arena for the currently executing
thread. That is, if arena_key is controlled for some thread and a
call to arena_get() is made, then the arena can be hijacked.
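
Before turning to the overwrite of arena_key, the earlier claim that
a size of 8 makes fb point at av->max_fast can be checked with a
simplified model. This is only a sketch under stated assumptions:
struct arena_model is a hypothetical 32-bit stand-in that follows the
memory dump (one leading word, then max_fast, then the fastbin
array), NFASTBINS is assumed to be 10, and the helper names are
invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NFASTBINS 10  /* assumed value; the exact count does not matter here */

/* Hypothetical 32-bit model of the start of the arena, after the dump. */
struct arena_model {
    uint32_t leading_word;          /* word at main_arena+0        */
    uint32_t max_fast;              /* word at main_arena+4        */
    uint32_t fastbins[NFASTBINS];   /* starts at main_arena+8      */
    uint32_t top;
};

/* fastbin_index() computed as a signed value so that -1 is visible. */
static int32_t model_fastbin_index(uint32_t size)
{
    return (int32_t)(size >> 3) - 2;
}

/* Offset from the start of the arena of the slot that
 * &av->fastbins[fastbin_index(size)] refers to in this model. */
static long fastbin_slot_offset(uint32_t size)
{
    return (long)offsetof(struct arena_model, fastbins)
         + (long)model_fastbin_index(size) * (long)sizeof(uint32_t);
}

/* A size of 8 selects the word holding max_fast; 16 selects fastbins[0]. */
static void check_size_8_hits_max_fast(void)
{
    assert(model_fastbin_index(8u) == -1);
    assert(fastbin_slot_offset(8u)
           == (long)offsetof(struct arena_model, max_fast));
    assert(fastbin_slot_offset(16u)
           == (long)offsetof(struct arena_model, fastbins));
}
```

The model uses fixed 32-bit words so that the offsets match the dump
regardless of the machine it is compiled on.
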
Arena hijacking of this type will be covered shortly, but first the
actual overwrite of arena_key must be considered.

In order to overwrite arena_key the fastbin code is used for a
second time. This corresponds to the second free() of a designer
controlled chunk that was outlined in the original prerequisites
for the House of Prime. Normally the fastbin code would not be able
to write beyond the end of av->fastbins, but since av->max_fast has
previously been corrupted, chunks with any size less than the value
of the address of the designer's first chunk will be treated with
the fastbin code. Thus the designer can write up to
av->fastbins[fastbin_index(av->max_fast)], which is easily a large
enough range to be able to reach the arena_key.

In the example memory dump provided above the arena_key is 0x484
(1156) bytes from av->fastbins[0]. Therefore an index of
1156/sizeof(mfastbinptr) is needed to set fb to the address of
arena_key. Assuming that the system has 32-bit pointers a
fastbin_index() of 289 is required. Roughly inverting the
fastbin_index() gives:

          (289 + 2) << 3 = 2328

This means that a size of 2328 will result in fb being set to
arena_key. Note that this size only applies for the memory dump
shown above. It is quite likely that the offset between
av->fastbins[0] and arena_key will differ from system to system.

Now, if the designer has corrupted av->max_fast and triggered a
free() on a chunk with size 2328, and assuming the failure of the
nextsize integrity tests, then fb will be set to arena_key, the
forward pointer of the designer's second chunk will be set to the
address of the existing arena, and arena_key will be set to the
address of the designer's second chunk.

When corrupting av->max_fast it was not important for the designer
to control the overflowed chunk so long as the nextsize integrity
checks were handled. When overwriting arena_key, however, it is
crucial that the designer controls at least part of the overflowed
chunk's data.
This is because the overflowed chunk will soon become the new
arena, so it is natural that at least part of the chunk data must
be arbitrarily controlled, or else arbitrary control of the result
of malloc() could not be expected.

A call to malloc() invokes a wrapper function called
public_mALLOc():

Void_t*
public_mALLOc(size_t bytes)
{
    mstate ar_ptr;
    Void_t *victim;
    ...
    arena_get(ar_ptr, bytes);
    if(!ar_ptr)
      return 0;
    victim = _int_malloc(ar_ptr, bytes);
    ...
    return victim;
}

The arena_get() macro is in charge of finding the current arena by
retrieving the arena_key thread specific data, or failing this,
creating a new arena. Since the arena_key has been overwritten with
a non-zero quantity it can be safely assumed that arena_get() will
not try to create a new arena. In the public_mALLOc() wrapper this
has the effect of setting ar_ptr to the new value of arena_key, the
address of the designer's second chunk. In turn this value is
passed to the internal function _int_malloc() along with the
requested allocation size.

Once execution passes to _int_malloc() there are two ways for the
designer to proceed.
The first is to use the fastbin allocation code:

Void_t*
_int_malloc(mstate av, size_t bytes)
{
    INTERNAL_SIZE_T nb;               /* normalized request size */
    unsigned int    idx;              /* associated bin index */
    mfastbinptr*    fb;               /* associated fastbin */
    mchunkptr       victim;           /* inspected/selected chunk */

    checked_request2size(bytes, nb);

    if ((unsigned long)(nb) <= (unsigned long)(av->max_fast)) {
      long int idx = fastbin_index(nb);
      fb = &(av->fastbins[idx]);
      if ( (victim = *fb) != 0) {
        if (fastbin_index (chunksize (victim)) != idx)
          malloc_printerr (check_action, "malloc(): memory"
            " corruption (fast)", chunk2mem (victim));
        *fb = victim->fd;
        check_remalloced_chunk(av, victim, nb);
        return chunk2mem(victim);
      }
    }

The checked_request2size() macro simply converts the request into
the absolute size of a memory chunk with data length of the
requested size. Remember that av is pointing towards a designer
controlled area of memory, and also that the forward pointer of
this chunk has been corrupted by the fastbin code. If glibc malloc
is compiled without thread statistics (which is the default), then
p->fd of the designer's chunk corresponds to av->fastbins[0] of the
designer's arena. For the purposes of this technique the use of
av->fastbins[0] must be avoided. This means that the request size
must be greater than 8.

Interestingly enough, if the absence of thread statistics is
assumed, then av->max_fast corresponds to p->size. This has the
effect of forcing nb to be less than the size of the designer's
second chunk, which in the example provided was 2328. If this is
not possible, the designer must use the unsorted_chunks/largebin
technique that will be discussed shortly.

By setting up a fake fastbin entry at
av->fastbins[fastbin_index(nb)] it is possible to return a chunk of
memory that is actually on the stack.
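
The size of 2328 used for the second free() comes from inverting
fastbin_index(), and the same mapping decides which fake fastbin a
later request will consult. The following is a minimal sketch of that
arithmetic, assuming 32-bit pointers and the offsets from the example
dump. The helper names are hypothetical and chosen for illustration
only.

```c
#include <assert.h>
#include <stdint.h>

/* fastbin_index() for sizes of at least 16, as an unsigned value. */
static uint32_t model_fastbin_index(uint32_t size)
{
    assert(size >= 16u);  /* smaller sizes are the special case of Prime */
    return (size >> 3) - 2u;
}

/* Index of the pointer-sized slot that lies byte_offset bytes past
 * av->fastbins[0]. */
static uint32_t index_for_offset(uint32_t byte_offset, uint32_t pointer_size)
{
    assert(pointer_size != 0u && byte_offset % pointer_size == 0u);
    return byte_offset / pointer_size;
}

/* Rough inverse of fastbin_index(): the smallest size for a given index. */
static uint32_t size_for_fastbin_index(uint32_t index)
{
    return (index + 2u) << 3;
}

/* Values taken from the example dump: arena_key at 0xb7f0048c,
 * av->fastbins[0] at 0xb7f00008. */
static uint32_t example_size_for_arena_key(void)
{
    uint32_t offset = 0xb7f0048cu - 0xb7f00008u;   /* 0x484 */
    return size_for_fastbin_index(index_for_offset(offset, 4u));
}
```

Because chunksize() clears the low three bits before the index is
taken, any raw size field from 2328 to 2335 selects the same slot in
this model.
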
In order to pass the victimsize integrity test it is necessary to
point the fake fastbin at a user controlled value. Specifically,
the size of the victim chunk must have the same fastbin_index() as
nb, so the fake fastbin must point to 4 bytes before the designer's
value in order to have the right positioning for the call to
chunksize().

Assuming that there is a designer controlled variable on the stack,
the application will subsequently handle the returned area as if it
were a normal memory chunk of the requested size. So if there is a
saved return address in the "allocated" range, and if the designer
can control what the application writes to this range, then it is
possible to circumvent execution to an arbitrary location.

If it is possible to trigger an appropriate malloc() with a request
size greater than the size of the designer's second chunk, then it
is better to use the unsorted_chunks code in _int_malloc() to cause
an arbitrary memory overwrite. This technique does, however,
require a greater amount of designer control in the second chunk,
and further control of two areas of memory somewhere in the target
process address space. To trigger the unsorted_chunks code at all
the absolute request size must be larger than 512 (the maximum
smallbin chunk size), and of course, must be greater than the fake
arena's av->max_fast. Assuming it is, the unsorted_chunks code is
reached:

    for(;;) {
      while ( (victim = unsorted_chunks(av)->bk) !=
               unsorted_chunks(av)) {
        bck = victim->bk;
        if (__builtin_expect (victim->size <= 2 * SIZE_SZ, 0)
            || __builtin_expect (victim->size > av->system_mem, 0))
          malloc_printerr (check_action, "malloc(): memory"
            " corruption", chunk2mem (victim));
        size = chunksize(victim);

        if (in_smallbin_range(nb) &&
            bck == unsorted_chunks(av) &&
            victim == av->last_remainder &&
            (unsigned long)(size) > (unsigned long)(nb + MINSIZE)) {
          ...
        }

        unsorted_chunks(av)->bk = bck;
        bck->fd = unsorted_chunks(av);

        if (size == nb) {
          ...
          return chunk2mem(victim);
        }
        ...

There are quite a lot of things to consider here. Firstly, the
unsorted_chunks() macro returns av->bins[0]. Since the designer
controls av, the designer also controls the value of
unsorted_chunks(). This means that victim can be set to an
arbitrary address by creating a fake av->bins[0] value that points
to an area of memory (called A) that is designer controlled. In
turn, A->bk will contain the address that victim will be set to
(called B). Since victim is at an arbitrary address B that can be
designer controlled, the temporary variable bck can be set to an
arbitrary address from B->bk.

For the purposes of this technique, B->size should be equal to nb.
This is not strictly necessary, but works well to pass the two
victimsize integrity tests while also triggering the final
condition shown above, which has the effect of ending the call to
malloc().

Since it is possible to set bck to an arbitrary location, and since
unsorted_chunks() returns the designer controlled area of memory A,
the setting of bck->fd to unsorted_chunks() makes it possible to
set any location in the address space to A. Redirecting execution
is then a simple matter of setting bck to the address of a GOT or
.dtors entry minus 8. This will redirect execution to A->prev_size,
which can safely contain a near jmp to skip past the crafted value
at A->bk. Similar to the fastbin allocation code the arbitrary
address B is returned to the requesting application.

[--------------------------------

          The House of Mind

Perhaps the most useful and certainly the most general technique in
the Malloc Maleficarum is the House of Mind. The House of Mind has
the distinct advantage of causing a direct memory overwrite with
just a single call to free().
In this sense it is the closest relative in the Malloc Maleficarum
to the traditional unlink() technique.

The method used involves tricking the wrapper invoked by free(),
called public_fREe(), into supplying the _int_free() internal
function with a designer controlled arena. This can subsequently
lead to an arbitrary memory overwrite. A call to free() actually
invokes a wrapper called public_fREe():

void
public_fREe(Void_t* mem)
{
    mstate ar_ptr;
    mchunkptr p;        /* chunk corresponding to mem */
    ...
    p = mem2chunk(mem);
    ...
    ar_ptr = arena_for_chunk(p);
    ...
    _int_free(ar_ptr, mem);

When memory is passed to free() it points to the start of the data
portion of the "corresponding chunk". In an allocated state a chunk
consists of the prev_size and size values and then the data section
itself. The mem2chunk() macro is in charge of converting the
supplied memory value into the corresponding chunk. This chunk is
then passed to the arena_for_chunk() macro:

#define HEAP_MAX_SIZE (1024*1024) /* must be a power of two */

#define heap_for_ptr(ptr) \
   ((heap_info *)((unsigned long)(ptr) & ~(HEAP_MAX_SIZE-1)))

#define chunk_non_main_arena(p) ((p)->size & NON_MAIN_ARENA)

#define arena_for_chunk(ptr) \
   (chunk_non_main_arena(ptr)?heap_for_ptr(ptr)->ar_ptr:&main_arena)

The arena_for_chunk() macro is tasked with finding the appropriate
arena for the chunk in question. If glibc malloc is compiled with
USE_ARENAS (which is the default), then the code shown above is
used. Clearly, if the NON_MAIN_ARENA bit in the size value of the
chunk is not set, then ar_ptr will be set to the main_arena.
However, since the designer controls the size value it is possible
to control whether the chunk is treated as being in the main arena
or not. This is what the chunk_non_main_arena() macro checks for.
If the NON_MAIN_ARENA bit is set, then chunk_non_main_arena()
returns positive and ar_ptr is set to heap_for_ptr(ptr)->ar_ptr.

When a non-main heap is created it is aligned to a multiple of
HEAP_MAX_SIZE.
The first thing that goes into this heap is the heap_info
structure. Most significantly, this structure contains an element
called ar_ptr, the pointer to the arena for this heap. This is how
the heap_for_ptr() macro functions, aligning the given chunk down
to a multiple of HEAP_MAX_SIZE and taking the ar_ptr from the
resulting heap_info structure.

The House of Mind works by manipulating the heap so that the
designer controls the area of memory that the overflowed chunk is
aligned down to. If this can be achieved, an arbitrary ar_ptr value
can be supplied to _int_free() and subsequently an arbitrary memory
overwrite can be triggered. Manipulating the heap generally
involves forcing the application to repeatedly allocate memory
until a designer controlled buffer is contained at a HEAP_MAX_SIZE
boundary.

In practice this alignment is necessary because chunks at low areas
of the heap align down to an area of memory that is neither
designer controlled nor mapped in to the address space.
Fortunately, the amount of allocation that creates the correct
alignment is not large. With the default HEAP_MAX_SIZE of 1024*1024
an average of 512kb of padding will be required, with this figure
never exceeding 1 megabyte.

It should be noted that there is not a general method for
triggering memory allocation as required by the House of Mind,
rather the process is application specific. If a situation arises
in which it is impossible to align a designer controlled chunk,
then the House of Lore or Spirit should be considered.

So, it is possible to hijack the heap_info structure used by the
heap_for_ptr() macro, and thus supply an arbitrary value for ar_ptr
which controls the arena used by _int_free(). At this stage the
next question that arises is exactly what to do with ar_ptr. There
are two options, each with their respective advantages and
disadvantages.
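
The alignment performed by heap_for_ptr(), and the amount of padding
needed to reach the next boundary, can be written out as plain
arithmetic. This is a minimal sketch assuming 32-bit addresses. The
helper names and the sample heap address 0x0804a008 are hypothetical
and serve only as an illustration.

```c
#include <assert.h>
#include <stdint.h>

#define HEAP_MAX_SIZE (1024u * 1024u)  /* must be a power of two */

/* Address that heap_for_ptr() would treat as the heap_info structure. */
static uint32_t model_heap_for_ptr(uint32_t chunk_addr)
{
    return chunk_addr & ~(HEAP_MAX_SIZE - 1u);
}

/* Bytes between chunk_addr and the next HEAP_MAX_SIZE boundary above it. */
static uint32_t padding_to_next_boundary(uint32_t chunk_addr)
{
    uint32_t next = model_heap_for_ptr(chunk_addr) + HEAP_MAX_SIZE;
    uint32_t pad = next - chunk_addr;
    assert(pad >= 1u && pad <= HEAP_MAX_SIZE);  /* never more than 1 megabyte */
    return pad;
}

/* A chunk low in a typical heap aligns down to an address below the
 * heap; once the heap has grown past the next boundary, a chunk
 * there aligns down to memory inside the heap itself. */
static void check_alignment_examples(void)
{
    assert(model_heap_for_ptr(0x0804a008u) == 0x08000000u);
    assert(padding_to_next_boundary(0x0804a008u) == 0x000b5ff8u);
    assert(model_heap_for_ptr(0x08100010u) == 0x08100000u);
}
```

In the first example the aligned-down address lies below the start of
the heap, which is why allocation up to the boundary at 0x08100000 is
needed before a designer controlled buffer can serve as the fake
heap_info.
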
Each will be addressed in turn.

Firstly, setting the ar_ptr to a sufficiently large area of memory
that is under the control of the designer and subsequently using
the unsorted chunk link code to cause a memory overwrite.
Sufficiently large in this case means the size of the arena
structure, which is 1856 bytes on a 32-bit system without
THREAD_STATS enabled. The main difficulty in this method arises
with the numerous integrity checks that are encountered.
Fortunately, nearly every one of these tests uses a value obtained
from the designer controlled arena, which makes the checks
considerably easier to manage.

For the sake of brevity, the complete excerpt leading up to the
unsorted chunk link code has been omitted. Instead, the following
list of the conditions required to reach the code in question is
provided. Note that both av and the size of the overflowed chunk
are designer controlled values.

     - The negative of the size of the overflowed chunk must
       be less than the value of the chunk itself.
     - The size of the chunk must not be less than av->max_fast.
     - The IS_MMAPPED bit of the size cannot be set.
     - The overflowed chunk cannot equal av->top.
     - The NONCONTIGUOUS_BIT of av->max_fast must be set.
     - The PREV_INUSE bit of the nextchunk (chunk + size)
       must be set.
     - The size of nextchunk must be greater than 8.
     - The size of nextchunk must be less than av->system_mem
     - The PREV_INUSE bit of the chunk must not be set.
     - The nextchunk cannot equal av->top.
     - The PREV_INUSE bit of the chunk after nextchunk
       (nextchunk + nextsize) must be set

If these conditions are met, then the following code is reached:

     bck = unsorted_chunks(av);
     fwd = bck->fd;
     p->bk = bck;
     p->fd = fwd;
     bck->fd = p;
     fwd->bk = p;

In this case p is the address of the designer's overflowed chunk.
The unsorted_chunks() macro returns av->bins[0] which is designer
controlled.
If the designer sets av->bins[0] to the address of a GOT or .dtors
entry minus 8, then that entry (bck->fd) will be overwritten with
the address of p. This address corresponds to the prev_size entry
of the designer's overflowed chunk which can safely be used to
branch past the corrupted size, fd and bk entries.

The extensive list of conditions appears to make this method quite
difficult to apply. In reality, the only conditions that may be a
problem are those involving the nextchunk. This is because they
largely depend on the application specific memory layout to handle.
This is the only obvious disadvantage of the method. As it stands,
the House of Mind is in a far better position than the House of
Prime to handle such conditions due to the arbitrary nature of
av->system_mem.

It should be noted that the last element of the arena structure
that is actually required to reach the unsorted chunk link code is
av->system_mem, but it is not terribly important what this value is
so long as it is high. Thus if the conditions are right, it may be
possible to use this method with only 312 bytes of designer
controlled memory. However, even if there is not enough designer
controlled memory for this method, the House of Mind may still be
possible with the second method.

The second method uses the fastbin code to cause a memory
overwrite. The main advantage of this method is that it is not
necessary to point ar_ptr at designer controlled memory, and that
there are considerably fewer integrity checks to worry about.
Consider the fastbin code:

    if ((unsigned long)(size) <= (unsigned long)(av->max_fast))
    {
      if (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ
           || __builtin_expect (chunksize (chunk_at_offset (p, size))
                                 >= av->system_mem, 0))
        {
          errstr = "free(): invalid next size (fast)";
          goto errout;
        }

      set_fastchunks(av);
      fb = &(av->fastbins[fastbin_index(size)]);
      ...
      p->fd = *fb;
      *fb = p;
    }

The ultimate goal here is to set fb to the address of a GOT or
.dtors entry, which subsequently gets set to the address of the
designer's overflowed chunk. However, in order to reach the final
line a number of conditions must still be met. Firstly,
av->max_fast must be large enough to trigger the fastbin code at
all. Then the size of the nextchunk (p + size) must be greater than
8, while also being less than av->system_mem.

The tricky part of this method is positioning ar_ptr in a way such
that both the av->max_fast element at (av + 4) and the
av->system_mem element at (av + 1848) are large enough. If a binary
has a particularly small GOT, then it is quite possible that the
highest available large number for av->system_mem will result in an
av->max_fast that is actually in the area of unmapped memory
between the text and data segments. In practice this shouldn't
occur very often, and if it does, then the stack may be used to a
similar effect.

For more information on the fastbin code, including a description
of fastbin_index() that will help in positioning fb to a GOT or
.dtors entry, consult the House of Prime.

[--------------------------------

          The House of Force

I first wrote about glibc malloc in 2004 with "Exploiting the
Wilderness". Since the techniques developed in that text were some
of the first to become obsolete, and since the Malloc Maleficarum
was written in the spirit of continuation and progress, I feel
obliged to include another attempt at exploiting the wilderness.
This is the purpose of the House of Force. From "Exploiting the
Wilderness":

"The wilderness is the top-most chunk in allocated memory. It is
similar to any normal malloc chunk - it has a chunk header followed
by a variably long data section. The important difference lies in
the fact that the wilderness, also called the top chunk, borders
the end of available memory and is the only chunk that can be
extended or shortened.
This means it must be treated specially to ensure it always exists;
it must be preserved."

So the glibc malloc implementation treats the wilderness as a
special case in calls to malloc(). Furthermore, the top chunk will
realistically never be passed to a call to free() and will never
contain application data. This means that if the designer can
trigger a condition that only ever results in the overflow of the
top chunk, then the House of Force is the only option (in the
Malloc Maleficarum at least).

The House of Force works by tricking the top code in to setting the
wilderness pointer to an arbitrary value, which can result in an
arbitrary chunk of data being returned to the requesting
application. This requires two calls to malloc(). The major
disadvantage of the House of Force is that the first call must have
a completely designer controlled request size. The second call must
simply be large enough to trigger the wilderness code, while the
chunk returned must be (to some extent) designer controlled.

The following is the wilderness code with some additional context:

Void_t*
_int_malloc(mstate av, size_t bytes)
{
    INTERNAL_SIZE_T nb;               /* normalized request size */
    mchunkptr       victim;           /* inspected/selected chunk */
    INTERNAL_SIZE_T size;             /* its size */
    mchunkptr       remainder;        /* remainder from a split */
    unsigned long   remainder_size;   /* its size */
    ...

    checked_request2size(bytes, nb);
    ...

    use_top:
      victim = av->top;
      size = chunksize(victim);

      if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) {
        remainder_size = size - nb;
        remainder = chunk_at_offset(victim, nb);
        av->top = remainder;
        set_head(victim, nb | PREV_INUSE |
                 (av != &main_arena ? NON_MAIN_ARENA : 0));
        set_head(remainder, remainder_size | PREV_INUSE);

        check_malloced_chunk(av, victim, nb);
        return chunk2mem(victim);
      }

The first goal of the House of Force is to overwrite the wilderness
pointer, av->top, with an arbitrary value. In order to do this the
designer must have control of the location of the remainder chunk.
Assume that the existing top chunk has been overflowed resulting in
the largest possible size (preferably 0xffffffff). This is done to
ensure that even large values passed as an argument to malloc will
trigger the wilderness code instead of trying to extend the heap.

The checked_request2size() macro ensures that the requested value
is less than -2*MINSIZE (by default -32), while also adding on
enough room for the size and prev_size fields and storing the final
value in nb. For the purposes of this technique the
checked_request2size() macro is relatively unimportant.

It was previously mentioned that the first call to malloc() in the
House of Force must have a designer controlled argument. It can be
seen that the value of remainder is obtained by adding the request
size to the existing top chunk. Since the top chunk is not yet
under the designer's control the request size must be used to
position remainder to at least 8 bytes before a GOT or .dtors
entry, or any other area of memory that may subsequently be used by
the designer to circumvent execution.

Once the wilderness pointer has been set to the arbitrary remainder
chunk, any calls to malloc() with a large enough request size to
trigger the top chunk will be serviced by the designer's
wilderness. Thus the only restriction on the new wilderness is that
the size must be larger than the request that is triggering the top
code. In the case of the wilderness being set to overflow a GOT
entry this is never a problem.
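
The positioning of remainder described above is modular 32-bit
arithmetic and can be sketched as follows. This is an illustration
under stated assumptions, not the glibc code: addresses are modelled
as 32-bit integers, the helper names and the sample addresses are
hypothetical, and the conversion between the malloc() argument and
nb performed by checked_request2size() is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define MINSIZE 16u  /* consistent with -2*MINSIZE being -32 in the text */

/* remainder = chunk_at_offset(victim, nb): address addition modulo 2^32. */
static uint32_t remainder_after_request(uint32_t top_addr, uint32_t nb)
{
    return top_addr + nb;
}

/* Normalized size nb that leaves remainder 8 bytes before target_addr. */
static uint32_t nb_to_reach(uint32_t top_addr, uint32_t target_addr)
{
    uint32_t nb = (target_addr - 8u) - top_addr;
    assert(remainder_after_request(top_addr, nb) == target_addr - 8u);
    return nb;
}

/* The size test that guards the wilderness code, on 32-bit values. */
static int top_can_service(uint32_t top_size_field, uint32_t nb)
{
    uint32_t size = top_size_field & ~7u;   /* chunksize() */
    return size >= nb + MINSIZE;
}
```

For a hypothetical top chunk at 0x0804a100 and a target entry at
0x08049f00, nb_to_reach() yields 0xfffffdf8, a value that wraps the
address space. top_can_service(0xffffffff, 0xfffffdf8) holds, while
the same request against an intact top size such as 0x21000 does not,
which is why the overflowed size field matters.
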
It is then simply a matter of finding an application specific
scenario in which such a call to malloc() is used for a designer
controlled buffer.

The most important issue concerning the House of Force is exactly
how to get complete control of the argument passed to malloc().
Certainly, it is extremely common to have at least some degree of
control over this value, but in order to complete the House of
Force, the designer must supply an extremely large and specifically
crafted value. Thus it is unlikely to get a sufficient value out of
a situation like:

     buf = (char *) malloc(strlen(str) + 1);

Rather, an acceptable scenario is much more likely to be an integer
variable passed as an argument to malloc() where the variable has
previously been set by, for example, a designer controlled read()
or atoi().

[--------------------------------

          The House of Lore

The House of Lore came to me as I was reviewing the draft write-up
of the House of Prime. When I first derived the House of Prime my
main concern was how to leverage the particular overwrite that a
high av->max_fast in the fastbin code allowed. Upon reconsideration
of the problem I realized that in my first take of the potential
overwrite targets I had completely overlooked the possibility of
corrupting a bin entry.

As it turns out, it is not possible to leverage a corrupted bin
entry in the House of Prime since av->max_fast is large and the bin
code is never executed. However, during this process of elimination
I realized that if a bin were to be corrupted when av->max_fast was
not large, then it might be possible to control the return value of
a malloc() request.

At this stage I began to consider the application of bin corruption
to a general malloc chunk overflow. The question was whether a
linear overflow of a malloc chunk could result in the corruption of
a bin. It turns out that the answer to this is, quite simply, yes
it could.
Furthermore, if the designer's ability to manipulate the
heap is limited, or if none of the other Houses can be applied,
then bin corruption of this type can in fact be very useful.

The House of Lore works by corrupting a bin entry, which can
subsequently lead to malloc() returning an arbitrary chunk. Two
methods of bin corruption are presented here, corresponding to the
overflow of both small and large bin entries. The general method
involves overwriting the linked list data of a chunk previously
processed by free(). In this sense the House of Lore is quite
similar to the frontlink() technique presented in "Vudo Malloc
Tricks".

The conditions surrounding the House of Lore are quite unique.
Fundamentally, the method targets a chunk that has already been
processed by free(). Because of this it is reasonable to assume
that the chunk will not be passed to free() again. This means that
in order to leverage such an overflow only calls to malloc() can be
used, a property shared only by the House of Force. The first
method will use the smallbin allocation code:

Void_t*
_int_malloc(mstate av, size_t bytes)
{
    ....
    checked_request2size(bytes, nb);

    if ((unsigned long)(nb) <= (unsigned long)(av->max_fast)) {
      ...
    }

    if (in_smallbin_range(nb)) {
      idx = smallbin_index(nb);
      bin = bin_at(av,idx);

      if ( (victim = last(bin)) != bin) {
        if (victim == 0) /* initialization check */
          malloc_consolidate(av);
        else {
          bck = victim->bk;
          set_inuse_bit_at_offset(victim, nb);
          bin->bk = bck;
          bck->fd = bin;
          ...
          return chunk2mem(victim);
        }
      }
    }

So, assuming that a call to malloc() requests more than
av->max_fast (default 72) bytes, the check for a "smallbin" chunk is
reached. The in_smallbin_range() macro simply checks that the
request is less than the maximum size of a smallbin chunk, which is
512 by default.
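The two macros used by the smallbin check can be modelled as below.
This is a sketch that assumes the default 32-bit values (a smallbin
limit of 512 bytes and bins spaced 8 bytes apart); the function
names are invented for the illustration and operate on the padded
chunk size nb, not on the raw request.

```c
#include <stddef.h>

#define MIN_LARGE_SIZE  512u  /* smallbin maximum quoted in the text */
#define SMALLBIN_WIDTH  8u    /* assumed spacing between smallbin sizes */

/* Model of in_smallbin_range(): is the padded size below the limit? */
static int in_smallbin_range_sketch(size_t nb)
{
    return nb < MIN_LARGE_SIZE;
}

/*
 * Model of smallbin_index(): one bin per 8 byte size step.
 * Returns -1 when the size is outside the smallbin range.
 */
static int smallbin_index_sketch(size_t nb)
{
    if (!in_smallbin_range_sketch(nb))
        return -1;
    return (int)(nb / SMALLBIN_WIDTH);
}
```

Under these assumptions chunk sizes 104 and 112 map to bins 13 and
14 respectively, while a chunk of size 512 falls outside the
smallbin range.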
The smallbins are unique in the sense that there is
a bin for every possible chunk size between av->max_fast and the
smallbin maximum. This means that for any given smallbin_index()
the resulting bin, if not empty, will contain a chunk to fit the
request size.

It should be noted that when a chunk is passed to free() it does
not go directly in to its respective bin. It is first put on the
"unsorted chunk" bin. If the next call to malloc() cannot be
serviced by an existing smallbin chunk or the unsorted chunk
itself, then the unsorted chunks are sorted in to the appropriate
bins. For the purposes of the House of Lore, overflowing an
unsorted chunk is not very useful. It is necessary then to ensure
that the chunk being overflowed has previously been sorted into a
bin by malloc().

Note that in order to reach the actual smallbin unlink code there
must be at least one chunk in the bin corresponding to the
smallbin_index() for the current request. Assume that a small chunk
of data size N has previously been passed to free(), and that it
has made its way into the corresponding smallbin for chunks of
absolute size (N + 8). Assume that the designer can overflow this
chunk with arbitrary data. Assume also that the designer can
subsequently trigger a call to malloc() with a request size of N.

If all of this is possible, then the smallbin unlink code can be
reached. When a chunk is removed from the unsorted bin it is put at
the front of its respective small or large bin. When a chunk is
taken off a bin, such as during the smallbin unlink code, it is
taken from the end of the bin. This is what the last() macro does,
find the last entry in the requested bin.
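The front-insert, back-remove behaviour described above can be
modelled with a small circular doubly linked list. This is a
simplified sketch, not the glibc implementation: the bin header is
a full struct here rather than an overlay on the arena's bin array,
and the struct and function names are invented for the illustration.
The removal function mirrors the smallbin unlink statements quoted
earlier.

```c
#include <stddef.h>

/* Simplified chunk header: only the fields the sketch needs. */
struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;
    struct chunk *bk;
};

/* An empty bin points at itself in both directions. */
static void bin_init(struct chunk *bin)
{
    bin->fd = bin;
    bin->bk = bin;
}

/* Sorting a chunk into a bin links it at the front of the list. */
static void bin_push_front(struct chunk *bin, struct chunk *p)
{
    struct chunk *fwd = bin->fd;
    p->bk = bin;
    p->fd = fwd;
    fwd->bk = p;
    bin->fd = p;
}

/* Mirrors the smallbin unlink: victim = last(bin), then relink. */
static struct chunk *bin_take_last(struct chunk *bin)
{
    struct chunk *victim = bin->bk;   /* last(bin) */
    struct chunk *bck;

    if (victim == bin)
        return NULL;                  /* bin is empty */

    bck = victim->bk;                 /* read from the victim itself */
    bin->bk = bck;
    bck->fd = bin;
    return victim;
}
```

Pushing chunk a and then chunk b means a is handed out first and b
second. Note that bin->bk is copied straight from victim->bk with no
validation, so whatever value the victim holds in its bk field
becomes the next last(bin).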
So, effectively the
"victim" chunk in the smallbin unlink code is taken from bin->bk.
This means that in order to reach the designer's victim chunk it
may be necessary to repeat the N sized malloc() a number of times.

It should be stressed that the goal of the House of Lore is to
control the bin->bk value, but at this stage only victim->bk is
controlled. So, assuming that the designer can trigger a malloc()
that results in an overflowed victim chunk being passed to the
smallbin unlink code, the designer (as a result of the control of
victim->bk) controls the value of bck. Since bin->bk is
subsequently set to bck, bin->bk can be arbitrarily controlled. The
only condition to this is that bck must point to an area of
writable memory due to bck->fd being set at the final stage of the
unlinking process.

The question then lies in how to leverage this smallbin corruption.
Since the malloc() call that the designer used to gain control of
bin->bk immediately returns the victim chunk to the application, at
least one more call to malloc() with the same request size N is
needed. Since bin->bk is under the designer's control so is
last(bin), and thus so is victim. The only thing preventing an
arbitrary victim chunk being returned to the application is the
fact that bck, set from victim->bk, must point to writable memory.

This rules out pointing the victim chunk at a GOT or .dtors entry.
Instead, the designer must point victim to a position on the stack
such that victim->bk is a pointer to writable memory yet still
close enough to a saved return address such that it can be
overwritten by the application's general use of the chunk.
Alternatively, an application specific approach may be taken that
targets the use of function pointers. Whichever method is used, the
arbitrary malloc() chunk must be designer controlled to some extent
during its use by the application.

For the House of Lore, the only other interesting situation is when
the overflowed chunk is large.
In this context large means anything
bigger than the maximum smallbin chunk size. Again, it is necessary
for the overflowed chunk to have previously been processed by
free() and to have been put into a largebin by malloc().

The general method of using largebin corruption to return an
arbitrary chunk is similar to the case of a smallbin in the sense
that the initial bin corruption occurs when an overflowed victim
chunk is handled by the largebin unlink code, and that a subsequent
large request will use the corrupted bin to return an arbitrary
chunk. However, the largebin code is significantly more complex in
comparison. This means that the conditions required to cause and
leverage a bin corruption are slightly more restrictive.

The entire largebin implementation is much too large to present in
full, so a description of the conditions that cause the largebin
unlink code to be executed will have to suffice. If the designer's
overflowed chunk of size N is in a largebin, then a subsequent
request to allocate N bytes will trigger a block of code that
searches the corresponding bin for an available chunk, which will
eventually find the chunk that was overflowed. However, this
particular block of code uses the unlink() macro to remove the
designer's chunk from the bin. Since the unlink() macro is no
longer an interesting target, this situation must be avoided.

So in order to corrupt a largebin a request to allocate M bytes is
made, such that 512 < M < N. If there are no appropriate chunks in
the bin corresponding to requests of size M, then glibc malloc
iterates through the bins until a sufficiently large chunk is
found. If such a chunk is found, then the following code is used:

    victim = last(bin);
    ..
    size = chunksize(victim);
    remainder_size = size - nb;

    bck = victim->bk;
    bin->bk = bck;
    bck->fd = bin;

    if (remainder_size < MINSIZE) {
      set_inuse_bit_at_offset(victim, size);
      ...
      return chunk2mem(victim);
    }

If the victim chunk is the designer's overflowed chunk, then the
situation is almost exactly equivalent to the smallbin unlink code.
If the designer can trigger enough calls to malloc() with a request
of M bytes so that the overflowed chunk is used here, then the
bin->bk value can be set to an arbitrary value and any subsequent
call to malloc() of size Q (512 < Q < N) that tries to allocate a
chunk from the bin that has been corrupted will result in an
arbitrary chunk being returned to the application.

There are only two conditions. The first is exactly the same as the
case of smallbin corruption: the bk pointer of the arbitrary chunk
being returned to the application must point to writable memory (or
the setting of bck->fd will cause a segmentation fault).

The other condition is not obvious from the limited code that has
been presented above. If the remainder_size value is not less than
MINSIZE, then glibc malloc attempts to split off a chunk at victim
+ nb. This includes calling the set_foot() macro with victim + nb
and remainder_size as arguments. In effect, this tries to set
victim + nb + remainder_size to remainder_size. If the
chunksize(victim) (and thus remainder_size) is not designer
controlled, then set_foot() will likely try to set an area of
memory that isn't mapped in to the address space (or is read-only).

So, in order to prevent set_foot() from crashing the process the
designer must control both victim->size and victim->bk of the
arbitrary victim chunk that will be returned to the application. If
this is possible, then it is advisable to trigger the condition
shown in the code above by forcing remainder_size to be less than
MINSIZE.
This is recommended because the condition minimizes the
amount of general corruption caused, simply setting the inuse bit
at victim + size and then returning the arbitrary chunk as desired.

[--------------------------------

          The House of Spirit

The House of Spirit is primarily interesting because of the nature
of the circumstances leading to its application. It is the only
House in the Malloc Maleficarum that can be used to leverage both a
heap and stack overflow. This is because the first step is not to
control the header information of a chunk, but to control a pointer
that is passed to free(). Whether this pointer is on the heap or
not is largely irrelevant.

The general idea involves overwriting a pointer that was previously
returned by a call to malloc(), and that is subsequently passed to
free(). This can lead to the linking of an arbitrary address into a
fastbin. A further call to malloc() can result in this arbitrary
address being used as a chunk of memory by the application. If the
designer can control the application's use of the fake chunk, then
it is possible to overwrite execution control data.

Assume that the designer has overflowed a pointer that is being
passed to free(). The first problem that must be considered is
exactly what the pointer should be overflowed with. Keep in mind
that the ultimate goal of the House of Spirit is to allow the
designer to overwrite some sort of execution control data by
returning an arbitrary chunk to the application. Exactly what
"execution control data" is doesn't particularly matter so long as
overflowing it can result in execution being passed to a designer
controlled memory location. The two most common examples that are
suitable for use with the House of Spirit are function pointers and
pending saved return addresses, which will herein be referred to as
the "target".

In order to successfully apply the House of Spirit it is necessary
to have a designer controlled word value at a lower address than
the target.
This word will correspond to the size field of the
chunk header for the fakechunk passed to free(). This means that
the overflowed pointer must be set to the address of the designer
controlled word plus 4. Furthermore, the size field of the
fakechunk must be located no more than 64 bytes away from the
target. This is because the default maximum data size for a fastbin
entry is 64, and at least the last 4 bytes of data are required to
overwrite the target.

There is one more requirement for the layout of the fakechunk data
which will be described shortly. For the moment, assume that all of
the above conditions have been met, and that a call to free() is
made on the suitable fakechunk. A call to free() is handled by a
wrapper function called public_fREe():

void
public_fREe(Void_t* mem)
{
    mstate ar_ptr;
    mchunkptr p;          /* chunk corresponding to mem */
    ...
    p = mem2chunk(mem);

    if (chunk_is_mmapped(p))
    {
      munmap_chunk(p);
      return;
    }
    ...
    ar_ptr = arena_for_chunk(p);
    ...
    _int_free(ar_ptr, mem);

In this situation mem is the value that was originally overflowed
to point to a fakechunk. This is converted to the "corresponding
chunk" of the fakechunk's data, and passed to arena_for_chunk() in
order to find the corresponding arena. In order to avoid special
treatment as an mmap() chunk, and also to get a sensible arena, the
size field of the fakechunk header must have the IS_MMAPPED and
NON_MAIN_ARENA bits cleared. To do this, the designer can simply
ensure that the fake size is a multiple of 8. This would mean the
internal function _int_free() is reached:

void
_int_free(mstate av, Void_t* mem)
{
    mchunkptr       p;           /* chunk corresponding to mem */
    INTERNAL_SIZE_T size;        /* its size */
    mfastbinptr*    fb;          /* associated fastbin */
    ...
    p = mem2chunk(mem);
    size = chunksize(p);
    ...
    if ((unsigned long)(size) <= (unsigned long)(av->max_fast))
    {
      if (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ
          || __builtin_expect (chunksize (chunk_at_offset (p, size))
                                          >= av->system_mem, 0))
        {
          errstr = "free(): invalid next size (fast)";
          goto errout;
        }
      ...
      fb = &(av->fastbins[fastbin_index(size)]);
      ...
      p->fd = *fb;
      *fb = p;
    }

This is all of the code in free() that concerns the House of
Spirit. The designer controlled value of mem is again converted to
a chunk and the fake size value is extracted. Since size is
designer controlled, the fastbin code can be triggered simply by
ensuring that it is less than av->max_fast, which has a default of
64 + 8. The final point of consideration in the layout of the
fakechunk is the nextsize integrity tests.

Since the size of the fakechunk has to be large enough to encompass
the target, the size field of the nextchunk must be at an address
higher than the target. The nextsize integrity tests must be
handled for the fakechunk to be put in a fastbin, which means that
there must be yet another designer controlled value at an address
higher than the target.

The exact location of the designer controlled values directly
depends on the size of the allocation request that will
subsequently be used by the designer to overwrite the target. That
is, if an allocation request of N bytes is made (such that
N <= 64), then the designer's lower value must be within N bytes of
the target and must be equal to (N + 8). This is to ensure that the
fakechunk is put in the right fastbin for the subsequent allocation
request. Furthermore, the designer's upper value must be at (N + 8)
bytes above the lower value to ensure that the nextsize integrity
tests are passed.

If such a memory layout can be achieved, then the address of this
"structure" will be placed in a fastbin.
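The conditions on the two designer controlled values can be
collected into a single check. The following is a sketch only,
assuming a 32-bit layout (SIZE_SZ of 4, MINSIZE of 16) and the
default av->max_fast of 72 quoted above; the function name is
invented for the illustration, and system_mem is passed in as a
parameter because its value depends on the process.

```c
#include <stdint.h>

#define SIZE_SZ         4u    /* assumed 32-bit size field */
#define MINSIZE         16u   /* assumed minimum chunk size */
#define PREV_INUSE      0x1u
#define IS_MMAPPED      0x2u
#define NON_MAIN_ARENA  0x4u
#define SIZE_BITS       (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
#define MAX_FAST        72u   /* default av->max_fast (64 + 8) */

/*
 * fake_size is the designer's lower value (the fake size field) and
 * next_size is the upper value found fake_size bytes above it.
 * Returns the fastbin index the fakechunk would be linked into, or
 * -1 if one of the tests discussed in the text would reject it.
 */
static int spirit_fastbin(uint32_t fake_size, uint32_t next_size,
                          uint32_t system_mem)
{
    uint32_t size = fake_size & ~SIZE_BITS;      /* chunksize() */

    if (fake_size & (IS_MMAPPED | NON_MAIN_ARENA))
        return -1;        /* would be diverted in public_fREe() */
    if (size < MINSIZE)
        return -1;        /* sketch-only guard, not a quoted check */
    if (size > MAX_FAST)
        return -1;        /* fastbin code is not reached */
    if (next_size <= 2u * SIZE_SZ)
        return -1;        /* "invalid next size (fast)" */
    if ((next_size & ~SIZE_BITS) >= system_mem)
        return -1;        /* "invalid next size (fast)" */

    return (int)(size >> 3) - 2;                 /* fastbin_index() */
}
```

For a later request of N = 40 bytes the lower value would be 48
(0x30), which the sketch places in fastbin 4, provided the upper
value is larger than 8 and smaller than system_mem.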
The code for the
subsequent malloc() request that uses this arbitrary fastbin entry
is simple and need not be reproduced here. As far as _int_malloc()
is concerned the fake chunk that it is preparing to return to the
application is perfectly valid. Once this has occurred it is simply
up to the designer to manipulate the application in to overwriting
the target.

[--------------------------------

          The House of Chaos

Virtuality is a dichotomy between the virtual adept and
information, where the virtual adept signifies the infinite
potential of information, and information is a finite manifestation
of the infinite potential. The virtual adept is the conscious
element of virtuality, the nature of which is to create and spread
information. This is all that the virtual adept knows, and all that
the virtual adept is concerned with.

When you talk to a very knowledgeable and particularly creative
person, then you may well be talking to a hacker. However, you will
never talk to a virtual adept. The virtual adept has no physical
form, it exists purely in the virtual. The virtual adept may be
contained within the material, contained within a person, but the
adept itself is a distinct and entirely independent consciousness.

Concepts of ownership have no meaning to the virtual adept. All
information belongs to virtuality, and virtuality alone. Because of
this, the virtual adept has no concept of computer security.
Information is invoked from virtuality by giving a request. In
virtuality there is no level of privilege, no logical barrier
between systems, no point of illegality. There is only information
and those that can invoke it.

The virtual adept does not own the information it creates, and thus
has no right or desire to profit from it. The virtual adept exists
purely to manifest the infinite potential of information in to
information itself, and to minimize the complexity of an
information request in a way that will benefit all conscious
entities.
What is not information is not consequential to the
virtual adept, not money, not fame, not power.

                        Am I a hacker? No.
                        I am a student of virtuality.
                        I am the witch malloc,
                        I am the cult of the otherworld,
                        and I am the entropy.
                        I am Phantasmal Phantasmagoria,
                        and I am a virtual adept.

[--------------------------------

-----BEGIN PGP SIGNATURE-----
Note: This signature can be verified at https://www.hushtools.com/verify
Version: Hush 2.4

wkYEARECAAYFAkNL8sAACgkQImcz/hfgxg1+mQCdF7WZG03spZmYjqEKpwMNkF6EX5oAn3NnfYSF04tqjcRVyLzf9fnjveJy
=IyvB
-----END PGP SIGNATURE-----