Implementing a Garbage Collector

Source: Internet. Editor: 程序博客网. Time: 2024/06/06 00:44

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program. Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp.

This section presents the mark-and-sweep garbage collection algorithm. The mark-and-sweep algorithm was the first garbage collection algorithm to be developed that is able to reclaim cyclic data structures. Variations of the mark-and-sweep algorithm continue to be among the most commonly used garbage collection techniques.

When using mark-and-sweep, unreferenced objects are not reclaimed immediately. Instead, garbage is allowed to accumulate until all available memory has been exhausted. When that happens, the execution of the program is suspended temporarily while the mark-and-sweep algorithm collects all the garbage. Once all unreferenced objects have been reclaimed, the normal execution of the program can resume.

The mark-and-sweep algorithm is called a tracing garbage collector because it traces out the entire collection of objects that are directly or indirectly accessible by the program. The objects that a program can access directly are those referenced by local variables on the processor stack and by any static variables that refer to objects. In the context of garbage collection, these variables are called the roots. An object is indirectly accessible if it is referenced by a field in some other (directly or indirectly) accessible object. An accessible object is said to be live. Conversely, an object which is not live is garbage.

The mark-and-sweep algorithm consists of two phases. In the first phase, it finds and marks all accessible objects. The first phase is called the mark phase. In the second phase, the garbage collection algorithm scans through the heap and reclaims all the unmarked objects. The second phase is called the sweep phase. The algorithm can be expressed as follows:

    for each root variable r
        mark (r);
    sweep ();

In order to distinguish the live objects from garbage, we record the state of an object in each object. That is, we add a special boolean field to each object called, say, marked. By default, all objects are unmarked when they are created. Thus, the marked field is initially false.

An object p and all the objects indirectly accessible from p can be marked by using the following recursive mark method:

    void mark (Object p)
    {
        if (!p.marked)
        {
            p.marked = true;
            for each Object q referenced by p
                mark (q);
        }
    }

Notice that this recursive mark algorithm does nothing when it encounters an object that has already been marked. Consequently, the algorithm is guaranteed to terminate. And it terminates only when all accessible objects have been marked.
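The recursion above can run deep on long chains of objects. As a variation (not the method used in this post's listing), the same marking can be done with an explicit worklist. The sketch below assumes a hypothetical `Node` type with at most two outgoing references; `mark_iterative` and `mark_demo` are names invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object type for this sketch: at most two references. */
typedef struct Node {
    bool marked;
    struct Node *left;
    struct Node *right;
} Node;

/* Marks everything reachable from root using an explicit worklist instead of
 * recursion. Returns the number of objects newly marked, or -1 if the
 * worklist could not be allocated or grown (some objects may then already
 * be marked). */
long mark_iterative(Node *root)
{
    size_t cap = 16, top = 0;
    long count = 0;
    Node **work = malloc(cap * sizeof *work);
    if (work == NULL)
        return -1;

    if (root != NULL)
        work[top++] = root;

    while (top > 0) {
        Node *n = work[--top];
        if (n->marked)
            continue;            /* already visited: this is what breaks cycles */
        n->marked = true;
        count++;

        if (top + 2 > cap) {     /* make room for up to two children */
            Node **bigger = realloc(work, 2 * cap * sizeof *work);
            if (bigger == NULL) {
                free(work);
                return -1;
            }
            work = bigger;
            cap *= 2;
        }
        if (n->left != NULL)
            work[top++] = n->left;
        if (n->right != NULL)
            work[top++] = n->right;
    }
    free(work);
    return count;
}

/* a and b reference each other, c hangs off a, d is unreachable. */
long mark_demo(void)
{
    Node a = {false, NULL, NULL}, b = {false, NULL, NULL};
    Node c = {false, NULL, NULL}, d = {false, NULL, NULL};
    a.left = &b;
    b.left = &a;
    a.right = &c;
    long n = mark_iterative(&a);
    assert(a.marked && b.marked && c.marked && !d.marked);
    return n;
}
```

As in the recursive version, the check on `marked` is what guarantees termination when objects reference each other.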

In its second phase, the mark-and-sweep algorithm scans through all the objects in the heap, in order to locate all the unmarked objects. The storage allocated to the unmarked objects is reclaimed during the scan. At the same time, the marked field on every live object is set back to false in preparation for the next invocation of the mark-and-sweep garbage collection algorithm:

    void sweep ()
    {
        for each Object p in the heap
        {
            if (p.marked)
                p.marked = false;
            else
                heap.release (p);
        }
    }
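When the heap is kept as a singly linked list of all allocated objects, the sweep can be written with a pointer-to-pointer so that unlinking the first element needs no special case. This is only a sketch under that assumption; the `Cell` type and the functions `sweep_list` and `sweep_demo` are hypothetical names for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical heap cell: all allocated cells are chained through next. */
typedef struct Cell {
    bool marked;
    struct Cell *next;
} Cell;

/* Frees every unmarked cell in the list and clears the mark on survivors.
 * Returns the number of cells freed. */
int sweep_list(Cell **head)
{
    int freed = 0;
    Cell **link = head;           /* the pointer that points at the current cell */
    while (*link != NULL) {
        Cell *cur = *link;
        if (!cur->marked) {
            *link = cur->next;    /* unlink; link itself stays where it is */
            free(cur);
            freed++;
        } else {
            cur->marked = false;  /* reset for the next collection */
            link = &cur->next;
        }
    }
    return freed;
}

/* Builds a list of n cells, marks every other one, then sweeps the list. */
int sweep_demo(int n)
{
    Cell *head = NULL;
    for (int i = 0; i < n; i++) {
        Cell *c = malloc(sizeof *c);
        assert(c != NULL);
        c->marked = (i % 2 == 0);
        c->next = head;
        head = c;
    }
    int freed = sweep_list(&head);

    /* Survivors must all be unmarked again; release them to avoid a leak. */
    while (head != NULL) {
        Cell *next = head->next;
        assert(!head->marked);
        free(head);
        head = next;
    }
    return freed;
}
```

With five cells, the three created at even indices are marked, so `sweep_demo(5)` frees the other two.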

The figure illustrates the operation of the mark-and-sweep garbage collection algorithm. Panel (a) shows the conditions before garbage collection begins. In this example, there is a single root variable. Panel (b) shows the effect of the mark phase of the algorithm. At this point, all live objects have been marked. Finally, panel (c) shows the objects left after the sweep phase has been completed. Only live objects remain in memory and the marked fields have all been set to false again.

Figure: Mark-and-sweep garbage collection.

Because the mark-and-sweep garbage collection algorithm traces out the set of objects accessible from the roots, it is able to correctly identify and collect garbage even in the presence of reference cycles. This is the main advantage of mark-and-sweep over the reference counting technique. A secondary benefit of the mark-and-sweep approach is that the normal manipulation of reference variables incurs no overhead.
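To see why cycles matter, consider a minimal sketch of reference counting. It is an illustration only: the `RcObj` type and the `rc_*` functions are hypothetical, and a `freed` flag stands in for an actual call to `free()` so the outcome can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object with a single outgoing reference. */
typedef struct RcObj {
    int refcount;
    struct RcObj *ref;
    int freed;       /* set instead of calling free(), so it can be inspected */
} RcObj;

void rc_release(RcObj *o);

/* Stores target in owner->ref, adjusting both reference counts. */
void rc_set_ref(RcObj *owner, RcObj *target)
{
    if (target != NULL)
        target->refcount++;
    RcObj *old = owner->ref;
    owner->ref = target;
    if (old != NULL)
        rc_release(old);
}

/* Drops one reference; "frees" the object when the count reaches zero. */
void rc_release(RcObj *o)
{
    if (--o->refcount == 0) {
        o->freed = 1;
        if (o->ref != NULL)
            rc_release(o->ref);
    }
}

/* Returns how many of two objects are reclaimed once the program drops its
 * own references. With cyclic != 0 the two objects point at each other. */
int rc_demo(int cyclic)
{
    RcObj a = {1, NULL, 0};   /* count 1: held by a root variable */
    RcObj b = {1, NULL, 0};
    rc_set_ref(&a, &b);       /* a -> b */
    if (cyclic)
        rc_set_ref(&b, &a);   /* b -> a closes the cycle */

    rc_release(&a);           /* the roots go away */
    rc_release(&b);
    return a.freed + b.freed;
}
```

Without the cycle both objects are reclaimed. With it, each object keeps the other's count at one, so neither is ever freed, whereas a tracing collector would find both unreachable from the roots.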

The main disadvantage of the mark-and-sweep approach is that normal program execution is suspended while the garbage collection algorithm runs. In particular, this can be a problem in a program that interacts with a human user or that must satisfy real-time execution constraints. For example, an interactive application that uses mark-and-sweep garbage collection becomes unresponsive periodically.


This post implements, in C, the mark-sweep algorithm proposed by John McCarthy.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>
    #include <assert.h>

    #define STACK_MAX 256
    #define INITIAL_GC_THRESHOLD 8

    typedef enum {
        OBJ_INT,
        OBJ_PAIR
    } ObjectType;

    typedef struct object {
        bool marked;                /* set during the mark phase */
        struct object *next;        /* links every allocated object */
        ObjectType type;
        union {
            /* OBJ_INT */
            int value;
            /* OBJ_PAIR */
            struct {
                struct object *head;
                struct object *tail;
            };
        };
    } object;

    typedef struct {
        int num_objects;            /* objects currently allocated */
        int max_objects;            /* allocation count that triggers a collection */
        object *firstobject;        /* head of the list of all objects */
        object *stack[STACK_MAX];   /* the roots */
        int stacksize;
    } VM;

    VM *newVM(void);
    object *newObject(VM *vm, ObjectType type);
    bool isEmpty(VM *vm);
    bool isFull(VM *vm);
    void push(VM *vm, object *ref);
    object *pop(VM *vm);
    object *pushPair(VM *vm);
    void pushInt(VM *vm, int value);
    void mark(object *obj);
    void markAll(VM *vm);
    void sweep(VM *vm);
    void gc(VM *vm);
    void freeVM(VM *vm);

    VM *newVM(void)
    {
        VM *vm = malloc(sizeof(VM));
        if (vm == NULL) {
            fprintf(stderr, "Out of memory\n");
            exit(EXIT_FAILURE);
        }
        vm->stacksize = 0;
        vm->firstobject = NULL;
        vm->num_objects = 0;
        vm->max_objects = INITIAL_GC_THRESHOLD;
        return vm;
    }

    bool isEmpty(VM *vm)
    {
        return vm->stacksize == 0;
    }

    bool isFull(VM *vm)
    {
        return vm->stacksize == STACK_MAX;
    }

    void push(VM *vm, object *ref)
    {
        if (isFull(vm)) {
            /* Not an errno failure, so report it with fprintf rather than perror. */
            fprintf(stderr, "Stack overflow\n");
            exit(EXIT_FAILURE);
        }
        vm->stack[vm->stacksize++] = ref;
    }

    object *pop(VM *vm)
    {
        if (isEmpty(vm)) {
            fprintf(stderr, "Stack underflow\n");
            exit(EXIT_FAILURE);
        }
        return vm->stack[--vm->stacksize];
    }

    object *newObject(VM *vm, ObjectType type)
    {
        /* Collect before allocating once the threshold is reached. */
        if (vm->num_objects >= vm->max_objects)
            gc(vm);

        object *obj = malloc(sizeof(object));
        if (obj == NULL) {
            fprintf(stderr, "Out of memory\n");
            exit(EXIT_FAILURE);
        }
        obj->type = type;
        obj->marked = false;
        obj->next = vm->firstobject;    /* prepend to the list of all objects */
        vm->firstobject = obj;
        vm->num_objects++;
        return obj;
    }

    void pushInt(VM *vm, int value)
    {
        object *obj = newObject(vm, OBJ_INT);
        obj->value = value;
        push(vm, obj);
    }

    /* Pops two objects, pushes a pair holding them, and returns the pair. */
    object *pushPair(VM *vm)
    {
        /* Allocate first: the operands are still on the stack, so a
           collection triggered here cannot reclaim them. */
        object *obj = newObject(vm, OBJ_PAIR);
        obj->tail = pop(vm);
        obj->head = pop(vm);
        push(vm, obj);
        return obj;
    }

    void markAll(VM *vm)
    {
        int i;
        for (i = 0; i < vm->stacksize; i++)
            mark(vm->stack[i]);
    }

    void mark(object *obj)
    {
        /* Stop at already-marked objects to avoid looping on cyclic references. */
        if (obj->marked)
            return;
        obj->marked = true;
        if (obj->type == OBJ_PAIR) {
            mark(obj->head);
            mark(obj->tail);
        }
    }

    void sweep(VM *vm)
    {
        object *prev = NULL;
        object *cur = vm->firstobject;
        while (cur) {
            object *next = cur->next;
            if (!cur->marked) {
                /* Unreachable: unlink from the list and free. */
                if (prev)
                    prev->next = next;
                else
                    vm->firstobject = next;
                free(cur);
                vm->num_objects--;
            } else {
                /* Reachable: clear the mark for the next collection. */
                prev = cur;
                cur->marked = false;
            }
            cur = next;
        }
    }

    void gc(VM *vm)
    {
        int num_object = vm->num_objects;
        markAll(vm);
        sweep(vm);
        /* Keep the threshold above zero so an empty heap does not
           trigger a collection on every allocation. */
        vm->max_objects = vm->num_objects == 0 ? INITIAL_GC_THRESHOLD
                                               : vm->num_objects * 2;
        printf("collected %d objects, %d objects remain\n",
               num_object - vm->num_objects, vm->num_objects);
    }

    void freeVM(VM *vm)
    {
        /* With an empty stack there are no roots, so gc frees everything. */
        vm->stacksize = 0;
        gc(vm);
        free(vm);
    }

    /* Objects on the stack are preserved. */
    void test1(void)
    {
        printf("test1:\n");
        VM *vm = newVM();
        pushInt(vm, 1);
        pushInt(vm, 2);
        gc(vm);
        assert(vm->num_objects == 2);
        freeVM(vm);
    }

    /* Unreachable objects are collected. */
    void test2(void)
    {
        printf("test2:\n");
        VM *vm = newVM();
        pushInt(vm, 1);
        pushInt(vm, 2);
        pop(vm);
        pop(vm);
        gc(vm);
        assert(vm->num_objects == 0);
        freeVM(vm);
    }

    /* Nested objects are reached through their parent pair. */
    void test3(void)
    {
        printf("test3:\n");
        VM *vm = newVM();
        pushInt(vm, 1);
        pushInt(vm, 2);
        pushPair(vm);
        pushInt(vm, 3);
        pushInt(vm, 4);
        pushPair(vm);
        pushPair(vm);
        gc(vm);
        assert(vm->num_objects == 7);
        freeVM(vm);
    }

    /* Cycles are handled. */
    void test4(void)
    {
        printf("test4:\n");
        VM *vm = newVM();
        pushInt(vm, 1);
        pushInt(vm, 2);
        object *obj1 = pushPair(vm);
        pushInt(vm, 3);
        pushInt(vm, 4);
        object *obj2 = pushPair(vm);
        /* Make 2 and 4 unreachable and form a cycle between the pairs. */
        obj1->tail = obj2;
        obj2->tail = obj1;
        gc(vm);
        assert(vm->num_objects == 4);
        freeVM(vm);
    }

    int main(void)
    {
        test1();
        test2();
        test3();
        test4();
        return 0;
    }

