Linux Kernel Buffer Management: The Slab Mechanism

   The Linux kernel allocates and manages buffers with a scheme called the "slab" allocator. Under the slab scheme, each important data structure gets its own dedicated queue of buffers, and each data structure can have its own constructor and destructor functions.
     A key feature of slab management is that the buffer queue for each object type is not made up of individual objects directly, but of a chain of larger blocks called slabs, each of which holds a number of objects of the same type. Objects fall roughly into two classes: large objects and small objects. A small object is one of which several fit into a single page; most data structures in the kernel are small objects of this kind.

     The slab queue built for each object type has a queue head, whose control structure is kmem_cache_t. Each object type's slab queue head itself lives on a slab; the system keeps one master slab queue whose objects are the slab queue heads of all the other object types, and whose own queue head is also a kmem_cache_t structure, called cache_cache.
     This forms a hierarchical, tree-like structure.

     When a data structure is large enough that it no longer counts as a small object, the slab layout differs slightly: the slab's control structure is taken off the slab and stored centrally on separate slabs. Since the slab control structure kmem_slab_t contains a pointer to the first object of the corresponding slab, the logic is the same either way.
     In addition, when the object size is exactly 1/2, 1/4, or 1/8 of a physical page, placing the per-object link pointers right next to the objects would waste a significant amount of slab space, so in that case the link pointers are also moved off the slab and stored centrally.
     The Linux kernel also provides a general-purpose buffer pool, the "slab_cache", which combines size-class partitioning (as in physical page allocation) with slab-style management. It works much like cache_cache, except that its top level is not a single queue but an array of structures, each element pointing to a different slab queue. These slab queues differ only in the size of the objects they carry: 32, 64, 128, ... bytes, up to 128K. The functions for allocating and freeing buffers from the general-purpose pool are kmalloc and kfree.
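For orientation, here is a minimal sketch of how kernel code typically uses the two interfaces described above: a dedicated cache via kmem_cache_create/kmem_cache_alloc, and the general-purpose pool via kmalloc/kfree. The struct foo type, its constructor and the init function are made up purely for illustration.

#include <linux/slab.h>
#include <linux/list.h>
#include <linux/errno.h>

struct foo {                            /* hypothetical object type */
    int id;
    struct list_head link;
};

static struct kmem_cache *foo_cachep;

static void foo_ctor(void *obj)         /* optional constructor, run once per object */
{
    struct foo *f = obj;

    INIT_LIST_HEAD(&f->link);
}

static int __init foo_example_init(void)
{
    struct foo *f;
    void *buf;

    /* a dedicated slab queue just for struct foo objects */
    foo_cachep = kmem_cache_create("foo_cache", sizeof(struct foo),
                                   0, SLAB_HWCACHE_ALIGN, foo_ctor);
    if (!foo_cachep)
        return -ENOMEM;

    f = kmem_cache_alloc(foo_cachep, GFP_KERNEL);
    if (f)
        kmem_cache_free(foo_cachep, f);

    /* the general-purpose pool: kmalloc picks the closest kmalloc-<size> cache */
    buf = kmalloc(200, GFP_KERNEL);     /* served from the 256-byte cache */
    kfree(buf);

    kmem_cache_destroy(foo_cachep);
    return 0;
}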

   
     In the kernel file mm/slab_common.c two global symbols are defined: LIST_HEAD(slab_caches) and struct kmem_cache *kmem_cache. Every kmem_cache that gets created is linked into the global slab_caches list; cat /proc/slabinfo lists all the slab caches on this list.
     kmem_cache is the first kmem_cache in the system and is initialized statically; every kmem_cache structure allocated afterwards is carved out of this global kmem_cache's slab.
     The global kmem_cache is itself a structure of type kmem_cache, so how could the very first one be allocated with kmem_cache_create? It cannot: the first kmem_cache is set up statically at system initialization by calling create_boot_cache, and once it exists, every subsequent kmem_cache is allocated as an object from this global kmem_cache.
     After the first kmem_cache object is set up, __kmem_cache_create performs a series of calculations to determine the best slab layout, including: how many pages make up one slab and how many buffers it is divided into; whether the slab control structure kmem_slab_t should be stored centrally off-slab or placed at the end of each slab; whether each buffer's link pointer should be stored centrally off-slab or right next to its buffer on the slab; and the number of "colours", among other things.
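For reference, the globals mentioned above sit in mm/slab_common.c together with the mutex and state flag that protect them; for this kernel generation they look roughly like this:

/* mm/slab_common.c (roughly, for this kernel generation) */
enum slab_state slab_state;        /* bootstrap progress: DOWN, PARTIAL, ..., UP, FULL */
LIST_HEAD(slab_caches);            /* every kmem_cache is linked here; shown by /proc/slabinfo */
DEFINE_MUTEX(slab_mutex);          /* protects slab_caches and cache creation/destruction */
struct kmem_cache *kmem_cache;     /* the cache that kmem_cache structures are allocated from */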



Phase 1: Creating the first kmem_cache
     First, the first kmem_cache object, i.e. the global kmem_cache, is allocated. At this point the kmem_cache slab is set up, but no pages have actually been allocated for it yet; pages are allocated only later, when a buffer is actually allocated from this slab.

     kmem_cache_init->create_boot_cache->__kmem_cache_create.
kmem_cache_init does a lot more than allocate the first kmem_cache: it also allocates the first kmalloc cache and creates the remaining kmalloc caches, and it copies the initialized bootstrap array data into the proper slots of kmem_cache->array and of the kmalloc caches' arrays.
void __init kmem_cache_init(void)
{
    int i;

    BUILD_BUG_ON(sizeof(((struct page *)NULL)->lru) <
                    sizeof(struct rcu_head));
    kmem_cache = &kmem_cache_boot; // the first kmem_cache is statically allocated
    setup_node_pointer(kmem_cache); //kmem_cache->node = &kmem_cache->array[nr_cpu_ids];

    if (num_possible_nodes() == 1)
        use_alien_caches = 0;

    for (i = 0; i < NUM_INIT_LISTS; i++)
        kmem_cache_node_init(&init_kmem_cache_node[i]); // initialize the bootstrap kmem_cache_node structures; a UMA machine has only one node

    set_up_node(kmem_cache, CACHE_CACHE); // point kmem_cache->node[node] at the corresponding init_kmem_cache_node entry

    /*
     * Fragmentation resistance on low memory - only use bigger
     * page orders on machines with more than 32MB of memory if
     * not overridden on the command line.
     */
    if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
        slab_max_order = SLAB_MAX_ORDER_HI;

    /* Bootstrap is tricky, because several objects are allocated
     * from caches that do not exist yet:
     * 1) initialize the kmem_cache cache: it contains the struct
     *    kmem_cache structures of all caches, except kmem_cache itself:
     *    kmem_cache is statically allocated.
     *    Initially an __init data area is used for the head array and the
     *    kmem_cache_node structures, it's replaced with a kmalloc allocated
     *    array at the end of the bootstrap.
     * 2) Create the first kmalloc cache.
     *    The struct kmem_cache for the new cache is allocated normally.
     *    An __init data area is used for the head array.
     * 3) Create the remaining kmalloc caches, with minimally sized
     *    head arrays.
     * 4) Replace the __init data head arrays for kmem_cache and the first
     *    kmalloc cache with kmalloc allocated arrays.
     * 5) Replace the __init data for kmem_cache_node for kmem_cache and
     *    the other cache's with kmalloc allocated memory.
     * 6) Resize the head arrays of the kmalloc caches to their final sizes.
     */

    /* 1) create the kmem_cache */

    /*
     * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
     */
    create_boot_cache(kmem_cache, "kmem_cache",
        offsetof(struct kmem_cache, array[nr_cpu_ids]) +
                  nr_node_ids * sizeof(struct kmem_cache_node *),
                  SLAB_HWCACHE_ALIGN); // create the first kmem_cache
    list_add(&kmem_cache->list, &slab_caches); // link kmem_cache into the global slab_caches list

    /* 2+3) create the kmalloc caches */

    /*
     * Initialize the caches that provide memory for the array cache and the
     * kmem_cache_node structures first.  Without this, further allocations will
     * bug.
     */

    kmalloc_caches[INDEX_AC] = create_kmalloc_cache("kmalloc-ac",
                    kmalloc_size(INDEX_AC), ARCH_KMALLOC_FLAGS); // create the kmem_cache used for kmalloc'd array_cache objects

    if (INDEX_AC != INDEX_NODE)
        kmalloc_caches[INDEX_NODE] =
            create_kmalloc_cache("kmalloc-node",
                kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS); // create the kmem_cache used for kmalloc'd kmem_cache_node objects

    slab_early_init = 0;

    /* 4) Replace the bootstrap head arrays */
    {
        struct array_cache *ptr;

        ptr = kmalloc(sizeof(struct arraycache_init), GFP_NOWAIT);

        memcpy(ptr, cpu_cache_get(kmem_cache),
               sizeof(struct arraycache_init));
        /*
         * Do not assume that spinlocks can be initialized via memcpy:
         */
        spin_lock_init(&ptr->lock);

        kmem_cache->array[smp_processor_id()] = ptr;

        ptr = kmalloc(sizeof(struct arraycache_init), GFP_NOWAIT);

        BUG_ON(cpu_cache_get(kmalloc_caches[INDEX_AC])
               != &initarray_generic.cache);
        memcpy(ptr, cpu_cache_get(kmalloc_caches[INDEX_AC]),
               sizeof(struct arraycache_init));
        /*
         * Do not assume that spinlocks can be initialized via memcpy:
         */
        spin_lock_init(&ptr->lock);

        kmalloc_caches[INDEX_AC]->array[smp_processor_id()] = ptr; // this block kmallocs new array_cache objects to replace the bootstrap arrays used during init
    }
    /* 5) Replace the bootstrap kmem_cache_node */
    {
        int nid;

        for_each_online_node(nid) {
            init_list(kmem_cache, &init_kmem_cache_node[CACHE_CACHE + nid], nid); // replace kmem_cache's bootstrap kmem_cache_node

            init_list(kmalloc_caches[INDEX_AC],
                  &init_kmem_cache_node[SIZE_AC + nid], nid); // replace the array-cache kmalloc cache's bootstrap kmem_cache_node

            if (INDEX_AC != INDEX_NODE) {
                init_list(kmalloc_caches[INDEX_NODE],
                      &init_kmem_cache_node[SIZE_NODE + nid], nid); // replace the node kmalloc cache's bootstrap kmem_cache_node
            }
        }
    }
/* Note: a normal kmem_cache can only be created once the first kmem_cache, the kmem_cache for kmalloc'd array_caches, and the kmem_cache for kmalloc'd kmem_cache_node structures all exist; only then can any other normal kmem_cache be created. */
    create_kmalloc_caches(ARCH_KMALLOC_FLAGS); // everything is in place, so the kmem_caches backing kmalloc can now be created
}
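
The statically allocated first cache, kmem_cache_boot, is defined in mm/slab.c; for this kernel generation it looks roughly like this:

static struct kmem_cache kmem_cache_boot = {
    .batchcount = 1,
    .limit = BOOT_CPUCACHE_ENTRIES,
    .shared = 1,
    .size = sizeof(struct kmem_cache),
    .name = "kmem_cache",
};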

Let us first look at how the first kmem_cache is created.
void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t size,
        unsigned long flags)
{
    int err;

    s->name = name;
    s->size = s->object_size = size;
    s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
    err = __kmem_cache_create(s, flags); // the function that actually creates the kmem_cache

    if (err)
        panic("Creation of kmalloc slab %s size=%zu failed. Reason %d\n",
                    name, size, err);

    s->refcount = -1;    /* Exempt from merging for now */
}
The listing of __kmem_cache_create below omits some debug code and other unimportant code to keep it short.
int  __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
{
    size_t left_over, freelist_size, ralign;
    gfp_t gfp;
    int err;
    size_t size = cachep->size;
// the following is alignment handling
    /*
     * Check that size is in terms of words.  This is needed to avoid
     * unaligned accesses for some archs when redzoning is used, and makes
     * sure any on-slab bufctl's are also correctly aligned.
     */
    if (size & (BYTES_PER_WORD - 1)) { // round size up to a multiple of the word size
        size += (BYTES_PER_WORD - 1);
        size &= ~(BYTES_PER_WORD - 1);
    }

    /*
     * Redzoning and user store require word alignment or possibly larger.
     * Note this will be overridden by architecture or caller mandated
     * alignment if either is greater than BYTES_PER_WORD.
     */
    if (flags & SLAB_STORE_USER)
        ralign = BYTES_PER_WORD;

    if (flags & SLAB_RED_ZONE) {
        ralign = REDZONE_ALIGN;
        /* If redzoning, ensure that the second redzone is suitably
         * aligned, by adjusting the object size accordingly. */
        size += REDZONE_ALIGN - 1;
        size &= ~(REDZONE_ALIGN - 1);
    }

    /* 3) caller mandated alignment */
    if (ralign < cachep->align) {
        ralign = cachep->align;
    }
    /* disable debug if necessary */
    if (ralign > __alignof__(unsigned long long))
        flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
    /*
     * 4) Store it.
     */
    cachep->align = ralign;

    if (slab_is_available())
        gfp = GFP_KERNEL;
    else
        gfp = GFP_NOWAIT;

    setup_node_pointer(cachep);

    /*
     * Determine if the slab management is 'on' or 'off' slab.
     * (bootstrapping cannot cope with offslab caches so don't do
     * it too early on. Always use on-slab management when
     * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
     */
    if ((size >= (PAGE_SIZE >> 5)) && !slab_early_init &&
        !(flags & SLAB_NOLEAKTRACE)) // when size >= PAGE_SIZE/32 (128 bytes with 4 KiB pages), move the slab management structure off-slab
        /*
         * Size is large, assume best to place the slab management obj
         * off-slab (should allow better packing of objs).
         */
        flags |= CFLGS_OFF_SLAB;

    size = ALIGN(size, cachep->align);
    /*
     * We should restrict the number of objects in a slab to implement
     * byte sized index. Refer comment on SLAB_OBJ_MIN_SIZE definition.
     */
    if (FREELIST_BYTE_INDEX && size < SLAB_OBJ_MIN_SIZE)
        size = ALIGN(SLAB_OBJ_MIN_SIZE, cachep->align);

    left_over = calculate_slab_order(cachep, size, cachep->align, flags); // compute the page order used per slab allocation and how many objects one slab holds; returns the leftover space

    if (!cachep->num)
        return -E2BIG;

    freelist_size = calculate_freelist_size(cachep->num, cachep->align); // size of the per-slab freelist (the array of object indices)

    /*
     * If the slab has been placed off-slab, and we have enough space then
     * move it on-slab. This is at the expense of any extra colouring.
     */
    if (flags & CFLGS_OFF_SLAB && left_over >= freelist_size) { // if the leftover space can hold the freelist after all, drop OFF_SLAB and keep it on-slab
        flags &= ~CFLGS_OFF_SLAB;
        left_over -= freelist_size;
    }

    if (flags & CFLGS_OFF_SLAB) {
        /* really off slab. No need for manual alignment */
        freelist_size = calculate_freelist_size(cachep->num, 0);

#ifdef CONFIG_PAGE_POISONING
        /* If we're going to use the generic kernel_map_pages()
         * poisoning, then it's going to smash the contents of
         * the redzone and userword anyhow, so switch them off.
         */
        if (size % PAGE_SIZE == 0 && flags & SLAB_POISON)
            flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
#endif
    }

    cachep->colour_off = cache_line_size();
    /* Offset must be a multiple of the alignment. */
    if (cachep->colour_off < cachep->align)
        cachep->colour_off = cachep->align; // colour_off must be a multiple of the alignment
    cachep->colour = left_over / cachep->colour_off;
    cachep->freelist_size = freelist_size;
    cachep->flags = flags;
    cachep->allocflags = __GFP_COMP;
    if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
        cachep->allocflags |= GFP_DMA;
    cachep->size = size;
    cachep->reciprocal_buffer_size = reciprocal_value(size);

    if (flags & CFLGS_OFF_SLAB) {
        cachep->freelist_cache = kmalloc_slab(freelist_size, 0u); // for OFF_SLAB caches, pick the kmalloc cache that will hold the freelists
        /*
         * This is a possibility for one of the kmalloc_{dma,}_caches.
         * But since we go off slab only for object size greater than
         * PAGE_SIZE/8, and kmalloc_{dma,}_caches get created
         * in ascending order,this should not happen at all.
         * But leave a BUG_ON for some lucky dude.
         */
        BUG_ON(ZERO_OR_NULL_PTR(cachep->freelist_cache));
    }

    err = setup_cpu_cache(cachep, gfp);
    if (err) {
        __kmem_cache_shutdown(cachep);
        return err;
    }

    if (flags & SLAB_DEBUG_OBJECTS) {
        /*
         * Would deadlock through slab_destroy()->call_rcu()->
         * debug_object_activate()->kmem_cache_alloc().
         */
        WARN_ON_ONCE(flags & SLAB_DESTROY_BY_RCU);

        slab_set_debugobj_lock_classes(cachep);
    } else if (!OFF_SLAB(cachep) && !(flags & SLAB_DESTROY_BY_RCU))
        on_slab_lock_classes(cachep);

    return 0;
}
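
The arithmetic behind calculate_slab_order and the colour computation is straightforward. Below is a simplified, stand-alone sketch of the on-slab case, assuming one byte of freelist index per object as in this kernel version; the numbers in main are only an example.

#include <stdio.h>
#include <stddef.h>

/* simplified model of cache_estimate() for the on-slab case */
static void estimate(size_t slab_size, size_t obj_size, size_t idx_size,
                     unsigned int *num, size_t *left_over)
{
    /* each object consumes its own size plus one freelist index */
    *num       = slab_size / (obj_size + idx_size);
    *left_over = slab_size - *num * (obj_size + idx_size);
}

int main(void)
{
    unsigned int num;
    size_t left_over, colour_off = 64;      /* e.g. a 64-byte cache line */

    /* one 4 KiB page, 256-byte objects, 1-byte freelist index per object */
    estimate(4096, 256, 1, &num, &left_over);

    /* cachep->colour = left_over / colour_off, as in __kmem_cache_create above */
    printf("objects per slab: %u, left over: %zu bytes, colours: %zu\n",
           num, left_over, left_over / colour_off);
    return 0;
}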

     At this point the first kmem_cache has been initialized, but no pages have been allocated for it yet. When a new kmem_cache needs to be created next, pages will be allocated for the first kmem_cache and one object will be allocated out of those pages.

Phase 2: Allocating the kmalloc caches from the kmem_cache slab
     Now let us see how the kmem_cache set up in phase 1 is used to create the kmalloc caches.
     Allocation happens in two steps: first a kmem_cache object is allocated from the kmem_cache slab; second, create_boot_cache is called to initialize and set up the freshly allocated kmem_cache object. That second step is the same as the creation of kmem_cache itself and is not repeated here.
     create_kmalloc_cache->kmem_cache_zalloc->kmem_cache_alloc->slab_alloc->__do_cache_alloc->____cache_alloc.
     create_kmalloc_cache->create_boot_cache.
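
create_kmalloc_cache itself is short and shows those two steps directly; in mm/slab_common.c it is roughly:

struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
                                               unsigned long flags)
{
    /* step 1: allocate a kmem_cache object from the global kmem_cache slab */
    struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);

    if (!s)
        panic("Out of memory when creating slab %s\n", name);

    /* step 2: initialize it the same way the first kmem_cache was set up */
    create_boot_cache(s, name, size, flags);
    list_add(&s->list, &slab_caches);
    s->refcount = 1;
    return s;
}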
     

Rather than walking the whole call stack above, let us go straight to the key function: slab_alloc allocates one struct kmem_cache object from the kmem_cache slab.
static __always_inline void *
slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller) // here cachep is the kmem_cache just created above
{
    unsigned long save_flags;
    void *objp;

    flags &= gfp_allowed_mask;

    lockdep_trace_alloc(flags);

    if (slab_should_failslab(cachep, flags))
        return NULL;

    cachep = memcg_kmem_get_cache(cachep, flags);

    cache_alloc_debugcheck_before(cachep, flags);
    local_irq_save(save_flags);
    objp = __do_cache_alloc(cachep, flags); // the function that actually allocates a slab object
    local_irq_restore(save_flags);
    objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
    kmemleak_alloc_recursive(objp, cachep->object_size, 1, cachep->flags,
                 flags);
    prefetchw(objp);

    if (likely(objp)) {
        kmemcheck_slab_alloc(cachep, flags, objp, cachep->object_size);
        if (unlikely(flags & __GFP_ZERO))
            memset(objp, 0, cachep->object_size);
    }

    return objp; // return the allocated slab object
}

static __always_inline void *
__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
    return ____cache_alloc(cachep, flags);
}

static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
    void *objp;
    struct array_cache *ac;
    bool force_refill = false;

    check_irq_off();

    ac = cpu_cache_get(cachep); // get this CPU's array_cache
    if (likely(ac->avail)) { // if the current array_cache has objects available, allocate directly from it
        ac->touched = 1;
        objp = ac_get_obj(cachep, ac, flags, false);

        /*
         * Allow for the possibility all avail objects are not allowed
         * by the current flags
         */
        if (objp) {
            STATS_INC_ALLOCHIT(cachep);
            goto out;
        }
        force_refill = true;
    }

    // the current CPU's array_cache has no object to hand out, so cache_alloc_refill allocates a new page if needed and refills the array_cache with slab objects
    STATS_INC_ALLOCMISS(cachep);
    objp = cache_alloc_refill(cachep, flags, force_refill);
    /*
     * the 'ac' may be updated by cache_alloc_refill(),
     * and kmemleak_erase() requires its correct value.
     */
    ac = cpu_cache_get(cachep);

out:
    /*
     * To avoid a false negative, if an object that is in one of the
     * per-CPU caches is leaked, we need to make sure kmemleak doesn't
     * treat the array pointers as a reference to the object.
     */
    if (objp)
        kmemleak_erase(&ac->entry[ac->avail]); // after handing out the object, clear its slot in the array_cache
    return objp;
}
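The fast path above simply pops the last pointer off the per-CPU array. Leaving out the pfmemalloc handling, ac_get_obj boils down to this:

/* simplified: the common case of ac_get_obj(), pfmemalloc handling omitted */
static inline void *ac_get_obj(struct kmem_cache *cachep,
                               struct array_cache *ac, gfp_t flags,
                               bool force_refill)
{
    /* LIFO: hand out the most recently added object, which is likely cache-hot */
    return ac->entry[--ac->avail];
}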
The function that actually fetches slab objects from the slab pages (allocating new ones when necessary) is as follows:
static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags,
                            bool force_refill)
{
    int batchcount;
    struct kmem_cache_node *n;
    struct array_cache *ac;
    int node;

    check_irq_off();
    node = numa_mem_id();
    if (unlikely(force_refill))
        goto force_grow;
retry:
    ac = cpu_cache_get(cachep); // get the current CPU's array_cache
    batchcount = ac->batchcount;
    if (!ac->touched && batchcount > BATCHREFILL_LIMIT) { // tune how many objects to grab in one batch
        /*
         * If there was little recent activity on this cache, then
         * perform only a partial refill.  Otherwise we could generate
         * refill bouncing.
         */
        batchcount = BATCHREFILL_LIMIT;
    }
    n = cachep->node[node]; // get this node's kmem_cache_node (the per-node slab lists and related bookkeeping)

    BUG_ON(ac->avail > 0 || !n);
    spin_lock(&n->list_lock); // lock the per-node slab lists

    /* See if we can refill from the shared array */
    // first check the node's shared array_cache (shared between this node's CPUs); if it has objects, transfer from it first
    if (n->shared && transfer_objects(ac, n->shared, batchcount)) {
        n->shared->touched = 1;
        goto alloc_done;
    }

    while (batchcount > 0) {
        struct list_head *entry;
        struct page *page;
        /* Get slab alloc is to come from. */
        entry = n->slabs_partial.next;
        if (entry == &n->slabs_partial) { // if the partial list is empty, check slabs_free next
            n->free_touched = 1;
            entry = n->slabs_free.next;
            if (entry == &n->slabs_free) // nothing on slabs_free either: goto must_grow
                goto must_grow;
        }

        page = list_entry(entry, struct page, lru); // get the page backing this slab
        check_spinlock_acquired(cachep);

        /*
         * The slab was either on partial or free list so
         * there must be at least one object available for
         * allocation.
         */
        BUG_ON(page->active >= cachep->num);

        while (page->active < cachep->num && batchcount--) { // while the page still has free slab objects
            STATS_INC_ALLOCED(cachep);
            STATS_INC_ACTIVE(cachep);
            STATS_SET_HIGH(cachep);

            ac_put_obj(cachep, ac, slab_get_obj(cachep, page,
                                    node)); // take an object from the page and push it into the array_cache
        }

        /* move slabp to correct slabp list: */
        list_del(&page->lru); // take the page off its current list
        if (page->active == cachep->num) // move the page from the free list onto the full or partial list
            list_add(&page->lru, &n->slabs_full);
        else
            list_add(&page->lru, &n->slabs_partial);
    }

must_grow: // when neither the partial nor the free list has a page with free objects, new pages must be allocated to grow the cache
    n->free_objects -= ac->avail; // account for the objects just moved into the array_cache; free_objects is how many objects remain free on this node's slabs
alloc_done:
    spin_unlock(&n->list_lock);

    if (unlikely(!ac->avail)) {
        int x;
force_grow:
        x = cache_grow(cachep, flags | GFP_THISNODE, node, NULL); // grow the cache by one slab; returns 1 on success

        /* cache_grow can reenable interrupts, then ac could change. */
        ac = cpu_cache_get(cachep);
        node = numa_mem_id();

        /* no objects in sight? abort */
        if (!x && (ac->avail == 0 || force_refill))
            return NULL;

        if (!ac->avail)        /* objects refilled by interrupt? */ // the array_cache is still empty even though the slab now has free objects
            goto retry; // so retry and pull objects from the new slab into this CPU's array_cache
    }
    ac->touched = 1; // mark the array_cache as recently touched

    return ac_get_obj(cachep, ac, flags, force_refill); // return the object just obtained
}
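
The two helpers used in the refill loop above are also simple once the debug and pfmemalloc paths are stripped; simplified sketches:

/* take the next free object out of a slab page (debug checks omitted) */
static void *slab_get_obj(struct kmem_cache *cachep, struct page *page, int nodeid)
{
    /* page->freelist is an array of object indices; page->active counts objects handed out */
    unsigned int idx = ((freelist_idx_t *)page->freelist)[page->active++];

    /* object address = start of the slab's objects + index * object size */
    return page->s_mem + cachep->size * idx;
}

/* push an object pointer onto the per-CPU array_cache (pfmemalloc handling omitted) */
static inline void ac_put_obj(struct kmem_cache *cachep,
                              struct array_cache *ac, void *objp)
{
    ac->entry[ac->avail++] = objp;
}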

Next, let us look at how a kmem_cache grows by one new slab.
static int cache_grow(struct kmem_cache *cachep,
        gfp_t flags, int nodeid, struct page *page)
{
    void *freelist;
    size_t offset;
    gfp_t local_flags;
    struct kmem_cache_node *n;

    /*
     * Be lazy and only check for valid flags here,  keeping it out of the
     * critical path in kmem_cache_alloc().
     */
    BUG_ON(flags & GFP_SLAB_BUG_MASK);
    local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);

    /* Take the node list lock to change the colour_next on this node */
    check_irq_off();
    n = cachep->node[nodeid]; // get the kmem_cache_node for this node; it tracks the node's partial, full and free slab lists
    spin_lock(&n->list_lock);

    /* Get colour for the slab, and cal the next value. */
    offset = n->colour_next;
    n->colour_next++;
    if (n->colour_next >= cachep->colour)
        n->colour_next = 0;
    spin_unlock(&n->list_lock);

    offset *= cachep->colour_off;

    if (local_flags & __GFP_WAIT)
        local_irq_enable();

    /*
     * The test for missing atomic flag is performed here, rather than
     * the more obvious place, simply to reduce the critical path length
     * in kmem_cache_alloc(). If a caller is seriously mis-behaving they
     * will eventually be caught here (where it matters).
     */
    kmem_flagcheck(cachep, flags);

    /*
     * Get mem for the objs.  Attempt to allocate a physical page from
     * 'nodeid'.
     */
    if (!page)
        page = kmem_getpages(cachep, local_flags, nodeid); // get pages for the new slab from the system page allocator
    if (!page)
        goto failed;

    /* Get slab management. */
    freelist = alloc_slabmgmt(cachep, page, offset,
            local_flags & ~GFP_CONSTRAINT_MASK, nodeid); // get a pointer to the slab management structure (the freelist)
    if (!freelist)
        goto opps1;

    slab_map_pages(cachep, page, freelist); // record cachep and freelist in the page structure

    cache_init_objs(cachep, page); // initialize the objects: set freelist[i] = i and call the ctor constructor for each object

    if (local_flags & __GFP_WAIT)
        local_irq_disable();
    check_irq_off();
    spin_lock(&n->list_lock);

    /* Make slab active. */
    list_add_tail(&page->lru, &(n->slabs_free)); // add the page to slabs_free so the next allocation can take objects from its freelist
    STATS_INC_GROWN(cachep);
    n->free_objects += cachep->num; // this node now has cachep->num more free objects available
    spin_unlock(&n->list_lock);
    return 1;
opps1:
    kmem_freepages(cachep, page);
failed:
    if (local_flags & __GFP_WAIT)
        local_irq_disable();
    return 0;
}
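cache_init_objs is the step that writes freelist[i] = i and runs the constructor; with the debug code removed it is roughly:

/* simplified sketch of cache_init_objs(): debug/poisoning code omitted */
static void cache_init_objs(struct kmem_cache *cachep, struct page *page)
{
    int i;

    for (i = 0; i < cachep->num; i++) {
        void *objp = page->s_mem + cachep->size * i;   /* i-th object in the slab */

        if (cachep->ctor)
            cachep->ctor(objp);                        /* run the constructor once per object */

        ((freelist_idx_t *)page->freelist)[i] = i;     /* freelist[i] = i */
    }
}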
At this point an object has been allocated from kmem_cache. create_boot_cache then initializes the members of that cachep, and the calls return level by level back to the kmalloc_caches[INDEX_AC] = create_kmalloc_cache("kmalloc-ac", kmalloc_size(INDEX_AC), ARCH_KMALLOC_FLAGS); statement in kmem_cache_init. kmalloc_caches[INDEX_AC], the cache used to allocate struct arraycache_init, now exists.

Next, kmalloc_caches[INDEX_NODE] = create_kmalloc_cache("kmalloc-node", kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS); is called to create the slab from which kmem_cache_node structures are allocated.

Phase 3: Creating the kmalloc caches
     Phases 1 and 2 created kmem_cache, kmalloc_caches[INDEX_AC] and kmalloc_caches[INDEX_NODE]. These slabs are now used to set up the rest of the kmalloc machinery:
create_kmalloc_caches creates the slabs that kmalloc needs for its various size classes.
void __init create_kmalloc_caches(unsigned long flags)
{
    int i;

    /*
     * Patch up the size_index table if we have strange large alignment
     * requirements for the kmalloc array. This is only the case for
     * MIPS it seems. The standard arches will not generate any code here.
     *
     * Largest permitted alignment is 256 bytes due to the way we
     * handle the index determination for the smaller caches.
     *
     * Make sure that nothing crazy happens if someone starts tinkering
     * around with ARCH_KMALLOC_MINALIGN
     */
    BUILD_BUG_ON(KMALLOC_MIN_SIZE > 256 ||
        (KMALLOC_MIN_SIZE & (KMALLOC_MIN_SIZE - 1)));

    for (i = 8; i < KMALLOC_MIN_SIZE; i += 8) {
        int elem = size_index_elem(i);

        if (elem >= ARRAY_SIZE(size_index))
            break;
        size_index[elem] = KMALLOC_SHIFT_LOW;
    }

    if (KMALLOC_MIN_SIZE >= 64) {
        /*
         * The 96 byte size cache is not used if the alignment
         * is 64 byte.
         */
        for (i = 64 + 8; i <= 96; i += 8)
            size_index[size_index_elem(i)] = 7;

    }

    if (KMALLOC_MIN_SIZE >= 128) {
        /*
         * The 192 byte sized cache is not used if the alignment
         * is 128 byte. Redirect kmalloc to use the 256 byte cache
         * instead.
         */
        for (i = 128 + 8; i <= 192; i += 8)
            size_index[size_index_elem(i)] = 8;
    }
    for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
        if (!kmalloc_caches[i]) {
            kmalloc_caches[i] = create_kmalloc_cache(NULL,
                            1 << i, flags); // create_kmalloc_cache creates the slab for this size; the function was described in detail in phase 2
        }

        /*
         * Caches that are not of the two-to-the-power-of size.
         * These have to be created immediately after the
         * earlier power of two caches
         */
        if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[1] && i == 6)
            kmalloc_caches[1] = create_kmalloc_cache(NULL, 96, flags);

        if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[2] && i == 7)
            kmalloc_caches[2] = create_kmalloc_cache(NULL, 192, flags);
    }

    /* Kmalloc array is now usable */
    slab_state = UP; // everything is ready: from now on kmem_cache_create and kmalloc can be used to create slabs and allocate objects

    for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
        struct kmem_cache *s = kmalloc_caches[i];
        char *n;

        if (s) {
            n = kasprintf(GFP_NOWAIT, "kmalloc-%d", kmalloc_size(i));

            BUG_ON(!n);
            s->name = n;
        }
    }

#ifdef CONFIG_ZONE_DMA
    for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
        struct kmem_cache *s = kmalloc_caches[i];

        if (s) {
            int size = kmalloc_size(i);
            char *n = kasprintf(GFP_NOWAIT,
                 "dma-kmalloc-%d", size);

            BUG_ON(!n);
            kmalloc_dma_caches[i] = create_kmalloc_cache(n,
                size, SLAB_CACHE_DMA | flags); // with ZONE_DMA configured, also create a DMA kmalloc cache for each size, again via create_kmalloc_cache (described in phase 2)
        }
    }
#endif
}
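At kmalloc time, the size_index table patched above is what maps a request size onto one of these caches; kmalloc_slab in mm/slab_common.c does roughly the following before the normal slab allocation path takes over.

/* roughly how kmalloc_slab() picks the kmem_cache for a requested size */
struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
{
    int index;

    if (size <= 192) {
        if (!size)
            return ZERO_SIZE_PTR;
        index = size_index[size_index_elem(size)];  /* table set up in create_kmalloc_caches */
    } else {
        index = fls(size - 1);                      /* round up to the next power of two */
    }

#ifdef CONFIG_ZONE_DMA
    if (unlikely(flags & GFP_DMA))
        return kmalloc_dma_caches[index];
#endif
    return kmalloc_caches[index];
}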

This completes the walkthrough of kmem_cache_init. To summarize: the first kmem_cache is created first, and every other kmem_cache object is then allocated from the kmem_cache slab, forming a two-level tree structure. We also looked in detail at how a newly created slab cache grows by a page and hands out objects from it.
 