A First Read of SLAB


快乐虾

http://blog.csdn.net/lights_joy/

lights@hb165.com

This article applies to:

ADI bf561 DSP

uclinux-2008r1-rc8 (ported to VDSP5)

Visual DSP++ 5.0

Reposting is welcome, but please keep the author information.

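1.1.1 Basic idea

The slab allocator used by uclinux is based on an algorithm first introduced by Jeff Bonwick for the SunOS operating system. Jeff's allocator is built around object caching. Within a kernel, a large amount of memory is allocated for a limited set of objects, such as file descriptors and other common structures. Jeff found that the time needed to initialize an ordinary kernel object exceeded the time needed to allocate and free it. His conclusion was that memory should not be released back to a global pool, but kept in the state it was initialized to for its particular purpose. For example, if memory is allocated for a mutex, the mutex initialization function (mutex_init) needs to run only once, when the memory is first allocated for a mutex. Later allocations do not need to run it, because the object has been in the required state ever since it was last released and its destructor was called.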
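To make the "initialize once, reuse many times" idea concrete, here is a minimal user-space sketch. It is not kernel code: every name in it (demo_obj, demo_alloc, demo_release and so on) is made up for illustration, and the fixed-size pool is an assumption that keeps the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object type; stands in for something like a mutex. */
struct demo_obj {
    int initialized;   /* set once by the constructor */
    int payload;
};

#define DEMO_POOL_SIZE 4

static struct demo_obj demo_pool[DEMO_POOL_SIZE];
static struct demo_obj *demo_free[DEMO_POOL_SIZE]; /* constructed, free objects */
static int demo_nfree;      /* number of entries in demo_free */
static int demo_next_raw;   /* next never-used slot in demo_pool */
static int demo_ctor_calls; /* how many times the constructor has run */

static void demo_ctor(struct demo_obj *obj)
{
    obj->initialized = 1;
    obj->payload = 0;
    demo_ctor_calls++;
}

/* Prefer an already-constructed object; construct only raw memory. */
static struct demo_obj *demo_alloc(void)
{
    if (demo_nfree > 0)
        return demo_free[--demo_nfree];
    if (demo_next_raw == DEMO_POOL_SIZE)
        return NULL;
    struct demo_obj *obj = &demo_pool[demo_next_raw++];
    demo_ctor(obj);
    return obj;
}

/* The caller must hand the object back in its constructed state. */
static void demo_release(struct demo_obj *obj)
{
    assert(obj != NULL && demo_nfree < DEMO_POOL_SIZE);
    demo_free[demo_nfree++] = obj;
}
```

Allocating, releasing and allocating again returns the same object, and the constructor counter stays at 1, which is exactly the saving the paragraph above describes.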

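The uclinux slab allocator uses this idea, along with several others, to build a memory allocator that is efficient in both space and time.

The figure below shows the high-level organization of the slab structures. At the top is cache_chain, a linked list of slab caches. This is useful for a best-fit algorithm, which walks the list to find the cache that best matches the requested allocation size. Each element of cache_chain is a reference to a kmem_cache structure (called a cache), which defines a pool of objects of a given size to manage.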

(Figure: high-level organization of the slab structures)

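Each cache contains lists of slabs, where a slab is a contiguous block of memory (usually pages). There are three kinds of slab list:

slabs_full: slabs that are fully allocated

slabs_partial: slabs that are partially allocated

slabs_empty: slabs that are empty, with no objects allocated (this list appears as slabs_free in the kmem_list3 structure shown later)

Note that the slabs on the slabs_empty list are the main candidates for reaping. Reaping is the process by which the memory used by slabs is returned to the operating system for other users.

Each slab on a slab list is a contiguous block of memory (one or more contiguous pages) that is divided into objects. These objects are the basic elements that are allocated from and freed to a particular cache. Note that the slab is the smallest unit the slab allocator operates on, so when a cache needs to grow, a slab is the minimum amount by which it grows. Typically, each slab holds many objects.

Because objects are allocated from and freed to slabs, an individual slab can move between the slab lists. For example, when all objects in a slab have been handed out, the slab moves from slabs_partial to slabs_full. When a fully allocated slab has an object freed, it moves from slabs_full to slabs_partial. When all of its objects have been freed, it moves from slabs_partial to slabs_empty.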
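The list a slab belongs on depends only on how many of its objects are in use. The sketch below expresses that rule as a small function; the enum and function names are mine, not the kernel's.

```c
#include <assert.h>

enum slab_list { LIST_FREE, LIST_PARTIAL, LIST_FULL };

/* Which list a slab belongs on, given objects in use and capacity. */
static enum slab_list slab_list_for(unsigned int inuse, unsigned int num)
{
    assert(num > 0 && inuse <= num);
    if (inuse == 0)
        return LIST_FREE;      /* nothing allocated: candidate for reaping */
    if (inuse == num)
        return LIST_FULL;      /* every object handed out */
    return LIST_PARTIAL;       /* some used, some free */
}
```

After every allocation or free, the allocator only has to compare the result with the slab's current list and relink the slab if the two differ.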

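Compared with traditional memory management schemes, the slab cache allocator offers several advantages. First, the kernel relies heavily on small objects that are allocated countless times over the lifetime of the system. The slab cache allocator serves these by caching objects of similar size, which avoids the usual fragmentation problems. The slab allocator also supports initialization of common objects, so an object need not be initialized repeatedly for the same purpose. Finally, the slab allocator supports hardware cache alignment and coloring, which places objects in different slabs at different offsets so that they map to different cache lines, improving cache utilization and performance.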

1.1.2 Related data structures

1.1.2.1 array_cache

This structure is defined in mm/slab.c:

/*
 * struct array_cache
 *
 * Purpose:
 * - LIFO ordering, to hand out cache-warm objects from _alloc
 * - reduce the number of linked list operations
 * - reduce spinlock operations
 *
 * The limit is stored in the per-cpu structure to reduce the data cache
 * footprint.
 */
struct array_cache {
	unsigned int avail;
	unsigned int limit;
	unsigned int batchcount;
	unsigned int touched;
	spinlock_t lock;
	void *entry[0];	/*
			 * Must have this definition in here for the proper
			 * alignment of array_cache. Also simplifies accessing
			 * the entries.
			 * [0] is for gcc 2.95. It should really be [].
			 */
};

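• entry

This member holds pointers to the free objects cached for this CPU; allocation requests are served from these pointers.

• avail

The number of objects in entry that are currently available for allocation.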
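The interplay of entry and avail is that of a small stack of pointers. Here is a user-space sketch of the LIFO behaviour; the structure and function names are hypothetical, and a fixed-size array replaces the kernel's zero-length trailing array.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LIMIT 4  /* hypothetical capacity, plays the role of `limit` */

struct demo_array_cache {
    unsigned int avail;        /* number of valid pointers in entry[] */
    unsigned int limit;        /* capacity of entry[] */
    void *entry[DEMO_LIMIT];   /* pointers to free objects */
};

/* Free path: push the pointer; returns 0 if the array is already full. */
static int demo_ac_put(struct demo_array_cache *ac, void *objp)
{
    assert(ac != NULL);
    if (ac->avail == ac->limit)
        return 0;
    ac->entry[ac->avail++] = objp;
    return 1;
}

/* Alloc path: pop the most recently freed pointer, or NULL if empty. */
static void *demo_ac_get(struct demo_array_cache *ac)
{
    assert(ac != NULL);
    if (ac->avail == 0)
        return NULL;
    return ac->entry[--ac->avail];
}
```

Because the most recently freed object is handed out first, it is the one most likely to still be in the CPU cache, which is the "cache-warm" point made in the structure's comment.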

1.1.2.2 arraycache_init

This structure is defined in mm/slab.c:

/*
 * bootstrap: The caches do not work without cpuarrays anymore, but the
 * cpuarrays are allocated from the generic caches...
 */

#define BOOT_CPUCACHE_ENTRIES 1

struct arraycache_init {

struct array_cache cache;

void *entries[BOOT_CPUCACHE_ENTRIES];

};

1.1.2.3 kmem_list3

This structure is defined in mm/slab.c:

/*
 * The slab lists for all objects.
 */

struct kmem_list3 {

struct list_head slabs_partial; /* partial list first, better asm code */

struct list_head slabs_full;

struct list_head slabs_free;

unsigned long free_objects;

unsigned int free_limit;

unsigned int colour_next; /* Per-node cache coloring */

spinlock_t list_lock;

struct array_cache *shared; /* shared per node */

struct array_cache **alien; /* on other nodes */

unsigned long next_reap; /* updated without locking */

int free_touched; /* updated without locking */

};

This structure holds the three slab lists: partial, full and free.

1.1.2.4 kmem_cache

This structure is defined in mm/slab.c:

/*
 * struct kmem_cache
 *
 * manages a cache.
 */

struct kmem_cache {

/* 1) per-cpu data, touched during every alloc/free */

struct array_cache *array[NR_CPUS];

/* 2) Cache tunables. Protected by cache_chain_mutex */

unsigned int batchcount;

unsigned int limit;

unsigned int shared;

unsigned int buffer_size;

u32 reciprocal_buffer_size;

/* 3) touched by every alloc & free from the backend */

unsigned int flags; /* constant flags */

unsigned int num; /* # of objs per slab */

/* 4) cache_grow/shrink */

/* order of pgs per slab (2^n) */

unsigned int gfporder;

/* force GFP flags, e.g. GFP_DMA */

gfp_t gfpflags;

size_t colour; /* cache colouring range */

unsigned int colour_off; /* colour offset */

struct kmem_cache *slabp_cache;

unsigned int slab_size;

unsigned int dflags; /* dynamic flags */

/* constructor func */

void (*ctor) (void *, struct kmem_cache *, unsigned long);

/* 5) cache creation/removal */

const char *name;

struct list_head next;

/*
 * We put nodelists[] at the end of kmem_cache, because we want to size
 * this array to nr_node_ids slots instead of MAX_NUMNODES
 * (see kmem_cache_init())
 * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
 * is statically defined, so we reserve the max number of nodes.
 */
struct kmem_list3 *nodelists[MAX_NUMNODES];
/*
 * Do not add fields after nodelists[]
 */

};

• nodelists

This member holds the three kinds of slab list for each node. Here we have

#define MAX_NUMNODES (1 << NODES_SHIFT)

#define NODES_SHIFT 0

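so MAX_NUMNODES is 1. The comment on this member, however, shows that the array length is meant to be variable, which is why the member is placed last: changing the amount of memory allocated for the whole structure changes the length of this array. In practice the kernel here also defines nr_node_ids as 1.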

#define nr_node_ids 1
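The trick of sizing a trailing array by allocating fewer bytes than sizeof reports can be sketched in user space as follows. The structure, the constant DEMO_MAX_NODES and both functions are invented for the illustration; only the offsetof arithmetic mirrors what kmem_cache_init does for cache_cache.buffer_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_NODES 4   /* hypothetical compile-time maximum */

struct demo_cache {
    unsigned int buffer_size;
    const char *name;
    void *nodelists[DEMO_MAX_NODES];  /* must stay the last member */
};

/* Bytes actually needed when only nr_nodes slots are used. */
static size_t demo_cache_bytes(unsigned int nr_nodes)
{
    assert(nr_nodes >= 1 && nr_nodes <= DEMO_MAX_NODES);
    return offsetof(struct demo_cache, nodelists) +
           nr_nodes * sizeof(void *);
}

/* Allocate a descriptor trimmed to nr_nodes slots (zero-filled). */
static struct demo_cache *demo_cache_new(unsigned int nr_nodes)
{
    /* Only nodelists[0..nr_nodes-1] may be touched afterwards. */
    return calloc(1, demo_cache_bytes(nr_nodes));
}
```

This only works because nothing follows nodelists in the structure, which is what the "Do not add fields after nodelists[]" comment protects.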

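• colour_off

In this configuration this value is the cache line size; for the BF561 it is 32 bytes.

• buffer_size

The size of the objects in the cache. The value is aligned to the cache line size, which for the BF561 is 32 bytes.

• reciprocal_buffer_size

This value is the fixed-point reciprocal of buffer_size, that is, 2^32 / buffer_size rounded up. With it, a division by buffer_size can be replaced by a multiplication and a 32-bit right shift. It is computed by the following function: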

u32 reciprocal_value(u32 k)

{

u64 val = (1LL << 32) + (k - 1);

do_div(val, k);

return (u32)val;

}

Here do_div performs a 64-bit division.
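A user-space sketch of the same arithmetic shows what the stored value is for. demo_reciprocal_value repeats the computation above with an ordinary 64-bit division in place of do_div; demo_reciprocal_divide is my own name for the matching multiply-and-shift step, written under the assumption that the reciprocal is used the way the kernel's reciprocal_divide helper uses it.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(2^32 / k), the same arithmetic as reciprocal_value() above. */
static uint32_t demo_reciprocal_value(uint32_t k)
{
    assert(k != 0);
    uint64_t val = (1ULL << 32) + (k - 1);
    return (uint32_t)(val / k);
}

/* a / k as a multiply and a shift, given r = demo_reciprocal_value(k). */
static uint32_t demo_reciprocal_divide(uint32_t a, uint32_t r)
{
    return (uint32_t)(((uint64_t)a * r) >> 32);
}
```

For a buffer_size of 0x60 (96), demo_reciprocal_value returns 0x02aaaaab, and demo_reciprocal_divide(480, 0x02aaaaab) yields 5, the same as 480 / 96. The result is exact only for dividends that are small relative to 2^32, which holds for byte offsets inside a slab.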

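• gfporder

This member holds the order of a slab. Small objects are carved out of large blocks of memory, and each such block is called a slab. The block consists of 2^gfporder pages.

• colour

This member holds the slab's left-over space divided by the cache line size, that is, the number of distinct cache-line-sized offsets (colours) available to a slab.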
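How colour, colour_off and colour_next (from kmem_list3) might work together when a new slab is created can be sketched like this. The structure and function are hypothetical; the cycling logic follows my reading of how the allocator picks an offset for each new slab and should be taken as an illustration rather than the exact kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct demo_colour {
    size_t colour;            /* usable offsets: left_over / colour_off */
    unsigned int colour_off;  /* step between offsets, here one cache line */
    unsigned int colour_next; /* index to use for the next new slab */
};

/* Byte offset of the first object in the next slab; advances the cursor. */
static size_t demo_next_colour_offset(struct demo_colour *c)
{
    assert(c != NULL);
    unsigned int idx = c->colour_next;

    c->colour_next++;
    if (c->colour_next >= c->colour)
        c->colour_next = 0;   /* wrap around once all colours are used */
    return (size_t)idx * c->colour_off;
}
```

With colour = 3 and colour_off = 32, successive slabs start their objects at offsets 0, 32, 64, 0 and so on, so objects at the same index in different slabs do not all compete for the same cache lines.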

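• slab_size

The size of the slab management area: the struct slab plus one kmem_bufctl_t per object, aligned to the cache line size (see the assignment to cache_cache.slab_size in kmem_cache_init below).

• slabp_cache

Used only when the slab keeps its management information off-slab; the small blocks needed to manage a slab are then allocated from this cache.

• array

This member holds the arrays of objects available for allocation. It is a per-CPU quantity: each CPU has its own buffer.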

1.1.2.5 slab

This structure is defined in mm/slab.c:

/*
 * struct slab
 *
 * Manages the objs in a slab. Placed either at the beginning of mem allocated
 * for a slab, or allocated from an general cache.
 * Slabs are chained into three list: fully used, partial, fully free slabs.
 */

struct slab {

struct list_head list;

unsigned long colouroff;

void *s_mem; /* including colour offset */

unsigned int inuse; /* num of objs active in slab */

kmem_bufctl_t free;

unsigned short nodeid;

};

1.1.3 Related global variables

1.1.3.1 initkmem_list3

This variable is defined in mm/slab.c:

#define NUM_INIT_LISTS (2 * MAX_NUMNODES + 1)

struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];

Here MAX_NUMNODES is 1.

This global variable is initialized in kmem_cache_init:

for (i = 0; i < NUM_INIT_LISTS; i++) {

kmem_list3_init(&initkmem_list3[i]);

if (i < MAX_NUMNODES)

cache_cache.nodelists[i] = NULL;

}

Tracing into kmem_list3_init:

static void kmem_list3_init(struct kmem_list3 *parent)

{

INIT_LIST_HEAD(&parent->slabs_full);

INIT_LIST_HEAD(&parent->slabs_partial);

INIT_LIST_HEAD(&parent->slabs_free);

parent->shared = NULL;

parent->alien = NULL;

parent->colour_next = 0;

spin_lock_init(&parent->list_lock);

parent->free_objects = 0;

parent->free_touched = 0;

}

Nothing special here; most of it simply sets members to 0.

1.1.3.2 cache_cache

This variable is defined in mm/slab.c and is used only in that file:

/* internal cache of cache description objs */

static struct kmem_cache cache_cache = {

.batchcount = 1,

.limit = BOOT_CPUCACHE_ENTRIES,

.shared = 1,

.buffer_size = sizeof(struct kmem_cache),

.name = "kmem_cache",

};

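This variable represents the cache of caches. In the kernel, the slab algorithm uses a linked list of kmem_cache structures to represent the list of caches, yet the kmem_cache structures themselves must also be allocated by the slab algorithm. The kernel therefore dedicates the cache_cache variable to this cache. It is also the first node on the cache_chain list.

1.1.3.3 malloc_sizes and cache_names

A kmem_cache represents a cache from which objects are allocated. For efficiency, however, slab does not create a separate cache for every possible object size. Instead it uses an array of caches of predetermined sizes, malloc_sizes, and the names of these caches are kept in the cache_names array: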

/* Size description struct for general caches. */

struct cache_sizes {

size_t cs_size;

struct kmem_cache *cs_cachep;

struct kmem_cache *cs_dmacachep;

};

struct cache_sizes malloc_sizes[] = {

#define CACHE(x) { .cs_size = (x) },

#include <linux/kmalloc_sizes.h>

CACHE(ULONG_MAX)

#undef CACHE

};

Opening kmalloc_sizes.h shows definitions like these:

CACHE(32)

CACHE(64)

CACHE(96)

CACHE(128)

CACHE(192)

CACHE(256)

CACHE(512)

CACHE(1024)

CACHE(2048)

CACHE(4096)

CACHE(8192)

CACHE(16384)

CACHE(32768)

CACHE(65536)

CACHE(131072)

After expansion, the numbers in these macros turn out to be the cache sizes.

The cache_names array is defined as:

/* Must match cache_sizes above. Out of line to keep cache footprint low. */

struct cache_names {

char *name;

char *name_dma;

};

static struct cache_names __initdata cache_names[] = {

#define CACHE(x) { .name = "size-" #x, .name_dma = "size-" #x "(DMA)" },

#include <linux/kmalloc_sizes.h>

{NULL,}

#undef CACHE

};

After expansion, the names it defines match the caches in malloc_sizes.
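The same "one list, several expansions" technique can be tried outside the kernel. In the sketch below DEMO_SIZES is a shortened, hypothetical stand-in for kmalloc_sizes.h, and demo_find_size is my own helper showing how a request could be matched to the first size class large enough to hold it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, shortened stand-in for <linux/kmalloc_sizes.h>. */
#define DEMO_SIZES(CACHE) \
    CACHE(32) CACHE(64) CACHE(96) CACHE(128) CACHE(192) CACHE(256)

struct demo_cache_size {
    size_t cs_size;
    const char *name;
};

static const struct demo_cache_size demo_sizes[] = {
#define DEMO_ENTRY(x) { (x), "size-" #x },
    DEMO_SIZES(DEMO_ENTRY)
#undef DEMO_ENTRY
    { SIZE_MAX, NULL }  /* sentinel, in the spirit of CACHE(ULONG_MAX) */
};

/* First size class that can hold `size` bytes, or NULL if none. */
static const struct demo_cache_size *demo_find_size(size_t size)
{
    const struct demo_cache_size *cs = demo_sizes;

    while (size > cs->cs_size)
        cs++;                       /* the sentinel always stops the scan */
    assert(cs->cs_size >= size);
    return cs->name ? cs : NULL;
}
```

A request for 33 bytes lands in the 64-byte class and a request for 100 bytes in the 128-byte class; the unused remainder is the price paid for having only a handful of caches.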

1.1.3.4 cache_chain

This variable is defined in mm/slab.c and is used only in that file:

/*
 * 1. Guard access to the cache-chain.
 * 2. Protect sanity of cpu_online_map against cpu hotplug events
 */

static DEFINE_MUTEX(cache_chain_mutex);

static struct list_head cache_chain;

As described in the basic idea of slab, this global variable links all kmem_cache structures together; it is the head of that doubly linked list.
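For readers who have not met the kernel's intrusive lists, here is a small user-space imitation. All demo_ names are mine; the behaviour of demo_list_add (insert right after the head) and the container_of arithmetic are modelled on the kernel's list_add and container_of, in simplified form.

```c
#include <assert.h>
#include <stddef.h>

struct demo_list_head { struct demo_list_head *next, *prev; };

struct demo_kmem_cache {
    const char *name;
    struct demo_list_head next;   /* link into the chain */
};

/* Recover the enclosing structure from a pointer to its list member. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void demo_list_init(struct demo_list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert right after head, as list_add() does. */
static void demo_list_add(struct demo_list_head *item,
                          struct demo_list_head *head)
{
    assert(item != NULL && head != NULL);
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/* Count the caches on the chain by walking from the head. */
static unsigned int demo_chain_len(const struct demo_list_head *head)
{
    unsigned int n = 0;

    for (const struct demo_list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

The head itself is not embedded in any cache; it is a bare list_head, which is why cache_chain is declared as a plain struct list_head.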

1.1.3.5 initarray_cache

This variable is defined in mm/slab.c and is used only in that file:

static struct arraycache_init initarray_cache __initdata =

{ {0, BOOT_CPUCACHE_ENTRIES, 1, 0} };

The arraycache_init structure is defined as:

struct arraycache_init {

struct array_cache cache;

void *entries[BOOT_CPUCACHE_ENTRIES];

};

So the initial values of its cache member are:

unsigned int avail = 0;

unsigned int limit = BOOT_CPUCACHE_ENTRIES = 1;

unsigned int batchcount = 1;

unsigned int touched = 0;

spinlock_t lock;

void *entry[0];

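1.1.4 Initialization: kmem_cache_init

This initialization function is in mm/slab.c. Its implementation is fairly long, but it can be divided roughly into the following parts:

1.1.4.1 Initializing global variables

if (num_possible_nodes() == 1) // always true here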

use_alien_caches = 0;

for (i = 0; i < NUM_INIT_LISTS; i++) {

kmem_list3_init(&initkmem_list3[i]);

if (i < MAX_NUMNODES)

cache_cache.nodelists[i] = NULL;

}

/*
 * Fragmentation resistance on low memory - only use bigger
 * page orders on machines with more than 32MB of memory.
 */

if (num_physpages > (32 << 20) >> PAGE_SHIFT)

slab_break_gfp_order = BREAK_GFP_ORDER_HI;

Here we have

#define NUM_INIT_LISTS (2 * MAX_NUMNODES + 1)

#define MAX_NUMNODES (1 << NODES_SHIFT)

#define NODES_SHIFT 0

So MAX_NUMNODES is 1 and NUM_INIT_LISTS is 3.

The global variable num_physpages is the total number of pages in SDRAM.

1.1.4.2 What this function has to do

/* Bootstrap is tricky, because several objects are allocated

* from caches that do not exist yet:

* 1) initialize the cache_cache cache: it contains the struct

* kmem_cache structures of all caches, except cache_cache itself:

* cache_cache is statically allocated.

* Initially an __init data area is used for the head array and the

* kmem_list3 structures, it's replaced with a kmalloc allocated

* array at the end of the bootstrap.

* 2) Create the first kmalloc cache.

* The struct kmem_cache for the new cache is allocated normally.

* An __init data area is used for the head array.

* 3) Create the remaining kmalloc caches, with minimally sized

* head arrays.

* 4) Replace the __init data head arrays for cache_cache and the first

* kmalloc cache with kmalloc allocated arrays.

* 5) Replace the __init data for kmem_list3 for cache_cache and

* the other cache's with kmalloc allocated memory.

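 * 6) Resize the head arrays of the kmalloc caches to their final sizes.
 */

node = numa_node_id();

This comment already states quite clearly what the initialization function has to do.

Here numa_node_id is always 0.

1.1.4.3 Initializing cache_cache

When cache_cache is defined, some of its members are already initialized: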

static struct kmem_cache cache_cache = {

.batchcount = 1,

.limit = BOOT_CPUCACHE_ENTRIES,

.shared = 1,

.buffer_size = sizeof(struct kmem_cache),

.name = "kmem_cache",

};

This function then initializes some more members:

/* 1) create the cache_cache */

INIT_LIST_HEAD(&cache_chain);

list_add(&cache_cache.next, &cache_chain);

cache_cache.colour_off = cache_line_size();

cache_cache.array[smp_processor_id()] = &initarray_cache.cache;

cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE];

/*
 * struct kmem_cache size depends on nr_node_ids, which
 * can be less than MAX_NUMNODES.
 */

cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +

nr_node_ids * sizeof(struct kmem_list3 *);

#if DEBUG

// cache_cache.obj_size = cache_cache.buffer_size;

WARN();

#endif

cache_cache.buffer_size = ALIGN(cache_cache.buffer_size,

cache_line_size());

cache_cache.reciprocal_buffer_size =

reciprocal_value(cache_cache.buffer_size);

for (order = 0; order < MAX_ORDER; order++) {

cache_estimate(order, cache_cache.buffer_size,

cache_line_size(), 0, &left_over, &cache_cache.num);

if (cache_cache.num)

break;

}

BUG_ON(!cache_cache.num);

cache_cache.gfporder = order;

cache_cache.colour = left_over / cache_cache.colour_off;

cache_cache.slab_size = ALIGN(cache_cache.num * sizeof(kmem_bufctl_t) +

sizeof(struct slab), cache_line_size());

1. Here we have

#define cache_line_size() L1_CACHE_BYTES

#define L1_CACHE_SHIFT 5

#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)

that is, 32 bytes, so cache_cache.colour_off is 32.

2. As can be seen, the global cache_cache has the honor of being the first member of the cache_chain list.

3. After alignment, cache_cache.buffer_size becomes 0x60.
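The rounding performed by ALIGN can be reproduced with a few lines of C. demo_align is a hypothetical helper of mine that follows the usual power-of-two rounding idiom; the text does not give the unaligned size, so any input from 0x41 to 0x60 stands in for it here.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a; a must be a power of two. */
static size_t demo_align(size_t x, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}
```

For example, demo_align(0x54, 32) gives 0x60, and a value that is already a multiple of 32, such as 0x60 itself, is left unchanged.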

4. cache_cache.reciprocal_buffer_size works out to 0x02aaaaab. The calculation is done by reciprocal_value:

u32 reciprocal_value(u32 k)

{

u64 val = (1LL << 32) + (k - 1);

do_div(val, k);

return (u32)val;

}

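Since buffer_size is the size of the kmem_cache structure, this value is 2^32 / 0x60 rounded up: (1LL << 32) is the fixed-point scale factor, and adding (k - 1) before dividing rounds the quotient up. A later division by buffer_size can then be carried out as a multiplication by this value followed by a right shift of 32 bits.

1.1.4.4 Creating the slab caches

The code is as follows: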

/* 2+3) create the kmalloc caches */

sizes = malloc_sizes;

names = cache_names;

/*
 * Initialize the caches that provide memory for the array cache and the
 * kmem_list3 structures first. Without this, further allocations will
 * bug.
 */

sizes[INDEX_AC].cs_cachep = kmem_cache_create(names[INDEX_AC].name,

sizes[INDEX_AC].cs_size,

ARCH_KMALLOC_MINALIGN,

ARCH_KMALLOC_FLAGS|SLAB_PANIC,

NULL, NULL);

if (INDEX_AC != INDEX_L3) {

sizes[INDEX_L3].cs_cachep =

kmem_cache_create(names[INDEX_L3].name,

sizes[INDEX_L3].cs_size,

ARCH_KMALLOC_MINALIGN,

ARCH_KMALLOC_FLAGS|SLAB_PANIC,

NULL, NULL);

}

slab_early_init = 0;

while (sizes->cs_size != ULONG_MAX) {

/*
 * For performance, all the general caches are L1 aligned.
 * This should be particularly beneficial on SMP boxes, as it
 * eliminates "false sharing".
 * Note for systems short on memory removing the alignment will
 * allow tighter packing of the smaller caches.
 */

if (!sizes->cs_cachep) {

sizes->cs_cachep = kmem_cache_create(names->name,

sizes->cs_size,

ARCH_KMALLOC_MINALIGN,

ARCH_KMALLOC_FLAGS|SLAB_PANIC,

NULL, NULL);

}

#ifdef CONFIG_ZONE_DMA

sizes->cs_dmacachep = kmem_cache_create(

names->name_dma,

sizes->cs_size,

ARCH_KMALLOC_MINALIGN,

ARCH_KMALLOC_FLAGS|SLAB_CACHE_DMA|

SLAB_PANIC,

NULL, NULL);

#endif

sizes++;

names++;

}

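The process is fairly simple: it creates a cache for each entry of malloc_sizes.

1.1.4.5 Replacing array

Once the kmem_cache list has been initialized, kmalloc can be used to allocate objects from it. Initially the array member of a kmem_cache points to a static bootstrap array (initarray_cache.cache in the case of cache_cache). But each kmem_cache must have its own array when allocating objects, so the following code allocates a new array_cache: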

/* 4) Replace the bootstrap head arrays */

{

struct array_cache *ptr;

ptr = kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);

local_irq_disable();

BUG_ON(cpu_cache_get(&cache_cache) != &initarray_cache.cache);

memcpy(ptr, cpu_cache_get(&cache_cache),

sizeof(struct arraycache_init));

/*
 * Do not assume that spinlocks can be initialized via memcpy:
 */

spin_lock_init(&ptr->lock);

cache_cache.array[smp_processor_id()] = ptr;

local_irq_enable();

ptr = kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);

local_irq_disable();

BUG_ON(cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep)

!= &initarray_generic.cache);

memcpy(ptr, cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep),

sizeof(struct arraycache_init));

/*
 * Do not assume that spinlocks can be initialized via memcpy:
 */

spin_lock_init(&ptr->lock);

malloc_sizes[INDEX_AC].cs_cachep->array[smp_processor_id()] =

ptr;

local_irq_enable();

}

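The code is fairly simple.

1.1.4.6 Replacing nodelists

Initially the nodelists member of a kmem_cache points into initkmem_list3. Here a new kmem_list3 structure is allocated and nodelists is pointed at the newly allocated memory: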

/* 5) Replace the bootstrap kmem_list3's */

{

int nid;

/* Replace the static kmem_list3 structures for the boot cpu */

init_list(&cache_cache, &initkmem_list3[CACHE_CACHE], node);

for_each_online_node(nid) {

init_list(malloc_sizes[INDEX_AC].cs_cachep,

&initkmem_list3[SIZE_AC + nid], nid);

if (INDEX_AC != INDEX_L3) {

init_list(malloc_sizes[INDEX_L3].cs_cachep,

&initkmem_list3[SIZE_L3 + nid], nid);

}

}

}

Because NUMA is not used, for_each_online_node executes only once, with nid equal to 0.

Here init_list is implemented as:

/*
 * swap the static kmem_list3 with kmalloced memory
 */

static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,

int nodeid)

{

struct kmem_list3 *ptr;

ptr = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, nodeid);

BUG_ON(!ptr);

local_irq_disable();

memcpy(ptr, list, sizeof(struct kmem_list3));

/*
 * Do not assume that spinlocks can be initialized via memcpy:
 */

spin_lock_init(&ptr->list_lock);

MAKE_ALL_LISTS(cachep, ptr, nodeid);

cachep->nodelists[nodeid] = ptr;

local_irq_enable();

}
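One detail of init_list is easy to miss: after the memcpy, the copied list heads still contain pointers that refer to the old static structure, which is why the lists are rebuilt afterwards (MAKE_ALL_LISTS in the code above). The following user-space sketch illustrates the problem and one way to repair it. All demo_ names are invented, malloc stands in for kmalloc_node, and the repair routine is my own simplification rather than the kernel macro.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_node { struct demo_node *next, *prev; };

struct demo_lists {
    struct demo_node slabs;     /* circular list head */
    unsigned long free_objects;
};

/* Move every element of the list at `from` onto the head `to`. */
static void demo_rehome(struct demo_node *from, struct demo_node *to)
{
    to->next = to;
    to->prev = to;
    if (from->next == from)
        return;                  /* old list was empty */
    to->next = from->next;
    to->prev = from->prev;
    to->next->prev = to;         /* first element now points at new head */
    to->prev->next = to;         /* last element now points at new head */
    from->next = from;
    from->prev = from;
}

/* Copy a static bootstrap structure to the heap and repair its list head. */
static struct demo_lists *demo_init_list(struct demo_lists *boot)
{
    struct demo_lists *ptr = malloc(sizeof(*ptr));

    assert(ptr != NULL);
    memcpy(ptr, boot, sizeof(*ptr));    /* copies stale next/prev pointers */
    demo_rehome(&boot->slabs, &ptr->slabs);
    return ptr;
}
```

Plain data such as free_objects survives the memcpy unchanged; only self-referential members (list heads, and in the kernel the spinlock) need this kind of extra care.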

1.1.5 Determining how many objects a block of pages can hold

mm/slab.c contains a function named cache_estimate:

/*
 * Calculate the number of objects and left-over bytes for a given buffer size.
 */

static void cache_estimate(unsigned long gfporder, size_t buffer_size,

size_t align, int flags, size_t *left_over,

unsigned int *num)

{

int nr_objs;

size_t mgmt_size;

size_t slab_size = PAGE_SIZE << gfporder;

/*
 * The slab management structure can be either off the slab or
 * on it. For the latter case, the memory allocated for a
 * slab is used for:
 *
 * - The struct slab
 * - One kmem_bufctl_t for each object
 * - Padding to respect alignment of @align
 * - @buffer_size bytes for each object
 *
 * If the slab management structure is off the slab, then the
 * alignment will already be calculated into the size. Because
 * the slabs are all pages aligned, the objects will be at the
 * correct alignment when allocated.
 */

if (flags & CFLGS_OFF_SLAB) {

mgmt_size = 0;

nr_objs = slab_size / buffer_size;

if (nr_objs > SLAB_LIMIT)

nr_objs = SLAB_LIMIT;

} else {

/*
 * Ignore padding for the initial guess. The padding
 * is at most @align-1 bytes, and @buffer_size is at
 * least @align. In the worst case, this result will
 * be one greater than the number of objects that fit
 * into the memory allocation when taking the padding
 * into account.
 */

nr_objs = (slab_size - sizeof(struct slab)) /

(buffer_size + sizeof(kmem_bufctl_t));

/*
 * This calculated number will be either the right
 * amount, or one greater than what we want.
 */

if (slab_mgmt_size(nr_objs, align) + nr_objs*buffer_size

> slab_size)

nr_objs--;

if (nr_objs > SLAB_LIMIT)

nr_objs = SLAB_LIMIT;

mgmt_size = slab_mgmt_size(nr_objs, align);

}

*num = nr_objs;

*left_over = slab_size - nr_objs*buffer_size - mgmt_size;

}

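This function calculates how many objects a block of pages of a given size can hold, and how much space is left over.

Because the slab algorithm ultimately obtains its memory from the buddy algorithm, an order parameter is passed in here; the slab algorithm then carves small objects out of this large block of memory.

As the comments in the function show, the kernel treats this large block as one slab and describes it with a structure named slab. For each small object allocated in the block, a variable of type kmem_bufctl_t records information about the object. This extra information, the slab structure and the kmem_bufctl_t entries, can be stored in one of two ways: directly inside the large block, or in separately allocated space. The CFLGS_OFF_SLAB flag selects between them.

Here the slab_mgmt_size function calculates the size of the management area from the number of objects and aligns it to align, which is the cache line size in the call shown earlier.

The left_over parameter returns the number of bytes left at the end of the block after the objects have been laid out.

num returns the number of objects the block can hold.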
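To see the numbers this produces, the on-slab branch can be replayed in user space. The sketch below is a simplified port under stated assumptions: 4 KiB pages, a 28-byte struct slab and a 4-byte kmem_bufctl_t (plausible for a 32-bit target, but not taken from the text), and no SLAB_LIMIT clamp. All demo_ names are mine.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE   4096u  /* assumption: 4 KiB pages */
#define DEMO_SLAB_HDR    28u    /* hypothetical sizeof(struct slab) */
#define DEMO_BUFCTL_SIZE 4u     /* hypothetical sizeof(kmem_bufctl_t) */

static size_t demo_align_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Header plus one index per object, rounded up to the alignment. */
static size_t demo_mgmt_size(size_t nr_objs, size_t align)
{
    return demo_align_up(DEMO_SLAB_HDR + nr_objs * DEMO_BUFCTL_SIZE, align);
}

/* On-slab case only: how many objects fit, and how many bytes remain. */
static void demo_estimate(unsigned int order, size_t buffer_size, size_t align,
                          size_t *left_over, unsigned int *num)
{
    size_t slab_size = (size_t)DEMO_PAGE_SIZE << order;
    size_t nr_objs;

    assert(buffer_size >= align && (align & (align - 1)) == 0);
    /* First guess ignores padding, so it may be one too many. */
    nr_objs = (slab_size - DEMO_SLAB_HDR) / (buffer_size + DEMO_BUFCTL_SIZE);
    if (nr_objs > 0 &&
        demo_mgmt_size(nr_objs, align) + nr_objs * buffer_size > slab_size)
        nr_objs--;
    *num = (unsigned int)nr_objs;
    *left_over = slab_size - nr_objs * buffer_size
                 - demo_mgmt_size(nr_objs, align);
}
```

Under these assumptions, one page with 96-byte objects and 32-byte alignment holds 40 objects with a 192-byte management area and 64 bytes left over, which would give a colour of 64 / 32 = 2.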

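1.1.6 Creating a cache: kmem_cache_create

This function is in mm/slab.c. It mainly does the following:

1. Adjusts the parameters passed in.

2. Calls kmem_cache_zalloc to allocate a kmem_cache structure.

3. Initializes the members of the kmem_cache structure.

1.1.7 Object allocation: kmem_cache_zalloc

This function allocates an object from a given cache. It takes a pointer to a kmem_cache as a parameter and requires that the structure already be fully initialized: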

/**

* kmem_cache_zalloc - Allocate an object. The memory is set to zero.

* @cache: The cache to allocate from.

* @flags: See kmalloc().

* Allocate an object from this cache and set the allocated memory to zero.

 * The flags are only relevant if the cache has no available objects.
 */

void *kmem_cache_zalloc(struct kmem_cache *cache, gfp_t flags)

{

void *ret = __cache_alloc(cache, flags, __builtin_return_address(0));

if (ret)

memset(ret, 0, obj_size(cache));

return ret;

}

Next, tracing into __cache_alloc:

static __always_inline void *

__cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)

{

unsigned long save_flags;

void *objp;

if (should_failslab(cachep, flags)) // returns immediately

return NULL;

cache_alloc_debugcheck_before(cachep, flags); // empty statement on a non-preemptive kernel

local_irq_save(save_flags);

objp = __do_cache_alloc(cachep, flags);

local_irq_restore(save_flags);

objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller); // empty statement, returns objp directly

prefetchw(objp); // empty statement

return objp;

}

Continuing into __do_cache_alloc:

static __always_inline void *

__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)

{

return ____cache_alloc(cachep, flags);

}

Continuing into ____cache_alloc:

static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)

{

void *objp;

struct array_cache *ac;

check_irq_off(); // empty statement

ac = cpu_cache_get(cachep); // returns cachep->array[smp_processor_id()]

if (likely(ac->avail)) {

STATS_INC_ALLOCHIT(cachep); // empty statement

ac->touched = 1;

objp = ac->entry[--ac->avail];

} else {

STATS_INC_ALLOCMISS(cachep); // empty statement

objp = cache_alloc_refill(cachep, flags);

}

return objp;

}

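It looks simple enough. If the array_cache still has objects available, one is taken directly from it. If it has none, things get a little more involved: cache_alloc_refill has to refill the array from the cache's slabs, allocating a new large block of memory first if necessary, and then an object is taken from there.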
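The fast-path/slow-path split can be tried out with a toy model. In the sketch below a static array stands in for the slab lists and the refill routine is a deliberately naive placeholder for cache_alloc_refill; every demo_ name is hypothetical and nothing here should be read as the kernel's refill logic.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_AC_LIMIT 4
#define DEMO_BACKING  8

struct demo_cpu_cache {
    unsigned int avail;             /* valid pointers in entry[] */
    unsigned int batchcount;        /* objects fetched per refill */
    void *entry[DEMO_AC_LIMIT];
};

static char demo_backing[DEMO_BACKING][32]; /* stands in for the slab lists */
static unsigned int demo_backing_used;
static unsigned int demo_refills;           /* counts slow-path calls */

/* Slow path: pull up to batchcount objects into the array, return one. */
static void *demo_refill(struct demo_cpu_cache *ac)
{
    demo_refills++;
    while (ac->avail < ac->batchcount && demo_backing_used < DEMO_BACKING)
        ac->entry[ac->avail++] = demo_backing[demo_backing_used++];
    return ac->avail ? ac->entry[--ac->avail] : NULL;
}

/* Fast path first, mirroring the shape of ____cache_alloc() above. */
static void *demo_cache_alloc(struct demo_cpu_cache *ac)
{
    assert(ac != NULL && ac->batchcount <= DEMO_AC_LIMIT);
    if (ac->avail)
        return ac->entry[--ac->avail];   /* hit: no list work at all */
    return demo_refill(ac);              /* miss: go to the backing store */
}
```

With batchcount set to 2, every second allocation is served from the array without touching the backing store, which shows in miniature why the per-CPU array reduces list and lock operations.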