A discussion of the slab allocator with annotated source code


Hi tommytang,
This is a document I wrote a few days ago on the mailing list of an operating-systems study group. It is not necessarily correct, but I hope it helps :-)


A discussion of the slab allocator with annotated source code; corrections welcome (lucian_museful@sina.com):

Notes:
1. Reference: UNIX Internals: The New Frontiers, Uresh Vahalia (Chinese edition, Tsinghua University Press)
2. Source code: slab.c from 2.4.0 (possibly test10, I don't remember exactly)

3. I suggest reading this in UltraEdit or a C editor with syntax highlighting; it will be clearer
4. The code in this document is a subset of the original; the debugging/statistics code (I believe) and the SMP code have been removed
5. The /*-- */ comments are mine
6. This document is released under the GPL; copyright Yao Hong (lucian_museful@sina.com)

--------------------------------------------------------------------------
1. The role of the slab allocator

Some books say that kmalloc is mainly used to allocate physically contiguous memory, but I don't think that is the essential point (in fact, physically contiguous memory can already be obtained through __get_free_pages).

Unlike the page allocator, the slab allocator manages small amounts of memory and is used to allocate data structures quickly.
It usually takes pages handed out by the page allocator and carves them up further into finer-grained allocations, a bit like malloc and free in C.

Its main job is managing data structures, which is more of a logical concern; the page allocator mostly deals with the hardware, so it works at page granularity.
There are also vmalloc and vfree; I used to think I understood them fairly well, but now I'm confused again, so I won't comment on them.
---------------------------------------------------------------------------

2. Why the slab algorithm is used to allocate small blocks of memory

First, the kernel needs to allocate small blocks (or blocks that need not be page aligned), so it needs an allocator that does fine-grained allocation inside pages, as mentioned above.

Since these small blocks vary in size, a common approach is to allocate in power-of-two sizes. Because many blocks are not exactly a power of two, this wastes space.
One way to avoid that is to keep blocks of the same size together, so the rounding waste disappears.

In fact, the kernel usually does more than allocate and free these blocks; it also initializes and destroys them, so the blocks are really objects. These objects have two properties:

Property 1: before an object's memory is reclaimed, the object is usually back in its initial state. For example, a list head about to be freed usually points to NULL or to itself, i.e. it is in its initialized state.
Property 2: the objects are initialized uniformly; initializing object A is no different from initializing object B.

The slab allocator exploits these two properties with the following scheme: when the user no longer needs an object, the allocator does not really destroy it; it simply marks it as a free object,
which is already in its initial state. When the user later asks for a new object, the allocator can hand back this free object without initializing it again.

(This is why, as soon as a cache allocates a slab, all the objects in it are initialized.)
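As a rough usage sketch of this reuse idea against the 2.4-era interface quoted later in this document (struct foo and foo_ctor are invented for the example; kmem_cache_create, kmem_cache_alloc and kmem_cache_free are the functions whose source appears below):

/* Hypothetical object type: the constructor puts a brand-new object
 * into its initial state exactly once, when its slab is created. */
struct foo {
    struct list_head link;
    int refcount;
};

static void foo_ctor(void *objp, kmem_cache_t *cachep, unsigned long flags)
{
    struct foo *f = objp;

    INIT_LIST_HEAD(&f->link);
    f->refcount = 0;
}

static kmem_cache_t *foo_cachep;

/* Typical life cycle (error handling omitted):
 *
 *   foo_cachep = kmem_cache_create("foo", sizeof(struct foo), 0,
 *                                  SLAB_HWCACHE_ALIGN, foo_ctor, NULL);
 *   f = kmem_cache_alloc(foo_cachep, SLAB_KERNEL);   // already initialized
 *   ...
 *   kmem_cache_free(foo_cachep, f);   // caller puts f back into its initial
 *                                     // state first; the ctor never reruns
 */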

--------------------------------------------------------------------------------

3. Caches and slabs in the slab allocator (this slab is not the same thing as the "slab" in "slab allocator"; the allocator merely borrows its name from this structure)

To manage these objects of the same size and the same nature, the slab allocator introduces the cache.
A cache manages objects of one size and one set of properties,
and it manages them through slabs. Diagram:
cache
------------------------------------
|                                  |
|       ----------------------     |
|  slab | obj | obj | obj |        |
|       ----------------------     |
|       ----------------------     |
|  slab | obj | obj | obj |        |
|       ----------------------     |
------------------------------------

A slab is the bridge between a cache's objects and the page allocator. When a cache needs to hand out an object but has no slab with a free object, it asks the
page allocator for one (or more) pages and uses a slab to subdivide those pages into objects.
So a slab can essentially be thought of as one (or a few) pages.

---------------------------------------------------------------------------------

4. Managing ordinary objects

Many kinds of objects exist only in small numbers and have no constructor or destructor, so they do not need a cache of their own.
The slab allocator therefore provides several general-purpose caches (general caches) to manage them.
Each of these caches holds objects of one power-of-two size.
When the user allocates from them, the slab allocator picks the cache whose size fits best.
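A minimal sketch of this "best fit" lookup (it simply mirrors the loops in kmalloc() and kmem_find_general_cachep() quoted later in this document; pick_general_cache is an invented name):

/* Walk the ascending size table and stop at the first general cache
 * that is large enough for the request. */
static kmem_cache_t *pick_general_cache(size_t size, int gfpflags)
{
    cache_sizes_t *csizep = cache_sizes;   /* 32, 64, 128, ... bytes */

    for (; csizep->cs_size; csizep++) {
        if (size > csizep->cs_size)
            continue;                      /* this cache is too small */
        break;                             /* first one that fits */
    }
    return (gfpflags & GFP_DMA) ? csizep->cs_dmacachep : csizep->cs_cachep;
}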

---------------------------------------------------------------------------------

5. cache_cache

Every cache is managed by a cache descriptor (kmem_cache_t). These descriptors can themselves be viewed as objects, so naturally there should be a cache to manage them;
this gives rise to the so-called cache_cache. But then the descriptor of cache_cache is itself an object of the same kind; should it too be managed by a cache?
This leads to a kind of logical self-reference, and to a chicken-and-egg problem at initialization, so cache_cache itself is simply a static global variable.

-----------------------------------------------------------------------------------------

6. How a slab manages its objects

(1) Managing free objects

Each slab has a kmem_bufctl_t array with one entry per object; each entry holds the index of the entry for the next free object, so the entries form a chain of indices,
one per free object.

For example, suppose objects 3, 8 and 12 of a slab are free (inactive).
Then the slab descriptor slab_s has a field holding 3, entry 3 of the array holds 8, entry 8 holds 12, and entry 12 holds an end-of-chain marker.

This lets the slab find a free object quickly.
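A small stand-alone sketch of this index chain (plain user-space C with made-up names; BUFCTL_END plays the role of the end marker used in the kernel source below):

#include <stdio.h>

#define NOBJ        16
#define BUFCTL_END  (~0u)

static unsigned int bufctl[NOBJ];   /* bufctl[i]: index of the next free object after i */
static unsigned int free_head;      /* plays the role of slab_s.free */

int main(void)
{
    unsigned int i;

    /* A slab in which objects 3, 8 and 12 are free. */
    free_head = 3;
    bufctl[3] = 8;
    bufctl[8] = 12;
    bufctl[12] = BUFCTL_END;

    /* Allocation pops the head of the chain, just as
     * kmem_cache_alloc_one_tail() advances slabp->free. */
    for (i = free_head; i != BUFCTL_END; i = bufctl[i])
        printf("next free object: %u\n", i);
    return 0;
}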

(2) The structure of a slab

The slab's management part (the slab_s descriptor and the kmem_bufctl_t array mentioned above) may live inside the slab together with the objects, or outside it (CFLGS_OFF_SLAB)
(in which case it naturally goes into a general cache; elegant :-) ), depending on which saves more space.

A slab is laid out roughly as follows:

colour area (discussed later) | management part (possibly kept off-slab) | the objects

-----------------------------------------------------------------------------------------

7. How a cache manages its slabs

A cache chains its slabs into one list: the fully used slabs come first, the partially used slabs in the middle, and the completely free slabs at the end. This ordering supports
the allocation rule below and the cache's reap/shrink operations (discussed later).

When allocating an object, a cache first considers the partially used slabs; only if there is none does it take a completely free slab, and only after that does it ask the page allocator for pages to build a new slab.
This constraint is necessary; imagine the extreme case without it, in which every slab holds only one allocated object, wasting a great deal of space.

To preserve this ordering, the list is adjusted all the time (a condensed sketch follows the two cases below):

When an object is freed from a full slab, that slab is moved to the head of the partially used part of the list.
When an object is freed from a partially used slab and the slab becomes completely free, it is moved to the very end of the list.
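A condensed sketch of that reordering (it follows the moveslab_partial/moveslab_free paths of kmem_cache_free_one() quoted later; the wrapper name fixup_slab_chain and the was_full argument are invented for the illustration):

/* The cache keeps one list ordered as  full | partial | free, plus a
 * pointer firstnotfull to the first slab that still has a free object. */
static void fixup_slab_chain(kmem_cache_t *cachep, slab_t *slabp, int was_full)
{
    if (was_full) {
        /* A full slab just got one object back: it becomes the first
         * not-full slab, moving in front of the old firstnotfull. */
        struct list_head *t = cachep->firstnotfull;

        cachep->firstnotfull = &slabp->list;
        if (slabp->list.next != t) {
            list_del(&slabp->list);
            list_add_tail(&slabp->list, t);
        }
    } else if (!slabp->inuse) {
        /* A partial slab just became completely free: move it to the
         * tail, and advance firstnotfull if it pointed at this slab. */
        struct list_head *t = cachep->firstnotfull->prev;

        list_del(&slabp->list);
        list_add_tail(&slabp->list, &cachep->slabs);
        if (cachep->firstnotfull == &slabp->list)
            cachep->firstnotfull = t->next;
    }
}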

-----------------------------------------------------------------------------------------

8. How an object is allocated from a cache

First find a not-full slab in the cache (preferably a partially used one, otherwise a completely free one); the firstnotfull pointer designates it.
If there is none, ask the page allocator for one (or more) pages and build a new slab.

Then pick a free object inside that slab (as discussed above).

------------------------------------------------------------------------------------------

9. Reclaiming pages

__kmem_cache_shrink() reclaims the completely free slabs of one cache and hands their pages back to the page allocator; it can also be called explicitly.
kmem_cache_reap() is called from __get_free_pages(): when the page allocator finds no free physical page, it first
tries to get pages back from the slab allocator instead of swapping, presumably because swapping is much slower (my guess, not verified).

-------------------------------------------------------------------------------------

10. How the slab algorithm cooperates with the CPU's L1 CACHE (which has nothing to do with the slab allocator's own caches)

(I have not seen a really clear document on this, so some of what follows is my guess.)

(1) The CACHE has many cache lines; each line can transfer several bytes at a time, say 16 or 32.
For example, the first line might carry bytes 0-15 and the second bytes 16-31.

(2) To keep the objects in a slab from straddling too many lines, the slab code often aligns to the cache line (L1_CACHE_ALIGN).
This may not be all that critical, though: often only some fields of an object are touched, in which case straddling a line hardly matters. I'm not sure whether that reasoning is right.

(3) Colouring is explained in section 11.

--------------------------------------------------------------------------------------
11. Slab colouring

There are two assumptions and one premise (apart from the second, these are my guesses, especially the third; I have not found first-hand material, so I'm not sure my understanding is correct):

(1) Programs generally access objects in an interleaved way; that is, they do not access or modify object A's data 100 times and only then touch object B. Usually they
access/modify A, then B, then C... and then come back to A.

(2) Some fields of each object are accessed very frequently, others much less so.

(3) Which cache line an address can be loaded into is restricted. For example, if the CACHE is 16K, the first line can only exchange data with addresses 0-15, 16K-16K+15, 32K-32K+15;
the second line only with 16-31, 16K+16-16K+31, 32K+16-32K+31;
the second line can never hold bytes 0-15;
...


Under these assumptions and this premise the need for colouring probably arises. A simplified example:
slab 1 contains object A at addresses 0-100, with field c at 0-15
slab 2 contains object B at addresses 16K+0-100, with field c at 16K+0-16K+15

By (1) and (3), since A's field c and B's field c may be accessed alternately, the CACHE has to keep writing A's c out and loading B's c in (or the other way round) on the first cache line.

By (2) we assume field c is the most frequently accessed field (a reasonably general assumption).

Now stagger the layouts: move object B up by, say, 16 bytes, so that field c sits at 16K+16 - 16K+31 and therefore maps to the second cache line.
Since we assumed field c is accessed most frequently, the traffic on both lines drops, and the CPU runs more efficiently.


The slab colour is exactly this stagger. It can be broken down as follows (a small stand-alone sketch of step (3) follows the list):

(1) The objects plus the slab management part usually do not fill the whole slab; there is some leftover space.
(2) This leftover space bounds the maximum colour value.
(3) Each new slab takes the next colour value in round-robin fashion and shifts its contents by it.
(4) The shift is of course not in single bytes; it is in words, or aligned to the cache line.
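A tiny stand-alone sketch of step (3), following the colour_next logic of kmem_cache_grow() quoted later (plain user-space C; the numbers are made up):

#include <stdio.h>

int main(void)
{
    unsigned int colour_off  = 32;                       /* granularity, e.g. L1_CACHE_BYTES */
    unsigned int left_over   = 200;                      /* unused bytes in each slab */
    unsigned int colour      = left_over / colour_off;   /* number of colours: 6 */
    unsigned int colour_next = 0;
    int slab;

    /* Each new slab is shifted by colour_next * colour_off bytes,
     * cycling 0, 32, 64, ..., 160, 0, 32, ... */
    for (slab = 0; slab < 8; slab++) {
        unsigned int offset = colour_next * colour_off;
        if (++colour_next >= colour)
            colour_next = 0;
        printf("slab %d: colour offset %u bytes\n", slab, offset);
    }
    return 0;
}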

-----------------------------------------------------------------------------------------------------------------

Open issues:

1. A few points above are marked with question marks.
2. Thread synchronization, e.g. whether recursive calls or deadlocks can occur (the semaphore is used for that, isn't it?). I'm not familiar enough with this yet and hope to fill it in later.
3. SMP






typedef unsigned int kmem_bufctl_t;

/* Max number of objs-per-slab for caches which use off-slab slabs.
* Needed to avoid a possible looping condition in kmem_cache_grow().
*/
static unsigned long offslab_limit;

/*
* slab_t
*
* Manages the objs in a slab. Placed either at the beginning of mem allocated
* for a slab, or allocated from an general cache.
* Slabs are chained into one ordered list: fully used, partial, then fully
* free slabs.
*/
typedef struct slab_s {
struct list_head list; /*-- all the slabs of a cache form a doubly linked list; this is the list pointer */
unsigned long colouroff; /*-- this slab's colour offset */
void *s_mem; /* including colour offset *//*-- start address of the first object in the slab */
unsigned int inuse; /* num of objs active in slab *//*-- number of objects currently in use in this slab */
kmem_bufctl_t free; /*-- index of the entry for a free object, used when allocating a free object */
} slab_t;

/*-- start address of the per-object index array, which immediately follows the slab descriptor */

#define slab_bufctl(slabp) \
((kmem_bufctl_t *)(((slab_t*)slabp)+1))

/*
* cpucache_t
*
* Per cpu structures
* The limit is stored in the per-cpu structure to reduce the data cache
* footprint.
*/
typedef struct cpucache_s {
unsigned int avail;
unsigned int limit;
} cpucache_t;

#define cc_entry(cpucache) \
((void **)(((cpucache_t*)cpucache)+1))
#define cc_data(cachep) \
((cachep)->cpudata[smp_processor_id()])
/*
* kmem_cache_t
*
* manages a cache.
*/

#define CACHE_NAMELEN 20 /* max name length for a slab cache */

struct kmem_cache_s {
/* 1) each alloc & free */
/* full, partial first, then free */
struct list_head slabs; /*-- all the slabs of a cache form a doubly linked list; this is the list head in the cache */
struct list_head *firstnotfull; /*-- points to the first not-full slab; used when allocating an object */
unsigned int objsize; /*-- object size, usually a bit larger than the size the user asked for, because of alignment */
unsigned int flags; /* constant flags */ /*-- some attributes */
unsigned int num; /* # of objs per slab *//*-- number of objects each slab of this cache can hold */
spinlock_t spinlock;


/* 2) slab additions /removals */
/* order of pgs per slab (2^n) */
unsigned int gfporder; /*-- page order used by each slab (2^gfporder pages) */

/* force GFP flags, e.g. GFP_DMA */
unsigned int gfpflags;

size_t colour; /* cache colouring range */
unsigned int colour_off; /* colour offset */
unsigned int colour_next; /* cache colouring */ /*-- colour for the next slab */
kmem_cache_t *slabp_cache;
unsigned int growing;
unsigned int dflags; /* dynamic flags */

/* constructor func */
void (*ctor)(void *, kmem_cache_t *, unsigned long);

/* de-constructor func */
void (*dtor)(void *, kmem_cache_t *, unsigned long);

unsigned long failures;

/* 3) cache creation/removal */
char name[CACHE_NAMELEN];
struct list_head next;
};

/* internal c_flags */
#define CFLGS_OFF_SLAB 0x010000UL /* slab management in own cache */
#define CFLGS_OPTIMIZE 0x020000UL /* optimized slab lookup */

/* c_dflags (dynamic flags). Need to hold the spinlock to access this member */
#define DFLGS_GROWN 0x000001UL /* don't reap a recently grown */

#define OFF_SLAB(x) ((x)->flags & CFLGS_OFF_SLAB)
#define OPTIMIZE(x) ((x)->flags & CFLGS_OPTIMIZE)
#define GROWN(x) ((x)->dflags & DFLGS_GROWN)

#define STATS_INC_ACTIVE(x) do { } while (0)
#define STATS_DEC_ACTIVE(x) do { } while (0)
#define STATS_INC_ALLOCED(x) do { } while (0)
#define STATS_INC_GROWN(x) do { } while (0)
#define STATS_INC_REAPED(x) do { } while (0)
#define STATS_SET_HIGH(x) do { } while (0)
#define STATS_INC_ERR(x) do { } while (0)


/* maximum size of an obj (in 2^order pages) */
#define MAX_OBJ_ORDER 5 /* 32 pages */

/*
* Do not go above this order unless 0 objects fit into the slab.
*/
#define BREAK_GFP_ORDER_HI 2
#define BREAK_GFP_ORDER_LO 1
static int slab_break_gfp_order = BREAK_GFP_ORDER_LO;

/*
* Absolute limit for the gfp order
*/
#define MAX_GFP_ORDER 5 /* 32 pages */


/* Macros for storing/retrieving the cachep and or slab from the
* global 'mem_map'. These are used to find the slab an obj belongs to.
* With kfree(), these are used to find the cache which an obj belongs to.
*/
#define SET_PAGE_CACHE(pg,x) ((pg)->list.next = (struct list_head *)(x))
#define GET_PAGE_CACHE(pg) ((kmem_cache_t *)(pg)->list.next)
#define SET_PAGE_SLAB(pg,x) ((pg)->list.prev = (struct list_head *)(x))
#define GET_PAGE_SLAB(pg) ((slab_t *)(pg)->list.prev)

/* Size description struct for general caches. */
/*-- the following manages the general caches; each size has two cache pointers, one for DMA memory and one not */

typedef struct cache_sizes {
size_t cs_size;
kmem_cache_t *cs_cachep;
kmem_cache_t *cs_dmacachep;
} cache_sizes_t;


static cache_sizes_t cache_sizes[] = {
#if PAGE_SIZE == 4096
{ 32, NULL, NULL},
#endif
{ 64, NULL, NULL},
{ 128, NULL, NULL},
{ 256, NULL, NULL},
{ 512, NULL, NULL},
{ 1024, NULL, NULL},
{ 2048, NULL, NULL},
{ 4096, NULL, NULL},
{ 8192, NULL, NULL},
{ 16384, NULL, NULL},
{ 32768, NULL, NULL},
{ 65536, NULL, NULL},
{131072, NULL, NULL},
{ 0, NULL, NULL}
};

/* internal cache of cache description objs */
static kmem_cache_t cache_cache = {
slabs: LIST_HEAD_INIT(cache_cache.slabs),
firstnotfull: &cache_cache.slabs,
objsize: sizeof(kmem_cache_t),
flags: SLAB_NO_REAP,
spinlock: SPIN_LOCK_UNLOCKED,
colour_off: L1_CACHE_BYTES,
name: "kmem_cache",
};

/* Guard access to the cache-chain. */
static struct semaphore cache_chain_sem;

/* Place maintainer for reaping. */
static kmem_cache_t *clock_searchp = &cache_cache;

#define cache_chain (cache_cache.next)


/* Cal the num objs, wastage, and bytes left over for a given slab size. */

/*-- Compute how many objects one slab can hold.
gfporder : a slab uses 2^gfporder pages, i.e.
when gfporder is 0 the slab needs 1 page, when it is 1 it needs 2 pages, and so on
size : number of bytes one object occupies
flags : cache attributes; what matters here is whether the slab management part must
live inside the slab; if it does, the slab may hold fewer objects
left_over: the objects (and the management part) usually do not fill the pages exactly;
on return *left_over holds the leftover size, later used for colouring
num : on return *num holds the number of objects this slab can hold


*/

static void kmem_cache_estimate (unsigned long gfporder, size_t size,
int flags, size_t *left_over, unsigned int *num)
{
int i;
/*-- wastage is the space left in a slab after the management part and the objects; it is returned in left_over and used for colouring */

size_t wastage = PAGE_SIZE<<gfporder;
size_t extra = 0;
size_t base = 0;

/*-- the slab management part (the slab_t descriptor and the free-object index array) may or may not live inside the slab;
if it is kept off-slab (CFLGS_OFF_SLAB), then base = 0 and extra = 0 */


if (!(flags & CFLGS_OFF_SLAB)) {
base = sizeof(slab_t);
extra = sizeof(kmem_bufctl_t);
}
i = 0;

/*-- L1_CACHE_ALIGN rounds up to L1 CACHE alignment.
The L1 CACHE has many lines (cache lines); each line can transfer several bytes at a time, e.g. 16 or 32.
Aligning to the cache line keeps objects from occupying too many lines; whether this really speeds things up I'm not sure?? */

while (i*size + L1_CACHE_ALIGN(base+i*extra) <= wastage)
i++;
if (i > 0)
i--;

if (i > SLAB_LIMIT)
i = SLAB_LIMIT;

*num = i;
wastage -= i*size;
wastage -= L1_CACHE_ALIGN(base+i*extra);
*left_over = wastage;
}

/* Initialisation - setup the `cache' cache. */

void __init kmem_cache_init(void)
{
size_t left_over;

init_MUTEX(&cache_chain_sem);

INIT_LIST_HEAD(&cache_chain);

/*-- work out how many kmem_cache_t objects one slab of cache_cache can hold */

kmem_cache_estimate(0, cache_cache.objsize, 0,
&left_over, &cache_cache.num);
if (!cache_cache.num)
BUG();

/*-- compute the maximum colour value and the colour of the first slab; colour_off is the colour granularity */

cache_cache.colour = left_over/cache_cache.colour_off;
cache_cache.colour_next = 0;
}


/* Initialisation - setup remaining internal and general caches.
* Called after the gfp() functions have been enabled, and before smp_init().
*/

/*-- initialization of the general caches: for every object size in cache_sizes, create two caches,
one for DMA memory and one not */

void __init kmem_cache_sizes_init(void)
{
cache_sizes_t *sizes = cache_sizes;
char name[20];
/*
* Fragmentation resistance on low memory - only use bigger
* page orders on machines with more than 32MB of memory.
*/
/*-- ?? not sure what BREAK_GFP_ORDER_HI means; it should be related to the page allocator */

if (num_physpages > (32 << 20) >> PAGE_SHIFT)
slab_break_gfp_order = BREAK_GFP_ORDER_HI;
do {
/* For performance, all the general caches are L1 aligned.
* This should be particularly beneficial on SMP boxes, as it
* eliminates "false sharing".
* Note for systems short on memory removing the alignment will
* allow tighter packing of the smaller caches. */
sprintf(name,"size-%Zd",sizes->cs_size);

/*-- create the non-DMA cache */

if (!(sizes->cs_cachep =
kmem_cache_create(name, sizes->cs_size,
0, SLAB_HWCACHE_ALIGN, NULL, NULL))) {
BUG();
}

/* Inc off-slab bufctl limit until the ceiling is hit. */

/*-- ?? */

if (!(OFF_SLAB(sizes->cs_cachep))) {
offslab_limit = sizes->cs_size-sizeof(slab_t);
offslab_limit /= 2;
}
sprintf(name, "size-%Zd(DMA)",sizes->cs_size);

/*-- create the DMA cache */

sizes->cs_dmacachep = kmem_cache_create(name, sizes->cs_size, 0,
SLAB_CACHE_DMA|SLAB_HWCACHE_ALIGN, NULL, NULL);
if (!sizes->cs_dmacachep)
BUG();
sizes++;
} while (sizes->cs_size);
}

int __init kmem_cpucache_init(void)
{
return 0;
}

__initcall(kmem_cpucache_init);

/* Interface to system's page allocator. No need to hold the cache-lock.
*/

/*-- ask the page allocator for the pages a slab needs */

static inline void * kmem_getpages (kmem_cache_t *cachep, unsigned long flags)
{
void *addr;

/*
* If we requested dmaable memory, we will get it. Even if we
* did not request dmaable memory, we might get it, but that
* would be relatively rare and ignorable.
*/
flags |= cachep->gfpflags;
addr = (void*) __get_free_pages(flags, cachep->gfporder);
/* Assume that now we have the pages no one else can legally
* messes with the 'struct page's.
* However vm_scan() might try to test the structure to see if
* it is a named-page or buffer-page. The members it tests are
* of no interest here.....
*/
return addr;
}

/* Interface to system's page release. */

/*-- release the pages occupied by a slab; addr is the start address of those pages */

static inline void kmem_freepages (kmem_cache_t *cachep, void *addr)
{
unsigned long i = (1<<cachep->gfporder);
struct page *page = virt_to_page(addr);

/* free_pages() does not clear the type bit - we do that.
* The pages have been unlinked from their cache-slab,
* but their 'struct page's might be accessed in
* vm_scan(). Shouldn't be a worry.
*/
while (i--) {
PageClearSlab(page);
page++;
}
free_pages((unsigned long)addr, cachep->gfporder);
}

/* Destroy all the objs in a slab, and release the mem back to the system.
* Before calling the slab must have been unlinked from the cache.
* The cache-lock is not held/needed.
*/


static void kmem_slab_destroy (kmem_cache_t *cachep, slab_t *slabp)
{
if (cachep->dtor) {
int i;

/*-- if this cache has a destructor, destroy every object in the slab */

for (i = 0; i < cachep->num; i++) {

/*-- s_mem points to the start address of the first object */

void* objp = slabp->s_mem+cachep->objsize*i;
if (cachep->dtor)
(cachep->dtor)(objp, cachep, 0);
}
}

/*-- release the pages this slab occupies; the second argument is the start address of those pages */

kmem_freepages(cachep, slabp->s_mem-slabp->colouroff);

/*-- if the slab management part lives outside the slab (i.e. in a general cache), that object has to be freed as well */
if (OFF_SLAB(cachep))
kmem_cache_free(cachep->slabp_cache, slabp);
}


/**
* kmem_cache_create - Create a cache.
* @name: A string which is used in /proc/slabinfo to identify this cache.
* @size: The size of objects to be created in this cache.
* @offset: The offset to use within the page.
* @flags: SLAB flags
* @ctor: A constructor for the objects.
* @dtor: A destructor for the objects.
*
* Returns a ptr to the cache on success, NULL on failure.
* Cannot be called within a int, but can be interrupted.
* The @ctor is run when new pages are allocated by the cache
* and the @dtor is run before the pages are handed back.
* The flags are
*
* %SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5)
* to catch references to uninitialised memory.
*
* %SLAB_RED_ZONE - Insert `Red' zones around the allocated memory to check
* for buffer overruns.
*
* %SLAB_NO_REAP - Don't automatically reap this cache when we're under
* memory pressure.
*
* %SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware
* cacheline. This can be beneficial if you're counting cycles as closely
* as davem.
*/

/*-- create a cache */

kmem_cache_t *
kmem_cache_create (const char *name, size_t size, size_t offset,
unsigned long flags, void (*ctor)(void*, kmem_cache_t *, unsigned long),
void (*dtor)(void*, kmem_cache_t *, unsigned long))
{
const char *func_nm = KERN_ERR "kmem_create: ";
size_t left_over, align, slab_size;
kmem_cache_t *cachep = NULL;

/*
* Sanity checks... these are all serious usage bugs.
*/
if ((!name) ||
((strlen(name) >= CACHE_NAMELEN - 1)) ||
in_interrupt() ||
(size < BYTES_PER_WORD) ||
(size > (1<<MAX_OBJ_ORDER)*PAGE_SIZE) ||
(dtor && !ctor) ||
(offset < 0 || offset > size))
BUG();


/*
* Always checks flags, a caller might be expecting debug
* support which isn't available.
*/
if (flags & ~CREATE_MASK)
BUG();

/* Get cache's description obj. */

/*-- allocate the cache's descriptor, i.e. a kmem_cache_t, from cache_cache */

cachep = (kmem_cache_t *) kmem_cache_alloc(&cache_cache, SLAB_KERNEL);
if (!cachep)
goto opps;
memset(cachep, 0, sizeof(kmem_cache_t));

/* Check that size is in terms of words. This is needed to avoid
* unaligned accesses for some archs when redzoning is used, and makes
* sure any on-slab bufctl's are also correctly aligned.
*/

/*-- the object size supplied by the user is made at least word aligned */

if (size & (BYTES_PER_WORD-1)) {
size += (BYTES_PER_WORD-1);
size &= ~(BYTES_PER_WORD-1);
printk("%sForcing size word alignment - %s/n", func_nm, name);
}

/*-- align describes the object alignment: at least word aligned, or L1 CACHE aligned if the user asks for it */

align = BYTES_PER_WORD;
if (flags & SLAB_HWCACHE_ALIGN)
align = L1_CACHE_BYTES;

/* Determine if the slab management is 'on' or 'off' slab. */

/*-- decide whether the slab management part lives inside or outside the slab. If the object is large, say a third of a page,
   keeping the management part on-slab would let the slab hold only two objects while the management part itself is tiny,
   wasting a lot of space; so it is put off-slab. I have not checked that the threshold below is optimal */

if (size >= (PAGE_SIZE>>3))
/*
* Size is large, assume best to place the slab management obj
* off-slab (should allow better packing of objs).
*/
flags |= CFLGS_OFF_SLAB;

/*-- if CACHE alignment was requested but the object is small, full alignment would waste a lot of space, so consider half-line alignment,
   so that one cache line can carry two objects */

if (flags & SLAB_HWCACHE_ALIGN) {
/* Need to adjust size so that objs are cache aligned. */
/* Small obj size, can get at least two per cache line. */
/* FIXME: only power of 2 supported, was better */
while (size < align/2)
align /= 2;
size = (size+align-1)&(~(align-1));
}

/* Cal size (in pages) of slabs, and the num of objs per slab.
* This could be made much more intelligent. For now, try to avoid
* using high page-orders for slabs. When the gfp() funcs are more
* friendly towards high-order requests, this should be changed.
*/

/*-- try increasing page orders until the pages can hold at least one object and a few other conditions are met */

do {
unsigned int break_flag = 0;
cal_wastage:
kmem_cache_estimate(cachep->gfporder, size, flags,
&left_over, &cachep->num);
if (break_flag)
break;

/*-- stop if we exceed the largest order the system can provide (MAX_GFP_ORDER could of course be raised) */

if (cachep->gfporder >= MAX_GFP_ORDER)
break;

/*-- 2^gfporder pages are still not enough for a single object, so try 2^(gfporder + 1) */

if (!cachep->num)
goto next;

/*?? */

if (flags & CFLGS_OFF_SLAB && cachep->num > offslab_limit) {
/* Oops, this num of objs will cause problems. */
cachep->gfporder--;
break_flag++;
goto cal_wastage;
}

/*
* Large num of objs is good, but v. large slabs are currently
* bad for the gfp()s.
*/
if (cachep->gfporder >= slab_break_gfp_order)
break;

/*-- left_over must not be too large */

if ((left_over*8) <= (PAGE_SIZE<<cachep->gfporder))
break; /* Acceptable internal fragmentation. */
next:
cachep->gfporder++;
} while (1);

/*-- if the object is so big that the system cannot fit even one object into a slab, fail */

if (!cachep->num) {
printk("kmem_cache_create: couldn't create cache %s./n", name);
kmem_cache_free(&cache_cache, cachep);
cachep = NULL;
goto opps;
}

/*-- slab_size is the size of the slab management part */

slab_size = L1_CACHE_ALIGN(cachep->num*sizeof(kmem_bufctl_t)+sizeof(slab_t));

/*
* If the slab has been placed off-slab, and we have enough space then
* move it on-slab. This is at the expense of any extra colouring.
*/

/*-- even if the slab management part was going to be kept off-slab, if the leftover space inside the slab can hold it,
   put it back on-slab: this is faster and uses space better */

if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
flags &= ~CFLGS_OFF_SLAB;
left_over -= slab_size;
}

/*-- offset is the colour granularity; it is aligned the same way as the objects */

/* Offset must be a multiple of the alignment. */
offset += (align-1);
offset &= ~(align-1);
if (!offset)
offset = L1_CACHE_BYTES;

/*-- compute the colour granularity and range */

cachep->colour_off = offset;
cachep->colour = left_over/offset;

/* init remaining fields */
/*-- this marks the case of a single-page slab with on-slab management as "optimized", but what is that actually used for? */

if (!cachep->gfporder && !(flags & CFLGS_OFF_SLAB))
flags |= CFLGS_OPTIMIZE;

cachep->flags = flags;
cachep->gfpflags = 0;
if (flags & SLAB_CACHE_DMA)
cachep->gfpflags |= GFP_DMA;
spin_lock_init(&cachep->spinlock);
cachep->objsize = size;
INIT_LIST_HEAD(&cachep->slabs);

/*-- at this point the cache has no slab yet, hence no not-full slab, so the pointer to the first not-full slab points at
&cachep->slabs, meaning "no not-full slab" */

cachep->firstnotfull = &cachep->slabs;

/*-- if the slab management parts of this cache are kept off-slab, they will live in a general cache, so pick that
   cache for them in advance */

if (flags & CFLGS_OFF_SLAB)
cachep->slabp_cache = kmem_find_general_cachep(slab_size,0);

/*-- record the constructor and destructor supplied by the user */

cachep->ctor = ctor;
cachep->dtor = dtor;
/* Copy name over so we don't have problems with unloaded modules */
strcpy(cachep->name, name);

#ifdef CONFIG_SMP
if (g_cpucache_up)
enable_cpucache(cachep);
#endif
/* Need the semaphore to access the chain. */

/*-- not quite sure what this is for */

down(&cache_chain_sem);
{
struct list_head *p;

list_for_each(p, &cache_chain) {
kmem_cache_t *pc = list_entry(p, kmem_cache_t, next);

/* The name field is constant - no lock needed. */
if (!strcmp(pc->name, name))
BUG();
}
}

/* There is no reason to lock our new cache before we
* link it in - no one knows about it yet...
*/
list_add(&cachep->next, &cache_chain);
up(&cache_chain_sem);
opps:
return cachep;
}

/*
* This check if the kmem_cache_t pointer is chained in the cache_cache
* list. -arca
*/
static int is_chained_kmem_cache(kmem_cache_t * cachep)
{
struct list_head *p;
int ret = 0;

/* Find the cache in the chain of caches. */
down(&cache_chain_sem);
list_for_each(p, &cache_chain) {
if (p == &cachep->next) {
ret = 1;
break;
}
}
up(&cache_chain_sem);

return ret;
}

#define drain_cpu_caches(cachep) do { } while (0)

/*-- reclaim the completely free slabs of a cache; returns 0 when all slabs have been released */

static int __kmem_cache_shrink(kmem_cache_t *cachep)
{
slab_t *slabp;
int ret;

drain_cpu_caches(cachep);

spin_lock_irq(&cachep->spinlock);

/* If the cache is growing, stop shrinking. */
while (!cachep->growing) {
struct list_head *p;

/*-- start from the last slab, since it is the most likely to be completely free */

p = cachep->slabs.prev;
if (p == &cachep->slabs)
break;

slabp = list_entry(cachep->slabs.prev, slab_t, list);

/*-- if any object in this slab is still in use, the slab cannot be released */

if (slabp->inuse)
break;

/*-- unlink it from the list; if it happens to be the slab that firstnotfull points to,
then after removing it the cache has no not-full slab left */

list_del(&slabp->list);
if (cachep->firstnotfull == &slabp->list)
cachep->firstnotfull = &cachep->slabs;

/*-- destroy this slab */

spin_unlock_irq(&cachep->spinlock);
kmem_slab_destroy(cachep, slabp);
spin_lock_irq(&cachep->spinlock);
}
ret = !list_empty(&cachep->slabs);
spin_unlock_irq(&cachep->spinlock);
return ret;
}

/**
* kmem_cache_shrink - Shrink a cache.
* @cachep: The cache to shrink.
*
* Releases as many slabs as possible for a cache.
* To help debugging, a zero exit status indicates all slabs were released.
*/
int kmem_cache_shrink(kmem_cache_t *cachep)
{
if (!cachep || in_interrupt() || !is_chained_kmem_cache(cachep))
BUG();

return __kmem_cache_shrink(cachep);
}

/**
* kmem_cache_destroy - delete a cache
* @cachep: the cache to destroy
*
* Remove a kmem_cache_t object from the slab cache.
* Returns 0 on success.
*
* It is expected this function will be called by a module when it is
* unloaded. This will remove the cache completely, and avoid a duplicate
* cache being allocated each time a module is loaded and unloaded, if the
* module doesn't have persistent in-kernel storage across loads and unloads.
*
* The caller must guarantee that noone will allocate memory from the cache
* during the kmem_cache_destroy().
*/

/*-- destroy a cache */

int kmem_cache_destroy (kmem_cache_t * cachep)
{
if (!cachep || in_interrupt() || cachep->growing)
BUG();

/* Find the cache in the chain of caches. */
down(&cache_chain_sem);
/* the chain is never empty, cache_cache is never destroyed */

/*-- ?? not sure what this does */
if (clock_searchp == cachep)
clock_searchp = list_entry(cachep->next.next,
kmem_cache_t, next);
list_del(&cachep->next);
up(&cache_chain_sem);

/*-- try to reclaim all the slabs of this cache */

if (__kmem_cache_shrink(cachep)) {
printk(KERN_ERR "kmem_cache_destroy: Can't free all objects %p/n",
cachep);
down(&cache_chain_sem);

/*-- if not everything could be reclaimed, the cache has to be put back on the chain */

list_add(&cachep->next,&cache_chain);
up(&cache_chain_sem);

return 1;
}

/*-- release the cache descriptor back to cache_cache */

kmem_cache_free(&cache_cache, cachep);

return 0;
}

/* Get the memory for a slab management obj. */
/*-- set up the slab management part */

static inline slab_t * kmem_cache_slabmgmt (kmem_cache_t *cachep,
void *objp, int colour_off, int local_flags)
{
slab_t *slabp;

/*-- if the slab management part lives off-slab, it is allocated from a general cache; which cache was already chosen at creation time */

if (OFF_SLAB(cachep)) {
/* Slab management obj is off-slab. */
slabp = kmem_cache_alloc(cachep->slabp_cache, local_flags);
if (!slabp)
return NULL;
} else {
/* FIXME: change to
slabp = objp
* if you enable OPTIMIZE
*/

/*-- the slab layout is: colour area, slab management part, then the objects */

slabp = objp+colour_off;
colour_off += L1_CACHE_ALIGN(cachep->num *
sizeof(kmem_bufctl_t) + sizeof(slab_t));
}
slabp->inuse = 0;
slabp->colouroff = colour_off;
slabp->s_mem = objp+colour_off;

return slabp;
}

/*-- once a cache has allocated a new slab, initialize every object in it */


static inline void kmem_cache_init_objs (kmem_cache_t * cachep,
slab_t * slabp, unsigned long ctor_flags)
{
int i;

for (i = 0; i < cachep->num; i++) {
void* objp = slabp->s_mem+cachep->objsize*i;

/*
* Constructors are not allowed to allocate memory from
* the same cache which they are a constructor for.
* Otherwise, deadlock. They must also be threaded.
*/
if (cachep->ctor)
cachep->ctor(objp, cachep, ctor_flags);

/*-- every object in this slab is free, so chain all the index entries together */

slab_bufctl(slabp)[i] = i+1;
}

slab_bufctl(slabp)[i-1] = BUFCTL_END;
slabp->free = 0;
}

/*
* Grow (by 1) the number of slabs within a cache. This is called by
* kmem_cache_alloc() when there are no active objs left in a cache.
*/

/*-- add one more slab to a cache.
This is called when the user asks the cache for an object (kmem_cache_alloc) and the cache
cannot find a free object in its existing slabs */

static int kmem_cache_grow (kmem_cache_t * cachep, int flags)
{
slab_t *slabp;
struct page *page;
void *objp;
size_t offset;
unsigned int i, local_flags;
unsigned long ctor_flags;
unsigned long save_flags;

/* Be lazy and only check for valid flags here,
* keeping it out of the critical path in kmem_cache_alloc().
*/
if (flags & ~(SLAB_DMA|SLAB_LEVEL_MASK|SLAB_NO_GROW))
BUG();
if (flags & SLAB_NO_GROW)
return 0;

/*
* The test for missing atomic flag is performed here, rather than
* the more obvious place, simply to reduce the critical path length
* in kmem_cache_alloc(). If a caller is seriously mis-behaving they
* will eventually be caught here (where it matters).
*/
if (in_interrupt() && (flags & SLAB_LEVEL_MASK) != SLAB_ATOMIC)
BUG();

ctor_flags = SLAB_CTOR_CONSTRUCTOR;
local_flags = (flags & SLAB_LEVEL_MASK);
if (local_flags == SLAB_ATOMIC)
/*
* Not allowed to sleep. Need to tell a constructor about
* this - it might need to know...
*/
ctor_flags |= SLAB_CTOR_ATOMIC;

/* About to mess with non-constant members - lock. */
spin_lock_irqsave(&cachep->spinlock, save_flags);

/* Get colour for the slab, and cal the next value. */
/*-- the colour value cycles round-robin */

offset = cachep->colour_next;
cachep->colour_next++;
if (cachep->colour_next >= cachep->colour)
cachep->colour_next = 0;



offset *= cachep->colour_off;
cachep->dflags |= DFLGS_GROWN;

/*-- while this code runs, other threads must not release slabs of this cache via shrink or reap, or things would become inconsistent */

cachep->growing++;
spin_unlock_irqrestore(&cachep->spinlock, save_flags);

/* A series of memory allocations for a new slab.
* Neither the cache-chain semaphore, or cache-lock, are
* held, but the incrementing c_growing prevents this
* cache from being reaped or shrunk.
* Note: The cache could be selected in for reaping in
* kmem_cache_reap(), but when the final test is made the
* growing value will be seen.
*/

/* Get mem for the objs. */
/*-- get the pages for one slab of this cache */

if (!(objp = kmem_getpages(cachep, flags)))
goto failed;

/* Get slab management. */
/*-- allocate and initialize the slab management part */

if (!(slabp = kmem_cache_slabmgmt(cachep, objp, offset, local_flags)))
goto opps1;

/* Nasty!!!!!! I hope this is OK. */
i = 1 << cachep->gfporder;
page = virt_to_page(objp);

/*-- borrow two temporarily unused pointers in the objects' struct page to record which cache and slab they belong to, so that freeing an object can find its cache and slab */

do {
SET_PAGE_CACHE(page, cachep);
SET_PAGE_SLAB(page, slabp);
PageSetSlab(page);
page++;
} while (--i);

/*-- initialize all the objects in this slab */

kmem_cache_init_objs(cachep, slabp, ctor_flags);

spin_lock_irqsave(&cachep->spinlock, save_flags);
cachep->growing--;

/* Make slab active. */

/*-- add this slab to the tail of the list */

list_add_tail(&slabp->list,&cachep->slabs);

/*-- if the cache had no not-full slab before, it has one now */

if (cachep->firstnotfull == &cachep->slabs)
cachep->firstnotfull = &slabp->list;
STATS_INC_GROWN(cachep);
cachep->failures = 0;

spin_unlock_irqrestore(&cachep->spinlock, save_flags);
return 1;
opps1:
kmem_freepages(cachep, objp);
failed:
spin_lock_irqsave(&cachep->spinlock, save_flags);
cachep->growing--;
spin_unlock_irqrestore(&cachep->spinlock, save_flags);
return 0;
}

/*
* Perform extra freeing checks:
* - detect double free
* - detect bad pointers.
* Called with the cache-lock held.
*/


static inline void kmem_cache_alloc_head(kmem_cache_t *cachep, int flags)
{
}

static inline void * kmem_cache_alloc_one_tail (kmem_cache_t *cachep,
slab_t *slabp)
{
void *objp;

STATS_INC_ALLOCED(cachep);
STATS_INC_ACTIVE(cachep);
STATS_SET_HIGH(cachep);

/* get obj pointer */
slabp->inuse++;

/*-- take an object from the slab's chain of free objects (in fact the first one) */

objp = slabp->s_mem + slabp->free*cachep->objsize;

/*-- advance the chain head to the entry of the next free object */

slabp->free=slab_bufctl(slabp)[slabp->free];


/*-- if this slab has no free object left, the cache's pointer to the first not-full slab must move on */

if (slabp->free == BUFCTL_END)
/* slab now full: move to next slab for next alloc */
cachep->firstnotfull = slabp->list.next;
return objp;
}

/*
* Returns a ptr to an obj in the given cache.
* caller must guarantee synchronization
* #define for the goto optimization 8-)
*/

/*-- allocate one object from the given cache */

#define kmem_cache_alloc_one(cachep) \
({ \
slab_t *slabp; \
 \
/* Get slab alloc is to come from. */ \
/*-- if the cache has a not-full slab, allocate the object from it; otherwise a new slab must be allocated and initialized */ \
{ \
struct list_head* p = cachep->firstnotfull; \
if (p == &cachep->slabs) \
goto alloc_new_slab; \
slabp = list_entry(p,slab_t, list); \
} \
 \
/*-- allocate one object inside that slab */ \
kmem_cache_alloc_one_tail(cachep, slabp); \
})




static inline void * __kmem_cache_alloc (kmem_cache_t *cachep, int flags)
{
unsigned long save_flags;
void* objp;

kmem_cache_alloc_head(cachep, flags);
try_again:
local_irq_save(save_flags);
#ifdef CONFIG_SMP
{
cpucache_t *cc = cc_data(cachep);

if (cc) {
if (cc->avail) {
STATS_INC_ALLOCHIT(cachep);
objp = cc_entry(cc)[--cc->avail];
} else {
STATS_INC_ALLOCMISS(cachep);
objp = kmem_cache_alloc_batch(cachep,flags);
if (!objp)
goto alloc_new_slab_nolock;
}
} else {
spin_lock(&cachep->spinlock);
objp = kmem_cache_alloc_one(cachep);
spin_unlock(&cachep->spinlock);
}
}
#else
objp = kmem_cache_alloc_one(cachep);
#endif
local_irq_restore(save_flags);
return objp;
alloc_new_slab:
#ifdef CONFIG_SMP
spin_unlock(&cachep->spinlock);
alloc_new_slab_nolock:
#endif
local_irq_restore(save_flags);
if (kmem_cache_grow(cachep, flags))
/* Someone may have stolen our objs. Doesn't matter, we'll
* just come back here again.
*/
goto try_again;
return NULL;
}

/*
* Release an obj back to its cache. If the obj has a constructed
* state, it should be in this state _before_ it is released.
* - caller is responsible for the synchronization
*/

#define CHECK_PAGE(pg) do { } while (0)

/*-- free an object back into the given cache */

static inline void kmem_cache_free_one(kmem_cache_t *cachep, void *objp)
{
slab_t* slabp;

CHECK_PAGE(virt_to_page(objp));
/* reduces memory footprint
*
if (OPTIMIZE(cachep))
slabp = (void*)((unsigned long)objp&(~(PAGE_SIZE-1)));
else
*/

/*-- find the slab the object belongs to */

slabp = GET_PAGE_SLAB(virt_to_page(objp));


/*-- put the object back onto the slab's chain of free objects */
{
unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize;

slab_bufctl(slabp)[objnr] = slabp->free;
slabp->free = objnr;
}
STATS_DEC_ACTIVE(cachep);

/* fixup slab chain */

/*-- if the slab used to be full, it now has to be treated as not full */

if (slabp->inuse-- == cachep->num)
goto moveslab_partial;

/*-- if the slab is now completely free, move it to the tail of the slab list */

if (!slabp->inuse)
goto moveslab_free;
return;

moveslab_partial:
/* was full.
* Even if the page is now empty, we can set c_firstnotfull to
* slabp: there are no partial slabs in this case
*/
{
struct list_head *t = cachep->firstnotfull;

/*-- point the cache's first-not-full pointer at this slab */

cachep->firstnotfull = &slabp->list;

/*-- if, as it happens, this slab already sits right before the old first not-full slab, it now
is exactly the first not-full slab, and we can return */
if (slabp->list.next == t)
return;

/*-- otherwise move it to just before the old first not-full slab */

list_del(&slabp->list);
list_add_tail(&slabp->list, t);
return;
}
moveslab_free:
/*
* was partial, now empty.
* c_firstnotfull might point to slabp
* FIXME: optimize
*/
{
struct list_head *t = cachep->firstnotfull->prev;

/*-- move it from its place in the list to the tail */

list_del(&slabp->list);
list_add_tail(&slabp->list, &cachep->slabs);

/*-- if the cache's first-not-full pointer used to point at this slab, point it at the next one */

if (cachep->firstnotfull == &slabp->list)
cachep->firstnotfull = t->next;
return;
}
}

/*
* __kmem_cache_free
* called with disabled ints
*/
static inline void __kmem_cache_free (kmem_cache_t *cachep, void* objp)
{
kmem_cache_free_one(cachep, objp);
}

/**
* kmem_cache_alloc - Allocate an object
* @cachep: The cache to allocate from.
* @flags: See kmalloc().
*
* Allocate an object from this cache. The flags are only relevant
* if the cache has no available objects.
*/
void * kmem_cache_alloc (kmem_cache_t *cachep, int flags)
{
return __kmem_cache_alloc(cachep, flags);
}

/**
* kmalloc - allocate memory
* @size: how many bytes of memory are required.
* @flags: the type of memory to allocate.
*
* kmalloc is the normal method of allocating memory
* in the kernel. The @flags argument may be one of:
*
* %GFP_BUFFER - XXX
*
* %GFP_ATOMIC - allocation will not sleep. Use inside interrupt handlers.
*
* %GFP_USER - allocate memory on behalf of user. May sleep.
*
* %GFP_KERNEL - allocate normal kernel ram. May sleep.
*
* %GFP_NFS - has a slightly lower probability of sleeping than %GFP_KERNEL.
* Don't use unless you're in the NFS code.
*
* %GFP_KSWAPD - Don't use unless you're modifying kswapd.
*/

/*-- allocate an object of size bytes from the general caches; this one can be called by users directly */

void * kmalloc (size_t size, int flags)
{
cache_sizes_t *csizep = cache_sizes;

for (; csizep->cs_size; csizep++) {
if (size > csizep->cs_size)
continue;
return __kmem_cache_alloc(flags & GFP_DMA ?
csizep->cs_dmacachep : csizep->cs_cachep, flags);
}
BUG(); // too big size
return NULL;
}

/**
* kmem_cache_free - Deallocate an object
* @cachep: The cache the allocation was from.
* @objp: The previously allocated object.
*
* Free an object which was previously allocated from this
* cache.
*/

/*-- free an object back into the given cache */

void kmem_cache_free (kmem_cache_t *cachep, void *objp)
{
unsigned long flags;

local_irq_save(flags);
__kmem_cache_free(cachep, objp);
local_irq_restore(flags);
}

/**
* kfree - free previously allocated memory
* @objp: pointer returned by kmalloc.
*
* Don't free memory not originally allocated by kmalloc()
* or you will run into trouble.
*/
void kfree (const void *objp)
{
kmem_cache_t *c;
unsigned long flags;

if (!objp)
return;
local_irq_save(flags);
CHECK_PAGE(virt_to_page(objp));
c = GET_PAGE_CACHE(virt_to_page(objp));
__kmem_cache_free(c, (void*)objp);
local_irq_restore(flags);
}

/*-- find a general cache that fits the given size and whose DMA attribute agrees with gfpflags */

kmem_cache_t * kmem_find_general_cachep (size_t size, int gfpflags)
{
cache_sizes_t *csizep = cache_sizes;

/* This function could be moved to the header file, and
* made inline so consumers can quickly determine what
* cache pointer they require.
*/
for ( ; csizep->cs_size; csizep++) {
if (size > csizep->cs_size)
continue;
break;
}
return (gfpflags & GFP_DMA) ? csizep->cs_dmacachep : csizep->cs_cachep;
}


/**
* kmem_cache_reap - Reclaim memory from caches.
* @gfp_mask: the type of memory required.
*
* Called from try_to_free_page().
*/




void kmem_cache_reap (int gfp_mask)
{
slab_t *slabp;
kmem_cache_t *searchp;
kmem_cache_t *best_cachep;
unsigned int best_pages;
unsigned int best_len;
unsigned int scan;

if (gfp_mask & __GFP_WAIT)
down(&cache_chain_sem);
else
if (down_trylock(&cache_chain_sem))
return;

scan = REAP_SCANLEN;
best_len = 0;
best_pages = 0;
best_cachep = NULL;
searchp = clock_searchp;

/*-- walk through almost all the caches and (normally) pick the one with the most free pages.
For fairness, if some cache gets reaped this time, the next scan starts comparing from the one after it;
clock_searchp records that next cache */

do {
unsigned int pages;
struct list_head* p;
unsigned int full_free;

/* It's safe to test this without holding the cache-lock. */
if (searchp->flags & SLAB_NO_REAP)
goto next;
spin_lock_irq(&searchp->spinlock);
if (searchp->growing)
goto next_unlock;
if (searchp->dflags & DFLGS_GROWN) {
searchp->dflags &= ~DFLGS_GROWN;
goto next_unlock;
}

full_free = 0;

/*-- walk backwards from the last slab in the list looking for completely free slabs, and count them */

p = searchp->slabs.prev;
while (p != &searchp->slabs) {
slabp = list_entry(p, slab_t, list);
if (slabp->inuse)
break;
full_free++;
p = p->prev;
}

/*
* Try to avoid slabs with constructors and/or
* more than one page per slab (as it can be difficult
* to get high orders from gfp()).
*/

/*-- compute how many pages could be freed */

pages = full_free * (1<<searchp->gfporder);

/*-- ?? probably lowers the weight of pages, for the reasons in the comment above, but it is still not entirely clear to me */

if (searchp->ctor)
pages = (pages*4+1)/5;
if (searchp->gfporder)
pages = (pages*4+1)/5;


/*-- compare and keep the best candidate so far; if enough slabs can already be freed, free
them right away without comparing any further. why??? perhaps to avoid picking the same cache as last time? */

if (pages > best_pages) {
best_cachep = searchp;
best_len = full_free;
best_pages = pages;
if (full_free >= REAP_PERFECT) {
clock_searchp = list_entry(searchp->next.next,
kmem_cache_t,next);
goto perfect;
}
}
next_unlock:
spin_unlock_irq(&searchp->spinlock);
next:
searchp = list_entry(searchp->next.next,kmem_cache_t,next);
} while (--scan && searchp != clock_searchp);

clock_searchp = searchp;

/*-- no cache can be reaped, so just return */

if (!best_cachep)
/* couldn't find anything to reap */
goto out;

spin_lock_irq(&best_cachep->spinlock);
perfect:
/* free only 80% of the free slabs */
/*-- do not reap all of this cache's free slabs */

best_len = (best_len*4 + 1)/5;


for (scan = 0; scan < best_len; scan++) {
struct list_head *p;

/*-- if someone is meanwhile allocating objects from this cache and it has started growing, stop reaping */

if (best_cachep->growing)
break;

/*-- start from the last slab in the list */

p = best_cachep->slabs.prev;

/*-- everything has been reaped, so stop.
   My understanding: in the single-threaded case this check is unnecessary, because best_len already guarantees it;
   with several threads, someone else may have called shrink (or reap??) in the meantime */
if (p == &best_cachep->slabs)
break;

slabp = list_entry(p,slab_t,list);

/*-- if a slab that is not completely free is reached, stop
   (same remark as above) */

if (slabp->inuse)
break;

/*-- unlink it from the list */

list_del(&slabp->list);
if (best_cachep->firstnotfull == &slabp->list)
best_cachep->firstnotfull = &best_cachep->slabs;
STATS_INC_REAPED(best_cachep);

/* Safe to drop the lock. The slab is no longer linked to the
* cache.
*/
spin_unlock_irq(&best_cachep->spinlock);

/*-- destroy this slab */

kmem_slab_destroy(best_cachep, slabp);
spin_lock_irq(&best_cachep->spinlock);
}
spin_unlock_irq(&best_cachep->spinlock);
out:
up(&cache_chain_sem);
return;
}
