Follow me: Linux kernel memory management -- a first look at Mount Lu, its true face still hidden (Part 1)

Source: Internet  Editor: 程序博客网  Time: 2024/04/28 22:34

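Everyone knows that memory management holds the same place in the kernel that Miejue Shitai holds on Mount Emei: as long as Miejue stands, Emei is mighty. Later, rumor has it, Miejue kicked the bucket, so Emei is now entirely a men's world; even the sect leader is a man (ahem). Sinful, sinful.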

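Memory management is obviously both complex and important, and getting it right takes perfect cooperation between the processor and the kernel.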

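Memory is first of all a hardware concept. Common kinds of memory include RAM, ROM, cache, and their variants. Take RAM as an example: in Linux, part of RAM is permanently assigned to the kernel to hold the kernel code and some static kernel data structures. The rest of RAM can be used as general-purpose memory, usually called dynamic memory. It is a precious resource for every kernel thread and user process, and the kernel itself can't live without it either. In the end, memory management boils down to "allocate when needed, free when no longer needed." Just a handful of words, yet using them well is no easier than mastering the Sunflower Manual (葵花宝典); master them, though, and the results are even more astonishing than the Sunflower Manual's. Don't believe me? Learn it and see.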

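As the Four Modernizations gradually come true, Linux machines have more and more memory, so naturally the memory has to be split into small units that are then managed together. Exactly, that is just what Linux does: the small unit is called a "page". A page is typically 4 KB (as on x86); other sizes exist, but let's ignore them for now. Once your Ren and Du meridians are open, not even the Minister of Railways can stop you. Pages are strung together on linked lists for bookkeeping (free lists, LRU lists and so on), so in the end all of memory is woven into one enormous and complicated set of chains. From here on, managing memory turns into managing pages.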

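Remember school days? Every head teacher had a class roster recording some information about each of us: name, sex, date of birth, home address, marital status (sweat, that one can go), and so on and so forth... In the kernel, pages are like the students. Each page has a descriptor, called the page descriptor, and all the page descriptors are stored in an array called mem_map. mem_map is that roster: the more students, the thicker the roster; the more pages, the bigger mem_map. Since there is one descriptor per page, it usually takes a little under 1% of total RAM (with, say, 32-byte descriptors and 4 KB pages, that is 32/4096 ≈ 0.8%). That may sound like a lot, but given how complex memory management is, the structure is worth every byte.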

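So what information does a page descriptor hold? Naturally, the things everyone cares about: for instance, whether the page belongs to a process or holds kernel code or static kernel data, and, for dynamic memory, whether the page is free and whether it can be used for dynamic allocation. Just as the teacher can find a student's name from the student ID or from the home address, the kernel provides two macros: virt_to_page(addr), which yields the address of the page descriptor for the linear address addr (an address in the kernel's directly mapped region), and pfn_to_page(pfn), which yields the address of the page descriptor for page frame number pfn. Having got hold of a page descriptor, let's now talk about this famous data structure: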

This data structure lives in: include\linux

/*
 * Each physical page in the system has a struct page associated with
 * it to keep track of whatever it is we are using the page for at the
 * moment. Note that we have no way to track which tasks are using
 * a page, though if it is a pagecache page, rmap structures can tell us
 * who is mapping it.
 */
struct page {
    unsigned long flags;            /* Atomic flags, some possibly
                                     * updated asynchronously */
    atomic_t _count;                /* Usage count, see below. */
    union {
        atomic_t _mapcount;         /* Count of ptes mapped in mms,
                                     * to show when page is mapped
                                     * & limit reverse map searches.
                                     */
        struct {                    /* SLUB uses */
            short unsigned int inuse;
            short unsigned int offset;
        };
    };
    union {
        struct {
            unsigned long private;          /* Mapping-private opaque data:
                                             * usually used for buffer_heads
                                             * if PagePrivate set; used for
                                             * swp_entry_t if PageSwapCache;
                                             * indicates order in the buddy
                                             * system if PG_buddy is set.
                                             */
            struct address_space *mapping;  /* If low bit clear, points to
                                             * inode address_space, or NULL.
                                             * If page mapped as anonymous
                                             * memory, low bit is set, and
                                             * it points to anon_vma object:
                                             * see PAGE_MAPPING_ANON below.
                                             */
        };
#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
        spinlock_t ptl;
#endif
        struct {                    /* SLUB uses */
            void **lockless_freelist;
            struct kmem_cache *slab;        /* Pointer to slab */
        };
        struct {
            struct page *first_page;        /* Compound pages */
        };
    };
    union {
        pgoff_t index;              /* Our offset within mapping. */
        void *freelist;             /* SLUB: freelist req. slab lock */
    };
    struct list_head lru;           /* Pageout list, eg. active_list
                                     * protected by zone->lru_lock !
                                     */
    /*
     * On machines where all RAM is mapped into kernel address space,
     * we can simply calculate the virtual address. On machines with
     * highmem some memory is mapped into kernel virtual memory
     * dynamically, so we need a place to store that address.
     * Note that this field could be 16 bits on x86 ... ;)
     *
     * Architectures with slow multiplication can define
     * WANT_PAGE_VIRTUAL in asm/page.h
     */
#if defined(WANT_PAGE_VIRTUAL)
    void *virtual;                  /* Kernel virtual address (NULL if
                                       not kmapped, ie. highmem) */
#endif /* WANT_PAGE_VIRTUAL */
};

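Oh my goodness, such a long and complicated structure, it's going to be the death of me! But don't worry: for now we only need to care about a few of its fields. We'll chat about the rest, slowly and in detail, as we run into them.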

flags:

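This field holds the flags that describe the page's state. The flags are architecture-independent (apart from PG_arch_1) and express the page's attributes. flags is an unsigned long, which is 32 bits on a 32-bit machine (64 bits on a 64-bit one), with one bit per flag. Not every bit is actually a flag, though: the top bits are reserved for the zone, node and section fields (see the layout comment in the header below). The kernel defines a set of bit-operation macros: PageXXX tests a flag, SetPageXXX sets the corresponding bit, and ClearPageXXX clears it. For example, PageDirty checks the PG_dirty bit and PageActive checks the PG_active bit. Most of the macros that operate on these flags are defined in include/linux/page-flags.h; have a look if you're curious.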

The flags themselves are defined as macros in include/linux/page-flags.h:

/*
 * Various page->flags bits:
 *
 * PG_reserved is set for special pages, which can never be swapped out. Some
 * of them might not even exist (eg empty_bad_page)...
 *
 * The PG_private bitflag is set on pagecache pages if they contain filesystem
 * specific data (which is normally at page->private). It can be used by
 * private allocations for its own usage.
 *
 * During initiation of disk I/O, PG_locked is set. This bit is set before I/O
 * and cleared when writeback _starts_ or when read _completes_. PG_writeback
 * is set before writeback starts and cleared when it finishes.
 *
 * PG_locked also pins a page in pagecache, and blocks truncation of the file
 * while it is held.
 *
 * page_waitqueue(page) is a wait queue of all tasks waiting for the page
 * to become unlocked.
 *
 * PG_uptodate tells whether the page's contents is valid.  When a read
 * completes, the page becomes uptodate, unless a disk I/O error happened.
 *
 * PG_referenced, PG_reclaim are used for page reclaim for anonymous and
 * file-backed pagecache (see mm/vmscan.c).
 *
 * PG_error is set to indicate that an I/O error occurred on this page.
 *
 * PG_arch_1 is an architecture specific page state bit.  The generic code
 * guarantees that this bit is cleared for a page when it first is entered into
 * the page cache.
 *
 * PG_highmem pages are not permanently mapped into the kernel virtual address
 * space, they need to be kmapped separately for doing IO on the pages.  The
 * struct page (these bits with information) are always mapped into kernel
 * address space...
 *
 * PG_buddy is set to indicate that the page is free and in the buddy system
 * (see mm/page_alloc.c).
 *
 */


/*
 * Don't use the *_dontuse flags.  Use the macros.  Otherwise you'll break
 * locked- and dirty-page accounting.
 *
 * The page flags field is split into two parts, the main flags area
 * which extends from the low bits upwards, and the fields area which
 * extends from the high bits downwards.
 *
 *  | FIELD | ... | FLAGS |
 *  N-1     ^             0
 *          (N-FLAGS_RESERVED)
 *
 * The fields area is reserved for fields mapping zone, node and SPARSEMEM
 * section.  The boundry between these two areas is defined by
 * FLAGS_RESERVED which defines the width of the fields section
 * (see linux/mmzone.h).  New flags must _not_ overlap with this area.
 */
#define PG_locked        0  /* Page is locked. Don't touch. */
#define PG_error         1
#define PG_referenced    2
#define PG_uptodate      3

#define PG_dirty         4
#define PG_lru           5
#define PG_active        6
#define PG_slab          7  /* slab debug (Suparna wants this) */

#define PG_owner_priv_1  8  /* Owner use. If pagecache, fs may use */
#define PG_arch_1        9
#define PG_reserved     10
#define PG_private      11  /* If pagecache, has fs-private data */

#define PG_writeback    12  /* Page is under writeback */
#define PG_compound     14  /* Part of a compound page */
#define PG_swapcache    15  /* Swap page: swp_entry_t in private */

#define PG_mappedtodisk 16  /* Has blocks allocated on-disk */
#define PG_reclaim      17  /* To be reclaimed asap */
#define PG_buddy        19  /* Page is free, on buddy lists */

/* PG_owner_priv_1 users should have descriptive aliases */
#define PG_checked      PG_owner_priv_1 /* Used by some filesystems */

#if (BITS_PER_LONG > 32)
/*
 * 64-bit-only flags build down from bit 31
 *
 * 32 bit  -------------------------------| FIELDS |       FLAGS         |
 * 64 bit  |           FIELDS             | ??????         FLAGS         |
 *         63                            32                              0
 */
#define PG_uncached     31  /* Page has been mapped as uncached */
#endif

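Here is the description from Understanding the Linux Kernel, 3rd Edition. The book is written against the 2.6.11 kernel, so its list doesn't match the header above exactly: PG_highmem, PG_nosave and PG_nosave_free appear only in the table, while PG_buddy and PG_uncached appear only in the header. We'll look at each flag more closely when we actually need it.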

Flag name        Meaning
PG_locked        The page is locked; for instance, it is involved in a disk I/O operation.
PG_error         An I/O error occurred while transferring the page.
PG_referenced    The page has been recently accessed.
PG_uptodate      This flag is set after completing a read operation, unless a disk I/O error happened.
PG_dirty         The page has been modified.
PG_lru           The page is in the active or inactive page list.
PG_active        The page is in the active page list.
PG_slab          The page frame is included in a slab.
PG_highmem       The page frame belongs to the ZONE_HIGHMEM zone.
PG_checked       Used by some filesystems such as Ext2 and Ext3.
PG_arch_1        Not used on the 80x86 architecture.
PG_reserved      The page frame is reserved for kernel code or is unusable.
PG_private       The private field of the page descriptor stores meaningful data.
PG_writeback     The page is being written to disk by means of the writepage method.
PG_nosave        Used for system suspend/resume.
PG_compound      The page frame is handled through the extended paging mechanism.
PG_swapcache     The page belongs to the swap cache.
PG_mappedtodisk  All data in the page frame corresponds to blocks allocated on disk.
PG_reclaim       The page has been marked to be written to disk in order to reclaim memory.
PG_nosave_free   Used for system suspend/resume.

Don't get lost in these nauseating flags; remember, we're here to look at struct page.

_count

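    This is the page's reference counter. If it is -1, the page is free and can be handed to the kernel itself or to any process. If it is >= 0, the page has been assigned to one or more processes or is holding some static kernel data structures, and it must not be freed. (This -1 convention is the one ULK describes for 2.6.11, where page_count() returns _count + 1; later 2.6 kernels dropped the offset, so there a free page has _count == 0.)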

mapping

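    Points to the address space the page belongs to. As the comment in the struct explains, if the low bit is clear it points to the owning inode's address_space (or is NULL); if the page is mapped as anonymous memory, the low bit is set and the pointer refers to an anon_vma object.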

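That's all we need to know for now. We've just met a truly formidable data structure that memory management uses from beginning to end; like a ghost that won't go away, it haunts the entire mm subsystem.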
