Linux Virtual Memory Areas (VMA): Basic Concepts


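From a user process's point of view, a VMA is one region of that process's virtual address space. The virtual address space is contiguous, so each VMA is likewise a contiguous range of addresses. The main benefit of VMAs for Linux is that they allow memory to be used more efficiently and make a user process's address space easier to manage.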

Seen another way, VMAs let the Linux kernel manage virtual address space on a per-process basis. A process's VMA mappings can be inspected through the /proc/<pid>/maps file; for example, the VMA mappings of pid 1 (init) are:

$ cat /proc/1/maps
08048000-0804e000 r-xp 00000000 08:01 12118      /sbin/init
0804e000-08050000 rw-p 00005000 08:01 12118      /sbin/init
08050000-08054000 rwxp 00000000 00:00 0
40000000-40016000 r-xp 00000000 08:01 52297      /lib/ld-2.2.4.so
40016000-40017000 rw-p 00015000 08:01 52297      /lib/ld-2.2.4.so
40024000-40025000 rw-p 00000000 00:00 0
40025000-40157000 r-xp 00000000 08:01 58241      /lib/i686/libc-2.2.4.so
40157000-4015c000 rw-p 00131000 08:01 58241      /lib/i686/libc-2.2.4.so
4015c000-40160000 rw-p 00000000 00:00 0
bfffe000-c0000000 rwxp fffff000 00:00 0
 

Each line of the listing has the following field format:

start-end perm offset major:minor inode image
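
To make the field layout concrete, here is a minimal user-space sketch that splits one line of /proc/<pid>/maps into those fields with sscanf(). The struct maps_entry type and the parse_maps_line() helper are hypothetical names introduced only for illustration; they are not part of the kernel or of any library. The sketch assumes the image path contains no spaces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one parsed /proc/<pid>/maps line (not a kernel type). */
struct maps_entry {
    unsigned long start;    /* start address of the VMA */
    unsigned long end;      /* first address past the VMA */
    char perm[5];           /* permission string, e.g. "r-xp" */
    unsigned long offset;   /* offset into the backing file */
    unsigned int major;     /* device major number */
    unsigned int minor;     /* device minor number */
    unsigned long inode;    /* inode number, 0 for anonymous mappings */
    char image[256];        /* backing file path, empty string if none */
};

/* Parse one line; returns 0 on success, -1 if the mandatory fields are missing. */
int parse_maps_line(const char *line, struct maps_entry *e)
{
    int n;

    e->image[0] = '\0';     /* the image column is optional */
    n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255s",
               &e->start, &e->end, e->perm, &e->offset,
               &e->major, &e->minor, &e->inode, e->image);
    return (n >= 7) ? 0 : -1;
}
```

A caller would typically read /proc/self/maps line by line with fgets() and pass each line to parse_maps_line(); the size of a region is then simply end - start.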

Linux uses the struct vm_area_struct data structure to record the information for each VMA "region" (include/linux/mm.h):

struct vm_area_struct {
        struct mm_struct * vm_mm;
        unsigned long vm_start;
        unsigned long vm_end;

        struct vm_area_struct *vm_next;

        pgprot_t vm_page_prot;
        unsigned long vm_flags;

        rb_node_t vm_rb;

        struct vm_area_struct *vm_next_share;
        struct vm_area_struct **vm_pprev_share;

        struct vm_operations_struct * vm_ops;

        unsigned long vm_pgoff;

        struct file * vm_file;
        unsigned long vm_raend;
        void * vm_private_data;
};

Three fields of struct vm_area_struct are used to maintain the VMA data structure:

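˙ unsigned long vm_start: the start address of this VMA.
˙ unsigned long vm_end: the end address of this VMA (the first byte after the region, so the region covers vm_start up to but not including vm_end).
˙ struct vm_area_struct *vm_next: a pointer to the next VMA structure (Linux keeps the VMAs of a process on a linked list).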
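
How these three fields work together can be sketched in user space. The struct vma type below is a simplified stand-in that keeps only the three fields, and lookup_region() is a hypothetical helper written for this illustration, not a kernel function. It assumes the list is sorted by address and walks it to find the region that contains a given address.

```c
#include <stddef.h>

/* Simplified user-space stand-in with only the three fields; not the kernel struct. */
struct vma {
    unsigned long vm_start;   /* first address inside the region */
    unsigned long vm_end;     /* first address past the region */
    struct vma *vm_next;      /* next region, assumed sorted by address */
};

/* Return the region containing addr, or NULL if addr falls in a hole. */
struct vma *lookup_region(struct vma *head, unsigned long addr)
{
    struct vma *v;

    for (v = head; v != NULL; v = v->vm_next) {
        if (addr < v->vm_start)
            return NULL;      /* list is sorted, so addr lies in a gap */
        if (addr < v->vm_end)
            return v;         /* vm_start <= addr < vm_end */
    }
    return NULL;              /* addr is above the last region */
}
```

Note that the kernel's own find_vma() has different semantics: it returns the first VMA whose vm_end is greater than the address, which may be a region that starts above it. The sketch deliberately uses the stricter "contains" test because it is easier to follow.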

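VMAs were implemented mainly to manage memory more efficiently, and they are built on top of the paging system; a VMA is a higher-level memory management method than plain paging.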

Process VMAs: The Overall Picture

Memory Descriptor

In Linux, the "Process Descriptor" data structure is struct task_struct (include/linux/sched.h). The mm field of the process descriptor records the process's VMA information:

struct task_struct {
          ...
          struct mm_struct *mm;
          ...
};

struct mm_struct is the "Memory Descriptor" data structure provided by Linux. Its declaration is shown below:

struct mm_struct {
          struct vm_area_struct * mmap;            /* list of VMAs */
          struct rb_root mm_rb;
          struct vm_area_struct * mmap_cache;      /* last find_vma result */
          unsigned long (*get_unmapped_area) (struct file *filp,
                                         unsigned long addr, unsigned long len,
                                         unsigned long pgoff, unsigned long flags);
          void (*unmap_area) (struct mm_struct *mm, unsigned long addr);
          unsigned long mmap_base;                 /* base of mmap area */
          unsigned long task_size;                 /* size of task vm space */
          unsigned long cached_hole_size;          /* if non-zero, the largest hole below free_area_cache */
          unsigned long free_area_cache;           /* first hole of size cached_hole_size or larger */
          pgd_t * pgd;
          atomic_t mm_users;                       /* How many users with user space? */
          atomic_t mm_count;                       /* How many references to "struct mm_struct" (users count as 1) */
          int map_count;                           /* number of VMAs */
          struct rw_semaphore mmap_sem;
          spinlock_t page_table_lock;              /* Protects page tables and some counters */

          struct list_head mmlist;                 /* List of maybe swapped mm's.  These are globally strung
                                                    * together off init_mm.mmlist, and are protected
                                                    * by mmlist_lock
                                                    */

          /* Special counters, in some configurations protected by the
           * page_table_lock, in other configurations by being atomic.
           */
          mm_counter_t _file_rss;
          mm_counter_t _anon_rss;

          unsigned long hiwater_rss;     /* High-watermark of RSS usage */
          unsigned long hiwater_vm;      /* High-water virtual memory usage */

          unsigned long total_vm, locked_vm, shared_vm, exec_vm;
          unsigned long stack_vm, reserved_vm, def_flags, nr_ptes;
          unsigned long start_code, end_code, start_data, end_data;
          unsigned long start_brk, brk, start_stack;
          unsigned long arg_start, arg_end, env_start, env_end;

          unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */

          unsigned dumpable:2;
          cpumask_t cpu_vm_mask;

          /* Architecture-specific MM context */
          mm_context_t context;

          /* Token based thrashing protection. */
          unsigned long swap_token_time;
          char recent_pagein;

          /* coredumping support */
          int core_waiters;
          struct completion *core_startup_done, core_done;

          /* aio bits */
          rwlock_t            ioctx_list_lock;
          struct kioctx       *ioctx_list;
};

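As its name suggests, the memory descriptor is the data structure that describes a process's memory. struct mm_struct contains a field called mmap, whose data type is a pointer to struct vm_area_struct; this is the VMA data structure introduced in "Linux Virtual Memory Areas (VMA): Basic Concepts".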
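
The relationship between the memory descriptor and its VMA list can be sketched in user space as follows. struct region and struct mem_desc are simplified stand-ins whose field names mirror mmap, map_count, vm_start, vm_end and vm_next; mem_desc_insert() is a hypothetical helper for this illustration and does not reproduce the kernel's actual insertion code. It assumes regions never overlap and keeps the list sorted by address.

```c
#include <stddef.h>

/* Simplified stand-ins; field names mirror the kernel's, the types do not. */
struct region {
    unsigned long vm_start;   /* first address inside the region */
    unsigned long vm_end;     /* first address past the region */
    struct region *vm_next;   /* next region in address order */
};

struct mem_desc {
    struct region *mmap;      /* head of the address-sorted region list */
    int map_count;            /* number of regions on the list */
};

/* Link r into md->mmap, keeping the list sorted by start address.
 * Returns 0 on success, -1 if r overlaps an existing region. */
int mem_desc_insert(struct mem_desc *md, struct region *r)
{
    struct region **link = &md->mmap;

    /* Skip every region that ends at or before the point where r starts. */
    while (*link != NULL && (*link)->vm_end <= r->vm_start)
        link = &(*link)->vm_next;

    /* The next region, if any, must not start before r ends. */
    if (*link != NULL && (*link)->vm_start < r->vm_end)
        return -1;

    r->vm_next = *link;
    *link = r;
    md->map_count++;
    return 0;
}
```

Walking md->mmap from the head then visits the regions in the same order in which /proc/<pid>/maps prints them, and map_count matches the number of lines.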

The Mapping Between VMAs and the ELF Image

As described in "Linux Virtual Memory Areas (VMA): Basic Concepts", a process's VMA mappings can be inspected through the /proc/<pid>/maps file; for example, the VMA mappings of pid 1 (init) are:

$ cat /proc/1/maps
08048000-0804e000 r-xp 00000000 08:01 12118      /sbin/init
0804e000-08050000 rw-p 00005000 08:01 12118      /sbin/init
08050000-08054000 rwxp 00000000 00:00 0
40000000-40016000 r-xp 00000000 08:01 52297      /lib/ld-2.2.4.so
40016000-40017000 rw-p 00015000 08:01 52297      /lib/ld-2.2.4.so
40024000-40025000 rw-p 00000000 00:00 0
40025000-40157000 r-xp 00000000 08:01 58241      /lib/i686/libc-2.2.4.so
40157000-4015c000 rw-p 00131000 08:01 58241      /lib/i686/libc-2.2.4.so
4015c000-40160000 rw-p 00000000 00:00 0
bfffe000-c0000000 rwxp fffff000 00:00 0
 

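This listing can be used to explain the relationship between VMAs and the ELF image. Read together with the figure above, the VMA mappings in the listing are as follows: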

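1. Row 1 is the VMA mapping of the code section of the ELF executable (/sbin/init).

2. Row 2 is the VMA mapping of the data section of the ELF executable.

3. Row 3 is the VMA mapping of the .bss section of the ELF executable.

4. Row 4 is the VMA mapping of the code section of the dynamic loader (/lib/ld-2.2.4.so).

5. Row 5 is the VMA mapping of the data section of the dynamic loader.

6. Row 6 is the VMA mapping of the .bss section of the dynamic loader.

7. Row 7 is the VMA mapping of the code section of libc.

8. Row 8 is the VMA mapping of the data section of libc.

9. Row 9 is the VMA mapping of the .bss section of libc.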

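Note also that the code section and data section referred to in this article are not necessarily the ELF .text and .data sections: "code section" stands for all executable sections, and "data section" stands for the sections that contain data.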
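
The row-by-row reading above can be approximated with a small heuristic over the perm and inode columns. The enum vma_kind labels and the classify_vma() function are hypothetical names made up for this sketch, and the rules are an assumption based on the listing shown here (file-backed and executable means code, file-backed and writable means data, inode 0 means an anonymous region such as .bss); they are not how the kernel itself classifies mappings.

```c
#include <string.h>

/* Hypothetical labels following this article's use of "code" and "data". */
enum vma_kind { VMA_CODE, VMA_DATA, VMA_ANON, VMA_OTHER };

/* perm is the 4-character permission field, inode is the inode column. */
enum vma_kind classify_vma(const char *perm, unsigned long inode)
{
    if (strlen(perm) < 4)
        return VMA_OTHER;     /* malformed permission string */
    if (inode == 0)
        return VMA_ANON;      /* no backing file, e.g. .bss or stack */
    if (perm[2] == 'x')
        return VMA_CODE;      /* file-backed and executable */
    if (perm[1] == 'w')
        return VMA_DATA;      /* file-backed and writable */
    return VMA_OTHER;         /* e.g. a read-only file mapping */
}
```

Applied to the listing, rows 1, 4 and 7 come out as code, rows 2, 5 and 8 as data, and the rows with inode 0 as anonymous regions.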

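Throughout this discussion of VMAs we consider only the code section and the data section (as shown in the figure). The .bss section is, in principle, better covered separately, together with its kernel implementation.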