Linux Kernel Source Code Scenario Analysis: System Initialization


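    We skip boot and setup and go straight to the head code. The kernel image starts at stext, also named _stext. After booting and decompression, the whole image sits in memory starting at 0x100000, that is, at 1 MB. The entry point startup_32, where the CPU begins executing the kernel image, is at the very beginning of the image, so its physical address is also 0x100000.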

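    In normal operation, however, the whole kernel image lives in system space, where virtual and physical addresses differ by a fixed offset of 0xC0000000, that is, 3 GB. The linker therefore adds 0xC0000000 to every symbol address when the kernel image is linked, so the virtual address of startup_32 is 0xC0100000.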
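
    The fixed offset can be written down as plain arithmetic. The following is a minimal sketch, not kernel code: the helper names virt_to_phys_addr and phys_to_virt_addr are made up for illustration, and only the constant 0xC0000000 comes from the text above.

```c
#include <stdint.h>

/* Fixed distance between a kernel virtual address and its physical address (3 GB). */
#define PAGE_OFFSET 0xC0000000u

/* Hypothetical helper: kernel virtual address -> physical address. */
static inline uint32_t virt_to_phys_addr(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET;
}

/* Hypothetical helper: physical address -> kernel virtual address. */
static inline uint32_t phys_to_virt_addr(uint32_t paddr)
{
    return paddr + PAGE_OFFSET;
}
```

    With these helpers, the physical entry point 0x100000 corresponds to the linked address 0xC0100000 of startup_32, and the reverse conversion gives 0x100000 again.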

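    On entry to startup_32 the CPU is running in protected mode with segment-based addressing only. The descriptors for __KERNEL_CS and __KERNEL_DS in the segment descriptor table both supply a base address of 0. The code segment register CS was already set to __KERNEL_CS before entering startup_32, while the data segment registers have not yet been set to __KERNEL_DS.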

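    CS is already set to __KERNEL_CS, and the linked address of startup_32 is 0xC0100000. However, the instruction used to transfer to this entry point is "ljmp 0x100000", not "ljmp startup_32", so the address loaded into the CPU's IP register is the physical address 0x100000 rather than the virtual address 0xC0100000. After entering startup_32 the CPU therefore keeps fetching instructions at physical addresses. As long as the code does not reference an absolute address in the code segment, for example by an absolute jump or a subroutine call, it can keep running this way regardless of the contents of CS. In addition, CPU interrupts were disabled before entering startup_32.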

    The assembly code starting at startup_32 is in arch/i386/kernel/head.S:

/*
 *  linux/arch/i386/head.S -- the 32-bit startup code.
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 *
 *  Enhanced CPU detection and feature setting code by Mike Jagdis
 *  and Martin Mares, November 1997.
 */

.text
#include <linux/config.h>
#include <linux/threads.h>
#include <linux/linkage.h>
#include <asm/segment.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/desc.h>

#define OLD_CL_MAGIC_ADDR	0x90020
#define OLD_CL_MAGIC		0xA33F
#define OLD_CL_BASE_ADDR	0x90000
#define OLD_CL_OFFSET		0x90022
#define NEW_CL_POINTER		0x228	/* Relative to real mode data */

/*
 * References to members of the boot_cpu_data structure.
 */

#define CPU_PARAMS	SYMBOL_NAME(boot_cpu_data)
#define X86		CPU_PARAMS+0
#define X86_VENDOR	CPU_PARAMS+1
#define X86_MODEL	CPU_PARAMS+2
#define X86_MASK	CPU_PARAMS+3
#define X86_HARD_MATH	CPU_PARAMS+6
#define X86_CPUID	CPU_PARAMS+8
#define X86_CAPABILITY	CPU_PARAMS+12
#define X86_VENDOR_ID	CPU_PARAMS+16

/*
 * swapper_pg_dir is the main page directory, address 0x00101000
 *
 * On entry, %esi points to the real-mode code as a 32-bit pointer.
 */
ENTRY(stext)
ENTRY(_stext)
startup_32:
/*
 * Set segments to known values
 */
	cld
	movl $(__KERNEL_DS),%eax
	movl %eax,%ds
	movl %eax,%es
	movl %eax,%fs
	movl %eax,%gs		// ds, es, fs and gs are all set to __KERNEL_DS
        ......
/*
 * Initialize page tables
 */
	// pg0 is a virtual address, so 3 GB is subtracted to get the physical address
	movl $pg0-__PAGE_OFFSET,%edi
	movl $007,%eax		// "007" means PRESENT+RW+USER
2:	stosl			// store %eax at the destination address in %edi
	add $0x1000,%eax	// advance by 0x1000 each time
	cmp $empty_zero_page-__PAGE_OFFSET,%edi	// stop once empty_zero_page is reached
	jne 2b
	// The 8 KB from pg0 up to empty_zero_page are set up as a temporary page
	// table whose entries are 0x0, 0x1000, 0x2000, ..., that is, physical
	// pages 0, 1, 2, ... The table occupies two pages, i.e. 2K entries, so it
	// covers 8 MB of memory. This is the minimum amount of memory the Linux
	// kernel requires.
/*
 * Enable paging
 */
3:
	movl $swapper_pg_dir-__PAGE_OFFSET,%eax	// location of the page directory
	movl %eax,%cr3		// set the page directory address
	movl %cr0,%eax
	orl $0x80000000,%eax
	movl %eax,%cr0		// turn paging on
	// Execution is still at a physical address here. This is why the first
	// two of the low 768 directory entries are set to 0x00102007 and
	// 0x00103007: they serve as a transition.
	jmp 1f
1:
	movl $1f,%eax
	// This jump uses the virtual address: the physical address of label 1
	// plus 3 GB. Paging maps it back through the first two of the high 256
	// directory entries (also 0x00102007 and 0x00103007) to the physical
	// address of label 1, i.e. physical address plus 3 GB, then minus 3 GB.
	jmp *%eax
1:
	/* Set up the stack pointer */
	lss stack_start,%esp	// set the location of the stack
        ......
/*
 * Clear BSS first so that there are no surprises...
 * No need to cld as DF is already clear from cld above...
 */
	xorl %eax,%eax		// ignore for now
	movl $ SYMBOL_NAME(__bss_start),%edi
	movl $ SYMBOL_NAME(_end),%ecx
	subl %edi,%ecx
	rep
	stosb
/*
 * start system 32-bit setup. We need to re-do some of the things done
 * in 16-bit mode for the "real" operations.
 */
	call setup_idt		// initialize the interrupt vector table
/*
 * Initialize eflags.  Some BIOS's leave bits like NT set.  This would
 * confuse the debugger if this code is traced.
 * XXX - best to initialize before switching to protected mode.
 */
	pushl $0
	popfl
/*
 * Copy bootup parameters out of the way. First 2kB of
 * _empty_zero_page is for boot parameters, second 2kB
 * is for the command line.
 *
 * Note: %esi still has the pointer to the real-mode data.
 */
	// copy the boot parameters and command line passed by setup into empty_zero_page
	movl $ SYMBOL_NAME(empty_zero_page),%edi
	movl $512,%ecx
	cld
	rep
	movsl
	xorl %eax,%eax
	movl $512,%ecx
	rep
	stosl
	movl SYMBOL_NAME(empty_zero_page)+NEW_CL_POINTER,%esi
	andl %esi,%esi
	jnz 2f			# New command line protocol
	cmpw $(OLD_CL_MAGIC),OLD_CL_MAGIC_ADDR
	jne 1f
	movzwl OLD_CL_OFFSET,%esi
	addl $(OLD_CL_BASE_ADDR),%esi
2:
	movl $ SYMBOL_NAME(empty_zero_page)+2048,%edi
	movl $512,%ecx
	rep
	movsl
        ......
	movl $-1,X86_CPUID	#  -1 for no CPUID initially

/* check if it is 486 or 386. */
/*
 * XXX - this does a lot of unnecessary setup.  Alignment checks don't
 * apply at our cpl of 0 and the stack ought to be aligned already, and
 * we don't need to preserve eflags.
 */
	movl $3,X86		# at least 386 // not of interest for now
	pushfl			# push EFLAGS
	popl %eax		# get EFLAGS
	movl %eax,%ecx		# save original EFLAGS
	xorl $0x40000,%eax	# flip AC bit in EFLAGS
	pushl %eax		# copy to EFLAGS
	popfl			# set EFLAGS
	pushfl			# get new EFLAGS
	popl %eax		# put it in eax
	xorl %ecx,%eax		# change in flags
	andl $0x40000,%eax	# check if AC bit changed
	je is386

	movl $4,X86		# at least 486
	movl %ecx,%eax
	xorl $0x200000,%eax	# check ID flag
	pushl %eax
	popfl			# if we are on a straight 486DX, SX, or
	pushfl			# 487SX we can't change it
	popl %eax
	xorl %ecx,%eax
	pushl %ecx		# restore original EFLAGS
	popfl
	andl $0x200000,%eax
	je is486

	/* get vendor info */
	xorl %eax,%eax		# call CPUID with 0 -> return vendor ID
	cpuid
	movl %eax,X86_CPUID	# save CPUID level
	movl %ebx,X86_VENDOR_ID	# lo 4 chars
	movl %edx,X86_VENDOR_ID+4	# next 4 chars
	movl %ecx,X86_VENDOR_ID+8	# last 4 chars

	orl %eax,%eax		# do we have processor info as well?
	je is486

	movl $1,%eax		# Use the CPUID instruction to get CPU type
	cpuid
	movb %al,%cl		# save reg for future use
	andb $0x0f,%ah		# mask processor family
	movb %ah,X86
	andb $0xf0,%al		# mask model
	shrb $4,%al
	movb %al,X86_MODEL
	andb $0x0f,%cl		# mask mask revision
	movb %cl,X86_MASK
	movl %edx,X86_CAPABILITY

is486:
	movl %cr0,%eax		# 486 or better
	andl $0x80000011,%eax	# Save PG,PE,ET
	orl $0x50022,%eax	# set AM, WP, NE and MP
	jmp 2f

is386:	pushl %ecx		# restore original EFLAGS
	popfl
	movl %cr0,%eax		# 386
	andl $0x80000011,%eax	# Save PG,PE,ET
	orl $2,%eax		# set MP
2:	movl %eax,%cr0
	call check_x87
        ......
	lgdt gdt_descr		// load the CPU's global descriptor table register, GDTR
	lidt idt_descr		// load the CPU's interrupt descriptor table register, IDTR
	ljmp $(__KERNEL_CS),$1f	// reload cs, ds, es, fs, gs
1:	movl $(__KERNEL_DS),%eax	# reload all the segment registers
	movl %eax,%ds		# after changing gdt.
	movl %eax,%es
	movl %eax,%fs
	movl %eax,%gs
        ......
	lss stack_start,%esp	# Load processor stack
        ......
	xorl %eax,%eax
	lldt %ax		// clear the LDTR selector
	cld			# gcc2 wants the direction flag cleared at all times
        ......
	call SYMBOL_NAME(start_kernel)	// start executing start_kernel
L6:
	jmp L6			# main should never return here, but
				# just in case, we know what happens.

#ifdef CONFIG_SMP
ready:	.byte 0
#endif

/*
 * We depend on ET to be correct. This checks for 287/387.
 */
check_x87:
	movb $0,X86_HARD_MATH
	clts
	fninit
	fstsw %ax
	cmpb $0,%al
	je 1f
	movl %cr0,%eax		/* no coprocessor: have to set bits */
	xorl $4,%eax		/* set EM */
	movl %eax,%cr0
	ret
	ALIGN
1:	movb $1,X86_HARD_MATH
	.byte 0xDB,0xE4		/* fsetpm for 287, ignored by 387 */
	ret

/*
 *  setup_idt
 *
 *  sets up a idt with 256 entries pointing to
 *  ignore_int, interrupt gates. It doesn't actually load
 *  idt - that can be done only after paging has been enabled
 *  and the kernel moved to PAGE_OFFSET. Interrupts
 *  are enabled elsewhere, when we can be relatively
 *  sure everything is ok.
 */
setup_idt:
	// each entry is 8 bytes; all 256 entries point to the same handler, ignore_int
	lea ignore_int,%edx
	movl $(__KERNEL_CS << 16),%eax
	movw %dx,%ax		/* selector = 0x0010 = cs */
	movw $0x8E00,%dx	/* interrupt gate - dpl=0, present */

	lea SYMBOL_NAME(idt_table),%edi
	mov $256,%ecx
rp_sidt:
	movl %eax,(%edi)
	movl %edx,4(%edi)
	addl $8,%edi
	dec %ecx
	jne rp_sidt
	ret

ENTRY(stack_start)	// task_struct and the stack share two pages; the stack is at the high-address end
	.long SYMBOL_NAME(init_task_union)+8192
	.long __KERNEL_DS

/* This is the default interrupt "handler" :-) */
int_msg:
	.asciz "Unknown interrupt\n"
	ALIGN
ignore_int:		// the interrupt handler
	cld
	pushl %eax
	pushl %ecx
	pushl %edx
	pushl %es
	pushl %ds
	movl $(__KERNEL_DS),%eax
	movl %eax,%ds
	movl %eax,%es
	pushl $int_msg
	call SYMBOL_NAME(printk)
	popl %eax
	popl %ds
	popl %es
	popl %edx
	popl %ecx
	popl %eax
	iret

/*
 * The interrupt descriptor table has room for 256 idt's,
 * the global descriptor table is dependent on the number
 * of tasks we can have..
 */
#define IDT_ENTRIES 256
#define GDT_ENTRIES (__TSS(NR_CPUS))

.globl SYMBOL_NAME(idt)
.globl SYMBOL_NAME(gdt)

	ALIGN
	.word 0
idt_descr:
	.word IDT_ENTRIES*8-1		// length of the interrupt descriptor table
SYMBOL_NAME(idt):
	.long SYMBOL_NAME(idt_table)	// base address of the interrupt descriptor table; idt_table is a global variable

	.word 0
gdt_descr:
	.word GDT_ENTRIES*8-1		// length of the global descriptor table
SYMBOL_NAME(gdt):
	.long SYMBOL_NAME(gdt_table)	// base address of the global descriptor table; gdt_table is shown further down

/*
 * This is initialized to create an identity-mapping at 0-8M (for bootup
 * purposes) and another mapping of the 0-8M area at virtual address
 * PAGE_OFFSET.
 */
.org 0x1000
ENTRY(swapper_pg_dir)		// see the explanation after the listing
	.long 0x00102007	// points to pg0
	.long 0x00103007	// points to pg1
	.fill BOOT_USER_PGD_PTRS-2,4,0	// 768
	/* default: 766 entries */
	.long 0x00102007	// points to pg0
	.long 0x00103007	// points to pg1
	/* default: 254 entries */
	.fill BOOT_KERNEL_PGD_PTRS-2,4,0	// 256

/*
 * The page tables are initialized to only 8MB here - the final page
 * tables are set up later depending on memory size.
 */
.org 0x2000	// physical address 0x00102000; the directory entry 0x00102007 adds the flag bits 007
ENTRY(pg0)

.org 0x3000	// physical address 0x00103000; the directory entry 0x00103007 adds the flag bits 007
ENTRY(pg1)

/*
 * empty_zero_page must immediately follow the page tables ! (The
 * initialization loop counts until empty_zero_page)
 */

.org 0x4000
ENTRY(empty_zero_page)

.org 0x5000
ENTRY(empty_bad_page)

.org 0x6000
ENTRY(empty_bad_pte_table)

#if CONFIG_X86_PAE

 .org 0x7000
 ENTRY(empty_bad_pmd_table)

 .org 0x8000

#else

 .org 0x7000

#endif

/*
 * This starts the data section. Note that the above is all
 * in the text section because it has alignment requirements
 * that we cannot fulfill any other way.
 */
.data

ALIGN
/*
 * This contains typically 140 quadwords, depending on NR_CPUS.
 *
 * NOTE! Make sure the gdt descriptor in head.S matches this if you
 * change anything.
 */
ENTRY(gdt_table)
	.quad 0x0000000000000000	/* NULL descriptor */
	.quad 0x0000000000000000	/* not used */
	.quad 0x00cf9a000000ffff	/* 0x10 kernel 4GB code at 0x00000000 */
	.quad 0x00cf92000000ffff	/* 0x18 kernel 4GB data at 0x00000000 */
	.quad 0x00cffa000000ffff	/* 0x23 user   4GB code at 0x00000000 */
	.quad 0x00cff2000000ffff	/* 0x2b user   4GB data at 0x00000000 */
	.quad 0x0000000000000000	/* not used */
	.quad 0x0000000000000000	/* not used */
	/*
	 * The APM segments have byte granularity and their bases
	 * and limits are set at run time.
	 */
	.quad 0x0040920000000000	/* 0x40 APM set up for bad BIOS's */
	.quad 0x00409a0000000000	/* 0x48 APM CS    code */
	.quad 0x00009a0000000000	/* 0x50 APM CS 16 code (16 bit) */
	.quad 0x0040920000000000	/* 0x58 APM DS    data */
	.fill NR_CPUS*4,8,0		/* space for TSS's and LDT's */

/*
 * This is to aid debugging, the various locking macros will be putting
 * code fragments here.  When an oops occurs we'd rather know that it's
 * inside the .text.lock section rather than as some offset from whatever
 * function happens to be last in the .text segment.
 */
.section .text.lock
ENTRY(stext_lock)
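
    The way setup_idt in the listing packs each 8-byte interrupt gate may be easier to follow in C. This is only a sketch of the same bit layout: the type idt_gate, the function make_interrupt_gate and the handler address used in the note below are invented for illustration, while the selector 0x10 and the constant 0x8E00 are taken from the comments in the listing.

```c
#include <stdint.h>

/* Kernel code segment selector, per the "selector = 0x0010 = cs" comment. */
#define KERNEL_CS 0x10u

/* One 8-byte IDT entry, split into the two dwords that setup_idt stores. */
typedef struct {
    uint32_t low;   /* selector in bits 31..16, handler offset bits 15..0  */
    uint32_t high;  /* handler offset bits 31..16, gate attributes 0x8E00 */
} idt_gate;

/* Hypothetical helper mirroring the %eax/%edx setup at the top of setup_idt. */
static idt_gate make_interrupt_gate(uint32_t handler)
{
    idt_gate gate;

    /* movl $(__KERNEL_CS << 16),%eax ; movw %dx,%ax */
    gate.low = (KERNEL_CS << 16) | (handler & 0xFFFFu);

    /* movw $0x8E00,%dx keeps the upper half of the handler address */
    gate.high = (handler & 0xFFFF0000u) | 0x8E00u;

    return gate;
}
```

    For a made-up handler address of 0xC0105678, the two dwords come out as 0x00105678 and 0xC0108E00. setup_idt writes the same pair into all 256 entries of idt_table.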

.org 0x1000
ENTRY(swapper_pg_dir)
	.long 0x00102007
	.long 0x00103007
	.fill BOOT_USER_PGD_PTRS-2,4,0	// 768
	/* default: 766 entries */
	.long 0x00102007
	.long 0x00103007
	/* default: 254 entries */
	.fill BOOT_KERNEL_PGD_PTRS-2,4,0	// 256

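    Let us look at this fragment on its own. A page directory has 1024 entries, which together cover 4 GB of virtual address space. The Linux kernel splits the virtual space at 3 GB into user space and system space, so the low 768 directory entries map user space and the high 256 entries map system space. Entries 0 and 1 and entries 768 and 769 all point to pg0 and pg1, so the first 8 MB of physical memory are reachable both at virtual addresses 0 to 8 MB and at 3 GB to 3 GB + 8 MB.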
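
    To see that both mappings reach the same physical memory, the boot-time tables can be modeled in a few lines of C. This is a simplified sketch under stated assumptions, not the kernel's code: the arrays boot_pgd and boot_pte stand in for swapper_pg_dir and for pg0 followed by pg1, and the functions build_boot_tables and translate are hypothetical names. Only the entry values (0x00102007, 0x00103007, flags 007, step 0x1000) come from head.S.

```c
#include <stdint.h>

#define PAGE_SIZE       0x1000u
#define PTRS_PER_TABLE  1024u
#define USER_PGD_PTRS   768u         /* directory entries below 3 GB */
#define PG0_PHYS        0x00102000u  /* physical address of pg0 */
#define PG1_PHYS        0x00103000u  /* physical address of pg1 */

static uint32_t boot_pgd[PTRS_PER_TABLE];      /* model of swapper_pg_dir */
static uint32_t boot_pte[2 * PTRS_PER_TABLE];  /* model of pg0 followed by pg1 */

/* Fill the model the same way head.S lays out and initializes the tables. */
static void build_boot_tables(void)
{
    uint32_t i;

    for (i = 0; i < PTRS_PER_TABLE; i++)
        boot_pgd[i] = 0;
    boot_pgd[0] = PG0_PHYS | 7u;                  /* identity map, 0..4 MB  */
    boot_pgd[1] = PG1_PHYS | 7u;                  /* identity map, 4..8 MB  */
    boot_pgd[USER_PGD_PTRS] = PG0_PHYS | 7u;      /* 3 GB .. 3 GB + 4 MB    */
    boot_pgd[USER_PGD_PTRS + 1] = PG1_PHYS | 7u;  /* 3 GB + 4 MB .. + 8 MB  */

    /* Same effect as the stosl loop: 007, 0x1007, 0x2007, ... */
    for (i = 0; i < 2 * PTRS_PER_TABLE; i++)
        boot_pte[i] = i * PAGE_SIZE + 7u;
}

/* Two-level lookup; returns UINT32_MAX when the address is not mapped. */
static uint32_t translate(uint32_t vaddr)
{
    uint32_t pde = boot_pgd[vaddr >> 22];
    const uint32_t *table;
    uint32_t pte;

    if (!(pde & 1u))
        return UINT32_MAX;

    /* Only pg0 and pg1 exist in this model, so index relative to pg0. */
    table = boot_pte + ((pde & ~0xFFFu) - PG0_PHYS) / 4u;
    pte = table[(vaddr >> 12) & 0x3FFu];
    if (!(pte & 1u))
        return UINT32_MAX;

    return (pte & ~0xFFFu) | (vaddr & 0xFFFu);
}
```

    After build_boot_tables(), translate(0x00100000) and translate(0xC0100000) both give 0x00100000, which is why the code can keep running across the moment paging is switched on. An address at 8 MB or above, such as 0x00800000, is not mapped in this model.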


    In Linux 0.11, kernel space and user space are arranged as follows.

    First, the page directory is laid out like this:


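    The first 4 entries of the page directory are used for kernel space. They point to page tables 0, 1, 2 and 3 and together map 16 MB. Kernel mode uses the GDT with a segment base address of 0, so it can reach every memory address.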

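    In the user mode of process 2, the corresponding page directory entries are 32 through 47, and the 16 page tables they point to are set up for that process. User mode uses the LDT, with a segment base address of 128 MB. Take cs:eip with eip equal to 0: segmentation produces the virtual address 128 MB. Paging then uses the top 10 bits of that address to select directory entry 32, the middle 10 bits to select the first entry of the page table that entry 32 points to, and, since the low 12 bits are all 0, the memory address held in that page table entry is the physical address being accessed.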
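
    The arithmetic in this example can be checked with a short C sketch. It assumes that each task owns 64 MB of address space, which is what a base of 128 MB for process 2 and 16 page tables of 4 MB each imply; the names TASK_SPACE, task_linear_address, linear_parts and split_linear are made up for illustration.

```c
#include <stdint.h>

/* Assumed size of one task's address range: 16 page tables x 4 MB = 64 MB. */
#define TASK_SPACE 0x04000000u

/* The three fields that paging extracts from a 32-bit address. */
typedef struct {
    uint32_t dir_index;    /* top 10 bits: page directory entry   */
    uint32_t table_index;  /* middle 10 bits: page table entry    */
    uint32_t offset;       /* low 12 bits: offset inside the page */
} linear_parts;

/* Segmentation step: LDT segment base of the task plus the offset (eip). */
static uint32_t task_linear_address(uint32_t task_nr, uint32_t logical_offset)
{
    return task_nr * TASK_SPACE + logical_offset;
}

/* Paging step: split the address into directory index, table index, offset. */
static linear_parts split_linear(uint32_t linear)
{
    linear_parts parts;

    parts.dir_index = linear >> 22;
    parts.table_index = (linear >> 12) & 0x3FFu;
    parts.offset = linear & 0xFFFu;
    return parts;
}
```

    For process 2 and eip equal to 0, task_linear_address gives 0x08000000 (128 MB), and split_linear yields directory entry 32, table entry 0 and offset 0. The last byte of the same 64 MB range falls in directory entry 47.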


    In Linux 2.4, kernel space and user space are arranged as follows.

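    Each process has its own page directory. A page directory has 1024 entries, which together cover 4 GB of virtual address space. The Linux kernel splits the virtual space at 3 GB into user space and system space, so the low 768 directory entries map user space and the high 256 entries map system space.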

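    User-space virtual addresses run from 0 to 3 GB, which corresponds to the low 768 entries of the page directory. Recall that user-space virtual addresses are allocated from the range 0 to 3 GB; see "Linux Kernel Source Code Scenario Analysis: execve()".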

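    Kernel-space virtual addresses run from 3 GB to 4 GB and correspond to the high 256 entries of the page directory. Because every kernel symbol is linked at its physical address plus 3 GB, an access to kernel space uses a virtual address between 3 GB and 4 GB, and paging (as described above) turns it into the physical address, which is simply the virtual address minus 3 GB.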

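    Linux 2.4 does not use the LDT for this, only the GDT. In both kernel space and user space, the virtual address that segmentation produces from a logical address is identical to the logical address, because the kernel and user segments all have a base address of 0.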

    The GDT is as follows:

ENTRY(gdt_table)
	.quad 0x0000000000000000	/* NULL descriptor */
	.quad 0x0000000000000000	/* not used */
	.quad 0x00cf9a000000ffff	/* 0x10 kernel 4GB code at 0x00000000 */
	.quad 0x00cf92000000ffff	/* 0x18 kernel 4GB data at 0x00000000 */
	.quad 0x00cffa000000ffff	/* 0x23 user   4GB code at 0x00000000 */
	.quad 0x00cff2000000ffff	/* 0x2b user   4GB data at 0x00000000 */
	.quad 0x0000000000000000	/* not used */
	.quad 0x0000000000000000	/* not used */
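
    The comments in gdt_table ("4GB ... at 0x00000000") can be verified by decoding the quadwords. The sketch below follows the standard x86 segment descriptor bit layout; the type segment_info and the functions decode_descriptor and segment_size are hypothetical names, not something head.S or the kernel provides.

```c
#include <stdint.h>

/* Fields of interest pulled out of one 8-byte segment descriptor. */
typedef struct {
    uint32_t base;           /* segment base address             */
    uint32_t limit;          /* raw 20-bit limit field           */
    unsigned dpl;            /* descriptor privilege level, 0..3 */
    int      is_code;        /* 1 for code, 0 for data           */
    int      page_granular;  /* G bit: limit counts 4 KB units   */
} segment_info;

/* Hypothetical decoder for a GDT entry given as a 64-bit value. */
static segment_info decode_descriptor(uint64_t d)
{
    segment_info s;

    /* Base: bits 16..39 (low 24 bits) and bits 56..63 (high 8 bits). */
    s.base = (uint32_t)((d >> 16) & 0xFFFFFFu)
           | ((uint32_t)((d >> 56) & 0xFFu) << 24);
    /* Limit: bits 0..15 (low 16 bits) and bits 48..51 (high 4 bits). */
    s.limit = (uint32_t)(d & 0xFFFFu)
            | ((uint32_t)((d >> 48) & 0xFu) << 16);
    s.dpl = (unsigned)((d >> 45) & 3u);        /* bits 45..46          */
    s.is_code = (int)((d >> 43) & 1u);         /* executable type bit  */
    s.page_granular = (int)((d >> 55) & 1u);   /* G bit                */
    return s;
}

/* Segment size in bytes; 64-bit because 4 GB does not fit in 32 bits. */
static uint64_t segment_size(segment_info s)
{
    uint64_t units = (uint64_t)s.limit + 1u;

    return s.page_granular ? units << 12 : units;
}
```

    Decoding 0x00cf9a000000ffff gives base 0, limit 0xFFFFF with 4 KB granularity (4 GB in total), privilege level 0 and a code segment. 0x00cff2000000ffff decodes to the same base and size but with privilege level 3 and a data segment. With every base equal to 0, segmentation leaves the address unchanged.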
