An Analysis of Process Switching in the Linux 2.4.18 Kernel on i386


Process switching: for different processes to share the CPU, the kernel must be able to suspend the process currently running on the CPU and resume some process suspended earlier (which can also be seen as waking that process up), so that it runs normally again. This procedure is called a process switch, also known as a context switch or task switch.
Although every process has its own address space, all processes share the CPU registers. So before resuming a suspended process, the kernel must make sure that each register is loaded with the value it held when the process was suspended.
The set of data that must be loaded into the registers when a process resumes execution is called its hardware context; it is a subset of the process execution context. In Linux, the hardware context of a process is kept mainly in thread_struct, with the remainder on its kernel-mode stack.
Process switches are very frequent, so the time spent saving and loading registers must be kept to a minimum. Linux 2.4 performs process switching entirely in software. A switch takes place only in kernel mode; by the time it happens, the contents of all the registers used in user mode have already been saved on the kernel-mode stack, including ss and esp, which identify the user-mode stack pointer.
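The layout of those saved user-mode registers can be read off struct pt_regs (include/asm-i386/ptrace.h), which mirrors the frame built on the kernel stack on entry from user mode; the trailing esp/xss pair is pushed by the CPU itself during the privilege change:

struct pt_regs {
	long ebx;
	long ecx;
	long edx;
	long esi;
	long edi;
	long ebp;
	long eax;
	int  xds;
	int  xes;
	long orig_eax;	/* system call number (negative if not a syscall) */
	long eip;
	int  xcs;
	long eflags;
	long esp;	/* user-mode stack pointer ...               */
	int  xss;	/* ... and stack segment, pushed by hardware */
};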
 
The TSS (Task State Segment) is a special segment type used by the 80x86. Although Linux does not use hardware context switching, it is still forced to create one TSS for each CPU in the system, because:
(1) when an 80x86 CPU switches from user mode to kernel mode, it fetches the address of the kernel-mode stack from the TSS;
(2) when a user-mode process attempts to access an I/O port through an in or out instruction, the CPU needs to consult the I/O permission bitmap stored in the TSS to check whether the process is allowed to address that port.
The TSS structure is defined in Include/Asm-i386/Processor.h. As the trimmed copy below shows, it mainly holds register and stack information plus the I/O permission bitmap.
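Abridged from include/asm-i386/processor.h (the elided register slots exist only because the hardware format demands them; Linux never performs a hardware task switch through this segment):

struct tss_struct {
	unsigned short	back_link, __blh;
	unsigned long	esp0;		/* kernel stack pointer used on a */
	unsigned short	ss0, __ss0h;	/* user-to-kernel transition      */
	/* ... esp1/ss1, esp2/ss2, __cr3, eip, eflags, the
	 * general-purpose and segment register slots, ldt ... */
	unsigned short	trace, bitmap;	/* offset of the I/O bitmap       */
	unsigned long	io_bitmap[IO_BITMAP_SIZE+1];
	unsigned long	__cacheline_filler[5];	/* pad to a cacheline     */
};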
 
On every process switch, the hardware context of the process being replaced must be saved somewhere else, not in the TSS as Intel originally intended, because there is no way to know in advance when the replaced process will resume execution or which CPU will run it then.
Therefore every process descriptor (the task_struct structure) contains a thread field of type thread_struct, and whenever a process is switched out, the kernel saves its hardware context in this structure. The definition of thread_struct is likewise found in Include/Asm-i386/Processor.h.
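Abridged, it looks as follows (the fault, FPU (union i387_union) and vm86 bookkeeping fields are elided here). Note how it pairs with the TSS: esp0 and the I/O bitmap are copied into the per-CPU TSS at switch time, while eip and esp record where this task's kernel execution left off:

struct thread_struct {
	unsigned long	esp0;		/* kernel stack top, copied into the TSS  */
	unsigned long	eip;		/* resume address (label 1: in switch_to) */
	unsigned long	esp;		/* saved kernel stack pointer             */
	unsigned long	fs;
	unsigned long	gs;
	unsigned long	debugreg[8];	/* saved %db0..%db7 debug registers       */
	/* ... fault info, FPU state, vm86 fields ... */
	int		ioperm;
	unsigned long	io_bitmap[IO_BITMAP_SIZE+1];
};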
Experiment content
Read Include/Asm-i386/Processor.h
Read Include/Asm-i386/System.h
Read arch/i386/kernel/process.c
Read kernel/sched.c
Experiment program
       None
Experiment results (the comments interleaved with the code below are code-reading notes and some personal understanding)
       Now we reach the substance of the process switch: the schedule() function.
Every process switch is performed in two steps:
(1)    switching the page global directory to install a new address space (see the excerpt after this list);
(2)    switching the kernel-mode stack and the hardware context, which provides all the information the kernel needs to execute the new process, including the CPU registers. This step is performed by the switch_to macro (defined in Include/Asm-i386/System.h).
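Step (1) is carried out by schedule() itself, just before it invokes switch_to. A condensed excerpt (abridged from kernel/sched.c, with the BUG() sanity checks dropped; prev, next and this_cpu are locals of schedule()):

	struct mm_struct *mm = next->mm;
	struct mm_struct *oldmm = prev->active_mm;

	if (!mm) {			/* next is a kernel thread: it      */
		next->active_mm = oldmm;	/* borrows prev's address space */
		atomic_inc(&oldmm->mm_count);
		enter_lazy_tlb(oldmm, next, this_cpu);
	} else				/* ordinary process: load next's    */
		switch_mm(oldmm, mm, next, this_cpu);	/* page global directory */

	if (!prev->mm) {		/* prev was a kernel thread:        */
		prev->active_mm = NULL;		/* give back the borrowed mm */
		mmdrop(oldmm);
	}

Step (2) is the switch_to macro itself: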
#define switch_to(prev,next,last) do {					\
	asm volatile(							\
		/* Push prev's %esi, %edi and %ebp onto its stack */	\
		"pushl %%esi\n\t"					\
		"pushl %%edi\n\t"					\
		"pushl %%ebp\n\t"					\
		/* Save prev's stack top into prev->thread.esp */	\
		"movl %%esp,%0\n\t"	/* save ESP */			\
		/* Load next->thread.esp into %esp: from here on the */	\
		/* kernel works on next's kernel-mode stack, so this */	\
		/* instruction is the real switch from prev to next */	\
		"movl %3,%%esp\n\t"	/* restore ESP */		\
		/* Record the address of label 1 in prev->thread.eip; */	\
		/* prev will restart there when it is resumed */	\
		"movl $1f,%1\n\t"	/* save EIP */			\
		/* Push next->thread.eip: the "return address" that */	\
		/* __switch_to()'s ret instruction will pop and use */	\
		"pushl %4\n\t"		/* restore EIP */		\
		/* Jump (not call) into __switch_to() for the rest */	\
		/* of the processing */					\
		"jmp __switch_to\n"					\
		"1:\t"							\
		/* Resumption point: restore the registers saved */	\
		/* when this task was last switched out */		\
		"popl %%ebp\n\t"					\
		"popl %%edi\n\t"					\
		"popl %%esi\n\t"					\
		:"=m" (prev->thread.esp),"=m" (prev->thread.eip),	\
		 "=b" (last)						\
		/* After the switch, the prev recovered from the */	\
		/* stack no longer names the task we actually came */	\
		/* from; %ebx carries that descriptor out via last */	\
		:"m" (next->thread.esp),"m" (next->thread.eip),		\
		 "a" (prev), "d" (next),				\
		 "b" (prev));						\
} while (0)

The switch can thus be viewed as two phases. The switch_to(prev,next,last) assembly above is the first phase: it saves a few critical registers and arranges the stack so that execution jumps into the new process. The second phase is started by the __switch_to() function (defined in arch/i386/kernel/process.c), which mainly saves and reloads the less critical registers, along with the I/O permission bitmap (ioperm).
void __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
	struct thread_struct *prev = &prev_p->thread,
				 *next = &next_p->thread;
	struct tss_struct *tss = init_tss + smp_processor_id();

	unlazy_fpu(prev_p);
	/*
	 * Run the code generated by the unlazy_fpu() macro: if prev
	 * used floating-point arithmetic, its FPU state is saved in
	 * task_struct::thread.
	 */

	/*
	 * Reload esp0, LDT and the page table pointer:
	 */
	tss->esp0 = next->esp0;
	/*
	 * Update the esp0 slot of this CPU's init_tss entry with the
	 * new process's esp0 (kept in task_struct::thread), so the
	 * next user-to-kernel transition uses next's kernel stack.
	 */

	/*
	 * Save away %fs and %gs. No need to save %es and %ds, as
	 * those are always kernel segments while inside the kernel.
	 */
	asm volatile("movl %%fs,%0":"=m" (*(int *)&prev->fs));
	asm volatile("movl %%gs,%0":"=m" (*(int *)&prev->gs));
	/*
	 * The current %fs and %gs are saved in the old process's
	 * task_struct::thread; they are then restored below from
	 * the new process's task_struct::thread.
	 */

	/*
	 * Restore %fs and %gs.
	 */
	loadsegment(fs, next->fs);	/* restore next's %fs and %gs */
	loadsegment(gs, next->gs);

	/*
	 * Now maybe reload the debug registers
	 */
	if (next->debugreg[7]){
		loaddebug(next, 0);
		loaddebug(next, 1);
		loaddebug(next, 2);
		loaddebug(next, 3);
		/* no 4 and 5 */
		loaddebug(next, 6);
		loaddebug(next, 7);
	}

	/* Update the I/O permission bitmap (ioperm). */
	if (prev->ioperm || next->ioperm) {
		if (next->ioperm) {
			/*
			 * 4 cachelines copy ... not good, but not that
			 * bad either. Anyone got something better?
			 * This only affects processes which use ioperm().
			 * [Putting the TSSs into 4k-tlb mapped regions
			 * and playing VM tricks to switch the IO bitmap
			 * is not really acceptable.]
			 */
			memcpy(tss->io_bitmap, next->io_bitmap,
				IO_BITMAP_SIZE*sizeof(unsigned long));
			tss->bitmap = IO_BITMAP_OFFSET;
		} else
			/*
			 * a bitmap offset pointing outside of the TSS limit
			 * causes a nicely controllable SIGSEGV if a process
			 * tries to use a port IO instruction. The first
			 * sys_ioperm() call sets up the bitmap properly.
			 */
			tss->bitmap = INVALID_IO_BITMAP_OFFSET;
	}
}
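Two of the helpers used above are worth expanding. The sketches below reflect their 2.4-era definitions (unlazy_fpu() from include/asm-i386/i387.h, loaddebug() from the i386 headers); consult the tree for the authoritative versions:

/* Only a task that actually touched the FPU since being scheduled
 * pays for an FPU state save: the "lazy FPU" optimization. */
#define unlazy_fpu(tsk) do {				\
	if ((tsk)->flags & PF_USEDFPU)			\
		save_init_fpu(tsk);			\
} while (0)

/* Write a saved value back into hardware debug register %dbN;
 * the register number is pasted into the mnemonic via #register. */
#define loaddebug(thread, register)			\
	__asm__("movl %0,%%db" #register		\
		: /* no output */			\
		: "r" ((thread)->debugreg[register]))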

When __switch_to() finishes and executes ret, the address on top of the stack is the new task's task_struct::thread.eip, i.e. the resume point recorded the last time that task was suspended (the label "1:" inside the switch_to macro). Execution therefore continues in the new task's context, and the process switch is complete.
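The one exception is a task that has never run at all: fork hand-builds the child's first saved context, so the same ret delivers it into ret_from_fork rather than to label "1:". Abridged from copy_thread() in arch/i386/kernel/process.c:

	p->thread.esp = (unsigned long) childregs;	/* top of the child's kernel stack  */
	p->thread.esp0 = (unsigned long) (childregs+1);	/* esp0 to install in the TSS       */
	p->thread.eip = (unsigned long) ret_from_fork;	/* the child's first "resume" point */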
 
Finally, consider how switch_to is invoked in kernel/sched.c:
asmlinkage void schedule(void)
{
	…
	switch_to(prev, next, prev);
	…
}

Passing prev as the third parameter, last, is one of the subtleties of the switch. When switch_to() is called to switch from process A to process B, A's flow of execution freezes. Later, when the kernel wants to reactivate A, it must suspend whatever process is running at that moment, call it C (generally not B, since several other switches will usually have happened in the meantime), and invoke switch_to() once more. When A's flow thaws at label "1:", its stack still says prev = A and next = B, so A has lost any handle on C. This is what the last parameter is for: the %ebx register, which the asm body never modifies and which survives the stack switch, carries C's descriptor across, and storing it into last (which in schedule() is the very variable prev) repairs the stale value, so that code running after the switch can still refer to the process it actually switched from.
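Concretely, in 2.4's schedule() the lines around the call read (abridged):

	switch_to(prev, next, prev);	/* on return, prev names the task */
	__schedule_tail(prev);		/* we actually came from          */

Without the last trick, __schedule_tail() would act on whatever prev happened to be when this stack frame was frozen, not on the process that just handed over the CPU.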

 
