Process Switching, Interrupts & Exceptions, and the System Call execve()


Process Switching

Each process has its own address space, but all processes share one set of CPU registers. Before suspending a process the kernel must therefore save its registers, and before resuming it the kernel must make sure every register is reloaded with the value it held at the moment the process was suspended.

The set of data that must be loaded into the registers before a process can resume execution is called the hardware context. In Linux, part of the hardware context is stored in the TSS segment, and the rest is stored in the kernel-mode stack. The layout of the registers saved on that stack is documented in entry.S:
/*
 * Stack layout in 'ret_from_system_call':
 * ptrace needs to have all regs on the stack.
 * if the order here is changed, it needs to be
 * updated in fork.c:copy_process, signal.c:do_signal,
 * ptrace.c and ptrace.h
 *
 *	 0(%esp) - %ebx
 *	 4(%esp) - %ecx
 *	 8(%esp) - %edx
 *	 C(%esp) - %esi
 *	10(%esp) - %edi
 *	14(%esp) - %ebp
 *	18(%esp) - %eax
 *	1C(%esp) - %ds
 *	20(%esp) - %es
 *	24(%esp) - orig_eax
 *	28(%esp) - %eip
 *	2C(%esp) - %cs
 *	30(%esp) - %eflags
 *	34(%esp) - %oldesp
 *	38(%esp) - %oldss
 *
 * "current" is in register %ebx during any slow entries
 */


The thread field
On every process switch, the hardware context of the process being replaced must be saved somewhere. Linux maintains one TSS per processor rather than one per process, so on a switch the kernel saves the outgoing process's hardware context in the thread field of its process descriptor (a thread_struct). This field holds most of the CPU register state, but not the general-purpose registers (eax, ebx, and so on), whose values are kept on the kernel stack.
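None of the kernel code above can run in user space, but the idea of parking a register context in a per-task structure and reloading it later can be sketched with the POSIX ucontext API. All names below (ctx_main, ctx_task, run_context_demo) are our own, not the kernel's; the two ucontext_t variables play the roles of prev->thread and next->thread:

```c
#include <ucontext.h>

/* Two saved contexts, standing in for prev->thread and next->thread. */
static ucontext_t ctx_main, ctx_task;
static int steps;                    /* records the interleaving */
static char task_stack[64 * 1024];   /* the other task's private stack */

static void task_body(void)
{
    steps = steps * 10 + 2;
    /* Save this task's registers into ctx_task, load ctx_main --
     * the user-space analogue of one switch_to(). */
    swapcontext(&ctx_task, &ctx_main);
    steps = steps * 10 + 4;
}

int run_context_demo(void)
{
    steps = 1;
    getcontext(&ctx_task);
    ctx_task.uc_stack.ss_sp = task_stack;
    ctx_task.uc_stack.ss_size = sizeof task_stack;
    ctx_task.uc_link = &ctx_main;        /* resume main when task_body returns */
    makecontext(&ctx_task, task_body, 0);

    swapcontext(&ctx_main, &ctx_task);   /* switch the task in... */
    steps = steps * 10 + 3;
    swapcontext(&ctx_main, &ctx_task);   /* ...and switch it in again */
    return steps;                        /* each resume picks up registers and
                                            stack exactly where they were saved */
}
```

Each swapcontext saves the full register set of the caller and restores that of the target, which is exactly the contract the kernel's switch_to/__switch_to pair implements by hand.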

The switch_to macro:

#define switch_to(prev,next,last) do {\
unsigned long esi,edi;\
asm volatile("pushfl\n\t"\
    "pushl %%ebp\n\t"\
    "movl %%esp,%0\n\t"/* save ESP */\
    "movl %5,%%esp\n\t"/* restore ESP */\
    "movl $1f,%1\n\t"/* save EIP */\
    "pushl %6\n\t"/* restore EIP */\
    "jmp __switch_to\n"\
    "1:\t" \
    "popl %%ebp\n\t"\
    "popfl" \
    :"=m" (prev->thread.esp),"=m" (prev->thread.eip),\
     "=a" (last),"=S" (esi),"=D" (edi)\
    :"m" (next->thread.esp),"m" (next->thread.eip),\
     "2" (prev), "d" (next));\
} while (0)

The output section lists five operands, i.e. five pieces of data that may change after this sequence executes. %0 and %1 are in memory: prev->thread.esp and prev->thread.eip. %2 is bound to register EAX and corresponds to the parameter last, while %3 and %4 are bound to ESI and EDI. The input section has four operands: %5 and %6 are in memory, next->thread.esp and next->thread.eip; %7 uses the matching constraint "2", so it shares %2's register (EAX) and corresponds to prev; and %8 is bound to EDX and corresponds to next.
Look first at the pushfl/pushl %%ebp pair at the beginning and the matching popl %%ebp/popfl at the end. They seem unremarkable, but there is some subtlety here. The instruction movl %%esp,%0 stores the current ESP, i.e. the kernel-stack pointer of the outgoing process prev, into prev->thread.esp; movl %5,%%esp then loads ESP with next->thread.esp, the kernel-stack pointer of the process next that is about to run. Between these two instructions the CPU switches stacks. Suppose we have two processes A and B, with prev pointing to A and next pointing to B: in this switch A is being switched out and B switched in. Then everything up to and including movl %%esp,%0 runs on A's stack, and everything from movl %5,%%esp onward runs on B's stack. In other words, from that point on, the "current process" is already B, not A. Recall that the pointer current used in kernel code to reach the current process's task_struct is really a macro that computes the address from the current stack pointer ESP; if current were evaluated right after the stack switch, it would already yield B's task_struct. In that sense the process switch is complete as soon as the stack pointer is swapped.
But a process is also defined by its flow of execution, and that part of the switch is not done yet. Why does the macro push eflags and %ebp onto A's stack at the start, yet pop them from B's stack at the end? That is the trick: the pops after label 1 restore what the newly switched-in process pushed the last time it was switched out. How, then, is the switch of execution itself accomplished? movl $1f,%1 stores the address of label 1, which is the address of the popl instruction, into prev->thread.eip, as the "return" address at which A will resume the next time it is scheduled. Then pushl %6 pushes next->thread.eip onto the stack; this value is precisely what B stored in its own thread.eip the last time it was switched out, so it too points at label 1, the popl instruction.
Next, jmp __switch_to transfers control to the function __switch_to() with a jmp, not a call. Whatever __switch_to() does, when the CPU reaches its ret instruction, the value most recently pushed, next->thread.eip, serves as the return address, and that is the address of label 1. Since every process executes movl $1f,%1 when it is switched out, every process resumes at label 1 when it is scheduled back in. There is one exception: a newly created process. It never executed this path "the last time it was switched out", so its thread.eip must be set up in advance, and the "return address" chosen need not be label 1 at all; it depends on how its kernel stack was set up. As seen in the discussion of fork(), this address is set in copy_thread() (see arch/i386/kernel/process.c) to ret_from_fork, whose code is in entry.S:


ENTRY(ret_from_fork)
pushl %eax
call schedule_tail
GET_THREAD_INFO(%ebp)
popl %eax
jmp syscall_exit


syscall_exit:
cli # make sure we don't miss an interrupt
# setting need_resched or sigpending
# between sampling and the iret
movl TI_flags(%ebp), %ecx
testw $_TIF_ALLWORK_MASK, %cx	# current->work
jne syscall_exit_work

syscall_exit_work:
testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP), %cl
jz work_pending
sti # could let do_syscall_trace() call
# schedule() instead
movl %esp, %eax
movl $1, %edx
call do_syscall_trace
jmp resume_userspace
/*
 * Return to user mode is not as complex as all this looks,
 * but we want the default path for a system call return to
 * go as quickly as possible which is why some of this is
 * less clear than it otherwise should be.
 */
That is, the path eventually returns from kernel space to user space.
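The branching that syscall_exit and syscall_exit_work perform on current's work flags can be sketched in C. The bit positions below are hypothetical stand-ins, not the kernel's actual TIF_* values, and pick_exit_path is a name of our own:

```c
/* Hypothetical flag bits; the kernel defines the real ones in thread_info.h. */
#define TIF_SYSCALL_TRACE  (1u << 0)
#define TIF_SIGPENDING     (1u << 1)
#define TIF_NEED_RESCHED   (1u << 2)
#define TIF_SINGLESTEP     (1u << 3)

#define _TIF_ALLWORK_MASK  (TIF_SYSCALL_TRACE | TIF_SIGPENDING | \
                            TIF_NEED_RESCHED | TIF_SINGLESTEP)

enum exit_path { EXIT_FAST, EXIT_TRACE, EXIT_WORK_PENDING };

/* Mirrors: "testw $_TIF_ALLWORK_MASK, %cx / jne syscall_exit_work",
 * then "testb $(_TIF_SYSCALL_TRACE|...), %cl / jz work_pending". */
enum exit_path pick_exit_path(unsigned int flags)
{
    if (!(flags & _TIF_ALLWORK_MASK))
        return EXIT_FAST;            /* restore registers, iret to user space */
    if (flags & (TIF_SYSCALL_TRACE | TIF_SINGLESTEP))
        return EXIT_TRACE;           /* call do_syscall_trace() */
    return EXIT_WORK_PENDING;        /* pending signals or reschedule */
}
```

The common case, no flags set, falls straight through to the register restore, which is why the assembly tests the whole mask in one instruction before looking at individual bits.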

After fork(), the parent process does not immediately call schedule(); it merely sets the need_resched flag in its task_struct to 1 and then returns from do_fork() and sys_fork(). On the way out through entry.S, via ret_from_sys_call to ret_with_reschedule, if need_resched in its task_struct were 0 it would simply return: its stack pointer already points at the saved register frame regs, so RESTORE_ALL takes the process back to user space (see Chapter 3 of the original text). But here need_resched is already 1, so schedule() must be called, and the stack therefore grows downward again. If the outcome of scheduling is to keep running, schedule() returns at once, as if nothing had happened. If another process is scheduled instead, the parent's kernel stack ends up as shown in Figure 4.5 of the original text: at the "top" of the stack sits the point at which the process will resume the next time it is scheduled in, namely the label-1 address stored by movl $1f,%1 in switch_to. Note that switch_to is a macro, not a function, so the stack holds no return address for switch_to itself. Later, when the parent is scheduled back in, switch_to restores its stack pointer, and the ret executed inside __switch_to() "returns" to label 1;
so this stack slot can also be viewed as "the return address from __switch_to()". The parent finally returns into entry.S (line 289 in the original listing) and from there jumps to ret_from_sys_call. The child, by contrast, has its "return address" set to ret_from_fork, so as soon as __switch_to() executes its ret instruction, it returns there.
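The save-EIP-at-label-1 trick in switch_to has a rough user-space analogue in setjmp/longjmp. This is only a sketch: hypothetical_schedule and run_switch_demo are invented names, and the jmp_buf plays the role of prev->thread.eip:

```c
#include <setjmp.h>

/* resume_point plays the role of prev->thread.eip: it records where
 * execution continues the next time this "task" is entered. */
static jmp_buf resume_point;
static int trace;

/* Stands in for the "jmp __switch_to ... ret" pair: control leaves
 * and later "returns" to the saved label. */
static void hypothetical_schedule(void)
{
    trace = trace * 10 + 2;
    longjmp(resume_point, 1);   /* like ret popping next->thread.eip */
}

int run_switch_demo(void)
{
    trace = 1;
    /* setjmp here == "movl $1f,%1": record the resume point, fall through */
    if (setjmp(resume_point) == 0)
        hypothetical_schedule();        /* == "jmp __switch_to" */
    /* "label 1": execution resumes here on the longjmp, like popl/popfl */
    trace = trace * 10 + 3;
    return trace;
}
```

Like the kernel's label 1, the code after setjmp runs twice in spirit: once on the fall-through, and once when control "returns" to the saved point; the kernel avoids the double execution because ret jumps directly to the label rather than re-entering the save path.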
The jmp __switch_to in the assembly transfers control to the function __switch_to(), whose main job is handling the TSS:
/*
 * Note that the .io_bitmap member must be extra-big. This is because
 * the CPU will access an additional byte beyond the end of the IO
 * permission bitmap. The extra byte must be all 1 bits, and must
 * be within the limit.
 */
#define INIT_TSS  { \
	.esp0		= sizeof(init_stack) + (long)&init_stack, \
	.ss0		= __KERNEL_DS, \
	.ss1		= __KERNEL_CS, \
	.ldt		= GDT_ENTRY_LDT, \
	.io_bitmap_base	= INVALID_IO_BITMAP_OFFSET, \
	.io_bitmap	= { [0 ... IO_BITMAP_LONGS] = ~0 }, \
}

DEFINE_PER_CPU(struct tss_struct, init_tss) ____cacheline_maxaligned_in_smp = INIT_TSS;
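The `[0 ... IO_BITMAP_LONGS] = ~0` inside INIT_TSS is a GCC range designator: it sets every element, including the extra trailing long that supplies the all-ones byte the CPU reads past the end of the bitmap. A minimal sketch, with a stand-in value for IO_BITMAP_LONGS (the kernel derives the real one from the 65536-port I/O space):

```c
#define IO_BITMAP_LONGS 32   /* hypothetical stand-in, not the kernel's value */

/* GCC range designator fills indices 0..IO_BITMAP_LONGS inclusive,
 * i.e. IO_BITMAP_LONGS + 1 elements, all bits set (all ports denied). */
static unsigned long demo_io_bitmap[IO_BITMAP_LONGS + 1] = {
    [0 ... IO_BITMAP_LONGS] = ~0UL
};

int bitmap_all_ones(void)
{
    for (int i = 0; i <= IO_BITMAP_LONGS; i++)
        if (demo_io_bitmap[i] != ~0UL)
            return 0;
    return 1;
}
```

Without the designator, only explicitly listed elements would be non-zero; with it, the whole bitmap starts out denying every port, which is the safe default.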

struct tss_struct {
	unsigned short	back_link, __blh;
	unsigned long	esp0;
	unsigned short	ss0, __ss0h;
	unsigned long	esp1;
	unsigned short	ss1, __ss1h;	/* ss1 is used to cache MSR_IA32_SYSENTER_CS */
	unsigned long	esp2;
	unsigned short	ss2, __ss2h;
	unsigned long	__cr3;
	unsigned long	eip;
	unsigned long	eflags;
	unsigned long	eax, ecx, edx, ebx;
	unsigned long	esp;
	unsigned long	ebp;
	unsigned long	esi;
	unsigned long	edi;
	unsigned short	es, __esh;
	unsigned short	cs, __csh;
	unsigned short	ss, __ssh;
	unsigned short	ds, __dsh;
	unsigned short	fs, __fsh;
	unsigned short	gs, __gsh;
	unsigned short	ldt, __ldth;
	unsigned short	trace, io_bitmap_base;
	/*
	 * The extra 1 is there because the CPU will access an
	 * additional byte beyond the end of the IO permission
	 * bitmap. The extra byte must be all 1 bits, and must
	 * be within the limit.
	 */
	unsigned long	io_bitmap[IO_BITMAP_LONGS + 1];
	/*
	 * Cache the current maximum and the last task that used the bitmap:
	 */
	unsigned long	io_bitmap_max;
	struct thread_struct *io_bitmap_owner;
	/*
	 * pads the TSS to be cacheline-aligned (size is 0x100)
	 */
	unsigned long	__cacheline_filler[35];
	/*
	 * .. and then another 0x100 bytes for emergency kernel stack
	 */
	unsigned long	stack[64];
} __attribute__((packed));
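The __attribute__((packed)) matters because the TSS layout is defined by the hardware byte-for-byte, so the compiler must not insert padding between the mixed short and long fields. A small slice of the same layout shows the effect (struct tss_slice is our own illustrative type, not the kernel's):

```c
#include <stddef.h>

/* A slice of a TSS-like layout. With packed, back_link/__blh occupy
 * bytes 0-3 and esp0 begins at byte 4, matching the hardware TSS,
 * regardless of what alignment unsigned long would normally demand. */
struct tss_slice {
    unsigned short back_link, __blh;
    unsigned long  esp0;
    unsigned short ss0, __ss0h;
} __attribute__((packed));
```

On an LP64 build an unpacked version of this struct would align esp0 to an 8-byte boundary and grow with padding; packing pins every field to the offset the CPU expects when it loads esp0/ss0 on a ring transition.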

What this call accomplishes, in arch/i386/kernel/process.c:
/*
 * switch_to(x,y) should switch tasks from x to y.
 *
 * We fsave/fwait so that an exception goes off at the right time
 * (as a call from the fsave or fwait in effect) rather than to
 * the wrong process. Lazy FP saving no longer makes any sense
 * with modern CPU's, and this simplifies a lot of things (SMP
 * and UP become the same).
 *
 * NOTE! We used to use the x86 hardware context switching. The
 * reason for not using it any more becomes apparent when you
 * try to recover gracefully from saved state that is no longer
 * valid (stale segment register values in particular). With the
 * hardware task-switch, there is no way to fix up bad state in
 * a reasonable manner.
 *
 * The fact that Intel documents the hardware task-switching to
 * be slow is a fairly red herring - this code is not noticeably
 * faster. However, there _is_ some room for improvement here,
 * so the performance issues may eventually be a valid point.
 * More important, however, is the fact that this allows us much
 * more flexibility.
 *
 * The return value (in %eax) will be the "prev" task after
 * the task-switch, and shows up in ret_from_fork in entry.S,
 * for example.
 */
struct task_struct fastcall * __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
	struct thread_struct *prev = &prev_p->thread,
			     *next = &next_p->thread;
	int cpu = smp_processor_id();
	struct tss_struct *tss = &per_cpu(init_tss, cpu);

	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */

	__unlazy_fpu(prev_p);

	/*
	 * Reload esp0, LDT and the page table pointer:
	 */
	load_esp0(tss, next);

	/*
	 * Load the per-thread Thread-Local Storage descriptor.
	 */
	load_TLS(next, cpu);

	/*
	 * Save away %fs and %gs. No need to save %es and %ds, as
	 * those are always kernel segments while inside the kernel.
	 */
	asm volatile("movl %%fs,%0":"=m" (*(int *)&prev->fs));
	asm volatile("movl %%gs,%0":"=m" (*(int *)&prev->gs));

	/*
	 * Restore %fs and %gs if needed.
	 */
	if (unlikely(prev->fs | prev->gs | next->fs | next->gs)) {
		loadsegment(fs, next->fs);
		loadsegment(gs, next->gs);
	}

	/*
	 * Now maybe reload the debug registers
	 */
	if (unlikely(next->debugreg[7])) {
		loaddebug(next, 0);
		loaddebug(next, 1);
		loaddebug(next, 2);
		loaddebug(next, 3);
		/* no 4 and 5 */
		loaddebug(next, 6);
		loaddebug(next, 7);
	}

	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr))
		handle_io_bitmap(next, tss);

	return prev_p;
}





