Linux Kernel Analysis, Part 5: How the System Call Entry (system_call) Executes


Author: Yao Kaijian (姚开健)

Original work. Please credit the source when reposting.

"Linux Kernel Analysis" MOOC course: http://mooc.study.163.com/course/USTC-1000029000

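When a user program makes a system call, it executes int 0x80 and the CPU jumps to system_call. This entry address is registered during system initialization, when trap_init() is called, so every later system call lands at system_call. The system_call assembly routine lives in arch/x86/kernel/entry_32.S (taking 32-bit x86 as the example). Here is part of its code: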

ENTRY(system_call)
    RING0_INT_FRAME                 # can't unwind into user space anyway
    ASM_CLAC
    pushl_cfi %eax                  # save orig_eax
    SAVE_ALL
    GET_THREAD_INFO(%ebp)
                                    # system call tracing in operation / emulation
    testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
    jnz syscall_trace_entry
    cmpl $(NR_syscalls), %eax
    jae syscall_badsys
syscall_call:
    call *sys_call_table(,%eax,4)
syscall_after_call:
    movl %eax,PT_EAX(%esp)          # store the return value
syscall_exit:
    LOCKDEP_SYS_EXIT
    DISABLE_INTERRUPTS(CLBR_ANY)    # make sure we don't miss an interrupt
                                    # setting need_resched or sigpending
                                    # between sampling and the iret
    TRACE_IRQS_OFF
    movl TI_flags(%ebp), %ecx
    testl $_TIF_ALLWORK_MASK, %ecx  # current->work
    jne syscall_exit_work

restore_all:
    TRACE_IRQS_IRET
restore_all_notrace:
#ifdef CONFIG_X86_ESPFIX32
    movl PT_EFLAGS(%esp), %eax      # mix EFLAGS, SS and CS
    # Warning: PT_OLDSS(%esp) contains the wrong/random values if we
    # are returning to the kernel.
    # See comments in process.c:copy_thread() for details.
    movb PT_OLDSS(%esp), %ah
    movb PT_CS(%esp), %al
    andl $(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
    cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
    CFI_REMEMBER_STATE
    je ldt_ss                       # returning to user-space with LDT SS
#endif
restore_nocheck:
    RESTORE_REGS 4                  # skip orig_eax/error_code
irq_return:
    INTERRUPT_RETURN
.section .fixup,"ax"
ENTRY(iret_exc)
    pushl $0                        # no error code
    pushl $do_iret_error
    jmp error_code
.previous
    _ASM_EXTABLE(irq_return,iret_exc)

#ifdef CONFIG_X86_ESPFIX32
    CFI_RESTORE_STATE
ldt_ss:
#ifdef CONFIG_PARAVIRT
    /*
     * The kernel can't run on a non-flat stack if paravirt mode
     * is active.  Rather than try to fixup the high bits of
     * ESP, bypass this code entirely.  This may break DOSemu
     * and/or Wine support in a paravirt VM, although the option
     * is still available to implement the setting of the high
     * 16-bits in the INTERRUPT_RETURN paravirt-op.
     */
    cmpl $0, pv_info+PARAVIRT_enabled
    jne restore_nocheck
#endif

/*
 * Setup and switch to ESPFIX stack
 *
 * We're returning to userspace with a 16 bit stack. The CPU will not
 * restore the high word of ESP for us on executing iret... This is an
 * "official" bug of all the x86-compatible CPUs, which we can work
 * around to make dosemu and wine happy. We do this by preloading the
 * high word of ESP with the high word of the userspace ESP while
 * compensating for the offset by changing to the ESPFIX segment with
 * a base address that matches for the difference.
 */
#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
    mov %esp, %edx                  /* load kernel esp */
    mov PT_OLDESP(%esp), %eax       /* load userspace esp */
    mov %dx, %ax                    /* eax: new kernel esp */
    sub %eax, %edx                  /* offset (low word is 0) */
    shr $16, %edx
    mov %dl, GDT_ESPFIX_SS + 4      /* bits 16..23 */
    mov %dh, GDT_ESPFIX_SS + 7      /* bits 24..31 */
    pushl_cfi $__ESPFIX_SS
    pushl_cfi %eax                  /* new kernel esp */
    /* Disable interrupts, but do not irqtrace this section: we
     * will soon execute iret and the tracer was already set to
     * the irqstate after the iret */
    DISABLE_INTERRUPTS(CLBR_EAX)
    lss (%esp), %esp                /* switch to espfix segment */
    CFI_ADJUST_CFA_OFFSET -8
    jmp restore_nocheck
#endif
    CFI_ENDPROC
ENDPROC(system_call)

    # perform work that needs to be done immediately before resumption
    ALIGN
    RING0_PTREGS_FRAME              # can't unwind into user space anyway
work_pending:
    testb $_TIF_NEED_RESCHED, %cl
    jz work_notifysig
work_resched:
    call schedule
    LOCKDEP_SYS_EXIT
    DISABLE_INTERRUPTS(CLBR_ANY)    # make sure we don't miss an interrupt
                                    # setting need_resched or sigpending
                                    # between sampling and the iret
    TRACE_IRQS_OFF
    movl TI_flags(%ebp), %ecx
    andl $_TIF_WORK_MASK, %ecx      # is there any work to be done other
                                    # than syscall tracing?
    jz restore_all
    testb $_TIF_NEED_RESCHED, %cl
    jnz work_resched

work_notifysig:                     # deal with pending signals and
                                    # notify-resume requests
#ifdef CONFIG_VM86
    testl $X86_EFLAGS_VM, PT_EFLAGS(%esp)
    movl %esp, %eax
    jne work_notifysig_v86          # returning to kernel-space or
                                    # vm86-space
1:
#else
    movl %esp, %eax
#endif
    TRACE_IRQS_ON
    ENABLE_INTERRUPTS(CLBR_NONE)
    movb PT_CS(%esp), %bl
    andb $SEGMENT_RPL_MASK, %bl
    cmpb $USER_RPL, %bl
    jb resume_kernel
    xorl %edx, %edx
    call do_notify_resume
    jmp resume_userspace

#ifdef CONFIG_VM86
    ALIGN
work_notifysig_v86:
    pushl_cfi %ecx                  # save ti_flags for do_notify_resume
    call save_v86_state             # %eax contains pt_regs pointer
    popl_cfi %ecx
    movl %eax, %esp
    jmp 1b
#endif
END(work_pending)

    # perform syscall exit tracing
    ALIGN
syscall_trace_entry:
    movl $-ENOSYS,PT_EAX(%esp)
    movl %esp, %eax
    call syscall_trace_enter
    /* What it returned is what we'll actually use.  */
    cmpl $(NR_syscalls), %eax
    jnae syscall_call
    jmp syscall_exit
END(syscall_trace_entry)

    # perform syscall exit tracing
    ALIGN
syscall_exit_work:
    testl $_TIF_WORK_SYSCALL_EXIT, %ecx
    jz work_pending
    TRACE_IRQS_ON
    ENABLE_INTERRUPTS(CLBR_ANY)     # could let syscall_trace_leave() call
                                    # schedule() instead
    movl %esp, %eax
    call syscall_trace_leave
    jmp resume_userspace
END(syscall_exit_work)

There is a lot of code here, so I have condensed its main execution path into a flowchart:

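As you can see, on entry to system_call the kernel first saves the context: it pushes the original eax and runs the SAVE_ALL macro. It then checks that the system call number in eax is below NR_syscalls and, at syscall_call, calls the matching service routine through sys_call_table. The return value is stored back into the saved eax slot, and execution reaches syscall_exit. At this point, just before leaving, the kernel disables interrupts and tests the thread flags against _TIF_ALLWORK_MASK to decide whether any work is pending, such as a reschedule request or a signal. If nothing is pending, it goes straight to restore_all, restores the saved registers and executes irq_return (INTERRUPT_RETURN), which returns to the point in user space where the system call was made.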
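The entry and dispatch steps above can be sketched as a small user-space model in C. This is only an illustration under stated assumptions: `struct demo_regs`, `demo_call_table`, `demo_add`, `demo_max` and `demo_system_call` are hypothetical names invented here, not kernel symbols, and the `syscall_badsys` path (not shown in the listing) is modelled as simply reporting `-ENOSYS`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the register frame that SAVE_ALL builds on the stack. */
struct demo_regs {
    long ax;          /* system call number on entry, return value on exit */
    long orig_ax;     /* copy of the number, like "pushl_cfi %eax" */
    long bx, cx, dx;  /* first three arguments */
};

typedef long (*demo_handler_t)(long, long, long);

/* Two toy service routines (assumptions for illustration only). */
static long demo_add(long a, long b, long c) { (void)c; return a + b; }
static long demo_max(long a, long b, long c) { (void)c; return a > b ? a : b; }

/* Hypothetical table playing the role of sys_call_table. */
static const demo_handler_t demo_call_table[] = { demo_add, demo_max };
#define DEMO_NR_SYSCALLS (sizeof(demo_call_table) / sizeof(demo_call_table[0]))

/* Mirrors the path from system_call to syscall_after_call. */
static void demo_system_call(struct demo_regs *regs)
{
    assert(regs != NULL);

    /* pushl_cfi %eax: remember the original system call number. */
    regs->orig_ax = regs->ax;

    /* cmpl $(NR_syscalls), %eax; jae syscall_badsys (unsigned comparison). */
    if ((unsigned long)regs->ax >= DEMO_NR_SYSCALLS) {
        regs->ax = -ENOSYS;
        return;
    }

    /* call *sys_call_table(,%eax,4), then movl %eax,PT_EAX(%esp). */
    regs->ax = demo_call_table[regs->ax](regs->bx, regs->cx, regs->dx);
}
```

Calling `demo_system_call` with `ax = 0, bx = 2, cx = 3` leaves 5 in `ax`, while an out-of-range number such as 99 leaves `-ENOSYS`. The unsigned comparison matters: like `jae` in the assembly, it rejects negative numbers as well as numbers that are too large.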

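If work is pending, control goes to syscall_exit_work and from there to work_pending, which distinguishes two cases: work_resched and work_notifysig. If the current process needs to be rescheduled, work_resched executes call schedule; if a signal has to be handled, work_notifysig takes care of it by calling do_notify_resume. Once no work remains, control reaches restore_all and the system call returns. That is roughly how a system call executes at the assembly level.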
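The exit-time checks can be modelled as a loop. The following is a minimal sketch, not kernel code: the `MODEL_*` flags and the `model_*` functions are hypothetical stand-ins for `_TIF_NEED_RESCHED`, `_TIF_SIGPENDING`, `schedule` and `do_notify_resume`, syscall tracing is left out, and each stand-in simply clears its own flag.

```c
#include <assert.h>

/* Hypothetical flag bits standing in for _TIF_NEED_RESCHED and _TIF_SIGPENDING. */
#define MODEL_NEED_RESCHED 0x1u
#define MODEL_SIGPENDING   0x2u
#define MODEL_WORK_MASK    (MODEL_NEED_RESCHED | MODEL_SIGPENDING)

struct model_task {
    unsigned int flags;   /* plays the role of TI_flags(%ebp) */
    int schedule_calls;   /* how often the "call schedule" step ran */
    int signal_calls;     /* how often the do_notify_resume step ran */
};

/* Stand-in for schedule(): in this model it only clears the request flag. */
static void model_schedule(struct model_task *t)
{
    t->schedule_calls++;
    t->flags &= ~MODEL_NEED_RESCHED;
}

/* Stand-in for do_notify_resume(): in this model it only clears the flag. */
static void model_do_notify_resume(struct model_task *t)
{
    t->signal_calls++;
    t->flags &= ~MODEL_SIGPENDING;
}

/* Mirrors syscall_exit .. restore_all: loop until no work is pending. */
static void model_syscall_exit(struct model_task *t)
{
    /* The flags are re-read on every pass, as the assembly reloads %ecx. */
    while (t->flags & MODEL_WORK_MASK) {
        if (t->flags & MODEL_NEED_RESCHED)
            model_schedule(t);           /* work_resched */
        else
            model_do_notify_resume(t);   /* work_notifysig */
    }
    /* restore_all: RESTORE_REGS and INTERRUPT_RETURN would follow here. */
    assert((t->flags & MODEL_WORK_MASK) == 0);
}
```

Rescheduling is checked before signals because that is the order of the tests in work_pending. Re-reading the flags on each pass reflects why the assembly disables interrupts before sampling them: new work may have arrived while the previous item was being handled.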

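Having seen how a system call is handled, and recalling the context switches in process scheduling and interrupt handling covered earlier, we can state a general pattern. Whenever the system switches to another process, an interrupt handler or a system call service routine, it first saves the current context, then jumps to the target. When the target finishes, the system restores the context and returns to the point where the call occurred. If another interrupt, reschedule or system call has to be handled while the service routine is running, the same procedure repeats until control returns to the original call site. This is the principle behind the system call handler.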
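From user space, the whole save, dispatch and restore sequence looks like an ordinary function call that returns to its call site. A minimal sketch, assuming a Linux system with glibc: the generic `syscall()` wrapper passes the system call number to the kernel (on the 32-bit x86 setup discussed here that means loading it into eax and trapping, for example with int 0x80; other configurations use other entry instructions). `raw_getpid` is a hypothetical helper name chosen for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke getpid by number through libc's generic syscall() wrapper. */
static pid_t raw_getpid(void)
{
    long ret = syscall(SYS_getpid);  /* enters the kernel and returns here */
    assert(ret > 0);                 /* getpid always succeeds */
    return (pid_t)ret;
}
```

The value returned by `raw_getpid()` should equal the one from the ordinary `getpid()` wrapper, since both end up in the same kernel service routine.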
