Kernel command using Linux system calls

http://linux.chinaunix.net/techdoc/system/2009/07/20/1125219.shtml

A system call is an interface between a user-space application and a service that the kernel provides. Because the service is provided in the kernel, a direct call cannot be performed; instead, you must use a process of crossing the user-space/kernel boundary. The way you do this differs based on the particular architecture. For this reason, I'll stick to the most common architecture, i386.
In this article, I explore the Linux system call interface (SCI), demonstrate adding a system call to the 2.6.20 kernel, and then use this function from user-space. I also investigate some of the functions that you'll find useful for system call development and alternatives to system calls. Finally, I look at some of the ancillary mechanisms related to system calls, such as tracing their usage from a given process.
The SCI
The implementation of system calls in Linux varies with the architecture, but it can also differ within a given architecture. For example, older x86 processors used an interrupt mechanism to migrate from user-space to kernel-space, but newer IA-32 processors provide instructions that optimize this transition (the sysenter and sysexit instructions). Because so many options exist and the end result is so complicated, I'll stick to a surface-level discussion of the interface details. See the Resources at the end of this article for the gory details.
You needn't fully understand the internals of the SCI to amend it, so I explore a simple version of the system call process (see Figure 1). Each system call is multiplexed into the kernel through a single entry point. The eax register is used to identify the particular system call that should be invoked, which is specified in the C library (per the call from the user-space application). When the C library has loaded the system call index and any arguments, a software interrupt is invoked (interrupt 0x80), which results in execution (through the interrupt handler) of the system_call function. This function handles all system calls, as identified by the contents of eax. After a few simple tests, the actual system call is invoked using sys_call_table and the index contained in eax. Upon return from the system call, syscall_exit is eventually reached, and a call to resume_userspace transitions back to user-space. Execution resumes in the C library, which then returns to the user application.
Figure 1. The simplified flow of a system call using the interrupt method


At the core of the SCI is the system call demultiplexing table. This table, shown in Figure 2, uses the index provided in eax to identify which system call to invoke from the table (sys_call_table). A sample of the contents of this table and the locations of these entities is also shown. (For more about demultiplexing, see the sidebar, "System call demultiplexing.")
Figure 2. The system call table and various linkages


Adding a Linux system call
System call demultiplexing
Some system calls are further demultiplexed by the kernel. For example, the Berkeley Software Distribution (BSD) socket calls (socket, bind, connect, and so on) are associated with a single system call index (__NR_socketcall) but are demultiplexed in the kernel to the appropriate call through another argument. See the sys_socketcall function in ./linux/net/socket.c.
Adding a new system call is mostly procedural, although you should look out for a few things. This section walks through the construction of a few system calls to demonstrate their implementation and use by a user-space application.
You perform three basic steps to add a new system call to the kernel:
  • Add the new function.
  • Update the header files.
  • Update the system call table for the new function.
    Note: This process ignores user-space needs, which I address later.
    Most often, you create a new file for your functions. However, for the sake of simplicity, I add my new functions to an existing source file. The first two functions, shown in Listing 1, are simple examples of a system call. Listing 2 provides a slightly more complicated function that uses pointer arguments.
    Listing 1. Simple kernel functions for the system call example
                   
    asmlinkage long sys_getjiffies( void )
    {
      return (long)get_jiffies_64();
    }
    asmlinkage long sys_diffjiffies( long ujiffies )
    {
      return (long)get_jiffies_64() - ujiffies;
    }
    In Listing 1, two functions are provided for jiffies monitoring. (For more information about jiffies, see the sidebar, "Kernel jiffies.") The first function returns the current jiffies, while the second returns the difference between the current value and the value that the caller passes in. Note the use of the asmlinkage modifier. This macro (defined in linux/include/asm-i386/linkage.h) tells the compiler to pass all function arguments on the stack.
    Listing 2. Final kernel function for the system call example
                   
    asmlinkage long sys_pdiffjiffies( long ujiffies,
                                      long __user *presult )
    {
      long cur_jiffies = (long)get_jiffies_64();
      long result;
      int  err = 0;
      if (presult) {
        result = cur_jiffies - ujiffies;
        err = put_user( result, presult );
      }
      return err ? -EFAULT : 0;
    }
    Kernel jiffies
    The Linux kernel maintains a global variable called jiffies, which represents the number of timer ticks since the machine started. This variable is initialized to zero and increments on each timer interrupt. You can read jiffies with the get_jiffies_64 function, and then convert this value to milliseconds (msec) with jiffies_to_msecs or to microseconds (usec) with jiffies_to_usecs. The jiffies global variable and its associated functions are provided in ./linux/include/linux/jiffies.h.
    Listing 2 provides the third function. This function takes two arguments: a long and a pointer to a long that's defined as __user. The __user macro simply tells the compiler (through noderef) that the pointer should not be dereferenced (as it's not meaningful in the current address space). This function calculates the difference between two jiffies values, and then provides the result to the user through a user-space pointer. The put_user function places the result value into user-space at the location that presult specifies. If an error occurs during this operation, it will be returned, and you'll likewise notify the user-space caller.
    For step 2, I update the header files to make room for the new functions in the system call table. For this, I update the header file linux/include/asm/unistd.h with the new system call numbers. The updates (the three new __NR_ entries and the adjusted NR_syscalls count) are shown in Listing 3.
    Listing 3. Updates to unistd.h to make room for the new system calls
                   
    #define __NR_getcpu             318
    #define __NR_epoll_pwait        319
    #define __NR_getjiffies         320
    #define __NR_diffjiffies        321
    #define __NR_pdiffjiffies       322
    #define NR_syscalls             323
    Now I have my kernel system calls and numbers to represent them. All I need to do now is draw an equivalence between these numbers (table indexes) and the functions themselves. This is step 3, updating the system call table. As shown in Listing 4, I update the file linux/arch/i386/kernel/syscall_table.S for the new functions that will populate the particular indexes shown in Listing 3.
    Listing 4. Update the system call table with the new functions
                   
    .long sys_getcpu
    .long sys_epoll_pwait
    .long sys_getjiffies            /* 320 */
    .long sys_diffjiffies
    .long sys_pdiffjiffies
    Note: The size of this table is defined by the symbolic constant NR_syscalls.
    At this point, the kernel is updated. I must recompile the kernel and make the new image available for booting before testing the user-space application.
    Reading and writing user memory
    The Linux kernel provides several functions that you can use to move system call arguments to and from user-space. Options include simple functions for basic types (such as get_user or put_user). For moving blocks of data such as structures or arrays, you can use another set of functions: copy_from_user and copy_to_user. Null-terminated strings have their own calls: strncpy_from_user and strlen_user. You can also test whether a user-space pointer is valid through a call to access_ok. These functions are defined in linux/include/asm/uaccess.h.
    You use the access_ok macro to validate a user-space pointer for a given operation. This macro takes the type of access (VERIFY_READ or VERIFY_WRITE), the pointer to the user-space memory block, and the size of the block (in bytes). It returns nonzero (true) if the block may be valid and zero if it is definitely invalid:
    int access_ok( type, address, size );
    Moving simple types between the kernel and user-space (such as ints or longs) is accomplished easily with get_user and put_user. These macros each take a value and a pointer to a variable. The get_user function moves the value that the user-space address specifies (ptr) into the kernel variable specified (var). The put_user function moves the value that the kernel variable (var) specifies into the user-space address (ptr). The functions return zero on success:
    int get_user( var, ptr );
    int put_user( var, ptr );
    To move larger objects, such as structures or arrays, you can use the copy_from_user and copy_to_user functions. These functions move an entire block of data between user-space and the kernel. The copy_from_user function moves a block of data from user-space into kernel-space, and copy_to_user moves a block of data from the kernel into user-space. Both return the number of bytes that could not be copied, so zero means success:
    unsigned long copy_from_user( void *to, const void __user *from, unsigned long n );
    unsigned long copy_to_user( void __user *to, const void *from, unsigned long n );
    Finally, you can copy a null-terminated string from user-space to the kernel by using the strncpy_from_user function. Before calling this function, you can get the size of the user-space string with a call to the strlen_user macro:
    long strncpy_from_user( char *dst, const char __user *src, long count );
    strlen_user( str );
    These functions provide the basics for memory movement between the kernel and user-space. Some additional functions exist (such as those that reduce the amount of checking performed). You can find these functions in uaccess.h.
    Using the system call
    Now that the kernel is updated with a few new system calls, let's look at what's necessary to use them from a user-space application. There are two ways that you can use new kernel system calls. The first is a convenience method (not something that you'd probably want to do in production code), and the second is the traditional method that requires a bit more work.
    With the first method, you call your new functions as identified by their index through the syscall function. With the syscall function, you can call a system call by specifying its call index and a set of arguments. For example, the short application shown in Listing 5 calls your sys_getjiffies using its index.
    Listing 5. Using syscall to invoke a system call
                   
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #define __NR_getjiffies                320
    int main()
    {
      long jiffies;
      jiffies = syscall( __NR_getjiffies );
      printf( "Current jiffies is %lx\n", jiffies );
      return 0;
    }
    As you can see, the syscall function includes as its first argument the index of the system call table to use. Had there been any arguments to pass, these would be provided after the call index. Most system calls include a SYS_ symbolic constant to specify their mapping to the __NR_ indexes. For example, you invoke the index __NR_getpid with syscall as:
    syscall( SYS_getpid )
    The syscall function is architecture specific but uses a mechanism to transfer control to the kernel. The argument is based on a mapping of __NR indexes to SYS_ symbols provided by /usr/include/bits/syscall.h (defined when the libc is built). Never reference this file directly; instead use /usr/include/sys/syscall.h.
    The traditional method requires that you create function calls that match those in the kernel in terms of system call index (so that you're calling the right kernel service) and that the arguments match. Linux provides a set of macros to provide this capability. The _syscallN macros are defined in /usr/include/linux/unistd.h and have the following format:
    _syscall0( ret-type, func-name )
    _syscall1( ret-type, func-name, arg1-type, arg1-name )
    _syscall2( ret-type, func-name, arg1-type, arg1-name, arg2-type, arg2-name )
    User-space and __NR constants
    Note that in Listing 6 I've provided the __NR symbolic constants. You can find these in /usr/include/asm/unistd.h (for standard system calls).
    The _syscall macros are defined up to six arguments deep (although only three are shown here).
    Now, here's how you use the _syscall macros to make your new system calls visible to the user-space. Listing 6 shows an application that uses each of your system calls as defined by the _syscall macros.
    Listing 6. Using the _syscall macro for user-space application development
                   
    #include <stdio.h>
    #include <linux/unistd.h>
    #include <sys/syscall.h>
    #define __NR_getjiffies                320
    #define __NR_diffjiffies        321
    #define __NR_pdiffjiffies        322
    _syscall0( long, getjiffies );
    _syscall1( long, diffjiffies, long, ujiffies );
    _syscall2( long, pdiffjiffies, long, ujiffies, long*, presult );
    int main()
    {
      long jifs, result;
      int err;
      jifs = getjiffies();
      printf( "difference is %lx\n", diffjiffies(jifs) );
      err = pdiffjiffies( jifs, &result );
      if (!err) {
        printf( "difference is %lx\n", result );
      } else {
        printf( "error\n" );
      }
      return 0;
    }
    Note that the __NR indexes are necessary in this application because the _syscall macro uses the func-name to construct the __NR index (getjiffies -> __NR_getjiffies). But the result is that you can call your kernel functions using their names, just like any other system call.
    Alternatives for user/kernel interactions
    System calls are an efficient way of requesting services in the kernel. The biggest problem with them is that they form a standardized interface. It would be difficult to have your new system call added to the kernel, so any additions are likely served through other means. If you have no intent of mainlining your system calls into the public Linux kernel, then system calls are a convenient and efficient way to make kernel services available to user-space.
    Another way to make your services visible to user-space is through the /proc file system. The /proc file system is a virtual file system through which you can surface a directory and files to the user, and then provide an interface in the kernel to your new services through a file system interface (read, write, and so on).
    Tracing system calls with strace
    The Linux kernel provides a useful way to trace the system calls that a process invokes (as well as those signals that the process receives). The utility is called strace and is executed from the command line, using the application you want to trace as its argument. For example, if you wanted to know which system calls were invoked during the context of the date command, type the following command:
    strace date
    The result is a rather large dump showing the various system calls that are performed in the context of a date command call. You'll see the loading of shared libraries, mapping of memory, and, at the end of the trace, the emitting of the date information to standard output:
    ...
    write(1, "Fri Feb  9 23:06:41 MST 2007\n", 29Fri Feb  9 23:06:41 MST 2007) = 29
    munmap(0xb747a000, 4096)        = 0
    exit_group(0)                        = ?
    $
    This tracing is accomplished in the kernel when the current system call request has a special field set called syscall_trace, which causes the function do_syscall_trace to be invoked. You can also find the tracing calls as part of the system call request in ./linux/arch/i386/kernel/entry.S (see syscall_trace_entry).
    Going further
    System calls are an efficient way of traversing between user-space and the kernel to request services in the kernel-space. But they are also tightly controlled, and it's much easier simply to add a new /proc file system entry to provide the user/kernel interactions. When speed is important, however, system calls are an ideal way to squeeze the greatest performance out of your application. See Resources to dig even further into the SCI.
    Resources
    Learn

    • In "Access the Linux kernel using the /proc filesystem" (developerWorks, March 2006), learn how to develop kernel code that uses the /proc file system for user-space/kernel communication.
    • Read "Sysenter Based System Call Mechanism in Linux 2.6" from Manugarg to get a detailed look at the system call gate between the user-space application and the kernel. This paper focuses on the transition mechanisms provided in the 2.6 kernel.
    • This paper details the assembly language linkages between the user-space and the kernel.
    • The GNU C Library (glibc) is the standard library for GNU C. You'll find the glibc for Linux and also for numerous other operating systems. The GNU C Library follows numerous standards, including the ISO C 99, POSIX, and UNIX98. You can find more information about it at the GNU Project.
    • The Linux syscalls man page gives a complete list of system calls available in Linux.
    • Wikipedia provides an interesting perspective on system calls, including history and typical implementations.
    • While a bit dated, a Kernel application program interface (API) is provided that documents many of the kernel functions available for general (in-kernel) use. This includes the user-space memory-management functions as well as many others.
    • In the developerWorks Linux zone, find more resources for Linux developers.
    • Stay current with developerWorks technical events and Webcasts.


    About the author


    M. Tim Jones is an embedded software architect and the author of GNU/Linux Application Programming, AI Application Programming, and BSD Sockets Programming from a Multilanguage Perspective. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and networking protocols development. Tim is a Consultant Engineer for Emulex Corp. in Longmont, Colorado.
