Kernel Threads

Source: Internet. Editor: 程序博客网. Time: 2024/05/02 12:43

Foreword

What is an embedded device? Is it simply a low-resource "PC", so all you need is a scaled down Linux and off you go? Not really. In this article, a few thoughts on principal differences will be brought up that may need direct addressing by the embedded Linux community to foster the use of Linux in the embedded market.

Some key issues to be considered include:

Trusted kernel vs. untrusted user-space

Lifetime-configurable vs. closed-can systems

Dedicated resources vs. dynamic user-demands

User-space applications vs. kernel-space applications

Real-time demands in embedded devices

Looking at these issues and considering possibilities to improve how Linux could satisfy the demands of embedded systems in a better way is the objective of this article. The possible solutions are based on RTLinux systems, as real-time (RT) demands are fairly common in embedded devices; but the mechanisms presented apply to any RT or non-RT embedded Linux.

Comments/suggestions/flames would be appreciated (see talkback), as I believe a discussion on facilities specific to embedded systems in the Linux kernel needs to take place.

Introduction

Linux is a desktop and server system and, with some provisions, an excellent embedded OS/RTOS. The basic concept behind embedded Linux, as quite a few development kits show, is to scale a desktop Linux system down to something that will more or less fit on Flash [Note 1], DiskOnChip [Note 2], CompactFlash [Note 3], a floppy or . . . But the concept stays a desktop concept!

In the past there have been fairly few dedicated embedded tools under development -- notably busybox was a clear approach to the size of user-land for embedded systems. The problem clearly is that there were neither clear standards available for developing such a user-land nor were the guiding concepts adopted, even more so for device drivers and interfacing to specialized hardware.

Trusted vs. untrusted

Operating system designers for desktop systems have a simple and obviously sane concept -- the kernel must be trusted code, and nothing that user-space apps want to do is trusted, so whatever user-land does, the kernel must manage it without bringing down the entire system (well, some don't manage that concept too well . . . but Linux does). Basically this means the user-space application must ask the kernel for any resource it wants to use, and the kernel has the final decision to grant or reject a request [Note 4]. The cost of this is that the kernel-user-space boundary and the virtual filesystem layer must be crossed to access the platform's resources. The expense of this crossing can become considerable, especially when a dedicated system primarily needs to manage one specific piece of hardware [] -- as is fairly common with dedicated embedded systems.

Embedded systems need strategies to dilute the kernel-user-space boundary for dedicated systems to allow optimized use of low-resource systems.

This will not be true for all devices but for many, and the reasoning is simply that the guiding assumption for trusted and untrusted code is false in embedded systems.

Embedded systems must validate a set of user-space applications that is constant in most cases -- this set does not vary as it does on desktop systems

Embedded systems are often forced to introduce application-specific kernel modules or kernel extensions, thus the assumption of "trusted kernel code" does not hold

Embedded systems must be tested as an entity, that is hardware, kernel and user-space together, thus the trusted code concept can be safely extended into user-space in many cases

How about the user installed application XYZ?

A normal Linux desktop system is configurable over its whole lifetime: there is no way for the distribution designer to know what will eventually run on it, and for most (all) distributions it is not even feasible to test and validate all possible combinations of installed software and configuration in finite time (assuming time being u_32). So for a Linux desktop distribution the assumption of untrusted user-space libs and applications is valid and justifies the trusted kernel vs. untrusted user-land paradigm. Embedded systems are different -- they are closed cans in most cases. They may or may not have a user interface, but that interface is limited and an integral part of the device design (it had better not simply offer the user a root shell . . . .). For dedicated embedded systems one can (or must) assume

we know all possible actions a user will be taking on the device

we know all installed libs and applications

the user can't modify the core system

updates are done on a system scope not an application scope

As said above, these claims are valid for most -- not all -- embedded systems. Knowing all possible actions generally means that we have a dedicated set of user interfaces, something like a console device running some sort of operator interface, a web interface or SNMP management, but not a simple login shell or unrestricted network access as must be assumed on a Linux desktop. The design of a dedicated system requires this limitation, as it is generally not possible to monitor an embedded system the way a desktop system in an office is monitored. An embedded system's security policy must begin with limiting the possible actions on the system to prevent misuse in the first place.

On embedded devices end-users will normally not install additional libs or applications -- there might be packages provided by the system vendor, but those must be tested and validated before being offered to the end-user, who is assumed to know nothing about the underlying OS -- you probably would not be too happy if you needed a lesson in using vi before you could adjust the settings of your DSL modem. Further, most embedded systems use hardware setups that do not easily allow adding such applications or libs, like booting from a flash image rather than from an easily reachable filesystem. Commonly it is a requirement that the user can't modify the core system and that updates are done on a system level, i.e. by flashing a new image or downloading an entire filesystem. No vendor of embedded systems could give much of a guarantee for the device if users were allowed to install applications or libraries.

So the second claim is:

user-land in embedded systems can be considered trusted code

Resource management in dedicated devices

So this is the issue of dedicated resources vs. dynamic user demands. Are embedded systems not using dynamic memory resources, allocating devices to applications and freeing them again? Well, yes -- but the absolute amount of resources necessary is generally fully determined at system design; in fact you need to be able to guarantee that the system will not fail due to low memory or too many user-space tasks running at any time -- so in this sense the dynamic resources are bounded.

This does not mean that the kernel does not need to check any more if memory is available before returning a pointer to memory but it does mean that deadlock due to resource over-commitment can be prevented, in fact in most cases it must be prevented to be able to give any guarantees for the embedded system.

So the next claim is:

Embedded systems know their worst-case resource demands.
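One way to picture a bounded dynamic resource is a pool whose size is fixed at design time. The following is a minimal user-space sketch in C, not taken from the article: the names pool_alloc, pool_free and pool_demo and the sizes POOL_BLOCKS and BLOCK_SIZE are hypothetical. Allocation is still dynamic, but a request beyond the design-time budget is refused instead of over-committing.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCKS 4    /* hypothetical worst-case number of buffers */
#define BLOCK_SIZE  64   /* hypothetical worst-case buffer size in bytes */

static unsigned char pool[POOL_BLOCKS][BLOCK_SIZE];
static int pool_used[POOL_BLOCKS];

/* Return a free block, or NULL once the design-time budget is exhausted. */
void *pool_alloc(void)
{
    int i;

    for (i = 0; i < POOL_BLOCKS; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            return pool[i];
        }
    }
    return NULL;
}

/* Give a block back; returns 0 on success, -1 if p is not an allocated block. */
int pool_free(void *p)
{
    int i;

    for (i = 0; i < POOL_BLOCKS; i++) {
        if (p == (void *)pool[i] && pool_used[i]) {
            pool_used[i] = 0;
            return 0;
        }
    }
    return -1;
}

/* Usage: the budget is exhausted after POOL_BLOCKS allocations. */
void pool_demo(void)
{
    void *first = pool_alloc();
    void *p;
    int i, ret;

    assert(first != NULL);
    for (i = 1; i < POOL_BLOCKS; i++) {
        p = pool_alloc();
        assert(p != NULL);
    }
    p = pool_alloc();
    assert(p == NULL);        /* over budget: refused, not over-committed */
    ret = pool_free(first);
    assert(ret == 0);
    p = pool_alloc();
    assert(p == first);       /* the freed block is handed out again */
}
```

The caller still has to check every return value, but the worst case is known when the system is designed.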

So stuff it all into kernel space?

There have been heated debates on kernel-space applications -- NFS moved into the kernel, as did HTTP acceleration. The general consensus for desktop systems is to leave things in user-space and move only as little as absolutely necessary into kernel-space. Even applications where the kernel approach could seem easy to justify, like pppd, moved almost everything to user-land and minimized the services in kernel space. So am I proposing kvi -- the kernel editor to directly edit the kernel's task list? Not quite.

Dedicated embedded devices often include special drivers for the dedicated hardware. In many cases it is not sensible to introduce too many layers of abstraction; these are justified only if one assumes that a class of devices is to be introduced and the usefulness is not limited to a specific system setup. For embedded devices it is quite common that a device driver is totally useless even on a mildly modified system and/or that multiple hardware entities are "hardwired" in software, something like an ISDN application/driver that logs to NVRAM. In such cases a lot of code moves into kernel space, and with the limitations of the kernel-space API this can sometimes result in very strange code.

The problem with the split in kernel-space and user-space, under the assumption that user-space is trusted code, is that the kernel-space user-space (+ VFS) boundary is fairly expensive and for dedicated devices resource demands could be reduced if this boundary is diluted.

Last claim (for now):

embedded systems not only extend the trusted code into user-land they also need to dilute the kernel-space user-space boundary moving applications into kernel space.

Resources available to dilute the rt, kernel and user-space boundary

The good news: Linux has all the concepts available that one would need to at least massively reduce the expense of kernel-space user-space crossing. The core issues here are

zero copy strategies by mapping memory between kernel and user-space

kernel_threads, running applications in the kernel's memory space

source code availability allowing to add your own system calls easily or modify the kernel to fit your device better.

open technology -- the underlying concepts in the Linux kernel are well known and documented thus the repertoire that is available to the programmer is enhanced

It's all there -- it just needs to be used.

Resources Linux provides

RTLinux has been focused on developing a POSIX compliant RTOS layer that operates below Linux -- within this development, communication between RT-threads and the kernel as well as user-space has been quite limited, in part due to the inherent restrictions of an RTOS and in part due to the restrictions imposed by POSIX, or rather the combination of these two sets of restrictions.

As RT-threads operate in the same address space as the Linux kernel itself, it seems natural to investigate what capabilities within the Linux kernel could be made available to RT-threads so as to enhance communication paths to and from user-space and non-rt kernel-space. Here a few of these very non-portable, absolutely non-POSIX paths are described. The main resources of interest are:

Tasklets

Kernel Threads

Software interrupts

Sharing Memory

Accessing Non-RT facilities from rt-space

'misusing' System calls

Using a few simple examples, the mechanisms and problems of utilizing these capabilities of the Linux kernel from within RT-context, and of accessing rt and kernel resources from user-space, are discussed.

In many embedded systems the main challenge for the programmer is to find the correct split between what is to be executed in rt-context and what can be executed in non-rt context, as well as what to put in kernel space and what in user-space. RTLinux has been splitting tasks into hard-realtime rt-context and non-rt user context; in many cases a more fine-grained split is desired, allowing hard-rt and different levels of non-rt execution. Designing this split requires a basic understanding of the facilities available on the non-rt side of the system and how to communicate with them. In this paper the focus is on accessing Linux kernel facilities from rt-threads, and on communication of user-space tasks with kernel and rt-space.

Tasklets

Tasklets [Note 5] are the replacement of the bottom half concept that was in use up to kernel 2.2.X (in 2.4.X BH are still supported -- but are implemented via tasklets).

The main properties of tasklets:

tasklets can be scheduled with different priorities in Linux

tasklets don't need to be reentrant

the same tasklet will never run in parallel on SMP

scheduling a tasklet multiple times before it actually runs does not cause it to run multiple times.

different tasklets may run on different CPUs at the same time.

tasklets run in interrupt context -- thus with all limitations of an interrupt handler.

These properties make it fairly simple to write tasklets. The concept behind them is the same as with the former BH handlers, keep the interrupt or rt-thread small and put all processing steps that may be delayed into a tasklet.
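The property that scheduling a tasklet several times before it runs results in a single run can be illustrated with one atomic "pending" flag. This is a user-space model in C11 only, not the kernel implementation; struct model_tasklet, model_schedule, model_run_pending, count_run and model_demo are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* User-space model only; this is not the kernel's tasklet structure. */
struct model_tasklet {
    atomic_int pending;              /* 1 while a run is outstanding */
    void (*func)(unsigned long);
    unsigned long data;
};

/* Returns 1 if this call queued the tasklet, 0 if it was already pending. */
int model_schedule(struct model_tasklet *t)
{
    /* atomic_exchange returns the old value: 0 means we set the flag first */
    return atomic_exchange(&t->pending, 1) == 0;
}

/* Run the body once if a run is pending; returns the number of runs (0 or 1). */
int model_run_pending(struct model_tasklet *t)
{
    if (atomic_exchange(&t->pending, 0) == 0)
        return 0;
    t->func(t->data);
    return 1;
}

int run_count;

void count_run(unsigned long data)
{
    (void)data;
    run_count++;
}

/* Usage: two schedule calls before the run produce exactly one run. */
void model_demo(void)
{
    struct model_tasklet t;
    int queued, ran;

    atomic_init(&t.pending, 0);
    t.func = count_run;
    t.data = 0;
    run_count = 0;

    queued = model_schedule(&t);
    assert(queued == 1);
    queued = model_schedule(&t);     /* already pending: no second entry */
    assert(queued == 0);
    ran = model_run_pending(&t);
    assert(ran == 1);
    ran = model_run_pending(&t);     /* nothing left to run */
    assert(ran == 0);
    assert(run_count == 1);
}
```

Because the flag is cleared before the body runs, the body may be scheduled again while it is still executing in this model.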

Important for RTLinux is that tasklets are run at every context switch to Linux; they are not delayed until the next hardware interrupt. Tasklets will run before any user-space application gets a chance to run, thus they are a high-priority non-rt task that can easily be scheduled from within an rt-thread by calling tasklet_schedule() or tasklet_hi_schedule(), whereby the latter has higher priority than the former.

Simple tasklet example

This first example is basically only a slightly modified version of RTLinux's examples/hello/hello.c the main change is the introduction of the tasklet code itself and the scheduling of the tasklet. Note that tasklets can be scheduled from rt-context and from Linux kernel context without any conflict as the scheduling is performed by bit-operations which are atomic.

A tasklet is declared with the DECLARE_TASKLET() macro and scheduled with tasklet_schedule() or tasklet_hi_schedule(). The tasklet related macros are found in linux/interrupt.h.

. . .

#include <linux/interrupt.h>

int myint_for_something=1;

. . .

void tasklet_function(unsigned long);

char tasklet_data[64];

DECLARE_TASKLET(test_tasklet,

tasklet_function,

(unsigned long) &tasklet_data);

. . .

void *

start_routine(void *arg)

{

struct sched_param p;

p . sched_priority = 1;

pthread_setschedparam (pthread_self(),

SCHED_FIFO,

&p);

pthread_make_periodic_np (pthread_self(),

gethrtime(),

500000000);

while (1) {

pthread_wait_np ();

rtl_printf("RT-Thread; my arg is %x\n",

(unsigned) arg);

sprintf(tasklet_data,"%s \"%x\"",

"Linux tasklet received RT-Thread arg",

(unsigned) arg);

tasklet_hi_schedule(&test_tasklet);

}

return 0;

}

void

tasklet_function(unsigned long data)

{

struct timeval now;

do_gettimeofday(&now);

printk("%s at %ld,%ld\n",

(char *) data,

now.tv_sec,

now.tv_usec);

}

int init_module(void) {

sprintf(tasklet_data,"%s\n",

"Linux tasklet called in init_module");

tasklet_schedule(&test_tasklet);

. . .

}

Aside from showing the basics of implementing a tasklet, this simple example also allows one to see the delay between rt-threads and tasklets. If run with only one short rt-thread, as in this example, coupling is naturally very good -- to see the real coupling one could run this module together with the actual target application to get a fairly close picture of the delays introduced.

Scheduling tasklets from rt-context

From linux/interrupt.h:

/* PLEASE, avoid to allocate new softirqs,

if you need not _really_ high frequency

threaded job scheduling. For almost all

the purposes tasklets are more than

enough. F.e. all serial device BHs et

al. should be converted to tasklets, not

to softirqs.

*/

The priority of a tasklet scheduled with tasklet_hi_schedule is above the network subsystem, so if you overdo it you can actually cripple your network performance . . . ; a tasklet scheduled with tasklet_schedule has a priority just below the network subsystem, so a network overload can delay your tasklet substantially.

With the kernel functions tasklet_disable and tasklet_enable the execution of a tasklet can be suspended. If a tasklet was scheduled and is disabled before it was executed, it will be executed when tasklet_enable is called. For the full set of kernel functions available for tasklets check linux/interrupt.h; note though that you must check whether these are safe to call from rt-context. For this paper, checks were done against Linux 2.4.4.
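The interplay of scheduling and disabling can be sketched as a small state machine. This is a single-threaded user-space model in C, not the kernel code: struct gated_tasklet and the gated_* functions are hypothetical names, and the nesting disable counter is an assumption of the model. A tasklet scheduled while disabled stays pending and runs at the first poll after it is enabled again.

```c
#include <assert.h>

/* Single-threaded model; no atomic operations, unlike the kernel. */
struct gated_tasklet {
    int pending;    /* a run has been requested */
    int disabled;   /* disable count (model assumption); 0 means enabled */
    int runs;       /* how often the body has executed */
};

void gated_schedule(struct gated_tasklet *t) { t->pending = 1; }
void gated_disable(struct gated_tasklet *t)  { t->disabled++; }

void gated_enable(struct gated_tasklet *t)
{
    if (t->disabled > 0)
        t->disabled--;
}

/* Called at the point where pending tasklets would be processed. */
void gated_poll(struct gated_tasklet *t)
{
    if (t->pending && t->disabled == 0) {
        t->pending = 0;
        t->runs++;
    }
}

/* Usage: schedule, disable before it ran, then enable again. */
void gated_demo(void)
{
    struct gated_tasklet t = { 0, 0, 0 };

    gated_schedule(&t);
    gated_disable(&t);
    gated_poll(&t);
    assert(t.runs == 0 && t.pending == 1);   /* held back, not lost */

    gated_enable(&t);
    gated_poll(&t);
    assert(t.runs == 1 && t.pending == 0);   /* runs once after enable */
}
```

The model also shows why a disabled tasklet must be re-enabled before module exit: as long as it is disabled, a pending run can never complete.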

When a tasklet is disabled within rt-context with tasklet_disable, one must install a cleanup handler that re-enables the tasklet on termination of the thread, so that a scheduled tasklet can be executed and the tasklet structure can be removed on module exit.

. . .

void

tasklet_cleanup(void *arg)

{

tasklet_enable(&test_tasklet);

rtl_printf("cleanup handler\n");

}

void *

start_routine(void *arg)

{

. . .

pthread_cleanup_push(

tasklet_cleanup,

0);

while (1) {

pthread_wait_np ();

. . .

if(i==20){

tasklet_disable(&test_tasklet);

rtl_printf("killed tasklet\n");

}

tasklet_hi_schedule(

&test_tasklet);

i++;

}

pthread_cleanup_pop(0);

return 0;

}

This somewhat artificial code shows the basic setup -- a cleanup handler to re-enable the tasklet is installed, and within the main loop of the rt-thread tasklet_disable is called to disable test_tasklet. Because the while(1) loop never exits on its own, the cleanup handler runs when the thread is cancelled and re-enables the tasklet.

Naive rt-allocator

As a second, somewhat more interesting example of using a tasklet from rt-context, a naive rt-allocator framework is presented. The rt-thread calls rtl_kmalloc(size), an rt-function that schedules a tasklet and suspends the running thread. The tasklet does the actual memory allocation and then signals RTL_SIGNAL_WAKEUP back to the rt-thread when it is done. The allocation thus is non-realtime, and the realtime thread needs to check whether memory actually was allocated. Note that the call to kmalloc in the tasklet uses the flag GFP_ATOMIC, which is necessary: with GFP_KERNEL kmalloc could sleep, which is not allowed in a tasklet, and the system would hang.

This allocator has a statically initialized array of pointers to char and assigns each requested block of memory to one of these pointers. They are globally available, and the tasklet signals a wakeup to the rt-thread by setting the appropriate bit in the thread's pending signal mask. Instead of setting the bit directly one could also call pthread_kill(rt_thread,RTL_SIGNAL_WAKEUP); if modules are split between kernel and rtl context it is sometimes a problem to include rtl API calls that require rtl header files, so in those cases directly accessing the pending signal mask solves the problem.

. . .

#include <linux/slab.h> /* kmalloc */

void allocator_function(unsigned long arg);

#define BUFFERS 128

This allocator has a static array of pointers for the buffers, so the total memory that can be allocated is bounded by 128 pointers to kmalloc'ed areas of at most 128 kByte each.

static char *iptr[BUFFERS];

static int iptr_idx;

DECLARE_TASKLET(allocator_tasklet,

allocator_function,0);

void

allocator_function(unsigned long arg)

{

struct timeval now;

do_gettimeofday(&now);

printk("alloc %ld at %ld,%ld\n",

(unsigned long)arg,

now.tv_sec,

now.tv_usec);

iptr[iptr_idx]=kmalloc(

(unsigned long)arg,

GFP_ATOMIC);

if(iptr[iptr_idx] == NULL){

printk("Allocation failed\n");

}

else{

memset(iptr[iptr_idx],

0,

(unsigned long)arg);

printk("Allocated 0'ed buffer %d\n",

iptr_idx);

iptr_idx++;

}

Whether or not the allocation succeeded, wake up the rt-thread that requested memory so that it can check the result. Note that this is done here by directly flipping the bit in the signal vector of the thread -- one could have used pthread_kill() as well, but as we are in kernel space we have direct access.

set_bit(RTL_SIGNAL_WAKEUP,

&rt_thread->pending);

}
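The wakeup by "flipping a bit" in a pending-signal mask can be modelled in a few lines of user-space C. This is a sketch only: struct model_thread, model_post_signal, model_take_signal and the bit number MODEL_SIGNAL_WAKEUP are hypothetical, and unlike the kernel's set_bit() the operations here are not atomic.

```c
#include <assert.h>

#define MODEL_SIGNAL_WAKEUP 3   /* hypothetical bit number, for illustration */

/* Model of a thread structure that carries a pending-signal bitmask. */
struct model_thread {
    unsigned long pending;
};

/* Sender side: mark signal 'sig' as pending (assumes 0 <= sig < 32). */
void model_post_signal(struct model_thread *t, int sig)
{
    t->pending |= 1UL << sig;
}

/* Receiver side: test and clear; returns 1 if 'sig' was pending, else 0. */
int model_take_signal(struct model_thread *t, int sig)
{
    unsigned long bit = 1UL << sig;
    int was_set = (t->pending & bit) != 0;

    t->pending &= ~bit;
    return was_set;
}

/* Usage: a posted wakeup is seen exactly once by the receiver. */
void signal_demo(void)
{
    struct model_thread t = { 0 };
    int got;

    model_post_signal(&t, MODEL_SIGNAL_WAKEUP);
    got = model_take_signal(&t, MODEL_SIGNAL_WAKEUP);
    assert(got == 1);
    got = model_take_signal(&t, MODEL_SIGNAL_WAKEUP);
    assert(got == 0);
}
```

Posting the same signal twice before it is taken still yields a single wakeup, which matches the way the allocator uses it: one request, one wakeup.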

This is a wrapper function to call the tasklet that will then call kmalloc -- basically it schedules the tasklet and suspends itself. It is made a cancellation point as well by calling pthread_testcancel() at the end, which makes sense for functions that could theoretically be delayed for an inappropriate time, causing a different thread to cancel this thread's execution. Before actually scheduling the tasklet, though, it first checks whether any buffer pointers are left. This is hardly a usable allocator -- but it should outline the basic resources that would be needed to build an application-specific allocator.

int

rtl_kmalloc(unsigned long size)

{

int idx;

pthread_t self = pthread_self();

RTL_MARK_SUSPENDED (self);

rtl_printf("requesting %ld bytes\n",

(unsigned long)size);

idx = iptr_idx;

if(idx < BUFFERS){

allocator_tasklet.data=size;

tasklet_hi_schedule(

&allocator_tasklet);

rtl_schedule();

pthread_testcancel();

if(iptr[idx] == NULL){

return -1;

}

else{

return idx;

}

}

else{

return -1;

}

return 0;

}

The actual application thread shown below, a periodic rt-thread with a period of half a second (500000000 nanoseconds), does nothing useful -- it only allocates buffers until none are left. Once we are out of buffers this thread simply prints an error to the kernel message ring-buffer and goes on. To view the messages again use dmesg.

void *

start_routine(void *arg)

{

struct sched_param p;

int ret;

unsigned long i,size,block;

p . sched_priority = 1;

pthread_setschedparam (

pthread_self(),

SCHED_FIFO,

&p);

pthread_make_periodic_np(

pthread_self(),

gethrtime(),

500000000);

size=0;

block=128;

i=1;

while (1) {

pthread_wait_np ();

size=block*i++;

rtl_printf("request %ld bytes\n",

size);

ret=rtl_kmalloc(size);

This is the actual problem for rt-threads with dynamic resources -- you can never have a guarantee that you get what you requested. Apps must check that they actually got something, and the difficulty is designing an exit strategy for the no-resources case that will not break your rt-application. So for any mission-critical task dynamic resources are a fundamental problem.

if(ret == -1){

rtl_printf("No more buffers available\n");

}

else{

rtl_printf("allocated buffer %d\n",ret);

}

}

return 0;

}

The mandatory init_module and cleanup_module: init_module sets all pointers to NULL and creates the thread; cleanup_module frees all non-NULL buffers and deletes the rt-thread.

int

init_module(void)

{

int i;

for(i=0;i<BUFFERS;i++){

iptr[i] = NULL;

}

return pthread_create (

&rt_thread,

NULL,

start_routine,

0);

}

void

cleanup_module(void)

{

int i;

for(i=0;i<BUFFERS;i++)

if(iptr[i]!= NULL){

kfree(iptr[i]);

printk("Freeing buffer %d\n",i);

}

pthread_delete_np (rt_thread);

}

Kernel Threads

Kernel threads are a mechanism in the Linux kernel that allows threads of execution to run in the kernel's memory space (kernel context) but be visible as regular tasks that can receive signals and execute user-space calls with certain limitations/provisions. Here we are not so much interested in the details of kernel threads within the Linux kernel itself but rather in how to interface rt-threads via kthreads to non-rt kernel-space and user-space.

Simple example

This first example is not rt-specific; it only gives a framework for a kthread. The module declares a kernel function exec_cmd that is local to this module, and a kernel thread is started with this function as the routine to execute and a string passed via the arg pointer. The call to kernel_thread() initializes a task structure that is visible from user space (the pid of the process is printk'ed), and the thread routine (exec_cmd) is executed once. As we did not set up a specific context for this thread, it runs in the inherited context of insmod and thus prints to the current console via the echo command. The thread routine is comparable to a regular user-space function that calls execve, except for the privileges and the call to set_fs(KERNEL_DS), which lets the kernel's data segment hold the command arguments. This also shows one clear danger of kernel threads -- if they are not set up carefully with respect to privileges they can result in a serious security problem. For details, have a look at the kmod kernel_thread implementation in kernel/kmod.c.

What we need specifically for kernel_threads:

#define __KERNEL__

#define __KERNEL_SYSCALLS__

#include

Now on to the actual code for a kernel_thread -- as usual . . . we start with a 'Hello World' . But doing this from kernel space is not quite as simple. The setup we are going to use to write to your current console is to invoke /bin/echo and let it print the infamous string, to be able to use /bin/echo we must set up a minimum environment first.

int errno;

char cmd_path[256] = "/bin/echo";

static int

exec_cmd(void * kthread_arg)

{

struct task_struct *curtask = current;

To set up a minimum environment we need to fill out at least TERM and PATH, but note that we still inherit the environment of whoever launched insmod for this module! Sounds dangerous? -- It is!

static char * envp[] = {

"HOME=/root",

"TERM=linux",

"PATH=/bin",

NULL };

char *argv[] = {

cmd_path,

kthread_arg,

NULL };

int ret;

Give the kthread all effective privileges and allow it to use the kernel's data segment KERNEL_DS to store the arguments to execve.

curtask->euid = curtask->fsuid = 0;

curtask->egid = curtask->fsgid = 0;

cap_set_full(curtask->cap_effective);

set_fs(KERNEL_DS);

Now we only need to call execve, which will not return unless it fails. On success the kernel_thread's code is replaced by the executed command and ends with it; on failure we printk and terminate it ourselves with return.

printk("calling execve for %s \n",

cmd_path);

ret = execve(cmd_path, argv, envp);

/* if we get here - execve failed */

printk(KERN_ERR "%s failed (%d)\n",

cmd_path,

ret);

return -1;

}

The kernel_thread is created in init_module, and we printk the PID of the created kernel_thread so that we can check with the ps tools that it is running (due to the short lifetime of our 'hello world' thread, though, you will hardly be able to see it . . . ). Note that we use pid as the return value on failure -- if init_module returns anything but 0, module loading is aborted and the kernel deallocates the resources it allocated automatically (but the kernel will not free any resources you explicitly requested before init_module failed -- those are your job to free . . . ).

int

init_module(void)

{

pid_t pid;

/* static, so the string outlives init_module's stack frame */

static char kthread_arg[]="Hello World!";

pid = kernel_thread(

exec_cmd,

(void*) kthread_arg,

0);

if (pid < 0) {

printk(KERN_ERR

"fork failed, errno %d\n",

-pid);

return pid;

}

printk("fork ok, pid %d\n",

pid);

return 0;

}

There is nothing to be done in cleanup_module, we did not allocate any resources that will not be freed automatically -- note that we don't explicitly reclaim the kernel_thread in any way, the call to execve or, on failure, the return, terminated it and freed any resources associated with this process.

Communicating with rt-threads

As kernel_threads are processes that are listed in the task list with a unique PID, one can send them UNIX signals, as the next example will show. One can also execute any user-space application with a call to execve from a kernel_thread, thus potentially you can use any interprocess communication mechanism to communicate from a user-space application to a kernel_thread. To access user-space facilities directly from a kernel_thread one can't use the normal API, like open/read/write/close on a file; an example of copying a file from within a kernel_thread is given to show this possibility.

Buddy thread concept

One of the many traditional communication mechanisms is signals. As rt-threads operate in kernel memory space and are not available via the Linux kernel task structure, direct UNIX signals from user-space applications to rt-threads are not possible. Possibilities shown in the RTLinux examples were to install rt-handlers for FIFOs and trigger signals via these rt-FIFOs. The following code shows an alternative concept that is intended to be expanded in the future. This concept introduces a buddy thread for each rt-thread; it runs in kernel space as a kernel thread and thus is reachable directly from user-space via regular UNIX signals. The signal is still a two-hop job: a signal is sent to the kthread, identified by the pid of the kernel process, and passed on to the rt-thread either by directly modifying the pending signals mask of the rt-thread structure or by using the RTLinux API calls pthread_kill and pthread_delete_np. This approach is also necessitated by the fact that the UNIX signal set and the RTLinux signal set are not the same -- so the kernel_thread also remaps signals to something known in rt-context. In our case everything is a wakeup except a terminating signal, which is a terminating rt-signal as well.

First we need some header files: one for the RTLinux specific signal numbers (RTL_SIGNAL_WAKEUP in our case) and linux/sched.h for the Linux signal handling routines like flush_signals().
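The remapping step of the buddy thread boils down to a small decision function. The following user-space C sketch assumes a POSIX signal.h; enum rt_action, remap_unix_signal and remap_demo are hypothetical names that stand in for the two cases the article distinguishes, a wakeup of the rt-thread and its termination.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical action codes standing in for "wake up the rt-thread"
 * and "terminate the rt-thread". */
enum rt_action {
    RT_ACTION_WAKEUP,
    RT_ACTION_TERMINATE
};

/* Map a UNIX signal number to the action taken on the rt-thread:
 * SIGKILL terminates, every other signal is treated as a wakeup. */
enum rt_action remap_unix_signal(int signo)
{
    return (signo == SIGKILL) ? RT_ACTION_TERMINATE : RT_ACTION_WAKEUP;
}

/* Usage: only the terminating signal is passed on as a termination. */
void remap_demo(void)
{
    assert(remap_unix_signal(SIGKILL) == RT_ACTION_TERMINATE);
    assert(remap_unix_signal(SIGINT) == RT_ACTION_WAKEUP);
}
```

Keeping the mapping in one function makes it easy to extend later, for instance if further UNIX signals should map to distinct rt-signals.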

#include

This time the kernel_thread will not terminate on its own -- in our first example we simply assumed that the kernel_thread will have terminated before we can type rmmod khello.o, so we did not bother whether the kernel_thread was still running or not. In this example we must take care of this, and to do so we use a global variable state that allows cleanup_module to check and wait for the kernel_thread to terminate.

#define ACTIVE 1

#define TERMINATED 0

static int state=ACTIVE;

This rt-thread is no more than examples/hello/hello.c from RTLinux again; the only noteworthy difference is that it is not a periodic thread: the thread suspends itself and waits until it is woken up by a signal again.

static void *

rtthread_code(void *arg)

{

struct sched_param p;

p . sched_priority = 1;

pthread_setschedparam (

pthread_self(),

SCHED_FIFO,

&p);

while (1) {

rtl_printf("RT-Thread woke up\n");

pthread_suspend_np(

pthread_self());

}

return 0;

}

The kernel thread was not created by an executable with a name that the kernel can use to name the process, so we must explicitly fill out the process name in the process structure -- so first we grab the process structure via current (which is defined as get_current()) and clear the field for the name.

static int

kthread_code( void *data )

{

struct task_struct *kthread=current;

char thread_name[NAME_LEN];

memset(thread_name,0,NAME_LEN);

We noted in the first example that the environment of the kernel_thread needs to be initialized explicitly; in this example we use a call to daemonize() to make this thread inherit the environment from init (see kernel/exit.c for details). Following is a brute force synchronization using the global variable rt_thread_state to ensure that we don't enter the while(1) main loop of the kernel_thread before the rt-thread that we intend to send signals to is set up.

daemonize();

while (!rt_thread_state) {

current->state=TASK_INTERRUPTIBLE;

schedule_timeout(1);

}

This might seem a bit strange, but here we take the address of the rt-thread as the unique name; it's the simplest solution, as we are guaranteed that the rt-thread's address is unique. And to make sure this kernel_thread does not simply eat up our CPU we give it the lowest possible priority by setting its nice value directly in the task structure; basically this is what renice does via a system call. After those setup steps we flush the signal mask -- just to make sure there are no pending signals.

sprintf(thread_name,"rtl_%lx",

(unsigned long)&rt_thread);

strcpy(kthread->comm,

thread_name);

kthread->nice=20;

spin_lock_irq(&kthread->sigmask_lock);

sigemptyset(&kthread->blocked);

flush_signals(kthread);

recalc_sigpending(kthread);

spin_unlock_irq(&kthread->sigmask_lock);

This part is more interesting -- the kernel_thread receives signals by periodically (but non-realtime -- so very 'soft' periodicity) checking whether it has any pending signals. If there are pending signals and they are not a terminating signal, it wakes up the rt-thread; otherwise it terminates the rt-thread and then terminates itself. Note that pthread_delete_np basically ensures that the cancellation signal can be delivered by directly resetting the thread's blocked-signal mask and then calls pthread_cancel on the rt-thread.

while(1){

interruptible_sleep_on(&wait);

if(sigtestsetmask(

&kthread->pending.signal,

sigmask(SIGKILL))){

pthread_delete_np(rt_thread);

break;

}

else{

pthread_kill(rt_thread,

RTL_SIGNAL_WAKEUP);

spin_lock_irq(

&kthread->sigmask_lock);

sigemptyset(&kthread->blocked);

flush_signals(kthread);

recalc_sigpending(kthread);

spin_unlock_irq(

&kthread->sigmask_lock);

}

} /* end of while(1) */

When the kernel_thread exits, it sets state to TERMINATED so that cleanup_module can proceed.

state=TERMINATED;

return(0);

}
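The dispatch logic of the loop above (look at what is pending, treat a terminating signal differently from everything else, then flush) can be tried out in user space. The following is only a sketch under stated assumptions, not RTLinux code: it uses plain POSIX signal calls, SIGTERM stands in for SIGKILL (which cannot be blocked or inspected from user space), SIGUSR1 stands in for any other signal, and the names pending_action, block_control_signals and classify_pending are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum pending_action { ACT_NONE = 0, ACT_WAKEUP = 1, ACT_TERMINATE = 2 };

/* Block both signals so that they stay pending until we look at them. */
static int block_control_signals(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigaddset(&set, SIGTERM);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Inspect and consume one pending signal; the terminating one wins. */
static enum pending_action classify_pending(void)
{
    sigset_t pending, one;
    int sig;

    if (sigpending(&pending) != 0)
        return ACT_NONE;

    if (sigismember(&pending, SIGTERM) == 1)
        sig = SIGTERM;
    else if (sigismember(&pending, SIGUSR1) == 1)
        sig = SIGUSR1;
    else
        return ACT_NONE;

    /* sigwait() removes the pending signal, much like the flush step. */
    sigemptyset(&one);
    sigaddset(&one, sig);
    if (sigwait(&one, &sig) != 0)
        return ACT_NONE;

    return sig == SIGTERM ? ACT_TERMINATE : ACT_WAKEUP;
}
```

A caller would invoke block_control_signals() once and then act on the result of classify_pending() in its main loop: wake the worker on ACT_WAKEUP, tear everything down on ACT_TERMINATE.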

init_module initializes the wait queue on which the kernel_thread waits, sets up the kernel_thread itself and then invokes pthread_create on the rt-thread function. The return value of pthread_create is stored in the global variable rt_thread_state to 'signal' that the rt-thread is active, so that the kernel_thread can continue initializing.

int

init_module(void)

{

init_waitqueue_head(&wait);

kthread_id=kernel_thread(

kthread_code,

NULL,

CLONE_FS|CLONE_FILES|CLONE_SIGHAND);

printk("rt_sig_thread (pid %d)\n",

kthread_id);

rt_thread_state = pthread_create(

&rt_thread,

NULL,

rtthread_code,

0);

return 0;

}

cleanup_module more or less reverses the process from init_module. We first delete the rt-thread and then kill the kernel_thread by sending it a SIGKILL. Before we can proceed we need to wait for the kernel_thread to actually terminate, which is again 'signaled' by the global variable state. To ensure that we do not wait forever and hang the system, the wait is bounded: we test state up to 10 * HZ times and call the scheduler in between. If the kernel_thread does not exit in time and rmmod continues, we get a kernel oops.

void

cleanup_module(void)

{

int ret;

pthread_delete_np (rt_thread);

ret = kill_proc(kthread_id,

SIGKILL,

1);

if (!ret) {

int count = 10 * HZ;

while (state && --count) {

current->state=TASK_INTERRUPTIBLE;

schedule_timeout(1);

}

printk("rt_sig_thread exit\n");

}

}

Accessing files from kernel_threads

What this does: it launches a kthread that reads the src file directly into kernel space (not using copy_from_user); the file name is passed as a module parameter. It then simply printk's the content and that's it. Note that it will only read the first PAGE_SIZE (4K) bytes of the file, but such checks are not relevant for the principle shown here. Note also that doing things like this really can introduce serious security problems, so be very careful with this; some interesting notes on security can be found in kernel/kmod.c.

For any real application, having the src file hard-coded is probably not too useful. For testing we allow src to be a module parameter, which probably makes little sense for any real code either. If you don't pass src=FILE_NAME at insmod, the module will fail and not do anything.

char *src = NULL;

MODULE_PARM(src,"s");

File access is done as root:root! We need to open up kernel space with set_fs(KERNEL_DS) so that kernel-space buffers are accepted as parameters, which is potentially dangerous!

int

kspace_init(int *uid,int *gid,mm_segment_t *fs)

{

*uid=current->fsuid;

*gid=current->fsgid;

current->fsuid=current->fsgid=0;

*fs=get_fs();

set_fs(KERNEL_DS);

return 0;

}

Reset the fs-context again. It is up to the application programmer to preserve it and reset it properly . . . We return 0 for now -- what should be done on error?

TODO: error handling -- what? panic on failure ;)

int

kspace_releas(int uid, int gid, mm_segment_t fs)

{

set_fs(fs);

current->fsuid=uid;

current->fsgid=gid;

return 0;

}

KernelMore -- read a file (one page at most) from the filesystem from within kernel space.

TODO: check for directories -- currently, if src is of type dir, we still read some garbage. Maybe we should use the init_fs context here; I am not sure what a really safe solution is. If you know one, let me know; if you don't, be very careful what you are doing!

int

kmore(char *data_file)

{

struct file *fd0;

int retval;

char *buffer;

unsigned long page;

printk("kmore: reading %s\n",

data_file);

page = __get_free_page(

GFP_KERNEL);

if(page){

buffer=(char*)page;

fd0 = filp_open(

data_file,

O_RDONLY,

0);

if(IS_ERR(fd0)){

printk("Error %ld opening %s\n",

-PTR_ERR(fd0),

data_file);

}else{

if(fd0->f_op&&fd0->f_op->read){

retval=fd0->f_op->read(

fd0,

buffer,

PAGE_SIZE-1, /* leave room for the terminating NUL */

&fd0->f_pos);

if(retval<0){

printk("Read error %d\n",

-retval);

}

if(retval>0){

buffer[retval]='\0';

printk("read:\n %s\n",

buffer);

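}

} /* end of if(fd0->f_op && ...) */

retval=filp_close(fd0,NULL);

if(retval){

printk("Error %d closing %s\n",

-retval,

data_file);

}

} /* end of else: the file was opened */

free_page(page); /* release the page even if the open failed */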

}else{

printk("kmore: Out of memory\n");

}

return 0;

}

A kernel-space file read needs to save and restore the context only if it is called from kthreads or init_module; from within kernel context, kmore should only need to set up KERNEL_DS. So this example can be considered a kernel_thread simply calling a kernel function, and there is actually nothing special about this kernel_thread any more. If the kspace_init() and kspace_releas() functions are available, there is not much left to do in a kernel_thread.

static int

kthread_code(void *kthread_arg)

{

int old_uid,old_gid;

mm_segment_t old_fs;

old_uid=old_gid=0;

old_fs=get_fs();

kspace_init(&old_uid,

&old_gid,

&old_fs);

kmore(src);

kspace_releas(old_uid,

old_gid,

old_fs);

return 0;

}

init_module only sets up a kernel_thread and checks that a positive pid was returned; that's it. There is no cleanup for this module; for now we assume that kthread_code is done long before we can type rmmod kmore.

Interrupts

In normal Linux there is no way for user code to be triggered directly by an interrupt, and no way to generate soft-interrupts that invoke our own interrupt handlers (not talking about page-faults).

Soft-interrupts

This first example sets up a periodic rt-thread and lets it trigger a soft interrupt that Linux then handles. Because the rt-thread is periodic, we can measure the time from pending the interrupt until the Linux kernel invokes the interrupt handler we supplied. Note that this interrupt handler executes in non-rt Linux context, so measuring this value is more or less pure curiosity on my side.

Declare an rt-thread and the soft-interrupt number to use.

static pthread_t thread;

static int my_softirq;

Next we need some time variables to record delays, followed by ntests, the number of tests to run in each loop, the size of the rt-FIFO used and its file descriptor. Last is the PERIOD of the rt-thread in nanoseconds; don't make this too small or the box will be busy with the rt-thread only and freeze . . .

hrtime_t call_time;

hrtime_t last_time=0;

hrtime_t max_diff=-200000;

hrtime_t min_diff=200000;

int ntests=500;

int count=0;

int fifo_size=4096;

int rtf_fd;

#define PERIOD 1000000

Next comes the code of the periodic rt-thread. It starts with an initialization part that sets up the thread's parameters and opens the rt-FIFO used to write the data to user-space. In the while(1) loop of the rt-thread we don't use the non-POSIX pthread_wait_np() call, but a POSIX-compliant way to suspend the thread and wake it up periodically: sleeping until an absolute time that is advanced by one PERIOD per cycle.

void *

start_routine(void *arg)

{

hrtime_t abstime=clock_gethrtime(

CLOCK_REALTIME) + 1000000000;

struct sched_param p;

p . sched_priority = 1;

pthread_setschedparam (

pthread_self(),

SCHED_FIFO,

&p);

rtf_fd = open("/dev/rtf0",

O_NONBLOCK);

if (rtf_fd < 0) {

printk("rtf0 open failed (%d)\n",

rtf_fd);

return (void *)-1;

}

while (1) {

clock_nanosleep (

CLOCK_REALTIME,

TIMER_ABSTIME,

hrt2ts(abstime),

NULL);

This is where we are when the PERIOD has expired and the thread wakes up: we record the current time and then pend the soft-interrupt. Note that when using the POSIX-compliant 'periodic' thread you must set up the time for the next period yourself.

call_time = clock_gethrtime (

CLOCK_REALTIME);

rtl_global_pend_irq(my_softirq);

abstime += PERIOD;

}

return 0;

}
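The same absolute-deadline pattern can be reproduced in an ordinary user-space program, which makes it easy to experiment with. The sketch below is an illustration under assumptions rather than RTLinux code: it uses the standard clock_nanosleep() with struct timespec instead of hrtime_t and hrt2ts(), it uses CLOCK_MONOTONIC where the example above uses CLOCK_REALTIME, and the helper names timespec_add_ns and run_periodic are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute deadline by ns (>= 0), keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_sec  += ns / NSEC_PER_SEC;
    t->tv_nsec += ns % NSEC_PER_SEC;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec  += 1;
    }
}

/*
 * Wake up `cycles` times, once per `period_ns`, and call work() each time.
 * Returns the number of completed wake-ups, or -1 if the clock is unusable.
 */
static int run_periodic(int cycles, long period_ns, void (*work)(int))
{
    struct timespec next;
    int done = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    while (done < cycles) {
        /* The next deadline is derived from the previous deadline, not
         * from "now", so wake-up latency does not accumulate as drift. */
        timespec_add_ns(&next, period_ns);

        /* clock_nanosleep() returns the error number; retry on EINTR. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                               &next, NULL) == EINTR)
            ;

        if (work != NULL)
            work(done);
        done++;
    }
    return done;
}
```

Calling run_periodic(500, 1000000L, some_function) would correspond roughly to 500 cycles of the 1 ms PERIOD used above, without any realtime guarantee in user space.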

Soft-interrupt handlers are basically just interrupt handlers: anything you can do in a regular interrupt handler can also be done in a soft-interrupt handler and vice versa, so don't try to sleep in one. Note that this interrupt handler is executed in non-rt Linux kernel context, yet we are using the RTLinux API call clock_gethrtime and writing to rt-FIFOs. RTLinux and the Linux kernel execute in the same address space, which simplifies things, but that does not mean you can call every RTLinux function from Linux kernel context . . .

So the first thing to do is to record the time stamp at which the interrupt handler was invoked. call_time was recorded in the rt-thread before pending the soft-interrupt, so we can calculate the handler invocation delay. Then we simply track the minimum and maximum over ntests invocations and report them via printk, or write them to an rt-FIFO and read them with a user-space application.

static void

my_handler(int irq,

void *ignore,

struct pt_regs *ignoreregs)

{

hrtime_t diff;

struct sample samp;

hrtime_t now = clock_gethrtime(

CLOCK_REALTIME);

diff = now - call_time;

if( count < ntests){

if( diff > max_diff ){

max_diff = diff;

}

if( diff < min_diff ){

min_diff = diff;

}

} /* end of if(count < ntests) */

else

{

samp.min = min_diff;

samp.max = max_diff;

/* printk("min: %8d, max: %8d\n",

(int)min_diff,

(int)max_diff); */

write(rtf_fd,&samp,sizeof(samp));

count=0;

max_diff=-200000;

min_diff=200000;

}

count++;

}
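The bookkeeping in this handler (track minimum and maximum over a window of ntests measurements, report, reset) can be separated from the RTLinux specifics and checked on its own. The following is a minimal sketch under assumptions: the layout of struct sample is not shown in the text, so the two-field version here is a guess, and jitter_window, jitter_init and jitter_update are hypothetical names. Unlike the handler above, this variant also counts the measurement that completes a window.

```c
#include <limits.h>

/* Assumed layout of the record written to the rt-FIFO. */
struct sample {
    long long min;
    long long max;
};

/* Running state of one measurement window. */
struct jitter_window {
    long long min_diff;
    long long max_diff;
    int count;
    int ntests;
};

/* Start a fresh window: any real measurement replaces these extremes. */
static void jitter_reset(struct jitter_window *w)
{
    w->min_diff = LLONG_MAX;
    w->max_diff = LLONG_MIN;
    w->count = 0;
}

static void jitter_init(struct jitter_window *w, int ntests)
{
    w->ntests = ntests;
    jitter_reset(w);
}

/*
 * Feed one latency measurement (in ns). Returns 1 and fills *out when
 * the window of ntests measurements is complete, otherwise returns 0.
 */
static int jitter_update(struct jitter_window *w, long long diff,
                         struct sample *out)
{
    if (diff > w->max_diff)
        w->max_diff = diff;
    if (diff < w->min_diff)
        w->min_diff = diff;

    w->count++;
    if (w->count < w->ntests)
        return 0;

    out->min = w->min_diff;
    out->max = w->max_diff;
    jitter_reset(w);
    return 1;
}
```

In the handler, a return value of 1 would be the point at which the sample is written to the rt-FIFO. Resetting to LLONG_MAX and LLONG_MIN avoids having to pick magic start values such as 200000.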

int

init_module(void)

{

int ret;

rtf_destroy(0);

ret=rtf_create(0,fifo_size);

if(ret){

printk("rtf_create failed\n");

return -1;

}

ret=pthread_create(

&thread,

NULL,

start_routine,

0);

if(ret){

printk("pthread_create failed\n");

return -1;

}

We request a free soft-irq from RTLinux and check that the return value is not negative, which would indicate an error. Any positive return value is the interrupt number (irq 0 could never be legal either). The string 'softirq jitter test' is what appears in the usage field of /proc/interrupts. Again, if init_module returns a negative value, the kernel will terminate the module and clean up all resources the kernel itself had allocated, but it will not take care of any resources we explicitly requested. So on a failure we actually should clean up the rt-FIFOs allocated above and delete the rt-thread we created . . . well, it's only an example . . .

my_softirq=rtl_get_soft_irq(

my_handler,

"softirq jitter test");

if(my_softirq < 0){

printk("get softirq failed(%d)\n",

-my_softirq);

return my_softirq;

}

return 0;

}

void

cleanup_module(void)

{

rtl_free_soft_irq(my_softirq);

pthread_cancel(thread);

pthread_join(thread,NULL);

close(rtf_fd);

rtf_destroy(0);

}

The user-space application that reads the minimum and maximum values passed through the rt-FIFO simply opens the rt-FIFO, reads from it and prints the values to the screen with a regular user-space printf; see /examples/measurements/monitor.c in the RTLinux examples directory for details.

RTLinux sigaction

RTLinux basically runs as kernel-space modules that spawn rt-threads. The RTLinux sigaction allows user-space tasks to be coupled to interrupt events: the handler is executed in realtime, while the main routine is non-rt. This allows a fairly simple coupling of events, especially as you can easily share data between the sigaction handler and user-space (simple global variables will do in many cases).

Now on to an example of this. Your mouse interrupt may be different from the one shown here; to find the interrupt your mouse is using, execute cat /proc/interrupts and replace the 12 in #define MOUSE_IRQ 12 by the number used by your mouse. The only header file you need in order to use RTLinux sigaction in your user-space application is RTLinux_sigaction.h.

#include

#define MOUSE_IRQ 12

void my_handler(int);

struct rtlinux_sigaction sig, oldsig;

scount is a global variable in this example, showing that you can share variables, and thus data, between the realtime handler and the non-realtime user-space application fairly easily. In the main routine itself we do no more than fill out the RTLinux sigaction structure, pointing it to our handler (see below).

int scount=0;

int main(void)

{

sig.sa_handler = my_handler;

sig.sa_flags = RTLINUX_SA_PERIODIC;

rtlinux_sigaction(MOUSE_IRQ,

&sig,

&oldsig);

After requesting the user to wiggle the mouse we sleep for a few seconds. During this time the interrupt service routine is called on every mouse interrupt and increments scount. Note that your mouse will not work properly during this test if you are running it from a graphics environment, which is why we only do this for a few seconds here . . . Basically this couples a user-space application to a hardware interrupt, but we still need to separate out what is done in the signal handler, which can be seen as an interrupt service routine, as the RTLinux sigaction couples this handler directly to the hardware interrupt.

printf("IRQ's please . . . \n");

sleep(3);

sig.sa_handler = RTLINUX_SIG_IGN;

rtlinux_sigaction(MOUSE_IRQ,

&sig,

&oldsig);

To make your mouse functional again, the RTLinux sigaction handler is turned off, so that the regular Linux handler is called again. The main routine of the user-space application then continues, printing the number of received interrupts and exiting.

printf("got %i mouse IRQ's\n",

scount);

return 0;

}

void my_handler(int argument)

{

scount++;

}
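For readers without an RTLinux installation, the structure of this example (install a handler, let it count events in a global variable, restore the previous handler) can be mirrored with the ordinary POSIX sigaction() call. This is only an analogy, sketched under assumptions: SIGUSR1 stands in for the mouse interrupt, nothing here runs in realtime, and install_counter and restore_handler are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Shared between the handler and the main program, like scount above.
 * sig_atomic_t is the type POSIX guarantees can be accessed safely
 * from a signal handler. */
static volatile sig_atomic_t scount = 0;

static void my_handler(int signo)
{
    (void)signo;
    scount++;
}

/* Install the counting handler and remember the previous disposition. */
static int install_counter(int signo, struct sigaction *old)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = my_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, old);
}

/* Put the previous disposition back, as the example does for the mouse. */
static int restore_handler(int signo, const struct sigaction *old)
{
    return sigaction(signo, old, NULL);
}
```

A program would call install_counter(SIGUSR1, &old), let events arrive (for instance via kill -USR1 from a shell), read scount, and finally call restore_handler(SIGUSR1, &old).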

Sharing Memory between kernel, rt and user-space

Many rt-processes need to share data with non-rt processes or with the non-rt Linux kernel. For this purpose the rt-extensions to Linux make use of a shared memory module contributed by Tomasz Motylewski. In this section we are not concerned with this module, which is part of RTAI and RTLinux, but rather with sharing memory via mechanisms available from the Linux kernel.

One way to share memory with rt-space is to add a character device that need not provide more than the open/release and mmap functions in its fops, and to use a kmalloc'ed area that can then be shared. Alternatively one can make use of the memory devices in Linux by mmapping /dev/mem.

Simple mmap driver

The simplest method of having shared memory for your RTLinux system is to set up a dummy character device (or to drop the code into any real device that you need for your system) and to provide an mmap function that allows a kmalloc'ed area to be accessed via the mmap system call [Note 6]. The only special header file you will find in this example is rtl_signal.h, needed for RTL_SIGNAL_WAKEUP. We use MAJOR 17 here, which could be in use as it is not one of the official experimental MAJOR numbers, so check before using this.

static pthread_t rt_thread;

#define DRIVER_MAJOR 17

#define LEN 4096

static char *kmalloc_area;

The rt-thread is a simple modification of the examples/hello/hello.c code from RTLinux: instead of printing a constant string, the content of the buffer is printed periodically.

static void *

rtthread_code(void *arg)

{

struct sched_param p;

p . sched_priority = 1;

pthread_setschedparam (

pthread_self(),

SCHED_FIFO, &p);

pthread_make_periodic_np (

pthread_self(),

gethrtime(),

500000000);

while (1) {

pthread_wait_np();

rtl_printf("RT-Thread buffer=%s\n",

kmalloc_area);

}

return 0;

}

Nothing wild needs to be done in driver open/close; we only protect against removal of the module by incrementing/decrementing the module's usage counter.

static int

driver_open(struct inode *inode,

struct file *file )

{

MOD_INC_USE_COUNT;

return 0;

}

static int

driver_close(struct inode *inode,

struct file *file)

{

MOD_DEC_USE_COUNT;

return 0;

}

In this mmap function we only need to remap the pages that we kmalloc'ed in init_module. Below the mmap function you find the file-operations structure for this simple device. Basically we only want to provide mmap, but we need an open and a close as well; these three are the minimum for sharing memory this way.

static int

driver_mmap(struct file *file,

struct vm_area_struct *vma)

{

vma->vm_flags |= VM_SHARED|

VM_RESERVED;

if(remap_page_range(vma->vm_start,

virt_to_phys(kmalloc_area),

LEN,

PAGE_SHARED))

{

printk("mmap failed\n");

return -ENXIO;

}

return 0;

}

static struct

file_operations simple_fops={

mmap: driver_mmap,

open: driver_open,

release: driver_close,

};

static int

__init simple_init(void)

{

struct page *page;

int ret;

kmalloc_area=kmalloc(LEN,GFP_USER);

if(!kmalloc_area){

printk("kmalloc failed - exit\n");

return -1;

}

page = virt_to_page(kmalloc_area);

mem_map_reserve(page);

memset(kmalloc_area,0,LEN);

if(register_chrdev(DRIVER_MAJOR,

"simple-driver",

&simple_fops) == 0)

{

printk("reg. driver major %d",

DRIVER_MAJOR);

ret = pthread_create (&rt_thread,

NULL,

rtthread_code,

0);

return 0;

}

printk("can't get major %d\n",

DRIVER_MAJOR);

return -EIO;

}

static void

__exit simple_exit(void)

{

pthread_delete_np (rt_thread);

unregister_chrdev(

DRIVER_MAJOR,

"simple-driver");

kfree(kmalloc_area);

}

module_init(simple_init);

module_exit(simple_exit);

The user space side simply opens /dev/simple-device

int

main(void)

{

int fd;

char msg[LEN];

unsigned int *addr;

if((fd=open(SIMPLE_DEV,

O_RDWR|O_SYNC))<0)

{

perror("open");

exit(-1);

}

Now we obtain the address via the mmap call and then request input from the user to store in the shared object. The offset passed to mmap is more or less irrelevant, as we don't use it in the mmap file operation of the simple device.

addr = mmap(0, LEN,

PROT_READ|PROT_WRITE,

MAP_SHARED,

fd,

0);

printf("enter a short text:");

scanf("%s",msg);

if(addr==MAP_FAILED) /* mmap reports failure with MAP_FAILED, not NULL */

{

perror("mmap");

exit(-1);

}

else

{

memset(addr,0,LEN);

strncpy((char *)addr,msg,sizeof(msg));

printf("Put: %s\n",(char *)addr);

}

munmap(addr,LEN);

close(fd);

return 0;

}

On exit, don't forget to munmap the range. Even though user-space need not be too strict about freeing resources, as the kernel takes care of closing file descriptors etc. when the user-space application exits, it's a good habit to explicitly release everything.
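The user-space half of this pattern can be exercised without the driver loaded. The sketch below is an illustration under assumptions: a temporary file stands in for the character device, and map_shared, put_message and demo_roundtrip are hypothetical names. It shows the points that matter on the user-space side: check the mmap result against MAP_FAILED, bound the copy so that a terminating NUL always fits, and release the mapping and the descriptor explicitly.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LEN 4096

/* Map LEN bytes of fd shared and read/write; returns NULL on failure. */
static char *map_shared(int fd)
{
    void *addr = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);

    return addr == MAP_FAILED ? NULL : (char *)addr;
}

/* Copy msg into the shared area, always leaving a terminating NUL. */
static void put_message(char *area, const char *msg)
{
    memset(area, 0, LEN);
    strncpy(area, msg, LEN - 1);
}

/*
 * Write msg (shorter than LEN) through a shared mapping of a temporary
 * file and read it back through the descriptor. Returns 0 if both
 * views agree, -1 on any failure.
 */
static int demo_roundtrip(const char *msg)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";
    char back[LEN];
    char *area;
    int ok = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                 /* the file disappears on close */

    if (ftruncate(fd, LEN) == 0 && (area = map_shared(fd)) != NULL) {
        put_message(area, msg);
        if (msync(area, LEN, MS_SYNC) == 0 &&
            pread(fd, back, LEN, 0) == LEN &&
            strcmp(back, msg) == 0)
            ok = 0;
        munmap(area, LEN);        /* explicit release, as advised above */
    }
    close(fd);
    return ok;
}
```

With the driver from this section loaded, map_shared() could be applied to a descriptor obtained by opening the device node instead of the temporary file.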

Using /dev/mem

The POSIX way of sharing memory is via /dev/mem: you can pass it an offset of 0 and let the kernel select where to place the shared buffer, or you can allocate a buffer, pass its address and size to the user-space side and then use /dev/mem to mmap it into the user-space application. In the given example we simply pass 0 and let the kernel take care of it.

The declarations needed are our rt-thread and a shared memory object. The offset here is 0, passed as MEMORY_OFFSET.

pthread_t thread;

struct shared_mem_struct

{

int some_int;

char ready;

};

int memfd;

#define MEMORY_OFFSET 0

struct shared_mem_struct* shared_mem;

This is a sample cleanup handler. It is kind of useless in this form, but intended as an example: on rmmod of this module you get the printk message in the kernel's ring buffer.

void

cleanup(void *arg)

{

printk("Cleanup handler called\n");

}

Nothing special about the rt-thread in this case: it's simply a periodic thread that prints the shared object via rtl_printf. Note the cleanup handler pop after the while loop. This section of code is never reached, but cleanup handler push/pops must be balanced, otherwise gcc is not happy. The cleanup handler is not called by pthread_cleanup_pop, though, but by pthread_delete_np, which checks for available cleanup handlers. The cleanup handler does nothing here, but it is an important means of making sure that an rt-thread can exit safely without leaving any synchronization object in an inappropriate state. So if your shared object is in use for synchronization, make sure the cleanup handler puts it in a state where any other threads/processes can continue safely.

void * start_routine(void *arg)

{

struct sched_param p;

p . sched_priority = 1;

pthread_setschedparam (

pthread_self(),

SCHED_FIFO,

&p);

pthread_make_periodic_np (

pthread_self(),

gethrtime(),

500000000);

pthread_cleanup_push(cleanup,0);

while (1) {

pthread_wait_np ();

rtl_printf("shared mem=%d\n",

shared_mem->some_int);

}

pthread_cleanup_pop(0);

return 0;

}

Init module opens /dev/mem and then mmaps the shared memory object, if the mmap fails we must close the file descriptor explicitly (all resources you requested in init_module need to be released by you).

int

init_module(void)

{

int ret;

memfd = open("/dev/mem", O_RDWR);

if (memfd >= 0){

shared_mem = (struct

shared_mem_struct*) mmap(0,

sizeof(struct shared_mem_struct),

PROT_READ | PROT_WRITE,

MAP_FILE | MAP_SHARED,

memfd,

MEMORY_OFFSET);

if(shared_mem!= NULL){

printk("Dev mem available\n");

}

else{

printk("Failed to map memory\n");

close (memfd);

return -1;

}

} /* end of if: /dev/mem was opened */

else{

printk("rt-shm - open failed\n");

return -1;

}

ret=pthread_create (&thread,

NULL,

start_routine,

0);

return ret;

}

Cleanup is only deleting the rt-thread and closing the device.

void cleanup_module(void) {

pthread_delete_np (thread);

close(memfd);

}

The user-space application is not fundamentally different from one using any other mmapped device: open /dev/mem, mmap it, cast the result to the shared object and write to it, and the value shows up in the output of the periodic thread. Note that this method of using offset 0 is quite restricted, as all mmaps of offset 0 see the same region, but for many problems this is sufficient.

int main()

{

char input[40];

char stop=0;

memfd = open("/dev/mem", O_RDWR);

if (memfd >= 0){

shared_mem = (struct

shared_mem_struct*) mmap(0,

sizeof(struct shared_mem_struct),

PROT_READ | PROT_WRITE,

MAP_FILE | MAP_SHARED,

memfd,

MEMORY_OFFSET);

if(shared_mem != MAP_FAILED){

printf("enter # or quit\n");

do{

scanf("%s", input);

shared_mem->some_int = atoi(

input);

shared_mem->ready = 1;

}while (strcmp("quit",input));

}

else{

printf("Failed to map memory\n");

close (memfd);

exit(-1);

}

} /* end of if: /dev/mem was opened */

else{

printf("Failed to open /dev/mem\n");

exit(-1);

}

munmap(shared_mem, sizeof(struct shared_mem_struct));

close(memfd);

return 0;

}

Using reserved 'raw'-memory

You can map reserved physical memory by passing the kernel a mem=126m line at the boot: prompt and then mmap'ing it via /dev/mem (this assumes you have 128MB of physical memory installed and want to dedicate 2MB to RTLinux). This is not a very elegant way to do it, but a very simple one if you need large blocks of contiguous memory. Linux's kmalloc, which provides contiguous memory, is limited to 128kB, as maintaining a buddy-system up to 2MB would be a tremendous waste of resources; so contiguous memory is de facto limited to 128kB if you use the Linux kernel memory functions to allocate it (vmalloc is non-contiguous and not limited to 128kB). There is no need to do any magic on the kernel side to access this area: simply use the physical address 126*0x100000, the end of the 126MB left to Linux, as the base address and manage the area on your own.
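The address arithmetic for such a reserved region is easy to get wrong by a factor, so a small helper may be worthwhile. This is a sketch under assumptions: reserved_region and reserved_region_calc are hypothetical names, and the two megabyte counts are taken to be the installed memory and the value given to mem= at boot.

```c
#include <stdint.h>

#define MB 0x100000UL   /* one megabyte in bytes */

/* Physical base address and size of the memory hidden from Linux. */
struct reserved_region {
    uint64_t base;
    uint64_t size;
};

/*
 * installed_mb: physical memory in the machine, in MB.
 * linux_mb:     value passed as mem=<linux_mb>m on the boot line.
 * Returns 0 and fills *r, or -1 if nothing is left over to reserve.
 */
static int reserved_region_calc(unsigned installed_mb, unsigned linux_mb,
                                struct reserved_region *r)
{
    if (linux_mb >= installed_mb)
        return -1;

    r->base = (uint64_t)linux_mb * MB;
    r->size = (uint64_t)(installed_mb - linux_mb) * MB;
    return 0;
}
```

For the 128MB machine booted with mem=126m this yields a base of 0x7E00000 and a size of 0x200000; the base is what a user-space program would pass as the offset when mmap'ing /dev/mem.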

Kernel facilities for rt-threads

The Linux kernel provides many facilities that are useful for realtime threads, but not all functionality provided is rt-safe. The overview here is not exhaustive simply because the Linux kernel has so much to offer and rarely does anybody take the time to analyze what is rt-safe and what not. Here a few general guidelines are attempted.

Memory allocation

One of the common requests for realtime threads is memory allocation. Dynamic resource allocation is a fundamental problem for rt-applications: it is simply not safe, as you can't give a worst-case delay for the case that the system has no memory available. So the simple rule is: allocate all resources you need at thread creation time and don't rely on dynamic resources. That said, here is how to violate this rule if you must.

char *my_memory;

unsigned int size=4096;

my_memory=kmalloc(size,GFP_ATOMIC);

if(my_memory == NULL){

rtl_printf("Got no memory - what now?\n");

}

GFP_ATOMIC is guaranteed not to sleep; sleeping in a realtime thread is not a good thing to do . . . Also note that kmalloc always returns an area whose size is a power of two, so if you allocate 65 bytes you get 128. Furthermore kmalloc is restricted to 128kByte: if you request more you get 128kByte and NO ERROR, until you try to access beyond the 128kByte boundary . . .
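To estimate how much memory a request really costs under the power-of-two rule just described, the rounding can be written down in a few lines. This is a sketch of the rule as stated in the text only; it does not model the allocator's minimum object size or any other implementation detail, and round_up_pow2 and alloc_slack are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two >= n; returns 0 if it does not fit in size_t. */
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;

    while (p < n) {
        if (p > SIZE_MAX / 2)
            return 0;       /* the next doubling would overflow */
        p <<= 1;
    }
    return p;
}

/* Bytes allocated but not requested under the power-of-two rule. */
static size_t alloc_slack(size_t n)
{
    size_t p = round_up_pow2(n);

    return p ? p - n : 0;
}
```

A request of 65 bytes rounds up to 128 and so wastes 63 bytes, which is a good reason to size rt-thread buffers at or just below a power of two.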

libc functions available in the kernel

Of course libc is not available in kernel space, but programmers are used to using things like memcpy and sprintf, so quite a few of the standard libc functions have been implemented in the kernel. A few of these are architecture specific, so you need to check whether a particular call actually is available.

Table 1: Functions available in the Linux kernel

Misusing System Calls

This is a sample implementation of a system call. System calls are fairly fast compared to device open/read/close operations, which need to traverse the VFS and execute a few system calls sequentially [Note 7], but a new system call is the most non-portable and the most dangerous solution to a problem possible. Changing a system call or introducing a new one makes your system as a whole incompatible with all other Linux systems. Adding a system call can introduce a serious security problem in your system. Adding a system call will require you to patch every kernel release when updating. So the best solution is not to write your own system calls . . . but they solve problems sometimes ;)

The actual syscall code is quite simple, and placed in /usr/src/linux/arch/i386/kernel/sys_i386.c for our purposes, naturally if your system call code is more elaborate then you should put it into an independent file.

asmlinkage int sys_test_call(void)

{

/* do something useful in kernel

space - like a printk */

printk("Test System Call\n");

return 0;

}

This system call only produces a printk output and that's it. System calls have a fixed number of parameters and types that must be declared; in the above case the system call takes no arguments at all. The number of arguments needs to be given not only with the declaration of the system call but also with the prototype declaration, which is a little bit different from regular prototype declarations (see below).

The kernel has a "jump-matrix" for the system calls. The position of a system call in the syscall table is absolute, so you can't add your system call at the beginning or in the middle or you will break the entire system; if at all, add it at the end of the syscall table. The position in the syscall table is the syscall number. So put it into the syscall table like:

/usr/src/linux/arch/i386/kernel/entry.S

. . .

.long SYMBOL_NAME(sys_getdents64)

.long SYMBOL_NAME(sys_fcntl64)

.long SYMBOL_NAME(sys_test_call)

Note that this system call table may change over time, so you will have to patch newer kernels with your system call and modify the code that is calling the syscall, since the number may have changed; it is up to you to maintain your system call.

If you want to put your syscall at a position beyond the last current system call, you must fill up the system call table with empty system calls:

.long SYMBOL_NAME(sys_ni_syscall)
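The role of the table position and of the sys_ni_syscall filler entries can be illustrated with an ordinary array of function pointers. This is a user-space model under assumptions, not kernel code: the table has only five slots, sys_first and sys_second are made-up placeholders, the number 4 is arbitrary, and dispatch is a hypothetical stand-in for the kernel's entry path.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(void);

/* Filler for unused slots: reports "function not implemented". */
static long sys_ni_syscall(void) { return -ENOSYS; }

/* Made-up placeholder calls occupying the first two slots. */
static long sys_first(void)  { return 11; }
static long sys_second(void) { return 22; }

/* Stand-in for the new call added at the end of the table. */
static long sys_test_call(void) { return 0; }

/* The index in this array is the system call number. */
static const syscall_fn sys_call_table[] = {
    sys_first,       /* 0 */
    sys_second,      /* 1 */
    sys_ni_syscall,  /* 2: filler */
    sys_ni_syscall,  /* 3: filler */
    sys_test_call,   /* 4: must match NR_test_call below */
};

#define NR_SYSCALLS  (sizeof(sys_call_table) / sizeof(sys_call_table[0]))
#define NR_test_call 4

/* Look the number up in the table; out-of-range numbers are rejected. */
static long dispatch(unsigned long nr)
{
    if (nr >= NR_SYSCALLS)
        return -ENOSYS;
    return sys_call_table[nr]();
}
```

Inserting an entry anywhere but at the end would shift every later index, which is the user-space equivalent of breaking every program that was compiled against the old numbers.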

After recompiling your kernel you could now call it with the absolute system call number. To be a bit more user friendly, you need to add some entries to make it available to user-space apps via asm/unistd.h:

/usr/include/asm/unistd.h

/* this number better be the same

as the position in entry.S!! */

#define __NR_test_call 222

Now a regular system call like open is simply called by

fd=open(" . . . ..

Our system call could also be called in this way, but that would require recompiling glibc as well, as the kernel's syscall table is read during the build process of glibc. If you do recompile glibc, then you have reached the maximum possible incompatibility with any other Linux system. If you don't want to recompile glibc, which is probably a good idea, then you need to put the prototype declaration for your system call into the source file.

So, assuming we did not recompile glibc, call it in a C source file like:

syscall.c

#include <errno.h>

#include <asm/unistd.h>

_syscall0(int,test_call);

main(){

syscall(222);

test_call();

return 0;

}

You can call it by passing it the syscall number or by name -- the second is preferable as with about any kernel release this number may change.

Compile it simply with gcc syscall.c -o syscall and run the program as ./syscall. To check the kernel output (the printk that our syscall is to do) use the dmesg command -- it should have produced:

Test System Call

The two calls are made via syscall(222) and test_call(). Note that you don't need the header files errno.h and asm/unistd.h to use syscall(222), but you do need these includes for the named call test_call(). Using syscall(222) can be very confusing, as it says nothing about what you are trying to do, so give your system call a meaningful name.
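The difference between calling by number and calling by name can be tried on a stock kernel with an existing system call. The sketch below is an illustration under assumptions: test_call does not exist on an unpatched system, so getpid is used as a stand-in; it relies on the Linux-specific syscall() wrapper and the SYS_getpid constant from sys/syscall.h, and pid_by_number and pid_by_name are hypothetical names.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Call by number: works for any call the kernel knows, named or not. */
static long pid_by_number(void)
{
    return syscall(SYS_getpid);
}

/* Call by name: goes through the C library's wrapper for the same call. */
static long pid_by_name(void)
{
    return (long)getpid();
}
```

Both functions return the same process ID. Using the symbolic constant SYS_getpid rather than a literal number keeps the source valid if the numbering differs between architectures or kernel versions, which is the same argument the text makes for test_call() over syscall(222).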

Conclusions

Linux is a stable and potent system that can satisfy the needs of the 32bit embedded device market to a large extent. The Linux kernel development is providing a vast amount of resources extending the possible fields of application of embedded Linux and real time Linux almost daily. An issue that needs to be addressed though is the design concepts applied to embedded Linux -- embedded Linux is not simply a scaled down desktop Linux system. Utilizing the vast resources available in kernel space to optimize embedded Linux systems needs to continue and the Linux kernel could improve performance a lot if it would permit more fine grain feature configuration at compile time and if it would ease the ability to extend trusted code into user-space. But the largest part of the work is on the side of the embedded system designer and programmers that need to switch from thinking in desktop terms to thinking and designing in 'closed-can' embedded terms.

Bibliography

Note 1: Quingguo Zhou, 2001, Embedded Data Aquisition System for Moesbauer Spectrum, Proceedings of the 3rd Real Time Linux Workshop, Milan, Nov. 2001.

Note 2: N. Akteniz, 2002, RealBOX, A Novel Approach to MiniRTL Implementation for Instrumentation And Controll Applications, Proceedings of the 4th Real Time Linux Workshop, Boston, Nov. 2002.

Note 3: Ivon Wagner, 2001, CNC Control based on Real Time Linux, Proceedings of the 3rd Real Time Linux Workshop, Milan, Nov. 2001.

Note 4: Gary Nutt, 2000, Operating Systems, Addison-Wesley, ISBN 0-201-47708-4.

Note 5: Wehrle, 2002, Linux Netzwerkarchitektur, Addison-Wesley ISBN 3-8273-1509-3.

Note 6: Alessandro Rubini, 2001, Linux Device Drivers 2nd Edition, O'Reilly, ISBN 0-59600-008-1.

Note 7: Michael Beck, 2001, Linux Kernelprogrammierung, Addison-Wesley, ISBN 3-8273-1659-6.

Copyright information: This paper and included code is released under Free Documentation License (FDL) Version 1.2 and GNU General Public License (GPL) Version 2.0 respectively. The full sources with the appropriate build environment can be found in the current RTLinux/GPL rtlinux-3.2-pre1 release.

About the author: Nicholas McGuire's first contact with Linux dates back to Linux kernel version 0.99.112, at a time when many rumors and myths were circulating about the fledgling open source operating system. McGuire first came into contact with RTLinux at RTL version 0.5, in the course of developing a DSP system replacement for magnetic bearing control, at the Institute for Material Science of the University of Vienna, Austria. McGuire began developing MiniRTL while RTLinux was at version 1.1, and has been engaged in RTLinux and MiniRTL based development work ever since.