Changes to the Linux kernel workqueue API: why work_struct changed

Original source: http://blog.chinaunix.net/space.php?uid=14163325&do=blog&cuid=1388772

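The workqueue mechanism lets kernel code defer work to a later time. Workqueues are implemented by one or more dedicated processes that run the queued work. Because the work runs in process context, it can sleep if necessary. A workqueue can also delay a piece of work for a specified time, so workqueues are used in many places in the kernel.
David Howells recently looked at workqueues and found that struct work_struct (which describes a job to be executed) is rather large: 96 bytes on a 64-bit machine. That is a lot for a structure used in so many places, so he looked for ways to shrink it. He succeeded, but only by changing the workqueue API.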

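struct work_struct is bloated for these reasons:
1. It contains a timer structure. Many workqueue users never use the delay feature, yet every work_struct carries a timer_list.
2. The private data pointer, which is passed to the work function as its argument. Many functions use this pointer, but it can usually be computed from the work_struct pointer with container_of().
3. A whole word is used for a single bit, the pending flag, which indicates that the work_struct is currently queued and waiting to run.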

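David dealt with each of these. A new structure, struct delayed_work, is used for delayed invocation, and the timer structure was removed from struct work_struct. The private data pointer is gone; the work function now takes a pointer to the work_struct:
    typedef void (*work_func_t)(struct work_struct *work);
Some tricks were used to get rid of the pending word. As a result of these changes the workqueue API is different. There are now two ways to declare a workqueue entry: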
    DECLARE_WORK(name, func);
    DECLARE_DELAYED_WORK(name, func);
For work structures set up at run time, the initialization macros now look like this:
    INIT_WORK(struct work_struct *work, work_func_t func);
    PREPARE_WORK(struct work_struct *work, work_func_t func);
    INIT_DELAYED_WORK(struct delayed_work *work, work_func_t func);
    PREPARE_DELAYED_WORK(struct delayed_work *work, work_func_t func);
The INIT_* macros initialize the entire structure and must be used the first time the structure is set up. The PREPARE_* macros run slightly faster.
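To make the size argument concrete, here is a small user-space model of the split described above. It is only a sketch: every toy_* name is hypothetical and none of this is the kernel's real definition. The text says only that "some tricks" removed the pending word; the sketch assumes one familiar trick, storing flag bits in the otherwise unused low bits of an aligned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy user-space model, NOT the kernel's definitions. */
struct toy_list { struct toy_list *next, *prev; };

struct toy_work;
typedef void (*toy_work_func_t)(struct toy_work *work);

/* Slim entry: no timer, no private data pointer, no separate pending word. */
struct toy_work {
    uintptr_t management;   /* owner pointer plus flag bits in the low bits */
    struct toy_list entry;  /* linkage into a queue */
    toy_work_func_t func;   /* handler, receives the toy_work itself */
};

/* Stand-in for a timer; only its size matters here. */
struct toy_timer {
    struct toy_list entry;
    unsigned long expires;
    void (*function)(unsigned long);
    unsigned long data;
};

/* Only users that need a delay pay for the timer. */
struct toy_delayed_work {
    struct toy_work work;
    struct toy_timer timer;
};

#define TOY_PENDING   ((uintptr_t)1)
#define TOY_FLAG_MASK ((uintptr_t)3)

/* Store the owner pointer while preserving the flag bits. */
static void toy_set_owner(struct toy_work *w, void *owner)
{
    /* the owner must be at least 4-byte aligned so the low bits are free */
    assert(((uintptr_t)owner & TOY_FLAG_MASK) == 0);
    w->management = (uintptr_t)owner | (w->management & TOY_FLAG_MASK);
}

static void *toy_get_owner(const struct toy_work *w)
{
    return (void *)(w->management & ~TOY_FLAG_MASK);
}

static int toy_pending(const struct toy_work *w)
{
    return (w->management & TOY_PENDING) != 0;
}

static void toy_set_pending(struct toy_work *w)
{
    w->management |= TOY_PENDING;
}

static void toy_clear_pending(struct toy_work *w)
{
    w->management &= ~TOY_PENDING;
}
```

In this model the pending flag and the owner pointer share one word, and the timer lives only in toy_delayed_work, so a plain toy_work is strictly smaller than the delayed variant.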
The functions for adding entries to workqueues (and canceling them) now look like this:
    int queue_work(struct workqueue_struct *queue,
                   struct work_struct *work);
    int queue_delayed_work(struct workqueue_struct *queue,
                           struct delayed_work *work,
                           unsigned long delay);
    int queue_delayed_work_on(int cpu,
                              struct workqueue_struct *queue,
                              struct delayed_work *work,
                              unsigned long delay);
    int cancel_delayed_work(struct delayed_work *work);
    int cancel_rearming_delayed_work(struct delayed_work *work);

Interestingly, David has added a variant on the workqueue declaration and initialization macros:
    DECLARE_WORK_NAR(name, func);
    DECLARE_DELAYED_WORK_NAR(name, func);
    INIT_WORK_NAR(name, func);
    INIT_DELAYED_WORK_NAR(name, func);
    PREPARE_WORK_NAR(name, func);
    PREPARE_DELAYED_WORK_NAR(name, func);
The "NAR" stands for "non-auto-release." Normally, the workqueue subsystem resets a work entry's pending flag prior to calling the work function; that action, among other things, allows the function to resubmit itself if need be. If the entry is initialized with one of the above macros, however, this reset will not happen, and the work function is expected to reset the flag itself (with a call to work_release()). The stated purpose is to prevent the workqueue entry from being released before the work function is done with it, but there is nothing in the clearing of the pending bit which would cause that release to happen. Perhaps that is why there are no users of the _NAR variants in David's patch. It may be that somebody is thinking about implementing reference-counted workqueue structures in the future.
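The difference between the normal and the non-auto-release behaviour can be modelled in a few lines of user-space C. This is a toy sketch, not kernel code: all toy_* names and flag values are hypothetical, and the rule that queueing an already pending entry is refused is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PENDING   0x1u  /* entry is queued and waiting to run */
#define TOY_NOAUTOREL 0x2u  /* worker must not clear TOY_PENDING itself */

struct toy_work;
typedef void (*toy_work_func_t)(struct toy_work *work);

struct toy_work {
    unsigned int flags;
    toy_work_func_t func;
    int runs;               /* how many times the handler has run */
};

/* Returns 1 if the entry was queued, 0 if it was already pending. */
static int toy_queue_work(struct toy_work *w)
{
    if (w->flags & TOY_PENDING)
        return 0;
    w->flags |= TOY_PENDING;
    return 1;
}

/* Clear the pending flag; NAR handlers call this themselves. */
static void toy_work_release(struct toy_work *w)
{
    w->flags &= ~TOY_PENDING;
}

/* What the worker does with one entry taken off the queue. */
static void toy_run_work(struct toy_work *w)
{
    if (!(w->flags & TOY_NOAUTOREL))
        toy_work_release(w);   /* normal case: release before calling */
    w->func(w);
}

/* A normal handler that resubmits itself once. */
static void toy_resubmitting_func(struct toy_work *w)
{
    w->runs++;
    if (w->runs < 2)
        toy_queue_work(w);     /* works because pending is already clear */
}

/* A NAR handler: it must release the entry itself when done. */
static void toy_nar_func(struct toy_work *w)
{
    w->runs++;
    toy_work_release(w);
}
```

With a normal entry, toy_run_work() clears the flag first, so the handler can queue itself again. With TOY_NOAUTOREL set, the entry stays pending for the whole duration of the handler until it calls toy_work_release().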

Meanwhile, these changes require a lot of fixes throughout the kernel tree; that drew a complaint from Andrew Morton, who was unable to make those changes mesh with all of the other patches queued up for the opening of the 2.6.20 merge window. Andrew suggested that the workqueue patches could be merged after 2.6.20-rc1 comes out, as was done with the interrupt handler function prototype in 2.6.19. But Linus, who likes the workqueue patches, would rather get them in sooner:

I'd actually prefer to take it before -rc1, because I think the previous time we did something after -rc1 was a failure (the whole irq argument handling thing). It just exposed too many problems too late in the dev cycle. I'd rather have the problems be exposed by the time -rc1 rolls out, and keep the whole "we've done all major nasty ops by -rc1" thing.

So it seems that, somehow, all of the pieces will be made to fit and the workqueue API will change in 2.6.20.

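Your code's use of workqueues can therefore be updated as follows:
1. Any work_struct that is passed to one of these functions:
    queue_delayed_work()
    queue_delayed_work_on()
    schedule_delayed_work()
    schedule_delayed_work_on()
    cancel_rearming_delayed_work()
    cancel_rearming_delayed_workqueue()
    cancel_delayed_work()
must become a delayed_work. Note that cancel_delayed_work() is often called in places where it has no effect (I think people misunderstand what it does).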
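2. A delayed_work struct must be initialized with one of:
    __DELAYED_WORK_INITIALIZER
    DECLARE_DELAYED_WORK
    INIT_DELAYED_WORK
    rather than:
    __WORK_INITIALIZER
    DECLARE_WORK
    INIT_WORK
    (those handle only work_struct, that is, non-delayable work).
3. The initializers no longer take a data pointer argument, so that argument must be removed.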
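4. Any call to one of these functions on a delayed_work:
    queue_work()
    queue_work_on()
    schedule_work()
    schedule_work_on()
    must be changed to the corresponding function:
    queue_delayed_work()
    queue_delayed_work_on()
    schedule_delayed_work()
    schedule_delayed_work_on()
    with a timeout of 0 passed as the additional argument. This simply queues the work item without setting the timer.
5. Any direct test of a work item's pending flag, such as:
    test_bit(0, &work->pending)
    should be replaced with the appropriate one of:
    work_pending(work)
    delayed_work_pending(work)
6. The work function must be changed to look like this: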
    void foo_work_func(struct work_struct *work)
    {
        ...
    }
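    This applies to both work_struct and delayed_work handlers:
    a) If the data argument passed in was NULL, the work argument is simply ignored.
    b) If the data is a pointer to a structure that contains the work_struct, for example:
    struct foo {
        struct work_struct worker;
        ...
    };
    void foo_work_func(struct work_struct *work)
    {
        struct foo *foo = container_of(work, struct foo, worker);
        ...
    }
    If the work_struct is placed at the beginning of the containing structure, the offset is zero and the subtraction that container_of() would otherwise perform is eliminated; in any other position container_of() is required to do the pointer arithmetic.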
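    c) If the data is the address of a structure that contains a delayed_work, code like the following is needed:
    struct foo {
        struct delayed_work worker;
        ...
    };

    void foo_work_func(struct work_struct *work)
    {
        struct foo *foo = container_of(work, struct foo, worker.work);
        ...
    }
    Note the extra ".work" in the container_of() call: it is needed because the work_struct is embedded inside the delayed_work.
    d) If the data is not a pointer to the container, but the container is guaranteed to exist while the work handler runs, the data can be stored in an extra variable in the container.

    The handler should then be written as in (b) or (c), and the extra variable accessed after the container_of() line.

    Quite often there is a pair of structures that refer to each other: work_struct <==> otherStruct. net_device is an example.
    e) If the data is completely unrelated and cannot be stored in the container, because the container may not be accessible from the handler, the work_struct or delayed_work should be initialized with one of these macros: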
    DECLARE_WORK_NAR
    DECLARE_DELAYED_WORK_NAR
    INIT_WORK_NAR
    INIT_DELAYED_WORK_NAR
    __WORK_INITIALIZER_NAR
    __DELAYED_WORK_INITIALIZER_NAR
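    These macros take the same arguments as the normal initializers, but they set a flag in the work_struct indicating that the pending flag is not to be cleared before the work function is called.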
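Cases (b) and (c) of step 6 can be tried outside the kernel. The sketch below is a user-space illustration only: struct work_struct and struct delayed_work are cut-down stand-ins, container_of() is a simplified version of the kernel macro, and the id and handled fields, like struct bar, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space container_of(); the kernel's also type-checks ptr. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct work_struct;
typedef void (*work_func_t)(struct work_struct *work);

/* Cut-down stand-ins for the kernel structures. */
struct work_struct {
    work_func_t func;
};

struct delayed_work {
    struct work_struct work;      /* the embedded work_struct */
    unsigned long timer_expires;  /* stand-in for the timer */
};

/* Case (b): the container embeds a plain work_struct. */
struct foo {
    int id;
    struct work_struct worker;
    int handled;
};

/* Case (c): the container embeds a delayed_work. */
struct bar {
    int id;
    struct delayed_work worker;
    int handled;
};

static void foo_work_func(struct work_struct *work)
{
    /* recover the container from the embedded member */
    struct foo *foo = container_of(work, struct foo, worker);

    foo->handled = foo->id;
}

static void bar_work_func(struct work_struct *work)
{
    /* note the extra ".work": the handler receives the inner work_struct */
    struct bar *bar = container_of(work, struct bar, worker.work);

    bar->handled = bar->id;
}
```

Both handlers have the same signature; only the member path given to container_of() differs. Because worker is deliberately not the first member here, the macro really does subtract a non-zero offset.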

References:
    1.http://lwn.net/Articles/211279/
    2.http://bugboy.ycool.com/post.2926602.html
    3.http://bugboy.ycool.com/post.2927176.html
    4.David Howells <dhowells-AT-redhat.com>


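The jiq.c source that accompanies the LDD3 book, modified according to the steps above, follows (this example does not use container_of() to get private data from the work_struct; it uses a global variable directly):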



/*
 * jiq.c -- the just-in-queue module
 *
 * Copyright (C) 2001 Alessandro Rubini and Jonathan Corbet
 * Copyright (C) 2001 O'Reilly & Associates
 *
 * The source code in this file can be freely used, adapted,
 * and redistributed in source or binary form, so long as an
 * acknowledgment appears in derived source files. The citation
 * should list that the code comes from the book "Linux Device
 * Drivers" by Alessandro Rubini and Jonathan Corbet, published
 * by O'Reilly & Associates. No warranty is attached;
 * we cannot take responsibility for errors or fitness for use.
 *
 * $Id: jiq.c,v 1.7 2004/09/26 07:02:43 gregkh Exp $
 */

 
#include <linux/config.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/init.h>

#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/fs.h>/* everything... */
#include <linux/proc_fs.h>
#include <linux/errno.h>/* error codes */
#include <linux/workqueue.h>
#include <linux/preempt.h>
#include <linux/interrupt.h>/* tasklets */

MODULE_LICENSE("Dual BSD/GPL");

/*
 * The delay for the delayed workqueue timer file.
 */

static long delay= 1;
module_param(delay,long, 0);


/*
 * This module is a silly one: it only embeds short code fragments
 * that show how enqueued tasks `feel' the environment
 */


#define LIMIT    (PAGE_SIZE-128)    /* don't print any more after this size */

/*
 * Print information about the current environment. This is called from
 * within the task queues. If the limit is reached, awake the reading
 * process.
 */


//static struct work_struct jiq_work;
struct delayed_work jiq_work;  /* changed from struct work_struct to struct delayed_work */
static DECLARE_WAIT_QUEUE_HEAD (jiq_wait);

/*
 * Keep track of info we need between task queue runs.
 */

static struct clientdata{
    int len;
    char *buf;
    unsigned long jiffies;
    long delay;
}jiq_data;


#define SCHEDULER_QUEUE ((task_queue *) 1)


static void jiq_print_tasklet(unsigned long);
static DECLARE_TASKLET(jiq_tasklet, jiq_print_tasklet, (unsigned long)&jiq_data);


/*
 * Do the printing; return non-zero if the task should be rescheduled.
 */

static int jiq_print(void*ptr)
{
    struct clientdata *data = ptr;
    int len = data->len;
    char *buf = data->buf;
    unsigned long j= jiffies;

    if (len > LIMIT) {
        wake_up_interruptible(&jiq_wait);
        return 0;
    }

    if (len == 0)
        len = sprintf(buf," time delta preempt pid cpu command\n");
    else
        len =0;

      /* intr_count is only exported since 1.3.5, but 1.99.4 is needed anyways */
    len += sprintf(buf+len,"%9li %4li %3i %5i %3i %s\n",
            j, j - data->jiffies,
            preempt_count(), current->pid, smp_processor_id(),
            current->comm);

    data->len += len;
    data->buf += len;
    data->jiffies= j;
    return 1;
}


/*
 * Call jiq_print from a work queue
 */

static void jiq_print_wq(struct work_struct*ptr)
{
//    struct clientdata *data = (struct clientdata *) ptr;

    
    if (!jiq_print(&jiq_data))
        return;
    
    if (jiq_data.delay)
        schedule_delayed_work(&jiq_work, jiq_data.delay);
    else
        schedule_work(&jiq_work.work); /* use jiq_work.work */
}



static int jiq_read_wq(char*buf, char **start, off_t offset,
                   int len,int *eof,void *data)
{
    DEFINE_WAIT(wait);
    
    jiq_data.len = 0;/* nothing printed, yet */
    jiq_data.buf = buf;/* print in this place */
    jiq_data.jiffies = jiffies;/* initial time */
    jiq_data.delay = 0;
    
    prepare_to_wait(&jiq_wait,&wait, TASK_INTERRUPTIBLE);
    schedule_work(&jiq_work.work); /* use jiq_work.work */
    schedule();
    finish_wait(&jiq_wait,&wait);

    *eof = 1;
    return jiq_data.len;
}


static int jiq_read_wq_delayed(char*buf, char **start, off_t offset,
                   int len,int *eof,void *data)
{
    DEFINE_WAIT(wait);
    
    jiq_data.len = 0;/* nothing printed, yet */
    jiq_data.buf = buf;/* print in this place */
    jiq_data.jiffies =(unsigned long )jiffies;/* initial time */
    jiq_data.delay = delay;
    
    prepare_to_wait(&jiq_wait,&wait, TASK_INTERRUPTIBLE);
    schedule_delayed_work(&jiq_work, delay);
    schedule();
    finish_wait(&jiq_wait,&wait);

    *eof = 1;
    return jiq_data.len;
}




/*
 * Call jiq_print from a tasklet
 */

static void jiq_print_tasklet(unsigned long ptr)
{
    if (jiq_print((void*) ptr))
        tasklet_schedule (&jiq_tasklet);
}



static int jiq_read_tasklet(char*buf, char **start, off_t offset,int len,
                int *eof,void *data)
{
    jiq_data.len = 0;/* nothing printed, yet */
    jiq_data.buf = buf;/* print in this place */
    jiq_data.jiffies = jiffies;/* initial time */

    tasklet_schedule(&jiq_tasklet);
    interruptible_sleep_on(&jiq_wait);/* sleep till completion */

    *eof = 1;
    return jiq_data.len;
}




/*
 * This one, instead, tests out the timers.
 */


static struct timer_list jiq_timer;

static void jiq_timedout(unsigned long ptr)
{
    jiq_print((void*)ptr);/* print a line */
    wake_up_interruptible(&jiq_wait);/* awake the process */
}


static int jiq_read_run_timer(char*buf, char **start, off_t offset,
                   int len,int *eof,void *data)
{

    jiq_data.len = 0;/* prepare the argument for jiq_print() */
    jiq_data.buf = buf;
    jiq_data.jiffies = jiffies;

    init_timer(&jiq_timer);/* init the timer structure */
    jiq_timer.function = jiq_timedout;
    jiq_timer.data =(unsigned long)&jiq_data;
    jiq_timer.expires = jiffies + HZ;/* one second */

    jiq_print(&jiq_data);/* print and go to sleep */
    add_timer(&jiq_timer);
    interruptible_sleep_on(&jiq_wait);/* RACE */
    del_timer_sync(&jiq_timer);/* in case a signal woke us up */
    
    *eof = 1;
    return jiq_data.len;
}



/*
 * the init/clean material
 */


static int jiq_init(void)
{

    /* this line is in jiq_init() */
    INIT_DELAYED_WORK(&jiq_work, jiq_print_wq); /* use INIT_DELAYED_WORK instead of INIT_WORK */
    create_proc_read_entry("jiqwq", 0,NULL, jiq_read_wq,NULL);
    create_proc_read_entry("jiqwqdelay", 0,NULL, jiq_read_wq_delayed,NULL);
    create_proc_read_entry("jitimer", 0,NULL, jiq_read_run_timer,NULL);
    create_proc_read_entry("jiqtasklet", 0,NULL, jiq_read_tasklet,NULL);

    return 0; /* succeed */
}

static void jiq_cleanup(void)
{
    remove_proc_entry("jiqwq",NULL);
    remove_proc_entry("jiqwqdelay",NULL);
    remove_proc_entry("jitimer",NULL);
    remove_proc_entry("jiqtasklet",NULL);
}


module_init(jiq_init);
module_exit(jiq_cleanup);
