An In-Depth Look at wait_event_interruptible and wake_up_interruptible in Linux Drivers


Wait-queue functions come up constantly in PCIe driver code. Two calls do most of the work:

(1) wait_event_interruptible: puts the calling process to sleep on a wait queue until a given condition becomes true;

(2) wake_up_interruptible: wakes up the processes sleeping on a wait queue and sets their state to TASK_RUNNING.

The two calls appear in pairs and are used mainly for interrupt handling, process synchronization, and timing.

In real projects, however, a driver may wait once but get woken up twice or even more times. What happens then? And how exactly are these two calls implemented? Answering both questions calls for a look at the kernel sources.
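Before digging into the sources, here is a minimal sketch of the usage pattern in question. All names (dev_wq, dev_flag, my_read, my_isr) are illustrative, not taken from any real driver:

#include <linux/wait.h>
#include <linux/fs.h>
#include <linux/interrupt.h>

static DECLARE_WAIT_QUEUE_HEAD(dev_wq); /* the wait queue head */
static int dev_flag;                    /* the wait condition */

/* Reader side: sleep until the interrupt handler reports data. */
static ssize_t my_read(struct file *filp, char __user *buf,
                       size_t count, loff_t *ppos)
{
        /* Returns nonzero (-ERESTARTSYS) if a signal interrupted the wait. */
        if (wait_event_interruptible(dev_wq, dev_flag != 0))
                return -ERESTARTSYS;
        dev_flag = 0;                   /* consume the event */
        /* ... copy data to user space ... */
        return count;
}

/* Interrupt side: make the condition true, then wake the sleeper. */
static irqreturn_t my_isr(int irq, void *dev_id)
{
        dev_flag = 1;
        wake_up_interruptible(&dev_wq);
        return IRQ_HANDLED;
}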

1. Wait queue definitions

A wait queue is anchored by a wait queue head, a structure of type wait_queue_head_t defined in the kernel as follows:

struct __wait_queue_head {
        spinlock_t              lock;
        struct list_head        task_list;
};
typedef struct __wait_queue_head wait_queue_head_t;

struct list_head {
        struct list_head *next, *prev;
};

A driver that uses a wait queue normally starts by creating and initializing a wait queue head:

#define DECLARE_WAIT_QUEUE_HEAD(name) \
        wait_queue_head_t name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
This is a macro, and it delegates the actual initialization to __WAIT_QUEUE_HEAD_INITIALIZER:

#define __WAIT_QUEUE_HEAD_INITIALIZER(name) {                           \
        .lock           = __SPIN_LOCK_UNLOCKED(name.lock),              \
        .task_list      = { &(name).task_list, &(name).task_list } }
It does two things. (1) It initializes the spinlock:

#define __SPIN_LOCK_UNLOCKED(lockname) \
        (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)

#define __SPIN_LOCK_INITIALIZER(lockname) \
        { { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } }

#define __RAW_SPIN_LOCK_INITIALIZER(lockname)   \
        {                                       \
        .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
        SPIN_DEBUG_INIT(lockname)               \
        SPIN_DEP_MAP_INIT(lockname) }

# define SPIN_DEBUG_INIT(lockname)              \
        .magic = SPINLOCK_MAGIC,                \
        .owner_cpu = -1,                        \
        .owner = SPINLOCK_OWNER_INIT,

#define SPINLOCK_MAGIC          0xdead4ead
#define SPINLOCK_OWNER_INIT     ((void *)-1L)

# define SPIN_DEP_MAP_INIT(lockname)    .dep_map = { .name = #lockname }
That is the complete initialization path for a spinlock. It is not the focus of this article, so we will not dig into it further.

(2) It initializes the list head, which is trivial: the head's next and prev pointers are simply made to point back at the head itself.

A wait queue head can also be defined first and then initialized by hand:

wait_queue_head_t q;

#define init_waitqueue_head(q)                          \
        do {                                            \
                static struct lock_class_key __key;     \
                                                        \
                __init_waitqueue_head((q), #q, &__key); \
        } while (0)
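For a queue head embedded in a dynamically allocated object, only this function form works, since DECLARE_WAIT_QUEUE_HEAD defines a new variable. A brief sketch (struct and function names are illustrative):

struct my_dev {
        wait_queue_head_t wq;
        /* ... other per-device state ... */
};

static void my_dev_setup(struct my_dev *dev)
{
        /* runtime equivalent of DECLARE_WAIT_QUEUE_HEAD for dev->wq */
        init_waitqueue_head(&dev->wq);
}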

Stepping into __init_waitqueue_head:

void __init_waitqueue_head(wait_queue_head_t *q, const char *name,
                           struct lock_class_key *key)
{
        spin_lock_init(&q->lock);
        lockdep_set_class_and_name(&q->lock, key, name);
        INIT_LIST_HEAD(&q->task_list);
}

spin_lock_init and lockdep_set_class_and_name initialize the spinlock; the interesting part is INIT_LIST_HEAD (an inline function, despite the macro-style name):

static inline void INIT_LIST_HEAD(struct list_head *list)
{
        list->next = list;
        list->prev = list;
}
INIT_LIST_HEAD does at run time exactly what .task_list = { &(name).task_list, &(name).task_list } inside __WAIT_QUEUE_HEAD_INITIALIZER does at compile time.

Because wait queues are modified both by interrupt handlers and by ordinary kernel functions, they must be protected against concurrent access. That synchronization is provided by the lock spinlock in the wait queue head; the task_list field is the head of the list of waiting processes.

The wait queue itself is a doubly linked list whose elements have type wait_queue_t, defined as follows:

struct __wait_queue {
        unsigned int            flags;
        void                    *private;
        wait_queue_func_t       func;
        struct list_head        task_list;
};
typedef struct __wait_queue wait_queue_t;
Each element represents one sleeping process waiting for some event to occur. The address of its process descriptor is stored in the private field (called task in older kernels), and the task_list field holds the pointers that link the element into the list of processes waiting for the same event.
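As an aside, the kernel also provides macros for declaring such an element statically; these are quoted from memory of include/linux/wait.h in kernels of this era, so verify them against your own tree:

#define __WAITQUEUE_INITIALIZER(name, tsk) {                            \
        .private        = tsk,                                          \
        .func           = default_wake_function,                        \
        .task_list      = { NULL, NULL } }

#define DECLARE_WAITQUEUE(name, tsk)                                    \
        wait_queue_t name = __WAITQUEUE_INITIALIZER(name, tsk)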

As the initialization walk-through above shows, a freshly initialized wait queue head is just an empty list head: the doubly linked list does not yet contain any actual wait_queue_t elements.

2. Analysis of wait_event_interruptible

#define wait_event_interruptible(wq, condition)                         \
({                                                                      \
        int __ret = 0;                                                  \
        might_sleep();                                                  \
        if (!(condition))                                               \
                __ret = __wait_event_interruptible(wq, condition);      \
        __ret;                                                          \
})

#define __wait_event_interruptible(wq, condition)                       \
        ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0,          \
                      schedule())

#define ___wait_event(wq, condition, state, exclusive, ret, cmd)        \
({                                                                      \
        __label__ __out;                                                \
        wait_queue_t __wait;                                            \
        long __ret = ret;       /* explicit shadow */                   \
                                                                        \
        INIT_LIST_HEAD(&__wait.task_list);                              \
        if (exclusive)                                                  \
                __wait.flags = WQ_FLAG_EXCLUSIVE;                       \
        else                                                            \
                __wait.flags = 0;                                       \
                                                                        \
        for (;;) {                                                      \
                long __int = prepare_to_wait_event(&wq, &__wait, state);\
                                                                        \
                if (condition)                                          \
                        break;                                          \
                                                                        \
                if (___wait_is_interruptible(state) && __int) {         \
                        __ret = __int;                                  \
                        if (exclusive) {                                \
                                abort_exclusive_wait(&wq, &__wait,      \
                                                     state, NULL);      \
                                goto __out;                             \
                        }                                               \
                        break;                                          \
                }                                                       \
                                                                        \
                cmd;                                                    \
        }                                                               \
        finish_wait(&wq, &__wait);                                      \
__out:  __ret;                                                          \
})
So wait_event_interruptible is in fact a macro; calling it a "function" earlier was imprecise.

might_sleep() annotates that the current context is allowed to sleep. When the condition is not already true, control enters __wait_event_interruptible and from there the ___wait_event macro. The task state mask passed as its state argument is defined in the kernel as follows:

/*
 * Task state bitmask. NOTE! These bits are also
 * encoded in fs/proc/array.c: get_task_state().
 *
 * We have two separate sets of flags: task->state
 * is about runnability, while task->exit_state are
 * about the task exiting. Confusing, but this way
 * modifying one set can't modify the other one by
 * mistake.
 */
#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2
#define __TASK_STOPPED          4
#define __TASK_TRACED           8
Inside the ___wait_event macro, a wait queue element wait_queue_t __wait is declared and then initialized in the now-familiar way:

INIT_LIST_HEAD(&__wait.task_list);

static inline void INIT_LIST_HEAD(struct list_head *list)
{
        list->next = list;
        list->prev = list;
}
Because the exclusive argument is 0, __wait.flags is set to 0; execution then enters prepare_to_wait_event:
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
        unsigned long flags;

        if (signal_pending_state(state, current))
                return -ERESTARTSYS;

        wait->private = current;
        wait->func = autoremove_wake_function;

        spin_lock_irqsave(&q->lock, flags);
        if (list_empty(&wait->task_list)) {
                if (wait->flags & WQ_FLAG_EXCLUSIVE)
                        __add_wait_queue_tail(q, wait);
                else
                        __add_wait_queue(q, wait);
        }
        set_current_state(state);
        spin_unlock_irqrestore(&q->lock, flags);

        return 0;
}
It continues filling in the list element: wait->private = current; wait->func = autoremove_wake_function;

wait->func is the callback invoked when the wait condition is satisfied; its details are worth a close look:

int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
        int ret = default_wake_function(wait, mode, sync, key);

        if (ret)
                list_del_init(&wait->task_list);
        return ret;
}

int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,
                          void *key)
{
        return try_to_wake_up(curr->private, mode, wake_flags);
}

/**
 * try_to_wake_up - wake up a thread
 * @p: the thread to be awakened
 * @state: the mask of task states that can be woken
 * @wake_flags: wake modifier flags (WF_*)
 *
 * Put it on the run-queue if it's not already there. The "current"
 * thread is always on the run-queue (except when the actual
 * re-schedule is in progress), and as such you're allowed to do
 * the simpler "current->state = TASK_RUNNING" to mark yourself
 * runnable without the overhead of this.
 *
 * Return: %true if @p was woken up, %false if it was already running.
 * or @state didn't match @p's state.
 */
static int
try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
{
        unsigned long flags;
        int cpu, success = 0;

        /*
         * If we are going to wake up a thread waiting for CONDITION we
         * need to ensure that CONDITION=1 done by the caller can not be
         * reordered with p->state check below. This pairs with mb() in
         * set_current_state() the waiting thread does.
         */
        smp_mb__before_spinlock();
        raw_spin_lock_irqsave(&p->pi_lock, flags);
        if (!(p->state & state))
                goto out;

        trace_sched_waking(p);

        success = 1; /* we're going to change ->state */
        cpu = task_cpu(p);

        /*
         * Ensure we load p->on_rq _after_ p->state, otherwise it would
         * be possible to, falsely, observe p->on_rq == 0 and get stuck
         * in smp_cond_load_acquire() below.
         *
         * sched_ttwu_pending()                 try_to_wake_up()
         *   [S] p->on_rq = 1;                  [L] P->state
         *       UNLOCK rq->lock  -----.
         *                              \
         *                               +---   RMB
         * schedule()                   /
         *       LOCK rq->lock    -----'
         *       UNLOCK rq->lock
         *
         * [task p]
         *   [S] p->state = UNINTERRUPTIBLE     [L] p->on_rq
         *
         * Pairs with the UNLOCK+LOCK on rq->lock from the
         * last wakeup of our task and the schedule that got our task
         * current.
         */
        smp_rmb();
        if (p->on_rq && ttwu_remote(p, wake_flags))
                goto stat;

#ifdef CONFIG_SMP
        /*
         * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be
         * possible to, falsely, observe p->on_cpu == 0.
         *
         * One must be running (->on_cpu == 1) in order to remove oneself
         * from the runqueue.
         *
         *  [S] ->on_cpu = 1;   [L] ->on_rq
         *      UNLOCK rq->lock
         *                      RMB
         *      LOCK   rq->lock
         *  [S] ->on_rq = 0;    [L] ->on_cpu
         *
         * Pairs with the full barrier implied in the UNLOCK+LOCK on rq->lock
         * from the consecutive calls to schedule(); the first switching to our
         * task, the second putting it to sleep.
         */
        smp_rmb();

        /*
         * If the owning (remote) cpu is still in the middle of schedule() with
         * this task as prev, wait until its done referencing the task.
         */
        while (p->on_cpu)
                cpu_relax();
        /*
         * Combined with the control dependency above, we have an effective
         * smp_load_acquire() without the need for full barriers.
         *
         * Pairs with the smp_store_release() in finish_lock_switch().
         *
         * This ensures that tasks getting woken will be fully ordered against
         * their previous state and preserve Program Order.
         */
        smp_rmb();

        p->sched_contributes_to_load = !!task_contributes_to_load(p);
        p->state = TASK_WAKING;

        if (p->sched_class->task_waking)
                p->sched_class->task_waking(p);

        cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
        if (task_cpu(p) != cpu) {
                wake_flags |= WF_MIGRATED;
                set_task_cpu(p, cpu);
        }
#endif /* CONFIG_SMP */

        ttwu_queue(p, cpu);
stat:
        ttwu_stat(p, cpu, wake_flags);
out:
        raw_spin_unlock_irqrestore(&p->pi_lock, flags);

        return success;
}
This wake-up path is admittedly hard to digest in full. For our purposes it is enough to know that try_to_wake_up wakes the thread recorded in the wait element (wait->private), and that autoremove_wake_function then calls list_del_init(&wait->task_list) to unlink the element from the wait queue:

static inline void list_del_init(struct list_head *entry)
{
        __list_del_entry(entry);
        INIT_LIST_HEAD(entry);
}

static inline void __list_del_entry(struct list_head *entry)
{
        __list_del(entry->prev, entry->next);
}

static inline void __list_del(struct list_head *prev, struct list_head *next)
{
        next->prev = prev;
        WRITE_ONCE(prev->next, next);
}

#define WRITE_ONCE(x, val) \
({                                                      \
        union { typeof(x) __val; char __c[1]; } __u =   \
                { .__val = (__force typeof(x)) (val) }; \
        __write_once_size(&(x), __u.__c, sizeof(x));    \
        __u.__val;                                      \
})
Back in prepare_to_wait_event, we reach if (list_empty(&wait->task_list)). The freshly initialized element still points at itself, which is exactly what list_empty reports as true (the element is not yet on any queue), so the body of the if runs; and since wait->flags == 0, the else branch calls __add_wait_queue:

static inline void __add_wait_queue(wait_queue_head_t *head, wait_queue_t *new)
{
        list_add(&new->task_list, &head->task_list);
}

static inline void list_add(struct list_head *new, struct list_head *head)
{
        __list_add(new, head, head->next);
}

static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next)
{
        next->prev = new;
        new->next = next;
        new->prev = prev;
        prev->next = new;
}
This links the wait element in right after the head of queue q (an exclusive waiter would instead go to the tail via __add_wait_queue_tail, so that non-exclusive waiters are woken first). The wait queue now holds one waiting element, which completes the analysis of wait_event_interruptible.
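To make the expansion concrete, here is a hand-rolled sleep loop built from the same primitives (DEFINE_WAIT, prepare_to_wait, finish_wait), reusing the hypothetical dev_wq/dev_flag names from the sketch in the introduction. It is behaviorally close to, though not byte-for-byte identical with, what wait_event_interruptible generates:

/* Roughly equivalent to wait_event_interruptible(dev_wq, dev_flag != 0). */
static int my_wait(void)
{
        DEFINE_WAIT(wait);      /* wait_queue_t on the stack, func = autoremove_wake_function */
        int ret = 0;

        for (;;) {
                /* queue the element (if not already queued) and go TASK_INTERRUPTIBLE */
                prepare_to_wait(&dev_wq, &wait, TASK_INTERRUPTIBLE);
                if (dev_flag != 0)              /* condition is checked after queueing */
                        break;
                if (signal_pending(current)) {  /* a pending signal aborts the wait */
                        ret = -ERESTARTSYS;
                        break;
                }
                schedule();                     /* actually sleep */
        }
        finish_wait(&dev_wq, &wait);            /* back to TASK_RUNNING, dequeue if needed */
        return ret;
}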

3. Analysis of wake_up_interruptible

This one, too, begins as a macro, which simply expands to a call to the function __wake_up:

#define wake_up_interruptible(x)        __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)

void __wake_up(wait_queue_head_t *q, unsigned int mode,
               int nr_exclusive, void *key)
{
        unsigned long flags;

        spin_lock_irqsave(&q->lock, flags);
        __wake_up_common(q, mode, nr_exclusive, 0, key);
        spin_unlock_irqrestore(&q->lock, flags);
}
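wake_up_interruptible has several siblings that differ only in the state mask and the nr_exclusive count they pass to __wake_up. As listed, to the best of my recollection, in include/linux/wait.h of this era (verify against your own tree):

#define wake_up(x)                      __wake_up(x, TASK_NORMAL, 1, NULL)
#define wake_up_nr(x, nr)               __wake_up(x, TASK_NORMAL, nr, NULL)
#define wake_up_all(x)                  __wake_up(x, TASK_NORMAL, 0, NULL)
#define wake_up_interruptible(x)        __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
#define wake_up_interruptible_nr(x, nr) __wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
#define wake_up_interruptible_all(x)    __wake_up(x, TASK_INTERRUPTIBLE, 0, NULL)

TASK_NORMAL is TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE, which is why plain wake_up also wakes uninterruptible sleepers while wake_up_interruptible does not.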
The core is __wake_up_common:

/*
 * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
 * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
 * number) then we wake all the non-exclusive tasks and one exclusive task.
 *
 * There are circumstances in which we can try to wake a task which has already
 * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
 * zero in this (rare) case, and we handle it by continuing to scan the queue.
 */
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                             int nr_exclusive, int wake_flags, void *key)
{
        wait_queue_t *curr, *next;

        list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
                unsigned flags = curr->flags;

                if (curr->func(curr, mode, wake_flags, key) &&
                    (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
                        break;
        }
}
list_for_each_entry_safe is at bottom just a for loop that walks the list starting from the wait queue head; its exit condition, &pos->member != (head), is what terminates the walk, and it fails immediately when the list is empty.
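Its classic definition (this is the older spelling; newer trees express the same loop via list_first_entry and list_next_entry) makes that plain:

#define list_for_each_entry_safe(pos, n, head, member)                    \
        for (pos = list_entry((head)->next, typeof(*pos), member),        \
                n = list_entry(pos->member.next, typeof(*pos), member);   \
             &pos->member != (head);                                      \
             pos = n, n = list_entry(n->member.next, typeof(*n), member))

The "_safe" part is the cached n: the next entry is loaded before the loop body runs, so autoremove_wake_function may unlink curr without breaking the iteration. The helper macros it relies on are: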

#define list_first_entry(ptr, type, member) \
        list_entry((ptr)->next, type, member)

#define list_entry(ptr, type, member) \
        container_of(ptr, type, member)

#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type,member) );})

Because wait_event_interruptible queued one wait element, the loop body executes at least once, calling curr->func(curr, mode, wake_flags, key). That func is the autoremove_wake_function installed earlier by prepare_to_wait_event, which wakes the process recorded in the element and then unlinks the element from the queue.

That completes the analysis of this pair of wait queue operations, and the answer to the opening question falls out directly: one wait creates and queues exactly one wait_queue_t element on the wait queue head's list. The first wake_up wakes the process and autoremove_wake_function removes the element, so a second (or any later) wake_up finds an empty list and the equivalent "for loop" in list_for_each_entry_safe exits immediately. The extra wake-ups are effectively no-ops, with no effect on the driver or the thread.
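A short sequence, again using the hypothetical dev_wq/dev_flag names from the introduction, makes this concrete:

/* Thread A: waits once. */
static int waiter(void)
{
        /* queues exactly one wait_queue_t on dev_wq and sleeps */
        return wait_event_interruptible(dev_wq, dev_flag != 0);
}

/* Thread B (or an ISR): wakes twice. */
static void waker(void)
{
        dev_flag = 1;
        wake_up_interruptible(&dev_wq); /* wakes A; autoremove_wake_function unlinks its element */
        wake_up_interruptible(&dev_wq); /* queue now empty: loop body never runs, a no-op */
}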
