Linux Kernel: The poll Mechanism

Source: Internet   Published by: 淘宝快递助手   Editor: 程序博客网   Time: 2024/05/22 05:16

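When writing a driver, we can use the poll mechanism to let an application find out whether a device file can be accessed without blocking.

Our earlier CC1100 driver and reversing-radar driver both called wait_event_interruptible_timeout in their read functions. That function does not make read non-blocking. It makes read sleep with a time limit. Each read first checks whether new data is available. If there is none, the process sleeps until new data arrives or the timeout expires, whichever comes first.

This approach works reasonably well when the application has only one device open. For the more general case, such as watching several devices at once, the Linux kernel provides a dedicated mechanism: poll. Under the hood, poll relies on the same principle as the approach above: sleeping on a wait queue with a timeout.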

What the poll mechanism does

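The poll mechanism lets user space ask, through the select() and poll() system calls, whether a device can be accessed without blocking. In other words, it asks whether a read or write would return immediately. A query like this must not get stuck on any single device, and it must not busy-loop either, so it relies on wait queues.

When the kernel calls the driver's poll method, the method registers the calling process on a wait queue. This wait queue head must be the same one the driver wakes when new data arrives, which is normally the queue its read or write functions sleep on. The driver's poll method does not sleep itself. The sleeping is done by the generic poll code, bounded by the timeout the caller passed in: a negative timeout waits indefinitely, and 0 just checks and returns.

The sleep can end in two ways:
- The driver wakes the queue because new data arrived. This works precisely because the queue head is shared with read and write. Poll then reports that the device can be accessed without blocking.
- The timeout expires first. Poll then returns 0, meaning no requested device is ready and an access would block.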
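The same question can be asked from user space. The following is a minimal sketch, assuming a Linux or other POSIX system. An ordinary pipe stands in for a device file, and query_revents is a hypothetical helper name. With a timeout of 0, poll() only reports the current state and never sleeps.

```c
#include <poll.h>
#include <unistd.h>

/* Return the revents reported for fd after a single check that never
 * sleeps (timeout 0). Returns -1 if poll() itself fails. */
static int query_revents(int fd, short events)
{
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
    int n = poll(&pfd, 1, 0);   /* 0 ms: report the current state only */
    if (n < 0)
        return -1;
    return pfd.revents;         /* 0 when none of the requested events is ready */
}
```

On an empty pipe, the read end reports nothing for POLLIN while the write end reports POLLOUT. After one byte is written, the read end reports POLLIN.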

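When an application calls poll(), the poll system call enters do_sys_poll, which in turn calls do_poll. select() takes a parallel path through do_select that works the same way. do_poll contains an infinite loop. Inside it, do_pollfd calls the driver's poll method for every entry in fds.

The driver's poll method has two jobs:
- Call poll_wait to put the process on a wait queue. This is mandatory: a process that wants to sleep must be on some wait queue, or nobody could find it to wake it up.
- Check whether the corresponding fd has data ready, and return an event mask. The mask is nonzero (for example POLLIN | POLLRDNORM) if the fd is readable, and 0 otherwise.

For every nonzero result, do_poll increments count. It then decides whether to stop. Older kernels tested a condition of the form if (count || !timeout || signal_pending(current)). In the version listed below, a pending signal sets count to -EINTR and the test is if (count || timed_out). If the test succeeds, the loop exits. Otherwise the process sleeps until the timeout expires. Older kernels did this by calling schedule_timeout with a number of jiffies; the version below uses poll_schedule_timeout.

If nothing wakes the process during that time, the next pass through the loop sees the timeout and exits. If something does wake it, the loop rescans the fds and can return at once. For example, the driver's interrupt handler can wake it as soon as data becomes readable.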
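Both ways the sleep can end are easy to observe from user space. This is a minimal sketch, assuming Linux/POSIX. A pipe and a child process stand in for a device and its interrupt handler; the helper names and the 100 ms delay are illustrative. With no writer, poll() sleeps for the whole timeout and returns 0. With a writer, it returns 1 as soon as the byte arrives, long before the timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds on a monotonic clock. */
static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Wait up to timeout_ms for fd to become readable and report how long the
 * call took. Returns poll()'s result: 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms, long *elapsed_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    long start = now_ms();
    int n = poll(&pfd, 1, timeout_ms);
    *elapsed_ms = now_ms() - start;
    return n;
}

/* Fork a child that writes one byte to wfd after delay_ms. It plays the part
 * of "new data arrives and the driver wakes its wait queue". */
static pid_t write_later(int wfd, int delay_ms)
{
    pid_t pid = fork();
    if (pid == 0) {
        struct timespec d = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        nanosleep(&d, NULL);
        _exit(write(wfd, "x", 1) == 1 ? 0 : 1);
    }
    return pid;   /* -1 if fork failed */
}
```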

The kernel first calls do_sys_poll. Its main job is to copy the pollfd array in from user space and call do_poll. Its source is as follows:

int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds,
                struct timespec *end_time)
{
    struct poll_wqueues table;
    int err = -EFAULT, fdcount, len, size;
    /* Allocate small arguments on the stack to save memory and be
       faster - use long to make sure the buffer is aligned properly
       on 64 bit archs to avoid unaligned access */
    long stack_pps[POLL_STACK_ALLOC/sizeof(long)];
    struct poll_list *const head = (struct poll_list *)stack_pps;
    struct poll_list *walk = head;
    unsigned long todo = nfds;

    if (nfds > rlimit(RLIMIT_NOFILE))
        return -EINVAL;

    len = min_t(unsigned int, nfds, N_STACK_PPS);
    /* Copy the user's pollfd array into a chain of poll_list chunks. The
     * first chunk lives on the stack; larger requests continue into
     * kmalloc'd chunks. For small nfds the loop ends after one pass. */
    for (;;) {
        walk->next = NULL;
        walk->len = len;
        if (!len)
            break;
        /* Key step: copy the pollfd entries the application passed in
         * (fds obtained from open(), plus the requested events) into
         * walk->entries. From here on, walk carries the fd information. */
        if (copy_from_user(walk->entries, ufds + nfds-todo,
                           sizeof(struct pollfd) * walk->len))
            goto out_fds;
        todo -= walk->len;
        if (!todo)
            break;
        len = min(todo, POLLFD_PER_PAGE);
        size = sizeof(struct poll_list) + sizeof(struct pollfd) * len;
        walk = walk->next = kmalloc(size, GFP_KERNEL);
        if (!walk) {
            err = -ENOMEM;
            goto out_fds;
        }
    }

    /* Initialise the poll_wqueues; among other things this installs
     * __pollwait as the poll_table's queueing callback. */
    poll_initwait(&table);
    /* Key call: do_poll returns the number of ready fds. table is a
     * struct poll_wqueues, which holds the poll_table passed to drivers,
     * the storage for poll_table_entry items and the polling task. Each
     * wait queue entry (wait_queue_t) lives inside a poll_table_entry, and
     * __pollwait gets from the poll_table address back to the poll_wqueues.
     * head is the first poll_list chunk, the one on the stack. */
    fdcount = do_poll(nfds, head, &table, end_time);
    poll_freewait(&table);

    /* Copy each revents back into the user's array. */
    for (walk = head; walk; walk = walk->next) {
        struct pollfd *fds = walk->entries;
        int j;

        for (j = 0; j < walk->len; j++, ufds++)
            if (__put_user(fds[j].revents, &ufds->revents))
                goto out_fds;
    }

    err = fdcount;
out_fds:
    /* Free the kmalloc'd chunks; the first chunk is on the stack. */
    walk = head->next;
    while (walk) {
        struct poll_list *pos = walk;
        walk = walk->next;
        kfree(pos);
    }

    return err;
}
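The copy loop splits the caller's array into chunks. The first chunk lives on the stack and holds up to N_STACK_PPS entries. Further chunks hold up to POLLFD_PER_PAGE entries each and are allocated with kmalloc. Below is a user-space sketch of that arithmetic only. The constants 30 and 510 are illustrative placeholders; in the kernel they are derived from POLL_STACK_ALLOC, PAGE_SIZE and the structure sizes, so they depend on the build.

```c
#include <stddef.h>

/* Illustrative values only, not taken from a specific kernel build. */
#define SKETCH_N_STACK_PPS     30
#define SKETCH_POLLFD_PER_PAGE 510

/* Mirror the copy loop of do_sys_poll: store each poll_list chunk length in
 * lens[] and return the number of chunks. The stack chunk always exists,
 * so the result is at least 1 (with lens[0] == 0 when nfds == 0). */
static size_t split_into_chunks(unsigned long nfds, unsigned long lens[],
                                size_t max_chunks)
{
    unsigned long todo = nfds;
    unsigned long len = nfds < SKETCH_N_STACK_PPS ? nfds : SKETCH_N_STACK_PPS;
    size_t n = 0;

    for (;;) {
        if (n == max_chunks)
            return n;           /* caller's lens[] is full */
        lens[n++] = len;        /* walk->len = len */
        if (!len)
            break;              /* nfds == 0: just an empty stack chunk */
        todo -= len;            /* entries copied into this chunk */
        if (!todo)
            break;
        len = todo < SKETCH_POLLFD_PER_PAGE ? todo : SKETCH_POLLFD_PER_PAGE;
    }
    return n;
}
```

With these placeholder sizes, 545 descriptors end up in three chunks of 30, 510 and 5 entries.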

Next, the source of do_poll:

static int do_poll(unsigned int nfds, struct poll_list *list,
                   struct poll_wqueues *wait, struct timespec *end_time)
{
    poll_table* pt = &wait->pt;
    ktime_t expire, *to = NULL;
    int timed_out = 0, count = 0;
    unsigned long slack = 0;

    /* Optimise the no-wait case */
    if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
        pt = NULL;
        timed_out = 1;
    }

    if (end_time && !timed_out)
        slack = select_estimate_accuracy(end_time);

    for (;;) {
        struct poll_list *walk;

        /* walk starts at list. Each poll_list node holds walk->len pollfd
         * entries, so these two loops visit every fd the caller passed in. */
        for (walk = list; walk != NULL; walk = walk->next) {
            struct pollfd * pfd, * pfd_end;

            pfd = walk->entries;
            pfd_end = pfd + walk->len;
            for (; pfd != pfd_end; pfd++) {
                /*
                 * Fish for events. If we found one, record it
                 * and kill the poll_table, so we don't
                 * needlessly register any other waiters after
                 * this. They'll get immediately deregistered
                 * when we break out and return.
                 */
                /* do_pollfd calls the driver's file_operations->poll method. */
                if (do_pollfd(pfd, pt)) {
                    count++;
                    pt = NULL;
                }
            }
        }
        /*
         * All waiters have already been registered, so don't provide
         * a poll_table to them on the next loop iteration.
         */
        pt = NULL;
        if (!count) {
            count = wait->error;
            if (signal_pending(current))
                count = -EINTR;
        }
        if (count || timed_out)
            break;

        /*
         * If this is the first loop and we have a timeout
         * given, then we convert to ktime_t and set the to
         * pointer to the expiry value.
         */
        if (end_time && !to) {
            expire = timespec_to_ktime(*end_time);
            to = &expire;
        }

        /* Key call: put the current process to sleep. During the first pass,
         * poll_wait already added it to the wait queues of the devices it
         * scanned, so a driver wake-up ends the sleep early. A return value
         * of 0 means the timeout expired. */
        if (!poll_schedule_timeout(wait, TASK_INTERRUPTIBLE, to, slack))
            timed_out = 1;
    }
    return count;
}
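The control flow of do_poll, including when waiters get registered, can be modelled in ordinary C. This is a simplified user-space sketch, not kernel code:
- sim_file and sim_do_poll are hypothetical names.
- Readiness is scripted per pass.
- Each sleep is assumed to end either with a wake-up (leading to the next pass) or, once max_sleeps sleeps are used up, with the timeout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one file: it becomes readable on pass
 * ready_on_pass (counting from 0); a negative value means never. */
struct sim_file {
    int ready_on_pass;
    int registrations;   /* how many wait entries poll_wait added for it */
};

/* Sketch of do_poll's control flow. max_sleeps (>= 0) stands in for the
 * timeout: 0 is the no-wait case, otherwise the max_sleeps-th sleep ends by
 * timing out. Returns the number of ready files. */
static int sim_do_poll(struct sim_file *files, size_t n, int max_sleeps)
{
    bool have_table = true;               /* pt != NULL: register waiters */
    bool timed_out = (max_sleeps == 0);   /* "Optimise the no-wait case" */
    int count = 0, pass = 0, sleeps = 0;

    if (timed_out)
        have_table = false;

    for (;;) {
        for (size_t i = 0; i < n; i++) {
            if (have_table)
                files[i].registrations++;           /* poll_wait -> __pollwait */
            bool ready = files[i].ready_on_pass >= 0 &&
                         pass >= files[i].ready_on_pass;
            if (ready) {
                count++;
                have_table = false;                 /* "kill the poll_table" */
            }
        }
        have_table = false;   /* all waiters registered after the first pass */
        if (count || timed_out)
            break;
        /* poll_schedule_timeout(): the sleep ends with a wake-up or a timeout. */
        if (++sleeps >= max_sleeps)
            timed_out = true;
        pass++;               /* rescan after waking, even on timeout */
    }
    return count;
}
```

The model reproduces three behaviours of the kernel code. Waiters are registered only on the first pass. The table is dropped as soon as one fd is found ready. A timeout still gets one final scan before the loop exits.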

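Note that poll_list is a singly linked list. Each node holds a chunk of the caller's pollfd array, one entry per fd. On every pass, do_poll walks the whole list and calls the poll method of every fd's driver. This repeated scan is why the mechanism is called polling.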
Next comes another key function, do_pollfd. It calls the driver's own poll method through the file_operations->poll function pointer:

static inline unsigned int do_pollfd(struct pollfd *pollfd, poll_table *pwait)
{
    unsigned int mask;
    int fd;

    mask = 0;
    fd = pollfd->fd;
    if (fd >= 0) {
        int fput_needed;
        struct file * file;

        file = fget_light(fd, &fput_needed);
        mask = POLLNVAL;                  /* fd is not an open file */
        if (file != NULL) {
            mask = DEFAULT_POLLMASK;      /* used when the driver has no poll method */
            if (file->f_op && file->f_op->poll) {
                if (pwait)
                    pwait->key = pollfd->events | POLLERR | POLLHUP;
                /* Key line: call the driver's poll method directly. */
                mask = file->f_op->poll(file, pwait);
            }
            /* Mask out unneeded events. */
            mask &= pollfd->events | POLLERR | POLLHUP;
            fput_light(file, fput_needed);
        }
    }
    pollfd->revents = mask;

    return mask;
}
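The mask handling in do_pollfd can be checked in isolation. Below is a user-space sketch using the real POLL* constants from <poll.h>. file_ok and driver_mask are hypothetical inputs: they stand in for the fget_light lookup and for the driver's return value. The DEFAULT_POLLMASK case (an open file whose driver has no poll method) is left out.

```c
#include <poll.h>
#include <stdbool.h>

/* Sketch of do_pollfd's mask logic for one pollfd. file_ok says whether the
 * fd maps to an open file; driver_mask is what f_op->poll would return. */
static short sketch_do_pollfd(struct pollfd *pfd, bool file_ok, short driver_mask)
{
    short mask = 0;                       /* negative fd: report nothing */

    if (pfd->fd >= 0) {
        mask = POLLNVAL;                  /* fd is not an open file */
        if (file_ok) {
            mask = driver_mask;
            /* Keep only the requested events, plus POLLERR and POLLHUP,
             * which are always reported. */
            mask &= pfd->events | POLLERR | POLLHUP;
        }
    }
    pfd->revents = mask;
    return mask;
}
```

For example, a driver reporting POLLIN | POLLOUT to a caller that asked only for POLLIN yields POLLIN. POLLHUP gets through even though it was not requested.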

A driver's own poll method usually follows this pattern:

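#include <linux/poll.h>
#include <linux/wait.h>

/* Driver-defined globals (assumed; the interrupt handler sets flag and
 * wakes button_waitq when a key event arrives). */
static DECLARE_WAIT_QUEUE_HEAD(button_waitq);
static volatile int flag;

static unsigned int button_poll(struct file *file, struct poll_table_struct *wait)
{
    unsigned int mask = 0;                /* events to report */

    /* Add the current process to the wait queue headed by button_waitq
     * (usually defined by the driver itself). This registers; it does not sleep. */
    poll_wait(file, &button_waitq, wait);

    /* If there is new data to read, report it as readable. */
    if (flag)
        mask |= POLLIN | POLLRDNORM;

    return mask;
}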
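On the application side, a driver like button_poll is used through an ordinary poll() followed by read(). Here is a sketch, assuming a hypothetical device node such as /dev/buttons that returns one byte per key event. Any pollable fd behaves the same way, so the helper can be exercised with a pipe in place of the device. wait_for_key is a hypothetical name.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for a key event on fd, then read one byte.
 * Returns 1 and stores the byte in *key on success, 0 on timeout,
 * -1 on error or when only POLLERR/POLLHUP was reported. */
static int wait_for_key(int fd, int timeout_ms, unsigned char *key)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n <= 0)
        return n;                          /* 0: timed out, -1: error */
    if (!(pfd.revents & POLLIN))
        return -1;
    /* Data is reported ready, so this read should not block. */
    return read(fd, key, 1) == 1 ? 1 : -1;
}
```

Polling first, rather than calling read() directly, keeps the application responsive. It can bound the wait, or put several fds in the same poll() call.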

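The button_poll pattern relies on poll_wait, one of the most important functions of the poll mechanism. poll_wait does not sleep and does not change the process state. It is a small inline wrapper: when given a non-NULL poll_table, it calls the table's queueing callback. For poll() and select(), that callback is __pollwait, which adds an entry for the current process to the given wait queue. The process is set to TASK_INTERRUPTIBLE only later, in poll_schedule_timeout, just before it sleeps.
Here is __pollwait, which does the actual work: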

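static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
                       poll_table *p)
{
    /* p is embedded in a struct poll_wqueues; recover the container. */
    struct poll_wqueues *pwq = container_of(p, struct poll_wqueues, pt);
    /* Take a fresh poll_table_entry. */
    struct poll_table_entry *entry = poll_get_entry(pwq);
    if (!entry)
        return;
    get_file(filp);                     /* hold the file until poll_freewait */
    entry->filp = filp;
    entry->wait_address = wait_address;
    entry->key = p->key;                /* events this poller is waiting for */
    /* A wake-up on this entry will run pollwake(). */
    init_waitqueue_func_entry(&entry->wait, pollwake);
    entry->wait.private = pwq;
    /* Link the entry onto the driver's wait queue. */
    add_wait_queue(wait_address, &entry->wait);
}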
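How poll_wait, __pollwait and a driver's wake-up fit together can be modelled in user space. In kernels of this era, poll_wait is a small inline that calls the poll_table's queueing callback only when the table and the wait queue are non-NULL. The callback field is named qproc or _qproc depending on the version.

The sketch below uses simplified stand-in structures, not the kernel's real layouts, and hypothetical sketch_* names. container_of recovers the poll_wqueues from the poll_table pointer. Waking the queue runs each entry's callback, which here only sets triggered.

```c
#include <stddef.h>

/* Simplified user-space stand-ins; NOT the real kernel layouts. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define SKETCH_ENTRIES 4

struct wait_entry;
typedef int (*wake_func)(struct wait_entry *e);

struct wait_entry {
    wake_func func;              /* set by init_waitqueue_func_entry() */
    void *private;               /* points back at the poll_wqueues */
    struct wait_entry *next;
};

struct wait_queue_head { struct wait_entry *first; };

struct poll_table;
typedef void (*queue_proc)(struct wait_queue_head *wq, struct poll_table *p);
struct poll_table { queue_proc qproc; };

struct poll_wqueues {
    struct poll_table pt;        /* the pointer handed to the driver */
    int triggered;               /* set when a wait queue wakes us */
    int used;
    struct wait_entry entries[SKETCH_ENTRIES];
};

/* Model of poll_wait(): forward to the table's callback, and do nothing
 * when the table is NULL (as on do_poll's later passes). */
static void sketch_poll_wait(struct wait_queue_head *wq, struct poll_table *p)
{
    if (p && p->qproc && wq)
        p->qproc(wq, p);
}

/* Model of pollwake(): the real one also checks the event key and wakes
 * the sleeping task; here it only records the wake-up. */
static int sketch_pollwake(struct wait_entry *e)
{
    struct poll_wqueues *pwq = e->private;
    pwq->triggered = 1;
    return 1;
}

/* Model of __pollwait(): recover the poll_wqueues with container_of,
 * take a fresh entry and link it onto the device's queue. */
static void sketch_pollwait(struct wait_queue_head *wq, struct poll_table *p)
{
    struct poll_wqueues *pwq = container_of(p, struct poll_wqueues, pt);
    if (pwq->used == SKETCH_ENTRIES)
        return;                                /* no free entry */
    struct wait_entry *e = &pwq->entries[pwq->used++];
    e->func = sketch_pollwake;
    e->private = pwq;
    e->next = wq->first;                       /* add_wait_queue() */
    wq->first = e;
}

/* Model of wake_up(): run each entry's callback, as the driver's
 * wake_up_interruptible() call would from its interrupt handler. */
static void sketch_wake_up(struct wait_queue_head *wq)
{
    for (struct wait_entry *e = wq->first; e; e = e->next)
        e->func(e);
}
```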

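Note the call to poll_get_entry, which returns the address of a free poll_table_entry. It first takes entries from a small array inside the poll_wqueues (inline_entries). Once those are used up, it takes them from pages allocated on demand. The line to focus on is the last one: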

static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
{
    struct poll_table_page *table = p->table;

    if (p->inline_index < N_INLINE_POLL_ENTRIES)
        return p->inline_entries + p->inline_index++;

    if (!table || POLL_TABLE_FULL(table)) {
        struct poll_table_page *new_table;

        new_table = (struct poll_table_page *) __get_free_page(GFP_KERNEL);
        if (!new_table) {
            p->error = -ENOMEM;
            return NULL;
        }
        new_table->entry = new_table->entries;
        new_table->next = table;
        p->table = new_table;
        table = new_table;
    }

    /* Key line: entry++ guarantees that every poll_wait call gets a fresh
     * poll_table_entry, and with it a new wait queue entry (wait_queue_t). */
    return table->entry++;
}
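The allocation pattern of poll_get_entry can be reproduced in user space: inline slots first, then slots from pages that are chained together. This sketch uses hypothetical names and deliberately tiny sizes (3 inline slots, 4 per page) so the page switch is easy to see. malloc stands in for __get_free_page, and the "page full" test is simplified compared with POLL_TABLE_FULL.

```c
#include <stdlib.h>

/* Illustrative sizes; the kernel derives its own from stack and page sizes. */
#define SKETCH_INLINE_ENTRIES   3
#define SKETCH_ENTRIES_PER_PAGE 4

struct entry { int id; };

struct table_page {
    struct table_page *next;                    /* older page */
    struct entry *entry;                        /* next free slot */
    struct entry entries[SKETCH_ENTRIES_PER_PAGE];
};

struct wqueues {
    int inline_index;
    struct entry inline_entries[SKETCH_INLINE_ENTRIES];
    struct table_page *table;                   /* newest page first */
    int pages;                                  /* pages allocated so far */
};

/* Sketch of poll_get_entry(): hand out inline slots first, then slots from
 * the newest page, allocating a new page when the current one is full. */
static struct entry *sketch_get_entry(struct wqueues *p)
{
    struct table_page *table = p->table;

    if (p->inline_index < SKETCH_INLINE_ENTRIES)
        return p->inline_entries + p->inline_index++;

    if (!table || table->entry == table->entries + SKETCH_ENTRIES_PER_PAGE) {
        struct table_page *new_table = malloc(sizeof *new_table);
        if (!new_table)
            return NULL;
        new_table->entry = new_table->entries;
        new_table->next = table;                /* chain the older page */
        p->table = new_table;
        p->pages++;
        table = new_table;
    }
    return table->entry++;                      /* a fresh slot on every call */
}

/* Free all pages, in the spirit of poll_freewait(). */
static void sketch_free_pages(struct wqueues *p)
{
    while (p->table) {
        struct table_page *next = p->table->next;
        free(p->table);
        p->table = next;
    }
}
```

With these sizes, eight calls return eight distinct slots: three inline, four from the first page and one from a second page.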

1 0