LT mode and ET mode in epoll

Source: Internet | Editor: 程序博客网 | Date: 2024/05/01 23:39

Man page: http://linux.die.net/man/4/epoll

The epoll event distribution interface can operate in two modes: ET (edge-triggered) and LT (level-triggered).

The difference between the two modes is explained using the following scenario:

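1. The file descriptor used to read from a pipe (RFD) has been added to the epoll descriptor.

2. 2 KB of data is written to the other end of the pipe.

3. epoll_wait() is called, and it returns RFD, indicating that it is ready for reading.

4. We read 1 KB of data from RFD.

5. epoll_wait() is called again...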

Edge Triggered mode:

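If RFD was added to the epoll descriptor with the EPOLLET flag in step 1, the epoll_wait() call in step 5 will probably hang, even though the remaining data is still in the file's input buffer and the sender may be waiting for a response to the data it has already sent. The reason is that edge-triggered mode reports events only when something happens on the monitored file. So in step 5 the caller may end up waiting for data that is already sitting in the input buffer. In the example above, the write in step 2 generates an event on RFD, and that event is consumed in step 3. Because the read in step 4 does not drain the whole buffer, the epoll_wait() call in step 5 may block indefinitely. When epoll is used in ET mode, the file descriptors should be non-blocking, so that a blocking read or write on one descriptor cannot starve a task that is handling many descriptors. The recommended way to use epoll in ET mode is as follows; possible pitfalls to avoid are described afterwards.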

i. Use non-blocking file descriptors.

ii. Wait for an event only after read() or write() returns EAGAIN.

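By contrast, when used in LT mode, epoll is essentially a faster poll(2) and can be used wherever poll(2) is used, since the two share the same semantics. Because even ET-mode epoll can generate multiple events when data arrives in multiple chunks, the caller can set the EPOLLONESHOT flag, which tells epoll to disable the associated file descriptor after an event for it has been returned by epoll_wait(2). When EPOLLONESHOT is set, it is the caller's responsibility to re-arm the file descriptor by calling epoll_ctl(2) with EPOLL_CTL_MOD.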

The original English text of the man page is worth reading:

If the RFD file descriptor has been added to the epoll interface using the EPOLLET flag, the call to epoll_wait(2) done in step 5 will probably hang because of the available data still present in the file input buffers and the remote peer might be expecting a response based on the data it already sent. The reason for this is that Edge Triggered event distribution delivers events only when events happens on the monitored file. So, in step 5 the caller might end up waiting for some data that is already present inside the input buffer. In the above example, an event on RFD will be generated because of the write done in 2, and the event is consumed in 3. Since the read operation done in 4 does not consume the whole buffer data, the call to epoll_wait(2) done in step 5 might lock indefinitely. The epoll interface, when used with the EPOLLET flag (Edge Triggered) should use non-blocking file descriptors to avoid having a blocking read or write starve the task that is handling multiple file descriptors. The suggested way to use epoll as an Edge Triggered (EPOLLET) interface is below, and possible pitfalls to avoid follow.

i. with non-blocking file descriptors

ii. by going to wait for an event only after read(2) or write(2) return EAGAIN

On the contrary, when used as a Level Triggered interface, epoll is by all means a faster poll(2), and can be used wherever the latter is used since it shares the same semantics. Since even with the Edge Triggered epoll multiple events can be generated up on receival of multiple chunks of data, the caller has the option to specify the EPOLLONESHOT flag, to tell epoll to disable the associated file descriptor after the receival of an event with epoll_wait(2). When the EPOLLONESHOT flag is specified, it is caller responsibility to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.

The following two paragraphs, found online, explain epoll in more detail:

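LT (level-triggered) is the default mode and supports both blocking and non-blocking sockets. In this mode, the kernel tells you whether a file descriptor is ready, and you can then perform I/O on the ready fd. If you do nothing, the kernel keeps notifying you, so programming errors are less likely in this mode. The traditional select/poll are representative of this model.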

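ET (edge-triggered) is the high-speed mode and is meant to be used only with non-blocking sockets. In this mode, the kernel tells you through epoll when a descriptor goes from not ready to ready. It then assumes you know the descriptor is ready and sends no further readiness notifications for it until you do something that makes it not ready again (for example, a send, receive, or accept call, or sending or receiving less than a certain amount of data, ends in an EWOULDBLOCK error). Note, however, that if you never perform I/O on the fd (so it never becomes not ready again), the kernel will not send further notifications (only once); as the man page excerpt above points out, data arriving in several chunks can still generate several events. For TCP, how much ET mode actually speeds things up still needs more benchmarks to confirm.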

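Many tests show that when there are not many idle or dead connections, epoll is not much more efficient than select/poll; but with a large number of idle connections (for example, many slow connections in a WAN environment), epoll is far more efficient than select/poll.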

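In addition, man epoll describes what happens in recv and send when the buffer is empty or full: the call fails with EAGAIN, and you must wait (for the next epoll event) before calling recv or send again. The man page also gives an example of epoll usage: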

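    /* Fragment from the epoll man page: kdpfd (the epoll fd), events,
       maxevents, listener, local, addrlen, ev, client, nfds and n are
       declared elsewhere; do_use_fd() stands for the application's handler. */
    for (;;) {
        nfds = epoll_wait(kdpfd, events, maxevents, -1);

        for (n = 0; n < nfds; ++n) {
            if (events[n].data.fd == listener) {
                /* New connection on the listening socket. */
                client = accept(listener, (struct sockaddr *) &local,
                                &addrlen);
                if (client < 0) {
                    perror("accept");
                    continue;
                }
                /* ET mode needs a non-blocking socket. */
                setnonblocking(client);
                ev.events = EPOLLIN | EPOLLET;
                ev.data.fd = client;
                if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
                    fprintf(stderr, "epoll set insertion error: fd=%d\n",
                            client);
                    return -1;
                }
            } else {
                /* Activity on an existing client connection. */
                do_use_fd(events[n].data.fd);
            }
        }
    }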

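To handle the cases mentioned above, where the data has not been read completely or the send buffer is full, EPOLLIN and EPOLLOUT events are handled as follows: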

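When reading, note that if recv() returns exactly as many bytes as requested, there is very likely more data left in the buffer, which means this event has not been fully handled yet, so you need to read again: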

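    /* activeevents[i] is the ready event and buf a local char array;
       both are declared elsewhere in the handler. */
    int rs = 1;
    while (rs) {
        ssize_t buflen = recv(activeevents[i].data.fd, buf, sizeof(buf), 0);
        if (buflen < 0) {
            /* Interrupted by a signal: just try again. */
            if (errno == EINTR)
                continue;
            /* The socket is non-blocking, so EAGAIN means there is no more
               data in the buffer right now; treat this event as handled. */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;
            else
                return;
        } else if (buflen == 0) {
            /* The peer has closed the socket normally. */
        }
        if ((size_t)buflen == sizeof(buf))
            rs = 1;   /* buffer filled completely: read again */
        else
            rs = 0;
    }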

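Also, suppose the incoming traffic exceeds what the receiving side can take (that is, the program running epoll reads faster than the socket it forwards to can send). Because the socket is non-blocking, send() returns, but the data in the buffer has not actually reached the receiver yet. As reading and sending continue, the send buffer eventually fills up and send() fails with EAGAIN (see man send), and the data in that send request is not accepted. So a wrapper function, socket_send(), is needed to handle this case: it tries to write all of the data before returning, and returns -1 on error. Inside socket_send(), when the send buffer is full (send() returns -1 with errno set to EAGAIN), it waits and then retries. This approach is not perfect: in theory it can block inside socket_send() for a long time, but there is no better way for now.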

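    #include <errno.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Tries to send all buflen bytes; returns buflen on success, -1 on error. */
    ssize_t socket_send(int sockfd, const char *buffer, size_t buflen)
    {
        ssize_t tmp;
        size_t total = buflen;
        const char *p = buffer;

        while (1) {
            tmp = send(sockfd, p, total, 0);
            if (tmp < 0) {
                /* send() was interrupted by a signal. Writing could continue,
                   but this function chooses to return -1. */
                if (errno == EINTR)
                    return -1;
                /* On a non-blocking socket this means the send queue is full:
                   wait a little, then retry. */
                if (errno == EAGAIN || errno == EWOULDBLOCK) {
                    usleep(1000);
                    continue;
                }
                return -1;
            }
            if ((size_t)tmp == total)
                return (ssize_t)buflen;
            total -= tmp;
            p += tmp;
        }
    }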

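Also, the argument of epoll_create is the maximum number of descriptors the new epoll descriptor is expected to monitor (since Linux 2.6.8 this size argument is ignored, but it must still be greater than 0); maxevents in epoll_wait is the maximum number of events a single epoll_wait call returns; and a timeout of -1 in epoll_wait means it waits indefinitely. If a connection breaks, or an unexpected event occurs, the client descriptor can stay occupied and never be released. This problem is discussed at length in the thread http://www.chinaunix.net/jh/23/813588.html.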

0 0