epoll programming pitfalls and how to inspect parameters

Source: Internet | Published by: 竞价数据分析思路老卢 | Editor: 程序博客网 | Time: 2024/05/05 13:35

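Each time I accept a new connection, I register it for the following events.

EPOLLIN | EPOLLET |  EPOLLERR | EPOLLHUP | EPOLLPRI;

Each time a batch of events is returned, I counted them:
number of fds returned in the batch = number of fds closed because of an error + number of fds switched from EPOLLIN to EPOLLOUT + number of fds handled normally on EPOLLOUT and then closed. In other words, every batch of events is handled completely and nothing is missed.

I observed that the three events EPOLLET | EPOLLERR | EPOLLHUP occurred at a rate of 0.

Yet the fd numbers show an upward trend. The smaller fds that were in use earlier gradually get lost over time and are never available again.

Where did those fds go?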

-----------------------------------------------------------------------

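People later wrote to me about this problem quite often. I answered in the thread, but there are too many posts and the answer is hard to find, so I am writing it here.
Relying on epoll alone to keep descriptors from leaking is almost impossible.
The complete fix is simple: give every fd a timeout, and if the fd has not been active within that timeout, close it.

[ Last edited by wyezl on 2008-12-13 21:42 ]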

思一克 replied on 2006-08-18 10:46:50

Do you have the code?

wyezl replied on 2006-08-18 11:10:12

/*-------------------------------------------------------------------------------------------------
gcc -o httpd httpd.c -lpthread
author: wyezl
2006.4.28
---------------------------------------------------------------------------------------------------*/

#include <sys/socket.h>
#include <sys/epoll.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <pthread.h>
#include <errno.h>
#include <string.h>   /* memset, strlen */

#define PORT 8888
#define MAXFDS 5000
#define EVENTSIZE 100

#define BUFFER "HTTP/1.1 200 OK\r\nContent-Length: 5\r\nConnection: close\r\nContent-Type: text/html\r\n\r\nHello"

int epfd;
void *serv_epoll(void *p);
void setnonblocking(int fd)
{
    int opts;
    opts=fcntl(fd, F_GETFL);
    if (opts < 0)
    {
          fprintf(stderr, "fcntl failed\n");
          return;
    }
    opts = opts | O_NONBLOCK;
    if(fcntl(fd, F_SETFL, opts) < 0)
    {
          fprintf(stderr, "fcntl failed\n");
          return;
    }
    return;
}

int main(int argc, char *argv[])
{
    int fd, cfd,opt=1;
    struct epoll_event ev;
    struct sockaddr_in sin, cin;
    socklen_t sin_len = sizeof(struct sockaddr_in);
    pthread_t tid;
    pthread_attr_t attr;

    epfd = epoll_create(MAXFDS);
    if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
    {
          fprintf(stderr, "socket failed\n");
          return -1;
    }
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const void*)&opt, sizeof(opt));

    memset(&sin, 0, sizeof(struct sockaddr_in));
    sin.sin_family = AF_INET;
    sin.sin_port = htons((short)(PORT));
    sin.sin_addr.s_addr = INADDR_ANY;
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) != 0)
    {
          fprintf(stderr, "bind failed\n");
          return -1;
    }
    if (listen(fd, 32) != 0)
    {
          fprintf(stderr, "listen failed\n");
          return -1;
    }

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr,PTHREAD_CREATE_DETACHED);
    if (pthread_create(&tid, &attr, serv_epoll, NULL) != 0)
    {
          fprintf(stderr, "pthread_create failed\n");
          return -1;
    }

    while ((cfd = accept(fd, (struct sockaddr *)&cin, &sin_len)) > 0)
    {
          setnonblocking(cfd);
          ev.data.fd = cfd;
          ev.events = EPOLLIN | EPOLLET |  EPOLLERR | EPOLLHUP | EPOLLPRI;
          epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &ev);
          //printf("connect from %s\n",inet_ntoa(cin.sin_addr));
          //printf("cfd=%d\n",cfd);
    }

    if (fd > 0)
          close(fd);
    return 0;
}

void *serv_epoll(void *p)
{
    int i, ret, cfd, nfds;
    struct epoll_event ev, events[EVENTSIZE];
    char buffer[512];

    while (1)
    {
          nfds = epoll_wait(epfd, events, EVENTSIZE, -1);
          //printf("nfds ........... %d\n",nfds);
          for (i = 0; i < nfds; i++)
          {
                if (events[i].events & EPOLLIN)
                {
                    /* Read the request once; the return value is not checked. */
                    cfd = events[i].data.fd;
                    ret = recv(cfd, buffer, sizeof(buffer), 0);
                    //printf("read ret..........= %d\n",ret);

                    /* Switch this fd to wait for writability. */
                    ev.data.fd = cfd;
                    ev.events = EPOLLOUT | EPOLLET;
                    epoll_ctl(epfd, EPOLL_CTL_MOD, cfd, &ev);
                }
                else if (events[i].events & EPOLLOUT)
                {
                    /* Send the fixed response, then deregister and close. */
                    cfd = events[i].data.fd;
                    ret = send(cfd, BUFFER, strlen(BUFFER), 0);
                    //printf("send ret...........= %d\n", ret);

                    ev.data.fd = cfd;
                    epoll_ctl(epfd, EPOLL_CTL_DEL, cfd, &ev);

                    close(cfd);
                }
                else
                {
                    /* Any other event (EPOLLERR, EPOLLHUP, EPOLLPRI): deregister and close. */
                    cfd = events[i].data.fd;
                    ev.data.fd = cfd;
                    epoll_ctl(epfd, EPOLL_CTL_DEL, cfd, &ev);
                    close(cfd);
                }
          }
    }
    return NULL;
}

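wyezl replied on 2006-08-18 11:23:46

I just need help finding where the descriptors are being exhausted.
:)

思一克 replied on 2006-08-18 11:50:07

Could it be that cfd in main() is never closed?

wyezl replied on 2006-08-18 12:25:12

main only accepts connections and adds them to the monitored set; it does no processing.
I suspect the problem is that epoll_ctl is not checked for errors. I'll test again this afternoon.

wyezl replied on 2006-08-18 13:27:07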

if(epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &ev)<0)
printf("............................................EPOLL_CTL_ADD error!\n");

if(epoll_ctl(epfd, EPOLL_CTL_MOD, cfd, &ev)<0)
printf(".......................................EPOLL_CTL_MOD error!\n");

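I changed those two calls but never saw either error message, so the problem is probably not there either.

I don't know where the descriptors are being exhausted. Strange.

wyezl replied on 2006-08-18 15:50:15

Bump...

思一克 replied on 2006-08-18 15:58:14

I'd like to help you test, but the program doesn't compile.

playmud replied on 2006-08-19 20:48:39

Let me point out a few possible problems; forgive me if I'm wrong.
1. while ((cfd = accept(fd, (struct sockaddr *)&cin, &sin_len)) > 0)
could be changed to
while(1)
cfd = accept(fd, (struct sockaddr *)&cin, &sin_len);
if(cfd>0)
..
2. Do you intend for the client to send once and then never send again?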

                if(events[i].events & EPOLLIN)
                {
                    cfd = events[i].data.fd;
                    ret = recv(cfd, buffer, sizeof(buffer),0);
                    //printf("read ret..........= %d\n",ret);

                    ev.data.fd = cfd;
                    ev.events = EPOLLOUT | EPOLLET;
                    epoll_ctl(epfd, EPOLL_CTL_MOD, cfd, &ev);
                }

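3. Add the accept event to the set monitored by epoll.
4. It is best to check the result of close.

tysn replied on 2006-08-20 11:25:26

The epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &ev); after accept()
should be changed to: if (epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &ev)<0) close(cfd);
That is probably where the problem is. The probability is fairly low,
which is why "the fds gradually get lost over time".

Also consider the four points playmud raised.
Take the first: the OP's while as written behaves much like an if,
since accept also fails only rarely, and when it does the program just exits?

Please correct me if I'm wrong!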

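wyezl replied on 2006-08-20 12:58:20

Quote: originally posted by 思一克 on 2006-8-18 15:58
I'd like to help you test, but the program doesn't compile.

It compiles fine. It requires a Linux kernel of 2.6 or later.

wyezl replied on 2006-08-20 13:10:17

Quote:

Let me point out a few possible problems; forgive me if I'm wrong.
1. while ((cfd = accept(fd, (struct sockaddr *)&cin, &sin_len)) > 0)
    could be changed to
while(1)
cfd = accept(fd, (struct sockaddr *)&cin, &sin_len);
if(cfd>0)
..
2. Do you intend for the client to send once and then never send again?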

                if(events[i].events & EPOLLIN)
                {
                    cfd = events[i].data.fd;
                    ret = recv(cfd, buffer, sizeof(buffer),0);
                    //printf("read ret..........= %d\n",ret);

                    ev.data.fd = cfd;
                    ev.events = EPOLLOUT | EPOLLET;
                    epoll_ctl(epfd, EPOLL_CTL_MOD, cfd, &ev);
                }
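3. Add the accept event to the set monitored by epoll.
4. It is best to check the result of close.

...

1. If the descriptors are already exhausted, that cfd check would never succeed, so I simply let the program exit.

2. I only take the first line of the HTTP request header, GET /xxx HTTP/1.0, and do a simple parse; I don't need the rest. Given how TCP works I ought to read several times, but in testing a single read has always returned what I need, with basically no errors.
If the remaining data is left unread, will it have any bad effect on later requests?

3. accept runs in the main thread. Is there any benefit to adding it to epoll?

4. I did not check the result of close; that is indeed an oversight and I need to add it.
Thanks for your suggestions.

[ Last edited by wyezl on 2006-8-20 13:16 ]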

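wyezl replied on 2006-08-20 13:14:43

Quote: originally posted by tysn on 2006-8-20 11:25
The epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &ev); after accept()
should be changed to: if (epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &ev)<0) close(cfd);
That is probably where the problem is. The probability is fairly low,
which is why "the fds gradually get lost over time ...

I added that check later, but the error rate there is practically 0.

I still haven't found the cause of the exhaustion. Could the epoll mechanism itself be at fault?

wyezl replied on 2006-08-21 09:41:20

Bump...

思一克 replied on 2006-08-21 09:49:38

Shouldn't cfd be closed after the EPOLLIN event?

This is only from reading the code and may not be accurate, since I can't run your program.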

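wyezl replied on 2006-08-21 12:14:14

Quote: originally posted by 思一克 on 2006-8-21 09:49
Shouldn't cfd be closed after the EPOLLIN event?

This is only from reading the code and may not be accurate, since I can't run your program.

If I closed it there, EPOLLOUT would never arrive afterwards, because I receive the request and then return the requested data.

思一克 replied on 2006-08-21 12:18:40

Then post a version that compiles and I'll test it for you, if you like.

billzhou replied on 2006-08-21 12:34:25

I don't know what epoll is. Could someone explain?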

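wyezl replied on 2006-08-21 16:47:23

Quote: originally posted by 思一克 on 2006-8-21 12:18
Then post a version that compiles and I'll test it for you, if you like.

The code I posted does compile.
Which operating system and version are you using?
This program runs only on Linux, and the kernel version must be 2.6 or later.

[ Last edited by wyezl on 2006-8-21 16:52 ]

wyezl replied on 2006-08-21 16:49:00

思一克 is a helpful moderator. Kudos to him first. :)

精简指令 replied on 2006-08-21 20:12:49

"Yet the fd numbers show an upward trend. The smaller fds that were in use earlier gradually get lost over time and are never available again."

Have the FDs already been used up? If not, try running up to the maximum and see whether smaller FDs get allocated again.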
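
Background for this question: POSIX requires calls such as open(), socket() and accept() to return the lowest-numbered descriptor that is not currently open, so a small number that never comes back is still open somewhere in the process. The sketch below demonstrates the rule in a single-threaded program; the function name is hypothetical and `/dev/null` is used only as a convenient file to open.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demonstration: returns 1 if a descriptor number released by
 * close() is handed out again by the next open(), 0 if not, -1 on error.
 * Assumes no other thread opens or closes descriptors meanwhile. */
static int lowest_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);   /* takes the lowest free number */
    int b = open("/dev/null", O_RDONLY);   /* takes the next free number */
    int c, reused;

    if (a < 0 || b < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        return -1;
    }
    close(a);                              /* release the smaller number */
    c = open("/dev/null", O_RDONLY);       /* should receive a's old number */
    reused = (c == a);
    close(b);
    if (c >= 0)
        close(c);
    return reused;
}
```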

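safedead replied on 2006-08-21 21:52:01

I have never used epoll this way.

My program uses epoll only to report EPOLLIN events on the LISTEN socket,
then accepts the client connection and hands it to a thread for processing.
I have never run into fd exhaustion.

思一克 replied on 2006-08-22 08:19:12

to the OP,

Your model doesn't look quite right, or at least not very good. What about the order of epoll_wait and accept?

Search the web; there are good ready-made examples.

wyezl replied on 2006-08-22 09:54:56

Quote: originally posted by safedead on 2006-8-21 21:52
I have never used epoll this way.

My program uses epoll only to report EPOLLIN events on the LISTEN socket,
then accepts the client connection and hands it to a thread for processing.
I have never run into fd exhaustion.

Could you post a rough outline of your model?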

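wyezl replied on 2006-08-22 09:55:54

Quote: originally posted by 精简指令 on 2006-8-21 20:12
"Yet the fd numbers show an upward trend. The smaller fds that were in use earlier gradually get lost over time and are never available again."

Have the FDs already been used up? If not, try running up to the maximum and see whether smaller FDs get allocated again.

They really were used up. No smaller ones could be allocated.

星之孩子 replied on 2006-08-22 10:06:27

Why do you use multiple threads when you are already using epoll?
Your model feels odd.

wyezl replied on 2006-08-22 10:32:07

There are only two threads; epoll has one to itself.

星之孩子 replied on 2006-08-22 10:34:49

I'm not sure whether using epoll from multiple threads like this causes problems.
Your application really doesn't need multiple threads at all.

wyezl replied on 2006-08-22 11:05:13

Quote: originally posted by 思一克 on 2006-8-22 08:19
to the OP,

Your model doesn't look quite right, or at least not very good. What about the order of epoll_wait and accept?

Search the web; there are good ready-made examples.

Could you share a model that you consider good?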

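nuclearweapon replied on 2006-08-22 11:43:27

I just tested the OP's program.
With 50 concurrent connections, cfd peaked at 37 and then fell back to 5.

I repeated this test several times and saw no upward trend in cfd.
So I suspect it depends on the kernel version.

Test environment:

kernel 2.6.15
gcc 4.0.3

distribution: ubuntu

wyezl replied on 2006-08-22 11:47:58

I have also used the model below, but it is noticeably less efficient.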

#include <iostream>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>   // exit
#include <strings.h>  // bzero
#include <errno.h>    // errno, ECONNRESET

#define MAXLINE 10
#define OPEN_MAX 100
#define LISTENQ 20
#define SERV_PORT 5555
#define INFTIM 1000

void setnonblocking(int sock)
{
  int opts;
  opts=fcntl(sock,F_GETFL);
  if(opts<0)
  {
      perror("fcntl(sock,GETFL)");
      exit(1);
  }
  opts = opts|O_NONBLOCK;
  if(fcntl(sock,F_SETFL,opts)<0)
  {
      perror("fcntl(sock,SETFL,opts)");
      exit(1);
  }
}

int main()
{
  int i, maxi, listenfd, connfd, sockfd,epfd,nfds;
  ssize_t n = 0;
  char line[MAXLINE];
  socklen_t clilen;
  // ev is used to register events; the array receives the events to handle
  struct epoll_event ev,events[20];
  // create the epoll file descriptor
  epfd=epoll_create(256);
  struct sockaddr_in clientaddr;
  struct sockaddr_in serveraddr;
  listenfd = socket(AF_INET, SOCK_STREAM, 0);
  // put the listening socket into nonblocking mode
  setnonblocking(listenfd);
  // file descriptor associated with the event
  ev.data.fd=listenfd;
  // event types to watch
  ev.events=EPOLLIN|EPOLLET;
  // register the epoll event
  epoll_ctl(epfd,EPOLL_CTL_ADD,listenfd,&ev);
  bzero(&serveraddr, sizeof(serveraddr));
  serveraddr.sin_family = AF_INET;
  const char *local_addr="200.200.200.204";
  inet_aton(local_addr,&(serveraddr.sin_addr));//htons(SERV_PORT);
  serveraddr.sin_port=htons(SERV_PORT);
  bind(listenfd,(sockaddr *)&serveraddr, sizeof(serveraddr));
  listen(listenfd, LISTENQ);
  maxi = 0;
  for ( ; ; ) {
      // wait for epoll events
      nfds=epoll_wait(epfd,events,20,500);
      // handle every event that was returned
      for(i=0;i<nfds;++i)
      {
          if(events[i].data.fd==listenfd)
          {
            // accept() expects the address buffer size on entry
            clilen = sizeof(clientaddr);
            connfd = accept(listenfd,(sockaddr *)&clientaddr, &clilen);
            if(connfd<0){
                perror("connfd<0");
                exit(1);
            }
            setnonblocking(connfd);
            char *str = inet_ntoa(clientaddr.sin_addr);
            std::cout<<"connect from "<<str<<std::endl;
            // file descriptor to watch for reading
            ev.data.fd=connfd;
            // register for read events
            ev.events=EPOLLIN|EPOLLET;
            // register ev
            epoll_ctl(epfd,EPOLL_CTL_ADD,connfd,&ev);
          }
          else if(events[i].events&EPOLLIN)
          {
            if ( (sockfd = events[i].data.fd) < 0) continue;
            if ( (n = read(sockfd, line, MAXLINE)) < 0) {
                if (errno == ECONNRESET) {
                    close(sockfd);
                    events[i].data.fd = -1;
                    continue;   // do not touch a closed fd below
                } else
                    std::cout<<"readline error"<<std::endl;
            } else if (n == 0) {
                close(sockfd);
                events[i].data.fd = -1;
                continue;       // do not touch a closed fd below
            }
            // file descriptor to watch for writing
            ev.data.fd=sockfd;
            // register for write events
            ev.events=EPOLLOUT|EPOLLET;
            // change the event watched on sockfd to EPOLLOUT
            epoll_ctl(epfd,EPOLL_CTL_MOD,sockfd,&ev);
          }
          else if(events[i].events&EPOLLOUT)
          {
            sockfd = events[i].data.fd;
            // note: line and n are shared by all connections
            write(sockfd, line, n);
            // file descriptor to watch for reading
            ev.data.fd=sockfd;
            // register for read events
            ev.events=EPOLLIN|EPOLLET;
            // change the event watched on sockfd back to EPOLLIN
            epoll_ctl(epfd,EPOLL_CTL_MOD,sockfd,&ev);
          }
      }
  }
}

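wyezl replied on 2006-08-22 11:50:49

Quote: originally posted by nuclearweapon on 2006-8-22 11:43
I just tested the OP's program.
With 50 concurrent connections, cfd peaked at 37 and then fell back to 5.

I repeated this test several times and saw no upward trend in cfd.
So I suspect it depends on the kernel version.

Test environment:

kernel 2.6.15
gcc 4.0.3

distribution: ubuntu

Yes, cfd peaking at 37 and then falling back to 5 with 50 concurrent connections does happen. That is why I said an upward trend rather than a straight-line increase.

This is also not something you can reproduce in a few minutes. Even 1024 fds take more than an hour to exhaust (roughly 1 million requests handled), and that is under fairly heavy live traffic.

nuclearweapon replied on 2006-08-22 11:57:36

Quote: originally posted by wyezl on 2006-8-22 11:50

Yes, cfd peaking at 37 and then falling back to 5 with 50 concurrent connections does happen. That is why I said an upward trend rather than a straight-line increase.

This is also not something you can reproduce in a few minutes. Even 1024 fds take more than an hour to exhaust (roughly 1 million requests handled), and that is under fairly heavy ...

What do you mean by "exhausted" here?

Does it mean accept can no longer take new clients?

nuclearweapon replied on 2006-08-22 12:02:51

For debugging, when the server program hangs
you can look at this directory:
/proc/your_process_id/fd

and see which fds your process is using!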

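星之孩子 replied on 2006-08-22 12:38:41

Your other model is just the model from the epoll example, isn't it?
On what basis do you say it is less efficient?

wyezl replied on 2006-08-22 13:10:18

Quote: originally posted by 星之孩子 on 2006-8-22 12:38
Your other model is just the model from the epoll example, isn't it?
On what basis do you say it is less efficient?

I tested it. Its throughput was 20% lower.

思一克 replied on 2006-08-22 13:11:53

to the OP,

What service are you running? How many simultaneous connections?

wyezl replied on 2006-08-22 13:18:46

Quote: originally posted by nuclearweapon on 2006-8-22 12:02
For debugging, when the server program hangs
you can look at this directory:
/proc/your_process_id/fd

and see which fds your process is using!

I monitor 5000 descriptors in total. After the program ran for a day they were nearly used up; fewer than 1000 remain.

ls /proc/24152/fd/
Display all 4051 possibilities? (y or n)

So the exhausted descriptors are all still in use, but I don't know where they fail to be released.

wyezl replied on 2006-08-22 13:22:36

Quote: originally posted by 思一克 on 2006-8-22 13:11
to the OP,

What service are you running? How many simultaneous connections?

An HTTP service serving simple data. It is basically reading from memory, similar to stock quote data.

Naturally, the more requests per second it can handle, the better.

playmud replied on 2006-08-22 13:48:27

5000? Why do I see only 32 in your listen call?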
    if (listen(fd, 32) != 0)
    {
          fprintf(stderr, "listen failed\n");
          return -1;
    }

SYNOPSIS
       #include <sys/socket.h>

       int listen(int sockfd, int backlog);

DESCRIPTION
       To accept connections, a socket is first created with socket(2), a willingness to accept incoming
       connections and a queue limit for incoming connections are specified with listen(), and then the
       connections are accepted with accept(2).  The listen() call applies only to sockets of type
       SOCK_STREAM or SOCK_SEQPACKET.

       The backlog parameter defines the maximum length the queue of pending connections may grow to.  If a
       connection request arrives with the queue full the client may receive an error with an indication of
       ECONNREFUSED or, if the underlying protocol supports retransmission, the request may be ignored so
       that retries succeed.

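思一克 replied on 2006-08-22 14:07:38

I read your program. It does seem to have a hole, one that only shows up with a large number of connections and slow disconnects.

After an fd gets an EPOLLIN event, if the connection drops and EPOLLOUT never arrives, won't your fd stay unclosed forever?

Discussion welcome.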

slay78 replied on 2006-08-22 14:13:42

Quote: originally posted by playmud on 2006-8-22 13:48
5000? Why do I see only 32 in your listen call?
    if (listen(fd, 32) != 0)
    {
          fprintf(stderr, "listen failed\n");
          return -1;
    }

SYNOPSIS
#include <s ...

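That 32 does not mean only 32 clients can connect. It means at most 32 connections can be pending at the same time without having been accepted yet; the 33rd cannot get in, that's all.

思一克 replied on 2006-08-22 14:14:15

There is an else branch that handles HUP and so on. But what if, after EPOLLIN, bad network conditions mean no further event ever arrives?

wyezl replied on 2006-08-22 14:32:13

Quote: originally posted by playmud on 2006-8-22 13:48
5000? Why do I see only 32 in your listen call?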
    if (listen(fd, 32) != 0)
    {
          fprintf(stderr, "listen failed\n");
          return -1;
    }

SYNOPSIS
#include <s ...

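That 32 does not conflict with having 5000 clients connected at once.

It's not as if all 5000 connections are accepted within one second.
5000 is the maximum number of descriptors my epoll monitors.

nuclearweapon replied on 2006-08-22 14:33:08

Quote: originally posted by 思一克 on 2006-8-22 14:14
After an fd gets an EPOLLIN event, if the connection drops and EPOLLOUT never arrives, won't your fd stay unclosed forever?

SO_KEEPALIVE should avoid this situation (the OP's program handles it too).
Otherwise the client is at fault, holding the connection open and never letting go.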
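
For reference, SO_KEEPALIVE is the TCP-level socket option, which is distinct from HTTP keep-alive. A minimal sketch of enabling it on an accepted socket follows. The helper name `enable_keepalive` and the parameter values shown afterwards are assumptions; `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are Linux-specific options.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keepalive probes for fd.
 * idle  = seconds of silence before the first probe,
 * intvl = seconds between probes,
 * cnt   = unanswered probes before the connection is dropped.
 * Returns 0 on success, -1 on the first setsockopt failure. */
static int enable_keepalive(int fd, int idle, int intvl, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}
```

With `enable_keepalive(cfd, 60, 10, 3)`, for instance, a dead peer would be detected roughly 90 seconds after the last traffic and the socket would then report an error to epoll. A peer that is alive but idle answers the probes and is not dropped, so this does not replace an application-level timeout.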

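nuclearweapon replied on 2006-08-22 14:34:54

Quote: originally posted by wyezl on 2006-8-22 13:18

I monitor 5000 descriptors in total. After the program ran for a day they were nearly used up; fewer than 1000 remain.

ls /proc/24152/fd/
Display all 4051 possibilities? (y or n)

So the exhausted descriptors are all still in use, but I don't know where ...

Use ls -l to see which sockets they are,
then use netstat -a to check the state of those sockets.

wyezl replied on 2006-08-22 14:38:09

Quote: originally posted by 思一克 on 2006-8-22 14:07
I read your program. It does seem to have a hole, one that only shows up with a large number of connections and slow disconnects.

Yes, they are exhausted slowly.

The second model behaves the same way and also runs out like this. Previously I only tested with a load-testing tool, which showed basically no flaw.
Under live traffic it becomes obvious.

思一克 replied on 2006-08-22 14:41:07

to wyezl,

The FD leak is caused by a missing close; there is no doubt about that.
Why is there no close?
Because sometimes (for example when the network is bad) the event never arrives.

wyezl replied on 2006-08-22 14:51:33

Quote: originally posted by nuclearweapon on 2006-8-22 14:34

Use ls -l to see which sockets they are,
then use netstat -a to check the state of those sockets.

This is part of the output when monitoring only 1024 fds.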

# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 *:32768                     *:*                         LISTEN
tcp        0      0 *:sunrpc                    *:*                         LISTEN
tcp        0      0 *:http                      *:*                         LISTEN
tcp        0      0 xxx.108.37.77:http          122.48.0.37:56652           SYN_RECV
tcp        0      0 xxx.108.37.77:http          211.147.253.74:55689        SYN_RECV
tcp        0      0 xxx.108.37.77:http          219.135.251.110:2669        SYN_RECV
warning, got duplicate tcp line.
tcp        0      0 xxx.108.37.77:http          122.48.0.89:12881           SYN_RECV
tcp        0      0 xxx.108.37.77:http          58.246.194.192:4254         SYN_RECV
tcp        0      0 xxx.108.37.77:http          122.48.0.89:12923           SYN_RECV
tcp        0      0 xxx.108.37.77:http          218.22.98.170:56851         SYN_RECV
tcp        0      0 xxx.108.37.77:http          122.48.0.89:12888           SYN_RECV
tcp        0      0 xxx.108.37.77:http          60.190.192.46:4072          SYN_RECV
tcp        0      0 xxx.108.37.77:http          61.177.227.182:27180        SYN_RECV
tcp        0      0 xxx.108.37.77:http          60.0.218.7:41237            SYN_RECV
tcp        0      0 xxx.108.37.77:http          221.232.42.194:3713         SYN_RECV
tcp        0      0 xxx.108.37.77:http          218.81.111.25:4495          SYN_RECV
tcp        0      0 xxx.108.37.77:http          122.48.0.35:59812           SYN_RECV
tcp        0      0 xxx.108.37.77:http          58.60.5.33:7660             SYN_RECV
tcp        0      0 xxx.108.37.77:http          222.137.4.7:4296            SYN_RECV

ls -l /proc/27952/fd/
lrwx------  1 root root 64 Aug 22 14:44 837 -> socket:[31398309]
lrwx------  1 root root 64 Aug 22 14:44 838 -> socket:[31397749]
lrwx------  1 root root 64 Aug 22 14:44 839 -> socket:[31394222]
lrwx------  1 root root 64 Aug 22 14:38 84 -> socket:[30926430]
lrwx------  1 root root 64 Aug 22 14:44 840 -> socket:[31398135]
lrwx------  1 root root 64 Aug 22 14:44 841 -> socket:[31398930]
lrwx------  1 root root 64 Aug 22 14:44 842 -> socket:[31397984]
lrwx------  1 root root 64 Aug 22 14:44 843 -> socket:[31390139]
lrwx------  1 root root 64 Aug 22 14:44 844 -> socket:[31398331]
lrwx------  1 root root 64 Aug 22 14:44 845 -> socket:[31397572]
lrwx------  1 root root 64 Aug 22 14:44 846 -> socket:[31397546]
lrwx------  1 root root 64 Aug 22 14:44 847 -> socket:[31396094]
lrwx------  1 root root 64 Aug 22 14:44 848 -> socket:[31393666]
lrwx------  1 root root 64 Aug 22 14:44 849 -> socket:[31398932]
lrwx------  1 root root 64 Aug 22 14:38 85 -> socket:[30884712]
lrwx------  1 root root 64 Aug 22 14:44 852 -> socket:[31397555]
lrwx------  1 root root 64 Aug 22 14:44 856 -> socket:[31390278]
lrwx------  1 root root 64 Aug 22 14:44 858 -> socket:[31392652]
lrwx------  1 root root 64 Aug 22 14:44 859 -> socket:[31392710]
lrwx------  1 root root 64 Aug 22 14:38 86 -> socket:[30883810]
lrwx------  1 root root 64 Aug 22 14:38 87 -> socket:[30913192]
lrwx------  1 root root 64 Aug 22 14:38 88 -> socket:[30943036]
lrwx------  1 root root 64 Aug 22 14:38 89 -> socket:[31133080]
lrwx------  1 root root 64 Aug 22 14:38 9 -> socket:[30951877]
lrwx------  1 root root 64 Aug 22 14:38 90 -> socket:[30999630]
lrwx------  1 root root 64 Aug 22 14:38 91 -> socket:[31134432]
lrwx------  1 root root 64 Aug 22 14:38 92 -> socket:[30928870]
lrwx------  1 root root 64 Aug 22 14:38 93 -> socket:[30975324]
lrwx------  1 root root 64 Aug 22 14:38 94 -> socket:[30936083]

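wyezl replied on 2006-08-22 14:52:50

Quote: originally posted by 思一克 on 2006-8-22 14:41
to wyezl,

The FD leak is caused by a missing close; there is no doubt about that.
Why is there no close?
Because sometimes (for example when the network is bad) the event never arrives.

What is a good way to handle this case?

思一克 replied on 2006-08-22 14:55:05

You could set a timeout and close every cfd that exceeds it (at the accept site).

wyezl replied on 2006-08-22 14:56:56

Quote: originally posted by nuclearweapon on 2006-8-22 14:33

SO_KEEPALIVE should avoid this situation (the OP's program handles it too).
Otherwise the client is at fault, holding the connection open and never letting go.

I don't support KEEPALIVE. Once the response is sent, I close the descriptor immediately.

wyezl replied on 2006-08-22 15:01:52

Quote: originally posted by 思一克 on 2006-8-22 14:55
You could set a timeout and close every cfd that exceeds it (at the accept site).

How do I set a timeout for each descriptor?

nuclearweapon replied on 2006-08-22 15:01:58

Quote: originally posted by wyezl on 2006-8-22 14:51

This is part of the output when monitoring only 1024 fds.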

# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State ...

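Is someone attacking you? :-(

How many SYN_RECV entries are there?

nuclearweapon replied on 2006-08-22 15:04:54

Turn on tcp_syncookies
and try it,
then test again.

思一克 replied on 2006-08-22 15:07:31

This may not be a good method:

while( ......... accept() ....) {

.... check the global time_t fd_create[5000]; here

}

On close, set fd_create[fd] = 0;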

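wyezl replied on 2006-08-22 15:11:15

Quote: originally posted by nuclearweapon on 2006-8-22 15:01

Is someone attacking you? :-(

How many SYN_RECV entries are there?

A few hundred. Here they are, with part of the tail omitted.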
# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 *:32768                     *:*                         LISTEN
tcp        0      0 *:sunrpc                    *:*                         LISTEN
tcp        0      0 *:http                      *:*                         LISTEN
tcp        0      0 xxx.108.37.77:http          58.217.195.47:4689          SYN_RECV
tcp        0      0 xxx.108.37.77:http          58.47.42.74:2899            SYN_RECV
tcp        0      0 xxx.108.37.77:http          218.68.242.127:1238         SYN_RECV
tcp        0      0 xxx.108.37.77:http          222.67.10.49:30384          SYN_RECV
tcp        0      0 xxx.108.37.77:http          pc68.broad.dynamic.xm.:2215 SYN_RECV
tcp        0      0 xxx.108.37.77:http          220.191.231.198:47700       SYN_RECV
tcp        0      0 xxx.108.37.77:http          218.77.186.194:63425        SYN_RECV
tcp        0      0 xxx.108.37.77:http          220.207.230.113:3652        SYN_RECV
tcp        0      0 xxx.108.37.77:http          60.7.59.197:3216            SYN_RECV
tcp        0      0 xxx.108.37.77:http          123.49.160.226:3127         SYN_RECV
warning, got duplicate tcp line.
tcp        0      0 xxx.108.37.77:http          122.48.0.89:13698           SYN_RECV
tcp        0      0 xxx.108.37.77:http          58.67.158.106:krb524        SYN_RECV
tcp        0      0 xxx.108.37.77:http          218.79.187.16:3129          SYN_RECV
tcp        0      0 xxx.108.37.77:http          122.48.0.89:13680           SYN_RECV
tcp        0      0 xxx.108.37.77:http          60.171.192.39:2531          SYN_RECV
tcp        0      0 xxx.108.37.77:http          60.216.170.48:xxx5          SYN_RECV
warning, got duplicate tcp line.
tcp        0      0 xxx.108.37.77:http          218.22.68.194:1101          SYN_RECV
tcp        0      0 xxx.108.37.77:http          219.137.172.101:15053       SYN_RECV
tcp        0      0 xxx.108.37.77:http          61.141.94.21:4748           SYN_RECV
tcp        0      0 xxx.108.37.77:http          58.24.101.69:4188           SYN_RECV
tcp        0      0 xxx.108.37.77:http          222.82.225.134:3118         SYN_RECV
tcp        0      0 xxx.108.37.77:http          59.52.119.145:55806         SYN_RECV
tcp        0      0 xxx.108.37.77:http          124.248.1.69:50344          SYN_RECV
tcp        0      0 xxx.108.37.77:http          220.171.79.196:3014         SYN_RECV
tcp        0      0 xxx.108.37.77:http          123.49.164.148:55258        SYN_RECV
tcp        0      0 xxx.108.37.77:http          212.193.163.60.broad.:63429 SYN_RECV
tcp        0      0 xxx.108.37.77:http          122.48.0.89:13754           SYN_RECV
tcp        0      0 xxx.108.37.77:http          61.188.210.2:3532           SYN_RECV
tcp        0      0 xxx.108.37.77:http          122.48.0.89:13752           SYN_RECV

[root@sina src]# netstat -an | more
warning, got duplicate tcp line.
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 0.0.0.0:32768               0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:111                 0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:80                  0.0.0.0:*                   LISTEN
tcp        0      0 xxx.108.37.77:80            xxx.103.215.242:62165       SYN_RECV
tcp        0      0 xxx.108.37.77:80            220.191.231.198:47700       SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.246.73.21:60164         SYN_RECV
tcp        0      0 xxx.108.37.77:80            222.64.136.219:2881         SYN_RECV
tcp        0      0 xxx.108.37.77:80            123.49.160.226:3127         SYN_RECV
tcp        0      0 xxx.108.37.77:80            60.63.11.185:22834          SYN_RECV
tcp        0      0 xxx.108.37.77:80            58.51.39.146:1870           SYN_RECV
tcp        0      0 xxx.108.37.77:80            122.48.0.89:13698           SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.147.240.74:3076         SYN_RECV
tcp        0      0 xxx.108.37.77:80            59.54.226.64:1135           SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.28.166.198:61757        SYN_RECV
tcp        0      0 xxx.108.37.77:80            222.82.225.134:3118         SYN_RECV
tcp        0      0 xxx.108.37.77:80            58.35.242.69:23660          SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.92.156.131:1210         SYN_RECV
tcp        0      0 xxx.108.37.77:80            221.5.82.81:2100            SYN_RECV
tcp        0      0 xxx.108.37.77:80            220.171.79.196:3014         SYN_RECV
tcp        0      0 xxx.108.37.77:80            60.163.193.212:63429        SYN_RECV
tcp        0      0 xxx.108.37.77:80            122.48.0.89:13754           SYN_RECV
tcp        0      0 xxx.108.37.77:80            61.188.210.2:3532           SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.82.23.151:64811         SYN_RECV
tcp        0      0 xxx.108.37.77:80            122.48.0.89:13752           SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.18.33.108:4641          SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.95.165.175:61439        SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.28.188.86:52324         SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.92.156.131:1215         SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.92.156.131:1206         SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.71.200.192:1503         SYN_RECV
tcp        0      0 xxx.108.37.77:80            221.200.137.120:1619        SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.59.106.182:53008        SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.69.18.125:3158          SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.81.193.198:1434         SYN_RECV
tcp        0      0 xxx.108.37.77:80            61.144.254.129:50050        SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.92.156.131:1216         SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.28.58.69:9898           SYN_RECV
tcp        0      0 xxx.108.37.77:80            220.234.241.178:1679        SYN_RECV
tcp        0      0 xxx.108.37.77:80            219.242.196.109:2261        SYN_RECV
tcp        0      0 xxx.108.37.77:80            124.21.243.228:1648         SYN_RECV
tcp        0      0 xxx.108.37.77:80            221.200.137.120:1620        SYN_RECV
tcp        0      0 xxx.108.37.77:80            222.35.126.211:1336         SYN_RECV
tcp        0      0 xxx.108.37.77:80            221.234.146.160:2188        SYN_RECV
tcp        0      0 xxx.108.37.77:80            222.169.187.100:3529        SYN_RECV
tcp        0      0 xxx.108.37.77:80            122.48.0.89:13748           SYN_RECV
--More--warning, got duplicate tcp line.
warning, got duplicate tcp line.
warning, got duplicate tcp line.
tcp        0      0 xxx.108.37.77:80            218.246.73.21:60165         SYN_RECV
tcp        0      0 xxx.108.37.77:80            221.196.131.130:4444        SYN_RECV
tcp        0      0 xxx.108.37.77:80            220.174.8.13:12043          SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.95.165.175:61440        SYN_RECV
tcp        0      0 xxx.108.37.77:80            220.205.30.17:1519          SYN_RECV
tcp        0      0 xxx.108.37.77:80            61.188.210.2:3531           SYN_RECV
tcp        0      0 xxx.108.37.77:80            60.163.193.212:63427        SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.94.127.135:30598        SYN_RECV
tcp        0      0 xxx.108.37.77:80            219.155.143.97:1120         SYN_RECV
tcp        0      0 xxx.108.37.77:80            221.7.131.131:4331          SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.147.240.74:3077         SYN_RECV
tcp        0      0 xxx.108.37.77:80            210.76.66.35:1524           SYN_RECV
tcp        0      0 xxx.108.37.77:80            61.138.254.47:29233         SYN_RECV
tcp        0      0 xxx.108.37.77:80            59.35.87.21:11079           SYN_RECV
tcp        0      0 xxx.108.37.77:80            219.242.196.109:2260        SYN_RECV
tcp        0      0 xxx.108.37.77:80            220.205.30.17:1518          SYN_RECV
tcp        0      0 xxx.108.37.77:80            58.62.84.52:4315            SYN_RECV
tcp        0      0 xxx.108.37.77:80            221.216.74.228:4444         SYN_RECV
tcp        0      0 xxx.108.37.77:80            222.169.187.100:3528        SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.92.156.131:1207         SYN_RECV
tcp        0      0 xxx.108.37.77:80            222.209.217.21:38807        SYN_RECV
tcp        0      0 xxx.108.37.77:80            219.238.191.17:17679        SYN_RECV
tcp        0      0 xxx.108.37.77:80            61.172.137.186:4075         SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.92.156.131:1212         SYN_RECV
tcp        0      0 xxx.108.37.77:80            221.214.13.109:17600        SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.23.149.98:46060         SYN_RECV
tcp        0      0 xxx.108.37.77:80            221.222.116.178:1658        SYN_RECV
tcp        0      0 xxx.108.37.77:80            xxx.103.215.242:62164       SYN_RECV
tcp        0      0 xxx.108.37.77:80            222.173.191.91:1143         SYN_RECV
tcp        0      0 xxx.108.37.77:80            61.242.112.118:59158        SYN_RECV
tcp        0      0 xxx.108.37.77:80            222.209.217.21:38551        SYN_RECV
tcp        0      0 xxx.108.37.77:80            219.129.164.182:62607       SYN_RECV
tcp        0      0 xxx.108.37.77:80            58.60.67.4:48095            SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.92.156.131:1213         SYN_RECV
tcp        0      0 xxx.108.37.77:80            122.48.0.89:13694           SYN_RECV
tcp        0      0 xxx.108.37.77:80            218.87.71.62:2363           SYN_RECV
tcp        0      0 xxx.108.37.77:80            61.173.14.106:4444          SYN_RECV
tcp        0      0 xxx.108.37.77:80            220.234.241.178:1678        SYN_RECV
tcp        0      0 xxx.108.37.77:80            210.21.209.145:5554         SYN_RECV
tcp        0      0 xxx.108.37.77:80            122.48.0.89:13750           SYN_RECV
tcp        0      0 xxx.108.37.77:80            59.35.87.21:11080           SYN_RECV
tcp        0      0 xxx.108.37.77:80            61.172.137.186:4076         SYN_RECV
tcp        0      0 xxx.108.37.77:80            222.67.94.46:1434           SYN_RECV
tcp        0      0 xxx.108.37.77:80            122.48.0.89:13697           SYN_RECV
tcp        0      0 xxx.108.37.77:80            61.242.112.118:59160        SYN_RECV
tcp        0      0 xxx.108.37.77:80            58.35.242.69:23659          SYN_RECV
--More--warning, got duplicate tcp line.
warning, got duplicate tcp line.
tcp        0      0 xxx.108.37.77:80            61.181.71.50:3946           SYN_RECV
tcp        0      0 xxx.108.37.77:80            211.92.156.131:1209         SYN_RECV
tcp        0      0 xxx.108.37.77:80            60.63.11.185:22833          SYN_RECV
tcp        0      0 xxx.108.37.77:80            122.48.0.89:13744           SYN_RECV
tcp        0      0 xxx.108.37.77:80            222.137.152.197:4755        SYN_RECV
tcp        0      0 xxx.108.37.77:80            220.169.5.239:3676          SYN_RECV
tcp        0      0 127.0.0.1:631               0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN
tcp        0      0 xxx.108.37.77:80            221.6.163.131:35293         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.198.127.25:1956         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            60.2.106.52:58968           TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.222.144.24:1618         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            211.158.132.14:1808         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            218.89.188.200:60118        TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.0.180.142:58305         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            61.51.223.223:1992          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            61.170.213.11:14704         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            219.136.26.220:3765         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            211.158.81.195:4087         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            218.71.200.192:1460         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.5.152.50:37803          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            210.21.232.236:33515        TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.198.127.25:1957         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.15.17.77:1878           TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.1.244.62:1230           TIME_WAIT
tcp        0      0 xxx.108.37.77:80            61.155.209.139:2875         TIME_WAIT
tcp        1      1 xxx.108.37.77:80            60.186.14.190:13808         CLOSING
tcp        0      0 xxx.108.37.77:80            60.208.111.253:57016        TIME_WAIT
tcp        1      1 xxx.108.37.77:80            219.145.113.15:52970        CLOSING
tcp        0      0 xxx.108.37.77:80            61.170.213.11:14705         TIME_WAIT
tcp        1      1 xxx.108.37.77:80            xxx.106.180.254:2266        CLOSING
tcp        1      1 xxx.108.37.77:80            222.68.248.150:7632         CLOSING
tcp        0      0 xxx.108.37.77:80            60.17.17.149:62063          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            60.0.16.66:3159             TIME_WAIT
tcp        0      0 xxx.108.37.77:80            58.101.33.97:3616           TIME_WAIT
tcp        0      0 xxx.108.37.77:80            125.32.0.114:1367           TIME_WAIT
tcp        0      0 xxx.108.37.77:80            125.33.217.83:3502          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            71.135.63.37:61595          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            218.26.227.9:31478          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            220.161.163.113:1141        TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.1.244.62:1229           TIME_WAIT
tcp        0      0 xxx.108.37.77:80            60.2.106.52:58970           TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.15.17.77:1877           TIME_WAIT
tcp        0      0 xxx.108.37.77:80            222.64.14.150:64545         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            222.67.166.24:2299          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            61.51.223.223:1994          TIME_WAIT
--More--warning, got duplicate tcp line.
warning, got duplicate tcp line.
tcp        0      0 xxx.108.37.77:80            61.155.18.18:56733          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            60.15.21.5:12313            TIME_WAIT
tcp        0      0 xxx.108.37.77:80            58.33.225.117:2739          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            61.167.60.224:58498         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            58.60.5.33:7454             TIME_WAIT
tcp        0    243 xxx.108.37.77:80            218.71.200.192:1718         FIN_WAIT1
tcp        0      0 xxx.108.37.77:80            123.49.164.148:55288        TIME_WAIT
tcp        0      0 xxx.108.37.77:80            125.33.217.83:3501          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.214.13.109:47024        TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.218.117.87:17919        TIME_WAIT
tcp        0      0 xxx.108.37.77:80            218.108.44.10:21325         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            124.200.18.190:3484         TIME_WAIT
tcp        0    300 xxx.108.37.77:80            221.218.117.87:26879        FIN_WAIT1
tcp        0    567 xxx.108.37.77:80            221.218.117.87:27135        FIN_WAIT1
tcp        0      0 xxx.108.37.77:80            218.26.227.9:31479          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            220.161.163.113:1140        TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.222.144.24:1617         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            60.2.106.52:58971           TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.5.181.31:3752           TIME_WAIT
tcp        0      0 xxx.108.37.77:80            61.49.166.124:62700         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.0.180.142:58306         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            61.51.223.223:1995          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            60.208.111.253:57018        TIME_WAIT
tcp        0      0 xxx.108.37.77:80            xxx.111.152.6:56054         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            58.33.225.117:2738          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            61.181.245.85:25618         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            123.49.164.148:55289        TIME_WAIT
tcp        1      1 xxx.108.37.77:80            125.92.214.174:6619         CLOSING
tcp        0      0 xxx.108.37.77:80            124.200.18.190:3483         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            60.166.100.106:48296        TIME_WAIT
tcp        0      0 xxx.108.37.77:80            222.223.6.176:40086         ESTABLISHED
tcp        0      0 xxx.108.37.77:80            124.42.126.70:55826         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.226.242.212:64827       TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.198.127.25:1952         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            222.64.14.150:64295         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            xxx.106.113.61:6182         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.229.12.88:4273          TIME_WAIT
tcp        0      0 xxx.108.37.77:80            61.49.166.124:62699         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            221.8.9.91:5210             TIME_WAIT
tcp        0      0 xxx.108.37.77:80            210.21.196.86:11655         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            218.87.255.183:3552         TIME_WAIT
tcp        0      0 xxx.108.37.77:80            61.181.250.190:32782        TIME_WAIT
tcp        0      0 xxx.108.37.77:80            125.33.217.83:3499          TIME_WAIT

wyezl replied on 2006-08-22 15:12:21

Quote: originally posted by nuclearweapon on 2006-8-22 15:04
Turn on tcp_syncookies
and try it.
Then test again.

How do I turn it on?

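思一克 replied on 2006-08-22 15:14:30

wyezl,

An attack would not affect the fds. The fds are established connections that were never closed.

思一克 replied on 2006-08-22 15:15:40

This thread deserves to be marked as featured: the problem is subtle and tricky, and it is a common one, especially with epoll.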

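nuclearweapon replied on 2006-08-22 15:17:25

If there is constantly a large number of SYN_RECV connections, it may be an attack.

echo 1 > /proc/sys/net/ipv4/tcp_syncookies
turns it on.

There are several more settings next to it that deal with DDoS. For the details you will have to look them up yourself.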

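wyezl replied on 2006-08-22 15:23:30

Quote: originally posted by 思一克 on 2006-8-22 15:14
wyezl,

An attack would not affect the fds. The fds are established connections that were never closed.

My temporary workaround is to exit once the descriptors are used up,
and let a monitoring program check periodically and restart the server.
I don't know whether this is a flaw in epoll itself.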

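nuclearweapon replied on 2006-08-22 15:25:25

Quote: originally posted by 思一克 on 2006-8-22 15:14
wyezl,

An attack would not affect the fds. The fds are established connections that were never closed.

My recollection differs somewhat from the moderator's.
The reason:
As soon as there is a socket there is an inode, and therefore an fd. The fd does not appear only after the connection is established.
On Linux 2.6,
whenever the kernel finishes creating a struct socket, it calls sock_map_fd() to create an fd in the calling process.

If a connection is in the SYN_RECV state, a struct socket exists, so an inode exists, and therefore an fd.
Corrections welcome!

[ Last edited by nuclearweapon on 2006-8-22 15:27 ]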

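思一克 replied on 2006-08-22 15:26:40

It is probably not a problem in epoll itself. epoll only reports events; if an event never arrives, there is nothing it can do. A timeout is required.

wyezl replied on 2006-08-22 15:27:28

Quote: originally posted by 思一克 on 2006-8-22 15:15
This thread deserves to be marked as featured: the problem is subtle and tricky, and it is a common one, especially with epoll.

If we can work out a mature, general-purpose epoll model here, featuring it would be well deserved. Heh.
I will keep testing in production.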

uname -a
Linux xxx.com.cn 2.6.9-34.EL #1 Wed Mar 8 00:07:35 CST 2006 i686 i686 i386 GNU/Linux
[finance@sina src]$ cat /etc/issue
CentOS release 4.1 (Final)
Kernel \r on an \m

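思一克 replied on 2006-08-22 15:30:35

To nuclearweapon,

You are right. But his problem is that the fds returned by accept are being exhausted, and those are connected sockets.

In a DDoS attack the connections are usually half-open (only part of the IP packets arrive), so they cannot be accepted.

Quote: originally posted by nuclearweapon on 2006-8-22 15:25

My recollection differs somewhat from the moderator's.
The reason:
As soon as there is a socket there is an inode, and therefore an fd. The fd does not appear only after the connection is established.
On Linux 2.6,
whenever the kernel finishes creating a struct socket, it calls sock_map_fd() in the calling process to create ...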

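nuclearweapon replied on 2006-08-22 15:35:28

Quote: originally posted by 思一克 on 2006-8-22 15:30
To nuclearweapon,

You are right. But his problem is that the fds returned by accept are being exhausted, and those are connected sockets.

In a DDoS attack the connections are usually half-open (only part of the IP packets arrive), so they cannot be accepted.

It is exactly these half-open sockets that used up the fds,
because a SYN_RECV state can only exist once there is a socket.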

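思一克 replied on 2006-08-22 15:42:47

To nuclearweapon,

It could also be caused by an attack. Let him keep experimenting.
Can accept successfully return an fd for a half-open connection? I am not entirely sure.

nuclearweapon replied on 2006-08-22 15:45:33

For a half-open connection, accept cannot return, but the fd has already been created inside the kernel, so it counts against the number of fds available to the process.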

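wyezl replied on 2006-08-22 15:50:57

Quote: originally posted by 思一克 on 2006-8-22 15:42
To nuclearweapon,

It could also be caused by an attack. Let him keep experimenting.
Can accept successfully return an fd for a half-open connection? I am not entirely sure.

Traffic has dropped now.
SYN_RECV has also dropped to only 20~30.

A malicious attack seems unlikely.
Could it be caused by the order of operations in my code?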

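思一克 replied on 2006-08-22 15:52:56

To nuclearweapon,

Are you certain that connections that are not established and not accepted also consume fds in the kernel? I am not sure.

wyezl replied on 2006-08-22 15:54:38

Quote: originally posted by nuclearweapon on 2006-8-22 15:45
For a half-open connection, accept cannot return, but the fd has already been created inside the kernel, so it counts against the number of fds available to the process.

I can count the total number of accept calls and the total number of close calls over a period of time,
and check whether their difference equals the number of fds under /proc/<pid>/fd.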

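思一克 replied on 2006-08-22 16:00:55

To nuclearweapon,

I just looked at the kernel code. Only three functions consume an fd: socket(), accept(), and socketpair(), and each consumes one only on success. I did not see fds being used anywhere else in the networking code.

Can a half-open connection consume an fd? I am not sure. If you know, please write it up.

Thanks.

Quote: originally posted by nuclearweapon on 2006-8-22 15:45
For a half-open connection, accept cannot return, but the fd has already been created inside the kernel, so it counts against the number of fds available to the process.

wyezl replied on 2006-08-22 16:05:40

Here is how the fds change over time while traffic is light. This was tested after a restart, of course. It looks basically fine this way.
Once the load goes up, things slowly change.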

[root@xxx ~]# ls /proc/28305/fd/
0   10  12  14  16  18  2   21  23  25  28  4   6   8
1   11  13  15  17  19  20  22  24  27  3   5   7   9
[root@xxx ~]# ls /proc/28305/fd/
0   10  12  14  16  18  2   21  23  25  28  4   6   8
1   11  13  15  17  19  20  22  24  27  3   5   7   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  16  17  18  19  2   20  21  22  24  3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   11  12  13  14  15  16  17  18  2   20  21  24  3   4   5   6   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  16  17  18  2   20  21  24  3   4   5   6   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  16  17  18  2   20  21  24  3   4   5   6
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  16  17  18  2   24  3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  18  2   3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  16  18  2   3   4   5   6   7   8
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  13  14  15  16  17  18  2   3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  16  17  18  19  2   3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  16  17  2   3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   11  12  13  14  15  17  19  2   3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  17  19  2   3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  13  14  15  17  19  2   3   4   5   6   7   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  13  14  15  16  17  19  2   3   4   5   6   7   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   11  12  13  14  15  16  17  19  2   3   4   5   6   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  13  14  15  16  17  18  19  2   20  21  3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  13  14  15  16  17  18  19  2   20  21  3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  16  17  18  19  2   20  21  3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  12  13  14  15  17  18  19  2   20  21  3   4   5   6   7
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  12  13  14  15  17  18  19  2   20  21  3   4   5   6   7   8
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  12  13  14  15  18  2   20  21  3   4   5   6   7   8
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  2   21  3   4   5   6   7   8
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  12  13  14  15  2   3   4   5   6   7   8
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  12  13  14  15  2   3   4   5   6   7   8
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  13  14  15  2   3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  13  14  15  2   3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  14  15  2   3   4   5   6   7   8
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  13  14  15  2   3   4   5   6   7   8
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  14  15  2   3   4   5   6   8
[root@xxx ~]# ls /proc/28305/fd/
0   1   10  11  12  14  15  2   3   4   5   6   7   8   9
[root@xxx ~]# ls /proc/28305/fd/
0   1   11  14  15  2   3   4   5   6   7   8

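[ Last edited by wyezl on 2006-8-22 16:09 ]

思一克 replied on 2006-08-22 16:09:52

To the OP,

My server also sees many SYN_RECV attacks, but they have essentially no effect, and it has certainly never run out of fds.

wyezl replied on 2006-08-22 16:22:51

Are you testing in production? It becomes visible once you reach about 1 million requests per hour.

A load-testing tool does not seem to work; it does not expose the flaw.
My traffic has dropped now too, so it is hard to test.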

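思一克 replied on 2006-08-22 16:26:09

To the OP,

I tried to run it on port 80 on one of my servers (one with a public IP), but unfortunately it would not compile because of missing epoll support.

wyezl replied on 2006-08-22 16:32:42

Quote: originally posted by 思一克 on 2006-8-22 16:26
To the OP,

I tried to run it on port 80 on one of my servers (one with a public IP), but unfortunately it would not compile because of missing epoll support.

Thank you to our helpful moderator; I have taken up a lot of your time.
This is hard to test without the right environment.
I will think of something myself.
I wonder whether there is any tool that can trace how a single descriptor is used.

I just simulated 1 million requests and lost only one descriptor. The problem is hard to find by simulation.
It must be related to network conditions.

[ Last edited by wyezl on 2006-8-22 16:40 ]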

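思一克 replied on 2006-08-22 16:45:47

No need to thank me. I simply think getting this program right is worthwhile, and I have learned from it too.

nuclearweapon replied on 2006-08-22 16:50:09

Quote: originally posted by 思一克 on 2006-8-22 16:00
To nuclearweapon,

I just looked at the kernel code. Only three functions consume an fd: socket(), accept(), and socketpair(), and each consumes one only on success. I did not see fds being used anywhere else in the networking code.

Can a half-open connection consume an fd? I am not sure. If ...

As I understand it, since netstat can see the socket's state, there ought to be a struct socket.

I may be wrong.
I am reading the code too, looking for the cause.

Thanks for the guidance!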

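tysn replied on 2006-08-22 16:55:50

Could you avoid splitting it into the two cases
if(events[i].events & EPOLLIN)
and
else if(events[i].events & EPOLLOUT)
?

Just read and write whenever any event arrives.
Of course the functions that read and write cfd must check for errors, and close(cfd) as soon as one occurs.
Wouldn't that avoid the case where EPOLLOUT never arrives after EPOLLIN?
Quote: originally posted by 思一克 on 2006-8-22 14:07
I looked at your program. There does seem to be a hole, one that only becomes suspect with a large number of connections and slow disconnects.

After an fd gets EPOLLIN, if the connection drops, EPOLLOUT never arrives. Doesn't your fd then stay open forever?

Please discuss.

This assumes that a socket that raised an EPOLLIN (readable) event is also writable.

[ Last edited by tysn on 2006-8-22 16:59 ]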
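
The suggestion in this post, combined with the quoted observation that EPOLLOUT may never arrive, comes down to releasing the descriptor on every failure path instead of only in the EPOLLOUT branch. A minimal sketch, assuming Linux and a connected stream socket; handle_client_event is a hypothetical name, request parsing and the reply are omitted, and recv() with MSG_DONTWAIT stands in for a non-blocking read:

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: handle one event for a client socket.
 * Returns 1 if the fd was closed, 0 if it stays in the epoll set. */
int handle_client_event(int epfd, const struct epoll_event *e)
{
    struct epoll_event dummy = {0};   /* old kernels want non-NULL for DEL */
    int cfd = e->data.fd;
    char buf[4096];
    ssize_t n;

    if (!(e->events & (EPOLLERR | EPOLLHUP))) {
        for (;;) {
            n = recv(cfd, buf, sizeof buf, MSG_DONTWAIT);
            if (n > 0)
                continue;             /* request bytes; parsing omitted */
            if (n < 0 && errno == EINTR)
                continue;
            if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                return 0;             /* drained; peer still connected */
            break;                    /* n == 0 (EOF) or a real error */
        }
    }
    /* Error, hang-up, or EOF: release the descriptor right here. */
    epoll_ctl(epfd, EPOLL_CTL_DEL, cfd, &dummy);
    close(cfd);
    return 1;
}
```

With this shape a peer that disappears after its request was read is still closed on the next EPOLLIN (end-of-file), EPOLLERR, or EPOLLHUP. A peer that vanishes without sending anything at all would still need a timeout.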

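思一克 replied on 2006-08-22 16:56:45

To nuclearweapon,

I am not that certain myself, so "guidance" is too strong a word. You are too polite.

The fd seems to be needed only at the very end (it is just an index), and it can only be created when a local application (the server, in this case) calls a socket function directly. For a client to cause an fd to be created on this machine, accept has to succeed.

wyezl replied on 2006-08-22 17:06:47

From observation I know that
lrwx------  1 root root 64 Aug 22 16:25 480 -> socket:[34370805]
descriptor 480 is dead and cannot be reclaimed. How do I check its state?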

# ls -l  /proc/28477/fd/
total 20
lrwx------  1 root root 64 Aug 22 16:14 0 -> /dev/pts/3
lrwx------  1 root root 64 Aug 22 16:14 1 -> /dev/pts/3
lrwx------  1 root root 64 Aug 22 16:14 10 -> socket:[34584054]
lrwx------  1 root root 64 Aug 22 16:14 11 -> socket:[34584140]
lrwx------  1 root root 64 Aug 22 16:14 12 -> socket:[32509524]
lrwx------  1 root root 64 Aug 22 16:14 13 -> /usr/home/fi/src/home/ww
lrwx------  1 root root 64 Aug 22 16:14 14 -> socket:[34584144]
lrwx------  1 root root 64 Aug 22 16:14 15 -> socket:[34584145]
lrwx------  1 root root 64 Aug 22 16:14 16 -> socket:[34584147]
lrwx------  1 root root 64 Aug 22 16:14 17 -> socket:[34584152]
lrwx------  1 root root 64 Aug 22 16:14 2 -> /dev/pts/3
lrwx------  1 root root 64 Aug 22 16:16 23 -> socket:[34543624]
lr-x------  1 root root 64 Aug 22 16:14 3 -> eventpoll:[32224366]
lrwx------  1 root root 64 Aug 22 16:14 4 -> socket:[32224367]
lrwx------  1 root root 64 Aug 22 16:25 480 -> socket:[34370805]
lr-x------  1 root root 64 Aug 22 16:14 5 -> /usr/home/fi/src/home/ww
lrwx------  1 root root 64 Aug 22 16:36 6 -> socket:[34581817]
lrwx------  1 root root 64 Aug 22 16:14 7 -> socket:[33195785]
lrwx------  1 root root 64 Aug 22 16:36 8 -> socket:[34584130]
lrwx------  1 root root 64 Aug 22 16:14 9 -> socket:[34583662]

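wyezl replied on 2006-08-22 17:10:11

Quote: originally posted by tysn on 2006-8-22 16:55
Could you avoid splitting it into the two cases
if(events[i].events & EPOLLIN)
and
else if(events[i].events & EPOLLOUT)
?

Just read and write whenever any event arrives.
Of course the functions that read and write cfd must check for errors, and close(cfd) as soon as one occurs ...

That is somewhat inefficient,
because you may end up waiting for the socket to become writable.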

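playmud replied on 2006-08-22 17:35:00

Quote: originally posted by wyezl on 2006-8-22 14:32

This 32 does not conflict with 5000 simultaneous connections.

It is not as if all 5000 connections are accepted within one second.
5000 is the maximum number of descriptors my epoll monitors.

How is that not a conflict?
You have if ((fd = socket(AF_INET, SOCK_STREAM, 0)) <= 0)
Try setting the listen backlog to 5000 and see.

wyezl replied on 2006-08-22 17:43:33

Quote: originally posted by playmud on 2006-8-22 17:35

How is that not a conflict?
You have if ((fd = socket(AF_INET, SOCK_STREAM, 0)) <= 0)
Try setting the listen backlog to 5000 and see.

I can receive every descriptor up to 5000.

Setting that makes little difference. I doubt listen can support a value that large anyway.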

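playmud replied on 2006-08-22 17:49:59

A normal TCP teardown needs a three- or four-step handshake to confirm it. Without that confirmation the connection is kept around for a certain time.
/proc/sys/net/ipv4/tcp_keepalive_time

playmud replied on 2006-08-22 17:50:54

Quote: originally posted by wyezl on 2006-8-22 17:43

I can receive every descriptor up to 5000.

Setting that makes little difference. I doubt listen can support a value that large anyway.

Being able to receive them and being able to process them are not the same thing.

playmud replied on 2006-08-22 17:52:18

If you really cannot find the cause, you can set that default timeout to ten-odd seconds or a few tens of seconds.
The system will then release the resources for you.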
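
The system-wide value in /proc/sys/net/ipv4/tcp_keepalive_time only applies to sockets that have SO_KEEPALIVE enabled, and the timing can also be set per socket. A minimal sketch, assuming Linux; enable_keepalive and its parameter names are hypothetical and not part of the original program:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive probes on one TCP socket.
 * idle_s:  seconds of silence before the first probe
 * intvl_s: seconds between probes
 * cnt:     unanswered probes before the kernel drops the connection
 * Returns 0 on success, -1 on error. */
int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0)
        return -1;
    return 0;
}
```

Calling enable_keepalive(cfd, 30, 10, 3) after accept would make the kernel give up on a silent peer after roughly a minute. The kernel only tears down the connection; the application still has to close() the descriptor when epoll reports the resulting error.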

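wyezl replied on 2006-08-22 17:53:34

Quote: originally posted by playmud on 2006-8-22 17:49
A normal TCP teardown needs a three- or four-step handshake to confirm it. Without that confirmation the connection is kept around for a certain time.
/proc/sys/net/ipv4/tcp_keepalive_time

Isn't it enough for the server to close as soon as it has finished handling the request?

Some of the dead descriptors would not be released even if given another hour, so they were definitely never closed.

wyezl replied on 2006-08-22 18:02:01

I read the lighttpd code and got some ideas. I will continue making changes tomorrow.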

#include <sys/types.h>

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <fcntl.h>

#include "fdevent.h"
#include "settings.h"
#include "buffer.h"

#ifdef USE_LINUX_EPOLL
static void fdevent_linux_sysepoll_free(fdevents *ev) {
    close(ev->epoll_fd);
    free(ev->epoll_events);
}

static int fdevent_linux_sysepoll_event_del(fdevents *ev, int fde_ndx, int fd) {
    struct epoll_event ep;

    if (fde_ndx < 0) return -1;

    memset(&ep, 0, sizeof(ep));

    ep.data.fd = fd;
    ep.data.ptr = NULL;

    if (0 != epoll_ctl(ev->epoll_fd, EPOLL_CTL_DEL, fd, &ep)) {
        fprintf(stderr, "%s.%d: epoll_ctl failed: %s, dying\n", __FILE__, __LINE__, strerror(errno));

        SEGFAULT();

        return 0;
    }

    return -1;
}

static int fdevent_linux_sysepoll_event_add(fdevents *ev, int fde_ndx, int fd, int events) {
    struct epoll_event ep;
    int add = 0;

    if (fde_ndx == -1) add = 1;

    memset(&ep, 0, sizeof(ep));

    ep.events = 0;

    if (events & FDEVENT_IN) ep.events |= EPOLLIN;
    if (events & FDEVENT_OUT) ep.events |= EPOLLOUT;

    /**
     *
     * with EPOLLET we don't get a FDEVENT_HUP
     * if the close is delay after everything has
     * sent.
     *
     */

    ep.events |= EPOLLERR | EPOLLHUP /* | EPOLLET */;

    ep.data.ptr = NULL;
    ep.data.fd = fd;

    if (0 != epoll_ctl(ev->epoll_fd, add ? EPOLL_CTL_ADD : EPOLL_CTL_MOD, fd, &ep)) {
        fprintf(stderr, "%s.%d: epoll_ctl failed: %s, dying\n", __FILE__, __LINE__, strerror(errno));

        SEGFAULT();

        return 0;
    }

    return fd;
}

static int fdevent_linux_sysepoll_poll(fdevents *ev, int timeout_ms) {
    return epoll_wait(ev->epoll_fd, ev->epoll_events, ev->maxfds, timeout_ms);
}

static int fdevent_linux_sysepoll_event_get_revent(fdevents *ev, size_t ndx) {
    int events = 0, e;

    e = ev->epoll_events[ndx].events;
    if (e & EPOLLIN) events |= FDEVENT_IN;
    if (e & EPOLLOUT) events |= FDEVENT_OUT;
    if (e & EPOLLERR) events |= FDEVENT_ERR;
    if (e & EPOLLHUP) events |= FDEVENT_HUP;
    if (e & EPOLLPRI) events |= FDEVENT_PRI;

    return e;
}

static int fdevent_linux_sysepoll_event_get_fd(fdevents *ev, size_t ndx) {
# if 0
    fprintf(stderr, "%s.%d: %d, %d\n", __FILE__, __LINE__, ndx, ev->epoll_events[ndx].data.fd);
# endif

    return ev->epoll_events[ndx].data.fd;
}

static int fdevent_linux_sysepoll_event_next_fdndx(fdevents *ev, int ndx) {
    size_t i;

    UNUSED(ev);

    i = (ndx < 0) ? 0 : ndx + 1;

    return i;
}

int fdevent_linux_sysepoll_init(fdevents *ev) {
    ev->type = FDEVENT_HANDLER_LINUX_SYSEPOLL;
#define SET(x) \
    ev->x = fdevent_linux_sysepoll_##x;

    SET(free);
    SET(poll);

    SET(event_del);
    SET(event_add);

    SET(event_next_fdndx);
    SET(event_get_fd);
    SET(event_get_revent);

    if (-1 == (ev->epoll_fd = epoll_create(ev->maxfds))) {
        fprintf(stderr, "%s.%d: epoll_create failed (%s), try to set server.event-handler = \"poll\" or \"select\"\n",
            __FILE__, __LINE__, strerror(errno));

        return -1;
    }

    if (-1 == fcntl(ev->epoll_fd, F_SETFD, FD_CLOEXEC)) {
        fprintf(stderr, "%s.%d: epoll_create failed (%s), try to set server.event-handler = \"poll\" or \"select\"\n",
            __FILE__, __LINE__, strerror(errno));

        close(ev->epoll_fd);

        return -1;
    }

    ev->epoll_events = malloc(ev->maxfds * sizeof(*ev->epoll_events));

    return 0;
}

#else
int fdevent_linux_sysepoll_init(fdevents *ev) {
    UNUSED(ev);

    fprintf(stderr, "%s.%d: linux-sysepoll not supported, try to set server.event-handler = \"poll\" or \"select\"\n",
        __FILE__, __LINE__);

    return -1;
}
#endif

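wyezl replied on 2006-08-22 18:25:47

Quote: originally posted by playmud on 2006-8-22 17:52
If you really cannot find the cause, you can set that default timeout to ten-odd seconds or a few tens of seconds.
The system will then release the resources for you.

Which default timeout?

nuclearweapon replied on 2006-08-22 22:36:10

Quote: originally posted by wyezl on 2006-8-22 17:06
From observation I know that
lrwx------  1 root root 64 Aug 22 16:25 480 -> socket:[34370805]
descriptor 480 is dead and cannot be reclaimed. How do I check its state?

# ls -l  /proc/28477/fd/
total 20
lrwx------  1 root  ...

34370805 is the inode number.
cat /proc/net/tcp|grep 34370805

Fields 2 and 3 are the socket's local and remote address:port.

You can also use:
netstat -a | grep _pid_
If, as you say, there is only one, it is easy to work out which one is dead. See what state it is in.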

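safedead replied on 2006-08-22 23:15:43

I wrote a very simple epoll server.
It listens on 5000 ports, from 8000 to 12999,
pre-creates 4096 threads,
and has the main thread schedule them with thread locks.

For the code see: http://www.cublog.cn/u/17999/showart.php?id=159057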

wyezl replied on 2006-08-23 09:48:09

Quote: originally posted by nuclearweapon on 2006-8-22 22:36

34370805 is the inode number.
cat /proc/net/tcp|grep 34370805

Fields 2 and 3 are the socket's local and remote address:port.

You can also use:
netstat -a | grep _pid_
If, as you say, there is only one, it is easy to work out which one is dead. See what state ...

The dead fd:
lrwx------  1 root root 64 Aug 23 09:37 7 -> socket:[35074112]

#cat /proc/net/tcp|grep 35074112
17392: 4D256CCA:0050 4ACF18DA:0843 01 00000000:00000000 00:00000000 00000000   527        0 35074112 1 f4a71020 3000 0 0 2 -1

netstat -a | grep _pid

Whose pid is this?
How do I read the state?

nuclearweapon replied on 2006-08-23 12:00:27

Quote: originally posted by wyezl on 2006-8-23 09:48

The dead fd:
lrwx------ 1 root root 64 Aug 23 09:37 7 -> socket:[35074112]

#cat /proc/net/tcp|grep 35074112
17392: 4D256CCA:0050 4ACF18DA:0843 01 00000000:00000000 00:00000000 00000000 ...

I am very sorry, I wrote that wrong.
It should be
netstat -a| grep port
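
The state can also be read straight from the /proc/net/tcp entry: the address fields are hexadecimal ADDR:PORT pairs and the column after them is the TCP state code. A minimal sketch for IPv4, assuming the Linux /proc/net/tcp layout; decode_proc_addr and decode_proc_state are hypothetical helper names:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/* TCP state codes as printed in the "st" column of /proc/net/tcp. */
static const char *const tcp_state_names[] = {
    "?", "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2",
    "TIME_WAIT", "CLOSE", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING"
};

/* Hypothetical helper: decode one hex "ADDR:PORT" field into dotted-quad
 * text and a port number. Returns 0 on success, -1 on error. */
int decode_proc_addr(const char *field, char *ip, size_t iplen, unsigned *port)
{
    unsigned raw;
    struct in_addr a;

    if (sscanf(field, "%8X:%4X", &raw, port) != 2)
        return -1;
    /* The kernel prints the in-memory 32-bit address as a host integer,
     * so storing it back unchanged restores network byte order. */
    a.s_addr = raw;
    return inet_ntop(AF_INET, &a, ip, (socklen_t)iplen) ? 0 : -1;
}

/* Hypothetical helper: map the hex "st" column to a state name. */
const char *decode_proc_state(const char *field)
{
    unsigned st;

    if (sscanf(field, "%X", &st) != 1 || st == 0 ||
        st >= sizeof tcp_state_names / sizeof tcp_state_names[0])
        return "?";
    return tcp_state_names[st];
}
```

For the entry quoted above, the local port 0050 is 80 and the state column 01 maps to ESTABLISHED.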

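wyezl replied on 2006-08-23 14:38:11

time_t cfds[MAXFDS];
time_t  now;
I added a monitoring thread. On every accept, cfds[cfd]=time(NULL);
on close(cfd), cfds[cfd]=0; as a temporary fix.

/* Closes every descriptor that has been open for more than TIMEOUT seconds.
   TIMEOUT and SLEEPTIME are presumably defined elsewhere (not shown). */
void *loop_check(void *p)
{
        int i;
        struct epoll_event ev;
        while(1)
        {
                time(&now);
                for(i=0;i<MAXFDS;i++)
                {
                        /* cfds[i]==0 means descriptor i is not in use */
                        if(cfds[i]!=0)
                                if(now-cfds[i]>TIMEOUT)
                                {
                                        printf("cfd=%d timeout.\n",i);

                                        ev.data.fd = i;
                                        if(epoll_ctl(epfd, EPOLL_CTL_DEL, i, &ev)!=0)
                                                printf("can't del epoll.\n");

                                        if(close(i)==0)
                                                cfds[i]=0;
                                        else
                                                printf("can't close cfds.\n");

                                }
                }
                sleep(SLEEPTIME);
        }
        return NULL;
}

It checks once every 60 s. Quite a few descriptors turn up this way.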
./httpd
cfd=8 timeout.
cfd=11 timeout.
cfd=12 timeout.
cfd=17 timeout.
cfd=21 timeout.
cfd=30 timeout.
cfd=32 timeout.
cfd=51 timeout.
cfd=52 timeout.
cfd=59 timeout.
cfd=60 timeout.
cfd=71 timeout.
cfd=76 timeout.
cfd=7 timeout.
cfd=10 timeout.
cfd=14 timeout.
cfd=15 timeout.
cfd=16 timeout.
cfd=18 timeout.
cfd=20 timeout.
cfd=25 timeout.
cfd=33 timeout.
cfd=36 timeout.
cfd=53 timeout.
cfd=58 timeout.

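思一克 replied on 2006-08-23 14:42:07

The cause appears to be just what I originally thought.

wyezl replied on 2006-08-23 15:24:58

This approach is somewhat less efficient, but I have no better option for now.
Using a thread pool would affect efficiency too.

My guess is that some fds were added to epoll's watch set but no event was ever triggered for them,
so they all died in there.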

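思一克 replied on 2006-08-23 15:27:27

It hardly affects efficiency.

An event-driven program needs a timeout. Otherwise, if an event is ever missed, things go wrong.

wyezl replied on 2006-08-23 15:28:44

I have a question. When a client connects and sends its request, I read only once, so I may not have read everything, because I only need part of the data. The rest of the data is left sitting in the socket and I close the descriptor. Could that cause problems?

思一克 replied on 2006-08-23 15:30:23

It should not be a problem.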
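
One caveat worth checking here: on Linux, calling close() on a TCP socket that still has unread data in its receive queue generally makes the kernel send a RST instead of a FIN, and a client may then discard a response it has not finished reading. A minimal sketch of one way to avoid that, assuming the reply has already been written; drain_and_close is a hypothetical helper and only discards bytes that have already arrived:

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: signal end of our output, throw away whatever
 * request bytes are already queued, then close.
 * Returns the number of bytes discarded. */
long drain_and_close(int fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    shutdown(fd, SHUT_WR);          /* FIN goes out after our reply */
    while ((n = recv(fd, buf, sizeof buf, MSG_DONTWAIT)) > 0)
        total += n;                 /* discard the leftover request data */
    close(fd);
    return total;
}
```

This does not wait for data that is still in flight, so it narrows the window rather than closing it completely.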

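wyezl replied on 2006-08-23 15:31:32

Quote: originally posted by 思一克 on 2006-8-23 15:27
It hardly affects efficiency.

An event-driven program needs a timeout. Otherwise, if an event is ever missed, things go wrong.

It used to handle 11,000 requests per second; now it is down to 10,000, and even less when a check happens to be running.

I hope someone can improve on this later.
I don't know whether my operating system is the cause.
CentOS

[ Last edited by wyezl on 2006-8-23 15:37 ]

思一克 replied on 2006-08-23 15:39:58

That is 10%.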

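思一克 replied on 2006-08-23 16:52:33

Your monitoring thread has a problem: it is expensive.

You only need to scan from 1 up to the largest fd currently in use, but you scan all the way to MAXFDS.
Also, you do not necessarily need a separate thread. Checking a portion of the table on each pass through the accept loop might save something.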
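
The two suggestions in this post (stop at the highest descriptor in use, and check a slice of the table from the main loop instead of a separate thread) could be sketched as follows. This is only an illustration under stated assumptions: touch_fd, sweep_idle_fds, last_active, and the budget parameter are hypothetical names, and everything is meant to run on the single thread that owns the epoll loop, so no locking is shown.

```c
#include <time.h>
#include <unistd.h>

#define MAXFDS 5000

static time_t last_active[MAXFDS];  /* 0 means the slot is unused */
static int max_fd_seen = -1;        /* highest fd recorded so far */
static int sweep_pos = 0;           /* where the next sweep resumes */

/* Record activity for fd; call after accept() and after each event.
 * When the main loop closes an fd itself it should reset the slot to 0. */
void touch_fd(int fd, time_t now)
{
    if (fd < 0 || fd >= MAXFDS)
        return;
    last_active[fd] = now;
    if (fd > max_fd_seen)
        max_fd_seen = fd;
}

/* Inspect at most `budget` slots per call and close descriptors idle for
 * more than `timeout` seconds. Returns how many were closed. */
int sweep_idle_fds(time_t now, time_t timeout, int budget)
{
    int closed = 0;

    while (budget-- > 0 && max_fd_seen >= 0) {
        if (sweep_pos > max_fd_seen)
            sweep_pos = 0;          /* wrap around at the highest fd */
        if (last_active[sweep_pos] != 0 &&
            now - last_active[sweep_pos] > timeout) {
            close(sweep_pos);
            last_active[sweep_pos] = 0;
            closed++;
        }
        sweep_pos++;
    }
    return closed;
}
```

Calling sweep_idle_fds(time(NULL), 60, 64) once per epoll_wait() return would spread the cost of the scan over many iterations instead of walking all 5000 slots at once.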

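safedead replied on 2006-08-23 19:10:49

Do you have any sockets in the CLOSE_WAIT state?
With the default settings the system takes 12 hours to release descriptors in the CLOSE_WAIT state.
You need to set SO_LINGER to shorten the CLOSE_WAIT timeout; I usually set it to 60 seconds,
but do not set it to 0, which is very unfriendly to clients (0 means the connection is not terminated with FIN but forcibly released with RST).
You should also use ulimit -n to raise the limit on descriptors a process can open. The default is 1024; I usually set it to 16384 or even 65536.
My little program uses 5000 descriptors for listening alone, and more than 9000 once connections reach the thread-pool limit.

I think epoll is certainly fine when it is used only for the listening sockets.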

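playmud replied on 2006-08-24 08:42:12

I suggest the OP still track down the actual cause; that would help everyone avoid similar mistakes.
If there are too many connections within a short time, you also need to increase the size of the ARP table. The default is 1024.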

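wyezl replied on 2006-08-24 11:06:48

Quote: originally posted by safedead on 2006-8-23 19:10
Do you have any sockets in the CLOSE_WAIT state?
With the default settings the system takes 12 hours to release descriptors in the CLOSE_WAIT state.
You need to set SO_LINGER to shorten the CLOSE_WAIT timeout; I usually set it to 60 seconds,
but do not set it to 0, which is very unfriendly to clients (0 means the connection is not terminated with FIN ...

I read an article of yours a long time ago, although it was a copy reposted elsewhere. You must have a lot of experience writing high-performance servers.

netstat -an | grep CLOSE_WAIT returns nothing.
I start the program as root and then setuid to another user, so I can set MAXFDS directly in the program without using ulimit. My workload is a busy stream of requests, each with a very short processing time and a small amount of data. For me, running multiple threads would probably reduce efficiency considerably.

epoll's greatest strength is monitoring a large number of online users, that is, long-lived connections of which only a few are active.

If it is used only for the listening sockets and does not watch events on other fds, it seems it cannot show its advantage, can it?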
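
Setting the descriptor limit from inside the program, as described above, is normally done with getrlimit and setrlimit. A minimal sketch, assuming a POSIX system; raise_fd_limit is a hypothetical helper name, and the original program's own code for this step is not shown in the thread:

```c
#include <sys/resource.h>

/* Hypothetical helper: set the soft RLIMIT_NOFILE limit to `want`,
 * capped at the hard limit. Returns the resulting soft limit, or -1. */
long raise_fd_limit(long want)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && (rlim_t)want > rl.rlim_max)
        want = (long)rl.rlim_max;   /* only a privileged process can go higher */
    rl.rlim_cur = (rlim_t)want;
    if (setrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    return want;
}
```

A process started as root could raise rl.rlim_max as well before dropping privileges with setuid; this sketch leaves the hard limit untouched.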

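nuclearweapon replied on 2006-08-24 20:30:11

to wyezl
May I ask whether any of your tests ran for more than 2 hours?
In other words, have you compared the number of fds at time t1 with the number of fds at t1 + 2 hours?

wyezl replied on 2006-08-25 09:39:29

Quote: originally posted by nuclearweapon on 2006-8-24 20:30
to wyezl
May I ask whether any of your tests ran for more than 2 hours?
In other words, have you compared the number of fds at time t1 with the number of fds at t1 + 2 hours?

Some ran for more than a day.
When traffic is light, essentially no descriptors are lost.
When traffic is heavy, the count trends upward and the descriptors eventually run out.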

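思一克 replied on 2006-08-25 09:56:24

OP, could you change the title?

For example, call it
"Using epoll, and a discussion of XXX leaks in epoll programs"

or similar. That would make the problem easier to find.

wyezl replied on 2006-08-25 10:19:24

Title changed.
epoll has been out for a long time. Has nobody really run into this problem before?

wyezl replied on 2006-08-25 14:08:51

man epoll

I find the part about EAGAIN hard to understand. Can anyone explain it?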

NAME
       epoll - I/O event notification facility

SYNOPSIS
       #include <sys/epoll.h>

DESCRIPTION
       epoll is a variant of poll(2) that can be used either as Edge or Level Triggered interface and scales
       well to large numbers of watched fds. Three system calls are provided to set up and control an  epoll
       set: epoll_create(2), epoll_ctl(2), epoll_wait(2).

       An epoll set is connected to a file descriptor created by epoll_create(2).  Interest for certain file
       descriptors  is  then  registered  via  epoll_ctl(2).   Finally,  the  actual  wait  is  started   by
       epoll_wait(2).

NOTES
       The  epoll  event  distribution  interface  is able to behave both as Edge Triggered ( ET ) and Level
       Triggered ( LT ). The difference between ET and LT event distribution mechanism can be  described  as
       follows. Suppose that this scenario happens :

       1      The  file descriptor that represents the read side of a pipe ( RFD ) is added inside the epoll
              device.

       2      Pipe writer writes 2Kb of data on the write side of the pipe.

       3      A call to epoll_wait(2) is done that will return RFD as ready file descriptor.

       4      The pipe reader reads 1Kb of data from RFD.

       5      A call to epoll_wait(2) is done.

       If the RFD file descriptor has been added to the epoll interface using the EPOLLET flag, the call  to
       epoll_wait(2)  done  in  step 5 will probably hang because of the available data still present in the
       file input buffers and the remote peer might be expecting a response based on  the  data  it  already
       sent.  The reason for this is that Edge Triggered event distribution delivers events only when events
       happens on the monitored file.  So, in step 5 the caller might end up waiting for some data  that  is
       already  present  inside  the  input  buffer. In the above example, an event on RFD will be generated
       because of the write done in 2 , and the event is consumed in 3.  Since the read operation done in  4
       does  not  consume the whole buffer data, the call to epoll_wait(2) done in step 5 might lock indefi-
       nitely. The epoll interface, when used with the EPOLLET flag ( Edge Triggered ) should use non-block-
       ing file descriptors to avoid having a blocking read or write starve the task that is handling multi-
       ple file descriptors.  The suggested way to use epoll as an Edge Triggered ( EPOLLET )  interface  is
       below, and possible pitfalls to avoid follow.

              i      with non-blocking file descriptors

              ii     by going to wait for an event only after read(2) or write(2) return EAGAIN

       On  the  contrary,  when used as a Level Triggered interface, epoll is by all means a faster poll(2),
       and can be used wherever the latter is used since it shares the same semantics. Since even  with  the
       Edge  Triggered epoll multiple events can be generated up on receival of multiple chunks of data, the
       caller has the option to specify the EPOLLONESHOT flag, to tell epoll to disable the associated  file
       descriptor  after  the receival of an event with epoll_wait(2).  When the EPOLLONESHOT flag is speci-
       fied, it is caller responsibility to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.

EXAMPLE FOR SUGGESTED USAGE
       While  the usage of epoll when employed like a Level Triggered interface does have the same semantics
       of poll(2), an Edge Triggered usage requires more clarification to avoid  stalls  in  the  application
       event  loop.  In  this example, listener is a non-blocking socket on which listen(2) has been called.
       The function do_use_fd() uses the new ready file  descriptor  until  EAGAIN  is  returned  by  either
       read(2) or write(2).  An event driven state machine application should, after having received EAGAIN,
       record its current state so that at the next call to do_use_fd()  it  will  continue  to  read(2)  or
       write(2) from where it stopped before.

       struct epoll_event ev, *events;

       for(;;) {
           nfds = epoll_wait(kdpfd, events, maxevents, -1);

           for(n = 0; n < nfds; ++n) {
               if(events[n].data.fd == listener) {
                   client = accept(listener, (struct sockaddr *) &local,
                                   &addrlen);
                   if(client < 0){
                       perror("accept");
                       continue;
                   }
                   setnonblocking(client);
                   ev.events = EPOLLIN | EPOLLET;
                   ev.data.fd = client;
                   if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
                       fprintf(stderr, "epoll set insertion error: fd=%d\n",
                               client);
                       return -1;
                   }
               }
               else
                   do_use_fd(events[n].data.fd);
           }
       }
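
The man page leaves do_use_fd() undefined. One possible body for the read side, as a sketch only: it keeps reading a non-blocking descriptor until read(2) reports EAGAIN, which is the point at which it is safe to go back to epoll_wait(2) in Edge Triggered mode. The return values and the make_nonblocking helper are assumptions made for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode, as EPOLLET requires. 0 on success. */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read until EAGAIN. Returns the number of bytes consumed when the fd is
 * drained, or -1 after closing the fd on end-of-file or a hard error. */
long do_use_fd(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);

        if (n > 0) {
            total += n;             /* real code would process buf here */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;           /* drained: wait for the next edge */
        close(fd);                  /* n == 0 (EOF) or a hard error */
        return -1;
    }
}
```

Stopping after a single read instead would leave data in the buffer with no further edge to announce it, which is the stall described in step 5 of the scenario above.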

       When  used  as  an  Edge triggered interface, for performance reasons, it is possible to add the file
       descriptor inside the epoll interface ( EPOLL_CTL_ADD ) once by specifying ( EPOLLIN|EPOLLOUT ). This
       allows  you  to  avoid  continuously switching between EPOLLIN and EPOLLOUT calling epoll_ctl(2) with
       EPOLL_CTL_MOD.
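
The man page leaves do_use_fd() undefined. A minimal sketch of its read side, assuming the descriptor is non-blocking and that the data can simply be counted and discarded; drain_fd, make_nonblocking and the enum names are illustrative and not part of the man page:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Outcome of draining one descriptor (illustrative names). */
enum drain_status { DRAIN_AGAIN, DRAIN_EOF, DRAIN_ERROR };

/* Stand-in for the thread's setnonblocking(); returns 0 on success. */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Read from a non-blocking fd until read(2) reports EAGAIN, end of file,
 * or a real error. The bytes are discarded here; a real do_use_fd() would
 * hand them to the application. *total receives the number of bytes read.
 */
enum drain_status drain_fd(int fd, size_t *total)
{
    char buf[4096];

    assert(total != NULL);
    *total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            *total += (size_t)n;
            continue;                 /* keep reading: more may be buffered */
        }
        if (n == 0)
            return DRAIN_EOF;         /* peer closed its end */
        if (errno == EINTR)
            continue;                 /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return DRAIN_AGAIN;       /* drained: safe to wait in epoll_wait */
        return DRAIN_ERROR;
    }
}
```

Only after DRAIN_AGAIN is it safe, in edge-triggered mode, to go back to epoll_wait(2) for this descriptor; on DRAIN_EOF or DRAIN_ERROR the caller would normally close it.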

QUESTIONS AND ANSWERS (from linux-kernel)
              Q1     What happens if you add the same fd to an epoll_set twice?

              A1     You will probably get EEXIST. However, it is possible that two threads may add the same
                     fd twice. This is a harmless condition.

              Q2     Can  two epoll sets wait for the same fd? If so, are events reported to both epoll sets
                     fds?

              A2     Yes. However, it is not recommended. Yes it would be reported to both.

              Q3     Is the epoll fd itself poll/epoll/selectable?

              A3     Yes.

              Q4     What happens if the epoll fd is put into its own fd set?

              A4     It will fail. However, you can add an epoll fd inside another epoll fd set.

              Q5     Can I send the epoll fd over a unix-socket to another process?

              A5     No.

              Q6     Will the close of an fd cause it to be removed from all epoll sets automatically?

              A6     Yes.

              Q7     If more than one event comes in between  epoll_wait(2)  calls,  are  they  combined  or
                     reported separately?

              A7     They will be combined.

              Q8     Does an operation on an fd affect the already collected but not yet reported events?

              A8     You can do two operations on an existing fd. Remove would be meaningless for this case.
                     Modify will re-read available I/O.

              Q9     Do I need to continuously read/write an fd until EAGAIN when using the EPOLLET  flag  (
                     Edge Triggered behaviour ) ?

              A9     No  you  don't.  Receiving  an event from epoll_wait(2) should suggest to you that such
                     file descriptor is ready for the requested I/O operation. You have simply  to  consider
                     it  ready  until  you will receive the next EAGAIN. When and how you will use such file
                     descriptor is entirely up to you. Also, the condition that the read/write I/O space  is
                     exhausted  can be detected by checking the amount of data read/write from/to the target
                     file descriptor. For example, if you call read(2) by asking to read a certain amount of
                     data and read(2) returns a lower number of bytes, you can be sure to have exhausted the
                     read I/O space for such file descriptor. Same is valid when writing using the  write(2)
                     function.

POSSIBLE PITFALLS AND WAYS TO AVOID THEM
              o Starvation ( Edge Triggered )

              If  there  is a large amount of I/O space, it is possible that by trying to drain it the other
              files will not get processed causing starvation. This is not specific to epoll.

              The solution is to maintain a ready list and mark the file descriptor as ready in its associated
              data structure, thereby allowing the application to remember which files need to be processed
              but still round robin amongst all the ready files. This also supports ignoring subsequent
              events you receive for fd's that are already ready.

              o If using an event cache...

              If you use an event cache or store all the fd's returned from epoll_wait(2), then make sure to
              provide a way to mark its closure dynamically (ie- caused by a previous  event's  processing).
              Suppose  you receive 100 events from epoll_wait(2), and in event #47 a condition causes event
              #13 to be closed.  If you remove the structure and close() the fd for  event  #13,  then  your
              event cache might still say there are events waiting for that fd causing confusion.

              One  solution for this is to call, during the processing of event 47, epoll_ctl(EPOLL_CTL_DEL)
              to delete fd 13 and close(), then mark its associated data structure as removed and link it to
              a  cleanup  list.  If you find another event for fd 13 in your batch processing, you will dis-
              cover the fd had been previously removed and there will be no confusion.

CONFORMING TO
       epoll(4) is a new API introduced in Linux kernel 2.5.44.  Its interface should be finalized in  Linux
       kernel 2.5.66.

SEE ALSO
       epoll_ctl(2), epoll_create(2), epoll_wait(2)

Linux                                            2002-10-23                                         EPOLL(4)

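思一克 replied on 2006-08-25 15:36:49

OP,

Have you read the EPOLLET part of man epoll carefully? If you do not use ET mode you may not have this problem.

See the POSSIBLE PITFALLS AND WAYS TO AVOID THEM section.

About your problem:
by going to wait for an event only after read(2) or write(2) return EAGAIN
This means: put an fd back into epoll_wait only if its last read or write failed with EAGAIN.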

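wyezl replied on 2006-08-25 16:05:37

Quote: originally posted by 思一克 on 2006-8-25 15:36
OP,

Have you read the EPOLLET part of man epoll carefully? If you do not use ET mode you may not have this problem.

See the POSSIBLE PITFALLS AND WAYS TO AVOID THEM section.

About your problem:
by going to wait for an event only after read(2) or write(2) return EA ...

Suppose I am using EPOLLET
and a read returns EAGAIN partway through.
How should I handle that?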

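思一克 replied on 2006-08-25 16:12:39

To the OP,

You asked:
"Suppose I am using EPOLLET
and a read returns EAGAIN partway through.
How should I handle that?"
What the man page means is: in that case, wait on the fd with epoll_wait (if it is already registered, nothing more should be needed?).
If the return was not EAGAIN, do not go back to epoll_wait yet.

That is how I understand it. As for starvation, the man page says it is not specific to epoll.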

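wyezl replied on 2006-08-25 16:33:33

Quote: originally posted by 思一克 on 2006-8-25 16:12
To the OP,

What the man page means is:
Suppose I am using EPOLLET
and a read returns EAGAIN partway through.
How should I handle that?
In that case, wait on the fd with epoll_wait (if it is already registered, ?).
If the return was not EAGAIN, do not go back to epoll_wait yet.

That is how I ...

Here is the situation:
Earlier, at accept time, I added a cfd to epoll.
Later epoll_wait returned and reported this cfd as readable, so I read from it; after one or two reads I got EAGAIN.
Am I supposed to remove the old registration and ADD it again?
And what about the part of the data I have already read?
Keep it in a buffer and continue reading after the next wait returns?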

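Also, whether or not EPOLLET is used, consider the code below, where I read only once:

    if (events[i].events & EPOLLIN)
    {
        ret = recv(cfd, buffer, sizeof(buffer), 0);

        if (ret > 0)
        {
            // normal processing
        }
        else
        {
            // note: when ret == 0 recv did not set errno, so perror
            // reports whatever stale value an earlier call left behind
            perror("recv:");
            printf("recv<=0\n");

            epoll_ctl(epfd, EPOLL_CTL_DEL, cfd, &ev);

            if (close(cfd) == 0)
                cfds[cfd] = 0;
        }
    }

It printed errors like the ones below. Why can a read still fail after wait has reported the fd as readable?
recv<=0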
recv:: Illegal seek
recv<=0
recv:: Illegal seek
recv<=0
recv:: Connection reset by peer
recv<=0
recv:: Illegal seek
recv<=0
recv:: Illegal seek
recv<=0
recv:: Illegal seek
recv<=0
recv:: Illegal seek
recv<=0
recv:: Illegal seek
recv<=0
recv:: Illegal seek
recv<=0
recv:: Illegal seek
recv<=0
recv:: Illegal seek
recv<=0
recv:: Illegal seek
recv<=0
recv:: Illegal seek
recv<=0
recv:: Connection reset by peer
recv<=0
recv:: Connection reset by peer

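nuclearweapon replied on 2006-08-25 17:34:05

Quote: originally posted by wyezl on 2006-8-25 16:33

Here is the situation:
Earlier, at accept time, I added a cfd to epoll.
Later epoll_wait returned and reported this cfd as readable, so I read from it; after one or two reads I got EAGAIN.
Am I supposed to remove the old registration and ADD it again?
And what about the part of the data I have already ...

EAGAIN and EWOULDBLOCK are the same value on Linux.
Whether a socket counts as readable depends on the value set with SO_RCVLOWAT (although select and poll on Linux do not honor this option).

The default value is 1.

You are asking read for sizeof(buffer) bytes, but there may be no data at that moment, so it returns EWOULDBLOCK, which on Linux is simply EAGAIN.

[ Last edited by nuclearweapon on 2006-8-25 17:39 ]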

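wyezl replied on 2006-08-25 18:01:25

Quote: originally posted by nuclearweapon on 2006-8-25 17:34

EAGAIN and EWOULDBLOCK are the same value on Linux.
Whether a socket counts as readable depends on the value set with SO_RCVLOWAT (although select and poll on Linux do not honor this option).

The default value is 1.

You are asking read for sizeof(buffer) bytes, ...

What should I do in this case?
By the way, what causes Illegal seek?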

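nuclearweapon replied on 2006-08-25 22:57:51

EAGAIN
         When this error shows up, it is most likely a logic error in your program.
         When select/poll/epoll returns, these APIs only tell you that the fd can be read; they do not tell you how much can be read.
         Take val = read(fd, buf, n). If val is already smaller than n and you read this fd again, in __most__ cases you will get EAGAIN.
         You can also leave the code as it is and simply put these fds back into the fd set to keep monitoring them.


     Why a single read can fail: my guess is as follows.
         For select/poll, an error condition also counts as readable.
         I assume epoll follows the same policy, which is why you see these errors.
         This also explains why no error events show up under such heavy traffic: you are treating errors as readable.


        As for Illegal seek:
         Hard to say without seeing your program.
         But I am sure you did not seek on a socket.

[ Last edited by nuclearweapon on 2006-8-27 20:48 ]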

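精简指令 replied on 2006-08-27 00:13:44

EPOLLET is edge triggered. If epoll_wait returns a readable file descriptor, you must read all the data out of its buffer. If you go on to the next epoll_wait() without having read everything, epoll will not wake up on this descriptor again for the data that is already buffered.

Please refer to the recommendation in the man page:

The suggested way to use epoll as an Edge Triggered (EPOLLET) interface is below, and possible pitfalls to avoid follow.

i with non-blocking file descriptors
ii by going to wait for an event only after read(2) or write(2) return EAGAIN

[ Last edited by 精简指令 on 2006-8-27 00:37 ]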

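精简指令 replied on 2006-08-27 00:36:10

I just noticed a problem:

You are using EPOLLET (edge triggered) together with non-blocking I/O.

But when receiving data you call recv only once; you do not wait for recv to return EAGAIN before switching to EPOLLOUT and waiting again.

If the descriptor's buffer still holds unread data, the descriptor may "die" inside epoll and never be returned again.

I once saw someone describe this very vividly:
  Level triggered --> Something happened and you did not handle it? It keeps nagging you until you do.
  Edge triggered --> Something happened and you are told once. You did not handle it? Too bad!

[ Last edited by 精简指令 on 2006-8-27 00:37 ]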

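safedead replied on 2006-08-27 09:41:48

Quote: originally posted by 精简指令 on 2006-8-27 00:36
I just noticed a problem:

You are using EPOLLET (edge triggered) together with non-blocking I/O.

But when receiving data you call recv only once; you do not wait for recv to return EAGAIN before switching to EPOLLOUT and waiting again.

If the descriptor's buffer still holds unread data, ...

With that explanation I finally understand the difference between level and edge.
I only use level triggering with blocking sockets, so I have never run into this kind of problem.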

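精简指令 replied on 2006-08-27 16:48:28

I ran some more tests, following the steps below.

Test 1:
1. epoll waits for an EPOLLIN event on a file descriptor.
2. epoll wakes up on this descriptor, and only part of the data is received (EAGAIN is not reached).
3. Keep waiting for the EPOLLIN event.
4. epoll is not woken up again.

Expected behavior.

Test 2:
1. epoll waits for an EPOLLIN event on a file descriptor.
2. epoll wakes up on this descriptor, and only part of the data is received (EAGAIN is not reached).
3. Keep waiting for the EPOLLIN event.
4. epoll is not woken up again.
5. After new data arrives, epoll wakes up on this descriptor again.

Expected behavior.

Test 3:
1. epoll waits for an EPOLLIN event on a file descriptor.
2. epoll wakes up on this descriptor, and only part of the data is received (EAGAIN is not reached).
3. Stop waiting for EPOLLIN and wait for an EPOLLOUT event instead.
4. epoll wakes up on this descriptor.

I suspect EPOLLIN and EPOLLOUT are handled separately. However, I cannot guarantee that this is the intended behavior.

Test 4:
1. epoll waits for an EPOLLOUT event on a file descriptor.
2. epoll wakes up on this descriptor.
3. Keep waiting for the EPOLLOUT event.
4. epoll is not woken up.
5. New data arrives, and epoll_wait returns an EPOLLOUT event on this descriptor.

Does this mean that IN and OUT events do affect each other after all?

[ Last edited by 精简指令 on 2006-8-27 17:01 ]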

ken1984 replied on 2006-08-27 19:01:06

Bump. Following the results from the post above.

wyezl replied on 2006-08-28 10:44:06

Thanks to everyone above for the helpful answers.
I will try reading all the data out, until EAGAIN.

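wyezl replied on 2006-08-28 11:19:06

int my_read(int fd,void *buffer,int length)
{
        int bytes_left;
        int bytes_read;
        char *ptr;
        ptr=buffer;
        bytes_left=length;
        while(bytes_left>0)
        {
                /* ask for bytes_left; the code as posted passed the
                   uninitialized bytes_read as the length */
                bytes_read=read(fd,ptr,bytes_left);

                if(bytes_read<0)
                {
                        if(errno==EINTR)
                                bytes_read=0;   /* interrupted: retry */
                        else if(errno==EAGAIN)
                                break;          /* nothing more to read for now */
                        else
                        {
                                perror("read");
                                return(-1);
                        }
                }
                else if(bytes_read==0)
                        break;                  /* peer closed the connection */

                bytes_left-=bytes_read;
                ptr+=bytes_read;

        }

        return(length-bytes_left);
}

I replaced recv with this, and it still does not work.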

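GodArmy replied on 2006-08-30 15:54:34

Excuse me, a question: for a given fd here, can it only send or only receive? Can the same fd not be used for both sending and receiving?

wyezl replied on 2006-08-30 16:26:24

It is the same fd all along: receive first, then send.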

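GodArmy replied on 2006-08-31 09:34:24

Quote: originally posted by wyezl on 2006-8-30 16:26
It is the same fd all along: receive first, then send.

You mean the whole process uses only one fd?

GodArmy replied on 2006-08-31 10:53:42

http://www.xmailserver.org/linux-patches/dphttpd_last.tar.gz

Here is one I found online to recommend to the OP; see whether it is of use to you.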

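精简指令 replied on 2006-08-31 22:16:07

I saw that the OP recorded some logs earlier, and that timeout checks confirmed some connections really did die inside epoll.

Could you count, for those fds, whether it was the EPOLLIN event that never fired, or the later EPOLLOUT event?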

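wyezl replied on 2006-09-01 09:40:02

Quote: originally posted by GodArmy on 2006-8-31 10:53
http://www.xmailserver.org/linux-patches/dphttpd_last.tar.gz

Here is one I found online to recommend to the OP; see whether it is of use to you.

This is probably the epoll from before 2.6.
In any case it differs from the current one.

wyezl replied on 2006-09-01 09:42:27

Quote: originally posted by 精简指令 on 2006-8-31 22:16
I saw that the OP recorded some logs earlier, and that timeout checks confirmed some connections really did die inside epoll.

Could you count, for those fds, whether it was the EPOLLIN event that never fired, or the later EPOLLOUT event?

That is not easy to measure.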

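solegoose replied on 2006-09-01 10:13:57

I think the cause is that there is no timeout management for each fd.
Consider this case: a user connects, the server's accept succeeds and returns an fd, but the network fails before the user sends a request. This fd can never report POLLIN, so there will be no POLLOUT either, and the fd never gets closed.
KEEPALIVE can solve this. I mean the socket option, not KEEPALIVE in HTTP.
Just my opinion; corrections welcome.

Also, I believe half-open connections do not consume an fd. If there is only a half-open connection, I expect the kernel's sock->ops->accept() cannot return successfully. Only when that function returns successfully is sock_map_fd() called to associate the socket with an fd.

思一克 replied on 2006-09-01 10:18:01

OP, the poster above is right, and it is close to what I told you before. Can you not simply close fds that have been idle for a long time?

cds replied on 2006-09-01 15:27:29

It is because EPOLLET is used, which produces hung handles.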

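精简指令 replied on 2006-09-02 01:40:55

I agree with solegoose and 思一克.

In fact, focusing this problem on epoll is wrong; we got a bit confused.

If the client machine crashes (or the network goes down) before it sends its "request", the server side has no way of knowing.

With your current request/response design, you must do timeout checks whether or not you use epoll.

So this problem has nothing to do with epoll.

[ Last edited by 精简指令 on 2006-9-2 13:26 ]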

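wyezl replied on 2006-09-03 00:13:57

Quote: originally posted by 思一克 on 2006-9-1 10:18
OP, the poster above is right, and it is close to what I told you before. Can you not simply close fds that have been idle for a long time?

That is the approach I am using for now.

wyezl replied on 2006-09-03 00:16:09

wyezl replied on 2006-09-04 09:35:50

solegoose replied on 2006-09-04 09:46:23

For how to set KEEPALIVE, see chapter 7 of UNIX Network Programming, Volume 1.
I am not sure whether POLLERR would be reported; I am not very familiar with epoll. But when the peer host crashes suddenly or the network is cut abruptly, no packets get exchanged, so I guess epoll cannot easily detect that the connection is broken. In that situation even a call to write returns success immediately.

wyezl replied on 2006-09-04 18:33:23

Quote: originally posted by solegoose on 2006-9-4 09:46
For how to set KEEPALIVE, see chapter 7 of UNIX Network Programming, Volume 1.
I am not sure whether POLLERR would be reported; I am not very familiar with epoll. But when the peer host crashes suddenly or the network is cut abruptly, no packets get exchanged, so I guess epoll cannot easily ...

I looked into KEEPALIVE: it takes two hours, and it sends 9 probe segments.
I do not think that suits a web server.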

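wyezl replied on 2006-09-06 14:07:27

I found the following translation on computer_xu's blog, http://blog.sina.com.cn/u/544465b0010000bp
and am posting it here as well.

The Notes in man epoll say:

The epoll event distribution system can operate in two modes:
   Edge Triggered (ET)
   Level Triggered (LT)
The difference between the ET and LT event distribution mechanisms is explained next. Assume the following scenario:
1. A file descriptor that represents the read side of a pipe (RFD) has been added to the epoll descriptor.
2. The other end of the pipe writes 2KB of data.
3. epoll_wait(2) is called and returns RFD as ready for reading.
4. We then read 1KB of data.
5. epoll_wait(2) is called again......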

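Edge Triggered mode:
If RFD was added to the epoll descriptor in step 1 with the EPOLLET flag, the call to epoll_wait(2) in step 5 will probably hang, even though the remaining data is still present in the file's input buffer and the sender may be waiting for a response to the data it has already sent. ET mode reports an event only when something happens on the monitored file descriptor. So in step 5 the caller may end up waiting for data that is already sitting in the input buffer. In the example above, an event is generated on RFD because of the write in step 2, and that event is consumed in step 3. Because the read in step 4 does not empty the input buffer, the call to epoll_wait(2) in step 5 might block indefinitely. When epoll is used in ET mode, non-blocking descriptors must be used, so that a blocking read or write on one file descriptor does not starve the task that is handling multiple file descriptors. The suggested way to use epoll in ET mode is given below; possible pitfalls to avoid are described later.
   i    with non-blocking file descriptors
   ii   by going to wait only after read(2) or write(2) has returned EAGAIN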

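Level Triggered mode:
By contrast, when used as a level-triggered interface, epoll is simply a faster poll(2) and can be used wherever the latter is used, since it shares the same semantics. Because even ET-mode epoll can generate multiple events when multiple chunks of data are received, the caller may set the EPOLLONESHOT flag, which tells epoll to disable the associated file descriptor after an event has been received with epoll_wait(2). When EPOLLONESHOT is set, it is the caller's responsibility to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.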
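
The five-step scenario above can be replayed on a real pipe. The following is a minimal sketch, not code from the thread: events_after_partial_read is a hypothetical helper, and the wait in step 5 uses a zero timeout so that "would hang" shows up as "no event reported" instead of blocking the program.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Replays steps 1-5 on a pipe and returns the number of events reported
 * by the second epoll_wait(2) (step 5), or -1 if any step fails.
 * Pass EPOLLET for edge-triggered mode, 0 for level-triggered mode.
 */
int events_after_partial_read(unsigned int extra_flags)
{
    int pipefd[2];
    int epfd = -1;
    int n = -1;
    char chunk[2048];
    struct epoll_event ev, out;

    /* Only EPOLLET (or nothing) is a meaningful argument here. */
    assert((extra_flags & ~(unsigned int)EPOLLET) == 0);

    if (pipe(pipefd) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto done;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)   /* step 1 */
        goto done;

    memset(chunk, 'x', sizeof(chunk));
    if (write(pipefd[1], chunk, sizeof(chunk)) != (ssize_t)sizeof(chunk))
        goto done;                                            /* step 2: 2KB */
    if (epoll_wait(epfd, &out, 1, 0) != 1)                    /* step 3 */
        goto done;
    if (read(pipefd[0], chunk, 1024) != 1024)                 /* step 4: 1KB */
        goto done;
    n = epoll_wait(epfd, &out, 1, 0);                         /* step 5 */

done:
    if (epfd >= 0)
        close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

With EPOLLET the call in step 5 reports no event although 1KB is still buffered; with level triggering it reports the descriptor again. This is the situation in which a connection appears to "die" inside epoll when only part of the data is read.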

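The above is translated from man epoll.

Now ET and LT in more detail:

LT (level triggered) is the default mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O on the ready fd. If you do nothing, the kernel keeps notifying you, so programming errors are less likely in this mode. Traditional select/poll are representatives of this model.

ET (edge triggered) is the high-speed mode and supports only non-blocking sockets. In this mode the kernel notifies you through epoll when a descriptor changes from not ready to ready. It then assumes you know the descriptor is ready and sends no further readiness notifications for it until you do something that makes it not ready again (for example, a send, a receive, or an accept fails with EWOULDBLOCK, or a send or receive transfers less than the requested amount). Note, however, that if you never perform I/O on this fd (so that it never becomes not ready again), the kernel sends no more notifications (only once). Whether ET mode really speeds things up for TCP still needs more benchmarks to confirm.

Many tests show that without a large number of idle or dead connections, epoll is not much more efficient than select/poll; but with a large number of idle connections (for example, many slow connections in a WAN environment), epoll turns out to be far more efficient than select/poll.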

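realclimber replied on 2006-09-10 19:37:30

ret = recv(cfd, buffer, sizeof(buffer),0);

You should check the status here; if the peer's network drops at this point, your side will not close the connection.

wyezl replied on 2006-09-11 18:18:01

Quote: originally posted by realclimber on 2006-9-10 19:37
ret = recv(cfd, buffer, sizeof(buffer),0);

You should check the status here; if the peer's network drops at this point, your side will not close the connection.

I do some simple checks in the real program.
This is an early demo version, so some things are left out.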

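ken1984 replied on 2006-09-15 22:34:43

So what exactly is the problem now?

seeLnd replied on 2006-09-19 10:59:18

Is the problem still unsolved?
Has the OP adopted any other workaround?

wyezl replied on 2006-09-19 11:11:30

I could not think of a better way; there is just a monitoring thread that periodically closes fds that have timed out.
I suppose this is unavoidable.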
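
The timeout approach described here can be sketched as a table of last-activity timestamps plus a periodic sweep. This is only an illustration under stated assumptions: MAX_TRACKED, touch_fd, forget_fd and sweep_idle are hypothetical names, and the table is assumed to be used from a single thread (a separate monitoring thread, as in the thread's server, would need a lock around it).

```c
#include <assert.h>
#include <time.h>
#include <unistd.h>

#define MAX_TRACKED 1024   /* illustrative limit, not taken from the thread */

/* last_active[fd] == 0 means "fd is not tracked". */
static time_t last_active[MAX_TRACKED];

/* Record activity on fd: call after accept and after each successful read or write. */
void touch_fd(int fd, time_t now)
{
    assert(fd >= 0 && fd < MAX_TRACKED);
    last_active[fd] = now;
}

/* Stop tracking fd: call right before closing it on the normal path. */
void forget_fd(int fd)
{
    assert(fd >= 0 && fd < MAX_TRACKED);
    last_active[fd] = 0;
}

/*
 * Close every tracked fd that has been idle for more than `timeout` seconds
 * and return how many were closed. Closing the descriptor also removes it
 * from the epoll set, provided no duplicate of it is still open.
 */
int sweep_idle(time_t now, time_t timeout)
{
    int closed = 0;

    for (int fd = 0; fd < MAX_TRACKED; fd++) {
        if (last_active[fd] != 0 && now - last_active[fd] > timeout) {
            close(fd);
            last_active[fd] = 0;
            closed++;
        }
    }
    return closed;
}
```

In an event loop, sweep_idle(time(NULL), timeout) could run whenever epoll_wait(2) returns, with a finite epoll_wait timeout so the sweep also happens when the server is otherwise quiet.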

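wyezl replied on 2006-09-25 10:24:13

Continuing with this problem. Would those with real network programming experience take a look?
Even when traffic is not very heavy, this server's CPU utilization sometimes reaches 99%, then drops back by itself after a few minutes.
At 99% the service is not affected at all and is still fast.
load average: 0.20, peaking at 1.

If I restart the http server while it is at 99%, utilization drops immediately, but it may climb again after a while. Very strange.

Please help analyze the cause.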

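safedead replied on 2006-09-25 22:13:18

I have been writing a proxy server lately
and noticed an interesting phenomenon:
when the server is fast and the client is slow,
a lot of data piles up in the proxy server's TCP buffers.
The proxy host's memory usage climbs sharply; when memory is nearly exhausted, swapping starts and CPU usage goes up too.
In the worst case memory usage grows so fast that the kernel kills the proxy process outright.
With default kernel TCP parameters, proxying 1000 concurrent connections (2000 sockets) consumed up to 800M of memory.
top shows no increase in the service process's memory usage, yet the system's free memory is gone within a few minutes.
Memory recovers once the connections are dropped.

If a client or server has its network cable or power pulled, you can only wait for KEEPALIVE to kick in,
and memory usage stays high the whole time.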

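wyezl replied on 2006-09-26 10:02:20

Quote: originally posted by safedead on 2006-9-25 22:13
I have been writing a proxy server lately
and noticed an interesting phenomenon:
when the server is fast and the client is slow,
a lot of data piles up in the proxy server's TCP buffers.
The proxy host's memory usage climbs sharply; when memory is nearly exhausted, swapping starts and CPU usage goes up too.
...

Network programming often shows strange behavior in real deployments.
It is nowhere near as clean and simple as the handful of calls described in books.
My program does no dynamic memory allocation, so memory is mostly idle.
It just uses a huge amount of CPU.

Output under normal conditions: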
strace  -c   -p16940
Process 16940 attached - interrupt to quit

Process 16940 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
81.33    4.041201         313     12929           accept
13.48    0.669718          52     12929           epoll_ctl
  3.39    0.168595           7     25858           fcntl64
  1.79    0.089180           7     12929           time
------ ----------- ----------- --------- --------- ----------------
100.00    4.968694                 64645           total

vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
0  0   4988 873736  48832  53464    0    0     1     5    0     9  2  4 94  0

In severe cases accept accounts for 99.95%. I suspect it is because the socket is not non-blocking.

[ Last edited by wyezl on 2006-9-26 10:03 ]

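hightman replied on 2006-09-26 10:38:11

Wow, this many replies?? Small descriptors can be reused after they are closed; you just need to take care in your program design. A design usually sets an upper limit, say 1024 concurrent connections. Then you can keep socket descriptors below 1100 (assuming the main program also has some file descriptors open).

For example, call the dup() system call after accept; it duplicates the descriptor. If you are sure smaller ones are available, you can try calling dup() up to 5 times until it returns a smaller one.

fd2 = dup(fd);
close(fd);

..................

[ Last edited by hightman on 2006-9-26 10:39 ]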

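wyezl replied on 2006-09-26 10:46:31

Quote: originally posted by hightman on 2006-9-26 10:38
Wow, this many replies?? Small descriptors can be reused after they are closed; you just need to take care in your program design. A design usually sets an upper limit, say 1024 concurrent connections. Then you can keep socket descriptors below 1100 (assuming the main program also has some file ...

I am even more baffled. That treats the symptom, not the cause.
The problem has now moved on to CPU utilization and no longer has anything to do with fds.
It is the occasional excessive CPU utilization.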

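safedead replied on 2006-09-26 20:20:27

I have never seen accept take that much CPU.
I use blocking sockets,
with no more than 2000 new connections per second
and concurrency kept under 8000.

Is your accept rate too high? I recall you exceeded 10,000 new connections per second.
In my case CPU spikes were all caused by softirqs and insufficient TCP buffers,
and were related to the NIC (PCI-X 64bit 66MHz is too inefficient).

The asynchronous I/O mechanism and kernel-level threads in the 2.6 kernel make high concurrency much easier to achieve,
but it is whack-a-mole: problems that never showed up before are now appearing.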

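wyezl replied on 2006-09-28 10:46:48

Quote: originally posted by safedead on 2006-9-26 20:20
I have never seen accept take that much CPU.
I use blocking sockets,
with no more than 2000 new connections per second
and concurrency kept under 8000.

Is your accept rate too high? I recall you exceeded 10,000 new connections per second.
In my case CPU spikes were all caused by softirqs and insufficient TCP buffers ...

The trouble is that the problem never shows up unless I test in production.
Tests with tools all look ideal.

I guess it is related to the clients' network conditions.

Send me your MSN by private message; I would like to ask you about network programming when you have time.

[ Last edited by wyezl on 2006-9-28 10:49 ]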

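GodArmy replied on 2006-09-29 17:39:10

Quote: originally posted by solegoose on 2006-9-1 10:13
I think the cause is that there is no timeout management for each fd.
Consider this case: a user connects, the server's accept succeeds and returns an fd, but the network fails before the user sends a request. This fd can never report POLLIN, so there will be no POLLOUT either, and the fd never gets closed. ...

Why do you all keep thinking about closing fds?
This is how I do it: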

void *thread(void *data)
{
    struct epoll_event events[20];

    for ( ; ; ) {
        printf("begin epoll wait, threadid = %lu\n", (unsigned long)pthread_self());
        int nfds = epoll_wait(epfd, events, 20, -1);

        for (int i = 0; i < nfds; ++i)
        {
            //printf("begin epoll wait, fd  = %d, events = %u\n", events[i].data.fd, events[i].events);
            if (events[i].data.fd == g_listenfd)
            {
                num++;
                struct sockaddr_in clientaddr;
                socklen_t clilen = sizeof(clientaddr);  // must be initialized before accept()

                int connfd = accept(g_listenfd, (struct sockaddr *)&clientaddr, &clilen);
                if (connfd < 0) {
                    printf("error in g_listenfd\n");
                    break;
                }
                //printf("ip = %s\n", inet_ntoa(clientaddr.sin_addr));
                int intMsg;
                intMsg = allowip.select_ip(clientaddr.sin_addr.s_addr);
                //printf("intMsg = %d\n", intMsg);
                if (intMsg == 1 || num == 1)
                {
                    setnonblocking(connfd);

                    //printf("new connection fd = %d\n", connfd);

                    struct epoll_event ev;
                    ev.data.fd = connfd;
                    ev.events = EPOLLHUP|EPOLLERR|EPOLLIN|EPOLLET|EPOLLPRI;
                    epoll_ctl(epfd, EPOLL_CTL_ADD, connfd, &ev);
                }
                // note: a connfd rejected by the check above is never closed here
            }

            else if (events[i].events & EPOLLIN)
            {

                char buffer[102400] = {0};
                if ((events[i].data.fd) < 0) continue;
                int connfd = events[i].data.fd;

                int a = recv(connfd, buffer, sizeof(buffer), 0);

                if (a > 0)
                {
                    // application logic
                }
                else
                {
                    if (a == 0 || errno == ECONNRESET)
                    {

                        struct epoll_event ev;
                        ev.data.fd = connfd;
                        if (epoll_ctl(epfd, EPOLL_CTL_DEL, connfd, &ev) == 0)
                        {
                            //ev.events=EPOLLHUP|EPOLLERR|EPOLLOUT|EPOLLET|EPOLLPRI;
                            //printf("epoll_ctl after del return = %d\n", epoll_ctl(epfd, EPOLL_CTL_MOD, connfd ,&ev));
                        }

                        if (myclose(connfd) == 0)
                            printf("close success\n");
                        else
                            printf("close error\n");
                    }

                }
            }

            else if (events[i].events & EPOLLOUT)
            {
                if ((events[i].data.fd) < 0) continue;
                int connfd = events[i].data.fd;

                //send(connfd, buffer, strlen(buffer), 0);

                struct epoll_event ev;
                ev.data.fd = connfd;
                ev.events = EPOLLHUP|EPOLLERR|EPOLLIN|EPOLLET|EPOLLPRI;
                epoll_ctl(epfd, EPOLL_CTL_MOD, connfd, &ev);

            }
            else if (events[i].events & EPOLLERR)
            {
                printf("error on fd = %d\n", events[i].data.fd);
            }
            else if (events[i].events & EPOLLHUP)
            {
                printf("hup on fd = %d\n", events[i].data.fd);
            }
            else if (events[i].events & EPOLLPRI)
            {
                printf(" epollpri %d\n", events[i].data.fd);
            }
            else if (events[i].events & EPOLLET)
                printf("epollet %d\n", events[i].data.fd);
        }
    }
}

[ Last edited by GodArmy on 2006-9-29 17:40 ]

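wyezl replied on 2006-09-29 18:41:21

Quote: originally posted by GodArmy on 2006-9-29 17:39

Why do you all keep thinking about closing fds?
This is how I do it:

void *thread(void *data)
{
struct epoll_event events[20];

for ( ; ; ) {
printf("begin epoll wait, threadid = %d\n", pthread_sel ...

I think this program works but is rather inefficient.
Could you point out what is clever about it?

Descriptors are a minor issue now.
accept has become the big one; it sometimes eats the CPU up to 99% for no apparent reason.
It is odd, since I even have a dedicated thread doing the accepting.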

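cwinl replied on 2006-11-02 19:25:45

libevent wraps the epoll operations.
By default it uses level-triggered mode (LT).
Although it does not use EPOLLET, it should have no trouble handling around a thousand clients.

I am also using libevent to build my own program
and have run into some problems along the way.
Let us discuss them together.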

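oknet replied on 2007-02-26 16:35:22

For a non-blocking socket, if recv reads 0 bytes, it means the remote side has closed the connection. If the server side executes close(fd) at this point, no epoll events will ever be triggered again, so this file descriptor will never be closed.

ret = recv(cfd, buffer, sizeof(buffer),0);

So after this line you need to check whether zero-length content was read.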
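
The check on the recv return value can be made explicit. A minimal sketch, assuming a stream socket and a non-zero buffer length; classify_recv and the enum names are hypothetical and do not come from the thread's server:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

enum recv_result {
    RECV_DATA,         /* n > 0: n bytes are available to process */
    RECV_PEER_CLOSED,  /* n == 0: orderly shutdown by the peer, close the fd */
    RECV_RETRY,        /* EINTR: call recv again */
    RECV_WAIT,         /* EAGAIN/EWOULDBLOCK: drained, go back to epoll_wait */
    RECV_FAILED        /* any other error, e.g. ECONNRESET: close the fd */
};

/*
 * Classify the result of recv(2). `err` must be the value of errno saved
 * immediately after the call; it is only meaningful when n is -1.
 */
enum recv_result classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return RECV_DATA;
    if (n == 0)
        return RECV_PEER_CLOSED;

    assert(n == -1);
    if (err == EINTR)
        return RECV_RETRY;
    if (err == EAGAIN || err == EWOULDBLOCK)
        return RECV_WAIT;
    return RECV_FAILED;
}
```

A caller would use it as `classify_recv(ret, errno)` right after `ret = recv(...)`. Treating the n == 0 case separately also avoids printing a stale errno with perror, which is one plausible source of the "Illegal seek" messages shown earlier in the thread.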

wyezl replied on 2007-02-26 16:42:01

Set a timeout, periodically check for inactive connections and close them, and you are fine.

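oknet replied on 2007-02-26 16:59:33

The second architecture the OP posted does check for recv returning a length of 0:
            } else if (n == 0) {

                close(sockfd);

                events[i].data.fd = -1;

            }

So the second architecture will not have this problem.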

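wyezl replied on 2007-02-27 09:45:22

I close the fd whenever the return value is not greater than 0,
and I also explicitly call epoll_ctl(epfd, EPOLL_CTL_DEL, cfd, &ev);

Even so, the problem remains.

So a timeout monitoring thread is necessary to solve it.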

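空灵静世 replied on 2007-04-19 16:07:20

How did your file descriptors get exhausted? Did you reach any conclusion?

空灵静世 replied on 2007-04-19 16:21:42

Quote: originally posted by oknet on 2007-2-26 16:35
For a non-blocking socket, if recv reads 0 bytes, it means the remote side has closed the connection. If the server side executes close(fd) at this point, no epoll events will ever be triggered again, so this file descriptor will never be closed.

ret = ...

I cannot follow what you are saying. If the server side has already closed it, how can you say the file descriptor will never be closed? It is already closed, so what is left to close?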

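xhl replied on 2007-04-19 16:33:44

Since the OP is building a TCP server with this much traffic, I think TCP KEEPALIVE should not be left out, and BACKLOG should be raised to 500 or more.

My feeling is that the OP's problem comes from users who, while connected to the OP's server, simply power off their machines or unplug the network cable; each of these leaves a dead connection on your side.
By default Linux takes about 7200 seconds before it reclaims such an fd, which will certainly affect your service.

Here is a piece of KEEPALIVE code that I hope helps the OP:

#include <netinet/tcp.h>

int keepalive = 1; // turn the TCP KEEPALIVE switch on
int keepidle;      // idle time in seconds before probing starts (set to your chosen value)
int keepintvl;     // interval in seconds between probes (set to your chosen value)
int keepcnt;       // number of failed probes before the connection is dropped (set to your chosen value)

setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, (char *)&keepalive, sizeof(keepalive));
setsockopt(fd, SOL_TCP, TCP_KEEPIDLE, (char *)&keepidle, sizeof(keepidle));
setsockopt(fd, SOL_TCP, TCP_KEEPINTVL, (char *)&keepintvl, sizeof(keepintvl));
setsockopt(fd, SOL_TCP, TCP_KEEPCNT, (char *)&keepcnt, sizeof(keepcnt));

This feature is implemented by the protocol stack, so the application layer does not need to care about it; but when the probes fail, it wakes up your epoll call.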

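wyezl replied on 2007-04-19 18:56:07

Thanks for the suggestion. I generally use 8192 for BACKLOG.
The maximum number of fds is generally set to 200,000.
The server is very stable now; it runs for months without going down, and it has been in commercial use for about half a year.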

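yangsf5 replied on 2008-04-14 18:35:32

Quote: originally posted by wyezl on 2006-8-25 16:33 (http://bbs.chinaunix.net/redirect.php?goto=findpost&pid=5684249&ptid=813588)

Here is the situation:
Earlier, at accept time, I added a cfd to epoll.
Later epoll_wait returned and reported this cfd as readable, so I read from it; after one or two reads I got EAGAIN.
Am I supposed to remove the old registration and ADD it again?
And what about the part of the data I have already ...

Am I supposed to remove the old registration and ADD it again?
And what about the part of the data I have already read?
Keep it in a buffer and continue reading after the next wait returns?

How was this handled in the end?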
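
The question is not answered in the thread. One common approach, sketched here under stated assumptions, is to leave the fd registered (no DEL and re-ADD is needed) and keep the partial data in a per-connection buffer until the next event. struct conn, conn_init, conn_fill and CONN_BUF_SIZE are hypothetical names, and the fixed buffer size is only for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define CONN_BUF_SIZE 8192   /* illustrative size */

/* Per-connection state kept between epoll_wait() calls (hypothetical layout). */
struct conn {
    int    fd;
    size_t used;                 /* bytes already stored in buf */
    char   buf[CONN_BUF_SIZE];
};

enum fill_status { CONN_WAIT, CONN_CLOSED, CONN_FULL, CONN_ERROR };

/* Bind c to fd and make the fd non-blocking; returns 0 on success. */
int conn_init(struct conn *c, int fd)
{
    int flags = fcntl(fd, F_GETFL);

    assert(c != NULL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    c->fd = fd;
    c->used = 0;
    return 0;
}

/*
 * Append everything currently readable to c->buf.
 * CONN_WAIT means EAGAIN was reached: keep the data collected so far and
 * return to epoll_wait; the next event continues where this call stopped.
 */
enum fill_status conn_fill(struct conn *c)
{
    assert(c != NULL && c->used <= CONN_BUF_SIZE);
    while (c->used < CONN_BUF_SIZE) {
        ssize_t n = read(c->fd, c->buf + c->used, CONN_BUF_SIZE - c->used);
        if (n > 0) {
            c->used += (size_t)n;
            continue;
        }
        if (n == 0)
            return CONN_CLOSED;          /* peer closed the connection */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return CONN_WAIT;
        return CONN_ERROR;
    }
    return CONN_FULL;                    /* buffer exhausted before the socket was */
}
```

The struct can be attached to the epoll registration through ev.data.ptr instead of ev.data.fd, so that each returned event leads directly to its buffer.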

bical replied on 2008-10-06 11:28:36

After listen, add the listening fd to the set that epoll monitors: epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev)

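wwwsq replied on 2008-10-06 13:06:11

Quote: originally posted by wyezl on 2007-2-27 09:45 (http://bbs.chinaunix.net/redirect.php?goto=findpost&pid=6478257&ptid=813588)
I close the fd whenever the return value is not greater than 0,
and I also explicitly call epoll_ctl(epfd, EPOLL_CTL_DEL, cfd, &ev);

Even so, the problem remains.

So a timeout monitoring thread is necessary to solve it.

A TCP connection can break without triggering any event; this is far more common on the Internet than on a LAN. That may also be why everything is fine with your test tools but problems appear in production.

You can think of a TCP connection as a mountain path between two villages. Whether the path is passable can only be known by actually sending a messenger along it. If no messenger arrives, you cannot tell whether the path is open; perhaps there has been a landslide. (The messenger corresponds to the high and low voltage signals on the network cable.)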

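chenzhanyiczy replied on 2008-10-06 14:53:20

Quote: originally posted by 精简指令 on 2006-8-27 16:48 (http://bbs.chinaunix.net/redirect.php?goto=findpost&pid=5689775&ptid=813588)
I ran some more tests, following the steps below.

Test 1:
1. epoll waits for an EPOLLIN event on a file descriptor.
2. epoll wakes up on this descriptor, and only part of the data is received (EAGAIN is not reached).
3. Keep waiting for the EPOLLIN event. ...

"Test 2:
1. epoll waits for an EPOLLIN event on a file descriptor.
2. epoll wakes up on this descriptor, and only part of the data is received (EAGAIN is not reached).
3. Keep waiting for the EPOLLIN event.
4. epoll is not woken up again.
5. After new data arrives, epoll wakes up on this descriptor again.

Expected behavior."

This looks wrong to me. When new data arrives, epoll should not wake up again, because the state of the file descriptor has not changed; it is still ready.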

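redor replied on 2008-10-06 15:39:30

Quote: originally posted by wyezl on 2006-8-18 11:23 (http://bbs.chinaunix.net/redirect.php?goto=findpost&pid=5647820&ptid=813588)
I just need help finding where the descriptors are being exhausted.
:)

Does your program have a lot of TIME_WAIT connections? Did you call shutdown(fd, SHUT_RDWR);?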

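bobozhang replied on 2008-10-06 19:52:00

This thread is quite old; I do not know how you dug it up.
As for the OP's problem, could it be that accept is too fast and each epoll_wait has more ready fds than EVENTSIZE, so that fds that are ready for I/O but never get handled gradually pile up until the fds run out?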

yuanyuan025 replied on 2008-10-07 03:10:49

Nice, bumping this for you.

alexhappy replied on 2008-10-08 15:05:54

I once ran into a problem similar to the OP's, so this is a good chance to learn...

lauxp replied on 2009-02-26 23:17:38

Quote: originally posted by wyezl on 2006-8-28 11:19 (http://bbs2.chinaunix.net/redirect.php?goto=findpost&pid=5692376&ptid=813588)
int my_read(int fd,void *buffer,int length)
{
        int bytes_left;
        int bytes_read;
        char *ptr;
        ptr=buffer;
        bytes_left=length;
        while(bytes_left>0)
...

In this function, the branch where read returns 0 means the peer has actively closed the connection.

Digging up an old thread..