An experiment: keeping nginx's FastCGI connections alive (keepalive)


To keep connections to the backend open, download the keepalive module (http://mdounin.ru/hg/ngx_http_upstream_keepalive ).

Note that in multi-process mode you need to set accept_mutex off;

 

Assuming you already know how to use the keepalive module, let's look at how to keep FastCGI connections open.

 

By default, nginx's connection to FastCGI looks like this (even with the keepalive settings above):

 

[root@localhost ~]# tcpdump -i lo -s 1500 port 9000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 1500 bytes
15:23:16.901004 IP localhost.localdomain.50867 > localhost.localdomain.9000: S 3482201970:3482201970(0) win 32767 <mss 16396,sackOK,timestamp 2296841391 0,nop,wscale 7>
15:23:16.901025 IP localhost.localdomain.9000 > localhost.localdomain.50867: S 3473410857:3473410857(0) ack 3482201971 win 32767 <mss 16396,sackOK,timestamp 2296841391 2296841391,nop,wscale 7>
15:23:16.901039 IP localhost.localdomain.50867 > localhost.localdomain.9000: . ack 1 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901150 IP localhost.localdomain.50867 > localhost.localdomain.9000: P 1:1377(1376) ack 1 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901170 IP localhost.localdomain.9000 > localhost.localdomain.50867: . ack 1377 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901214 IP localhost.localdomain.9000 > localhost.localdomain.50867: P 1:97(96) ack 1377 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901222 IP localhost.localdomain.50867 > localhost.localdomain.9000: . ack 97 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901236 IP localhost.localdomain.9000 > localhost.localdomain.50867: F 97:97(0) ack 1377 win 256 <nop,nop,timestamp 2296841391 2296841391>
15:23:16.901822 IP localhost.localdomain.50867 > localhost.localdomain.9000: F 1377:1377(0) ack 98 win 256 <nop,nop,timestamp 2296841392 2296841391>
15:23:16.901836 IP localhost.localdomain.9000 > localhost.localdomain.50867: . ack 1378 win 256 <nop,nop,timestamp 2296841392 2296841392>

 

 

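You can see that the backend actively closes the connection (port 9000 sends the first FIN), so simply enabling keepalive in nginx.conf has no effect.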

 

Looking at ngx_http_fastcgi_module.c, the ngx_http_fastcgi_begin_request_t structure has a flags field, which controls whether the backend closes the connection itself:

 

    typedef struct {
        u_char  role_hi;
        u_char  role_lo;
        u_char  flags;        /* controls whether the backend closes the connection */
        u_char  reserved[5];
    } ngx_http_fastcgi_begin_request_t;
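The FastCGI specification lays out the FCGI_BEGIN_REQUEST body the same way and defines FCGI_KEEP_CONN as bit 0 of flags. Below is a minimal sketch of how an application might test that flag. The names fcgi_begin_request_body and backend_should_close are illustrative only; they are not taken from nginx or from the fcgi library.

```c
#include <assert.h>
#include <stddef.h>

/* Values from the FastCGI 1.0 specification. */
#define FCGI_KEEP_CONN  1   /* flags bit 0: application must not close */
#define FCGI_RESPONDER  1   /* role */

/* Same layout as ngx_http_fastcgi_begin_request_t (illustrative name). */
typedef struct {
    unsigned char role_hi;
    unsigned char role_lo;
    unsigned char flags;
    unsigned char reserved[5];
} fcgi_begin_request_body;

/* Returns 1 if the application should close the connection after
 * answering the request, 0 if the web server keeps control of it. */
static int backend_should_close(const fcgi_begin_request_body *b)
{
    assert(b != NULL);
    return (b->flags & FCGI_KEEP_CONN) == 0;
}
```

With flags left at 0, as in the stock nginx source shown in this post, the function returns 1, which matches the FIN sent by the backend in the first capture.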

 

The FastCGI specification says:

 

Closing Transport Connections
    The Web server controls the lifetime of transport connections. The Web server can close a connection when no requests are active. Or the Web server can delegate close authority to the application (see FCGI_BEGIN_REQUEST). In this case the application closes the connection at the end of a specified request.

 

Now look at how ngx_http_fastcgi_module.c sets it:

 

     475 static ngx_http_fastcgi_request_start_t  ngx_http_fastcgi_request_start = {
     476     { 1,                                               /* version */
     477       NGX_HTTP_FASTCGI_BEGIN_REQUEST,                  /* type */
     478       0,                                               /* request_id_hi */
     479       1,                                               /* request_id_lo */
     480       0,                                               /* content_length_hi */
     481       sizeof(ngx_http_fastcgi_begin_request_t),        /* content_length_lo */
     482       0,                                               /* padding_length */
     483       0 },                                             /* reserved */
     484
     485     { 0,                                               /* role_hi */
     486       NGX_HTTP_FASTCGI_RESPONDER,                      /* role_lo */
     487       0, /* NGX_HTTP_FASTCGI_KEEP_CONN */              /* flags */
     488       { 0, 0, 0, 0, 0 } },                             /* reserved[5] */
     489
     490     { 1,                                               /* version */
     491       NGX_HTTP_FASTCGI_PARAMS,                         /* type */
     492       0,                                               /* request_id_hi */
     493       1 },                                             /* request_id_lo */
     494
     495 };

 

 

Change the 0 on line 487 to 1 (that is, NGX_HTTP_FASTCGI_KEEP_CONN) so that the backend does not actively close the connection.
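To make the effect of that one byte concrete, here is a small sketch that builds the 16 bytes nginx sends first: the 8-byte record header followed by the 8-byte begin-request body. The function name build_begin_request is hypothetical; the constant values come from the FastCGI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FCGI_VERSION_1      1
#define FCGI_BEGIN_REQUEST  1   /* record type */
#define FCGI_RESPONDER      1   /* role */
#define FCGI_KEEP_CONN      1   /* flags bit 0 */

/* Writes the record header plus FCGI_BEGIN_REQUEST body into out[16]
 * and returns the number of bytes written. */
static size_t build_begin_request(unsigned char out[16],
                                  unsigned short request_id,
                                  unsigned char flags)
{
    assert(out != NULL);
    memset(out, 0, 16);

    out[0] = FCGI_VERSION_1;
    out[1] = FCGI_BEGIN_REQUEST;
    out[2] = (unsigned char) (request_id >> 8);    /* request_id_hi */
    out[3] = (unsigned char) (request_id & 0xff);  /* request_id_lo */
    out[4] = 0;                                    /* content_length_hi */
    out[5] = 8;                                    /* content_length_lo */
    /* out[6] (padding_length) and out[7] (reserved) stay 0 */

    out[8]  = 0;                                   /* role_hi */
    out[9]  = FCGI_RESPONDER;                      /* role_lo */
    out[10] = flags;                               /* 0 or FCGI_KEEP_CONN */
    /* out[11..15] (reserved[5]) stay 0 */
    return 16;
}
```

Called with request_id 1 and flags FCGI_KEEP_CONN, the result corresponds to the 16 bytes at the start of each read() in the strace output later in this post.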

 

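That change alone is not enough to keep the connection alive. You also need to add the following to ngx_http_fastcgi_finalize_request in ngx_http_fastcgi_module.c so that nginx itself keeps the connection: set u->length = 0, which the keepalive module checks when deciding whether to keep the connection.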

 

    static void
    ngx_http_fastcgi_finalize_request(ngx_http_request_t *r, ngx_int_t rc)
    {
        ngx_http_upstream_t  *u = r->upstream;

        /* added: lets the keepalive module treat the connection as reusable */
        if (u != NULL) {
            u->length = 0;
        }

        ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                       "finalize http fastcgi request");

        return;
    }

 

So that nginx returns the response to the client promptly, src/event/ngx_event_pipe.c also needs to be modified.

 

The original code:

    while (cl && n > 0) {

        ngx_event_pipe_remove_shadow_links(cl->buf);

        size = cl->buf->end - cl->buf->last;

        if (n >= size) {
            cl->buf->last = cl->buf->end;

            /* STUB */ cl->buf->num = p->num++;

            if (p->input_filter(p, cl->buf) == NGX_ERROR) {
                return NGX_ABORT;
            }

            n -= size;
            ln = cl;
            cl = cl->next;
            ngx_free_chain(p->pool, ln);

        } else {
            cl->buf->last += n;
            n = 0;
        }
    }

The modified version:

    while (cl && n > 0) {

        ngx_event_pipe_remove_shadow_links(cl->buf);

        size = cl->buf->end - cl->buf->last;

        if (n >= size) {
            cl->buf->last = cl->buf->end;
            n -= size;

        } else {
            cl->buf->last += n;
            n = 0;
        }

        /* STUB */ cl->buf->num = p->num++;

        if (p->input_filter(p, cl->buf) == NGX_ERROR) {
            return NGX_ABORT;
        }

        ln = cl;
        cl = cl->next;
        ngx_free_chain(p->pool, ln);
    }
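The difference between the two loops is easier to see in a toy model. The sketch below is not nginx code: it only counts how many buffers would be handed to input_filter after a single read of n bytes, assuming every buffer has cap free bytes. The function names are made up for illustration. The original loop passes a buffer on only once it is completely full; the modified loop passes it on after every read, presumably because with a kept-alive connection there is no EOF from the backend to flush a partially filled buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: a buffer reaches the filter only when it is full. */
static int filtered_original(size_t n, size_t cap, int nbufs)
{
    int calls = 0;

    assert(cap > 0);
    while (nbufs > 0 && n > 0) {
        if (n >= cap) {
            calls++;            /* full buffer goes to input_filter */
            n -= cap;
            nbufs--;
        } else {
            n = 0;              /* partial data just sits in the buffer */
        }
    }
    return calls;
}

/* Modified loop: the filter runs on every buffer that received data. */
static int filtered_modified(size_t n, size_t cap, int nbufs)
{
    int calls = 0;

    assert(cap > 0);
    while (nbufs > 0 && n > 0) {
        if (n >= cap) {
            n -= cap;
        } else {
            n = 0;
        }
        calls++;                /* partial buffers are filtered too */
        nbufs--;
    }
    return calls;
}
```

For a 2984-byte response landing in a 4096-byte buffer (the buffer size is an assumption), the original model reports zero filter calls and the modified one reports one.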

 

 

After these changes, run it again and capture the traffic:

 

[root@localhost ~]# tcpdump -i lo -s 1500 port 9000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 1500 bytes
11:14:03.711955 IP localhost.localdomain.50708 > localhost.localdomain.9000: S 2207544028:2207544028(0) win 32767 <mss 16396,sackOK,timestamp 3491628977 0,nop,wscale 7>
11:14:03.712218 IP localhost.localdomain.9000 > localhost.localdomain.50708: S 2221134347:2221134347(0) ack 2207544029 win 32767 <mss 16396,sackOK,timestamp 3491628977 3491628977,nop,wscale 7>
11:14:03.712241 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 1 win 256 <nop,nop,timestamp 3491628977 3491628977>
11:14:03.712257 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 1:1257(1256) ack 1 win 256 <nop,nop,timestamp 3491628977 3491628977>
11:14:03.712273 IP localhost.localdomain.9000 > localhost.localdomain.50708: . ack 1257 win 256 <nop,nop,timestamp 3491628977 3491628977>
11:14:03.711969 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 1:2985(2984) ack 1257 win 256 <nop,nop,timestamp 3491628978 3491628977>
11:14:03.711980 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 2985 win 303 <nop,nop,timestamp 3491628978 3491628978>
11:14:05.738632 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 1257:2513(1256) ack 2985 win 303 <nop,nop,timestamp 3491631005 3491628978>
11:14:05.738832 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 2985:5969(2984) ack 2513 win 256 <nop,nop,timestamp 3491631005 3491631005>
11:14:05.738848 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 5969 win 303 <nop,nop,timestamp 3491631005 3491631005>
11:14:06.901924 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 2513:3769(1256) ack 5969 win 303 <nop,nop,timestamp 3491632168 3491631005>
11:14:06.902098 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 5969:8953(2984) ack 3769 win 256 <nop,nop,timestamp 3491632168 3491632168>
11:14:06.902110 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 8953 win 303 <nop,nop,timestamp 3491632168 3491632168>
11:14:07.570211 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 3769:5025(1256) ack 8953 win 303 <nop,nop,timestamp 3491632836 3491632168>
11:14:07.570387 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 8953:11937(2984) ack 5025 win 256 <nop,nop,timestamp 3491632837 3491632836>
11:14:07.570399 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 11937 win 303 <nop,nop,timestamp 3491632837 3491632837>
11:14:08.202399 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 5025:6281(1256) ack 11937 win 303 <nop,nop,timestamp 3491633469 3491632837>
11:14:08.202473 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 11937:14921(2984) ack 6281 win 256 <nop,nop,timestamp 3491633469 3491633469>
11:14:08.202483 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 14921 win 303 <nop,nop,timestamp 3491633469 3491633469>
11:14:09.475039 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 6281:7537(1256) ack 14921 win 303 <nop,nop,timestamp 3491634742 3491633469>
11:14:09.475277 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 14921:17905(2984) ack 7537 win 256 <nop,nop,timestamp 3491634742 3491634742>
11:14:09.475291 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 17905 win 303 <nop,nop,timestamp 3491634742 3491634742>
11:14:10.082268 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 7537:8793(1256) ack 17905 win 303 <nop,nop,timestamp 3491635349 3491634742>
11:14:10.082512 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 17905:20889(2984) ack 8793 win 256 <nop,nop,timestamp 3491635349 3491635349>
11:14:10.082522 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 20889 win 303 <nop,nop,timestamp 3491635349 3491635349>
11:14:10.818134 IP localhost.localdomain.50708 > localhost.localdomain.9000: P 8793:10049(1256) ack 20889 win 303 <nop,nop,timestamp 3491636085 3491635349>
11:14:10.818252 IP localhost.localdomain.9000 > localhost.localdomain.50708: P 20889:23873(2984) ack 10049 win 256 <nop,nop,timestamp 3491636085 3491636085>
11:14:10.818263 IP localhost.localdomain.50708 > localhost.localdomain.9000: . ack 23873 win 303 <nop,nop,timestamp 3491636085 3491636085>
11:14:11.506168 IP localhost.localdomain.50711 > localhost.localdomain.9000: S 2218187766:2218187766(0) win 32767 <mss 16396,sackOK,timestamp 3491636773 0,nop,wscale 7>
11:14:11.506191 IP localhost.localdomain.9000 > localhost.localdomain.50711: S 2224663648:2224663648(0) ack 2218187767 win 32767 <mss 16396,sackOK,timestamp 3491636773 3491636773,nop,wscale 7>
11:14:11.506205 IP localhost.localdomain.50711 > localhost.localdomain.9000: . ack 1 win 256 <nop,nop,timestamp 3491636773 3491636773>
11:14:11.506318 IP localhost.localdomain.50711 > localhost.localdomain.9000: P 1:1257(1256) ack 1 win 256 <nop,nop,timestamp 3491636773 3491636773>
11:14:11.506329 IP localhost.localdomain.9000 > localhost.localdomain.50711: . ack 1257 win 256 <nop,nop,timestamp 3491636773 3491636773>
11:15:11.506325 IP localhost.localdomain.50711 > localhost.localdomain.9000: F 1257:1257(0) ack 1 win 256 <nop,nop,timestamp 3491696782 3491636773>
11:15:11.546176 IP localhost.localdomain.9000 > localhost.localdomain.50711: . ack 1258 win 256 <nop,nop,timestamp 3491696822 3491696782>

 

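I started two nginx worker processes for the capture above.

Keepalive works fine while the first worker handles the requests (port 50708). As soon as the second worker gets to handle a request, it opens a new connection (port 50711) and things stall: the FastCGI backend never answers, so the client is left waiting until the connection is closed a minute later.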

 

Let's use strace to see what the FastCGI process is doing:

 

read(3, "\1\1\0\1\0\10\0\0\0\1\1\0\0\0\0\0\1\4\0\1\4\271\7\0\t\21"..., 8192) = 1256
time(NULL)                              = 1302665416
write(3, "\1\6\0\1\v\201\7\0Content-type: text/html\r"..., 2984) = 2984
read(3, "\1\1\0\1\0\10\0\0\0\1\1\0\0\0\0\0\1\4\0\1\4\271\7\0\t\21"..., 8192) = 1256
time(NULL)                              = 1302665418
write(3, "\1\6\0\1\v\201\7\0Content-type: text/html\r"..., 2984) = 2984


read(3,

 

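When the problem occurs, read() is blocked and never returns. The backend seems to serve only one connection: it never reads from the other connections, so it cannot send anything back to nginx.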

 

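Note that the FastCGI backend I used is the echo program from the examples directory of fcgi-2.4.0. If the backend could handle several connections properly, for example by using epoll, this should in theory work.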
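As a rough illustration of that idea, the sketch below waits on several connections at once and reads from whichever one has data, instead of blocking in read() on a single descriptor. It uses poll() rather than epoll purely for portability, and the helper name read_from_ready and the 16-connection limit are assumptions made for the example. It is not how the fcgi-2.4.0 echo program works.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_CONNS 16    /* arbitrary limit for this sketch */

/* Waits up to timeout_ms for any of the nfds connections to become
 * readable, reads from the first ready one into buf, stores the byte
 * count in *nread and returns that connection's index.
 * Returns -1 if nothing became readable. */
static int read_from_ready(const int *fds, int nfds, int timeout_ms,
                           char *buf, size_t len, ssize_t *nread)
{
    struct pollfd  pfds[MAX_CONNS];
    int            i;

    assert(fds != NULL && buf != NULL && nread != NULL);
    assert(nfds > 0 && nfds <= MAX_CONNS);

    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t) nfds, timeout_ms) <= 0) {
        return -1;              /* timeout or error */
    }

    for (i = 0; i < nfds; i++) {
        if (pfds[i].revents & POLLIN) {
            *nread = read(pfds[i].fd, buf, len);
            return i;
        }
    }
    return -1;
}
```

A backend built around a loop like this would still see the request arriving on the second worker's connection while the first connection sits idle.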

 

Further investigation is still in progress.

 
