TinyHTTPd Source Code Analysis

Source: Internet. Editor: 程序博客网. Time: 2024/05/16 06:40

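TinyHTTPd is an ultra-lightweight HTTP server written in C. The whole program is only 502 lines long (comments included) and ships with a simple client. Reading it is a good way to understand what an HTTP server essentially does.

This article annotates only a few of the important functions; the fully annotated code is on my GitHub. I modified the code so that it compiles on Linux without warnings, and I also changed the code style.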

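Before reading the source, you should know some HTTP basics:

HTTP basics

HTTP message format

HTTP messages are either request messages or response messages.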

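The format of an HTTP request message is as follows:

[Figure: HTTP request message format]

For example, the following is a sample request message:

[Figure: sample HTTP request message]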
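
Since a request starts with a request line made of the method, the request target, and the protocol version separated by spaces, that first line can be split with a few lines of C. The following is only a minimal sketch and not code from tinyhttpd: the struct request_line type, the parse_request_line function, and the buffer sizes are all hypothetical names and values chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical container for the three fields of a request line. */
struct request_line {
    char method[16];
    char target[256];
    char version[16];
};

/* Split "METHOD SP TARGET SP VERSION" into its three fields.
 * Returns 0 on success, -1 if the line does not contain three fields. */
int parse_request_line(const char *line, struct request_line *out)
{
    /* Each width is one less than the buffer size to leave room for '\0'. */
    if (sscanf(line, "%15s %255s %15s",
               out->method, out->target, out->version) != 3)
        return -1;
    return 0;
}
```

Passing the line "GET /test/demo_form.asp?name1=value1&name2=value2 HTTP/1.1\r\n" fills method with "GET", target with the full URL including the query string, and version with "HTTP/1.1" (the version here is an assumed value). The trailing "\r\n" is whitespace, so sscanf ignores it.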

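The format of a response message is as follows:

[Figure: HTTP response message format]

The following is a sample response message:

[Figure: sample HTTP response message]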
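
A response begins with a status line, followed by header lines and a blank line that separates the headers from the body. As a rough sketch of how such a response head could be assembled in C, the hypothetical helper below (format_response_head is not a tinyhttpd function) writes a status line and a single Content-Type header into a buffer.

```c
#include <stdio.h>

/* Write a minimal response head (status line, one header, blank line) into
 * buf. Returns the number of bytes written, or -1 if buf is too small. */
int format_response_head(char *buf, size_t size, int status,
                         const char *reason, const char *content_type)
{
    int n = snprintf(buf, size,
                     "HTTP/1.0 %d %s\r\n"
                     "Content-Type: %s\r\n"
                     "\r\n",
                     status, reason, content_type);
    if (n < 0 || (size_t)n >= size)
        return -1;   /* output was truncated or an encoding error occurred */
    return n;
}
```

Calling format_response_head(buf, sizeof(buf), 200, "OK", "text/html") produces "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"; the body would be sent after it.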

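Now that the HTTP message format has been introduced, let's look ahead at the get_line function from the source code:

The get_line function

Every line of an HTTP message ends with a carriage return ('\r') followed by a line feed ('\n'), but applications should also accept a bare line feed as a line terminator. With that in mind, the implementation of get_line is easy to follow: each call reads one line.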

/**********************************************************************/
/* Get a line from a socket, whether the line ends in a newline,
 * carriage return, or a CRLF combination.  Terminates the string read
 * with a null character.  If no newline indicator is found before the
 * end of the buffer, the string is terminated with a null.  If any of
 * the above three line terminators is read, the last character of the
 * string will be a linefeed and the string will be terminated with a
 * null character.
 * Parameters: the socket descriptor
 *             the buffer to save the data in
 *             the size of the buffer
 * Returns: the number of bytes stored (excluding null) */
/**********************************************************************/
int get_line(int sock, char *buf, int size)
{
    int i = 0;
    char c = '\0';
    int n;

    while ((i < size - 1) && (c != '\n')) {
        n = recv(sock, &c, 1, 0);
        /* DEBUG printf("%02X\n", c); */
        if (n > 0) {
            // A lone \r or a \r\n pair is stored as a single \n
            if (c == '\r') {
                // MSG_PEEK looks at the next byte without removing it,
                // so the next recv can still read it
                n = recv(sock, &c, 1, MSG_PEEK);
                /* DEBUG printf("%02X\n", c); */
                if ((n > 0) && (c == '\n'))
                    recv(sock, &c, 1, 0);
                else
                    c = '\n';
            }
            buf[i] = c;
            i++;
        } else { // nothing left to read: set c to \n to end the loop
            c = '\n';
        }
    }
    buf[i] = '\0';

    return i;
}
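
To experiment with the line-ending rule without a socket, the same normalization can be applied to an in-memory string. This is only an illustrative sketch: read_line_from_string is a hypothetical function that mirrors the logic of get_line, with a pointer lookahead standing in for recv with MSG_PEEK.

```c
#include <stddef.h>

/* Copy one line from *src into buf, treating "\r", "\n" and "\r\n" as the
 * same terminator and storing it as a single '\n'. Returns the number of
 * bytes stored (excluding the terminating '\0') and advances *src past the
 * consumed input. */
size_t read_line_from_string(const char **src, char *buf, size_t size)
{
    size_t i = 0;
    const char *p = *src;

    if (size == 0)
        return 0;
    while (i < size - 1 && *p != '\0') {
        char c = *p++;
        if (c == '\r') {
            if (*p == '\n')   /* CRLF: consume the LF as well */
                p++;
            c = '\n';
        }
        buf[i++] = c;
        if (c == '\n')        /* end of line reached */
            break;
    }
    buf[i] = '\0';
    *src = p;
    return i;
}
```

Feeding it "GET / HTTP/1.0\r\nHost: x\rA: b\n\r\n" yields the lines "GET / HTTP/1.0\n", "Host: x\n", "A: b\n" and "\n" in turn. The last one is the blank line that marks the end of the headers.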

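HTTP methods

The method is the action the client wants the server to perform on a resource; it is the first word on the first line of the request message. Common methods are GET, HEAD, and POST. tinyhttpd implements only GET and POST. GET is normally used to request data from a given resource, and POST to submit data to be processed by a given resource.

With the HTTP background covered, we come to the main topic of this article, the source code analysis:

Functions in tinyhttpd

The main functions of tinyhttpd are: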

void accept_request(int);
void bad_request(int);
void cat(int, FILE *);
void cannot_execute(int);
void error_die(const char *);
void execute_cgi(int, const char *, const char *, const char *);
int  get_line(int, char *, int);
void headers(int, const char *);
void not_found(int);
void serve_file(int, const char *);
int  startup(u_short *);
void unimplemented(int);

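Of these, bad_request, cannot_execute, not_found, and unimplemented simply send an HTML error page to the client, headers sends the response headers, and error_die prints an error message and exits, so this article does not annotate them. The important functions analyzed here are main, startup, accept_request, execute_cgi, get_line (already covered above), and serve_file.

Let's start from main and work through how the server operates step by step.

The main function

main first calls startup to create a listening socket and then enters an infinite loop waiting for client requests. When a request arrives, accept returns a connected socket, which is passed to accept_request for handling. The close(server_sock) after the loop is meant to close the listening socket once the server is done, but because the loop never exits, that line is never actually reached.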

int main(void)
{
    int server_sock = -1;
    u_short port = 0;
    int client_sock = -1;
    struct sockaddr_in client_name;
    socklen_t client_name_len = sizeof(client_name);

    server_sock = startup(&port); // create a listening socket
    printf("httpd running on port %d\n", port);

    while (1) {
        // accept returns a connected socket
        client_sock = accept(server_sock,
                (struct sockaddr *)&client_name,
                &client_name_len);
        if (client_sock == -1)
            error_die("accept");
        accept_request(client_sock); // handle the request
    }

    close(server_sock);

    return 0;
}

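The startup function

startup follows the usual sequence for setting up a TCP server, calling socket, bind, and listen in turn. The listening port can either be specified by the caller or, when the port is 0, be allocated dynamically by the system.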

/**********************************************************************/
/* This function starts the process of listening for web connections
 * on a specified port.  If the port is 0, then dynamically allocate a
 * port and modify the original port variable to reflect the actual
 * port.
 * Parameters: pointer to variable containing the port to connect on
 * Returns: the socket */
// Create a listening socket and wait for client requests
/**********************************************************************/
int startup(u_short *port)
{
    int httpd = 0;
    struct sockaddr_in name;

    httpd = socket(PF_INET, SOCK_STREAM, 0);
    if (httpd == -1)
        error_die("socket");
    memset(&name, 0, sizeof(name));
    name.sin_family = AF_INET;
    name.sin_port = htons(*port);
    name.sin_addr.s_addr = htonl(INADDR_ANY); // INADDR_ANY: any local address, see ip(7)
    if (bind(httpd, (struct sockaddr *)&name, sizeof(name)) < 0)
        error_die("bind");
    /* if dynamically allocating a port */
    if (*port == 0) {
        socklen_t namelen = sizeof(name);
        if (getsockname(httpd, (struct sockaddr *)&name, &namelen) == -1)
            error_die("getsockname");
        *port = ntohs(name.sin_port);
    }
    if (listen(httpd, 5) < 0) // backlog (queue size) is 5
        error_die("listen");
    return httpd;
}
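
The dynamic port allocation is the least obvious part of startup: binding to port 0 lets the kernel pick a free port, and getsockname then reports which one was chosen. The sketch below isolates that idea. listen_on_any_port is a hypothetical name, and binding to the loopback address instead of INADDR_ANY is an assumption made here so the example stays local to the machine.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to port 0 on the loopback address and report the port
 * the kernel picked. Returns the listening socket, or -1 on error. */
int listen_on_any_port(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                      /* 0 = let the kernel choose */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* assumption: local use only */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);   /* convert from network byte order */
    return fd;
}
```

Unlike startup, this version returns -1 instead of terminating the process, which makes it easier to call from a test program.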

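The accept_request function

accept_request first processes the client's HTTP request message and extracts information from it, starting with the method on the first line, the request line. In the example above, the method array holds the string "GET" and the url array initially holds "/test/demo_form.asp?name1=value1&name2=value2". Because the URL contains a question mark (?), it is a dynamic request: url is truncated to "/test/demo_form.asp", and query_string points to the text after the question mark, "name1=value1&name2=value2".

Once the strings have been parsed, the function decides whether the request is dynamic. A request is treated as dynamic if it is a POST, if its URL contains a query string, or if the target file has any execute permission bit set. For a dynamic request it calls execute_cgi to run a CGI script; for a static request it calls serve_file to send the local file straight to the client.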

/**********************************************************************/
/* A request has caused a call to accept() on the server port to
 * return.  Process the request appropriately.
 * Parameters: the socket connected to the client */
// Handle one request
/**********************************************************************/
void accept_request(int client)
{
    char buf[1024];
    int numchars;
    char method[255];
    char url[255];
    char path[512];
    size_t i, j;
    struct stat st;
    int cgi = 0;      /* becomes true if server decides this is a CGI
                       * program */
    char *query_string = NULL;

    // Read the first line of the request message, the request line
    numchars = get_line(client, buf, sizeof(buf));
    i = 0;
    j = 0;
    // The first word is the method; common methods are GET, POST and HEAD
    while (!ISspace(buf[j]) && (i < sizeof(method) - 1)) {
        method[i] = buf[j];
        i++;
        j++;
    }
    method[i] = '\0';

    // Only GET and POST are handled; otherwise tell the client that the
    // requested method is not implemented
    if (strcasecmp(method, "GET") && strcasecmp(method, "POST")) {
        unimplemented(client);
        return;
    }

    if (strcasecmp(method, "POST") == 0)
        cgi = 1;

    // Extract the url (the bounds check comes first so buf is never
    // indexed out of range)
    i = 0;
    while ((j < sizeof(buf)) && ISspace(buf[j]))
        j++;
    while ((j < sizeof(buf)) && !ISspace(buf[j]) && (i < sizeof(url) - 1)) {
        url[i] = buf[j];
        i++;
        j++;
    }
    url[i] = '\0';

    // For GET: if the url contains '?', query_string points just past it;
    // if not, query_string points at the terminating '\0' (an empty string)
    if (strcasecmp(method, "GET") == 0) {
        query_string = url;
        while ((*query_string != '?') && (*query_string != '\0'))
            query_string++;
        // A '?' means this is a dynamic request
        if (*query_string == '?') {
            cgi = 1;
            *query_string = '\0';
            query_string++;
        }
    }

    sprintf(path, "htdocs%s", url); // content is stored under the htdocs directory
    if (path[strlen(path) - 1] == '/') // a trailing '/' means the default home page
        strcat(path, "index.html");

    // If stat fails, reply to the client with not_found
    if (stat(path, &st) == -1) {
        // Read and discard the headers that follow the request line
        while ((numchars > 0) && strcmp("\n", buf))
            numchars = get_line(client, buf, sizeof(buf));
        not_found(client);
    } else {
        // If the path is a directory, serve its default home page
        if ((st.st_mode & S_IFMT) == S_IFDIR)
            strcat(path, "/index.html");
        // If owner, group or other has execute permission
        if ((st.st_mode & S_IXUSR) ||
            (st.st_mode & S_IXGRP) ||
            (st.st_mode & S_IXOTH) )
        {
            cgi = 1;
        }
        if (!cgi)
            serve_file(client, path); // static request: read the file from disk and send it
        else
            execute_cgi(client, path, method, query_string); // dynamic request: run the CGI script
    }

    close(client);
}
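
The URL handling in accept_request boils down to two steps: cut the URL at the first '?', then map what is left to a file under htdocs. The following sketch shows those two steps in isolation. split_query and build_path are hypothetical helpers, not tinyhttpd functions, and split_query deliberately returns NULL when there is no query string, whereas the original leaves the pointer at the terminating '\0'.

```c
#include <stdio.h>
#include <string.h>

/* Split url in place at the first '?'. Returns a pointer to the query
 * string, or NULL when the URL has none. */
char *split_query(char *url)
{
    char *q = strchr(url, '?');

    if (q == NULL)
        return NULL;
    *q = '\0';        /* url now ends where the '?' was */
    return q + 1;
}

/* Map the URL path to a file under root, appending index.html when the
 * URL ends in '/'. Returns 0 on success, -1 if path is too small. */
int build_path(char *path, size_t size, const char *root, const char *url)
{
    size_t len = strlen(url);
    const char *suffix = (len > 0 && url[len - 1] == '/') ? "index.html" : "";
    int n = snprintf(path, size, "%s%s%s", root, url, suffix);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the URL from the example request, split_query leaves "/test/demo_form.asp" in url and returns "name1=value1&name2=value2", and build_path with root "htdocs" gives "htdocs/test/demo_form.asp". A bare "/" maps to "htdocs/index.html".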

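The execute_cgi function

This is one of the most important functions in the source; it handles dynamic requests. It first checks whether the method is GET or POST. For GET it reads the headers and discards them. For POST it reads the headers line by line until the blank line, and when it meets Content-Length it parses the length and stores it in the content_length variable; if no Content-Length header is found, it replies with bad_request.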
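
The Content-Length handling can be tried out separately from the socket code. The sketch below is one possible way to write it, not what tinyhttpd does: parse_content_length is a hypothetical helper that uses strncasecmp and strtol, so it does not rely on the value starting at a fixed offset in the line.

```c
#include <stdlib.h>
#include <strings.h>

/* Return the value of a "Content-Length:" header line, or -1 if the line
 * is a different header or the value is not a non-negative number. */
long parse_content_length(const char *line)
{
    static const char name[] = "Content-Length:";
    size_t name_len = sizeof(name) - 1;
    char *end;
    long value;

    /* Header names are case-insensitive. */
    if (strncasecmp(line, name, name_len) != 0)
        return -1;
    value = strtol(line + name_len, &end, 10); /* strtol skips leading spaces */
    if (end == line + name_len || value < 0)   /* no digits, or negative */
        return -1;
    return value;
}
```

For example, parse_content_length("Content-Length: 42\r\n") returns 42, while a line such as "Host: example.com\n" returns -1.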

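It then sends the first line of the response message, the status line, to the client.
Next, the parent process creates two pipes and forks a child process. The child sets the environment variables and calls execl to run the CGI script. Through the pipes and descriptor redirection, the script's output is passed back to the parent; see the diagram in the code comments for the exact data flow.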

/**********************************************************************/
/* Execute a CGI script.  Will need to set environment variables as
 * appropriate.
 * Parameters: client socket descriptor
 *             path to the CGI script */
// Run a CGI script; environment variables must be set for it
/**********************************************************************/
void execute_cgi(int client, const char *path,
        const char *method, const char *query_string)
{
    char buf[1024];
    int cgi_output[2];
    int cgi_input[2];
    pid_t pid;
    int status;
    int i;
    char c;
    int numchars = 1;
    int content_length = -1;

    buf[0] = 'A';
    buf[1] = '\0';
    if (strcasecmp(method, "GET") == 0) {
        // Read and discard the headers
        while ((numchars > 0) && strcmp("\n", buf))
            numchars = get_line(client, buf, sizeof(buf));
    } else { /* POST */
        numchars = get_line(client, buf, sizeof(buf)); // read the headers
        // Read line by line up to the blank line, recording Content-Length
        while ((numchars > 0) && strcmp("\n", buf)) {
            buf[15] = '\0';
            if (strcasecmp(buf, "Content-Length:") == 0)
                content_length = atoi(&(buf[16]));
            numchars = get_line(client, buf, sizeof(buf));
        }
        if (content_length == -1) {
            bad_request(client);
            return;
        }
    }

    sprintf(buf, "HTTP/1.0 200 OK\r\n");
    send(client, buf, strlen(buf), 0);

    // The key part: parent and child communicate through pipes.
    /*
     * Data flow between the parent (P) and the child (C):
     *
     *   client --recv--> (P) --write--> cgi_input[1]  ==pipe==> cgi_input[0]  --> fd 0 (C)
     *   client <--send-- (P) <--read--- cgi_output[0] <==pipe== cgi_output[1] <-- fd 1 (C)
     */

    // Create the pipes
    if (pipe(cgi_output) < 0) {
        cannot_execute(client);
        return;
    }
    if (pipe(cgi_input) < 0) {
        cannot_execute(client);
        return;
    }

    if ( (pid = fork()) < 0 ) {
        cannot_execute(client);
        return;
    }
    // The child runs the CGI script, which prints its result to standard output
    if (pid == 0) { /* in child: CGI script */
        char meth_env[255];
        char query_env[255];
        char length_env[255];

        dup2(cgi_output[1], 1); // redirect standard output to the write end of cgi_output
        dup2(cgi_input[0], 0);  // redirect standard input to the read end of cgi_input
        close(cgi_output[0]); // the child does not use the read end of cgi_output
        close(cgi_input[1]);  // the child does not use the write end of cgi_input
        // snprintf keeps an overlong value from overflowing the buffers
        snprintf(meth_env, sizeof(meth_env), "REQUEST_METHOD=%s", method);
        putenv(meth_env);
        if (strcasecmp(method, "GET") == 0) {
            // Set the QUERY_STRING environment variable
            snprintf(query_env, sizeof(query_env), "QUERY_STRING=%s", query_string);
            putenv(query_env);
        } else {   /* POST method */
            // Set the CONTENT_LENGTH environment variable
            snprintf(length_env, sizeof(length_env), "CONTENT_LENGTH=%d", content_length);
            putenv(length_env);
        }
        // Run the CGI script. Its output goes to the child's standard output
        // and through the pipe to the parent, which sends it to the client
        // in the read/send loop of the parent branch below
        execl(path, path, NULL);
        exit(0);
    } else {    /* in parent */
        // The parent passes the request body to the child through the pipe
        close(cgi_output[1]); // the parent does not use the write end of cgi_output
        close(cgi_input[0]);  // the parent does not use the read end of cgi_input
        // Receive the POSTed data
        if (strcasecmp(method, "POST") == 0) {
            for (i = 0; i < content_length; i++) {
                // Read the body from the client one byte at a time and write
                // it into the pipe that feeds the child's standard input
                recv(client, &c, 1, 0);
                write(cgi_input[1], &c, 1);
            }
        }
        // Send whatever the child writes into the pipe to the client
        while (read(cgi_output[0], &c, 1) > 0)
            send(client, &c, 1, 0);

        // Close the remaining ends of the two pipes
        close(cgi_output[0]);
        close(cgi_input[1]);
        waitpid(pid, &status, 0);
    }
}
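
The pipe, fork, and dup2 combination is easier to see with only one pipe and no HTTP around it. The sketch below runs an arbitrary program with its standard output redirected into a pipe and collects the output in the parent, which is the cgi_output half of execute_cgi. run_and_capture is a hypothetical function, and it uses execvp with an argument vector rather than execl; both choices are assumptions made for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run cmd[0] with its stdout connected to a pipe and copy what it prints
 * into out (at most cap - 1 bytes, always NUL-terminated).
 * Returns the child's exit status, or -1 on error. */
int run_and_capture(char *const cmd[], char *out, size_t cap)
{
    int fds[2];
    pid_t pid;
    int status;
    size_t used = 0;
    ssize_t n;
    char chunk[256];

    if (cap == 0 || pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                   /* child */
        dup2(fds[1], STDOUT_FILENO);  /* stdout now writes into the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(cmd[0], cmd);
        _exit(127);                   /* reached only if exec failed */
    }
    close(fds[1]);                    /* parent keeps only the read end */
    while ((n = read(fds[0], chunk, sizeof(chunk))) > 0) {
        /* Keep draining even when out is full so the child never blocks. */
        for (ssize_t k = 0; k < n && used + 1 < cap; k++)
            out[used++] = chunk[k];
    }
    out[used] = '\0';
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Running it with the command { "echo", "hello", NULL } stores "hello\n" in out and returns 0 on a system where echo is on the PATH. Closing the write end in the parent matters: without it, read would never see end-of-file and the loop would hang.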

The serve_file function

For a static request, serve_file is called directly to send the local file:

/**********************************************************************/
/* Send a regular file to the client.  Use headers, and report
 * errors to client if they occur.
 * Parameters: a pointer to a file structure produced from the socket
 *              file descriptor
 *             the name of the file to serve */
// Send a file to the client
/**********************************************************************/
void serve_file(int client, const char *filename)
{
    FILE *resource = NULL;
    int numchars = 1;
    char buf[1024];

    buf[0] = 'A';
    buf[1] = '\0';
    while ((numchars > 0) && strcmp("\n", buf))  /* read & discard headers */
        numchars = get_line(client, buf, sizeof(buf));

    resource = fopen(filename, "r");
    if (resource == NULL) {
        not_found(client);
    } else {
        // Send the headers and the file contents to the client
        headers(client, filename);
        cat(client, resource);
        fclose(resource); // close only a stream that was actually opened
    }
}

/**********************************************************************/
/* Put the entire contents of a file out on a socket.  This function
 * is named after the UNIX "cat" command, because it might have been
 * easier just to do something like pipe, fork, and exec("cat").
 * Parameters: the client socket descriptor
 *             FILE pointer for the file to cat */
// Send the entire contents of a file to a socket (static request)
/**********************************************************************/
void cat(int client, FILE *resource)
{
    char buf[1024];

    fgets(buf, sizeof(buf), resource);
    while (!feof(resource)) {
        send(client, buf, strlen(buf), 0);
        fgets(buf, sizeof(buf), resource);
    }
}
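
Because cat relies on fgets and strlen, it is only suitable for text: any bytes after a NUL in a chunk are not sent. A possible alternative, shown here purely as a sketch, is to copy the file in fixed-size chunks with fread. copy_file_to_fd is a hypothetical function, and it uses write instead of send so that it works on any descriptor; on a socket, write behaves like send with flags set to 0.

```c
#include <stdio.h>
#include <unistd.h>

/* Copy everything from in to the descriptor out_fd in fixed-size chunks.
 * Returns the number of bytes copied, or -1 on a write error. */
long copy_file_to_fd(FILE *in, int out_fd)
{
    char buf[1024];
    size_t n;
    long total = 0;

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        size_t off = 0;
        while (off < n) {   /* write may accept fewer bytes than requested */
            ssize_t w = write(out_fd, buf + off, n - off);
            if (w < 0)
                return -1;
            off += (size_t)w;
        }
        total += (long)n;
    }
    return total;
}
```

The byte count comes from fread rather than strlen, so embedded NUL bytes are copied like any other data.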

Follow-up

The next post will show how to use webbench to test the performance of TinyHTTPd.

0 0