A Minimal C Base Library

Source: Internet · Editor: 程序博客网 · Time: 2024/05/20 20:19

Let's start with the program we know best. The first program most of us meet when learning to program is helloworld:

#include <stdio.h>

int main()
{
        printf("hello world\n");
        return 0;
}

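We compile it statically with gcc -static -o helloworld helloworld.c, which produces the executable helloworld; running it prints one line, hello world, on the screen. Simple enough. But have you ever wondered what gcc does along the way? I don't mean the pipeline from C code to assembly to machine code. Since the topic here is the C base library, I mean that a large number of other functions get linked into the program. We can list the functions contained in helloworld with readelf -s helloworld | grep FUNC. I won't paste the output because it really is frightening: 1353 lines, meaning the helloworld executable carries 1353 function symbols. Scary, isn't it? All I can say is: what on earth are these functions? Do they all have something to do with hello world? To find out what actually happens when helloworld runs, I decided to write a minimal C base library of my own so that helloworld can run without glibc.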

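The simplest possible C base library needs only two functions: _start() and _exit(). _start() is the symbol that ld uses by default as the entry point of the executable, and _exit() terminates a process. Here they are: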

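# _start.S
        .text
        .align 4
        .type _start, @function
        .globl _start
_start:
        call main               # run the program, the return value of main is left in %eax
        call _exit              # terminate the process with that value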

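_start.S implements a single function, _start(), which calls main() and then _exit(). That's all there is to it. Why write it in assembly? Because we will extend this function later, and the extension has to be written in assembly.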

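# _exit.S
        .text
        .align 4
        .type _exit, @function
        .globl _exit
_exit:
        pushl   %ebx
        mov     %eax, %ebx      # exit status, taken from %eax where main left it
        movl    $1, %eax        # 1 is the system call number of exit
        int     $0x80           # issue the system call, it does not return
        popl    %ebx            # never reached
        ret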

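_exit.S implements _exit(), which terminates the process by issuing the exit system call directly. Note that it takes the exit status from %eax rather than from the stack, which is exactly what _start() needs right after main() returns; a C caller that passes the status as an ordinary stack argument is not guaranteed to get it through.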

With these two functions a C program can already run. Since we have not implemented printf() yet, though, the printf() statement in helloworld.c has to be commented out first.

// helloworld.c
int main()
{
//      printf("hello world\n");
        return 0;
}

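To keep glibc functions out of the executable, the program has to be built as follows. (The listings here are 32-bit x86 code; on an x86-64 machine, add -m32 to the gcc commands and -m elf_i386 to the ld command.)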

[root@localhost libc]# gcc -c -fno-builtin -o _start.o _start.S
[root@localhost libc]# gcc -c -fno-builtin -o _exit.o _exit.S
[root@localhost libc]# gcc -c -fno-builtin -o helloworld.o helloworld.c
[root@localhost libc]# ld -static -s -o helloworld _exit.o _start.o helloworld.o

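The program can now be run: type ./helloworld in the terminal and it will execute normally, though of course nothing is printed. How do we verify that it really ran? Run echo $? in the terminal; $? holds the exit status of the previous command (here ./helloworld), and the result is 0. You can also change the return statement so that main() returns 2, rebuild, run again and repeat echo $?; the output is now 2, which shows that main() really was executed.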
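What the shell reports through $? can also be observed from a C program. The following is only a sketch under assumptions: it uses the fork(), _exit() and waitpid() of an ordinary hosted libc (not the mini library built in this article), and observed_exit_status() is a hypothetical helper name introduced for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start a child process that terminates with `status`,
 * then return the status the parent observes, which is the value a shell
 * would show for `echo $?`. Returns -1 if anything goes wrong. */
int observed_exit_status(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0)
        _exit(status);              /* child: terminate at once via the exit system call */

    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;                  /* waiting for the child failed */
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Calling observed_exit_status(2) plays the role of changing main() to return 2 and then running echo $?.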

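To get hello world onto the terminal we now implement printf(). The standard printf() is too complex (variadic arguments, many different format conversions), so for simplicity we implement a stripped-down version: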

# printf.S
# void printf(char *str, int size);
        .text
        .align 4
        .type printf, @function
        .globl printf
printf:
    pushl   %ebx
    pushl   %ecx
    pushl   %edx
    mov     $1, %ebx            # file descriptor 1: standard output
    mov     16(%esp), %ecx      # first argument of printf(): the string to print
    mov     20(%esp), %edx      # second argument of printf(): the length of the string
    movl    $4, %eax            # 4 is the system call number of write(2)
    int     $0x80               # issue the system call
    popl    %edx
    popl    %ecx
    popl    %ebx
    ret

Nothing difficult here either: printf() uses the write(2) system call directly to put the text on the screen. helloworld.c becomes:

// helloworld.c
void printf(char *str, int size);       // our simplified printf(), defined in printf.S

int main()
{
        printf("hello world\n", 12);
        return 0;
}

Build helloworld again:

[root@localhost libc]# gcc -c -fno-builtin -o _start.o _start.S
[root@localhost libc]# gcc -c -fno-builtin -o _exit.o _exit.S
[root@localhost libc]# gcc -c -fno-builtin -o helloworld.o helloworld.c
[root@localhost libc]# gcc -c -fno-builtin -o printf.o printf.S
[root@localhost libc]# ld -static -s -o helloworld helloworld.o _start.o _exit.o printf.o

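Now ./helloworld prints hello world on the screen. Getting there took only three functions: _start(), _exit() and printf(). Simple enough, right? So why does printing hello world through glibc pull in so many functions? Because glibc does a lot of initialization before main() runs and some cleanup after it returns; besides, ours is a stripped-down printf(), and the _start(), exit() and printf() in glibc are far more elaborate than the ones here. Either way, we did get hello world onto the screen with a few lines of code, and that can be regarded as a minimal C base library.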

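We can extend this library to do more. Having to work out the length of the string by hand before every call to printf() is tedious, isn't it? Let's implement strlen() so that the length is computed automatically.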

int strlen(const char *str)
{
        const char *s;

        for (s = str; *s; ++s)
                ;
        return (s - str);
}

Now modify printf() and drop its second parameter:

# printf.S
# void printf(char *str);
        .text
        .align 4
        .type printf, @function
        .globl printf
printf:
    pushl   %ebp
    movl    %esp,  %ebp
    pushl   %ebx
    pushl   %ecx
    pushl   %edx
    pushl   8(%ebp)             # pass the string to strlen()
    call    strlen
    addl    $4,  %esp           # pop the argument of strlen()
    mov     %eax, %edx          # length of the string, as returned by strlen()
    mov     $1, %ebx            # file descriptor 1: standard output
    mov     8(%ebp), %ecx       # the only argument of printf(): the string to print
    movl    $4, %eax            # 4 is the system call number of write(2)
    int     $0x80               # issue the system call
    popl    %edx
    popl    %ecx
    popl    %ebx
    popl    %ebp
    ret

In the modified code, printf() first calls strlen() to compute the length of the string and then issues the write() system call. helloworld.c changes accordingly:

// helloworld.c
void printf(char *str);         // our simplified printf(), defined in printf.S

int main()
{
        printf("hello world\n");
        return 0;
}

Printing hello world no longer requires passing the length of the string.
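For readers less at home in assembly, the same "measure, then write" logic can be sketched in C. This is an illustration under assumptions rather than part of the library: it relies on the write() wrapper of a hosted libc instead of int $0x80, takes the file descriptor as a parameter so that it can be tried on a pipe, and the names str_length() and print_str() are hypothetical.

```c
#include <unistd.h>

/* Same loop as the strlen() shown earlier, renamed so that it does not
 * clash with the strlen() of a hosted libc. */
int str_length(const char *str)
{
    const char *s;

    for (s = str; *s; ++s)
        ;
    return (int)(s - str);
}

/* Hypothetical C counterpart of the one-argument printf():
 * compute the length first, then hand string and length to write(2).
 * Returns what write() returns: the number of bytes written, or -1. */
long print_str(int fd, const char *str)
{
    return (long)write(fd, str, (size_t)str_length(str));
}
```

print_str(1, "hello world\n") would print to standard output, just as the assembly version does with %ebx set to 1.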

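Let's keep extending the library. What next? We want to support the arguments of main(). In the examples so far main() took no arguments, but we know it has two parameters, argc and argv[], so let's print them: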

void printf(char *str);         // our simplified printf(), defined in printf.S

int main(int argc, char *argv[])
{
        int i;

        for (i =  0; i < argc; i++)
                printf(argv[i]);
        return 0;
}

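Unfortunately, with our own C base library instead of glibc this program does not work: the terminal reports "Segmentation fault (core dumped)". Why? Because _start() calls main() directly without setting up its arguments. To make the program run properly, _start() has to be extended. First let's look at how the arguments of main() are laid out on the stack once the executable has been loaded.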

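The figure above sketches the process stack once the executable has been loaded. The stack holds the following information:

argc: the number of arguments passed to main(), that is, the first parameter of main().

argv[]: the arguments passed to main(), that is, the second parameter of main(). argv is only a pointer; the arguments themselves are stored at the location it points to.

envp[]: this is in fact a third parameter passed to main(), holding the environment variables. We will not deal with it here.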

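Depending on how the executable was linked, the process stack contains some further information. If the executable is dynamically linked, the stack also holds some information for the dynamic linker, and the "return address" in the figure is the address of the dynamic linker: once the executable is loaded, the dynamic linker runs first, loads the shared libraries into the process, and then jumps to _start(). If the executable is statically linked, no dynamic-linker information is kept on the stack, the "return address" is the address of _start(), and execution jumps straight to _start() after loading. To keep things simple we ignore dynamic linking. One more point: whether the link is dynamic or static, once the executable has been loaded the register esp holds the address of argc on the process stack, so the arguments of main() can be found through esp. We modify the earlier _start.S as follows: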

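# _start.S
        .text
        .align 4
        .type _start, @function
        .globl _start
_start:
        mov     %esp, %eax      # on entry %esp points at argc
        mov     $0f, %edx       # address of local label 0 below, which jumps to main
        pushl   %edx            # second argument of __libc_init(): main
        pushl   %eax            # first argument of __libc_init(): elfdata
        call    __libc_init     # does not return, it ends in _exit()
0:      jmp     main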

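// init.c
void _exit(int status);         // defined in _exit.S

void __libc_init(int *elfdata, int (*main)(int, char**))
{
    int     argc = *elfdata;                    // the word that esp pointed at
    char**  argv = (char**)(elfdata + 1);       // the argv[] pointers follow argc

    _exit(main(argc, argv));
}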
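The pointer arithmetic in __libc_init() can be tried out without writing any startup code by running it over a simulated stack. The sketch below makes a few assumptions: struct start_args and parse_initial_stack() are hypothetical names, the stack is modelled as an array of pointer-sized slots (so the same arithmetic holds on 32-bit and 64-bit machines, whereas the int * version above relies on int and pointers having the same size), and it additionally locates envp[], which the article leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for what startup code recovers from the stack. */
struct start_args {
    int    argc;
    char **argv;
    char **envp;
};

/* `sp` points at the slot holding argc, as esp does when _start() begins.
 * Layout assumed: argc, argv[0] .. argv[argc-1], NULL, envp[0] .., NULL. */
struct start_args parse_initial_stack(char **sp)
{
    struct start_args a;

    a.argc = (int)(uintptr_t)sp[0];     /* first slot: the argument count */
    a.argv = sp + 1;                    /* the argv[] pointers follow argc */
    a.envp = a.argv + a.argc + 1;       /* skip the NULL that terminates argv[] */
    return a;
}
```

A simulated stack for ./helloworld argv1 would be the array { (char *)(uintptr_t)2, "./helloworld", "argv1", NULL, "HOME=/root", NULL }, where the environment entry is made up for the example.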

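Finally we add atexit() to the library. An application can call atexit() to register functions that run after main(), and in principle it may register any number of them. Since we have not implemented malloc() and cannot allocate memory dynamically, we cap the number of functions (at 10) and allocate the table statically. The code: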

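// exit.c
void _exit(int status);                 // defined in _exit.S

int index = 0;                          // number of registered functions
void (*func[10])(void) = {0};           // at most 10 registered functions

int atexit(void (*function)(void))
{
        if (index >= 10)
                return 1;               // the table is full
        func[index] = function;
        index++;
        return 0;
}

void exit(int status)
{
        int i;

        for (i = index - 1; i >= 0; i--)        // run in reverse order of registration
                func[i]();
        _exit(status);
}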

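For the registered functions to actually run, __libc_init() must now finish with exit(main(argc, argv)) rather than _exit(main(argc, argv)), since only exit() walks the table. We then modify helloworld.c as follows: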

// helloworld.c
void printf(char *str);                 // defined in printf.S
int atexit(void (*function)(void));     // defined in exit.c

void atexit_func1(void)
{
        printf("I am in atexit_func1()\n");
}

void atexit_func2(void)
{
        printf("I am in atexit_func2()\n");
}

int main(int argc, char *argv[])
{
        int i;

        for (i =  0; i < argc; i++) {
                printf(argv[i]);
                printf("\n");
        }
        atexit(atexit_func1);
        atexit(atexit_func2);
        printf("I am in main()\n");
        return 0;
}

Rebuild and run; the result is:

[root@mail libc]# ./helloworld argv1 argv2
./helloworld
argv1
argv2
I am in main()
I am in atexit_func2()
I am in atexit_func1()

As we can see, the functions registered through atexit() do run after main().
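The output also shows that the functions run in reverse order of registration: atexit_func2() before atexit_func1(). That last-in, first-out behaviour and the limit of 10 entries can be checked in isolation with a small sketch. It is a variation on exit.c under stated assumptions, not the code of the library: my_atexit(), run_handlers(), record_a() and record_b() are hypothetical names, and run_handlers() returns to its caller instead of ending the process with _exit(), so that the order can be inspected afterwards.

```c
#define MAX_HANDLERS 10

static void (*handlers[MAX_HANDLERS])(void);   /* statically allocated table */
static int handler_count;                       /* number of registered functions */

/* Register a function; returns 0 on success, 1 if the table is full. */
int my_atexit(void (*fn)(void))
{
    if (handler_count >= MAX_HANDLERS)
        return 1;
    handlers[handler_count++] = fn;
    return 0;
}

/* Call the registered functions, most recently registered first,
 * emptying the table along the way. */
void run_handlers(void)
{
    while (handler_count > 0)
        handlers[--handler_count]();
}

/* Two sample handlers that record the order in which they were called. */
char trace[8];
int  trace_len;

void record_a(void) { trace[trace_len++] = 'a'; }
void record_b(void) { trace[trace_len++] = 'b'; }
```

Registering record_a() and then record_b() and calling run_handlers() leaves "ba" in trace, matching the order seen in the output above.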

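The complete code can be downloaded here. (I originally meant to package the code and upload it to the blog, but I could not figure out how to upload files, so I created a project instead. I will not maintain this project.)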