Hello from a libc-free world! (Part 1)


As an exercise, I want to write a Hello World program in C simple enough that I can disassemble it and be able to explain all of the assembly to myself.

This should be easy, right?

This adventure assumes compilation and execution on a Linux machine. Some familiarity with reading assembly is helpful.

Here's our basic Hello World program:

charles@taotao:~$ cat hello.c
#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}

Let's compile it and get a bytecount:

charles@taotao:~$ arm-linux-gnueabi-gcc hello.c -g -o hello
charles@taotao:~$ wc -c hello
9551 hello

Yikes! Where are more than 9 kilobytes of executable coming from?

objdump -t hello gives us 84 symbol-table entries, most of which we can blame on our using the standard library.

So let's stop using it. We won't use printf so we can get rid of our include file:

charles@taotao:~/code$ cat hello.c
int main(void)
{
    char *str = "hello, world";
    return 0;
}

Recompiling and checking the bytecount:

charles@taotao:~/code$ arm-linux-gnueabi-gcc -o hello hello.c
charles@taotao:~/code$ wc -c hello
8991 hello

What? That barely changed anything!

The problem is that gcc is still using standard library startup files when linking.

Want proof? We'll compile with -nostdlib, which according to the gcc man page won't "use the standard system libraries and startup files when linking. Only the files you specify will be passed to the linker".

charles@taotao:~/code$ arm-linux-gnueabi-gcc -nostdlib -o hello hello.c
/home/charles/toolchain/arm-cortex-a9-toolchain/bin/../lib/gcc/arm-linux-gnueabi/4.8.2/../../../../arm-linux-gnueabi/bin/ld: warning: cannot find entry symbol _start; defaulting to 00008074
charles@taotao:~/code$ wc -c hello
1016 hello

That looks pretty good! We got our bytecount down to a much more reasonable size (an order of magnitude smaller!)...

root@taotao:/mnt/code# ./hello
Segmentation fault (core dumped)

...at the expense of segfaulting when it runs. Hrmph.

For fun, let's get our program to be actually runnable before digging into the assembly.

So what is this _start entry symbol that appears to be required for our program to run? Where is it usually defined if you're using libc?

From the perspective of the linker, by default _start is the actual entry point to your program, not main. It is normally defined in the crt1.o ELF relocatable. We can verify this by linking against crt1.o and noting that _start is now found (although we develop other problems by not having defined other necessary libc startup symbols):

charles@taotao:~/code$ arm-linux-gnueabi-ld /home/charles/toolchain/arm-cortex-a9-toolchain/arm-linux-gnueabi/sysroot/usr/lib/crt1.o -o hello hello.o
/home/charles/toolchain/arm-cortex-a9-toolchain/arm-linux-gnueabi/sysroot/usr/lib/crt1.o: In function `_start':
/home/charles/code/cross-compile-arm/src/glibc-2.19/csu/../ports/sysdeps/arm/start.S:124: undefined reference to `__libc_start_main'
/home/charles/code/cross-compile-arm/src/glibc-2.19/csu/../ports/sysdeps/arm/start.S:128: undefined reference to `abort'
/home/charles/code/cross-compile-arm/src/glibc-2.19/csu/../ports/sysdeps/arm/start.S:113: undefined reference to `__libc_csu_fini'
/home/charles/code/cross-compile-arm/src/glibc-2.19/csu/../ports/sysdeps/arm/start.S:120: undefined reference to `__libc_csu_init'

This check conveniently also tells us where _start lives in the libc source: ports/sysdeps/arm/start.S (glibc 2.19) for this particular machine. This delightfully well-commented file exports the _start symbol, sets up the stack and some registers, and calls __libc_start_main. If we look at the very bottom of csu/libc-start.c we see the call to our program's main:

result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);

and down the rabbit hole we go.
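As a side note on the linker's view of _start: the address the linker settles on as the entry point is recorded in the e_entry field of the ELF header. The following is a minimal sketch of reading that field, assuming a Linux host with <elf.h> and an ELF file whose byte order matches the host's. The function name read_elf_entry is made up for this illustration.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read the entry-point address (e_entry) from an ELF file's header.
 * Returns 0 on success and stores the address in *entry; returns -1 on
 * any error. Assumption: the file's byte order matches the host's. */
int read_elf_entry(const char *path, uint64_t *entry)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    /* Every ELF file starts with the 4-byte magic "\177ELF". */
    if (n < sizeof(Elf32_Ehdr) || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;

    /* The EI_CLASS byte says whether this is a 32-bit or 64-bit ELF. */
    if (buf[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr h;
        memcpy(&h, buf, sizeof h);
        *entry = h.e_entry;
        return 0;
    }
    if (buf[EI_CLASS] == ELFCLASS64 && n >= sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr h;
        memcpy(&h, buf, sizeof h);
        *entry = h.e_entry;
        return 0;
    }
    return -1;
}
```

Pointed at the binary we built with -nostdlib, this should report the address from the linker warning, 0x8074. readelf -h prints the same field as "Entry point address".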

So that's what _start is all about. Conveniently, we can summarize what happens between _start and the call to main as "set up a bunch of stuff for libc and then call main", and since we don't care about libc, let's just export our own _start symbol that just calls main and link against that:

charles@taotao:~/code$ cat stubstart.S
.globl _start
_start:
    bl  main

Compiling and running with our stub _start assembly file:

charles@taotao:~/code$ arm-linux-gnueabi-gcc -nostdlib stubstart.S -o hello hello.c
./hello

Hurrah, our compilation problems go away! However, the program never exits; it seems to be stuck in an infinite loop.

Let's check the assembly code:

charles@taotao:~/code$ arm-linux-gnueabi-objdump -d hello

hello:     file format elf32-littlearm

Disassembly of section .text:

00008074 <_start>:
    8074:   ebffffff    bl      8078 <main>

00008078 <main>:
    8078:   e52db004    push    {fp}            ; (str fp, [sp, #-4]!)
    807c:   e28db000    add     fp, sp, #0
    8080:   e24dd00c    sub     sp, sp, #12
    8084:   e30830a4    movw    r3, #32932      ; 0x80a4
    8088:   e3403000    movt    r3, #0
    808c:   e50b3008    str     r3, [fp, #-8]
    8090:   e3a03000    mov     r3, #0
    8094:   e1a00003    mov     r0, r3
    8098:   e24bd000    sub     sp, fp, #0
    809c:   e49db004    pop     {fp}            ; (ldr fp, [sp], #4)
    80a0:   e12fff1e    bx      lr

Execution begins at _start, which calls main with bl. The bl instruction stores the return address, that is, the address of the instruction following it, in the register lr. Here that address is 0x8078, which is also the first instruction of main, because main was placed directly after our one-instruction _start.

The last instruction of main is bx lr, which sets pc to lr & 0xfffffffe. In this case pc becomes 0x8078 again, so main starts to execute from the top, and the cycle repeats forever.
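The address arithmetic behind that loop can be checked with a few lines of C. This is only a sketch of the ARM-state rules as I understand them (a bl offset is a signed 24-bit word count relative to the instruction's address plus 8, and bx clears bit 0 before writing pc). The function names are invented for the illustration.

```c
#include <stdint.h>

/* Branch target of an ARM-state `bl` located at address `addr`.
 * The low 24 bits of the encoding hold a signed offset in words,
 * relative to the address of the instruction plus 8. */
uint32_t bl_target(uint32_t addr, uint32_t insn)
{
    uint32_t imm24 = insn & 0x00ffffffu;
    uint32_t offset = imm24 << 2;       /* words -> bytes */
    if (imm24 & 0x00800000u)
        offset |= 0xfc000000u;          /* sign-extend the 26-bit result */
    return addr + 8u + offset;          /* wraps modulo 2^32 */
}

/* Return address that `bl` leaves in lr: the instruction after the bl. */
uint32_t bl_return_address(uint32_t addr)
{
    return addr + 4u;
}

/* Address that `bx lr` jumps to: bit 0 only selects Thumb state and is
 * cleared before the value lands in pc. */
uint32_t bx_target(uint32_t lr)
{
    return lr & 0xfffffffeu;
}
```

Plugging in the values from the disassembly, bl_target(0x8074, 0xebffffff), bl_return_address(0x8074) and bx_target(0x8078) all come out as 0x8078: the call target, the return address and the place bx lr jumps to are the same instruction.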

We need an exit strategy.

Let's modify stubstart.S this way:

charles@taotao:~/code$ cat stubstart.S
.globl _start
_start:
    bl  main
    mov %r0, $0     /* r0 = exit status 0 */
    mov %r7, $1     /* r7 = system call number 1 (exit) */
    swi $0          /* trap into the kernel */

charles@taotao:~/code$ arm-linux-gnueabi-gcc -nostdlib stubstart.S -g -o hello hello.c
root@taotao:/mnt/code# ./hello
root@taotao:/mnt/code#

Success! It compiles, it runs, and if we step through this new version under gdb it even exits normally.
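To see the same kernel request from C on any Linux host, here is a hedged sketch that mimics the stub: call an entry function, then pass its return value to the exit system call. It leans on libc's syscall() wrapper and fork() purely so it can be run and observed on an ordinary machine, so it is an illustration of the idea rather than part of the libc-free build. run_then_raw_exit, returns_zero and returns_seven are hypothetical names.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `entry` in a child process the way a stub _start runs main:
 * call it, then hand its return value straight to the kernel's exit
 * system call. Returns the exit status seen by the parent, or -1. */
int run_then_raw_exit(int (*entry)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: nothing runs after this, like the end of _start. */
        syscall(SYS_exit, entry());
        _exit(127);                     /* not reached */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical stand-ins for main. */
int returns_zero(void)  { return 0; }
int returns_seven(void) { return 7; }
```

One difference from the stub above: this sketch forwards the entry function's return value, whereas the stub always exits with 0. Since main's return value is already in r0 after bl main, dropping the mov into r0 from the stub should have the same effect there.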

Hello from a libc-free world!

Stay tuned for Part 2, where we'll walk through the parts of the executable in earnest and watch what happens to it as we add complexity, in the process understanding more about linking and calling conventions and the structure of an ELF binary.

https://blogs.oracle.com/ksplice/entry/hello_from_a_libc_free

