Essential GNU Linker Concepts for Embedded Systems Programmers


A very good article. Its biggest payoff is that it uses worked examples to make the difference between the VMA (virtual memory address) and the LMA (load memory address) clear.

Original article: http://jeffrey.co.in/wiki/essential_gnu_linker_concepts_for_embedded_systems_programmers

Free PDF download: http://ishare.iask.sina.com.cn/f/35050371.html

Or: http://download.csdn.net/detail/astrotycoon/4886539


 

 

"Section" Basics

Application programmers usually don't have to bother about low-level details such as where in the virtual address space the data section of their program begins. In the embedded systems world you have no such luxury. Often your code will run on the bare metal, and you will have to lay things out precisely at specific memory locations. The GNU linker gives you this flexibility through linker scripts.

Let's start with a simple assembly language program, a1.s:

    .section abc, "a"
    i:
            .byte 1
            .byte 2
            .byte 3

    .section def, "a"
    k:
            .byte 7
            .byte 8
            .byte 9

This program can be assembled by invoking as:

    as a1.s -o a1.o

Our program is divided into two sections. A section is simply a logical chunk of data or code that the linker combines with other sections to create a single output file. (The "a" after the section name is a flag that marks the section as allocatable, meaning that it occupies memory when the program is loaded.) Let's view the output produced by the assembler by invoking objdump with the -D option:

    a1.o:     file format elf32-i386

    Disassembly of section abc:

    00000000 <i>:
       0:   01 02                   add    %eax,(%edx)
       2:   03                      .byte 0x3

    Disassembly of section def:

    00000000 <k>:
       0:   07                      pop    %es
       1:   08 09                   or     %cl,(%ecx)

The first field of each line is the memory address. It is followed by the actual data bytes and then by an assembly language statement, which is meaningful only if the bytes were really meant to be instructions; in this case we can simply ignore the disassembly. Section abc contains three 1-byte constants and starts at address 0, which is what the objdump output tells us. What about section def? It too starts at address 0 and contains 3 bytes. This cannot be the final arrangement: when the program is actually loaded into memory, it is impossible for both sections to sit at the same address!

 

Let's make the problem a bit harder by writing another assembly language program, a2.s:

    .section abc
    j:
            .byte 4
            .byte 5
            .byte 6

    .section def
    l:
            .byte 10
            .byte 11
            .byte 12

And this is the listing produced by objdump:

    a2.o:     file format elf32-i386

    Disassembly of section abc:

    00000000 <j>:
       0:   04 05                   add    $0x5,%al
       2:   06                      push   %es

    Disassembly of section def:

    00000000 <l>:
       0:   0a 0b                   or     (%ebx),%cl
       2:   0c                      .byte 0xc

Again we see two sections, abc and def, each containing 3 bytes of data and each starting at address 0.

 

Combining these two object files into a single executable in such a way that the sections get non-overlapping addresses is the job of the linker. Many aspects of this merging process can be controlled precisely by text files called "linker scripts".

 

Let's say we want both abc sections to be merged into a single section in the output file and mapped to address 0x0; the two def sections should also be merged and placed immediately after the merged abc section. Here is a linker script that does this job (let's call it test.lds):

    SECTIONS
    {
            . = 0x0;
            abc : {
                    *(abc)
            }
            def : {
                    *(def)
            }
    }

Many linker scripts contain just this one command, SECTIONS, which tells the linker how to map input sections to output sections and where to place the output sections. The special symbol "." (a single dot) stands for the location counter. The linker finds the sections named abc in all the input object files and merges them into a single section in the output file, also called abc, which is placed at the current value of the location counter (0x0). The location counter is then incremented by the size of the combined section. The linker next reads the following part of the SECTIONS command, which instructs it to combine the sections named def in all the input object files into a single output section, also called def. Here is how you tell ld (the linker) to use this linker script during linking:

    ld a1.o a2.o -o a.out -T test.lds

And here is the output from objdump:

    Disassembly of section abc:

    00000000 <i>:
       0:   01 02                   add    %eax,(%edx)
       2:   03                      .byte 0x3

    00000003 <j>:
       3:   04 05                   add    $0x5,%al
       5:   06                      push   %es

    Disassembly of section def:

    00000006 <k>:
       6:   07                      pop    %es
       7:   08 09                   or     %cl,(%ecx)

    00000009 <l>:
       9:   0a 0b                   or     (%ebx),%cl
       b:   0c                      .byte 0xc
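The bookkeeping behind these addresses can be mimicked in a few lines of C. This is only a sketch of the idea of a location counter, not a description of how ld is implemented; the struct and function names are made up for illustration, and alignment is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one input section (names are illustrative only). */
struct input_section {
    const char *file;    /* object file it came from, e.g. "a1.o" */
    const char *name;    /* section name, e.g. "abc"              */
    size_t      size;    /* size in bytes                         */
    size_t      address; /* filled in by place_sections()         */
};

/*
 * Walk the sections in the order given, assign each one the current value
 * of the location counter, then advance the counter by the section's size.
 * Returns the final value of the location counter.
 */
size_t place_sections(struct input_section *secs, size_t count, size_t start)
{
    size_t dot = start;              /* plays the role of "." in the script */

    assert(secs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        secs[i].address = dot;
        dot += secs[i].size;
    }
    return dot;
}
```

Feeding it the four 3-byte sections in script order (abc from a1.o, abc from a2.o, def from a1.o, def from a2.o) with a start value of 0 yields the addresses 0, 3, 6 and 9, the same values objdump reports for i, j, k and l.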

 

 

Generating raw binary files                                                                                                          

When writing code for embedded microcontrollers, we often convert the a.out produced by the linker into some simpler format (say Intel HEX, Motorola S-record, or even plain binary). The objcopy command performs this conversion. Let's try applying it to our a.out:

    objcopy -j abc -j def -O binary a.out a.bin

The -j option instructs objcopy to copy the named section into the output file (a.bin); the -O binary option says that the output format is plain binary.

Let's check the contents of a.bin using od:

    od -t x1 a.bin

Here is the output:

    0000000 01 02 03 04 05 06 07 08 09 0a 0b 0c
    0000014

We have a simple memory dump of the two sections!

To check whether the order in which we specify the -j options has any effect, let's try:

    objcopy -j def -j abc -O binary a.out a.bin

The content of a.bin, as displayed by od, looks like this:

    0000000 01 02 03 04 05 06 07 08 09 0a 0b 0c
    0000014

The order of the -j options makes no difference: objcopy goes by the ordering of the sections in the input file.

 

Understanding VMA and LMA                                                                                                                 

Let's modify the linker script a little bit:

    SECTIONS
    {
            . = 0x0;
            abc : {
                    *(abc)
            }
            . = 100;
            def : {
                    *(def)
            }
    }

Now section def will be at address 100 (hex 64). This is verified by looking at the objdump output:

    Disassembly of section def:

    00000064 <k>:
      64:   07                      pop    %es
      65:   08 09                   or     %cl,(%ecx)

    00000067 <l>:
      67:   0a 0b                   or     (%ebx),%cl
      69:   0c                      .byte 0xc

Let's also check the output from objcopy, again displayed with od:

    0000000 01 02 03 04 05 06 00 00 00 00 00 00 00 00 00 00
    0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    *
    0000140 00 00 00 00 07 08 09 0a 0b 0c
    0000152

After the first six bytes (section abc) comes a run of zero bytes, and the next section (def) starts at offset octal 144 (decimal 100).
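One way to picture what the plain binary output contains is as a byte array that begins at the lowest section address, with every section copied to its own offset and the uncovered bytes left as zeros. The sketch below works under that assumption; `struct out_section` and `build_image` are hypothetical names and are not part of binutils.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one output section. */
struct out_section {
    size_t               addr;  /* address the section is placed at */
    const unsigned char *data;  /* section contents                 */
    size_t               size;  /* size in bytes                    */
};

/*
 * Lay the sections out in `image` the way a flat memory dump would:
 * offset 0 corresponds to the lowest section address, and bytes not
 * covered by any section stay zero. Returns the image length, or 0 if
 * the buffer is too small.
 */
size_t build_image(const struct out_section *secs, size_t count,
                   unsigned char *image, size_t capacity)
{
    assert(secs != NULL && image != NULL && count > 0);

    size_t lowest = secs[0].addr;
    size_t highest_end = secs[0].addr + secs[0].size;
    for (size_t i = 1; i < count; i++) {
        if (secs[i].addr < lowest)
            lowest = secs[i].addr;
        if (secs[i].addr + secs[i].size > highest_end)
            highest_end = secs[i].addr + secs[i].size;
    }

    size_t length = highest_end - lowest;
    if (length > capacity)
        return 0;

    memset(image, 0, length);                 /* gaps read back as zeros */
    for (size_t i = 0; i < count; i++)
        memcpy(image + (secs[i].addr - lowest), secs[i].data, secs[i].size);
    return length;
}
```

With abc (6 bytes) at address 0 and def (6 bytes) at address 100, the function returns 106, which is octal 152, the final offset printed by od.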

 

We now have a small problem. Let's say we are programming a flash-based microcontroller in which memory locations 0 to 99 are flash and locations from 100 upwards are static RAM. We wish to store section abc in flash and to have section def in RAM, in such a way that every time the system powers up it finds the contents of section def in RAM at location 100. Since RAM loses its contents when power is removed, the only way to do this is to store section def in flash too (say, right after section abc) and make sure that some code in section abc copies the data from flash to RAM (at location 100) every time the system restarts.

Let's modify the linker script once more:

    SECTIONS
    {
            . = 0x0;
            abc : {
                    *(abc)
            }
            . = 100;
            def : AT (ADDR(abc) + SIZEOF(abc)) {
                    *(def)
            }
    }

The output from objdump -D looks no different, but if we examine the binary file a.bin produced by objcopy, we find:

    0000000 01 02 03 04 05 06 07 08 09 0a 0b 0c
    0000014

Section def has been placed immediately after abc!

 

The idea is that every section has a VMA (virtual memory address) as well as an LMA (load memory address). By default the two are identical. With the AT keyword we can change a section's LMA, which is what we did above: we set the LMA of section def to the address of the byte immediately after the last byte of section abc.

 

objdump has an option, -h, that shows information about each section, including its VMA and LMA. Here is what we get when we try it on our a.out:

    a.out:     file format elf32-i386

    Sections:
    Idx Name          Size      VMA       LMA       File off  Algn
      0 abc           00000006  00000000  00000000  00001000  2**0
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      1 def           00000006  00000064  00000006  00001064  2**0
                      CONTENTS, ALLOC, LOAD, READONLY, DATA

Section abc has identical LMA and VMA (both 0), while section def (index 1) has VMA hex 64 and LMA 6.
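The copy that the startup code has to perform, from the LMA to the VMA, can be sketched in C. Here a plain array stands in for the microcontroller's address space, an assumption made so that the code can run on a PC; on real hardware the source and destination would be actual flash and RAM addresses. The function name and the 150-byte memory size are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MEM_SIZE 150   /* model: 0..99 stands for flash, 100..149 for RAM */

/*
 * Copy a section from its load address (LMA) to its run-time address (VMA).
 * `mem` models the whole address space as one array of MEM_SIZE bytes.
 */
void copy_section_to_vma(unsigned char *mem, size_t lma, size_t vma, size_t size)
{
    assert(mem != NULL);
    assert(lma + size <= MEM_SIZE && vma + size <= MEM_SIZE);
    memmove(mem + vma, mem + lma, size);   /* memmove tolerates overlap */
}
```

If the 12 bytes of a.bin sit at address 0 of the model, calling `copy_section_to_vma(mem, 0x06, 0x64, 6)` puts the bytes 07 to 0c at addresses 0x64 to 0x69, which is where the symbols k and l were linked.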

 

Note that, apart from recording where the section is to be loaded initially, the LMA plays no role in the working of the linker. All the addresses that we deal with in linker scripts are VMAs.

Let's do one more experiment. Say we wish to store the address of symbol k as the value of the third byte of section abc, counting from i. Here is the modified assembly language file a1.s:

    .section abc, "a"
    i:
            .byte 1
            .byte 2
            .byte k

    .section def, "a"
    k:
            .byte 7
            .byte 8
            .byte 9

Here is what objdump -D shows when applied to the a.out obtained by linking a1.o and a2.o:

    Disassembly of section abc:

    00000000 <i>:
       0:   01 02                   add    %eax,(%edx)
       2:   64                      fs

    00000003 <j>:
       3:   04 05                   add    $0x5,%al
       5:   06                      push   %es

    Disassembly of section def:

    00000064 <k>:
      64:   07                      pop    %es
      65:   08 09                   or     %cl,(%ecx)

    00000067 <l>:
      67:   0a 0b                   or     (%ebx),%cl
      69:   0c                      .byte 0xc

Note that the third byte in section abc has the value 0x64, which is the VMA of symbol k in section def.

 

Defining memory regions                                                                                                                 

In a typical flash-based microcontroller there are at least two distinct regions of memory: program text (machine code) is stored in flash, and run-time variables are stored in static RAM. Linker scripts have a simple notation for stating that different sections are to be mapped to distinct regions:

    MEMORY
    {
            flash (rx)  : ORIGIN = 0,   LENGTH = 100
            ram   (rwx) : ORIGIN = 100, LENGTH = 50
    }

    SECTIONS
    {
            abc : {
                    *(abc)
            } >flash
            def : AT (ADDR(abc) + SIZEOF(abc)) {
                    *(def)
            } >ram
    }

When the linker assigns addresses for section abc it uses the region flash (that is what the >flash at the end of the section definition does), and when it assigns addresses for section def it uses the region ram.
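Region-based placement can be pictured as each region keeping its own running position. The C sketch below models only that idea; `struct region` and `region_alloc` are hypothetical names, and the real linker does considerably more (it reports an error of its own when a section does not fit its region).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one MEMORY region. */
struct region {
    size_t origin;   /* ORIGIN: first address of the region */
    size_t length;   /* LENGTH: size of the region in bytes */
    size_t used;     /* bytes already handed out            */
};

/*
 * Reserve `size` bytes in region `r` and store the assigned address in
 * *addr. Returns false, leaving the region untouched, if the section
 * does not fit in the space that remains.
 */
bool region_alloc(struct region *r, size_t size, size_t *addr)
{
    assert(r != NULL && addr != NULL);
    if (size > r->length - r->used)
        return false;
    *addr = r->origin + r->used;
    r->used += size;
    return true;
}
```

With flash described as origin 0, length 100 and ram as origin 100, length 50, placing the 6-byte abc in flash gives address 0 and placing the 6-byte def in ram gives address 100, matching the VMAs seen earlier.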

 

Alignment restrictions                                                                                                                                    

Many processors have alignment restrictions enforced by the architecture; for example, reading a 4-byte object from a memory location whose address is not a multiple of 4 may generate a fault. We can instruct the linker to place sections at addresses that are multiples of 2, 4, 8, 16 and so on by using the ALIGN built-in function:

    SECTIONS
    {
            abc : {
                    *(abc)
            }
            def : {
                    . = ALIGN(8);
                    *(def)
            }
    }

The above linker script places the first symbol of section def at a memory location whose address is a multiple of 8. ALIGN returns the current location counter rounded up to the requested boundary. [Note: ALIGN only performs arithmetic on the location counter; it does not modify it. To actually change the location counter, we have to make an explicit assignment, as shown above.]
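The arithmetic that ALIGN performs is the usual round-up-to-a-power-of-two computation. A minimal C sketch follows; the helper name `align_up` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round `addr` up to the next multiple of `align`.
 * `align` must be a power of two, like the values 2, 4, 8 and 16.
 */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For instance, if the location counter is 6 when ALIGN(8) is evaluated, the result is 8; an address that is already a multiple of 8 is returned unchanged.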

 

 

Defining symbols within linker scripts                                                                                                        

When writing low-level startup code, it is sometimes necessary for our C or assembly language code to know where a particular section starts and ends. We can define symbols within the linker script to hold these values; the symbols can later be accessed from C code by declaring them as extern variables and taking their addresses.

    SECTIONS
    {
            abc : {
                    _sabc = .;
                    *(abc)
                    _eabc = .;
            }
            def : {
                    *(def)
            }
    }

The symbols _sabc (start of abc) and _eabc (end of abc) hold the starting and ending addresses of section abc.
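On the C side, such symbols are commonly declared as extern arrays so that the bare name evaluates to the address; the symbols have no storage of their own, and only their addresses carry information. The sketch below assumes the script above; `section_size` and the `HAVE_ABC_SYMBOLS` macro are hypothetical, the macro being there only so that the file still compiles when it is not linked with that script.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes between two section-boundary addresses. */
size_t section_size(const char *start, const char *end)
{
    assert(start != NULL && end != NULL && end >= start);
    return (size_t)(end - start);
}

/*
 * HAVE_ABC_SYMBOLS is a hypothetical macro: define it only when the
 * program is linked with the script above, because that script is what
 * provides _sabc and _eabc.
 */
#ifdef HAVE_ABC_SYMBOLS
extern char _sabc[];   /* address of the first byte of section abc    */
extern char _eabc[];   /* address just past the last byte of abc      */

size_t abc_size(void)
{
    return section_size(_sabc, _eabc);
}
#endif
```

The same pair of addresses is what a startup routine would use as the bounds of a copy or clear loop over a section.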

 

Standard section names for C programs                                                                                                                          

The C compiler translates C code into assembly language. Different logical parts of our C code are mapped to specifically named sections in the resulting assembly language program. Consider the C program given below:

    int j, k;
    int m = 23;

    int main(void)
    {
            int i = 12;
            char *s = "abcd";
    }

And the equivalent assembly language code (generated by running cc -S):

            .file   "a.c"
    .globl m
            .data
            .align 4
            .type   m, @object
            .size   m, 4
    m:
            .long   23
            .section        .rodata
    .LC0:
            .string "abcd"
            .text
    .globl main
            .type   main, @function
    main:
            leal    4(%esp), %ecx
            andl    $-16, %esp
            pushl   -4(%ecx)
            pushl   %ebp
            movl    %esp, %ebp
            pushl   %ecx
            subl    $20, %esp
            movl    $12, -12(%ebp)
            movl    $.LC0, -8(%ebp)
            addl    $20, %esp
            popl    %ecx
            popl    %ebp
            leal    -4(%ecx), %esp
            ret
            .size   main, .-main
            .comm   j,4,4
            .comm   k,4,4
            .ident  "GCC: (GNU) 4.3.2 20081105 (Red Hat 4.3.2-7)"
            .section        .note.GNU-stack,"",@progbits

The actual code of the program is stored in a section called .text, initialized global variables are stored in the .data section, and string constants are stored in the .rodata section.

Uninitialized globals are stored in what is called a "COMMON" area (see the .comm directives in the assembly code shown above). In a linked executable file, the COMMON objects are placed in the .bss section.
