An Analysis of the U-Boot Linker Script u-boot.lds


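u-boot.lds determines how the U-Boot executable image is linked, as well as the load address (load domain) and the execution address (run domain) of each section.

The GNU documentation gives the full form of a section description in a .lds file: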
SECTIONS {
...
secname start BLOCK(align) (NOLOAD) : AT ( ldadr )
{ contents } >region :phdr =fill
...
}
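secname and contents are mandatory: the former names the section, and the latter determines which parts of the code go into it. The keywords in this description are explained below.
secname: the section name.
contents: determines what goes into this section. It can be a whole object file or a particular section of an object file (code section, data section, and so on).
start: the relocation address of the section, that is, the address at which the section is linked (and runs). If the code contains position-dependent instructions, the section must be located at this address when the program runs. start can be written using any expression that describes an address.
AT(ldadr): defines the address at which the section is stored (loaded). Without this option, the load address equals the run address. With it, you can control where each section is stored in the output file.
Example: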
/* nand.lds */
SECTIONS {
first 0x00000000 : { head.o init.o }
second 0x30000000 : AT(4096) { main.o }
}
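Here head.o is placed starting at address 0x00000000 and init.o follows it. Their run address is also 0x00000000, so the link address and the storage address are the same (no AT is specified).
main.o is stored starting at 4096 (0x1000, the storage address given by AT), but its run address is 0x30000000. Before it runs, it must be copied from 0x1000 (the load address) to 0x30000000 (the run address), which means reading the flash and copying the program to the right place. This is the difference between the storage address and the run address, called the load-time domain and the run-time domain, and each can be specified separately in the .lds linker script.
Load address ---> the address of each section before the program runs
Run address ---> the address of each section while the program runs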
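The difference between the two addresses in the nand.lds example can be sketched in a few lines of Python. This is only an illustrative model: the dictionary layout and the helper names needs_copy and copy_section are assumptions made for the sketch, not anything the linker or U-Boot provides.

```python
# Hypothetical model of the nand.lds example: every section has a load
# address (where it is stored) and a run address (where it executes).
SECTIONS = {
    "first": {"load": 0x00000000, "run": 0x00000000},   # no AT: load == run
    "second": {"load": 0x00001000, "run": 0x30000000},  # AT(4096)
}


def needs_copy(section):
    """A section must be copied before it runs if load and run addresses differ."""
    return section["load"] != section["run"]


def copy_section(storage, memory, section, size):
    """Copy `size` bytes from the load address in `storage` to the run address in `memory`.

    Both `storage` and `memory` are plain dicts mapping address -> byte value.
    """
    for i in range(size):
        memory[section["run"] + i] = storage[section["load"] + i]
```

In this model only "second" needs the copy step, which matches the description above: main.o has to be moved from 0x1000 to 0x30000000 before it can run.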

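Once the .lds file is written, pass it to the arm-linux-ld link command with -Tfilename, for example
arm-linux-ld -Tnand.lds x.o y.o -o xy.o. You can also set the link address directly with the -Ttext option, for example
arm-linux-ld -Ttext 0x30000000 x.o y.o -o xy.o.
Since a program now has two kinds of address, the differences between jump instructions matter.
ARM assembly commonly uses two ways to jump: the b branch instruction, and an ldr instruction that assigns to PC.
Pay close attention to what these two instructions mean:
(1) b step: the b instruction is a relative jump. It depends on the current value of PC, and the offset is computed from bits [23:0] of the instruction itself. A program that uses b therefore does not depend on where the code is located; only the instruction itself matters.
(2) ldr pc, =step: this is a pseudo-instruction. The assembler stores the link-time address of step in a literal pool and turns the instruction into a PC-relative load from that pool, roughly:
ldr pc, [pc, #offset]
.word step /* literal pool entry holding the link address of step, e.g. 0x30008000 */
The load reads a word from memory and assigns it to PC. Locating the literal pool also depends on the current PC, but the value loaded is the link address (run-time address) of step, so this form can be used to jump from Flash to RAM.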
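(3) It is also worth revisiting the adr pseudo-instruction. The relocate code in U-Boot uses adr to determine whether the program is currently running from RAM or from flash:
relocate: /* relocate U-Boot to RAM */
adr r0, _start /* r0 is the current position of the code */
/* adr is a pseudo-instruction: the assembler computes the address of _start relative to the current PC, so at run time r0 receives the address at which _start actually resides.
When this code runs from flash, r0 = _start = 0. When it runs from RAM, r0 = _start = _TEXT_BASE (0x33F80000, set in board/smdk2410/config.mk, the start of the region in RAM to which U-Boot copies its code for execution) */
ldr r1, _TEXT_BASE /* test whether we are running from flash or from RAM */
/* after this instruction r1 is always 0x33F80000, because the value is fixed at link time */
cmp r0, r1 /* compare r0 and r1; do not relocate during debugging (already in RAM) */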
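To make the claim in item (1) concrete, here is a small Python sketch of how the 24-bit field of an ARM-state b instruction relates to the branch target. The function names are invented for illustration. The arithmetic follows the usual ARM rule that PC reads as the instruction address plus 8 and that the field counts words.

```python
def encode_b_offset(instr_addr, target_addr):
    """Return the 24-bit immediate of an ARM `b` at instr_addr that jumps to target_addr."""
    offset = target_addr - (instr_addr + 8)  # PC reads as instruction address + 8
    if offset % 4 != 0:
        raise ValueError("branch target must be word aligned")
    imm = offset >> 2  # the field counts 4-byte words
    if not -(1 << 23) <= imm < (1 << 23):
        raise ValueError("target out of range for b (about +/-32 MB)")
    return imm & 0xFFFFFF  # two's complement in bits [23:0]


def decode_b_target(instr_addr, imm24):
    """Recover the branch target from the instruction address and bits [23:0]."""
    if imm24 & 0x800000:  # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return instr_addr + 8 + (imm24 << 2)
```

The same field value results whether the pair of addresses sits at 0x0 in flash or at 0x33F80000 in RAM, as long as the distance between the instruction and the target is unchanged. That is why code using b keeps working wherever it is placed.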

Below is the u-boot.lds linker script from u-boot-1.3.4, with a brief analysis:


OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
/* the output executable is ELF, 32-bit ARM, little-endian */
/*OUTPUT_FORMAT("elf32-arm", "elf32-arm", "elf32-arm")*/
OUTPUT_ARCH(arm)   /* the target architecture of the output file is ARM */
ENTRY(_start)   /* the entry point of the executable image is the symbol _start */
SECTIONS
{
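    /* Sets the global entry point of the executable image, which is normally placed at address 0x0 in ROM (flash). The toolchain must know this address, and it is usually changed here */
    . = 0x00000000;   /* start address is 0x00000000 */
    . = ALIGN(4);     /* word alignment, i.e. 4-byte alignment */
    .text      :      /* code section */
    {
      cpu/arm920t/start.o    (.text)   /* first part of the code section */
      board/fs2410/lowlevel_init.o (.text)   /* second part of the code section, added by the author. Without it, the code of lowlevel_init.o was always linked beyond the first 4 kB, so start.S could not reach it when execution got there (note: the problem only occurs when booting from NAND flash). */
      *(.text)   /* the remaining code */
    }
    . = ALIGN(4);
    .rodata : { *(.rodata) }   /* read-only data section; all read-only data goes here */
    . = ALIGN(4);
    .data : { *(.data) }      /* read-write data section; all read-write data goes here */
    . = ALIGN(4);
    .got : { *(.got) }       /* the .got section: the Global Offset Table, a standard ELF section used by position-independent code */
    . = .;
    __u_boot_cmd_start = .;    /* set __u_boot_cmd_start to the current location, i.e. the start of the command table */
    .u_boot_cmd : { *(.u_boot_cmd) } /* the .u_boot_cmd section: all U-Boot command definitions are placed here. Because every command definition has the same size, a command can be found quickly by searching from __u_boot_cmd_start, and the function pointer stored in the definition is then called to carry out the user's request */
    __u_boot_cmd_end = .;     /* end of the .u_boot_cmd section. The length of this region is not strictly limited: users can add their own U-Boot commands, and they all end up here at link time. */
    . = ALIGN(4);
    __bss_start = .;               /* set __bss_start to the current location, i.e. the start of the bss section */
    .bss (NOLOAD) : { *(.bss) }    /* the bss section; NOLOAD means the section is not loaded and exists only in the run domain */
    _end = .;                      /* set _end to the current location, i.e. the end of the bss section */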
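}
The code in start.o above has both a load address and a run address of 0x00000000, but start.o contains U-Boot's self-copy and relocation routine. By the time start.o finishes, the whole of U-Boot has been copied to TEXT_BASE (0x33f80000) in memory, and execution continues with the following jump:
    ldr    pc, _start_armboot   /* Load the value stored at the label _start_armboot into pc, i.e. the address of the function start_armboot. At this point start_armboot must be in memory: it certainly lies beyond the first 4 kB, and code beyond the first 4 kB of NAND flash cannot be accessed directly, so it has to be read into memory first. Since the U-Boot code has already been copied and relocated to memory, the address loaded into pc is a RAM address, somewhere after 0x33f80000 */
_start_armboot:    .word start_armboot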
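The relocation decision described above (compare where the code currently sits with where it was linked, and copy only if they differ) can be modelled as follows. This is a simplified sketch, not U-Boot's code: memory is a Python dict, the image is assumed to start at _start, and the function names are made up for the example.

```python
TEXT_BASE = 0x33F80000  # link address quoted in the article


def adr_start(load_base):
    """Model of `adr r0, _start`: the address at which _start currently resides.

    Assumes _start is the first byte of the image.
    """
    return load_base


def must_relocate(load_base, text_base=TEXT_BASE):
    """Model of `ldr r1, _TEXT_BASE` followed by `cmp r0, r1`."""
    r0 = adr_start(load_base)
    r1 = text_base
    return r0 != r1


def relocate(memory, load_base, image_size, text_base=TEXT_BASE):
    """Copy the image to text_base unless it is already running from there."""
    if not must_relocate(load_base, text_base):
        return False
    for i in range(image_size):
        memory[text_base + i] = memory[load_base + i]
    return True
```

Running from flash (load_base 0) triggers the copy. Running from RAM at TEXT_BASE, for example when the image was downloaded there by a debugger, skips it.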

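The above is my own understanding. If anything is incorrect, corrections are welcome. Thank you!
Question: in what order are the object files linked at build time?
start.S is of course fixed in first position by the script above, but in what order are the remaining files linked? Is it the call relationship?

Original article: http://blog.csdn.net/sustzombie/archive/2009/12/19/5039061.aspx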

Linker Script Format

Linker scripts are text files.

You write a linker script as a series of commands. Each command is either a keyword, possibly followed by arguments, or an assignment to a symbol. You may separate commands using semicolons. Whitespace is generally ignored.

Strings such as file or format names can normally be entered directly. If the file name contains a character such as a comma which would otherwise serve to separate file names, you may put the file name in double quotes. There is no way to use a double quote character in a file name.

You may include comments in linker scripts just as in C, delimited by ‘/*’ and ‘*/’. As in C, comments are syntactically equivalent to whitespace.

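An executable image (img) file must have an entry point, and it can have only one global entry point. The address of this entry point is usually placed at 0x0 in ROM (Flash), so the toolchain must be told this entry address, and this is done by modifying the linker script.

Here we can try to analyze the linker script u-boot.lds of u-boot-1.1.6, using the copy in the u-boot-1.1.6/board/smdk2410/ directory.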

OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")

First, the GNU documentation's explanation of OUTPUT_FORMAT:

Commands Dealing with Object File Formats

A couple of linker script commands deal with object file formats.

- OUTPUT_FORMAT(bfdname)
- OUTPUT_FORMAT(default, big, little)

The OUTPUT_FORMAT command names the BFD format to use for the output file. Using OUTPUT_FORMAT(bfdname) is exactly like using ‘--oformat bfdname’ on the command line. If both are used, the command line option takes precedence.

You can use OUTPUT_FORMAT with three arguments to use different formats based on the ‘-EB’ and ‘-EL’ command line options. This permits the linker script to set the output format based on the desired endianness. If neither ‘-EB’ nor ‘-EL’ are used, then the output format will be the first argument, default. If ‘-EB’ is used, the output format will be the second argument, big. If ‘-EL’ is used, the output format will be the third argument, little.

For example, the default linker script for the MIPS ELF target uses this command:

OUTPUT_FORMAT(elf32-bigmips, elf32-bigmips, elf32-littlemips)

This says that the default format for the output file is ‘elf32-bigmips’, but if the user uses the ‘-EL’ command line option, the output file will be created in the ‘elf32-littlemips’ format.

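Note: BFD is a special library. The linker accesses object and archive files using the BFD libraries. These libraries allow the linker to use the same routines to operate on object files whatever the object file format.

- OUTPUT_FORMAT(DEFAULT,BIG,LITTLE): this line specifies the format of the output object file. There are three choices, and the default is the first, DEFAULT.

- If the command-line option -EB is given, the second BFD format is used; if -EL is given, the third BFD format is used. Otherwise the first BFD format is chosen.

- The three arguments specify the output executable format for the default, big-endian, and little-endian cases respectively. u-boot-1.1.6 specifies here (the default being the first, elf32-littlearm) that the output format is 32-bit ELF, little-endian, for the ARM architecture.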
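The three-argument selection rule is simple enough to write down directly. The sketch below only restates the rule in Python; select_output_format is a name invented here and is not part of ld.

```python
def select_output_format(default, big, little, endian_flag=None):
    """Pick the BFD format name the way OUTPUT_FORMAT(default, big, little) is described.

    endian_flag is "-EB", "-EL", or None when neither option is given.
    """
    if endian_flag == "-EB":
        return big
    if endian_flag == "-EL":
        return little
    if endian_flag is None:
        return default
    raise ValueError("expected '-EB', '-EL' or None")
```

For U-Boot's script all three arguments are "elf32-littlearm", so the result is the same whichever flag is passed. With the MIPS example from the manual, passing "-EL" selects "elf32-littlemips".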

OUTPUT_ARCH(arm)

First, the GNU documentation's explanation of OUTPUT_ARCH:

Other Linker Script Commands

OUTPUT_ARCH(bfdarch)

Specify a particular output machine architecture. The argument is one of the names used by the BFD library. You can see the architecture of an object file by using the objdump program with the ‘-f’ option.

Note: you can read the ld manual page with man -S 1 ld, which also covers these commands.

- Specifies that the platform of the output executable is ARM.

- OUTPUT_ARCH(BFDARCH): sets the machine architecture of the output file. BFDARCH is one of the names used by the BFD library. You can check it with the command objdump -f.

ENTRY(_start)

First, the GNU documentation's explanation of ENTRY:

Setting the Entry Point

The first instruction to execute in a program is called the entry point. You can use the ENTRY linker script command to set the entry point. The argument is a symbol name:

ENTRY(symbol)

There are several ways to set the entry point. The linker will set the entry point by trying each of the following methods in order, and stopping when one of them succeeds:

- the ‘-e’ entry command-line option;
- the ENTRY(symbol) command in a linker script;
- the value of the symbol start, if defined;
- the address of the first byte of the ‘.text’ section, if present;
- the address 0.

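Explanation:

ENTRY(SYMBOL): sets the entry address to the value of the symbol SYMBOL.

Entry point: the address, within the process address space, of the first user-space instruction that the process executes.

ld has several ways to set the entry address, tried in the following order (the earlier the item, the higher its priority):

1. the -e option on the ld command line
2. the ENTRY(SYMBOL) command in the linker script
3. the value of the symbol start, if it is defined
4. the address of the first byte of the .text section, if it exists
5. the value 0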
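The priority order in the list above can be expressed as a short fallback chain. This is a simplified sketch: resolve_entry and its parameters are hypothetical, the symbol table is a plain dict, and an undefined symbol name is simply skipped, which glosses over the warnings a real linker would print.

```python
def resolve_entry(symbols, cmdline_entry=None, script_entry=None, text_start=None):
    """Return the entry address following the five-step order listed above.

    symbols: dict mapping symbol name -> address.
    cmdline_entry: symbol named by the -e option, if any.
    script_entry: symbol named by ENTRY(...) in the linker script, if any.
    text_start: address of the first byte of .text, or None if there is no .text.
    """
    # Steps 1-3: -e option, ENTRY() command, then the symbol `start`.
    for name in (cmdline_entry, script_entry, "start"):
        if name is not None and name in symbols:
            return symbols[name]
    # Step 4: first byte of the .text section.
    if text_start is not None:
        return text_start
    # Step 5: fall back to address 0.
    return 0
```

With U-Boot's ENTRY(_start) and no -e option, the second step wins and the entry address is whatever _start was linked to.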

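Note: ENTRY(_start) here specifies the entry address used at startup. _start is defined in start.S under each CPU directory. The actual run address at startup is set at build time by the TEXT_BASE macro in /u-boot-1.1.6/board/smdk2410/config.mk, namely TEXT_BASE = 0x33F80000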

Before looking at SECTIONS, let us read the official explanation of SECTIONS and an example:

Simple Linker Script Example

Many linker scripts are fairly simple.

The simplest possible linker script has just one command: ‘SECTIONS’. You use the ‘SECTIONS’ command to describe the memory layout of the output file.

The ‘SECTIONS’ command is a powerful command. Here we will describe a simple use of it.

Let’s assume your program consists only of code, initialized data, and uninitialized data. These will be in the ‘.text’ (code section), ‘.data’ (data section), and ‘.bss’ (uninitialized data section) sections, respectively. Let’s assume further that these are the only sections which appear in your input files.

For this example, let’s say that the code should be loaded at address 0x10000, and that the data should start at address 0x8000000. Here is a linker script which will do that:

SECTIONS
{
  . = 0x10000;
  .text : { *(.text) }
  . = 0x8000000;
  .data : { *(.data) }
  .bss : { *(.bss) }
}

You write the ‘SECTIONS’ command as the keyword ‘SECTIONS’, followed by a series of symbol assignments and output section descriptions enclosed in curly braces.

The first line inside the ‘SECTIONS’ command of the above example sets the value of the special symbol ‘.’, which is the location counter. If you do not specify the address of an output section in some other way (other ways are described later), the address is set from the current value of the location counter. The location counter is then incremented by the size of the output section. At the start of the ‘SECTIONS’ command, the location counter has the value ‘0’.

The second line defines an output section, ‘.text’. The colon is required syntax which may be ignored for now. Within the curly braces after the output section name, you list the names of the input sections which should be placed into this output section. The ‘*’ is a wildcard which matches any file name. The expression ‘*(.text)’ means all ‘.text’ input sections in all input files.

Since the location counter is ‘0x10000’ when the output section ‘.text’ is defined, the linker will set the address of the ‘.text’ section in the output file to be ‘0x10000’.

The remaining lines define the ‘.data’ and ‘.bss’ sections in the output file. The linker will place the ‘.data’ output section at address ‘0x8000000’. After the linker places the ‘.data’ output section, the value of the location counter will be ‘0x8000000’ plus the size of the ‘.data’ output section. The effect is that the linker will place the ‘.bss’ output section immediately after the ‘.data’ output section in memory.

The linker will ensure that each output section has the required alignment, by increasing the location counter if necessary. In this example, the specified addresses for the ‘.text’ and ‘.data’ sections will probably satisfy any alignment constraints, but the linker may have to create a small gap between the ‘.data’ and ‘.bss’ sections.

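Note: the following explains the example above.

This script places the text section of the output file at 0x10000 and the data section at 0x8000000:

SECTIONS
{
  . = 0x10000;
  .text : { *(.text) }
  . = 0x8000000;
  .data : { *(.data) }
  .bss : { *(.bss) }
}

Explanation of the example:

- . = 0x10000: sets the location counter to 0x10000 (if not set, its initial value is 0).
- .text : { *(.text) }: merges the .text sections of all input files (* stands for any input file) into one .text section, whose address is given by the location counter, i.e. 0x10000.
- . = 0x8000000: sets the location counter to 0x8000000.
- .data : { *(.data) }: merges the .data sections of all input files into one .data section, whose address is set to 0x8000000.
- .bss : { *(.bss) }: merges the .bss sections of all input files into one .bss section, whose address is set to 0x8000000 plus the size of the .data section.

Each time the linker finishes processing a section description, it increases the location counter by the size of that section (alignment constraints are ignored here for now).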
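The way the location counter advances through this example can be simulated in a few lines. The sketch below is a toy model under stated assumptions: the script is given as a list of ("set", address) and ("section", name) steps, the section sizes are made-up inputs, and alignment is ignored, as in the explanation above.

```python
def layout(statements, sizes):
    """Walk a simplified SECTIONS body and return {section name: start address}.

    statements: list of ("set", address) or ("section", name) tuples.
    sizes: dict mapping section name -> size in bytes (illustrative input).
    """
    dot = 0  # the location counter starts at 0
    addresses = {}
    for kind, value in statements:
        if kind == "set":          # ". = value;"
            dot = value
        elif kind == "section":    # "name : { ... }"
            addresses[value] = dot
            dot += sizes[value]    # advance by the size of the output section
        else:
            raise ValueError("unknown statement kind: %r" % (kind,))
    return addresses
```

Feeding it the five steps of the example with a .data size of 0x100 places .bss at 0x8000100, directly after .data.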

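Now let us analyze SECTIONS:

SECTIONS

{

. = 0x00000000;

- The dot "." here is the location counter (a typical GNU convention).
- Sets the location counter to 0x00000000 (if not set, its initial value is 0).
- Specifies that the system starts from offset zero. Note that this is only an offset for the code addresses; the real start address is specified by the flags given at build time (TEXT_BASE).

. = ALIGN(4);

- Adjusts to 4-byte alignment. Likewise, ALIGN(0x10) would align to 16 bytes.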

Now the official explanation:

ALIGN(exp)

Return the location counter (.) aligned to the next exp boundary. ALIGN doesn’t change the value of the location counter; it just does arithmetic on it. Here is an example which aligns the output .data section to the next 0x2000 byte boundary after the preceding section and sets a variable within the section to the next 0x8000 boundary after the input sections:

SECTIONS { ...
  .data ALIGN(0x2000) : {
    *(.data)
    variable = ALIGN(0x8000);
  }
... }

The first use of ALIGN in this example specifies the location of a section because it is used as the optional address attribute of a section definition (see Section 3.6.3 [Output Section Address], page 37). The second use of ALIGN is used to define the value of a symbol.

The builtin function NEXT is closely related to ALIGN.

NEXT(exp)

Return the next unallocated address that is a multiple of exp. This function is closely related to ALIGN(exp); unless you use the MEMORY command to define discontinuous memory for the output file, the two functions are equivalent.
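The arithmetic that ALIGN performs on the location counter is the usual round-up-to-boundary computation. A minimal Python sketch follows, assuming the boundary is a power of two (as in the 4, 0x10, 0x2000 and 0x8000 cases discussed here); align_up is a name chosen for the example.

```python
def align_up(value, boundary):
    """Round value up to the next multiple of boundary (a power of two)."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a power of two")
    return (value + boundary - 1) & ~(boundary - 1)
```

An address that is already aligned is returned unchanged, which is why a repeated `. = ALIGN(4);` in the script costs nothing when the previous section happens to end on a word boundary.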

For more on byte alignment, see this blog post:

http://www.yuanma.org/data/2006/0723/article_1213.htm

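.text :

{

cpu/arm920t/start.o (.text) /* the .text of start.o comes first in the .text section */

*(.text) /* the .text sections of the remaining files follow */

}

This part of the script places the .text section of cpu/arm920t/start.o first and then merges the .text sections of all other input files into one .text section, whose address is given by the location counter (its value after alignment).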

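. = ALIGN(4);

.rodata : { *(.rodata) } /* .rodata: read-only data section */

This part of the script first aligns to 4 bytes and then merges the .rodata sections of all input files into one .rodata section, whose address is given by the location counter (its value after alignment).

. = ALIGN(4);

.data : { *(.data) } /* .data: read-write data section */

Following the explanations above, this part can be worked out in the same way.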

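. = ALIGN(4);

.got : { *(.got) } /* .got is the Global Offset Table, a standard ELF section used by position-independent code */

. = .; /* I have not figured out why this is done */

__u_boot_cmd_start = .;

/* assign the current location to __u_boot_cmd_start, marking the start of the .u_boot_cmd section */

.u_boot_cmd : { *(.u_boot_cmd) }

__u_boot_cmd_end = .;

/* assign the current location to __u_boot_cmd_end, marking the end of the .u_boot_cmd section */

armboot_end_data = .; /* the armboot_end_data symbol marks the end of all sections allocated so far */

. = ALIGN(4);

__bss_start = .;

/* start of the .bss section */

.bss : { *(.bss) }

_end = .;

/* end of the .bss section */

}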
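The role of __u_boot_cmd_start and __u_boot_cmd_end is easier to see with a model of the command table: a run of equal-sized records that is scanned from the start marker to the end marker. The Python sketch below uses a made-up record layout (a 16-byte name plus a 32-bit handler id). U-Boot's real command structure holds more fields and a function pointer, so treat every name here as hypothetical.

```python
import struct

# Hypothetical fixed-size record: 16-byte name, 32-bit handler id.
ENTRY = struct.Struct("<16sI")


def build_table(commands):
    """Pack (name, handler_id) pairs back to back, as the linker packs .u_boot_cmd."""
    return b"".join(ENTRY.pack(name.encode(), hid) for name, hid in commands)


def find_cmd(table, name):
    """Scan the table from start to end in ENTRY.size steps and return the handler id."""
    for offset in range(0, len(table), ENTRY.size):
        raw_name, handler_id = ENTRY.unpack_from(table, offset)
        if raw_name.rstrip(b"\0").decode() == name:
            return handler_id
    return None
```

Because every record has the same size, the scan needs only the two boundary symbols and the record size. Adding a command just makes the table one record longer, which is why the section has no fixed length.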

Finally, the official explanation of the location counter:

The Location Counter

The special linker variable dot ‘.’ always contains the current output location counter. Since the . always refers to a location in an output section, it may only appear in an expression within a SECTIONS command. The . symbol may appear anywhere that an ordinary symbol is allowed in an expression.

Assigning a value to . will cause the location counter to be moved. This may be used to create holes in the output section. The location counter may never be moved backwards.

SECTIONS
{
  output :
  {
    file1(.text)
    . = . + 1000;
    file2(.text)
    . += 1000;
    file3(.text)
  } = 0x12345678;
}

In the previous example, the ‘.text’ section from ‘file1’ is located at the beginning of the output section ‘output’. It is followed by a 1000 byte gap. Then the ‘.text’ section from ‘file2’ appears, also with a 1000 byte gap following before the ‘.text’ section from ‘file3’. The notation ‘= 0x12345678’ specifies what data to write in the gaps (see Section 3.6.8.5 [Output Section Fill], page 45).

Note: . actually refers to the byte offset from the start of the current containing object. Normally this is the SECTIONS statement, whose start address is 0, hence . can be used as an absolute address. If . is used inside a section description however, it refers to the byte offset from the start of that section, not an absolute address. Thus in a script like this:

SECTIONS
{
  . = 0x100;
  .text : {
    *(.text)
    . = 0x200;
  }
  . = 0x500;
  .data : {
    *(.data)
    . += 0x600;
  }
}

The ‘.text’ section will be assigned a starting address of 0x100 and a size of exactly 0x200 bytes, even if there is not enough data in the ‘.text’ input sections to fill this area. (If there is too much data, an error will be produced because this would be an attempt to move . backwards). The ‘.data’ section will start at 0x500 and it will have an extra 0x600 bytes worth of space after the end of the values from the ‘.data’ input sections and before the end of the ‘.data’ output section itself.
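The two in-section uses of the location counter in this last example lead to different size calculations, which a short sketch can make explicit. The helper names and the input sizes used below are illustrative assumptions; only the rules themselves come from the quoted text.

```python
def size_after_absolute_dot(input_size, dot_value):
    """Size of an output section that ends with `. = dot_value;` inside its braces.

    Inside a section, dot is an offset from the section start, so the section
    ends up exactly dot_value bytes long. Moving dot backwards is an error.
    """
    if dot_value < input_size:
        raise ValueError("cannot move the location counter backwards")
    return dot_value


def size_after_increment(input_size, increment):
    """Size of an output section that ends with `. += increment;` inside its braces."""
    return input_size + increment
```

With 0x150 bytes of input, the .text section of the example is padded to exactly 0x200 bytes. With 0x40 bytes of input, the .data section becomes 0x640 bytes long.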