Chapter 7-02

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 13:33

Please indicate the source: http://blog.csdn.net/gaoxiangnumber1
7.6.2 Linking with Static Libraries
All compilation systems provide a mechanism for packaging related object modules into a single file called a static library, which can then be supplied as input to the linker. When it builds the output executable, the linker copies only the object modules in the library that are referenced by the application program.
Related functions can be compiled into separate object modules and then packaged in a single static library file. Application programs can then use any of the functions defined in the library by specifying a single file name on the command line.
For example, a program that uses functions from the standard C library and the math library could be compiled and linked with a command of the form:
unix> gcc main.c /usr/lib/libm.a /usr/lib/libc.a
At link time, the linker will only copy the object modules that are referenced by the program, which reduces the size of the executable on disk and in memory. On the other hand, the application programmer only needs to include the names of a few library files.
On Unix systems, static libraries are stored on disk in a file format known as an archive. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. Archive filenames are denoted with the .a suffix.
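To make the archive layout concrete, here is a minimal sketch that walks an archive image held in memory and lists its members. It assumes the common Unix ar layout (an 8-byte "!<arch>\n" magic string, then a 60-byte text header before each member, with member data padded to an even offset). The function name list_archive is hypothetical, and the sketch ignores extensions such as long-name tables.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define AR_MAGIC     "!<arch>\n"  /* 8-byte global header */
#define AR_MAGIC_LEN 8
#define AR_HDR_LEN   60           /* fixed-size header before each member */

/* Walk an archive image in memory and print the name field and size of
 * each member. Returns the member count, or -1 if the image is malformed.
 * Assumed header layout: name at bytes 0-15, size (decimal text) at
 * bytes 48-57, and the terminator "`\n" at bytes 58-59. */
int list_archive(const unsigned char *buf, size_t len)
{
    size_t pos = AR_MAGIC_LEN;
    int count = 0;

    assert(buf != NULL);
    if (len < AR_MAGIC_LEN || memcmp(buf, AR_MAGIC, AR_MAGIC_LEN) != 0)
        return -1;

    while (pos + AR_HDR_LEN <= len) {
        const unsigned char *hdr = buf + pos;
        char name[17], size_field[11];
        unsigned long size;

        if (hdr[58] != '`' || hdr[59] != '\n')
            return -1;

        memcpy(name, hdr, 16);
        name[16] = '\0';
        memcpy(size_field, hdr + 48, 10);
        size_field[10] = '\0';
        size = strtoul(size_field, NULL, 10);

        printf("%-16s %lu bytes\n", name, size);
        count++;

        /* Member data follows the header, padded to an even offset. */
        pos += AR_HDR_LEN + size + (size & 1);
    }
    return count;
}
```

To try it on a real library, read the whole .a file into a buffer and pass the buffer and its length. In archives built with the s option of ar, the first member is typically a symbol index rather than an object file.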
Suppose that we want to provide the vector routines in Figure 7.5 in a static library called libvector.a.

To create the library, we would use the ar tool as follows:
unix> gcc -c addvec.c multvec.c
unix> ar rcs libvector.a addvec.o multvec.o
To use the library, we write main2.c in Figure 7.6, which invokes the addvec library routine. (The file vector.h defines the function prototypes for the routines in libvector.a.)
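Figures 7.5 and 7.6 are not reproduced in these notes. As a stand-in, here is a minimal sketch of what the two library routines could look like. It assumes each routine takes two input arrays, an output array, and a length; these signatures are an assumption and are not copied from the figures.

```c
#include <assert.h>

/* addvec.c (sketch): element-wise sum, z[i] = x[i] + y[i] */
void addvec(int *x, int *y, int *z, int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

/* multvec.c (sketch): element-wise product, z[i] = x[i] * y[i] */
void multvec(int *x, int *y, int *z, int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        z[i] = x[i] * y[i];
}

/* A caller such as main2.c would include vector.h and then do, e.g.:
 *     int x[2] = {1, 2}, y[2] = {3, 4}, z[2];
 *     addvec(x, y, z, 2);      // z now holds {4, 6}
 */
```

In the real library each routine lives in its own source file, so each ends up in its own object module. That separation is what lets the linker copy addvec.o without also copying multvec.o.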

To build the executable, we would compile and link the input files main2.o and libvector.a:
unix> gcc -c main2.c
unix> gcc -static -o p2 main2.o ./libvector.a

The -static argument tells the compiler driver that the linker should build a fully linked executable object file that can be loaded into memory and run without any further linking at load time.
When the linker runs, it determines that the addvec symbol defined by addvec.o is referenced by main2.o, so it copies addvec.o into the executable. Since the program doesn’t reference any symbols defined by multvec.o, the linker does not copy this module into the executable. The linker also copies the printf.o module from libc.a, along with a number of other modules from the C run-time system.
7.6.3 How Linkers Use Static Libraries to Resolve References
During the symbol resolution phase, the linker scans the relocatable object files and archives left to right in the same sequential order that they appear on the compiler driver’s command line. The driver automatically translates any .c files on the command line into .o files.
During this scan, the linker maintains:
A set E of relocatable object files that will be merged to form the executable;
A set U of unresolved symbols (i.e., symbols referred to, but not yet defined);
A set D of symbols that have been defined in previous input files.
Initially, E, U, and D are empty.
1. For each input file f on the command line, the linker determines if f is an object file or an archive.
2. If f is an object file, the linker adds f to E, updates U and D to reflect the symbol definitions and references in f, and proceeds to the next input file.
3. If f is an archive, the linker attempts to match the unresolved symbols in U against the symbols defined by the members of the archive. If some archive member, m, defines a symbol that resolves a reference in U, then m is added to E, and the linker updates U and D to reflect the symbol definitions and references in m. This process iterates over the member object files in the archive until U and D no longer change. At this point, any member object files not contained in E are simply discarded and the linker proceeds to the next input file.
4. If U is nonempty when the linker finishes scanning the input files on the command line, it prints an error and terminates. Otherwise, it merges and relocates the object files in E to build the output executable file.
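The scan can be sketched as a toy simulation. This is not how a real linker stores its state: here each symbol is one bit in a mask, E is only counted, and the names module, input_file, and resolve are hypothetical. It exists only to show how U and D evolve as the inputs are read left to right.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int symset;   /* toy model: one bit per symbol */

typedef struct {
    symset defs;               /* symbols this module defines */
    symset refs;               /* symbols this module references */
} module;

typedef struct {
    int is_archive;            /* 0 = object file, 1 = archive */
    const module *members;     /* an object file has exactly one */
    size_t count;              /* assumed <= 32 in this sketch */
} input_file;

/* Add module m to E (only counted here) and update U and D. */
static void take(const module *m, symset *U, symset *D, int *in_E)
{
    *D |= m->defs;
    *U = (*U | m->refs) & ~*D;
    (*in_E)++;
}

/* Scan the inputs left to right. Returns the final U: zero means every
 * reference was resolved, nonzero holds the unresolved symbols. */
symset resolve(const input_file *files, size_t nfiles, int *in_E)
{
    symset U = 0, D = 0;
    size_t f, i;

    assert(in_E != NULL);
    *in_E = 0;
    for (f = 0; f < nfiles; f++) {
        const input_file *in = &files[f];
        unsigned int taken = 0;    /* archive members already in E */
        int changed = 1;

        if (!in->is_archive) {
            take(&in->members[0], &U, &D, in_E);
            continue;
        }
        /* Archive: keep pulling members until U and D stop changing. */
        while (changed) {
            changed = 0;
            for (i = 0; i < in->count; i++) {
                if (!(taken & (1u << i)) && (in->members[i].defs & U)) {
                    take(&in->members[i], &U, &D, in_E);
                    taken |= 1u << i;
                    changed = 1;
                }
            }
        }
    }
    return U;
}
```

Modeling main2.o as a module that references addvec and libvector.a as an archive with two members, the order main2.o, libvector.a leaves U empty with two modules in E. Reversing the order leaves addvec in U, because the archive is scanned while U is still empty.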
This algorithm can result in some link-time errors because the ordering of libraries and object files on the command line is significant. If the library that defines a symbol appears on the command line before the object file that references that symbol, then the reference will not be resolved and linking will fail.
E.g.:
unix> gcc -static ./libvector.a main2.c
/tmp/cc9XH6Rp.o: In function ‘main’:
/tmp/cc9XH6Rp.o(.text+0x18): undefined reference to ‘addvec’
Reason: When libvector.a is processed, U is empty, so no member object files from libvector.a are added to E. Thus, the reference to addvec is never resolved and the linker emits an error message and terminates.
The general rule for libraries is to place them at the end of the command line. If the members of the different libraries are independent, then the libraries can be placed at the end of the command line in any order.
If the libraries are dependent, then they must be ordered so that for each symbol s that is referenced by a member of an archive, at least one definition of s follows a reference to s on the command line.
Suppose foo.c calls functions in libx.a and libz.a that call functions in liby.a. Then libx.a and libz.a must precede liby.a on the command line:
unix> gcc foo.c libx.a libz.a liby.a
Libraries can be repeated on the command line if necessary to satisfy the dependence requirements. Suppose foo.c calls a function in libx.a that calls a function in liby.a that calls a function in libx.a. Then libx.a must be repeated on the command line:
unix> gcc foo.c libx.a liby.a libx.a

7.7 Relocation
Once the linker has completed the symbol resolution step, it has associated each symbol reference in the code with exactly one symbol definition (i.e., a symbol table entry in one of its input object modules). At this point, the linker knows the exact sizes of the code and data sections in its input object modules. It is now ready to begin the relocation step, where it merges the input modules and assigns run-time addresses to each symbol.
Relocation consists of two steps:
1. Relocating sections and symbol definitions.
1) The linker merges all sections of the same type into a new aggregate section of the same type. For example, the .data sections from the input modules are all merged into one section that will become the .data section for the output executable object file.
2) The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules. When this step is complete, every instruction and global variable in the program has a unique run-time memory address.
2. Relocating symbol references within sections.
The linker modifies every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses. To perform this step, the linker relies on data structures in the relocatable object modules known as relocation entries.
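The first step can be sketched with a toy layout routine. It assumes only two section types, no alignment padding, and an aggregate .data section placed directly after the aggregate .text section. Real linkers follow more elaborate layout rules, and the names module_layout and layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned int text_size, data_size;  /* section sizes in this module */
    unsigned int text_addr, data_addr;  /* run-time addresses, filled in */
} module_layout;

/* Merge sections of the same type and assign run-time addresses.
 * Returns the first address past the laid-out sections. */
unsigned int layout(module_layout *mods, size_t n, unsigned int text_base)
{
    unsigned int addr = text_base;
    size_t i;

    assert(mods != NULL || n == 0);
    for (i = 0; i < n; i++) {       /* aggregate .text section */
        mods[i].text_addr = addr;
        addr += mods[i].text_size;
    }
    for (i = 0; i < n; i++) {       /* aggregate .data section */
        mods[i].data_addr = addr;
        addr += mods[i].data_size;
    }
    return addr;
}
```

Once each module's sections have addresses, the run-time address of a symbol is the address of its section within that module plus the symbol's offset in the section.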
7.7.1 Relocation Entries

When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory. Nor does it know the locations of any externally defined functions or global variables that are referenced by the module. So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference when it merges the object file into an executable.
Relocation entries for code are placed in .rel.text. Relocation entries for initialized data are placed in .rel.data.

The offset is the section offset of the reference that will need to be modified.
The symbol identifies the symbol that the modified reference should point to.
The type tells the linker how to modify the new reference.
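The figure showing the entry format is not reproduced in these notes. As a rough sketch, assuming the 32-bit ELF layout in which the symbol index and the type are packed into a single 32-bit info word (symbol in the upper 24 bits, type in the low 8 bits), an entry could be modeled like this. The names rel_entry, make_rel, rel_symbol, and rel_type are hypothetical helpers, not part of any system header.

```c
#include <assert.h>
#include <stdint.h>

enum { R_386_32 = 1, R_386_PC32 = 2 };  /* the two types discussed here */

typedef struct {
    uint32_t r_offset;  /* section offset of the reference to modify */
    uint32_t r_info;    /* symbol index (upper 24 bits) | type (low 8 bits) */
} rel_entry;

/* Build an entry from its three logical fields. */
rel_entry make_rel(uint32_t offset, uint32_t symbol, uint8_t type)
{
    rel_entry r;

    assert(symbol < (1u << 24));    /* must fit in 24 bits */
    r.r_offset = offset;
    r.r_info = (symbol << 8) | type;
    return r;
}

/* Symbol table index that the modified reference should point to. */
uint32_t rel_symbol(const rel_entry *r)
{
    return r->r_info >> 8;
}

/* Relocation type that tells the linker how to modify the reference. */
uint8_t rel_type(const rel_entry *r)
{
    return (uint8_t)(r->r_info & 0xff);
}
```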
ELF defines 11 different relocation types. We are concerned with the two most basic relocation types:
1.R_386_PC32: Relocate a reference that uses a 32-bit PC-relative address.
A PC-relative address is an offset from the current run-time value of the program counter (PC). When the CPU executes an instruction using PC-relative addressing, it forms the effective address by adding the 32-bit value encoded in the instruction to the current run-time value of the PC, which is the address of the next instruction in memory.
2.R_386_32: Relocate a reference that uses a 32-bit absolute address.
With absolute addressing, the CPU directly uses the 32-bit value encoded in the instruction as the effective address, without modifications.
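The difference between the two types can be sketched as arithmetic. The sketch assumes the linker already knows the run-time address of the target symbol and of the 4-byte field being patched, and that the assembler left an initial value (an addend) in that field. The function names and the addresses in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { R_386_32 = 1, R_386_PC32 = 2 };

/* Value the linker would store into the 4-byte reference.
 *   sym_addr: run-time address assigned to the target symbol
 *   ref_addr: run-time address of the field being patched
 *   addend:   value the assembler left in the field */
uint32_t relocated_value(int type, uint32_t sym_addr, uint32_t ref_addr,
                         int32_t addend)
{
    assert(type == R_386_32 || type == R_386_PC32);
    if (type == R_386_PC32)
        return sym_addr + (uint32_t)addend - ref_addr;  /* offset from PC */
    return sym_addr + (uint32_t)addend;                 /* absolute address */
}

/* What the CPU does with a PC-relative operand at run time: add the
 * encoded value to the address of the next instruction. */
uint32_t pc_relative_target(uint32_t next_pc, uint32_t encoded)
{
    return next_pc + encoded;
}
```

For instance, with a target at 0x80483c8, a reference field at 0x80483bb, and an addend of -4, the PC-relative value works out to 0x9. The instruction after the 4-byte field starts at 0x80483bf, and 0x80483bf + 0x9 gives back 0x80483c8. The addend of -4 compensates for the PC pointing past the field rather than at it.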
7.7.2 Relocating Symbol References (can’t understand now)

0 0