Shared Library in X86/ARM


Reference articles:

http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/


http://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models/

http://eli.thegreenplace.net/2011/11/11/position-independent-code-pic-in-shared-libraries-on-x64/


X86 and X64

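A shared library built without -fpic relies on load-time (dynamic) relocation. This works on X86 and adds no run-time penalty once the library is loaded; on X64, GCC generally accepts non-PIC shared-library code only with -mcmodel=large. The cost is paid at load time: the dynamic loader has to walk every relocation entry and patch each data and function reference with its actual address. Because the patched .text section differs from process to process, it cannot be shared between processes that use the same library. (The .data section is copied for each process in either case.)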


PIC makes the code of a shared library reusable across processes. A Global Offset Table (GOT) records the actual virtual address of each external data item or function; function calls additionally go through the Procedure Linkage Table (PLT).

 data = [GOT[data-index]]

 function = call PLT -> jmp GOT[function-index]
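The two access patterns above can be mimicked in plain C. This is only a toy model, not what the linker really emits: `toy_got`, `toy_load`, `read_counter`, `plt_add_one`, `shared_counter` and `add_one` are hypothetical names invented for the illustration, and the "loader" is an ordinary function that fills in the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for symbols defined in some other module. */
static int shared_counter = 41;
static int add_one(int x) { return x + 1; }

typedef int (*unary_fn)(int);

/* Toy GOT: one slot per external symbol, empty until "load time". */
struct toy_got {
    int     *counter;  /* data slot     */
    unary_fn add_one;  /* function slot */
};

static struct toy_got got;

/* Plays the role of the dynamic loader: writes the real addresses
   into the table once, so the code itself never needs patching. */
void toy_load(void) {
    got.counter = &shared_counter;
    got.add_one = add_one;
}

/* data = [GOT[data-index]]: fetch the pointer, then load through it. */
int read_counter(void) {
    assert(got.counter != NULL);
    return *got.counter;
}

/* function = call PLT -> jmp GOT[function-index]:
   the stub only forwards to whatever address the slot holds. */
int plt_add_one(int x) {
    assert(got.add_one != NULL);
    return got.add_one(x);
}
```

Every access costs one extra load compared with using the address directly, which is exactly the overhead of the real scheme.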

This indirection lengthens the path: the extra indirect jump costs additional memory references and increases register-allocation pressure, especially on X86, which has only about 6 general-purpose registers available most of the time.

More bad news: X86 cannot reference the EIP register directly, so data cannot be addressed in the [EIP + offset] form. Before PIC code can address anything, it must first fetch the value of EIP with a trick:

    call TMPLABEL
TMPLABEL:
    pop ebx

The ebx register is then tied up holding this address, unless the code regenerates the EIP value each time it is needed.

Things become more interesting on X64. First, RIP (the 64-bit EIP) can be referenced directly, which makes PIC data references and function calls easy. However, a PC-relative call is limited to a signed 32-bit offset (+-2GB), like any near call. No call instruction can encode a full 64-bit address, so a call to a target more than 2GB away is implemented by first moving the imm64 value into a register and calling through that register, or by calling indirectly through an address stored in memory. The reason is that mov (written movabs in AT&T syntax) is the only instruction able to encode a full 64-bit immediate.

Accordingly, GCC provides three code models:

 -mcmodel=small : code and data are assumed to lie within +-2GB, so RIP-relative references are always used

 -mcmodel=large : no assumption about distance; references use full 64-bit (imm64) values

 -mcmodel=medium: code stays within 2GB and uses RIP-relative references; only large data objects, which may lie beyond 2GB, use imm64 references
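The +-2GB limit behind these code models comes down to a little arithmetic. The sketch below checks whether a target is reachable with the 5-byte `call rel32` encoding (opcode E8 plus a 4-byte displacement measured from the next instruction). The function name `fits_call_rel32` is made up for this example, and it assumes the usual two's-complement conversion from unsigned to signed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Length of x86-64 `call rel32`: opcode E8 + 4-byte displacement. */
#define CALL_REL32_LEN 5

/* Returns true when `target` can be reached from a call instruction
   located at `call_addr` using a signed 32-bit displacement. */
bool fits_call_rel32(uint64_t call_addr, uint64_t target) {
    /* Guard against wrap-around when computing the next-instruction address. */
    assert(call_addr <= UINT64_MAX - CALL_REL32_LEN);

    /* The displacement is relative to the instruction after the call. */
    int64_t disp = (int64_t)(target - (call_addr + CALL_REL32_LEN));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

When the check fails, the call has to go through a register or a memory slot holding the full 64-bit address.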

As far as we know, the performance difference between non-PIC and PIC code on X64 is small.

ARMv7 and ARMv8

The good news on ARMv7 and ARMv8 is that both can use the PC directly in relative addressing for data and function calls. The bad news is that the reachable distance is quite limited.

architecture   data                    function call
ARMv7          ldr <label> +/- 4KB     B  +/- 32MB
ARMv8          ldr <label> +/- 1MB     BL +/- 128MB

So, when data is more than 1MB away on ARMv8, reaching the GOT entry by a relative reference needs an intermediate register as a bridge (IP, i.e. r12, on ARMv7).

data = [[IP]]
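One common way to build such a bridge on ARMv8 is an ADRP/LDR pair, which the text above does not name; it is brought in here as an assumption to make the idea concrete. ADRP puts the 4KB page address of the target into a register using a signed 21-bit page count (about +/- 4GB), and the following LDR adds the low 12 bits. The C sketch only models that address arithmetic; `adrp_pair`, `adrp_split` and `adrp_resolve` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct adrp_pair {
    int64_t  page_delta;  /* signed number of 4KB pages (the ADRP immediate) */
    uint32_t lo12;        /* byte offset inside the target page (the LDR offset) */
};

/* Split a PC-to-target reference into the two immediates. */
struct adrp_pair adrp_split(uint64_t pc, uint64_t target) {
    struct adrp_pair p;
    p.page_delta = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
    p.lo12 = (uint32_t)(target & 0xFFF);
    /* The ADRP immediate is 21 bits, signed. */
    assert(p.page_delta >= -(1 << 20) && p.page_delta < (1 << 20));
    return p;
}

/* Recompute the address the way the hardware would:
   page of PC + page_delta * 4KB, then add the low 12 bits. */
uint64_t adrp_resolve(uint64_t pc, struct adrp_pair p) {
    uint64_t page = (pc & ~(uint64_t)0xFFF) + ((uint64_t)p.page_delta << 12);
    return page + p.lo12;
}
```

The register written by the first instruction is the bridge: the second instruction loads through it, which is the `[[IP]]` pattern shown above.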

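More bad news: materializing an imm32 on ARMv7 or an imm64 on ARMv8 is painful, because no single instruction is wide enough to hold the whole constant, so it has to be built up piecewise or loaded from memory.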
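To give a feel for the cost on ARMv8, the sketch below splits a 64-bit constant into the 16-bit chunks that MOVZ and MOVK carry, and counts how many instructions a naive MOVZ + MOVK sequence would take. Real assemblers can often do better (MOVN, bitmask immediates), so this is an upper-bound illustration only; `split_imm64`, `join_imm64` and `movz_movk_count` are names invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit constant into four 16-bit chunks, lowest chunk first. */
void split_imm64(uint64_t value, uint16_t chunks[4]) {
    for (int i = 0; i < 4; i++)
        chunks[i] = (uint16_t)(value >> (16 * i));
}

/* Reassemble the constant, mirroring what MOVZ followed by MOVKs does. */
uint64_t join_imm64(const uint16_t chunks[4]) {
    uint64_t value = 0;
    for (int i = 0; i < 4; i++)
        value |= (uint64_t)chunks[i] << (16 * i);
    return value;
}

/* Naive count: one instruction per non-zero chunk
   (MOVZ for the first, MOVK for the rest); zero still needs one MOVZ. */
int movz_movk_count(uint64_t value) {
    uint16_t chunks[4];
    int n = 0;
    split_imm64(value, chunks);
    assert(join_imm64(chunks) == value);  /* sanity check on the split */
    for (int i = 0; i < 4; i++)
        if (chunks[i] != 0)
            n++;
    return n == 0 ? 1 : n;
}
```

A full 64-bit address therefore takes up to four instructions, compared with a single mov on X64.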



