assembly___classic function call (printf)___in practice
Source: Internet   Editor: 程序博客网   Date: 2024/04/28 16:01
Reposted from: http://blogs.arm.com/software-enablement/155-how-to-call-a-function-from-arm-assembler/
How to Call a Function from ARM Assembler
Once you move beyond short sequences of optimised ARM assembler, the next likely step will be managing more complex, optimised routines using macros and functions. Macros are good for short repeated sequences, but often quickly increase the size of your code. As lower power and smaller code size are often closely tied, it is not long before you will need to make effective and efficient use of the processor by calling functions from your carefully hand-crafted code.
Leaving, only to Return
To start, here is a small example in ARM Assembler with one function calling another.
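```asm
    .globl main
    .extern abs                 @ provided by the C library
    .extern printf
    .text
output_str:
    .ascii "The answer is %d\n\0"

@ returns abs(z)+x+y
@ r0 = x, r1 = y, r2 = z
    .align 4
    .type do_something, %function
do_something:
    push {r4, lr}               @ save callee-saved r4 and the return address
    add r4, r0, r1              @ r4 = x + y; r4 survives the call to abs
    mov r0, r2                  @ abs takes its argument in r0
    bl abs                      @ r0 = abs(z); lr, r0-r3 and r12 may change
    add r0, r4, r0              @ r0 = (x + y) + abs(z), the return value
    pop {r4, pc}                @ restore r4 and return by loading saved lr into pc

    .type main, %function
main:
    push {ip, lr}               @ ip is pushed only to keep sp 8-byte aligned
    mov r0, #1                  @ x = 1
    mov r1, #3                  @ y = 3
    mov r2, #-4                 @ z = -4 (the assembler encodes this as mvn r2, #3)
    bl do_something             @ r0 = abs(-4) + 1 + 3 = 8
    mov r1, r0                  @ printf's second argument: the value to print
    ldr r0, =output_str         @ printf's first argument: the format string
    bl printf
    mov r0, #0                  @ main returns 0
    pop {ip, pc}                @ return to the C start-up code
```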
Build command: arm-none-linux-gnueabi-gcc asm_printf2.s -o asm_printf2 -static
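For readers more at home in C, here is a sketch of what the assembly program computes. It is not the author's code. example_main is a hypothetical name standing in for the assembly main. The assert reflects a C rule the assembly never checks: abs(INT_MIN) is undefined behaviour in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* C counterpart of the assembly do_something: returns abs(z) + x + y.
   Under the ARM procedure call standard x, y and z arrive in r0, r1 and r2,
   and the result is returned in r0. */
int do_something(int x, int y, int z)
{
    assert(z != INT_MIN);   /* abs(INT_MIN) is undefined behaviour in C */
    int sum = x + y;        /* the assembly keeps this in callee-saved r4 */
    return sum + abs(z);    /* abs(z) comes back in r0 */
}

/* Hypothetical name: the body of the assembly main. The format string is
   printf's first argument (r0) and the value is its second (r1). */
int example_main(void)
{
    printf("The answer is %d\n", do_something(1, 3, -4));
    return 0;
}
```

Calling example_main() prints "The answer is 8", the same line the assembly program prints, since abs(-4) + 1 + 3 = 8.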
The interesting instructions, at least as far as the link register and the stack are concerned, are push, pop and bl. If you are familiar with other assembly languages, push and pop are probably no mystery. They take the given register list and push those registers onto the stack, or pop values off the stack into them. bl, as you may have guessed, is simply branch with link: the address of the instruction after the branch is loaded into the link register lr. Once the called routine has finished, lr can be copied back to pc, which lets the CPU continue from the code after the bl instruction.
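To make the push, pop and bl mechanics concrete, here is a toy C model of a full descending stack, replaying do_something's prologue and epilogue. It is a teaching sketch under simplifying assumptions: a 64-word stack, no real memory, and no flags. Cpu, push, pop and simulate_do_something are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

#define R4_BIT (1u << 4)
#define LR_BIT (1u << 14)
#define PC_BIT (1u << 15)

/* Toy machine state: 16 core registers and a small word-addressed stack.
   r[13] is sp, r[14] is lr and r[15] is pc. */
typedef struct {
    uint32_t r[16];
    uint32_t mem[64];
} Cpu;

/* push {list}: sp moves down 4 bytes per register, and the lowest-numbered
   register is stored at the lowest address. */
static void push(Cpu *c, uint32_t list)
{
    uint32_t count = 0;
    for (int i = 0; i < 16; i++)
        if (list & (1u << i))
            count++;
    c->r[13] -= 4u * count;
    uint32_t addr = c->r[13];
    for (int i = 0; i < 16; i++)
        if (list & (1u << i)) {
            c->mem[addr / 4] = c->r[i];
            addr += 4;
        }
}

/* pop {list}: the reverse. Popping into r15 is how "pop {r4, pc}" returns,
   because the lr value saved on entry is loaded straight into pc. */
static void pop(Cpu *c, uint32_t list)
{
    uint32_t addr = c->r[13];
    for (int i = 0; i < 16; i++)
        if (list & (1u << i)) {
            c->r[i] = c->mem[addr / 4];
            addr += 4;
        }
    c->r[13] = addr;
}

/* Replays do_something's prologue and epilogue and returns the value
   that ends up in pc after "pop {r4, pc}". */
uint32_t simulate_do_something(uint32_t return_address)
{
    Cpu c = {{0}, {0}};
    c.r[13] = sizeof c.mem;          /* empty stack: sp at the top */
    c.r[4] = 0xCAFEu;                /* the caller's r4, which must survive */
    c.r[14] = return_address;        /* written by the caller's bl */
    push(&c, R4_BIT | LR_BIT);       /* push {r4, lr} */
    c.r[4] = 1 + 3;                  /* add r4, r0, r1 */
    c.r[14] = 0xDEADu;               /* bl abs overwrites lr */
    pop(&c, R4_BIT | PC_BIT);        /* pop {r4, pc} */
    assert(c.r[4] == 0xCAFEu);       /* caller's r4 is back */
    assert(c.r[13] == sizeof c.mem); /* stack is balanced */
    return c.r[15];
}
```

Although bl abs clobbers lr, pc ends up holding the original return address. The saved copy on the stack is what the function returns through.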
In do_something we push the link register onto the stack so that we can pop it back off to return, even though the call to abs will have overwritten the original contents of lr. The program also saves r4, because the ARM Procedure Call Standard specifies that r4-r11 must be preserved across function calls and that the called function is responsible for preserving them. This means that do_something must keep the result of r0 + r1 in a register that abs will not destroy. It also means we must in turn preserve the caller's contents of whichever register we use to hold that result. A scratch register such as r3 would only be safe here if we knew that this particular abs never touches it, and the standard makes no such promise, so this is something that always needs to be considered.
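The preservation rule can be written as a small lookup. This is a sketch that follows the text's simplified r4-r11 rule. In the AAPCS itself the role of r9 is platform-specific, and survives_call is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a value the caller leaves in core register rN is still
   there after a call that follows the procedure call standard. */
bool survives_call(int n)
{
    assert(n >= 0 && n <= 15);
    if (n >= 4 && n <= 11)
        return true;    /* callee-saved: the called function must restore them */
    if (n == 13)
        return true;    /* sp: balanced by the callee's own push and pop */
    return false;       /* r0-r3, r12 (ip), lr and pc may all change */
}
```

This is why do_something keeps x + y in r4, where survives_call(4) is true, rather than in r3.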
In main we push and pop the ip register (r12), even though we do not have to preserve it. The procedure call standard requires the stack to be 64-bit (8-byte) aligned at public interfaces, and pushing lr on its own would move sp by only 4 bytes. Keeping the stack aligned also gives a performance benefit, because stack operations can take advantage of 64-bit data paths within the CPU.
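The alignment arithmetic is simple enough to check directly. This sketch assumes a full descending stack that is 8-byte aligned on entry, with each pushed core register taking 4 bytes. aligned_after_push is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if sp is still 8-byte aligned after pushing nregs core
   registers (4 bytes each) onto a full descending stack. */
bool aligned_after_push(uint32_t sp, int nregs)
{
    assert(sp % 8u == 0);               /* the caller starts aligned */
    assert(nregs >= 0 && nregs <= 16);
    uint32_t new_sp = sp - 4u * (uint32_t)nregs;
    return new_sp % 8u == 0;
}
```

The result depends only on whether the number of pushed registers is even. push {ip, lr} (two registers) keeps sp aligned, whereas push {lr} alone (one register) would not.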
Alternatively, we could simply push the intermediate value itself around the call to abs. After all, if abs needed r4, pushing it is exactly how abs would preserve it. There is a minor performance case for pushing r4 rather than the value we know we will need, but the strongest argument is probably that pushing and popping every register you need at the start and end of the function makes for less error-prone and more readable code.
You will also notice that main also pushes and pops the contents of lr. That is because, although main may be the first thing in my code to be executed, it is not the first thing to be executed when my program is loaded. The toolchain inserts calls to some basic setup functions before main is called, and to some final clean-up code for when we exit. So main is called with bl like any other function, and must return to its caller through the saved lr.
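Standard C exposes one part of that clean-up phase through atexit. Handlers registered with it run after main returns, because returning from main is equivalent to calling exit. This is a minimal sketch; say_goodbye and register_cleanup are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Runs during exit processing, after main has returned to the start-up code. */
static void say_goodbye(void)
{
    puts("clean-up code running after main has returned");
}

/* Registers the handler; returns 0 on success, as atexit does. */
int register_cleanup(void)
{
    return atexit(say_goodbye);
}
```

If main calls register_cleanup() and then returns, the message appears only after main's own output. That only works because main hands control back to the code that called it.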
The Special Case of Windows CE
Windows CE uses a technique known as Structured Exception Handling to unwind the stack when an exception occurs. This requires anyone writing assembler code for that OS to observe some additional restrictions. Coding examples are available on MSDN and should be consulted, but the general idea is that the value of sp should change only in the very first and very last instructions of your function. If you perform a stack push or pop at any other point, the virtual unwinder can cause your application some very non-virtual trouble.
Passing on
It is almost certainly worth your time becoming familiar with the details of the ARM Procedure Call Standard. Apart from the list of registers that must be preserved, covered earlier, it is worth quickly covering how parameters are passed in and how results are returned.
The first four 32-bit values are passed in registers r0-r3. If a parameter is 64 bits long, it uses either r0 and r1 or r2 and r3, but never r1 and r2. The word order within the register pair is officially defined to be "as if the value had been loaded from memory representation with a single LDM instruction". Rather than looking up what that means, I would suggest simply writing some code to test it. If there are more parameters than will fit in r0-r3, the remaining values are written to the stack before the function is called.
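The register rules above can be written down as a small allocator. This sketch covers only 4- and 8-byte integer arguments under the AAPCS base (core-register) rules. Structures, and floating-point arguments under the hard-float variant, follow extra rules. arg_register and ARG_ON_STACK are hypothetical names, not part of any library.

```c
#include <assert.h>

#define ARG_ON_STACK (-1)

/* Returns the first core register (0 for r0 ... 3 for r3) used for argument
   `index`, or ARG_ON_STACK if it is passed in memory. sizes[i] is the size
   of argument i in bytes: 4 or 8. */
int arg_register(const int *sizes, int count, int index)
{
    assert(index >= 0 && index < count);
    int next = 0;                          /* next free core register */
    for (int i = 0; i <= index; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        if (sizes[i] == 8)
            next = (next + 1) & ~1;        /* 64-bit: r0:r1 or r2:r3, never r1:r2 */
        int words = sizes[i] / 4;
        int reg = ARG_ON_STACK;
        if (next + words <= 4) {
            reg = next;
            next += words;
        } else {
            next = 4;                      /* later arguments also go on the stack */
        }
        if (i == index)
            return reg;
    }
    return ARG_ON_STACK;                   /* not reached */
}
```

For a function such as f(int a, long long b, int c), a goes in r0. b skips r1 and takes r2 and r3, and c is passed on the stack, even though r1 was left unused.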
Results are returned in r0, or in r0 and r1 if the result requires 64 bits. Check the ARM Procedure Call Standard itself for more detailed information, but that should cover most cases.
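Following the suggestion to test the LDM rule rather than look it up, this sketch views a 64-bit value as two 32-bit words in memory order. That is the order in which a single LDM would load them, because LDM puts the word at the lower address into the lower-numbered register (r0 before r1, or r2 before r3). ldm_word is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns word i (0 or 1) of v in memory order: the word a single LDM would
   load into the first (i = 0) or second (i = 1) register of the pair. */
uint32_t ldm_word(uint64_t v, int i)
{
    assert(i == 0 || i == 1);
    uint32_t words[2];
    memcpy(words, &v, sizeof words);   /* same bytes, viewed as two words */
    return words[i];
}
```

On a little-endian target, the usual configuration for ARM Linux, ldm_word(0x1122334455667788, 0) is 0x55667788. So the low half of a 64-bit argument or result travels in r0 (or r2), and the high half in r1 (or r3). On a big-endian target the halves are the other way round.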
Need for Speed?
One important thing to remember when working with the link register is that the latest ARM processors provide Return Stack Prediction in addition to normal branch prediction. If the processor comes across an instruction like pop {..., pc} or bx lr, it will try to 'branch predict' the return. This allows the processor to predict return branches successfully when common code is called from many points, where normal branch prediction techniques could not be used. On processors with longer pipelines this can be a useful optimisation. To make use of it from your assembler code, you need to follow some simple guidelines:
Do
- Use instructions like pop {pc} when you are returning normally
- Use b instead of bl or blx if you do not expect to return to execute the next instruction
- Use blx when calling code indirectly (using a value in a register) rather than loading directly to pc