ARM GCC Floating-Point Compiler Options

Source: Internet. Editor: 程序博客网. Date: 2024/05/16 15:04

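1 Floating-point ABI: -mfloat-abi

1.1 Options

-mfloat-abi=soft/softfp/hard

Three values are supported. Their meanings are as follows:

soft

The hardware floating-point unit is not used; GCC performs floating-point arithmetic by calling the software floating-point library. Suitable for CPUs without an FPU.

softfp

The hardware FPU is used and hardware floating-point instructions are generated; which kind of instructions is selected with the -mfpu option. The calling convention is the same as for soft.

hard

The hardware FPU is used and hardware floating-point instructions are generated. The difference from softfp is the calling convention: floating-point arguments and return values are passed in FPU registers, so objects built with hard cannot be mixed with objects built with soft or softfp.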
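The ABI a file was compiled for can be checked at compile time. The following is a minimal sketch, assuming the macros GCC predefines for 32-bit ARM: __ARM_PCS_VFP when -mfloat-abi=hard is in effect, and __SOFTFP__ when -mfloat-abi=soft is in effect. The function name float_abi_name is made up for illustration, and on any target that is not 32-bit ARM it only reports that fact.

```c
/* Hypothetical helper: reports which -mfloat-abi this translation unit
 * was compiled with, based on macros GCC predefines for 32-bit ARM. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not-arm32";   /* e.g. an x86 or AArch64 build */
#elif defined(__ARM_PCS_VFP)
    return "hard";        /* FP arguments travel in VFP registers */
#elif defined(__SOFTFP__)
    return "soft";        /* no FPU instructions, library calls only */
#else
    return "softfp";      /* FPU instructions, soft calling convention */
#endif
}
```

Printing the result, for example with puts(float_abi_name()), from a program built with each of the three settings should give soft, softfp and hard respectively.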

1.2 Examples

1.2.1 Example code

float mul(float a, float b)
{
    return a * b;
}

1.2.2 The soft option

Compile and disassemble commands:

arm-linux-gcc -Wall -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=soft -c test.c

arm-linux-objdump -D test.o | less

Generated assembly:

00000000 <mul>:
 0: e92d4800 push {fp, lr}
 4: e28db004 add fp, sp, #4
 8: e24dd008 sub sp, sp, #8
 c: e50b0008 str r0, [fp, #-8]
10: e50b100c str r1, [fp, #-12]
14: e51b0008 ldr r0, [fp, #-8]
18: e51b100c ldr r1, [fp, #-12]
1c: ebfffffe bl 0 <__aeabi_fmul>
20: e1a03000 mov r3, r0
24: e1a00003 mov r0, r3
28: e24bd004 sub sp, fp, #4
2c: e8bd8800 pop {fp, pc}

The multiplication is carried out by calling the library routine __aeabi_fmul; no floating-point instruction is generated.

1.2.3 The softfp option

Compile command:

arm-linux-gcc -Wall -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=softfp -mfpu=vfpv3-d16 -c test.c

Generated assembly:

00000000 <mul>:
 0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
 4: e28db000 add fp, sp, #0
 8: e24dd00c sub sp, sp, #12
 c: e50b0008 str r0, [fp, #-8]
10: e50b100c str r1, [fp, #-12]
14: ed1b7a02 vldr s14, [fp, #-8]
18: ed5b7a03 vldr s15, [fp, #-12]
1c: ee677a27 vmul.f32 s15, s14, s15
20: ee173a90 vmov r3, s15
24: e1a00003 mov r0, r3
28: e28bd000 add sp, fp, #0
2c: e49db004 pop {fp} ; (ldr fp, [sp], #4)
30: e12fff1e bx lr

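Hardware floating-point instructions (the v-prefixed vldr, vmul.f32 and vmov) are generated. As with soft, the arguments are still passed in r0 and r1, and the result is returned in r0.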
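What "passing a float in r0" means can be modelled in portable C. This is only an illustrative sketch, assuming a 32-bit IEEE 754 float; the names float_to_bits, bits_to_float and mul_bits are hypothetical and are not part of any toolchain. The function receives and returns raw 32-bit patterns, as the soft and softfp calling convention does in integer registers.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern (what ends up in r0 or r1). */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit pattern as a float. */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical model of mul() as seen at the soft/softfp ABI boundary:
 * bit patterns come in, a bit pattern goes out. */
uint32_t mul_bits(uint32_t a_bits, uint32_t b_bits)
{
    float a = bits_to_float(a_bits);  /* like: str r0, then vldr s14 */
    float b = bits_to_float(b_bits);  /* like: str r1, then vldr s15 */
    return float_to_bits(a * b);      /* like: vmul.f32, then vmov r3, s15 */
}
```

For example, mul_bits(0x3FC00000, 0x40000000) multiplies 1.5 by 2.0 and returns 0x40400000, the bit pattern of 3.0.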

1.2.4 The hard option

Compile command:

arm-linux-gcc -Wall -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=vfpv3-d16 -c test.c

Generated assembly:

00000000 <mul>:
 0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
 4: e28db000 add fp, sp, #0
 8: e24dd00c sub sp, sp, #12
 c: ed0b0a02 vstr s0, [fp, #-8]
10: ed4b0a03 vstr s1, [fp, #-12]
14: ed1b7a02 vldr s14, [fp, #-8]
18: ed5b7a03 vldr s15, [fp, #-12]
1c: ee677a27 vmul.f32 s15, s14, s15
20: eeb00a67 vmov.f32 s0, s15
24: e28bd000 add sp, fp, #0
28: e49db004 pop {fp} ; (ldr fp, [sp], #4)
2c: e12fff1e bx lr

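Hardware floating-point instructions are generated here as well. The difference from softfp is that the arguments are passed in the FPU registers s0 and s1, and the result is returned in s0.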

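2 Using NEON

2.1 Options

-O3 -mfloat-abi=softfp -mfpu=neon -ftree-vectorize

NEON can also perform floating-point arithmetic (single precision on ARMv7), so vectorizable floating-point code can run on NEON instead of VFP. Note that GCC's auto-vectorizer emits NEON floating-point instructions only when -funsafe-math-optimizations is also given, because ARMv7 NEON is not fully IEEE 754 compliant (denormal values are treated as zero).

When auto-vectorization does not deliver the performance you need, write the hot loops with NEON intrinsics.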
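Auto-vectorization with -ftree-vectorize can also be helped along in plain C. The sketch below is a hypothetical variant of an element-wise add (the name add_arrays and the size_t length parameter are assumptions, not from the original). It uses the C99 restrict qualifier to promise that the arrays do not overlap, which removes one common obstacle to vectorization. Whether the loop is actually vectorized depends on the compiler version and flags.

```c
#include <stddef.h>

/* Hypothetical element-wise add written to be easy to auto-vectorize:
 * restrict tells the compiler that x, y and z never alias each other. */
void add_arrays(const int *restrict x, const int *restrict y,
                int *restrict z, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        z[i] = x[i] + y[i];
    }
}
```

The caller is responsible for keeping the restrict promise: passing overlapping arrays is undefined behavior.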

2.2 Example

Plain C code:

void NeonTest(int *x, int *y, int *z)
{
    int i;

    for (i = 0; i < 200; i++) {
        z[i] = x[i] + y[i];
    }
}

The same loop written with NEON intrinsics:

#include <arm_neon.h>

void intrinsics(uint32_t *x, uint32_t *y, uint32_t *z)
{
    int i;
    uint32x4_t x4, y4; // These 128 bit registers will contain 4 values from the x array and 4 values from the y array
    uint32x4_t z4;     // This 128 bit register will contain the 4 results from the add intrinsic
    uint32_t *ptra = x; // pointer to the x array data
    uint32_t *ptrb = y; // pointer to the y array data
    uint32_t *ptrz = z; // pointer to the z array data

    for (i = 0; i < 200 / 4; i++)
    {
        x4 = vld1q_u32(ptra);    // intrinsic to load x4 with 4 values from x
        y4 = vld1q_u32(ptrb);    // intrinsic to load y4
        z4 = vaddq_u32(x4, y4);  // intrinsic to add z4=x4+y4
        vst1q_u32(ptrz, z4);     // store the 4 results to z
        ptra += 4;               // increment pointers
        ptrb += 4;
        ptrz += 4;
    }
}
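The intrinsics loop above relies on the element count (200) being a multiple of 4. A possible generalization is sketched below, assuming the ACLE macro __ARM_NEON (or the older __ARM_NEON__) marks NEON availability. The function name add_u32, the HAVE_NEON macro and the length parameter n are hypothetical. Leftover elements are handled by a scalar tail loop, and the same source falls back to plain C when NEON is not enabled.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical generalization: adds n elements; n need not be a multiple of 4. */
void add_u32(const uint32_t *x, const uint32_t *y, uint32_t *z, size_t n)
{
    size_t i = 0;

#if HAVE_NEON
    /* Vector part: four 32-bit lanes per iteration. */
    for (; i + 4 <= n; i += 4) {
        uint32x4_t x4 = vld1q_u32(x + i);
        uint32x4_t y4 = vld1q_u32(y + i);
        vst1q_u32(z + i, vaddq_u32(x4, y4));
    }
#endif

    /* Scalar tail: the remaining 0..3 elements, or all of them without NEON. */
    for (; i < n; i++) {
        z[i] = x[i] + y[i];
    }
}
```

Indexing from the base pointers instead of advancing three separate pointers keeps the vector loop and the tail loop on the same counter.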

Reposted from: http://blog.csdn.net/jijiagang/article/details/12952681

0 0