Floating-Point Support Options on the PXA310 Platform


Theory

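Over the past two days I noticed that floating-point performance on the PXA310 is worse than on the OMAP2420. The reason turned out to be that the OMAP2420 has a hardware VFP unit, whereas the PXA310 has no hardware floating-point support.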

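The traditional approach is to use the kernel's nwfpe (or fastfpe) floating-point emulator: at run time the CPU hits an instruction it does not support, traps into the kernel, jumps to nwfpe's software emulation routine to carry out the floating-point operation, and then returns to the program.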

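Newer gcc versions (EABI) can instead build the software floating-point emulation directly into the program, which saves the time spent on these mode switches.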

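With gcc, -mfloat-abi=soft selects this built-in software emulation, while softfp and hard both let the compiler generate hardware VFP instructions. Code built with softfp can be linked with binaries built with soft, whereas hard requires that all code be built with hard.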
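To make the difference concrete, here is a minimal sketch. The function name divide is made up for illustration. Under the ARM EABI, a soft-float build is expected to turn the division into a call to the run-time helper __aeabi_ddiv, while a build that may use VFP would emit a VFP divide instruction instead. The exact output depends on the compiler version and options, so treat the comments as expectations to verify rather than guarantees.

```c
/* Hypothetical helper: a single double-precision division.
 *
 * Expected code generation on 32-bit ARM (an assumption to check with
 * "gcc -S", not a guarantee):
 *   -mfloat-abi=soft            -> call to the library routine __aeabi_ddiv
 *   -mfloat-abi=softfp or hard  -> a VFP divide instruction, provided a
 *                                  VFP unit is selected for the build
 */
double divide(double dividend, double divisor)
{
    return dividend / divisor;
}
```

Compiling this file with gcc -S once per -mfloat-abi value and comparing the assembly is a quick way to see which path a given toolchain actually takes.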

So: if the hardware supports VFP, use -mfloat-abi=softfp; if it does not, use -mfloat-abi=soft.
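When several toolchains are in play it is easy to lose track of which float ABI a binary was built with. The sketch below reports it from GCC's predefined macros. The function names are made up for illustration, and the macro behavior is an assumption about the compiler in use: __SOFTFP__ is defined by GCC for soft-float ARM builds, and __ARM_PCS_VFP only by newer GCC releases for the hard ABI, so older compilers such as 3.4.4 or 4.1.1 may not define it. Running gcc -dM -E - < /dev/null with the same options shows what a particular compiler really defines.

```c
#include <stdio.h>

/* Hypothetical helper: name the float ABI this file was compiled for.
 * Assumes GCC-style predefined macros; other compilers may differ. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM target";
#elif defined(__SOFTFP__)
    return "soft";     /* all floating point done by library routines */
#elif defined(__ARM_PCS_VFP)
    return "hard";     /* VFP instructions and VFP calling convention */
#else
    return "softfp";   /* VFP instructions, soft-float calling convention */
#endif
}

/* Print the result; intended to be called from main() on the target. */
void print_float_abi(void)
{
    printf("float ABI: %s\n", float_abi_name());
}
```

Calling print_float_abi() at the start of a benchmark makes each log self-describing.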

In addition, recent gcc releases can generate better-optimized floating-point code for PXA CPUs; this requires the -march=iwmmxt compile option.

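Note: recent kernels no longer contain the arch/arm/fastfpe directory, and nwfpe is probably obsolete for EABI as well: it emulates FPA, so it presumably cannot correctly support the VFP that EABI uses.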

 

Reference: http://wiki.debian.org/ArmEabiPort

 

Measuring floating-point speed

Test program

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>

 

#define MAX_DIVIDEND 1000000.231

#define MIN_DIVIDEND 0.29

#define STEP_DIVIDEND 0.33

#define DIVISOR 23.0

#define BUFFER_SIZE 200

 

/* Write the time elapsed since the first call into buffer as "seconds.microseconds". */
static void timestamp(char *buffer) {
  static int started = 0;
  static long startSecond = 0;
  static long startUs = 0;
  struct timeval tv;
  long deltaSecond, deltaUs;

  gettimeofday(&tv, NULL);
  /* Running for the first time? */
  if (!started) {
    /* Record the start time so that the first delta is 0. */
    startSecond = tv.tv_sec;
    startUs = tv.tv_usec;
    started = 1;
  }

  /* Calculate the delta, borrowing one second if the microsecond part is negative. */
  deltaSecond = tv.tv_sec - startSecond;
  deltaUs = tv.tv_usec - startUs;
  if (deltaUs < 0) {
    deltaSecond -= 1;
    deltaUs += 1000000;
  }

  /* Zero-pad the microseconds so that 21210 us prints as .021210. */
  snprintf(buffer, BUFFER_SIZE, "%ld.%06ld", deltaSecond, deltaUs);
}

 

int main(int argc, char * argv[])
{
    double dividend, result;
    char buffer[BUFFER_SIZE];

    timestamp(buffer);
    printf("Start time is: %s\n",buffer);

    /* Division */
    for(dividend=MIN_DIVIDEND; dividend<MAX_DIVIDEND; dividend+=STEP_DIVIDEND)
        result = dividend/DIVISOR;
    timestamp(buffer);
    printf("DIV End time is: %s\n",buffer);

    /* Multiplication */
    for(dividend=MIN_DIVIDEND; dividend<MAX_DIVIDEND; dividend+=STEP_DIVIDEND)
        result = dividend*DIVISOR;
    timestamp(buffer);
    printf("MUL End time is: %s\n",buffer);

    /* Addition */
    for(dividend=MIN_DIVIDEND; dividend<MAX_DIVIDEND; dividend+=STEP_DIVIDEND)
        result = dividend+DIVISOR;
    timestamp(buffer);
    printf("ADD End time is: %s\n",buffer);

    /* Subtraction */
    for(dividend=MIN_DIVIDEND; dividend<MAX_DIVIDEND; dividend+=STEP_DIVIDEND)
        result = dividend-DIVISOR;
    timestamp(buffer);
    printf("SUB End time is: %s\n",buffer);

    return 0;
}
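The loops in the test program never read result, so an optimizing build (for example with -O2) may drop them entirely and report near-zero times. The sketch below is one way to guard against that, not the method used for the measurements in this article: it stores each result into a volatile variable and times the loop with the standard clock() function. The function name time_divisions is made up for illustration, and clock() reports CPU time rather than wall-clock time, which is an assumption about what one wants to measure.

```c
#include <time.h>

/* Hypothetical helper: time a division loop and return CPU seconds used.
 * The volatile sink forces the compiler to perform every store, so the
 * loop cannot be optimized away. */
double time_divisions(double min, double max, double step, double divisor)
{
    volatile double sink = 0.0;
    double dividend;
    clock_t start, end;

    start = clock();
    for (dividend = min; dividend < max; dividend += step)
        sink = dividend / divisor;
    end = clock();

    (void)sink;  /* silence "set but not used" warnings */
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

With the constants from the test program, the call would be time_divisions(MIN_DIVIDEND, MAX_DIVIDEND, STEP_DIVIDEND, DIVISOR).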

Compilers

Compiler 1: the maemo gcc:

[sbox-CHINOOK_ARMEL: ~] > gcc --version

sbox-arm-linux-gcc (GCC) 3.4.4 (release) (CodeSourcery ARM 2005q3-2)

Copyright (C) 2004 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

 

[sbox-CHINOOK_ARMEL: ~] > gcc -v

Reading specs from /scratchbox/compilers/cs2005q3.2-glibc2.5-arm/bin/../lib/gcc/arm-none-linux-gnueabi/3.4.4/specs

Reading specs from /scratchbox/compilers/cs2005q3.2-glibc2.5-arm/gcc.specs

rename spec cpp to old_cpp

Configured with: /home/kl/cs2005q3-2_toolchain/gcc/glibc/work/gcc-2005q3-2/configure --build=i386-linux --host=i386-linux --target=arm-none-linux-gnueabi --prefix=/scratchbox/compilers/cs2005q3.2-glibc-arm --with-headers=/scratchbox/compilers/cs2005q3.2-glibc-arm/usr/include --enable-languages=c,c++ --enable-shared --enable-threads --disable-checking --enable-symvers=gnu --program-prefix=arm-linux- --with-gnu-ld --enable-__cxa_atexit --disable-libssp --disable-libstdcxx-pch --with-cpu= --enable-interwork

Thread model: posix

gcc version 3.4.4 (release) (CodeSourcery ARM 2005q3-2)

 

Compiler 2: the Marvell gcc:

tmp>arm-iwmmxt-linux-gnueabi-gcc --version

arm-iwmmxt-linux-gnueabi-gcc (GCC) 4.1.1

Copyright (C) 2006 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

 

tmp>arm-iwmmxt-linux-gnueabi-gcc -v

Using built-in specs.

Target: arm-iwmmxt-linux-gnueabi

Configured with: /home1/bridge/toolchain/crosstool/toolchain-2007-03-19/build/arm-iwmmxt-linux-gnueabi/gcc-4.1.1-glibc-2.5/gcc-4.1.1/configure --target=arm-iwmmxt-linux-gnueabi --host=i686-host_pc-linux-gnu --prefix=/usr/local/bridge/arm-iwmmxt-linux-gnueabi --with-cpu=iwmmxt --with-float=soft --enable-cxx-flags=-msoft-float --with-headers=/usr/local/bridge/arm-iwmmxt-linux-gnueabi/arm-iwmmxt-linux-gnueabi/include --with-local-prefix=/usr/local/bridge/arm-iwmmxt-linux-gnueabi/arm-iwmmxt-linux-gnueabi --disable-nls --enable-threads=posix --enable-symvers=gnu --enable-__cxa_atexit --enable-languages=c,c++ --enable-shared --enable-c99 --enable-long-long

Thread model: posix

gcc version 4.1.1

 

Test method

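The test program was compiled with different compilers and compile options, and the resulting binaries were run on the OMAP2420 and on the PXA310. The first three builds use compiler 1 and the last one uses compiler 2. Note that the first three were compiled inside scratchbox, which is why they have no cross-compiler prefix.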

gcc -mfloat-abi=soft float.c -o float1

gcc -mfloat-abi=softfp float.c -o float2

gcc -march=iwmmxt float.c -o float3

arm-iwmmxt-linux-gnueabi-gcc float.c -o float4

 

Test results

Note on reading the figures: the runs below come from a build that printed the seconds and microseconds fields with "%u.%u", without zero padding and without borrowing from the seconds. A fraction of the form 4294xxxxxx is therefore a negative microsecond delta wrapped to an unsigned 32-bit value: 8.4294827617 means 8 s minus 139679 us, about 7.86 s. A short fraction is missing its leading zeros: 3.21210 means 3.021210 s.

OMAP2420+float1

OMAP2420:/tmp# ./float1

Start time is: 0.0

DIV End time is: 8.4294827617

MUL End time is: 10.303344

ADD End time is: 13.4294875774

SUB End time is: 16.4294558757

OMAP2420:/tmp# ./float1

Start time is: 0.0

DIV End time is: 8.4294494517

MUL End time is: 10.4294921030

ADD End time is: 13.4294482493

SUB End time is: 15.133392

OMAP2420:/tmp# ./float1

Start time is: 0.0

DIV End time is: 7.579528

MUL End time is: 10.4294947215

ADD End time is: 12.556763

SUB End time is: 15.201508

OMAP2420:/tmp# ./float1

Start time is: 0.0

DIV End time is: 8.4294515698

MUL End time is: 10.4294934185

ADD End time is: 13.4294495892

SUB End time is: 16.4294132915

 

OMAP2420+float2

OMAP2420:/tmp# ./float2

Start time is: 0.0

DIV End time is: 1.4294907969

MUL End time is: 2.4294625102

ADD End time is: 3.4294333079

SUB End time is: 4.4294033336

OMAP2420:/tmp# ./float2

Start time is: 0.0

DIV End time is: 1.4294897350

MUL End time is: 2.4294642314

ADD End time is: 3.4294335918

SUB End time is: 4.4294029795

OMAP2420:/tmp# ./float2

Start time is: 0.0

DIV End time is: 1.4294897563

MUL End time is: 1.633240

ADD End time is: 2.331757

SUB End time is: 3.21210

OMAP2420:/tmp# ./float2

Start time is: 0.0

DIV End time is: 1.4294896984

MUL End time is: 1.633728

ADD End time is: 2.328186

SUB End time is: 3.20905

 

PXA310 + float1

/ # ./float1

Start time is: 0.0

DIV End time is: 4.49465

MUL End time is: 6.4294450290

ADD End time is: 7.14588

SUB End time is: 9.4294547088

/ # ./float1

Start time is: 0.0

DIV End time is: 4.52069

MUL End time is: 5.486351

ADD End time is: 7.17117

SUB End time is: 8.581988

/ # ./float1

Start time is: 0.0

DIV End time is: 4.49788

MUL End time is: 5.483496

ADD End time is: 7.17022

SUB End time is: 9.4294549453

/ # ./float1

Start time is: 0.0

DIV End time is: 4.49902

MUL End time is: 6.4294450916

ADD End time is: 7.14907

SUB End time is: 9.4294547965

 

PXA310 + float3

/ # ./float3

Start time is: 0.0

DIV End time is: 4.4294864860

MUL End time is: 5.257107

ADD End time is: 7.4294684639

SUB End time is: 8.171667

/ # ./float3

Start time is: 0.0

DIV End time is: 4.4294864869

MUL End time is: 5.257758

ADD End time is: 7.4294682952

SUB End time is: 8.168985

/ # ./float3

Start time is: 0.0

DIV End time is: 4.4294864656

MUL End time is: 5.257443

ADD End time is: 7.4294682639

SUB End time is: 8.168756

/ # ./float3

Start time is: 0.0

DIV End time is: 4.4294863772

MUL End time is: 5.256900

ADD End time is: 6.714551

SUB End time is: 8.169785

 

PXA310 + float4

/ # ./float4

Start time is: 0.0

DIV End time is: 3.597009

MUL End time is: 5.4294619794

ADD End time is: 6.4294696892

SUB End time is: 7.4294807493

/ # ./float4

Start time is: 0.0

DIV End time is: 4.4294563947

MUL End time is: 5.4294619198

ADD End time is: 6.4294696044

SUB End time is: 7.4294806699

/ # ./float4

Start time is: 0.0

DIV End time is: 4.4294564235

MUL End time is: 5.4294620202

ADD End time is: 6.4294697228

SUB End time is: 7.4294807689

/ # ./float4

Start time is: 0.0

DIV End time is: 4.4294564363

MUL End time is: 5.4294619851

ADD End time is: 6.4294696876

SUB End time is: 7.4294807901
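
For comparing runs, the raw stamps above can be converted back into plain seconds. The following is a minimal sketch with a made-up function name, decode_stamp. It assumes the stamps were printed with "%u.%u" on a target where int is 32 bits, so a negative microsecond delta shows up as the delta plus 2^32. Under that assumption the last float4 stamp, 7.4294807901, comes out at roughly 6.84 s.

```c
#include <stdio.h>

/* Hypothetical helper: convert a "SEC.USEC" stamp printed with "%u.%u"
 * back into seconds. Returns -1.0 if the text cannot be parsed. */
double decode_stamp(const char *text)
{
    unsigned long sec = 0;
    unsigned long long raw = 0;
    long long usec;

    if (sscanf(text, "%lu.%llu", &sec, &raw) != 2)
        return -1.0;

    /* Assumption: a value of 2^31 or more is a negative 32-bit delta
     * that was printed as unsigned, so undo the wrap-around. */
    usec = (long long)raw;
    if (usec >= 2147483648LL)
        usec -= 4294967296LL;

    /* The microsecond field had no zero padding, so it is used as a
     * number of microseconds rather than as decimal digits. */
    return (double)sec + (double)usec / 1000000.0;
}
```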

 

Conclusion

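The PXA310 platform has no hardware floating-point support, so we should get as much floating-point performance as we can by adding compile options such as -mfloat-abi=soft -march=iwmmxt.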
