Producing optimised NDK code for multiple architectures?
Reposted from: Producing optimised NDK code for multiple architectures?
Question:
I have some C code for Android that does lots of low-level number crunching. I'd like to know what settings I should use (e.g. in my Android.mk and Application.mk files) so that the code produced will run on all current Android devices but also take advantage of optimisations for specific chipsets. I'm looking for good default Android.mk and Application.mk settings to use, and I want to avoid having to litter my C code with #ifdef branches.
For example, I'm aware that ARMv7 has floating point instructions, that some ARMv7 chips support NEON instructions, and that the default ARM build supports neither of these. Is it possible to set flags so that I can build ARMv7 with NEON, ARMv7 without NEON and the default ARM build? I know how to do the latter two but not all 3. I'm cautious about which settings I use, as I assume the current defaults are the safest and I don't know what risks the other options carry.
For GCC specific optimisation, I'm using the following flags:
LOCAL_CFLAGS=-ffast-math -O3 -funroll-loops
I've checked all 3 of these speed up my code. Are there any other common ones I could add?
Another tip I have is to add "LOCAL_ARM_MODE := arm" to Android.mk to enable a speed-up on newer ARM chips (although I'm confused about exactly what this does and what happens on older chips).
Answer:
ARM processors have 2 general instruction sets that they support: "ARM" and "Thumb". Though there are different flavors of both, ARM instructions are 32 bits each and Thumb instructions are 16 bits. The main difference between the two is that ARM instructions have the possibility to do more in a single instruction than Thumb can. For example a single ARM instruction can add one register to another register, while performing a left shift on the second register. In Thumb one instruction would have to do the shift, then a second instruction would do the addition.
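To make the shift-plus-add case concrete, here is a minimal C sketch. The function name add_shifted is made up for illustration, and the assembly in the comment is what a compiler would typically emit for a constant shift of 2, not guaranteed output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns a + (b << shift).
 *
 * With a constant shift (say 2), an ARM-mode build can typically fold the
 * shift into the add:
 *     add  r0, r0, r1, lsl #2
 * whereas 16-bit Thumb needs two instructions:
 *     lsls r1, r1, #2
 *     adds r0, r0, r1
 */
static inline uint32_t add_shifted(uint32_t a, uint32_t b, unsigned shift)
{
    assert(shift < 32);   /* shifting a 32-bit value by 32 or more is undefined */
    return a + (b << shift);
}
```

The C source is the same in both modes; only the generated instructions differ.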
ARM instructions are not twice as good, but in certain cases they can be faster. This is especially true in hand-rolled ARM assembly, which can be tuned in novel ways to make the best use of "shifts for free". Thumb instructions have another advantage besides their smaller size: they drain the battery less.
Anyway, this is what LOCAL_ARM_MODE does - it means you compile your code as ARM instructions instead of Thumb instructions. Compiling to Thumb is the default in the NDK as it tends to create a smaller binary and the speed difference is not that noticeable for most code. The compiler can't always take advantage of the extra "oomph" that ARM can provide, so you end up needing more or less the same number of instructions anyway.
The results you get from C/C++ code compiled to ARM or Thumb will be identical (barring compiler bugs).
This by itself is compatible between new and old ARM processors for all Android phones available today. This is because by default the NDK compiles to an "Application Binary Interface" for ARM-based CPUs that support the ARMv5TE instruction set. This ABI is known as "armeabi" and can be explicitly set in Application.mk by putting APP_ABI := armeabi in it.
Newer processors also support the Android-specific ABI known as armeabi-v7a, which extends armeabi to add the Thumb-2 instruction set and a hardware floating point instruction set called VFPv3-D16. armeabi-v7a compatible CPUs can also optionally support the NEON instruction set, which you have to check for at run time and provide code paths for when it is available and when it is not. There's an example in the NDK/samples directory that does this (hello-neon). Under the hood, Thumb-2 is more "ARM-like" in that its instructions can do more in a single instruction, while having the advantage of still taking up less space.
In order to compile a "fat binary" that contains both armeabi and armeabi-v7a libraries you would add the following to Application.mk:
APP_ABI := armeabi armeabi-v7a
When the .apk file is installed, the Android package manager installs the best library for the device. So on older platforms it would install the armeabi library, and on newer devices the armeabi-v7a one.
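To confirm which library actually ended up on a device, the native code can report how it was built. The sketch below relies on macros that GCC and Clang predefine for ARM targets (__arm__, __ARM_ARCH_7A__, __thumb__); the function name describe_build is hypothetical, and the labels it writes are just illustrative strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "<abi>/<mode>" into buf, e.g.
 * "armeabi-v7a/thumb". Returns the value of snprintf(). */
int describe_build(char *buf, size_t len)
{
    const char *abi;
    const char *mode;

    assert(buf != NULL && len > 0);

#if defined(__arm__) && defined(__ARM_ARCH_7A__)
    abi = "armeabi-v7a";
#elif defined(__arm__)
    abi = "armeabi";
#else
    abi = "non-ARM";        /* e.g. a desktop build used for testing */
#endif

#if defined(__thumb__)
    mode = "thumb";         /* the NDK default described above */
#elif defined(__arm__)
    mode = "arm";           /* e.g. built with LOCAL_ARM_MODE := arm */
#else
    mode = "n/a";
#endif

    return snprintf(buf, len, "%s/%s", abi, mode);
}
```

Logging this string once at start-up is enough to check that a newer device picked up the armeabi-v7a build. The check is confined to one diagnostic function, so the number-crunching code itself stays free of #ifdef branches.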
If you want to test for CPU features at run time then you can use the NDK function uint64_t android_getCpuFeatures() to get the features supported by the processor. This returns a set of bit flags: ANDROID_CPU_ARM_FEATURE_ARMv7 on v7a processors, ANDROID_CPU_ARM_FEATURE_VFPv3 if hardware floating point is supported, and ANDROID_CPU_ARM_FEATURE_NEON if Advanced SIMD instructions are supported. ARM can't have NEON without VFPv3.
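A run-time check along these lines might look like the sketch below. android_getCpuFamily(), android_getCpuFeatures() and the ANDROID_CPU_* constants come from the NDK's cpu-features.h; everything else (device_has_neon, scale, scale_c, scale_neon) is a hypothetical name chosen for illustration. Outside a 32-bit ARM Android build the check simply reports no NEON, so the portable path is taken.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ANDROID__) && defined(__arm__)
#include <cpu-features.h>   /* from the NDK's android/cpufeatures module */
#endif

/* Returns 1 when the device reports NEON support, 0 otherwise. */
int device_has_neon(void)
{
#if defined(__ANDROID__) && defined(__arm__)
    return android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM &&
           (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0;
#else
    return 0;   /* not a 32-bit ARM Android build: use the portable path */
#endif
}

/* Portable fallback: dst[i] = src[i] * k */
void scale_c(float *dst, const float *src, float k, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
/* NEON path, four floats per iteration. Kept in the same file only for
 * brevity: in a real project it should live in its own source file that
 * alone is built with NEON enabled, so the fallback stays NEON-free. */
void scale_neon(float *dst, const float *src, float k, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        vst1q_f32(dst + i, vmulq_n_f32(vld1q_f32(src + i), k));
    for (; i < n; i++)          /* leftover elements */
        dst[i] = src[i] * k;
}
#endif

/* Single entry point: callers never see the #if or the feature check. */
void scale(float *dst, const float *src, float k, size_t n)
{
    assert(dst != NULL && src != NULL);
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    if (device_has_neon()) {
        scale_neon(dst, src, k, n);
        return;
    }
#endif
    scale_c(dst, src, k, n);
}
```

On Android this needs the cpufeatures module linked in, which with ndk-build means LOCAL_STATIC_LIBRARIES := cpufeatures plus $(call import-module,android/cpufeatures) in Android.mk. In practice the result of device_has_neon() would be cached once rather than queried on every call.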
In summary: by default, your programs are the most compatible. Using LOCAL_ARM_MODE may make things slightly faster at the expense of battery life due to the use of ARM instructions, and it is as compatible as the default set-up. By adding the APP_ABI := armeabi armeabi-v7a line you will have improved performance on newer devices and remain compatible with older ones, but your .apk file will be larger (due to having 2 libraries). In order to use NEON instructions, you will need to write special code that detects the capabilities of the CPU at run time, and this only applies to newer devices that can run armeabi-v7a.