GCC Compiler Intrinsics
Source: Internet. Editor: 程序博客网. Date: 2024/06/05 05:21
http://www.linuxjournal.com/content/introduction-gcc-compiler-intrinsics-vector-processing?page=0,1
http://stackoverflow.com/questions/7156908/sse-intrinsic-functions-reference
Table 1. GCC Command-Line Options to Generate SIMD Code
Here are the include files you need:
- arm_neon.h - ARM Neon types & intrinsics
- altivec.h - Freescale Altivec types & intrinsics
- mmintrin.h - X86 MMX
- xmmintrin.h - X86 SSE1
- emmintrin.h - X86 SSE2
X86: MMX, SSE, SSE2 Types and Debugging
The X86 compatibles with MMX, SSE1 and SSE2 have the following types:
- MMX: __m64 64 bits of integers: eight 8-bit integers, four 16-bit shorts or two 32-bit integers.
- SSE1: __m128 128 bits: four single precision floats.
- SSE2: __m128i 128 bits of packed integers of any size; __m128d 128 bits: two doubles.
Table 2. Subset of vector operators and intrinsics used in the examples.
Operation | Altivec | Neon | MMX/SSE/SSE2
loading vector | vec_ld, vec_splat, vec_splat_s16, vec_splat_s32, vec_splat_s8, vec_splat_u16, vec_splat_u32, vec_splat_u8 | vld1q_f32, vld1q_s16, vsetq_lane_f32, vld1_u8, vdupq_lane_s16, vdupq_n_s16, vmovq_n_f32, vset_lane_u8 | _mm_set_epi16, _mm_set1_epi16, _mm_set1_pi16, _mm_set_pi16, _mm_load_ps, _mm_set1_ps, _mm_loadh_pi, _mm_loadl_pi
storing vector | vec_st | vst1_u8, vst1q_s16, vst1q_f32, vst1_s16 | _mm_store_ps
add | vec_madd, vec_mladd, vec_adds | vaddq_s16, vaddq_f32, vmlaq_n_f32 | _mm_add_epi16, _mm_add_pi16, _mm_add_ps
subtract | vec_sub | vsubq_s16 | (none)
multiply | vec_madd, vec_mladd | vmulq_n_s16, vmulq_s16, vmulq_f32, vmlaq_n_f32 | _mm_mullo_epi16, _mm_mullo_pi16, _mm_mul_ps
arithmetic shift | vec_sra, vec_srl, vec_sr | vshrq_n_s16 | _mm_srai_epi16, _mm_srai_pi16
byte permutation | vec_perm, vec_sel, vec_mergeh, vec_mergel | vtbl1_u8, vtbx1_u8, vget_high_s16, vget_low_s16, vdupq_lane_s16, vdupq_n_s16, vmovq_n_f32, vbsl_u8 | _mm_shuffle_pi16, _mm_shuffle_ps
type conversion | vec_cts, vec_unpackh, vec_unpackl, vec_ctu | vmovl_u8, vreinterpretq_s16_u16, vcvtq_u32_f32, vqmovn_s32, vqmovun_s16, vqmovn_u16, vcvtq_f32_s32, vmovl_s16, vmovq_n_f32 | _mm_packs_pu16, _mm_cvtps_pi16, _mm_packus_epi16
vector combination | vec_pack, vec_packsu | vcombine_u16, vcombine_u8, vcombine_s16 | (none)
maximum | (none) | (none) | _mm_max_ps
minimum | (none) | (none) | _mm_min_ps
vector logic | (none) | (none) | _mm_andnot_ps, _mm_and_ps, _mm_or_ps
rounding | vec_trunc | (none) | (none)
misc | (none) | (none) | _mm_empty
Check Processor at Runtime
Next, your code should check the processor at runtime to see whether it supports the vector instructions you compiled for. If you have no vector code path for that processor, fall back to your scalar code. If you have a vector code path and it is faster, use it. On X86, test processor features with the cpuid instruction, which GCC exposes through <cpuid.h>. (You saw examples of that in samples/simple/x86/*c.) We couldn't find anything as well established for Altivec and Neon, so the examples there parse /proc/cpuinfo. (Serious code might instead execute a test SIMD instruction: if the processor raises a SIGILL signal when it encounters that instruction, the feature is not available.)
Summary
In summary, GCC offers intrinsics that let you get more from your processor without going all the way to assembly. We have covered the basic types and some of the vector math functions. When you use intrinsics, test thoroughly: check both speed and correctness against a scalar version of your code. Processors differ in the features they offer and in how well those features perform, so this is a wide open field. The more effort you put into it, the more you will get out.
References:
The GCC include files that map intrinsics to compiler built-ins (e.g., arm_neon.h) and the GCC info pages that explain those built-ins:
http://gcc.gnu.org/onlinedocs/gcc/Target-Builtins.html
http://ds9a.nl/gcc-simd/
http://softpixel.com/~cwright/programming/simd/index.php
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0002a/BABCJFDG.html
http://www.arm.com/products/processors/technologies/neon.php
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0002a/ch01s04s02.html
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0205j/BABGHIFH.html
http://www.tommesani.com/Docs.html
http://www.linuxjournal.com/article/7269
http://developer.apple.com/hardwaredrivers/ve/sse.html
http://en.wikipedia.org/wiki/Multiplication_algorithm#Shift_and_add
http://www.ibm.com/developerworks/power/library/pa-unrollav1/
http://en.wikipedia.org/wiki/MMX_(instruction_set)
Integrated Performance Primitives
http://software.intel.com/en-us/articles/intel-ipp/
http://software.intel.com/en-us/articles/non-commercial-software-download/
OpenMAX
http://www.khronos.org/developers/resources/openmax
Freescale AltiVec Libs for Linux
http://www.freescale.com/webapp/sps/site/overview.jsp?code=DRPPCNWALTVCLIB
AltiVec Technology Programming Interface Manual
http://www.freescale.com/files/32bit/doc/ref_manual/ALTIVECPIM.pdf
http://developer.apple.com/hardwaredrivers/ve/instruction_crossref.html
Ian Ollmann's Altivec Tutorial
http://www-linux.gsi.de/~ikisel/reco/Systems/Altivec.pdf
http://arstechnica.com/civis/viewtopic.php?f=19&t=381165
RealView Compilation Tools Compiler Reference Guide (especially Appendix E)
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0348c/index.html
RealView Compilation Tools Assembler Guide (esp chapter 5)
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204j/index.html
Intel C++ Intrinsics Reference
http://software.intel.com/sites/default/files/m/9/4/c/8/e/18072-347603.pdf