Programming trivia: 4x4 integer matrix transpose in SSE2
Source: Internet. Published: 知乎 不知道. Editor: 程序博客网. Time: 2024/05/02 06:11
From http://www.randombit.net/bitbashing/2009/10/08/integer_matrix_transpose_in_sse2.html
It is a good example of using SSE for a matrix transpose.
The Intel SSE2 intrinsics have a macro, _MM_TRANSPOSE4_PS, which performs a matrix transposition on a 4x4 array represented by elements in 4 SSE registers. However, it doesn’t work with integer registers because Intel intrinsics make a distinction between integer and floating point SSE registers. Theoretically one could cast and use the floating point operations, but it seems quite plausible that this will not round trip properly; for instance if one of your integer values happens to have the same bit pattern as a 32-bit IEEE denormal.
However, it is easy to do with the punpckldq, punpckhdq, punpcklqdq, and punpckhqdq instructions; code and diagrams ahoy.
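To make the four instructions concrete, here is a minimal sketch that applies the matching intrinsics to two sample vectors. The function name unpack_demo and its array-based interface are hypothetical, chosen only for illustration; the intrinsics themselves come from emmintrin.h.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper for illustration: apply the four unpack variants to
 * a and b, storing each result as four 32-bit lanes, lowest lane first. */
void unpack_demo(const int32_t a[4], const int32_t b[4], int32_t out[4][4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* punpckldq: interleave the low two 32-bit lanes -> a0 b0 a1 b1 */
    _mm_storeu_si128((__m128i *)out[0], _mm_unpacklo_epi32(va, vb));
    /* punpckhdq: interleave the high two 32-bit lanes -> a2 b2 a3 b3 */
    _mm_storeu_si128((__m128i *)out[1], _mm_unpackhi_epi32(va, vb));
    /* punpcklqdq: low 64-bit half of each input -> a0 a1 b0 b1 */
    _mm_storeu_si128((__m128i *)out[2], _mm_unpacklo_epi64(va, vb));
    /* punpckhqdq: high 64-bit half of each input -> a2 a3 b2 b3 */
    _mm_storeu_si128((__m128i *)out[3], _mm_unpackhi_epi64(va, vb));
}
```

With a = {0, 1, 2, 3} and b = {10, 11, 12, 13}, the four rows of out are {0, 10, 1, 11}, {2, 12, 3, 13}, {0, 1, 10, 11}, and {2, 3, 12, 13}.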
If we name the 4 input registers I0, I1, I2, and I3, then label their corresponding elements as I0{0,1,2,3} and so on, then the transpose operation looks like this:
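Written out lane by lane, the data flow can be sketched as follows. This is a reconstruction from the semantics of the unpack instructions rather than the original diagram. The notation I_{r,c} (element c of register Ir, lowest lane first) and the operator names unpacklo/unpackhi with a 32 or 64 subscript are assumptions made for illustration; T0 to T3 are the intermediate registers and O0 to O3 the outputs.

```latex
\begin{aligned}
T_0 &= \operatorname{unpacklo}_{32}(I_0, I_1) = (I_{0,0},\ I_{1,0},\ I_{0,1},\ I_{1,1}) \\
T_1 &= \operatorname{unpacklo}_{32}(I_2, I_3) = (I_{2,0},\ I_{3,0},\ I_{2,1},\ I_{3,1}) \\
T_2 &= \operatorname{unpackhi}_{32}(I_0, I_1) = (I_{0,2},\ I_{1,2},\ I_{0,3},\ I_{1,3}) \\
T_3 &= \operatorname{unpackhi}_{32}(I_2, I_3) = (I_{2,2},\ I_{3,2},\ I_{2,3},\ I_{3,3}) \\[4pt]
O_0 &= \operatorname{unpacklo}_{64}(T_0, T_1) = (I_{0,0},\ I_{1,0},\ I_{2,0},\ I_{3,0}) \\
O_1 &= \operatorname{unpackhi}_{64}(T_0, T_1) = (I_{0,1},\ I_{1,1},\ I_{2,1},\ I_{3,1}) \\
O_2 &= \operatorname{unpacklo}_{64}(T_2, T_3) = (I_{0,2},\ I_{1,2},\ I_{2,2},\ I_{3,2}) \\
O_3 &= \operatorname{unpackhi}_{64}(T_2, T_3) = (I_{0,3},\ I_{1,3},\ I_{2,3},\ I_{3,3})
\end{aligned}
```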
When we are done, O{0,1,2,3} contain all of the first, second, third, and fourth elements (respectively) of the input vectors.
In Intel’s intrinsics (also usable in at least GNU C++ and Visual C++), this can be expressed as:
__m128i T0 = _mm_unpacklo_epi32(I0, I1);
__m128i T1 = _mm_unpacklo_epi32(I2, I3);
__m128i T2 = _mm_unpackhi_epi32(I0, I1);
__m128i T3 = _mm_unpackhi_epi32(I2, I3);

/* Assigning transposed values back into I[0-3] */
I0 = _mm_unpacklo_epi64(T0, T1);
I1 = _mm_unpackhi_epi64(T0, T1);
I2 = _mm_unpacklo_epi64(T2, T3);
I3 = _mm_unpackhi_epi64(T2, T3);
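For readers who want to try the snippet on data in memory, here is a minimal sketch that wraps it in a function. The name transpose4x4_epi32, the row-major int32_t[16] layout, and the use of unaligned loads and stores are assumptions made for illustration and are not part of the original post.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical wrapper: transpose a row-major 4x4 matrix of 32-bit
 * integers in place. Unaligned loads and stores are used so that m
 * needs no particular alignment. */
void transpose4x4_epi32(int32_t m[16])
{
    __m128i I0 = _mm_loadu_si128((const __m128i *)(m + 0));
    __m128i I1 = _mm_loadu_si128((const __m128i *)(m + 4));
    __m128i I2 = _mm_loadu_si128((const __m128i *)(m + 8));
    __m128i I3 = _mm_loadu_si128((const __m128i *)(m + 12));

    /* Stage 1: interleave 32-bit elements from pairs of rows. */
    __m128i T0 = _mm_unpacklo_epi32(I0, I1);
    __m128i T1 = _mm_unpacklo_epi32(I2, I3);
    __m128i T2 = _mm_unpackhi_epi32(I0, I1);
    __m128i T3 = _mm_unpackhi_epi32(I2, I3);

    /* Stage 2: combine 64-bit halves so each register holds one column. */
    _mm_storeu_si128((__m128i *)(m + 0),  _mm_unpacklo_epi64(T0, T1));
    _mm_storeu_si128((__m128i *)(m + 4),  _mm_unpackhi_epi64(T0, T1));
    _mm_storeu_si128((__m128i *)(m + 8),  _mm_unpacklo_epi64(T2, T3));
    _mm_storeu_si128((__m128i *)(m + 12), _mm_unpackhi_epi64(T2, T3));
}
```

After the call, element (r, c) of the matrix holds what was previously at (c, r). If the data is known to be 16-byte aligned, _mm_load_si128 and _mm_store_si128 could be used instead.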
The diagram was done with latex2png, a handy little tool for generating images with LaTeX inputs.