SSE instruction optimization challenges in C/C++
Many people suggest implementing the core of a ray tracer with SSE/SSE2, and several SIMD-based ray tracers have even been published. People say the performance gain is amazing…
Very attractive, isn't it… But SSE/SSE2 puts a critical constraint on the data its instructions can handle: data accessed through the aligned load/store instructions must be 16-byte aligned, or there will be runtime errors. (Unaligned variants such as _mm_loadu_ps avoid the fault, but have traditionally been slower.) It is a typical trade-off between performance and convenience. (Another, more common one is the trade-off between performance and memory footprint.)
Because of this stubborn constraint, data intended for SSE/SSE2 has to be allocated or declared in specific ways:
-- heap vars: _aligned_malloc
-- global vars: __declspec(align(16))
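The two placement rules above can be sketched roughly as follows. This is only a minimal illustration, not code from any of the ray tracers mentioned: it uses the portable spellings alignas(16) and _mm_malloc/_mm_free in place of the Visual C++ specific __declspec(align(16)) and _aligned_malloc, and the names is_aligned16, g_weights and add_weights_sum are made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics, plus _mm_malloc / _mm_free

// True if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Global data: alignas(16) is the standard C++11 counterpart of __declspec(align(16)).
alignas(16) float g_weights[4] = {1.0f, 2.0f, 3.0f, 4.0f};

// Adds g_weights to four floats held in an aligned heap buffer
// and returns the sum of the four result lanes.
inline float add_weights_sum() {
    // Heap data: _mm_malloc(size, alignment) plays the role of _aligned_malloc here.
    float* buf = static_cast<float*>(_mm_malloc(4 * sizeof(float), 16));
    assert(buf != nullptr && is_aligned16(buf));
    for (int i = 0; i < 4; ++i) buf[i] = 10.0f * (i + 1);  // 10, 20, 30, 40

    __m128 a = _mm_load_ps(buf);        // aligned load: needs a 16-byte aligned address
    __m128 b = _mm_load_ps(g_weights);  // aligned load from the global array
    __m128 r = _mm_add_ps(a, b);        // lane-wise add

    float out[4];
    _mm_storeu_ps(out, r);  // unaligned store: out carries no alignment guarantee
    _mm_free(buf);          // memory from _mm_malloc must go back through _mm_free
    return out[0] + out[1] + out[2] + out[3];
}
```

Passing a misaligned pointer to _mm_load_ps is the kind of mistake that leads to the runtime errors mentioned above; _mm_loadu_ps and _mm_storeu_ps accept any address.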
An Intel engineer successfully implemented an SSE-based ray tracer: http://software.intel.com/en-us/articles/architecture-of-a-real-time-ray-tracer/ I didn't see any alignment decorators in it, so I guess he used the Intel compiler… Another paper on this:
http://www.computer.org/portal/web/csdl/doi/10.1109/TVCG.2009.73
Another constraint: std::vector cannot contain __declspec(align(16)) data.
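One possible way around the std::vector limitation, sketched here under assumptions, is to keep plain floats in the vector and move the alignment into a custom allocator instead of into the element type. Aligned16Allocator, AlignedFloats and sum4 are hypothetical names for this sketch, and _mm_malloc again stands in for _aligned_malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>
#include <xmmintrin.h>  // SSE intrinsics, plus _mm_malloc / _mm_free

// Hypothetical minimal allocator that hands out 16-byte aligned storage.
template <typename T>
struct Aligned16Allocator {
    using value_type = T;

    Aligned16Allocator() = default;
    template <typename U>
    Aligned16Allocator(const Aligned16Allocator<U>&) {}

    T* allocate(std::size_t n) {
        void* p = _mm_malloc(n * sizeof(T), 16);
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) { _mm_free(p); }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U>
bool operator==(const Aligned16Allocator<T>&, const Aligned16Allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const Aligned16Allocator<T>&, const Aligned16Allocator<U>&) { return false; }

using AlignedFloats = std::vector<float, Aligned16Allocator<float>>;

// Sums a vector whose size is a multiple of 4, four floats per step.
inline float sum4(const AlignedFloats& v) {
    assert(v.size() % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < v.size(); i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(&v[i]));  // base is 16-byte aligned, step is 16 bytes
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

The vector's elements are ordinary floats, so no alignment decorator appears on the element type; only the buffer start is guaranteed to be aligned.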
At the same time, some complain that hand-written SSE code is no faster than code optimized by VC. PBRT doesn't use SSE in its source either, though it does put an SSE option in the makefile as a compiler flag. Maybe modern compilers are smart enough to generate SSE/SSE2 code on their own, who knows… Anyway, it is still worth further investigation.
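To make that comparison concrete, here is a small sketch of the same loop written twice: once as plain C++ that the compiler is free to vectorize, and once with SSE intrinsics. The function names are invented for this example, and nothing here measures speed; it only shows that the two forms compute the same thing, which is why a good optimizer can close the gap.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Plain loop: an optimizing compiler with SSE enabled may vectorize this by itself.
inline void scale_add_scalar(const float* x, const float* y, float k,
                             float* out, std::size_t n) {
    assert(n == 0 || (x && y && out));
    for (std::size_t i = 0; i < n; ++i) out[i] = k * x[i] + y[i];
}

// Hand-written SSE version of the same computation, four floats per step.
// Unaligned loads/stores are used so the caller has no alignment obligations.
inline void scale_add_sse(const float* x, const float* y, float k,
                          float* out, std::size_t n) {
    assert(n == 0 || (x && y && out));
    const __m128 kk = _mm_set1_ps(k);  // k broadcast to all four lanes
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 r = _mm_add_ps(_mm_mul_ps(kk, _mm_loadu_ps(x + i)),
                              _mm_loadu_ps(y + i));
        _mm_storeu_ps(out + i, r);
    }
    for (; i < n; ++i) out[i] = k * x[i] + y[i];  // leftover elements, scalar
}
```

Timing both versions with the compiler's SSE option switched on and off would be one way to carry out the further investigation.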