Parallel Reduction --- (3) Free Strides
Source: Internet  Time: 2024/05/20 01:13
Abstract
This post tries to improve the performance of the previous reduction algorithm by removing strided memory accesses.
1. Strides
Strided access directly lowers load and store efficiency. With a stride of 2, for example, half the elements fetched in each memory transaction are never used and represent wasted bandwidth. Ensuring that as much of the fetched data as possible is actually used, by accessing memory without strides, is therefore an important part of optimizing memory accesses.
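To make the effect of a stride concrete, here is a minimal sketch of a copy kernel in which consecutive threads touch addresses `stride` elements apart. It is not taken from the original post: the kernel name `copy_with_stride` and the host helper `count_copied` are hypothetical and chosen only for illustration. With a stride of 1, a warp reads one contiguous run of floats. With a stride of 2, the same warp spans twice the address range, so about half of what is fetched goes unused.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Each thread copies one element, but consecutive threads access addresses
// `stride` elements apart. stride == 1 gives contiguous access; stride == 2
// leaves every other element in the touched range unused.
__global__ void copy_with_stride(float* out, const float* in, int stride, int n)
{
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) {
        out[i] = in[i];
    }
}

// Hypothetical host helper: launches one thread per element slot and returns
// how many of the n output elements were actually written by the kernel.
int count_copied(int stride, int n)
{
    std::vector<float> h_in(n, 1.0f), h_out(n, -1.0f);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_out, h_out.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copy_with_stride<<<blocks, threads>>>(d_out, d_in, stride, n);
    assert(cudaGetLastError() == cudaSuccess);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int copied = 0;
    for (float v : h_out) {
        copied += (v == 1.0f);   // elements still at -1 were never touched
    }
    return copied;
}
```

With `n = 1024`, `count_copied(1, 1024)` returns 1024 and `count_copied(2, 1024)` returns 512: the same address range is covered, but only half of it carries useful data. How much bandwidth is actually lost depends on the GPU architecture and its cache line size.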
2. Key Codes
// reduction
for (int i = 1024/2; i > 0; i >>= 1) {
    if (tid < i) {
        data[tid] += data[tid + i];
    }
    __syncthreads();
}
An operation diagram illustrates this code as follows.
[Figure: operation diagram of the reduction loop]
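For readers who want to run the loop, the following is a minimal sketch of a complete kernel built around it. Several details are assumptions not stated in the post: that `data` is a shared-memory array, that each block holds exactly 1024 floats, and that `tid` is `threadIdx.x`. The names `reduce_free_strides` and `reduce_one_block` are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define BLOCK_SIZE 1024   // matches the 1024 elements assumed by the loop

// Sums BLOCK_SIZE floats per block. In every iteration the active threads
// 0..i-1 read data[tid] and data[tid + i], so neighbouring threads always
// touch neighbouring elements. Launch with BLOCK_SIZE threads per block.
__global__ void reduce_free_strides(const float* in, float* out)
{
    __shared__ float data[BLOCK_SIZE];   // assumption: `data` lives in shared memory
    int tid = threadIdx.x;
    data[tid] = in[blockIdx.x * BLOCK_SIZE + tid];
    __syncthreads();

    for (int i = BLOCK_SIZE / 2; i > 0; i >>= 1) {
        if (tid < i) {
            data[tid] += data[tid + i];
        }
        __syncthreads();   // every thread reaches this, active or not
    }

    if (tid == 0) {
        out[blockIdx.x] = data[0];   // data[0] now holds the block's sum
    }
}

// Hypothetical host wrapper: reduces a single block of BLOCK_SIZE floats.
float reduce_one_block(const float* h_in)
{
    float *d_in = nullptr, *d_out = nullptr;
    float result = 0.0f;
    cudaMalloc(&d_in, BLOCK_SIZE * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, BLOCK_SIZE * sizeof(float), cudaMemcpyHostToDevice);

    reduce_free_strides<<<1, BLOCK_SIZE>>>(d_in, d_out);
    assert(cudaGetLastError() == cudaSuccess);

    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling `reduce_one_block` on an array of 1024 ones returns 1024. Note that `__syncthreads()` sits outside the `if`, so all threads in the block reach the barrier in every iteration.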
3. Experimental Results
The experimental results show that removing strides yields much higher performance. It is a fundamental and indispensable strategy when optimizing CUDA code.
4. More
The source code can be viewed on GitHub.