Parallel Reduction --- (1) Original Implementation
Source: Internet. Editor: 程序博客网. Time: 2024/05/11 18:24
Abstract
This post implements the original, unoptimized version of parallel reduction in CUDA.
1. Key Code
// reduction on a CUDA block
for (int i = 1; i < 1024; i *= 2) {
    if ((tid % (2 * i)) == 0) {
        data[tid] += data[tid + i];
    }
    __syncthreads();
}
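So, what does the code above mean? Put figuratively, it combines neighbouring partial sums pair by pair: in the pass with stride i, every thread whose index tid is a multiple of 2 * i adds data[tid + i] into data[tid], and __syncthreads() makes all threads wait before the stride doubles. After the last pass (i = 512), data[0] holds the sum of all 1024 elements.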
2. Experimental Results
In the first version of the experiment, we implement the basic reduction in CUDA. The test machine has an NVIDIA GTX 780 Ti GPU and an Intel Core i7 CPU, and runs Windows 7. The results show that computing the reduction 0+1+2+…+1023 one thousand times takes 13.417 ms, or about 13.4 µs per reduction on average.
3. More Details
For more details, see my source code on GitHub. Anyone interested in this project is warmly welcome to contribute to it.