Parallel Reduction --- (5) Question: How Many Threads on Earth Do We Need?
Abstract
This post shows how to further optimize CUDA performance by exploring how many threads a kernel should launch.
1. How Many Threads Do We Need?
The first question is: how many threads do we actually need? Some would say as many as possible, relying on thread-level parallelism (TLP). Others would say as few as possible, relying on instruction-level parallelism (ILP), where each thread does more independent work. I would say it is a trade-off!
In the previous posts, our reduction implementation was based purely on TLP. Its performance can very likely be improved further by combining the TLP and ILP strategies.
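To make the trade-off concrete, here is a minimal sketch (not the post's code) in which the thread count is a template parameter. With N = 1024 values, THREADS = 1024 gives one element per thread (pure TLP), while 512 or 256 threads make each thread first accumulate 2 or 4 elements before the shared-memory tree. The names `reduce_coarsened` and `reduce_with` are hypothetical, the sketch assumes a single block, and CUDA error checking is omitted.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical sketch: one block reduces N values with THREADS threads.
// Fewer threads means more work per thread (N / THREADS elements each).
template <int THREADS, int N>
__global__ void reduce_coarsened(const uint64_t* in, uint64_t* out)
{
    static_assert(N % THREADS == 0, "N must be a multiple of THREADS");
    static_assert((THREADS & (THREADS - 1)) == 0, "THREADS must be a power of two");
    __shared__ uint64_t data[THREADS];
    const int tid = threadIdx.x;

    // per-thread work: constant trip count so the loop can be unrolled;
    // strided indexing keeps each round of global loads coalesced
    uint64_t sum = 0;
    #pragma unroll
    for (int k = 0; k < N / THREADS; ++k) {
        sum += in[k * THREADS + tid];
    }
    data[tid] = sum;
    __syncthreads();

    // block-level tree reduction in shared memory
    for (int stride = THREADS / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            data[tid] += data[tid + stride];
        }
        __syncthreads();
    }
    if (tid == 0) {
        *out = data[0];
    }
}

// Hypothetical host helper: `values` must hold exactly N elements.
// Returns the GPU sum computed with THREADS threads. Error checking omitted.
template <int THREADS, int N>
uint64_t reduce_with(const std::vector<uint64_t>& values)
{
    uint64_t* d_in = nullptr;
    uint64_t* d_out = nullptr;
    cudaMalloc(&d_in, N * sizeof(uint64_t));
    cudaMalloc(&d_out, sizeof(uint64_t));
    cudaMemcpy(d_in, values.data(), N * sizeof(uint64_t), cudaMemcpyHostToDevice);

    reduce_coarsened<THREADS, N><<<1, THREADS>>>(d_in, d_out);

    uint64_t result = 0;
    cudaMemcpy(&result, d_out, sizeof(uint64_t), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

All instantiations compute the same sum; only the split between threads and per-thread work changes. Which split is fastest depends on the GPU, so this sketch does not imply a particular winner.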
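2. Use Fewer Threads (512 Threads vs. 1024 Threads)
```cuda
#include <cstdint>

// The post shows only the kernel body; this name and signature are illustrative.
// Assumes one block of 512 threads reducing the 1024 uint64_t values in data_gpu.
__global__ void reduce_512_threads(uint64_t* data_gpu)
{
    const unsigned int tid = threadIdx.x;

    // load one-time computation data into shared memory:
    // each thread adds two elements while loading (1024 values -> 512 partial sums)
    __shared__ uint64_t data[512];
    if (tid < 512) {
        data[tid] = data_gpu[tid] + data_gpu[tid + 512];
    }
    __syncthreads();

    // reduction
    if (tid < 256) { data[tid] += data[tid + 256]; }
    __syncthreads();
    if (tid < 128) { data[tid] += data[tid + 128]; }
    __syncthreads();
    if (tid < 64)  { data[tid] += data[tid + 64]; }
    __syncthreads();

    // last warp: an unrolled warp-synchronous sequence without volatile or
    // __syncwarp() is a data race (notably on Volta and later, which schedule
    // threads independently). The running sum stays in a register, and
    // __syncwarp() separates every shared-memory write from the next read.
    if (tid < 32) {
        uint64_t sum = data[tid] + data[tid + 32];
        #pragma unroll
        for (int offset = 16; offset > 0; offset >>= 1) {
            __syncwarp();
            data[tid] = sum;
            __syncwarp();
            sum += data[tid + offset];
        }
        // write root node back
        if (tid == 0) {
            data_gpu[0] = sum;
        }
    }
}
```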
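When validating the GPU result, a host-side reference that follows the same add pattern can help: one add during the load (1024 values into 512 slots), then strides of 256, 128, ..., 1. This is a minimal sketch; `reduce_like_kernel_cpu` is a hypothetical name, and it assumes exactly 1024 input values.

```cuda
#include <cstdint>
#include <vector>

// Hypothetical host reference mirroring the 512-thread kernel's add pattern.
// Expects exactly 1024 input values.
uint64_t reduce_like_kernel_cpu(const std::vector<uint64_t>& in)
{
    std::vector<uint64_t> data(512);
    for (int tid = 0; tid < 512; ++tid) {
        data[tid] = in[tid] + in[tid + 512];   // first add during load
    }
    for (int stride = 256; stride > 0; stride /= 2) {
        for (int tid = 0; tid < stride; ++tid) {
            data[tid] += data[tid + stride];   // one tree level
        }
    }
    return data[0];                            // root node
}
```

Because each level only reads slots at or above `stride`, which that level never writes, the sequential host loop yields the same sum as the parallel kernel.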
4. Experimental Results
The experimental results show that balancing the TLP and ILP strategies yields higher performance than the previous, purely TLP-based version.
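To reproduce a comparison like 512 vs. 1024 threads, one option is to time each launch with CUDA events. The following is a minimal sketch; `time_launch_ms` and its `reps` parameter are hypothetical, and error checking is omitted.

```cuda
#include <functional>
#include <cuda_runtime.h>

// Hypothetical helper: average time in milliseconds of `launch`, measured
// with CUDA events on the default stream. Error checking omitted.
float time_launch_ms(const std::function<void()>& launch, int reps = 100)
{
    launch();                  // warm-up run, excluded from timing
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) {
        launch();
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until all timed launches finish

    float total_ms = 0.0f;
    cudaEventElapsedTime(&total_ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return total_ms / reps;
}
```

A usage example, with `my_kernel` and `d_data` as placeholders: `time_launch_ms([&] { my_kernel<<<1, 512>>>(d_data); });`. A kernel that writes its result back into its own input (as the one in this post does with `data_gpu[0]`) changes the input on every repetition. That does not affect the timing, but the input should be re-copied before checking the value.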
5. More Details
The whole project is on GitHub, and anyone interested in CUDA is warmly welcome to contribute to it. Big thanks to all of you!