CUDA Optimization Tips

Excerpted from "CUDA C Best Practices"

1. To maximize developer productivity, profile the application to determine hotspots and bottlenecks.

2. To get the maximum benefit from CUDA, focus first on finding ways to parallelize sequential code.

3. Use the effective bandwidth of your computation as a metric when measuring performance and optimization benefits.

4. Minimize data transfer between the host and the device, even if it means running some kernels on the device that do not show performance gains when compared with running them on the host CPU.

5. When you have to transfer data between host and device, use page-locked (or pinned) memory to achieve higher bandwidth.

6. Ensure global memory accesses are coalesced whenever possible.

7. Non-unit-stride global memory accesses should be avoided whenever possible.
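
Tips 3, 6 and 7 can be tried together in one small experiment. The sketch below is not taken from the guide: the kernel stridedCopy and the helpers effectiveBandwidthGBs and timeCopy are hypothetical names chosen for illustration, and the buffer sizes are arbitrary. It copies the same number of floats with strides 1, 2, 4 and 8 and reports effective bandwidth as (bytes read + bytes written) / 10^9 / seconds.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Effective bandwidth in GB/s: (bytes read + bytes written) / 1e9 / seconds.
static double effectiveBandwidthGBs(double bytesRead, double bytesWritten, float elapsedMs) {
    return (bytesRead + bytesWritten) / 1e9 / (elapsedMs / 1e3);
}

// Hypothetical kernel: each thread copies one float located `stride` elements
// after its neighbour's. stride == 1 is the unit-stride (coalesced) case.
__global__ void stridedCopy(float *out, const float *in, int threads, int stride) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < threads) {
        size_t i = (size_t)t * stride;
        out[i] = in[i];
    }
}

// Times a single launch with CUDA events, after one untimed warm-up launch.
static float timeCopy(float *out, const float *in, int threads, int stride) {
    const int block = 256;
    const int grid = (threads + block - 1) / block;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    stridedCopy<<<grid, block>>>(out, in, threads, stride);  // warm-up
    cudaEventRecord(start);
    stridedCopy<<<grid, block>>>(out, in, threads, stride);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int threads = 1 << 22;  // floats copied per launch (arbitrary choice)
    const int maxStride = 8;
    const size_t bytes = (size_t)threads * maxStride * sizeof(float);
    float *in = nullptr, *out = nullptr;
    if (cudaMalloc(&in, bytes) != cudaSuccess || cudaMalloc(&out, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(in, 0, bytes);
    for (int stride = 1; stride <= maxStride; stride *= 2) {
        float ms = timeCopy(out, in, threads, stride);
        // Count only the bytes the kernel actually needs, not the bytes fetched.
        double useful = (double)threads * sizeof(float);
        printf("stride %d: %.3f ms, %.1f GB/s effective\n",
               stride, ms, effectiveBandwidthGBs(useful, useful, ms));
    }
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Compiled with nvcc, the reported effective bandwidth would be expected to drop as the stride grows, since each warp then touches more memory segments to obtain the same amount of useful data. The exact numbers depend on the GPU.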

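For tip 5, a rough way to see the effect of pinned memory is to time the same host-to-device copy from a pageable buffer (malloc) and from a page-locked one (cudaMallocHost). This is only a sketch: the helper names gbPerSecond and copyRateGBs and the 256 MiB buffer size are assumptions made for illustration, not something the guide prescribes.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Converts a byte count and an elapsed time in milliseconds to GB/s.
static double gbPerSecond(size_t bytes, float elapsedMs) {
    return (double)bytes / 1e9 / (elapsedMs / 1e3);
}

// Times one host-to-device cudaMemcpy with CUDA events and returns GB/s.
static double copyRateGBs(void *device, const void *host, size_t bytes) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaMemcpy(device, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return gbPerSecond(bytes, ms);
}

int main() {
    const size_t bytes = (size_t)256 << 20;  // 256 MiB, an arbitrary test size
    void *pageable = malloc(bytes);          // ordinary pageable host memory
    void *pinned = nullptr;                  // page-locked host memory
    void *device = nullptr;
    if (pageable == nullptr ||
        cudaMallocHost(&pinned, bytes) != cudaSuccess ||
        cudaMalloc(&device, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    memset(pageable, 1, bytes);  // touch the pages so they are really allocated
    memset(pinned, 1, bytes);

    copyRateGBs(device, pinned, bytes);  // untimed warm-up transfer
    printf("pageable: %.2f GB/s\n", copyRateGBs(device, pageable, bytes));
    printf("pinned:   %.2f GB/s\n", copyRateGBs(device, pinned, bytes));

    cudaFreeHost(pinned);
    cudaFree(device);
    free(pageable);
    return 0;
}
```

On most systems the pinned copy should report the higher rate. Pinned memory is a limited resource, so allocating too much of it can reduce overall system performance.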

To be continued...
