GPU Optimization Ideas

Source: Internet  Editor: 程序博客网  Date: 2024/06/07 07:37

1 Each SM supports at most 8 resident blocks

2 Each SM supports at most 1024(?) resident threads

3 The SM splits each block into warps of 32 threads

4 Shared memory is limited to 16 KB per SM

5 Maximum number of registers per SM?

6 Ratio of I/O (memory access) to computation

7 Shared memory bank conflicts

8 Reduction

9 Memory coalescing -> load contiguous data into shared memory

10 Issue long-latency instructions earlier?

11 Loop unrolling

12 Prefetching

13 Use texture and constant memory
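
Several of the points above (coalesced loads into shared memory, avoiding bank conflicts, reduction, loop unrolling) can be combined in one small kernel. The following is only a minimal sketch, not a tuned implementation: the names `blockSum`, `sumOnDevice` and `kThreads`, the block size of 256 and the cap of 64 blocks are assumptions made for illustration, and error checking on the CUDA runtime calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical block size for this sketch; must be a power of two.
constexpr int kThreads = 256;

// Each block sums part of `in` and writes one partial sum to `out`.
// The kernel assumes it is launched with blockDim.x == kThreads.
__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float tile[kThreads];
    int tid = threadIdx.x;

    // Grid-stride loop: in each iteration, consecutive threads read
    // consecutive addresses, so the global loads can be coalesced.
    float acc = 0.0f;
    for (int i = blockIdx.x * blockDim.x + tid; i < n;
         i += blockDim.x * gridDim.x) {
        acc += in[i];
    }
    tile[tid] = acc;
    __syncthreads();

    // Tree reduction with sequential addressing: active threads touch
    // tile[tid] and tile[tid + s], i.e. consecutive words, which avoids
    // shared memory bank conflicts. The trip count is known at compile
    // time, so the compiler is allowed to unroll the loop.
    #pragma unroll
    for (int s = kThreads / 2; s > 0; s >>= 1) {
        if (tid < s) {
            tile[tid] += tile[tid + s];
        }
        __syncthreads();
    }

    if (tid == 0) {
        out[blockIdx.x] = tile[0];
    }
}

// Host helper (hypothetical): sums a vector on the GPU and adds the
// per-block partial sums on the CPU.
float sumOnDevice(const std::vector<float>& host) {
    int n = static_cast<int>(host.size());
    if (n == 0) return 0.0f;

    int blocks = (n + kThreads - 1) / kThreads;
    if (blocks > 64) blocks = 64;  // arbitrary cap chosen for this sketch

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, kThreads>>>(dIn, dOut, n);

    // This copy runs on the default stream, so it waits for the kernel.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling `sumOnDevice(std::vector<float>(1000, 1.0f))` would be one way to try the sketch. Shared memory use here is `kThreads * sizeof(float)` = 1 KB per block, so with the limits listed above the block count per SM, not shared memory, is the likelier bound on occupancy.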
