Reading Notes on Programming Massively Parallel Processors (Reprint Edition), Chapter 3: Introduction to CUDA

Source: Internet. Editor: 程序博客网. Time: 2024/06/03 03:39

3.1 Data Parallelism

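Data parallelism is at the heart of GPU computing. Matrix multiplication is a simple example of a data-parallel computation: each element of the product can be computed independently of all the others. Many real applications exhibit more complex forms of data parallelism.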
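
To make the independence concrete, here is a minimal host-side sketch of square matrix multiplication. The function name matMulHost and the row-major layout are assumptions made for illustration, not taken from the book. The two outer loops visit each output element once, and no iteration reads a value written by another, which is what makes the work data parallel.

```cuda
#include <cstddef>
#include <vector>

// Sequential reference: P = M * N for square width x width matrices,
// stored in row-major order (an assumption of this sketch).
// Each P[row][col] depends only on one row of M and one column of N,
// so the width * width outer iterations are independent of one another.
std::vector<float> matMulHost(const std::vector<float>& M,
                              const std::vector<float>& N,
                              int width) {
    std::vector<float> P(static_cast<std::size_t>(width) * width, 0.0f);
    for (int row = 0; row < width; ++row) {
        for (int col = 0; col < width; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < width; ++k) {
                sum += M[row * width + k] * N[k * width + col];
            }
            P[row * width + col] = sum;
        }
    }
    return P;
}
```

On a GPU, each (row, col) pair would typically be handed to its own thread instead of being visited by the two outer loops.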

3.2 CUDA Program Structure

grid -- all the threads generated by a single kernel launch are collectively called a grid; a grid can be regarded as an organizational unit of threads.

3.3 A Matrix-Matrix Multiplication Example

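The basic structure of a GPU computation: 1) copy the input data from CPU main memory (host memory) to GPU memory (device memory); 2) run the parallel computation on the GPU; 3) copy the result from GPU memory back to CPU memory.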
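
The three steps above can be sketched for a small matrix multiplication as follows. This is only an illustration under stated assumptions: the names matMulKernel and matMulOnDevice are hypothetical, the matrices are square and row-major, a single block is used (so width * width must not exceed the per-block thread limit), and error checking is left out to keep the steps visible.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// One thread per output element. Only one block is launched in this sketch,
// so threadIdx alone identifies the element a thread is responsible for.
__global__ void matMulKernel(const float* Md, const float* Nd, float* Pd, int width) {
    int row = threadIdx.y;
    int col = threadIdx.x;
    float sum = 0.0f;
    for (int k = 0; k < width; ++k) {
        sum += Md[row * width + k] * Nd[k * width + col];
    }
    Pd[row * width + col] = sum;
}

std::vector<float> matMulOnDevice(const std::vector<float>& M,
                                  const std::vector<float>& N,
                                  int width) {
    std::size_t bytes = static_cast<std::size_t>(width) * width * sizeof(float);
    float *Md = nullptr, *Nd = nullptr, *Pd = nullptr;
    cudaMalloc((void**)&Md, bytes);
    cudaMalloc((void**)&Nd, bytes);
    cudaMalloc((void**)&Pd, bytes);

    // Step 1: copy the inputs from host memory to device memory.
    cudaMemcpy(Md, M.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(Nd, N.data(), bytes, cudaMemcpyHostToDevice);

    // Step 2: run the parallel computation on the GPU (one block of width x width threads).
    dim3 block(width, width);
    matMulKernel<<<1, block>>>(Md, Nd, Pd, width);

    // Step 3: copy the result from device memory back to host memory.
    std::vector<float> P(static_cast<std::size_t>(width) * width);
    cudaMemcpy(P.data(), Pd, bytes, cudaMemcpyDeviceToHost);

    cudaFree(Md);
    cudaFree(Nd);
    cudaFree(Pd);
    return P;
}
```

Calling matMulOnDevice({1, 2, 3, 4}, {5, 6, 7, 8}, 2) should give the same product as a sequential CPU loop. Larger matrices would need more than one block.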

3.4 Device Memories and Data Transfer

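Key functions: cudaMalloc allocates device global memory, cudaFree releases it, and cudaMemcpy transfers data between host memory and device memory.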
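
A minimal sketch of the three calls working together, with their return codes checked. The helper name roundTrip is hypothetical. It sends a buffer to the device and straight back, without running any kernel, and reports whether the data survived the trip.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: host -> device -> host, returning true if every call
// succeeded and the values came back unchanged. Assumes a non-empty input.
bool roundTrip(const std::vector<float>& host) {
    std::size_t bytes = host.size() * sizeof(float);
    float* dev = nullptr;

    // Allocate device global memory.
    cudaError_t err = cudaMalloc((void**)&dev, bytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    // Copy to the device, then back into a separate host buffer.
    std::vector<float> back(host.size(), 0.0f);
    err = cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        err = cudaMemcpy(back.data(), dev, bytes, cudaMemcpyDeviceToHost);
    }

    // Release the device memory whether or not the copies succeeded.
    cudaFree(dev);

    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return back == host;
}
```

The last argument of cudaMemcpy gives the direction of the transfer. Passing the wrong direction is a common mistake, so checking the returned cudaError_t is worthwhile even in small programs.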

3.5 Kernel Functions and Threading

thread indices: they let each thread distinguish itself from the others and direct it to the parts of the data it should process.

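The threads in a grid are organized into blocks that all contain the same number of threads. Threads in the same block can share fast, high-bandwidth shared memory and can synchronize with one another. The blocks of a grid are organized in two dimensions and identified by blockIdx.x and blockIdx.y (this reflects the hardware generation the book describes; later GPUs also support a third grid dimension).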

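The threads of a block are organized in three dimensions, with at most 512 threads per block in total (the limit on the hardware the book describes; later GPUs allow 1024). Each thread is identified by threadIdx.x, threadIdx.y, and threadIdx.z.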
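
As an illustration of how the block and thread indices combine, the sketch below has every thread compute its global row and column in a 2D array and record which block it belongs to. The names recordOwnerBlock and blockOwnership are hypothetical, and the sketch assumes that width and height are exact multiples of the tile size.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Each thread locates its own element from blockIdx, blockDim, and threadIdx,
// then stores the linear id of the block it belongs to.
__global__ void recordOwnerBlock(int* owner, int width) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // global column
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // global row
    owner[row * width + col] = blockIdx.y * gridDim.x + blockIdx.x;
}

// Assumes width and height are multiples of tile, so the grid covers the
// array exactly and no bounds check is needed in the kernel.
std::vector<int> blockOwnership(int width, int height, int tile) {
    std::size_t count = static_cast<std::size_t>(width) * height;
    int* d_owner = nullptr;
    cudaMalloc((void**)&d_owner, count * sizeof(int));

    dim3 block(tile, tile);                  // threads per block (2D)
    dim3 grid(width / tile, height / tile);  // blocks per grid (2D)
    recordOwnerBlock<<<grid, block>>>(d_owner, width);

    std::vector<int> owner(count);
    cudaMemcpy(owner.data(), d_owner, count * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_owner);
    return owner;
}
```

For a 4 x 4 array with tile = 2, the top-left 2 x 2 quadrant is handled by block 0, the top-right by block 1, the bottom-left by block 2, and the bottom-right by block 3.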

When launching a kernel, you must specify both how many blocks the grid contains and how many threads each block contains.
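
A minimal sketch of choosing these two numbers for a 1D problem, assuming a block size of 256 threads (an arbitrary but common choice, not something the book prescribes). The names scaleKernel and scaleOnDevice are hypothetical. The block count is rounded up so that every element gets a thread, and the kernel guards against the surplus threads in the last block.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

__global__ void scaleKernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // threads past the end of the array do nothing
        data[i] *= factor;
    }
}

// Assumes a non-empty input; a launch with zero blocks is not valid.
std::vector<float> scaleOnDevice(std::vector<float> v, float factor) {
    int n = static_cast<int>(v.size());
    std::size_t bytes = v.size() * sizeof(float);
    float* d_data = nullptr;
    cudaMalloc((void**)&d_data, bytes);
    cudaMemcpy(d_data, v.data(), bytes, cudaMemcpyHostToDevice);

    // Execution configuration: <<<number of blocks, threads per block>>>.
    int threadsPerBlock = 256;
    int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
    scaleKernel<<<numBlocks, threadsPerBlock>>>(d_data, factor, n);

    cudaMemcpy(v.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return v;
}
```

With 1000 elements this launches 4 blocks of 256 threads, that is 1024 threads, of which the last 24 fail the i < n test and stay idle.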

SPMD or SIMD?

Brainstorms

1) Porting CPU programs to the GPU: how can this be done by computers instead of humans?
