OpenCL Notes 1: Memory Model

Source: Internet | Published: 李子君 知乎 | Editor: 程序博客网 | Date: 2024/05/03 19:52

Memory model

Registers

Like a CPU register file, registers are private to each thread and are both readable and writable. The number of registers available to a thread is limited and depends on the occupancy, the kernel complexity, and the GPU generation. If a kernel needs more registers than the register file can supply, the excess data spills into local memory.


Local Memory

Local memory acts as overflow space for the register file, which works around the hardware limit on the number of registers. The price is a loss of performance, because local memory is slower to access than registers.


Shared Memory

Shared memory serves two purposes: communication between all threads of a thread block, and primary local storage space. It is generally the lowest-latency way for threads to communicate. It is readable and writable, but no coherency is guaranteed when two threads access the same location at the same time. For that reason, the framework provides atomic functions.
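To make the need for atomic functions concrete, here is a minimal host-side sketch in Python. It is not device code: the `threading.Lock` only stands in for a hardware atomic add, and the function name `count_with_atomic_add` is made up for this illustration. The point is that a counter update is a read-modify-write sequence, and it only stays correct under concurrent access if that sequence is indivisible.

```python
import threading

def count_with_atomic_add(num_threads, increments_per_thread):
    """Increment one shared counter from several threads, one indivisible step at a time."""
    counter = 0
    lock = threading.Lock()  # assumption: stands in for a hardware atomic add

    def worker():
        nonlocal counter
        for _ in range(increments_per_thread):
            with lock:       # read, add, and write back without interruption
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(count_with_atomic_add(4, 1000))  # prints 4000
```

Without the lock, two threads could read the same old value and both write back old + 1, so one increment would be lost.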


Constant Memory

Constant memory is one of the address spaces that kernels can only read.


1-D Texture Array

In contrast to constant memory, a texture array supports automatic interpolation between neighboring values, performed in hardware, based on the requested position.
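The interpolation can be pictured with a small host-side emulation. This is a sketch of the arithmetic only, not device code. It assumes unnormalized coordinates, texel centers at i + 0.5, and clamping at the edges; real hardware also computes the blend weight at reduced precision, which is ignored here. The name `tex1d_linear` is hypothetical.

```python
import math

def tex1d_linear(texels, x):
    """Emulate a linearly filtered 1-D texture fetch at unnormalized coordinate x."""
    n = len(texels)
    xb = x - 0.5        # assumed convention: texel i is centered at i + 0.5
    i = math.floor(xb)  # index of the left neighbor
    alpha = xb - i      # fractional distance toward the right neighbor
    left = texels[min(max(i, 0), n - 1)]       # clamp out-of-range indices
    right = texels[min(max(i + 1, 0), n - 1)]
    return (1.0 - alpha) * left + alpha * right

samples = [0.0, 10.0, 20.0, 30.0]
print(tex1d_linear(samples, 1.5))  # center of texel 1, prints 10.0
print(tex1d_linear(samples, 2.0))  # halfway between texels 1 and 2, prints 15.0
```

On the GPU this blend comes for free with the fetch, whereas a kernel reading constant memory would have to load both neighbors and compute the weighted sum itself.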

1-D Linear Texture

In contrast to the 1-D texture array, the 1-D linear texture is writable from kernel functions. Because the texture caches do not enforce coherence, the behavior is undefined if one thread writes to a position while another thread reads the same position.

2-D Texture Array

Similar to the 1-D texture array, but it provides bilinear interpolation in hardware.
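Bilinear interpolation blends the four texels surrounding the requested position: first along x, then along y. The following host-side sketch emulates that arithmetic under the same assumptions as a typical unnormalized fetch (texel centers at i + 0.5, clamped edges, full floating-point precision). The names `tex2d_bilinear` and `clamp_index` are hypothetical.

```python
import math

def clamp_index(i, n):
    """Clamp an index into the valid range 0 .. n - 1."""
    return min(max(i, 0), n - 1)

def tex2d_bilinear(texels, x, y):
    """Emulate a bilinear fetch; texels is a list of rows, x selects the column, y the row."""
    h, w = len(texels), len(texels[0])
    xb, yb = x - 0.5, y - 0.5              # assumed convention: centers at i + 0.5
    i, j = math.floor(xb), math.floor(yb)  # upper-left neighbor
    a, b = xb - i, yb - j                  # blend weights along x and y
    t00 = texels[clamp_index(j, h)][clamp_index(i, w)]
    t10 = texels[clamp_index(j, h)][clamp_index(i + 1, w)]
    t01 = texels[clamp_index(j + 1, h)][clamp_index(i, w)]
    t11 = texels[clamp_index(j + 1, h)][clamp_index(i + 1, w)]
    top = (1.0 - a) * t00 + a * t10        # blend along x in the upper row
    bottom = (1.0 - a) * t01 + a * t11     # blend along x in the lower row
    return (1.0 - b) * top + b * bottom    # blend the two rows along y

grid = [[0.0, 10.0],
        [20.0, 30.0]]
print(tex2d_bilinear(grid, 1.0, 1.0))  # center of the 2x2 grid, prints 15.0
```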

2-D Texture from Pitch-Linear Memory

Similar to the 1-D linear texture, it is writable by the kernel.
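"Pitch-linear" means that each row of the 2-D data is stored contiguously, but rows may be padded so that every row starts at an aligned address. The sketch below shows only the address arithmetic. The 256-byte alignment and the names `padded_pitch` and `element_offset` are assumptions for illustration; in practice the runtime chooses the pitch (in CUDA, for example, `cudaMallocPitch` returns it).

```python
def padded_pitch(width_elems, elem_size, alignment=256):
    """Row pitch in bytes: the row's data size rounded up to a multiple of alignment."""
    row_bytes = width_elems * elem_size
    return ((row_bytes + alignment - 1) // alignment) * alignment

def element_offset(row, col, pitch, elem_size):
    """Byte offset of element (row, col) from the start of the allocation."""
    return row * pitch + col * elem_size

pitch = padded_pitch(100, 4)           # 100 floats = 400 bytes of data per row
print(pitch)                           # prints 512
print(element_offset(2, 3, pitch, 4))  # prints 1036
```

Indexing with `row * width * elem_size` instead of `row * pitch` would read from the padding of earlier rows, which is a common mistake with this layout.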

3-D Texture Array

Unfortunately, no writable 3-D texture is available.

Global Memory

Compared to the other memory types, global memory has the slowest access. Its size is limited only by the amount of memory available on the graphics card.




Conclusion: a parallel implementation is still possible without detailed knowledge of this memory model, but a large loss in performance is very likely.
