OpenCL Notes 4: projector application

Source: Internet  Published: 任小龙 Java basics notes  Editor: 程序博客网  Time: 2024/05/21 20:21

1. Ray-driven back-projection is less suitable for parallelization because of possible race conditions: different rays can update the same voxel at the same time. In OpenCL these races can be prevented with atomic functions, at the cost of performance. This applies to cone-beam CT, where races are unavoidable, whereas for spiral CT the memory layout can be optimized to reduce racing.
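A common way to make the scatter step race-free when no native floating-point atomic add is available (core OpenCL 1.x `atomic_add` only covers integer types) is a compare-and-swap loop on the bit pattern of the float. The sketch below models that idea on the host with C11 `<stdatomic.h>`; the names `atomic_add_float` and `backproject_ray`, and the precomputed per-ray voxel lists and weights, are hypothetical and not taken from the original text.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret float <-> 32-bit pattern without breaking aliasing rules. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Atomically performs *addr += value with a compare-and-swap loop,
   the same pattern an OpenCL kernel would build from atomic_cmpxchg. */
static void atomic_add_float(_Atomic uint32_t *addr, float value)
{
    uint32_t expected = atomic_load(addr);
    uint32_t desired;
    do {
        desired = float_bits(bits_float(expected) + value);
        /* On failure, `expected` is reloaded with the current value; retry. */
    } while (!atomic_compare_exchange_weak(addr, &expected, desired));
}

/* Hypothetical ray-driven back-projection step: one ray deposits its
   detector value into every voxel it traverses. voxel_ids and weights
   (e.g. intersection lengths) are assumed to be precomputed. */
static void backproject_ray(_Atomic uint32_t *volume, const int *voxel_ids,
                            const float *weights, int n, float detector_value)
{
    for (int k = 0; k < n; ++k)
        atomic_add_float(&volume[voxel_ids[k]], weights[k] * detector_value);
}
```

Two rays that cross the same voxel may call `backproject_ray` concurrently without losing updates; the retry loop under contention is where the performance cost comes from.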

2. The parallelization scheme partitions the work over the x1 and x3 axes, as illustrated in the Figure; the x2 axis is then processed as the innermost loop of each kernel. The benefits are two-fold; the key advantage is that neighboring threads in the first dimension always access the volume data in a coalesced fashion, and coalesced memory access is necessary to maximize the achieved memory bandwidth. This may hold for cone-beam CT if the detector does not rotate, whereas spiral CT cannot guarantee it at all.
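The partitioning can be emulated on the host: each (i1, i3) pair plays the role of one work-item, and i2 is walked in the innermost loop. This sketch assumes the volume is stored with x1 as the fastest-varying index, so work-items adjacent in i1 touch adjacent addresses at every step of the x2 loop. The `constant_contribution` placeholder standing in for the real projection and interpolation is hypothetical.

```c
#include <stddef.h>

/* Linear index of voxel (i1, i2, i3) with x1 varying fastest. */
static size_t voxel_index(int i1, int i2, int i3, int n1, int n2)
{
    return (size_t)i1 + (size_t)n1 * ((size_t)i2 + (size_t)n2 * (size_t)i3);
}

/* Per-voxel contribution of one projection (projection + interpolation). */
typedef float (*contribution_fn)(int i1, int i2, int i3);

/* Hypothetical placeholder contribution: every voxel receives 1.0. */
static float constant_contribution(int i1, int i2, int i3)
{
    (void)i1; (void)i2; (void)i3;
    return 1.0f;
}

/* Body of one work-item: fixed (i1, i3), innermost loop over x2. */
static void backproject_work_item(float *volume, int i1, int i3,
                                  int n1, int n2, contribution_fn contrib)
{
    for (int i2 = 0; i2 < n2; ++i2)
        volume[voxel_index(i1, i2, i3, n1, n2)] += contrib(i1, i2, i3);
}

/* Host emulation of the 2-D NDRange over (x1, x3). In a kernel, i1 and i3
   would come from get_global_id(0) and get_global_id(1). */
static void backproject_volume(float *volume, int n1, int n2, int n3,
                               contribution_fn contrib)
{
    for (int i3 = 0; i3 < n3; ++i3)
        for (int i1 = 0; i1 < n1; ++i1)
            backproject_work_item(volume, i1, i3, n1, n2, contrib);
}
```

Because each work-item owns a distinct column of voxels (fixed i1 and i3, all i2), no two work-items write the same voxel, so this voxel-driven scheme needs no atomics.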

3. Accessing the discrete measurements at arbitrary positions requires interpolation. Storing the projections as textures (OpenCL image objects) enables implicit bilinear interpolation by the texture unit, as well as automatic use of the existing texture cache.
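For reference, the bilinear filtering the texture unit performs in hardware can be written out in software. This is a minimal sketch on a row-major detector image, assuming integer coordinates fall on pixel centres and reads are clamped at the border (similar in spirit to `CLK_ADDRESS_CLAMP_TO_EDGE`); the half-pixel texel-centre convention of the OpenCL sampler is deliberately not modelled, and the names `bilinear` and `floor_to_int` are hypothetical.

```c
/* floor() for values in int range, without needing libm. */
static int floor_to_int(float x)
{
    int i = (int)x;                     /* truncates toward zero */
    return ((float)i > x) ? i - 1 : i;
}

/* Clamp an index into [0, n-1] (clamp-to-edge addressing). */
static int clampi(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

/* Bilinear interpolation of a w x h row-major image at (u, v), where
   integer (u, v) coincide with pixel centres: four fetches, three lerps. */
static float bilinear(const float *img, int w, int h, float u, float v)
{
    int u0 = floor_to_int(u), v0 = floor_to_int(v);
    float fu = u - (float)u0, fv = v - (float)v0;
    float p00 = img[clampi(v0, h) * w + clampi(u0, w)];
    float p10 = img[clampi(v0, h) * w + clampi(u0 + 1, w)];
    float p01 = img[clampi(v0 + 1, h) * w + clampi(u0, w)];
    float p11 = img[clampi(v0 + 1, h) * w + clampi(u0 + 1, w)];
    float top = p00 + fu * (p10 - p00);
    float bot = p01 + fu * (p11 - p01);
    return top + fv * (bot - top);
}
```

In an OpenCL kernel the same result would come from a single `read_imagef` call with a `CLK_FILTER_LINEAR` sampler, which is why texture hardware makes this step cheap.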

4. An obvious use of constant memory is to store the projection matrices in a globally defined array. For spiral CT, the initial detector geometry could be stored instead.
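In a voxel-driven back-projector, each 3x4 projection matrix maps a homogeneous voxel position to detector coordinates, and all work-items handling the same view read the same twelve values. This sketch keeps the matrices in a read-only global array, the host-side analogue of an OpenCL `__constant` array, and performs the perspective division. The array name `proj_matrices`, the single pinhole example matrix, and the row-major 12-float layout are illustrative assumptions.

```c
/* Hypothetical table of projection matrices, one row-major 3x4 matrix per
   view. In an OpenCL kernel this would be declared as
   __constant float proj_matrices[NUM_VIEWS * 12]. */
#define NUM_VIEWS 1
static const float proj_matrices[NUM_VIEWS * 12] = {
    /* Simple pinhole example: u = x / z, v = y / z. */
    1.0f, 0.0f, 0.0f, 0.0f,
    0.0f, 1.0f, 0.0f, 0.0f,
    0.0f, 0.0f, 1.0f, 0.0f,
};

/* Projects world point (x, y, z) for view `view`; writes detector
   coordinates (u, v) and returns the homogeneous coordinate w. */
static float project_point(int view, float x, float y, float z,
                           float *u, float *v)
{
    const float *P = &proj_matrices[view * 12];
    float pu = P[0] * x + P[1] * y + P[2] * z + P[3];
    float pv = P[4] * x + P[5] * y + P[6] * z + P[7];
    float w  = P[8] * x + P[9] * y + P[10] * z + P[11];
    *u = pu / w;
    *v = pv / w;
    return w;
}
```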

5. The grid and block configuration (in OpenCL terms, the global NDRange size and the work-group size) influences both the global memory access pattern and the texture cache usage.
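When the volume size is not a multiple of the chosen work-group size, the global size is usually rounded up and out-of-range work-items return early (in OpenCL 1.x the global size must be divisible by the local size). The sketch computes such a configuration for the (x1, x3) partitioning; the `launch_config` type and `make_config` helper are hypothetical, and the 16 x 16 work-group in the usage below is an arbitrary example, not a tuned value.

```c
#include <stddef.h>

/* Rounds n up to the next multiple of m (m > 0). */
static size_t round_up(size_t n, size_t m)
{
    return ((n + m - 1) / m) * m;
}

/* Hypothetical launch configuration for a 2-D NDRange over (x1, x3).
   Dimension 0 maps to x1 so that neighbouring work-items access
   neighbouring voxels. */
typedef struct {
    size_t global[2];
    size_t local[2];
} launch_config;

static launch_config make_config(size_t n1, size_t n3,
                                 size_t local1, size_t local3)
{
    launch_config cfg;
    cfg.local[0] = local1;
    cfg.local[1] = local3;
    cfg.global[0] = round_up(n1, local1);
    cfg.global[1] = round_up(n3, local3);
    return cfg;
}
```

For a 500 x 300 (x1, x3) slice with 16 x 16 work-groups this yields a 512 x 304 global size. The two arrays would be passed as `global_work_size` and `local_work_size` to `clEnqueueNDRangeKernel`, with the kernel returning when `get_global_id(0) >= n1` or `get_global_id(1) >= n3`.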

6. The GPU achieved almost 3 times the performance of the CELL processor. Bilinear interpolation weighs heavily on this result, because the CELL processor lacks texture units. With nearest-neighbor interpolation instead of bilinear interpolation, the GPU's performance advantage narrows to a factor of 1.6.
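The gap is easier to see from the work per sample. Nearest-neighbour sampling, sketched below for a row-major image with integer coordinates on pixel centres and clamp-to-edge addressing (assumptions, as is the function name `nearest`), needs one fetch and no weights, whereas bilinear interpolation needs four fetches and three linear interpolations. A texture unit does that extra work in hardware; a processor without texture units pays for it in software.

```c
/* floor() for values in int range, without needing libm. */
static int floor_int(float x)
{
    int i = (int)x;                     /* truncates toward zero */
    return ((float)i > x) ? i - 1 : i;
}

/* Clamp an index into [0, n-1] (clamp-to-edge addressing). */
static int clamp_index(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

/* Nearest-neighbour sample of a w x h row-major image at (u, v):
   round to the closest pixel centre, then exactly one fetch. */
static float nearest(const float *img, int w, int h, float u, float v)
{
    int iu = clamp_index(floor_int(u + 0.5f), w);
    int iv = clamp_index(floor_int(v + 0.5f), h);
    return img[iv * w + iu];
}
```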
