A Technique for Eliminating Redundant GPU Fragment Rendering on Mobile Platforms


A brief report on the paper "Eliminating Redundant Fragment Shader Executions on a Mobile GPU via Hardware Memoization".


GPUs are built for rendering, whether for movies, video games, or plain models. The problem is that some fragments get rendered more than once with the same result, in other words redundant rendering, and this is especially common in video game scenes. This paper aims to reduce or eliminate those redundant fragment shader executions. Less computation means less energy drawn from the battery, so the paper is also about energy saving on the GPU.

 

The main difficulty of this work is that the redundancy exists across frames, which means it is a temporal problem, not a spatial one. According to the paper, the scheme can remove about 60% of the redundant fragment computations on mobile devices. To remove the redundant fragments in the temporal domain, a task-level memoization scheme is added on top of PFR (Parallel Frame Rendering).

 

However, the programmer can't access the graphics memory directly, so the authors use a hardware structure that forms a signature of all the inputs of a fragment. When a computation is executed, its input signature and result are cached in a lookup table. Later executions first probe the table with their own input signature, and on a hit the cached result is reused instead of being recomputed. The concept is quite straightforward.
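
To make the probe-then-compute idea concrete, here is a minimal software sketch of such a lookup table. It is only an illustration, not the paper's hardware design: the names MemoLUT, signature and toy_shader, the CRC32 signature, and the LRU replacement policy are all my own assumptions.

```python
import zlib
from collections import OrderedDict


def signature(inputs):
    # Stand-in for the hardware signature: a CRC32 over the textual form
    # of the inputs (an assumption; two different inputs could collide).
    return zlib.crc32(repr(inputs).encode("utf-8"))


class MemoLUT:
    """Bounded lookup table mapping an input signature to an output color."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def shade(self, inputs, shader):
        key = signature(inputs)
        if key in self.entries:
            # Hit: reuse the cached color and skip the shader entirely.
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        # Miss: run the shader and cache its result.
        self.misses += 1
        color = shader(inputs)
        self.entries[key] = color
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry (hypothetical policy).
            self.entries.popitem(last=False)
        return color


def toy_shader(inputs):
    # Hypothetical fragment shader: its single output color depends only
    # on its inputs, which is what makes memoization safe.
    u, v, tint = inputs
    return (int(u * 255), int(v * 255), tint)


if __name__ == "__main__":
    lut = MemoLUT(capacity=2)
    lut.shade((0.5, 0.25, 7), toy_shader)
    lut.shade((0.5, 0.25, 7), toy_shader)
    print(lut.hits, lut.misses)
```

Because the table is bounded, a repeated fragment only hits if it shows up again before its entry is evicted.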

 

Parallel Frame Rendering (PFR), mentioned above, renders two consecutive frames in parallel. The baseline GPU is split into two clusters: even frames go to cluster 0 and odd frames go to cluster 1. 50% of the redundant fragments have a re-use distance smaller than 64 fragments, and 61.3% have one smaller than 2000.
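
A toy sketch of why rendering two frames side by side shortens the re-use distance. The tile-by-tile interleaving below is my own simplification of two clusters working in parallel, and all function names are hypothetical. Distance here means the number of positions in the fragment stream between two occurrences of the same signature.

```python
def assign_cluster(frame_index):
    # Even frames go to cluster 0, odd frames to cluster 1.
    return frame_index % 2


def reuse_distances(stream):
    """Distance from each repeated signature to its previous occurrence."""
    last_seen = {}
    distances = []
    for position, sig in enumerate(stream):
        if sig in last_seen:
            distances.append(position - last_seen[sig])
        last_seen[sig] = position
    return distances


def sequential(frame_a, frame_b):
    # Conventional rendering: all of frame A, then all of frame B.
    return frame_a + frame_b


def interleaved(frame_a, frame_b, tile):
    # Simplified model of PFR: both frames advance tile by tile, so the
    # same tile of two consecutive frames is processed close together.
    stream = []
    for start in range(0, max(len(frame_a), len(frame_b)), tile):
        stream += frame_a[start:start + tile]
        stream += frame_b[start:start + tile]
    return stream


if __name__ == "__main__":
    frame = list(range(8))  # two identical frames of 8 fragment signatures
    print(reuse_distances(sequential(frame, frame)))
    print(reuse_distances(interleaved(frame, frame, tile=2)))
```

In this model the distance drops from the size of a whole frame to the size of one tile, which is what lets a small lookup table catch the repeats.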

 

There is a trade-off between the cost of the re-use machinery (the memoization structures) and the rendering computation it actually saves, and this is where the task-level granularity comes in. The re-use distance has to stay small, since the lookup table can only hold a limited number of entries. A fragment shader produces just a single output color and nothing else, so referential transparency can be guaranteed by monitoring the API calls.

 

The memoization system detects candidate fragments and looks up the information of prior fragments, so that redundant executions can be replaced with cached results. Fragments whose shaders take too much input, in registers and texture samplers, are not used as candidates, and not all input bits are used to generate signatures. To pick the proper inputs, a hash function generator is implemented.
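
Here is a rough sketch of the two steps just described: filtering candidates and hashing their inputs into a short signature. The thresholds, the CRC32 hash, and the 16-bit signature width are hypothetical values I picked for illustration; the paper's hardware hash generator is not reproduced here.

```python
import struct
import zlib

MAX_INPUT_REGISTERS = 4   # hypothetical threshold
MAX_TEXTURE_SAMPLERS = 2  # hypothetical threshold


def is_candidate(num_input_registers, num_samplers):
    # Shaders with too many inputs are skipped: hashing them would cost
    # more than the memoization could save.
    return (num_input_registers <= MAX_INPUT_REGISTERS
            and num_samplers <= MAX_TEXTURE_SAMPLERS)


def make_signature(shader_id, inputs, bits=16):
    # Pack the shader id and the float inputs into bytes, hash them with
    # CRC32 (a stand-in hash), and keep only the low `bits` bits.
    payload = struct.pack("<I", shader_id)
    payload += struct.pack(f"<{len(inputs)}f", *inputs)
    return zlib.crc32(payload) & ((1 << bits) - 1)


if __name__ == "__main__":
    if is_candidate(num_input_registers=2, num_samplers=1):
        print(hex(make_signature(3, [0.5, 0.25])))
```

A shorter signature makes the table cheaper but raises the chance that two different inputs share a signature, so the width is itself a trade-off.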

 

To evaluate the results, the authors used a mobile GPU simulator that runs unmodified Android applications. The OpenGL commands are redirected to the GPU driver to provide hardware-accelerated graphics, and the resulting GPU instruction and memory traces are used to drive the simulator.

 

2D games with static backgrounds fit the memoization technique perfectly, which is easy to understand. For scrolling 2D games and 3D games the results are still not bad, because 3D games keep some degree of redundancy, especially in the background, when the camera does not move around. When the camera is moving, the optimization works a little worse.

 

I have thought about the problem this paper addresses before. Reducing redundant rendering is a great way to speed up computation, but the cost of finding the relation between fragments must be kept under control, in both time and space. The authors do a lot of work to optimize this process.
