GPGPU Energy Optimization

Source: Internet | Editor: 程序博客网 | Date: 2024/05/04 07:50

This is a brief report on the paper "GPUWattch: Enabling Energy Optimizations in GPGPUs".

This paper proposes a GPU power model that supports energy-efficiency studies. The properties the paper cares about most are robustness, the ability to capture dynamic behavior, and energy savings. The paper first explains how to model hardware like a GPU in simulation, since the real GPU architecture is complex and cannot be analyzed easily. It then identifies which parts of GPU power need to be calculated, and which of them are easy or hard to estimate. Finally, building on these methods, the authors vary the instructions and threads while tracking the power consumption of the architecture, and arrive at a better GPU power model.


Basically, the GPU components considered in the power estimation are the register file, shared memory, execution units, memory coalescing logic (MCL), and main memory. Some of these are familiar structures: the register file, for example, is a memory array whose power can be estimated with CACTI. Since SIMD lanes can access any address in shared memory, shared memory has a structure similar to the register file. We might think of the execution units as ALUs, whose power consumption is fairly stable, but here they perform floating-point pipeline work, so their consumption depends on the latency and throughput of instructions; it can be estimated with Synopsys Power Compiler.
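
To make the per-component idea concrete, here is a minimal sketch of activity-based power estimation, assuming each component's dynamic power is its energy per access times its access count, divided by runtime. The `Component` class, the function names, and every number below are hypothetical: in practice the per-access energies would come from tools like CACTI or Power Compiler and the access counts from a performance simulator.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    energy_per_access_nj: float  # hypothetical per-access energy in nJ (e.g. from CACTI or Power Compiler)
    accesses: int                # hypothetical access count reported by a performance simulator


def power_breakdown(components, runtime_s):
    """Per-component dynamic power in watts: energy per access * accesses / runtime."""
    return {
        c.name: c.energy_per_access_nj * 1e-9 * c.accesses / runtime_s
        for c in components
    }


def dynamic_power_watts(components, runtime_s):
    """Total dynamic power is the sum of the per-component contributions."""
    return sum(power_breakdown(components, runtime_s).values())


if __name__ == "__main__":
    # All numbers below are made up for illustration only.
    kernel = [
        Component("register_file", 0.02, 5_000_000_000),
        Component("shared_memory", 0.05, 400_000_000),
        Component("fp_units", 0.10, 1_000_000_000),
    ]
    for name, watts in power_breakdown(kernel, runtime_s=0.1).items():
        print(f"{name}: {watts:.2f} W")
    print(f"total dynamic: {dynamic_power_watts(kernel, 0.1):.2f} W")
```

Keeping the breakdown per component is what lets us see which unit dominates, which is the point of modeling each structure separately.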


For GPU power modeling, the paper introduces microbenchmarking, which is especially useful for dynamic power consumption such as that of main memory. Main memory is a major contributor to GPU power and can be estimated with DRAM models. In this way, microbenchmarks are used to resolve the uncertainties in the power model. One caveat: the microbenchmarks should have low correlation with each other so that the least-squares estimation (LSE) problem they form can be solved reliably. Different parts of the model also need additional microbenchmarks, so that every component of the processor is covered.
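
The LSE step can be sketched as follows, assuming each row of `A` is one microbenchmark, each column is one component's modeled power on that benchmark, and `b` holds the measured dynamic power. We solve for per-component scaling factors `x` with a plain unconstrained least-squares fit via the normal equations; this is a generic technique chosen for illustration, and the paper's exact formulation may differ (for example, it may bound the factors). The correlation check illustrates why correlated microbenchmarks hurt: if two components' activity columns move together across the benchmark set, the normal equations become (nearly) singular and the factors cannot be separated. All function names here are hypothetical.

```python
def solve_least_squares(A, b):
    """Solve min ||A x - b||^2 via the normal equations (A^T A) x = A^T b."""
    n = len(A[0])
    ata = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
    m = [ata[i] + [atb[i]] for i in range(n)]  # augmented matrix [A^T A | A^T b]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: microbenchmarks do not separate the components")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((c - mv) ** 2 for c in v) ** 0.5
    if su == 0 or sv == 0:
        raise ValueError("constant column: correlation is undefined")
    return cov / (su * sv)


def max_column_correlation(A):
    """Largest |Pearson r| between any two columns (component activities) of A."""
    cols = list(zip(*A))
    return max(
        abs(pearson(cols[i], cols[j]))
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )


if __name__ == "__main__":
    # Hypothetical data: 4 microbenchmarks x 2 components, true factors (2, 3).
    A = [[1, 0], [0, 1], [1, 1], [2, 1]]
    b = [2, 3, 5, 7]
    print("max column correlation:", round(max_column_correlation(A), 3))
    print("scaling factors:", solve_least_squares(A, b))
```

A value of `max_column_correlation` close to 1 is a warning sign that another microbenchmark is needed to exercise one component independently of the other.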


To measure power, the authors separate it into constant power and dynamic power. Constant power is independent of processor and memory frequency, so it is a fixed number, whereas dynamic power depends on processor and memory frequency and varies linearly with them. Dynamic power is therefore validated with microbenchmarks. In microbenchmark-based validation, the average power error is about 15%. In component-level validation, the authors isolate each component with microbenchmarks, so that the component's modeling error is equal to the total power error. On a real GPU, the execution units and DRAM are easy to isolate, while the L1 and L2 caches are hard to isolate.
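
The constant/dynamic split can be illustrated with a least-squares line fit, assuming total power is measured at several frequencies while voltage is held fixed, so that P(f) = P_const + k * f. The intercept (power extrapolated to zero frequency) is then the constant part, and k * f is the dynamic part. The error metric below is one common way to compute an "average power error"; the paper may define it differently. Function names and data are hypothetical.

```python
def fit_linear(freqs, powers):
    """Ordinary least-squares fit of powers = intercept + slope * freqs."""
    n = len(freqs)
    mf = sum(freqs) / n
    mp = sum(powers) / n
    sxx = sum((f - mf) ** 2 for f in freqs)
    sxy = sum((f - mf) * (p - mp) for f, p in zip(freqs, powers))
    slope = sxy / sxx
    intercept = mp - slope * mf
    return intercept, slope


def split_power(freq, intercept, slope):
    """Return (constant_power, dynamic_power) at a given frequency."""
    return intercept, slope * freq


def mean_abs_relative_error(predicted, measured):
    """Average of |predicted - measured| / measured over all samples."""
    return sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)


if __name__ == "__main__":
    # Hypothetical measurements: frequency in MHz, total power in W.
    freqs = [300, 500, 700]
    powers = [80.0, 100.0, 120.0]
    p_const, k = fit_linear(freqs, powers)
    print(f"constant power ~ {p_const:.1f} W, slope ~ {k:.3f} W/MHz")
    print("split at 700 MHz:", split_power(700, p_const, k))
```

Fitting against several frequencies, rather than reading a single idle measurement, averages out measurement noise in the constant term.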


As for energy optimizations, the authors exploit program phase behavior, for example by implementing a DVFS (dynamic voltage and frequency scaling) algorithm that monitors the average stall-cycle rate, as well as global synchronous behavior. With this information, the authors can adjust different parts of the modeled GPU so that it runs more energy-efficiently.

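
A phase-driven DVFS policy can be sketched as a simple threshold controller, assuming the stall-cycle rate is sampled once per interval: a high stall rate suggests a memory-bound phase where core frequency can be lowered with little performance loss, and a low stall rate suggests a compute-bound phase that benefits from a higher frequency. This is a generic illustration, not the paper's algorithm; the frequency levels, thresholds, and function names are all hypothetical.

```python
FREQ_LEVELS_MHZ = [500, 700, 900]  # hypothetical DVFS operating points, lowest to highest


def next_level(level, stall_rate, high=0.6, low=0.2):
    """Step frequency down in stall-heavy phases, up in compute-heavy phases."""
    if stall_rate > high and level > 0:
        return level - 1  # mostly stalled: save energy at a lower frequency
    if stall_rate < low and level < len(FREQ_LEVELS_MHZ) - 1:
        return level + 1  # rarely stalled: raise frequency for performance
    return level          # in between: keep the current operating point


def run_controller(stall_rates, start_level=len(FREQ_LEVELS_MHZ) - 1):
    """Apply the policy to a sequence of per-interval stall rates; return chosen frequencies."""
    level = start_level
    trace = []
    for rate in stall_rates:
        level = next_level(level, rate)
        trace.append(FREQ_LEVELS_MHZ[level])
    return trace


if __name__ == "__main__":
    # Hypothetical stall rates: compute phase, memory-bound phase, then compute again.
    print(run_controller([0.1, 0.7, 0.8, 0.3, 0.1]))
```

Stepping one level per interval, rather than jumping straight to the extreme, keeps the controller from oscillating when a phase boundary produces a single noisy sample.
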
From this paper, I learned the basic idea of how to do hardware-based power estimation and optimization. Most hardware is hard to divide into parts, or the divided parts cannot be tested separately, so we need extra testable pieces, such as microbenchmarks, to model the whole. Only then can we tell which parts to optimize. Also, when we analyze the data, it can usually be divided into a constant part and a dynamic part, much like the constant term and the variable terms of a linear or polynomial function. This paper also gave me a better understanding of the important roles that memory and compute units play, which also means they consume more energy.