GPU performance tuning

DDR output bandwidth:
640M * 8 bytes = 5.12 GB/s (the 8-byte width is limited by the DMC/bus width)
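
The DDR figure above is a plain rate-times-width product. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and "640M" is assumed to mean 640 million transfers per second.

```python
def peak_bandwidth_bytes_per_s(transfers_per_s: float, bytes_per_transfer: int) -> float:
    """Peak bandwidth = transfer rate * bus width in bytes."""
    return transfers_per_s * bytes_per_transfer


# ASSUMPTION: 640M means 640e6 transfers/s; 8 bytes is the DMC/bus width from the notes.
ddr_bw = peak_bandwidth_bytes_per_s(640e6, 8)
print(f"DDR peak bandwidth: {ddr_bw / 1e9:.2f} GB/s")  # DDR peak bandwidth: 5.12 GB/s
```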

Read latency: 107 ns
Write latency: 43 ns

Outstanding transactions:
read: 9, write: 3
Burst length: 7
Transfer size: 3
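
One way to read the latency and outstanding-transaction numbers together is Little's law: sustained bandwidth is bounded by the bytes in flight divided by the latency. The sketch below is only an illustration. It assumes that burst length 7 and transfer size 3 are AXI-style encodings (8 beats of 8 bytes, so 64 bytes per transaction), which the notes do not state, and the function name is hypothetical.

```python
def latency_bound_bandwidth(outstanding: int, bytes_per_txn: int, latency_s: float) -> float:
    """Little's law: bytes/s upper bound = bytes in flight / latency."""
    return outstanding * bytes_per_txn / latency_s


# ASSUMPTION: AXI-style encoding, length 7 -> 8 beats, size 3 -> 2**3 = 8 bytes per beat.
BEATS_PER_BURST = 7 + 1
BYTES_PER_BEAT = 2 ** 3
BYTES_PER_TXN = BEATS_PER_BURST * BYTES_PER_BEAT  # 64 bytes

read_bound = latency_bound_bandwidth(9, BYTES_PER_TXN, 107e-9)
write_bound = latency_bound_bandwidth(3, BYTES_PER_TXN, 43e-9)

print(f"read bound:  {read_bound / 1e9:.2f} GB/s")
print(f"write bound: {write_bound / 1e9:.2f} GB/s")
```

Under these assumptions both bounds land in the same range as the DDR peak, which suggests the outstanding depth is roughly matched to the memory latency.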

L2 cache: 64K
Target performance: GFXBench
Mali-T820 MP1
Manhattan:
FPS: 2.1 fps
BW (rd+wr): 255.2 MB/frame = 536.134 MB/s
T-Rex:
FPS: 6.5 fps
BW (rd+wr): 166.7 MB/frame = 1.08355 GB/s
Egypt HD:
FPS: 18.5 fps
BW (rd+wr): 56.1 MB/frame = 1.03785 GB/s
Egypt classic:
FPS: 18.5 fps
BW (rd+wr): 20.7 MB/frame = 836.28 MB/s
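
Each per-second figure above is the per-frame traffic multiplied by the frame rate. The sketch below recomputes that product and the frame rate implied by each pair of bandwidth figures; the helper names are illustrative. For Egypt classic the two bandwidth figures imply roughly 40.4 fps rather than the listed 18.5 fps, and the notes do not say which value is the intended one.

```python
# (scene, listed fps, MB per frame, listed MB/s), copied from the notes
RESULTS = [
    ("Manhattan", 2.1, 255.2, 536.134),
    ("T-Rex", 6.5, 166.7, 1083.55),
    ("Egypt HD", 18.5, 56.1, 1037.85),
    ("Egypt classic", 18.5, 20.7, 836.28),
]


def bandwidth_mb_per_s(fps: float, mb_per_frame: float) -> float:
    """Per-second traffic = per-frame traffic * frames per second."""
    return fps * mb_per_frame


def implied_fps(mb_per_s: float, mb_per_frame: float) -> float:
    """Frame rate that would make the two bandwidth figures consistent."""
    return mb_per_s / mb_per_frame


for scene, fps, per_frame, per_second in RESULTS:
    recomputed = bandwidth_mb_per_s(fps, per_frame)
    print(f"{scene}: recomputed {recomputed:.2f} MB/s, listed {per_second} MB/s, "
          f"implied fps {implied_fps(per_second, per_frame):.2f}")
```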

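 i) Theoretical bandwidth: rbw = wbw = 1.6795 GB/s
     Bandwidth measured with the bus monitor: rbw = 1.80783 GB/s, wbw = 209.408 MB/s
 ii) The measured read bandwidth is close to the theoretical value. The small excess comes mainly from the job descriptors that the GPU reads in practice, which the theoretical calculation ignores.

The measured write bandwidth differs greatly from the theoretical value, mainly because the GPU's TE (Transaction Elimination) optimizes the writes. In the waveform, the total byte count in busmon_wbw_cnt is about 133.1988 Mbyte, roughly the size of two 16K FBs (4096*4096*4*2 bytes).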
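
The comparison above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: "Mbyte" is taken as 10^6 bytes, the framebuffer is taken as 4096 x 4096 pixels at 4 bytes per pixel as in the formula, and the helper name is invented for illustration.

```python
def framebuffer_bytes(width: int, height: int, bytes_per_pixel: int, count: int = 1) -> int:
    """Total size in bytes of `count` framebuffers of the given geometry."""
    return width * height * bytes_per_pixel * count


fb_total = framebuffer_bytes(4096, 4096, 4, count=2)
print(fb_total)  # 134217728

# ASSUMPTION: the busmon_wbw_cnt total of 133.1988 Mbyte uses decimal megabytes.
measured_write_bytes = 133.1988e6
fb_ratio = measured_write_bytes / fb_total
print(f"written bytes / two framebuffers = {fb_ratio:.3f}")

# Measured versus theoretical bandwidth, both in GB/s as given in the notes.
read_excess = 1.80783 / 1.6795 - 1.0
write_fraction = 0.209408 / 1.6795
print(f"read is {read_excess:.1%} above theory; write is {write_fraction:.1%} of theory")
```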


Calculating the AnTuTu 6.0 3D score:

garden: fps*1000*0.62
maroon: fps*1000*1.35
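
The two formulas above translate directly into a small helper. The sketch below only encodes the per-scene weights given in the notes; the function name and the 10 fps input are hypothetical, and how the scene scores combine into a total is not stated in the notes, so it is not modeled.

```python
# Per-scene weights taken from the notes.
SCENE_WEIGHTS = {"garden": 0.62, "maroon": 1.35}


def scene_score(scene: str, fps: float) -> float:
    """Per-scene score = fps * 1000 * scene weight."""
    return fps * 1000 * SCENE_WEIGHTS[scene]


# Hypothetical frame rate, for illustration only.
example_fps = 10.0
for scene in SCENE_WEIGHTS:
    print(f"{scene}: {scene_score(scene, example_fps):.0f}")
```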

GPU write test conditions:

1) GPU OD: 600 MHz, MP1

2) DDR frequency: 640M

3) CPU: 1.3 GHz

     

GPU RD/WR bandwidth:

t = 8.0685 s, BW = 3.1014 GB/s

GPU fillrate:

t = 56.436028 s, fillrate = 567.013375 Mpixels/s
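
Multiplying each rate by its run time gives the total work each test implies. This is a sketch only: it assumes the rates are averages over the whole run and that M and G are decimal prefixes, neither of which the notes state.

```python
def total_from_rate(rate_per_s: float, seconds: float) -> float:
    """Total amount = average rate * elapsed time."""
    return rate_per_s * seconds


# ASSUMPTION: decimal prefixes (G = 1e9, M = 1e6) and rates averaged over the run.
bytes_moved = total_from_rate(3.1014e9, 8.0685)
pixels_filled = total_from_rate(567.013375e6, 56.436028)

print(f"bandwidth test moved about {bytes_moved / 1e9:.2f} GB")
print(f"fillrate test drew about {pixels_filled / 1e9:.2f} Gpixels")
```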


