D3D9 Resource Management Best Practices

Source: Internet | Posted by: 知乎樱桃 3000 | Editor: 程序博客网 | Time: 2024/06/04 20:01


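D3D9 provides the 8 resource types listed in Table 1. They inherit from IDirect3DResource9 (IDirect3DVolume9 is the exception and derives directly from IUnknown) and expose similar interfaces. Each resource can be placed in a different memory pool and created with different hint flags. Choosing the right pool and flags can greatly affect the efficiency of a program, and the rest of this article discusses that choice in detail.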

· IDirect3DBaseTexture9
· IDirect3DCubeTexture9
· IDirect3DTexture9
· IDirect3DVolumeTexture9
· IDirect3DIndexBuffer9
· IDirect3DSurface9
· IDirect3DVertexBuffer9
· IDirect3DVolume9 (volume resources)

Table 1. D3D9 resource types

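Let us take IDirect3DVertexBuffer9 as the first example. A vertex buffer is created with the CreateVertexBuffer method of IDirect3DDevice9. The full parameter list is shown below; Usage and Pool are the focus of this discussion.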

    HRESULT CreateVertexBuffer(
        UINT Length,
        DWORD Usage,
        DWORD FVF,
        D3DPOOL Pool,
        IDirect3DVertexBuffer9** ppVertexBuffer,
        HANDLE* pSharedHandle
    );
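As a rough illustration of how Usage and Pool tend to be paired, the sketch below picks a combination for a vertex buffer based on how often the CPU rewrites it. The constants are local stand-ins that mirror the values in d3d9.h so that the snippet compiles without the DirectX SDK, and `VertexBufferPlacement`, `choosePlacement` and `isValidCombination` are hypothetical names. The policy is a common rule of thumb, not something the API mandates; the one hard rule modeled here is that D3DUSAGE_DYNAMIC cannot be combined with D3DPOOL_MANAGED.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins mirroring d3d9.h (assumption: real code would include
// <d3d9.h> and use D3DUSAGE_* and D3DPOOL_* directly).
constexpr std::uint32_t kUsageWriteOnly = 0x00000008; // D3DUSAGE_WRITEONLY
constexpr std::uint32_t kUsageDynamic   = 0x00000200; // D3DUSAGE_DYNAMIC
enum class Pool : std::uint32_t { Default = 0, Managed = 1, SystemMem = 2 };

struct VertexBufferPlacement {
    std::uint32_t usage; // value for the Usage parameter
    Pool pool;           // value for the Pool parameter
};

// D3D9 does not allow D3DUSAGE_DYNAMIC together with D3DPOOL_MANAGED.
inline bool isValidCombination(const VertexBufferPlacement& p) {
    const bool dynamic = (p.usage & kUsageDynamic) != 0;
    return !(dynamic && p.pool == Pool::Managed);
}

// Hypothetical policy helper:
//  - static geometry        -> write-only, managed pool
//  - rewritten every frame  -> dynamic + write-only, default pool
inline VertexBufferPlacement choosePlacement(bool rewrittenEveryFrame) {
    VertexBufferPlacement result =
        rewrittenEveryFrame
            ? VertexBufferPlacement{kUsageDynamic | kUsageWriteOnly, Pool::Default}
            : VertexBufferPlacement{kUsageWriteOnly, Pool::Managed};
    assert(isValidCombination(result));
    return result;
}
```

The two fields of the result would be passed as the Usage and Pool arguments of CreateVertexBuffer.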

Video Memory

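As shown in Figure 1, Video RAM is the memory on the graphics board, also called local video memory. It is the fastest memory for the GPU to read and write, but the CPU cannot access it directly, and some resources must live there, such as render targets and depth/stencil buffers. System RAM is ordinary system memory and can be accessed only by the CPU. AGP Aperture Memory (AGP memory for short, also called non-local video memory) is a region set aside from System RAM specifically for graphics rendering; both the CPU and the GPU can access it. Note that in lost-device situations, every resource stored in Video RAM or AGP Aperture Memory must be released and then recreated when the device is reset.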

Managed Resources

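Most resources should be created in POOL_MANAGED, which means the D3D9 runtime manages the resources in this pool for us: each resource keeps a copy in System RAM, and a copy is created in Video RAM only when it is needed. When the device is lost, the Video RAM copies are discarded, but the System RAM copies survive, so once the device is restored the runtime uploads each resource to Video RAM again automatically when it is next used. Because the GPU does not use every resource in every frame, managed resources that are not in use can give up their Video Memory space, which lets the application work with more resources than would fit in Video Memory at one time, much like the relationship between system memory and the disk page file.

Figure 1. The three kinds of memory and their CPU/GPU access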

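The D3D9 runtime records a timestamp for each resource. When a Video Memory allocation fails, the runtime uses an LRU (least recently used) policy to pick a resource to evict. SetPriority changes this order: a higher-priority resource is always chosen after a lower-priority one. You can also call EvictManagedResources to force every managed resource out of Video Memory, which can help clear up memory fragmentation. The frame count is recorded as well, so the runtime can tell whether the resource about to be evicted has already been used in the current frame. If it has, the memory needed by the current frame exceeds the total Video Memory, and the runtime switches its policy to MRU (most recently used); this situation has a major impact on performance. The end of a frame is marked by the call to EndScene, so always make sure EndScene is called correctly.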
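The eviction order described above can be pictured with a small toy model. This is only a sketch of the policy as the text describes it (priority first, then LRU, with a switch to MRU when the LRU candidate was already used in the current frame); it is not the runtime's actual implementation, and `ManagedEntry`, `pickVictim` and `evictOne` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy bookkeeping for one managed resource (all names hypothetical).
struct ManagedEntry {
    std::string name;
    std::uint32_t priority;  // higher = evicted later (cf. SetPriority)
    std::uint64_t lastUsed;  // timestamp of the most recent use
    std::uint64_t lastFrame; // frame number of the most recent use
    bool resident;           // currently has a video-memory copy
};

// Returns the index of the resident entry to evict, or -1 if none is
// resident. Lower priority goes first; ties are broken by the oldest
// timestamp (LRU) or, when useMru is set, by the newest one (MRU).
inline int pickVictim(const std::vector<ManagedEntry>& entries, bool useMru) {
    int victim = -1;
    for (int i = 0; i < static_cast<int>(entries.size()); ++i) {
        const ManagedEntry& e = entries[i];
        if (!e.resident) continue;
        if (victim < 0) { victim = i; continue; }
        const ManagedEntry& v = entries[victim];
        if (e.priority != v.priority) {
            if (e.priority < v.priority) victim = i;
        } else if (useMru ? (e.lastUsed > v.lastUsed)
                          : (e.lastUsed < v.lastUsed)) {
            victim = i;
        }
    }
    return victim;
}

// Evicts one entry and returns its index (-1 if nothing is resident).
// If the LRU candidate was already used in the current frame, the frame's
// working set does not fit, so the model falls back to MRU.
inline int evictOne(std::vector<ManagedEntry>& entries,
                    std::uint64_t currentFrame) {
    int victim = pickVictim(entries, /*useMru=*/false);
    if (victim < 0) return -1;
    if (entries[victim].lastFrame == currentFrame) {
        victim = pickVictim(entries, /*useMru=*/true);
    }
    assert(entries[victim].resident);
    entries[victim].resident = false;
    return victim;
}
```

In this model a high-priority resource stays resident until every lower-priority one has been evicted, which matches the ordering the text attributes to SetPriority.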

Developers looking to find more information about how managed resources are behaving in their application can make use of the RESOURCEMANAGER event query via the IDirect3DQuery9 interface. This only works when using the debug runtimes, so this information cannot be depended upon by the application, but it provides deep detail on the resources managed by the runtime.

Driver-Managed Resources: a driver can implement its own resource management, although this is rare. To bypass the driver's resource management, pass the D3DCREATE_DISABLE_DRIVER_MANAGEMENT flag when creating the device.

Default Resources

Failure to specify USAGE_WRITEONLY or making a render target lockable can also impose serious performance penalties.

Calling Lock on a POOL_DEFAULT resource is more likely to cause the GPU to stall than working with a POOL_MANAGED resource, unless using certain hint flags. Depending on the location of the resource, the pointer returned could be to a temporary system memory buffer, or it can be a pointer directly into AGP memory. If it is a temporary system memory buffer, data will need to be transferred to the video memory after the Unlock call. If the video resource is not write-only, data will have to be transferred into the temporary buffer during the Lock. If it is an AGP memory area, temporary copies are avoided but the cache behavior required can result in slow performance.

Care should be taken to write a full cache line of data into any pointer to AGP aperture memory to avoid the penalty of write-combining, which induces a read-write cycle, and sequential access of the memory area is preferred. If your application needs to make random access to data during creation, and you do not wish to make use of a managed resource for the buffer, you should work with a system memory copy instead. Once the data has been created, you can then stream the result into the locked resource memory to avoid paying a high penalty for the cache write-combining operation.
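The "build in system memory, then stream" advice can be sketched as follows. The destination pointer here is an ordinary byte buffer standing in for the pointer that Lock would return, and `Vertex`, `buildGrid` and `streamToLocked` are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Vertex { float x, y, z; };

// Builds the data in ordinary system memory, where any access order is
// cheap. Rows are filled back to front on purpose, to stand in for a
// generator that does not write its output sequentially.
inline std::vector<Vertex> buildGrid(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<Vertex> staging(static_cast<std::size_t>(width) * height);
    for (int row = height - 1; row >= 0; --row) {
        for (int col = 0; col < width; ++col) {
            staging[static_cast<std::size_t>(row) * width + col] =
                Vertex{static_cast<float>(col), 0.0f, static_cast<float>(row)};
        }
    }
    return staging;
}

// Streams the finished data into `dst` in a single front-to-back pass.
// `dst` is only written, never read. Returns the number of bytes copied,
// or 0 if the destination is missing or too small.
inline std::size_t streamToLocked(void* dst, std::size_t dstBytes,
                                  const std::vector<Vertex>& staging) {
    const std::size_t bytes = staging.size() * sizeof(Vertex);
    if (dst == nullptr || bytes > dstBytes) return 0;
    std::memcpy(dst, staging.data(), bytes);
    return bytes;
}
```

In a real application the call to `streamToLocked` would sit between Lock and Unlock on the vertex buffer, so the locked memory sees one sequential write and no reads.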

The LOCK_NOOVERWRITE flag can be used to append data in an efficient manner for some resources, but ideally, multiple Lock and Unlock calls to the same resource should be avoided. Making proper use of the various lock flags is important to optimal performance, as is using a cache-friendly pattern of data access when filling locked memory.
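One common way to use the append pattern is to track a write cursor in a dynamic buffer and choose the lock flag from it. The sketch below models only that bookkeeping. `LockHint` stands in for the D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD flags (the latter is not named in the text above and is brought in here as an assumption about how the pattern is usually completed), and `DynamicBufferCursor` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD.
enum class LockHint { NoOverwrite, Discard };

struct LockRequest {
    std::size_t offset; // byte offset to lock at
    LockHint hint;      // flag to pass along with the lock
};

// Hypothetical cursor that tracks the free space in one dynamic buffer.
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacity) : capacity_(capacity) {}

    // Reserves `bytes` and reports where and how to lock.
    LockRequest reserve(std::size_t bytes) {
        assert(bytes > 0 && bytes <= capacity_);
        if (next_ + bytes > capacity_) {
            // Out of room: request fresh storage and restart at offset 0.
            next_ = bytes;
            return {0, LockHint::Discard};
        }
        // Room left: append after data that may still be in use.
        const std::size_t offset = next_;
        next_ += bytes;
        return {offset, LockHint::NoOverwrite};
    }

private:
    std::size_t capacity_;
    std::size_t next_ = 0; // first byte not yet handed out
};
```

Each `LockRequest` maps onto the offset, size and flags of one Lock call; the application must still write only within the reserved range for the no-overwrite promise to hold.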

General Recommendations

Getting the technical implementation details of resource management correct will go a long way toward achieving your performance goals for your application. Planning how the resources are presented to Direct3D and the architectural design around getting the data loaded in a timely fashion is a more complicated task. We recommend a number of best practices when making these decisions for your application:

· Pre-process all your resources. Relying on expensive load-time conversion and optimization for your resources is convenient during development, but doing so puts a great performance burden on your users' computers. Pre-processed resources are faster to load, faster to use, and give you the option of doing sophisticated off-line work.

· Avoid creating many resources per frame. The driver interactions required can serialize the CPU and GPU, and the operations involved are heavy-weight, as they often require kernel transitions. Spread out creation over several frames or reuse resources without creating/releasing them. Ideally, you should wait several frames before locking or releasing resources that were recently used to render.

· At the end of the frame, be sure to unbind all resource channels (that is, stream sources, texture stages, and current indices). Doing so will ensure that dangling references to resources are removed before they cause the resource manager to keep resources resident that are actually no longer in use.

· For textures, use compressed formats (for example, DXTn) with mip-maps, and consider making use of a texture atlas. These greatly reduce bandwidth requirements, and they can reduce the overall size of the resources, thus making them more efficient.

· For geometry, make use of indexed geometry as this helps compress vertex buffer resources, and modern video hardware is heavily optimized around reuse of vertices. By making use of programmable vertex shaders, you can compress the vertex information and expand it during the vertex processing. Again, this helps reduce bandwidth requirements and makes vertex buffer resources more efficient.

· Avoid over-optimizing your resource management. Future revisions of drivers, hardware, and the operating system can potentially cause compatibility problems if the application is tuned too heavily to a particular combination. Since most applications are CPU-bound, expensive CPU-based management generally causes more performance issues than it solves.
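The recommendation to wait several frames before releasing recently used resources could be implemented with a small deferred-release queue. The sketch below is one possible shape for it, under the assumption that the application calls `endFrame` once per rendered frame; `DeferredReleaseQueue` and its methods are hypothetical names, and in a D3D9 application each callback would typically call Release on the resource.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical helper: holds release callbacks until a fixed number of
// frames has passed since the resource was retired.
class DeferredReleaseQueue {
    struct Item {
        std::uint64_t dueFrame;        // frame at which release may run
        std::function<void()> release; // what to do when it is due
    };

public:
    explicit DeferredReleaseQueue(std::uint64_t delayFrames)
        : delay_(delayFrames) {}

    // Schedules `release` to run after `delayFrames` more frames have ended.
    void retire(std::function<void()> release) {
        assert(release);
        pending_.push_back(Item{frame_ + delay_, std::move(release)});
    }

    // Call once at the end of every frame. Runs every callback that is
    // due and returns how many were run.
    int endFrame() {
        ++frame_;
        int released = 0;
        while (!pending_.empty() && pending_.front().dueFrame <= frame_) {
            pending_.front().release();
            pending_.pop_front();
            ++released;
        }
        return released;
    }

private:
    std::uint64_t delay_;
    std::uint64_t frame_ = 0;
    std::deque<Item> pending_; // ordered by dueFrame, since delay_ is fixed
};
```

A delay of two or three frames is a typical starting point, but the right value depends on how far the driver lets the GPU run behind the CPU, so treat it as a tunable rather than a rule.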