Supporting DXVA 2.0 in DirectShow

Source: Internet | Editor: 程序博客网 | Date: 2024/06/08 20:22

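For the past few days I have been working on DXVA2 hardware acceleration. Material on it is hard to find, so I translated two of Microsoft's documents on the subject. I also plan to write up how to implement DXVA2 with ffmpeg; that will be the third post. This is the second post. The English original is at https://msdn.microsoft.com/en-us/library/aa965245(v=vs.85).aspx
The first post translates the Direct3D device manager topic: http://blog.csdn.net/qq_33892166/article/details/53325887

This topic describes how to support DirectX Video Acceleration (DXVA) 2.0 in a DirectShow decoder filter. Specifically, it describes the communication between the decoder and the video renderer. It does not describe how to implement DXVA decoding.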
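1. Prerequisites
This topic assumes that you are familiar with writing DirectShow filters. For more information, see the topic Writing DirectShow Filters in the DirectShow SDK documentation (https://msdn.microsoft.com/en-us/library/dd391013(v=vs.85).aspx ). The code examples assume that the decoder filter derives from the CTransformFilter class and is declared as follows: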

class CDecoder : public CTransformFilter
{
public:
    static CUnknown* WINAPI CreateInstance(IUnknown *pUnk, HRESULT *pHr);

    HRESULT CompleteConnect(PIN_DIRECTION direction, IPin *pPin);

    HRESULT InitAllocator(IMemAllocator **ppAlloc);
    HRESULT DecideBufferSize(IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pProp);

    // TODO: The implementations of these methods depend on the specific decoder.
    HRESULT CheckInputType(const CMediaType *mtIn);
    HRESULT CheckTransform(const CMediaType *mtIn, const CMediaType *mtOut);
    // Override of CTransformFilter::GetMediaType. A member declaration must
    // not be qualified with the base class name.
    HRESULT GetMediaType(int iPosition, CMediaType *pMediaType);

private:
    CDecoder(HRESULT *pHr);
    ~CDecoder();

    CBasePin * GetPin(int n);

    HRESULT ConfigureDXVA2(IPin *pPin);
    HRESULT SetEVRForDXVA2(IPin *pPin);

    HRESULT FindDecoderConfiguration(
        /* [in] */  IDirectXVideoDecoderService *pDecoderService,
        /* [in] */  const GUID& guidDecoder,
        /* [out] */ DXVA2_ConfigPictureDecode *pSelectedConfig,
        /* [out] */ BOOL *pbFoundDXVA2Configuration
        );

private:
    IDirectXVideoDecoderService *m_pDecoderService;

    DXVA2_ConfigPictureDecode m_DecoderConfig;
    GUID                      m_DecoderGuid;
    HANDLE                    m_hDevice;

    FOURCC                    m_fccOutputFormat;
};

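In this topic, the term decoder refers to the decoder filter, which receives compressed video and outputs uncompressed video. The term decoder device refers to the hardware video accelerator implemented by the graphics driver.
A decoder filter must perform the following basic steps to support DXVA 2.0:
(1) Negotiate a media type. (Translator's note: as I understand it, this means mapping the format of the source stream, for example the one reported by ffmpeg, to the corresponding DXVA2 media type.)
(2) Find a matching DXVA decoder configuration.
(3) Notify the video renderer that the decoder is using DXVA decoding.
(4) Provide a custom allocator that allocates Direct3D surfaces.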
2. Migration Notes
If you are migrating from DXVA 1.0 to DXVA 2.0, be aware of the following significant differences between the two versions:
(1) DXVA 2.0 does not use the IAMVideoAccelerator and IAMVideoAcceleratorNotify interfaces, because the decoder can access the DXVA 2.0 APIs directly through the IDirectXVideoDecoder interface.
(2) During media type negotiation, the decoder does not use a video acceleration GUID as the subtype. The subtype is simply the uncompressed video format (such as NV12), as with software decoding.
(3) The procedure for configuring the accelerator has changed. In DXVA 1.0, the decoder calls Execute with a DXVA_ConfigPictureDecode structure to configure the accelerator. In DXVA 2.0, the decoder uses the IDirectXVideoDecoderService interface, as described in the next section.
(4) The decoder allocates the uncompressed buffers. The video renderer no longer allocates them.
(5) Instead of calling IAMVideoAccelerator::DisplayFrame to display the decoded frame, the decoder delivers the frame to the renderer by calling IMemInputPin::Receive, as with software decoding.
(6) The decoder is no longer responsible for checking when data buffers are safe for updates. DXVA 2.0 therefore has no method equivalent to IAMVideoAccelerator::QueryRenderStatus.
(7) Subpicture blending is done by the video renderer, using the DXVA 2.0 video processor APIs. Decoders that provide subpictures (for example, DVD decoders) should send subpicture data on a separate output pin.
For decoding operations, DXVA 2.0 uses the same data structures as DXVA 1.0. (Translator's note: that is, the same structs.)
The Enhanced Video Renderer (EVR) filter supports DXVA 2.0. The Video Mixing Renderer filters (VMR-7 and VMR-9) support DXVA 1.0 only.
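3. Finding a Decoder Configuration
After the decoder negotiates the output media type, it must find a compatible configuration for the DXVA decoder device. You can perform this step inside the output pin's CBaseOutputPin::CompleteConnect method. This step ensures that the graphics driver supports the capabilities needed by the decoder, before the decoder commits to using DXVA.
To find a configuration for the decoder device, do the following:
1) Query the renderer's input pin for the IMFGetService interface.
2) Call IMFGetService::GetService to get a pointer to the IDirect3DDeviceManager9 interface. The service GUID is MR_VIDEO_ACCELERATION_SERVICE.
3) Call IDirect3DDeviceManager9::OpenDeviceHandle to get a handle to the renderer's Direct3D device.
4) Call IDirect3DDeviceManager9::GetVideoService and pass in the device handle. This method returns a pointer to the IDirectXVideoDecoderService interface.
5) Call IDirectXVideoDecoderService::GetDecoderDeviceGuids. This method returns an array of decoder device GUIDs.
6) Loop through the array of decoder GUIDs to find the ones that the decoder filter supports. For example, for an MPEG-2 decoder you would look for DXVA2_ModeMPEG2_MOCOMP, DXVA2_ModeMPEG2_IDCT, or DXVA2_ModeMPEG2_VLD.
7) When you find a candidate decoder device GUID, pass it to the IDirectXVideoDecoderService::GetDecoderRenderTargets method. This method returns an array of render target formats, specified as D3DFORMAT values.
8) Loop through the render target formats and look for one that matches your output format. Typically, a decoder device supports a single render target format. The decoder filter should connect to the renderer using this subtype. In the first call to CompleteConnect (the method that the base classes call once a pin connection has been made), the decoder can determine the render target format and then return this format as a preferred output type.
9) Call IDirectXVideoDecoderService::GetDecoderConfigurations. Pass in the same decoder device GUID, along with a DXVA2_VideoDesc structure that describes the proposed format. The method returns an array of DXVA2_ConfigPictureDecode structures. Each structure describes one possible configuration for the decoder device.
10) Assuming that the previous steps succeed, store the Direct3D device handle, the decoder device GUID, and the configuration structure. The filter will use this information to create the decoder device.
The following code shows how to find a decoder device: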

HRESULT CDecoder::ConfigureDXVA2(IPin *pPin)
{
    UINT    cDecoderGuids = 0;
    BOOL    bFoundDXVA2Configuration = FALSE;
    GUID    guidDecoder = GUID_NULL;

    DXVA2_ConfigPictureDecode config;
    ZeroMemory(&config, sizeof(config));

    // Variables that follow must be cleaned up at the end.
    IMFGetService               *pGetService = NULL;
    IDirect3DDeviceManager9     *pDeviceManager = NULL;
    IDirectXVideoDecoderService *pDecoderService = NULL;

    GUID   *pDecoderGuids = NULL; // size = cDecoderGuids
    HANDLE hDevice = INVALID_HANDLE_VALUE;

    // Query the pin for IMFGetService.
    HRESULT hr = pPin->QueryInterface(IID_PPV_ARGS(&pGetService));

    // Get the Direct3D device manager.
    if (SUCCEEDED(hr))
    {
        hr = pGetService->GetService(
            MR_VIDEO_ACCELERATION_SERVICE,
            IID_PPV_ARGS(&pDeviceManager)
            );
    }

    // Open a new device handle.
    if (SUCCEEDED(hr))
    {
        hr = pDeviceManager->OpenDeviceHandle(&hDevice);
    }

    // Get the video decoder service.
    if (SUCCEEDED(hr))
    {
        hr = pDeviceManager->GetVideoService(
            hDevice, IID_PPV_ARGS(&pDecoderService));
    }

    // Get the decoder GUIDs.
    if (SUCCEEDED(hr))
    {
        hr = pDecoderService->GetDecoderDeviceGuids(
            &cDecoderGuids, &pDecoderGuids);
    }

    if (SUCCEEDED(hr))
    {
        // Look for the decoder GUIDs we want.
        for (UINT iGuid = 0; iGuid < cDecoderGuids; iGuid++)
        {
            // Do we support this mode?
            if (!IsSupportedDecoderMode(pDecoderGuids[iGuid]))
            {
                continue;
            }

            // Find a configuration that we support.
            hr = FindDecoderConfiguration(pDecoderService, pDecoderGuids[iGuid],
                &config, &bFoundDXVA2Configuration);
            if (FAILED(hr))
            {
                break;
            }

            if (bFoundDXVA2Configuration)
            {
                // Found a good configuration. Save the GUID and exit the loop.
                guidDecoder = pDecoderGuids[iGuid];
                break;
            }
        }
    }

    // Keep an earlier failure code; report E_FAIL only when every call
    // succeeded but no usable configuration was found.
    if (SUCCEEDED(hr) && !bFoundDXVA2Configuration)
    {
        hr = E_FAIL; // Unable to find a configuration.
    }

    if (SUCCEEDED(hr))
    {
        // Store the things we will need later.
        SafeRelease(&m_pDecoderService);
        m_pDecoderService = pDecoderService;
        m_pDecoderService->AddRef();

        m_DecoderConfig = config;
        m_DecoderGuid = guidDecoder;
        m_hDevice = hDevice;
    }

    if (FAILED(hr))
    {
        if (hDevice != INVALID_HANDLE_VALUE)
        {
            pDeviceManager->CloseDeviceHandle(hDevice);
        }
    }

    // The GUID array is allocated by the service; the caller must free it.
    CoTaskMemFree(pDecoderGuids);

    SafeRelease(&pGetService);
    SafeRelease(&pDeviceManager);
    SafeRelease(&pDecoderService);
    return hr;
}

HRESULT CDecoder::FindDecoderConfiguration(
    /* [in] */  IDirectXVideoDecoderService *pDecoderService,
    /* [in] */  const GUID& guidDecoder,
    /* [out] */ DXVA2_ConfigPictureDecode *pSelectedConfig,
    /* [out] */ BOOL *pbFoundDXVA2Configuration
    )
{
    HRESULT hr = S_OK;
    UINT cFormats = 0;
    UINT cConfigurations = 0;

    D3DFORMAT                   *pFormats = NULL;     // size = cFormats
    DXVA2_ConfigPictureDecode   *pConfig = NULL;      // size = cConfigurations

    // Find the valid render target formats for this decoder GUID.
    hr = pDecoderService->GetDecoderRenderTargets(
        guidDecoder,
        &cFormats,
        &pFormats
        );

    if (SUCCEEDED(hr))
    {
        // Look for a format that matches our output format.
        for (UINT iFormat = 0; iFormat < cFormats;  iFormat++)
        {
            if (pFormats[iFormat] != (D3DFORMAT)m_fccOutputFormat)
            {
                continue;
            }

            // Fill in the video description. Set the width, height, format,
            // and frame rate.
            DXVA2_VideoDesc videoDesc = {0};

            FillInVideoDescription(&videoDesc); // Private helper function.
            videoDesc.Format = pFormats[iFormat];

            // Get the available configurations.
            hr = pDecoderService->GetDecoderConfigurations(
                guidDecoder,
                &videoDesc,
                NULL, // Reserved.
                &cConfigurations,
                &pConfig
                );

            if (FAILED(hr))
            {
                break;
            }

            // Find a supported configuration.
            for (UINT iConfig = 0; iConfig < cConfigurations; iConfig++)
            {
                if (IsSupportedDecoderConfig(pConfig[iConfig]))
                {
                    // This configuration is good.
                    *pbFoundDXVA2Configuration = TRUE;
                    *pSelectedConfig = pConfig[iConfig];
                    break;
                }
            }

            CoTaskMemFree(pConfig);
            break;

        } // End of formats loop.
    }

    CoTaskMemFree(pFormats);

    // Note: It is possible to return S_OK without finding a configuration.
    return hr;
}

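Because this example is generic, some of the logic has been placed in helper functions that the decoder would need to implement. The following declarations show the helper functions that are used: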

// Returns TRUE if the decoder supports a given decoding mode.
BOOL IsSupportedDecoderMode(const GUID& mode);

// Returns TRUE if the decoder supports a given decoding configuration.
BOOL IsSupportedDecoderConfig(const DXVA2_ConfigPictureDecode& config);

// Fills in a DXVA2_VideoDesc structure based on the input format.
void FillInVideoDescription(DXVA2_VideoDesc *pDesc);
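
To make the control flow of the search easier to follow, here is a minimal, platform-independent sketch of the same logic. It is not DXVA code: DeviceStub, SelectDecoder, and MakeFourCC are hypothetical stand-ins, with integers in place of decoder device GUIDs and a single number in place of each DXVA2_ConfigPictureDecode structure. It assumes, as the sample does, that only the first render target matching the output format is examined for each device.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Packs four characters the way the Windows MAKEFOURCC macro does
// (first character in the lowest byte).
constexpr std::uint32_t MakeFourCC(char a, char b, char c, char d) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(a)) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Hypothetical stand-in for what the driver reports about one decoder device.
struct DeviceStub {
    int modeId;                                // stands in for the device GUID
    std::vector<std::uint32_t> renderTargets;  // stands in for the D3DFORMAT array
    std::vector<unsigned> configs;             // stands in for the configuration array
};

// Walks the devices in driver order, skips modes the decoder does not
// support, matches the output format, then accepts the first configuration
// the decoder can handle. Returns false if nothing usable is found.
bool SelectDecoder(const std::vector<DeviceStub>& devices,
                   const std::vector<int>& supportedModes,
                   std::uint32_t outputFormat,
                   unsigned wantedConfig,
                   int* selectedMode) {
    for (const DeviceStub& dev : devices) {
        bool modeOk = false;
        for (int m : supportedModes) {
            if (m == dev.modeId) modeOk = true;
        }
        if (!modeOk) continue;

        for (std::uint32_t fmt : dev.renderTargets) {
            if (fmt != outputFormat) continue;
            for (unsigned cfg : dev.configs) {
                if (cfg == wantedConfig) {
                    *selectedMode = dev.modeId;
                    return true;
                }
            }
            break;  // like the sample: stop after the first matching format
        }
    }
    return false;
}
```

Calling SelectDecoder with a device list in which an earlier device matches the format but offers no acceptable configuration shows why the outer loop has to continue to the next GUID instead of failing immediately.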

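4. Notifying the Video Renderer
If the decoder finds a decoder configuration, the next step is to notify the video renderer that the decoder will use hardware acceleration. You can perform this step inside the CompleteConnect method. It must happen before the allocator is selected, because it affects how the allocator is selected.
1) Query the renderer's input pin for the IMFGetService interface.
2) Call IMFGetService::GetService to get a pointer to the IDirectXVideoMemoryConfiguration interface. The service GUID is MR_VIDEO_ACCELERATION_SERVICE.
3) Call IDirectXVideoMemoryConfiguration::GetAvailableSurfaceTypeByIndex in a loop, incrementing the dwTypeIndex variable from zero. Stop when the method returns the value DXVA2_SurfaceType_DecoderRenderTarget in the pdwType parameter. This step ensures that the video renderer supports hardware-accelerated decoding. For the EVR filter, this step always succeeds.
4) If the previous step succeeded, call IDirectXVideoMemoryConfiguration::SetSurfaceType with the value DXVA2_SurfaceType_DecoderRenderTarget. Calling SetSurfaceType with this value puts the video renderer into DXVA mode. When the video renderer is in this mode, the decoder must provide its own allocator.
The following code shows how to notify the video renderer: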

HRESULT CDecoder::SetEVRForDXVA2(IPin *pPin)
{
    HRESULT hr = S_OK;

    IMFGetService                       *pGetService = NULL;
    IDirectXVideoMemoryConfiguration    *pVideoConfig = NULL;

    // Query the pin for IMFGetService.
    hr = pPin->QueryInterface(__uuidof(IMFGetService), (void**)&pGetService);

    // Get the IDirectXVideoMemoryConfiguration interface.
    if (SUCCEEDED(hr))
    {
        hr = pGetService->GetService(
            MR_VIDEO_ACCELERATION_SERVICE, IID_PPV_ARGS(&pVideoConfig));
    }

    // Notify the EVR.
    if (SUCCEEDED(hr))
    {
        DXVA2_SurfaceType surfaceType;

        for (DWORD iTypeIndex = 0; ; iTypeIndex++)
        {
            hr = pVideoConfig->GetAvailableSurfaceTypeByIndex(iTypeIndex, &surfaceType);

            if (FAILED(hr))
            {
                break;
            }

            if (surfaceType == DXVA2_SurfaceType_DecoderRenderTarget)
            {
                hr = pVideoConfig->SetSurfaceType(DXVA2_SurfaceType_DecoderRenderTarget);
                break;
            }
        }
    }

    SafeRelease(&pGetService);
    SafeRelease(&pVideoConfig);

    return hr;
}

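If the decoder finds a valid configuration and successfully notifies the video renderer, the decoder can use DXVA for decoding. The decoder must implement a custom allocator for its output pin, as described in the next section.
5. Allocating Uncompressed Buffers
In DXVA 2.0, the decoder is responsible for allocating the Direct3D surfaces that serve as uncompressed video buffers. The decoder must therefore implement a custom allocator that creates the surfaces. The media samples provided by this allocator hold pointers to the Direct3D surfaces. The EVR retrieves a pointer to the surface by calling IMFGetService::GetService on the media sample. The service identifier is MR_BUFFER_SERVICE.
To provide the custom allocator, perform the following steps:
1) Define a class for the media samples. This class can derive from CMediaSample. Inside this class, do the following:
a) Store a pointer to the Direct3D surface.
b) Implement the IMFGetService interface. In the GetService method, if the service GUID is MR_BUFFER_SERVICE, query the Direct3D surface for the requested interface. Otherwise, GetService can return MF_E_UNSUPPORTED_SERVICE.
c) Override the CMediaSample::GetPointer method to return E_NOTIMPL.
2) Define a class for the allocator. The allocator can derive from the CBaseAllocator class. Inside this class, do the following:
a) Override the CBaseAllocator::Alloc method. Inside this method, call IDirectXVideoAccelerationService::CreateSurface to create the surfaces. (The IDirectXVideoDecoderService interface inherits this method from IDirectXVideoAccelerationService.)
b) Override the CBaseAllocator::Free method to release the surfaces.
3) In your filter's output pin, override the CBaseOutputPin::InitAllocator method. Inside this method, create an instance of your custom allocator.
4) In your filter, implement the CTransformFilter::DecideBufferSize method. The pProperties parameter indicates the number of surfaces that the EVR requires. Add to this value the number of surfaces that your decoder requires, and call IMemAllocator::SetProperties on the allocator.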
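
Step 4 comes down to simple arithmetic on the buffer count. The sketch below models it without the DirectShow headers: AllocProps is a hypothetical stand-in for ALLOCATOR_PROPERTIES (only its cBuffers field is mirrored), AddDecoderSurfaces is an illustrative name, and the decoder's surface count is an assumed input that in practice depends on the codec, for example the number of reference frames plus the frame being decoded.

```cpp
#include <cassert>

// Hypothetical stand-in for ALLOCATOR_PROPERTIES; only cBuffers is modeled.
struct AllocProps {
    long cBuffers;
};

// Returns the request to pass on to the allocator: the surfaces the renderer
// asked for plus the surfaces the decoder needs for itself.
// Negative inputs are treated as zero.
AllocProps AddDecoderSurfaces(AllocProps rendererRequest, long decoderSurfaces) {
    if (rendererRequest.cBuffers < 0) rendererRequest.cBuffers = 0;
    if (decoderSurfaces < 0) decoderSurfaces = 0;
    rendererRequest.cBuffers += decoderSurfaces;
    return rendererRequest;
}
```

In a real filter the combined request would be handed to IMemAllocator::SetProperties, and the actual properties it reports back would be checked against what was asked for.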
The following code shows how to implement the media sample class:

class CDecoderSample : public CMediaSample, public IMFGetService
{
    friend class CDecoderAllocator;

public:

    CDecoderSample(CDecoderAllocator *pAlloc, HRESULT *phr)
        : CMediaSample(NAME("DecoderSample"), (CBaseAllocator*)pAlloc, phr, NULL, 0),
          m_pSurface(NULL),
          m_dwSurfaceId(0)
    {
    }

    // Note: CMediaSample does not derive from CUnknown, so we cannot use the
    //       DECLARE_IUNKNOWN macro that is used by most of the filter classes.

    STDMETHODIMP QueryInterface(REFIID riid, void **ppv)
    {
        CheckPointer(ppv, E_POINTER);

        if (riid == IID_IMFGetService)
        {
            *ppv = static_cast<IMFGetService*>(this);
            AddRef();
            return S_OK;
        }
        else
        {
            return CMediaSample::QueryInterface(riid, ppv);
        }
    }

    STDMETHODIMP_(ULONG) AddRef()
    {
        return CMediaSample::AddRef();
    }

    STDMETHODIMP_(ULONG) Release()
    {
        // Return a temporary variable for thread safety.
        ULONG cRef = CMediaSample::Release();
        return cRef;
    }

    // IMFGetService::GetService
    STDMETHODIMP GetService(REFGUID guidService, REFIID riid, LPVOID *ppv)
    {
        if (guidService != MR_BUFFER_SERVICE)
        {
            return MF_E_UNSUPPORTED_SERVICE;
        }
        else if (m_pSurface == NULL)
        {
            return E_NOINTERFACE;
        }
        else
        {
            return m_pSurface->QueryInterface(riid, ppv);
        }
    }

    // Override GetPointer because this class does not manage a system memory buffer.
    // The EVR uses the MR_BUFFER_SERVICE service to get the Direct3D surface.
    STDMETHODIMP GetPointer(BYTE ** ppBuffer)
    {
        return E_NOTIMPL;
    }

private:

    // Sets the pointer to the Direct3D surface.
    void SetSurface(DWORD surfaceId, IDirect3DSurface9 *pSurf)
    {
        SafeRelease(&m_pSurface);

        m_pSurface = pSurf;
        if (m_pSurface)
        {
            m_pSurface->AddRef();
        }

        m_dwSurfaceId = surfaceId;
    }

    IDirect3DSurface9   *m_pSurface;
    DWORD               m_dwSurfaceId;
};

The following code shows how to implement the Alloc method on the allocator:

HRESULT CDecoderAllocator::Alloc()
{
    CAutoLock lock(this);

    HRESULT hr = S_OK;

    if (m_pDXVA2Service == NULL)
    {
        return E_UNEXPECTED;
    }

    hr = CBaseAllocator::Alloc();

    // If the requirements have not changed, do not reallocate.
    if (hr == S_FALSE)
    {
        return S_OK;
    }

    if (SUCCEEDED(hr))
    {
        // Free the old resources.
        Free();

        // Allocate a new array of pointers.
        m_ppRTSurfaceArray = new (std::nothrow) IDirect3DSurface9*[m_lCount];
        if (m_ppRTSurfaceArray == NULL)
        {
            hr = E_OUTOFMEMORY;
        }
        else
        {
            ZeroMemory(m_ppRTSurfaceArray, sizeof(IDirect3DSurface9*) * m_lCount);
        }
    }

    // Allocate the surfaces.
    if (SUCCEEDED(hr))
    {
        hr = m_pDXVA2Service->CreateSurface(
            m_dwWidth,
            m_dwHeight,
            m_lCount - 1,
            (D3DFORMAT)m_dwFormat,
            D3DPOOL_DEFAULT,
            0,
            DXVA2_VideoDecoderRenderTarget,
            m_ppRTSurfaceArray,
            NULL
            );
    }

    if (SUCCEEDED(hr))
    {
        for (m_lAllocated = 0; m_lAllocated < m_lCount; m_lAllocated++)
        {
            CDecoderSample *pSample = new (std::nothrow) CDecoderSample(this, &hr);

            if (pSample == NULL)
            {
                hr = E_OUTOFMEMORY;
                break;
            }
            if (FAILED(hr))
            {
                break;
            }

            // Assign the Direct3D surface pointer and the index.
            pSample->SetSurface(m_lAllocated, m_ppRTSurfaceArray[m_lAllocated]);

            // Add to the sample list.
            m_lFree.Add(pSample);
        }
    }

    if (SUCCEEDED(hr))
    {
        m_bChanged = FALSE;
    }
    return hr;
}

The following code shows the Free method:

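void CDecoderAllocator::Free()
{
    CMediaSample *pSample = NULL;

    // Delete every sample on the free list.
    do
    {
        pSample = m_lFree.RemoveHead();
        if (pSample)
        {
            delete pSample;
        }
    } while (pSample);

    if (m_ppRTSurfaceArray)
    {
        for (long i = 0; i < m_lAllocated; i++)
        {
            SafeRelease(&m_ppRTSurfaceArray[i]);
        }

        delete [] m_ppRTSurfaceArray;
        // Clear the pointer so that a second call to Free does not delete
        // the array again.
        m_ppRTSurfaceArray = NULL;
    }
    m_lAllocated = 0;
}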

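6. Decoding
To create the decoder device, call the IDirectXVideoDecoderService::CreateVideoDecoder method. The method returns a pointer to the IDirectXVideoDecoder interface of the decoder device.
On each frame, call IDirect3DDeviceManager9::TestDevice to test the device handle. If the device has changed, the method returns DXVA2_E_NEW_VIDEO_DEVICE. If this occurs, do the following:
1) Close the device handle by calling IDirect3DDeviceManager9::CloseDeviceHandle.
2) Release the IDirectXVideoDecoderService and IDirectXVideoDecoder pointers.
3) Open a new device handle.
4) Negotiate a new decoder configuration, as described in section 3, Finding a Decoder Configuration.
5) Create a new decoder device.
Assuming that the device handle is valid, the decoding process works as follows:
1) Call IDirectXVideoDecoder::BeginFrame.
2) Do the following one or more times:
a) Call IDirectXVideoDecoder::GetBuffer to get a DXVA decoder buffer.
b) Fill the buffer.
c) Call IDirectXVideoDecoder::ReleaseBuffer.
3) Call IDirectXVideoDecoder::Execute to perform the decoding operations on the frame.
DXVA 2.0 uses the same data structures as DXVA 1.0 for decoding operations.
Within each pair of BeginFrame/Execute calls, you may call GetBuffer multiple times, but only once for each type of DXVA buffer. If you call it twice with the same buffer type, you will overwrite the data.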
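
The once-per-type rule is easy to violate by accident, so a decoder may want to track it on its own side. The following is a minimal sketch of such a bookkeeping helper, not part of the DXVA API: FrameTracker and BufferKind are hypothetical names, and the three buffer kinds are placeholders for the real DXVA2 buffer type constants. The real GetBuffer does not necessarily reject a repeated request; according to the text above, the data is simply overwritten, which is why a check like this can be useful during development.

```cpp
#include <cassert>
#include <set>

// Hypothetical buffer kinds; the real API identifies buffers with
// DXVA2 buffer type constants.
enum class BufferKind { PictureParameters, SliceControl, Bitstream };

// Tracks one BeginFrame/Execute pair and flags a second request for a
// buffer kind that was already obtained in the same frame.
class FrameTracker {
public:
    void BeginFrame() {
        inFrame_ = true;
        used_.clear();
    }

    // Returns true if this kind may be requested now, false if no frame
    // is open or the kind was already requested in this frame.
    bool RequestBuffer(BufferKind kind) {
        if (!inFrame_) return false;
        return used_.insert(kind).second;
    }

    // Returns true if a frame was open and is now considered submitted.
    bool Execute() {
        if (!inFrame_) return false;
        inFrame_ = false;
        return true;
    }

private:
    bool inFrame_ = false;
    std::set<BufferKind> used_;
};
```

A decoder could call RequestBuffer just before each GetBuffer call and assert on the result in debug builds.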
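After calling Execute, call IMemInputPin::Receive to deliver the frame to the video renderer, as with software decoding. The Receive method is asynchronous; after it returns, the decoder can continue decoding the next frame. The display driver prevents any decoding commands from overwriting the buffer while the buffer is in use. The decoder should not reuse a surface to decode another frame until the renderer has released the sample. When the renderer releases the sample, the allocator puts the sample back into its pool of available samples. To get the next available sample, call CBaseOutputPin::GetDeliveryBuffer, which in turn calls IMemAllocator::GetBuffer.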
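
The surface lifetime rule can be pictured as a small pool. The sketch below is a model only: SurfacePool is a hypothetical class, each integer stands for one Direct3D surface wrapped in a media sample, Acquire plays the role of GetDeliveryBuffer, and Release plays the role of the renderer releasing the sample. One simplification is assumed: when the pool is empty the model returns -1, whereas the real allocator call by default waits until a sample becomes free.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the allocator's pool of samples.
class SurfacePool {
public:
    explicit SurfacePool(std::size_t count) {
        // Push indices in reverse so that surface 0 is handed out first.
        for (std::size_t i = count; i > 0; --i) {
            free_.push_back(static_cast<int>(i - 1));
        }
    }

    // Returns a free surface index, or -1 if every surface is still held
    // by the renderer.
    int Acquire() {
        if (free_.empty()) return -1;
        int id = free_.back();
        free_.pop_back();
        return id;
    }

    // Returns a surface to the pool once the renderer is done with it.
    void Release(int id) { free_.push_back(id); }

    std::size_t Available() const { return free_.size(); }

private:
    std::vector<int> free_;
};
```

With a pool of two surfaces, a third Acquire fails until one of the first two is released, which mirrors why the surface count negotiated in DecideBufferSize has to cover both the renderer's and the decoder's needs.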

0 0