Image Processing with Direct Compute Shaders
Source: Internet | Editor: 程序博客网 (Programmer Blog Network) | Time: 2024/04/29 04:10
Original article: http://www.codinglabs.net/tutorial_compute_shaders_filters.aspx
Code download: http://download.csdn.net/detail/liuwendong0902/8859431
With DirectX 11, Microsoft introduced Compute Shaders (also known as DirectCompute). They build on programmable shaders and use the GPU to perform high-speed general-purpose computing. The idea is to use a shader, written in HLSL, to do something that is not strictly graphical. Unlike the usual shaders we write, compute shaders provide some form of memory sharing and thread synchronization, which extends what we can do with this tool. A compute shader's execution is not attached to any stage of the graphics pipeline, even though it has access to the graphics resources. When we dispatch a compute shader call, we spawn a number of GPU threads that run the shader code we wrote.
The main thing to understand about compute shaders is that, unlike pixel and vertex shaders, they are not bound to their input/output data. There is no implicit mapping between the thread executing our code and the data it processes. Every thread can potentially read from any memory location and write anywhere as well. This is the key property of compute shaders: they let us use the GPU as a massive vectorized processor for generic calculations.
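The following minimal CPU-side C++ sketch makes the "no implicit mapping" point concrete. It assumes a serial loop is an acceptable stand-in for the GPU, and it is not Direct3D code. The names `emulateDispatch` and `mirror` are hypothetical. Each emulated thread reads element `i` and writes it to position `n - 1 - i`, so the code decides where the output goes; the thread does not imply it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Serial CPU stand-in for Dispatch(groupCount, 1, 1) on a shader declared with
// [numthreads(groupSize, 1, 1)]: the kernel runs once per thread and receives
// that thread's SV_DispatchThreadID.x. A GPU would run these threads in parallel.
void emulateDispatch(std::size_t groupCount, std::size_t groupSize,
                     const std::function<void(std::size_t)>& kernel)
{
    for (std::size_t group = 0; group < groupCount; ++group)
        for (std::size_t thread = 0; thread < groupSize; ++thread)
            kernel(group * groupSize + thread);
}

// Each thread reads element i and writes it to position n - 1 - i:
// the destination is chosen by the code, not implied by the thread.
std::vector<int> mirror(const std::vector<int>& input, std::size_t groupSize)
{
    assert(groupSize > 0);
    const std::size_t n = input.size();
    std::vector<int> output(n, 0);
    const std::size_t groups = (n + groupSize - 1) / groupSize; // round up
    emulateDispatch(groups, groupSize, [&](std::size_t id) {
        if (id < n) // threads past the end of the data do nothing
            output[n - 1 - id] = input[id];
    });
    return output;
}
```

For example, `mirror({1, 2, 3, 4, 5}, 2)` runs three groups of two threads. The sixth thread finds no data and does nothing, and the result is `{5, 4, 3, 2, 1}`.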
I've tried to hide most of the Windows boilerplate for window initialization, so feel free to skip main.cpp. It just creates an instance of DXApplication (simply named application) and calls the following methods:
```cpp
if( FAILED( InitWindow( hInstance, nCmdShow ) ) )
    return 0;

if( !application.initialize( g_hWnd, width, height ) )
    return 0;

// Main message loop
MSG msg = {0};
while( WM_QUIT != msg.message )
{
    if( PeekMessage( &msg, NULL, 0, 0, PM_REMOVE ) )
    {
        TranslateMessage( &msg );
        DispatchMessage( &msg );
    }
    else
    {
        application.render();
    }
}
return ( int )msg.wParam;
```
I won't present either initialize or render, because they are trivial. The former creates the DX11 device, loads the textures and so on. The latter renders a big quad with the two textures applied. The only interesting bit is in initialize. There we invoke two methods, createInputBuffer and createOutputBuffer, to create the buffers the compute shader needs. Immediately afterwards we load and run the compute shader (runComputeShader). We will look at these three functions in detail, since they are the core of this tutorial.
Pressing F1 and F2 switches between our two compute shaders. In code, the F1 key-up event calls runComputeShader( L"data/Desaturate.hlsl"), and the F2 key-up event calls runComputeShader( L"data/Circles.hlsl"). The runComputeShader function loads the compute shader from the HLSL file and dispatches the thread groups.
It's important to think about compute shaders in terms of threads, not "pixels" or "vertices": threads that process data. We could work on 4 pixels per thread, or we could perform physics calculations to move rigid bodies around instead. Compute shaders let us use the GPU as a massively powerful, vectorized parallel processor!
Now we know that we can spawn threads on the GPU and that we have to organize these threads in groups. How does this translate into code? We specify the number of threads per group directly inside the shader code, using the following syntax:
```hlsl
[numthreads(X, Y, Z)]
void ComputeShaderEntryPoint( /* compute shader parameters */ )
{
    // ... Compute shader code
}
```
Where X, Y and Z represent the group's size per axis. This means that if we specify X = 8, Y = 8 and Z = 1 we get 8*8*1 = 64 threads per group.
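The group size also fixes how the per-thread system values relate to each other. SV_DispatchThreadID equals SV_GroupID × numthreads + SV_GroupThreadID, and SV_GroupIndex is SV_GroupThreadID flattened within the group. The sketch below reproduces that arithmetic in C++ so it can be checked on the CPU. `UInt3`, `threadsPerGroup`, `dispatchThreadID` and `groupIndex` are hypothetical names. The asserted limits are the cs_5_0 ones: X and Y up to 1024, Z up to 64, and at most 1024 threads per group.

```cpp
#include <cassert>
#include <cstdint>

struct UInt3 { std::uint32_t x, y, z; };

// Threads per group declared by [numthreads(X, Y, Z)].
std::uint32_t threadsPerGroup(UInt3 numthreads)
{
    // cs_5_0 limits: X, Y <= 1024, Z <= 64 and X*Y*Z <= 1024.
    assert(numthreads.x <= 1024 && numthreads.y <= 1024 && numthreads.z <= 64);
    const std::uint32_t total = numthreads.x * numthreads.y * numthreads.z;
    assert(total <= 1024);
    return total;
}

// SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID
UInt3 dispatchThreadID(UInt3 groupID, UInt3 numthreads, UInt3 groupThreadID)
{
    return { groupID.x * numthreads.x + groupThreadID.x,
             groupID.y * numthreads.y + groupThreadID.y,
             groupID.z * numthreads.z + groupThreadID.z };
}

// SV_GroupIndex: the thread's position inside its group, flattened.
std::uint32_t groupIndex(UInt3 groupThreadID, UInt3 numthreads)
{
    return groupThreadID.z * numthreads.x * numthreads.y
         + groupThreadID.y * numthreads.x
         + groupThreadID.x;
}
```

With `[numthreads(32, 16, 1)]`, `threadsPerGroup({32, 16, 1})` gives 512. Thread (5, 3, 0) of group (2, 1, 0) gets the dispatch thread ID (69, 19, 0).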
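```hlsl
[numthreads(32, 16, 1)]
void CSMain( uint3 dispatchThreadID : SV_DispatchThreadID )
{
    // Each thread processes the pixel at its dispatch thread ID
    // (readPixel and writeToPixel are helpers defined in the full shader).
    float3 pixel = readPixel(dispatchThreadID.x, dispatchThreadID.y);
    // Weighted sum of the channels, written to all three of them
    pixel.rgb = pixel.r * 0.3 + pixel.g * 0.59 + pixel.b * 0.11;
    writeToPixel(dispatchThreadID.x, dispatchThreadID.y, pixel);
}
```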
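The expression `pixel.r * 0.3 + pixel.g * 0.59 + pixel.b * 0.11` is a scalar, so assigning it to `pixel.rgb` copies the same value into all three channels. The weights sum to 1. A gray input therefore keeps its value, and the result stays in [0, 1] for inputs in [0, 1]. Below is a minimal C++ reference of the same arithmetic, useful for checking the shader's output on the CPU. `Rgb`, `luminance` and `desaturateImage` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rgb { float r, g, b; };

// Same weights as the Desaturate shader; they add up to 1,
// so a gray input (r == g == b) keeps its value.
float luminance(Rgb p)
{
    return p.r * 0.3f + p.g * 0.59f + p.b * 0.11f;
}

// CPU reference over a whole image: one loop iteration per shader thread.
void desaturateImage(std::vector<Rgb>& pixels, std::size_t width, std::size_t height)
{
    assert(pixels.size() == width * height);
    for (Rgb& p : pixels) {
        const float y = luminance(p);
        p = { y, y, y };
    }
}
```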
Now, what is the C++ code that starts our groups? Let's have a look:
```cpp
/**
 * Run a compute shader loaded by file
 */
bool DXApplication::runComputeShader( LPCWSTR shaderFilename )
{
    // Some service variables
    ID3D11UnorderedAccessView* ppUAViewNULL[1] = { NULL };
    ID3D11ShaderResourceView* ppSRVNULL[2] = { NULL, NULL };

    // We load and compile the shader. If we fail, we bail out here.
    if( !loadComputeShader( shaderFilename, &m_computeShader ) )
        return false;

    // We now set up the shader and run it
    m_pImmediateContext->CSSetShader( m_computeShader, NULL, 0 );
    m_pImmediateContext->CSSetShaderResources( 0, 1, &m_srcDataGPUBufferView );
    m_pImmediateContext->CSSetUnorderedAccessViews( 0, 1, &m_destDataGPUBufferView, NULL );

    // 32 x 21 x 1 thread groups
    m_pImmediateContext->Dispatch( 32, 21, 1 );

    // Unbind the shader and the views so the buffers can be bound elsewhere later
    m_pImmediateContext->CSSetShader( NULL, NULL, 0 );
    m_pImmediateContext->CSSetUnorderedAccessViews( 0, 1, ppUAViewNULL, NULL );
    m_pImmediateContext->CSSetShaderResources( 0, 2, ppSRVNULL );

    ...
```
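`Dispatch( 32, 21, 1 )` launches 32 × 21 = 672 groups. With the `[numthreads(32, 16, 1)]` of the CSMain shader shown earlier, that makes a 1024 × 336 grid of threads. A common way to pick the group counts is to round the image size up to a whole number of groups. The C++ sketch below does that. `groupCount`, `GroupCounts` and `groupCountsForImage` are hypothetical helpers, and 65535 is the Direct3D 11 limit on groups per dispatch dimension.

```cpp
#include <cassert>
#include <cstdint>

// Thread groups needed along one axis so that groups * threadsPerAxis >= size
// (integer ceiling division).
std::uint32_t groupCount(std::uint32_t size, std::uint32_t threadsPerAxis)
{
    assert(threadsPerAxis > 0);
    const std::uint32_t groups = (size + threadsPerAxis - 1) / threadsPerAxis;
    // Direct3D 11 allows at most 65535 groups per Dispatch dimension.
    assert(groups <= 65535);
    return groups;
}

struct GroupCounts { std::uint32_t x, y; };

// Group counts for a width x height image processed one pixel per thread.
GroupCounts groupCountsForImage(std::uint32_t width, std::uint32_t height,
                                std::uint32_t numthreadsX, std::uint32_t numthreadsY)
{
    return { groupCount(width, numthreadsX), groupCount(height, numthreadsY) };
}
```

For instance, a 1920 × 1080 image with 32 × 16 groups would need `Dispatch(60, 68, 1)`. Rounding up means the last row or column of groups can run past the image edge. A shader dispatched this way therefore typically compares its SV_DispatchThreadID with the image size before touching the buffers.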
So far we have seen how to specify threads and groups of threads. This is half of what we need to know about compute shaders. The other half is how to provide the data to the GPU for the compute shader to work on.
In the shader, the input image is declared as a structured buffer of Pixel elements bound to register t0:
```hlsl
struct Pixel
{
    int colour;
};

StructuredBuffer<Pixel> Buffer0 : register(t0);
```
Now, to create the structured buffer with DX11 we use the following code:
```cpp
/**
 * Once we have the texture data in RAM we create a GPU buffer to feed the
 * compute shader.
 */
bool DXApplication::createInputBuffer()
{
    if( m_srcDataGPUBuffer )
        m_srcDataGPUBuffer->Release();
    m_srcDataGPUBuffer = NULL;

    if( m_srcTextureData )
    {
        // First we create a buffer in GPU memory
        D3D11_BUFFER_DESC descGPUBuffer;
        ZeroMemory( &descGPUBuffer, sizeof(descGPUBuffer) );
        descGPUBuffer.BindFlags = D3D11_BIND_UNORDERED_ACCESS |
                                  D3D11_BIND_SHADER_RESOURCE;
        descGPUBuffer.ByteWidth = m_textureDataSize;
        descGPUBuffer.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
        descGPUBuffer.StructureByteStride = 4; // We assume the data is in the
                                               // RGBA format, 8 bits per channel

        D3D11_SUBRESOURCE_DATA InitData;
        InitData.pSysMem = m_srcTextureData;
        if( FAILED( m_pd3dDevice->CreateBuffer( &descGPUBuffer, &InitData,
                                                &m_srcDataGPUBuffer ) ) )
            return false;

        // Now we create a view on the resource. DX11 requires you to send the data
        // to shaders using a "shader view"
        D3D11_BUFFER_DESC descBuf;
        ZeroMemory( &descBuf, sizeof(descBuf) );
        m_srcDataGPUBuffer->GetDesc( &descBuf );

        D3D11_SHADER_RESOURCE_VIEW_DESC descView;
        ZeroMemory( &descView, sizeof(descView) );
        descView.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
        descView.BufferEx.FirstElement = 0;
        descView.Format = DXGI_FORMAT_UNKNOWN;
        descView.BufferEx.NumElements = descBuf.ByteWidth / descBuf.StructureByteStride;

        if( FAILED( m_pd3dDevice->CreateShaderResourceView( m_srcDataGPUBuffer,
                                                            &descView, &m_srcDataGPUBufferView ) ) )
            return false;

        return true;
    }
    else
        return false;
}
```
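In `createInputBuffer` the view's element count is `ByteWidth / StructureByteStride`, and the stride of 4 has to match the HLSL `Pixel` struct. Below is a small C++ sketch of those size checks, assuming the same RGBA, 8-bits-per-channel layout. `PixelCPU`, `imageByteWidth` and `viewNumElements` are hypothetical names, and the image sizes in the checks are examples only.

```cpp
#include <cassert>
#include <cstdint>

// CPU-side counterpart of the HLSL "struct Pixel { int colour; };".
struct PixelCPU { std::int32_t colour; };
static_assert(sizeof(PixelCPU) == 4, "must match StructureByteStride = 4");

// Bytes of RGBA, 8-bits-per-channel data for a width x height image.
std::uint32_t imageByteWidth(std::uint32_t width, std::uint32_t height)
{
    return width * height * static_cast<std::uint32_t>(sizeof(PixelCPU));
}

// Element count for the view, computed as ByteWidth / StructureByteStride.
// A trailing partial element could not be addressed through the view,
// so the data size is expected to be a whole number of elements.
std::uint32_t viewNumElements(std::uint32_t byteWidth, std::uint32_t stride)
{
    assert(stride > 0 && byteWidth % stride == 0);
    return byteWidth / stride;
}
```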
```cpp
/**
 * We know the compute shader will output on a buffer which is
 * as big as the texture. Therefore we need to create a
 * GPU buffer and an unordered access view.
 */
bool DXApplication::createOutputBuffer()
{
    // The compute shader will need to output to some buffer so here
    // we create a GPU buffer for that.
    D3D11_BUFFER_DESC descGPUBuffer;
    ZeroMemory( &descGPUBuffer, sizeof(descGPUBuffer) );
    descGPUBuffer.BindFlags = D3D11_BIND_UNORDERED_ACCESS |
                              D3D11_BIND_SHADER_RESOURCE;
    descGPUBuffer.ByteWidth = m_textureDataSize;
    descGPUBuffer.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    descGPUBuffer.StructureByteStride = 4; // We assume the output data is
                                           // in the RGBA format, 8 bits per channel

    if( FAILED( m_pd3dDevice->CreateBuffer( &descGPUBuffer, NULL,
                                            &m_destDataGPUBuffer ) ) )
        return false;

    // The view we need for the output is an unordered access view.
    // This is to allow the compute shader to write anywhere in the buffer.
    D3D11_BUFFER_DESC descBuf;
    ZeroMemory( &descBuf, sizeof(descBuf) );
    m_destDataGPUBuffer->GetDesc( &descBuf );

    D3D11_UNORDERED_ACCESS_VIEW_DESC descView;
    ZeroMemory( &descView, sizeof(descView) );
    descView.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
    descView.Buffer.FirstElement = 0;

    // Format must be DXGI_FORMAT_UNKNOWN when creating
    // a view of a structured buffer
    descView.Format = DXGI_FORMAT_UNKNOWN;
    descView.Buffer.NumElements = descBuf.ByteWidth / descBuf.StructureByteStride;

    if( FAILED( m_pd3dDevice->CreateUnorderedAccessView( m_destDataGPUBuffer,
                                                         &descView, &m_destDataGPUBufferView ) ) )
        return false;

    return true;
}
```
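The output buffer is created the same way as the input buffer, except that it has no initial data and its shader view is an unordered access view (UAV). The UAV lets the compute shader write anywhere in the buffer.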
```hlsl
struct Pixel
{
    int colour;
};

StructuredBuffer<Pixel> Buffer0 : register(t0);     // input image
RWStructuredBuffer<Pixel> BufferOut : register(u0); // output image

// Unpacks the 8-bit red, green and blue channels of pixel (x, y).
// The image width is hard-coded to 1024 pixels.
float3 readPixel(int x, int y)
{
    float3 output;
    uint index = (x + y * 1024);
    output.x = (float)(((Buffer0[index].colour ) & 0x000000ff)      ) / 255.0f;
    output.y = (float)(((Buffer0[index].colour ) & 0x0000ff00) >> 8 ) / 255.0f;
    output.z = (float)(((Buffer0[index].colour ) & 0x00ff0000) >> 16) / 255.0f;
    return output;
}

// Packs a colour back into 8 bits per channel; the alpha byte is left at 0.
void writeToPixel(int x, int y, float3 colour)
{
    uint index = (x + y * 1024);
    int ired   = (int)(clamp(colour.r,0,1) * 255);
    int igreen = (int)(clamp(colour.g,0,1) * 255) << 8;
    int iblue  = (int)(clamp(colour.b,0,1) * 255) << 16;
    BufferOut[index].colour = ired + igreen + iblue;
}

[numthreads(32, 16, 1)]
void CSMain( uint3 dispatchThreadID : SV_DispatchThreadID )
{
    float3 pixel = readPixel(dispatchThreadID.x, dispatchThreadID.y);
    pixel.rgb = pixel.r * 0.3 + pixel.g * 0.59 + pixel.b * 0.11;
    writeToPixel(dispatchThreadID.x, dispatchThreadID.y, pixel);
}
```
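In this shader, each `int` holds four 8-bit channels with red in the lowest byte. The element index is `x + y * 1024`, so the texture is assumed to be 1024 pixels wide. `writeToPixel` never sets bits 24-31, so the output alpha is 0. The C++ sketch below mirrors that packing on the CPU, with the width as a parameter. `Rgb`, `pixelIndex`, `unpackRGB` and `packRGB` are hypothetical names. The sketch uses unsigned integers where the shader uses `int`, which gives the same bit patterns for these masks and shifts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Rgb { float r, g, b; };

// Element index of pixel (x, y) in a row-major buffer. The HLSL helpers
// hard-code width = 1024; here it is a parameter.
std::uint32_t pixelIndex(std::uint32_t x, std::uint32_t y, std::uint32_t width)
{
    assert(x < width);
    return x + y * width;
}

// Same layout as readPixel: red in bits 0-7, green in 8-15, blue in 16-23.
Rgb unpackRGB(std::uint32_t colour)
{
    return { static_cast<float>(colour & 0xFFu) / 255.0f,
             static_cast<float>((colour >> 8) & 0xFFu) / 255.0f,
             static_cast<float>((colour >> 16) & 0xFFu) / 255.0f };
}

// Same as writeToPixel: clamp to [0, 1], scale by 255, truncate like the
// HLSL (int) cast, and leave bits 24-31 (alpha) at zero.
std::uint32_t packRGB(Rgb c)
{
    auto toByte = [](float v) {
        return static_cast<std::uint32_t>(std::min(std::max(v, 0.0f), 1.0f) * 255.0f);
    };
    return toByte(c.r) | (toByte(c.g) << 8) | (toByte(c.b) << 16);
}
```

Because the conversion truncates, `packRGB({0.5f, 0.0f, 0.0f})` stores 127 (0x7F) in the red byte rather than rounding to 128.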