Implementing Geometry Instancing in OpenGL


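Notice: This article is for personal study and exchange only; copyright belongs to the original author.

Translator: tyxxy

Email: tyxxyhm@hotmail.com

If you repost this article, please credit the source: http://tyxxy.spaces.live.com/

Original article: http://blog.benjamin-thaut.de/?p=29

Translator's note (2008-5-11): I am still learning myself, so corrections to any inaccurate translation are welcome, as is discussion with anyone who shares the interest. To avoid distorting the author's meaning, the original English wording is kept; where anything is unclear, please consult the original article.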

Introduction

To coincide with the release of DX10-class GPUs, instancing has become available in OpenGL through the EXT_draw_instanced extension.

By itself this extension is of little use: it lets you draw a vertex buffer multiple times and makes an instance ID accessible in the vertex shader, nothing more.

But if you look closer at the extension string you will notice another new extension, "EXT_bindable_uniform", which lets you specify a buffer object as the data source for a uniform. With it a GLSL shader has access to much more data. On a GeForce 8 it is possible to have 12 of these buffers, each holding a maximum of 64 KB, so 768 KB can be stored in total. The most important property of these buffers is that data only has to be uploaded to the card once and can later be accessed again without being resent. This allows you to store the world matrices of the drawn objects on the graphics card; the resulting performance increase is obvious.
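The capacity figures quoted here follow from simple arithmetic. As a rough sketch (the constant names are illustrative, and the 12-buffer and 64 KB limits are the GeForce 8 figures from the text, assumed here rather than queried from a driver):

```cpp
#include <cstddef>

// Limits quoted in the article for a GeForce 8 class card (assumed, not queried).
constexpr std::size_t kMaxBindableBuffers = 12;
constexpr std::size_t kMaxBufferBytes = 64 * 1024;

// One 4x4 world matrix of 32-bit floats.
static_assert(sizeof(float) == 4, "this sketch assumes 32-bit floats");
constexpr std::size_t kMat4Bytes = 16 * sizeof(float);

// Total storage across all bindable buffers.
constexpr std::size_t kTotalBytes = kMaxBindableBuffers * kMaxBufferBytes;
// How many world matrices fit into one buffer, and into all of them.
constexpr std::size_t kMatricesPerBuffer = kMaxBufferBytes / kMat4Bytes;
constexpr std::size_t kTotalMatrices = kMaxBindableBuffers * kMatricesPerBuffer;

static_assert(kTotalBytes == 768 * 1024, "12 x 64 KB = 768 KB");
```

So one buffer holds 1024 world matrices. In a real program the limits would more likely be read from the driver; the EXT_bindable_uniform specification defines queryable values such as GL_MAX_BINDABLE_UNIFORM_SIZE_EXT for this purpose.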

How-To

First we have to create a buffer on the graphics card in which to store the objects' world matrix data. Since the amount of data a single buffer can hold is limited, we have to divide the data between several buffers.

See List 1.

C++ Code:

List 1: Creating the world matrix buffers on the graphics card

mat4 *WorldMats;
// how many of the objects we wish to draw
int iNumberOfInstances;
// buffer array
GLuint *UniformBuffers;
// number of instances stored in each buffer
int *UniformBuffersSize;
// total number of buffers
int AnzBuffers;

void Init(){

    ...

    iNumberOfInstances = 65535;
    // create the world matrices of all instances
    WorldMats = new mat4[iNumberOfInstances];

    ...

    // maximum number of instances per buffer; matches the array size in the shader
    #define DRAWS 512
    int remaining = iNumberOfInstances;

    // number of buffers needed, rounded up
    AnzBuffers = (iNumberOfInstances + DRAWS - 1) / DRAWS;

    UniformBuffers = new GLuint[AnzBuffers];
    UniformBuffersSize = new int[AnzBuffers];

    for(int i = 0; i < AnzBuffers; i++){
        // the number of instances in the current buffer
        UniformBuffersSize[i] = remaining;
        if(UniformBuffersSize[i] > DRAWS)
            UniformBuffersSize[i] = DRAWS;
        // create and bind the buffer
        glGenBuffers(1, &UniformBuffers[i]);
        glBindBuffer(GL_UNIFORM_BUFFER_EXT, UniformBuffers[i]);
        // establish the size and usage of the buffer;
        // the buffer has to be at least the same size as
        // the uniform in the shader.
        // GL_STATIC_DRAW: written once by the application, read by the GPU
        glBufferData(GL_UNIFORM_BUFFER_EXT, 16*sizeof(float)*DRAWS, NULL, GL_STATIC_DRAW);
        // send the data
        glBufferSubData(GL_UNIFORM_BUFFER_EXT, 0, 16*sizeof(float)*UniformBuffersSize[i], &WorldMats[i*DRAWS]);
        // count down the remaining matrices
        remaining -= DRAWS;
    }

    // finished, thus unbind the buffer
    glBindBuffer(GL_UNIFORM_BUFFER_EXT, 0);
}
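The buffer count and the per-buffer instance counts used in List 1 come down to a ceiling division over the instance array. A minimal standalone sketch of that bookkeeping, with no OpenGL calls (the `Batch` struct and `planBatches` function are hypothetical names introduced for illustration, not part of the original program):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One batch: a contiguous slice of the world-matrix array.
struct Batch {
    std::size_t first;  // index of the first matrix in the slice
    std::size_t count;  // number of instances in this batch
};

// Splits `total` instances into batches of at most `maxPerBatch` instances.
std::vector<Batch> planBatches(std::size_t total, std::size_t maxPerBatch) {
    std::vector<Batch> batches;
    if (maxPerBatch == 0) {
        return batches;  // nothing sensible to plan
    }
    batches.reserve((total + maxPerBatch - 1) / maxPerBatch);
    for (std::size_t first = 0; first < total; first += maxPerBatch) {
        // The last batch may be smaller than maxPerBatch.
        batches.push_back({first, std::min(maxPerBatch, total - first)});
    }
    return batches;
}
```

With the values from List 1 (65535 instances, 512 per buffer) this yields 128 batches, the last of which holds 511 instances. Each batch's `first` corresponds to the `i*DRAWS` offset into `WorldMats`, and `count` to `UniformBuffersSize[i]`.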

Now that the data is stored on the graphics card, we can turn to the actual rendering (knowledge of vertex buffers and GLSL shaders is assumed). See List 2.

List 2: Rendering code

void Draw(){

    ...

    // loop through the buffers
    for(int i = 0; i < AnzBuffers; i++){
        // bind the buffer to the uniform
        InstancingShader->BindBufferToUniform(0, UniformBuffers[i]);
        // bind the instancing shader
        InstancingShader->Use();
        // draw
        Wuerfel->DrawInstanced(UniformBuffersSize[i]);
        // unbind the current instancing shader
        UnloadShader();
    }
    // unbind the buffer (bind to 0)
    InstancingShader->BindBufferToUniform(0, 0);
}

InstancingShader->BindBufferToUniform(0,UniformBuffers[i]);

Inside this function I bind the buffer to the uniform with the OpenGL function

glUniformBufferEXT(program, location, buffer)

whose parameters are:

1/ program: the handle/ID of the shader program object

2/ location: the location of the uniform

3/ buffer: the buffer's ID

The location of this uniform is determined in much the same way as for ordinary uniforms in GLSL (glGetUniformLocation). It is very important that the buffer is bound before the shader is put into use. If the shader is currently in use, the binding attempt is simply ignored.

Wuerfel->DrawInstanced(UniformBuffersSize[i]);

This is the actual rendering. It is the same as the standard vertex array path, except that glDrawArraysInstancedEXT is used instead of glDrawArrays, with the last parameter giving the number of instances to draw. For indexed VBOs the call would be glDrawElementsInstancedEXT. Objects that are not built from triangles or quads are harder to instance, since MultiDrawArraysInstanced and similar calls are not available. To draw models that are built from triangle strips you must use an extra instance for each triangle strip.

Last but not least, the GLSL instancing shader:

List 3: GLSL shader

#version 120
#extension GL_EXT_bindable_uniform: enable
#extension GL_EXT_gpu_shader4: enable

bindable uniform mat4 WorldMats[512];

void main(void){
    vec4 position = WorldMats[gl_InstanceID] * gl_Vertex;
    position = gl_ModelViewMatrix * position;
    gl_Position = gl_ProjectionMatrix * position;

    vec3 normal = mat3(WorldMats[gl_InstanceID]) * gl_Normal;
    normal = mat3(gl_ModelViewMatrix) * normal;

    vec3 lightVectorView = normalize(gl_LightSource[0].position.xyz - position.xyz);

    gl_FrontColor = ((gl_LightSource[0].diffuse * max(dot(normal, lightVectorView), 0.0)) + gl_LightSource[0].ambient + 0.2) * gl_Color;
}

The directives at the beginning are necessary to declare that we use shader model 4.0 features and the EXT_bindable_uniform extension. The most important part of the shader is the first three lines of the main function. There the world matrix of each instance is looked up by instance ID to compute the correct position of each vertex. In this case the OpenGL modelview matrix serves as the view matrix (a modelview matrix with no model transform multiplied in is simply the view matrix). The rest of the main function computes simple per-vertex diffuse lighting, as the fixed-function pipeline does. To avoid problems when transforming normals into world space, avoid scaling within the matrices. If you want this method to work in all cases, you have to compute a normal matrix per instance yourself and pass it to the shader too.
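The per-instance normal matrix mentioned here is conventionally the inverse transpose of the upper-left 3x3 block of the world matrix. The following is a minimal CPU-side sketch of that computation, assuming column-major storage as OpenGL uses; the `Mat4` and `Mat3` aliases and the `normalMatrix` function are illustrative names, not part of the original program:

```cpp
#include <array>
#include <cmath>
#include <stdexcept>

using Mat4 = std::array<float, 16>;  // column-major 4x4
using Mat3 = std::array<float, 9>;   // column-major 3x3

// Returns the inverse transpose of the upper-left 3x3 block of `world`.
// For a 3x3 matrix A this equals cofactor(A) / det(A).
Mat3 normalMatrix(const Mat4& world) {
    // Element (row r, column c) of the upper-left 3x3 block.
    auto a = [&world](int r, int c) { return world[c * 4 + r]; };

    Mat3 cof{};
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            const int r1 = (r + 1) % 3, r2 = (r + 2) % 3;
            const int c1 = (c + 1) % 3, c2 = (c + 2) % 3;
            // Cyclic indexing already carries the (-1)^(r+c) cofactor sign.
            cof[c * 3 + r] = a(r1, c1) * a(r2, c2) - a(r1, c2) * a(r2, c1);
        }
    }

    // Laplace expansion along the first row.
    const float det = a(0, 0) * cof[0] + a(0, 1) * cof[3] + a(0, 2) * cof[6];
    if (std::abs(det) < 1e-12f) {
        throw std::invalid_argument("world matrix is not invertible");
    }

    for (float& v : cof) {
        v /= det;
    }
    return cof;
}
```

For a world matrix that scales X by 2, the result scales X by 0.5 and leaves Y and Z untouched. The resulting matrices would then have to reach the shader somehow, for example through a second bindable uniform array indexed by gl_InstanceID, and the transformed normal should be renormalized there; both points are suggestions rather than something the original article shows.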

Performance

The following diagram compares the three drawing methods (the X axis is the number of instances drawn per frame, the Y axis shows frames per second).

[Figure: performance]

We can conclude that EXT_draw_instanced is about twice as fast as NVIDIA's pseudo instancing, which in turn is about twice as fast as the standard drawing method. With instancing, a GeForce 8800 GTX is capable of drawing 131072 cubes 45 times a second.

Since the number of objects a user wishes to draw at the same time varies, I have benchmarked various group sizes. For the following diagram I drew 131072 cubes (the X axis is the number of cubes drawn with one call, the Y axis shows frames per second).

[Figure: batch-fps]

When drawing 16 cubes per call, this method shows no performance increase over pseudo instancing. With group sizes of 256 or larger the further gain is much smaller (about 0.5 fps with each doubling of the group size).

Due to the buffer size limit of EXT_bindable_uniform, the maximum group size is 1024 (a 64 KB buffer holds 1024 4x4 float matrices of 64 bytes each).

Conclusion

Instancing performs best if the objects to be drawn are static. If the objects are moving, so that the world matrices must be updated every frame, the benefit over pseudo instancing is greatly reduced. It should still be faster, because the data can be updated in one go rather than sending the world matrices one by one as pseudo instancing does. To sum up, the new draw call is definitely effective and, coupled with the bindable uniform extension, very useful. The downsides are that at the moment only a limited number of graphics cards support the extensions and that current drivers are unstable when they are used: I regularly experienced driver memory access violations when terminating my program.
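For moving objects, the point about updating the data in one go can be sketched as one upload per buffer instead of one per instance. The example below keeps OpenGL out of the picture by passing the upload step in as a callable; `Mat4`, `UploadFn` and `uploadWorldMatrices` are hypothetical names, and in the real program the callable would presumably bind the matching buffer and call glBufferSubData as List 1 does:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

struct Mat4 {
    float m[16];  // one 4x4 world matrix
};

// Hypothetical upload hook: (buffer index, first matrix of the slice, byte count).
using UploadFn = std::function<void(std::size_t, const Mat4*, std::size_t)>;

// Re-uploads all world matrices, issuing one upload per buffer.
// Returns the number of uploads performed.
std::size_t uploadWorldMatrices(const std::vector<Mat4>& mats,
                                std::size_t perBuffer,
                                const UploadFn& upload) {
    if (perBuffer == 0) {
        return 0;
    }
    std::size_t uploads = 0;
    for (std::size_t first = 0; first < mats.size(); first += perBuffer) {
        // The last buffer may be only partially filled.
        const std::size_t count = std::min(perBuffer, mats.size() - first);
        upload(uploads, mats.data() + first, count * sizeof(Mat4));
        ++uploads;
    }
    return uploads;
}
```

With 1300 instances and 512 matrices per buffer this performs 3 uploads per frame, where a per-instance scheme would send 1300 separate matrices.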

This article comes from a CSDN blog; if you repost it, please credit the source: http://blog.csdn.net/swq0553/archive/2010/12/08/6063654.aspx