OpenGL Pixel Buffer Object (PBO)


(2013-09-17 14:22:56)

Download: pboUnpack.zip, pboPack.zip

Overview
Creating PBO
Mapping PBO
Example: Streaming Texture Uploads with PBO
Example: Asynchronous Readback with PBO

Overview

OpenGL PBO

The OpenGL ARB_pixel_buffer_object extension is closely related to ARB_vertex_buffer_object. It extends ARB_vertex_buffer_object so that buffer objects can store pixel data as well as vertex data. A buffer object that stores pixel data is called a Pixel Buffer Object (PBO). ARB_pixel_buffer_object borrows the entire VBO framework and API, and adds 2 "target" tokens. These tokens help the PBO memory manager (the OpenGL driver) choose the best location for the buffer object: system memory, shared memory or video memory. The target tokens also state which of 2 operations the bound PBO will be used for: GL_PIXEL_PACK_BUFFER_ARB to transfer pixel data to a PBO, or GL_PIXEL_UNPACK_BUFFER_ARB to transfer pixel data from a PBO.

For example, glReadPixels() and glGetTexImage() are "pack" pixel operations, and glDrawPixels(), glTexImage2D() and glTexSubImage2D() are "unpack" operations. When a PBO is bound to GL_PIXEL_PACK_BUFFER_ARB, glReadPixels() reads pixel data from an OpenGL framebuffer and writes (packs) the data into the PBO. When a PBO is bound to GL_PIXEL_UNPACK_BUFFER_ARB, glDrawPixels() reads (unpacks) pixel data from the PBO and copies it to the OpenGL framebuffer.

The main advantage of a PBO is fast pixel data transfer to and from the graphics card through DMA (Direct Memory Access), without spending CPU cycles. The other advantage is that the DMA transfer can be asynchronous. Let's compare a conventional texture transfer with one that uses a Pixel Buffer Object. The left side of the following diagram shows the conventional way to load texture data from an image source (image file or video stream). The source is first loaded into system memory, and then copied from system memory to an OpenGL texture object with glTexImage2D(). Both of these transfers (load and copy) are performed by the CPU.

Texture loading without PBO

Texture loading with PBO

In contrast, in the right-side diagram the image source can be loaded directly into a PBO, which is controlled by OpenGL. The CPU is still involved in loading the source into the PBO, but not in transferring the pixel data from the PBO to the texture object. Instead, the GPU (OpenGL driver) manages copying data from the PBO to the texture object. This means OpenGL performs a DMA transfer without wasting CPU cycles. Further, OpenGL can schedule an asynchronous DMA transfer for later execution. Therefore, glTexImage2D() returns immediately, and the CPU can do something else without waiting for the pixel transfer to finish.

There are 2 major PBO approaches to improve the performance of the pixel data transfer: streaming texture update and asynchronous read-back from the framebuffer.

Creating PBO

As mentioned earlier, the Pixel Buffer Object borrows all of its API from the Vertex Buffer Object. The only difference is that there are 2 additional tokens for PBOs: GL_PIXEL_PACK_BUFFER_ARB and GL_PIXEL_UNPACK_BUFFER_ARB. GL_PIXEL_PACK_BUFFER_ARB is for transferring pixel data from OpenGL to your application, and GL_PIXEL_UNPACK_BUFFER_ARB is for transferring pixel data from an application to OpenGL. OpenGL uses these tokens to determine the best memory space for a PBO, for example, video memory for uploading (unpacking) textures, or system memory for reading (packing) the framebuffer. However, these target tokens are only hints. The OpenGL driver decides the appropriate location for you.

Creating a PBO requires 3 steps:

Generate a new buffer object with glGenBuffersARB().
Bind the buffer object with glBindBufferARB().
Copy pixel data to the buffer object with glBufferDataARB().

If you pass a NULL pointer as the source array to glBufferDataARB(), the PBO only allocates memory of the given data size. The last parameter of glBufferDataARB() is another performance hint that tells the PBO how the buffer object will be used. GL_STREAM_DRAW_ARB is for streaming texture upload and GL_STREAM_READ_ARB is for asynchronous framebuffer read-back.
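The data size given to glBufferDataARB() has to cover every row of the image, including any padding that the pixel-store alignment adds at the end of a row. The following is a minimal sketch of that arithmetic, assuming GL_UNSIGNED_BYTE components, tightly interleaved pixels and no row-length override. The names `rowStride` and `pboDataSize` are hypothetical helpers for illustration; they are not part of OpenGL or of the demo source.

```cpp
#include <cstddef>

// Bytes occupied by one row once it is padded up to a multiple of
// `alignment` (1, 2, 4 or 8; OpenGL's default pack/unpack alignment is 4).
// Assumption: one byte per colour component.
constexpr std::size_t rowStride(std::size_t width, std::size_t bytesPerPixel,
                                std::size_t alignment = 4)
{
    return (width * bytesPerPixel + alignment - 1) / alignment * alignment;
}

// A size that is safe to allocate for a width x height image.
constexpr std::size_t pboDataSize(std::size_t width, std::size_t height,
                                  std::size_t bytesPerPixel,
                                  std::size_t alignment = 4)
{
    return rowStride(width, bytesPerPixel, alignment) * height;
}
```

For a 4-byte format such as GL_BGRA every row is already a multiple of 4 bytes, so the result is simply width * height * 4. With a 3-byte format the padding only matters when width * 3 is not a multiple of the alignment.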

Please check VBO for more details.

Mapping PBO

PBO provides a memory mapping mechanism to map the OpenGL controlled buffer object to the client's memory address space. So, the client can modify a portion of the buffer object or the entire buffer by using glMapBufferARB() and glUnmapBufferARB().

```
void* glMapBufferARB(GLenum target, GLenum access)
GLboolean glUnmapBufferARB(GLenum target)
```

glMapBufferARB() returns a pointer to the buffer object on success. Otherwise it returns NULL. The target parameter is either GL_PIXEL_PACK_BUFFER_ARB or GL_PIXEL_UNPACK_BUFFER_ARB. The second parameter, access, specifies what to do with the mapped buffer: read data from the PBO (GL_READ_ONLY_ARB), write data to the PBO (GL_WRITE_ONLY_ARB), or both (GL_READ_WRITE_ARB).

Note that if the GPU is still working with the buffer object, glMapBufferARB() will not return until the GPU finishes its job with that buffer object. To avoid this stall (wait), call glBufferDataARB() with a NULL pointer right before glMapBufferARB(). OpenGL will then discard the old buffer and allocate new memory space for the buffer object.

The buffer object must be unmapped with glUnmapBufferARB() after the PBO has been used. glUnmapBufferARB() returns GL_TRUE on success. Otherwise, it returns GL_FALSE.

Example: Streaming Texture Uploads

Download the source and binary: pboUnpack.zip.

This demo application uploads (unpacks) streaming textures to an OpenGL texture object using PBOs. You can switch between the transfer modes (single PBO, double PBOs and without PBO) by pressing the space key, and compare the performance differences.

In the PBO modes, the texture sources are written directly to the mapped pixel buffer every frame. The data is then transferred from the PBO to a texture object using glTexSubImage2D(). By using a PBO, OpenGL can perform an asynchronous DMA transfer between the PBO and the texture object, which can significantly increase texture upload performance. If asynchronous DMA transfer is supported, glTexSubImage2D() should return immediately, and the CPU can process other jobs without waiting for the actual texture copy.

Streaming texture uploads with 2 PBOs

To maximize the streaming transfer performance, you may use multiple pixel buffer objects. The diagram shows 2 PBOs being used simultaneously: glTexSubImage2D() copies the pixel data from one PBO while the texture source is being written to the other PBO.

For frame n, PBO 1 is used for glTexSubImage2D() and PBO 2 is used to receive the new texture source. For frame n+1, the 2 pixel buffers switch roles and continue to update the texture. Because of the asynchronous DMA transfer, the update and copy processes can be performed simultaneously. The CPU writes the texture source to one PBO while the GPU copies the texture from the other PBO.

```
// "index" is used to copy pixels from a PBO to a texture object
// "nextIndex" is used to update pixels in the other PBO
index = (index + 1) % 2;
nextIndex = (index + 1) % 2;

// bind the texture and PBO
glBindTexture(GL_TEXTURE_2D, textureId);
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pboIds[index]);

// copy pixels from PBO to texture object
// Use offset instead of pointer.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, WIDTH, HEIGHT, GL_BGRA, GL_UNSIGNED_BYTE, 0);

// bind PBO to update texture source
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pboIds[nextIndex]);

// Note that glMapBufferARB() causes sync issue.
// If GPU is working with this buffer, glMapBufferARB() will wait(stall)
// until GPU to finish its job. To avoid waiting (idle), you can call
// first glBufferDataARB() with NULL pointer before glMapBufferARB().
// If you do that, the previous data in PBO will be discarded and
// glMapBufferARB() returns a new allocated pointer immediately
// even if GPU is still working with the previous data.
glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, DATA_SIZE, 0, GL_STREAM_DRAW_ARB);

// map the buffer object into client's memory
GLubyte* ptr = (GLubyte*)glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);
if(ptr)
{
    // update data directly on the mapped buffer
    updatePixels(ptr, DATA_SIZE);
    glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB); // release the mapped buffer
}

// it is good idea to release PBOs with ID 0 after use.
// Once bound with 0, all pixel operations are back to normal ways.
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
```
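The index toggle in the listing above is easy to misread, so here is a small sketch that isolates it from the OpenGL calls. It only reproduces the two modulo lines and makes no GL calls. `PboRoles` and `nextRoles` are hypothetical names introduced for illustration.

```cpp
// Roles of the two PBOs during one frame.
struct PboRoles
{
    int upload; // PBO that glTexSubImage2D() copies from ("index")
    int fill;   // PBO that the CPU maps and rewrites ("nextIndex")
};

// Applies the same arithmetic as the listing:
//   index     = (index + 1) % 2;
//   nextIndex = (index + 1) % 2;
// `previousIndex` is the value of "index" left over from the last frame.
constexpr PboRoles nextRoles(int previousIndex)
{
    return PboRoles{ (previousIndex + 1) % 2, (previousIndex + 2) % 2 };
}
```

Starting from index 0, the first frame uploads from PBO 1 and fills PBO 0; the next frame uploads from PBO 0 and fills PBO 1. The buffer filled in one frame is therefore the one uploaded in the following frame, which is what lets the CPU write and the GPU copy overlap.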

Example: Asynchronous Read-back

Download the source and binary: pboPack.zip.

This demo application reads (packs) the pixel data from the framebuffer (left side) into a PBO, then draws it back to the right side of the window after modifying the brightness of the image. You can toggle PBO on/off by pressing the space key, and measure the performance of glReadPixels().
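The brightness step is ordinary CPU work on the mapped bytes. The sketch below shows one way such a step could look, assuming 4 bytes per pixel (for example GL_BGRA with GL_UNSIGNED_BYTE) and a separate destination array. `addBrightness` is a hypothetical name and is not taken from the demo source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Adds `shift` to the three colour bytes of every pixel and clamps the
// result to [0, 255]. The fourth byte (alpha) is copied unchanged.
// `src` may point at a PBO mapped with GL_READ_ONLY_ARB; `dst` is plain
// client memory holding at least pixelCount * 4 bytes.
void addBrightness(const std::uint8_t* src, std::uint8_t* dst,
                   std::size_t pixelCount, int shift)
{
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < pixelCount * 4; ++i)
    {
        if (i % 4 == 3)
        {
            dst[i] = src[i]; // alpha
            continue;
        }
        const int value = static_cast<int>(src[i]) + shift;
        dst[i] = static_cast<std::uint8_t>(std::min(255, std::max(0, value)));
    }
}
```

Writing into a separate array keeps the mapped PBO read-only, which matches the GL_READ_ONLY_ARB access used for read-back.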

A conventional glReadPixels() blocks the pipeline and waits until all pixel data has been transferred. Only then does it return control to the application. In contrast, glReadPixels() with a PBO can schedule an asynchronous DMA transfer and return immediately without stalling. Therefore, the application (CPU) can do other work right away, while OpenGL (GPU) transfers the data with DMA.

Asynchronous glReadPixels() with 2 PBOs

This demo uses 2 pixel buffers. At frame n, the application reads the pixel data from the OpenGL framebuffer into PBO 1 using glReadPixels(), and processes the pixel data in PBO 2. The read and the processing can be performed simultaneously, because glReadPixels() into PBO 1 returns immediately and the CPU starts to process the data in PBO 2 without delay. The roles of PBO 1 and PBO 2 alternate on every frame.

```
// "index" is used to read pixels from framebuffer to a PBO
// "nextIndex" is used to update pixels in the other PBO
index = (index + 1) % 2;
nextIndex = (index + 1) % 2;

// set the target framebuffer to read
glReadBuffer(GL_FRONT);

// read pixels from framebuffer to PBO
// glReadPixels() should return immediately.
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[index]);
glReadPixels(0, 0, WIDTH, HEIGHT, GL_BGRA, GL_UNSIGNED_BYTE, 0);

// map the PBO to process its data by CPU
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[nextIndex]);
GLubyte* ptr = (GLubyte*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
if(ptr)
{
    processPixels(ptr, ...);
    glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
}

// back to conventional pixel operation
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
```
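One consequence of alternating 2 buffers is that the pixels processed in a frame were read in the previous frame. The following CPU-only simulation makes that visible. It makes no OpenGL calls: each "PBO" is a single integer that records which frame wrote it. `ReadbackSim` and `processedFrames` are hypothetical names used only for this sketch.

```cpp
#include <array>
#include <cassert>
#include <vector>

// CPU-side stand-in for the two pack PBOs. Each slot stores the number
// of the frame whose pixels it holds, or -1 if it was never written.
struct ReadbackSim
{
    std::array<int, 2> pbo{{-1, -1}};
    int index = 0;

    // Runs one frame with the same index toggle as the listing above and
    // returns the frame number of the data the CPU gets to process.
    int runFrame(int frameNumber)
    {
        index = (index + 1) % 2;
        const int nextIndex = (index + 1) % 2;
        pbo[index] = frameNumber; // stands in for glReadPixels() into pboIds[index]
        return pbo[nextIndex];    // stands in for mapping pboIds[nextIndex]
    }
};

// Collects what the CPU processes over the first `frameCount` frames.
std::vector<int> processedFrames(int frameCount)
{
    assert(frameCount >= 0);
    ReadbackSim sim;
    std::vector<int> result;
    for (int frame = 0; frame < frameCount; ++frame)
        result.push_back(sim.runFrame(frame));
    return result;
}
```

In this simulation the first frame maps a buffer that has never been written, and every later frame processes the data of the frame before it. A real application has to tolerate both the empty first frame and the one-frame lag.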

Comments:

Max
I read this article a while back, so I thought about giving some feedback.
A few weeks back I had to implement an efficient readback. After some experimentation (with NSight's timeline) I found that instead of using two (or more) buffers and ping ponging between them, it's better to use a single buffer and a fence object with 0 timeout. If the buffer isn't ready (which is very rare in my case, because I render very heavy scenes), then I just reuse the data from the previous readback.
The outline is something like this:
if(ClientWaitSync(0)==signaled) {
MapBuffer and read it
UnmapBuffer

ReadPixels into pack buffer
reinit fence
}
In this outline MapBuffer returns immediately (because the buffer was already streamed to main mem) without any synchronization, and ReadPixels returns immediately as well because the buffer is free to be used. If your mileage varies with ReadPixels you can try orphaning (mine worked perfectly well without orphaning).
For very large buffers it's possible to improve this further by mapping and reading in the middle of the frame (asynchronously), and syncing the unmap + readpixels + fence init to a cpu atomic.
songho
Thanks for posting your experiment. I will look into GL_ARB_sync extension.
Max
A small addendum. Although this layout alleviates the restriction of only one frame lag, it has problems in a few scenarios. For instance, losing temporal information in physical simulations is highly undesirable. Using more than one buffer will prolong the coherency of the stream, but will not solve the actual problem.
Divick Kishore
I tried the pboPack example and I don't see any difference with and without pbo on a Nvidia GTS250 as well as on an Intel Sandybridge mobile h/w. The Unpack definitely has noticeable difference. What could be the reason for no difference in performance with pboPacking example? I have tried incrementing the number of pbos being used but I don't see any difference even then. The only difference that I found was on a Nvidia Tesla M2050 h/w and the different there too was negligible from 40 Mpixels/s to around 30 Mpixels/s. Interestingly on GTS250 the throughput reported is close to 40-50 Mpixels/sec which is a lower end graphics h/w than Tesla M2050.
songho
I believe it is related to the implementation (driver) issue. OpenGL spec provides the ideal concept of asynchronous readback, but the vendors may have some technical difficulties to implement it.
Have you increased the texture size?
Divick Kishore
Aah I see that increasing the window size and hence the read size increases the difference in PBO vs non PBO mode but only for Nvidia GPU but not for Intel gpu. This validates your comments that it is implementation dependent and also on the texture read size.
BTW I tried to replicate your example in another application of mine and somehow even with PBOs the glReadPixels does not return immediately. I tried your example and with PBOs I see that on Nvidia GTW250 and , the glReadpixels return in approximately 0.02 milliseconds and without PBOs it returns in 0.2 milliseconds. But with my example somehow both with PBOs and without PBOs the glReadPixels take same amount of time i.e. 0.2 milliseconds. The example that I have is very straightforward and I just initialize GLUT and I don't even draw anything. After setting up GLUT and initializing buffers for PBO I bind one PBO buffer and do readpixels and measure the time. Any clues on what could be wrong? I can send you my sample if that helps.
Divick Kishore
I see one more post here http://www.opengl.org/discussi... with the poster had the same problem as that of mine. But the thread is not conclusive.
songho
glBindBuffer() call for the readback (right before glReadPixels()) may stall because the target PBO is still in use.
Please check and compare the elapsed time on this glBindBuffer() call as well.
Divick Kishore
glBindBuffer as well as glMapBuffer calls return almost immediately. I also noticed that if the readformat is not GL_BGRA then glReadPixels does not return immediately and takes almost same time as regular glReadPixels.
On a side note I see that in the packpbo example you seem to be setting up the pixelstorei(GL_UNPACK_ALIGNMENT) which clearly is not going to make any difference for glReadPixels. Did you mean to set the PACK alignment? And does having the alignment set to a word boundary help in speeding up glReadPixels? I don't seem to notice any difference with my Nvidia GTS 250 GPU.
songho
Yes, there are GL_PACK_ALIGNMENT and GL_UNPACK_ALIGNMENT tokens, but I did not pay attention to them. Let me know if you find some interesting results. Thanks.
Max Vorobiev
Your PBO article is awesome! While still having a lot of other information sources about async transfers, i`m still peep here to refresh my knowledge, because it's short and easy to read. Thanks for sharing your knowledge!
Sanyam Kapoor
Also I am not clear with what glReadBuffer() and its arguments are actually doing. The arguments are mainly that I am not clear with.
Sanyam Kapoor
And also same ways for glDrawBuffer()
songho
You mostly use double-buffer rendering with OpenGL. You render a 3D scene to the back framebuffer (GL_BACK) first, then swap it with the front framebuffer (GL_FRONT). So, you can finally see the visual on the display.
glReadBuffer() and glDrawBuffer() specify which framebuffer (either GL_BACK or GL_FRONT) will be used to read from or write to.
GL_BACK is the default state in the double-buffer configuration. And, you can change it to GL_FRONT with glReadBuffer() and glDrawBuffer().
Sanyam Kapoor
..and what exactly does the left/ right mean in these kind of arguments: GL_FRONT_(RIGHT/LEFT)
songho
If you render a stereoscopic (3D vision), then you need 2 additional framebuffers; GL_LEFT for the left eye and GL_RIGHT for the right eye.
Therefore, if you want to render a scene only on the left back buffer, you can call:
```
glDrawBuffer(GL_BACK_LEFT);
```
Sanyam Kapoor
Suppose I want to update only a part of the texture (basically an image that I have loaded into openGL context), then while using glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY) , and I want to update a sub-part of texture (the image loaded), should I first bind a buffer of the size of the sub-part of the texture, then offset the pointer to the desired location in original data?
songho
Yes, you need to bind a PBO before read/write pixel data.
Yan Li
I tested your unpackpbo program, and the results of PBO modes 1 and 2 don't differ much. Is that because the texture is small?
songho
Yes. Try bigger texture size. And, video drivers are not fully optimized for asynchronous transfer, which means glMapBuffer() causes stall.
Krishx007
I frequently visit your website for enhancing my OpenGL knowledge. Thanks and best wishes
krishx007@yahoo.com
www.gfxguru.org
tester
Sorry, I've tried your src, but it seems no use for reading depth value. When I change glReadPixels to read GL_DEPTH_COMPONENT, read time is more than 10ms. But if I switch back to GL_RGB, it returns to 0.5ms. Could you please tell me why?
songho
What data type is used for glReadPixels()?
Anonymous
hello i downloaded your demo but the "process time" is 16ms with the pbo on and 0.3 with it off other than that i seems to work fine any ideas? i want to no if anything similar will happen if i implement PBOs In my program
Anonymous
nvm figured it out ik the solution had something to do with downloading ati catalyst control center not that sure about anything else tho
Song Ho Ahn
Glad to hear you figured it out. Be aware that the graphics driver may not fully support asynchronous transfer with PBOs.
Anonymous
well i actually downloaded a package with a new driver so it does now =) merry Christmas
songho
Fixed Makefile for Linux. To compile on Linux, type this in the terminal:
```
make -f Makefile.linux
```
Confused
Does anyone know why glReadPixels would be taking >600ms when the comments state it should return immediately? Trying on Ubuntu 12.04 with Intel Graphics Controller.
songho
600ms seems too slow even without async. What PBO size do you use?
Confused
const int SCREEN_WIDTH = 900;
const int SCREEN_HEIGHT = 1000;
const int CHANNEL_COUNT = 3;
const int DATA_SIZE = SCREEN_WIDTH * SCREEN_HEIGHT * CHANNEL_COUNT;
songho
@Confused, I got around 60~65ms with that resolution (900x1000) on my Virtual Machine (Ubuntu 12.04 installed).
Since it is on VM, asynchronous transfer is not working. So, I have the same result with/without PBO.
codeonwort
struggled with various versions and usage of OpenGL API and your articles greatly help me. Now I understand how to use OpenGL API in C++ that came to me very awkward at first. Thanks to you.
Anonymous Coward
Sorry to be negative but this Disqus floating banner nonsense makes this page very hard to read. Good article otherwise, thanks!
songho
Thanks for your concern. This comment box is designed for sharing more specific Q/A related on each topic, and getting user feedback (just like yours) to enhance the content.
I will add a toggle button to show/hide this comment column. Thanks.
HerpDerp
Yo! I too find this comment thingy slightly obnoxious while reading the page. It sortof intrudes, but comments is always a nice thing. So i've made an awesome design scetch to inspire you with.
http://imgur.com/z6KxY3a
Moving the comments away from the actual content a bit, and maybe highlighting the content in some way (Like putting it in a box or somesuch). Would make the site more easily readable :D
songho
Thanks for the suggestions of design/layout. I will look into it.
