Understanding OpenCL-OpenGL Interoperability

来源：互联网发布：网络一线牵林更新编辑：程序博客网时间：2024/05/14 09:10

Accelerating 3-D Graphics with OpenCL Kernels

OpenGL (Open Graphics Library) is one of the most popular toolsets available for graphical processing, and many computer games and CAD tools rely on it for 3-Drendering. Originally developed by Silicon Graphics in the early 1990s, OpenGL has been ported to Windows, Linux, Mac OS, and many embedded devices. On desktop computers, a modern OpenGL application consists of two parts: a host application that runs on the CPU and special-purpose routines called shaders that execute on the graphics processing unit, or GPU. In general, the CPU handles complex graphical routines such as physics and geometry and the GPU performs simple tasks like assigning positions to vertices and colors to pixels. Figure 1 depicts the relationship between the host and the shaders in an OpenGL application.

Figure 1: Operation of a Simple OpenGL Application

In contrast, OpenCL (Open Compute Language) is only a few years old and isn't nearly as well-known as OpenGL. However, it allows developers to access GPUs (and many other devices) for purposes other than graphics. Because of this general-purpose GPU (GPGPU) processing, OpenCL is frequently employed to crunch numbers at high speed, and common OpenCL applications include data sorting, statistical computation, and frequency analysis. An OpenCL application consists of a host application that runs on the CPU and general-purpose routines called kernels that can execute on any OpenCL-compliant device, including a GPU. Figure 2 shows how an OpenCL application combines CPU and GPU processing.

Figure 2: Operation of a Simple OpenCL Application

One of OpenCL's most important strengths is that it can accelerate and enhance OpenGL's graphical rendering. Stated more precisely, OpenCL kernels can process graphic data before OpenGL shaders start executing. This provides a great advantage because OpenCL kernels have three capabilities that OpenGL shaders lack:

Kernels can access all of the device's memory, including its local and private memory.
Kernels can invoke a broader number of functions than shaders.
Kernels have synchronization routines that allow them to coordinate data processing between multiple execution units.

Figure 3 depicts the operation of an application that combines OpenCL kernels with OpenGL shaders.

Figure 3: Operation of an OpenCL-OpenGL Application

If you compare Figure 3 to Figure 1, you'll see that most of the overall operation doesn't change. But now, thanks to OpenCL, the GPU is performing the high-level graphical processing that used to require the CPU. This transfer of computational load provides two important advantages:

Reduction in data transfer between the CPU and GPU.
Many graphical algorithms can be processed by the GPU with greater performance than the CPU.

The first advantage is crucial. In traditional games and CAD tools, CPU-GPU data transfer causes a significant bottleneck in performance. No matter how fast either device processes data, it still needs to wait for the other to finish before it can start. But by combining OpenCL and OpenGL, the data doesn't move. OpenCL processes vertices and texture coordinates as though they were regular kernel arguments, and then the OpenGL shaders use the data for rendering.

The goal of this article is to explain how OpenCL-OpenGL interoperability works and how to coordinate their operation in code. The first part examines how OpenCL data structures access OpenGL data. The second part presents a simple example of OpenCL-OpenGL interoperability.

1. Configuring OpenCL-OpenGL Data Sharing

OpenCL applications access data using two structures: buffer objects and image objects. Similarly, OpenGL applications package data using three data structures: vertex buffer objects (VBOs), texture objects, and renderbuffer objects. The central concept behind OpenCL-OpenGL interoperability is that OpenCL memory objects can be created from data inside OpenGL data structures. This allows kernels to process OpenGL structures as though they were regular buffer objects and image objects. Figure 4 depicts this graphically.

Figure 4: OpenCL-OpenGL Data Sharing

As shown, the data doesn't move between OpenCL and OpenGL. Kernels and shaders access the same data using different types of data structures. After the OpenCL kernel processes the data, the OpenGL rendering process can continue normally.

To configure OpenGL-OpenCL interoperability in code, three steps must be performed in sequence:

Create an OpenCL context (cl_context) that references the current OpenGL context or share group.
Construct OpenCL memory objects (buffer objects and image objects) from OpenGL data objects (VBOs, texture objects, and renderbuffer objects).
Acquire exclusive access to the shared data for the kernel. After the kernel executes, release this access so the rendering can proceed.

The rest of this section explores the functions that perform these steps. They're not hard to use or understand, but they must be invoked in order to ensure interoperability.

1.1 Creating the OpenCL Context

To establish interoperability, an OpenCL context must be created with a reference to an OpenGL context or share group. This serves as the bridge between the operating system and the graphical window. Once the context/share group is active, the operating system will direct all OpenGL rendering operations to its associated window.

NOTE: On Windows and Linux systems, the bridge between the OS and the window is called a context. On Mac OS systems, it's called a share group. In the interest of simplicity, this chapter will use the term context to refer to both data structures.

The primary OpenCL function to know is clCreateContext, whose first argument is a pointer to a cl_context_properties structure. Most applications set this argument to NULL, but if configured properly, this enables the OpenCL context to access and modify OpenGL data.

The cl_context_properties parameter is defined as an array of property names and associated values, and is terminated by a zero. The names of the properties required for OpenGL interoperability depend on the host's operating system.

Context Properties on Windows

On Windows, the cl_context_properties array must identify three data structures:

CL_GL_CONTEXT_KHR - The handle to an OpenGL rendering context (HGLRC) for the X11 window
CL_WGL_HDC_KHR - The handle to a device context (HDC) for the rendering window
CL_CONTEXT_PLATFORM - The cl_platform structure associated with the context

You can't access the first two objects using the GL Utility Toolkit (GLUT), Qt, or any other OS-independent toolset. Instead, you need to invoke two functions from the venerable Win32 API. wglGetCurrentContext returns the handle to the rendering context, whose data type is given as HGLRC. wglGetCurrentDC returns the handle to the window's device context, whose Win32 data type is given as HDC. The following code shows how these functions are used to initialize a cl_context_properties structure:

cl_context_properties properties[] = {
   CL_GL_CONTEXT_KHR, (cl_context_properties) wglGetCurrentContext(),
   CL_WGL_HDC_KHR, (cl_context_properties) wglGetCurrentDC(),
   CL_CONTEXT_PLATFORM, (cl_context_properties) platform,
   0};

The third argument is the cl_platform structure acquired through clGetPlatform. Note that the last argument of the array is always zero.

Context Properties on Linux

On Linux, the X Window System version 11, frequently called X11, provides the overall graphical interface. To configure acl_context_properties structure for a system running X11, three properties must be defined:

CL_GL_CONTEXT_KHR - The X11 rendering context (glXContext) for the window
CL_GLX_DISPLAY_KHR - The Display object that represents the connection to an X server
CL_CONTEXT_PLATFORM - The cl_platform structure associated with the context

The first object can be acquired by calling glxGetCurrentContext and the second can be acquired through glXGetCurrentDisplay. The following code shows how these functions are used to initialize a cl_context_properties structure:

cl_context_properties properties[] = {
   CL_GL_CONTEXT_KHR, (cl_context_properties) glXGetCurrentContext(),
   CL_GLX_DISPLAY_KHR, (cl_context_properties) glXGetCurrentDisplay(),
   CL_CONTEXT_PLATFORM, (cl_context_properties) platform,
   0};

The third argument is the cl_platform structure acquired through clGetPlatform. Note that the last argument of the array is always zero.

Context Properties on Mac OS

When it comes to configuring OpenCL contexts, Mac OS is the simplest operating system to work with. The cl_context_propertiesstructure needs only one property: CL_CONTEXT_PROPERTY_USE_CGL_SHAREGROUP_APPLE. The value associated with this property must have the data type CGLShareGroupObj, and it can be acquired by calling the function CGLGetShareGroup. This function requires aCGLContextObj structure, and this can be acquired by calling CGLGetCurrentContext. The following code shows how these functions work together:

CGLContextObj glContext = CGLGetCurrentContext();
CGLShareGroupObj shareGroup = CGLGetShareGroup(glContext);

cl_context_properties properties[] = {
   CL_CONTEXT_PROPERTY_USE_CGL_SHAREGROUP_APPLE,
   (cl_context_properties)shareGroup,
   0};

After the cl_context_properties structure is set, you can use it to create an OpenCL context capable of accessing OpenGL data. As discussed in Chapter 2, the function needed to create an OpenCL context is clCreateContext. The following code shows how it can be called with a cl_context_properties structure called properties:

ctx = clCreateContext(properties, 1, &device, NULL, NULL, &err);

Once you've created a cl_context that supports OpenGL interoperability, you can create memory objects (buffer objects and image objects) that access OpenGL data. This is discussed next.

1.2 Creating Interoperable OpenCL Memory Objects

To enable OpenGL-OpenCL interoperability, OpenCL memory objects must be created from OpenGL data. This OpenGL data can be accessed in one of three forms:

Vertex buffer objects (VBOs) - contains vertex data such as coordinates, colors, and normal vectors
Texture objects - contains texture data in image form
Renderbuffer objects - contains pixels to be displayed

This discussion focuses on accessing vertex buffer objects. In a regular OpenGL application, the host creates and initializes VBOs and then transfers them to a GPU. There, the shaders process the data and render the corresponding model.

In contrast, combining OpenGL and OpenCL makes it possible to initialize, read, and modify VBO data on the GPU using OpenCL kernels. But before a kernel can access VBO data, the host needs to create a buffer object specifically configured for the purpose. The function clCreateFromGLBuffer makes this possible, and its signature is given as follows:

cl_mem clCreateFromGLBuffer(cl_context context, cl_mem_flags flags,
GLuint vbo_desc, cl_int *err)

The context parameter must be configured for OpenGL interoperation as described earlier. The flags parameter identifies the kernel's access mode, which can be set to CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY, or CL_MEM_READ_WRITE. The third parameter,vbo_desc, should be set to the VBO's unique identifier produced by glGenBuffers.

NOTE: An OpenCL memory object must be created after the corresponding OpenGL VBO has been created, but before the OpenGL rendering starts.

An example will demonstrate how this works. The following code creates a vertex buffer object called vbo and binds it toGL_ARRAY_BUFFER. It configures the VBO to hold 400 bytes and then creates a buffer object, vbo_buff, to access its data:

glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, 400, NULL, GL_STATIC_DRAW);
vbo_buff = clCreateFromGLBuffer(ctx, CL_MEM_WRITE_ONLY, 2, &err);

It's important to see why the third argument of glBufferData is set to NULL. This states that the host won't transfer data to the VBO. That is, the VBO is configured to hold 400 bytes, but this memory won't be allocated on the host. Instead, the 400 bytes will be allocated on the GPU and the kernel will initialize the VBO data by accessing the write-only buffer object, vbo_buff.

Once the buffer is created, it can be accessed like a regular memory object. That is, it can be made an argument for an OpenCL kernel with clSetKernelArg and the host can read its data with clEnqueueReadBuffer.

Once you've created memory objects to share data with OpenGL objects, the next step involves synchronizing access to the data. We'll look at this next.

1.3 Synchronizing Access to Shared Data

OpenGL and OpenCL routines can share data, but they can't access it at the same time. For example, if an OpenCL kernel is processing the shared data as a buffer object, the OpenGL vertex shader can't access the shared data as a VBO.

Synchronization between OpenGL-OpenCL is made possible through two OpenCL functions. The first is clEnqueueAcquireGLObjects, which ensures that the kernel will have exclusive access to the data. The second is clEnqueueReleaseGLObjects, which allows other processes, such as the OpenGL renderer, to access the data. These functions have similar signatures and they are given as follows:

int clEnqueueAcquireGLObjects(cl_command_queue queue, cl_uint num_objects,
   const cl_mem *mem_objects, cl_uint num_events_in_wait_list,
   const cl_event *event_wait_list, cl_event *event)

int clEnqueueReleaseGLObjects(cl_command_queue queue, cl_uint num_objects,
   const cl_mem *mem_objects, cl_uint num_events_in_wait_list,
   const cl_event *event_wait_list, cl_event *event)

Both functions operate similarly to the command-enqueueing functions discussed in Chapters 3 and 6, but the third parameter is new. This accepts an array of one or more memory objects, and when clEnqueueAcquireGLObjects is called, the kernel will have exclusive access to them. When clEnqueueReleaseGLObjects is called, the kernel will give up this exclusive access.

You can think of these functions as forming a mutex. That is, they serve to lock and unlock access to the memory objects defined by the mem_objects parameter.

There are two important points to keep in mind when using these functions in code:

Before acquiring a lock on the data, you should call glFinish to ensure that all OpenGL routines have completed their operation.
After releasing the lock on the data, you should call clFinish to ensure that all OpenCL routines have completed their operation.

The following code shows how these functions work together to ensure that a kernel can process the OpenGL data interfering with OpenGL. In this case, the kernel proc needs to accesss data in a buffer object called buff.

glFinish();
clEnqueueAcquireGLObjects(queue, 1, &buff, 0, NULL, NULL);

clEnqueueNDRangeKernel(queue, proc, 1, NULL, global_size, local_size, 0, NULL, NULL);

clEnqueueReleaseGLObjects(queue, 1, &buff, 0, NULL, NULL);
clFinish();

The clFinish at the end of the code ensures that the kernel will finish its execution and that the lock will be released before further operations can commence. After clFinish completes, the vertex shader will be able to access the kernel's output.