Implementing graphics

In this document


    Requirements
    Implementation
        OpenGL and EGL drivers
        Pre-rotation
        Gralloc HAL
        Protected buffers
        Hardware Composer HAL
        VSYNC
        Virtual displays
    Testing


Follow the instructions here to implement the Android graphics HAL.
Requirements


The following list and sections describe what you need to provide to support graphics in your product:


    OpenGL ES 1.x Driver
    OpenGL ES 2.0 Driver
    OpenGL ES 3.0 Driver (optional)
    EGL Driver
    Gralloc HAL implementation
    Hardware Composer HAL implementation
    Framebuffer HAL implementation


Implementation
OpenGL and EGL drivers


You must provide drivers for OpenGL ES 1.x, OpenGL ES 2.0, and EGL. Here are some key considerations:


    The GL driver needs to be robust and conformant to the OpenGL ES standards.
    Do not limit the number of GL contexts. Android allows apps in the background and tries to keep their GL contexts alive, so it is not uncommon to have 20-30 active GL contexts at once. You should also be careful with the amount of memory allocated for each context.
    Support the YV12 image format and any other YUV image formats that come from other components in the system, such as media codecs or the camera.
    Support the mandatory extensions: GL_OES_EGL_image_external, EGL_ANDROID_image_native_buffer, and EGL_ANDROID_recordable. The EGL_ANDROID_framebuffer_target extension is required for Hardware Composer 1.1 and higher as well.
    We highly recommend also supporting EGL_ANDROID_blob_cache, EGL_KHR_fence_sync, EGL_KHR_wait_sync, and EGL_ANDROID_native_fence_sync.


Note that the OpenGL API exposed to app developers is different from the OpenGL interface that you are implementing. Apps do not have access to the GL driver layer and must go through the interface provided by the APIs.
Pre-rotation


Many hardware overlays do not support rotation, and even if they do, it costs processing power. The solution is to pre-transform the buffer before it reaches SurfaceFlinger. A query hint (NATIVE_WINDOW_TRANSFORM_HINT) was added to ANativeWindow to represent the most likely transform to be applied to the buffer by SurfaceFlinger. Your GL driver can use this hint to pre-transform the buffer before it reaches SurfaceFlinger, so when the buffer arrives, it is correctly transformed.


For example, you may receive a hint to rotate 90 degrees. You must generate a matrix and apply it to the buffer to prevent it from running off the end of the page. To save power, this should be done in pre-rotation. See the ANativeWindow interface defined in system/core/include/system/window.h for more details.
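As a rough sketch, a driver-side component might read the hint like this; prerotate_90() is a hypothetical driver-internal hook, and HAL_TRANSFORM_ROT_90 is one of the transform values the hint can carry:

#include <system/window.h>

extern void prerotate_90(void);  /* hypothetical driver hook */

/* Query the transform SurfaceFlinger is likely to apply so the driver
 * can bake the rotation into its rendering. */
static void check_transform_hint(ANativeWindow *win)
{
    int hint = 0;
    if (win->query(win, NATIVE_WINDOW_TRANSFORM_HINT, &hint) != 0)
        return;  /* hint unavailable; render without pre-rotation */

    if (hint & HAL_TRANSFORM_ROT_90)
        prerotate_90();  /* render through a 90-degree rotation matrix */
}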
Gralloc HAL


The graphics memory allocator is needed to allocate memory that is requested by image producers. You can find the interface definition of the HAL at: hardware/libhardware/include/hardware/gralloc.h
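As a minimal usage sketch (the 1280x720 size and the usage flags are illustrative), an image producer allocates through the gralloc alloc device like this:

#include <hardware/gralloc.h>

/* Allocate a buffer through the gralloc alloc device.  The usage flags
 * tell gralloc which hardware blocks will touch the buffer so it can
 * pick suitable memory and layout. */
static int alloc_example(alloc_device_t *dev, buffer_handle_t *handle)
{
    int stride = 0;  /* filled in by gralloc, in pixels */
    return dev->alloc(dev, 1280, 720, HAL_PIXEL_FORMAT_RGBA_8888,
                      GRALLOC_USAGE_HW_TEXTURE | GRALLOC_USAGE_HW_COMPOSER,
                      handle, &stride);
}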
Protected buffers


The gralloc usage flag GRALLOC_USAGE_PROTECTED allows the graphics buffer to be displayed only through a hardware-protected path. These overlay planes are the only way to display DRM content. DRM-protected buffers cannot be accessed by SurfaceFlinger or the OpenGL ES driver.


DRM-protected video can be presented only on an overlay plane. Video players that support protected content must be implemented with SurfaceView. Software running on unprotected hardware cannot read or write the buffer. Hardware-protected paths must appear on the Hardware Composer overlay. For instance, protected videos will disappear from the display if Hardware Composer switches to OpenGL ES composition.


See the DRM page for a description of protected content.
Hardware Composer HAL


The Hardware Composer HAL is used by SurfaceFlinger to composite surfaces to the screen. The Hardware Composer abstracts objects like overlays and 2D blitters and helps offload some work that would normally be done with OpenGL.


We recommend you start using version 1.3 of the Hardware Composer HAL as it will provide support for the newest features (explicit synchronization, external displays, and more). Because the physical display hardware behind the Hardware Composer abstraction layer can vary from device to device, it is difficult to define recommended features. But here is some guidance:


    The Hardware Composer should support at least four overlays (status bar, system bar, application, and wallpaper/background).
    Layers can be bigger than the screen, so the Hardware Composer should be able to handle layers that are larger than the display (for example, a wallpaper).
    Pre-multiplied per-pixel alpha blending and per-plane alpha blending should be supported at the same time.
    The Hardware Composer should be able to consume the same buffers that the GPU, camera, video decoder, and Skia are producing, so supporting some of the following properties is helpful:
        RGBA packing order
        YUV formats
        Tiling, swizzling, and stride properties
    A hardware path for protected video playback must be present if you want to support protected content.


The general recommendation when implementing your Hardware Composer is to implement a non-operational Hardware Composer first. Once you have the structure done, implement a simple algorithm to delegate composition to the Hardware Composer. For example, just delegate the first three or four surfaces to the overlay hardware of the Hardware Composer.
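A minimal sketch of such a first-pass prepare(), assuming four overlay planes (the MAX_OVERLAYS value is illustrative):

#include <hardware/hwcomposer.h>

#define MAX_OVERLAYS 4  /* illustrative; depends on your hardware */

/* Send the first few eligible layers to overlay hardware and fall back
 * to GLES composition for the rest. */
static void prepare_simple(hwc_display_contents_1_t *list)
{
    size_t used = 0;
    for (size_t i = 0; i < list->numHwLayers; i++) {
        hwc_layer_1_t *layer = &list->hwLayers[i];
        if (layer->compositionType == HWC_FRAMEBUFFER_TARGET)
            continue;  /* target for GLES output; leave it alone */
        layer->compositionType =
                (used++ < MAX_OVERLAYS) ? HWC_OVERLAY : HWC_FRAMEBUFFER;
    }
}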


Focus on optimization, such as intelligently selecting the surfaces to send to the overlay hardware in a way that maximizes the load taken off of the GPU. Another optimization is to detect whether the screen is updating. If not, delegate composition to OpenGL instead of the Hardware Composer to save power. When the screen updates again, continue to offload composition to the Hardware Composer.


Devices must report the display mode (or resolution). Android uses the first mode reported by the device. To support televisions, have the TV device report the mode selected for it by the manufacturer to Hardware Composer. See hwcomposer.h for more details.


Prepare for common use cases, such as:


    Full-screen games in portrait and landscape mode
    Full-screen video with closed captioning and playback control
    The home screen (compositing the status bar, system bar, application window, and live wallpapers)
    Protected video playback
    Multiple display support


These use cases should address regular, predictable uses rather than edge cases that are rarely encountered. Otherwise, any optimization will have little benefit. Implementations must balance two competing goals: animation smoothness and interaction latency.


Further, to make best use of Android graphics, you must develop a robust clocking strategy. Performance matters little if clocks have been turned down to make every operation slow. You need a clocking strategy that puts the clocks at high speed when needed, such as to make animations seamless, and then slows the clocks whenever the increased speed is no longer needed.


Use the adb shell dumpsys SurfaceFlinger command to see precisely what SurfaceFlinger is doing. See the Hardware Composer section of the Architecture page for example output and a description of relevant fields.


You can find the HAL for the Hardware Composer and additional documentation in:
    hardware/libhardware/include/hardware/hwcomposer.h
    hardware/libhardware/include/hardware/hwcomposer_defs.h


A stub implementation is available in the hardware/libhardware/modules/hwcomposer directory.
VSYNC


VSYNC synchronizes certain events to the refresh cycle of the display. Applications always start drawing on a VSYNC boundary, and SurfaceFlinger always composites on a VSYNC boundary. This eliminates stutters and improves the visual performance of graphics. The Hardware Composer has a function pointer:


int (*waitForVsync)(int64_t *timestamp)


This points to a function you must implement for VSYNC. This function blocks until a VSYNC occurs and returns the timestamp of the actual VSYNC. A message must be sent every time VSYNC occurs. A client can receive a VSYNC timestamp once, at specified intervals, or continuously (interval of 1). You must implement VSYNC to have no more than a 1 ms lag at the maximum (0.5 ms or less is recommended), and the timestamps returned must be extremely accurate.
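As a rough sketch, the blocking wait might sit on top of a driver event; the file descriptor, ioctl number, and event struct below are all hypothetical:

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* hypothetical driver interface */
struct vsync_event { int64_t timestamp_ns; };
#define MYDISP_WAIT_VSYNC 0x40086d01  /* hypothetical ioctl */
extern int display_fd;                /* opened elsewhere */

/* Block until the next VSYNC and report its timestamp. */
static int wait_for_vsync(int64_t *timestamp)
{
    struct vsync_event ev;
    if (ioctl(display_fd, MYDISP_WAIT_VSYNC, &ev) < 0)
        return -errno;
    *timestamp = ev.timestamp_ns;  /* must be highly accurate */
    return 0;
}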
Explicit synchronization


Explicit synchronization is required and provides a mechanism for gralloc buffers to be acquired and released in a synchronized way. Explicit synchronization allows producers and consumers of graphics buffers to signal when they are done with a buffer. This allows the Android system to asynchronously queue buffers to be read or written with the certainty that another consumer or producer does not currently need them. See the Synchronization framework section for an overview of this mechanism.


The benefits of explicit synchronization include less behavior variation between devices, better debugging support, and improved testing metrics. For instance, the sync framework output readily identifies problem areas and root causes. And centralized SurfaceFlinger presentation timestamps show when events occur in the normal flow of the system.


This communication is facilitated by the use of synchronization fences, which are now required when requesting a buffer for consuming or producing. The synchronization framework consists of three main building blocks: sync_timeline, sync_pt, and sync_fence.
sync_timeline


A sync_timeline is a monotonically increasing timeline that should be implemented for each driver instance, such as a GL context, display controller, or 2D blitter. This is essentially a counter of jobs submitted to the kernel for a particular piece of hardware. It provides guarantees about the order of operations and allows hardware-specific implementations.


Please note, the sync_timeline is offered as a CPU-only reference implementation called sw_sync (which stands for software sync). If possible, use sw_sync instead of a sync_timeline to save resources and avoid complexity. If you're not employing a hardware resource, sw_sync should be sufficient.


If you must implement a sync_timeline, use the sw_sync driver as a starting point. Follow these guidelines:


    Provide useful names for all drivers, timelines, and fences. This simplifies debugging.
    Implement the timeline_value_str and pt_value_str operators in your timelines, as they make debugging output much more readable.
    If you want your userspace libraries (such as the GL library) to have access to the private data of your timelines, implement the fill_driver_data operator. This lets you get information about the immutable sync_fence and sync_pts so you can build command lines based upon them.


When implementing a sync_timeline, don’t:


    Base it on any real view of time, such as when a wall clock or other piece of work might finish. It is better to create an abstract timeline that you can control.
    Allow userspace to explicitly create or signal a fence. This can result in one piece of the user pipeline creating a denial-of-service attack that halts all functionality. This is because the userspace cannot make promises on behalf of the kernel.
    Access sync_timeline, sync_pt, or sync_fence elements explicitly, as the API should provide all required functions.


sync_pt


A sync_pt is a single value or point on a sync_timeline. A point has three states: active, signaled, and error. Points start in the active state and transition to the signaled or error states. For instance, when a buffer is no longer needed by an image consumer, its sync_pt is signaled so that image producers know it is okay to write into the buffer again.
sync_fence


A sync_fence is a collection of sync_pts that often have different sync_timeline parents (such as for the display controller and GPU). These are the main primitives over which drivers and userspace communicate their dependencies. A fence is a promise the kernel gives upon accepting queued work: that the work will complete in a finite amount of time.


This allows multiple consumers or producers to signal they are using a buffer and allows this information to be communicated with one function parameter. Fences are backed by a file descriptor and can be passed from kernel-space to user-space. For instance, a fence can contain two sync_pts that signify when two separate image consumers are done reading a buffer. When the fence is signaled, the image producers know both consumers are done consuming.


Fences, like sync_pts, start active and then change state based upon the state of their points. If all sync_pts become signaled, the sync_fence becomes signaled. If one sync_pt falls into an error state, the entire sync_fence has an error state.


Membership in a sync_fence is immutable once the fence is created. And since a sync_pt can be in only one fence, it is included as a copy. Even if two points have the same value, there will be two copies of the sync_pt in the fence.


To get more than one point into a fence, a merge operation is conducted. In the merge, the points from two distinct fences are added to a third fence. If one of those points was signaled in the originating fence and the other was not, the third fence will also not be in a signaled state.
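The userspace side of the merge is visible in the libsync API (platform/system/core/libsync). A minimal sketch that merges two fences and waits on the result:

#include <sync/sync.h>
#include <unistd.h>

/* Wait until both input fences have signaled.  sync_merge() copies the
 * sync_pts from both fences into a third, named fence. */
static int wait_for_both(int fence_a, int fence_b)
{
    int merged = sync_merge("both_consumers_done", fence_a, fence_b);
    if (merged < 0)
        return merged;

    int err = sync_wait(merged, 1000 /* ms timeout */);
    close(merged);  /* we created this fd, so we close it */
    return err;
}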


To implement explicit synchronization, you need to provide the following:


    A kernel-space driver that implements a synchronization timeline for a particular piece of hardware. Drivers that need to be fence-aware are generally anything that accesses or communicates with the Hardware Composer. Here are the key files (found in the android-3.4 kernel branch):
        Core implementation:
            kernel/common/include/linux/sync.h
            kernel/common/drivers/base/sync.c
        sw_sync:
            kernel/common/include/linux/sw_sync.h
            kernel/common/drivers/base/sw_sync.c
        Documentation:
            kernel/common/Documentation/sync.txt
        Finally, the platform/system/core/libsync directory includes a library to communicate with the kernel-space.
    A Hardware Composer HAL module (version 1.3 or later) that supports the new synchronization functionality. You will need to provide the appropriate synchronization fences as parameters to the set() and prepare() functions in the HAL.
    Two GL-specific extensions related to fences, EGL_ANDROID_native_fence_sync and EGL_ANDROID_wait_sync, along with incorporating fence support into your graphics drivers.


For example, to use the API supporting the synchronization function, you might develop a display driver that has a display buffer function. Before the synchronization framework existed, this function would receive dma-bufs, put those buffers on the display, and block while the buffer is visible, like so:


/*
 * assumes buf is ready to be displayed.  returns when buffer is no longer on
 * screen.
 */
void display_buffer(struct dma_buf *buf);


With the synchronization framework, the API call is slightly more complex. While putting a buffer on display, you associate it with a fence that says when the buffer will be ready. So you queue up the work, which you will initiate once the fence clears.


In this manner, you are not blocking anything. You immediately return your own fence, which is a guarantee of when the buffer will be off of the display. As you queue up buffers, the kernel will list the dependencies. With the synchronization framework:


/*
 * will display buf when fence is signaled.  returns immediately with a fence
 * that will signal when buf is no longer displayed.
 */
struct sync_fence* display_buffer(struct dma_buf *buf,
                                  struct sync_fence *fence);
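A usage sketch of the fenced version; hand_to_producer() is a hypothetical stand-in for however your pipeline returns the release fence:

extern void hand_to_producer(struct sync_fence *fence);  /* hypothetical */

/* Neither call blocks; the kernel tracks the dependency chain through
 * the fences. */
static void flip(struct dma_buf *buf, struct sync_fence *acquire)
{
    /* buf is shown once 'acquire' signals; 'release' signals when buf
     * is no longer on screen and is safe to rewrite */
    struct sync_fence *release = display_buffer(buf, acquire);
    hand_to_producer(release);
}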


Sync integration
Integration conventions


This section explains how to integrate the low-level sync framework with different parts of the Android framework and the drivers that need to communicate with one another.


The Android HAL interfaces for graphics follow consistent conventions, so when file descriptors are passed across a HAL interface, ownership of the file descriptor is always transferred. This means:


    If you receive a fence file descriptor from the sync framework, you must close it.
    If you return a fence file descriptor to the sync framework, the framework will close it.
    If you want to continue using the fence file descriptor, you must duplicate the descriptor, as in the sketch below.
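A minimal sketch of that convention in C; keep_using_fence() is a hypothetical stand-in for whatever your HAL does with the fence later:

#include <unistd.h>

extern void keep_using_fence(int fd);  /* hypothetical */

/* A HAL entry point that receives a fence fd now owns it. */
static void on_fence_received(int fence_fd)
{
    int my_copy = dup(fence_fd);   /* duplicate to keep using the fence */
    close(fence_fd);               /* the recipient closes the original */
    keep_using_fence(my_copy);     /* must close my_copy when finished */
}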


Every time a fence is passed through BufferQueue - such as for a window that passes a fence to BufferQueue saying when its new contents will be ready - the fence object is renamed. Since kernel fence support allows fences to have strings for names, the sync framework uses the window name and buffer index that is being queued to name the fence, for example: SurfaceView:0


This is helpful in debugging to identify the source of a deadlock. Those names appear in the output of /d/sync and in bug reports when taken.
ANativeWindow integration


ANativeWindow is fence-aware: dequeueBuffer, queueBuffer, and cancelBuffer have fence parameters.
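A minimal sketch of the fence-aware buffer cycle, using the fence-enabled signatures from system/core/include/system/window.h and the libsync sync_wait() helper:

#include <system/window.h>
#include <sync/sync.h>
#include <unistd.h>

/* Dequeue a buffer, wait on its fence before writing, then queue it
 * back with no fence (contents are already complete). */
static int render_one_frame(ANativeWindow *win)
{
    ANativeWindowBuffer *buf;
    int fence_fd = -1;

    if (win->dequeueBuffer(win, &buf, &fence_fd) != 0)
        return -1;
    if (fence_fd >= 0) {
        sync_wait(fence_fd, -1);  /* block until the consumer is done */
        close(fence_fd);          /* we own the fd; close it */
    }
    /* ... draw into buf here ... */
    return win->queueBuffer(win, buf, -1 /* no fence */);
}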
OpenGL ES integration


OpenGL ES sync integration relies upon these two EGL extensions:


    EGL_ANDROID_native_fence_sync - provides a way to either wrap or create native Android fence file descriptors in EGLSyncKHR objects.
    EGL_ANDROID_wait_sync - allows GPU-side stalls rather than CPU-side ones, making the GPU wait for an EGLSyncKHR. This is essentially the same as the EGL_KHR_wait_sync extension. See the EGL_KHR_wait_sync specification for details.


These extensions can be used independently and are controlled by a compile flag in libgui. To use them, first implement the EGL_ANDROID_native_fence_sync extension along with the associated kernel support. Next, add ANativeWindow support for fences to your driver and then turn on support in libgui to make use of the EGL_ANDROID_native_fence_sync extension.


Then, as a second pass, enable the EGL_ANDROID_wait_sync extension in your driver and turn it on separately. The EGL_ANDROID_native_fence_sync extension consists of a distinct native fence EGLSync object type, so extensions that apply to existing EGLSync object types don't necessarily apply to EGL_ANDROID_native_fence objects; this avoids unwanted interactions.


The EGL_ANDROID_native_fence_sync extension employs a corresponding native fence file descriptor attribute that can be set only at creation time and cannot be queried afterward from an existing sync object. This attribute can be set to one of two modes:


    A valid fence file descriptor - wraps an existing native Android fence file descriptor in an EGLSyncKHR object.
    -1 - creates a native Android fence file descriptor from an EGLSyncKHR object.


The DupNativeFenceFD function call is used to extract the native Android fence file descriptor from the EGLSyncKHR object. This has the same result as querying the attribute that was set but adheres to the convention that the recipient closes the fence (hence the duplicate operation). Finally, destroying the EGLSync object should close the internal fence attribute.
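A sketch of creating a native fence sync and extracting its file descriptor, assuming eglCreateSyncKHR, eglDupNativeFenceFDANDROID, and eglDestroySyncKHR have been resolved through eglGetProcAddress:

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>

/* Create a native fence EGLSync (no attribute set, so the driver
 * creates the fence) and dup out its fd for another component. */
static int create_native_fence_fd(EGLDisplay dpy)
{
    EGLSyncKHR sync =
            eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
    if (sync == EGL_NO_SYNC_KHR)
        return -1;

    glFlush();  /* the fd is valid only after the fence command is flushed */

    int fd = eglDupNativeFenceFDANDROID(dpy, sync);  /* recipient closes it */
    eglDestroySyncKHR(dpy, sync);  /* also closes the internal fence */
    return fd;  /* EGL_NO_NATIVE_FENCE_FD_ANDROID (-1) on failure */
}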
Hardware Composer integration


Hardware Composer handles three types of sync fences:


    Acquire fence - one per layer, this is set before calling HWC::set. It signals when Hardware Composer may read the buffer.
    Release fence - one per layer, this is filled in by the driver in HWC::set. It signals when Hardware Composer is done reading the buffer so the framework can start using that buffer again for that particular layer.
    Retire fence - one for the entire frame, this is filled in by the driver each time HWC::set is called. It covers all of the layers for the set operation and signals to the framework when all of the effects of this set operation have completed. The retire fence signals when the next set operation takes place on the screen. The sketch below shows how these fences flow through set().
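A rough sketch of the fence bookkeeping inside an HWC 1.x set() implementation; display_commit() is a hypothetical driver hook that programs the acquire fences into hardware and returns the release and retire fences:

#include <hardware/hwcomposer.h>
#include <unistd.h>

/* hypothetical driver hook: commits the layer list, filling in one
 * release fence per layer and one retire fence for the whole frame */
extern int display_commit(hwc_display_contents_1_t *list,
                          int *release_fds, int *retire_fd);

static int set_fences(hwc_display_contents_1_t *list)
{
    int release_fds[list->numHwLayers];
    int retire_fd = -1;

    if (display_commit(list, release_fds, &retire_fd) != 0)
        return -1;

    for (size_t i = 0; i < list->numHwLayers; i++) {
        hwc_layer_1_t *layer = &list->hwLayers[i];
        if (layer->acquireFenceFd >= 0)
            close(layer->acquireFenceFd);        /* we own this fd */
        layer->releaseFenceFd = release_fds[i];  /* framework closes it */
    }
    list->retireFenceFd = retire_fd;             /* one per frame */
    return 0;
}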


The retire fence can be used to determine how long each frame appears on the screen. This is useful in identifying the location and source of delays, such as a stuttering animation.
VSYNC Offset


Application and SurfaceFlinger render loops should be synchronized to the hardware VSYNC. On a VSYNC event, the display begins showing frame N while SurfaceFlinger begins compositing windows for frame N+1. The app handles pending input and generates frame N+2.


Synchronizing with VSYNC delivers consistent latency. It reduces errors in apps and SurfaceFlinger and the drifting of displays in and out of phase with each other. This, however, does assume application and SurfaceFlinger per-frame times don't vary widely. Nevertheless, the latency is at least two frames.


To remedy this, you may employ VSYNC offsets to reduce the input-to-display latency by making the application and composition signals relative to hardware VSYNC. This is possible because application plus composition usually takes less than 33 ms.


The result of VSYNC offset is three signals with the same period but offset phases:


    HW_VSYNC_0 - Display begins showing the next frame
    VSYNC - App reads input and generates the next frame
    SF_VSYNC - SurfaceFlinger begins compositing for the next frame


With VSYNC offset, SurfaceFlinger receives the buffer and composites the frame while the application processes the input and renders the frame, all within a single frame of time.


Please note, VSYNC offsets reduce the time available for app and composition and therefore provide a greater chance for error.
DispSync


DispSync maintains a model of the periodic hardware-based VSYNC events of a display and uses that model to execute periodic callbacks at specific phase offsets from the hardware VSYNC events.


DispSync is essentially a software phase-locked loop (PLL) that generates the VSYNC and SF_VSYNC signals used by Choreographer and SurfaceFlinger, even if not offset from hardware VSYNC.


Figure 4. DispSync flow


DispSync has these qualities:


    Reference - HW_VSYNC_0
    Output - VSYNC and SF_VSYNC
    Feedback - Retire fence signal timestamps from Hardware Composer


VSYNC/Retire Offset


The signal timestamp of retire fences must match HW VSYNC, even on devices that don't use the offset phase. Otherwise, errors appear to have greater severity than reality.


“Smart” panels often have a delta: the retire fence is the end of direct memory access (DMA) to display memory, but the actual display switch and HW VSYNC happen some time later.


PRESENT_TIME_OFFSET_FROM_VSYNC_NS is set in the device's BoardConfig.mk make file. It is based upon the display controller and panel characteristics. The time from the retire fence timestamp to the HW VSYNC signal is measured in nanoseconds.
VSYNC and SF_VSYNC Offsets


The VSYNC_EVENT_PHASE_OFFSET_NS and SF_VSYNC_EVENT_PHASE_OFFSET_NS values are set conservatively based on high-load use cases, such as partial GPU composition during window transitions or Chrome scrolling through a webpage containing animations. These offsets allow for long application render time and long GPU composition time.


More than a millisecond or two of latency is noticeable. We recommend integrating thorough automated error testing to minimize latency without significantly increasing error counts.


Note these offsets are also set in the device's BoardConfig.mk make file. The default if not set is zero offset. Both settings are offsets in nanoseconds after HW_VSYNC_0. Either can be negative.
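For illustration, such BoardConfig.mk entries might look like this; the values are hypothetical and must be tuned per device:

# hypothetical values, in nanoseconds after HW_VSYNC_0
VSYNC_EVENT_PHASE_OFFSET_NS := 1000000
SF_VSYNC_EVENT_PHASE_OFFSET_NS := 1000000
PRESENT_TIME_OFFSET_FROM_VSYNC_NS := 0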
Virtual displays


Android added support for virtual displays to Hardware Composer in version 1.3. This support was implemented in the Android platform and can be used by Miracast.


The virtual display composition is similar to the physical display: input layers are described in prepare(), SurfaceFlinger conducts GPU composition, and the layers and GPU framebuffer are provided to Hardware Composer in set().


Instead of the output going to the screen, it is sent to a gralloc buffer. Hardware Composer writes the output to a buffer and provides the completion fence. The buffer is sent to an arbitrary consumer: video encoder, GPU, CPU, etc. Virtual displays can use 2D/blitter or overlays if the display pipeline can write to memory.
Modes


Each frame is in one of three modes after prepare():


    GLES - All layers are composited by the GPU. The GPU writes directly to the output buffer while Hardware Composer does nothing. This is equivalent to virtual display composition with Hardware Composer versions before 1.3.
    MIXED - The GPU composites some layers to the framebuffer, and Hardware Composer composites the framebuffer and the remaining layers. The GPU writes to a scratch buffer (the framebuffer); Hardware Composer reads the scratch buffer and writes to the output buffer. Buffers may have different formats, e.g. RGBA and YCbCr.
    HWC - All layers are composited by Hardware Composer, which writes directly to the output buffer.


Output format


MIXED and HWC modes: If the consumer needs CPU access, the consumer chooses the format. Otherwise, the format is IMPLEMENTATION_DEFINED, and gralloc can choose the best format based on the usage flags. For example, it can choose a YCbCr format if the consumer is a video encoder and Hardware Composer can write that format efficiently.


GLES mode: The EGL driver chooses the output buffer format in dequeueBuffer(), typically RGBA8888. The consumer must be able to accept this format.
EGL requirement


Hardware Composer 1.3 virtual displays require that eglSwapBuffers() does not dequeue the next buffer immediately. Instead, it should defer dequeueing the buffer until rendering begins. Otherwise, EGL always owns the “next” output buffer, and SurfaceFlinger can't get the output buffer for Hardware Composer in MIXED/HWC mode.


If Hardware Composer always sends all virtual display layers to the GPU, all frames will be in GLES mode. Although it is not recommended, you may use this method if you need to support Hardware Composer 1.3 for some other reason but can't conduct virtual display composition.
Testing


For benchmarking, we suggest following this flow by phase:


    Specification - When initially specifying the device, such as when using immature drivers, you should use predefined (fixed) clocks and workloads to measure the frames per second rendered. This gives a clear view of what the hardware is capable of doing.
    Development - In the development phase as drivers mature, you should use a fixed set of user actions to measure the number of visible stutters (janks) in animations.
    Production - Once the device is ready for production and you want to compare against competitors, you should increase the workload until stutters increase. Determine if the current clock settings can keep up with the load. This can help you identify where you might be able to slow the clocks and reduce power use.


For the specification phase, Android offers the Flatland tool to help derive device capabilities. It can be found at: platform/frameworks/native/cmds/flatland/


Flatland relies upon fixed clocks and shows the throughput that can be achieved with composition-based workloads. It uses gralloc buffers to simulate multiple window scenarios, filling in the window with GL and then measuring the compositing. Please note, Flatland uses the synchronization framework to measure time, so you must support the synchronization framework to readily use Flatland.