OpenCL Code Builder Tutorial, Part 3

Source: Internet | Editor: 程序博客网 | Date: 2024/06/06 04:22

The images are missing from the article below. To see them, download the Word document from Baidu Netdisk:

https://pan.baidu.com/s/1sloNYyx

Intel® Code Builder for OpenCL™ API for Microsoft Visual Studio*

The Intel® SDK for OpenCL™ Applications for Windows* Offline Compiler plug-in for the Microsoft Visual Studio* IDE enables you to develop OpenCL applications with the Visual Studio IDE.

The plug-in supports the following features:

·        New project templates

·        New OpenCL code file (*.cl) template

·        Syntax highlighting

·        Types and functions auto-completion

·        Offline compilation and build of OpenCL kernels

·        LLVM code view

·        Assembly code view

·        Program IR generation

·        Selection of target OpenCL device - CPU or Intel Graphics

NOTE

To work with the plug-in features, create an OpenCL code project from a template or convert an existing project into an OpenCL code project.

See Also

·        Converting Existing Projects into OpenCL Code Projects

·        Creating and Building

·        Kernel Development Framework

·        Debugging OpenCL™ Kernels on CPU

·        Debugging OpenCL™ Kernels on GPU

·        API Debugging for GPU and CPU

·        Code Analyzer

Code Editing and Building

·        Configuring Microsoft Visual Studio* IDE

·        Converting Existing Project into OpenCL™ Project

·        OpenCL™ New Project Wizard

·        Building OpenCL™ Project

·        Using OpenCL™ Build Properties

·        Selecting Target OpenCL™ Device

·        Generating and Viewing Assembly Code

·        Generating and Viewing LLVM Code

·        Generating Intermediate Program Binaries in Visual Studio*

·        Configuring OpenCL™ Build Options

Configuring Microsoft Visual Studio* IDE

To configure the OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE, do the following:

1.    In the Visual Studio software, select Project > Properties.

2.    In the C/C++ > General property page, under Additional Include Directories, enter the full path to the directory where the OpenCL code header files are located:
$(INTELOCLSDKROOT)\include

3.    In the Linker > General property page, under Additional Library Directories, enter the full path to the directory where the OpenCL code run-time import library file is located. For example, for a 64-bit application:
$(INTELOCLSDKROOT)\lib\x64

4.    In the Linker > Input property page, under Additional Dependencies, enter the name of the OpenCL ICD import library file, OpenCL.lib.

Converting an Existing Project into an OpenCL™ Project

The OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables you to convert a standard C/C++ project to an OpenCL project and vice versa.

To convert your project, do the following:

1.    Right-click the project you want to convert in the Solution Explorer.

2.    In the project menu, click Convert to a project for OpenCL API.

OpenCL™ New Project Wizard

The OpenCL™ New Project wizard is a plug-in for Microsoft Visual Studio* software that enables developing Windows* and Android* OpenCL applications with the Visual Studio IDE, either from scratch (empty project) or based on template projects.

The wizard kit supports the following features:

·        Creating an Empty OpenCL™ Project for Windows

·        Create a New OpenCL™ Project from OpenCL Project Template for Windows*

·        Create a New OpenCL™ Project from OpenCL Project Template for Android*

Creating an Empty OpenCL™ Project for Windows

To create an empty OpenCL™ project for Microsoft Visual Studio* IDE, do the following:

1.    Go to File > New > Project...

2.    Select OpenCL templates from the Templates tree view.

3.    Select Empty OpenCL Project for Windows.

4.    Fill the Name, Location, and Solution name fields and click OK.

Create a New OpenCL™ Project from OpenCL Project Template for Windows*

To create an OpenCL™ template project for Windows platforms in Microsoft Visual Studio* IDE, do the following:

1.    Go to File > New > Project...

2.    Select OpenCL templates from the Templates tree view.

3.    Select Code Builder Project for Windows.

4.    Fill the Name, Location, and Solution name fields and click OK.

5.    In the Code Builder wizard for OpenCL API dialog, select the basic settings for the behavior of the OpenCL application and kernel. The parameters that can be set are platform name, device type, kernel type (images or buffer manipulation), build options, and local work group size behavior. Each field has a short tool-tip explanation.

6.    Click Finish to create the default template project, or click Next to open the Advanced Settings screen, where you can set advanced options such as whether to enable a profiling queue and the memory source type of the kernel's arguments. For the CPU device type, you can also set the out-of-order execution mode and debug mode for the kernel.

7.    Click Finish to create the default template project, or click Previous to return to the Basic Settings screen.

Building OpenCL™ Project

To build the solution using the OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE, click Build > Build Solution.

When building the solution, the Intel OpenCL compiler automatically builds the attached OpenCL kernels. See the build result in the Output build dialog of the Microsoft Visual Studio IDE.

Using OpenCL™ Build Properties

The OpenCL Build properties page in the Microsoft Visual Studio* IDE enables you to set compilation flags and change the target device when building an OpenCL kernel. To change the settings, do the following:

1.    Go to Project > Properties.

2.    Click the Intel SDK for OpenCL Applications entryunder the Configuration Properties group.

3.    Modify properties and click OK.

NOTE

The Intel® SDK for OpenCL™ Applications entry exists for OpenCL projects with *.cl source files attached. If the entry does not exist, convert an existing standard project into the OpenCL project.

See Also

Creating an Empty OpenCL™ Project

Converting Existing Project into OpenCL Project

Selecting Target OpenCL™ Device

The OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables you to choose the target device when building your OpenCL code:

·        Intel CPU

·        Intel® Graphics

·        Intel CPU on Experimental OpenCL 2.1 Platform

The default device is CPU.

To choose a target device, do the following:

1.    Go to Project > Properties.

2.    Click Configuration Properties > Intel SDK for OpenCL Applications > General.

3.    Change the Device option according to your needs.

4.    Click OK.

Generating and Viewing Assembly Code

The OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables generating an assembly representation of the OpenCL code. To enable generating and viewing the assembly code, do the following:

1.    Go to Project > Properties.

2.    Click Configuration Properties > OpenCL Code Builder > General.

3.    Modify the properties as needed, and click OK.

After the build, you can open the generated assembly file in the Visual Studio editor by double-clicking the message in the Output view.

Generating and Viewing LLVM Code

The OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables generating an LLVM representation of the OpenCL code. To enable generating and viewing LLVM code, do the following:

1.    Go to Project > Properties.

2.    Click Configuration Properties > Intel SDK for OpenCL Applications > General.

3.    Change the Generate LLVM Code option to Yes.

After the build, you can open the generated LLVM file in the Visual Studio editor by double-clicking the message in the Output view.

Generating Intermediate Program Binaries in Visual Studio*

The OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables generating program binaries of the OpenCL code.

An application can later use the generated program binaries to create a program from binaries (clCreateProgramWithBinary(...)). To generate intermediate program binaries, do the following:

1.    Go to Project > Properties.

2.    Click Configuration Properties > Intel SDK for OpenCL Applications > General.

3.    Change the Create Program Binary option to Yes.

4.    Click OK.
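
On the application side, the generated binary has to be read from disk before it can be passed to the OpenCL runtime. The following is a minimal Python sketch of that file-reading step only. The function name load_program_binary and the file name kernel.ir are hypothetical, and the OpenCL call itself is described in a comment rather than executed, because it requires an OpenCL binding that is not shown here.

```python
from pathlib import Path


def load_program_binary(path):
    """Read an intermediate program binary produced by the offline build.

    Returns (binary, length). For a single device these are the values that
    the C API clCreateProgramWithBinary takes in its `binaries` and
    `lengths` arrays.
    """
    binary = Path(path).read_bytes()
    if not binary:
        raise ValueError(f"{path} is empty; rebuild the project first")
    return binary, len(binary)


# Hypothetical usage (file name is an assumption, not a documented default):
#   binary, length = load_program_binary("kernel.ir")
# The pair would then be handed to clCreateProgramWithBinary, followed by
# clBuildProgram, through whichever OpenCL binding the application uses.
```

A binary produced for one device or platform is generally not usable on another, so the application should check the per-device status that the runtime reports when creating the program.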

Configuring OpenCL™ Build Options

The OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables configuring build options for the OpenCL code. To configure the build options, do the following:

1.    Go to Project > Properties.

2.    Click Configuration Properties > Intel SDK for OpenCL Applications > General.

3.    Add build options into the Additional build options line.

4.    Click OK.
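
The Additional build options line takes a single string of compiler options. As a rough illustration, the Python sketch below assembles such a string from macro definitions, include directories, and flags. The helper name build_options and the macro and directory names are hypothetical; -D, -I, and -cl-fast-relaxed-math are standard OpenCL compiler options.

```python
def build_options(defines=None, include_dirs=None, flags=()):
    """Assemble an OpenCL build options string.

    defines: mapping of macro name to value (None defines the macro
             without a value).
    include_dirs: directories passed with -I.
    flags: any further options, passed through unchanged.
    """
    parts = []
    for name, value in (defines or {}).items():
        parts.append(f"-D {name}" if value is None else f"-D {name}={value}")
    for directory in include_dirs or ():
        parts.append(f"-I {directory}")
    parts.extend(flags)
    return " ".join(parts)


# Hypothetical macro and directory names, for illustration only.
options = build_options(
    defines={"TILE": 16, "USE_IMAGES": None},
    include_dirs=["kernels/include"],
    flags=["-cl-fast-relaxed-math"],
)
print(options)
```

The sketch does not quote paths, so an include directory that contains spaces would need extra handling before the string is pasted into the property page.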

Kernel Development Framework

Kernel Development Framework is a native integrated development environment in the Microsoft Visual Studio* IDE that enables you to build and analyze OpenCL™ kernels.

The framework supports Intel® Architecture processors, Intel Processor Graphics, and remote development on Android* devices. The tool provides full offline OpenCL language compilation, which includes:

·        OpenCL syntax checker

·        Cross-platform compilation

·        Low Level Virtual Machine (LLVM) viewer

·        Assembly code viewer

·        Intermediate program binary generator

The feature also provides a way to assign input to the kernel, test the correctness, and analyze kernel performance based on group size, build options, and target device.

·        Kernel Development Framework Session

·        Building and Compiling OpenCL™ Program

·        Build Artifacts

·        Code Builder Build Toolbar

·        Analyzing the Kernel

·        Variable Management

Kernel Development Framework Session

Work in the Kernel Development Framework is managed through sessions. To create, build, or analyze an OpenCL™ kernel, you need to create a session.

A session contains:

·        A file with an OpenCL program

·        Build artifacts:

o   Generated LLVM code

o   Assembly code

o   Intermediate binary files

·        OpenCL kernels with assigned variables

·        Analysis reports

There are two methods for creating a session, depending on the file type that you are working on:

·        For a *.cl file, you need to create a session

·        For a *.exe file, you need to generate a session

See Also

Creating a New Session

Generate a Session

Creating Session from Existing OpenCL Code

Saving and Loading Sessions

Removing Sessions

Configuring Sessions

Code Builder Configuration Toolbar

Configuring Device Options

Configuring Build Options

Configuring General Options

Code Builder Build Toolbar

To show the toolbar, go to VIEW > Toolbars and make sure that the Code Builder Build option is checked.

You can use the Code Builder Build toolbar to perform basic operations on sessions such as:

·        Create new session

·        Open session

·        Save session

·        Build session

·        Compile session

·        Open session's settings dialog

·        Generate a new session from an existing OpenCL application

·        Generate C++ code from an existing session

Creating a Session

Creating a New Session

Creating a Session from Existing OpenCL Code

Generate a Session from an OpenCL Application

Creating a New Session

This topic explains how to create and build a session.

To create a new session:

1.    Go to CODE-BUILDER > OpenCL Kernel Development > New Session or click the New Session button in the toolbar.

2.    Specify the session name, the path to the folder that stores the session file, and the content of the session (either an empty session or one with pre-defined OpenCL code).

3.    Click Done.

Once the session is created, the new session appears in the Code Builder Session Explorer view.

If you don't see the Code Builder Session Explorer dialog, go to: CODE-BUILDER > OpenCL Kernel Development > Windows > Code Builder Session Explorer.

To build the session:

Right-click the session in the Code Builder Session Explorer and select Build Session.

After the build is complete, you can see the following new items:

·        The Build Artifacts folder in the session shows the related build artifacts that were created during the session build.

·        The Kernels folder in the session shows the kernel that ran during the session build.

·        The Code Builder Analysis Input pane appears with the Input configuration and the Kernel Arguments.

Creating a Session from Existing OpenCL Code

The Kernel Development Framework enables you to create a session from an existing project that contains OpenCL™ code files. If you have a project in Microsoft Visual Studio* that contains such file(s), you can do the following:

·        Right-click the OpenCL file and select Create Code Builder Session.

A new session is created and becomes available in the Code Builder Session Explorer dialog.

Generate a Session from an OpenCL Application

The Generate Session feature enables you to generate a KDF session from an existing working executable file of an application that uses OpenCL APIs. This enables you to rerun a single kernel.

To create a KDF Session:

1.    In Microsoft* Visual Studio*, go to CODE-BUILDER > OpenCL Kernel Development > Generate Session.

2.    Select an application and its working directory.

3.    Optionally, you can configure the following:

o   Add application arguments in the Arguments field.

o   Change the output directory of the created sessions in the Output Directory field.

o   Set the number of configurations generated per kernel in the Number of instance per NDRange field.
By default, this is set to 1. In this case, if there is more than one instance of NDRange with the same global and local workgroup sizes, only one configuration is created.

o   Load the created sessions to the KDF. By default, this checkbox is checked.

Session Generator Features

·        The Session Generator generates regular KDF sessions. It records any data needed to rerun the kernels: *.cl input file, build option, device selection (CPU/GPU), assigned arguments and their data, global/local group sizes, target architecture (32/64 bit).

·        The created variables are saved locally within the session.

·        Every use of the clBuildProgram API in the application creates a unique session file.

·        Programs that are not used in the application do not create a session file.

·        Every use of the clEnqueueNDRangeKernel API in the application creates a kernel configuration (see the Creating a New Analysis Configuration section).
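
The interaction between recorded NDRange launches and the Number of instance per NDRange setting can be pictured with a small model. The Python sketch below is only an illustration of the behavior described above, not the Session Generator's implementation; the function name, the tuple layout, and the kernel name in the example are all assumptions.

```python
from collections import defaultdict


def kernel_configurations(ndrange_calls, instances_per_ndrange=1):
    """Model how recorded launches collapse into kernel configurations.

    ndrange_calls: iterable of (kernel_name, global_size, local_size),
    one entry per clEnqueueNDRangeKernel call, in launch order.
    Launches of the same kernel with identical global and local sizes are
    kept only up to instances_per_ndrange times.
    """
    seen = defaultdict(int)
    kept = []
    for kernel, global_size, local_size in ndrange_calls:
        key = (kernel, tuple(global_size), tuple(local_size))
        if seen[key] < instances_per_ndrange:
            seen[key] += 1
            kept.append(key)
    return kept


# Hypothetical recording: the same launch twice, then a smaller one.
calls = [
    ("blur", (1024, 1024), (16, 16)),
    ("blur", (1024, 1024), (16, 16)),
    ("blur", (512, 512), (16, 16)),
]
print(len(kernel_configurations(calls)))     # default of 1 instance
print(len(kernel_configurations(calls, 2)))  # keep up to 2 instances
```

With the default of 1, the repeated 1024x1024 launch contributes a single configuration, so the example yields two configurations; raising the setting to 2 keeps all three launches.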

Building a Session

Building and Compiling OpenCL™ Program

Build Artifacts

Intel Graphics Disassembly Source Mapping

Building and Compiling OpenCL™ Program

To build an OpenCL™ program via the Kernel Development Framework feature of the OpenCL Code Builder, do the following:

1.    Select the session with the code that you would like to build.

2.    Go to CODE-BUILDER > OpenCL Kernel Development > Build Program, or click the Build Program button in the Code Builder - Build toolbar menu.

The build log appears in the Console Output dialog.

Build Artifacts

Once an OpenCL™ program build is completed, the build artifacts appear under the Build Artifacts node in the Code Builder Session Explorer. The list of artifacts includes:

·        Generated LLVM code (<file_name>.ll)

·        Generated assembly code for CPU only (<file_name>.asm)

·        Generated Intel Graphics disassembly code for each kernel (<file_name>_<kernel_name>.gen)

·        The program's intermediate binary (<file_name>.ir)

·        32-bit version of the generated SPIR LLVM code (<file_name>_x86.ll)

·        64-bit version of the generated SPIR LLVM code (<file_name>_x64.ll)

·        32-bit version of the SPIR binary (<file_name>_x86.spir)

·        64-bit version of the SPIR binary (<file_name>_x64.spir)

·        32-bit version of the generated SPIR-V code (<file_name>_x86.txtspirv)

·        64-bit version of the generated SPIR-V code (<file_name>_x64.txtspirv)

·        32-bit version of the SPIR-V binary (<file_name>_x86.spirv)

·        64-bit version of the SPIR-V binary (<file_name>_x64.spirv)

All build artifacts are stored in the session's folder. You can double-click the LLVM or assembly code to see its content in the IDE's editor. You can open the containing folder by right-clicking one of the files and selecting Open Containing Folder.
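
The naming scheme in the list above is regular enough to compute. The Python sketch below derives the expected artifact file names for a given source file. It assumes that every listed artifact type is actually emitted and that the device choice only switches between the CPU assembly file and the per-kernel Intel Graphics files; the function name and the "cpu"/"gpu" labels are hypothetical.

```python
def expected_artifacts(file_name, kernel_names, device="cpu"):
    """List artifact file names following the naming scheme described above.

    file_name: source file name without the .cl extension.
    kernel_names: kernels defined in the program (used for .gen files).
    device: "cpu" or "gpu" (labels chosen for this sketch).
    """
    names = [f"{file_name}.ll", f"{file_name}.ir"]
    if device == "cpu":
        names.append(f"{file_name}.asm")  # assembly is CPU only
    elif device == "gpu":
        # one Intel Graphics disassembly file per kernel
        names.extend(f"{file_name}_{kernel}.gen" for kernel in kernel_names)
    else:
        raise ValueError(f"unknown device: {device!r}")
    for arch in ("x86", "x64"):
        for extension in ("ll", "spir", "txtspirv", "spirv"):
            names.append(f"{file_name}_{arch}.{extension}")
    return names


# Hypothetical program "blur.cl" with two kernels.
for name in expected_artifacts("blur", ["sobel", "gauss"], device="gpu"):
    print(name)
```

A helper like this can be useful in a script that checks a session folder after a build, but the set of files present in practice depends on the session's device and build settings.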

Intel Graphics Disassembly Source Mapping

NOTE: This feature is supported only for Intel 64 architecture builds.

Once an OpenCL™ program build for GPU is completed, Intel Graphics disassembly code is generated for each kernel.

An artifact <file_name>_<kernel_name>.gen is added to the build artifacts of the session for each of the kernels built.

Double-click an artifact to open the source mapping view for the specific kernel.

In the view, click a source line to highlight the correlating Intel Graphics disassembly lines.

The following screen capture shows a selected source line with the related disassembly lines highlighted:

See Also

For more information on the architectural behavior and programming environment of chipset and graphics devices, see the Intel® Graphics Programmers Reference Manual (PRM).

Configuring a Session

Code Builder Configuration Toolbar

Configuring Sessions

Configuring Device Options

Configuring Build Options

Configuring General Options

Code Builder Configuration Toolbar

You can control some of the session options through the Code Builder Config toolbar.

To show the toolbar in Visual Studio*, go to VIEW > Toolbars and make sure that Code Builder Config is checked.

The Code Builder Configuration toolbar enables you to:

·        Select the target machine to work on: local machine or a connected Android device

·        Select the target OpenCL™ platform (currently only Intel and PowerVR platforms are supported)

·        Select the target OpenCL device

·        Select the target platform architecture

·        Show the platform information dialog

Configuring Sessions

A configuration is a set of analysis inputs: assigned variables, number of iterations, global sizes and local sizes of a specific kernel, and so on. You can create a separate configuration for each set of inputs that you want to analyze.

To configure the session, right-click the session in the Code Builder Session Explorer and select Session Options...

The Session Configuration dialog box enables you to define:

·        Target device to perform build or analysis operations

·        Build options

·        Target platform architecture

Configuring Device Options

Open the Session Options menu by selecting CODE-BUILDER > OpenCL Kernel Development > Session Options.

The Device Options tab provides several configuration options.

The Target Machine group box enables selecting the target machine:

·        Local Machine

·        Remote Android Machine

To use the Remote Machine option, you need to:

1.    Connect an Android* device with an Intel processor or an emulator based on an IA-32 System Image.

2.    Copy the OpenCL runtime to the Android device or emulator. See section Installing OpenCL™ Runtime on Emulator.

3.    In Visual Studio*, click Setup to copy OpenCL tools to the device.

NOTE

You need to use the Setup option each time you start an emulator device.

The OpenCL Device group box enables selecting the target platform and device for the selected machine:

·        Intel CPU

·        Intel(R) Graphics

·        Intel CPU on Experimental OpenCL 2.1 Platform

The Network Settings group box enables configuring the network port range.

Configuring Build Options

Open the Session Options menu by selecting Code-Builder > OpenCL Kernel Development > Session Options.

The Build Options tab provides several configuration options.

Enter preprocessor and compiler options in the OpenCL Options text box, or click the ... button to configure the settings in the OpenCL Compiler Options dialog box. The options that you set are added to the Options String text box at the bottom of the dialog.

The Target Build Architecture group box enables:

·        Using the current platform architecture.

·        Configuring the build architecture manually by unchecking the Use current platform architecture check box, and selecting:

o   The Target operating system:

§  Current Operating System

§  Android Operating System

o   The Target instruction set:

§  Streaming SIMD Extension 4.2 (SSE4.2)

§  Advanced Vector Extension (AVX)

§  Advanced Vector Extension 2 (AVX2)

Changing the Target Build Architecture options enables viewing assembly code of different instruction set architectures and generating program binaries for different hardware platforms.

NOTE

Target Build Architecture options are available for the CPU device only.

Configuring General Options

Open the Session Options menu via selecting CODE-BUILDER

The General Options tab enables defining the target session's platform architecture (x86 or x64).

The Network Settings group box enables configuring the network port range to be used to connect to the Code Builder server.

Saving, Loading, and Exporting Sessions

To save your session, go to Code-Builder > OpenCL Kernel Development > Save Session, or click the Save Session button in the Code Builder Build toolbar menu.

New sessions are saved under the New Session Default Directory defined in the Kernel Development Framework's Settings. See Configuring Sessions for information on how to change these settings.

To load a saved session, do the following:

1.    Go to Code-Builder > OpenCL Kernel Development > Load Session, or click the Load Session button in the Code Builder - Build toolbar.

2.    Select the session to load in the Open File dialog and click Open.

To export a saved session, copy the session folder to the new location, on the current system or on another system, and load it as usual.

Removing Sessions

To remove a session from the Code Builder Session Explorer dialog, right-click the session that you want to remove and select Remove Session (Keep local files).

Variable Management

You can manage variables in Kernel Development Framework via the Code Builder Variable Management pane. To open the pane, go to:

In Visual Studio*: CODE-BUILDER > OpenCL Kernel Development > Windows > Variable Management

Or, click the Variable Management button in the Code Builder Analysis toolbar.

Supported Variable Types

These are the supported variable types:

·        Buffer

·        Image, in one of the following formats:

o   RGB - BGRA

o   YUV (NV12, NV21, YV12)

·        Sampler

·        Pipe

Creating Buffer Variables

To create a new buffer variable:

1.    Open the variable management pane. Go to Code Builder > OpenCL Kernel Development > Windows > Variable Management.

2.    Click the Add button in the Code Builder Variable Management pane and select Buffer.
The Buffer Variable dialog box opens.

3.    Fill out the dialog box entries:

Fields and their options:

·        Name: Enter a meaningful name, or leave the default.

·        IO Mode: Use the drop-down to select Input, Output, or InOut.

·        Number of Elements: Enter the number of elements, or leave the default value of 1.

·        Memory flags to use: Use the drop-down to select a memory flag, or leave the default CL_MEM_USE_HOST_PTR.

·        Use as SVM: Check to use the buffer as SVM on systems that support SVM.

·        Initialization options: For the Input or InOut IO Mode, you need to set an initialization option. To select an input .csv or binary file as the initialization option, click the browse (...) button. You can also select Use random values or Zero the entire buffer.

o   When using CSV files, each line represents one OpenCL data type (like int4, float16, and so on), with a value in each column to satisfy the type size. For example, for a long8, at least eight columns of long numbers should exist in each line. The size of the buffer is used as the number of lines to read from the CSV file. The CSV file may hold more columns or lines than needed for a specific buffer, but not fewer.

o   When using binary files, the content should be a concatenation of the OpenCL data type, and as with CSV files, the file may hold more data than indicated by the Size argument.

 

NOTE

Output buffers do not need a value assigned to them. If a value is assigned, it is ignored.
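
To make the two initialization file layouts concrete, here is a minimal Python sketch that writes the same eight int4 elements once as CSV (one element per line, four columns) and once as a binary concatenation. The file names are hypothetical, and the binary layout assumes little-endian 32-bit signed integers, which matches typical x86 hosts but is an assumption rather than something the text above specifies.

```python
import csv
import struct


def write_int4_csv(path, elements):
    """Write one int4 element per line as four comma-separated columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for element in elements:
            if len(element) != 4:
                raise ValueError("an int4 element needs exactly four values")
            writer.writerow(element)


def write_int4_binary(path, elements):
    """Write the elements back to back as little-endian 32-bit integers."""
    with open(path, "wb") as handle:
        for element in elements:
            handle.write(struct.pack("<4i", *element))


# Eight int4 elements: (0,1,2,3), (4,5,6,7), ...
elements = [(i, i + 1, i + 2, i + 3) for i in range(0, 32, 4)]
write_int4_csv("input_int4.csv", elements)      # hypothetical file name
write_int4_binary("input_int4.bin", elements)   # hypothetical file name
```

With Number of Elements set to 8, either file covers the buffer exactly; a larger file would also be accepted according to the rules above, while a shorter one would not.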

Creating Image Variables

To create a new image variable, open the variable management dialog. Go to Code Builder > OpenCL Kernel Development > Windows > Variable Management.

Create Image Variables in Bitmap Format

1.    Click the Add button in the Code Builder Variable Management dialog and choose Image in the opened context menu.

2.    In the Image Variable context menu, enter the input image variable:

a.    Enter a meaningful name, such as bmp_input1.

b.    For IO Mode, select Input from the drop-down.

c.    In the Source field, browse to or enter the path to the input bitmap file.

d.    Enter the appropriate size, type, channel order, and other parameters for your input image.

e.    Click Done to create the input variable.

3.    Click the Add button in the Code Builder Variable Management dialog and choose Image in the opened context menu.

4.    In the Image Variable context menu, enter the output image variable:

a.    Enter a meaningful name, such as bmp_output1.

b.    For IO Mode, select Output from the drop-down. The output image variable does not take a source file.

c.    Enter the appropriate size, type, channel order, and other parameters for your output image.

d.    Click Done to create the output variable.

Create output images with the correct size, type, channel order, and so on.

 

Clear the Get output image data checkbox to disable reading back the output buffer or image. This is useful when you try more than one combination of global or local work sizes and there is no need to read the same output for all the combinations.

Create Image Variables in YUV Format

1.    In the IO mode, select Input.

2.    Fill out the image variable dialog box fields for the input image:

a.    For the Source field, browse to select your YUV image file.

b.    For Source Format, select the appropriate YUV format from the drop-down.

c.    For width and height, enter the appropriate values for the image file.

d.    For the rest of the fields, you can use the defaults, or revise them to use one of the valid options.

3.    Click Done to create the input images.
The number of created images depends on the YUV format of the image. For example, for the NV12 format, two new images are created.

4.    Click Add and choose Image to create the output image variable. This variable does not require a source file.

5.    Click Done to create the output images.

The YUV planes are created. You can now run the kernel with the image.
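
Why the number of images depends on the format follows from how each 4:2:0 layout stores its planes: NV12 and NV21 keep a full-resolution luma plane plus one interleaved half-resolution chroma plane, while YV12 keeps luma plus two separate half-resolution chroma planes. The Python sketch below computes the plane shapes for 8-bit samples. The function names are hypothetical, and treating each plane as one created image is an assumption extrapolated from the NV12 example above.

```python
def yuv_planes(fmt, width, height):
    """Return (plane_name, width, height, channels) for a 4:2:0 frame."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 formats need an even width and height")
    half_w, half_h = width // 2, height // 2
    luma = ("Y", width, height, 1)
    if fmt == "NV12":   # Y plane, then interleaved U/V samples
        return [luma, ("UV", half_w, half_h, 2)]
    if fmt == "NV21":   # Y plane, then interleaved V/U samples
        return [luma, ("VU", half_w, half_h, 2)]
    if fmt == "YV12":   # Y plane, then separate V and U planes
        return [luma, ("V", half_w, half_h, 1), ("U", half_w, half_h, 1)]
    raise ValueError(f"unsupported format: {fmt!r}")


def frame_bytes(fmt, width, height):
    """Total size in bytes of one frame with 8-bit samples."""
    return sum(w * h * c for _, w, h, c in yuv_planes(fmt, width, height))


for fmt in ("NV12", "NV21", "YV12"):
    planes = yuv_planes(fmt, 1920, 1080)
    print(fmt, len(planes), "planes,", frame_bytes(fmt, 1920, 1080), "bytes")
```

Comparing frame_bytes against the size of the source file is a quick way to confirm that the width, height, and format entered in the dialog match the file.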

Creating Sampler Variables

To create a new sampler variable:

1.    Open the variable management dialog.

2.    Click the Add button in the Code Builder Variable Management dialog and choose Sampler in the opened context menu.

Creating Pipe Variables

To create a new pipe variable:

1.    Open the variable management dialog.

2.    Click the Add button in the Code Builder Variable Management dialog and choose Pipe in the opened context menu.

Selecting Memory Options

You can change memory options of buffers or images using Kernel Development Framework. Refer to the relevant sections of this guide for guidelines on creating or editing variables.

NOTE

You are not limited in selecting options. Avoid selecting the option combinations that are forbidden by the OpenCL 1.2 specification, otherwise you may encounter errors upon analysis.

To choose buffer and image memory options, do the following:

1.    Open the variable properties by right-clicking an image or buffer variable in the Code Builder Variable Management window and selecting Edit Variable.

2.    Open the combo box next to Memory flags to use.

3.    Select the appropriate options and click Done.

Editing the Variables

To edit the variables in the system using the Kernel Development Framework, do the following:

1.    Open the Code Builder Variable management window.

2.    Right-click a variable name.

3.    Click Edit Variable.

4.    Change the desired properties and click Done.

Viewing Contents of the Variables

To view buffer or image contents when using the Kernel Development Framework, do the following:

1.    Open the Code Builder Variable management window.

2.    Right-click the buffer or image name you want to view.

3.    Click View Variable.

Copying Variables

To create a copy of a buffer, image, or sampler variable when using the Kernel Development Framework, do the following:

1.    Open the Code Builder Variable management window.

2.    Right-click the buffer, image, or sampler name you want to copy.

3.    Click Copy Variable.

Removing Variables

To delete variables when using the Kernel Development Framework, do the following:

1.    Open the Code Builder Variable management window.

2.    Right-click a variable name.

3.    Click Delete variable or Delete all variables.

You can delete all buffers, images, samplers, or pipes by right-clicking the corresponding node (Buffers, Images, Samplers, or Pipes respectively).

Executing a Kernel

To execute the kernel:

1.    Select the session you are interested in from the Code Builder Session Explorer.
Tip: If you don't see the Code Builder Session Explorer dialog, go to: Code-Builder > OpenCL Kernel Development > Windows > Code Builder Session Explorer.

2.    Optionally, select a configuration from the Code Builder Analysis Input pane:

a.    Select the desired configuration from the down button next to the Configuration ID field, or from the Configurations folder in the Code Builder Session Explorer.

b.    To create a new configuration based on the selected configuration, click the New Configuration button.

3.    Select the kernel that you want to execute from the Select Kernel combo box in the Code Builder Analyze toolbar and click the Run button to execute the selected kernel with the selected configuration.

Code Builder Analysis Toolbar

Kernel Execution Input

Viewing Kernel Execution Results

Output Validation

Running a Session With a YUV Image Variable

Code Builder Analysis Toolbar

To show the toolbar, go to VIEW > Toolbars and make sure that the Code Builder Analyze option is checked.

You can use the Code Builder Analyze toolbar to execute a kernel and to perform several analysis operations:

·        Select the OpenCL kernel to execute or analyze

·        Start kernel execution

·        Start kernel analysis

·        Open Code Builder Analysis Input window

·        Open Code Builder Variable Management window

Kernel Execution Input

To assign analysis inputs for kernel execution, do thefollowing:

1.    Select the desirable kernel from the session's kernels list inthe Code Builder Session Explorer or from the SelectKernel combo box in the Code Builder Analyze toolbar.

2.    Open the Code Builder Analysis Input windowfrom:

In Visual Studio*: Code-Builder > OpenCLKernel Development > Windows > AnalysisInput

or by clicking the OpenAnalysis Input button   in the Code Builder Analysis toolbar.

3.    Assign a variable for each kernel argument in the KernelArguments table by clicking the Click here to assign linkunder the Assigned Variable column. You can assignone-dimensional variables (such as integerfloatcharhalf, and so on) on-the-fly by typing singlevalues into the table. See VariableManagement for details.

4.     You canassign a reference for each output variable (buffers or images) that you wantto verify. If you specify a validation reference, after the kernel executioneach one of the output variables will be compared with its assigned referencein order to validate the correctness of the kernel executions.

5.     Set thenumber of iterations, global size and local sizes per workload dimension inthe Workgroup size definitions group box.

You can use the local size(s) text boxes forseveral different test configurations:

·        Set single size value for a single test.

·        Add several comma-separated sizes for multiple tests.

·        Set 0 to utilize the default framework-assigned local size.

·        Check the Auto box to enable the Kernel Builderto iterate on all sizes that are smaller than global size and device maximumlocal size.

Also consider the following:

·        Each option is available for each dimension.

·        To analyze the kernel in its designed conditions, set a single value.

·        To find the local size that provides the best performance, click Auto or set a list of comma-separated values.

·        To improve the analysis accuracy, run each global and local work size combination several times by increasing the Number of iterations value. Several iterations minimize the impact of other system processes or tasks on the kernel execution time.

·        Use the Device Information dialog to compare device properties and choose the appropriate device for the kernel.
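
The Auto option can be pictured as a sweep over candidate local sizes for one dimension. The sketch below is only an illustration, not the tool's implementation: the function name candidate_local_sizes is hypothetical, and the rule that a local size must divide the global size evenly is an assumption borrowed from OpenCL 1.x.

```c
#include <stddef.h>

/*
 * Collect candidate local sizes for one dimension: every value from 1 up to
 * min(global_size, max_local_size) that divides global_size evenly.
 * The divisibility filter is an assumption (OpenCL 1.x rule), not something
 * the tool documentation states.
 * Returns the number of candidates written to out (at most out_cap).
 */
size_t candidate_local_sizes(size_t global_size, size_t max_local_size,
                             size_t *out, size_t out_cap)
{
    size_t limit = global_size < max_local_size ? global_size : max_local_size;
    size_t count = 0;

    for (size_t ls = 1; ls <= limit && count < out_cap; ++ls) {
        if (global_size % ls == 0) {
            out[count++] = ls;
        }
    }
    return count;
}
```

Each candidate would then be timed over the configured number of iterations, which is why a large global size combined with Auto can make an analysis run noticeably longer.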

Creating a New Analysis Configuration

Follow these steps to create a new configuration for an existing built session:

1.    In the Code Builder Analysis Input pane, assign a variable to each kernel argument that is listed. Click the link in the Assigned Variable column to assign the appropriate variable.

2.    Enter a valid name to enable the New Configuration button. Click the button to duplicate the current configuration. You can edit the new configuration to create multiple analysis configurations.

Viewing Kernel Execution Results

Once the kernel execution is completed, a new Run Results page is generated.

The Run Results report contains the following sections:

·        Execution Overview - provides information about the executed kernel, execution time and output validation results.

·        Output Validation - lists all the output variables that have a reference assigned. If the validation fails, you can click the variable name to open the image viewer or the buffer viewer and compare the content of the output variable with the assigned reference.

·        Kernel variables - lists all the kernel variables. You can click the variable name to open the image viewer or the buffer viewer and see the content of the variable.

Output Validation

The Kernel Development Framework enables you to set a reference point for the output variable of your OpenCL* kernel. The Kernel Development Framework validates the output variable after each kernel execution. The validation results appear in the Run report.

You can compare the output variable with its reference in the Image and Buffer viewers.

To validate output variables:

1.    In the Assigned Variables view, click the Validation Reference column and choose a validation reference.

2.    Check the Enable output validation checkbox and run the kernel.

3.    For each validation reference you set in step 1, you can see the results of the validation in the CodeBuilder Run Results window, in the Output Validation section.

4.    For a Validation failed result, click the Validation failed link to view the output variable side-by-side with the reference variable.

5.    Click the Next/Prev buttons to find the exact pixels and see their values.
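
The kind of check that output validation performs can be approximated on the host as an element-by-element comparison. This is a minimal sketch under assumptions: the name first_mismatch is hypothetical, the buffers are assumed to hold floats, and the tolerance parameter is an illustration only, since the text does not say how the tool compares values.

```c
#include <stddef.h>

/*
 * Compare an output buffer with its reference, element by element.
 * Returns the index of the first element whose absolute difference exceeds
 * tolerance, or -1 when all n elements match.
 * The tolerance is an assumption; the tool's comparison rule is not
 * described in the text.
 */
long first_mismatch(const float *output, const float *reference,
                    size_t n, float tolerance)
{
    for (size_t i = 0; i < n; ++i) {
        float diff = output[i] - reference[i];
        if (diff < 0.0f) {
            diff = -diff;
        }
        if (diff > tolerance) {
            return (long)i;
        }
    }
    return -1;
}
```

The returned index plays the same role as the Next/Prev buttons in the viewer: it points at the first element worth inspecting.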

Running a Session With a YUV Image Variable

After you have created a YUV format image variable, you can run an analysis session with the image variable as one of the kernel arguments:

1.    Create a session with the relevant kernel and build the session.

2.    In the OpenCL Kernel Analysis Input pane, assign the image variables to the kernel arguments.

3.    Select OpenCL Kernel Development > Run Analysis.

4.    In the report that is generated, from the Select Report drop-down, select Variables View.

5.    Right-click one of the links and select Open in YUV Combiner.
The YUV Planes Combiner dialog box appears.

6.    Fill in the UV Plane, height and width fields and click create YUV image.

Analyzing Kernel Performance

To start running an analysis:

1.    Select the session you are interested in from the Code Builder Session Explorer.
If you don't see the Code Builder Session Explorer dialog, go to: Code-Builder > OpenCL Kernel Development > Windows > Code Builder Session Explorer.

2.    Optionally, select a configuration from the Code Builder Analysis Input pane:

a.    Select the desired configuration from the down button next to the Configuration ID field, or from the Configurations folder in the Code Builder Session Explorer.
You can also click the New configuration button to create a new configuration based on the selected configuration.

3.    Select the kernel that you want to analyze from the Select Kernel combo box in the Code Builder Analyze toolbar and click the Analyze button to start analyzing the selected kernel with the selected configuration.

Alternatively, right-click the session in the Code Builder Session Explorer and select Analyze to run the analysis with the selected configuration.

Viewing the Analysis Results

Once analysis is completed, several reports are generated. A new report is generated for each analysis run.

The reports are available under the Reports node in the Code Builder Session Explorer window.

The analysis reports are HTML files, saved by default in the session directory, and can be viewed in any modern web browser.

Each report contains several views:

·        Session Info - provides information about the application that was analyzed and the command line to run the analysis. Also provides the kernel code for viewing.

·        Execution Analysis - provides information on execution time statistics and on the best and worst configurations.

·        Variables View - provides information on the read and read back time of the memory objects used in the kernel and allows you to see their content.

You can toggle between the views through the Report Selection combo box located at the top of the report's layout.

Session Info

This pane provides information about the analysis and the command line to run the analysis.

You can also click the Kernel Code link at the top of the view to view the kernel source code.

Execution View

The top part of the Execution View shows the best and the worst of the tested global and local size configurations, based on median execution time. If only one configuration exists, the result appears in both result windows.

The table below shows statistical analysis results for all configurations. The statistics consist of the following iteration execution time values for the selected configuration:

·        Median

·        Average

·        Standard deviation

·        Maximum

·        Minimum

Expanding a row in the table shows the total run time and the breakdown into queue, submit and execute times per iteration for the given configuration.
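
The statistics listed above can be reproduced from raw iteration times with a few lines of C. This is a sketch under stated assumptions, not the tool's code: the names TimeStats and summarize_times are hypothetical, and the variance is the population variance (the standard deviation is its square root), which may differ from the formula the tool uses.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    double median;
    double average;
    double variance;   /* population variance; standard deviation = sqrt(variance) */
    double minimum;
    double maximum;
} TimeStats;

/* qsort comparator for doubles in ascending order. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Summarize n iteration times. The array is sorted in place.
 * Returns 0 on success, -1 on invalid input (NULL pointers or n == 0).
 */
int summarize_times(double *times, size_t n, TimeStats *stats)
{
    if (times == NULL || stats == NULL || n == 0) {
        return -1;
    }

    qsort(times, n, sizeof times[0], compare_doubles);

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += times[i];
    }
    double mean = sum / (double)n;

    double squared_dev = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = times[i] - mean;
        squared_dev += d * d;
    }

    stats->minimum = times[0];
    stats->maximum = times[n - 1];
    stats->average = mean;
    stats->variance = squared_dev / (double)n;
    /* Median: middle element, or the mean of the two middle elements. */
    stats->median = (n % 2 == 1)
        ? times[n / 2]
        : 0.5 * (times[n / 2 - 1] + times[n / 2]);
    return 0;
}
```

Ranking configurations by median rather than by average, as the report does, makes the result less sensitive to a single slow iteration caused by other system activity.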

The Execution View provides the following features:

·        The best configuration in terms of median execution time is marked in blue. Click the blue summary line at the top of the view to open the results.

·        Click the [...] link under the Variables column to view the list of the kernel's input/output variables.

·        Click a variable name in the list to view the variable content.

GPU Kernel Analysis View

For analysis on a GPU device, you can perform Kernel Analysis on any configuration in the report. Click the Kernel Analysis node to see the various configurations.

Click a configuration to run Kernel Analysis on it.

Generating C++ Host Code From a Session

The Kernel Development Framework can generate C++ code from your Kernel session.

To generate the C++ code in Visual Studio*:

1.    Go to Code-Builder > OpenCL Kernel Development > Generate Host Code Generator.

2.    In the OpenCL CodeBuilder Host Code Generator dialog box that appears, select the desired name and location for the generated C++ code and related files, and click Done to generate the files.

You can now use the generated C++ code to edit the kernel and rerun a session with the new kernel.

Debugging OpenCL™ Kernels on CPU

The OpenCL™ Debugger plug-in for Microsoft Visual Studio* IDE enables debugging OpenCL kernels using the Microsoft Visual Studio software debugger GUI. The Debugger enables debugging host code and OpenCL kernels in a single Microsoft Visual Studio debug session.

The Debugger supports existing Microsoft Visual Studio debugging windows such as:

·        Breakpoints

·        Memory view

·        Watch variables - including OpenCL types like float4, int4, and so on

·        Call stack

·        Auto and local variables views

NOTE

Debugging is available only for the CPU device. If the code should run on Intel Graphics, debug on the CPU device during the development phase, then change the target device.

See Also

For debugger limitations and known issues, refer to the Code Builder Release Notes at https://software.intel.com/en-us/intel-opencl-support.

Enabling Debugging in OpenCL™ Runtime

Configuring the Debugger

Troubleshooting the Debugger

Enabling Debugging in OpenCL™ Runtime

To enable debugging mode in the Intel OpenCL runtime for compiling OpenCL code with the OpenCL™ Debugger plug-in for Microsoft Visual Studio* IDE, do the following:

1.    Add the -g flag to the build options string parameter of the clBuildProgram function.

2.    Specify the full path to the source file (including the CL file name) in the build options string parameter of the clBuildProgram function:

-s <full path to the OpenCL source file>

If the path includes spaces, enclose the entire path in double quotes.

For example:

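/* g_program: the cl_program created from the OpenCL source file */
err = clBuildProgram(
          g_program,
          0,                                          /* num_devices */
          NULL,                                       /* device_list */
          "-g -s \"<path_to_opencl_source_file>\"",   /* debug build options */
          NULL,                                       /* pfn_notify */
          NULL);                                      /* user_data */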

NOTE

Relative path to the CL file is not supported.
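
When the path to the CL file is only known at run time, the options string can be assembled before calling clBuildProgram. The helper below is a sketch, and its name make_debug_build_options is hypothetical; it always wraps the path in double quotes, which is required only when the path contains spaces, and it does not check that the path is absolute.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format the debug build options  -g -s "<path>"  into out.
 * The path is always quoted, so paths with spaces are handled too.
 * Returns 0 on success, -1 if an argument is NULL or the buffer is too small.
 */
int make_debug_build_options(char *out, size_t out_size, const char *cl_full_path)
{
    if (out == NULL || cl_full_path == NULL || out_size == 0) {
        return -1;
    }
    int written = snprintf(out, out_size, "-g -s \"%s\"", cl_full_path);
    if (written < 0 || (size_t)written >= out_size) {
        return -1;   /* output was truncated or formatting failed */
    }
    return 0;
}
```

The resulting string is what you would pass as the fourth argument of clBuildProgram in the example above.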

According to the OpenCL standard, work-items execute OpenCL kernels simultaneously. The Debugger requires setting the global ID of the work-item to debug before the debugging session starts. The Debugger stops on breakpoints in OpenCL code only when the pre-set work-item reaches them.

NOTE

To work with the OpenCL™ Debugger plug-in for Microsoft Visual Studio* IDE, the OpenCL kernel code must exist in a text file, separate from the code of the host. Debugging OpenCL code that appears only in a string embedded in the host application is not supported. Create your OpenCL project with the OpenCL Offline Compiler plug-in for Microsoft Visual Studio* to get seamless integration with the Debugger.

Configuring and Running the OpenCL™ Debugger Plug-in

After you have enabled debugging in the OpenCL™ Runtime, you can start using the OpenCL Debugger plug-in for the Visual Studio* IDE:

1.    Start the Microsoft Visual Studio* IDE.

2.    Go to Code Builder > OpenCL Debugger > Options > Kernel Debugger.

3.    Check the box relating to the device to debug:

a.    To debug the CPU device, check the Enable OpenCL Kernel Debugging for CPU device checkbox, and fill out the appropriate work items and port.

b.    To debug the GPU device, check the Enable OpenCL Kernel Debugging for GPU device checkbox, and fill out the appropriate server, port and log file.

4.    Insert breakpoints in the application in different OpenCL kernel calls, and then start debugging with F5.

NOTE

If NDRange is not 3D, leave unused dimension values at 0.

Debugging APIs for GPU and CPU

·        OpenCL™ API Debugger

·        Enabling the API Debugger

·        Trace View

·        Objects Tree View

·        Properties View

·        Command Queue View

·        Problems View

·        Image View

·        Data View

·        Memory Tracing

Configuring the API Debugger

You can configure several features of the API Debugger. Go to CODE BUILDER > OpenCL Debugger > Options to open the CodeBuilder - Debugger Configuration dialog box, and select the API Debugger tab:

·        To configure the number of states to save per memory object (Images, Buffers and SubBuffers): in the Number of previous states field, set the appropriate number of states.

·        To enable raw data tracing: check the Enable Raw Data Tracing box. Note that this feature may cause performance degradation.

·        To dump image bitmaps: check the Enable Image Bitmap Tracing box. Note that this feature may cause performance degradation.

·        To enable automatic memory tracing: in the Other settings group, check the Auto-generate session box. This option creates a separate directory for each captured session of the plug-in. The directory is stored under the Output Folder specified in the same window.

·        To configure the data buffering size: in the Data buffering size (bytes) field, enter the appropriate number of bytes for the data buffer. Note that when debugging memory-bound applications, you may need to reduce the size of the buffer.

·        To set a custom port for API debugging: check the Use Custom Port box and set the appropriate port number in the text field.

Automatic memory tracing can be enabled via CODE BUILDER > OpenCL Debugger > Options: open the CodeBuilder - Debugger Configuration dialog box, select the API Debugger tab, and check Other settings > Auto-generate session.

NOTES

·        With the API Debugger enabled, profiling may show degraded performance, whether it is measured with the clGetEventProfilingInfo runtime API call or with any other method for measuring execution time or occupied host memory.

·        To get more accurate profiling results, use the runtime directly via Ctrl+F5 (Start w/o Debugging), or disable the API Debugger in the plug-in configuration menu.

·        Only 2D images are supported for viewing, that is, memory objects that contain CL_MEM_OBJECT_IMAGE2D in the image_type field of their descriptor (cl_image_desc).

·        Images above 2GB are not supported and are not displayed.

·        The bitmaps shown in the Image View are merely an 8-bit RGBA approximation of the underlying pixel array of the associated images.
Behind the scenes, the plug-in does a linear color conversion from the input range of the pixels (which can be any type supported by the OpenCL runtime, for example, CL_SNORM_INT8, CL_UNSIGNED_INT16, and so on) to the [0..255] range.
As a result, the presented colors might not accurately represent the bitmap as expected.
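
A linear conversion to the [0..255] range can be written in a few lines, which also shows why precision is lost for wide input types. This is a sketch of the general technique, not the plug-in's code: the name to_display_channel is hypothetical, and the clamping and round-to-nearest behavior are assumptions.

```c
#include <stdint.h>

/*
 * Map value from the input range [range_min, range_max] to [0, 255] with a
 * linear conversion. Out-of-range inputs are clamped (an assumption).
 * A degenerate range (range_max <= range_min) maps everything to 0.
 */
uint8_t to_display_channel(double value, double range_min, double range_max)
{
    if (range_max <= range_min) {
        return 0;
    }
    double t = (value - range_min) / (range_max - range_min);
    if (t < 0.0) {
        t = 0.0;
    }
    if (t > 1.0) {
        t = 1.0;
    }
    return (uint8_t)(t * 255.0 + 0.5);   /* round to nearest */
}
```

For a 16-bit channel, for example, 65536 input levels collapse into 256 display levels, so pixels that differ in the buffer can look identical in the bitmap.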

OpenCL™ API Debugger

The interface of the Microsoft Visual Studio* IDE provides standard debugging capabilities for the host side of OpenCL™ applications, while the OpenCL Debugger plug-in of the Code Builder enables debugging OpenCL kernels. Debugging the OpenCL kernel and the host application at the same time can be complicated at various stages. The API Debugging feature of the Code Builder - Debugger plug-in for Microsoft Visual Studio bridges the two.

The API Debugging feature enables monitoring and understanding the OpenCL environment of an application throughout its execution.

The feature supports the following:

·        API Tracing - lists a trace of all OpenCL API calls that occurred during the execution, with their arguments, return values, and execution time.

·        OpenCL Objects View - shows all OpenCL objects that exist in memory during the execution.

·        Properties View - retrieves common information for each OpenCL object.

·        Command-Queue View - tracks the execution status of the enqueued commands.

·        Problems View - summarizes all error and warning messages.

·        Image View - visually displays all 2D image objects as bitmaps.

·        Data View - visually displays buffer data and 2D image pixel arrays on a grid.

·        Save/Load session - enables capturing a state/snapshot of all views of the plug-in, saving it on disk, and loading stored sessions.

·        Memory Tracing - enables storing the content of OpenCL Images and Buffers, and visually examining, in either Bitmap or Grid view, the data associated with a memory object throughout the API calls that affected it.

NOTE

Concurrent debugging sessions are not supported with the OpenCL API Debugger. This includes attaching the debugger to more than one process, or opening multiple instances of Visual Studio and debugging processes concurrently.

See Also

Trace View
Objects Tree View
Properties View
Command Queue View
Problems View
Image View
Data View
Memory Tracing

Enabling the API Debugger

To use the API Debugger, do the following:

1.    Start the Microsoft Visual Studio* IDE.

2.    Go to CODE BUILDER > OpenCL Debugger > Options > API Debugger.

3.    Check Enable OpenCL API Debugger.

4.    Insert breakpoints in the application in different OpenCL API calls, and then start debugging with F5.

5.    Open the needed API Debugger views by selecting CODE BUILDER > OpenCL Debugger and then the view you need.

The API Debugger updates the view panes when:

·        The Debugger hits a breakpoint in Microsoft Visual Studio* IDE.

·        The behavior of one of the views changes, that is, when you click a button.

·        The host application execution ends.

So, to see data in the views:

1.    Insert some breakpoints in your application (in different API calls), or run the application with Start Debugging (F5).

2.    Then open the needed views via CODE BUILDER > OpenCL Debugger.

See Also

Trace View
Objects Tree View
Properties View
Command Queue View
Problems View
Image View
Data View
Memory Tracing

Trace View

The trace view contains a trace of all OpenCL™ API calls made during the execution, with the API call arguments, returned values, error codes, and time of execution.

To access the trace view, select CODE BUILDER > OpenCL Debugger > Trace View.

Use the following buttons to control the view:

·        Save - enables saving the current state of all views (live OpenCL objects, API trace, command queue, and so on):

o   to a binary file (.trace) that can later be loaded with the Load Session button

o   or as a list (trace) of all API calls exported into a CSV file

·        Load Session... - enables restoring the state of the views from a .trace file that was previously saved, either with Save As... or with the Generate trace file option in the API Debugger settings.

NOTE

This feature is available only when Visual Studio* IDE is not in debug mode, as views are synced with the application you debug.

·        Success/Errors - enables filtering successful or failed API calls.

·        API Display Mode - toggles between views:

o   Function name only

o   Function name and arguments

o   Function name with argument names and values

o   Show Return Value

o   Show Error Code

o   Show Time

·        Filter - enables filtering API calls by name. For example, start typing "device [1]" to get only the API calls that use "device [1]".

·        Right-click context menu - enables toggling between display modes of arguments (Hex/Decimal) and showing raw values (for example, 0x2 instead of CL_DEVICE_TYPE_CPU).

To enable automatic trace generation, select CODE BUILDER > OpenCL Debugger > API Debugger > Auto-generate session. Traces are saved in the folder that is specified in the Output Folder text box.

Automatic trace generation is equivalent to clicking Save... after the host application has ended.

See Also

Enabling the API Debugger

Objects Tree View

The Objects Tree view of the OpenCL™ API Debugger plug-in for Microsoft Visual Studio* IDE enables:

·        Getting a better understanding of which objects are "alive"/released at any given point of time.

·        Showing hierarchy and dependencies of various OpenCL objects.

The API Debugger also reflects the OpenCL objects that exist in memory during application execution:

·        Platform

·        Devices

·        Context

·        Buffer

·        and so on

When you create an OpenCL context (for example, with the clCreateContext() API call), the Objects Tree updates immediately with the new context object.

Objects are dimmed when they are released, for example, by a clRelease call.

Use the following buttons to control the Objects Tree view:

·        Sort By - enables toggling the way data is displayed:

o   Sort by Context - all entities that are associated with a specific context are displayed as context successors.

o   Sort by Device - all contexts are displayed as children of the devices.

·        Show Objects - enables displaying only a subset of the OpenCL objects. Use it when a lot of OpenCL objects are alive at a given moment and you need to see the status of only several objects or object types.
To view objects of a specific type only:

o   Select Show Objects > uncheck Show All.

o   Select Show Objects > select the object type to display.

·        Open Source Code in a new tab - enables viewing the source code associated with the program object. Right-click any Program object in the tree, then click Open Source Code in a new tab.

·        Save Binaries - enables dumping binary files that were built for the program object with clBuildProgram or clCreateProgramWithBinary. Right-click any built program object in the tree, then click Save Binaries and select the location to save the binaries.

See Also

Enabling the API Debugger

Properties View

The OpenCL™ API Debugger plug-in for Microsoft Visual Studio* IDE exposes miscellaneous properties for each OpenCL object or Command Queue event. The Properties view pre-fetches information about OpenCL objects or events, and displays it when a particular object is selected.

Access the Properties View by selecting CODE BUILDER > OpenCL Debugger > Properties View.

All properties in the Properties View are read-only.

OpenCL Objects Properties

To view properties for an OpenCL object, do the following:

1.    Select (left-click) some object from the Objects View window.

2.    Open the Properties view.

The OpenCL Objects Properties view is an alternative to making API calls such as clGetDeviceInfo().

Command Queue Events Properties

To view properties for an OpenCL command-queue event, do the following:

1.    Select (left-click) an event from the Command Queue View window.

2.    Open the Properties view.

The Command Queue Events Properties view is an alternative to retrieving execution time manually, that is, to adding the CL_QUEUE_PROFILING_ENABLE parameter to clCreateCommandQueue() when creating the command queue to which the commands are enqueued, and then querying the execution times of the enqueued events using clGetEventProfilingInfo().
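
For comparison, the manual route ends with four timestamps per event, which then have to be turned into durations. The sketch below covers only that last arithmetic step and is an illustration under assumptions: the struct and function names are hypothetical, and the values are assumed to have been read beforehand with clGetEventProfilingInfo() for the parameters named in the comments.

```c
#include <stdint.h>

/* Timestamps in nanoseconds for one event, assumed to be already queried. */
typedef struct {
    uint64_t queued;   /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit;   /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;    /* CL_PROFILING_COMMAND_START  */
    uint64_t end;      /* CL_PROFILING_COMMAND_END    */
} EventTimestamps;

typedef struct {
    uint64_t queue_ns;     /* time spent waiting in the host-side queue */
    uint64_t submit_ns;    /* time between submission and start of execution */
    uint64_t execute_ns;   /* time spent executing on the device */
} EventBreakdown;

/* Split one event's lifetime into queue, submit and execute durations. */
EventBreakdown break_down_event(EventTimestamps t)
{
    EventBreakdown b;
    b.queue_ns = t.submit - t.queued;
    b.submit_ns = t.start - t.submit;
    b.execute_ns = t.end - t.start;
    return b;
}
```

The Properties view spares you both the extra queue flag and this bookkeeping.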

See Also

Enabling the API Debugger

Command Queue View

The OpenCL™ API Debugger plug-in for Microsoft Visual Studio* IDE provides the Command Queue View, which enables tracking the execution status of enqueued commands (issued by clEnqueue API calls).

The status for a command can be one of the following options:

·        Submitted

·        Running

·        Completed

The Command Queue View displays events either for a particular command-queue (Separate Queues) or for all queues together (Unify Queues).

Access the Command Queue View by selecting CODE BUILDER > OpenCL Debugger > Command Queue View.

Use the following buttons to control the Command Queue View:

·        Save As... - enables dumping the current status of commands to a text file for later investigation.

·        Unify Queues - enables viewing all commands across all queues.
Also note the following:

o   When working in the Unified queues mode, each entry gets a suffix of the form CQ [NUMBER], which indicates the number of the command-queue with which the command is associated.

o   For example, TASK(3)CQ[1] indicates that the 3rd command enqueued to some queue is a clEnqueueTask command, and that it is associated with Command-Queue [1].

o   Each queue has a color, and all its commands have the color of the queue. This makes it easy to spot at a glance which queue a command belongs to.

Command-queues in the Objects Tree view have the same color as in the Command Queue view.

o   The Unify Queues button changes into the Separate Queues button after being clicked; that button does the opposite operation and shows the status of events per queue.

·        Separate Queues - appears when working in Unified mode after clicking Unify Queues, and does the opposite of the Unify Queues operation, that is, shows the commands per queue. First select the queue from the drop-down list under the Save As... button; the view then updates with the commands that are associated with the selected queue.

See Also

Enabling the API Debugger

Problems View

The OpenCL™ API Debugger plug-in for Microsoft Visual Studio* IDE provides the Problems View, which summarizes in a single view all errors and warnings that occurred during the execution.

To access the view, select CODE BUILDER > OpenCL Debugger > Problems View.

The Problems View supports the following features:

·        Displaying warnings and errors of kernel compilation.

·        Showing uninitialized kernel arguments. Each kernel argument must be set by calling clSetKernelArg().

·        Reporting OpenCL objects that are released out of order, for example, when you release a program object before releasing its kernels (clReleaseProgram before clReleaseKernel).

·        Resource leaks - at the end of the program, an error entry is added for each OpenCL resource (programs, buffers, images, and so on) that is not released.

·        API call failures - when an OpenCL API call fails, an error entry is added to the Problems View. You can right-click the entry to jump to the line item in the Trace View that caused the failure.

Double-clicking an error in the Problems View opens the compilation error log message in the code editing area.

See Also

Enabling the API Debugger

Image View

This view enables displaying the OpenCL™ Image objects of the host application visually.

Each Image object is added to the Objects View. Double-clicking an Image object displays it as a bitmap: the underlying pixel array is translated into a bitmap.

Double-click the Image you need and wait for the Image View to appear with the latest state of the image.

From the Images drop-down, select any Image. The view displays the image as a bitmap.

The history drop-down enables viewing various states of the selected image, where each state is a result of an API call.

If, for instance, you create an image with all pixels set to 0, you see the following view on Image creation:

Now, after running the kernel on the selected Image, you can observe that it was updated indirectly by the clEnqueueNDRangeKernel API call (which caused the kernel to run).

Each state is related to the API call that caused the change, and is in the following format: #ID:API Call.

Here #ID is the number of the API call that caused the change, and API Call is the OpenCL API call that affected (changed) the object.

This is the same API call that is shown in the Trace View.

When you select an Image from the drop-down, or alternatively select an Image state, the Trace View automatically highlights the API call that is related to that state.

The Save As button in the Image View enables saving a copy of the displayed image to disk, as a bitmap.

To configure the number of states to save per memory object (Images, Buffers and SubBuffers), see Configuring the API Debugger.

See Also

Enabling the API Debugger

Configuring the API Debugger

Data View

The Data View enables displaying on a grid all the OpenCL Memory Objects (Images, Buffers and SubBuffers) that were instantiated in the host application.

Each Memory Object is added to the Objects View. Double-click a Buffer or SubBuffer to display the buffer contents, or double-click an Image to view the raw pixel data associated with the image.

Double-click the Buffer you need, and the Data View window appears with the latest state of the buffer/sub-buffer.

From the Memory Objects drop-down, select any memory object, and the view shows the raw data associated with the object.

The history drop-down enables viewing various states of the selected memory object, where each state is a result of an API call.

Consider a host application that calculates a histogram of a grayscale image. For example, use a buffer with 256 bins, one for each gray level of the image, to calculate the histogram.

As a first step, issue an NDRange kernel called bzero to initialize the buffer with zeros.

Examine the buffer contents on the grid and see that all buffer elements are set to zero.

Now, as a second step, issue a second NDRange command that calculates the histogram of the image using the buffer as a bins counter.

In this example, 22 pixels have grayscale value 5, 27 pixels have grayscale value 9, and so on.
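
To make the expected buffer contents concrete, the two steps can be mirrored on the host. This is a reference sketch, not the kernels from the example: the function name grayscale_histogram is hypothetical, and the image is assumed to be 8-bit grayscale with 32-bit bin counters.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HISTOGRAM_BINS 256

/*
 * Host-side equivalent of the two kernel launches:
 * step 1 zeroes the bins (the role of the bzero kernel),
 * step 2 counts how many pixels have each grayscale value.
 */
void grayscale_histogram(const uint8_t *pixels, size_t pixel_count,
                         uint32_t bins[HISTOGRAM_BINS])
{
    memset(bins, 0, HISTOGRAM_BINS * sizeof bins[0]);
    for (size_t i = 0; i < pixel_count; ++i) {
        bins[pixels[i]]++;
    }
}
```

A host reference like this is also a convenient source for the values you expect to see in the grid after each step.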

Use the Type box to select the underlying data type (for example, cl_uint, cl_double).

The Save As button enables saving a CSV representation of the data to disk.

When exporting a Buffer/SubBuffer, you get each buffer cell on a separate line. The Buffer/SubBuffer values are interpreted as a contiguous memory chunk containing unsigned chars as its elements.

When exporting an Image as a CSV, the number of rows in the output CSV is the height of the image (number of rows), and each row contains the values of all columns of that row, delimited by commas.

The values in the cells are interpreted according to the image channel data type. For example, an Image that has CL_SIGNED_INT32 as its data type causes the resulting output to display each row as an array of signed 32-bit integers.
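
The image CSV layout described above (one text line per image row, values separated by commas) can be sketched as follows. This is an illustration, not the plug-in's exporter: the name image_to_csv is hypothetical, and the image is assumed to have a single channel of signed 32-bit values stored row by row.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write a width x height image of signed 32-bit values as CSV text:
 * one text line per image row, columns separated by commas.
 * Returns the number of characters written (excluding the terminating
 * '\0'), or -1 if out is NULL or too small.
 */
int image_to_csv(const int32_t *pixels, size_t width, size_t height,
                 char *out, size_t out_size)
{
    size_t used = 0;

    if (out == NULL || out_size == 0) {
        return -1;
    }
    out[0] = '\0';

    for (size_t row = 0; row < height; ++row) {
        for (size_t col = 0; col < width; ++col) {
            /* Comma between columns, newline after the last column. */
            const char *sep = (col + 1 < width) ? "," : "\n";
            int n = snprintf(out + used, out_size - used, "%ld%s",
                             (long)pixels[row * width + col], sep);
            if (n < 0 || (size_t)n >= out_size - used) {
                return -1;
            }
            used += (size_t)n;
        }
    }
    return (int)used;
}
```

With this layout the number of text lines equals the image height, which is a quick sanity check on an exported file.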

Each state is related to the API call that caused the change, and is in the following format: #ID:API Call.

Here #ID is the number of the API call that caused the change, and API Call is the OpenCL API call that affected (changed) the object.

This is the same API call that is shown in the Trace View.

When you select a memory object from the drop-down list, or alternatively select a memory object state, the Trace View automatically highlights the API call that is related to that state.

To configure the number of states to save per memory object (Images, Buffers and SubBuffers), see Configuring the API Debugger.

See Also

Enabling the API Debugger

Configuring the API Debugger

Memory Tracing

Memory tracing enables you to capture the debugging session into a file, and also to load a previously stored state into the views.

The stored state contains:

·        State of all the views - this includes all the data that is filled in the various views of the plug-in

·        Image bitmaps (if Enable Images Bitmap Tracing is on)

·        Memory objects raw data (if Enable Raw Data Tracing is on)

The state can be stored in either of the following ways:

·        Automatically, when the host application ends

·        Manually, by going to: Trace View > Save > Save Session (.trace)

The automatic memory tracing contains:

·        State of all the views

·        CSV of all API calls that occurred during the execution

This option creates a separate directory for each captured session of the plug-in. The directory is stored under the Output Folder specified in the same window. To enable automatic memory tracing, see Configuring the API Debugger.

See Also

Enabling the API Debugger

Configuring the API Debugger

Code Analyzer

The OpenCL™ Code Analyzer provides performance information for OpenCL applications.

The Code Analyzer enables you to collect performance data from both the host side and the kernel side of an OpenCL application.

For the kernel side, the Code Analyzer provides:

·        Time of execution.

·        Throughput and work size data of each OpenCL kernel that was launched during your program's execution.

·        Overall occupancy of the GPU execution units during the kernel execution.

·        Latency of each memory access command that was executed while the kernel ran on the GPU.

For the host side, the Code Analyzer provides:

·        Statistics of all OpenCL API calls.

·        Data about memory commands that were executed in your program.

Creating and Launching a New Analyze Session

Analysis Results

Session Info

Host Profiling

Kernel Overview

Kernel Analysis

Host-side Analysis Optimization Tips

Revising Code and Rerunning a Host Profiling Session

Revising Code and Rerunning a Kernel Profiling Session

Configuring the Code Analyzer

Code Analyzer Command Line Interface

Creating and Launching a New Analyze Session

When you run the Code Analyzer, you create an Analyze Session, which contains the configuration data for collecting performance information and the results of the analysis run. You can explore the analysis sessions in the Analyze Sessions Explorer window.

To create and run an Analysis session, do the following:

1.    Go to CODE BUILDER > OpenCL Application Analysis > New Analysis Session...
The OpenCL Application Analysis Session dialog box appears.

2.    In the Application Setting section, specify information about the application that you want to analyze:

·        In the Application field, enter, or browse to, the full path and executable file name of the target application.

·        In the Arguments field, enter the command-line options to use when launching the target application.

·        In the Working Directory field, optionally enter the working directory for the target application.
To use the directory specified in the Application field, check the Use application directory as working directory box.

·        In the Session Name section, enter a meaningful name for the session.

3.    Verify that the Start Automatically on Process Launch check box is checked, and click Launch.
Your application is launched and the Code Analyzer starts to collect data.
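The relationship between the dialog fields can be modeled in a few lines. The sketch below is only an illustration: the class name AnalyzeSessionConfig and its attributes are hypothetical and are not part of any Code Builder API. It also assumes that the check box overrides whatever is typed in the Working Directory field, which the dialog description suggests but does not state outright.

```python
from dataclasses import dataclass, field
from pathlib import PureWindowsPath
from typing import List, Optional


@dataclass
class AnalyzeSessionConfig:
    """Hypothetical mirror of the dialog fields (not a Code Builder API)."""
    application: str                                     # full path to the target executable
    arguments: List[str] = field(default_factory=list)   # command-line options
    working_directory: Optional[str] = None              # optional Working Directory field
    use_application_directory: bool = False              # the check box
    session_name: str = "session"

    def effective_working_directory(self) -> Optional[str]:
        # Assumption: when the box is checked, the directory part of the
        # Application field wins over the Working Directory field.
        if self.use_application_directory:
            return str(PureWindowsPath(self.application).parent)
        return self.working_directory


config = AnalyzeSessionConfig(
    application=r"C:\apps\demo\demo.exe",
    arguments=["--frames", "100"],
    working_directory=r"D:\data",
    use_application_directory=True,
)
print(config.effective_working_directory())
```

PureWindowsPath is used so that the Windows-style path is parsed the same way on any host that runs the sketch.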

While the application is running, a new session run tab is opened in the main Visual Studio* window.

During the session, you can do the following:

·        Click the Pause/Resume analysis button in the session run tab to pause or resume data collection.

·        Exercise the functionality that might cause performance issues.

·        Click Close analysis to stop the analysis.

Analysis Results

After you finish running the application, the new analysis session that you created appears in the Analyze Sessions Explorer window and a results view appears in the Visual Studio* main area.

The analysis result view contains an analysis home page and an analysis progress bar with several steps to review the analysis results.

The home page of the analysis view gives an overview of the host and kernel profiling results. You can click any section on this page to navigate to the relevant report, or you can follow the analysis progress bar steps:

Session Info

The session info page provides information about the application that was analyzed and the command line to run the analysis.

Host Profiling

The Host Profiling page provides data that can help you optimize your host-side code.

This section includes the following reports:

·        API Calls

·        Memory Commands

·        OpenCL Objects

API Call Report

The API call report lists statistics of calls made to the OpenCL API, including the number of times each API was called, the number of errors returned, and statistics on the elapsed time each API call took while executing.

Click the + button on the left of any API name to expand it and show additional information on this API, including the error code, input arguments and timing data of each call to this API during program execution.

To see the entire list of arguments for an API call, press "[..]" in the arguments column.
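The statistics in this report are plain per-API aggregates. As a rough sketch of how such a summary can be derived from individual call records, the snippet below groups calls by API name and computes the count, the error count, and the timing figures. The record layout (name, error code, elapsed microseconds) and the function name summarize_api_calls are assumptions made for illustration; they do not describe the format the Code Analyzer actually stores.

```python
from collections import defaultdict
from statistics import mean


def summarize_api_calls(calls):
    """calls: iterable of (api_name, error_code, elapsed_us) tuples (assumed layout)."""
    grouped = defaultdict(list)
    for name, code, elapsed in calls:
        grouped[name].append((code, elapsed))

    summary = {}
    for name, rows in grouped.items():
        times = [elapsed for _, elapsed in rows]
        summary[name] = {
            "count": len(rows),
            # CL_SUCCESS is 0; any other code is counted as an error.
            "errors": sum(1 for code, _ in rows if code != 0),
            "total": sum(times),
            "min": min(times),
            "max": max(times),
            "avg": mean(times),
        }
    return summary


calls = [
    ("clCreateBuffer", 0, 12.0),
    ("clCreateBuffer", -61, 3.0),          # -61 is CL_INVALID_BUFFER_SIZE
    ("clEnqueueNDRangeKernel", 0, 40.0),
]
for api, stats in summarize_api_calls(calls).items():
    print(api, stats)
```

The same grouping applies to the Memory Commands report, with the memory command name as the key.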

Memory Commands

The Memory Commands report lists statistics of calls made to the OpenCL program memory commands, including the number of times the memory commands were called, the number of times errors were returned, and statistics on the elapsed time each memory command call took while executing.

Click the + button on the left of any memory command name to expand its row. The expanded area presents additional information, including the return value, memory objects and timing data, for each call of this memory command that occurred during the program execution.

OpenCL Objects

The OpenCL Objects report presents all OpenCL objects that were created during the application run, shown as a hierarchical tree.

You can use the platform, context and devices menus to hide some branches of the tree.

Kernel Overview

The Kernel Overview page provides data that can help you optimize your kernel code.

This section includes the API Calls report, which shows every OpenCL kernel that was launched during the program execution.

Kernels with a different name, a different global work size, or a different local work size are considered different kernels and are presented in different rows.

Each row shows:

·        The total, minimum, maximum and average execution time of the kernel.

·        EU Active - The normalized sum of all cycles on all cores spent actively executing instructions.

·        EU Stalled - The normalized sum of all cycles on all cores spent stalled. At least one thread is loaded, but the core is stalled for some reason.

·        GPU Memory Reads/Writes - Reads/Writes from the GPU from/to the chip uncore (LLC) and memory. These are all memory accesses that miss in the internal GPU L3 cache and are serviced either from the uncore or from main memory.

·        L3 Cache Misses - All read and write misses in the GPU L3 cache.

·        Untyped Memory Reads/Writes - Memory accesses to buffers created with clCreateBuffer.

·        Typed Memory Reads/Writes - Memory accesses to typed buffers, e.g., writes to buffers created with clCreateImage. However, reads from images are counted by Sampler accesses and Texture Read.

·        SLM Reads/Writes - Memory accesses to Shared Local Memory.

Click the '+' button on the left of any kernel name to expand its row. The expanded area presents additional information, including the latency, return value, command queue, context and timing data of each time this kernel was executed during the program execution.
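The row-splitting rule above (a separate row for each distinct combination of name, global work size, and local work size) can be sketched as a grouping step followed by the timing aggregates. The launch records and the function name group_kernel_launches below are hypothetical; the sketch only illustrates the grouping rule and does not reproduce the hardware counters such as EU Active or L3 Cache Misses.

```python
from collections import defaultdict


def group_kernel_launches(launches):
    """Group launches by (name, global size, local size) and aggregate their times."""
    rows = defaultdict(list)
    for launch in launches:
        key = (
            launch["name"],
            tuple(launch["global_size"]),
            tuple(launch["local_size"]),
        )
        rows[key].append(launch["time_ms"])

    return {
        key: {
            "count": len(times),
            "total": sum(times),
            "min": min(times),
            "max": max(times),
            "avg": sum(times) / len(times),
        }
        for key, times in rows.items()
    }


launches = [
    {"name": "blur", "global_size": (1024, 1024), "local_size": (16, 16), "time_ms": 2.0},
    {"name": "blur", "global_size": (1024, 1024), "local_size": (16, 16), "time_ms": 4.0},
    # Same kernel name, different local work size: reported as a separate row.
    {"name": "blur", "global_size": (1024, 1024), "local_size": (8, 8), "time_ms": 5.0},
]
report = group_kernel_launches(launches)
for key, stats in report.items():
    print(key, stats)
```

Keeping the work sizes in the key makes it easy to compare how the same kernel behaves under different launch configurations.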

Kernel Analysis

For analysis of the kernel side of the application, the following reports are generated:

·        Occupancy

·        Ticks per Thread

·        Latency

Occupancy

The Occupancy report shows, for each kernel, the occupancy of each execution unit in the GPU.

You can also see the number of GPU threads launched, and the minimum, maximum and average thread execution time.

Ticks per Thread

The Ticks per Thread report shows, for each number of active threads, the amount of time this number of threads was active.

Threads per Time

The Threads per Time report shows the number of threads that were active at each point in time during the execution.
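The two reports are different views of the same data. As a minimal sketch, assuming each GPU thread is described by a hypothetical (start_tick, end_tick) interval, a sweep over the start and end events yields the active-thread count over time, and summing the duration spent at each count yields the per-count histogram. This illustrates the definitions only; it is not how the Code Analyzer collects or stores the data.

```python
from collections import Counter


def threads_per_time(intervals):
    """Return [(tick, active_threads)] for every tick where an event occurs."""
    events = Counter()
    for start, end in intervals:       # a thread is active on [start, end)
        events[start] += 1
        events[end] -= 1

    timeline, active = [], 0
    for tick in sorted(events):
        active += events[tick]
        timeline.append((tick, active))
    return timeline


def ticks_per_thread_count(intervals):
    """Return {active_threads: number of ticks with exactly that many active}."""
    timeline = threads_per_time(intervals)
    histogram = Counter()
    for (tick, active), (next_tick, _) in zip(timeline, timeline[1:]):
        histogram[active] += next_tick - tick
    return dict(histogram)


intervals = [(0, 10), (2, 6), (4, 12)]
print(threads_per_time(intervals))
print(ticks_per_thread_count(intervals))
```

For the three sample intervals, three threads overlap only between ticks 4 and 6, so the histogram assigns 2 ticks to the count 3.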

Latency

The Latency pane shows, for each kernel file, the overall latency of the memory commands.
Click the kernel name to see the latency of each memory command in the source code of this kernel.

Host-side Analysis Optimization Tips

While you run the host-side performance analysis, the Code Analyzer identifies inefficient use of the OpenCL API. When the analysis is done, a TIPS screen appears, showing all the detected issues, each with a short description.

Click a specific tip to open a related report and highlight the data within the report that is relevant to this tip. In addition, a popup window appears with a recommendation on how to fix the reported issue.

The following table summarizes the recommendations that are reported from the Tips.

Hint Title: Inefficient "clCreateBuffer"
Description: The host program includes a call to "clCreateBuffer" where "flags" includes "CL_MEM_COPY_HOST_PTR".
Recommendation: There are two ways to ensure a zero-copy path when mapping memory objects. For best results, allocate memory with "CL_MEM_ALLOC_HOST_PTR"; this method ensures that the memory is efficiently mirrored on the host. Another way is to allocate properly aligned and sized memory yourself and share the pointer with the OpenCL framework by using the "CL_MEM_USE_HOST_PTR" flag.

Hint Title: "clCreateBuffer" call where "host_ptr" isn't 4K aligned
Description: The host program includes a call to "clCreateBuffer" where "host_ptr" is not 4K aligned.
Recommendation: For best results, align the memory address to the host memory page (4K bytes).

Hint Title: "clCreateBuffer" call where "size" isn't a multiple of 64 bytes
Description: The host program includes a call to "clCreateBuffer" where "size" is not a multiple of 64 bytes.
Recommendation: For best results, make sure that the amount of memory you allocate and the size of the corresponding OpenCL buffer are a multiple of the cache line size (64 bytes).

Hint Title: Redundant calls to "clBuildProgram"
Description: The host program includes several calls to "clBuildProgram" with the same arguments.
Recommendation: When possible, call "clGetProgramInfo" to retrieve binaries generated from calls to "clCreateProgramWithSource" and "clBuildProgram".

Hint Title: Redundant calls to "clCompileProgram"
Description: The host program includes several calls to "clCompileProgram" with the same arguments.
Recommendation: When possible, call "clGetProgramInfo" to retrieve previously compiled binaries.

Hint Title: Redundant calls to "clCreateContextFromType"
Description: The host program includes calls to "clCreateContextFromType" with the same arguments.
Recommendation: Consider using the same OpenCL context instead of recreating it.

Hint Title: Redundant calls to "clCreateContext"
Description: The host program includes calls to "clCreateContext" with the same arguments.
Recommendation: Consider using the same OpenCL context instead of recreating it.

Hint Title: Redundant calls to "clCreateCommandQueue"
Description: The host program includes several calls to "clCreateCommandQueue" that refer to the same device.
Recommendation: Consider using the same command-queue to access the device.

Hint Title: Redundant calls to "clCreateCommandQueueWithProperties"
Description: The host program includes several calls to "clCreateCommandQueueWithProperties" that refer to the same device.
Recommendation: Consider using the same command-queue to access the device.

Hint Title: "clEnqueueReadBuffer" call
Description: The host program includes several calls to "clEnqueueReadBuffer".
Recommendation: When possible, use "clEnqueueMapBuffer" and "clEnqueueUnmapMemObject" instead of calls to "clEnqueueReadBuffer" or "clEnqueueWriteBuffer".

Hint Title: "clEnqueueWriteBuffer" call
Description: The host program includes several calls to "clEnqueueWriteBuffer".
Recommendation: When possible, use "clEnqueueMapBuffer" and "clEnqueueUnmapMemObject" instead of calls to "clEnqueueReadBuffer" or "clEnqueueWriteBuffer".

Hint Title: "clEnqueueReadImage" call
Description: The host program includes several calls to "clEnqueueReadImage".
Recommendation: When possible, use "clEnqueueMapImage" and "clEnqueueUnmapMemObject" instead of calls to "clEnqueueReadImage" or "clEnqueueWriteImage".

Hint Title: "clEnqueueWriteImage" call
Description: The host program includes several calls to "clEnqueueWriteImage".
Recommendation: When possible, use "clEnqueueMapImage" and "clEnqueueUnmapMemObject" instead of calls to "clEnqueueReadImage" or "clEnqueueWriteImage".

Hint Title: "clEnqueueReadBufferRect" call
Description: The host program includes several calls to "clEnqueueReadBufferRect".
Recommendation: When possible, use "clEnqueueMapBuffer" and "clEnqueueUnmapMemObject" instead of calls to "clEnqueueReadBufferRect" or "clEnqueueWriteBufferRect".

Hint Title: "clEnqueueWriteBufferRect" call
Description: The host program includes several calls to "clEnqueueWriteBufferRect".
Recommendation: When possible, use "clEnqueueMapBuffer" and "clEnqueueUnmapMemObject" instead of calls to "clEnqueueReadBufferRect" or "clEnqueueWriteBufferRect".

Hint Title: The work-group dimensions are defined as "column" work-group
Description: The host program includes a call to "clEnqueueNDRangeKernel" where the work-group dimensions are defined as a "column" work-group.
Recommendation: When reading from memory, it is best to reorganize the work-group to read in lines instead of columns.

Hint Title: Performance Information
Description: Kernel register pressure is too high, spill fills will be generated. Additional surface needs to be allocated.
Recommendation: Consider simplifying your kernel.

Hint Title: Performance Information
Description: Kernel private memory usage is too high and exhausts register space. Additional surface needs to be allocated.
Recommendation: Consider reducing the amount of private memory used, and avoid using private memory arrays.

Hint Title: Performance Information
Description: Local workgroup sizes selected for this workload may not be optimal.
Recommendation: Consider using a different local workgroup size.

Hint Title: Performance Information
Description: Not aligned surface detected. Driver needs to disable L3 caching.
Recommendation: (none)

Hint Title: Performance Information
Description: Kernel submission requires coherency with CPU, this may impact performance.
Recommendation: (none)

Hint Title: Performance Information
Description: Null local workgroup size detected. The following sizes will be used for execution.
Recommendation: (none)
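The two "clCreateBuffer" alignment tips come down to simple arithmetic: the host pointer should sit on a 4K page boundary and the buffer size should be a multiple of 64 bytes. The sketch below shows that arithmetic on plain integers. The helper names (round_up, padded_buffer_size, check_host_buffer) are hypothetical and the code does not call OpenCL; in a real host program the same checks would be applied to the pointer and size passed to "clCreateBuffer".

```python
PAGE_SIZE = 4096    # host memory page size named in the tips (4K bytes)
CACHE_LINE = 64     # cache line size named in the tips (64 bytes)


def round_up(value, multiple):
    """Smallest multiple of `multiple` that is greater than or equal to `value`."""
    return -(-value // multiple) * multiple


def padded_buffer_size(size):
    """Buffer size rounded up to a whole number of cache lines."""
    return round_up(size, CACHE_LINE)


def check_host_buffer(address, size):
    """Return the alignment tips that a (host_ptr, size) pair would trigger."""
    tips = []
    if address % PAGE_SIZE != 0:
        tips.append("host_ptr is not 4K aligned")
    if size % CACHE_LINE != 0:
        tips.append("size is not a multiple of 64 bytes")
    return tips


print(padded_buffer_size(1000))              # 1024
print(check_host_buffer(0x1000, 128))        # []
print(check_host_buffer(0x1010, 100))        # both tips
```

Rounding the allocation up, rather than the logical data size down, keeps all the original data inside the buffer while still satisfying the 64-byte rule.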

 

Revising Code and Rerunning a Host Profiling Session

After you optimize your code, you can rerun the Code Analyzer session and compare the data to see how your changes improve your application performance.

To rerun an analyze session, do the following:

1.    Open the Analyze Sessions Explorer from CODE BUILDER > OpenCL Application Analysis > Windows > Analyze Session Explorer.

2.    In the Analyze Sessions Explorer, right-click the analyze session that you want to rerun and select Rerun.

3.    A new analyze session is created and launched, and the profiled application starts.

4.    After the application finishes, the new analyze session appears in the Analyze Sessions Explorer window and new reports are generated.

Output Files

For each analysis session, the Code Analyzer creates a session directory named with the application's name, the date, and an incremented session number.

A new capture subdirectory is created in the session directory when profiling begins, and also each time you pause and resume the data collection during the session. The files in that directory comprise the capture reports. The format of the capture directory name is:

capture_N where N is an incrementing number

These are the types of files in a capture directory:

·        *.csv - capture reports in CSV format.

·        *.bin - capture reports in binary format. You can open such reports in Visual Studio from the Analyze Sessions Explorer.

In addition, a session file is created in the session directory. This file stores the data about session configuration. You can use it to create and run another similar session.
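Because the capture directories follow the capture_N pattern, a short script can walk a session directory and list the CSV reports of each capture in order. The following is a minimal sketch under that assumption; the function name list_captures is hypothetical, and the names of the CSV files inside a capture directory are not specified here, so the sketch simply collects whatever *.csv files it finds.

```python
import re
from pathlib import Path

CAPTURE_RE = re.compile(r"^capture_(\d+)$")


def list_captures(session_dir):
    """Return [(N, capture_path, [csv file names])] sorted by capture number N."""
    captures = []
    for child in Path(session_dir).iterdir():
        match = CAPTURE_RE.match(child.name)
        if child.is_dir() and match:
            csv_files = sorted(p.name for p in child.glob("*.csv"))
            captures.append((int(match.group(1)), child, csv_files))
    # Sort numerically so that capture_10 comes after capture_2.
    return sorted(captures, key=lambda item: item[0])


if __name__ == "__main__":
    import sys
    for number, path, csv_files in list_captures(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(number, path, csv_files)
```

Sorting on the parsed number instead of the directory name avoids the lexicographic ordering in which capture_10 would precede capture_2.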

Revising Code and Rerunning a Kernel Profiling Session

After you optimize your code, you can rerun the Code Analyzer session and compare the data to see how your changes improve your application performance in the Kernel Development Framework.

During the performance analysis run, the Code Analyzer stores all the kernels that were executed during the analysis, together with their inputs, as a Kernel Development Framework session.

To open this session in the Kernel Development Framework, click the Edit Kernel button in the Kernel Analysis page.

For each analysis session, the analysis tool creates a session directory named with the following information:

·        the application's name

·        the date

·        an incremented session number

When profiling begins, and each time you pause and resume the data collection during the session, a new capture subdirectory is created in the session directory.

The capture directory is named "capture" followed by an incrementing number (for example, capture_1, capture_2, and so on). The capture directory includes capture reports in a binary format.

You can open this binary file in Visual Studio from the Analyze Sessions Explorer.

In addition, a session file is created in the session directory. This file stores the data about session configuration. You can use it to create and run another similar session. You can also export the analysis result to HTML and CSV formats.

See Also

Kernel Analysis

Code Analyzer Command Line Interface

Configuring the Code Analyzer

You can use the Analyze Session Settings dialog to change the reports directory and the connection information for the analyze sessions.

To open the Analyze Session Settings dialog box, go to CODE BUILDER > OpenCL Application Analysis > Settings.

These are the fields in the dialog box:

·        Reports Location - full path to the directory that contains analyze reports. If you check the Place output reports in the solution directory check box and open the solution in Visual Studio*, the Code Analyzer ignores the specified directory and places the analysis reports into the solution directory.

·        Connection - range of available port numbers. The Code Analyzer uses one of the ports in the specified range to establish a connection with Visual Studio*.

·        Advanced Setting section:

o   Enable analysis of GPU activity and memory transactions - check this box to collect and analyze GPU activity and memory transactions in the GPU.

o   Reduce disk space - check this box to disable the deep analysis and reduce the amount of stored data.