[DSP Development] [Parallel Computing - CUDA Development] TI OpenCL v01.01.xx
Source: Internet · Editor: 程序博客网 · Date: 2024/06/03 20:27
TI OpenCL v01.01.xx
TI OpenCL™ Runtime Documentation Contents:
- Introduction
- OpenCL 1.1 Reference Material
- Compilation
- Compile Host OpenCL Applications
- Compiling OpenCL C Programs
- Create an OpenCL program from source, with embedded source
- Create an OpenCL program from source, with source in a file
- Create an OpenCL program from binary, with binary in a file
- Create an OpenCL program from binary, with embedded binary
- Caching on-line compilation results
- The TI off-line OpenCL C compiler: clocl
- Memory Usage
- Device Memory
- Caching
- How DDR3 is Partitioned for Linux System and OpenCL
- 66AK2x
- AM57
- Changing DDR3 Partition for OpenCL
- Alternate Host malloc/free Extension for Zero Copy OpenCL Kernels
- The OpenCL Memory Model
- OpenCL Buffers
- Global Buffers
- Local Buffers
- Sub-Buffers
- Buffer Read/Write vs. Map/Unmap
- Discovering OpenCL Memory Sizes and Limits
- Cache Operations
- Large OpenCL buffers and Memory Beyond the 32-bit DSP Address Space
- Large Buffer Use Cases
- User Defined DSP Heap Extension
- User Defined DSP Heap Built-in Functions
- Allocation of the Underlying Memory for User Defined DSP Heaps
- Putting it all Together
- Device Memory
- Execution Model
- Terminology
- Device Discovery
- Understanding Kernels, Work-groups and Work-items
- Enqueueing a Kernel
- Mapping the OpenCL C work-item Built-in Functions
- OpenCL C Kernel Code
- NDRangeKernel Execution on DSP Devices
- Extensions
- Calling Standard C Code From OpenCL C Code
- Calling Standard C code with OpenMP from OpenCL C code
- OpenMP dispatch from OpenCL
- C66x standard C compiler intrinsic functions
- OpenCL C code using printf
- DMA Control Using EdmaMgr Functions
- Single Transfer EdmaMgr APIs
- Multiple Transfer EdmaMgr APIs
- Using Extended Memory on the 66AK2x device
- Fast Global buffers in on-chip MSMC memory
- OpenCL C Builtin Function Extensions
- Cache Operations
- Environment Variables
- Optimization Tips
- Optimization Techniques for Host Code
- Use Off-line, Embedded Compilation Model
- Avoid the read/write Buffer model on shared memory SoC platforms
- Use MSMC Buffers Whenever Possible
- Dispatch Appropriate Compute Loads
- Prefer Kernels with 1 work-item per work-group
- Optimization Techniques for Device (DSP) Code
- Prefer Kernels with 1 work-item per work-group
- Use Local Buffers
- Use async_work_group_copy and async_work_group_strided_copy
- Avoid DSP writes directly to DDR
- Use the reqd_work_group_size attribute on kernels
- Use the TI OpenCL extension that allows Standard C code to be called from OpenCL C code
- Avoid OpenCL C Barriers
- Use the most efficient data type on the DSP
- Do Not Use Large Vector Types
- Consecutive memory accesses
- Prefer the CPU style of writing OpenCL code over the GPU style
- Typical Steps to Optimize Device Code
- Optimizing 3x3 Gaussian smoothing filter
- Overview of Gaussian Filter
- Natural C Code
- Optimizing for DSP
- Performance Improvement
- Performance Data
- Examples
- Building and Running
- Example Descriptions
- platforms example
- simple example
- mandelbrot, mandelbrot_native examples
- ccode example
- matmpy example
- offline example
- vecadd_openmp example
- vecadd_openmp_t example
- vecadd example
- vecadd_mpax example
- vecadd_mpax_openmp example
- dsplib_fft example
- ooo, ooo_map examples
- null example
- sgemm example
- dgemm example
- edmamgr example
- dspheap example
- Float compute example
- Host Code (main.cpp)
- OpenCL C kernel code (dsp_compute.cl)
- Sample Output
- Monte Carlo example
- Algorithm for Gaussian Random Number Generation
- Executing the code
- Sample Output
- Debug
- Debug with printf
- Host side OpenCL application code
- DSP side OpenCL kernel code
- Debug with gdb
- Host side gdb
- DSP side debug with host side client gdbc6x
- Debug with CCS
- Connect emulator to EVM and CCS
- Debug DSP side code with CCS
- Debug with dsptop
- Profiling
- Host Side Profiling
- DSP Side Profiling
- OpenCL on TI-RTOS
- Overview
- OpenCL on RTOS Package
- Running Examples Shipped with OpenCL Package
- Basic OpenCL RTOS Application Development
- Building Application on Linux
- Building Application on Windows
- Creating an OpenCL RTOS Application
- Limited Customization: Participating DSP Core(s)
- Differences from OpenCL Linux (Host running Linux)
- Advanced OpenCL RTOS Application Development
- Overview
- Frequently Asked Questions
- How do I get support for TI OpenCL products?
- Which TI OpenCL Version is Installed?
- Using Python OpenCL with the TI OpenCL implementation
- Guidelines for porting Stand-alone DSP applications to OpenCL
- Heap Memory Management
- Stack Usage
- Boot Routine Dependencies
- Linker Command Files
- OpenCL Interoperability with Host OpenMP
- MCSDK-HPC to OpenCL Component Version Map
- Does TI’s OpenCL support images and samplers?
- Why does the OpenCL ICD installed on my platform not find the TI OpenCL implementation?
- Why do I get messages about /var/lock/opencl when running OpenCL applications?
- Why do I get DLOAD error messages when running OpenCL applications?
- How do I limit log file sizes on EVM’s temporary file storage (tmpfs)?
- 66AK2* EVMs
- AM57* EVMs
- Readme
- OpenCL v01.01.09.x Readme
- Platforms supported
- Release Notes
- Compiler Versions
- OpenCL v01.01.08.x Readme
- Platforms supported
- Release Notes
- Compiler Versions
- OpenCL v01.01.07.x Readme
- Platforms supported
- Release Notes
- Compiler Versions
- Disclaimer
- Important Notice