[DSP Development] [Parallel Computing - CUDA Development] TI OpenCL v01.01.xx
Source: Internet · Editor: 程序博客网 · Date: 2024/06/03 20:27
TI OpenCL v01.01.xx
TI OpenCL™ Runtime Documentation Contents:
- Introduction
- OpenCL 1.1 Reference Material
- Compilation
- Compile Host OpenCL Applications
- Compiling OpenCL C Programs
- Create an OpenCL program from source, with embedded source
- Create an OpenCL program from source, with source in a file
- Create an OpenCL program from binary, with binary in a file
- Create an OpenCL program from binary, with embedded binary
- Caching on-line compilation results
- The TI off-line OpenCL C compiler: clocl
- Memory Usage
- Device Memory
- Caching
- How DDR3 is Partitioned for Linux System and OpenCL
- 66AK2x
- AM57
- Changing DDR3 Partition for OpenCL
- Alternate Host malloc/free Extension for Zero Copy OpenCL Kernels
- The OpenCL Memory Model
- OpenCL Buffers
- Global Buffers
- Local Buffers
- Sub-Buffers
- Buffer Read/Write vs. Map/Unmap
- Discovering OpenCL Memory Sizes and Limits
- Cache Operations
- Large OpenCL buffers and Memory Beyond the 32-bit DSP Address Space
- Large Buffer Use Cases
- User Defined DSP Heap Extension
- User Defined DSP Heap Built-in Functions
- Allocation of the Underlying Memory for User Defined DSP Heaps
- Putting it all Together
- Device Memory
- Execution Model
- Terminology
- Device Discovery
- Understanding Kernels, Work-groups and Work-items
- Enqueueing a Kernel
- Mapping the OpenCL C work-item Built-in Functions
- OpenCL C Kernel Code
- NDRangeKernel Execution on DSP Devices
- Extensions
- Calling Standard C Code From OpenCL C Code
- Calling Standard C code with OpenMP from OpenCL C code
- OpenMP dispatch from OpenCL
- C66x standard C compiler intrinsic functions
- OpenCL C code using printf
- DMA Control Using EdmaMgr Functions
- Single Transfer EdmaMgr APIs
- Multiple Transfer EdmaMgr APIs
- Using Extended Memory on the 66AK2x device
- Fast Global buffers in on-chip MSMC memory
- OpenCL C Builtin Function Extensions
- Cache Operations
- Environment Variables
- Optimization Tips
- Optimization Techniques for Host Code
- Use Off-line, Embedded Compilation Model
- Avoid the read/write Buffer model on shared memory SoC platforms
- Use MSMC Buffers Whenever Possible
- Dispatch Appropriate Compute Loads
- Prefer Kernels with 1 work-item per work-group
- Optimization Techniques for Device (DSP) Code
- Prefer Kernels with 1 work-item per work-group
- Use Local Buffers
- Use async_work_group_copy and async_work_group_strided_copy
- Avoid DSP writes directly to DDR
- Use the reqd_work_group_size attribute on kernels
- Use the TI OpenCL extension that allows Standard C code to be called from OpenCL C code
- Avoid OpenCL C Barriers
- Use the most efficient data type on the DSP
- Do Not Use Large Vector Types
- Consecutive memory accesses
- Prefer the CPU style of writing OpenCL code over the GPU style
- Typical Steps to Optimize Device Code
- Optimizing 3x3 Gaussian smoothing filter
- Overview of Gaussian Filter
- Natural C Code
- Optimizing for DSP
- Performance Improvement
- Performance Data
- Examples
- Building and Running
- Example Descriptions
- platforms example
- simple example
- mandelbrot, mandelbrot_native examples
- ccode example
- matmpy example
- offline example
- vecadd_openmp example
- vecadd_openmp_t example
- vecadd example
- vecadd_mpax example
- vecadd_mpax_openmp example
- dsplib_fft example
- ooo, ooo_map examples
- null example
- sgemm example
- dgemm example
- edmamgr example
- dspheap example
- Float compute example
- Host Code (main.cpp)
- OpenCL C kernel code (dsp_compute.cl)
- Sample Output
- Monte Carlo example
- Algorithm for Gaussian Random Number Generation
- Executing the code
- Sample Output
- Debug
- Debug with printf
- Host side OpenCL application code
- DSP side OpenCL kernel code
- Debug with gdb
- Host side gdb
- DSP side debug with host side client gdbc6x
- Debug with CCS
- Connect emulator to EVM and CCS
- Debug DSP side code with CCS
- Debug with dsptop
- Profiling
- Host Side Profiling
- DSP Side Profiling
- OpenCL on TI-RTOS
- Overview
- OpenCL on RTOS Package
- Running Examples Shipped with OpenCL Package
- Basic OpenCL RTOS Application Development
- Building Application on Linux
- Building Application on Windows
- Creating an OpenCL RTOS Application
- Limited Customization: Participating DSP Core(s)
- Differences from OpenCL Linux (Host running Linux)
- Advanced OpenCL RTOS Application Development
- Overview
- Frequently Asked Questions
- How do I get support for TI OpenCL products?
- Which TI OpenCL Version is Installed?
- Using Python OpenCL with the TI OpenCL implementation
- Guidelines for porting Stand-alone DSP applications to OpenCL
- Heap Memory Management
- Stack Usage
- Boot Routine Dependencies
- Linker Command Files
- OpenCL Interoperability with Host OpenMP
- MCSDK-HPC to OpenCL Component Version Map
- Does TI’s OpenCL support images and samplers?
- Why does the OpenCL ICD installed on my platform not find the TI OpenCL implementation?
- Why do I get messages about /var/lock/opencl when running OpenCL applications?
- Why do I get DLOAD error messages when running OpenCL applications?
- How do I limit log file sizes on EVM’s temporary file storage (tmpfs)?
- 66AK2* EVMs
- AM57* EVMs
- Readme
- OpenCL v01.01.09.x Readme
- Platforms supported
- Release Notes
- Compiler Versions
- OpenCL v01.01.08.x Readme
- Platforms supported
- Release Notes
- Compiler Versions
- OpenCL v01.01.07.x Readme
- Platforms supported
- Release Notes
- Compiler Versions
- Disclaimer
- Important Notice