CUDA Summary: Synchronization
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 18:48
Synchronization behavior of the CUDA Runtime API
From the CUDA Runtime API documentation, section 2, "API synchronization behavior":
The API provides memcpy/memset functions in both synchronous and asynchronous forms, the latter having an “Async” suffix. This is a misnomer as each function may exhibit synchronous or asynchronous behavior depending on the arguments passed to the function. In the reference documentation, each memcpy function is categorized as synchronous or asynchronous, corresponding to the definitions below.
Memcpy
Synchronous
Synchronous copy operations, i.e. the default copy functions without the Async suffix.
All transfers involving Unified Memory regions are fully synchronous with respect to the host.
For transfers from pageable host memory to device memory, a stream sync is performed before the copy is initiated. The function will return once the pageable buffer has been copied to the staging memory for DMA transfer to device memory, but the DMA to final destination may not have completed.
For transfers from pinned host memory to device memory, the function is synchronous with respect to the host.
For transfers from device to either pageable or pinned host memory, the function returns only once the copy has completed.
For transfers from device memory to device memory, no host-side synchronization is performed.
For transfers from any host memory to any host memory, the function is fully synchronous with respect to the host.
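The synchronous rules above can be illustrated with a minimal sketch, assuming a single CUDA-capable device and pageable host buffers. The `CHECK` macro is a hypothetical helper introduced here for error handling, not part of the CUDA API. The point it shows is that the host may read the destination buffer as soon as a device-to-host `cudaMemcpy` returns.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t e_ = (call);                                  \
        if (e_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s\n", cudaGetErrorString(e_));      \
            exit(1);                                              \
        }                                                         \
    } while (0)

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);

    // std::vector storage is ordinary pageable host memory.
    std::vector<int> h_src(n), h_dst(n, -1);
    for (int i = 0; i < n; ++i) h_src[i] = i;

    int *d_buf = nullptr;
    CHECK(cudaMalloc((void **)&d_buf, bytes));

    // Pageable host -> device: the call may return once the data has been
    // staged for DMA, before it has reached its final destination.
    CHECK(cudaMemcpy(d_buf, h_src.data(), bytes, cudaMemcpyHostToDevice));

    // Device -> host: returns only once the copy has completed,
    // so h_dst can be read right away.
    CHECK(cudaMemcpy(h_dst.data(), d_buf, bytes, cudaMemcpyDeviceToHost));

    for (int i = 0; i < n; ++i) {
        if (h_dst[i] != i) { fprintf(stderr, "mismatch at %d\n", i); return 1; }
    }
    printf("round trip ok\n");

    CHECK(cudaFree(d_buf));
    return 0;
}
```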
Asynchronous
Asynchronous copy operations; the Async versions of the memcpy functions must be called explicitly.
For transfers from device memory to pageable host memory, the function will return only once the copy has completed.
For transfers from any host memory to any host memory, the function is fully synchronous with respect to the host.
For all other transfers, the function is fully asynchronous. If pageable memory must first be staged to pinned memory, this will be handled asynchronously with a worker thread.
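As a sketch of the asynchronous case, the example below uses pinned host memory and a non-default stream, so that both `cudaMemcpyAsync` calls fall under "all other transfers" and return before the copy has finished. The `CHECK` macro and the buffer names are assumptions made for illustration. The host waits on the stream before it reads the result.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t e_ = (call);                                  \
        if (e_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s\n", cudaGetErrorString(e_));      \
            exit(1);                                              \
        }                                                         \
    } while (0)

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);

    // Pinned (page-locked) host buffers, required for truly asynchronous copies.
    int *h_in = nullptr, *h_out = nullptr;
    CHECK(cudaMallocHost((void **)&h_in, bytes));
    CHECK(cudaMallocHost((void **)&h_out, bytes));
    for (int i = 0; i < n; ++i) { h_in[i] = i; h_out[i] = -1; }

    int *d_buf = nullptr;
    CHECK(cudaMalloc((void **)&d_buf, bytes));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Both calls only enqueue work on the stream and return to the host.
    CHECK(cudaMemcpyAsync(d_buf, h_in, bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaMemcpyAsync(h_out, d_buf, bytes, cudaMemcpyDeviceToHost, stream));

    // h_out must not be read until the stream has finished.
    CHECK(cudaStreamSynchronize(stream));

    for (int i = 0; i < n; ++i) {
        if (h_out[i] != i) { fprintf(stderr, "mismatch at %d\n", i); return 1; }
    }
    printf("async round trip ok\n");

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_out));
    return 0;
}
```

Work enqueued on the same stream runs in order, which is why the second copy sees the result of the first without any extra synchronization between them.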
Memset
The synchronous memset functions are asynchronous with respect to the host except when the target is pinned host memory or a Unified Memory region, in which case they are fully synchronous. The Async versions are always asynchronous with respect to the host.
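For the synchronous memset functions, only pinned host memory and Unified Memory targets are handled synchronously; every other target is asynchronous. The Async versions are always asynchronous. In other words, memset is effectively asynchronous, so take care when using it: the CPU side must wait for the memset to complete before relying on its result.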
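A minimal sketch of waiting for a memset on ordinary device memory, assuming a single device. `CHECK` is again a hypothetical helper. Here `cudaDeviceSynchronize()` makes the wait explicit; the later device-to-host `cudaMemcpy` on the default stream would also be ordered after the memset.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t e_ = (call);                                  \
        if (e_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s\n", cudaGetErrorString(e_));      \
            exit(1);                                              \
        }                                                         \
    } while (0)

int main() {
    const size_t bytes = 1 << 20;

    unsigned char *d_buf = nullptr;
    CHECK(cudaMalloc((void **)&d_buf, bytes));

    // On device memory this call may return before the fill has finished.
    CHECK(cudaMemset(d_buf, 0xFF, bytes));

    // Explicitly wait for all outstanding device work, including the memset.
    CHECK(cudaDeviceSynchronize());

    std::vector<unsigned char> h_buf(bytes, 0);
    CHECK(cudaMemcpy(h_buf.data(), d_buf, bytes, cudaMemcpyDeviceToHost));

    for (size_t i = 0; i < bytes; ++i) {
        if (h_buf[i] != 0xFF) { fprintf(stderr, "mismatch at %zu\n", i); return 1; }
    }
    printf("memset ok\n");

    CHECK(cudaFree(d_buf));
    return 0;
}
```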
Kernel Launches
Kernel launches are asynchronous with respect to the host. Details of concurrent kernel execution and data transfers can be found in the CUDA Programmers Guide.
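Kernel launches are, of course, asynchronous.
Note: "asynchronous" above always means asynchronous with respect to the CPU. After calling a CUDA function, the CPU immediately moves on to the next instruction and does not wait for the CUDA function to finish.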
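The asynchronous launch can be sketched as follows, assuming a trivial kernel named `add_one` (a hypothetical name) and a hypothetical `CHECK` helper. The launch returns immediately, `cudaGetLastError()` reports launch-time errors, and `cudaDeviceSynchronize()` blocks the host until the kernel has finished.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t e_ = (call);                                  \
        if (e_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s\n", cudaGetErrorString(e_));      \
            exit(1);                                              \
        }                                                         \
    } while (0)

// Illustrative kernel: add 1 to every element.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);

    int *d_data = nullptr;
    CHECK(cudaMalloc((void **)&d_data, bytes));
    CHECK(cudaMemset(d_data, 0, bytes));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Control returns to the host right after the launch is queued.
    add_one<<<blocks, threads>>>(d_data, n);
    CHECK(cudaGetLastError());       // errors detected at launch time
    CHECK(cudaDeviceSynchronize());  // wait for the kernel to finish

    std::vector<int> h_data(n);
    CHECK(cudaMemcpy(h_data.data(), d_data, bytes, cudaMemcpyDeviceToHost));

    for (int i = 0; i < n; ++i) {
        if (h_data[i] != 1) { fprintf(stderr, "mismatch at %d\n", i); return 1; }
    }
    printf("kernel ok\n");

    CHECK(cudaFree(d_data));
    return 0;
}
```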
Synchronization functions
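__syncthreads() synchronizes all threads within a block. In effect it synchronizes the warps within the block, because the threads inside a warp have traditionally executed in lockstep. On Volta and later GPUs, which use independent thread scheduling, that lockstep is not guaranteed, so warp-level code should synchronize explicitly with __syncwarp().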
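A small sketch of __syncthreads() in use, assuming one block of 256 threads and a hypothetical kernel named `reverse_block`. Each thread writes one element into shared memory, and the barrier guarantees that all writes are visible before any thread reads an element written by another thread.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t e_ = (call);                                  \
        if (e_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s\n", cudaGetErrorString(e_));      \
            exit(1);                                              \
        }                                                         \
    } while (0)

constexpr int kThreads = 256;

// Reverse kThreads elements in place using shared memory.
__global__ void reverse_block(int *data) {
    __shared__ int tile[kThreads];
    int t = threadIdx.x;
    tile[t] = data[t];
    // Without this barrier a thread could read a tile slot
    // that its owner has not written yet.
    __syncthreads();
    data[t] = tile[kThreads - 1 - t];
}

int main() {
    int h_data[kThreads];
    for (int i = 0; i < kThreads; ++i) h_data[i] = i;

    int *d_data = nullptr;
    CHECK(cudaMalloc((void **)&d_data, sizeof(h_data)));
    CHECK(cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice));

    reverse_block<<<1, kThreads>>>(d_data);
    CHECK(cudaGetLastError());

    // The device-to-host copy returns only after the kernel and copy are done.
    CHECK(cudaMemcpy(h_data, d_data, sizeof(h_data), cudaMemcpyDeviceToHost));

    for (int i = 0; i < kThreads; ++i) {
        if (h_data[i] != kThreads - 1 - i) { fprintf(stderr, "mismatch at %d\n", i); return 1; }
    }
    printf("reverse ok\n");

    CHECK(cudaFree(d_data));
    return 0;
}
```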
__syncthreads() does not synchronize across blocks; coordination between blocks requires atomic operations.
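One way blocks can coordinate through atomics is sketched below, assuming a hypothetical kernel named `sum_all`. Each block accumulates a partial sum in shared memory and then uses a single `atomicAdd` to merge it into a global total. This is not a full barrier across blocks; it only makes the concurrent updates to the shared total safe.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t e_ = (call);                                  \
        if (e_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s\n", cudaGetErrorString(e_));      \
            exit(1);                                              \
        }                                                         \
    } while (0)

// Sum n integers into *total, one global atomicAdd per block.
__global__ void sum_all(const int *data, int n, int *total) {
    __shared__ int block_sum;
    if (threadIdx.x == 0) block_sum = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&block_sum, data[i]);  // combine within the block
    __syncthreads();

    // Blocks run in no particular order; the atomic keeps the update safe.
    if (threadIdx.x == 0) atomicAdd(total, block_sum);
}

int main() {
    const int n = 1 << 20;
    std::vector<int> h_data(n, 1);

    int *d_data = nullptr, *d_total = nullptr;
    CHECK(cudaMalloc((void **)&d_data, n * sizeof(int)));
    CHECK(cudaMalloc((void **)&d_total, sizeof(int)));
    CHECK(cudaMemcpy(d_data, h_data.data(), n * sizeof(int), cudaMemcpyHostToDevice));
    CHECK(cudaMemset(d_total, 0, sizeof(int)));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    sum_all<<<blocks, threads>>>(d_data, n, d_total);
    CHECK(cudaGetLastError());

    int h_total = 0;
    CHECK(cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost));

    if (h_total != n) { fprintf(stderr, "unexpected total %d\n", h_total); return 1; }
    printf("sum ok\n");

    CHECK(cudaFree(d_data));
    CHECK(cudaFree(d_total));
    return 0;
}
```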