CUDAExample-0-asyncAPI
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 10:06
Tags: CUDAExample
This sample illustrates the usage of CUDA events for both GPU timing and overlapping CPU and GPU execution. Events are inserted into a stream of CUDA calls. Since CUDA stream calls are asynchronous, the CPU can perform computations while the GPU is executing (including DMA memcopies between the host and device). The CPU can query CUDA events to determine whether the GPU has completed its tasks.
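The sample mainly demonstrates that the GPU and the CPU can execute at the same time: while the GPU is working, the CPU is working too. The example first issues an addition to the GPU, has the CPU count in a loop in the meantime, and then prints the time used by the GPU and the CPU's count.
Querying the GPU hardware information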
int devID;
cudaDeviceProp deviceProps; // struct variable that holds the hardware information

// This will pick the best possible CUDA capable device
devID = findCudaDevice(argc, (const char **)argv); // find a suitable device; only one is selected

// get device name
checkCudaErrors(cudaGetDeviceProperties(&deviceProps, devID));
printf("CUDA device [%s]\n", deviceProps.name);
Output:
CUDA device [GeForce GTX 980]
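The function findCudaDevice() comes from the header file helper_cuda.h. It selects the hardware: the device specified by the user takes priority, and if the user did not specify one, it selects the device with the highest Gflops (billions of floating-point operations per second) and returns its device ID.
Main code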
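The selection rule described above can be approximated without helper_cuda.h. The following is only a rough sketch, not the helper's actual implementation: `pickDeviceSketch` is a hypothetical name, it ignores command-line arguments, and it ranks devices by multiprocessor count times clock rate as a simple stand-in for a Gflops estimate.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the ID of the device with the largest
// (SM count x clock rate) product, or -1 if no usable device is found.
// This product is an assumed proxy for peak throughput, not the exact
// metric used by findCudaDevice().
int pickDeviceSketch()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return -1;

    int best = -1;
    long long bestScore = -1;
    for (int dev = 0; dev < count; ++dev) {
        int smCount = 0, clockKHz = 0;
        if (cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, dev) != cudaSuccess)
            continue;
        if (cudaDeviceGetAttribute(&clockKHz, cudaDevAttrClockRate, dev) != cudaSuccess)
            continue;
        long long score = (long long)smCount * clockKHz;
        if (score > bestScore) {
            bestScore = score;
            best = dev;
        }
    }
    return best;
}

int main()
{
    int devID = pickDeviceSketch();
    if (devID < 0) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    if (cudaSetDevice(devID) != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed\n");
        return 1;
    }

    cudaDeviceProp props;
    if (cudaGetDeviceProperties(&props, devID) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed\n");
        return 1;
    }
    printf("CUDA device [%s]\n", props.name);
    return 0;
}
```

On a machine with a single GPU this simply selects device 0; the ranking only matters when several devices are installed.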
// create cuda event handles
cudaEvent_t start, stop;
checkCudaErrors(cudaEventCreate(&start));
checkCudaErrors(cudaEventCreate(&stop));

StopWatchInterface *timer = NULL;
sdkCreateTimer(&timer);
sdkResetTimer(&timer);

checkCudaErrors(cudaDeviceSynchronize());
float gpu_time = 0.0f;

// asynchronously issue work to the GPU (all to stream 0)
sdkStartTimer(&timer);
cudaEventRecord(start, 0);
cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value); // kernel that performs the addition in parallel on the GPU
cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
cudaEventRecord(stop, 0);
sdkStopTimer(&timer);

// have CPU do some work while waiting for stage 1 to finish
unsigned long int counter = 0;

while (cudaEventQuery(stop) == cudaErrorNotReady) // the CPU keeps counting until the GPU finishes
{
    counter++;
}

checkCudaErrors(cudaEventElapsedTime(&gpu_time, start, stop));

// print the cpu and gpu times
printf("time spent executing by the GPU: %.2f\n", gpu_time);
printf("time spent by CPU in CUDA calls: %.2f\n", sdkGetTimerValue(&timer));
printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);
Output:
CUDA device [GeForce GTX 980]
time spent executing by the GPU: 12.40
time spent by CPU in CUDA calls: 0.04
CPU executed 2439 iterations while waiting for GPU to finish
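The output shows that the CPU and the GPU can work at the same time: the CPU needed only 0.04 ms to issue the CUDA calls and then ran 2439 loop iterations during the 12.40 ms the GPU spent on the copies and the kernel.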
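The main code above is an excerpt: it does not show the kernel, the buffer allocation, or the launch configuration. A self-contained sketch of the same pattern, without the SDK helper headers, might look like the following. The `CHECK` macro, the kernel body, the bounds check, the problem size, the increment value and the block size are all assumptions made for illustration and are not taken from the post. The host buffer is allocated with cudaMallocHost because asynchronous copies only overlap with host work when the host memory is page-locked.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical stand-in for checkCudaErrors from helper_cuda.h.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",            \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Assumed kernel body: each thread adds inc_value to one element.
__global__ void increment_kernel(int *g_data, int inc_value, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        g_data[idx] += inc_value;
}

int main()
{
    const int n = 16 * 1024 * 1024;            // assumed element count
    const size_t nbytes = n * sizeof(int);
    const int value = 26;                      // assumed increment

    int *a = NULL, *d_a = NULL;
    CHECK(cudaMallocHost((void **)&a, nbytes)); // pinned host memory
    CHECK(cudaMalloc((void **)&d_a, nbytes));
    memset(a, 0, nbytes);

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    const int threads = 512;                   // assumed block size
    const int blocks = (n + threads - 1) / threads;

    // Issue copy, kernel, copy to stream 0; none of these calls blocks the CPU.
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0));
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0));
    CHECK(cudaEventRecord(stop, 0));

    // Poll the stop event; the CPU counts until the GPU has passed it.
    unsigned long counter = 0;
    cudaError_t status;
    while ((status = cudaEventQuery(stop)) == cudaErrorNotReady)
        counter++;
    CHECK(status);

    float gpu_time = 0.0f;                     // milliseconds
    CHECK(cudaEventElapsedTime(&gpu_time, start, stop));

    // Verify that every element was incremented exactly once.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (a[i] != value) { ok = false; break; }
    }

    printf("time spent executing by the GPU: %.2f ms\n", gpu_time);
    printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);
    printf("result %s\n", ok ? "correct" : "WRONG");

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFreeHost(a));
    CHECK(cudaFree(d_a));
    return ok ? 0 : 1;
}
```

The timing and the iteration count depend on the hardware, so they will differ from the numbers shown in the output above. Checking the final value returned by cudaEventQuery matters because the loop also exits when the query fails with an error other than cudaErrorNotReady.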
End