
Source: Internet  Editor: 程序博客网  Date: 2024/06/05 10:06

Tags: CUDAExample

This sample illustrates the use of CUDA events for both GPU timing and overlapping CPU and GPU execution. Events are inserted into a stream of CUDA calls. Because calls issued to a CUDA stream are asynchronous, the CPU can perform computations while the GPU is executing (including DMA memory copies between the host and device). The CPU can query CUDA events to determine whether the GPU has completed its tasks.



    int devID;
    cudaDeviceProp deviceProps; // struct variable that holds the hardware information

    // This will pick the best possible CUDA capable device
    devID = findCudaDevice(argc, (const char **)argv); // find a suitable device; only one is selected

    // get device name
    checkCudaErrors(cudaGetDeviceProperties(&deviceProps, devID));
    printf("CUDA device [%s]\n", deviceProps.name);

CUDA device [GeForce GTX 980]
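`findCudaDevice` and `checkCudaErrors` come from the CUDA samples' helper headers, not from the CUDA runtime itself. As a rough sketch of device selection with only runtime API calls, something like the following could stand in when those headers are not available. The function name `pick_device_by_sm_count` and the "most multiprocessors wins" rule are assumptions made for illustration; the helper's own heuristic may differ.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA samples): selects the device with
// the most multiprocessors and makes it current. Returns the device index,
// or -1 if no usable CUDA device is found.
int pick_device_by_sm_count()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return -1;

    int best = -1;
    int best_sms = -1;
    for (int i = 0; i < count; ++i)
    {
        cudaDeviceProp props;
        if (cudaGetDeviceProperties(&props, i) != cudaSuccess)
            continue; // skip devices whose properties cannot be read

        if (props.multiProcessorCount > best_sms)
        {
            best_sms = props.multiProcessorCount;
            best = i;
        }
    }

    if (best < 0 || cudaSetDevice(best) != cudaSuccess)
        return -1;

    return best;
}
```

After a successful call, `cudaGetDeviceProperties` on the returned index gives the same `deviceProps.name` string that the excerpt above prints.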



    // create cuda event handles
    cudaEvent_t start, stop;
    checkCudaErrors(cudaEventCreate(&start));
    checkCudaErrors(cudaEventCreate(&stop));

    StopWatchInterface *timer = NULL;
    sdkCreateTimer(&timer);
    sdkResetTimer(&timer);

    checkCudaErrors(cudaDeviceSynchronize());
    float gpu_time = 0.0f;

    // asynchronously issue work to the GPU (all to stream 0)
    sdkStartTimer(&timer);
    cudaEventRecord(start, 0);
    cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value); // kernel that performs the additions in parallel on the GPU
    cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(stop, 0);
    sdkStopTimer(&timer);

    // have CPU do some work while waiting for stage 1 to finish
    unsigned long int counter = 0;

    while (cudaEventQuery(stop) == cudaErrorNotReady) // the CPU keeps counting until the GPU finishes
    {
        counter++;
    }

    checkCudaErrors(cudaEventElapsedTime(&gpu_time, start, stop));

    // print the cpu and gpu times
    printf("time spent executing by the GPU: %.2f\n", gpu_time);
    printf("time spent by CPU in CUDA calls: %.2f\n", sdkGetTimerValue(&timer));
    printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);

CUDA device [GeForce GTX 980]
time spent executing by the GPU: 12.40
time spent by CPU in CUDA calls: 0.04
CPU executed 2439 iterations while waiting for GPU to finish
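The excerpt uses `a`, `d_a`, `nbytes`, `blocks`, `threads`, `value` and `increment_kernel` without showing where they come from. The sketch below is one plausible setup, not necessarily what the sample does: the kernel body, the block size of 512, and the driver function `run_overlap_demo` are all assumptions made for illustration. The host buffer is allocated with `cudaMallocHost` because `cudaMemcpyAsync` only overlaps with CPU work when the host memory is page-locked.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Adds inc_value to one element per thread. There is no bounds check, so the
// launch must cover exactly the number of elements in g_data.
__global__ void increment_kernel(int *g_data, int inc_value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    g_data[idx] = g_data[idx] + inc_value;
}

// Hypothetical driver (name and signature are assumptions). Returns true if
// every element equals `value` after the round trip; reports the GPU time in
// milliseconds and the number of CPU polling iterations.
bool run_overlap_demo(int n, int value, float *gpu_ms, unsigned long *polls)
{
    const int block_size = 512; // assumed block size
    if (n <= 0 || n % block_size != 0)
        return false; // the kernel needs n to be a multiple of the block size

    size_t nbytes = (size_t)n * sizeof(int);
    int *a = NULL;   // page-locked (pinned) host buffer
    int *d_a = NULL; // device buffer
    cudaEvent_t start = NULL, stop = NULL;
    unsigned long counter = 0;

    bool ok = cudaMallocHost((void **)&a, nbytes) == cudaSuccess &&
              cudaMalloc((void **)&d_a, nbytes) == cudaSuccess &&
              cudaEventCreate(&start) == cudaSuccess &&
              cudaEventCreate(&stop) == cudaSuccess;

    if (ok)
    {
        memset(a, 0, nbytes); // start from zeros so the expected result is `value`

        dim3 threads(block_size, 1);
        dim3 blocks(n / block_size, 1);

        // Same issue order as the excerpt: record, copy in, kernel, copy out, record.
        cudaEventRecord(start, 0);
        cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
        increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value);
        cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
        cudaEventRecord(stop, 0);

        // The CPU counts while the GPU works; any status other than
        // cudaErrorNotReady (success or a real error) ends the loop.
        cudaError_t status;
        while ((status = cudaEventQuery(stop)) == cudaErrorNotReady)
            counter++;

        ok = status == cudaSuccess &&
             cudaEventElapsedTime(gpu_ms, start, stop) == cudaSuccess;

        // Verify the data that came back from the device.
        for (int i = 0; ok && i < n; ++i)
            ok = (a[i] == value);
    }

    *polls = counter;

    if (start) cudaEventDestroy(start);
    if (stop) cudaEventDestroy(stop);
    if (d_a) cudaFree(d_a);
    if (a) cudaFreeHost(a);
    return ok;
}
```

Calling `run_overlap_demo(1 << 24, 26, &ms, &polls)` from `main` and printing `ms` and `polls` should give output of the same shape as above, though the actual numbers depend on the GPU, the host CPU, and the buffer size.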


