CUDA Timing

Source: Internet | Publisher: 从零开始写python爬虫 | Editor: 程序博客网 | Time: 2024/05/29 18:35

from: http://blog.sina.com.cn/s/blog_45209f340101341e.html

<1> Using the timer functions in cutil.h
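unsigned int timer = 0;
// Create the timer
cutCreateTimer(&timer);
// Start timing
cutStartTimer(timer);
{
// Code section to be timed
…………
}
// Kernel launches return immediately, so wait for the GPU to finish before stopping this host-side timer
cudaDeviceSynchronize();
// Stop timing
cutStopTimer(timer);
// Read the time between start and stop, in milliseconds
float elapsed_ms = cutGetTimerValue(timer);
// Delete the timer
cutDeleteTimer(timer);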
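A host-side stopwatch with the same create/start/stop/read shape can be written without cutil.h by using the POSIX monotonic clock. The sketch below is plain C (also valid host code in a .cu file) and assumes a POSIX system that provides clock_gettime(CLOCK_MONOTONIC); the stopwatch type and the stopwatch_* functions are hypothetical names, not part of any CUDA API.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() and CLOCK_MONOTONIC */
#include <assert.h>
#include <time.h>

/* Hypothetical stopwatch mirroring the create/start/stop/read pattern
   of the cutil timer. Not a CUDA API. */
typedef struct {
    struct timespec start;  /* time of the most recent stopwatch_start() */
    double elapsed_ms;      /* accumulated time in milliseconds */
    int running;            /* nonzero between start and stop */
} stopwatch;

/* Difference b - a in milliseconds. */
static double timespec_diff_ms(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e3
         + (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

/* Must be called once before the first stopwatch_start(). */
void stopwatch_reset(stopwatch *sw)
{
    sw->elapsed_ms = 0.0;
    sw->running = 0;
}

void stopwatch_start(stopwatch *sw)
{
    assert(!sw->running);
    clock_gettime(CLOCK_MONOTONIC, &sw->start);
    sw->running = 1;
}

void stopwatch_stop(stopwatch *sw)
{
    struct timespec now;
    assert(sw->running);
    clock_gettime(CLOCK_MONOTONIC, &now);
    sw->elapsed_ms += timespec_diff_ms(&sw->start, &now);
    sw->running = 0;
}

/* Total time of all completed start/stop pairs, in milliseconds. */
double stopwatch_value_ms(const stopwatch *sw)
{
    return sw->elapsed_ms;
}
```

As with the cutil timer, timing GPU work this way only makes sense if cudaDeviceSynchronize() is called before stopwatch_stop(). This sketch accumulates time over repeated start/stop pairs until stopwatch_reset() is called again.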

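I am not sure what the timing precision is in this case. Note that cutil.h was a helper header from the old CUDA SDK samples rather than part of the CUDA toolkit, and it was removed in CUDA 5.0; later CUDA samples offer a similar interface in helper_timer.h (sdkCreateTimer, sdkStartTimer, sdkStopTimer, sdkGetTimerValue, sdkDeleteTimer).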

<2> The clock() function in time.h
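#include <time.h>

clock_t start, finish;
float costtime;
start = clock();
{
// Code section to be timed
…………
}
// Kernel launches are asynchronous, so wait for the GPU before reading the clock again
cudaDeviceSynchronize();
finish = clock();
// Time between the two readings, in seconds
costtime = (float)(finish - start) / CLOCKS_PER_SEC;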
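If one clock tick is 1 millisecond long (CLOCKS_PER_SEC is 1000 in the Microsoft C runtime, for example), the timing precision is also only 1 millisecond. Also note that on POSIX systems clock() returns the processor time used by the process, not elapsed wall-clock time, so time the host spends blocked while waiting for the GPU may not be counted.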
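The difference between processor time and wall-clock time can be checked directly. The sketch below, assuming a POSIX system, compares clock() with clock_gettime(CLOCK_MONOTONIC) across a 200 ms nanosleep(); the sleep is a hypothetical stand-in for a host thread that is blocked while the GPU works, and the helper names wall_seconds, cpu_seconds and sleep_ms are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() and nanosleep() */
#include <assert.h>
#include <time.h>

/* Elapsed wall-clock seconds from the monotonic clock. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Processor time used by this process so far, in seconds (via clock()). */
double cpu_seconds(void)
{
    clock_t c = clock();
    assert(c != (clock_t)-1);  /* clock() returns -1 if the time is unavailable */
    return (double)c / CLOCKS_PER_SEC;
}

/* Block the calling thread for ms milliseconds without using the CPU
   (hypothetical stand-in for a host thread waiting on the GPU). */
void sleep_ms(long ms)
{
    struct timespec d = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&d, NULL);
}
```

On Linux the clock() difference across sleep_ms(200) stays near zero while the monotonic difference is about 0.2 s. Whether a real wait on the GPU shows up in clock() depends on whether the CUDA driver spins or blocks while waiting, so a wall-clock timer or CUDA events are the safer choice.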

<3> Events (cudaEvent_t)
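cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, 0);
{
// Code section to be timed
…………
}
cudaEventRecord(stop, 0);
// Wait until the stop event has completed; before that the elapsed time is not available
cudaEventSynchronize(stop);
float costtime;  // elapsed time in milliseconds
cudaEventElapsedTime(&costtime, start, stop);
// Release the events
cudaEventDestroy(start);
cudaEventDestroy(stop);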

cudaError_t cudaEventCreate(cudaEvent_t* event) --- creates an event object;
cudaError_t cudaEventRecord(cudaEvent_t event, cudaStream_t stream) --- records an event;
cudaError_t cudaEventElapsedTime(float* time, cudaEvent_t start, cudaEvent_t end) --- computes the time elapsed between two events;
cudaError_t cudaEventDestroy(cudaEvent_t event) --- destroys an event object.
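cudaEventElapsedTime() computes the time elapsed between two events in milliseconds, with a resolution of about 0.5 microseconds. If either event has not yet been recorded, the function returns cudaErrorInvalidValue. If either event was recorded in a non-zero stream, the result is undefined. (This is the behavior described in older CUDA reference manuals; more recent versions return cudaErrorNotReady when an event has been recorded but has not yet completed, and state only that the result may be larger than expected for events recorded in a non-NULL stream.)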

The following example is taken from the CUDA_C_Best_Practices_Guide:

cudaEvent_t start, stop;
float time;

cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord( start, 0 );
// grid, threads, d_odata, d_idata, size_x, size_y and NUM_REPS are defined elsewhere in the guide's example
kernel<<<grid,threads>>> ( d_odata, d_idata, size_x, size_y, NUM_REPS);
cudaEventRecord( stop, 0 );
cudaEventSynchronize( stop );

cudaEventElapsedTime( &time, start, stop );
cudaEventDestroy( start );
cudaEventDestroy( stop );

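Note that the call to cudaEventSynchronize() is indispensable. CUDA kernels execute asynchronously: a launch returns to the host immediately, so without a synchronization point the elapsed time would be queried before the GPU has finished, and the measurement would be wrong. cudaEventSynchronize(stop) blocks the host until the GPU has completed all work issued before cudaEventRecord(stop, 0), that is, until the stop event has actually been reached and timestamped, so it acts as a synchronization point.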
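The effect of an asynchronous launch can be reproduced on the CPU without a GPU. In the following C analogy, assuming a POSIX system with pthreads, a worker thread plays the role of the kernel (fake_kernel, which just sleeps for 100 ms, is a hypothetical stand-in), pthread_create() plays the role of a launch that returns immediately, and pthread_join() plays the role of cudaEventSynchronize(stop). It only illustrates the timing pitfall; it is not how CUDA schedules work.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() and nanosleep() */
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Elapsed wall-clock seconds from the monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical stand-in for a kernel: 100 ms of "work" on another thread. */
static void *fake_kernel(void *arg)
{
    struct timespec d = { 0, 100000000L };
    (void)arg;
    nanosleep(&d, NULL);
    return NULL;
}

/* Measures the "launch" twice: right after it returns (no synchronization)
   and after waiting for it to finish (with synchronization). */
void time_async(double *without_sync, double *with_sync)
{
    pthread_t worker;
    double t0 = now_seconds();
    int rc = pthread_create(&worker, NULL, fake_kernel, NULL);
    assert(rc == 0);
    *without_sync = now_seconds() - t0;  /* only the cost of starting the work */
    pthread_join(worker, NULL);          /* synchronization point */
    *with_sync = now_seconds() - t0;     /* start-up cost plus the whole workload */
}
```

Compile with -pthread. The first measurement is a tiny fraction of the 100 ms workload while the second covers all of it; the same gap appears when a kernel launch is timed with a host clock and no synchronization point.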

0 0