[OpenCV Official Tutorial] Translation 9: GPU Acceleration

Source: Internet. Editor: 程序博客网. Time: 2024/05/29 14:21

Squeeze every bit of computational power out of your system by using your video card to run the OpenCV algorithms.

Similarity check (PSNR and SSIM) on the GPU
Compatibility: > OpenCV 2.0
Author: Bernát Gábor
This will give a good grasp of how to approach coding on the GPU module, once you already know how to handle the other modules. As a test case it will port the similarity methods from the tutorial Video Input with OpenCV and similarity measurement to the GPU.

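Note: this chapter is marked as pending update on the official site.

Remember the tutorial Video Input with OpenCV and similarity measurement?

The PSNR and SSIM introduced there are both methods for checking how similar two images are. Both of them, the latter especially, turn out to be very slow.

The good news is that NVIDIA's CUDA GPUs can speed this computation up.

The corresponding code is here.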

double getPSNR(const Mat& I1, const Mat& I2)
{
    Mat s1;
    absdiff(I1, I2, s1);       // |I1 - I2|
    s1.convertTo(s1, CV_32F);  // cannot make a square on 8 bits
    s1 = s1.mul(s1);           // |I1 - I2|^2

    Scalar s = sum(s1);        // sum elements per channel

    double sse = s.val[0] + s.val[1] + s.val[2]; // sum channels

    if( sse <= 1e-10) // for small values return zero
        return 0;
    else
    {
        double mse = sse / (double)(I1.channels() * I1.total());
        double psnr = 10.0 * log10((255 * 255) / mse);
        return psnr;
    }
}

double getPSNR_CUDA(const Mat& I1, const Mat& I2)
{
    cuda::GpuMat gI1, gI2, gs, t1, t2;

    gI1.upload(I1);
    gI2.upload(I2);

    gI1.convertTo(t1, CV_32F);
    gI2.convertTo(t2, CV_32F);

    cuda::absdiff(t1.reshape(1), t2.reshape(1), gs);
    cuda::multiply(gs, gs, gs);

    Scalar s = cuda::sum(gs);
    double sse = s.val[0] + s.val[1] + s.val[2];

    if( sse <= 1e-10) // for small values return zero
        return 0;
    else
    {
        double mse = sse / (double)(gI1.channels() * I1.total());
        double psnr = 10.0 * log10((255 * 255) / mse);
        return psnr;
    }
}

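Comparing the two listings, the only extra step is loading each Mat into a cuda::GpuMat; the rest of the procedure is essentially unchanged, with each call replaced by its counterpart in the cuda namespace.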
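The arithmetic that both versions share can be checked without OpenCV or a GPU. The following is a minimal sketch on raw 8-bit buffers; `psnrBytes` is a hypothetical helper introduced here for illustration, not an OpenCV function, and it assumes both buffers have the same non-zero length.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OpenCV): PSNR of two equally sized
// 8-bit buffers, following the same steps as getPSNR above.
double psnrBytes(const std::vector<unsigned char>& a,
                 const std::vector<unsigned char>& b)
{
    assert(a.size() == b.size() && !a.empty());

    double sse = 0.0;                       // sum of squared differences
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sse += d * d;
    }

    if (sse <= 1e-10)                       // same convention as getPSNR:
        return 0.0;                         // (near-)identical inputs give 0

    const double mse = sse / static_cast<double>(a.size());
    return 10.0 * std::log10((255.0 * 255.0) / mse);
}
```

For example, if every sample differs by 10, the MSE is 100 and the result is 10 * log10(650.25), roughly 28.13 dB.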

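Notes

Keep the number of CPU-to-GPU transfers low; too many of them make the GPU version slower, not faster.

A GPU contains many independent processing units, which makes parallel processing possible; the more units it has, the more processing power it offers.

A GPU has its own memory, so the input has to be uploaded before processing and the result downloaded afterwards. Small jobs are a poor fit for the GPU because the upload and download can take longer than the computation itself.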
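The trade-off between transfer time and compute time can be put into numbers. The following is a back-of-the-envelope sketch, not something taken from the tutorial: `OffloadCost`, `gpuTotalSeconds` and `worthOffloading` are hypothetical names, and every field is an estimate the caller has to measure or supply.

```cpp
#include <cassert>

// Hypothetical cost model: all values are caller-supplied estimates.
struct OffloadCost {
    double bytesUp;      // bytes uploaded to the GPU
    double bytesDown;    // bytes downloaded from the GPU
    double bandwidth;    // host <-> device bandwidth in bytes per second
    double gpuCompute;   // GPU compute time in seconds
    double cpuCompute;   // CPU compute time in seconds for the same job
};

// Total GPU-side time: transfers in both directions plus the computation.
double gpuTotalSeconds(const OffloadCost& c)
{
    assert(c.bandwidth > 0.0);
    return (c.bytesUp + c.bytesDown) / c.bandwidth + c.gpuCompute;
}

// Offloading pays off only if the GPU total beats the CPU time.
bool worthOffloading(const OffloadCost& c)
{
    return gpuTotalSeconds(c) < c.cpuCompute;
}
```

With made-up figures: uploading 1 MB at 1 GB/s already costs about 1 ms, so a job the CPU finishes in 0.5 ms is not worth offloading however fast the GPU kernel is, while a job that takes the CPU 50 ms can be.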

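The data of a Mat is stored in CPU (host) memory; its counterpart on the GPU is cv::cuda::GpuMat. The latter supports 2D data only, and its functions do not return references. Upload data with the upload function; according to the tutorial, the result can be downloaded by simple assignment (=) to a Mat or with the download function.

Reduce the number of GPU memory allocations to keep I/O efficient.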
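The "allocate once, reuse later" idea can be sketched in plain C++, with host memory standing in for GPU memory. This is only an analogy under stated assumptions: `ScratchBuffer` and `sumSquaredDiff` are hypothetical names, and the `std::vector` plays the role that a persistent GpuMat kept alive between calls would play.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scratch storage owned by the caller and reused across calls.
struct ScratchBuffer {
    std::vector<float> diff;
};

// Sum of squared differences, writing intermediates into caller-owned storage.
double sumSquaredDiff(const std::vector<unsigned char>& a,
                      const std::vector<unsigned char>& b,
                      ScratchBuffer& buf)
{
    assert(a.size() == b.size());
    buf.diff.resize(a.size());   // allocates only if the capacity must grow

    double sse = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        buf.diff[i] = static_cast<float>(a[i]) - static_cast<float>(b[i]);
        sse += static_cast<double>(buf.diff[i]) * buf.diff[i];
    }
    return sse;
}
```

When the function is called for every frame of a video with the same `ScratchBuffer`, only the first call allocates; later calls with frames of the same size reuse the existing storage.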

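Avoid unnecessary data transfers, which is again an I/O issue. Compute in place where possible. For example, change

b.t1 = 2 * b.mu1_mu2 + C1;

to

cuda::multiply(b.mu1_mu2, 2, b.t1); // b.t1 = 2 * b.mu1_mu2 + C1;

cuda::add(b.t1, C1, b.t1);

Use asynchronous calls.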

0 0