Simple Image Processing Optimization, Part 04: Loop Unrolling
Source: Internet | Published by: java web开发案例精粹 | Editor: 程序博客网 | Time: 2024/05/17 08:40
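Unrolling the inner loop
The image is more than 4K pixels wide, so the inner loop is a natural candidate for unrolling. This attempt processes 4 pixels per iteration, which cuts the inner loop's iteration count to 1/4 of the original.
The code is as follows: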
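unsigned int count = width & ~3;    /* largest multiple of 4 not exceeding width */
unsigned int remains = width & 3;   /* leftover pixels per row (0 to 3) */
int gray;
for(unsigned int h = 0; h < height; h++)
{
    /* main part of the row: 4 pixels per iteration */
    for(unsigned int w = 0; w < count; w += 4)
    {
        GRAY_PIXEL(buffer, gray);
        GRAY_PIXEL(buffer, gray);
        GRAY_PIXEL(buffer, gray);
        GRAY_PIXEL(buffer, gray);
    }
    /* tail of the row: one pixel per iteration */
    for(unsigned int w = 0; w < remains; w++)
        GRAY_PIXEL(buffer, gray);
}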
As the numbers show, this is 2 ms faster:
Average: 15 ms
Max: 17 Min: 14 ms
(Max + Min)/2 = 15 ms
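To make the transformation concrete, here is a minimal self-contained sketch that puts a plain loop next to the 4-pixel unrolled loop so the two can be compared. This section never shows the definition of GRAY_PIXEL, so the macro below is a hypothetical stand-in: it assumes 4 bytes per pixel, integer luma weights, and that the macro advances the pointer by one pixel. The function names gray_simple and gray_unroll4 are also made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the series' GRAY_PIXEL macro (assumption):
 * 4 bytes per pixel, integer weights summing to 256, pointer advanced by 4. */
#define GRAY_PIXEL(p, g)                                            \
    do {                                                            \
        (g) = ((p)[0] * 77 + (p)[1] * 151 + (p)[2] * 28) >> 8;      \
        (p)[0] = (p)[1] = (p)[2] = (unsigned char)(g);              \
        (p) += 4;                                                   \
    } while (0)

/* Reference version: one pixel per inner-loop iteration. */
void gray_simple(unsigned char *buffer, unsigned int width, unsigned int height)
{
    assert(buffer != NULL);
    int gray;
    for (unsigned int h = 0; h < height; h++)
        for (unsigned int w = 0; w < width; w++)
            GRAY_PIXEL(buffer, gray);
}

/* Inner loop unrolled by 4: same pixels, same order, fewer loop tests. */
void gray_unroll4(unsigned char *buffer, unsigned int width, unsigned int height)
{
    assert(buffer != NULL);
    unsigned int count = width & ~3u;   /* multiple-of-4 part of the row */
    unsigned int remains = width & 3u;  /* 0 to 3 leftover pixels */
    int gray;
    for (unsigned int h = 0; h < height; h++) {
        for (unsigned int w = 0; w < count; w += 4) {
            GRAY_PIXEL(buffer, gray);
            GRAY_PIXEL(buffer, gray);
            GRAY_PIXEL(buffer, gray);
            GRAY_PIXEL(buffer, gray);
        }
        for (unsigned int w = 0; w < remains; w++)
            GRAY_PIXEL(buffer, gray);
    }
}
```

Both functions visit the pixels in the same order, so they should produce byte-identical output; only the number of loop-condition checks and branches differs. Any timing difference will depend on the compiler and its optimization flags, which may already unroll the plain loop on their own.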
Unrolling the outer loop
Let's keep going and unroll the outer loop as well, processing 4 rows at a time. The code is as follows:
unsigned int hSteps = height & ~3;
unsigned int hRemains = height & 3;
unsigned int wSteps = width & ~3;
unsigned int wRemains = width & 3;
unsigned int stride = width * 4;    /* bytes per row */
unsigned char* ptr0, *ptr1, *ptr2, *ptr3;
int gray;
for(unsigned int h = 0; h < hSteps; h += 4)
{
    /* one pointer per row in the current group of 4 rows */
    ptr0 = buffer;
    buffer += stride;
    ptr1 = buffer;
    buffer += stride;
    ptr2 = buffer;
    buffer += stride;
    ptr3 = buffer;
    buffer += stride;
    for(unsigned int w = 0; w < wSteps; w += 4)
    {
        /* 4 pixels of row 0, then 4 of row 1, and so on */
        GRAY_PIXEL(ptr0, gray);
        GRAY_PIXEL(ptr0, gray);
        GRAY_PIXEL(ptr0, gray);
        GRAY_PIXEL(ptr0, gray);
        GRAY_PIXEL(ptr1, gray);
        GRAY_PIXEL(ptr1, gray);
        GRAY_PIXEL(ptr1, gray);
        GRAY_PIXEL(ptr1, gray);
        GRAY_PIXEL(ptr2, gray);
        GRAY_PIXEL(ptr2, gray);
        GRAY_PIXEL(ptr2, gray);
        GRAY_PIXEL(ptr2, gray);
        GRAY_PIXEL(ptr3, gray);
        GRAY_PIXEL(ptr3, gray);
        GRAY_PIXEL(ptr3, gray);
        GRAY_PIXEL(ptr3, gray);
    }
    for(unsigned int w = 0; w < wRemains; w++)
    {
        GRAY_PIXEL(ptr0, gray);
        GRAY_PIXEL(ptr1, gray);
        GRAY_PIXEL(ptr2, gray);
        GRAY_PIXEL(ptr3, gray);
    }
}
/* leftover rows (0 to 3), handled like the inner-loop version */
for(unsigned int h = 0; h < hRemains; h++)
{
    for(unsigned int w = 0; w < wSteps; w+= 4)
    {
        GRAY_PIXEL(buffer, gray);
        GRAY_PIXEL(buffer, gray);
        GRAY_PIXEL(buffer, gray);
        GRAY_PIXEL(buffer, gray);
    }
    for(unsigned int w = 0; w < wRemains; w++)
    {
        GRAY_PIXEL(buffer, gray);
    }
}
The result is... underwhelming, and hardly worth all that effort. Have we already hit the limit?
Average: 15 ms
Max: 16 Min: 15 ms
(Max + Min)/2 = 15 ms
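The `& ~3` and `& 3` masks that produce hSteps, hRemains, wSteps and wRemains work because 4 is a power of two: clearing the two low bits rounds down to a multiple of 4, and keeping them yields the remainder. The sketch below isolates that arithmetic. The Split4 type and the split_by_4 helper are hypothetical names introduced only for this illustration.

```c
#include <assert.h>

/* Hypothetical helper type: a length split into an unrolled part and a tail. */
typedef struct {
    unsigned int steps;    /* largest multiple of 4 that is <= n */
    unsigned int remains;  /* n % 4, always in the range 0..3 */
} Split4;

/* Splits n with the same bit masks the unrolled loops use. */
Split4 split_by_4(unsigned int n)
{
    Split4 s;
    s.steps = n & ~3u;    /* clear the two low bits */
    s.remains = n & 3u;   /* keep only the two low bits */
    assert(s.steps + s.remains == n);
    return s;
}
```

For an unroll factor of 8 the masks would be `~7` and `7`. The trick only applies to power-of-two factors; any other factor needs `/` and `%`.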
Interleaved processing
Perhaps interleaving the rows is worth a try. The code is as follows:
unsigned int hSteps = height & ~3;
unsigned int hRemains = height & 3;
unsigned int wSteps = width & ~3;
unsigned int wRemains = width & 3;
unsigned int stride = width * 4;    /* bytes per row */
unsigned char* ptr0, *ptr1, *ptr2, *ptr3;
int gray;
for(unsigned int h = 0; h < hSteps; h += 4)
{
    ptr0 = buffer;
    buffer += stride;
    ptr1 = buffer;
    buffer += stride;
    ptr2 = buffer;
    buffer += stride;
    ptr3 = buffer;
    buffer += stride;
    for(unsigned int w = 0; w < wSteps; w += 4)
    {
        /* interleaved: one pixel from each of the 4 rows, repeated 4 times */
        GRAY_PIXEL(ptr0, gray);
        GRAY_PIXEL(ptr1, gray);
        GRAY_PIXEL(ptr2, gray);
        GRAY_PIXEL(ptr3, gray);
        GRAY_PIXEL(ptr0, gray);
        GRAY_PIXEL(ptr1, gray);
        GRAY_PIXEL(ptr2, gray);
        GRAY_PIXEL(ptr3, gray);
        GRAY_PIXEL(ptr0, gray);
        GRAY_PIXEL(ptr1, gray);
        GRAY_PIXEL(ptr2, gray);
        GRAY_PIXEL(ptr3, gray);
        GRAY_PIXEL(ptr0, gray);
        GRAY_PIXEL(ptr1, gray);
        GRAY_PIXEL(ptr2, gray);
        GRAY_PIXEL(ptr3, gray);
    }
    for(unsigned int w = 0; w < wRemains; w++)
    {
        GRAY_PIXEL(ptr0, gray);
        GRAY_PIXEL(ptr1, gray);
        GRAY_PIXEL(ptr2, gray);
        GRAY_PIXEL(ptr3, gray);
    }
}
/* leftover rows (0 to 3) */
for(unsigned int h = 0; h < hRemains; h++)
{
    for(unsigned int w = 0; w < wSteps; w+= 4)
    {
        GRAY_PIXEL(buffer, gray);
        GRAY_PIXEL(buffer, gray);
        GRAY_PIXEL(buffer, gray);
        GRAY_PIXEL(buffer, gray);
    }
    for(unsigned int w = 0; w < wRemains; w++)
    {
        GRAY_PIXEL(buffer, gray);
    }
}
Surprisingly, this gains another 2 ms! What could the reason be? Take a guess.
14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 14, 14, 14, 13, 14, 14, 985, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 14, 14, 14, 14, 14, 14, 14, 985,
Average: 13 ms
Max: 15 Min: 13 ms
(Max + Min)/2 = 14 ms
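Interleaving changes only the order in which pixels are visited, and each pixel's gray value depends on that pixel alone, so the output should not change. The sketch below checks this with a reduced version of the idea: it interleaves 4 rows but leaves the column loop un-unrolled to stay short. GRAY_PIXEL is again a hypothetical stand-in (4 bytes per pixel, integer weights, pointer advanced by one pixel), and the function names are illustrative, not the author's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the series' GRAY_PIXEL macro (assumption). */
#define GRAY_PIXEL(p, g)                                            \
    do {                                                            \
        (g) = ((p)[0] * 77 + (p)[1] * 151 + (p)[2] * 28) >> 8;      \
        (p)[0] = (p)[1] = (p)[2] = (unsigned char)(g);              \
        (p) += 4;                                                   \
    } while (0)

/* Row-by-row reference. */
void gray_reference(unsigned char *buffer, unsigned int width, unsigned int height)
{
    assert(buffer != NULL);
    int gray;
    for (unsigned int h = 0; h < height; h++)
        for (unsigned int w = 0; w < width; w++)
            GRAY_PIXEL(buffer, gray);
}

/* Four rows at a time, taking one pixel from each row per iteration. */
void gray_interleaved4(unsigned char *buffer, unsigned int width, unsigned int height)
{
    assert(buffer != NULL);
    unsigned int hSteps = height & ~3u;
    unsigned int hRemains = height & 3u;
    size_t stride = (size_t)width * 4;   /* bytes per row */
    unsigned char *ptr0, *ptr1, *ptr2, *ptr3;
    int gray;

    for (unsigned int h = 0; h < hSteps; h += 4) {
        ptr0 = buffer;
        ptr1 = ptr0 + stride;
        ptr2 = ptr1 + stride;
        ptr3 = ptr2 + stride;
        buffer = ptr3 + stride;          /* start of the next group of rows */
        for (unsigned int w = 0; w < width; w++) {
            GRAY_PIXEL(ptr0, gray);
            GRAY_PIXEL(ptr1, gray);
            GRAY_PIXEL(ptr2, gray);
            GRAY_PIXEL(ptr3, gray);
        }
    }
    /* Leftover rows (0 to 3) are processed one at a time. */
    for (unsigned int h = 0; h < hRemains; h++)
        for (unsigned int w = 0; w < width; w++)
            GRAY_PIXEL(buffer, gray);
}
```

This sketch says nothing about why the interleaved order measured faster; it only confirms that the reordering is safe. Whether it helps on a given machine is something to measure rather than assume.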
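Interleaving at a larger scale
Since interleaving improves performance, why not push it further? First, 8-row interleaving. The result:
Average: 13 ms
Max: 15 Min: 13 ms
(Max + Min)/2 = 14 ms
Not much of an improvement. Next, 16-row interleaving:
Average: 15 ms
Max: 17 Min: 15 ms
(Max + Min)/2 = 16 ms
Now that is a problem: it is actually a little slower. Not ready to give up, let's try other approaches, such as processing 8 pixels per iteration, which cuts the inner loop to 1/8 of its original iteration count:
Average: 14 ms
Max: 16 Min: 14 ms
(Max + Min)/2 = 15 ms
That is 1 ms better on average than the 4-pixel version. Next, 8 pixels by 4 rows:
Average: 14 ms
Max: 16 Min: 14 ms
(Max + Min)/2 = 15 ms
Still no surprises. Finally, one last attempt: 8 pixels by 4 rows, interleaved:
Average: 13 ms
Max: 15 Min: 13 ms
(Max + Min)/2 = 14 ms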
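When comparing these summaries, keep in mind how coarse they are. The article does not show how the Average, Max, Min and (Max + Min)/2 lines are computed. The raw sample list given for the interleaved version contains two 985 ms readings while the reported Max is 15, which suggests that outliers are discarded and that the average is an integer division. The sketch below is only a guess along those lines: the TimingSummary type, the summarize function and the cutoff_ms parameter are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary record, in whole milliseconds. */
typedef struct {
    int average;  /* truncated integer mean of the kept samples */
    int max;
    int min;
    int mid;      /* (max + min) / 2, also truncated */
} TimingSummary;

/* Summarizes samples, ignoring any reading above cutoff_ms (assumed rule). */
TimingSummary summarize(const int *samples_ms, size_t n, int cutoff_ms)
{
    assert(samples_ms != NULL || n == 0);
    TimingSummary s = {0, 0, 0, 0};
    long sum = 0;
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        int v = samples_ms[i];
        if (v > cutoff_ms)
            continue;                      /* treat as an outlier */
        if (kept == 0 || v > s.max) s.max = v;
        if (kept == 0 || v < s.min) s.min = v;
        sum += v;
        kept++;
    }
    if (kept > 0) {
        s.average = (int)(sum / (long)kept);  /* integer division truncates */
        s.mid = (s.max + s.min) / 2;
    }
    return s;
}
```

Under this reading, a run whose samples are almost all 14 ms with a few at 13 ms reports an average of 13, so a 1 ms difference between two variants can come down to rounding. Printing the mean with a fractional part would make the comparisons between variants easier to trust.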