Simple Image Processing Optimizations - 04: Loop Unrolling

Source: Internet  Editor: 程序博客网  Time: 2024/05/17 08:40

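Unrolling the inner loop

The image is more than 4K pixels wide, so the inner loop is a natural candidate for unrolling. This time we process 4 pixels per iteration, which cuts the inner loop's iteration count to 1/4 of the original.

The code is as follows: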

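        unsigned int count = width & ~3;     // width rounded down to a multiple of 4
        unsigned int remains = width & 3;    // leftover pixels per row (0..3)
        int gray;
        for(unsigned int h = 0; h < height; h++)
        {
            // Main part of the row: 4 pixels per iteration
            for(unsigned int w = 0; w < count; w += 4)
            {
                GRAY_PIXEL(buffer, gray);
                GRAY_PIXEL(buffer, gray);
                GRAY_PIXEL(buffer, gray);
                GRAY_PIXEL(buffer, gray);
            }
            // Tail of the row: one pixel per iteration
            for(unsigned int w = 0; w < remains; w++)
                GRAY_PIXEL(buffer, gray);
        }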
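GRAY_PIXEL is not defined in this section, so the snippet above cannot be compiled on its own. Here is a minimal self-contained sketch for experimenting with it. The macro body is an assumption of mine (4 bytes per pixel in B, G, R, A order, integer weights 29/150/77, pointer advanced by one pixel), and `gray_simple` and `gray_unroll4` are hypothetical names, not the author's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the article's GRAY_PIXEL (assumption):
   4 bytes per pixel in B, G, R, A order. Writes the gray value back
   into B, G and R, leaves A alone, then advances to the next pixel. */
#define GRAY_PIXEL(p, g)                                        \
    do {                                                        \
        (g) = ((p)[0] * 29 + (p)[1] * 150 + (p)[2] * 77) >> 8;  \
        (p)[0] = (p)[1] = (p)[2] = (unsigned char)(g);          \
        (p) += 4;                                               \
    } while (0)

/* Reference version: one pixel per iteration. */
void gray_simple(unsigned char *buffer, unsigned int width, unsigned int height)
{
    int gray;
    assert(buffer != NULL);
    for (unsigned int i = 0; i < width * height; i++)
        GRAY_PIXEL(buffer, gray);
}

/* Unrolled version: 4 pixels per iteration plus a per-row tail loop. */
void gray_unroll4(unsigned char *buffer, unsigned int width, unsigned int height)
{
    unsigned int count = width & ~3u;   /* width rounded down to a multiple of 4 */
    unsigned int remains = width & 3u;  /* leftover 0..3 pixels per row */
    int gray;
    assert(buffer != NULL);
    for (unsigned int h = 0; h < height; h++) {
        for (unsigned int w = 0; w < count; w += 4) {
            GRAY_PIXEL(buffer, gray);
            GRAY_PIXEL(buffer, gray);
            GRAY_PIXEL(buffer, gray);
            GRAY_PIXEL(buffer, gray);
        }
        for (unsigned int w = 0; w < remains; w++)
            GRAY_PIXEL(buffer, gray);
    }
}
```

Running both functions on copies of the same buffer and comparing the bytes is a cheap way to confirm that unrolling only changed the loop structure, not the output. A width that is not a multiple of 4 exercises the tail loop as well.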

As you can see, the result improved by 2 ms:

Average: 15 ms  Max: 17 ms  Min: 14 ms  (Max + Min)/2 = 15 ms
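The measurement code is not shown in this section. As a rough sketch of how a summary line like the one above could be produced from a list of per-run timings, here is one way to do it. `TimingSummary` and `summarize` are hypothetical names, and the integer division is an assumption, chosen because (17 + 14)/2 is reported as 15 rather than 15.5.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary record; all values are in milliseconds. */
typedef struct {
    unsigned int avg;  /* integer average of all samples */
    unsigned int max;  /* slowest run */
    unsigned int min;  /* fastest run */
    unsigned int mid;  /* (max + min) / 2, truncated */
} TimingSummary;

/* Summarize n timing samples (n must be at least 1). */
TimingSummary summarize(const unsigned int *ms, size_t n)
{
    TimingSummary s;
    unsigned long sum = 0;
    assert(ms != NULL && n > 0);
    s.max = ms[0];
    s.min = ms[0];
    for (size_t i = 0; i < n; i++) {
        sum += ms[i];
        if (ms[i] > s.max) s.max = ms[i];
        if (ms[i] < s.min) s.min = ms[i];
    }
    s.avg = (unsigned int)(sum / n);   /* integer division truncates */
    s.mid = (s.max + s.min) / 2;       /* integer division truncates */
    return s;
}
```

With millisecond resolution and integer truncation, differences of 1 ms between variants are close to the noise floor, which is worth keeping in mind when reading the numbers in this post.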

Unrolling the outer loop
Let's keep going and unroll the outer loop as well, processing 4 rows per iteration. The code is as follows:

        unsigned int hSteps = height & ~3;    // rows handled 4 at a time
        unsigned int hRemains = height & 3;   // leftover rows (0..3)
        unsigned int wSteps = width & ~3;     // pixels per row handled 4 at a time
        unsigned int wRemains = width & 3;    // leftover pixels per row (0..3)
        unsigned int stride = width * 4;      // bytes per row
        unsigned char* ptr0, *ptr1, *ptr2, *ptr3;
        int gray;
        for(unsigned int h = 0; h < hSteps; h += 4)
        {
            // One pointer per row in the current group of 4 rows
            ptr0 = buffer; buffer += stride;
            ptr1 = buffer; buffer += stride;
            ptr2 = buffer; buffer += stride;
            ptr3 = buffer; buffer += stride;
            for(unsigned int w = 0; w < wSteps; w += 4)
            {
                // 4 pixels of row 0, then 4 of row 1, and so on
                GRAY_PIXEL(ptr0, gray);
                GRAY_PIXEL(ptr0, gray);
                GRAY_PIXEL(ptr0, gray);
                GRAY_PIXEL(ptr0, gray);
                GRAY_PIXEL(ptr1, gray);
                GRAY_PIXEL(ptr1, gray);
                GRAY_PIXEL(ptr1, gray);
                GRAY_PIXEL(ptr1, gray);
                GRAY_PIXEL(ptr2, gray);
                GRAY_PIXEL(ptr2, gray);
                GRAY_PIXEL(ptr2, gray);
                GRAY_PIXEL(ptr2, gray);
                GRAY_PIXEL(ptr3, gray);
                GRAY_PIXEL(ptr3, gray);
                GRAY_PIXEL(ptr3, gray);
                GRAY_PIXEL(ptr3, gray);
            }
            for(unsigned int w = 0; w < wRemains; w++)
            {
                GRAY_PIXEL(ptr0, gray);
                GRAY_PIXEL(ptr1, gray);
                GRAY_PIXEL(ptr2, gray);
                GRAY_PIXEL(ptr3, gray);
            }
        }
        // Leftover rows: same as the inner-loop-only version
        for(unsigned int h = 0; h < hRemains; h++)
        {
            for(unsigned int w = 0; w < wSteps; w += 4)
            {
                GRAY_PIXEL(buffer, gray);
                GRAY_PIXEL(buffer, gray);
                GRAY_PIXEL(buffer, gray);
                GRAY_PIXEL(buffer, gray);
            }
            for(unsigned int w = 0; w < wRemains; w++)
            {
                GRAY_PIXEL(buffer, gray);
            }
        }

The result is... underwhelming, and hardly worth all that effort. Could this be the limit?

Average: 15 ms  Max: 16 ms  Min: 15 ms  (Max + Min)/2 = 15 ms

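Interleaved processing

Perhaps we can try interleaving the rows: instead of finishing 4 pixels of one row before moving to the next, take one pixel from each of the 4 rows in turn. The code is as follows: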

        unsigned int hSteps = height & ~3;    // rows handled 4 at a time
        unsigned int hRemains = height & 3;   // leftover rows (0..3)
        unsigned int wSteps = width & ~3;     // pixels per row handled 4 at a time
        unsigned int wRemains = width & 3;    // leftover pixels per row (0..3)
        unsigned int stride = width * 4;      // bytes per row
        unsigned char* ptr0, *ptr1, *ptr2, *ptr3;
        int gray;
        for(unsigned int h = 0; h < hSteps; h += 4)
        {
            // One pointer per row in the current group of 4 rows
            ptr0 = buffer; buffer += stride;
            ptr1 = buffer; buffer += stride;
            ptr2 = buffer; buffer += stride;
            ptr3 = buffer; buffer += stride;
            for(unsigned int w = 0; w < wSteps; w += 4)
            {
                // Interleaved: one pixel from each row, repeated 4 times
                GRAY_PIXEL(ptr0, gray);
                GRAY_PIXEL(ptr1, gray);
                GRAY_PIXEL(ptr2, gray);
                GRAY_PIXEL(ptr3, gray);
                GRAY_PIXEL(ptr0, gray);
                GRAY_PIXEL(ptr1, gray);
                GRAY_PIXEL(ptr2, gray);
                GRAY_PIXEL(ptr3, gray);
                GRAY_PIXEL(ptr0, gray);
                GRAY_PIXEL(ptr1, gray);
                GRAY_PIXEL(ptr2, gray);
                GRAY_PIXEL(ptr3, gray);
                GRAY_PIXEL(ptr0, gray);
                GRAY_PIXEL(ptr1, gray);
                GRAY_PIXEL(ptr2, gray);
                GRAY_PIXEL(ptr3, gray);
            }
            for(unsigned int w = 0; w < wRemains; w++)
            {
                GRAY_PIXEL(ptr0, gray);
                GRAY_PIXEL(ptr1, gray);
                GRAY_PIXEL(ptr2, gray);
                GRAY_PIXEL(ptr3, gray);
            }
        }
        // Leftover rows: processed one row at a time
        for(unsigned int h = 0; h < hRemains; h++)
        {
            for(unsigned int w = 0; w < wSteps; w += 4)
            {
                GRAY_PIXEL(buffer, gray);
                GRAY_PIXEL(buffer, gray);
                GRAY_PIXEL(buffer, gray);
                GRAY_PIXEL(buffer, gray);
            }
            for(unsigned int w = 0; w < wRemains; w++)
            {
                GRAY_PIXEL(buffer, gray);
            }
        }

Surprisingly, that gains another 2 ms! What could the reason be? Take a guess.

14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 14, 14, 14, 13, 14, 14, 985, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 14, 14, 14, 14, 14, 14, 14, 985

Average: 13 ms  Max: 15 ms  Min: 13 ms  (Max + Min)/2 = 14 ms
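The post leaves the reason as an exercise, and I will not claim to settle it here. One thing that can be checked directly is how the two 4-row variants order their 16 GRAY_PIXEL calls per inner-loop iteration. The sketch below records which row each call touches and counts how often two consecutive calls use the same row pointer. All names in it (`order_blocked`, `order_interleaved`, `same_pointer_pairs`) are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, PIXELS = 4, OPS = ROWS * PIXELS };

/* Row-by-row order of the plain 4-row version: 0000 1111 2222 3333. */
void order_blocked(int rows_touched[OPS])
{
    size_t n = 0;
    assert(rows_touched != NULL);
    for (int r = 0; r < ROWS; r++)
        for (int p = 0; p < PIXELS; p++)
            rows_touched[n++] = r;
}

/* Order of the interleaved version: 0123 0123 0123 0123. */
void order_interleaved(int rows_touched[OPS])
{
    size_t n = 0;
    assert(rows_touched != NULL);
    for (int p = 0; p < PIXELS; p++)
        for (int r = 0; r < ROWS; r++)
            rows_touched[n++] = r;
}

/* Count adjacent calls that read and advance the same row pointer. */
int same_pointer_pairs(const int rows_touched[OPS])
{
    int pairs = 0;
    assert(rows_touched != NULL);
    for (size_t i = 0; i + 1 < OPS; i++)
        if (rows_touched[i] == rows_touched[i + 1])
            pairs++;
    return pairs;
}
```

In the row-by-row order, 12 of the 15 adjacent pairs go through the same pointer, so each call has to wait for the previous call's pointer increment. In the interleaved order that count is 0. Whether this looser dependency chain is what actually produces the 2 ms gain is only a guess on my part; it could just as well be something about memory access patterns, and only profiling on the target machine would tell.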

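Interleaving at a larger scale

Since interleaving improves performance, why not push it further? First, 8-row interleaving. The result:

Average: 13 ms  Max: 15 ms  Min: 13 ms  (Max + Min)/2 = 14 ms

Not much of an improvement, it seems. Next, 16-row interleaving:

Average: 15 ms  Max: 17 ms  Min: 15 ms  (Max + Min)/2 = 16 ms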
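The code for the 8-row and 16-row variants is not included in the post. As a sketch of what a version with a configurable row count might look like, the function below keeps an array of row pointers and takes one pixel from each row in turn. `ROWS`, `gray_interleaved`, and the GRAY_PIXEL body are my assumptions (4 bytes per pixel in B, G, R, A order), and for brevity the sketch does not also unroll by 4 along the width, so it is not a drop-in copy of the measured code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the article's GRAY_PIXEL (assumption):
   4 bytes per pixel, B, G, R, A order, pointer advanced by one pixel. */
#define GRAY_PIXEL(p, g)                                        \
    do {                                                        \
        (g) = ((p)[0] * 29 + (p)[1] * 150 + (p)[2] * 77) >> 8;  \
        (p)[0] = (p)[1] = (p)[2] = (unsigned char)(g);          \
        (p) += 4;                                               \
    } while (0)

#define ROWS 8  /* number of rows interleaved; try 4, 8 or 16 */

void gray_interleaved(unsigned char *buffer, unsigned int width, unsigned int height)
{
    const unsigned int stride = width * 4;               /* bytes per row */
    const unsigned int hSteps = height - height % ROWS;  /* rows handled in groups */
    unsigned char *ptr[ROWS];
    int gray;
    assert(buffer != NULL);

    for (unsigned int h = 0; h < hSteps; h += ROWS) {
        /* One pointer per row of the current group. */
        for (unsigned int r = 0; r < ROWS; r++) {
            ptr[r] = buffer;
            buffer += stride;
        }
        /* Walk the columns, touching each of the ROWS rows in turn. */
        for (unsigned int w = 0; w < width; w++)
            for (unsigned int r = 0; r < ROWS; r++)
                GRAY_PIXEL(ptr[r], gray);
    }

    /* Leftover rows are contiguous in memory, so process them linearly. */
    for (unsigned int i = 0; i < (height - hSteps) * width; i++)
        GRAY_PIXEL(buffer, gray);
}
```

Changing `ROWS` makes it easy to repeat the 4/8/16 comparison. Because the loops here are left for the compiler to unroll, the absolute timings may differ from the hand-unrolled numbers in this post.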

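Now that is a problem: it actually seems to have regressed a bit. Not ready to give up, let's try something else, such as processing 8 pixels per iteration, which reduces the inner loop to 1/8 of its original iteration count:

Average: 14 ms  Max: 16 ms  Min: 14 ms  (Max + Min)/2 = 15 ms
On average that is 1 ms better than the 4-pixel version. Next, 8 pixels with 4 rows:

Average: 14 ms  Max: 16 ms  Min: 14 ms  (Max + Min)/2 = 15 ms

Still no surprise. Finally, let's try 8 pixels with 4 rows interleaved:

Average: 13 ms  Max: 15 ms  Min: 13 ms  (Max + Min)/2 = 14 ms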
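The code for this last variant is not shown either. Here is a sketch of how 8 pixels by 4 rows, interleaved, could be written by extending the 4-row interleaved code from earlier in the post. `GRAY_COLUMN4`, `gray_8px_4rows`, and the GRAY_PIXEL body are my assumptions (4 bytes per pixel in B, G, R, A order), not the author's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the article's GRAY_PIXEL (assumption):
   4 bytes per pixel, B, G, R, A order, pointer advanced by one pixel. */
#define GRAY_PIXEL(p, g)                                        \
    do {                                                        \
        (g) = ((p)[0] * 29 + (p)[1] * 150 + (p)[2] * 77) >> 8;  \
        (p)[0] = (p)[1] = (p)[2] = (unsigned char)(g);          \
        (p) += 4;                                               \
    } while (0)

/* One pixel from each of four rows, in interleaved order. */
#define GRAY_COLUMN4(p0, p1, p2, p3, g) \
    do {                                \
        GRAY_PIXEL(p0, g);              \
        GRAY_PIXEL(p1, g);              \
        GRAY_PIXEL(p2, g);              \
        GRAY_PIXEL(p3, g);              \
    } while (0)

void gray_8px_4rows(unsigned char *buffer, unsigned int width, unsigned int height)
{
    unsigned int hSteps = height & ~3u;   /* rows handled 4 at a time */
    unsigned int hRemains = height & 3u;  /* leftover rows (0..3) */
    unsigned int wSteps = width & ~7u;    /* pixels per row handled 8 at a time */
    unsigned int wRemains = width & 7u;   /* leftover pixels per row (0..7) */
    unsigned int stride = width * 4;      /* bytes per row */
    unsigned char *ptr0, *ptr1, *ptr2, *ptr3;
    int gray;
    assert(buffer != NULL);

    for (unsigned int h = 0; h < hSteps; h += 4) {
        ptr0 = buffer; buffer += stride;
        ptr1 = buffer; buffer += stride;
        ptr2 = buffer; buffer += stride;
        ptr3 = buffer; buffer += stride;
        for (unsigned int w = 0; w < wSteps; w += 8) {
            /* 8 columns, each taking one pixel from all 4 rows */
            GRAY_COLUMN4(ptr0, ptr1, ptr2, ptr3, gray);
            GRAY_COLUMN4(ptr0, ptr1, ptr2, ptr3, gray);
            GRAY_COLUMN4(ptr0, ptr1, ptr2, ptr3, gray);
            GRAY_COLUMN4(ptr0, ptr1, ptr2, ptr3, gray);
            GRAY_COLUMN4(ptr0, ptr1, ptr2, ptr3, gray);
            GRAY_COLUMN4(ptr0, ptr1, ptr2, ptr3, gray);
            GRAY_COLUMN4(ptr0, ptr1, ptr2, ptr3, gray);
            GRAY_COLUMN4(ptr0, ptr1, ptr2, ptr3, gray);
        }
        for (unsigned int w = 0; w < wRemains; w++)
            GRAY_COLUMN4(ptr0, ptr1, ptr2, ptr3, gray);
    }

    /* Leftover rows are contiguous, so process them pixel by pixel. */
    for (unsigned int i = 0; i < hRemains * width; i++)
        GRAY_PIXEL(buffer, gray);
}
```

Note that the masks change with the unroll factor: `~7u` and `7u` for 8 pixels, versus `~3` and `3` for 4 pixels. This trick only works when the factor is a power of two.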



