ICS Reading Notes: Optimizing Program Performance (Part 1)

Source: Internet. Editor: 程序博客网. Time: 2024/06/14 19:47

These are study notes on CS:APP2e, Chapter 5, Sections 5.4 to 5.6.

1. Eliminating loop inefficiencies: code motion

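Code motion means identifying a computation that is executed many times (for example, inside a loop) but whose result never changes, and moving it to an earlier point in the code where it is evaluated only once.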

Example:

/* vec_length(v) is evaluated in the loop test on every iteration */
void combine1(vec_ptr v, data_t *dest)
{
  long int i;
  *dest = IDENT;
  for (i = 0; i < vec_length(v); i++) {
    data_t val;
    get_vec_element(v, i, &val);
    *dest = *dest OP val;
  }
}

/* vec_length(v) is moved out of the loop and evaluated once */
void combine2(vec_ptr v, data_t *dest)
{
  long int i;
  long int length = vec_length(v);
  *dest = IDENT;
  for (i = 0; i < length; i++) {
    data_t val;
    get_vec_element(v, i, &val);
    *dest = *dest OP val;
  }
}

Reference: CS:APP2e, Section 5.4, combine1 and combine2.
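To make the effect of code motion concrete, here is a minimal sketch that counts how often vec_length runs in each version. The vector layout (vec_rec), the choice data_t = long with OP = + and IDENT = 0, the length_calls counter, and the count_length_calls helper are all assumptions made for illustration; the notes leave these abstract.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed instantiation: the notes leave data_t, IDENT and OP abstract. */
typedef long data_t;
#define IDENT 0
#define OP +

/* Hypothetical vector layout for this sketch. */
typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

/* Hypothetical counter: records every call to vec_length. */
static long length_calls = 0;

long vec_length(vec_ptr v)
{
    length_calls++;
    return v->len;
}

/* Bounds-checked access: returns 0 on a bad index, 1 on success. */
int get_vec_element(vec_ptr v, long index, data_t *dest)
{
    if (index < 0 || index >= v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

/* vec_length is re-evaluated in the loop test on every iteration. */
void combine1(vec_ptr v, data_t *dest)
{
    long i;
    *dest = IDENT;
    for (i = 0; i < vec_length(v); i++) {
        data_t val;
        get_vec_element(v, i, &val);
        *dest = *dest OP val;
    }
}

/* vec_length is hoisted out of the loop and evaluated once. */
void combine2(vec_ptr v, data_t *dest)
{
    long i;
    long length = vec_length(v);
    *dest = IDENT;
    for (i = 0; i < length; i++) {
        data_t val;
        get_vec_element(v, i, &val);
        *dest = *dest OP val;
    }
}

/* Returns how many times vec_length ran during one call of combine. */
long count_length_calls(void (*combine)(vec_ptr, data_t *),
                        vec_ptr v, data_t *dest)
{
    assert(v != NULL && dest != NULL);
    length_calls = 0;
    combine(v, dest);
    return length_calls;
}
```

For a vector of n elements, combine1 should call vec_length n + 1 times (once per loop test, including the final failing one), while combine2 calls it exactly once; both compute the same result.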


2. Reducing procedure calls


/* Fetch the array once, then index it directly inside the loop */
void combine3(vec_ptr v, data_t *dest)
{
  long int i;
  long int length = vec_length(v);
  data_t *data = get_vec_start(v);
  *dest = IDENT;
  for (i = 0; i < length; i++) {
    *dest = *dest OP data[i];
  }
}


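After this change the performance improvement is not significant, even though the loop now bypasses the function call and reads the vector's elements directly from the array.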


data_t *get_vec_start(vec_ptr v)
{
  return v->data;
}
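As a rough sketch of what is traded away here, the code below contrasts element access through a bounds-checked get_vec_element with direct indexing via get_vec_start. The vec_rec layout, the body of get_vec_element, and the helpers sum_checked and sum_direct are assumptions for illustration (with data_t = long and addition as the operation); only get_vec_start appears in the notes.

```c
#include <assert.h>
#include <stddef.h>

typedef long data_t;  /* assumed element type for this sketch */

/* Hypothetical vector layout; the notes only show that v->data exists. */
typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

long vec_length(vec_ptr v) { return v->len; }

data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Checked access: every call pays for a call plus a bounds test. */
int get_vec_element(vec_ptr v, long index, data_t *dest)
{
    if (index < 0 || index >= v->len)
        return 0;               /* out of range: *dest is left untouched */
    *dest = v->data[index];
    return 1;
}

/* combine2-style element access: one checked call per element. */
data_t sum_checked(vec_ptr v)
{
    assert(v != NULL);
    long length = vec_length(v);
    data_t acc = 0;
    for (long i = 0; i < length; i++) {
        data_t val;
        get_vec_element(v, i, &val);
        acc += val;
    }
    return acc;
}

/* combine3-style element access: no call and no bounds test in the loop. */
data_t sum_direct(vec_ptr v)
{
    assert(v != NULL);
    long length = vec_length(v);
    data_t *data = get_vec_start(v);
    data_t acc = 0;
    for (long i = 0; i < length; i++)
        acc += data[i];
    return acc;
}
```

Both loops produce the same sum for in-range indices. The direct version gives up the per-element bounds check and exposes the vector's internal representation, which is the modularity cost of this transformation.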


3. Eliminating unneeded memory references

x86-64 code for combine3:

combine3: data_t = float, OP = *
i in %rdx, data in %rax, dest in %rbp
1  .L498:                          loop:
2    movss (%rbp), %xmm0             Read product from dest
3    mulss (%rax,%rdx,4), %xmm0      Multiply product by data[i]
4    movss %xmm0, (%rbp)             Store product at dest
5    addq $1, %rdx                   Increment i
6    cmpq %rdx, %r12                 Compare i:limit
7    jg .L498                        If >, goto loop

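Lines 2 to 4 access memory on every iteration: besides reading data[i] (line 3), the loop reads the accumulated value from dest (line 2) and writes it back (line 4). The read and write of dest are unnecessary and hurt efficiency.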

Rewrite combine3 as combine4:

/* Accumulate in a local variable and store to dest only once, at the end */
void combine4(vec_ptr v, data_t *dest)
{
  long int i;
  long int length = vec_length(v);
  data_t *data = get_vec_start(v);
  data_t x = IDENT;
  for (i = 0; i < length; i++)
    x = x OP data[i];
  *dest = x;
}


x86-64 code for the program above:

combine4: data_t = float, OP = *
i in %rdx, data in %rax, limit in %rbp, acc in %xmm0
1  .L488:                          loop:
2    mulss (%rax,%rdx,4), %xmm0      Multiply acc by data[i]
3    addq $1, %rdx                   Increment i
4    cmpq %rdx, %rbp                 Compare limit:i
5    jg .L488                        If >, goto loop



Note: the compiler can now keep the accumulated value in %xmm0.

        Practice Problem 5.4: when combine3 is compiled with gcc using -O2, its CPE is much better than with -O1.

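Question: does this kind of optimization mean writing code that does not prevent the compiler from applying its own, better optimizations?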
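One angle on the question above is memory aliasing: a compiler has to preserve the behavior of combine3 even when dest points into the vector itself, so it cannot simply drop the stores to dest the way combine4 does. The sketch below shows that the two functions really do differ in that case. The vec_rec layout, the choice data_t = long with OP = * and IDENT = 1, and the run_aliased helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed instantiation: integer product, so results are exact. */
typedef long data_t;
#define IDENT 1
#define OP *

/* Hypothetical vector layout for this sketch. */
typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

long vec_length(vec_ptr v) { return v->len; }

data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Reads and writes *dest on every iteration. */
void combine3(vec_ptr v, data_t *dest)
{
    long i;
    long length = vec_length(v);
    data_t *data = get_vec_start(v);
    *dest = IDENT;
    for (i = 0; i < length; i++)
        *dest = *dest OP data[i];
}

/* Accumulates in a local variable and writes *dest once. */
void combine4(vec_ptr v, data_t *dest)
{
    long i;
    long length = vec_length(v);
    data_t *data = get_vec_start(v);
    data_t x = IDENT;
    for (i = 0; i < length; i++)
        x = x OP data[i];
    *dest = x;
}

/* Hypothetical helper: runs combine with dest aliased to the last
   element of buf and returns that element afterwards. */
data_t run_aliased(void (*combine)(vec_ptr, data_t *), data_t *buf, long n)
{
    assert(buf != NULL && n > 0);
    vec_rec v = {n, buf};
    combine(&v, buf + n - 1);
    return buf[n - 1];
}
```

With the vector [2, 3, 5] and dest aliased to the last element, combine3 overwrites that element as it goes ([2, 3, 1], then 2, 6, 36) and ends with 36, whereas combine4 multiplies the original values and ends with 30. Because the compiler usually cannot rule out such a call, the programmer who knows aliasing never happens has to write the combine4 form explicitly.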

Reference: http://csapp.cs.cmu.edu/
