Performance optimization: __builtin_expect explained

Source: Internet. Editor: 程序博客网. Date: 2024/06/07 15:26

Reposted from: http://hi.baidu.com/lammy/blog/item/bc5e3d4e869073c3d1c86a89.html

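The GTK+ 2.0 source code contains many uses of two macros, G_LIKELY and G_UNLIKELY (the macros themselves are provided by GLib). Take the following code, for example: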

if (G_LIKELY (acat == 1))       /* allocate through magazine layer */
  {
    ThreadMemory *tmem = thread_memory_from_self();
    guint ix = SLAB_INDEX (allocator, chunk_size);
    if (G_UNLIKELY (thread_memory_magazine1_is_empty (tmem, ix)))
      {
        thread_memory_swap_magazines (tmem, ix);
        if (G_UNLIKELY (thread_memory_magazine1_is_empty (tmem, ix)))
          thread_memory_magazine1_reload (tmem, ix);
      }
    mem = thread_memory_magazine1_alloc (tmem, ix);
  }

In the source, G_LIKELY and G_UNLIKELY are defined as follows:

#define G_LIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 1))
#define G_UNLIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 0))

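The macro _G_BOOLEAN_EXPR converts expr to 0 or 1, that is, to a plain true/false value. To understand G_LIKELY and G_UNLIKELY, we therefore need to understand __builtin_expect. __builtin_expect is a built-in function (not a macro) that GCC has provided since version 2.96. It tells the compiler which outcome of a conditional is expected, so that the common path does not waste time on a taken jump. Taking the code above as an example: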

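if (G_LIKELY (acat == 1))     // the condition is true most of the time, so execution usually goes straight into the if body

if (G_UNLIKELY (thread_memory_magazine1_is_empty (tmem, ix)))     // the condition is false most of the time, so execution usually skips the if body (and runs the else branch, if there is one)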
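To make the role of the 0/1 conversion concrete, here is a minimal sketch, assuming GCC or a compatible compiler such as Clang. The function names raw_expect and normalized_expect are hypothetical and exist only for illustration. Because __builtin_expect returns its first argument unchanged, a non-zero value such as 5 would never equal the expected constant 1 unless it is first normalized with !!, which is the job described for _G_BOOLEAN_EXPR above.

```c
/* Sketch only: assumes a compiler that provides __builtin_expect (GCC, Clang). */

/* Hypothetical helper: passes x through without normalization.
 * The builtin returns its first argument, so the result is x itself. */
long raw_expect(long x)
{
    return __builtin_expect(x, 1);
}

/* Hypothetical helper: normalizes x to 0 or 1 first, so that the
 * expected constant 1 really means "x is true". */
long normalized_expect(long x)
{
    return __builtin_expect(!!x, 1);
}
```

With these definitions, raw_expect(5) yields 5 while normalized_expect(5) yields 1, so only the normalized form matches the hint "expected value is 1" for every true input.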

If this still seems unclear, the following example should make the point:

// test_builtin_expect.c
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

int test_likely(int x)
{
    if (LIKELY(x)) {
        x = 5;
    } else {
        x = 6;
    }
    return x;
}

int test_unlikely(int x)
{
    if (UNLIKELY(x)) {
        x = 5;
    } else {
        x = 6;
    }
    return x;
}
[lammy@localhost test_builtin_expect]$ gcc -fprofile-arcs -O2 -c test_builtin_expect.c
[lammy@localhost test_builtin_expect]$ objdump -d test_builtin_expect.o

test_builtin_expect.o:       file format elf32-i386

Disassembly of section .text:

00000000 <test_likely>:
   0: 55                      push     %ebp
   1: 89 e5                   mov      %esp,%ebp
   3: 8b 45 08                mov      0x8(%ebp),%eax
   6: 83 05 38 00 00 00 01    addl     $0x1,0x38
   d: 83 15 3c 00 00 00 00    adcl     $0x0,0x3c
  14: 85 c0                   test     %eax,%eax
  16: 74 15                   je       2d <test_likely+0x2d>    // this is the line to look at
  18: 83 05 40 00 00 00 01    addl     $0x1,0x40
  1f: b8 05 00 00 00          mov      $0x5,%eax
  24: 83 15 44 00 00 00 00    adcl     $0x0,0x44
  2b: 5d                      pop      %ebp
  2c: c3                      ret
  2d: 83 05 48 00 00 00 01    addl     $0x1,0x48
  34: b8 06 00 00 00          mov      $0x6,%eax
  39: 83 15 4c 00 00 00 00    adcl     $0x0,0x4c
  40: 5d                      pop      %ebp
  41: c3                      ret
  42: 8d b4 26 00 00 00 00    lea      0x0(%esi,%eiz,1),%esi
  49: 8d bc 27 00 00 00 00    lea      0x0(%edi,%eiz,1),%edi

00000050 <test_unlikely>:
  50: 55                      push     %ebp
  51: 89 e5                   mov      %esp,%ebp
  53: 8b 55 08                mov      0x8(%ebp),%edx
  56: 83 05 20 00 00 00 01    addl     $0x1,0x20
  5d: 83 15 24 00 00 00 00    adcl     $0x0,0x24
  64: 85 d2                   test     %edx,%edx
  66: 75 15                   jne      7d <test_unlikely+0x2d>    // this is the line to look at
  68: 83 05 30 00 00 00 01    addl     $0x1,0x30
  6f: b8 06 00 00 00          mov      $0x6,%eax
  74: 83 15 34 00 00 00 00    adcl     $0x0,0x34
  7b: 5d                      pop      %ebp
  7c: c3                      ret
  7d: 83 05 28 00 00 00 01    addl     $0x1,0x28
  84: b8 05 00 00 00          mov      $0x5,%eax
  89: 83 15 2c 00 00 00 00    adcl     $0x0,0x2c
  90: 5d                      pop      %ebp
  91: c3                      ret
  92: 8d b4 26 00 00 00 00    lea      0x0(%esi,%eiz,1),%esi
  99: 8d bc 27 00 00 00 00    lea      0x0(%edi,%eiz,1),%edi

000000a0 <_GLOBAL__I_65535_0_test_likely>:
  a0: 55                      push     %ebp
  a1: 89 e5                   mov      %esp,%ebp
  a3: 83 ec 08                sub      $0x8,%esp
  a6: c7 04 24 00 00 00 00    movl     $0x0,(%esp)
  ad: e8 fc ff ff ff          call     ae <_GLOBAL__I_65535_0_test_likely+0xe>
  b2: c9                      leave
  b3: c3                      ret
[lammy@localhost test_builtin_expect]$

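The two functions compile to different jump instructions. In test_likely the compiler emits je, which jumps only when x is zero, so the expected case (x non-zero) falls straight through to x = 5. In test_unlikely it emits jne, which jumps only when x is non-zero, so the expected case (x zero) falls straight through to x = 6. In other words, __builtin_expect exists so that the common case runs without taking the jump. It is only an optimization hint to the compiler and does not change how the truth value of the condition is evaluated.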
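The point that the hint never changes the result can be checked with a small sketch like the one below. It assumes GCC or a compatible compiler; pick_plain, pick_likely, pick_unlikely and variants_agree are hypothetical names introduced here for illustration.

```c
/* Sketch only: assumes a compiler that provides __builtin_expect. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Three versions of the same decision: no hint, "likely" hint, "unlikely" hint. */
int pick_plain(int x)    { return x ? 5 : 6; }
int pick_likely(int x)   { if (LIKELY(x))   return 5; else return 6; }
int pick_unlikely(int x) { if (UNLIKELY(x)) return 5; else return 6; }

/* Returns 1 if all three versions give the same answer for every x in [lo, hi],
 * and 0 as soon as any of them disagrees. */
int variants_agree(int lo, int hi)
{
    for (int x = lo; x <= hi; x++) {
        if (pick_plain(x) != pick_likely(x) || pick_plain(x) != pick_unlikely(x))
            return 0;
    }
    return 1;
}
```

Calling variants_agree(-100, 100) should report agreement, because a wrong hint can only cost speed, never correctness.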

This technique is also used frequently in the Linux kernel. There is an English article on the subject that is worth reading: http://kernelnewbies.org/FAQ/LikelyUnlikely

You may have noticed that I generated the assembly with gcc -fprofile-arcs -O2 -c test_builtin_expect.c rather than gcc -O2 -c test_builtin_expect.c. For details, see http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html.




FAQ/LikelyUnlikely

likely() and unlikely()

What are they ?

In Linux kernel code, one often finds calls to likely() and unlikely() in conditions, like:

bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx);
if (unlikely(!bvl)) {
  mempool_free(bio, bio_pool);
  bio = NULL;
  goto out;
}

In fact, these macros are hints that allow the compiler to optimize the branch correctly, by telling it which outcome is the likeliest. The definitions of these macros, found in include/linux/compiler.h, are the following:

#define likely(x)       __builtin_expect(!!(x), 1)
#define unlikely(x)     __builtin_expect(!!(x), 0)

The GCC documentation explains the role of __builtin_expect() :

 -- Built-in Function: long __builtin_expect (long EXP, long C)
     You may use `__builtin_expect' to provide the compiler with branch
     prediction information.  In general, you should prefer to use
     actual profile feedback for this (`-fprofile-arcs'), as
     programmers are notoriously bad at predicting how their programs
     actually perform.  However, there are applications in which this
     data is hard to collect.

     The return value is the value of EXP, which should be an integral
     expression.  The value of C must be a compile-time constant.  The
     semantics of the built-in are that it is expected that EXP == C.
     For example:

          if (__builtin_expect (x, 0))
            foo ();

     would indicate that we do not expect to call `foo', since we
     expect `x' to be zero.  Since you are limited to integral
     expressions for EXP, you should use constructions such as

          if (__builtin_expect (ptr != NULL, 1))
            error ();

     when testing pointer or floating-point values.
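The advice about pointers can be sketched as follows, assuming GCC or a compatible compiler. The helper length_or_zero is hypothetical and not part of any library; it turns the pointer test into an integral comparison before handing it to __builtin_expect, as the documentation recommends.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of s, or 0 when s is NULL.
 * The comparison (s == NULL) is an integral expression, and the hint
 * says we expect it to be 0, i.e. NULL is assumed to be the rare case. */
size_t length_or_zero(const char *s)
{
    if (__builtin_expect(s == NULL, 0))
        return 0;           /* rare path */
    return strlen(s);       /* common path */
}
```

For example, length_or_zero("abc") takes the common path and length_or_zero(NULL) takes the rare one; both return the same values they would without the hint.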

How does it optimize things ?

It optimizes things by ordering the generated assembly code so as to make good use of the processor pipeline. To do so, the compiler arranges the code so that the likeliest branch is executed without taking any jump (a taken jump can have the bad effect of flushing the processor pipeline).

To see how it works, let's compile the following simple C user space program with gcc -O2 :

#include <stdio.h>
#include <stdlib.h>

#define likely(x)    __builtin_expect(!!(x), 1)
#define unlikely(x)  __builtin_expect(!!(x), 0)

int main(int argc, char *argv[])
{
   int a;

   /* Get the value from somewhere GCC can't optimize */
   a = atoi (argv[1]);

   if (unlikely (a == 2))
      a++;
   else
      a--;

   printf ("%d\n", a);

   return 0;
}

Now, disassemble the resulting binary using objdump -S (comments added by me) :

080483b0 <main>:
 //             Prologue
 80483b0:       55                      push   %ebp
 80483b1:       89 e5                   mov    %esp,%ebp
 80483b3:       50                      push   %eax
 80483b4:       50                      push   %eax
 80483b5:       83 e4 f0                and    $0xfffffff0,%esp
 //             Call atoi()
 80483b8:       8b 45 08                mov    0x8(%ebp),%eax
 80483bb:       83 ec 1c                sub    $0x1c,%esp
 80483be:       8b 48 04                mov    0x4(%eax),%ecx
 80483c1:       51                      push   %ecx
 80483c2:       e8 1d ff ff ff          call   80482e4 <atoi@plt>
 80483c7:       83 c4 10                add    $0x10,%esp
 //             Test the value
 80483ca:       83 f8 02                cmp    $0x2,%eax
 //             --------------------------------------------------------
 //             If 'a' is equal to 2 (which is unlikely), then jump,
 //             otherwise continue directly, without a jump, so that it
 //             doesn't flush the pipeline.
 //             --------------------------------------------------------
 80483cd:       74 12                   je     80483e1 <main+0x31>
 80483cf:       48                      dec    %eax
 //             Call printf
 80483d0:       52                      push   %edx
 80483d1:       52                      push   %edx
 80483d2:       50                      push   %eax
 80483d3:       68 c8 84 04 08          push   $0x80484c8
 80483d8:       e8 f7 fe ff ff          call   80482d4 <printf@plt>
 //             Return 0 and go out.
 80483dd:       31 c0                   xor    %eax,%eax
 80483df:       c9                      leave
 80483e0:       c3                      ret

Now, in the previous program, replace the unlikely() by a likely(), recompile it, and disassemble it again (again, comments added by me) :

080483b0 <main>:
 //             Prologue
 80483b0:       55                      push   %ebp
 80483b1:       89 e5                   mov    %esp,%ebp
 80483b3:       50                      push   %eax
 80483b4:       50                      push   %eax
 80483b5:       83 e4 f0                and    $0xfffffff0,%esp
 //             Call atoi()
 80483b8:       8b 45 08                mov    0x8(%ebp),%eax
 80483bb:       83 ec 1c                sub    $0x1c,%esp
 80483be:       8b 48 04                mov    0x4(%eax),%ecx
 80483c1:       51                      push   %ecx
 80483c2:       e8 1d ff ff ff          call   80482e4 <atoi@plt>
 80483c7:       83 c4 10                add    $0x10,%esp
 //             --------------------------------------------------
 //             If 'a' is equal to 2 (which is likely), we will continue
 //             without branching, so without flushing the pipeline. The
 //             jump only occurs when a != 2, which is unlikely.
 //             ---------------------------------------------------
 80483ca:       83 f8 02                cmp    $0x2,%eax
 80483cd:       75 13                   jne    80483e2 <main+0x32>
 //             Here the a++ increment has been optimized by gcc
 80483cf:       b0 03                   mov    $0x3,%al
 //             Call printf()
 80483d1:       52                      push   %edx
 80483d2:       52                      push   %edx
 80483d3:       50                      push   %eax
 80483d4:       68 c8 84 04 08          push   $0x80484c8
 80483d9:       e8 f6 fe ff ff          call   80482d4 <printf@plt>
 //             Return 0 and go out.
 80483de:       31 c0                   xor    %eax,%eax
 80483e0:       c9                      leave
 80483e1:       c3                      ret

How should I use it ?

You should use it only in cases when the likeliest branch is very very very likely, or when the unlikeliest branch is very very very unlikely.
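As an illustration of a branch that is plausibly "very very very unlikely", here is a minimal sketch of an error check inside a hot loop, assuming GCC or a compatible compiler. The function sum_samples and its convention of returning -1 on invalid input are hypothetical, chosen only for this example; whether the hint actually helps in a given program is something to confirm by measurement.

```c
#include <stddef.h>

#define likely(x)    __builtin_expect(!!(x), 1)
#define unlikely(x)  __builtin_expect(!!(x), 0)

/* Hypothetical example: sums an array of samples that are expected to be
 * non-negative. A negative sample is treated as a rare error, so the
 * check is marked unlikely() and the loop body stays on the fall-through
 * path. Returns the sum, or -1 if a negative sample is found. */
long sum_samples(const int *samples, size_t count)
{
    long sum = 0;

    for (size_t i = 0; i < count; i++) {
        if (unlikely(samples[i] < 0))
            return -1;          /* rare error path */
        sum += samples[i];      /* common path */
    }
    return sum;
}
```

If negative samples turned out to be common in practice, the hint would be wrong and should be removed rather than flipped by guesswork.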

