Understanding C/C++ Strict Aliasing

or - Why won't the #$@##@^% compiler let me do what I need to do!

What's The Problem?

There's a lot of confusion about strict aliasing rules. The main source of that confusion is that two different audiences talk about aliasing: developers who use compilers, and compiler writers. In this document I'm going to try to clear it all up for you. The things that I'm going to cover are based on the aliasing rules in C89/90 (6.3), C99 (6.5/7), C++98 (3.10/15), and C++11 (3.10/10). To find the aliasing rules in any current version of the C or C++ standards, search for "may not be aliased", which will find a footnote that refers back up to the section on allowable forms of aliasing. For information about what was on the mind of the creators of the spec, see C89 Rationale Section 3.3 Expressions, where they talk about why and how the aliasing rules came about.

Developers get interested in aliasing when a compiler gives them a warning about type punning and strict aliasing rules and they try to understand what the warnings mean. They Google for the warning message, they find references to the section on aliasing in one of the C or C++ specs and think, "Yes, that's what I'm trying to do, alias." Then they study that section of the appropriate spec like they're studying arcane runes and try to divine the rules that will let them do the things that they're trying to do. They think that the aliasing rules are written to tell them how to do type punning. They couldn't be more wrong.

The compiler writers know what the strict aliasing rules are for. They are written to let compiler writers know when they can safely assume that a change made through one variable won't affect the value of another variable, and conversely when they have to assume that two variables might actually refer to the same spot in memory.

So this document is divided into two parts. First I'll talk about what strict aliasing is and why it exists, and then I'll talk about how to do the kinds of things developers need to do in ways that won't come in conflict with those rules.

Part the first. What is aliasing exactly?

Aliasing is when more than one lvalue refers to the same memory location. (When you hear lvalue, think of things, such as variables, that can appear on the left-hand side of an assignment, i.e. that are modifiable.) As an example:

int anint;
int *intptr = &anint;

If you change the value of *intptr, the value referenced by anint also changes, because *intptr aliases anint; it's just another name for the same thing. Another example is:

int anint;
void foo(int &i1, int &i2);
foo(anint, anint);

Within the body of foo, since we used anint for both arguments, the two references i1 and i2 alias, i.e. they refer to the same location when foo is called this way.
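To make the effect of aliased references concrete, here is a minimal sketch. The function add_twice and the demo function are hypothetical names introduced only for this illustration; they are not part of the original example.

```cpp
#include <cassert>

// Hypothetical helper: adds the value of i2 into i1 twice.
void add_twice(int &i1, int &i2) {
    i1 += i2;  // if i1 and i2 alias, this statement also changes i2
    i1 += i2;
}

// Shows that the same function gives different results depending on
// whether its two reference parameters name the same object.
void demo_reference_aliasing() {
    int x = 1, y = 1;
    add_twice(x, y);          // distinct objects: 1 + 1 + 1
    assert(x == 3);

    int anint = 1;
    add_twice(anint, anint);  // aliased: 1 + 1 = 2, then 2 + 2 = 4
    assert(anint == 4);
}
```

Because the compiler cannot rule out the second kind of call, it has to reload i2 before the second addition instead of reusing the value it already read.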

What's the problem?

Examine the following code:

int anint;
void bar(int);   // assumed to be declared elsewhere

void foo(double *dblptr)
{
    anint = 1;
    *dblptr = 3.14159;
    bar(anint);
}

Looking at this, it looks safe to assume that the argument to bar() is a constant 1. In the bad old days compiler writers had to make worst-case aliasing assumptions to support lots of crazy wild west legacy code, and could not say that it was safe to assume the argument to bar was 1. They had to insert code to reload the value of anint for the call, because the intervening assignment through dblptr could have changed the value of anint if dblptr pointed to it. It's possible that the call to foo was foo((double *)&anint).

That's the problem that strict aliasing is intended to fix. There was low-hanging fruit for compiler optimizer writers to pick, and they wanted programmers to follow the aliasing rules so that they could pluck that fruit. Aliasing, and the problems it leads to, have been there as long as C has existed. The difference lately is that compiler writers are being strict about the rules and enforcing them when optimization is in effect. In their respective standards, C and C++ include lists of the things that can legitimately alias (see the next section), and in all other cases compiler writers are allowed to assume no interactions between lvalues. Anything not on the list can be assumed to not alias, and compiler writers are free to do optimizations that make that assumption. For anything on the list, aliasing could possibly occur and compiler writers have to assume that it does. When compiler writers follow these lists, and assume that your code follows the rules, it's called strict aliasing. Under strict aliasing, the compiler writer is free to optimize the function foo above because incompatible types, double and int, can't alias. That means that if you do call foo as foo((double *)&anint) something will go quickly wrong, but you get what you deserve.
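The assumption the optimizer makes can be sketched as follows. The function store_then_read is a hypothetical stand-in for foo that returns the value instead of passing it to bar; the names are assumptions made for this illustration only.

```cpp
#include <cassert>

// Hypothetical stand-in for foo(): stores through both pointers, then
// reads back through the int pointer.
int store_then_read(int *ip, double *dp) {
    *ip = 1;
    *dp = 3.14159;  // under strict aliasing, assumed not to touch *ip
    return *ip;     // so the compiler may simply return the constant 1
}

// A well-defined call: the pointers refer to distinct objects whose
// types match the pointer types.
void demo_distinct_objects() {
    int i = 0;
    double d = 0.0;
    assert(store_then_read(&i, &d) == 1);
    assert(d == 3.14159);
    // Calling store_then_read(&i, (double *)&i) instead would be
    // undefined behavior, so it is deliberately not done here.
}
```

With well-behaved callers like this one, the optimization is invisible. It only changes the result for calls that already break the rules.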

So what can alias?

From the C standard draft ISO/IEC 9899:201x, 6.5 Expressions:
7. An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

These can be summarized as follows:

Things that are compatible types or differ only by the addition of any combination of signed, unsigned, const, or volatile. For most purposes compatible type just means the same type. If you want more details you can read the specs. (Example: If you get a pointer to long, and a pointer to const unsigned long, they could point to the same thing.)
An aggregate (struct or class) or union type can alias types contained inside it. (Example: If a function gets passed a pointer to an int, and a pointer to a struct or union containing an int, or possibly containing another struct or union containing an int, or containing... ad infinitum, it's possible that the int* points to an int contained inside the struct or union pointed at by the other pointer.)
A character type. A char*, signed char*, or unsigned char* is specifically allowed by the specs to point to anything. That means it can alias anything in memory.
For C++ only, a possibly CV (const and/or volatile) qualified base class type of a dynamic type can alias the child type. (Example: if class dog has class animal for a base class, pointers or references to class dog and class animal can alias.)

Of course references have all these same issues and pointers and references can alias. Any lvalue has to be assumed to possibly alias to another lvalue if these rules say that they can alias. An aliasing issue is just as likely to come up with values passed by reference as it is with values passed as pointer to values. Additionally any combination of pointers and references have a possibility of aliasing, and you'd have to consult the aliasing rules to see if it might happen.
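A few of the permitted forms can be sketched in code. All names below (count_nonzero_bytes, set_via_unsigned, Pair, bump, and the demo function) are hypothetical and exist only to illustrate the list above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A character-type pointer may inspect the bytes of any object.
std::size_t count_nonzero_bytes(const void *obj, std::size_t size) {
    const unsigned char *bytes = static_cast<const unsigned char *>(obj);
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (bytes[i] != 0) ++count;
    }
    return count;
}

// The unsigned type corresponding to int may alias an int object.
void set_via_unsigned(int &target, unsigned value) {
    unsigned *up = reinterpret_cast<unsigned *>(&target);
    *up = value;
}

struct Pair { int first; int second; };  // hypothetical aggregate

// An int* may point at an int member of the aggregate, so the compiler
// has to assume that ip might point into *pp.
void bump(int *ip, Pair *pp) {
    pp->first = 10;
    *ip += 1;
}

void demo_permitted_aliases() {
    std::uint32_t v = 0x01020304u;  // every byte is nonzero
    assert(count_nonzero_bytes(&v, sizeof v) == 4);

    int x = 5;
    set_via_unsigned(x, 7u);
    assert(x == 7);

    Pair p = {0, 0};
    bump(&p.first, &p);             // ip really does alias p.first
    assert(p.first == 11);
}
```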

Part the second. How to do something the compiler doesn't like.

The following program swaps the halves of a 32 bit integer, and is typical of code you might use to handle data passed between a little-endian and big-endian machine. It also generates 6 warnings about breaking strict-aliasing rules. Many would dismiss them. The correct output of the program is:

00000020 00200000

but when optimization is turned on it's:

00000020 00000020

THAT's what the warning is trying to tell you, that the optimizer is going to do things that you don't like. Don't think this means that the optimizer broke your code. It's already broken. The optimizer just pointed it out for you.

Broken Version

#include <cstdint>
#include <iomanip>
#include <iostream>
using namespace std;

uint32_t swaphalves(uint32_t a)
{
    uint32_t acopy = a;
    uint16_t *ptr = (uint16_t *)&acopy; // can't use static_cast<>, not legal.
                                        // you should be warned by that.
    uint16_t tmp = ptr[0];
    ptr[0] = ptr[1];
    ptr[1] = tmp;
    return acopy;
}

int main(void)
{
    uint32_t a;
    a = 32;
    cout << hex << setfill('0') << setw(8) << a << endl;
    a = swaphalves(a);
    cout << setw(8) << a << endl;
}

So what goes wrong? Since a uint16_t can't alias a uint32_t under the rules, the accesses through ptr are ignored when the compiler considers what to do with acopy. Since the compiler sees that nothing is done with acopy inside the swaphalves function, it just returns the original value of a. Here's the (annotated) x86 assembler generated by gcc 4.4.1 for swaphalves; let's see what went wrong:

_Z10swaphalvesj:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $16, %esp
    movl    8(%ebp), %eax   # get a in %eax
    movl    %eax, -8(%ebp)  # and store it in acopy
    leal    -8(%ebp), %eax  # now get eax pointing at acopy (ptr=&acopy)
    movl    %eax, -12(%ebp) # save that ptr at -12(%ebp)
    movl    -12(%ebp), %eax # get the ptr back in %eax
    movzwl  (%eax), %eax    # get 16 bits from ptr[0] in eax
    movw    %ax, -2(%ebp)   # store the 16 bits into tmp
    movl    -12(%ebp), %eax # get the ptr back in eax
    addl    $2, %eax        # bump up by two to get to ptr[1]
    movzwl  (%eax), %edx    # get that 16 bits into %edx
    movl    -12(%ebp), %eax # get ptr into eax
    movw    %dx, (%eax)     # store the 16 bits into ptr[0]
    movl    -12(%ebp), %eax # get the ptr again
    leal    2(%eax), %edx   # get the address of ptr[1] into edx
    movzwl  -2(%ebp), %eax  # get tmp into eax
    movw    %ax, (%edx)     # store into ptr[1]
    movl    -8(%ebp), %eax  # forget all that, return original a.
    leave
    ret

Scary, isn't it? Of course, if you are using gcc, you could use -fno-strict-aliasing to get the output you expect, but the generated code won't be as good, and you're just treating the symptom instead of curing the problem. A better way to accomplish the same thing without the warnings or the incorrect output is to define swaphalves with a union, as in the union version further down. N.B. this is supported in C99 and later C specs, as noted in this footnote to 6.5.2.3 Structure and union members:

85. If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

Your mileage may vary in C++, though. All C++ compilers that I know of support it, but the C++ spec doesn't allow it, so it would be risky to count on it. After the union version I'll show another solution, using memcpy, that may be (but probably isn't) slightly less efficient, and is supported by both C and C++.

Another Broken version, referencing a twice.

uint32_t swaphalves(uint32_t a)
{
    a = (a >>= 16) | (a <<= 16);
    return a;
}

This version looks reasonable, but you don't know if the right and left sides of the | will each get the original version of a or if one of them will get the result of the other. There's no sequence point here, so we don't know anything about the order of operations here, and you may get different results from the same compiler using different levels of optimization.
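For comparison, here is a sketch of a shift-based version that never modifies a inside the expression, so the order in which the two operands of | are evaluated no longer matters. It is not one of the original article's versions; the name swaphalves_shift is an assumption made for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Plain shifts read a twice but never write it, so the result is the
// same whichever operand of | is evaluated first.
std::uint32_t swaphalves_shift(std::uint32_t a) {
    return static_cast<std::uint32_t>((a >> 16) | (a << 16));
}

void demo_swaphalves_shift() {
    assert(swaphalves_shift(32u) == 0x00200000u);
    assert(swaphalves_shift(0x12345678u) == 0x56781234u);
}
```

This version involves no pointers at all, so no aliasing question arises.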

Union version. Fixed for C but not guaranteed portable to C++.

uint32_t swaphalves(uint32_t a)
{
    typedef union {
        uint32_t as32bit;
        uint16_t as16bit[2];
    } swapem;

    swapem s = {a};
    uint16_t tmp;

    tmp = s.as16bit[0];
    s.as16bit[0] = s.as16bit[1];
    s.as16bit[1] = tmp;
    return s.as32bit;
}

The C++ compiler knows that members of a union fill the same memory, and this helps the compiler generate MUCH better code:

_Z10swaphalvesj:
    pushl   %ebp                # save the original value of ebp
    movl    %esp, %ebp          # point ebp at the stack frame
    movl    8(%ebp), %eax       # get a in eax
    popl    %ebp                # get the original ebp value back
    roll    $16, %eax           # swap the two halves of a and return it
    ret

So do it wrong, via strange casts and get incorrect code, or by turning off strict-aliasing get inefficient code, or do it right and get efficient code.

You can also accomplish the same thing by using memcpy with char* to move the data around for the swap, and it will probably be as efficient. Wait, you ask me, how can that be? There will be at least two calls to memcpy added to the mix! Well, gcc and other modern compilers have smart optimizers and will, in many cases (including this one), elide the calls to memcpy. That makes it the most portable, and as efficient as any other method. Here's how it would look:

memcpy version, compliant to C and C++ specs and efficient

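#include <stdint.h>
#include <string.h>

uint32_t swaphalves(uint32_t a)
{
    uint16_t as16bit[2], tmp;

    memcpy(as16bit, &a, sizeof(a));   /* copy the bytes of a out */
    tmp = as16bit[0];
    as16bit[0] = as16bit[1];
    as16bit[1] = tmp;
    memcpy(&a, as16bit, sizeof(a));   /* copy the swapped bytes back */
    return a;
}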

For the above code, a C compiler will generate code similar to the previous solution, but with the addition of two calls to memcpy (possibly optimized out). gcc generates code identical to the previous solution. You can imagine other variants that substitute reading and writing through a char pointer locally for the calls to memcpy.

Similar issues arise in networking code, where you don't know what type of packet you have until you examine it. Unions and/or memcpy are your friends here as well.
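As a rough sketch of the networking case, the header of a received packet can be copied out of the raw byte buffer with memcpy instead of casting the buffer pointer to a struct pointer. The PacketHeader layout, its field names, and read_header are all hypothetical, and the fields are left in host byte order; real code would also convert from network byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wire header, for illustration only.
struct PacketHeader {
    std::uint16_t type;
    std::uint16_t length;
};

// Copies the header out of a raw buffer. Returns false if the buffer is
// too short to hold a complete header.
bool read_header(const unsigned char *buf, std::size_t buf_len,
                 PacketHeader &out) {
    if (buf_len < sizeof(PacketHeader)) return false;
    // memcpy avoids both the aliasing violation and any alignment
    // problem that a cast to PacketHeader* could cause.
    std::memcpy(&out, buf, sizeof(PacketHeader));
    return true;
}

void demo_read_header() {
    PacketHeader in = {7, 128};
    unsigned char buf[sizeof(PacketHeader)];
    std::memcpy(buf, &in, sizeof in);  // stand-in for received bytes

    PacketHeader out = {0, 0};
    assert(read_header(buf, sizeof buf, out));
    assert(out.type == 7 && out.length == 128);
    assert(!read_header(buf, 1, out)); // too short to be a header
}
```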

The restrict keyword

In C99 and later C Standards, but not in any C++ Standard, you can use the restrict qualifier to promise the compiler that a pointer to something is not aliased. In a situation where the compiler would have to expect that things could alias, you can tell the compiler that you promise it will not be so. So in this:

void foo(int * restrict i1, int * restrict i2);

you're telling the compiler that you promise that i1 and i2 will never point at the same memory. You have to know well the implementation of foo and only pass into it things that will keep the promise that things accessed through i1 and i2 will never alias. The compiler believes you and may be able to do a better job of optimization. If you break the promise your mileage may vary (and by that I mean that you will almost certainly cry).

Current C++ Standards specify that when C libraries are used from C++ the restrict qualifier shall be omitted. restrict is not a keyword for C++ and is not part of the C++ Standard in any version. Nonetheless, as pointed out by Ian Mallett, many compilers, as a non-standard extension, allow the use of __restrict__ or __restrict as qualifiers. Since restrict is not a keyword of C++ you can't use it directly, but in g++, clang, and MSVC you can do something like #define restrict __restrict and accomplish the same thing. In spite of this, they are still required by the C++ Standard to omit the qualifier from linked C libraries. Use at your own risk;)
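Here is a minimal sketch of the non-standard __restrict spelling in C++. It assumes a compiler that accepts this extension, such as the g++, clang, and MSVC compilers mentioned above; scale_into and the demo function are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// __restrict is a compiler extension, not standard C++. It promises
// that out and in never refer to overlapping memory.
void scale_into(float *__restrict out, const float *__restrict in,
                float factor, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}

void demo_scale_into() {
    float in[3] = {1.0f, 2.0f, 3.0f};
    float out[3] = {0.0f, 0.0f, 0.0f};
    scale_into(out, in, 2.0f, 3);  // distinct arrays keep the promise
    assert(out[0] == 2.0f && out[1] == 4.0f && out[2] == 6.0f);
}
```

Passing the same array for both out and in would break the promise, and the behavior would then be undefined.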

Let me know if it can be better

If you have comments, corrections, suggestions for improvement, or examples, feel free to email me.

Thanks,

Patrick Horgan
patrick at dbp-consulting dot com

Kudos and Thanks

Particular thanks go to people who participated in the discussion of this document on the boost-users and gcc-help mailing lists. In particular I'd like to thank Václav Haisman, Thomas Heller who wrote the memcpy version I use here and pointed out that it will generate exactly the same assembler, and Andrew Haley who pointed out a more portable way to define the union, and also pointed out that gcc will elide the calls to memcpy.

Thanks to Mike Dyckhoff for catching a thinko in an example. Additionally I'd like to thank Gabe Jones for catching a thinko, and Alex Markin who did the Russian translation:) Thanks to Ian Mallett who pointed out the availability of __restrict in many C++ compilers.

Original article: http://dbp-consulting.com/tutorials/StrictAliasing.html

1 0