XScale alignment

Reposted from:

http://lecs.cs.ucla.edu/wiki/index.php/XScale_alignment

From CSL Wiki

Contents

  • 1 The Problem
    • 1.1 Why silently ignoring unaligned memory accesses can be a problem
    • 1.2 An Example
  • 2 Solutions
    • 2.1 Outline
    • 2.2 Identifying alignment problems
      • 2.2.1 Use gcc
      • 2.2.2 Use the kernel
    • 2.3 Rewrite your code
      • 2.3.1 Add padding
      • 2.3.2 Just rewrite the code (best solution!)
    • 2.4 Use the packed attribute
    • 2.5 Use the aligned attribute (second best solution!)
    • 2.6 Have the kernel find the problem for you
      • 2.6.1 0 - ignore
      • 2.6.2 1 - warn
      • 2.6.3 2 - fixup
      • 2.6.4 3 - fixup+warn
      • 2.6.5 4 - signal
      • 2.6.6 5 - signal+warn
  • 3 Beyond this document

The Problem

This section still needs a nice explanation of how ARM/XScale performs word reads and writes only on word boundaries, and how things get mixed up when you try to do loads and stores with pointers that are not on word boundaries.

It also needs a nice intro about how certain programming styles lead to situations like the one below.

The example below is a little contrived, but in certain styles of programming (embedded, network) the pattern can be common.

This document is a work in progress. Any and all suggestions and criticism are welcome (send them to mlukac at lecs.cs.ucla.edu). Feel free to make minor changes in wording and small changes in order, but if you want to make major changes like removing sections or seriously reordering things, please warn me beforehand.

Why silently ignoring unaligned memory accesses can be a problem

If neither you nor the OS is fixing unaligned memory accesses, this is the kind of behavior you are likely to see.

If the contents of memory look like this:

 
memory address     0    1    2    3    4    5  .....
(bytes)          +----+----+----+----+----+
memory contents  |0x0a|0x0b|0x0c|0x0d|0x0e| .....
                 +----+----+----+----+----+

If you do a 32-bit wide read starting from byte 1, you want to see 0x0e0d0c0b on a little endian processor: the contents of the 4 contiguous bytes starting from address 1.

However, what you will actually read on a little endian processor is 0x0a0d0c0b. The processor fetches the 32-bit aligned word starting from address 0 (0x0d0c0b0a) and rotates it so that the byte at address 1 lands in the low-order position; the byte at address 4 is never read.

In other words, the problem will look like memory corruption (the CPU will not return the data which is 'really in' the address you specify), and if you are reading pointers from unaligned memory, it can cause segmentation faults later on when such a pointer is dereferenced.
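The two values above can be reproduced with a small model of the load. This is only a sketch, assuming the processor fetches the aligned word and rotates it right by 8 bits for each byte of misalignment (which matches the numbers in this example). The function names wanted_read and rotated_read are made up for illustration.

```c
#include <stdint.h>

/* What the programmer wants: the four bytes starting at addr,
 * assembled in little endian order. */
uint32_t wanted_read(const uint8_t *mem, unsigned addr)
{
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}

/* Model of the silently ignored unaligned load (an assumption, see above):
 * read the aligned word that contains addr, then rotate it right by
 * 8 bits per byte of misalignment. */
uint32_t rotated_read(const uint8_t *mem, unsigned addr)
{
    uint32_t word = wanted_read(mem, addr & ~3u);
    unsigned shift = 8 * (addr & 3u);

    if (shift == 0)
        return word;            /* aligned access: nothing to rotate */
    return (word >> shift) | (word << (32 - shift));
}
```

With the five bytes from the diagram stored at addresses 0 to 4, wanted_read(mem, 1) gives 0x0e0d0c0b and rotated_read(mem, 1) gives 0x0a0d0c0b.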

An Example

#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>

#define DATA_SIZE 20

typedef struct _bar {
    int8_t data1;
    int8_t data2[DATA_SIZE];
} bar_t;

typedef struct _foo {
    char *b;
} foo_t;

int main(void)
{
    bar_t bar = {0};

    // foo points to a chunk of memory that is *not* 32-bit aligned
    foo_t *foo = (foo_t *)(bar.data2);

    // good_foo is a 'valid' pointer, pointing to a chunk of 32-bit aligned memory
    char *good_foo = (char *) malloc(sizeof(char));

    // assign b to good_foo (so b is pointing to valid, aligned memory, but it is not itself 32-bit aligned)
    foo->b = good_foo;

    printf("\n");
    // sizeof yields a size_t, which is printed with %zu
    printf("sizeof(bar)=%zu, sizeof(foo)=%zu\n",
           sizeof(bar_t), sizeof(foo_t));
    printf("\n");
    // %p expects a void pointer, hence the casts
    printf("bar is at mem location %p\n", (void *)&bar);
    printf("bar.data2 is at mem location %p\n", (void *)bar.data2);
    printf("foo is at mem location %p\n", (void *)foo);
    printf("... so foo is potentially now not on a word (4 byte) boundary\n\n");
    printf("foo->b=%p should equal good_foo=%p\n", (void *)foo->b, (void *)good_foo);
    printf("If they are not, dereferencing foo->b would most likely cause a segfault\n");

    printf("\n\n");
    free(good_foo);
    return 0;
}

Solutions

Outline

  • Identifying alignment problems
  • Rewrite your code
  • Use the packed attribute
  • Use the aligned attribute
  • Have the kernel fix the alignment problem

Identifying alignment problems

Use gcc

gcc can help you identify alignment problems. I have not tested it extensively, but it seems to work for relatively simple alignment problems like the example above. Adding -Wcast-align to your compile flags tells gcc to print a warning whenever it thinks there will be an alignment problem. From the gcc man page:

      -Wcast-align
          Warn whenever a pointer is cast such that the required alignment of the target is
          increased. For example, warn if a "char *" is cast to an "int *" on machines where
          integers can only be accessed at two- or four-byte boundaries.
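As a complement to the compiler warning, a pointer can also be checked at run time before it is cast. This is a minimal sketch, assuming 4 byte word alignment is what matters. is_aligned and as_word_ptr are hypothetical helper names, not part of gcc or the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p sits on a multiple of `boundary` bytes, 0 otherwise. */
int is_aligned(const void *p, size_t boundary)
{
    return ((uintptr_t)p % boundary) == 0;
}

/* Hands p back as a word pointer only when that is safe; returns NULL
 * when the cast would produce a misaligned uint32_t pointer. */
uint32_t *as_word_ptr(void *p)
{
    return is_aligned(p, sizeof(uint32_t)) ? (uint32_t *)p : NULL;
}
```

A check like this can be placed (for example inside an assert) at the spots where a byte buffer is reinterpreted as a struct or a word, so that a bad cast is caught where it happens instead of showing up later as corrupted data.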

Other flags that might be helpful:

      -Wpadded
          Warn if padding is included in a structure, either to align an element of the structure
          or to align the whole structure. Sometimes when this happens it is possible to
          rearrange the fields of the structure to reduce the padding and so make the structure
          smaller.
      -Wpacked
          Warn if a structure is given the packed attribute, but the packed attribute has no
          effect on the layout or size of the structure. Such structures may be mis-aligned
          for little benefit. For instance, in this code, the variable "f.x" in "struct bar"
          will be misaligned even though "struct bar" does not itself have the packed attribute:

              struct foo {
                  int x;
                  char a, b, c, d;
              } __attribute__((packed));
              struct bar {
                  char z;
                  struct foo f;
              };

Use the kernel

The arm-linux kernel can also identify alignment problems. Every time there is an unaligned access to memory, there is an alignment trap. The arm-linux kernel allows you to specify how to deal with the alignment trap. The default is to silently ignore the unaligned access. Other options are to have the kernel print a message when there is an unaligned access, send a signal to the process (typically this will kill the process), or fix the alignment problem (at the cost of a few more instructions). The printing and signaling methods can be used to help identify the problems. For more information on how to use these kernel features, please see the sections below.

Rewrite your code

Add padding

One quick fix is to manually pad your structs to align all the important data members on 4 byte boundaries. This way, when you do assignments and cast the structs around, you will always be using correctly aligned addresses. Keep in mind, though, that this is not the best possible fix because it becomes an inconvenience: as your code develops, the structs and the meaning/use of the structs may change, so the padding will have to change too.

Padding can be applied in the example above. In the bar struct, 3 int8_t's can be added after the first data member as follows:

typedef struct _bar {
    int8_t data1;
    int8_t __pad1[3];
    int8_t data2[DATA_SIZE];
} bar_t;

This will force data2 to sit on a 4 byte boundary, provided the struct itself starts on one. So, the assignment to the foo data structure will not have any alignment problems.
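The effect of the manual padding can be checked at compile time with offsetof. This is a small sketch using C11 _Static_assert; the type name bar_padded_t and the helper padded_data2_offset are made up here so the snippet stands on its own.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_SIZE 20

typedef struct bar_padded {
    int8_t data1;
    int8_t pad1[3];             /* manual padding, never read or written */
    int8_t data2[DATA_SIZE];
} bar_padded_t;

/* The build fails if someone later changes the struct and the padding
 * no longer puts data2 on a 4 byte offset. */
_Static_assert(offsetof(bar_padded_t, data2) % 4 == 0,
               "data2 must start on a 4 byte offset");

/* Offset of data2 from the start of the struct, for run-time inspection. */
size_t padded_data2_offset(void)
{
    return offsetof(bar_padded_t, data2);
}
```

A check like this takes some of the pain out of maintaining the padding by hand, since a forgotten update shows up as a build error instead of a misaligned access. It only covers the offset inside the struct, not where the struct itself is placed in memory.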

Just rewrite the code (best solution!)

A more difficult but long term solution is to rewrite your code so that you do not need to cast structs around. Well-designed network protocols (like the IP stack) already do this; all the headers are carefully aligned on 32-bit boundaries. However, this is not always possible; for example, custom protocols or protocols designed for 8 or 16-bit architectures will have no alignment issues on those architectures but can have problems on 32-bit platforms. And of course a rewrite is sometimes not practical, because the amount of code that needs to be rewritten (and then tested) can be very large. It is always easier to design things with alignment in mind (like IP) than to try to rectify the problem later. (To be added: more about this not always being possible without large rewrites, and about how the usual network programming styles tend to have the patterns that lead to alignment problems.)

The arm-linux kernel has a feature that can help you identify any alignment issues that you may have. The kernel can print a message on the stargate serial console whenever an unaligned access is performed. For instructions on how to do this, please see the sections below.
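One way to avoid casting a struct onto arbitrary bytes is to copy values in and out with memcpy, which makes no assumption about the alignment of either side. This is a common C technique rather than something specific to this page, and it is only a sketch: store_ptr and load_ptr are hypothetical names.

```c
#include <string.h>

/* Writes a pointer value into a buffer position that may not be
 * word aligned. */
void store_ptr(void *dst, char *value)
{
    memcpy(dst, &value, sizeof value);
}

/* Reads a pointer value back from a possibly unaligned position. */
char *load_ptr(const void *src)
{
    char *value;

    memcpy(&value, src, sizeof value);
    return value;
}
```

In the example program, the assignment foo->b = good_foo would become store_ptr(bar.data2, good_foo), and reading it back would be load_ptr(bar.data2). No misaligned word access is requested in the source, whatever address data2 ends up at.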

Use the packed attribute

By default, gcc pads structs so that each data member sits on the boundary its type needs (4 bytes for a pointer or an int on this platform). This means that structs may appear larger than the byte count of the data members. For instance, if you create a struct with an int8_t and a char*, sizeof will return 8 bytes for the size of the struct and not 5. This is because padding was added between the two data members. gcc will also pad out a struct if it falls short of a 4 byte boundary; for instance, when running the example code above the size of bar is 24, not 21, even though there is no padding between the data1 and data2 members. There is no padding between data1 and data2 because the element type of the data2 array is only one byte long. Because of this, the automatic padding done by gcc is not good enough to fix the alignment problems.
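The automatic padding can be made visible with sizeof and offsetof. This is a sketch only; the struct mixed_t and the two helper functions are invented for illustration, and the exact numbers depend on the target (the values in the comments assume a 32-bit target with 4 byte pointers).

```c
#include <stddef.h>
#include <stdint.h>

typedef struct mixed {
    int8_t flag;    /* 1 byte */
    char  *ptr;     /* 4 bytes on a 32-bit target */
} mixed_t;

/* Bytes of padding the compiler added: 8 - 5 = 3 on a 32-bit target. */
size_t mixed_padding(void)
{
    return sizeof(mixed_t) - (sizeof(int8_t) + sizeof(char *));
}

/* Where ptr actually starts: offset 4 on a 32-bit target, not offset 1. */
size_t mixed_ptr_offset(void)
{
    return offsetof(mixed_t, ptr);
}
```

Printing these two values on the target is a quick way to see what layout gcc really chose before deciding whether a struct needs manual padding or an attribute.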

The packed attribute, which can be added to structs, provides two useful features, one of which helps solve the alignment problems. The first is more relevant to cross-platform network programming: the packed attribute prevents gcc from adding any padding to the structs, essentially preventing gcc from attempting to fix the alignment problems with padding. The second feature is that gcc will add extra code to properly deal with the misaligned memory accesses that are created by not introducing padding to align the data members. If applied to the correct structs, this can fix the alignment problems at the cost of extra instructions introduced by gcc for each access to the data members of that struct.

To fix the example code above, the foo_t struct definition would now look like:

typedef struct _foo {
    char *b;
} __attribute__ ((packed)) foo_t;

Since this struct has only one data member, the size remains the same; however, for any memory accesses that use this struct, the extra instructions that gcc added will rotate the bytes correctly so there are no alignment problems.

Using the packed attribute does have the drawback of adding extra instructions to every access to the data members of the packed structs.

You can actually see the extra instructions by telling gcc to output the assembly for the example code above with the -S flag.
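The packed version of the struct can be tried out on a deliberately odd address. This is a sketch that relies on the gcc packed attribute (clang accepts it too); foo_packed_t, set_b and get_b are names made up for this snippet.

```c
#include <stdint.h>

typedef struct foo_packed {
    char *b;
} __attribute__((packed)) foo_packed_t;

/* Stores value through a packed struct placed at `where`, which does
 * not need to be word aligned. */
void set_b(void *where, char *value)
{
    ((foo_packed_t *)where)->b = value;
}

/* Reads the pointer back from the same, possibly unaligned, place. */
char *get_b(const void *where)
{
    return ((const foo_packed_t *)where)->b;
}
```

Because the packed struct only requires 1 byte alignment, gcc has to generate code for set_b and get_b that works at any address. Compiling this file with -S for the ARM target and comparing it with the unpacked version shows the extra instructions mentioned above.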

A common question is: should I just use the packed attribute everywhere? I do not know the answer to this beyond saying that it may slow down your program. Maybe someone else can fill us in.

Use the aligned attribute (second best solution!)

The aligned attribute is added to individual data members to tell gcc to add enough padding before the data member to make it sit on the specified byte boundary. To explain this better, consider the example code above. To fix the alignment problem using the aligned attribute, we would add the aligned attribute, with a multiple of 4 as the parameter, to the data2 member of the bar struct. So, it would look like:

typedef struct _bar {
    int8_t data1;
    int8_t data2[DATA_SIZE] __attribute__ ((aligned(4)));
} bar_t;

This is equivalent to adding 3 bytes of padding between the data1 and data2 members as in the earlier example, except that gcc does it for you automatically, which removes the management overhead.

The parameter to the aligned attribute specifies which byte boundary the data member should be padded to. For ARM-based processors it only makes sense to use 4 (the ARM word size is 32 bits), since anything smaller could lead to more alignment issues and anything larger will waste memory.

If rewriting your code is not possible, this is an ideal solution. This solution also does not add extra instructions to your code as the packed attribute does.
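The layout produced by the aligned attribute can be confirmed the same way as any other layout. This is a sketch assuming gcc (or a compiler that accepts the same attribute syntax); bar_aligned_t and data2_is_word_aligned are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_SIZE 20

typedef struct bar_aligned {
    int8_t data1;
    int8_t data2[DATA_SIZE] __attribute__((aligned(4)));
} bar_aligned_t;

/* gcc inserts the 3 padding bytes itself, so data2 lands at offset 4. */
_Static_assert(offsetof(bar_aligned_t, data2) == 4,
               "data2 should start at offset 4");

/* Run-time check on an actual object: 1 if data2 is on a 4 byte boundary. */
int data2_is_word_aligned(const bar_aligned_t *bar)
{
    return ((uintptr_t)bar->data2 % 4) == 0;
}
```

One difference from manual padding is worth knowing: aligning a member to 4 also raises the alignment of the whole struct to 4, so the compiler places objects of this type on a word boundary as well.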

Have the kernel find the problem for you

The arm-linux kernel provides a proc interface which reports the number of unaligned accesses and lets you change the kernel behavior on unaligned accesses. The proc file is:

   /proc/cpu/alignment

Simply 'cat'ing the file will give you the number of unaligned accesses by user space programs, by the kernel, and the number of each type of unaligned access:

stargate-79:~# cat /proc/cpu/alignment 
User: 30
System: 1183781
Skipped: 0
Half: 755121
Word: 428660
Multi: 0
User faults: 0 (ignored)
stargate-79:~#
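A test harness may want to read these counters from a program instead of by eye. The following is a sketch of a parser for text in the "Name: number" shape shown above; alignment_counter is a hypothetical helper, and the only thing it assumes about the file is that line format.

```c
#include <stdio.h>
#include <string.h>

/* Finds the line "key: <number>" in text and stores the number in *out.
 * Returns 0 on success, -1 if the key is missing or has no number. */
int alignment_counter(const char *text, const char *key, unsigned long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* The key must be followed directly by ':' so that "User" does
         * not match the "User faults" line. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%lu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line)
            line++;             /* step past the newline */
    }
    return -1;
}
```

Given the output shown above as one string, asking for "System" yields 1183781 and asking for "User" yields 30. Reading the counters before and after running a program shows how many unaligned accesses that run caused.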

Any time there is an unaligned access to memory, an alignment trap happens. The alignment proc file allows you to specify how the kernel behaves when an alignment trap happens. To set the different behaviors below, just echo the number to the alignment proc file. For instance, if I wanted to set the kernel to just give a warning whenever there is an alignment problem, I would type the following command at the console:

echo 1 > /proc/cpu/alignment

The following is a description of the various behaviors the kernel offers for dealing with alignment traps:

0 - ignore

This is the default behavior compiled into the arm-linux kernel. All alignment traps are ignored by the kernel, and no attempt is made to notify the user that there is a problem, except for keeping track of the number of traps.

1 - warn

In this mode, the kernel prints an error saying that there was an alignment trap. The error typically goes to the serial console, but can be directed to various log files using syslogd or a similar program. The error message is only useful for identifying whether your code actually has alignment issues.

Alignment trap: align (32124) PC=0x00008478 Instr=0xe5823000 Address=0xbffffc85 Code 0xffffffff
Alignment trap: align (32124) PC=0x000084dc Instr=0xe5931000 Address=0xbffffc85 Code 0x00

2 - fixup

In this mode, the kernel fixes the alignment for all unaligned accesses. This does introduce extra overhead, just like the packed attribute does.

3 - fixup+warn

This is equivalent to having mode 1 and mode 2 on at the same time, so the kernel fixes the alignment and prints the alignment trap to the console.

4 - signal

In this mode, the kernel sends a SIGBUS signal to the process. Unless you have implemented a signal handler, your process will be killed and 'Bus Error' will be printed to the console. Combining this mode with running a debugger is the most useful way to find the unaligned accesses in your code.

5 - signal+warn

This is equivalent to having mode 1 and mode 4 on at the same time, so the kernel sends a signal to the process and prints the alignment trap to the console.

If you are interested, the kernel code that creates the proc file as well as the alignment trap handler is located in arch/arm/mm/alignment.c
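For scripts or test tools that set the mode, the six numbers listed above can be kept in one small table. This is a sketch built only from the mode names in this section; alignment_mode_name and alignment_mode_number are hypothetical helpers and are not taken from the kernel source.

```c
#include <stddef.h>
#include <string.h>

/* Index = number echoed to /proc/cpu/alignment, as listed above. */
static const char *const mode_names[] = {
    "ignore", "warn", "fixup", "fixup+warn", "signal", "signal+warn"
};

#define MODE_COUNT ((int)(sizeof mode_names / sizeof mode_names[0]))

/* Name for a mode number, or NULL if the number is not one of 0..5. */
const char *alignment_mode_name(int mode)
{
    return (mode >= 0 && mode < MODE_COUNT) ? mode_names[mode] : NULL;
}

/* Mode number for a name, or -1 if the name is not in the table. */
int alignment_mode_number(const char *name)
{
    for (int i = 0; i < MODE_COUNT; i++)
        if (strcmp(name, mode_names[i]) == 0)
            return i;
    return -1;
}
```

A tool can then accept a readable name such as "fixup+warn" on its command line and write the matching digit (3 in this case) to the proc file.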

Beyond this document

Hopefully after reading this document you understand the alignment issues on the XScale platform and have an idea of how to find and fix them.

There are some other things that may be useful and/or interesting to look into. In particular, gcc has some flags which may help you debug problems, or redesign or optimize your code: -Wpacked -Wpadded -malignment-traps -mno-alignment-traps