Basic Syntax of Inline Assembly

Source: Internet  Published by: 陕西大数据集团张鹏  Editor: 程序博客网  Date: 2024/06/05 14:59

    Basic inline assembly

The format for basic inline assembly is very simple, and much like
Borland's method.

asm ("statements");

Pretty simple, no? So

asm ("nop");

will do nothing of course, and

asm ("cli");

will stop interrupts, with

asm ("sti");

of course enabling them. You can use *__asm__* instead of *asm* if the
keyword *asm* conflicts with something in your program.

When it comes to simple stuff like this, basic inline assembly is fine.
You can even push your registers onto the stack, use them, and put them
back.

asm ("pushl %eax\n\t"
     "movl $0, %eax\n\t"
     "popl %eax");

(The \n's and \t's are there so the *.s* file that GCC generates and
hands to GAS comes out right when you've got multiple statements per *asm*.)
Basic inline assembly is really meant for issuing instructions that have
no equivalent in C and that don't touch the registers.
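
Here's a minimal sketch of that multi-statement layout, assuming GCC or Clang. The function name *two_nops* is made up for illustration. It uses the *__asm__* spelling, which keeps working when the compiler runs in strict ISO mode (for example with -std=c99), where plain *asm* is not a keyword.

```c
/* Issues two nop instructions and returns how many were issued.
   "nop" has no operands and touches no registers, so basic asm is enough. */
static int two_nops(void)
{
    __asm__ ("nop\n\t"   /* "\n\t" ends this line in the generated .s file */
             "nop");
    return 2;
}
```

Compiling with -S and looking at the *.s* file shows the two instructions on separate, indented lines.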

But if you /do/ touch the registers, and don't fix things at the end of
your *asm* statement, like so:

asm ("movl %eax, %ebx");
asm ("xorl %ebx, %edx");
asm ("movl $0, _booga");

then your program will probably blow things to hell. This is because GCC
hasn't been told that your *asm* statement clobbered *ebx* and *edx* and
*booga*, which it might have been keeping in a register, and might plan
on using later. For that, you need:


    Extended inline assembly

The basic format of the inline assembly stays much the same, but now
gets Watcom-like extensions to allow input arguments and output arguments.

Here is the basic format:

asm ( "statements" : output_registers : input_registers : clobbered_registers);
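
To see where each colon-separated section goes, here's a tiny sketch, assuming an x86 target and GCC or Clang. The name *add_ints* is hypothetical, and the plain C branch is only there so the file still builds on other machines. The *&* in *"=&r"* is an extra modifier that says the output is written before all inputs have been read, so GCC must not share its register with an input.

```c
/* Adds two ints with extended asm (x86 only; plain C elsewhere). */
static int add_ints(int a, int b)
{
    int sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %0\n\t"      /* sum = a  */
             "addl %2, %0"          /* sum += b */
             : "=&r" (sum)          /* outputs */
             : "r" (a), "r" (b)     /* inputs */
             : "cc");               /* clobbers: addl changes the flags */
#else
    sum = a + b;                    /* fallback for non-x86 targets */
#endif
    return sum;
}
```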

Let's just jump straight to a nifty example, which I'll then explain:

asm ("cld\n\t"
     "rep\n\t"
     "stosl"
     : /* no output registers */
     : "c" (count), "a" (fill_value), "D" (dest)
     : "%ecx", "%edi" );

The above stores the value in *fill_value* *count* times to the pointer
*dest*.

Let's look at this bit by bit.

asm ("cld\n\t"

We are clearing the direction bit of the *flags* register. You never
know what this is going to be left at, and it costs you all of 1 or 2
cycles.

     "rep\n\t"
     "stosl"

Notice that GAS requires the *rep* prefix to occupy a line of its own.
Notice also that *stos* has the *l* suffix to make it move *longwords*.

     : /* no output registers */

Well, there aren't any in this function.

     : "c" (count), "a" (fill_value), "D" (dest)

Here we load *ecx* with *count*, *eax* with *fill_value*, and *edi* with
*dest*. Why make GCC do it instead of doing it ourselves? Because GCC,
in its register allocating, might be able to arrange for, say,
*fill_value* to already be in *eax*. If this is in a loop, it might be
able to preserve *eax* thru the loop, and save a *movl* once per loop.

     : "%ecx", "%edi" );

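And here's where we specify to GCC, "you can no longer count on the
values you loaded into *ecx* or *edi* to be valid." This doesn't mean
they will be reloaded for certain. This is the clobberlist. Be aware
that this worked with the GCC of the day; current GCC releases reject a
clobber that names a register already used by an input or output
operand, and expect a register that the asm both reads and changes to be
declared as a read-write output (the *+* modifier) instead.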

Seem funky? Well, it really helps when optimizing, when GCC can know
exactly what you're doing with the registers before and after. It folds
your assembly code into the code it generates (whose rules for
generation look /remarkably/ like the above) and then optimizes. It's
even smart enough to know that if you tell it to put (x+1) in a
register, then if you don't clobber it, and later C code refers to
(x+1), and it was able to keep that register free, it will reuse the
computation. Whew.
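
For comparison, here's how the same fill might be written for a current GCC or Clang, which reject a clobber that repeats an input register. This is a sketch, not the original author's code. The function name *fill32* and the fallback loop are assumptions for illustration. The registers that *rep stosl* changes are declared as read-write operands with *+*, and *"memory"* is listed because the instruction writes through *dest*.

```c
#include <stddef.h>
#include <stdint.h>

/* Stores value into dest[0..count-1] using rep stosl (x86 only). */
static void fill32(uint32_t *dest, uint32_t value, size_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("cld\n\t"
                          "rep\n\t"
                          "stosl"
                          : "+D" (dest), "+c" (count)  /* read and modified */
                          : "a" (value)                /* only read */
                          : "memory", "cc");           /* writes to memory */
#else
    while (count--)
        *dest++ = value;                               /* fallback for non-x86 */
#endif
}
```

Because *dest* and *count* are the function's own copies, the caller's variables are not disturbed when the instruction advances the registers.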

Here's the list of register loading codes that you'll be likely to use:

a        eax
b        ebx
c        ecx
d        edx
S        esi
D        edi
I        constant value (0 to 31)
q,r      dynamically allocated register (see below)
g        any general register, variable in memory, or constant
A        eax and edx combined into a 64-bit integer (use long longs)

Note that you can't directly refer to the byte registers (*ah*, *al*,
etc.) or the word registers (*ax*, *bx*, etc.) when you're loading this
way. Once you've got it in there, though, you can specify *ax* or
whatever all you like.

The codes /have/ to be in quotes, and the expressions to load in /have/
to be in parentheses.
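
As a quick illustration of the *"I"* code from the list, here's a sketch that shifts left by a constant, assuming an x86 target. The name *shl3* is made up, and the C branch is only a fallback for other machines. The constant has to be known at compile time and lie in 0 to 31, and GCC prints it as an immediate ($3) in the instruction.

```c
/* Shifts x left by three bits (x86 only; plain C elsewhere). */
static unsigned shl3(unsigned x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("shll %2, %0"
             : "=r" (x)            /* %0: result, in a register GCC picks */
             : "0" (x), "I" (3)    /* %1: same register as %0; %2: constant */
             : "cc");              /* shll changes the flags */
#else
    x <<= 3;                       /* fallback for non-x86 targets */
#endif
    return x;
}
```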

When you do the clobber list, you specify the registers as above /with/
the *%*. If you write to a variable, you /must/ include *"memory"* as
one of The Clobbered. This is in case you wrote to a variable that GCC
thought it had in a register. It tells GCC not to trust any memory value
it is holding in a register across your *asm*, so such values get
written back beforehand and reloaded afterwards. While I've never run
into a problem with it, you might also want to add *"cc"* as a clobber
if you change the condition codes (the bits in the *flags* register that
the *jnz*, *je*, etc. instructions look at).
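
A small sketch of the *"memory"* rule, assuming an x86 target. The name *store_zero* is hypothetical, and the C branch is a fallback for other machines. The asm writes through a pointer, which GCC can't see from the operand list alone, so the clobber is what keeps the caller from using a stale copy of the variable.

```c
/* Writes 0 to *p from inside the asm (x86 only; plain C elsewhere). */
static void store_zero(int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("movl $0, (%0)"
                          : /* no outputs */
                          : "r" (p)       /* the address, in any register */
                          : "memory");    /* we wrote to memory behind GCC's back */
#else
    *p = 0;                               /* fallback for non-x86 targets */
#endif
}
```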

Now, that's all fine and good for loading specific registers. But what
if you specify, say, *ebx* and *ecx*, and GCC can't arrange for the
values to be in those registers without having to stash the previous
values? In that case it's possible to let GCC pick the register(s). You
do this:

asm ("leal (%1,%1,4), %0"
     : "=r" (x)
     : "0" (x) );

The above example multiplies x by 5 really quickly (1 cycle on the
Pentium). Now, we could have specified, say *eax*. But unless we really
need a specific register (like when using *rep movsl* or *rep stosl*,
which are hardcoded to use *ecx*, *edi*, and *esi*), why not let GCC
pick an available one? So when GCC generates the output code for GAS, %0
will be replaced by the register it picked.

And where did *"q"* and *"r"* come from? Well, *"q"* causes GCC to
allocate from *eax*, *ebx*, *ecx*, and *edx*. *"r"* lets GCC also
consider *esi* and *edi*. So make sure, if you use *"r"* that it would
be possible to use *esi* or *edi* in that instruction. If not, use *"q"*.

Now, you might wonder how the *%n* tokens get matched up with the
arguments. It's a straightforward first-come-first-served, left-to-right
thing: every operand gets the next number, starting at *%0*, outputs
first and then inputs. But if you want to reuse a register allocated
with a *"q"* or *"r"*, you use *"0"*, *"1"*, *"2"*... etc. as the code
for that argument.

You don't need to put a GCC-allocated register on the clobberlist as GCC
knows that you're messing with it.
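
To make the numbering concrete, here's a sketch with three GCC-allocated registers, assuming an x86 target. The name *scaled_index* is made up, and the C branch is a fallback for other machines. The output is *%0*, and the two inputs are *%1* and *%2* in the order they are written.

```c
/* Computes base + idx*4 with a single lea (x86 only; plain C elsewhere). */
static int scaled_index(int base, int idx)
{
    int out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("leal (%1,%2,4), %0"
             : "=r" (out)           /* %0 */
             : "r" (base),          /* %1 */
               "r" (idx));          /* %2 */
#else
    out = base + idx * 4;           /* fallback for non-x86 targets */
#endif
    return out;
}
```

No clobber list is needed, since *lea* leaves the flags alone and every register involved was handed out by GCC.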

Now for output registers.

asm ("leal (%1,%1,4), %0"
     : "=r" (x_times_5)
     : "r" (x) );

Note the use of *=* to specify an output register. You just have to do
it that way. If you want 1 variable to stay in 1 register for both in
and out, you have to respecify the register allocated to it on the way
in with the *"0"* type codes as mentioned above.

asm ("leal (%0,%0,4), %0"
     : "=r" (x)
     : "0" (x) );

This also works, by the way:

asm ("leal (%%ebx,%%ebx,4), %%ebx"
     : "=b" (x)
     : "b" (x) );

2 things here:

  * Note that we don't have to put *ebx* on the clobberlist, GCC knows
    it goes into x. Therefore, since it can know the value of *ebx*, it
    isn't considered clobbered.
  * Notice that in extended asm, you must prefix registers with *%%*
    instead of just *%*. Why, you ask? Because as GCC parses along for
    %0's and %1's and so on, it would interpret %edx as a %e parameter,
    see that that's non-existent, and ignore it. Then it would bitch
    about finding a symbol named dx, which isn't valid because it's not
    prefixed with *%* and it's not the one you meant anyway.
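
Here's a sketch that mixes both styles, assuming an x86 target. The name *double_via_eax* is hypothetical, and the C branch is a fallback for other machines. The numbered operands use a single *%*, the hard-coded scratch register uses *%%*, and because GCC was not told about *eax* through an operand, it has to go on the clobberlist.

```c
/* Doubles x, using eax as a hand-picked scratch register (x86 only). */
static int double_via_eax(int x)
{
    int out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %%eax\n\t"     /* eax = x     */
             "addl %%eax, %%eax\n\t"  /* eax += eax  */
             "movl %%eax, %0"         /* out = eax   */
             : "=r" (out)
             : "r" (x)
             : "eax", "cc");          /* scratch register and flags */
#else
    out = x + x;                      /* fallback for non-x86 targets */
#endif
    return out;
}
```

With *eax* on the clobberlist, GCC keeps *%0* and *%1* out of that register.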

*Important note:* If your assembly statement /must/ execute where you
put it, (i.e. must not be moved out of a loop as an optimization), put
the keyword *volatile* after *asm* and before the ()'s. To be
ultra-careful, use

__asm__ __volatile__ (...whatever...);

However, I would like to point out that if your assembly's only purpose
is to calculate the output registers, with no other side effects, you
should leave off the *volatile* keyword so your statement can take part
in GCC's common subexpression elimination optimization.


    Some useful examples

#define disable() __asm__ __volatile__ ("cli")

#define enable() __asm__ __volatile__ ("sti")

Of course, *libc* has these defined too.

#define times3(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,2),%0" \
  : "=r" (arg2) \
  : "0" (arg1) )

#define times5(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,4),%0" \
  : "=r" (arg2) \
  : "0" (arg1) )

#define times9(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,8),%0" \
  : "=r" (arg2) \
  : "0" (arg1) )

These multiply arg1 by 3, 5, or 9 and put the result in arg2. You should
be ok to do:

times5(x,x);

as well.

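#define rep_movsl(src, dest, numwords) \
do { \
  const void *s_ = (src); \
  void *d_ = (dest); \
  unsigned long n_ = (numwords); \
  __asm__ __volatile__ ( \
    "cld\n\t" \
    "rep\n\t" \
    "movsl" \
    : "+S" (s_), "+D" (d_), "+c" (n_) /* read and changed by movsl */ \
    : /* no separate inputs */ \
    : "memory", "cc" ); \
} while (0)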

Helpful Hint: If you say *memcpy()* with a constant length parameter,
GCC will inline it to a *rep movsl* like above. But if you need a
variable length version that inlines and you're always moving dwords,
there ya go.

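#define rep_stosl(value, dest, numwords) \
do { \
  void *d_ = (dest); \
  unsigned long n_ = (numwords); \
  __asm__ __volatile__ ( \
    "cld\n\t" \
    "rep\n\t" \
    "stosl" \
    : "+D" (d_), "+c" (n_) /* read and changed by stosl */ \
    : "a" (value) \
    : "memory", "cc" ); \
} while (0)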

Same as above but for *memset()*, which doesn't get inlined no matter
what (for now.)

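#define RDTSC(llptr) ({ \
__asm__ __volatile__ ( \
        ".byte 0x0f; .byte 0x31" /* the rdtsc opcode */ \
        : "=A" (llptr) ); })

Reads the TimeStampCounter on the Pentium and puts the 64 bit result
into llptr, which despite the name has to be a *long long* variable, not
a pointer. There is no clobber list: *"=A"* already tells GCC that *eax*
and *edx* are written, and current GCC rejects a clobber that repeats an
output register. Note that *"A"* means the *edx*:*eax* pair only on
32-bit x86.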
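
On x86-64 the *"A"* code no longer stands for the *edx*:*eax* pair, so a version meant to work on both 32-bit and 64-bit x86 can read the two halves separately. This is a sketch under that assumption. The name *read_tsc* is hypothetical, and the non-x86 branch just returns 0 as a placeholder. The *volatile* matters here: two calls in a row are supposed to give two separate readings, not one shared result.

```c
#include <stdint.h>

/* Returns the time stamp counter (x86 only; 0 on other targets). */
static uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc"
                          : "=a" (lo), "=d" (hi));  /* eax = low, edx = high */
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;                                       /* placeholder: no TSC here */
#endif
}
```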