AT&T Assembly: Brennan's Guide to Inline Assembly


DJGPP is a complete 32-bit C/C++ development system for Intel 80386 (and higher) PCs running DOS. It includes ports of many GNU development utilities. The development tools require an 80386 or newer computer to run, as do the programs they produce. In most cases, the programs it produces can be sold commercially without license or royalties.
HOME: http://www.delorie.com/djgpp/
FROM: http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html

by Brennan "Bas" Underwood

Document version 1.1.2.2

Ok. This is meant to be an introduction to inline assembly under DJGPP. DJGPP is based on GCC, so it uses the AT&T/UNIX syntax and has a somewhat unique method of inline assembly. I spent many hours figuring some of this stuff out and told Info that I hate it, many times.

Hopefully if you already know Intel syntax, the examples will be helpful to you. I've put variable names, register names and other literals in bold type.

The Syntax

So, DJGPP uses the AT&T assembly syntax. What does that mean to you?

  • Register naming:
    Register names are prefixed with "%". To reference eax:
    AT&T:  %eax  Intel: eax  
  • Source/Destination Ordering:
    In AT&T syntax (which is the UNIX standard, BTW) the source is always on the left, and the destination is always on the right.
    So let's load ebx with the value in eax:
    AT&T:  movl %eax, %ebx  Intel: mov ebx, eax  
  • Constant value/immediate value format:
    You must prefix all constant/immediate values with "$".
    Let's load eax with the address of the "C" variable booga, which is static.
    AT&T:  movl $_booga, %eax  Intel: mov eax, _booga  
    Now let's load ebx with 0xd00d:
    AT&T:  movl $0xd00d, %ebx  Intel: mov ebx, 0d00dh  
  • Operator size specification:
    You must suffix the instruction with one of b, w, or l to specify the width of the destination register as a byte, word or longword. If you omit this, GAS (GNU assembler) will attempt to guess. You don't want GAS to guess, and guess wrong! Don't forget it.
    AT&T:  movw %ax, %bx  Intel: mov bx, ax  
    The equivalent forms for Intel are byte ptr, word ptr, and dword ptr, but that is for when you are...
  • Referencing memory:
    DJGPP uses 386-protected mode, so you can forget all that real-mode addressing junk, including the restrictions on which register has what default segment, which registers can be base or index pointers. Now, we just get 6 general purpose registers. (7 if you use ebp, but be sure to restore it yourself or compile with -fomit-frame-pointer.)
    Here is the canonical format for 32-bit addressing:
    AT&T:  immed32(basepointer,indexpointer,indexscale)  Intel: [basepointer + indexpointer*indexscale + immed32]  
    You could think of the formula to calculate the address as:
      immed32 + basepointer + indexpointer * indexscale  
    You don't have to use all those fields, but you do have to have at least one of immed32 or basepointer, and you MUST add the size suffix to the operator!
    Let's see some simple forms of memory addressing:

    • Addressing a particular C variable:
      AT&T:  _booga  Intel: [_booga]  
      Note: the underscore ("_") is how you get at static (global) C variables from assembler. This only works with global variables. Otherwise, you can use extended asm to have variables preloaded into registers for you. I address that farther down.
    • Addressing what a register points to:
      AT&T:  (%eax)  Intel: [eax]  

    • Addressing a variable offset by a value in a register:
      AT&T: _variable(%eax)  Intel: [eax + _variable]  

    • Addressing a value in an array of integers (scaling up by 4):
      AT&T:  _array(,%eax,4)  Intel: [eax*4 + _array]  

    • You can also do offsets with the immediate value:
      C code: *(p+1) where p is a char *  AT&T:  1(%eax) where eax has the value of p  Intel: [eax + 1]  

    • You can do some simple math on the immediate value:
      AT&T: _struct_pointer+8  
      I assume you can do that with Intel format as well.
    • Addressing a particular char in an array of 8-character records:
      eax holds the number of the record desired. ebx has the wanted char's offset within the record.
      AT&T:  _array(%ebx,%eax,8)  Intel: [ebx + eax*8 + _array]  
    Whew. Hopefully that covers all the addressing you'll need to do. As a note, you can put esp into the address, but only as the base register.
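
The address formula above can be checked in plain C. The following is only a sketch: effective_address, records and record_char are hypothetical names made up for illustration (they are not part of DJGPP or GAS), and the code just mirrors the arithmetic the CPU does for immed32(basepointer,indexpointer,indexscale), using the 8-character record case.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: computes the address that
   immed32(basepointer,indexpointer,indexscale) refers to. */
static uintptr_t effective_address(uintptr_t immed32, uintptr_t basepointer,
                                   uintptr_t indexpointer, unsigned indexscale)
{
    /* On the 386 the scale can only be 1, 2, 4 or 8. */
    return immed32 + basepointer + indexpointer * indexscale;
}

/* An array of 8-character records, as in the last addressing example. */
static char records[4][8];

/* Address of char number `offset` inside record number `record`, i.e. what
   _records(%ebx,%eax,8) would address with ebx = offset and eax = record. */
static char *record_char(unsigned record, unsigned offset)
{
    return (char *)effective_address((uintptr_t)records, offset, record, 8);
}
```

So record_char(2, 3) should point at the same byte as &records[2][3], which is what the assembler form reaches in a single instruction.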

Basic inline assembly

The format for basic inline assembly is very simple, and much like Borland's method.

asm ("statements");  

Pretty simple, no? So

asm ("nop");  

will do nothing of course, and

asm ("cli");  

will stop interrupts, with

asm ("sti");  

of course enabling them. You can use __asm__ instead of asm if the keyword asm conflicts with something in your program.

When it comes to simple stuff like this, basic inline assembly is fine. You can even push your registers onto the stack, use them, and put them back.

asm ("pushl %eax\n\t"
     "movl $0, %eax\n\t"
     "popl %eax");

(The \n's and \t's are there so the .s file that GCC generates and hands to GAS comes out right when you've got multiple statements per asm.)
It's really meant for issuing instructions that have no equivalent in C and don't touch the registers.

But if you do touch the registers, and don't fix things at the end of your asm statement, like so:

asm ("movl %eax, %ebx");
asm ("xorl %ebx, %edx");
asm ("movl $0, _booga");

then your program will probably blow things to hell. This is because GCC hasn't been told that your asm statement clobbered ebx and edx and booga, which it might have been keeping in a register, and might plan on using later. For that, you need:

Extended inline assembly

The basic format of the inline assembly stays much the same, but now gets Watcom-like extensions to allow input arguments and output arguments.

Here is the basic format:

asm ( "statements" : output_registers : input_registers : clobbered_registers);  

Let's just jump straight to a nifty example, which I'll then explain:

asm ("cld\n\t"
     "rep\n\t"
     "stosl"
     : /* no output registers */
     : "c" (count), "a" (fill_value), "D" (dest)
     : "%ecx", "%edi" );

The above stores the value in fill_value count times to the pointer dest.

Let's look at this bit by bit.

asm ("cld\n\t"  

We are clearing the direction bit of the flags register. You never know what this is going to be left at, and it costs you all of 1 or 2 cycles.

     "rep\n\t"
     "stosl"

Notice that GAS requires the rep prefix to occupy a line of its own. Notice also that stos has the l suffix to make it move longwords.

     : /* no output registers */  

Well, there aren't any in this function.

     : "c" (count), "a" (fill_value), "D" (dest)  

Here we load ecx with count, eax with fill_value, and edi with dest. Why make GCC do it instead of doing it ourselves? Because GCC, in its register allocating, might be able to arrange for, say, fill_value to already be in eax. If this is in a loop, it might be able to preserve eax through the loop, and save a movl once per loop.

     : "%ecx", "%edi" );  

And here's where we specify to GCC, "you can no longer count on the values you loaded into ecx or edi to be valid." This doesn't mean they will be reloaded for certain. This is the clobberlist.

Seem funky? Well, it really helps when optimizing, when GCC can know exactly what you're doing with the registers before and after. It folds your assembly code into the code it generates (whose rules for generation look remarkably like the above) and then optimizes. It's even smart enough to know that if you tell it to put (x+1) in a register, then if you don't clobber it, and later C code refers to (x+1), and it was able to keep that register free, it will reuse the computation. Whew.
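
One caveat for anyone trying the fill example with a recent compiler: current GCC releases reject a clobber that names a register already used as an input operand. Here is a sketch of how the same fill could be written there, assuming an x86 target and a current GCC or Clang. The wrapper name fill_longs is hypothetical, and the plain C branch is only a fallback so the sketch builds on other CPUs. Registers the instruction modifies are declared as read-write operands with "+" instead of being put on the clobberlist.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: store fill_value into dest[0] .. dest[count-1]. */
static void fill_longs(uint32_t *dest, uint32_t fill_value, size_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ (
        "cld\n\t"
        "rep\n\t"
        "stosl"
        : "+c" (count), "+D" (dest)   /* count and dest registers are read and modified */
        : "a" (fill_value)            /* eax is only read */
        : "memory", "cc");            /* writes through dest; cld touches the flags */
#else
    /* Portable fallback so the sketch still builds on non-x86 targets. */
    while (count--)
        *dest++ = fill_value;
#endif
}
```

Calling fill_longs(buf, 0xd00d, 4) on a five-element array should set the first four elements and leave the fifth alone.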

Here's the list of register loading codes that you'll be likely to use:

a        eax
b        ebx
c        ecx
d        edx
S        esi
D        edi
I        constant value (0 to 31)
q,r      dynamically allocated register (see below)
g        eax, ebx, ecx, edx or variable in memory
A        eax and edx combined into a 64-bit integer (use long longs)

Note that you can't directly refer to the byte registers (ah, al, etc.) or the word registers (ax, bx, etc.) when you're loading this way. Once you've got it in there, though, you can specify ax or whatever all you like.

The codes have to be in quotes, and the expressions to load in have to be in parentheses.

When you do the clobber list, you specify the registers as above with the %. If you write to a variable, you must include "memory" as one of The Clobbered. This is in case you wrote to a variable that GCC thought it had in a register. This is the same as clobbering all registers. While I've never run into a problem with it, you might also want to add "cc" as a clobber if you change the condition codes (the bits in the flags register the jnz, je, etc. operators look at.)
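
To make the "memory" and "cc" clobbers concrete, here is a minimal sketch, assuming an x86 target. The names increment_through and bump_twice are hypothetical, and the plain C branch is just a fallback for other CPUs. The asm writes to a variable through a pointer and changes the condition codes, so both clobbers are listed.

```c
/* Hypothetical helper: add one to the int that p points to. */
static void increment_through(int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ (
        "incl (%0)"
        : /* no output registers */
        : "r" (p)            /* GCC picks the register that holds the pointer */
        : "memory", "cc");   /* we write to *p, and incl changes the flags */
#else
    ++*p;                    /* portable fallback for non-x86 targets */
#endif
}

/* Because of the "memory" clobber, GCC may not assume counter still holds
   the value it had before each asm statement ran. */
static int bump_twice(int start)
{
    int counter = start;
    increment_through(&counter);
    increment_through(&counter);
    return counter;
}
```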

Now, that's all fine and good for loading specific registers. But what if you specify, say, ebx and ecx, and GCC can't arrange for the values to be in those registers without having to stash the previous values? It's possible to let GCC pick the register(s). You do this:

asm ("leal (%1,%1,4), %0"
     : "=r" (x)
     : "0" (x) );

The above example multiplies x by 5 really quickly (1 cycle on the Pentium). Now, we could have specified, say, eax. But unless we really need a specific register (like when using rep movsl or rep stosl, which are hardcoded to use ecx, edi, and esi), why not let GCC pick an available one? So when GCC generates the output code for GAS, %0 will be replaced by the register it picked.

And where did "q" and "r" come from? Well, "q" causes GCC to allocate from eax, ebx, ecx, and edx. "r" lets GCC also consider esi and edi. So make sure, if you use "r", that it would be possible to use esi or edi in that instruction. If not, use "q".

Now, you might wonder how to determine how the %n tokens get allocated to the arguments. It's a straightforward first-come-first-served, left-to-right thing, mapping to the "q"'s and "r"'s. But if you want to reuse a register allocated with a "q" or "r", you use "0", "1", "2"... etc.

You don't need to put a GCC-allocated register on the clobberlist as GCC knows that you're messing with it.
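
Here is a small sketch of how the %n numbering falls out, assuming an x86 target. The function name add_ints is hypothetical, and the plain C branch is only a fallback for other CPUs. Operands are numbered left to right, outputs first, and the "0" code ties an input to the register GCC picked for %0.

```c
/* Hypothetical helper: returns a + b using a single addl instruction. */
static int add_ints(int a, int b)
{
    int sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "addl %2, %0"
        : "=r" (sum)   /* %0: output, register picked by GCC */
        : "0" (a),     /* %1: input, placed in the same register as %0 */
          "r" (b)      /* %2: input, another register picked by GCC */
        : "cc");       /* addl changes the condition codes */
#else
    sum = a + b;       /* portable fallback for non-x86 targets */
#endif
    return sum;
}
```

None of the GCC-allocated registers appear on the clobberlist; only "cc" is listed, because the instruction changes the flags.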

Now for output registers.

asm ("leal (%1,%1,4), %0"
     : "=r" (x_times_5)
     : "r" (x) );

Note the use of = to specify an output register. You just have to do it that way. If you want 1 variable to stay in 1 register for both in and out, you have to respecify the register allocated to it on the way in with the "0" type codes as mentioned above.

asm ("leal (%0,%0,4), %0"
     : "=r" (x)
     : "0" (x) );

This also works, by the way:

asm ("leal (%%ebx,%%ebx,4), %%ebx"
     : "=b" (x)
     : "b" (x) );

2 things here:

  • Note that we don't have to put ebx on the clobberlist, GCC knows it goes into x. Therefore, since it can know the value of ebx, it isn't considered clobbered.
  • Notice that in extended asm, you must prefix registers with %% instead of just %. Why, you ask? Because as GCC parses along for %0's and %1's and so on, it would interpret %edx as a %e parameter, see that that's non-existent, and ignore it. Then it would bitch about finding a symbol named dx, which isn't valid because it's not prefixed with % and it's not the one you meant anyway.

Important note: If your assembly statement must execute where you put it (i.e. must not be moved out of a loop as an optimization), put the keyword volatile after asm and before the ()'s. To be ultra-careful, use

__asm__ __volatile__ (...whatever...);  

However, I would like to point out that if your assembly's only purpose is to calculate the output registers, with no other side effects, you should leave off the volatile keyword so your statement will be processed into GCC's common subexpression elimination optimization.

Some useful examples

#define disable() __asm__ __volatile__ ("cli");

#define enable() __asm__ __volatile__ ("sti");

Of course, libc has these defined too.

#define times3(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,2),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

#define times5(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,4),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

#define times9(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,8),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

These multiply arg1 by 3, 5, or 9 and put them in arg2. You should be ok to do:

times5(x,x);  

as well.

#define rep_movsl(src, dest, numwords) \
__asm__ __volatile__ ( \
  "cld\n\t" \
  "rep\n\t" \
  "movsl" \
  : : "S" (src), "D" (dest), "c" (numwords) \
  : "%ecx", "%esi", "%edi" )

Helpful Hint: If you say memcpy() with a constant length parameter, GCC will inline it to a rep movsl like above. But if you need a variable length version that inlines and you're always moving dwords, there ya go.

#define rep_stosl(value, dest, numwords) \
__asm__ __volatile__ ( \
  "cld\n\t" \
  "rep\n\t" \
  "stosl" \
  : : "a" (value), "D" (dest), "c" (numwords) \
  : "%ecx", "%edi" )

Same as above but for memset(), which doesn't get inlined no matter what (for now.)

#define RDTSC(llptr) ({ \
__asm__ __volatile__ ( \
        ".byte 0x0f; .byte 0x31" \
        : "=A" (llptr) \
        : : "eax", "edx"); })

Reads the TimeStampCounter on the Pentium and puts the 64 bit result into llptr.
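
For readers on a current toolchain, here is a hedged sketch of the same idea written as a function. It assumes an x86 target and an assembler that knows the rdtsc mnemonic; the name read_tsc is hypothetical, and the non-x86 branch simply returns 0. It uses separate "=a" and "=d" outputs rather than "=A", since as far as I know "A" does not denote the edx:eax pair when compiling for x86-64, and it lists no clobbers because current GCC rejects clobbers that overlap the outputs.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 64-bit time stamp counter. */
static uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t low, high;
    __asm__ __volatile__ (
        "rdtsc"
        : "=a" (low), "=d" (high));   /* rdtsc leaves its result in edx:eax */
    return ((uint64_t)high << 32) | low;
#else
    return 0;   /* no time stamp counter available on this target */
#endif
}
```

The volatile keyword matters here: without it GCC would be free to treat two calls as the same computation and merge them.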

