Introduction to SPARC Assembly Instructions


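SPARC is a CPU instruction set architecture (ISA). It was designed to be a good target for optimizing compilers and to allow easily pipelined hardware implementations.

The SPARC instruction set has the following main features:

    1. A linear 32-bit address space

    2. Few and simple instruction formats

All instructions are 32 bits wide and are aligned on 32-bit boundaries. There are only three basic instruction formats, and they place the opcode and register address fields uniformly. Note in particular that only load and store instructions can access memory and I/O.

    3. Few addressing modes, far fewer than x86 offers

A memory address is either "register+register" or "register+immediate".

    4. Triadic register addresses

Most instructions operate on two register operands and place the result in a third register.

For example: add %r1, %r2, %r3 ! %r1 + %r2 -> %r3

    5. A large "windowed" register file: at any one instant, a program sees 8 global integer registers plus a 24-register window into a larger register file. The windowed registers can be described as a cache of procedure arguments, local values, and return addresses. (The "register file" here is simply the full set of registers; "registers" is the everyday short name for it. Thanks to jamesr of CU for pointing this out.)

    6. A separate floating-point register file

Software can configure it as 32 single-precision (32-bit), 16 double-precision (64-bit), or 8 quad-precision (128-bit) registers, or as a mixture of these.

    7. Delayed control transfer: the processor always fetches the next instruction after a delayed control-transfer instruction. Whether that instruction is executed depends on the "annul" bit of the control-transfer instruction.

    8. Multiprocessor synchronization instructions: one instruction performs an atomic "read, then set memory" operation; another performs an atomic "exchange register with memory" operation.

    9. Coprocessor (the chip I use has none, so its instructions are not covered here)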


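SPARC Register Sets

Registers are storage cells located on the CPU chip. At any moment a SPARC program sees 32 general-purpose integer registers and 32 floating-point registers. Before going further, I strongly recommend Peter Magnusson's introductory article "Understanding stacks and registers in the Sparc architecture(s)".

1. General-purpose integer registers

    The 32 general-purpose integer registers are named %r0, %r1, %r2, ..., %r31. (Because they serve different purposes in a program, they also have aliases; Peter's article explains them.)

2. Floating-point registers

    These are %f0 .. %f31, normally used for real numbers. They can be combined in pairs or in groups of four to hold wider (double- or quad-precision) values.

3. Special-purpose registers

    %psr Processor State Register

    %wim Window Invalid Mask Register

    %tbr Trap Base Register

    %y Y register

    The %y register is used in multiplication and division. In a division it holds the most significant 32 bits of the dividend; in a multiplication it receives the most significant 32 bits of the product.

    %fsr Floating-Point State Register

    %csr Coprocessor State Register

    %fq Floating-Point Queue

    %cq Coprocessor Queue

    %hi, %lo Assembler unary operators that extract the high 22 bits and the low 10 bits of their operand (normally used together with the sethi instruction).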


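Instruction Classes and Instruction Format

1. Instruction classes

SPARC instructions fall into six groups: load/store instructions, integer arithmetic instructions, control-transfer instructions (CTI), read/write control register instructions, floating-point operate instructions, and coprocessor operate instructions.

2. Instruction format

The basic assembly format of a SPARC instruction is:

Opcode {rs1, rs2(imm), rd}

The number of operands depends on the instruction and ranges from 0 to 3. nop, for example, takes no operands.

The symbols used in the basic format mean the following:

Opcode: the operation code, i.e. the instruction mnemonic, such as LD, ST, ADD

Rs1: source operand register 1

Rs2: source operand register 2

Imm: immediate value

Rd: destination register

The meaning of "{ }" and "( )" in the basic format:

"{ }" encloses optional items; {rs1, rs2, rd} means that an instruction may have no operands or 1 to 3 operands.

"( )" denotes an alternative; "rs2(imm)" means the operand is either a register or an immediate.

3. Format examples

    add %l2, %l3, %g3   ! add local registers %l2 and %l3; the result goes to global register %g3

    ld [%g3], %fsr      ! load the word at the address held in %g3 into the floating-point state register %fsr

    st %fsr, [%o2]      ! store %fsr to the memory word addressed by output register %o2

    subcc %g0, %o2, %g0 ! compute %g0 - %o2, discard the result, and set the condition codes

    bne,a s1            ! branch with the annul bit set: if %o2 is nonzero, jump to label s1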


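Well, after that dry introduction, here comes something even drier.

SPARC Addressing Modes

    An addressing mode is the way the real operand is located from the address fields of the instruction encoding. The addressing modes supported by SPARC are simple: a memory address is given either as "register+register" or as "register+immediate". The basic addressing modes are as follows.

1. Immediate addressing

    Immediate addressing is a special mode in which the operand is given directly by the instruction. The data is contained in the 32-bit instruction encoding, so fetching the instruction also delivers the operand. For example:

add %o1, 1, %o1 ! %o1 ← %o1 + 1

and %o2, 0x0f, %o3 ! %o3 ← %o2 AND 0x0f

    Note that in SPARC only the second source operand can be an immediate. How does the processor recognize an immediate? The answer lies in the instruction encoding (see the V8 manual). In the instruction formats with two source operands, a one-bit "i" field selects the second ALU operand for (integer) arithmetic and load/store instructions. If i = 0, the operand is register r[rs2]; if i = 1, the operand is the signed immediate simm13, sign-extended from 13 to 32 bits.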
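To make the role of the i bit concrete, here is a rough Python sketch that packs and unpacks the format-3 fields (op, rd, op3, rs1, i, simm13 or rs2) as laid out in the V8 manual. Python is used only to model the bit layout. The helper names `encode_format3` and `decode_operand2` are made up for this illustration, and only the integer ALU case (op = 2) is modeled.

```python
def sign_extend_13(value):
    """Sign-extend a 13-bit field to a Python int."""
    value &= 0x1FFF
    return value - 0x2000 if value & 0x1000 else value


def encode_format3(op3, rd, rs1, operand2, immediate):
    """Pack a format-3 ALU instruction (op = 2) into a 32-bit word.

    operand2 is a register number when immediate is False,
    or a signed 13-bit constant when immediate is True.
    """
    word = (2 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
    if immediate:
        if not -4096 <= operand2 <= 4095:
            raise ValueError("constant does not fit in simm13")
        word |= (1 << 13) | (operand2 & 0x1FFF)   # i = 1, simm13 field
    else:
        word |= operand2 & 0x1F                   # i = 0, rs2 field
    return word


def decode_operand2(word):
    """Return ('imm', value) or ('reg', number) according to the i bit."""
    if (word >> 13) & 1:
        return ("imm", sign_extend_13(word))
    return ("reg", word & 0x1F)


ADD_OP3 = 0x00  # op3 value of ADD in the V8 opcode table

# add %o1, 1, %o1   (%o1 is r9)
word = encode_format3(ADD_OP3, rd=9, rs1=9, operand2=1, immediate=True)
print(hex(word))
print(decode_operand2(word))
```

The same two helpers show why a constant outside -4096..4095 cannot be written as the second operand: it simply does not fit in the 13-bit field.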

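2. Immediate value ranges

    The range of an immediate depends on the instruction type and on the addressing mode in which it is used.

    imm7: an immediate in the range -64 to 127 (7 bits, signed or unsigned). Used in the offset addressing of trap instructions.

    uimm7: an immediate in the range 0 to 127 (7 bits, unsigned). Used in offset addressing.

    simm13: an immediate in the range -4096 to 4095 (13 bits, signed). When i = 1, it is the second ALU operand of an arithmetic or load/store instruction.

    const22: a 22-bit constant.

    asi: address space identifier, an immediate constant in the range 0 to 255 (8 bits, unsigned).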

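3. Register addressing

    Besides immediate addressing, the other common mode is register direct addressing. The value in a register is the operand, and the address field of the instruction holds the register number. For example:

    add %l0, %l1, %l2 ! %l2 ← %l0 + %l1

    This instruction adds the contents of two registers (%l0 and %l1) and puts the result in a third register, %l2. Mind the operand order: the first source register comes first, then the second operand register, and finally the result register.

4. Register indirect addressing

    Register indirect addressing uses the value of a register as a memory address: the named register holds the effective address, and the operand lives in memory. The register is written inside "[ ]" to denote "the contents at this address".

    For example:

        ld [%o2], %f0 ! %f0 ← mem[%o2]

        st %fsr, [%o0] ! mem[%o0] ← %fsr

    The first instruction loads the memory word addressed by %o2 into register %f0. The second stores the contents of %fsr to the memory word addressed by %o0.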

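5. Base plus offset addressing

    Base plus offset addressing, also called indexed addressing, adds the offset given in the instruction to the contents of a base register to form the effective memory address. It is used to access memory near the base address. Register indirect addressing is really base plus offset addressing with an offset of 0. This mode executes efficiently and, used well, yields short but powerful assembly programs.

    The offset added to the base is a signed immediate of at most 13 bits (simm13).

    For example:

        ld [%g1 + 96], %o1 ! %o1 ← mem[%g1 + 96]

    This instruction adds the displacement 96 to the base held in %g1 and loads the memory word at that address into %o1.

6. Address forms

    In the examples so far, the offset from the base register has always been an immediate. It can equally be another register; SPARC applies no shift or scaling to it.

    The address inside the brackets can be given in any of the following forms:

reg1            (equivalent to reg1 + %g0)
reg1 + reg2     register + register
reg1 + simm13   register + 13-bit immediate
reg1 - simm13   register - 13-bit immediate
simm13          (equivalent to %g0 + simm13)
simm13 + reg1   (equivalent to reg1 + simm13)

For example:

ld [%o0 + %i0], %o2 ! %o2 ← mem[%o0 + %i0]

ld [%l2 - 0x40], %o2 ! %o2 ← mem[%l2 - 0x40]

ld [0x40 + %l2], %o2 ! %o2 ← mem[0x40 + %l2]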
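The address forms above all reduce to "base register plus either a register or a simm13". A small Python model of that rule might look like the following; the function name `effective_address` and the register values are invented for the example.

```python
MASK32 = 0xFFFFFFFF


def effective_address(regs, base, index=None, offset=0):
    """Compute a load/store address as base + index register or base + simm13.

    regs maps register names to 32-bit values; %g0 always reads as zero.
    """
    def read(name):
        return 0 if name == "%g0" else regs[name] & MASK32

    if index is not None and offset != 0:
        raise ValueError("SPARC adds either a register or an immediate, not both")
    if not -4096 <= offset <= 4095:
        raise ValueError("offset does not fit in simm13")
    second = read(index) if index is not None else offset
    return (read(base) + second) & MASK32


regs = {"%o0": 0x1000, "%i0": 0x20, "%l2": 0x2000}    # made-up contents
print(hex(effective_address(regs, "%o0", index="%i0")))   # [%o0 + %i0]
print(hex(effective_address(regs, "%l2", offset=-0x40)))  # [%l2 - 0x40]
print(hex(effective_address(regs, "%g0", offset=0x40)))   # [0x40]
```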

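7. Relative addressing

    Relative addressing can be seen as indexed addressing with the program counter (PC) as the base. The offset gives the position of the target relative to the current instruction; adding it to the base supplied by PC yields the effective target address.

For example:

add %g1, %g2, %g1

ba SUM3 ! branch to SUM3

sub %g3, 1, %g3 ! delay slot: still executed before the branch takes effect

SUM3: ! branch target

subcc %g3, 0, %g0


    Once you have thought these through, they are easy to remember. Simpler than x86, isn't it? Now on to the main topic of this article.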

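SPARC Instructions in Detail

SPARC instructions are divided into the following six classes:

    • Arithmetic/logic/shift instructions

    • LOAD/STORE instructions

    • Control-transfer instructions

    • Read/write special register instructions

    • Floating-point instructions

    • Coprocessor instructions

    This description is based on SPARC V8; for instructions added later, see the latest SPARC manual. There are a great many instructions, so I only cover the common ones. For the rarely used ones, consult the V8 manual yourself.

Data processing instructions (arithmetic/logic/shift)

SPARC data processing instructions perform arithmetic and logic operations on data in registers. The basic rules are:

  • All operands are 32 bits wide and come either from registers or from an immediate defined in the instruction (sign- or zero-extended).

  • Arithmetic and logic instructions take three operands. In most cases the first and the last must be registers, while the second can be either a register or a 13-bit signed immediate.

  • If the operation produces a result, it is 32 bits wide and goes into one register (with one exception: multiply instructions produce a 64-bit result).

  • The %y register is used in multiplication and division. In a division it holds the most significant 32 bits of the dividend; in a multiplication it receives the most significant 32 bits of the product. Programmers should appreciate %y: it lets you inspect the product after a multiplication and set up the dividend before a division.

Assembly format

    Depending on the type of the second operand, there are two assembly formats:

    Opcode %reg, %reg, %reg

    Opcode %reg, const, %reg


    Each operation comes as two instructions, depending on whether it sets the condition codes: one affects the condition code bits in %psr (the variant with the cc suffix) and the other does not. In addition, multiplication and division exist in signed and unsigned variants (distinguished by an S or U prefix).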

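      1. Add and subtract instructions: ADD, ADDcc, ADDX, ADDXcc, SUB, SUBcc, SUBX, SUBXcc

Usage:

    These are the most basic arithmetic instructions. ADD and SUB do not distinguish signed from unsigned operands; they simply perform the operation. If signedness matters, SPARC provides condition codes: append cc to the mnemonic, as in ADDcc and SUBcc, and the result of the operation updates the icc condition code bits in the PSR.

More on condition codes:

    Throughout the SPARC instruction set, a lowercase cc suffix on a mnemonic means that the instruction updates the condition code bits of the corresponding state register. There are three sets of condition codes (icc, fcc, ccc), held in the PSR, FSR, and CSR respectively, and the SPARC branch instructions test these bits to decide whether to branch.

    For example, adding the cc suffix to the add instruction ADD gives ADDcc, "add and set condition codes": after the addition, the icc bits of the PSR are set accordingly.

Condition codes:

    icc: integer condition codes.

    The icc bits are part of the %psr register. They are set while certain instructions execute and are tested later by instructions that act on them.

N: (Negative) set to 1 if the most significant bit of the result is 1

Z: (Zero) set to 1 if the result is 0

C: (Carry) set to 1 on carry

V: (Overflow) set to 1 on overflow


    fcc: floating-point condition codes.

    fcc occupies only 2 bits and takes the values 0 to 3:

E  0: if fregrs1 = fregrs2, fcc is 0
L  1: if fregrs1 < fregrs2, fcc is 1
G  2: if fregrs1 > fregrs2, fcc is 2
U  3: if fregrs1 ? fregrs2 (unordered: one or both operands are NaN), fcc is 3

    ccc: coprocessor condition codes.

    A field of the %csr register.

Notes:

    (1) ADDcc and ADDXcc modify icc. If the two operands have the same sign and the sign of the sum differs from it, an overflow occurs. For subtraction, if the two operands have different signs and the sign of the difference differs from that of the first operand, an overflow occurs.

    (2) ADDX is the extended add instruction; the x stands for "extended" and means that the carry bit of the PSR is added as well, i.e. "Op1 + Op2 + C -> Op3".

    An example of extended addition:

    mov -1, %l6

    addcc %l6, 1, %g0 ! %g0 is always zero, so the result is discarded and only the condition codes (here the carry) change

    addx %g0, %g0, %l7 ! 0 + 0 + carry -> %l7

    cmp %l7, 1

    bne ERR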
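The overflow and carry rules in the notes can be checked with a small Python model. This is only a sketch of the rules as stated above; the function names `addcc`, `subcc`, and `addx` mirror the mnemonics but are ordinary Python helpers, and treating carry as "borrow" for subtraction is how the V8 manual defines it.

```python
MASK32 = 0xFFFFFFFF


def addcc(a, b, carry_in=0):
    """32-bit add that also returns the icc bits N, Z, V, C."""
    a &= MASK32
    b &= MASK32
    total = a + b + carry_in
    result = total & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    flags = {
        "N": sign_r,
        "Z": int(result == 0),
        # overflow: operands share a sign and the result's sign differs
        "V": int(sign_a == sign_b and sign_r != sign_a),
        "C": total >> 32,
    }
    return result, flags


def subcc(a, b):
    """32-bit subtract that also returns the icc bits N, Z, V, C."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    flags = {
        "N": sign_r,
        "Z": int(result == 0),
        # overflow: operand signs differ and the result's sign differs from a
        "V": int(sign_a != sign_b and sign_r != sign_a),
        "C": int(a < b),  # carry acts as borrow on subtraction
    }
    return result, flags


def addx(a, b, carry):
    """Extended add: a + b + carry, without touching the condition codes."""
    return (a + b + carry) & MASK32


# The example above: mov -1, %l6 ; addcc %l6, 1, %g0 ; addx %g0, %g0, %l7
l6 = -1 & MASK32
_, icc = addcc(l6, 1)
l7 = addx(0, 0, icc["C"])
print(icc, l7)
```

Running the example gives a carry of 1 and therefore %l7 = 1, so the `bne ERR` in the listing is not taken.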


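2. Using SUBcc

    SUBcc with %g0 as the destination (rd = 0) is commonly used to compare signed or unsigned integers. A compare instruction (CMP) ultimately performs a subtraction on any architecture, and a zero result means the two operands are equal. On SPARC, cmp is a synthetic instruction for exactly this subcc.

3. Multiply and divide instructions: SMUL, SDIV, UMUL, UDIV

    The SPARC multiply instructions multiply two 32-bit source operands into a 64-bit product, in signed and unsigned variants. A signed operation treats its source operands as signed and produces a signed result; an unsigned operation treats its source operands as unsigned and produces an unsigned result.

    A multiply places the most significant 32 bits of the product in the %y register and the least significant 32 bits in the destination register. A divide takes a 64-bit dividend: its most significant 32 bits come from %y and its least significant 32 bits from the first source register. The 32-bit quotient goes to the destination register.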
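How %y and the destination register split a 64-bit value can be sketched in Python as follows. The functions return plain tuples instead of writing real registers, and `udiv` raises an error when the quotient does not fit in 32 bits, because this sketch does not try to model what the hardware returns on overflow.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def umul(rs1, rs2):
    """Unsigned 32x32 multiply; returns (y, rd) = (high word, low word)."""
    product = (rs1 & MASK32) * (rs2 & MASK32)
    return product >> 32, product & MASK32


def smul(rs1, rs2):
    """Signed 32x32 multiply; returns (y, rd) = (high word, low word)."""
    product = (to_signed(rs1, 32) * to_signed(rs2, 32)) & MASK64
    return product >> 32, product & MASK32


def udiv(y, rs1, rs2):
    """Unsigned divide of the 64-bit dividend y:rs1 by rs2; returns the quotient."""
    dividend = ((y & MASK32) << 32) | (rs1 & MASK32)
    quotient = dividend // (rs2 & MASK32)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in 32 bits (not modeled here)")
    return quotient


print([hex(v) for v in umul(0xFFFFFFFF, 2)])   # unsigned: 4294967295 * 2
print([hex(v) for v in smul(0xFFFFFFFF, 2)])   # signed: -1 * 2
print(hex(udiv(1, 0xFFFFFFFE, 2)))             # (y:rs1) = 0x1FFFFFFFE
```

The same bit pattern 0xFFFFFFFF gives different high words in %y for umul and smul, which is the practical difference between the two variants.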

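Multiply and divide instructions that update the condition codes

    Multiply and divide instructions whose names end in cc affect the condition codes. The multiply instructions (smulcc, umulcc) always clear the V (overflow) and C (carry) bits, and they update the N (negative) and Z (zero) bits. Although a multiply produces a 64-bit result, N and Z are set from the low 32 bits of the result. Like the multiplies, the divide instructions (sdivcc, udivcc) clear the C (carry) bit. They update the N, V, and Z bits according to the 32-bit result.

Reading and setting the %y register

    You will often need to inspect or set the %y register, particularly after a multiplication or before a division (to set up the dividend). You can use the synthetic MOV instruction to copy the contents of %y out, and you can also use MOV to set it.

For example:

Copying to and from %y

mov %y, rd ! reg[rd] = %y

mov rs, %y ! %y = reg[rs]

Setting %y

mov const, %y ! %y = const

Note:

    When you write a value to %y (with MOV), the update can take up to three instruction cycles to take effect. This means that between the instruction that writes %y and the instruction that uses the value of %y, you must make sure there are at least three other instructions.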


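4. Logic instructions

    AND, OR, and XOR perform the bitwise "and", "or", and "exclusive or" operations. AND is often used to extract the value of certain bits of a register, and OR is often used to set certain bits to 1. In ANDN, ORN, and XNOR, the "N" means that the second operand is complemented before the main operation (and, or, xor) is carried out. ANDN is usually used to clear certain bits of a register to 0. There is also a complement instruction, NOT.

5. Shift instructions: SLL, SRL, SRA

    Assembly languages generally distinguish logical shifts from arithmetic shifts. A logical shift moves all bits of the register in one direction and fills the vacated positions with 0. An arithmetic right shift preserves the sign of the data: the vacated high-order positions are filled with copies of the sign bit. Note that shifts do not affect the condition codes.

    For example:

    srl rs1, rs2, rd ! reg[rd] = reg[rs1] >> reg[rs2], vacated bits filled with 0

    sra rs1, rs2, rd ! reg[rd] = reg[rs1] >> reg[rs2], vacated bits filled with bit 31 of reg[rs1]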
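The difference between the logical and the arithmetic right shift is easiest to see on a value whose sign bit is set. The Python helpers below model the three shifts on 32-bit values; the shift count is reduced to its low 5 bits, as the V8 manual specifies.

```python
MASK32 = 0xFFFFFFFF


def sll(value, count):
    """Shift left logical: zeros enter from the right."""
    return (value << (count & 31)) & MASK32


def srl(value, count):
    """Shift right logical: zeros enter from the left."""
    return (value & MASK32) >> (count & 31)


def sra(value, count):
    """Shift right arithmetic: copies of bit 31 enter from the left."""
    value &= MASK32
    signed = value - (1 << 32) if value >> 31 else value
    return (signed >> (count & 31)) & MASK32


x = 0x80000000
print(hex(srl(x, 4)))  # logical: sign bit is lost
print(hex(sra(x, 4)))  # arithmetic: sign bit is replicated
print(hex(sll(1, 31)))
```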


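load/store (memory) instructions

    SPARC is a load/store architecture: data is loaded from memory into on-chip registers, processed there, and the results are stored back to memory from registers, which avoids operating directly on slower off-chip memory. The SPARC Load/Store instructions are the only instructions that move data between registers and memory. (So do not write an instruction like: ld 5, %l4)

    Load: loads data from memory into an on-chip register.

    Store: stores data from a register into memory.

Description:

    The load/store instructions support byte, halfword (16-bit), word (32-bit), and doubleword (64-bit) accesses. Some versions of the integer load instructions sign-extend 8-bit and 16-bit values as they are loaded into the destination register. The floating-point and coprocessor load/store instructions support word and doubleword memory accesses.

Address space identifier

Normal load/store instructions supply an address space identifier that depends on whether the processor is in user or supervisor mode. The privileged load-from-alternate-space and store-to-alternate-space instructions, however, supply the address space identifier directly, taken from the asi field of the instruction.


The basic load/store instructions are:

LDUB  load unsigned byte (zero-filled)

LDSB  load signed byte (sign-extended)

LDUH  load unsigned halfword (zero-filled)

LDSH  load signed halfword (sign-extended)

LD    load word (4 bytes, 32 bits)

LDD   load doubleword


STB   store byte

STH   store halfword

ST    store word

STD   store doubleword

SWAP  swap

Compared with the ARM instruction set, SPARC has no load/store-multiple instructions, but its doubleword instructions (LDD, STD) play a similar role and can also be used to move blocks of data. They are typically used on procedure entry and exit, to save and restore working registers, and to copy a block of memory.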

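The main data transfer instructions are described below.

  1. Word transfer instructions (LD, ST)

    These are the simplest data transfer instructions: there is no sign to worry about, and data is moved at the machine's word width. LD loads a 32-bit word from memory into a register; ST stores the 32-bit word in a register to memory.

ld [some_addr], %r10

st %r10, [some_addr]


2. Doubleword transfer instructions (LDD, STD)

    These instructions use a register pair to hold the doubleword, and the register named must be even-numbered. LDD loads a 64-bit doubleword from memory into a register pair: the high word (bits 63 to 32) goes into the even register, and the low word (at memory address + 4) goes into the odd register that follows it. STD stores the 64-bit doubleword held in a register pair to memory.

ldd [some_addr], %r10 ! mem[some_addr] -> %r10, mem[some_addr + 4] -> %r11

std %r10, [some_addr]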
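The even/odd register pairing follows from SPARC V8 being big-endian: the word at the lower address is the high half of the doubleword. The Python sketch below models this with a `bytearray` standing in for memory; the names `ldd` and `std` mirror the mnemonics, and the 8-byte alignment check reflects the alignment rule for doubleword accesses.

```python
import struct


def std(memory, address, even, odd):
    """Store the register pair (even, odd) as one big-endian doubleword."""
    if address % 8:
        raise ValueError("doubleword address must be 8-byte aligned")
    struct.pack_into(">II", memory, address, even, odd)


def ldd(memory, address):
    """Load a doubleword; returns (even register, odd register)."""
    if address % 8:
        raise ValueError("doubleword address must be 8-byte aligned")
    return struct.unpack_from(">II", memory, address)


memory = bytearray(16)
std(memory, 8, 0x01234567, 0x89ABCDEF)    # %r10 = high word, %r11 = low word
print(memory[8:16].hex())
print([hex(v) for v in ldd(memory, 8)])
```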



3. Swap instruction (SWAP)

    Exchanges the contents of the named register with a word in memory. It does the same job as the x86 XCHG instruction.

swap [some_addr], %r10



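Control-transfer instructions

    SPARC offers two ways to transfer control: a register-indirect jump, in which the target address is computed from registers (jmpl, on which ret is built; the PC itself cannot be written by an ordinary move instruction), and the branch instructions described below. Branch instructions come in two kinds, conditional and unconditional. Note that in SPARC the condition is supplied by the condition codes, and changes to the condition code bits are under program control: as mentioned earlier, only instructions with the cc suffix affect them. (Floating-point branches are similar and are not covered here.)

    Branch instructions

    SPARC provides 16 basic branch instructions. BA (branch always) and BN (branch never) are unconditional. The rest are conditional and branch only when the specified condition holds.

1. Testing a single condition code bit

bneg -- branch on negative (N==1), negative result

bpos -- branch on positive (N==0), non-negative result

bz -- branch on zero (Z==1), synonym for be, equal

bnz -- branch on not zero (Z==0), synonym for bne, not equal

bvs -- branch on overflow set (V==1), overflow

bvc -- branch on overflow clear (V==0), no overflow

bcs -- branch on carry set (C==1), carry

bcc -- branch on carry clear (C==0), no carry


2. Testing several condition code bits

These test the result of a comparison or another operation (signed arithmetic):

bl -- branch on less than ((N xor V)==1)

ble -- branch on less than or equal ((Z or (N xor V))==1)

be -- branch on equal (Z==1)

bne -- branch on not equal (Z==0)

bge -- branch on greater than or equal ((N xor V)==0)

bg -- branch on greater than ((Z or (N xor V))==0)

There are two more, for unsigned comparisons:

bgu -- branch on unsigned greater than

bleu -- branch on unsigned less than or equal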
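The branch conditions can be written down directly as predicates on the icc bits. The Python table below is a sketch of that mapping; the conditions for bgu and bleu ((C or Z)==0 and (C or Z)==1) are not spelled out in the list above and are taken from the V8 manual. `cmp_flags` models `cmp a, b`, that is, subcc with %g0 as destination.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Model `cmp a, b` (subcc a, b, %g0): return the icc bits as a dict."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    return {
        "N": sign_r,
        "Z": int(result == 0),
        "V": int(sign_a != sign_b and sign_r != sign_a),
        "C": int(a < b),  # carry acts as borrow on subtraction
    }


BRANCH_CONDITIONS = {
    "ba":   lambda f: True,
    "bn":   lambda f: False,
    "bneg": lambda f: f["N"] == 1,
    "bpos": lambda f: f["N"] == 0,
    "be":   lambda f: f["Z"] == 1,
    "bne":  lambda f: f["Z"] == 0,
    "bvs":  lambda f: f["V"] == 1,
    "bvc":  lambda f: f["V"] == 0,
    "bcs":  lambda f: f["C"] == 1,
    "bcc":  lambda f: f["C"] == 0,
    "bl":   lambda f: (f["N"] ^ f["V"]) == 1,
    "ble":  lambda f: (f["Z"] | (f["N"] ^ f["V"])) == 1,
    "bge":  lambda f: (f["N"] ^ f["V"]) == 0,
    "bg":   lambda f: (f["Z"] | (f["N"] ^ f["V"])) == 0,
    "bgu":  lambda f: (f["C"] | f["Z"]) == 0,
    "bleu": lambda f: (f["C"] | f["Z"]) == 1,
}


def taken(mnemonic, icc):
    """Would this branch be taken with the given condition codes?"""
    return BRANCH_CONDITIONS[mnemonic](icc)


# Compare the bit pattern 0xFFFFFFFF with 1: it is -1 when read as signed,
# but 4294967295 when read as unsigned.
icc = cmp_flags(-1, 1)
for name in ("bl", "bg", "bgu", "bleu"):
    print(name, taken(name, icc))
```

The example shows why both families exist: after the same cmp, bl is taken (signed -1 < 1) and so is bgu (unsigned 0xFFFFFFFF > 1).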


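Notes:

    1. By default, whenever a branch instruction is executed, the instruction immediately following it is executed as well.

    2. Branch delay slot. SPARC uses two program counters, PC and nPC, to keep track of execution. PC holds the address of the instruction being executed; nPC holds the address of the instruction to be executed next, i.e. the next value of PC. Normally, at the end of each instruction SPARC replaces PC with the value of nPC and sets nPC to "its old value + 4". When it executes a branch, SPARC copies nPC into PC and then updates nPC: if the branch is taken, nPC receives the target address named by the instruction; otherwise nPC is incremented by 4. In other words, a taken branch does not immediately put the target address into PC; the instruction at the old nPC is executed first (this is what creates the delay slot).

    3. When nothing useful can be placed in the delay slot of a branch, SPARC provides the synthetic instruction nop. Executing nop changes no register and no memory location; its only effect is that the processor executes one more instruction, which lengthens the program's run time.

    4. Appending ",a" to a branch sets the annul bit. If a conditional branch with ",a" is not taken, the instruction in the delay slot is annulled (not executed). For example: bg,a top

The delay slot is a clever piece of design.

It shows up all the time in SPARC assembly, so it is worth spending some time on it:

test.c:

    int main(void)
    {
        int temp;
        int x = 0;
        int y = 0x9;
        int z = 0x42;

        temp = y;
        while (temp > 0)
        {
            x = x + z;          /* add z to x, y times */
            temp = temp - 1;
        }
        return 0;
    }

A straightforward translation, with a nop placed in the branch delay slot: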

    .data
x:  .word 0
y:  .word 0x9
z:  .word 0x42

    .text
start:
    set y, %r1
    ld [%r1], %r2
    set z, %r1
    ld [%r1], %r3
    mov %r0, %r4

    add %r2, 1, %r2
    ba test
    nop                 ! here, branch delay slot
top:
    add %r4, %r3, %r4
test:
    subcc %r2, 1, %r2
    bg top
    nop                 ! here, branch delay slot
    set x, %r1
    st %r4, [%r1]       ! store x
end:
    ta 0

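    In the code above, when the branch bg top is taken, the processor does not jump at once: it first executes the nop in the branch delay slot and only then transfers to top:.

    OK, now let us put a useful instruction in the branch delay slot.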

    .data
    ......

    .text
start:
    set y, %r1
    ld [%r1], %r2
    set z, %r1
    ld [%r1], %r3
    mov %r0, %r4

    add %r2, 1, %r2
top:
    subcc %r2, 1, %r2
    bg,a top
    add %r4, %r3, %r4
    set x, %r1
    st %r4, [%r1]
end:
    ta 0

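    With the branch bg,a top, when the branch is taken the processor does not jump at once: it first executes the delay-slot instruction "add %r4, %r3, %r4" and only then transfers to top:. And when the branch condition is not met, the instruction in the delay slot is annulled.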
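To see what the annul bit buys in the second listing, the PC/nPC rules can be simulated in a few lines of Python. This is a toy interpreter written for this article, not a real SPARC simulator: it understands only add, subcc, bg, bg,a, and nop, uses list indices instead of byte addresses (so "+4" becomes "+1"), and starts with the registers already loaded instead of modeling set/ld/st.

```python
MASK32 = 0xFFFFFFFF


def run(program, labels, regs):
    """Run (op, *args) tuples with PC/nPC sequencing; return the final registers."""
    regs = dict(regs)
    flags = {"N": 0, "Z": 0, "V": 0}
    pc, npc = 0, 1

    def value(operand):
        return regs[operand] if isinstance(operand, str) else operand & MASK32

    while pc < len(program):
        op, *args = program[pc]
        next_npc = npc + 1                      # default: nPC advances by one
        if op in ("add", "subcc"):
            a, b, dest = args
            x, y = value(a), value(b)
            result = (x + y if op == "add" else x - y) & MASK32
            if op == "subcc":
                flags["N"] = result >> 31
                flags["Z"] = int(result == 0)
                flags["V"] = int((x >> 31) != (y >> 31)
                                 and (result >> 31) != (x >> 31))
            regs[dest] = result
        elif op in ("bg", "bg,a"):
            taken = (flags["Z"] | (flags["N"] ^ flags["V"])) == 0
            if taken:
                next_npc = labels[args[0]]      # the delay slot still runs first
            elif op == "bg,a":
                npc, next_npc = npc + 1, npc + 2  # annul: skip the delay slot
        elif op != "nop":
            raise ValueError(f"unsupported instruction: {op}")
        pc, npc = npc, next_npc
    return regs


def loop(branch):
    """The loop of the second listing, with a choice of branch instruction."""
    return [
        ("add", "r2", 1, "r2"),      # 0
        ("subcc", "r2", 1, "r2"),    # 1: top
        (branch, "top"),             # 2
        ("add", "r4", "r3", "r4"),   # 3: delay slot
        ("nop",),                    # 4: stands in for storing x
    ]


LABELS = {"top": 1}
START = {"r2": 9, "r3": 0x42, "r4": 0}   # y, z, x

print(hex(run(loop("bg,a"), LABELS, START)["r4"]))  # delay slot annulled on exit
print(hex(run(loop("bg"), LABELS, START)["r4"]))    # delay slot runs once too often
```

With bg,a the add executes nine times, matching the C loop. With a plain bg and the add in the delay slot, the add would run once more when the loop exits, which is why the first listing needs a nop there.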


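Read/write special register instructions

    These instructions read the state registers or write a new value to them. The registers involved are %y, %psr, %wim, %tbr, and %asr.

Read state register: RD

Assembly syntax

rd %y, regrd

rd %asr_regrs1, regrd

rd %psr, regrd

rd %wim, regrd

rd %tbr, regrd

Example

rd %psr, %r2 ! %psr -> %r2

Write state register: WR

Assembly syntax (the value written is regrs1 XOR reg_or_immed)

wr regrs1, reg_or_immed, %y

wr regrs1, reg_or_immed, %asr_regrd

wr regrs1, reg_or_immed, %psr

wr regrs1, reg_or_immed, %wim

wr regrs1, reg_or_immed, %tbr

Example

wr %r2, %g0, %y ! %r2 xor 0 -> %y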


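Floating-point operate instructions

    The floating-point operate instructions perform all floating-point computation. They are register-to-register instructions that work on the floating-point registers. With the floating-point environment of a SPARC system you can develop reliable, high-performance, portable numerical applications.

Floating-point instructions

    The floating-point operate instructions work on integer words and on single-, double-, and quad-precision floating-point operands. By function they divide into conversion instructions, floating-point arithmetic instructions, and other floating-point instructions.

(The individual instructions are omitted, since they are not used much.)

    It took a lot of work to put this article together, and I then realized that a good deal is still missing, for example call, ret, jmpl, the synthetic instructions, and so on. But one article cannot cover everything; the rest is up to your own study. One very common instruction has not been introduced yet, so let it be the ending of this article.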

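The sethi instruction

Format:

sethi const22, %reg

This instruction places const22 in the high 22 bits of reg and sets the low 10 bits of reg to 0.

sethi 0x333333, %l1

0x333333 is 1100110011001100110011

After the instruction, %l1 holds 1100110011001100110011 0000000000

Now for some questions and answers:

    Q: Why have such an instruction?

    A: Its purpose is to put a 32-bit constant (an address, say) into a register. No single instruction can do that, because every instruction is itself 32 bits long and has to hold the opcode, flag bits, and so on. So sethi is combined with add or or, which fills in the low bits.

For example, to set %l1 to 0x89abcdef:

1. Split 0x89abcdef into its high 22 bits and low 10 bits

    89abcdef = 10001001101010111100110111101111

    High 22 bits: 1000100110101011110011 = 226af3

    Low 10 bits: 0111101111 = 1ef

2. Put the two parts into %l1

    sethi 0x226af3, %l1

    or %l1, 0x1ef, %l1

    At this point you may be cursing the instruction set designers: how silly is that, do I have to split every constant by hand?

    Well, remember %hi(X) and %lo(X) from the beginning? The first yields the high 22 bits of the constant X, and the second yields its low 10 bits.

So we can write:

    sethi %hi(0x89abcdef), %l1

    or %l1, %lo(0x89abcdef), %l1

    That is still cumbersome, so we can use an equivalent synthetic instruction: set (it is not a real SPARC instruction; it simply wraps sethi ... or ...).

    set const32, %reg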
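The arithmetic behind %hi, %lo, and sethi is just a shift and a mask, which the following Python sketch reproduces for the constant used above. The function names mirror the assembler operators; `set_constant` is a made-up name for the sethi/or pair that set expands to.

```python
def hi(x):
    """%hi(x): the high 22 bits of a 32-bit constant."""
    return (x & 0xFFFFFFFF) >> 10


def lo(x):
    """%lo(x): the low 10 bits of a 32-bit constant."""
    return x & 0x3FF


def sethi(const22):
    """sethi const22, %reg: const22 in the high 22 bits, low 10 bits zero."""
    return (const22 & 0x3FFFFF) << 10


def set_constant(x):
    """Model of sethi %hi(x), %reg followed by or %reg, %lo(x), %reg."""
    reg = sethi(hi(x))
    reg |= lo(x)
    return reg


print(hex(hi(0x89abcdef)), hex(lo(0x89abcdef)))
print(hex(sethi(0x333333)))
print(hex(set_constant(0x89abcdef)))
```

Because %lo(x) is at most 0x3ff, it always fits in the simm13 field of the or instruction.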






Appendix: "Understanding stacks and registers in the Sparc architecture(s)"

The Sparc architecture from Sun Microsystems has some "interesting" characteristics. After having to deal with both compiler, interpreter, OS emulator, and OS porting issues for the Sparc, I decided to gather notes and documentation in one place. If there are any issues you don't find addressed by this page, or if you know of any similar Net resources, let me know. This document is limited to the V8 version of the architecture.

General Structure

Sparc has 32 general purpose integer registers visible to the program at any given time. Of these, 8 registers are global registers and 24 registers are in a register window. A window consists of three groups of 8 registers, the out, local, and in registers. See table 1. A Sparc implementation can have from 2 to 32 windows, thus varying the number of registers from 40 to 520. Most implementations have 7 or 8 windows. The variable number of registers is the principal reason for the Sparc being "scalable".

At any given time, only one window is visible, as determined by the current window pointer (CWP) which is part of the processor status register (PSR). This is a five bit value that can be decremented or incremented by the SAVE and RESTORE instructions, respectively. These instructions are generally executed on procedure call and return (respectively). The idea is that the in registers contain incoming parameters, the local registers constitute scratch registers, the out registers contain outgoing parameters, and the global registers contain values that vary little between executions. The register windows overlap partially, thus the out registers become renamed by SAVE to become the in registers of the called procedure. Thus, the memory traffic is reduced when going up and down the procedure call. Since this is a frequent operation, performance is improved.

(That was the idea, anyway. The drawback is that upon interactions with the system the registers need to be flushed to the stack, necessitating a long sequence of writes to memory of data that is often mostly garbage. Register windows was a bad idea that was caused by simulation studies that considered only programs in isolation, as opposed to multitasking workloads, and by considering compilers with poor optimization. It also caused considerable problems in implementing high-end Sparc processors such as the SuperSparc, although more recent implementations have dealt effectively with the obstacles. Register windows is now part of the compatibility legacy and not easily removed from the architecture.)

Register Group   Mnemonic   Register Address
global           %g0-%g7    r[0]-r[7]
out              %o0-%o7    r[8]-r[15]
local            %l0-%l7    r[16]-r[23]
in               %i0-%i7    r[24]-r[31]

Table 1 - Visible Registers

The overlap of the registers is illustrated in figure 1. The figure shows an implementation with 8 windows, numbered 0 to 7 (labeled w0 to w7 in the figure). Each window corresponds to 24 registers, 16 of which are shared with "neighboring" windows. The windows are arranged in a wrap-around manner, thus window number 0 borders window number 7. The common cause of changing the current window, as pointed to by CWP, is the RESTORE and SAVE instructions, shown in the middle. Less common is the supervisor RETT instruction (return from trap) and the trap event (interrupt, exception, or TRAP instruction).
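The overlap can be modeled by mapping each visible register r8..r31 to a slot in a circular register file. The Python sketch below is one layout that is consistent with the description (out registers of one window are the in registers of the window below it, with wrap-around); the slot numbering itself is an assumption made for illustration, since real implementations are free to number their physical registers differently.

```python
GLOBALS = 8
REGS_PER_WINDOW = 16   # each extra window contributes 16 new registers


def total_registers(nwindows):
    """Number of integer registers in an implementation with nwindows windows."""
    return GLOBALS + REGS_PER_WINDOW * nwindows


def physical_index(cwp, reg, nwindows=8):
    """Map visible register r8..r31 of window cwp to a slot in the windowed file.

    Offsets 0-7 are the outs, 8-15 the locals, 16-23 the ins of the window.
    """
    if not 8 <= reg <= 31:
        raise ValueError("r0..r7 are globals and are not windowed")
    return (cwp * REGS_PER_WINDOW + (reg - 8)) % (nwindows * REGS_PER_WINDOW)


def save(cwp, nwindows=8):
    """SAVE decrements CWP (modulo the number of windows)."""
    return (cwp - 1) % nwindows


def restore(cwp, nwindows=8):
    """RESTORE increments CWP (modulo the number of windows)."""
    return (cwp + 1) % nwindows


caller = 3
callee = save(caller)
# The caller's %o0 (r8) and the callee's %i0 (r24) are the same physical slot.
print(physical_index(caller, 8), physical_index(callee, 24))
# The caller's %sp (r14) becomes the callee's %fp (r30).
print(physical_index(caller, 14), physical_index(callee, 30))
print(total_registers(2), total_registers(32))
```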


The "WIM" register is also indicated in the top left of figure 1. The window invalid mask is a bit map of valid windows. It is generally used as a pointer, i.e. exactly one bit is set in the WIM register indicating which window is invalid (in the figure it's window 7). Register windows are generally used to support procedure calls, so they can be viewed as a cache of the stack contents. The WIM "pointer" indicates how many procedure calls in a row can be taken without writing out data to memory. In the figure, the capacity of the register windows is fully utilized. An additional call will thus exceed capacity, triggering a window overflow trap. At the other end, a window underflow trap occurs when the register window "cache" is empty and more data needs to be fetched from memory.

Register Semantics

The Sparc Architecture includes recommended software semantics. These are described in the architecture manual, the Sparc ABI (application binary interface) standard, and, unfortunately, in various other locations as well (including header files and compiler documentation).

Figure 2 shows a summary of register contents at any given time.

                 %g0  (r00)       always zero
                 %g1  (r01)  [1]  temporary value
                 %g2  (r02)  [2]  global 2
     global      %g3  (r03)  [2]  global 3
                 %g4  (r04)  [2]  global 4
                 %g5  (r05)       reserved for SPARC ABI
                 %g6  (r06)       reserved for SPARC ABI
                 %g7  (r07)       reserved for SPARC ABI

                 %o0  (r08)  [3]  outgoing parameter 0 / return value from callee
                 %o1  (r09)  [1]  outgoing parameter 1
                 %o2  (r10)  [1]  outgoing parameter 2
     out         %o3  (r11)  [1]  outgoing parameter 3
                 %o4  (r12)  [1]  outgoing parameter 4
                 %o5  (r13)  [1]  outgoing parameter 5
            %sp, %o6  (r14)  [1]  stack pointer
                 %o7  (r15)  [1]  temporary value / address of CALL instruction

                 %l0  (r16)  [3]  local 0
                 %l1  (r17)  [3]  local 1
                 %l2  (r18)  [3]  local 2
     local       %l3  (r19)  [3]  local 3
                 %l4  (r20)  [3]  local 4
                 %l5  (r21)  [3]  local 5
                 %l6  (r22)  [3]  local 6
                 %l7  (r23)  [3]  local 7

                 %i0  (r24)  [3]  incoming parameter 0 / return value to caller
                 %i1  (r25)  [3]  incoming parameter 1
                 %i2  (r26)  [3]  incoming parameter 2
     in          %i3  (r27)  [3]  incoming parameter 3
                 %i4  (r28)  [3]  incoming parameter 4
                 %i5  (r29)  [3]  incoming parameter 5
            %fp, %i6  (r30)  [3]  frame pointer
                 %i7  (r31)  [3]  return address - 8

Notes:
[1] assumed by caller to be destroyed (volatile) across a procedure call
[2] should not be used by SPARC ABI library code
[3] assumed by caller to be preserved across a procedure call

Figure 2 - Sparc register semantics

Particular compilers are likely to vary slightly.

Note that globals %g2-%g4 are reserved for the "application", which includes libraries and compiler. Thus, for example, libraries may overwrite these registers unless they've been compiled with suitable flags. Also, the "reserved" registers are presumed to be allocated (in the future) bottom-up, i.e. %g7 is currently the "safest" to use.

Optimizing linkers and interpreters are examples that use global registers.

Register Windows and the Stack

The Sparc register windows are, naturally, intimately related to the stack. In particular, the stack pointer (%sp or %o6) must always point to a free block of 64 bytes. This area is used by the operating system (Solaris, SunOS, and Linux at least) to save the current local and in registers upon a system interrupt, exception, or trap instruction. (Note that this can occur at any time.)

Other aspects of register relations with memory are programming convention. The typical, and recommended, layout of the stack is shown in figure 3. The figure shows a stack frame.

                    low addresses

               +-------------------------+
     %sp  -->  | 16 words for storing    |
               | LOCAL and IN registers  |
               +-------------------------+
               |  one-word pointer to    |
               | aggregate return value  |
               +-------------------------+
               |   6 words for callee    |
               |   to store register     |
               |       arguments         |
               +-------------------------+
               |  outgoing parameters    |
               |  past the 6th, if any   |
               +-------------------------+
               |  space, if needed, for  |
               |  compiler temporaries   |
               |   and saved floating-   |
               |    point registers      |
               +-------------------------+

               +-------------------------+
               |    space dynamically    |
               |    allocated via the    |
               |  alloca() library call  |
               +-------------------------+
               |  space, if needed, for  |
               |    automatic arrays,    |
               |    aggregates, and      |
               |   addressable scalar    |
               |       automatics        |
               +-------------------------+
     %fp  -->

                    high addresses

Figure 3 - Stack frame contents

Note that the top boxes of figure 3 are addressed via the stack pointer (%sp), as positive offsets (including zero), and the bottom boxes are accessed over the frame pointer using negative offsets (excluding zero), and that the frame pointer is the old stack pointer. This scheme allows the separation of information known at compile time (number and size of local parameters, etc) from run-time information (size of blocks allocated by alloca()).

"addressable scalar automatics" is a fancy name for local variables.

The clever nature of the stack and frame pointers is that they are always 16 registers apart in the register windows. Thus, a SAVE instruction will make the current stack pointer into the frame pointer and, since the SAVE instruction also doubles as an ADD, create a new stack pointer. Figure 4 illustrates what the top of a stack might look like during execution. (The listing is from the "pwin" command in the SimICS simulator.)

                  REGISTER WINDOWS

                 +--+---+----------+
                 |g0|r00|0x00000000| global
                 |g1|r01|0x00000006| registers
                 |g2|r02|0x00091278|
      g0-g7      |g3|r03|0x0008ebd0|
                 |g4|r04|0x00000000|        (note: 'save' and 'trap' decrements CWP,
                 |g5|r05|0x00000000|         i.e. moves it up on this diagram. 'restore'
                 |g6|r06|0x00000000|         and 'rett' increments CWP, i.e. down)
                 |g7|r07|0x00000000|
                 +--+---+----------+
 CWP (2)         |o0|r08|0x00000002|
                 |o1|r09|0x00000000|                 MEMORY
                 |o2|r10|0x00000001|
      o0-o7      |o3|r11|0x00000001|             stack growth
                 |o4|r12|0x000943d0|
                 |o5|r13|0x0008b400|                  ^
                 |sp|r14|0xdffff9a0| ----\           /|\
                 |o7|r15|0x00062abc|     |            |                    addresses
                 +--+---+----------+     |     +--+----------+        virtual     physical
                 |l0|r16|0x00087c00|     \---> |l0|0x00000000|        0xdffff9a0  0x000039a0  top of frame 0
                 |l1|r17|0x00027fd4|           |l1|0x00000000|        0xdffff9a4  0x000039a4
                 |l2|r18|0x00000000|           |l2|0x0009df80|        0xdffff9a8  0x000039a8
      l0-l7      |l3|r19|0x00000000|           |l3|0x00097660|        0xdffff9ac  0x000039ac
                 |l4|r20|0x00000000|           |l4|0x00000014|        0xdffff9b0  0x000039b0
                 |l5|r21|0x00097678|           |l5|0x00000001|        0xdffff9b4  0x000039b4
                 |l6|r22|0x0008b400|           |l6|0x00000004|        0xdffff9b8  0x000039b8
                 |l7|r23|0x0008b800|           |l7|0x0008dd60|        0xdffff9bc  0x000039bc
              +--+--+---+----------+           +--+----------+
 CWP+1 (3)    |o0|i0|r24|0x00000002|           |i0|0x00091048|        0xdffff9c0  0x000039c0
              |o1|i1|r25|0x00000000|           |i1|0x00000011|        0xdffff9c4  0x000039c4
              |o2|i2|r26|0x0008b7c0|           |i2|0x00091158|        0xdffff9c8  0x000039c8
      i0-i7   |o3|i3|r27|0x00000019|           |i3|0x0008d370|        0xdffff9cc  0x000039cc
              |o4|i4|r28|0x0000006c|           |i4|0x0008eac4|        0xdffff9d0  0x000039d0
              |o5|i5|r29|0x00000000|           |i5|0x00000000|        0xdffff9d4  0x000039d4
              |o6|fp|r30|0xdffffa00| ----\     |fp|0x00097660|        0xdffff9d8  0x000039d8
              |o7|i7|r31|0x00040468|     |     |i7|0x00000000|        0xdffff9dc  0x000039dc
              +--+--+---+----------+     |     +--+----------+
                                         |        |0x00000001|        0xdffff9e0  0x000039e0  parameters
                                         |        |0x00000002|        0xdffff9e4  0x000039e4
                                         |        |0x00000040|        0xdffff9e8  0x000039e8
                                         |        |0x00097671|        0xdffff9ec  0x000039ec
                                         |        |0xdffffa68|        0xdffff9f0  0x000039f0
                                         |        |0x00024078|        0xdffff9f4  0x000039f4
                                         |        |0x00000004|        0xdffff9f8  0x000039f8
                                         |        |0x0008dd60|        0xdffff9fc  0x000039fc
              +--+------+----------+     |     +--+----------+
              |l0|      |0x00087c00|     \---> |l0|0x00091048|        0xdffffa00  0x00003a00  top of frame 1
              |l1|      |0x000c8d48|           |l1|0x0000000b|        0xdffffa04  0x00003a04
              |l2|      |0x000007ff|           |l2|0x00091158|        0xdffffa08  0x00003a08
              |l3|      |0x00000400|           |l3|0x000c6f10|        0xdffffa0c  0x00003a0c
              |l4|      |0x00000000|           |l4|0x0008eac4|        0xdffffa10  0x00003a10
              |l5|      |0x00088000|           |l5|0x00000000|        0xdffffa14  0x00003a14
              |l6|      |0x0008d5e0|           |l6|0x000c6f10|        0xdffffa18  0x00003a18
              |l7|      |0x00088000|           |l7|0x0008cd00|        0xdffffa1c  0x00003a1c
              +--+--+---+----------+           +--+----------+
 CWP+2 (4)    |i0|o0|   |0x00000002|           |i0|0x0008cb00|        0xdffffa20  0x00003a20
              |i1|o1|   |0x00000011|           |i1|0x00000003|        0xdffffa24  0x00003a24
              |i2|o2|   |0xffffffff|           |i2|0x00000040|        0xdffffa28  0x00003a28
              |i3|o3|   |0x00000000|           |i3|0x0009766b|        0xdffffa2c  0x00003a2c
              |i4|o4|   |0x00000000|           |i4|0xdffffa68|        0xdffffa30  0x00003a30
              |i5|o5|   |0x00064c00|           |i5|0x000253d8|        0xdffffa34  0x00003a34
              |i6|o6|   |0xdffffa70| ----\     |i6|0xffffffff|        0xdffffa38  0x00003a38
              |i7|o7|   |0x000340e8|     |     |i7|0x00000000|        0xdffffa3c  0x00003a3c
              +--+--+---+----------+     |     +--+----------+
                                         |        |0x00000001|        0xdffffa40  0x00003a40  parameters
                                         |        |0x00000000|        0xdffffa44  0x00003a44
                                         |        |0x00000000|        0xdffffa48  0x00003a48
                                         |        |0x00000000|        0xdffffa4c  0x00003a4c
                                         |        |0x00000000|        0xdffffa50  0x00003a50
                                         |        |0x00000000|        0xdffffa54  0x00003a54
                                         |        |0x00000002|        0xdffffa58  0x00003a58
                                         |        |0x00000002|        0xdffffa5c  0x00003a5c
                                         |        |    .     |
                                         |        |    .     |        .. etc (another 16 bytes)
                                         |        |    .     |

Figure 4 - Sample stack contents

Note how the stack contents are not necessarily synchronized with the registers. Various events can cause the register windows to be "flushed" to memory, including most system calls. A programmer can force this update by using ST_FLUSH_WINDOWS trap, which also reduces the number of valid windows to the minimum of 1.

Writing a library for multithreaded execution is an example that requires explicit flushing, as is longjmp().

Procedure epilogue and prologue

The stack frame described in the previous section leads to the standard entry/exit mechanisms listed in figure 5.

  function:
    save  %sp, -C, %sp

               ; perform function, leave return value,
               ; if any, in register %i0 upon exit

    ret        ; jmpl %i7+8, %g0
    restore    ; restore %g0,%g0,%g0

Figure 5 - Epilogue/prologue in procedures

The SAVE instruction decrements the CWP, as discussed earlier, and also performs an addition. The constant "C" is used in the figure to indicate the amount of space to make on the stack, and thus corresponds to the frame contents in Figure 3. The minimum is therefore the 16 words for the LOCAL and IN registers, i.e. (hex) 0x40 bytes.

A confusing element of the SAVE instruction is that the source operands (the first two parameters) are read from the old register window, and the destination operand (the rightmost parameter) is written to the new window. Thus, although "%sp" is indicated as both source and destination, the result is actually written into the stack pointer of the new window (the source stack pointer becomes renamed and is now the frame pointer).

The return instructions are also a bit particular. ret is a synthetic instruction, corresponding to jmpl (jump linked). This instruction jumps to the address resulting from adding 8 to the %i7 register. The source instruction address (the address of the ret instruction itself) is written to the %g0 register, i.e. it is discarded.

The restore instruction is similarly a synthetic instruction, and is just a short form for a restore that chooses not to perform an addition.

The calling instruction, in turn, typically looks as follows:

    call <function>    ; jmpl <address>, %o7
    mov 0, %o0
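Again, the call instruction is shown with its jmpl equivalent, which is the same instruction that performs the return. (Strictly speaking, a call to a label assembles to the dedicated CALL instruction, and only a call through a register is a synthetic form of jmpl; both write the return address to %o7.) This time, however, the address of the calling instruction is not discarded but saved, into register %o7. Note that the delay slot is often filled with an instruction related to the parameters; in this example it sets the first parameter to zero.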

Note that, from the caller's point of view, the return value is also generally passed in %o0 (which is the callee's %i0).

Leaf procedures are different. Treating a procedure as a leaf is an optimization that avoids unnecessary work by exploiting the fact that many procedures contain no call instructions. In such procedures the save/restore pair can be eliminated. The downside is that such a procedure may only use the out registers (since the in and local registers actually belong to the caller). See Figure 6.

  function:
               ; no save instruction needed upon entry

               ; perform function, leave return value,
               ; if any, in register %o0 upon exit

    retl       ; jmpl %o7+8, %g0
    nop        ; the delay slot can be used for something else
Figure 6 - Epilogue/prologue in leaf procedures

Note in the figure that there is only one instruction of overhead, namely the retl instruction. retl (return from leaf subroutine) is also synthetic, and is again a variant of the jmpl instruction, this time with %o7+8 as target.
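The +8 offset used by both ret and retl follows from the delay slot: the link register holds the address of the call instruction itself, the next word is the delay slot, and execution resumes one word after that. A minimal sketch of this arithmetic, with function names invented for illustration:

```python
INSN_BYTES = 4  # every SPARC instruction is one 32-bit word

def call(pc):
    """Model a call at address pc: return (link value, delay slot address)."""
    link = pc                   # the address of the call goes into %o7
    delay_slot = pc + INSN_BYTES
    return link, delay_slot

def return_target(link):
    """Target of `jmpl link+8, %g0` (ret uses %i7, retl uses %o7)."""
    return link + 2 * INSN_BYTES

o7, slot = call(0x2000)
print(hex(slot), hex(return_target(o7)))   # prints 0x2004 0x2008
```

A non-leaf callee sees the same link value as %i7 after its save, which is why ret and retl differ only in the register they name.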

Yet another variation of the epilogue is caused by tail call elimination, an optimization supported by some compilers (including Sun's C compiler but not GCC). If the compiler detects that a called function would return to the calling function only for that function to return immediately, it can let the called function take over the calling function's place on the stack. Figure 7 contains an example.

      int
      foo(int n)
      {
        if (n == 0)
          return 0;
        else
          return bar(n);
      }

        cmp     %o0,0
        bne     .L1
        or      %g0,%o7,%g1
        retl
        or      %g0,0,%o0
  .L1:  call    bar
        or      %g0,%g1,%o7
Figure 7 - Example of tail call elimination

Note that the call instruction overwrites register %o7 with the program counter. Therefore the above code saves the old value of %o7, and restores it in the delay slot of the call instruction. If the function call is register indirect, this twiddling with %o7 can be avoided, but of course that form of call is slower on modern processors.

The benefit of tail call elimination is to remove an indirection upon return. It is also needed to reduce register window usage, since otherwise the foo() function in Figure 7 would need to allocate a stack frame to save the program counter.

A special form of tail call elimination is tail recursion elimination, which detects functions calling themselves, and replaces the call with a simple branch. Figure 8 contains an example.

        int
        foo(int n)
        {
          if (n == 0)
            return 1;
          else
            return (foo(n - 1));
        }

        cmp     %o0,0
        be      .L1
        or      %g0,%o0,%g1
        subcc   %g1,1,%g1
  .L2:  bne     .L2
        subcc   %g1,1,%g1
  .L1:  retl
        or      %g0,1,%o0
Figure 8 - Example of tail recursion elimination
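The effect of the transformation in Figure 8 can be imitated at source level. The sketch below is an illustration in Python rather than compiler output: foo_recursive mirrors the C function, and foo_loop is a hand-written equivalent in which the self-call has become a branch back to the top of a loop. Both function names are invented here.

```python
def foo_recursive(n):
    """Direct transcription of foo() from Figure 8."""
    if n == 0:
        return 1
    return foo_recursive(n - 1)

def foo_loop(n):
    """Same result with the self-call replaced by a loop.

    No new frame is created per step, so the depth of the
    "recursion" no longer costs any stack space.
    """
    while n != 0:
        n -= 1
    return 1

print(foo_recursive(10), foo_loop(10))   # prints 1 1
```

The loop version also works for arguments large enough to exhaust the stack in the recursive version, which is the practical payoff of the optimization.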

Needless to say, these optimizations produce code that is difficult to debug.

Procedures, stacks, and debuggers

When debugging an application, your debugger will be parsing the binary and consulting the symbol table to determine procedure entry points. It will also travel the stack frames "upward" to determine the current call chain.
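The upward walk can be sketched as follows. The sketch assumes that all register windows have already been flushed to memory, that each frame's save area stores %i6 (the caller's stack pointer) at offset 56 and %i7 (the address of the call instruction) at offset 60, matching the std %i6, [%sp + 56] store in the overflow handler in Figure 9, and that the outermost frame is marked by a saved frame pointer of zero. The memory dictionary and the function name are hypothetical.

```python
FP_OFFSET = 56   # saved %i6: the caller's stack pointer
RA_OFFSET = 60   # saved %i7: address of the call that entered this frame

def walk_stack(memory, sp, max_frames=64):
    """Return the call-site addresses from the innermost frame outward.

    memory: dict mapping byte addresses to 32-bit words (a stand-in
            for the flushed stack, an assumption of this sketch).
    """
    chain = []
    while sp != 0 and len(chain) < max_frames:
        chain.append(memory[sp + RA_OFFSET])
        sp = memory[sp + FP_OFFSET]   # step to the caller's frame
    return chain

memory = {
    0x1000 + FP_OFFSET: 0x2000, 0x1000 + RA_OFFSET: 0x400,
    0x2000 + FP_OFFSET: 0,      0x2000 + RA_OFFSET: 0x100,
}
print([hex(a) for a in walk_stack(memory, 0x1000)])   # prints ['0x400', '0x100']
```

A real debugger then maps each address to a procedure through the symbol table; the max_frames bound is only a guard against a corrupted chain.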

When compiling for debugging, compilers will generate additional code as well as avoid some optimizations in order to allow reconstructing situations during execution. For example, GCC/GDB makes sure original parameter values are kept intact somewhere for future parsing of the procedure call stack. The live in registers other than %i0 are not touched. %i0 itself is copied into a free local register, and its location is noted in the symbol file. (You can find out where variables reside by using the "info address" command in GDB.)

Given that much of the semantics relating to stack handling and procedure call entry/exit code is only recommended, debuggers will sometimes be fooled. For example, the decision as to whether or not the current procedure is a leaf can be incorrect. In this case a spurious procedure will be inserted between the current procedure and its "real" parent. Another example is when the application maintains its own implicit call hierarchy, such as jumping to function pointers. In this case the debugger can easily become totally confused.

The window overflow and underflow traps

When the SAVE instruction decrements the current window pointer (CWP) so that it coincides with the invalid window in the window invalid mask (WIM), a window overflow trap occurs. Conversely, when the RESTORE or RETT instructions increment the CWP to coincide with the invalid window, a window underflow trap occurs.

Either trap is handled by the operating system. Generally, data is written out to memory and/or read from memory, and the WIM register is suitably altered.
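The overflow condition can be modelled in a few lines. This is a sketch only: it assumes 8 windows (as in the handlers below), a WIM with at least one bit set, and an invented function name. It counts how many SAVE instructions succeed before one would move the CWP onto an invalid window.

```python
NWINDOWS = 8  # assumed window count, as in the handlers below

def saves_until_overflow(cwp, wim):
    """Count the SAVEs that complete before one traps."""
    if wim & ((1 << NWINDOWS) - 1) == 0:
        raise ValueError("WIM marks no window invalid; SAVE would never trap")
    count = 0
    while True:
        new_cwp = (cwp - 1) % NWINDOWS   # SAVE decrements the CWP
        if wim & (1 << new_cwp):         # the target window is invalid
            return count                 # this SAVE takes the overflow trap
        cwp = new_cwp
        count += 1

print(saves_until_overflow(cwp=0, wim=0b00000010))   # prints 6
```

With a single invalid window, at most NWINDOWS - 1 windows are usable at once; the example starts in one of them, so six further SAVEs fit.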

The code in Figure 9 and Figure 10 below shows bare-bones handlers for the two traps. The text is taken directly from source code, and sort of works. (As far as I know, these are minimalistic handlers for Sparc V8.) Note that there is no way to directly access window registers other than the current one, hence the code does additional save/restore instructions. It's pretty tricky to understand the code, but Figure 1 should be of help.

        /* a SAVE instruction caused a trap */
window_overflow:
        /* rotate WIM one bit right, we have 8 windows */
        mov %wim,%l3
        sll %l3,7,%l4
        srl %l3,1,%l3
        or  %l3,%l4,%l3
        and %l3,0xff,%l3

        /* disable WIM traps */
        mov %g0,%wim
        nop; nop; nop

        /* point to correct window */
        save

        /* dump registers to stack */
        std %l0, [%sp +  0]
        std %l2, [%sp +  8]
        std %l4, [%sp + 16]
        std %l6, [%sp + 24]
        std %i0, [%sp + 32]
        std %i2, [%sp + 40]
        std %i4, [%sp + 48]
        std %i6, [%sp + 56]

        /* back to where we should be */
        restore

        /* set new value of window */
        mov %l3,%wim
        nop; nop; nop

        /* go home */
        jmp %l1
        rett %l2
Figure 9 - window_overflow trap handler

        /* a RESTORE instruction caused a trap */
window_underflow:
        /* rotate WIM one bit LEFT, we have 8 windows */
        mov %wim,%l3
        srl %l3,7,%l4
        sll %l3,1,%l3
        or  %l3,%l4,%l3
        and %l3,0xff,%l3

        /* disable WIM traps */
        mov %g0,%wim
        nop; nop; nop

        /* point to correct window */
        restore
        restore

        /* reload registers from stack */
        ldd [%sp +  0], %l0
        ldd [%sp +  8], %l2
        ldd [%sp + 16], %l4
        ldd [%sp + 24], %l6
        ldd [%sp + 32], %i0
        ldd [%sp + 40], %i2
        ldd [%sp + 48], %i4
        ldd [%sp + 56], %i6

        /* back to where we should be */
        save
        save

        /* set new value of window */
        mov %l3,%wim
        nop; nop; nop

        /* go home */
        jmp %l1
        rett %l2
Figure 10 - window_underflow trap handler
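The first five instructions of each handler implement a rotate of the 8-bit WIM, since SPARC V8 has no rotate instruction. The same shift/or/mask sequence can be written out as a sketch (the function names are invented; the window count of 8 comes from the handlers' own comments):

```python
NWINDOWS = 8
MASK = (1 << NWINDOWS) - 1   # 0xff, as in `and %l3,0xff,%l3`

def rotate_right(wim):
    """Overflow handler: sll by 7, srl by 1, or, and 0xff."""
    return ((wim >> 1) | (wim << (NWINDOWS - 1))) & MASK

def rotate_left(wim):
    """Underflow handler: srl by 7, sll by 1, or, and 0xff."""
    return ((wim << 1) | (wim >> (NWINDOWS - 1))) & MASK

print(bin(rotate_right(0b00000001)))   # prints 0b10000000
print(bin(rotate_left(0b10000000)))    # prints 0b1
```

The two rotations are inverses, so an overflow followed by the matching underflow leaves the WIM where it started.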


Note: some of the figures and data are (c) copyright Sun Microsystems. I can't imagine they would object to my usage of the material, but if you make copies you are hereby advised.

Created and maintained by Peter Magnusson.
Created in March 1997, last revision in April 1997.


