LLVM Study Notes (7)

Source: Internet  Published: 网络时代知乎  Editor: 程序博客网  Time: 2024/06/05 07:08

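2.2.6.  Scheduling information

In the Instruction definition, Itinerary (line 430) and SchedRW (line 433) describe how an instruction is scheduled.

Itinerary describes an instruction in terms of the steps it goes through during execution. A target derives a definition from InstrItinClass for each of its instructions. For a processor like X86, with complex instructions and many generations, a large number of InstrItinClass definitions are needed. They all live in X86Schedule.td, with roughly one InstrItinClass definition per instruction (or class of instructions). Note that these definitions are really meant for in-order pipelines such as Atom, so instructions that such machines do not support (the AVX instruction set, for example) need no InstrItinClass. As an example, the InstrItinClass definitions for division look like this: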

// div
def IIC_DIV8_MEM   : InstrItinClass;
def IIC_DIV8_REG   : InstrItinClass;
def IIC_DIV16      : InstrItinClass;
def IIC_DIV32      : InstrItinClass;
def IIC_DIV64      : InstrItinClass;

A single pipeline step in the execution of an instruction is described by InstrStage:

class InstrStage<int cycles, list<FuncUnit> units,
                 int timeinc = -1,
                 ReservationKind kind = Required> {
  int Cycles           = cycles;      // length of stage in machine cycles
  list<FuncUnit> Units = units;       // choice of functional units
  int TimeInc          = timeinc;     // cycles till start of next stage
  int Kind             = kind.Value;  // kind of FU reservation
}

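Cycles is the number of cycles needed to complete this step (stage). Units lists the functional units that can be chosen to carry out the stage, for example IntUnit1 and IntUnit2. TimeInc is the number of cycles from the start of this stage to the start of the next stage in the itinerary. A stage can be specified in either of two ways:

InstrStage<1, [FU_x, FU_y]>     - TimeInc defaults to Cycles

InstrStage<1, [FU_x, FU_y], 0>  - TimeInc is given explicitly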
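To make the role of TimeInc concrete, here is a minimal Python sketch (not LLVM code) that computes the cycle at which each stage of an itinerary starts. The Stage class, the stage_start_cycles helper and the FU_z unit are hypothetical names made up for this illustration; the only rule taken from the text is that a TimeInc of -1 falls back to Cycles.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    cycles: int          # length of the stage in machine cycles
    units: List[str]     # candidate functional units for this stage
    time_inc: int = -1   # cycles until the next stage starts; -1 means "use cycles"


def stage_start_cycles(stages: List[Stage]) -> List[int]:
    """Return the cycle, relative to issue, at which each stage begins."""
    starts = []
    current = 0
    for stage in stages:
        starts.append(current)
        # A negative TimeInc falls back to Cycles, like the InstrStage default.
        step = stage.cycles if stage.time_inc < 0 else stage.time_inc
        current += step
    return starts


# First form: the next stage starts once this one has finished.
sequential = [Stage(1, ["FU_x", "FU_y"]), Stage(2, ["FU_z"])]
# Second form: TimeInc = 0, so the next stage starts in the same cycle.
overlapped = [Stage(1, ["FU_x", "FU_y"], time_inc=0), Stage(2, ["FU_z"])]

print(stage_start_cycles(sequential))  # [0, 1]
print(stage_start_cycles(overlapped))  # [0, 0]
```

With the default, stages run back to back; an explicit TimeInc of 0 lets two stages occupy their units in the same cycle.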

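How an instruction (or class of instructions) executes in an (in-order) pipeline is specified by an InstrItinData definition, which binds an InstrItinClass to its stages.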

class InstrItinData<InstrItinClass Class, list<InstrStage> stages,
                    list<int> operandcycles = [],
                    list<Bypass> bypasses = [], int uops = 1> {
  InstrItinClass TheClass = Class;
  int NumMicroOps = uops;
  list<InstrStage> Stages = stages;
  list<int> OperandCycles = operandcycles;
  list<Bypass> Bypasses = bypasses;
}

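NumMicroOps is the number of micro-operations that an instruction of this class decodes into. A value of 0 means the instruction decodes into a variable number of micro-operations that must be determined dynamically. This relates directly to the itinerary's global IssueWidth property, which limits how many micro-operations can be issued per cycle.

OperandCycles is an optional list of “cycle counts”. Each entry gives the number of cycles after issue at which the operand with that index is written or read.

Bypasses is an optional list of “pipeline forwarding paths” (the processor hands the result of a writing instruction directly to a later reading instruction, without the round trip through the register file). If the value produced by one instruction is available on a particular bypass and another instruction can read the value from that same bypass, the operand use latency is reduced by 1 cycle.

So in this example: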

InstrItinData<IIC_iLoad_i, [InstrStage<1, [A9_Pipe1]>,
                            InstrStage<1, [A9_AGU]>],
              [3, 1], [A9_LdBypass]>,
InstrItinData<IIC_iMVNr, [InstrStage<1, [A9_Pipe0, A9_Pipe1]>],
              [1, 1], [NoBypass, A9_LdBypass]>,

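an instruction of class IIC_iLoad_i reads its input on cycle 1 after issue, and the result of the load is available on cycle 3. The result can be obtained through the forwarding path A9_LdBypass. If the first source operand of an instruction of class IIC_iMVNr uses it, the operand latency is reduced by 1.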
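The arithmetic behind this example can be sketched in a few lines of Python. This is a simplified model, not LLVM's implementation: the Itinerary class and operand_latency function are hypothetical, and the base formula (def cycle minus use cycle plus 1) is an assumption of this sketch. The text itself only states that a shared bypass reduces the latency by 1.

```python
from typing import List

NO_BYPASS = "NoBypass"


class Itinerary:
    """Per-class operand cycles and bypasses, indexed by operand number."""

    def __init__(self, operand_cycles: List[int], bypasses: List[str]):
        self.operand_cycles = operand_cycles
        self.bypasses = bypasses

    def bypass(self, idx: int) -> str:
        # Operands without an explicit entry are treated as having no bypass.
        return self.bypasses[idx] if idx < len(self.bypasses) else NO_BYPASS


def operand_latency(def_itin: Itinerary, def_idx: int,
                    use_itin: Itinerary, use_idx: int) -> int:
    # Assumed base latency: cycle the value is written, minus the cycle it
    # is read, plus one.
    latency = def_itin.operand_cycles[def_idx] - use_itin.operand_cycles[use_idx] + 1
    shared = def_itin.bypass(def_idx)
    # A forwarding path helps only if both sides name the same bypass.
    if latency > 0 and shared != NO_BYPASS and shared == use_itin.bypass(use_idx):
        latency -= 1
    return latency


iload_i = Itinerary([3, 1], ["A9_LdBypass"])
imvnr = Itinerary([1, 1], [NO_BYPASS, "A9_LdBypass"])
# Hypothetical variant of IIC_iMVNr with no forwarding path, for comparison.
imvnr_no_bypass = Itinerary([1, 1], [])

print(operand_latency(iload_i, 0, imvnr, 1))            # 2
print(operand_latency(iload_i, 0, imvnr_no_bypass, 1))  # 3
```

Operand 0 of the load is its result and operand 1 of the MVN is its first source, which is why those two indices are compared.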

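For processors capable of out-of-order execution (instructions may execute in an order different from program order, as soon as their operands and the resources they need are available), such as Sandy Bridge and later architectures, this kind of description makes it hard to exploit the instruction-level parallelism the processor offers. That is where SchedRW, on line 433 of the Instruction definition, comes in. It is a list whose entries correspond to the instruction's output and input operands and describe how processor resources are occupied while those operands are handled. When resource usage does not conflict and there are no dependencies, multiple instructions can then execute concurrently. We will discuss the details in a later chapter.

Note that many instructions describe their scheduling with both Itinerary and SchedRW. Which one actually takes effect depends on the target processor. As we will see below, a series of InstrItinData definitions is given for the Atom processor, where Itinerary drives scheduling, while for Sandy Bridge a series of SchedReadWrite definitions is given and instructions are scheduled through SchedRW.

2.2.7.  Definition of X86 instructions

The Instruction definition is target independent, so a target almost always derives the instruction definitions it needs from Instruction. Take X86: it is a large and complicated processor family, so it defines its own base class X86Inst, which extends Instruction considerably (X86InstrFormats.td):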

220  class X86Inst<bits<8> opcod, Format f, ImmType i, dag outs, dag ins,
221                string AsmStr,
222                InstrItinClass itin,
223                Domain d = GenericDomain>
224    : Instruction {
225    let Namespace = "X86";
226
227    bits<8> Opcode = opcod;
228    Format Form = f;
229    bits<7> FormBits = Form.Value;
230    ImmType ImmT = i;
231
232    dag OutOperandList = outs;
233    dag InOperandList = ins;
234    string AsmString = AsmStr;
235
236    // If this is a pseudo instruction, mark it isCodeGenOnly.
237    let isCodeGenOnly = !eq(!cast<string>(f), "Pseudo");
238
239    let Itinerary = itin;
240
241    //
242    // Attributes specific to X86 instructions...
243    //
244    bit ForceDisassemble = 0; // Force instruction to disassemble even though it's
245                              // isCodeGenonly. Needed to hide an ambiguous
246                              // AsmString from the parser, but still disassemble.
247
248    OperandSize OpSize = OpSizeFixed; // Does this instruction's encoding change
249                                      // based on operand size of the mode?
250    bits<2> OpSizeBits = OpSize.Value;
251    AddressSize AdSize = AdSizeX; // Does this instruction's encoding change
252                                  // based on address size of the mode?
253    bits<2> AdSizeBits = AdSize.Value;
254
255    Prefix OpPrefix = NoPrfx; // Which prefix byte does this inst have?
256    bits<3> OpPrefixBits = OpPrefix.Value;
257    Map OpMap = OB;           // Which opcode map does this inst have?
258    bits<3> OpMapBits = OpMap.Value;
259    bit hasREX_WPrefix  = 0;  // Does this inst require the REX.W prefix?
260    FPFormat FPForm = NotFP;  // What flavor of FP instruction is this?
261    bit hasLockPrefix = 0;    // Does this inst have a 0xF0 prefix?
262    Domain ExeDomain = d;
263    bit hasREPPrefix = 0;     // Does this inst have a REP prefix?
264    Encoding OpEnc = EncNormal; // Encoding used by this instruction
265    bits<2> OpEncBits = OpEnc.Value;
266    bit hasVEX_WPrefix = 0;   // Does this inst set the VEX_W field?
267    bit hasVEX_4V = 0;        // Does this inst require the VEX.VVVV field?
268    bit hasVEX_4VOp3 = 0;     // Does this inst require the VEX.VVVV field to
269                              // encode the third operand?
270    bit hasVEX_i8ImmReg = 0;  // Does this inst require the last source register
271                              // to be encoded in a immediate field?
272    bit hasVEX_L = 0;         // Does this inst use large (256-bit) registers?
273    bit ignoresVEX_L = 0;     // Does this instruction ignore the L-bit
274    bit hasEVEX_K = 0;        // Does this inst require masking?
275    bit hasEVEX_Z = 0;        // Does this inst set the EVEX_Z field?
276    bit hasEVEX_L2 = 0;       // Does this inst set the EVEX_L2 field?
277    bit hasEVEX_B = 0;        // Does this inst set the EVEX_B field?
278    bits<3> CD8_Form = 0;     // Compressed disp8 form - vector-width.
279    // Declare it int rather than bits<4> so that all bits are defined when
280    // assigning to bits<7>.
281    int CD8_EltSize = 0;      // Compressed disp8 form - element-size in bytes.
282    bit has3DNow0F0FOpcode = 0; // Wacky 3dNow! encoding?
283    bit hasMemOp4Prefix = 0;  // Same bit as VEX_W, but used for swapping operands
284    bit hasEVEX_RC = 0;       // Explicitly specified rounding control in FP instruction.
285
286    bits<2> EVEX_LL;
287    let EVEX_LL{0} = hasVEX_L;
288    let EVEX_LL{1} = hasEVEX_L2;
289    // Vector size in bytes.
290    bits<7> VectSize = !shl(16, EVEX_LL);
291
292    // The scaling factor for AVX512's compressed displacement is either
293    //   - the size of a power-of-two number of elements or
294    //   - the size of a single element for broadcasts or
295    //   - the total vector size divided by a power-of-two number.
296    // Possible values are: 0 (non-AVX512 inst), 1, 2, 4, 8, 16, 32 and 64.
297    bits<7> CD8_Scale = !if (!eq (OpEnc.Value, EncEVEX.Value),
298                             !if (CD8_Form{2},
299                                  !shl(CD8_EltSize, CD8_Form{1-0}),
300                                  !if (hasEVEX_B,
301                                       CD8_EltSize,
302                                       !srl(VectSize, CD8_Form{1-0}))), 0);
303
304    // TSFlags layout should be kept in sync with X86BaseInfo.h.
305    let TSFlags{6-0}   = FormBits;
306    let TSFlags{8-7}   = OpSizeBits;
307    let TSFlags{10-9}  = AdSizeBits;
308    let TSFlags{13-11} = OpPrefixBits;
309    let TSFlags{16-14} = OpMapBits;
310    let TSFlags{17}    = hasREX_WPrefix;
311    let TSFlags{21-18} = ImmT.Value;
312    let TSFlags{24-22} = FPForm.Value;
313    let TSFlags{25}    = hasLockPrefix;
314    let TSFlags{26}    = hasREPPrefix;
315    let TSFlags{28-27} = ExeDomain.Value;
316    let TSFlags{30-29} = OpEncBits;
317    let TSFlags{38-31} = Opcode;
318    let TSFlags{39}    = hasVEX_WPrefix;
319    let TSFlags{40}    = hasVEX_4V;
320    let TSFlags{41}    = hasVEX_4VOp3;
321    let TSFlags{42}    = hasVEX_i8ImmReg;
322    let TSFlags{43}    = hasVEX_L;
323    let TSFlags{44}    = ignoresVEX_L;
324    let TSFlags{45}    = hasEVEX_K;
325    let TSFlags{46}    = hasEVEX_Z;
326    let TSFlags{47}    = hasEVEX_L2;
327    let TSFlags{48}    = hasEVEX_B;
328    // If we run out of TSFlags bits, it's possible to encode this in 3 bits.
329    let TSFlags{55-49} = CD8_Scale;
330    let TSFlags{56}    = has3DNow0F0FOpcode;
331    let TSFlags{57}    = hasMemOp4Prefix;
332    let TSFlags{58}    = hasEVEX_RC;
333  }
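The nested !if / !shl / !srl expression that computes CD8_Scale (lines 290 and 297 to 302 of the listing) is easier to follow when written out as ordinary control flow. The following Python sketch mirrors that expression; the function names vect_size and cd8_scale and their parameter names are made up for this illustration, and the 7-bit mask stands in for the bits<7> field width.

```python
def vect_size(has_vex_l: int, has_evex_l2: int) -> int:
    """Vector size in bytes: 16 shifted left by the 2-bit EVEX_LL value."""
    evex_ll = (has_evex_l2 << 1) | has_vex_l   # EVEX_LL{1} = L2, EVEX_LL{0} = L
    return (16 << evex_ll) & 0x7F              # stored in a 7-bit field


def cd8_scale(is_evex: bool, cd8_form: int, cd8_elt_size: int,
              has_evex_b: int, has_vex_l: int = 0, has_evex_l2: int = 0) -> int:
    """Scaling factor for the AVX512 compressed displacement."""
    if not is_evex:
        return 0                               # non-AVX512 instruction
    low = cd8_form & 0b11                      # CD8_Form{1-0}
    if cd8_form & 0b100:                       # CD8_Form{2}
        # Size of a power-of-two number of elements.
        return (cd8_elt_size << low) & 0x7F
    if has_evex_b:
        # Broadcast: size of a single element.
        return cd8_elt_size
    # Total vector size divided by a power of two.
    return vect_size(has_vex_l, has_evex_l2) >> low


print(cd8_scale(False, 0, 0, 0))               # 0
print(cd8_scale(True, 0, 4, 0, has_evex_l2=1)) # 64
print(cd8_scale(True, 0, 4, 1, has_evex_l2=1)) # 4
print(cd8_scale(True, 0b101, 4, 0))            # 8
```

For a non-EVEX instruction such as IMUL16rri the function returns 0, and vect_size(0, 0) gives 16.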

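Form, on line 228, specifies the instruction format; the related definitions are in X86InstrFormats.td. They describe things such as the “mod-reg-r/m” byte of an X86 instruction. Everything from line 248 onward also has to do with instruction encoding. Encoding and format do not affect instruction selection or the scheduling and allocation that follow; they matter for emitting machine code and for generating the disassembler. So we skip them here.

From X86Inst, LLVM derives a number of further classes. They differ mainly in ImmT, on line 230 of the X86Inst definition, that is, in whether there is an immediate and how large it is. These classes are the basis for defining the actual instructions. Let us look at two examples (X86InstrFormats.td):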

class I<bits<8> o, Format f, dag outs, dag ins, string asm,
        list<dag> pattern, InstrItinClass itin = NoItinerary,
        Domain d = GenericDomain>
  : X86Inst<o, f, NoImm, outs, ins, asm, itin, d> {
  let Pattern = pattern;
  let CodeSize = 3;
}
class Ii8 <bits<8> o, Format f, dag outs, dag ins, string asm,
           list<dag> pattern, InstrItinClass itin = NoItinerary,
           Domain d = GenericDomain>
  : X86Inst<o, f, Imm8, outs, ins, asm, itin, d> {
  let Pattern = pattern;
  let CodeSize = 3;
}

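“I” carries no immediate, while “Ii8” carries an 8-bit immediate. The itin parameter relates to instruction scheduling: it describes the route an instruction takes through the CPU, and in these base classes it defaults to NoItinerary, that is, no itinerary. Examples of defs derived from these two classes are (X86InstrArithmetic.td):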

let hasSideEffects = 1 in { // so that we don't speculatively execute
let SchedRW = [WriteIDiv] in {
let Defs = [AL, AH, EFLAGS], Uses = [AX] in
def DIV8r  : I<0xF6, MRM6r, (outs),  (ins GR8:$src),   // AX/r8 = AL,AH
               "div{b}\t$src", [], IIC_DIV8_REG>;
let Defs = [AX, DX, EFLAGS], Uses = [AX, DX] in
def DIV16r : I<0xF7, MRM6r, (outs),  (ins GR16:$src),  // DX:AX/r16 = AX,DX
               "div{w}\t$src", [], IIC_DIV16>, OpSize16;
let Defs = [EAX, EDX, EFLAGS], Uses = [EAX, EDX] in
def DIV32r : I<0xF7, MRM6r, (outs),  (ins GR32:$src),  // EDX:EAX/r32 = EAX,EDX
               "div{l}\t$src", [], IIC_DIV32>, OpSize32;
// RDX:RAX/r64 = RAX,RDX
let Defs = [RAX, RDX, EFLAGS], Uses = [RAX, RDX] in
def DIV64r : RI<0xF7, MRM6r, (outs), (ins GR64:$src),
                "div{q}\t$src", [], IIC_DIV64>;
} // SchedRW

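The comments explain what these instructions do. The IIC_XXX names are the specific instruction itineraries; we leave them for the discussion of instruction scheduling. MRM6r is the instruction format, which is needed for encoding and by the disassembler. Defs and Uses describe register usage beyond the explicit operands: Defs lists the registers whose contents are modified, and Uses lists the registers whose contents are read.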

let Defs = [EFLAGS] in {
let SchedRW = [WriteIMul] in {
// Register-Integer Signed Integer Multiply
def IMUL16rri  : Ii16<0x69, MRMSrcReg,                      // GR16 = GR16*I16
                      (outs GR16:$dst), (ins GR16:$src1, i16imm:$src2),
                      "imul{w}\t{$src2, $src1, $dst|$dst, $src1, $src2}",
                      [(set GR16:$dst, EFLAGS,
                            (X86smul_flag GR16:$src1, imm:$src2))],
                            IIC_IMUL16_RRI>, OpSize16;
def IMUL16rri8 : Ii8<0x6B, MRMSrcReg,                       // GR16 = GR16*I8
                     (outs GR16:$dst), (ins GR16:$src1, i16i8imm:$src2),
                     "imul{w}\t{$src2, $src1, $dst|$dst, $src1, $src2}",
                     [(set GR16:$dst, EFLAGS,
                           (X86smul_flag GR16:$src1, i16immSExt8:$src2))],
                           IIC_IMUL16_RRI>, OpSize16;

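These two defs each have two base classes: besides Ii8 or Ii16, there is also OpSize16. X86InstrFormats.td contains two things named OpSize16, one a class and the other a def. The class is the one used here; a def cannot serve as a base class (much like a class marked final). OpSize16 says that the instruction uses 16-bit operands, so in 32-bit mode, where the default operand size is 32 bits, it needs the 0x66 prefix (the operand-size override prefix). Like the rest of the encoding information, this is used for encoding and by the disassembler.

Note that none of the DIVXr definitions specifies a Pattern, which means DIVXr is never matched by a pattern. How is that possible? The secret is in X86DAGToDAGISel::Select. This method handles the particular kinds of nodes for which the generic instruction selection does not work on X86. For example, it selects DIVXr for ISD::UDIVREM (and the signed counterpart IDIVXr for ISD::SDIVREM). ISD::UDIVREM in turn is produced during legalization from an SDNode of type ISD::UDIV, and that ISD::UDIV node is generated from LLVM IR by the visitUDiv method. It is a long journey, and we will spend quite some space on it.

The match pattern in the IMUL16rri definition reads as follows: because the operator of the pattern is set, the innermost dag value is interpreted as the source pattern (the one to be matched), and (IMUL16rri GR16:$src1, imm:$src2) is the target pattern (the target-machine DAG to produce once the match succeeds; TableGen generates it when it processes this definition).

Also, X86smul_flag is specific to the X86 target, so this pattern does not fit the SDNode objects generated directly from LLVM IR. For that reason X86InstrCompiler.td contains an anonymous Pat definition like this:

def : Pat<(mul GR16:$src1, imm:$src2), (IMUL16rri GR16:$src1, imm:$src2)>;

This definition matches the generic DAG form generated from LLVM IR directly to the target pattern of the IMULX definitions.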
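The idea of a source pattern that binds named operands and a target pattern that reuses them can be illustrated with a toy matcher in Python. This is only a sketch of the concept: the tuple encoding of DAG nodes and the match, rewrite and select functions are invented for this example, and the instruction selector that TableGen actually generates works quite differently (it is table driven).

```python
def match(pattern, node, bindings):
    """Match a source pattern against a DAG node, recording $name bindings."""
    if isinstance(pattern, str):
        # Leaf pattern such as "GR16:$src1": a kind plus a name to bind.
        kind, name = pattern.split(":")
        if node[0] != kind:
            return False
        bindings[name] = node
        return True
    # Interior pattern: (operator, child_pattern, ...).
    if node[0] != pattern[0] or len(node) != len(pattern):
        return False
    return all(match(p, n, bindings) for p, n in zip(pattern[1:], node[1:]))


def rewrite(target, bindings):
    """Build the target DAG by substituting the bound operands."""
    return (target[0],) + tuple(bindings[name] for name in target[1:])


def select(node, rules):
    """Return the rewritten node for the first rule that matches, else None."""
    for source, target in rules:
        bindings = {}
        if match(source, node, bindings):
            return rewrite(target, bindings)
    return None


# Toy counterpart of: def : Pat<(mul GR16:$src1, imm:$src2),
#                               (IMUL16rri GR16:$src1, imm:$src2)>;
rules = [(("mul", "GR16:$src1", "imm:$src2"), ("IMUL16rri", "$src1", "$src2"))]

generic = ("mul", ("GR16", "vreg1"), ("imm", 10))
print(select(generic, rules))
# ('IMUL16rri', ('GR16', 'vreg1'), ('imm', 10))
```

A node that does not fit any rule, for example a multiply of two registers, simply yields None in this sketch.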

2.2.8.  An example of instruction expansion

After expansion, IMUL16rri looks like this:

def IMUL16rri {    // Instruction X86Inst Ii16 OpSize16
  Domain X86Inst:d = GenericDomain;
  string Namespace = "X86";
  dag OutOperandList = (outs GR16:$dst);
  dag InOperandList = (ins GR16:$src1, i16imm:$src2);
  string AsmString = "imul{w}\t{$src2, $src1, $dst|$dst, $src1, $src2}";
  list<dag> Pattern = [(set GR16:$dst, EFLAGS, (X86smul_flag GR16:$src1, imm:$src2))];
  list<Register> Uses = [];
  list<Register> Defs = [EFLAGS];
  list<Predicate> Predicates = [];
  int Size = 0;
  string DecoderNamespace = "";
  int CodeSize = 3;
  int AddedComplexity = 0;
  bit isReturn = 0;
  bit isBranch = 0;
  bit isIndirectBranch = 0;
  bit isCompare = 0;
  bit isMoveImm = 0;
  bit isBitcast = 0;
  bit isSelect = 0;
  bit isBarrier = 0;
  bit isCall = 0;
  bit canFoldAsLoad = 0;
  bit mayLoad = ?;
  bit mayStore = ?;
  bit isConvertibleToThreeAddress = 0;
  bit isCommutable = 0;
  bit isTerminator = 0;
  bit isReMaterializable = 0;
  bit isPredicable = 0;
  bit hasDelaySlot = 0;
  bit usesCustomInserter = 0;
  bit hasPostISelHook = 0;
  bit hasCtrlDep = 0;
  bit isNotDuplicable = 0;
  bit isConvergent = 0;
  bit isAsCheapAsAMove = 0;
  bit hasExtraSrcRegAllocReq = 0;
  bit hasExtraDefRegAllocReq = 0;
  bit isRegSequence = 0;
  bit isPseudo = 0;
  bit isExtractSubreg = 0;
  bit isInsertSubreg = 0;
  bit hasSideEffects = ?;
  bit isCodeGenOnly = 0;
  bit isAsmParserOnly = 0;
  InstrItinClass Itinerary = IIC_IMUL16_RRI;
  list<SchedReadWrite> SchedRW = [WriteIMul];
  string Constraints = "";
  string DisableEncoding = "";
  string PostEncoderMethod = "";
  string DecoderMethod = "";
  bits<64> TSFlags = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1 };
  string AsmMatchConverter = "";
  string TwoOperandAliasConstraint = "";
  bit UseNamedOperandTable = 0;
  bits<8> Opcode = { 0, 1, 1, 0, 1, 0, 0, 1 };
  Format Form = MRMSrcReg;
  bits<7> FormBits = { 0, 0, 0, 0, 1, 0, 1 };
  ImmType ImmT = Imm16;
  bit ForceDisassemble = 0;
  OperandSize OpSize = OpSize16;
  bits<2> OpSizeBits = { 0, 1 };
  AddressSize AdSize = AdSizeX;
  bits<2> AdSizeBits = { 0, 0 };
  Prefix OpPrefix = NoPrfx;
  bits<3> OpPrefixBits = { 0, 0, 0 };
  Map OpMap = OB;
  bits<3> OpMapBits = { 0, 0, 0 };
  bit hasREX_WPrefix = 0;
  FPFormat FPForm = NotFP;
  bit hasLockPrefix = 0;
  Domain ExeDomain = GenericDomain;
  bit hasREPPrefix = 0;
  Encoding OpEnc = EncNormal;
  bits<2> OpEncBits = { 0, 0 };
  bit hasVEX_WPrefix = 0;
  bit hasVEX_4V = 0;
  bit hasVEX_4VOp3 = 0;
  bit hasVEX_i8ImmReg = 0;
  bit hasVEX_L = 0;
  bit ignoresVEX_L = 0;
  bit hasEVEX_K = 0;
  bit hasEVEX_Z = 0;
  bit hasEVEX_L2 = 0;
  bit hasEVEX_B = 0;
  bits<3> CD8_Form = { 0, 0, 0 };
  int CD8_EltSize = 0;
  bit has3DNow0F0FOpcode = 0;
  bit hasMemOp4Prefix = 0;
  bit hasEVEX_RC = 0;
  bits<2> EVEX_LL = { 0, 0 };
  bits<7> VectSize = { 0, 0, 1, 0, 0, 0, 0 };
  bits<7> CD8_Scale = { 0, 0, 0, 0, 0, 0, 0 };
  string NAME = ?;
}

Many related definitions are folded into this definition, including the inlining of PatFrag, which is described below.
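As a cross-check of the TSFlags value in the expansion, the short Python sketch below packs the non-zero fields of IMUL16rri according to the layout in X86Inst and compares the result with the 64 bits printed in the dump (most significant bit first). The FIELDS table and the pack function are hypothetical helpers for this illustration. The numeric value 3 for Imm16 is not stated anywhere in the text; it is inferred here from bits 19 and 18 of the dump and should be treated as an assumption.

```python
# name: (lowest bit position, width), taken from the "let TSFlags{...}" lines.
FIELDS = {
    "FormBits":   (0, 7),
    "OpSizeBits": (7, 2),
    "ImmT":       (18, 4),
    "Opcode":     (31, 8),
}


def pack(values):
    """OR each field value into its slot of the 64-bit TSFlags word."""
    flags = 0
    for name, value in values.items():
        low, width = FIELDS[name]
        assert 0 <= value < (1 << width), name
        flags |= value << low
    return flags


imul16rri = {
    "FormBits":   0b0000101,  # MRMSrcReg, from FormBits in the dump
    "OpSizeBits": 0b01,       # OpSize16, from OpSizeBits in the dump
    "ImmT":       3,          # Imm16; value inferred from the dump (assumption)
    "Opcode":     0x69,
}
flags = pack(imul16rri)

# The TSFlags bits exactly as printed in the dump, bit 63 first.
dump_bits = ([0] * 26 + [1, 1, 0, 1, 0, 0, 1] + [0] * 11 + [1, 1]
             + [0] * 10 + [1] + [0] * 4 + [1, 0, 1])
from_dump = int("".join(str(b) for b in dump_bits), 2)

print(hex(flags))          # 0x34800c0085
print(flags == from_dump)  # True
```

All other fields of this instruction are zero, so they contribute nothing to the word.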
