LLVM Study Notes (7)

Source: Internet. Published by: 网络时代知乎. Editor: 程序博客网. Time: 2024/06/05 07:08

2.2.6. Scheduling Information

In the Instruction definition, Itinerary (line 430) and SchedRW (line 433) describe an instruction's scheduling information.

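Itinerary describes an instruction in terms of its execution steps. A target derives the definitions for its instructions from InstrItinClass. For a processor like X86, with complex instructions and many generations, the number of InstrItinClass definitions is large. They all live in X86Schedule.td, and almost every instruction (or class of instructions) has its own InstrItinClass definition. Note that these definitions are actually used by in-order pipeline machines such as Atom, so instructions that those machines do not support (the AVX instruction set, for example) need no InstrItinClass. For example, the InstrItinClass definitions for division look like this: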

// div
def IIC_DIV8_MEM : InstrItinClass;
def IIC_DIV8_REG : InstrItinClass;
def IIC_DIV16 : InstrItinClass;
def IIC_DIV32 : InstrItinClass;
def IIC_DIV64 : InstrItinClass;

A single pipeline step in an instruction's execution is described by InstrStage:

class InstrStage<int cycles, list<FuncUnit> units,
                 int timeinc = -1,
                 ReservationKind kind = Required> {
  int Cycles = cycles;          // length of stage in machine cycles
  list<FuncUnit> Units = units; // choice of functional units
  int TimeInc = timeinc;        // cycles till start of next stage
  int Kind = kind.Value;        // kind of FU reservation
}

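Cycles is the number of cycles needed to complete this step (stage). Units lists the functional units that may be chosen to carry out the stage, for example IntUnit1 and IntUnit2. TimeInc is the number of cycles from the start of this stage to the start of the next stage in the itinerary. A stage can be specified in either of two ways:

InstrStage<1, [FU_x, FU_y]> - TimeInc defaults to Cycles

InstrStage<1, [FU_x, FU_y], 0> - TimeInc specified explicitly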
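
To make the TimeInc rule concrete, here is a minimal Python sketch that computes when each stage of an itinerary starts. Stage and stage_start_cycles are hypothetical names chosen for illustration, not LLVM APIs. The only rule taken from the text is that a negative TimeInc falls back to Cycles.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """Hypothetical stand-in for one InstrStage record."""
    cycles: int          # length of the stage in machine cycles
    units: List[str]     # functional units that may execute the stage
    time_inc: int = -1   # -1 means "use cycles", mirroring the TableGen default


def stage_start_cycles(stages: List[Stage]) -> List[int]:
    """Return the cycle, relative to issue, at which each stage starts."""
    starts = []
    cycle = 0
    for stage in stages:
        starts.append(cycle)
        # A negative TimeInc falls back to the stage length.
        inc = stage.time_inc if stage.time_inc >= 0 else stage.cycles
        cycle += inc
    return starts


default_form = [Stage(1, ["FU_x", "FU_y"]), Stage(1, ["FU_z"])]
explicit_zero = [Stage(1, ["FU_x", "FU_y"], 0), Stage(1, ["FU_z"])]
print(stage_start_cycles(default_form))   # second stage starts one cycle later
print(stage_start_cycles(explicit_zero))  # both stages start in the same cycle
```

With the default form the next stage is pushed back by the length of the current one, while an explicit TimeInc of 0 lets the next stage start in the same cycle.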

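How an instruction (or class of instructions) executes in the (in-order) pipeline is specified by InstrItinData definitions, which bind an InstrItinClass to its stages.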

class InstrItinData<InstrItinClass Class, list<InstrStage> stages,
                    list<int> operandcycles = [],
                    list<Bypass> bypasses = [], int uops = 1> {
  InstrItinClass TheClass = Class;
  int NumMicroOps = uops;
  list<InstrStage> Stages = stages;
  list<int> OperandCycles = operandcycles;
  list<Bypass> Bypasses = bypasses;
}

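NumMicroOps is the number of micro-operations that an instruction of this class decodes into. A value of 0 means the instruction may decode into a variable number of micro-operations that has to be determined dynamically. This interacts directly with the itinerary's global IssueWidth property, which limits the number of micro-operations that can be issued per cycle.

OperandCycles is an optional list of cycle counts. Each entry gives the number of cycles after issue at which the corresponding operand is read or finishes being written.

Bypasses is an optional list of pipeline forwarding paths (the processor hands the result of a writing instruction directly to a later reading instruction, skipping the round trip through the register file). If one instruction's value is available on a particular bypass and another instruction can read the value from that bypass, the operand's use latency is reduced by 1 cycle.

So in this example: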

InstrItinData<IIC_iLoad_i, [InstrStage<1, [A9_Pipe1]>,
                            InstrStage<1, [A9_AGU]>],
              [3, 1], [A9_LdBypass]>,
InstrItinData<IIC_iMVNr, [InstrStage<1, [A9_Pipe0, A9_Pipe1]>],
              [1, 1], [NoBypass, A9_LdBypass]>,

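an instruction of class IIC_iLoad_i reads its input in cycle 1 after issue, and the loaded result is available in cycle 3. The result can be obtained through the forwarding path A9_LdBypass. If the first source operand of an IIC_iMVNr instruction uses it, the operand latency is reduced by 1.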
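
The effect of the bypass on these two entries can be worked through with a small Python sketch. The latency formula (def cycle minus use cycle plus one, less one cycle when both sides name the same bypass) follows my reading of InstrItineraryData::getOperandLatency in LLVM and should be treated as an assumption. The table and function names are hypothetical. Operand index 0 is the result and index 1 is the first source operand.

```python
NO_BYPASS = None  # stands in for TableGen's NoBypass

# Operand cycles and bypasses copied from the two InstrItinData entries above.
ITINERARIES = {
    "IIC_iLoad_i": {"cycles": [3, 1], "bypasses": ["A9_LdBypass"]},
    "IIC_iMVNr": {"cycles": [1, 1], "bypasses": [NO_BYPASS, "A9_LdBypass"]},
}


def bypass_of(itin_class, operand_idx):
    """Bypass attached to an operand; missing entries count as NoBypass."""
    bypasses = ITINERARIES[itin_class]["bypasses"]
    return bypasses[operand_idx] if operand_idx < len(bypasses) else NO_BYPASS


def operand_latency(def_class, def_idx, use_class, use_idx):
    """Cycles between issuing the producer and issuing the consumer."""
    def_cycle = ITINERARIES[def_class]["cycles"][def_idx]
    use_cycle = ITINERARIES[use_class]["cycles"][use_idx]
    latency = def_cycle - use_cycle + 1
    def_bypass = bypass_of(def_class, def_idx)
    forwarded = (def_bypass is not NO_BYPASS
                 and def_bypass == bypass_of(use_class, use_idx))
    if latency > 0 and forwarded:
        latency -= 1  # one cycle saved by pipeline forwarding
    return latency


# Load result (operand 0) feeding the first source operand (operand 1) of MVN.
print(operand_latency("IIC_iLoad_i", 0, "IIC_iMVNr", 1))
```

Here the unforwarded latency would be 3 - 1 + 1 = 3 cycles, and the shared A9_LdBypass brings it down to 2.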

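For processors capable of out-of-order execution (which can typically also execute several instructions per cycle), such as Sandy Bridge and its successors, this kind of description makes it hard to exploit the instruction-level parallelism the processor offers. This is where SchedRW, on line 433 of the Instruction definition, comes in. It is a list whose entries correspond to the instruction's output and input operands and describe how the instruction occupies processor resources while handling them. When resource usage does not conflict and there are no dependencies, several instructions can execute concurrently. We will discuss the details in later chapters.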

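Note that many instructions use both Itinerary and SchedRW to describe scheduling. Which of the two actually takes effect depends on the target machine. As we will see below, a series of InstrItinData definitions is provided for the Atom processor, so Itinerary drives scheduling on Atom. For Sandy Bridge a series of SchedReadWrite definitions is given instead, and instructions are scheduled through SchedRW.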

2.2.7. Definition of X86 Instructions

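The definition of Instruction is target independent, so a target almost always derives its own instruction definitions from it. Take X86: it has a large and complex family of chips, so it defines its own base class, X86Inst (X86InstrFormats.td), which extends Instruction considerably: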

220 class X86Inst<bits<8> opcod, Format f, ImmType i, dag outs, dag ins,
221               string AsmStr,
222               InstrItinClass itin,
223               Domain d = GenericDomain>
224   : Instruction {
225   let Namespace = "X86";
226
227   bits<8> Opcode = opcod;
228   Format Form = f;
229   bits<7> FormBits = Form.Value;
230   ImmType ImmT = i;
231
232   dag OutOperandList = outs;
233   dag InOperandList = ins;
234   string AsmString = AsmStr;
235
236   // If this is a pseudo instruction, mark it isCodeGenOnly.
237   let isCodeGenOnly = !eq(!cast<string>(f), "Pseudo");
238
239   let Itinerary = itin;
240
241   //
242   // Attributes specific to X86 instructions...
243   //
244   bit ForceDisassemble = 0; // Force instruction to disassemble even though it's
245                             // isCodeGenonly. Needed to hide an ambiguous
246                             // AsmString from the parser, but still disassemble.
247
248   OperandSize OpSize = OpSizeFixed; // Does this instruction's encoding change
249                                     // based on operand size of the mode?
250   bits<2> OpSizeBits = OpSize.Value;
251   AddressSize AdSize = AdSizeX; // Does this instruction's encoding change
252                                 // based on address size of the mode?
253   bits<2> AdSizeBits = AdSize.Value;
254
255   Prefix OpPrefix = NoPrfx; // Which prefix byte does this inst have?
256   bits<3> OpPrefixBits = OpPrefix.Value;
257   Map OpMap = OB;           // Which opcode map does this inst have?
258   bits<3> OpMapBits = OpMap.Value;
259   bit hasREX_WPrefix = 0;   // Does this inst require the REX.W prefix?
260   FPFormat FPForm = NotFP;  // What flavor of FP instruction is this?
261   bit hasLockPrefix = 0;    // Does this inst have a 0xF0 prefix?
262   Domain ExeDomain = d;
263   bit hasREPPrefix = 0;     // Does this inst have a REP prefix?
264   Encoding OpEnc = EncNormal; // Encoding used by this instruction
265   bits<2> OpEncBits = OpEnc.Value;
266   bit hasVEX_WPrefix = 0;   // Does this inst set the VEX_W field?
267   bit hasVEX_4V = 0;        // Does this inst require the VEX.VVVV field?
268   bit hasVEX_4VOp3 = 0;     // Does this inst require the VEX.VVVV field to
269                             // encode the third operand?
270   bit hasVEX_i8ImmReg = 0;  // Does this inst require the last source register
271                             // to be encoded in a immediate field?
272   bit hasVEX_L = 0;         // Does this inst use large (256-bit) registers?
273   bit ignoresVEX_L = 0;     // Does this instruction ignore the L-bit
274   bit hasEVEX_K = 0;        // Does this inst require masking?
275   bit hasEVEX_Z = 0;        // Does this inst set the EVEX_Z field?
276   bit hasEVEX_L2 = 0;       // Does this inst set the EVEX_L2 field?
277   bit hasEVEX_B = 0;        // Does this inst set the EVEX_B field?
278   bits<3> CD8_Form = 0;     // Compressed disp8 form - vector-width.
279   // Declare it int rather than bits<4> so that all bits are defined when
280   // assigning to bits<7>.
281   int CD8_EltSize = 0;      // Compressed disp8 form - element-size in bytes.
282   bit has3DNow0F0FOpcode = 0; // Wacky 3dNow! encoding?
283   bit hasMemOp4Prefix = 0;  // Same bit as VEX_W, but used for swapping operands
284   bit hasEVEX_RC = 0;       // Explicitly specified rounding control in FP instruction.
285
286   bits<2> EVEX_LL;
287   let EVEX_LL{0} = hasVEX_L;
288   let EVEX_LL{1} = hasEVEX_L2;
289   // Vector size in bytes.
290   bits<7> VectSize = !shl(16, EVEX_LL);
291
292   // The scaling factor for AVX512's compressed displacement is either
293   //   - the size of a power-of-two number of elements or
294   //   - the size of a single element for broadcasts or
295   //   - the total vector size divided by a power-of-two number.
296   // Possible values are: 0 (non-AVX512 inst), 1, 2, 4, 8, 16, 32 and 64.
297   bits<7> CD8_Scale = !if (!eq (OpEnc.Value, EncEVEX.Value),
298                            !if (CD8_Form{2},
299                                 !shl(CD8_EltSize, CD8_Form{1-0}),
300                                 !if (hasEVEX_B,
301                                      CD8_EltSize,
302                                      !srl(VectSize, CD8_Form{1-0}))), 0);
303
304   // TSFlags layout should be kept in sync with X86BaseInfo.h.
305   let TSFlags{6-0} = FormBits;
306   let TSFlags{8-7} = OpSizeBits;
307   let TSFlags{10-9} = AdSizeBits;
308   let TSFlags{13-11} = OpPrefixBits;
309   let TSFlags{16-14} = OpMapBits;
310   let TSFlags{17} = hasREX_WPrefix;
311   let TSFlags{21-18} = ImmT.Value;
312   let TSFlags{24-22} = FPForm.Value;
313   let TSFlags{25} = hasLockPrefix;
314   let TSFlags{26} = hasREPPrefix;
315   let TSFlags{28-27} = ExeDomain.Value;
316   let TSFlags{30-29} = OpEncBits;
317   let TSFlags{38-31} = Opcode;
318   let TSFlags{39} = hasVEX_WPrefix;
319   let TSFlags{40} = hasVEX_4V;
320   let TSFlags{41} = hasVEX_4VOp3;
321   let TSFlags{42} = hasVEX_i8ImmReg;
322   let TSFlags{43} = hasVEX_L;
323   let TSFlags{44} = ignoresVEX_L;
324   let TSFlags{45} = hasEVEX_K;
325   let TSFlags{46} = hasEVEX_Z;
326   let TSFlags{47} = hasEVEX_L2;
327   let TSFlags{48} = hasEVEX_B;
328   // If we run out of TSFlags bits, it's possible to encode this in 3 bits.
329   let TSFlags{55-49} = CD8_Scale;
330   let TSFlags{56} = has3DNow0F0FOpcode;
331   let TSFlags{57} = hasMemOp4Prefix;
332   let TSFlags{58} = hasEVEX_RC;
333 }

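The Form field on line 228 specifies the instruction's format; the related definitions are in X86InstrFormats.td. They describe things such as the "mod-reg-r/m" byte of an X86 instruction. Everything from line 248 onward also concerns instruction encoding. Encoding and format play no part in instruction selection; they matter for emitting machine code and for generating the disassembler. We therefore skip them here.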
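
Although the encoding fields are skipped here, the nested !if that computes CD8_Scale on lines 297 to 302 is easier to follow when written as ordinary control flow. The following Python sketch is a direct transliteration under the assumption that !shl and !srl behave like plain integer shifts. The function name and the sample inputs are hypothetical.

```python
def cd8_scale(is_evex, cd8_form, cd8_elt_size, has_evex_b, evex_ll):
    """Mirror the nested !if that defines CD8_Scale in X86Inst.

    is_evex      -- True when OpEnc is EncEVEX
    cd8_form     -- the 3-bit CD8_Form value
    cd8_elt_size -- CD8_EltSize, element size in bytes
    has_evex_b   -- the hasEVEX_B bit (broadcast)
    evex_ll      -- the 2-bit EVEX_LL value
    """
    if not is_evex:
        return 0                        # non-AVX512 instruction
    vect_size = 16 << evex_ll           # VectSize = !shl(16, EVEX_LL)
    low_bits = cd8_form & 0b11          # CD8_Form{1-0}
    if cd8_form & 0b100:                # CD8_Form{2}
        return cd8_elt_size << low_bits  # power-of-two number of elements
    if has_evex_b:
        return cd8_elt_size             # a single element for broadcasts
    return vect_size >> low_bits        # vector size divided by a power of two


# A 512-bit vector (EVEX_LL = 2) with CD8_Form = 0 scales by the full vector size.
print(cd8_scale(True, 0b000, 4, False, 2))
```

The three branches correspond one to one to the three cases listed in the comment on lines 292 to 295.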

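From X86Inst LLVM derives several further classes. They differ mainly in ImmT, on line 230 of the X86Inst definition: whether the instruction carries an immediate and, if so, how large it is. These classes form the basis for defining actual instructions. Let us look at two examples (X86InstrFormats.td):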

class I<bits<8> o, Format f, dag outs, dag ins, string asm,
        list<dag> pattern, InstrItinClass itin = NoItinerary,
        Domain d = GenericDomain>
  : X86Inst<o, f, NoImm, outs, ins, asm, itin, d> {
  let Pattern = pattern;
  let CodeSize = 3;
}
class Ii8<bits<8> o, Format f, dag outs, dag ins, string asm,
          list<dag> pattern, InstrItinClass itin = NoItinerary,
          Domain d = GenericDomain>
  : X86Inst<o, f, Imm8, outs, ins, asm, itin, d> {
  let Pattern = pattern;
  let CodeSize = 3;
}

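Class "I" carries no immediate, while "Ii8" carries an i8 immediate. The itin parameter relates to instruction scheduling and describes the instruction's execution itinerary in the CPU; in these base classes it defaults to NoItinerary, i.e. no itinerary. Examples of defs derived from classes like these include (X86InstrArithmetic.td):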

let hasSideEffects = 1 in { // so that we don't speculatively execute
let SchedRW = [WriteIDiv] in {
let Defs = [AL, AH, EFLAGS], Uses = [AX] in
def DIV8r : I<0xF6, MRM6r, (outs), (ins GR8:$src), // AX/r8 = AL,AH
              "div{b}\t$src", [], IIC_DIV8_REG>;
let Defs = [AX, DX, EFLAGS], Uses = [AX, DX] in
def DIV16r : I<0xF7, MRM6r, (outs), (ins GR16:$src), // DX:AX/r16 = AX,DX
               "div{w}\t$src", [], IIC_DIV16>, OpSize16;
let Defs = [EAX, EDX, EFLAGS], Uses = [EAX, EDX] in
def DIV32r : I<0xF7, MRM6r, (outs), (ins GR32:$src), // EDX:EAX/r32 = EAX,EDX
               "div{l}\t$src", [], IIC_DIV32>, OpSize32;
// RDX:RAX/r64 = RAX,RDX
let Defs = [RAX, RDX, EFLAGS], Uses = [RAX, RDX] in
def DIV64r : RI<0xF7, MRM6r, (outs), (ins GR64:$src),
                "div{q}\t$src", [], IIC_DIV64>;
} // SchedRW

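The comments explain what these instructions do. The IIC_XXX names are specific instruction itineraries, which we will study when we get to instruction scheduling. MRM6r is the instruction's format, which the disassembler needs. Defs and Uses describe the registers used beyond the explicit operands: Defs lists the registers whose contents are modified, and Uses lists the registers whose contents are read.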
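
Why DIV8r lists Uses = [AX] and Defs = [AL, AH, EFLAGS] becomes clearer from what the instruction computes. The Python sketch below models the architectural behaviour of the unsigned 8-bit divide as I understand it (quotient to AL, remainder to AH, a divide error when the divisor is zero or the quotient does not fit). The function name is hypothetical, and EFLAGS, which is left undefined by the instruction, is not modelled.

```python
def div8(ax, src):
    """Unsigned 8-bit divide: return (AL, AH) = (quotient, remainder).

    ax  -- the implicit 16-bit dividend (the register listed in Uses)
    src -- the explicit 8-bit divisor operand ($src)
    """
    if not 0 <= ax <= 0xFFFF or not 0 <= src <= 0xFF:
        raise ValueError("operands out of range")
    if src == 0:
        raise ZeroDivisionError("divide error (#DE)")
    quotient, remainder = divmod(ax, src)
    if quotient > 0xFF:
        raise OverflowError("quotient does not fit in AL (#DE)")
    return quotient, remainder


al, ah = div8(100, 7)
print(al, ah)
```

The explicit operand list contains only the divisor, so without Defs and Uses the register allocator and scheduler would have no way of knowing that AX is read and AL/AH are overwritten.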

let Defs = [EFLAGS] in {
let SchedRW = [WriteIMul] in {
// Register-Integer Signed Integer Multiply
def IMUL16rri : Ii16<0x69, MRMSrcReg, // GR16 = GR16*I16
                     (outs GR16:$dst), (ins GR16:$src1, i16imm:$src2),
                     "imul{w}\t{$src2, $src1, $dst|$dst, $src1, $src2}",
                     [(set GR16:$dst, EFLAGS,
                       (X86smul_flag GR16:$src1, imm:$src2))],
                     IIC_IMUL16_RRI>, OpSize16;
def IMUL16rri8 : Ii8<0x6B, MRMSrcReg, // GR16 = GR16*I8
                     (outs GR16:$dst), (ins GR16:$src1, i16i8imm:$src2),
                     "imul{w}\t{$src2, $src1, $dst|$dst, $src1, $src2}",
                     [(set GR16:$dst, EFLAGS,
                       (X86smul_flag GR16:$src1, i16immSExt8:$src2))],
                     IIC_IMUL16_RRI>, OpSize16;

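Each of these two defs has two base classes: besides Ii16 or Ii8, there is also OpSize16. X86InstrFormats.td contains two things named OpSize16, one a class and the other a def; the class is the one used here. A def cannot serve as a base class (much like a class marked final). OpSize16 indicates that the instruction works on 16-bit operands and therefore needs the 0x66 prefix (the operand-size override prefix) in 32-bit mode, where the default operand size is 32 bits. This too is encoding information, used when emitting and disassembling instructions.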
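
The rule for when the 0x66 prefix is required can be stated as a small decision function. The Python sketch below assumes the OperandSize values 0, 1 and 2 for OpSizeFixed, OpSize16 and OpSize32 (the value 1 for OpSize16 agrees with the OpSizeBits shown in the record dump in section 2.2.8; the others are my recollection of X86InstrFormats.td). The function name is hypothetical.

```python
# Assumed OperandSize values; only OPSIZE_16 = 1 is confirmed by the dump.
OPSIZE_FIXED, OPSIZE_16, OPSIZE_32 = 0, 1, 2


def needs_operand_size_prefix(op_size, mode_bits):
    """Return True when the 0x66 operand-size override prefix is required.

    op_size   -- one of OPSIZE_FIXED, OPSIZE_16, OPSIZE_32
    mode_bits -- the CPU mode the code is assembled for: 16, 32 or 64
    """
    if mode_bits not in (16, 32, 64):
        raise ValueError("mode must be 16, 32 or 64")
    if mode_bits == 16:
        # 16-bit mode defaults to 16-bit operands, so 32-bit ones need 0x66.
        return op_size == OPSIZE_32
    # 32-bit and 64-bit modes default to 32-bit operands.
    return op_size == OPSIZE_16


# IMUL16rri is marked OpSize16, so it needs the prefix in 32-bit mode.
print(needs_operand_size_prefix(OPSIZE_16, 32))
```

Instructions left at OpSizeFixed never get the prefix from this rule, whatever the mode.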

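Note that none of the DIVXr definitions specify a Pattern, which means DIVXr is not selected by pattern matching. How is that possible? The secret lies in X86DAGToDAGISel::Select. This method handles particular node types on the X86 target for which generic instruction selection is not suitable. For example, it selects DIVXr for ISD::UDIVREM (and the signed IDIVXr for ISD::SDIVREM). ISD::UDIVREM in turn is produced during legalization from an SDNode of type ISD::UDIV, which is itself generated from LLVM IR by the visitUDiv method. This is a long journey, and we will spend a good deal of space on it.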

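The pattern in the IMUL16rri definition is read as follows: because the pattern's operator is set, the innermost dag value is interpreted as the source pattern (the one to match), while (IMUL16rri GR16:$src1, imm:$src2) is the target pattern (the target-machine DAG to produce after a match; TableGen generates it when it processes this definition).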

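In addition, X86smul_flag is specific to the X86 target, so this pattern does not fit the SDNode objects generated directly from LLVM IR. X86InstrCompiler.td therefore also contains an anonymous Pat definition like this:

def : Pat<(mul GR16:$src1, imm:$src2), (IMUL16rri GR16:$src1, imm:$src2)>;

This definition matches the generic DAG form generated from LLVM IR directly to the target pattern of the IMULX definition.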
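
The idea of a source pattern being rewritten into a target pattern can be illustrated with a toy tree rewriter in Python. This is only a sketch of the concept: the real matcher is generated by TableGen and also handles types, predicates and pattern complexity, none of which is modelled here. The tuple encoding of DAG nodes and all function names are hypothetical.

```python
# Source and target patterns of the anonymous Pat, as nested tuples.
# A leaf "KIND:$name" matches a node of that kind and binds it to $name.
SOURCE = ("mul", "GR16:$src1", "imm:$src2")
TARGET = ("IMUL16rri", "GR16:$src1", "imm:$src2")


def match(pattern, node, bindings):
    """Match a pattern against a DAG node, recording $name bindings."""
    if isinstance(pattern, str):              # leaf such as "GR16:$src1"
        kind, name = pattern.split(":")
        if node[0] != kind:
            return False
        bindings[name] = node
        return True
    if len(node) != len(pattern) or node[0] != pattern[0]:
        return False
    return all(match(p, n, bindings) for p, n in zip(pattern[1:], node[1:]))


def substitute(pattern, bindings):
    """Build the result DAG by replacing each leaf with its bound node."""
    if isinstance(pattern, str):
        return bindings[pattern.split(":")[1]]
    return (pattern[0],) + tuple(substitute(p, bindings) for p in pattern[1:])


def rewrite(node, source=SOURCE, target=TARGET):
    """Return the target DAG if the node matches the source, else None."""
    bindings = {}
    if match(source, node, bindings):
        return substitute(target, bindings)
    return None


# A generic multiply of a 16-bit register by an immediate.
dag = ("mul", ("GR16", "vreg1"), ("imm", 5))
print(rewrite(dag))
```

A node whose operator is not mul, or whose second operand is not an immediate, is simply left for other patterns to handle.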

2.2.8. An Example of an Expanded Instruction

For IMUL16rri, the expanded record looks like this:

def IMUL16rri { // Instruction X86Inst Ii16 OpSize16
  Domain X86Inst:d = GenericDomain;
  string Namespace = "X86";
  dag OutOperandList = (outs GR16:$dst);
  dag InOperandList = (ins GR16:$src1, i16imm:$src2);
  string AsmString = "imul{w} {$src2, $src1, $dst|$dst, $src1, $src2}";
  list<dag> Pattern = [(set GR16:$dst, EFLAGS, (X86smul_flag GR16:$src1, imm:$src2))];
  list<Register> Uses = [];
  list<Register> Defs = [EFLAGS];
  list<Predicate> Predicates = [];
  int Size = 0;
  string DecoderNamespace = "";
  int CodeSize = 3;
  int AddedComplexity = 0;
  bit isReturn = 0;
  bit isBranch = 0;
  bit isIndirectBranch = 0;
  bit isCompare = 0;
  bit isMoveImm = 0;
  bit isBitcast = 0;
  bit isSelect = 0;
  bit isBarrier = 0;
  bit isCall = 0;
  bit canFoldAsLoad = 0;
  bit mayLoad = ?;
  bit mayStore = ?;
  bit isConvertibleToThreeAddress = 0;
  bit isCommutable = 0;
  bit isTerminator = 0;
  bit isReMaterializable = 0;
  bit isPredicable = 0;
  bit hasDelaySlot = 0;
  bit usesCustomInserter = 0;
  bit hasPostISelHook = 0;
  bit hasCtrlDep = 0;
  bit isNotDuplicable = 0;
  bit isConvergent = 0;
  bit isAsCheapAsAMove = 0;
  bit hasExtraSrcRegAllocReq = 0;
  bit hasExtraDefRegAllocReq = 0;
  bit isRegSequence = 0;
  bit isPseudo = 0;
  bit isExtractSubreg = 0;
  bit isInsertSubreg = 0;
  bit hasSideEffects = ?;
  bit isCodeGenOnly = 0;
  bit isAsmParserOnly = 0;
  InstrItinClass Itinerary = IIC_IMUL16_RRI;
  list<SchedReadWrite> SchedRW = [WriteIMul];
  string Constraints = "";
  string DisableEncoding = "";
  string PostEncoderMethod = "";
  string DecoderMethod = "";
  bits<64> TSFlags = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1 };
  string AsmMatchConverter = "";
  string TwoOperandAliasConstraint = "";
  bit UseNamedOperandTable = 0;
  bits<8> Opcode = { 0, 1, 1, 0, 1, 0, 0, 1 };
  Format Form = MRMSrcReg;
  bits<7> FormBits = { 0, 0, 0, 0, 1, 0, 1 };
  ImmType ImmT = Imm16;
  bit ForceDisassemble = 0;
  OperandSize OpSize = OpSize16;
  bits<2> OpSizeBits = { 0, 1 };
  AddressSize AdSize = AdSizeX;
  bits<2> AdSizeBits = { 0, 0 };
  Prefix OpPrefix = NoPrfx;
  bits<3> OpPrefixBits = { 0, 0, 0 };
  Map OpMap = OB;
  bits<3> OpMapBits = { 0, 0, 0 };
  bit hasREX_WPrefix = 0;
  FPFormat FPForm = NotFP;
  bit hasLockPrefix = 0;
  Domain ExeDomain = GenericDomain;
  bit hasREPPrefix = 0;
  Encoding OpEnc = EncNormal;
  bits<2> OpEncBits = { 0, 0 };
  bit hasVEX_WPrefix = 0;
  bit hasVEX_4V = 0;
  bit hasVEX_4VOp3 = 0;
  bit hasVEX_i8ImmReg = 0;
  bit hasVEX_L = 0;
  bit ignoresVEX_L = 0;
  bit hasEVEX_K = 0;
  bit hasEVEX_Z = 0;
  bit hasEVEX_L2 = 0;
  bit hasEVEX_B = 0;
  bits<3> CD8_Form = { 0, 0, 0 };
  int CD8_EltSize = 0;
  bit has3DNow0F0FOpcode = 0;
  bit hasMemOp4Prefix = 0;
  bit hasEVEX_RC = 0;
  bits<2> EVEX_LL = { 0, 0 };
  bits<7> VectSize = { 0, 0, 1, 0, 0, 0, 0 };
  bits<7> CD8_Scale = { 0, 0, 0, 0, 0, 0, 0 };
  string NAME = ?;
}

Many related definitions have been folded into this record, including the inlining of PatFrag, which is described below.
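
As a cross-check, the TSFlags value in this record can be rebuilt from the individual fields using the bit layout given in X86Inst. The Python sketch below packs the four non-zero fields of IMUL16rri and compares the result with the dumped bit list. The value 3 for Imm16 is read off bits 21 to 18 of the dump rather than taken from the ImmType definitions, and the helper name is hypothetical.

```python
# (low bit, width, value) for the non-zero fields of IMUL16rri, following
# the "let TSFlags{hi-lo} = ..." lines of X86Inst and the record dump.
FIELDS = {
    "FormBits": (0, 7, 0b0000101),   # Form = MRMSrcReg
    "OpSizeBits": (7, 2, 0b01),      # OpSize = OpSize16
    "ImmT": (18, 4, 3),              # ImmT = Imm16 (value read from the dump)
    "Opcode": (31, 8, 0x69),
}


def pack_tsflags(fields):
    """OR every field into its slot of the 64-bit TSFlags word."""
    flags = 0
    for name, (shift, width, value) in fields.items():
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        flags |= value << shift
    return flags


# TSFlags as printed in the dump, most significant bit first.
DUMPED = "0" * 26 + "1101001" + "0" * 11 + "11" + "0" * 10 + "10000101"

print(pack_tsflags(FIELDS) == int(DUMPED, 2))
```

Every other field of this instruction is zero, so these four are enough to reproduce the whole 64-bit value.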

0 0