The Architecture of LLVM


http://aosabook.org/en/llvm.html


This chapter discusses some of the design decisions that shaped LLVM [1], an umbrella project that hosts and develops a set of close-knit low-level toolchain components (e.g., assemblers, compilers, debuggers, etc.), which are designed to be compatible with existing tools typically used on Unix systems. The name "LLVM" was once an acronym, but is now just a brand for the umbrella project. While LLVM provides some unique capabilities, and is known for some of its great tools (e.g., the Clang compiler [2], a C/C++/Objective-C compiler which provides a number of benefits over the GCC compiler), the main thing that sets LLVM apart from other compilers is its internal architecture.

From its beginning in December 2000, LLVM was designed as a set of reusable libraries with well-defined interfaces [LA04]. At the time, open source programming language implementations were designed as special-purpose tools which usually had monolithic executables. For example, it was very difficult to reuse the parser from a static compiler (e.g., GCC) for doing static analysis or refactoring. While scripting languages often provided a way to embed their runtime and interpreter into larger applications, this runtime was a single monolithic lump of code that was included or excluded. There was no way to reuse pieces, and very little sharing across language implementation projects.

Beyond the composition of the compiler itself, the communities surrounding popular language implementations were usually strongly polarized: an implementation usually provided either a traditional static compiler like GCC, Free Pascal, and FreeBASIC, or it provided a runtime compiler in the form of an interpreter or Just-In-Time (JIT) compiler. It was very uncommon to see a language implementation that supported both, and if one did, there was usually very little sharing of code.

Over the last ten years, LLVM has substantially altered this landscape. LLVM is now used as a common infrastructure to implement a broad variety of statically and runtime compiled languages (e.g., the family of languages supported by GCC, Java, .NET, Python, Ruby, Scheme, Haskell, D, as well as countless lesser-known languages). It has also replaced a broad variety of special-purpose compilers, such as the runtime specialization engine in Apple's OpenGL stack and the image processing library in Adobe's After Effects product. Finally, LLVM has also been used to create a broad variety of new products, perhaps the best known of which is the OpenCL GPU programming language and runtime.

11.1. A Quick Introduction to Classical Compiler Design

The most popular design for a traditional static compiler (like most C compilers) is the three-phase design whose major components are the front end, the optimizer, and the back end (Figure 11.1). The front end parses source code, checking it for errors, and builds a language-specific Abstract Syntax Tree (AST) to represent the input code. The AST is optionally converted to a new representation for optimization, and the optimizer and back end are run on the code.

[Three Major Components of a Three-Phase Compiler]

Figure 11.1: Three Major Components of a Three-Phase Compiler

The optimizer is responsible for doing a broad variety of transformations to try to improve the code's running time, such as eliminating redundant computations, and is usually more or less independent of language and target. The back end (also known as the code generator) then maps the code onto the target instruction set. In addition to making correct code, it is responsible for generating good code that takes advantage of unusual features of the supported architecture. Common parts of a compiler back end include instruction selection, register allocation, and instruction scheduling.

This model applies equally well to interpreters and JIT compilers. The Java Virtual Machine (JVM) is also an implementation of this model, which uses Java bytecode as the interface between the front end and optimizer.

11.1.1. Implications of this Design

The most important win of this classical design comes when a compiler decides to support multiple source languages or target architectures. If the compiler uses a common code representation in its optimizer, then a front end can be written for any language that can compile to it, and a back end can be written for any target that can compile from it, as shown in Figure 11.2.

[Retargetability]

Figure 11.2: Retargetability

With this design, porting the compiler to support a new source language (e.g., Algol or BASIC) requires implementing a new front end, but the existing optimizer and back end can be reused. If these parts weren't separated, implementing a new source language would require starting over from scratch: supporting N targets and M source languages would need N*M compilers, whereas the three-phase design needs only M front ends and N back ends.

Another advantage of the three-phase design (which follows directly from retargetability) is that the compiler serves a broader set of programmers than it would if it only supported one source language and one target. For an open source project, this means that there is a larger community of potential contributors to draw from, which naturally leads to more enhancements and improvements to the compiler. This is the reason why open source compilers that serve many communities (like GCC) tend to generate better optimized machine code than narrower compilers like Free Pascal. This isn't the case for proprietary compilers, whose quality is directly related to the project's budget. For example, the Intel ICC Compiler is widely known for the quality of code it generates, even though it serves a narrow audience.

A final major win of the three-phase design is that the skills required to implement a front end are different from those required for the optimizer and back end. Separating these makes it easier for a "front-end person" to enhance and maintain their part of the compiler. While this is a social issue, not a technical one, it matters a lot in practice, particularly for open source projects that want to reduce the barrier to contributing as much as possible.

11.2. Existing Language Implementations

While the benefits of a three-phase design are compelling and well-documented in compiler textbooks, in practice it is almost never fully realized. Looking across open source language implementations (back when LLVM was started), you'd find that the implementations of Perl, Python, Ruby and Java share no code. Further, projects like the Glasgow Haskell Compiler (GHC) and FreeBASIC are retargetable to multiple different CPUs, but their implementations are very specific to the one source language they support. There is also a broad variety of special-purpose compiler technology deployed to implement JIT compilers for image processing, regular expressions, graphics card drivers, and other subdomains that require CPU-intensive work.

That said, there are three major success stories for this model, the first of which are the Java and .NET virtual machines. These systems provide a JIT compiler, runtime support, and a very well defined bytecode format. This means that any language that can compile to the bytecode format (and there are dozens of them [3]) can take advantage of the effort put into the optimizer and JIT as well as the runtime. The tradeoff is that these implementations provide little flexibility in the choice of runtime: they both effectively force JIT compilation, garbage collection, and the use of a very particular object model. This leads to suboptimal performance when compiling languages that don't match this model closely, such as C (e.g., with the LLJVM project).

A second success story is perhaps the most unfortunate, but also the most popular, way to reuse compiler technology: translate the input source to C code (or some other language) and send it through existing C compilers. This allows reuse of the optimizer and code generator, gives good flexibility and control over the runtime, and is really easy for front-end implementers to understand, implement, and maintain. Unfortunately, doing this prevents efficient implementation of exception handling, provides a poor debugging experience, slows down compilation, and can be problematic for languages that require guaranteed tail calls (or other features not supported by C).

A final successful implementation of this model is GCC [4]. GCC supports many front ends and back ends, and has an active and broad community of contributors. GCC has a long history of being a C compiler that supports multiple targets, with hacky support for a few other languages bolted onto it. As the years go by, the GCC community is slowly evolving a cleaner design. As of GCC 4.4, it has a new representation for the optimizer (known as "GIMPLE Tuples") which is closer to being separate from the front-end representation than before. Also, its Fortran and Ada front ends use a clean AST.

While very successful, these three approaches have strong limitations on what they can be used for, because they are designed as monolithic applications. As one example, it is not realistically possible to embed GCC into other applications, to use GCC as a runtime/JIT compiler, or to extract and reuse pieces of GCC without pulling in most of the compiler. People who have wanted to use GCC's C++ front end for documentation generation, code indexing, refactoring, and static analysis tools have had to use GCC as a monolithic application that emits interesting information as XML, or write plugins to inject foreign code into the GCC process.

There are multiple reasons why pieces of GCC cannot be reused as libraries, including rampant use of global variables, weakly enforced invariants, poorly designed data structures, a sprawling code base, and the use of macros that prevent the codebase from being compiled to support more than one front-end/target pair at a time. The hardest problems to fix, though, are the inherent architectural problems that stem from its early design and age. Specifically, GCC suffers from layering problems and leaky abstractions: the back end walks front-end ASTs to generate debug info, the front ends generate back-end data structures, and the entire compiler depends on global data structures set up by the command line interface.

11.3. LLVM's Code Representation: LLVM IR

With the historical background and context out of the way, let's dive into LLVM: the most important aspect of its design is the LLVM Intermediate Representation (IR), which is the form it uses to represent code in the compiler. LLVM IR is designed to host the mid-level analyses and transformations found in the optimizer section of a compiler. It was designed with many specific goals in mind, including supporting lightweight runtime optimizations, cross-function/interprocedural optimizations, whole-program analysis, and aggressive restructuring transformations. The most important aspect of it, though, is that it is itself defined as a first-class language with well-defined semantics. To make this concrete, here is a simple example of a .ll file:

define i32 @add1(i32 %a, i32 %b) {
entry:
  %tmp1 = add i32 %a, %b
  ret i32 %tmp1
}

define i32 @add2(i32 %a, i32 %b) {
entry:
  %tmp1 = icmp eq i32 %a, 0
  br i1 %tmp1, label %done, label %recurse

recurse:
  %tmp2 = sub i32 %a, 1
  %tmp3 = add i32 %b, 1
  %tmp4 = call i32 @add2(i32 %tmp2, i32 %tmp3)
  ret i32 %tmp4

done:
  ret i32 %b
}

This LLVM IR corresponds to this C code, which provides two different ways to add integers:

unsigned add1(unsigned a, unsigned b) {
  return a+b;
}

// Perhaps not the most efficient way to add two numbers.
unsigned add2(unsigned a, unsigned b) {
  if (a == 0) return b;
  return add2(a-1, b+1);
}

As you can see from this example, LLVM IR is a low-level RISC-like virtual instruction set. Like a real RISC instruction set, it supports linear sequences of simple instructions like add, subtract, compare, and branch. These instructions are in three-address form, which means that they take some number of inputs and produce a result in a different register [5]. LLVM IR supports labels and generally looks like a weird form of assembly language.

Unlike most RISC instruction sets, LLVM is strongly typed with a simple type system (e.g., i32 is a 32-bit integer, i32** is a pointer to pointer to 32-bit integer) and some details of the machine are abstracted away. For example, the calling convention is abstracted through call and ret instructions and explicit arguments. Another significant difference from machine code is that LLVM IR doesn't use a fixed set of named registers; it uses an infinite set of temporaries named with a % character.

Beyond being implemented as a language, LLVM IR is actually defined in three isomorphic forms: the textual format above, an in-memory data structure inspected and modified by optimizations themselves, and an efficient and dense on-disk binary "bitcode" format. The LLVM Project also provides tools to convert between the on-disk and textual formats: llvm-as assembles the textual .ll file into a .bc file containing the bitcode goop, and llvm-dis turns a .bc file back into a .ll file.
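To give a feel for the second of these forms, the in-memory data structure, here is a minimal sketch that builds the add1 function from the earlier example with LLVM's C++ IRBuilder API and prints its textual form. Treat it as illustrative rather than exact: the header paths and exact signatures here are from more recent LLVM releases than the era this chapter describes.

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("example", Ctx);

  // Build the signature: i32 (i32, i32).
  Type *I32 = Type::getInt32Ty(Ctx);
  FunctionType *FT = FunctionType::get(I32, {I32, I32}, false);
  Function *F = Function::Create(FT, Function::ExternalLinkage, "add1", &M);

  // Name the arguments and create the entry block.
  Function::arg_iterator AI = F->arg_begin();
  Value *A = &*AI++;  A->setName("a");
  Value *B = &*AI;    B->setName("b");
  BasicBlock *Entry = BasicBlock::Create(Ctx, "entry", F);

  // Insert the two instructions: %tmp1 = add i32 %a, %b; ret i32 %tmp1.
  IRBuilder<> Builder(Entry);
  Builder.CreateRet(Builder.CreateAdd(A, B, "tmp1"));

  M.print(outs(), nullptr);  // emit the textual form shown above
  return 0;
}

Printing the module produces the same kind of .ll text shown above, which llvm-as can then assemble into bitcode.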

The intermediate representation of a compiler is interesting because it can be a "perfect world" for the compiler optimizer: unlike the front end and back end of the compiler, the optimizer isn't constrained by either a specific source language or a specific target machine. On the other hand, it has to serve both well: it has to be designed to be easy for a front end to generate and be expressive enough to allow important optimizations to be performed for real targets.

11.3.1. Writing an LLVM IR Optimization

To give some intuition for how optimizations work, it is useful to walk through some examples. There are lots of different kinds of compiler optimizations, so it is hard to provide a recipe for how to solve an arbitrary problem. That said, most optimizations follow a simple three-part structure:

  • Look for a pattern to be transformed.
  • Verify that the transformation is safe/correct for the matched instance.
  • Do the transformation, updating the code.

The most trivial optimization is pattern matching on arithmetic identities, such as: for any integer X, X-X is 0, X-0 is X, and (X*2)-X is X. The first question is what these look like in LLVM IR. Some examples are:

⋮    ⋮    ⋮
%example1 = sub i32 %a, %a
⋮    ⋮    ⋮
%example2 = sub i32 %b, 0
⋮    ⋮    ⋮
%tmp = mul i32 %c, 2
%example3 = sub i32 %tmp, %c
⋮    ⋮    ⋮

For these sorts of "peephole" transformations, LLVM provides an instruction simplification interface that is used as a utility by various other higher-level transformations. These particular transformations are in the SimplifySubInst function and look like this:

// X - 0 -> X
if (match(Op1, m_Zero()))
  return Op0;

// X - X -> 0
if (Op0 == Op1)
  return Constant::getNullValue(Op0->getType());

// (X*2) - X -> X
if (match(Op0, m_Mul(m_Specific(Op1), m_ConstantInt<2>())))
  return Op1;

…

return 0;  // Nothing matched, return null to indicate no transformation.

In this code, Op0 and Op1 are bound to the left and right operands of an integer subtract instruction (importantly, these identities don't necessarily hold for IEEE floating point!). LLVM is implemented in C++, which isn't well known for its pattern matching capabilities (compared to functional languages like Objective Caml), but it does offer a very general template system that allows us to implement something similar. The match function and the m_ functions allow us to perform declarative pattern matching operations on LLVM IR code. For example, the m_Specific predicate only matches if the left hand side of the multiplication is the same as Op1.

Together, these three cases are all pattern matched and the function returns the replacement if it can, or a null pointer if no replacement is possible. The caller of this function (SimplifyInstruction) is a dispatcher that does a switch on the instruction opcode, dispatching to the per-opcode helper functions. It is called from various optimizations. A simple driver looks like this:

for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
  if (Value *V = SimplifyInstruction(I))
    I->replaceAllUsesWith(V);

This code simply loops over each instruction in a block, checking to see if any of them simplify. If so (because SimplifyInstruction returns non-null), it uses the replaceAllUsesWith method to update anything in the code using the simplifiable operation with the simpler form.

11.4. LLVM's Implementation of Three-Phase Design

In an LLVM-based compiler, a front end is responsible for parsing, validating and diagnosing errors in the input code, then translating the parsed code into LLVM IR (usually, but not always, by building an AST and then converting the AST to LLVM IR). This IR is optionally fed through a series of analysis and optimization passes which improve the code, then is sent into a code generator to produce native machine code, as shown in Figure 11.3. This is a very straightforward implementation of the three-phase design, but this simple description glosses over some of the power and flexibility that the LLVM architecture derives from LLVM IR.

[LLVM's Implementation of the Three-Phase Design]

Figure 11.3: LLVM's Implementation of the Three-Phase Design

11.4.1. LLVM IR is a Complete Code Representation

In particular, LLVM IR is both well specified and the only interface to the optimizer. This property means that all you need to know to write a front end for LLVM is what LLVM IR is, how it works, and the invariants it expects. Since LLVM IR has a first-class textual form, it is both possible and reasonable to build a front end that outputs LLVM IR as text, then uses Unix pipes to send it through the optimizer sequence and code generator of your choice.
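As a hedged sketch of that pipes-based workflow: the toy "front end" below is invented purely for illustration, but because it emits ordinary LLVM IR text on stdout, its output can be fed straight to real tools such as opt and llc.

#include <cstdio>

// A toy "front end" that emits textual LLVM IR for an add function.
// Invented for illustration; a real front end would print IR from its
// own AST (or construct IR directly with the C++ APIs).
int main() {
  std::printf("define i32 @add1(i32 %%a, i32 %%b) {\n");
  std::printf("entry:\n");
  std::printf("  %%tmp1 = add i32 %%a, %%b\n");
  std::printf("  ret i32 %%tmp1\n");
  std::printf("}\n");
  return 0;
}

Running it as, say, ./toyfe | opt -O2 -S | llc sends the generated IR through the optimizer and code generator without the front end linking against any LLVM libraries at all.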

It might be surprising, but this is actually a pretty novel property of LLVM and one of the major reasons for its success in a broad range of different applications. Even the widely successful and relatively well-architected GCC compiler does not have this property: its GIMPLE mid-level representation is not a self-contained representation. As a simple example, when the GCC code generator goes to emit DWARF debug information, it reaches back and walks the source-level "tree" form. GIMPLE itself uses a "tuple" representation for the operations in the code, but (at least as of GCC 4.5) still represents operands as references back to the source-level tree form.

The implications of this are that front-end authors need to know and produce GCC's tree data structures as well as GIMPLE to write a GCC front end. The GCC back end has similar problems, so they also need to know bits and pieces of how the RTL back end works as well. Finally, GCC doesn't have a way to dump out "everything representing my code", or a way to read and write GIMPLE (and the related data structures that form the representation of the code) in text form. The result is that it is relatively hard to experiment with GCC, and therefore it has relatively few front ends.

11.4.2. LLVM is a Collection of Libraries

After the design of LLVM IR, the next most important aspect of LLVM is that it is designed as a set of libraries, rather than as a monolithic command line compiler like GCC or an opaque virtual machine like the JVM or .NET virtual machines. LLVM is an infrastructure, a collection of useful compiler technology that can be brought to bear on specific problems (like building a C compiler, or an optimizer in a special effects pipeline). While this is one of its most powerful features, it is also one of its least understood design points.

Let's look at the design of the optimizer as an example: it reads LLVM IR in, chews on it a bit, then emits LLVM IR which hopefully will execute faster. In LLVM (as in many other compilers) the optimizer is organized as a pipeline of distinct optimization passes, each of which is run on the input and has a chance to do something. Common examples of passes are the inliner (which substitutes the body of a function into call sites), expression reassociation, loop invariant code motion, etc. Depending on the optimization level, different passes are run: for example, at -O0 (no optimization) the Clang compiler runs no passes, while at -O3 it runs a series of 67 passes in its optimizer (as of LLVM 2.8).

Each LLVM pass is written as a C++ class that derives (indirectly) from the Pass class. Most passes are written in a single .cpp file, and their subclass of the Pass class is defined in an anonymous namespace (which makes it completely private to the defining file). In order for the pass to be useful, code outside the file has to be able to get it, so a single function (to create the pass) is exported from the file. Here is a slightly simplified example of a pass to make things concrete [6].

namespace {
  class Hello : public FunctionPass {
  public:
    // Print out the names of functions in the LLVM IR being optimized.
    virtual bool runOnFunction(Function &F) {
      cerr << "Hello: " << F.getName() << "\n";
      return false;
    }
  };
}

FunctionPass *createHelloPass() { return new Hello(); }

As mentioned, the LLVM optimizer provides dozens of different passes, each of which is written in a similar style. These passes are compiled into one or more .o files, which are then built into a series of archive libraries (.a files on Unix systems). These libraries provide all sorts of analysis and transformation capabilities, and the passes are as loosely coupled as possible: they are expected to stand on their own, or to explicitly declare their dependencies on other passes if they depend on some other analysis to do their job. When given a series of passes to run, the LLVM PassManager uses the explicit dependency information to satisfy these dependencies and optimize the execution of passes.
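To make the dependency declaration concrete, here is a sketch written in the same slightly simplified style as the Hello example above. The XYZLoopOpt pass itself is hypothetical; getAnalysisUsage, addRequired, and setPreservesCFG are the actual hooks the pass interface of this era provides for declaring dependencies.

#include "llvm/Pass.h"
#include "llvm/Analysis/LoopInfo.h"
using namespace llvm;

namespace {
  // A hypothetical pass that needs loop structure to do its job.
  class XYZLoopOpt : public FunctionPass {
  public:
    // Declare dependencies: require LoopInfo to be computed first,
    // and promise the PassManager that we won't change the CFG shape.
    virtual void getAnalysisUsage(AnalysisUsage &AU) const {
      AU.addRequired<LoopInfo>();
      AU.setPreservesCFG();
    }

    virtual bool runOnFunction(Function &F) {
      LoopInfo &LI = getAnalysis<LoopInfo>();
      // ... use LI to find and transform loops in F ...
      return false;  // report that nothing was changed
    }
  };
}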

Libraries and abstract capabilities are great, but they don't actually solve problems. The interesting bit comes when someone wants to build a new tool that can benefit from compiler technology, perhaps a JIT compiler for an image processing language. The implementer of this JIT compiler has a set of constraints in mind: for example, perhaps the image processing language is highly sensitive to compile-time latency and has some idiomatic language properties that are important to optimize away for performance reasons.

The library-based design of the LLVM optimizer allows our implementer to pick and choose both the order in which passes execute and which ones make sense for the image processing domain: if everything is defined as a single big function, it doesn't make sense to waste time on inlining. If there are few pointers, alias analysis and memory optimization aren't worth bothering about. However, despite our best efforts, LLVM doesn't magically solve all optimization problems! Since the pass subsystem is modularized and the PassManager itself doesn't know anything about the internals of the passes, the implementer is free to implement their own language-specific passes to cover for deficiencies in the LLVM optimizer or to exploit language-specific optimization opportunities. Figure 11.4 shows a simple example for our hypothetical XYZ image processing system (a sketch of the corresponding pass setup follows the figure):

[Hypothetical XYZ System using LLVM]

Figure 11.4: Hypothetical XYZ System using LLVM
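A minimal sketch of what wiring up such a pipeline might look like, assuming the PassManager interface of this era: createGVNPass and createCFGSimplificationPass are real pass constructors declared in llvm/Transforms/Scalar.h, while createXYZIdiomPass stands in for our hypothetical language-specific pass.

#include "llvm/Module.h"
#include "llvm/PassManager.h"
#include "llvm/Transforms/Scalar.h"
using namespace llvm;

// Hypothetical constructor for the XYZ-specific pass, exported from
// its .cpp file in the same style as createHelloPass above.
FunctionPass *createXYZIdiomPass();

void optimizeForXYZ(Module &M) {
  PassManager PM;
  PM.add(createXYZIdiomPass());           // language-specific cleanup
  PM.add(createGVNPass());                // redundancy elimination
  PM.add(createCFGSimplificationPass());  // tidy up the control flow
  PM.run(M);                              // run only the chosen passes
}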

Once the set of optimizations is chosen (and similar decisions are made for the code generator) the image processing compiler is built into an executable or dynamic library. Since the only reference to the LLVM optimization passes is the simple create function defined in each .o file, and since the optimizers live in .a archive libraries, only the optimization passes that are actually used are linked into the end application, not the entire LLVM optimizer. In our example above, since there is a reference to PassA and PassB, they will get linked in. Since PassB uses PassD to do some analysis, PassD gets linked in. However, since PassC (and dozens of other optimizations) aren't used, its code isn't linked into the image processing application.

This is where the power of the library-based design of LLVM comes into play. This straightforward design approach allows LLVM to provide a vast amount of capability, some of which may only be useful to specific audiences, without punishing clients of the libraries that just want to do simple things. In contrast, traditional compiler optimizers are built as a tightly interconnected mass of code, which is much more difficult to subset, reason about, and come up to speed on. With LLVM you can understand individual optimizers without knowing how the whole system fits together.

This library-based design is also the reason why so many people misunderstand what LLVM is all about: the LLVM libraries have many capabilities, but they don't actually do anything by themselves. It is up to the designer of the client of the libraries (e.g., the Clang C compiler) to decide how to put the pieces to best use. This careful layering, factoring, and focus on subset-ability is also why the LLVM optimizer can be used for such a broad range of different applications in different contexts. Also, just because LLVM provides JIT compilation capabilities, it doesn't mean that every client uses it.

11.5. Design of the Retargetable LLVM Code Generator

The LLVM code generator is responsible for transforming LLVM IR into target-specific machine code. On the one hand, it is the code generator's job to produce the best possible machine code for any given target. Ideally, each code generator should be completely custom code for the target, but on the other hand, the code generators for each target need to solve very similar problems. For example, each target needs to assign values to registers, and though each target has different register files, the algorithms used should be shared wherever possible.

Similar to the approach in the optimizer, LLVM's code generator splits the code generation problem into individual passes (instruction selection, register allocation, scheduling, code layout optimization, and assembly emission) and provides many built-in passes that are run by default. The target author is then given the opportunity to choose among the default passes, override the defaults, and implement completely custom target-specific passes as required. For example, the x86 back end uses a register-pressure-reducing scheduler since it has very few registers, but the PowerPC back end uses a latency-optimizing scheduler since it has many of them. The x86 back end uses a custom pass to handle the x87 floating point stack, and the ARM back end uses a custom pass to place constant pool islands inside functions where needed. This flexibility allows target authors to produce great code without having to write an entire code generator from scratch for their target.

11.5.1. LLVM Target Description Files

The "mix and match" approach allows target authors to choose whatmakes sense for their architecture and permits a large amount of codereuse across different targets. This brings up another challenge:each shared component needs to be able to reason about target specificproperties in a generic way. For example, a shared register allocatorneeds to know the register file of each target and the constraintsthat exist between instructions and their register operands. LLVM'ssolution to this is for each target to provide a target description ina declarative domain-specific language (a set of .td files)processed by the tblgen tool. The (simplified) build process for thex86 target is shown inFigure 11.5.

[Simplified x86 Target Definition]

Figure 11.5: Simplified x86 Target Definition

The different subsystems supported by the .td files allow target authors to build up the different pieces of their target. For example, the x86 back end defines a register class that holds all of its 32-bit registers named "GR32" (in the .td files, target-specific definitions are all caps) like this:

def GR32 : RegisterClass<[i32], 32,
  [EAX, ECX, EDX, ESI, EDI, EBX, EBP, ESP,
   R8D, R9D, R10D, R11D, R14D, R15D, R12D, R13D]> { … }

This definition says that registers in this class can hold 32-bit integer values ("i32"), prefer to be 32-bit aligned, have the specified 16 registers (which are defined elsewhere in the .td files), and have some more information to specify preferred allocation order and other things. Given this definition, specific instructions can refer to this register class, using it as an operand. For example, the "complement a 32-bit register" instruction is defined as:

let Constraints = "$src = $dst" in
def NOT32r : I<0xF7, MRM2r,
               (outs GR32:$dst), (ins GR32:$src),
               "not{l}\t$dst",
               [(set GR32:$dst, (not GR32:$src))]>;

This definition says that NOT32r is an instruction (it uses the I tblgen class), specifies encoding information (0xF7, MRM2r), specifies that it defines an "output" 32-bit register $dst and has a 32-bit register "input" named $src (the GR32 register class defined above defines which registers are valid for the operand), specifies the assembly syntax for the instruction (using the {} syntax to handle both AT&T and Intel syntax), specifies the effect of the instruction, and provides the pattern that it should match on the last line. The "let" constraint on the first line tells the register allocator that the input and output register must be allocated to the same physical register.

This definition is a very dense description of the instruction, and the common LLVM code can do a lot with information derived from it (by the tblgen tool). This one definition is enough for instruction selection to form this instruction by pattern matching on the input IR code for the compiler. It also tells the register allocator how to process it, is enough to encode and decode the instruction to machine code bytes, and is enough to parse and print the instruction in a textual form. These capabilities allow the x86 target to support generating a stand-alone x86 assembler (which is a drop-in replacement for the "gas" GNU assembler) and disassemblers from the target description, as well as handle encoding the instruction for the JIT.

In addition to providing useful functionality, having multiple pieces of information generated from the same "truth" is good for other reasons. This approach makes it almost infeasible for the assembler and disassembler to disagree with each other in either assembly syntax or in the binary encoding. It also makes the target description easily testable: instruction encodings can be unit tested without having to involve the entire code generator.

While we aim to get as much target information as possible into the .td files in a nice declarative form, we still don't have everything. Instead, we require target authors to write some C++ code for various support routines and to implement any target-specific passes they might need (like X86FloatingPoint.cpp, which handles the x87 floating point stack). As LLVM continues to grow new targets, it becomes more and more important to increase the amount of the target that can be expressed in the .td file, and we continue to increase the expressiveness of the .td files to handle this. A great benefit is that it gets easier and easier to write targets in LLVM as time goes on.

11.6. Interesting Capabilities Provided by a Modular Design

Besides being a generally elegant design, modularity provides clients of the LLVM libraries with several interesting capabilities. These capabilities stem from the fact that LLVM provides functionality, but lets the client decide most of the policies on how to use it.

11.6.1. Choosing When and Where Each Phase Runs

As mentioned earlier, LLVM IR can be efficiently (de)serialized to/from a binary format known as LLVM bitcode. Since LLVM IR is self-contained, and serialization is a lossless process, we can do part of compilation, save our progress to disk, then continue work at some point in the future. This feature provides a number of interesting capabilities, including support for link-time and install-time optimization, both of which delay code generation from "compile time".
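As a minimal sketch of this save-and-resume idea, using LLVM's C API (the retrospective later in this chapter notes that the C wrappers are kept stable): the bitcode routines named here come from the llvm-c BitWriter and BitReader headers, while the two wrapper functions are invented for illustration.

#include <llvm-c/BitReader.h>
#include <llvm-c/BitWriter.h>
#include <llvm-c/Core.h>

// Stop partway through compilation: serialize the module to bitcode.
void saveProgress(LLVMModuleRef Mod) {
  LLVMWriteBitcodeToFile(Mod, "partial.bc");
}

// Pick up later, perhaps in a different tool: deserialize and go on.
LLVMModuleRef resumeLater(void) {
  LLVMMemoryBufferRef Buf;
  LLVMModuleRef Mod;
  char *Err = NULL;
  if (LLVMCreateMemoryBufferWithContentsOfFile("partial.bc", &Buf, &Err))
    return NULL;  // couldn't read the file
  if (LLVMParseBitcode(Buf, &Mod, &Err))
    return NULL;  // the file wasn't valid bitcode
  return Mod;     // continue optimization or code generation from here
}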

Link-Time Optimization (LTO) addresses the problem where the compiler traditionally only sees one translation unit (e.g., a .c file with all its headers) at a time and therefore cannot do optimizations (like inlining) across file boundaries. LLVM compilers like Clang support this with the -flto or -O4 command line option. This option instructs the compiler to emit LLVM bitcode to the .o file instead of writing out a native object file, and delays code generation to link time, as shown in Figure 11.6.

[Link-Time Optimization]

Figure 11.6: Link-Time Optimization

Details differ depending on which operating system you're on, but the important bit is that the linker detects that it has LLVM bitcode in the .o files instead of native object files. When it sees this, it reads all the bitcode files into memory, links them together, then runs the LLVM optimizer over the aggregate. Since the optimizer can now see across a much larger portion of the code, it can inline, propagate constants, do more aggressive dead code elimination, and more across file boundaries. While many modern compilers support LTO, most of them (e.g., GCC, Open64, the Intel compiler, etc.) do so by having an expensive and slow serialization process. In LLVM, LTO falls out naturally from the design of the system, and works across different source languages (unlike many other compilers) because the IR is truly source-language neutral.

Install-time optimization is the idea of delaying code generation even later than link time, all the way to install time, as shown in Figure 11.7. Install time is a very interesting time (in cases when software is shipped in a box, downloaded, or uploaded to a mobile device, etc.), because this is when you find out the specifics of the device you're targeting. In the x86 family, for example, there is a broad variety of chips and characteristics. By delaying instruction choice, scheduling, and other aspects of code generation, you can pick the best answers for the specific hardware an application ends up running on.

[Install-Time Optimization]

Figure 11.7: Install-Time Optimization

11.6.2. Unit Testing the Optimizer

Compilers are very complicated, and quality is important; therefore, testing is critical. For example, after fixing a bug that caused a crash in an optimizer, a regression test should be added to make sure it doesn't happen again. The traditional approach to testing this is to write a .c file (for example) that is run through the compiler, and to have a test harness that verifies that the compiler doesn't crash. This is the approach used by the GCC test suite, for example.

The problem with this approach is that the compiler consists of many different subsystems and even many different passes in the optimizer, all of which have the opportunity to change what the input code looks like by the time it gets to the previously buggy code in question. If something changes in the front end or an earlier optimizer, a test case can easily fail to test what it is supposed to be testing.

By using the textual form of LLVM IR with the modular optimizer, the LLVM test suite has highly focused regression tests that can load LLVM IR from disk, run it through exactly one optimization pass, and verify the expected behavior. Beyond just checking for crashes, a more complicated behavioral test wants to verify that an optimization is actually performed. Here is a simple test case that checks to see that the constant propagation pass is working with add instructions:

; RUN: opt < %s -constprop -S | FileCheck %s
define i32 @test() {
  %A = add i32 4, 5
  ret i32 %A
  ; CHECK: @test()
  ; CHECK: ret i32 9
}

The RUN line specifies the command to execute: in this case, the opt and FileCheck command line tools. The opt program is a simple wrapper around the LLVM pass manager, which links in all the standard passes (and can dynamically load plugins containing other passes) and exposes them through the command line. The FileCheck tool verifies that its standard input matches a series of CHECK directives. In this case, this simple test is verifying that the constprop pass is folding the add of 4 and 5 into 9.

While this might seem like a really trivial example, this is very difficult to test by writing .c files: front ends often do constant folding as they parse, so it is very difficult and fragile to write code that makes its way downstream to a constant folding optimization pass. Because we can load LLVM IR as text and send it through the specific optimization pass we're interested in, then dump out the result as another text file, it is really straightforward to test exactly what we want, both for regression and feature tests.

11.6.3. Automatic Test Case Reduction with BugPoint

When a bug is found in a compiler or other client of the LLVM libraries, the first step to fixing it is to get a test case that reproduces the problem. Once you have a test case, it is best to minimize it to the smallest example that reproduces the problem, and also narrow it down to the part of LLVM where the problem happens, such as the optimization pass at fault. While you eventually learn how to do this, the process is tedious, manual, and particularly painful for cases where the compiler generates incorrect code but does not crash.

The LLVM BugPoint tool [7] uses the IR serialization and modular design of LLVM to automate this process. For example, given an input .ll or .bc file along with a list of optimization passes that causes an optimizer crash, BugPoint reduces the input to a small test case and determines which optimizer is at fault. It then outputs the reduced test case and the opt command used to reproduce the failure. It finds this by using techniques similar to "delta debugging" to reduce the input and the optimizer pass list. Because it knows the structure of LLVM IR, BugPoint does not waste time generating invalid IR to input to the optimizer, unlike the standard "delta" command line tool.

In the more complex case of a miscompilation, you can specify the input, code generator information, the command line to pass to the executable, and a reference output. BugPoint will first determine if the problem is due to an optimizer or a code generator, and will then repeatedly partition the test case into two pieces: one that is sent into the "known good" component and one that is sent into the "known buggy" component. By iteratively moving more and more code out of the partition that is sent into the known buggy code generator, it reduces the test case.

BugPoint is a very simple tool and has saved countless hours of test case reduction throughout the life of LLVM. No other open source compiler has a similarly powerful tool, because it relies on a well-defined intermediate representation. That said, BugPoint isn't perfect, and would benefit from a rewrite. It dates back to 2002, and is typically only improved when someone has a really tricky bug to track down that the existing tool doesn't handle well. It has grown over time, accreting new features (such as JIT debugging) without a consistent design or owner.

11.7. Retrospective and Future Directions

LLVM's modularity wasn't originally designed to directly achieve any of the goals described here. It was a self-defense mechanism: it was obvious that we wouldn't get everything right on the first try. The modular pass pipeline, for example, exists to make it easier to isolate passes so that they can be discarded after being replaced by better implementations [8].

Another major aspect of LLVM remaining nimble (and a controversial topic with clients of the libraries) is our willingness to reconsider previous decisions and make widespread changes to APIs without worrying about backwards compatibility. Invasive changes to LLVM IR itself, for example, require updating all of the optimization passes and cause substantial churn to the C++ APIs. We've done this on several occasions, and though it causes pain for clients, it is the right thing to do to maintain rapid forward progress. To make life easier for external clients (and to support bindings for other languages), we provide C wrappers for many popular APIs (which are intended to be extremely stable), and new versions of LLVM aim to continue reading old .ll and .bc files.

Looking forward, we would like to continue making LLVM more modular and easier to subset. For example, the code generator is still too monolithic: it isn't currently possible to subset LLVM based on features. If you'd like to use the JIT but have no need for inline assembly, exception handling, or debug information generation, it should be possible to build the code generator without linking in support for these features. We are also continuously improving the quality of code generated by the optimizer and code generator, adding IR features to better support new language and target constructs, and adding better support for performing high-level language-specific optimizations in LLVM.

The LLVM project continues to grow and improve in numerous ways. It is really exciting to see the number of different ways that LLVM is being used in other projects, and how it keeps turning up in surprising new contexts that its designers never even thought about. The new LLDB debugger is a great example of this: it uses the C/C++/Objective-C parsers from Clang to parse expressions, uses the LLVM JIT to translate these into target code, uses the LLVM disassemblers, and uses LLVM targets to handle calling conventions, among other things. Being able to reuse this existing code allows people developing debuggers to focus on writing the debugger logic, instead of reimplementing yet another (marginally correct) C++ parser.

Despite its success so far, there is still a lot left to be done, as well as the ever-present risk that LLVM will become less nimble and more calcified as it ages. While there is no magic answer to this problem, I hope that continued exposure to new problem domains, a willingness to reevaluate previous decisions, and a willingness to redesign and throw away code will help. After all, the goal isn't to be perfect, it is to keep getting better over time.

Footnotes

  1. http://llvm.org
  2. http://clang.llvm.org
  3. http://en.wikipedia.org/wiki/List_of_JVM_languages
  4. A backronym that now stands for "GNU Compiler Collection".
  5. This is in contrast to a two-address instruction set, like X86, which destructively updates an input register, or one-address machines which take one explicit operand and operate on an accumulator or the top of the stack on a stack machine.
  6. For all the details, please see the Writing an LLVM Pass manual at http://llvm.org/docs/WritingAnLLVMPass.html.
  7. http://llvm.org/docs/Bugpoint.html
  8. I often say that none of the subsystems in LLVM are really good until they have been rewritten at least once.
