JIT File description


ARM SFX PORTING: https://bugs.webkit.org/show_bug.cgi?id=24986


Q: I can't understand the difference between jit/JITArithmetic32_64 and jit/JITArithmetic.
Both of them implement the JIT class. So when does the program call functions from JITArithmetic, and when from JITArithmetic32_64?


A: The JIT class is quite large, and so its implementation is broken up into multiple files.  The manner in which it is broken up is somewhat arbitrary, with the main goal being to not have any outrageously large .cpp files.  Here's the intuition I use for finding what I want:


JIT.cpp: the harness that runs the JIT and a few utility functions that are not related to any particular bytecode opcode.
JITArithmetic.cpp: implementation of arithmetic instructions.
JITCall.cpp: implementation of calling convention, including call instructions.
JITPropertyAccess.cpp: implementation of heap access instructions.
JITOpcodes.cpp: implementation of miscellaneous instructions that don't fit into the above.

But there is also a second split: the JIT can either be generating code that uses the 32_64 value representation, or the 64-bit value representation.  These two representations require two completely different code generators, and most of the code in the JIT is specialized on one or the other.  Hence any file that is marked 32_64 (like JITArithmetic32_64) contains implementations for the 32_64 value representation, whereas files that do not have the 32_64 suffix contain implementations for the 64-bit value representation.
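As a rough illustration of why the two representations need different code generators, the layouts might be sketched as below. The struct layout, the NumberTag constant, and the helper names are all assumptions for illustration; this is not WebKit's exact JSValue encoding.

```cpp
#include <cassert>
#include <cstdint>

// 32_64 representation: a value is a (tag, payload) pair of 32-bit words,
// typically held in two registers on 32-bit targets.
struct Value32_64 {
    uint32_t tag;     // discriminates int32, cell (pointer), boolean, ...
    uint32_t payload; // the integer value, or the pointer bits
};

// 64-bit representation: the whole value fits in one 64-bit register;
// integers carry a tag in the high bits (NaN-boxing style).
// NumberTag is an assumed constant for illustration only.
const uint64_t NumberTag = 0xFFFF000000000000ull;

uint64_t boxInt32(int32_t i)
{
    // Zero-extend the int32 into the low 32 bits, then set the tag bits.
    return NumberTag | static_cast<uint32_t>(i);
}

int32_t unboxInt32(uint64_t v)
{
    return static_cast<int32_t>(v); // the low 32 bits are the payload
}
```

A code generator for the first form must juggle two registers per value; one for the second form works on single registers, which is why the implementations cannot be shared.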


SH4Assembler.h: This contains only C++ functions that deal with the SH4 instruction set.

MacroAssemblerSH4.h: This contains a simple class wrapping pointers.

MacroAssembler.h: These assembler classes simply provide an interface to encode the machine instructions that are available on the target processor.

The MacroAssembler wraps the architecture-specific assembler in a common higher-level interface, providing the facilities necessary to support its full API, and providing a richer programming interface.
It is already the case that the MacroAssembler must handle situations in which an immediate operand to an instruction may not fit within the range of values supported on a given platform; in these situations it will load the value into a register.  Currently its only option for doing so is to use immediate moves, but having the option of utilizing a constant pool available here as well would be entirely complementary to the existing mechanisms.
The AbstractMacroAssembler already has a size() method to check how much has currently been written to the buffer; it should just be able to use this to check how much code has been planted since a previous point.
AssemblerBufferWithConstantPool.h:
1. The CP should live within a separate AssemblerBuffer. There would be an AssemblerBuffer without the CP feature, and another one (AssemblerBufferWithCP) with CP. This solution is very simple. Each architecture can decide which solution is suitable for it. The problem here is that the functionality of AssemblerBuffer is repeated in AssemblerBufferWithCP as well (code duplication, copy & paste code, etc.)

4. The CP (constant pool) should live in a separate file (derived from AssemblerBuffer). This solution is very similar to case 1: the CP lives in a separate file (AssemblerBufferWithCP) where the constant-pool interface has been implemented, and this class is derived from AssemblerBuffer. In this case the public interface of AssemblerBufferWithCP calls the AssemblerBuffer interface, taking flushing into account.

MacroAssembler does not know anything about how many bytes are written to the buffer when a function of the Assembler is called.
There should be a one-to-one mapping between calls to the Assembler and instructions planted, and as such the assembler can assume that the number of bytes written by an assembler call will be whatever the maximum instruction width is for that platform (4, in the case of ARM).
There are three reasons we currently envisage that may require a constant pool on other platforms:
(1) For immediate values that cannot be represented within instruction operands.  For example, floating-point and vector values on x86 (& x86-64, ARMv7).
(2) For values that can be (and currently are being) encoded in the instruction stream, but that might be more efficiently accessed via a constant pool.  E.g. on x86-64 it *might* be more efficient to place some pointer immediates in a constant pool (these are currently loaded via a second move-immediate instruction), and on ARMv7 32-bit immediate values are loaded with a 2-instruction sequence; again, in some cases it may make sense to switch this to a data load.
(3) Moving values that are patched out of the instruction stream will remove the need to i-cache flush on modification, and by organizing the constant pool into page-sized units we may be able to avoid changing protections when running with NX support.  This may be useful at some point in the future.




AssemblerBuffer{.h,.cpp}:Basically, this is where the generated machine code instructions are stored.

JIT{.h,.cpp}: This converts bytecode to machine instructions:
- instruction formatting mechanism
- instruction selection policy
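In outline, the conversion walks the bytecode and plants machine code per opcode. The sketch below is heavily simplified and all names are assumed; the real main pass is far larger.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// A toy "bytecode to machine code" main pass: one case per opcode,
// each planting instructions through an assembler-like emitter.
enum Opcode { op_add, op_ret };

struct Instruction { Opcode opcode; };

struct Emitter {
    std::vector<std::string> planted; // mnemonic log stands in for machine code
    void emit(const std::string& mnemonic) { planted.push_back(mnemonic); }
};

void compileMainPass(const std::vector<Instruction>& bytecode, Emitter& jit)
{
    for (const Instruction& insn : bytecode) {
        switch (insn.opcode) {
        case op_add:
            jit.emit("add r0, r0, r1");
            break;
        case op_ret:
            jit.emit("bx lr");
            break;
        }
    }
}
```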


We have some barriers which have to be solved to generate machine code on ARM:

1. Constant pool: On ARM, only those constants fit into an instruction which can be produced from an 8-bit number rotated right by twice a 4-bit unsigned integer. For example, 257 (0x101) does not fit. This implies that a constant pool is needed. There are two possible solutions: (1) leave the constants on the CallFrame, or (2) insert them into the machine code. The first solution is not satisfactory, because on ARM the space to represent an offset from the base is limited (12 bits). If the offset does not fit, more instructions are needed to access the address. In the second solution, we have to emit extra code which jumps over the constant pool. This case is still better than the first one, because it generates much less machine code. Additionally, we have to generate other constants as well, like addresses of stub and CTI functions, far jumps, etc. So those extra constants have to be stored in the constant pool too (not on the CallFrame).

2. Branch optimization: There is an option to use faster and more predictable branches. Initially, we generate all branches with a PC-relative load, which uses one constant address and one machine instruction. After the JIT code generation is finished, we can replace the PC-relative load with a simple branch instruction if the address fits in the instruction. This replacement saves both space and time. All the above implies that the compilation should have a second phase to do the replacing.

3. Exception handling: On ARM, we cannot follow the strategy of exception handling used on x86. We cannot regenerate identical instruction sequences when an exception happens, because relative calls and constants in the pool are not necessarily the same (as described above, see items 1 and 2). The return address manipulation is solved through stub functions.

4. Conditional field: All ARM instructions are conditionally executed. So, we can avoid short jumps (cmp-and-jmp combos). See a really small example at 'op_construct_verify'. On ARM, we can combine the two jumps using the conditional field.

5. Dynamically generated stub functions: We eliminated all inline assembly from the source code and replaced it with dynamically generated JIT code. We generate the following code snippets dynamically: ABI switch (trampoline in your terminology), return address manipulation for exception handling, and soft math instructions. The most interesting part is the stub functions for return address manipulation. Because of exception handling, we cannot address some CTI functions directly. For those functions, stub functions are generated where we can solve the return address manipulation.

6. Bigger patterns worsen performance: Lastly, if we generate machine code through the MacroAssembler we will miss more target-specific optimizations. Using a general MacroAssembler function to generate an operation can often generate useless code. This is a similar issue to super-instructions. I would be the happiest person if we could use the MacroAssembler as you said, but I think it is a hard task (and it would lead to performance degradation on ARM). Additionally, if we requested an interface change necessary for ARM that caused a performance regression on x86, it would be rejected. :-) However, I am open to making our points of view converge. The truth must lie somewhere in between our approaches.
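The encodability rule behind the constant-pool barrier can be checked directly: an ARM data-processing immediate is an 8-bit value rotated right by twice a 4-bit field. Below is a small checker (a hypothetical helper for illustration, not WebKit code):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Returns the (rotate, imm8) encoding if `value` fits an ARM
// data-processing immediate, i.e. imm8 rotated right by 2*rotate.
std::optional<std::pair<uint32_t, uint32_t>> encodeARMImmediate(uint32_t value)
{
    for (uint32_t shift = 0; shift < 32; shift += 2) {
        // Rotating left by `shift` undoes a rotate-right by `shift`.
        uint32_t imm8 = (value << shift) | (value >> ((32 - shift) & 31));
        if (imm8 <= 0xff)
            return std::make_pair(shift / 2, imm8);
    }
    return std::nullopt;
}
```

For example, 255 encodes as (0, 255) and 0xBC0000 as (8, 0xBC), while 257 (0x101) fails, as the text notes: its two set bits are 8 positions apart, so no 8-bit window can cover both.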


The layers in the architecture:

The abstract code generation layer (MacroAssembler interface down) is layered like a traditional compiler.  In a traditional compiler, it is common to have an assembler layer completely independent of the compiler (often a separate application).  The compiler takes a source code file, compiles it, and produces an output file of assembly code.  The interface to the assembler is largely made up of a set of assembly code mnemonics, from which there is typically a one-to-one mapping to machine instructions (additionally there are assembler directives).  Let's look at an example:
arith.c: (compiled with gcc 4.0 -O3 -march=armv6)
(12345678 = 0xBC614E, 12320768 = 0xBC0000, 24832 = 0x6100, 78 = 0x4E)

int sub(int x) { return x - 12345678; }
int mul(int x) { return x * 12345678; }

arith.s: (With some .globl & .align directives stripped out to make it more readable.)

_sub:
    sub r0, r0, #12320768
    sub r0, r0, #24832
    sub r0, r0, #78
    bx  lr

_mul:
    ldr r3, L5
    mul r0, r0, r3
    bx  lr
L5:
    .long 12345678

In the case of the subtract, since the immediate does not fit within an immediate field of a subtract instruction, the C compiler has emitted multiple subtract instructions. In the case of the multiply, the compiler has chosen to store the constant in a constant pool (labelled L5), and has planted a load instruction to load the constant from the pool. These are two different instruction selection strategies, and a compiler may choose (as it has here) to use a mixture of both. The constant pool is not a mechanism implemented within the assembler, but instead is one of a number of techniques available to the compiler during instruction selection when generating code for which a single assembly operation is not available. The assembler is not aware of the structure or semantics of the constant pool - it only sees a PC-relative load instruction.
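The chunks gcc picked for the subtract can be reproduced mechanically: peel off the top 8 bits at an even bit position until nothing remains. This is a sketch of that greedy strategy (gcc's actual logic may differ):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Greedily split `value` into chunks that each fit an ARM immediate
// (an 8-bit field starting at an even bit position).
std::vector<uint32_t> splitIntoARMImmediates(uint32_t value)
{
    std::vector<uint32_t> chunks;
    while (value) {
        // Find the highest set bit.
        int hi = 31;
        while (!(value & (1u << hi)))
            --hi;
        // Place an 8-bit window whose top covers `hi`, aligned to an
        // even bit because ARM rotations come in steps of two.
        int base = hi - 7;
        if (base < 0)
            base = 0;
        if (base & 1)
            ++base;
        uint32_t chunk = value & (0xffu << base);
        chunks.push_back(chunk);
        value &= ~chunk;
    }
    return chunks;
}
```

Applied to 12345678, this yields exactly the three subtract operands above: 12320768, 24832, and 78.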

Layering the compiler on top of an assembler in this fashion provides a number of benefits.  For the compiler developer, layering the compiler on the assembler separates instruction selection from the minutiae of machine instruction encoding.  For clients of the compiler, providing a well-defined language for machine instruction generation is useful if the compiler provides facilities to bypass the higher-level language and directly emit a specific sequence of machine instructions (commonly through use of inline 'asm' statements in C code, and in our JIT through direct use of 'm_assembler').

The assembler interface within the JIT is designed to closely mimic that of the assembler layer in a traditional compiler, providing an interface largely made up of the set of assembler mnemonics of the host architecture, with a one-to-one mapping from calls to these functions to machine instructions being planted.  All instruction selection is performed within the MacroAssembler layer.  The MacroAssembler is a very simple compiler, mapping from one (generic) assembly-like language to another (concrete, machine-specific) assembly language.  We refer to it as a MacroAssembler since its capabilities are presently very limited: simply remapping mnemonic names (e.g. addPtr -> addq_rr), and expanding single MacroAssembler calls to multiple assembler operations (facilities typically within the capabilities of traditional macro-assembler languages).

However, we certainly expect the sophistication of the MacroAssembler layer to increase as necessary.  We tend to be conservative in our naming; bear in mind that we initially labelled the entire JIT as just a context-threaded interpreter, a much simpler form of dynamic code generation engine (and we still have the acronym 'cti' scattered through the code as a reminder!).

The key difference between the ARM port as it currently stands and the description above of the design of the layering in the JIT (and in a traditional compiler stack) is that the constant pool mechanism is below the assembler interface.  Correcting this so that it falls within the MacroAssembler layer (from where we can look at how to generalize the code to share implementation where possible and minimize unnecessary duplication when we introduce constant pools on other platforms) should only require a relatively minor restructuring of your code.  It should not require you to change the code your JIT generates, and it should not significantly change the implementation of the constant pool; it should just shuffle a few methods between files and change the classes upon which a few variables are declared.
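The remapping and expansion described here can be sketched as follows. The method names are modelled loosely on x86-64 mnemonics; the scratch register and the instruction log are purely illustrative, not WebKit's real classes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Assembler layer: one call plants exactly one instruction.
struct Assembler {
    std::vector<std::string> planted; // mnemonic log stands in for machine code
    void addq_ir(int64_t, int reg) { planted.push_back("addq $imm, r" + std::to_string(reg)); }
    void addq_rr(int src, int dst)
    {
        planted.push_back("addq r" + std::to_string(src) + ", r" + std::to_string(dst));
    }
    void movq_i64r(int64_t, int reg) { planted.push_back("movq $imm64, r" + std::to_string(reg)); }
};

// MacroAssembler layer: instruction selection happens here.  A single
// generic addPtr may expand to several assembler calls when the
// immediate does not fit the instruction's 32-bit operand field.
struct MacroAssembler {
    static constexpr int scratchRegister = 11; // assumed scratch, e.g. r11
    Assembler m_assembler;

    void addPtr(int64_t imm, int dst)
    {
        if (imm == static_cast<int32_t>(imm))
            m_assembler.addq_ir(imm, dst);
        else {
            m_assembler.movq_i64r(imm, scratchRegister);
            m_assembler.addq_rr(scratchRegister, dst);
        }
    }
};
```

Clients that need an exact instruction sequence can still call `m_assembler` directly, bypassing the selection logic, which mirrors the inline-'asm' escape hatch described above.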




