Skip to content
This repository has been archived by the owner on Aug 2, 2019. It is now read-only.

B3 JIT (WebKit)

Kunshan Wang edited this page Feb 19, 2016 · 5 revisions

The B3 JIT compiler

Original blog: https://webkit.org/blog/5852/introducing-the-b3-jit-compiler/

WebKit's FTL JIT now uses a new backend called B3 (Bare Bone Backend), which replaces LLVM as the low-level optimizer. This layer shares many principles with the Mu micro VM, most notably the role as a low-level substrate of a virtual machine.

This Wiki page will discuss the B3 JIT and its influence to the Mu micro VM.

Comparison

motivation

  • B3: LLVM is powerful but slow. It takes up 3/4 of the optimisation time. B3 should be 10 times faster than LLVM while still operate the same level of granularity as LLVM and do the same kind of optimisations LLVM does.
  • Mu: Mu abstracts over concurrency, JIT and GC, but offloads most optimisations to the client.

size of the IR

  • B3: Focus on the size (memory footprint) of the IRs, and the number of IRs (levels). Much effort is made to design a compact representation that is also fast to traverse and transform.
  • Mu: Not designed with memory footprint in mind. The in-memory representation of the IR is an implementation detail.

type system

  • B3 IR: void, int32, int64, float, double. No heap references. No aggregate types. Integers are used as addresses.
  • Mu IR: Has object references, pointers, aggregate types, etc.

basic operations

  • Both B3 and Mu IR have BinOps, Comparisons, Conversions, Branch and Switch. Both B3 and Mu use the goto-with-values form (B3 uses "Upsilon functions" to "assign" values to Phi-nodes.).

Function call

  • B3: Can only call C functions (CCall). Custom calling conventions are possible via Patchpoints.
  • Mu: Can call both Mu functions (CALL) and C functions (CCALL). Mu implementations may support more calling conventions in addition to the C callconv, but the client cannot customise it.

Memory operations

  • B3: Load and Store with either Int32 or Int64 as addresses.
  • Mu: Load, Store, CmpXchg and AtomicRMW with iref<T> or ptr<T>. ptr<T> can be converted from int<n>.

GC

  • B3: Does not abstract over GC.
  • Mu: It does.

B3's patchpoint

In short, patchpoint is like inline assembly: it lets the client insert arbitrary machine code into the B3 IR.

Unlike LLVM, B3 patchpoints integrate deeply into the compiler, giving the client a MacroAssembler to populate the code and not limiting the size of patchpoints. Like GCC's inline assembly, patchpoints can also specify which SSA values (in B3 IR) are accessible in the machine code, and specify which machine registers those values should be in.

Patchpoint is a powerful mechanism. With the power to insert arbitrary machine code, the client can implement many advanced features, such as custom calling conventions, self-modifying code, and traps/watchpoints (which the blog refers to as "OSR").

Typically OSR points are placed after a guard. In LLVM and Mu, such pattern has to be expressed as a CMP, a BRANCH and a TRAP. B3 introduced the CheckXxx family of instructions:

  • Patchpoint(keepalives): (unconditionally) Execute custom machine code.
  • Check(cond, keepalives): Execute custom machine code when cond is false.
  • CheckAdd(op1, op2, keepalives): Execute custom machine code when op1+op2 overflows.
  • CheckSub, CheckMul: ditto

To program the patchpoint to be a TRAP, the client just needs to generate machine code that behaves like a TRAP, and use its arguments as keep-alive variables. The Check and CheckXxx instructions do not break the basic block, and are not allowed to fall through once the abnormal path is taken.

Patchpoints also allows reserving some NOPs so that it can be patched to a JMP or INT later. The original blog considers "code-patching" and "checking" simply as two ways to implement traps (JavaScript is single-threaded), while Mu consider WATCHPOINT as a mechanism to trigger traps between different threads and is not supposed to be used to trap the thread itself.

Influence to Mu

Conditional traps without breaking basic blocks

Mu was not designed with minimising the number of basic blocks in mind. The Check in B3 needed to be CMP, BRANCH2 and then TRAP. But a conditional trap can be introduced to the Mu IR: %rv = CONDTRAP %cond <T> KEEPALIVES(...). A flag can be used to indicate this instruction cannot continue: CONDTRAP [#NO_CONT] %cond <> KEEPALIVES(...). That flag will give undefined behaviours if the stack is re-bound without performing OSR to remove the current frame.

The #NO_CONT flag can be helpful for subsequent Mu IR instructions that can benefit from the fact that %cond is true. Currently we don't know any such Mu IR instructions, yet. Most relevant optimisations, such as eliminating NULL checks, can be done by the client.

Unreachable code

Currently the Mu IR does not have an "unreachable" instruction. The idiom is to throw an exception. But adding an "UNREACHABLE" instruction may be helpful to the backend.

BinOp with overflow handling

Binary operations may take a flag so that it will return additional boolean values to indicate certain events. For example:

(%z %sign_ovf) = ADD [#SIGN_OVF] <@i64> %x %y
CONDTRAP [#NO_CONT] %sign_ovf <> KEEPALIVES(%z)

If signed overflow occured, %sign_ovf will be 1 and the trap will be triggered.

TODO: Create an Issue for BinOp with flags.