Writing a high-performance JavaScript engine takes more than just having a highly optimising compiler like TurboFan. Particularly for short-lived sessions, like loading websites or command line tools, there’s a lot of work that happens before the optimising compiler even has a chance to start optimising, let alone time to generate the optimised code.
This is the reason why, since 2016, we’ve moved away from tracking synthetic benchmarks (like Octane) to measuring real-world performance, and why since then we’ve worked hard on the performance of JavaScript outside of the optimising compiler. This has meant work on the parser, on streaming, on our object model, on concurrency in the garbage collector, on caching compiled code… let’s just say we were never bored.
As we turn to improving the performance of the actual initial JavaScript execution, however, we start to hit limitations when optimising our interpreter. V8’s interpreter is highly optimised and very fast, but interpreters have inherent overheads that we can’t get rid of; things like bytecode decoding overheads or dispatch overheads that are an intrinsic part of an interpreter’s functionality.
With our current two-compiler model, we can’t tier up to optimised code much faster; we can (and do) work on making the optimisation faster, but at some point you can only get faster by removing optimisation passes, which reduces peak performance. Even worse, we can’t really start optimising earlier, because we don’t yet have stable object shape feedback.
Enter Sparkplug: our new non-optimising JavaScript compiler we’re releasing with V8 v9.1, which nestles between the Ignition interpreter and the TurboFan optimising compiler.
The new compiler pipeline
Sparkplug is designed to compile fast. Very fast. So fast, that we can pretty much compile whenever we want, allowing us to tier up to Sparkplug code much more aggressively than we can to TurboFan code.
There are a couple of tricks that make the Sparkplug compiler fast. First of all, it cheats; the functions it compiles have already been compiled to bytecode, and the bytecode compiler has already done most of the hard work like variable resolution, figuring out if parentheses are actually arrow functions, desugaring destructuring statements, and so on. Sparkplug compiles from bytecode rather than from JavaScript source, and so doesn’t have to worry about any of that.
The second trick is that Sparkplug doesn’t generate any intermediate representation (IR) like most compilers do. Instead, Sparkplug compiles directly to machine code in a single linear pass over the bytecode, emitting code that matches the execution of that bytecode. In fact, the entire compiler is a switch statement inside a for loop, dispatching to fixed per-bytecode machine code generation functions.
// The Sparkplug compiler (abridged).
for (; !iterator.done(); iterator.Advance()) {
  VisitSingleBytecode();
}
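Conceptually, each visit is just a big switch over the current bytecode, dispatching to a fixed code generation function. As a minimal sketch of what that might look like (the bytecode names are real Ignition bytecodes, but the function names and structure here are illustrative, not V8’s actual internals):

// Illustrative sketch of the per-bytecode dispatch; simplified, not V8's
// actual code.
void VisitSingleBytecode() {
  switch (iterator.current_bytecode()) {
    case Bytecode::kLdaSmi: VisitLdaSmi(); break;
    case Bytecode::kAdd:    VisitAdd();    break;
    case Bytecode::kReturn: VisitReturn(); break;
    // ... one fixed machine code generation function per bytecode ...
    default: UNREACHABLE();
  }
}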
The lack of IR means that the compiler has limited optimisation opportunity, beyond very local peephole optimisations. It also means that we have to port the entire implementation separately to each architecture we support, since there’s no intermediate architecture-independent stage. But, it turns out that neither of these is a problem: a fast compiler is a simple compiler, so the code is pretty easy to port; and Sparkplug doesn’t need to do heavy optimisation, since we have a great optimising compiler later on in the pipeline anyway.
Technically, we currently do two passes over the bytecode — one to discover loops, and a second one to generate the actual code. We’re planning on getting rid of the first one eventually though.
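A rough sketch of that two-pass structure, with the same caveat that the names are made up for this sketch rather than taken from V8’s actual code:

// Illustrative two-pass structure; names are invented for this sketch.
void CompileFunction() {
  // Pass 1: walk the bytecode once just to discover loops, so the code
  // generation pass already knows where the loop headers are.
  for (; !iterator.done(); iterator.Advance()) {
    if (IsBackwardsJump(iterator.current_bytecode())) {
      MarkLoopHeader(iterator.GetJumpTargetOffset());
    }
  }
  iterator.Reset();
  // Pass 2: the single linear code generation pass shown above.
  for (; !iterator.done(); iterator.Advance()) {
    VisitSingleBytecode();
  }
}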
Adding a new compiler to an existing mature JavaScript VM is a daunting task. There are all sorts of things you have to support beyond just standard execution: V8 has a debugger, a stack-walking CPU profiler, there are stack traces for exceptions, integration into the tier-up, on-stack replacement to optimised code for hot loops… it’s a lot.
Sparkplug does a neat sleight-of-hand that simplifies most of these problems away: it maintains “interpreter-compatible stack frames”.
Let’s rewind a bit. Stack frames are how code execution stores function state; whenever you call a new function, it creates a new stack frame for that function’s local variables. A stack frame is defined by a frame pointer (marking its start) and a stack pointer (marking its end):
A stack frame, with stack and frame pointers
At this point, roughly half of you will be screaming, saying “this diagram doesn’t make sense, stacks obviously grow in the opposite direction!”. Fear not: whichever direction you prefer to draw your stacks in, everything that follows works the same way.
When a function is called, the return address is pushed to the stack; this is popped off by the function when it returns, to know where to return to. Then, when that function creates a new frame, it saves the old frame pointer on the stack, and sets the new frame pointer to the start of its own stack frame. Thus, the stack has a chain of frame pointers, each marking the start of a frame which points to the previous one:
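In other words, given any frame pointer you can follow the chain of saved frame pointers all the way up the stack. A minimal sketch of such a walk, assuming the conventional frame layout just described (this is not V8’s actual stack walker):

// Minimal sketch of walking the frame pointer chain; assumes the
// conventional layout described above (saved caller frame pointer at the
// frame pointer, return address just above it). Not V8's stack walker.
#include <cstdint>

struct FrameHeader {
  uintptr_t caller_fp;       // the saved frame pointer of the caller
  uintptr_t return_address;  // pushed by the call instruction
};

void WalkStack(uintptr_t fp) {
  while (fp != 0) {
    const FrameHeader* header = reinterpret_cast<const FrameHeader*>(fp);
    // header->return_address is where execution resumes in the caller.
    fp = header->caller_fp;  // follow the chain to the caller's frame.
  }
}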