A Beginner's Guide to SpiderMonkey's MacroAssembler

I recently had cause to start writing a good introduction to SpiderMonkey’s MacroAssembler. I’ve expanded it a bit and am reposting it here. Hopefully this will be the first in a couple of posts, but no promises!

The SpiderMonkey MacroAssembler is the primary interface for emitting code. For the most part, it tries to provide a hardware-agnostic interface for emitting native machine code.

Almost all of SpiderMonkey's JIT compilers (Baseline, IonMonkey, and CacheIR, for example) end up using the MacroAssembler to generate code.

In IonMonkey's case, JavaScript bytecode is converted to a MIR graph (a high-level graph representation of the program), which in turn is lowered into a LIR graph (a lower-level representation of the program, with register assignments). The LIR graph is then visited, and each node type has its code generation implemented using the MacroAssembler.
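
To make the "visit each LIR node and emit code for it" step a bit more concrete, here is a minimal sketch of that shape. Everything in it (ToyMasm, ToyLirNode, ToyCodeGenerator, and the text it "emits") is a hypothetical stand-in invented for illustration, not SpiderMonkey's real classes or real machine code.

```cpp
#include <string>
#include <vector>

// Hypothetical toy types; none of these are SpiderMonkey's real classes.
struct ToyMasm {
    std::vector<std::string> emitted;  // stands in for the machine code buffer
    void emit(const std::string& text) { emitted.push_back(text); }
};

enum class ToyLirOp { BooleanToString, Return };

struct ToyLirNode {
    ToyLirOp op;
    std::string input;   // register names already chosen by the allocator
    std::string output;
};

class ToyCodeGenerator {
  public:
    // Walk the LIR in order and dispatch on the node type.
    void generate(const std::vector<ToyLirNode>& lir) {
        for (const ToyLirNode& node : lir) {
            switch (node.op) {
              case ToyLirOp::BooleanToString:
                visitBooleanToString(node);
                break;
              case ToyLirOp::Return:
                visitReturn(node);
                break;
            }
        }
    }

    ToyMasm masm;

  private:
    // Each visit function only talks to the assembler, never to raw bytes.
    void visitBooleanToString(const ToyLirNode& node) {
        masm.emit("test " + node.input + ", " + node.input);
        masm.emit("select-string -> " + node.output);  // made-up pseudo-op
    }

    void visitReturn(const ToyLirNode&) { masm.emit("ret"); }
};
```

The point is only the structure: one visit function per node type, each of which describes its machine code through the assembler interface.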

If you've never seen anything like the MacroAssembler, it can be a bit baffling! I know I was definitely confused a bit when I started.

Let's use CodeGenerator::visitBooleanToString as a worked example to show what MacroAssembler looks like and how it functions.

visitBooleanToString emits code that converts a boolean value to a string value. It consumes a LIR node of type LBooleanToString, which is the LIR node type for that operation.

void CodeGenerator::visitBooleanToString(LBooleanToString* lir) {
    Register input = ToRegister(lir->input());
    Register output = ToRegister(lir->output());
    const JSAtomState& names = gen->runtime->names();
    Label true_, done;

    masm.branchTest32(Assembler::NonZero, input, input, &true_);
    masm.movePtr(ImmGCPtr(names.false_), output);
    masm.jump(&done);

    masm.bind(&true_);
    masm.movePtr(ImmGCPtr(names.true_), output);

    masm.bind(&done);
}

Let's go through this bit by bit:

  • Register input = ToRegister(lir->input());: At the top, we have two Register declarations. These correspond to machine registers (r11 or eax, for example, depending on the architecture). Here we are looking at the IonMonkey code generator, so the choice of which registers to use was already made by the IonMonkey register allocator. We simply take its decision: that is what the ToRegister(...) calls do.
  • const JSAtomState& names = gen->runtime->names();: This isn't really related to the MacroAssembler, but suffice it to say JSAtomState holds a variety of pre-determined names, and we're interested in the pointers to the true and false names right now.
  • Label true_, done;: Next we have the declaration of two labels. These correspond to the labels you would write if you were writing assembly by hand. A label isn't associated with a particular point in the code when it is created; that happens when you call masm.bind(&label). You can, however, branch or jump to a label even before it has been bound.
  • masm.branchTest32(Assembler::NonZero, input, input, &true_);: This corresponds to a test-and-branch sequence. In assembly, test usually means taking two operands and bitwise-ANDing them together in order to set the processor's flags. Effectively this says: branch to true_ if (input & input) != 0. Since input & input is just input, the branch is taken whenever input is non-zero.
  • masm.movePtr(ImmGCPtr(names.false_), output);: This moves a pointer value into a register. ImmGCPtr is a decoration that indicates a couple of things. First, we're moving the pointer as an immediate: that is to say, a constant that will be put directly into the code. Second, the GCPtr portion tells the system that this is a pointer managed by the garbage collector. We need to tell the MacroAssembler about this so it can remember the pointer and put it in a table for the garbage collector; that way, when a moving GC changes the address of this value, the garbage collector can update the pointer embedded in the code.
  • masm.jump(&done);: Unconditionally jump to the done label.
  • masm.bind(&true_);: Bind the true label. When something jumps to the true label, we want them to land here in the code stream.
  • masm.movePtr(ImmGCPtr(names.true_), output);: This moves a different pointer into the output register.
  • masm.bind(&done);: Bind the done label.
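
If it helps, the control flow described above can be modelled in plain C++, with goto standing in for the branches and C++ labels standing in for the bound Labels. This is only a sketch of what the emitted code computes, not of the emitter itself; kTrueName and kFalseName are hypothetical stand-ins for the names.true_ and names.false_ atom pointers.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the atom pointers names.true_ / names.false_.
static const char* const kTrueName = "true";
static const char* const kFalseName = "false";

// Models the control flow of the emitted code, one line per masm call.
const char* booleanToStringModel(uint32_t input) {
    const char* output;
    if ((input & input) != 0) goto true_;  // branchTest32(NonZero, input, input, &true_)
    output = kFalseName;                   // movePtr(ImmGCPtr(names.false_), output)
    goto done;                             // jump(&done)
true_:                                     // bind(&true_)
    output = kTrueName;                    // movePtr(ImmGCPtr(names.true_), output)
done:                                      // bind(&done)
    return output;
}
```

Reading the two side by side, each masm call in visitBooleanToString lines up with one statement or label here.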

The way to think of the MacroAssembler is that it's actually outputting code for most of these operations. (Labels turn out to be a bit magical, but it's OK not to think about that most of the time.)
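
For the curious, one common way an assembler can let you jump to a label before it is bound is to leave a placeholder in the code and patch it once the label's position is known. The sketch below shows that general technique with hypothetical ToyLabel and ToyAssembler types and a made-up two-instruction encoding; I'm not claiming this is how SpiderMonkey's Label is actually implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy types; SpiderMonkey's real Label and MacroAssembler differ.
struct ToyLabel {
    bool bound = false;
    std::size_t target = 0;                // only meaningful once bound
    std::vector<std::size_t> pendingUses;  // operand slots waiting for the target
};

class ToyAssembler {
  public:
    enum Op : int32_t { Nop = 0, Jump = 1 };

    void nop() { code_.push_back(Nop); }

    // Emit a jump: an opcode followed by one operand slot holding the target.
    void jump(ToyLabel* label) {
        code_.push_back(Jump);
        if (label->bound) {
            code_.push_back(static_cast<int32_t>(label->target));
        } else {
            label->pendingUses.push_back(code_.size());
            code_.push_back(-1);  // placeholder, patched in bind()
        }
    }

    // Associate the label with the current position and patch earlier jumps.
    void bind(ToyLabel* label) {
        label->bound = true;
        label->target = code_.size();
        for (std::size_t slot : label->pendingUses) {
            code_[slot] = static_cast<int32_t>(label->target);
        }
        label->pendingUses.clear();
    }

    const std::vector<int32_t>& code() const { return code_; }

  private:
    std::vector<int32_t> code_;
};
```

A jump to an unbound label records where its operand lives; bind() then fills in every recorded slot, so forward and backward jumps end up looking the same in the final code.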

So, what does this look like in actually emitted code? I added a masm.breakpoint() just before the branch, and ran the ion tests (../jit-test/jit_test.py --jitflags=all ./dist/bin/js ion). This found me one test case that actually exercised this code path: ../jit-test/jit_test.py --debugger=lldb --jitflags=all ./dist/bin/js ion/bug964229-2.js. I then disassembled the code with the LLDB command dis -s $rip -e $rip+40.

I've annotated this with the rough MacroAssembler that generated the code. The addresses that the jumps hit are those that the Labels got bound to. From the disassembly, we can infer that Ion chose Register input = edx and Register output = rax.

While our compilers use MacroAssembler today, we're also looking to a future where maybe we use Cranelift for code generation. This is already being tested for WASM. If using and improving Cranelift is of interest to you, there's a job opening today!

The next blog post about MacroAssembler I'd like to write will cover Addresses and Memory, if I ever get around to it :D