Finding hot allocation sites with bpftrace

So, last time I said I was hooked, and that I wrote an allocation profiler.

That’s true! And it was comparatively easy.

Starting from the baseline of SpiderMonkey which sleeps before exit from the last blog post, the profiler looks like this:

BEGIN
{
   printf("Attaching probes for %s", str($1))
}

uprobe:$1:cpp:"*gcprobes::CreateObject*"
{
   // 7 frames as the hook is usually quite deep, so we want
   // to see user code too. 
   @alloc[ustack(7)] = count()
}

uprobe:$1:cpp:JS_ShutDown
{
   print(@alloc);
}

END {
    // Don't print map on exit.
    clear(@alloc);
}

Simple, right? There are two wrinkles:

  1. To get gcprobes working in JIT code, which is where a lot of allocations happen, you have to compile with --enable-gcprobes, which short-circuits some JIT allocation paths. For my purposes this is fine, but it's a caution for production workloads.
  2. To make sure the symbol is findable by BPF, I again marked gcprobes::CreateObject as MOZ_NEVER_INLINE.

Smooth sailing. Except this wouldn’t be a fun blog post if there wasn’t a hiccup.

Here’s the output from a random workload:

@alloc[
    js::gc::gcprobes::CreateObject(JSObject*)+0
    js::NativeObject::create(JSContext*, js::gc::AllocKind, js::gc::Heap, JS::Handle<js::SharedShape*>, js::gc::AllocSite*)+694
    0x963f590dd53
    0x963f5954b40
    0x963f5952235
    0x963f595426d
    0x963f5952235
]: 1979955

Well that’s not great. Our call stack is from JIT code, so it’s hard to read.

As far as I can tell there's no builtin support in bpftrace for this. But SpiderMonkey does have some perf integration in the form of jitdump files, enabled by running with PERF_SPEW_DIR=<dir> and IONPERF=func.

So what if we used that file to map these addresses back? I could not for the life of me find a tool that would do this. But it turns out my colleague Markus has already written linux-perf-data, the Rust crate, which has support for JitDump parsing!

So with the help of AI... I hacked together [mgaudet/jitdump_filter](https://github.com/mgaudet/jitdump_filter), a totally untested rust tool for querying jitdumps and acting like c++filt for jit addresses.

Now... the readme really says it like it is:

Note: This is not a place of honour. No highly esteemed deed is commemorated here. Nothing valued is here. This code was hacked together with AI and is shared only to allow others to run it, rather than learn from it.

But it works.

I tweaked SpiderMonkey to dump the pid into the log file, then used a hacky bash script to post-process the BPF logs into something with symbols:

#!/bin/bash

BENCHMARKS=(
Air
# ...
)

DUMPER=$HOME/tmp/jitdump_filter

for BENCH in "${BENCHMARKS[@]}"; do
    echo "Running benchmark: $BENCH"
   # use env to allow setting environment variables under bpftrace
   sudo bpftrace ~/objectProfiler.bt /home/matthew/unified-git/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js -c "env PERF_SPEW_DIR=/tmp/perf/ IONPERF=func /home/matthew/unified-git/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js cli.js $BENCH"  | tee "allocs/${BENCH}-allocs"
   # Each file has a xxxpid=<pid> line inside; grab the pid:
   PID=$(cat allocs/${BENCH}-allocs | grep xxxpid | head -n 1 | sed 's/.*xxxpid=\([0-9]*\).*/\1/')
   echo "PID: $PID"

    echo 'Running cat allocs/${BENCH}-allocs | $DUMPER /tmp/perf/jit-${PID}.dump > allocs/${BENCH}-allocs-annotated.txt'
   cat allocs/${BENCH}-allocs | $DUMPER /tmp/perf/jit-${PID}.dump | tee allocs/${BENCH}-allocs-annotated.txt
done

Gross. But it worked great!

@alloc[
    js::gc::gcprobes::CreateObject(JSObject*)+0
    js::NativeObject::create(JSContext*, js::gc::AllocKind, js::gc::Heap, JS::Handle<js::SharedShape*>, js::gc::AllocSite*)+694
    0x963f590dd53 VMWrapper: NewPlainObjectOptimizedFallback
    0x963f5954b40 Ion: rewrite_args_nboyer (@evaluate:3624:35)
    0x963f5952235 Ion: rewrite_nboyer (@evaluate:3598:30)
    0x963f595426d Ion: rewrite_args_nboyer (@evaluate:3624:35)
    0x963f5952235 Ion: rewrite_nboyer (@evaluate:3598:30)
]: 1979955

I’m far less confident these allocation profiles will turn out to be actionable compared to the rooting ones, but I’m very happy with the process by which I got here!

Exploring a language runtime with bpftrace

So I have been having quite a bit of fun learning about eBPF. It’s been on my todo list for like two or three years, but I’ve finally made the time to actually start to figure it out, and have already found some neat stuff, and had to do some fun hacking.

Let’s set the stage.

eBPF is an instrumentation and tracing system built into Linux (and Windows these days!). The general thrust of it is that you provide the kernel with some bytecode, which it verifies, then executes in kernel context.

You’re able to collect all sorts of information via this bytecode, which then can be dumped out. There are multiple ways to get BPF bytecode into the kernel, but I’m going to talk about bpftrace, which provides an awk like language for writing BPF programs.

Now, there’s a fair amount of reasonably comprehensive documentation about bpftrace so I’ll give only the tiniest intro, and then get into some of the problems I’ve been curious about and the solutions I’ve built around BPF to get answers.

Your basic bpftrace program looks like this:

some:probe  /* Probe */
{
  /* Action List */
  @start[tid] = nsecs;
}

some:other:probe
/@start[tid]/ /* filtered by having a start timestamp with this tid that is nonzero */
{
  $duration = nsecs - @start[tid];
  print($duration);
  delete(@start[tid]); // remove the key entry so we can get the next duration.
}

A couple of things worth noting in this sketch:

  • Variables that start with @ are maps, associative keyed dictionaries. These are global. Variables that start with $ are locals to an action set.
  • /something/ is a filter applied on a probe
  • There are many builtin functions (print, delete) and ‘getters’ like nsecs, which returns a timestamp, and tid, which returns the thread id.

Rooted profiling

SpiderMonkey is a precisely rooted JS engine, so we have a Rooted type that informs the GC about C++ object pointers. One thing we've noticed is that occasionally the Rooted constructor shows up in profiles; yet it's also fast enough that it doesn't show up often. We also have a tool, RootedTuple, which allows us to reduce rooting cost in a single stack frame at the cost of slightly uglier code.

So I’ve been curious for a while about “Where are all these roots coming from?”, i.e., which source locations most frequently create Rooted values. That link points to a bug which hypothesizes we could do this with std::source_location in C++20, but given we don’t yet have C++20 in SpiderMonkey, I sort of stopped there. Until I started learning bpftrace, at which point I realized I probably could make this work.

My initial thinking looked roughly like this:

uprobe:<path to firefox>:Rooted::Rooted
{
    $filename = ???
    $lineno = ???
    @call_counts[$filename,$lineno]++
}

uprobe here is a “user probe”, which attaches to a function symbol in the program (as opposed to a kprobe, which is a kernel probe).

Of course, I couldn’t figure out how to get a filename or line number. Eventually I figured out that there’s a builtin function which behaves pretty much how I want: ustack(n), which returns the top n user stack frames and can be used as a key in a map.

For reasons I totally forget, I decided to instead target the registerWithRootLists function that Rooted constructors all actually call into. My first almost-working probe looked like this:

uprobe:/home/matthew/unified-git/obj-debug-shell-x86_64-pc-linux-gnu/dist/bin/js:*registerWithRootLists*
{
   @call[ustack(perf,3)] = count()
}


uprobe:/home/matthew/unified-git/obj-debug-shell-x86_64-pc-linux-gnu/dist/bin/js:cpp:JS_ShutDown
{
   printf("calling exit from ebpf");
   exit()
}

after I marked registerWithRootLists as MOZ_NEVER_INLINE.

This worked, but gave me an unsymbolicated result:

<a whole bunch of low counts> 
...
@call[
        55912bcaef60 0x55912bcaef60 ([unknown])
        55912bc97acd 0x55912bc97acd ([unknown])
        55912c1e7598 0x55912c1e7598 ([unknown])
]: 837
@call[
        55912bd662e0 0x55912bd662e0 ([unknown])
        55912bd476ac 0x55912bd476ac ([unknown])
        55912bf9b415 0x55912bf9b415 ([unknown])
]: 1401
@call[
        55912bd662e0 0x55912bd662e0 ([unknown])
        55912bde2918 0x55912bde2918 ([unknown])
        55912c053dbd 0x55912c053dbd ([unknown])
]: 1538

Eventually, with an incredibly helpful pointer from Zixian Cai on Mastodon, I discovered that 1) you need to explicitly print the map, and 2) you need to do it while the process is alive. Along with discovering I could generalize my probe file by using $1, my final probe file was:

uprobe:$1:*registerWithRootLists*
{
   @call[ustack(perf,3)] = count()
}


uprobe:$1:cpp:JS_ShutDown
{
   print(@call)
}

END {
    // Don't print map on exit.
    clear(@call)
}

which was run like this: sudo bpftrace rootTrace.bt programUnderTrace -c 'programUnderTrace …'. To make sure the program was kept alive, I made the process sleep for 2 seconds after the call to JS_ShutDown.

I then ran the profiler against every JetStream3 subtest, and produced a bunch of reports. I was then able to open a family of bug reports, highlighting places where, for example, we could remove 75,000,000 calls to registerWithRootLists by switching to RootedTuple.

This was a cool success, and I was high on it. So next I started writing an allocation profiler.

However that’s a blog post for later, as it involves some fun explorations with writing the tool you need, and this has gotten big enough.

A Performance Investigation Challenge

I really liked matklad’s Performance Visualization challenge (partially because it didn’t take me long to find the line with samply, which made me feel good).

Here’s a skills gap, or perhaps a research question: how do you identify an impactful but diffuse problem? I have a concrete example in mind.

So, nine months ago, trying to optimize a function in Speedometer 3, my colleague Iain Ireland dug through the generated assembly, and largely walked away uncertain as to what was going on.

Fast forward to one month ago: he landed a weirdly impactful patch. He removed an ‘optimization’ where we used to write the tag of a Value separately from its payload, in the cases where the Value representation let us do that. This change alone improved some benchmarks by 6-8%. The best hypothesis here is that on some x86 hardware this split write totally broke store forwarding, heavily neutering performance.

Now: once we had the hypothesis that this was a store forwarding problem, we were able to show that the patch reduced the number of failed store forwards using perf and performance counters.

The research / methodological question I pose here, however, is: how on earth do you find these sorts of problems without luckily ending up staring at them? One has to imagine there are other issues hanging out there, but I really have no idea how to find them.

Now, I have an unread copy of Systems Performance by Brendan Gregg sitting on my desk, and maybe the answer is in there, but I’m curious whether anyone has techniques or methodologies that have worked well for them, or if this is a research area that still needs work (Automated Mechanical Sympathy, anyone?)

Post-Publication Postscript:

Iain writes:

To be clear, I think we're relatively confident we understand the problem in retrospect. It's the load-store conflict problem described here: https://zeux.io/2025/05/03/load-store-conflicts/

In particular, if you search for "Indeed, if we check the Zen 4 optimization guide, we will see (emphasis mine)", there's a pull quote that says "The LS unit supports store-to-load forwarding (STLF) when there is an older store that contains all of the load’s bytes"

Which is precisely the thing we did not do. So I don't even think this is our "best hypothesis", I think it's just the answer.

Learning about C++ Direction Setting

Another blog post written wearing only my hat, rather than any Mozilla or TC39 Delegate related hats

Last year while considering some JavaScript standards evolution, I wanted to look into how C++ handles some of the challenges. I dug through some documents, and wanted to share some of my findings here as a pointer for future discussion.

Design Aims

One of the fascinating artifacts is a set of C++ design aims, located in “Notes on Operating Principles for Evolving C++”, Appendix A¹.

I love the opinionated nature of this list, some pieces of which you could lift wholesale into JS:

  • C++’s evolution must be driven by real problems.
  • It is more important to allow a useful feature than to prevent every misuse.
    • Enable good programming rather than to eliminate bad programming
  • If in doubt, pick the variant of a feature that is easiest to teach.

I personally really resonated with:

Prefer generality over specificity: prefer standardizing general building blocks on top of which domain-specific semantics can be layered, as opposed to domain-specific facilities on top of which other domain-specific semantics can't be layered

Which would support my insane quest for user defined primitives.

One recurring theme in the documents I read was the idea that C++ would require teaching and learning support, and that teaching and learning is a key principle when doing design for the language. This is something I’ve not heard made explicit very frequently in JS discussions.

The Direction Group

A fascinating aspect of the C++ committee is the existence of the Direction Group:

The direction group is a small by-invitation group of experienced participants who are asked to recommend priorities for WG21. Currently, that group consists of: Howard Hinnant, Roger Orr, Bjarne Stroustrup, Daveed Vandevoorde, and Michael Wong. Their charter includes setting forth a group opinion on:

  • Evolution direction (language and library): This includes both language and library topics, and includes both proposals in hand and proposals we do not have but should solicit. The direction group maintains a list of the proposals it considers the most important for the next version of C++ or to otherwise make progress such as in a TS, and the design group chairs use that list to prioritize work at meetings. Typically, work on other topics will occur after there’s nothing further left to do at this meeting to advance the listed items.
  • Providing an opinion on any specific proposal: This includes whether the proposal should be pursued or not, and if pursued any changes that should be considered. Design group participants are strongly encouraged to give weight to an opinion that the direction group feels strongly enough about to suggest.

They have produced a document, “Directions for ISO C++” which functions as both a highlighter of many of the legitimate challenges of C++ design-by-committee, as well as a finger-on-the-scale to attempt to drive priorities.

They explicitly call out some time scales when discussing their priorities:

  • Long Term (Decades)
  • Medium Term (3-10 Years)
  • Short term (The next few releases)

I find this notion fascinating, and perhaps something that is worth trying to work into TC39.

I really appreciate the pragmatic discussions in this document as well, much of which writes down things we see in JS, but perhaps don’t document nearly as explicitly. Apologies for the length of the quote, but I think it’s good:

All proposals consume the (limited) committee time, and WG21 members should consider the best overall outcome for the future of the language. Hence, while small proposals to clean up non-trivial defects are welcome, discussion about these may have lower priority. If such a small proposal proves to be controversial, it is probably better to withdraw or defer it to avoid preventing progress on more substantive items.

We are a set of interrelated committees currently with about 200 members present at a meeting and more active via the Web. Thus some “design by committee,” or rather “design by committees,” is unavoidable. We need to consciously and systematically try to minimize those effects by building a shared sense of direction.

  • We have no shared aims, no shared taste.
    • This is a major problem, possibly the most dangerous problem we face as a committee. For C++ to succeed, we must overcome that. For starters, we – as individuals, as SGs, as WGs, and as the committee as a whole – must devote more time to develop and articulate a common understanding. In particular, for every proposal being considered, we should do more to emphasize the motivation and explain how it fits in the language, standard library, and common uses that we anticipate.
  • The alternative is a dysfunctional committee producing an incoherent language
    • We need to be more explicit about:
      • What general problems we are trying to address
      • How a particular proposal serves those articulated aims

I don’t know how the chairs manage these sorts of issues at TC39, but I really appreciate the candour here – similarly with their frank discussion of process issues:

We are “a bunch of volunteers.”

  • Most are enthusiasts for some aspect of the language or other.
  • Few have a global view (geographically, C++ community, C++ usage, C++ Language and standard library).
  • Most have a strong interest in only some subset of use, language, or library.
  • Most are deeply engaged with a specific form of C++ as part of their daily work.
  • Our levels and kinds of relevant computer-science, design, and programming-language education vary dramatically.
  • Many are clever people attracted to clever solutions.
  • Some are devoted to ideas of perfection.
  • Many are tool builders.
  • Some have full control over their source code whereas others critically rely on code outside their direct control (open-source and/or commercial).
  • Some operate in an environment with strong management control, others in environments without a formal management structure (and everything in-between).

This implies that we can’t rely on a common vocabulary, a core set of shared values, a common set of basic ideals, or a common understanding of what’s a problem. Consequently, we must spend more effort on:

  • Articulating rationales for proposals.
  • Facilities for the “average programmer,” who is seriously under-represented on the committee.
  • Facilities aimed more at application builders than at builders of foundational libraries.

Please pay special attention to that last point. We feel that C++’s utility and reputation suffer badly from the committee lacking attention to improvements for relative novices and developers with relatively mundane requirements. Remember:

  • Most C++ programmers are not like the members of the committee.

    We, as a committee, have no mechanism of reward (except accepting someone’s proposal) or punishment (except delaying or rejecting someone’s proposal). To get something accepted, we need consensus (defined as large majority, but not necessarily unanimity). This has implications on what we can do and how we do it.

  • Nothing gets done unless someone cares enough to do it.

  • A small vocal minority can stop any proposal at any stage of the process.

The majority of this will feel familiar to those working with TC39, in my opinion.

The authors of the Directions paper also explicitly highlight the importance of trust in the process, and the failures that have caused loss of trust. These will sound familiar to TC39 participants – the broader committee (plenary) stomping all over a working group's (champion group's) careful design work, late-breaking feedback, and failure of consensus based on “lack of comfort”.

I do feel the pain behind this sentence, “People who were not sufficiently motivated to take part in the discussion should feel obliged to at least stay neutral”, though I think with the breadth of C++ it understates the amount of effort required to stay on top of everything.

Some of the guidance from the direction group… I wish we had anything equivalent in TC39:

When triaging features for consideration, we propose that the WG chairs put a higher priority on features that conform to these goals, but also keep a watch against features that:

  • Turn C++ into a radically different language
  • Turn parts of C++ into a significantly different language by providing a segregated sub-language
  • Have C++ compete with every other language by adding as many as possible of their features
  • Incrementally modify C++ to support a whole new “paradigm” without articulating the end goal
  • Hamper C++'s use for the most demanding systems programming tasks
  • Increase the complexity of C++ use for the 99% for the benefit of the 1% (us and our best friends)

The misfires of TC39, in my opinion, are often covered by exactly these sorts of comments.

The C++ Programmer’s ‘Bill Of Rights’

This is a peculiar section, but I think it's interesting as a thought experiment: what would a JS programmer's 'bill of rights' look like? We know there are some aspects of this already:

  • Don’t break the web.
  • ???

But what does the rest of this look like?

Acknowledgements:

Thanks to Botond Ballo for an email conversation which guided me to many of these sources. Much appreciated.


¹: The C++ Committee’s love of PDFs is truly baffling and an absolute pain in the butt. Most of these quotes were massaged back into semi-coherent formatting by copying garbage out of a PDF and then having ChatGPT recreate some semblance of the original. As a result, take these quotes as having statistical similarity to the original document, but not 100% coherence. However, I link to the source in case there’s wording concerns.

Upgrade issue with jujutsu

Trying to update Jujutsu, I had a heck of a time. A previously working command stopped working. Neither running in a different directory nor updating Rust worked. The issue is this commit in the Jujutsu repo.

The symptom is:

matthew@ZenTower:~$ cargo binstall --strategies crate-meta-data jj-cli
 WARN Failed to retrieve token from `gh auth token` err=Os { code: 2, kind: NotFound, message: "No such file or directory" }
 WARN Failed to read git credential file
 INFO resolve: Resolving package: 'jj-cli'
ERROR Fatal error:
  × For crate jj-cli: Failed to parse cargo manifest: TOML parse error at line 49, column 12
  │    |
  │ 49 | resolver = "3"
  │    |            ^^^
  │ unknown variant `3`, expected `1` or `2`
  │ 
  ├─▶ Failed to parse cargo manifest: TOML parse error at line 49, column 12
  │      |
  │   49 | resolver = "3"
  │      |            ^^^
  │   unknown variant `3`, expected `1` or `2`
  │   
  ├─▶ TOML parse error at line 49, column 12
  │      |
  │   49 | resolver = "3"
  │      |            ^^^
  │   unknown variant `3`, expected `1` or `2`
  │   
  ╰─▶ TOML parse error at line 49, column 12
         |
      49 | resolver = "3"
         |            ^^^
      unknown variant `3`, expected `1` or `2`

It turns out the one thing I didn't update was cargo-binstall, so running cargo install cargo-binstall to update it fixed this.

32 Bit Build of SpiderMonkey -- First in a while

Some quick notes on doing a recent 32 bit build of SpiderMonkey

Mozconfig:

ac_add_options --target=i686-pc-linux

ac_add_options --enable-application=js

ac_add_options --disable-optimize
ac_add_options --enable-debug
ac_add_options --enable-ccache=sccache
ac_add_options --disable-tests


# Dump opt builds into another dir.
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-debug-shell-@CONFIG_GUESS@

Machine needs to have 32-bit packages available:

  • sudo dpkg --add-architecture i386 && sudo apt-get update

Need 32-bit zlib:

  • sudo apt-get install zlib1g:i386

User Defined Primitives: A Sketch

The following is written in my personal capacity, and does not represent the views of the SpiderMonkey team, Mozilla, TC39, or any other broader group than just what's between my ears. I wear no hats here.

I want to propose a little thought experiment for the JS language. It’s pretty janky, there’s lots to quibble about, but I find myself coming back to this again and again and again.

tl;dr: We should add user-defined primitives to JS.

Motivation

Unless you’ve been in my brain (or suffered me talking about this in person), you’re probably wondering: What does that even mean?

Let’s take a step back. Working on a JS engine I get to watch the language evolve. JavaScript evolves by proposal, each going through a set of stages.

What I have seen over my time working on JS is new proposals coming from the community that want to add new primitives to the language (BigInt, Decimal, Records & Tuples).

New proposals want to add primitives for a variety of reasons:

  • They want operator overloading
  • They want identity-less objects
  • They want a divorce from the object protocol

Of course, as soon as a proposal has this shape, it has to come to TC39, because this isn’t the kind of thing that can just ship as a library.

There's a problem here though: engine developers are reluctant to add new primitives. Primitives have a high implementation cost, and there's concern that this ends up adding branches to fast paths, causing performance regressions for code that doesn't ever interact with this feature. For those who think Decimal would be awesome, it seems an easy price to pay, but for those who will never use it, it would sort of suck for the internet to get just a -little- bit slower so that people could have fewer libraries and easier interoperability. Furthermore, each new primitive takes a type tag in our value representations... these are a limited resource, barring a huge redesign of JS engines.

There’s another problem here: TC39 is required to gatekeep all these things. This makes the evolution and exploration of the solution space slow and fraught. Even more scary: What if TC39 gets the design wrong, and as a result we don’t see a feature actually getting used, or it is forever replaced by some better polyfill?

Some people, both outside and inside the JS community, make fun of the “Framework of the Week” -- the fast churn that happens in the JS community. The thing is, I think this churn is actually fantastic! We want language communities to be able to explore solutions, find trade-offs, discover novel approaches, etc. Forcing everything through the narrow waist of TC39 is sometimes a technical necessity, but it also deprives the community of one of its strengths.

So what do we do about it? User Defined Primitives: JavaScript authors should be able to write their own primitive types and ship them as libraries to JS engines. The manifesto should be something along the lines of “Why just Decimal? Why not Rational? Why not vec3? vec4? ImmutableSet?”

What does success look like?

The JS community gets to write hundreds of libraries exploring the whole space of primitives. We see these libraries evolve, fork and split, finding good points on the trade-off space.

Ideally, we’d even see some user-defined primitives become so popular, so intrinsic to the way that people write JS, that we decide to simply re-host those user-defined intrinsics right into the official JS spec.

Sure... but how?

So it is simple enough to have a rallying cry. The reason I’m writing this however is that, as an engine implementer, I have a sketch of how I think we could get this to work. I’ll come to some gaps shortly.

One thing that I keep in the back of my head about this work is: I think a successful design for this would provide a pathway for BigInt to be re-specified as a “Specification Defined Primitive”. Essentially, if you could do user-defined primitives in the past, before BigInt, then you could have simply had BigInt be a library, that perhaps later got some special syntactic sugar (n suffix).

A Strawman Design

So let's build out a strawman design. The biggest point I want to make is that I think all the pieces are technically feasible; the challenge is agreeing on how to make this all look.

As a preamble: Proxy is a type that exposes to users the internal protocol of Objects from the specification (sometimes called the meta-object protocol). Proxies have traps so that users can customize what happens during the execution of this internal protocol.

What if we specified the operations on primitives in terms of a “primitive operation protocol”, and then specified user-defined primitives as special kinds of data that have a user-implemented primitive operation protocol?

One important thing I tried to avoid was saying that these user-defined primitives are Objects. I actually think it's important that they're not Objects; that they exist in a totally different space than Objects. In my opinion this makes things like operator overloading much more manageable, as you would only ever allow operator overloading in terms of these Primitives. It also means only Primitives pay the cost of operator overloading, as you already have to type check at each operator to choose the right semantics.
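
(To see why only Primitives would pay that cost, here's a rough sketch of the kind of dispatch an engine's addition already does today, with the user-defined case as just one more arm. This is illustrative pseudocode; isUserPrimitive and callTrap are names I'm making up for this post, not anything real.)

function addOperator(lhs, rhs) {
  // These type checks already exist in every engine's '+' implementation.
  if (typeof lhs === "number" && typeof rhs === "number") return lhs + rhs;
  if (typeof lhs === "string" || typeof rhs === "string") return String(lhs) + String(rhs);
  // The only new branch: hand off to the user-defined 'add' trap.
  if (isUserPrimitive(lhs)) return callTrap(lhs, "add", lhs, rhs);
  // ... the remaining existing paths (BigInt, ToPrimitive on objects, etc.) ...
}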

So what does this look like? Let’s start with a fairly gross version of this, where we set all the moving pieces out explicitly.

Suppose we had a Primitive constructor that worked a lot like Proxy, in that it takes a trap object. Unlike Proxy, however, what this constructor returns is a function which, when invoked, returns primitives with the traps installed. For performance, traps should be resolved eagerly and be immutable.

let traps = { 
  // We'll discuss the contents of this in a second.
}

let Vec3 = new Primitive(traps); 

let aVec = Vec3(0, 0, 1);
let bVec = Vec3(0, 0, 1); 
(aVec === bVec) // true -- no identity!

Off to a good start. Next, let's start designing the trap set.

Well, let’s start with the most basic: What do you do with the arguments to the construction function, and what’s the actual data model here?

To avoid having identity, we need to define the == and === operators to compare these elements somehow. This means we have to define a data-model for primitives. I’d propose the following:

A Primitive is a collection of data with named ‘slots’, where primitives are always compared by slot-wise comparison.

In a JS engine implementation a Primitive is a special type, with a single type tag.
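
Conceptually, the engine-side comparison that data model implies could be as simple as the sketch below. (shapeOf, slotNamesOf and getSlot here are stand-ins I'm inventing for internal operations, not proposed API.)

function primitiveEquals(a, b) {
  // Primitives built by different Primitive constructors never compare equal.
  if (shapeOf(a) !== shapeOf(b)) return false;
  // Otherwise, compare every named slot.
  for (const name of slotNamesOf(a)) {
    if (getSlot(a, name) !== getSlot(b, name)) return false;
  }
  return true;
}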

So let’s extend our traps object to provide ‘construction’ mechanics.

let traps  = {
  constructor(primitive, x,y,z) {
    // Design Question: Do we always provide getters for slots?
    // Suppose for the purposes of this post, we do. 
    Primitive.setSlot(primitive, 'x', x);
    Primitive.setSlot(primitive, 'y', y);
    Primitive.setSlot(primitive, 'z', z);
  }
}

Now a Vec3 has 3 slots to store data in, named x, y, z. In a JS engine, a Primitive right now consists of

  • An internal slot which holds all the processed trap functions as well as the count and names of the slots. This would look and work a lot like a Shape (SM) or Map (V8), concepts which already exist
  • A number of slots which hold JS values.

So far in our story, the following should work:

const pos = Vec3(1, 2, 3)
const target = Vec3(100, 100, 0);

function distance(v1, v2) {
    return Math.sqrt(
        (v2.x - v1.x) * (v2.x - v1.x)
        + (v2.y - v1.y) * (v2.y - v1.y)
        + (v2.z - v1.z) * (v2.z - v1.z)
    );
}

let distanceToTarget = distance(pos, target);

What else would we need? Well, here’s where we can add operator overloading. Of course, we do this by adding another trap:

add: (a, b) => {
      // Design Question: What do you do with mixed types? 
      // It's tempting to just say throw, but it would 
      // foreclose on the possibilty of scaling a 
      // vector by N * Vec3(a, b, c);
      return Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
    },

Write the rest of the definitions for common math and this should work:

let product = Vec3(1, 2, 4) * Vec3(1, 2, 3);
// Using distance-from-0,0,0 as a poor-man's .magnitude() function.
let mag = distance(product, Vec3(0, 0, 0));
let scale = Vec3(1 / mag, 1 / mag, 1 / mag)
let norm = product * scale;
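
For concreteness, the mul trap that the * above relies on could look just like add did. (The trap name and the element-wise semantics are my assumptions for this sketch; nothing above pins them down.)

mul: (a, b) => {
      // Same design question as with 'add': what should Number * Vec3 mean?
      return Vec3(a.x * b.x, a.y * b.y, a.z * b.z);
    },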

Now, this still makes you program with free functions for anything beyond the regular operators.

Of course, we could then define Object Wrappers for these user-defined primitives if we wanted to, potentially allowing someone to write Vec3(1, 3, 4).magnitude().
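
A loose sketch of what that could look like, by analogy with how Number and String box into wrapper objects. Everything here (Primitive.definePrototype, the method body) is invented for illustration; the post deliberately leaves this part of the design open.

Primitive.definePrototype(Vec3, {
  magnitude() {
    return Math.sqrt(this.x * this.x + this.y * this.y + this.z * this.z);
  },
});

Vec3(1, 3, 4).magnitude(); // sqrt(26), roughly 5.1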

Ok... it’s a sketch. Kinda thin.

Well, yes. I have hemmed and hawed about this idea for more than two years now. I kept thinking “Y’know, we have all the infrastructure to experiment with this, you could build a prototype”, and kept not finding the time.

I’m publishing this blog post for two reasons:

  1. I want to stop thinking about this and ‘close the open loop’ in my brain.
  2. I hope it sparks some conversation. I see this as a totally viable way for the JS community to get loads of things it wants, while getting engine developers and TC39 out of the way.

Ok... sounds good, what’s left to do?

This really is a sketch. There are so many design decisions left unsaid here:

  1. Syntactically, what does a user-defined primitive look like? Should you only provide these via a new class-like syntax sugar? (e.g. value Vec3 { add(){} })

  2. How do you handle variable-length primitives? It would be amazing if you could make Records and Tuples work here, but both are arbitrarily sized. Maybe that’s fine?

    recordTraps = {
         constructor(primitive, ...args) {
             // The nice thing here is that it makes comparisons between records and tuples
             // of different lengths quick, because the "PrimitiveShape" you'd likely want
             // would be a fast path
             for (const [k, v] of args) {
                 Primitive.setSlot(primitive, k, v);
             }
         }
     }
  3. Speaking of Tuples... do you allow slot indexes? In general? Opt in?

  4. How do you handle realms? Primitives can be shipped freely between realms, but presumably the code which powers a primitive is stuck in one realm? This is actually being discussed in committee already to handle the same problem in Shared Structs.

  5. Speaking of... Structs really overlap a lot with this. Too much? Maybe? Can we do anything to make this work with them nicely?

  6. Similarly, Structs is trying to answer the question: How do you give these things methods? Do Primitives get object wrappers?

  7. Do we allow user-defined primitives to store object references, or do we mandate they don't; or is this part of the primitive protocol?

  8. JSON.stringify, toStringTag, etc. etc.

  9. Structured Cloning.

Pushing this to completion would be a huge undertaking -- almost certainly years and years of standards work. Yet, I think it would be a fascinating and worthwhile one.

Brian Goetz recently gave a talk called “Postcards from the Peak of Complexity”. He highlights how shipping too soon could have bad effects, but by waiting and working through issues you really could find a simple core. I think that simple core is in here.

If I were to tackle this, I’d approach the problem from two ends:

  1. I’d really start building the prototype for what these things look like. Get a playground up, let people get a feel for what works and what’s rough, and explore.
  2. At the same time, however, I'd make a serious effort at using BigInt as a guide for how you would include these things in the specification -- coming back to my success criteria, I think if BigInt can be specified as a built-in Primitive type, you've really won here.

Why aren’t you presenting this at plenary?

I don’t think I have enough buy in for this idea to come to plenary, both within Mozilla, and without. So for now, let’s talk about this like it’s a blog post and not a real proposal. If it ever became a real proposal maybe I’d present.

Matt, you’re describing Value Types from Java.

Yes, thank you for noticing a heavy influence. Value types in Java have a fantastically pithy phrase which captures some of the power here: Codes like a class, works like an int.

I’ve been too far away from the Java ecosystem to really have strong opinions about them. I also think ultimately JS primitives would look different than Java Value types, if only because JS is a dynamically typed language, but I do think it’s a huge inspiration to this post.

Conclusion

I want to challenge the JS Community to think big about how the language could evolve to solve real challenges. I think this sketch provides a possible path forward to solve a whole bunch of problems in the language ecosystem. However, I'd be lying if I said I wasn't scared of the long tail of complexity this would potentially entail.

Any background reading? Watching?

Java

JavaScript

Acknowledgements

  • Thanks to Ben Visness for helping workshop out a Vec3 type, and reading drafts of this.
  • Thanks to Iain Ireland, Eemeli Aro for providing draft feedback.
  • Thanks to Steve Fink for catching post-publication errors :)
  • Thanks to the SpiderMonkey team for letting me ramble about this stuff.
  • Thanks to Nicolo Ribaudo and Tim Chevalier from Igalia for their prototyping work on Records and Tuples which inspired a lot of this thinking.
  • Thanks to anyone who has listened to me about this!

Making Teleporting Smarter

(This is a republish of an article posted at SpiderMonkey.dev for my own archiving purposes)

Recently I got to land a patch which touches a cool optimization that I had to really grok. As a result, I wrote a huge commit message. I'd like to expand that message a touch here and turn it into a nice blog post.

This post assumes roughly that you understand how Shapes work in the JavaScript object model, and how prototype-based property lookup works in JavaScript. If you don't understand that just yet, this blog post by Mathias Bynens is a good start.

This patch aims to mitigate a performance cliff that occurs when we have applications which shadow properties on the prototype chain or which mutate the prototype chain.

The problem is that these actions currently break a property lookup optimization called "Shape Teleportation".

What is Shape Teleporting?

Suppose you’re looking up some property y on an object obj, which has a prototype chain with 4 elements. Suppose y isn’t stored on obj, but instead is stored on some prototype object B, in slot 1.

In order to get the value of this property, officially you have to walk from obj up to B to find the value of y. Of course, this would be inefficient, so what we do instead is attach an inline cache to make this lookup more efficient.
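
To make the setup concrete, here's an illustrative chain in plain JS (the names match the guards listed below; whatever sits past B doesn't matter for this example):

const B = { y: 42 };           // y lives on B (in what the post calls slot 1)
const C = Object.create(B);
const D = Object.create(C);
const obj = Object.create(D);  // obj -> D -> C -> B

obj.y; // 42, found only after walking up the chain to B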

Now we have to guard against future mutation when creating an inline cache. A basic version of a cache for this lookup might look like:

  • Check obj still has the same shape.
  • Check obj‘s prototype (D) still has the same shape.
  • Check D‘s prototype (C) still has the same shape
  • Check C’s prototype (B) still has the same shape.
  • Load slot 1 out of B.

This is less efficient than we would like, though. Imagine if, instead of having 3 intermediate prototypes, there were 13 or 30. You'd have this long chain of prototype shape checking, which takes a long time!

Ideally, what you’d like is to be able to simply say

  • Check obj still has the same shape.
  • Check B still has the same shape
  • Load slot 1 out of B.

The problem with doing this naively is: what if someone adds y as a property to C? With the faster guards, you'd totally miss that value and, as a result, compute the wrong result. We don't like wrong results.
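
Using the objects from the sketch above, the hazard looks like this:

C.y = "shadow";  // the correct answer for obj.y now comes from C...
obj.y;           // ...but an IC that only checked obj's and B's shapes would still return 42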

Shape Teleporting is the existing optimization which says that so long as you actively force a change of shape on objects in the prototype chain when certain modifications occur, then you can guard in inline-caches only on the shape of the receiver object and the shape of the holder object.

By forcing each shape to be changed, inline caches which have baked in assumptions about these objects will no longer succeed, and we'll take a slow path, potentially attaching a new IC if possible.

We must reshape in the following situations:

  • Adding a property to a prototype which shadows a property further up the prototype chain. In this circumstance, the object getting the new property will naturally reshape to account for the new property, but the old holder needs to be explicitly reshaped at this point, to avoid an inline cache jumping over the newly defined prototype.

  • Modifying the prototype of an object which exists on the prototype chain. For this case we need to invalidate the shape of the object being mutated (natural reshape due to changed prototype), as well as the shapes of all objects on the mutated object’s prototype chain. This is to invalidate all stubs which have teleported over the mutated object.

Furthermore, we must avoid an "A-B-A" problem, where an object returns to a shape prior to prototype modification: for example, even if we re-shape B, what if code deleted and then re-added y, causing B to take on its old shape? Then the IC would start working again, even though the prototype chain may have been mutated!

Prior to this patch, Watchtower watches for prototype mutation and shadowing, and marks the shapes of the prototype objects involved in these operations as InvalidatedTeleporting. This means that property access involving these objects can no longer rely on the shape teleporting optimization. This also avoids the A-B-A problem, as new shapes will always carry along the InvalidatedTeleporting flag.

This patch instead chooses to migrate an object shape to dictionary mode, or generate a new dictionary shape if it's already in dictionary mode. Using dictionary mode shapes works because all dictionary mode shapes are unique and never recycled. This ensures the ICs are no longer valid as expected, as well as handily avoiding the A-B-A problem.

The patch does keep the InvalidatedTeleporting flag to catch potentially ill-behaved sites that do lots of mutation and shadowing, avoiding having to reshape proto objects forever.

The patch also provides a preference to allow cross-comparison between the old and new behaviour; however, it defaults to dictionary mode teleportation.

Performance testing on micro-benchmarks shows a large impact, by allowing ICs to attach where they couldn't before; however, Speedometer 3 shows no real movement.

5 Years of Compiler Jobs

Turns out I missed the 5 year anniversary of my CompilerJobs page by a little more than a week.

From the initial commit, many helpful contributors and I have:

  • Made 308 commits
  • Listed 196(!!) companies, not including companies that have come and gone
  • Earned 543 GitHub stars

It’s my sincere hope that it’s lived up to my original goal: inspiring people to pursue the area by helping them realize that there are way more teams and roles in this space than you might imagine at the beginning of your career.

I really don’t know how many people this has helped find a job. I’d love to hear from you if it was helpful. Tell me about how it helped you!

Here’s to another 5 years of connecting people to the world of working in programming languages, compilers and runtime systems.

Dump the commit contents into the editor when editing a Jujutsu commit

I’m so happy with this config that I need to share it even though I’ve only been using this for a few hours. Put this in ~/.config/jj/config.toml.

[templates]
draft_commit_description = '''
concat(
  description,
  surround(
    "\nJJ: This commit contains the following changes:\n", "",
    indent("JJ:     ", diff.stat(72)),
  ),
  surround("\nJJ: Diff:\n", "", indent("JJ:  ", diff.git(4)))
)
'''

It’s awesome. Now editing a commit message looks like this.

Bug 1930698 - Add invalidation logging for fuses r?iain

Differential Revision: https://phabricator.services.mozilla.com/D228689

JJ: This commit contains the following changes:
JJ:     js/src/vm/InvalidatingFuse.cpp | 6 ++++++
JJ:     js/src/vm/Logging.h            | 1 +
JJ:     2 files changed, 7 insertions(+), 0 deletions(-)

JJ: Diff:
JJ:  diff --git a/js/src/vm/InvalidatingFuse.cpp b/js/src/vm/InvalidatingFuse.cpp
JJ:  index 6fbc9b2aa0..e7b541ff20 100644
JJ:  --- a/js/src/vm/InvalidatingFuse.cpp
JJ:  +++ b/js/src/vm/InvalidatingFuse.cpp
JJ:  @@ -10,8 +10,9 @@
JJ:   #include "jit/Invalidation.h"
JJ:   #include "jit/JitSpewer.h"
JJ:   #include "vm/JSContext.h"
JJ:   #include "vm/JSScript.h"
JJ:  +#include "vm/Logging.h"
JJ:   
JJ:   #include "gc/StableCellHasher-inl.h"
JJ:   #include "vm/JSScript-inl.h"
JJ:   
JJ:  @@ -33,8 +34,10 @@
JJ:   
JJ:   void js::InvalidatingRuntimeFuse::popFuse(JSContext* cx) {
JJ:     // Pop the fuse in the base class
JJ:     GuardFuse::popFuse(cx);
JJ:  +  JS_LOG(fuseInvalidation, mozilla::LogLevel::Verbose,
JJ:  +         "Invalidating fuse popping: %s", name());
JJ:     // do invalidation.
JJ:     for (AllZonesIter z(cx->runtime()); !z.done(); z.next()) {
JJ:       // There's one dependent script set per fuse; just iterate over them all to
JJ:       // find the one we need (see comment on JS::Zone::fuseDependencies for
JJ:  @@ -70,8 +73,11 @@
JJ:       // before calling invalidate.
JJ:       if (script->hasIonScript()) {
JJ:         JitSpew(jit::JitSpew_IonInvalidate, "Invalidating ion script %p for %s",
JJ:                 script->ionScript(), reason);
JJ:  +      JS_LOG(fuseInvalidation, mozilla::LogLevel::Debug,
JJ:  +             "Invalidating ion script %s:%d for reason %s", script->filename(),
JJ:  +             script->lineno(), reason);
JJ:         js::jit::Invalidate(cx, script);
JJ:       }
JJ:     }
JJ:   }
JJ:  diff --git a/js/src/vm/Logging.h b/js/src/vm/Logging.h
JJ:  index f4b63e3773..a593c249bd 100644
JJ:  --- a/js/src/vm/Logging.h
JJ:  +++ b/js/src/vm/Logging.h
JJ:  @@ -83,8 +83,9 @@
JJ:   
JJ:   #define FOR_EACH_JS_LOG_MODULE(_)                                            \
JJ:     _(debug)                /* A predefined log module for casual debugging */ \
JJ:     _(wasmPerf)             /* Wasm performance statistics */                  \
JJ:  +  _(fuseInvalidation)     /* Invalidation triggered by a fuse  */            \
JJ:     JITSPEW_CHANNEL_LIST(_) /* A module for each JitSpew channel. */
JJ:   
JJ:   // Declare Log modules
JJ:   #define DECLARE_MODULE(X) inline constexpr LogModule X##Module(#X);

JJ: Lines starting with "JJ: " (like this one) will be removed.

Many thanks to Erich at work!

A Case for Feminism in Programming Language Design

I wish I had read this paper by Felienne Hermans and Ari Schlesinger before going to SPLASH.

Felienne’s blog post is worth reading as an introduction, and here's the stream of her presentation, which I highly recommend -- she's an excellent, compelling communicator.

I don't have much to add, beyond a few quotes I felt were worthwhile to share:

Coming back to my insider-outsider perspective, I sometimes wonder what we are even researching. What exactly is a programming language for? What does it mean to design a programming language? And I keep coming back to the question: why are women of all colors so underrepresented in the programming languages community?

The spread-out nature of research on programming languages is problematic, since it prevents the PL community from having a more holistic view of programming language use. We are robbing ourselves of a place for conversations on the different perspectives on the ways people use programming languages.

SPLASH 2024: Impressions and Feelings

I thought it would be useful to sit down and write up some of my thoughts on SPLASH 2024 while they are still fresh.

Due to happy nuptials (& a pressing desire to get home), I was only able to attend SPLASH for 2.5 days: Wednesday, Thursday, and Friday morning.

The beauty of any conference is of course the Hallway Track, so I have many papers and presentations I need to read or watch that I missed. In this write-up I’ll just highlight papers / presentations I managed to catch. Missing something here says nothing other than I likely missed it :)

REBASE

Wednesday was REBASE. It was my first time attending REBASE, and I quite liked it. Industry/academia crossovers are very valuable in my opinion.

After REBASE ended, a group of us ended up chatting in the room for so long that we missed the student research competition and the food!

Thursday

This day opened with a keynote by Richard P. Gabriel, talking about his career and how he sees AI, having lived through a few AI winters.

  • Wasm-R3: Record-Reduce-Replay for Realistic and Standalone WebAssembly Benchmarks was quite cool. As an engine developer it’s right up my alley, but it also addresses a real use case I see: the generation of benchmarks from real applications.

  • WhiteFox: White-box Compiler Fuzzing Empowered by Large Language Models. This was quite neat, and honestly a decent use for an LLM in my mind. The basic idea is to provide the code of an optimization (in a deep learning compiler like PyTorch, in the paper) to an LLM, and get it to describe some essential features of a test case, including example code. Then, using these essential features and example code, create fuzz-test cases. There’s a feedback loop here to make sure the test cases actually exercise the optimizations as predicted. Their results really seem to speak for themselves -- they’ve been called out by the PyTorch team for good work. Overall I was pretty impressed by the presentation.

  • Abstract Debuggers: Exploring Program Behaviors Using Static Analysis Results This was a really neat piece of work. The basic thrust is that most static analyzers either say “Yep! This is OK” or “Nope, there’s a problem here”. The challenge is that interpreting how a problem arises is often a bit of a pain; furthermore, all the intermediate work a static analyzer does is hidden within it, not providing value to users.

    The authors of this paper ask the question (and provide a compelling demo of the answer): what if you exposed a static analyzer like a debugger? What if you can set breakpoints, and step through the sets of program states that get to an analysis failure? They make a compelling case that this is actually a pretty great interface, and I’m very excited to see more of this.

    As a fanatic about omniscient debugging, I found myself wondering what the Pernosco of static analysis looks like; alas, I never managed to formulate the question in time in the session, then didn’t get a chance to talk to the presenting author later.

Friday

  • Redressing the balance: a yin-yang perspective on information technology Konrad Hinsen used the idea of Yin and Yang to interrogate the way in which we work in information technology. In his presentation, Yang is the action precipitated by the thought of Yin; his argument is that we have been badly imbalanced in information technology, focused on the Yang of “build fast and break things” and not nearly enough on the balancing Yin of “think and explore”. As a result, tools and environments for thought have been left unbuilt, while the focus has landed on tools for shipping products.

    His hope is that we can have a vision of software that’s more Yin-focused; his domain is scientific software, and he’s interested in software with layers -- documentation, formal models, execution semantics.

  • Mark-Scavenge: Waiting for Trash to Take Itself Out This neat paper proposes a new concurrent GC algorithm that tries to eliminate wasted work caused by the evacuation of objects which end up being dead by the time they are evacuated. This is done by doing evacuation using the set of sparse pages selected from a previous GC cycle, only evacuating objects rediscovered on a second cycle.

    As a last-ditch GC, they can always choose to evacuate a sparse page, making use of headroom.

    It was a quite compelling presentation, with good results for the JVM.

The Things I Missed:

There’s a whole bunch of presentations and papers I missed that I would definitely like to catch up on:

Conclusion

Every year I come to an academic conference as an industry practitioner I am reminded of the value of keeping yourself even a little bit connected to the academic world. There’s interesting work happening there, and it’s always nice to hear dispatches from worlds which may be one possible future!

Gut Checking Jujutsu vs Sapling

To be honest, I continue to have no idea where I will land version-control-wise. Here are some pros and cons that are in my head at the moment.

Pro Jujutsu

  • I appreciate the versioned working directory
  • The .git support is really nice.
  • I am getting used to the ability to type a 3-letter change id to do work with changes.

Con Jujutsu

  • No absorb
  • No histedit
    • I'll be honest, I find reworking history to be really exhausting in Jujutsu. It uses a weird conflict marker by default which I find confusing, and generally the requirement that you do all the rebasing yourself vs having a histedit script... not a fan.
  • The transparent conversion of working directory to commit can bite you -- it means you can accidentally add a file and not notice!
  • jj's versioned working directory seems to occasionally break the Mozilla build system, as it tries to figure out what tools should be bootstrapped and when, which seems to be based off the revision. This is not implicitly a pro-Sapling position, as I suspect I'd have equal pain with Sapling.

Pro Sapling

  • I kinda miss ISL when working in Jujutsu...
  • absorb!
  • histedit
  • I think the changeset evolution story in Sapling is probably a little easier to understand than in Jujutsu

Con Sapling

  • Stepping into the future... but less far
  • dotgit support still feels sufficiently experimental that I don't know I'd be comfortable using it. This means that until we do the switch for real, I'm probably stuck with the weird workflow.

Connecting my PiKVM Power Button to Home Assistant

This is mostly a note to myself if I ever want to figure out how I did this.

  1. Edit configuration.yaml. I added a shell service:

    shell_command:
       pikvm_power: "curl -X POST -k -u admin:super_secret_password https://pikvm-ip/api/atx/click?button=power"
  2. Reboot HA; needed for the shell_command.pikvm_power service to appear.

  3. Add a button helper

  4. Add an automation that calls the service when the helper button is pressed.

Success!

Jujutsu Two: A better experience

I've been working with Jujutsu the last month or so. It's actually been really nice. It's yet another reminder that, despite the version control system monoculture encouraged by GitHub, there's still innovation and interesting things happening in this space. It reaffirms my belief that plain git is really no longer what we should be aiming for as a community.

Last time I investigated Jujutsu I had some real show-stopping issues that prevented me from giving it a fair shake. This time I managed to get it set up on my Linux machine such that it became my daily driver for the last month.

Experience

First and foremost, the ability to use your existing git-cinnabar-enabled unified checkout as your repo, and to seamlessly switch between git and jj, is a pretty winning feature. Now, it turns out that Sapling is experimentally adding what they're calling dotgit support, but it's still experimental, whereas this is pretty core to Jujutsu.

It took me a little while to really internalize the power of a 'versioned working directory' workflow, but I've come to believe it's actually kind of wonderful.

Here's roughly what it looks like:

  1. jj new central "I would like to start working off of central". This produces a working directory with an associated "change id". Change IDs stay the same over the evolution of a working directory / commit.
  2. jj desc -m "Figure out how to frob the blob" Describe your working directory. This is totally optional.
  3. Do your work. The work is automatically checkpointed along the way any time you run a jj command.
  4. If you're interrupted and have to go work on something else, just go to that revision without worrying about losing work.
  5. When it's time to return to what you were working on, simply reopen the working directory with jj edit <change id>
  6. git pull: use Cinnabar to pull in new changes.
  7. jj rebase -s working-dir-change-id -d central
  8. jj desc -m "Bug ABCD - Frobnicate the Blob with the Hasher r?freud" Update your description once you have a bug and a reviewer.
  9. jj commit No message edit here -- the description is used as the commit message

Unfortunately, it's here where we have awkwardness; moz-phab doesn't understand detached heads and freaks out when you try to submit. So at this point you have to create a branch, switch git to it, then submit the change. Almost certainly fixable, but we'll not ask the engineering effectiveness team for this.

Now, this is of course the happy path. There are certainly some brain-bending bits when you fall off of it. For example, the handling of conflicts is sort of strange: you edit a conflicted revision, then squash your resolution into the conflicted change, and it loses its conflict status. Dependent revisions and working directories are then rebased, which may have conflicts or not.

Some Setup Notes:

So, about the slowness I reported last time: everyone asked if I had set up watchman. I had not. So this time around, first thing:

jj config set --user core.fsmonitor "watchman"

Next: In order to make Jujutsu's log make any sense for Mozilla central, you have to teach it about what commits are 'immutable'. We had to do the same dance for Sapling too -- a side effect of not using the standard main branch name.

I put this into ~/.config/jj/config.toml, though it definitely belongs in the repo's .jj/repo/config.toml

[revset-aliases]
"immutable_heads()" = "central@origin | (central@origin.. & ~mine())"

Jujutsu vs Sapling?

Honestly, I have no idea where I'll land. If Sapling's dotgit support matures, it would be a really nice Mercurial replacement for people. But Jujutsu's versioned working directory is a legitimately interesting paradigm.

I feel like there's a bit of a philosophical school thing going on here:

  • Sapling feels like the crystallization of all the good ideas of Mercurial into a modern tool, suited for large scale development, supported by a large corporation.
  • Jujutsu feels like an evolution of git, taking the position that computers are fast today, storage is fast, so why not track more and be more helpful? Yet it still feels connected to the git heritage in a way that occasionally feels clunky. The developer works at Google, but I didn't get the feeling that jj was going to become the default for Googlers any time soon.

To paint a word picture... Sapling is an electric car with swooping lines, and Jujutsu is the DeLorean from Back to the Future -- cobbled together from parts, but capable of amazing things.

Some Other Notes

  • The name of the tool is Jujutsu, not Jujitsu... which I have been thinking it was for 6+ months now 😨. My apologies to Martin von Zweigbergk, author of Jujutsu.
  • Use jj abandon to clean up your tree.
  • I find the default jj log to be quite noisy and a bit unpleasant.
    • Jujutsu's log formatting language is... very powerful (I don't get it all); but it has some cool built-in templates; try out jj log -T builtin_log_detailed or jj log -T builtin_log_oneline -- these defaults are defined here
  • Because your working directory is versioned, you can use jj obslog on it to explore the evolution of your working directory. This is super cool -- you can retrieve some code you deleted that you thought you didn't need, but turns out you did.
  • Jujutsu has a revset language that's similar but not identical to the one used in Mercurial and Sapling. For example, in Mercurial, to list my published commits, I might do hg log -r 'author(mgaudet) and public()'. In jj I used jj log -r 'author(mgaudet) & ::central'.

Notes on a Hardware Upgrade

Just for my own edification, writing down the results of an upgrade:

Debug Browser:

  • Old: 15:42.73

  • New: 4:20.33

Debug Shell:

  • Old:

    • Cold Cache:  1:40.17

    • Hot Cache:  0:48.6

  • New:

    • Cold Cache:  0:46.58

    • Hot Cache: 0:26.60

Old Machine: Idle 48W, Build ~225W.

New Machine: Idle 98W, Build ~425W.

Sapling & A Workflow For Mozilla Work

In my continuing quest to not use git, I have spent the last few weeks coming up with a viable workflow for working with Sapling at Mozilla. While it still has rough edges, I've been enjoying it enough that I figure it's time to share.

Edit to add: It's worth highlighting that this workflow is 1000% unsupported, YMMV and please don't file bugs about this; let's not make the engineering workflow team's life harder.

What

Sapling is an SCM system built by Meta. Apparently it's been around for a while in different forms, but I only heard about it at its public debut.

Sapling is designed to handle large repos, and has a workflow that's extremely familiar to anyone who has used Mercurial + Evolve.

Experience

My experience with Sapling has actually been... pretty darn good. I'll detail the workflow below, and highlight a few gotchas, but overall Sapling seems like where I might end up sticking in my continuing quest.

What's Good?

So first and foremost, I'm super happy with the user experience of Sapling. So much of my workflow in mercurial simply moved over with no trouble (naked heads, frequent use of histedit, absorb, hiding commits, etc.). Some things are even nicer than they are in mercurial: for example, Mozillians often use hg wip, an alias installed by bootstrap that shows a graphical overview of the local state of the tree. In sapling, that's just the default output of a bare sl if you're in a sapling repo -- which sounds silly, but is a legitimately nice workflow improvement.

Even better than all the familiarity is that, in my experience, almost everything in Sapling is fast, even with a mozilla-central-sized repo. It feels as fast or faster than git, and definitely faster than Mercurial. Because it is so fast, it can make some interesting decisions that surprised me. For example, if you amend a commit in the middle of a stack, it will automatically restack all dependent commits immediately.
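For example, a mid-stack amend looks something like this (the commit identifiers here are hypothetical):

# Stack: A -> B -> C, with C currently checked out.
sl goto <hash-of-B>    # jump to the middle of the stack
# ...edit files...
sl amend               # amend B; Sapling immediately restacks C on top of the new B
sl next                # walk back up the freshly restacked stack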

Sapling also has some really fantastic tooling:

  • The Interactive Smart Log (ISL) gives you a browser-based UI for dealing with your tree. I was totally skeptical, but it is very good. You can rebase with drag-and-drop and clean up your tree, all from the UI. It's impressive.
  • The Sapling VSCode Plugin is really nice. It builds the ISL directly into VSCode, and also adds some really delightful touches, like showing the history annotation for each line of code as you edit it. Using this for handling merge conflicts is legitimately quite nice.

What's Not as Good

Well, firstly: Mozilla's code base has no idea what to do about sapling, with varying degrees of breakage. I've made one fix so far, but organizationally I don't want to make more work for the engineering workflow teams, so some things I sort of expect will at best be clunky in a sapling repo.

Some examples:

  • mach bootstrap doesn't error out or anything, but definitely seems to work incorrectly when run inside a sapling repo.
  • mach clang-format relies on figuring out the outgoing set, so it doesn't work at the moment. It's possible to work around this one however.

Sapling itself for sure isn't perfect:

  • I've run into a crash while rebasing once; nothing seemed to be lost though and sl rebase --continue finished the job.
  • The ISL seems finicky at times; it will throw an exception and be broken until reload occasionally.
  • Some aspects of my workflow haven't been implemented in Sapling. For example, I used to make heavy use of hg rebase --stop which would stop a partially completed rebase and leave some dependent changes as un-evolved; this doesn't seem to have an equivalent in Sapling, which provides only --abort and --continue
  • Getting Sapling set up to work properly took some more effort and a few more gotchas than I expected.
  • Sapling's histedit doesn't use the lovely TUI that mercurial provides, and thus is just... clunky. Interestingly, the sl amend commit in the middle of the stack workflow is kind of nicer for quick edits.
  • I think Sapling's history editing capabilities seem to be only about 50% as powerful as evolve -- I cannot figure out an equivalent to the hg obslog.

One major pain point for me at the moment that I don't have a good answer for is format-on-commit, which I relied on pretty heavily. Apparently Sapling does have hooks, but I haven't yet figured out whether they're usable as pre-commit hooks.

The Workflow

Basically, the workflow is the following diagram:

I'll explain in more detail below

Getting Started

  1. Get yourself Sapling
  2. Get yourself a git-cinnabar clone of central: See the So what's next heading of this blog post
  3. sl clone from the local git repo into a new sapling repo (see the sketch after this list).
  4. Do your work inside your sapling repo! Check out the guide here
  5. To make the smartlogs work properly, and to teach Sapling what is 'public', you need to tell it what remote refs are public: sl config --local 'remotenames.publicheads=remote/central'. If you don't do this, expect ISL to fall over and sl to complain about the number of public commits.
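Putting steps 3 and 5 together -- a rough sketch only; paths are hypothetical, step 2's cinnabar clone is assumed to already exist, and the exact sl clone invocation may vary by Sapling version:

# Assumes ./git-clone is an existing git-cinnabar clone of central.
sl clone git-clone sapling-clone       # step 3: sapling clone of the local git repo
cd sapling-clone
sl config --local 'remotenames.publicheads=remote/central'   # step 5: teach sl what's public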

Push to try:

  1. sl push --to tmp-PushToTry
  2. cd ../git-clone
  3. git checkout tmp-PushToTry
  4. ./mach try ...
  5. cd ../sapling-clone

Of course, replace tmp-PushToTry as appropriate. And if you've previously used that branch name, or need to update it, --force works wonders.

You'll also likely be interested in this git repo setting: git config --local receive.denyCurrentBranch updateInstead which is a nice quality of life improvement rather than getting yelled at.

moz-phab submit

  1. sl push --to FeatureSubmit
  2. cd ../git-clone
  3. moz-phab submit --upstream central
  4. cd ../sapling-clone
  5. sl pull
  6. sl (use smart log to find the updated commit with the differential ID added)
  7. sl goto updated commit;
  8. sl hide old stack (technically optional, but recommended)

Future Explorations

  • You can probably intuit that it seems totally feasible to script most of the above interactions with the git clone (a rough sketch follows this list). Definitely a possible future path for me.
    • Hanging out in the Sapling discord has made me aware that there's experimental work happening on a dotgit mode that will have a .git repo; in that world, I suspect a lot of this workflow would be obviated, but it sounds like this is still experimental and I'm not sure how actively it's being developed.
  • Apparently there used to be a Phabricator extension, since deleted, which might be resuscitable. Ideally this would allow bypassing moz-phab submit
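For instance, the push-to-try steps above could be wrapped into something like this -- an untested sketch that assumes the side-by-side sapling-clone / git-clone layout used throughout:

#!/bin/bash
# Untested sketch: push the current sapling stack to try via the adjacent git clone.
set -euo pipefail

BRANCH=tmp-PushToTry

sl push --to "$BRANCH" --force     # --force lets you reuse the branch name
cd ../git-clone
git checkout "$BRANCH"
./mach try fuzzy                   # or whatever try selector you prefer
cd ../sapling-clone

A similar wrapper would presumably work for the moz-phab submit steps.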

Concerns

I do have some reservations about going whole-hog onto sapling.

  1. Sapling is first and foremost Meta's tool. I worry, despite a fairly clear CONTRIBUTING.md, that if I need to fix sapling it'll be a PITA to actually get fixes landed -- but the repo is already filled with a bunch of merged PRs, so this could be just paranoia!
  2. Add-ons (e.g. plugins) are an important workflow aid... however I'm bad at Python, and from chatting in the Sapling discord, it definitely seems like it's a bit rough -- essentially you write against the internal Sapling python API, which is perhaps more than I would like.

Other Notes for Explorers:

  • Launching ISL on a remote machine manually: sl isl --no-open -p 8081 -- this provides a token for remote access.
  • You can use sl goto and specify a remote git revision and it will just figure it out, though you have to use the full git hash.

Exciting times ahead.

Mozilla: Six Years!

I've now made it to six years at Mozilla. It's been an interesting year. I was off on parental leave for some of it as I had a second child.

Among the interesting things I tackled this year:

This year I handed ownership of the DOM Streams component over to Kagami Rosylight, who is a much more able steward of it in the long term than I could be. They have done a wonderful job.

Traditionally I update my Bugzilla statistics here as well:

  • Bugs filed 808 (+79)
  • Comments made 3848 (+489)
  • Assigned to 432 (+67)
  • Commented on 1458 (+249)
  • Patches submitted 1173 (+121)
  • Bugs poked 2498 (+685)

This year I've dropped the patches reviewed line, because it seems like with Phabricator I am no longer getting a good count on that. There's no way I've reviewed only 94 patches... I have reviewed more patches for Temporal alone in the last year!

You may notice that I've poked a large number of bugs this year. I've started taking time after every triage meeting to try and close old bugs that have lingered in our backlog for ages, and no longer have any applicability in 2023, for example bugs due to old makefile issues when we no longer use makefiles.

This is something more of us in the Triage team have started working on as well, based on the list of 'unrooted' SpiderMonkey bugs (see queries here). It's my sincere hope that sometime late next year our bug backlog will be quite a bit more useful to us.

Exploring Jujitsu (jj)

Edit: The tool is actually called Jujutsu, not Jujitsu… my apologies for getting this wrong throughout here. I’ve left the below intact for posterity, but it’s definitely wrong.

With the news that Firefox development is moving to git, and my own dislike of the git command line interface, I have a few months to figure out if there's a story that doesn't involve the git cli that can get me a comfortable development flow.

There's a few candidates, and each is going to take some time to correctly evaluate. Today, I have started evaluating Jujitsu, a git compatible version control system with some interesting properties.

  • The CLI is very mercurial inspired, and shares a lot of commonalities in supported processes (i.e anonymous head based development)
  • The default log is very similar to mozilla's hg wip
  • It considers the working directory to be a revision, which is an interesting policy.

Here's how I have started evaluating Jujitsu.

  1. First I created a new clone of unified, which I called unified-git. Then, using the commands described by glandium in his recent blog post about the history of version control at mozilla, I converted that repo to have a git object store in the background.
  2. I then installed Jujitsu. First I did cargo install cargo-binstall, then cargo binstall jj to get the binary of jj.
  3. I then made a co-located repository by initializing jujitsu with the existing git repo: jj init --git-repo=. (the setup steps are collected into a sketch after this list).
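Collected in one place, the setup amounted to roughly the following (the crate name is as I remember it; it may differ for newer releases):

# Run inside the unified-git checkout.
cargo install cargo-binstall     # the binary-install helper
cargo binstall jj                # grab a prebuilt jj binary
jj init --git-repo=.             # co-locate jujitsu with the existing git repo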

After this, I played around, and managed to create a single commit which I have already landed (a comment fix, but still, it was a good exploration of workflow).

There is, however, what I believe is a showstopper bug on my Mac, which will prevent me from using jujitsu seriously on my local machine -- I will likely still investigate its potential on my Linux build box.

The bug is this one, and is caused by a poor interaction between jujitsu and case-insensitive file systems. It means that my working copy will always show some files as changed (at least on a gecko-dev derived repo), which makes a lot of the jujitsu magic and workflow hard.

Some notes from my exploration:

Speed:

This was gently disappointing. While the initial creation of the repo was fast (jj init took 1m11s on my M2 Macbook Pro), every operation by default does a snapshot of the repo state. Likely because of the aforementioned bug, this leads to surprising outcomes: for example, jj log is way slower than hg wip on the same machine (3.8s vs 2s). Of course, if you put jj log --ignore-working-copy, then it's way faster (0.6s), but I don't yet know if that's a usable working system.

Workflow

I was pretty frustrated by this, but in hindsight a lot of the issues came from having the working copy always seeming dirty. This needs more exploration.

  • jj split was quite nice. I was surprised to find out jj histedit doesn't yet exist
  • I couldn't figure out the jj equivalent of hg up . --clean -- in principle any of the history-navigating commands could serve, but because of the bug, it didn't feel like it.

Interaction with Mozilla tools

moz-phab didn't like the fact that my head was detached, and refused to deal with my commit. I had to use git to make a branch (for some reason a Jujitsu branch didn't seem to suffice). Even then, I'm used to moz-phab largely figuring out what commits to submit, but for some reason it really really struggled here. I'm not sure if that's a git problem or a Jujitsu one, but to submit my commit I had to give both ends of a commit range to have it actually do something.

Conclusion

I doubt this will be the last jujitsu post I write -- I'm very interested in trying it in a non-broken state; the fact that it's broken on my Mac, however, is going to really harm its ability to become my default.

I've got some other tools I'd like to look into:

  • I've played with Sapling before, but because there's no backing git repo, it may not serve my purposes, as moz-phab won't work (push to try as well, I'll bet) but... maybe if I go the Steve Fink route and write my own phabricator submission tool... maybe it would work.
  • git-branchless looks right up my alley, and is the next tool to be evaluated, methinks.

Edited: Fixed the cargo-binstall install instruction (previously I said cargo install binstall, but that's an abandoned crate, not the one you want).