Finding hot allocation sites with bpftrace

So, last time I said I was hooked, and that I wrote an allocation profiler.

That’s true! And it was comparatively easy.

Starting from the baseline from the last blog post, a SpiderMonkey build that sleeps before exit, the profiler looks like this:

BEGIN
{
   printf("Attaching probes for %s\n", str($1));
}

uprobe:$1:cpp:"*gcprobes::CreateObject*"
{
   // 7 frames, as the hook is usually quite deep and we want
   // to see user code too.
   @alloc[ustack(7)] = count()
}

uprobe:$1:cpp:JS_ShutDown
{
   print(@alloc);
}

END {
    // Don't print map on exit.
    clear(@alloc);
}

Simple, right? There are two wrinkles:

  1. To get gcprobes working in JIT code, which is where a lot of allocations happen, you have to compile with --enable-gcprobes, which short-circuits some JIT allocation paths. For my purposes this is fine, but it is a caution for production workloads.
  2. To make sure the symbol is findable by BPF, I again marked gcprobes::CreateObject as MOZ_NEVER_INLINE.

Smooth sailing. Except this wouldn’t be a fun blog post if there wasn’t a hiccup.

Here’s the output from a random workload:

@alloc[
    js::gc::gcprobes::CreateObject(JSObject*)+0
    js::NativeObject::create(JSContext*, js::gc::AllocKind, js::gc::Heap, JS::Handle<js::SharedShape*>, js::gc::AllocSite*)+694
    0x963f590dd53
    0x963f5954b40
    0x963f5952235
    0x963f595426d
    0x963f5952235
]: 1979955

Well that’s not great. Our call stack is from JIT code, so it’s hard to read.

As far as I can tell there’s no built-in support in bpftrace for this. But SpiderMonkey does have some perf integration in the form of jitdump files, enabled by running with PERF_SPEW_DIR=<dir> and IONPERF=func.

So what if we used that file to map these addresses back? I could not for the life of me find a tool that would do this. But it turns out my colleague Markus has already written linux-perf-data, a Rust crate with support for jitdump parsing!

So with the help of AI... I hacked together [mgaudet/jitdump_filter](https://github.com/mgaudet/jitdump_filter), a totally untested Rust tool for querying jitdumps and acting like c++filt for JIT addresses.

Now... the readme really says it like it is:

Note: This is not a place of honour. No highly esteemed deed is commemorated here. Nothing valued is here. This code was hacked together with AI and is shared only to allow others to run it, rather than learn from it.

But it works.
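To make "c++filt for JIT addresses" concrete: a filter like this boils down to a range lookup over the code regions recorded in the jitdump. What follows is a minimal shell sketch of that idea only, not how jitdump_filter is implemented. The `symbols` table, its addresses and names, and the `lookup` and `annotate` helpers are all made up for illustration; a real tool reads the regions from the binary jitdump file instead.

```shell
#!/bin/sh
# Hypothetical symbol table, one "start size name" entry per line.
# These values are invented; a real filter gets them from the jitdump.
symbols() {
    cat <<'EOF'
0x1000 0x100 Ion: foo (@evaluate:1:1)
0x2000 0x80 Baseline: bar (@evaluate:2:2)
EOF
}

# Print the name of the region containing address $1 (prints nothing if none).
lookup() {
    symbols | while read -r start size name; do
        if [ $(($1)) -ge $(($start)) ] && [ $(($1)) -lt $(($start + $size)) ]; then
            printf '%s' "$name"
            break
        fi
    done
}

# Copy stdin to stdout, appending a name to lines that are a bare hex address.
annotate() {
    while IFS= read -r line; do
        # Extract the address if the whole line is just "0x..." plus whitespace.
        addr=$(printf '%s\n' "$line" |
            sed -n 's/^[[:space:]]*\(0x[0-9a-fA-F][0-9a-fA-F]*\)[[:space:]]*$/\1/p')
        name=""
        if [ -n "$addr" ]; then
            name=$(lookup "$addr")
        fi
        if [ -n "$name" ]; then
            printf '%s %s\n' "$line" "$name"
        else
            printf '%s\n' "$line"
        fi
    done
}
```

Piping a stack through `annotate` leaves already-symbolized frames and unknown addresses alone, and tags only the addresses that fall inside a known region. The linear scan is fine for a toy table; a real tool would want a sorted structure for the lookup.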

I tweaked SpiderMonkey to dump the pid into the log file, then used a hacky bash script to post-process the BPF logs into something with symbols:

#!/bin/bash

BENCHMARKS=(
Air
# ...
)

DUMPER=$HOME/tmp/jitdump_filter

for BENCH in "${BENCHMARKS[@]}"; do
    echo "Running benchmark: $BENCH"
    # use env to allow setting environment variables under bpftrace
    sudo bpftrace ~/objectProfiler.bt /home/matthew/unified-git/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js -c "env PERF_SPEW_DIR=/tmp/perf/ IONPERF=func /home/matthew/unified-git/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js cli.js $BENCH" | tee "allocs/${BENCH}-allocs"
    # Each file has a xxxpid=<pid> line inside; grab the pid:
    PID=$(grep xxxpid "allocs/${BENCH}-allocs" | head -n 1 | sed 's/.*xxxpid=\([0-9]*\).*/\1/')
    echo "PID: $PID"

    # Double quotes so the variables are expanded in the message.
    echo "Running $DUMPER /tmp/perf/jit-${PID}.dump < allocs/${BENCH}-allocs | tee allocs/${BENCH}-allocs-annotated.txt"
    "$DUMPER" "/tmp/perf/jit-${PID}.dump" < "allocs/${BENCH}-allocs" | tee "allocs/${BENCH}-allocs-annotated.txt"
done

Gross. But it worked great!

@alloc[
    js::gc::gcprobes::CreateObject(JSObject*)+0
    js::NativeObject::create(JSContext*, js::gc::AllocKind, js::gc::Heap, JS::Handle<js::SharedShape*>, js::gc::AllocSite*)+694
    0x963f590dd53 VMWrapper: NewPlainObjectOptimizedFallback
    0x963f5954b40 Ion: rewrite_args_nboyer (@evaluate:3624:35)
    0x963f5952235 Ion: rewrite_nboyer (@evaluate:3598:30)
    0x963f595426d Ion: rewrite_args_nboyer (@evaluate:3624:35)
    0x963f5952235 Ion: rewrite_nboyer (@evaluate:3598:30)
]: 1979955

I’m far less confident these allocation profiles will turn out to be actionable compared to the rooting ones, but I’m very happy with the process by which I got here!