Mozilla: Six Years!

I've now made it to six years at Mozilla. It's been an interesting year. I was off on parental leave for some of it as I had a second child.

Among the interesting things I tackled this year:

This year I handed ownership of the DOM Streams component over to Kagami Rosylight, who is a much more able steward of it in the long term than I could be. They have done a wonderful job.

Traditionally I update my Bugzilla statistics here as well:

  • Bugs filed 808 (+79)
  • Comments made 3848 (+489)
  • Assigned to 432 (+67)
  • Commented on 1458 (+249)
  • Patches submitted 1173 (+121)
  • Bugs poked 2498 (+685)

This year I've dropped the patches reviewed line, because it seems like with Phabricator I am no longer getting a good count on that. There's no way I've reviewed only 94 patches... I have reviewed more patches for Temporal alone in the last year!

You may notice that I've poked a large number of bugs this year. I've started taking time after every triage meeting to try to close old bugs that have lingered in our backlog for ages and no longer have any applicability in 2023: for example, bugs about makefile issues from back when we still used makefiles.

This is something more of us in the Triage team have started working on as well, based on the list of 'unrooted' SpiderMonkey bugs (see queries here). It's my sincere hope that sometime late next year our bug backlog will be quite a bit more useful to us.

Exploring Jujutsu (jj)

With the news that Firefox development is moving to git, and my own dislike of the git command line interface, I have a few months to figure out if there's a story that doesn't involve the git cli that can get me a comfortable development flow.

There are a few candidates, and each is going to take some time to properly evaluate. Today, I have started evaluating Jujutsu, a git-compatible version control system with some interesting properties.

  • The CLI is very Mercurial-inspired, and shares a lot of commonalities in supported processes (e.g. anonymous-head-based development)
  • The default log is very similar to Mozilla's hg wip
  • It considers the working directory to be a revision, which is an interesting policy.

Here's how I have started evaluating Jujutsu.

  1. First I created a new clone of unified, which I called unified-git. Then, using the commands described by glandium in his recent blog post about the history of version control at Mozilla, I converted that repo to have a git object store in the background.
  2. I then installed Jujutsu. First I did cargo install cargo-binstall, then I did cargo binstall jj to get the binary of jj.
  3. I then made a co-located repository, by initializing Jujutsu with the existing git repo: jj init --git-repo=.

After this, I played around, and managed to create a single commit which I have already landed (a comment fix, but still, it was a good exploration of workflow).

There is, however, what I believe is a showstopper bug on my Mac, which will prevent me from using Jujutsu seriously on my local machine -- I will likely still investigate its potential on my Linux build box, however.

The bug is this one, and is caused by a poor interaction between Jujutsu and case-insensitive file systems. It means that my working copy will always show changed files (at least on a gecko-dev derived repo), which makes a lot of the Jujutsu magic and workflow hard to use.

Some notes from my exploration:

Speed:

This was gently disappointing. While the initial creation of the repo was fast (jj init took 1m11s on my M2 Macbook Pro), every operation by default takes a snapshot of the repo state. Likely because of the aforementioned bug, this leads to surprising outcomes: for example, jj log is way slower than hg wip on the same machine (3.8s vs 2s). Of course, if you run jj log --ignore-working-copy, then it's way faster (0.6s), but I don't yet know if that's a usable way of working.

Workflow

I was pretty frustrated by this, but in hindsight a lot of the issues came from having the working copy always seeming dirty. This needs more exploration.

  • jj split was quite nice. I was surprised to find out jj histedit doesn't yet exist.
  • I couldn't figure out the jj equivalent of hg up . --clean -- it may be that every history-navigating command effectively does this, but because of the bug, it didn't feel like it.

Interaction with Mozilla tools

moz-phab didn't like the fact that my head was detached, and refused to deal with my commit. I had to use git to make a branch (for some reason a Jujutsu branch didn't seem to suffice). Even then, I'm used to moz-phab largely figuring out what commits to submit, but for some reason it really, really struggled here. I'm not sure if that's a git problem or a Jujutsu one, but to submit my commit I had to give both ends of a commit range to have it actually do something.

Conclusion

I doubt this will be the last Jujutsu post I write -- I'm very interested in trying it in a non-broken state; the fact that it's broken on my Mac, however, is going to really harm its ability to become my default.

I've got some other tools I'd like to look into:

  • I've played with Sapling before, but because there's no backing git repo, it may not serve my purposes, as moz-phab won't work (push to try as well, I'll bet)... but maybe if I go the Steve Fink route and write my own Phabricator submission tool... maybe it would work.
  • git-branchless looks right up my alley, and is the next tool to be evaluated, methinks.

Edited: Fixed the cargo-binstall install instruction (previously I said cargo install binstall, but that's an abandoned crate, not the one you want).

CacheIR: The Benefits of a Structured Representation for Inline Caches

In less than a week (😱) my colleague Iain Ireland and I will be in Portugal, presenting our paper on CacheIR at MPLR, co-located with SPLASH 2023. Here’s our preprint (edit: and official ACM DL link), and here’s the abstract:

Inline Caching is an important technique used to accelerate operations in dynamically typed language implementations by creating fast paths based on observed program behaviour. Most software stacks that support inline caching use low-level, often ad-hoc, Inline-Cache (ICs) data structures for code generation. This work presents CacheIR, a design for inline caching built entirely around an intermediate representation (IR) which: (i) simplifies the development of ICs by raising the abstraction level; and (ii) enables reusing compiled native code through IR matching techniques. Moreover, this work describes WarpBuilder, a novel design for a Just-In-Time (JIT) compiler front-end that directly generates type-specialized code by lowering the CacheIR contained in ICs; and Trial Inlining, an extension to the inline-caching system that allows for context-sensitive inlining of context-sensitive ICs. The combination of CacheIR and WarpBuilder have been powerful performance tools for the SpiderMonkey team, and have been key in providing improved performance with less security risk.

This paper is the paper on CacheIR that I have wanted to exist for years, at least since I wrote this blog post in 2018. Since then, we’ve taken inline caching and pushed it even further with the addition of WarpBuilder, and so we cover even more of the power that CacheIR unlocks. I think this is a really fascinating design point which provides large amounts of software engineering leverage when building your system, and so I’m very happy to see that we’ve managed to publish a paper on this. We didn’t even cover everything about CacheIR in this paper — for example, we didn’t talk about tooling such as the CacheIR Analyzer or CacheIR health tool.

It’s my hope that we’ll seed conversations with this paper and find more academic collaborations and inspire more designs with high leverage. I’d be glad to answer questions or hear comments!

Thanks to our co-authors! Jan (who deserves the credit of having come up with CacheIR), Nathan (who did a bunch of work on the paper) and Nelson, always a happy guide to academia.

Viewing Missed Clang Optimizations in SpiderMonkey

Triggered by this blog post about -fsave-optimization-record, and Max Bernstein asking about it, and then pointing out this neat front end to the data, I figured I'd see what it said for SpiderMonkey.

Here's my procedure:

First I created a new mozconfig, the most important parts being:

ac_add_options --enable-application=js

ac_add_options --enable-optimize
ac_add_options --disable-debug

export CFLAGS="$CFLAGS -fsave-optimization-record"
export CXXFLAGS="$CXXFLAGS -fsave-optimization-record"

Then I built SpiderMonkey. Then I asked OptView2 to generate my results for the JS directory:

./optview2/opt-viewer.py --output-dir js --source-dir ~/unified/ ~/unified/obj-opt-shell-nodebug-opt-recordx86_64-pc-linux-gnu/  -j10

After waiting a bit, it filled a directory with HTML files. I've uploaded them to GitHub, and published on GitHub Pages.

It certainly seems like this has interesting information! But there's a ton to go through, so for now just posting this blog post so people can reproduce my method. The OptView2 index isn't amazing either, so it's worth looking at specific files too.

Working in the Open & Psychological Safety

It was really interesting to read the article "The Curious Side Effects of Medical Transparency" as an Open Source developer. The feelings the doctor describes are deeply familiar to me, as we struggle with transparency in open source projects.

These aren't original thoughts, but I don't know how we adequately manage psychological safety while working in the open. You want your team to be able to share ideas and have discussions without worrying about harassment or someone misconstruing (intentionally, perhaps) the words being used.

At the same time, the whole point of being open is that there's value in an open community; if planning happens exclusively in private, there's no opportunity for the community to provide input or to even come to your aid.

I wish I had good answers, or original thoughts here, but I don't, and I'd be happy to read thoughts from anyone who does have answers or good practices.

Mozilla: 5 years

I missed my 5 year anniversary at Mozilla by a few days here.

As is my tradition, here’s my Bugzilla user stats (from today — I am 3 days late from my real anniversary which was the 27th)

  • Bugs filed 729
  • Comments made 3359
  • Assigned to 365
  • Commented on 1209
  • Patches submitted 1052
  • Patches reviewed 94
  • Bugs poked 1813

The last year was a big one. Tackled my biggest project to date, which, ironically, wasn’t even really a SpiderMonkey project: I worked on reimplementing the WHATWG Streams standard inside the DOM. With the help of other Mozillians, we now have the most conformant implementation of the Streams specification by WPT testing. I became the module owner of the resulting DOM Streams module.

I also managed to get a change into the HTML spec, which is a pretty neat outcome.

I’m sure there’s other stuff I did… but I’m procrastinating on something by writing this blog post, and I should get back to that.

Faster Ruby: Thoughts from the Outside

(This is Part II of the Faster Ruby posts, which started with a retrospective on Ruby+OMR, a Ruby JIT compiler I worked on five years ago)

As someone who comes from a compiler background, when asked to make a language fast, I’m sympathetic to the reaction: “Just throw a compiler at it!”. However, working on SpiderMonkey, I’ve come to the conclusion that a fast language implementation has many moving parts, and a compiler is just one part of it.

I’m going to get to Ruby, but before I get there, I want to take a brief tour of some bits and pieces of SpiderMonkey that help make it a fast JavaScript engine; from that story, you may be able to intuit some of my suggestions for how Ruby ought to move forward!

Good Bones in a Runtime

It’s taken me many years of working on SpiderMonkey to internalize some of the implications of various design choices, and how they drive good performance. For example, let’s discuss the object model:

In SpiderMonkey, a JavaScript Object consists of two pieces: A set of slots, which store values, and a shape, which describes the layout of the object (which property ends up in which slot)

Shapes are shared across many objects with the same layout:

var a = [];
for (var i = 0; i < 1000; i++) {
  var o = {a: 1, b: 2};
  a.push(o);
}

In the above example, there are a thousand objects in the array, but all those objects share the same shape.
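
To make that sharing concrete, here is a toy model of my own (plain JavaScript, not engine code) where the shape records which property lives in which slot, and every object with the same layout points at the same shared shape:

// Toy model: one shared shape describing the {a, b} layout, plus per-object
// slot storage holding the actual values.
const shapeAB = { a: 0, b: 1 };                     // property name -> slot index
const objects = [];
for (let i = 0; i < 1000; i++) {
  objects.push({ shape: shapeAB, slots: [1, 2] });  // 1000 objects, 1 shape
}

// Reading o.b in this model is a slot lookup through the shared shape:
const o = objects[0];
const b = o.slots[o.shape.b];                       // => 2

The real engine data structures are much more involved, but this is the essential split: layout is shared, values are per object.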

Recall as well that JavaScript is a prototype-based language; each object has a prototype. So there’s a design decision: for a given object, where do you store the prototype?

It could well be in a slot on the object, but that would bloat objects. Similar to how layouts are shared across many different objects, there are many objects that share a prototype. In the above example, every object in the array has a prototype of Object.prototype. We therefore associate the prototype of an object not with the object itself, but rather with the shape of the object. This means that when you mutate the prototype of an object (Object.setPrototypeOf), we have to change the shape of the object.

Given that all property lookup is based on either the properties of an object, or the prototype chain of an object, we now have an excellent key upon which to build a cache for property access. In SpiderMonkey, these inline caches are associated with property access bytecodes; each stub in the inline cache chain for a bytecode trying to do a property load like o.b ends up looking like this:

if (!o.hasShape(X)) { try next stub; } 
return o.slots(X.slotOf('b'))

Inline Caches are Stonkingly Effective

I’ve been yammering on about inline caches to pretty much anyone who will listen for years. Ever since I finally understood the power of SpiderMonkey’s CacheIR system, I’ve realized that inline caches are not just a powerful technique for making method dispatch fast, but they’re actually fundamental primitives for handling a dynamic language’s dynamism.

So let’s look briefly at the performance possibilities brought by Inline Caches:

Octane Scores (higher is better):
Interpreter, CacheIR, Baseline, Ion: 34252  (3.5x) (46x total)
Interpreter, CacheIR, Baseline:      9887   (2.0x) (13x total)
Interpreter, CacheIR:                4890   (6.6x)
Interpreter:                         739

Now: Let me first say outright, Octane is a bad benchmark suite, and not really representative of the real web… but it runs fast and produces good enough results to share in a blog post (details here).

With that caveat however, you can see the point of this section: well designed inline caches can be STONKINGLY EFFECTIVE: just adding our inline caches improves performance by more than 6x on this benchmark!

The really fascinating thing about inline caches, as they exist in SpiderMonkey, is that they serve to accelerate not just property accesses, but also most places where the dynamism of JavaScript rears its head. For example:

function add(a,b) { return a + b; } 
var a = add(1,1);
var b = add(1,"1");
var c = add("1","1");
var d = add(1.5,1);

All these different cases have to be handled by the same bytecode op, Add.

loc     op
-----   --
main:
00000:  GetArg 0                        # a
00003:  GetArg 1                        # a b
00006:  Add                             # (a + b)
00007:  Return                          #

So, in order to make this fast, we add an Inline Cache to Add, where we attach a list of type-specialized stubs. The first stub would be specialized to the Int32+Int32 case, the second to Int32+String, and so on and so forth.

Since types are typically quite stable at a particular bytecode op, this strategy is very effective for speeding up execution time.
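
To make that concrete, here is a toy sketch of mine (plain JavaScript, not how SpiderMonkey actually represents ICs) of what a chain of type-specialized stubs attached to an Add site conceptually looks like: each stub guards on the operand types it was specialized for, and the site falls back to a generic path when no guard matches.

// Toy model of an IC chain for Add: guards plus specialized bodies,
// consulted in order, with a fully generic slow path at the end.
const addStubs = [
  { guard: (a, b) => typeof a === "number" && typeof b === "number", body: (a, b) => a + b },
  { guard: (a, b) => typeof a === "number" && typeof b === "string", body: (a, b) => a + b },
  // ...more stubs get attached as new type combinations are observed at this site
];

function genericAdd(a, b) { return a + b; }        // the fully general slow path

function cachedAdd(a, b) {
  for (const stub of addStubs) {
    if (stub.guard(a, b)) return stub.body(a, b);  // fast path: this stub's guard matched
  }
  return genericAdd(a, b);                         // no stub matched: take the slow path
}

In the real engine the stubs are generated native code and the guards are much cheaper than these typeof checks, but the shape of the dispatch is the same.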

Making Ruby Fast: Key Pieces

Given the above story, you would be unsurprised to hear that I would suggest starting by improving the Ruby object model to provide shapes.

The good news for Ruby is that there are people from Shopify pushing this exact thing. This blog post, The Future Shape of Ruby Objects, from Chris Seaton is a far more comprehensive and Ruby focused introduction to shapes than I wrote above, and the bug tracking this is here.

The second thing I’d do is invest in just enough JIT compilation technology to allow the creation of easy to reason about inline caches. Because I come from SpiderMonkey, I would almost certainly shamelessly steal the general shape of CacheIR, as I think Jan de Mooij has really hit on something special with its design. This would likely be a very simple template-JIT, done method-at-a-time.

When I worked on Ruby+OMR I didn’t have a good answer for how to handle the dynamism of Ruby, due to a lack of practical experience. There was a fair amount of hope that we could recycle the JIT profiling architecture from J9, accumulating data from injected counters in a basic compilation of a method, and feeding into a higher-optimizing recompilation that would specialize further. It’s quite possible that this could work! However, having seen the inline caching architecture of SpiderMonkey, I realize now that JIT profiling would have been maybe the costliest way we could generate the data we would need for type specialization. I may well have read this paper, but I don’t think I understood it.

Today in SpiderMonkey, all our type profiling is done through our inline caches. Our top tier compiler frontend, WarpBuilder, analyzes the inline cache chains to determine which cases are actually important and worth speculating on. We even do a neat trick with ICs to power smart inlining. Today, the thing I wish a project like OMR would provide most is the building blocks for a powerful inline cache system.

In the real world, YJIT is a really interesting JIT for Ruby being built around the fascinating Basic Block Versioning (BBV) architecture that Maxime Chevalier-Boisvert built during her PhD, an architecture I and other people who have worked on SpiderMonkey really admired as innovative. As I understand it, YJIT doesn’t need to lean on inline caching nearly as much as SpiderMonkey does, as much of the optimizations provided naturally fall out of the versioning of basic blocks. Still, in her blog post introducing YJIT, Maxime does say that even YJIT would benefit from shapes, something I definitely can believe.

Open Questions, to me at least

  • Does Ruby in 2022 still have problems with C-extensions? Once upon a time we were really concerned about how opaque C-extensions were. TruffleRuby used the really neat Sulong technology to solve this.

    Does the C-extension API need to be improved to allow a fast implementation to exist? Unsure.

    SpiderMonkey has the advantage of working in a ‘closed world’ mostly, where native code integrations are fairly tightly coupled. This doesn’t describe Ruby Gems that lean on the C-APIs.

  • What kind of speedup is available for big Rails applications? If 90% of the time in an application is spent in database calls, then there’s little opportunity for improvement via JIT technologies.

Conclusion

I’ve been out of the Ruby game for a long time. Despite that, I find myself thinking back to it frequently. Ruby+OMR was, in hindsight, perhaps a poor architectural fit. As dynamic as Java is, languages like JavaScript and Ruby mean that the pressure on compilation technology is appreciably different.

With the lessons I’ve learned, it seems to me that a pretty fast Ruby is probably possible. JavaScript is a pretty terrible language to make fast, and yet it has been made fast (having a huge performance war between implementations, causing huge corporations to pour resources into JS performance, helped… maybe Ruby needs a performance war). I’m glad to see the efforts coming out of Shopify — I really think they’ll pay huge dividends over the next few years. I wish that team much luck.

(There’s some really excellent discussion about this post over at Hacker News)

Faster Ruby: A Ruby+OMR Retrospective

Years ago I worked on a project called Ruby+OMR. The goal of that project was to integrate Eclipse OMR, a toolkit for building fast language runtimes, into Ruby, to make it faster. I worked on the JIT compiler, but there was also work to integrate the OMR Garbage Collector to replace the Ruby one.

After the project had trickled down to a stop, I wrote a retrospective blog post about the project, but never published it. Then, I moved on from IBM and started working at Mozilla, on SpiderMonkey, their JavaScript engine.

Working at Mozilla I’ve learned enormous amounts about how dynamic languages can be made fast, and what kind of changes are the most important to seeing performance.

Now feels like a reasonable time to update and expand that retrospective, and then I have a second follow up blog post I'll post tomorrow about how I’d make Ruby fast these days if I were to try, from the perspective of someone who’s not been involved in the community for five years.

Retrospective

It has been five years since I stopped working on Ruby+OMR, which is far enough in the past that I should refresh people’s memories.

Eclipse OMR is a project that came out of IBM. The project contains a series of building blocks for building fast managed language runtimes: Garbage collection technology, JIT compiler technology, and much more.

The origin of the project was the J9 Java Virtual Machine (later open sourced as OpenJ9). The compiler technology, called Testarossa, was already a multi-language compiler, being used in production IBM compilers for Java, COBOL, C/C++, PL/X and more.

The hypothesis behind OMR was this: If we already had a compiler that could be used for multiple languages, could we also extend that to other technologies in J9? Could we convert the JIT compiler, GC and other parts, turning them into a library that could be consumed by other projects, allowing them to take advantage of all the advanced technology that already existed there?

Of course, this wasn’t a project IBM embarked on for 100% altruistic reasons: Runtimes built on top of OMR would, by their very nature, come with good support for IBM’s hardware platforms, IBM Z and POWER, a good thing considering that there had been challenges getting another popular language runtime onto those platforms.

In order to demonstrate the possibilities of this project, we attempted to connect OMR to two existing language runtimes: CPython, and MRI Ruby. I honestly don’t remember the story of what happened with CPython+OMR; I know it had more challenges than Ruby+OMR.

My Ruby+OMR Story

By the time I joined the Ruby+OMR Project, the initial implementation was well underway, and we were already compiling basic methods.

I definitely remember working on trying to help the project get out the door… but honestly, I have relatively little recollection of concrete things I did in those days. Certainly I recall doing lots of work to try to improve performance, running benchmarks, making it crash less.

I do know that we decided to make sure we landed with a Big Bang. So we submitted a talk to RubyKaigi 2015, which is the premier conference for Ruby developers in Japan, and a conference frequented by many of the Ruby Core team.

I would give a talk on JIT technology, and Robert Young and Craig Lehman gave a talk on the GC integration. Just days before the talks, we open sourced our Ruby work (squashing the commit history, which as I try to write this retrospective, I understand and yet wish we hadn’t needed to).

I spent ages building my RubyKaigi talk. It felt so important that we land with our best feet forward. I iterated on my slides many times, practiced, edited and practiced some more.

The thing I remember most from that talk was the moment when I looked down into the audience, and saw Matz, the creator of Ruby, sitting in the front row, his head down and eyes closed. I thought I had managed to put him to sleep. Somewhere in the video of that talk you can spot it happening: Suddenly I start to stumble over my slides, and my voice jumps a half-register, before I manage to recover.

That RubyKaigi was also interesting: that was the one where Matz announced his goal of Ruby3x3: Ruby 3 would be 3x faster than Ruby 2.0. It seemed like our JIT compiler could potentially be a key part of this!

We continued working on Ruby, and I returned to RubyKaigi ten months later, in September of 2016. I gave a talk, this time, about trying to nail down how specifically we would measure Ruby 3x3. To date, this is probably the favourite talk I have ever given; a relatively focused rant on the challenges of measuring computer performance and the various ways you can mislead yourself.

It was at this RubyKaigi that we had some conversations with the Ruby Core team about trying to integrate OMR into the Ruby Core. Overall, they weren’t particularly receptive. There were a number of concerns. In June of 2017, those concerns became part of a talk Matz gave in Singapore, where he called them the ‘hidden rules’ of Ruby 3x3:

  • Memory Requirements: He put it this way: Ruby's memory requirements are driven by Heroku's smallest dyno, which had 512 MB of RAM at the time.

  • Dependency: Ruby is long lived, 25 years old almost, and so there was a definite fear of dependency on another project. He put it this way: If Ruby were to add another dependency, that dependency ought to be as stable as Ruby itself.

  • Maintainability: Overall maintainability matters: Unmaintainable code stops evolution, so the Ruby Core team must be able to maintain whatever JIT is proposed.

By this point, the OMR team had already scaled effort on Ruby+OMR down to effectively zero, but if we hadn’t done that, this talk would have been the death-knell for Ruby+OMR, purely on the last two points. While we had a road to improved memory usage, we were by definition a new project, and a complex one at that. We’d never become the default JIT compiler for Ruby.

The rest of the talk focused on a project being done by a Toronto based Red Hat developer named Vladimir Makarov, called MJIT. MJIT added a JIT compiler to Ruby by translating the bytecode of a Ruby method to a small C file, invoking GCC or Clang to compile that C File into a shared object, and then loading the newly compiled shared object to back the Ruby method.

Editorializing, MJIT was a fascinating approach. It's not quite a bytecode level JIT, because it feeds the underlying compiler (gcc) not raw bytecode, but C code that executes the same code that the bytecode would, as well as a pre-compiled header with all the required definitions. Since MJIT is looking at C code, it is free to do all sorts of interesting optimization at the C level that a bytecode level JIT similar to Testarossa would never see. This turns out to be a really interesting workaround for a problem that Evan Phoenix pointed out in his 2015 RubyKaigi Keynote, which he called the Horizon Problem. In short, the issue is that a JIT compiler can only optimize what it sees: but in a bytecode JIT for Ruby, like Ruby+OMR, huge swathes (possibly even the majority) of the important semantics are hidden away as calls to C routines, and therefore provide barriers to optimization. MJIT would be limited in what optimizations were possible by the contents of the pre-compiled header, which ultimately would define most of the 'optimization horizon'.

Furthermore, MJIT solved, in a relatively nice way, many of the maintenance problems that concerned the Ruby core community: by producing C code, the JIT's output would be relatively easy to debug, since the Ruby Core developers can reason about it as C code, a language they are obviously proficient in.

I haven’t paid a lot of attention to the Ruby community since 2017, but MJIT did get integrated into Ruby, and at least according to the git history, appears to still be maintained.

I was very excited to see Maxime Chevalier-Boisvert announce YJIT, as I loved her idea of basic block versioning. I’m excited to see that project grow inside of Ruby. One thing that project has done excellently is include core developers early, and get into the official Ruby tree early.

What did Ruby+OMR accomplish?

Despite Ruby+OMR’s failure to form the basis of Ruby’s JIT technology, or replace Ruby’s GC technology, the project did serve a number of purposes:

  • Ruby was an important test bed for a lot of OMR. It served as a proving ground for ideas over and over again, and helped the team firm up ideas about how consumption of the OMR libraries should work. Ruby made OMR better by forcing us to think about and work on our consumption story.
  • We managed to influence the Ruby community in a number of ways:
    • We showed that GC technology improvements were possible, and that they could bring performance improvement.
    • We helped influence some of the Ruby community's thoughts on benchmarking, with my talk at RubyKaigi having been called out explicitly in the development of a Rails benchmark that was used to track Ruby performance for a few years afterwards.

What should we have done differently in Ruby+OMR?

There's a huge number of lessons I learned from working on Ruby+OMR.

  • At the time we did the work on Ruby+OMR, the integration story between OMR and a host language was pretty weak. It required coordination between two repos, and some fairly gross ‘glue’ code to make the two systems talk to each other.

    A new interface, called JitBuilder was developed that may have helped, but by the time it arrived on the scene we were already knee deep in our integration into Ruby.

  • We should have made it dramatically easier, much earlier, to have people be able to try out Ruby+OMR. The Ruby community uses version managers like RVM and rbenv to match Ruby versions to their apps, and so we would have been very well served by pushing hard to get accepted into these tools early.

  • Another barrier to having people try out Ruby+OMR with the JIT enabled was our lack of asynchronous compilation. Not having asynchronous compilation left us in a state where we couldn’t be run, or basically even tested, for latency sensitive tasks like a Rails server application.

    I left tackling this one far too late, and never actually succeeded in getting it up and running. For future systems, I suspect it would be prudent to tackle async compilation very early, to ensure the design is able to cope with it robustly.

One question people have asked about Ruby+OMR is how difficult it was to keep up with Ruby’s evolution. Overall, my recollection is that it wasn’t too challenging, because we chose an initial compiler design that limited the challenge: Ruby+OMR produced IL from Ruby bytecode (which didn’t change a lot release to release), and a lot of the Ruby bytecodes were implemented in the JIT purely by calling directly into appropriate RubyVM routines. This meant that the OMR JIT compiler naturally kept up with relative ease, as we weren’t doing almost anything fancy that would have posed a challenge. Longer term, integration challenges would have gotten larger, but we had hoped at some point we’d end up in-tree, and have an easier maintenance story.

Conclusion

I greatly enjoyed working on Ruby+OMR, and I believed for the majority of my time working on it that we were serious contenders to become the default JIT for Ruby. The Ruby community is a fascinating group of individuals, and I really enjoyed getting to know some people there.

Ultimately, the failure of the Ruby+OMR project was, in my opinion, our lack of maturity. We simply hadn’t nailed down a cohesive story that we could tell to projects that was compelling, rather than scary. It’s too bad, as there are still pieces of the Testarossa compiler technology that I miss, almost five years since I’ve stopped working with it.

Edit History

  • Section on MJIT updated August 8, 2022, 10:45am to clarify a bit what I found to be special about MJIT after an illuminating conversation with Chris Seaton on twitter

Throttling Home Assistant Automations

Suppose you have a Home Assistant Automation, for example one that sends a notification to your phone, that you’d only like to run at most once every four hours or so.

You might google Debouncing an automation, because that’s the word that jumps into your head first, and end up here, which suggests a template condition like this:

 condition:
    condition: template
    value_template: "{{ (as_timestamp(now()) - as_timestamp(state_attr('automation.doorbell_alert', 'last_triggered') | default(0)) | int > 5)}}"

But then you have to do math, and it’s awkward.

There’s a much nicer way!

 condition:
    condition: template
    value_template: "{{now() - state_attr('automation.doorbell', 'last_triggered') > timedelta(hours=4, minutes = 1)}}"

Four Years at Mozilla

Tomorrow will be my fourth anniversary of working at Mozilla. Time flies.

This year has seen me work on everything from frontend features like class static initialization blocks to tackling the large task of re-hosting the Streams implementation in the DOM (one day, I will blog about that project).

Bugzilla Statistics

As is my tradition, here's this year's Bugzilla User Statistics. I've also done last year's, because I'd gathered the data to write this post last year and then never posted it.

Year 3

  • Bugs filed: 459 (+183)
  • Comments made: 2113 (+624)
  • Assigned to: 208 (+90)
  • Commented on: 631 (+222)
  • Patches submitted: 762 (+230)
  • Bugs poked: 784 (+277)

Year 4

  • Bugs filed: 601 (+142)
  • Comments made: 2718 (+605)
  • Assigned to: 275 (+67)
  • Commented on: 930 (+299)
  • Patches submitted: 894 (+132)
  • Bugs poked: 1241 (+457)

Implementing Private Fields for JavaScript

When implementing a language feature for JavaScript, an implementer must make decisions about how the language in the specification maps to the implementation. Sometimes this is fairly simple, where the specification and implementation can share much of the same terminology and algorithms. Other times, pressures in the implementation make it more challenging, requiring or pressuring the implementation strategy to diverge from the language specification.

Private fields are an example of where the specification language and implementation reality diverge, at least in SpiderMonkey -- the JavaScript engine which powers Firefox. To understand more, I'll explain what private fields are, describe a couple of models for thinking about them, and explain why our implementation diverges from the specification language.

Private Fields

Private fields are a language feature being added to the JavaScript language through the TC39 proposal process, as part of the class fields proposal, which is at Stage 4 in the TC39 process. We will ship private fields and private methods in Firefox 90.

The private fields proposal adds a strict notion of 'private state' to the language. In the following example, #x may only be accessed by instances of class A:

class A { 
   #x = 10;
}

This means that outside of the class, it is impossible to access that field. That is unlike public fields, as the following example shows:

class A {
  #x = 10; // Private field
  y = 12; // Public Field
};

var a = new A();
a.y; // Accessing public field y: OK
a.#x; // Syntax error: reference to undeclared private field

Even various other tools that JavaScript gives you for interrogating objects are prevented from accessing private fields (e.g. Object.getOwnProperty{Symbols,Names} don't list private fields; there's no way to use Reflect.get to access them).
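
A quick illustration of that (my own example, reusing the class A with private #x and public y from above): none of the usual reflection entry points reveal the private field.

var a = new A();
Object.getOwnPropertyNames(a);    // ["y"]  -- #x is not listed
Object.getOwnPropertySymbols(a);  // []     -- no hidden symbol to discover
Reflect.ownKeys(a);               // ["y"]  -- still only the public field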

A Feature Three Ways

When talking about a feature in JavaScript, there are often three different aspects in play: the mental model, the specification, and the implementation.

The mental model provides the high-level thinking that we expect programmers will mostly use. The specification in turn provides the detail of the semantics required by the feature. The implementation can look wildly different from the specification text, so long as the specification semantics are maintained.

These three aspects shouldn't produce different results for people reasoning through things (though, sometimes a 'mental model' is shorthand, and doesn't accurately capture semantics in edge case scenarios).

We can look at private fields using these three aspects:

Mental Model

The most basic mental model one can have for private fields is what it says on the tin: fields, but private. Now, JS fields become properties on objects, so the mental model is perhaps 'properties that can't be accessed from outside the class'.

However, when we encounter proxies, this mental model breaks down a bit; trying to specify the semantics for 'hidden properties' and proxies is challenging (what happens when a Proxy is trying to provide access control to properties, if you aren't supposed to be able to see private fields with Proxies? Can subclasses access private fields? Do private fields participate in prototype inheritance?). In order to preserve the desired privacy properties, an alternative mental model became the way the committee thinks about private fields.

This alternative model is called the 'WeakMap' model. In this mental model you imagine that each class has a hidden weak map associated with each private field, such that you could hypothetically 'desugar'

class A {
    #x = 15;
    g() {
        return this.#x;
    }
}

into something like

class A_desugared {
    static InaccessibleWeakMap_x = new WeakMap();
    constructor() {
        A_desugared.InaccessibleWeakMap_x.set(this, 15);
    }

    g() {
        return A_desugared.InaccessibleWeakMap_x.get(this);
    }
}

The WeakMap model is, surprisingly, not how the feature is written in the specification, but it is an important part of the design intention behind private fields. I will cover a bit later how this mental model shows up in a few places.

Specification

The actual specification changes are provided by the class fields proposal, specifically the changes to the specification text. I won't cover every piece of this specification text, but I'll call out specific aspects to help elucidate the differences between specification text and implementation.

First, the specification adds the notion of [[PrivateName]], which is a globally unique field identifier. This global uniqueness is to ensure that two classes cannot access each other's fields merely by having the same name.

function createClass() {
  return class {
    #x = 1;
    static getX(o) {
      return o.#x;
    }
  };
}

let [A, B] = [0, 1].map(createClass);
let a = new A;
let b = new B;

A.getX(a);  // Allowed: Same class
A.getX(b);  // Type Error, because different class.

The specification also adds a new 'internal slot', which is a specification level piece of internal state associated with an object in the spec, called [[PrivateFieldValues]] to all objects. [[PrivateFieldValues]] is a list of records of the form:

{ 
  [[PrivateName]]: Private Name, 
  [[PrivateFieldValue]]: ECMAScript value 
}

To manipulate this list, the specification adds four new algorithms:

  1. PrivateFieldFind
  2. PrivateFieldAdd
  3. PrivateFieldGet
  4. PrivateFieldSet

These algorithms largely work as you would expect: PrivateFieldAdd appends an entry to the list (though, in the interest of trying to provide errors eagerly, if a matching Private Name already exists in the list, it will throw a TypeError. I'll show how that can happen later). PrivateFieldGet retrieves a value stored in the list, keyed by a given Private name, etc.
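
As a rough sketch of my own (plain JavaScript rather than spec language), you can picture the four algorithms operating on [[PrivateFieldValues]] as a plain array of records:

// Toy model of the spec algorithms; `fields` stands in for an object's
// [[PrivateFieldValues]] list of { privateName, value } records.
function PrivateFieldFind(fields, privateName) {
  return fields.find(entry => entry.privateName === privateName);
}

function PrivateFieldAdd(fields, privateName, value) {
  if (PrivateFieldFind(fields, privateName) !== undefined)
    throw new TypeError("private field already present");   // the eager error mentioned above
  fields.push({ privateName, value });
}

function PrivateFieldGet(fields, privateName) {
  const entry = PrivateFieldFind(fields, privateName);
  if (entry === undefined) throw new TypeError("no such private field");
  return entry.value;
}

function PrivateFieldSet(fields, privateName, value) {
  const entry = PrivateFieldFind(fields, privateName);
  if (entry === undefined) throw new TypeError("no such private field");
  entry.value = value;
}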

The Constructor Override Trick

When I first started to read the specification, I was surprised to see that PrivateFieldAdd could throw. Given that it was only called from a constructor on the object being constructed, I had fully expected that the object would be freshly created, and therefore you'd not need to worry about a field already being there.

This turns out to be possible, a side effect of some of the specification's handling of constructor return values. To be more concrete, the following is an example provided to me by André Bargull, which shows this in action.

class Base {
  constructor(o) {
    return o; // Note: We are returning the argument!
  }
}

class Stamper extends Base {
  #x = 'stamped';
  static getX(o) {
      return o.#x;
  }
}

Stamper is a class which can 'stamp' its private field onto any object:

let obj = {};
new Stamper(obj);  // obj now has private field #x
Stamper.getX(obj); // => "stamped"

This means that when we add private fields to an object we cannot assume it doesn't have them already. This is where the pre-existence check in PrivateFieldAdd comes into play:

let obj2 = {};
new Stamper(obj2);
new Stamper(obj2); // Throws 'TypeError' due to pre-existence of private field

This ability to stamp private fields into arbitrary objects interacts with the WeakMap model a bit here as well. For example, given that you can stamp private fields onto any object, that means you could also stamp a private field onto a sealed object:

var obj3 = {};
Object.seal(obj3);
new Stamper(obj3);
Stamper.getX(obj3); // => "stamped"

If you imagine private fields as properties, this is uncomfortable, because it means you're modifying an object that was sealed by a programmer against future modification. However, using the weak map model, it is totally acceptable, as you're only using the sealed object as a key in the weak map.

PS: Just because you can stamp private fields into arbitrary objects, doesn't mean you should: Please don't do this.

Implementing the Specification

When faced with implementing the specification, there is a tension between following the letter of the specification, and doing something different to improve the implementation on some dimension.

Where it is possible to implement the steps of the specification directly, we prefer to do that, as it makes maintenance of features easier as specification changes are made. SpiderMonkey does this in many places. You will see sections of code that are transcriptions of specification algorithms, with step numbers for comments. Following the exact letter of the specification can also be helpful where the specification is highly complex and small divergences can lead to compatibility risks.

Sometimes however, there are good reasons to diverge from the specification language. JavaScript implementations have been honed for high performance for years, and there are many implementation tricks that have been applied to make that happen. Sometimes recasting a part of the specification in terms of code already written is the right thing to do, because that means the new code is also able to have the performance characteristics of the already written code.

Implementing Private Names

The specification language for Private Names already almost matches the semantics around Symbols, which already exist in SpiderMonkey. So adding PrivateNames as a special kind of Symbol is a fairly easy choice.

Implementing Private Fields

Looking at the specification for private fields, a direct implementation would be to add an extra hidden slot to every object in SpiderMonkey, which contains a reference to a list of {PrivateName, Value} pairs. However, implementing this directly has a number of clear downsides:

  • It adds memory usage to objects without private fields
  • It requires invasive addition of either new bytecodes or complexity to performance sensitive property access paths.

An alternative option is to diverge from the specification language, and implement only the semantics, not the actual specification algorithms. In the majority of cases, you really can think of private fields as special properties on objects that are hidden from reflection or introspection outside a class.

If we model private fields as properties, rather than a special side-list that is maintained with an object, we are able to take advantage of the fact that property manipulation is already extremely optimized in a JavaScript engine.

However, properties are subject to reflection. So if we model private fields as object properties, we need to ensure that reflection APIs don't reveal them, and that you can't get access to them via Proxies.

In SpiderMonkey, we elected to implement private fields as hidden properties in order to take advantage of all the optimized machinery that already exists for properties in the engine. When I started implementing this feature André Bargull -- a SpiderMonkey contributor for many years -- actually handed me a series of patches that had a good chunk of the private fields implementation already done, for which I was hugely grateful.

Using our special PrivateName symbols, we effectively desugar

class A { 
  #x = 10;
  x() { 
    return this.#x;
  }
}

to something that looks closer to

class A_desugared {
  constructor() { 
    this[PrivateSymbol(#x)] = 10; 
  }
  x() { 
    return this[PrivateSymbol(#x)];
  }
}

Private fields have slightly different semantics than properties however. They are designed to issue errors on patterns expected to be programming mistakes, rather than silently accepting them. For example (illustrated in code after the list):

  1. Accessing a property on an object that doesn't have it returns undefined. Private fields are specified to throw a TypeError, as a result of the PrivateFieldGet algorithm.
  2. Setting a property on an object that doesn't have it simply adds the property. Private fields will throw a TypeError in PrivateFieldSet.
  3. Adding a private field to an object that already has that field also throws a TypeError in PrivateFieldAdd. See "The Constructor Override Trick" above for how this can happen.
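
Concretely, for the first two differences (my own example; Probe is a throwaway class, not something from the proposal):

class Probe {
  #x = 0;
  static get(o)    { return o.#x; }     // triggers PrivateFieldGet on o
  static set(o, v) { o.#x = v; }        // triggers PrivateFieldSet on o
}

let plain = {};
plain.missing;                          // undefined: reading an absent property is silent
plain.missing = 1;                      // writing an absent property just adds it

try { Probe.get(plain); }    catch (e) { console.log(e instanceof TypeError); } // true
try { Probe.set(plain, 2); } catch (e) { console.log(e instanceof TypeError); } // true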

To handle the different semantics, we modified the bytecode emission for private field accesses. We added a new bytecode op, CheckPrivateField which verifies an object has the correct state for a given private field. This means throwing an exception if the property is missing or present, as appropriate for Get/Set or Add. CheckPrivateField is emitted just before using the regular 'computed property name' path (the one used for A[someKey]).

CheckPrivateField is designed such that we can easily implement an inline cache using CacheIR. Since we are storing private fields as properties, we can use the Shape of an object as a guard, and simply return the appropriate boolean value. The Shape of an object in SpiderMonkey determines what properties it has, and where they are located in the storage for that object. Objects that have the same shape are guaranteed to have the same properties, and it's a perfect check for an IC for CheckPrivateField.
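
As a toy model of my own (not actual SpiderMonkey code) of why the shape works so well as a guard here: a stub can simply remember the shape it was attached for and the boolean answer that goes with it, and any object that later shows up with that same shape can reuse the answer.

// Toy model of a CheckPrivateField stub. `shapeOf` stands in for the engine's
// internal shape lookup, so this sketch takes it as a parameter.
function makeCheckPrivateFieldStub(shapeOf, guardShape, fieldIsPresent, fallback) {
  return function (obj) {
    if (shapeOf(obj) !== guardShape) return fallback(obj);  // guard failed: next stub / generic path
    return fieldIsPresent;  // same shape means the same set of properties, so the same answer
  };
}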

Other modifications we made to the engine include omitting private fields from the property enumeration protocol, and allowing the extension of sealed objects if we are adding a private field.

Proxies

Proxies presented us a bit of a new challenge. Concretely, using the Stamper class above, you can add a private field directly to a Proxy:

let obj3 = {};
let proxy = new Proxy(obj3, handler);
new Stamper(proxy);

Stamper.getX(proxy); // => "stamped"
Stamper.getX(obj3);  // TypeError, private field is stamped
                     // onto the Proxy, not the target!

I definitely found this surprising initially. The reason I found this surprising was I had expected that, like other operations, the addition of a private field would tunnel through the proxy to the target. However, once I was able to internalize the WeakMap mental model, I was able to understand this example much better. The trick is that in the WeakMap model, it is the Proxy, not the target object, used as the key in the #x WeakMap.

These semantics presented a challenge to our implementation choice to model private fields as hidden properties however, as SpiderMonkey's Proxies are highly specialized objects that do not have room for arbitrary properties. In order to support this case, we added a new reserved slot for an 'expando' object. The expando is an object allocated lazily that acts as the holder for dynamically added properties on the proxy. This pattern is used already for DOM objects, which are typically implemented as C++ objects with no room for extra properties. So if you write document.foo = "hi", this allocates an expando object for document, and puts the foo property and value in there instead. Returning to private fields, when #x is accessed on a Proxy, the proxy code knows to go and look in the expando object for that property.
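
As a toy model of my own (not the actual DOM or Proxy code), the expando pattern boils down to this: an object that has no room for ad-hoc properties lazily allocates a side object and stores dynamically added properties there instead.

// Toy model of the expando pattern: FixedThing stands in for an object with
// no room for extra properties; dynamic additions go to a lazily allocated
// side object.
class FixedThing {
  #expando = null;                      // only allocated if ever needed
  setDynamic(key, value) {
    if (this.#expando === null) this.#expando = {};
    this.#expando[key] = value;
  }
  getDynamic(key) {
    return this.#expando === null ? undefined : this.#expando[key];
  }
}

const thing = new FixedThing();
thing.setDynamic("foo", "hi");          // like document.foo = "hi" in the DOM case
thing.getDynamic("foo");                // => "hi"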

In Conclusion

Private Fields is an instance of implementing a JavaScript language feature where directly implementing the specification as written would be less performant than re-casting the specification in terms of already optimized engine primitives. Yet, that recasting itself can require some problem solving not present in the specification.

At the end, I am fairly happy with the choices made for our implementation of Private Fields, and am excited to see it finally enter the world!

Acknowledgements

I have to thank, again, André Bargull, who provided the first set of patches and laid down an excellent trail for me to follow. His work made finishing private fields much easier, as he'd already put a lot of thought into decision making.

Jason Orendorff has been an excellent and patient mentor as I have worked through this implementation, including two separate implementations of the private field bytecode, as well as two separate implementations of proxy support.

Thanks to Caroline Cullen, and Iain Ireland for helping to read drafts of this post.

My Pernosco Workflow

Mozilla pays for Pernosco, and I get huge value from it. In fact, at this point, if Mozilla wasn’t paying for it, I would pay for it myself because it’s so fantastic.

Honestly, I upload traces of shell runs before trying local rr about 2/3rds of the time now, because Pernosco traces are just faster to solve; the power of instant jumps through time, data flow analysis, and the notebook are all incredible. Shell processing time is typically less than 5 minutes in my experience, so I just grab a snack / make a coffee after submitting.

Here’s my pernosco-submit workflow:

Step 1: Gather a Local Trace

UPDATE: The machines that process Pernosco logs have been updated, so the instructions below have been too.

To gather a local rr trace that’s compatible with Pernosco you need to disable incompatible CPU features. I have written a little script pernosco-record that I use to do that:

#!/bin/bash

rr record --disable-avx-512 "$@"

This works for jit-test like this:

./mach jit-test --debugger=pernosco-record testName

Or just ./mach run --debugger=pernosco-record foo.js

Step 2: Upload the trace

Find the trace you’re interested in under ~/.local/share/rr/<trace>, and call pernosco-submit:

pernosco-submit upload ~/.local/share/rr/<trace> <PATH TO CENTRAL>

You also will need to have set PERNOSCO_GROUP and PERNOSCO_USER in your environment. PERNOSCO_USER_SECRET_KEY cannot be in the environment of the recording, so I just always provided it on the command line.

Update: It turns out that these days, there are configuration files that should be used instead of environment variables:

  • ~/.config/pernosco/user holds the email for Pernosco
  • ~/.config/pernosco/group holds the associated group for Pernosco. Mozilla has its own group, but if you're a regular Pernosco customer, check your account page for all this info.
  • ~/.config/pernosco/user_secret_key holds the secret key

Huge Thanks to Daniel Holbert for the tip!

Step 3: Wait for email

You'll get an email with the trace when it's done processing.

Scheduling Sleep on my build machine

I have a build machine for my work, but it takes a fair amount of power when running. So I have scheduled it to sleep by itself when it’s not my working hours. My hacked together root-crontab (sudo crontab -e to edit) that I’ve been using successfully for almost a year looks like this:

# At 17:00 on every day-of-week from Monday through Thursday sleep for
# 13.5 hours (waking at about 6:30 am).
0 17 * * 1-4 (echo -n "Sleeping" && date && /usr/sbin/rtcwake -s 48600 -m mem && echo -n "Woke" && date) >> /home/matthew/cronlog 2>&1
# At 17:00 on Friday, sleep for 60 hours (wake 5am Monday morning)
0 17 * * 5 (echo -n "Sleeping" && date && /usr/sbin/rtcwake -s 216000 -m mem && echo -n "Woke" && date) >> /home/matthew/cronlog 2>&1 

It uses the rtcwake utility that can put the machine into a sleep mode for a period of time.

There's more than just write-watchpoints?

On the weekend I was reading this blog post about using rr to debug a problem in the Julia runtime when something jumped out at me and smacked me. The author used the ‘awatch’ command… which I had never heard of.

It turns out, gdb/rr can do both write and read watchpoints, which I didn’t know! This is fantastic knowledge that will almost certainly serve me well if I can manage to remember it in the future. So I’m writing this blog post to force me to remember. I’ve definitely wanted this before, so I’m surprised I never found it until now.

Playing around with semgrep

Semgrep seems to be a pretty cool looking tool, which allows you to do semantic grep. It appears to be designed to help find vulnerable usage patterns in source code, but the mere ability to have some level of semantic understanding paired with a 'grep' like interface is quite appealing.

I'm looking at ES2015+ features right now, and was curious if any of the benchmarks we've got checked into mozilla-central use rest arguments. With the help of the people at r2c on their community slack (see link at the top right of semgrep.dev), we were able to come up with the following semgrep pattern.

function $FUNC(..., ...$MORE) { ... }

This matches any function declaration which takes a rest parameter.

Running it across our performance tests, it provides exactly what I was hoping for:

$ semgrep -e 'function $FUNC(..., ...$MORE) { ... }' -l javascript third_party/webkit/PerformanceTests/
third_party/webkit/PerformanceTests/ARES-6/Air/reg.js
91:    function newReg(...args)
92:    {
93:        let result = new Reg(...args);
94:        Reg.regs.push(result);
95:        return result;
96:    }

third_party/webkit/PerformanceTests/ARES-6/Air/util.js
32:function addIndexed(list, cons, ...args)
33:{
34:    let result = new cons(list.length, ...args);
35:    list.push(result);
36:    return result;
37:}

third_party/webkit/PerformanceTests/ARES-6/Babylon/air-blob.js
91:    function newReg(...args)
92:    {
93:        let result = new Reg(...args);
94:        Reg.regs.push(result);
95:        return result;
96:    }

third_party/webkit/PerformanceTests/ARES-6/Basic/benchmark.js
35:        function expect(program, expected, ...inputs)
36:        {
37:            let result = simulate(prepare(program, inputs));
38:            if (result != expected)
39:                throw new Error("Program " + JSON.stringify(program) + " with inputs " + JSON.stringify(inputs) + " produced " + JSON.stringify(result) + " but we expected " + JSON.stringify(expected));
40:        }

third_party/webkit/PerformanceTests/ARES-6/glue.js
37:function reportResult(...args) {
38:    driver.reportResult(...args);
39:}

Now: As is, this isn't sufficient to cover all the cases I'm interested in: For example, what if someone defines an arrow function that takes a rest parameter?

(...rest) => { return rest[0]; }

Or, worse, the braces are optional when you have a single expression:

(...rest) => rest[0]

To help support more complicated patterns, semgrep supports boolean combinations of patterns (i.e. pattern1 || pattern2). I wasn't able to get this working because of some arrow parsing bugs, but nevertheless, this is a promising and neat tool!

They've got a live editor for it setup at semgrep.dev/write to dork around with.

A Brief note on Environments and Scopes in SpiderMonkey

To finish telling the story of generators I needed to have a clearer understanding of Environments in SpiderMonkey. Thanks to a discussion with Jason I was able to cobble together some understanding.

JavaScript keeps track of bindings: a binding is a name corresponding to a value. For example, local and global variables, function arguments, class names -- all of these are bindings. These bindings have some interesting (challenging) properties:

  1. Bindings are nested: This implies two things: First, name lookup proceeds by walking the enclosing bindings looking for a definition of a name. Second, you can shadow a binding by creating a new binding of the same name in an inner scope.
  2. Bindings can be captured: When you create a closure by creating a function, any bindings not defined in the function are captured for when the function is invoked.
  3. Bindings are dynamic: Outside of strict mode, direct eval is capable of adding a new binding: eval('var b = 10'); x = b; works, even though the binding b didn't exist before the eval.

In SpiderMonkey, bindings are implemented with two complementary mechanisms: Scopes and Environments.

The most important distinction to keep in mind is that in SpiderMonkey, Scopes are used to track static binding information. This is information that depends solely on where you are textually in the program, and it is determined directly by the parser.

Environments are used to track dynamic binding information. As you execute the JS program, the live values of the bindings are kept in environments. As a result, there can be many environments associated with a given scope. Unlike Scopes, Environments are also real JavaScript objects that are just never exposed to the programmer, created as the program executes. Each environment is also linked to its parent environment, the one corresponding to the enclosing scope. As a result of this linking, we often talk about environment chains, referring to not just the single environment, but all enclosing ones too.

Let's work through a specific, albeit very artificial, example to help clarify. In order to avoid the complexity of the real implementation, which has many optimizations and hairy portions, I will instead tell a simplified story which nevertheless attempts to convey the flavour of the issue.

function f(x) {
  let a = 'A: ' + x;
  if (x > 10) {
    let z = a + ' > 10 (z)';
    return function() {
      return z;
    }
  }
  return function() {
    return a + ' <= 10';
  }
}

var f1 = f(0);
var f2 = f(12);

print(f1());  // => A: 0 <= 10
print(f2());  // => A: 12 > 10 (z)

In function f we have two major scopes: There is the scope body for the whole function f, and nested inside is the scope for the body of the if statement.

Recalling that the scopes keep track of the static information, these scopes let us know statically where we must have a binding for a, z, and x. The scopes also know statically where the storage for these bindings is located -- either in stack slots, in some environment object on the environment chain, or in a special object property (in the case of with bindings, or the global object).

When we start executing a call to f(), the first thing that happens is the creation of an environment object to hold the value of a. Since each new environment points to the enclosing one, it will be the end of the environment chain. The JavaScript execution frame is updated so that its 'current environment' points to the newly created environment. Then, if we enter the conditional, the same process is repeated with a new environment to hold z.

In SpiderMonkey, whenever a function is created, it captures the current environment pointer. This is how variable values are captured: in this example, how f1 remembers the values of a and x, and how f2 remembers the values of a, x and z: When we invoke f1 or f2, the new environment created for the function invocation uses the captured environment as the ‘enclosing’ environment, and so the lookup chain has access to the values in the environment of the function creation.

So when we invoke f2, we create an environment for the call. Its parent environment is the environment where it was created, which contains the binding for z. That environment’s parent is in turn the enclosing environment, which has the binding for a, and its parent has the binding for x.

                f   x: 12
                ^
                |
                |
      First F env  a: 'A: 12'
                ^
                |
                |
  Conditional env  z: 'A: 12 > 10 (z)'
                ^
                |
                |
f2 invocation env (empty)

When a binding is on the environment chain, often we statically know where it is relative to the current environment, so SpiderMonkey often refers to environment stored variables via "Environment Coordinates". These are a pair (hops, slot), which indicates how many links on the chain need to be followed (hops) and which slot on that resulting object will have the value.
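
As a rough sketch in JS (with made-up enclosing and slots fields, not SpiderMonkey's real data structures), resolving a coordinate amounts to:

// A toy model of environment coordinate resolution; not real SpiderMonkey code.
function getEnvironmentCoordinate(env, hops, slot) {
  let current = env;
  for (let i = 0; i < hops; i++) {
    current = current.enclosing;   // follow one link up the environment chain per hop
  }
  return current.slots[slot];      // read the value stored in that slot
}

// In the diagram above, reading z from the f2 invocation environment would be roughly
// getEnvironmentCoordinate(f2Env, 1, slotOfZ): one hop up to the conditional environment
// (the slot number itself is an implementation detail).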

Optimization

If all locals were stored this way, we'd have to allocate an environment object every time the program enters a scope. The allocation alone would be very slow, and it's also bad for locality. We can do better by analysing binding use to optimize storage.

Where a binding isn't captured by any function, instead we try to store the variable as a slot on the stack frame. However, when a variable is captured, we must store it in the environment so that it can be read on subsequent invocation of the capturing function. In SpiderMonkey the term used for this is "Aliased". So when you see mention of 'an aliased binding', or an opcode like GetAliasedVar, you know the binding is stored on the environment chain.
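
A sketch of the distinction (illustrative only; which opcodes actually get emitted is up to the engine):

function example() {
  let onStack = 1;        // never captured: can live in a stack slot on the frame
  let aliased = 2;        // captured by the closure below: must be stored ("aliased")
                          // on the environment chain
  return function inner() {
    return aliased;       // read later via a GetAliasedVar-style lookup
  };
}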

A direct eval can create a function that captures any or all of the bindings in its environment. So, the mere existence of a direct eval causes all bindings in all enclosing non-global scopes to be marked as aliased.

In some situations, we can also elide the creation of an environment, if we know that it will never contain any bindings.

Conclusion

Hopefully this is a helpful sketch of how variable bindings are handled in SpiderMonkey. There’s loads more complexity in this area of SpiderMonkey that I didn’t cover, but I nevertheless tried to cover the basics.

Acknowledgements

Thanks very much to Jason for previewing a draft of this post and making very helpful suggestions! Any errors are mine however.

How do Generators... Generate, in SpiderMonkey?

I'm going to be looking at async functions and generators over the next little while, which requires I understand them. My previous experience has been that writing about things in a structured fashion helps me learn-by-teaching. So this blog post (series?) is me learning about generators by teaching.

We'll start from the basics of the language feature, and then continue into more specifics.

Basics of Generators

Generators are special objects that return a sequence of values when you call their next() method. The sequence of values is specified by the code that backs the generator.

To create a JS Generator, you use the function* syntax:

function* gen() {
  yield 1;
  yield 2;
  yield 3;
}

When called, this function doesn't run its body; instead, it returns a generator.

var g = gen();

At this point, none of the code backing the generator has run yet. Once you invoke the next method on the generator g, the body of the function runs until a yield. At the yield point, execution of the function body stops, and the caller of next is returned an object with value and done properties: value is the argument to the yield, and done is false if there are more results to be yielded, and true if the generator is finished. Subsequent calls to next will simply return {value: undefined, done: true}.

g.next();  // {value: 1,          done: false}
g.next();  // {value: 2,          done: false}
g.next();  // {value: 3,          done: false}
g.next();  // {value: undefined,  done: true}
g.next();  // {value: undefined,  done: true}

This protocol is understood by JavaScript features, like for ... of loops:

let res = [];
for (const v of gen()) {
  res.push(v);
}
res; // [1,2,3]

When you call .next() it's possible to provide an argument. That becomes the value of the yield expression when the generator is resumed.
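
A quick illustration (standard JS, nothing engine-specific):

function* echo() {
  const received = yield 'ready';    // the value passed to the next .next() call lands here
  yield 'got ' + received;
}

const e = echo();
e.next();          // {value: 'ready', done: false}
e.next('hello');   // {value: 'got hello', done: false} -- 'hello' became the value of the first yield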

Investigating a basic generator:

Let's look at the bytecode for a generator function. With a debug build of SpiderMonkey, you can dump bytecode with the dis function: dis(gen) produces the following fairly substantial chunk of Bytecode

flags: NEEDS_CALLOBJECT
loc     op
-----   --
main:
00000:  Generator                       # GENERATOR
00001:  SetAliasedVar ".generator" (hops = 0, slot = 2) # GENERATOR
00006:  InitialYield 0                  # RVAL GENERATOR RESUMEKIND
00010:  AfterYield (ic: 1)              # RVAL GENERATOR RESUMEKIND
00015:  CheckResumeKind                 # RVAL
00016:  Pop                             # 
00017:  NewObject ({value:(void 0), done:(void 0)}) # OBJ
00022:  One                             # OBJ 1
00023:  InitProp "value"                # OBJ
00028:  False                           # OBJ false
00029:  InitProp "done"                 # OBJ
00034:  GetAliasedVar ".generator" (hops = 0, slot = 2) # OBJ .generator
00039:  Yield 1                         # RVAL GENERATOR RESUMEKIND
00043:  AfterYield (ic: 5)              # RVAL GENERATOR RESUMEKIND
00048:  CheckResumeKind                 # RVAL
00049:  Pop                             # 
00050:  NewObject ({value:(void 0), done:(void 0)}) # OBJ
00055:  Int8 2                          # OBJ 2
00057:  InitProp "value"                # OBJ
00062:  False                           # OBJ false
00063:  InitProp "done"                 # OBJ
00068:  GetAliasedVar ".generator" (hops = 0, slot = 2) # OBJ .generator
00073:  Yield 2                         # RVAL GENERATOR RESUMEKIND
00077:  AfterYield (ic: 9)              # RVAL GENERATOR RESUMEKIND
00082:  CheckResumeKind                 # RVAL
00083:  Pop                             # 
00084:  NewObject ({value:(void 0), done:(void 0)}) # OBJ
00089:  Int8 3                          # OBJ 3
00091:  InitProp "value"                # OBJ
00096:  False                           # OBJ false
00097:  InitProp "done"                 # OBJ
00102:  GetAliasedVar ".generator" (hops = 0, slot = 2) # OBJ .generator
00107:  Yield 3                         # RVAL GENERATOR RESUMEKIND
00111:  AfterYield (ic: 13)             # RVAL GENERATOR RESUMEKIND
00116:  CheckResumeKind                 # RVAL
00117:  Pop                             # 
00118:  NewObject ({value:(void 0), done:(void 0)}) # OBJ
00123:  Undefined                       # OBJ undefined
00124:  InitProp "value"                # OBJ
00129:  True                            # OBJ true
00130:  InitProp "done"                 # OBJ
00135:  SetRval                         # 
00136:  GetAliasedVar ".generator" (hops = 0, slot = 2) # .generator
00141:  FinalYieldRval                  # 
00142:  RetRval                         # !!! UNREACHABLE !!!

We'll go through this piece by piece to try to understand what's going on.

00000:  Generator                       # GENERATOR
00001:  SetAliasedVar ".generator" (hops = 0, slot = 2) # GENERATOR

Reading this, on the left we have the Bytecode Index (00000), in the middle the opcode (Generator), and on the right, after the #, we have the statically determined stack contents.

To understand what opcodes do, the best reference is Opcodes.h in the SpiderMonkey sources, as well as the interpreter implementation of the opcode.

These two opcodes together create a Generator object, and create a binding for the generator under the name .generator, for future access. We use .generator as the name because we know it will never conflict with a user-defined JS name: there's no valid syntax to create that name.

00006:  InitialYield 0                  # RVAL GENERATOR RESUMEKIND

InitialYield does three main things: First, it makes the Generator object, allocated above by the Generator opcode, the return value. Then, it calls AbstractGeneratorObject::initialSuspend, after which it pops the current frame off the stack (returning to the caller). We'll discuss the suspend operation shortly.

The contract for generators bytecode is that at the time of generator resumption the stack will be updated to have on it:

  1. The 'result' value of the generator (ie, y, in y = yield x;). This is injected as the argument to .next(...).
  2. The generator object
  3. The resume kind: this indicates if the generator was resumed with .next(), .throw(), or .return().

00010:  AfterYield (ic: 1)              # RVAL GENERATOR RESUMEKIND
00015:  CheckResumeKind                 # RVAL
00016:  Pop                             #

AfterYield is a book-keeping operation, which we will skip for now.

CheckResumeKind reads the generator resume kind and either (1) continues to the next instruction, if the resume kind is next, or (2) throws an exception, if the resume kind is throw or return. What remains on the stack afterwards is the resumption value (the argument that was passed to next). Because this was an 'initial yield', there's nothing to consume that value, so we simply pop it off the stack.

00017:  NewObject ({value:(void 0), done:(void 0)}) # OBJ
00022:  One                             # OBJ 1
00023:  InitProp "value"                # OBJ
00028:  False                           # OBJ false
00029:  InitProp "done"                 # OBJ
00034:  GetAliasedVar ".generator" (hops = 0, slot = 2) # OBJ .generator
00039:  Yield 1                         # RVAL GENERATOR RESUMEKIND

Now that we've gotten through the code executed before the first invocation of next(), we can see the code executed on our first call to next():

  • NewObject allocates the return value of the generator.
  • One pushes 1 onto the stack, and InitProp sets the value property on that object to 1.
  • False and InitProp set the done property on the object to false.
  • GetAliasedVar retrieves the generator from the environment, and pushes it onto the stack.
  • Yield suspends the generator, returning its argument to the caller. Following the same contract for InitialYield, when execution resumes after this bytecode, the stack will have the argument to next/throw/return, the generator, and the resume kind on the stack.

Seeing the above pattern, you can probably easily pick out the remaining yields, so I will skip those.

00118:  NewObject ({value:(void 0), done:(void 0)}) # OBJ
00123:  Undefined                       # OBJ undefined
00124:  InitProp "value"                # OBJ
00129:  True                            # OBJ true
00130:  InitProp "done"                 # OBJ
00135:  SetRval                         # 
00136:  GetAliasedVar ".generator" (hops = 0, slot = 2) # .generator
00141:  FinalYieldRval                  # 
00142:  RetRval                         # !!! UNREACHABLE !!!

Once we have exhausted the yields, we get to the end of the generator. At this point, we prepare the {value: returnValue, done: true} object. This is stored into the return value slot, then returned via FinalYieldRval, which closes the generator object and then returns.

Suspend

Suspending a generator means saving the state of the stack frame so that it can be restored. The relevant state here is the state of the expression / interpreter stack.

In a slightly more complicated generator we can see this happen:

function *gen2() { 
    return 10 + (yield 1)
}

When we disassemble this (dis(gen2)) we can see the relevant bits here:

00017:  NewObject ({value:(void 0), done:(void 0)}) # OBJ
00022:  Int8 10                         # OBJ 10
00024:  NewObject ({value:(void 0), done:(void 0)}) # OBJ 10 OBJ
00029:  One                             # OBJ 10 OBJ 1
00030:  InitProp "value"                # OBJ 10 OBJ
00035:  False                           # OBJ 10 OBJ false
00036:  InitProp "done"                 # OBJ 10 OBJ
00041:  GetAliasedVar ".generator" (hops = 0, slot = 2) # OBJ 10 OBJ .generator
00046:  Yield 1                         # OBJ 10 RVAL GENERATOR RESUMEKIND
00050:  AfterYield (ic: 6)              # OBJ 10 RVAL GENERATOR RESUMEKIND
00055:  CheckResumeKind                 # OBJ 10 RVAL
00056:  Add                             # OBJ (10 + RVAL)

The first NewObject pushed is the one for the return value; the second is the one returned by the yield. If you look at the yield, you can see that the return object and the literal 10 are still on the stack when we yield. These values need to be saved on suspend.

For that, the method AbstractGeneratorObject::suspend exists. This method has three main responsibilities.

  1. Keeping track of the resume index: this is where the generator should start executing on resumption.
  2. Keeping track of the frame's environment chain. I'll discuss the environment chain in a moment.
  3. Copying the values out of the interpreter stack into an array stored on the generator object (possibly recycling a previously allocated array, extending it if necessary). A rough sketch of all three follows below.
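
Very loosely, in pseudo-JS (the real AbstractGeneratorObject::suspend is C++ and considerably more subtle), those three responsibilities amount to something like:

// A conceptual model only, with made-up field names; not SpiderMonkey's actual code.
function suspendSketch(generatorObj, frame, resumeIndex) {
  generatorObj.resumeIndex = resumeIndex;                     // 1. where to restart on resumption
  generatorObj.environmentChain = frame.environmentChain;     // 2. which environment to restore
  generatorObj.stackStorage = frame.expressionStack.slice();  // 3. save the live expression stack values
}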

As near as I can tell, the major difference between InitialYield and Yield is that with InitialYield we know a priori that there will be nothing on the expression stack to save.

The environment chain needs to be maintained by the generator as it will change over the execution of a generator. For example:

function *gen_with_environment(x) { 
  if (x) {
      var y = 10; 
      let z = 12; 
      yield 1; 
      yield z + 1; 
  }
  yield 2;  
}

The let z binding is only available inside the lexical block defined by the conditional's braces. This is managed within the engine by creating a new lexical environment object for the braced block, which is made the 'current' environment when the braces are entered. As a result, when we yield, we need to know which lexical environment to restore. The same is not true of the var y binding, which would not create a new environment, as the language hoists var bindings to the top of the function definition.

It's worth noting that environments are mutable in JavaScript, as a direct eval is allowed to add new bindings:

function* g() {
  eval('var b = 10;')
  yield b;
}

so we must keep track of the precise environment to be restored, as it may have been mutated.

Resume and .next()

The above essentially covered how generators are created, and how we leave a generator frame. To see how we get back in, we want to look at the implementation of GeneratorNext, the self-hosted JavaScript that implements Generator.prototype.next. Self-hosted JS is a special dialect of JS with elevated privileges, written inside SpiderMonkey specifically to implement engine functionality.

function GeneratorNext(val) {
    // The IsSuspendedGenerator call below is not necessary for correctness.
    // It's a performance optimization to check for the common case with a
    // single call. It's also inlined in Baseline.

    if (!IsSuspendedGenerator(this)) {
        if (!IsObject(this) || !IsGeneratorObject(this))
            return callFunction(CallGeneratorMethodIfWrapped, this, val, "GeneratorNext");

        if (GeneratorObjectIsClosed(this))
            return { value: undefined, done: true };

        if (GeneratorIsRunning(this))
            ThrowTypeError(JSMSG_NESTING_GENERATOR);
    }

    try {
        return resumeGenerator(this, val, "next");
    } catch (e) {
        if (!GeneratorObjectIsClosed(this))
            GeneratorSetClosed(this);
        throw e;
    }
}

Each function largely does what you would imagine it to do. However, there are some interesting pieces. For example return resumeGenerator(this, val, "next") gets compiled directly to the Resume bytecode, not a function call.

The Resume bytecode calls AbstractGeneratorResume, which takes the previously saved expression stack, restores it to the machine stack, and sets the program counter to the correct value (as determined by the resume index stored in the generator).

Given the previously discussed Yield protocol, it also pushes the argument to .next(), the generator object, and the resume kind.
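
Loosely speaking, resumption is the inverse of the suspend sketch above (again a conceptual model with made-up field names and a hypothetical bytecodeOffsetFor helper, not the real implementation):

// A conceptual model only; the real work happens in C++ when JSOp::Resume executes.
function resumeSketch(generatorObj, frame, arg, resumeKind) {
  frame.environmentChain = generatorObj.environmentChain;     // restore the saved environments
  frame.expressionStack = generatorObj.stackStorage.slice();  // restore the saved expression stack
  frame.pc = bytecodeOffsetFor(generatorObj.resumeIndex);     // hypothetical: map resume index to a bytecode offset
  frame.expressionStack.push(arg, generatorObj, resumeKind);  // fulfil the yield contract described earlier
}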

More Functionality: return, throw

Generator.prototype has two more methods than just .next(): There's also .return and .throw.

  • gen.return(x) closes the generator, and returns {value: x, done: true}
  • gen.throw(x) enters the generator, and throws x (which may be caught by the body of the generator and handled); a quick example of both follows.
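
None of this is SpiderMonkey-specific; you can observe it directly in any JS shell:

function* g() {
  try {
    yield 1;
    yield 2;
  } catch (e) {
    yield 'caught ' + e;
  }
}

const it = g();
it.next();          // {value: 1, done: false}
it.throw('boom');   // {value: 'caught boom', done: false} -- the body caught and handled it
it.return(42);      // {value: 42, done: true} -- the generator is now closed
it.next();          // {value: undefined, done: true}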

The implementations of both live in builtin/Generator.js. Both use resumeGenerator (JSOp::Resume), just with different resume kinds and return values, which are then handled by the CheckResumeKind op as appropriate.

Conclusion

Having started with no knowledge of how Generators work in SpiderMonkey, I was pleasantly surprised to find the high level design mostly made sense to me. Things I didn't cover here that might be worth exploring:

  1. The JIT implementation of generators
  2. Async functions are implemented with generators: How does that work?

Acknowledgements:

Many thanks to Iain Ireland, Steve Fink, and Jeff Walden for proof reading this post. Mistakes are mine, feedback very welcome!

Investigating the Return Behaviour of JS Constructors

Did you know that you can return a value from a constructor in JS? I didn't for the longest time. There's some non-obvious behaviour there too!

Given

class A { 
  constructed = true;
  constructor(o) {
   return o; 
  }
}

Why does the type of the return value seem to affect what comes out of the new expression:

js> new A()
({constructed:true})
js> new A(10) 
({constructed:true})
js> new A({override: 10}) 
({override:10})

Let's figure this out from the specification.

Looking at the Runtime Semantics: Evaluate New, we call Construct(constructor, argList).

So, inside of that Construct, we return constructor.[[Construct]](argumentsList, newTarget). So I guess we need to figure out what a class's internal slot [[Construct]] is set to.

Static semantics for class constructors are here. This doesn't seem to help me too much. Let's look at the runtime semantics of class definition evaluation instead.

So:

  • Step 8: "let constructor be ConstructorMethod of ClassBody".
  • Step 11: Let constructorInfo be !DefineMethod of constructor with arguments proto and constructorParent
  • Step 12: Let F be constructorInfo.[[Closure]]
  • Step 14: Perform MakeConstructor(F, false, proto)

Inside of MakeConstructor we have Step 4: "Set F.[[Construct]] to the definition specified in 9.2.2."

9.2.2 is the [[Construct]] behaviour for ECMAScriptFunction objects. In that, Step 12 is what we were looking for:

If result.[[Type]] is return, then

a. If Type(result.[[Value]]) is Object, return NormalCompletion(result.[[Value]]).

b. If kind is base, return NormalCompletion(thisArgument).

c. If result.[[Value]] is not undefined, throw a TypeError exception.

So, if your constructor function returns an Object, that object is the result of the constructor. Otherwise, in a base class constructor the return value is simply ignored and this is returned; in a derived class constructor, returning anything other than an object or undefined throws a TypeError.
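
Step (c) only comes into play for derived class constructors, which the examples above don't show; here's a quick illustrative sketch:

class Base {}

class Derived extends Base {
  constructor() {
    super();
    return 10;   // neither an object nor undefined
  }
}

new Derived();   // throws a TypeError, per step (c) above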

Why is it like this?

After spending some time looking into the history, the conclusion is essentially that it’s something that’s needed as part of a transition plan from pre-classes JS.

I found this ESDiscuss thread: Should the default constructor return the return value of super? particularly enlightening, while not addressing the topic directly.

Some quotes:

Sebastian Markbåge:

Basic constructors still have the quirky behavior of ES functions that they can return any object and don't have to return the instantiated object. This can be useful if they're used as functions or should return a placeholder object, or other instance, for compatibility/legacy reasons. E.g. when you have a custom instantiation process.

class Foo { constructor() { return {}; } }

Allen Wirfs-Brock

It is difficult to design of a constructor body that behaves correctly in all five of these situations: invoked via the new operator; invoked with a super call from a subclass constructor; called directly; called via call/apply function with arbitrary things passed as the this value; and, called as a method. The ES6 spec. has to handle all for of those use cases for the legacy built-in constructors. But I don't think we want to encourage people to do so for new abstractions defined using ES6 class definitions because in most cases what they produce will be buggy,

Sebastian Markbåge:

The use case I had in mind was React components. Components in React are described as classes which makes them seem approachable to a broad user base. They cannot and should not be accessed as class instances though. The instances are immutable data structures used exclusively by the library. The base constructor could look something like this:

constructor(x) {
  return { _hiddenInstance: this, _instantiationContext: CurrentContext, _id: uid(), _someArgument: x };
}

This would generate a descriptor that can be used by the library but only used as a reference by the user. This allows users to declare classes just like they're used to and even instantiate them normally. However, they'd only be given access to the real instance at the discretion of the library.

Brendan Eich

Adding class syntax as (mostly) sugar for the prototypal pattern does not obviously mean rejecting all unusual or exceptional variants possible in the prototypal pattern. I say let super be used even in class C{}'s constructor, and let return-from-constructor work without requiring [Symbol.create]. JS is dynamic.

Review: Eero Wifi

When we first moved into our house in Ottawa, we had just the wifi that came with our Hitron cable modem. While it had been sufficient for our needs in a two bedroom condo, it quickly became apparent that in a house it wasn’t going to cut it. Big chunks of the house were total dead zones.

I constantly want to be up to date with networking, despite having pretty weak networking skills. For this round, I did some research, and figured I’d make a safe choice and go with the TP-Link Archer C7, which at the time was the Wirecutter’s recommended choice.

The Archer mostly served us fine, until we bought an Arlo baby camera. I should really write a review of our Arlo some day, but that’s another post. It’s a Wifi baby camera. We would use it pretty much every time my daughter was asleep, which meant we were streaming video across the network for effectively 16 hours a day. This didn't really get along well with the Archer.

The symptom we started to encounter was that the 2.4GHz wifi band the Arlo was connected to would start to slow down until finally it effectively stopped answering any requests. The problem would persist until we rebooted the router. I'm honestly not sure if the problem started right away, as we were still pretty sleep deprived in the grand scheme of things when we got the Arlo, but it definitely got worse over time. It would also vary: Some weeks we'd have almost no troubles, and other weeks we'd have to reboot the router every couple of days. Then over time, the bad weeks got worse, and we'd have to reboot the router every day some weeks. I mastered the art of rebooting the router from my phone, using the 5GHz wifi band that hadn't died.

Eventually my wife had enough, and she demanded that we fix this. I went hunting for a solution. For a while I'd dreamed of building a Ubiquiti network. I'd read enough blog posts and reviews to think I wanted that crazy level of control that came with a Ubiquiti network.

I tried to sell my wife on this idea, but as soon as I mentioned the wiring I wanted to run, she said no: too much complexity. (At the time, I also hadn't yet heard about the Dream Machine, which would have heavily reduced the complexity. However, reviews for it are scarce even today). Instead, we ended up with perhaps the exact opposite of a Ubiquiti install: an eero 3 pack.

A poorly lit glamour shot of a single eero

eero is what you would imagine a router would be like if Apple circa 2017 built a router. It's slick, effective and kind of opaque as to what's going on.

Opening the box and the packaging puts the product front and centre, reminding me of unboxing an iPhone. Underneath the three eero are three USB-C power adapters (which I infer from the manual are 'special' to the eero: its LED lights up a different colour to complain if you use a non-matching adapter). Also included is a short ethernet cable. The build quality is excellent: The cables and power adapters feel great, with a very premium feeling coating, and the routers themselves have a nice glossy finish with soft rubber bottom. Really nice hardware.

Setup was almost trivial: it took about ten minutes and I had all three units running in the house, all guided by their app.

The app

The app is actually the only way to control the routers, and that's my first fear with these routers. There is no "sign in at 192.168.1.1 with admin/admin" dashboard. Which means, on the off chance that eero stops supporting the app, I suddenly can't control my network. While I don't feel like this is a likely outcome, it nevertheless is a fear I have.

The app is intentionally very simple: this is not designed for people who want packet-level introspection of their networks. There are no live-updating traffic graphs, no cool layout diagrams. Just a simple interface with a big green bar to ensure you know that the network is all working just fine.

You can add IP reservations (which I need so that my work desktop has a consistent IP, so that I can ssh into it without dorking around with zeroconf).

Surprisingly you can't add another admin account to the network. The only way my wife can admin the network is if she logs in under my account on her phone, which isn't particularly nice (albeit, and this is eero's watchword, simple).

The app also pushes eero secure, though not so hard I am annoyed yet. Eero secure adds some ad-blocking, 'threat scanning' and content filters. Nothing I need just yet.

Performance

Initial performance testing with the eeros was really impressive. I've run enough speed tests with the Archer that I had a pretty good idea of what the wifi was like in a couple of parts of the house; in my office, for example, speed tests would typically get me between 25 and 45 Mbps. After installing the new eero routers I got 125 Mbps.

A problem

OK, so all good? Not quite. We did end up going on an adventure with one of my eero. Here's the email I sent to eero support:

I have one (of three) eero which appears to disconnect from the mesh network very frequently. Currently, in order to maintain stability on the rest of the network I have it disconnected. This is a brand new install direct from the box: The problems started within a day of the first install (it’s possible they started even earlier and I only noticed on the second day).

The troublesome eero was originally installed in my office. The gateway eero is installed in the living room, and I had a third eero installed in the basement.

Here are the trouble shooting steps I have taken so far:

  1. Removed and Re-added the office eero from the network. This did not resolve the problem, the office eero dropped off the mesh within a couple of hours.
  2. I moved the office eero to another location (bedroom downstairs). Similarly, within a couple of hours it had dropped again.
  3. To verify the original location was not the issue, I moved my family room eero, which so far had a stable connection, into my office, and unplugged the original office eero. I have been running in this configuration for more than 48 hours with no disconnects.

My suspicion is that there’s something amiss with the single office eero.

I’d appreciate any guidance. My current plan is to return the set of eero and replace all three, but I’d prefer not to if it can be arranged.

Working while the office eero was dropped from the mesh was particularly peculiar. These days I mostly remote connect from my laptop to my desktop over wifi, because the desktop is so much more powerful a machine that it makes me more productive. When the eero dropped out of the mesh, I could still work and connect to my desktop, just not the internet, as my office formed a little network partition.

Once I disconnected the problematic unit all the issues went away; it was just a little irksome that we'd paid for three units and only had two that worked. Despite that, network performance remained excellent with only the two eero connected.

eero replied to my email that they would like to replace the bad unit. In under a week I had a new eero, along with a return label for the defective one. We've been running the mesh with all three units for a few days now, and it seems like we're out of the woods. No issues since.

Amazon

Another concern I have with eero is their acquisition by Amazon. As near as I can tell, the integration so far is minimal (an Alexa skill to allow you to turn off your kids' internet), but I'm definitely going to be keeping a close eye on the eero privacy policy. Yet another opportunity for an internet giant to hoover up data.

Conclusion

I'm happy with the eero system so far. It's not got knobs galore, but I think I really like that. With great power comes great responsibility, and with something like a wifi network I don't feel like I want responsibility; I just want something that works reliably day in and day out. I want my router to be invisible infrastructure, and so far, so good.