A Brief note on Environments and Scopes in SpiderMonkey

To finish telling the story of generators I needed to have a clearer understanding of Environments in SpiderMonkey. Thanks to a discussion with Jason I was able to cobble together some understanding.

JavaScript keeps track of bindings: This is a name corresponding to a value. For example, local and global variables, function arguments, class names -- all of these are bindings. These bindings have some interesting (challenging) properties:

  1. Bindings are nested: This implies two things: First, name lookup proceeds by walking the enclosing bindings looking for a definition of a name. Second, you can shadow a binding by creating a new binding of the same name in an inner scope.
  2. Bindings can be captured: When you create a closure by creating a function, any bindings not defined in the function are captured for when the function is invoked.
  3. Bindings are dynamic: Outside of strict mode, direct eval is capable of adding a new binding: eval('var b = 10'); x = b; works, even though the binding b didn't exist before the eval.

In SpiderMonkey, bindings are implemented with two complementary mechanisms: Scopes and Environments.

The most important distinction to keep in mind is that in SpiderMonkey, Scopes are used to track static binding information. This is information which is always true depending solely on where you are textually in the program, and determined by the parser directly.

Environments are used to track dynamic binding information. As you execute the JS program, the live values of the bindings are kept in environments. As a result, there can be many environments associated with a given scope. Unlike Scopes, Environments are also real JavaScript objects that are just never exposed to the programmer, created as the program executes. Each environment is also linked to its parent environment, the one corresponding to the enclosing scope. As a result of this linking, we often talk about environment chains, referring to not just the single environment, but all enclosing ones too.

Let's work through a specific, albeit very artificial, example to help clarify. In order to avoid the complexity of the real implementation, which has many optimizations and hairy portions, I will instead tell a simplified story which nevertheless attempts to convey the flavour of the issue.

function f(x) {
  let a = 'A: ' + x;
  if (x > 10) {
    let z = a + ' > 10 (z)';
    return function() {
      return z;
    }
  }
  return function() {
    return a + ' <= 10';
  }
}

var f1 = f(0);
var f2 = f(12);

print(f1())  // => A: 0 <= 10
print(f2());  // => A: 12 > 10 (z)

In function f we have two major scopes: There is the scope body for the whole function f, and nested inside is the scope for the body of the if statement.

Recalling that the scopes keep track of the static information, these scopes let us know statically where we must have a binding for a, z, and x. The scopes also know statically where the storage for these bindings is located -- either in stack slots, in some environment object on the environment chain, or in a special object property (in the case of with bindings, or the global object).

When we start executing a call to f(), the first thing that happens is the creation of an environment object to hold the value of a. Since each new environment points to the enclosing one, it will be the end of the environment chain. The JavaScript execution frame is updated so that its 'current environment' points to the newly created environment. Then, if we enter the conditional, the same process is repeated with a new environment to hold z.

In SpiderMonkey, whenever a function is created, it captures the current environment pointer. This is how variable values are captured: in this example, how f1 remembers the values of a and x, and how f2 remembers the values of a, x and z: When we invoke f1 or f2, the new environment created for the function invocation uses the captured environment as the ‘enclosing’ environment, and so the lookup chain has access to the values in the environment of the function creation.

So when we invoke f2, we create an environment for the call. It’s parent environment is the environment where it was created, which contains the binding for z. That environment’s parent is in turn the enclosing parent, which has the environment for a, and its parent has the binding for x.

‍               f   x: 12
                ^
                |
                |
      First F env  a: 'A: 12'
                ^
                |
                |
  Conditional env  z: 'A: 12 > 10 (z)'
                ^
                |
                |
f2 invocation env (empty)

When a binding is on the environment chain, often we statically know where it is relative to the current environment, so SpiderMonkey often refers to environment stored variables via "Environment Coordinates". These are a pair (hops, slot), which indicates how many links on the chain need to be followed (hops) and which slot on that resulting object will have the value.

Optimization

If all locals were stored this way, we'd have to allocate an environment object every time the program enters a scope. The allocation alone would be very slow, and it's also bad for locality. We can do better by analysing binding use to optimize storage.

Where a binding isn't captured by any function, instead we try to store the variable as a slot on the stack frame. However, when a variable is captured, we must store it in the environment so that it can be read on subsequent invocation of the capturing function. In SpiderMonkey the term used for this is "Aliased". So when you see mention of 'an aliased binding', or an opcode like GetAliasedVar, you know the binding is stored on the environment chain.

A direct eval can create a function that captures any or all of the bindings in its environment. So, the mere existence of a direct eval causes all bindings in all enclosing non-global scopes to be marked as aliased.

In some situations, we can also elide the creation of an environment, if we know that it will never contain any bindings.

Conclusion

Hopefully this is a helpful sketch of how variable bindings are handled in SpiderMonkey. There’s loads more complexity in this area of SpiderMonkey that I didn’t cover, but I nevertheless tried to cover the basics.

Acknowledgements

Thanks very much to Jason for previewing a draft of this post and making very helpful suggestions! Any errors are mine however.