A Listing of Compiler Teams

When I was just getting started in compilers, I found Nullstone's compilerjobs.com page really inspiring. It was really cool to have a job board dedicated to an area I found interesting, and it was even more interesting to see companies I'd never have imagined doing compiler work with whole teams working in the area.

Alas, a few years ago, compilerjobs.com’s listings stopped loading, and so this resource disappeared.

In the last few years, as I've talked to colleagues and people I've mentored about their career aspirations, it's become clear that people aren't aware of the breadth of companies doing this kind of work. I have a private listing I've shared with people on the job hunt. I've decided that it's time to make this listing public, and to try to open it up to others to help me maintain it.

Here it is, in its preliminary form.

Please, give it a look over, and help me fill it in. My aspiration for this is for it to become a valuable resource for anyone interested in working in this field.



Two Years of Mozilla

Today is my second anniversary of being a Mozillian. Here are some random milestones from that time:

  • Bugzilla User Statistics
    • Bugs filed: 276
    • Comments made: 1489
    • Assigned to: 118
    • Commented on: 409
    • Patches submitted: 532
    • Patches reviewed: 93
    • Bugs poked: 507
  • Switched from reviews in Bugzilla (counted in the stats above, I think) to Phabricator (which I don't think is counted above).
  • Saw the deprecation of mozilla-inbound and the rise of Lando
  • 327 commits in central (found via hg log -r 'public() and author(mgaudet)' -T '{node|short} {author|user} {desc|firstline}\n', though looking at the log it becomes pretty clear that this includes backouts and re-landings, so in a sense there is some overcounting here)
  • Three All Hands (Austin, San Francisco, Whistler; I missed Orlando for excellent reasons)

I’m still really happy working for Mozilla. If you’re interested in joining us, please reach out!

Visual Studio Code's Remote Development is Really Impressive

For the past few months I've been making heavy use of Visual Studio Code's Remote Development extension. To be honest, it's incredibly impressive: it's pretty amazing to transparently work in different environments. Initially I used it connected to a Linux VM, and more recently to a real Linux machine.

Because editing happens locally, it hides a good chunk of latency, which is also really nice. Editing feels about as fast as a fully local setup, even when you're having network challenges (as I am, alas). One downside for me is that I got pretty used to using opendiff for dealing with merges, but I'm finding that plain text merge resolution plus VSCode's semantic diff understanding is pretty good.

I really should say that, overall, I've gotten quite happy with my VSCode setup. I have clang-format running on file save, and I don't understand how I lived without it before. Intellisense results range from amazing to garbage, seemingly depending on the day; I suspect I do a lot of bad things to the poor engine by jumping between heads all the time.

As a Firefox developer, I feel I should be more annoyed that it’s built on Electron (and therefore Chrome), but honestly, for a free development environment it blows my socks off, and so I can’t ride that high horse too long.

Setting a Static IP for an Ubuntu 19.04 Server VM in VMWare Fusion

The initial setup is the same as in the article I wrote about Ubuntu 18.10 (creating the private adapter).

This time, because Ubuntu 19.04 has added something called cloud-init, there's a new place (yay) to do configuration. However, this time I got it done entirely from the CLI. Yay me:

mgaudet@ubuntu1904vm:~$ cat /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg 
network:
  ethernets:
    ens33:
      dhcp4: true
    ens38:
      dhcp4: false
      addresses: [192.168.77.15/24]
      gateway4: 192.168.77.2
  version: 2

I added the ens38 stanza above and ran sudo cloud-init clean -r; the machine rebooted and we were done.
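If you want to double-check after the reboot, you can ask iproute2 whether the address stuck (ens38 is just the name of the second adapter on my VM; yours may differ):

mgaudet@ubuntu1904vm:~$ ip -4 addr show ens38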

Thanks to veprr on askubuntu.com.

A Beginner's Guide to SpiderMonkey's MacroAssembler

I recently had cause to start writing a good introduction to SpiderMonkey’s MacroAssembler. I’ve expanded it a bit, and am reposting here. Hopefully this will be the first in a couple of posts, but no promises!

The SpiderMonkey MacroAssembler is the dominant interface for emitting code. For the most part, it tries to provide a hardware-agnostic interface to the emission of native machine code.

Almost all of SpiderMonkey's JIT compilers (Baseline, IonMonkey, and CacheIR, for example) end up using the MacroAssembler to generate code.

In IonMonkey's case, JavaScript bytecode is converted to a MIR graph (a high level graph representation of the program), which in turn is lowered into a LIR graph (a lower level representation of the program, with register assignments), and then the LIR graph is visited, with each node type having code generation implemented using MacroAssembler.

If you've never seen anything like the MacroAssembler, it can be a bit baffling! I know I was definitely confused a bit when I started.

Let's use CodeGenerator::visitBooleanToString as a worked example to show what MacroAssembler looks like and how it functions.

visitBooleanToString emits code that converts a boolean value to a string value. It consumes a LIR node of type LBooleanToString, which is the LIR node type for that operation.

void CodeGenerator::visitBooleanToString(LBooleanToString* lir) {
    Register input = ToRegister(lir->input());
    Register output = ToRegister(lir->output());
    const JSAtomState& names = gen->runtime->names();
    Label true_, done;

    masm.branchTest32(Assembler::NonZero, input, input, &true_);
    masm.movePtr(ImmGCPtr(names.false_), output);
    masm.jump(&done);

    masm.bind(&true_);
    masm.movePtr(ImmGCPtr(names.true_), output);

    masm.bind(&done);
}

Let's go through this bit by bit:

  • Register input = ToRegister(lir->input());: At the top, we have two Register declarations. These correspond to machine registers (r11 or eax, etc., depending on the architecture). In this case, we are looking at the IonMonkey code generator, so the choice of which registers to use was made by the IonMonkey register allocator, and we simply take its decision: that's what the ToRegister(...) calls are doing.
  • const JSAtomState& names = gen->runtime->names();: This isn't really related to the MacroAssembler, but suffice it to say JSAtomState holds a variety of pre-determined names, and we're interested in the pointers to the true and false names right now.
  • Label true_, done;: Next we have the declaration of two labels. These correspond to the labels you would put in if you were writing assembly by hand. A label when created isn't actually associated with a particular point in the code. That happens when you masm.bind(&label). You can however branch to or jump to a label, even when it has yet to be bound.
  • masm.branchTest32(Assembler::NonZero, input, input, &true_);: This corresponds to a test-and-branch sequence. In assembly, test usually means taking two arguments and bitwise-ANDing them together in order to set processor flags. Effectively this says: branch to true_ if input & input != 0.
  • masm.movePtr(ImmGCPtr(names.false_), output);: This moves a pointer value into a register. ImmGCPtr is a decoration that indicates a couple of things. First, we're moving the pointer as an Immediate: that is, a constant that will be baked directly into the code. The GCPtr portion tells the system that this is a GCPtr, a pointer managed by the garbage collector. We need to tell the MacroAssembler about this so it can remember the pointer and record it in a table for the garbage collector; that way, if a moving GC changes the address of this value, the garbage collector can update the pointer embedded in the code.
  • masm.jump(&done);: Unconditionally jump to the done label.
  • masm.bind(&true_);: Bind the true label. When something jumps to the true label, we want them to land here in the code stream.
  • masm.movePtr(ImmGCPtr(names.true_), output);: This moves a different pointer into the output register.
  • masm.bind(&done);: Bind the done label.

The way to think of the MacroAssembler is that it's actually outputting code for most of these operations. (Labels turn out to be a bit magical, but it's OK not to think about that normally.)

So, what does this look like in actually emitted code? I added a masm.breakpoint() just before the branch, and ran the Ion tests (../jit-test/jit_test.py --jitflags=all ./dist/bin/js ion). This found me one test case that actually exercised this code path: ../jit-test/jit_test.py --debugger=lldb --jitflags=all ./dist/bin/js ion/bug964229-2.js. I then disassembled the code with the LLDB command dis -s $rip -e $rip+40.

I've annotated this with the rough MacroAssembler calls that generated the code. The addresses the jumps hit are those the labels got bound to. We can infer the choice of registers made by Ion: Register input = edx and Register output = rax.

While our compilers use MacroAssembler today, we're also looking to a future where maybe we use Cranelift for code generation. This is already being tested for WASM. If using and improving Cranelift is of interest to you, there's a job opening today!

The next blog post about MacroAssembler I'd like to write will cover Addresses and Memory, if I ever get around to it :D

Setting a Static IP for an Ubuntu 18.10 VM in VMWare Fusion 10

I recently set up a new laptop and a new Ubuntu VM. I have a static IP set up for my VMs so I can ssh into them with an SSH host alias.

For some reason, how this works has changed since last I set this up, and this was a huge pain. Here’s what worked:


I explicitly added a new custom network adapter. VMWare Fusion assigned it the subnet IP of 192.168.77.0.

You can verify this by checking /Library/Preferences/VMware\ Fusion/networking:

answer VNET_2_HOSTONLY_NETMASK 255.255.255.0
answer VNET_2_HOSTONLY_SUBNET 192.168.77.0
answer VNET_2_NAT yes
answer VNET_2_NAT_PARAM_UDP_TIMEOUT 30
answer VNET_2_VIRTUAL_ADAPTER yes

In the same directory, vmnet2/nat.conf has more helpful info:

[host]

# NAT gateway address
ip = 192.168.77.2
netmask = 255.255.255.0

Alright. So now we can set up the VM with an IP. 192.168.77.10 was chosen arbitrarily.

In theory you can do this with nmcli, but I couldn't grok it.
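What did work was editing the netplan configuration directly. Here's a sketch of the YAML; the exact file name under /etc/netplan/ and the adapter name ens38 are assumptions from my setup, so check yours first:

network:
  version: 2
  ethernets:
    ens38:  # my adapter's name; check yours with 'ip link'
      dhcp4: false
      addresses: [192.168.77.10/24]
      gateway4: 192.168.77.2

Then apply it:

$ sudo netplan apply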

And boom, this works.

Things I tried that I couldn't get working:

  • Setting a static IP on vmnet1 or vmnet8

  • Setting the static IP via /etc/network/interfaces

Moving Mercurial Changesets to a New Clone

I’m setting up a new laptop, and I didn’t want to copy a whole mercurial clone from one laptop to another. All I wanted to do was move my not-yet-public changesets from my local repo to a new clone.

Turns out, with revsets and bundle, this isn’t too hard!

(OldRepo) $ hg bundle --base 423bdf7a802b -r 'author(mgaudet) and not public()' MattsChanges.bundle

Now, because the base revision is pretty old (the oldest non-public change I wanted to bring along has a pretty old base), the bundle is big relative to the number of changesets it's bringing across.

However, it applies nicely!

(NewRepo) $ hg unbundle ../MattsChanges.bundle 
adding changesets
adding manifests
adding file changes
added 20 changesets with 49 changes to 83824 files (+16 heads)

I got confused briefly when I tried to use hg import; it complained abort: ../MattsChanges.bundle: no diffs found, which isn't a great error message, but I figured it out.

The Unofficial Incomplete Spidermonkey Bibliography

I’ve started a little side-project: The Unofficial Incomplete Spidermonkey Bibliography. I’ve been interested in doing this since at least March of this year, but I finally have done it.

This project was definitely inspired by Chris Seaton's The Ruby Bibliography; however, I don't want to focus exclusively on academic publications. There's lots of excellent writing about SpiderMonkey out there in blogs, bug reports, and more. My hope is that this becomes a home that helps gather all this knowledge.

On a personal note, I’m particularly interested in older blog posts, especially those that exist only in archive.org links in people’s personal notebooks here and there.

Please, give me a hand: open issues for things you'd like references to, or make pull requests to help fill in the enormous gaps I am certain exist in the bibliography as it stands now.

Using rr-dataflow: Why and How?

If you haven't heard of rr, go check it out right away. If you have, let me tell you about rr-dataflow and why you may care about it!

Perhaps you, like me, occasionally need to track something back to its origin. I'll be looking at the instructions being executed in an inline cache and think, "Well, that's wrong... where did this instruction get generated?"

Now, because you can set a watchpoint and reverse-continue, you can see where a value was last written; it's easy enough to do:

(rr) watch *(int*)($rip)
(rr) reverse-continue

The problem is that, at least in SpiderMonkey, that's rarely sufficient; the first time you stop, you'll likely be seeing the copy from a staging buffer into the final executable page. So you set a watchpoint and reverse-continue again. Oops, now you're in the copying of the buffer during a resize; this can happen a few times before you arrive at the point you're actually interested in.

Enter rr-dataflow. As it says on the homepage: "rr-dataflow adds an origin command to gdb that you can use to track where data came from."

rr-dataflow is built on the Capstone library for disassembly. This allows rr-dataflow to determine for a given instruction where the data is flowing to and from.

So, in the case of the example described before, the process starts almost the same:

(rr) source ~/rr-dataflow/flow.py
(rr) watch *(int*)($rip)
(rr) reverse-continue

However, this time, when we realize the watchpoint stops at an intermediate step, we can simply go:

(rr) origin

rr-dataflow then analyzes the instruction that tripped the watchpoint, sets an appropriate watchpoint, and reverse-continues for you. The process of getting back to where you need to be becomes:

(rr) origin
(rr) origin

Tada! That is why you might be interested in rr-dataflow. The homepage also has a more detailed worked example.

A caveat: I've found it to be a little unreliable with 32-bit binaries, as it wasn't developed with them in mind. One day I would love to dig a little more into how it works, and potentially help it be better there. But in the meantime, thanks so much to Jeff Muizelaar for creating such an awesome tool.

Fixing long delays when a program crashes while running in the VSCode terminal

Symptom: You're writing broken code (aren't we all?) and your program is crashing. When it crashes running under the regular OS X terminal, you don't see any problems; the program crashes and it's fine.

However, when you do the same thing under VSCode’s integrated terminal, you see a huge delay.

Solution:

launchctl unload -w /System/Library/LaunchAgents/com.apple.ReportCrash.plist

For some reason, crashes in the regular terminal don't seem to upset ReportCrash, but when they happen in VSCode, ReportCrash takes a tonne of CPU and hangs around for 10-30s. My totally uninformed guess is that ReportCrash thinks the crash is related to VSCode and is sampling everything about the whole VSCode instance. The only evidence I have for this is that the crash delays don't seem to happen right after restarting VSCode.
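If you want crash reporting back later, the inverse command should restore it (same plist path):

launchctl load -w /System/Library/LaunchAgents/com.apple.ReportCrash.plist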

Cleaning a Mercurial Tree

(Should have mentioned this in the last post. Oops!)

You've littered your tree with debugging commits. Or you've landed a patch without pushing yourself, so it exists upstream already and you don't need your local copy. Slowly but surely hg wip becomes less useful.

You need hg prune.

Works just like your garden's pruning shears. Except it's powered by Evolve, so really it just marks changesets as obsolete, hiding them from view.
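Usage is about as simple as you'd hope; the hash here is made up for illustration, and you'll need the Evolve extension enabled:

$ hg prune -r 3f2e1a9c8b7d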

My Mercurial Workflow

When I joined Mozilla I decided to leap head first into Mercurial, as most of Mozilla's code is stored in Mercurial. Coming from a git background, the first little while was a bit rough, but I'm increasingly finding I prefer Mercurial's approach to things.

I really do find the staging area too complex, and branches heavier-weight than necessary (see my earlier post Grinding my Gears with Git), and increasingly I appreciate the plugin architecture that allows you to build a Mercurial that works for you.

I have to say, though, that where I do envy git is the docs: they've got some great-looking docs these days, and Mercurial is pretty far behind there, as often the best docs are those on the wiki, and the wiki doesn't always feel very well maintained.

With that in mind, let's talk about how I work with Mercurial. This post is heavily inspired by (and my workflow definitely inspired by) Steve Fink's post from last year.

Setup

I mostly used the initial setup of .hgrc provided by Mozilla's bootstrap.py. I have made a couple tweaks though:

Diff settings

[diff]
git = true
showfunc = true
unified = 8

The above stanza helps get diffs in a friendly format for your reviewers. Last time I checked, bootstrap.py didn't set unified = 8.

[color]
diff.trailingwhitespace = bold red_background

The SpiderMonkey team is a stickler for some things, including trailing whitespace. This colourization rule helps it stand out when you inspect a change with hg diff.

Aliases

  • I'm a huge fan of the wip extension that bootstrap.py sets up. It lists recent heads in graph format with a terse log, along with colourized output.
Green hashes are draft revisions; blue are public. Red text highlights the current working directory parent, and yellow text marks bookmark names.


  • Mercurial's default hg log has a different philosophy than git log. Where git log shows you a relative view of history from your current working directory or specified revision, Mercurial's log command by default shows a global view of history in your repository. In a small project, I can imagine that making sense, but to be honest, 95% of the time I find hg log does the wrong thing for what I want. So:

    [alias]
    flog = log -f --template=wip

    Adds hg flog as an alias for 'following log', which is closer in behaviour to git log. The --template=wip bit uses the colourization and line formatting already provided for the wip extension.

    Honestly though, I use hg wip about 10x more often than I use hg {f}log.

Phases

One of the cool things about Mercurial is its well-developed reasoning about history rewriting. One key element of that is the notion of 'phases', which help define when a rewrite of a changeset can happen. There's a darn good chance this will be a sensible default for you in your .hgrc:

[phases]
publish = false

Getting to work

I use a clone of mozilla-unified as the starting point. When I start working on something new, unless I have a good reason not to, I'll typically start work off of the most recent central tag in the repo.

$ hg pull && hg up central

Labelling (Bookmarks)

When working in Mercurial, one of the things you get to decide is whether or not you label your commits. This article goes into more detail, but suffice it to say, there's no requirement, as there is in git, to label your lightweight branches (using Bookmarks).

I have experimented both with labelling and without, and I have to say, so long as I have hg wip, I think it's pretty reasonable to get by without bookmarks, especially as my commit messages typically end up having the bug numbers in them, so labelling the branch feels like redundant info. Maybe if you work on a project where commit messages aren't associated with a bug or other identifier, labelling might be more useful.

When developing, I tend to use commits as checkpoints. Then later, what I will do is use history rewriting tools to create the commits that tell the story I want to tell. In Mercurial, this means you'll want to enable the Evolve and Histedit extensions (Facebook's chistedit.py is also nice, but not necessary). You'll also want rebase (unlike in git, rebase and histedit are two distinct operations).
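For reference, the relevant stanza of my .hgrc looks more or less like this (Evolve is installed separately; histedit and rebase ship with Mercurial):

[extensions]
evolve =
histedit =
rebase =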

A tip with histedit: when I first started with Mercurial, I found myself more uncomfortable with histedit than I was with git. This was because I was used to doing 'experimental' history editing, always knowing I could get back to the repo state I started from just by moving my branch pointer back to the commit I left behind.

Mercurial, with the evolve extension enabled, has a more complicated story for how history editing works. Over time, you'll learn about it, but in the mean time, if you want to be able to keep your old history: hg histedit --keep will preserve the old history and create the new history under a new head.
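In command form, that's something like the below, where the revision is the base of the stack you want to edit:

$ hg histedit --keep <base-revision>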

Evolve knows some pretty cool tricks, but I think I'll save those for later, once I can explain the magic a little more clearly.

More Extensions

absorb

Absorb is the coolest Mercurial extension I know of. What it does is automate applying edits to a stack of patches: exactly the kind of edits that show up in a code-review based workflow. If you create stacks of commits and get them reviewed as a stack, it's worth looking into.

The best explanation I know of is this little whitepaper written by Jun Wu, the original author.
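The day-to-day use is pleasantly boring, assuming absorb is enabled in your .hgrc: edit the files in your working directory to address the review comments on your stack, then run

$ hg absorb

and it figures out which commit in the stack each hunk belongs to, prompting you before it rewrites anything.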

share

One extension I adore is the share extension, which ships with Mercurial. It's very similar in spirit to git-worktree. It allows me to have multiple working copies with common repo storage. Even better, it works great to have a working copy inside my VM that's backed by my current repo.
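A sketch of what that looks like (the paths are just illustrative):

$ hg share ~/mozilla-unified ~/src/mozilla-unified-worktree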


So that was a long, rambling blog post: mostly I just wanted to share the pieces of Mercurial that make me really happy I stuck with it, and to encourage others to give it a shot again. While I doubt Mercurial will ever supplant Git, as Git has mindshare for days, at the very least I think Mercurial is worth exploring as a different point in the DVCS design space.

Please, if you've got tips and tricks you'd like to share, or cool extensions, feel free to reach out or leave a comment.

An Inline Cache isn’t Just a Cache

If you read the Wikipedia page for Inline Caching, you might think that inline caches are caches, in the same way that you might talk about a processor cache, or a page cache for a webserver like memcached. The funny thing is, that really undersells how they're used in SpiderMonkey (and I believe other JS engines), and in hindsight I really wish I had known more about them years ago.

The Wikipedia page cites a paper, L. Peter Deutsch, Allan M. Schiffman, "Efficient implementation of the Smalltalk-80 system", POPL '84, which I found on the internet. In the paper, the authors describe the key aspect of their implementation as the ability to dynamically change the representation of program code and data,

"as needed for efficient use at any moment. An important special case of this idea is caching: One can think of information in a cache as a different representation of the same information (considering contents and accessing information together)"

Part of their system's solution is a JIT compiler. Another part is what they call inline caching. As described in the paper, an inline cache is self-modifying code for method dispatch. Call sites start out pointing to a method-lookup routine: the first time the method-lookup is invoked, the returned method (along with a guard on type, to ensure the call remains valid) overwrites the method-lookup call. The hypothesis here is that a particular call site will very often resolve to the same method, despite in principle being able to resolve to any method.

In my mind, the pattern of local self-modifying code, the locality hypothesis, as well as the notion of a guard are the fundamental aspects of inline caching.

The paper hints at something bigger, however. On page 300 (PDF page 4):

For a few special selectors like +, the translator generates inline code for the common case along with the standard send code. For example, + generates a class check to verify that both arguments are small integers, native code for the integer addition, and an overflow check on the result. If any of the checks fail, the send code is executed.

It's not clear if the authors considered this part of the inline caching strategy. However, it turns out that this paragraph describes the fundamental elements of the inline caching inside SpiderMonkey.

When SpiderMonkey encounters an operation that can be efficiently performed under certain assumptions, it emits an inline cache for that operation. This is an out-of-line jump to a linked list of 'stubs'. Each stub is a small piece of executable code, usually consisting of a series of sanity checking guards, followed by the desired operation. If a guard fails, the stub will jump either to another one generated for a different case (the stubs are arranged in a linked list) or to the fallback path, which will do the computation in the VM, then possibly attach a new stub for the heretofore not-observed case. When the inline cache is initialized, it will start pointed to the fallback case (this is the 'unlinked' state from the Smalltalk paper).

SpiderMonkey generates these inline caches for all kinds of operations: Property accesses, arithmetic operations, conversion-to-bool, method calls and more.

Let's make this a little more concrete with an example. Consider addition in JavaScript: function add(a,b) { return a + b; }

The language specifies an algorithm for figuring out what the correct result is based on the types of the arguments. Of course, we don't want to have to run the whole algorithm every time, so the first time the addition is encountered, we will attempt to attach an inline cache matching the input types (following the locality hypothesis) at this particular operation site.

So let's say you have an add of two integers: add(3,5). The first time through, the code will miss in the inline cache, because none has been generated yet. At this point, SM will attach an Int32+Int32 cache, which consists of generated code like the following pseudocode:

int32_lhs = unbox_int32(lhs, fail); // Jump to fail if lhs is not an int32
int32_rhs = unbox_int32(rhs, fail); // Likewise for the right-hand side
res = int32_lhs + int32_rhs;
if (res.overflowed()) goto fail;    // The result didn't fit in an int32
return res;

fail:
  goto next stub on chain

Any subsequent pair of integers being added (add(3247, 12), etc.) will hit in this cache and return the right thing (overflow aside). Of course, this cache won't work in the case of add("he", "llo"), so on a subsequent miss, we'll attach a String+String cache. As different types flow through the IC, we build up a chain (*) handling all the observed types, up to a limit: to save memory, we typically terminate the chains once they get too long to provide any value. The chaining here is the 'self-modifying' code of inline caching, though in SpiderMonkey it's not actually the executable code that is modified, just the control flow through the stubs.

There have been a number of different designs in SpiderMonkey for inline caching, and I've been working on converting some of the older designs to our current state of the art, CacheIR, which abstracts the details of the ICs to support sharing them between the different compilers within the SpiderMonkey engine. Jan's blog post introducing CacheIR has a more detailed look at what CacheIR looks like.

So, in conclusion, inline caches are more than just caches (outside the slightly odd Deutsch-Schiffman definition). The part of me that likes to understand where ideas come from would be interested in knowing how inline caching evolved from its humble beginnings in 1983 to the more general system I describe above in SpiderMonkey. I'm not very well read about inline caching, but I can lay down an upper limit: in 2004, the paper "LIL: An Architecture Neutral Language for Virtual-Machine Stubs" describes inline cache stubs of similar generality and complexity, suggesting the answer lies somewhere between 1983 and 2004.

(*) For the purposes of this blog post, we consider the IC chains to be a simple linked list; the reality is more complicated, in order to handle the type inference system.


Thank you so much to Ted Campbell and Jan de Mooij for taking a look at a draft of this post.


Addendum: Tom Schuster points out, and he's totally right, that the above isn't entirely clear: this isn't the most optimized case. IonMonkey will typically only fall back to inline caches where it can't do something better.

Erasure from GitHub

Boy, GitHub doesn't handle you abandoning an email address particularly well on its Contributors page.

If you compare the Contributors page for Eclipse OMR with the output of git shortlog, you'll notice a divergence:

$ git shortlog -sn --no-merges | head -n 4
   176    Matthew Gaudet
   168    Leonardo Banderali
   137    Robert Young
   128    Nicholas Coughlin

Turns out, the issue here is that I gave up my IBM email on GitHub when I left. So now GitHub can't link my commits in OMR to me, and I no longer show up as a contributor.

I'm not actually upset about this, but I do wonder about the people who (wrongly) say "GitHub is your resume", and the ways this disadvantages people.

Grinding my Gears with Git

Since I started working with Mozilla, I have been doing a lot of work with Mercurial, and development in the Bugzilla way. So much so that I've not really used git much in the last four months.

Coming back to git to collaborate on a paper I'm writing with some researchers, I find some things really bothersome that I had sort of taken for granted after years of becoming one with the Git way:

  • Branches / Pull Requests are too heavyweight! This might be a side effect of writing a paper, but what I find myself desperately wanting to do is produce dozens of independent diffs that I can throw up for other authors to look at. Especially speculative diffs that change a section of the paper in a direction I'm not sure we want to go. This isn't so much a criticism of git as it is of the GitHub style of collaboration.
  • The staging area is way too complicated for a simple workflow of trying to make quick changes to a small project. When doing full fledged production software engineering, I have found it useful, but working on a paper like this? It's just extra friction that doesn't produce any value.

I have another rant, for another day, about how LaTeX and version control make for a pretty bad workflow, due to the mismatch in diff semantics (or, alternatively, the hoops one needs to jump through to get a good version-controllable LaTeX document).

Open tabs are cognitive spaces

I love this blog post by Michail Rybakov, Open tabs are cognitive spaces. It provides insight into a way of working (the hundreds-of-tabs model) that I've always found baffling and confusing. I wonder whether, if some of the tooling he talks about were more common, I wouldn't aim more towards a thousand-tab style.

Also, I learned something super cool:

The Sitterwerk library in St.Gallen follows a serendipity principle. Every book has an RFID chip, so you can place books everywhere on the shelves - the electronic catalogue will update the location of the book accordingly. The visitors are encouraged to build their own collection, placing books that they used for their research together, thus enabling the next reader to find something relevant by chance, something they didn’t know they were looking for.
— Open tabs are cognitive spaces

I absolutely love that idea, as much as it scares the organized person in me.