Note Taking in my Work

My entire professional career I've been a note-taker at work. In the beginning, I used the Journal template of Lotus Notes.

Image borrowed from the University of Windsor IT department

I used the journal extensively throughout my internship with IBM. Each day got a new entry, and each day's entry was filled with goals, notes on what I was learning, and debugging notes about what I was working on. Everything went in the journal, even PDFs of papers I was reading, along with the notes about them. Using its full text search, the Notes journal was how I survived my internship, and the huge volume of information that I had to deal with there.

Strangely, the single-fulltext-DB practice didn't come back to school with me, and I returned to paper notes in classes; perhaps for the better. At work though, it's hard to write down a stack trace and find it again later, so when I went back to IBM as a full-time employee, I wanted another database of notes. For all the power of the Lotus Notes DB, it had some downsides that I didn't like, so rather than return to it, I went hunting for a solution.

I landed on Zim Wiki. It served me quite well through my term, though in the last few months I worked there I got a MacBook, and I discovered that while Zim is functional, it isn't excellent (I used the instructions here along with an Automator app to launch it).

I've tried to make sure every intern I worked with found their own solution to taking notes. Some took up Zim, but the last intern I worked with at IBM also had a Mac, and he introduced me to an interesting app called Quiver (Thanks Ray!), that I've been using for the last few months.

Quiver in action. In this view, the Markdown source is being previewed in the split pane.

Quiver's most distinctive feature is the notion of cells. Each document in a Quiver notebook consists of one or more 'cells' in order. A cell has a type, corresponding to its content: Text, Code, Markdown, LaTeX or Diagram. The code cells are powered by the Ace editor component, and have syntax highlighting built in.

You can add cells, split cells and re-order them, using keystrokes to change between types.

Documents can be tagged for organization, and the full-text search seems pretty good so far.


So far, my experience with Quiver has been quite positive. Every day, one of the first things I do is create a new note in my "Work Logs" notebook, titled by the date, and in it I write down my day's goals as a checklist, as I understand them in the morning. In that note I keep casual debugging notes, hypotheses I need to explore, etc. I also have a note for many of the bugs I work on, where I collate more of the information in a larger chunk, if warranted.

One of the magical things (to me at least, coming from Zim) is that rich text paste works very well; this is great for capturing IRC-logs formatted by my IRC client, or syntax highlighted code from here or there. I can also capture images by pasting them in (though, this also worked OK in Zim).


There are some concerns I have with Quiver though, that make me give it at best a qualified recommendation.

  • As near as I can tell, it's developed by a single developer, and I don't think he makes enough on it to work on it full time. While the application has been solid as a rock for me, it's clear there are still lots of places where work could be done. For example, there's no touch-bar support (to be honest, I just want to use the emoji picker, though it would be neat to see the cell type accessible from the touch-bar).
  • There are also a few little bugs I've encountered, almost all related to rich text editing. For example, the checkboxes are a bit finicky to edit around (and don't behave like bullets as I expected).

Overall, I am really enjoying Quiver, and will definitely keep using it.

Trying out Visual Studio Code

In the interests of personal growth, I've been exploring some alternatives to my old-school vim+make setup. Especially for exploring a new codebase of appreciable size, I find myself wanting a little more.

While I'm absolutely certain I could get clang_complete working in the way I want, I figure... maybe time to think about other tools? It's 2018 after all, and my development flow has been pretty much stagnant since about 2015, when I upgraded from grep to the silver searcher, and before that, had been pretty much stuck since ~2012 when I started doing C++ the majority of the time.

I've been really interested in seeing what Microsoft has been doing with Visual Studio Code. I've been using it as my exclusive development environment now for a little over a month. There was a bit of a rocky setup trying to get IntelliSense working with SpiderMonkey, and I'm not entirely proud of the solution I came up with (hacking together the c_cpp_properties.json in such a way that changes to defines, if and when they happen, are going to cause me trouble; alas, there's no support for -include right now), but it works!

It's been a long time since I've used an IDE, and I have to say... I like it. Having tight local feedback on syntax errors is worth so much of the other pain VSCode has put me through, but also having access to IntelliSense is pretty amazing. The built-in terminal has become incredibly powerful to me, by allowing me to use the command line tools I want to (hg wip) without leaving the IDE, and the ability to cmd-click an error message's filename:line-number to jump to that in the editor is pretty amazing.

As a very long time vim user, I find myself a little surprised at how little I miss modal editing. I think the only motion I regularly miss is the jump-to-enclosing-(brace,parens). 

VSCode really has a lot going for it:

  • Almost all the settings are done via JSON files. While I normally hate hand-editing JSON, it's a refreshing change from most software's control panels, and allows great granularity, doubly so since VSCode is syntax checking its own settings files. 
  • Lots of passive information sources. The editor uses the gutters to great effect in providing information, such as highlighting lines that have changed in the patch you are working on. It has a minimap, similar to Sublime Text (though, I've never used Sublime), and inside the minimap, similar gutter information is used to highlight search matches, syntax errors, etc.
The green bar in the gutter is saying this is new, uncommitted code.

  • Fuzzy file open (cmd-P) is a built in feature. 
  • Find by symbol is pretty magical (cmd-T).
  • The code command line tool allows me to open files, even from the built in terminal. 

Now, I shouldn't say that it's been entirely without pain.

  • The mercurial plugin is quite basic, and doesn't serve my needs particularly well, leading to me using the command line in the built in terminal. This is mostly fine, though I've yet to hook up 'code' as my editor.
  • Occasionally IntelliSense just loses its mind. Seems to generally get better with a restart.

I've tried out the debugger integration, which was... OK; though, that could mostly reflect my comfort with a command line debugger.

I have yet to put the extensions through their paces though. So far, all I've installed are the C++ tools, the mercurial extension, and one for trimming whitespace at the end of edited lines.

Overall, a month in, I'm very impressed. We'll see if it sticks!

FOSDEM Community Track (February 2017)

*cough* This was a draft post that it turns out I totally forgot about. Looking it over though, it seems fairly complete, despite my never having posted it.


I was at FOSDEM speaking as part of the Ruby DevRoom this year. I had a great time, and you can watch my talk here.

However, the Ruby track was only on the first day. The second day, I spent some time at the "Community" track... despite the fact that I couldn't get into most talks because of room size issues!

During the Mentoring 101 talk, those who couldn't get into the room instead held a round table in the hallway outside the room, which I found fascinating. The topic of the round table started with "How do you mentor new people to your community?", but also stretched into how you encourage new people to become part of your community.

There was a good spread of projects participating in the talk, including community members from WordPress, LibreOffice and Apache Spark.

I took some notes from that discussion, that I'll share and expand below:

Sign posting:

Many people made the point that it's really important as a community that you demonstrate the variety of ways in which your community is willing to take contributions. https://make.wordpress.org/ was called out as a good example of this: it lists 16 different subteams on WordPress, each of which points out what kind of work they do and how you can get involved.

Other signposting pointed out:

  • Issue labels: "Beginners" is a good choice, though some communities go further and have a "first time contributor" tag. A comment made by a number of people was the importance of curating these beginner tags and ensuring that they are properly laid out. Similarly, it's really important that more experienced developers don't tackle these, to avoid them drying up. Stories were told of some projects that would actively reject pull requests for "first time contributor" bugs if someone had done work on the project before.
    • Some people pointed out a good tag that's not common enough was "second time contribution" -- these are the slightly larger tasks that really help hook people into a community.
  • Recognition: Some projects make a big deal of recognizing everyone who contributes. LibreOffice apparently sends out a paper certificate.
  • Non-code contribution: Super important to call out the value of them! Documentation, bug triage, reproduction got a huge number of nods.

Onboarding

  • Face to face is super important: Hangouts, Skype, etc. Important to build those personal relationships. If you're geographically close, coffee shops.
  • Open sprint day: A day where a large fraction of the community tries to show up simultaneously to work on a sprint together (virtual or real world!)
  • Have people document their own onboarding struggles. Easy contribution, but also super valuable.

Advertising to new contributors:

Sites exist to pull in new contributors.

There are university programs asking students to try to contribute to OSS: Having smooth paths to help them is great.

Culture

  • Be aware! The loudest culture wins.

Reading Testarossa Compiler Logs

The Testarossa compiler technology is the one included in both Eclipse OMR and OpenJ9. One of the interesting features of the OMR compiler technology is its pervasive use of semi-structured logging to explain the actions of the compiler as it compiles.

This is particularly important in the OpenJ9 compiler, which makes a huge number of decisions at runtime, based on data drawn from the running Java application or the state of the system (most compilation in OpenJ9 happens asynchronously on one of four compilation threads).

You can generate logs using the tracing options documented in the OMR Compiler Problem Determination Guide.

For Java, this typically means passing some -Xjit:trace* option, in addition to a log file specification.

If you were to download a build of OpenJ9, from AdoptOpenJDK let's say, you can test this out by generating logs for every method executed while running java -version like this:

$ java -Xjit:traceIlGen,log=logFile -version

You can modify this to see what was compiled by adding verbose to the Xjit options:

I've truncated this for space.

Of course, if you log everything, you'll likely produce huge logs that are a slog to deal with, so the Testarossa technology provides filtering mechanisms. For example, let's say we want to get a traceIlGen log of just the method java/lang/String.hashCode()I:

$ java  -Xjit:'{java/lang/String.hashCode*}(traceIlGen,log=logFile)'  -version

The additional quoting is there to deal with the shell wanting to handle many of the characters in that option string.

So, traceIlGen isn't a particularly interesting option, unless you're looking at how bytecode becomes intermediate representation -- at which point it becomes great. traceFull is a useful alias for a number of tracing flags (though, despite the name, not all of them).

$ java  -Xjit:'{java/lang/String.hashCode*}(traceFull,log=logFile)'  -version

Using the above command, I got a traceFull log for java/lang/String.hashCode()I, and put it up on GitHub as a Gist. The rest of this post will talk about that gist.

So, if you look at it, the logs are XML... ish. There are pieces that try to form XML, but other pieces that are unaware, and write to the log as a plain text file.

I personally have waffled from time to time as to whether or not the artifice is worthwhile, or problematic. I lean towards worthwhile now, but have not always.

The basic pattern for most of a traceFull log is as follows:

  • A section on IlGen, the mapping of bytecode to trees (the Testarossa IL).
  • A dump of the trees, any new elements of the symbol reference table, and the CFG.
  • The optimization results for an opt
  • Another dump of the trees.

The last two points repeat until the optimization strategy is done executing.

Optimizations will number the transformations they make to allow selective disablement.

<optimization id=9 name=coldBlockOutlining method=java/lang/String.hashCode()I>
Performing 9: coldBlockOutlining
[     2] O^O COLD BLOCK OUTLINING: outlined cold block sequence (9-10)
[     3] O^O COLD BLOCK OUTLINING: outlined cold block sequence (5-5)

This comment does an excellent job of explaining it, though, the idea has also been called "optimization fuel" before.

As far as reading the trees, I'll defer to the documentation about the intermediate representation, contained in this directory, and in particular this document, Intro to Trees.

There's a lot more in these logs, but I'm a bit tired, so I'll leave this here. The logs are not dense, but can be invaluable in understanding the decisions the compiler has made over a compilation and in identifying bugs.

Some notes on CMake variables and scopes

I've been doing a lot of work on CMake for Eclipse OMR for the last little while.

CMake is a really ambitious project that accomplishes so much with such simplicity it's like magic... so long as you stay on the well trodden road. Once you start wandering into the woods, because your project has peculiar needs or requirements, things can get hairy pretty quickly.

There's a pretty steep ramp from "This is amazing, a trivial CMakeLists.txt builds my project" to "How do I do this slightly odd thing?"

We'll see how much I end up talking about CMake, but I'll start with a quick discussion of variables and scopes in CMake.

Variables and scopes in CMake

First, a quick note of caution: Variables exist in an entirely separate universe from properties, and so what I say about variables may well not apply to properties, which I am much less well versed in.

Variables are set in CMake using set:

set(SOME_VARIABLE <value>)

The key to understanding variables in CMake in my mind is to understand where these variables get set.

Variables are set in a particular scope. I am aware of two places where new scopes are created:

  1. When add_subdirectory is used to add a new directory to the CMake source tree and
  2. When invoking a function

Each scope when created maintains a link to its parent scope, and so you can think of all the scopes in a project as a tree.

Here's the trick to understanding scopes in CMake: Unlike other languages, where name lookup would walk up the tree of scopes, each new scope is a copy by value of the parent scope at that point. This means add_subdirectory and function inherit the scope from the point where they're called, but modification will not be reflected in the parent scope.

This can actually be put to use to simplify your CMakeLists.txt. A surprising amount of CMake configuration is still done only through what seem to be 'global' variables -- despite the existence of more modern forms. For example, despite the existence of target_compile_options, if you need to add compiler options only to a C++ compile, you'll still have to use CMAKE_CXX_FLAGS.

If you don't realize, as I didn't, that scopes are copied-by-value, you may freak out at contaminating the build flags of other parts of a project. The trick is realizing that the scope copying limits the impact of the changes you make to these variables.

Parent Scope

Given that a scope has a reference to the scope it was copied from, it maybe isn't surprising that there's a way in CMake to affect the parent scope:

set(FOO <foo value> PARENT_SCOPE)

Sets FOO in the parent scope... but not the current scope! So if you're going to want to read FOO back again, and see the updated value, you'll want to write to FOO without PARENT_SCOPE as well.

Cache Variables

Cache variables are special ones that persist across runs of CMake. They get written to a special file called CMakeCache.txt.

Cache variables are a little bit different. They're typed (as they interact with CMake's configuration GUI system), and they tend to override normal variables (which makes a bit of sense). Mostly though, on the subject, I'll defer to the documentation!

Scope Tidbits:

There's a couple other random notes related to scoping I'd like to share.

  1. It appears that not all scopes are created equal. In particular, it appears that targets will always use target-affecting variables from the containing directory scope, not function scopes.

    function(add_library_with_option)
        set(CMAKE_CXX_FLAGS "-added-option")
        add_library(foo T.cpp)
    endfunction(add_library_with_option)

    It's been my experience that the above doesn't work as expected, because the add_library call doesn't seem to see the modification of the CXX flags.

  2. Pro Tip: If anything has gone wrong in your CMakeLists.txt, try looking in the cache! It's just a text file, but can be crazy helpful to figure out what's going on.

Paper I Love: "The Challenges of Staying Together While Moving Fast"

(Prefix inspired by Papers We Love)

I recently had the opportunity to meet Julia Rubin when she was visiting IBM. While we met for only a few minutes, we had a great (albeit short) conversation, and I started looking into her publications. With only one read (out of a small pile!), I've already found a paper I want to share with everyone:

"The Challenges of Staying Together While Moving Fast: An Exploratory Study" - Julia Rubin and Martin Rinard (PDF)

This paper speaks to me: It really validates many of my workplace anxieties, and assures me that these feelings are quite universal across organizations. The industry really hasn't nailed building large software products, and there's a lot of work that could be done to make things better. 

The paper includes a section titled "Future Opportunities". I hope academia listens, as there are great projects in there with potential for impact on developers' lives.

Lambda Surprise

Another day of being surprised by C++

typedef int (*functiontype)();
int x = 10;

functiontype a,b;
a = []() -> int { return 10; };   // OK
b = [&x]() -> int {return x; }; // Type error

My intuition had said the latter should work; after all, the only thing that changed was the addition of a capture expression.

However, this changes the type of the lambda so that it's no longer coercible to a regular function pointer (and in fact, others I've talked to suggest that the surprising thing is that the assignment to a even works).

sigh.

There is a work around:

#include <functional> 
typedef std::function<int()> functiontype;

It's funny: Before working with C++ lambdas, I had been thinking they would provide huge amounts of power when working with (and possibly providing!) C/C++ API interfaces. Alas, they are special beasts.

Debugging a Clang plugin

Writing this down here so that maybe next time I google it, I will find my own page :)

The magical incantation desired is

$ gdb --args clang++ .... 
Reading symbols from clang++...(no debugging symbols found)...done.
(gdb) set follow-fork-mode child

Doing this means that gdb will not get stuck just debugging the clang driver!

Going to be speaking at the Ruby devroom 2016!

I will be speaking this year at the FOSDEM Ruby devroom about the challenges the Ruby+OMR JIT compiler faces, and how they can be surmounted with your help! The abstract is below, or on the FOSDEM website. 

Highly Surmountable Challenges in Ruby+OMR JIT Compilation

The Ruby+OMR JIT compiler is one way to add JIT compilation to the CRuby interpreter. However, it has a number of challenges to surmount before it will provide broad improvement to Ruby applications that aren’t micro-benchmarks. This talk will cover some of those challenges, along with some brainstorming about potential ways to tackle them.

The challenges range from small to large. You can get a sneak peek by looking through the issue tracker for the Ruby+OMR preview.  

Boobytrapped Classes

Today I learned: You can construct Boobytrapped class hierarchies in C++.

Here's an example (Godbolt link)

#include <iostream> 
struct Exploder { 
 // Causes explosions! 
}; 

struct Unexploder { 
  void roulette() {} 
};

template<class T>
struct BoobyTrap : public T { 
  /* May or may not explode. 
  */
  void unsafe_call () { exploder(); }
  void safe_call() {} 

  private: 

  void exploder() { T::roulette(); } 
}; 

int main(int argc, char** argv) { 
    BoobyTrap<Unexploder> s; 
    s.safe_call();
    s.unsafe_call(); // Click! We survived! 

    BoobyTrap<Exploder> unsafe;
    unsafe.safe_call(); 

    // Uncomment to have an explosion occur. 
    // Imagine this with conditional compilation?
    // unsafe.unsafe_call(); 
    return 0;
}

The wacky thing here is that you can totally use the safe_call member function of the BoobyTrapped class independent of parent class -- because unsafe_call is only expanded and substituted if you call it!

This feels awkward, because it divides the interface of BoobyTrap into callable and uncallable pieces. I can't decide if I think this is a good idea or a bad idea.

Pro:

  • You can knit together classes, and so long as the interfaces match well enough for the calls you actually make, you're OK.

Con:

  • Feels like Fragile Base class ++

Thanks to Leonardo for pointing this out!

Commits vs. Pull Requests

When working on an open source project, you face questions of how to make your work consumable. I've been watching the process in my work on Eclipse OMR, and I've come up with my personal mental model: 

  • Commits delimit atomic changes to the source code. Each commit should build, and contains one state transition for the code. Each commit message provides an opportunity to explain your reasoning for making the change. 
  • Pull Requests group together a set of related commits: for example, a feature, or a group of cleanup commits. These commits can be tested and discussed as a whole, and therefore merged as a whole.

Of course, this model is heavily driven by our Pull Request workflow at work.  

Getting Ready

Now that I'm working much more in the open on Eclipse OMR, I am getting ready to start blogging more about tech, and things from my day job.

I suspect lots of small posts, tips and tricks, etc. We'll see how it goes. This is a separate section, to keep things a bit isolated from my other blog.