Reading Testarossa Compiler Logs

The Testarossa compiler technology is the one included in both Eclipse OMR and OpenJ9. One of the interesting features of the OMR compiler technology is its pervasive use of semi-structured logging to explain the actions of the compiler as it compiles.

This is particularly important in the OpenJ9 compiler which makes huge amounts of decisions at runtime, based on data drawn from the running Java application or the state of the system (most compilation in OpenJ9 happens asynchronously on one of four compilation threads).

You can generate logs using the tracing options documented in the OMR Compiler Problem Determination Guide

For java, this typically means passing some -Xjit:trace* option, in addition to a log file specification.

If you were to download a build of OpenJ9, from AdoptOpenJDK let's say, you can test this out by generating logs for every method executed while running java -version like this:

$ java -Xjit:traceIlGen,log=logFile -version

You can modify this to see what was compiled by adding verbose to the Xjit options:

I've truncated this for space.

Of course, if you log everything, you'll likely produce huge logs that are a slog to deal with, so the Testarossa technology provides filtering mechanisms. For example, let's say we want to get a traceFull log of just the method java/lang/String.hashCode()I:

$ java  -Xjit:'{java/lang/String.hashCode*}(traceIlGen,log=logFile)'  -version

The additional quoting is there to deal with the shell wanting to handle many of the characters in that option string.

So, traceILGen isn't a particularly interesting option, unless you're looking at how bytecode becomes intermediate representation -- at which point it becomes great. traceFull is a useful alias for a number of tracing flags (though, despite the name, not all of them).

java  -Xjit:'{java/lang/String.hashCode*}(traceFull,log=logFile)'  -version

Using the above command, I got a traceFull log for java/lang/String.hashCode()I, and put it up on GitHub as a Gist. The rest of this post will talk about that gist.

So, if you look at it, the logs are XML... ish. There are pieces that try to form XML, but other pieces that are unaware, and write to the log as a plain text file.

I personally have waffled from time to time as to whether or not the artifice is worthwhile, or problematic. I lean towards worthwhile now, but have not always.

The basic pattern for most of a tracefull log is as follows:

  • A section on IlGen, the mapping of bytecode to trees (the Testarossa IL.
  • A dump of the trees, any new elements of the symbol reference table, and the CFG,
  • The optimization results for an opt
  • Another dump of the trees.

The last two points repeat until the optimization strategy is done executing.

Optimizations will number the transformations they make to allow selective disablement.

<optimization id=9 name=coldBlockOutlining method=java/lang/String.hashCode()I>
Performing 9: coldBlockOutlining
[     2] O^O COLD BLOCK OUTLINING: outlined cold block sequence (9-10)
[     3] O^O COLD BLOCK OUTLINING: outlined cold block sequence (5-5)

This comment does an excellent job of explaining it, though, the idea has also been called "optimization fuel" before.

As far as reading the trees, I'll defer to the documentation about the intermediate representation, contained in this directory, and in particular this document, Intro to Trees.

There's a lot more in these logs, but I'm a bit tired, so I'll leave this here. The logs are not dense, but can be invaluable in understanding the decision the compiler has made over a compilation and identifying bugs.