What Caught My Eye - Week 5

Arts

  • A fantastically interesting hypothesis on the development of consciousness: The Origin of Consciousness in the Breakdown of the Bicameral Mind. A taste:

    Julian Jaynes proposes a radical answer to these questions: until a few thousand years ago human beings did not ‘view themselves’. They did not have the ability: they had no introspection and no concept of ‘self’ that they could reflect upon. In other words: they had no subjective consciousness. Jaynes calls their mental world the bicameral mind. It is a mind with two chambers, the mind that is divided in a god part and a human part. The human part heard voices and experienced these as coming from gods. These gods were no judging, moral or transcendent gods, but were more like each person's personal problem solvers. They were hallucinated voices that provided the answers when a person entered a stressful situation which couldn't be solved by routine.

  • On a different note: Stormtroopers twerking. Could it get any better?

  • I've also been enjoying Big Boi's Vicious Lies and Dangerous Rumors. Goes a little sideways in places, but by and large, a pretty good album.

Tech

  • Eugene Wallingford on one reason we need computer programs: to bridge the gap between theory and data. Programming a solution forces us to codify how we handle edge cases, missing values, etc.

A quote

From a letter my grandfather wrote to me, after I questioned my reach a little bit:

Your reach was, and is greater than your grasp. That must always be your motto. If you reach for the stars, you will see and touch the moon.

It's a lovely sentiment, one you see everywhere, but one I love all the same, and even more so since it comes from my grandfather.

I also made this:

[Image: Reach.jpeg]

What Caught My Eye - Week 4

Bit of a single topic this week. Still, some interesting things.

Technology

  • Caught Like Insects

    [...] for many, it was liberating to find that, on the web, you could explore your true nature and find fellow travelers without shame.

    But as paranoia grows about the NSA reading our emails and Google tapping into our home thermostats, it’s increasingly clear that — rather than providing an identity-free playground — the web can just as easily capture and preserve aspects of our identities we would have preferred to keep hidden. What started as a metaphor to describe the complexly interconnected network has come to suggest a spider’s sticky trap.

  • Code is not Literature: An interesting discussion of the notion of 'reading code', and how 'reading' is really the wrong way to look at it.

    It was sometime after that presentation that I finally realized the obvious: code is not literature. We don’t read code, we decode it. We examine it. A piece of code is not literature; it is a specimen.

    I've always been very curious about the idea of sitting down and reading code, since it's a common exhortation to those trying to improve their skills. However, my brief forays have always been fruitless, largely because I approached the challenge as reading, not examination.

    My most successful and rewarding instances of 'reading code' have been delving deep into systems I don't understand with a debugger and a notebook. Debugging problems in code I've never seen before is often a part of my day-to-day, and lately, part of my leisure time too!

  • On the Matter of Why Bitcoin Matters: An interesting and measured take on bitcoin from Glenn Fleishman.

    Bitcoin shows a path for massively more secure, reliable, and sensible ways to store value and move it around. As a currency, I have little faith that it will become a replacement for dollars, euros, or renminbi. As a model for a future payment and transaction system, I believe it’s already shown its value.

  • The CS Mindset: A discussion on why we teach CS, and what we hope students will learn from their CS courses.

    With this skill comes something else, something even more important: a discipline of thinking and a clarity of thought that are hard to attain when you learn "how to think more methodically and how to solve problems more effectively" in the abstract or while doing almost any other activity.

Fixing Tradebeans and Tradesoap on 64-bit JVMs

Edit: March 10, 2014: It seems this may not be the panacea I thought it was. Will update if I can figure out more about what's going on, but be aware for now: YMMV.


Imagine you're trying to run the tradebeans or tradesoap benchmarks from the DaCapo benchmark suite with a 64-bit JVM.

If you, like me, just downloaded dacapo-9.12-bach.jar, you'll likely see a huge stack trace like the one below:

java -jar dacapo-9.12-bach.jar tradesoap -t 1 
Using scaled threading model. 8 processors detected, 1 threads used to drive the workload, in a possible range of [1,128]
11:08:54,419 ERROR [GBeanInstanceState] Error while starting; GBean is now in the FAILED state: abstractName="org.apache.geronimo.framework/j2ee-system/2.1.4/car?ServiceModule=org.apache.geronimo.framework/j2ee-system/2.1.4/car,j2eeType=AttributeStore,name=AttributeManager"
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: -2
    at com.sun.xml.bind.v2.util.CollisionCheckStack.findDuplicate(CollisionCheckStack.java:112)
    at com.sun.xml.bind.v2.util.CollisionCheckStack.push(CollisionCheckStack.java:53)
    at com.sun.xml.bind.v2.runtime.XMLSerializer.pushObject(XMLSerializer.java:471)
    at com.sun.xml.bind.v2.runtime.XMLSerializer.childAsXsiType(XMLSerializer.java:574)
org.apache.geronimo.gbean.InvalidConfigurationException: Configuration org.apache.geronimo.framework/j2ee-system/2.1.4/car failed to start due to the following reasons:
  The service ServiceModule=org.apache.geronimo.framework/j2ee-system/2.1.4/car,j2eeType=AttributeStore,name=AttributeManager did not start because Array index out of range: -1
  The service ServiceModule=org.apache.geronimo.framework/j2ee-system/2.1.4/car,j2eeType=ConfigurationManager,name=ConfigurationManager did not start because org.apache.geronimo.framework/j2ee-system/2.1.4/car?ServiceModule=org.apache.geronimo.framework/j2ee-system/2.1.4/car,j2eeType=AttributeStore,name=AttributeManager did not start.

    at org.apache.geronimo.kernel.config.ConfigurationUtil.startConfigurationGBeans(ConfigurationUtil.java:485)

This bug report is the hint you need.

Essentially, the report points out that JAXB fails on 64-bit systems because it indexes an array using a hash code, which can be negative.
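
The failure mode is the classic negative-modulus trap in Java. Here's a minimal sketch of the pattern the report describes (illustrative only, not JAXB's actual source):

    // NegativeHashIndex.java -- illustrative sketch only, not JAXB's actual code.
    // Indexing a bucket array with a raw hash code blows up when hashCode()
    // happens to be negative, which is exactly the symptom in the trace above.
    public class NegativeHashIndex {
        public static void main(String[] args) {
            Object[] buckets = new Object[16];
            Object key = new Object() {
                @Override public int hashCode() { return -123456789; } // force a negative hash
            };

            int naiveIndex = key.hashCode() % buckets.length;               // -5: out of range
            int safeIndex = (key.hashCode() & 0x7fffffff) % buckets.length; // sign bit cleared

            System.out.println("naive index = " + naiveIndex + ", safe index = " + safeIndex);
            buckets[safeIndex] = key;   // fine
            buckets[naiveIndex] = key;  // ArrayIndexOutOfBoundsException
        }
    }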

Following that report, you can get Tradebeans and Tradesoap working by replacing the jaxb jars inside the dacapo jar.

Because the aforementioned bug report specifically points out that they resolved the problem with 2.1.13, I also used JAXB 2.1.13.

After you unzip the DaCapo jar, you'll also need to unzip dat/daytrader.zip. Inside of that, replace the following files (keeping the original names):

  • geronimo-jetty6-minimal-2.1.4/repository/com/sun/xml/bind/jaxb-impl/2.0.5/jaxb-impl-2.0.5.jar with the obtained jaxb-ri-20100511/lib/jaxb-impl.jar
  • ./repository/javax/xml/bind/jaxb-api/2.0/jaxb-api-2.0.jar with jaxb-ri-20100511/lib/jaxb-api.jar

Seems to work now!

Surviving Discouragement as a Graduate Student

This thread at the Academia Stack Exchange is marvellous, and full of sage advice, some of which I already have internalized, some of it not so much.

I'm just going to harvest the page for some quotes:

The fact is research is hard. It appears to consist primarily of staring at a problem for days and days and days without getting anywhere. Sometimes, rarely, I do figure something out and that feels wonderful, but the overwhelming majority of my time appears to be spent banging my head against a mostly figurative wall.

Yes. This. And it wouldn't be so damn tempting if those bricks didn't wiggle just a little bit every time I slammed my forehead into them. Sometimes I think my eyes must be playing tricks on me, what with the repeated cranial trauma and all. But then I remember how good it felt the last time my head actually went through the wall, and so I keep plugging away.

-- JeffE

The metaphor is apt, and the sentiment very true.

But it's the small sublime moments of joy when you realize that you've discovered something that no one else knows that make it fun. And the feeling, as time goes on, that you're immersed in a wonderful lake of , with beautiful new ideas around you as far as you can see.

p.s the advice you were given is very sound. Take breaks, find fulfilling things to do outside of work, and realize that everyone (even seasoned researchers) feel the same frustrations and highs that you do.

-- Suresh

A reminder that even successful people have struggled, and the reason why.

  • Failure is normal—and even to be expected. Just about nothing works exactly as you predicted it would. More importantly, if something doesn't go wrong, then your project has been badly designed, and in fact, I would argue that you're only doing development, not research!
  • Don't be afraid to fail! Failure teaches you lessons that you will never learn from success. I needed a few really abysmal grades in college to get me on the right track—the proverbial kick in the pants that allowed me to realize I couldn't coast through college the way I did through high school.

-- aeismail

This is one of the lessons I need to take more to heart.

Benchmarking is Hard - and Modern Hardware is Making it Harder

Let's say you have a program, and you'd like to find out if you can make it faster by changing the compiler (or a library, or a setting; I'm a compiler guy, so I'll talk about compilers from here on in).

This doesn't seem like too hard a problem, does it? It sure is easy to imagine a simple method:

  1. Run the program with the old, baseline version of the compiler.
  2. Run the program with the new, modified version of the compiler.
  3. Compare the times.

This makes sense, doesn't it? Time to publish!


Unfortunately for your nascent publication, you start thinking: What happens if I run the programs again?

If the year is 1977, and you're running the program with one thread, on a machine with no preemption, with an in-order processor that has no cache (say, an Apple II), you'll probably get the same time. Perhaps even down to the cycle.


So you run your programs again: lo and behold, the times have changed, both before and after.

Now what can you say about the effect of your change to the compiler? What happened!?

The problem

It's not 1977.

Let's talk about where variance can come from when you're benchmarking today. This list gets longer every day. Remember, this is all your computer trying to be helpful and improve overall system performance and user experience.

  • You're likely running your benchmark on a machine with a preemptively multitasking operating system, with virtual memory, file systems, daemons, etc.

    The process your benchmark runs in can be preempted in favour of another, or not. Context switches incur overhead, so how often your benchmark process is switched out will have an effect on the final runtime.

  • Your machine is chock-a-block with caches from top to bottom (L1, L2, L3, (L4), disk cache, page cache). Whether these caches are warm or cold for your workload will change the result.
  • Your benchmark likely runs with multiple threads, which means synchronization and thread scheduling come into play, with the order of interleavings affecting the benchmark's outcome.

At this point you're probably already despairing: "How is it possible to say anything about computer performance?"

Oh, but wait, the problem gets worse.

  • What if your benchmark is written for one of these new-fangled JIT-compiled languages? The longer your program runs, the faster it gets, as more pieces are compiled: first into native code, then into better native code.
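
    One common mitigation (a sketch, not a full methodology) is to run warm-up iterations before you start timing, and to report every measured iteration so you can see whether the times have actually settled. Here, workload() is a made-up stand-in for the code you care about:

        // WarmupSketch.java -- a minimal sketch of separating warm-up from measurement.
        public class WarmupSketch {
            static long workload() {
                long sum = 0;
                for (int i = 0; i < 1_000_000; i++) sum += i * 31L;
                return sum;
            }

            public static void main(String[] args) {
                // Warm-up iterations: give the JIT a chance to compile the hot code.
                for (int i = 0; i < 20; i++) workload();

                // Measured iterations: print each one rather than trusting a single number.
                for (int i = 0; i < 10; i++) {
                    long start = System.nanoTime();
                    long result = workload();
                    System.out.println("run " + i + ": " + (System.nanoTime() - start) + " ns (" + result + ")");
                }
            }
        }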

Now let's make the problem even worse!

  • What if your test machine is a relatively recent Intel machine? Then your performance will depend on how warm your machine is, because of a feature called Turbo Boost, which lets these processors overclock themselves so long as they remain within their electrical and thermal limits.

    This means that by simply putting your machine by a cold window (hello Canadian Winter!) you might get better results!

Modern Benchmarking

So this is obviously concerning. What do we do about this?

A big part of the answer comes from acting like scientists, and using the scientific method.

In this case, we need controlled experimentation, which means controlling for as many variables as we can. You want your experiment to isolate the effect you are trying to measure, and to be replicable.

  • Try to make the machine you use for testing as quiet as possible. Your goal is to eliminate as many sources of variance as you can from the runtime results you'll collect. Disable any GUI if you can, turn off any daemons that aren't required, kick everyone else off the machine, etc.
  • Disable any frequency scaling. As I've discovered, sometimes you have to look deep. For example, on Linux, the CPU frequency scaling system is separate from the Intel Turbo Boost control. Disable both! (Here's a pointer for earlier-than-3.9 series kernels)

Unfortunately, even on a quiet machine, you have relatively little control over things like caches and synchronization effects. This means that we need another tool: statistics.

To show anything, you're going to have to quantify run-to-run variance through multiple runs, and account for it. To show an improvement, you should use a statistical hypothesis test. If that's too hardcore, at least quantifying the variance will go a long way toward showing your result is not a fluke.
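
For concreteness, here's a minimal sketch of that quantification in Java (the timings are invented, and for samples this small you should really use a t-distribution critical value, or a proper test like Welch's t-test, rather than the rough 1.96 interval shown):

    // VarianceSketch.java -- summarize repeated timings with a mean, a sample
    // standard deviation, and a rough confidence interval. Not a substitute for
    // a real hypothesis test; the data below is made up for illustration.
    import java.util.Arrays;

    public class VarianceSketch {
        static double mean(double[] xs) {
            return Arrays.stream(xs).average().orElse(Double.NaN);
        }

        static double sampleStdDev(double[] xs) {
            double m = mean(xs);
            double ss = Arrays.stream(xs).map(x -> (x - m) * (x - m)).sum();
            return Math.sqrt(ss / (xs.length - 1));
        }

        public static void main(String[] args) {
            double[] baseline = {10.2, 10.5, 9.9, 10.4, 10.1, 10.3};  // seconds, invented
            double[] modified = { 9.8, 10.0, 9.7,  9.9, 10.1,  9.8};  // seconds, invented

            for (double[] runs : new double[][] {baseline, modified}) {
                double m  = mean(runs);
                double s  = sampleStdDev(runs);
                double se = s / Math.sqrt(runs.length);
                System.out.printf("mean %.3f s, stddev %.3f, rough 95%% CI [%.3f, %.3f]%n",
                        m, s, m - 1.96 * se, m + 1.96 * se);
            }
        }
    }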

I haven't even started to discuss how to combine (ppt) results when you have multiple programs to test.

Conclusion

The status quo in evaluation is changing, and this is a great thing; bad science can lead us down dark alleys and dead ends, which wastes time (and ).

Unfortunately, the difficulty of doing this right means that you're going to see bad benchmarking all the time. We need to try not to be disheartened, and instead, try simply to be better!

Things to look forward to, in the future:

  • Virtual Machine Images as a basis for evaluation. In some ways this forms the backbone of the ultimate form of replicability: dump the source code, the compiler and build scripts used to compile it, all the libraries (dynamic and static), and all the testing scripts into a virtual machine, and distribute THAT as the artifact of your evaluation.

A reminder that experimentation is hard, and not just in benchmarking, is provided by a terribly interesting paper on behavioural science.

(Update: January 19, 2014: My supervisor reminds me of a couple of things I forgot to make clear:

  • Reproducibility requires documentation: Experiments need to have all their conditions documented. Remind readers of the conditions whenever you discuss the results of these experiments.

  • One of the best ways to ensure reproducibility is to have a testing harness which tracks the experimental conditions and manages the results (a tiny sketch of the idea follows below).

)
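
As a tiny, hedged sketch of that last point (the file name, fields, and measurement are all made up), a harness can simply write the conditions down next to every result:

    // ConditionsSketch.java -- record the experimental conditions alongside each
    // measurement so results can be interpreted and reproduced later. A real
    // harness would record much more (compiler build, benchmark inputs, governor
    // settings, machine load, ...).
    import java.io.IOException;
    import java.net.InetAddress;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.time.Instant;

    public class ConditionsSketch {
        public static void main(String[] args) throws IOException {
            double resultSeconds = 10.3;  // stand-in for a real measurement

            String record = String.join(",",
                    Instant.now().toString(),
                    InetAddress.getLocalHost().getHostName(),
                    System.getProperty("os.name") + " " + System.getProperty("os.version"),
                    System.getProperty("java.vendor") + " " + System.getProperty("java.version"),
                    String.valueOf(Runtime.getRuntime().availableProcessors()),
                    String.valueOf(resultSeconds)) + System.lineSeparator();

            Files.write(Paths.get("results.csv"), record.getBytes(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }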

More reading

  • Why You Should Care about Quantile Regression: This paper actually inspired this post, along with trying to create a satisfactory experimental method for my thesis.
  • Producing wrong data without doing anything obviously wrong!: An in-depth discussion of measurement bias. I'm not entirely sold on their particular solution, but it is nevertheless a useful discussion point.
  • The Evaluate Collaboratory is working on this problem from the academic end.

    Let's build an "Experimental Evaluation in Software and Systems Canon", a list of readings on experimental evaluation and "good science" that have influenced us and that have the potential to influence the researchers coming after us.

    The canon they have proposed is rich with further reading.