Fixing Tradebeans and Tradesoap on 64-bit JVMs

Edit: March 10, 2014: It seems this may not be the panacea I thought it was. I'll update if I can figure out more about what's going on, but be aware for now: YMMV.


Imagine you're trying to run the tradebeans or tradesoap benchmarks from the DaCapo benchmark suite with a 64-bit JVM.

If you, like me, just downloaded dacapo-9.12-bach.jar, you'll likely see a huge stack trace like the one below:

java -jar dacapo-9.12-bach.jar tradesoap -t 1 
Using scaled threading model. 8 processors detected, 1 threads used to drive the workload, in a possible range of [1,128]
11:08:54,419 ERROR [GBeanInstanceState] Error while starting; GBean is now in the FAILED state: abstractName="org.apache.geronimo.framework/j2ee-system/2.1.4/car?ServiceModule=org.apache.geronimo.framework/j2ee-system/2.1.4/car,j2eeType=AttributeStore,name=AttributeManager"
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: -2
    at com.sun.xml.bind.v2.util.CollisionCheckStack.findDuplicate(CollisionCheckStack.java:112)
    at com.sun.xml.bind.v2.util.CollisionCheckStack.push(CollisionCheckStack.java:53)
    at com.sun.xml.bind.v2.runtime.XMLSerializer.pushObject(XMLSerializer.java:471)
    at com.sun.xml.bind.v2.runtime.XMLSerializer.childAsXsiType(XMLSerializer.java:574)
org.apache.geronimo.gbean.InvalidConfigurationException: Configuration org.apache.geronimo.framework/j2ee-system/2.1.4/car failed to start due to the following reasons:
  The service ServiceModule=org.apache.geronimo.framework/j2ee-system/2.1.4/car,j2eeType=AttributeStore,name=AttributeManager did not start because Array index out of range: -1
  The service ServiceModule=org.apache.geronimo.framework/j2ee-system/2.1.4/car,j2eeType=ConfigurationManager,name=ConfigurationManager did not start because org.apache.geronimo.framework/j2ee-system/2.1.4/car?ServiceModule=org.apache.geronimo.framework/j2ee-system/2.1.4/car,j2eeType=AttributeStore,name=AttributeManager did not start.

    at org.apache.geronimo.kernel.config.ConfigurationUtil.startConfigurationGBeans(ConfigurationUtil.java:485)

This bug report is the hint you need.

Essentially, the report points out that JAXB fails on 64-bit systems because it indexes an array using a hash code, which can be negative.

Following that report, you can get Tradebeans and Tradesoap working by replacing the JAXB jars inside the DaCapo jar.

Because the aforementioned bug report specifically points out that the problem was resolved in 2.1.13, I used JAXB 2.1.13 as well.

After you unzip the dacapo jar, you'll need to unzip dat/daytrader.zip as well. Inside of that, replace geronimo-jetty6-minimal-2.1.4/repository/com/sun/xml/bind/jaxb-impl/2.0.5/jaxb-impl-2.0.5.jar with the obtained jaxb-ri-20100511/lib/jaxb-impl.jar, and ./repository/javax/xml/bind/jaxb-api/2.0/jaxb-api-2.0.jar with jaxb-ri-20100511/lib/jaxb-api.jar (keeping the original file names in both cases).
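The swap can also be scripted. Below is a minimal, hypothetical Python sketch of the generic operation (replacing one member of a zip archive with a file on disk); the member paths inside daytrader.zip are the ones listed above, but verify them against your copy (say, with unzip -l) before running anything like this.

```python
import os
import shutil
import tempfile
import zipfile

def replace_in_zip(zip_path, member, replacement):
    """Rewrite the archive at zip_path, swapping `member` for the file `replacement`."""
    fd, tmp = tempfile.mkstemp(suffix=".zip")
    os.close(fd)
    with zipfile.ZipFile(zip_path) as src, \
         zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == member:
                # Keep the archive path, but take the bytes from the new jar.
                dst.write(replacement, arcname=member)
            else:
                dst.writestr(item, src.read(item.filename))
    shutil.move(tmp, zip_path)
```

For example, replace_in_zip("dat/daytrader.zip", "geronimo-jetty6-minimal-2.1.4/repository/com/sun/xml/bind/jaxb-impl/2.0.5/jaxb-impl-2.0.5.jar", "jaxb-ri-20100511/lib/jaxb-impl.jar"), assuming the jar really does sit at that member path inside the archive rather than only in the unzipped tree.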

Seems to work now!

Surviving Discouragement as a Graduate Student

This thread at the Academia Stack Exchange is marvellous, and full of sage advice, some of which I have already internalized, and some not so much.

I'm just going to harvest the page for some quotes:

The fact is research is hard. It appears to consist primarily of staring at a problem for days and days and days without getting anywhere. Sometimes, rarely, I do figure something out and that feels wonderful, but the overwhelming majority of my time appears to be spent banging my head against a mostly figurative wall.

Yes. This. And it wouldn't be so damn tempting if those bricks didn't wiggle just a little bit every time I slammed my forehead into them. Sometimes I think my eyes must be playing tricks on me, what with the repeated cranial trauma and all. But then I remember how good it felt the last time my head actually went through the wall, and so I keep plugging away.

-- JeffE

The metaphor is apt, and the sentiment very true.

But it's the small sublime moments of joy when you realize that you've discovered something that no one else knows that make it fun. And the feeling, as time goes on, that you're immersed in a wonderful lake, with beautiful new ideas around you as far as you can see.

p.s the advice you were given is very sound. Take breaks, find fulfilling things to do outside of work, and realize that everyone (even seasoned researchers) feel the same frustrations and highs that you do.

-- Suresh

A reminder that even successful people have struggled, and the reason why.

  • Failure is normal—and even to be expected. Just about nothing works exactly as you predicted it would. More importantly, if something doesn't go wrong, then your project has been badly designed, and in fact, I would argue that you're only doing development, not research!
  • Don't be afraid to fail! Failure teaches you lessons that you will never learn from success. I needed a few really abysmal grades in college to get me on the right track—the proverbial kick in the pants that allowed me to realize I couldn't coast through college the way I did through high school.

-- aeismail

This is one of the lessons I need to take more to heart.

Benchmarking is Hard - and Modern Hardware is Making it Harder

Let's say you have a program, and you'd like to find out whether you can make it faster by changing the compiler (or a library, or a setting; I'm a compiler guy, so I'll talk about compilers from here on in).

This doesn't seem like too hard a problem, does it? It sure is easy to imagine a simple method:

  1. Run the program with the old, baseline version of the compiler.
  2. Run the program with the new, modified version of the compiler.
  3. Compare the times.
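For concreteness, the simple method amounts to something like this Python sketch; the program paths in the comments are placeholders, not real binaries.

```python
import subprocess
import sys
import time

def time_program(cmd):
    """Wall-clock a single run of a program, in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# The naive method: one run per configuration, then compare.
# baseline = time_program(["./program-built-with-old-compiler"])
# modified = time_program(["./program-built-with-new-compiler"])
# print("speedup:", baseline / modified)
```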

This makes sense, doesn't it? Time to publish!


Unfortunately for your nascent publication, you start thinking: What happens if I run the programs again?

If the year is 1977, and you're running the program with one thread, on a machine with no preemption, with an in-order processor that has no cache (say, an Apple II), you'll probably get the same time. Perhaps even down to the cycle.


So you run your programs again: lo and behold, the times have changed, both before and after.

Now what can you say about the effect of your change to the compiler? What happened!?

The problem

It's not 1977.

Let's talk about where variance can come from when you're benchmarking today. This list gets longer every day. Remember, this is all your computer trying to be helpful, and improve overall system performance and user experience.

  • You're likely running your benchmark on a machine with a preemptively multitasking operating system, with virtual memory, file systems, daemons etc.

    The process your benchmark runs in can be preempted in favour of another, or not. Context switches incur overhead, and so how often your benchmark process is switched out will have an effect on the final runtime.

  • Your machine is chock-a-block with caches from top to bottom (L1, L2, L3, (L4), disk cache, page cache). Whether these caches are warm or cold for the workload will change the result.
  • Your benchmark likely runs with multiple threads, which means synchronization and thread scheduling come into play, with the order of interleavings affecting the benchmark's outcome.

At this point you're probably already despairing: "How is it possible to say anything about computer performance?"

Oh, but wait, the problem gets worse.

  • What if your benchmark is written for one of these new-fangled JIT compiled languages? The longer your program is running, the faster it gets as more pieces are compiled, first into native code, then into better native code.
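A common mitigation (not a cure) is to run the workload many times inside a single process and discard the early iterations. Something like this sketch; the warmup and run counts are arbitrary, and you should choose them by watching when your times stabilize.

```python
import time

def measure(bench, warmup=5, runs=30):
    """Time `bench` after discarding warmup iterations, returning all run times."""
    for _ in range(warmup):
        bench()                       # give the JIT a chance to compile hot paths
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        bench()
        times.append(time.perf_counter() - start)
    return times
```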

Now let's make the problem even worse!

  • What if your test machine is a relatively recent Intel machine? Then your performance will depend on how warm your machine is, because of a feature called Turbo Boost, which lets these processors overclock themselves so long as they remain within their electrical and thermal limits.

    This means that by simply putting your machine by a cold window (hello Canadian Winter!) you might get better results!

Modern Benchmarking

So this is obviously concerning. What do we do about this?

A big part of the answer comes from acting like scientists, and using the scientific method.

We need to use controlled experimentation, which means controlling as many variables as you can. You want your experiment to be about the effect you are trying to measure, and you want it to be replicable.

  • Try to make the machine you use to test as quiet as possible. Your goal is to eliminate as many sources of variance in the runtime results you'll collect as you can. Disable the GUI if possible, along with any daemons that aren't required; kick everyone else off the machine, and so on.
  • Disable any frequency scaling. As I've discovered, sometimes you have to look deep. For example, in Linux, there's the CPU Frequency scaling system, which is separate from the Intel Turbo Boost Control. Disable both! (Here's a pointer for earlier-than-3.9 series kernels)
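As a rough illustration of what "disable both" involves on Linux, here is a hypothetical dry-run sketch. The sysfs paths are assumptions that vary by kernel version and driver (older kernels expose turbo control differently), so treat this as a starting point, not a recipe.

```python
import pathlib

# Assumed paths: per-core cpufreq governors, plus the intel_pstate turbo
# knob. Check your kernel's documentation before trusting these.
def frequency_settings():
    cpu_root = pathlib.Path("/sys/devices/system/cpu")
    settings = []
    if cpu_root.is_dir():
        for gov in cpu_root.glob("cpu[0-9]*/cpufreq/scaling_governor"):
            settings.append((gov, "performance"))  # pin the governor
    settings.append((cpu_root / "intel_pstate" / "no_turbo", "1"))  # disable turbo
    return settings

def apply_settings(settings, dry_run=True):
    """Write each value to its sysfs file (needs root); dry_run just prints the plan."""
    for path, value in settings:
        if dry_run:
            print("would write", repr(value), "to", path)
        else:
            path.write_text(value)
```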

Unfortunately, even on a quiet machine, you have relatively little control over things like caches and synchronization effects. This means we need another tool: statistics.

To show anything, you're going to have to quantify run-to-run variance through multiple runs, and account for it. To show an improvement, a statistical hypothesis test should be used. If that's too hardcore, at least quantifying the variance will go a long way toward showing your result is not a fluke.
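As a concrete example of such a test, here is Welch's unequal-variance t statistic computed from scratch, on made-up run times. In practice you'd reach for a statistics package (scipy.stats.ttest_ind with equal_var=False, or R's t.test), but the arithmetic is small enough to show.

```python
import statistics

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb          # squared standard error of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se2 ** 0.5
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Made-up run times in seconds: old compiler vs. new compiler.
baseline = [10.0, 10.2, 9.9, 10.1, 10.0]
modified = [9.5, 9.6, 9.4, 9.5, 9.6]
t, df = welch_t(baseline, modified)
# Compare |t| against the t distribution with df degrees of freedom
# to get a p-value.
```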

I haven't even started to discuss how to combine (ppt) results when you have multiple programs to test.

Conclusion

The status quo in evaluation is changing, and this is a great thing; bad science can lead us down dark alleys and dead ends, which wastes time (and money).

Unfortunately, the difficulty of doing this right means that you're going to see bad benchmarking all the time. We need to try not to be disheartened, and instead, try simply to be better!

Things to look forward to, in the future:

  • Virtual Machine Images as a basis for evaluation. In some ways this forms the backbone of the ultimate form of replicability: dump the source code, the compiler and build scripts used to compile it, all the libraries (dynamic and static), and all testing scripts into a virtual machine, and distribute THAT as an artifact of your evaluation.

A terribly interesting paper on behavioural science provides a reminder that experimentation is hard, and not just in benchmarking.

(Update: January 19, 2014: My supervisor reminds me of a couple things I forgot to make clear:

  • Reproducibility requires documentation: Experiments need to have all their conditions documented. Remind readers of the conditions whenever you discuss the results of these experiments.

  • One of the best ways to ensure reproducibility is to have a testing harness which tracks the experimental conditions, and manages the results.

)

More reading

  • Why You Should Care about Quantile Regression: This paper actually inspired this post, along with trying to create a satisfactory experimental method for my thesis.
  • Producing wrong data without doing anything obviously wrong!: An in-depth discussion of measurement bias. I'm not entirely sold on their particular solution, but it is nevertheless a useful discussion point.
  • The Evaluate Collaboratory is working on this problem from the academic end.

    Let's build an "Experimental Evaluation in Software and Systems Canon", a list of readings on experimental evaluation and "good science" that have influenced us and that have the potential to influence the researchers coming after us.

    The canon they have proposed is rich with further reading.

What Caught My Eye - Week 3

This week has been extra busy, so this is a bit thinner than the last two!

Diversity in Tech

  • Female Founders: Paul Graham musing on technology entrepreneurialism and women.

    So how would you cause there to be more female programmers? The meta-answer is: not just one thing. People's abilities and interests by the time they're old enough to start a startup are the product of their whole lives—indeed, of their ancestors' lives as well. Even if we limit ourselves to one lifetime we find a long list of factors that could influence the ratio of female programmers to male, from the first day of a girl's life when her parents treat her differently, right up to the point where a woman who has become a programmer leaves the field because it seems unwelcoming. And while the nature of this sort of funnel is that you can increase throughput by attacking bottlenecks at any point, if you want to eliminate the discrepancy between male and female programmers completely, you probably have to go back to the point where it starts to become significant.

    It seems to be well underway by the time kids reach their teens. Which to me suggests the place to focus the most effort initially is in getting more girls interested in programming.

  • 3 States had no girls take the AP CS Exam. This supports Paul Graham's point above a little. More generally, I find it just terribly sad.

  • Just Because You’re Privileged Doesn’t Mean You Suck

    Having access to a computer is just one way I was privileged. There are countless others: I wasn’t raised in poverty, or in a country riddled with disease or corruption. I am a white man and have never faced racism or sexual discrimination.

    I don’t feel regret for who I am, I just recognize that not everyone has it so easy. Privilege is about being mindful of the fact that not all people have equal footing.

    It is also an important first step towards correcting injustices, for if you truly believe that everyone has the same opportunities as you, there is no reason to advocate for change.

Technology

  • I sit on the skeptical side of the bitcoin bubble. One of my key problems with it has always been that it seems incredibly wasteful, in a way that never seemed socially justifiable. This article: What is Proof of Stake, and why it matters, points out that there exist alternatives to the current 'proof-of-work' regime in the crypto-currency world, with the possibility of societally beneficial currencies.

    This is a fascinating notion to me.

  • Embedded Security CTF: Experiment with working around security software in a safe environment:

    The Lockitall devices work by accepting Bluetooth connections from the Lockitall LockIT Pro app. We've done the hard work for you: we spent $15,000 on a development kit that includes remote controlled locks for you to practice on, and reverse engineered enough of it to build a primitive debugger.

    Using the debugger, you'll be able to single step the lock code, set breakpoints, and examine memory on your own test instance of the lock. You'll use the debugger to find an input that unlocks the test lock, and then replay it to a real lock.

    I got through the first non-tutorial lock. The second one, I'm still working on... Alas, my first idea of a buffer overflow was beaten by a locked page. Real vulnerability researchers are on level 18 or more by now, even though the challenge was only released last night. They're good! The whole thing was put together by Matasano Security and Square, the commerce company.

On Geekdom

This piece by John Siracusa on 'The Road to Geekdom' is so good. Read the whole thing, but this quote tickled me sufficiently that I am posting this from a bus:

You don’t have to be a geek about everything in your life—or anything, for that matter. But if geekdom is your goal, don’t let anyone tell you it’s unattainable. You don’t have to be there “from the beginning” (whatever that means). You don’t have to start when you’re a kid. You don’t need to be a member of a particular social class, race, sex, or gender.

Geekdom is not a club; it’s a destination, open to anyone who wants to put in the time and effort to travel there. And if someone lacks the opportunity to get there, we geeks should help in any way we can. Take a new friend to a meetup or convention. Donate your old games, movies, comics, and toys. Be welcoming. Sharing your enthusiasm is part of being a geek.

What Caught My Eye - Week 2

Diversity and Culture

  • A More Peaceful 2014: How people working on real issues can end up silencing each other while ostensibly working for the same causes.

    I find this post particularly interesting as someone starting a new blog in 2014. I will completely admit to a terror about posting on divisive issues rooted in a fear that I will be misjudged: labelled ignorant, harmful, or abusive when aiming higher.

  • On Technical entitlement: How early exposure to technology and skills changes our attitudes to them, in a way that discourages late starters. I especially enjoyed the following quote:

    For one thing, precocity is rewarded in tech. We all swoon over the guy who started programming robots when he was 6. Growing up in tech, I took this as a constant in life—if you’re doing cool things, the younger the better. But it’s become obvious that this is more unique. One of my friends working in finance put it this way: “If I told people I started shorting stocks when I was nine—not that I was, by the way—people wouldn’t be impressed. They’d only say, ‘Who was stupid enough to give you their money?’”

    Follow up with this post from Philip Guo on how privilege greases the wheels. He closes the piece with the following, which captures a lot of my thoughts on the topic too:

    I hope to live in a future where people who already have the interest to pursue CS or programming don't self-select themselves out of the field. I want those people to experience what I was privileged enough to have gotten in college and beyond – unimpeded opportunities to develop expertise in something that they find beautiful, practical, and fulfilling.

    The bigger goal on this front is to spur interest in young people from underrepresented demographics who might never otherwise think to pursue CS or STEM studies in general.

  • The Next Civil Rights Issue: Why Women Aren’t Welcome on the Internet (disturbing): Last week I mentioned how software can have political agendas, without meaning to. The kind of problems described in this article are another kind of politics, which I'd argue partially stem from the narrow perspective rampant in the creation of internet technologies.

    But no matter how hard we attempt to ignore it, this type of gendered harassment—and the sheer volume of it—has severe implications for women’s status on the Internet. Threats of rape, death, and stalking can overpower our emotional bandwidth, take up our time, and cost us money through legal fees, online protection services, and missed wages. [...] And as the Internet becomes increasingly central to the human experience, the ability of women to live and work freely online will be shaped, and too often limited, by the technology companies that host these threats, the constellation of local and federal law enforcement officers who investigate them, and the popular commentators who dismiss them—all arenas that remain dominated by men, many of whom have little personal understanding of what women face online every day.

Art

Technology

  • Links 2013: Bret Victor's collection of papers and projects he fell in love with this year.

    My friends know that Bret Victor simultaneously fascinates and irks me. He's clearly thinking on a higher level than I am, and I can't shake the feeling that what grates at me is simply his genius; he's so much smarter than me that it burns.

    His historical awareness is also terribly painful. Watch his video on The Future of Programming, and you'll feel like we went down the wrong trouser leg.

New Years

Warning: Productivity Wankery Below


Resolutions are a pretty bad way to change things to be sure.

For me, however, the start of the year has provided an ideal opportunity to sit down and regroup: start rethinking how I'm tackling the challenges ahead, and make some fresh starts and modest changes.

Here are a couple that are already paying off, in my mind.

Email Lists

I'm starting to actively police my email subscriptions. Life on the internet means that without trying, you'll end up on 20 marketing lists and 10 newsletters every year.

Years back, I decided that rather than police them, judicious use of filters in Gmail would get them out of sight, out of mind.

It turns out that this doesn't work terribly well for me: I feel the need to process the email eventually anyhow. As a result, I'm going through and hitting unsubscribe on every list that I would previously have filtered away.

Organization

I'm tackling a huge project right now: my Master's thesis. If I'm brutally honest, it got away from me between October and December of 2013. I have excuses, but essentially, I broke every organizational habit that had worked for me, and came into January looking at a mess of notes, sundry todo lists, and a draft thesis peppered with notes and outlines.

To get back on track, I spent a large chunk of time on Monday regrouping. I've had a lot of success with Trello before, so I'm getting back on that wagon.

Tooling

'Right Tool for the Right Job' — We all know it, but we all fail sometimes. The worst is when you become painfully aware you've been using the wrong tool at a point where switching is infeasible — in the middle of a project, let's say.

I experienced this on my last project, where I discovered that my data analysis tools — custom-built Python and NumPy scripts — were very painful to use and too slow to adapt to new questions.

To rectify this for my thesis, I'm putting aside my distaste for R's craziness (There's a guide to R which has as its abstract "If you are using R and you think you’re in hell, this is a map for you.") in order to leverage its power for data analysis, and dumping data into SQLite for management.

One benefit to this is that I get to use ggplot2, which is phenomenal.