Using rr-dataflow: Why and How?

If you haven't heard of rr, go check it out right away. If you have, let me tell you about rr-dataflow and why you may care about it!

Perhaps you, like me, occasionally need to track something back to its origin. Sometimes I'm looking at the instructions executing in an inline cache and I think, "Well, that's wrong... where was this instruction generated?"

Because rr lets you set a watchpoint and then reverse-continue, finding where a value was last written is easy enough:

(rr) watch -l *(int*)$rip
(rr) reverse-continue

The problem is that, at least in SpiderMonkey, that's rarely sufficient. The first time you stop, you'll most likely be looking at the copy from a staging buffer into the final executable page. So you set another watchpoint and reverse-continue again. Oops, now you're in the code that copies the buffer during a resize. This can repeat several times before you reach the point you're actually interested in.

Enter rr-dataflow. As it says on the homepage: "rr-dataflow adds an origin command to gdb that you can use to track where data came from."

rr-dataflow uses the Capstone disassembly library, which lets it determine, for a given instruction, where data is flowing from and to.
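To make "where data is flowing from and to" concrete, here is a minimal sketch in Python (the language of rr-dataflow's `flow.py`). It does not use Capstone: the hypothetical `data_flow` and `next_target` helpers only parse Intel-syntax, two-operand move instructions given as text, then classify the source as a memory operand you could watch next, a register, or an immediate.

```python
import re

# Hypothetical toy stand-in for the operand analysis rr-dataflow gets from
# Capstone. Only Intel-syntax, two-operand move instructions given as text
# are understood; Capstone decodes raw machine code instead.
MOVES = {"mov", "movzx", "movsx", "movsxd", "movd", "movq",
         "movdqu", "movdqa", "movups", "movaps"}

# A bracketed Intel-syntax memory operand such as "[rsi + rcx*4 + 8]".
MEMORY = re.compile(r"\[[^\]]*\]")

# A plain numeric immediate such as "0x90" or "-4".
IMMEDIATE = re.compile(r"-?(0x[0-9a-fA-F]+|\d+)")


def data_flow(insn: str) -> tuple[str, str]:
    """Return the (destination, source) operands of a simple move."""
    mnemonic, _, operands = insn.strip().partition(" ")
    if mnemonic not in MOVES or "," not in operands:
        raise ValueError(f"unsupported instruction: {insn!r}")
    dest, src = (op.strip() for op in operands.split(",", 1))
    return dest, src


def next_target(insn: str) -> tuple[str, str]:
    """Classify where the data came from: an immediate (the origin),
    a memory operand to watch next, or a register."""
    _, src = data_flow(insn)
    if IMMEDIATE.fullmatch(src):
        return ("immediate", src)
    mem = MEMORY.search(src)
    if mem:
        return ("memory", mem.group(0))
    return ("register", src)


for insn in ["mov qword ptr [rdi + 8], rax",
             "mov eax, dword ptr [rsi + rcx*4]",
             "mov eax, 0x90"]:
    print(insn, "->", next_target(insn))
```

A real tool would work from Capstone's decoded operands rather than text, and would also need to handle instructions such as `rep movsb` or arithmetic that combines several sources.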

Returning to the example above, the process starts almost the same way:

(rr) source ~/rr-dataflow/flow.py
(rr) watch -l *(int*)$rip
(rr) reverse-continue

This time, however, when the watchpoint stops at an intermediate step, we can simply type:

(rr) origin

rr-dataflow then analyzes the instruction that tripped the watchpoint, sets a new watchpoint on the source of the data, and reverse-continues for you. Getting back to where you need to be becomes:

(rr) origin
(rr) origin

Tada! That is why you might be interested in rr-dataflow. The homepage also has a more detailed worked example.
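As a mental model of what repeated `origin` commands accomplish, here is a toy Python simulation. It is not how rr-dataflow works internally (rr replays the real execution, and gdb watchpoints do the searching); the `Write` record, the location names, and the trace are all hypothetical, and only memory-to-memory copies are modeled. Each hop searches only strictly earlier writes, just as reverse execution can only find earlier stores, so the walk always terminates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Write:
    time: int   # position in the recorded execution
    dest: str   # location written, e.g. "exec+0x10"
    src: str    # location read, or an immediate written as "imm:<value>"


def last_write(trace: list[Write], location: str, before: int) -> Optional[Write]:
    """Model of `watch` + `reverse-continue`: the latest write to
    `location` strictly before time `before`, or None if there is none."""
    hits = [w for w in trace if w.dest == location and w.time < before]
    return max(hits, key=lambda w: w.time, default=None)


def origin_chain(trace: list[Write], location: str, now: int) -> list[Write]:
    """Model of typing `origin` repeatedly: hop from each write to the
    write that produced its source, until reaching an immediate or a
    location with no earlier write in the trace."""
    chain = []
    w = last_write(trace, location, now)
    while w is not None:
        chain.append(w)
        if w.src.startswith("imm:"):
            break
        w = last_write(trace, w.src, w.time)
    return chain


# Hypothetical trace loosely shaped like the story above: a byte is emitted
# into a buffer, copied during a resize, then copied into the executable page.
trace = [
    Write(1, "buf+0x10", "imm:0xc3"),      # the write we actually care about
    Write(5, "bigbuf+0x10", "buf+0x10"),   # copy during a buffer resize
    Write(7, "buf+0x10", "imm:0x00"),      # later reuse of the old buffer
    Write(9, "exec+0x10", "bigbuf+0x10"),  # final copy into executable memory
]

for w in origin_chain(trace, "exec+0x10", now=10):
    print(f"t={w.time}: {w.dest} <- {w.src}")
```

The write at t=7 shows why "strictly before" matters: when walking back from the resize copy at t=5, the later reuse of `buf+0x10` is ignored and the chain correctly lands on t=1.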

A caveat: I've found it to be a little unreliable with 32-bit binaries, since it wasn't developed with them in mind. One day I'd love to dig a little deeper into how it works and perhaps help improve it there. In the meantime, many thanks to Jeff Muizelaar for creating such an awesome tool.