How do we read code?

I recently got to participate in a psychological experiment for programmers. A friend of mine, Mike Hansen, is doing research on how people comprehend programs. The goal is to find a way of measuring which features of programming systems help programmers understand what they are doing, and how those findings can be used to build systems that lead to higher-quality software. Mike is currently running an experiment where he shows people several short Python programs and asks them to predict each program's output. The test subject sits in front of an eye tracker, so afterwards Mike can see where the subject was looking at various times during the experiment.

I was one of the test subjects, and Mike was kind enough to let me have a video of the eye tracker data superimposed on my screen. I've shared a small section for you to watch.

One of the things that stood out to me in watching the video was how much my mind seems to work like a computer. First I read over the whole program, and then I start interpreting it. The program in question consists of two calls to a function called between, followed by a call to common. For the first call to between, I spend a lot of time moving my eyes between the call site and the function definition. For the second call, however, I only glance up at the function definition once.

In programming language terms, I seem to be doing some kind of just-in-time compilation. The first time through, I read and interpret every instruction. After that, I seem to remember what the function does and can determine its output much more quickly. Interpreting the first call takes about 24 seconds, while I blow through the second one in about 10 seconds.

Another observation is that naming things accurately seems to help. I was able to work through the call to common very quickly. While reading the program, I remember thinking, "this should return the elements that are in both arrays." I read over the function to verify that it did what its name suggested, and then I could do the equivalent operation in my head rather than interpret the code step by step.
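
To make the shape of the program concrete, here is a minimal sketch of what a program like this might look like. It is not the actual program from the experiment: the function bodies, the variable names, and the input lists are all assumptions, written only to match the description above (two calls to between, then one call to common).

```python
# Hypothetical reconstruction, not the program used in the experiment.

def between(numbers, low, high):
    """Return the numbers strictly greater than low and strictly less than high."""
    result = []
    for n in numbers:
        if low < n < high:
            result.append(n)
    return result


def common(first, second):
    """Return the elements of first that also appear in second."""
    result = []
    for item in first:
        if item in second:
            result.append(item)
    return result


# Made-up input data for illustration.
x = [2, 8, 7, 9, -5, 0, 2]
y = [1, -3, 10, 0, 8, 9, 1]

x_between = between(x, 2, 10)   # first call: the reader traces the loop line by line
y_between = between(y, -2, 9)   # second call: same function, already understood
shared = common(x, y)           # the name suggests the result before the body is read

print(x_between)
print(y_between)
print(shared)
```

With a well-chosen name like common, a reader only has to confirm that the loop matches the name, and can then work out the answer as "the elements in both lists" without stepping through each iteration.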

I'm excited to see what else Mike's research uncovers. One aspect he's interested in is how the approach of inexperienced programmers differs from that of experienced programmers. For example, there seems to be some evidence that following variable naming conventions helps experienced programmers understand code much more quickly, while breaking these conventions carries a severe penalty. Inexperienced programmers, on the other hand, seem to take about as long regardless of how the variables are named.

If you happen to live in Bloomington, consider volunteering for Mike's experiment. It's a lot of fun, and you get $10 for participating. He'll be collecting data all through next semester, and the more people he gets, the better. If you want to participate, send him an e-mail at mihansen@indiana.edu and he can schedule a time for you.