I've been a professional software developer for 21 years, and an amateur programmer for a decade or so before that. For much of that time I've been fascinated by the problem of software quality — mostly why there isn't any to speak of. Software, in general, is terrible. And one of the reasons why it's terrible is that software, by which I mean the source code that software is created from, is very hard to read.
Software is written in “code”, in the argot of any of a vast array of programming languages. We usually think of code as something for the machine to read. But nearly all software has to be maintained — by programmers — and that means software code has to be read by humans, too. Programming languages are not very readable, and reading code requires things that ordinary prose doesn't usually demand, like exacting attention to detail, moving across different levels of abstraction, and jumping around from place to place rather than proceeding linearly.
I've also been a scholar for ... well, somewhere between 17 and 37 years, depending on how you want to define it. Much of my scholarship has been in the area of the written word, in one form or another. So you can see why it might occur to me to put my scholarly interest in reading and writing together with my professional interest in software quality and code readability.
The Argument: “Writing Code”
In March 2009 I gave a presentation titled Writing Code to the Association of Teachers of Technical Writing. “Writing Code” looked at the problem of code readability and some of the historical attempts to deal with it. I explained the economic argument for improving code readability, and finally I argued that technical writers, and particularly teachers of technical writing, ought to reach out to the programming community to explore better ways of reading and writing code.
It's a damned good argument. So good, in fact, that I convinced myself.
Welcome to The Code Show
“Writing Code” didn't propose any novel ways of writing code so it would be more readable. In it I did suggest that we revisit one particularly interesting proposal for improving code readability, Donald Knuth's Literate Programming, which treats software source code as literature; and I still believe that's an area we need to explore and develop. But Literate Programming still treats source code as static text (and some static graphics). Also, it's a way to write new software so that it will be more readable later. It doesn't help with reading software we already have.
The Code Show is my first attempt to look at several other areas for improving code readability:
- integrating additional media (particularly synchronous media, media with an inherent timeline, such as video and audio) into a presentation of the code itself
- adding interactive features to the code presentation, so the reader doesn't simply receive information, but engages with it
- enhancing existing source code with these features, with relatively little manual labor, rather than having to build it all in from the start
- adding some obstacles to the code-reading process
That last one may sound a little backward. Isn't this whole project about making code easier to read?
The key here is that it's all about having the right sort of obstacles. I've taken this idea from the theory of ergodic literature, which came out of video game studies. Ergodic literature is literature you have to work at to read, like RPG-style video games, where you have to defeat challenges in the game to see how the story progresses. Ergodic features don't make these narratives less attractive to their audience. People play them compulsively, partly due to the psychological effect of random reinforcement, and partly to the emotional reward from solving an intellectual challenge. In fact, people write texts (walkthroughs, cheat sheets, FAQs) about these games and publish them online, for free. The effort of problem-solving makes the information so compelling that they want to share it with others.
So one of the things I'm doing in The Code Show is trying not to explain things too much. The reader should wander around and experiment with things. See what they do. I think that makes it more interesting, and it rewards the reader, and it provides other memory cues to help retain the information.
A Historical Interlude
What did I draw on in thinking about The Code Show?
I've already talked about my Writing Code piece, which inspired the project itself. For my actual subject, I took the code for a little application I wrote in 2008 called the Ethos Estimator, which I presented at Computers & Writing in May 2008.
My first move was to convert some source code into a presentable form that I could then enhance with other media. For that I used the open-source software-documentation tool Doxygen, with a number of customizations I created.
When I converted the Ethos Estimator presentation to something I could post to my website, I created a screencast — an interactive animation of my computer screen — of the demo I had done during my presentation. This was my first screencast, but I'd seen a few others, such as the Zotero demo, and I thought it was a very nice genre for conveying that sort of information. So I decided to include screencasts in the project. I generated and edited those, and added audio and other features, with the free Wink application.
I'm not the first to try to explain software source code using screencasts. I found one for the MTV API, for example. But clearly there's a lot of room for development here.
I was also captivated by the thought of using filmed, edited video (and not just screen animation) and more extensive audio. I was aiming for something like a documentary of the design and function of a body of source code. One great example of this sort of thing is this short but sophisticated video exploring the source code of a piece of malware.
I would really like to find some highly interactive explorations of source code, but I haven't yet. I put some simple interactive features into The Code Show, but this is definitely an area where much more can be done. I'd like to try making an entire video game about the design of a piece of software. It's hard to see what that might even mean, at this point (who are the characters? what is the plot?), but it's something to aim for. Instructional games are a big topic of discussion in the field, as for example this issue of Communications of the ACM shows, and some very exciting work is being done here.
The Making of The Code Show
I put all of this stuff together using an iterative process: create some content manually, tweak it, repeat. Some parts, such as generating and modifying the HTML pages showing the actual source, I automated, so I wasn't manually modifying the output each time I generated it.
Other parts, such as screencasts and video, were recorded and edited manually, then copied into the output to be picked up by automatically-generated links.
The final toolchain looked something like this:
- Run Doxygen (with customized configuration, stylesheet, header, and footer) to create the HTML source-code pages
- Copy the Doxygen support files to the output area
- Use a quick & dirty awk script to add the interactive "folding" feature to the HTML files, and to turn "insertion points" into links to Shockwave files with video and screencasts
- Copy in the screencasts, previously generated manually with Wink
- Copy in the static HTML content (such as this file)
- Copy in the live videos, previously generated manually with Windows Movie Maker
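
The awk script itself isn't reproduced here. As a rough sketch of the kind of post-processing the third step describes, the following Python rewrites generated HTML in two passes. Everything specific in it is an assumption made for illustration, not taken from the actual toolchain: the marker syntax (`@show name` for an insertion point, `@fold label` ... `@endfold` around a foldable region), the `media` directory, the `.swf` file naming, and the function names.

```python
import re

# Hypothetical marker conventions (assumed, not from the original project):
#   "@show intro-demo"              marks an insertion point for a media file
#   "@fold label" ... "@endfold"    brackets a region that starts out folded
INSERTION_POINT = re.compile(r"@show\s+([\w-]+)")
FOLD = re.compile(r"@fold\s+([\w-]+)\n(.*?)@endfold\n?", re.DOTALL)

# Inline script for the toggle link: show or hide the element that
# immediately follows the link.
TOGGLE_JS = ("var s=this.nextSibling;"
             "s.style.display=(s.style.display=='none')?'':'none';"
             "return false;")


def add_folding(html: str) -> str:
    """Wrap each marked region in a hidden span preceded by a toggle link."""
    def to_fold(match):
        label, body = match.group(1), match.group(2)
        return (f'<a href="#" onclick="{TOGGLE_JS}">[+] {label}</a>'
                f'<span class="fold" style="display:none">{body}</span>')
    return FOLD.sub(to_fold, html)


def link_insertion_points(html: str, media_dir: str = "media") -> str:
    """Replace each insertion-point marker with a link to a media file."""
    def to_link(match):
        name = match.group(1)
        return (f'<a class="media" href="{media_dir}/{name}.swf">'
                f'[play {name}]</a>')
    return INSERTION_POINT.sub(to_link, html)


def enhance(html: str) -> str:
    """Apply both passes to one generated HTML page."""
    return link_insertion_points(add_folding(html))
```

Calling `enhance()` on the text of each generated page and writing the result back would stand in for the awk step. Because the passes are pure text transformations, they can be rerun every time the pages are regenerated, which is the point of automating this part.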
All of this content was backed up and change-tracked using CVSNT running on my own server.
The process is highly technical, not at all user-friendly, and pretty rough. But it is reproducible and safe — it's hard to lose content or break the output without being able to fix it easily.
What The Code Show Is, and Isn't
We'll take the short one first. Here's what The Code Show isn't: it isn't a great new way to read software source code. I've taken a small, relatively simple program, written in a fairly expressive and readable language, and turned it into a multimedia monstrosity. It might be fun, but it's not efficient; few practitioners would spend this much time studying something this small.
It also wasn't efficient to produce. I assembled and refined a toolchain for processing the source code. I adapted some existing software to generate the initial HTML, and wrote other programs to modify it. I created HTML text and style sheets. I recorded and edited screencasts, audio, and video. I scoured the web for graphics I could use. I assembled the whole thing out of all those bits. And the result isn't exactly lightweight in terms of storage space and processing power, either.
And it's not nearly complete. There really isn't very much here, in fact. I would have liked to spend another week or so generating a lot more content. There should be a lot more of the existing features (such as screencasts), and I have a bunch of ideas for new ones, some useful and some just fun (like a director's commentary).
But The Code Show is a success, I think, because of course it was never about creating something actually useful in itself. It was an opportunity to experiment and play with genres in a number of media to see what they might contribute to the problem of code readability. I'm quite pleased with the result, hokey and awkward and inefficient as it is. At the very least, it suggests that someday we might be using these media to attack the problem of software quality.
Michael Wojcik, WRA 417, Michigan State University, September 2009