Entropic Thoughts

Technical Writing: Learning from Kernighan

I have in front of me two books that immediately struck me with their similarity: The C Programming Language (Kernighan & Ritchie; Prentice Hall; 1988) and The AWK Programming Language (Aho, Kernighan & Weinberger; Addison-Wesley; 1988; available online). The former is widely hailed as a model for technical writing, and the latter is just as good.

So what makes these books special? Why are they good? Let’s take a look!

Content

First of all, both books are very short. (The second edition of K&R is around 260 pages, and that includes a full reference for the standard libraries. The awk book is just shy of 200 pages.) A well-written short text is easier to read, but much harder to write. Blaise Pascal summarised it by saying that “I would have written you a shorter letter, but I did not have the time.”

I’ll use the awk book as a running example, because it covers a smaller subject, so it’s easier to pry apart into its constituent pieces. The book is divided into five essential parts (I have written this list as generally as possible, and not specifically in terms of awk; it applies just as well to nearly anything technical):

  1. Introductory tutorial,
  2. Complete description,
  3. Intermediate techniques,
  4. Advanced examples in selected domains, and
  5. Reference manual.

These parts may span one or more chapters, depending on the size of the subject of the text. They may not even be consecutive. In the awk book, part 3 maps to chapters 3 and 7; part 4 maps to chapters 4 through 6.

Code Examples

Before we look at the important ideas that underpin each of the five parts, we must briefly consider one of the most important ideas that runs through the entire book: realistic-looking examples. With a realistic-looking example, the learner can leverage their intuition to aid their understanding of the technical material. A reader who almost understands a concept can have the last piece of the puzzle provided by their intuition; this is one of the primary reasons we use examples in the first place.

On the opposite side, the abstract $foo = "bar" style examples so frequent on the web provide nothing beyond a hint about syntax.

The examples are also self-contained, meaning they are generally complete programs executable on their own. Despite that, they are kept relatively small. Sometimes this means the authors have glossed over details regarding e.g. encoding, or error handling, or other such things that are necessary in a real-world program.
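To make the contrast concrete, here is a minimal sketch. It is not taken from either book: the file name orders.txt, its three-column layout (customer, quantity, unit price), and the numbers in it are all made up for illustration.

```shell
# Hypothetical input: one line per order with customer, quantity, unit price.
cat > orders.txt <<'EOF'
alice 3 2.50
bob 1 12.00
alice 2 4.00
EOF

# Abstract style: demonstrates assignment syntax and nothing else.
awk 'BEGIN { foo = "bar"; print foo }'
# bar

# Realistic style: a complete program that totals spending per customer.
# Piped through sort because awk does not guarantee array traversal order.
awk '{ total[$1] += $2 * $3 } END { for (c in total) print c, total[c] }' orders.txt | sort
# alice 15.5
# bob 12
```

The second program is barely longer than the first, yet a reader who has never seen an associative array can guess what total[$1] does just from knowing what an order list is. It also glosses over details such as malformed lines, which is the trade-off described above.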

Introductory Tutorial

The tutorial throws the user straight into the experience of using awk. There is no theoretical discussion first; there are no installation instructions; the first thing you see is an input example and a short program doing something with that input. They even get to the expected output on the very first page. I don’t think that’s an accident. (A short note on target audience: K&R waits until the second page to show this, because that book has a different target audience; its readers may need some soothing words before they enter the shocking world of computer programming. The awk readers are likely familiar with programming already.)

This may not sound very special. Lots of things market themselves with a complete example up-front. However, when it comes to books about technologies, that’s rarer. You often see a whole opening chapter devoted to explaining the philosophy behind the technology, rather than letting the user get a feel for the philosophy themselves, by doing. It also seems common these days to have users of programming languages start out by typing expressions like 4+2 in a REPL. I believe this is less satisfying and further removed from real programming than creating complete programs – even if they have to be very simple to start out!

Following the first example is a high-level explanation of how the program produces the expected output, just in case you didn’t guess that already. The authors then give you just enough information on general principles so that you can start experimenting with modifying that example. This is also where they write half a page on errors – because if you don’t create errors when you modify the example code, your tinkering is not bold enough. Testing the limits of the system – even with your minimal knowledge – should be part of the process.

The tutorial goes on to introduce slightly more complicated examples based on the same input data. Only one self-contained example at a time – and still very basic stuff, but enough to get you going. In the end, the tutorial has touched briefly on every aspect of awk, but not gone into detail with anything.
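The shape of such a tutorial can be sketched roughly as follows: one small input file, a first complete program with its output, and then small variations on the same data. The file hours.txt (name, hourly rate, hours worked) and its contents are my own hypothetical stand-ins, not the book's data.

```shell
# Hypothetical input: name, hourly rate, hours worked.
cat > hours.txt <<'EOF'
Ana 12.00 0
Ben 10.50 8
Cleo 15.00 20
Dev 11.00 16
EOF

# First program: name and pay for everyone who worked at least one hour.
awk '$3 > 0 { print $1, $2 * $3 }' hours.txt
# Ben 84
# Cleo 300
# Dev 176

# A first tweak of the pattern: who did not work at all?
awk '$3 == 0 { print $1 }' hours.txt
# Ana

# A slightly bigger step on the same data: a count and a total.
awk '$3 > 0 { n++; pay += $2 * $3 } END { print n, "employees, total pay", pay }' hours.txt
# 3 employees, total pay 560
```

Each step changes only one thing, so the reader can work out the pattern-action structure by comparing the programs, without having been told any formal syntax.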

At this point, there has been almost no discussion of syntax. The reader is expected to just accept the syntax and internalise it by copying and tweaking examples.

Complete Description

The second part, the complete description of awk, is a more systematic reference to the entire language, with both high-level descriptions and examples for most things. This is where the syntax is given in detail, as well as listings of special variables, arithmetic operations, built-in functions, control structures, evaluation rules and so on. (This is also where regular expressions are introduced. In the tutorial, text matching was limited to plain strings. The reader was encouraged to read ahead to learn about regexes, which “can be used to specify much more elaborate patterns”.)
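As a small illustration of that step from plain strings to regular expressions, consider the sketch below. The file names.txt and the names in it are made up for this example.

```shell
# Hypothetical input: one name per line.
cat > names.txt <<'EOF'
Mark
Mary
Martha
Omar
EOF

# Tutorial-level matching: print every line containing the plain string "ar".
awk '/ar/' names.txt
# Mark
# Mary
# Martha
# Omar

# With regular expressions: lines that are exactly "Mar" followed by k or y.
awk '/^Mar[ky]$/' names.txt
# Mark
# Mary
```

Both programs have the same form; only the pattern between the slashes grows more expressive, which is why the tutorial can defer the full regex syntax to the systematic description.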

Although this is a systematic description of the full language, it still follows a logical progression: It starts with the core fundamentals of the language and ends with advanced, slightly niche features. In the middle is stuff that becomes useful when writing larger programs.

Once the reader is done with this chapter, they know everything there is to know about awk. Everything else in the book is literally just various strategies to combine what the reader knows into useful programs.

Intermediate Techniques

The third part contains a bunch of realistic-looking examples, exploring ways to use the concepts taught in the previous two parts. The examples are usually chosen to highlight the obvious things most users will want to try.

Most examples are self-contained and full programs, which helps the reader get a sense for the entire task of writing an awk program. Essentially, it gives some ideas on how the authors intend the users to write code in awk.

Advanced Examples (in Selected Domains)

As a follow-up to the previous part, the authors also devote a few chapters to more complicated, domain-specific examples, and each example gets its own chapter. These chapters are split into sections such that, roughly, the end of each section is a point where you could leave off and have a functioning program.

I call these examples domain-specific, because they are. But that’s also slightly unfair, because they have selected very general domains: language processing, compilation, and relational data management. These are things that tend to be embedded into pretty much any larger system.

Reference Manual

Last, but not least, is an appendix which is nothing but a compact listing serving as a language definition. This part continues to be highly useful for a long time after the reader has learned awk.

Presentation and Typography

The presentation of the book is minimalistic in multiple senses. The vast majority of the book is constructed from only five types of text: regular paragraphs of prose, verbatim text (i.e. input, output, and code snippets), chapter headings, section headings, and subsection headings. Besides these, one can also find the rare table, simple line-based illustration (for things such as trees, automata, and memory layouts), and plots illustrating performance.

That’s it, basically. That’s very different to many modern books which have various informational boxes in more or less garish colours, and employ several different styles of text to convey the thoughts of the authors.

In particular, I want to highlight that there are only three levels of headings in total, including chapter headings. In order to pull that off, one really needs to be concise and think about what one wants to say and how to say it. (On this website, I have imposed on myself a rule to only use two levels of headings in each article, meaning a total of three levels when counting the article title, and I often find myself wishing for an additional level because it’d be the lazy way out.)

In terms of typography, the book keeps up the minimalistic style. The bulk of the text is set in a basic proportional serif typeface. Input, output, and code are set in a monospaced serif. Headings are set in a heavy sans-serif. Everything is black on white.

The margins are fairly small, and the paragraphs are justified to them. Curiously, the lines are noticeably longer than conventional typographic wisdom recommends. Based on my random sample, it’s not uncommon for a line to be just over 80 characters long. I’m not sure whether this is to make the book physically lighter, or if the authors aimed for a higher information density and the smaller book is just a side effect.

Conclusions

At least in my opinion, the most important lessons here are:

  • Be concise.
  • Use realistic examples.
  • Lead with a practical tutorial using only high-level explanations.
  • Don’t focus on details; let the user learn by doing.
  • Eliminate distracting elements from your presentation.

Let’s see how well I can make use of these principles myself!