The Reinforcing Nature of Toil
As usual when I discuss systems theory (e.g. information flow or material flow), this article pretends to be about one thing, but is really about a much more general concept. Let’s talk about reinforcing feedback loops!
What Is Toil?
I tell non-technical people that the site reliability engineer job is about creating automation to do what a system administrator would otherwise do. The automation does work that, in the words of the SRE book (Site Reliability Engineering: How Google Runs Production Systems; Murphy, Beyer, Jones, & Petoff; O’Reilly Media; 2016; available online),
tends to be […] repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.
This type of work, when performed manually, is known as toil. What’s insidious about toil is that it breeds itself. Here’s the basic idea graphically.
Active development can improve the quality of software, whereas the passage of time generally decreases its quality (see Lehman’s laws of software evolution, numbers 1, 2, and 7). Hopefully, these two effects are in rough balance, keeping the quality level constant. (I have been in many organisations where they are not. In those cases, development is usually insufficient to stave off the inevitable quality decline.) But this model is missing a critical component, namely the effect of quality on toil.
Increased quality decreases the need for toil, and increased toil decreases the time available for development.
A note on reading these feedback diagrams, which can be a little tricky: the sign on the arrow indicates whether two quantities move together or in opposite directions. Development and quality have a positive arrow, because when development increases, so does quality. Toil and development, on the other hand, have a negative arrow because when toil increases, development moves in the opposite direction and decreases.
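The sign convention can be made mechanical. The sketch below is not from the diagram itself: it encodes the three arrows described in the text as a hypothetical `LINKS` table and applies the usual systems-thinking rule that a loop is reinforcing when the product of its arrow signs is positive (an even number of negative arrows), and balancing otherwise. The names `LINKS`, `loop_polarity`, and `describe` are made up for illustration.

```python
# Signs on the causal links described in the text:
# +1 means the two quantities move together, -1 means they move oppositely.
LINKS = {
    ("development", "quality"): +1,
    ("quality", "toil"): -1,
    ("toil", "development"): -1,
}


def loop_polarity(cycle, links=LINKS):
    """Multiply the link signs around a closed cycle of variable names."""
    sign = 1
    # Pair each variable with its successor, wrapping around to close the loop.
    for src, dst in zip(cycle, cycle[1:] + cycle[:1]):
        sign *= links[(src, dst)]
    return sign


def describe(polarity):
    """Name the loop type for a given product of signs."""
    return "reinforcing" if polarity > 0 else "balancing"


cycle = ["development", "quality", "toil"]
print(describe(loop_polarity(cycle)))  # prints "reinforcing"
```

Two negative arrows cancel out, which is why a loop made mostly of "decreases" relationships still ends up amplifying whatever change enters it.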
Note what happens when we put these effects together. Follow carefully now, since the arrows flip signs a few times as we go along:
- The passage of time (evolution) decreases quality.
- Decreased quality increases toil.
- Increased toil decreases development.
- Decreased development decreases quality.
- Decreased quality increases toil.
- Increased toil decreases development.
- …
This is a reinforcing feedback loop!
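To see the steps above play out, here is a toy simulation. It is only a sketch under invented assumptions: quality is a score between 0 and 1, toil and development share a fixed time budget of 1.0, and the rates `dev_gain` and `decay` are arbitrary numbers chosen so that the balance point sits at quality 0.5. None of these values come from real measurements.

```python
def simulate(quality, steps, dev_gain=0.10, decay=0.05):
    """Step a toy version of the loop; every number here is an assumption.

    quality is a score in [0, 1]; toil and development are shares of a
    fixed time budget of 1.0.
    """
    history = []
    for _ in range(steps):
        toil = 1.0 - quality      # decreased quality increases toil
        development = 1.0 - toil  # increased toil decreases development
        # Development raises quality; the passage of time (decay) lowers it.
        quality += dev_gain * development - decay
        quality = min(1.0, max(0.0, quality))  # keep the score in range
        history.append((quality, toil, development))
    return history


# Start slightly below the balance point decay / dev_gain = 0.5.
for step, (q, toil, dev) in enumerate(simulate(0.45, steps=30), start=1):
    if step % 5 == 0:
        print(f"step {step:2d}: quality={q:.2f} toil={toil:.2f} development={dev:.2f}")
```

In this model a small initial shortfall in quality grows a little faster each step, with toil rising and development shrinking alongside it, until quality bottoms out.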
Reinforcing Feedback Loops
This might sound like a catastrophic scenario: things get worse, and then they get even worse, and eventually they will be infinitely worse. This, at least, seems to be people’s intuition about reinforcing feedback loops. And to be sure, the resting state of this feedback loop is that things get progressively worse.
But! We’re not bound to the resting state of the system! We can make interventions. Interventions are particularly powerful around reinforcing feedback loops, because reinforcing feedback loops reinforce both ways.
Here’s an example of something we can do:
Adopting stricter quality requirements during code review will have a positive effect on quality. This results in less toil, which gives us more time for development, which then improves quality further. If we can make the code review quality bar high enough to overpower the negative effects of evolution, then suddenly the feedback cycle spins the other way, and things will get better over time! By nudging the system in the direction we want, we get the reinforcing effect on our side. These types of interventions are highly levered, and when performed properly, they have a much greater effect than would be expected from the intervention alone.
Not modeled in the diagram above: stricter code review requirements might slow down changes. If you’re making this decision for real, it’s important to model these effects, in particular since a slower change rate might both reduce profits and actually reduce toil, too!
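The leverage claim can be illustrated with a toy model. This is a sketch under invented assumptions, not a calibrated model: quality is a score between 0 and 1, toil and development share a time budget of 1.0, and stricter code review is represented by a hypothetical `review_boost` added to quality each step. It ignores the slowdown effect mentioned in the note above. All rates are arbitrary.

```python
def run(quality, steps, review_boost=0.0, dev_gain=0.10, decay=0.05):
    """Toy loop with an optional per-step quality boost from stricter review."""
    for _ in range(steps):
        toil = 1.0 - quality      # lower quality means more toil
        development = 1.0 - toil  # more toil means less development
        quality += dev_gain * development + review_boost - decay
        quality = min(1.0, max(0.0, quality))  # keep the score in range
    return quality


START, STEPS, BOOST = 0.45, 20, 0.01

without = run(START, STEPS)
with_review = run(START, STEPS, review_boost=BOOST)

direct_effect = STEPS * BOOST         # what the boost adds with no feedback
total_effect = with_review - without  # what it adds once the loop amplifies it

print(f"without intervention: {without:.2f}")
print(f"with intervention:    {with_review:.2f}")
print(f"direct effect {direct_effect:.2f} vs total effect {total_effect:.2f}")
```

With these made-up numbers, the boost is much smaller than the decay it fights, yet it is enough to tip the starting point onto the improving side of the loop, and the gap between the two runs ends up larger than the sum of the boosts themselves.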
The reason we often miss the feedback effects on interventions is that the feedback cycles usually work on greater timescales than quarterly reports. We won’t get an appreciable difference in quality right away. That might take a few months. For that to propagate around the cycle and then improve quality further can take a year or more. Building systems for long-term improvement involves thinking about how things fit together and applying small efforts in the right places. We do not effect long-term improvements with quick fixes.
Comments
Jeff R:
Toil is something that my team discusses a lot. I like your observation that it is part of a feedback loop. For fun, I drew your diagram in Loopy ( https://ncase.me/loopy/ - not my project, just a neat tool). Start the evolution and watch the toil grow!