Lines of code are useful
The internet is full of people dismissing lines of code as a measurement. People say things like
Lines of code written has been firmly established over the decades as a largely meaningless metric.
and
Metrics and kpis are being based on stupid measurements like lines of code.
and
Lines of code is a dumb metric and anyone touting them for anything meaningful is disconnected from reality.
and
We’ve apparently collectively forgotten that lines of code is one of the worst metrics for measuring productivity.
and
I think obsession with lines of code is one of the most counterproductive practices.
and
Lines of code is renowned for being a very bad measure of anything
I find these statements strange, because they are not true.
Measuring complexity
Lines of code measure code complexity. That is well established. You don’t have to take my word for it:
- Basili and Hutchens (1981) analyse 19 programs, and find line count to correlate strongly with their own volume definition (+0.98) and cyclomatic complexity (+0.88). They also consider other complexity metrics like Halstead volume, flow graphs, nesting, etc. None of these complexity metrics predict the effort required to get a working program better than a simple line count.
- Revilla and van der Meulen (2007) analyse over 70,000 small C programs performing 59 different tasks, and determine that lines of code correlate very strongly with both Halstead volume (+0.82) and cyclomatic complexity (+0.78). They find that a line count predicts anything just as well as more complicated complexity measures.
- Herraiz and Hassan (2010) analyse over 200,000 C source code files from a real open source project (Arch Linux) and find line count correlates strongly with cyclomatic complexity (+0.72) and various Halstead metrics (+0.91).
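To get a feel for why these correlations keep showing up, here is a toy sketch (my own illustration, not any of the studies’ actual methodology): it estimates cyclomatic complexity by counting branching keywords in a few made-up C snippets, and correlates that against line count. Longer functions simply tend to contain more branches.

```python
# Toy illustration (not the studies' methodology): line count and a crude
# cyclomatic-complexity estimate move together even on a handful of snippets.
import re
from math import sqrt

def line_count(src: str) -> int:
    return sum(1 for line in src.splitlines() if line.strip())

def crude_cyclomatic(src: str) -> int:
    # McCabe's number is 1 + the number of decision points; approximate
    # decision points by counting branching keywords and operators.
    return 1 + len(re.findall(r"\b(?:if|for|while|case)\b|&&|\|\|", src))

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))

snippets = [  # hypothetical sample programs
    "int id(int x) { return x; }",
    "int abs(int x) {\n  if (x < 0) return -x;\n  return x;\n}",
    "int clamp(int x, int lo, int hi) {\n  if (x < lo) return lo;\n  if (x > hi) return hi;\n  return x;\n}",
    "int sum(int *a, int n) {\n  int s = 0;\n  for (int i = 0; i < n; i++)\n    s += a[i];\n  return s;\n}",
    "int grade(int s) {\n  if (s >= 90) return 4;\n  if (s >= 80) return 3;\n  if (s >= 70) return 2;\n  if (s >= 60) return 1;\n  return 0;\n}",
]

locs = [line_count(s) for s in snippets]
ccs = [crude_cyclomatic(s) for s in snippets]
print(pearson(locs, ccs))  # strongly positive, even on this tiny sample
```

The correlation here is around +0.8, in the same ballpark as the studies above, despite the estimator being as crude as it gets.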
Any time we have actually gone through the effort of testing a code complexity metric, it has produced the same predictions as (or worse ones than) a simple count of the number of lines of code. This does not mean better complexity metrics cannot exist, it just means we should be skeptical by default. Since all the complexity metrics that came before have turned out to be lines-of-code measurements in a trenchcoat, it’s likely the next one will be, too.
Basili and Hutchens phrase this particularly well:
Since the line count is very easy to calculate and many researchers have found that it does a credible job of measuring the complexity, it must be considered as the metric to beat in studies of this kind. We have failed to find a metric that is significantly better than line count.
Line count is not a meaningless measurement. It is the best way we know of to measure code complexity.
Complexity matters
Complexity, in turn, is what determines how expensive software is to build, maintain, and, to some degree, how useful it is. All else equal, more complex software (a) costs more, and (b) can perform more useful tasks.
It is important here to distinguish between what Fred Brooks called essential and accidental complexity.1 No Silver Bullet: Essence and Accident in Software Engineering; Brooks; Proceedings of the IFIP Tenth World Computing Conference; 1986.
Essential complexity
We can think of essential complexity as complexity that exists due to the problem to be solved.2 Brooks also mentions a few other forms of essential complexity we have to deal with in software, that have more to do with the invisible and fractal nature of software. These are not relevant when comparing the complexity of two different programs, because they apply equally to all software. Landing a rover on Mars is a complex problem, and the software that lands a rover on Mars cannot be any simpler than the complexity inherent in landing a rover on Mars. As Brooks puts it,
Much of the complexity the software engineer must master is arbitrary complexity, forced […] by the many […] systems to which his interfaces must conform. […] This cannot be simplified out by any redesign of the software alone.
It is impossible to remove essential complexity while retaining functionality. To reduce essential complexity, we have to remove features.
Accidental complexity
On the other hand, accidental complexity does not come from the problem to be solved. Instead, it is complexity introduced when translating that problem into software. Brooks never defines accidental complexity, but discusses how advances in tooling have removed some of it. For example, he says,
Abstract data types […] remove one more accidental difficulty from the process, allowing the designer to express the essence of his design without […] large amounts of syntactic material that add no new information content. The result is to allow a higher-order expression of design.
Accidental complexity, then, is complexity we put into the code which didn’t have to be there. Some accidental complexity is forced into the code because the programming language we use does not have the abstractions we need, and some we put in because we’re not all rockstar 10× developer ninjas.
Complexity means cost
This distinction is critical because
- essential complexity is what increases the value of software, whereas
- both essential and accidental complexity increase the cost of software.
For better or worse, lines of code measures only total complexity; a line count cannot distinguish between essential and accidental complexity. The upside of this is that lines of code corresponds closely to the cost of software: if e.g. Blender has more lines of code than nginx (which it does), we expect Blender to have been more expensive to develop (it was) – but also that the ongoing cost of its maintenance is higher (it is). See appendix A for some fun observations around this.
This relationship between lines of code and complexity holds both for the total size of a software project (the larger project is more expensive) and for changes to software size. A project that grows by so-and-so many lines of code per day also has its ongoing maintenance cost grow by so-and-so many minutes per day. For small changes, the effect is small. Over time, it adds up. Lines of code is how we measure this growth in maintenance cost.
Even without any concrete measurements, we can use this as a rule of thumb for planning purposes. If a team spends a quarter of its time on maintenance today, and we expect to grow the project to twice the size over the next year, then the pace of new development will decrease by a third, to account for the maintenance burden of the additional code.
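That arithmetic can be sketched as a two-line model, assuming maintenance load scales linearly with project size:

```python
# Sketch of the planning rule of thumb above: maintenance load is assumed
# to scale linearly with project size.
def new_dev_fraction(maintenance_fraction: float, size_multiplier: float) -> float:
    """Fraction of team time left for new development after the project grows."""
    return 1.0 - maintenance_fraction * size_multiplier

before = new_dev_fraction(0.25, 1.0)  # 0.75: three quarters on new development
after = new_dev_fraction(0.25, 2.0)   # 0.50: half, once the project has doubled
print((before - after) / before)      # 0.333...: new development slows by a third
```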
Measuring added value
At this point, we may be tempted to admit that Dijkstra was right all along when he wrote3 EWD 1036: On the cruelty of really teaching computing science; Dijkstra; 1988. Available online.
My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.
But it’s also not that simple. Lines written are lines spent, but some of them are also lines produced. Some of the additional complexity is accidental, and some of it is essential. The latter does indeed increase the value of the software.
If the ratio of essential complexity to accidental complexity is relatively constant in a project (which it seems reasonable to believe, at least in aggregate across medium timespans) then the line count is also a measurement weakly correlated with essential complexity, i.e. with the amount of value the software can provide.
This would be true also for individual productivity. If the ratio of essential complexity to accidental complexity is relatively stable for someone’s contributions one month to the next, then the number of lines of code they have added to a project is a proxy for how much additional value they have made the software provide. However, different types of work have different proportions of essential to accidental complexity, so it’s not exactly a safe assumption that the ratio is stable unless averaged over many contributions.4 Goodhart’s law is also clearly a problem: if an individual is aware that they are judged by lines of code, they are likely to start producing more accidental complexity, which increases cost without increasing value.
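The ratio argument can be simulated. In this sketch (with entirely made-up numbers), each contribution has an “essential” fraction drawn around 60%; while that fraction is stable, total line count tracks essential lines closely, and when it varies wildly, the correlation degrades:

```python
# Made-up simulation: line count is a good proxy for essential complexity
# only while the essential/accidental ratio stays stable.
import random
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))

def simulate(ratio_jitter: float, n: int = 1000) -> float:
    rng = random.Random(42)
    totals, essentials = [], []
    for _ in range(n):
        total = rng.randint(50, 500)              # lines in one contribution
        ratio = min(max(rng.gauss(0.6, ratio_jitter), 0.0), 1.0)
        totals.append(total)
        essentials.append(total * ratio)          # "essential" lines
    return pearson(totals, essentials)

stable = simulate(ratio_jitter=0.05)  # ratio roughly constant
noisy = simulate(ratio_jitter=0.35)   # ratio varies per contribution
print(stable, noisy)  # stable ratio gives a much stronger correlation
```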
This reasoning makes it sound like it is never good to remove code, because that is correlated with negative productivity. That’s not what I’m saying! Removing code makes software cheaper to maintain. However, there is an important difference between removing accidental complexity (a purely good thing) and removing essential complexity (sometimes good, but harder to evaluate).
Further complicating matters is the fact that not all essential complexity adds equal value. Some valuable features are fundamentally simple, yet other features may be complex and also not very desirable. This further weakens the correlation between line count and value.
Guidance
The above may make it sound like lines of code is a useless metric after all, but it’s important we separate the two ways we can use it as a metric:
- As a cost metric, lines of code is mostly fine, even considering Goodhart’s law. This is because lines of code is an almost perfect proxy for total – essential plus accidental – complexity.
- As a productivity metric, lines of code is difficult to use, because it is an imperfect proxy for essential complexity, which in turn is an imperfect proxy for customer value. However, if we can ensure the relationship between value and essential complexity is stable, and further that the ratio of essential to accidental complexity is stable, then lines of code is usable as a productivity metric too.5 Note, however, that “stable relationship between value and essential complexity” means your product organisation is not getting better at their jobs. You might not want to aim for that.
There, slightly more nuance than the quotes that opened this article.
Appendix A: The cost of open source maintenance
We used Blender and nginx as examples in the article. Blender has 2.8 million lines of code, and nginx has 250 thousand lines of code. It seems reasonable to guess that Blender costs more to maintain than nginx, and this is also true, at least in one way. As a proxy of maintenance cost, we’ll use the number of maintainers, defined as the number of authors that have made more than five commits in the past six months. Here’s the data for a smattering of projects popular on GitHub.
| Project | Line count | Maintainers |
|---|---|---|
| Rust | 3,800,000 | 229 |
| Blender | 2,800,000 | 71 |
| Kubernetes | 2,300,000 | 89 |
| VSCode | 2,000,000 | 81 |
| Node.js | 1,300,000 | 39 |
| PowerToys | 760,000 | 15 |
| React | 550,000 | 10 |
| NeoVim | 410,000 | 25 |
| Transmission | 300,000 | 8 |
| nginx | 250,000 | 4 |
| yt-dlp | 240,000 | 6 |
| Redis | 230,000 | 12 |
| Audacity | 180,000 | 9 |
| Excalidraw | 160,000 | 4 |
| tmux | 100,000 | 3 |
| Fish | 90,000 | 10 |
| htop | 49,000 | 4 |
| i3 | 29,000 | 3 |
| scc | 24,000 | 3 |
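One quick way to summarise the table is a least-squares fit through the origin (a sketch, with the data typed in from the table above), which lands at roughly one maintainer per 23,000 lines of code:

```python
# Least-squares fit through the origin: maintainers ≈ slope × line count.
projects = {  # (line count, maintainers), from the table above
    "Rust": (3_800_000, 229), "Blender": (2_800_000, 71),
    "Kubernetes": (2_300_000, 89), "VSCode": (2_000_000, 81),
    "Node.js": (1_300_000, 39), "PowerToys": (760_000, 15),
    "React": (550_000, 10), "NeoVim": (410_000, 25),
    "Transmission": (300_000, 8), "nginx": (250_000, 4),
    "yt-dlp": (240_000, 6), "Redis": (230_000, 12),
    "Audacity": (180_000, 9), "Excalidraw": (160_000, 4),
    "tmux": (100_000, 3), "Fish": (90_000, 10),
    "htop": (49_000, 4), "i3": (29_000, 3), "scc": (24_000, 3),
}
slope = (sum(loc * m for loc, m in projects.values())
         / sum(loc ** 2 for loc, _ in projects.values()))
print(1 / slope)  # lines of code per maintainer: roughly 23,000
```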
As a very rough guide, this data hints that each 25 thousand lines of open source requires one more maintainer.6 If we look closer, we’ll see the maintainer count seems to be a quadratic function of code size, meaning the cost of a line of code becomes greater when the total code size is greater. That makes sense – interactions between components scale quadratically with the number of components because in n things, there are roughly n² pairs of things. Let’s say these maintainers spend on average an hour a day on maintenance.7 Some of these projects are spare time efforts, others have corporate sponsors. The hour-a-day figure comes out of my arse. If you think it’s wrong, you can repeat these calculations with a number you agree with more. That means
- Each 200,000 lines of code needs a full workday of maintenance per day. That lets us estimate a limit for how large a project a single person can maintain, while not having time for any new development.
- If the fully-loaded hourly cost of a developer is $50 (this may be on the cheap end depending on where you live) then every 100,000 lines of code costs $200 per day in maintenance alone.
- If a team of eight people wants to spend at most 1/5 of its time maintaining a project, then their project can only expand to 320,000 lines of code before they need to hire one more person.
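The three bullet points above follow mechanically from the stated assumptions (one maintainer per 25,000 lines, one hour of maintenance per maintainer per day, $50 per hour, eight-hour days):

```python
# Reproducing the three estimates above from the stated assumptions.
LOC_PER_MAINTAINER = 25_000  # from the rough guide above
HOURS_PER_MAINTAINER = 1     # maintenance hours per maintainer per day

def maintenance_hours_per_day(loc: int) -> float:
    return loc / LOC_PER_MAINTAINER * HOURS_PER_MAINTAINER

print(maintenance_hours_per_day(200_000))       # 8.0: a full workday
print(maintenance_hours_per_day(100_000) * 50)  # 200.0: dollars per day at $50/hour

# A team of eight, spending at most 1/5 of its eight-hour days on maintenance:
budget_hours = 8 * 8 / 5                        # 12.8 maintenance hours per day
print(budget_hours / HOURS_PER_MAINTAINER * LOC_PER_MAINTAINER)  # ≈ 320,000 lines
```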
Lots of assumptions going into those specific numbers, so they’re probably not universally true. But still interesting to think about, and certainly close enough to my professional experience to be useful rules of thumb.
Appendix B: do we need to make functions smaller?
Basili and Hutchens made another interesting observation, although they admit their sample is too small to generalise from. As a proxy for how difficult a program is to write, they measured the number of changes that were necessary to get the program to successfully fulfill its specification. They tried to model this based on the size of individual components of the resulting program, testing a linear fit against an exponential fit on high percentiles of component complexity in the program.
Going by classic advice from the likes of Uncle Bob, who suggests functions must be short lest maintainability suffer8 Clean Code: A Handbook of Agile Software Craftsmanship; Martin; Pearson; 2008., we would expect the exponential fit to be best. That corresponds to the hypothesis that individual complex components cause an outsized maintenance demand.
But Basili and Hutchens found the linear fit to be better! In their data, one complex component doesn’t cause more maintenance demand than five components, each a fifth of the size. This agrees more with the advice of the likes of John Carmack, who suggests that inlining subprograms results in greater clarity.
Again, their sample is too small to generalise from. I don’t know if anyone has replicated the result with more varied data. But that’s some great science!
Comments
: Derek Jones
The purpose of the Halstead/McCabe metrics is to provide a complicated formula for managers to impress the people they need to impress. Halstead metrics have a mathematical problem.
Appendix A in your article ignores the major influence that fashion plays in maintenance. Rust has so many more maintainers because it is fashionable.
Some analysis for an optimal function length relies on sloppy analysis.