Quick Variance Computation
Here’s a neat trick I’ve never seen explained other than in one niche statistics book, but it feels like the sort of thing basically everyone else knows implicitly.
If you have a sample and want its mean and variance, and you only have a plain desktop calculator at hand, there is a quick way to compute both. It involves first computing the sum of the observations and the sum of their squares. Let’s look at an example.
We may, for example, have recorded how many GB of memory one of our services is using each day. The numbers we recorded are 28.1, 29.0, 29.6, 29.1, 28.6, 27.9, 29.0, 28.1, 27.7, and 29.2.
We have already used statistical process control to establish that this level of memory usage is stable. Now we want to do traditional statistics on it, which means we need the variance. We put the numbers – and their squares – in a table, and then we sum down both columns.
x | x² |
---|---|
28.1 | 789.61 |
29.0 | 841.00 |
29.6 | 876.16 |
29.1 | 846.81 |
28.6 | 817.96 |
27.9 | 778.41 |
29.0 | 841.00 |
28.1 | 789.61 |
27.7 | 767.29 |
29.2 | 852.64 |
a = Σx = 286.3 | b = Σx² = 8200.49 |
Let’s call the sum \(a\) and the sum of squares \(b\). Keep every digit of both: the formula below subtracts two numbers that are nearly equal, so rounding them to three significant figures would wipe out most of the answer. The mean only involves the sum:
\[\bar{x} = \frac{a}{n} = \frac{286.3}{10} = 28.63\]
The variance is, and this is the trick:
\[s^2 = \frac{1}{n-1} \left(b - \frac{a^2}{n}\right)\]
\[= \frac{1}{9} \left( 8200.49 - \frac{286.3^2}{10} \right)\]
\[= \frac{1}{9} \left( 8200.49 - 8196.769 \right)\]
\[= \frac{3.721}{9} \approx 0.413\]
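For anyone curious why this works, here is a short sketch of the algebra. It writes \(\bar{x} = a/n\) for the sample mean and expands the usual definition of the sample variance; the index \(i\) and the symbol \(\bar{x}\) are notation introduced here just for the sketch.
\[\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + n\bar{x}^2 = b - 2 \cdot \frac{a}{n} \cdot a + n \cdot \frac{a^2}{n^2} = b - \frac{a^2}{n}\]
Dividing both ends by \(n-1\) gives the formula for \(s^2\).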
It looks complicated but it’s not! It’s just a little tricky to remember. But importantly, it’s much easier on a desk calculator than working from the definition of variance, which means subtracting the mean from every observation before squaring.
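If you want to check your calculator work, the same two-sum recipe is easy to sketch in Python. This is only an illustration: the function name `mean_and_variance` is made up for this example, and the cross-check uses `statistics.variance` from the standard library. Run on the ten unrounded readings, it should give a mean of 28.63 and a variance of about 0.413. Note that \(b\) and \(a^2/n\) agree in their leading digits, so rounding either of them before subtracting changes the answer a lot.

```python
import statistics


def mean_and_variance(xs):
    """Return (mean, sample variance) using only the two sums a and b."""
    n = len(xs)
    a = sum(xs)                 # a: sum of the observations
    b = sum(x * x for x in xs)  # b: sum of the squared observations
    mean = a / n
    variance = (b - a * a / n) / (n - 1)
    return mean, variance


# The memory readings from the example, in GB.
readings = [28.1, 29.0, 29.6, 29.1, 28.6, 27.9, 29.0, 28.1, 27.7, 29.2]

mean, variance = mean_and_variance(readings)
print(f"mean = {mean:.2f}")          # mean = 28.63
print(f"variance = {variance:.4f}")  # variance = 0.4134

# Cross-check against the definition-based sample variance.
assert abs(variance - statistics.variance(readings)) < 1e-9
```

The shortcut is convenient by hand, but because it subtracts two large, nearly equal numbers, it is worth carrying full precision all the way to the final division.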