Entropic Thoughts

Student Resampled His t Distribution

I was reading William Sealy Gosset’s paper in which he introduced the t distribution for small-sample hypothesis testing (The Probable Error of a Mean; Gosset, a.k.a. Student; Biometrika, 1908).

In it he derives the distribution analytically, but also adds that

Before I had succeeded in solving my problem analytically, I had endeavoured to do so empirically. The material used was a correlation table containing the height and left middle finger measurements of 3000 criminals, from a paper by W. R. Macdonell.

The measurements were written out on 3000 pieces of cardboard, which were then very thoroughly shuffled and drawn at random. As each card was drawn its numbers were written down in a book, which thus contains the measurements of 3000 criminals in a random order.

Finally, each consecutive set of 4 was taken as a sample – 750 in all – and the mean, standard deviation, and correlation of each sample determined. The difference between the mean of each sample and the mean of the population was then divided by the standard deviation of the sample, giving us the z statistic.

First off, I’m impressed people back in the day managed to do any statistics at all, when their idea of resampling from a table was “write out each row on 3000 pieces of cardboard, shuffle thoroughly, draw one at a time and write into a table again.” But also, isn’t that a little inspiring? Student – a successful, working statistician – wasn’t immediately able to solve his problem analytically, so he took to writing things out on cardboard and shuffling! It’s okay to not get things right away and have to experiment first.


Additionally, it shows that resampling doesn’t require modern computers; they just allow us to decrease the error bars on sensitive measurements by running more iterations in our lifetime. Given enough resources, one could hypothetically hire a lot of humans to do the cardboard dance and within just days get the same result a computer would in milliseconds.
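
Here is roughly what that millisecond version looks like: a minimal sketch in Python, with one loud assumption – Macdonell’s 3000 measurements aren’t reproduced here, so a synthetic normal population with arbitrary placeholder mean and spread stands in for the heights.

```python
import numpy as np

rng = np.random.default_rng(1908)

# Stand-in for Macdonell's correlation table: 3000 synthetic "heights"
# drawn from a normal distribution with arbitrary placeholder parameters.
population = rng.normal(loc=166.0, scale=6.5, size=3000)
population_mean = population.mean()

# "Very thoroughly shuffled and drawn at random": put the records in a
# random order, then take each consecutive set of 4 as a sample (750 in all).
samples = rng.permutation(population).reshape(750, 4)

# Divide (sample mean - population mean) by the sample standard deviation.
# Student computed the standard deviation with divisor n, hence ddof=0.
z = (samples.mean(axis=1) - population_mean) / samples.std(axis=1, ddof=0)

# Student's z relates to the modern statistic by t = z * sqrt(n - 1), so
# scaled by sqrt(3) these 750 values should follow a t distribution with
# 3 degrees of freedom.
t = z * np.sqrt(3)
print(f"{len(t)} resampled values, first few t: {t[:5].round(2)}")
```

Whether done with numpy or with cardboard, the arithmetic is the same; the computer just gets through the 750 samples rather faster.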

It’s not that modern computers can do things we couldn’t before. Modern technological development has one effect: it relaxes the size, energy, and reliability constraints on computing. Fast resampling in the 1800s would have required the energy and space to feed a small city of humans. In the early 1900s, it would have taken a building and industrial power to run a large number of vacuum tubes. In the 1970s, a few cabinets of transistors and a domestic power supply could probably have done it. Today, a very small battery can run that computation on a device that fits in my pocket.

What will be the next leap forward in the thermal, size, and reliability constraints of computing? Is the silicon semiconductor the end of the line?