Entropic Thoughts

The Sample Size Curve


I’m writing an article on correlations (much more interesting than I had thought!). One of the things I have learned is that the coefficient of determination, i.e. the share of variance explained, captures the idea of effect size in a way that is particularly intuitive to estimate, because it is additive. This makes it a useful tool for estimating the sample size needed for an experiment.
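Additivity here means that, as long as the causes are independent of each other, the shares of variance they explain add up. A small simulation sketch illustrates this. It is not from the original article, and the variances 3, 1 and 4 are arbitrary choices: the total variance of y is 8, so the two causes should explain 3/8 and 1/8 of it separately, and 4/8 together.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
n = 100_000
# Made-up setup: two independent causes (variances 3 and 1) plus noise (variance 4).
x1 = [random.gauss(0, 3 ** 0.5) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 2) for _ in range(n)]
y = [a + b + e for a, b, e in zip(x1, x2, noise)]

# Coefficient of determination of each cause on its own ...
r2_x1 = pearson_r(x1, y) ** 2   # expected near 3/8 = 0.375
r2_x2 = pearson_r(x2, y) ** 2   # expected near 1/8 = 0.125
# ... and of both together (x1 + x2 works as the combined predictor here
# only because both causes enter y with coefficient 1).
r2_both = pearson_r([a + b for a, b in zip(x1, x2)], y) ** 2  # expected near 0.5

print(f"{r2_x1:.3f} + {r2_x2:.3f} = {r2_x1 + r2_x2:.3f}, together: {r2_both:.3f}")
```

If the causes were correlated with each other, the separate shares would overlap and no longer add up this neatly.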

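[Figure (sasi-curve-01.svg): required sample size against coefficient of determination, one curve per significance level]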

To find out the sample size necessary for an experiment, we

  1. Estimate the coefficient of determination for the effect we want to investigate (we’ll learn how this works in that future article on correlations);
  2. Pick a significance level (for quick and dirty experiments, I often use the second-lowest curve, which corresponds to a one-sided p=0.05 and a two-sided p=0.1); and then
  3. Read off the necessary sample size on the left.
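The three steps above can also be approximated numerically instead of read off a chart. This is a minimal Python sketch using the transformation from the derivation notes below; the function name `sample_size` and its `power` parameter are hypothetical. The criterion is an assumption: the required n is where the expected transformed correlation clears the critical value, with `power=0.5` meaning no extra margin. The original curves may have been built with a different power target, so the numbers need not match them exactly.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

def sample_size(r_squared: float, alpha: float = 0.05,
                one_sided: bool = True, power: float = 0.5) -> int:
    """Hypothetical helper: approximate n needed to detect an effect with the
    given coefficient of determination (0 < r_squared < 1) at level alpha.

    Assumes the required n is where the expected Fisher-transformed
    correlation clears the critical value by a margin set by `power`
    (power=0.5 means no margin: the expected z_r sits at the critical value)."""
    z_r = atanh(sqrt(r_squared))              # R^2 -> r -> Fisher z
    tail = alpha if one_sided else alpha / 2
    z_alpha = NormalDist().inv_cdf(1 - tail)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)      # extra margin for the power target
    # Solve z_r = (z_alpha + z_beta) / sqrt(n - 3) for n and round up.
    return ceil(3 + ((z_alpha + z_beta) / z_r) ** 2)

print(sample_size(0.25))             # R^2 = 25 %, one-sided p = 0.05 -> 12
print(sample_size(0.25, power=0.8))  # same, but aiming for 80 % power -> 24
```

A two-sided test at `alpha=0.1` gives the same answer as a one-sided test at `alpha=0.05`, matching the correspondence between curves mentioned in step 2.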

If this number is bigger than our budget for this particular experiment, we’ll have to find something else to investigate.

If we are investigating effects with smaller coefficients of determination, we’ll have to rely on a separate plot, because the sample sizes quickly become large.

[Figure (sasi-curve-02.svg): the same curves for smaller coefficients of determination]

I don’t think this second graph should be used as often. Once we’re investigating effects with coefficients of determination in the single-digit percentages, we have to be really sure that the effect (a) even makes theoretical sense, and (b) is valuable enough to make the effort worthwhile. There is usually lower-hanging fruit around.
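A back-of-the-envelope sketch shows why the sample sizes grow so fast. It assumes (not necessarily how the plots were made) that the required \(n\) is the one at which the transformed correlation \(z_r\) from the derivation notes below equals a critical value \(z_\text{crit}\) times its standard error \(1/\sqrt{n-3}\). For small \(r\), \(z_r \approx r\), so \(z_r^2 \approx r^2 = R^2\), where \(R^2\) is the coefficient of determination:

\[z_r = \frac{z_\text{crit}}{\sqrt{n-3}} \quad\Longrightarrow\quad n = 3 + \left(\frac{z_\text{crit}}{z_r}\right)^2 \approx 3 + \frac{z_\text{crit}^2}{R^2}\]

The required sample size is thus roughly inversely proportional to the coefficient of determination: halving \(R^2\) about doubles \(n\). With a one-sided \(p=0.05\) (\(z_\text{crit} \approx 1.645\), so \(z_\text{crit}^2 \approx 2.7\)), an \(R^2\) of 1 % already calls for roughly 270 observations under this assumption.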

Derivation notes

In the future correlations article we will learn the following transformation of the correlation coefficient (Fisher’s z-transformation, which is half the logit of \((1+r)/2\); it captures the idea that a difference in correlation between 0.95 and 0.98 is a bigger step than it looks, bigger even than e.g. a difference between 0.2 and 0.3):

\[z_r = 0.5 \log{\frac{1+r}{1-r}}\]

This value has an approximately normal sampling distribution with a standard deviation (its standard error) of

\[\mathrm{se}_{z_r} = \frac{1}{\sqrt{n-3}}\]

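regardless of the size of \(r\). The curves above are constructed by running these equations backward, solving for the sample size \(n\) needed to detect a given coefficient of determination at a given significance level.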
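The claim that the standard error does not depend on \(r\) is easy to check by simulation. This Python sketch is not from the original article: it assumes bivariate normal data (the setting in which the approximation is usually stated), a hypothetical sample size of n = 30, and three arbitrary true correlations. If the approximation holds, the spread of \(z_r\) should be close to \(1/\sqrt{27} \approx 0.192\) in all three cases. The last lines compare the 0.95/0.98 and 0.2/0.3 examples in transformed units.

```python
import random
from math import atanh, log, sqrt
from statistics import stdev

def fisher_z(r: float) -> float:
    """The transformation above: 0.5 * log((1 + r) / (1 - r)), equal to atanh(r)."""
    return 0.5 * log((1 + r) / (1 - r))

def sample_r(rho: float, n: int, rng: random.Random) -> float:
    """Pearson correlation of n draws from a standard bivariate normal
    distribution whose true correlation is rho."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    # y = rho * x + independent noise gives corr(x, y) = rho with unit variance.
    ys = [rho * x + sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(42)
n, reps = 30, 2000  # hypothetical sample size and number of repetitions
spread = {}
for rho in (0.0, 0.5, 0.9):
    zs = [fisher_z(sample_r(rho, n, rng)) for _ in range(reps)]
    spread[rho] = stdev(zs)
    print(f"rho={rho}: sd of z_r = {spread[rho]:.3f}, 1/sqrt(n-3) = {1 / sqrt(n - 3):.3f}")

print(f"fisher_z(0.5) = {fisher_z(0.5):.6f}, atanh(0.5) = {atanh(0.5):.6f}")

# The transformation stretches the scale near 1: a gap of 0.03 in r near 0.95
# turns into a larger gap in z_r than a gap of 0.1 in r near 0.2.
gap_high = fisher_z(0.98) - fisher_z(0.95)
gap_low = fisher_z(0.3) - fisher_z(0.2)
print(f"z(0.98) - z(0.95) = {gap_high:.3f}, z(0.3) - z(0.2) = {gap_low:.3f}")
```

Since the standard error is the same everywhere on the transformed scale, a larger gap in \(z_r\) means fewer observations are needed to tell the two correlations apart.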