Sensitivity Counts Against You
Whenever you update your belief due to evidence, the strength of that evidence comes from a low false positive rate, not a high sensitivity.
This is counter-intuitive, because we would like to think the strength of evidence comes from its ability to call out the truth, not from how well it avoids lying.
It also tells us how to optimise our processes for finding out the truth: if the sensitivity is reasonable, focus on reducing the false positive rate.
Bayes’ rule in its (log-)odds form
You probably know about Bayes’ rule.
You might not know about its odds form, but you absolutely should. It says the same thing, except in a more practical way, and I use it all the time. If you haven’t read my previous article about it, read that now for the necessary prerequisites to this one.
In that article, I mentioned parenthetically that interesting things happen when you start talking about log-odds instead of odds. Then Bayes’ rule takes the shape of
posterior log-odds = prior log-odds + strength of evidence
That’s not a figurative plus sign, that’s an actual arithmetic sum. You simply add the strength of your evidence to the prior and you get the posterior. In log-odds scale, evidence is linear (!). That’s a philosophically (and practically) pleasing way to look at probabilistic reasoning.
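To make that concrete, here is a minimal Python sketch of the additive update. The prior of 30 % and the evidence worth +1.0 on the log-odds scale are made-up numbers, chosen only for illustration:

```python
import math

def prob_to_logodds(p):
    """Convert a probability to natural log-odds."""
    return math.log(p / (1 - p))

def logodds_to_prob(lo):
    """Convert natural log-odds back to a probability."""
    return 1 / (1 + math.exp(-lo))

prior = prob_to_logodds(0.30)   # an arbitrary prior belief of 30 %
evidence = 1.0                  # some evidence worth +1.0 on the log-odds scale
posterior = prior + evidence    # the update really is just addition

print(logodds_to_prob(posterior))  # about 0.54: the belief rose from 30 % to 54 %
```

The two helper functions only move between probabilities and log-odds; the update itself is the single addition in the middle.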
Strength of Evidence
How do we measure the strength of evidence? Going back to Bayes’ rule in odds form, you may guess that the strength of evidence is the logarithm of its odds ratio, and you’d be correct. This is called the log-odds difference, because we’re no longer multiplying things – the strength of the evidence is simply the difference between the prior and posterior log-odds.
The log-odds difference can be computed in two ways:
- Either we compute the odds ratio (sensitivity divided by false positive rate) and take the logarithm of that; or
- We compute the logarithms of the sensitivity and the false positive rate separately, and subtract the log-fpr from the log-sensitivity.
These two methods give the exact same number – use whichever is easier.
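As a quick sanity check, here is a small Python sketch of both routes, using a sensitivity of 80 % and a false positive rate of 20 % – the same witness numbers that show up in the taxi cab example below:

```python
import math

sensitivity = 0.80          # P(witness says "blue" | cab is blue)
false_positive_rate = 0.20  # P(witness says "blue" | cab is green)

# Route 1: take the logarithm of the odds ratio.
strength_1 = math.log(sensitivity / false_positive_rate)

# Route 2: take the logarithms separately and subtract.
strength_2 = math.log(sensitivity) - math.log(false_positive_rate)

print(strength_1, strength_2)  # both are about 1.39 -- the same number
```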
Decomposing Evidence
If we take the second route, though, something strange happens in the arithmetic. In the taxi cab problem from the previous article, the prior log-odds in favour of the cab being blue is -1.8, corresponding roughly to a probability of 15 %. The log-sensitivity of the witness is -0.2, and the log-fpr is -1.6, corresponding to 80 % and 20 % respectively. From these two, we could compute the strength of the evidence and add it to the prior. If we do, we find that the evidence has a log-odds difference of 1.4, which substantially increases our belief from -1.8 log-odds to -0.4. Note that -0.4 log-odds corresponds to roughly a 40 % probability – remarkably close to the exact value of 41 %.
But we can also not do that, and instead leave the two components of evidence strength as separate numbers, and compute the entire update as
posterior log-odds = prior log-odds + log-sensitivity - log-fpr
or, numerically,
\[-1.8 - 0.2 + 1.6 = -0.4.\]
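For the record, here is the same taxi cab computation in Python, with the components kept separate as above. Using unrounded intermediate values, it lands directly on the exact 41 %:

```python
import math

def prob_to_logodds(p):
    return math.log(p / (1 - p))

def logodds_to_prob(lo):
    return 1 / (1 + math.exp(-lo))

prior = prob_to_logodds(0.15)     # roughly -1.8
log_sensitivity = math.log(0.80)  # roughly -0.2
log_fpr = math.log(0.20)          # roughly -1.6

# posterior log-odds = prior log-odds + log-sensitivity - log-fpr
posterior = prior + log_sensitivity - log_fpr   # roughly -0.4

print(round(logodds_to_prob(posterior), 2))     # 0.41
```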
Now, if you watched closely as the signs changed, you saw the strange thing happen. The logarithms of the sensitivity and the false positive rate are both always negative (or, at best, zero). By adding the log-sensitivity and subtracting the log-fpr, we are effectively treating the sensitivity as an argument against the observation and, conversely, the false positive rate as an argument in favour of it.
But wait! Why would someone’s ability to call out the truth (sensitivity) be considered an argument against their observation?
You could call this an arithmetic mirage and be done with it (it just falls out of the combination of the definitions of the two concepts and the log transform). In fact, it would be easy to criticise this entire premise by saying “Well, duh. If you take something good and transform the scale so it becomes negative, then obviously it will appear to be bad.”
The Paradox of Sensitivity
Still, it spooked and surprised me enough that I had to think a little more about it. And if we subscribe to the premise that log-odds is the natural measure of belief, and the log-odds difference the natural measure of evidence strength, then we also have to admit that sensitivity is, in some sense, a bad thing. One way to resolve this is to ask, “Bad compared to what?” Because then we find out that the baseline for sensitivity in the log-odds difference is 0, i.e. a sensitivity of 100 %. In other words, to the extent that sensitivity has an effect on our belief, it’s specifically in how much worse it is than perfect sensitivity. (This mirrors the arithmetic mirage argument, but is phrased more intuitively.)
Yet there’s still something weird going on. Imagine you paraded ten green and ten blue cars in front of me, and I accidentally called two of the green cars out as blue. You have no information on what I did when I saw blue cars. That gives me a sensitivity of 80 % when detecting green cars. Now you have one green and one blue car, and you put one at random in front of me and I shout “Green!”
How likely should an observer think it is that the car is indeed green? 50 %? More? Less? All else equal, remember, my sensitivity – any sensitivity other than 100 % – is an argument against the observation. This means you would assign less than a 50 % probability of the car being green, based on me saying it is green, when you know I correctly identify “only” 80 % of green cars as such.
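To put a hypothetical number on that “less than 50 %”: here is a sketch of the update if we take the all-else-equal reading literally and treat the unknown false positive rate as 100 % (which, as noted below, is what accounting for only the sensitivity amounts to):

```python
import math

prior = math.log(0.5 / 0.5)       # one green and one blue car: even odds, log-odds 0
log_sensitivity = math.log(0.80)  # I call out 80 % of green cars as green
log_fpr = math.log(1.00)          # "all else equal": only the sensitivity is accounted for

posterior = prior + log_sensitivity - log_fpr
print(1 / (1 + math.exp(-posterior)))  # about 0.44 -- less than 50 %
```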
I suspect where I get hung up is that all else equal part. I’m not used to looking at evidence as decomposed into sensitivity and false positive rate separately. Almost always in real life, when you vary the sensitivity, you also get a different false positive rate. In fact, the two co-vary so often that a type of plot was invented to visualise their joint effect: the receiver operating characteristic curve.
The strength of evidence is composed of precisely two things: sensitivity and false positive rate. As much as the log-odds form of Bayes’ rule makes it seem that way, it is not a meaningful operation to account for just the sensitivity of an observation and not the false positive rate, or vice versa. Or rather, if you account for just the sensitivity, that implies a 100 % false positive rate – hardly a plausible situation.
Uh… Summary, I Guess?
This article is a bit confused and you may leave with more questions than you came with, but the main thrust is that sensitivity is, arithmetically speaking, an opportunity for evidence to be wrong, and as such, it counts against the evidence. The false positive rate, on the other hand, is where evidence proves itself correct, meaning it counts in favour of the evidence.
The power of the false positive rate can also be seen in the taxi cab problem. The log-sensitivity of -0.2 is a weak argument against the observation, but what really gives us faith in the observation is the log-fpr of -1.6, which contributes +1.6 to the update. That’s where the strength of our evidence came from. Not from the sensitivity, which is just an opportunity to mess up.
Indeed, the log-sensitivity can at best be 0, which corresponds to perfect sensitivity. The log of the false positive rate, however, can go all the way down to negative infinity in the best case.
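Stated in symbols, the two components of the strength of evidence are bounded as

\[\log(\text{sensitivity}) \in (-\infty,\, 0], \qquad -\log(\text{false positive rate}) \in [0,\, \infty).\]

The first term can never be positive, so all of the positive strength of evidence has to come from the second.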
Sensitivity itself can never give our evidence much strength; only a low false positive rate does. The reason we still care about sensitivity is that poor sensitivity can certainly spoil the strength of our evidence.