Bayes’ Rule in Odds Form
Using Bayes’ rule in its commonly stated form requires arithmetic and probability estimation that can be difficult to perform mentally, on the fly. Stating Bayes’ rule in its lesser-known odds form makes it significantly more usable for quick mental calculations.
Summary
To state it extremely briefly,
\[o(H|E) = \frac{p(E|H)}{p(E|\neg H)} o(H).\]
Here, \(o(H)\) represents your prior odds in favour of hypothesis \(H\). The result of the application is \(o(H|E)\), which represents your updated odds in favour of hypothesis \(H\) after having considered evidence \(E\).
To go from prior odds \(o(H)\) to posterior odds \(o(H|E)\) you multiply the prior with the odds ratio of the evidence,
\[\frac{p(E|H)}{p(E|\neg H)}.\]
Computing this odds ratio is the most difficult part of applying the odds form of Bayes’ rule, since it requires estimating the probability of seeing evidence \(E\) both when hypothesis \(H\) is true and when it is false, and then dividing the two numbers. However, we can usually work with rough estimates that make this computation easier.
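To make the recipe concrete, here is a minimal Python sketch of the update rule. The function name and parameter names are mine, invented for illustration:

```python
def posterior_odds(prior_odds, p_e_given_h, p_e_given_not_h):
    """Odds-form Bayes update: o(H|E) = p(E|H)/p(E|~H) * o(H).

    prior_odds      -- o(H), odds in favour of H before the evidence
    p_e_given_h     -- p(E|H), chance of the evidence if H is true
    p_e_given_not_h -- p(E|~H), chance of the evidence if H is false
    """
    odds_ratio = p_e_given_h / p_e_given_not_h
    return odds_ratio * prior_odds
```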
Example
Let’s take the classic taxi cab problem: a taxi cab was involved in a hit-and-run at night. Two cab companies operate in the city: blue and green. Which one is at fault?
Prior Odds
In this city, there are clearly more green cabs than blue ones – you see roughly one blue cab for every five green ones. The prior odds, in other words, are 1:5 (pronounced “one to five”) in favour of the blue company being involved.
Odds Ratio
However, an eyewitness identified the cab as blue. Court tests of the witness’s reliability under nighttime conditions indicate a sensitivity of 80 % and a false positive rate of 20 %. This translates to an odds ratio of \(0.8/0.2 = 0.8 \times 5 = 4\).
Posterior Odds
Computing the posterior odds is now trivial: the prior odds were 1:5 and the odds ratio is 4, so when we multiply we get posterior odds of 4:5, which means that after the witness testifies, we think there’s just under an even chance that the blue company was involved rather than the green one.
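As a sanity check, here is the same update as a tiny Python sketch (the variable names are mine, not part of the problem):

```python
# Prior odds 1:5 in favour of blue; witness odds ratio 0.8/0.2 = 4.
odds = (0.8 / 0.2) * (1 / 5)     # 0.8, i.e. posterior odds of 4:5
probability = odds / (1 + odds)  # 4/9, about 0.44: just under even
print(odds, probability)
```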
Bayes’ Rule, Common Form
This problem is usually presented with green cabs being 85 % of the cabs in the city. If we try to apply Bayes’ rule the way it’s normally phrased, we start with
\[p(H|E) = \frac{p(E|H)p(H)}{p(E)}.\]
We can compute \(p(E) = 0.8 \times 0.15 + 0.2 \times 0.85 = 0.29\) and then plug in the numbers
\[p(H|E) = \frac{0.8 \times 0.15}{0.29} \approx 0.41.\]
In other words, in this more precise version of the computation we see that the probability of it being a blue cab is 41 %. For most practical problems, our quicker approximation of “just under an even chance” will be fine.
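For comparison, a sketch of the exact common-form computation in Python, using the 15 %/85 % split from the statement above:

```python
# Bayes' rule in its common form, with the exact 15 %/85 % split.
p_h = 0.15             # p(H): fraction of blue cabs
p_e_given_h = 0.8      # sensitivity of the witness
p_e_given_not_h = 0.2  # false positive rate of the witness

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)  # 0.29
p_h_given_e = p_e_given_h * p_h / p_e                  # ~0.41
print(p_h_given_e)
```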
Bayesian Hypothesis Testing
The odds form helps with mentally performing Bayesian hypothesis testing as well. Imagine there are three urns. All three have 5 balls each; however,
- Urn A has 1 red ball,
- Urn B has 2 red balls, and
- Urn C has 3 red balls.
Someone hands you one of the urns at random; you reach into it and pick a ball. It’s red. What are the chances that you were handed urn A?
Prior Odds
There is just one urn A, but two other urns, B and C. This means the odds in favour of being handed urn A are 1:2.
Odds Ratio
Assuming you were handed urn A, the probability of picking up a red ball would be \(1/5 = 0.2\).
Assuming you were handed one of the other urns (B or C, with equal probability), the probability of picking up a red ball would be \((2 + 3)/10 = 1/2 = 0.5\).
Thus, the odds ratio is \(0.2/0.5 = 0.2 \times 2 = 0.4\).
Posterior Odds
We multiply, and get posterior odds
\[1:2 \times 0.4 = 0.4:2 = 1:5.\]
In other words, we went from a one-in-three chance of holding urn A to a one-in-six chance. Picking up a red ball was evidence against urn A. Specifically, it halved our belief in holding urn A.
Generalisation
This is the essence of Bayesian hypothesis testing: each competing hypothesis (or parameter value) is a metaphysical urn, and we evaluate what effect the evidence has on our belief that we are drawing from each urn. The result is a posterior distribution over urns… or hypotheses.
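As a rough Python sketch of that generalisation, reusing the urn numbers from the example (the dictionary layout is just one convenient way to organise it), we can compute the full posterior distribution over urns:

```python
# Posterior over urns after drawing one red ball, under a uniform prior.
priors = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
red_fraction = {"A": 1 / 5, "B": 2 / 5, "C": 3 / 5}  # p(red | urn)

unnormalised = {u: priors[u] * red_fraction[u] for u in priors}
total = sum(unnormalised.values())
posterior = {u: mass / total for u, mass in unnormalised.items()}
print(posterior)  # roughly {'A': 1/6, 'B': 1/3, 'C': 1/2}
```

Note that urn A ends up at 1/6, matching the 1:5 posterior odds we got by hand.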
Other Observations
Independent Evidence
If you have multiple independent observations, you can multiply their odds ratios together to get the strength of all the evidence combined. (If the observations are correlated, you have to work a bit harder.)
In other words, if in the taxi cab problem we had a second, similar eyewitness independently saying the same thing, the evidence of both combined would have an odds ratio of \(4 \times 4 = 16\), resulting in posterior odds of 16:5, which means we would now think it about three times more likely that the culprit was from the blue company rather than the green one.
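Sketched in Python, assuming the two witnesses really are independent:

```python
prior_odds = 1 / 5                  # 1:5 in favour of blue
combined_ratio = 4 * 4              # independent odds ratios multiply
odds = combined_ratio * prior_odds  # 3.2, i.e. 16:5
probability = odds / (1 + odds)     # 16/21, about 0.76
```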
This is, in fact, what I mean when I say that “the plural of anecdote is data”: there is no magic threshold at which observations become numerous enough to count as significant. Evidence is nothing but small independent anecdotes whose odds ratios multiply together into a large number.
Strength of Evidence
In fact, whether you are talking about the product of multiple pieces of evidence, or just one observation, the odds ratio measures the strength of the evidence. A sensitivity of 80 % for the witness in the taxi cab problem might have sounded weak, but the odds ratio tells us that this single observation quadruples our belief in the blue company being guilty. That’s some strong evidence!
(This is even more natural when you work in log-odds instead of odds, because then you just sum together the strengths of the evidence expressed as log-odds differences. The problem is that we generally don’t have intuition for estimating and working with log-odds. This is something I want to practise, but I haven’t gotten around to it yet.)
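For illustration, here is the two-witness update redone in log-odds (a sketch; the point is only that the evidence terms add):

```python
import math

log_odds = math.log(1 / 5)  # prior, as log-odds
log_odds += math.log(4)     # first witness
log_odds += math.log(4)     # second witness: evidence just adds
odds = math.exp(log_odds)   # back to odds: 3.2, i.e. 16:5
```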
Going Backwards to Prior Odds
Sometimes it’s hard to estimate the probability of low-probability events. For example, how likely do I think it is that it snowed in Stockholm in the summer of last year? I have no idea.
If someone told me it did snow in Stockholm last summer, I might double my belief in it – but that doesn’t really help me estimate my prior. However, if six independent people told me the same thing, I might judge that snow to be a 50–50 event, i.e. having odds of 1. So, going backwards, my prior odds must have been \(1/2^6 = 1/64 \approx 0.02\).
Low odds (roughly \(< 0.15\)) are easy to convert to probabilities because they’re basically the same number: since \(p = o/(1 + o)\), we get \(p \approx o\) when \(o\) is small. So odds of 0.02 mean a probability of 2 %.
Thus, I can conclude that my prior probability of snow the previous summer in Stockholm must have been about 2 %. This is higher than I would have thought, but I also trust this number more than my gut reaction.
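A Python sketch of that backwards reasoning, where the doubling-per-report figure is just my rough guess from above:

```python
posterior = 1.0         # six independent reports make snow a 50–50 event
ratio_per_report = 2.0  # each report roughly doubles the odds
prior = posterior / ratio_per_report ** 6  # 1/64, about 0.016
prior_probability = prior / (1 + prior)    # ~1.5 %, i.e. roughly 2 %
```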
Derivation
Bayes’ rule falls naturally out of the identity
\[p(H|E)p(E) = p(E, H) = p(E|H)p(H).\]
The one weird trick we are employing here is forming the quotient
\[\frac{p(E, H)}{p(E, \neg H)}\]
which causes something strange and useful to happen.
We can expand the numerator and the denominator using the same identity as before. Expanded one way, the \(p(E)\) factors cancel out:
\[\frac{p(E, H)}{p(E, \neg H)} = \frac{p(H|E)\,p(E)}{p(\neg H|E)\,p(E)} = \frac{p(H|E)}{p(\neg H|E)}.\]
Expanded the other way, the quotient equals \(p(E|H)p(H)\) over \(p(E|\neg H)p(\neg H)\). Equating the two expansions, we are left with
\[\frac{p(H|E)}{p(\neg H|E)} = \frac{p(E|H)p(H)}{p(E|\neg H)p(\neg H)}.\]
As a gambler could tell you, \(p(H)/p(\neg H)\) is the odds in favour of \(H\). We will invent the notation \(o(H)\) for this.
By extending another familiar concept, we could invent the idea of “conditional odds” in favour of hypothesis \(H\) given evidence \(E\), and notate it \(o(H|E)\). This would be defined as
\[o(H|E) = \frac{p(H|E)}{p(\neg H|E)}.\]
Using the concepts of odds and conditional odds, we can rewrite the previous division as
\[o(H|E) = \frac{p(E|H)}{p(E|\neg H)} o(H).\]
This is Bayes’ rule in its odds form.