Against the odds: a caution to practical Bayesians


Lots of people I know like to use Bayes' Theorem in their daily life to estimate the odds that various statements are true. See http://betterexplained.com/articles/understanding-bayes-theorem-with-ratios/ for a quick tutorial. This post is about a pitfall to watch out for once you're feeling pretty good about applying Bayes' Theorem.

Upshot: if you do multiple Bayesian updates on the odds of a proposition (i.e. P(X):P(¬X)), then you're probably making errors unless X and ¬X are pretty simple hypotheses.

A typical example of using Bayes' Theorem: I randomly choose either a fair coin or a double-headed coin. I flip it 3 times and get heads each time. What's your credence that it's the double-headed coin? The initial odds fair:double-headed are assumed to be 1:1. A fair coin produces heads with probability 0.5 and a double-headed coin produces heads with probability 1, so the likelihood ratio to multiply by every time you see heads is 0.5:1 = 1:2. So after seeing 3 heads, the fair:double-headed odds are 1:1 × 1:2 × 1:2 × 1:2 = 1:8, and there's an 8/(8+1) ≈ 89% chance it's double-headed.
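If you like seeing the arithmetic mechanized, here's a minimal Python sketch of this odds-form update (the helper name `update` and the representation of odds as an unnormalized pair are illustrative choices, not anything canonical):

```python
def update(odds, likelihood_ratio):
    """Multiply odds componentwise by a likelihood ratio."""
    return tuple(o * l for o, l in zip(odds, likelihood_ratio))

odds = (1.0, 1.0)                    # prior odds, fair : double-headed = 1 : 1
for _ in range(3):                   # three heads in a row
    odds = update(odds, (0.5, 1.0))  # P(heads | fair) : P(heads | double) = 0.5 : 1

p_double = odds[1] / sum(odds)
print(odds, p_double)                # (0.125, 1.0) -> 0.888..., i.e. ~89% double-headed
```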

Let's try a more complicated example. Suppose somebody stole my lunch from the fridge. I know the culprit was one of my three "friends", Alice (A), Bob (B), or Charlie (C). I don't really care much about Bob or Charlie, but I'm considering starting a business with Alice, so I really want to know whether it was Alice or not. That is, the odds I care about are Alice-did-it:Alice-didn't-do-it, P(A):P(¬A).

Initially, I have no reason to suspect any one of them more than the others, so my odds P(A):P(¬A) are 1:2 (the odds favor Alice's innocence); there's a 1/3 chance that Alice is the culprit. Luckily, there were two witnesses exonerating Alice. Suppose that the witnesses are independent, and each witness identifies the true thief with 80% probability, and otherwise names one of the two innocent friends (with equal probability, so 10% each). If a witness names someone other than Alice as the thief, what does that tell me? If Alice did it, there's a 10% chance of hearing that, and if Alice didn't do it, there's a 90% chance, so a witness testimony clearing Alice contributes a likelihood ratio of 1:9. That means that the odds after two testimonies are 1:2 × 1:9 × 1:9 = 1:162, so a 1/163 ≈ 0.6% chance that Alice took my lunch, right?
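For reference, here's the same style of sketch carrying out that calculation, taking the 1:9 ratio at face value:

```python
odds = (1.0, 2.0)                  # prior odds P(A) : P(not A) = 1 : 2
for _ in range(2):                 # two testimonies naming someone other than Alice
    odds = (odds[0] * 1.0, odds[1] * 9.0)   # multiply by the 1:9 ratio

print(odds, odds[0] / sum(odds))   # (1.0, 162.0) -> 1/163, about 0.6%
```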

WRONG! Our fixation on Alice messed us up. Let's go back and keep track of all the hypotheses, so the initial odds of the culprit being Alice:Bob:Charlie are 1:1:1. If a witness says Bob took the lunch, it contributes a likelihood ratio of 1:8:1. If a second witness says Bob did it, the odds become 1:64:1, so there's a 1/(1+64+1) ≈ 1.5% chance Alice did it; significantly more than our previous estimate of 0.6%. Worse, if one witness says Bob and the other says Charlie, the odds are 1:8:8, which is a 1/(1+8+8) ≈ 5.9% chance it was Alice.
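Here's a sketch of this full-hypothesis-space calculation (the helper name `posterior_alice` is just illustrative); it keeps unnormalized weights over all three suspects and multiplies in each testimony's likelihoods:

```python
HYPOTHESES = ("A", "B", "C")

def likelihood(named, culprit):
    """P(a witness names `named` | `culprit` is the thief): 0.8 if correct, 0.1 otherwise."""
    return 0.8 if named == culprit else 0.1

def posterior_alice(testimonies):
    weights = {h: 1.0 for h in HYPOTHESES}      # prior odds 1:1:1
    for named in testimonies:
        for h in HYPOTHESES:
            weights[h] *= likelihood(named, h)
    return weights["A"] / sum(weights.values())

print(posterior_alice(["B", "B"]))   # 1/66 ~ 1.5%: both witnesses name Bob
print(posterior_alice(["B", "C"]))   # 1/17 ~ 5.9%: the witnesses disagree
```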

What went wrong? First of all, that 1:9 likelihood ratio is bogus. If we condition on the thief not being Alice, the probability of a witness saying some specific person other than Alice (say Bob) did it is really $$ P(\text{witness}_B \mid \neg A) = 0.8 \times P(B \mid \neg A) + 0.1 \times P(C \mid \neg A) = 0.8 \times 0.5 + 0.1 \times 0.5 = 0.45, $$

so the likelihood ratio should have been 10:45 = 2:9 instead of 1:9. But using this corrected ratio still gives us the wrong answer, for a deeper reason: if you condition on ¬Alice, then the witness testimonies are not logically independent! (With the corrected ratio, two exonerating testimonies always give 1:2 × 2:9 × 2:9 = 4:162 ≈ a 2.4% chance for Alice, whether or not the witnesses agree — but the true answers were 1.5% and 5.9%.) Once I hear one testimony against Bob, then when I compute $P(\text{witness}_B \mid \neg A)$, the terms $P(B \mid \neg A)$ and $P(C \mid \neg A)$ aren't 0.5 anymore. Note that this is consistent with the assumption that the witness testimonies are causally independent.
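A small sketch with the numbers above makes the dependence concrete: conditioned on ¬A, one Bob testimony shifts P(B|¬A) and P(C|¬A) away from 0.5, so the chance that a second (causally independent) witness also names Bob is about 0.72 rather than 0.45.

```python
# Sketch: given not-A, the culprit is Bob or Charlie, initially 0.5 each.
def p_witness_names_bob(p_bob, p_charlie):
    # P(a witness names Bob) = 0.8 * P(Bob is the thief) + 0.1 * P(Charlie is the thief)
    return 0.8 * p_bob + 0.1 * p_charlie

print(p_witness_names_bob(0.5, 0.5))            # 0.45, the corrected figure above

# After one testimony naming Bob, update P(B | not A) and P(C | not A) by Bayes:
p_bob = 0.5 * 0.8 / 0.45                        # ~0.889
p_charlie = 0.5 * 0.1 / 0.45                    # ~0.111
print(p_witness_names_bob(p_bob, p_charlie))    # ~0.72: the second testimony is not
                                                # independent of the first given not-A
```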

This problem arises whenever your evidence differentiates between different sub-hypotheses. What to do about it? I think the best answer is to use unnormalized probability distributions over the full hypothesis space, like we did to get the right answer above. Of course, it's infeasible to keep track of all possible states of the universe, but maybe we can develop decent intuitions about what the hypothesis space should be. Most people I've talked to about Alice recognize that there should be a difference between the witness testimonies agreeing and disagreeing. I'd be curious whether there's an example where it really feels right to do the wrong thing.