# Chocolates, bugs, and socks

## How many chocolates has Lindsay eaten?

With every chocolate Lindsay eats, she gets a picture of a bug, selected uniformly at random from a set of 20 bug pictures. She just collected her 19th unique bug picture. How many chocolates do you think she's eaten?

When she's collected 7 unique bugs, each chocolate brings with it a $\frac{13}{20}$ chance of getting an 8th bug. How many chocolates does she have to eat before that happens, and with what variance? Well, for any event with probability $p$ of success, the expected number of trials up to and including the first success is $\langle n\rangle =\sum_{n=1}^\infty n(1-p)^{n-1}p = \frac 1p$, and the expected square of that number is $\langle n^2\rangle =\sum_{n=1}^\infty n^2(1-p)^{n-1}p = \frac{2-p}{p^2}$, so the variance is $\langle n^2\rangle - \langle n\rangle^2 = \frac{1-p}{p^2} = \frac{1}{p^2} - \frac 1p$.

So the expected number of chocolates Lindsay ate is $$\frac{20}{20}+\frac{20}{19}+\cdots+\frac{20}{2} \approx 20\times(\ln(20) + \frac 12 - 1) \approx 50.$$ (This was my back-of-the-envelope guess using that $\ln(20)\approx 3$, and that the Euler-Mascheroni constant is about 0.5. The actual value is very close to 52.)

The number of chocolates eaten to get from 5 to 6 bugs is independent of the number of chocolates eaten to get from 6 to 7 bugs, so the variances add too: $$\Bigl(\frac{20^2}{20^2} - \frac{20}{20}\Bigr) + \Bigl(\frac{20^2}{19^2} - \frac{20}{19}\Bigr) + \cdots + \Bigl(\frac{20^2}{2^2} - \frac{20}{2}\Bigr) \approx 186.5.$$ So the standard deviation is $\sqrt{186.5}\approx 13.7$. Assuming this sum of random variables is roughly Gaussian (how good is this assumption?), the 90% confidence interval (z-score 1.645) for how many chocolates Lindsay has eaten is $52\pm 22$.
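The sums in this section can be checked numerically. The following is a minimal Python sketch, not part of the original calculation; the function names `collection_moments`, `collection_pmf`, and `quantile` are made up for illustration. It adds up the per-stage moments $\langle n\rangle$ and $\langle n^2\rangle$ quoted above using exact fractions, and it also builds the exact distribution of the number of chocolates by stepping a small Markov chain, which is one way to probe the Gaussian assumption.

```python
from fractions import Fraction


def collection_moments(total=20, target=19):
    """Exact mean and variance of the number of chocolates needed to
    collect `target` unique pictures out of `total` equally likely ones."""
    mean = Fraction(0)
    var = Fraction(0)
    for have in range(target):              # stage: from `have` to `have + 1` unique bugs
        p = Fraction(total - have, total)   # chance the next chocolate shows a new bug
        first = 1 / p                       # <n> for this stage
        second = (2 - p) / p**2             # <n^2> for this stage
        mean += first
        var += second - first**2            # stages are independent, so variances add
    return mean, var


def collection_pmf(total=20, target=19, max_chocolates=1000):
    """pmf[t] = P(the target-th unique bug arrives with chocolate number t).
    `max_chocolates` is an arbitrary truncation point (an assumption)."""
    state = [0.0] * target   # state[j] = P(j unique bugs so far, target not yet reached)
    state[0] = 1.0
    pmf = [0.0]              # nothing can be finished after 0 chocolates
    for _ in range(max_chocolates):
        new_state = [0.0] * target
        finished = 0.0
        for j, prob in enumerate(state):
            p_new = (total - j) / total
            new_state[j] += prob * (1 - p_new)      # repeat picture, stay at j
            if j + 1 == target:
                finished += prob * p_new            # target reached on this chocolate
            else:
                new_state[j + 1] += prob * p_new    # new picture, move to j + 1
        pmf.append(finished)
        state = new_state
    return pmf


def quantile(pmf, q):
    """Smallest t whose cumulative probability is at least q."""
    cumulative = 0.0
    for t, prob in enumerate(pmf):
        cumulative += prob
        if cumulative >= q:
            return t
    raise ValueError("pmf was truncated before reaching the requested quantile")


mean, var = collection_moments()
pmf = collection_pmf()
print("mean:", float(mean), "variance:", float(var))
print("5% and 95% quantiles:", quantile(pmf, 0.05), quantile(pmf, 0.95))
```

Comparing the printed quantiles with the symmetric Gaussian interval shows how much the skew of the last few stages matters.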

Note: she should expect to eat another 20 chocolates before getting that last bug.

## How many unique bugs are there?

Suppose Lindsay **knows** she's eaten 50 chocolates, among which there were 19 unique bugs, but she doesn't know how many unique bugs there are out there. If she's eaten $n$ chocolates and found $k$ unique bugs, and there are $b$ unique bugs in total, the likelihood of $b$ is
$$
\binom{b}{k}k^{n-k}b^{-n}
$$
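(pick the subset of $k$ bugs, which also uses up $k$ of your chocolates, then distribute the remaining $n-k$ chocolates among those $k$ bugs). This count glosses over the order in which new bugs turn up, but that only changes a factor that does not depend on $b$, so the location of the maximum is unaffected. For $n=50$ and $k=19$, the maximally likely answer is that there are 20 bugs.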
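To find the maximum numerically, one can simply scan over candidate values of $b$. Here is a minimal sketch, assuming the likelihood formula above and an arbitrary search cap `b_max`; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def bug_likelihood(b, n, k):
    """Likelihood of b total bugs given k unique bugs seen in n chocolates:
    C(b, k) * k**(n - k) / b**n, evaluated with exact rational arithmetic."""
    return Fraction(comb(b, k) * k ** (n - k), b ** n)


def most_likely_bug_count(n, k, b_max=200):
    """Scan b = k..b_max (b_max is an arbitrary cap, an assumption of this
    sketch) and return the value of b with the largest likelihood."""
    return max(range(k, b_max + 1), key=lambda b: bug_likelihood(b, n, k))


# The case from the text: n = 50 chocolates, k = 19 unique bugs.
print(most_likely_bug_count(50, 19))
```

Exact fractions avoid any floating-point underflow from the $b^{-n}$ factor, at the cost of some speed.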

## How many socks do I have?

I've just pulled 10 socks out of a pile of laundry, making 2 pairs and 6 singletons. How many pairs of socks are there total (assuming each sock has a unique partner)?

If there are $n$ pairs of socks, I pull out $k$ socks, and make $p$ pairs, the likelihood is $$\binom np \binom{n-p}{k-2p} \binom{2n}{k}^{-1}$$ (first pick the $p$ complete pairs, then the $k-2p$ pairs that each contribute a singleton; the factor $2^{k-2p}$ for which sock of each such pair was drawn does not depend on $n$, so it is dropped). Maximum likelihood answer to the original question: 11 pairs (so $22-10=12$ socks left). Of course, you'd usually have a pretty good prior multiplying this likelihood function.
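The same kind of scan works for the socks. Again this is a minimal sketch under stated assumptions: it uses the likelihood formula above, the search cap `n_max` is arbitrary, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def sock_likelihood(n, k, p):
    """Likelihood of n total pairs given that k drawn socks formed p pairs:
    C(n, p) * C(n - p, k - 2p) / C(2n, k), evaluated exactly."""
    return Fraction(comb(n, p) * comb(n - p, k - 2 * p), comb(2 * n, k))


def most_likely_pair_count(k, p, n_max=200):
    """Scan n from the smallest feasible value up to n_max (an arbitrary cap)
    and return the value of n with the largest likelihood."""
    n_min = k - p   # p complete pairs plus k - 2p distinct singletons
    return max(range(n_min, n_max + 1), key=lambda n: sock_likelihood(n, k, p))


# The case from the text: k = 10 socks drawn, p = 2 pairs made.
print(most_likely_pair_count(10, 2))
```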