What the +1 Means - Jerry's Math Garden

### Original Question

Suppose that a random variable $X$ follows the Bernoulli distribution with probability of success $p$, but we don't know $p$. By the principle of symmetry, our **prior** distribution of the parameter $p$ is therefore uniform: $p(p = x) = 1$ (this is a probability density rather than a probability, since the parameter can be anything between 0 and 1).

After taking $n$ samples and getting $s$ successes ($s$ is an observed value of the random variable $S \sim B(n, p)$), we can build a **posterior** density

$$ p\paren{p = x|S} = \frac{p\paren{S = s|p = x}}{N}p\paren{p = x} $$

where $N$ is a normalisation factor

$$ N = \int_{0}^{1}p\paren{S = s|p = x}p\paren{p = x}dx $$

chosen so that the posterior integrates to $1$. Since $p(p = x) = 1$, we can simplify this down to:

$$ p\paren{p = x|S} = \frac{p\paren{S = s|p = x}}{N} $$

$$ N = \int_{0}^{1}p\paren{S = s|p = x}dx $$

It turns out that

$$ N = \frac{1}{n + 1} \quad \forall n \in \nat, 0 \le s \le n $$

This means that the total area under the un-normalised graph shrinks as we increase our sample size, and that we become more certain as we pick up more data.

Why is the denominator $(n + 1)$ instead of $n$? What does it mean? Working through the integral told me that the result is true, but it never told me *why*.

### Answer

I was stuck on this for many months, until I played with the problem again on a graphing calculator a few days ago.

![[binomial.png|200]]

The more samples I took, the narrower the peak became, and the more the graph needed to be stretched vertically to maintain its area. The denominator $(n + 1)$ felt like a rough measure of how much knowledge we have of the distribution that shapes our belief. The more samples we take, the more knowledge we have.

Then I realised that I had never looked at the situation where $n = 0$. My original guess was that there would simply be nothing.

![[uniform.png|250]]

However, the uniform distribution showed up.
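
Both observations can be checked without a graphing calculator. The following is a minimal Python sketch, not the method used in the post: it integrates the binomial likelihood exactly by expanding $(1 - x)^{n - s}$ with the binomial theorem and integrating term by term. The function names `normaliser` and `posterior_density` are made up for illustration.

```python
from fractions import Fraction
from math import comb


def normaliser(n: int, s: int) -> Fraction:
    """Exact value of N = integral over [0, 1] of C(n, s) x^s (1 - x)^(n - s) dx.

    Expands (1 - x)^(n - s) with the binomial theorem, so each term is a
    power of x whose integral over [0, 1] is 1 / (exponent + 1).
    """
    m = n - s
    integral = sum(
        Fraction((-1) ** k * comb(m, k), s + k + 1) for k in range(m + 1)
    )
    return comb(n, s) * integral


def posterior_density(x: float, n: int, s: int) -> float:
    """Posterior density of the parameter at x under the uniform prior."""
    likelihood = comb(n, s) * x**s * (1 - x) ** (n - s)
    return likelihood / float(normaliser(n, s))


if __name__ == "__main__":
    # N equals 1 / (n + 1) for every s between 0 and n, including n = 0.
    for n in range(6):
        assert all(normaliser(n, s) == Fraction(1, n + 1) for s in range(n + 1))
        print(n, normaliser(n, 0))

    # With no samples (n = 0, s = 0) the posterior is flat: the uniform prior.
    print([posterior_density(x, 0, 0) for x in (0.1, 0.5, 0.9)])
```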
Then I realised that if $(n + 1)$ is a measure of the knowledge that shapes our belief, then when $n = 0$, $(0 + 1)$ is our knowledge of the distribution before taking any samples. The remaining $1$ represents our assumption: the principle of symmetry!

### Conclusion

After finally resolving this problem, I now understand that even when we know nothing about something, we still hold a belief about it, and that belief is the assumption based on our ignorance! It's beautiful that even when we know nothing, math still has a principle that guides our decisions!