> [!definition]
>
> Let $\list{p}{n}$ and $\list{q}{n}$ be two [[Probability Distribution|probability distributions]] of the same size. Then,
> $
> -\sum_{j = 1}^{n} p_j \log_2(p_j) \le -\sum_{j = 1}^{n}p_j\log_2(q_j)
> $
> with equality if and only if $p_j = q_j$ for every $j$.
>
> *Proof*.
>
> For the terms with $p_j \gt 0$, using the facts that $\ln(x) \le x - 1$ and $\log_2(x) = \ln(x)/\ln(2)$:
> $
> \begin{align*}
> -\sum_{j = 1}^{n}p_j\log_2\paren{\frac{q_j}{p_j}} &\ge -\frac{1}{\ln 2}\sum_{j = 1}^{n}p_j\paren{\frac{q_j}{p_j} - 1}\\
> -\sum_{j = 1}^{n}p_j(\log_2 q_j - \log_2 p_j)
> &\ge -\frac{1}{\ln 2}\sum_{j = 1}^{n}(q_j - p_j)\\
> -\sum_{j = 1}^{n}p_j\log_2 q_j +\sum_{j = 1}^{n}p_j\log_2 p_j
> &\ge -\frac{1}{\ln 2}\sum_{j = 1}^{n}q_j + \frac{1}{\ln 2}\sum_{j = 1}^{n}p_j = \frac{-1+1}{\ln 2} = 0\\
> -\sum_{j = 1}^{n}p_j\log_2 q_j
> &\ge -\sum_{j = 1}^{n}p_j\log_2 p_j
> \end{align*}
> $
>
> For the terms where $p_j = 0$, the expressions $p_j \log_2(p_j)$ and $p_j \log_2(q_j)$ are both 0[^1], which does not affect the inequality.
>
> For the equality case, consider a situation with $n = 2$ where equality holds, that is,
> $
> p_1 \log_2(p_1) + p_2 \log_2(p_2) = p_1 \log_2(q_1) + p_2 \log_2(q_2)
> $
> Since $p$ and $q$ are probability distributions, $p_1 + p_2 = 1$ and $q_1 + q_2 = 1$, so writing $p = p_1$ and $q = q_1$ the equation simplifies to:
> $
> p\log_2(p) + (1-p)\log_2(1-p) = p\log_2(q) + (1-p)\log_2(1-q)
> $
>
> According to the inequality part of the theorem,
> $
> p\log_2(p) + (1-p)\log_2(1-p) \ge p\log_2(q) + (1-p)\log_2(1-q)
> $
> for every choice of $q$. Because the right-hand side is a [[Continuity|continuous]] and differentiable function of $q$, equality can only occur at a maximum of the right-hand side; and since $\log_2$ is the [[Logarithmic Function|natural logarithm]] scaled by the positive constant $1/\ln(2)$, the maximum can be located using $\ln$ instead. Solving for that point gives:
> $
> \begin{align*}
> \frac{d}{dq}\paren{p\log_2(q) + (1 - p)\log_2(1 - q)} &= 0 \\
> \frac{d}{dq}\paren{p\ln(q) + (1 - p)\ln(1 - q)} &= 0 \\
> \frac{p}{q} - \frac{1 - p}{1 - q} &= 0 \\
> \frac{p}{q} &= \frac{1 - p}{1 - q} \\
> \frac{p}{1 - p} &= \frac{q}{1 - q} \\
> p &= q
> \end{align*}
> $
>
> This shows that for $n = 2$, equality holds if and only if the distributions are identical. By induction, if all but two of the values $q_a$ and $q_b$ are already known to match the corresponding values of $p$, the above argument still holds after replacing every occurrence of $1$ with $s = q_a + q_b = p_a + p_b$. If $q_a + q_b \ne p_a + p_b$, then the equation only has the solution $p = q = 0$.
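As a quick sanity check outside the proof, the inequality can be verified numerically for concrete distributions. The sketch below assumes NumPy is available; the helper names `entropy` and `cross_entropy` are illustrative rather than taken from any particular library, and the $p_j = 0$ convention from the footnote is applied explicitly.

```python
import numpy as np

def entropy(p):
    """-sum p_j log2(p_j), with p_j = 0 terms treated as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def cross_entropy(p, q):
    """-sum p_j log2(q_j), skipping terms where p_j = 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(q[nz]))

p = [0.5, 0.25, 0.25]
q = [0.7, 0.2, 0.1]

print(entropy(p))           # 1.5
print(cross_entropy(p, q))  # ~1.668, which is >= 1.5
print(cross_entropy(p, p))  # 1.5, equality exactly when q = p
```

Swapping in any other $q$ of the same size should never make the second number drop below the first.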
> [!theorem]
>
> Given two [[Random Variable|random variables]] $X$ and $Y$, the [[Conditional Entropy|conditional entropy]] of $Y$ given $X$, $H_X(Y)$ (the uncertainty about $Y$ that remains once $X$ is known), is always less than or equal to the [[Entropy|entropy]] of $Y$.
> $
> H_X(Y) \le H(Y)
> $
>
> *Proof*. This follows from the above theorem. Let $\list{p}{n}$ be the distribution of $X$, and let $\list{q}{m}$ be the distribution of $Y$. Write $q_{jk}$ for the joint probability $P(X = x_j, Y = y_k)$ and $p_j(k)$ for the conditional probability $P(Y = y_k \mid X = x_j)$, so that $q_k = \sum_{j = 1}^{n} q_{jk}$ and $q_{jk} = p_j \cdot p_j(k)$. Begin by expanding the entropy of $Y$:
> $
> \begin{align*}
> H(Y) &= -\sum_{k = 1}^{m}q_k \log_2(q_k) \\
> &= -\sum_{j = 1}^{n}\sum_{k = 1}^{m}q_{jk} \log_2(q_k)\\
> &= -\sum_{j = 1}^{n}\sum_{k = 1}^{m} p_j \cdot p_j(k) \log_2(q_k)\\
> &= -\sum_{j = 1}^{n}p_j\sum_{k = 1}^{m} p_j(k) \log_2(q_k)
> \end{align*}
> $
> and the conditional entropy of $Y$ given $X$:
> $
> H_X(Y) = -\sum_{j = 1}^{n}p_j\sum_{k = 1}^{m}p_j(k)\log_2(p_j(k))
> $
>
> Multiplying both sides of the first theorem by $-1$ flips the inequality:
> $
> \sum_{j = 1}^{n} p_j \log_2(p_j) \ge \sum_{j = 1}^{n}p_j\log_2(q_j)
> $
>
> For each fixed $j$, the conditional distribution $p_j(1), \dots, p_j(m)$ and the marginal distribution $\list{q}{m}$ are both probability distributions over the values of $Y$, so applying this inequality to the inner layer of summation gives:
> $
> \sum_{k = 1}^{m}p_j(k)\log_2(p_j(k)) \ge
> \sum_{k = 1}^{m}p_j(k)\log_2(q_k)
> $
> Since the weights $p_j \ge 0$ of the outer layer of summation are identical on both sides,
> $
> \begin{align*}
> \sum_{j = 1}^{n}p_j\sum_{k = 1}^{m}p_j(k)\log_2(p_j(k)) &\ge
> \sum_{j = 1}^{n}p_j\sum_{k = 1}^{m} p_j(k) \log_2(q_k) \\
> -\sum_{j = 1}^{n}p_j\sum_{k = 1}^{m}p_j(k)\log_2(p_j(k)) &\le
> -\sum_{j = 1}^{n}p_j\sum_{k = 1}^{m} p_j(k) \log_2(q_k) \\
> H_X(Y) &\le H(Y)
> \end{align*}
> $

[^1]: In the context of calculating [[Information|information]], we are really using $\phi(x) = \log_2(x)$ when $x \gt 0$ and $\phi(x) = 0$ when $x = 0$.
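As with the first theorem, a small numerical sketch can make the statement concrete: starting from a joint distribution table for $(X, Y)$, compute $H(Y)$ from the marginal of $Y$ and $H_X(Y)$ as the $p_j$-weighted average of the entropies of the conditional distributions. The table values and names below are assumptions for illustration only, again using NumPy.

```python
import numpy as np

# Joint distribution q_{jk} = P(X = x_j, Y = y_k); rows index X, columns index Y.
# The numbers are illustrative only.
joint = np.array([[0.30, 0.10],
                  [0.05, 0.55]])

p = joint.sum(axis=1)  # p_j = P(X = x_j)  -> [0.40, 0.60]
q = joint.sum(axis=0)  # q_k = P(Y = y_k)  -> [0.35, 0.65]

def H(dist):
    """Shannon entropy in bits, with 0 * log2(0) treated as 0."""
    dist = dist[dist > 0]
    return -np.sum(dist * np.log2(dist))

# H_X(Y) = sum_j p_j * H(distribution of Y given X = x_j)
H_Y_given_X = sum(p_j * H(row / p_j) for p_j, row in zip(p, joint))

print(H(q))          # H(Y)   ~ 0.934
print(H_Y_given_X)   # H_X(Y) ~ 0.573, which is <= H(Y)
```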