> [!definition]
>
> Let $\list{p}{n}$ and $\list{q}{n}$ be two [[Probability Distribution|probability distributions]] of the same size. Then,
> $
> -\sum_{j = 1}^{n} p_j \log_2(p_j) \le -\sum_{j = 1}^{n}p_j\log_2(q_j)
> $
> with equality if and only if $p_j = q_j$ for every $j$.
>
> *Proof*.
>
> When $p_j \gt 0$, using the fact that $\ln(x) \le x - 1$ for all $x \gt 0$, and hence $\log_2(x) = \frac{\ln(x)}{\ln(2)} \le \frac{x - 1}{\ln(2)}$:
> $
> \begin{align*}
> -\sum_{j = 1}^{n}p_j\log_2\paren{\frac{q_j}{p_j}} &\ge -\frac{1}{\ln(2)}\sum_{j = 1}^{n}p_j\paren{\frac{q_j}{p_j} - 1}\\
> -\sum_{j = 1}^{n}p_j(\log_2 q_j - \log_2 p_j)
> &\ge -\frac{1}{\ln(2)}\sum_{j = 1}^{n}(q_j - p_j)\\
> -\sum_{j = 1}^{n}p_j\log_2 q_j +\sum_{j = 1}^{n}p_j\log_2 p_j
> &\ge -\frac{1}{\ln(2)}\paren{\sum_{j = 1}^{n}q_j - \sum_{j = 1}^{n}p_j} = -\frac{1 - 1}{\ln(2)} = 0\\
> -\sum_{j = 1}^{n}p_j\log_2 q_j
> &\ge -\sum_{j = 1}^{n}p_j\log_2 p_j
> \end{align*}
> $
>
> For cases where $p_j = 0$, the terms $p_j \log_2(p_j)$ and $p_j \log_2(q_j)$ are both 0[^1], so they do not affect the inequality.
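>
> For example, with $p = \paren{\frac{1}{2}, \frac{1}{2}}$ and $q = \paren{\frac{1}{4}, \frac{3}{4}}$, the left-hand side is $-\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1$, while the right-hand side is $-\frac{1}{2}\log_2\frac{1}{4} - \frac{1}{2}\log_2\frac{3}{4} = 2 - \frac{1}{2}\log_2 3 \approx 1.21$, consistent with the inequality being strict when $p \ne q$.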
>
> For the equality case, first consider $n = 2$ and suppose that
> $
> p_1 \log_2(p_1) + p_2 \log_2(p_2) = p_1 \log_2(q_1) + p_2 \log_2(q_2)
> $
> Since $p$ and $q$ are probability distributions, $p_1 + p_2 = 1$ and $q_1 + q_2 = 1$, so writing $p = p_1$ and $q = q_1$ the equation becomes:
> $
> p\log_2(p) + (1-p)\log_2(1-p) = p\log_2(q) + (1-p)\log_2(1-q)
> $
>
> According to the inequality part of the theorem,
> $
> p\log_2(p) + (1-p)\log_2(1-p) \ge p\log_2(q) + (1-p)\log_2(1-q)
> $
> Because the right-hand side is a [[Continuity|continuous]], differentiable function of $q$ on $(0, 1)$ (a combination of [[Logarithmic Function|logarithms]]) that never exceeds the fixed left-hand side, equality can only occur at a [[Extrema|local maximum]] of the right-hand side. Solving for that point gives:
> $
> \begin{align*}
> \frac{d}{dq}\left[p\log_2(q) + (1 - p)\log_2(1 - q)\right] &= 0 \\
> \frac{1}{\ln(2)}\cdot\frac{d}{dq}\left[p\ln(q) + (1 - p)\ln(1 - q)\right] &= 0 \\
> \frac{p}{q} - \frac{1 - p}{1 - q} &= 0 \\
> \frac{p}{q} &= \frac{1 - p}{1 - q} \\
> p(1 - q) &= q(1 - p) \\
> p - pq &= q - pq \\
> p &= q
> \end{align*}
> $
>
> This shows that for $n = 2$, equality holds if and only if the two distributions are identical. By induction, if all but two components $q_a$ and $q_b$ are fixed, the same argument applies after replacing every occurrence of $1$ with $s = q_a + q_b = p_a + p_b$. If $q_a + q_b \ne p_a + p_b$, then the equation only has the solution $p = q = 0$.
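
Outside the formal proof, the inequality is easy to sanity-check numerically. The sketch below (assuming NumPy is available; all names are chosen here for illustration) draws random distribution pairs and compares the two sums.

```python
import numpy as np

# Illustrative check of the inequality on random distribution pairs.
rng = np.random.default_rng(0)

def cross_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """-sum_j p_j * log2(q_j), skipping terms with p_j = 0 (they contribute 0)."""
    mask = p > 0
    return float(-np.sum(p[mask] * np.log2(q[mask])))

for _ in range(1_000):
    n = int(rng.integers(2, 10))
    p = rng.random(n); p /= p.sum()  # random distribution p_1, ..., p_n
    q = rng.random(n); q /= q.sum()  # random distribution q_1, ..., q_n
    # The entropy of p is the cross-entropy of p with itself.
    assert cross_entropy(p, p) <= cross_entropy(p, q) + 1e-12
```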

> [!theorem]
>
> Given two [[Random Variable|random variables]] $X$ and $Y$, the [[Conditional Entropy|conditional entropy]] of $Y$ given $X$ (the uncertainty about $Y$ that remains once $X$ is known), $H_X(Y)$, is always less than or equal to the [[Entropy|entropy]] of $Y$:
> $
> H_X(Y) \le H(Y)
> $
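>
> For example, if $X$ and $Y$ each take two values with joint probabilities $0.4, 0.1, 0.1, 0.4$, then $Y$ is uniform, so $H(Y) = 1$, while each conditional distribution of $Y$ is $(0.8, 0.2)$ or $(0.2, 0.8)$, giving $H_X(Y) = -0.8\log_2(0.8) - 0.2\log_2(0.2) \approx 0.72 \le 1$.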
>
> *Proof*. We use the theorem above. Let $\list{p}{n}$ be the distribution of $X$ and $\list{q}{m}$ the distribution of $Y$; write $q_{jk}$ for the joint probability that $X$ takes its $j$-th value and $Y$ its $k$-th value, and $p_j(k)$ for the conditional probability of the $k$-th value of $Y$ given the $j$-th value of $X$, so that $q_{jk} = p_j \cdot p_j(k)$ and $q_k = \sum_{j} q_{jk}$. Begin by expanding the entropy of $Y$.
> $
> \begin{align*}
> H(Y) &= -\sum_{k = 1}^{m}q_k \log_2(q_k) \\
> &= -\sum_{j = 1}^{n}\sum_{k = 1}^{m}q_{jk} \log_2(q_k)\\
> &= -\sum_{j = 1}^{n}\sum_{k = 1}^{m} p_j \cdot p_j(k) \log_2(q_k)\\
> &= -\sum_{j = 1}^{n}p_j\sum_{k = 1}^{m} p_j(k) \log_2(q_k)
> \end{align*}
> $
> And the conditional entropy of $Y$ given $X$.
> $
> H_X(Y) = -\sum_{j = 1}^{n}p_j\sum_{k = 1}^{m}p_j(k)\log_2(p_j(k))
> $
>
> Negating both sides of the first theorem's inequality gives the equivalent form:
> $
> \sum_{j = 1}^{n} p_j \log_2(p_j) \ge \sum_{j = 1}^{n}p_j\log_2(q_j)
> $
>
> For each fixed $j$, apply this to the inner summation, with the conditional distribution $p_j(1), \dots, p_j(m)$ in the role of $p$ and the distribution of $Y$ in the role of $q$:
> $
> \sum_{k = 1}^{m}p_j(k)\log_2(p_j(k)) \ge
> \sum_{k = 1}^{m}p_j(k)\log_2(q_k)
> $
> Weighting each of these inequalities by $p_j \ge 0$ and summing over $j$,
> $
> \begin{align*}
> \sum_{j = 1}^{n}p_j\sum_{k = 1}^{m}p_j(k)\log_2(p_j(k)) &\ge
> \sum_{j = 1}^{n}p_j\sum_{k = 1}^{m} p_j(k) \log_2(q_k) \\
> -\sum_{j = 1}^{n}p_j\sum_{k = 1}^{m}p_j(k)\log_2(p_j(k)) &\le
> -\sum_{j = 1}^{n}p_j\sum_{k = 1}^{m} p_j(k) \log_2(q_k) \\
> H_X(Y) &\le H(Y)
> \end{align*}
> $
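
As with the first theorem, the bound can be sanity-checked numerically by drawing a random joint distribution and computing both entropies from its marginals and conditionals. This is an illustrative sketch under the same assumptions as the snippet above (NumPy; names chosen for illustration), not part of the proof.

```python
import numpy as np

# Illustrative check that H_X(Y) <= H(Y) for a random joint distribution.
rng = np.random.default_rng(1)

n, m = 4, 5                          # sizes of the ranges of X and Y
joint = rng.random((n, m))
joint /= joint.sum()                 # joint probabilities q_{jk}

p = joint.sum(axis=1)                # marginal distribution of X: p_j
q = joint.sum(axis=0)                # marginal distribution of Y: q_k
cond = joint / p[:, None]            # conditional probabilities p_j(k)

H_Y = -np.sum(q * np.log2(q))                              # H(Y)
H_Y_given_X = -np.sum(p[:, None] * cond * np.log2(cond))   # H_X(Y)

assert H_Y_given_X <= H_Y + 1e-12
print(f"H(Y) = {H_Y:.4f}, H_X(Y) = {H_Y_given_X:.4f}")
```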
[^1]: In the context of calculating [[Information|information]], we are really using $\phi(x) = \log_2(x)$ when $x \gt 0$ and $\phi(x) = 0$ when $x = 0$.