> [!abstract] Introduction
>
> Given a source alphabet of size $n$ and a [[Code|code]] alphabet of size $r$, create a [[Set|set]] of $n$ codewords, one for each source symbol, with lengths $\list{l}{n}$ and with the [[Frequency|frequencies]] of the source symbols given by a [[Probability Distribution|probability distribution]] $\list{p}{n}$. The [[Random Variable|random variable]] $L$ representing the length of a codeword has the [[Expectation|expected value]]
> $
> \ev(L) = \sum_{j = 1}^{n}p_j l_j
> $
> which represents the average length of a codeword.
>
> A prefix-free code is *optimal* if it [[Extrema|minimises]] $\ev(L)$.
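>
> As a minimal numeric sketch (the three-symbol source and the binary codewords below are assumptions chosen purely for illustration), $\ev(L)$ can be computed directly:
>
> ```python
> # Sketch: expected codeword length E(L) for an assumed three-symbol source
> # encoded with an assumed prefix-free binary code (r = 2).
> probabilities = [0.5, 0.25, 0.25]      # p_1, ..., p_n
> codewords = ["0", "10", "11"]          # no codeword is a prefix of another
> lengths = [len(w) for w in codewords]  # l_1, ..., l_n
>
> expected_length = sum(p * l for p, l in zip(probabilities, lengths))
> print(expected_length)                 # 0.5*1 + 0.25*2 + 0.25*2 = 1.5
> ```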

> [!theoremb] Noiseless Coding Theorem
>
> Define the [[Entropy|entropy]] of the source alphabet $S$ as $H(S) = -\sum_{j = 1}^{n}p_j \log_2(p_j)$. For any prefix-free code with $r$ code symbols,
> $
> \ev(L) \ge \frac{H(S)}{\log_2(r)}
> $
> This means that the maximum average [[Information|information]] transmitted by one code symbol is $\log_2(r)$, the entropy of a [[Uniform Distribution|uniform distribution]] with $r$ possible values:
> $
> \frac{H(S)}{\ev(L)} \le \log_2(r)
> $
>
> There exists a prefix-free code that satisfies
> $
> \frac{H(S)}{\log_2(r)} \le \ev(L) \lt \frac{H(S)}{\log_2(r)} + 1
> $
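>
> Both bounds can be spot-checked numerically; the sketch below reuses the same assumed dyadic three-symbol source as above, for which an optimal binary code meets the lower bound exactly:
>
> ```python
> import math
>
> probabilities = [0.5, 0.25, 0.25]   # assumed dyadic source: p_j = 2^(-l_j)
> lengths = [1, 2, 2]                 # lengths of the assumed codewords 0, 10, 11
> r = 2
>
> entropy = -sum(p * math.log2(p) for p in probabilities)               # H(S) = 1.5
> expected_length = sum(p * l for p, l in zip(probabilities, lengths))  # E(L) = 1.5
> lower_bound = entropy / math.log2(r)
>
> assert lower_bound <= expected_length < lower_bound + 1   # the theorem's bounds
> ```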
>
> *Proof.* Define an auxiliary probability distribution $\list{q}{n}$ on the codewords, corresponding to each of the $r$ code symbols appearing with equal probability:
> $
> q_j = \frac{r^{-l_j}}{\sum_{i = 1}^{n}r^{-l_i}}
> $
> By the [[Gibbs Inequality|Gibbs inequality]],
> $
> \begin{align*}
> H(S) = -\sum_{j = 1}^{n}p_j \log_2(p_j) &\le -\sum_{j = 1}^{n}p_j \log_2(q_j) \\
> &= -\sum_{j = 1}^{n}p_j \log_2\paren{r^{-l_j}} + \sum_{j = 1}^{n}p_j\log_2\paren{\sum_{i = 1}^{n}r^{-l_i}}
> \end{align*}
> $
>
> By the [[Kraft-McMillan Inequality|Kraft-McMillan inequality]], $\sum_{i = 1}^{n}r^{-l_i} \le 1$, so $\log_2\paren{\sum_{i = 1}^{n}r^{-l_i}} \le 0$, meaning that:
> $
> \begin{align*}
> H(S) &\le -\sum_{j = 1}^{n}p_j \log_2\paren{r^{-l_j}} \\
> H(S) &\le \sum_{j = 1}^{n}p_j l_j\log_2(r) \\
> H(S) &\le \ev(L)\log_2(r)
> \end{align*}
> $
>
> Equality holds only if both the Gibbs and Kraft-McMillan inequalities are tight, i.e. only if $\sum_{i = 1}^{n}r^{-l_i} = 1$ and $p_j = q_j = r^{-l_j}$ for every $j$, which is exactly when the probability that each code symbol appears is uniform.
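>
> The chain of inequalities can be spot-checked numerically; in this sketch the source distribution and codeword lengths are assumptions chosen for illustration:
>
> ```python
> import math
>
> p = [0.4, 0.3, 0.2, 0.1]   # assumed source distribution
> lengths = [1, 2, 3, 3]     # assumed lengths of a binary prefix-free code
> r = 2
>
> # Auxiliary distribution q_j = r^(-l_j) / sum_i r^(-l_i)
> kraft_sum = sum(r ** -l for l in lengths)
> q = [r ** -l / kraft_sum for l in lengths]
>
> entropy = -sum(pj * math.log2(pj) for pj in p)                    # H(S)
> cross_entropy = -sum(pj * math.log2(qj) for pj, qj in zip(p, q))  # Gibbs bound
> expected_length = sum(pj * l for pj, l in zip(p, lengths))        # E(L)
>
> # H(S) <= -sum_j p_j log2(q_j) <= E(L) log2(r)
> assert entropy <= cross_entropy <= expected_length * math.log2(r)
> ```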
>
> For the right-hand side, construct codewords whose lengths satisfy
> $
> r^{-l_j} \le p_j \lt r^{-l_j + 1}
> $
> This means each codeword conveys at least as much information as its source symbol carries, while a codeword with one fewer symbol would convey too little (so no code symbol is wasted). Summing the left-hand inequality over $j$ gives $\sum_{j = 1}^{n}r^{-l_j} \le \sum_{j = 1}^{n}p_j = 1$, so these lengths satisfy the [[Kraft-McMillan Inequality]] and a prefix-free code with them exists. Taking logarithms (base 2) throughout,
> $
> -l_j \log_2(r) \le \log_2(p_j) \lt (-l_j + 1)\log_2(r)
> $
>
> Then multiply the (strict) right-hand inequality by $p_j$, sum over $j$, and negate:
> $
> \begin{align*}
> H(S) = -\sum_{j = 1}^{n}p_j \log_2(p_j) &\gt \sum_{j = 1}^{n}p_j(l_j - 1)\log_2(r) \\
> H(S) &\gt (\ev(L) - 1)\log_2(r) \\
> \frac{H(S)}{\log_2(r)} &\gt \ev(L) - 1 \\
> \ev(L) &\lt \frac{H(S)}{\log_2(r)} + 1 \\
> \end{align*}
> $
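>
> The construction in the second half of the proof amounts to choosing $l_j = \lceil \log_r(1/p_j) \rceil$. Below is a sketch of that construction; the five-symbol source is an assumption chosen for illustration:
>
> ```python
> import math
>
> def shannon_lengths(probabilities, r=2):
>     # l_j = ceil(-log_r p_j), so that r^(-l_j) <= p_j < r^(-l_j + 1)
>     return [math.ceil(-math.log(p, r)) for p in probabilities]
>
> p = [0.4, 0.3, 0.15, 0.1, 0.05]   # assumed source distribution
> r = 2
> lengths = shannon_lengths(p, r)
>
> assert sum(r ** -l for l in lengths) <= 1   # Kraft-McMillan: a prefix-free code exists
> entropy = -sum(pj * math.log2(pj) for pj in p)
> expected_length = sum(pj * l for pj, l in zip(p, lengths))
> lower_bound = entropy / math.log2(r)
>
> assert lower_bound <= expected_length < lower_bound + 1   # the theorem's bounds
> print(lengths, expected_length)                           # e.g. [2, 2, 3, 4, 5]  ≈ 2.5
> ```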

> [!theorem] Corollary
>
> Given any $\varepsilon \gt 0$ (no matter how small), there exists a prefix-free code such that
> $
> \frac{H(S)}{\log_2(r)} \le \frac{\ev(L^{(m)})}{m} \lt \frac{H(S)}{\log_2(r)} + \varepsilon
> $
>
> *Proof.* Since $\varepsilon \gt 0$, there exists an $m \in \nat$ such that $\frac{1}{m} \lt \varepsilon$. Choose one such $m$ and use a [[Block Coding|block code]] on the $m$-th extension of $S$, which emits blocks of $m$ source symbols and (for a memoryless source) has entropy $H(S^{(m)}) = mH(S)$. By the noiseless coding theorem,
> $
> \begin{align*}
> \frac{H(S^{(m)})}{\log_2(r)} &\le \ev(L^{(m)}) \lt \frac{H(S^{(m)})}{\log_2(r)} + 1 \\
> \frac{mH(S)}{\log_2(r)} &\le \ev(L^{(m)}) \lt \frac{mH(S)}{\log_2(r)} + 1\\
> \frac{H(S)}{\log_2(r)} &\le \frac{\ev(L^{(m)})}{m} \lt \frac{H(S)}{\log_2(r)} + \frac{1}{m}\\
> \frac{H(S)}{\log_2(r)} &\le \frac{\ev(L^{(m)})}{m} \lt \frac{H(S)}{\log_2(r)} + \varepsilon
> \end{align*}
> $
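>
> As a sketch of the corollary in action (the two-symbol source is an assumption for illustration), code the $m$-th extension with the same ceiling construction and check that the per-symbol length stays within $\frac{1}{m}$ of the entropy bound:
>
> ```python
> import math
> from itertools import product
>
> p = {"a": 0.7, "b": 0.3}   # assumed source distribution
> r = 2
> bound = -sum(x * math.log2(x) for x in p.values()) / math.log2(r)   # H(S) / log2(r)
>
> for m in (1, 2, 4, 8):
>     # Probabilities of the m-th extension S^(m): one block per m-symbol string
>     block_probs = [math.prod(p[s] for s in block) for block in product(p, repeat=m)]
>     lengths = [math.ceil(-math.log(q, r)) for q in block_probs]
>     per_symbol = sum(q * l for q, l in zip(block_probs, lengths)) / m
>
>     assert bound <= per_symbol < bound + 1 / m   # the corollary's bounds
>     print(m, per_symbol)                         # approaches bound ≈ 0.881 as m grows
> ```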