By the [[Noiseless Coding Theorem|noiseless coding theorem]], [[Extrema|minimising]] the [[Expectation|average]] codeword length $\ev(L)$ so that it is close to $\frac{H(S)}{\log_2(r)}$ allows the [[Code|code]] to carry the most [[Information|information]] per symbol. Instead of coding the source alphabet $S$ directly, this technique codes its $m$-*th extension* $S^{(m)}$, which comprises all the source letters taken $m$ at a time, and then creates a prefix-free code for that extension. For example, for a three-letter source:
$
\begin{align*}
S^{(1)} &= \{A, B, C\} \\
S^{(2)} &= \{AA, AB, AC, BA, BB, BC, CA, CB, CC\} \\
&\cdots
\end{align*}
$
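To make the construction concrete, here is a minimal Python sketch; the helper name `extension` and the probabilities are illustrative, not from the source, and the block probabilities rely on the independence assumption introduced below.
```python
# Minimal sketch: enumerate the m-th extension S^(m) of a source alphabet,
# assigning each block the product of its letters' probabilities (this uses
# the independence assumption stated below).  Names and values are illustrative.
from itertools import product

def extension(probs, m):
    """Map each block of m source letters to its source probability."""
    ext = {}
    for letters in product(probs, repeat=m):
        p = 1.0
        for c in letters:
            p *= probs[c]
        ext["".join(letters)] = p
    return ext

source = {"A": 0.5, "B": 0.3, "C": 0.2}   # hypothetical probabilities for S = {A, B, C}
print(sorted(extension(source, 2)))        # the 9 blocks AA, AB, ..., CC
```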
Let $S_j$ be the [[Random Variable|random variable]] that represents the $j$-th symbol in a block, with range $\list{a}{n}$. Take $\list{S}{m}$ to be [[Probabilistic Independence|independent]], identically [[Probability Distribution|distributed]] random variables, so that the source probability of a block $c_1 c_2 \cdots c_m$ is $p_1 p_2 \cdots p_m$, where $p_j$ is the source probability of the letter $c_j$.
Let $L^{(m)}$ be the random variable whose value is the length of the codeword assigned to a block of $S^{(m)}$. Since symbols are coded in groups of $m$, $\frac{\ev(L^{(m)})}{m}$ is the average codeword length per source symbol.
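One way to compute $\frac{\ev(L^{(m)})}{m}$ is sketched below, assuming binary codewords ($r = 2$) and Huffman coding as the prefix-free code; the text does not prescribe a particular construction, and the helper names are illustrative.
```python
# Sketch: expected codeword length per symbol, E(L^(m))/m, for a binary Huffman
# (prefix-free) code on the m-th extension.  Huffman coding is one choice of
# prefix-free code; the section does not prescribe a construction.
import heapq

def huffman_lengths(probs):
    """Codeword length of each symbol under a binary Huffman code for `probs`."""
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, i, group2 = heapq.heappop(heap)
        for sym in group1 + group2:        # every merge adds one digit to these codewords
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, i, group1 + group2))
    return lengths

def per_symbol_length(block_probs, m):
    """E(L^(m)) / m: expected codeword length divided by the block size m."""
    lengths = huffman_lengths(block_probs)
    return sum(p * lengths[b] for b, p in block_probs.items()) / m

# Second extension of the hypothetical source S = {A, B, C}.
source = {"A": 0.5, "B": 0.3, "C": 0.2}
s2 = {a + b: source[a] * source[b] for a in source for b in source}
print(per_symbol_length(s2, 2))
```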
Applying the noiseless coding theorem to $S^{(m)}$, and using $H(S^{(m)}) = m H(S)$ for independent letters, gives $\frac{H(S)}{\log_2(r)} \le \frac{\ev(L^{(m)})}{m} < \frac{H(S)}{\log_2(r)} + \frac{1}{m}$, so the average codeword length per symbol approaches the optimal value as $m$ becomes large. However, the extension has $n^m$ blocks, so block coding quickly becomes impractical as the block size grows.
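Continuing the sketches above (this reuses the hypothetical `extension` and `per_symbol_length` helpers), the per-symbol length can be tabulated against the entropy bound for a binary code, while the number of blocks that must be coded grows as $3^m$.
```python
# Usage sketch, reusing extension() and per_symbol_length() from above:
# E(L^(m))/m closes in on H(S) (the binary-code optimum) as m grows, but the
# number of blocks that must be coded grows exponentially (3**m here).
from math import log2

source = {"A": 0.5, "B": 0.3, "C": 0.2}
H = -sum(p * log2(p) for p in source.values())   # H(S), roughly 1.485 bits/symbol
for m in range(1, 5):
    ext = extension(source, m)                   # 3**m blocks
    print(m, len(ext), round(per_symbol_length(ext, m), 4), round(H, 4))
```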