> [!note] Code > > A code is a mapping from a source [[Set|alphabet]] to a *codeword*, a succession of one or multiple *code symbols*. A code message is a succession of one or more codewords. > > Ideally, the codes must have the following properties: > 1. Each symbol of the source alphabet should be mapped to a unique codeword. > 2. The code should be *uniquely decodable*: any finite code message should correspond to one and only one message. > 3. Instantaneous (prefix-free): let $c_1 c_2 c_3 \cdots c_n$ be a codeword, and suppose that for some $k \le n$, $c_1 c_2 \cdots c_k$ is also a codeword, then the latter is a *prefix* of the former. > - The possibility of prefix words means that the receiver must wait for future code symbols before decoding. > - The lack of prefix words allows the receiver to instantly decipher the code. > [!example] Binary Code > > $ C = \{0, 1\} $ > [!example] Genetic Code > > The genetic code maps 20 different amino acids and the *stop* command to 64 codewords (3 base pairs) made up of the code alphabet $\{A, T, C, G\}$. Each code message encodes a specific protein. > [!theorem] Lemma > > Any prefix-free code is uniquely decodable. > > *Proof*. By contradiction. > > Let $d_1 d_2 \cdots d_n$ be a code message composed of codewords $d_1, d_2, \cdots, d_n$. Suppose that the ambiguity arises via the combination $d_{k - 1}d_k d_{k + 1}$ ($2 \le k \le n - 1$). If $d_{k - 1}d_k d_{k + 1}$ is ambiguous, then $d_{k - 1}d_k d_{k + 1}$ must be a codeword, or that $d_{k - 1}c_{k1} c_{k2} c_{k3} \cdots$ is a codeword. In any case, $d_{k - 1}$ would be a prefix, meaning that prefix-free codes are uniquely decodable.