> [!definition]
>
> Let $X$ and $Y$ be two random [[Variable|variables]] whose ranges are of the same size $n$; their relative [[Entropy|entropy]], or [[Information|information]]-theoretic distance, is a measure of the difference between them. Writing $p_j$ for the probabilities taken by $X$ and $q_j$ for those taken by $Y$, define the relative entropy $D(X, Y)$ of $Y$ from $X$[^1] by
> $
> D(X, Y) = \sum_{j = 1}^{n}p_j \log_2(p_j) - \sum_{j = 1}^{n}p_j \log_2(q_j)
> = \sum_{j = 1}^{n}p_j \log_2\paren{\frac{p_j}{q_j}}
> $
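
As a quick numerical sketch of the definition (Python, with hypothetical distributions and a helper `relative_entropy` written here only for illustration):

```python
from math import log2

def relative_entropy(p, q):
    # D(X, Y) = sum_j p_j * log2(p_j / q_j); terms with p_j = 0 contribute nothing
    return sum(pj * log2(pj / qj) for pj, qj in zip(p, q) if pj > 0)

# Hypothetical distributions over a range of size n = 3
p = [0.7, 0.2, 0.1]   # distribution of X
q = [0.2, 0.3, 0.5]   # distribution of Y
print(relative_entropy(p, q))  # ~0.916
print(relative_entropy(q, p))  # ~0.975 -- not symmetric (see footnote)
```
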
> [!theorem]
>
> $D(X, Y) \ge 0$ with equality if and only if $X$ and $Y$ are identically [[Probability Distribution|distributed]].
>
> *Proof*. By [[Gibbs Inequality]], which holds with equality if and only if $p_j = q_j$ for all $j$,
> $
> \begin{align*}
> -\sum_{j = 1}^{n}p_j \log_2(p_j) &\le -\sum_{j =1}^{n}p_j \log_2(q_j) \\
> 0 &\le \sum_{j = 1}^{n}p_j \log_2(p_j) - \sum_{j =1}^{n}p_j \log_2(q_j) \\
> D(X, Y) &\ge 0
> \end{align*}
> $
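
A small sanity check of the theorem, reusing the same hypothetical `relative_entropy` helper: randomly generated distributions never give a negative relative entropy, and a distribution measured against itself gives exactly $0$.

```python
import random
from math import log2

def relative_entropy(p, q):
    return sum(pj * log2(pj / qj) for pj, qj in zip(p, q) if pj > 0)

def random_dist(n):
    # normalise random weights into a probability distribution
    w = [random.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

for _ in range(1000):
    p, q = random_dist(5), random_dist(5)
    assert relative_entropy(p, q) >= 0    # never negative
assert relative_entropy(p, p) == 0        # zero when the distributions coincide
```
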
> [!theorem]
>
> If $Y$ has a [[Uniform Distribution|uniform distribution]], so that $q_j = \frac{1}{n}$ for every $j$, then
> $
> D(X, Y) = \log_2(n) - H(X)
> $
>
> *Proof*.
> $
> \begin{align*}
> D(X, Y) &= \sum_{j = 1}^{n}p_j \log_2(p_j) - \sum_{j = 1}^{n}p_j \log_2(q_j) \\
> &= -H(X) - \sum_{j = 1}^{n}p_j \log_2\paren{\frac{1}{n}} \\
> &= -H(X) - \log_2\paren{\frac{1}{n}}\sum_{j = 1}^{n}p_j \\
> &= -H(X) + \log_2(n) \\
> &= \log_2(n) - H(X)
> \end{align*}
> $
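
A short numerical check of this identity, again only a sketch with a hypothetical distribution for $X$ (`entropy` is $H(X)$ written out directly):

```python
from math import log2

def entropy(p):
    # H(X) = -sum_j p_j * log2(p_j)
    return -sum(pj * log2(pj) for pj in p if pj > 0)

def relative_entropy(p, q):
    return sum(pj * log2(pj / qj) for pj, qj in zip(p, q) if pj > 0)

p = [0.5, 0.25, 0.125, 0.125]     # hypothetical distribution of X, n = 4
n = len(p)
uniform = [1 / n] * n             # Y uniform on a range of the same size
print(relative_entropy(p, uniform))   # 0.25
print(log2(n) - entropy(p))           # 0.25 = log2(4) - H(X), with H(X) = 1.75
```
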
> [!theorem]
>
> Let $W$ be the random [[Vector|vector]] $(X, Y)$, whose distribution is the [[Joint Distribution|joint distribution]] $p_{jk}$ of $X$ and $Y$, and let $Z$ be the random vector whose distribution is the joint distribution $X$ and $Y$ would have if they were independent, $q_{jk} = p_j q_k$, where $p_j$ and $q_k$ are the marginal distributions of $X$ and $Y$. Then the relative entropy of $Z$ from $W$ is the [[Mutual Information|mutual information]] of $X$ and $Y$.
> $
> D(W, Z) = I(X, Y)
> $
>
> *Proof.*
> $
> \begin{align*}
> D(W, Z) &= \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk} \log_2(p_{jk})
> - \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk} \log_2(q_{jk}) \\
> &= \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk} \log_2(p_{jk})
> - \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk} \log_2(p_jq_k) \\
> &= -H(X, Y)
> - \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk} \log_2(p_j)
> - \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk} \log_2(q_k)\\
> &= -H(X, Y) + H(X) + H(Y) && \text{since } \sum_{k}p_{jk} = p_j \text{ and } \sum_{j}p_{jk} = q_k\\
> &= - H_X(Y) - H(X) + H(X) + H(Y) && \text{since } H(X, Y) = H(X) + H_X(Y)\\
> &= H(Y) - H_X(Y) = I(X, Y)
> \end{align*}
> $
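
A numerical sketch of this last result, assuming a hypothetical $2 \times 2$ joint distribution: it evaluates $D(W, Z)$ directly and compares it with $I(X, Y) = H(X) + H(Y) - H(X, Y)$.

```python
from math import log2

def entropy(p):
    return -sum(x * log2(x) for x in p if x > 0)

# Hypothetical 2 x 2 joint distribution p_jk of W = (X, Y)
joint = [[0.4, 0.1],
         [0.2, 0.3]]
px = [sum(row) for row in joint]            # marginal p_j of X
py = [sum(col) for col in zip(*joint)]      # marginal q_k of Y

# D(W, Z), where Z has the independent product distribution q_jk = p_j * q_k
d_wz = sum(pjk * log2(pjk / (px[j] * py[k]))
           for j, row in enumerate(joint)
           for k, pjk in enumerate(row) if pjk > 0)

# I(X, Y) computed from entropies: H(X) + H(Y) - H(X, Y)
mi = entropy(px) + entropy(py) - entropy([pjk for row in joint for pjk in row])
print(d_wz, mi)   # both ~0.1245
```
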
[^1]: This measure is not symmetrical! In general, $D(X, Y) \ne D(Y, X)$.