> [!note] Between Two Events
>
> $
> I(E, F) = I(E) - I_F(E) = -\log_2(P(E)) + \log_2(P_{F}(E))
> $
> The mutual information of two [[Event|events]] $E$ and $F$ is the [[Information|information]] content about one of them contained in the other, calculated as the information of one minus its [[Conditional Probability|conditional]] information given the other.
>
> Mutual information between events may be negative, in which case conditioning on one event makes the other less likely, and the pattern between them reveals information about the "noise" that distorts them.
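>
> For example, roll a fair die and let $E$ be the event that the roll is even, so $P(E) = \frac{1}{2}$. Conditioning on $F = \{4, 5, 6\}$ raises this probability to $P_F(E) = \frac{2}{3}$, while conditioning on $F' = \{1, 2, 3\}$ lowers it to $P_{F'}(E) = \frac{1}{3}$:
> $
> \begin{align*}
> I(E, F) &= -\log_2\paren{\frac{1}{2}} + \log_2\paren{\frac{2}{3}} = \log_2\paren{\frac{4}{3}} \approx 0.415 \\
> I(E, F') &= -\log_2\paren{\frac{1}{2}} + \log_2\paren{\frac{1}{3}} = \log_2\paren{\frac{2}{3}} \approx -0.585
> \end{align*}
> $
> so the second pair carries negative mutual information.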

> [!note] Between Two Random Variables
>
> $
> I(X, Y) = H(Y) - H_X(Y)
> $
>
> The mutual information of two [[Random Variable|random variables]] $X$ and $Y$ is the [[Information|information]] content about one of them contained in the other, calculated as the [[Entropy|entropy]] of one minus its [[Conditional Entropy|conditional entropy]] given the other.
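>
> For example, if $X$ is a fair coin flip and $Y = X$, then knowing $X$ removes all uncertainty about $Y$:
> $
> I(X, Y) = H(Y) - H_X(Y) = 1 - 0 = 1 \text{ bit}
> $
> whereas if $Y$ is a second, independent fair coin flip, $H_X(Y) = H(Y) = 1$ and $I(X, Y) = 0$.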

> [!theorem]
>
> Let $X$ and $Y$ be two random variables with [[Probability Distribution|probability distributions]] $\list{p}{n}$ and $\list{q}{m}$, outcomes $\list{a}{n}$ and $\list{b}{m}$, and [[Joint Distribution|joint]] probabilities $p_{jk} = P(X = a_j, Y = b_k)$. Then for each $1 \le j \le n$, $1 \le k \le m$:
> $
> \begin{align*}
> I(a_j, b_k) &= \log_2\paren{\frac{p_{jk}}{p_jq_k}} \\
> I(a_j, b_k) &= I(b_k, a_j) \\
> I(X, Y) &= \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}I(a_j, b_k)
> \end{align*}
> $
>
> *Proof*. Write $p_j(k) = P_{X = a_j}(Y = b_k)$ for the conditional probability, which can be calculated from the [[Joint Distribution|joint]] probability as $p_j(k) = \frac{p_{jk}}{p_j}$, meaning that
> $
> \begin{align*}
> I(b_k, a_j) &= I(Y = b_k) - I_{X = a_j}(Y = b_k) \\
> &= -\log_2(q_k) + \log_2(p_{j}(k)) \\
> &= \log_2\paren{\frac{p_j(k)}{q_k}} = \log_2\paren{\frac{p_{jk}}{p_jq_k}}
> \end{align*}
> $
> The same applies to $q_k(j) = P_{Y = b_k}(X = a_j) = \frac{p_{jk}}{q_{k}}$, meaning that
> $
> \begin{align*}
> I(a_j, b_k) &= I(X = a_j) - I_{Y = b_k}(X = a_j) \\
> &= -\log_2(p_j) + \log_2(q_k(j)) \\
> &= \log_2\paren{\frac{q_k(j)}{p_j}} \\
> &= \log_2\paren{\frac{p_{jk}}{p_j q_k}} = I(b_k, a_j)
> \end{align*}
> $
> This proves the first two identities. For the third, by the definition of mutual information between random variables,
> $
> \begin{align*}
> I(X, Y) &= H(Y) - H_X(Y) \\
> &= -\sum_{k = 1}^{m}q_k \log_2(q_k) +
> \sum_{j = 1}^{n}p_j\sum_{k = 1}^{m}p_{j}(k)\log_2(p_{j}(k))\\
> &= -\sum_{j = 1}^{n}p_j\sum_{k = 1}^{m}q_k \log_2(q_k) +
> \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}\log_2(p_{j}(k))\\
> &= -\sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}\log_2(q_k) +
> \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}\log_2(p_{j}(k))\\
> &=
> \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}\log_2\paren{\frac{p_{j}(k)}{q_k}}\\
> &=
> \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}I(a_j, b_k)
> \end{align*}
> $
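>
> As a concrete check of the last identity, consider binary $X$ and $Y$ with joint probabilities $p_{11} = p_{22} = \frac{3}{8}$ and $p_{12} = p_{21} = \frac{1}{8}$, so that every marginal probability equals $\frac{1}{2}$:
> $
> I(X, Y) = 2 \cdot \frac{3}{8}\log_2\paren{\frac{3/8}{1/4}} + 2 \cdot \frac{1}{8}\log_2\paren{\frac{1/8}{1/4}} = \frac{3}{4}\log_2\paren{\frac{3}{2}} - \frac{1}{4} \approx 0.189 \text{ bits}
> $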

> [!theorem]
>
> $I(E, F) = 0$ if and only if $E$ and $F$ are independent. $I(X, Y) \ge 0$, with equality if and only if $X$ and $Y$ are [[Probabilistic Independence|independent]].
>
> *Proof*.
>
> $E$ and $F$ are independent exactly when $P_F(E) = P(E)$, which holds if and only if $I_F(E) = I(E)$, that is, $I(E, F) = I(E) - I_F(E) = 0$.
>
> Since $H(Y) \ge H_X(Y)$, with equality if and only if $X$ and $Y$ are independent, $I(X, Y) = H(Y) - H_X(Y) \ge 0$, and $I(X, Y) = 0$ exactly when $X$ and $Y$ are independent.
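>
> In the die example above, taking $E$ = "the roll is even" and $F = \{5, 6\}$ gives $P_F(E) = \frac{1}{2} = P(E)$, so $I(E, F) = 0$; correspondingly $P(E \cap F) = \frac{1}{6} = P(E)P(F)$, so the two events are independent.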