> [!note] Between Two Events
>
> $
> I(E, F) = I(E) - I_F(E) = -\log_2(P(E)) + \log_2(P_{F}(E))
> $
>
> The mutual information of two [[Event|events]] $E$ and $F$ is the [[Information|information]] content about one of them contained in the other, calculated as the information of one minus its [[Conditional Probability|conditional]] information given the other.
>
> Mutual information between events may be negative, in which case the pattern between them reveals information about the "noise" that distorts them.

> [!note] Between Two Random Variables
>
> $
> I(X, Y) = H(Y) - H_X(Y)
> $
>
> The mutual information of two [[Random Variable|random variables]] $X$ and $Y$ is the [[Information|information]] content about one of them contained in the other, calculated as the [[Entropy|entropy]] of one minus its [[Conditional Entropy|conditional entropy]] given the other.

> [!theorem]
>
> Let $X$ and $Y$ be two random variables with [[Probability Distribution|probability distributions]] $\list{p}{n}$ and $\list{q}{m}$, outcomes $\list{a}{n}$ and $\list{b}{m}$, and [[Joint Distribution|joint]] probabilities $p_{jk} = P(X = a_j, Y = b_k)$. Then for each $1 \le j \le n, 1 \le k \le m$:
> $
> \begin{align*}
> I(a_j, b_k) &= \log_2\paren{\frac{p_{jk}}{p_jq_k}} \\
> I(a_j, b_k) &= I(b_k, a_j) \\
> I(X, Y) &= \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}I(a_j, b_k)
> \end{align*}
> $
>
> *Proof*. The conditional probability can be calculated from the joint probability: writing $p_{j}(k) = P_{X = a_j}(Y = b_k) = \frac{p_{jk}}{p_j}$,
> $
> \begin{align*}
> I(a_j, b_k) &= I(Y = b_k) - I_{X = a_j}(Y = b_k) \\
> &= -\log_2(q_k) + \log_2(p_{j}(k)) \\
> &= \log_2\paren{\frac{p_j(k)}{q_k}} = \log_2\paren{\frac{p_{jk}}{p_jq_k}}
> \end{align*}
> $
> The same applies to $q_k(j) = P_{Y = b_k}(X = a_j) = \frac{p_{jk}}{q_{k}}$, meaning that
> $
> \begin{align*}
> I(a_j, b_k) &= \log_2\paren{\frac{p_{jk}}{p_j q_k}} \\
> &= \log_2\paren{\frac{q_k(j)}{p_j}} \\
> &= -\log_2(p_j) + \log_2(q_k(j)) \\
> &= I(X = a_j) - I_{Y = b_k}(X = a_j) = I(b_k, a_j)
> \end{align*}
> $
> Using the definition of mutual information, together with $p_j p_{j}(k) = p_{jk}$, $\sum_{j = 1}^{n}p_j = 1$, and $\sum_{j = 1}^{n}p_{jk} = q_k$,
> $
> \begin{align*}
> I(X, Y) &= H(Y) - H_X(Y) \\
> &= -\sum_{k = 1}^{m}q_k \log_2(q_k) +
> \sum_{j = 1}^{n}p_j\sum_{k = 1}^{m}p_{j}(k)\log_2(p_{j}(k))\\
> &= -\sum_{j = 1}^{n}p_j\sum_{k = 1}^{m}q_k \log_2(q_k) +
> \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}\log_2(p_{j}(k))\\
> &= -\sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}\log_2(q_k) +
> \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}\log_2(p_{j}(k))\\
> &=
> \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}\log_2\paren{\frac{p_{j}(k)}{q_k}}\\
> &=
> \sum_{j = 1}^{n}\sum_{k = 1}^{m}p_{jk}I(a_j, b_k)
> \end{align*}
> $

> [!theorem]
>
> $I(E, F) = 0$ if and only if $E$ and $F$ are independent. $I(X, Y) \ge 0$ with equality if and only if $X$ and $Y$ are [[Probabilistic Independence|independent]].
>
> *Proof*.
>
> When $E$ and $F$ are independent, $P(E) = P_F(E)$, so $I(E) = I_F(E)$ and $I(E, F) = 0$. Conversely, if $I(E, F) = 0$, then $\log_2(P_F(E)) = \log_2(P(E))$, so $P_F(E) = P(E)$ and $E$ and $F$ are independent.
>
> Since $H(Y) \ge H_X(Y)$, $I(X, Y) = H(Y) - H_X(Y) \ge 0$, and $H(Y) = H_X(Y)$ exactly when $X$ and $Y$ are independent, so $I(X, Y) = 0$ if and only if $X$ and $Y$ are independent.
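
> [!example]
> A concrete check of the theorem, using an illustrative joint distribution chosen here (not from the source): let $X$ and $Y$ each take two outcomes, with $p_{11} = p_{22} = \frac{3}{8}$ and $p_{12} = p_{21} = \frac{1}{8}$, so that $p_1 = p_2 = q_1 = q_2 = \frac{1}{2}$. By the symmetry of this distribution, $I(a_2, b_2) = I(a_1, b_1)$ and $I(a_2, b_1) = I(a_1, b_2)$, so
> $
> \begin{align*}
> I(a_1, b_1) &= \log_2\paren{\frac{3/8}{1/4}} = \log_2\paren{\frac{3}{2}} \approx 0.585 &
> I(a_1, b_2) &= \log_2\paren{\frac{1/8}{1/4}} = -1 \\
> I(X, Y) &= 2 \cdot \frac{3}{8}\log_2\paren{\frac{3}{2}} + 2 \cdot \frac{1}{8} \cdot (-1) \approx 0.189 &
> H(Y) - H_X(Y) &= 1 - \paren{\frac{3}{4}\log_2\frac{4}{3} + \frac{1}{4}\log_2 4} \approx 0.189
> \end{align*}
> $
> The event-level term $I(a_1, b_2) = -1$ bit is negative even though $I(X, Y) > 0$, and the entropy difference $H(Y) - H_X(Y)$ agrees with the sum $\sum_{j, k} p_{jk}\,I(a_j, b_k)$, as the theorem states.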