> [!definition]
>
> The *conditional entropy* of a [[Random Variable|random variable]] $Y$ *given* $X = x_j$, written $H_j(Y)$, is the amount of [[Entropy|entropy]] (uncertainty) remaining about $Y$ once $X$ is known to take the particular value $x_j$, calculated as
> $
> H_j(Y) = H(\tilde{Y}_j) = -\sum_{k = 1}^{m}p_j(y_k)\log_2(p_j(y_k))
> $
> where $p_j(y_k)$ is the [[Conditional Probability|conditional probability]] $P_{X = x_j}(Y = y_k)$.
> - If $p_j = 0$, so that $p_j(y_k)$ is undefined for all $k$, then the value $x_j$ never occurs and encodes no information about $Y$, meaning that $H_j(Y) = H(Y)$.
>
> Consider the random variable $H_\cdot(Y)$ with range $\{H_1(Y), H_2(Y), \cdots, H_n(Y)\}$ and the [[Probability Distribution|probability distribution]] of $X$, $\{p_1, p_2, \cdots, p_n\}$. The *conditional entropy* of $Y$ *given* $X$, $H_X(Y)$, is the amount of entropy remaining about $Y$ when the value of $X$ is known (whatever that value is): the information content of $Y$ not contained in $X$. It is calculated as
> $
> \begin{align*}
> H_X(Y) &= \ev(H_\cdot(Y)) = \sum_{j = 1}^{n}p_j H_j(Y) \\
> &= -\sum_{j = 1}^{n}p_j \sum_{k = 1}^{m}p_j(y_k)\log_2(p_j(y_k)) \\
> &= -\sum_{j = 1}^{n}\sum_{k = 1}^{m} p_j\, p_j(y_k)\log_2(p_j(y_k))
> \end{align*}
> $

> [!definition]
>
> The conditional entropy of a random variable $Y$ given multiple other random variables $X_1, X_2, \cdots, X_n$ is the remaining entropy of $Y$ after all of the $X_i$ have been measured:
> $
> \begin{align*}
> H(Y|X_1, X_2, \cdots, X_n) &= -\sum_{j, i_*}P(Y = j, X_1 = i_1,\cdots, X_n = i_n) \\
> &\times\log(P(Y = j \mid X_1 = i_1, \cdots, X_n = i_n))
> \end{align*}
> $

> [!theorem]
>
> If $X$ and $Y$ are independent, then
> $
> H_X(Y) = H(Y)
> $
>
> *Proof*. For independent events, the [[Conditional Probability|conditional probability]] satisfies $P_A(B) = P(B)$, so
> $
> p_j(y_k) = P_{X = x_j}(Y = y_k) = P(Y = y_k) = q_k
> $
> Therefore
> $
> \begin{align*}
> H_X(Y) &= -\sum_{j = 1}^{n}\sum_{k = 1}^{m} p_j\, p_j(y_k)\log_2(p_j(y_k)) \\
> &= -\sum_{j = 1}^{n}\sum_{k = 1}^{m} p_j\, q_k\log_2(q_k) \\
> &= -\sum_{k = 1}^{m} q_k\log_2(q_k) \cdot \sum_{j = 1}^{n}p_j \\
> &= -\sum_{k = 1}^{m} q_k\log_2(q_k) \\
> &= H(Y)
> \end{align*}
> $
> where the final simplification uses $\sum_{j = 1}^{n}p_j = 1$.
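
As a concrete check of these formulas, the sketch below computes $H_j(Y)$, $H_X(Y)$, and $H(Y)$ from a small joint distribution table and verifies numerically that $H_X(Y) = H(Y)$ when the joint table is the outer product of its marginals (independence). This is a minimal illustration assuming NumPy; the function names (`entropy`, `conditional_entropy`) and the example numbers are ad hoc, not taken from any particular library.

```python
import numpy as np

def entropy(dist):
    """H(Y) = -sum_k q_k log2(q_k), skipping zero-probability outcomes."""
    nz = dist > 0
    return -np.sum(dist[nz] * np.log2(dist[nz]))

def conditional_entropy(joint):
    """H_X(Y) for a joint table joint[j, k] = P(X = x_j, Y = y_k)."""
    p_x = joint.sum(axis=1)                 # marginals p_j = P(X = x_j)
    h = 0.0
    for j, p_j in enumerate(p_x):
        if p_j == 0:
            continue                        # x_j never occurs: p_j(y_k) undefined, no contribution
        cond = joint[j] / p_j               # p_j(y_k) = P_{X = x_j}(Y = y_k)
        h += p_j * entropy(cond)            # accumulate p_j * H_j(Y)
    return h

# Dependent example: knowing X reduces the uncertainty about Y.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(conditional_entropy(joint))           # ~0.722 bits
print(entropy(joint.sum(axis=0)))           # H(Y) = 1 bit, so H_X(Y) < H(Y)

# Independent example: the joint table factors as the outer product of the marginals.
p, q = np.array([0.3, 0.7]), np.array([0.25, 0.75])
print(conditional_entropy(np.outer(p, q)))  # ~0.811 bits
print(entropy(q))                           # same value: H_X(Y) = H(Y)
```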