TODO: Bounded Additive Adjoint shenanigans (once I have enough background).
> [!definition]
>
> Let $(\Omega, \cf, \bp)$ be a [[Probability|probability space]], $\mathcal{G} \subset \cf$ be a [$\sigma$-field](Sigma%20Algebra), and $X$ be a [[Random Variable|random variable]]. If $Xd\bp$, i.e. the set function $E \mapsto \int_E Xd\bp$, is a [[Signed Measure|signed measure]], then the **conditional expectation of** $X$ **with respect to** $\mathcal{G}$, denoted by $\ev(X|\mathcal{G})$, is a $\mathcal{G}$-[[Measurable Function|measurable]] map such that
> $
> \int_E Xd\bp = \int_E \ev(X|\mathcal{G})d\bp \quad \forall E \in \mathcal{G}
> $
> which is unique $\bp|_{\mathcal{G}}$-[[Almost Everywhere|a.e.]]
>
> *Proof*. Write $\nu(E) = \int_E Xd\bp$ for $E \in \mathcal{G}$. Since $\nu$ is a signed measure on $\mathcal{G}$ and is [[Absolute Continuity|absolutely continuous]] with respect to $\bp|_\mathcal{G}$, by the [[Lebesgue-Radon-Nikodym Theorem|Radon-Nikodym theorem]], $\ev(X|\mathcal{G}) = \frac{d\nu}{d\bp|_{\mathcal{G}}}$ is the desired function, and it is unique up to $\bp|_{\mathcal{G}}$-a.e. equality.
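As a concrete sanity check of the defining property (not part of the definition above), here is a minimal numerical sketch, assuming a finite sample space and a $\sigma$-field $\mathcal{G}$ generated by a finite partition; in that case $\ev(X|\mathcal{G})$ is the $\bp$-weighted average of $X$ over each partition block. All numbers below are hypothetical.

```python
import numpy as np

# Finite sample space {0, ..., 5} with probability weights p, a random variable X,
# and a partition of the sample space generating the sigma-field G (all values hypothetical).
p = np.array([0.1, 0.2, 0.1, 0.25, 0.15, 0.2])
X = np.array([1.0, -2.0, 3.0, 0.5, 4.0, -1.0])
partition = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]

# E[X|G] is constant on each block: the P-weighted average of X over that block.
cond_exp = np.empty_like(X)
for block in partition:
    cond_exp[block] = np.dot(p[block], X[block]) / p[block].sum()

# Defining property: the integrals of X and of E[X|G] agree on every block, hence on all of G.
for block in partition:
    assert np.isclose(np.dot(p[block], X[block]), np.dot(p[block], cond_exp[block]))
```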
The conditional expectation inherits the following properties from the corresponding results for integrals of measurable functions (a numerical illustration of the last one follows the list):
- **Monotone Convergence**: If $0 \le X_n \upto X$ almost surely, then $\ev\bracs{X_n|\mathcal{G}} \upto \ev\bracs{X|\mathcal{G}}$ almost surely.
- **Fatou's Lemma**: If $X_n \ge 0$ for all $n \in \nat$, then $\ev\bracs{\liminf X_n|\mathcal G} \le \liminf \ev\bracs{X_n|\mathcal{G}}$ almost surely.
- **Dominated Convergence:** If $X_n \to X$ almost surely and $\abs{X_n} \le Y$ almost surely for all $n$, where $Y$ is integrable, then $\ev\bracs{X_n | \mathcal{G}} \to \ev\bracs{X|\mathcal G}$ almost surely.
- **Jensen's Inequality:** If $\varphi: \real \to \real$ is convex and $\varphi(X)$ is integrable, then $\varphi(\ev\bracs{X|\mathcal G})\le\ev\bracs{\varphi(X)|\mathcal G}$ almost surely.
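A quick numerical illustration of the conditional Jensen inequality in the same finite-partition setting (a sketch with hypothetical numbers, using the convex function $\varphi(t) = t^2$):

```python
import numpy as np

p = np.array([0.1, 0.2, 0.1, 0.25, 0.15, 0.2])     # hypothetical probabilities
X = np.array([1.0, -2.0, 3.0, 0.5, 4.0, -1.0])      # hypothetical values of X
partition = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]  # blocks generating G

def cond_exp(values):
    """E[values | G] for the sigma-field G generated by the partition."""
    out = np.empty_like(values)
    for block in partition:
        out[block] = np.dot(p[block], values[block]) / p[block].sum()
    return out

phi = np.square  # a convex function

# Conditional Jensen: phi(E[X|G]) <= E[phi(X)|G] pointwise (i.e. almost surely).
assert np.all(phi(cond_exp(X)) <= cond_exp(phi(X)) + 1e-12)
```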
> [!theorem]
>
> Let $X$ and $Y$ be discrete [[Random Variable|random variables]], $\seq{x_i}$, $\seq{y_i}$ be such that $\sum_{i \in \nat}P(X = x_i) = \sum_{i \in \nat}P(Y = y_i) = 1$, and $g: \real \to \real$.
>
> Suppose that $P(X = x) > 0$. Then the **[[Conditional Probability|conditional]] [[Expectation|expectation]]** of $g(Y)$ given $X = x$ is
> $
> \ev(g(Y)|X = x) = \sum_{i \in \nat}g(y_i)P(Y = y_i|X = x)
> $
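A small sketch of the discrete formula, computed from a hypothetical joint pmf of $(X, Y)$ (the values and probabilities below are made up for illustration):

```python
import numpy as np

# Hypothetical joint pmf of (X, Y): rows indexed by values of X, columns by values of Y.
x_vals = np.array([0, 1, 2])
y_vals = np.array([-1.0, 0.0, 2.0])
joint = np.array([[0.10, 0.05, 0.15],
                  [0.20, 0.10, 0.10],
                  [0.05, 0.15, 0.10]])   # entries P(X = x_i, Y = y_j); they sum to 1

def cond_exp_g_given_x(g, x):
    """E[g(Y) | X = x] = sum_j g(y_j) P(Y = y_j | X = x)."""
    i = int(np.where(x_vals == x)[0][0])
    p_x = joint[i].sum()                 # P(X = x), assumed > 0
    return np.dot(g(y_vals), joint[i] / p_x)

print(cond_exp_g_given_x(np.square, 1))  # E[Y^2 | X = 1] = 1.5 for these numbers
```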
> [!definition]
>
> Let $X$ and $Y$ be discrete random variables. The **conditional expectation of $Y$ given $X$**, denoted $\ev_X(Y)$, is the random variable[^1] that takes the value $\ev(Y|X = x_i)$ on the event $(X = x_i)$.[^3]
>
> *Note.* $P(\ev_X(Y) = \ev(Y|X = x_i)) = P(X = x_i)$.
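Concretely, $\ev_X(Y)$ can be tabulated from a joint pmf: it is the function of $X$ taking the value $\ev(Y|X = x_i)$ on the event $(X = x_i)$. A sketch with the same hypothetical joint pmf as above:

```python
import numpy as np

# Same hypothetical joint pmf P(X = x_i, Y = y_j) as above.
x_vals = np.array([0, 1, 2])
y_vals = np.array([-1.0, 0.0, 2.0])
joint = np.array([[0.10, 0.05, 0.15],
                  [0.20, 0.10, 0.10],
                  [0.05, 0.15, 0.10]])

p_x = joint.sum(axis=1)                  # P(X = x_i)
e_y_given_x = joint @ y_vals / p_x       # E[Y | X = x_i] for each i

# E_X(Y) is the random variable equal to e_y_given_x[i] on the event (X = x_i),
# so P(E_X(Y) = E[Y | X = x_i]) = P(X = x_i).
for xi, val, prob in zip(x_vals, e_y_given_x, p_x):
    print(f"on (X = {xi}): E_X(Y) = {val:+.3f}, with probability {prob:.2f}")
```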
> [!theorem]
>
> Let $X$ and $Y$ be discrete random variables as in the definition above. Then
> $
> \ev(\ev_X(Y)) = \ev(Y)
> $
> *Proof*. Assume for simplicity that $X$ takes the values $x_1, \dots, x_n$ and $Y$ the values $y_1, \dots, y_m$, write $p_j = P(X = x_j)$, $p_j(y_k) = P(Y = y_k|X = x_j)$, and $\ev_j(Y) = \ev(Y|X = x_j)$, and let $S$ denote the sample space. Since a [[Probability Distribution|probability distribution]] is a [[Measure Space|measure]], a sum of probabilities of pairwise disjoint events can be converted into the probability of their union.
> $
> \begin{align*}
> \ev(\ev_X(Y)) &= \sum_{j = 1}^{n}\ev_j(Y)p_j \\
> &= \sum_{j = 1}^{n}\sum_{k = 1}^{m}y_k p_j(y_k) p_j \\
> &= \sum_{k = 1}^{m}\sum_{j = 1}^{n}y_k p_j(y_k) p_j \\
> &= \sum_{k = 1}^{m}y_k\sum_{j = 1}^{n} p_j(y_k) p_j \\
> &= \sum_{k = 1}^{m}y_k\sum_{j = 1}^{n}P_{(X = x_j)}(Y = y_k) P(X = x_j) \\
> &= \sum_{k = 1}^{m}y_k\sum_{j = 1}^{n}\frac{P((X = x_j) \cap (Y = y_k))}{P(X = x_j)} P(X = x_j) \\
> &= \sum_{k = 1}^{m}y_k\sum_{j = 1}^{n}P((X = x_j) \cap (Y = y_k)) \\
> &= \sum_{k = 1}^{m}y_k P\paren{\bigcup_{j = 1}^{n} \bracs{(X = x_j) \cap (Y = y_k)}} \\
> &= \sum_{k = 1}^{m}y_k P\paren{(Y = y_k) \cap \bigcup_{j = 1}^{n} (X = x_j)} \\
> &= \sum_{k = 1}^{m}y_k P\paren{(Y = y_k) \cap S} \\
> &= \sum_{k = 1}^{m}y_k P\paren{Y = y_k} \\
> &= \ev(Y)
> \end{align*}
> $
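A numerical check of the theorem on the same hypothetical joint pmf (not a substitute for the proof):

```python
import numpy as np

# Same hypothetical joint pmf as above.
x_vals = np.array([0, 1, 2])
y_vals = np.array([-1.0, 0.0, 2.0])
joint = np.array([[0.10, 0.05, 0.15],
                  [0.20, 0.10, 0.10],
                  [0.05, 0.15, 0.10]])

p_x = joint.sum(axis=1)                  # P(X = x_i)
p_y = joint.sum(axis=0)                  # P(Y = y_k)
e_y_given_x = joint @ y_vals / p_x       # E[Y | X = x_i]

# E(E_X(Y)) = sum_i E[Y | X = x_i] P(X = x_i) should equal E(Y) = sum_k y_k P(Y = y_k).
assert np.isclose(np.dot(e_y_given_x, p_x), np.dot(y_vals, p_y))
```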
> [!theorem]
>
> Let $X, Y$ be discrete random variables and $\seq{x_i}$, $\seq{y_i}$ be such that $\sum_{i \in \nat}P(X = x_i) = \sum_{i \in \nat}P(Y = y_i) = 1$. Then for any $g: \real^2 \to \real$ and any $x$ with $P(X = x) > 0$,
> $
> \ev(g(X, Y)|X = x) = \ev(g(x, Y)|X = x)
> $
> *Proof*. On the event $(X = x)$ we have $g(X, Y) = g(x, Y)$, so the two conditional distributions coincide:
> $
> \begin{align*}
> \ev(g(X, Y)|X = x) &= \sum_{z}zP(g(X, Y) = z|X = x) \\
> &= \sum_{z}zP(g(x, Y) = z|X = x) \\
> &= \ev(g(x, Y)|X = x)
> \end{align*}
> $
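A sketch of the substitution rule on the same hypothetical joint pmf: the left-hand side conditions the full joint law of $(X, Y)$ on the event $(X = x)$, while the right-hand side plugs in the observed value $x$ first.

```python
import numpy as np

# Same hypothetical joint pmf as above.
x_vals = np.array([0, 1, 2])
y_vals = np.array([-1.0, 0.0, 2.0])
joint = np.array([[0.10, 0.05, 0.15],
                  [0.20, 0.10, 0.10],
                  [0.05, 0.15, 0.10]])

def g(x, y):                              # an arbitrary g: R^2 -> R
    return x * y + y ** 2

x = 1
i = int(np.where(x_vals == x)[0][0])
p_x = joint[i].sum()                      # P(X = x), assumed > 0

# E[g(X, Y) | X = x]: average g over the whole joint support, conditioned on (X = x).
lhs = sum(g(xi, yj) * joint[a, b]
          for a, xi in enumerate(x_vals)
          for b, yj in enumerate(y_vals)
          if xi == x) / p_x

# E[g(x, Y) | X = x]: substitute the observed value x first, then average over Y | X = x.
rhs = np.dot(g(x, y_vals), joint[i] / p_x)

assert np.isclose(lhs, rhs)
```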
> [!theorem]
>
> Let $X$ and $Y$ be [[Probabilistic Independence|independent]] random variables with range $\{0, 1, 2, \cdots\}$. If $1 \le k \le n$ and $P(X + Y = n) > 0$, then
> $
> P_{(X + Y = n)}(X = k) = \frac{P(X = k)P(Y = n - k)}{P(X + Y = n)}
> $
>
> *Proof.* Using their independence,
> $
> \begin{align*}
> P_{(X + Y = n)}(X = k) &= \frac{P((X = k) \cap (X + Y = n))}{P(X + Y = n)} \\
> &= \frac{P((X = k) \cap (k + Y = n))}{P(X + Y = n)} \\
> &= \frac{P((X = k) \cap (Y = n - k))}{P(X + Y = n)} \\
> &= \frac{P(X = k) P(Y = n - k)}{P(X + Y = n)} \\
> \end{align*}
> $
>
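A worked special case (a standard fact, added here only as a sanity check): if $X \sim \operatorname{Poisson}(\lambda)$ and $Y \sim \operatorname{Poisson}(\mu)$ are independent, the formula gives $P_{(X + Y = n)}(X = k) = \binom{n}{k}\paren{\frac{\lambda}{\lambda + \mu}}^k\paren{\frac{\mu}{\lambda + \mu}}^{n - k}$, i.e. the conditional law of $X$ given $X + Y = n$ is binomial. A quick numerical check with hypothetical parameters:

```python
from math import comb, exp, factorial

lam, mu, n = 2.0, 3.0, 7                 # hypothetical parameters

def pois(k, rate):
    """Poisson pmf at k."""
    return exp(-rate) * rate ** k / factorial(k)

# For independent Poissons, X + Y is Poisson(lam + mu).
p_sum = pois(n, lam + mu)

for k in range(n + 1):
    lhs = pois(k, lam) * pois(n - k, mu) / p_sum                               # theorem's formula
    rhs = comb(n, k) * (lam / (lam + mu)) ** k * (mu / (lam + mu)) ** (n - k)  # Binomial(n, lam/(lam+mu))
    assert abs(lhs - rhs) < 1e-12
```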
[^1]: While it is an expected value, it depends on the outcome of $X$, so it is also a random variable.
[^3]: This bit is a little hard to chew through. It doesn't refer to the EV of $Y$ given a specific outcome of $X$, but to the EV of $Y$ when *any* outcome of $X$ happens, i.e. as a function of the random variable $X$ itself. This matters because $X$ and $Y$ can be dependent, so whichever outcome of $X$ occurs changes the distribution of $Y$. It is a measure of the information about $Y$ that $X$ contains: if $Y$ depends on $X$, then any information about $X$ also carries information about $Y$. I'll stop rambling.