TODO: Bounded Additive Adjoint shenanigans (once I have enough background).
> [!definition]
>
> Let $(\Omega, \cf, \bp)$ be a [[Probability|probability space]], $\mathcal{G} \subset \cf$ be a [$\sigma$-field](Sigma%20Algebra), and $X$ be a [[Random Variable|random variable]]. If $Xd\bp$, i.e. the set function $E \mapsto \int_E Xd\bp$, is a [[Signed Measure|signed measure]], then the **conditional expectation of** $X$ **with respect to** $\mathcal{G}$, denoted by $\ev(X|\mathcal{G})$, is a $\mathcal{G}$-[[Measurable Function|measurable]] map such that
> $
> \int_E Xd\bp = \int_E \ev(X|\mathcal{G})d\bp \quad \forall E \in \mathcal{G}
> $
> which is unique $\bp|_{\mathcal{G}}$-[[Almost Everywhere|a.e.]]
>
> *Proof*. Write $\nu(E) = \int_E Xd\bp$ for $E \in \mathcal{G}$. Since $\nu$ is a signed measure on $\mathcal{G}$ and is [[Absolute Continuity|absolutely continuous]] with respect to $\bp|_\mathcal{G}$, by the [[Lebesgue-Radon-Nikodym Theorem|Radon-Nikodym theorem]], $\ev(X|\mathcal{G}) = \frac{d\nu}{d\bp|_{\mathcal{G}}}$ is the desired function, and it is unique up to $\bp|_{\mathcal{G}}$-a.e. equality.
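As a concrete sanity check of the defining property (not part of the definition above), here is a minimal numerical sketch, assuming a finite sample space and a $\sigma$-field $\mathcal{G}$ generated by a finite partition; in that case $\ev(X|\mathcal{G})$ is the $\bp$-weighted average of $X$ over each partition block. All numbers below are hypothetical.

```python
import numpy as np

# Finite sample space {0, ..., 5} with probability weights p, a random variable X,
# and a partition of the sample space generating the sigma-field G (all values hypothetical).
p = np.array([0.1, 0.2, 0.1, 0.25, 0.15, 0.2])
X = np.array([1.0, -2.0, 3.0, 0.5, 4.0, -1.0])
partition = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]

# E[X|G] is constant on each block: the P-weighted average of X over that block.
cond_exp = np.empty_like(X)
for block in partition:
    cond_exp[block] = np.dot(p[block], X[block]) / p[block].sum()

# Defining property: the integrals of X and of E[X|G] agree on every block, hence on all of G.
for block in partition:
    assert np.isclose(np.dot(p[block], X[block]), np.dot(p[block], cond_exp[block]))
```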
The conditional expectation inherits the following properties from the corresponding results for integrals of measurable functions (a numerical illustration of the last one follows the list):
- **Monotone Convergence**: If $0 \le X_n \upto X$ almost surely, then $\ev\bracs{X_n|\mathcal{G}} \upto \ev\bracs{X|\mathcal{G}}$ almost surely.
- **Fatou's Lemma**: If $X_n \ge 0$ for all $n \in \nat$, then $\ev\bracs{\liminf X_n|\mathcal G} \le \liminf \ev\bracs{X_n|\mathcal{G}}$ almost surely.
- **Dominated Convergence:** If $X_n \to X$ almost surely and $\abs{X_n} \le Y$ almost surely for all $n$, where $Y$ is integrable, then $\ev\bracs{X_n | \mathcal{G}} \to \ev\bracs{X|\mathcal G}$ almost surely.
- **Jensen's Inequality:** If $\varphi: \real \to \real$ is convex and $\varphi(X)$ is integrable, then $\varphi(\ev\bracs{X|\mathcal G})\le\ev\bracs{\varphi(X)|\mathcal G}$ almost surely.
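A quick numerical illustration of the conditional Jensen inequality in the same finite-partition setting (a sketch with hypothetical numbers, using the convex function $\varphi(t) = t^2$):

```python
import numpy as np

p = np.array([0.1, 0.2, 0.1, 0.25, 0.15, 0.2])     # hypothetical probabilities
X = np.array([1.0, -2.0, 3.0, 0.5, 4.0, -1.0])      # hypothetical values of X
partition = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]  # blocks generating G

def cond_exp(values):
    """E[values | G] for the sigma-field G generated by the partition."""
    out = np.empty_like(values)
    for block in partition:
        out[block] = np.dot(p[block], values[block]) / p[block].sum()
    return out

phi = np.square  # a convex function

# Conditional Jensen: phi(E[X|G]) <= E[phi(X)|G] pointwise (i.e. almost surely).
assert np.all(phi(cond_exp(X)) <= cond_exp(phi(X)) + 1e-12)
```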
> [!theorem]
>
> Let $X$ and $Y$ be discrete [[Random Variable|random variables]], $\seq{x_i}$, $\seq{y_i}$ be such that $\sum_{i \in \nat}P(X = x_i) = \sum_{i \in \nat}P(Y = y_i) = 1$, and $g: \real \to \real$.
>
> Suppose that $P(X = x) > 0$. Then the **[[Conditional Probability|conditional]] [[Expectation|expectation]]** of $g(Y)$ given $X = x$ is
> $
> \ev(g(Y)|X = x) = \sum_{i \in \nat}g(y_i)P(Y = y_i|X = x)
> $
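A small sketch of the discrete formula, computed from a hypothetical joint pmf of $(X, Y)$ (the values and probabilities below are made up for illustration):

```python
import numpy as np

# Hypothetical joint pmf of (X, Y): rows indexed by values of X, columns by values of Y.
x_vals = np.array([0, 1, 2])
y_vals = np.array([-1.0, 0.0, 2.0])
joint = np.array([[0.10, 0.05, 0.15],
                  [0.20, 0.10, 0.10],
                  [0.05, 0.15, 0.10]])   # entries P(X = x_i, Y = y_j); they sum to 1

def cond_exp_g_given_x(g, x):
    """E[g(Y) | X = x] = sum_j g(y_j) P(Y = y_j | X = x)."""
    i = int(np.where(x_vals == x)[0][0])
    p_x = joint[i].sum()                 # P(X = x), assumed > 0
    return np.dot(g(y_vals), joint[i] / p_x)

print(cond_exp_g_given_x(np.square, 1))  # E[Y^2 | X = 1] = 1.5 for these numbers
```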
> [!definition]
>
> Let $X$ and $Y$ be discrete random variables. The **conditional expectation of $Y$ given $X$**, denoted $\ev_X(Y)$, is the random variable[^1] that takes the value $\ev(Y|X = x_i)$ on the event $(X = x_i)$.[^3]
>
> *Note.* $P(\ev_X(Y) = \ev(Y|X = x_i)) = P(X = x_i)$.
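Concretely, $\ev_X(Y)$ can be tabulated from a joint pmf: it is the function of $X$ taking the value $\ev(Y|X = x_i)$ on the event $(X = x_i)$. A sketch with the same hypothetical joint pmf as above:

```python
import numpy as np

# Same hypothetical joint pmf P(X = x_i, Y = y_j) as above.
x_vals = np.array([0, 1, 2])
y_vals = np.array([-1.0, 0.0, 2.0])
joint = np.array([[0.10, 0.05, 0.15],
                  [0.20, 0.10, 0.10],
                  [0.05, 0.15, 0.10]])

p_x = joint.sum(axis=1)                  # P(X = x_i)
e_y_given_x = joint @ y_vals / p_x       # E[Y | X = x_i] for each i

# E_X(Y) is the random variable equal to e_y_given_x[i] on the event (X = x_i),
# so P(E_X(Y) = E[Y | X = x_i]) = P(X = x_i).
for xi, val, prob in zip(x_vals, e_y_given_x, p_x):
    print(f"on (X = {xi}): E_X(Y) = {val:+.3f}, with probability {prob:.2f}")
```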
> [!theorem]
>
> Let $X$ and $Y$ be discrete random variables as in the definition above. Then
> $
> \ev(\ev_X(Y)) = \ev(Y)
> $
> *Proof*. Assume for simplicity that $X$ takes the values $x_1, \dots, x_n$ and $Y$ the values $y_1, \dots, y_m$, write $p_j = P(X = x_j)$, $p_j(y_k) = P(Y = y_k|X = x_j)$, and $\ev_j(Y) = \ev(Y|X = x_j)$, and let $S$ denote the sample space. Since a [[Probability Distribution|probability distribution]] is a [[Measure Space|measure]], a sum of probabilities of pairwise disjoint events can be converted into the probability of their union.
> $
> \begin{align*}
> \ev(\ev_X(Y)) &= \sum_{j = 1}^{n}\ev_j(Y)p_j \\
> &= \sum_{j = 1}^{n}\sum_{k = 1}^{m}y_k p_j(y_k) p_j \\
> &= \sum_{k = 1}^{m}\sum_{j = 1}^{n}y_k p_j(y_k) p_j \\
> &= \sum_{k = 1}^{m}y_k\sum_{j = 1}^{n} p_j(y_k) p_j \\
> &= \sum_{k = 1}^{m}y_k\sum_{j = 1}^{n}P_{(X = x_j)}(Y = y_k) P(X = x_j) \\
> &= \sum_{k = 1}^{m}y_k\sum_{j = 1}^{n}\frac{P((X = x_j) \cap (Y = y_k))}{P(X = x_j)} P(X = x_j) \\
> &= \sum_{k = 1}^{m}y_k\sum_{j = 1}^{n}P((X = x_j) \cap (Y = y_k)) \\
> &= \sum_{k = 1}^{m}y_k P\paren{\bigcup_{j = 1}^{n} \bracs{(X = x_j) \cap (Y = y_k)}} \\
> &= \sum_{k = 1}^{m}y_k P\paren{(Y = y_k) \cap \bigcup_{j = 1}^{n} (X = x_j)} \\
> &= \sum_{k = 1}^{m}y_k P\paren{(Y = y_k) \cap S} \\
> &= \sum_{k = 1}^{m}y_k P\paren{Y = y_k} \\
> &= \ev(Y)
> \end{align*}
> $
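A numerical check of the theorem on the same hypothetical joint pmf (not a substitute for the proof):

```python
import numpy as np

# Same hypothetical joint pmf as above.
x_vals = np.array([0, 1, 2])
y_vals = np.array([-1.0, 0.0, 2.0])
joint = np.array([[0.10, 0.05, 0.15],
                  [0.20, 0.10, 0.10],
                  [0.05, 0.15, 0.10]])

p_x = joint.sum(axis=1)                  # P(X = x_i)
p_y = joint.sum(axis=0)                  # P(Y = y_k)
e_y_given_x = joint @ y_vals / p_x       # E[Y | X = x_i]

# E(E_X(Y)) = sum_i E[Y | X = x_i] P(X = x_i) should equal E(Y) = sum_k y_k P(Y = y_k).
assert np.isclose(np.dot(e_y_given_x, p_x), np.dot(y_vals, p_y))
```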
> [!theorem]
>
> Let $X, Y$ be discrete random variables and $\seq{x_i}$, $\seq{y_i}$ be such that $\sum_{i \in \nat}P(X = x_i) = \sum_{i \in \nat}P(Y = y_i) = 1$. Then for any $g: \real^2 \to \real$ and any $x$ with $P(X = x) > 0$,
> $
> \ev(g(X, Y)|X = x) = \ev(g(x, Y)|X = x)
> $
> *Proof*. On the event $(X = x)$ we have $g(X, Y) = g(x, Y)$, so the two conditional distributions coincide:
> $
> \begin{align*}
> \ev(g(X, Y)|X = x) &= \sum_{z}zP(g(X, Y) = z|X = x) \\
> &= \sum_{z}zP(g(x, Y) = z|X = x) \\
> &= \ev(g(x, Y)|X = x)
> \end{align*}
> $
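A sketch of the substitution rule on the same hypothetical joint pmf: the left-hand side conditions the full joint law of $(X, Y)$ on the event $(X = x)$, while the right-hand side plugs in the observed value $x$ first.

```python
import numpy as np

# Same hypothetical joint pmf as above.
x_vals = np.array([0, 1, 2])
y_vals = np.array([-1.0, 0.0, 2.0])
joint = np.array([[0.10, 0.05, 0.15],
                  [0.20, 0.10, 0.10],
                  [0.05, 0.15, 0.10]])

def g(x, y):                              # an arbitrary g: R^2 -> R
    return x * y + y ** 2

x = 1
i = int(np.where(x_vals == x)[0][0])
p_x = joint[i].sum()                      # P(X = x), assumed > 0

# E[g(X, Y) | X = x]: average g over the whole joint support, conditioned on (X = x).
lhs = sum(g(xi, yj) * joint[a, b]
          for a, xi in enumerate(x_vals)
          for b, yj in enumerate(y_vals)
          if xi == x) / p_x

# E[g(x, Y) | X = x]: substitute the observed value x first, then average over Y | X = x.
rhs = np.dot(g(x, y_vals), joint[i] / p_x)

assert np.isclose(lhs, rhs)
```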
> [!theorem]
>
> Let $X$ and $Y$ be [[Probabilistic Independence|independent]] random variables with range $\{0, 1, 2, \cdots\}$. If $1 \le k \le n$ and $P(X + Y = n) > 0$, then
> $
> P_{(X + Y = n)}(X = k) = \frac{P(X = k)P(Y = n - k)}{P(X + Y = n)}
> $
>
> *Proof.* Using their independence,
> $
> \begin{align*}
> P_{(X + Y = n)}(X = k) &= \frac{P((X = k) \cap (X + Y = n))}{P(X + Y = n)} \\
> &= \frac{P((X = k) \cap (k + Y = n))}{P(X + Y = n)} \\
> &= \frac{P((X = k) \cap (Y = n - k))}{P(X + Y = n)} \\
> &= \frac{P(X = k) P(Y = n - k)}{P(X + Y = n)} \\
> \end{align*}
> $
>
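A worked special case (a standard fact, added here only as a sanity check): if $X \sim \operatorname{Poisson}(\lambda)$ and $Y \sim \operatorname{Poisson}(\mu)$ are independent, the formula gives $P_{(X + Y = n)}(X = k) = \binom{n}{k}\paren{\frac{\lambda}{\lambda + \mu}}^k\paren{\frac{\mu}{\lambda + \mu}}^{n - k}$, i.e. the conditional law of $X$ given $X + Y = n$ is binomial. A quick numerical check with hypothetical parameters:

```python
from math import comb, exp, factorial

lam, mu, n = 2.0, 3.0, 7                 # hypothetical parameters

def pois(k, rate):
    """Poisson pmf at k."""
    return exp(-rate) * rate ** k / factorial(k)

# For independent Poissons, X + Y is Poisson(lam + mu).
p_sum = pois(n, lam + mu)

for k in range(n + 1):
    lhs = pois(k, lam) * pois(n - k, mu) / p_sum                               # theorem's formula
    rhs = comb(n, k) * (lam / (lam + mu)) ** k * (mu / (lam + mu)) ** (n - k)  # Binomial(n, lam/(lam+mu))
    assert abs(lhs - rhs) < 1e-12
```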
[^1]: While it is an expected value, it depends on the outcome of $X$, so it is also a random variable.
[^3]: This bit is a little hard to chew through. It doesn't refer to the EV of $Y$ given a specific outcome of $X$, but to the EV of $Y$ when *any* outcome of $X$ happens, i.e. as a function of the random variable $X$ itself. This matters because $X$ and $Y$ can be dependent, so whichever outcome of $X$ occurs changes the distribution of $Y$. It is a measure of the information about $Y$ that $X$ contains: if $Y$ depends on $X$, then any information about $X$ also carries information about $Y$. I'll stop rambling.