> [!definition]
>
> A [[Stochastic Process|stochastic process]] $(X_n, n \in \integer^+)$ is known as a **Markov chain** if the [[Conditional Probability|conditional probabilities]] satisfy
> $
> P(X_{n + 1} = k_{n + 1}|X_n = k_n, \cdots, X_1 = k_1, X_0 = k_0) =
> P(X_{n + 1} = k_{n + 1}|X_{n} = k_n)
> $
> for all $k_0, \cdots, k_{n + 1} \in S$ and $n \in \nat$.
>
> $P(X_{n + 1} = j|X_n = i)$ is called the one-step **transition probability**, denoted as
> $
> P_{ij} = P_{ij}^{n, n+1} = P(X_{n + 1} = j|X_n = i)
> $
> When it is the same for all values of $n$, so that the change with each step does not depend on time, the Markov chain is known as *homogeneous*.
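>
> For instance, a two-state chain on $S = \{0, 1\}$ that flips its state with probability $p$ at every step is homogeneous: $P_{01} = P_{10} = p$ and $P_{00} = P_{11} = 1 - p$ for every $n$.
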
> [!definition]
>
> A square [[Matrix|matrix]] that collects together all the $P_{ij}$s is known as the **transition probability matrix** of the Markov chain. Since each entry in the matrix is a [[Probability|probability]],
> - $P_{ij} \ge 0 \quad \forall i, j \in S$
> - $\sum_{j \in S}P_{ij} = 1 \quad \forall i \in S$
>
> Any square matrix that satisfies the above conditions is known as a *stochastic* matrix, as it can be used to model a Markov chain.
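
The two conditions above translate directly into code. A minimal sketch, assuming an arbitrary example matrix:

```python
import numpy as np

# Check that a candidate transition matrix is stochastic:
# non-negative entries and every row summing to one.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.0, 1.0],  # row 2 makes state 2 absorbing: P[2, 2] = 1
])

is_stochastic = np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0)
print(is_stochastic)  # True
```
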
> [!definition]
>
> If for some state $s \in S$, $P_{ss} = 1$, then $s$ is an *absorbing* state: once entered, it does not provide a way out.

> [!theorem]
>
> If $(X_n, n \in \nat)$ is a Markov chain, then
> $
> P(X_0 = i_0, X_1 = i_1, \cdots, X_n = i_n) =
> \pi_{i_0}\prod_{k = 0}^{n - 1}P_{i_{k}, i_{k + 1}}
> $
> where $P(X_0 = i_0) = \pi_{i_0}$ denotes the probability of the initial state.
>
> *Proof*. By the chain rule for conditional probabilities, collapsing each conditional to the previous state using the Markov property.
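
A minimal sketch of this path-probability formula, assuming an arbitrary example chain and initial distribution:

```python
import numpy as np

# Probability of a specific path i_0 -> i_1 -> ... -> i_n as the
# initial probability times the one-step transition probabilities.
P = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
])
pi = np.array([0.5, 0.5])  # initial distribution

def path_probability(path):
    prob = pi[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a, b]
    return prob

print(path_probability([0, 0, 1, 1]))  # pi_0 * P[0,0] * P[0,1] * P[1,1]
```
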
> [!theoremb] Chapman-Kolmogorov Theorem
>
> The $n$-step transition matrix $P^{(n)}$, with entries $P^{(n)}_{ij} = P(X_{m + n} = j|X_m = i)$, encodes the transition probabilities of the chain after $n$ steps and can be calculated as
> $
> P^{(n)} = P^n
> $
> using [[Matrix|matrix]] multiplication. Entrywise, this is $P^{(m + n)}_{ij} = \sum_{k \in S}P^{(m)}_{ik}P^{(n)}_{kj}$: to go from $i$ to $j$ in $m + n$ steps, the chain must pass through some intermediate state $k$ after $m$ steps.
>
> *Reasoning.* A known state $X_m = j$ can be represented as an indicator column vector $\langle \cdots, 1, \cdots \rangle$ with a one in position $j$, which, under matrix multiplication, extracts the $j$-th column of any matrix with compatible size. If we [[Matrix Transpose|transpose]] the transition probability matrix, each column of $P^\top$ is the [[Probability Distribution|probability distribution]] of the next state given the current one, so the [[Matrix Vector Product|matrix vector product]] $P^\top v$ advances the chain by one step.
>
> This is not limited to indicator vectors: for any column vector $v$ representing a probability distribution over states, $P^\top v$ is the distribution of the next state, including the column vectors of $P^\top$ itself. Applying this $n$ times gives $(P^\top)^n v$, and since $((P^\top)^n)^\top = P^n$, the $n$-step transition matrix is simply the $n$-th power of $P$, which can be computed very efficiently.
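
A minimal sketch of the $n$-step computation, assuming an arbitrary example matrix; it checks the matrix power against stepping a distribution forward one step at a time:

```python
import numpy as np

P = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
])
n = 5

# n-step transition matrix as the n-th matrix power.
P_n = np.linalg.matrix_power(P, n)

# Row i of P^n is the distribution of X_n given X_0 = i.
row = np.array([1.0, 0.0])  # start in state 0 with certainty
for _ in range(n):
    row = row @ P
print(np.allclose(row, P_n[0]))  # True
```
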
> [!definition]
>
> Let $X = (X_n, n \in \integer^+)$ be a Markov chain with transition probability matrix $P$. A probability [[Vector|vector]] $\rho$ is an *invariant distribution* for $X$ if
> $
> \rho = \rho P
> $
> which can be found as a left [[Eigenvalue and Eigenvector|eigenvector]] of the transition probability matrix with eigenvalue $1$ (equivalently, an eigenvector of $P^\top$), normalised so its entries sum to one.
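
A minimal sketch of the eigenvector approach, assuming an arbitrary example matrix; it takes the eigenvector of $P^\top$ whose eigenvalue is numerically closest to $1$ and normalises it into a probability vector:

```python
import numpy as np

P = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
])

eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))  # eigenvalue closest to 1
rho = np.real(eigvecs[:, k])
rho /= rho.sum()                      # normalise to sum to one

print(rho)                        # [0.8, 0.2] for this P
print(np.allclose(rho @ P, rho))  # True: rho = rho P
```
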
> [!definition]
>
> Suppose that $\gamma$ is the initial distribution of a Markov chain. The chain is in *detailed balance* if the probability of starting at state $i$ and then moving to $j$ is the same as the reverse:
> $
> \gamma_i P_{ij} = \gamma_j P_{ji} \quad \forall i, j \in S
> $
> which represents an equilibrium. Summing both sides over $i$ shows that a distribution in detailed balance is automatically invariant: $\sum_i \gamma_i P_{ij} = \gamma_j \sum_i P_{ji} = \gamma_j$.
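
A minimal sketch of the detailed-balance check, assuming an example chain together with its invariant distribution; the condition holds exactly when the probability-flow matrix with entries $\gamma_i P_{ij}$ is symmetric:

```python
import numpy as np

P = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
])
gamma = np.array([0.8, 0.2])  # invariant distribution of P

flow = gamma[:, None] * P  # flow[i, j] = gamma_i * P_ij
print(np.allclose(flow, flow.T))  # True: the chain is in detailed balance
```
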
> [!definition]
>
> Let $X = (X_n, n \in \integer^+)$ be a Markov chain. If, regardless of the initial distribution $\pi_0$,
> $
> \exists \pi^{\infty} = \limv{n}\pi_0 P^{n} \quad \forall \pi_0
> $
> that is, the probabilities converge to the same distribution as time goes on, then this Markov chain has a *limiting distribution* $\pi^\infty$.
>
> $\pi^\infty$ is then a probability vector representing the long-term distribution of the chain. Naturally, when the limit exists, $\pi^\infty$ is an invariant distribution, since $\pi^\infty P = \limv{n}\pi_0 P^{n + 1} = \pi^\infty$.
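
A minimal sketch of this convergence, assuming an arbitrary example matrix; several different initial distributions are pushed through a large matrix power and land on the same limit:

```python
import numpy as np

P = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
])
P_100 = np.linalg.matrix_power(P, 100)

for pi0 in ([1.0, 0.0], [0.0, 1.0], [0.3, 0.7]):
    print(np.array(pi0) @ P_100)  # each prints approximately [0.8, 0.2]
```
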
> [!definition]
>
> A Markov chain is *irreducible* if for any $i, j \in S$, $\exists m_1, m_2 > 0$ where $P_{ij}^{m_1} > 0$ and $P_{ji}^{m_2} > 0$. In other words, it is possible to go from any state to any other state.

> [!definition]
>
> A Markov chain is *aperiodic* if for every state $i \in S$, $\exists r > 0$ such that $P^n_{ii} > 0$ for all $n \ge r$.
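
A minimal sketch of a sufficient check for both properties on a finite chain, assuming an example matrix: if some power of $P$ is entrywise positive (such a matrix is often called *regular*), the chain is irreducible and aperiodic. The search bound below is a crude illustrative choice:

```python
import numpy as np

P = np.array([
    [0.0, 1.0],
    [0.5, 0.5],
])

Q = np.eye(len(P))
for r in range(1, 2 * len(P) ** 2):  # crude search bound
    Q = Q @ P
    if np.all(Q > 0):
        print(f"all entries of P^{r} are positive")  # here r = 2
        break
```
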
> [!theorem]
>
> Let $X = (X_n, n \in \integer^+)$ be a stationary Markov chain with transition matrix $P$ and initial distribution $\pi_0$. Then, the entropy rate
> $
> h(X) = H(X_1|X_0)
> $
> *Proof*. Since the process is stationary, $\pi_0$ is invariant and every step looks statistically the same, so the [[Information|information]] gained at each step is $H(X_{n + 1}|X_n, \cdots, X_0) = H(X_{n + 1}|X_n) = H(X_1|X_0)$ by the Markov property, which equals $h(X)$ by definition.
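
A minimal sketch of this formula, $h(X) = H(X_1|X_0) = -\sum_i \pi_i \sum_j P_{ij}\log_2 P_{ij}$, assuming an example chain with its invariant distribution:

```python
import numpy as np

P = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
])
pi = np.array([0.8, 0.2])  # invariant: pi = pi P

# Conditional entropy H(X_1 | X_0): pi-weighted entropy of each row.
row_entropies = -np.sum(np.where(P > 0, P * np.log2(P), 0.0), axis=1)
h = pi @ row_entropies
print(h)  # entropy rate in bits per step (about 0.569 here)
```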