> [!definition]
>
> ![[gibbs.png]]
> $
> P(X = x_j) = \frac{\exp(\mu^\prime x_j)}{Z(\mu^\prime)} \quad \mu^\prime = \mu\ln(2) \quad Z(\mu^\prime) = \sum_{j = 1}^{n}\exp(\mu^\prime x_j)
> $
> The Gibbs distribution is the *assumed* [[Probability Distribution|probability distribution]] of a [[Random Variable|random variable]] when *only* its range and its [[Expectation|expected value]] are [[Information|known]], derived from the [[Principle of Maximum Entropy|principle of maximum entropy]]. The parameter $\mu$ ($\mu^\prime = \mu \ln(2)$) represents the "skewness" of the distribution towards the smaller or larger outcomes, and as it [[Limit|approaches]] $\pm\infty$, only the most extreme outcome (the smallest or largest $x_j$) remains possible.

> [!theorem] Reasoning
>
> Let $X$ be a random variable with range $\{x_1, x_2, \cdots, x_n\}$ and unknown probability distribution $\{p_1, p_2, \cdots, p_n\}$. Suppose that the only information given about $X$ is its [[Expectation|expected value]] $\ev(X) = E$. If $E \ne \frac{1}{n}\sum_{j = 1}^{n}x_j$, then $X$ cannot have a [[Uniform Distribution|uniform distribution]].
>
> In this case, to find the distribution that still follows the principle of maximum entropy, maximise $H(X)$ under the following constraints:
> - $p$ is a probability distribution ($\sum_{j = 1}^{n}p_j - 1 = 0$)
> - $\ev(X) = E \Leftrightarrow \sum_{j = 1}^{n}p_j x_j - E = 0$.
>
> Using [[Lagrange Multipliers|Lagrange multipliers]], create the Lagrangian function:
> $
> \mathcal{L}(\vec{p}, \lambda, \mu)
> = -\sum_{j = 1}^{n}p_j \log_2(p_j) + \lambda\paren{\sum_{j = 1}^{n}p_j - 1} + \mu\paren{\sum_{j = 1}^{n}p_j x_j - E}
> $
>
> Solve the following $(n + 2)$ equations corresponding to $\nabla \mathcal{L} = \vec{0}$:
> $
> \begin{align*}
> \frac{\partial \mathcal{L}}{\partial p_j} &=
> -\frac{\ln p_j + 1}{\ln 2} + \lambda + \mu x_j = 0 \\
> \frac{\partial \mathcal{L}}{\partial \lambda} &= \sum_{j = 1}^{n}p_j - 1 = 0\\
> \frac{\partial \mathcal{L}}{\partial \mu} &= \sum_{j = 1}^{n}p_j x_j - E = 0
> \end{align*}
> $
>
> Isolating $p_j$ in the first $n$ equations gives:
> $
> \begin{align*}
> \frac{\ln p_j + 1}{\ln 2} &= \lambda + \mu x_j \\
> \ln p_j + 1 &= \lambda\ln 2 + \mu x_j\ln 2 \\
> \ln p_j &= \lambda\ln 2 + \mu x_j\ln 2 - 1 \\
> p_j &= \exp\paren{\lambda\ln 2 + \mu x_j\ln 2 - 1} \\
> p_j &= \exp\paren{\lambda^\prime + \mu^\prime x_j} \quad \lambda^\prime = \lambda\ln 2 - 1 \quad \mu^\prime = \mu \ln 2
> \end{align*}
> $
>
> Solving for $\lambda^\prime$ using the first constraint:
> $
> \begin{align*}
> \sum_{j = 1}^{n}p_j &= 1 \\
> \sum_{j = 1}^{n}\exp(\lambda^\prime + \mu^\prime x_j) &= 1 \\
> \exp(\lambda^\prime)\sum_{j = 1}^{n}\exp(\mu^\prime x_j) &= 1 \\
> \exp(\lambda^\prime) &= \frac{1}{\sum_{j = 1}^{n}\exp(\mu^\prime x_j)} \\
> \lambda^\prime &= \ln\frac{1}{\sum_{j = 1}^{n}\exp(\mu^\prime x_j)} \\
> \lambda^\prime &= -\ln\sum_{j = 1}^{n}\exp(\mu^\prime x_j) \\
> \lambda^\prime &= -\ln Z(\mu^\prime) \quad Z(\mu^\prime) = \sum_{j = 1}^{n}\exp(\mu^\prime x_j)
> \end{align*}
> $
>
> Plugging the result back in gives:
> $
> \begin{align*}
> p_j &= \exp\paren{\lambda^\prime + \mu^\prime x_j} \\
> &= \exp(\lambda^\prime)\exp(\mu^\prime x_j) \\
> &= \exp(-\ln Z(\mu^\prime))\exp(\mu^\prime x_j) \\
> &= \frac{\exp(\mu^\prime x_j)}{Z(\mu^\prime)}
> \end{align*}
> $
>
> The value of $\mu^\prime$ can be calculated in terms of $E$ by substituting this expression into the second constraint and solving for $\mu^\prime$, which generally has to be done numerically (see the sketch below).
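For an arbitrary range there is generally no closed form for $\mu^\prime$ in terms of $E$, so it is found numerically. Below is a minimal sketch in Python (illustrative only, not part of the derivation; the function names are made up) that builds the Gibbs distribution for a given $\mu^\prime$ and recovers $\mu^\prime$ from a target $E$ by bisection, using the fact that $E(\mu^\prime)$ is strictly increasing in $\mu^\prime$ (its derivative is the variance of $X$).

```python
# Minimal sketch (illustrative, not part of the note): build the Gibbs
# distribution p_j = exp(mu' x_j) / Z(mu') and recover mu' from a known
# expected value E by bisection. E(mu') is strictly increasing in mu'
# (its derivative is the variance of X), so the root is unique.
import numpy as np

def gibbs_distribution(x, mu_prime):
    """p_j = exp(mu' x_j) / Z(mu'), computed with a shifted exponent for stability."""
    x = np.asarray(x, dtype=float)
    w = np.exp(mu_prime * x - np.max(mu_prime * x))  # the constant shift cancels in the ratio
    return w / w.sum()                               # w.sum() plays the role of Z(mu')

def expected_value(x, mu_prime):
    """E(mu') = sum_j x_j p_j under the Gibbs distribution."""
    return float(np.dot(x, gibbs_distribution(x, mu_prime)))

def solve_mu_prime(x, E, lo=-50.0, hi=50.0, steps=200):
    """Bisection for the root of E(mu') = E; assumes min(x) < E < max(x)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if expected_value(x, mid) < E:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = [1, 2, 4, 8]    # range of X (arbitrary, not necessarily {0, ..., n})
E_target = 2.5      # known expected value; the uniform mean would be 3.75
mu_prime = solve_mu_prime(x, E_target)
p = gibbs_distribution(x, mu_prime)
print(mu_prime, p, float(np.dot(x, p)))  # recovered mu', the distribution, and E close to 2.5
```

Bisection keeps the sketch dependency-free beyond NumPy; a library root finder such as `scipy.optimize.brentq` would serve the same purpose.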
> [!theoremb] Special Case where $x_j = j\quad \forall 0 \le j \le n$
>
> While the equation is very messy for random variables with arbitrary ranges, if the range is specifically $\{0, 1, \cdots, n\}$, the sum simplifies substantially.
>
> Starting with the partition function $Z(\mu^\prime)$, which is a [[Geometric Series|geometric series]] that can be simplified using the partial-sum formula:
> $
> Z(\mu^\prime) = \sum_{j = 0}^{n}\exp(\mu^\prime j) =
> \sum_{j = 0}^{n}\exp(\mu^\prime)^j =
> \frac{1 - \exp((n + 1)\mu^\prime)}{1 - \exp(\mu^\prime)}
> $
>
> Moving on to the equation for the expected value:
> $
> \begin{align*}
> E &= \sum_{j = 0}^{n}x_j p_j \\
> &= \sum_{j = 0}^{n}x_j\frac{\exp(\mu^\prime x_j)}{Z(\mu^\prime)} \\
> &= \frac{1}{Z(\mu^\prime)}\sum_{j = 0}^{n}j \exp(j\mu^\prime) \\
> &= \frac{1}{Z(\mu^\prime)}\sum_{j = 0}^{n}\frac{d}{d\mu^\prime} \exp(j\mu^\prime) \\
> &= \frac{1}{Z(\mu^\prime)}\frac{d}{d\mu^\prime}\sum_{j = 0}^{n} \exp(j\mu^\prime) \\
> &= \frac{1}{Z(\mu^\prime)}\frac{d Z(\mu^\prime)}{d\mu^\prime} \\
> &= \frac{1 - \exp(\mu^\prime)}{1 - \exp((n + 1)\mu^\prime)}
> \frac{n\exp((n + 2)\mu^\prime) - (n + 1)\exp((n + 1)\mu^\prime) + \exp(\mu^\prime)}{(1 - \exp(\mu^\prime))^2} \\
> E &= \frac{n\exp((n + 2)\mu^\prime) - (n + 1)\exp((n + 1)\mu^\prime) + \exp(\mu^\prime)}{(1 - \exp((n + 1)\mu^\prime))(1 - \exp(\mu^\prime))}
> \end{align*}
> $
>
> Since $\mu^\prime = \mu \ln 2$, the expression can be rewritten with $2$ as the base of the exponentials:
> $
> E = \frac{n\cdot2^{(n + 2)\mu} - (n + 1)2^{(n + 1)\mu} + 2^\mu}{(1 - 2^{(n + 1)\mu})(1 - 2^\mu)}
> $
>
> The resulting function of $\mu$ is smooth and S-shaped, most closely resembling the [[Hyperbolic Tangent|hyperbolic tangent]], which makes the inverse hyperbolic tangent a possible approximation for recovering $\mu$ from $E$. The precise parameters of such an approximation in terms of $n$ are still unknown.
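As a sanity check, the following sketch (illustrative only) compares the closed-form $E$ above against the direct sum $\sum_{j = 0}^{n} j\, p_j$ for a few values of $\mu^\prime$; the two should agree to machine precision, except at $\mu^\prime = 0$ where the closed form is an indeterminate $0/0$ whose limit is the uniform mean $\frac{n}{2}$.

```python
# Quick check (illustrative): the closed-form E(mu') for the range {0, 1, ..., n}
# against the direct sum over the Gibbs distribution.
import numpy as np

def E_closed_form(n, mu_prime):
    """E = (n e^{(n+2)mu'} - (n+1) e^{(n+1)mu'} + e^{mu'}) / ((1 - e^{(n+1)mu'})(1 - e^{mu'}))."""
    e = np.exp(mu_prime)
    num = n * e**(n + 2) - (n + 1) * e**(n + 1) + e
    den = (1 - e**(n + 1)) * (1 - e)
    return num / den

def E_direct(n, mu_prime):
    """E = sum_j j * exp(j mu') / Z(mu') over j = 0, ..., n."""
    j = np.arange(n + 1)
    w = np.exp(mu_prime * j)
    return float(np.dot(j, w / w.sum()))

n = 10
for mu_prime in (-2.0, -0.5, 0.3, 1.5):
    # both columns should match; E tends to 0 as mu' -> -inf and to n as mu' -> +inf
    print(mu_prime, E_closed_form(n, mu_prime), E_direct(n, mu_prime))
```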