[[Entropy]] is the [[Expectation|expected]] [[Information|information]] of a random variable. The more often a variable is surprising, the more entropy it has, and the more information can be gained by measuring it.
However, so far I've only learned about the discrete case, which looks something like this:
$
H(X) = -\sum_{j = 1}^{n}p_j \log_2(p_j)
$
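For example (a quick sanity check, not from the derivation below), a fair coin ($n = 2$, both $p_j = \frac{1}{2}$) carries exactly one bit:
$
H(X) = -\frac{1}{2}\log_2\paren{\frac{1}{2}} - \frac{1}{2}\log_2\paren{\frac{1}{2}} = 1
$
and a fair six-sided die carries $\log_2(6) \approx 2.58$ bits.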
Continuous random variables must also encode information, so there should be a similar formula. The one Shannon wrote down simply swaps the sum for an integral:
$
H(X) = -\int_{-\infty}^{\infty}p(x) \log_2(p(x)) dx
$
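Just to see this formula spit out an actual number, here's a quick numerical sketch (my own example, assuming numpy and scipy are available): a standard normal, whose known closed-form differential entropy is $\frac{1}{2}\log_2(2\pi e) \approx 2.05$ bits.
```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Differential entropy of a standard normal, straight from the integral above.
# Finite bounds: the tails beyond |x| = 10 contribute essentially nothing,
# and it keeps the log away from pdf values of exactly 0.
h_numeric, _ = quad(lambda x: -norm.pdf(x) * np.log2(norm.pdf(x)), -10, 10)

h_closed = 0.5 * np.log2(2 * np.pi * np.e)  # known closed form for N(0, 1)
print(h_numeric, h_closed)                  # both ~2.05 bits
```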
However, a [[Continuity|continuous]] probability distribution means something different. While a range of values may still be surprising and contain information (learning that the value lies below the median, for example, provides exactly 1 bit), the move from probability to probability *density* makes entropy very weird.
Shannon's version of entropy here is *differential* entropy, which can be calculated, but has some strange properties. For one, it is not invariant under a change of units (going from $X$ to $1000X$):
$
\begin{align*}
H(1000X) &= -\int_{-\infty}^{\infty}p_{1000}(x)\log_2(p_{1000}(x))\, dx \\
&= -\int_{-\infty}^{\infty}\frac{1}{1000}p\paren{\frac{x}{1000}}\log_2\paren{\frac{1}{1000}p\paren{\frac{x}{1000}}} dx\\
&= -\int_{-\infty}^{\infty}p(u)\log_2\paren{\frac{1}{1000}p(u)}\, du \quad (u = x/1000)\\
&= -\int_{-\infty}^{\infty}p(u)\log_2\paren{p(u)}\, du + \log_2(1000)\int_{-\infty}^{\infty}p(u)\, du\\
&= H(X) + \log_2(1000)
\end{align*}
$
Stretching $X$ by a factor of 1000 spreads its density over a range 1000 times wider, which is where $p_{1000}(x) = \frac{1}{1000}p\paren{\frac{x}{1000}}$ comes from; substituting $u = x/1000$ recovers the original integral, but an extra $\log_2(1000)$ term appears at the end. Does $1000X$ have more information than $X$? Obviously no, right? Whatever information we have about the value of $X$, we just multiply by 1000 to get $1000X$. In addition, since $1000X$ is completely determined by $X$, it makes sense that the conditional entropy $H(1000X \mid X) = 0$ from the definition.
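The same shift shows up in the closed form for a Gaussian (a standard result I'm pulling in, not something derived above): for $X \sim \mathcal{N}(0, \sigma^2)$,
$
H(X) = \frac{1}{2}\log_2(2\pi e\sigma^2) = \frac{1}{2}\log_2(2\pi e) + \log_2(\sigma)
$
so multiplying $\sigma$ by 1000 adds exactly $\log_2(1000) \approx 9.97$ bits, even though nothing about the underlying uncertainty has changed.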
Since $p(x)$ is now a probability **density** function, it can exceed $1$, so $\log_2(p(x))$ can be positive, which brings up *negative entropy*. That makes negative sense. A variable cannot possess negative information; measuring it cannot open up more possibilities. Figuring out how this works will probably be a job for future me.
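A standard example of how that happens (my addition): a uniform distribution on $[0, \frac{1}{2}]$ has density $2$ on that interval, so
$
H(X) = -\int_{0}^{1/2}2\log_2(2)\, dx = -1
$
and a perfectly reasonable random variable ends up with $-1$ bit of "information".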
While Shannon's version has these pitfalls, everything else about it seems to be alright.
A different approach to looking at continuous entropy is to play with limits. For the simplest example, consider the [[Uniform Distribution|uniform distribution]] with $n$ equally likely outcomes; its entropy is then
$
H(X) = -\sum_{j = 1}^{n} p\log_2(p) = -\sum_{j = 1}^{n}\frac{1}{n}\log_2\paren{\frac{1}{n}} = \log_2(n)
$
If we pack the $n$ outcomes evenly into a fixed interval, say $[0, 1]$, and apply the [[Limit|limit]] as $n$ goes to infinity, we get a continuous version of the uniform distribution (let's call it $Y$). However, here we run into a problem:
$
H(Y) = \limv{n}H(X) = \limv{n} \log_2(n) = \infty
$
Such a random variable has infinite entropy.
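For contrast, plugging the same $Y$ (density $1$ on $[0, 1]$) into Shannon's differential formula gives
$
H(Y) = -\int_{0}^{1}1\cdot\log_2(1)\, dx = 0
$
so the limiting discrete entropy and the differential entropy disagree by exactly the infinite piece that the Riemann-sum view below makes explicit.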
Looking at this formula as a [[Riemann Sum|Riemann sum]], we have:
$
H(Y) = -\limv{n}\sum_{j = 1}^{n}\frac{1}{n}\log_2\paren{\frac{1}{n}} = -\int_{0}^{1} \log_2(dx)dx
$
This has got to be the funniest integral I've seen in my entire life, with a $dx$ inside a function. While
$
\lim_{dx \to 0}\log_2(dx)dx = 0
$
Integrating it unfortunately still explodes to infinity: each individual term vanishes, but there are $n$ of them, and $-n \cdot \frac{1}{n}\log_2\paren{\frac{1}{n}} = \log_2(n)$ diverges all the same (it is just $H(Y)$ again).
This formula can be generalised like this:
$
H(X) = -\int_{-\infty}^{\infty}f(x)\log_2(f(x)dx)dx
$
But since the properties of the logarithm let you pull the $dx$ out as a separate $-\log_2(dx)$ term, the result is still infinite.
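To see this with actual numbers, here's a small sketch (my own check, assuming numpy and scipy; the standard normal is just a convenient stand-in): quantise the variable into bins of width $\Delta$, take the plain discrete entropy of the bin probabilities, and watch it track (differential entropy) $- \log_2(\Delta)$ as the bins shrink.
```python
import numpy as np
from scipy.stats import norm

# Differential entropy of N(0, 1) in bits (known closed form), ~2.05
h_diff = 0.5 * np.log2(2 * np.pi * np.e)

# Quantise into bins of width delta and take the ordinary discrete entropy
# of the bin probabilities.
for delta in [1.0, 0.1, 0.01, 0.001]:
    edges = np.arange(-10, 10 + delta, delta)        # covers essentially all the mass
    probs = np.diff(norm.cdf(edges))                  # probability of each bin
    probs = probs[probs > 0]
    h_binned = -np.sum(probs * np.log2(probs))
    print(delta, h_binned, h_diff - np.log2(delta))   # last two columns nearly match
```
Halving the bin width adds one more bit, every time, forever: the infinity shows up one bit at a time.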
It took me a while to realise this, but when the probability space is continuous, any single value is infinitely surprising, because the probability of a continuous random variable taking on any specific number is exactly $0$. Therefore, every possible outcome is infinitely surprising, and of course any continuous random variable has infinite entropy.
In the end, it takes an infinite number of digits to describe a continuous number. It may be simpler to write the infinitely long decimal $0.142857142857...$ as $\frac{1}{7}$, but for a fraction system to encode every number between $0$ and $1$, it needs room for infinitely many digits in the numerator and denominator. Infinitely many digits means infinitely many bits, and infinitely many bits means infinite information.
Continuous variables have infinite information. It's not a mistake.