> [!tldr]- Introduction
>
> When trying to calculate the entropy of a continuous random variable, the first thing that comes to mind is to take some kind of limit of its discrete version:
> $
> -\sum_{j = 1}^{n}p_j\log_2(p_j) \rightarrow -\int_{-\infty}^{\infty}p(x)\log_2(p(x)dx)dx
> $
> Here $p(x)dx$ converts the probability *density* into a raw probability. However, the integral can be evaluated as
> $
> \begin{align*}
> -\int_{-\infty}^{\infty}p(x)\log_2(p(x)dx)dx &=
> -\int_{-\infty}^{\infty}p(x)\log_2(p(x))dx -
> \int_{-\infty}^{\infty}p(x)\log_2(dx)dx \\
> &= -\int_{-\infty}^{\infty}p(x)\log_2(p(x))dx + \infty
> \end{align*}
> $
> because $\log_2(dx) \to -\infty$ as $dx \to 0$, so the second term diverges to $+\infty$. The important part is that this infinity arises from the *continuous* nature of the random variable. This makes sense: if a variable is truly continuous, it takes an infinite number of digits, and hence an infinite amount of information, to store[^1]. However, it is still true that some distributions are more "spread out" and therefore more surprising than others, which is what the first term captures.
>
> $
> H(X) = -\int_{-\infty}^{\infty}p(x)\log_2(p(x))dx
> $
> This is what Shannon wrote down as the generalisation of entropy to the continuous case. While it is not the true limit of discrete entropy (the infinite term has simply been dropped), it is useful nonetheless.
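>
> The split above can be checked numerically. Below is a minimal sketch (assuming NumPy is available; the helper `binned_entropy` is just for illustration) that bins a standard normal density with width $dx$ and computes the *discrete* entropy of the bins: the result is the differential entropy of $N(0, 1)$, about $2.05$ bits, plus $-\log_2(dx)$, which grows by one bit every time the bin width is halved.
>
> ```python
> import numpy as np
>
> def binned_entropy(dx, span=10.0):
>     """Discrete entropy (in bits) of a standard normal binned with width dx."""
>     x = np.arange(-span, span, dx)               # bin positions
>     p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
>     probs = p * dx                               # probability mass per bin
>     return -np.sum(probs * np.log2(probs))
>
> H_diff = 0.5 * np.log2(2 * np.pi * np.e)         # differential entropy of N(0, 1)
> for dx in (0.1, 0.05, 0.025):
>     # binned_entropy(dx) ≈ H_diff - log2(dx): the finite part stays fixed,
>     # while the -log2(dx) part gains one bit per halving of dx
>     print(f"dx={dx}: {binned_entropy(dx):.3f} ≈ {H_diff - np.log2(dx):.3f}")
> ```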

> [!definition]
>
> $
> H(X) = -\int_{-\infty}^{\infty}\log_2(p(x))p(x)dx
> $
> The [[Derivative|differential]] [[Entropy|entropy]] of a continuous [[Random Variable|random variable]] is the generalisation of entropy from the discrete to the continuous case, using a [[Probability Distribution|probability density function]] instead of a discrete distribution. In a sense, it is the [[Expectation|expected]] [[Information|information]] of a random variable *on top of* the infinite amount implied by its continuous nature.
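>
> As a quick sanity check of the definition, the sketch below (assuming NumPy and SciPy are available) numerically integrates $-p(x)\log_2(p(x))$ for an $\text{Exponential}(1)$ density and compares it with the known closed form, $\log_2(e) \approx 1.44$ bits.
>
> ```python
> import numpy as np
> from scipy.integrate import quad
>
> pdf = lambda x: np.exp(-x)                        # Exponential(1) density on [0, inf)
> integrand = lambda x: -pdf(x) * np.log2(pdf(x))   # surprisal weighted by density
> H, _ = quad(integrand, 0, np.inf)                 # differential entropy in bits
>
> print(H, np.log2(np.e))                           # both ≈ 1.4427
> ```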
>
> Differential entropy is sensitive to a change of units: scaling the random variable by a constant $c$ shifts the entropy by $\log_2|c|$, as if extra precision digits were needed on top of the infinite ones we already have.
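
This shift can be seen directly from the closed form for a normal distribution, $H(N(\mu, \sigma^2)) = \frac{1}{2}\log_2(2\pi e\sigma^2)$: scaling $\sigma$ by $c$ adds exactly $\log_2 c$ bits. A minimal check, assuming SciPy is available (the helper `H_bits` is just for illustration):

```python
import numpy as np
from scipy.stats import norm

def H_bits(sigma):
    # scipy's .entropy() returns differential entropy in nats; convert to bits
    return norm(scale=sigma).entropy() / np.log(2)

sigma, c = 1.0, 4.0
print(H_bits(c * sigma) - H_bits(sigma))   # ≈ 2.0
print(np.log2(c))                          # log2(4) = 2 bits added by scaling
```
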
The normal distribution maximises differential entropy among all distributions with a given mean and variance. Let $X$ be a continuous random variable with $\mathbb{E}[X] = \mu$ and $\operatorname{Var}(X) = \sigma^2$; then
$
H(X) \le H_{N(\mu, \sigma^2)}
$
with equality if and only if $X \sim N(\mu, \sigma^2)$.
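
For example, a uniform distribution with the same variance has strictly lower differential entropy. A quick check, assuming NumPy is available:

```python
import numpy as np

sigma2 = 1.0
H_normal  = 0.5 * np.log2(2 * np.pi * np.e * sigma2)   # ≈ 2.047 bits
width     = np.sqrt(12 * sigma2)                       # Var(Uniform(0, w)) = w**2 / 12
H_uniform = np.log2(width)                             # ≈ 1.792 bits

print(H_normal, H_uniform, H_normal >= H_uniform)      # Gaussian wins: True
```
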
[^1]: See [[Continuous Entropy]].