> [!definition]
>
> Let $\cx, \cy$ be [[Banach Space|Banach spaces]], $U \subset \cx$ be [[Open Set|open]], and $f: U \to \cy$ be a [[Derivative|differentiable]] function, then
> $
> Df: U \to L(\cx, \cy)
> $
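> Since the space $L(\cx, \cy)$ of [[Bounded Linear Map|bounded linear maps]] is complete, it is itself a Banach space, so $Df$ may in turn be differentiated. When $Df$ is differentiable, its derivative is the **second derivative**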
> $
> D^2f: U \to L(\cx, L(\cx, \cy))
> $
> Each $A \in L(\cx, L(\cx, \cy))$ defines a bilinear map $(v, w) \mapsto (Av)(w)$ that is continuous in each input, so we identify $L(\cx, L(\cx, \cy)) = L^2(\cx, \cy)$ with the space of continuous [[Multilinear Map|bilinear]] maps.
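>
> As a worked illustration (an assumed example, not part of the definition itself), let $B: \cx \times \cx \to \cy$ be a continuous bilinear map and put $f(x) = B(x, x)$. Expanding $f(x + v)$ and discarding the term $B(v, v)$, which is quadratic in $v$, suggests
> $
> \begin{align*}
> Df(x)(v) &= B(x, v) + B(v, x) \\
> D^2f(x)(v, w) &= (D^2f(x)v)(w) = B(v, w) + B(w, v)
> \end{align*}
> $
> The second line uses that $x \mapsto Df(x)$ is continuous and linear in $x$, so its derivative in the direction $v$ is $Df(v)$. In this example $D^2f(x)$ does not depend on $x$ and is visibly symmetric in $(v, w)$, even when $B$ is not.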
> [!theorem]
>
> Let $p \in \nat$, then define inductively the $p$-th derivative
> $
> D^pf(x) = D(D^{p - 1}f)(x)
> $
> where each $D^pf(x) \in L^p(\cx, \cy)$ is a continuous [[Multilinear Map|multilinear]] map. If
> $
> D^kf: U \to L^k(\cx, \cy)
> $
> exists and is continuous for each $k \le p$, then $f \in C^p$.
> [!theorem]
>
> Let $\seqf{v_k}$ be fixed elements of $\cx$. If $f$ is $n$ times differentiable on $U$ and
> $
> g(x) = D^{n - 1}f(x)(v_2, \cdots, v_n)
> $
> then $g$ is differentiable on $U$, and for every $v \in \cx$,
> $
> Dg(x)(v) = D^nf(x)(v, v_2, \cdots, v_n)
> $
> *Proof*. Consider $g$ as the composition between
> $
> D^{n - 1}f: U \to L^{n - 1}(\cx, \cy) \quad \lambda: L^{n - 1}(\cx, \cy) \to \cy
> $
> where $\lambda$ is the evaluation map at $(v_2, \cdots, v_n)$. Since $\lambda$ is continuous and linear, $D\lambda = \lambda$ everywhere, and the chain rule applied to the composition gives
> $
> D(\lambda \circ D^{n - 1}f)(x) = \lambda \circ D^{n}f(x)
> $
> Therefore
> $
> Dg(x)(v) = (\lambda \circ D^nf(x))(v) = (D^nf(x)v)(v_2, \cdots, v_n) = D^nf(x)(v, v_2, \cdots, v_n)
> $
> [!theorem]
>
> Let $f: U \to \cy$ be $p$ times differentiable and $\lambda: \cy \to \mathcal{Z}$ be a bounded linear map. Then for any $x \in U$,
> $
> D^p(\lambda \circ f)(x) = \lambda \circ D^pf(x)
> $
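> *Proof*. By induction on $p$. The case $p = 1$ is the chain rule, since $D\lambda = \lambda$ for a bounded linear map. For the inductive step, $D^{p - 1}(\lambda \circ f) = \lambda_* \circ D^{p - 1}f$, where $\lambda_*: L^{p - 1}(\cx, \cy) \to L^{p - 1}(\cx, \mathcal{Z})$, $A \mapsto \lambda \circ A$, is again bounded and linear. Applying the case $p = 1$ to $\lambda_*$ pulls out one more layer.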
# Higher Derivatives are Symmetric
> [!theorem]
>
> Let $U \subset \cx$ be open, $f: U \to \cy$ be twice differentiable with $D^2f$ being [[Continuity|continuous]]. Then for each $x \in U$, the bilinear map $D^2f(x)$ is symmetric: $D^2f(x)(v, w) = D^2f(x)(w, v)$ for all $v, w \in \cx$.
>
> *Proof*. Fix $x \in U$ and let $r > 0$ be such that $B(x, 2r) \subset U$. Let $v, w \in \cx$ with $\norm{v}, \norm{w} < r$. For $y \in B(x, r)$, denote
> $
> g(y) = f(y + v) - f(y)
> $
> Then
> $
> \begin{align*}
> &f(x + v + w) - f(x + w) - f(x + v) + f(x) \\
> &= g(x + w) - g(x) \\
> &= \int_0^1Dg(x + tw)(w) dt \\
> &= \int_0^1\braks{Df(x + v + tw) - Df(x + tw)}(w)dt \\
> &= \int_0^1 \int_0^1 D^2f(x + sv + tw) \cdot (v) ds \cdot (w)dt
> \end{align*}
> $
> by applying the [[Mean Value Theorem|mean value theorem]] twice. Let
> $
> \psi(sv, tw) = D^2f(x + sv + tw) - D^2f(x)
> $
> then
> $
> \begin{align*}
> g(x + w) - g(x) &= \int_0^1 \int_0^1 D^2f(x + sv + tw)(v, w) ds dt \\
> &= \int_0^1\int_0^1D^2f(x)(v, w)dsdt \\
> &+ \int_0^1\int_0^1\psi(sv, tw)(v, w)dsdt \\
> &= D^2f(x)(v, w) + \underbrace{\int_0^1\int_0^1\psi(sv, tw)(v, w)dsdt}_{\phi(v, w)}
> \end{align*}
> $
> where
> $
> \norm{\phi(v, w)} \le \sup_{s, t \in [0, 1]}\norm{\psi(sv, tw)} \cdot \norm{v} \cdot \norm{w}
> $
> Swapping the roles of $v$ and $w$ in the above argument, we can work with
> $
> \begin{align*}
> g_w(x) &= f(x + w) - f(x)
> \end{align*}
> $
> and
> $
> \begin{align*}
> &f(x + v + w) - f(x + w) - f(x + v) + f(x) \\
> &= g_w(x + v) - g_w(x) \\
> &= D^2f(x)(w, v) + \phi_w(v, w)
> \end{align*}
> $
> where
> $
> \norm{\phi_w(v, w)} \le \sup_{s, t \in [0, 1]} \norm{\psi(sv, tw)} \cdot \norm{v} \cdot \norm{w}
> $
> The two ways of writing the same expression yield
> $
> D^2f(x)(v, w) - D^2f(x)(w, v) = \phi_w(v, w) - \phi(v, w)
> $
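> Set $\epsilon(v, w) = \sup_{s, t \in [0, 1]}\norm{\psi(sv, tw)}$, so that $\phi(v, w)$ and $\phi_w(v, w)$ are both bounded in norm by $\epsilon(v, w) \cdot \norm{v} \cdot \norm{w}$, and hence
> $
> \norm{D^2f(x)(v, w) - D^2f(x)(w, v)} \le 2\epsilon(v, w) \cdot \norm{v} \cdot \norm{w}
> $
> Since $D^2f$ is continuous at $x$, $\epsilon(v, w) \to 0$ as $(v, w) \to 0$. The left-hand side is the norm of a bilinear expression in $(v, w)$, so replacing $(v, w)$ with $(cv, cw)$ for $c \in (0, 1]$ and dividing by $c^2$ gives
> $
> \norm{D^2f(x)(v, w) - D^2f(x)(w, v)} \le 2\epsilon(cv, cw) \cdot \norm{v} \cdot \norm{w} \to 0 \quad \text{as } c \to 0
> $
> Therefore $D^2f(x)(v, w) - D^2f(x)(w, v) = 0$ whenever $\norm{v}, \norm{w} < r$, and by bilinearity for all $v, w \in \cx$.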
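
The argument above can be sanity-checked numerically. What follows is a minimal Python sketch rather than part of the proof: the test function `f` on $\mathbb{R}^2$, the base point `p`, the directions `v` and `w`, and the step sizes `h` and `t` are all assumptions chosen for illustration. It approximates $D^2f(p)(v, w)$ as the derivative in direction $v$ of $y \mapsto Df(y)(w)$, compares the two input orders, and compares both against the rescaled second difference used in the proof.

```python
import math

def f(p):
    """Assumed smooth test function on R^2, for illustration only."""
    x, y = p
    return math.sin(x * y) + x**3 * y

def df(p, w):
    """Df(p)(w), using the hand-computed gradient of f."""
    x, y = p
    fx = y * math.cos(x * y) + 3 * x**2 * y
    fy = x * math.cos(x * y) + x**3
    return fx * w[0] + fy * w[1]

def shift(p, v, c=1.0):
    """Return the point p + c * v."""
    return (p[0] + c * v[0], p[1] + c * v[1])

def d2f(p, v, w, h=1e-5):
    """Central-difference approximation of D^2 f(p)(v, w),
    i.e. the derivative at c = 0 of c -> Df(p + c v)(w)."""
    return (df(shift(p, v, h), w) - df(shift(p, v, -h), w)) / (2 * h)

def second_difference(p, v, w):
    """f(p + v + w) - f(p + w) - f(p + v) + f(p), as in the proof."""
    return f(shift(shift(p, v), w)) - f(shift(p, w)) - f(shift(p, v)) + f(p)

p = (0.7, -0.3)
v = (1.0, 2.0)
w = (-0.5, 0.25)

# The two input orders should agree up to discretisation error.
print(d2f(p, v, w), d2f(p, w, v))

# Rescaling (v, w) by t and dividing by t^2 should approach D^2 f(p)(v, w).
t = 1e-4
tv = (t * v[0], t * v[1])
tw = (t * w[0], t * w[1])
print(second_difference(p, tv, tw) / t**2)
```

The rescaling in the last step mirrors the bilinearity argument: the second difference at $(tv, tw)$ equals $t^2 D^2f(p)(v, w)$ plus a remainder that is small compared with $t^2$.
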
> [!theorem]
>
> Let $f \in C^p$ on $U$. Then for each $x \in U$, the map $D^pf(x)$ is symmetric.
>
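> *Proof*. By induction on $p$; the case $p = 2$ is the previous theorem. Suppose that $D^{p - 1}f(x)$ is symmetric for every $x \in U$ and let $g = D^{p - 2}f$. Since $g$ is twice differentiable with continuous second derivative, the previous theorem gives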
> $
> D^2g(x)(v, w) = D^2g(x)(w, v)
> $
> Since $D^pf = D^2D^{p - 2}f$,
> $
> \begin{align*}
> D^pf(x)(v_1, \cdots, v_p) &= (D^2D^{p - 2}f(x))(v_1, v_2) \cdot (v_3, \cdots, v_p) \\
> &= (D^2D^{p - 2}f(x))(v_2, v_1) \cdot (v_3, \cdots, v_p) \\
> &= D^pf(x)(v_2, v_1, \cdots, v_p)
> \end{align*}
> $
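> we can swap the first two inputs. For the remaining inputs, fix $v_2, \cdots, v_p$ and let $h(x) = D^{p - 1}f(x)(v_2, \cdots, v_p)$. By the inductive hypothesis, $h$ is unchanged by any permutation of $v_2, \cdots, v_p$, and $Dh(x)(v_1) = D^pf(x)(v_1, v_2, \cdots, v_p)$ by the theorem on evaluating at fixed vectors, so $D^pf(x)$ is also invariant under permutations of its last $p - 1$ inputs.
>
> The transposition $(1\,2)$ together with the permutations of $\{2, \cdots, p\}$ generates $S_p$, so no permutation of the inputs changes the value of $D^pf(x)$.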
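>
> As an illustration for $p = 3$ (a worked instance of the two moves used in the proof, added as an example), the input $v_3$ can be brought to the front by first permuting the last two inputs and then swapping the first two:
> $
> \begin{align*}
> D^3f(x)(v_1, v_2, v_3) &= D^3f(x)(v_1, v_3, v_2) \\
> &= D^3f(x)(v_3, v_1, v_2)
> \end{align*}
> $
> Any other rearrangement of the inputs is obtained by chaining moves of these two kinds.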