> [!Quote]
> Using the chain rule is like peeling an onion: you have to deal with one layer at a time, and if it is too big you will start crying.

> [!theorem]
>
> Let $E, F, G$ be [[Banach Space|Banach spaces]], $U \subset E$ and $V \subset F$ be [[Open Set|open]], and $f: U \to V$, $g: V \to G$ be maps. Let $x \in U$. If $f$ is [[Derivative|differentiable]] at $x$ and $g$ is differentiable at $f(x)$, then
> $
> D(g \circ f)(x) = Dg(f(x)) \circ Df(x)
> $
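> *Proof*. Set
> $
> k(y) = f(x + y) - f(x) = Df(x)y + o_f(y),
> $
> where $o_f(y)/\lVert y \rVert \to 0$ as $y \to 0$. In particular $k(y) \to 0$ as $y \to 0$. Since $g$ is differentiable at $f(x)$,
> $
> \begin{align*}
> &g(f(x + y)) - g(f(x)) \\
> &= g(f(x) + k(y)) - g(f(x)) \\
> &= Dg(f(x))k(y) + o_g(k(y)) \\
> &= Dg(f(x))(Df(x)y + o_f(y)) + o_g(k(y)) \\
> &= \left(Dg(f(x)) \circ Df(x)\right) y + \underbrace{Dg(f(x))(o_f(y)) + o_g(k(y))}_{o(y)}
> \end{align*}
> $
> The remainder is indeed $o(y)$. First, $Dg(f(x))$ is a bounded linear operator, so $\lVert Dg(f(x))(o_f(y)) \rVert \le \lVert Dg(f(x)) \rVert \, \lVert o_f(y) \rVert = o(\lVert y \rVert)$. Second, for $\lVert y \rVert$ small enough, $\lVert k(y) \rVert \le (\lVert Df(x) \rVert + 1)\lVert y \rVert$, and $k(y) \to 0$, so $o_g(k(y)) = o(y)$. Hence $g \circ f$ is differentiable at $x$ with $D(g \circ f)(x) = Dg(f(x)) \circ Df(x)$. $\blacksquare$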
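
In finite dimensions, $Df(x)$ is the Jacobian matrix of $f$ and composing linear maps means multiplying their matrices, so the theorem can be checked numerically. The sketch below assumes two hypothetical maps chosen only for illustration, $f(x, y) = (xy,\ x + \sin y)$ and $g(u, v) = u^2 + e^v$; it multiplies hand-computed Jacobians $Dg(f(x))\,Df(x)$ and compares the result with a central-difference Jacobian of $g \circ f$. All helper names are made up for this example.

```python
import math

# Hypothetical example maps, chosen only for illustration:
#   f : R^2 -> R^2,  f(x, y) = (x*y, x + sin(y))
#   g : R^2 -> R,    g(u, v) = u**2 + exp(v)
def f(p):
    x, y = p
    return [x * y, x + math.sin(y)]

def g(q):
    u, v = q
    return [u ** 2 + math.exp(v)]

def jac_f(p):
    """Hand-computed Jacobian of f (rows: outputs, columns: inputs)."""
    x, y = p
    return [[y, x],
            [1.0, math.cos(y)]]

def jac_g(q):
    """Hand-computed Jacobian of g (a 1 x 2 matrix)."""
    u, v = q
    return [[2 * u, math.exp(v)]]

def matmul(A, B):
    """Matrix product: composing linear maps multiplies their matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def numeric_jacobian(h, p, step=1e-6):
    """Central-difference Jacobian of h at p, one input coordinate at a time."""
    n_out = len(h(p))
    J = [[0.0] * len(p) for _ in range(n_out)]
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += step
        minus[j] -= step
        hp, hm = h(plus), h(minus)
        for i in range(n_out):
            J[i][j] = (hp[i] - hm[i]) / (2 * step)
    return J

x0 = [0.7, -1.2]
chain = matmul(jac_g(f(x0)), jac_f(x0))            # Dg(f(x0)) composed with Df(x0)
direct = numeric_jacobian(lambda p: g(f(p)), x0)   # D(g o f)(x0), estimated numerically
print("chain rule:", chain)
print("numeric:   ", direct)
```

The two $1 \times 2$ matrices should agree up to finite-difference and rounding error.
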

> [!theorem]
>
> If $g$ is [[Derivative|differentiable]] at $x$, and $f$ is differentiable at $g(x)$, then the composite [[Function|function]] $f(g(x))$ is differentiable at $x$, and its [[Derivative|derivative]] is given by the product
>
> $
> \frac{d}{dx}{\left(f(g(x))\right)} = \left(\frac{d}{dx}f(x)\right)(g(x)) \cdot \frac{d}{dx}g(x)
> $
>
> Equivalently, in Leibniz notation, if $y = f(u)$ and $u = g(x)$ are both differentiable functions, then:
>
> $
> \frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}
> $
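
A quick numerical sanity check of the formula, using the illustrative choice $f(u) = \sin u$ and $g(x) = x^2$ (not from the text): a central-difference estimate of $\frac{d}{dx}\sin(x^2)$ should agree with $\cos(x^2) \cdot 2x$ up to rounding error.

```python
import math


def derivative(func, x, step=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)


# Illustrative choice (not from the text): f(u) = sin(u), g(x) = x**2.
def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x


x = 1.3
numeric = derivative(lambda t: f(g(t)), x)   # d/dx f(g(x)), estimated
chain_rule = f_prime(g(x)) * g_prime(x)      # f'(g(x)) * g'(x)
print(numeric, chain_rule)
```
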

> [!theorem] Proof
>
> If $y = f(x)$ and $x$ changes from $a$ to $a + \Delta{x}$, the corresponding increment of $y$ is:
>
> $
> \Delta{y} = f(a + \Delta{x}) - f(a)
> $
>
> Thus, the derivative of $f$ at $a$ is:
>
> $
> \lim_{\Delta{x} \to 0}{\frac{\Delta{y}}{\Delta{x}}} = \frac{dy}{dx} = \frac{d}{dx}f(a)
> $
>
> For $\Delta{x} \neq 0$, let $\varepsilon$ denote the difference between the difference quotient and the derivative. By the definition of the derivative, it tends to $0$:
>
> $
> \lim_{\Delta{x} \to 0}\varepsilon = \lim_{\Delta{x} \to 0}\left(\frac{\Delta{y}}{\Delta{x}} - \frac{d}{dx}f(a)\right) = \frac{d}{dx}f(a) - \frac{d}{dx}f(a) = 0
> $
>
> Rearranging the definition of $\varepsilon$:
>
> $
> \begin{align*}
> \varepsilon &= \frac{\Delta{y}}{\Delta{x}} - \frac{d}{dx}f(a) \\
> \varepsilon\Delta{x} &= \Delta{y} - \Delta{x}\frac{d}{dx}f(a) \\
> \Delta{y} &= \Delta{x}\frac{d}{dx}f(a) + \varepsilon\Delta{x}
> \end{align*}
> $
>
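> The quotient $\varepsilon$ is only defined for $\Delta{x} \neq 0$; setting $\varepsilon = 0$ when $\Delta{x} = 0$ makes $\varepsilon$ a continuous function of $\Delta{x}$ at $0$, and the last identity then holds for $\Delta{x} = 0$ as well. Thus, for a function $f$ differentiable at $a$: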
>
> $
> \Delta{y} = \Delta{x}\frac{d}{dx}f(a) + \varepsilon\Delta{x} \quad \text{where} \ \varepsilon \to 0 \ \text{as} \ \Delta{x} \to 0
> $
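>
> For instance, with the illustrative choice $f(x) = x^2$, the decomposition can be computed exactly, and $\varepsilon$ turns out to be $\Delta{x}$ itself:
>
> $
> \Delta{y} = (a + \Delta{x})^2 - a^2 = \underbrace{2a}_{\frac{d}{dx}f(a)}\,\Delta{x} + \underbrace{\Delta{x}}_{\varepsilon}\,\Delta{x}
> $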
>
> Suppose $u = g(x)$ is differentiable at $a$, and $y = f(u)$ is differentiable at $b = g(a)$. Let $\Delta{x}$ be an increment in $x$, and let $\Delta{u}$ and $\Delta{y}$ be the corresponding increments in $u$ and $y$. Then:
>
> $
> \Delta{u} = \left(\frac{d}{dx}g(x)\right)(a) \cdot \Delta{x} + \varepsilon_1 \Delta{x}
> = \Delta{x}\left(\left(\frac{d}{dx}g(x)\right)(a) + \varepsilon_1\right)
> \quad \text{where} \ \varepsilon_1 \to 0 \ \text{as} \ \Delta{x} \to 0
> $
>
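> Similarly, for $f$ at $b$ (with $\varepsilon_2 = 0$ when $\Delta{u} = 0$, so the formula holds even if $\Delta{u}$ vanishes):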
>
> $
> \Delta{y} = \left(\frac{d}{dx}f(x)\right)(b) \cdot \Delta{u} + \varepsilon_2 \Delta{u}
> = \Delta{u}\left(\left(\frac{d}{dx}f(x)\right)(b) + \varepsilon_2\right)
> \quad \text{where} \ \varepsilon_2 \to 0 \ \text{as} \ \Delta{u} \to 0
> $
>
> Substituting the expression for $\Delta{u}$:
>
> $
> \Delta{y} = \Delta{x}\left(\left(\frac{d}{dx}g(x)\right)(a) + \varepsilon_1\right)
> \left(\left(\frac{d}{dx}f(x)\right)(b) + \varepsilon_2\right)
> $
>
> Dividing by $\Delta{x} \neq 0$:
>
> $
> \frac{\Delta{y}}{\Delta{x}} = \left(\left(\frac{d}{dx}g(x)\right)(a) + \varepsilon_1\right)
> \left(\left(\frac{d}{dx}f(x)\right)(b) + \varepsilon_2\right)
> $
>
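> As $\Delta{x} \to 0$, the expression for $\Delta{u}$ shows that $\Delta{u} \to 0$. Hence $\varepsilon_1 \to 0$, and, because $\varepsilon_2$ is a continuous function of $\Delta{u}$ that vanishes at $\Delta{u} = 0$, also $\varepsilon_2 \to 0$ as $\Delta{x} \to 0$. Therefore: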
>
> $
> \begin{align*}
> \frac{dy}{dx}
> &= \lim_{\Delta{x} \to 0}{\frac{\Delta{y}}{\Delta{x}}} \\
> &= \lim_{\Delta{x} \to 0}{\left(\left(\frac{d}{dx}g(x)\right)(a) + \varepsilon_1\right)
> \left(\left(\frac{d}{dx}f(x)\right)(b) + \varepsilon_2\right)} \\
> &= \left(\left(\frac{d}{dx}g(x)\right)(a) + 0\right)
> \left(\left(\frac{d}{dx}f(x)\right)(b) + 0\right) \\
> &= \left(\frac{d}{dx}g(x)\right)(a) \cdot
> \left(\frac{d}{dx}f(x)\right)(b) \\
> &= \left(\frac{d}{dx}g(x)\right)(a) \cdot
> \left(\frac{d}{dx}f(x)\right)(g(a))
> \end{align*}
> $
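
The proof can also be traced numerically. For the illustrative pair $u = g(x) = x^2$ and $y = f(u) = \sin u$ at $a = 1$ (assumptions for this sketch only), the code computes $\Delta{u}$, $\Delta{y}$, $\varepsilon_1$ and $\varepsilon_2$ for shrinking $\Delta{x}$. Both error terms shrink toward $0$, and the product $\left(g'(a) + \varepsilon_1\right)\left(f'(b) + \varepsilon_2\right)$ reproduces $\Delta{y}/\Delta{x}$.

```python
import math

# Illustrative functions (not from the text): u = g(x) = x**2, y = f(u) = sin(u).
def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x

def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

a = 1.0
b = g(a)

for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    du = g(a + dx) - g(a)          # increment in u
    dy = f(b + du) - f(b)          # increment in y
    eps1 = du / dx - g_prime(a)    # from du = (g'(a) + eps1) * dx
    eps2 = dy / du - f_prime(b)    # from dy = (f'(b) + eps2) * du; du != 0 for these dx
    product = (g_prime(a) + eps1) * (f_prime(b) + eps2)
    print(f"dx={dx:.0e}  eps1={eps1:+.2e}  eps2={eps2:+.2e}  "
          f"dy/dx={dy / dx:.6f}  product={product:.6f}")

print("limit g'(a) * f'(g(a)) =", g_prime(a) * f_prime(b))
```

For this particular $g$, $\varepsilon_1$ equals $\Delta{x}$ exactly, matching the $x^2$ computation in the proof.
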
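
> [!theorem]
>
> Let $f(x_1, \ldots, x_n)$ be a differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, and let $x_k = g_k(t)$, $k = 1, \ldots, n$, be differentiable functions of a single variable $t$. Then the composite $t \mapsto f(g_1(t), \ldots, g_n(t))$ is differentiable and, with each $\frac{\partial f}{\partial x_k}$ evaluated at $(g_1(t), \ldots, g_n(t))$,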
> $
> \frac{df}{dt} = \sum_{k = 1}^{n}\frac{\partial f}{\partial x_k}\frac{dg_k}{dt}
> $
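
A sketch that checks the one-parameter formula for $n = 2$, using illustrative functions that are not from the text: $f(x_1, x_2) = x_1 x_2 + x_2^2$, $g_1(t) = \cos t$ and $g_2(t) = t^3$. The partial derivatives are evaluated at $(g_1(t), g_2(t))$, and the sum is compared with a central difference of the composite $t \mapsto f(g_1(t), g_2(t))$.

```python
import math

# Illustrative choices (not from the text):
#   f(x1, x2) = x1 * x2 + x2**2,  x1 = g1(t) = cos(t),  x2 = g2(t) = t**3
def f(x1, x2):
    return x1 * x2 + x2 ** 2

def df_dx1(x1, x2):
    return x2

def df_dx2(x1, x2):
    return x1 + 2 * x2

def g1(t):
    return math.cos(t)

def g2(t):
    return t ** 3

def dg1(t):
    return -math.sin(t)

def dg2(t):
    return 3 * t ** 2

t = 0.8
x1, x2 = g1(t), g2(t)

# Chain rule: sum over k of (df/dx_k at (g1(t), g2(t))) * (dg_k/dt).
chain = df_dx1(x1, x2) * dg1(t) + df_dx2(x1, x2) * dg2(t)

# Central-difference derivative of the composite t -> f(g1(t), g2(t)).
step = 1e-6
numeric = (f(g1(t + step), g2(t + step)) - f(g1(t - step), g2(t - step))) / (2 * step)
print(chain, numeric)
```
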

> [!theorem]
>
> Let $f(x_1, \ldots, x_n)$ be a differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, and let $x_k = g_k(t_1, \ldots, t_m)$, $k = 1, \ldots, n$, be differentiable functions of $m$ variables. Then for each $j = 1, \ldots, m$,
> $
> \frac{\partial f}{\partial t_j} = \sum_{k = 1}^{n}\frac{\partial f}{\partial x_k}\frac{\partial g_k}{\partial t_j}
> $
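
The same check for several parameters, written with a gradient of $f$ and a Jacobian of $(g_1, \ldots, g_n)$, so that each partial derivative with respect to a parameter $t_j$ is the sum over $k$ of $\frac{\partial f}{\partial x_k}\frac{\partial g_k}{\partial t_j}$. The functions (with $n = m = 2$) are illustrative assumptions, not from the text: $f(x_1, x_2) = x_1^2 x_2$, $g_1(t_1, t_2) = t_1 + t_2$, $g_2(t_1, t_2) = t_1 e^{t_2}$.

```python
import math

# Illustrative choices (not from the text), with n = 2 and m = 2:
#   f(x1, x2) = x1**2 * x2
#   x1 = g1(t1, t2) = t1 + t2,   x2 = g2(t1, t2) = t1 * exp(t2)
def f(x):
    return x[0] ** 2 * x[1]

def grad_f(x):
    # [df/dx1, df/dx2]
    return [2 * x[0] * x[1], x[0] ** 2]

def g(t):
    return [t[0] + t[1], t[0] * math.exp(t[1])]

def jac_g(t):
    # jac_g(t)[k][j] = d g_k / d t_j
    return [[1.0, 1.0],
            [math.exp(t[1]), t[0] * math.exp(t[1])]]

def chain_partials(t):
    """df/dt_j = sum over k of df/dx_k * dg_k/dt_j, for every j."""
    x = g(t)
    gf, J = grad_f(x), jac_g(t)
    return [sum(gf[k] * J[k][j] for k in range(len(x))) for j in range(len(t))]

def numeric_partials(t, step=1e-6):
    """Central differences of t -> f(g(t)), one parameter at a time."""
    out = []
    for j in range(len(t)):
        plus, minus = list(t), list(t)
        plus[j] += step
        minus[j] -= step
        out.append((f(g(plus)) - f(g(minus))) / (2 * step))
    return out

t0 = [0.5, -0.3]
print(chain_partials(t0))
print(numeric_partials(t0))
```
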