$$ \newcommand{\RR}{\mathbb{R}} \newcommand{\QQ}{\mathbb{Q}} \newcommand{\CC}{\mathbb{C}} \newcommand{\NN}{\mathbb{N}} \newcommand{\ZZ}{\mathbb{Z}} \newcommand{\EE}{\mathbb{E}} \newcommand{\HH}{\mathbb{H}} \newcommand{\SO}{\operatorname{SO}} \newcommand{\dist}{\operatorname{dist}} \newcommand{\length}{\operatorname{length}} \newcommand{\uppersum}[1]{{\textstyle\sum^+_{#1}}} \newcommand{\lowersum}[1]{{\textstyle\sum^-_{#1}}} \newcommand{\upperint}[1]{{\textstyle\smallint^+_{#1}}} \newcommand{\lowerint}[1]{{\textstyle\smallint^-_{#1}}} \newcommand{\rsum}[1]{{\textstyle\sum_{#1}}} \newcommand{\partitions}[1]{\mathcal{P}_{#1}} \newcommand{\erf}{\operatorname{erf}} \newcommand{\ihat}{\hat{\imath}} \newcommand{\jhat}{\hat{\jmath}} \newcommand{\khat}{\hat{k}} \newcommand{\pmat}[1]{\begin{pmatrix}#1\end{pmatrix}} \newcommand{\smat}[1]{\left(\begin{smallmatrix}#1\end{smallmatrix}\right)} $$

9  Partial Derivatives

How do we differentiate a function with multiple inputs? We will learn several ways to do this throughout the course. But all methods rely on one fundamental idea: the partial derivative.

9.1 Geometry of Partial Derivatives

If we take a function \(f(x,y)\) and hold the \(y\) value constant, we get a function of just \(x\). For example, if \(f(x,y)=\sin(xy)\) and \(y=2\), we get the function \(f(x,2)=\sin(2x)\). This is a single-variable function, and we already know how to take its derivative!

\[f(x,2)^\prime=\frac{d}{dx}f(x,2)=2\cos(2x)\]

What does this derivative mean? Well, we are measuring the slope in the \(x\)-direction along the line where \(y=2\). Here’s a graphing calculator showing this, for different slices and different points!

What happens if we instead look at the slice where \(y=7\)? Then we’d have \(f(x,7)=\sin(7x)\), and the derivative would be \[f(x,7)^\prime=\frac{d}{dx}f(x,7)=7\cos(7x)\]

Thus, whatever \(y\) is, we see it comes out front as a coefficient via the chain rule. This tells us that if we just call the constant \(y\) (and don’t bother to specify its numerical value) we should get

\[f(x,y)^\prime = \frac{d}{dx}f(x,y)= y\cos(xy)\]

The only confusing thing here is that, unless we already know what we are doing, it’s hard to tell what the prime means: are we differentiating with respect to \(x\) or \(y\)? So we should probably not use this notation when there’s more than one variable.

In fact, to signify that we are taking the derivative of a multivariable function, it’s customary to write the \(d\) a little fancy as well, using the curly \(\partial\).

Definition 9.1 (\(x\)-Partial Derivative) If \(f(x,y,z,\ldots)\) is a function of multiple variables, the partial derivative with respect to \(x\) is the result of treating all other variables as constants, and differentiating with respect to \(x\). It’s denoted \[\frac{\partial f}{\partial x}=\lim_{h\to 0}\frac{f(x+h,y,z,\ldots)-f(x,y,z,\ldots)}{h}\]
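If you like to check things by computer, here is a minimal sketch using Python’s SymPy library (one possible tool; the course doesn’t require it). It computes \(\partial f/\partial x\) for \(f(x,y)=\sin(xy)\) both by direct differentiation and straight from the limit definition above.

```python
import sympy as sp

x, y, h = sp.symbols('x y h')
f = sp.sin(x*y)

# Differentiate with respect to x, treating y as a constant
print(sp.diff(f, x))                      # y*cos(x*y)
print(sp.diff(f, x).subs(y, 2))           # 2*cos(2*x): the y = 2 slice from above

# The same answer, straight from the limit definition of the partial derivative
print(sp.limit((f.subs(x, x + h) - f)/h, h, 0))   # y*cos(x*y)
```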

Partial derivatives are ubiquitous in the sciences, and because they are so widely used in so many fields, there are also many common notations for them. I will use many of the notations interchangeably in class, to prepare you for the real world; for reference, the most common ones appear below.

Definition 9.2 (Notations for Partial Differentiation) The partial derivative of \(f\) with respect to \(x\) may be written as

\[\frac{\partial f}{\partial x}=\frac{\partial}{\partial x}f = \partial_x f= f_x\]

The last notation takes some getting used to at first: the subscript represents differentiation! But because of its conciseness, it is very commonly used when performing calculations.

In single variable calculus there was just a single first derivative: \(f^\prime\). Now, the number of first derivatives depends on the number of variables: \(f(x,y)\) has two first derivatives \(f_x\) and \(f_y\), whereas \(g(u,v,w)\) has three! It turns out that it is very convenient to package all of this information together into a single vector, called the gradient.

Definition 9.3 (The Gradient Vector) Given a multivariable function \(f(x_1,\ldots, x_n)\), its gradient is the vector of all first partial derivatives \[\nabla f = \langle \partial_{x_1}f,\partial_{x_2}f,\ldots,\partial_{x_n}f\rangle\]

The symbol \(\nabla\) used in the notation of the gradient will become quite commonplace throughout vector calculus, though this is our first encounter with it. Alone, it’s pronounced nabla, or del. Right now, it’s alright to just think of the gradient as a convenient bookkeeping device storing all of the partial derivatives in one handy place. But soon we will see that its direction and magnitude are actually quite meaningful to the geometry of \(f\), and because of this, the gradient vector lies at the heart of many modern optimization techniques in Machine Learning.
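As a small illustration (again a SymPy sketch, not a required tool), the gradient really is just the list of first partials collected into one vector:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x*y)

# Collect all first partial derivatives into one vector: the gradient
grad_f = [sp.diff(f, v) for v in (x, y)]
print(grad_f)   # [y*cos(x*y), x*cos(x*y)]
```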

Anything you are used to doing in single-variable calculus can be done with partial derivatives. For example, we can use implicit differentiation to take the derivative of implicit equations with multiple variables.

Example 9.1 (Implicit Partial Differentiation) Find the derivative \(\partial_x z\) of the expression \[x^3+y^3+z^3+6xyz=7\]

Here we act as though \(z\) is implicitly a function of \(x\), and we differentiate the whole equation: \[\frac{\partial}{\partial x}(x^3+y^3+z^3+6xyz)=\frac{\partial}{\partial x}7\]

Computing this (where we need the product rule on the last term, since it contains both an \(x\) and a \(z\), and \(z\) is implicitly a function of \(x\)!) gives

\[3x^2+0+3z^2\frac{\partial z}{\partial x}+6yz+6xy\frac{\partial z}{\partial x }=0\]

Then, we just solve for \(\partial_x z\):

\[(3z^2+6xy)\frac{\partial z}{\partial x }=-3x^2-6yz\]

\[\frac{\partial z}{\partial x }=\frac{-3x^2-6yz}{3z^2+6xy}\]
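Here’s a rough computer check of this example (an illustrative SymPy sketch): we declare \(z\) to be an unspecified function of \(x\) and \(y\), differentiate the equation with respect to \(x\), and solve for \(\partial z/\partial x\).

```python
import sympy as sp

x, y = sp.symbols('x y')
z = sp.Function('z')(x, y)    # treat z as an (unknown) function of x and y

# Differentiate x^3 + y^3 + z^3 + 6xyz - 7 = 0 with respect to x, then solve for dz/dx
eq = x**3 + y**3 + z**3 + 6*x*y*z - 7
dz_dx = sp.solve(sp.Eq(sp.diff(eq, x), 0), sp.Derivative(z, x))[0]

# Should agree (after canceling a 3) with (-3x^2 - 6yz)/(3z^2 + 6xy) from above
print(sp.simplify(dz_dx))
```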

What is implicit differentiation measuring? Think back to the 2-dimensional case, from calculus 1. There we had an implicit equation - what we now know to be a level set - and we were trying to measure \(dy/dx\): that is, for a small change in \(x\), how much does \(y\) have to change, to stay on the level set?

The same picture works here, but now in higher dimension. An implicit equation in \(x,y,z\) determines a surface in \(\RR^3\), which is the level set of some \(3\)-variable function. And a quantity like \(\partial z/\partial x\) tells us how much \(z\) has to change to stay on the surface, if we change \(x\) a little bit.

9.2 Higher Derivatives

Higher order partial derivatives are no more difficult: each time you take the derivative, you just treat all other variables as constants.

For instance, the second partial \(x\) derivative is just what you get by taking the \(x\) derivative twice:

\[\partial_x\partial_x (\cos(xy))=\partial_x(-y\sin(xy))=-y^2\cos(xy)\]

But you can also take partials with respect to different variables.

\[\partial_x\partial_y(x^3y^2)=\partial_x(2x^3y)=6x^2y\]

Definition 9.4 A higher partial derivative is just the result of taking the partial derivative more than once (perhaps with respect to different variables). When doing this, one needs to be careful with notation: the “derivative notations” are all read like function composition \[\partial_x\partial_y\partial_z f=\frac{\partial}{\partial x}\frac{\partial}{\partial y}\frac{\partial}{\partial z}f\] Both mean: do the \(z\) partial, then the \(y\) partial, then the \(x\) partial.

The subscript notation is read from the inside out: \[f_{zyx}=((f_z)_y)_x\] is equivalent to the above: doing \(z\) first, then differentiating with respect to \(y\), and finally with respect to \(x\).

Theorem 9.1 (Equality of Mixed Partials) So long as the partial derivatives are defined and continuous, the order in which you take them does not matter.
\[\partial_x\partial_y f=\partial_y\partial_x f\] This works with higher order derivatives as well: \[f_{xyxx}=f_{xxxy}=f_{yxxx}=\cdots\]
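As a quick sanity check (again just a SymPy sketch), the example from above gives the same answer in either order:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y**2

print(sp.diff(f, y, x))   # y partial first, then x: 6*x**2*y
print(sp.diff(f, x, y))   # x partial first, then y: also 6*x**2*y, as Theorem 9.1 predicts
```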

In fact, the (non-mixed) second order partial derivatives can be combined into an operation called the Laplacian, which arises in many applications:

Definition 9.5 The Laplacian operator is the sum of the (non-mixed) second order partial derivatives: it is sometimes written as \(\Delta\) and sometimes as \(\nabla^2\): in the plane this is

\[\Delta = \nabla^2 = \frac{\partial^2}{\partial x^2}+\frac{\partial^2}{\partial y^2}\]

and in higher dimensions it’s analogous, just with more variables. To take the Laplacian of a function, you just find its non-mixed second partials and add them all up:

\[\Delta f = f_{xx}+f_{yy}\]

One way to imagine what the Laplacian is measuring is a sort of average concavity: it adds up the concavity in both the \(x\) and \(y\) directions. Thus, a function like \(x^2+y^2\) has Laplacian

\[\Delta(x^2+y^2)=2+2=4\]

so it’s concave up on average; \(-(x^2+y^2)\) has Laplacian \(-4\), so it’s concave down; and \(x^2-y^2\) has Laplacian equal to zero: it is concave up in one direction and concave down in the other, so added together they cancel. Functions whose Laplacian is zero are called harmonic functions and play a huge role in understanding differential equations, physics, and engineering.
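Here is a short sketch checking these three Laplacians (the helper function below is just my own illustration):

```python
import sympy as sp

x, y = sp.symbols('x y')

def laplacian(f):
    """Sum of the non-mixed second partials in the plane."""
    return sp.diff(f, x, 2) + sp.diff(f, y, 2)

print(laplacian(x**2 + y**2))      # 4: concave up on average
print(laplacian(-(x**2 + y**2)))   # -4: concave down on average
print(laplacian(x**2 - y**2))      # 0: harmonic
```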

Just like we packaged all of the first partial derivatives together into one nice object, the Gradient, we do the same with the second partials:

Definition 9.6 (The Hessian (Matrix of 2nd Derivatives)) Given a twice differentiable function \(f(x,y)\), its Hessian Matrix is the \(2\times 2\) array of all second derivatives

\[Hf = \begin{pmatrix} f_{xx} & f_{xy}\\ f_{yx}& f_{yy} \end{pmatrix}\]
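SymPy even has a built-in hessian function, which makes for a quick illustration (the example function here is just for demonstration):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x*y)

# The 2x2 matrix of second partials; note f_xy = f_yx on the off-diagonal
print(sp.hessian(f, (x, y)))
```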

9.3 Partial Differential Equations

Partial derivatives are the language in which much of modern science is written. We saw in the last portion of the course that vector valued differential equations are the right language to describe the motion of single particles: but what about quantities that depend on more than one variable?

A first example is simply waves on a string: when a guitar string is pulled taut, if you try to pluck it away from rest it pulls back on you, and the farther you pull it away, the harder it pulls back.

The amount a string curves away from its straight-line equilibrium is captured (roughly) by its concavity. And so one simple model of string motion would say: the bigger the concavity, the faster it wants to “snap back.” Said more precisely:

The acceleration of the string is proportional to its concavity

Writing this in math: if the string’s displacement at position \(x\) and time \(t\) is given by the function \(W(x,t)\), we have the partial differential equation below: \[\partial_{xx} W = \partial_{tt}W\]
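To get a feel for what this equation says, here is a small numerical sketch (my own illustrative finite-difference scheme, with all parameters chosen arbitrarily for demonstration): at each time step, the discrete concavity of the string determines its acceleration.

```python
import numpy as np

# Illustrative finite-difference scheme for W_tt = W_xx on [0, 1],
# with the endpoints held fixed at zero (a plucked string).
n, steps = 200, 500
dx = 1.0 / (n - 1)
dt = 0.5 * dx                       # keeping dt <= dx keeps this scheme stable
x = np.linspace(0, 1, n)

W = np.exp(-200 * (x - 0.5)**2)     # initial "pluck" in the middle of the string
W_prev = W.copy()                   # start at rest: previous step equals current step

for _ in range(steps):
    concavity = np.zeros_like(W)
    concavity[1:-1] = (W[2:] - 2*W[1:-1] + W[:-2]) / dx**2   # discrete W_xx
    W_next = 2*W - W_prev + dt**2 * concavity                # acceleration = concavity
    W_prev, W = W, W_next

print(W.max())   # maximum displacement after the simulation
```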

In two dimensions, a wave equation measures the displacement of a circular membrane, like a drumhead or the interior speaker of an earbud. Here we have to account for concavity in both the \(x\) and \(y\) directions. Thus, the 2-dimensional wave equation is below (now written in the more ‘verbose’ notation for partial derivatives):

\[\frac{\partial^2 W}{\partial x^2}+\frac{\partial^2 W}{\partial y^2}=\frac{\partial^2 W}{\partial t^2}\]

Here are some solutions to this equation:

This same wave equation in three dimensions describes the propagation of electromagnetic waves, or light! This was a triumph of 19th century physics, where James Clerk Maxwell derived this wave equation from his equations for the electromagnetic field. Below, you can see a (numerically computed) solution to this equation, showing a light beam being focused by a glass lens.

Similar partial differential equations occur throughout physics and engineering. (If you have yet to be convinced of the wide applicability of partial derivatives, look up “continuum mechanics” and try to find a topic you’re interested in.)

One final example I’ll mention here is quantum mechanics, where the fundamental equation (called the Schrödinger equation) is a replacement of Newton’s law (a vector valued differential equation) with a partial differential equation. This big change in the mathematics is what causes people to say that quantum particles can be “like waves”.

Below is a calculator I wrote for playing around with the “double slit experiment” in quantum mechanics. This will not come up further in our course, but feel free to ask me if you are interested!

9.4 Videos

9.4.1 Calculus Blue

9.4.2 Khan Academy

9.4.3 Example Problems: