Partial Derivatives and the Gradient

What You’ll Learn

In this lesson you’ll learn how to take derivatives of functions that depend on more than one variable, and how the gradient vector combines those derivatives into a single powerful tool.

Fair warning: from here on out, the math notation starts looking really cool. You’ve got ∂ (curly d), ∇ (nabla/del), and more on the way. If you catch yourself writing these on a whiteboard and feeling like a genius, that’s completely normal. You’ve earned it.

The Concept

Partial Derivatives

For a function z = f(x, y), the partial derivative with respect to x treats y as a constant and differentiates normally

\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}

The partial derivative with respect to y does the same thing but treats x as constant

\frac{\partial f}{\partial y} = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h}

If you can take a regular derivative, you can take a partial derivative. Just pretend the other variables are numbers.

The symbol ∂ (a curly d, sometimes called “del” or “partial”) is used instead of the regular d from single-variable calculus to signal that there are other variables being held constant. When you see ∂f/∂x, it’s saying “the derivative of f with respect to x, with everything else frozen.” It works exactly like df/dx from Calculus 1, except the curly ∂ reminds you that f depends on more than just x.

The Gradient Vector

The symbol ∇ (an upside-down triangle) is called “nabla” or “del.” It’s the vector calculus operator that means “take all the partial derivatives and pack them into a vector.” When you see ∇f, read it as “del f” or “the gradient of f.” It’s not a number or a variable. It’s an instruction: compute all the partial derivatives and assemble them.

The gradient bundles all the partial derivatives into one vector

\nabla f = \left\langle \frac{\partial f}{\partial x},\, \frac{\partial f}{\partial y} \right\rangle

In 3D, it has three components

\nabla f = \left\langle \frac{\partial f}{\partial x},\, \frac{\partial f}{\partial y},\, \frac{\partial f}{\partial z} \right\rangle

The gradient has three key properties:

It points in the direction of steepest ascent (the direction where f increases fastest)
Its magnitude tells you how steep that ascent is
It is always perpendicular to the level curves (contour lines) of f

Worked Examples

Example 1: Computing partial derivatives

Let f(x, y) = x²y + 3xy².

To find the partial with respect to x, treat y as a constant

\frac{\partial f}{\partial x} = 2xy + 3y^2

To find the partial with respect to y, treat x as a constant

\frac{\partial f}{\partial y} = x^2 + 6xy

Each partial derivative tells you the rate of change of f in one coordinate direction while the other stays fixed.

Each partial derivative is just the slope of a slice through the surface. Hold one variable constant and you get a regular 2D curve whose slope is the partial derivative.

The left panel holds y = 1 and varies x, giving the curve f(x, 1) = x² + 3x. The slope of that curve at x = 1 is the partial derivative ∂f/∂x = 2xy + 3y² = 5. The right panel holds x = 1 and varies y, giving f(1, y) = y + 3y². The slope at y = 1 is ∂f/∂y = x² + 6xy = 7. Each partial derivative is just the slope of a slice through the surface.

Example 2: The gradient of f(x, y) = x² + y²

\frac{\partial f}{\partial x} = 2x \qquad \frac{\partial f}{\partial y} = 2y

\nabla f = \langle 2x, 2y \rangle

At the point (3, 4)

\nabla f(3, 4) = \langle 6, 8 \rangle

The magnitude is

|\nabla f| = \sqrt{36 + 64} = \sqrt{100} = 10

This means the function increases fastest in the direction (6, 8) (which points away from the origin), at a rate of 10 units per unit distance.

The point (3, 4) sits on the f = 25 level curve. The gradient ∇f = ⟨6, 8⟩ points straight outward, perpendicular to the circle, in the direction where f increases fastest.

The blue circles are level curves. A level curve is what you get when you set f(x, y) equal to some constant and ask “which points (x, y) give me that value?” Think of it like a topographic map: each contour line connects all the points at the same elevation. Here, f(x, y) = x² + y², so setting f = c gives x² + y² = c, which is a circle of radius √c centered at the origin.

The four circles in the visual correspond to f = 4, 9, 16, and 25. Why those numbers? Because their square roots are whole numbers: √4 = 2, √9 = 3, √16 = 4, √25 = 5. That makes the circles land on nice grid coordinates (radii 2, 3, 4, 5) so the picture is easy to read. Every point on the f = 9 circle, for example, satisfies x² + y² = 9. The point (3, 0) is on it. So is (0, 3). So is (−3, 0). They all give f = 9.

The purple dot is our point (3, 4) from Example 2. It sits right on the f = 25 circle because 3² + 4² = 9 + 16 = 25. The dashed line from the origin shows the distance is 5 (which is √25). The orange arrow is the gradient ∇f = ⟨6, 8⟩, pointing straight outward from the origin, perpendicular to the circle. That perpendicularity is always true: the gradient always points across level curves, never along them, in the direction where f increases fastest.

Example 3: Gradient perpendicular to level curves

For f(x, y) = x² + y², the level curve f = 9 is the circle x² + y² = 9. At the point (3, 0) on this circle, the gradient is ⟨6, 0⟩, which points straight outward (perpendicular to the circle). At (0, 3), the gradient is ⟨0, 6⟩, again perpendicular. The gradient never points along the level curve. It always points across it, in the direction of increasing f.

At every point on the circle, the gradient points straight outward, perpendicular to the curve. The small squares mark the right angles.

Real-World Application

Partial derivatives and the gradient are everywhere:

Machine learning uses gradient descent, which follows the negative gradient to minimize a loss function
Game engines use the gradient of a heightmap to determine slope direction for character movement and water flow
Physics uses gradients to describe electric fields (gradient of electric potential) and gravitational fields
Economics uses partial derivatives for marginal analysis (how does profit change when you adjust one variable?)

Quiz

When computing $\frac{\partial f}{\partial x}$, you treat

A.x as a constant
B.all other variables as constants
C.everything as a variable
D.only z as constant

The gradient $\nabla f$ points in the direction of

A.steepest descent
B.the x-axis
C.steepest ascent
D.the level curve

For $f(x,y) = x^2 + y^2$, the gradient at $(3, 4)$ is

A.$\langle 3, 4 \rangle$
B.$\langle 9, 16 \rangle$
C.$\langle 6, 8 \rangle$
D.$\langle 2, 2 \rangle$

The gradient is always perpendicular to

A.the x-axis
B.the tangent vector
C.the level curves of f
D.the y-axis

Gradient descent in machine learning follows

A.the gradient direction
B.the negative gradient direction
C.a random direction
D.the level curve