§ CALCULUS · 20 MIN READ · Updated 2026-05-13
Derivatives Explained: From Definition to Application
The single most useful idea in calculus — explained from intuition to gradient descent.
"Nature, to be commanded, must be obeyed."

The derivative measures how fast a function is changing at a single point. If $s(t)$ describes the position of a car at time $t$, then $s'(t)$ is the car's speed at time $t$. If $C(x)$ is the cost of producing $x$ units of a product, then $C'(x)$ is the marginal cost — the cost of producing one more unit when you are already producing $x$. If the loss of a neural network is a function of its parameters, then the derivative tells you how to adjust the parameters to reduce the loss.
This article covers what a derivative actually measures, the formal limit definition with a worked example, all the differentiation rules you will use daily (power, product, quotient, chain), the table of derivatives of common functions, implicit differentiation, higher-order derivatives, and applications in physics, economics, and machine learning.
What a derivative actually measures
The intuitive picture: at any point on the graph of a function, draw the tangent line — the straight line that touches the curve at exactly that point. The slope of this tangent line is the derivative at that point.
Why this matters. The slope tells you how steeply the function is changing. A large positive slope means the function is increasing rapidly. A negative slope means it is decreasing. A slope of zero means the function is momentarily flat — a peak, a valley, or an inflection point.
Concretely: if $f(x) = x^2$, then $f'(x) = 2x$. At $x = 3$, the slope is 6. This means: in a neighborhood of $x = 3$, when $x$ increases by a small amount $h$, the function increases by approximately $6h$. The derivative is the rate of change, and it predicts the function's behavior locally.
The derivative is the foundation of nearly every quantitative discipline. It captures the idea of change at an instant — which is at the core of physics (motion), economics (marginal quantities), engineering (rates), biology (growth), and machine learning (loss optimization).
The formal limit definition
The slope of a line through two points $(x, f(x))$ and $(x + h, f(x + h))$ is:

$$\frac{f(x+h) - f(x)}{h}$$

This is the average rate of change of $f$ over the interval from $x$ to $x + h$. It is the slope of a secant line — a line through two points on the curve.

The derivative is what happens when $h$ shrinks to zero — the secant line becomes a tangent line:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

This is the formal definition. Every other formula for derivatives is derived from it.
Example 1: Compute $f'(x)$ from the definition when $f(x) = x^2$.

$$f'(x) = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \to 0} \frac{2xh + h^2}{h} = \lim_{h \to 0} (2x + h) = 2x$$

So $f'(x) = 2x$, confirming the rule.
Example 2: Compute $f'(x)$ from the definition when $f(x) = \sqrt{x}$.

$$f'(x) = \lim_{h \to 0} \frac{\sqrt{x+h} - \sqrt{x}}{h} = \lim_{h \to 0} \frac{(\sqrt{x+h} - \sqrt{x})(\sqrt{x+h} + \sqrt{x})}{h\,(\sqrt{x+h} + \sqrt{x})} = \lim_{h \to 0} \frac{1}{\sqrt{x+h} + \sqrt{x}} = \frac{1}{2\sqrt{x}}$$

So $f'(x) = \frac{1}{2\sqrt{x}}$.
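Before moving on, you can watch the limit converge numerically. Here is a minimal Python sketch (not part of the original text) that evaluates the difference quotient for $f(x) = x^2$ at $x = 3$ with shrinking $h$:

```python
# Numerically approximate f'(3) for f(x) = x**2 by shrinking h.
# The secant slopes should converge to the tangent slope 6.

def f(x):
    return x ** 2

x = 3.0
for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    slope = (f(x + h) - f(x)) / h   # slope of the secant line
    print(f"h = {h:<8} secant slope = {slope:.6f}")
# Prints 7.0, 6.1, 6.01, 6.001, 6.0001 — converging to 6.
```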
Working from the definition is tedious. In practice you use the rules below.
The differentiation rules
These rules allow you to compute the derivative of almost any function algebraically, without going back to the limit definition.
The power rule:

$$\frac{d}{dx} x^n = n x^{n-1}$$

for any real number $n$.

The sum and constant rules:

$$(f + g)' = f' + g', \qquad (cf)' = c\,f'$$

for any constant $c$.

The product rule:

$$(fg)' = f'g + fg'$$

The quotient rule:

$$\left( \frac{f}{g} \right)' = \frac{f'g - fg'}{g^2}$$

The chain rule:

$$\frac{d}{dx}\, f(g(x)) = f'(g(x)) \cdot g'(x)$$
The chain rule is the most important rule. It tells you how to differentiate composed functions. In machine learning, the chain rule applied recursively is exactly the backpropagation algorithm.
Example 3: Differentiate $y = \sin(x^2)$.

Apply the chain rule. Let $u = x^2$, so $y = \sin u$. Then $\frac{dy}{dx} = \cos(u) \cdot \frac{du}{dx} = 2x \cos(x^2)$.
Example 4: Differentiate $y = x^2 e^x$.

Apply the product rule:

$$y' = 2x\,e^x + x^2 e^x = (x^2 + 2x)\,e^x$$
Example 5: Differentiate $y = \frac{\sin x}{\cos x} = \tan x$.

Apply the quotient rule:

$$y' = \frac{\cos x \cdot \cos x - \sin x \cdot (-\sin x)}{\cos^2 x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \frac{1}{\cos^2 x} = \sec^2 x$$
Example 6 — multiple rules: Differentiate $y = e^{2x} \sin(3x)$.

Product rule on the outside, chain rule on each factor:

$$y' = 2e^{2x} \sin(3x) + 3e^{2x} \cos(3x) = e^{2x} \left( 2\sin(3x) + 3\cos(3x) \right)$$
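The worked examples above can be double-checked symbolically. A minimal sketch with sympy, assuming the example functions chosen above (swap in your own):

```python
# Symbolic double-check of Examples 3-6 with sympy.
import sympy as sp

x = sp.symbols('x')

print(sp.diff(sp.sin(x**2), x))                        # 2*x*cos(x**2)
print(sp.factor(sp.diff(x**2 * sp.exp(x), x)))         # x*(x + 2)*exp(x)
print(sp.simplify(sp.diff(sp.sin(x) / sp.cos(x), x)))  # cos(x)**(-2), i.e. sec(x)**2
print(sp.diff(sp.exp(2*x) * sp.sin(3*x), x))           # 2*exp(2*x)*sin(3*x) + 3*exp(2*x)*cos(3*x)
```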
Table of common derivatives
These should be memorized.
| $f(x)$ | $f'(x)$ |
| --- | --- |
| $c$ (constant) | $0$ |
| $x^n$ | $n x^{n-1}$ |
| $e^x$ | $e^x$ |
| $a^x$ | $a^x \ln a$ |
| $\ln x$ | $1/x$ |
| $\sin x$ | $\cos x$ |
| $\cos x$ | $-\sin x$ |
| $\tan x$ | $\sec^2 x$ |
Implicit differentiation
Sometimes a function is defined implicitly by an equation that cannot be solved for $y$ in terms of $x$. Example: $x^2 + y^2 = 25$ (the unit circle scaled to radius 5).

To find $\frac{dy}{dx}$ at a point on this curve, differentiate both sides with respect to $x$, treating $y$ as a function of $x$:

$$2x + 2y \frac{dy}{dx} = 0$$

Solve for $\frac{dy}{dx}$:

$$\frac{dy}{dx} = -\frac{x}{y}$$

At the point $(3, 4)$: $\frac{dy}{dx} = -\frac{3}{4}$. (You can verify this is the slope of the tangent to the circle at this point.)
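The same computation can be sketched in sympy by treating $y$ as an unknown function of $x$ and solving for the derivative:

```python
# Implicit differentiation of x**2 + y**2 = 25, treating y as y(x).
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')(x)

circle = x**2 + y**2 - 25
# Differentiate both sides and solve for dy/dx.
dydx = sp.solve(sp.diff(circle, x), sp.Derivative(y, x))[0]
print(dydx)                        # -x/y(x)
print(dydx.subs(y, 4).subs(x, 3))  # -3/4, the slope at the point (3, 4)
```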
Implicit differentiation is essential for problems where $y$ is locked inside the equation — common in physics (constraints) and economics (utility curves).
Higher-order derivatives
The derivative of a derivative is the second derivative: $f''(x) = \frac{d}{dx} f'(x)$. Geometrically, it measures how the slope itself is changing — the curvature of the function.
In physics: position $\to$ velocity (first derivative) $\to$ acceleration (second derivative).
In optimization: the second derivative tells you whether a critical point (where $f'(x) = 0$) is a minimum (where $f''(x) > 0$), a maximum (where $f''(x) < 0$), or possibly an inflection point (where $f''(x) = 0$; the test is inconclusive there and you must look closer).
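As an illustration, here is a short sympy sketch of the test on the hypothetical function $f(x) = x^3 - 3x$ (not from the original text), which has a maximum at $x = -1$ and a minimum at $x = 1$:

```python
# Classify the critical points of f(x) = x**3 - 3*x with the second derivative test.
import sympy as sp

x = sp.symbols('x')
f = x**3 - 3*x

fp = sp.diff(f, x)        # f'(x)  = 3*x**2 - 3
fpp = sp.diff(f, x, 2)    # f''(x) = 6*x

for c in sp.solve(fp, x):             # critical points: x = -1 and x = 1
    curvature = fpp.subs(x, c)
    kind = "minimum" if curvature > 0 else ("maximum" if curvature < 0 else "inconclusive")
    print(c, kind)        # -1 maximum, 1 minimum
```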
Higher-order derivatives appear in Taylor series, in physics (jerk is the third derivative of position), and in numerical analysis (Newton's method uses both first and second derivatives).
Applications
Physics
The most direct application. Newton's second law: $F = ma$, where $a = \frac{d^2 x}{dt^2}$ is the second derivative of position with respect to time. Any equation of motion is a differential equation involving derivatives of position with respect to time.
Economics
Marginal quantities. If $C(q)$ is the total cost of producing quantity $q$, then $C'(q)$ is the marginal cost — the cost of producing one more unit at the current production level. Marginal revenue, marginal utility, marginal product of labor — all are derivatives.
Optimization
To find the maximum or minimum of a function, set the derivative equal to zero and solve. Then use the second derivative to determine whether the critical point is a maximum, minimum, or saddle point.
Example 7: A box with no top is to be made from a square piece of cardboard 12 inches on a side by cutting squares of side $x$ from each corner and folding up the sides. What value of $x$ maximizes the volume?

The volume is $V(x) = x(12 - 2x)^2 = 4x^3 - 48x^2 + 144x$. Set $V'(x) = 0$:

$$V'(x) = 12x^2 - 96x + 144 = 12(x - 6)(x - 2)$$

Setting this to zero gives $x = 6$ or $x = 2$. The first makes the box have zero volume; the second gives the maximum. So $x = 2$ inches.
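A short sympy sketch confirms the computation:

```python
# Verify Example 7: maximize V(x) = x*(12 - 2*x)**2 for 0 < x < 6.
import sympy as sp

x = sp.symbols('x', positive=True)
V = x * (12 - 2*x)**2

critical = sp.solve(sp.diff(V, x), x)   # [2, 6]
print(critical)
print(V.subs(x, 2))                     # 128 cubic inches, the maximum volume
```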
Machine learning
Gradient descent. The optimization algorithm that trains nearly every neural network. Given a loss function $L(\theta)$ as a function of model parameters $\theta$, gradient descent updates the parameters by

$$\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$$

where $\eta$ is a learning rate and $\nabla_\theta L$ is the gradient — the vector of partial derivatives of $L$ with respect to each parameter.
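A minimal sketch of the update loop in Python, on a made-up one-parameter loss $L(\theta) = (\theta - 4)^2$ (the learning rate and step count are illustrative):

```python
# Gradient descent on the toy loss L(theta) = (theta - 4)**2.

def loss_grad(theta):
    # dL/dtheta for L(theta) = (theta - 4)**2
    return 2 * (theta - 4)

theta = 0.0   # initial parameter
eta = 0.1     # learning rate
for step in range(50):
    theta -= eta * loss_grad(theta)   # theta <- theta - eta * dL/dtheta

print(theta)  # converges toward the minimizer theta = 4
```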
Backpropagation. The algorithm that computes the gradient of the loss with respect to every parameter in a neural network. It is the chain rule of calculus applied recursively, layer by layer, from the output back to the input. Every parameter update in deep learning is a chain rule computation.
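To make that concrete, here is a hand-rolled sketch for a hypothetical one-neuron network $\hat{y} = \tanh(wx + b)$ with squared loss (all numbers made up for illustration); every backward line is one factor of the chain rule:

```python
# Backpropagation in miniature: y_hat = tanh(w*x + b), loss = (y_hat - target)**2.
import math

x, target = 1.5, 0.8      # one training example
w, b = 0.3, -0.1          # parameters

# Forward pass, saving intermediates.
z = w * x + b             # pre-activation
y_hat = math.tanh(z)      # prediction
loss = (y_hat - target) ** 2

# Backward pass: chain rule, one factor per step, output back to input.
dloss_dyhat = 2 * (y_hat - target)
dyhat_dz = 1 - y_hat ** 2            # derivative of tanh at z
dloss_dz = dloss_dyhat * dyhat_dz
dloss_dw = dloss_dz * x              # dz/dw = x
dloss_db = dloss_dz * 1              # dz/db = 1

print(dloss_dw, dloss_db)            # the gradients a framework would compute
```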
Common mistakes
Mistake 1 — Confusing $f'(g(x))$ with $f'(x)$ in the chain rule. When you compute $\frac{d}{dx} f(g(x))$, the rule is $f'(g(x)) \cdot g'(x)$, not $f'(x) \cdot g'(x)$. The outer derivative is evaluated at $g(x)$, not at $x$.
Mistake 2 — Forgetting the product rule when both factors involve $x$. $(f(x)\,g(x))'$ is not $f'(x)\,g'(x)$. You have to use the product rule: $(fg)' = f'g + fg'$.
Mistake 3 — Treating $\frac{dy}{dx}$ as a fraction. It is not a fraction in the strict sense. You cannot generally cancel a $dx$ in the denominator with a $dx$ elsewhere. (There is a more advanced treatment in differential forms, but for the level here, treat $\frac{dy}{dx}$ as a single notation.)
Mistake 4 — Computing $\frac{d}{dx} x^x$ with the power rule. The power rule applies to $x^n$ where $n$ is a constant. For $x^x$, you have to take the log first: $\ln y = x \ln x$, then differentiate implicitly, which gives $y' = x^x (\ln x + 1)$.
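A quick finite-difference check of that result (the point $x = 2$ is arbitrary):

```python
# Check d/dx x**x = x**x * (ln x + 1) against a finite difference at x = 2.
import math

x, h = 2.0, 1e-6
numeric = ((x + h) ** (x + h) - x ** x) / h
formula = x ** x * (math.log(x) + 1)
print(numeric, formula)   # both roughly 6.7726
```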
Frequently asked
- What is the geometric interpretation of the derivative?
- The slope of the tangent line at a point. Equivalently, the instantaneous rate of change. Both pictures lead to the same calculation.
- Can a function be continuous but not differentiable?
- Yes. The absolute value function $f(x) = |x|$ is continuous everywhere but not differentiable at $x = 0$ (the slope jumps from $-1$ to $+1$). More dramatically: Weierstrass constructed a function that is continuous everywhere and differentiable nowhere — its graph is a fractal.
- What is the relationship between derivatives and limits?
- The derivative is defined as a limit. Everything in differential calculus is built on the limit concept. Detail: [Limits and Continuity](/library/limits-and-continuity).
- Why does the chain rule work?
- Intuitively: if $y$ changes by an amount $\Delta y$ when $x$ changes by $\Delta x$, and $y$ changes by $\Delta y / \Delta u$ per unit of $u$, and $u$ changes by $\Delta u / \Delta x$ per unit of $x$, then $\Delta y / \Delta x = (\Delta y / \Delta u) \cdot (\Delta u / \Delta x)$. Taking limits gives the chain rule. The rigorous proof handles edge cases where $\Delta u = 0$.
- What is the difference between $\frac{dy}{dx}$ and $f'(x)$ notation?
- Both notations mean the same thing — the derivative of $y$ (or $f$) with respect to $x$. The Leibniz notation $\frac{dy}{dx}$ makes the variable of differentiation explicit and is useful in differential equations and chain rule arguments. The Lagrange notation $f'(x)$ is more compact and is preferred when context is clear.
- Why do we have so many notations: $f'$, $\dot f$, $\frac{df}{dx}$, $Df$?
- Different fields developed their own. $f'$ (Lagrange) is general; $\frac{df}{dx}$ (Leibniz) is general and explicit; $\dot f$ (Newton) is used in physics for time derivatives; $Df$ (Euler operator) is used in differential operator notation. All mean the same thing.
- Is differentiation harder than integration?
- The opposite — differentiation is much easier. Every elementary function has an elementary derivative. Differentiation is mechanical: apply the rules. Integration is not: many elementary functions have no elementary antiderivative.
Cited works & further reading
- Stewart, J. (2020). *Calculus: Early Transcendentals*, 9th edition. Cengage. Chapters 2–4.
- Spivak, M. (2008). *Calculus*, 4th edition. Publish or Perish.
- 3Blue1Brown. *Essence of Calculus* (YouTube series).
- Paul's Online Math Notes: Derivatives.