§ CALCULUS · 20 MIN READ · Updated 2026-05-13
Derivatives Explained: From Definition to Application
The single most useful idea in calculus — explained from intuition to gradient descent.
"Nature, to be commanded, must be obeyed."

The derivative measures how fast a function is changing at a single point. If $s(t)$ describes the position of a car at time $t$, then $s'(t)$ is the car's speed at time $t$. If $C(x)$ is the cost of producing $x$ units of a product, then $C'(x)$ is the marginal cost — the cost of producing one more unit when you are already producing $x$. If the loss of a neural network is a function of its parameters, then the derivative tells you how to adjust the parameters to reduce the loss.
This article covers what a derivative actually measures, the formal limit definition with a worked example, all the differentiation rules you will use daily (power, product, quotient, chain), the table of derivatives of common functions, implicit differentiation, higher-order derivatives, and applications in physics, economics, and machine learning.
What a derivative actually measures
The intuitive picture: at any point on the graph of a function, draw the tangent line — the straight line that touches the curve at exactly that point. The slope of this tangent line is the derivative at that point.
Why this matters. The slope tells you how steeply the function is changing. A large positive slope means the function is increasing rapidly. A negative slope means it is decreasing. A slope of zero means the function is momentarily flat — a peak, a valley, or an inflection point.
Concretely: if $f(x) = x^2$, then $f'(x) = 2x$. At $x = 3$, the slope is 6. This means: in a neighborhood of $x = 3$, when $x$ increases by a small amount $h$, the function increases by approximately $6h$. The derivative is the rate of change, and it predicts the function's behavior locally.
The derivative is the foundation of nearly every quantitative discipline. It captures the idea of change at an instant — which is at the core of physics (motion), economics (marginal quantities), engineering (rates), biology (growth), and machine learning (loss optimization).
The formal limit definition
The slope of a line through two points $(x, f(x))$ and $(x + h, f(x + h))$ is:

$$\frac{f(x+h) - f(x)}{h}$$

This is the average rate of change of $f$ over the interval from $x$ to $x + h$. It is the slope of a secant line — a line through two points on the curve.

The derivative is what happens when $h$ shrinks to zero — the secant line becomes a tangent line:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

This is the formal definition. Every other formula for derivatives is derived from it.
Example 1: Compute $f'(x)$ from the definition when $f(x) = x^2$.

$$f'(x) = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \to 0} \frac{2xh + h^2}{h} = \lim_{h \to 0} (2x + h) = 2x$$

So $f'(x) = 2x$, confirming the rule.
Example 2: Compute $f'(x)$ from the definition when $f(x) = \sqrt{x}$.

$$f'(x) = \lim_{h \to 0} \frac{\sqrt{x+h} - \sqrt{x}}{h} = \lim_{h \to 0} \frac{(\sqrt{x+h} - \sqrt{x})(\sqrt{x+h} + \sqrt{x})}{h\,(\sqrt{x+h} + \sqrt{x})} = \lim_{h \to 0} \frac{1}{\sqrt{x+h} + \sqrt{x}} = \frac{1}{2\sqrt{x}}$$

So $f'(x) = \frac{1}{2\sqrt{x}}$.
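Before moving on, you can watch the limit converge numerically. Here is a minimal Python sketch (not part of the original text) that evaluates the difference quotient for $f(x) = x^2$ at $x = 3$ with shrinking $h$:

```python
# Numerically approximate f'(3) for f(x) = x**2 by shrinking h.
# The secant slopes should converge to the tangent slope 6.

def f(x):
    return x ** 2

x = 3.0
for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    slope = (f(x + h) - f(x)) / h   # slope of the secant line
    print(f"h = {h:<8} secant slope = {slope:.6f}")
# Prints 7.0, 6.1, 6.01, 6.001, 6.0001 — converging to 6.
```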
Working from the definition is tedious. In practice you use the rules below.
The differentiation rules
These rules allow you to compute the derivative of almost any function algebraically, without going back to the limit definition.
The power rule:

$$\frac{d}{dx} x^n = n x^{n-1}$$

for any real number $n$.

The sum and constant rules:

$$(f + g)' = f' + g', \qquad (cf)' = c\,f'$$

for any constant $c$.

The product rule:

$$(fg)' = f'g + fg'$$

The quotient rule:

$$\left( \frac{f}{g} \right)' = \frac{f'g - fg'}{g^2}$$

The chain rule:

$$\frac{d}{dx}\, f(g(x)) = f'(g(x)) \cdot g'(x)$$
The chain rule is the most important rule. It tells you how to differentiate composed functions. In machine learning, the chain rule applied recursively is exactly the backpropagation algorithm.
Example 3: Differentiate $y = \sin(x^2)$.

Apply the chain rule. Let $u = x^2$, so $y = \sin u$. Then $\frac{dy}{dx} = \cos(u) \cdot \frac{du}{dx} = 2x \cos(x^2)$.
Example 4: Differentiate $y = x^2 e^x$.

Apply the product rule:

$$y' = 2x\,e^x + x^2 e^x = (x^2 + 2x)\,e^x$$
Example 5: Differentiate $y = \frac{\sin x}{\cos x} = \tan x$.

Apply the quotient rule:

$$y' = \frac{\cos x \cdot \cos x - \sin x \cdot (-\sin x)}{\cos^2 x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \frac{1}{\cos^2 x} = \sec^2 x$$
Example 6 — multiple rules: Differentiate $y = e^{2x} \sin(3x)$.

Product rule on the outside, chain rule on each factor:

$$y' = 2e^{2x} \sin(3x) + 3e^{2x} \cos(3x) = e^{2x} \left( 2\sin(3x) + 3\cos(3x) \right)$$
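The worked examples above can be double-checked symbolically. A minimal sketch with sympy, assuming the example functions chosen above (swap in your own):

```python
# Symbolic double-check of Examples 3-6 with sympy.
import sympy as sp

x = sp.symbols('x')

print(sp.diff(sp.sin(x**2), x))                        # 2*x*cos(x**2)
print(sp.factor(sp.diff(x**2 * sp.exp(x), x)))         # x*(x + 2)*exp(x)
print(sp.simplify(sp.diff(sp.sin(x) / sp.cos(x), x)))  # cos(x)**(-2), i.e. sec(x)**2
print(sp.diff(sp.exp(2*x) * sp.sin(3*x), x))           # 2*exp(2*x)*sin(3*x) + 3*exp(2*x)*cos(3*x)
```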
Table of common derivatives
These should be memorized.
| $f(x)$ | $f'(x)$ |
| --- | --- |
| $c$ (constant) | $0$ |
| $x^n$ | $n x^{n-1}$ |
| $e^x$ | $e^x$ |
| $a^x$ | $a^x \ln a$ |
| $\ln x$ | $1/x$ |
| $\sin x$ | $\cos x$ |
| $\cos x$ | $-\sin x$ |
| $\tan x$ | $\sec^2 x$ |
Implicit differentiation
Sometimes a function is defined implicitly by an equation that cannot be solved for $y$ in terms of $x$. Example: $x^2 + y^2 = 25$ (the unit circle scaled to radius 5).

To find $\frac{dy}{dx}$ at a point on this curve, differentiate both sides with respect to $x$, treating $y$ as a function of $x$:

$$2x + 2y \frac{dy}{dx} = 0$$

Solve for $\frac{dy}{dx}$:

$$\frac{dy}{dx} = -\frac{x}{y}$$

At the point $(3, 4)$: $\frac{dy}{dx} = -\frac{3}{4}$. (You can verify this is the slope of the tangent to the circle at this point.)
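The same computation can be sketched in sympy by treating $y$ as an unknown function of $x$ and solving for the derivative:

```python
# Implicit differentiation of x**2 + y**2 = 25, treating y as y(x).
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')(x)

circle = x**2 + y**2 - 25
# Differentiate both sides and solve for dy/dx.
dydx = sp.solve(sp.diff(circle, x), sp.Derivative(y, x))[0]
print(dydx)                        # -x/y(x)
print(dydx.subs(y, 4).subs(x, 3))  # -3/4, the slope at the point (3, 4)
```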
Implicit differentiation is essential for problems where $y$ is locked inside the equation — common in physics (constraints) and economics (utility curves).
Higher-order derivatives
The derivative of a derivative is the second derivative: $f''(x) = \frac{d}{dx} f'(x)$. Geometrically, it measures how the slope itself is changing — the curvature of the function.
In physics: position $\to$ velocity (first derivative) $\to$ acceleration (second derivative).
In optimization: the second derivative tells you whether a critical point (where $f'(x) = 0$) is a minimum (where $f''(x) > 0$), a maximum (where $f''(x) < 0$), or possibly an inflection point (where $f''(x) = 0$; the test is inconclusive there and you must look closer).
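As an illustration, here is a short sympy sketch of the test on the hypothetical function $f(x) = x^3 - 3x$ (not from the original text), which has a maximum at $x = -1$ and a minimum at $x = 1$:

```python
# Classify the critical points of f(x) = x**3 - 3*x with the second derivative test.
import sympy as sp

x = sp.symbols('x')
f = x**3 - 3*x

fp = sp.diff(f, x)        # f'(x)  = 3*x**2 - 3
fpp = sp.diff(f, x, 2)    # f''(x) = 6*x

for c in sp.solve(fp, x):             # critical points: x = -1 and x = 1
    curvature = fpp.subs(x, c)
    kind = "minimum" if curvature > 0 else ("maximum" if curvature < 0 else "inconclusive")
    print(c, kind)        # -1 maximum, 1 minimum
```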
Higher-order derivatives appear in Taylor series, in physics (jerk is the third derivative of position), and in numerical analysis (Newton's method uses both first and second derivatives).
Applications
Physics
The most direct application. Newton's second law: $F = ma$, where $a = \frac{d^2 x}{dt^2}$ is the second derivative of position with respect to time. Any equation of motion is a differential equation involving derivatives of position with respect to time.
Economics
Marginal quantities. If $C(q)$ is the total cost of producing quantity $q$, then $C'(q)$ is the marginal cost — the cost of producing one more unit at the current production level. Marginal revenue, marginal utility, marginal product of labor — all are derivatives.
Optimization
To find the maximum or minimum of a function, set the derivative equal to zero and solve. Then use the second derivative to determine whether the critical point is a maximum, minimum, or saddle point.
Example 7: A box with no top is to be made from a square piece of cardboard 12 inches on a side by cutting squares of side $x$ from each corner and folding up the sides. What value of $x$ maximizes the volume?

The volume is $V(x) = x(12 - 2x)^2 = 4x^3 - 48x^2 + 144x$. Set $V'(x) = 0$:

$$V'(x) = 12x^2 - 96x + 144 = 12(x - 6)(x - 2)$$

Setting this to zero gives $x = 6$ or $x = 2$. The first makes the box have zero volume; the second gives the maximum. So $x = 2$ inches.
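A short sympy sketch confirms the computation:

```python
# Verify Example 7: maximize V(x) = x*(12 - 2*x)**2 for 0 < x < 6.
import sympy as sp

x = sp.symbols('x', positive=True)
V = x * (12 - 2*x)**2

critical = sp.solve(sp.diff(V, x), x)   # [2, 6]
print(critical)
print(V.subs(x, 2))                     # 128 cubic inches, the maximum volume
```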
Machine learning
Gradient descent. The optimization algorithm that trains nearly every neural network. Given a loss function $L(\theta)$ as a function of model parameters $\theta$, gradient descent updates the parameters by

$$\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$$

where $\eta$ is a learning rate and $\nabla_\theta L$ is the gradient — the vector of partial derivatives of $L$ with respect to each parameter.
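A minimal sketch of the update loop in Python, on a made-up one-parameter loss $L(\theta) = (\theta - 4)^2$ (the learning rate and step count are illustrative):

```python
# Gradient descent on the toy loss L(theta) = (theta - 4)**2.

def loss_grad(theta):
    # dL/dtheta for L(theta) = (theta - 4)**2
    return 2 * (theta - 4)

theta = 0.0   # initial parameter
eta = 0.1     # learning rate
for step in range(50):
    theta -= eta * loss_grad(theta)   # theta <- theta - eta * dL/dtheta

print(theta)  # converges toward the minimizer theta = 4
```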
Backpropagation. The algorithm that computes the gradient of the loss with respect to every parameter in a neural network. It is the chain rule of calculus applied recursively, layer by layer, from the output back to the input. Every parameter update in deep learning is a chain rule computation.
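To make that concrete, here is a hand-rolled sketch for a hypothetical one-neuron network $\hat{y} = \tanh(wx + b)$ with squared loss (all numbers made up for illustration); every backward line is one factor of the chain rule:

```python
# Backpropagation in miniature: y_hat = tanh(w*x + b), loss = (y_hat - target)**2.
import math

x, target = 1.5, 0.8      # one training example
w, b = 0.3, -0.1          # parameters

# Forward pass, saving intermediates.
z = w * x + b             # pre-activation
y_hat = math.tanh(z)      # prediction
loss = (y_hat - target) ** 2

# Backward pass: chain rule, one factor per step, output back to input.
dloss_dyhat = 2 * (y_hat - target)
dyhat_dz = 1 - y_hat ** 2            # derivative of tanh at z
dloss_dz = dloss_dyhat * dyhat_dz
dloss_dw = dloss_dz * x              # dz/dw = x
dloss_db = dloss_dz * 1              # dz/db = 1

print(dloss_dw, dloss_db)            # the gradients a framework would compute
```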
Common mistakes
Mistake 1 — Confusing $f'(g(x))$ with $f'(x)$ in the chain rule. When you compute $\frac{d}{dx} f(g(x))$, the rule is $f'(g(x)) \cdot g'(x)$, not $f'(x) \cdot g'(x)$. The outer derivative is evaluated at $g(x)$, not at $x$.
Mistake 2 — Forgetting the product rule when both factors involve $x$. $(f(x)\,g(x))'$ is not $f'(x)\,g'(x)$. You have to use the product rule: $(fg)' = f'g + fg'$.
Mistake 3 — Treating $\frac{dy}{dx}$ as a fraction. It is not a fraction in the strict sense. You cannot generally cancel a $dx$ in the denominator with a $dx$ elsewhere. (There is a more advanced treatment in differential forms, but for the level here, treat $\frac{dy}{dx}$ as a single notation.)
Mistake 4 — Computing $\frac{d}{dx} x^x$ with the power rule. The power rule applies to $x^n$ where $n$ is a constant. For $x^x$, you have to take the log first: $\ln y = x \ln x$, then differentiate implicitly, which gives $y' = x^x (\ln x + 1)$.
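A quick finite-difference check of that result (the point $x = 2$ is arbitrary):

```python
# Check d/dx x**x = x**x * (ln x + 1) against a finite difference at x = 2.
import math

x, h = 2.0, 1e-6
numeric = ((x + h) ** (x + h) - x ** x) / h
formula = x ** x * (math.log(x) + 1)
print(numeric, formula)   # both roughly 6.7726
```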
Frequently asked
- What is the geometric interpretation of the derivative?
- The slope of the tangent line at a point. Equivalently, the instantaneous rate of change. Both pictures lead to the same calculation.
- Can a function be continuous but not differentiable?
- Yes. The absolute value function $f(x) = |x|$ is continuous everywhere but not differentiable at $x = 0$ (the slope jumps from $-1$ to $+1$). More dramatically: Weierstrass constructed a function that is continuous everywhere and differentiable nowhere — its graph is a fractal.
- What is the relationship between derivatives and limits?
- The derivative is defined as a limit. Everything in differential calculus is built on the limit concept. Detail: [Limits and Continuity](/library/limits-and-continuity).
- Why does the chain rule work?
- Intuitively: if $y$ changes by an amount $\Delta y$ when $x$ changes by $\Delta x$, and $y$ changes by $\Delta y / \Delta u$ per unit of $u$, and $u$ changes by $\Delta u / \Delta x$ per unit of $x$, then $\Delta y / \Delta x = (\Delta y / \Delta u) \cdot (\Delta u / \Delta x)$. Taking limits gives the chain rule. The rigorous proof handles edge cases where $\Delta u = 0$.
- What is the difference between $\frac{dy}{dx}$ and $f'(x)$ notation?
- Both notations mean the same thing — the derivative of $y$ (or $f$) with respect to $x$. The Leibniz notation $\frac{dy}{dx}$ makes the variable of differentiation explicit and is useful in differential equations and chain rule arguments. The Lagrange notation $f'(x)$ is more compact and is preferred when context is clear.
- Why do we have so many notations: $f'$, $\dot f$, $\frac{df}{dx}$, $Df$?
- Different fields developed their own. $f'$ (Lagrange) is general; $\frac{df}{dx}$ (Leibniz) is general and explicit; $\dot f$ (Newton) is used in physics for time derivatives; $Df$ (Euler operator) is used in differential operator notation. All mean the same thing.
- Is differentiation harder than integration?
- The opposite — differentiation is much easier. Every elementary function has an elementary derivative. Differentiation is mechanical: apply the rules. Integration is not: many elementary functions have no elementary antiderivative.
Cited works & further reading
- Stewart, J. (2020). *Calculus: Early Transcendentals*, 9th edition. Cengage. Chapters 2–4.
- Spivak, M. (2008). *Calculus*, 4th edition. Publish or Perish.
- 3Blue1Brown. *Essence of Calculus* (YouTube series).
- Paul's Online Math Notes: Derivatives.