Partial Derivatives and the Total Differential
Multivariable Functions
Transition to Several Variables
Real-world problems seldom depend on just one variable. Temperature depends on the coordinates $x, y, z$ and on time $t$. A firm's profit depends on prices, quantities, and costs. The potential energy of a system of particles depends on the positions of all the particles. The analysis of functions of several variables is the natural next step toward understanding the real world, and it is precisely here that fundamentally new phenomena arise: directional derivatives, constrained extrema, multiple integrals.
Partial Derivatives
The partial derivative of $f$ with respect to $x_i$ at point $a$ is the ordinary derivative with respect to $x_i$ when the other variables are held fixed:
$ \frac{\partial f}{\partial x_i} (a) = \lim_{h \to 0} \frac{f(a + h \cdot e_i) - f(a)}{h}, $ where $e_i$ is the $i$-th standard basis vector.
For $f(x, y) = x^2 y + \sin(y)$: $\frac{\partial f}{\partial x} = 2 x y$ (differentiate with respect to $x$, $y$ is a parameter), $\frac{\partial f}{\partial y} = x^2 + \cos(y)$ (differentiate with respect to $y$, $x$ is a parameter).
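A quick numerical sanity check (a minimal Python sketch; the point $(1.3, 0.7)$ is an arbitrary illustrative choice): freezing one variable and taking a central difference quotient in the other reproduces the analytic partials.

```python
import math

# f(x, y) = x^2 * y + sin(y) and its analytic partial derivatives
f = lambda x, y: x**2 * y + math.sin(y)
fx = lambda x, y: 2 * x * y            # df/dx: y treated as a parameter
fy = lambda x, y: x**2 + math.cos(y)   # df/dy: x treated as a parameter

# Central difference quotients approximate each partial by freezing
# the other variable, exactly as in the definition above.
h = 1e-6
x0, y0 = 1.3, 0.7
fx_num = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
fy_num = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)

print(fx_num, fx(x0, y0))  # both close to 1.82
print(fy_num, fy(x0, y0))  # both close to 2.4548
```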
Economic interpretation: if $f(K, L)$ is a production function ($K$ — capital, $L$ — labor), then $\frac{\partial f}{\partial K}$ is the marginal product of capital: how much output increases when capital is increased by one unit, with labor held fixed. For the Cobb–Douglas function $f = A K^\alpha L^\beta$: $\frac{\partial f}{\partial K} = \alpha A K^{\alpha-1} L^\beta = \alpha f/K$. This means the marginal product of capital is proportional to the average product of capital $f/K$.
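The identity $\frac{\partial f}{\partial K} = \alpha f/K$ can be confirmed symbolically; a short sketch assuming sympy is available:

```python
import sympy as sp

K, L, A, alpha, beta = sp.symbols('K L A alpha beta', positive=True)
f = A * K**alpha * L**beta               # Cobb-Douglas production function

# Marginal product of capital and the identity df/dK = alpha * f / K
mpk = sp.diff(f, K)
print(sp.simplify(mpk - alpha * f / K))  # 0: the identity holds
```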
Differentiability
A function $f$ is differentiable at point $a$ if the increment can be written as:
$ \Delta f = \sum_i \frac{\partial f}{\partial x_i} \Delta x_i + o(|\Delta x|). $
This says the increment is approximated by a linear function of the $\Delta x_i$, with an error that is small compared to $|\Delta x|$. Total differential: $df = \sum_i \frac{\partial f}{\partial x_i} dx_i$; it is the best linear approximation to the change in the function.
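A minimal sketch of the defining property, using $f(x, y) = x^2 y + \sin(y)$ from the example above: as the increment shrinks, the gap between $\Delta f$ and $df$ vanishes even after dividing by $|\Delta x|$ (the direction $(1, -2)$ is an arbitrary choice).

```python
import math

# f(x, y) = x^2*y + sin(y); total differential at (x0, y0):
# df = 2*x0*y0*dx + (x0**2 + cos(y0))*dy
f = lambda x, y: x**2 * y + math.sin(y)
x0, y0 = 1.0, 2.0

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    dx, dy = t, -2 * t                       # shrink along a fixed direction
    delta = f(x0 + dx, y0 + dy) - f(x0, y0)  # actual increment Df
    df = 2 * x0 * y0 * dx + (x0**2 + math.cos(y0)) * dy  # linear part
    norm = math.hypot(dx, dy)
    print(t, (delta - df) / norm)            # remainder / |Dx| -> 0
```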
Important: Differentiability implies both continuity and the existence of partial derivatives. But neither converse holds! Classic counterexample: $f(x,y) = xy/(x^2 + y^2)$ for $(x,y) \neq (0,0)$ and $f(0,0) = 0$. The partial derivatives exist at the origin (both equal zero), but the function is discontinuous there (the limit along $y = x$ equals $1/2 \neq 0$). Thus, to establish differentiability, one must check the condition on the remainder $o(|\Delta x|)$, not just the existence of partial derivatives.
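A short numerical illustration of the counterexample (a sketch; the sample points are arbitrary):

```python
# The counterexample f(x, y) = xy/(x^2 + y^2), f(0, 0) = 0.
f = lambda x, y: 0.0 if x == y == 0 else x * y / (x**2 + y**2)

# Both partial derivatives at the origin exist and equal 0:
# the difference quotients along the axes are identically zero.
h = 1e-8
print((f(h, 0) - f(0, 0)) / h)  # 0.0
print((f(0, h) - f(0, 0)) / h)  # 0.0

# Yet along the line y = x the function is constant 1/2, so it is
# discontinuous (hence not differentiable) at the origin.
for t in [1e-1, 1e-3, 1e-6]:
    print(f(t, t))  # 0.5 every time
```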
Gradient and Directional Derivative
Gradient: $\nabla f = (\frac{\partial f}{\partial x_1}, ..., \frac{\partial f}{\partial x_n})$ — a vector in $\mathbb{R}^n$ pointing in the direction of maximal increase of the function.
Directional derivative along $l$ ($|l| = 1$): $D_l f = \nabla f \cdot l = |\nabla f| \cos \theta$, where $\theta$ is the angle between $\nabla f$ and $l$. It is maximal in the direction of the gradient itself ($\theta = 0$) and zero perpendicular to the gradient.
In machine learning: the gradient descent algorithm for minimizing a loss function $L(w)$ updates $w \leftarrow w - \eta \nabla L(w)$, where $\eta$ is the learning rate. At each iteration, a step is taken in the direction of the negative gradient, the direction of fastest decrease. This is the foundation of neural network training.
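A minimal gradient descent sketch, using the quadratic $f(x, y) = x^2 + 4y^2$ from the next example as a stand-in loss; the learning rate and starting point are illustrative choices:

```python
import numpy as np

# Gradient of f(x, y) = x^2 + 4y^2
grad = lambda w: np.array([2 * w[0], 8 * w[1]])

w = np.array([3.0, 1.0])       # starting point
eta = 0.1                      # learning rate
for step in range(50):
    w = w - eta * grad(w)      # step against the gradient
print(w)                       # close to (0, 0), the minimizer
```

In real training the gradient is obtained by automatic differentiation rather than written out by hand, but the update rule is exactly this one.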
Example: $f(x, y) = x^2 + 4y^2$, $\nabla f = (2x, 8y)$. At point $(3, 1)$: $\nabla f = (6, 8)$, $|\nabla f| = 10$. Directional derivative along $l = (3/5, 4/5)$: $D_l f = 6 \cdot 3/5 + 8 \cdot 4/5 = 18/5 + 32/5 = 10$.
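Note that here $l = (3/5, 4/5) = \nabla f/|\nabla f|$, so the directional derivative attains its maximum possible value $|\nabla f| = 10$. A two-line check:

```python
import numpy as np

# Verify the worked example: f(x, y) = x^2 + 4y^2 at (3, 1).
grad = np.array([6.0, 8.0])    # grad f(3, 1) = (2x, 8y)
l = np.array([3 / 5, 4 / 5])   # unit direction, equal to grad/|grad|

print(grad @ l)                # 10.0 = |grad f|
```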
Theorem on Mixed Partial Derivatives
If $f$ and all its first and second order partial derivatives are continuous in a neighborhood of a point, then the order of differentiation does not matter:
$ \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x} $
(Schwarz's theorem).
This property is often used “in reverse”: if $\frac{\partial^2 f}{\partial x \partial y} \neq \frac{\partial^2 f}{\partial y \partial x}$ at some point, then at least one of the mixed second derivatives fails to exist there or is not continuous there.
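For a smooth function the equality is easy to confirm symbolically (a sketch assuming sympy; the function is the earlier $f(x, y) = x^2 y + \sin(y)$):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)

fxy = sp.diff(f, x, y)         # differentiate in x, then in y
fyx = sp.diff(f, y, x)         # differentiate in y, then in x
print(sp.simplify(fxy - fyx))  # 0: the mixed partials agree (both 2*x)
```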
Hessian Matrix and Sufficient Condition for Extremum
Hessian matrix $H = \left( \frac{\partial^2 f}{\partial x_i \partial x_j} \right)$ — a symmetric matrix of second derivatives.
Necessary condition for an extremum: at an interior extremum point, $\nabla f = 0$ (a stationary point).
Sufficient condition (via the Hessian at the stationary point):
- $H$ is positive definite (all eigenvalues $> 0$) $\rightarrow$ local minimum.
- $H$ is negative definite (all eigenvalues $< 0$) $\rightarrow$ local maximum.
- $H$ is indefinite (some positive, some negative) $\rightarrow$ saddle point (not extremum).
Example: $f(x, y) = x^3 - 3xy^2 + y^4$. Stationary points: $\frac{\partial f}{\partial x} = 3x^2 - 3y^2 = 0$, $\frac{\partial f}{\partial y} = -6xy + 4y^3 = 0$. From the first: $x = \pm y$. Substituting into the second: for $x = y$, $2y^2(2y - 3) = 0$ gives $(0, 0)$ and $(3/2, 3/2)$; for $x = -y$, $2y^2(2y + 3) = 0$ additionally gives $(3/2, -3/2)$. At $(0, 0)$ the Hessian is the zero matrix: degenerate rather than indefinite, so the test is inconclusive and a finer analysis is needed; along $y = 0$ the function behaves like $x^3$, so $(0, 0)$ is not an extremum.
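The eigenvalue test can be automated numerically (a sketch assuming numpy; the Hessian entries $f_{xx} = 6x$, $f_{xy} = -6y$, $f_{yy} = -6x + 12y^2$ follow by direct differentiation):

```python
import numpy as np

# Hessian of f(x, y) = x^3 - 3xy^2 + y^4:
# f_xx = 6x, f_xy = -6y, f_yy = -6x + 12y^2.
H = lambda x, y: np.array([[6 * x,  -6 * y],
                           [-6 * y, -6 * x + 12 * y**2]])

for p in [(0.0, 0.0), (1.5, 1.5), (1.5, -1.5)]:   # the stationary points
    print(p, np.linalg.eigvalsh(H(*p)))
# (0, 0): eigenvalues (0, 0) -> test inconclusive (degenerate).
# (1.5, +/-1.5): both eigenvalues positive -> local minima.
```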
Constrained Extrema: the Method of Lagrange Multipliers
Find the extremum of $f(x)$ under constraint $g(x) = 0$.
Introduce the Lagrangian function: $L(x, \lambda) = f(x) − \lambda g(x)$. Necessary condition for extremum: $\nabla L = 0$, that is, $\nabla f = \lambda \nabla g$. This means: at the point of constrained extremum, the gradients of $f$ and $g$ are parallel—the level curves of $f$ touch the constraint $g = 0$.
The parameter $\lambda$ is the Lagrange multiplier: it shows how the optimal value of $f$ changes under a small relaxation of the constraint from $g = 0$ to $g = \varepsilon$. In economics, $\lambda$ is the marginal value of the resource (how the maximum profit changes if the budget increases by one unit).
Expanded example: Maximize $f(x, y) = xy$ under constraint $g = 2x + 3y − 6 = 0$.
$L = xy − \lambda (2x + 3y − 6)$. Conditions: $\frac{\partial L}{\partial x} = y − 2\lambda = 0$; $\frac{\partial L}{\partial y} = x − 3\lambda = 0$; $g = 0$.
From the first two: $y = 2\lambda$, $x = 3\lambda$. Substituting into the constraint: $2(3\lambda) + 3(2\lambda) = 6 \rightarrow 12\lambda = 6 \rightarrow \lambda = 1/2$. Point: $x = 3/2$, $y = 1$, with $f(3/2, 1) = 3/2$. Check: at the optimum the two terms of the constraint contribute equally ($2x = 3y = 3$); geometrically, the level curve of $f = xy$ is tangent to the budget line there.
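The same stationarity system $\nabla L = 0$ can be handed to a computer algebra system (a sketch assuming sympy):

```python
import sympy as sp

# Maximize xy subject to 2x + 3y = 6 via the Lagrangian.
x, y, lam = sp.symbols('x y lam')
L = x * y - lam * (2 * x + 3 * y - 6)       # Lagrangian

eqs = [sp.diff(L, v) for v in (x, y, lam)]  # grad L = 0
print(sp.solve(eqs, [x, y, lam]))           # {x: 3/2, y: 1, lam: 1/2}
```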
The Lagrange method underlies the theory of the economic optimum (utility maximization under a budget constraint), engineering optimization problems, and, in generalized form, the Euler–Lagrange equations of the calculus of variations.