Module II·Article I·~5 min read
Lagrangian Duality and Slater's Theorem
Duality and Optimality Conditions
Why is duality needed?
Sometimes it is difficult to solve an optimization problem directly, but there exists a “reformulation” that is easier to solve. Lagrangian duality is a systematic way to construct such a dual problem. It turns out that every minimization problem has a “dual maximization problem” whose optimal value never exceeds that of the original, and under certain conditions (Slater's theorem) the two values coincide. This opens up the possibility of solving a difficult primal problem through a more convenient dual one.
The primal problem and the Lagrangian
Let us consider the general optimization problem:
min f(x) subject to: gᵢ(x) ≤ 0, i = 1,...,m and hⱼ(x) = 0, j = 1,...,p
Here f is the objective function, the gᵢ are inequality constraints, and the hⱼ are equality constraints. The optimal value is denoted p*.
The Lagrangian: we “relax” the constraints by moving them into the objective function with penalties λᵢ and νⱼ:
L(x, λ, ν) = f(x) + Σᵢ λᵢ gᵢ(x) + Σⱼ νⱼ hⱼ(x)
where λᵢ ≥ 0 are the “dual variables” (Lagrange multipliers) for the inequality constraints, and νⱼ are the multipliers for the equality constraints (unrestricted in sign).
Physical meaning of λᵢ: the price of violating the i-th constraint. If gᵢ(x) > 0 (a violation), we pay λᵢ gᵢ(x) > 0 for it. If λᵢ = 0, there is no penalty; for large λᵢ, violation becomes “expensive”.
Dual function: g(λ, ν) = inf_{x} L(x, λ, ν) — the infimum of the Lagrangian over x for fixed multipliers.
An important property: g is always concave (as the pointwise infimum of a family of functions affine in (λ, ν)), even when the primal problem is not convex.
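Both properties are easy to check numerically. Below is a minimal sketch on a toy problem not taken from the article: min x² subject to 1 − x ≤ 0. Minimizing the Lagrangian x² + λ(1 − x) over x gives x = λ/2, so the dual function has the closed form g(λ) = λ − λ²/4, which is concave.

```python
import numpy as np

# Toy problem (illustration, not from the article): min x^2  s.t.  1 - x <= 0.
# Lagrangian: L(x, lam) = x^2 + lam*(1 - x); the inner minimization over x
# gives x = lam/2, so g(lam) = lam - lam^2/4 in closed form.
def dual(lam):
    x = np.linspace(-5, 5, 20001)        # dense grid approximates inf over x
    return np.min(x**2 + lam * (1 - x))

lams = np.linspace(0, 4, 41)
g = np.array([dual(l) for l in lams])

# Matches the closed form ...
assert np.allclose(g, lams - lams**2 / 4, atol=1e-4)
# ... and is concave: second finite differences are non-positive.
assert np.all(np.diff(g, 2) <= 1e-6)
```

The grid-based infimum is only an approximation, but on this problem the minimizer x = λ/2 always lies inside the grid, so the closed form is recovered to high accuracy.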
Weak duality
Theorem (weak duality): for any λ ≥ 0 and any ν it holds that:
g(λ, ν) ≤ p*
Proof: let x* be an optimal feasible solution of the primal problem. Then:
g(λ, ν) = inf_x L(x, λ, ν) ≤ L(x*, λ, ν) = f(x*) + Σ λᵢ gᵢ(x*) + Σ νⱼ hⱼ(x*)
Since x* is feasible: gᵢ(x*) ≤ 0 and λᵢ ≥ 0 → λᵢ gᵢ(x*) ≤ 0; hⱼ(x*) = 0. Thus the right-hand side is ≤ f(x*) = p*.
Corollary: maximizing g(λ, ν) over λ ≥ 0, ν gives the best lower bound for p*.
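The bound can be sanity-checked numerically. A minimal sketch, again on the toy problem min x² s.t. 1 − x ≤ 0 (not from the article), whose optimal value is p* = 1 at x* = 1: for every λ ≥ 0 the dual value stays below p*.

```python
import numpy as np

# Weak duality check on a toy problem: min x^2 s.t. 1 - x <= 0, p* = 1 at x* = 1.
p_star = 1.0
x = np.linspace(-5, 5, 20001)           # grid for the inner infimum over x

rng = np.random.default_rng(0)
for lam in rng.uniform(0, 10, 100):     # any lam >= 0 is dual-feasible
    g_lam = np.min(x**2 + lam * (1 - x))   # dual function value g(lam)
    assert g_lam <= p_star + 1e-9          # g(lam) <= p* for every sampled lam
```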
The dual problem and Slater's theorem
The dual problem: max_{λ ≥ 0, ν} g(λ, ν).
The optimum of the dual problem: d* ≤ p* (weak duality). Duality gap: p* − d* ≥ 0.
Slater’s theorem: if the problem is convex (f, gᵢ are convex, hⱼ are affine) and there exists a strictly feasible point x̃ such that gᵢ(x̃) < 0 strictly for all i (and hⱼ(x̃) = 0), then:
d* = p* (strong duality: the duality gap is zero)
and the dual optimum d* is attained.
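A minimal numerical sketch on a toy convex problem (not from the article): min x² s.t. 1 − x ≤ 0. Slater's condition holds (e.g. x̃ = 2 gives 1 − 2 < 0), and minimizing the Lagrangian over x yields the closed-form dual function g(λ) = λ − λ²/4, so the gap can be checked directly.

```python
import numpy as np

# Strong duality on a toy convex problem: min x^2 s.t. 1 - x <= 0.
# Slater holds (x = 2 is strictly feasible), so d* should equal p*.
p_star = 1.0                        # primal optimum, attained at x* = 1
lams = np.linspace(0, 10, 100001)
g = lams - lams**2 / 4              # closed-form dual function of this toy problem
d_star = g.max()                    # dual optimum, attained at lam* = 2

assert abs(d_star - p_star) < 1e-8  # zero duality gap
```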
KKT (Karush-Kuhn-Tucker) conditions
With strong duality, x*, λ*, ν* are optimal if and only if the four KKT conditions hold:
1. Primal feasibility: gᵢ(x*) ≤ 0, hⱼ(x*) = 0
2. Dual feasibility: λᵢ* ≥ 0
3. Complementary slackness: λᵢ* gᵢ(x*) = 0 for all i
4. Stationarity: ∇f(x*) + Σᵢ λᵢ* ∇gᵢ(x*) + Σⱼ νⱼ* ∇hⱼ(x*) = 0
Interpretation of condition 3: either the constraint is active (gᵢ(x*) = 0), or its price is zero (λᵢ* = 0). An inactive constraint cannot carry a nonzero price.
Economic interpretation
The dual variables λᵢ* are shadow prices of resources. If we relax the i-th constraint gᵢ(x) ≤ 0 by ε (making it ≤ ε), then the optimal value changes approximately by −λᵢ* ε. Thus, λᵢ* shows how valuable “a little more” of the i-th resource is.
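This sensitivity can be verified numerically. A minimal sketch on a toy problem not from the article: min (x − 1)² s.t. x ≤ 0, where stationarity 2(x* − 1) + λ* = 0 at x* = 0 gives λ* = 2; relaxing the constraint to x ≤ ε should improve the optimum by about λ*ε.

```python
import numpy as np

# Shadow-price check on a toy problem: min (x - 1)^2 s.t. x <= eps.
# At eps = 0 the optimum is x* = 0 with KKT multiplier lam* = 2.
def p_star(eps):
    x = np.linspace(-3, eps, 60001)     # grid over the feasible set x <= eps
    return np.min((x - 1) ** 2)

lam_star = 2.0
eps = 1e-4
# Relaxing the constraint by eps lowers the optimum by about lam* * eps:
delta = p_star(eps) - p_star(0.0)
assert abs(delta - (-lam_star * eps)) < 1e-6
```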
Full analysis of an example
Problem: min (x₁ − 1)² + (x₂ − 1)² subject to x₁ + x₂ ≤ 1, x₁ ≥ 0, x₂ ≥ 0.
Geometrically: the point nearest to (1, 1) in the triangle {x₁ + x₂ ≤ 1, x₁ ≥ 0, x₂ ≥ 0}.
Slater’s condition: x̃ = (0.1, 0.1) is strictly feasible: 0.1 + 0.1 = 0.2 < 1, 0.1 > 0. Strong duality holds.
KKT: Stationarity: 2(x₁* − 1) + λ* − μ₁* = 0, 2(x₂* − 1) + λ* − μ₂* = 0, where λ* is the multiplier for x₁+x₂ ≤ 1, μ₁*, μ₂* for x₁ ≥ 0, x₂ ≥ 0.
By the symmetry of the problem, x₁* = x₂*. Check the point x* = (1/2, 1/2): it lies on the boundary x₁ + x₂ = 1 (the constraint is active, λ* > 0), while x₁*, x₂* > 0, so complementary slackness forces μ₁* = μ₂* = 0. Stationarity then gives 2(1/2 − 1) + λ* = 0 → λ* = 1. Complementary slackness: λ*(1/2 + 1/2 − 1) = 0 ✓. The answer: x* = (1/2, 1/2), p* = 1/4 + 1/4 = 1/2.
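The analysis above can be checked numerically: a brute-force grid search over the feasible triangle should reproduce p* = 1/2, and the claimed multipliers should satisfy the KKT conditions exactly.

```python
import numpy as np

# Worked example: min (x1-1)^2 + (x2-1)^2  s.t.  x1 + x2 <= 1, x1 >= 0, x2 >= 0.
# Claimed solution: x* = (1/2, 1/2), p* = 1/2, lam* = 1, mu1* = mu2* = 0.
x1, x2 = np.meshgrid(np.linspace(0, 1, 1001), np.linspace(0, 1, 1001))
feasible = x1 + x2 <= 1
obj = (x1 - 1) ** 2 + (x2 - 1) ** 2
p_star = obj[feasible].min()
assert abs(p_star - 0.5) < 1e-6          # brute-force optimum matches p* = 1/2

# KKT check at x* = (0.5, 0.5) with lam* = 1, mu1* = mu2* = 0:
x_s, lam, mu = np.array([0.5, 0.5]), 1.0, np.array([0.0, 0.0])
grad = 2 * (x_s - 1) + lam - mu          # stationarity of the Lagrangian
assert np.allclose(grad, 0)
assert lam * (x_s.sum() - 1) == 0        # complementary slackness
```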
Geometric interpretation of duality
Lagrangian duality has a profound geometric meaning. The primal problem seeks the lowest point of a convex body (the epigraph of the function). The dual problem describes this body through the set of all supporting hyperplanes. Each hyperplane provides a lower bound for the function; the supremum over all bounds gives the exact value at the optimum (under the Slater conditions).
Saddle points of the Lagrangian
A pair (x*, λ*) is a saddle point of the Lagrangian L(x, λ) = f(x) + Σλᵢgᵢ(x) if L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*) for all x and all λ ≥ 0. When strong duality holds, a saddle point exists, with x* the optimum of the primal problem and λ* the optimum of the dual. This gives a practical method: the Uzawa method iteratively updates x by gradient descent on L and λ by gradient ascent, converging to a saddle point.
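A minimal sketch of the Uzawa iteration on the example from this article, keeping only the coupling constraint x₁ + x₂ ≤ 1 (the bounds x₁, x₂ ≥ 0 are inactive at the optimum, so dropping them here is harmless). The step size α = 0.5 is an assumed choice, not prescribed by the article.

```python
import numpy as np

# Uzawa iteration for: min (x1-1)^2 + (x2-1)^2  s.t.  x1 + x2 <= 1.
# L(x, lam) = (x1-1)^2 + (x2-1)^2 + lam*(x1 + x2 - 1).
alpha = 0.5    # dual step size (assumed; any alpha in (0, 1] converges here)
lam = 0.0
for _ in range(100):
    x = np.array([1 - lam / 2, 1 - lam / 2])     # argmin_x L(x, lam), closed form
    lam = max(0.0, lam + alpha * (x.sum() - 1))  # projected gradient ascent on g

assert abs(lam - 1.0) < 1e-9         # converges to lam* = 1
assert np.allclose(x, [0.5, 0.5])    # and to x* = (1/2, 1/2)
```

Here the x-update is exact because minimizing L over x has a closed form; in general one would run a few gradient-descent steps on L(·, λ) instead.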
Applications of duality
- Support Vector Machines (SVM): the dual problem has one variable per training example, its solution is sparse (nonzero only for the support vectors), and it permits the kernel trick
- Benders decomposition in large-scale optimization: decomposition into “easy” and “hard” parts via duality
- Distributed optimization (ADMM): dual variables serve as the “coordinator” among parallel solvers
- Sensitivity analysis: dual variable λᵢ is the “shadow price” of a constraint, showing how much the optimum improves if the constraint is relaxed by one unit