Module II·Article I·~5 min read
Lagrangian Duality and Slater's Theorem
Duality and Optimality Conditions
Why is duality needed?
Sometimes it is difficult to solve an optimization problem directly, but there exists a “reformulation” that is easier to solve. Lagrangian duality is a systematic way to construct such a dual problem. It turns out that every minimization problem has a “dual maximization problem” whose optimal value never exceeds that of the original, and under certain conditions (Slater's theorem) the two values coincide. This opens up the possibility of solving a difficult primal problem through a more convenient dual one.
The primal problem and the Lagrangian
Let us consider the general optimization problem:
min f(x) subject to: gᵢ(x) ≤ 0, i = 1,...,m and hⱼ(x) = 0, j = 1,...,p
Here f is the objective function, the gᵢ are inequality constraints, and the hⱼ are equality constraints. The optimal value is denoted p*.
The Lagrangian: we “relax” the constraints by moving them into the objective function with penalties λᵢ and νⱼ:
L(x, λ, ν) = f(x) + Σᵢ λᵢ gᵢ(x) + Σⱼ νⱼ hⱼ(x)
where λᵢ ≥ 0 are the “dual variables” (Lagrange multipliers) for the inequality constraints, and νⱼ are the multipliers for the equality constraints (unrestricted in sign).
Physical meaning of λᵢ: the price of violating the i-th constraint. If gᵢ(x) > 0 (a violation), we pay λᵢ gᵢ(x) > 0 for it. If λᵢ = 0, there is no penalty; for large λᵢ, violation becomes “expensive”.
Dual function: g(λ, ν) = inf_{x} L(x, λ, ν) — the infimum of the Lagrangian over x for fixed multipliers.
An important property: g is always concave (as the pointwise infimum of a family of functions affine in (λ, ν)), even when the primal problem is not convex.
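Both properties are easy to check numerically. Below is a minimal sketch on a toy problem not taken from the article: min x² subject to 1 − x ≤ 0. Minimizing the Lagrangian x² + λ(1 − x) over x gives x = λ/2, so the dual function has the closed form g(λ) = λ − λ²/4, which is concave.

```python
import numpy as np

# Toy problem (illustration, not from the article): min x^2  s.t.  1 - x <= 0.
# Lagrangian: L(x, lam) = x^2 + lam*(1 - x); the inner minimization over x
# gives x = lam/2, so g(lam) = lam - lam^2/4 in closed form.
def dual(lam):
    x = np.linspace(-5, 5, 20001)        # dense grid approximates inf over x
    return np.min(x**2 + lam * (1 - x))

lams = np.linspace(0, 4, 41)
g = np.array([dual(l) for l in lams])

# Matches the closed form ...
assert np.allclose(g, lams - lams**2 / 4, atol=1e-4)
# ... and is concave: second finite differences are non-positive.
assert np.all(np.diff(g, 2) <= 1e-6)
```

The grid-based infimum is only an approximation, but on this problem the minimizer x = λ/2 always lies inside the grid, so the closed form is recovered to high accuracy.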
Weak duality
Theorem (weak duality): for any λ ≥ 0 and any ν it holds that:
g(λ, ν) ≤ p*
Proof: let x* be an optimal feasible solution of the primal problem. Then:
g(λ, ν) = inf_x L(x, λ, ν) ≤ L(x*, λ, ν) = f(x*) + Σ λᵢ gᵢ(x*) + Σ νⱼ hⱼ(x*)
Since x* is feasible: gᵢ(x*) ≤ 0 and λᵢ ≥ 0 → λᵢ gᵢ(x*) ≤ 0; hⱼ(x*) = 0. Thus the right-hand side is ≤ f(x*) = p*.
Corollary: maximizing g(λ, ν) over λ ≥ 0, ν gives the best lower bound for p*.
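The bound can be sanity-checked numerically. A minimal sketch, again on the toy problem min x² s.t. 1 − x ≤ 0 (not from the article), whose optimal value is p* = 1 at x* = 1: for every λ ≥ 0 the dual value stays below p*.

```python
import numpy as np

# Weak duality check on a toy problem: min x^2 s.t. 1 - x <= 0, p* = 1 at x* = 1.
p_star = 1.0
x = np.linspace(-5, 5, 20001)           # grid for the inner infimum over x

rng = np.random.default_rng(0)
for lam in rng.uniform(0, 10, 100):     # any lam >= 0 is dual-feasible
    g_lam = np.min(x**2 + lam * (1 - x))   # dual function value g(lam)
    assert g_lam <= p_star + 1e-9          # g(lam) <= p* for every sampled lam
```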
The dual problem and Slater's theorem
The dual problem: max_{λ ≥ 0, ν} g(λ, ν).
The optimum of the dual problem: d* ≤ p* (weak duality). Duality gap: p* − d* ≥ 0.
Slater’s theorem: if the problem is convex (f, gᵢ are convex, hⱼ are affine) and there exists a strictly feasible point x̃ such that gᵢ(x̃) < 0 strictly for all i (and hⱼ(x̃) = 0), then:
d* = p* (strong duality: the duality gap is zero)
and the dual optimum d* is attained.
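A minimal numerical sketch on a toy convex problem (not from the article): min x² s.t. 1 − x ≤ 0. Slater's condition holds (e.g. x̃ = 2 gives 1 − 2 < 0), and minimizing the Lagrangian over x yields the closed-form dual function g(λ) = λ − λ²/4, so the gap can be checked directly.

```python
import numpy as np

# Strong duality on a toy convex problem: min x^2 s.t. 1 - x <= 0.
# Slater holds (x = 2 is strictly feasible), so d* should equal p*.
p_star = 1.0                        # primal optimum, attained at x* = 1
lams = np.linspace(0, 10, 100001)
g = lams - lams**2 / 4              # closed-form dual function of this toy problem
d_star = g.max()                    # dual optimum, attained at lam* = 2

assert abs(d_star - p_star) < 1e-8  # zero duality gap
```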
KKT (Karush-Kuhn-Tucker) conditions
With strong duality, x*, λ*, ν* are optimal if and only if the four KKT conditions hold:
1. Primal feasibility: gᵢ(x*) ≤ 0, hⱼ(x*) = 0
2. Dual feasibility: λᵢ* ≥ 0
3. Complementary slackness: λᵢ* gᵢ(x*) = 0 for all i
4. Stationarity: ∇f(x*) + Σᵢ λᵢ* ∇gᵢ(x*) + Σⱼ νⱼ* ∇hⱼ(x*) = 0
Interpretation of condition 3: either the constraint is active (gᵢ(x*) = 0), or its price is zero (λᵢ* = 0). An inactive constraint cannot carry a nonzero price.
Economic interpretation
The dual variables λᵢ* are shadow prices of resources. If we relax the i-th constraint gᵢ(x) ≤ 0 by ε (making it ≤ ε), then the optimal value changes approximately by −λᵢ* ε. Thus, λᵢ* shows how valuable “a little more” of the i-th resource is.
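This sensitivity can be verified numerically. A minimal sketch on a toy problem not from the article: min (x − 1)² s.t. x ≤ 0, where stationarity 2(x* − 1) + λ* = 0 at x* = 0 gives λ* = 2; relaxing the constraint to x ≤ ε should improve the optimum by about λ*ε.

```python
import numpy as np

# Shadow-price check on a toy problem: min (x - 1)^2 s.t. x <= eps.
# At eps = 0 the optimum is x* = 0 with KKT multiplier lam* = 2.
def p_star(eps):
    x = np.linspace(-3, eps, 60001)     # grid over the feasible set x <= eps
    return np.min((x - 1) ** 2)

lam_star = 2.0
eps = 1e-4
# Relaxing the constraint by eps lowers the optimum by about lam* * eps:
delta = p_star(eps) - p_star(0.0)
assert abs(delta - (-lam_star * eps)) < 1e-6
```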
Full analysis of an example
Problem: min (x₁ − 1)² + (x₂ − 1)² subject to x₁ + x₂ ≤ 1, x₁ ≥ 0, x₂ ≥ 0.
Geometrically: the point nearest to (1, 1) in the triangle {x₁ + x₂ ≤ 1, x₁ ≥ 0, x₂ ≥ 0}.
Slater’s condition: x̃ = (0.1, 0.1) is strictly feasible: 0.1 + 0.1 = 0.2 < 1, 0.1 > 0. Strong duality holds.
KKT: Stationarity: 2(x₁* − 1) + λ* − μ₁* = 0, 2(x₂* − 1) + λ* − μ₂* = 0, where λ* is the multiplier for x₁+x₂ ≤ 1, μ₁*, μ₂* for x₁ ≥ 0, x₂ ≥ 0.
By the symmetry of the problem, x₁* = x₂*. Check the point x* = (1/2, 1/2): it lies on the boundary x₁ + x₂ = 1 (the constraint is active, λ* > 0), while x₁*, x₂* > 0, so complementary slackness forces μ₁* = μ₂* = 0. Stationarity then gives 2(1/2 − 1) + λ* = 0 → λ* = 1. Complementary slackness: λ*(1/2 + 1/2 − 1) = 0 ✓. The answer: x* = (1/2, 1/2), p* = 1/4 + 1/4 = 1/2.
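The analysis above can be checked numerically: a brute-force grid search over the feasible triangle should reproduce p* = 1/2, and the claimed multipliers should satisfy the KKT conditions exactly.

```python
import numpy as np

# Worked example: min (x1-1)^2 + (x2-1)^2  s.t.  x1 + x2 <= 1, x1 >= 0, x2 >= 0.
# Claimed solution: x* = (1/2, 1/2), p* = 1/2, lam* = 1, mu1* = mu2* = 0.
x1, x2 = np.meshgrid(np.linspace(0, 1, 1001), np.linspace(0, 1, 1001))
feasible = x1 + x2 <= 1
obj = (x1 - 1) ** 2 + (x2 - 1) ** 2
p_star = obj[feasible].min()
assert abs(p_star - 0.5) < 1e-6          # brute-force optimum matches p* = 1/2

# KKT check at x* = (0.5, 0.5) with lam* = 1, mu1* = mu2* = 0:
x_s, lam, mu = np.array([0.5, 0.5]), 1.0, np.array([0.0, 0.0])
grad = 2 * (x_s - 1) + lam - mu          # stationarity of the Lagrangian
assert np.allclose(grad, 0)
assert lam * (x_s.sum() - 1) == 0        # complementary slackness
```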
Geometric interpretation of duality
Lagrangian duality has a profound geometric meaning. The primal problem seeks the lowest point of a convex body (the epigraph of the function). The dual problem describes this body through the set of all supporting hyperplanes. Each hyperplane provides a lower bound for the function; the supremum over all bounds gives the exact value at the optimum (under the Slater conditions).
Saddle points of the Lagrangian
A pair (x*, λ*) is a saddle point of the Lagrangian L(x, λ) = f(x) + Σλᵢgᵢ(x) if L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*) for all x and all λ ≥ 0. When strong duality holds, a saddle point exists, with x* the optimum of the primal problem and λ* the optimum of the dual. This gives a practical method: the Uzawa method iteratively updates x by gradient descent on L and λ by gradient ascent, converging to a saddle point.
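A minimal sketch of the Uzawa iteration on the example from this article, keeping only the coupling constraint x₁ + x₂ ≤ 1 (the bounds x₁, x₂ ≥ 0 are inactive at the optimum, so dropping them here is harmless). The step size α = 0.5 is an assumed choice, not prescribed by the article.

```python
import numpy as np

# Uzawa iteration for: min (x1-1)^2 + (x2-1)^2  s.t.  x1 + x2 <= 1.
# L(x, lam) = (x1-1)^2 + (x2-1)^2 + lam*(x1 + x2 - 1).
alpha = 0.5    # dual step size (assumed; any alpha in (0, 1] converges here)
lam = 0.0
for _ in range(100):
    x = np.array([1 - lam / 2, 1 - lam / 2])     # argmin_x L(x, lam), closed form
    lam = max(0.0, lam + alpha * (x.sum() - 1))  # projected gradient ascent on g

assert abs(lam - 1.0) < 1e-9         # converges to lam* = 1
assert np.allclose(x, [0.5, 0.5])    # and to x* = (1/2, 1/2)
```

Here the x-update is exact because minimizing L over x has a closed form; in general one would run a few gradient-descent steps on L(·, λ) instead.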
Applications of duality
- Support Vector Machines (SVM): the dual problem has one variable per training example, its solution is sparse (nonzero only for the support vectors), and it permits the kernel trick
- Benders decomposition in large-scale optimization: decomposition into “easy” and “hard” parts via duality
- Distributed optimization (ADMM): dual variables serve as the “coordinator” among parallel solvers
- Sensitivity analysis: dual variable λᵢ is the “shadow price” of a constraint, showing how much the optimum improves if the constraint is relaxed by one unit