Module II · Article I · ~5 min read

Minimax Theorem and Its Extensions

Zero-Sum Games and Minimax


Fundamental Fact: Both Players "Know" the Optimal Value

In 1928, John von Neumann proved a remarkable fact: every zero-sum matrix game has an "equilibrium." There is a value V such that the maximizer can guarantee at least V no matter what the opponent does, and the minimizer can guarantee that the outcome does not exceed the same V. This V is the "value of the game." The minimax theorem is the mathematical justification for why rational players with opposing interests reach a predictable outcome.

Von Neumann's Minimax Theorem

For a payoff matrix $A \in \mathbb{R}^{m \times n}$:

$ \max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y = \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y $

where $\Delta_k$ is the standard simplex of mixed strategies (a probability distribution over $k$ pure strategies).

Meaning: the maximum "guaranteed" payoff for player 1 = the minimum "guaranteed" loss for player 2. This value is called the value of the game.

Proof via LP: the problem $\max_x \min_y x^\top A y$ is a linear program (introduce the auxiliary variable $v = \min_y x^\top A y = \min_j (A^\top x)_j$ and maximize $v$ over the simplex). The minimizer's problem is exactly the LP dual, so strong duality of LP gives the equality min = max.
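
Written out, the maximizer's problem is the following LP (a standard textbook formulation, not specific to this article):

$ \max_{x,\,v} \; v \quad \text{s.t.} \quad (A^\top x)_j \geq v \;\; (j = 1, \dots, n), \quad \sum_{i=1}^m x_i = 1, \quad x \geq 0 $

Its LP dual is the analogous minimization for the column player, which is exactly where the equality comes from.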

Nash Equilibrium in Matrix Games

Strategy profile $(x^*, y^*)$ is a Nash Equilibrium if:

  • $x^*$ is optimal given $y^*$ (the maximizer cannot gain by deviating): $x^\top A y^* \leq x^{* \top} A y^*$ for all $x \in \Delta_m$
  • $y^*$ is optimal given $x^*$ (the minimizer cannot gain by deviating): $x^{* \top} A y^* \leq x^{* \top} A y$ for all $y \in \Delta_n$

For zero sum: $(x^*, y^*)$ is a NE if and only if $x^{* \top} A y^*$ equals the value of the game.

Algorithm for finding an equilibrium: write both players' problems as LPs (as above). They are dual to each other, and strong duality guarantees that any pair of optimal solutions forms a saddle point.
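
A minimal sketch of this procedure (assuming numpy and scipy are available; the function solve_matrix_game is an illustrative name, not from the article):

```python
# Sketch of the LP approach to zero-sum matrix games.
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Optimal mixed strategy x* and the value of the game for the row
    player maximizing x^T A y."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Decision variables z = (x_1, ..., x_m, v); maximize v  <=>  minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # Guarantee constraints (A^T x)_j >= v  <=>  -(A^T x)_j + v <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # x lies on the probability simplex; v is a free variable.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Matching pennies (the coin game discussed below): value 0, x* = (1/2, 1/2).
x_star, value = solve_matrix_game([[1, -1], [-1, 1]])
print(x_star, value)
```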

Extension to Differential Games

Under Isaacs' condition (a pointwise saddle-point condition on the Hamiltonian):

$ \min_{u(\cdot)} \max_{v(\cdot)} J(x_0; u, v) = \max_{v(\cdot)} \min_{u(\cdot)} J(x_0; u, v) = V(x_0) $

This is the "minimax theorem" in continuous time. Proof (Evans-Souganidis, 1984): via the theory of viscosity solutions of the Hamilton-Jacobi-Isaacs (HJI) equation.
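
For reference, under one common sign convention ($u$ minimizes, $v$ maximizes, dynamics $\dot{x} = f(x, u, v)$, running cost $\ell$, terminal cost $g$), the value function solves the HJI equation, and Isaacs' condition states that the two pointwise Hamiltonians coincide:

$ V_t + \min_{u} \max_{v} \left\{ \nabla_x V \cdot f(x, u, v) + \ell(x, u, v) \right\} = 0, \quad V(x, T) = g(x) $

$ \min_u \max_v \{\cdot\} = \max_v \min_u \{\cdot\} \quad \text{(Isaacs' condition)} $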

Feedback Strategies vs Open-Loop

Important fact: for zero-sum differential games the information structure matters!

Open-loop strategies: $u = u(t)$, $v = v(t)$ (entire control signals fixed before the game starts). Here the order of commitment matters:

$\min_u \max_v J$ and $\max_v \min_u J$ may not coincide!

Feedback strategies: $u = \alpha(x, t)$, $v = \beta(x, t)$. Under Isaacs' condition: $\min \max = \max \min = V(x_0)$. Feedback strategies "bypass" the issue of move order.

Example: matching pennies (coin game). Consider the discrete version: $A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$ (no pure NE!). Mixed equilibrium: $x^* = y^* = (1/2, 1/2)$. Value of the game = 0. This illustrates the theorem: every zero-sum matrix game has a mixed NE.
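
A quick check by direct computation: with $x = (p, 1-p)$ and $y = (q, 1-q)$ the expected payoff factors as

$ x^\top A y = 4pq - 2p - 2q + 1 = (2p - 1)(2q - 1), $

so $p = 1/2$ makes the payoff 0 regardless of $q$, and symmetrically $q = 1/2$ protects the minimizer.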

LQ Differential Games

Linear-quadratic (LQ) game: ideal case for explicit solutions.

$\dot{x} = Ax + Bu + Cv, \quad J = x^\top(T) G x(T) + \int_0^T [x^\top Q x + u^\top R u - v^\top S v ] dt$

(the minus sign before $v^\top S v$: the maximizing player $v$ is penalized for its own control effort)

Solution: optimal strategies are linear:

$u^*(t) = -R^{-1} B^\top P(t) x(t), \quad v^*(t) = S^{-1} C^\top P(t) x(t)$

where $P(t)$ satisfies the Isaacs-Riccati equation (matrix ODE):

$\dot{P} + A^\top P + P A + Q - P B R^{-1} B^\top P + P C S^{-1} C^\top P = 0, \quad P(T) = G$

This ODE is quadratic in $P$ and, unlike the Riccati equation of standard LQ control, its quadratic part is sign-indefinite: the stabilizing term $-P B R^{-1} B^\top P$ comes from the minimizer, the destabilizing term $+P C S^{-1} C^\top P$ from the adversary.
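
Where the equation comes from (a standard derivation sketch): substitute the quadratic ansatz $V(x, t) = x^\top P(t) x$ into the HJI equation and optimize pointwise over $u$ and $v$:

$ x^\top \dot{P} x + \min_{u} \max_{v} \left\{ 2 x^\top P (Ax + Bu + Cv) + x^\top Q x + u^\top R u - v^\top S v \right\} = 0 $

The inner optimization yields exactly $u^* = -R^{-1} B^\top P x$ and $v^* = S^{-1} C^\top P x$; substituting them back and matching quadratic forms in $x$ gives the Isaacs-Riccati equation above.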

Full Solution: 1D LQ Game

Problem: $\dot{x} = u + v$, $x(0) = 1$, $J = x(T)^2 + \int_0^T [u^2 - v^2] dt$ ($T = 1$).

Isaacs-Riccati equation (with $A = 0$, $Q = 0$, $B = C = R = S = 1$): $\dot{P} = P^2 - P^2 = 0 \rightarrow P = \text{const}$. Terminal condition $P(T) = 1$. So $P(t) = 1$ for all $t$!

Optimal strategies: $u^*(t) = -P(t)x(t) = -x(t)$, $v^*(t) = P(t)x(t) = x(t)$.

Closed system: $\dot{x} = u^* + v^* = -x + x = 0 \rightarrow x(t) = x(0) = 1$.

Cost: $J = 1^2 + \int_0^1 [1 - 1]dt = 1$. Value of the game $V(1) = 1$. ✓ (At $P=1$ both players "neutralize" each other)
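
A minimal numerical sketch of this check (my own code, assuming numpy and scipy; the helper names are illustrative): integrate the Isaacs-Riccati ODE backwards from $P(T) = G$ and simulate the closed loop.

```python
# Numerical verification of the 1D LQ game worked out above.
import numpy as np
from scipy.integrate import solve_ivp

A, B, C, Q, R, S, G, T = 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0

def riccati_rhs(t, p):
    # Scalar Isaacs-Riccati: dP/dt = -(2 A P + Q) + P^2 B^2 / R - P^2 C^2 / S.
    return [-(2 * A * p[0] + Q) + (B * p[0]) ** 2 / R - (C * p[0]) ** 2 / S]

# Integrate backwards in time from the terminal condition P(T) = G.
sol_P = solve_ivp(riccati_rhs, (T, 0.0), [G], dense_output=True, rtol=1e-9)
P = lambda t: sol_P.sol(t)[0]
print(P(0.0), P(0.5))                      # both ~1.0, i.e. P(t) = 1

def closed_loop(t, x):
    u = -B / R * P(t) * x[0]               # u*(t) = -R^{-1} B P(t) x(t)
    v =  C / S * P(t) * x[0]               # v*(t) =  S^{-1} C P(t) x(t)
    return [A * x[0] + B * u + C * v]      # running cost u^2 - v^2 is 0 here

traj = solve_ivp(closed_loop, (0.0, T), [1.0], dense_output=True, rtol=1e-9)
x_T = traj.sol(T)[0]
print(x_T, G * x_T ** 2)                   # x(T) ~ 1, so J = V(1) = 1
```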

Saddle Points and Equilibrium

In zero-sum games the key concept is the saddle point: a pair of strategies $(u^*, v^*)$ such that $J(u, v^*) \geq J(u^*, v^*) \geq J(u^*, v)$ for all admissible $u, v$. Player 1 has nothing to gain by deviating with $v^*$ fixed, and vice versa. If a saddle point exists, the value of the game is uniquely determined. Isaacs' condition in differential games is the local (Hamiltonian) version of the saddle point existence condition.
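
Why a saddle point pins down the value (one line of inequalities): since always $\max_v \min_u J \leq \min_u \max_v J$, the saddle inequalities close the chain

$ \min_u \max_v J \leq \max_v J(u^*, v) = J(u^*, v^*) = \min_u J(u, v^*) \leq \max_v \min_u J \leq \min_u \max_v J, $

so all terms are equal and the common value is $J(u^*, v^*)$.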

Algorithms for Finding Equilibria

For matrix games, the saddle point is found via LP in polynomial time. For differential games, in general one must solve the HJI equation numerically. For linear-quadratic (LQ) games the problem reduces to a Riccati equation with a sign-indefinite quadratic term:

$ \frac{d}{dt} P = -A^\top P - P A - Q + P ( B R^{-1} B^\top - \gamma^{-2} D D^\top ) P $

where $D$ is the disturbance input matrix and $\gamma$ is the robustness (disturbance attenuation) parameter. The solution $P$ gives the optimal feedback $u^* = -R^{-1} B^\top P x$. This is the basis of $H_\infty$-control.

Extensions

  • Games with informational asymmetry: Stackelberg, leader-follower, with information delay
  • Games with signals: one player can "bluff" — send false signals (application in cybersecurity)
  • Robust optimization as a game: "nature" as the opponent chooses the worst possible scenario — yields robust solutions for financial portfolios and engineering systems
