Module II·Article III·~4 min read

Differential Games with a Finite Horizon

Zero-Sum Games and Minimax

Turn this article into a podcast

Pick voices, format, length — AI generates the audio

Terminal Moment: What Happens "at the End"

In problems with a finite horizon [0, T], the system state x(T) occupies a special status: the function g(x(T)) defines the "terminal payoff". This determines the "boundaries"—the sets of initial states x₀ from which P or E can guarantee the desired outcome. Mathematically: the value function is defined from the HJI with terminal condition V(x,T) = g(x) and is "computed backwards" down to t = 0.

Solution Structure via Backward Induction

The principle of optimality gives: V(x,t) = min_u max_v {F(x,u,v,t)dt + V(x+f·dt, t+dt)}.

This is the "Bellman equation" in reverse time: knowing V at t+dt, we compute V at t. We start from T (where V(x,T) = g(x)) and "move backward".

For LQ games: V(x,t) = xᵀP(t)x (quadratic in x), P(t) is the Riccati-Isaacs matrix satisfying an ODE.

LQ Game with Finite Horizon

ẋ = Ax + Bu + Cv, J = xᵀ(T)Gx(T) + ∫₀ᵀ [xᵀQx + uᵀRu − vᵀSv] dt.

Riccati-Isaacs equation:

−Ṗ = AᵀP + PA + Q − PBR⁻¹BᵀP + PCS⁻¹CᵀP, P(T) = G.

(Integrate backward in time from T to 0.)

Optimal strategies (linear feedbacks):

u*(t) = −R⁻¹Bᵀ P(t) x(t) v*(t) = S⁻¹Cᵀ P(t) x(t)

Solvability condition: the Riccati equation must have a solution over all [0, T]. For large T (or large C), the solution may "blow up"—the value of the game is unbounded.

Survival and Capture Zones

For problems with a terminal "target": T = the "target set" (for example, a ball around E).

V(x, T) = 1 if x ∈ T, else −1.

The value function takes values in {−1, +1}: P captured (±1 for P) or E escaped.

Reach-Avoid set: initial conditions x₀ from which P can guarantee capture. The boundary = barrier surface.

Computation: level-set method (Hamilton-Jacobi toolbox). V(x,t) = 0 — boundary. V(x,t) < 0 — P’s zone, V(x,t) > 0 — E’s zone.

Complete Analysis: Two-Dimensional Capture Game

System: ẋP = uP, ẋE = uE, |uP| ≤ αP, |uE| ≤ αE. Capture: |xP − xE| ≤ l.

Change of variable: r = xP − xE (relative position). ṙ = uP − uE.

J = 0 if |r(T)| ≤ l (capture), 1 otherwise (avoidance).

Simplest 1D case (r ∈ ℝ): HJI: ∂V/∂t + min_u max_v [(u−v) ∂V/∂r] = 0.

Value function V(r,t): V = 0 if |r| ≤ l, and V(r,t) = V(|r| − (αP−αE)(T−t), T) for |r| > l.

If αP > αE: P captures E if |r(0)| ≤ l + (αP−αE)T. The capture zone grows with T.

If αP < αE: capture zone is only |r(0)| ≤ l (already captured), E escapes.

Numerical Methods for HJI

Finite difference method: discretization in x on a grid, integration of HJI backwards in t.

Requirements: monotone (upwind) scheme for correct viscosity solution. Number of points: O((1/h)^n)—exponential in dimension!

Curse of dimensionality: for n=4 and step h=0.1: 10⁴ = 10,000 points—feasible. For n=6: 10⁶—hard. For n=10: impossible.

Alternatives:

  • Deep learning for HJI (DeepReach, 2021): neural network approximates V(x,t) → works in high dimensions
  • Linearization: V ≈ xᵀP(t)x (LQ-approximation)
  • PINN (Physics-Informed Neural Networks): train a neural net to satisfy HJI as a physical constraint

Problems with Fixed Time

In games with finite horizon, several types of terminal conditions are important:

  • Fixed T, free x(T): V(x, T) = g(x)—terminal-cost problem
  • Fixed xT: achieve a given state—V(xT, T) = 0, V(x, T) = +∞ for x ≠ xT (degeneracy)
  • Free T (stopping time): T is defined by the first moment when x(T) enters the target set—games with stopping time

Principle of Dynamic Programming

Bellman's principle of optimality for games: V(x, t) = inf_u sup_v {∫_t^{t+δ} F dτ + V(x(t+δ), t+δ)}. This local condition generates the HJI equation and connects values at different moments in time. Solution—a viscosity solution to the HJI with terminal condition V(x, T) = g(x), integrated backward from T to 0.

Reachability and Achievable Zones

For practical safety problems, the optimum is often less relevant than the reachability set—which states x the player can guarantee at time T against the opponent’s worst strategy. This is handled via signed distance: define V(x, T) as the signed distance to the target set and solve HJI backward. The level set {V ≤ 0} is the guaranteed reach zone.

Applications

Differential games with a finite horizon are foundational for safe navigation of quadcopters (Tomlin, Mitchell), maneuvering military aircraft (Air Combat Maneuvering), obstacle avoidance for autonomous transportation, reactor control on a finite time window. Libraries helperOC, hj_reachability implement these algorithms for dimensions up to 5–6.

§ Act · what next