Module I · Article I
Introduction to Discrete Optimization: Problems, Models, Complexity
Fundamentals of Discrete Optimization and Combinatorics
Why Is Discrete Optimization Everywhere?
FedEx delivers 15 million parcels per day. The route of each truck is a sequence of stops. How should the routes be assigned so that total mileage is minimized? This is a vehicle routing problem—a discrete optimization problem. Or: an airline must schedule flights, assigning crews to routes so as to reduce costs while observing rest requirements. Such problems arise every day in logistics, finance, manufacturing, and bioinformatics. What makes discrete problems peculiar is that solutions must be integers or drawn from a finite set of options—and this makes them fundamentally harder than continuous problems.
Problem Scale
It may seem that one could simply enumerate all options. But the scale makes this impossible. The traveling salesman problem (find the shortest route through $n$ cities) has $(n-1)!/2$ possible routes. For $n = 20$ that is about $6 \times 10^{16}$ routes; at 1 billion routes checked per second, enumeration would take roughly two years. For $n = 25$ it is already about $3 \times 10^{23}$ routes—on the order of ten million years. For $n = 200$, full enumeration is physically impossible, ever.
The solution? Smart algorithms that find the optimum (or a good approximation) in reasonable time.
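To get a feel for the numbers, the route counts and enumeration times above can be reproduced directly (a small sketch; the assumed rate of one route checked per nanosecond matches the text):

```python
from math import factorial

def tsp_route_count(n: int) -> int:
    """Number of distinct TSP tours through n cities:
    fix the starting city, divide by 2 for direction."""
    return factorial(n - 1) // 2

OPS_PER_SECOND = 10**9           # assumption: 1 billion routes checked per second
SECONDS_PER_YEAR = 3600 * 24 * 365

for n in (10, 20, 25):
    routes = tsp_route_count(n)
    years = routes / OPS_PER_SECOND / SECONDS_PER_YEAR
    print(f"n={n}: {routes:.2e} routes, ~{years:.1e} years of enumeration")
```

For $n = 20$ this prints about $6.1 \times 10^{16}$ routes and roughly two years of enumeration.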
Classic Discrete Optimization Problems
Traveling Salesman Problem (TSP): Given $n$ cities with distances $d_{ij}$ between them. Find the shortest route visiting each city exactly once and returning to the start.
Mathematically: find a permutation $\pi \in S_n$ minimizing $\sum_{i=1}^{n} d_{\pi(i),\pi(i+1)}$, where indices are taken cyclically ($\pi(n+1) = \pi(1)$).
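The objective above can be evaluated directly by enumerating permutations—feasible only for small $n$, as the scale discussion shows. A minimal sketch with a made-up 4-city instance:

```python
from itertools import permutations

def tsp_brute_force(d):
    """Exact TSP by enumeration: fix city 0 as the start and
    try all (n-1)! orderings of the remaining cities.
    d is an n x n symmetric distance matrix."""
    n = len(d)
    best_len, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        # cyclic sum: last city connects back to the start
        length = sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour

# toy instance (hypothetical distances)
d = [[0, 1, 4, 2],
     [1, 0, 3, 5],
     [4, 3, 0, 1],
     [2, 5, 1, 0]]
print(tsp_brute_force(d))  # shortest tour of length 7
```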
Knapsack Problem: $n$ items with values $c_i$ and weights $w_i$. The knapsack can hold $W$ kg. Choose a set of items with maximal total value.
Problem: $\max \sum_i c_i x_i$ subject to $\sum_i w_i x_i \leq W$, $x_i \in \{0,1\}$.
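Despite being NP-hard, knapsack admits an exact dynamic program that is polynomial in $n$ and $W$ (pseudo-polynomial—a standard technique, not specific to this text). A minimal sketch:

```python
def knapsack_dp(values, weights, W):
    """0/1 knapsack via dynamic programming, O(n*W) time.
    dp[w] = best achievable value with capacity w."""
    dp = [0] * (W + 1)
    for v, wt in zip(values, weights):
        # iterate capacities in reverse so each item is used at most once
        for w in range(W, wt - 1, -1):
            dp[w] = max(dp[w], dp[w - wt] + v)
    return dp[W]

# instance from the worked example later in this article
print(knapsack_dp([6, 5, 4, 3], [4, 3, 2, 1], 10))  # 18
```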
Set Cover Problem: Given a universe $U$ and a family of subsets $S_1, \dots, S_m$ whose union is $U$. Choose the smallest subfamily whose union still covers $U$. Application: the minimal number of fire stations needed to cover an entire city.
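Set cover has a natural greedy heuristic—repeatedly pick the subset covering the most still-uncovered elements—which is known to achieve a $\ln|U|$ approximation factor. A sketch on a hypothetical fire-station instance:

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: pick the subset with the largest overlap
    with the uncovered elements until everything is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        i = max(range(len(subsets)), key=lambda j: len(subsets[j] & uncovered))
        picked = subsets[i] & uncovered
        if not picked:
            raise ValueError("subsets do not cover the universe")
        chosen.append(i)
        uncovered -= picked
    return chosen

# hypothetical data: city districts and the districts each station site covers
districts = {1, 2, 3, 4, 5}
stations = [{1, 2}, {3, 4}, {4, 5}, {1, 3, 5}]
print(greedy_set_cover(districts, stations))
```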
MAX-CUT: Graph $G=(V,E)$. Split $V$ into two sets $S$, $V \setminus S$, maximizing the number of edges between them. Applications: data clustering, VLSI layout.
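A simple local-search heuristic for MAX-CUT moves one vertex at a time across the partition as long as the cut grows; at a local optimum it is guaranteed to cut at least half of all edges. A sketch (naive recomputation of the cut, fine for illustration):

```python
def local_search_max_cut(n, edges):
    """Local search for MAX-CUT: flip any vertex whose move strictly
    increases the cut, until no single flip helps."""
    side = [0] * n  # start with all vertices on one side

    def cut_size():
        return sum(side[u] != side[v] for u, v in edges)

    improved = True
    while improved:
        improved = False
        for v in range(n):
            before = cut_size()
            side[v] ^= 1          # tentatively flip vertex v
            if cut_size() <= before:
                side[v] ^= 1      # revert: no strict improvement
            else:
                improved = True
    return cut_size(), side

# a 4-cycle is bipartite, so the maximum cut contains all 4 edges
print(local_search_max_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```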
Integer Linear Programming (ILP)
Most discrete problems can be written as ILP:
$\min c^T x$ subject to $Ax \leq b$, $x \in \mathbb{Z}^n$ (or $x_i \in \{0,1\}$)
Example (TSP as ILP): $x_{ij} \in \{0,1\}$ indicates whether edge $(i,j)$ is used. Constraints: $\sum_j x_{ij} = 1$ (each city is exited exactly once), $\sum_i x_{ij} = 1$ (each city is entered exactly once), plus the Miller–Tucker–Zemlin subtour-elimination constraints $y_i - y_j + n x_{ij} \leq n-1$ for all $i, j \in \{2, \dots, n\}$, $i \neq j$, with auxiliary ordering variables $y_i$.
Modern ILP solvers (Gurobi, CPLEX, SCIP) solve real-world problems with millions of variables.
Complexity Theory: NP-Completeness
Why are some problems "hard"? Complexity theory provides a rigorous answer.
Class P: problems solvable in polynomial time $O(n^k)$. Examples: sorting $O(n \log n)$, shortest path in a graph $O(n^2)$, linear programming $O(n^{3.5})$.
Class NP: problems whose solutions can be verified in polynomial time. Example: TSP—whether a given route visits every city and has length at most $k$ can be checked in $O(n)$, but no polynomial-time algorithm is known for finding the optimal route.
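The verification step is easy to make concrete: given a certificate (a proposed tour) and a bound, checking it takes one linear pass. A sketch with a made-up 4-city instance:

```python
def verify_tour(d, tour, bound):
    """NP-style verification in O(n): check that the certificate `tour`
    visits every city exactly once and has length <= bound."""
    n = len(d)
    if sorted(tour) != list(range(n)):
        return False  # not a permutation of the cities
    length = sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return length <= bound

# hypothetical distances
d = [[0, 1, 4, 2],
     [1, 0, 3, 5],
     [4, 3, 0, 1],
     [2, 5, 1, 0]]
print(verify_tour(d, [0, 1, 2, 3], 7))  # True: length exactly 7
print(verify_tour(d, [0, 2, 1, 3], 7))  # False: length 14
```

Verifying is cheap; the hard part is that there are exponentially many certificates to search through.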
NP-complete problems: hardest in NP. Any NP problem can be polynomially reduced to any NP-complete problem. The first NP-complete problem: SAT (Cook, 1971).
The conjecture $P \ne NP$ (a Millennium Prize Problem, \$1 million): NP-complete problems admit no polynomial-time algorithms. Unproven, but almost universally believed.
What this means in practice: three strategies for NP-hard problems:
- Exact algorithms (e.g., branch and bound): work for small instances or special cases
- Approximation algorithms: provable guarantees on how close the solution is to the optimum
- Heuristics (simulated annealing, genetic algorithms, local search): fast, no guarantees, often work well in practice
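The first strategy can be illustrated on knapsack: branch and bound explores take/skip decisions but prunes any branch whose fractional-relaxation upper bound cannot beat the best solution found so far. A sketch (standard textbook scheme, items assumed to have positive weights):

```python
def knapsack_branch_and_bound(values, weights, W):
    """Exact 0/1 knapsack by branch and bound. Items are sorted by
    value/weight ratio; the fractional (LP) relaxation gives an
    upper bound used to prune hopeless branches."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    n = len(v)
    best = 0

    def upper_bound(i, cap, val):
        # greedily fill remaining capacity, allowing one fractional item
        for j in range(i, n):
            if w[j] <= cap:
                cap -= w[j]
                val += v[j]
            else:
                return val + v[j] * cap / w[j]
        return val

    def branch(i, cap, val):
        nonlocal best
        best = max(best, val)
        if i == n or upper_bound(i, cap, val) <= best:
            return  # leaf, or this subtree cannot improve on `best`
        if w[i] <= cap:
            branch(i + 1, cap - w[i], val + v[i])  # take item i
        branch(i + 1, cap, val)                    # skip item i

    branch(0, W, 0)
    return best

print(knapsack_branch_and_bound([6, 5, 4, 3], [4, 3, 2, 1], 10))  # 18
```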
Complete Analysis: Knapsack Problem
Data: $W = 10$ kg, 4 items: $(c_1,w_1) = (6,4)$, $(c_2,w_2) = (5,3)$, $(c_3,w_3) = (4,2)$, $(c_4,w_4) = (3,1)$.
Full Enumeration ($2^4 = 16$ options): check all subsets.
Single items: $\{1\}$: $w=4$, $c=6$; $\{2\}$: $w=3$, $c=5$; $\{3\}$: $w=2$, $c=4$; $\{4\}$: $w=1$, $c=3$. Larger subsets: $\{1,2\}$: $w=7$, $c=11$; $\{1,3\}$: $w=6$, $c=10$; $\{1,4\}$: $w=5$, $c=9$; $\{2,3\}$: $w=5$, $c=9$; $\{2,4\}$: $w=4$, $c=8$; $\{3,4\}$: $w=3$, $c=7$; $\{1,2,3\}$: $w=9$, $c=15$; $\{1,2,4\}$: $w=8$, $c=14$; $\{1,3,4\}$: $w=7$, $c=13$; $\{2,3,4\}$: $w=6$, $c=12$; $\{1,2,3,4\}$: $w=10$, $c=18$.
Optimum: $\{1,2,3,4\}$ with $c=18$ (the total weight is exactly $W = 10$). But for $n = 50$ items there are $2^{50} > 10^{15}$ subsets—enumeration is unrealistic.
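The enumeration above is a few lines of code (items are 0-indexed here, so the set $\{1,2,3,4\}$ from the text appears as `(0, 1, 2, 3)`):

```python
from itertools import combinations

values = [6, 5, 4, 3]
weights = [4, 3, 2, 1]
W = 10
items = range(len(values))

# all 2^4 = 16 subsets, filtered by the weight constraint,
# maximized by total value
best = max(
    (s for r in range(len(values) + 1) for s in combinations(items, r)
     if sum(weights[i] for i in s) <= W),
    key=lambda s: sum(values[i] for i in s),
)
print(best, sum(values[i] for i in best))  # (0, 1, 2, 3) 18
```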
Greedy Algorithm (sort by $c_i/w_i$): $\rho = [6,5,4,3]/[4,3,2,1] = [1.5, 1.67, 2.0, 3.0]$. Order: item 4 ($\rho=3.0$), 3 ($\rho=2.0$), 2 ($\rho=1.67$), 1 ($\rho=1.5$). Pick: 4 ($w=1$), 3 ($w=3$), 2 ($w=6$), then item 1 also fits ($w=10 \leq 10$). Result: $\{1,2,3,4\}$, $c=18$—on this instance greedy happens to reach the optimum. In general, however, greedy does not guarantee the optimum for knapsack. A minimal counterexample: $W=2$ with items $(c_1,w_1)=(2,1)$ and $(c_2,w_2)=(3,2)$. Greedy takes item 1 (ratio $2 > 1.5$), after which item 2 no longer fits, giving $c=2$; the optimum is item 2 alone with $c=3$.
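Both instances are easy to check in code (a sketch of the ratio-greedy heuristic; indices are 0-based):

```python
def greedy_knapsack(values, weights, W):
    """Greedy 0/1 knapsack by value/weight ratio.
    Fast (O(n log n)), but not optimal in general."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    total_w = total_v = 0
    chosen = []
    for i in order:
        if total_w + weights[i] <= W:
            chosen.append(i)
            total_w += weights[i]
            total_v += values[i]
    return total_v, chosen

# instance from the text: greedy happens to reach the optimum 18
print(greedy_knapsack([6, 5, 4, 3], [4, 3, 2, 1], 10))
# counterexample: greedy takes the ratio-2 item (value 2), blocking
# the item worth 3; the optimum is 3
print(greedy_knapsack([2, 3], [1, 2], 2))
```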