Module IV·Article II·~5 min read
Simulated Annealing and Its Applications
Metaheuristics
Physical Analogy: Slow Cooling
In metallurgy, “annealing” means heating a metal to a high temperature and then cooling it slowly. At high temperature, atoms have plenty of energy and can "jump" over energy barriers, finding a better crystalline structure. With slow cooling, they settle in the global energy minimum (the ideal crystal). With rapid cooling, they “freeze” in a local minimum (glass).
The idea of the metaheuristic Simulated Annealing (SA) is to imitate this process! The algorithm accepts "worsening" steps with decreasing probability—this allows “escaping” from local optima and eventually finding the global one.
Simulated Annealing Algorithm
Parameters:
- $T_0$ — initial temperature (high)
- $\alpha \in (0,1)$ — cooling rate (usually 0.9–0.99)
- $L_k$ — number of iterations at temperature $T_k$
Algorithm:
- $s = s_0$ (initial solution), $T = T_0$, $BestS = s_0$
- While $T > T_{min}$:
  a. Repeat $L_k$ times:
     - $s' = \text{random\_neighbor}(s)$ (random neighbor)
     - $\Delta = f(s') - f(s)$ (change in objective)
     - If $\Delta < 0$ (improvement): $s = s'$ (always accept)
     - Otherwise: $s = s'$ with probability $\exp(-\Delta/T)$
     - If $f(s) < f(BestS)$: $BestS = s$ (track the best solution seen)
  b. $T \leftarrow \alpha \cdot T$
- Return $BestS$
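The loop above can be sketched in Python. The toy objective, neighbor function, and parameter values below are illustrative placeholders, not part of any standard implementation:

```python
import math
import random

def simulated_annealing(f, s0, neighbor, T0=10.0, alpha=0.95,
                        L=100, T_min=1e-3, seed=None):
    """Minimize f with the SA loop described above.

    f        -- objective function (lower is better)
    s0       -- initial solution
    neighbor -- neighbor(s, rng) returns a random neighbor of s
    """
    rng = random.Random(seed)
    s, best = s0, s0
    T = T0
    while T > T_min:
        for _ in range(L):                      # L_k iterations at temperature T
            s_new = neighbor(s, rng)
            delta = f(s_new) - f(s)
            # Accept improvements always; worsenings with prob exp(-delta/T).
            if delta < 0 or rng.random() < math.exp(-delta / T):
                s = s_new
                if f(s) < f(best):              # track the best solution seen
                    best = s
        T *= alpha                              # geometric cooling
    return best

# Toy usage: a 1-D function with many local minima; SA must climb out
# of the shallow basins to reach the deep one near x ~ -0.31.
f = lambda x: x * x + 10.0 * math.sin(5.0 * x)
step = lambda x, rng: x + rng.uniform(-0.5, 0.5)
x_best = simulated_annealing(f, s0=4.0, neighbor=step, seed=1)
```

Note that `math.exp(-delta / T)` underflows harmlessly to 0.0 at very low temperatures, so the tail of the run degenerates into plain local search, exactly as described above.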
The probability of accepting a worsening step is $\exp(-\Delta/T)$. As $T \to \infty$ the probability $\to 1$ (any step is accepted: a random walk); as $T \to 0$ it $\to 0$ (only improvements are accepted: standard local search).
Key property: at high $T$ the algorithm “explores” the solution space, at low $T$ it “exploits” the good solutions found.
Theoretical Guarantees
Geman-Geman Theorem (1984): if $T(k) = C/\log(1+k)$ (logarithmic cooling), SA converges to the global optimum with probability 1.
Why is it not used? It is too slow: with logarithmic cooling, reaching temperature $\approx 1$ takes on the order of $e^C$ iterations. In practice, geometric cooling $T \leftarrow \alpha T$ with $\alpha = 0.95$–$0.99$ is used, which violates the theorem's condition.
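A quick back-of-the-envelope check makes this concrete: $T(k) = C/\log(1+k)$ drops to $T$ only when $k = e^{C/T} - 1$. The constants below are arbitrary illustrations:

```python
import math

def iters_to_reach(T_target, C):
    """Smallest k at which logarithmic cooling T(k) = C / log(1 + k)
    reaches T_target, i.e. k = e^(C / T_target) - 1."""
    return math.exp(C / T_target) - 1

# Even a modest constant makes the schedule astronomically slow:
print(round(iters_to_reach(1.0, C=10)))   # ~22_025 iterations just to hit T = 1
print(iters_to_reach(1.0, C=50))          # ~5e21 -- hopeless in practice
```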
Practical convergence: SA finds good (but not guaranteed optimal) solutions in reasonable time. For TSP: tours within 1-2% of the optimum in seconds.
Tuning SA Parameters
Initial temperature $T_0$: chosen so that the initial “acceptance rate” is about 80-90%. For example, you can run 1000 random steps, compute the average worsening $\bar{\Delta}$, and choose $T_0$ so that $\exp(-\bar{\Delta}/T_0) = 0.8$.
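This calibration recipe can be sketched as follows; solving $\exp(-\bar{\Delta}/T_0) = 0.8$ for $T_0$ gives $T_0 = -\bar{\Delta}/\ln 0.8 \approx 4.48\,\bar{\Delta}$. The toy objective and neighbor function are hypothetical:

```python
import math
import random

def calibrate_T0(f, s0, neighbor, target_accept=0.8, samples=1000, seed=0):
    """Estimate the mean worsening over random moves, then choose T0
    so that exp(-mean_worsening / T0) = target_accept."""
    rng = random.Random(seed)
    s = s0
    worsenings = []
    for _ in range(samples):
        s_new = neighbor(s, rng)
        delta = f(s_new) - f(s)
        if delta > 0:
            worsenings.append(delta)
        s = s_new                 # random walk: accept every move while sampling
    mean_delta = sum(worsenings) / len(worsenings)
    return -mean_delta / math.log(target_accept)   # solves exp(-d/T0) = p

# Hypothetical instance: quadratic objective, uniform step neighborhood.
f = lambda x: x * x
step = lambda x, rng: x + rng.uniform(-1.0, 1.0)
T0 = calibrate_T0(f, s0=5.0, neighbor=step)
```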
Cooling rate $\alpha$: a trade-off between quality and time. $\alpha = 0.99$ means a long run, good quality. $\alpha = 0.9$ means quick, lower quality.
Number of iterations $L_k$ at each temperature: often $L_k = n$ (problem size). More iterations at one temperature → better “equilibrium”.
Stopping criterion: $T < T_{min}$, or no improvements in the last $k \cdot L_k$ iterations.
SA for VLSI Layout
Task: $n$ components on a chip, minimize the total length of connecting wires. This is called the Placement Problem in VLSI (Very Large Scale Integration).
SA approach:
- Solution: coordinates of each component on the grid
- Neighborhood: swap two components or move one component
- Cost function: total length of the Steiner tree for each net
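The cost function and neighborhood above can be sketched as follows. As a simplification, net length is estimated here by the half-perimeter of the net's bounding box (HPWL), a standard cheap stand-in for the Steiner-tree length mentioned above; the 4-component netlist is hypothetical:

```python
import random

def hpwl(net, pos):
    """Half-perimeter wirelength of one net: width + height of the
    bounding box of its pins (a cheap Steiner-length estimate)."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def cost(nets, pos):
    """Total estimated wire length over all nets."""
    return sum(hpwl(net, pos) for net in nets)

def swap_move(pos, rng):
    """Neighborhood move: swap the grid positions of two random components."""
    new = dict(pos)
    a, b = rng.sample(list(new), 2)
    new[a], new[b] = new[b], new[a]
    return new

# Hypothetical 4-component instance on a grid.
pos = {"A": (0, 0), "B": (3, 0), "C": (0, 3), "D": (3, 3)}
nets = [("A", "B"), ("A", "C", "D")]
rng = random.Random(0)
neighbor = swap_move(pos, rng)   # one candidate placement for the SA loop
```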
Historical significance: SA was introduced, and first applied to VLSI layout, by Kirkpatrick, Gelatt, and Vecchi (IBM, 1983). Their Science paper is one of the most cited in computer science. SA reduced wire length by 10–20% compared to deterministic methods.
Modern VLSI tools: SA remains fundamental, but with dozens of improvements: adaptive temperature, parallel runs, “restart” on stagnation.
SA vs. Other Methods
| Method | Guarantee | Speed | Simplicity | Quality |
|---|---|---|---|---|
| Exact B&B | Optimum | Slow | Complex | Excellent |
| Greedy | None | Fast | Simple | Poor |
| Local Search | None | Fast | Simple | Average |
| SA | None (practical) | Medium | Simple | Good |
| Tabu | None | Medium | Complex | Good |
| GA | None | Slow | Medium | Good |
Conclusion: SA suits problems with many local optima where implementation simplicity matters; tabu search suits structured problems where exploiting "memory" pays off. For TSP specifically, the Lin–Kernighan (LK) heuristic remains the strongest choice.
Simulated Annealing Nuances
The quality of SA depends strongly on parameter choices:
- Initial temperature $T_0$: should be high enough that nearly all moves are initially accepted (acceptance rate ≈ 80–90%)
- Cooling schedule: classical geometric $T_k = \alpha \cdot T_{k-1}$ with $\alpha \in [0.85, 0.99]$; adaptive schedules (Lundy-Mees) decrease $T$ depending on progress
- Neighborhood: must allow “reachability” of any point. Too small moves mean slow convergence, too large means loss of local structure
Theoretical Convergence
With sufficiently slow cooling ($T_k \geq c/\log(k+1)$ for a suitable constant $c$), SA converges to the global optimum with probability 1 (Geman & Geman, 1984). In practice this schedule is unusably slow, so one settles for a good local optimum.
Parallel Variants
- Parallel tempering (replica exchange): several copies of SA with different temperatures exchange states. High-temperature copies explore, low-temperature ones exploit.
- Population SA: multi-agent version with information exchange
- GPU implementations: thousands of parallel SA runs with different starting points
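The replica-exchange idea can be sketched in a few lines: each replica runs a Metropolis chain at its own fixed temperature, and adjacent replicas swap states with probability $\min(1, e^{(\beta_i - \beta_j)(f_i - f_j)})$, where $\beta = 1/T$. The objective and parameters below are illustrative placeholders:

```python
import math
import random

def parallel_tempering(f, s0, neighbor, temps, sweeps=1000, seed=0):
    """Replica exchange: one Metropolis chain per temperature; adjacent
    chains occasionally swap states so hot replicas feed the cold ones."""
    rng = random.Random(seed)
    states = [s0] * len(temps)
    for _ in range(sweeps):
        # One Metropolis step per replica at its own temperature.
        for i, T in enumerate(temps):
            cand = neighbor(states[i], rng)
            delta = f(cand) - f(states[i])
            if delta < 0 or rng.random() < math.exp(-delta / T):
                states[i] = cand
        # Attempt one swap between a random adjacent temperature pair.
        i = rng.randrange(len(temps) - 1)
        d = (1.0 / temps[i] - 1.0 / temps[i + 1]) * (f(states[i]) - f(states[i + 1]))
        if d >= 0 or rng.random() < math.exp(d):
            states[i], states[i + 1] = states[i + 1], states[i]
    return min(states, key=f)   # best replica (usually the coldest)

# Same toy multimodal objective; temps span explore (hot) to exploit (cold).
f = lambda x: x * x + 10.0 * math.sin(5.0 * x)
step = lambda x, rng: x + rng.uniform(-0.5, 0.5)
best = parallel_tempering(f, 4.0, step, temps=[5.0, 1.0, 0.2, 0.05])
```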
Hybrid Methods
SA is often combined with other techniques:
- SA + Local Search: after SA finds a good solution—local optimization for final polishing
- Memetic algorithms: SA inside genetic algorithms as mutation
- Variable Neighborhood Search (VNS): SA with changing neighborhood upon stagnation
Applications
- VLSI placement: placing millions of transistors on a chip. SA was standard in the 1990s, still used in TimberWolf, Cadence
- TSP: on problems up to 100,000 cities, SA gives solutions within 2-3% of the optimum
- Aircraft and crew scheduling: Air France, Lufthansa use SA for dynamic rescheduling in disruptions
- Molecular design: protein folding prediction, drug design—SA on the space of conformations
- Financial portfolios: optimization with discrete constraints (number of shares, sectors, risk categories)