Module IV·Article II·~5 min read
Simulated Annealing and Its Applications
Metaheuristics
Physical Analogy: Slow Cooling
In metallurgy, “annealing” means heating a metal to a high temperature and then cooling it slowly. At high temperature, atoms have plenty of energy and can "jump" over energy barriers, finding a better crystalline structure. With slow cooling, they settle in the global energy minimum (the ideal crystal). With rapid cooling, they “freeze” in a local minimum (glass).
The idea of the metaheuristic Simulated Annealing (SA) is to imitate this process! The algorithm accepts "worsening" steps with decreasing probability—this allows “escaping” from local optima and eventually finding the global one.
Simulated Annealing Algorithm
Parameters:
- $T_0$ — initial temperature (high)
- $\alpha \in (0,1)$ — cooling rate (usually 0.9–0.99)
- $L_k$ — number of iterations at temperature $T_k$
Algorithm:
- $s = s_0$ (initial solution), $T = T_0$, $BestS = s_0$
- While $T > T_{min}$:
  a. Repeat $L_k$ times:
     - $s' = \text{random\_neighbor}(s)$ (random neighbor)
     - $\Delta = f(s') - f(s)$ (change in objective)
     - If $\Delta < 0$ (improvement): $s = s'$ (always accept)
     - Otherwise: $s = s'$ with probability $\exp(-\Delta/T)$
     - If $f(s) < f(BestS)$: $BestS = s$ (track the best solution seen)
  b. $T \leftarrow \alpha \cdot T$
- Return $BestS$
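The loop above can be sketched in Python. The toy objective, neighbor function, and parameter values below are illustrative placeholders, not part of any standard implementation:

```python
import math
import random

def simulated_annealing(f, s0, neighbor, T0=10.0, alpha=0.95,
                        L=100, T_min=1e-3, seed=None):
    """Minimize f with the SA loop described above.

    f        -- objective function (lower is better)
    s0       -- initial solution
    neighbor -- neighbor(s, rng) returns a random neighbor of s
    """
    rng = random.Random(seed)
    s, best = s0, s0
    T = T0
    while T > T_min:
        for _ in range(L):                      # L_k iterations at temperature T
            s_new = neighbor(s, rng)
            delta = f(s_new) - f(s)
            # Accept improvements always; worsenings with prob exp(-delta/T).
            if delta < 0 or rng.random() < math.exp(-delta / T):
                s = s_new
                if f(s) < f(best):              # track the best solution seen
                    best = s
        T *= alpha                              # geometric cooling
    return best

# Toy usage: a 1-D function with many local minima; SA must climb out
# of the shallow basins to reach the deep one near x ~ -0.31.
f = lambda x: x * x + 10.0 * math.sin(5.0 * x)
step = lambda x, rng: x + rng.uniform(-0.5, 0.5)
x_best = simulated_annealing(f, s0=4.0, neighbor=step, seed=1)
```

Note that `math.exp(-delta / T)` underflows harmlessly to 0.0 at very low temperatures, so the tail of the run degenerates into plain local search, exactly as described above.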
The probability of accepting a worsening step is $\exp(-\Delta/T)$. As $T \to \infty$ the probability $\to 1$ (any step is accepted: a random walk); as $T \to 0$ it $\to 0$ (only improvements are accepted: standard local search).
Key property: at high $T$ the algorithm “explores” the solution space, at low $T$ it “exploits” the good solutions found.
Theoretical Guarantees
Geman-Geman Theorem (1984): if $T(k) = C/\log(1+k)$ (logarithmic cooling), SA converges to the global optimum with probability 1.
Why is it not used? It is too slow: with logarithmic cooling, reaching temperature $\approx 1$ takes on the order of $e^C$ iterations. In practice, geometric cooling $T \leftarrow \alpha T$ with $\alpha = 0.95$–$0.99$ is used, which violates the theorem's condition.
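A quick back-of-the-envelope check makes this concrete: $T(k) = C/\log(1+k)$ drops to $T$ only when $k = e^{C/T} - 1$. The constants below are arbitrary illustrations:

```python
import math

def iters_to_reach(T_target, C):
    """Smallest k at which logarithmic cooling T(k) = C / log(1 + k)
    reaches T_target, i.e. k = e^(C / T_target) - 1."""
    return math.exp(C / T_target) - 1

# Even a modest constant makes the schedule astronomically slow:
print(round(iters_to_reach(1.0, C=10)))   # ~22_025 iterations just to hit T = 1
print(iters_to_reach(1.0, C=50))          # ~5e21 -- hopeless in practice
```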
Practical convergence: SA finds good (but not guaranteed optimal) solutions in reasonable time. For TSP: tours within 1-2% of the optimum in seconds.
Tuning SA Parameters
Initial temperature $T_0$: chosen so that the initial “acceptance rate” is about 80-90%. For example, you can run 1000 random steps, compute the average worsening $\bar{\Delta}$, and choose $T_0$ so that $\exp(-\bar{\Delta}/T_0) = 0.8$.
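This calibration recipe can be sketched as follows; solving $\exp(-\bar{\Delta}/T_0) = 0.8$ for $T_0$ gives $T_0 = -\bar{\Delta}/\ln 0.8 \approx 4.48\,\bar{\Delta}$. The toy objective and neighbor function are hypothetical:

```python
import math
import random

def calibrate_T0(f, s0, neighbor, target_accept=0.8, samples=1000, seed=0):
    """Estimate the mean worsening over random moves, then choose T0
    so that exp(-mean_worsening / T0) = target_accept."""
    rng = random.Random(seed)
    s = s0
    worsenings = []
    for _ in range(samples):
        s_new = neighbor(s, rng)
        delta = f(s_new) - f(s)
        if delta > 0:
            worsenings.append(delta)
        s = s_new                 # random walk: accept every move while sampling
    mean_delta = sum(worsenings) / len(worsenings)
    return -mean_delta / math.log(target_accept)   # solves exp(-d/T0) = p

# Hypothetical instance: quadratic objective, uniform step neighborhood.
f = lambda x: x * x
step = lambda x, rng: x + rng.uniform(-1.0, 1.0)
T0 = calibrate_T0(f, s0=5.0, neighbor=step)
```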
Cooling rate $\alpha$: a trade-off between quality and time. $\alpha = 0.99$ means a long run, good quality. $\alpha = 0.9$ means quick, lower quality.
Number of iterations $L_k$ at each temperature: often $L_k = n$ (problem size). More iterations at one temperature → better “equilibrium”.
Stopping criterion: $T < T_{min}$, or no improvements in the last $k \cdot L_k$ iterations.
SA for VLSI Layout
Task: $n$ components on a chip, minimize the total length of connecting wires. This is called the Placement Problem in VLSI (Very Large Scale Integration).
SA approach:
- Solution: coordinates of each component on the grid
- Neighborhood: swap two components or move one component
- Cost function: total length of the Steiner tree for each net
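The cost function and neighborhood above can be sketched as follows. As a simplification, net length is estimated here by the half-perimeter of the net's bounding box (HPWL), a standard cheap stand-in for the Steiner-tree length mentioned above; the 4-component netlist is hypothetical:

```python
import random

def hpwl(net, pos):
    """Half-perimeter wirelength of one net: width + height of the
    bounding box of its pins (a cheap Steiner-length estimate)."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def cost(nets, pos):
    """Total estimated wire length over all nets."""
    return sum(hpwl(net, pos) for net in nets)

def swap_move(pos, rng):
    """Neighborhood move: swap the grid positions of two random components."""
    new = dict(pos)
    a, b = rng.sample(list(new), 2)
    new[a], new[b] = new[b], new[a]
    return new

# Hypothetical 4-component instance on a grid.
pos = {"A": (0, 0), "B": (3, 0), "C": (0, 3), "D": (3, 3)}
nets = [("A", "B"), ("A", "C", "D")]
rng = random.Random(0)
neighbor = swap_move(pos, rng)   # one candidate placement for the SA loop
```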
Historical significance: SA was introduced, and first applied to VLSI layout, by Kirkpatrick, Gelatt, and Vecchi (IBM, 1983). Their Science paper is one of the most cited in computer science. SA reduced wire length by 10–20% compared to deterministic methods.
Modern VLSI tools: SA remains fundamental, but with dozens of improvements: adaptive temperature, parallel runs, “restart” on stagnation.
SA vs. Other Methods
| Method | Guarantee | Speed | Simplicity | Quality |
|---|---|---|---|---|
| Exact B&B | Optimum | Slow | Complex | Excellent |
| Greedy | None | Fast | Simple | Poor |
| Local Search | None | Fast | Simple | Average |
| SA | None (practical) | Medium | Simple | Good |
| Tabu | None | Medium | Complex | Good |
| GA | None | Slow | Medium | Good |
Conclusion: SA suits problems with many local optima where implementation simplicity matters; tabu search suits structured problems where exploiting "memory" pays off. For TSP specifically, the Lin–Kernighan (LK) heuristic remains the strongest choice.
Simulated Annealing Nuances
The quality of SA depends strongly on parameter choices:
- Initial temperature $T_0$: should be high enough that nearly all moves are initially accepted (acceptance rate ≈ 80–90%)
- Cooling schedule: classical geometric $T_k = \alpha \cdot T_{k-1}$ with $\alpha \in [0.85, 0.99]$; adaptive schedules (Lundy-Mees) decrease $T$ depending on progress
- Neighborhood: must allow “reachability” of any point. Too small moves mean slow convergence, too large means loss of local structure
Theoretical Convergence
With sufficiently slow cooling ($T_k \geq c/\log(k+1)$ for a suitable constant $c$), SA converges to the global optimum with probability 1 (Geman & Geman, 1984). In practice this schedule is unusably slow, so one settles for a good local optimum.
Parallel Variants
- Parallel tempering (replica exchange): several copies of SA with different temperatures exchange states. High-temperature copies explore, low-temperature ones exploit.
- Population SA: multi-agent version with information exchange
- GPU implementations: thousands of parallel SA runs with different starting points
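The replica-exchange idea can be sketched in a few lines: each replica runs a Metropolis chain at its own fixed temperature, and adjacent replicas swap states with probability $\min(1, e^{(\beta_i - \beta_j)(f_i - f_j)})$, where $\beta = 1/T$. The objective and parameters below are illustrative placeholders:

```python
import math
import random

def parallel_tempering(f, s0, neighbor, temps, sweeps=1000, seed=0):
    """Replica exchange: one Metropolis chain per temperature; adjacent
    chains occasionally swap states so hot replicas feed the cold ones."""
    rng = random.Random(seed)
    states = [s0] * len(temps)
    for _ in range(sweeps):
        # One Metropolis step per replica at its own temperature.
        for i, T in enumerate(temps):
            cand = neighbor(states[i], rng)
            delta = f(cand) - f(states[i])
            if delta < 0 or rng.random() < math.exp(-delta / T):
                states[i] = cand
        # Attempt one swap between a random adjacent temperature pair.
        i = rng.randrange(len(temps) - 1)
        d = (1.0 / temps[i] - 1.0 / temps[i + 1]) * (f(states[i]) - f(states[i + 1]))
        if d >= 0 or rng.random() < math.exp(d):
            states[i], states[i + 1] = states[i + 1], states[i]
    return min(states, key=f)   # best replica (usually the coldest)

# Same toy multimodal objective; temps span explore (hot) to exploit (cold).
f = lambda x: x * x + 10.0 * math.sin(5.0 * x)
step = lambda x, rng: x + rng.uniform(-0.5, 0.5)
best = parallel_tempering(f, 4.0, step, temps=[5.0, 1.0, 0.2, 0.05])
```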
Hybrid Methods
SA is often combined with other techniques:
- SA + Local Search: after SA finds a good solution—local optimization for final polishing
- Memetic algorithms: SA inside genetic algorithms as mutation
- Variable Neighborhood Search (VNS): SA with changing neighborhood upon stagnation
Applications
- VLSI placement: placing millions of transistors on a chip. SA was standard in the 1990s, still used in TimberWolf, Cadence
- TSP: on problems up to 100,000 cities, SA gives solutions within 2-3% of the optimum
- Aircraft and crew scheduling: Air France, Lufthansa use SA for dynamic rescheduling in disruptions
- Molecular design: protein folding prediction, drug design—SA on the space of conformations
- Financial portfolios: optimization with discrete constraints (number of shares, sectors, risk categories)