Module III · Article III

Collective Intelligence: Swarms, Flocks, and Consensus

Agent-Based Modeling and Simulation

Collective intelligence is the ability of a group to solve problems better than any of its individual members. Ants, bees, and fish run decentralized algorithms with no central controller. These algorithms inspire AI, robotics, and decision theory.

Swarm Intelligence

Ant Colony Optimization (ACO, Dorigo, 1992):

Real ants search for food randomly. Having found it, they return to the nest, leaving a pheromone trail behind. Other ants follow the trail with probability proportional to its intensity. Short paths → traversed faster → more pheromone → more ants → even more pheromone. Pheromone evaporation keeps the colony from locking onto suboptimal paths.

Mathematically: the probability of ant $i$ choosing edge $(u,v)$:

$ P_i(u,v) = \frac{[\tau(u,v)]^\alpha \, [\eta(u,v)]^\beta}{\sum_{w\in N_i(u)} [\tau(u,w)]^\alpha \, [\eta(u,w)]^\beta} $

Here: $\tau(u,v)$ — pheromone on the edge, $\eta(u,v) = 1/d(u,v)$ — "attractiveness" (inverse distance), $\alpha$, $\beta$ — exponents balancing pheromone against the heuristic.

Applications: route optimization (logistics, TSP), network routing (internet protocols), task scheduling.
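The loop behind ACO — probabilistic edge choice, evaporation, deposit — can be sketched on a toy TSP. Everything below (the 8-city circular instance, the parameter values, the $Q/L$ deposit rule) is an illustrative assumption, not Dorigo's exact formulation:

```python
import math
import random

random.seed(0)

# Toy instance (assumed for illustration): 8 cities on a unit circle, so the
# optimal tour walks around the circle, length 8 * 2*sin(pi/8) ~ 6.12.
n = 8
cities = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)) for i in range(n)]
d = [[math.dist(cities[i], cities[j]) for j in range(n)] for i in range(n)]

alpha, beta = 1.0, 2.0               # pheromone weight, heuristic weight
rho, Q = 0.5, 1.0                    # evaporation rate, deposit constant
tau = [[1.0] * n for _ in range(n)]  # initial pheromone on every edge

def build_tour():
    """One ant: pick the next city with P proportional to tau^alpha * (1/d)^beta."""
    start = random.randrange(n)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        u = tour[-1]
        cand = list(unvisited)
        weights = [tau[u][v] ** alpha * (1.0 / d[u][v]) ** beta for v in cand]
        v = random.choices(cand, weights=weights)[0]
        tour.append(v)
        unvisited.remove(v)
    return tour

def tour_length(t):
    return sum(d[t[i]][t[(i + 1) % n]] for i in range(n))

best = None
for _ in range(100):
    tours = [build_tour() for _ in range(10)]   # 10 ants per iteration
    for row in tau:                             # evaporation on every edge
        for j in range(n):
            row[j] *= 1 - rho
    for t in tours:                             # shorter tours deposit more pheromone
        L = tour_length(t)
        for i in range(n):
            a, b = t[i], t[(i + 1) % n]
            tau[a][b] += Q / L
            tau[b][a] += Q / L
        if best is None or L < tour_length(best):
            best = t

print(round(tour_length(best), 2))
```

On this instance the best tour found should approach the circular optimum of about 6.12, with evaporation preventing early random tours from dominating the pheromone matrix.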

Particle Swarm Optimization (PSO, Kennedy & Eberhart, 1995):

A swarm of particles in the solution space moves toward the best known positions:

$ v_i^{t+1} = w \, v_i^t + c_1 r_1 (p_i^{best} - x_i^t) + c_2 r_2 (g^{best} - x_i^t), \qquad x_i^{t+1} = x_i^t + v_i^{t+1} $

Decoding:

  • $v_i$ — "velocity" of the particle (direction and speed of movement)
  • $w$ — inertia (maintaining current direction)
  • $c_1 r_1 (p_i^{best} - x_i)$ — "memory": attraction to this particle's best-known position
  • $c_2 r_2 (g^{best} - x_i)$ — "social influence": attraction to the swarm's best-known position

Applications: neural network optimization, hyperparameter tuning, control tasks.
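The two update equations translate almost line-for-line into code. A minimal sketch on the 2-D Rosenbrock function (the test function, swarm size, and coefficient values are assumptions for illustration):

```python
import random

random.seed(1)

def rosenbrock(x):
    """Benchmark with minimum f(1, 1) = 0, used here only to exercise the update rule."""
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

dim, n_particles = 2, 30
w, c1, c2 = 0.7, 1.5, 1.5   # inertia, cognitive ("memory"), social coefficients

X = [[random.uniform(-2, 2) for _ in range(dim)] for _ in range(n_particles)]
V = [[0.0] * dim for _ in range(n_particles)]
pbest = [list(x) for x in X]                 # each particle's best-known position
pbest_f = [rosenbrock(x) for x in X]
g = min(range(n_particles), key=lambda i: pbest_f[i])
gbest, gbest_f = list(pbest[g]), pbest_f[g]  # best position seen by the whole swarm

for _ in range(200):
    for i in range(n_particles):
        for k in range(dim):
            r1, r2 = random.random(), random.random()
            # v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)
            V[i][k] = (w * V[i][k]
                       + c1 * r1 * (pbest[i][k] - X[i][k])
                       + c2 * r2 * (gbest[k] - X[i][k]))
            X[i][k] += V[i][k]
        f = rosenbrock(X[i])
        if f < pbest_f[i]:
            pbest[i], pbest_f[i] = list(X[i]), f
            if f < gbest_f:
                gbest, gbest_f = list(X[i]), f

print(gbest, gbest_f)
```

Keeping $w < 1$ damps the velocities; with $w \geq 1$ the swarm tends to diverge instead of converging on $g^{best}$.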

Bee algorithm: scouts explore the space randomly. Having found a good source, a scout returns and performs a "waggle dance" encoding the direction and distance. Other bees fly to the source with probability proportional to the intensity of the dance. Result: near-optimal resource allocation without central control.

Flock Movement Models

Boids (Reynolds, 1987): three rules:

  1. Separation: if a neighbor is closer than $r_1$ — move away: $\Delta v_{sep} = -\sum_{j: d(i,j)<r_1} (x_j - x_i)/||x_j - x_i||$
  2. Alignment: fly in the average direction of neighbors ($r_1 < d < r_2$): $\Delta v_{ali} = \langle v_j \rangle_{r_1 < d < r_2} - v_i$
  3. Cohesion: fly to the center of mass of neighbors ($d < r_3$): $\Delta v_{coh} = (\langle x_j \rangle_{d < r_3} - x_i)$

Result: realistic flocks without a leader. Applications: special effects in cinema (Batman, Lord of the Rings), video games, autonomous drone swarms.
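The three rules can be sketched directly; the radii and rule weights below are assumed values for illustration, not Reynolds' originals:

```python
import math
import random

random.seed(2)

R_SEP, R_NBR = 0.5, 2.0                  # separation radius and neighborhood radius (assumed)
W_SEP, W_ALI, W_COH = 0.05, 0.05, 0.01   # rule weights (assumed)

def step(pos, vel):
    """One update of the three Boids rules; pos and vel are lists of (x, y) pairs."""
    new_vel = []
    for i, (p, v) in enumerate(zip(pos, vel)):
        sep = [0.0, 0.0]; ali = [0.0, 0.0]; coh = [0.0, 0.0]; nbrs = 0
        for j, (q, u) in enumerate(zip(pos, vel)):
            if i == j:
                continue
            dx, dy = q[0] - p[0], q[1] - p[1]
            dist = math.hypot(dx, dy)
            if 0 < dist < R_SEP:                       # 1. separation: move away
                sep[0] -= dx / dist; sep[1] -= dy / dist
            if dist < R_NBR:
                nbrs += 1
                ali[0] += u[0]; ali[1] += u[1]         # 2. alignment: average heading
                coh[0] += q[0]; coh[1] += q[1]         # 3. cohesion: center of mass
        if nbrs:
            ali = [ali[0] / nbrs - v[0], ali[1] / nbrs - v[1]]
            coh = [coh[0] / nbrs - p[0], coh[1] / nbrs - p[1]]
        new_vel.append((v[0] + W_SEP * sep[0] + W_ALI * ali[0] + W_COH * coh[0],
                        v[1] + W_SEP * sep[1] + W_ALI * ali[1] + W_COH * coh[1]))
    new_pos = [(p[0] + v[0], p[1] + v[1]) for p, v in zip(pos, new_vel)]
    return new_pos, new_vel

pos = [(random.uniform(0, 4), random.uniform(0, 4)) for _ in range(20)]
vel = [(random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1)) for _ in range(20)]
for _ in range(100):
    pos, vel = step(pos, vel)
```

Each boid reads only its neighbors' positions and velocities; coherent group motion emerges without any agent knowing the global state.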

Collective Decision Making

How Bees Choose a Nest Site (Seeley, 2010):

Scouts visit several locations. Return and "advertise" with waggle dance. Quality of location → duration and intensity of dance. "Recruited" bees fly to the best location, return, also dance. Consensus: when enough bees "vote" for one option — the swarm departs.

Mathematical model: an ODE system for the number $X_i$ of scouts committed to each of $k$ options (schematic form):

$ \frac{dX_i}{dt} = r_i X_i\left(1 - \frac{X_i}{N}\right) - \sum_{j\neq i} \frac{r_j X_j X_i}{N} + \sigma\left(\frac{N}{k} - X_i\right) $

Here $r_i$ is the recruitment rate (increasing with site quality), the middle term is the loss of supporters to rival sites, and $\sigma$ sets a weak drift back toward indifference.

If one option has markedly higher "quality" → bifurcation → the swarm settles on it by consensus.
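A schematic Euler integration of quality-dependent recruitment with cross-inhibition and weak noise shows the winner-take-all behavior; the rates, the 3-site setup, and the step size are all assumed for illustration:

```python
# N bees total, k = 3 candidate sites; r[i] encodes site quality (assumed values).
N, k = 100.0, 3
r = [1.0, 0.6, 0.5]   # site 0 is the best
sigma = 0.01          # weak drift back toward indifference
X = [N / k] * k       # start with equal numbers of committed scouts
dt = 0.01

for _ in range(50_000):
    dX = []
    for i in range(k):
        recruit = r[i] * X[i] * (1 - X[i] / N)        # dancing recruits uncommitted bees
        drain = sum(r[j] * X[j] for j in range(k) if j != i) * X[i] / N  # lost to rivals
        noise = sigma * (N / k - X[i])                # spontaneous re-exploration
        dX.append(recruit - drain + noise)
    X = [x + dt * d for x, d in zip(X, dX)]

print([round(x, 1) for x in X])
```

Starting from a symmetric split, the small quality advantage of site 0 is amplified by positive feedback until nearly all scouts commit to it.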

Wisdom of the Crowd and Its Limits

Galton (1907): about 800 people at a country fair guessed the weight of an ox. The median estimate: 1207 pounds. Actual weight: 1198 pounds. Error 0.8% — better than any individual expert. Wisdom of the crowd!

Conditions for "wisdom of the crowd": (1) independence of judgments (agents don't copy each other), (2) diversity of agents (different evaluation methods), (3) decentralization (no "leader"). If conditions are violated: information cascades, "madness of crowds".
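Condition (1) is easy to see numerically: averaging many independent, unbiased guesses shrinks the error roughly as $1/\sqrt{n}$. The noise model below (Gaussian with a standard deviation of 120 pounds) is an assumption for illustration, not Galton's actual data:

```python
import random
import statistics

random.seed(3)
true_weight = 1198   # Galton's ox, in pounds

# 800 independent, diverse guessers: each unbiased but individually noisy (assumed model).
guesses = [random.gauss(true_weight, 120) for _ in range(800)]

crowd = statistics.mean(guesses)
individual_err = statistics.mean(abs(g - true_weight) for g in guesses)
crowd_err = abs(crowd - true_weight)
print(round(crowd_err, 1), round(individual_err, 1))
```

The crowd mean lands within a few pounds of the truth while a typical individual is off by roughly a hundred; correlate the guesses (break independence) and this advantage disappears.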

Information cascade (Bikhchandani, 1992): if the first few agents make the same choice, others ignore private information and copy — collective error. Example: fake news (100K reposts) → information cascade.
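A minimal sequential-choice simulation in the spirit of Bikhchandani's model; the decision rule (follow the majority once its lead reaches two, otherwise follow your own signal) and the signal accuracy $p = 0.7$ are simplifying assumptions:

```python
import random

random.seed(4)

def run_sequence(n_agents=100, p=0.7):
    """Agents act in order; each sees all previous choices plus one private signal.
    True state is 1; each private signal is correct with probability p."""
    choices = []
    for _ in range(n_agents):
        signal = 1 if random.random() < p else 0
        ones = sum(choices)
        zeros = len(choices) - ones
        if ones - zeros >= 2:
            choices.append(1)        # up-cascade: private signal ignored
        elif zeros - ones >= 2:
            choices.append(0)        # down-cascade: collective error possible
        else:
            choices.append(signal)   # no clear majority: follow own signal
    return choices

wrong = sum(run_sequence()[-1] == 0 for _ in range(1000))
frac = wrong / 1000
print(frac)
```

Even though every private signal is 70% accurate, a noticeable fraction of runs lock into the wrong consensus after just two unlucky early choices, which is exactly the fake-news dynamic described above.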

Prediction markets: Wisdom of Crowds in action. Iowa Electronic Markets: election predictions more accurate than polls. Futarchy (Hanson): governing a state via prediction markets.

Numerical Example: PSO for Rosenbrock Function

$f(x_1,x_2) = (1-x_1)^2 + 100(x_2-x_1^2)^2$ — the famous "banana" function with minimum at $(1,1)$. $n=30$ particles, $w=0.7$, $c_1 = c_2 = 1.5$, maximum 200 iterations. After 50 iterations: $g^{best} \approx (0.998, 0.996)$, $f \approx 0.0001$. After 200: $g^{best} \approx (1.0000, 1.0000)$, $f < 10^{-8}$. Comparison: a quasi-Newton method (L-BFGS) reaches a similar result in 50 iterations, but is sensitive to the starting point.

Assignment: Implement PSO and ACO. PSO: optimize the 20-dimensional Rastrigin function $f(x) = 10n + \sum_i (x_i^2 - 10\cos(2\pi x_i))$. Compare with CMA-ES. ACO: solve a TSP over 20 cities with random coordinates. Compare: random tour, nearest neighbor, ACO, optimal (brute force over 20 cities is infeasible; use OR-Tools). Plot convergence curves for both algorithms.
