
Probability Space and Kolmogorov's Axioms

Axiomatic Foundations of Probability Theory


Probability theory acquired a rigorous mathematical foundation in 1933, when Andrey Nikolaevich Kolmogorov published "Foundations of the Theory of Probability", laying down the axiomatic basis that remains in effect to this day.

Probability Space

Definition: A probability space is a triple (Ω, F, P), where Ω is the sample space of elementary outcomes, F ⊆ 2^Ω is a σ-algebra of events, and P: F → [0,1] is a probability measure.

σ-algebra F: A family of subsets of Ω satisfying: (1) Ω ∈ F; (2) A ∈ F ⇒ Aᶜ ∈ F (closure under complements); (3) A₁, A₂,... ∈ F ⇒ ⋃ₙ Aₙ ∈ F (closure under countable unions).
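For a finite Ω, the three conditions can be checked by brute force. A minimal sketch (the helper name `is_sigma_algebra` is my own; for finite families, closure under pairwise unions already implies closure under countable unions):

```python
def is_sigma_algebra(omega, family):
    """Check the three sigma-algebra conditions on a finite ground set."""
    fam = {frozenset(s) for s in family}
    om = frozenset(omega)
    if om not in fam:                      # (1) Omega is in F
        return False
    for a in fam:                          # (2) closed under complements
        if om - a not in fam:
            return False
    for a in fam:                          # (3) closed under unions
        for b in fam:                      #     (pairwise suffices here)
            if a | b not in fam:
                return False
    return True

omega = {1, 2, 3, 4}
f_ok = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
f_bad = [set(), {1}, {1, 2, 3, 4}]        # missing the complement {2, 3, 4}
print(is_sigma_algebra(omega, f_ok))      # True
print(is_sigma_algebra(omega, f_bad))     # False
```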

Kolmogorov's Axioms

P1 (Non-negativity): P(A) ≥ 0 for all A ∈ F.
P2 (Normalization): P(Ω) = 1.
P3 (Countable additivity): For pairwise disjoint A₁, A₂,...: P(⋃ₙ Aₙ) = Σₙ P(Aₙ).

Corollaries: P(∅) = 0. P(Aᶜ) = 1 - P(A). A ⊆ B ⇒ P(A) ≤ P(B). P(A ∪ B) = P(A) + P(B) - P(A ∩ B). P(⋃ᵢ Aᵢ) ≤ Σᵢ P(Aᵢ) (Boole's inequality, also known as the union bound).
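These corollaries are easy to verify on a concrete finite space. A sketch, assuming a uniform (Laplace) measure on a fair die:

```python
from fractions import Fraction

omega = set(range(1, 7))                       # a fair die
P = lambda e: Fraction(len(e), len(omega))     # uniform measure

A, B = {1, 2, 3}, {3, 4}
assert P(set()) == 0                           # P(empty) = 0
assert P(omega - A) == 1 - P(A)                # complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)      # addition rule
assert P(A | B) <= P(A) + P(B)                 # Boole's inequality
print(P(A | B))                                # 2/3
```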

Classical Probability

Laplace's definition: Ω is a finite set, all outcomes are equally likely. P(A) = |A|/|Ω|.

Inclusion-exclusion principle:
P(A₁ ∪ A₂ ∪ ... ∪ Aₙ) = Σ P(Aᵢ) - Σ P(AᵢAⱼ) + Σ P(AᵢAⱼAₖ) - ... + (–1)^{n+1}P(A₁...Aₙ).
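The alternating sum can be checked exactly by enumeration. A sketch, assuming three illustrative events (multiples of 2, 3, and 5 in a small ground set):

```python
from itertools import combinations
from fractions import Fraction

omega = set(range(20))
P = lambda e: Fraction(len(e), len(omega))

# Illustrative events: multiples of 2, 3, 5 inside omega
events = [{x for x in omega if x % d == 0} for d in (2, 3, 5)]

# Right-hand side of inclusion-exclusion: alternating sum over all
# nonempty subfamilies, signed by (-1)^(k+1)
rhs = Fraction(0)
for k in range(1, len(events) + 1):
    for combo in combinations(events, k):
        rhs += (-1) ** (k + 1) * P(set.intersection(*combo))

lhs = P(set.union(*events))
print(lhs == rhs)   # True
```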

Exercise: (a) For a σ-algebra F: if A, B ∈ F, prove that A ∩ B ∈ F and A \ B ∈ F. (b) From 52 cards, a 5-card hand is dealt. A full house is three cards of one rank and a pair of another rank; compute P(full house). (c) Birthday paradox: among n people, P(at least two share a birthday) > 0.5. Find the minimal such n.

Historical Development of the Axiomatic System

Before Kolmogorov, probability theory developed intuitively. Pascal and Fermat in 1654 corresponded about the division of stakes (the problem of points); this correspondence is considered the birth of probability theory as a mathematical discipline. Laplace in the 19th century systematized classical probability, but a rigorous mathematical foundation was still missing. The problem was that for continuous distributions the classical definition "favorable cases divided by the total number of cases" lost its meaning. Kolmogorov's axiomatic system (1933) resolved this problem by relying on Lebesgue's measure theory.

Key idea: probability is simply a measure on a measurable space with the normalization condition P(Ω) = 1. This makes probability theory a part of the more general theory of measure and integration, opening access to the entire mathematical apparatus of analysis.

Types of Events and Operations

In practice, it is important to be able to work with compound events. The certain event is Ω: it always occurs. The impossible event is ∅: it never occurs. An elementary outcome corresponds to a singleton {ω} ∈ F. Events A and B are mutually exclusive if A ∩ B = ∅: they cannot occur simultaneously. Events B₁,...,Bₙ form a complete group (a partition of Ω) if they are pairwise mutually exclusive and ⋃ᵢBᵢ = Ω.

Operations on events correspond to logical connectives: union A ∪ B means "A or B", intersection A ∩ B means "A and B", complement Aᶜ means "not A". De Morgan's law: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ says that "neither A nor B" is equivalent to "not A and not B".
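De Morgan's law can be verified directly with set operations, taking complements relative to Ω. A minimal sketch with illustrative events:

```python
omega = set(range(10))
A, B = {1, 2, 3}, {3, 4, 5}

neither = omega - (A | B)              # "neither A nor B"
not_a_and_not_b = (omega - A) & (omega - B)  # "not A and not B"
print(neither == not_a_and_not_b)      # True
```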

Continuity of the Probability Measure

An important consequence of countable additivity is the continuity of P. If A₁ ⊇ A₂ ⊇ ... is a decreasing sequence of events with ⋂ₙ Aₙ = A, then P(Aₙ) → P(A). Similarly, for increasing: A₁ ⊆ A₂ ⊆ ..., ⋃ₙ Aₙ = B ⇒ P(Aₙ) → P(B). This property is critical when working with limits of events—a standard operation in asymptotic analysis.
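Both directions of continuity can be seen on a standard example with coin tosses (the choice of events here is my own illustration):

```python
# Decreasing: A_n = "no head in the first n fair-coin tosses"; the
# intersection of all A_n is empty, and P(A_n) = 2^-n tends to P(empty) = 0.
a = [2.0 ** -n for n in range(1, 31)]

# Increasing: B_n = "at least one head in the first n tosses"; the union
# is "a head eventually occurs", and P(B_n) = 1 - 2^-n tends to 1.
b = [1 - 2.0 ** -n for n in range(1, 31)]

print(a[-1], b[-1])   # P(A_30) is near 0, P(B_30) is near 1
```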

Borel–Cantelli Lemma: If Σₙ P(Aₙ) < ∞, then P(lim sup Aₙ) = 0—almost surely only a finite number of events from the sequence will occur. If the events are independent and Σₙ P(Aₙ) = ∞, then P(lim sup Aₙ) = 1—almost surely infinitely many will occur. This result is used in proving the strong law of large numbers.
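Both halves of the lemma can be illustrated by simulation. A sketch, not a proof: the helper `occurrences` and the probability sequences 1/n² (convergent sum) and 1/n (divergent sum) are my own choices for illustration.

```python
import random

random.seed(0)

def occurrences(p_fun, n_max):
    """Indices n <= n_max at which independent events A_n with P(A_n) = p_fun(n) occur."""
    return [n for n in range(1, n_max + 1) if random.random() < p_fun(n)]

# Convergent case: sum 1/n^2 < inf, so a.s. only finitely many A_n occur
few = occurrences(lambda n: 1 / n**2, 100_000)

# Divergent case (independent events): sum 1/n = inf, so a.s. infinitely
# many occur; with a finite horizon we just see noticeably more of them
many = occurrences(lambda n: 1 / n, 100_000)

print(len(few), len(many))
```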

Models of Probability Spaces in Practice

In practice, the construction of the probability space is the first step in modeling. For a coin toss: Ω = {H, T}, F = {∅, {H}, {T}, Ω}, P({H}) = P({T}) = 1/2. For a die roll: Ω = {1,...,6}, F = 2^Ω (all 64 subsets), P is uniform. For continuous cases: Ω = ℝ, F = Borel σ-algebra B(ℝ), P is specified via the distribution function F(x) = P((–∞, x]).
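A finite probability space is easy to represent directly in code: a dictionary of elementary probabilities, with F implicitly the full power set 2^Ω. A minimal sketch (the helper name `prob` is my own):

```python
from fractions import Fraction

# The die-roll space: Omega = {1,...,6}, uniform measure
die = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

def prob(space, event):
    """P(event) as the sum of elementary probabilities of its outcomes."""
    return sum(space[w] for w in event)

even = {2, 4, 6}
print(prob(die, even))         # 1/2
assert sum(die.values()) == 1  # normalization: P(Omega) = 1
```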

The complexity of choosing a suitable F reflects the depth of the theory. For ℝ, one cannot take F = 2^ℝ—there exist non-measurable sets (the Vitali example) for which it is impossible to assign a probability correctly without violating additivity. The Borel σ-algebra B(ℝ) contains all "reasonable" subsets and is the standard choice.

Numerical Example: Tossing Three Coins

Problem: Three fair coins are tossed. Find P(exactly 2 heads).

Step 1: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}, |Ω| = 2³ = 8. All outcomes are equally likely.

Step 2: A = "exactly 2 heads" = {HHT, HTH, THH}, |A| = C(3,2) = 3.

Step 3: P(A) = |A|/|Ω| = 3/8 = 0.375.

Step 4: Verification via the Bernoulli formula: P(X=2) = C(3,2)·(1/2)²·(1/2)¹ = 3·(1/4)·(1/2) = 3/8. ✓ The σ-algebra F = 2^Ω here contains 2⁸ = 256 subsets, and the measure P is defined on each.
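The whole computation above can be reproduced by enumerating Ω:

```python
from itertools import product
from math import comb

omega = list(product("HT", repeat=3))        # all 8 outcomes, equally likely
A = [w for w in omega if w.count("H") == 2]  # "exactly 2 heads"

p = len(A) / len(omega)
print(len(omega), len(A), p)                 # 8 3 0.375

# Cross-check with the Bernoulli (binomial) formula
assert p == comb(3, 2) * 0.5**2 * 0.5**1
```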
