Continuous Distributions
Random Variables and Distributions
A continuous random variable has a probability density function f(x) ≥ 0 with ∫ f(x) dx = 1. The probability of falling into an interval is the integral of the density over that interval.
Normal Distribution
X ~ N(μ, σ²): f(x) = (1/√(2πσ²)) exp(-(x-μ)²/(2σ²)). E[X]=μ, Var[X]=σ². Symmetric, 68-95-99.7 rule (±1σ, ±2σ, ±3σ).
Standard normal: Z ~ N(0,1), with CDF Φ(z) = P(Z≤z). Then P(a<X<b) = Φ((b−μ)/σ) − Φ((a−μ)/σ).
Sum of normals: If X ~ N(μ₁,σ₁²) and Y ~ N(μ₂,σ₂²) are independent: X+Y ~ N(μ₁+μ₂, σ₁²+σ₂²).
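The standardization formula above can be checked directly with Python's standard library; a minimal sketch using `statistics.NormalDist`, here applied to the standard normal to recover the 68-95-99.7 probabilities:

```python
from statistics import NormalDist

# P(a < X < b) for X ~ N(mu, sigma^2), computed by standardizing:
# P(a < X < b) = Phi((b - mu)/sigma) - Phi((a - mu)/sigma)
def interval_prob(mu, sigma, a, b):
    phi = NormalDist().cdf  # standard normal CDF, Phi(z)
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# 68-95-99.7 rule: probability within +/-1, 2, 3 sigma of the mean
for k in (1, 2, 3):
    print(k, round(interval_prob(0, 1, -k, k), 4))
```

The same `interval_prob` works for any μ and σ, since standardization reduces every normal probability to Φ.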
Exponential and Gamma Distributions
Exponential: X ~ Exp(λ). f(x) = λe^{-λx} for x≥0. E[X]=1/λ, Var[X]=1/λ². Memoryless property: P(X>s+t|X>t) = P(X>s). Models: time between events in a Poisson process.
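The memoryless identity P(X>s+t|X>t) = P(X>s) follows from the exponential survival function P(X>x) = e^{−λx}; a quick numeric check, with λ, s, t chosen for illustration:

```python
import math

lam = 0.5          # rate (illustrative value)
s, t = 3.0, 2.0    # arbitrary positive times

def surv(x):
    """P(X > x) for X ~ Exp(lam)."""
    return math.exp(-lam * x)

# Memoryless: P(X > s+t | X > t) = P(X > s+t) / P(X > t) = P(X > s)
lhs = surv(s + t) / surv(t)
rhs = surv(s)
print(lhs, rhs)  # the two sides agree exactly
```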
Gamma distribution: X ~ Gamma(α,β). f(x) = β^α x^{α-1}e^{-βx}/Γ(α). E[X]=α/β, Var[X]=α/β². When α=1: Exp(β). Sum of n Exp(λ) ~ Gamma(n,λ).
Beta distribution: X ~ Beta(α,β) on [0,1]. f(x) = x^{α-1}(1-x)^{β-1}/B(α,β). Models probabilities and proportions. When α=β=1: U[0,1].
Lognormal and Heavy Tails
Lognormal: X ~ LN(μ,σ²): ln X ~ N(μ,σ²). f(x) = exp(-(ln x-μ)²/(2σ²))/(xσ√(2π)). E[X] = e^{μ+σ²/2}. Heavy right tail. Models: asset prices, incomes, file sizes.
Exercise: (a) X ~ N(100, 225). Find P(X>130) and P(75<X<115). (b) A processor handles a request in time T ~ Exp(0.5 ms⁻¹). Find P(T>3 ms) and the mean time. (c) For LN(5, 0.5²), compute E[X], the median, and the mode. Why do they differ?
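One way to check the exercise answers numerically, a sketch using only the standard library (the closed forms for the lognormal median e^μ and mode e^{μ−σ²} are standard results, stated here because the article gives only the mean):

```python
import math
from statistics import NormalDist

phi = NormalDist().cdf

# (a) X ~ N(100, 225), so sigma = 15
sigma = 15
p_gt_130 = 1 - phi((130 - 100) / sigma)                       # 1 - Phi(2)
p_75_115 = phi((115 - 100) / sigma) - phi((75 - 100) / sigma)

# (b) T ~ Exp(0.5 per ms): P(T > 3) = e^{-1.5}, mean = 1/0.5 = 2 ms
p_t = math.exp(-0.5 * 3)
mean_t = 1 / 0.5

# (c) LN(5, 0.5^2): mean e^{mu + s^2/2} > median e^mu > mode e^{mu - s^2},
# because the heavy right tail pulls the mean above the median
mu, s = 5.0, 0.5
ln_mean = math.exp(mu + s**2 / 2)
ln_median = math.exp(mu)
ln_mode = math.exp(mu - s**2)

print(round(p_gt_130, 4), round(p_75_115, 4), round(p_t, 4), mean_t)
print(round(ln_mean, 1), round(ln_median, 1), round(ln_mode, 1))
```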
The Memoryless Property and Its Uniqueness
Theorem: The exponential is the only continuous memoryless distribution. If P(X > s+t | X > t) = P(X > s) for all s,t > 0, then X ~ Exp(λ). This fundamental property makes the exponential distribution “standard” in queueing theory: elapsed waiting time does not affect the future.
Discrete analogue — geometric distribution (P(X > m+n | X > m) = P(X > n)). This explains why Poisson processes (continuous time, exponential inter-event intervals) and Markov chains (discrete time, geometric holding times) are mathematically close.
Sums and Transformations of Random Variables
Convolution: If X and Y are independent with densities fₓ, f_Y, then the density of X+Y is f_{X+Y}(z) = ∫ fₓ(x)·f_Y(z−x)dx. Sum of normals ~ normal (closure). Sum of independent gammas with the same rate β: Gamma(α₁,β) + Gamma(α₂,β) ~ Gamma(α₁+α₂,β).
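The convolution integral can be approximated on a grid and compared against a known closed form. A sketch for two Exp(λ) summands, whose sum is Gamma(2,λ) with density λ²z·e^{−λz} (parameters chosen for illustration):

```python
import math

lam, z = 1.0, 2.0   # rate and evaluation point (illustrative)

def f(x):
    """Exp(lam) density."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Riemann-sum convolution: f_{X+Y}(z) = integral of f(x) * f(z - x) dx
h = 1e-4
conv = sum(f(i * h) * f(z - i * h) for i in range(int(z / h))) * h

closed = lam**2 * z * math.exp(-lam * z)   # Gamma(2, lam) density at z
print(round(conv, 5), round(closed, 5))    # the two values agree
```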
Transformation of monotonic function: If Y = g(X), g strictly monotonic, then f_Y(y) = f_X(g⁻¹(y))·|d/dy g⁻¹(y)|. Example: if X ~ U[0,1], then Y = −ln(X)/λ ~ Exp(λ). This is used for generating random variables by the inverse transform sampling method.
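Inverse transform sampling for the exponential, exactly as in the example above: a seeded sketch that draws U ~ U[0,1] and maps it through −ln(U)/λ, then checks the sample mean against 1/λ.

```python
import math
import random

random.seed(0)
lam = 2.0
N = 100_000

# Inverse transform: if U ~ U(0,1], then -ln(U)/lam ~ Exp(lam).
# 1 - random.random() lies in (0, 1], which avoids log(0).
samples = [-math.log(1.0 - random.random()) / lam for _ in range(N)]

mean = sum(samples) / N
print(round(mean, 3))   # theory: E[X] = 1/lam = 0.5
```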
Box-Muller transform: From independent U₁, U₂ ~ U[0,1]: Z₁ = √(−2 ln U₁)·cos(2πU₂) ~ N(0,1), and replacing cos with sin gives an independent Z₂ ~ N(0,1). A standard way to generate normally distributed random variables on a computer.
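A seeded sketch of the basic (trigonometric) Box-Muller formulas, verified by checking that the generated sample has mean near 0 and variance near 1:

```python
import math
import random

random.seed(42)

def box_muller():
    """Two independent N(0,1) draws from two U(0,1) draws (basic Box-Muller)."""
    u1 = 1.0 - random.random()          # in (0, 1], avoids log(0)
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

N = 100_000
zs = [z for _ in range(N // 2) for z in box_muller()]
mean = sum(zs) / len(zs)
var = sum((z - mean) ** 2 for z in zs) / len(zs)
print(round(mean, 3), round(var, 3))   # theory: 0 and 1
```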
Applications of Continuous Distributions in Science and Engineering
Normal: Measurement errors (Gauss's law of errors), height and weight of people, thermal fluctuations in physics, residuals in linear regressions. The central limit theorem explains the universality of the normal.
Exponential: Lifetimes of radioactive atoms, service times in queues (M/M/1), time between earthquakes (approximately), reliability of electronic components (time to failure).
Gamma: Aggregated waiting time for k Poisson events, repair time, depletion rate of insurance reserves. For α=n/2, β=1/2: χ²(n) — chi-square distribution, fundamental in statistics.
Beta: Prior distributions for probabilities in Bayesian statistics. Market share, conversion proportion, probability of experiment success. Conjugate to binomial: if prior Beta(α,β) and observe k successes out of n, posterior Beta(α+k, β+n−k).
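The conjugate update stated above is a one-line computation; a sketch with an illustrative uniform prior and made-up data:

```python
# Conjugate update: prior Beta(a, b) plus k successes out of n
# gives posterior Beta(a + k, b + n - k)
def beta_update(a, b, k, n):
    return a + k, b + (n - k)

a, b = 1, 1              # Beta(1,1) = uniform prior on the success probability
k, n = 7, 10             # observed data (illustrative)
a_post, b_post = beta_update(a, b, k, n)
post_mean = a_post / (a_post + b_post)   # posterior mean (a+k)/(a+b+n)
print(a_post, b_post, round(post_mean, 3))   # Beta(8, 4), mean 8/12
```

Note how the posterior mean 8/12 sits between the prior mean 1/2 and the observed proportion 7/10, with the data dominating as n grows.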
Tail Probabilities and Extreme Value Distributions
Extreme Value Theory (EVT): Maximum of n i.i.d. samples as n → ∞ converges to one of three types: Gumbel distribution (light tails: normal, exponential), Fréchet (heavy tails: Pareto, Cauchy), Weibull (bounded tails: uniform, beta). Generalized Pareto distribution describes tails beyond high threshold — foundation of insurance and risk management.
Quantiles and percentiles: Quantile at level p: Q(p) = F⁻¹(p) = min{x: F(x)≥p}. Median = Q(0.5). Interquartile range IQR = Q(0.75) − Q(0.25) — robust measure of spread. For the normal, μ ± 1.96σ covers 95% of the probability (a 95% interval for a single observation).
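Both theoretical and empirical quantiles are available in the standard library; a sketch computing the 1.96 factor via the inverse normal CDF and the quartiles/IQR of a small illustrative sample:

```python
from statistics import NormalDist, quantiles

z = NormalDist()
# Theoretical N(0,1) quantiles: the median and the 1.96 of the 95% interval
print(round(z.inv_cdf(0.5), 4), round(z.inv_cdf(0.975), 4))

# Empirical quartiles and IQR of a small sample (illustrative data)
data = [2, 4, 4, 5, 7, 9, 11, 12, 15]
q1, q2, q3 = quantiles(data, n=4)   # default "exclusive" (n+1) method
print(q1, q2, q3, q3 - q1)          # Q1, median, Q3, IQR
```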
68-95-99.7 rule for normal: P(μ−σ < X < μ+σ) ≈ 0.6827; P(μ−2σ < X < μ+2σ) ≈ 0.9545; P(μ−3σ < X < μ+3σ) ≈ 0.9973. In engineering, “6-sigma” means p ≈ 3.4 defects per million — manufacturing quality standard.
Link Between Continuous Distributions and Real Data
Normality check: QQ-plot (Quantile-Quantile plot) compares sample quantiles with theoretical quantiles — a linear trend indicates conformity. Kolmogorov-Smirnov, Anderson-Darling, and Shapiro-Wilk tests check the normality hypothesis. Tails of real data are often heavier than normal (leptokurtosis, excess kurtosis > 0).
Reliability and survival: Reliability function R(t) = P(T>t) = 1−F(t). Failure rate h(t) = f(t)/R(t) — conditional failure density at time t given survival to t. For the exponential h(t) = λ = const (no aging). For Weibull W(α,β): h(t) = (α/β)(t/β)^{α−1} — for α>1 the rate increases (aging, wear-out), for α<1 it decreases (burn-in). Combining the regimes gives the “bathtub curve”: high failure rate at first, then roughly constant, then rising.
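The three hazard regimes can be seen directly from the Weibull formula; a sketch evaluating h(t) = (α/β)(t/β)^{α−1} for shape parameters below, at, and above 1 (scale β = 100 chosen for illustration):

```python
# Weibull hazard h(t) = (alpha/beta) * (t/beta)^(alpha - 1)
def weibull_hazard(t, alpha, beta):
    return (alpha / beta) * (t / beta) ** (alpha - 1)

cases = [(0.5, "burn-in, decreasing"),
         (1.0, "exponential, constant"),
         (2.0, "wear-out, increasing")]
for alpha, label in cases:
    rates = [round(weibull_hazard(t, alpha, 100.0), 4) for t in (10, 50, 200)]
    print(f"alpha={alpha}: {label}: {rates}")
```

For α = 1 the hazard collapses to the constant 1/β, recovering the memoryless exponential as a special case.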
Numerical Example: Normal and Exponential Distributions
Problem: (a) Height of students X ~ N(175, 100). Find P(160<X<190). (b) Device lifetime T ~ Exp(λ=0.01). Find P(T>100) and the median.
Step 1 (normal): Standardize: P(160<X<190) = P(−1.5<Z<1.5), since z₁=(160−175)/10=−1.5, z₂=(190−175)/10=1.5.
Step 2 (normal): P(−1.5<Z<1.5) = 2Φ(1.5)−1 = 2·0.9332−1 = 0.8664. About 87% of students have height between 160 and 190 cm.
Step 3 (exp): P(T>100) = e^{−0.01·100} = e^{−1} ≈ 0.368. Median: e^{−λm}=0.5 → m=ln(2)/0.01≈69.3.
Step 4 (Weibull comparison): For an aging device modeled by a Weibull hazard with shape 2, h(t) = 2λ²t grows linearly. At t=100: h(100) = 2·(0.01)²·100 = 0.02 — twice the constant exponential rate λ=0.01. Under the memoryless Exp(0.01) model, by Step 3 about 37% of devices survive 100 units.
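The four steps can be reproduced in a few lines; a sketch verifying each reported number:

```python
import math
from statistics import NormalDist

phi = NormalDist().cdf

# Steps 1-2: X ~ N(175, 100), sigma = 10, so P(160 < X < 190) = 2*Phi(1.5) - 1
p_height = phi((190 - 175) / 10) - phi((160 - 175) / 10)

# Step 3: T ~ Exp(0.01): survival e^{-lam*t} and median ln(2)/lam
p_surv = math.exp(-0.01 * 100)
median_t = math.log(2) / 0.01

# Step 4: linear Weibull hazard h(t) = 2*lam^2*t evaluated at t = 100
h_100 = 2 * 0.01**2 * 100

print(round(p_height, 4), round(p_surv, 3), round(median_t, 1), h_100)
```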