Discrete Distributions

Random Variables and Distributions

A discrete random variable takes on a countable number of values. It is fully described by its probability mass function (PMF)—the probabilities of each value.

Bernoulli, Binomial, Poisson, and Geometric Distributions

Bernoulli: X ~ Bernoulli(p). P(X=1)=p, P(X=0)=1-p. E[X]=p, Var[X]=p(1-p).

Binomial: X ~ Bin(n,p). X = number of successes in n independent Bernoulli trials. P(X=k) = C(n,k)p^k(1-p)^{n-k}. E[X]=np, Var[X]=np(1-p). Generating function: G(z) = (1-p+pz)^n.

Poisson: X ~ Poisson(λ). P(X=k) = e^{-λ}λ^k/k!, k=0,1,2,... E[X] = Var[X] = λ. Limit of Bin(n,p) as n→∞, p→0, np→λ.

Geometric: X ~ Geom(p). X = number of trials until the first success. P(X=k) = (1-p)^{k-1}p, k=1,2,... E[X]=1/p, Var[X]=(1-p)/p². Memoryless property: P(X>m+n|X>m) = P(X>n).
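The four PMFs above can be sketched with Python's standard library alone; the function names here are illustrative, not from any package:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    # P(X=k) = C(n,k) p^k (1-p)^(n-k); Bernoulli(p) is the n=1 case
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # P(X=k) = e^{-lam} lam^k / k!
    return exp(-lam) * lam**k / factorial(k)

def geometric_pmf(k, p):
    # P(X=k) = (1-p)^(k-1) p, for k = 1, 2, ...
    return (1 - p)**(k - 1) * p

# The moments match the closed forms: E[Bin(n,p)] = np, Var = np(1-p)
n, p = 20, 0.3
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var = sum(k**2 * binomial_pmf(k, n, p) for k in range(n + 1)) - mean**2
print(mean, var)   # ≈ 6.0 and 4.2
```

Summing k·P(X=k) over the support recovers E[X] directly from the PMF, which is a useful sanity check when deriving moments by hand.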

Negative Binomial and Hypergeometric

Negative Binomial: X = number of failures before the r-th success. P(X=k) = C(r+k-1,k)p^r(1-p)^k, k=0,1,2,... E[X]=r(1-p)/p.

Hypergeometric: Sample of size n from N (K “special” items). X = number of specials in sample. P(X=k) = C(K,k)C(N-K,n-k)/C(N,n). As N→∞, K/N→p → Bin(n,p).
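The binomial limit of the hypergeometric can be checked numerically; a minimal sketch with hypothetical helper names:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    # Sampling without replacement: C(K,k) C(N-K,n-k) / C(N,n)
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Grow N while holding K/N = p fixed: the pmf approaches Bin(n, p)
n, p, k = 10, 0.2, 3
for N in (50, 500, 5000):
    K = int(p * N)
    print(N, hypergeom_pmf(k, N, K, n), binom_pmf(k, n, p))
```

The gap shrinks roughly like n/N: once the population is much larger than the sample, replacement stops mattering.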

Exercise: (a) Binomial Bin(20, 0.3): compute P(X ≤ 5), P(X = 6), E[X], Var[X]. Approximate with Poisson. (b) Number of “clicks” on an ad ~ Poisson(2). P(≥3 clicks)? What is P(first click on 4th impression)—geometric?
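One way to check the answers to parts (a) and (b) is a short stdlib script (note that for part (a), p = 0.3 is not small, so the Poisson approximation with λ = np = 6 is rough by design):

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# (a) Bin(20, 0.3)
n, p = 20, 0.3
p_le5 = sum(binom_pmf(k, n, p) for k in range(6))   # P(X <= 5)
p_eq6 = binom_pmf(6, n, p)                          # P(X = 6)
mean, var = n * p, n * p * (1 - p)                  # 6.0, 4.2
p_eq6_pois = poisson_pmf(6, n * p)                  # Poisson approximation

# (b) clicks ~ Poisson(2): P(X >= 3) = 1 - P(0) - P(1) - P(2)
p_ge3 = 1 - sum(poisson_pmf(k, 2.0) for k in range(3))
print(p_le5, p_eq6, p_eq6_pois, p_ge3)
```

For the geometric part of (b), P(first click on the 4th impression) = (1−p)³p once a per-impression click probability p is specified.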

Cumulative Distribution Function and Generating Functions

For a discrete random variable, the cumulative distribution function (CDF) is important: F(x) = P(X ≤ x) = Σ_{k≤x} P(X=k). It is non-decreasing, right-continuous, F(−∞) = 0, F(+∞) = 1. For discrete distributions, the CDF is a step function with jumps at points where P(X=k) > 0.

Probability generating function G(z) = E[z^X] = Σ_{k=0}^∞ P(X=k)z^k is convenient for finding probabilities and moments. P(X=k) = G^{(k)}(0)/k!, E[X] = G'(1), E[X(X−1)] = G''(1). For the sum of independent X, Y: G_{X+Y}(z) = G_X(z)·G_Y(z).
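The PGF identities are easy to verify numerically for Bin(n, p), where G(z) = (1−p+pz)^n; a small sketch:

```python
n, p = 8, 0.4

def G(z):
    # PGF of Bin(n, p)
    return (1 - p + p * z)**n

# E[X] = G'(1), estimated here by a central difference
h = 1e-6
deriv = (G(1 + h) - G(1 - h)) / (2 * h)
print(deriv, n * p)   # both ≈ 3.2

# Independent sum: G_{X+Y}(z) = G_X(z) G_Y(z).
# Two independent Bin(n, p) sum to Bin(2n, p), so the PGFs agree:
z = 0.7
print(G(z) * G(z), (1 - p + p * z)**(2 * n))
```

The product rule for PGFs is what makes them convenient for sums: convolving PMFs becomes multiplying polynomials.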

Tail characteristics: The Poisson distribution has “light tails”—P(X ≥ k) decreases exponentially for large k. Negative binomial has heavier tails. This is important when modeling rare catastrophic events.

Approximations and Limit Theorems for Discrete Distributions

Poisson approximation: Bin(n, p) as n → ∞, p → 0, np → λ: P(Bin(n,p) = k) → P(Poisson(λ) = k). Le Cam's inequality bounds the total error: Σ_k |P(Bin(n,p)=k) − P(Poisson(np)=k)| ≤ 2np². A common rule of thumb: the approximation is good when n ≥ 100 and p ≤ 0.01.

Normal approximation to the binomial (CLT): when np ≥ 5 and n(1−p) ≥ 5 (a common rule of thumb), Bin(n,p) ≈ N(np, np(1−p)). Continuity correction: P(X ≤ k) ≈ Φ((k+0.5−np)/√(np(1−p))). The correction matters most when calculating tail probabilities.
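The effect of the continuity correction can be seen directly by comparing the exact binomial CDF with the normal approximation, with and without the +0.5 shift:

```python
from math import comb, sqrt, erf

def binom_cdf(k, n, p):
    # Exact P(X <= k) by summing the pmf
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def phi(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p, k = 100, 0.5, 45
mu, sigma = n * p, sqrt(n * p * (1 - p))
exact = binom_cdf(k, n, p)
approx = phi((k + 0.5 - mu) / sigma)   # with continuity correction
approx_nc = phi((k - mu) / sigma)      # without
print(exact, approx, approx_nc)
```

With the correction the approximation lands within a fraction of a percent of the exact value; without it, the error is several times larger.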

Real-World Applications of Discrete Distributions

Binomial: Quality control—the number of defective items in a batch. A/B testing—the number of conversions for given traffic. Genetics—the number of alleles of a specific type (Mendel’s law).

Poisson: Phone calls to a call center per hour. Radioactive decay—number of particles per second. DNA mutations per 1 million base pairs. Number of server failures per day. In all cases: rare events in a large number of independent trials.

Geometric: Number of attempts before the first successful sale (cold calling). Lifetime of a part until the first failure in discrete time. The memoryless property (P(X>m+n|X>m) = P(X>n)) makes the geometric distribution the discrete analog of the exponential.

Hypergeometric: Lottery (how many “winning” numbers are chosen) is sampling without replacement from a finite population. As N → ∞ with K/N → p, it converges to the binomial. Population size estimation: the capture-recapture method uses the hypergeometric distribution.

Variance and Relationship Between Moments

The formula Var[X] = E[X²] − (E[X])² is useful for analytical calculations: there is no need to center X before squaring. For Poisson(λ): E[X²] = E[X(X−1)] + E[X] = λ² + λ, Var[X] = λ—a remarkable equality of mean and variance, used to check for Poisson character in data (dispersion/mean ratio test, or “index of dispersion”).

Chebyshev’s inequality for discrete distributions: P(|X−μ| ≥ kσ) ≤ 1/k². For Bin(100, 0.5): μ=50, σ=5. P(|X−50| ≥ 15) ≤ 1/9 ≈ 0.111. Exact value (via binomial CDF) ≈ 0.003. Chebyshev is very conservative, but works without knowing the exact distribution.
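The Bin(100, 0.5) comparison above can be reproduced exactly by summing the two tails of the PMF:

```python
from math import comb, sqrt

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 50, 5

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exact two-sided tail: P(|X - 50| >= 15) = P(X <= 35) + P(X >= 65)
exact = (sum(binom_pmf(k) for k in range(0, 36))
         + sum(binom_pmf(k) for k in range(65, 101)))
chebyshev = 1 / 3**2   # k = 3 standard deviations
print(exact, chebyshev)   # exact ≈ 0.003, bound ≈ 0.111
```

The roughly 30-fold gap illustrates the trade-off: Chebyshev assumes nothing beyond finite variance, so its bound must hold even for worst-case distributions.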

Relationship between distributions: Negative binomial = mixture of Poisson distributions with gamma-distributed rate parameter. As r → ∞, r(1−p) → λ: NB(r,p) → Poisson(λ). This links discrete distributions with continuous ones hierarchically—a foundation for hierarchical Bayesian models.
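The NB → Poisson limit can be checked numerically: hold r(1−p) = λ fixed (so p = 1 − λ/r) and let r grow. A sketch, with the NB parameterized as failures before the r-th success as above:

```python
from math import comb, exp, factorial

def nb_pmf(k, r, p):
    # P(X=k) = C(r+k-1, k) p^r (1-p)^k, failures before the r-th success
    return comb(r + k - 1, k) * p**r * (1 - p)**k

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam, k = 3.0, 2
for r in (10, 100, 1000):
    p = 1 - lam / r   # keeps r(1-p) = lam
    print(r, nb_pmf(k, r, p), poisson_pmf(k, lam))
```

By r = 1000 the two PMFs agree to about three decimal places, consistent with the stated limit.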

Methods for Estimating Distribution Parameters

Maximum likelihood estimation (MLE): θ̂ = argmax L(θ|x₁,...,xₙ) = argmax Σ ln f(xᵢ|θ). Properties: consistency, asymptotic normality (√n(θ̂−θ) → N(0, I(θ)⁻¹)), invariance. Cramér-Rao bound: Var(θ̂) ≥ 1/(n·I(θ)), where I(θ) = E[(∂ ln f/∂θ)²] is the Fisher information.

Method of moments: Set theoretical moments equal to sample moments. Less efficient than MLE, but often analytically simpler. Bayesian estimate: θ̂ = E[θ|data] from the posterior distribution. For conjugate priors it is calculated analytically.

For Poisson(λ): the MLE is λ̂ = x̄ (the sample mean), which is also the minimum-variance unbiased estimator (MVUE, by the Lehmann-Scheffé theorem applied to the complete sufficient statistic Σxᵢ).

Numerical Example: Poisson Distribution

Problem: The call center receives λ=4 calls per minute. Find P(X=6) and P(X≥1).

Step 1: Poisson formula: P(X=k) = e^{−λ}·λᵏ/k!. For λ=4: e^{−4} ≈ 0.01832.

Step 2: P(X=6) = 0.01832·4⁶/6! = 0.01832·4096/720 = 0.01832·5.689 ≈ 0.1042. Most probable are X=3 and X=4.

Step 3: P(X≥1) = 1−P(X=0) = 1−e^{−4}·1 = 1−0.01832 ≈ 0.9817.

Step 4: MLE on the sample x={3,5,4,6}: λ̂ = x̄ = (3+5+4+6)/4 = 4.5. Method of moments: E[X]=λ, so λ̂=x̄=4.5. Both yield the same result: for the Poisson distribution the sample mean is simultaneously the first moment and the MLE, so the two methods coincide.
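Steps 1 through 4 can be reproduced in a few lines:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(X=k) = e^{-lam} lam^k / k!
    return exp(-lam) * lam**k / factorial(k)

lam = 4.0
p6 = poisson_pmf(6, lam)          # Step 2: ≈ 0.1042
p_ge1 = 1 - poisson_pmf(0, lam)   # Step 3: ≈ 0.9817

# Step 4: MLE (= method of moments) on the sample
xs = [3, 5, 4, 6]
lam_hat = sum(xs) / len(xs)       # 4.5
print(p6, p_ge1, lam_hat)
```

This also makes it easy to verify the mode claim in Step 2: poisson_pmf(3, 4.0) and poisson_pmf(4, 4.0) are equal, as expected for integer λ.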
