Cramer-Rao Bound and Asymptotic Efficiency

Asymptotic Statistics and Robustness

The Cramer-Rao bound establishes a lower limit on the variance of unbiased estimators. An asymptotically efficient estimator attains this bound as n → ∞.

Fisher Information

Definition: I(θ) = E[(∂ log f(X;θ)/∂θ)²] = -E[∂² log f(X;θ)/∂θ²]. Fisher information measures the "steepness" of the likelihood function—the higher it is, the more precisely θ can be estimated.

Additivity: For n i.i.d. observations: Iₙ(θ) = n·I(θ). Matrix form: I(θ)ᵢⱼ = -E[∂² log f/∂θᵢ∂θⱼ]—the Fisher information matrix.

Examples: N(μ,σ²) with σ² known: I(μ)=1/σ². Poisson(λ): I(λ)=1/λ. Bernoulli(p): I(p)=1/(p(1-p)). Exp(λ): I(λ)=1/λ².
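A quick Monte Carlo sanity check of these formulas (a minimal numpy sketch; λ = 3, the seed, and the sample size are arbitrary choices): the average squared score for Poisson(λ) should converge to I(λ) = 1/λ.

```python
import numpy as np

# Monte Carlo check: for Poisson(lam) the per-observation score is
# d/dlam log f(x; lam) = x/lam - 1, and I(lam) = E[score^2] = 1/lam.
rng = np.random.default_rng(0)
lam = 3.0
x = rng.poisson(lam, size=1_000_000)
score = x / lam - 1.0
print(np.mean(score**2))  # ~ 0.3333
print(1 / lam)            # exact: 1/lam
```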

Cramer-Rao Inequality

Formulation: For an unbiased estimator θ̂ under regularity conditions: Var[θ̂] ≥ 1/(n·I(θ)); this lower bound on the variance is denoted CRB = 1/(nI(θ)).

Condition for attaining the CRB: The bound is attained if and only if the score factorizes as ∂ log L/∂θ = c(θ)·(θ̂ − θ) for some function c(θ). For exponential families, the natural sufficient statistic attains the CRB (as an unbiased estimator of its own mean E_θ[T]). Example for Bernoulli: ∂ log L/∂p = n(x̄ − p)/(p(1−p)), so c(p) = n/(p(1−p)) and p̂ = x̄ attains the bound.

Matrix version: For a vector θ: Cov[θ̂] - I(θ)⁻¹ ≥ 0 (the difference is positive semi-definite). The variance of any linear combination aᵀθ̂ is not less than aᵀI(θ)⁻¹a.

Asymptotic Efficiency

Definition: The estimator θ̂ₙ is asymptotically efficient if √n(θ̂ₙ - θ₀) →_d N(0, I(θ₀)⁻¹). Asymptotic variance = CRB—you cannot do better as n→∞.

Le Cam's Theorem: The maximum likelihood estimator is asymptotically efficient within a wide class of regular models. "Super-efficiency" (Hodges): there exists an estimator better at one point, but worse elsewhere.

Relative Efficiency (ARE): ARE(θ̂₁, θ̂₂) = AsyVar(θ̂₂)/AsyVar(θ̂₁). ARE(median, mean) for N(0,1) = 2/π ≈ 0.637. For Laplace(0,b): ARE = 2—the median is more efficient with heavy tails.
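Asymptotic efficiency is easy to see numerically. A minimal numpy sketch (λ, n, and the replication count are arbitrary): for Poisson the MLE is X̄, so √n(X̄ − λ) should have variance close to 1/I(λ) = λ.

```python
import numpy as np

# Asymptotic efficiency of the Poisson MLE lam_hat = X_bar:
# sqrt(n) * (lam_hat - lam) should be approximately N(0, 1/I(lam)) = N(0, lam).
rng = np.random.default_rng(1)
lam, n, reps = 2.5, 500, 20_000
xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - lam)
print(z.var())  # ~ lam = 2.5, the CRB variance 1/I(lam)
```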

Exercise: (a) Exp(λ): CRB for estimating λ and 1/λ. Does X̄ attain the CRB for 1/λ? And does 1/X̄ for λ? (b) ARE(median, mean) for t(ν) with ν=3,5,10. For what ν is the median preferable? (c) Logistic regression with n=200: estimate I(β) via the Hessian and as 1/Var[β̂] from simulation—compare.
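A starter sketch for part (c), under assumed choices (an intercept-plus-one-covariate design, true β = (0.5, −1), a hand-rolled likelihood fit via scipy's BFGS): the Hessian-based information Xᵀdiag(p(1−p))X, inverted, should match the simulated covariance of β̂.

```python
import numpy as np
from scipy.optimize import minimize

# Starter sketch for (c): Fisher information for logistic regression via the
# analytic Hessian I(beta) = X^T diag(p(1-p)) X, versus the covariance of
# beta_hat from simulation. The design and true beta here are made up.
rng = np.random.default_rng(2)
n, beta_true = 200, np.array([0.5, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
p = 1 / (1 + np.exp(-X @ beta_true))

def neg_loglik(b, y):
    eta = X @ b
    return np.sum(np.logaddexp(0, eta) - y * eta)  # -loglik of logistic model

# Hessian-based information at beta_true and the implied CRB covariance
I_hess = X.T @ (X * (p * (1 - p))[:, None])
print(np.linalg.inv(I_hess))

# Covariance of the MLE over 2000 simulated datasets: should be close
betas = np.array([minimize(neg_loglik, np.zeros(2), args=(rng.binomial(1, p),),
                           method="BFGS").x for _ in range(2000)])
print(np.cov(betas.T))
```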

Relative Efficiency of Estimators

ARE (Asymptotic Relative Efficiency) of estimators T₁ and T₂: ARE(T₁,T₂) = lim_{n→∞} Var(T₂)/Var(T₁), the ratio of variances at the same n (equivalently, of the asymptotic variances nVar(T₂) and nVar(T₁)). ARE > 1 ⟹ T₁ is more efficient.

ARE of mean vs. median: For a symmetric distribution F with density f: ARE(median, mean) = 4f(0)²σ². For the normal: 2/π ≈ 0.637 (median loses 36% efficiency). For Laplace: 2 > 1 (median is twice as efficient as the mean!). For Cauchy: ARE = ∞ (the mean has infinite variance and is useless).
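Both values are easy to reproduce by simulation (a minimal numpy sketch; n and the replication count are arbitrary):

```python
import numpy as np

# Simulated ARE(median, mean) = Var(mean)/Var(median) at a fixed n.
# Theory: 2/pi ~ 0.637 for the normal, 2.0 for the Laplace.
rng = np.random.default_rng(3)
n, reps = 400, 20_000

for name, x in [("normal", rng.standard_normal((reps, n))),
                ("laplace", rng.laplace(size=(reps, n)))]:
    print(name, x.mean(axis=1).var() / np.median(x, axis=1).var())
```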

Convexity and Estimation in Exponential Families

For the exponential family f(x|θ) = h(x)exp{η(θ)T(x) − A(θ)}: the MLE θ̂ solves the moment equation E_θ̂[T(X)] = T̄. In the natural parameterization, Fisher information is I(η) = A''(η) (second derivative of the log-partition); matrix form: I(η) = Var_η[T(X)]. CRB attainability: an unbiased estimator attains the CRB exactly when it is an affine function of the natural sufficient statistic T (exponential family!).

Example for Bernoulli(p): natural parameter η = ln(p/(1−p)), A(η) = ln(1+e^η), A''(η) = p(1−p) = I(η). Transforming back to p: I(p) = (dη/dp)²·I(η) = 1/(p(1−p)). CRB for p: 1/(nI(p)) = p(1−p)/n. X̄ has Var = p(1−p)/n = CRB, so the bound is attained!
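A numeric check of I(η) = A''(η) by central differences (a minimal sketch; η = 0.7 and the step size are arbitrary choices):

```python
import numpy as np

# Check I(eta) = A''(eta) for Bernoulli in the natural parameterization:
# A(eta) = log(1 + e^eta), so A''(eta) should equal p(1-p).
def A(eta):
    return np.logaddexp(0.0, eta)

eta, h = 0.7, 1e-4
p = 1 / (1 + np.exp(-eta))
A2 = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h**2  # central 2nd difference
print(A2, p * (1 - p))  # both ~ 0.2217
```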

Method of Moments Estimators and Their Efficiency

Method of moments (MoM): Equate the theoretical moments μⱼ(θ) = E_θ[Xʲ] to the sample moments and solve for θ. Easier to compute than the MLE, but usually less efficient. Generalized method of moments (GMM): minimize ||n⁻¹Σg(Xᵢ,θ)||²_{W⁻¹} over θ; the optimal weight W = Var[g(X,θ)] makes GMM asymptotically efficient within the class of moment estimators.
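As an illustration of plain MoM (not GMM), a minimal numpy sketch for Gamma(shape a, rate b), matching the first two moments; the parameter values are made up:

```python
import numpy as np

# Method-of-moments sketch for Gamma(shape=a, rate=b):
# E[X] = a/b and Var[X] = a/b^2  =>  a_hat = xbar^2/s2, b_hat = xbar/s2.
rng = np.random.default_rng(4)
a, b = 2.0, 3.0
x = rng.gamma(shape=a, scale=1 / b, size=100_000)
xbar, s2 = x.mean(), x.var()
print(xbar**2 / s2, xbar / s2)  # ~ (2.0, 3.0); the MLE is more efficient
```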

Hausman Test: Efficiency vs. Consistency

Hausman Test: Compare two estimators: θ̂₁ (efficient under the model assumptions, but inconsistent when they fail) and θ̂₂ (consistent, but inefficient). Statistic: H = (θ̂₂−θ̂₁)ᵀ(Var(θ̂₂)−Var(θ̂₁))⁻¹(θ̂₂−θ̂₁) ~ χ²(k) under H₀. If H is large → θ̂₁ is inconsistent (assumptions violated). Used in econometrics: tests of endogeneity (OLS vs. IV) and of random vs. fixed effects.
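The statistic itself is a one-liner. A minimal scalar sketch (the estimates and variances are made-up numbers; in practice θ̂₁ and θ̂₂ would come from, e.g., OLS and IV fits):

```python
import numpy as np
from scipy import stats

# Scalar Hausman statistic: H = (b2 - b1)^2 / (V2 - V1) ~ chi2(1) under H0.
# The estimates and variances below are made-up numbers for illustration.
b1, V1 = 1.02, 0.0010   # efficient under H0 (e.g. OLS)
b2, V2 = 1.15, 0.0045   # consistent even under H1 (e.g. IV)
H = (b2 - b1) ** 2 / (V2 - V1)
print(H, stats.chi2.sf(H, df=1))  # H ~ 4.83, p ~ 0.028 => reject H0
```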

Instrumental Variables and Two-Stage Least Squares

Endogeneity problem: Cov(X,ε) ≠ 0 → OLS is biased. Instrumental variable Z: Cov(Z,X) ≠ 0 (relevant), Cov(Z,ε) = 0 (exogenous). 2SLS: 1st stage: regress X on Z → X̂. 2nd stage: regress Y on X̂. The estimator is consistent provided the instrument is strong (1st-stage F-stat > 10 is a practical rule). With weak instruments, 2SLS is heavily biased toward OLS.
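A minimal simulation sketch of endogeneity and the 2SLS fix (numpy; the data-generating process and all coefficients are made up, and the intercept is omitted for brevity):

```python
import numpy as np

# 2SLS simulation: an endogenous regressor biases OLS; an exogenous, relevant
# instrument z repairs it. True beta = 1; the whole DGP is made up.
rng = np.random.default_rng(5)
n, beta = 100_000, 1.0
u = rng.normal(size=n)                 # unobserved confounder
z = rng.normal(size=n)                 # instrument: relevant and exogenous
x = z + u + rng.normal(size=n)         # Cov(x, eps) != 0 via u
y = beta * x + u + rng.normal(size=n)  # error term contains u

ols = (x @ y) / (x @ x)                # biased: ~ 1.33
xhat = z * (z @ x) / (z @ z)           # 1st stage: fitted values of x on z
tsls = (xhat @ y) / (xhat @ xhat)      # 2nd stage: ~ 1.0 (consistent)
print(ols, tsls)
```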

Bennett and Bernstein Exponential Inequalities

For independent Xᵢ with |Xᵢ − EXᵢ| ≤ M and Var(Xᵢ) = σᵢ²: Bennett inequality (with σ² = Σσᵢ²): P(Σ(Xᵢ − EXᵢ) ≥ t) ≤ exp(−(σ²/M²)·h(tM/σ²)), h(u) = (1+u)ln(1+u)−u. Bernstein inequality (with σ² the average variance): P(|X̄ − μ| ≥ t) ≤ 2exp(−nt²/(2(σ² + Mt/3))). For small t this gives a Gaussian tail bound; for large t, an exponential one. Standard in ML theory for bounding rates of convergence of the empirical risk.
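A quick check that the Bernstein bound dominates the empirical tail (a minimal numpy sketch; Uniform(−1,1) variables, so M = 1 and σ² = 1/3; n and t are arbitrary):

```python
import numpy as np

# Bernstein bound vs. the empirical tail of X_bar for bounded variables.
# X_i ~ Uniform(-1, 1): M = 1, sigma^2 = 1/3; n and t are arbitrary choices.
rng = np.random.default_rng(6)
n, reps, t = 200, 50_000, 0.1
M, sigma2 = 1.0, 1.0 / 3.0

xbar = rng.uniform(-1, 1, size=(reps, n)).mean(axis=1)
empirical = np.mean(np.abs(xbar) >= t)
bound = 2 * np.exp(-n * t**2 / (2 * (sigma2 + M * t / 3)))
print(empirical, bound)  # ~ 0.014 <= ~ 0.131: the bound holds
```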

Super-efficiency and Le Cam's Theorem

Super-efficient estimation: An estimator θ̂ is super-efficient at θ₀ if its asymptotic variance there falls below the CRB, i.e. n·E[(θ̂ − θ₀)²] → v(θ₀) < I(θ₀)⁻¹. Example (Hodges): θ̂ = 0 if |X̄| < n^{−1/4}, and θ̂ = X̄ otherwise. At θ₀ = 0 this beats the CRB! Le Cam's theorem: super-efficiency is possible only on a set of measure zero: you cannot be better than the MLE everywhere simultaneously.
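Hodges' estimator is easy to simulate (a minimal numpy sketch; n, the replication count, and the θ values probed are arbitrary). Comparing n·MSE with the CRB scale 1/I(θ) = 1 for N(θ,1):

```python
import numpy as np

# Hodges' super-efficient estimator for the mean of N(theta, 1):
# theta_hat = 0 if |X_bar| < n^(-1/4), else X_bar.  CRB scale: n*MSE >= 1.
rng = np.random.default_rng(7)
n, reps = 400, 20_000

def n_mse(theta):
    xbar = rng.normal(theta, 1, size=(reps, n)).mean(axis=1)
    est = np.where(np.abs(xbar) < n ** -0.25, 0.0, xbar)
    return n * np.mean((est - theta) ** 2)

print(n_mse(0.0))  # ~ 0: beats the CRB at theta0 = 0
print(n_mse(0.1))  # ~ 4: much worse than the CRB near theta0
print(n_mse(1.0))  # ~ 1: matches the MLE far from theta0
```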

Numerical Example: Fisher Information and the Cramer-Rao Bound

Problem: X₁,...,X₁₀₀ ~ Bernoulli(p). Find I(p), lower bound Var[p̂] at p=0.4. Compare with Var[X̄].

Step 1: ℓ(p;x)=x·ln(p)+(1−x)·ln(1−p). ∂ℓ/∂p=x/p−(1−x)/(1−p). Fisher information for one observation: I(p)=E[(∂ℓ/∂p)²].

Step 2: E[(x/p−(1−x)/(1−p))²]=p/p²+(1−p)/(1−p)²=1/p+1/(1−p)=1/(p(1−p)).

Step 3: At p=0.4: I(0.4)=1/(0.4·0.6)=1/0.24≈4.167. CR bound for n=100: Var[p̂]≥1/(n·I(p))=1/(100·4.167)=0.0024. SE≥√0.0024≈0.049.

Step 4: MLE p̂=x̄: Var[p̂]=p(1−p)/n=0.24/100=0.0024—exactly equal to the lower bound! The MLE is efficient (attains the Cramer-Rao bound) for one-dimensional exponential families. For comparison, in the normal case the sample median has asymptotic variance 1/(4nf(μ)²) = πσ²/(2n), i.e. π/2 ≈ 1.57 times the variance of the mean: the median is 1.57 times worse for normal data.
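The worked example can be verified in a few lines (a minimal numpy sketch; seed and replication count are arbitrary):

```python
import numpy as np

# Verifying the worked example: Bernoulli(p = 0.4), n = 100.
p, n = 0.4, 100
I = 1 / (p * (1 - p))
crb = 1 / (n * I)
print(I, crb, crb**0.5)  # 4.1667, 0.0024, 0.049

rng = np.random.default_rng(8)
phat = rng.binomial(n, p, size=200_000) / n
print(phat.var())  # ~ 0.0024 = CRB: X_bar attains the bound
```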
