
Sufficient Statistics and the Rao-Blackwell Theorem

Sample Statistics and Estimation


Sufficient statistics compress all the information the sample carries about a parameter. The Rao-Blackwell theorem improves any unbiased estimator by averaging it conditionally on a sufficient statistic.

Sufficient Statistics

Definition (Fisher, 1922): A statistic T(X) is sufficient for θ if the conditional distribution of the sample given T is independent of θ. Intuition: T contains everything the sample knows about θ.

Factorization Criterion (Neyman-Fisher): T is sufficient if and only if the likelihood factorizes: $L(\theta; x) = g(T(x); \theta) \cdot h(x)$. The part depending on θ enters only via T(x).

Examples: Poisson(λ): $T = \sum X_i$. Bernoulli(p): $T = \sum X_i$. N(μ, known σ²): $T = \bar{X} = \sum X_i / n$. N(μ, unknown σ²): $T = (\sum X_i, \sum X_i^2)$ — a two-dimensional sufficient statistic.
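A quick way to see sufficiency concretely: for a Poisson sample, the conditional distribution of $X_1$ given $T = t$ is Binomial(t, 1/n) for every λ. The following minimal simulation sketch (using numpy/scipy; the parameter values are arbitrary) checks this by conditioning on $T = t$ empirically:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)

def conditional_dist_of_x1(lam, n=5, t=10, reps=200_000):
    """Empirical distribution of X_1 among samples with sum(X) == t."""
    x = rng.poisson(lam, size=(reps, n))
    hits = x[x.sum(axis=1) == t]               # condition on T = t
    return np.bincount(hits[:, 0], minlength=t + 1) / len(hits)

# The conditional law is the same for very different lambdas: T is sufficient.
print(conditional_dist_of_x1(1.0)[:4])
print(conditional_dist_of_x1(3.0)[:4])
print(binom.pmf(range(4), 10, 1 / 5))          # theoretical Binomial(t, 1/n)
```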

Minimal sufficient statistic: T is minimal sufficient if it is a function of every other sufficient statistic. For exponential families, the natural sufficient statistic is minimal.

Rao-Blackwell Theorem

Statement: Let $\tilde{\theta}$ be an unbiased estimator of θ, T a sufficient statistic. Define $\hat{\theta} = E[\tilde{\theta} | T]$. Then: (1) $\hat{\theta}$ is unbiased; (2) $Var[\hat{\theta}] \leq Var[\tilde{\theta}]$ for all θ. Conditional averaging over the sufficient statistic does not worsen the variance.

Proof: $E[\hat{\theta}] = E[E[\tilde{\theta}|T]] = E[\tilde{\theta}] = \theta$. By the law of total variance: $Var[\tilde{\theta}] = E[Var[\tilde{\theta}|T]] + Var[E[\tilde{\theta}|T]] = E[Var[\tilde{\theta}|T]] + Var[\hat{\theta}] \geq Var[\hat{\theta}]$.
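The variance reduction is easy to observe by simulation. A minimal sketch for Exp(λ), estimating the mean 1/λ: the crude estimator $X_1$ is unbiased, and by symmetry $E[X_1 | T] = T/n = \bar{X}$ (numpy; parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 2.0, 10, 200_000          # true mean is 1/lam = 0.5

x = rng.exponential(scale=1 / lam, size=(reps, n))
crude = x[:, 0]                          # unbiased but wasteful: uses X_1 only
rb = x.mean(axis=1)                      # E[X_1 | T] = T/n, the Rao-Blackwellized version

print("means:    ", crude.mean(), rb.mean())   # both ~ 0.5: unbiasedness preserved
print("variances:", crude.var(), rb.var())     # second is ~ n times smaller
```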

UMVU Estimators (UMVUE)

Definition: The uniformly minimum variance unbiased estimator (UMVUE) is the unbiased estimator with the smallest possible variance for all θ simultaneously.

Lehmann–Scheffé Theorem: If T is a complete sufficient statistic and $g(T)$ is unbiased for θ, then $g(T)$ is the UMVUE (and it is essentially unique).

Examples: Poisson(λ): $T = \sum X_i$ is complete and sufficient, and $\bar{X} = T/n$ is the UMVUE for λ; the UMVUE for $e^{-\lambda} = P(X=0)$ is $((n-1)/n)^T$. Bernoulli(p): $\bar{X} = T/n$ is the UMVUE for p, and $T(T-1)/(n(n-1))$ is the UMVUE for $p^2$.
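Since T has a known distribution in both models, these unbiasedness claims can be checked exactly by summing over the pmf. A small sketch (numpy/scipy; parameter values are arbitrary, and the Poisson support is truncated):

```python
import numpy as np
from scipy.stats import binom, poisson

n, p = 7, 0.3
t = np.arange(n + 1)
w = binom.pmf(t, n, p)                                  # T ~ Binomial(n, p)
print((w * t * (t - 1) / (n * (n - 1))).sum(), p ** 2)  # E[UMVUE] = p^2

lam, n = 1.7, 5
t = np.arange(200)                                      # truncated Poisson support
w = poisson.pmf(t, n * lam)                             # T ~ Poisson(n * lam)
print((w * ((n - 1) / n) ** t).sum(), np.exp(-lam))     # E[UMVUE] = e^{-lambda}
```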

Cramér–Rao Bound for UMVUE: In exponential families, the UMVUE of the mean of the natural sufficient statistic attains the Cramér–Rao lower bound. For other parameters, and outside this family, a UMVUE may exist without attaining the bound.

Exercise: (a) For a sample from Exp(λ), verify the factorization criterion for $T = \sum X_i$ and find the UMVUE for $1/\lambda$. (b) For Bernoulli(p), derive the UMVUE for $p(1-p)$ via the Rao-Blackwell theorem, starting from the estimator $X_1(1 - X_2)$. (c) For N(μ, σ²), prove that $(\bar{X}, S^2)$ is a complete sufficient statistic. What is the UMVUE for $P(X > c) = 1 - \Phi((c - \mu)/\sigma)$?

Algorithm for Applying the Rao-Blackwell Theorem

To find the UMVUE: (1) Find any unbiased estimator $\delta(X)$. (2) Find a complete sufficient statistic T. (3) Improve the estimator: $\delta^*(X) = E[\delta(X)|T(X)]$. The result $\delta^*$ is the UMVUE by the Lehmann–Scheffé theorem.

Example for Poisson(λ): The estimator $I(X_1=0)$ is unbiased for $P(X=0)$, since $E[I(X_1=0)] = e^{-\lambda}$. Sufficient statistic: $T = \sum X_i \sim Poisson(n\lambda)$. Then $E[I(X_1=0)|T=t] = ((n-1)/n)^t = (1-1/n)^t$, so $(1-1/n)^T$ is the UMVUE for $e^{-\lambda}$. Unlike the plug-in MLE $e^{-\bar{X}}$, it is exactly unbiased for finite n; the sketch below compares the two numerically.
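A minimal Monte Carlo sketch of this comparison (numpy; λ and n are arbitrary choices): the UMVUE sits on the target on average, while $e^{-\bar{X}}$ is visibly biased.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 1.5, 8, 200_000
target = np.exp(-lam)

x = rng.poisson(lam, size=(reps, n))
T = x.sum(axis=1)
umvue = (1 - 1 / n) ** T                 # Rao-Blackwellized estimator
mle = np.exp(-x.mean(axis=1))            # plug-in e^{-xbar}, biased for finite n

print("target:", target)
print("UMVUE mean/var:", umvue.mean(), umvue.var())
print("MLE   mean/var:", mle.mean(), mle.var())
```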

Optimality and Lower Bounds for Variance

Cramér–Rao Bound (information inequality): $Var_\theta(\hat{\theta}) \geq 1/(n \cdot I(\theta))$ for unbiased $\hat{\theta}$. Here $I(\theta) = E[(\partial \ln f/\partial \theta)^2] = -E[\partial^2 \ln f/\partial \theta^2]$ is the Fisher information of one observation. The bound is attained if and only if the model is an exponential family and $\hat{\theta}$ is a linear function of its natural sufficient statistic.
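For Poisson(λ), $I(\lambda) = 1/\lambda$, so the bound for unbiased estimators of λ is λ/n, and $\bar{X}$ attains it since $Var[\bar{X}] = \lambda/n$. A minimal numeric check (arbitrary parameter values):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 2.0, 10, 200_000

crb = lam / n                            # 1 / (n * I(lambda)) with I(lambda) = 1/lambda
xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
print("CRB:", crb, " Var[xbar]:", xbar.var())   # xbar attains the bound
```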

Hammersley–Chapman–Robbins Bound: a generalization that requires no differentiability of the likelihood in θ and therefore applies to non-regular models; it is always at least as tight as the Cramér–Rao bound. For a vector parameter, the Fisher information is the matrix $I(\theta)$, and $Var[\hat{\theta}] \geq I(\theta)^{-1}$ in the sense of positive semidefiniteness.
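A numeric sketch of the Hammersley–Chapman–Robbins bound for the Bernoulli model, maximizing over a grid of offsets h (the helper `hcr_bound` and all parameter values are illustrative assumptions). For this smooth family the supremum is approached as h → 0 and recovers the Cramér–Rao bound; in non-regular models the HCR bound can be strictly larger.

```python
import numpy as np

def hcr_bound(p, n):
    """Hammersley-Chapman-Robbins bound for unbiased estimators of p,
    X_1..X_n iid Bernoulli(p); no differentiability in p is required."""
    h = np.linspace(-p + 1e-6, 1 - p - 1e-6, 100_001)
    h = h[np.abs(h) > 1e-4]                                  # exclude h = 0
    chi2 = (p + h) ** 2 / p + (1 - p - h) ** 2 / (1 - p)     # E[(f_{p+h}/f_p)^2], one obs
    return np.max(h ** 2 / (chi2 ** n - 1))                  # iid: chi2 term to the n-th power

p, n = 0.3, 10
print("HCR bound :", hcr_bound(p, n))
print("Cramer-Rao:", p * (1 - p) / n)    # the two agree here in the h -> 0 limit
```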

Robust Estimation Methods

M-estimates: $\hat{\theta} = \arg\min_\theta \sum \rho(x_i - \theta)$, where $\rho$ is a loss function. Quadratic loss $\rho(u) = u^2$ yields the mean; absolute loss $\rho(u) = |u|$ yields the median. Huber loss, $\rho(u) = u^2/2$ for $|u| \leq c$ and $c|u| - c^2/2$ for $|u| > c$, is a compromise between robustness and efficiency.
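A minimal sketch comparing the three M-estimates on contaminated data (numpy/scipy; the contamination scheme and the tuning constant c = 1.345, a common choice, are assumptions):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
# 90 clean N(0, 1) points plus 10 gross outliers near 20.
x = np.concatenate([rng.normal(0, 1, 90), rng.normal(20, 1, 10)])

def huber_loss(theta, x, c=1.345):
    u = np.abs(x - theta)
    return np.where(u <= c, u ** 2 / 2, c * u - c ** 2 / 2).sum()

huber = minimize_scalar(lambda t: huber_loss(t, x),
                        bounds=(x.min(), x.max()), method="bounded").x

print("mean:  ", x.mean())       # dragged toward the outliers
print("median:", np.median(x))   # robust but less efficient
print("huber: ", huber)          # robust and nearly efficient under normality
```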

Asymptotics of M-estimates: $\sqrt{n}(\hat{\theta}-\theta) \to N(0, E[\psi^2]/(E[\psi'])^2)$, where $\psi = \rho'$. The optimal $\rho$ for a given F is the negative log-density, $\rho = -\ln f$, which recovers the maximum likelihood estimate. The asymptotic relative efficiency ARE(M-estimate, mean) depends on F and on the choice of $\rho$.
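The dependence on F is easy to demonstrate for the median versus the mean: at the normal distribution ARE(median, mean) = 2/π ≈ 0.64, while under heavy tails the median wins. A minimal Monte Carlo sketch, using finite-sample variance ratios as a proxy (sample size and distributions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 100, 20_000

def are_median_vs_mean(sampler):
    x = sampler((reps, n))
    return x.mean(axis=1).var() / np.median(x, axis=1).var()

print("normal:", are_median_vs_mean(rng.standard_normal))               # ~ 2/pi
print("t(3):  ", are_median_vs_mean(lambda s: rng.standard_t(3, s)))    # > 1
```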

Sufficiency in Multivariate Models

For a multivariate parameter $\theta \in \mathbb{R}^p$, sufficiency is defined analogously. A minimal sufficient statistic T is sufficient and a function of any other sufficient statistic. For exponential families, $T(X) = (\sum t_1(X_i),..., \sum t_k(X_i))$ is minimal sufficient.

Example: for normal $N(\mu, \sigma^2)$ with both parameters unknown, $T = (\bar{X}, S^2) = (\sum x_i / n, \sum (x_i - \bar{x})^2/(n-1))$ is complete and sufficient. By Basu's theorem (with $\sigma^2$ fixed, $\bar{X}$ is complete sufficient for $\mu$ and $S^2$ is ancillary), $\bar{X}$ and $S^2$ are independent. UMVUE for $\mu$: $\bar{X}$. UMVUE for $\sigma^2$: $S^2$. UMVUE for $P(X > c)$: a more complicated function of T.
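Independence of $\bar{X}$ and $S^2$ can be spot-checked by simulation: their empirical correlation should be near zero (a necessary, though not sufficient, consequence of independence; parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(5, 2, size=(100_000, 10))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)               # unbiased sample variance

print("corr(xbar, s2):", np.corrcoef(xbar, s2)[0, 1])   # ~ 0
print("E[xbar], E[s2]:", xbar.mean(), s2.mean())        # ~ 5 and ~ 4 (unbiased)
```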

Example of Applying the Rao-Blackwell Theorem

Sample from Geometric(p) (number of trials until the first success). Initial estimator: $\delta(X_1,...,X_n) = I(X_1 = 1)$, unbiased for p. Sufficient (and complete) statistic: $T = \sum X_i$, the total number of trials, which has a negative binomial distribution. Improved estimator: $\delta^*(t) = E[I(X_1=1)|T=t] = P(X_1=1, X_2+...+X_n=t-1)/P(T=t) = C(t-2, n-2)/C(t-1, n-1) = (n-1)/(t-1)$. By Lehmann–Scheffé, $(n-1)/(T-1)$ is the UMVUE for p: no unbiased estimator has smaller variance. A numeric check follows below.
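A Monte Carlo sketch of this example (numpy; p and n are arbitrary), checking that the closed form $(n-1)/(T-1)$ stays unbiased while sharply reducing variance:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 0.3, 6, 200_000

x = rng.geometric(p, size=(reps, n))     # trials until first success, support {1, 2, ...}
T = x.sum(axis=1)
crude = (x[:, 0] == 1).astype(float)     # I(X_1 = 1), unbiased for p
umvue = (n - 1) / (T - 1)                # closed form of E[I(X_1 = 1) | T]

print("means:", crude.mean(), umvue.mean())   # both ~ p = 0.3
print("vars: ", crude.var(), umvue.var())     # large drop after Rao-Blackwellization
```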

Numerical Example: Rao-Blackwell Theorem

Problem: $X_1,...,X_4 \sim Bernoulli(p)$. $T = \sum X_i \sim Binomial(4, p)$. Improve $\delta(X) = X_1$ via conditional expectation.

Step 1: $\delta(X) = X_1$ is unbiased: $E[X_1] = p$. $Var[X_1] = p(1-p)$. Sufficient statistic: $T = X_1 + X_2 + X_3 + X_4$.

Step 2: $\delta^*(T) = E[X_1|T=t] = P(X_1=1|T=t) = P(X_1=1, X_2 + X_3 + X_4 = t - 1)/P(T = t)$.

Step 3: Numerator: $p \cdot C(3, t-1) p^{t-1} (1-p)^{3-(t-1)}$. Denominator: $C(4, t) p^t (1-p)^{4-t}$. The ratio: $\delta^*(t) = C(3, t-1)/C(4, t) = t/4$.

Step 4: $\delta^*(T) = T/4 = \bar{X}$ — sample mean! $Var[\bar{X}] = p(1-p)/4 < Var[X_1] = p(1-p)$. For $p=0.5$: $Var[X_1]=0.25$, $Var[\bar{X}]=0.0625$ — variance decreased by a factor of 4. Rao-Blackwell theorem: conditioning on the sufficient statistic always improves (does not worsen) the estimator.
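The same conditional expectation can be computed by brute-force enumeration of all 2⁴ samples, which also shows that $E[X_1|T=t] = t/4$ does not depend on p, exactly as sufficiency requires (p = 0.3 is an arbitrary choice):

```python
import numpy as np
from itertools import product

p = 0.3
outcomes = np.array(list(product([0, 1], repeat=4)))    # all 16 Bernoulli samples
totals = outcomes.sum(axis=1)
probs = p ** totals * (1 - p) ** (4 - totals)

for t in range(5):
    mask = totals == t
    cond = (probs[mask] * outcomes[mask, 0]).sum() / probs[mask].sum()
    print(f"E[X1 | T={t}] = {cond:.2f}   (t/4 = {t / 4:.2f})")
```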
