Nonparametric Methods and Bootstrap
Sample Statistics and Estimation
Nonparametric methods do not assume a specific distributional form; they are used when the assumptions of parametric methods are violated. The bootstrap is a general-purpose resampling technique for estimating the error of a statistic.
Nonparametric Tests
Sign Test: H₀: median = m₀. Statistic: number of observations > m₀. Under H₀: ~ Bin(n, 0.5). Does not depend on the distribution form, but is not very efficient.
Wilcoxon Signed-Rank Test: Accounts for the magnitude of deviations. Compute dᵢ = xᵢ − m₀, rank |dᵢ|, W⁺ = sum of ranks with dᵢ > 0. Under H₀: E[W⁺] = n(n+1)/4, Var[W⁺] = n(n+1)(2n+1)/24.
Mann-Whitney Test: Nonparametric two-sample test. U = #{(xᵢ, yⱼ): xᵢ > yⱼ}. Under H₀: E[U] = n₁n₂/2. Analog of two-sample t-test without normality.
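A minimal sketch applying all three tests with SciPy; the data and m₀ = 5 come from the exercise below, and the second sample y is made up for illustration:

```python
import numpy as np
from scipy.stats import binomtest, wilcoxon, mannwhitneyu

x = np.array([2, 3, 4, 5, 5, 6, 7, 8, 10, 12])  # data from the exercise below
m0 = 5                                           # hypothesized median

# Sign test: drop ties with m0, then #(x > m0) ~ Bin(n', 0.5) under H0
d = x[x != m0] - m0
print("sign test p:", binomtest(int((d > 0).sum()), n=len(d), p=0.5).pvalue)

# Wilcoxon signed-rank test (zero differences are dropped by default)
print("wilcoxon p:", wilcoxon(x - m0).pvalue)

# Mann-Whitney U test against a second, made-up sample y
y = np.array([4, 6, 6, 7, 9, 11, 12, 13])
print("mann-whitney p:", mannwhitneyu(x, y).pvalue)
```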
Density Function Estimation
Histogram: Divide the domain into K bins. Bin width h = (max-min)/K. P(bin k) = n_k/n. Bias O(h²), variance O(1/(nh)) — trade-off.
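In NumPy this is a one-liner (K = 5 bins is an arbitrary choice; the data come from the exercise below):

```python
import numpy as np

data = np.array([2, 3, 4, 5, 5, 6, 7, 8, 10, 12])
dens, edges = np.histogram(data, bins=5, density=True)  # density=True returns n_k / (n * h)
print(dens, edges)
```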
Kernel Density Estimation (KDE): f̂(x) = (1/(nh))Σᵢ K((x−xᵢ)/h). K — kernel (Gaussian, Epanechnikov). Optimal bandwidth h* = 1.06σn^{-1/5} (for Gaussian K).
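A direct implementation of the KDE formula with Silverman's h* as the default, applied to the data from the exercise below (a sketch, not a production implementation):

```python
import numpy as np

def kde(grid, data, h=None):
    """Gaussian-kernel density estimate f_hat(x) evaluated on a grid."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    if h is None:
        h = 1.06 * data.std(ddof=1) * n ** (-0.2)  # Silverman's rule of thumb
    u = (grid[:, None] - data[None, :]) / h         # (len(grid), n) matrix
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # Gaussian kernel values
    return K.sum(axis=1) / (n * h)

data = [2, 3, 4, 5, 5, 6, 7, 8, 10, 12]
grid = np.linspace(0, 14, 200)
f_hat = kde(grid, data)
print(np.trapz(f_hat, grid))  # ≈ 1: the estimate integrates to one
```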
Bootstrap
Idea (Efron, 1979): treat the empirical distribution F̂ₙ as a stand-in for the unknown population and resample from it with replacement.
Algorithm: From the original sample x₁,...,xₙ, draw B bootstrap samples x₁*,...,xₙ* (with replacement). For each, compute the statistic θ̂*. Bootstrap SE = std(θ̂*); basic bootstrap CI = [2θ̂ − θ̂*_{0.975}, 2θ̂ − θ̂*_{0.025}], where θ̂*_q is the q-quantile of the bootstrap distribution.
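A minimal sketch of the algorithm for the median, reusing the data from the exercise below (B = 1000 and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)                  # seed chosen arbitrarily
x = np.array([2, 3, 4, 5, 5, 6, 7, 8, 10, 12])  # data from the exercise below
B = 1000

theta_hat = np.median(x)                        # statistic of interest: the median
boot = np.array([np.median(rng.choice(x, size=len(x), replace=True))
                 for _ in range(B)])

se_boot = boot.std(ddof=1)                      # bootstrap SE
q_lo, q_hi = np.quantile(boot, [0.025, 0.975])
basic_ci = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)  # basic (reverse-percentile) CI
print(se_boot, basic_ci)
```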
When it works: For sufficiently smooth functionals θ̂. For studentized (pivotal) statistics the coverage error is O(1/n), better than the O(1/√n) of the normal approximation.
Exercise: (a) 10 observations: 2, 3, 4, 5, 5, 6, 7, 8, 10, 12. Sign test H₀: median=5. Wilcoxon test. (b) KDE: Gaussian kernel with h=1 for the same data. Draw f̂(x). (c) Bootstrap CI for the median: B=1000 samples. Compare to asymptotic CI.
Nonparametric Statistics: Details
Kolmogorov-Smirnov Test: For H₀: F = F₀, statistic Dₙ = sup_x |F̂ₙ(x) − F₀(x)|. Under H₀: √n·Dₙ → Kolmogorov distribution. Tables of critical values or exact formula via series. For two-sample KS test: Dₙₘ = sup_x |F̂₁(x) − F̂₂(x)| — test for homogeneity.
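Both versions are available in SciPy; a sketch on simulated data (the distributions and seed are arbitrary):

```python
import numpy as np
from scipy.stats import kstest, ks_2samp, norm

rng = np.random.default_rng(1)        # seed is arbitrary
x = rng.normal(0.0, 1.0, size=100)    # simulated data, illustration only
y = rng.normal(0.5, 1.0, size=80)

print(kstest(x, norm.cdf))            # one-sample KS: H0: F = N(0, 1)
print(ks_2samp(x, y))                 # two-sample KS: H0: F1 = F2
```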
Kernel Density Estimation: Choosing the bandwidth h is the key issue. Silverman's rule of thumb: h = 1.06·σ̂·n^{−1/5} (optimal for normal data). Leave-one-out cross-validation: minimize an estimate of MISE over a grid of h values. Adaptive KDE: h depends on xᵢ (wider in the tails).
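A sketch of bandwidth selection by leave-one-out cross-validation; note this version maximizes the LOO log-likelihood, a common variant of the MISE-based criterion mentioned above, and the grid limits are arbitrary:

```python
import numpy as np

def loo_cv_score(h, data):
    """Leave-one-out log-likelihood of a Gaussian KDE with bandwidth h."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    u = (data[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(K, 0.0)                 # drop the i = j term (leave-one-out)
    f_loo = K.sum(axis=1) / ((n - 1) * h)    # density at x_i without x_i
    return np.log(f_loo).sum()

data = [2, 3, 4, 5, 5, 6, 7, 8, 10, 12]
hs = np.linspace(0.3, 3.0, 50)               # grid of candidate bandwidths
best_h = hs[np.argmax([loo_cv_score(h, data) for h in hs])]
print(best_h)
```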
Bootstrap: Theory and Variants
Nonparametric Bootstrap: For estimating SE(θ̂): generate B samples with replacement, computing θ̂* each time. SE_boot = sd(θ̂*₁,...,θ̂*_B). CIs: percentile (quantiles of the bootstrap distribution), BCa (bias-corrected and accelerated: corrections for bias and skewness), studentized (via the bootstrap SE).
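SciPy ships these intervals in scipy.stats.bootstrap (available since SciPy 1.7; method can be 'percentile', 'basic', or 'BCa'); a sketch for the median of the exercise data:

```python
import numpy as np
from scipy.stats import bootstrap

x = np.array([2, 3, 4, 5, 5, 6, 7, 8, 10, 12])
res = bootstrap((x,), np.median, n_resamples=1000,
                confidence_level=0.95, method='BCa', random_state=0)
print(res.confidence_interval, res.standard_error)
```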
Parametric Bootstrap: Estimate θ̂ from data, resample from F(θ̂). More accurate than nonparametric if the model is correct. Bootstrap for dependent data: block bootstrap, circular block bootstrap — preserve dependence structure.
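A minimal parametric-bootstrap sketch, assuming a normal model N(μ, σ²) for made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10, 2, size=30)        # simulated data, illustration only

# Fit the assumed model N(mu, sigma^2), then resample from the fitted model
mu_hat, sigma_hat = x.mean(), x.std(ddof=1)
boot = np.array([rng.normal(mu_hat, sigma_hat, size=len(x)).mean()
                 for _ in range(1000)])

# Bootstrap SE of the mean vs. the analytic sigma_hat / sqrt(n)
print(boot.std(ddof=1), sigma_hat / np.sqrt(len(x)))
```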
Rank Tests and Their Power
Wilcoxon-Mann-Whitney test (two-sample): H₀: F₁ = F₂. Statistic U = #{(i,j): X₁ᵢ > X₂ⱼ}. Asymptotically normal. Power against shift alternatives: ARE(Wilcoxon, t-test) = 3/π ≈ 0.955 for normal data, so it is almost as powerful. For t(3): ARE ≈ 1.9 > 1, so the rank test is more powerful. For heavy tails, rank tests are significantly better.
Resampling Methods: Jackknife and Permutation Tests
Jackknife: θ̂_{(i)} — estimate without the i-th observation. SE_jack = √[(n−1)/n·Σ(θ̂_{(i)} − θ̂_{.})²], θ̂_{.} = mean of θ̂_{(i)}. Bias adjustment: θ̂_bias_corrected = nθ̂ − (n−1)θ̂_{.}. Jackknife is less computationally intensive than bootstrap, but less flexible.
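A direct implementation of these formulas for the mean of the exercise data (sketch):

```python
import numpy as np

x = np.array([2, 3, 4, 5, 5, 6, 7, 8, 10, 12])
n = len(x)
theta_hat = x.mean()

# Leave-one-out estimates theta_(i)
loo = np.array([np.delete(x, i).mean() for i in range(n)])
theta_dot = loo.mean()                            # mean of the leave-one-out estimates

se_jack = np.sqrt((n - 1) / n * np.sum((loo - theta_dot) ** 2))
theta_bc = n * theta_hat - (n - 1) * theta_dot    # bias-corrected estimate
print(se_jack, theta_bc)
```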
Permutation Test: H₀: X and Y come from the same distribution. Statistic T = X̄−Ȳ. Exact null distribution: enumerate all C(n₁+n₂, n₁) group assignments. p-value = #{Tᵢ ≥ T_obs}/n_perms. No normality assumption. For n=10+10: C(20,10)=184756, so the exact test is feasible; for large n, use a randomized version (e.g., 10000 random permutations).
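A randomized permutation test with 10,000 draws, in the two-sided form (both samples are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([2, 3, 4, 5, 5, 6, 7, 8, 10, 12])    # hypothetical group 1
y = np.array([5, 6, 7, 8, 9, 9, 11, 12, 13, 14])  # hypothetical group 2

t_obs = x.mean() - y.mean()
pooled = np.concatenate([x, y])
n1, n_perms = len(x), 10_000

count = 0
for _ in range(n_perms):              # randomized version: sample permutations
    perm = rng.permutation(pooled)
    t = perm[:n1].mean() - perm[n1:].mean()
    count += abs(t) >= abs(t_obs)     # two-sided comparison
print(count / n_perms)                # approximate p-value
```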
Bayesian Nonparametric Methods
Dirichlet Process (DP): a random discrete distribution P ~ DP(α, G₀), with α the concentration parameter and G₀ the base distribution. Realizations are countable weighted sums of atoms. Used in the Dirichlet Process Mixture: P ~ DP(α, G₀), μᵢ | P ~ P, Xᵢ | μᵢ ~ N(μᵢ, σ²). Automatically adapts the number of components: a nonparametric mixture. Chinese Restaurant Process: an intuitive metaphor for the partition structure induced by the DP prior.
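A truncated stick-breaking construction makes "weighted sum of atoms" concrete; a sketch with arbitrary α = 2 and G₀ = N(0, 1), where the truncation at 1000 atoms is an approximation:

```python
import numpy as np

def dp_sample(alpha, base_sampler, n_atoms=1000, rng=None):
    """Truncated stick-breaking draw from DP(alpha, G0)."""
    rng = rng or np.random.default_rng()
    betas = rng.beta(1, alpha, size=n_atoms)
    stick = np.concatenate([[1.0], np.cumprod(1 - betas[:-1])])
    weights = betas * stick            # w_k = beta_k * prod_{j<k}(1 - beta_j)
    atoms = base_sampler(n_atoms)      # atom locations drawn from G0
    return atoms, weights

rng = np.random.default_rng(0)
atoms, w = dp_sample(alpha=2.0, base_sampler=lambda k: rng.normal(0, 1, k), rng=rng)
print(atoms[w.argsort()[-5:]], np.sort(w)[-5:])  # the five heaviest atoms
```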
Gaussian Processes in Regression
Gaussian Process (GP): f ~ GP(m, k), where m(x) is the mean function and k(x, x′) the kernel (covariance) function. Posterior prediction: f* | X, y, X* ~ N(μ*, Σ*), with μ* = K*ᵀK⁻¹y and Σ* = K** − K*ᵀK⁻¹K*, where K = k(X, X), K* = k(X, X*), K** = k(X*, X*). The kernel determines smoothness: SE (RBF), Matérn, periodic. GP regression is the Bayesian nonparametric analog of kriging. Complexity O(n³) limits scalability; approximations: sparse GPs, inducing points.
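A minimal NumPy sketch of these posterior formulas; the jitter level, kernel length-scale, and toy sine data are arbitrary choices:

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential (RBF) kernel matrix k(a, b)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 8)                   # training inputs (illustrative)
y = np.sin(X) + 0.1 * rng.normal(size=8)   # noisy observations
Xs = np.linspace(0, 5, 100)                # test inputs

K = rbf(X, X) + 1e-2 * np.eye(len(X))      # K(X, X) plus jitter for stability
Ks = rbf(X, Xs)                            # K(X, X*)
Kss = rbf(Xs, Xs)                          # K(X*, X*)

mu_star = Ks.T @ np.linalg.solve(K, y)                 # posterior mean
Sigma_star = Kss - Ks.T @ np.linalg.solve(K, Ks)       # posterior covariance
print(mu_star[:3], np.diag(Sigma_star)[:3])
```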
Numerical Example: Bootstrap Estimate of Standard Error
Problem: Data n=5: x = {1, 3, 5, 7, 9}. Estimate the SE of the mean using bootstrap (B=4 samples for illustration).
Step 1: Original mean: x̄ = (1+3+5+7+9)/5 = 5.0.
Step 2: Four bootstrap samples with replacement and their means:
- {1, 1, 5, 9, 7} → x̄* = 23/5 = 4.6;
- {3, 5, 5, 9, 9} → x̄* = 31/5 = 6.2;
- {7, 7, 3, 1, 5} → x̄* = 23/5 = 4.6;
- {9, 5, 7, 5, 3} → x̄* = 29/5 = 5.8.
Step 3: SE_boot = std{4.6, 6.2, 4.6, 5.8}. Mean: 5.3. Variance (dividing by B = 4): ((−0.7)²+(0.9)²+(−0.7)²+(0.5)²)/4 = (0.49+0.81+0.49+0.25)/4 = 0.51. SE_boot = √0.51 ≈ 0.714. (Dividing by B − 1 = 3, the more common convention for SE_boot, gives √0.68 ≈ 0.825.)
Step 4: The formula-based SE = s/√n = √10/√5 = √2 ≈ 1.414, where s² = 10 is the sample variance of {1, 3, 5, 7, 9}. As B grows, the bootstrap SE converges to the ideal bootstrap value √(Σ(xᵢ − x̄)²/n²) = √1.6 ≈ 1.265, slightly below 1.414; the gap is the n versus n − 1 variance convention and vanishes as n grows.
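The arithmetic above can be verified in a few lines of NumPy (a sketch; the printed values match the steps):

```python
import numpy as np

means = np.array([4.6, 6.2, 4.6, 5.8])  # the four bootstrap means from Step 2
print(means.mean())                     # 5.3
print(means.std(ddof=0))                # ≈ 0.714 (dividing by B, as in Step 3)
print(means.std(ddof=1))                # ≈ 0.825 (dividing by B - 1)

x = np.array([1, 3, 5, 7, 9])
print(x.std(ddof=1) / np.sqrt(len(x)))  # formula-based SE = sqrt(2) ≈ 1.414
print(x.std(ddof=0) / np.sqrt(len(x)))  # ideal bootstrap SE ≈ 1.265
```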