
Multivariate Distributions and Copulas


Multivariate distributions describe the joint behavior of several random variables. Copulas are a powerful tool for modeling dependencies between variables independently of their marginal distributions.

Bivariate Normal Distribution

Joint Density: (X, Y) ~ N₂(μ, Σ), where Σ = [[σ₁², ρσ₁σ₂], [ρσ₁σ₂, σ₂²]] and ρ = Cov(X, Y)/(σ₁σ₂) is the correlation parameter. Conditional: Y|X=x ~ N(μ₂ + ρ(σ₂/σ₁)(x−μ₁), σ₂²(1−ρ²)).

Uncorrelated ≠ independent: if ρ = 0, X and Y are uncorrelated but, in general, not necessarily independent. For a jointly normal pair, however, ρ = 0 is equivalent to independence.

Sklar’s Theorem and Copulas

Sklar’s Theorem (1959): any joint CDF can be written as H(x, y) = C(F₁(x), F₂(y)), where F₁, F₂ are the marginal CDFs and C: [0,1]² → [0,1] is a copula. If the marginals are continuous, C is unique.

Examples of copulas:

  • Gaussian: C_Gauss(u, v) = Φ₂(Φ⁻¹(u), Φ⁻¹(v); ρ)
  • Clayton: C_Clay(u, v) = (u⁻ᵅ + v⁻ᵅ − 1)^{−1/α}, α > 0; exhibits lower-tail dependence
  • Gumbel: C_Gumb(u, v) = exp(−[(−ln u)ᵅ + (−ln v)ᵅ]^{1/α}), α ≥ 1; exhibits upper-tail dependence

Tail dependence: λ_U = lim_{u→1} P(X > F₁⁻¹(u) | Y > F₂⁻¹(u)) is the probability of simultaneous extreme events. Gaussian copula: λ_U = 0 whenever |ρ| < 1. Gumbel copula: λ_U = 2 − 2^{1/α} > 0 for α > 1.

Assignment: (a) For the bivariate normal (μ=(0,0), σ₁=σ₂=1, ρ=0.7), compute P(X>1, Y>1). (b) Monte Carlo: sample from a Clayton copula with α=2 and Exp(1) marginals, then estimate P(X>2, Y>2) and P(X>2). Compare the joint probability with the independence benchmark P(X>2)² to assess tail dependence. A sketch of a possible solution follows below.
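
A minimal sketch in Python for both parts, assuming NumPy and SciPy are available; the Clayton sampler uses the Gamma mixing (Marshall-Olkin) construction mentioned in the simulation section below, and the seed and sample size are illustrative.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(0)

# (a) Bivariate normal with mu = (0, 0), sigma1 = sigma2 = 1, rho = 0.7:
# P(X > 1, Y > 1) = 1 - Phi(1) - Phi(1) + Phi_2(1, 1; rho)
rho = 0.7
cov = [[1.0, rho], [rho, 1.0]]
p_joint = 1 - 2 * norm.cdf(1) + multivariate_normal(cov=cov).cdf([1.0, 1.0])
print(f"(a) P(X>1, Y>1) = {p_joint:.4f}")

# (b) Clayton copula (alpha = 2) via Gamma mixing:
# V ~ Gamma(1/alpha, 1), E_i ~ Exp(1) i.i.d., U_i = (1 + E_i/V)^(-1/alpha)
alpha, n = 2.0, 1_000_000
V = rng.gamma(1.0 / alpha, 1.0, size=n)
E = rng.exponential(size=(n, 2))
U = (1.0 + E / V[:, None]) ** (-1.0 / alpha)

# Exp(1) marginals via the inverse CDF: F^{-1}(u) = -ln(1 - u)
X = -np.log(1.0 - U)
p12 = np.mean((X[:, 0] > 2) & (X[:, 1] > 2))
p1 = np.mean(X[:, 0] > 2)
print(f"(b) P(X>2, Y>2) ~ {p12:.4f},  P(X>2) ~ {p1:.4f},  P(X>2)^2 ~ {p1**2:.4f}")
```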

Marginal and Conditional Distributions

For a vector (X, Y) with joint density f(x, y) the marginal density of X: f_X(x) = ∫ f(x, y) dy. The marginal distribution “forgets” the second variable. Conditional density of Y given X=x: f_{Y|X}(y|x) = f(x, y)/f_X(x). Law of total probability for densities: f_Y(y) = ∫ f_{Y|X}(y|x) f_X(x) dx.
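
As a quick numerical illustration (a sketch assuming NumPy and SciPy; the point x = 0.5 and ρ = 0.7 are arbitrary choices), integrating a bivariate normal density over y recovers the standard normal marginal:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

rho = 0.7
joint = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

# Marginal density of X at x = 0.5, obtained by integrating out y
x = 0.5
fx, _ = quad(lambda y: joint.pdf([x, y]), -np.inf, np.inf)
print(fx, norm.pdf(x))  # both ~ 0.352
```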

For the bivariate normal: X|Y=y ~ N(μ₁ + ρ(σ₁/σ₂)(y−μ₂), σ₁²(1−ρ²)). The conditional expectation E[X|Y=y] is a linear function of y, which forms the basis of linear regression! The conditional variance Var[X|Y=y] = σ₁²(1−ρ²) does not depend on y.

Multivariate Normal Distribution

X ~ Nₖ(μ, Σ): X = (X₁,...,Xₖ)ᵀ, μ ∈ ℝᵏ is the mean vector, Σ is the covariance matrix (symmetric, positive definite). Density: f(x) = (2π)^{−k/2}|Σ|^{−1/2}·exp(−(x−μ)ᵀΣ⁻¹(x−μ)/2).

Key properties: any linear combination of components is univariate normal. Uncorrelatedness (Σ diagonal) ⟺ independence (only for normals!). Orthant probabilities such as P(X₁ ≤ a₁, ..., Xₖ ≤ aₖ) are computed via Φₖ, the multivariate normal CDF.
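
A small consistency check (a sketch assuming NumPy and SciPy; μ, Σ, and x are illustrative): evaluate the density formula directly and compare it with scipy.stats.multivariate_normal.

```python
import numpy as np
from scipy.stats import multivariate_normal

k = 3
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])
x = np.array([0.0, 0.0, 1.0])

# f(x) = (2*pi)^(-k/2) * |Sigma|^(-1/2) * exp(-(x-mu)^T Sigma^{-1} (x-mu) / 2)
d = x - mu
quad_form = d @ np.linalg.solve(Sigma, d)
f = (2 * np.pi) ** (-k / 2) / np.sqrt(np.linalg.det(Sigma)) * np.exp(-quad_form / 2)

print(f, multivariate_normal(mu, Sigma).pdf(x))  # should agree
```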

Dependence, Correlation, and Measures of Association

Pearson correlation ρ measures only linear dependence. Rank correlations are more robust to outliers. Spearman’s correlation ρₛ = cor(rank(X), rank(Y)) is the linear correlation of the ranks. Kendall’s correlation τ = (number of concordant pairs − number of discordant pairs) / C(n, 2).

These measures are invariant to monotonic transformations, which makes them natural for copula modeling. For the bivariate normal: ρ_Spearman = (6/π)arcsin(ρ/2), τ = (2/π)arcsin(ρ).
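
These identities are easy to verify by simulation (a sketch assuming NumPy and SciPy; ρ = 0.6, the seed, and the sample size are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

rng = np.random.default_rng(1)
rho = 0.6
X = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=50_000)

rs, _ = spearmanr(X[:, 0], X[:, 1])
tau, _ = kendalltau(X[:, 0], X[:, 1])
print(rs, 6 / np.pi * np.arcsin(rho / 2))  # both ~ 0.582
print(tau, 2 / np.pi * np.arcsin(rho))     # both ~ 0.410
```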

The Role of Copulas in the 2008 Financial Crisis

The Gaussian copula was widely applied to model default dependence in collateralized debt obligations (CDOs) built from mortgage-backed securities. Its critical flaw is zero tail dependence: with small correlations, simultaneous defaults were extremely unlikely in the model, while in reality the housing market crash triggered cascading defaults nationwide. Models with copulas that have non-zero lower tail dependence (Clayton or the t-copula) would have signaled significantly higher risk. The episode became an instructive example of how a model that is mathematically sound under “normal” conditions can yield catastrophically wrong estimates under tail events.

Conditional Distribution and Joint Moments

For a pair (X, Y) with joint density f(x, y), the conditional distribution of Y given X=x: f_{Y|X}(y|x) = f(x, y)/f_X(x). Conditional expectation E[Y|X=x] = ∫ y·f_{Y|X}(y|x) dy — is a function of x. Law of total expectation: E[Y] = E[E[Y|X]] = ∫ E[Y|X=x]·f_X(x) dx.
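
A quick Monte Carlo check of the tower rule (a sketch assuming NumPy; the hierarchical model X ~ Exp(1), Y|X=x ~ N(x, 1) is chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
X = rng.exponential(size=n)       # X ~ Exp(1), so E[X] = 1
Y = rng.normal(loc=X, scale=1.0)  # Y | X = x ~ N(x, 1), so E[Y|X] = X

# Law of total expectation: E[Y] = E[E[Y|X]] = E[X] = 1
print(Y.mean(), X.mean())
```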

Covariance matrix for a vector X=(X₁,...,Xₙ): Σ_{ij} = Cov(Xᵢ, Xⱼ) = E[XᵢXⱼ] − E[Xᵢ]E[Xⱼ]; it is always positive semi-definite. For linear transformations Y=AX: Cov(Y) = A·Cov(X)·Aᵀ. Decorrelation: if Σ = PΛPᵀ (eigendecomposition with orthogonal P, so P⁻¹ = Pᵀ), then Y = PᵀX has diagonal covariance Λ; the components of Y are the “principal components” (PCA).
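
A decorrelation sketch with NumPy (the covariance matrix, seed, and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
Sigma = np.array([[3.0, 1.2], [1.2, 1.0]])
X = rng.multivariate_normal([0, 0], Sigma, size=100_000)

# Eigendecomposition Sigma = P Lambda P^T; np.linalg.eigh returns
# orthonormal eigenvectors as the columns of P
lam, P = np.linalg.eigh(Sigma)

Y = X @ P                    # each row is P^T x for the corresponding sample x
print(np.cov(Y.T).round(3))  # approximately diag(lam)
print(lam.round(3))
```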

Linear predictor: the optimal linear predictor of Y from X is Ŷ = E[Y] + (Cov(Y, X)/Var[X])·(X − E[X]). The residual Y − Ŷ is uncorrelated with X. For the bivariate normal this coincides with E[Y|X]: in the normal case the conditional expectation is itself linear, so the best predictor and the best linear predictor agree.
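
A simulation check (a sketch assuming NumPy; ρ = 0.7, the seed, and the sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
rho = 0.7
X, Y = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=500_000).T

beta = np.cov(Y, X)[0, 1] / np.var(X)  # Cov(Y, X) / Var[X]
Y_hat = Y.mean() + beta * (X - X.mean())
resid = Y - Y_hat

print(beta)                         # ~ rho = 0.7 for standard normal marginals
print(np.corrcoef(resid, X)[0, 1])  # ~ 0: residual uncorrelated with X
```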

Simulation of Multivariate Distributions

Cholesky Method: To generate X ~ Nₖ(μ, Σ): find the Cholesky decomposition Σ = LLᵀ. Generate Z = (Z₁,...,Zₖ)ᵀ i.i.d. ~ N(0, 1). Then X = μ + LZ ~ Nₖ(μ, Σ). The method works for any positive definite Σ.
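
The method in a few lines of NumPy (a sketch; μ, Σ, seed, and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0, 0.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

L = np.linalg.cholesky(Sigma)          # Sigma = L L^T
Z = rng.standard_normal((100_000, 3))  # i.i.d. N(0, 1) draws
X = mu + Z @ L.T                       # row-wise X = mu + L Z

print(X.mean(axis=0).round(2))  # ~ mu
print(np.cov(X.T).round(2))     # ~ Sigma
```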

Generation from copulas: (1) generate (U₁,...,Uₖ) from the copula C; (2) apply the inverse marginal CDFs: Xᵢ = Fᵢ⁻¹(Uᵢ). For the Gaussian copula: (1) generate Z ~ Nₖ(0, R), where R is the correlation matrix; (2) Uᵢ = Φ(Zᵢ); (3) Xᵢ = Fᵢ⁻¹(Uᵢ). Clayton copula: sampled via its conditional distribution or the Gamma mixing (Marshall-Olkin) representation, as in the assignment sketch above.
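
The Gaussian-copula recipe in code (a sketch assuming NumPy and SciPy; the Exp(1) marginals and the matrix R are chosen only for illustration):

```python
import numpy as np
from scipy.stats import norm, expon

rng = np.random.default_rng(6)
R = np.array([[1.0, 0.8], [0.8, 1.0]])  # copula correlation matrix

# (1) Z ~ N_k(0, R); (2) U_i = Phi(Z_i); (3) X_i = F_i^{-1}(U_i)
Z = rng.multivariate_normal([0, 0], R, size=100_000)
U = norm.cdf(Z)
X = expon.ppf(U)  # Exp(1) marginals via the inverse CDF

print(np.corrcoef(X.T).round(3))  # dependence injected through the copula
```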

Test for multivariate normality: Mahalanobis distances dᵢ² = (xᵢ−x̄)ᵀΣ⁻¹(xᵢ−x̄) should be approximately chi-square distributed with k degrees of freedom. QQ-plot of dᵢ² against χ²(k) quantiles — standard diagnostic.
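
The diagnostic in code (a sketch assuming NumPy and SciPy; here the data are genuinely multivariate normal, so empirical and theoretical quantiles should agree):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
k, n = 3, 5_000
X = rng.multivariate_normal(np.zeros(k), np.eye(k), size=n)

xbar = X.mean(axis=0)
S = np.cov(X.T)
D = X - xbar
d2 = np.einsum('ij,ij->i', D @ np.linalg.inv(S), D)  # squared Mahalanobis distances

# Compare empirical quantiles of d^2 with chi-square(k) quantiles (QQ check)
probs = np.array([0.5, 0.9, 0.99])
print(np.quantile(d2, probs).round(2))
print(chi2.ppf(probs, df=k).round(2))  # 2.37, 6.25, 11.34
```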

Numerical Example: Conditional Distribution in Bivariate Normal

Problem: X₁ ~ N(0,1), X₂ ~ N(0,1), ρ = Corr(X₁, X₂) = 0.7. Find E[X₂|X₁=2] and P(X₂>2|X₁=2).

Step 1: Conditional mean: E[X₂|X₁=x₁] = μ₂ + ρ·(σ₂/σ₁)·(x₁−μ₁) = 0 + 0.7·1·(2−0) = 1.4.

Step 2: Conditional variance: Var[X₂|X₁] = σ₂²·(1−ρ²) = 1·(1−0.49) = 0.51, σ ≈ 0.714.

Step 3: P(X₂>2|X₁=2) = P(Z > (2−1.4)/0.714) = P(Z > 0.840) = 1−Φ(0.840) ≈ 1−0.7995 = 0.200.

Step 4: Without conditioning, P(X₂>2) = P(Z>2) ≈ 0.023. Knowing X₁=2 raises the probability from 2.3% to 20%, an illustration of how much information the joint normal structure carries.
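
The same numbers in a few lines of SciPy (a sketch; only the formulas from Steps 1-3 are used):

```python
import numpy as np
from scipy.stats import norm

rho, x1 = 0.7, 2.0
m = rho * x1             # E[X2 | X1 = 2] = 1.4
s = np.sqrt(1 - rho**2)  # conditional std ~ 0.714

print(1 - norm.cdf((2 - m) / s))  # P(X2 > 2 | X1 = 2) ~ 0.200
print(1 - norm.cdf(2))            # unconditional P(X2 > 2) ~ 0.023
```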
