Module V·Article I
Randomized Linear Algebra
Algorithms for Big Data
A data matrix of 10 million users × 100 thousand products. A full SVD is infeasible: O(mn·min(m,n)) ≈ 10¹⁷ operations. Randomized algorithms reduce the complexity to O(mnk) with theoretical guarantees of closeness to the exact solution. This is not "approximation out of laziness"; it is a mathematically rigorous theory.
Randomized SVD
Task: Compute the best rank-k approximation of matrix $A \in \mathbb{R}^{m \times n}$: $\min_{\mathrm{rank}(B) \leq k} \|A - B\|_F$.
Exact algorithm: SVD $A = U\Sigma V^\top$, $A_k = U_{1:k}\Sigma_{1:k}V^\top_{1:k}$. Complexity O(mn²).
Randomized SVD (Halko, Martinsson, Tropp, 2011):
Step 1. Random matrix $\Omega \in \mathbb{R}^{n \times k}$: $\Omega_{ij} \sim N(0, 1/k)$. Sketch $Y = A\Omega \in \mathbb{R}^{m \times k}$. Meaning: $Y$ approximates the "column space" of $A$—we project $A$ onto $k$ random directions.
Step 2. QR decomposition $Y = QR$, $Q \in \mathbb{R}^{m \times k}$—orthonormal basis of the column space of $Y$.
Step 3. $B = Q^\top A \in \mathbb{R}^{k \times n}$. Matrix $B$ is much smaller: $k \times n$.
Step 4. $\mathrm{SVD}(B) = \tilde{U} \Sigma V^\top$, $U = Q\tilde{U}$.
Result: $U \in \mathbb{R}^{m \times k}$, $\Sigma \in \mathbb{R}^{k \times k}$, $V \in \mathbb{R}^{n \times k}$—approximate SVD. Complexity: O(mnk) vs O(mn²).
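The four steps above, including oversampling and power iterations, fit in a few lines of NumPy. This is a minimal sketch for illustration, not a production implementation:

```python
import numpy as np

def randomized_svd(A, k, q=0, p=10, rng=None):
    """Rank-k randomized SVD with p oversampling columns and q power iterations."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    # Step 1: sketch the column space with a Gaussian test matrix
    Omega = rng.standard_normal((n, k + p))
    Y = A @ Omega
    # Optional power iterations: Y = (A A^T)^q A Omega sharpens the spectrum
    for _ in range(q):
        Y = A @ (A.T @ Y)
    # Step 2: orthonormal basis of the sketch
    Q, _ = np.linalg.qr(Y)
    # Step 3: project A onto the small subspace
    B = Q.T @ A                                   # (k+p) x n
    # Step 4: exact SVD of the small matrix, then lift back
    U_tilde, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_tilde
    return U[:, :k], S[:k], Vt[:k]
```

On a matrix of exact rank ≤ k, the sketch captures the column space almost surely, so the reconstruction is near-exact even with `q=0`.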
Theoretical guarantees: with probability ≥ 1−δ:
$\|A - QQ^\top A\|_F \leq \left(1 + O\!\left(\sqrt{k/\delta}\right)\right) \sigma_{k+1}(A)$
Here $\sigma_{k+1}(A)$ is the (k+1)-th singular value, the lower bound for any rank-k approximation. In practice one takes $k_\mathrm{actual} = k + p$ with $p \approx 10$ extra "oversampling" columns, which gives a nearly exact solution.
Degree-q power iteration: instead of $Y = A\Omega$ use $Y = (AA^\top)^q A\Omega$. The sketch then sees singular values $\sigma_j^{2q+1}$, so the relative gap after $\sigma_k$ sharpens exponentially in $q$; even $q = 1$–$2$ helps dramatically. Cost: $2q$ additional multiplications by $A$ or $A^\top$.
scikit-learn's `randomized_svd` exposes this as the `n_iter` parameter; with `n_iter='auto'` it runs a small number of power iterations (typically 4–7).
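A quick usage sketch, assuming `sklearn.utils.extmath.randomized_svd` is installed; `n_iter` plays the role of $q$ and `n_oversamples` the role of $p$:

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

# Random test matrix standing in for a real data matrix
A = np.random.default_rng(0).standard_normal((1000, 200))
# n_iter = number of power iterations q; n_oversamples = p
U, S, Vt = randomized_svd(A, n_components=20, n_iter=4,
                          n_oversamples=10, random_state=0)
```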
Approximate Nearest Neighbor Search (ANNS)
Task: For a query $q$, find $x^* = \arg\min_{x \in X} d(q, x)$. Exact kNN costs O(nd) per query; for $n=10^8$, $d=768$ (BERT embeddings) that is ~$10^{11}$ operations per query, infeasible at scale.
LSH (Locality Sensitive Hashing): Family of hash functions $h$ such that $P(h(x) = h(y))$ is high when $d(x, y)$ is small and low when $d(x, y)$ is large.
For L2 (Datar, 2004): $h_{a,b}(x) = \left\lfloor \frac{a^\top x + b}{w} \right\rfloor$, where $a \sim N(0, I)$, $b \sim \mathrm{Uniform}[0, w]$. Close points fall into the same “bucket” with high probability.
Amplification: $L$ independent tables × $k$ hash functions per table. Query: probe all $L$ tables and take the union of the matching buckets. Time: $O(d \cdot n^\rho)$ with $\rho < 1$ depending on the approximation factor $c$ (for the L2 scheme, roughly $\rho \approx 1/c$).
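The Datar-style L2 hash and the k-fold concatenation for one table can be sketched as follows; the parameter values ($w$, $k$, dimensions) are illustrative choices, not recommendations:

```python
import numpy as np

def make_l2_hash(dim, w, rng):
    """One Datar-style L2 hash: h(x) = floor((a.x + b) / w)."""
    a = rng.standard_normal(dim)        # a ~ N(0, I)
    b = rng.uniform(0, w)               # b ~ Uniform[0, w]
    return lambda x: int(np.floor((a @ x + b) / w))

rng = np.random.default_rng(42)
dim, w, k = 64, 4.0, 8
# One table concatenates k hashes; the bucket key is the k-tuple.
hashes = [make_l2_hash(dim, w, rng) for _ in range(k)]
bucket = lambda x: tuple(h(x) for h in hashes)

x = rng.standard_normal(dim)
near = x + 1e-3 * rng.standard_normal(dim)  # tiny perturbation
far = rng.standard_normal(dim)              # unrelated point
```

With these settings, `bucket(x)` and `bucket(near)` almost always coincide, while `bucket(far)` almost never matches; a full index would maintain $L$ such tables.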
HNSW (Malkov & Yashunin, 2018): Hierarchical Navigable Small World graph. Each level is a nearest-neighbor graph of a different density. Search: start at the top (sparsest) level, quickly approach the query's neighborhood, then descend level by level. Time: $O(\log n)$. SOTA for most ANNS tasks.
FAISS (Facebook AI Similarity Search): a library with GPU acceleration. Product Quantization compresses vectors into short codes (4–8 bytes instead of $768 \times 4 = 3072$ bytes of float32). IVF (Inverted File): cluster the space and search only the closest clusters. Searches $10^8$ vectors in <1 ms per query on GPU.
Streaming Algorithms: Big Data with Small Memory
Task: A stream of $N$ elements $x_1, x_2, \dots, x_N$ arrives. Memory of $O(N)$ is unacceptable; we must compute statistics of the stream in one pass with sublinear memory.
Count-Min Sketch (Cormode & Muthukrishnan, 2004): Estimate the frequency $f(x)$ with small memory.
Structure: Counter matrix $C[d][w]$ ($d$ rows × $w$ columns). $d$ independent hash functions $h_1, \dots, h_d : U \to \{1, \dots, w\}$.
Update for element $x$: for $i = 1,...,d$: $C[i][h_i(x)] += 1$.
Estimate $\hat{f}(x) = \min_i C[i][h_i(x)]$ (minimum across rows).
Guarantees: $P(|\hat{f}(x) - f(x)| \leq \varepsilon N) \geq 1 - \delta$ for $w = \lceil e / \varepsilon \rceil$, $d = \lceil \ln(1/\delta) \rceil$. Memory: $O(1/\varepsilon \cdot \log(1/\delta))$—does not depend on $N$!
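The update and estimate rules above take only a few lines. A minimal sketch, using row-salted `blake2b` digests as the $d$ hash functions and sizing the table from the $(\varepsilon, \delta)$ bounds:

```python
import hashlib
import math

class CountMinSketch:
    def __init__(self, eps, delta):
        self.w = math.ceil(math.e / eps)         # columns: w = ceil(e / eps)
        self.d = math.ceil(math.log(1 / delta))  # rows:    d = ceil(ln(1/delta))
        self.C = [[0] * self.w for _ in range(self.d)]

    def _h(self, i, x):
        # i-th hash function: salt the item with the row index
        digest = hashlib.blake2b(f"{i}:{x}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.w

    def add(self, x, count=1):
        for i in range(self.d):
            self.C[i][self._h(i, x)] += count

    def estimate(self, x):
        # Minimum across rows: collisions only inflate, never deflate
        return min(self.C[i][self._h(i, x)] for i in range(self.d))
```

Note that the estimate never underestimates the true frequency; collisions can only add to a counter.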
HyperLogLog (Flajolet et al., 2007): Estimate the number of unique elements (cardinality). Idea: hash $x$ to a uniform bit string; the maximum number of leading zero bits observed estimates $\log_2(n)$. Average across $m = 2^b$ registers to reduce variance. Error: ~$1.04/\sqrt{m}$. Memory: $O(\log \log N)$ bits per register. Implemented in Redis; used for Count Distinct in analytics.
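A toy HyperLogLog, assuming a 64-bit `blake2b` hash; production versions (e.g. Redis) add further bias corrections for very small and very large ranges:

```python
import hashlib
import math

class HyperLogLog:
    def __init__(self, b=10):
        self.b = b            # m = 2^b registers
        self.m = 1 << b
        self.M = [0] * self.m

    def add(self, x):
        h = int.from_bytes(
            hashlib.blake2b(str(x).encode(), digest_size=8).digest(), "big")
        j = h & (self.m - 1)                       # low b bits pick the register
        w = h >> self.b                            # remaining 64 - b bits
        rho = 64 - self.b - w.bit_length() + 1     # position of leftmost 1-bit
        self.M[j] = max(self.M[j], rho)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias correction, m >= 128
        Z = 1.0 / sum(2.0 ** -r for r in self.M)   # harmonic mean of 2^M[j]
        E = alpha * self.m * self.m * Z
        zeros = self.M.count(0)
        if E <= 2.5 * self.m and zeros:            # small-range: linear counting
            E = self.m * math.log(self.m / zeros)
        return E
```

With $b=10$ ($m=1024$ registers) the expected relative error is about $1.04/\sqrt{1024} \approx 3\%$.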
Morris Counter: Count $n$ up to $\sim 2^{64}$ in $O(\log \log n)$ bits by storing only an exponent $k \approx \log_2(n)$. Increment: $k \leftarrow k+1$ with probability $2^{-k}$. Estimate: $2^k - 1$ (unbiased: $\mathbb{E}[2^k] = n + 1$). Variance is $O(n^2)$, so in practice independent counters are averaged.
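The whole counter is a handful of lines; this sketch also illustrates why a single counter is noisy and several are averaged:

```python
import random

class MorrisCounter:
    """Approximate counter: stores only an exponent k, not the count itself."""
    def __init__(self, rng=None):
        self.k = 0
        self.rng = rng or random.Random()

    def increment(self):
        # Bump the exponent with probability 2^-k
        if self.rng.random() < 2.0 ** -self.k:
            self.k += 1

    def estimate(self):
        return 2 ** self.k - 1   # E[2^k] = n + 1, so this is unbiased
```

Because the variance of a single counter is $O(n^2)$, averaging the estimates of several independent counters is the standard way to get a usable answer.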
Numerical Example
Randomized SVD Netflix Prize (480K users × 17K movies):
- True matrix rank ≈ 100
- Exact SVD: $O(mn^2) \approx 480{,}000 \times 17{,}000^2 \approx 1.4 \times 10^{14}$ operations
- Randomized SVD ($k=100$, $q=2$): $O(mnk) \approx 8 \times 10^{11}$ operations, roughly two orders of magnitude fewer, with error <1%
HNSW search over 10M BERT embeddings (768d): indexing 10 minutes, top-10 search in 0.3 ms (vs 4.5 seconds for exact brute-force).
Assignment: (1) Implement randomized SVD without sklearn for the MovieLens (100K) ratings matrix. Compare accuracy (Frobenius error) with truncated SVD for $k=10$, $50$, $100$ at $q=0$, $1$, $2$ power iterations. (2) Implement Count-Min Sketch and HyperLogLog in Python. Test on a stream of 10M words from English text. Compare estimated number of unique words with the true count.