www.lesswrong.com/posts/eczwWrmX5XNEo7JsS/intricacies-of-feature-geometry-in-lar...
1 correction found
Let $\{x_1, x_2, \ldots, x_k\} \in \mathbb{R}^n$ be zero-mean random vectors with covariance matrix $\Sigma$, where $n > k$. Then there exists a whitening transformation $W = \Sigma^{-1/2}$ such that the transformed vectors $\{y_i = W x_i\}_{i=1}^{k}$ satisfy: $\mathbb{E}[y_i^\top y_j] = \delta_{ij}$
Whitening makes a random vector’s *covariance matrix* the identity; it does not make the *expected dot product between different sample vectors* equal the Kronecker delta. In particular, for a whitened n-dimensional vector with identity covariance and mean 0, the expected squared norm is n (not 1).
Full reasoning
The post claims that after whitening with $W=\Sigma^{-1/2}$, the transformed vectors $y_i$ satisfy $\mathbb{E}[y_i^\top y_j]=\delta_{ij}$.
What whitening actually guarantees
A standard statement of whitening is: if $X$ is a random vector with covariance $\Sigma$ and mean 0, then a whitening matrix $W$ produces $Y=WX$ with covariance equal to the identity matrix (components uncorrelated, each with variance 1). That is, whitening is about $\mathrm{Cov}(Y)=I$, not about making a collection of different draws $y_1,\dots,y_k$ form an orthonormal set in expectation.
Wikipedia’s definition explicitly frames whitening as producing a random vector whose covariance is the identity matrix. (It also notes this assumes a non-singular covariance matrix.)
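As a quick numerical sanity check (not from the post), the sketch below whitens Gaussian samples with $W=\Sigma^{-1/2}$ and confirms what whitening does and does not guarantee: the sample covariance of $Y$ is close to the identity, yet dot products between *different* whitened samples are not $\delta_{ij}$, and the mean squared norm concentrates near $n$, not 1. The choice of $\Sigma$ and all variable names are illustrative.

```python
# Sketch: what whitening guarantees vs. what the post claims.
import numpy as np

rng = np.random.default_rng(0)
n, num_samples = 5, 200_000

# An arbitrary positive-definite covariance Sigma (illustrative choice).
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)

# Zero-mean samples with covariance Sigma.
X = rng.multivariate_normal(np.zeros(n), Sigma, size=num_samples)

# Whitening matrix W = Sigma^{-1/2} via eigendecomposition.
evals, evecs = np.linalg.eigh(Sigma)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T

Y = X @ W.T  # whitened samples, one per row

# Whitening guarantee: sample covariance of Y is close to the identity.
print(np.allclose(np.cov(Y.T), np.eye(n), atol=0.05))

# But a dot product between two different samples is typically nonzero,
# and the average squared norm is close to n = 5, not 1.
print(Y[0] @ Y[1])
print(np.mean(np.sum(Y**2, axis=1)))
```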
Why $\mathbb{E}[y_i^\top y_i]\neq 1$ under whitening in $\mathbb{R}^n$
If $Y\in\mathbb{R}^n$ has mean 0 and covariance $\mathrm{Cov}(Y)=I$, then the expected squared norm is
$$\mathbb{E}[\|Y\|^2] = \mathbb{E}[Y^\top Y] = \mathrm{tr}(\mathrm{Cov}(Y)) = \mathrm{tr}(I) = n.$$
This is a standard identity (a special case of the “expectation of a quadratic form” formula).
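The trace identity itself is easy to verify numerically. The following minimal sketch (illustrative names, arbitrary covariance) checks $\mathbb{E}[\|X\|^2] = \mathrm{tr}(\mathrm{Cov}(X))$ for a zero-mean vector by Monte Carlo:

```python
# Minimal check of E[||X||^2] = tr(Cov(X)) for a zero-mean random vector,
# the A = I special case of the expectation-of-a-quadratic-form identity.
import numpy as np

rng = np.random.default_rng(1)
n, num_samples = 4, 500_000

B = rng.standard_normal((n, n))
Sigma = B @ B.T  # arbitrary positive semi-definite covariance

X = rng.multivariate_normal(np.zeros(n), Sigma, size=num_samples)

empirical = np.mean(np.sum(X**2, axis=1))
print(empirical, np.trace(Sigma))  # the two agree up to sampling noise
```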
So even in the best-case “perfect whitening” situation, the diagonal term implied by the post’s equation would be $\mathbb{E}[y_i^\top y_i]=n$, not 1. Therefore $\mathbb{E}[y_i^\top y_j]=\delta_{ij}$ (which requires $\mathbb{E}[y_i^\top y_i]=1$) is not correct as stated.
Bottom line
Whitening targets $\mathrm{Cov}(Y)=I$. The post’s claim instead asserts an orthonormality condition across sample vectors $y_i$ that does not follow from whitening, and its diagonal magnitude is wrong for $n$-dimensional whitened vectors.
2 sources
- Whitening transformation - Wikipedia
“A whitening transformation … transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix …” and (in the Definition section) assumes “non-singular covariance matrix Σ” and describes Y=WX as producing unit (identity) covariance.
- Multivariate random variable - Wikipedia
In the “Expectation of a quadratic form” section: $\mathbb{E}[X^\top A X] = \mathbb{E}[X]^\top A\,\mathbb{E}[X] + \mathrm{tr}(A K_{XX})$, where $K_{XX}$ is the covariance matrix. Setting $A=I$ and mean 0 gives $\mathbb{E}[X^\top X] = \mathrm{tr}(\mathrm{Cov}(X))$.
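The general quadratic-form identity cited above can also be checked by Monte Carlo. This sketch (illustrative names; arbitrary mean, covariance, and matrix $A$) compares the sample average of $x^\top A x$ against $\mu^\top A \mu + \mathrm{tr}(A K_{XX})$:

```python
# Numerical check of E[X^T A X] = E[X]^T A E[X] + tr(A K_XX).
import numpy as np

rng = np.random.default_rng(2)
n, num_samples = 3, 500_000

mu = rng.standard_normal(n)      # nonzero mean E[X]
B = rng.standard_normal((n, n))
K = B @ B.T                      # covariance K_XX
A = rng.standard_normal((n, n))  # arbitrary square matrix

X = rng.multivariate_normal(mu, K, size=num_samples)

# Per-sample quadratic forms x^T A x, averaged over all samples.
lhs = np.mean(np.einsum('si,ij,sj->s', X, A, X))
rhs = mu @ A @ mu + np.trace(A @ K)
print(lhs, rhs)  # agree up to Monte Carlo error
```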