All corrections
LessWrong February 25, 2026 at 10:30 PM

www.lesswrong.com/posts/eczwWrmX5XNEo7JsS/intricacies-of-feature-geometry-in-lar...

1 correction found

Claim
Let $\{x_1, x_2, \ldots, x_k\} \in \mathbb{R}^n$ be zero-mean random vectors with covariance matrix $\Sigma$, where $n > k$. Then there exists a whitening transformation $W = \Sigma^{-1/2}$ such that the transformed vectors $\{y_i = W x_i\}_{i=1}^{k}$ satisfy: $\mathbb{E}[y_i^\top y_j] = \delta_{ij}$
Correction

Whitening makes a random vector’s *covariance matrix* the identity; it does not make the *expected dot product between different sample vectors* equal the Kronecker delta. In particular, for a whitened n-dimensional vector with identity covariance and mean 0, the expected squared norm is n (not 1).

Full reasoning

The post claims that after whitening with $W = \Sigma^{-1/2}$, the transformed vectors $y_i$ satisfy $\mathbb{E}[y_i^\top y_j] = \delta_{ij}$.

What whitening actually guarantees

A standard statement of whitening is: if $X$ is a random vector with mean 0 and covariance $\Sigma$, then a whitening matrix $W$ produces $Y = WX$ with covariance equal to the identity matrix (components uncorrelated, each with variance 1). That is, whitening is about $\mathrm{Cov}(Y) = I$, not about making a collection of different draws $y_1, \dots, y_k$ form an orthonormal set in expectation.
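A minimal NumPy sketch (illustrative, not from the post) makes the distinction concrete: whitening with $W = \Sigma^{-1/2}$ drives the sample covariance of $Y$ to the identity, while the expected squared norm of each whitened vector comes out near $n$, not 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# An arbitrary non-singular covariance matrix Sigma.
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)

# ZCA whitening matrix W = Sigma^{-1/2} via the eigendecomposition.
evals, evecs = np.linalg.eigh(Sigma)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T

# Draw many zero-mean samples with covariance Sigma and whiten them.
X = rng.multivariate_normal(np.zeros(n), Sigma, size=200_000)
Y = X @ W.T

# Whitening guarantees Cov(Y) ~= I ...
print(np.round(np.cov(Y.T), 2))
# ... which forces E[||Y||^2] = tr(I) = n, not 1.
print(np.mean(np.sum(Y ** 2, axis=1)))
```

The choice of ZCA whitening here is arbitrary; any $W$ with $W \Sigma W^\top = I$ gives the same covariance and the same expected squared norm.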

Wikipedia’s definition explicitly frames whitening as producing a random vector whose covariance is the identity matrix. (It also notes this assumes a non-singular covariance matrix.)

Why $\mathbb{E}[y_i^\top y_i] \neq 1$ under whitening in $\mathbb{R}^n$

If $Y \in \mathbb{R}^n$ has mean 0 and covariance $\mathrm{Cov}(Y) = I$, then the expected squared norm is

$$\mathbb{E}[\|Y\|^2] = \mathbb{E}[Y^\top Y] = \mathrm{tr}(\mathrm{Cov}(Y)) = \mathrm{tr}(I) = n.$$

This is a standard identity (a special case of the “expectation of a quadratic form” formula).

So even in the best-case “perfect whitening” situation, the diagonal term implied by the post’s equation would be $\mathbb{E}[y_i^\top y_i] = n$, not 1. Therefore $\mathbb{E}[y_i^\top y_j] = \delta_{ij}$ (which requires $\mathbb{E}[y_i^\top y_i] = 1$) is not correct as stated.
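A small Monte Carlo sketch (an illustration, not the post's setup) of the expected Gram matrix $\mathbb{E}[y_i^\top y_j]$ for independent whitened draws: the diagonal concentrates near $n$, and the off-diagonal is near 0 only because the draws are independent, not because of anything whitening guarantees.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, trials = 8, 4, 100_000

# y_1, ..., y_k: independent whitened draws (mean 0, Cov = I).
Y = rng.standard_normal((trials, k, n))

# Monte Carlo estimate of the expected Gram matrix E[y_i^T y_j].
G = np.einsum('tid,tjd->ij', Y, Y) / trials
print(np.round(G, 2))
# Diagonal ~= n (not 1); off-diagonal ~= 0 for independent draws.
```

So under these assumptions the Gram matrix in expectation is $n \, \delta_{ij}$, not $\delta_{ij}$, matching the correction above.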

Bottom line

Whitening targets $\mathrm{Cov}(Y) = I$. The post’s claim instead asserts an orthonormality condition across sample vectors $y_i$ that does not follow from whitening, and its diagonal magnitude is wrong for $n$-dimensional whitened vectors.

2 sources
  • Whitening transformation - Wikipedia

    “A whitening transformation … transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix …” and (in the Definition section) assumes “non-singular covariance matrix Σ” and describes Y=WX as producing unit (identity) covariance.

  • Multivariate random variable - Wikipedia

    In the “Expectation of a quadratic form” section: $\mathbb{E}[X^\top A X] = \mathbb{E}[X]^\top A \, \mathbb{E}[X] + \mathrm{tr}(A K_{XX})$, where $K_{XX}$ is the covariance matrix. Setting $A = I$ and mean 0 gives $\mathbb{E}[X^\top X] = \mathrm{tr}(\mathrm{Cov}(X))$.

Model: OPENAI_GPT_5 Prompt: v1.6.0