LessWrong May 27, 2026 at 06:13 PM

standard-deviations-from-just-two-valu...

2 corrections found

Claim

the appropriate estimation of the standard deviation (corrected through the t distribution) is 1.3 times the distance between the two numbers we have.

Correction

This uses the wrong distribution and the wrong factor. Student’s t adjusts confidence intervals for the mean when σ is unknown; for σ itself, the standard small-sample distribution is chi-square, and with two observations from a normal distribution the usual unbiased estimate is about 0.886× the distance, not 1.3×.

Full reasoning

NIST’s handbook gives the usual small-sample confidence interval for a mean as

\[
\bar Y \pm t_{1-\alpha/2,\,N-1}\,\frac{s}{\sqrt N},
\]

so the t distribution is being used to adjust uncertainty in the mean, not to turn a two-point distance into an estimate of the population standard deviation. Separately, NIST states that confidence intervals for the true standard deviation are constructed from the chi-square distribution, not the t distribution.
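
To make the distinction concrete, here is a minimal sketch of both intervals for a sample of two, using only the Python standard library. It relies on two facts specific to one degree of freedom: Student's t is then the Cauchy distribution, with quantile tan(π(p − 1/2)), and a chi-square variable is the square of a standard normal. The function names are hypothetical, and the sample values 43 and 47 are borrowed from the article's example rather than from NIST.

```python
import math
from statistics import NormalDist, mean, stdev


def t_quantile_df1(p):
    """Quantile of Student's t with 1 degree of freedom (the Cauchy distribution)."""
    return math.tan(math.pi * (p - 0.5))


def chi2_quantile_df1(p):
    """Quantile of chi-square with 1 degree of freedom, using chi2 = Z**2."""
    return NormalDist().inv_cdf((1 + p) / 2) ** 2


def mean_interval_n2(x1, x2, alpha=0.05):
    """t-based confidence interval for the mean of two observations."""
    s = stdev([x1, x2])
    half_width = t_quantile_df1(1 - alpha / 2) * s / math.sqrt(2)
    centre = mean([x1, x2])
    return centre - half_width, centre + half_width


def sigma_interval_n2(x1, x2, alpha=0.05):
    """Chi-square-based confidence interval for sigma from two observations."""
    s = stdev([x1, x2])  # with n = 2, (n - 1) * s**2 is just s**2
    lower = s / math.sqrt(chi2_quantile_df1(1 - alpha / 2))
    upper = s / math.sqrt(chi2_quantile_df1(alpha / 2))
    return lower, upper


print("95% interval for the mean:", mean_interval_n2(43, 47))
print("95% interval for sigma:   ", sigma_interval_n2(43, 47))
```

Both intervals come out very wide, which is expected with a single degree of freedom. The point of the sketch is only that the t quantile enters the interval for the mean, while the chi-square quantiles enter the interval for σ.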

NIST also notes that the sample standard deviation is biased: if the data are normal, \(E[s]=c_4\sigma\), where \(c_4\) depends on sample size. For \(n=2\), the NIST formula gives

\[
c_4=\sqrt{2}\,/\sqrt{\pi}\approx 0.7979.
\]
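
The constant can be checked numerically. The sketch below assumes the standard closed form \(c_4(n)=\sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)\) for normal data; the function name `c4` is an illustrative choice.

```python
import math


def c4(n):
    """Bias-correction constant for normal data: E[s] = c4(n) * sigma."""
    if n < 2:
        raise ValueError("c4 needs at least two observations")
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


# The bias shrinks quickly as the sample grows.
for n in (2, 5, 10):
    print(n, round(c4(n), 4))
```

For \(n=2\) this reproduces the value 0.7979 quoted above, and the constant approaches 1 as the sample size increases.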

With two observations \(x_1, x_2\), the usual sample standard deviation is

\[
s=\frac{|x_1-x_2|}{\sqrt 2}.
\]

So an unbiased estimate of \(\sigma\) is

\[
\hat\sigma=\frac{s}{c_4}=\frac{|x_1-x_2|/\sqrt2}{\sqrt2/\sqrt\pi}=\frac{\sqrt\pi}{2}|x_1-x_2|\approx 0.886\,|x_1-x_2|.
\]

That is far below 1.3 times the distance. The article’s factor comes from applying a t-based correction to the wrong quantity: t is for inference on the mean, whereas inference on standard deviation uses chi-square, and the standard bias correction for two observations is about 0.886× the distance between them, not 1.3×.
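
The unbiasedness claim can be checked by simulation. The sketch below draws many pairs from a normal distribution with known σ and averages a multiple of the distance between the two values. The function name, seed, and trial count are arbitrary choices, and normality is an assumption of the check, not something the article states.

```python
import math
import random


def average_estimate(factor, sigma=1.0, trials=200_000, seed=0):
    """Average of factor * |x1 - x2| over many simulated normal pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x1 = rng.gauss(0.0, sigma)
        x2 = rng.gauss(0.0, sigma)
        total += factor * abs(x1 - x2)
    return total / trials


unbiased_factor = math.sqrt(math.pi) / 2  # about 0.886
print("0.886 x distance, on average:", average_estimate(unbiased_factor))
print("1.3 x distance, on average:  ", average_estimate(1.3))
```

Under these assumptions the first average should land close to the true σ of 1, while the 1.3× multiplier should average near \(1.3 \times 2/\sqrt\pi \approx 1.47\), since \(E|x_1-x_2| = 2\sigma/\sqrt\pi\) for normal data.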

3 sources

NIST/SEMATECH e-Handbook of Statistical Methods — Confidence Limits for the Mean
Confidence limits are defined as: \[ \bar{Y} \pm t_{1 - \alpha/2, \, N-1} \,\, \frac{s}{\sqrt{N}} \] where \(\bar{Y}\) is the sample mean, s is the sample standard deviation, N is the sample size...
NIST/SEMATECH e-Handbook of Statistical Methods — Confidence interval approach
Confidence intervals for the true standard deviation can be constructed using the chi-square distribution.
NIST/SEMATECH e-Handbook of Statistical Methods — What are Variables Control Charts?
If the underlying distribution is normal, then s actually estimates c4 · σ, where c4 is a constant that depends on the sample size n... So the mean or expected value of the sample standard deviation is c4 · σ.

Claim

If we multiply that by 1.3, we get our estimation of the standard deviation, which is something like 5 litres.

Correction

The 1.3× rule is not a correct way to estimate standard deviation from two observations. For two values 43 and 47, the usual unbiased estimate of σ (assuming normal data) is about 0.886×4 = 3.54 litres, not about 5 litres.

Full reasoning

This numerical example inherits the same mistake as the earlier rule-of-thumb. NIST’s handbook uses the t distribution for confidence intervals on the mean and the chi-square distribution for inference on the standard deviation; it does not support multiplying a two-point distance by a t-based factor to estimate \(\sigma\).

For two observations \(x_1, x_2\), the sample standard deviation is

\[
s=\frac{|x_1-x_2|}{\sqrt 2}.
\]

NIST also states that for normal data, \(E[s]=c_4\sigma\). With \(n=2\), the NIST formula gives \(c_4=\sqrt2/\sqrt\pi\approx 0.7979\), so an unbiased estimator is

\[
\hat\sigma=\frac{s}{c_4}=\frac{\sqrt\pi}{2}|x_1-x_2|\approx 0.886\,|x_1-x_2|.
\]

Plugging in the article’s values, \(|47-43|=4\), so the unbiased estimate is

\[
0.886\times 4 \approx 3.54\text{ litres},
\]

not “something like 5 litres.” The 5-litre figure comes from applying a t-based correction to the wrong quantity.
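
The arithmetic for the article's two values can be reproduced in a few lines. This is a sketch using Python's `statistics.stdev` for the sample standard deviation; the variable names are illustrative, and the \(c_4\) value is the \(n=2\) constant for normal data.

```python
import math
from statistics import stdev

observations = [43.0, 47.0]  # litres, the two values from the article
distance = abs(observations[1] - observations[0])

s = stdev(observations)            # sample standard deviation, |x1 - x2| / sqrt(2)
c4_n2 = math.sqrt(2 / math.pi)     # bias constant for n = 2, normal data
sigma_unbiased = s / c4_n2         # equals (sqrt(pi) / 2) * distance
sigma_rule = 1.3 * distance        # the article's 1.3x rule

print(f"sample standard deviation s: {s:.3f} litres")
print(f"unbiased estimate s / c4:    {sigma_unbiased:.3f} litres")
print(f"1.3 x distance:              {sigma_rule:.3f} litres")
```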

3 sources

NIST/SEMATECH e-Handbook of Statistical Methods — Confidence Limits for the Mean
Confidence limits are defined as: \[ \bar{Y} \pm t_{1 - \alpha/2, \, N-1} \,\, \frac{s}{\sqrt{N}} \] where \(\bar{Y}\) is the sample mean, s is the sample standard deviation, N is the sample size...
NIST/SEMATECH e-Handbook of Statistical Methods — Confidence interval approach
Confidence intervals for the true standard deviation can be constructed using the chi-square distribution.
NIST/SEMATECH e-Handbook of Statistical Methods — What are Variables Control Charts?
If the underlying distribution is normal, then s actually estimates c4 · σ, where c4 is a constant that depends on the sample size n... So the mean or expected value of the sample standard deviation is c4 · σ.

Model: OPENAI_GPT_5 Prompt: v1.16.0