www.lesswrong.com/posts/qRWRjDQdBWRZFBxTv/standard-deviations-from-just-two-valu...
2 corrections found
the appropriate estimation of the standard deviation (corrected through the t distribution) is 1.3 times the distance between the two numbers we have.
This uses the wrong distribution and the wrong factor. Student’s t adjusts confidence intervals for the mean when σ is unknown; for σ itself, the standard small-sample distribution is chi-square, and with two observations the usual unbiased estimate is about 0.886× the distance, not 1.3×.
Full reasoning
NIST’s handbook gives the usual small-sample confidence interval for a mean as
\[
\bar Y \pm t_{1-\alpha/2,\,N-1}\,\frac{s}{\sqrt N},
\]
so the t distribution is being used to adjust uncertainty in the mean, not to turn a two-point distance into an estimate of the population standard deviation. Separately, NIST states that confidence intervals for the true standard deviation are constructed from the chi-square distribution, not the t distribution.
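To make the chi-square point concrete, here is a minimal sketch of a confidence interval for \(\sigma\) from just two observations, assuming the data are normal. The function name `sigma_ci_two_points` and the sample inputs are hypothetical, chosen only for illustration. With two observations there is one degree of freedom, and a chi-square variable with one degree of freedom is the square of a standard normal, so the quantiles can be taken from the standard library's `NormalDist` without a statistics package.

```python
import math
from statistics import NormalDist


def sigma_ci_two_points(x1, x2, confidence=0.95):
    """Chi-square confidence interval for sigma from two normal observations.

    Hypothetical helper for illustration. With n = 2 there is 1 degree of
    freedom, so (n - 1) * s^2 / sigma^2 follows a chi-square(1) distribution.
    """
    alpha = 1.0 - confidence
    s_squared = (x1 - x2) ** 2 / 2.0  # sample variance for n = 2

    # A chi-square(1) quantile at probability p equals inv_cdf((1 + p) / 2) ** 2,
    # because chi-square(1) is the square of a standard normal.
    z = NormalDist()
    chi2_upper = z.inv_cdf(1.0 - alpha / 4.0) ** 2  # quantile at 1 - alpha/2
    chi2_lower = z.inv_cdf(0.5 + alpha / 4.0) ** 2  # quantile at alpha/2

    lower = math.sqrt(s_squared / chi2_upper)
    upper = math.sqrt(s_squared / chi2_lower)
    return lower, upper


# Illustrative inputs only; any two readings work.
low, high = sigma_ci_two_points(10.0, 12.0)
print(f"95% interval for sigma: {low:.2f} to {high:.2f}")
```

Under these assumptions the 95% interval runs from roughly 0.45 to roughly 32 times the sample standard deviation, which shows how little two observations constrain \(\sigma\).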
NIST also notes that the sample standard deviation is biased: if the data are normal, \(E[s]=c_4\sigma\), where \(c_4\) depends on sample size. For \(n=2\), the NIST formula gives
\[
c_4=\sqrt{2}\,/\sqrt{\pi}\approx 0.7979.
\]
With two observations \(x_1,x_2\), the usual sample standard deviation is
\[
s=\frac{|x_1-x_2|}{\sqrt 2}.
\]
So an unbiased estimate of \(\sigma\) is
\[
\hat\sigma=\frac{s}{c_4}=\frac{|x_1-x_2|/\sqrt2}{\sqrt2/\sqrt\pi}=\frac{\sqrt\pi}{2}|x_1-x_2|\approx 0.886\,|x_1-x_2|.
\]
That is far below 1.3 times the distance. The article’s factor comes from applying a t-based correction to the wrong quantity: t is for inference on the mean, whereas inference on standard deviation uses chi-square, and the standard bias correction for two observations is about 0.886× the distance between them, not 1.3×.
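The 0.886 factor can be checked numerically. The sketch below is an illustration rather than NIST's own code: it computes \(c_4\) from its gamma-function form, \(c_4=\sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)\), and then runs a small Monte Carlo simulation on normal data. The function names, seed, and trial count are assumptions made for the example.

```python
import math
import random


def c4(n):
    """Bias factor with E[s] = c4(n) * sigma for normal samples of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)


def unbiased_sigma_two_points(x1, x2):
    """Bias-corrected estimate of sigma from two observations (normal data assumed)."""
    s = abs(x1 - x2) / math.sqrt(2.0)  # sample standard deviation for n = 2
    return s / c4(2)                   # equals sqrt(pi) / 2 * |x1 - x2|


def simulate_mean_estimate(sigma=1.0, trials=200_000, seed=0):
    """Average the two-point estimate over many simulated normal pairs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(trials):
        total += unbiased_sigma_two_points(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return total / trials


print(f"c4(2) = {c4(2):.4f}")
print(f"factor on the distance = {unbiased_sigma_two_points(0.0, 1.0):.4f}")
print(f"simulated mean estimate for sigma = 1: {simulate_mean_estimate():.3f}")
```

If the estimator is unbiased, the simulated average should land close to the true \(\sigma\) of 1. Multiplying the distance by 1.3 instead would overshoot on average by a factor of about 1.3 / 0.886, or roughly 1.47.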
3 sources
- NIST/SEMATECH e-Handbook of Statistical Methods — Confidence Limits for the Mean
Confidence limits are defined as: \[ \bar{Y} \pm t_{1 - \alpha/2, \, N-1} \,\, \frac{s}{\sqrt{N}} \] where \(\bar{Y}\) is the sample mean, s is the sample standard deviation, N is the sample size...
- NIST/SEMATECH e-Handbook of Statistical Methods — Confidence interval approach
Confidence intervals for the true standard deviation can be constructed using the chi-square distribution.
- NIST/SEMATECH e-Handbook of Statistical Methods — What are Variables Control Charts?
If the underlying distribution is normal, then s actually estimates c4 · σ, where c4 is a constant that depends on the sample size n... So the mean or expected value of the sample standard deviation is c4 · σ.
If we multiply that by 1.3, we get our estimation of the standard deviation, which is something like 5 litres.
The 1.3× rule is not a correct way to estimate standard deviation from two observations. For two values 43 and 47, the usual unbiased estimate of σ is about 0.886×4 = 3.54 litres, not about 5 litres.
Full reasoning
This numerical example inherits the same mistake as the earlier rule-of-thumb. NIST’s handbook uses the t distribution for confidence intervals on the mean and the chi-square distribution for inference on the standard deviation; it does not support multiplying a two-point distance by a t-based factor to estimate \(\sigma\).
For two observations \(x_1,x_2\), the sample standard deviation is
\[
s=\frac{|x_1-x_2|}{\sqrt 2}.
\]
NIST also states that for normal data, \(E[s]=c_4\sigma\). With \(n=2\), the NIST formula gives \(c_4=\sqrt2/\sqrt\pi\approx 0.7979\), so an unbiased estimator is
\[
\hat\sigma=\frac{s}{c_4}=\frac{\sqrt\pi}{2}|x_1-x_2|\approx 0.886\,|x_1-x_2|.
\]
Plugging in the article’s values, \(|47-43|=4\), so the unbiased estimate is
\[
0.886\times 4 \approx 3.54\text{ litres},
\]
not “something like 5 litres.” The 5-litre figure comes from applying a t-based correction to the wrong quantity.
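The arithmetic for the article's two values can be reproduced in a few lines. This is a sketch assuming normal data, and the variable names are chosen here only for illustration.

```python
import math

x1, x2 = 43.0, 47.0                # the two values from the article, in litres
distance = abs(x1 - x2)            # 4 litres

s = distance / math.sqrt(2.0)      # sample standard deviation for n = 2
c4_n2 = math.sqrt(2.0 / math.pi)   # bias factor for n = 2, about 0.7979
sigma_hat = s / c4_n2              # bias-corrected estimate of sigma

rule_of_thumb = 1.3 * distance     # the article's 1.3x rule, for comparison

print(f"s = {s:.2f} litres")
print(f"bias-corrected estimate = {sigma_hat:.2f} litres")
print(f"1.3x rule = {rule_of_thumb:.2f} litres")
```

The bias-corrected figure comes out near 3.54 litres, against 5.2 litres from the 1.3× rule.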
3 sources
- NIST/SEMATECH e-Handbook of Statistical Methods — Confidence Limits for the Mean
Confidence limits are defined as: \[ \bar{Y} \pm t_{1 - \alpha/2, \, N-1} \,\, \frac{s}{\sqrt{N}} \] where \(\bar{Y}\) is the sample mean, s is the sample standard deviation, N is the sample size...
- NIST/SEMATECH e-Handbook of Statistical Methods — Confidence interval approach
Confidence intervals for the true standard deviation can be constructed using the chi-square distribution.
- NIST/SEMATECH e-Handbook of Statistical Methods — What are Variables Control Charts?
If the underlying distribution is normal, then s actually estimates c4 · σ, where c4 is a constant that depends on the sample size n... So the mean or expected value of the sample standard deviation is c4 · σ.