en.wikipedia.org/wiki/Point-biserial_correlation_coefficient
1 correction found
If it can be assumed that the dichotomous variable Y is normally distributed, a better descriptive index is given by the biserial coefficient:
This is misstated: a dichotomous variable itself cannot be normally distributed. The biserial correlation assumes the observed binary variable comes from an underlying continuous, normally distributed latent variable that has been dichotomized.
Full reasoning
A dichotomous variable takes only two values, so it is a discrete variable. A normal distribution is a continuous distribution, not a two-point distribution.
For the biserial correlation, standard references do not assume that the observed binary variable itself is normal. Instead, they assume that the binary variable is a dichotomized measurement of an underlying continuous variable that is normally distributed:
- SAS states that the biserial correlation is used when the binary variable "has an underlying continuous distribution but is measured as binary."
- NCSS explains the biserial correlation by starting with bivariate normal variables and then dichotomizing one of them to create the binary variable.
- NIST defines the normal distribution as a continuous distribution, which a variable with only two values cannot follow.
So the sentence should refer to an underlying continuous, normally distributed variable, not to the observed dichotomous variable itself being normally distributed.
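The distinction can be illustrated with a small simulation. The following is a minimal sketch, not taken from any of the cited sources: it draws a continuous X and a normally distributed latent variable with a chosen correlation, dichotomizes the latent variable at a threshold to get the observed binary Y, and then compares the point-biserial correlation (Pearson's r between X and the 0/1 variable) with the biserial estimate. It assumes the usual textbook conversion r_b = r_pb * sqrt(p*q) / y, where p is the proportion of ones, q = 1 - p, and y is the standard normal density at the cut point. The function names (`pearson`, `biserial_from_point_biserial`, `simulate`) and the parameter values are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Plain Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def biserial_from_point_biserial(r_pb, p):
    """Convert a point-biserial r to a biserial estimate.

    Assumes the binary variable is a dichotomized normal latent variable.
    p is the observed proportion of ones and must lie strictly in (0, 1).
    """
    q = 1.0 - p
    std_normal = NormalDist()
    z = std_normal.inv_cdf(p)   # cut point that splits the normal curve into p and q
    y = std_normal.pdf(z)       # density (ordinate) at that cut point
    return r_pb * math.sqrt(p * q) / y


def simulate(rho=0.6, threshold=0.5, n=20000, seed=0):
    """Illustrative parameters only: latent correlation rho, cut at threshold."""
    rng = random.Random(seed)
    xs, latent = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(x)
        # Latent variable is standard normal with correlation rho to x.
        latent.append(rho * x + math.sqrt(1.0 - rho ** 2) * e)
    # The observed Y is only the dichotomized version of the latent variable.
    ys = [1 if v > threshold else 0 for v in latent]
    p = sum(ys) / n
    r_latent = pearson(xs, latent)   # correlation with the unobserved variable
    r_pb = pearson(xs, ys)           # point-biserial: uses only the 0/1 values
    r_b = biserial_from_point_biserial(r_pb, p)
    return r_latent, r_pb, r_b


if __name__ == "__main__":
    r_latent, r_pb, r_b = simulate()
    print(f"latent r = {r_latent:.3f}, point-biserial = {r_pb:.3f}, biserial = {r_b:.3f}")
```

Under these assumptions the point-biserial value comes out smaller in magnitude than the latent correlation, while the biserial estimate lands close to it. The normality assumption enters only through the latent variable and the cut point; the observed Y stays a two-valued variable throughout.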
3 sources
- SAS Sample 24991: Compute biserial, point biserial, and rank biserial correlations
The biserial correlation measures the strength of the relationship between a binary and a continuous variable, where the binary variable has an underlying continuous distribution but is measured as binary.
- NCSS: Point-Biserial and Biserial Correlations
Suppose you want to find the correlation between a pair of bivariate normal random variables when one has been dichotomized... The biserial correlation is an estimate of the original product-moment correlation constructed from the point-biserial correlation.
- NIST Glossary: Normal (Gaussian) Distribution
A continuous distribution whose density function is given by ...