Tetrachoric

Tetrachoric correlation is a statistic used to estimate the correlation between two latent continuous variables from a pair of observed binary variables. The concept assumes that each binary variable results from thresholding an underlying normally distributed variable, and that the two latent variables follow a bivariate normal distribution. Under this model, the observed 0/1 outcomes reflect whether each latent variable exceeds its threshold.
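The latent threshold model can be made concrete with a short simulation. The following is a minimal Python sketch under stated assumptions, not code from any particular package. The helper `simulate_binary_pair` and its parameters are hypothetical: `rho` is the latent correlation, `tau_x` and `tau_y` are the thresholds, and a binary variable is coded 1 when its latent value exceeds its threshold. Counts are returned as `(n00, n01, n10, n11)`, where the first digit is X and the second is Y.

```python
import math
import random


def simulate_binary_pair(n, rho, tau_x, tau_y, seed=0):
    """Hypothetical helper illustrating the latent threshold model.

    Draws n pairs (z1, z2) from a standard bivariate normal distribution with
    correlation rho, dichotomizes each at its threshold, and returns the 2x2
    counts (n00, n01, n10, n11), where n01 counts X = 0 and Y = 1.
    """
    rng = random.Random(seed)
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 has unit variance and correlation rho with z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Observed 0/1 outcome: 1 if the latent value exceeds its threshold.
        x = 1 if z1 > tau_x else 0
        y = 1 if z2 > tau_y else 0
        counts[(x, y)] += 1
    return counts[(0, 0)], counts[(0, 1)], counts[(1, 0)], counts[(1, 1)]
```

With a fixed seed the counts are reproducible, which makes a simulator like this convenient for checking an estimator against a known latent correlation.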

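Calculation involves a 2x2 contingency table of the binary data. The threshold for each variable is derived from its marginal proportion: if a proportion p of cases score 0 on a variable, its threshold τ satisfies Φ(τ) = p, where Φ is the standard normal cumulative distribution function. With both thresholds fixed, the correlation ρ is chosen so that the bivariate normal probability of falling below both thresholds equals the observed proportion of cases scoring 0 on both variables. Because the margins are already matched, this reproduces the whole table, so for a 2x2 table it is also the maximum likelihood estimate.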

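Uses and interpretation: The tetrachoric correlation is preferred over the phi coefficient when the binary measures are regarded as dichotomized versions of underlying continuous traits, such as pass/fail test items or present/absent symptom ratings. The phi coefficient is the ordinary Pearson correlation of the 0/1 codes, so its value depends on where the thresholds fall and never exceeds the latent correlation in magnitude; the tetrachoric estimate targets the latent correlation directly. Matrices of tetrachoric correlations are commonly used as input to factor analysis of binary items.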
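The gap between the two coefficients is easiest to see in the special case where both variables are split at the median, so both thresholds are 0. Then P(X = 0, Y = 0) = 1/4 + arcsin(ρ)/(2π), which gives φ = (2/π)·arcsin(ρ), or equivalently ρ = sin(πφ/2). The Python sketch below relies on that closed form; the function names are hypothetical, and the closed form does not apply to tables with uneven splits, which need a numerical solution.

```python
import math


def phi_coefficient(n00, n01, n10, n11):
    """Pearson correlation of the 0/1 codes (the phi coefficient)."""
    row0, row1 = n00 + n01, n10 + n11  # margins of X
    col0, col1 = n00 + n10, n01 + n11  # margins of Y
    return (n00 * n11 - n01 * n10) / math.sqrt(row0 * row1 * col0 * col1)


def tetrachoric_median_split(n00, n01, n10, n11):
    """Hypothetical closed-form tetrachoric correlation for 50/50 splits only.

    With both thresholds at 0, phi = (2 / pi) * asin(rho), so rho = sin(pi * phi / 2).
    """
    n = n00 + n01 + n10 + n11
    if 2 * (n00 + n01) != n or 2 * (n00 + n10) != n:
        raise ValueError("closed form requires both variables split 50/50")
    return math.sin(math.pi * phi_coefficient(n00, n01, n10, n11) / 2.0)
```

For the table (200, 100, 100, 200), `phi_coefficient` gives 1/3, while `tetrachoric_median_split` gives 0.5 up to floating-point rounding: the same data imply a noticeably stronger latent association than the phi coefficient suggests.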

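Limitations: The method relies on the normality and threshold assumptions. Estimates can be unstable with small samples or when a cell of the table is empty or nearly so; a zero in any cell pushes the estimate to +1 or -1, and moving a single observation between cells can change the estimate substantially.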

Implementations: polychoric/tetrachoric