Tetrachoric

Tetrachoric correlation is a statistic used to estimate the correlation between two latent continuous variables from a pair of observed binary variables. The concept assumes that each binary variable results from thresholding an underlying normally distributed variable, and that the two latent variables follow a bivariate normal distribution. Under this model, the observed 0/1 outcomes reflect whether each latent variable exceeds its threshold.
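To make the latent-variable model concrete, the following minimal sketch simulates it: it draws pairs from a standard bivariate normal distribution with a chosen correlation, dichotomizes each coordinate at a threshold, and tallies the resulting 2x2 table. The function name simulate_table, its parameters, and the example values (rho = 0.6, both thresholds at 0) are illustrative assumptions and are not part of any particular package.

```python
import math
import random


def simulate_table(rho, tau_x, tau_y, n, seed=0):
    """Draw n latent bivariate normal pairs and dichotomize them.

    Returns the 2x2 counts (n11, n10, n01, n00), where the first index is
    the binary x outcome and the second is the binary y outcome.
    (Hypothetical helper, written only for illustration.)
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Build two standard normal variables with correlation rho.
        latent_x = z1
        latent_y = rho * z1 + math.sqrt(1 - rho * rho) * z2
        # An observed outcome is 1 when the latent value exceeds its threshold.
        x = int(latent_x > tau_x)
        y = int(latent_y > tau_y)
        counts[(x, y)] += 1
    return counts[(1, 1)], counts[(1, 0)], counts[(0, 1)], counts[(0, 0)]


table = simulate_table(rho=0.6, tau_x=0.0, tau_y=0.0, n=20000)
print(table)
```

Only the four counts are observed in practice; the latent values and the correlation that generated them are what the tetrachoric correlation attempts to recover.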

Calculation involves a 2x2 contingency table of the binary data. The thresholds for each variable are derived from their marginal proportions using the inverse standard normal CDF. The correlation parameter, rho, that defines the bivariate normal distribution is then estimated by maximum likelihood, maximizing the probability of the observed counts for that rho and the fixed thresholds. There is no closed-form formula for rho; numerical optimization is required. The resulting estimate lies in the interval [-1, 1].

Uses and interpretation: The tetrachoric correlation is preferred over the phi coefficient when the binary measures are thought to reflect underlying continuous traits that are normally distributed, rather than being purely discrete. It is commonly used in psychometrics, item response theory, and factor analysis of binary data, as well as in meta-analytic synthesis and structural equation modeling.

Limitations: The method relies on the normality and threshold assumptions. Estimates can be unstable with small sample sizes or imbalanced margins; sparse data, boundary cells, or zero counts can cause convergence issues. Software implementations exist in statistical packages, often as part of routines for polychoric/tetrachoric correlations.
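
As an illustration of the calculation, the sketch below estimates rho from the four counts of a 2x2 table using only the Python standard library. It is a simplified example under stated assumptions, not the routine used by any specific package. The thresholds come from the marginal proportions via the inverse standard normal CDF. Instead of a general-purpose optimizer, the sketch uses bisection: with the thresholds fixed at the margins and all four cells nonzero, the likelihood of a single 2x2 table is maximized where the model probability of the (1, 1) cell equals its observed proportion, and that probability increases with rho. The function names upper_orthant and tetrachoric, the Simpson's-rule integration, and the clipping of rho to [-0.9999, 0.9999] are choices made for this example.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def upper_orthant(h, k, rho, steps=200):
    """P(X > h, Y > k) for a standard bivariate normal with correlation rho.

    Starts from the independent case (rho = 0) and adds the integral of the
    bivariate density at (h, k) over correlations from 0 to rho, using the
    substitution r = sin(theta) and Simpson's rule (steps must be even).
    """
    independent = (1 - _STD.cdf(h)) * (1 - _STD.cdf(k))
    upper = math.asin(rho)
    if upper == 0.0:
        return independent

    def integrand(theta):
        s, c2 = math.sin(theta), math.cos(theta) ** 2
        return math.exp(-(h * h - 2 * h * k * s + k * k) / (2 * c2))

    width = upper / steps  # negative when rho < 0, giving a signed integral
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * width)
    return independent + total * width / 3 / (2 * math.pi)


def tetrachoric(n11, n10, n01, n00, tol=1e-10):
    """Estimate rho from 2x2 counts; n11 is the count with both outcomes 1."""
    n = n11 + n10 + n01 + n00
    # Thresholds from the margins: P(latent > tau) equals the proportion of 1s.
    # inv_cdf raises StatisticsError if a margin is exactly 0 or 1.
    tau_x = _STD.inv_cdf(1 - (n11 + n10) / n)
    tau_y = _STD.inv_cdf(1 - (n11 + n01) / n)
    target = n11 / n
    lo, hi = -0.9999, 0.9999  # clipped away from the boundary (assumption)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_orthant(tau_x, tau_y, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Both margins are 50%, so both thresholds are 0 and rho = sin(0.3 * pi).
print(round(tetrachoric(40, 10, 10, 40), 3))  # 0.809
```

If a cell count is zero, the bisection in this sketch runs to the clipped boundary rather than failing, which mirrors the boundary behavior noted above. Real implementations typically handle such tables explicitly, and their results may differ from this sketch.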