biserial

Biserial refers to a family of statistical measures used to describe the association between a binary (dichotomous) variable and a continuous variable, typically under the assumption that the binary outcome arises from dichotomizing an underlying normal latent variable.

Two common forms are the point-biserial correlation and the biserial correlation. Take a binary variable Y coded 0/1, with p = P(Y = 1) and q = 1 − p, and a continuous variable X with overall mean μ and standard deviation σ. Let μ1 and μ0 denote the means of X in the groups Y = 1 and Y = 0, respectively. The point-biserial correlation is r_pb = (μ1 − μ0) · sqrt(p q) / σ. This is the direct, observed correlation between X and Y. It equals the Pearson correlation between X and the 0/1-coded Y, and the identity is exact when σ is computed with divisor n rather than n − 1.

The biserial correlation, r_b, is an adjusted form that assumes Y is derived by thresholding an underlying latent continuous variable Z. Let z_p be the standard normal quantile corresponding to p (z_p = Φ^−1(p)) and let φ be the standard normal density. Then r_b = (μ1 − μ0) · p q / (σ · φ(z_p)), or equivalently r_b = r_pb · sqrt(p q) / φ(z_p). The factor sqrt(p q) / φ(z_p) is smallest at p = 0.5, where it equals sqrt(π/2) ≈ 1.25, and it grows as p moves toward 0 or 1. Consequently |r_b| always exceeds |r_pb| (unless both are zero), and the gap widens for unbalanced splits. The adjustment incorporates the assumed normal shape of the latent distribution.

Interpretation and use: Both coefficients indicate the strength and direction of association between the continuous variable and the binary variable. As a Pearson correlation, r_pb always lies between −1 and 1. The population value of r_b also lies in that range, but sample estimates of r_b can exceed 1 in absolute value, for example when X is far from normal. The biserial coefficient is used mainly when researchers believe an underlying continuous propensity exists and Y reflects a threshold on that latent variable. Assumptions include approximate normality of X within the Y groups and, for r_b, an appropriate latent-variable model (typically joint normality of X and Z). When those assumptions may not hold, alternative measures can be considered. These include the phi coefficient (if X is itself dichotomized) and rank-based methods.
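As a rough numerical check of the two formulas, here is a minimal sketch that uses only the Python standard library (3.8+ for statistics.NormalDist and fmean). The function names point_biserial, biserial, pearson and simulate are hypothetical and were chosen for illustration. The simulation assumes that (X, Z) is standard bivariate normal with correlation rho and sets Y = 1 when Z exceeds a threshold. This is the latent-variable model that the biserial correlation presupposes. Under that assumption, r_b should recover rho, while r_pb is attenuated.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def _split(x, y):
    """Group X by the 0/1 value of Y and return (x1, x0, p, q, sigma)."""
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    if not x1 or not x0:
        raise ValueError("both Y = 0 and Y = 1 groups must be non-empty")
    p = len(x1) / len(x)
    sigma = pstdev(x)  # population SD (divisor n), so r_pb matches Pearson's r exactly
    return x1, x0, p, 1.0 - p, sigma


def point_biserial(x, y):
    """r_pb = (mu1 - mu0) * sqrt(p q) / sigma."""
    x1, x0, p, q, sigma = _split(x, y)
    return (fmean(x1) - fmean(x0)) * math.sqrt(p * q) / sigma


def biserial(x, y):
    """r_b = (mu1 - mu0) * p q / (sigma * phi(z_p)), assuming a normal latent Z."""
    x1, x0, p, q, sigma = _split(x, y)
    std_normal = NormalDist()
    # phi(z_p); phi is symmetric, so using z_p or z_q gives the same ordinate
    ordinate = std_normal.pdf(std_normal.inv_cdf(p))
    return (fmean(x1) - fmean(x0)) * p * q / (sigma * ordinate)


def pearson(x, y):
    """Plain Pearson correlation, used only to cross-check point_biserial."""
    mx, my = fmean(x), fmean(y)
    sxy = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = math.fsum((a - mx) ** 2 for a in x)
    syy = math.fsum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho, threshold, n, seed=0):
    """Hypothetical latent model: (X, Z) standard bivariate normal, Y = 1 if Z > threshold."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = rho * e1 + math.sqrt(1.0 - rho * rho) * e2  # var(z) = 1, corr(e1, z) = rho
        x.append(e1)
        y.append(1 if z > threshold else 0)
    return x, y


x, y = simulate(rho=0.6, threshold=0.5, n=100_000)
print(f"r_pb    = {point_biserial(x, y):.3f}")
print(f"Pearson = {pearson(x, y):.3f}")
print(f"r_b     = {biserial(x, y):.3f}")
```

With rho = 0.6 and threshold = 0.5 (so p = 1 − Φ(0.5) ≈ 0.31), the population values are 0.6 for r_b and about 0.46 for r_pb. The printed estimates should fall close to those figures, and the Pearson line should match r_pb. σ is taken from statistics.pstdev (divisor n) so that the point-biserial identity with Pearson's r holds exactly. On a tiny sample such as x = [1, 2, 3, 4], y = [0, 0, 1, 1], biserial returns about 1.12, which is an instance of a sample r_b outside [−1, 1].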