Dstatistics

The D-statistic, commonly referred to as the ABBA-BABA test, is a population-genetics statistic used to detect admixture or gene flow among populations or lineages. It analyzes genome-wide single nucleotide polymorphism (SNP) data in a four-taxon framework consisting of an outgroup and three ingroup populations, testing whether the history deviates from a simple bifurcating tree due to introgression.

In practice, the statistic compares two discordant allele-sharing patterns across many loci: ABBA and BABA. ABBA occurs when the first population carries the ancestral allele and the second and third share a derived allele, while BABA occurs when the second population carries the ancestral allele and the first and third share a derived allele. Under a strictly bifurcating history with no gene flow, ABBA and BABA patterns should be equally frequent.

Quantitatively, D is defined as D = (N_ABBA − N_BABA) / (N_ABBA + N_BABA), where N_ABBA and N_BABA are the counts of the respective patterns across loci. The statistic typically ranges between -1 and 1. Significance is assessed with standard error estimates such as block jackknife or bootstrap over loci. A significantly positive D suggests excess allele sharing between the third population and the second, while a significantly negative D suggests excess sharing between the third population and the first.

D-statistics do not specify the direction, timing, or proportion of admixture on their own. They are often followed by extensions such as the f4-statistic and f4-ratio to estimate admixture proportions, and are implemented in tools like ADMIXTOOLS, Dsuite, and ANGSD. Limitations include sensitivity to ancestral structure, selection, data quality, and outgroup choice.
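
The definition of D and the block-jackknife significance test described above can be illustrated with a minimal Python sketch. It is not the implementation used by ADMIXTOOLS, Dsuite, or ANGSD. All function names here (count_patterns, d_statistic, block_jackknife) are hypothetical. The sketch assumes one sampled allele per population per site, coded 0 for ancestral and 1 for derived, with the outgroup defining the ancestral state. It also assumes roughly equal-sized genomic blocks, so it uses a plain delete-one jackknife rather than a weighted one.

```python
import math
from typing import Sequence, Tuple

Site = Tuple[int, int, int, int]  # (P1, P2, P3, outgroup); 0 = ancestral, 1 = derived


def count_patterns(sites: Sequence[Site]) -> Tuple[int, int]:
    """Count ABBA and BABA sites; all other patterns are ignored."""
    n_abba = sum(1 for s in sites if tuple(s) == (0, 1, 1, 0))  # P2 and P3 share derived
    n_baba = sum(1 for s in sites if tuple(s) == (1, 0, 1, 0))  # P1 and P3 share derived
    return n_abba, n_baba


def d_statistic(n_abba: int, n_baba: int) -> float:
    """D = (N_ABBA - N_BABA) / (N_ABBA + N_BABA); NaN if no informative sites."""
    total = n_abba + n_baba
    if total == 0:
        return float("nan")
    return (n_abba - n_baba) / total


def block_jackknife(blocks: Sequence[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Delete-one block jackknife over per-block (N_ABBA, N_BABA) counts.

    Returns (D, standard error, Z-score). Assumes equal-sized blocks
    (unweighted jackknife), which is a simplification.
    """
    g = len(blocks)
    if g < 2:
        raise ValueError("need at least two blocks")
    tot_abba = sum(a for a, _ in blocks)
    tot_baba = sum(b for _, b in blocks)
    d_full = d_statistic(tot_abba, tot_baba)

    # Recompute D with each block left out in turn.
    loo = [d_statistic(tot_abba - a, tot_baba - b) for a, b in blocks]
    mean_loo = sum(loo) / g
    variance = (g - 1) / g * sum((x - mean_loo) ** 2 for x in loo)
    se = math.sqrt(variance)
    z = d_full / se if se > 0 else float("nan")
    return d_full, se, z


if __name__ == "__main__":
    # Toy per-block counts, made up purely for illustration.
    toy_blocks = [(10, 5), (6, 6), (12, 4), (9, 7)]
    d, se, z = block_jackknife(toy_blocks)
    print(f"D = {d:.3f}, SE = {se:.3f}, Z = {z:.2f}")
```

In this sketch a positive D with a large Z-score would correspond to excess sharing between the third and second populations, matching the sign convention above. The threshold for "significant" is left to the analyst.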