RFMix

RFMix is a software tool for local ancestry inference in admixed populations. It uses a discriminative modeling approach based on random forests, trained on phased haplotypes from reference populations, to assign the ancestral origin of chromosomal segments along the genome. The method was introduced to provide rapid and robust local ancestry calls in individuals with mixed ancestry and has become widely used in population genetics and disease studies.
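
The core idea can be illustrated with a small, dependency-free Python sketch: a classifier is trained on labeled reference haplotypes and then used to score a window of a target haplotype. This is not RFMix's implementation. It assumes biallelic markers coded 0/1 and non-overlapping windows. It also uses a deliberately tiny random forest: each tree is grown on a bootstrap resample, and each split considers only a random subset of markers, chosen by Gini impurity. The function names (`train_forest`, `window_posteriors`, etc.) and the defaults for tree count and depth are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical toy random forest for windowed ancestry scoring (not RFMix's code).


def _leaf(labels):
    """Leaf node storing the ancestry frequencies of the haplotypes that reach it."""
    counts = Counter(labels)
    return {"leaf": {pop: c / len(labels) for pop, c in counts.items()}}


def _gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, n_try, depth, rng):
    """Grow a depth-limited tree that splits on 0/1 alleles by Gini impurity."""
    if depth == 0 or len(set(labels)) == 1:
        return _leaf(labels)
    best = None
    # Feature bagging: only a random subset of markers is considered per split.
    for j in rng.sample(range(len(rows[0])), n_try):
        left = [i for i, r in enumerate(rows) if r[j] == 0]
        right = [i for i, r in enumerate(rows) if r[j] == 1]
        if not left or not right:
            continue  # marker is monomorphic in this node, so it cannot split
        score = (len(left) * _gini([labels[i] for i in left])
                 + len(right) * _gini([labels[i] for i in right]))
        if best is None or score < best[0]:
            best = (score, j, left, right)
    if best is None:
        return _leaf(labels)
    _, j, left, right = best
    return {
        "marker": j,
        0: build_tree([rows[i] for i in left], [labels[i] for i in left], n_try, depth - 1, rng),
        1: build_tree([rows[i] for i in right], [labels[i] for i in right], n_try, depth - 1, rng),
    }


def predict_tree(node, hap):
    """Follow the haplotype's alleles down to a leaf."""
    while "leaf" not in node:
        node = node[hap[node["marker"]]]
    return node["leaf"]


def train_forest(rows, labels, n_trees=25, depth=3, seed=0):
    rng = random.Random(seed)
    n_try = max(1, int(math.sqrt(len(rows[0]))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap resample
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 n_try, depth, rng))
    return forest


def predict_forest(forest, hap):
    """Average the trees' leaf frequencies into ancestry probabilities."""
    totals = Counter()
    for tree in forest:
        for pop, p in predict_tree(tree, hap).items():
            totals[pop] += p
    return {pop: s / len(forest) for pop, s in totals.items()}


def window_posteriors(ref_haps, ref_pops, query_hap, window):
    """Train one forest per non-overlapping window and score the query haplotype."""
    result = []
    for start in range(0, len(query_hap), window):
        stop = start + window
        forest = train_forest([h[start:stop] for h in ref_haps], ref_pops, seed=start)
        result.append(predict_forest(forest, query_hap[start:stop]))
    return result


# Toy panel: population A carries allele 0 everywhere, population B allele 1.
ref = [[0] * 8 for _ in range(5)] + [[1] * 8 for _ in range(5)]
pops = ["A"] * 5 + ["B"] * 5
query = [0, 0, 0, 0, 1, 1, 1, 1]  # first half A-like, second half B-like
post = window_posteriors(ref, pops, query, window=4)
for i, probs in enumerate(post):
    print(i, max(probs, key=probs.get), probs)
```

In this toy panel the two populations are perfectly separated at every marker, so the first window of the query is called A and the second B. Real reference panels are far less clean, and window-level calls are correspondingly noisier.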

The methodology combines a classifier with a genome-wide smoothing step. A random forest classifier is trained on labeled reference haplotypes from multiple ancestral populations, using a window of nearby markers to predict the ancestry of target haplotype segments. The trained model then assigns ancestry probabilities to each position or window in the target individuals. To keep calls consistent across the genome and to account for recombination, a second step integrates information across adjacent positions and incorporates a recombination map. In RFMix this step is formulated as a conditional random field, a discriminative relative of the hidden Markov model.

Inputs to RFMix include phased genotype data for the target individuals, phased reference panels representing the ancestral populations, and a genetic map. Each reference sample must also be labeled with its population of origin, since these labels define the classes the random forest learns. Optional inputs may include sample weights. Outputs consist of local ancestry calls at each genomic position or window for every individual, along with posterior probability estimates for each ancestry. These results enable downstream analyses such as admixture mapping, studies of fine-scale population structure, and control for local ancestry in association studies.

Limitations include dependence on accurate and representative reference panels, potential biases when ancestral populations are underrepresented in or absent from the reference, and the requirement for phased input data. Computational demands can still be substantial, although RFMix was designed to be faster than many alternative local ancestry methods.
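
The smoothing step integrates window-level ancestry probabilities across adjacent positions using genetic distances. It can be sketched as a forward-backward pass over a hidden Markov model. This is an HMM-style illustration, not RFMix's own conditional random field, and it rests on several assumptions:

- Classifier probabilities are used directly as emission scores, floored so that no state becomes impossible.
- The chance of an ancestry switch between consecutive windows is modeled as 1 - exp(-g * d). Here d is the gap in Morgans and g is a hypothetical number of generations since admixture.
- After a switch, the new ancestry is drawn uniformly.
- `smooth_posteriors` and its parameters are made-up names.

```python
import math

# Hypothetical HMM-style smoothing of per-window ancestry probabilities
# (an illustration only; RFMix itself uses a conditional random field).


def smooth_posteriors(window_probs, gaps_cm, pops, generations=10.0, floor=1e-3):
    """window_probs: list of {pop: prob}; gaps_cm: cM between consecutive windows."""
    k, n = len(pops), len(window_probs)
    # Emission scores: classifier output, floored so no state is ruled out.
    emit = [[max(w.get(p, 0.0), floor) for p in pops] for w in window_probs]

    def transition(gap_cm):
        # Assumed switch model: P(switch) = 1 - exp(-g * d), d in Morgans;
        # after a switch the new ancestry is drawn uniformly from the k states.
        switch = 1.0 - math.exp(-generations * gap_cm / 100.0)
        return [[(1.0 - switch if a == b else 0.0) + switch / k for b in range(k)]
                for a in range(k)]

    def normalise(v):
        s = sum(v)
        return [x / s for x in v]

    # Forward pass: alpha_t(b) ~ emit_t(b) * sum_a alpha_{t-1}(a) * T(a, b)
    alpha = [normalise([emit[0][a] / k for a in range(k)])]
    for t in range(1, n):
        tr = transition(gaps_cm[t - 1])
        alpha.append(normalise([
            emit[t][b] * sum(alpha[t - 1][a] * tr[a][b] for a in range(k))
            for b in range(k)]))

    # Backward pass: beta_t(a) ~ sum_b T(a, b) * emit_{t+1}(b) * beta_{t+1}(b)
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        tr = transition(gaps_cm[t])
        beta[t] = normalise([
            sum(tr[a][b] * emit[t + 1][b] * beta[t + 1][b] for b in range(k))
            for a in range(k)])

    # Posterior per window combines both passes.
    return [dict(zip(pops, normalise([alpha[t][a] * beta[t][a] for a in range(k)])))
            for t in range(n)]


# One noisy window (index 2) mildly favours B between confidently-A neighbours.
probs = [{"A": 0.9, "B": 0.1}] * 2 + [{"A": 0.4, "B": 0.6}] + [{"A": 0.9, "B": 0.1}] * 2
smoothed = smooth_posteriors(probs, gaps_cm=[0.1] * 4, pops=["A", "B"])
for raw, sm in zip(probs, smoothed):
    print(raw, "->", {p: round(v, 4) for p, v in sm.items()})
```

Here the isolated B-leaning window is pulled back to A because its neighbors, only 0.1 cM away, are confidently A. With larger genetic gaps the switch probability rises, the penalty for changing ancestry weakens, and isolated calls are more likely to survive.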