PWMs

PWMs, or position weight matrices, are a common representation of sequence motifs used to model the preferences of DNA- and RNA-binding proteins. They summarize how likely each symbol is at each position within a motif, enabling quantitative assessment of how well a sequence matches the motif. In DNA contexts, a PWM is typically an L by 4 matrix, where L is the motif length and the four columns correspond to the nucleotides A, C, G, and T. Each entry represents either a probability, a count, or a log-odds score against a background distribution.

PWMs are usually derived from a set of known motif instances. Counts at each position generate position-specific scores, often with pseudocounts to avoid zero values. These counts can be converted to probabilities or to log-odds scores by comparing observed frequencies to a background model, such as the overall genomic nucleotide frequencies. The resulting matrix allows uniform scoring across motif positions and supports comparisons between sequences of the same length.

To score a candidate sequence, the base at each motif position is mapped to its corresponding weight in the matrix row for that position, and the scores are summed over the motif length. Higher scores indicate better matches to the motif.
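
The construction and scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `build_pwm` and `score`, the example motif instances, the pseudocount of 1, and the uniform background of 0.25 per nucleotide are all assumptions chosen for the example.

```python
import math

ALPHABET = "ACGT"

def build_pwm(instances, background=None, pseudocount=1.0):
    """Build an L x 4 log-odds matrix (one dict per position) from aligned instances."""
    if background is None:
        # Assumption: uniform background; genomic frequencies could be used instead.
        background = {base: 0.25 for base in ALPHABET}
    length = len(instances[0])
    if any(len(seq) != length for seq in instances):
        raise ValueError("all motif instances must have the same length")
    pwm = []
    for i in range(length):
        # Start every count at the pseudocount so no probability is zero.
        counts = {base: pseudocount for base in ALPHABET}
        for seq in instances:
            counts[seq[i]] += 1
        total = sum(counts.values())
        # Log-odds: observed frequency relative to the background frequency.
        pwm.append({base: math.log2((counts[base] / total) / background[base])
                    for base in ALPHABET})
    return pwm

def score(pwm, seq):
    """Sum the position-specific weights of seq, which must match the motif length."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal motif length")
    return sum(row[base] for row, base in zip(pwm, seq))

# Hypothetical aligned motif instances, used only for illustration.
instances = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_pwm(instances)
consensus_score = score(pwm, "TATAAT")
unrelated_score = score(pwm, "GCGCGC")
```

With these inputs, the consensus-like sequence `TATAAT` receives a higher score than `GCGCGC`, which matches none of the preferred bases.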

PWMs underpin motif scanning, genome annotation, and motif discovery, and are frequently used with thresholding or p-value estimation to identify potential binding sites.
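
Motif scanning with a score threshold can be illustrated as a sliding window over a longer sequence. The sketch below is one simple way to do it, under stated assumptions: the function name `scan`, the toy three-position matrix, and the threshold of 3.0 are hypothetical, only the forward strand is examined, and p-value estimation is not shown.

```python
def scan(pwm, sequence, threshold):
    """Return (start, score) for every window whose score reaches the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Score the window by summing one weight per motif position.
        window_score = sum(row[base] for row, base in zip(pwm, window))
        if window_score >= threshold:
            hits.append((start, window_score))
    return hits

# Hypothetical log-odds-style weights for a 3-base motif preferring "ACG".
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]

hits = scan(toy_pwm, "TTACGTACGA", threshold=3.0)
```

Here `hits` contains the two windows starting at offsets 2 and 6, where `ACG` occurs. In practice the threshold is a tuning choice that trades sensitivity against false positives.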

Limitations include the independence assumption between positions, sensitivity to the chosen background model, and reliance on the quality and representativeness of the training data.

Extensions include information content analyses, sequence logos, and alternative, non-independent models.
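
As an illustration of information content analysis, the per-position information content of a probability matrix can be computed as the maximum possible entropy minus the observed entropy, which for DNA gives a value between 0 and 2 bits. The sketch below assumes a uniform background and applies no small-sample correction; the function name `information_content` and the example matrix are hypothetical.

```python
import math

def information_content(probabilities):
    """Return the information content in bits for each position of a probability matrix."""
    values = []
    for row in probabilities:
        # Shannon entropy of the position; zero-probability terms contribute nothing.
        entropy = -sum(p * math.log2(p) for p in row.values() if p > 0)
        # Maximum entropy is log2 of the alphabet size (2 bits for DNA).
        values.append(math.log2(len(row)) - entropy)
    return values

# Hypothetical probability matrix: fully conserved, uninformative, and two-way split.
ppm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},
]

ic = information_content(ppm)
```

For this matrix the three positions carry 2, 0, and 1 bits respectively. These per-position values are what set the column heights in a sequence logo.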