PSSMs

Position-Specific Scoring Matrices (PSSMs) are matrices used in sequence analysis to quantify the likelihood of each residue at each position in a sequence profile. They encode position-specific substitution preferences derived from a multiple sequence alignment of related sequences, allowing a single matrix to capture conservation patterns across the region of interest. PSSMs are widely used to detect distant homologs and to score sequences against a protein family or motif.

Construction typically starts with a curated multiple sequence alignment of related sequences. For each position i, the observed frequency p_i(a) of residue a (amino acid or nucleotide) is estimated. To avoid zero probabilities, pseudo-counts or Bayesian priors are often added. Background frequencies b(a) are used to convert to log-odds scores, commonly expressed as s(i,a) = log2(p_i(a) / b(a)). The resulting matrix has a row for each position and a column for each possible residue. Gaps can be treated with penalties or separate handling.

Applications include scoring a query sequence by summing the scores for each position, enabling the detection of conserved motifs and remote homologs. PSSMs underpin profile-based searches and motif detection workflows, and are central to iterative methods such as PSI-BLAST, which builds and refines a PSSM from hits in a database to improve sensitivity over successive rounds.

Types and relation to other concepts: PSSMs are commonly constructed for proteins (20 amino acids) but can be created for nucleic acids. They are related to position weight matrices used in motif discovery; both are profile representations that, in many implementations, assume positional independence, an approximation with both strengths and limitations. Limitations include dependence on alignment quality, potential biases from overrepresented sequences, and the independence assumption that may overlook inter-positional correlations.
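
The construction and scoring steps described above can be illustrated with a minimal Python sketch. It is not the procedure of any particular tool: the function names (build_pssm, score, best_window), the toy nucleotide alignment, the uniform background, the add-one pseudo-count, and the choice to simply skip gap characters are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed nucleotide alphabet for this toy example


def build_pssm(alignment, alphabet=ALPHABET, background=None, pseudocount=1.0):
    """Return pssm[i][a] = log2(p_i(a) / b(a)) for each alignment column i."""
    if background is None:
        # Assumption: uniform background frequencies b(a).
        background = {a: 1.0 / len(alphabet) for a in alphabet}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    pssm = []
    for i in range(length):
        # Count residues in column i; gaps and unknown symbols are skipped
        # (one simple form of "separate handling", chosen for illustration).
        counts = Counter(seq[i] for seq in alignment if seq[i] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        column = {}
        for a in alphabet:
            p = (counts[a] + pseudocount) / total  # smoothed p_i(a), never zero
            column[a] = math.log2(p / background[a])
        pssm.append(column)
    return pssm


def score(pssm, sequence):
    """Sum the per-position scores of a sequence as long as the PSSM."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must equal the number of PSSM positions")
    return sum(column[residue] for column, residue in zip(pssm, sequence))


def best_window(pssm, sequence):
    """Slide the PSSM along a longer sequence; return (best score, offset)."""
    width = len(pssm)
    return max(
        (score(pssm, sequence[k:k + width]), k)
        for k in range(len(sequence) - width + 1)
    )


toy_alignment = ["ACGT", "ACGA", "ACGT", "TCGT"]  # hypothetical aligned sites
pssm = build_pssm(toy_alignment)
motif_score = score(pssm, "ACGT")
top_score, offset = best_window(pssm, "GGACGTGG")
```

Because each position is scored independently and the scores are summed, a sequence matching the conserved columns receives a positive total, while residues rarer than the background contribute negative terms. Real pipelines typically also weight sequences to reduce bias from overrepresented family members, which this sketch omits.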