PWMs

PWMs, or position weight matrices, are a common representation of sequence motifs used to model the preferences of DNA- and RNA-binding proteins. They summarize how likely each symbol is at each position within a motif, enabling quantitative assessment of how well a sequence matches the motif. In DNA contexts, a PWM is typically an L by 4 matrix, where L is the motif length and the four columns correspond to the nucleotides A, C, G, and T. Each entry represents either a probability, a count, or a log-odds score against a background distribution.

PWMs are usually derived from a set of known motif instances. Counts at each position generate position-specific
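
As a minimal sketch of this derivation, the Python below tallies per-position counts from a handful of aligned motif instances, then converts them to probabilities and log-odds scores. The example sequences, the function name `build_pwm`, the pseudocount of 1 and the uniform background are all illustrative assumptions, not details taken from the text above.

```python
import math

ALPHABET = "ACGT"  # column order: A, C, G, T


def build_pwm(instances, pseudocount=1.0, background=None):
    """Return (counts, probs, log_odds); each is a list of L rows by 4 columns.

    The pseudocount and the uniform default background are assumptions
    made for this sketch.
    """
    if background is None:
        background = {base: 0.25 for base in ALPHABET}
    length = len(instances[0])
    if any(len(seq) != length for seq in instances):
        raise ValueError("all motif instances must have the same length")

    # Tally how often each base occurs at each motif position.
    counts = [[0] * len(ALPHABET) for _ in range(length)]
    for seq in instances:
        for pos, base in enumerate(seq.upper()):
            counts[pos][ALPHABET.index(base)] += 1

    probs, log_odds = [], []
    for row in counts:
        # The pseudocount keeps unseen bases from getting probability zero.
        total = sum(row) + pseudocount * len(ALPHABET)
        p_row = [(c + pseudocount) / total for c in row]
        probs.append(p_row)
        # Log-odds (base 2) of the motif probability against the background.
        log_odds.append(
            [math.log2(p / background[base]) for p, base in zip(p_row, ALPHABET)]
        )
    return counts, probs, log_odds


# Made-up aligned instances of a length-4 motif.
instances = ["ACGT", "ACGA", "ACCT", "TCGT"]
counts, probs, log_odds = build_pwm(instances)
```

Each row of `probs` sums to 1, and a positive entry in `log_odds` marks a base that is more frequent at that position than the background would predict.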

To score a candidate sequence, the base at each motif position is mapped to its corresponding weight
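
The lookup described above can be sketched as follows, assuming the per-position weights are then summed, which is the usual convention for log-odds matrices. The matrix values and the names `score_site` and `scan` are hypothetical and chosen only for illustration.

```python
ALPHABET = "ACGT"  # column order: A, C, G, T

# Hypothetical log-odds matrix for a 3-position motif
# (rows = motif positions, columns = A, C, G, T).
PWM = [
    [1.0, -1.0, -1.0, -0.5],
    [-1.0, 1.5, -1.0, -1.0],
    [-0.5, -1.0, 1.2, -1.0],
]


def score_site(pwm, site):
    """Sum the weight of the observed base at each motif position."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    return sum(
        pwm[pos][ALPHABET.index(base)] for pos, base in enumerate(site.upper())
    )


def scan(pwm, sequence):
    """Score every window of motif width along a longer sequence."""
    width = len(pwm)
    return [
        (start, score_site(pwm, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]


hits = scan(PWM, "TACGA")
best_start, best_score = max(hits, key=lambda hit: hit[1])
```

Here the window starting at index 1 (`ACG`) receives the highest score because it picks the largest weight in every row. Summing weights in this way treats the motif positions as independent of one another.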


representativeness

non-independent