digram

A digram is a unit consisting of two adjacent elements within a sequence, such as two consecutive letters, digits, or other symbols. In linguistic analysis and information theory, digrams are studied as two-letter sequences, revealing patterns in a language or corpus. The term is sometimes used interchangeably with bigram, but authors may reserve digram for letter pairs or other small-scale units.

In text analysis, digrams are the simplest form of an n-gram and are used to model structure and predictability in text. To construct a digram model, each pair of consecutive characters or tokens is counted, and probabilities are assigned to what often follows a given symbol. This approach helps measure similarity between texts, detect language, or inform compression and cryptanalysis.

Example: the word "example" yields the digrams "ex", "xa", "am", "mp", "pl", "le". In longer corpora, common English digrams include "th", "he", "in", and "er".

In cryptography, digrams have historical relevance in systems that operate on pairs of letters, such as digraphic and Playfair-type schemes, where the security depends on pairwise letter frequencies. In modern NLP, bigram models often refer to pairs of tokens (words or characters) and are a standard building block for language modeling.

Because the term digram is sometimes used variably, readers should check context and field. See also bigram, trigram, n-gram.
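
The counting procedure described earlier (collect each pair of adjacent elements, then turn the counts into probabilities for what follows a given symbol) can be illustrated with a minimal Python sketch. The function names digrams and digram_model are hypothetical, chosen for this illustration only, and the probabilities are plain relative frequencies with no smoothing, which is an assumption rather than a prescribed method.

```python
from collections import Counter, defaultdict


def digrams(sequence):
    """Return every pair of adjacent elements in `sequence`, in order.

    Works on a string (letter pairs) or a list of tokens (token pairs).
    """
    return [sequence[i:i + 2] for i in range(len(sequence) - 1)]


def digram_model(sequence):
    """Estimate P(next | current) as relative frequencies of digram counts.

    Illustrative only: no smoothing, so unseen pairs are simply absent.
    """
    followers = defaultdict(Counter)
    # Count how often each element is followed by each other element.
    for first, second in zip(sequence, sequence[1:]):
        followers[first][second] += 1

    model = {}
    for first, counts in followers.items():
        total = sum(counts.values())
        model[first] = {second: n / total for second, n in counts.items()}
    return model


if __name__ == "__main__":
    print(digrams("example"))
    print(digram_model("abracadabra")["a"])
```

Applied to the word "example", digrams returns the six letter pairs "ex", "xa", "am", "mp", "pl", "le". The same functions accept a list of words, in which case each digram is a pair of adjacent tokens rather than letters.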