AhoCorasick

Aho-Corasick, named after Alfred V. Aho and Margaret J. Corasick, is a string-search algorithm designed to locate occurrences of multiple patterns simultaneously within a single text. The algorithm constructs a finite automaton from a set of patterns and scans the text in linear time, emitting all matches of any pattern in the set. It is widely used for high-volume pattern matching in information retrieval, network security, and data processing.

The core idea is to build a trie (prefix tree) of all patterns and augment it with failure links. Each failure link connects a node to the node for the longest proper suffix of its string that is also a prefix of some pattern. During text scanning, the automaton follows a transition for each character; on a mismatch, it follows failure links until a valid transition is found or the root is reached. When a node with outputs is reached, the associated patterns are reported, including overlapping matches. This structure also supports reporting multiple patterns that end at the same position.

Complexity considerations: constructing the automaton takes time proportional to the total length of all patterns. Scanning a text of length n takes O(n + m) time, where m is the number of reported matches. Memory usage is proportional to the number of nodes in the trie, which is at most the sum of the pattern lengths plus one for the root. The cost of storing each node's transitions also depends on the alphabet size, so memory grows with both the pattern set and the alphabet.

Applications and variants: The algorithm is widely used in intrusion detection systems, spam and content filtering, search tools, and DNA motif analysis. Variants and optimizations address dynamic pattern addition, compressed alphabets, and reporting efficiency, making Aho-Corasick a foundational technique for multi-pattern string matching.
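
The trie, failure links, and scanning loop described above can be illustrated with a minimal Python sketch. It is one possible arrangement, not a reference implementation: the names build_automaton and find_all, the list-of-dicts node layout, and the choice to copy inherited outputs into each node are all assumptions made for brevity. Production implementations often use separate output links instead of copying, which keeps memory closer to the bound stated above.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output lists (illustrative layout)."""
    goto = [{}]   # goto[node][char] -> child node; node 0 is the root
    fail = [0]    # fail[node] -> node for the longest proper suffix present in the trie
    out = [[]]    # out[node] -> patterns recognized when this node is reached

    # Phase 1: insert every pattern into the trie.
    for pattern in patterns:
        node = 0
        for ch in pattern:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pattern)

    # Phase 2: breadth-first traversal, so shallower nodes are finished first.
    queue = deque(goto[0].values())  # depth-1 nodes keep the root as failure target
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Assumption of this sketch: copy the outputs of the failure target so
            # that every pattern ending at the same position is reported together.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, overlaps included."""
    goto, fail, out = build_automaton(patterns)
    matches = []
    node = 0
    for i, ch in enumerate(text):
        # On a mismatch, follow failure links until a transition exists or we hit the root.
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pattern in out[node]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


if __name__ == "__main__":
    print(find_all("ushers", ["he", "she", "his", "hers"]))
    # prints [(1, 'she'), (2, 'he'), (2, 'hers')]
```

In the example, "she" and "he" both end at index 3 of "ushers" and are reported together, and "hers" overlaps both. The automaton is rebuilt on every call to find_all only to keep the sketch short; for repeated scans against the same pattern set, it would normally be built once and reused.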