Home

Corasick

Corasick, also known as the Aho-Corasick algorithm, is a string-search algorithm designed for locating multiple patterns simultaneously within a text. It was introduced by Alfred Aho and Margaret Corasick in 1975. The method builds a finite automaton from a set of patterns by first constructing a trie of the patterns. Each node corresponds to a prefix of one or more patterns and stores pointers to child nodes. To handle mismatches efficiently, the automaton is augmented with failure links (suffix links) that point to the longest proper suffix that is a prefix of some pattern. Each node also records which patterns end at that node, enabling the algorithm to report all matches when it is reached during the scan.

During matching, the text is processed character by character, following transitions in the automaton. When a

Applications include intrusion-detection systems, anti-virus software, spam filtering, and content moderation, where rapid checking for many

transition
is
absent,
the
algorithm
follows
failure
links
until
a
valid
transition
can
be
made.
If
the
current
node
corresponds
to
the
end
of
one
or
more
patterns,
those
patterns
are
reported.
Because
matches
can
overlap,
the
outputs
are
checked
at
each
step.
The
preprocessing
phase
creates
the
automaton
in
time
linear
in
the
total
length
of
the
patterns,
and
the
search
runs
in
time
linear
in
the
length
of
the
text
plus
the
number
of
reported
matches.
keywords
is
required.
The
algorithm
remains
a
foundational
approach
for
multi-pattern
matching,
and
is
often
referred
to
as
the
Aho–Corasick
algorithm
in
honor
of
its
developers.