Corasick
Corasick, also known as the Aho-Corasick algorithm, is a string-search algorithm designed for locating multiple patterns simultaneously within a text. It was introduced by Alfred Aho and Margaret Corasick in 1975. The method builds a finite automaton from a set of patterns by first constructing a trie of the patterns. Each node corresponds to a prefix of one or more patterns and stores pointers to child nodes. To handle mismatches efficiently, the automaton is augmented with failure links (suffix links) that point to the longest proper suffix that is a prefix of some pattern. Each node also records which patterns end at that node, enabling the algorithm to report all matches when it is reached during the scan.
During matching, the text is processed character by character, following transitions in the automaton. When a
Applications include intrusion-detection systems, anti-virus software, spam filtering, and content moderation, where rapid checking for many