ngramindex
An n-gram index (also written ngram index) is a data structure used in text retrieval and processing that records where contiguous sequences of n characters (or, less commonly, n words) occur in a collection of documents. It supports fast substring search, approximate matching, and related text analysis tasks by enabling efficient lookup of the documents that contain a given sequence.
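As a rough illustration of the idea, the following minimal Python sketch builds a character trigram index over a small in-memory collection and uses it for exact substring search. The function names (char_ngrams, build_index, search) and the choice of n = 3 are assumptions made for this example, not part of any particular library. For simplicity the index stores only document ids rather than positions.

```python
from collections import defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs, n=3):
    """Map each n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index, docs, query, n=3):
    """Return the sorted ids of documents containing query as a substring."""
    grams = char_ngrams(query, n)
    if not grams:
        # Query shorter than n: the index cannot narrow it, so scan everything.
        candidates = set(docs)
    else:
        # A document containing the query must contain every one of its n-grams.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing all n-grams does not guarantee a match, so verify each candidate.
    return sorted(d for d in candidates if query in docs[d])


docs = {
    1: "the quick brown fox",
    2: "quick sort algorithm",
    3: "a thick brick",
}
index = build_index(docs)
print(search(index, docs, "quick"))  # [1, 2]
print(search(index, docs, "ck b"))   # [1, 3]
```

The final verification step matters because a document can contain all of a query's n-grams without containing the query itself, for example when the n-grams appear in different places in the text.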
Construction typically involves two steps. First, each document is decomposed into overlapping n-grams using a sliding
Query processing relies on matching the query’s n-grams against the index. The system retrieves candidate documents
Applications include search engines, spell checkers, plagiarism detection, and DNA or biosequence analysis where short, exact