N-gram based methods
N-gram based methods are statistical approaches in natural language processing that use contiguous sequences of n items—words or characters—to model language. An n-gram model estimates the probability of a text by decomposing it into a chain of conditional probabilities, where each token depends on the previous n−1 tokens. Training relies on a corpus to count occurrences and derive frequency-based estimates.
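The chain decomposition can be sketched with a minimal bigram (n=2) model, where each conditional probability is the maximum-likelihood estimate count(w_{i-1}, w_i) / count(w_{i-1}); the function name and toy corpus are illustrative, not from the original text:

```python
from collections import Counter

def bigram_probs(corpus):
    """Estimate P(w_i | w_{i-1}) from raw counts (maximum likelihood)."""
    tokens = corpus.split()
    unigrams = Counter(tokens[:-1])             # counts of context words
    bigrams = Counter(zip(tokens, tokens[1:]))  # counts of adjacent pairs
    return {pair: n / unigrams[pair[0]] for pair, n in bigrams.items()}

probs = bigram_probs("the cat sat on the mat the cat ran")
# P(cat | the) = count(the cat) / count(the) = 2 / 3
```

In practice, raw counts are combined with smoothing (e.g. add-one or Kneser-Ney) so that unseen n-grams do not receive zero probability.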
Word-level and character-level variants are common. Word n-grams capture lexical sequences, while character n-grams capture subword patterns, which makes them robust to misspellings and useful for morphologically rich languages.
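The two variants differ only in what counts as an "item": a token sequence for word n-grams, a raw string for character n-grams. A small sketch (the helper name is illustrative):

```python
def ngrams(seq, n):
    """Return the contiguous n-grams of seq (a list of words or a string)."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

# Word-level bigrams operate on the token sequence.
word_bigrams = ngrams("the cat sat".split(), 2)
# -> [('the', 'cat'), ('cat', 'sat')]

# Character-level trigrams operate on the raw string.
char_trigrams = ngrams("cats", 3)
# -> [('c', 'a', 't'), ('a', 't', 's')]
```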
Applications include predictive typing, spelling correction, language identification, information retrieval, and feature extraction for classifiers. They remain common baselines because they are simple, fast to train, and interpretable.
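One of the listed applications, language identification, can be illustrated with a toy character-bigram profile comparison; the similarity measure and the two reference sentences are illustrative assumptions, not a production method:

```python
from collections import Counter

def char_profile(text, n=2):
    """Character n-gram frequency profile of a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def similarity(a, b):
    """Shared n-gram occurrences between two profiles (multiset overlap)."""
    return sum(min(a[g], b[g]) for g in a)

def identify(text, profiles):
    """Pick the reference language whose profile best matches the text."""
    p = char_profile(text)
    return max(profiles, key=lambda lang: similarity(p, profiles[lang]))

# Tiny reference profiles built from one sentence per language.
profiles = {
    "en": char_profile("the quick brown fox jumps over the lazy dog"),
    "de": char_profile("der schnelle braune fuchs springt ueber den faulen hund"),
}
```

Real systems build profiles from large corpora and compare them with more robust measures (e.g. out-of-place rank distance or smoothed log-likelihood), but the principle is the same: character n-gram statistics are strongly language-specific.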