quadgram
A quadgram is a sequence of four consecutive symbols drawn from a text. In natural language processing and text analysis, quadgrams are a specific case of n-grams with n equal to four. They are commonly defined over characters, though in some applications they may be defined over tokens or syllables.
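As a minimal sketch of the definition above, the following Python extracts quadgrams by sliding a window of width four over a string or a token list. The function names `char_quadgrams` and `token_quadgrams` are illustrative choices, not part of any standard library.

```python
from collections import Counter


def char_quadgrams(text):
    """Return every run of four consecutive characters in text, in order."""
    # A text shorter than four characters yields an empty list.
    return [text[i:i + 4] for i in range(len(text) - 3)]


def token_quadgrams(tokens):
    """Return every run of four consecutive tokens as a tuple, in order."""
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


# Character-level example: "banana" contains three overlapping quadgrams.
print(char_quadgrams("banana"))

# Token-level example, using naive whitespace splitting as the tokenizer.
print(token_quadgrams("to be or not to be".split()))

# Frequency table of character quadgrams, the usual starting point for statistics.
counts = Counter(char_quadgrams("banana"))
```

A text of length N yields N - 3 overlapping quadgrams, so consecutive quadgrams share three symbols.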
In language modeling, quadgrams are used to estimate the probability of text under a third-order Markov assumption: each symbol is assumed to depend only on the three symbols that precede it.
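The factorization behind such a model can be written out as follows. The notation is an assumption made for illustration: x_1, ..., x_N are the symbols of the text, C(.) is the number of times a sequence occurs in a training corpus, and positions before the start of the text are taken to be filled with padding symbols.

```latex
% Each symbol is conditioned on the three symbols before it.
P(x_1, \dots, x_N) \approx \prod_{i=1}^{N} P(x_i \mid x_{i-3}, x_{i-2}, x_{i-1})

% Maximum-likelihood estimate of each factor from corpus counts.
\hat{P}(x_i \mid x_{i-3}, x_{i-2}, x_{i-1})
  = \frac{C(x_{i-3}\, x_{i-2}\, x_{i-1}\, x_i)}{C(x_{i-3}\, x_{i-2}\, x_{i-1})}
```

The unsmoothed estimate assigns probability zero to any quadgram that never appears in the training corpus, so practical models usually combine it with some form of smoothing or backoff.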
Quadgrams are used in applications including predictive text input, spelling and OCR post-processing, and language identification.
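In classical cryptanalysis, quadgram statistics are used to score candidate decryptions of monoalphabetic or polyalphabetic substitution ciphers: a candidate whose quadgrams are common in the expected language scores higher than one whose quadgrams are rare.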
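One common way to realize such a score is to sum the log-probabilities of a candidate's quadgrams. The sketch below is a hedged illustration, not a reference implementation: the helper `build_quadgram_scorer`, the toy reference text, and the floor value used for unseen quadgrams are all assumptions made for this example. A real scorer would be trained on a large corpus of the target language.

```python
import math
from collections import Counter


def build_quadgram_scorer(corpus):
    """Build a scoring function from quadgram counts in a reference corpus."""
    # Keep letters only and fold case, as is typical for classical ciphers.
    letters = "".join(ch for ch in corpus.upper() if ch.isalpha())
    counts = Counter(letters[i:i + 4] for i in range(len(letters) - 3))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("corpus must contain at least four letters")

    log_probs = {q: math.log10(c / total) for q, c in counts.items()}
    # Assumed penalty for quadgrams never seen in the corpus.
    floor = math.log10(0.01 / total)

    def score(text):
        """Sum of log10 quadgram probabilities; higher means more language-like."""
        t = "".join(ch for ch in text.upper() if ch.isalpha())
        return sum(log_probs.get(t[i:i + 4], floor) for i in range(len(t) - 3))

    return score


# Toy reference text, far too small for real use.
reference = "the quick brown fox jumps over the lazy dog and then the other dog"
score = build_quadgram_scorer(reference)

plain = "the lazy dog"
shifted = "wkh odcb grj"  # the same text under a Caesar shift of 3

print(score(plain) > score(shifted))
```

In a cipher-breaking search, a function like `score` would typically serve as the fitness measure that decides whether a trial key is kept or discarded.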
Limitations include data sparsity (many quadgrams occur rarely or never in a given corpus), high dimensionality, and sensitivity to tokenization and text normalization.