unigrams - Infinite Lexicon

unigrams

Unigrams are the simplest unit in n-gram language models. A unigram is a single element of a sequence, most commonly a word. In word-based unigram models, the probability of a text is approximated by the product of the probabilities of its individual words, which amounts to assuming that each word occurs independently of its neighbors. Unigrams can also refer to single characters in character-level modeling, where individual characters such as letters are treated as tokens.
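The product rule above can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and plain maximum-likelihood estimates, P(w) = count(w) / N, with no smoothing; the helper names `train_unigram` and `sequence_prob` are hypothetical.

```python
from collections import Counter

def train_unigram(tokens):
    """Maximum-likelihood unigram model: P(w) = count(w) / total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sequence_prob(model, tokens):
    """Approximate P(text) as the product of per-token probabilities.
    Tokens never seen in training get probability 0 (no smoothing here)."""
    prob = 1.0
    for t in tokens:
        prob *= model.get(t, 0.0)
    return prob

corpus = "the cat sat on the mat".split()   # 6 tokens, "the" appears twice
model = train_unigram(corpus)
print(model["the"])                          # 2/6, about 0.333
print(sequence_prob(model, ["the", "cat"]))  # (2/6) * (1/6) = 1/18

# The same code works at the character level: each character is a token.
char_model = train_unigram(list("banana"))
print(char_model["a"])                       # 3/6 = 0.5
```

In practice, implementations usually sum log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow on long texts.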

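Use cases: In text classification and information retrieval, unigrams form the basis of bag-of-words representations, where each document is represented as a high-dimensional vector of word counts (one dimension per vocabulary word) and word order is ignored.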
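A bag-of-words vector can be built directly from unigram counts. The sketch below is illustrative only: it assumes lowercasing plus whitespace tokenization and a vocabulary sorted alphabetically, and `build_vocab` and `bag_of_words` are hypothetical helpers rather than any particular library's API.

```python
from collections import Counter

def build_vocab(docs):
    """Map each distinct (lowercased) word to a vector index, sorted for reproducibility."""
    words = sorted({w for doc in docs for w in doc.lower().split()})
    return {w: i for i, w in enumerate(words)}

def bag_of_words(doc, vocab):
    """Return a count vector of length len(vocab); word order is discarded."""
    vec = [0] * len(vocab)
    for word, count in Counter(doc.lower().split()).items():
        if word in vocab:          # words outside the vocabulary are dropped
            vec[vocab[word]] = count
    return vec

docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocab(docs)
print(vocab)                       # {'cat': 0, 'dog': 1, 'mat': 2, 'on': 3, 'sat': 4, 'the': 5}
print(bag_of_words(docs[0], vocab))  # [1, 0, 1, 1, 1, 2]
print(bag_of_words(docs[1], vocab))  # [0, 1, 0, 0, 1, 1]
```

With a realistic vocabulary of tens of thousands of words, most entries in each vector are zero, which is why such vectors are usually stored in a sparse format.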

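Advantages and limitations: Unigrams are simple, fast to compute, and comparatively robust on small corpora, because each word's probability is estimated from its own frequency rather than from sparse word combinations; they provide a simple baseline for many tasks. Their main limitation is that they ignore word order and context, so "dog bites man" and "man bites dog" receive the same unigram probability.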
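The insensitivity of unigram models to word order can be checked directly. This is a minimal sketch, assuming a toy corpus and maximum-likelihood estimates with no smoothing (every scored word must appear in the corpus); `unigram_log_prob` is a hypothetical helper.

```python
import math
from collections import Counter

def unigram_log_prob(tokens, counts, total):
    """Sum of log P(w) under an MLE unigram model.
    Assumes every token occurs in the training counts (no smoothing)."""
    return sum(math.log(counts[t] / total) for t in tokens)

corpus = "the dog bites the man and the man feeds the dog".split()
counts = Counter(corpus)
total = sum(counts.values())

a = unigram_log_prob("dog bites man".split(), counts, total)
b = unigram_log_prob("man bites dog".split(), counts, total)

# A reordering of the same words gets the same score; isclose guards
# against floating-point differences from summing in a different order.
print(math.isclose(a, b))  # True
```

A bigram or trigram model would generally score the two sentences differently, because it conditions each word on the words immediately before it.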

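Variants and related concepts: In character-level modeling, unigrams are single characters; higher-order n-grams (bigrams, trigrams) capture local context by modeling sequences of two or three consecutive tokens, so that each token can be conditioned on the one or two tokens before it.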
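The relationship between unigrams and higher-order n-grams can be illustrated with a single extraction function. This sketch assumes whitespace tokenization for words and treats each character as a token at the character level; `ngrams` is a hypothetical helper, and the bigram estimate shown is the plain maximum-likelihood ratio count(previous, word) / count(previous).

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "the cat sat on the mat".split()
print(ngrams(words, 1))        # word unigrams: [('the',), ('cat',), ...]
print(ngrams(words, 2))        # word bigrams: [('the', 'cat'), ('cat', 'sat'), ...]
print(ngrams(list("cat"), 1))  # character unigrams: [('c',), ('a',), ('t',)]

# Bigrams add context: estimate P(cat | the) = count(the cat) / count(the).
uni = Counter(ngrams(words, 1))
bi = Counter(ngrams(words, 2))
p_cat_given_the = bi[("the", "cat")] / uni[("the",)]
print(p_cat_given_the)         # 1 / 2 = 0.5
```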
