Unigrammalli - Infinite Lexicon

Unigrammalli

Unigrammalli, or unigram model, is a simple probabilistic language model that treats each word as generated independently from a fixed probability distribution over the vocabulary. The probability of a word sequence w1 w2 ... wn is therefore the product P(w1) P(w2) ... P(wn). Because the model ignores word order and context, it is typically used as a baseline rather than as a realistic model of language.
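The product rule can be illustrated with a minimal Python sketch. The four-word vocabulary, its probability values, and the function name sequence_probability are assumptions made up for this example, not part of any particular library.

```python
import math

# Hypothetical toy distribution over a four-word vocabulary.
# The values are illustrative assumptions and sum to 1.
unigram_probs = {"the": 0.5, "cat": 0.2, "sat": 0.2, "mat": 0.1}

def sequence_probability(words, probs):
    """Return P(w1) * P(w2) * ... * P(wn) under the independence assumption."""
    return math.prod(probs[w] for w in words)

# 0.5 * 0.2 * 0.2, which is about 0.02
p = sequence_probability(["the", "cat", "sat"], unigram_probs)
print(p)
```

For long sequences the product becomes very small, so implementations commonly sum log probabilities instead of multiplying raw probabilities.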

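The probabilities P(w) are typically estimated from a corpus as relative frequencies, P(w) = count(w) / N, where count(w) is the number of occurrences of w and N is the total number of word tokens in the corpus.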
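The relative-frequency estimate P(w) = count(w) / N can be sketched as follows, taking N to be the total number of tokens in the corpus. The six-word corpus, the whitespace tokenization, and the function name estimate_unigram_probs are assumptions chosen for illustration.

```python
from collections import Counter

def estimate_unigram_probs(tokens):
    """Estimate P(w) = count(w) / N from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())  # N: total number of tokens
    return {word: count / total for word, count in counts.items()}

# Hypothetical toy corpus, split on whitespace for simplicity.
corpus = "the cat sat on the mat".split()
probs = estimate_unigram_probs(corpus)

# "the" occurs 2 times out of 6 tokens; every other word occurs once.
print(probs["the"], probs["cat"])
```

The estimated probabilities sum to 1 over the words seen in the corpus.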

Unigram models serve as simple baselines in various NLP tasks. They are used in text classification and language identification, among other applications.

Limitations include the strong independence assumption, which ignores syntax, semantics, and word order.
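One way to see the effect of ignoring word order is to score every ordering of the same words. In the sketch below, the three-word distribution and the function name log_prob are hypothetical; the scores are computed in log space.

```python
import math
from itertools import permutations

# Hypothetical probabilities for three words (illustrative values only).
probs = {"dog": 0.3, "bites": 0.2, "man": 0.5}

def log_prob(words, probs):
    """Sum of log P(w) over the words, i.e. the log of the unigram product."""
    return sum(math.log(probs[w]) for w in words)

# Score all six orderings of the same three words.
scores = {
    " ".join(order): log_prob(order, probs)
    for order in permutations(["dog", "bites", "man"])
}

# Every ordering gets the same score, up to floating-point rounding.
for sentence, score in scores.items():
    print(f"{sentence}: {score:.6f}")
```

Under this model, "dog bites man" and "man bites dog" are equally likely, because the score depends only on which words occur and how often.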
