Unigrammalli
Unigrammalli, or unigram model, is a simple probabilistic language model that treats each word as generated independently from a fixed probability distribution over the vocabulary. Under this assumption, the probability of a word sequence w1 w2 ... wn is approximated as the product P(w1) P(w2) ... P(wn). Because the model ignores word order and context, any reordering of the same words receives the same probability, and it is mostly used as a baseline rather than as a realistic model of language.
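As an illustration of the product formula, the following minimal Python sketch scores a word sequence under a unigram model. The function name sequence_log_prob and the toy distribution toy_probs are hypothetical and chosen only for this example. The sketch sums log-probabilities instead of multiplying raw probabilities, a common way to avoid numerical underflow on long sequences.

```python
import math

def sequence_log_prob(words, probs):
    """Log-probability of a word sequence under a unigram model.

    `probs` maps each word to P(w). A word missing from `probs`
    raises KeyError in this sketch.
    """
    # The sum of logs equals the log of the product P(w1) P(w2) ... P(wn).
    return sum(math.log(probs[w]) for w in words)

# Hypothetical toy distribution over a three-word vocabulary.
toy_probs = {"the": 0.5, "cat": 0.3, "sat": 0.2}

a = sequence_log_prob(["the", "cat", "sat"], toy_probs)
b = sequence_log_prob(["sat", "the", "cat"], toy_probs)

# Both orderings get the same score (up to floating-point rounding),
# because the model ignores word order.
print(math.exp(a), math.exp(b))
```

Both calls score the same three words, so the two printed values agree up to rounding, which is the order-insensitivity described above.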
The probabilities P(w) are typically estimated from a corpus as relative frequencies P(w) = count(w) / N, where count(w) is the number of occurrences of w in the corpus and N is the total number of word tokens.
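The relative-frequency estimate can be sketched in a few lines of Python. This is a minimal example, assuming the corpus is already tokenized into a list of words; the function name estimate_unigram and the six-word toy corpus are hypothetical.

```python
from collections import Counter

def estimate_unigram(tokens):
    """Estimate P(w) = count(w) / N from a list of tokens."""
    if not tokens:
        raise ValueError("cannot estimate probabilities from an empty corpus")
    counts = Counter(tokens)   # count(w) for every distinct word
    n = len(tokens)            # N: total number of tokens
    return {w: c / n for w, c in counts.items()}

# Hypothetical toy corpus; whitespace tokenization is an assumption.
corpus = "the cat sat on the mat".split()
probs = estimate_unigram(corpus)

# "the" occurs 2 times out of 6 tokens, every other word once.
print(probs["the"], probs["cat"])
```

In this sketch, words that never occur in the corpus get no entry at all, which corresponds to an estimated probability of zero.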
Unigram models serve as simple baselines in various NLP tasks. They are used in text classification, language
Limitations include the strong independence assumption, which ignores syntax, semantics, and word order. Consequently, unigram models