The Out-of-Vocabulary Problem

Out of vocabulary (OOV) refers to tokens that are not present in the vocabulary used by a natural language processing system. In many NLP models, a fixed vocabulary maps words to embeddings or probability distributions. When a word or token not included in this set is encountered, it is treated as OOV, triggering fallback mechanisms that allow processing to continue.
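A minimal sketch of this lookup-with-fallback pattern, assuming a toy vocabulary and a generic "<unk>" token (both illustrative, not from any particular library):

```python
# Fixed-vocabulary lookup with an OOV fallback.
# The vocabulary, the "<unk>" token, and the example tokens are
# illustrative assumptions for this sketch.

UNK = "<unk>"

vocab = {UNK: 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}

def encode(tokens, vocab):
    """Map each token to its vocabulary id; OOV tokens map to <unk>."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]

ids = encode(["the", "cat", "sat", "on", "the", "zyzzyva"], vocab)
# "zyzzyva" is not in the vocabulary, so it maps to the <unk> id.
```

Every unseen token collapses onto the same id here, which is exactly the information loss that motivates the mitigation strategies discussed below.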

OOVs arise for several reasons. New names and neologisms, technical jargon, multilingual input, misspellings, and morphological variations can all fall outside a model’s vocabulary. Languages with rich morphology tend to produce many surface forms of a word, increasing OOV risk. In dynamic text streams, such as social media, creative spellings and memes further contribute to OOV occurrences.

The presence of OOVs can impact performance in language modeling, translation, search, and information retrieval. Common remedies replace unseen tokens with a generic unknown token, but this can obscure important information. More sophisticated approaches aim to preserve information through alternative representations.

Mitigation strategies include subword tokenization methods such as byte-pair encoding (BPE), WordPiece, and SentencePiece, which break words into smaller units that are more likely to appear in training data. Character-level models and hybrid approaches model text at multiple granularities. Dynamic vocabularies, transliteration for proper nouns, and morphological analysis to decompose words into known morphemes are additional techniques. Each method balances coverage, efficiency, and contextual fidelity, and the choice often depends on the application and data domain.
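The subword idea can be sketched with a WordPiece-style greedy longest-match segmenter. The piece inventory below is a toy assumption; real tokenizers learn their pieces (or BPE merges) from a training corpus:

```python
# Hedged sketch of greedy longest-match subword segmentation.
# The piece inventory is a toy assumption; real systems learn it from data.

def segment(word, pieces):
    """Split a word into the longest known pieces, left to right.
    Characters covered by no piece fall back to "<unk>"."""
    out, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until a match.
        for j in range(len(word), i, -1):
            if word[i:j] in pieces:
                out.append(word[i:j])
                i = j
                break
        else:
            out.append("<unk>")  # not even a single character matched
            i += 1
    return out

pieces = {"un", "break", "able", "b", "r", "e", "a", "k"}
print(segment("unbreakable", pieces))  # ['un', 'break', 'able']
```

Even though "unbreakable" as a whole may never appear in training data, each of its pieces is likely to, so the model retains useful signal instead of a single unknown token.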