Lemmasítás
Lemmasítás, or lemmatization, is the process of reducing a word to its lemma, the canonical base form listed in dictionaries. The goal is to map inflected or derived forms to a single, linguistically valid base form; in English, for example, "running", "ran", and "runs" all reduce to "run". Unlike stemming, which often produces truncated or non-dictionary forms, lemmasítás yields a form that is both meaningful and standardized, which makes it useful for linguistic analysis, information retrieval, and corpus studies.
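The contrast with stemming can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen for illustration: the toy LEXICON, the naive_stem suffix stripper (a crude stand-in for a stemmer, not any particular stemming algorithm), and the lemmatize function, which assumes the part of speech is already known and falls back to the lowercased form for words missing from the lexicon.

```python
# Toy lexicon mapping (surface form, part of speech) -> lemma.
# The entries are illustrative assumptions, not taken from a real resource.
LEXICON = {
    ("running", "VERB"): "run",
    ("ran", "VERB"): "run",
    ("mice", "NOUN"): "mouse",
    ("better", "ADJ"): "good",
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
}


def naive_stem(word: str) -> str:
    """Strip one common English suffix; a crude stand-in for a stemmer."""
    for suffix in ("ing", "es", "s", "ed"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str, pos: str) -> str:
    """Look up the lemma for a (form, POS) pair.

    Unknown pairs fall back to the lowercased surface form.
    """
    key = (word.lower(), pos)
    return LEXICON.get(key, word.lower())


if __name__ == "__main__":
    for form, pos in [("running", "VERB"), ("mice", "NOUN"), ("saw", "VERB"), ("saw", "NOUN")]:
        print(form, pos, "stem:", naive_stem(form), "lemma:", lemmatize(form, pos))
```

With these definitions, naive_stem("running") gives "runn", which is not a dictionary word, while lemmatize("running", "VERB") gives "run". The form "saw" resolves to "see" as a verb and to "saw" as a noun, which is why the lookup in this sketch is keyed on part of speech as well as on the surface form.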
The process typically combines morphological analysis with part-of-speech tagging. A lemmatizer uses a lexicon or morphological
Applications of lemmasítás span natural language processing pipelines and information retrieval. By normalizing words to their
Challenges include lexical ambiguity, where a form corresponds to multiple lemmas depending on part of speech,