Lemma-based

Lemma-based is an adjective used in natural language processing and information retrieval to describe approaches that rely on lemmas—the canonical base forms of words—as the primary units of analysis. In such systems, inflected or derived word forms are mapped to their lemma by a lemmatizer before further processing. This contrasts with token-based or stem-based methods, which may treat each surface form or stem as a distinct item.
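As a minimal sketch of this mapping step, the Python example below normalizes tokens to lemmas before counting them. The lexicon `LEMMA_LEXICON` and the helper `lemmatize` are hypothetical, hand-written for illustration; a real lemmatizer would typically rely on a full lexicon and morphological analysis rather than a small lookup table.

```python
from collections import Counter

# Hypothetical toy lexicon mapping surface forms to lemmas (illustrative only).
LEMMA_LEXICON = {
    "ran": "run",
    "running": "run",
    "runs": "run",
    "mice": "mouse",
    "better": "good",
}


def lemmatize(token: str) -> str:
    """Return the lemma for a token, falling back to the lowercased token."""
    lowered = token.lower()
    return LEMMA_LEXICON.get(lowered, lowered)


tokens = "She runs while he ran and they were running".split()

# Token-based view: every surface form is a distinct item.
token_counts = Counter(t.lower() for t in tokens)

# Lemma-based view: inflected forms are merged under their lemma.
lemma_counts = Counter(lemmatize(t) for t in tokens)

print(len(token_counts), len(lemma_counts))
print(lemma_counts["run"])
```

In the token-based count, "runs", "ran", and "running" are three separate items; in the lemma-based count they are merged into the single item "run".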

In practice, lemma-based methods are used to reduce data sparsity and to improve recall in search and text classification.
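The effect on search recall can be illustrated with a small inverted index, sketched below in Python under simplifying assumptions. The lemma table `LEMMAS`, the helpers `lemma_of` and `build_index`, and the three sample documents are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical lemma table (illustrative only, not a real lexicon).
LEMMAS = {
    "mice": "mouse",
    "studies": "study",
    "studied": "study",
    "studying": "study",
}


def lemma_of(token):
    """Map a token to its lemma, falling back to the lowercased token."""
    lowered = token.lower()
    return LEMMAS.get(lowered, lowered)


def build_index(docs, normalize):
    """Build an inverted index from normalized term to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


docs = {
    1: "Researchers studied mice",
    2: "A mouse study",
    3: "Studying birds",
}

surface_index = build_index(docs, str.lower)  # surface forms, lowercased
lemma_index = build_index(docs, lemma_of)     # lemma-based terms

query = "study"
surface_hits = surface_index.get(query.lower(), set())
lemma_hits = lemma_index.get(lemma_of(query), set())

print(sorted(surface_hits), sorted(lemma_hits))
```

With this sample data, the surface-form index matches only the document that contains the exact form "study", while the lemma-based index also retrieves the documents containing "studied" and "Studying".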

Challenges include disambiguation when a single lemma corresponds to multiple word senses and dependence on language-specific lexicons.

The term is widely used as an adjective in research papers and software documentation.
