lemmat

A lemmat is a unit used in some linguistic and computational contexts that pairs a word's lemma with its morphosyntactic information. A lemma is the canonical dictionary form of a lexeme; a lemmat is thus the pair consisting of that base form and its grammatical features, such as part of speech, tense, number, or case. In annotated corpora and lexicographic databases, lemmats tie each surface form to its base form and grammatical behavior, enabling consistent analysis across inflected or derived variants.

The term is not widely standardized and is mostly encountered in discussions of corpus annotation schemas.

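Examples: The surface form "running" maps to the lemmat ("run", "VBG"), where VBG is the Penn Treebank tag for a gerund or present participle; "better" maps to ("good", "JJR"), where JJR marks a comparative adjective.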
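
The pairing in these examples can be sketched in Python. This is a minimal illustration only: the names Lemmat, LEXICON, and to_lemmat are hypothetical, and the hand-written lookup table stands in for whatever lemmatizer and tagger a real annotation pipeline would use.

```python
from typing import NamedTuple, Optional


class Lemmat(NamedTuple):
    """A (lemma, tag) pair; the name and fields are illustrative assumptions."""
    lemma: str  # canonical dictionary form, e.g. "run"
    tag: str    # morphosyntactic tag, e.g. Penn Treebank "VBG"


# Hypothetical toy lexicon holding only the two examples from the text.
LEXICON = {
    "running": Lemmat("run", "VBG"),
    "better": Lemmat("good", "JJR"),
}


def to_lemmat(surface: str) -> Optional[Lemmat]:
    """Look up a surface form (case-insensitively); return None if unknown."""
    return LEXICON.get(surface.lower())


if __name__ == "__main__":
    for word in ("running", "better", "unknown"):
        print(word, "->", to_lemmat(word))
```

Keeping the lemma and the tag together in one immutable value means two surface forms can be compared or grouped by lemma while their grammatical features stay available.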

See also: Lemma (linguistics); Lemmatization; Morphology; Part-of-speech tagging.
