lemmat

A lemmat is a basic unit in some linguistic and computational contexts that combines a word's lemma with its morphosyntactic information. A lemma is the canonical dictionary form of a lexeme; a lemmat thus represents the pair consisting of that base form and its grammatical features, such as part of speech, tense, number, or case. In annotated corpora and lexicographic databases, lemmats are used to represent the lexical identity and grammatical behavior of surface forms, enabling consistent analysis across the inflected variants of a word.
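A lemmat can be modeled as a small immutable record pairing a lemma with its features. The Python sketch below is a minimal illustration, assuming that a single Penn Treebank tag stands in for the full feature set. The `Lemmat` type, the toy `LEXICON` table, the `"UNK"` fallback tag, and the `analyze` and `group_by_lemma` helpers are hypothetical names introduced here, not part of any standard schema. The mappings for "running" and "better" follow the examples given in this entry.

```python
from collections import defaultdict
from typing import NamedTuple


class Lemmat(NamedTuple):
    """A lemma paired with its morphosyntactic features (here a single tag)."""
    lemma: str
    tag: str  # assumed: a Penn Treebank tag such as "VBG" or "JJR"


# Hypothetical toy lexicon mapping surface forms to lemmats. A real system
# would obtain these from a lemmatizer and a POS tagger, not a fixed table.
LEXICON: dict[str, Lemmat] = {
    "running": Lemmat("run", "VBG"),
    "ran": Lemmat("run", "VBD"),
    "runs": Lemmat("run", "VBZ"),
    "better": Lemmat("good", "JJR"),
    "good": Lemmat("good", "JJ"),
}


def analyze(tokens: list[str]) -> list[Lemmat]:
    """Map each token to its lemmat; unknown tokens get the placeholder tag "UNK"."""
    return [LEXICON.get(tok.lower(), Lemmat(tok.lower(), "UNK")) for tok in tokens]


def group_by_lemma(tokens: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Group surface forms under their shared lemma, keeping each form's tag."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for tok, lm in zip(tokens, analyze(tokens)):
        groups[lm.lemma].append((tok, lm.tag))
    return dict(groups)


print(group_by_lemma(["running", "ran", "better"]))
# {'run': [('running', 'VBG'), ('ran', 'VBD')], 'good': [('better', 'JJR')]}
```

Keeping the tag alongside the lemma is what separates a lemmat from a plain lemma: grouping by `lemma` alone places "running" and "ran" under "run", while the retained tags still record how the two forms differ grammatically.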

The term is not widely standardized and is mostly encountered in discussions of corpus annotation schemas, lemmatization pipelines, and lexicographic data models. It is distinct from a plain lemma (which stores only the base form) and from a token (the exact surface form as it appears in text). A lemmat can be seen as a compact representation used by NLP systems to link inflected tokens to their canonical base form together with their grammatical features.

Examples: The surface form "running" maps to the lemmat ("run", "VBG"); "better" maps to ("good", "JJR"). Here VBG and JJR are Penn Treebank tags for a gerund or present participle and for a comparative adjective, respectively. In practice, lemmats can support search, disambiguation, and linguistic analysis by preserving both the base form and the grammatical information.

See also: Lemma (linguistics); Lemmatization; Morphology; Part-of-speech tagging.