Lemmatisation

Lemmatization is the process of reducing a word to its base or dictionary form, the lemma. Unlike simple stemming, which may produce truncated or non-dictionary forms, lemmatization uses morphological analysis and, often, part-of-speech information to assign the most appropriate lemma for a given token in context.
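A toy lookup table is enough to show how crude stemming differs from POS-aware lemmatization. This is a minimal sketch, not a real lemmatizer. The lexicon `LEXICON`, the tag names (`NOUN`, `VERB`, `ADJ`) and the helpers `naive_stem` and `lemmatize` are hypothetical. The suffix list in `naive_stem` is an arbitrary crude rule set, not any published stemming algorithm.

```python
# Minimal sketch contrasting crude suffix stripping with POS-aware lemma lookup.
# LEXICON, naive_stem and lemmatize are hypothetical, for illustration only.

# Toy lexicon keyed by (lowercased surface form, POS tag) -> lemma.
LEXICON = {
    ("studies", "NOUN"): "study",
    ("studies", "VERB"): "study",
    ("mice", "NOUN"): "mouse",
    ("ran", "VERB"): "run",
    ("better", "ADJ"): "good",     # suppletive comparative of "good"
    ("better", "VERB"): "better",  # "to better one's position"
}

def naive_stem(word: str) -> str:
    """Strip the first matching suffix; may return non-words such as 'stud'."""
    word = word.lower()
    for suffix in ("ies", "es", "s", "ing", "ed"):
        # Only strip if a stem of at least three characters remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word: str, pos: str) -> str:
    """Return the lexicon lemma for (word, pos); fall back to the lowercased word."""
    key = (word.lower(), pos)
    return LEXICON.get(key, word.lower())

if __name__ == "__main__":
    for word, pos in [("studies", "NOUN"), ("mice", "NOUN"),
                      ("better", "ADJ"), ("better", "VERB")]:
        print(f"{word}/{pos}: stem={naive_stem(word)!r} lemma={lemmatize(word, pos)!r}")
```

In this sketch the stemmer turns "studies" into the non-word "stud" and leaves the irregular "mice" untouched. The lookup returns "study" and "mouse". The same surface form "better" also gets a different lemma depending on its tag.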

In practice, a lemmatizer may rely on lexicons and inflection tables, rule-based morphological analyzers, or statistical models. A lemmatizer typically requires the surrounding context or a POS tag to select the correct lemma when a word has multiple possible lemmas. For example, "better" as an adjective maps to "good". As a verb, in "to better one's position", it is its own lemma, "better".

Lemmatization is used in information retrieval to improve matching across word forms, and in natural language processing pipelines for parsing and tagging. It also supports question answering, machine translation, and text normalization.

Challenges include ambiguity, the handling of proper nouns and multiword expressions, languages with rich morphology, and limited resource availability for less-resourced languages.

Evaluation uses corpora annotated with lemmas or with gold-standard morphological analyses.
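One simple way to score a lemmatizer against a lemma-annotated corpus is token-level accuracy: the share of tokens whose predicted lemma exactly matches the gold lemma. The sketch below assumes a corpus of `(token, POS tag, gold lemma)` triples. The `GOLD` data, the tag names and the `lowercase_baseline` lemmatizer are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical gold-standard data: (token, POS tag, gold lemma) triples, as one
# might read them from a lemma-annotated corpus. Format assumed for illustration.
GOLD: Sequence[Tuple[str, str, str]] = [
    ("The", "DET", "the"),
    ("mice", "NOUN", "mouse"),
    ("were", "AUX", "be"),
    ("running", "VERB", "run"),
    ("faster", "ADV", "fast"),
]

def lemma_accuracy(gold: Sequence[Tuple[str, str, str]],
                   lemmatize: Callable[[str, str], str]) -> float:
    """Token-level accuracy: share of tokens whose predicted lemma equals the gold lemma."""
    if not gold:
        return 0.0
    correct = sum(1 for token, pos, lemma in gold if lemmatize(token, pos) == lemma)
    return correct / len(gold)

def lowercase_baseline(token: str, pos: str) -> str:
    """Trivial baseline that treats the lowercased surface form as the lemma."""
    return token.lower()

if __name__ == "__main__":
    # Only "The" -> "the" is correct, so this prints 0.2.
    print(lemma_accuracy(GOLD, lowercase_baseline))
```

Any function with the same `(token, pos) -> lemma` signature, such as a lookup-based lemmatizer, can replace the baseline.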