Basisform

Basisform is a term used in linguistics and computational linguistics to denote the base or dictionary form of a word, commonly referred to as the lemma or citation form. The basis form represents the canonical form from which inflected or derived variants are generated, and it is typically the form stored in lexical resources and dictionaries.
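The relationship between inflected variants and the stored basis form can be illustrated with a small lookup table. The following is a minimal sketch, not a real lemmatizer: the `LEXICON` entries and the helper names `basis_form` and `normalize` are hypothetical, chosen only for illustration. Unknown tokens are simply lowercased and returned unchanged, which is an assumption of this sketch rather than a standard rule.

```python
from typing import Iterable, List

# Toy lexicon: each inflected surface form points to its basis form (lemma).
# The entries are illustrative and not taken from any real lexical resource.
LEXICON = {
    "go": "go",
    "goes": "go",
    "going": "go",
    "went": "go",
    "gone": "go",
    "camera": "camera",
    "cameras": "camera",
}


def basis_form(token: str) -> str:
    """Return the basis form of a token, or the lowercased token if unknown."""
    key = token.lower()
    return LEXICON.get(key, key)


def normalize(tokens: Iterable[str]) -> List[str]:
    """Map every token in a sequence to its basis form."""
    return [basis_form(t) for t in tokens]


print(normalize(["Went", "going", "cameras", "tripod"]))
# prints ['go', 'go', 'camera', 'tripod']
```

A table like this handles irregular forms such as went, which no suffix rule could recover, but it only covers words that have been listed explicitly.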

In natural language processing, basisforms are used to normalize tokens for tasks such as lemmatization, morphological analysis, and part-of-speech tagging. By mapping each surface form to its basis form, systems can consolidate related word forms and improve search, indexing, and linguistic analysis. The concept is especially important for languages with rich morphology, where a single lemma may have many inflected forms.

Terminology and usage vary by language and field. In English, the preferred term is usually lemma or dictionary form; basisform is more common in German-language linguistic literature and some cross-linguistic works. The basis form is not always unique across languages or POS categories, and different lemmatization schemes may choose different canonical forms for multi-form words.

Relation to related concepts: basisform is distinct from stemming, which often outputs an incomplete or non-dictionary form. It is also related to canonical forms and root forms, though definitions may differ by linguistic framework. Examples include mapping went and going to the basis form go, or cameras to camera.

See also: lemma, dictionary form, canonical form, lemmatization, morphology.