fullword

Fullword is a term used in computing and linguistics to denote a complete lexical word as a unit, rather than a morpheme, stem, or subword fragment. It is not a standardized technical term across all fields, but it is used in documentation and discussions about text processing to contrast full words with subword units.

In search and indexing, fullword matching requires that the query match a complete word in the text, delimited by word boundaries. This avoids false positives from partial matches. For example, a fullword search for "cat" would not find "category," while a substring search would. Implementations often rely on tokenization to split text into words and on boundary rules to determine matches.

In natural language processing, "fullword" can refer to tokens that correspond to dictionary entries, as opposed to affixes or clitics treated as separate tokens, or to the base units produced by tokenization. Some NLP pipelines use subword units (byte-pair encoding, character n-grams) for model input, which are not full words; others maintain fullword tokens for certain tasks, typically after lemmatization.

In lexicography and corpus linguistics, corpora may annotate words as "fullword" tokens and mark inflected variants as separate occurrences of the same lemma. Limitations include handling hyphenated compounds and languages with rich morphology.

See also: tokenization, stemming, lemmatization, full-text search, exact match, substring search.
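
Example: the contrast between fullword and substring search can be illustrated with a short Python sketch. It is not a reference implementation. The function names (substring_search, fullword_search, fullword_search_tokens) are hypothetical and chosen for illustration only. The sketch assumes that a "word" is whatever Python's regular-expression engine treats as a run of word characters, which is only one of several possible boundary rules.

```python
import re


def substring_search(query: str, text: str) -> bool:
    """Return True if the query occurs anywhere in the text, even inside a longer word."""
    return query.lower() in text.lower()


def fullword_search(query: str, text: str) -> bool:
    """Return True only if the query occurs as a whole word.

    Assumption: word boundaries are those recognised by the regex anchor \\b.
    """
    pattern = r"\b" + re.escape(query) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def fullword_search_tokens(query: str, text: str) -> bool:
    """Tokenization-based variant: split the text into word tokens, then compare whole tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return query.lower() in tokens


sample = "Browse the category list."
print(substring_search("cat", sample))        # substring search matches inside "category"
print(fullword_search("cat", sample))         # fullword search does not
print(fullword_search("cat", "The cat sat."))
```

Under these boundary rules a hyphen counts as a word boundary, so a fullword search for "art" would match inside "state-of-the-art". Whether that is the desired behavior depends on the application, which is one reason hyphenated compounds are listed among the limitations.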