stopord

Stopord, or stop words, are common words that are typically removed from text during preprocessing in information retrieval and natural language processing. They include articles, pronouns, prepositions and auxiliary verbs that usually carry little domain-specific meaning. Removing them can reduce noise, decrease index size and speed up searches and analyses.
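
As a rough illustration of the preprocessing step described above, the following Python sketch filters a tokenized sentence against a stop list. The `STOP_WORDS` set, the `tokenize` helper and the `remove_stop_words` function are assumptions made for this example. They are not taken from any particular library, and the list is a tiny hand-picked sample rather than a standard one.

```python
import re

# Illustrative stop list only: a small hand-picked sample, not a standard list.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "that", "it", "for", "is"}


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in the stop list, preserving order."""
    return [t for t in tokens if t not in stop_words]


tokens = tokenize("The index of the collection is stored in a file")
filtered = remove_stop_words(tokens)
print(filtered)  # ['index', 'collection', 'stored', 'file']
```

Using a set for the stop list keeps each membership test cheap, which matters when the filter runs over every token of a large collection.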

The concept arose in early information retrieval when large document collections were indexed for fast lookup. Stop word lists, or stoplists, filter out these high-frequency words. Today, stop word handling is a configurable part of many NLP pipelines and search engines, with language-specific lists and options to keep or ignore certain words depending on the task.

Lists vary by language and domain. English examples include the, of, and, to, a, in, that, it, for. Other languages have their own common words, such as och and i in Swedish, og in Danish or Norwegian. In morphologically rich languages, stop lists are often used together with lemmatization or stemming.

There are criticisms: blanket removal can negatively affect phrase queries, named entities, dates and negations, and stop words can carry syntactic or semantic information in context. As a result, many systems employ selective, context-aware stopping, configurable thresholds, or no stopping for certain tasks, rather than applying a single universal list.
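
One way to picture selective stopping is a filter that picks a language-specific list, lets the caller protect certain words such as negations, and can be switched off entirely. The Python sketch below is a minimal example under stated assumptions: `STOP_LISTS`, `filter_tokens` and its `lang`, `keep` and `enabled` parameters are hypothetical names invented for illustration, and the lists are deliberately tiny.

```python
# Hypothetical per-language lists for illustration; real lists are much longer.
STOP_LISTS = {
    "en": {"the", "of", "and", "to", "a", "in", "that", "it", "for", "not", "no"},
    "sv": {"och", "i"},
    "da": {"og"},
}


def filter_tokens(tokens, lang="en", keep=frozenset(), enabled=True):
    """Drop stop words for `lang`, except words listed in `keep`.

    With enabled=False the tokens are returned unchanged (no stopping).
    Unknown languages fall back to an empty stop list.
    """
    if not enabled:
        return list(tokens)
    stop = STOP_LISTS.get(lang, set()) - set(keep)
    return [t for t in tokens if t.lower() not in stop]


review = "the film is not good".split()

blanket = filter_tokens(review)                        # negation is lost
selective = filter_tokens(review, keep={"not", "no"})  # negation is kept
unfiltered = filter_tokens(review, enabled=False)      # no stopping at all

print(blanket)     # ['film', 'is', 'good']
print(selective)   # ['film', 'is', 'not', 'good']
print(unfiltered)  # ['the', 'film', 'is', 'not', 'good']
```

The blanket result reads as the opposite of the original sentence, which is the kind of failure the criticisms above describe. Keeping negations, or disabling stopping for the task, avoids it.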