stopword

A stopword is a word commonly filtered out in text processing and information retrieval because it is considered to have little semantic value in many contexts. English examples include the, is, at, which, on, and, a, an, with.

The primary purpose of stopword removal is to reduce the size of indexes and speed up search and analysis by eliminating high-frequency function words that contribute less to distinguishing documents.

Stopword filtering is a standard preprocessing step in search engines, document indexing, and natural language processing. It is often applied before text is converted into bag-of-words or TF-IDF representations. Some modern models with contextual embeddings or subword tokenization lessen or bypass the need for stopword removal.

Stopword lists are language-specific and can be customized for a task or domain. They may be static, dynamic, or per-corpus; some tasks require keeping certain stopwords (for example, negations) or domain-specific terms.

Despite its utility, stopword removal can discard information relevant to certain analyses, and excessive removal can hurt performance on tasks involving phrase-level meaning or sentiment. Therefore, practitioners often evaluate whether to apply stopword filtering and tailor lists accordingly.
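
As a rough illustration of tailoring a list, the following Python sketch filters tokens against a small stopword set and optionally keeps a negation. The names BASE_STOPWORDS, tokenize, and remove_stopwords are hypothetical and chosen for this example; the word list is only the handful of English examples given above, not a standard list from any library.

```python
import re

# Illustrative list only; real stopword lists are longer and language-specific.
BASE_STOPWORDS = {"the", "is", "at", "which", "on", "and", "a", "an", "with"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(tokens, stopwords=BASE_STOPWORDS, keep=()):
    """Drop tokens found in `stopwords`, except those listed in `keep`."""
    active = set(stopwords) - set(keep)
    return [t for t in tokens if t not in active]


tokens = tokenize("The service is not good, and the food is cold.")

# Assumption for the example: an aggressive list that also treats "not" as a stopword.
aggressive = BASE_STOPWORDS | {"not"}

print(remove_stopwords(tokens, aggressive))
# ['service', 'good', 'food', 'cold']

print(remove_stopwords(tokens, aggressive, keep={"not"}))
# ['service', 'not', 'good', 'food', 'cold']
```

With the aggressive list, the negation disappears and the remaining tokens read as positive; keeping "not" preserves the cue that a sentiment task would likely need.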