stopord - Infinite Lexicon

stopord

Stopord, or stop words, are common words that are typically removed from text during preprocessing in information retrieval and natural language processing. They include articles, pronouns, prepositions and auxiliary verbs that usually carry little domain-specific meaning. Removing them can reduce noise, decrease index size and speed up searches and analyses.
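The removal step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the stop list is a deliberately tiny, hypothetical one, and the names `STOP_WORDS`, `tokenize` and `remove_stop_words` are assumptions made for this example. The regex tokenizer is a simplification that only handles unaccented Latin letters.

```python
import re

# Hypothetical, deliberately tiny stop list for illustration only;
# lists used in practice are longer and language-specific.
STOP_WORDS = frozenset({"the", "of", "and", "to", "a", "in", "that", "it"})


def tokenize(text):
    """Lowercase the text and split it into runs of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in the stop list, preserving order."""
    return [token for token in tokens if token not in stop_words]


tokens = tokenize("The history of the index and the size of it")
filtered = remove_stop_words(tokens)
print(filtered)  # ['history', 'index', 'size']
```

Storing the list as a `frozenset` keeps each membership check constant-time on average, which matters when the filter runs over every token of a large collection.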

The concept arose in early information retrieval when large document collections were indexed for fast lookup.

Lists vary by language and domain. English examples include the, of, and, to, a, in, that, it,

There are criticisms: blanket removal can negatively affect phrase queries, named entities, dates and negations, and
