Stemming - Infinite Lexicon

Stemming

Stemming is a text normalization technique used in natural language processing and information retrieval to reduce words to their base or stem form by removing affixes. The goal is to group together related word variants so they can be matched or indexed together, even when they appear in different forms. Stemmers do not attempt to produce linguistically correct lemmas, and the resulting stem may not be a valid word in the language.
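As a rough illustration of affix removal, the sketch below strips the longest matching suffix from a word. It is a toy, not the Lovins or Porter algorithm: the suffix list, the minimum stem length, and the name `toy_stem` are all assumptions chosen for this example.

```python
# Toy suffix-stripping stemmer, for illustration only.
# SUFFIXES and MIN_STEM_LEN are assumptions of this sketch, not taken
# from any published stemming algorithm.
SUFFIXES = ("ions", "ion", "ings", "ing", "ies", "ed", "es", "s")
MIN_STEM_LEN = 3


def toy_stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least MIN_STEM_LEN characters."""
    word = word.lower()
    # Try longer suffixes first so "ions" wins over "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("connected", "connecting", "connections", "studies", "is"):
        print(w, "->", toy_stem(w))
```

With these rules, "connected", "connecting", and "connections" all reduce to "connect", while "studies" reduces to "stud", which is not a valid English word. Short words such as "is" are left unchanged by the minimum-length guard.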

Stemming algorithms can be rule-based or statistical. Early rule-based systems include the Lovins stemmer and the Porter stemmer.

Stemming versus lemmatization: stemming reduces words to a stem that may not be a dictionary form, whereas lemmatization maps each word to its dictionary form, or lemma.
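The difference can be sketched with two toy functions. Here `LEMMA_TABLE` is a hypothetical hand-written lookup standing in for a real dictionary, and `toy_stem` is a crude suffix stripper invented for this example. Neither reflects a particular library.

```python
# Hypothetical lemma table standing in for a real dictionary resource.
LEMMA_TABLE = {"studies": "study", "mice": "mouse", "ran": "run"}


def toy_lemmatize(word: str) -> str:
    """Look the word up in the table; fall back to the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in ("ies", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("studies", "mice", "ran"):
        print(w, "stem:", toy_stem(w), "lemma:", toy_lemmatize(w))
```

The stemmer turns "studies" into the non-word "stud" and leaves irregular forms such as "mice" and "ran" untouched, whereas the table lookup returns "study", "mouse", and "run".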

Applications and limitations: In information retrieval, stemming can increase recall by treating related forms as equivalent, but it can also lower precision when unrelated words are conflated to the same stem.
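The effect on recall can be sketched with a small inverted index built twice, once on raw tokens and once on stems. The documents, the `toy_stem` rules, and the helper names `build_index` and `search` are all assumptions made for this example, not part of any particular search engine.

```python
from collections import defaultdict


def toy_stem(word):
    """Crude suffix stripper (an assumption of this sketch, not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs, normalize):
    """Map each normalized token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[normalize(token)].add(doc_id)
    return index


def search(index, query, normalize):
    """Return the ids of documents matching a single-term query."""
    return index.get(normalize(query.lower()), set())


# Made-up documents for illustration.
DOCS = {
    1: "the engine connects to the server",
    2: "connecting clients is slow",
    3: "a connected graph",
    4: "news about the weather",
}

exact_index = build_index(DOCS, lambda t: t)
stemmed_index = build_index(DOCS, toy_stem)

exact_hits = search(exact_index, "connect", lambda t: t)
stemmed_hits = search(stemmed_index, "connect", toy_stem)
```

In this sketch the exact-match index returns nothing for "connect", while the stemmed index returns documents 1, 2, and 3. The same rules also reduce "news" to "new", so a query for "new" matches document 4, which shows how aggressive stripping can conflate unrelated words.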

Overall, stemming remains a foundational preprocessing step in many IR and NLP pipelines, balancing simplicity and speed against linguistic accuracy.
