Stemming
Stemming is a text normalization technique used in natural language processing and information retrieval to reduce words to a base or stem form by removing affixes. The goal is to conflate related word variants, such as "connect", "connected", and "connection", so that they can be matched or indexed as a single term. Stemmers do not attempt to produce linguistically correct lemmas, and the resulting stem may not be a valid word in the language.
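The idea of affix removal can be illustrated with a minimal sketch of a suffix-stripping stemmer. It is not any published algorithm: the suffix list, the function names `naive_stem` and `group_by_stem`, and the `min_stem_len` parameter are all assumptions made for illustration, and practical stemmers apply far more elaborate rules and conditions.

```python
# Illustrative suffix list (an assumption, not taken from any real stemmer).
SUFFIXES = ("ations", "ation", "ions", "ion", "ings", "ing", "ed", "es", "s")


def naive_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    # Try longer suffixes first so "ions" is preferred over "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Map each stem to the list of input words that reduce to it."""
    groups = {}
    for w in words:
        groups.setdefault(naive_stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    variants = ["connect", "connected", "connecting", "connection", "connections"]
    print(group_by_stem(variants))  # all five variants share the stem "connect"
    print(naive_stem("studies"))    # "studi", a stem that is not a valid word
```

In this sketch the five variants of "connect" collapse to one stem, while "studies" becomes "studi", which shows that a stem need not be a dictionary word. The minimum stem length is a simple guard against over-stripping short words such as "sing".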
Stemming algorithms can be rule-based or statistical. Early rule-based systems include the Lovins stemmer and the
Stemming versus lemmatization: stemming reduces words to a stem that may not be a dictionary form, whereas
Applications and limitations: In information retrieval, stemming can increase recall by treating related forms as equivalent,
Overall, stemming remains a foundational preprocessing step in many IR and NLP pipelines, balancing simplicity and