TF-IDF

TF-IDF, short for term frequency–inverse document frequency, is a numerical statistic that reflects how important a word is to a document within a collection or corpus. The idea is to give higher weight to terms that appear frequently in a document but are relatively rare across the corpus, and lower weight to words that are common everywhere.

Term frequency TF(t, d) measures how often term t occurs in document d. Inverse document frequency IDF(t, D) measures how rare the term is across the whole corpus D. A common formulation is IDF(t, D) = log(N / DF(t)), where N is the total number of documents and DF(t) is the number of documents containing t. The TF-IDF weight for term t in document d is w(t, d) = TF(t, d) × IDF(t, D). In practice, TF can be raw counts or normalized (for example, divided by the document length), and IDF can be smoothed or computed with a base-10 logarithm.
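
To make the formulas concrete, here is a minimal Python sketch that computes raw-count TF and the unsmoothed IDF(t, D) = log(N / DF(t)) defined above; the function name, tokenization, and toy corpus are illustrative, not from any particular library.

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weights for a list of tokenized documents, using raw-count
    TF and the unsmoothed IDF(t, D) = log(N / DF(t)) defined above."""
    n = len(docs)
    # DF(t): number of documents in which term t appears at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights

docs = [text.split() for text in ["the cat sat", "the dog sat", "the cat ran"]]
print(tf_idf(docs))  # terms in every document (like "the") get weight 0
```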

TF-IDF vectors are often normalized (for example, to unit length) to enable meaningful comparisons between documents using cosine similarity. This weighting emphasizes terms that are distinctive for a document relative to the corpus, which makes it a useful text representation for many downstream tasks.
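
A minimal sketch of that comparison step, assuming the same {term: weight} dict representation as the example above; for vectors already normalized to unit length, the cosine similarity reduces to a plain dot product.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two TF-IDF vectors stored as
    {term: weight} dicts: dot(u, v) / (|u| * |v|)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)
```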

Applications include information retrieval and search engine ranking, document classification, clustering, and other text mining tasks where feature weighting improves discrimination between documents.
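
As one illustration of the retrieval use case, scikit-learn's TfidfVectorizer can vectorize a corpus and rank documents against a query; the toy corpus and query below are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats living together",
]
vectorizer = TfidfVectorizer()  # smoothed IDF and L2 normalization by default
doc_vectors = vectorizer.fit_transform(corpus)

# Rank the documents against a query by cosine similarity
query_vector = vectorizer.transform(["cat on a mat"])
scores = cosine_similarity(query_vector, doc_vectors).ravel()
for i in scores.argsort()[::-1]:
    print(f"{scores[i]:.3f}  {corpus[i]}")
```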

Limitations include ignoring word order and semantics, treating words independently, and dependence on the chosen corpus; in a dynamic corpus, the IDF statistics may require frequent updates.

Variants and improvements exist, such as sublinear TF scaling, smoothed IDF, and BM25, which adapt the basic idea to different modeling goals.
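
A rough sketch of how these variants alter the weighting, using their commonly cited formulas (the smoothed IDF shown is the form scikit-learn uses; BM25's document-length normalization is omitted for brevity):

```python
import math

def sublinear_tf(count):
    # Sublinear TF scaling: 1 + log(tf) dampens repeated occurrences
    return 1.0 + math.log(count) if count > 0 else 0.0

def smooth_idf(n_docs, df):
    # Smoothed IDF, log((1 + N) / (1 + DF)) + 1, avoids division by
    # zero and keeps every term's weight positive
    return math.log((1 + n_docs) / (1 + df)) + 1.0

def bm25_tf(count, k1=1.5):
    # BM25's saturating TF component: grows with the count but levels
    # off, so very frequent terms stop accumulating weight (k1 tunes this)
    return count * (k1 + 1.0) / (count + k1)
```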