IDFs

IDF, short for inverse document frequency, is a statistic used in information retrieval to assess how informative a term is across a document collection. It helps distinguish terms that are common across many documents from those that are relatively unique to a subset of the corpus.

Calculation and interpretation: For a corpus with N documents, the document frequency df(t) is the number of documents that contain term t. The IDF of t is usually computed as log(N/df(t)), or log((N+1)/(df(t)+1)) to smooth extreme values. Terms that appear in many documents have low IDF, while rare terms have high IDF. This weighting is intended to emphasize discriminative terms in search and analysis.

Relation to tf-idf: IDF is a key component of the tf-idf weighting scheme, where the weight of a term in a particular document is the term frequency (tf) times the IDF. In practice, a high IDF boosts terms that are likely to be informative for a given document, aiding ranking and retrieval. Variants exist, and some systems use the natural log or log base 10 depending on the implementation.

Variants and limitations: Some approaches use adjusted formulas, such as BM25’s IDF component, log((N - df + 0.5)/(df + 0.5)). IDF is corpus-dependent and can be unstable for very small or rapidly changing collections. It does not capture semantic similarity or context beyond frequency, and highly frequent but meaningful terms (like names or technical terms) may receive low IDF despite their usefulness. Effective use often involves combining IDF with other signals and updating it as corpora evolve.
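
The formulas above can be sketched in a few lines of Python. This is a minimal illustration over a made-up toy corpus (the corpus, function names, and tokenization are assumptions for the example, not any particular library's API); it uses the smoothed IDF, the tf × IDF weight, and the BM25 IDF component exactly as given:

```python
import math

# Toy corpus: each document is a list of tokens (illustrative only).
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["dogs", "and", "cats", "are", "pets"],
]
N = len(corpus)

def df(term):
    """Document frequency: number of documents containing the term."""
    return sum(1 for doc in corpus if term in doc)

def idf(term):
    """Smoothed IDF: log((N + 1) / (df(t) + 1))."""
    return math.log((N + 1) / (df(term) + 1))

def tf_idf(term, doc):
    """tf-idf weight of a term in one document: raw count times IDF."""
    return doc.count(term) * idf(term)

def bm25_idf(term):
    """BM25's IDF component: log((N - df + 0.5) / (df + 0.5))."""
    d = df(term)
    return math.log((N - d + 0.5) / (d + 0.5))

# "the" appears in 2 of 3 documents, so its IDF is low;
# "mat" appears in only 1, so its IDF is higher.
print(idf("the"), idf("mat"))            # idf("the") < idf("mat")
print(tf_idf("cat", corpus[0]))
print(bm25_idf("the"), bm25_idf("mat"))  # BM25 IDF can go negative
```

Note that the BM25 variant can return a negative value for very common terms (here, "the"), which is one reason practical systems clamp or adjust it; the plain log(N/df(t)) form is always non-negative as long as df(t) ≤ N.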