BM25
BM25, commonly called Okapi BM25, is a ranking function used by search engines to estimate the relevance of documents to a given search query. It is based on the probabilistic retrieval framework and builds on the earlier BM11 and BM15 models. The formula combines three signals: the frequency of each query term in the document, the inverse document frequency of that term, and the length of the document relative to the collection average. The contribution of term frequency saturates rather than growing without bound, and length normalization keeps long documents from being favored simply because they contain more words.
For a query q consisting of the terms q_1, ..., q_n, the BM25 score of a document d is defined as follows:
BM25(q, d) = ∑_{i=1..n} IDF(q_i) * (f(q_i, d) * (k1 + 1)) / (f(q_i, d) + k1 * (1 - b + b * |d| / avgdl))
where:
- f(q_i, d) is the term frequency of q_i in document d,
- |d| is the length of the document d,
- avgdl is the average document length in the collection,
- k1 and b are free parameters, typically set to 1.2 and 0.75 respectively,
- IDF(q_i) is the inverse document frequency of q_i.
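The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name bm25_scores, the pre-tokenized list-of-lists input, and the particular IDF variant, ln(1 + (N - n + 0.5) / (n + 0.5)), are assumptions made for the example, since the text does not fix a specific IDF formula. It also assumes a non-empty collection.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; assumes `docs` is non-empty.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    def idf(term):
        # Assumed IDF variant (not specified in the text):
        # ln(1 + (N - n + 0.5) / (n + 0.5)), which is never negative.
        n = df.get(term, 0)
        return math.log(1 + (n_docs - n + 0.5) / (n + 0.5))

    scores = []
    for d in docs:
        tf = Counter(d)  # f(q_i, d) for every term in d
        norm = k1 * (1 - b + b * len(d) / avgdl)  # length normalization
        score = 0.0
        for term in query:
            f = tf.get(term, 0)
            score += idf(term) * f * (k1 + 1) / (f + norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["the", "cat", "sat"],
        ["the", "dog", "barked", "loudly"],
        ["cats", "and", "dogs"],
    ]
    print(bm25_scores(["cat"], corpus))
```

Only the first document contains the query term, so it is the only one that receives a non-zero score. Setting b=0 disables length normalization entirely, while b=1 normalizes term frequency fully by relative document length.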
BM25 has been widely adopted in various search engines and information retrieval systems due to its