BM25 - Infinite Lexicon

BM25

BM25 (Best Matching 25) is a ranking function used in information retrieval to estimate how relevant a document D is to a user query Q. It is often called Okapi BM25, after the Okapi retrieval system in which it was first implemented. It originated from the probabilistic retrieval framework and took its familiar form in the 1990s through work by Robertson, Walker, and others at City University, London. BM25 has become a standard baseline in text search because it is effective and simple to implement.

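Core idea: for each query term t, BM25 weighs the term's presence in a document by its inverse document frequency (IDF), so that rare terms count for more, and by a term-frequency component that saturates as the term repeats and is normalized by document length. The document's score is the sum of these per-term weights.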
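One ingredient of the per-term weight is a term-frequency component that saturates: repeated occurrences of a term add progressively less to the score. The following is a minimal Python sketch of that behavior, assuming the commonly used form f * (k1 + 1) / (f + k1 * norm), where norm stands for the length-normalization factor 1 - b + b * |D| / avgdl. The function name tf_component and the default k1 = 1.2 are illustrative choices, not taken from any particular library.

```python
def tf_component(f, k1=1.2, norm=1.0):
    """Saturating term-frequency weight.

    f    : number of times the term occurs in the document
    k1   : saturation parameter (assumed default, chosen for illustration)
    norm : length-normalization factor; 1.0 means an average-length document
    """
    return f * (k1 + 1) / (f + k1 * norm)


# The weight grows with f but stays below the ceiling k1 + 1.
for f in (1, 2, 5, 20, 100):
    print(f, round(tf_component(f), 3))
```

Under these assumptions, a document that mentions a term 100 times receives only a little more than twice the weight of one that mentions it once, which is what keeps keyword repetition from dominating the ranking.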

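Definitions: f(t,D) is the term frequency of t in D; |D| is the document length, counted in terms; avgdl is the average document length across the collection; k1 and b are free parameters that control term-frequency saturation and the strength of length normalization, respectively.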

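Variants and usage: BM25F extends the model to structured documents by combining term frequencies from multiple fields, such as title and body; BM25+ and BM25L adjust the term-frequency component so that very long documents are not penalized too heavily. BM25 is also the default similarity function in search engines such as Apache Lucene and Elasticsearch.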

Limitations: while effective, BM25 is a bag-of-words model and does not capture term dependencies or semantics. A document that uses only synonyms of the query terms receives a score of zero.

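Scoring function: score(D, Q) = sum over query terms t in Q of IDF(t) * f(t,D) * (k1 + 1) / (f(t,D) + k1 * (1 - b + b * |D| / avgdl)).

Inverse document frequency: with N the number of documents in the collection and n(t) the number of documents containing t, IDF(t) = ln((N - n(t) + 0.5) / (n(t) + 0.5)). This form is negative for terms that occur in more than half of the documents, so some implementations add 1 inside the logarithm.

Parameters: k1 and b are free parameters, commonly set to k1 ~ 1.2 to 2.0 and b ~ 0.75. Setting b = 0 turns length normalization off, and b = 1 applies it fully.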
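The scoring procedure can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes documents and queries are already tokenized into lists of terms, it uses the IDF variant with 1 added inside the logarithm so that weights are never negative, and it takes k1 = 1.2 and b = 0.75 as defaults. The function name bm25_scores and the toy corpus are made up for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenized document for a tokenized query."""
    N = len(docs)                              # number of documents
    avgdl = sum(len(d) for d in docs) / N      # average document length
    # n[t]: number of documents that contain term t at least once.
    n = Counter(t for d in docs for t in set(d))

    scores = []
    for d in docs:
        f = Counter(d)                         # f[t]: term frequency in d
        norm = 1 - b + b * len(d) / avgdl      # length normalization
        score = 0.0
        for t in query:
            if f[t] == 0:
                continue                       # absent terms contribute nothing
            # Assumption: "+ 1" IDF variant, which cannot go negative.
            idf = math.log((N - n[t] + 0.5) / (n[t] + 0.5) + 1)
            score += idf * f[t] * (k1 + 1) / (f[t] + k1 * norm)
        scores.append(score)
    return scores


# Hypothetical three-document corpus, tokenized on whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat around the yard".split(),
    "stock markets fell sharply today".split(),
]
scores = bm25_scores("cat mat".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 3) for s in scores])
```

In this toy corpus the first document ranks highest because it contains both query terms and is shorter than average, while the third document scores zero. A query such as "feline" would score zero for every document, since the sketch matches exact terms only; a production system would typically also apply lowercasing, stemming, and stop-word handling before scoring.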