BM25

BM25 (Best Matching 25) is a ranking function used in information retrieval to estimate how relevant a document D is to a user query Q. It is often called Okapi BM25, after the Okapi retrieval system in which it was first implemented, and it grew out of the probabilistic retrieval framework through work in the 1990s by Robertson, Walker, and others at City University, London. BM25 has become a standard baseline in text search because it is effective and simple.

Core idea: for each query term t, BM25 weighs the term's presence in a document by its frequency, its rarity across the collection, and the document's length. The standard score for D and Q is: score(D,Q) = sum_{t in Q} IDF(t) * (f(t,D) * (k1+1)) / (f(t,D) + k1*(1 - b + b*|D|/avgdl)).

Definitions: f(t,D) is the term frequency in D; |D| is the document length; avgdl is the average document length in the collection; N is the total number of documents; n(t) is the number of documents containing term t; IDF(t) = log((N - n(t) + 0.5) / (n(t) + 0.5)). k1 and b are free parameters that control term frequency saturation and length normalization; typical values are k1 ~ 1.2–2.0 and b ~ 0.75, where b ranges from 0 (no length normalization) to 1 (full normalization).

Variants and usage: BM25F extends the model to structured documents by incorporating multiple fields; BM25+ and BM25L are adjusted variants that correct BM25's tendency to over-penalize very long documents. BM25 is widely used as a strong, parameterizable baseline in search engines and libraries such as Lucene and Elasticsearch, often serving as the default ranking component. It remains a standard reference for evaluating retrieval quality and for educational purposes.

Limitations: while effective, BM25 is a bag-of-words model: it does not capture term dependencies or semantics, and it performs no query expansion by itself. It relies on well-tuned parameters and representative document length statistics, and may be outperformed by newer neural or hybrid approaches on some tasks.
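
Example: the scoring formula and IDF definition given in this article can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by Lucene or Elasticsearch. The function name bm25_scores, the whitespace tokenization, the toy corpus, and the defaults k1 = 1.2 and b = 0.75 are assumptions made for the example, and the corpus is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document.

    `query` is a list of terms and `docs` is a list of token lists.
    The name, signature, and defaults are illustrative assumptions.
    """
    N = len(docs)                          # total number of documents
    avgdl = sum(len(d) for d in docs) / N  # average document length

    # n(t): number of documents that contain term t at least once
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))

    def idf(t):
        n_t = doc_freq[t]
        return math.log((N - n_t + 0.5) / (n_t + 0.5))

    scores = []
    for d in docs:
        tf = Counter(d)  # f(t, D) for every term in this document
        # Length-dependent part of the denominator: k1*(1 - b + b*|D|/avgdl)
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for t in query:
            f = tf[t]
            if f:  # terms absent from the document contribute nothing
                score += idf(t) * f * (k1 + 1) / (f + norm)
        scores.append(score)
    return scores


# Toy corpus, tokenized by whitespace (an assumption for this example).
corpus = [
    "the cat sat on the mat".split(),
    "dogs chase cats".split(),
    "a quick brown fox".split(),
    "the lazy dog sleeps".split(),
    "birds sing at dawn".split(),
]
ranking = bm25_scores(["cat"], corpus)
best = max(range(len(corpus)), key=lambda i: ranking[i])
print(best, ranking)
```

Only the first document contains the exact token "cat", so it is the only one with a non-zero score; "cats" in the second document does not match because the sketch does no stemming. Note that with this IDF definition a term occurring in more than half of the documents receives a negative weight, which some implementations avoid by adjusting the logarithm.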