BM25L

BM25L is a variant of the Okapi BM25 ranking function used in information retrieval. It was introduced to address a limitation of BM25 on collections with widely varying document lengths: BM25's document-length normalization tends to over-penalize very long documents. BM25L keeps the core structure of BM25, including the inverse document frequency factor and the term-frequency saturation and length normalization governed by k1 and b, but adds a shift to the length-normalized term frequency, controlled by a parameter commonly denoted δ (delta). The shift changes how the term-frequency component scales with document length, reducing the bias toward short documents while avoiding excessive penalization of long ones.
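
As a reference point, the scoring function can be written out as follows. This is the formulation commonly given for BM25L. The symbol names are assumptions chosen for this sketch: tf(t, D) is the raw count of term t in document D, |D| is the document length, avgdl is the average document length in the collection, N is the number of documents, df(t) is the number of documents containing t, and δ is the shift parameter described above. The query-term-frequency factor of full BM25 is omitted for brevity.

```latex
\[
c'(t, D) = \frac{\mathrm{tf}(t, D)}{1 - b + b \, \dfrac{|D|}{\mathrm{avgdl}}}
\]
\[
\mathrm{score}(Q, D) = \sum_{t \in Q \cap D} \mathrm{idf}(t) \cdot
  \frac{(k_1 + 1)\,\bigl(c'(t, D) + \delta\bigr)}{k_1 + c'(t, D) + \delta},
\qquad
\mathrm{idf}(t) = \ln \frac{N + 1}{\mathrm{df}(t) + 0.5}
\]
```

Because c'(t, D) approaches 0 as |D| grows, the per-term factor for a matching term never falls below (k1 + 1)δ / (k1 + δ). That lower bound is what keeps very long documents from being penalized excessively.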

In practice, BM25L computes a score for a query by summing contributions from query terms, with the term-frequency contribution being adjusted by the length-aware term to reflect document length more smoothly. Compared with BM25, BM25L often yields improvements on corpora with wide variation in document length and when long documents contain many query terms. The method is simple to implement and can be incorporated into existing BM25-based ranking pipelines with minimal changes, using the same tokenization and indexing infrastructure.

BM25L is part of the family of BM25 variants that includes BM25+, BM25F and others. It is evaluated in information retrieval research and used as a strong baseline for experiments involving document-length effects.
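
To illustrate how little is needed beyond a plain BM25 scorer, here is a minimal sketch of BM25L scoring over an in-memory, pre-tokenized corpus. The function name `bm25l_scores`, the default values k1=1.5, b=0.75 and delta=0.5, and the idf form log((N + 1) / (df + 0.5)) are assumptions made for this example rather than a fixed standard. Repeated query terms are counted once, which is a simplification.

```python
import math
from collections import Counter


def bm25l_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75, delta=0.5):
    """Return one BM25L score per document in corpus_tokens.

    query_tokens: list of query terms (already tokenized).
    corpus_tokens: list of documents, each a list of tokens.
    k1, b, delta: illustrative defaults, not tuned values.
    """
    n_docs = len(corpus_tokens)
    if n_docs == 0:
        return []

    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = (sum(len(doc) for doc in corpus_tokens) / n_docs) or 1.0

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        # Same length normalizer as in BM25.
        norm = 1.0 - b + b * len(doc) / avgdl
        score = 0.0
        for term in set(query_tokens):
            if tf[term] == 0:
                continue  # only terms present in the document contribute
            idf = math.log((n_docs + 1) / (df[term] + 0.5))
            c = tf[term] / norm  # length-normalized term frequency
            # The delta shift is the only change relative to the BM25 tf part.
            score += idf * (k1 + 1) * (c + delta) / (k1 + c + delta)
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["bm25", "ranking", "function"],
        ["long", "document"] * 50 + ["bm25"],
        ["unrelated", "text"],
    ]
    print(bm25l_scores(["bm25"], corpus))
```

Setting `delta=0` in this sketch reduces the term-frequency part to the usual BM25 form, which is one way to compare the two scorers on the same tokenized corpus.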