Query likelihood model

The query likelihood model is a probabilistic information retrieval approach that ranks documents by the likelihood that a document's language model would generate the user's query. In this framework, every document D is associated with a language model P(·|D) over terms, and the query Q is treated as data drawn from that model. Documents are ranked by P(Q|D), often computed as the product of term probabilities across the query, P(Q|D) = ∏_{w∈Q} P(w|D), or, equivalently, by summing the log probabilities of query terms.
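As a minimal sketch of the ranking rule (the `score` helper and its tokenization are illustrative, not from the source), summing log term probabilities under unsmoothed maximum-likelihood estimates might look like:

```python
import math
from collections import Counter

def score(query_terms, doc_terms):
    """log P(Q|D) with maximum-likelihood estimates P(w|D) = tf(w,D) / |D|.

    Without smoothing, a query term absent from the document contributes
    log 0 = -inf, zeroing out the whole document's score.
    """
    tf = Counter(doc_terms)
    n = len(doc_terms)
    return sum(
        math.log(tf[w] / n) if tf[w] else float("-inf")
        for w in query_terms
    )

doc = "the quick brown fox jumps over the lazy dog".split()
score("quick fox".split(), doc)  # 2 * log(1/9), both terms occur once in 9 tokens
score("quick cat".split(), doc)  # -inf: "cat" never occurs in the document
```

The -inf result for any unseen term is exactly the sparsity problem that motivates smoothing.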

Because a document's language model is typically sparse, smoothing is used to combine the document-specific distribution with a background collection model P(w|C). Common smoothing methods include Dirichlet prior and Jelinek–Mercer smoothing. In Dirichlet smoothing, P(w|D) = (tf(w,D) + μ P(w|C)) / (|D| + μ), where tf(w,D) is the term frequency in the document, |D| is the document length, μ is a parameter, and P(w|C) is the term's probability in the collection. Jelinek–Mercer smoothing blends the document and collection models as P(w|D) = (1 − λ) P_ml(w|D) + λ P(w|C), where P_ml(w|D) = tf(w,D) / |D| is the maximum-likelihood estimate. The collection model is typically estimated from the entire corpus.

Practical use involves computing P(w|D) for each query term, assembling P(Q|D), and ranking documents by this likelihood (often using log probabilities for numerical stability).
Strengths include good performance on short queries and robustness to unseen terms due to smoothing.
The approach was introduced as a language-modeling method for information retrieval by Ponte and Croft, and has become a foundational technique in IR research and applications.