
f-divergences

F-divergences, or f-divergences, are a family of statistical distances between probability measures. Let (X, F) be a measurable space and P, Q probability measures on it with P absolutely continuous with respect to Q. Let f: (0, ∞) → R be a convex function satisfying f(1) = 0. The f-divergence of P from Q is defined by D_f(P || Q) = ∫ f(dP/dQ) dQ = E_Q[f(dP/dQ)]. In the discrete case, with probability mass functions p and q, D_f(P||Q) = ∑ q(x) f(p(x)/q(x)). If P is not absolutely continuous with respect to Q, the definition is extended using the limit f′(∞) = lim f(t)/t as t → ∞; in particular, D_f(P||Q) is infinite whenever f′(∞) = ∞, as for the Kullback–Leibler and chi-squared divergences below.
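
The discrete formula lends itself to a direct implementation. Below is a minimal Python sketch under stated assumptions: the helper name f_divergence and its interface are illustrative, not taken from any particular library.

    import numpy as np

    def f_divergence(p, q, f):
        """Return sum_x q(x) * f(p(x)/q(x)) for discrete distributions p and q."""
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        # If P puts mass where Q does not, P is not absolutely continuous w.r.t. Q.
        # This sketch returns infinity in that case, which is exact for the
        # superlinear generators used in the examples below (t log t, (t - 1)^2).
        if np.any((q == 0) & (p > 0)):
            return np.inf
        mask = q > 0
        ratio = p[mask] / q[mask]
        # Generators such as t log t assume p(x) > 0 wherever q(x) > 0; adopt the
        # 0 log 0 = 0 convention inside f itself if that is not the case.
        return float(np.sum(q[mask] * f(ratio)))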

Common choices of f yield well-known divergences. For example, f(t) = t log t gives the Kullback–Leibler divergence D_KL(P||Q); f(t) = (t−1)^2 yields the Pearson chi-squared divergence; and f(t) = (√t − 1)^2 yields the squared Hellinger distance. The f-divergence framework thus encompasses a broad class of measures, obtained by selecting different convex functions f.
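
These three generators can be plugged into the illustrative f_divergence sketch above; the example distributions here are arbitrary and chosen with strictly positive entries.

    import numpy as np

    kl           = lambda t: t * np.log(t)            # Kullback–Leibler: f(t) = t log t
    pearson_chi2 = lambda t: (t - 1.0) ** 2           # Pearson chi-squared: f(t) = (t - 1)^2
    sq_hellinger = lambda t: (np.sqrt(t) - 1.0) ** 2  # squared Hellinger: f(t) = (sqrt(t) - 1)^2

    p = [0.2, 0.5, 0.3]
    q = [0.1, 0.6, 0.3]

    print(f_divergence(p, q, kl))            # ~0.047, equals sum_x p(x) log(p(x)/q(x))
    print(f_divergence(p, q, pearson_chi2))  # ~0.117, equals sum_x (p(x) - q(x))^2 / q(x)
    print(f_divergence(p, q, sq_hellinger))  # ~0.022, equals sum_x (sqrt(p(x)) - sqrt(q(x)))^2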

Key properties include non-negativity and identity of indiscernibles: D_f(P||Q) ≥ 0, with equality if and only if P = Q almost everywhere on the support of Q (provided f is strictly convex at 1). For fixed Q, the map P ↦ D_f(P||Q) is convex. A data-processing inequality holds: for any Markov kernel T, D_f(P_T || Q_T) ≤ D_f(P || Q), reflecting the contractive nature of information under processing. The divergence is in fact jointly convex in the pair (P, Q).
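
The data-processing inequality can be checked numerically on a finite space, where a Markov kernel is just a row-stochastic matrix. The snippet below reuses the illustrative f_divergence helper and kl generator from the sketches above; the kernel T is an arbitrary random choice.

    import numpy as np

    p = np.array([0.2, 0.5, 0.3])
    q = np.array([0.1, 0.6, 0.3])

    # A Markov kernel on a finite space: row i is the conditional distribution
    # of the output given input i, so every row sums to 1.
    T = np.random.default_rng(0).dirichlet(np.ones(4), size=3)  # shape (3, 4)

    p_T = p @ T  # pushforward of P through the channel
    q_T = q @ T  # pushforward of Q through the channel

    # Processing can only lose information: D_f(P_T || Q_T) <= D_f(P || Q).
    assert f_divergence(p_T, q_T, kl) <= f_divergence(p, q, kl)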

Applications span statistics, information theory, and machine learning. F-divergences are used to quantify model discrepancies and appear throughout robust estimation, hypothesis testing, and variational inference. The choice of f affects sensitivity to tails, outliers, and computational properties, making f-divergences a versatile tool for comparing distributions.

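To make the point about tail and outlier sensitivity concrete, the comparison below (same illustrative helpers as above) evaluates the three divergences when P places visible mass on an outcome that Q treats as nearly impossible.

    import numpy as np

    p = np.array([0.10, 0.45, 0.45])
    q = np.array([1e-4, 0.50, 0.4999])

    print(f_divergence(p, q, kl))            # ~0.60: the KL term p log(p/q) grows only logarithmically in the ratio
    print(f_divergence(p, q, pearson_chi2))  # ~99.8: the chi-squared term (p - q)^2 / q blows up as q(x) -> 0
    print(f_divergence(p, q, sq_hellinger))  # ~0.096: the squared Hellinger distance is bounded above by 2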