
Cross-entropy

Cross-entropy is a measure of the difference between two probability distributions. In information theory, the cross-entropy H(P, Q) between a true distribution P and a predicted distribution Q is defined as H(P, Q) = - sum_x P(x) log Q(x). The logarithm can be base e (nats) or base 2 (bits). A related quantity, the entropy H(P), depends only on P.
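
As a quick illustration of the definition, here is a minimal sketch (assuming NumPy as the only dependency, with made-up example distributions) that computes H(P, Q) for two discrete distributions, in nats or bits.

```python
import numpy as np

def cross_entropy(p, q, base=np.e):
    """Cross-entropy H(P, Q) = -sum_x P(x) log Q(x) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Terms where P(x) == 0 contribute nothing, regardless of Q(x).
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask])) / np.log(base)

p = [0.5, 0.25, 0.25]   # "true" distribution P
q = [0.4, 0.4, 0.2]     # "predicted" distribution Q

print(cross_entropy(p, q))          # in nats (natural log)
print(cross_entropy(p, q, base=2))  # in bits (log base 2)
```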

In machine learning, cross-entropy is commonly used as a loss function to train probabilistic models. It is equivalent to the negative log-likelihood of the data under the model's predicted distribution, so minimizing cross-entropy corresponds to maximizing the likelihood of the observed labels. Ground-truth labels are often represented as a distribution P, while the model outputs a distribution Q over classes (for example, via a softmax layer).
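
A minimal sketch of this equivalence (plain NumPy; the logit values and class index are hypothetical): the cross-entropy of a one-hot label against a softmax output equals the negative log-probability the model assigns to the observed class.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])       # hypothetical model outputs for 3 classes
true_class = 0

q = softmax(logits)                       # predicted distribution Q over classes
p = np.zeros_like(q)                      # one-hot "distribution" P
p[true_class] = 1.0

cross_entropy = -np.sum(p * np.log(q))
neg_log_likelihood = -np.log(q[true_class])

print(cross_entropy, neg_log_likelihood)  # identical values
```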

There are two widely used forms. Binary cross-entropy applies to binary or multi-label classification and, for a single example, is -[y log p + (1 - y) log(1 - p)], where y is the true label and p is the predicted probability. Categorical cross-entropy applies to multi-class, single-label classification and, for an example, is - sum_i y_i log p_i, with y_i as the one-hot true label and p_i as the model's predicted probability for class i. In multi-label tasks, binary cross-entropy is typically used per class with independent outputs.
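
The two forms written out directly, as a sketch in plain NumPy (the labels and probabilities below are made up; averaging over examples and clipping to avoid log(0) are common conventions, not part of the definition):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """-[y log p + (1 - y) log(1 - p)], averaged over examples."""
    p = np.clip(p, eps, 1 - eps)          # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_cross_entropy(y_onehot, p, eps=1e-12):
    """-sum_i y_i log p_i, averaged over examples."""
    p = np.clip(p, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=-1))

# Binary / multi-label: one probability per (example, label).
y_bin = np.array([1.0, 0.0, 1.0])
p_bin = np.array([0.9, 0.2, 0.6])
print(binary_cross_entropy(y_bin, p_bin))

# Multi-class, single-label: one distribution over classes per example.
y_cat = np.array([[1, 0, 0], [0, 0, 1]], dtype=float)   # one-hot labels
p_cat = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
print(categorical_cross_entropy(y_cat, p_cat))
```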

Cross-entropy is related to KL divergence by H(P, Q) = H(P) + KL(P || Q); since the entropy H(P) is a constant with respect to the model, minimizing cross-entropy thus reduces the divergence between the true and predicted distributions. It is differentiable and widely used with gradient-based optimization, with practical considerations including numerical stability, label smoothing, and class weighting. Cross-entropy is not a true metric: it is not symmetric in P and Q, and H(P, P) equals the entropy H(P) rather than zero.
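
The decomposition can be checked numerically; the sketch below (NumPy only, with arbitrary example distributions) also shows label smoothing in its common form, mixing a one-hot target with the uniform distribution.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    return -np.sum(p[p > 0] * np.log(p[p > 0]))

def cross_entropy(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.25, 0.25])   # arbitrary "true" distribution
q = np.array([0.4, 0.4, 0.2])     # arbitrary "predicted" distribution

# H(P, Q) = H(P) + KL(P || Q): both prints give the same value.
print(cross_entropy(p, q))
print(entropy(p) + kl_divergence(p, q))

# Label smoothing: replace a one-hot target with a slightly softened one.
one_hot = np.array([1.0, 0.0, 0.0])
alpha = 0.1
smoothed = (1 - alpha) * one_hot + alpha / len(one_hot)
print(cross_entropy(smoothed, q))
```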