Cross-entropy

Cross-entropy (Dutch: kruisentropie; German: Kreuzentropie) is a measure used to quantify the difference between two probability distributions. In information theory and statistics it is denoted H(P, Q), where P is the true distribution and Q an approximating distribution. For discrete distributions over a finite set X, H(P, Q) = - sum_{x in X} P(x) log Q(x). For continuous distributions, it is the corresponding integral. The base of the logarithm determines the units: natural log yields nats, base 2 yields bits.

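As a concrete illustration of the discrete formula, a minimal sketch in plain Python (the function name and example distributions are hypothetical, not taken from any particular library):

```python
import math

def cross_entropy(p, q):
    """H(P, Q) for two discrete distributions given as probability lists
    over the same finite set, using natural log (nats) and 0*log(0) = 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# True distribution P and an approximation Q over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(cross_entropy(p, q))  # H(P, Q) in nats
print(cross_entropy(p, p))  # equals the entropy H(P); H(P, Q) is never smaller
```
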
In supervised learning, cross-entropy serves as a loss function. If the true labels are represented as a distribution P (often a one-hot encoding) and the model predicts Q, the cross-entropy equals the negative log-likelihood of the observed labels under the model. Minimizing H(P, Q) thus encourages the predicted probabilities to align with observed frequencies.

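To make the negative log-likelihood reading explicit: with a one-hot P, every term of the sum vanishes except the one for the observed class, so the per-example loss is simply -log Q(observed class). A minimal sketch (names hypothetical):

```python
import math

def per_example_loss(probs, label_index):
    """With a one-hot true distribution, H(P, Q) reduces to the negative
    log-probability the model assigns to the observed class."""
    return -math.log(probs[label_index])

q = [0.1, 0.7, 0.2]            # model's predicted distribution over 3 classes
print(per_example_loss(q, 1))  # -log(0.7) ≈ 0.357 nats
```
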
Relation to entropy and KL divergence: The cross-entropy decomposes into H(P, Q) = H(P) + D_KL(P || Q). Since H(P) is fixed with respect to Q, minimizing cross-entropy is equivalent to minimizing the Kullback–Leibler divergence D_KL(P || Q).

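The decomposition is easy to verify numerically; a sketch under the same natural-log convention as above (helper names hypothetical):

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
# The two quantities agree up to floating-point error.
print(cross_entropy(p, q))
print(entropy(p) + kl_divergence(p, q))
```
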
Variants and practical considerations: Binary cross-entropy applies to binary classification; categorical cross-entropy to multi-class classification; sparse categorical cross-entropy is used when labels are given as integer class indices rather than one-hot vectors. In neural networks, it is common to pair cross-entropy with a softmax output layer for multi-class problems. For numerical stability, log-softmax and small epsilon terms are used to prevent log(0).

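One common way to get that stability is to compute the loss directly from raw logits via log-softmax rather than taking the log of softmax probabilities. A sketch of the idea (function names are illustrative; deep learning frameworks provide fused, batched versions of this):

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax: subtracting the max keeps exp() from
    overflowing and avoids taking the log of a value rounded to 0."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

def cross_entropy_from_logits(logits, label_index):
    """Sparse categorical cross-entropy: integer label, raw logits."""
    return -log_softmax(logits)[label_index]

print(cross_entropy_from_logits([2.0, -1.0, 0.5], 0))
```
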
Applications: training of classifiers, language models, and other probabilistic models. Cross-entropy provides a principled link to maximum likelihood estimation and is favored for its smooth gradient properties.

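On the gradient point: with a softmax output and a one-hot target, the gradient of the loss with respect to the logits works out to Q - P, which is bounded and smooth. A small sketch checking that against finite differences (all names hypothetical):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss(logits, label_index):
    return -math.log(softmax(logits)[label_index])

logits, label = [2.0, -1.0, 0.5], 0
q = softmax(logits)
# Analytic gradient of the loss w.r.t. each logit: q_i - p_i with one-hot p.
analytic = [qi - (1.0 if i == label else 0.0) for i, qi in enumerate(q)]
# Forward-difference check of the same gradient.
eps = 1e-6
numeric = [(loss([z + (eps if i == j else 0.0) for j, z in enumerate(logits)], label)
            - loss(logits, label)) / eps
           for i in range(len(logits))]
print(analytic)
print(numeric)
```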