Perplexity

Perplexity is a statistical measure used in information theory and natural language processing to evaluate how well a probability model predicts a sample. In the context of language modeling, it assesses how surprised a model is by a test set of words.

Formally, for a test sequence of N words W = w_1, w_2, ..., w_N, and a model that assigns probabilities P(w_i | w_1, ..., w_{i-1}) to each word, the perplexity is defined as

PP(W) = P(W)^{-1/N} = exp( -(1/N) * sum_{i=1}^{N} log P(w_i | w_1, ..., w_{i-1}) )

The base of the logarithm sets the units of the underlying average log-loss: with natural logarithms the outer function is exp, while with base-2 logarithms (bits) it is 2 raised to that average; the resulting perplexity is the same either way. A lower perplexity indicates that the model assigns higher average probability to the observed sequence and thus predicts it more accurately.

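As a concrete illustration, here is a minimal Python sketch of that formula. The function name and the per-token probabilities are hypothetical, chosen only so the arithmetic comes out to a round number; a real evaluation would take the conditional probabilities from a trained model.

import math

def perplexity(token_probs):
    # token_probs[i] is the model's conditional probability P(w_i | w_1, ..., w_{i-1}).
    n = len(token_probs)
    # Average negative log-probability in nats, then exponentiate (matches the formula above).
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log_prob)

# Hypothetical probabilities a model might assign to a 4-word test sequence.
probs = [0.25, 0.125, 0.0625, 0.125]
print(perplexity(probs))  # ~8.0: on average the model is as uncertain as a uniform choice among 8 words
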
Perplexity is closely related to cross-entropy and entropy. Specifically, perplexity equals exp(H), where H is the cross-entropy between the true distribution and the model measured in natural units; with base-2 logarithms it is 2^{H_2}, where H_2 is the same cross-entropy measured in bits. Perplexity is not itself a probability, but a transformed measure of predictive uncertainty.

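To see that equivalence numerically, the short sketch below reuses the hypothetical probabilities from the earlier example: it estimates the cross-entropy once in nats and once in bits, and exponentiating with exp or with 2 yields the same perplexity.

import math

# Hypothetical per-token probabilities, as in the earlier sketch.
probs = [0.25, 0.125, 0.0625, 0.125]

h_nats = -sum(math.log(p) for p in probs) / len(probs)   # cross-entropy estimate in nats
h_bits = -sum(math.log2(p) for p in probs) / len(probs)  # the same estimate in bits

print(math.exp(h_nats))  # ~8.0
print(2 ** h_bits)       # ~8.0, since H_bits = H_nats / ln 2 and 2**(H_nats / ln 2) == exp(H_nats)
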
Practical considerations include its dependence on vocabulary size and smoothing methods. Large vocabularies can inflate perplexity even for strong models, and perplexity can be misleading if the test set is not representative of the intended domain. It remains a standard benchmark for comparing language models and guiding development, though it should be interpreted alongside qualitative assessments of text generation.