Surprisal

Surprisal, also known as self-information, is a measure of how unexpected a particular outcome is. For an event with probability p, the surprisal is I(p) = -log_b p, where log_b denotes the logarithm with base b. The unit is bits if base 2 is used, nats for base e, and so on. Surprisal is nonnegative and becomes larger as p decreases; it is zero when p = 1 and grows without bound as p approaches zero.
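As an illustration of the definition above, here is a minimal Python sketch. The function name `surprisal` and its default base of 2 are choices made for this example, not part of any standard library.

```python
import math

def surprisal(p: float, base: float = 2.0) -> float:
    """Return I(p) = -log_base(p) for a probability p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    # Adding 0.0 turns the -0.0 produced for p == 1 into a plain 0.0.
    return -math.log(p, base) + 0.0

# A fair coin flip carries one bit; rarer outcomes carry more.
for p in (1.0, 0.5, 0.25, 0.01):
    print(f"p = {p:<5} surprisal = {surprisal(p):.3f} bits")
```

Passing `base=math.e` gives the result in nats instead of bits.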

For a random variable X with distribution p(x), the average surprisal, called entropy, is H(X) = E[I(X)] = -sum_x p(x) log_b p(x), where the sum runs over all outcomes x of X.
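The expectation can be computed directly as a probability-weighted sum of surprisals. The sketch below assumes a finite distribution given as a list of probabilities and uses the usual convention that outcomes with probability zero contribute nothing to the sum. The name `entropy` is illustrative.

```python
import math

def entropy(probs, base: float = 2.0) -> float:
    """Return H = sum of p * (-log_base p) over a finite distribution."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be nonnegative and sum to 1")
    # Skip zero-probability outcomes: the convention 0 * log 0 = 0.
    return sum(-p * math.log(p, base) for p in probs if p > 0) + 0.0

print(entropy([0.5, 0.5]))   # fair coin
print(entropy([0.9, 0.1]))   # biased coin, lower average surprisal
print(entropy([0.25] * 4))   # uniform over four outcomes
```

A uniform distribution maximizes the average surprisal for a given number of outcomes, while a distribution concentrated on a single outcome has entropy zero.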

In conditional form, the surprisal of an event given a context is I(x|context) = -log_b p(x|context).
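To make the conditional form concrete, the following sketch estimates p(x|context) from bigram counts, taking the context to be the single preceding token. This estimator, the function name `conditional_surprisal`, and the toy token sequence are all assumptions made for illustration; any model that supplies p(x|context) could be used instead.

```python
import math
from collections import Counter

def conditional_surprisal(tokens, context, word, base: float = 2.0) -> float:
    """Return -log_base p(word | context), with p estimated from bigram counts."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Number of times `context` is followed by any token at all.
    context_total = sum(c for (prev, _), c in pair_counts.items() if prev == context)
    pair_total = pair_counts[(context, word)]
    if context_total == 0 or pair_total == 0:
        raise ValueError("unseen context or pair: probability estimate is zero")
    return -math.log(pair_total / context_total, base)

# Hypothetical toy corpus, invented for this example.
tokens = "the cat sat on the mat the cat ran".split()
print(conditional_surprisal(tokens, "the", "cat"))  # "cat" follows "the" 2 times out of 3
print(conditional_surprisal(tokens, "the", "mat"))  # "mat" follows "the" 1 time out of 3
```

The continuation that is more frequent after the context receives the lower surprisal. Unseen pairs raise an error here rather than returning infinity; a smoothed estimator would be one way to handle them.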

Applications include data compression, coding theory, and psycholinguistics. In psycholinguistics, surprisal is widely used to quantify how expected a word is given its context.
