Zipf's law

Zipf's law is an empirical principle describing a relationship between the frequency of items in a dataset and their rank when the items are sorted by frequency. In its simplest form, the frequency f(r) of the item with rank r is proportional to 1/r^s, where the exponent s is close to 1 in many natural languages. When s equals 1, the product r·f(r) remains approximately constant across a wide range of ranks. The law was formulated by American linguist George Kingsley Zipf in the 1930s based on studies of word frequencies, and it has since been observed in a variety of domains beyond language.
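The idealized relationship can be sketched in a few lines. With s = 1, the predicted frequency is f(r) = C/r for some constant C, so every product r·f(r) equals C. The constant and the rank range below are illustrative choices, not values from the text.

```python
# A minimal sketch of the idealized Zipf relationship with exponent s = 1:
# f(r) = C / r, so r * f(r) = C for every rank r.
# C is an arbitrary illustrative constant (e.g. the count of the top item).

C = 1000.0
s = 1.0

def zipf_freq(rank, C=C, s=s):
    """Predicted frequency of the item at the given rank."""
    return C / rank ** s

# The product rank * frequency stays constant across ranks when s = 1.
products = [r * zipf_freq(r) for r in range(1, 11)]
print(products)
```

With s slightly above or below 1, the products drift down or up with rank instead of staying flat, which is one simple way to see how the exponent shapes the distribution.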

Zipf's law is most famous for language, where a small set of words accounts for a large fraction of usage while a long tail of words is rarely used. It has also been reported in city-size distributions (the rank-size rule), firm sizes, income distributions, and other phenomena.
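An empirical rank-frequency check is straightforward to run on any text: count word occurrences, sort by frequency, and inspect rank × frequency. The sample passage below is an illustrative assumption, not a real corpus, and on a sample this small the fit to Zipf's law is necessarily rough.

```python
from collections import Counter
import re

# A minimal sketch of an empirical rank-frequency table.
# The sample text is illustrative; real tests use large corpora.
sample = (
    "the quick brown fox jumps over the lazy dog and the dog barks at "
    "the fox while the fox runs over the hill and the dog sleeps"
)

# Count word frequencies and sort from most to least frequent.
counts = Counter(re.findall(r"[a-z]+", sample.lower()))
ranked = counts.most_common()

# Print rank, word, frequency, and the product rank * frequency,
# which Zipf's law predicts should be roughly constant.
for rank, (word, freq) in enumerate(ranked[:5], start=1):
    print(rank, word, freq, rank * freq)
```

On a large corpus, the same loop over the top few thousand ranks is the classic way the rank-size regularity is exhibited.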
Several refinements exist, such as the Zipf–Mandelbrot law, which introduces a second parameter q to better fit low-rank behavior: f(r) ∝ 1/(r+q)^s. The law is not exact; deviations occur at very high or very low ranks, and empirical tests often yield mixed results. Competing models such as lognormal, Pareto, or other heavy-tailed distributions can describe similar data, and the underlying mechanisms remain debated. Explanations range from cognitive efficiency and communicative constraints to preferential attachment and random growth processes.
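The effect of the Zipf–Mandelbrot offset can be seen by comparing the two forms directly at low ranks; the parameter values below are illustrative, not fitted to any dataset.

```python
# A minimal sketch comparing plain Zipf (f ∝ 1/r^s) with the
# Zipf-Mandelbrot refinement (f ∝ 1/(r+q)^s).
# s and q are illustrative values, not fitted parameters.

s = 1.0
q = 2.7

def zipf(r, s=s):
    return 1.0 / r ** s

def zipf_mandelbrot(r, s=s, q=q):
    return 1.0 / (r + q) ** s

for r in [1, 2, 5, 10, 100]:
    print(r, zipf(r), zipf_mandelbrot(r))

# Plain Zipf predicts rank 1 is exactly twice as frequent as rank 2;
# the offset q shrinks that ratio to (2 + q) / (1 + q), flattening the
# head of the distribution, which is where real word-frequency data
# most often departs from the plain 1/r form.
```

As q grows, the low-rank curve flattens further, while at high ranks (r ≫ q) the two forms converge, so the refinement mainly changes the head of the distribution.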