Home

LZ77

LZ77, short for Lempel-Ziv 1977, is a lossless data compression algorithm devised by Abraham Lempel and Jacob Ziv in 1977. It reduces redundancy by replacing repeated substrings with references to a single previously seen occurrence, rather than storing the text again. The core idea is to exploit repetition within a sliding window of previously processed data.

The encoder maintains a sliding window consisting of a search buffer (the dictionary of recent data) and

LZ77 is the basis for the Deflate compression format, used in zlib, gzip, and many software tools.

Characteristics of LZ77 include good performance on data with repeating substrings, with compression and speed influenced

a
look-ahead
buffer
(the
upcoming
data).
For
each
position,
it
searches
the
window
for
the
longest
match
of
the
look-ahead
content
and
emits
a
token
describing
the
match
as
a
pair
of
numbers:
an
offset
indicating
where
the
match
starts
in
the
window
and
a
length
indicating
how
many
characters
match.
If
a
next
symbol
follows
the
match,
it
may
be
emitted
as
part
of
the
token;
if
no
match
exists,
a
literal
symbol
is
output.
The
decoder,
using
the
same
window,
reconstructs
the
original
data
by
copying
the
matched
substring
from
the
already
reconstructed
data
and
appending
literals
when
encountered.
The
process
is
entirely
reversible.
Deflate
combines
LZ77-style
matching
with
Huffman
coding
to
create
compact
encodings.
Notable
derivatives
and
variations
include
LZSS,
which
modifies
token
representation,
and
high-speed
implementations
such
as
LZ4.
by
the
window
size
and
implementation.
It
is
inherently
lossless
and
can
be
memory-intensive,
and
its
effectiveness
diminishes
on
highly
random
data.
See
also
LZ78,
LZSS,
Deflate,
zlib,
PNG.