Home

LSMtree

An LSM-tree, or Log-Structured Merge-Tree, is a data structure and storage architecture designed to optimize write throughput for disk-based systems. It buffers incoming writes in memory and periodically flushes them to disk in sorted runs, avoiding random writes and enabling sequential disk I/O. The structure is particularly suited to workloads with high write rates and reads that can tolerate some extra lookup logic.

In an LSM-tree, data is first written to an in-memory data structure called a memtable. When the

Reads in an LSM-tree search the memtable first, then consult in-memory indexes and on-disk SSTables, often aided

LSM-trees are widely used in modern NoSQL databases and storage engines, including LevelDB, RocksDB, Apache Cassandra,

memtable
is
full,
its
contents
are
converted
into
an
immutable
on-disk
component
called
an
SSTable
(sorted
string
table)
and
written
sequentially.
On
disk,
these
SSTables
are
organized
into
multiple
levels
or
runs.
Periodic
compaction
merges
overlapping
SSTables
to
reduce
redundancy,
maintain
sorted
order,
and
reclaim
space
from
deleted
or
superseded
entries.
Two
common
compaction
strategies
are
levelled
(where
each
level
contains
strictly
larger
data
than
the
previous
one
and
data
is
moved
upward
as
needed)
and
tiered
(where
several
runs
may
exist
at
a
level
and
are
merged
less
aggressively).
by
bloom
filters
to
quickly
determine
whether
a
key
is
absent.
Caches
and
spatial
locality
improve
performance.
Trade-offs
include
higher
read
amplification
and
potential
write
amplification
due
to
compaction,
balanced
by
excellent
write
throughput
and
efficient
sequential
I/O.
and
HBase,
making
them
a
common
choice
for
large-scale
key-value
and
time-series
workloads.
The
concept
was
introduced
in
the
late
1990s
by
Patrick
O’Neil
and
collaborators.