SSTable

An SSTable, short for Sorted Strings Table, is a persistent, immutable file format used to store a mapping from keys to values in many log-structured merge-tree (LSM-tree) based storage systems. SSTables are created by flushing an in-memory table (memtable) to disk or by performing a compaction, and new SSTables accumulate over time as data is written. Each SSTable stores keys in strictly increasing order, enabling efficient lookups and range scans.
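Sorted, immutable storage is what makes point lookups and range scans cheap. The Python sketch below is illustrative only. It models an SSTable as a pair of in-memory tuples rather than a file, and the class name `SortedRun` and its methods are hypothetical, not part of any storage engine's API. The "flush" sorts the memtable once. After that, a point lookup is a binary search, and a range scan is a binary search for the start key followed by a sequential walk.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


class SortedRun:
    """Hypothetical in-memory stand-in for one SSTable (not a real file format)."""

    def __init__(self, memtable: Dict[str, str]) -> None:
        # "Flush": sort the memtable's entries once; the tuples are never modified.
        items = sorted(memtable.items())
        self._keys = tuple(k for k, _ in items)
        self._values = tuple(v for _, v in items)

    def get(self, key: str) -> Optional[str]:
        # Point lookup: binary search over the sorted keys, O(log n) comparisons.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: str, end: str) -> List[Tuple[str, str]]:
        # Range scan over [start, end): seek to the first key >= start,
        # then read forward while keys remain below end.
        out = []
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            out.append((self._keys[i], self._values[i]))
            i += 1
        return out


run = SortedRun({"cherry": "3", "apple": "1", "banana": "2", "date": "4"})
print(run.get("banana"))   # 2
print(run.get("fig"))      # None
print(run.scan("b", "d"))  # [('banana', '2'), ('cherry', '3')]
```

The range scan stops before "date" because "date" sorts after the exclusive upper bound "d".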

A typical SSTable has three main parts. Data blocks hold the actual key-value pairs. An index block records the first key (or key range) of each data block to guide reads. Often there is also a filter block (such as a Bloom filter) that can quickly show a key is absent. Many implementations also include a footer or metadata section with offsets to these blocks. The file is immutable after creation, so the system relies on combining multiple SSTables during reads and compactions.

For access, the storage engine uses the index block to identify the data block that may contain the target key. It reads that block and locates the key within it (typically via binary search). When a filter block is present, a negative filter result lets the engine skip reading the data block entirely. If the key is not found in one SSTable, the search proceeds to other SSTables managed by the LSM-tree. These are checked from newest to oldest, so the most recent version of a key takes precedence. Deletions are represented by tombstones, which hide older values of the key from reads as soon as they are written. The deleted data, and eventually the tombstone itself, is physically removed only during a subsequent compaction.

In an LSM-tree, writes go to an in-memory structure that is flushed to disk as SSTables. Over time, compaction merges SSTables and discards obsolete entries, balancing read and write amplification. SSTables are central to systems such as LevelDB, RocksDB, and Cassandra. In these systems, performance depends on factors like Bloom filter effectiveness, block size, and compaction strategy.
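The block layout and single-table read path described above can be sketched in Python. This is an illustration under stated assumptions, not the on-disk format of LevelDB, RocksDB, or Cassandra:

- The class names `BloomFilter` and `BlockSSTable` are hypothetical.
- Blocks hold a fixed number of entries rather than a byte budget, and everything lives in memory.
- The index keeps the first key of each block.
- The Bloom filter derives its bit positions from salted SHA-256 digests purely for simplicity.

A negative filter answer avoids touching any data block. Otherwise the index selects the single candidate block, which is then binary-searched.

```python
import hashlib
from bisect import bisect_left, bisect_right
from typing import Iterator, List, Optional, Tuple


class BloomFilter:
    """Hypothetical Bloom filter: k bit positions per key from salted SHA-256."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> p) & 1 for p in self._positions(key))


class BlockSSTable:
    """Hypothetical layout: data blocks, a sparse index, and a Bloom filter."""

    def __init__(self, items: List[Tuple[str, str]], block_size: int = 2) -> None:
        items = sorted(items)
        # Data blocks: consecutive runs of sorted key-value pairs.
        self.blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
        # Index block: the first key of each data block.
        self.index = [block[0][0] for block in self.blocks]
        # Filter block: every key in the table is added to the Bloom filter.
        self.filter = BloomFilter()
        for k, _ in items:
            self.filter.add(k)
        self.blocks_read = 0  # counts simulated data-block reads

    def get(self, key: str) -> Optional[str]:
        if not self.filter.might_contain(key):
            return None  # filter rules the key out; no data block is read
        # The candidate block is the last one whose first key is <= key.
        b = bisect_right(self.index, key) - 1
        if b < 0:
            return None  # key sorts before every key in the table
        block = self.blocks[b]
        self.blocks_read += 1
        keys = [k for k, _ in block]
        i = bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return block[i][1]
        return None


table = BlockSSTable([("a", "1"), ("c", "3"), ("e", "5"), ("g", "7"), ("i", "9")])
print(table.index)     # ['a', 'e', 'i']
print(table.get("e"))  # 5
print(table.get("b"))  # None
```

A lookup for a present key reads exactly one data block. A lookup for an absent key reads either no block (filter negative) or one block (false positive). The tiny `block_size` is only for demonstration; real engines size blocks in bytes, usually a few kilobytes.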
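Reads across several SSTables and compaction both follow the rule that the newest version of a key wins. The Python sketch below makes several simplifying assumptions:

- Each SSTable is an in-memory, key-sorted list of pairs.
- Deletions are marked with a hypothetical `TOMBSTONE` sentinel.
- The helpers `make_run`, `lookup`, and `compact` are hypothetical.
- The memtable, levels, and file I/O are ignored.

`compact` is a k-way merge built on the standard library's `heapq.merge`. For equal keys, the newest run's entry comes first and older duplicates are discarded. Tombstones are dropped only when the caller states that doing so is safe.

```python
import heapq
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

# Hypothetical sentinel: a value meaning "this key was deleted".
TOMBSTONE = object()

# One SSTable modeled as a key-sorted list of (key, value) pairs.
Run = List[Tuple[str, object]]


def make_run(entries: Dict[str, object]) -> Run:
    # Sort by key only, so values (including the sentinel) are never compared.
    return sorted(entries.items(), key=lambda kv: kv[0])


def lookup(runs_newest_first: List[Run], key: str) -> Optional[object]:
    # Check SSTables from newest to oldest and stop at the first match.
    for run in runs_newest_first:
        # Rebuilding the key list is for brevity; a real engine uses its index.
        keys = [k for k, _ in run]
        i = bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            value = run[i][1]
            # A tombstone shadows every older version: report "not found".
            return None if value is TOMBSTONE else value
    return None


def compact(runs_newest_first: List[Run], drop_tombstones: bool) -> Run:
    # Tag entries with their run's age (0 = newest) so that, for equal keys,
    # the merge yields the newest version first.
    tagged = [
        [(k, age, v) for k, v in run]
        for age, run in enumerate(runs_newest_first)
    ]
    merged: Run = []
    last_key: Optional[str] = None
    for k, _age, v in heapq.merge(*tagged, key=lambda e: (e[0], e[1])):
        if k == last_key:
            continue  # older, obsolete version of a key already handled
        last_key = k
        if v is TOMBSTONE and drop_tombstones:
            continue  # safe only if no SSTable outside this merge holds k
        merged.append((k, v))
    return merged


old = make_run({"a": "1", "b": "2", "c": "3"})
new = make_run({"b": TOMBSTONE, "c": "30", "d": "4"})
print(lookup([new, old], "b"))  # None (deleted by the tombstone)
print(lookup([new, old], "c"))  # 30
print(compact([new, old], drop_tombstones=True))
# [('a', '1'), ('c', '30'), ('d', '4')]
```

Deciding when a tombstone may be dropped is the delicate part. An older SSTable outside the compaction might still hold the key, and discarding the tombstone would then resurrect the deleted value. That is why the sketch leaves this decision to the caller.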