Home

hfile

HFile is a persistent file format used by Apache HBase to store the actual region data on the Hadoop Distributed File System (HDFS) or other compatible storage systems. It is designed as a single-store, immutable store that enables efficient random reads and scans of large datasets. An HFile contains a sequence of data blocks that store key-value pairs, where the keys are the HBase row keys concatenated with the column family and qualifier. Each data block is accompanied by an index block, enabling fast seeks. The file ends with a trailer that records metadata such as the version, compression codec, and the offset to the final index.

Data blocks may be compressed using supported codecs (such as Snappy, GZIP, or Zstandard). To speed up

HFile has evolved through several versions, with HFile V2 and V3 introducing enhancements to data and index

scans
and
reduce
disk
I/O,
HFile
files
can
include
Bloom
filters
and
optionally
an
in-memory
block
cache.
HFile
is
designed
to
be
appended
to
during
flushes
and
is
later
compacted
and
merged
by
HBase
compaction
processes;
once
written,
an
HFile
is
immutable.
layout,
compression,
and
read
performance.
The
HFile
library
provides
APIs
for
reading
and
writing
HFiles
and
is
used
by
HBase
region
servers
and
the
HFile-based
stores.
It
is
open-source
as
part
of
the
Apache
HBase
project.
See
also
HBase,
HDFS,
and
Bigtable-inspired
storage.