
BGZF

BGZF, short for Block GZip Format, is a gzip-compatible container format designed to provide random access to compressed data. It does this in two ways. First, it divides the compressed stream into independently decompressible blocks. Second, it records metadata in each block's gzip header: an extra field (the "BC" subfield) stores the block's total compressed size, so a reader can step from one block to the next without decompressing anything. Each BGZF block is a standard gzip member wrapping a DEFLATE-compressed payload, and the uncompressed data in a single block is limited to 64 KiB (65,536 bytes). The blocks are concatenated to form a BGZF file. The block-level metadata, together with a file-wide index, lets readers jump directly to a given block without decompressing the data that precedes it.
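The block layout described above can be sketched in Python using only the standard library. This is a minimal illustration, not HTSlib's implementation. The function names (`make_block`, `bgzf_compress`, `block_offsets`) are hypothetical. The chunk size of 0xFF00 uncompressed bytes per block is an assumption, chosen so that even incompressible input stays under the 64 KiB block limit. The block walker also assumes the exact header layout this writer produces (a single "BC" extra subfield) rather than parsing arbitrary gzip extra fields.

```python
import gzip
import struct
import zlib

# Fixed 18-byte header as written by this sketch: gzip magic (31, 139), CM=8 (DEFLATE),
# FLG=4 (FEXTRA), MTIME, XFL, OS, XLEN=6, then one "BC" subfield with SLEN=2 and BSIZE.
HEADER = struct.Struct("<BBBBIBBHBBHH")
TRAILER = struct.Struct("<II")  # CRC32 and ISIZE (uncompressed length)
CHUNK = 0xFF00  # assumed uncompressed bytes per block (conservative choice)


def make_block(data: bytes) -> bytes:
    """Compress one chunk into a single BGZF block, i.e. one complete gzip member."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw DEFLATE, no zlib wrapper
    cdata = comp.compress(data) + comp.flush()
    total = HEADER.size + len(cdata) + TRAILER.size
    assert total <= 65536, "a BGZF block must not exceed 64 KiB"
    header = HEADER.pack(31, 139, 8, 4, 0, 0, 255, 6, 66, 67, 2, total - 1)  # BSIZE = total - 1
    trailer = TRAILER.pack(zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return header + cdata + trailer


def bgzf_compress(data: bytes) -> bytes:
    """Split data into chunks, compress each independently, then add an empty EOF block."""
    blocks = [make_block(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
    blocks.append(make_block(b""))  # empty end-of-file marker block
    return b"".join(blocks)


def block_offsets(bgzf: bytes) -> list[int]:
    """Return each block's compressed start offset by reading only the BSIZE field."""
    offsets, pos = [], 0
    while pos < len(bgzf):
        fields = HEADER.unpack_from(bgzf, pos)
        assert fields[:2] == (31, 139) and fields[8:10] == (66, 67), "not a BGZF block"
        offsets.append(pos)
        pos += fields[11] + 1  # BSIZE holds the total block size minus 1
    return offsets


payload = b"ACGT" * 50_000  # 200,000 bytes -> 4 data blocks plus the EOF block
packed = bgzf_compress(payload)
print(len(block_offsets(packed)))  # 5
assert gzip.decompress(packed) == payload  # plain gzip reads all concatenated members
```

Every block is a complete gzip member, so Python's `gzip.decompress` reads the concatenated output directly. `block_offsets` finds block boundaries from the BSIZE field alone, without inflating any data.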

To support random access, BGZF files are typically accompanied by an index that maps logical (uncompressed) positions to block offsets. Positions are expressed as virtual offsets. A virtual offset is a 64-bit value: its upper 48 bits hold the compressed byte offset at which a block starts, and its lower 16 bits hold the offset within that block's uncompressed data. A reader can therefore seek straight to the right block, decompress only that block, and skip to the requested byte, without processing the entire file. Each block is a complete gzip member containing standard DEFLATE data, so a BGZF file is also a valid multi-member gzip file. Ordinary gzip implementations can therefore decompress it in full, though random access requires BGZF-aware tooling.

BGZF is most commonly used in bioinformatics, where it underlies genomics formats such as BAM (Binary Alignment/Map) files and BGZF-compressed VCF (Variant Call Format) files. These files are indexed (for example, by tabix) to support rapid coordinate-based queries. The bgzip program, part of the HTSlib/Samtools ecosystem, creates and decompresses BGZF streams. Libraries such as HTSlib provide APIs for reading BGZF data with random access. This design enables efficient storage and querying of large genomic datasets.
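The following Python sketch illustrates virtual offsets and a position index. It is not how HTSlib or tabix store their indexes. The index is a hypothetical in-memory pair of parallel lists recording each block's compressed start and uncompressed start. All function names are illustrative. The block writer assumes a fixed header with a single "BC" subfield and 0xFF00-byte chunks.

```python
import bisect
import gzip
import struct
import zlib

CHUNK = 0xFF00  # assumed uncompressed bytes per block


def make_block(data: bytes) -> bytes:
    """One BGZF block: gzip header with a "BC" subfield, raw DEFLATE data, CRC32 and ISIZE."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    cdata = comp.compress(data) + comp.flush()
    bsize = 18 + len(cdata) + 8 - 1  # total block size minus 1
    header = struct.pack("<BBBBIBBHBBHH", 31, 139, 8, 4, 0, 0, 255, 6, 66, 67, 2, bsize)
    return header + cdata + struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data))


def make_virtual_offset(coffset: int, uoffset: int) -> int:
    """Pack a block's compressed start (upper 48 bits) and in-block offset (lower 16 bits)."""
    assert 0 <= coffset < 1 << 48 and 0 <= uoffset < 1 << 16
    return (coffset << 16) | uoffset


def split_virtual_offset(voffset: int) -> tuple[int, int]:
    """Inverse of make_virtual_offset: return (coffset, uoffset)."""
    return voffset >> 16, voffset & 0xFFFF


def build_index(data: bytes) -> tuple[bytes, list[int], list[int]]:
    """Compress data and record each block's compressed and uncompressed start offsets."""
    out, coffsets, ustarts = bytearray(), [], []
    for start in range(0, len(data), CHUNK):
        coffsets.append(len(out))
        ustarts.append(start)
        out += make_block(data[start:start + CHUNK])
    out += make_block(b"")  # empty end-of-file marker block (not indexed)
    return bytes(out), coffsets, ustarts


def virtual_offset_for(pos: int, coffsets: list[int], ustarts: list[int]) -> int:
    """Map an uncompressed position to a virtual offset by binary search over block starts."""
    i = bisect.bisect_right(ustarts, pos) - 1
    return make_virtual_offset(coffsets[i], pos - ustarts[i])


def read_at(bgzf: bytes, voffset: int, n: int) -> bytes:
    """Return n bytes starting at voffset, decompressing only the blocks that are needed."""
    coffset, uoffset = split_virtual_offset(voffset)
    out = bytearray()
    while len(out) < uoffset + n and coffset < len(bgzf):
        bsize = struct.unpack_from("<H", bgzf, coffset + 16)[0]  # BSIZE field of the header
        out += gzip.decompress(bgzf[coffset:coffset + bsize + 1])  # one block = one member
        coffset += bsize + 1
    return bytes(out[uoffset:uoffset + n])


data = bytes(range(256)) * 1000  # 256,000 bytes -> 4 blocks
bgzf, coffsets, ustarts = build_index(data)
voff = virtual_offset_for(200_000, coffsets, ustarts)
print(split_virtual_offset(voff)[1])  # 4160, i.e. 200000 - 3 * 65280
assert read_at(bgzf, voff, 10) == data[200_000:200_010]
```

`read_at` decompresses only the blocks that overlap the requested range. The 48/16-bit split makes this possible: the upper bits say where to seek in the compressed file, and the lower bits say how far to skip after inflating that one block.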