bgzip
bgzip is a compression utility that produces files in BGZF, the Blocked GNU Zip Format. It is part of HTSlib, is designed for large genomic data files, and is widely used in the HTSlib/Samtools ecosystem. BGZF compresses data in independent blocks, each holding at most 64 KiB of uncompressed data, and stores them sequentially in a single file. Each block is a complete gzip member, so the concatenated blocks form a valid gzip file that standard gzip tools can decompress as a whole, while any single block can also be decompressed on its own.
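The independent-block idea can be sketched in a few lines of Python using only the standard gzip module. This is an illustration under stated assumptions, not real BGZF: the blocks written here lack the extra header field and end-of-file marker that BGZF defines, and the names compress_in_blocks, read_block, and CHUNK_SIZE are hypothetical, chosen for this example.

```python
import gzip

# Illustrative chunk size: just under 64 KiB of uncompressed data per block.
CHUNK_SIZE = 65280


def compress_in_blocks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress data as concatenated, independent gzip members.

    Returns the compressed bytes plus a list of
    (compressed_offset, uncompressed_offset) pairs, one per block.
    """
    out = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_size):
        offsets.append((len(out), start))
        # Each chunk becomes a complete gzip member of its own.
        out += gzip.compress(data[start:start + chunk_size])
    return bytes(out), offsets


def read_block(compressed: bytes, offsets, block_index: int) -> bytes:
    """Decompress a single block without touching the others."""
    begin = offsets[block_index][0]
    if block_index + 1 < len(offsets):
        end = offsets[block_index + 1][0]
    else:
        end = len(compressed)
    return gzip.decompress(compressed[begin:end])


# Made-up tab-separated records standing in for genomic data.
data = b"".join(b"chr1\t%d\tA\tG\n" % i for i in range(20000))
compressed, offsets = compress_in_blocks(data)

# The whole file still decompresses as one ordinary gzip stream...
whole = gzip.decompress(compressed)
# ...and one block in the middle can be read on its own.
third_block = read_block(compressed, offsets, 2)
```

Here the offsets list plays the role of a very crude index: knowing where a block starts in the compressed file is enough to decompress just that block.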
The key feature of BGZF is support for random access to compressed data when combined with an
Usage and compatibility are straightforward. To compress a VCF file, one typically runs bgzip file.vcf, producing
BGZF is the de facto standard compression format within the Samtools/HTSlib ecosystem. It provides the