HFile
HFile is the persistent file format that Apache HBase uses to store region data on the Hadoop Distributed File System (HDFS) or other compatible storage systems. Each HFile belongs to a single store (one column family of one region) and is immutable once written, which enables efficient random reads and scans of large datasets. An HFile contains a sequence of data blocks that hold key-value pairs in sorted order, where each key combines the HBase row key, column family, and qualifier with the cell's timestamp and key type. A block index records the first key and file offset of each data block, so a reader can seek to the block that may contain a key without scanning the whole file. The file ends with a trailer that records metadata such as the format version, the compression codec, and the offsets needed to locate the block index.
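How an index enables fast seeks over sorted data blocks can be illustrated with a toy in-memory model. The sketch below is not the HFile on-disk format or any HBase API. It assumes the index holds the first key of each block, groups cells by count rather than by byte size, and leaves out timestamps, key types, compression, and the trailer. The names `BLOCK_SIZE`, `build_blocks`, and `seek` are hypothetical.

```python
import bisect

# Hypothetical block capacity, counted in cells for simplicity.
# Real HFiles size their blocks in bytes.
BLOCK_SIZE = 2


def build_blocks(cells, block_size=BLOCK_SIZE):
    """Sort cells by key, split them into blocks, and index each block's first key."""
    ordered = sorted(cells)
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    index = [block[0][0] for block in blocks]  # first key of every block
    return blocks, index


def seek(blocks, index, key):
    """Binary-search the index for the candidate block, then scan only that block."""
    pos = bisect.bisect_right(index, key) - 1
    if pos < 0:
        return None  # key sorts before the first block
    for cell_key, value in blocks[pos]:
        if cell_key == key:
            return value
    return None


# Simplified keys: (row, column family, qualifier).
cells = [
    (("row1", "cf", "a"), "v1"),
    (("row1", "cf", "b"), "v2"),
    (("row2", "cf", "a"), "v3"),
    (("row3", "cf", "a"), "v4"),
    (("row4", "cf", "a"), "v5"),
]
blocks, index = build_blocks(cells)
print(seek(blocks, index, ("row3", "cf", "a")))
```

Because the cells are sorted, the index needs only one entry per block, and a lookup touches a single block regardless of how many blocks the file holds.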
Data blocks may be compressed using supported codecs (such as Snappy, GZIP, or Zstandard). To speed up
HFile has evolved through several versions, with HFile V2 and V3 introducing enhancements to data and index