datacompaction
Datacompaction refers to techniques that reduce the amount of storage occupied by data and improve access performance by reorganizing and rewriting data. It combines restructuring of layout with data-preserving transformations such as compression, deduplication, and the removal of obsolete or redundant records. Datacompaction is used in databases, file systems, data lakes, and streaming systems to optimize storage efficiency and query or read performance.
In log-structured storage systems and LSM-tree databases, compaction is a controlled process that merges multiple data
A number of design considerations influence when and how aggressively to compact: workload patterns, latency requirements,
Common contexts include Cassandra and RocksDB use of compaction, HBase with major/minor compactions, HDFS or cloud