data deduplication
Data deduplication is a data reduction technique that eliminates duplicate copies of repeating data to reduce storage requirements and network bandwidth usage. It identifies identical data blocks or files and stores only a single copy, replacing each duplicate with a reference to that copy.
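As a rough illustration of storing a single copy and replacing duplicates with references, the following Python sketch keeps one copy of each unique block in memory. The class name DedupStore, the fixed block size, and the use of SHA-256 digests as block identifiers are assumptions made for this example, not a description of any particular product.

```python
import hashlib


class DedupStore:
    """Toy in-memory store (hypothetical) that keeps one copy of each unique block."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes, one entry per unique block
        self.files = {}   # file name -> ordered list of digests (the references)

    def write(self, name, data):
        """Split data into fixed-size blocks and store only blocks not seen before."""
        refs = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # setdefault leaves an existing block untouched, so duplicates cost no space.
            self.blocks.setdefault(digest, block)
            refs.append(digest)
        self.files[name] = refs

    def read(self, name):
        """Rebuild a file by following its references back to the stored blocks."""
        return b"".join(self.blocks[digest] for digest in self.files[name])

    def stored_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(block) for block in self.blocks.values())
```

Writing b"AAAABBBBAAAA" and then b"BBBBCCCC" with a block size of 4 would leave three unique blocks in the store, while both inputs can still be read back unchanged.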
Deduplication can operate at different granularities. File-level deduplication, also known as single-instance storage, removes duplicate files.
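File-level deduplication can be sketched by hashing whole files and grouping paths that share a digest. The function names file_digest and find_duplicate_files are hypothetical, and SHA-256 is an assumed choice of hash; a real system would also replace the duplicates with links or references rather than only report them.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, buf_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(buf_size):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicate_files(root):
    """Group the files under root by content digest and return groups of two or more."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists paths whose contents are identical, so a single-instance store would keep the first and turn the rest into references.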
Deduplication can be implemented in inline mode, which performs the reduction as data is written, or in post-process mode, which runs after the data has been written to storage.
A typical implementation uses a metadata index of unique data blocks identified by cryptographic hashes. When
Benefits include reduced storage capacity, lower bandwidth usage, and faster backups for suitable data sets. Limitations