Duplicatiegraad

Duplicatiegraad, or duplication rate, is a metric used to describe the proportion of duplicates within a dataset, content collection or sequencing data. It expresses how much data is redundant and is often given as a percentage. The exact interpretation can depend on the domain and the method used to identify duplicates.
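As a rough illustration of the definition above, the following sketch computes a duplication rate over exact matches. It assumes one common convention, which the text does not prescribe: the first occurrence of each value counts as the original and every further identical occurrence counts as a duplicate. The function name `duplication_rate` and the sample reads are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def duplication_rate(items: Iterable[Hashable]) -> float:
    """Return the fraction of items that repeat an earlier, identical item.

    Assumed convention: duplicates = total items - distinct items.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each distinct value is kept once; all remaining copies are duplicates.
    duplicates = total - len(counts)
    return duplicates / total


# Hypothetical sample: five reads, three distinct sequences.
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"]
rate = duplication_rate(reads)
print(f"{rate:.0%}")  # prints "40%"
```

Other conventions exist, for example counting every member of a duplicated group, so a reported rate is only comparable when the counting rule is stated.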

In data storage and backup, duplicatiegraad refers to the share of data that consists of duplicate blocks.
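One way to estimate a block-level duplication rate is to split the data into fixed-size blocks and compare their hashes. The sketch below assumes fixed-size chunking and SHA-256 digests; both are illustrative choices, not something the text specifies, and the function name `block_duplication_rate` is hypothetical.

```python
import hashlib


def block_duplication_rate(data: bytes, block_size: int = 4096) -> float:
    """Return the fraction of fixed-size blocks whose content was already seen."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    seen = set()
    total = 0
    duplicates = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Assumption: equal SHA-256 digests are treated as equal blocks.
        digest = hashlib.sha256(block).digest()
        total += 1
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates / total if total else 0.0


# Hypothetical 16-byte sample split into 4-byte blocks: AAAA, AAAA, BBBB, AAAA.
sample = b"A" * 8 + b"B" * 4 + b"A" * 4
print(block_duplication_rate(sample, block_size=4))  # prints 0.5
```

The result depends on the block size and on block alignment, so the same data can yield different rates under different chunking schemes.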

In genomics and sequencing, duplication rate indicates the fraction of reads that are identical copies.

In text corpora or document collections, duplication rate refers to the proportion of documents or passages that duplicate other content in the collection.

Factors influencing duplicatiegraad include data collection practices, sampling depth, preprocessing methods and library preparation in sequencing.

Methods for identifying duplicates range from exact matching to fingerprinting and content-similarity measures, which can also capture near-duplicates.
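To illustrate how a content-similarity measure can extend the metric to near-duplicates, the sketch below compares documents by the Jaccard similarity of their word shingles. The shingle size, the 0.5 threshold, the sample documents and all function names are assumptions chosen for the example, not values taken from the text.

```python
from itertools import combinations
from typing import List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word shingles of a text (lowercased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_rate(docs: List[str], threshold: float = 0.5, k: int = 3) -> float:
    """Fraction of documents that are similar to an earlier document."""
    sets = [shingles(d, k) for d in docs]
    flagged = set()
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            # The later document of the pair is counted as the near-duplicate.
            flagged.add(j)
    return len(flagged) / len(docs) if docs else 0.0


docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy cat",
    "duplication rate measures redundant content in a collection",
]
# The second document shares 6 of 8 distinct shingles with the first
# (similarity 0.75), so one of the three documents is flagged.
print(near_duplicate_rate(docs))
```

This pairwise comparison grows quadratically with the number of documents; for large collections, fingerprinting schemes such as MinHash are a common way to approximate the same similarity at lower cost.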