Databias
Databias refers to systematic distortions in data that affect the analyses, predictions, and decisions derived from it. It can originate in how data are collected, labeled, stored, or processed, and often reflects real-world inequities or measurement errors. Databias is a property not only of the data but of the entire data pipeline, including sampling methods, recording practices, and preprocessing steps. It can act on its own or amplify existing model biases when the affected data are used to train or validate algorithms.
Common sources include sampling bias (unrepresentative samples), measurement or labeling bias (inconsistent or subjective labels), historical
Detection and evaluation involve data audits, exploratory analysis, and fairness metrics; the use of datasets with
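As a rough illustration of what a simple data audit might compute, the sketch below checks two things: how far each group's share of a sample departs from a reference population share (a sign of possible sampling bias), and how much the rate of positive labels differs between groups (one common fairness metric, often called the demographic parity difference). The function names, the group labels "a" and "b", the reference shares, and the toy data are all assumptions made up for this example; they do not come from any particular dataset or library.

```python
from collections import Counter


def representation_gap(sample_groups, reference_shares):
    """Return (sample share - reference share) for each reference group.

    Positive values mean the group is over-represented in the sample.
    """
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


def positive_rate_by_group(groups, labels):
    """Return the fraction of positive labels (value 1) within each group."""
    totals = Counter(groups)
    positives = Counter(g for g, y in zip(groups, labels) if y == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(groups, labels):
    """Return the largest gap in positive-label rate between any two groups."""
    rates = positive_rate_by_group(groups, labels)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: group "a" makes up 80% of the sample,
# while the assumed reference population is split evenly.
groups = ["a"] * 8 + ["b"] * 2
labels = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
reference = {"a": 0.5, "b": 0.5}

gaps = representation_gap(groups, reference)
rates = positive_rate_by_group(groups, labels)
parity_gap = demographic_parity_difference(groups, labels)

print("representation gap:", gaps)
print("positive rate by group:", rates)
print("demographic parity difference:", parity_gap)
```

A large representation gap or parity difference does not by itself prove that the data are biased; it only flags where a closer look at collection and labeling practices may be warranted.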
Impact and implications include the potential for unfair or inaccurate predictions in hiring, lending, criminal justice,
Governance and standards emphasize risk management, transparency, and auditing. Practices such as dataset documentation, model cards,