Dataset dependency
Dataset dependency refers to the reliance of analyses, models, and data pipelines on the characteristics and availability of one or more datasets. It covers how outputs are shaped by data sources, schemas, labels, and preprocessing steps. It is distinct from software dependencies and emphasizes data-driven constraints and assumptions.
Key dimensions of dataset dependency include distributional properties (covariate and concept shift), data quality (missingness, errors
Managing dataset dependency involves practices such as dataset versioning, recording data provenance and lineage, and documenting
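As an illustration of dataset versioning and provenance recording, the following minimal sketch computes a content fingerprint for a small in-memory dataset and stores it alongside basic lineage metadata. The function names `dataset_fingerprint` and `provenance_record`, the record fields, and the example source path are hypothetical and chosen only for this example; real pipelines typically rely on dedicated data versioning tools.

```python
import hashlib
import json


def dataset_fingerprint(rows, schema):
    """Return a SHA-256 hex digest over the schema and rows.

    Assumption: rows are JSON-serializable dicts. The digest depends on
    row order, so reordering the rows counts as a new version.
    """
    hasher = hashlib.sha256()
    hasher.update(json.dumps(schema, sort_keys=True).encode("utf-8"))
    for row in rows:
        hasher.update(json.dumps(row, sort_keys=True).encode("utf-8"))
    return hasher.hexdigest()


def provenance_record(name, source, rows, schema):
    """Bundle illustrative lineage metadata with the content fingerprint."""
    return {
        "name": name,
        "source": source,
        "row_count": len(rows),
        "schema": schema,
        "fingerprint": dataset_fingerprint(rows, schema),
    }


# Hypothetical example data.
schema = {"age": "int", "label": "str"}
rows_v1 = [{"age": 31, "label": "a"}, {"age": 45, "label": "b"}]
rows_v2 = [{"age": 31, "label": "a"}, {"age": 46, "label": "b"}]

record_v1 = provenance_record("example", "raw/example.csv", rows_v1, schema)
record_v2 = provenance_record("example", "raw/example.csv", rows_v2, schema)

# A single changed value yields a different fingerprint.
print(record_v1["fingerprint"] != record_v2["fingerprint"])
```

Storing such a record with each model or analysis output makes it possible to check later whether a result was produced from the same data.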
Mitigation strategies include monitoring for data drift, applying domain adaptation when appropriate, curating diverse and representative
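Monitoring for data drift can be sketched with the Population Stability Index (PSI), one of several statistics used to compare a feature's distribution in a reference dataset against newer data. The example below is a minimal illustration under stated assumptions, not a prescribed method: the function name, the equal-width binning over the reference range, the bin count, and the smoothing constant `eps` are all choices made for this sketch.

```python
import math


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """Compute PSI between two non-empty numeric samples.

    Assumptions: bins are equal-width over the reference range, values
    outside that range are clamped into the edge bins, and empty bins are
    floored at `eps` to avoid taking the logarithm of zero.
    """
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp into the edge bins
            counts[idx] += 1
        total = len(values)
        return [max(c / total, eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    # Each term is non-negative, so PSI grows as the distributions diverge.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Hypothetical samples: the second is concentrated in the upper half
# of the reference range.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
print(psi_same, psi_shifted)
```

Identical samples give a PSI of zero, and larger values indicate a larger shift. Any alerting threshold is a convention that should be calibrated for the dataset at hand rather than taken as fixed.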