datasetter
Datasetter is an infrequently used term in data science for a person or system that assembles, curates, and maintains datasets for analysis, training, or sharing through repositories. In practice, a datasetter may perform data collection, cleaning, labeling, annotation, and quality assurance, and may also document provenance and licensing. Datasetters work across domains (academia, industry, and government) and coordinate with data stewards and researchers to ensure that datasets meet requirements for reliability, reproducibility, and compliance.
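As an illustration of the cleaning and provenance-documentation tasks described above, the following is a minimal Python sketch, not a standard implementation. The function names (`normalize`, `deduplicate`, `build_dataset`), the metadata fields, and the sample records are all hypothetical and chosen only for this example. It assumes records are flat dictionaries with JSON-serializable values.

```python
import hashlib
import json


def normalize(record):
    """Trim and lowercase string values so equivalent records compare equal."""
    return {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in record.items()
    }


def deduplicate(records):
    """Keep the first occurrence of each normalized record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        cleaned = normalize(record)
        # Serialize with sorted keys so key order does not affect the fingerprint.
        fingerprint = hashlib.sha256(
            json.dumps(cleaned, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(cleaned)
    return unique


def build_dataset(records, source, license_name):
    """Bundle cleaned records with hypothetical provenance and licensing fields."""
    cleaned = deduplicate(records)
    return {
        "records": cleaned,
        "metadata": {
            "source": source,
            "license": license_name,
            "raw_count": len(records),
            "clean_count": len(cleaned),
        },
    }


# Hypothetical sample data: the first two records differ only in
# whitespace, letter case, and key order.
raw = [
    {"name": " Alice ", "city": "Paris"},
    {"city": "paris", "name": "alice"},
    {"name": "Bob", "city": "Berlin"},
]
dataset = build_dataset(raw, source="example-survey", license_name="CC-BY-4.0")
print(dataset["metadata"])
```

Recording the raw and cleaned counts next to the source and license is one simple way to make a cleaning step auditable. Real pipelines would typically need domain-specific normalization rules rather than plain lowercasing.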
Key responsibilities include data sourcing and selection, cleaning and deduplication, normalization and schema design, and metadata creation.
Common tools and approaches involve data cataloging platforms, version control for data (such as data versioning
Challenges include bias and representativeness, privacy and consent restrictions, licensing compatibility, and handling evolving data.