datasettiincore
Datasettiincore is a community-driven specification and ecosystem designed to standardize the core datasets used in machine learning, data science, and related research. It provides a common data model, metadata schema, and governance practices intended to improve interoperability, provenance, and reproducibility across projects that rely on publicly available or licensed data.
The core of datasettiincore is its modular data model. It defines a minimal set of metadata fields
Licensing and governance are central to datasettiincore. The project promotes open licenses compatible with research and
Adoption of datasettiincore aims to enhance discovery, comparison, and reproducibility of experiments. By enabling consistent metadata
As an open specification, datasettiincore invites participation from researchers, data stewards, and platform providers.