sementideta
Sementideta is a term used in data science for the lasting influence that the initial seed data chosen for a learning process exerts on subsequent model behavior, outcomes, and evaluation results. The concept foregrounds how early data selections, sampling methods, and preprocessing choices can propagate through training, validation, and deployment, shaping decision boundaries, performance metrics, and observed biases. While not a formal theory, sementideta serves as a heuristic for analyzing and communicating how strongly a model depends on its starting data.
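One way to make this dependence concrete is to fit the same simple model on several different seed samples and compare the results on a fixed held-out set. The following is a minimal sketch under stated assumptions, not an established procedure: the synthetic two-class data, the midpoint-threshold classifier, and the helper names `make_population`, `fit_threshold`, `accuracy`, and `seed_sensitivity` are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def make_population(n=2000, rng_seed=0):
    """Build hypothetical 1-D data: class 0 ~ N(0, 1), class 1 ~ N(1.5, 1)."""
    rng = random.Random(rng_seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(1.5 * label, 1.0), label))
    return data


def fit_threshold(sample):
    """Fit a toy classifier: the midpoint between the two class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        # Degenerate seed sample with a single class: fall back to the overall mean.
        return statistics.fmean(x for x, _ in sample)
    return (statistics.fmean(xs0) + statistics.fmean(xs1)) / 2


def accuracy(threshold, data):
    """Fraction of points whose side of the threshold matches their label."""
    correct = sum((x > threshold) == bool(y) for x, y in data)
    return correct / len(data)


def seed_sensitivity(population, seed_size=20, seeds=range(10)):
    """Map each random seed to (fitted threshold, held-out accuracy)."""
    held_out = population[:500]   # fixed evaluation set, shared by all runs
    pool = population[500:]       # seed samples are drawn from here
    results = {}
    for s in seeds:
        sample = random.Random(s).sample(pool, seed_size)
        threshold = fit_threshold(sample)
        results[s] = (threshold, accuracy(threshold, held_out))
    return results


if __name__ == "__main__":
    results = seed_sensitivity(make_population())
    accs = [acc for _, acc in results.values()]
    print(f"accuracy range across seeds: {min(accs):.3f} to {max(accs):.3f}")
```

Because only the seed sample changes between runs, any spread in the fitted thresholds or held-out accuracies can be attributed to the starting data alone. Recording the seed values alongside the results keeps each run reproducible.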
Origin and usage: The term emerged in the early 2020s in discussions of reproducibility and dataset design.
Applications: Researchers use sementideta to justify experimentation with varying seeds, to document seed sets in code
Reception: The idea has been met with both support and critique. Proponents argue it formalizes a practical