Cross-validation
Cross-validation (Finnish: ristikvalidointi) is a statistical technique used to assess how well a predictive model will generalise to an independent data set. The basic idea is to partition the available data into complementary subsets, train the model on one subset and validate it on the other. By repeating this process with different partitions and averaging the results, one obtains an estimate of the model's predictive performance that uses every observation for validation and depends less on one particular choice of split than a single training/test split does.
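The most common variant is k-fold cross-validation. The data set is divided into k roughly equal parts, or folds. Each fold is used once as the validation set while the model is trained on the remaining k − 1 folds, and the k validation scores are averaged to give the cross-validation estimate.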
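The k-fold procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (k_fold_indices, fit_line, mse, cross_validate) are hypothetical, a one-variable least-squares line stands in for an arbitrary model, and mean squared error stands in for an arbitrary score. The cross-validation estimate is the mean of the k fold scores.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k roughly equal folds and yield one
    (train_indices, validation_indices) pair per fold."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random
    # The first n % k folds get one extra element, so sizes differ by at most 1.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]  # the other k - 1 folds
        yield train, val
        start += size


def fit_line(xs, ys):
    """Toy model (assumption): ordinary least-squares fit of y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def cross_validate(xs, ys, k, fit=fit_line, score=mse, seed=0):
    """Train on k - 1 folds, score on the held-out fold, once per fold.
    Returns the k validation scores; their mean is the CV estimate."""
    scores = []
    for train, val in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in val], [ys[i] for i in val]))
    return scores


# Usage: a noiseless line y = 2x + 1, so every fold's validation MSE is ~0.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
scores = cross_validate(xs, ys, k=5)
print("fold MSEs:", scores)
print("5-fold CV estimate of MSE:", mean(scores))
```

Because the data are shuffled once before splitting, every observation lands in exactly one validation fold, and changing seed produces a different partition of the same data.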
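Other variants include stratified k-fold, which in classification problems preserves the class distribution of the full data set within each fold, and repeated cross-validation, which runs k-fold cross-validation several times with different random partitions and averages the results.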
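The two variants can be combined, as in the sketch below of repeated stratified k-fold. It is a minimal, hypothetical illustration: stratification is done by dealing each class's shuffled indices round-robin across the folds, and the evaluate callback (here a toy majority-class "model" scored by accuracy) is an assumption standing in for any fit-and-score routine.

```python
import random
from collections import Counter, defaultdict
from statistics import mean


def stratified_k_fold(labels, k, seed=0):
    """Return k validation folds (lists of indices) in which each class
    appears in roughly the same proportion as in the whole data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):  # fixed class order for reproducibility
        members = by_class[label]
        rng.shuffle(members)
        # Deal this class round-robin, continuing where the previous class
        # stopped, so both per-class counts and fold sizes stay balanced.
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)
        offset += len(members)
    return folds


def repeated_cv(labels, k, repeats, evaluate):
    """Run stratified k-fold `repeats` times with different shuffles and
    return the mean of all k * repeats validation scores.
    evaluate(train_idx, val_idx) is a hypothetical user callback that fits
    a model on train_idx and returns its score on val_idx."""
    scores = []
    for r in range(repeats):
        folds = stratified_k_fold(labels, k, seed=r)  # new partition per repeat
        for f, val in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            scores.append(evaluate(train, val))
    return mean(scores)


# Usage: 12 samples of class "a" and 6 of class "b" (a 2:1 ratio).
labels = ["a"] * 12 + ["b"] * 6


def majority_accuracy(train, val):
    """Toy model: predict the most frequent class of the training fold and
    return its accuracy on the validation fold."""
    guess = Counter(labels[i] for i in train).most_common(1)[0][0]
    return mean(1.0 if labels[i] == guess else 0.0 for i in val)


for fold in stratified_k_fold(labels, k=3):
    print(Counter(labels[i] for i in fold))  # each fold: 4 x "a", 2 x "b"

result = repeated_cv(labels, k=3, repeats=5, evaluate=majority_accuracy)
print("repeated stratified 3-fold accuracy:", result)
```

Each fold keeps the 2:1 class ratio, so the majority-class baseline scores the same 4/6 on every fold and every repeat; with a real model the repeats would differ, and averaging them smooths out the effect of any single partition.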
Cross-validation is widely used in machine learning, statistical modelling, and bioinformatics for tasks such as model selection and hyper-parameter tuning.
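As an example of hyper-parameter tuning, the sketch below chooses the number of neighbours for a one-dimensional k-nearest-neighbours regressor by picking the candidate with the lowest cross-validated mean squared error. Everything here is an assumption made for illustration: the k-NN model, the candidate values, the synthetic noisy sine data, and the helper names (knn_predict, cv_mse, select_n_neighbors) are hypothetical.

```python
import math
import random
from statistics import mean


def knn_predict(train_x, train_y, x, n_neighbors):
    """1-D k-nearest-neighbours regression: average the targets of the
    n_neighbors training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return mean(train_y[i] for i in nearest[:n_neighbors])


def cv_mse(xs, ys, n_neighbors, k=5, seed=0):
    """Squared error averaged over all held-out points across k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k roughly equal folds
    errors = []
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        errors.extend((ys[i] - knn_predict(tx, ty, xs[i], n_neighbors)) ** 2
                      for i in folds[f])
    return mean(errors)


def select_n_neighbors(xs, ys, candidates, k=5):
    """Return the candidate with the lowest CV error, plus all CV errors."""
    scores = {c: cv_mse(xs, ys, c, k) for c in candidates}
    return min(scores, key=scores.get), scores


# Usage on synthetic data: y = sin(x) plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
candidates = [1, 3, 5, 9, 15]
best, scores = select_n_neighbors(xs, ys, candidates)
for c in candidates:
    print(f"n_neighbors={c:2d}  CV MSE={scores[c]:.4f}")
print("chosen n_neighbors:", best)
```

Every candidate is scored on the same folds, so the comparison is fair. Because the winner is picked using those same scores, its CV error tends to be a somewhat optimistic estimate of its error on genuinely new data.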