K-Fold Cross-Validation
K-fold cross-validation is a resampling procedure used to estimate the skill of a machine learning model on a limited data sample. The procedure has a single parameter, k, that refers to the number of groups the data sample is to be split into; hence the name k-fold cross-validation. When a specific value for k is chosen, it may be used in place of k in references to the procedure, such as k=10 becoming 10-fold cross-validation.
The general procedure is as follows:
1. Shuffle the dataset randomly.
2. Split the dataset into k groups.
3. For each unique group:
a. Take the group as a hold-out or test data set.
b. Take the remaining groups as a training data set.
c. Fit a model on the training set and evaluate it on the test set.
d. Retain the evaluation score and discard the model.
4. Summarize the skill of the model using the sample of model evaluation scores.
The result is a single estimate of model performance obtained by averaging the k evaluation scores. Because every observation is used for both training and testing, this estimate is generally less biased and less optimistic than one obtained from a single train/test split.
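To make the procedure concrete, here is a minimal sketch of the steps above, assuming scikit-learn is available; the synthetic dataset and the logistic regression model are placeholders chosen only for illustration.

```python
# Minimal sketch of k-fold cross-validation, assuming scikit-learn is installed.
# The dataset and model are illustrative stand-ins, not part of the original text.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# k=10 gives 10-fold cross-validation; shuffle=True performs step 1.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kfold.split(X):           # steps 2-3: split into k groups and iterate
    model = LogisticRegression(max_iter=1000)        # step 3d: a fresh model each fold, discarded after scoring
    model.fit(X[train_idx], y[train_idx])            # step 3c: fit on the remaining groups
    scores.append(model.score(X[test_idx], y[test_idx]))  # step 3d: retain the evaluation score

# Step 4: summarize the skill of the model across the k folds.
print(f"mean accuracy: {np.mean(scores):.3f} (+/- {np.std(scores):.3f})")
```

In practice the same result can be obtained with scikit-learn's `cross_val_score` helper; the explicit loop is shown here only to mirror the numbered steps.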