K-fold cross-validation
K-fold cross-validation is a statistical method used to estimate the skill of machine learning models. It is a resampling procedure used to evaluate a model on a limited data sample. The procedure has a single parameter, k, which refers to the number of groups the data sample is to be split into; hence the name k-fold cross-validation. When a specific value for k is chosen, it may be used in place of k when referring to the procedure, so k=10 becomes 10-fold cross-validation.
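For instance, in scikit-learn the number of folds k is exposed as the n_splits argument of the KFold splitter (the value 10 below simply mirrors the k=10 example above):

```python
from sklearn.model_selection import KFold

# k=10 -> 10-fold cross-validation; shuffle before splitting
kf = KFold(n_splits=10, shuffle=True, random_state=0)
# kf.split(X) then yields 10 (train_index, test_index) pairs over a dataset X.
```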
The general procedure is as follows:
1. Shuffle the dataset randomly.
2. Split the dataset into k groups.
3. For each unique group:
   a. Take the group as a hold-out or test data set.
   b. Take the remaining groups as a training data set.
   c. Fit a model on the training set and evaluate it on the test set.
   d. Retain the evaluation score and discard the model.
4. Summarize the skill of the model using the sample of model evaluation scores.
The choice of k is commonly 5 or 10, but there is no formal rule; any value can be used as long as each group remains large enough to be representative of the data.
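A minimal sketch of the procedure above, using NumPy for the shuffling and splitting and a scikit-learn classifier as a stand-in model. The synthetic dataset, the LogisticRegression model, the accuracy metric, and k=5 are illustrative assumptions rather than part of the procedure itself:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Illustrative dataset and fold count (assumptions, not part of the procedure)
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
k = 5

# 1. Shuffle the dataset randomly (via a permutation of the row indices).
rng = np.random.default_rng(0)
indices = rng.permutation(len(X))

# 2. Split the shuffled indices into k groups (folds).
folds = np.array_split(indices, k)

scores = []
# 3. For each unique group:
for i, test_idx in enumerate(folds):
    # a. Take the group as the hold-out (test) set.
    # b. Take the remaining groups as the training set.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])

    # c. Fit a model on the training set and evaluate it on the test set.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])

    # d. Retain the evaluation score and discard the model.
    scores.append(accuracy_score(y[test_idx], preds))

# 4. Summarize the skill of the model using the sample of evaluation scores.
print(f"Accuracy per fold: {np.round(scores, 3)}")
print(f"Mean accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

In practice, helpers such as sklearn.model_selection.cross_val_score wrap this same loop; the manual version is shown only to make each numbered step explicit.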