CrossValidation

Cross-validation is a statistical method used to assess how the results of a predictive model will generalize to an independent dataset. It provides an estimate of model performance on unseen data and supports model comparison and hyperparameter tuning while reducing overfitting.

The basic approach is the hold-out method, which splits the data into a training set and a separate test set. More robust estimates are obtained with k-fold cross-validation, where the data are divided into k folds. The model is trained on k−1 folds and evaluated on the remaining fold, with the process repeated k times and the results averaged. Stratified k-fold preserves the class distribution in classification tasks.
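
As a minimal sketch of the fold loop, assuming scikit-learn (the iris data and logistic regression model are placeholders chosen only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Stratified 5-fold: each fold keeps the overall class proportions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# cross_val_score runs the train-on-k-1-folds / evaluate-on-the-remaining-fold
# loop and returns one score per fold; averaging gives the estimate.
scores = cross_val_score(model, X, y, cv=cv)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```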

Leave-one-out cross-validation uses every observation once as a test instance, which can yield low bias but high variance and substantial computational cost. Repeated or Monte Carlo cross-validation performs several random splits to stabilize the estimate.
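
Monte Carlo splitting corresponds to scikit-learn's ShuffleSplit (LeaveOneOut is available in the same module); the model below is the same illustrative placeholder as above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_iris(return_X_y=True)

# Ten independent random 75/25 splits; averaging across them damps the
# variance that a single random split would carry.
cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
print(cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv).mean())
```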

Nested cross-validation combines an inner loop for model selection or hyperparameter tuning with an outer loop for unbiased performance estimation, helping to prevent optimistic bias in reported results.
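
One common way to express this nesting, again assuming scikit-learn (the SVC estimator and its small C grid are illustrative assumptions), is to place a GridSearchCV inner loop inside a cross_val_score outer loop:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: 5-fold grid search selects C using training data only.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)

# Outer loop: 5-fold CV scores the whole tuning procedure, so the
# reported estimate is not inflated by the hyperparameter search.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```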

Common choices of k balance bias and variance; 5 or 10 folds are typical. LOOCV may be impractical for large datasets. When data are time-ordered, conventional cross-validation can leak temporal structure; techniques such as forward chaining or rolling-origin forecasting are used.
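
Forward chaining is available in scikit-learn as TimeSeriesSplit, where every training window ends before its test window begins; the toy array below merely stands in for time-ordered data.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # stand-in for 20 time-ordered samples

# Each split trains on an expanding prefix and tests on the block that
# immediately follows it, so no future observations leak into training.
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    print(f"train 0..{train_idx[-1]} -> test {test_idx[0]}..{test_idx[-1]}")
```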

Practical considerations include standardizing or scaling within folds, avoiding data leakage, and using stratification for imbalanced tasks.
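
One way to keep scaling inside each fold, sketched here with scikit-learn's Pipeline (the breast-cancer dataset is again just an example), is to bundle the scaler with the model so it is refit on each fold's training data only:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The scaler is fit on each training fold and only applied to the
# corresponding test fold, which is what scaling within folds requires.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print(cross_val_score(pipe, X, y, cv=5).mean())
```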

The outcomes of cross-validation are performance estimates (for example accuracy, AUC, RMSE) rather than final measurements on a truly independent test set. Cross-validation is a versatile tool in model evaluation and selection, complementary to a separate hold-out test set for final assessment.