Validation set

A validation set, in machine learning and data science, is a subset of labeled data used during model development to estimate the model’s performance and to guide decisions about model selection and hyperparameter tuning. It sits between the training data and the final evaluation data, helping to assess how well a model trained on the training set generalizes to unseen data.
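This three-way use of the data can be sketched in plain Python. The helper name `train_val_test_split` and the 60/20/20 fractions below are illustrative assumptions, not fixed conventions:

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the data and partition it into train/validation/test subsets."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed for reproducible splits
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

In practice, libraries such as scikit-learn provide `train_test_split`, which can be applied twice to obtain the same three subsets and also supports stratified splitting for imbalanced class labels.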

In practice, the available data are often split into three parts: training, validation, and test. The model is trained on the training set, and its performance is evaluated on the validation set to compare different models or to adjust hyperparameters. Once a model and its settings are chosen, the final assessment of its generalization capability is typically performed on the test set, which should remain untouched during development to provide an unbiased estimate of real-world performance.

Key considerations include avoiding data leakage between the sets, ensuring that splits respect the underlying data distribution (for example, stratifying by class labels in imbalanced datasets), and choosing an appropriate size for the validation set. When data are scarce, cross-validation or nested cross-validation can be used instead of a single validation split to obtain more reliable performance estimates. In time-series problems, the validation split should respect temporal order to prevent future information from leaking into past predictions. The choice of metrics on the validation set depends on the task, such as accuracy or AUC for classification, or RMSE and MAE for regression.
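The cross-validation and time-ordered splitting schemes mentioned above can be sketched as index generators. This is a minimal pure-Python illustration; the fold counts and window layout are arbitrary assumptions:

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation:
    each index serves as validation data exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f in range(k) if f != i for j in folds[f]]
        yield train_idx, val_idx

def time_series_splits(n, k=3):
    """Yield expanding-window splits for time-ordered data: the
    validation block always comes strictly after the training window."""
    fold = n // (k + 1)
    for i in range(1, k + 1):
        yield list(range(0, i * fold)), list(range(i * fold, (i + 1) * fold))
```

Note that `k_fold_indices` shuffles before splitting, which is appropriate for i.i.d. data, while `time_series_splits` deliberately preserves order so that no future observation leaks into a training window.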