Prepruning

Prepruning, also known as early stopping, is a technique used during the construction of decision trees to halt growth before the model perfectly fits the training data. The goal is to prevent overfitting by keeping the tree simple and more generalizable. In prepruning, a stopping rule is evaluated at each node to decide whether to split further; if the criteria for splitting are not met, the node becomes a leaf.
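
The sketch below is a minimal illustration, not a library implementation: it assumes NumPy, and every name in it (gini, best_split, build_tree) is invented for the example. It shows how three prepruning rules (maximum depth, minimum samples to split, minimum impurity decrease) turn a node into a leaf during recursive construction.

    import numpy as np

    def gini(y):
        # Gini impurity of a 1-D label array.
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def best_split(X, y):
        # Exhaustive search for the split that most reduces weighted Gini
        # impurity; returns (gain, feature, threshold), with gain 0.0 if
        # no split helps.
        best_gain, best_feature, best_threshold = 0.0, None, None
        parent, n = gini(y), len(y)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left, right = y[X[:, j] <= t], y[X[:, j] > t]
                if len(left) == 0 or len(right) == 0:
                    continue
                child = (len(left) * gini(left) + len(right) * gini(right)) / n
                if parent - child > best_gain:
                    best_gain, best_feature, best_threshold = parent - child, j, t
        return best_gain, best_feature, best_threshold

    def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2, min_gain=1e-7):
        gain, feature, threshold = best_split(X, y)
        # Prepruning: make this node a leaf if any stopping rule fires.
        if (depth >= max_depth                 # maximum depth reached
                or len(y) < min_samples_split  # too few samples to split
                or gain < min_gain):           # impurity reduction too small
            values, counts = np.unique(y, return_counts=True)
            return {"leaf": True, "prediction": values[np.argmax(counts)]}
        mask = X[:, feature] <= threshold
        return {"leaf": False, "feature": feature, "threshold": threshold,
                "left": build_tree(X[mask], y[mask], depth + 1,
                                   max_depth, min_samples_split, min_gain),
                "right": build_tree(X[~mask], y[~mask], depth + 1,
                                    max_depth, min_samples_split, min_gain)}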

Prepruning is contrasted with post-pruning, where the tree is allowed to grow fully and then trimmed back based on performance on validation data. Prepruning trades potential accuracy on the training set for simplicity and often better generalization on unseen data.

Common stopping criteria include a maximum depth for the tree, a minimum number of samples required to split a node, a minimum information gain or impurity decrease required for a split, and a maximum number of nodes. In regression trees, criteria can be based on reductions in mean squared error.
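
As a hedged illustration with scikit-learn (assuming it is installed; the noisy sine data is invented for the example), a regression tree can be prepruned by requiring a minimum MSE reduction per split and capping the number of leaves:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = np.linspace(0, 10, 200).reshape(-1, 1)
    y = np.sin(X).ravel() + rng.normal(0.0, 0.1, 200)

    reg = DecisionTreeRegressor(
        criterion="squared_error",    # split quality = reduction in MSE
        min_impurity_decrease=0.01,   # preprune splits with too little MSE reduction
        max_leaf_nodes=32,            # cap on tree size
        random_state=0,
    )
    reg.fit(X, y)
    print(reg.get_n_leaves())         # leaves actually grown under these rules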

Advantages of prepruning are reduced model complexity, faster training, and often less risk of overfitting in limited-data settings. Disadvantages include potential underfitting if stopping rules are too strict and dependence on a chosen validation strategy to set thresholds. Prepruning requires careful selection of criteria, sometimes via cross-validation.
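
One common way to set these thresholds is a cross-validated grid search. The sketch below uses scikit-learn's GridSearchCV on synthetic data; the grid values are illustrative, not recommendations:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    param_grid = {
        "max_depth": [2, 4, 6, None],
        "min_samples_leaf": [1, 5, 20],
        "min_impurity_decrease": [0.0, 0.001, 0.01],
    }
    search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_)        # prepruning settings chosen by 5-fold CV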

In practice, prepruning is implemented via model parameters in many decision tree algorithms, such as setting a maximum depth, a minimum number of samples per leaf, or a minimum information gain. It is commonly used when training data is noisy or when interpretability and training efficiency are priorities.
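
For example, in scikit-learn (a sketch, not the only approach; note that scikit-learn exposes a minimum impurity decrease rather than information gain by name, though with criterion="entropy" the decrease is an entropy reduction):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    clf = DecisionTreeClassifier(
        criterion="entropy",          # impurity measured as entropy
        max_depth=5,                  # prepruning: maximum tree depth
        min_samples_leaf=10,          # prepruning: minimum samples per leaf
        min_impurity_decrease=0.005,  # prepruning: minimum impurity reduction
        random_state=0,
    )
    clf.fit(X, y)
    print(clf.get_depth(), clf.get_n_leaves())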
