SMOTE

SMOTE, which stands for Synthetic Minority Over-sampling Technique, is a data augmentation method used to address class imbalance in machine learning. It was introduced to reduce the bias of classifiers toward the majority class by creating synthetic examples of the minority class rather than simply duplicating existing ones.

The basic SMOTE algorithm operates in the feature space of the minority class. For each minority example, the algorithm finds its k nearest neighbors within the minority class. It then creates new synthetic samples by selecting one of these neighbors at random and generating a point along the line segment between the two minority samples. The number of synthetic samples to generate is specified by an oversampling rate, allowing practitioners to control how much the minority class is expanded. SMOTE works best with numerical features and is commonly used in conjunction with some form of feature scaling.

Variants and related approaches extend SMOTE to address specific challenges. Borderline-SMOTE focuses on minority samples near the decision boundary, while SMOTE-NC handles nominal and continuous features. Other variants, such as ADASYN and SVM-SMOTE, adjust the generation process based on data density or model boundaries. SMOTE is often used together with undersampling techniques or ensemble methods to further improve performance on imbalanced datasets.

Limitations include the potential to create ambiguous samples in overlapped class regions, the introduction of noise, and sensitivity to feature scaling. It is also less effective when the minority class is extremely scarce or heavily intertwined with the majority class. Practitioners typically evaluate SMOTE as part of a broader pipeline that includes resampling, feature engineering, and appropriate evaluation metrics.
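
The interpolation step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the basic algorithm, not a production implementation: the function name `smote_sample` and its parameters are illustrative, and the brute-force neighbor search is only suitable for small datasets.

```python
import numpy as np

def smote_sample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic points from minority-class rows X_min.

    For each synthetic point: pick a random minority sample, pick one of
    its k nearest minority neighbors, and interpolate at a random position
    on the line segment between the two.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)

    # Pairwise distances within the minority class (brute force).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbors

    out = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                        # random minority sample
        b = nn[a, rng.integers(min(k, n - 1))]     # one of its neighbors
        gap = rng.random()                         # position along the segment
        out[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return out
```

Because each synthetic point is a convex combination of two existing minority samples, all generated points stay inside the convex hull of the minority class, which is why the technique is sensitive to overlap with the majority class.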
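
In practice, the combination of SMOTE with undersampling mentioned above is usually built with the imbalanced-learn library rather than written by hand. The sketch below assumes imbalanced-learn is installed (`pip install imbalanced-learn`); the specific sampling ratios are illustrative choices, not recommendations.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

# Imbalanced toy data: roughly 90% majority, 10% minority.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))

# Oversample the minority to half the majority size, then undersample
# the majority down to match, yielding a balanced training set.
resample = Pipeline([
    ("smote", SMOTE(sampling_strategy=0.5, random_state=0)),
    ("under", RandomUnderSampler(sampling_strategy=1.0, random_state=0)),
])
X_res, y_res = resample.fit_resample(X, y)
print(Counter(y_res))
```

Resampling only the training split (as an imblearn `Pipeline` does during cross-validation) avoids leaking synthetic points into the evaluation data, which pairs naturally with the evaluation-metric concerns raised above.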