Semi-supervised learning

Semi-supervised learning (SSL) is a machine learning paradigm that uses both labeled and unlabeled data for training. It aims to improve performance when labeled data are scarce by leveraging the information contained in a large pool of unlabeled samples.

SSL relies on assumptions about data structure, such as the cluster assumption (points in the same cluster tend to share a label) and the manifold or smoothness assumptions, which suggest that labels vary slowly along the data geometry. When these assumptions hold, unlabeled data can reveal underlying structure and guide the learning process.
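
One common way to make the smoothness assumption concrete (an illustrative formulation; the notation here is introduced for exposition, not taken from this article) is a graph-regularized objective that penalizes a predictor f for changing quickly between similar points:

    \min_f \; \sum_{i \in L} \ell(f(x_i), y_i) \;+\; \lambda \sum_{i,j} w_{ij} \, (f(x_i) - f(x_j))^2

Here L indexes the labeled points, \ell is a supervised loss, w_{ij} weights the similarity between x_i and x_j (labeled or not), and \lambda balances data fit against smoothness. Graph-based label propagation methods minimize objectives of roughly this form.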

Common approaches include self-training, where a model trained on labeled data labels unlabeled examples and retrains on high-confidence predictions; co-training, where multiple classifiers teach each other; graph-based methods that propagate labels on a similarity graph; and semi-supervised deep learning using consistency regularization and pseudo-labeling.

Variants such as mean teacher and the Pi-Model are notable in neural settings, especially with modern data augmentation.
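
In neural settings, consistency regularization can be as simple as the Pi-Model-style loss sketched below, assuming PyTorch; the tiny linear model and Gaussian noise are stand-ins for a real network and real augmentations:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    model = nn.Linear(20, 3)  # stand-in for a real network

    def augment(x):
        # Stand-in augmentation: additive Gaussian noise.
        return x + 0.1 * torch.randn_like(x)

    def pi_model_loss(x_lab, y_lab, x_unlab, w_cons=1.0):
        # Supervised term: ordinary cross-entropy on the labeled batch.
        sup = F.cross_entropy(model(augment(x_lab)), y_lab)
        # Consistency term: two stochastic augmentations of the same
        # unlabeled inputs should yield similar class probabilities.
        p1 = F.softmax(model(augment(x_unlab)), dim=1)
        p2 = F.softmax(model(augment(x_unlab)), dim=1)
        return sup + w_cons * F.mse_loss(p1, p2)

    x_lab, y_lab = torch.randn(8, 20), torch.randint(0, 3, (8,))
    loss = pi_model_loss(x_lab, y_lab, torch.randn(64, 20))
    loss.backward()

Mean teacher replaces the second forward pass with a teacher network whose weights are an exponential moving average of the student's, which typically yields more stable consistency targets.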

Applications span image and video classification, natural language processing, speech recognition, and biology, particularly when labeled data are limited relative to the available unlabeled data. Practical challenges include ensuring that SSL assumptions hold for a given domain, avoiding confirmation bias from incorrect pseudo-labels, handling class imbalance, and managing computational resources for large unlabeled datasets.

Historically, the concept emerged in the 1990s, with early work such as co-training, proposed by Blum and Mitchell (1998). In recent years, semi-supervised methods have gained prominence in deep learning, where objectives such as consistency regularization and pseudo-labeling are widely used together with data augmentation strategies.