Co-training

Co-training is a semi-supervised machine learning technique in which two or more classifiers are trained on different, conditionally independent views of the same data and iteratively teach each other using unlabeled data. The approach aims to improve learning when labeled data are scarce by exploiting abundant unlabeled examples.

Typically, the feature space is split into two views that are each predictive of the class on their own, and together are sufficient to determine the label under the underlying distribution. The method was introduced by Blum and Mitchell in 1998 as a theoretical framework for semi-supervised learning with multiple views.

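As an illustration of such a split (a toy sketch, not taken from the Blum and Mitchell paper), the two views can simply be disjoint column groups of a feature matrix; which columns form a coherent view, for example page-text features versus link-text features for web pages, is a modelling assumption:

    # Toy sketch: build two views as disjoint column groups of a feature matrix.
    # The data and the split below are arbitrary; in practice each view should
    # be informative about the class on its own.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((200, 20))   # 200 examples, 20 features (synthetic)
    X1 = X[:, :10]              # view 1: first 10 columns (assumed split)
    X2 = X[:, 10:]              # view 2: remaining 10 columns (assumed split)
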
Algorithm: start with a small labeled set and train a separate classifier on each view. Apply the classifiers to a pool of unlabeled instances and select the most confident predictions from each view. Add these newly labeled instances to the labeled pool, so that each classifier also trains on examples pseudo-labeled by the opposite view. Repeat until a stopping criterion is met.

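A minimal sketch of this loop, assuming the two views are NumPy arrays X1 and X2, y is a label array with placeholder values at unlabeled positions, labeled and unlabeled are index collections, and scikit-learn logistic regression stands in for the per-view learners; the function name and the rounds, k, and threshold parameters are illustrative choices rather than part of the original formulation:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def cotrain(X1, X2, y, labeled, unlabeled, rounds=10, k=5, threshold=0.8):
        """Sketch of two-view co-training; writes pseudo-labels into a copy of y."""
        labeled, unlabeled, y = list(labeled), set(unlabeled), y.copy()
        for _ in range(rounds):
            if not unlabeled:
                break
            # Train one classifier per view on the current labeled pool.
            clf1 = LogisticRegression(max_iter=1000).fit(X1[labeled], y[labeled])
            clf2 = LogisticRegression(max_iter=1000).fit(X2[labeled], y[labeled])
            pool = np.array(sorted(unlabeled))
            new_indices = []
            for clf, X in ((clf1, X1), (clf2, X2)):
                proba = clf.predict_proba(X[pool])
                conf = proba.max(axis=1)
                # Take at most k predictions per view, and only confident ones.
                for i in np.argsort(-conf)[:k]:
                    if conf[i] >= threshold:
                        idx = pool[i]
                        # Pseudo-label from this view (if both views pick the same
                        # instance, the second view's label wins in this sketch).
                        y[idx] = clf.classes_[proba[i].argmax()]
                        new_indices.append(idx)
            if not new_indices:
                break  # stopping criterion: no confident predictions left
            # Move newly pseudo-labeled instances into the shared labeled pool.
            for idx in new_indices:
                if idx in unlabeled:
                    unlabeled.discard(idx)
                    labeled.append(idx)
        # Final per-view classifiers trained on the enlarged labeled pool.
        clf1 = LogisticRegression(max_iter=1000).fit(X1[labeled], y[labeled])
        clf2 = LogisticRegression(max_iter=1000).fit(X2[labeled], y[labeled])
        return clf1, clf2

With the toy views X1 and X2 above, a call such as cotrain(X1, X2, y, labeled, unlabeled) would run the loop once y and the index sets have been prepared; all of these names are placeholders.
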
Assumptions and considerations: co-training relies on two main conditions: (1) each view is sufficient for predicting the label, and (2) the views are conditionally independent given the class label. In practice, these assumptions may be violated, which can lead to error propagation. Variants relax the independence requirement, or use more than two views, co-regularization, or agreement-based criteria.

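Written out (a standard statement of the assumption, not taken verbatim from this page), condition (2) says that for views x1, x2 and class label y:

    P(x1, x2 | y) = P(x1 | y) * P(x2 | y)
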
Applications include text classification, web page categorization, image recognition with multi-modal features, and bioinformatics. Limitations include the need for valid, distinct views, sensitivity to the initial labeled set, and potential amplification of labeling errors. Related approaches include self-training, tri-training, and multi-view learning.
