Positive-unlabeled learning

Positive-unlabeled learning (PU learning) is a subset of semi-supervised machine learning that uses training data consisting of positively labeled instances and unlabeled examples. The unlabeled set may contain both positives and negatives, but negative labels are not explicitly provided.
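To make this setup concrete, here is a minimal NumPy sketch (all names are illustrative) that converts a fully labeled binary dataset into a PU dataset: labels are kept for a uniformly random subset of the positives, and everything else becomes unlabeled.

```python
import numpy as np

def make_pu_labels(y_true, label_frac=0.3, rng=None):
    """Turn fully observed binary labels into PU labels.

    y_true: array of 0/1 ground-truth labels.
    label_frac: fraction of positives that receive a label (the labeled
        positives are drawn uniformly at random from all positives).
    Returns s: 1 = labeled positive, 0 = unlabeled (a mix of positives
        and negatives).
    """
    rng = np.random.default_rng(rng)
    y_true = np.asarray(y_true)
    s = np.zeros_like(y_true)
    pos_idx = np.flatnonzero(y_true == 1)
    n_labeled = int(round(label_frac * len(pos_idx)))
    labeled = rng.choice(pos_idx, size=n_labeled, replace=False)
    s[labeled] = 1
    return s

# Example: 6 positives, 4 negatives; label half of the positives.
y = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
s = make_pu_labels(y, label_frac=0.5, rng=0)
# Every labeled example is truly positive; negatives are never labeled.
assert np.all(y[s == 1] == 1)
```

A learner then sees only `s`, not `y`: the zeros in `s` mix true positives and true negatives, which is exactly the ambiguity PU methods must handle.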

The objective is to build a classifier that can separate positives from negatives using only the positive examples and the unlabeled data. A central difficulty is that the true class distribution, especially the proportion of positives among the unlabeled examples, is unknown. PU learning therefore often relies on assumptions about how positives are selected for labeling, such as the labeled positives being a random sample of all positives (the SCAR assumption, "selected completely at random").

Methodologically, PU learning falls into two broad families. One consists of two-step approaches that identify a set of reliable negatives within the unlabeled data and then train a conventional classifier. The other consists of direct PU methods that estimate a risk or loss function using only positives and unlabeled data, for example through unbiased or non-negative risk estimators; variants include PU-SVM and neural-network-based implementations such as nnPU.

A common practical task is estimating the overall proportion of positives in the population (the class prior), which improves calibration and the choice of decision thresholds. Evaluation typically requires a test set with true labels; metrics such as AUROC or precision-recall are used, with emphasis on robustness to the assumed selection mechanism.

PU learning has applications in text classification, bioinformatics, fraud detection, and other domains where negative labels are costly or unavailable. Its effectiveness depends on the validity of assumptions about the labeling process and the representativeness of the unlabeled data.
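The non-negative risk estimator mentioned above can be illustrated with a short NumPy sketch. This is a simplified rendering of the nnPU idea, not the exact published implementation: the class prior `pi_p` is assumed known, the sigmoid loss and the toy scores are illustrative choices, and the key step is clipping the estimated negative-class risk at zero so the total risk cannot go negative.

```python
import numpy as np

def sigmoid_loss(z, y):
    """Sigmoid loss for a label y in {+1, -1}: l(z, y) = sigmoid(-y * z)."""
    return 1.0 / (1.0 + np.exp(y * z))

def nn_pu_risk(scores_p, scores_u, pi_p):
    """Non-negative PU risk estimate (nnPU-style sketch).

    scores_p: model scores on labeled positives.
    scores_u: model scores on unlabeled examples.
    pi_p: assumed class prior P(y = +1).
    """
    # Positive-class risk: pi_p * E_p[l(g(x), +1)].
    r_p_plus = pi_p * np.mean(sigmoid_loss(scores_p, +1))
    # Negative-class risk estimated from unlabeled data minus the
    # positive contribution: E_u[l(g(x), -1)] - pi_p * E_p[l(g(x), -1)].
    r_n = (np.mean(sigmoid_loss(scores_u, -1))
           - pi_p * np.mean(sigmoid_loss(scores_p, -1)))
    # The unbiased estimate r_p_plus + r_n can go negative in finite
    # samples and encourage overfitting; nnPU clips the term at zero.
    return r_p_plus + max(r_n, 0.0)

rng = np.random.default_rng(0)
scores_p = rng.normal(+2.0, 1.0, size=100)   # positives score high
scores_u = rng.normal(0.0, 2.0, size=500)    # unlabeled: a mix
risk = nn_pu_risk(scores_p, scores_u, pi_p=0.4)
assert risk >= 0.0
```

In a training loop this quantity would be minimized over the model parameters; the clipping is what distinguishes the non-negative estimator from the plain unbiased one.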