Positive-unlabeled learning

Positive-unlabeled learning (PU learning) is a subset of semi-supervised machine learning that uses training data consisting of positively labeled instances and unlabeled examples. The unlabeled set may contain both positives and negatives, but negative labels are not explicitly provided.
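To make this setup concrete, here is a minimal NumPy sketch (all names are illustrative) that converts a fully labeled binary dataset into a PU dataset: labels are kept for a uniformly random subset of the positives, and everything else becomes unlabeled.

```python
import numpy as np

def make_pu_labels(y_true, label_frac=0.3, rng=None):
    """Turn fully observed binary labels into PU labels.

    y_true: array of 0/1 ground-truth labels.
    label_frac: fraction of positives that receive a label (the labeled
        positives are drawn uniformly at random from all positives).
    Returns s: 1 = labeled positive, 0 = unlabeled (a mix of positives
        and negatives).
    """
    rng = np.random.default_rng(rng)
    y_true = np.asarray(y_true)
    s = np.zeros_like(y_true)
    pos_idx = np.flatnonzero(y_true == 1)
    n_labeled = int(round(label_frac * len(pos_idx)))
    labeled = rng.choice(pos_idx, size=n_labeled, replace=False)
    s[labeled] = 1
    return s

# Example: 6 positives, 4 negatives; label half of the positives.
y = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
s = make_pu_labels(y, label_frac=0.5, rng=0)
# Every labeled example is truly positive; negatives are never labeled.
assert np.all(y[s == 1] == 1)
```

A learner then sees only `s`, not `y`: the zeros in `s` mix true positives and true negatives, which is exactly the ambiguity PU methods must handle.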

The objective is to build a classifier that can separate positives from negatives using only the positive examples and the unlabeled data. A central difficulty is that the true class distribution, especially the proportion of positives among the unlabeled examples, is unknown. PU learning therefore often relies on assumptions about how positives are selected for labeling, such as the labeled positives being a random sample of all positives (the SCAR assumption, "selected completely at random").

Methodologically, PU learning falls into two broad families. One consists of two-step approaches that identify a set of reliable negatives within the unlabeled data and then train a conventional classifier. The other consists of direct PU methods that estimate a risk or loss function using only positives and unlabeled data, for example through unbiased or non-negative risk estimators; variants include PU-SVM and neural-network-based implementations such as nnPU.

A common practical task is estimating the overall proportion of positives in the population (the class prior), which improves calibration and the choice of decision thresholds. Evaluation typically requires a test set with true labels; metrics such as AUROC or precision-recall are used, with emphasis on robustness to the assumed selection mechanism.

PU learning has applications in text classification, bioinformatics, fraud detection, and other domains where negative labels are costly or unavailable. Its effectiveness depends on the validity of assumptions about the labeling process and the representativeness of the unlabeled data.
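The non-negative risk estimator mentioned above can be illustrated with a short NumPy sketch. This is a simplified rendering of the nnPU idea, not the exact published implementation: the class prior `pi_p` is assumed known, the sigmoid loss and the toy scores are illustrative choices, and the key step is clipping the estimated negative-class risk at zero so the total risk cannot go negative.

```python
import numpy as np

def sigmoid_loss(z, y):
    """Sigmoid loss for a label y in {+1, -1}: l(z, y) = sigmoid(-y * z)."""
    return 1.0 / (1.0 + np.exp(y * z))

def nn_pu_risk(scores_p, scores_u, pi_p):
    """Non-negative PU risk estimate (nnPU-style sketch).

    scores_p: model scores on labeled positives.
    scores_u: model scores on unlabeled examples.
    pi_p: assumed class prior P(y = +1).
    """
    # Positive-class risk: pi_p * E_p[l(g(x), +1)].
    r_p_plus = pi_p * np.mean(sigmoid_loss(scores_p, +1))
    # Negative-class risk estimated from unlabeled data minus the
    # positive contribution: E_u[l(g(x), -1)] - pi_p * E_p[l(g(x), -1)].
    r_n = (np.mean(sigmoid_loss(scores_u, -1))
           - pi_p * np.mean(sigmoid_loss(scores_p, -1)))
    # The unbiased estimate r_p_plus + r_n can go negative in finite
    # samples and encourage overfitting; nnPU clips the term at zero.
    return r_p_plus + max(r_n, 0.0)

rng = np.random.default_rng(0)
scores_p = rng.normal(+2.0, 1.0, size=100)   # positives score high
scores_u = rng.normal(0.0, 2.0, size=500)    # unlabeled: a mix
risk = nn_pu_risk(scores_p, scores_u, pi_p=0.4)
assert risk >= 0.0
```

In a training loop this quantity would be minimized over the model parameters; the clipping is what distinguishes the non-negative estimator from the plain unbiased one.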