Pseudolabeling
Pseudolabeling is a semi-supervised learning technique in which a classifier trained on a labeled dataset is used to assign labels to unlabeled data. These predicted labels, called pseudolabels, are treated as if they were true labels: the pseudolabeled examples are added to the training data and the model is retrained. Pseudolabeling is a form of self-training and is widely used when labeled data are scarce but unlabeled data are plentiful.
Typical procedure: train an initial model on the labeled set; apply it to the unlabeled set; select predictions with
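The train, predict, select, retrain loop can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the text: the selection rule (keep a prediction only when its top class probability reaches a hypothetical `threshold`), the toy nearest-centroid classifier, and the function names `fit_centroids`, `predict_proba`, and `pseudolabel`. Any classifier that outputs class probabilities could stand in for the toy model.

```python
import math


def fit_centroids(X, y):
    """Toy classifier (illustrative assumption): one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_proba(centroids, x):
    """Class probabilities as a softmax over negative squared distances."""
    scores = {
        label: -sum((a - b) ** 2 for a, b in zip(x, c))
        for label, c in centroids.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def pseudolabel(X_lab, y_lab, X_unlab, threshold=0.9, rounds=3):
    """Self-training loop; `threshold` and `rounds` are hypothetical knobs."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        model = fit_centroids(X_train, y_train)  # (re)train on current data
        remaining, added = [], 0
        for x in pool:
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                # Confident prediction: accept it as a pseudolabel.
                X_train.append(x)
                y_train.append(label)
                added += 1
            else:
                # Uncertain prediction: leave the example unlabeled.
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing new was accepted, so retraining changes nothing
            break
    return fit_centroids(X_train, y_train), X_train, y_train, pool


# Usage on a tiny hand-made dataset: two labeled points, three unlabeled.
model, X_train, y_train, pool = pseudolabel(
    X_lab=[[0.0, 0.0], [4.0, 4.0]],
    y_lab=[0, 1],
    X_unlab=[[0.1, 0.0], [3.9, 4.0], [2.0, 2.0]],
)
print("training labels after pseudolabeling:", y_train)
print("still unlabeled:", pool)
```

In this sketch an example that the model cannot classify confidently stays in the unlabeled pool rather than being forced into a class, which is one simple way to limit the number of wrong pseudolabels that reach the training set.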
Assumptions and considerations: the unlabeled data should come from the same distribution as the labeled data;
Challenges and limitations: pseudolabels can be incorrect and propagate errors, a phenomenon known as confirmation bias
Applications: used in computer vision, natural language processing, and speech recognition, particularly for image classification or