Semi-supervised learning

Semi-supervised learning (SSL) is a machine learning paradigm that uses both labeled and unlabeled data for training. It aims to improve performance when labeled data are scarce by leveraging the information contained in a large pool of unlabeled samples.

SSL relies on assumptions about data structure, such as the cluster assumption (points in the same cluster tend to share a label) and the manifold or smoothness assumptions, which suggest that labels vary slowly along the data geometry. When these assumptions hold, unlabeled data can reveal underlying structure and guide the learning process.
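
One common way to make the smoothness assumption concrete (an illustrative formulation; the notation here is introduced for exposition, not taken from this article) is a graph-regularized objective that penalizes a predictor f for changing quickly between similar points:

    \min_f \; \sum_{i \in L} \ell(f(x_i), y_i) \;+\; \lambda \sum_{i,j} w_{ij} \, (f(x_i) - f(x_j))^2

Here L indexes the labeled points, \ell is a supervised loss, w_{ij} weights the similarity between x_i and x_j (labeled or not), and \lambda balances data fit against smoothness. Graph-based label propagation methods minimize objectives of roughly this form.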

Common approaches include self-training, where a model trained on labeled data labels unlabeled examples and retrains on high-confidence predictions; co-training, where multiple classifiers teach each other; graph-based methods that propagate labels on a similarity graph; and semi-supervised deep learning using consistency regularization and pseudo-labeling.

Variants such as mean teacher and the Pi-Model are notable in neural settings, especially with modern data augmentation.
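
In neural settings, consistency regularization can be as simple as the Pi-Model-style loss sketched below, assuming PyTorch; the tiny linear model and Gaussian noise are stand-ins for a real network and real augmentations:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    model = nn.Linear(20, 3)  # stand-in for a real network

    def augment(x):
        # Stand-in augmentation: additive Gaussian noise.
        return x + 0.1 * torch.randn_like(x)

    def pi_model_loss(x_lab, y_lab, x_unlab, w_cons=1.0):
        # Supervised term: ordinary cross-entropy on the labeled batch.
        sup = F.cross_entropy(model(augment(x_lab)), y_lab)
        # Consistency term: two stochastic augmentations of the same
        # unlabeled inputs should yield similar class probabilities.
        p1 = F.softmax(model(augment(x_unlab)), dim=1)
        p2 = F.softmax(model(augment(x_unlab)), dim=1)
        return sup + w_cons * F.mse_loss(p1, p2)

    x_lab, y_lab = torch.randn(8, 20), torch.randint(0, 3, (8,))
    loss = pi_model_loss(x_lab, y_lab, torch.randn(64, 20))
    loss.backward()

Mean teacher replaces the second forward pass with a teacher network whose weights are an exponential moving average of the student's, which typically yields more stable consistency targets.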

Applications span image and video classification, natural language processing, speech recognition, and biology, particularly when labeled data are limited relative to the available unlabeled data. Practical challenges include ensuring that SSL assumptions hold for a given domain, avoiding confirmation bias from incorrect pseudo-labels, handling class imbalance, and managing computational resources for large unlabeled datasets.

Historically, the concept emerged in the 1990s, with early work such as co-training, proposed by Blum and Mitchell (1998). In recent years, semi-supervised methods have gained prominence in deep learning, where objectives such as consistency regularization and pseudo-labeling are widely used together with data augmentation strategies.