Semi-supervised learning
Semi-supervised learning (SSL) is a machine learning paradigm that uses both labeled and unlabeled data for training. It aims to improve performance when labeled data are scarce by leveraging the information contained in a large pool of unlabeled samples.
SSL relies on assumptions about data structure, such as the cluster assumption (points in the same cluster are likely to share a label), the smoothness assumption (points that are close in input space tend to have the same label), and the manifold assumption (the data lie near a low-dimensional manifold).
Common approaches include self-training, where a model trained on labeled data assigns pseudo-labels to unlabeled examples and retrains on the combined set, typically keeping only its most confident predictions.
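The following is a minimal sketch of such a self-training loop. The classifier choice, confidence threshold, and round limit are illustrative assumptions, not prescribed by the method itself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    """Iteratively pseudo-label high-confidence unlabeled points and retrain."""
    model = LogisticRegression(max_iter=1000)
    for _ in range(max_rounds):
        model.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = model.predict_proba(X_unlab)
        conf = proba.max(axis=1)
        keep = conf >= threshold                   # accept only confident predictions
        if not keep.any():
            break                                  # nothing confident enough; stop early
        pseudo = model.classes_[proba[keep].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[keep]])  # grow the labeled pool
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~keep]                   # shrink the unlabeled pool
    return model
```

scikit-learn offers a ready-made wrapper of this pattern, sklearn.semi_supervised.SelfTrainingClassifier, which accepts any base estimator that exposes predict_proba.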
Applications span image and video classification, natural language processing, speech recognition, and biology, particularly when labeled data are expensive or time-consuming to obtain.
Historically, the concept emerged in the 1990s, with early work such as co-training, proposed by Blum and Mitchell in 1998.