poolbased

Pool-based active learning is a framework in machine learning for efficient labeling, where a model selects which instances from a large pool of unlabeled data should be labeled by an oracle or human annotator. The approach assumes that labels are costly, while unlabeled data is abundant. The typical workflow starts with a small labeled dataset, a large unlabeled pool, and a learning algorithm. At each iteration, a query strategy identifies a subset of unlabeled instances to be labeled, these labeled examples are added to the training set, and the model is retrained. This loop continues until labeling resources are exhausted or a satisfactory performance is reached. Pool-based methods contrast with other settings that either label data on demand or operate on streaming data.

Common query strategies include uncertainty sampling (selecting instances where the model is least confident), margin or

Advantages of pool-based approaches include making efficient use of labeling budgets and being applicable to large,

Pool-based active learning remains a foundational technique for reducing annotation effort while aiming to preserve or

density-weighted

representative,