codistill
Codistill, or co-distillation, is a collaborative learning paradigm in machine learning in which multiple models are trained simultaneously while mutually distilling knowledge into one another. Each model serves as both student and teacher, generating soft targets or intermediate representations that are shared with peer models. The training objective combines the standard supervised loss on labeled data with a distillation loss that encourages alignment with the peers' outputs or representations. Depending on the setup, models may exchange raw logits, temperature-scaled probability distributions, or feature maps, and training may take place in centralized or decentralized (federated) settings.
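The following is a minimal sketch of one co-distillation training step for a cohort of peer models, combining a cross-entropy term with a temperature-scaled KL term toward each peer's softened outputs. The function name, the temperature T, and the weight lam are illustrative assumptions, not taken from any specific paper or library.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def codistill_step(models, optimizers, x, y, T=2.0, lam=1.0):
        """One mutual-distillation update: each model minimizes its supervised
        loss plus a KL term toward every peer's temperature-softened outputs."""
        logits = [m(x) for m in models]
        for i, opt in enumerate(optimizers):
            ce = F.cross_entropy(logits[i], y)  # standard supervised loss
            kl = 0.0
            for j, peer_logits in enumerate(logits):
                if j == i:
                    continue
                # Peer outputs are treated as fixed soft targets (detach),
                # softened with temperature T; T**2 rescales the gradient
                # magnitude as in standard knowledge distillation.
                kl = kl + F.kl_div(
                    F.log_softmax(logits[i] / T, dim=1),
                    F.softmax(peer_logits.detach() / T, dim=1),
                    reduction="batchmean",
                ) * (T ** 2)
            loss = ce + lam * kl / (len(models) - 1)
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Usage: two small peers on a toy batch.
    peers = [nn.Linear(16, 4), nn.Linear(16, 4)]
    opts = [torch.optim.SGD(m.parameters(), lr=0.1) for m in peers]
    codistill_step(peers, opts, torch.randn(8, 16), torch.randint(0, 4, (8,)))

Detaching the peer logits keeps each model's update local: gradients from one model's distillation loss never flow back into its peers, which is what allows the peers to be trained on separate devices or data partitions.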
Variants include mutual logit distillation across models trained on different data partitions or architectures, and within-cohort distillation toward a holistic ensemble teacher formed on the fly from the peers' own outputs.
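A sketch of the ensemble-teacher variant, in which each model's soft target is the average of its peers' softened probabilities rather than each peer individually; the function name and temperature are again illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def ensemble_distill_loss(own_logits, peer_logits_list, T=2.0):
        """KL divergence from the mean of the peers' distributions to this
        model's own temperature-softened output."""
        with torch.no_grad():
            # Holistic ensemble target: average of all peers' soft predictions.
            peer_probs = torch.stack(
                [F.softmax(pl / T, dim=1) for pl in peer_logits_list]
            ).mean(dim=0)
        return F.kl_div(
            F.log_softmax(own_logits / T, dim=1),
            peer_probs,
            reduction="batchmean",
        ) * (T ** 2)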
Relation to related concepts: co-distillation has roots in knowledge distillation and co-training. It is distinct from standard teacher-student distillation, in which a fixed, pretrained teacher transfers knowledge to a student in one direction only; in co-distillation all models learn jointly and knowledge flows in both directions between peers.
Limitations and considerations: training complexity and communication overhead grow with the number of peers; the peers risk converging to similar solutions, which erodes the diversity that mutual distillation relies on; and the benefits depend on peer diversity (in architecture, initialization, or data partition) and on how the distillation loss is weighted against the supervised loss.