Pretrain
Pretrain, short for pretraining, refers to the practice of training a machine learning model on a broad, general dataset before adapting it to a specific downstream task. The aim is to learn versatile representations that can be refined with relatively small amounts of task-specific data, improving performance and reducing the need for extensive labeled datasets.
In natural language processing, pretraining typically uses self-supervised objectives such as masked language modeling or autoregressive (next-token) language modeling, which derive training signals from raw text rather than from manually labeled examples.
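As a concrete illustration of the autoregressive case, the sketch below trains a toy next-token predictor in PyTorch. The tiny LSTM model, vocabulary size, and random token batch are placeholders for illustration only; real pretraining uses much larger Transformer models and corpora.

```python
import torch
import torch.nn as nn

# Toy autoregressive language model: embed tokens, run an LSTM,
# and predict the next token at every position.
vocab_size, embed_dim, hidden_dim = 1000, 64, 128

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)  # logits over the vocabulary

model = TinyLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Self-supervised objective: inputs are tokens[:-1], targets are tokens[1:],
# so the labels come from the raw text itself (no manual annotation).
tokens = torch.randint(0, vocab_size, (8, 33))  # stand-in for a text batch
inputs, targets = tokens[:, :-1], tokens[:, 1:]

optimizer.zero_grad()
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```

The same shift-by-one trick is what makes the objective self-supervised: every position in the corpus supplies its own target, so no labeling effort is required.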
After pretraining, the model is fine-tuned on a downstream task, such as classification, translation, or question answering, typically using a much smaller labeled dataset.
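A minimal sketch of the fine-tuning step is shown below, assuming the Hugging Face transformers library; the bert-base-uncased checkpoint, the two-label setup, and the tiny hand-written batch are illustrative choices, not requirements of the method.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load a pretrained checkpoint and attach a fresh classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# A tiny labeled batch stands in for the downstream dataset.
texts = ["the movie was great", "the movie was terrible"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

# Fine-tune all weights with a small learning rate, starting from the
# pretrained representation rather than from random initialization.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
optimizer.zero_grad()
outputs = model(**batch, labels=labels)  # returns the loss when labels are given
outputs.loss.backward()
optimizer.step()
```

Because the encoder weights already capture general language structure, even a few thousand labeled examples are often enough to reach strong task performance.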
Pretraining is computationally intensive and typically conducted on large corpora or curated datasets. The resulting pretrained models are often released as checkpoints and reused as starting points across many downstream applications.
Examples include BERT, GPT, and CLIP in their respective domains, as well as many domain-specific language and vision models built on the same pretrain-then-fine-tune approach.