Model parallelism
Model parallelism is a distributed computing approach used to train and run machine learning models that exceed the memory capacity of a single device. Rather than duplicating the entire model on each processor, its components (such as layers or subtensors) are placed on different devices, and the forward and backward passes are coordinated through computation on each device and communication between them.
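As a minimal sketch, assuming PyTorch and (optionally) two GPUs, the following places the two halves of a small network on different devices and moves activations between them during the forward pass; the class name TwoStageModel, the device setup, and the layer sizes are illustrative, not a prescribed implementation:

```python
import torch
import torch.nn as nn

# Assumed two-device setup; falls back to a single device if only one exists.
dev0 = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
dev1 = torch.device("cuda:1") if torch.cuda.device_count() > 1 else dev0

class TwoStageModel(nn.Module):
    # The first half of the layers lives on dev0, the second half on dev1.
    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to(dev0)
        self.stage1 = nn.Linear(4096, 1024).to(dev1)

    def forward(self, x):
        h = self.stage0(x.to(dev0))      # compute on device 0
        return self.stage1(h.to(dev1))   # move activations, compute on device 1

model = TwoStageModel()
out = model(torch.randn(8, 1024))        # one forward pass spans both devices
```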
Two common strategies are inter-layer (pipeline) parallelism and intra-layer (tensor) parallelism. Inter-layer splitting assigns successive layers to different devices, which then act as stages of a pipeline, while intra-layer splitting partitions the weights of individual layers, such as the matrices of a linear layer, across devices.
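The intra-layer case can be illustrated, under the same PyTorch assumption, by splitting the output dimension of a single linear layer across two devices; ColumnParallelLinear and the feature sizes are hypothetical names chosen for the sketch, and the final concatenation stands in for the all-gather a real tensor-parallel system would perform:

```python
import torch
import torch.nn as nn

dev0 = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
dev1 = torch.device("cuda:1") if torch.cuda.device_count() > 1 else dev0

class ColumnParallelLinear(nn.Module):
    # Output features are split in half across the two devices; each shard
    # holds half of the weight matrix and produces half of the output.
    def __init__(self, in_features, out_features):
        super().__init__()
        assert out_features % 2 == 0
        self.shard0 = nn.Linear(in_features, out_features // 2).to(dev0)
        self.shard1 = nn.Linear(in_features, out_features // 2).to(dev1)

    def forward(self, x):
        y0 = self.shard0(x.to(dev0))
        y1 = self.shard1(x.to(dev1))
        # Concatenating the partial outputs stands in for an all-gather.
        return torch.cat([y0, y1.to(dev0)], dim=-1)

layer = ColumnParallelLinear(512, 2048)
y = layer(torch.randn(4, 512))           # y has shape (4, 2048)
```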
Implementation relies on scheduling and communication optimization, including micro-batching, activation checkpointing, and careful shard placement. Performance depends on balancing computation across devices and on minimizing communication overhead and pipeline idle time (bubbles).
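A rough sketch of micro-batching under the same assumptions follows: the batch is chunked so that a later stage can still be busy with one micro-batch while an earlier stage starts the next; pipelined_forward and n_micro are illustrative names, and the overlap here relies only on asynchronous GPU kernel launches rather than an explicit pipeline schedule:

```python
import torch
import torch.nn as nn

dev0 = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
dev1 = torch.device("cuda:1") if torch.cuda.device_count() > 1 else dev0

stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to(dev0)
stage1 = nn.Linear(4096, 1024).to(dev1)

def pipelined_forward(batch, n_micro=4):
    # Split the batch into micro-batches and stream them through the stages.
    # Because GPU kernels launch asynchronously, stage1 can still be working
    # on micro-batch i while stage0 begins micro-batch i + 1, shrinking the
    # idle "bubble" of a fully sequential schedule.
    outs = []
    for micro in batch.chunk(n_micro):
        h = stage0(micro.to(dev0))
        outs.append(stage1(h.to(dev1)))
    return torch.cat([o.to(dev0) for o in outs])

out = pipelined_forward(torch.randn(32, 1024))   # four micro-batches of 8
```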
Model parallelism is essential for very large models, such as those with billions of parameters, that do not fit in the memory of a single accelerator.