Recomputes
Recomputes, in the context of automatic differentiation and neural network training, refer to the practice of re-evaluating portions of the forward computation during the backward pass instead of storing all intermediate activations. This technique, often called activation checkpointing or rematerialization, reduces peak memory usage at the cost of extra forward computations.
How it works: The forward graph is divided into segments or checkpoints. During backpropagation, only a limited set of checkpointed activations is kept in memory; when gradients for a segment are needed, its forward pass is re-run from the nearest stored checkpoint to regenerate the discarded intermediates, which are freed again once that segment's backward step completes.
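A minimal sketch of this segment-style recomputation using PyTorch's torch.utils.checkpoint.checkpoint_sequential; the toy model, its dimensions, and the choice of two segments are illustrative assumptions, not a canonical configuration:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy stack of 8 blocks (illustrative sizes, not a real architecture).
model = nn.Sequential(
    *[nn.Sequential(nn.Linear(256, 256), nn.ReLU()) for _ in range(8)]
)
x = torch.randn(32, 256, requires_grad=True)

# Split the stack into 2 segments: only the activations at segment
# boundaries are stored; everything inside a segment is recomputed
# from its boundary checkpoint during the backward pass.
y = checkpoint_sequential(model, 2, x, use_reentrant=False)
y.sum().backward()  # triggers per-segment forward recomputation
```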
Benefits and trade-offs: Recomputes enable training with larger models or larger batch sizes by lowering peak memory consumption, typically at the cost of roughly one extra forward pass through the network. A common strategy checkpoints every √n-th layer of an n-layer network, reducing stored-activation memory from O(n) to O(√n) while recomputing each discarded activation at most once.
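To make the trade-off concrete, a back-of-the-envelope sketch; activation_memory is a hypothetical helper, and the layer count and per-activation size are made-up numbers:

```python
import math

def activation_memory(n_layers, bytes_per_act, checkpoint_every=None):
    """Rough peak activation memory, assuming each layer produces
    one activation of bytes_per_act bytes."""
    if checkpoint_every is None:
        # Vanilla training stores every intermediate activation.
        return n_layers * bytes_per_act
    # Checkpointing stores one activation per segment boundary,
    # plus the live segment currently being recomputed.
    n_checkpoints = math.ceil(n_layers / checkpoint_every)
    return (n_checkpoints + checkpoint_every) * bytes_per_act

n, act = 64, 100 * 2**20            # 64 layers, 100 MiB per activation
print(activation_memory(n, act) / 2**30)                      # ~6.25 GiB
print(activation_memory(n, act, checkpoint_every=8) / 2**30)  # ~1.56 GiB
```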
Implementation and usage: Common in large-scale models such as transformers, where attention and MLP activations dominate memory. Frameworks like PyTorch offer activation checkpointing via torch.utils.checkpoint (checkpoint and checkpoint_sequential); JAX provides jax.checkpoint (also known as jax.remat), and TensorFlow offers tf.recompute_grad.
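A sketch of how a single block might wrap its forward pass in PyTorch's torch.utils.checkpoint.checkpoint; CheckpointedBlock and its layer sizes are invented for illustration, and use_reentrant=False assumes a recent PyTorch version:

```python
import torch
from torch.utils.checkpoint import checkpoint

class CheckpointedBlock(torch.nn.Module):
    """Hypothetical transformer-style block whose inner activations
    are recomputed during backward instead of being stored."""
    def __init__(self, dim=256):
        super().__init__()
        self.attn_proxy = torch.nn.Linear(dim, dim)  # stand-in for attention
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )

    def _forward(self, x):
        x = x + self.attn_proxy(x)
        return x + self.mlp(x)

    def forward(self, x):
        # checkpoint() discards the intermediates of _forward and
        # re-runs it on demand during the backward pass.
        return checkpoint(self._forward, x, use_reentrant=False)

block = CheckpointedBlock()
out = block(torch.randn(4, 16, 256, requires_grad=True))
out.mean().backward()
```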
Limitations and considerations: Recomputation must be deterministic to preserve gradient correctness; stochastic layers such as dropout complicate checkpointing because the recomputed forward pass must replay the same random draws as the original pass, otherwise the backward pass differentiates through different activations than those actually produced in the forward pass.
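For stochastic layers, PyTorch's checkpoint snapshots and restores the RNG state by default (preserve_rng_state=True), so the recomputed forward pass samples the same dropout mask as the original; stochastic_segment below is a hypothetical example of such a segment:

```python
import torch
from torch.utils.checkpoint import checkpoint

dropout = torch.nn.Dropout(p=0.5)

def stochastic_segment(x):
    # Dropout makes this segment non-deterministic across runs.
    return dropout(x).sum(dim=-1)

x = torch.randn(8, 32, requires_grad=True)

# preserve_rng_state=True (the default) saves the RNG state at the
# original forward pass and restores it before recomputation, so the
# same dropout mask is drawn both times.
y = checkpoint(stochastic_segment, x, use_reentrant=False,
               preserve_rng_state=True)
y.sum().backward()
```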