GradScaler

GradScaler is a utility used in automatic mixed precision (AMP) training, which improves performance and reduces memory usage on modern GPUs. It mitigates numeric underflow when using float16 by dynamically scaling the loss, and therefore the gradients, during backpropagation, helping preserve small gradient values that might otherwise vanish.

The core idea is to multiply the loss by a scale factor before backpropagation. After backward, gradients are unscaled before the optimizer update to ensure the update uses the correct magnitude. If any gradient contains Inf or NaN, the step is skipped and the scale factor is reduced; if training proceeds without overflow for a stretch of steps, the scale factor is increased. This dynamic loss scaling is designed to be transparent to the user while maintaining numerical stability.
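
The same mechanics can be written out by hand, which makes the behavior easier to see. The following sketch performs dynamic loss scaling without GradScaler; the function arguments and the growth and backoff constants are illustrative placeholders, not the library's exact defaults.

    # Hand-written dynamic loss scaling, shown without GradScaler so the
    # mechanics are visible. Argument names and constants are placeholders.
    import torch

    def train_with_manual_loss_scaling(model, optimizer, loss_fn, data_loader,
                                       init_scale=2.0 ** 16, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000):
        scale = init_scale
        clean_steps = 0
        for inputs, targets in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)

            # Scale the loss so small float16 gradients do not underflow to zero.
            (loss * scale).backward()

            # Unscale gradients in place so the update uses the true magnitudes.
            found_inf = False
            for p in model.parameters():
                if p.grad is not None:
                    p.grad.div_(scale)
                    if not torch.isfinite(p.grad).all():
                        found_inf = True

            if found_inf:
                # Overflow: skip this step and reduce the scale factor.
                scale *= backoff_factor
                clean_steps = 0
            else:
                optimizer.step()
                clean_steps += 1
                if clean_steps % growth_interval == 0:
                    # A stretch of overflow-free steps: grow the scale factor.
                    scale *= growth_factor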

In PyTorch, GradScaler is available under torch.cuda.amp. A typical workflow involves creating a GradScaler instance, performing the forward pass inside autocast, then calling scaler.scale(loss).backward(), followed by scaler.step(optimizer) and scaler.update(). Users may optionally unscale gradients before applying clipping. The scaler handles the detection of non-finite gradients and adjusts the scale accordingly, enabling more stable training with mixed precision.
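
Concretely, that workflow might look like the sketch below, assuming a CUDA-capable setup and placeholder names for the model, optimizer, loss function and data loader; the gradient-clipping step illustrates the optional unscaling mentioned above.

    # A sketch of the GradScaler workflow; model, optimizer, loss_fn and
    # data_loader are placeholder names supplied by the caller.
    import torch
    from torch.cuda.amp import GradScaler, autocast

    def train_with_gradscaler(model, optimizer, loss_fn, data_loader, clip_norm=1.0):
        scaler = GradScaler()
        for inputs, targets in data_loader:
            optimizer.zero_grad()

            # Forward pass runs in mixed precision inside autocast.
            with autocast():
                loss = loss_fn(model(inputs), targets)

            # Backward pass on the scaled loss produces scaled gradients.
            scaler.scale(loss).backward()

            # Optional: unscale first so clipping sees the true gradient norms.
            scaler.unscale_(optimizer)
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)

            # step() skips the update if non-finite gradients are found;
            # update() then adjusts the scale factor for the next iteration.
            scaler.step(optimizer)
            scaler.update()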

Benefits include improved memory efficiency and potential speedups on compatible GPUs, along with reduced risk of gradient underflow when training large models. Limitations include the need for hardware and operator support for AMP, occasional overhead from scaling logic, and potential complications with certain custom operations. GradScaler is widely used in conjunction with automatic mixed precision to facilitate efficient neural network training.
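
Because AMP needs hardware and operator support, one common pattern is to make both the scaler and autocast conditional, so the same training loop falls back to full precision when AMP is unavailable. The use_amp flag below is an assumed example, not part of GradScaler itself.

    # Falling back to plain float32 when AMP is not available; use_amp is an
    # assumed flag and the other names are placeholders supplied by the caller.
    import torch
    from torch.cuda.amp import GradScaler, autocast

    def train_maybe_amp(model, optimizer, loss_fn, data_loader, use_amp=None):
        if use_amp is None:
            use_amp = torch.cuda.is_available()  # e.g. disable AMP on CPU-only machines

        # A disabled scaler simply passes losses and optimizer steps through unchanged.
        scaler = GradScaler(enabled=use_amp)
        for inputs, targets in data_loader:
            optimizer.zero_grad()
            with autocast(enabled=use_amp):      # plain float32 when disabled
                loss = loss_fn(model(inputs), targets)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()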
