Recomputes

Recomputes, in the context of automatic differentiation and neural network training, refer to the practice of re-evaluating portions of the forward computation during the backward pass instead of storing all intermediate activations. This technique, often called activation checkpointing or rematerialization, reduces peak memory usage at the cost of extra forward computations.

How it works: The forward graph is divided into segments, or checkpoints. During the forward pass, only a limited set of activations (typically the values at segment boundaries) is kept; to compute gradients within a segment during backpropagation, the relevant forward computations are re-executed to recreate the needed activations. This trades compute for memory and is supported by many automatic differentiation frameworks through built-in checkpoint controls.
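
A minimal sketch of this segmenting in PyTorch, using torch.utils.checkpoint; the two-segment model, its sizes, and the name TwoSegmentNet are illustrative rather than taken from any particular codebase:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TwoSegmentNet(nn.Module):
    """Illustrative model split into two checkpointed segments."""
    def __init__(self, dim=512):
        super().__init__()
        self.segment1 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                      nn.Linear(dim, dim), nn.ReLU())
        self.segment2 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                      nn.Linear(dim, 10))

    def forward(self, x):
        # Inside checkpoint(), intermediate activations are discarded after the
        # forward pass; only the segment inputs are retained for backward.
        h = checkpoint(self.segment1, x, use_reentrant=False)
        return checkpoint(self.segment2, h, use_reentrant=False)

model = TwoSegmentNet()
x = torch.randn(32, 512)
loss = model(x).sum()
loss.backward()  # each segment's forward runs again here to rebuild activations
```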

Benefits and trade-offs: Recomputes enable training with larger models or larger batch sizes by lowering memory footprint. The downside is increased runtime due to additional forward passes and potential cache inefficiencies. Effectiveness depends on model structure and how aggressively recomputation is applied.
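
One way to make the trade-off concrete is to compare peak memory and wall-clock time with and without recomputation. The rough sketch below assumes a CUDA device and PyTorch's checkpoint_sequential utility; the layer stack, batch size, and segment count are arbitrary choices, and actual numbers will vary by hardware and model.

```python
import time
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

def run(model, x, segments=None):
    """One forward/backward pass; returns (seconds, peak memory in MiB)."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.time()
    if segments:
        out = checkpoint_sequential(model, segments, x, use_reentrant=False)
    else:
        out = model(x)
    out.sum().backward()
    torch.cuda.synchronize()
    return time.time() - start, torch.cuda.max_memory_allocated() / 2**20

# An arbitrary deep stack; deeper or wider models show a larger memory gap.
model = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                        for _ in range(24)]).cuda()
x = torch.randn(256, 1024, device="cuda")

print("no recompute (s, MiB):", run(model, x))
model.zero_grad(set_to_none=True)
print("4 segments   (s, MiB):", run(model, x, segments=4))
```

Typically the checkpointed run reports a noticeably lower peak while taking somewhat longer, reflecting the extra forward work.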

Implementation and usage: Recomputation is common in large-scale models such as transformers. Frameworks like PyTorch offer activation checkpointing utilities to specify which parts to recompute. Strategies vary from simple segmenting to sophisticated schedules that balance memory and compute. In some systems, recomputation can be combined with offloading to slower memory to further reduce peak usage.
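
As one concrete example, PyTorch's torch.utils.checkpoint module exposes checkpoint_sequential for simple segmenting and checkpoint for per-module control; the 12-layer encoder stack and the every-other-layer policy below are made-up choices meant only to show the two granularities (a recent PyTorch is assumed for the use_reentrant flag).

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint, checkpoint_sequential

# An illustrative 12-layer transformer-style encoder stack.
encoder = nn.Sequential(*[
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
    for _ in range(12)
])
x = torch.randn(8, 128, 256)  # (batch, sequence, features)

# Simple segmenting: split the stack into 3 segments; only activations at
# segment boundaries are kept during the forward pass.
out = checkpoint_sequential(encoder, 3, x, use_reentrant=False)
out.sum().backward()

# Finer-grained control: recompute only selected layers (here, every other
# one), letting the remaining layers store their activations as usual.
h = x
for i, layer in enumerate(encoder):
    h = checkpoint(layer, h, use_reentrant=False) if i % 2 == 0 else layer(h)
```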

Limitations and considerations: Recomputing must be deterministic to preserve gradient correctness; stochastic layers such as dropout can complicate checkpointing. There is overhead from recomputation, and improper schedules can negate its benefits. Unlike techniques such as model pruning, it does not reduce total FLOPs; it lowers memory while increasing computation.
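
To illustrate the determinism point, PyTorch's checkpoint stashes and restores RNG state by default (preserve_rng_state=True) so that a stochastic layer such as dropout draws the same mask when the segment is recomputed; the small block below is a made-up example of that usage.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A block containing a stochastic layer. If the recomputed forward drew a
# different dropout mask than the original forward, gradients would be
# inconsistent; preserving RNG state keeps the recomputation deterministic.
block = nn.Sequential(nn.Linear(64, 64), nn.Dropout(p=0.5), nn.ReLU())
block.train()  # dropout is only active in training mode
x = torch.randn(16, 64, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False, preserve_rng_state=True)
y.sum().backward()
```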
