DistGrv
DistGrv is a class of optimization methods for distributed training that employ gradient variance reduction to accelerate convergence on finite-sum objectives common in machine learning. It extends classic variance-reduced techniques such as SVRG and SAGA to multi-node environments, combining local stochastic gradients with a periodically refreshed reference gradient computed from data distributed across workers.
Conceptually, each worker maintains a local parameter copy and a snapshot of its gradient; at regular intervals the workers aggregate these snapshots into a shared reference gradient (for example, via an all-reduce), which each worker then uses to correct its local stochastic gradients until the next refresh.
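A minimal single-process sketch of this update rule, in the style of distributed SVRG, is given below. It simulates workers by partitioning the data into shards; the function and parameter names (distgrv_sketch, refresh_interval, and so on) are illustrative assumptions rather than part of any published DistGrv interface.

    import numpy as np

    def full_gradient(w, X, y):
        """Average logistic-regression gradient over all rows of X (labels y in {0, 1})."""
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        return X.T @ (preds - y) / len(y)

    def sample_gradient(w, x_i, y_i):
        """Logistic-regression gradient on a single sample."""
        pred = 1.0 / (1.0 + np.exp(-x_i @ w))
        return x_i * (pred - y_i)

    def distgrv_sketch(X, y, n_workers=4, refresh_interval=100,
                       n_rounds=10, lr=0.1, seed=0):
        """Illustrative SVRG-style loop; not an official DistGrv implementation."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        shards = np.array_split(rng.permutation(n), n_workers)  # one index shard per simulated worker
        w = np.zeros(d)
        for _ in range(n_rounds):
            # Refresh phase: each worker evaluates the gradient of its shard at the
            # snapshot; their average serves as the reference gradient mu
            # (assuming roughly equal shard sizes).
            w_snap = w.copy()
            mu = np.mean([full_gradient(w_snap, X[s], y[s]) for s in shards], axis=0)
            # Local phase: variance-reduced steps using g_i(w) - g_i(w_snap) + mu.
            for _ in range(refresh_interval):
                shard = shards[rng.integers(n_workers)]
                i = rng.choice(shard)
                g = sample_gradient(w, X[i], y[i]) - sample_gradient(w_snap, X[i], y[i]) + mu
                w -= lr * g
        return w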
DistGrv is applicable to problems with smooth finite-sum objectives, including convex models such as generalized linear models and, with weaker guarantees, some non-convex models such as neural networks.
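As one concrete instance of the smooth finite-sum form f(w) = (1/n) * sum_i f_i(w) that such methods target, the short sketch below writes out a ridge-regression objective; the function name finite_sum_loss and the regularization value are arbitrary choices for illustration.

    import numpy as np

    def finite_sum_loss(w, X, y, reg=1e-3):
        """f(w) = (1/n) * sum_i [ 0.5 * (x_i . w - y_i)^2 ] + (reg / 2) * ||w||^2,
        i.e. ridge regression as one concrete smooth finite-sum objective."""
        residuals = X @ w - y
        return 0.5 * np.mean(residuals ** 2) + 0.5 * reg * (w @ w)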
Practical considerations include choosing the update cadence for the reference gradient, balancing computation and communication, and handling stragglers or stale snapshots when workers progress at different rates.
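The cadence choice can be reasoned about with a rough cost model, sketched below under the simplifying assumption that each refresh costs one full pass over the data plus one communication round, while each local step costs one sample gradient; the function rounds_and_comm and the example numbers are hypothetical.

    def rounds_and_comm(n_samples, n_steps_total, refresh_interval):
        """Number of reference refreshes (communication rounds) and the fraction of
        total gradient evaluations spent on them, assuming one full pass per refresh."""
        refreshes = n_steps_total // refresh_interval
        full_pass_work = refreshes * n_samples
        local_work = n_steps_total
        return refreshes, full_pass_work / (full_pass_work + local_work)

    # For example, 1_000_000 samples, 100_000 local steps, and refresh_interval=10_000
    # gives 10 communication rounds but roughly 99% of gradient evaluations in the
    # full passes, suggesting the interval should grow with the dataset size.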