Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm used to minimize differentiable objective functions that can be expressed as a sum over data points. In machine learning, it is commonly applied to train models by minimizing a loss function defined on training examples.
At each iteration, SGD updates the model parameters θ by moving in the direction opposite to the gradient of the loss evaluated on a single randomly sampled training example (or a small mini-batch), scaled by a learning rate η: θ ← θ − η ∇θ ℓ(θ; xᵢ, yᵢ).
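A minimal sketch of this update loop in Python with NumPy follows; the function and data names (grad_fn, X, y) are illustrative assumptions, not taken from the text, and grad_fn is assumed to return the gradient of the per-example loss.

```python
import numpy as np

def sgd(theta, X, y, grad_fn, lr=0.01, epochs=10, seed=0):
    """Plain SGD: one randomly sampled example per parameter update."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        # Visit the training examples in a fresh random order each epoch.
        for i in rng.permutation(n):
            # Step against the gradient of the loss on example i.
            theta = theta - lr * grad_fn(theta, X[i], y[i])
    return theta
```

Replacing the single index i with a small batch of indices (and averaging the per-example gradients) gives the mini-batch variant commonly used in practice.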
Convergence properties depend on the objective: for convex objectives, SGD can converge to a global minimum in expectation, provided the learning rate decreases over time at an appropriate rate (for example, satisfying Σₜ ηₜ = ∞ and Σₜ ηₜ² < ∞); for non-convex objectives such as neural network losses, convergence is typically only guaranteed to a stationary point.
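One common way to meet such step-size conditions is a schedule of the form ηₜ = η₀ / (1 + k·t), whose terms sum to infinity while their squares do not. A small sketch, with hypothetical parameter names:

```python
def decaying_lr(eta0=0.1, decay=0.01):
    """Yield step sizes eta_t = eta0 / (1 + decay * t):
    they shrink slowly enough to keep making progress,
    but fast enough for the iterates to settle."""
    t = 0
    while True:
        yield eta0 / (1.0 + decay * t)
        t += 1
```

Such a generator could replace the fixed lr in the loop above, drawing a new step size with next() before each update.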
Variants and extensions include momentum, Nesterov accelerated gradient, AdaGrad, RMSProp, and Adam. These variants modify how the update direction and step size are computed, for example by accumulating an exponentially weighted average of past gradients (momentum) or by adapting a per-parameter learning rate based on the history of squared gradients (AdaGrad, RMSProp, Adam).
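As one illustration, a sketch of SGD with classical momentum, which accumulates past gradients into a velocity term; as before, grad_fn, X, and y are assumed names for the per-example gradient and the data.

```python
import numpy as np

def sgd_momentum(theta, X, y, grad_fn, lr=0.01, beta=0.9, epochs=10, seed=0):
    """SGD with classical momentum: the velocity v is an exponentially
    weighted accumulation of past gradients, and the step follows v."""
    rng = np.random.default_rng(seed)
    v = np.zeros_like(theta)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            g = grad_fn(theta, X[i], y[i])
            v = beta * v + g          # accumulate gradient history
            theta = theta - lr * v    # step along the smoothed direction
    return theta
```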
Applications are widespread in training large-scale models, especially neural networks, as well as linear models such as logistic regression and linear support vector machines, where datasets are too large for full-batch gradient methods to be practical.