Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm used to minimize differentiable objective functions that can be expressed as a sum over data points. In machine learning, it is commonly applied to train models by minimizing a loss function defined on training examples.
At each iteration, SGD updates the model parameters θ by moving in the direction opposite to the gradient of the loss evaluated on a single randomly sampled training example (or a small mini-batch), scaled by a learning rate η: θ ← θ − η ∇θ ℓ(θ; xᵢ, yᵢ).
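A minimal sketch of this update loop in Python with NumPy follows; the function and data names (grad_fn, X, y) are illustrative assumptions, not taken from the text, and grad_fn is assumed to return the gradient of the per-example loss.

```python
import numpy as np

def sgd(theta, X, y, grad_fn, lr=0.01, epochs=10, seed=0):
    """Plain SGD: one randomly sampled example per parameter update."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        # Visit the training examples in a fresh random order each epoch.
        for i in rng.permutation(n):
            # Step against the gradient of the loss on example i.
            theta = theta - lr * grad_fn(theta, X[i], y[i])
    return theta
```

Replacing the single index i with a small batch of indices (and averaging the per-example gradients) gives the mini-batch variant commonly used in practice.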
Convergence properties depend on the objective: for convex objectives, SGD can converge to a global minimum in expectation, provided the learning rate decreases over time at an appropriate rate (for example, satisfying Σₜ ηₜ = ∞ and Σₜ ηₜ² < ∞); for non-convex objectives such as neural network losses, convergence is typically only guaranteed to a stationary point.
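One common way to meet such step-size conditions is a schedule of the form ηₜ = η₀ / (1 + k·t), whose terms sum to infinity while their squares do not. A small sketch, with hypothetical parameter names:

```python
def decaying_lr(eta0=0.1, decay=0.01):
    """Yield step sizes eta_t = eta0 / (1 + decay * t):
    they shrink slowly enough to keep making progress,
    but fast enough for the iterates to settle."""
    t = 0
    while True:
        yield eta0 / (1.0 + decay * t)
        t += 1
```

Such a generator could replace the fixed lr in the loop above, drawing a new step size with next() before each update.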
Variants and extensions include momentum, Nesterov accelerated gradient, AdaGrad, RMSProp, and Adam. These variants modify how the update direction and step size are computed, for example by accumulating an exponentially weighted average of past gradients (momentum) or by adapting a per-parameter learning rate based on the history of squared gradients (AdaGrad, RMSProp, Adam).
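As one illustration, a sketch of SGD with classical momentum, which accumulates past gradients into a velocity term; as before, grad_fn, X, and y are assumed names for the per-example gradient and the data.

```python
import numpy as np

def sgd_momentum(theta, X, y, grad_fn, lr=0.01, beta=0.9, epochs=10, seed=0):
    """SGD with classical momentum: the velocity v is an exponentially
    weighted accumulation of past gradients, and the step follows v."""
    rng = np.random.default_rng(seed)
    v = np.zeros_like(theta)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            g = grad_fn(theta, X[i], y[i])
            v = beta * v + g          # accumulate gradient history
            theta = theta - lr * v    # step along the smoothed direction
    return theta
```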
Applications are widespread in training large-scale models, especially neural networks, as well as linear models such as logistic regression and linear support vector machines, where datasets are too large for full-batch gradient methods to be practical.