Learning rate
Learning rate is a hyperparameter that determines the size of the parameter updates during optimization. In gradient-based training, the update rule is theta := theta - eta * g(theta), where eta is the learning rate and g(theta) is the gradient of the loss with respect to theta.
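A minimal sketch of this update on a toy quadratic loss; the loss function, starting point, and value of eta below are illustrative choices, not taken from the text:

    import numpy as np

    # Gradient descent with the update theta := theta - eta * g(theta).
    # The quadratic loss f(theta) = 0.5 * ||theta||^2 is a stand-in; its gradient is theta.
    def grad(theta):
        return theta

    eta = 0.1                               # learning rate
    theta = np.array([4.0, -2.0])           # illustrative starting point
    for step in range(100):
        theta = theta - eta * grad(theta)   # the update rule from the text
    print(theta)                            # approaches the minimizer at the origin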
The choice of eta strongly affects convergence. A rate that is too large can cause divergence or unstable oscillations; a rate that is too small makes convergence slow and can leave training stuck on plateaus or in poor regions of the loss surface.
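This trade-off can be worked out exactly on a one-dimensional quadratic: for f(theta) = 0.5 * a * theta^2 the update multiplies theta by (1 - eta * a), so the iterates shrink only when eta < 2 / a. A small sketch with an illustrative value of a:

    # For f(theta) = 0.5 * a * theta^2 the update is theta <- (1 - eta * a) * theta,
    # which converges only when |1 - eta * a| < 1, i.e. 0 < eta < 2 / a.
    a = 10.0                                # curvature; stability limit is 2 / a = 0.2
    for eta in (0.01, 0.19, 0.25):          # small, near the limit, too large
        theta = 1.0
        for _ in range(50):
            theta = theta - eta * a * theta
        print(eta, theta)                   # the last case (eta > 0.2) blows up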
Learning-rate schedules modify eta during training. Fixed learning rates remain constant, while schedules such as step decay, exponential decay, and cosine annealing reduce eta as training progresses.
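A sketch of two common decay schedules; the base rate, decay factor, and horizons below are placeholder values chosen for illustration:

    import math

    def step_decay(base_lr, epoch, drop=0.1, every=30):
        # Multiply the rate by `drop` every `every` epochs.
        return base_lr * (drop ** (epoch // every))

    def cosine_anneal(base_lr, epoch, total_epochs=100):
        # Smoothly decay from base_lr to 0 over total_epochs.
        return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

    for epoch in (0, 30, 60, 90):
        print(epoch, step_decay(1e-3, epoch), cosine_anneal(1e-3, epoch))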
Some optimizers adapt learning rates automatically for each parameter. Algorithms like AdaGrad, RMSProp, and Adam adjust the effective step size of every parameter using running statistics of past gradients, which reduces the need to hand-tune a single global rate.
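A simplified Adam-style update, written out to show where the per-parameter scaling comes from; it omits refinements such as weight-decay variants, and the toy gradient reuses the illustrative quadratic from above:

    import numpy as np

    def adam_step(theta, g, m, v, t, eta=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * g           # running mean of gradients (first moment)
        v = b2 * v + (1 - b2) * g * g       # running mean of squared gradients (second moment)
        m_hat = m / (1 - b1 ** t)           # bias correction for the zero initialization
        v_hat = v / (1 - b2 ** t)
        theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)   # per-parameter step size
        return theta, m, v

    theta = np.array([4.0, -2.0])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, 201):
        g = theta                           # gradient of 0.5 * ||theta||^2
        theta, m, v = adam_step(theta, g, m, v, t)
    print(theta)                            # drifts toward the minimizer at the origin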
Practical guidance includes starting with a reasonable base value (commonly around 1e-3 for deep nets), using a coarse sweep over several orders of magnitude to narrow the range, and reducing the rate when training stalls or the loss stops improving.
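One way such a sweep might look, again on the toy quadratic used above; in practice each candidate rate would train the real model for a few epochs before the losses are compared:

    import numpy as np

    def toy_loss(theta):
        return 0.5 * float(theta @ theta)

    results = {}
    for eta in np.logspace(-4, -1, num=4):      # candidate rates 1e-4, 1e-3, 1e-2, 1e-1
        theta = np.array([4.0, -2.0])
        for _ in range(100):
            theta = theta - eta * theta          # gradient step on the toy loss
        results[float(eta)] = toy_loss(theta)
    best = min(results, key=results.get)         # keep the rate with the lowest final loss
    print(results, "best:", best)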