Learning rate

The learning rate is a hyperparameter that determines the size of parameter updates during optimization. In gradient-based training, the update rule is theta := theta - eta * g(theta), where eta is the learning rate and g(theta) is the gradient of the loss with respect to theta.
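
As a concrete sketch, this update can be written in a few lines of NumPy; the quadratic objective below is a made-up toy chosen only so the gradient is easy to state:

```python
import numpy as np

def gradient_descent_step(theta, grad_fn, eta):
    """One plain gradient-descent update: theta := theta - eta * g(theta)."""
    return theta - eta * grad_fn(theta)

# Toy example: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
grad_fn = lambda theta: 2.0 * theta

theta = np.array([3.0, -2.0])
eta = 0.1  # the learning rate
for _ in range(100):
    theta = gradient_descent_step(theta, grad_fn, eta)
print(theta)  # close to the minimizer [0, 0]
```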

Choosing eta affects convergence. A rate that is too large can cause divergence or unstable oscillations; a rate that is too small leads to very slow progress and may trap the optimization in shallow minima or plateaus. The optimal setting depends on the model, data, and optimization algorithm and often requires empirical tuning.
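
To see these failure modes concretely, the sketch below (same toy quadratic assumption as above) runs the plain update with three rates; the large one diverges, the tiny one barely moves, and the moderate one converges:

```python
def run(eta, steps=50, theta0=3.0):
    """Gradient descent on f(theta) = theta^2 (gradient 2 * theta) with step size eta."""
    theta = theta0
    for _ in range(steps):
        theta = theta - eta * 2.0 * theta
    return theta

for eta in (1.5, 1e-4, 0.1):  # too large, too small, reasonable
    print(f"eta={eta:g}: theta after 50 steps = {run(eta):.3e}")
# eta=1.5 oscillates and diverges, eta=1e-4 barely moves from 3.0, eta=0.1 converges near 0.
```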

Learning-rate schedules modify eta during training. Fixed learning rates remain constant, while schedules such as step decay, exponential decay, polynomial decay, cosine annealing, or cyclic learning rates progressively reduce or vary the rate. Warmup starts with a small eta and increases it gradually to stabilize early updates.
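
The sketch below illustrates a few of these schedules as plain functions of the training step; the function names and default constants are illustrative choices, not any particular library's API:

```python
import math

def step_decay(eta0, step, drop=0.5, every=30):
    """Multiply the base rate by `drop` every `every` steps (step decay)."""
    return eta0 * (drop ** (step // every))

def exponential_decay(eta0, step, k=0.01):
    """Smooth exponential decay: eta0 * exp(-k * step)."""
    return eta0 * math.exp(-k * step)

def cosine_annealing(eta0, step, total_steps, eta_min=0.0):
    """Cosine annealing from eta0 down to eta_min over total_steps."""
    return eta_min + 0.5 * (eta0 - eta_min) * (1 + math.cos(math.pi * step / total_steps))

def linear_warmup(eta0, step, warmup_steps=100):
    """Ramp linearly from near 0 up to eta0 during the first warmup_steps."""
    return eta0 * min(1.0, (step + 1) / warmup_steps)

eta0 = 1e-3
for step in (0, 50, 100, 500):
    print(step, step_decay(eta0, step), cosine_annealing(eta0, step, total_steps=500))
```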

Some optimizers adapt learning rates automatically for each parameter. Algorithms like AdaGrad, RMSProp, and Adam adjust step sizes during training, effectively using per-parameter learning rates. These methods can reduce the need for global tuning but introduce additional hyperparameters to manage.
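
As an illustration of per-parameter adaptation, here is a minimal NumPy sketch of the standard Adam update; the variable names and toy gradient are illustrative, and production implementations handle many more details:

```python
import numpy as np

def adam_step(theta, g, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are running moment estimates, t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * g        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * g * g    # second moment (mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter effective step size
    return theta, m, v

theta = np.array([3.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 201):
    g = 2.0 * theta                        # gradient of the toy loss ||theta||^2
    theta, m, v = adam_step(theta, g, m, v, t, eta=0.1)
print(theta)  # moves close to [0, 0], with step sizes adapted per coordinate
```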

Practical guidance includes starting with a reasonable base value (commonly around 1e-3 for deep nets), using an LR finder or a scheduled decay, and monitoring validation performance to adjust. In deep learning, warmup periods and scaling the learning rate with the batch size (larger batches typically take proportionally larger rates) are common adjustments.
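
A simple LR finder can be sketched as an exponential sweep of eta while recording the loss; the toy objective and sweep bounds below are assumptions for illustration only:

```python
import numpy as np

def lr_range_test(grad_fn, loss_fn, theta0, lr_min=1e-6, lr_max=10.0, steps=100):
    """Increase eta exponentially from lr_min to lr_max, recording the loss after each step."""
    theta = np.array(theta0, dtype=float)
    history = []
    for i in range(steps):
        eta = lr_min * (lr_max / lr_min) ** (i / (steps - 1))  # exponential sweep
        theta = theta - eta * grad_fn(theta)
        history.append((eta, loss_fn(theta)))
    return history  # pick an eta somewhat below the point where the loss blows up

loss_fn = lambda th: float(np.sum(th ** 2))  # toy quadratic loss
grad_fn = lambda th: 2.0 * th
for eta, loss in lr_range_test(grad_fn, loss_fn, theta0=[3.0, -2.0])[::20]:
    print(f"eta={eta:.2e}  loss={loss:.3e}")
```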

In reinforcement learning, the learning rate controls the magnitude of updates to value or policy estimates, while in supervised learning it governs the size of the gradient steps taken during training.
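
As an example of the reinforcement-learning case, a tabular TD(0) value update scales each correction by a learning rate (often written alpha); the two-state loop below is a made-up illustration:

```python
# Tabular TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
# The learning rate alpha plays the same role eta plays in gradient training.
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * V[s_next]
    V[s] = V[s] + alpha * (target - V[s])
    return V

V = {"A": 0.0, "B": 0.0}                     # hypothetical two-state value table
transitions = [("A", 1.0, "B"), ("B", 0.0, "A")] * 50
for s, r, s_next in transitions:
    V = td0_update(V, s, r, s_next)
print(V)  # V["A"] drifts toward the discounted return of this loop
```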