Momentumoptimointi
Momentumoptimointi, often translated as momentum optimization, is a technique used in machine learning and deep learning to speed up the convergence of training algorithms. It builds upon standard stochastic gradient descent (SGD) by incorporating information from past gradients to influence the direction of the current update. Instead of solely relying on the immediate gradient of the loss function, momentum methods accumulate a "velocity" term that represents a moving average of past gradients.
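In one common formulation (conventions differ, for example in whether the learning rate multiplies the gradient or the accumulated velocity), the velocity v and the parameters θ are updated at each step as v ← β·v + ∇L(θ) followed by θ ← θ − η·v, where η is the learning rate and β is the decay factor discussed below.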
The core idea behind momentum is to dampen oscillations that can occur in SGD, especially in directions of high curvature where the gradient repeatedly changes sign from one step to the next, while accelerating progress along directions in which successive gradients consistently agree.
A common implementation of momentum involves a decay factor, often denoted by beta (β), which controls how much of the accumulated velocity is carried over from one step to the next. Values close to 1 (0.9 is a typical choice) give past gradients a long-lasting influence, while smaller values make the update behave more like plain SGD.
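A minimal sketch of this update in Python is shown below, assuming a user-supplied gradient function grad_fn(theta); the hyperparameter values and the quadratic example loss are illustrative choices, and the loop follows the formulation given above rather than the only possible one.

```python
import numpy as np

def sgd_momentum(grad_fn, theta, lr=0.01, beta=0.9, num_steps=100):
    """Minimal SGD-with-momentum loop.

    grad_fn(theta) is assumed to return the gradient of the loss at theta.
    lr (learning rate) and beta (momentum decay factor) are illustrative
    defaults, not prescribed values.
    """
    velocity = np.zeros_like(theta)
    for _ in range(num_steps):
        grad = grad_fn(theta)
        # Accumulate a decaying sum of past gradients (the "velocity").
        velocity = beta * velocity + grad
        # Step in the direction of the accumulated velocity.
        theta = theta - lr * velocity
    return theta

# Example: a quadratic loss with an elongated valley, where plain SGD
# tends to oscillate along the steep axis and momentum damps that
# oscillation while making steady progress along the shallow axis.
A = np.diag([10.0, 1.0])
grad_fn = lambda x: 2 * A @ x
theta0 = np.array([1.0, 1.0])
print(sgd_momentum(grad_fn, theta0, lr=0.02, beta=0.9, num_steps=200))
```

Some formulations instead scale the incoming gradient by (1 − β) or fold the learning rate into the velocity update; deep learning frameworks expose comparable behavior as a built-in option, for example the momentum argument of torch.optim.SGD in PyTorch.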