Temporal difference learning
Temporal difference (TD) learning is a core concept in reinforcement learning. It refers to a class of algorithms that learn from the difference between successive predictions of a value function. Instead of waiting for a final outcome, temporal difference methods update their value estimates as soon as a new piece of information becomes available. Because they learn directly from sampled experience, TD methods are "model-free": they do not need to know the underlying transition dynamics of the environment in order to learn.
The fundamental idea is to improve an estimate of a value function based on a sampled transition. In the simplest case, TD(0), after observing a transition from state s to state s′ with reward r, the current estimate V(s) is nudged toward the bootstrapped target r + γV(s′):

V(s) ← V(s) + α [r + γV(s′) − V(s)]

Here α is the step size, γ is the discount factor, and the bracketed quantity is the temporal difference error.
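As a concrete sketch of this update, the following Python snippet runs TD(0) prediction on a small random walk. The environment, the constants N_STATES, ALPHA, and GAMMA, and the function td0_episode are illustrative assumptions chosen for the example, not part of the text above:

```python
import random

# Toy 5-state random walk (an assumed example environment):
# states 0..4, episodes start in state 2 and move left or right at
# random; reaching state 4 yields reward 1, reaching state 0 yields 0.

N_STATES = 5
ALPHA = 0.1   # step size
GAMMA = 1.0   # discount factor (undiscounted episodic task)

def td0_episode(V):
    """Run one episode, updating V in place after every transition."""
    s = 2  # start in the middle state
    while s not in (0, N_STATES - 1):
        s_next = s + random.choice((-1, 1))
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Terminal states have value 0 by convention.
        v_next = 0.0 if s_next in (0, N_STATES - 1) else V[s_next]
        # TD(0) update: move V[s] toward the bootstrapped target.
        V[s] += ALPHA * (r + GAMMA * v_next - V[s])
        s = s_next

V = [0.0] * N_STATES
for _ in range(5000):
    td0_episode(V)
print([round(v, 2) for v in V])  # interior values approach 0.25, 0.5, 0.75
```

Note that each update happens immediately after a single transition is observed, which is exactly the "learn before the final outcome" property described above.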
Common algorithms that utilize temporal difference learning include Q-learning and SARSA. These algorithms differ in how they form the update target: Q-learning is off-policy and bootstraps from the greedy (maximum) action value in the next state, whereas SARSA is on-policy and bootstraps from the value of the action actually taken by the current policy.
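To make the contrast concrete, here is a minimal sketch of the two update rules. Q is assumed to be a table indexed as Q[state][action], and the helper epsilon_greedy is a hypothetical stand-in for the behavior policy; none of these names come from the text above:

```python
import random

def epsilon_greedy(Q, s, n_actions, eps=0.1):
    """Pick a random action with probability eps, else the greedy one."""
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[s][a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the best action in the next state,
    # regardless of which action the behavior policy will actually take.
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action actually selected in s_next
    # by the current (e.g. epsilon-greedy) policy.
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
```

The only difference between the two updates is the bootstrap term: max(Q[s_next]) for Q-learning versus Q[s_next][a_next] for SARSA.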