TD(λ)
TD-lambda, commonly written as TD(λ), is a family of temporal-difference learning algorithms used in reinforcement learning to estimate value functions. It blends the bootstrapping of TD methods with information gathered from longer sequences through eligibility traces, governed by the trace-decay parameter λ in the interval [0, 1]. The approach updates estimates online as an agent interacts with an environment and spans a spectrum of methods: it reduces to TD(0) when λ = 0 and approaches Monte Carlo methods as λ approaches 1.
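As a concrete illustration, the following is a minimal sketch of tabular TD(λ) state-value prediction with accumulating eligibility traces. It assumes a hypothetical environment interface env.reset() -> state and env.step(action) -> (next_state, reward, done) and a given policy(state) function; these names and parameters are illustrative, not a specific library's API.

import numpy as np

def td_lambda_prediction(env, policy, n_states, alpha=0.1, gamma=0.99,
                         lam=0.9, n_episodes=500):
    """Tabular TD(lambda) state-value prediction with accumulating traces."""
    V = np.zeros(n_states)
    for _ in range(n_episodes):
        e = np.zeros(n_states)          # eligibility traces, reset each episode
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            # TD error: bootstrapped target minus current estimate
            target = reward + (0.0 if done else gamma * V[next_state])
            delta = target - V[state]
            e[state] += 1.0             # accumulating trace for the visited state
            # Credit all recently visited states in proportion to their trace
            V += alpha * delta * e
            e *= gamma * lam            # decay traces by gamma * lambda
            state = next_state
    return V

Setting lam=0.0 here recovers the one-step TD(0) update, while lam=1.0 with accumulating traces approximates an every-visit Monte Carlo method.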
The algorithm can be described from a forward view and a backward view. The forward view uses the λ-return, a weighted sum of n-step returns in which the n-step return receives weight (1 − λ)λ^(n−1), so longer lookaheads contribute geometrically less as n grows. The backward view achieves an equivalent effect incrementally: each state carries an eligibility trace that decays by γλ per step, and the current TD error updates every state in proportion to its trace (see the sketch below).
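The forward-view weighting can be made explicit with a short sketch that computes λ-returns offline for one finished episode. The representation is an assumption for illustration: rewards[t] is the reward received after leaving state s_t, values[t] is the current estimate V(s_t), and the terminal value is treated as zero.

import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.9):
    """Offline (forward-view) lambda-returns for a single finite episode."""
    T = len(rewards)
    G_lam = np.zeros(T)
    for t in range(T):
        # n-step returns G_t^(n) for n = 1 .. T - t
        n_step = []
        for n in range(1, T - t + 1):
            g = sum(gamma**k * rewards[t + k] for k in range(n))
            if t + n < T:
                g += gamma**n * values[t + n]   # bootstrap from current estimate
            n_step.append(g)
        # Weight the n-step return by (1 - lam) * lam**(n - 1); the final
        # (Monte Carlo) return absorbs the remaining weight lam**(T - t - 1)
        total = sum((1 - lam) * lam**(n - 1) * n_step[n - 1]
                    for n in range(1, T - t))
        total += lam**(T - t - 1) * n_step[-1]
        G_lam[t] = total
    return G_lam

The weights sum to one for every t, which is why the λ-return interpolates smoothly between the one-step TD target (λ = 0) and the full Monte Carlo return (λ = 1).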
TD(λ) includes on-policy and off-policy variants. On-policy examples include Sarsa(λ) for action-value estimation, while off-policy variants include Watkins's Q(λ), which cuts the eligibility trace whenever a non-greedy (exploratory) action is taken. A sketch of Sarsa(λ) follows this paragraph.
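The following is a minimal sketch of tabular Sarsa(λ) under the same assumed environment interface as above, using an ε-greedy behaviour policy; the interface and parameter names are illustrative assumptions.

import numpy as np

def sarsa_lambda(env, n_states, n_actions, alpha=0.1, gamma=0.99,
                 lam=0.9, epsilon=0.1, n_episodes=500):
    """Tabular Sarsa(lambda): on-policy action-value estimation with traces."""
    Q = np.zeros((n_states, n_actions))

    def eps_greedy(s):
        # Explore with probability epsilon, otherwise act greedily w.r.t. Q
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(n_episodes):
        e = np.zeros_like(Q)            # eligibility traces over (state, action) pairs
        s = env.reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = eps_greedy(s2)
            # Sarsa target bootstraps from the action actually taken next
            target = r + (0.0 if done else gamma * Q[s2, a2])
            delta = target - Q[s, a]
            e[s, a] += 1.0
            Q += alpha * delta * e
            e *= gamma * lam
            s, a = s2, a2
    return Q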
Limitations include sensitivity to the choice of λ and potential instability with certain function-approximation settings, particularly in off-policy learning combined with bootstrapping and function approximation, where value estimates can diverge.