TD(λ)
TD-lambda, commonly written as TD(λ), is a family of temporal-difference learning algorithms used in reinforcement learning to estimate value functions. It blends the bootstrapping of TD methods with information gathered from longer sequences through eligibility traces, governed by the trace-decay parameter λ in the interval [0, 1]. The approach updates estimates online as an agent interacts with an environment and spans a spectrum of methods: it reduces to TD(0) when λ = 0 and approaches Monte Carlo methods as λ approaches 1.
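As a concrete illustration, the following is a minimal sketch of tabular TD(λ) state-value prediction with accumulating eligibility traces. It assumes a hypothetical environment interface env.reset() -> state and env.step(action) -> (next_state, reward, done) and a given policy(state) function; these names and parameters are illustrative, not a specific library's API.

import numpy as np

def td_lambda_prediction(env, policy, n_states, alpha=0.1, gamma=0.99,
                         lam=0.9, n_episodes=500):
    """Tabular TD(lambda) state-value prediction with accumulating traces."""
    V = np.zeros(n_states)
    for _ in range(n_episodes):
        e = np.zeros(n_states)          # eligibility traces, reset each episode
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            # TD error: bootstrapped target minus current estimate
            target = reward + (0.0 if done else gamma * V[next_state])
            delta = target - V[state]
            e[state] += 1.0             # accumulating trace for the visited state
            # Credit all recently visited states in proportion to their trace
            V += alpha * delta * e
            e *= gamma * lam            # decay traces by gamma * lambda
            state = next_state
    return V

Setting lam=0.0 here recovers the one-step TD(0) update, while lam=1.0 with accumulating traces approximates an every-visit Monte Carlo method.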
The algorithm can be described from a forward view and a backward view. The forward view uses the λ-return, a weighted sum of n-step returns in which the n-step return receives weight (1 − λ)λ^(n−1), so longer lookaheads contribute geometrically less as n grows. The backward view achieves an equivalent effect incrementally: each state carries an eligibility trace that decays by γλ per step, and the current TD error updates every state in proportion to its trace (see the sketch below).
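The forward-view weighting can be made explicit with a short sketch that computes λ-returns offline for one finished episode. The representation is an assumption for illustration: rewards[t] is the reward received after leaving state s_t, values[t] is the current estimate V(s_t), and the terminal value is treated as zero.

import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.9):
    """Offline (forward-view) lambda-returns for a single finite episode."""
    T = len(rewards)
    G_lam = np.zeros(T)
    for t in range(T):
        # n-step returns G_t^(n) for n = 1 .. T - t
        n_step = []
        for n in range(1, T - t + 1):
            g = sum(gamma**k * rewards[t + k] for k in range(n))
            if t + n < T:
                g += gamma**n * values[t + n]   # bootstrap from current estimate
            n_step.append(g)
        # Weight the n-step return by (1 - lam) * lam**(n - 1); the final
        # (Monte Carlo) return absorbs the remaining weight lam**(T - t - 1)
        total = sum((1 - lam) * lam**(n - 1) * n_step[n - 1]
                    for n in range(1, T - t))
        total += lam**(T - t - 1) * n_step[-1]
        G_lam[t] = total
    return G_lam

The weights sum to one for every t, which is why the λ-return interpolates smoothly between the one-step TD target (λ = 0) and the full Monte Carlo return (λ = 1).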
TD(λ) includes on-policy and off-policy variants. On-policy examples include Sarsa(λ) for action-value estimation, while off-policy variants include Watkins's Q(λ), which cuts the eligibility trace whenever a non-greedy (exploratory) action is taken. A sketch of Sarsa(λ) follows this paragraph.
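The following is a minimal sketch of tabular Sarsa(λ) under the same assumed environment interface as above, using an ε-greedy behaviour policy; the interface and parameter names are illustrative assumptions.

import numpy as np

def sarsa_lambda(env, n_states, n_actions, alpha=0.1, gamma=0.99,
                 lam=0.9, epsilon=0.1, n_episodes=500):
    """Tabular Sarsa(lambda): on-policy action-value estimation with traces."""
    Q = np.zeros((n_states, n_actions))

    def eps_greedy(s):
        # Explore with probability epsilon, otherwise act greedily w.r.t. Q
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(n_episodes):
        e = np.zeros_like(Q)            # eligibility traces over (state, action) pairs
        s = env.reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = eps_greedy(s2)
            # Sarsa target bootstraps from the action actually taken next
            target = r + (0.0 if done else gamma * Q[s2, a2])
            delta = target - Q[s, a]
            e[s, a] += 1.0
            Q += alpha * delta * e
            e *= gamma * lam
            s, a = s2, a2
    return Q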
Limitations include sensitivity to the choice of λ and potential instability with certain function-approximation settings, particularly in off-policy learning combined with bootstrapping and function approximation, where value estimates can diverge.