tAUCB

tAUCB, short for temporal Adaptive Upper Confidence Bound, is a class of sequential decision-making algorithms designed for multi-armed bandit problems in non-stationary environments. It generalizes the classic UCB approach by incorporating time-awareness to handle reward distributions that change over time.

In tAUCB, each arm maintains a time-weighted estimate of its expected reward. This is achieved using either a sliding window of recent observations or an exponential forgetting mechanism to downweight older data. The exploration bonus, or confidence term, reflects the effective sample size under the chosen weighting scheme and may also incorporate change-detection components. At each step, the algorithm selects the arm with the highest upper confidence bound, balancing currently observed rewards against the uncertainty caused by potential changes in the environment.

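How these pieces fit together is easiest to see in code. The sketch below is a minimal, hypothetical implementation of the exponential-forgetting variant, in the style of discounted UCB; the names DiscountedArm and select_arm, the discount factor gamma, and the exploration constant c are illustrative choices for this example, not part of any canonical tAUCB specification.

```python
import math
import random

class DiscountedArm:
    """Discounted (exponentially forgotten) statistics for one arm."""

    def __init__(self):
        self.weighted_reward = 0.0  # reward sum, decayed by gamma per step of age
        self.weighted_count = 0.0   # effective sample size under the same decay

    def decay(self, gamma):
        """Apply one step of forgetting to all past observations."""
        self.weighted_reward *= gamma
        self.weighted_count *= gamma

    def update(self, reward):
        """Record a fresh observation with full weight."""
        self.weighted_reward += reward
        self.weighted_count += 1.0

def select_arm(arms, c=2.0):
    """Return the index of the arm with the highest discounted UCB index."""
    effective_total = sum(a.weighted_count for a in arms)
    best_i, best_index = 0, -math.inf
    for i, arm in enumerate(arms):
        if arm.weighted_count == 0.0:
            return i  # play every arm once before trusting the indices
        mean = arm.weighted_reward / arm.weighted_count
        # The bonus uses the effective sample size, so heavy forgetting
        # (small effective counts) keeps exploration alive; the max() guards
        # against taking the log of a value below 1 under strong discounting.
        bonus = math.sqrt(c * math.log(max(effective_total, 2.0)) / arm.weighted_count)
        if mean + bonus > best_index:
            best_i, best_index = i, mean + bonus
    return best_i

# Toy run with Bernoulli rewards and an abrupt change halfway through.
gamma = 0.98  # forgetting factor; values closer to 1 forget more slowly
true_means = [0.2, 0.5, 0.8]
arms = [DiscountedArm() for _ in true_means]
for t in range(2000):
    if t == 1000:
        true_means.reverse()  # the previously best arm becomes the worst
    i = select_arm(arms)
    reward = 1.0 if random.random() < true_means[i] else 0.0
    for a in arms:
        a.decay(gamma)  # age all past data before recording the new sample
    arms[i].update(reward)
```

With gamma = 0.98 the effective sample size is capped near 1 / (1 - gamma) = 50, so the confidence term never fully vanishes and the algorithm keeps probing arms whose rewards may have shifted.
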
Variants of tAUCB differ in how they implement weighting, the exact form of the confidence term, and how they respond to detected changes (for example, by resetting estimates or adapting parameters). Theoretical analyses often focus on dynamic or tracking regret, aiming for sublinear regret as a function of time under assumptions about the rate of environment change, such as a bounded number of change points or smooth time variation.
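
As one concrete (and deliberately simple) illustration of the change-response behavior described above, the hypothetical helper below compares a short recent window of an arm's rewards against its older data and signals a reset when the two disagree by more than a threshold; real change detectors such as CUSUM or Page-Hinkley tests are more principled.

```python
from collections import deque

def maybe_reset(history: deque, recent: int = 20, threshold: float = 0.3) -> bool:
    """Reset signal: True when the recent mean drifts far from the older mean.

    `history` holds one arm's observed rewards, newest last. The window
    length and threshold are illustrative tuning knobs, not standard values.
    """
    if len(history) < 2 * recent:
        return False  # not enough data to compare two windows
    rewards = list(history)
    old_mean = sum(rewards[:-recent]) / (len(rewards) - recent)
    new_mean = sum(rewards[-recent:]) / recent
    return abs(new_mean - old_mean) > threshold
```

A variant that responds to change by resetting estimates would call such a test per arm after each update and, on a True result, clear that arm's history and weighted statistics.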

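The dynamic-regret objective these analyses target can be stated precisely; in the standard notation below (not specific to tAUCB), \mu_{a,t} is the expected reward of arm a at step t and A_t is the arm played:

```latex
% Dynamic (tracking) regret compares against the per-step best arm,
% which may change whenever the environment does.
R_T \;=\; \sum_{t=1}^{T} \Bigl( \max_{a} \mu_{a,t} \;-\; \mu_{A_t,\,t} \Bigr)
```

Sublinear here means R_T / T tends to 0; with at most C_T abrupt change points, bounds of order \sqrt{C_T T \log T} are the typical target for sliding-window and discounted variants.
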
Applications of tAUCB arise in settings where reward distributions evolve, including online advertising, recommender systems, and autonomous control. Limitations include sensitivity to hyperparameters like window size or forgetting factors, and potential lag in adaptation to rapid changes. Compared with stationary UCB, tAUCB prioritizes responsiveness to change, sometimes at the expense of sample efficiency.

See also UCB, discounted UCB, sliding-window UCB, and non-stationary bandits.