passbandit
Passbandit is a term used in the study of sequential decision making to describe a bandit-style problem in which the feedback signal is passed through a band-pass filter. The concept blends ideas from the multi-armed bandit framework with a signal-processing view of observations: rewards (or sensing signals) are subject to a frequency-domain filter that passes only components within a specified frequency range (the passband) and attenuates the others. In this formulation, at each time step the agent selects an action from a finite set, and the observed reward is derived from a latent stochastic reward process. When the observed reward is the filtered version of the latent reward, it is band-limited in frequency. Because each filter output also depends on earlier latent rewards, including those produced by previously chosen actions, filtering introduces temporal correlations and potential non-stationarity into the feedback.
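The observation model can be simulated in a few lines. The sketch below is an illustration under assumptions that are not part of any standard definition: latent rewards are Gaussian with a per-arm mean, the filter is a known finite impulse response (FIR) filter given by its taps, and the class name `PassbanditEnv` is hypothetical. The example filter is a length-4 moving average, which is a low-pass filter (its passband starts at zero frequency), so the mean reward itself is not attenuated.

```python
import random
from typing import List, Sequence


class PassbanditEnv:
    """Hypothetical toy passbandit environment (illustrative sketch only).

    Arm a has a latent Gaussian reward with mean means[a]. The agent never
    sees the latent reward r_t; it sees y_t, the output of a known FIR filter:
        y_t = sum_k taps[k] * r_{t-k}
    Latent rewards before the first step are treated as zero.
    """

    def __init__(self, means: Sequence[float], taps: Sequence[float],
                 noise_std: float = 1.0, seed: int = 0) -> None:
        self.means = list(means)
        self.taps = list(taps)
        self.noise_std = noise_std
        self.rng = random.Random(seed)
        # Most recent latent rewards, newest first: history[k] = r_{t-k}.
        self.history: List[float] = [0.0] * len(self.taps)

    def step(self, arm: int) -> float:
        """Play `arm` and return the filtered (observed) reward."""
        latent = self.rng.gauss(self.means[arm], self.noise_std)
        # Shift the delay line and push the newest latent reward to the front.
        self.history = [latent] + self.history[:-1]
        return sum(h * r for h, r in zip(self.taps, self.history))


env = PassbanditEnv(means=[1.0, 0.0], taps=[0.25] * 4, noise_std=0.0)
obs = [env.step(0) for _ in range(4)] + [env.step(1) for _ in range(4)]
print(obs)
```

With the noise switched off, playing arm 0 four times and then arm 1 four times yields 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25, 0.0. The observations after the switch still carry rewards generated by arm 0, which is the temporal correlation described above.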
Formulations of the passbandit problem generally consider whether the filter is known or unknown, and whether
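For the known-filter case, a short derivation shows how latent means can be recovered from filtered feedback. It is a sketch under assumptions that the source does not state: the filter is an FIR filter with taps $h_0, \dots, h_{L-1}$, latent rewards of arm $a$ have mean $\mu_a$, and arm $a$ is played for at least $L - 1 + m$ consecutive steps starting at time $s$. The notation is introduced here for illustration.

```latex
y_t = \sum_{k=0}^{L-1} h_k \, r_{t-k},
\qquad \mathbb{E}[r_t] = \mu_a \ \text{for } t \ge s

\mathbb{E}[y_t] = \Big( \sum_{k=0}^{L-1} h_k \Big) \mu_a = G \, \mu_a
\qquad \text{for } t \ge s + L - 1

\hat{\mu}_a = \frac{1}{G \, m} \sum_{t = s + L - 1}^{s + L - 2 + m} y_t,
\qquad G \ne 0
```

Here $G$ is the filter's zero-frequency (DC) gain. The first $L - 1$ outputs after a switch are excluded because they still depend on rewards from the previous arm. If the passband excludes zero frequency, then $G = 0$ and the steady-state output carries no information about $\mu_a$. If the filter is unknown but $G > 0$ can be assumed, the ordering of $G\mu_a$ matches the ordering of $\mu_a$, so the best arm can still be identified even though the scale of the means cannot.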
Algorithmic approaches to passbandit adapt classical bandit methods to account for filtering effects. Techniques include incorporating
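One simple way to adapt a classical method is to run UCB1 over blocks of consecutive plays of the same arm, discarding the transient outputs at the start of each block and rescaling by the filter's DC gain. This is a hypothetical sketch, not an established passbandit algorithm: it assumes a known FIR filter with nonzero DC gain and Gaussian latent rewards, and the function name `blocked_ucb` and its parameters are illustrative. Treating each block average as a single UCB1 sample is a heuristic; the standard UCB1 guarantees assume bounded, independent rewards.

```python
import math
import random
from typing import List, Sequence


def blocked_ucb(means: Sequence[float], taps: Sequence[float], horizon: int,
                block: int, noise_std: float = 1.0, seed: int = 0) -> List[int]:
    """Hypothetical blocked UCB1 sketch for a passbandit with a known FIR filter.

    Each arm choice is held for `block` steps. The first len(taps) - 1 outputs
    of a block still mix in latent rewards from the previous arm, so they are
    discarded; the remaining outputs are averaged and divided by the DC gain.
    Returns the number of blocks allotted to each arm.
    """
    settle = len(taps) - 1
    if block <= settle:
        raise ValueError("block must be longer than the filter's settling time")
    gain = sum(taps)  # DC gain; assumed nonzero (passband includes 0 Hz)
    rng = random.Random(seed)
    history = [0.0] * len(taps)  # latent rewards, newest first
    k = len(means)
    counts = [0] * k   # blocks played per arm
    sums = [0.0] * k   # sum of per-block mean estimates per arm
    t = 0
    while t + block <= horizon:
        n_blocks = sum(counts)
        if n_blocks < k:
            arm = n_blocks  # initialisation: play every arm once
        else:
            # UCB1 index computed over per-block estimates.
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(n_blocks) / counts[a]))
        kept = []
        for i in range(block):
            latent = rng.gauss(means[arm], noise_std)
            history = [latent] + history[:-1]
            y = sum(h * r for h, r in zip(taps, history))
            if i >= settle:  # filter window now covers only this block
                kept.append(y)
        sums[arm] += sum(kept) / len(kept) / gain
        counts[arm] += 1
        t += block
    return counts


counts = blocked_ucb([0.2, 0.8], [0.25] * 4, horizon=2000, block=20,
                     noise_std=0.1, seed=1)
print(counts)
```

The block length trades off two effects: longer blocks waste a smaller fraction of plays on transients, but they also commit to a possibly suboptimal arm for longer.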
See also: multi-armed bandit, contextual bandits, Thompson sampling, Kalman filtering, signal processing.