Bandit-based
Bandit-based refers to approaches and systems that base their decision-making on the multi-armed bandit framework. In this framework, an agent repeatedly selects from a finite set of actions (arms) and receives a reward from an unknown distribution. The goal is to maximize cumulative reward over time by balancing exploration of less-known actions with exploitation of actions that currently appear best. Bandit-based methods emphasize online learning from feedback rather than relying on pre-collected labeled data.
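The interaction loop described above can be sketched in a few lines. The following is a minimal, illustrative simulation, not a reference implementation: it assumes a two-armed Bernoulli environment (each arm pays 1 with a fixed, hidden probability), and the arm probabilities, function names, and the pure-exploration baseline policy are all invented for the example.

```python
import random

# Illustrative two-armed Bernoulli bandit; these probabilities are
# hidden from the agent and exist only to generate rewards.
ARM_PROBS = [0.3, 0.7]

def pull(arm, rng):
    """Sample a 0/1 reward from the chosen arm's unknown distribution."""
    return 1 if rng.random() < ARM_PROBS[arm] else 0

def run_uniform(rounds=1000, seed=0):
    """Baseline agent that explores uniformly at random and never exploits.

    Each round it selects an arm, observes only that arm's reward, and
    accumulates the total -- the feedback structure that defines the
    bandit setting.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(rounds):
        arm = rng.randrange(len(ARM_PROBS))  # pure exploration
        total += pull(arm, rng)
    return total
```

A uniform policy earns roughly the average of the arm means per round; the algorithms below improve on it by shifting pulls toward the better arm as evidence accumulates.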
There are several variants of the bandit problem. Stochastic bandits assume each arm has a fixed reward distribution that does not change over time. Adversarial bandits drop this assumption and allow rewards to be chosen arbitrarily, even by an adversary. Contextual bandits let the agent observe side information (a context) before each choice and learn a policy mapping contexts to arms.
Common algorithms include epsilon-greedy, which explores a random arm with a fixed probability and otherwise exploits the arm with the best empirical mean, and upper confidence bound (UCB) methods, which add an optimism bonus to under-explored arms so that uncertainty itself drives exploration.
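The two algorithms just named can be sketched side by side. This is a minimal illustration under assumed conditions: a Bernoulli environment where `arm_probs` is a hypothetical list of hidden success probabilities, and the function names, round counts, and epsilon value are example choices rather than canonical settings. The UCB variant shown is the standard UCB1 bonus.

```python
import math
import random

def epsilon_greedy(arm_probs, rounds=5000, epsilon=0.1, seed=1):
    """Epsilon-greedy: with probability epsilon pick a random arm
    (exploration), otherwise pick the empirically best arm (exploitation)."""
    rng = random.Random(seed)
    k = len(arm_probs)
    n = [0] * k        # pulls per arm
    mean = [0.0] * k   # empirical mean reward per arm
    total = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: mean[a])
        reward = 1 if rng.random() < arm_probs[arm] else 0
        n[arm] += 1
        mean[arm] += (reward - mean[arm]) / n[arm]  # incremental mean update
        total += reward
    return total, mean

def ucb1(arm_probs, rounds=5000, seed=1):
    """UCB1: after trying each arm once, pick the arm maximizing its
    empirical mean plus a confidence bonus that shrinks as the arm is
    pulled more often."""
    rng = random.Random(seed)
    k = len(arm_probs)
    n = [0] * k
    mean = [0.0] * k
    total = 0
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # initialization: play every arm once
        else:
            arm = max(range(k),
                      key=lambda a: mean[a] + math.sqrt(2 * math.log(t) / n[a]))
        reward = 1 if rng.random() < arm_probs[arm] else 0
        n[arm] += 1
        mean[arm] += (reward - mean[arm]) / n[arm]
        total += reward
    return total, mean
```

Both agents keep only per-arm counts and running means, which is what makes bandit methods cheap enough to run online; they differ only in how the next arm is chosen.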
Applications span online advertising, A/B testing, recommendation systems, adaptive experimentation, clinical trials, and robotics, where decisions must be made sequentially under uncertainty and only the outcome of the chosen action is observed.