exp3
exp3, short for Exponential-weight algorithm for Exploration and Exploitation, is an online learning algorithm designed for adversarial multi-armed bandit problems. It selects among K arms with unknown reward distributions, aiming to perform nearly as well as the best fixed arm in hindsight despite potentially hostile feedback. The algorithm was introduced by Auer, Cesa-Bianchi, Freund, and Schapire in 2002 and is widely cited in machine learning and decision theory.
In each round t, exp3 maintains a weight w_i(t) for each arm i, initialized equally (for example
The algorithm achieves a regret bound on the order of O(sqrt(K T log K)) against the best
EXP3 is applicable to online decision making under uncertainty and adversarial feedback, including online recommendations and