Exp4
Exp4 is an online learning algorithm for adversarial multi-armed bandit problems that incorporates advice from a set of experts. It combines exponential weighting of expert opinions with controlled exploration to make decisions when only partial feedback is available.
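The combination of exponential weighting and explicit exploration described above can be illustrated with a minimal Python sketch of a single round. It is one common formulation, not a definitive implementation. It assumes that rewards lie in [0, 1], that each expert's advice is a probability distribution over the arms, that exploration is done by mixing in a uniform distribution with rate `gamma`, and that weights are updated with learning rate `eta` using an importance-weighted reward estimate. The function name `exp4_step` and the callback `reward_fn` are hypothetical names introduced for illustration.

```python
import numpy as np

def exp4_step(weights, advice, gamma, eta, reward_fn, rng):
    """One round of an Exp4-style update (illustrative sketch).

    weights   : shape (N,), positive expert weights
    advice    : shape (N, K), each row a probability distribution over arms
    gamma     : uniform-exploration rate in (0, 1] (assumed parameterization)
    eta       : learning rate for the exponential update (assumed)
    reward_fn : hypothetical callback mapping an arm index to a reward in [0, 1]
    rng       : numpy.random.Generator used to sample the arm
    """
    n_experts, n_arms = advice.shape

    # Mix the weight-averaged expert advice with uniform exploration.
    q = weights / weights.sum()
    probs = (1.0 - gamma) * (q @ advice) + gamma / n_arms

    # Sample an arm; only its reward is observed (partial feedback).
    arm = rng.choice(n_arms, p=probs)
    reward = reward_fn(arm)

    # Importance-weighted estimate: nonzero only for the pulled arm,
    # and unbiased because the pulled arm is divided by its probability.
    est_rewards = np.zeros(n_arms)
    est_rewards[arm] = reward / probs[arm]

    # Each expert's estimated gain is its advice-weighted estimated reward.
    est_gains = advice @ est_rewards

    # Exponential-weights update of the experts.
    new_weights = weights * np.exp(eta * est_gains)
    return new_weights, arm, reward, probs


# Usage: two experts, three arms, and a toy reward that pays only on arm 0.
rng = np.random.default_rng(0)
advice = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.5, 0.5]])
weights = np.ones(2)
for _ in range(100):
    weights, arm, reward, probs = exp4_step(
        weights, advice, gamma=0.3, eta=0.1,
        reward_fn=lambda a: 1.0 if a == 0 else 0.0, rng=rng)
```

In this toy run the first expert always recommends the only rewarding arm, so its weight can only grow relative to the second expert's. The uniform mixing term keeps every arm's probability at least `gamma / K`, which bounds the importance-weighted estimates.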
In the standard setting, there are K arms (actions) and N experts. At each round t, every
To update the expert weights, Exp4 forms an unbiased estimate of each expert’s gain. For the observed round,
Exp4 provides worst-case regret guarantees in the adversarial setting. With appropriate choices of γ and η, the expected
Variants and extensions include EXP4.P, which offers high-probability guarantees. Applications encompass online decision making with expert