exp3 - Infinite Lexicon - Infinite Lexicon

exp3

exp3, short for Exponential-weight algorithm for Exploration and Exploitation, is an online learning algorithm designed for adversarial multi-armed bandit problems. It selects among K arms with unknown reward distributions, aiming to perform nearly as well as the best fixed arm in hindsight despite potentially hostile feedback. The algorithm was introduced by Auer, Cesa-Bianchi, Freund, and Schapire in 2002 and is widely cited in machine learning and decision theory.

In each round t, exp3 maintains a weight w_i(t) for each arm i, initialized equally (for example

The algorithm achieves a regret bound on the order of O(sqrt(K T log K)) against the best

EXP3 is applicable to online decision making under uncertainty and adversarial feedback, including online recommendations and

=

a

=

-

*

/

+

/

∈

a

a

∈

=

/

=

0

j

≠

=

*

*

/

K

),

T

a

≈

K

/

high-probability