
Exploration-exploitation

Exploration-exploitation, also called the exploration-exploitation trade-off, is a central concept in decision making under uncertainty. An agent must choose between exploring new actions to learn about their potential rewards and exploiting actions already believed to be favorable to maximize immediate payoff. The trade-off is particularly prominent in settings where outcomes are stochastic, information is costly to acquire, and future opportunities depend on what has been learned so far.

In the multi-armed bandit problem, a canonical formalization, the agent must allocate trials among several options with unknown payout distributions. The objective is to maximize cumulative reward over time, which requires collecting information about the arms while not sacrificing too much short-term gain. A range of strategies addresses this balance: epsilon-greedy (mostly exploiting, with occasional random exploration), softmax or Boltzmann exploration (probabilistic choice proportional to estimated value), and more sophisticated methods such as the upper confidence bound (UCB) and Thompson sampling (posterior sampling). In reinforcement learning, the dilemma extends to sequential decision making in Markov decision processes, where planning and learning are interleaved.

Practical approaches include optimism in the face of uncertainty, intrinsic motivation or curiosity signals, and Bayesian methods that quantify uncertainty to guide exploration.
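As a rough illustration, the strategy families named above can be compared on a simple Bernoulli bandit. The sketch below is not from any particular source: the arm probabilities, horizon, and helper names are illustrative assumptions, and each policy is reduced to its simplest textbook form.

```python
import math
import random

def eps_greedy(values, eps=0.1):
    # With probability eps explore a uniformly random arm; otherwise
    # exploit the arm with the highest estimated value.
    if random.random() < eps:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

def ucb1(values, counts, t):
    # Optimism in the face of uncertainty: add a confidence bonus that
    # shrinks as an arm is pulled more often.
    for a, n in enumerate(counts):
        if n == 0:  # pull each arm once before using the bonus
            return a
    return max(range(len(values)),
               key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))

def thompson(successes, failures):
    # Posterior sampling: draw from each arm's Beta(1+s, 1+f) posterior
    # and play the arm whose sample is largest.
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])

def run(policy, probs, horizon=2000, seed=0):
    # Simulate a Bernoulli bandit and return total reward collected.
    random.seed(seed)
    k = len(probs)
    counts, values = [0] * k, [0.0] * k
    succ, fail = [0] * k, [0] * k
    total = 0
    for t in range(1, horizon + 1):
        if policy == "eps":
            a = eps_greedy(values)
        elif policy == "ucb":
            a = ucb1(values, counts, t)
        else:
            a = thompson(succ, fail)
        r = 1 if random.random() < probs[a] else 0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean
        succ[a] += r
        fail[a] += 1 - r
        total += r
    return total
```

With arms of payout probability 0.2, 0.5, and 0.7, all three policies collect noticeably more reward than uniformly random play, and the uncertainty-aware policies (UCB, Thompson) typically concentrate on the best arm faster than epsilon-greedy's fixed exploration rate allows.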
Applications span online advertising and recommender systems, clinical trials, robotics, and game playing. Challenges arise in non-stationary environments, where reward distributions change, or where exploration carries high costs or risks. Understanding and managing the exploration-exploitation balance remains a core task in both theoretical and applied work in AI, cognitive science, and economics.
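For the non-stationary case, one common remedy (a sketch, not the only approach) is to estimate arm values with a constant step size rather than a sample average, so that old observations decay geometrically and the estimate can track a moving target. The drifting arm and numbers below are illustrative assumptions.

```python
import random

def update(value, reward, alpha=0.1):
    # Constant step size: each old reward's weight shrinks by (1 - alpha)
    # per step, so the estimate follows recent behavior of the arm.
    return value + alpha * (reward - value)

random.seed(1)
value = 0.0
p = 0.2
for t in range(4000):
    if t == 2000:
        p = 0.8  # the reward distribution shifts mid-run
    r = 1 if random.random() < p else 0
    value = update(value, r)
# value now reflects the post-shift payout rate rather than the
# whole-history average of roughly 0.5.
```

A plain sample average would converge toward the lifetime mean of the arm and respond ever more slowly to change; the exponentially weighted estimate trades some variance for that responsiveness.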