
Exploration-exploitation

Exploration-exploitation, also called the exploration-exploitation trade-off, is a central concept in decision making under uncertainty. An agent must choose between exploring new actions to learn about their potential rewards and exploiting actions already believed to be favorable to maximize immediate payoff. The trade-off is particularly prominent in settings where outcomes are stochastic, information is costly to acquire, and future opportunities depend on what has been learned so far.

In the multi-armed bandit problem, a canonical formalization, the agent must allocate trials among several options with unknown payout distributions. The objective is to maximize cumulative reward over time, which requires collecting information about the arms while not sacrificing too much short-term gain. A range of strategies addresses this balance: epsilon-greedy (mostly exploiting, with occasional random exploration), softmax or Boltzmann exploration (probabilistic choice proportional to estimated value), and more sophisticated methods such as the upper confidence bound (UCB) and Thompson sampling (posterior sampling). In reinforcement learning, the dilemma extends to sequential decision making in Markov decision processes, where planning and learning are interleaved.

Practical approaches include optimism in the face of uncertainty, intrinsic motivation or curiosity signals, and Bayesian methods that quantify uncertainty to guide exploration.
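As a rough illustration, the strategy families named above can be compared on a simple Bernoulli bandit. The sketch below is not from any particular source: the arm probabilities, horizon, and helper names are illustrative assumptions, and each policy is reduced to its simplest textbook form.

```python
import math
import random

def eps_greedy(values, eps=0.1):
    # With probability eps explore a uniformly random arm; otherwise
    # exploit the arm with the highest estimated value.
    if random.random() < eps:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

def ucb1(values, counts, t):
    # Optimism in the face of uncertainty: add a confidence bonus that
    # shrinks as an arm is pulled more often.
    for a, n in enumerate(counts):
        if n == 0:  # pull each arm once before using the bonus
            return a
    return max(range(len(values)),
               key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))

def thompson(successes, failures):
    # Posterior sampling: draw from each arm's Beta(1+s, 1+f) posterior
    # and play the arm whose sample is largest.
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])

def run(policy, probs, horizon=2000, seed=0):
    # Simulate a Bernoulli bandit and return total reward collected.
    random.seed(seed)
    k = len(probs)
    counts, values = [0] * k, [0.0] * k
    succ, fail = [0] * k, [0] * k
    total = 0
    for t in range(1, horizon + 1):
        if policy == "eps":
            a = eps_greedy(values)
        elif policy == "ucb":
            a = ucb1(values, counts, t)
        else:
            a = thompson(succ, fail)
        r = 1 if random.random() < probs[a] else 0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean
        succ[a] += r
        fail[a] += 1 - r
        total += r
    return total
```

With arms of payout probability 0.2, 0.5, and 0.7, all three policies collect noticeably more reward than uniformly random play, and the uncertainty-aware policies (UCB, Thompson) typically concentrate on the best arm faster than epsilon-greedy's fixed exploration rate allows.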
Applications span online advertising and recommender systems, clinical trials, robotics, and game playing. Challenges arise in non-stationary environments, where reward distributions change, or where exploration carries high costs or risks. Understanding and managing the exploration-exploitation balance remains a core task in both theoretical and applied work in AI, cognitive science, and economics.
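For the non-stationary case, one common remedy (a sketch, not the only approach) is to estimate arm values with a constant step size rather than a sample average, so that old observations decay geometrically and the estimate can track a moving target. The drifting arm and numbers below are illustrative assumptions.

```python
import random

def update(value, reward, alpha=0.1):
    # Constant step size: each old reward's weight shrinks by (1 - alpha)
    # per step, so the estimate follows recent behavior of the arm.
    return value + alpha * (reward - value)

random.seed(1)
value = 0.0
p = 0.2
for t in range(4000):
    if t == 2000:
        p = 0.8  # the reward distribution shifts mid-run
    r = 1 if random.random() < p else 0
    value = update(value, r)
# value now reflects the post-shift payout rate rather than the
# whole-history average of roughly 0.5.
```

A plain sample average would converge toward the lifetime mean of the arm and respond ever more slowly to change; the exponentially weighted estimate trades some variance for that responsiveness.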