POMCP

Partially Observable Monte Carlo Planning (POMCP) is an online planning algorithm designed for partially observable Markov decision processes (POMDPs). It combines Monte Carlo tree search with a particle-based representation of beliefs, enabling planning in large or continuous state spaces without requiring an explicit belief update formula.

The algorithm operates by interacting with a generative model of the environment that can sample state transitions and observations given a state and action. It builds a search tree where nodes correspond to action–observation histories. During simulations, POMCP uses a variant of the UCT (upper confidence bounds applied to trees) selection rule to explore promising actions, and it branches on possible observations to account for partial observability. Leaves of the search are rolled out with a default policy to estimate value, and values are backpropagated to inform action selection. The current belief is represented by a set of particles, which are propagated through the model and resampled in light of simulated observations.
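
To make these steps concrete, the following is a minimal sketch in Python, not a reference implementation: the toy `step` generative model (a tiger-style problem), the constant values, and all function names are illustrative assumptions chosen so the example runs end to end.

```python
import math
import random

# Hypothetical toy generative model (a tiger-style POMDP) so the sketch runs:
# states 0/1 = tiger behind the left/right door; actions 0=listen, 1/2=open a door.
def step(s, a):
    """Sample (next_state, observation, reward, done) -- all POMCP needs."""
    if a == 0:  # listening yields a noisy hint about the tiger's side
        o = s if random.random() < 0.85 else 1 - s
        return s, o, -1.0, False
    opened_tiger = (a == 1 and s == 0) or (a == 2 and s == 1)
    return s, 2, -100.0 if opened_tiger else 10.0, True

ACTIONS, GAMMA, UCB_C = (0, 1, 2), 0.95, 25.0

class Node:
    """One node per action-observation history reached during search."""
    def __init__(self):
        self.N, self.V, self.children = 0, 0.0, {}

def rollout(s, depth):
    """Default policy: uniform-random actions estimate a leaf's value."""
    if depth == 0:
        return 0.0
    s2, _, r, done = step(s, random.choice(ACTIONS))
    return r if done else r + GAMMA * rollout(s2, depth - 1)

def simulate(s, node, depth):
    """One simulation: UCT descent, observation branching, rollout, backup."""
    if depth == 0:
        return 0.0
    if not node.children:                      # first visit: expand the actions
        node.children = {a: Node() for a in ACTIONS}
        return rollout(s, depth)

    def ucb(a):                                # UCT selection rule
        c = node.children[a]
        if c.N == 0:
            return float("inf")
        return c.V + UCB_C * math.sqrt(math.log(node.N) / c.N)

    a = max(ACTIONS, key=ucb)
    s2, o, r, done = step(s, a)
    anode = node.children[a]
    total = r
    if not done:                               # branch on the sampled observation
        onode = anode.children.setdefault(o, Node())
        total += GAMMA * simulate(s2, onode, depth - 1)
    node.N += 1                                # backpropagate: visit counts and
    anode.N += 1                               # running-mean value estimates
    anode.V += (total - anode.V) / anode.N
    return total

def search(particles, n_sims=2000, depth=20):
    """Plan from a particle belief: each simulation starts at a sampled particle."""
    root = Node()
    for _ in range(n_sims):
        simulate(random.choice(particles), root, depth)
    return max(ACTIONS, key=lambda a: root.children[a].V)
```

On this toy model, a few thousand simulations typically favor listening until the sampled observations make one door clearly safer.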

Key advantages include scalability to large or continuous state spaces, because particle representations and simulation-based planning avoid enumerating the entire state space. POMCP also requires only a generative model, not closed-form solutions for belief updates or value functions. It is particularly effective in online, real-time planning tasks under uncertainty.
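
Because only sampling is required, even the belief update can be simulation-based. A sketch under the same assumptions as above (reusing the hypothetical `step` model): particles are pushed through the model and kept only when their simulated observation matches the one actually received, so no observation-likelihood formula is needed.

```python
def update_belief(particles, a, o_real, n=200):
    """Unweighted particle filter via rejection: no closed-form Bayes update."""
    new = []
    while len(new) < n:
        s = random.choice(particles)   # sample a state from the current belief
        s2, o, _, _ = step(s, a)       # push it through the generative model
        if o == o_real:                # resample: keep it only when the simulated
            new.append(s2)             # observation matches the real one
    return new
```

In practice rejection can starve when the received observation is unlikely under the current belief; the original paper addresses this with particle reinvigoration.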

Limitations include dependence on the quality of the generative model and on the computational budget for simulations and particles. Performance hinges on the number of particles, the number of simulations per decision, and the horizon length. It has found applications in robotics, autonomous systems, and other domains where planning under partial observability is essential.
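
Those three quantities appear directly as tunable budgets in an online planning loop. Continuing the hypothetical sketch above (the parameter values here are made up, not recommendations):

```python
# Made-up budget values; the right settings are domain-specific.
N_PARTICLES, N_SIMS, MAX_DEPTH = 500, 5000, 30

belief = [random.choice((0, 1)) for _ in range(N_PARTICLES)]  # uniform prior
state = random.choice((0, 1))                # hidden true state of the world
for t in range(50):                          # cap the episode length
    a = search(belief, n_sims=N_SIMS, depth=MAX_DEPTH)  # plan under the budget
    state, o, r, done = step(state, a)       # act in the real environment
    print(f"t={t} action={a} obs={o} reward={r}")
    if done:
        break
    belief = update_belief(belief, a, o, n=N_PARTICLES)
```

Raising N_SIMS or MAX_DEPTH buys better decisions at roughly linear cost per step, while too few particles makes the belief a poor approximation of the true posterior.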

POMCP was introduced by Silver and Veness in 2010 as a scalable method for large POMDPs, described in the paper "Monte Carlo Planning in Large POMDPs".