QMDP

QMDP is an approximate method for solving partially observable Markov decision processes (POMDPs). It leverages the value function and action-values of the underlying fully observable Markov decision process (MDP) to generate a policy that operates on belief states rather than concrete world states.

Algorithmically, the approach first solves the MDP defined by states S, actions A, transition model T, rewards R, and discount factor gamma to obtain the optimal Q-values Q*(s, a) for all s in S and a in A. For any belief distribution b over states, QMDP computes the belief-action value as Q_MDP(b, a) = sum_s b(s) * Q*(s, a). The recommended action is the one that maximizes Q_MDP(b, a): pi(b) = argmax_a Q_MDP(b, a). This yields a stationary, belief-based policy without requiring a full POMDP solution.
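
A minimal sketch of this procedure in Python, assuming a small tabular MDP. The array names T and R, their shapes, and the value-iteration tolerance are illustrative assumptions, not part of any particular library:

```python
import numpy as np

def solve_mdp_q(T, R, gamma, tol=1e-8):
    """Value iteration on the underlying fully observable MDP.

    T: (A, S, S) array with T[a, s, s2] = P(s2 | s, a)  (assumed layout)
    R: (S, A) array of immediate rewards
    Returns the optimal Q-values Q*(s, a) as an (S, A) array.
    """
    num_actions, num_states, _ = T.shape
    Q = np.zeros((num_states, num_actions))
    while True:
        V = Q.max(axis=1)              # V*(s) = max_a Q*(s, a)
        Q_new = R + gamma * (T @ V).T  # Bellman optimality backup
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new

def qmdp_action(belief, Q):
    """Pick argmax_a Q_MDP(b, a), where Q_MDP(b, a) = sum_s b(s) * Q*(s, a)."""
    q_belief = belief @ Q              # belief-action values, shape (A,)
    return int(np.argmax(q_belief))

# Toy 2-state, 2-action problem with made-up numbers:
T = np.array([
    [[0.9, 0.1],   # action 0: row s gives P(next state | s, action 0)
     [0.2, 0.8]],
    [[0.5, 0.5],   # action 1
     [0.5, 0.5]],
])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Q_star = solve_mdp_q(T, R, gamma=0.95)
print(qmdp_action(np.array([0.6, 0.4]), Q_star))  # action recommended for belief (0.6, 0.4)
```

Because Q* is computed once offline, per-step action selection reduces to a single matrix-vector product over the belief, which is what makes QMDP cheap at run time.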

Properties and limitations: QMDP is a simple, computationally efficient heuristic for POMDPs because it reduces planning to solving an MDP. It is not guaranteed to be optimal for the original POMDP, since it assumes that after taking an action the state becomes known (full observability after one step). As a result, its performance depends on problem structure and how quickly uncertainty is resolved.
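
A standard way to see the effect of this assumption: because each action is evaluated as if the state were revealed immediately afterwards, the heuristic assigns no value to reducing uncertainty, so it never chooses actions purely to gather information. For the same reason, the belief value max_a Q_MDP(b, a) = max_a sum_s b(s) Q*(s, a) can only overestimate what is actually achievable in the POMDP, making it an upper bound on the optimal value at belief b.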

Applications and relationships: It is often used as a fast baseline or approximation in planning under uncertainty, particularly in robotics and autonomous systems where exact POMDP solvers are intractable. It sits among various POMDP approximations, offering a blend of simplicity and speed at the cost of potential suboptimality compared to more accurate methods like point-based value iteration.
