QMDP

QMDP is an approximate method for solving partially observable Markov decision processes (POMDPs). It leverages the value function and action-values of the underlying fully observable Markov decision process (MDP) to generate a policy that operates on belief states rather than concrete world states.

Algorithmically, the approach first solves the MDP defined by states S, actions A, transition model T, rewards R, and discount factor gamma to obtain the optimal Q-values Q*(s, a) for all s in S and a in A. For any belief distribution b over states, QMDP computes the belief-action value as Q_MDP(b, a) = sum_s b(s) * Q*(s, a). The recommended action is the one that maximizes Q_MDP(b, a): pi(b) = argmax_a Q_MDP(b, a). This yields a stationary, belief-based policy without requiring a full POMDP solution.
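
A minimal sketch of this procedure in Python, assuming a small tabular MDP. The array names T and R, their shapes, and the value-iteration tolerance are illustrative assumptions, not part of any particular library:

```python
import numpy as np

def solve_mdp_q(T, R, gamma, tol=1e-8):
    """Value iteration on the underlying fully observable MDP.

    T: (A, S, S) array with T[a, s, s2] = P(s2 | s, a)  (assumed layout)
    R: (S, A) array of immediate rewards
    Returns the optimal Q-values Q*(s, a) as an (S, A) array.
    """
    num_actions, num_states, _ = T.shape
    Q = np.zeros((num_states, num_actions))
    while True:
        V = Q.max(axis=1)              # V*(s) = max_a Q*(s, a)
        Q_new = R + gamma * (T @ V).T  # Bellman optimality backup
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new

def qmdp_action(belief, Q):
    """Pick argmax_a Q_MDP(b, a), where Q_MDP(b, a) = sum_s b(s) * Q*(s, a)."""
    q_belief = belief @ Q              # belief-action values, shape (A,)
    return int(np.argmax(q_belief))

# Toy 2-state, 2-action problem with made-up numbers:
T = np.array([
    [[0.9, 0.1],   # action 0: row s gives P(next state | s, action 0)
     [0.2, 0.8]],
    [[0.5, 0.5],   # action 1
     [0.5, 0.5]],
])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Q_star = solve_mdp_q(T, R, gamma=0.95)
print(qmdp_action(np.array([0.6, 0.4]), Q_star))  # action recommended for belief (0.6, 0.4)
```

Because Q* is computed once offline, per-step action selection reduces to a single matrix-vector product over the belief, which is what makes QMDP cheap at run time.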

Properties and limitations: QMDP is a simple, computationally efficient heuristic for POMDPs because it reduces planning to solving an MDP. It is not guaranteed to be optimal for the original POMDP, since it assumes that after taking an action the state becomes known (full observability after one step). As a result, its performance depends on problem structure and how quickly uncertainty is resolved.
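
A standard way to see the effect of this assumption: because each action is evaluated as if the state were revealed immediately afterwards, the heuristic assigns no value to reducing uncertainty, so it never chooses actions purely to gather information. For the same reason, the belief value max_a Q_MDP(b, a) = max_a sum_s b(s) Q*(s, a) can only overestimate what is actually achievable in the POMDP, making it an upper bound on the optimal value at belief b.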

Applications and relationships: It is often used as a fast baseline or approximation in planning under uncertainty, particularly in robotics and autonomous systems where exact POMDP solvers are intractable. It sits among various POMDP approximations, offering a blend of simplicity and speed at the cost of potential suboptimality compared to more accurate methods like point-based value iteration.
