
MDP

An MDP, or Markov decision process, is a mathematical framework for modeling decision making in stochastic environments. It represents a decision-maker interacting with an environment over a sequence of time steps, where the outcome depends on both the current state and the action chosen. The model is defined by a five-tuple (S, A, P, R, γ): S is a set of states, A is a set of actions, P(s'|s,a) is the probability of transitioning to state s' when taking action a in state s, R(s,a) is the immediate reward received, and γ ∈ [0,1) is a discount factor that weights immediate rewards more heavily than future ones.

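As a concrete illustration, the five-tuple can be written out directly for a toy problem. The sketch below (Python) uses invented states, actions, probabilities, and rewards, so all of its specifics are assumptions for illustration only.

```python
# A minimal sketch of the (S, A, P, R, gamma) tuple for a toy two-state MDP.
# Every state, action, and number here is made up for illustration.

S = ["s0", "s1"]          # state set S
A = ["stay", "move"]      # action set A
gamma = 0.9               # discount factor in [0, 1)

# P[s][a] maps each next state s' to its transition probability P(s'|s, a).
P = {
    "s0": {"stay": {"s0": 1.0}, "move": {"s0": 0.2, "s1": 0.8}},
    "s1": {"stay": {"s1": 1.0}, "move": {"s0": 0.7, "s1": 0.3}},
}

# R[s][a] is the immediate reward R(s, a).
R = {
    "s0": {"stay": 0.0, "move": 1.0},
    "s1": {"stay": 2.0, "move": 0.0},
}
```
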
A policy π specifies the action to take in each state. It can be deterministic (π(s) ∈ A) or stochastic (π(a|s) provides a distribution over actions). The goal is to find an optimal policy π* that maximizes expected cumulative reward, typically defined as E[Σ_t γ^t R(S_t, A_t)]. Value functions V^π(s) and Q^π(s,a) are used to quantify expected returns under a policy. The Bellman equations relate these values: for a given policy, V^π(s) = Σ_a π(a|s) Σ_{s'} P(s'|s,a) [R(s,a) + γ V^π(s')], and the Bellman optimality equation is V*(s) = max_a Σ_{s'} P(s'|s,a) [R(s,a) + γ V*(s')].

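Both Bellman equations translate directly into iterative backups. The sketch below is a minimal Python illustration, reusing the dictionary layout of the toy MDP above (P[s][a][s'] and R[s][a]); the function names and parameters are invented for this sketch, not a standard API.

```python
# A minimal sketch of both Bellman backups, assuming P[s][a][s'] and R[s][a]
# dictionaries like the toy MDP above; names and defaults are illustrative.

def evaluate_policy(S, A, P, R, gamma, pi, sweeps=1000, tol=1e-8):
    """Bellman expectation backup: compute V^pi for a stochastic policy
    pi[s][a] = π(a|s) by repeated sweeps over the states."""
    V = {s: 0.0 for s in S}
    for _ in range(sweeps):
        delta = 0.0
        for s in S:
            v_new = sum(
                pi[s][a] * sum(p * (R[s][a] + gamma * V[s2])
                               for s2, p in P[s][a].items())
                for a in A
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:   # stop once a full sweep changes no value noticeably
            break
    return V


def value_iteration(S, A, P, R, gamma, sweeps=1000, tol=1e-8):
    """Bellman optimality backup: approximate V* by repeated sweeps."""
    V = {s: 0.0 for s in S}
    for _ in range(sweeps):
        delta = 0.0
        for s in S:
            v_new = max(
                sum(p * (R[s][a] + gamma * V[s2]) for s2, p in P[s][a].items())
                for a in A
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    return V
```

Because γ < 1, each backup is a contraction, so the sweeps converge to V^π and V* respectively.
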
Algorithms for solving MDPs include dynamic programming methods such as value iteration and policy iteration when the model P and R are known, and model-free methods such as Q-learning, SARSA, and various policy-gradient or actor-critic methods when the model is unknown.

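As one model-free example, the sketch below outlines tabular Q-learning under an ε-greedy behaviour policy. The environment sampler env_step, the start state, and all hyperparameters are hypothetical placeholders, since the text does not specify an environment.

```python
import random

# A minimal sketch of tabular Q-learning. `env_step(s, a)` is a hypothetical
# function returning (next_state, reward, done); it stands in for the unknown
# model P and R, and every default value below is illustrative.

def q_learning(S, A, env_step, start_state, gamma=0.9, alpha=0.1,
               epsilon=0.1, episodes=500, max_steps=100):
    Q = {s: {a: 0.0 for a in A} for s in S}
    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            # ε-greedy behaviour policy
            if random.random() < epsilon:
                a = random.choice(A)
            else:
                a = max(Q[s], key=Q[s].get)
            s2, r, done = env_step(s, a)  # sample a transition from the environment
            # move Q(s,a) toward the bootstrapped target r + γ max_a' Q(s',a')
            target = r if done else r + gamma * max(Q[s2].values())
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q
```
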
In continuous state or action spaces, function approximation is often used. Extensions include partially observable MDPs (POMDPs), which handle hidden state information.

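As a sketch of what function approximation can look like in the simplest case, the snippet below applies a semi-gradient TD(0) update to a linear value estimate V(s) ≈ w · φ(s). The feature map phi and the step size are assumptions, and semi-gradient TD(0) is just one common choice, not something prescribed by the text.

```python
import numpy as np

# A small sketch of linear function approximation for prediction:
# V(s) ≈ w @ phi(s), trained by semi-gradient TD(0). The feature map `phi`
# (returning a NumPy vector) is a hypothetical choice for illustration.

def td0_linear_update(w, phi, s, r, s2, gamma=0.9, alpha=0.01, done=False):
    """One semi-gradient TD(0) step on the weights w of V(s) ≈ w @ phi(s)."""
    v_s = float(w @ phi(s))
    v_s2 = 0.0 if done else float(w @ phi(s2))
    td_error = r + gamma * v_s2 - v_s
    return w + alpha * td_error * phi(s)  # gradient of w @ phi(s) w.r.t. w is phi(s)
```
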
MDPs underlie many applications in robotics, control, economics, operations research, and reinforcement learning.