PolicyEvaluation
PolicyEvaluation refers to the process of determining the value of a fixed policy in a Markov decision process (MDP). The goal is to compute the expected discounted return when following the given policy from each state, enabling assessment of policy quality without altering the policy itself. The action-value counterpart, Q^π(s, a), describes the expected return starting from state s, taking action a, and thereafter following π.
Formally, for a policy π and discount factor γ in [0, 1), the value function is V^π(s) = Eπ[
Algorithms vary from exact to approximate. Exact solutions solve the linear equations directly; iterative policy evaluation
Policy evaluation is a foundational step in policy iteration and many actor-critic algorithms, providing the value