MuZero
MuZero is a model-based reinforcement learning algorithm developed by DeepMind. Introduced in 2019, MuZero achieves high performance by combining tree-search planning with a learned model. Unlike traditional model-based methods, which try to reconstruct the environment's observations or full state, MuZero learns a compact latent representation together with models that predict only the quantities needed for planning: latent-state transitions, rewards, and the policy and value. The agent plans with Monte Carlo Tree Search, simulating outcomes in this learned model and using the search results to select actions. The environment's rules and state need not be known or perfectly modeled; planning operates entirely in the latent space.
MuZero comprises three neural networks: a representation network that maps observations to a latent state, a dynamics network that takes a latent state and an action and predicts the next latent state and the immediate reward, and a prediction network that outputs a policy and a value estimate from a latent state. All three are trained jointly by unrolling the model several steps and matching its predictions against observed rewards and the targets produced by search.
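The sketch below illustrates these three networks with small fully connected layers; the dimensions (obs_dim, latent_dim, num_actions) and layer sizes are hypothetical, and the published networks are large residual convolutional models. The method names initial_inference and recurrent_inference follow the pseudocode released with the MuZero paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MuZeroNets(nn.Module):
    """Illustrative fully connected stand-ins for MuZero's three networks."""

    def __init__(self, obs_dim=8, latent_dim=32, num_actions=4):
        super().__init__()
        self.latent_dim, self.num_actions = latent_dim, num_actions
        # h: representation network, observation -> latent state
        self.representation = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        # g: dynamics network, (latent state, action) -> (next latent state, reward)
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + num_actions, 64), nn.ReLU(),
            nn.Linear(64, latent_dim + 1))
        # f: prediction network, latent state -> (policy logits, value)
        self.prediction = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions + 1))

    def initial_inference(self, obs):
        # Used at the root of the search: embed the observation, then predict.
        s = self.representation(obs)
        out = self.prediction(s)
        return s, out[..., :self.num_actions], out[..., -1]

    def recurrent_inference(self, s, action):
        # Used inside the search: step the latent dynamics, then predict.
        a = F.one_hot(action, self.num_actions).float()
        out = self.dynamics(torch.cat([s, a], dim=-1))
        s_next, reward = out[..., :self.latent_dim], out[..., -1]
        out = self.prediction(s_next)
        return s_next, reward, out[..., :self.num_actions], out[..., -1]
```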
Planning uses MuZero's learned model to perform tree search, accumulating value estimates and rewards along simulated trajectories in the latent space. Each simulation descends the tree, expands a leaf with one step of the dynamics network, and backs up the predicted value toward the root; the resulting visit counts at the root are used to select the action to play and also serve as policy targets for training.
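The following is a deliberately simplified sketch of that search loop, reusing the hypothetical MuZeroNets class from the sketch above. The full algorithm additionally normalizes values inside the pUCT rule, adds Dirichlet exploration noise at the root, and samples actions from the root visit-count distribution during training rather than acting greedily.

```python
import math
import torch

class Node:
    def __init__(self, prior):
        self.prior = prior          # P(a | parent) from the prediction network
        self.visit_count = 0
        self.value_sum = 0.0
        self.children = {}          # action -> Node
        self.state = None           # latent state reached at this node
        self.reward = 0.0           # predicted reward on the edge into this node

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def ucb_score(parent, child, c=1.25):
    # pUCT: balance the child's mean value against its prior-weighted novelty.
    explore = c * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value() + explore

def expand(node, logits):
    priors = torch.softmax(logits, dim=-1)
    for a in range(len(priors)):
        node.children[a] = Node(prior=priors[a].item())

@torch.no_grad()
def run_mcts(nets, obs, num_simulations=50, discount=0.997):
    root = Node(prior=1.0)
    root.state, logits, _ = nets.initial_inference(obs)
    expand(root, logits)
    for _ in range(num_simulations):
        node, path = root, [root]
        # Selection: walk down the tree following the pUCT rule.
        while node.children:
            action, node = max(node.children.items(),
                               key=lambda kv: ucb_score(path[-1], kv[1]))
            path.append(node)
        # Expansion: one step of the learned dynamics from the parent's state.
        s, r, logits, v = nets.recurrent_inference(path[-2].state,
                                                   torch.tensor(action))
        node.state, node.reward = s, r.item()
        expand(node, logits)
        # Backup: propagate the discounted return estimate toward the root.
        value = v.item()
        for n in reversed(path):
            n.value_sum += value
            n.visit_count += 1
            value = n.reward + discount * value
    # Act greedily over root visit counts (the full algorithm samples from them).
    return max(root.children.items(), key=lambda kv: kv[1].visit_count)[0]
```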
MuZero has demonstrated strong performance on Atari 2600 games, where it achieved state-of-the-art results at the time of publication, and on traditional board games such as Go, chess, and shogi, where it matched the performance of AlphaZero despite not being given the rules of the games.
Variants and follow-up work have explored improvements in sample efficiency and data reuse, such as MuZero Reanalyze and EfficientZero, as well as broader applications, including video compression.