A3C

A3C, short for Asynchronous Advantage Actor-Critic, is a policy-based, actor-critic reinforcement learning algorithm that combines multiple parallel actor-learner processes with a shared, global network. Introduced by DeepMind researchers in 2016, it was designed to stabilize and speed up the training of deep policies by decorrelating the data collected by the parallel agents.

In A3C, several worker agents run in parallel in separate instances of the environment. Each worker maintains its own copy of the neural networks, including a policy network π(a|s; θ) and a value network V(s; θv). Workers interact with their environments, compute n-step returns and advantages, and asynchronously push gradients to the global network. Periodically, workers synchronize their local parameters with the global parameters. The asynchronous updates reduce the temporal correlations that hinder learning and make efficient use of multi-core CPUs.
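
The following is a minimal sketch of one worker's update cycle, assuming a small PyTorch actor-critic module standing in for the policy and value networks, a Gymnasium-style environment interface, and placeholder hyperparameters; the names ActorCritic and worker are illustrative, not the original DeepMind implementation.

import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Small shared-body network with a policy head and a value head."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # logits for pi(a|s; theta)
        self.value_head = nn.Linear(hidden, 1)           # V(s; theta_v)

    def forward(self, obs):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)

def worker(global_net, optimizer, env, n_steps=5, gamma=0.99):
    """One actor-learner: sync with the global net, roll out n steps, push gradients.

    `optimizer` is assumed to be a shared optimizer built over global_net.parameters().
    """
    local_net = ActorCritic(env.observation_space.shape[0], env.action_space.n)
    obs, _ = env.reset()
    while True:
        # Synchronize local parameters with the global parameters.
        local_net.load_state_dict(global_net.state_dict())

        log_probs, values, entropies, rewards = [], [], [], []
        done = False
        for _ in range(n_steps):
            logits, value = local_net(torch.as_tensor(obs, dtype=torch.float32))
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            obs, reward, terminated, truncated, _ = env.step(action.item())
            log_probs.append(dist.log_prob(action))
            values.append(value)
            entropies.append(dist.entropy())
            rewards.append(float(reward))
            done = terminated or truncated
            if done:
                obs, _ = env.reset()
                break

        # Bootstrap value for the n-step return: 0 at terminal states, V(s_{t+n}) otherwise.
        with torch.no_grad():
            R = torch.tensor(0.0) if done else local_net(
                torch.as_tensor(obs, dtype=torch.float32))[1]

        # n-step returns, advantages, and the combined loss
        # (spelled out term by term in the sketch after the next paragraph).
        returns = []
        for r in reversed(rewards):
            R = r + gamma * R
            returns.append(R)
        returns = torch.stack(list(reversed(returns)))
        advantages = returns - torch.stack(values)
        loss = (-(torch.stack(log_probs) * advantages.detach()).sum()
                + 0.5 * advantages.pow(2).sum()
                - 0.01 * torch.stack(entropies).sum())

        # Compute gradients locally, then push them to the global network.
        local_net.zero_grad()
        loss.backward()
        for lp, gp in zip(local_net.parameters(), global_net.parameters()):
            gp.grad = lp.grad
        optimizer.step()

In practice each worker would run in its own process (for example via torch.multiprocessing), with global_net.share_memory() called beforehand and a single optimizer constructed over global_net.parameters(), so that every worker writes into the same shared tensors.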

The learning objective combines three components: a policy gradient term using the advantage A(s, a) = Rt − V(s; θv), a value loss that minimizes the squared error between the observed return Rt and the value estimate, and an entropy bonus to encourage exploration. Returns are typically computed with n-step bootstrapping, balancing bias and variance. A3C is on-policy, as updates are based on data generated by the current policy, and uses shared parameterization to learn both the policy and the value function.
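
Expanding the loss line from the worker sketch above, the function below spells out the three terms for one n-step rollout; the value coefficient of 0.5 and entropy coefficient of 0.01 are common defaults used here for illustration rather than prescribed values, and a3c_loss is a hypothetical helper name.

import torch

def a3c_loss(log_probs, values, entropies, rewards, bootstrap_value,
             gamma=0.99, value_coef=0.5, entropy_coef=0.01):
    """Three-term A3C objective for one n-step rollout.

    log_probs, values, entropies: lists of scalar tensors, one per step.
    rewards: list of floats.
    bootstrap_value: V(s_{t+n}; theta_v), or 0.0 if the rollout ended at a terminal state.
    """
    # n-step bootstrapped returns: R_t = r_t + gamma * R_{t+1}, seeded with the bootstrap value.
    R = torch.as_tensor(bootstrap_value, dtype=torch.float32)
    returns = []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns = torch.stack(list(reversed(returns))).detach()

    values = torch.stack(values)
    log_probs = torch.stack(log_probs)
    entropies = torch.stack(entropies)

    # Advantage estimate A(s_t, a_t) = R_t - V(s_t; theta_v).
    advantages = returns - values

    policy_loss = -(log_probs * advantages.detach()).sum()  # policy gradient term
    value_loss = advantages.pow(2).sum()                     # squared error between R_t and V(s_t)
    entropy_bonus = entropies.sum()                          # encourages exploration

    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus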

A3C is applicable to a range of tasks, notably discrete-action environments such as Atari games, and often employs convolutional networks for processing visual input. It does not rely on a replay buffer, which distinguishes it from off-policy methods. A synchronous variant, A2C, simplifies the approach by removing asynchrony, while subsequent methods have built on its ideas to create more scalable distributed RL algorithms.