DDPG

Deep Deterministic Policy Gradient (DDPG) is a model-free, off-policy actor-critic algorithm designed for continuous control tasks. It combines the deterministic policy gradient with deep neural networks to learn a continuous action policy and a value function, using experience replay and target networks to stabilize learning. It was introduced by Lillicrap et al. in 2015.

The agent consists of an actor network μ(s|θ^μ) that outputs a specific action and a critic network Q(s, a|θ^Q) that evaluates the action. Training uses transitions sampled from a replay buffer. The critic is updated by minimizing the TD error with targets y = r + γ Q′(s′, μ′(s′)), where Q′ and μ′ are the corresponding target networks.
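
A rough sketch of this critic step, assuming PyTorch modules named `critic`, `critic_target`, and `actor_target`, an optimizer `critic_opt`, and minibatches of `(s, a, r, s', done)` tensors drawn from the replay buffer (these names and the terminal mask are illustrative, not taken from the original paper):

```python
import torch


def critic_update(batch, actor_target, critic, critic_target, critic_opt, gamma=0.99):
    """One TD-error minimization step for the critic Q(s, a | theta_Q)."""
    s, a, r, s_next, done = batch  # minibatch sampled from the replay buffer

    with torch.no_grad():
        # Target action from the target actor mu'(s'), evaluated by the target critic Q'
        a_next = actor_target(s_next)
        # (1 - done) is the usual terminal-state mask, an implementation detail
        y = r + gamma * (1.0 - done) * critic_target(s_next, a_next)

    td_error = critic(s, a) - y
    loss = td_error.pow(2).mean()  # mean squared TD error

    critic_opt.zero_grad()
    loss.backward()
    critic_opt.step()
    return loss.item()
```
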
The actor is updated by gradient ascent on Q with respect to the actor parameters: ∇_θ^μ Q(s, μ(s|θ^μ)|θ^Q).
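
The matching actor step can be sketched the same way: minimizing the negative critic value of the actor's own action performs gradient ascent on Q with respect to the actor parameters (the network and optimizer names are assumptions carried over from the snippet above):

```python
def actor_update(batch, actor, critic, actor_opt):
    """Deterministic policy gradient step: ascend Q(s, mu(s | theta_mu))."""
    s, *_ = batch

    # Minimizing -Q is equivalent to gradient ascent on Q w.r.t. the actor parameters.
    actor_loss = -critic(s, actor(s)).mean()

    actor_opt.zero_grad()
    actor_loss.backward()  # gradients flow through the critic into the actor
    actor_opt.step()
    return actor_loss.item()
```
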
Exploration is typically provided by a temporally correlated noise process, such as Ornstein–Uhlenbeck noise, added to the actor’s actions during data collection.
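
A minimal Ornstein–Uhlenbeck process for this kind of exploration might look as follows; the θ = 0.15, σ = 0.2, and dt values are common illustrative defaults rather than values prescribed here:

```python
import numpy as np


class OUNoise:
    """Temporally correlated Ornstein-Uhlenbeck noise for exploration."""

    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = np.full(action_dim, mu, dtype=np.float64)

    def reset(self):
        self.x[:] = self.mu

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape))
        self.x += dx
        return self.x.copy()


# During rollout: action = actor(state) + noise.sample(), clipped to the valid action range.
```
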
To improve stability, DDPG uses target networks with soft updates and keeps the continuous action space bounded with squashing functions (e.g., tanh) so that actions stay within valid limits. It is trained off-policy, which allows learning from a fixed dataset, but it can be sensitive to hyperparameters and to correlations in the replayed experience. The original algorithm uses a deterministic policy, which enables efficient policy gradients but requires a good exploration strategy.
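
A sketch of the soft ("Polyak") target update and of tanh action squashing, with τ = 0.005 and `max_action` as assumed example values:

```python
import torch


def soft_update(target_net, online_net, tau=0.005):
    """theta_target <- tau * theta + (1 - tau) * theta_target."""
    with torch.no_grad():
        for p_t, p in zip(target_net.parameters(), online_net.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)


def bounded_action(raw_output, max_action):
    """Squash an unbounded actor output into [-max_action, max_action]."""
    return max_action * torch.tanh(raw_output)
```

In this kind of setup, the soft update is applied to both the target actor and the target critic after each learning step.
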
DDPG has inspired subsequent developments such as Twin Delayed DDPG (TD3) and DDPG from Demonstrations (DDPGfD), which address issues like overestimation bias and sample efficiency. The method has been applied to simulated robotics, autonomous control, and other continuous-action domains.