Research Article

Dynamical Motor Control Learned with Deep Deterministic Policy Gradient

Figure 2

Schematic illustration of the deep deterministic policy gradient (DDPG) method. The critic network approximates the value function by minimizing the temporal-difference (TD) error. The actor network is updated with the gradient supplied by the critic. Two sets of actors and critics are used for stability, shown as the “actor” and “critic” boxes and the “target actor” and “target critic” boxes, respectively. The target critic and target actor are slowly adjusted toward the critic and actor by a “soft update” (curved dashed arrows).
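
For concreteness, the critic loss, actor gradient, and soft update described in the caption can be written in the standard DDPG form; the parameter symbols θ^Q, θ^μ, the minibatch size N, and the averaging constant τ are assumed here for illustration and do not appear in the figure itself:

\[
y_i = r_i + \gamma \, Q'\!\bigl(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\bigr)
\]
\[
L(\theta^{Q}) = \frac{1}{N} \sum_{i} \bigl( y_i - Q(s_i, a_i \mid \theta^{Q}) \bigr)^{2}
\]
\[
\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i}
\nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s = s_i,\, a = \mu(s_i)} \,
\nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s_i}
\]
\[
\theta^{Q'} \leftarrow \tau \, \theta^{Q} + (1 - \tau)\, \theta^{Q'}, \qquad
\theta^{\mu'} \leftarrow \tau \, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}
\]

Here the first two lines are the critic update (TD target formed with the target actor μ' and target critic Q', then a squared TD error minimized over a minibatch), the third is the actor update driven by the critic's action gradient, and the last is the soft update of the target networks with a small τ.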