Research Article

Dynamical Motor Control Learned with Deep Deterministic Policy Gradient

Figure 3

The HF learning of the RNN actor with the deterministic policy gradient. The RNN actor was unfolded in time to show the updates of the neural activity, the action, and the gradient propagation through time. The gradients of the network weights were obtained from the gradients of the value function propagated back from the critic in DDPG (see (6)). Symbols in the figure denote the input, the recurrent, and the output weights, respectively. Note that the task information (the start and target states) was fed to the network only at the initial time step.
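The unrolled-in-time gradient flow the figure depicts can be illustrated with a minimal sketch: a linear RNN actor that receives its input only at the first time step, with the critic's action gradient propagated backward through the recurrence (backpropagation through time). This is an illustrative toy, not the paper's implementation; the names `W_in`, `W_rec`, `W_out`, and the fixed linear critic `q` are assumptions standing in for the actual networks.

```python
import numpy as np

# Toy setup (all shapes and names are illustrative assumptions):
# hidden state  h_0 = W_in @ x0          (task input fed only at t = 0)
#               h_t = W_rec @ h_{t-1}
# action        u_t = W_out @ h_t
# linear "critic"  Q(u) = q @ u,  objective  J = sum_t Q(u_t)
rng = np.random.default_rng(0)
n_in, n_h, n_out, T = 3, 4, 2, 5

W_in = rng.normal(size=(n_h, n_in)) * 0.5
W_rec = rng.normal(size=(n_h, n_h)) * 0.3
W_out = rng.normal(size=(n_out, n_h)) * 0.5
q = rng.normal(size=n_out)       # fixed critic gradient dQ/du_t
x0 = rng.normal(size=n_in)       # task information (start/target encoding)

def rollout(W_in, W_rec, W_out):
    """Unroll the RNN; return the summed critic value and hidden states."""
    h = W_in @ x0
    hs, J = [], 0.0
    for _ in range(T):
        hs.append(h)
        J += q @ (W_out @ h)     # critic value of the action at this step
        h = W_rec @ h
    return J, hs

J, hs = rollout(W_in, W_rec, W_out)

# Backpropagation through time: the critic's action gradient q enters
# through W_out at every step and flows backward through W_rec.
g_in = np.zeros_like(W_in)
g_rec = np.zeros_like(W_rec)
g_out = np.zeros_like(W_out)
delta = np.zeros(n_h)            # dJ/dh_t, accumulated backward in time
for t in reversed(range(T)):
    delta = W_out.T @ q + W_rec.T @ delta
    g_out += np.outer(q, hs[t])
    if t > 0:
        g_rec += np.outer(delta, hs[t - 1])
    else:
        g_in += np.outer(delta, x0)

# DDPG ascends the value estimate, so a plain gradient step would be
# e.g.  W_out += lr * g_out  (HF optimization would instead use these
# gradients inside a curvature-aware update).
```

The gradients can be sanity-checked against finite differences of `rollout`, which is a common way to validate a hand-written BPTT pass.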