Research Article

Dynamical Motor Control Learned with Deep Deterministic Policy Gradient

Figure 4

Training and validation of the reaching movement generation. (a) The trajectories of random reaches after training. The start and target points were drawn from a disk-shaped work space. The trajectories were color-coded by the scale of error that measured the distance between the end state and the target state. (b) The cumulative reward versus the number of episodes. (c) The center-out reaching trajectories generated by the trained model.