Research Article

Emergence of Prediction by Reinforcement Learning Using a Recurrent Neural Network

Figure 5

Change in Q-values during an episode. The four lines show the Q-values for the actions “catch”, “wait”, “move up”, and “move down”. Under greedy action selection, the action with the maximum Q-value is chosen at each step.
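The greedy rule described in the caption can be illustrated with a minimal sketch. The Q-values below are hypothetical numbers chosen for illustration and are not read from the figure; the names `ACTIONS`, `greedy_action`, and `q_trace` are likewise assumptions, and breaking ties in favour of the first listed action is an arbitrary choice rather than something the caption specifies.

```python
# Action names as listed in the caption.
ACTIONS = ("catch", "wait", "move up", "move down")


def greedy_action(q_values):
    """Return the name of the action with the largest Q-value.

    Ties go to the first action in ACTIONS (an assumption for this sketch).
    """
    best_index = max(range(len(q_values)), key=lambda i: q_values[i])
    return ACTIONS[best_index]


# Hypothetical Q-values at three time steps of one episode
# (illustrative only, not data from the figure).
q_trace = [
    [0.10, 0.40, 0.20, 0.30],
    [0.15, 0.35, 0.50, 0.20],
    [0.80, 0.30, 0.25, 0.10],
]

# The greedy action at each step is the one whose line is highest.
greedy_trace = [greedy_action(q) for q in q_trace]
print(greedy_trace)
```

Read against the plot, this corresponds to picking, at each time step, whichever of the four lines is highest.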