Emergence of Prediction by Reinforcement Learning Using a Recurrent Neural Network
Figure 5
Change in Q-values during an episode. The four lines show the Q-values for the actions “catch”, “wait”, “move up”, and “move down”. Under greedy selection, the action with the maximum Q-value is chosen.
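
The greedy selection rule mentioned in the caption can be illustrated with a minimal sketch. The function name `greedy_action`, the ordering of the action list, and the numeric values are assumptions made for illustration only; they are not taken from the paper, and tie-breaking (first maximum wins) is likewise an arbitrary choice.

```python
# Action names as listed in the caption; the ordering is an assumption.
ACTIONS = ("catch", "wait", "move up", "move down")


def greedy_action(q_values):
    """Return the name of the action with the largest value.

    q_values: one number per action, in the same order as ACTIONS.
    If several actions share the maximum, the first one is returned.
    """
    if len(q_values) != len(ACTIONS):
        raise ValueError("expected one value per action")
    best_index = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best_index]


# Hypothetical values at a single time step (not data from the figure).
example_q = [0.12, 0.48, 0.31, 0.05]
chosen = greedy_action(example_q)
```

Applied at every time step of an episode, this rule picks whichever of the four curves is highest at that step.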