Emergence of Prediction by Reinforcement Learning Using a Recurrent Neural Network

<table>Change in <math id="M62" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>Q</mi></mrow></math>-values during one episode. The four lines show the <math id="M63" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>Q</mi></mrow></math>-values for the actions “catch”, “wait”, “move up”, and “move down”. Under greedy selection, the action with the maximum <math id="M64" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>Q</mi></mrow></math>-value is chosen.</table>
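
The greedy rule described in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `greedy_action`, the ordering of the action tuple, and the example Q-values are all assumptions made for the sketch; only the four action names come from the caption.

```python
# Action names are taken from the figure caption; their order here is an assumption.
ACTIONS = ("catch", "wait", "move up", "move down")


def greedy_action(q_values):
    """Return the name of the action with the largest Q-value.

    `q_values` holds one value per action, in the same order as ACTIONS.
    If several actions tie for the maximum, the first one in ACTIONS wins.
    """
    if len(q_values) != len(ACTIONS):
        raise ValueError("expected one Q-value per action")
    best_index = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best_index]


# Hypothetical Q-values at a single time step (not data from the paper).
example_q = [0.12, 0.47, 0.30, 0.05]
print(greedy_action(example_q))
```

Applying this rule at every time step of an episode would trace out which of the four plotted lines is on top at each moment.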

Journal of Robotics

Figure 5: Emergence of Prediction by Reinforcement Learning Using a Recurrent Neural Network