Research Article
Investigating the Effects of Hyperparameters in Quantum-Enhanced Deep Reinforcement Learning
Table 7
Reward and number of timesteps of the agent over the last 200 episodes, for each value of alpha.
| Alpha | Metric | Episode 799 | Episode 849 | Episode 899 | Episode 949 | Episode 999 |
| --- | --- | --- | --- | --- | --- | --- |
| 0.1 | Reward | −0.24 | −0.24 | −0.26 | 0.95 | 0.95 |
| 0.1 | Timesteps | 8 | 8 | 7 | 6 | 6 |
| 0.2 | Reward | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| 0.2 | Timesteps | 6 | 6 | 6 | 6 | 6 |
| 0.3 | Reward | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| 0.3 | Timesteps | 6 | 6 | 6 | 6 | 6 |
| 0.4 | Reward | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| 0.4 | Timesteps | 6 | 6 | 6 | 6 | 6 |