Research Article
Deep Q-Network with Predictive State Models in Partially Observable Domains
Table 2
Best mean reward of each method.
| Method | CartPole-v1 | Swimmer-v1 | Reacher-v1 |
|---|---|---|---|
| DRQN | 189 | 30.23 | −11.25 |
| DQN-1frame | 46.76 | 23.78 | −17.78 |
| RPSR-DQN | 194 | 38.51 | −9.24 |
| RPSP | 116 | 21.32 | −70.23 |