Research Article

Deep Q-Network with Predictive State Models in Partially Observable Domains

Table 2

The best mean reward achieved by each method on the three tasks.

Method       CartPole-v1   Swimmer-v1   Reacher-v1
DRQN         189           30.23        −11.25
DQN-1frame   46.76         23.78        −17.78
RPSR-DQN     194           38.51        −9.24
RPSP         116           21.32        −70.23