Research Article
Ensemble Network Architecture for Deep Reinforcement Learning
Table 1
The columns present the average performance of DQN, DSN, DDQN, EDQN, and TE-DQN after 10000 episodes, using an ε-greedy policy with ε = 0.0001 after 10000 steps. The standard deviation represents the variability over seven independent trials. Average performance improved with the number of averaged networks.
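The evaluation policy named in the caption can be illustrated with a minimal sketch of ε-greedy action selection over Q-values averaged across several networks. This is an assumption-laden illustration, not the authors' implementation: the helper names `ensemble_q` and `epsilon_greedy`, the plain arithmetic mean over networks, and the Q-value numbers are all hypothetical and chosen only to make the example runnable.

```python
import numpy as np


def ensemble_q(q_per_network):
    """Average Q-value estimates across networks (axis 0 = network index).

    Assumption: a plain arithmetic mean; the source does not specify
    how the ensemble members are combined.
    """
    return np.mean(np.asarray(q_per_network, dtype=float), axis=0)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon,
    otherwise the action with the highest Q-value."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)

# Hypothetical Q-value estimates from three networks for four actions.
q_per_network = [
    [0.1, 0.5, 0.2, 0.0],
    [0.2, 0.4, 0.3, 0.1],
    [0.0, 0.6, 0.1, 0.2],
]

q_avg = ensemble_q(q_per_network)
# epsilon = 0.0001 matches the value quoted in the caption.
action = epsilon_greedy(q_avg, epsilon=0.0001, rng=rng)
```

With ε this small the policy is almost always greedy, so the reported scores mostly reflect the quality of the learned Q-values rather than exploration noise.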