Research Article

Ensemble Network Architecture for Deep Reinforcement Learning

Table 1

The table reports the average performance of DQN, DSN, DDQN, EDQN, and TE-DQN after 10000 episodes, using an ε-greedy policy with ε = 0.0001 after 10000 steps. The standard deviation represents the variability over seven independent trials. Average performance improved with the number of averaged networks.

Task (avg. score, std. dev.)    CartPole-v0      MountainCar-v0    LunarLander-v2

DQN                             (264.9, 21.7)    (−148.2, 17.4)    (159.3, 16.7)
DSN                             (167.1, 61.6)    (−137.7, 53.9)    (153.9, 25.2)
Double DQN                      (278.2, 31.8)    (−144.2, 16.8)    (135.8, 11.8)
TE DQN                          (299.1, 1.3)     (−115.6, 21.4)    (186.9, 19.1)
TE DQN                          (300, 0)         (−108.4, 11.9)    (204.4, 13.5)
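
For reference, a minimal sketch of the evaluation policy described in the caption: ε-greedy action selection over Q-values averaged across an ensemble of networks. The q_networks interface and n_actions argument are illustrative assumptions, not the authors' implementation.

```python
import random
import numpy as np

def epsilon_greedy_action(q_networks, state, epsilon=0.0001, n_actions=2):
    """Select an action with an epsilon-greedy policy over the mean
    Q-values of an ensemble of networks.

    q_networks : list of callables mapping a state to a vector of
                 per-action Q-values (hypothetical interface).
    """
    if random.random() < epsilon:
        # Explore: uniform random action.
        return random.randrange(n_actions)
    # Exploit: average the Q-value estimates across the ensemble
    # and take the greedy action.
    q_values = np.mean([net(state) for net in q_networks], axis=0)
    return int(np.argmax(q_values))
```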