Research Article

Ensemble Network Architecture for Deep Reinforcement Learning

Figure 2

Training curves tracking the agent’s average score and average predicted action-value. (a) Performance comparison of all algorithms in terms of the average reward on each task. (b) Average predicted action-value on a held-out set of states for each task. Each point on the curve is the action-value averaged over the held-out set of states. (c) The performance of DQN and TE DQN on each task. The darker line shows the average score of each algorithm; the orange shaded area spans the two extreme values for DQN, and the green shaded area spans the two extreme values for TE DQN.