Ensemble Network Architecture for Deep Reinforcement Learning

<div>Training curves tracking the agent’s average score and average predicted action-value. (a) Average reward of each algorithm on each task. (b) Average predicted action-value on a held-out set of states for each task; each point on the curve is the action-value <i>Q</i> averaged over the held-out set of states. (c) Performance of DQN and TE DQN on each task. The darker line shows each algorithm's average score; the orange shaded area spans the two extreme values for DQN, and the green shaded area spans those for TE DQN.</div>
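
The metric plotted in panel (b) can be illustrated with a minimal Python sketch. It assumes that each held-out state contributes its maximum predicted action-value, as is common when evaluating DQN-style agents; the caption does not state this explicitly. The names `average_predicted_q`, `q_values_fn`, and `toy_q` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence


def average_predicted_q(
    q_values_fn: Callable[[float], Sequence[float]],
    held_out_states: Sequence[float],
) -> float:
    """Return one point of the curve: the mean over held-out states of max_a Q(s, a).

    Assumption: the per-state value is the maximum over actions. Replace max()
    with another reduction if the evaluation protocol differs.
    """
    if not held_out_states:
        raise ValueError("held_out_states must not be empty")
    total = 0.0
    for state in held_out_states:
        # q_values_fn returns one predicted action-value per available action.
        total += max(q_values_fn(state))
    return total / len(held_out_states)


def toy_q(state: float) -> Sequence[float]:
    """Hypothetical two-action Q-function used only to exercise the sketch."""
    return [0.1 * state, 1.0 - 0.1 * state]


# A fixed held-out set is collected once and reused at every evaluation step,
# so successive points on the curve are comparable.
states = [0, 5, 10]
point = average_predicted_q(toy_q, states)
```

In practice `q_values_fn` would wrap a forward pass of the trained network, and `point` would be recorded once per evaluation interval to trace the curve.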

Mathematical Problems in Engineering

Figure 2