Research Article

Ensemble Network Architecture for Deep Reinforcement Learning

Table 1

The table reports the average performance of DQN, DSN, DDQN, EDQN, and TE-DQN after 10000 episodes, using an ε-greedy policy with ε = 0.0001 after 10000 steps. The standard deviation represents the variability over seven independent trials. Average performance improved with the number of averaged networks.

Task (avg. score, std. dev.)    CartPole-v0      MountainCar-v0    LunarLander-v2

DQN                             (264.9, 21.7)    (−148.2, 17.4)    (159.3, 16.7)
DSN                             (167.1, 61.6)    (−137.7, 53.9)    (153.9, 25.2)
Double DQN                      (278.2, 31.8)    (−144.2, 16.8)    (135.8, 11.8)
TE DQN                          (299.1, 1.3)     (−115.6, 21.4)    (186.9, 19.1)
TE DQN                          (300, 0)         (−108.4, 11.9)    (204.4, 13.5)
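
For reference, a minimal sketch of the evaluation policy described in the caption: ε-greedy action selection over Q-values averaged across an ensemble of networks. The q_networks interface and n_actions argument are illustrative assumptions, not the authors' implementation.

```python
import random
import numpy as np

def epsilon_greedy_action(q_networks, state, epsilon=0.0001, n_actions=2):
    """Select an action with an epsilon-greedy policy over the mean
    Q-values of an ensemble of networks.

    q_networks : list of callables mapping a state to a vector of
                 per-action Q-values (hypothetical interface).
    """
    if random.random() < epsilon:
        # Explore: uniform random action.
        return random.randrange(n_actions)
    # Exploit: average the Q-value estimates across the ensemble
    # and take the greedy action.
    q_values = np.mean([net(state) for net in q_networks], axis=0)
    return int(np.argmax(q_values))
```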