A Novel Reinforcement Learning Architecture for Continuous State and Action Spaces

<table> Learning curves of the SARSA A<sup>2</sup>C algorithm using three different function approximators: radial basis functions, multilayer perceptrons, and multilayer perceptrons with a layer of radial basis functions.</table>

Advances in Artificial Intelligence


Figure 8
