A Novel Reinforcement Learning Architecture for Continuous State and Action Spaces

<table>Accumulated frequency: Comparison of the reliability of the policies found with the SARSA Actor-Actor-Critic algorithm and the $Q(\lambda)$-learning algorithm.</table>

Advances in Artificial Intelligence


Figure 10: A Novel Reinforcement Learning Architecture for Continuous State and Action Spaces