Research Article

A Novel Reinforcement Learning Architecture for Continuous State and Action Spaces

Table 1

Comparison of the best policies for the dribbling problem.

                    |              | SARSA(λ)-learning
Algorithm type      | Actor-Critic | ( )-learning
Function approx.    | RBFs         | CMACs
States              | Continuous   | Continuous
Actions             | Continuous   | Discrete
Total learning time | 10 minutes   | 24 hours 30 minutes
Average distance    | 25.45 meters | 29.21 meters
Maximum distance    | 36.23 meters | 39.0 meters