Research Article
EAQR: A Multiagent Q-Learning Algorithm for Coordination of Multiple Agents
Table 1
Average steps for 4-agent/12-vertex box-pushing (evaluation episodes = 50,000).
| | = 100,000 | = 500,000 | = 1,000,000 |
| --- | --- | --- | --- |
| Optimal | 1.71 | 1.71 | 1.71 |
| EAQR | 2.53 ± 0.11 | 1.76 ± 0.03 | 1.74 ± 0.02 |
| WoLF-PHC | 2.83 ± 0.23 | 2.24 ± 0.11 | 1.99 ± 0.06 |
| EMA Q-learning | 4.53 ± 0.49 | 3.66 ± 0.40 | 3.47 ± 0.34 |
| Single-agent RL | 14.78 ± 0.60 | 3.29 ± 0.14 | 2.03 ± 0.06 |