Research Article
EAQR: A Multiagent Q-Learning Algorithm for Coordination of Multiple Agents
Table 5
Average cumulative reward for the DSN problem (evaluation episodes = 5000).
| Algorithm | = 10,000 | = 50,000 | = 100,000 |
| --- | --- | --- | --- |
| EAQR | 41.23 ± 0.52 | 41.99 ± 0.003 | 42 ± 0 |
| WoLF-PHC | 39.96 ± 0.89 | 40.69 ± 0.73 | 40.74 ± 0.68 |
| EMA Q-learning | 36.59 ± 1.76 | 36.14 ± 1.82 | 36.21 ± 1.80 |
| Single-agent RL | 29.88 ± 1.57 | 33.16 ± 1.33 | 34.96 ± 1.05 |