Research Article

EAQR: A Multiagent Q-Learning Algorithm for Coordination of Multiple Agents

Table 5

Average cumulative reward for the DSN problem (evaluation episodes = 5000).

Algorithm          = 10,000        = 50,000         = 100,000

EAQR               41.23 ± 0.52    41.99 ± 0.003    42 ± 0
WoLF-PHC           39.96 ± 0.89    40.69 ± 0.73     40.74 ± 0.68
EMA Q-learning     36.59 ± 1.76    36.14 ± 1.82     36.21 ± 1.80
Single-agent RL    29.88 ± 1.57    33.16 ± 1.33     34.96 ± 1.05