Research Article

EAQR: A Multiagent Q-Learning Algorithm for Coordination of Multiple Agents

Table 1

Average steps for 4-agent/12-vertex box-pushing (evaluation episodes = 50,000).

Method             = 100,000       = 500,000      = 1,000,000

Optimal            1.71            1.71           1.71
EAQR               2.53 ± 0.11     1.76 ± 0.03    1.74 ± 0.02
WoLF-PHC           2.83 ± 0.23     2.24 ± 0.11    1.99 ± 0.06
EMA Q-learning     4.53 ± 0.49     3.66 ± 0.40    3.47 ± 0.34
Single-agent RL    14.78 ± 0.60    3.29 ± 0.14    2.03 ± 0.06