Research Article

EAQR: A Multiagent Q-Learning Algorithm for Coordination of Multiple Agents

Table 3

Maximal steps for 4-agent/12-vertex box-pushing (evaluation episodes = 50,000).

Algorithm          = 100,000    = 500,000    = 1,000,000

EAQR               2.77         1.85         1.81
WoLF-PHC           3.45         2.55         2.18
EMA Q-learning     5.90         4.66         4.61
Single-agent RL    15.89        3.66         2.20