Complexity

Research Article

EAQR: A Multiagent Q-Learning Algorithm for Coordination of Multiple Agents

EAQR for repeated games.

1: for each agent i, do
2: initialize with a number within (0,1) for ,
3: initialize with a number within (0,1)
4: : frequency of getting the maximum global immediate reward after selecting action
5: : number of sample games played
6: repeat for each game
7: select an action with the probability of

8:
9: execute action , update information about reward
10: if then
11: for each action do
12: evaluate according to (4)
13:
14: end for each action
15:
16: end if
17: until the predefined number of games has been played
18: end for each agent
19: return Q-value function for each agent
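
The control flow of the listing above can be sketched in Python for a two-agent, common-payoff repeated game. This is only an illustrative sketch, not the authors' implementation. The listing's symbols are missing and rule (4) is not reproduced in this section, so the selection rule, the evaluation formula, the test in step 10, and all names and parameters (`selection_probs`, `run_repeated_game`, `epsilon`, `batch`, `alpha`) are hypothetical stand-ins. Payoffs are assumed to lie in [0, 1].

```python
import random


def selection_probs(evals, epsilon=0.1):
    """Assumed selection rule: uniform exploration mixed with evaluation-proportional choice."""
    total = sum(evals)
    k = len(evals)
    return [epsilon / k + (1.0 - epsilon) * e / total for e in evals]


def run_repeated_game(payoff, n_games=2000, batch=20, alpha=0.1, seed=0):
    """Two independent learners on a shared payoff matrix with entries in [0, 1]."""
    rng = random.Random(seed)
    sizes = [len(payoff), len(payoff[0])]      # action counts of agent 0 and agent 1
    r_max = max(max(row) for row in payoff)    # maximum global immediate reward

    # Steps 2-3: two tables per agent, each initialised inside (0, 1).
    Q = [[rng.uniform(0.01, 0.99) for _ in range(m)] for m in sizes]
    E = [[rng.uniform(0.01, 0.99) for _ in range(m)] for m in sizes]

    # Steps 4-5: per-action statistics gathered over the current batch of sample games.
    count = [[0] * m for m in sizes]
    hits = [[0] * m for m in sizes]            # times the maximum reward was obtained
    total = [[0.0] * m for m in sizes]         # summed rewards

    for game in range(1, n_games + 1):         # step 6
        # Step 7: each agent samples an action from its own distribution.
        acts = [rng.choices(range(m), weights=selection_probs(E[i]))[0]
                for i, m in enumerate(sizes)]
        reward = payoff[acts[0]][acts[1]]

        # Step 9: execute the joint action and record the reward information.
        for i, a in enumerate(acts):
            count[i][a] += 1
            total[i][a] += reward
            hits[i][a] += int(reward == r_max)

        # Step 10 (assumed condition): re-evaluate after every `batch` games.
        if game % batch == 0:
            for i, m in enumerate(sizes):
                for a in range(m):             # steps 11-14
                    if count[i][a] == 0:
                        continue               # action not sampled in this batch
                    avg = total[i][a] / count[i][a]
                    freq = hits[i][a] / count[i][a]
                    Q[i][a] += alpha * (avg - Q[i][a])
                    # Hypothetical stand-in for rule (4): blend r_max and Q by frequency.
                    E[i][a] = max(1e-6, freq * r_max + (1.0 - freq) * Q[i][a])
                # Assumed reset of the batch statistics.
                count[i] = [0] * m
                hits[i] = [0] * m
                total[i] = [0.0] * m

    return Q                                   # step 19: Q-value table of each agent
```

Calling `run_repeated_game([[1.0, 0.0], [0.0, 0.5]])` returns one Q-value list per agent. Because every update is a convex combination of the old value and an average reward, the returned values stay inside [0, 1] for payoffs in that range.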