Research Article

EAQR: A Multiagent Q-Learning Algorithm for Coordination of Multiple Agents

Algorithm 1: EAQR for repeated games.
1: for each agent i, do
2: initialize with a number within (0,1) for ,
3: initialize with a number within (0,1)
4: : frequency of getting the maximum global immediate reward after selecting action
5: : number of sample games played
6: repeat for each game
7:  select an action with the probability of
   
8:  
9:  execute the selected action, update information about reward
10:  if then
11:   for each action do
12:    evaluate according to (4)
13:    
14:   end for each action
15:   
16:  end if
17: until the predefined number of games has been played
18: end for each agent
19: return Q-value function for each agent
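
The control flow of Algorithm 1 can be illustrated with a short Python sketch. Several symbols in the listing (the initialized quantities in steps 2 and 3, the selection probability in step 7, the condition in step 10, and the updates in steps 13 and 15) as well as Eq. (4) are not legible here, so the sketch fills them with labeled assumptions rather than the authors' definitions: actions are drawn with probability proportional to their current value, the evaluation is the empirical frequency of receiving the maximum reward, and values are refreshed every `batch` games with a step size `lr`. The names `run_sketch`, `coordination_reward`, `batch`, `lr`, `floor`, and `max_reward` are hypothetical. The sketch also loops over games first and agents second, since the agents act simultaneously in each game.

```python
import random


def run_sketch(reward_fn, max_reward, n_agents=2, n_actions=2,
               n_games=2000, batch=50, lr=0.5, floor=1e-3, seed=0):
    """Loop skeleton of Algorithm 1 with assumed (hypothetical) update rules."""
    rng = random.Random(seed)
    # Steps 2-3: per-agent values initialised inside (0, 1).
    q = [[rng.uniform(0.05, 0.95) for _ in range(n_actions)]
         for _ in range(n_agents)]
    # Steps 4-5: counters behind the "frequency of maximum reward" statistic.
    hits = [[0] * n_actions for _ in range(n_agents)]
    tries = [[0] * n_actions for _ in range(n_agents)]

    for game in range(1, n_games + 1):
        # Step 7 (ASSUMPTION): pick each action with probability
        # proportional to its current value.
        joint = [rng.choices(range(n_actions), weights=q[i])[0]
                 for i in range(n_agents)]
        # Step 9: execute the joint action and record the shared reward.
        reward = reward_fn(joint)
        for i, a in enumerate(joint):
            tries[i][a] += 1
            if reward >= max_reward:
                hits[i][a] += 1
        # Step 10 (ASSUMPTION): refresh the values every `batch` games.
        if game % batch == 0:
            for i in range(n_agents):
                for a in range(n_actions):
                    if tries[i][a] == 0:
                        continue  # no evidence for this action yet
                    # Step 12 stand-in for Eq. (4): empirical frequency.
                    estimate = hits[i][a] / tries[i][a]
                    # Step 13 stand-in: move the value toward the estimate,
                    # keeping it positive so it remains a valid weight.
                    q[i][a] = max(floor, q[i][a] + lr * (estimate - q[i][a]))
    return q  # Step 19: one value table per agent


def coordination_reward(joint):
    """Toy game: reward 1 only when every agent plays action 0."""
    return 1.0 if all(a == 0 for a in joint) else 0.0


if __name__ == "__main__":
    for i, row in enumerate(run_sketch(coordination_reward, max_reward=1.0)):
        print(i, [round(v, 3) for v in row])
```

In this toy game only the joint action (0, 0) pays the maximum reward, so under the assumed rules each agent's value for action 0 should grow relative to action 1. Passing `max_reward` in explicitly is a simplification; an agent that does not know the maximum in advance would have to track the largest reward observed so far.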