Research Article
EAQR: A Multiagent Q-Learning Algorithm for Coordination of Multiple Agents
1: for each agent i, do
2:     initialize with a number within (0,1) for ,
3:     initialize with a number within (0,1)
4:     : frequency of getting the maximum global immediate reward after selecting action
5:     : number of sample games played
6:     repeat for each game
7:         select an action with the probability of
8:
9:         execute action , update information about reward
10:        if then
11:            for each action do
12:                evaluate according to (4)
13:
14:            end for each action
15:
16:        end if
17:    until the predefined number of games have been played
18: end for each agent
19: return Q-value function for each agent
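The loop structure of the listing above can be illustrated with a minimal Python sketch. Several pieces are assumptions made only for illustration, because the corresponding symbols and equation (4) are not available in this section: actions are sampled with probability proportional to their Q-values, the condition in step 10 is taken to be "a batch of `sample_size` games has been played", the evaluation in step 12 is replaced by a stand-in rule that moves each Q-value toward the observed frequency of obtaining the maximum global reward, and the maximum reward is assumed to be known in advance. The names `PAYOFF`, `select_action`, `run_sketch`, `sample_size`, and `lr` are hypothetical.

```python
import random

# Illustrative 3x3 cooperative payoff matrix (an assumption, not from the listing).
# Both agents receive PAYOFF[a0][a1]; the maximum global reward is 11 at (0, 0).
PAYOFF = [[11, -30, 0],
          [-30, 7, 6],
          [0, 0, 5]]


def select_action(q, rng):
    """Sample an action index with probability q[a] / sum(q) (assumed rule)."""
    threshold = rng.random() * sum(q)
    acc = 0.0
    for a, value in enumerate(q):
        acc += value
        if threshold < acc:
            return a
    return len(q) - 1  # guard against floating-point round-off


def run_sketch(payoff, n_games=5000, sample_size=50, lr=0.1, seed=0):
    """Two-agent repeated game; returns one list of Q-values per agent."""
    rng = random.Random(seed)
    n_agents, n_actions = 2, len(payoff)
    r_max = max(max(row) for row in payoff)  # assumed known in advance
    # Initialization: every Q-value starts strictly inside (0, 1).
    q = [[rng.uniform(0.01, 0.99) for _ in range(n_actions)]
         for _ in range(n_agents)]
    hits = [[0] * n_actions for _ in range(n_agents)]   # max-reward counts
    plays = [[0] * n_actions for _ in range(n_agents)]  # selection counts

    for game in range(1, n_games + 1):
        # Each agent selects independently, then the joint action is executed.
        joint = [select_action(q[i], rng) for i in range(n_agents)]
        reward = payoff[joint[0]][joint[1]]
        for i, a in enumerate(joint):
            plays[i][a] += 1
            hits[i][a] += int(reward == r_max)

        # Assumed trigger: re-evaluate after every batch of sample games.
        if game % sample_size == 0:
            for i in range(n_agents):
                for a in range(n_actions):
                    if plays[i][a] > 0:
                        freq = hits[i][a] / plays[i][a]
                        # Stand-in for equation (4): move Q toward the frequency.
                        q[i][a] += lr * (freq - q[i][a])
                hits[i] = [0] * n_actions
                plays[i] = [0] * n_actions
    return q
```

Calling `run_sketch(PAYOFF)` returns the learned Q-values for both agents. Under the assumptions above, the Q-value of the action that can reach the maximum global reward tends to dominate the others, while actions that never reach it decay toward zero.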