Research Article

Coordinated Learning by Model Difference Identification in Multiagent Systems with Sparse Interactions

Algorithm 2

Model-based difference degree learning approach for agent i.
Input: Learning rate, discount factor, exploration factor, individual original MDP, individual
optimal Q-values of agent i, threshold value proportion, integer for Monte Carlo sampling, time limit for
per-episode learning
(1)  Initialize with individual optimal policy of agent i, initialize to ;
(2)  Identify coordinated states for agent i by calling Algorithm 1;
(3)  Initialize local state for agent i, check whether the initial state is in coordination;
(4)  for do
(5)     observe current local state for agent i;
(6)     ;
(7)     if agent , agent is in coordination at time then
(8)        select according to using ;
(9)     else
(10)       select according to using ;
(11)    end if
(12)    receive reward and transition state for each agent;
(13)    if agent , is part of an augmented coordinated state and is included in the new global state then
(14)       if is not in state space then
(15)          extend to include joint state and all the available action pairs ;
(16)          ;
(17)       end if
(18)       mark that agent is in coordination at time and that the coordinated state for agent is ;
(19)    end if
(20)    if agent , agent is in coordination at time then
(21)       if agent is in coordination at time then
(22)          Update according to (5);
(23)       else
(24)          Update according to (6);
(25)       end if
(26)    else
(27)       if agent is in coordination at time then
(28)          Update according to (7);
(29)       else
(30)          Update according to (8);
(31)       end if
(32)    end if
(33)    , , ;
(34)    if is a terminal state then return;
(35) end for
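
The control flow of Algorithm 2 can be illustrated with a minimal Python sketch. It covers only the three parts of the listing that are recoverable here: choosing between a local table and a joint-state table depending on whether the agent is in coordination, extending the joint table the first time a joint state is seen, and dispatching to one of the four update rules (5) to (8) from the two nested coordination tests. Everything named below (`SparseInteractionAgent`, `extend_joint`, `select_action`, `update_rule`, `epsilon_greedy`) is hypothetical. The sketch assumes epsilon-greedy selection, assumes a zero initial value for new joint-state entries, and indexes the joint table by individual actions rather than action pairs for brevity. The update equations themselves are not reproduced, so `update_rule` only reports which equation would apply.

```python
import random


def epsilon_greedy(q_row, actions, epsilon, rng):
    """Return a random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_row.get(a, 0.0))


def update_rule(outer_coordinated, inner_coordinated):
    """Map the two nested coordination tests to an equation number (5)-(8).

    Assumption: the outer test refers to the current step and the inner test
    to the step after the transition; the listing does not show the indices.
    """
    if outer_coordinated:
        return 5 if inner_coordinated else 6
    return 7 if inner_coordinated else 8


class SparseInteractionAgent:
    """Hypothetical agent holding a local table and a lazily grown joint table."""

    def __init__(self, local_q, actions, epsilon=0.1, seed=0):
        self.local_q = local_q        # local state -> {action: value}
        self.joint_q = {}             # joint state -> {action: value}, starts empty
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def extend_joint(self, joint_state):
        """Add a joint state on first sight (zero init is a placeholder assumption)."""
        if joint_state not in self.joint_q:
            self.joint_q[joint_state] = {a: 0.0 for a in self.actions}

    def select_action(self, local_state, joint_state, in_coordination):
        """Use the joint table when coordinating, the local table otherwise."""
        if in_coordination:
            self.extend_joint(joint_state)
            row = self.joint_q[joint_state]
        else:
            row = self.local_q.get(local_state, {})
        return epsilon_greedy(row, self.actions, self.epsilon, self.rng)


# Example: with epsilon = 0 the choice is purely greedy on the local table.
agent = SparseInteractionAgent(
    {"s0": {"left": 1.0, "right": 2.0}}, ["left", "right"], epsilon=0.0
)
choice = agent.select_action("s0", ("s0", "t0"), in_coordination=False)
```

Keeping the joint table empty until a coordinated joint state is actually encountered means the agent stores joint values only where interaction occurs, which is the point of exploiting sparse interactions.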