Research Article

Learning Attentional and Gated Communication via Curiosity

Figure 1

The detailed architecture of IMMAC. At each time step, an agent receives its local observation and shares the observed information with the other agents. It then receives the integrated messages from the communication channel and produces an action to interact with the environment. More specifically, the policy network takes the local observation and the aggregated message as input and outputs action values for the available actions. The intrinsic value network takes the local observation as input and outputs an observation-dependent value, which is used to distinguish important local observations.
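To make the data flow in Figure 1 concrete, here is a minimal, pure-Python sketch of one communication step for a team of agents. It is not the IMMAC implementation. The single linear layers, the dot-product attention used to aggregate received messages, and the rule that an agent broadcasts only when its intrinsic value exceeds a fixed threshold are assumptions made for illustration. Every name (`Agent`, `aggregate`, `step`, `threshold`) is hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    # Hypothetical dense layer: small random weights, zero bias.
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def linear(weights, bias, x):
    # y = W x + b, with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


class Agent:
    def __init__(self, obs_dim, msg_dim, n_actions, rng):
        self.encoder = init_layer(obs_dim, msg_dim, rng)              # observation -> message
        self.intrinsic = init_layer(obs_dim, 1, rng)                  # observation -> scalar value
        self.policy = init_layer(obs_dim + msg_dim, n_actions, rng)   # [obs, msg] -> action values

    def message(self, obs):
        return linear(*self.encoder, obs)

    def intrinsic_value(self, obs):
        return linear(*self.intrinsic, obs)[0]

    def q_values(self, obs, agg_msg):
        # Policy input is the concatenation of observation and aggregated message.
        return linear(*self.policy, obs + agg_msg)


def aggregate(query, messages):
    # Assumed dot-product attention: weight each received message by
    # softmax(query . message); return zeros if nothing was received.
    if not messages:
        return [0.0] * len(query)
    weights = softmax([sum(q * k for q, k in zip(query, m)) for m in messages])
    return [sum(w * m[d] for w, m in zip(weights, messages)) for d in range(len(query))]


def step(agents, observations, threshold):
    msgs = [a.message(o) for a, o in zip(agents, observations)]
    # Assumed gate: an agent broadcasts only if its intrinsic value exceeds threshold.
    gates = [a.intrinsic_value(o) > threshold for a, o in zip(agents, observations)]
    actions = []
    for i, (agent, obs) in enumerate(zip(agents, observations)):
        received = [msgs[j] for j in range(len(agents)) if j != i and gates[j]]
        q = agent.q_values(obs, aggregate(msgs[i], received))
        actions.append(max(range(len(q)), key=q.__getitem__))  # greedy action
    return actions, gates


if __name__ == "__main__":
    rng = random.Random(0)
    team = [Agent(obs_dim=4, msg_dim=3, n_actions=5, rng=rng) for _ in range(3)]
    obs = [[rng.random() for _ in range(4)] for _ in range(3)]
    print(step(team, obs, threshold=0.0))
```

Swapping the hard threshold for a learned or stochastic gate would leave the rest of the step unchanged, which is why the gate is kept as a separate list here.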