Research Article
AIBPO: Combine the Intrinsic Reward and Auxiliary Task for 3D Strategy Game
Figure 1
The model of the IIML (the agent takes an action according to its policy and receives external reward feedback from the external environment; meanwhile, the internal environment generates internal reward information regardless of whether the agent obtains an external reward in the current state).
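The reward flow in Figure 1 can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the source. The internal environment is modeled as a hypothetical count-based novelty bonus, 1/sqrt(N(s)). The two signals are combined as r = r_ext + beta * r_int, where `beta` is a hypothetical weighting coefficient. The names `CountBasedIntrinsicReward`, `combined_reward`, and `shaped_step` are illustrative only. This is not the authors' IIML or AIBPO implementation. It only shows the property the caption describes: an intrinsic reward is produced on every step, even when the external reward is zero.

```python
import math
from collections import defaultdict


class CountBasedIntrinsicReward:
    """Hypothetical internal environment: a count-based novelty bonus.

    Returns 1/sqrt(N(s)), where N(s) counts visits to state s, so rarely
    visited states yield larger intrinsic rewards (an assumed form).
    """

    def __init__(self):
        self.visits = defaultdict(int)

    def __call__(self, state):
        self.visits[state] += 1
        return 1.0 / math.sqrt(self.visits[state])


def combined_reward(external_reward, intrinsic_reward, beta=0.1):
    # Assumed additive combination; beta is a hypothetical weight.
    return external_reward + beta * intrinsic_reward


def shaped_step(env_step, intrinsic, state, action, beta=0.1):
    """One agent step: the external env returns (next_state, r_ext), and
    the internal module emits r_int whether or not r_ext is zero."""
    next_state, r_ext = env_step(state, action)
    r_int = intrinsic(next_state)
    return next_state, combined_reward(r_ext, r_int, beta)
```

In a sparse-reward setting such as a 3D strategy game, many steps return `r_ext == 0`. Under this sketch's assumptions, the agent still receives a non-zero learning signal on those steps through `r_int`.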