Research Article
AIBPO: Combine the Intrinsic Reward and Auxiliary Task for 3D Strategy Game
Figure 2
The structure of IBPO: (a) the intrinsic reward generation module, which has two channels, one outputting the target vector and the other outputting the prediction vector, whose combination forms the intrinsic reward; (b) differentiated reward integration, which integrates the two rewards.
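The two-channel idea in the caption can be illustrated with a minimal sketch. The caption does not say how the target and prediction vectors are combined, nor how the two rewards are integrated, so the sketch makes two assumptions: the intrinsic reward is the mean squared error between the two vectors (the form used in random network distillation), and the integration is a plain weighted sum. The linear channels, the helper names, and the coefficient `beta` are all hypothetical and are not taken from the paper.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return a random (out_dim x in_dim) weight matrix as nested lists."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def forward(weights, state):
    """Apply a linear map to a state vector (stand-in for one channel)."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def intrinsic_reward(target_w, predictor_w, state):
    """Assumed combination: mean squared error between the two channel outputs."""
    target = forward(target_w, state)          # target vector
    prediction = forward(predictor_w, state)   # prediction vector
    return sum((t - p) ** 2 for t, p in zip(target, prediction)) / len(target)


def integrate_rewards(extrinsic, intrinsic, beta=0.5):
    """Assumed integration: weighted sum; beta is an illustrative coefficient."""
    return extrinsic + beta * intrinsic


rng = random.Random(0)
target_w = make_linear(4, 3, rng)      # target channel, kept fixed
predictor_w = make_linear(4, 3, rng)   # prediction channel, would be trained
state = [0.1, -0.2, 0.3, 0.4]

r_int = intrinsic_reward(target_w, predictor_w, state)
r_total = integrate_rewards(1.0, r_int)
```

Under these assumptions, a state on which the prediction channel already matches the target channel yields zero intrinsic reward, so the bonus is largest for states the predictor has not yet learned to reproduce.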