AIBPO: Combine the Intrinsic Reward and Auxiliary Task for 3D Strategy Game

<div>Experimental results of IBPO in the pathfinding scenario. The <i>y</i>-axis shows the average reward obtained by the DRL agents and the <i>x</i>-axis shows the training timestep; a higher reward indicates a better-performing method.</div>

Complexity

Figure 5