Research Article

AIBPO: Combine the Intrinsic Reward and Auxiliary Task for 3D Strategy Game

Figure 5

Experimental results of IBPO in the pathfinding scenario. The y-axis shows the average reward obtained by the DRL agents, and the x-axis shows the training timestep; a higher average reward indicates a better-performing method.