Research Article

AIBPO: Combine the Intrinsic Reward and Auxiliary Task for 3D Strategy Game

Table 1: The network structure.

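| NN1       | Parameter          | NN2       | Parameter         | NN3  | Parameter                         |
|-----------|--------------------|-----------|-------------------|------|-----------------------------------|
| Conv      | Conv (64, 3, 3)    | Conv      | Conv (128, 3, 3)  | Conv | Conv (64, 3, 3)                   |
| Conv      | Conv (32, 3, 3)    | Conv      | Conv (32, 3, 3)   | Conv | Conv (32, 3, 3)                   |
| Active    | ReLU               | Active    | BN                | Fc   | Linear (64, 3), reward prediction |
| Transform | Flatten            | Active    | ReLU              | Fc   | Linear (64, 3), reward prediction |
| Fc        | Linear (288, 256)  | Transform | Flatten           | LSTM | Hidden state 256, state value     |
| Fc        | Linear (256, 256)  | Fc        | Linear (288, 256) | Fc   | Linear (256, 1), state value      |
| Actor     | Linear (256, 4), a |           |                   | LSTM | Hidden state 256, action value    |
| Critic    | Linear (256, 1), r |           |                   | Fc   | Linear (256, 1), action value     |
| Critic    | Linear (256, 1), r |           |                   |      |                                   |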
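
The parameters in Table 1 can be sanity-checked with a little shape arithmetic. The sketch below assumes that Conv (c, 3, 3) denotes c output channels with a 3 × 3 kernel, and that the convolutions use stride 1 and no padding on a 7 × 7 input. The table states none of the stride, the padding, or the input size, so these values and the helper names are illustrative assumptions only. Under them, the flattened output of the second convolution has 32 × 3 × 3 = 288 features, which matches the input width of Linear (288, 256).

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after one square convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size, conv_layers):
    """Feature count after a stack of convolutions followed by Flatten.

    conv_layers is a list of (out_channels, kernel_size) pairs.
    """
    size = input_size
    channels = 0
    for out_channels, kernel in conv_layers:
        size = conv_output_size(size, kernel)
        channels = out_channels
    return channels * size * size


# NN1 from Table 1: Conv (64, 3, 3) followed by Conv (32, 3, 3).
NN1_CONVS = [(64, 3), (32, 3)]
ASSUMED_INPUT_SIZE = 7  # hypothetical; the table does not give the input size

flat = flattened_features(ASSUMED_INPUT_SIZE, NN1_CONVS)
print(flat)  # 288, the input width of Linear (288, 256)
```

The same check applies to NN2: only the channel count of the last convolution and the final spatial size affect the flattened width, so Conv (128, 3, 3) followed by Conv (32, 3, 3) also yields 288 under these assumptions.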