Research Article

Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm

Table 5

Comparison of aggregated policies with different numbers of subpolicies.

Number of subpolicies  Training time (hours)  Total steps  Total reward  Pass Aalborg  Pass CG1  Pass CG2

3                      22.84                  5000         331086.10     Yes           Yes       Yes
5                      24.40                  5000         360804.43     Yes           Yes       Yes
10                     24.16                  5000         303678.65     Yes           Yes       Yes
15                     22.09                  771          47121.87      Yes           No        Yes
20                     20.49                  567          34343.05      Yes           No        Yes
30                     21.74                  1541         97146.37      Yes           No        Yes