Research Article

Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm

Table 1

Performance comparison of subpolicies and the aggregated policy.

Policy              Episodes  Total reward  Average reward

Subpolicy 1         20        720.69        36.03
Subpolicy 2         20        538.28        26.91
Subpolicy 3         20        463.98        23.20
Aggregated policy   20        829.17        41.46
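
The average reward column appears to be the total reward divided by the number of episodes. The sketch below recomputes it from the values in Table 1 under that assumption and compares the aggregated policy with the best single subpolicy. The names `table1` and `average_reward` are illustrative and do not come from the article.

```python
# Values transcribed from Table 1: (episodes, total reward, reported average reward).
table1 = {
    "Subpolicy 1": (20, 720.69, 36.03),
    "Subpolicy 2": (20, 538.28, 26.91),
    "Subpolicy 3": (20, 463.98, 23.20),
    "Aggregated policy": (20, 829.17, 41.46),
}


def average_reward(total_reward, episodes):
    """Assumed definition: total reward divided by number of episodes."""
    return total_reward / episodes


# Absolute difference between the recomputed and the reported average.
differences = {}
for name, (episodes, total, reported) in table1.items():
    computed = average_reward(total, episodes)
    differences[name] = abs(computed - reported)
    print(f"{name}: computed {computed:.4f}, reported {reported:.2f}")

# Best single subpolicy by total reward, excluding the aggregated policy.
subpolicies = {k: v for k, v in table1.items() if k.startswith("Subpolicy")}
best_subpolicy = max(subpolicies, key=lambda k: subpolicies[k][1])

# Relative gain in total reward of the aggregated policy over that subpolicy.
relative_gain = (
    table1["Aggregated policy"][1] - table1[best_subpolicy][1]
) / table1[best_subpolicy][1]
print(f"Best subpolicy: {best_subpolicy}, relative gain: {relative_gain:.1%}")
```

A recomputed average that matches the reported one to within rounding (0.005) supports the assumed definition of the column.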