Research Article
Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm
Table 5
Comparison of aggregated policies with different numbers of subpolicies.
| Number of subpolicies | Training time (hours) | Total steps | Total reward | Pass Aalborg | Pass CG1 | Pass CG2 |
| --- | --- | --- | --- | --- | --- | --- |
| 3 | 22.84 | 5000 | 331086.10 | Yes | Yes | Yes |
| 5 | 24.40 | 5000 | 360804.43 | Yes | Yes | Yes |
| 10 | 24.16 | 5000 | 303678.65 | Yes | Yes | Yes |
| 15 | 22.09 | 771 | 47121.87 | Yes | No | Yes |
| 20 | 20.49 | 567 | 34343.05 | Yes | No | Yes |
| 30 | 21.74 | 1541 | 97146.37 | Yes | No | Yes |