Research Article
Learning Diverse Policies with Soft Self-Generated Guidance
Figure 8
Learning curves of average return and success rate in the ant maze. Specifically, the success rate measures how often agents reach the globally optimal goal during training.
(a)
(b)