Research Article

Learning Diverse Policies with Soft Self-Generated Guidance

Figure 4

Learning curves of average return and success rate in the huge gridworld with sparse rewards. Specially, the success rate is used to illustrate the frequency at which agents reach the globally optimal goal during training process.
(a)
(b)