Learning Diverse Policies with Soft Self-Generated Guidance

<div>Learning curves of average return and success rate in swimmer maze. Specially, the success rate is used to illustrate the frequency at which agents reach the globally optimal goal during training process.</div>

International Journal of Intelligent Systems

fig7

Figure 7

Figure 7: Learning Diverse Policies with Soft Self-Generated Guidance