Research Article
Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning
Figure 4
Optimal policy and value function learned by AC-HMLP and RAC-HMLP.
(a) Optimal policy of AC-HMLP learned after training
(b) Optimal value function of AC-HMLP learned after training
(c) Optimal policy of RAC-HMLP learned after training
(d) Optimal value function of RAC-HMLP learned after training