Research Article

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

Figure 4

Optimal policies and value functions learned by AC-HMLP and RAC-HMLP.
(a) Optimal policy of AC-HMLP learned after training
(b) Optimal value function of AC-HMLP learned after training
(c) Optimal policy of RAC-HMLP learned after training
(d) Optimal value function of RAC-HMLP learned after training