Research Article
Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning
(1) Loop for time steps
    (2) …, …
    (3) Choose … according to …
    (4) …
    (5) Predict the next state and the reward: …
    (6) Update the eligibility: …
    (7) Compute the TD error: …
    (8) Update the value function parameter: …
    (9) Update the policy parameter: …
    (10) If …
        (11) …
    (12) End If
    (13) Update the number of samples: …
(14) End Loop
Output: …, …
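The planning loop above interleaves model predictions with actor-critic updates. The following is a minimal Python sketch of that loop under stated assumptions, not the authors' implementation. It assumes a tabular value function, a softmax policy over discrete actions, an accumulating eligibility trace used only by the critic, and a learned model exposed as a callable `model(s, a)` that returns `(next_state, reward)`. The function name `planning_actor_critic`, the `is_terminal` predicate, and the reading of the If-branch (steps 10 to 12) as "reset the state and trace when a terminal state is predicted" are hypothetical. The hierarchical structure of the paper's model is not reproduced here.

```python
import numpy as np


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    z = prefs - np.max(prefs)
    p = np.exp(z)
    return p / p.sum()


def planning_actor_critic(model, is_terminal, theta, w, s0, n_steps,
                          alpha_v=0.1, alpha_p=0.1, gamma=0.95, lam=0.8, seed=0):
    """Hypothetical planning loop driven by a learned model (sketch only).

    model(s, a) -> (next_state, reward): assumed learned elsewhere.
    theta: policy preferences, shape (n_states, n_actions) (assumption).
    w:     tabular state values, shape (n_states,) (assumption).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = theta.shape
    e = np.zeros(n_states)                      # critic eligibility trace
    s = s0
    n_samples = 0
    for _ in range(n_steps):                    # loop for time steps
        probs = softmax(theta[s])
        a = int(rng.choice(n_actions, p=probs)) # choose action from the policy
        s_next, r = model(s, a)                 # predict next state and reward
        e = gamma * lam * e                     # decay the eligibility ...
        e[s] += 1.0                             # ... and accumulate for s
        v_next = 0.0 if is_terminal(s_next) else w[s_next]
        delta = r + gamma * v_next - w[s]       # TD error
        w += alpha_v * delta * e                # value function update
        grad_log = -probs                       # grad of log softmax w.r.t. theta[s]
        grad_log[a] += 1.0
        theta[s] += alpha_p * delta * grad_log  # policy update
        if is_terminal(s_next):                 # assumed If-branch: reset
            s = s0
            e[:] = 0.0
        else:
            s = s_next
        n_samples += 1                          # update the number of samples
    return theta, w, n_samples
```

As a usage example, a one-step task in which every action ends the episode with reward 1 drives the start-state value toward 1 geometrically, since each step applies `w[0] += alpha_v * (1 - w[0])`. Returning both `theta` and `w` is assumed to match the two outputs listed in the algorithm.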