Research Article
Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning
Algorithm 4
Global model planning ( ).
(1) Loop for times
(2)   ,
(3)   Repeat all episodes
(4)     Choose according to
(5)     Compute exploration term:
(6)     Predict the next state:
(7)     Predict the reward:
(8)     Update the eligibility:
(9)     Compute the TD error:
(10)    Update the value function parameter:
(11)    Update the policy parameter:
(12)    If
(13)
(14)    End If
(15)    Update the number of samples:
(16)  End Repeat
(17) End Loop
Output: ,
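The control flow of the listing above can be illustrated with a minimal Python sketch. The formulas of Algorithm 4 are not reproduced in this listing, so every concrete choice below is an assumption made only for illustration: a scalar state, the hypothetical feature map `features`, a Gaussian policy with a linear mean, linear state and reward models (`state_model`, `reward_model`), a count-based exploration bonus, accumulating eligibility traces, and a clamp on the predicted state standing in for the unspecified `If` block. The numbered comments map only loosely onto the steps of the listing.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def features(state):
    # Hypothetical feature map: a bias term plus the raw scalar state.
    return [1.0, state]


def global_model_planning(theta, w, state_model, reward_model, start_states,
                          n_plan=5, horizon=10, gamma=0.9, lam=0.5,
                          alpha=0.05, beta=0.01, sigma=0.3, bound=1.0, seed=0):
    """Actor-critic updates on transitions simulated by a learned model.

    theta: value-function weights, w: policy-mean weights (both assumed linear).
    state_model, reward_model: assumed linear models over [features(s), action].
    """
    theta, w = list(theta), list(w)  # work on copies, leave inputs untouched
    rng = random.Random(seed)
    n_samples = 0
    for _ in range(n_plan):                                   # (1) planning passes
        for s in start_states:                                # (3) simulated episodes
            trace = [0.0] * len(theta)                        # (2) assumed: reset trace
            for _ in range(horizon):
                phi = features(s)
                mean = dot(w, phi)
                a = rng.gauss(mean, sigma)                    # (4) sample an action
                bonus = 1.0 / math.sqrt(n_samples + 1)        # (5) assumed count-based bonus
                x = phi + [a]
                s_next = dot(state_model, x)                  # (6) model-predicted next state
                r = dot(reward_model, x) + bonus              # (7) model-predicted reward
                trace = [gamma * lam * e + p
                         for e, p in zip(trace, phi)]         # (8) accumulating trace
                delta = (r + gamma * dot(theta, features(s_next))
                         - dot(theta, phi))                   # (9) TD error
                theta = [t + alpha * delta * e
                         for t, e in zip(theta, trace)]       # (10) critic update
                score = (a - mean) / sigma ** 2               # gradient of log-policy w.r.t. mean
                w = [wi + beta * delta * score * p
                     for wi, p in zip(w, phi)]                # (11) actor update
                if abs(s_next) > bound:                       # (12)-(14) assumed: clamp state
                    s_next = math.copysign(bound, s_next)
                n_samples += 1                                # (15) sample counter
                s = s_next
    return theta, w


theta, w = global_model_planning(
    theta=[0.0, 0.0], w=[0.0, 0.0],
    state_model=[0.0, 0.9, 0.1], reward_model=[0.0, -0.5, -0.1],
    start_states=[0.5, -0.5, 0.2],
)
```

The point of the sketch is only the structure: no real environment is queried inside the loop, and both the critic and the actor are updated from transitions and rewards predicted by the model.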