Research Article
Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning
Input: , , , ,
(1) Initialize ,
(2) Loop
(3)   , , , ,
(4)   Repeat all episodes
(5)     Choose according to
(6)
(7)     Execute and observe and
(8)     Update the eligibility of the value function:
(9)     Compute the TD error:
(10)    Update the parameter of the value function:
(11)    Update the parameter of the policy:
(12)
(13)  End Repeat
(14) End Loop
Output: ,
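The steps named in the listing (action selection, eligibility update, TD error, value-function update, policy update) follow the general shape of an actor-critic method with eligibility traces. The following is a minimal sketch of that general shape only, not the authors' algorithm: it assumes a tabular value function, a softmax policy over action preferences, an accumulating trace on the critic, and a hypothetical five-state chain task. The names `run_actor_critic`, `alpha`, `beta`, `gamma`, and `lam` are illustrative choices, and the hierarchical model learning and planning named in the title are not represented.

```python
import numpy as np


def softmax(prefs):
    """Numerically stable softmax over a 1-D array of action preferences."""
    z = prefs - prefs.max()
    e = np.exp(z)
    return e / e.sum()


def run_actor_critic(n_states=5, n_episodes=200, alpha=0.1, beta=0.1,
                     gamma=0.95, lam=0.8, max_steps=100, seed=0):
    """Tabular actor-critic with an accumulating eligibility trace (sketch).

    Hypothetical toy task: a chain of n_states. Action 0 moves left,
    action 1 moves right. Reaching the last state yields reward 1 and
    ends the episode. All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(n_states)           # value-function parameters, one per state
    theta = np.zeros((n_states, 2))  # policy parameters (action preferences)

    for _ in range(n_episodes):
        s = 0
        e = np.zeros(n_states)       # eligibility trace, reset every episode
        for _ in range(max_steps):
            # Choose an action according to the current softmax policy.
            pi = softmax(theta[s])
            a = rng.choice(2, p=pi)

            # Execute the action; observe the next state and the reward.
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0

            # Update the eligibility of the value function.
            e *= gamma * lam
            e[s] += 1.0

            # Compute the TD error (no bootstrapping from a terminal state).
            target = r if done else r + gamma * v[s_next]
            delta = target - v[s]

            # Update the parameters of the value function.
            v += alpha * delta * e

            # Update the parameters of the policy along grad log pi(a | s).
            grad_log_pi = -pi
            grad_log_pi[a] += 1.0
            theta[s] += beta * delta * grad_log_pi

            if done:
                break
            s = s_next
    return v, theta
```

Calling `run_actor_critic()` returns the learned state values and action preferences. Because the softmax gradient sums to zero over actions, each row of `theta` keeps a zero sum, which gives a quick sanity check on the policy update.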