Research Article

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

Algorithm 4

Global model planning ().
(1) Loop for times
(2) ,
(3) Repeat for all episodes
(4)   Choose according to
(5)   Compute exploration term:
(6)   Predict the next state:
(7)   Predict the reward:
(8)   Update the eligibility:
(9)   Compute the TD error:
(10)   Update the value function parameter:
(11)   Update the policy parameter:
(12)   If  
(13)      
(14)   End If
(15)   Update the number of samples:
(16) End Repeat
(17) End Loop
Output: ,
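
The symbols of the listing above did not survive in this copy, so the following is only a minimal sketch of how a planning loop of this shape could look, not the authors' exact update rules. It assumes linear function approximation, a Gaussian exploration term added to a linear policy mean, accumulating eligibility traces, and a TD(lambda)-style critic. Every name (`plan_with_model`, `phi`, `predict_state`, `predict_reward`, `is_terminal`, the step sizes and their defaults) is hypothetical, and the conditional in steps (12) to (14) is omitted because its condition cannot be recovered.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def plan_with_model(theta, beta, phi, predict_state, predict_reward,
                    is_terminal, start_state, n_loops=10, max_steps=50,
                    gamma=0.95, lam=0.8, alpha_v=0.02, alpha_p=0.01,
                    sigma=0.3, seed=0):
    """Refine value weights (theta) and policy weights (beta) on simulated data.

    All update rules here are illustrative assumptions, not the paper's.
    """
    rng = random.Random(seed)
    theta, beta = list(theta), list(beta)   # work on copies of the inputs
    n_samples = 0
    for _ in range(n_loops):                # outer planning loop
        x = start_state                     # reset the simulated state
        e = [0.0] * len(theta)              # reset the eligibility trace
        for _ in range(max_steps):          # one simulated episode
            if is_terminal(x):
                break
            f = phi(x)
            delta_u = rng.gauss(0.0, sigma)     # exploration term
            u = dot(beta, f) + delta_u          # action = policy mean + noise
            x_next = predict_state(x, u)        # model-predicted next state
            r = predict_reward(x, u)            # model-predicted reward
            # Accumulating eligibility trace over the state features.
            e = [gamma * lam * ei + fi for ei, fi in zip(e, f)]
            v_next = 0.0 if is_terminal(x_next) else dot(theta, phi(x_next))
            td = r + gamma * v_next - dot(theta, f)     # TD error
            # Critic update along the trace.
            theta = [t + alpha_v * td * ei for t, ei in zip(theta, e)]
            # Actor update: reinforce the perturbation when the TD error is positive.
            beta = [b + alpha_p * td * delta_u * fi for b, fi in zip(beta, f)]
            n_samples += 1                      # count the simulated sample
            x = x_next
    return theta, beta, n_samples


# Toy usage: a hypothetical 1-D model that moves the state by 0.1 * action,
# clips it to [-1, 1], and penalises distance from the origin.
toy_phi = lambda x: [1.0, x, x * x]
toy_step = lambda x, u: max(-1.0, min(1.0, x + 0.1 * u))
toy_reward = lambda x, u: -x * x
toy_done = lambda x: abs(x) < 0.01

theta, beta, n = plan_with_model([0.0] * 3, [0.0] * 3, toy_phi, toy_step,
                                 toy_reward, toy_done, start_state=0.8)
print(theta, beta, n)
```

The model functions are passed in as callables so that the same loop can run against any learned predictor. In this sketch the planner touches only the simulated transitions and never the real environment.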