Research Article

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

Algorithm 3

Local model planning ().
(1) Loop for time steps
(2)   ,
(3)   Choose according to
(4)   
(5)   Predict the next state and the reward:
(6)   Update the eligibility:
(7)   Compute the TD error:
(8)   Update the value function parameter:
(9)   Update the policy parameter:
(10)   If  
(11)    
(12)   End If
(13)   Update the number of samples:
(14) End Loop
Output: ,
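
The planning loop listed in Algorithm 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a scalar state, a linear value function with parameter `theta`, a Gaussian policy with mean parameter `beta` and a fixed standard deviation, and accumulating eligibility traces. The callable `model` stands in for the learned local model and returns a predicted next state and reward. The condition in the `If` branch is assumed here to be "the predicted state leaves the region covered by the model", in which case the state is reset. All names, features, step sizes, and the toy model are hypothetical.

```python
import random


def features(x, x_bound):
    """Hypothetical feature map: a bias term plus the scaled scalar state."""
    return [1.0, x / x_bound]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def toy_model(x, u):
    """Hypothetical stand-in for the learned local model (assumption)."""
    x_next = 0.9 * x + 0.1 * u   # predicted next state
    r = -(x ** 2)                # predicted reward
    return x_next, r


def local_model_planning(theta, beta, model, x0, steps,
                         gamma=0.9, lam=0.5, alpha_v=0.01, alpha_p=0.001,
                         sigma=0.5, x_bound=2.0, seed=0):
    """Run `steps` planning updates on simulated samples from `model`."""
    rng = random.Random(seed)
    theta, beta = list(theta), list(beta)   # copy so the inputs are not mutated
    e = [0.0] * len(theta)                  # eligibility trace
    x = x0
    n_samples = 0
    for _ in range(steps):
        phi = features(x, x_bound)
        # Choose the action according to the (assumed Gaussian) policy.
        mu = dot(beta, phi)
        u = rng.gauss(mu, sigma)
        # Predict the next state and the reward with the model.
        x_next, r = model(x, u)
        # Update the eligibility (accumulating trace, an assumption).
        e = [gamma * lam * ei + pi for ei, pi in zip(e, phi)]
        # Compute the TD error.
        phi_next = features(x_next, x_bound)
        delta = r + gamma * dot(theta, phi_next) - dot(theta, phi)
        # Update the value function parameter.
        theta = [t + alpha_v * delta * ei for t, ei in zip(theta, e)]
        # Update the policy parameter along the Gaussian log-likelihood gradient.
        grad_scale = (u - mu) / (sigma ** 2)
        beta = [b + alpha_p * delta * grad_scale * pi for b, pi in zip(beta, phi)]
        # Assumed reset condition: the predicted state left the modelled region.
        if abs(x_next) > x_bound:
            x_next = x0
            e = [0.0] * len(theta)
        x = x_next
        # Update the number of samples.
        n_samples += 1
    return theta, beta, n_samples
```

A call such as `local_model_planning([0.0, 0.0], [0.0, 0.0], toy_model, x0=1.0, steps=100)` returns the updated value and policy parameters together with the number of simulated samples used. In an actual hierarchical setup, `toy_model` would be replaced by the model fitted from real interaction data.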