Research Article

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

Algorithm 1: S-AC.
Input: , , , ,
(1) Initialize ,
(2) Loop
(3) , , , ,
(4)   Repeat all episodes
(5)   Choose according to
(6)   
(7)   Execute and observe and
(8)   Update the eligibility of the value function:
(9)   Compute the TD error:
(10)   Update the parameter of the value function:
(11)   Update the parameter of the policy:
(12)   
(13)   End Repeat
(14) End Loop
Output: ,
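
The loop in Algorithm 1 can be illustrated with a minimal Python sketch of an actor-critic update with eligibility traces, in the same step order: choose an action, execute it, update the eligibility, compute the TD error, then update the value-function and policy parameters. Everything specific in it is an assumption made for illustration and is not taken from the listing: the symbol names (theta, psi, e, delta, gamma, lam, the two step sizes), the Gaussian RBF features, the fixed-variance Gaussian policy, and the hypothetical toy task env_step. The update rules follow the common textbook form and may differ from the authors' exact equations.

```python
import numpy as np

def features(x, centers, width):
    """Normalized Gaussian RBF features of a scalar state (assumed feature map)."""
    phi = np.exp(-0.5 * ((x - centers) / width) ** 2)
    return phi / phi.sum()

def env_step(x, u):
    """Hypothetical toy task: move a point on [-1, 1] toward the origin."""
    x_next = float(np.clip(x + 0.1 * u, -1.0, 1.0))
    reward = -x_next ** 2
    done = abs(x_next) < 0.05
    return x_next, reward, done

def run_actor_critic(episodes=200, max_steps=50, gamma=0.9, lam=0.5,
                     alpha_critic=0.1, alpha_actor=0.05, sigma=0.3, seed=0):
    rng = np.random.default_rng(seed)
    centers = np.linspace(-1.0, 1.0, 11)
    width = 0.2
    theta = np.zeros_like(centers)  # value-function parameters (critic)
    psi = np.zeros_like(centers)    # policy parameters (actor)
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        e = np.zeros_like(centers)  # eligibility trace, reset each episode
        for _ in range(max_steps):
            phi = features(x, centers, width)
            # Choose an action from a Gaussian policy with linear mean.
            mean_u = psi @ phi
            u = mean_u + sigma * rng.standard_normal()
            # Execute the action; observe the next state and the reward.
            x_next, r, done = env_step(x, u)
            # Update the eligibility of the value function.
            e = gamma * lam * e + phi
            # Compute the TD error (terminal states have value 0).
            v = theta @ phi
            v_next = 0.0 if done else theta @ features(x_next, centers, width)
            delta = r + gamma * v_next - v
            # Update the parameter of the value function.
            theta += alpha_critic * delta * e
            # Update the parameter of the policy; (u - mean_u) * phi is
            # proportional to the Gaussian log-likelihood gradient in psi.
            psi += alpha_actor * delta * (u - mean_u) * phi
            x = x_next
            if done:
                break
    return theta, psi
```

Calling run_actor_critic() returns the learned critic and actor weight vectors, one entry per RBF center. The step sizes, discount factor, and trace decay are placeholder values and would need tuning for any real task.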