Computational Intelligence and Neuroscience

Research Article

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

S-AC.

Input: , , , ,
(1) Initialize ,
(2) Loop
(3) , , , ,
(4) Repeat all episodes
(5) Choose according to
(6)
(7) Execute and observe and
(8) Update the eligibility trace of the value function:
(9) Compute the TD error:
(10) Update the parameter of the value function:
(11) Update the parameter of the policy:
(12)
(13) End Repeat
(14) End Loop
Output: ,
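
The loop above can be illustrated with a minimal sketch of a generic actor-critic update with eligibility traces. It is not the authors' implementation. Every name in it is an assumption made for illustration: the feature map `features`, the toy environment `toy_step`, the linear value function and linear policy, the accumulating trace, and the parameters `gamma`, `lam`, `alpha_c`, `alpha_a`, and `sigma`. The step numbers in the comments follow the order of the listing; the action computation and the state advance are assumed forms.

```python
import random


def features(x):
    """Hypothetical feature map (assumption): bias, state, state squared."""
    return [1.0, x, x * x]


def dot(w, phi):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def toy_step(x, u):
    """Hypothetical 1-D environment (assumption): the action shifts the
    state, the state is clipped to [-1, 1], and the reward penalizes
    the squared distance from the origin."""
    x_next = max(-1.0, min(1.0, x + 0.1 * u))
    return -x_next * x_next, x_next


def actor_critic_episode(theta, beta, gamma=0.95, lam=0.7, alpha_c=0.05,
                         alpha_a=0.01, sigma=0.5, steps=50, x0=1.0, rng=None):
    """Run one episode; return updated critic weights, actor weights,
    and the undiscounted sum of rewards."""
    rng = rng or random.Random()
    theta, beta = list(theta), list(beta)
    e = [0.0] * len(theta)          # eligibility trace of the critic
    x, total = x0, 0.0
    for _ in range(steps):
        phi = features(x)
        du = rng.gauss(0.0, sigma)  # step 5: Gaussian exploration term
        u = dot(beta, phi) + du     # step 6 (assumed form): policy output plus exploration
        r, x_next = toy_step(x, u)  # step 7: execute, observe reward and next state
        # step 8: accumulating trace (assumed variant)
        e = [gamma * lam * ei + pi for ei, pi in zip(e, phi)]
        # step 9: TD error
        delta = r + gamma * dot(theta, features(x_next)) - dot(theta, phi)
        # step 10: critic update along the trace
        theta = [ti + alpha_c * delta * ei for ti, ei in zip(theta, e)]
        # step 11: actor update, scaled by the TD error and the exploration term
        beta = [bi + alpha_a * delta * du * pi for bi, pi in zip(beta, phi)]
        x = x_next                  # step 12 (assumed): advance to the next state
        total += r
    return theta, beta, total
```

Calling `actor_critic_episode([0.0] * 3, [0.0] * 3, rng=random.Random(0))` repeatedly, feeding the returned weights back in, gives the outer episode loop. With `sigma=0` the exploration term vanishes, so in this sketch the actor weights stay unchanged and only the critic learns.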