Research Article

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

Algorithm 2

AC-HMLP algorithm.
Input: , , , , , , , , , , ,
(1) Initialize: , , , ,
(2) Loop
(3)   , , ,
(4)   Repeat all episodes
(5)   Choose according to
(6)   Execute the action:
(7)   Observe the reward and the next state:
    % Update the global model
(8)   Predict the next state:
             
(9)   Predict the reward:
(10)   Update the parameters :
             
(11)   Update the parameter :
    % Update the local model
(12)   If  
(13)    Insert the real sample into the memory
(14)   Else If
(15)      Replace the oldest sample in the memory with the real sample
(16)   End If
(17)   Select the L nearest neighbors of the current state from the memory to construct and
(18)   Predict the next state and the reward:
(19)   Update the parameter :
(20)   Compute the local error:
    
(21)   If  
(22)    Call Local-model planning () (Algorithm 3)
(23)   End If
    % Update the value function
(24)   Update the eligibility:
(25)   Estimate the TD error:
(26)   Update the value-function parameter:
    % Update the policy
(27)   Update the policy parameter:
(28)   
(29)   Update the number of samples:
(30)   Until the ending condition is satisfied
(31)   Call Global-model planning () (Algorithm 4)
(32) End Loop
Output: ,
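
The local-model steps (12)-(18) and the value-function steps (24)-(26) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a fixed-capacity first-in-first-out memory, a local affine model fitted by ordinary least squares on the nearest stored samples, and a linear critic with an accumulating eligibility trace. All names (make_memory, local_predict, td_lambda_step) and all parameters (capacity, n_neighbors, alpha, gamma, lam) are hypothetical and chosen only for illustration.

```python
from collections import deque

import numpy as np


def make_memory(capacity):
    """Fixed-size sample memory (assumed FIFO): once full, appending a new
    sample silently discards the oldest one, as in steps (12)-(16)."""
    return deque(maxlen=capacity)


def local_predict(memory, query, n_neighbors):
    """Fit an affine model on the nearest stored samples and evaluate it
    at `query` (a sketch of steps (17)-(18)).

    Each memory entry is an (input_vector, target_vector) pair, e.g. the
    input could be a state-action vector and the target the next state
    concatenated with the reward (an assumption of this sketch).
    """
    inputs = np.array([x for x, _ in memory], dtype=float)
    targets = np.array([y for _, y in memory], dtype=float)
    # Euclidean distance from the query to every stored input
    dists = np.linalg.norm(inputs - query, axis=1)
    idx = np.argsort(dists)[:n_neighbors]
    # Append a constant column so the fitted model has a bias term
    X = np.hstack([inputs[idx], np.ones((len(idx), 1))])
    Y = targets[idx]
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return np.append(query, 1.0) @ beta


def td_lambda_step(theta, trace, phi, phi_next, reward, alpha, gamma, lam):
    """One critic update in the order of steps (24)-(26): eligibility
    trace, then TD error, then value-function parameter."""
    trace = gamma * lam * trace + phi
    delta = reward + gamma * (theta @ phi_next) - theta @ phi
    theta = theta + alpha * delta * trace
    return theta, trace, delta
```

For example, with a memory of capacity 3, appending a fourth sample evicts the first one, and local_predict then fits only the samples that remain. The least-squares fit is one simple choice for the local model; a weighted or regularized regression could be substituted without changing the surrounding structure.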