Research Article

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

Table 1

Parameter settings of RAC-HMLP and AC-HMLP.

Parameter                                 Symbol   Value
----------------------------------------  ------   -----
Time step                                          0.1
Discount factor                                    0.9
Trace-decay rate                                   0.9
Exploration variance                               1
Learning rate of the actor                         0.5
Learning rate of the critic                        0.4
Learning rate of the model                         0.5
Error threshold                                    0.15
Capacity of the memory                             100
Number of the nearest samples                      9
Local planning times                               30
Global planning times                              300
Number of components of the state         K        2
Regularization parameter of the model              0.2
Regularization parameter of the critic             0.01
Regularization parameter of the actor              0.001
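
For readers reproducing the experiments, the settings in Table 1 could be collected into a single configuration object. The following is a minimal sketch, not the authors' implementation: the class name `HMLPConfig` and every field name are hypothetical, chosen only to mirror the parameter descriptions in the table, and the range checks are assumptions about what sensible values look like.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class HMLPConfig:
    """Values from Table 1. All names here are hypothetical labels."""

    time_step: float = 0.1
    discount_factor: float = 0.9
    trace_decay_rate: float = 0.9
    exploration_variance: float = 1.0
    actor_learning_rate: float = 0.5
    critic_learning_rate: float = 0.4
    model_learning_rate: float = 0.5
    error_threshold: float = 0.15
    memory_capacity: int = 100
    num_nearest_samples: int = 9
    local_planning_times: int = 30
    global_planning_times: int = 300
    num_state_components: int = 2  # K in Table 1
    model_regularization: float = 0.2
    critic_regularization: float = 0.01
    actor_regularization: float = 0.001

    def __post_init__(self):
        # Assumed sanity checks; the table itself states no constraints.
        if not 0.0 < self.discount_factor <= 1.0:
            raise ValueError("discount_factor must lie in (0, 1]")
        if not 0.0 <= self.trace_decay_rate <= 1.0:
            raise ValueError("trace_decay_rate must lie in [0, 1]")
        if self.num_nearest_samples > self.memory_capacity:
            raise ValueError("num_nearest_samples cannot exceed memory_capacity")


config = HMLPConfig()
settings = asdict(config)  # plain dict, convenient for logging a run
```

Freezing the dataclass keeps one run's settings immutable; a variant for a sensitivity study can be derived with `dataclasses.replace(config, global_planning_times=100)`.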