Computational Intelligence and Neuroscience

Research Article

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

Parameter settings of RAC-HMLP and AC-HMLP.


Parameter	Symbol	Value
Time step		0.1
Discount factor		0.9
Trace-decay rate		0.9
Exploration variance		1
Learning rate of the actor		0.5
Learning rate of the critic		0.4
Learning rate of the model		0.5
Error threshold		0.15
Capacity of the memory		100
Number of the nearest samples		9
Local planning times		30
Global planning times		300
Number of components of the state	K	2
Regularization parameter of the model		0.2
Regularization parameter of the critic		0.01
Regularization parameter of the actor		0.001
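
For readers who want to reproduce the setup, the settings in the table can be gathered into a single configuration object. The following is a minimal sketch, not the authors' code: the class name `HMLPSettings`, every field name, and the range checks in `validate` are assumptions introduced here for illustration. Only the numeric values come from the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class HMLPSettings:
    """Hypothetical container; field names are illustrative, values are from the table."""
    time_step: float = 0.1
    discount_factor: float = 0.9
    trace_decay_rate: float = 0.9
    exploration_variance: float = 1.0
    actor_learning_rate: float = 0.5
    critic_learning_rate: float = 0.4
    model_learning_rate: float = 0.5
    error_threshold: float = 0.15
    memory_capacity: int = 100
    num_nearest_samples: int = 9
    local_planning_times: int = 30
    global_planning_times: int = 300
    num_state_components: int = 2  # K in the table
    model_regularization: float = 0.2
    critic_regularization: float = 0.01
    actor_regularization: float = 0.001

    def validate(self) -> None:
        """Basic sanity checks (assumed ranges, not stated in the table)."""
        if not 0.0 < self.discount_factor <= 1.0:
            raise ValueError("discount_factor must lie in (0, 1]")
        if not 0.0 <= self.trace_decay_rate <= 1.0:
            raise ValueError("trace_decay_rate must lie in [0, 1]")
        for name, value in asdict(self).items():
            # Every setting in the table is a strictly positive number.
            if value <= 0:
                raise ValueError(f"{name} must be positive")


settings = HMLPSettings()
settings.validate()
```

Freezing the dataclass keeps the settings from being changed by accident during a run. A variant for a sensitivity study can be created by passing overrides, for example `HMLPSettings(global_planning_times=100)`.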