Research Article
Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning
Table 1
Parameter settings of RAC-HMLP and AC-HMLP.
| Parameter | Symbol | Value |
| --- | --- | --- |
| Time step | | 0.1 |
| Discount factor | | 0.9 |
| Trace-decay rate | | 0.9 |
| Exploration variance | | 1 |
| Learning rate of the actor | | 0.5 |
| Learning rate of the critic | | 0.4 |
| Learning rate of the model | | 0.5 |
| Error threshold | | 0.15 |
| Capacity of the memory | | 100 |
| Number of the nearest samples | | 9 |
| Local planning times | | 30 |
| Global planning times | | 300 |
| Number of components of the state | K | 2 |
| Regularization parameter of the model | | 0.2 |
| Regularization parameter of the critic | | 0.01 |
| Regularization parameter of the actor | | 0.001 |
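
For readers reproducing the experiments, the settings in Table 1 can be collected into a single configuration object. The following is a minimal sketch, not the authors' code: the class name `HMLPParams`, all field names, and the sanity checks in `__post_init__` are assumptions introduced here for illustration. Only the numeric values come from Table 1.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class HMLPParams:
    """Table 1 settings for RAC-HMLP and AC-HMLP (field names are hypothetical)."""

    time_step: float = 0.1
    discount_factor: float = 0.9
    trace_decay_rate: float = 0.9
    exploration_variance: float = 1.0
    actor_learning_rate: float = 0.5
    critic_learning_rate: float = 0.4
    model_learning_rate: float = 0.5
    error_threshold: float = 0.15
    memory_capacity: int = 100
    num_nearest_samples: int = 9
    local_planning_times: int = 30
    global_planning_times: int = 300
    num_state_components: int = 2  # K in Table 1
    model_regularization: float = 0.2
    critic_regularization: float = 0.01
    actor_regularization: float = 0.001

    def __post_init__(self) -> None:
        # Assumed sanity checks; these constraints are not stated in Table 1.
        if not 0.0 < self.discount_factor <= 1.0:
            raise ValueError("discount_factor must lie in (0, 1]")
        if not 0.0 <= self.trace_decay_rate <= 1.0:
            raise ValueError("trace_decay_rate must lie in [0, 1]")
        if self.num_nearest_samples > self.memory_capacity:
            raise ValueError("num_nearest_samples cannot exceed memory_capacity")


# Default instance holding exactly the Table 1 values.
params = HMLPParams()

# Dictionary view, convenient for logging the settings of a run.
params_dict = asdict(params)
```

Freezing the dataclass keeps the settings immutable during a run. A variant for a sensitivity study can be created with `dataclasses.replace(params, local_planning_times=10)`, where the value 10 is an arbitrary example.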