Research Article
Intelligent Online Multiconstrained Reentry Guidance Based on Hindsight Experience Replay
Table 4
Hyperparameters used to train DDPG, PPO, and DDPG+HER.
| Hyperparameter | DDPG | PPO | DDPG+HER |
| --- | --- | --- | --- |
| Discount factor | 0.99 | 0.99 | 0.99 |
| Batch size | 64 | 64 | 64 |
| Replay buffer size | 20000 | — | 20000 |
| Actor learning rate | 10⁻⁴ | 10⁻³ | 10⁻⁴ |
| Critic learning rate | 10⁻³ | 10⁻³ | 10⁻³ |
| Target update rate | 0.001 | — | 0.001 |
| Maximum number of steps | 1000 | 1000 | 1000 |
| Exploration policy | OU | — | |
| GAE factor | — | 0.98 | — |
| Clip factor | — | 0.2 | — |
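As a rough illustration, the settings in Table 4 could be held in a small configuration object like the one below. This is a minimal sketch, not the authors' code: the names `TrainConfig`, `CONFIGS`, and `soft_update` are hypothetical, and reading the "target update rate" as the coefficient of a soft (Polyak) target-network update is an assumption based on common DDPG practice. Entries the table marks as not applicable, and the exploration policy left blank for DDPG+HER, are stored as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for one column of Table 4."""
    discount_factor: float
    batch_size: int
    max_steps: int
    actor_lr: float
    critic_lr: float
    replay_buffer_size: Optional[int] = None    # off-policy methods only
    target_update_rate: Optional[float] = None  # assumed to be the soft-update coefficient
    exploration_policy: Optional[str] = None    # "OU" = Ornstein-Uhlenbeck noise
    gae_factor: Optional[float] = None          # PPO only
    clip_factor: Optional[float] = None         # PPO only


CONFIGS = {
    "DDPG": TrainConfig(
        discount_factor=0.99, batch_size=64, max_steps=1000,
        actor_lr=1e-4, critic_lr=1e-3,
        replay_buffer_size=20000, target_update_rate=0.001,
        exploration_policy="OU",
    ),
    "PPO": TrainConfig(
        discount_factor=0.99, batch_size=64, max_steps=1000,
        actor_lr=1e-3, critic_lr=1e-3,
        gae_factor=0.98, clip_factor=0.2,
    ),
    "DDPG+HER": TrainConfig(
        discount_factor=0.99, batch_size=64, max_steps=1000,
        actor_lr=1e-4, critic_lr=1e-3,
        replay_buffer_size=20000, target_update_rate=0.001,
        # Exploration policy is blank in the table, so it is left unset here.
    ),
}


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Blend online parameters into target parameters: (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]
```

With `tau = CONFIGS["DDPG"].target_update_rate`, each call to `soft_update` moves the target parameters only 0.1% of the way toward the online parameters, which is why the target networks change slowly under this setting.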