Research Article

Intelligent Online Multiconstrained Reentry Guidance Based on Hindsight Experience Replay

Table 4

Hyperparameters used in training.

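| Hyperparameter | DDPG | PPO | DDPG+HER |
| --- | --- | --- | --- |
| Discount factor | 0.99 | 0.99 | 0.99 |
| Batch size | 64 | 64 | 64 |
| Replay buffer size | 20000 | | 20000 |
| Actor learning rate | $10^{-4}$ | $10^{-3}$ | $10^{-4}$ |
| Critic learning rate | $10^{-3}$ | $10^{-3}$ | $10^{-3}$ |
| Target update rate | 0.001 | | 0.001 |
| Maximum number of steps | 1000 | 1000 | 1000 |
| Exploration policy | OU | | |
| GAE factor | | 0.98 | |
| Clip factor | | 0.2 | |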
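
For readers who want to reproduce the setup, the values in Table 4 could be collected in a small configuration object. The following is a minimal sketch, not the authors' code: the names `TrainConfig` and `CONFIGS` are hypothetical, and the assignment of partially filled rows to algorithms is an assumption (replay buffer size and target update rate to the two DDPG variants, GAE and clip factors to PPO, and the single OU exploration entry to DDPG). Entries the table does not list are left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for one column of Table 4 (None = not listed)."""
    # Values shared by all three columns of the table.
    discount_factor: float = 0.99
    batch_size: int = 64
    critic_lr: float = 1e-3
    max_steps: int = 1000
    # Values that differ between columns or appear only in some of them.
    actor_lr: float = 1e-4
    replay_buffer_size: Optional[int] = None
    target_update_rate: Optional[float] = None
    exploration_policy: Optional[str] = None
    gae_factor: Optional[float] = None
    clip_factor: Optional[float] = None


# Column assignment of the partially filled rows is an assumption (see lead-in).
CONFIGS = {
    "DDPG": TrainConfig(
        replay_buffer_size=20000,
        target_update_rate=0.001,
        exploration_policy="OU",
    ),
    "PPO": TrainConfig(actor_lr=1e-3, gae_factor=0.98, clip_factor=0.2),
    "DDPG+HER": TrainConfig(
        replay_buffer_size=20000,
        target_update_rate=0.001,
        exploration_policy=None,  # not recoverable from the table as extracted
    ),
}
```

Keeping the shared values as defaults makes the per-algorithm differences visible at a glance: PPO differs in its actor learning rate and its on-policy terms, while the two DDPG variants add the replay buffer and target-network settings.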