Research Article

Reinforcement Learning for Computational Guidance of Launch Vehicle Upper Stage

Table 2. Hyperparameters.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Shape reward coefficient | 0.01 | Epochs per update | 30 |
| Final reward coefficient | 1000 | Episodes per rollout | 50 |
| Constant positive reward | 0.001 | Number of iterations | 10000 |
| Discount factor | 0.995 | Total episodes | 500000 |
| GAE factor | 0.98 | | |
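
The values in Table 2 can be collected into a small configuration object, which also makes it easy to check that they agree with each other: 50 episodes per rollout over 10000 iterations gives the listed total of 500000 episodes. The sketch below is illustrative only. The class and field names are assumptions, not the authors' code, and the advantage function is the standard generalized advantage estimation (GAE) recursion, included only to show where the discount factor and GAE factor would enter. The table does not say how the three reward coefficients are combined, so they are stored without being used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from Table 2; field names are hypothetical."""
    shape_reward_coeff: float = 0.01
    final_reward_coeff: float = 1000.0
    constant_positive_reward: float = 0.001
    discount_factor: float = 0.995  # gamma
    gae_factor: float = 0.98        # lambda
    epochs_per_update: int = 30
    episodes_per_rollout: int = 50
    num_iterations: int = 10_000

    @property
    def total_episodes(self) -> int:
        # One rollout of episodes is collected per iteration.
        return self.episodes_per_rollout * self.num_iterations


def gae_advantages(rewards, values, gamma, lam):
    """Standard GAE recursion for a single finished episode.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_{T-1}); the value after the last step is
             assumed to be zero (terminal state).
    """
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


hp = Hyperparameters()
print(hp.total_episodes)  # 500000
```

With a discount factor of 0.995 and a GAE factor of 0.98, each temporal-difference error is attenuated by 0.995 × 0.98 ≈ 0.975 per step back in time, so the advantage estimates stay sensitive to rewards many steps ahead.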