Research Article
Reinforcement Learning for Computational Guidance of Launch Vehicle Upper Stage
| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Shape reward coefficient | 0.01 | Epochs per update | 30 |
| Final reward coefficient | 1000 | Episodes per rollout | 50 |
| Constant positive reward | 0.001 | Number of iterations | 10000 |
| Discount factor | 0.995 | Total episodes | 500000 |
| GAE factor | 0.98 | | |
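
The settings in the table can be collected into a single configuration object, which also makes the relation between the rollout size, the number of iterations, and the total episode count explicit (50 × 10000 = 500000). The following is a minimal sketch, not the authors' implementation: the class name `TrainingConfig`, its field names, and the helper `gae_advantages` are hypothetical, and reading the "GAE factor" as the λ parameter of generalized advantage estimation is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters from the table (field names are hypothetical)."""
    shape_reward_coeff: float = 0.01
    final_reward_coeff: float = 1000.0
    constant_positive_reward: float = 0.001
    discount_factor: float = 0.995
    gae_factor: float = 0.98  # assumed to be the GAE lambda
    epochs_per_update: int = 30
    episodes_per_rollout: int = 50
    num_iterations: int = 10_000

    @property
    def total_episodes(self) -> int:
        # One rollout of `episodes_per_rollout` episodes per iteration.
        return self.episodes_per_rollout * self.num_iterations


def gae_advantages(rewards, values, gamma, lam):
    """Generalized advantage estimates for one episode.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final step (0.0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


cfg = TrainingConfig()
print(cfg.total_episodes)  # 500000
```

With these values the per-step weight on future residuals is 0.995 × 0.98 = 0.9751, so the advantage estimate draws on a long horizon of future rewards.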