Research Article
Intelligent Online Multiconstrained Reentry Guidance Based on Hindsight Experience Replay
Algorithm 1
Training of multiconstrained reentry guidance based on HER.
Randomly initialize the parameters of the actor network and the critic network.
Initialize the target actor and target critic with the parameters of the actor and critic.
Initialize the basic replay buffer.
for each episode do
    Initialize an HER replay buffer and a random process noise for exploration.
    Initialize an initial goal randomly.
    Run the Scenario Initialization Function and sample a basic state.
    for each time step do
        Combine the basic state and the goal into the expanded state.
        Sample an action from the actor with added exploration noise.
        Execute the action in the Policy Step Function and observe a new basic state.
        Combine the new basic state and the goal into the expanded new state.
        Store the transition in the basic replay buffer and the HER replay buffer.
        if the episode is done then
            Judge whether the HER condition is met and record the result.
        end if
    end for
    if the HER condition is met then
        for each transition in the HER replay buffer do
            Calculate the quantities needed to relabel the transition with a new goal.
            Recalculate the reward according to Equations (30)–(33).
            Combine the basic states and the new goal into the expanded states.
            Store the relabeled transition in the basic replay buffer.
        end for
        Clear the data in the HER replay buffer.
    end if
    for each update iteration do
        Sample a minibatch from the replay buffer.
        Update the critic by Equation (24) and the actor by Equation (25).
        Update the target networks periodically.
    end for
end for
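The data-collection and relabeling part of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the reward `placeholder_reward` merely stands in for Equations (30)–(33), `her_condition` assumes that relabeling is triggered when the episode misses its goal, and `relabel_episode` assumes that the new goal is the final achieved state. The names `expand`, `run_episode`, `policy`, and `step` are likewise hypothetical, and the actor and critic updates of Equations (24) and (25) are omitted.

```python
from collections import deque


def expand(basic_state, goal):
    """Concatenate a basic state with a goal to form the expanded state."""
    return tuple(basic_state) + tuple(goal)


def placeholder_reward(basic_next_state, goal, tol=0.5):
    """Stand-in sparse reward; NOT the paper's Equations (30)-(33)."""
    miss = max(abs(x - g) for x, g in zip(basic_next_state, goal))
    return 0.0 if miss <= tol else -1.0


def her_condition(episode, goal, tol=0.5):
    """Assumed HER condition: the episode ended without reaching its goal."""
    final_state = episode[-1][2]
    return placeholder_reward(final_state, goal, tol) < 0.0


def relabel_episode(episode, replay_buffer, tol=0.5):
    """Store the episode again, relabeled with the goal it actually achieved."""
    new_goal = episode[-1][2]  # assumption: final achieved state becomes the goal
    for state, action, next_state, done in episode:
        reward = placeholder_reward(next_state, new_goal, tol)
        replay_buffer.append(
            (expand(state, new_goal), action, reward,
             expand(next_state, new_goal), done)
        )


def run_episode(policy, step, initial_state, goal, replay_buffer,
                max_steps=50, tol=0.5):
    """Collect one episode, store it, and relabel it if the HER condition holds."""
    her_buffer = []  # per-episode buffer of basic (goal-free) transitions
    state = initial_state
    for _ in range(max_steps):
        action = policy(expand(state, goal))
        next_state, done = step(state, action)
        reward = placeholder_reward(next_state, goal, tol)
        # Expanded transition goes to the basic replay buffer ...
        replay_buffer.append(
            (expand(state, goal), action, reward, expand(next_state, goal), done)
        )
        # ... and the goal-free transition goes to the HER buffer.
        her_buffer.append((state, action, next_state, done))
        state = next_state
        if done:
            break
    relabeled = her_condition(her_buffer, goal, tol)
    if relabeled:
        relabel_episode(her_buffer, replay_buffer, tol)
    her_buffer.clear()
    return relabeled


# Bounded buffer that a training loop would sample minibatches from.
replay_buffer = deque(maxlen=100_000)
```

Keeping goal-free transitions in the per-episode buffer is one way to make the relabeling step a simple re-expansion with the new goal. The expanded transitions already stored in the basic replay buffer are left untouched.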