Research Article

Intelligent Online Multiconstrained Reentry Guidance Based on Hindsight Experience Replay

Algorithm 1

Training of multiconstrained reentry guidance based on HER.
Randomly initialize the parameters of the actor network and the critic network.
Initialize the target actor and target critic by copying the actor and critic parameters.
Initialize the basic replay buffer.
for each episode do
 Initialize an HER replay buffer and a random noise process for exploration
 Randomly initialize the goal
 Run the Scenario Initialization Function to sample a basic state
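 for each time step do
  Combine the basic state and the goal into the expanded state
  Sample an action from the actor output plus exploration noise
  Execute the action in the Policy Step Function and observe a new basic state
  Combine the new basic state and the goal into the new expanded state
  Store the transition in both the basic replay buffer and the HER replay buffer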
  if the episode is done
   Judge whether the HER condition is met and record it
  end if
 end for
 if the HER condition is met
   for each transition in the HER replay buffer do
    Calculate and , ,
    Recalculate reward according to Equations (30)–(33)
    Combine the basic states and the goal into the expanded states
    Store the relabeled transition in the basic replay buffer
   end for
   Clear the data in the HER replay buffer
 end if
 for each training step do
   Sample a minibatch from the replay buffer
   Update the critic by Equation (24) and the actor by Equation (25)
   Update the target actor and target critic periodically
 end for
end for
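
The relabeling branch of Algorithm 1 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the article: the dictionary layout of a transition, the helpers `expand_state`, `placeholder_reward`, and `relabel_episode`, the choice of the final achieved state as the hindsight goal, and the rule that HER is triggered when the original goal is missed at the end of the episode. In particular, `placeholder_reward` is a simple sparse reward standing in for Equations (30)–(33), which are not reproduced here.

```python
import numpy as np


def expand_state(basic_state, goal):
    """Concatenate a basic state and a goal into one expanded state vector."""
    return np.concatenate([basic_state, goal])


def placeholder_reward(achieved, goal, tol=0.1):
    """Hypothetical sparse reward; NOT the article's Equations (30)-(33)."""
    return 0.0 if np.linalg.norm(achieved - goal) <= tol else -1.0


def relabel_episode(her_buffer, achieved_goal_fn, reward_fn):
    """Rewrite every transition of one episode against a hindsight goal."""
    if not her_buffer:
        return []
    # Assumption: the hindsight goal is what the final basic state achieved.
    new_goal = achieved_goal_fn(her_buffer[-1]["next_basic_state"])
    relabeled = []
    for tr in her_buffer:
        achieved = achieved_goal_fn(tr["next_basic_state"])
        relabeled.append({
            "state": expand_state(tr["basic_state"], new_goal),
            "action": tr["action"],
            "reward": reward_fn(achieved, new_goal),
            "next_state": expand_state(tr["next_basic_state"], new_goal),
            "done": tr["done"],
        })
    return relabeled


def run_demo():
    """Toy 3-step episode: the state is [position, time], the goal is a position."""
    replay_buffer, her_buffer = [], []
    goal = np.array([5.0])               # original goal, missed in this episode
    positions = [0.0, 1.0, 2.0, 3.0]     # toy trajectory
    achieved_goal_fn = lambda s: s[:1]   # goal-relevant part of a basic state

    for t in range(3):
        s = np.array([positions[t], float(t)])
        s_next = np.array([positions[t + 1], float(t + 1)])
        a = np.array([1.0])
        done = t == 2
        # Store the transition in both buffers, as in the inner loop.
        replay_buffer.append({
            "state": expand_state(s, goal),
            "action": a,
            "reward": placeholder_reward(achieved_goal_fn(s_next), goal),
            "next_state": expand_state(s_next, goal),
            "done": done,
        })
        her_buffer.append({"basic_state": s, "action": a,
                           "next_basic_state": s_next, "done": done})

    # Assumed HER condition: the original goal was missed at episode end.
    her_condition = placeholder_reward(achieved_goal_fn(s_next), goal) < 0
    if her_condition:
        replay_buffer.extend(
            relabel_episode(her_buffer, achieved_goal_fn, placeholder_reward))
    her_buffer.clear()                   # clear the HER buffer after relabeling
    return replay_buffer, her_buffer


if __name__ == "__main__":
    buf, her = run_demo()
    print(len(buf), len(her))
```

In this toy episode the three original transitions all carry the failure reward, while the relabeled copies treat the position actually reached as the goal, so the final relabeled transition is a success. The actor and critic updates of Equations (24) and (25) are deliberately left out of the sketch.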