Research Article

Model-Free Attitude Control of Spacecraft Based on PID-Guide TD3 Algorithm

Algorithm 1

PID-Guide TD3 Algorithm.
(1) Randomly initialize the two critic networks and the actor network with their respective weights
(2) Initialize the corresponding target networks (two critics and one actor) with their respective weights
(3) Initialize the replay buffer
(4) for each episode do
 (1) Set the target state, randomly reset the environment, and get the initial state
 (2) for each time step do
  (i) Select an action according to the current policy plus exploration noise
  (ii) Select another action according to the PID controller
  (iii) Execute the action and observe the reward and the new state
  (iv) Store the transition tuple in the replay buffer
  (v) If the new state is terminal, reset the environment state
  (vi) If it is time to update, then randomly sample a mini-batch of transitions from the replay buffer
  (vii) Compute the target action from the target actor network
  (viii) Compute the target value from the target critic networks
  (ix) Update the critic networks
  (x) If the time step index is a multiple of the policy update delay, then
   (a) Update the actor network by the deterministic policy gradient
   (b) Update the target networks
 end for
end for
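
The per-step computations in Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard TD3 forms for the target action (clipped Gaussian smoothing noise), the target value (minimum of the two target critics), and the target-network update (Polyak averaging), together with a textbook discrete PID law. Every name, gain, and hyperparameter here (`PIDController`, `target_action`, `target_value`, `soft_update`, `tau`, `noise_clip`, and so on) is hypothetical, and states and actions are scalars for brevity. How the PID action enters training is not spelled out in the listing, so the sketch only computes it.

```python
import random
from dataclasses import dataclass


def clip(x: float, low: float, high: float) -> float:
    """Clamp x to the closed interval [low, high]."""
    return max(low, min(high, x))


@dataclass
class PIDController:
    """Textbook discrete PID law; gains are illustrative, not the paper's."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def action(self, error: float) -> float:
        # Accumulate the integral term and take a backward-difference derivative.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def target_action(actor_target, next_state, noise_std, noise_clip, a_low, a_high, rng):
    """Step (vii), assumed standard TD3: target actor output plus clipped noise."""
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    return clip(actor_target(next_state) + noise, a_low, a_high)


def target_value(reward, done, gamma, q1_target, q2_target, next_state, next_action):
    """Step (viii), assumed standard TD3: bootstrap from the smaller target critic."""
    q_min = min(q1_target(next_state, next_action), q2_target(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * q_min


def soft_update(target_weights, weights, tau):
    """Step (x)(b), assumed Polyak averaging: move target weights toward online weights."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(weights, target_weights)]


if __name__ == "__main__":
    rng = random.Random(0)
    pid = PIDController(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
    a_pid = pid.action(error=1.0)  # step (ii): PID-suggested action

    # Stand-in "networks": plain callables instead of trained models.
    actor_t = lambda s: 0.5 * s
    q1_t = lambda s, a: s + a
    q2_t = lambda s, a: s - a

    a_next = target_action(actor_t, 1.0, 0.2, 0.5, -1.0, 1.0, rng)
    y = target_value(1.0, False, 0.99, q1_t, q2_t, 1.0, a_next)
    print(a_pid, a_next, y)
```

In a full implementation the callables would be neural networks, and the critic and actor updates in steps (ix) and (x)(a) would be gradient steps taken by an automatic-differentiation library. Those parts are omitted here to keep the sketch dependency-free.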