Research Article
Model-Free Attitude Control of Spacecraft Based on PID-Guide TD3 Algorithm
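(1) Randomly initialize critic networks $Q_{\theta_1}$, $Q_{\theta_2}$ and actor network $\pi_{\phi}$ with weights $\theta_1$, $\theta_2$, $\phi$
(2) Initialize target networks $Q_{\theta_1'}$, $Q_{\theta_2'}$ and $\pi_{\phi'}$ with weights $\theta_1' \leftarrow \theta_1$, $\theta_2' \leftarrow \theta_2$, $\phi' \leftarrow \phi$
(3) Initialize replay buffer $\mathcal{B}$
(4) for each episode do
    (1) Set the target state, randomly reset the environment, and get the initial state $s$
    (2) for each time step $t$ do
        (i) Select action $a$ according to the current policy and exploration noise, $a = \pi_{\phi}(s) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$
        (ii) Select another action according to the PID controller
        (iii) Execute the action, then observe the reward $r$ and the new state $s'$
        (iv) Store the transition tuple in $\mathcal{B}$
        (v) If $s'$ is terminal, reset the environment state
        (vi) If it is time to update, randomly sample a mini-batch of $N$ transitions from $\mathcal{B}$
        (vii) Compute the target action $\tilde{a} \leftarrow \pi_{\phi'}(s') + \epsilon$, $\epsilon \sim \mathrm{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$
        (viii) Compute the target value $y \leftarrow r + \gamma \min_{i=1,2} Q_{\theta_i'}(s', \tilde{a})$
        (ix) Update the critic networks $\theta_i \leftarrow \arg\min_{\theta_i} N^{-1} \sum \left(y - Q_{\theta_i}(s, a)\right)^2$
        (x) If $t \bmod d = 0$ then
            (a) Update $\phi$ by the deterministic policy gradient
                $\nabla_{\phi} J(\phi) = N^{-1} \sum \nabla_{a} Q_{\theta_1}(s, a)\big|_{a = \pi_{\phi}(s)} \nabla_{\phi} \pi_{\phi}(s)$
            (b) Update the target networks
                $\theta_i' \leftarrow \tau \theta_i + (1 - \tau)\,\theta_i'$
                $\phi' \leftarrow \tau \phi + (1 - \tau)\,\phi'$
    end for
end for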
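The target-action, target-value, and target-network steps of the algorithm above follow the standard TD3 pattern (target policy smoothing, clipped double-Q, soft updates). The NumPy sketch below illustrates only those steps, under stated assumptions. The function names `td3_target` and `soft_update` are hypothetical. The actor and critics are plain callables standing in for neural networks. The `(1 - done)` mask and the clipping of $\tilde{a}$ to $\pm$`max_action` are common implementation choices not shown in the listed steps. The default values of `gamma`, `sigma_tilde`, `c`, and `tau` are typical TD3 settings, not values reported in this article.

```python
import numpy as np


def td3_target(r, s_next, done, actor_target, critic1_target, critic2_target,
               gamma=0.99, sigma_tilde=0.2, c=0.5, max_action=1.0, rng=None):
    """Steps (vii)-(viii): smoothed target action, then clipped double-Q target.

    Assumption: `done` is 1.0 for terminal transitions and masks the bootstrap term.
    """
    rng = np.random.default_rng() if rng is None else rng
    a_next = actor_target(s_next)
    # Target policy smoothing: clipped Gaussian noise on the target action
    noise = np.clip(rng.normal(0.0, sigma_tilde, size=a_next.shape), -c, c)
    a_tilde = np.clip(a_next + noise, -max_action, max_action)  # assumed action bounds
    # Clipped double-Q: take the smaller of the two target critics
    q_min = np.minimum(critic1_target(s_next, a_tilde), critic2_target(s_next, a_tilde))
    return r + gamma * (1.0 - done) * q_min


def soft_update(target_params, params, tau=0.005):
    """Step (b): theta' <- tau * theta + (1 - tau) * theta', in place on float arrays."""
    for tp, p in zip(target_params, params):
        tp *= (1.0 - tau)
        tp += tau * p
```

In a full training loop, `td3_target` would supply $y$ for the critic regression in step (ix), and `soft_update` would run only when $t \bmod d = 0$, so the actor and target networks change less often than the critics.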
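Step (ii) relies on a PID controller, but the algorithm does not state its form. As one possible reading, the following sketch shows a generic per-axis discrete PID law $u = K_p e + K_i \sum e\,\Delta t + K_d (e_k - e_{k-1})/\Delta t$ that turns an attitude error vector into a saturated torque command. The class name `DiscretePID`, the gains, the step size, the torque limit `u_max`, and the convention that the error is "target minus current" are all hypothetical. The sketch only produces a PID action; it does not show how PID-Guide TD3 combines that action with the policy action.

```python
import numpy as np


class DiscretePID:
    """Hypothetical per-axis discrete PID law with output saturation."""

    def __init__(self, kp, ki, kd, dt, u_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.u_max = u_max
        self.reset()

    def reset(self):
        # Call at the start of each episode, when the environment is reset
        self.integral = None
        self.prev_error = None

    def action(self, error):
        error = np.asarray(error, dtype=float)
        if self.integral is None:
            # First step after reset: no derivative term yet
            self.integral = np.zeros_like(error)
            deriv = np.zeros_like(error)
        else:
            deriv = (error - self.prev_error) / self.dt
        self.integral = self.integral + error * self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * deriv
        # Saturate to the (assumed) actuator torque limit
        return np.clip(u, -self.u_max, self.u_max)
```

Usage: `DiscretePID(2.0, 0.5, 0.1, dt=0.1, u_max=10.0).action([1.0, 0.0, -1.0])` gives a three-axis command. Resetting the integrator at every episode keeps accumulated error from one random initial state from leaking into the next.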