Research Article
Model-Free Attitude Control of Spacecraft Based on PID-Guide TD3 Algorithm
| Hyperparameters | Symbol | Value |
| --- | --- | --- |
| Random seed | ā | 2 |
| Max episodes | | 400 |
| Max steps per episode | | 200 |
| Sample time | | 1 |
| Replay buffer size | | $10^6$ |
| Batch size | | 250 |
| Policy network learning rate | | 0.0003 |
| Critic network learning rate | | 0.001 |
| Exploration noise scale | | 0.1 |
| Delay update | | 3 |
| Discount factor | | 0.99 |
| Soft update rate | | 0.01 |
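As an illustration of how these settings might be collected in code, the following is a minimal sketch and not the authors' implementation. The class name `TD3Config`, its field names, and the two helper functions are hypothetical. It assumes the replay buffer size is $10^6$ (a buffer of 106 samples could not supply a batch of 250), that "Delay update" is the usual TD3 policy delay (one policy update per several critic updates), and that "Soft update rate" is the Polyak coefficient used for the target networks. The unit of the sample time is not stated in the table and is left unspecified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TD3Config:
    """Hyperparameters from the table; field names are illustrative assumptions."""
    seed: int = 2
    max_episodes: int = 400
    max_steps_per_episode: int = 200
    sample_time: float = 1.0          # unit not given in the table
    replay_buffer_size: int = 10**6   # assumed reading of the table entry
    batch_size: int = 250
    policy_lr: float = 3e-4
    critic_lr: float = 1e-3
    exploration_noise_scale: float = 0.1
    policy_delay: int = 3             # assumed meaning of "Delay update"
    discount_factor: float = 0.99
    soft_update_rate: float = 0.01


def soft_update(target, online, tau):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def should_update_policy(critic_step, policy_delay):
    """Return True on every `policy_delay`-th critic update (steps counted from 1)."""
    return critic_step % policy_delay == 0


cfg = TD3Config()

# Upper bound on environment steps over the whole training run.
max_total_steps = cfg.max_episodes * cfg.max_steps_per_episode

# Move a toy target parameter vector one soft-update step toward the online one.
new_target = soft_update([0.0, 1.0], [1.0, 1.0], cfg.soft_update_rate)
```

With a soft update rate of 0.01, each update moves the target parameters only 1% of the way toward the online parameters. Under these settings a run is also bounded by 400 × 200 = 80,000 environment steps, so a buffer of $10^6$ transitions would never overwrite old samples.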