| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Inner radius (m) | 600 | Total episodes | |
| Outer radius (m) | 1000 | Episode duration (s) | 120 |
| Adjustment factor | 0.05 | Time step (s) | 1.0 |
| Return discount factor | 0.95 | Number of episodes used to compute the average total reward | 100 |
| Target network update period | 1000 | ’s mean value and variance | (, 0.8) |
| Exploration probability | 0.1 | ’s mean value and variance | (0.0, 1.0) |
| Number of followers | 2 | ’s mean value and variance | (0.0, 1.0) |
| Experience replay pool capacity | 10⁵ | ’s mean value and variance | (0.0, 1.0) |
| Mini-batch size | 32 | ’s mean value and variance | (0.0, 1.0) |
| Learning rate | 0.01 | ’s mean value and variance | (0.0, 1.0) |
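Several of the tabulated values combine in simple ways. The following is a minimal, hypothetical Python sketch. The class `DQNConfig` and the function `epsilon_greedy` are illustrative names and do not come from the source. It shows that a 120 s episode at a 1.0 s time step yields 120 decision steps. It also shows that a discount factor of 0.95 corresponds to a rough effective horizon of 1/(1 − 0.95) = 20 steps. Assuming the target network update period is counted in time steps, that period spans about 8.3 episodes. Finally, it illustrates ε-greedy action selection with the listed exploration probability of 0.1, assuming the agent uses the standard ε-greedy rule.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    """Subset of the table's values; field names are hypothetical."""
    episode_time_s: float = 120.0        # episode duration (s)
    time_step_s: float = 1.0             # time step (s)
    gamma: float = 0.95                  # return discount factor
    epsilon: float = 0.1                 # exploration probability
    target_update_period: int = 1000     # assumed to be measured in time steps

    @property
    def steps_per_episode(self) -> int:
        # 120 s / 1.0 s = 120 decision steps per episode
        return round(self.episode_time_s / self.time_step_s)

    @property
    def effective_horizon(self) -> float:
        # Common heuristic: rewards beyond ~1 / (1 - gamma) steps weigh little
        return 1.0 / (1.0 - self.gamma)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


cfg = DQNConfig()
print(cfg.steps_per_episode)                                          # 120
print(round(cfg.effective_horizon, 2))                                # 20.0
print(round(cfg.target_update_period / cfg.steps_per_episode, 2))     # 8.33
print(epsilon_greedy([0.1, 0.7, 0.3], cfg.epsilon, random.Random(0)))
```

Under these assumptions the target network is refreshed roughly every eight episodes. The greedy branch returns the action with the largest estimated Q-value. The exploratory branch samples uniformly from all actions.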