A Deep Q-Network-Based Collaborative Control Research for Smart Ammunition Formation
Algorithm 2: DQN algorithm.
1: Initialize the Q-network with random weights.
2: Initialize the target network weights.
3: Initialize the experience pool capacity, the greedy probability, the minibatch size, the discount factor, the learning rate, and the update period of the target network.
4: Initialize the planned operation time and the time interval, and calculate the number of time steps per episode from them.
5: for each episode do
6: Initialize the SAF state according to the system’s initial characteristics.
7: while the planned operation time has not elapsed do
8: Select the followers’ action according to the ε-imitation policy.
9: Perform the action on the SAF system, integrate Equations (2) and (3) with the fourth-order Runge-Kutta method to obtain the system state at the next time step, and observe the reward.
10: Store the transition in the experience pool.
11: Randomly sample a minibatch of transitions from the experience pool.
12: Train the network and update the parameters using Variable Learning Rate Gradient Descent.
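The building blocks of steps 8 to 12 can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' implementation, and every name in it (`rk4_step`, `ReplayPool`, `select_action`, `td_targets`) is hypothetical. Three assumptions are made: plain epsilon-greedy selection stands in for the ε-imitation policy, whose details are not given in this listing; `epsilon` is treated as the exploration probability; and only the standard DQN regression targets are computed, so the Variable Learning Rate Gradient Descent update itself is not shown.

```python
import random
from collections import deque


def rk4_step(f, t, y, dt):
    """Advance dy/dt = f(t, y) by one step dt with classical fourth-order Runge-Kutta."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


class ReplayPool:
    """Fixed-capacity experience pool; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, as in step 11.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def select_action(q_values, epsilon, rng=random):
    """Assumed stand-in for step 8: explore with probability epsilon, else take argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_targets(batch, target_q, gamma):
    """Standard DQN targets: r if terminal, else r + gamma * max_a' Q_target(s', a')."""
    return [r if done else r + gamma * max(target_q(s2))
            for (_, _, r, s2, done) in batch]
```

In a training loop, `rk4_step` would propagate the formation dynamics over one time interval, the resulting transition would be passed to `ReplayPool.store`, and the values returned by `td_targets` would serve as regression targets for the Q-network on the sampled minibatch.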