Research Article

A Deep Q-Network-Based Collaborative Control Research for Smart Ammunition Formation

Algorithm 2: DQN algorithm.
1: Initialize the Q-network randomly with weights $\theta$.
2: Initialize the target network with weights $\theta^{-} = \theta$.
3: Initialize the experience pool $D$ to capacity $N$, the greedy probability $\varepsilon$, the minibatch size $M$, the discount factor $\gamma$, the learning rate $\alpha$, and the update period $C$ of the target network.
4: Initialize the planned operation time $T$ and the time interval $\Delta t$, and calculate the number of time steps per episode with $N_t = T / \Delta t$.
5: for each training episode do
6: Initialize the SAF state $s_0$ according to the system’s initial characteristics.
7: while the planned operation time $T$ has not elapsed do
8:  Select the followers’ action $a_t$ according to the $\varepsilon$-imitation policy.
9:  Perform action $a_t$ on the SAF system, calculate Equations (2) and (3) with the fourth-order Runge-Kutta method, get the system state $s_{t+1}$ at the next time step, and observe the reward $r_t$.
10:  Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the experience pool.
11:  Randomly sample a minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from the experience pool.
12:  Train the network and update the parameters $\theta$ using Variable Learning Rate Gradient Descent.
13:  Every $C$ steps update $\theta^{-} \leftarrow \theta$.
14: end while
15: end for
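
The loop structure of Algorithm 2 can be illustrated with a minimal, self-contained Python sketch. It is not the implementation used in this work, and several pieces are assumptions made only for illustration: the SAF dynamics of Equations (2) and (3) are replaced by a toy double integrator (`dynamics`), the Q-network is replaced by a linear function of three scaled features, the $\varepsilon$-imitation policy is replaced by plain $\varepsilon$-greedy selection, the Variable Learning Rate Gradient Descent rule is replaced by a simple decaying step size, and the reward, the action set `ACTIONS`, and all hyperparameter values are hypothetical. What the sketch does reproduce is the order of operations: a fourth-order Runge-Kutta state transition, storage in the experience pool, minibatch sampling, a gradient update toward the target-network estimate, and a periodic copy of the weights into the target network.

```python
import random
from collections import deque

ACTIONS = [-1.0, 0.0, 1.0]  # hypothetical discrete follower accelerations


def dynamics(s, a):
    """Toy stand-in for Equations (2) and (3): s = [gap x, relative speed v]."""
    return [s[1], a]


def rk4_step(f, s, a, dt):
    """Advance ds/dt = f(s, a) by one fourth-order Runge-Kutta step."""
    k1 = f(s, a)
    k2 = f([x + 0.5 * dt * k for x, k in zip(s, k1)], a)
    k3 = f([x + 0.5 * dt * k for x, k in zip(s, k2)], a)
    k4 = f([x + dt * k for x, k in zip(s, k3)], a)
    return [x + dt / 6.0 * (c1 + 2 * c2 + 2 * c3 + c4)
            for x, c1, c2, c3, c4 in zip(s, k1, k2, k3, k4)]


def q_values(w, s):
    """Linear Q-function (assumed): one weight row per action, scaled features."""
    phi = [1.0, s[0] / 10.0, s[1] / 5.0]
    return [sum(wi * p for wi, p in zip(row, phi)) for row in w], phi


def td_target(r, gamma, q_next, done):
    """y = r on the last step of an episode, else r + gamma * max Q_target."""
    return r if done else r + gamma * max(q_next)


def train(episodes=5, T=2.0, dt=0.1, capacity=500, eps=0.1, batch=8,
          gamma=0.9, alpha0=0.05, C=10, seed=0):
    rng = random.Random(seed)
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(3)] for _ in ACTIONS]
    theta_minus = [row[:] for row in theta]       # target network weights
    pool = deque(maxlen=capacity)                 # experience pool
    n_steps = int(round(T / dt))                  # steps per episode
    step = 0
    for _ in range(episodes):
        s = [rng.uniform(-1.0, 1.0), 0.0]         # initial state
        for t in range(n_steps):
            if rng.random() < eps:                # explore (epsilon-greedy)
                a = rng.randrange(len(ACTIONS))
            else:                                 # exploit current Q estimate
                q, _phi = q_values(theta, s)
                a = q.index(max(q))
            s_next = rk4_step(dynamics, s, ACTIONS[a], dt)
            r = -(s_next[0] ** 2 + 0.1 * s_next[1] ** 2)  # assumed reward
            pool.append((s, a, r, s_next, t == n_steps - 1))
            alpha = alpha0 / (1.0 + 0.01 * step)  # assumed decay schedule
            sample = rng.sample(list(pool), min(batch, len(pool)))
            for sj, aj, rj, sj1, dj in sample:
                y = td_target(rj, gamma, q_values(theta_minus, sj1)[0], dj)
                q, phi = q_values(theta, sj)
                td = y - q[aj]                    # temporal-difference error
                theta[aj] = [w + alpha * td * p
                             for w, p in zip(theta[aj], phi)]
            step += 1
            if step % C == 0:                     # periodic target update
                theta_minus = [row[:] for row in theta]
            s = s_next
    return theta, theta_minus
```

Calling `train()` returns the learned weights and the target-network copy. Keeping `theta_minus` fixed between updates means the regression target does not move with every gradient step, which is the purpose of the update period in step 13.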