Research Article

Autonomous Maneuver Decision of UCAV Air Combat Based on Double Deep Q Network Algorithm and Stochastic Game Theory

Algorithm 1

Minimax-DDQN algorithm pseudocode.
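Algorithm: UCAV air combat maneuver decision based on the Minimax-DDQN algorithm
1. Initialize the replay memory $D$ and its storage capacity.
2. Initialize the online Q network and the target Q network with randomly generated parameters $\theta$ and $\theta^{-}$.
3. for each episode do
4.  Initialize the situation information of both our UCAV and the enemy UCAV, and observe the current state $s_1$.
5.   for each decision step $t$ do
6.    With probability $\varepsilon$, select one of the seven maneuvers in the basic maneuver library at random as $a_t$.
7.    Otherwise, select $a_t = \arg\max_{a} \min_{o} Q(s_t, a, o; \theta)$.
8.    Execute action $a_t$.
9.    Observe the enemy action $o_t$, the reward $r_t$, and the next state $s_{t+1}$.
10.    Store the sample $(s_t, a_t, o_t, r_t, s_{t+1})$ in $D$.
11.   end for
12.  Randomly take a minibatch of samples $(s_j, a_j, o_j, r_j, s_{j+1})$ from $D$.
13.  Set $y_j = r_j + \gamma \min_{o} Q(s_{j+1}, a^{*}, o; \theta^{-})$, where $a^{*} = \arg\max_{a} \min_{o} Q(s_{j+1}, a, o; \theta)$.
14.  Update the online network by gradient descent on the loss function $L(\theta) = \left(y_j - Q(s_j, a_j, o_j; \theta)\right)^2$.
15.  Update the target Q network every turn by setting $\theta^{-} = \theta$, and gradually decrease $\varepsilon$ until it reaches 0.1.
16. end for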
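Steps 6 and 7 of Algorithm 1 combine random exploration with a minimax choice over the enemy's possible responses. The Python sketch below is illustrative only. It assumes the online network's outputs for the current state are already available as a 7 × 7 table `q_table[a][o]` (our maneuver `a` against enemy maneuver `o`), and it uses a pure-strategy minimax rule. The function names are hypothetical. A mixed-strategy solution (for example via linear programming, as in classical Minimax-Q) would be a different design.

```python
import random
from typing import Optional, Sequence

# Size of the basic maneuver library used in Algorithm 1.
N_MANEUVERS = 7


def minimax_action(q_table: Sequence[Sequence[float]]) -> int:
    """Return argmax_a min_o Q(s, a, o).

    Assumption: q_table[a][o] holds the online network's Q value for our
    maneuver a against enemy maneuver o in the current state.
    """
    # Worst case over the enemy's responses for each of our maneuvers.
    worst_case = [min(row) for row in q_table]
    # Pick our maneuver with the best worst case (ties -> lowest index).
    return max(range(len(worst_case)), key=worst_case.__getitem__)


def select_action(q_table: Sequence[Sequence[float]], epsilon: float,
                  rng: Optional[random.Random] = None) -> int:
    """Epsilon-greedy selection (hypothetical helper).

    With probability epsilon, pick a maneuver uniformly at random.
    Otherwise, take the pure-strategy minimax maneuver.
    """
    if rng is None:
        rng = random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_table))
    return minimax_action(q_table)
```

In the two-maneuver table `[[1.0, 5.0], [3.0, 4.0]]`, maneuver 0 has the higher best case (5.0), but maneuver 1 has the higher worst case (3.0 versus 1.0), so the greedy branch returns 1.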
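The replay memory used in steps 1, 10 and 12 can be sketched with a bounded `collections.deque`, which automatically discards the oldest sample once the storage limit is reached. This is a minimal sketch, not the paper's implementation. The `Transition` fields, including the stored enemy action `enemy_action`, are hypothetical and chosen for illustration rather than taken from the paper's data layout.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Optional, Tuple


class Transition(NamedTuple):
    """One stored sample (hypothetical field layout)."""
    state: Tuple[float, ...]
    action: int          # our maneuver index a_t
    enemy_action: int    # enemy maneuver index o_t (assumed to be stored)
    reward: float
    next_state: Tuple[float, ...]


class ReplayMemory:
    """Fixed-capacity replay memory; the oldest sample is dropped when full."""

    def __init__(self, capacity: int) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        """Store one sample."""
        self._buffer.append(transition)

    def sample(self, batch_size: int,
               rng: Optional[random.Random] = None) -> List[Transition]:
        """Draw batch_size distinct samples uniformly at random."""
        if rng is None:
            rng = random.Random()
        return rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)
```

Uniform sampling breaks the correlation between consecutive air combat states, which is the usual motivation for drawing the minibatch at random rather than in order.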
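The double-Q idea enters at steps 13 and 14: the online network chooses the next maneuver and the target network evaluates it. The sketch below is a minimal, framework-free illustration under several assumptions:

- Both networks' outputs for the next state are given as tables indexed `[a][o]`.
- Terminal transitions use `y = r`.
- The loss is the mean squared error over the minibatch.

`minimax_ddqn_target` and `mse_loss` are hypothetical names.

```python
from typing import Sequence


def minimax_ddqn_target(reward: float,
                        next_q_online: Sequence[Sequence[float]],
                        next_q_target: Sequence[Sequence[float]],
                        gamma: float,
                        terminal: bool = False) -> float:
    """Compute y = r + gamma * min_o Q_target(s', a*, o).

    Here a* = argmax_a min_o Q_online(s', a, o).
    Assumption: terminal transitions bootstrap nothing and return r.
    """
    if terminal:
        return reward
    # Selection uses the online network...
    worst_online = [min(row) for row in next_q_online]
    a_star = max(range(len(worst_online)), key=worst_online.__getitem__)
    # ...while evaluation uses the target network.
    return reward + gamma * min(next_q_target[a_star])


def mse_loss(targets: Sequence[float],
             predictions: Sequence[float]) -> float:
    """Mean squared TD error over a minibatch: (1/m) * sum_j (y_j - Q_j)^2."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)
```

Consider `next_q_online = [[1.0, 2.0], [0.0, 5.0]]`, `next_q_target = [[4.0, 3.0], [10.0, 10.0]]`, `reward = 1.0` and `gamma = 0.9`. The online network selects maneuver 0, so the target is 1.0 + 0.9 × 3.0 = 3.7. Selecting and evaluating with the target network alone would instead pick maneuver 1 and give 10.0. This is the overestimation the decoupling is meant to reduce.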
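Step 15 says only that the exploration probability ε decreases gradually to 0.1 and that the target network is overwritten with the online parameters. The sketch below fills in two illustrative choices that are not taken from the paper:

- ε decays multiplicatively by a factor of 0.99 per call.
- Network parameters are held in a plain dictionary of lists.

```python
import copy
from typing import Dict, List


def decay_epsilon(epsilon: float, decay: float = 0.99,
                  floor: float = 0.1) -> float:
    """Shrink the exploration probability, never going below the floor.

    Assumption: multiplicative decay; the paper only states that epsilon
    decreases gradually until 0.1.
    """
    return max(floor, epsilon * decay)


def sync_target(online_params: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Hard update theta^- <- theta.

    Returns an independent deep copy, so later online updates
    do not leak into the target network.
    """
    return copy.deepcopy(online_params)
```

The deep copy matters: if the target network simply aliased the online parameters, it would change at every gradient step and would no longer provide a stable target.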