Research Article
Autonomous Maneuver Decision of UCAV Air Combat Based on Double Deep Q Network Algorithm and Stochastic Game Theory
Algorithm 1
Minimax-DDQN algorithm pseudocode.
Algorithm: UCAV air combat maneuver based on the Minimax-DDQN algorithm
1. Initialize the replay memory $D$ and its storage limit $N$.
2. Initialize the online Q network and the target Q network, and randomly generate their parameters $\theta$ and $\theta^{-}$.
3. for episode $= 1, \dots, M$ do
4.     Initialize the UCAV situation information of the enemy and our side, and observe the current state $s_1$.
5.     for $t = 1, \dots, T$ do
6.         With probability $\varepsilon$, select one of the seven maneuvers in the basic maneuver library at random.
7.         Otherwise, select action $a_t$ according to the minimax strategy of the online Q network.
8.         Execute action $a_t$.
9.         Observe the reward $r_t$ and the next state $s_{t+1}$.
10.        Store the sample $(s_t, a_t, r_t, s_{t+1})$ in $D$.
11.    end for
12.    Randomly take a minibatch of samples $(s_j, a_j, r_j, s_{j+1})$ from $D$.
13.    Set the target value $y_j$, using the online network to select the next action and the target network to evaluate it.
14.    Update the online network parameters $\theta$ by gradient descent on the loss function between $y_j$ and the online network's Q value.
15.    Every $C$ turns, update the target Q network by setting $\theta^{-} = \theta$, and gradually decrease $\varepsilon$ until it reaches 0.1.
16. end for
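The action-selection and target steps of Algorithm 1 can be sketched in plain Python. This is a minimal, non-authoritative sketch under several assumptions not stated in the excerpt. Q values for a state are given as a matrix `q[a][o]` over our maneuver `a` and the enemy's maneuver `o`. The minimax strategy is simplified to a pure-strategy maximin rule rather than a mixed strategy solved by linear programming, as in classical minimax-Q. The Double DQN target lets the online network choose the next maneuver and the target network evaluate it. The function names `maximin_action`, `select_action`, `ddqn_minimax_target`, and `decay_epsilon`, the discount `gamma`, and the decay `rate=0.995` are hypothetical. Only the seven-maneuver library and the 0.1 floor on $\varepsilon$ come from the algorithm above.

```python
import random


def maximin_action(q_matrix):
    """Return (action, value) under a pure-strategy maximin rule (assumption).

    q_matrix[a][o] is the Q value when our UCAV takes maneuver a and the
    enemy takes maneuver o. We pick the maneuver whose worst case over all
    enemy responses is largest.
    """
    worst_case = [min(row) for row in q_matrix]
    best = max(range(len(worst_case)), key=lambda a: worst_case[a])
    return best, worst_case[best]


def select_action(q_online_s, epsilon, rng=random):
    """Steps 6-7: epsilon-greedy over the maximin action of the online network."""
    if rng.random() < epsilon:
        # Explore: a random maneuver from the library (seven in the article).
        return rng.randrange(len(q_online_s))
    return maximin_action(q_online_s)[0]


def ddqn_minimax_target(r, q_online_next, q_target_next, gamma, done=False):
    """Step 13 (hypothetical sketch): Double-DQN-style target with a maximin value.

    The online network chooses the next maneuver; the target network
    evaluates it against the enemy's worst-case (minimizing) response.
    """
    if done:
        return r
    a_star, _ = maximin_action(q_online_next)
    return r + gamma * min(q_target_next[a_star])


def decay_epsilon(epsilon, rate=0.995, floor=0.1):
    """Step 15: shrink epsilon geometrically, never below the 0.1 floor."""
    return max(floor, epsilon * rate)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 7  # seven basic maneuvers; enemy assumed to use the same library
    q_on = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    q_tg = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    eps = 1.0
    a = select_action(q_on, eps, rng)
    y = ddqn_minimax_target(r=0.5, q_online_next=q_on, q_target_next=q_tg, gamma=0.9)
    eps = decay_epsilon(eps)
    print(a, round(y, 3), eps)
```

Decoupling selection (online network) from evaluation (target network) is what distinguishes the Double DQN target from the plain DQN target. A full implementation would replace the matrices with network outputs and may solve for a mixed strategy instead of the pure maximin action used here.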