Research Article

A Case Study on Air Combat Decision Using Approximated Dynamic Programming

Algorithm 1

Utility function approximation based on sampled states.
ADP_Learn()
Input variables:
(1) $\bar{S}$: the set of sampled states;
(2) $N$: the number of learning rounds;
(3) $\pi_b$: the blue plane's policy derived from the Min-Max approach.
Output variables:
(1) $\hat{J}$: the approximated utility function.
Local variables:
(1) $u_b$: action vector of the blue plane derived from the Min-Max policy;
(2) $u_r$: action vector of the red plane derived from the current utility;
(3) $\tilde{J}$: the utility function improved by one step of Bellman iteration;
(4) $\phi$: the feature vector used to compute the approximation coefficients;
(5) $\beta$: the vector of approximation coefficients.
Code:
(1) $\hat{J}^{0} \leftarrow 0$;
(2) FOR $k = 1, \dots, N$, DO:
(3)  FOR each $x \in \bar{S}$: $u_b \leftarrow \pi_b(x)$;
(4)  $u_r \leftarrow \arg\max_{u} \hat{J}^{k-1}\big(f(x, u, u_b)\big)$;
(5)  $\tilde{J}(x) \leftarrow g(x) + \gamma \hat{J}^{k-1}\big(f(x, u_r, u_b)\big)$;
(6)  $\Phi \leftarrow \big[\phi(x)\big]_{x \in \bar{S}}$, the feature matrix over the sampled states;
(7)  $\beta \leftarrow (\Phi^{T}\Phi)^{-1}\Phi^{T}\tilde{J}$;
(8)  $\hat{J}^{k}(x) \leftarrow \phi(x)^{T}\beta$;
(9) END
(10) RETURN $\hat{J}^{N}$;
where $f(x, u_r, u_b)$ denotes the state transition function, $g(x)$ the stage reward, and $\gamma$ the discount factor.
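To make the procedure concrete, the following Python sketch implements the same fitted value iteration. The dynamics step(), stage reward reward(), feature map features(), the discrete red action set red_actions, and the blue plane's Min-Max policy blue_policy are placeholders standing in for the paper's models, not its exact definitions; only the structure (a one-step Bellman backup over the sampled states followed by a least-squares fit of the coefficient vector beta) is taken from Algorithm 1.

import numpy as np

def adp_learn(sampled_states, n_rounds, blue_policy,
              step, reward, features, red_actions, gamma=0.95):
    # Feature matrix Phi over the sampled states; it is fixed across
    # learning rounds because the sampled states do not change.
    phi_matrix = np.array([features(x) for x in sampled_states])
    beta = np.zeros(phi_matrix.shape[1])  # J^0 = 0

    def j_hat(x):
        # Current utility approximation: J(x) = phi(x)^T beta.
        return features(x) @ beta

    for _ in range(n_rounds):
        j_tilde = np.empty(len(sampled_states))
        for i, x in enumerate(sampled_states):
            u_b = blue_policy(x)  # blue action from the fixed Min-Max policy
            # Red acts greedily with respect to the current utility,
            # giving the one-step Bellman backup J~(x).
            best = max(j_hat(step(x, u_r, u_b)) for u_r in red_actions)
            j_tilde[i] = reward(x) + gamma * best
        # Least-squares fit: beta = (Phi^T Phi)^(-1) Phi^T J~.
        beta, *_ = np.linalg.lstsq(phi_matrix, j_tilde, rcond=None)

    return lambda x: features(x) @ beta  # approximated utility J^N

The function it returns can then drive the red plane online: at each decision step, the red plane evaluates the approximated utility of the states reachable under each candidate action and selects the best one, as in line (4) of Algorithm 1.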