Research Article
A Case Study on Air Combat Decision Using Approximated Dynamic Programming
Algorithm 1
Utility function approximation based on sampled states.
ADP_Learn()
Input variables:
(1) the set of sampled states;
(2) the number of learning rounds;
(3) the blue plane's policy, derived from the Min-Max approach.
Output variables:
(1) the approximated utility function.
Local variables:
(1) the action vector of the blue plane, derived from the Min-Max policy;
(2) the action vector of the red plane, derived from the current utility;
(3) the utility function improved by one step of Bellman iteration;
(4) the feature vector used to compute the approximation coefficients;
(5) the vector of approximation coefficients.
Code:
(1) initialize the utility function approximation;
(2) FOR each learning round DO:
(3)   compute the blue plane's action vector from the Min-Max policy;
(4)   compute the red plane's action vector from the current utility;
(5)   compute the one-step improved utility at each sampled state by Bellman iteration;
(6)   compute the feature vectors of the sampled states;
(7)   compute the approximation coefficients from the feature vectors and the improved utility values;
(8)   update the utility function approximation with the new coefficients;
(9) END
(10) RETURN the approximated utility function;
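As a reading aid, the loop in Algorithm 1 can be viewed as least-squares fitted value iteration against a fixed blue policy. The sketch below is illustrative rather than the authors' implementation: the linear form J(x) ≈ βᵀφ(x), the discount factor gamma, and the helper callables features, step (combat dynamics), reward, and blue_policy are assumptions introduced for the example.

```python
import numpy as np

def adp_learn(sampled_states, num_rounds, blue_policy,
              red_actions, features, step, reward, gamma=0.95):
    """Sketch of ADP_Learn as least-squares fitted value iteration.

    sampled_states : list of state vectors drawn from the state space
    num_rounds     : number of Bellman-iteration / refitting rounds
    blue_policy    : callable x -> blue action (fixed Min-Max policy), assumed
    red_actions    : finite set of candidate red-plane actions, assumed
    features       : callable x -> feature vector phi(x), assumed
    step           : callable (x, a_r, a_b) -> next state, assumed dynamics
    reward         : callable x -> one-step utility g(x), assumed
    """
    phi = np.array([features(x) for x in sampled_states])  # design matrix over samples
    beta = np.zeros(phi.shape[1])                          # J(x) ~ beta . phi(x)

    def J(x):
        # current approximation of the utility function
        return features(x) @ beta

    for _ in range(num_rounds):
        targets = []
        for x in sampled_states:
            a_b = blue_policy(x)                            # blue action from Min-Max policy
            # red action chosen greedily with respect to the current utility
            a_r = max(red_actions, key=lambda a: J(step(x, a, a_b)))
            # one-step Bellman-improved utility at the sampled state
            targets.append(reward(x) + gamma * J(step(x, a_r, a_b)))
        # refit the approximation coefficients by least squares
        beta, *_ = np.linalg.lstsq(phi, np.array(targets), rcond=None)

    # return the approximated utility function
    return lambda x: features(x) @ beta
```

In this form, each learning round makes one pass over the sampled states and then performs a single least-squares solve, so the coefficient vector is refitted once per round rather than once per state.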