Research Article
A Case Study on Air Combat Decision Using Approximated Dynamic Programming
Algorithm 2
Rollout decision procedure using approximated utility function.
Rollout_ADP_Policy()
Input variables:
  (1) : the current state;
  (2) : the number of rollout steps;
  (3) : the utility function approximated by the ADP_Learn algorithm;
  (4) : the red plane’s policy derived from , see (8).
Output variables:
  (1) : the best action with respect to the current state.
Local variables:
  (1) : caches the maximum utility found across the different actions;
  (2) : caches the next state computed by the system equation.
Code:
  (1) ;
  (2) ;
  (3) FOR , DO:
  (4)   ;
  (5)   FOR , DO:
  (6)     ;
  (7)   END
  (8)   IF , THEN:
  (9)     ;
  (10)    ;
  (11)  END
  (12) END
  (13) RETURN ;
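As a rough illustration of the control flow in Algorithm 2, the following Python sketch loops over the candidate actions, simulates a fixed number of rollout steps for each, and keeps the action whose final state has the highest approximated utility. It is a sketch under stated assumptions rather than the authors' implementation: the names `rollout_adp_policy`, `greedy_action`, `step`, `utility`, `red_policy`, and `actions` are hypothetical, the action set is assumed to be finite, and the blue plane is assumed to act greedily with respect to the approximated utility during the rollout, which the listing does not state.

```python
import math
from typing import Any, Callable, Optional, Sequence

# Hypothetical signatures (assumptions, not taken from the article):
#   step(state, blue_action, red_action) -> next state (the system equation)
#   utility(state) -> float              (the approximated utility function)
#   red_policy(state) -> red action      (the red plane's policy)
Step = Callable[[Any, Any, Any], Any]


def greedy_action(state: Any, actions: Sequence[Any], step: Step,
                  utility: Callable[[Any], float],
                  red_policy: Callable[[Any], Any]) -> Any:
    """One-step lookahead: the blue action whose successor has the highest utility."""
    red_action = red_policy(state)
    return max(actions, key=lambda a: utility(step(state, a, red_action)))


def rollout_adp_policy(state: Any, n_rollout: int,
                       utility: Callable[[Any], float],
                       red_policy: Callable[[Any], Any],
                       actions: Sequence[Any], step: Step) -> Optional[Any]:
    """Return the candidate action with the best rollout utility (None if no actions)."""
    best_utility = -math.inf   # caches the maximum utility seen so far
    best_action = None
    for action in actions:
        # Apply the candidate action once, with red following its policy.
        next_state = step(state, action, red_policy(state))
        # Roll the simulation forward for n_rollout further steps.
        # Assumption: blue follows the greedy one-step policy here.
        for _ in range(n_rollout):
            blue = greedy_action(next_state, actions, step, utility, red_policy)
            next_state = step(next_state, blue, red_policy(next_state))
        value = utility(next_state)
        # Strict comparison: on ties the earliest candidate is kept.
        if value > best_utility:
            best_utility = value
            best_action = action
    return best_action
```

With a toy one-dimensional system, for example `step = lambda s, b, r: s + b - r`, a utility that rewards closeness to a target position, and a red policy that always returns 0, the function picks the action that moves the state toward the target.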