International Journal of Aerospace Engineering

Research Article

UAV Swarm Confrontation Using Hierarchical Multiagent Reinforcement Learning

The high-level policy training in h-MADDPG.

Input: Pretrained low-level policies for all agents
Output: Trained high-level policy model
1: Randomly initialize the high-level networks and critic networks
2: for each episode do
3: Get the local observations and the global state
4:
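5: while the episode has not terminated do
6: Select a macro action for each agent using its high-level policy
7: for each primitive step of the current macro actions do
8: Select primitive actions with the pretrained low-level policies, conditioned on the macro actions and the intrinsic observations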
9: Execute primitive actions
10: Observe new intrinsic observations and receive extrinsic rewards
11: end for
12: Get the new local observations and the new global state
13:
14: Store the transition in the replay buffer
15: Sample a random minibatch of M transitions from the replay buffer
16: Update the parameters of the high-level and critic networks according to Equations (6) and (7)
17:
18: end while
19: end for
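As a rough illustration of the control flow in the algorithm above, the following Python sketch runs the two-level loop on a toy problem. Everything in it is an assumption for illustration: the `ToyEnv` environment, the policy callables, the `train_high_level` and `ReplayBuffer` names, a fixed number `k_steps` of primitive steps per macro action (step 7), looping until the episode ends (step 5), and summing the extrinsic rewards over a macro action to form the high-level reward (the content of steps 4, 13, and 17 is not recoverable from this excerpt). The parameter update of step 16 is left as a placeholder `update_fn`, since Equations (6) and (7) are not reproduced here.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of high-level (macro-step) transitions."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def store(self, transition):
        self.data.append(transition)

    def sample(self, m):
        # Uniform minibatch of m transitions, drawn without replacement.
        return random.sample(list(self.data), m)

    def __len__(self):
        return len(self.data)


class ToyEnv:
    """Hypothetical stand-in: agents move along a line; the episode ends after `horizon` primitive steps."""

    def __init__(self, n_agents=2, horizon=12):
        self.n, self.horizon = n_agents, horizon

    def reset(self):
        self.t, self.pos = 0, [0] * self.n
        return self.observe()

    def observe(self):
        local_obs = [(p,) for p in self.pos]  # each agent sees only its own position
        global_state = tuple(self.pos)        # the centralized critic sees all positions
        return local_obs, global_state

    def step(self, primitive_actions):
        self.t += 1
        self.pos = [p + a for p, a in zip(self.pos, primitive_actions)]
        intrinsic_obs = [(p,) for p in self.pos]
        rewards = [float(a) for a in primitive_actions]  # toy extrinsic reward
        return intrinsic_obs, rewards, self.t >= self.horizon


def train_high_level(env, high_policies, low_policies, update_fn,
                     episodes=3, k_steps=4, batch_size=2, capacity=1000):
    n = len(high_policies)
    buffer, n_updates = ReplayBuffer(capacity), 0
    for _ in range(episodes):                                    # step 2
        local_obs, state = env.reset()                           # step 3
        done = False
        while not done:                                          # step 5 (assumed: until episode ends)
            macros = [pi(o) for pi, o in zip(high_policies, local_obs)]  # step 6
            # Assumed: intrinsic observations start from the local observations.
            intrinsic, macro_reward = local_obs, [0.0] * n
            for _ in range(k_steps):                             # step 7 (assumed: fixed K steps)
                prims = [mu(o, g) for mu, o, g in zip(low_policies, intrinsic, macros)]  # step 8
                intrinsic, rewards, done = env.step(prims)       # steps 9-10
                macro_reward = [R + r for R, r in zip(macro_reward, rewards)]  # assumed: summed
                if done:
                    break
            new_obs, new_state = env.observe()                   # step 12
            buffer.store((local_obs, state, macros, macro_reward,
                          new_obs, new_state, done))             # step 14
            if len(buffer) >= batch_size:
                update_fn(buffer.sample(batch_size))             # steps 15-16 (placeholder)
                n_updates += 1
            local_obs, state = new_obs, new_state
    return buffer, n_updates


# Toy usage with hypothetical policies: macro action = step size in {0, 1};
# the low-level policy simply moves by the chosen step size.
random.seed(0)
high = [lambda o: random.choice([0, 1])] * 2
low = [lambda o, g: g] * 2
buf, n_updates = train_high_level(ToyEnv(), high, low, update_fn=lambda batch: None)
print(len(buf), n_updates)  # → 9 8
```

In this sketch one transition is stored per macro action rather than per primitive step, so the high-level critic works on the coarser macro-step time scale, while the low-level policies are only called, never updated, consistent with the algorithm taking pretrained low-level policies as input.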
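Equations (6) and (7) are not included in this excerpt. Since h-MADDPG builds on MADDPG, one plausible reading is that they follow the standard MADDPG form: a centralized critic Q_i, conditioned on the global state and all agents' actions, is trained toward a TD target computed with target actors and a target critic, and the target networks track the learned ones by Polyak averaging. The plain-Python sketch below shows only that generic critic target, the squared-error critic loss, and the soft target update, using toy linear callables. The function names, the tuple layout of a transition, and treating `gamma` as a per-macro-step discount are assumptions, and the actor (policy-gradient) update is omitted.

```python
def critic_targets(batch, agent, target_actors, target_critic, gamma=0.95):
    """TD targets y = r_i + gamma * Q'_i(x', a'_1, ..., a'_N) for agent i.

    Each next action a'_j = mu'_j(o'_j) comes from agent j's target actor;
    the bootstrap term is dropped on terminal transitions.
    """
    targets = []
    for _obs, _state, _actions, rewards, next_obs, next_state, done in batch:
        next_actions = [mu(o) for mu, o in zip(target_actors, next_obs)]
        bootstrap = 0.0 if done else gamma * target_critic(next_state, next_actions)
        targets.append(rewards[agent] + bootstrap)
    return targets


def critic_loss(batch, critic, targets):
    """Mean squared error between the centralized critic Q_i(x, a_1, ..., a_N) and the targets."""
    errors = [(y - critic(state, actions)) ** 2
              for y, (_obs, state, actions, *_rest) in zip(targets, batch)]
    return sum(errors) / len(errors)


def soft_update(target_params, params, tau=0.01):
    """Polyak averaging of target parameters: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, params)]


# Toy usage with hypothetical linear networks (two agents, one scalar observation each).
target_actors = [lambda o: 0.5 * o[0], lambda o: 0.5 * o[0]]
target_critic = lambda x, a: sum(x) + sum(a)
critic = lambda x, a: 0.0
batch = [
    # (obs, state, actions, rewards, next_obs, next_state, done)
    (((1,), (2,)), (1, 2), [1, 0], [1.0, 0.0], ((2,), (2,)), (2, 2), False),
    (((2,), (2,)), (2, 2), [0, 1], [0.0, 1.0], ((2,), (3,)), (2, 3), True),
]
ys = critic_targets(batch, agent=0, target_actors=target_actors,
                    target_critic=target_critic, gamma=0.5)
print(ys)                                            # → [4.0, 0.0]
print(critic_loss(batch, critic, ys))                # → 8.0
print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5))  # → [0.5, 1.0]
```

Because the critic sees the global state and every agent's action while each actor sees only its local observation, this structure keeps training centralized and execution decentralized; whether the paper also discounts by the macro-action length is not stated in this excerpt.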