Multiagent Reinforcement Learning for Task Offloading of Space/Aerial-Assisted Edge Computing

Algorithm 1

MADDPG algorithm for task offloading in SAGIN.
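(2)Randomly initialize the critic network $Q_i(\mathbf{s}, \mathbf{a} \mid \theta_i^Q)$ and the actor network $\mu_i(s_i \mid \theta_i^\mu)$ of each agent $i$ with weights $\theta_i^Q$ and $\theta_i^\mu$
(3)Initialize the target networks $Q_i'$ and $\mu_i'$ with weights $\theta_i^{Q'} \leftarrow \theta_i^Q$ and $\theta_i^{\mu'} \leftarrow \theta_i^\mu$
(4)Empty the replay buffer $\mathcal{D}$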
(5)for each episode do
(6)   Initialize a Gaussian exploration noise $\mathcal{N}$ with mean 0;
(7)   Receive the initial observation state $s_{i,1}$ of each agent $i$;
(8)   for each time slot $t$ do
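(9)    Select action $a_{i,t} = \mu_i(s_{i,t} \mid \theta_i^\mu) + \mathcal{N}_t$ according to the current policy and exploration noise
(10)    Execute action $a_{i,t}$ and observe the reward $r_{i,t}$ and the next state $s_{i,t+1}$
(11)    Collect the global state $\mathbf{s}_t$ and the joint action $\mathbf{a}_t = (a_{1,t}, \dots, a_{N,t})$;
(12)    Store transition $(\mathbf{s}_t, \mathbf{a}_t, r_{i,t}, \mathbf{s}_{t+1})$ in $\mathcal{D}$;
(13)    Sample a random mini-batch of $K$ transitions $(\mathbf{s}_k, \mathbf{a}_k, r_{i,k}, \mathbf{s}_{k+1})$ from $\mathcal{D}$;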
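(14)    Set $y_k = r_{i,k} + \gamma\, Q_i'(\mathbf{s}_{k+1}, \mathbf{a}_{k+1}' \mid \theta_i^{Q'})$, where $\mathbf{a}_{k+1}'$ collects the target-actor actions $\mu_j'(s_{j,k+1} \mid \theta_j^{\mu'})$ of all agents $j$;
(15)    Update the critic network by minimizing the loss $L(\theta_i^Q) = \frac{1}{K} \sum_k \left( y_k - Q_i(\mathbf{s}_k, \mathbf{a}_k \mid \theta_i^Q) \right)^2$
(16)    Update the actor policy by using the sampled policy gradient $\nabla_{\theta_i^\mu} J \approx \frac{1}{K} \sum_k \nabla_{\theta_i^\mu} \mu_i(s_{i,k} \mid \theta_i^\mu)\, \nabla_{a_i} Q_i(\mathbf{s}_k, \mathbf{a} \mid \theta_i^Q) \big|_{a_i = \mu_i(s_{i,k})}$
(17)    Update the target networks for each agent $i$:
     $\theta_i^{Q'} \leftarrow \tau \theta_i^Q + (1 - \tau)\, \theta_i^{Q'}$ and $\theta_i^{\mu'} \leftarrow \tau \theta_i^\mu + (1 - \tau)\, \theta_i^{\mu'}$;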
(18)   end
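
The inner-loop steps above can be illustrated with a minimal NumPy sketch covering the noisy action selection of step (9), the target value of step (14), the critic loss of step (15), and the soft target update of step (17). Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear-tanh actors, the linear centralized critic, the dimensions, the hyperparameter values (`GAMMA`, `TAU`, `BATCH`, the noise scale, the critic step size), and all function names. The actor update of step (16) is omitted to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and hyperparameters (not from the paper).
N_AGENTS, OBS_DIM, ACT_DIM = 2, 3, 1
GAMMA, TAU, BATCH = 0.95, 0.01, 4


def actor(W, obs):
    """Deterministic linear-tanh policy mu_i(s_i | theta_i^mu)."""
    return np.tanh(obs @ W)


def critic(w, states, actions):
    """Linear centralized critic Q_i(s, a | theta_i^Q) on the joint input."""
    return np.concatenate([states, actions], axis=1) @ w


def td_target(rewards, next_obs, target_actors, target_w):
    """Step (14): y = r + gamma * Q_i'(s', mu_1'(s_1'), ..., mu_N'(s_N'))."""
    next_actions = np.concatenate(
        [actor(W, o) for W, o in zip(target_actors, next_obs)], axis=1)
    next_states = np.concatenate(next_obs, axis=1)
    return rewards + GAMMA * critic(target_w, next_states, next_actions)


def critic_loss(w, states, actions, y):
    """Step (15): mean squared TD error over the mini-batch."""
    return np.mean((y - critic(w, states, actions)) ** 2)


def soft_update(target, online, tau=TAU):
    """Step (17): theta' <- tau * theta + (1 - tau) * theta'."""
    return tau * online + (1.0 - tau) * target


# Online and target parameters for every agent's actor and one agent's critic.
actors = [rng.normal(size=(OBS_DIM, ACT_DIM)) for _ in range(N_AGENTS)]
target_actors = [W.copy() for W in actors]
w = rng.normal(size=N_AGENTS * (OBS_DIM + ACT_DIM))
target_w = w.copy()

# A synthetic mini-batch standing in for samples from the replay buffer.
obs = [rng.normal(size=(BATCH, OBS_DIM)) for _ in range(N_AGENTS)]
next_obs = [rng.normal(size=(BATCH, OBS_DIM)) for _ in range(N_AGENTS)]
rewards = rng.normal(size=BATCH)

# Step (9): policy output plus zero-mean Gaussian exploration noise.
actions = np.concatenate(
    [np.clip(actor(W, o) + rng.normal(0.0, 0.1, size=(BATCH, ACT_DIM)), -1, 1)
     for W, o in zip(actors, obs)], axis=1)
states = np.concatenate(obs, axis=1)

y = td_target(rewards, next_obs, target_actors, target_w)
loss_before = critic_loss(w, states, actions, y)

# One gradient-descent step on the critic loss (closed-form gradient).
x = np.concatenate([states, actions], axis=1)
grad = -2.0 * x.T @ (y - x @ w) / BATCH
w = w - 0.01 * grad
loss_after = critic_loss(w, states, actions, y)

# Step (17): move the target critic slowly toward the online critic.
target_w = soft_update(target_w, w)
```

In this sketch the critic takes the concatenated observations and actions of all agents, while each actor sees only its own observation, which mirrors the split between steps (9) and (11). A small `TAU` keeps the target weights close to their previous values, so the regression target in step (14) changes slowly between updates.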