Research Article
Multiagent Reinforcement Learning for Task Offloading of Space/Aerial-Assisted Edge Computing
Algorithm 1
MADDPG algorithm for task offloading in SAGIN.
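(1) Initialization:
(2) Randomly initialize critic network $Q_i(x, a_1, \ldots, a_N \mid \theta_i^{Q})$ and actor $\mu_i(o_i \mid \theta_i^{\mu})$ with weights $\theta_i^{Q}$ and $\theta_i^{\mu}$
(3) Initialize target network $Q_i'$ and $\mu_i'$ with weights $\theta_i^{Q'} \leftarrow \theta_i^{Q}$ and $\theta_i^{\mu'} \leftarrow \theta_i^{\mu}$
(4) Empty replay buffer $\mathcal{D}$
(5) for episode $= 1, \ldots, M$ do
(6)     Initialize a Gaussian noise $\mathcal{N}$ with mean = 0;
(7)     Receive initial observation state $x$;
(8)     for time slot $t = 1, \ldots, T$ do
(9)         Select action $a_i = \mu_i(o_i \mid \theta_i^{\mu}) + \mathcal{N}_t$ according to the current policy and exploration noise
(10)        Execute action $a_i$ and observe the reward $r_i$ and the next state $o_i'$
(11)        Collect the global state $x = (o_1, \ldots, o_N)$ and the joint action $a = (a_1, \ldots, a_N)$;
(12)        Store transition $(x, a, r, x')$ in $\mathcal{D}$;
(13)        Sample a random mini-batch of $S$ transitions $(x^j, a^j, r^j, x'^j)$ from $\mathcal{D}$;
(14)        Set $y^j = r_i^j + \gamma \, Q_i'(x'^j, a_1', \ldots, a_N') \big|_{a_k' = \mu_k'(o_k'^j)}$;
(15)        Update the critic network by minimizing the loss
            $L(\theta_i^{Q}) = \frac{1}{S} \sum_j \left( y^j - Q_i(x^j, a_1^j, \ldots, a_N^j) \right)^2$
(16)        Update the actor policy by using the sampled policy gradient
            $\nabla_{\theta_i^{\mu}} J \approx \frac{1}{S} \sum_j \nabla_{\theta_i^{\mu}} \mu_i(o_i^j) \, \nabla_{a_i} Q_i(x^j, a_1^j, \ldots, a_i, \ldots, a_N^j) \big|_{a_i = \mu_i(o_i^j)}$;
(17)        Update the target networks for each agent $i$:
            $\theta_i^{Q'} \leftarrow \tau \theta_i^{Q} + (1 - \tau) \theta_i^{Q'}$ and $\theta_i^{\mu'} \leftarrow \tau \theta_i^{\mu} + (1 - \tau) \theta_i^{\mu'}$;
(18)    end
(19) end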
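The parts of Algorithm 1 that do not depend on a particular neural-network library (the replay buffer, the exploration noise, the critic target and loss, and the soft target update) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `ReplayBuffer`, `noisy_action`, `td_target`, `critic_loss`, and `soft_update`, the action bounds, and the noise scale `sigma` are all hypothetical. The actor and critic networks and their gradient steps are omitted because they would require an automatic-differentiation framework.

```python
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity buffer of joint transitions (x, a, r, x_next)."""

    def __init__(self, capacity):
        # Oldest transitions are dropped once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def store(self, x, a, r, x_next):
        self.storage.append((x, a, r, x_next))

    def sample(self, batch_size, rng):
        # Draw distinct indices uniformly at random.
        idx = rng.choice(len(self.storage), size=batch_size, replace=False)
        batch = [self.storage[int(i)] for i in idx]
        # Stack each field so it gains a leading batch dimension.
        return tuple(np.stack(field) for field in zip(*batch))

    def __len__(self):
        return len(self.storage)


def noisy_action(mu, sigma, rng, low=-1.0, high=1.0):
    """Add zero-mean Gaussian noise to a deterministic action, then clip.

    The bounds [low, high] are an assumption of this sketch.
    """
    return np.clip(mu + rng.normal(0.0, sigma, size=mu.shape), low, high)


def td_target(r_i, q_next, gamma):
    """y = r_i + gamma * Q'_i(x', a'_1, ..., a'_N).

    q_next is the target critic's value at the next global state, with the
    next actions produced by the target actors.
    """
    return r_i + gamma * q_next


def critic_loss(y, q):
    """Mean squared TD error over the mini-batch."""
    return float(np.mean((y - q) ** 2))


def soft_update(target, online, tau):
    """theta' <- tau * theta + (1 - tau) * theta', parameter by parameter."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]
```

In a full training loop each agent would call `noisy_action` once per time slot, the joint transition would be stored once for all agents, and `td_target`, `critic_loss`, and `soft_update` would be evaluated per agent on the sampled mini-batch. A small `tau` makes the target networks track the online networks slowly, which is the usual motivation for the soft update.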