Security and Communication Networks

Research Article

Multiagent Reinforcement Learning for Task Offloading of Space/Aerial-Assisted Edge Computing

MADDPG algorithm for task offloading in SAGIN.

(1)	Initialization:
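(2)	Randomly initialize the critic network $Q_i(s, a_1, \ldots, a_N \mid \theta_i^{Q})$ and the actor $\mu_i(o_i \mid \theta_i^{\mu})$ with weights $\theta_i^{Q}$ and $\theta_i^{\mu}$
(3)	Initialize the target networks $Q_i'$ and $\mu_i'$ with weights $\theta_i^{Q'} \leftarrow \theta_i^{Q}$ and $\theta_i^{\mu'} \leftarrow \theta_i^{\mu}$
(4)	Empty the replay buffer $\mathcal{D}$
(5)	for episode $= 1, \ldots, M$ do
(6)	Initialize a Gaussian noise $\mathcal{N}$ with mean = 0;
(7)	Receive the initial observation state $s(1)$;
(8)	for time slot $t = 1, \ldots, T$ do
(9)	Select action $a_i(t) = \mu_i(o_i(t) \mid \theta_i^{\mu}) + \mathcal{N}$ according to the current policy and exploration noise
(10)	Execute action $a_i(t)$ and observe the reward $r_i(t)$ and the next state $s(t+1)$
(11)	Collect the global state $s(t)$ and the joint action $a(t) = (a_1(t), \ldots, a_N(t))$;
(12)	Store the transition $(s(t), a(t), r(t), s(t+1))$ in $\mathcal{D}$;
(13)	Sample a random mini-batch of $K$ transitions $(s^j, a^j, r^j, s'^j)$ from $\mathcal{D}$;
(14)	Set $y_i^j = r_i^j + \gamma \, Q_i'(s'^j, a_1', \ldots, a_N' \mid \theta_i^{Q'})\big|_{a_k' = \mu_k'(o_k'^j)}$;
(15)	Update the critic network by minimizing the loss
	$L(\theta_i^{Q}) = \frac{1}{K} \sum_{j} \big( y_i^j - Q_i(s^j, a_1^j, \ldots, a_N^j \mid \theta_i^{Q}) \big)^2$
(16)	Update the actor policy by using the sampled policy gradient
	$\nabla_{\theta_i^{\mu}} J \approx \frac{1}{K} \sum_{j} \nabla_{\theta_i^{\mu}} \mu_i(o_i^j \mid \theta_i^{\mu}) \, \nabla_{a_i} Q_i(s^j, a_1^j, \ldots, a_i, \ldots, a_N^j \mid \theta_i^{Q})\big|_{a_i = \mu_i(o_i^j)}$;
(17)	Update the target networks for each agent $i$:
	$\theta_i^{Q'} \leftarrow \tau \theta_i^{Q} + (1 - \tau)\, \theta_i^{Q'}$ and $\theta_i^{\mu'} \leftarrow \tau \theta_i^{\mu} + (1 - \tau)\, \theta_i^{\mu'}$;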
(18)	end
(19)	end
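
The bookkeeping steps of the listing above (replay buffer, exploration noise, target value, critic loss, and soft target update) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `ReplayBuffer`, `noisy_action`, `td_target`, `critic_loss`, and `soft_update` are hypothetical, network weights are represented as flat lists of floats, and the actor policy-gradient step (16) is omitted because it requires an automatic-differentiation framework.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer of joint transitions (s, a, r, s_next)."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def store(self, state, actions, rewards, next_state):
        # Step (12): store one joint transition.
        self.storage.append((state, actions, rewards, next_state))

    def sample(self, batch_size, rng=random):
        # Step (13): uniform sampling without replacement.
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def noisy_action(policy_output, sigma, rng=random):
    """Step (9): add zero-mean Gaussian noise to each action component."""
    return [a + rng.gauss(0.0, sigma) for a in policy_output]


def td_target(reward, gamma, target_q_next):
    """Step (14): y = r + gamma * Q'(s', a'_1, ..., a'_N)."""
    return reward + gamma * target_q_next


def critic_loss(targets, q_values):
    """Step (15): mean squared error between targets and critic outputs."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def soft_update(target_weights, online_weights, tau):
    """Step (17): theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_weights, target_weights)]


# Usage: a buffer of capacity 3 keeps only the 3 most recent transitions.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.store(t, [0.1 * t], [1.0], t + 1)
batch = buffer.sample(2)
```

In a full implementation each agent would hold its own actor and critic, while the critic's target in step (14) would be evaluated on the next-state actions produced by all agents' target actors; here `target_q_next` simply stands in for that value.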