Research Article

An Intelligent Offloading System Based on Multiagent Reinforcement Learning

Algorithm 2

Multiagent recurrent deep deterministic policy gradient algorithm (MARDDPG).
Input: the environment
Output:
(1) Initialize the parameters of the N actor networks and of the critic network
(2) Initialize the replay memory M
(3) For training step = 1 to the total number of training steps, do
(4) m0 = initial message, t = 0
(5) While t < T, do
(6)  For i = 1, ..., N, do
(7)   Select the action for agent i
(8)   Receive the reward, calculated by the regional average preference function
(9)   Receive the observation
(10)   Update the message
(11)  End for
(12)  
(13)  
(14) End while
(15) Store the episode {m0, a1, r1, m1, o2, a2, ...} in M
(16) Sample a random minibatch of episodes from the replay memory M
(17) For each sampled episode and each time step, do
(18) Update the critic, actor, and LSTM networks by minimizing the loss
(19) Soft-update the target critic and target actor networks
(20) End for
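The data-collection part of Algorithm 2 (steps 3 to 15) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `select_action`, `env_step`, and `update_message` are hypothetical stand-ins for the actor networks, the environment with its regional average preference reward, and the message update rule, whose formula is not recoverable from the listing. The explicit `t += 1` is also an assumption, since the while loop in step 5 cannot terminate without a time-step increment.

```python
from typing import Callable, Dict, List, Tuple

Message = List[float]


def run_episode(
    n_agents: int,
    horizon: int,
    select_action: Callable[[int, Message], float],
    env_step: Callable[[int, float], Tuple[float, float]],
    update_message: Callable[[Message, int, float], Message],
    initial_message: Message,
) -> Dict[str, list]:
    """Collect one episode in the order of steps (4)-(15).

    All callables are hypothetical stand-ins for the actor networks, the
    environment, and the (unspecified) message update rule.
    """
    message = list(initial_message)                    # step (4): m0
    t = 0
    episode: Dict[str, list] = {"messages": [list(message)], "actions": [],
                                "rewards": [], "observations": []}
    while t < horizon:                                 # step (5)
        actions, rewards, observations = [], [], []
        for i in range(n_agents):                      # step (6)
            action = select_action(i, message)         # step (7): assumed to condition on the message
            reward, obs = env_step(i, action)          # steps (8)-(9)
            message = update_message(message, i, obs)  # step (10): placeholder rule
            actions.append(action)
            rewards.append(reward)
            observations.append(obs)
        episode["actions"].append(actions)
        episode["rewards"].append(rewards)
        episode["observations"].append(observations)
        episode["messages"].append(list(message))
        t += 1                                         # assumed time-step increment
    return episode                                     # step (15) stores this in M


# Toy usage with deterministic, purely illustrative stubs.
ep = run_episode(
    n_agents=2,
    horizon=3,
    select_action=lambda i, m: float(i),
    env_step=lambda i, a: (1.0, a + 1.0),
    update_message=lambda m, i, o: [0.5 * (m[0] + o)],
    initial_message=[0.0],
)
```

With these stubs every time step yields actions [0.0, 1.0] and rewards [1.0, 1.0]; in a real system the stubs would be replaced by the trained actors and the offloading environment.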
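Steps 2, 15, and 16 store and sample whole episodes rather than single transitions, which is what a recurrent (LSTM) learner needs in order to unroll its hidden state over time. A minimal episode-level replay memory might look like this; the class names, the `capacity` bound, and first-in-first-out eviction are illustrative assumptions rather than details given in the listing.

```python
import random
from collections import deque
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Episode:
    """One trajectory (hypothetical layout): per-step messages, actions, rewards, observations."""
    messages: List[Any] = field(default_factory=list)
    actions: List[Any] = field(default_factory=list)
    rewards: List[float] = field(default_factory=list)
    observations: List[Any] = field(default_factory=list)


class EpisodeReplayMemory:
    """Replay memory M that keeps whole episodes so recurrent networks can be unrolled."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: deque = deque(maxlen=capacity)  # oldest episodes are evicted first
        self.rng = random.Random(seed)

    def store(self, episode: Episode) -> None:
        """Step (15): append a finished episode."""
        self.buffer.append(episode)

    def sample(self, batch_size: int) -> List[Episode]:
        """Step (16): random minibatch of distinct episodes, capped at the memory size."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self) -> int:
        return len(self.buffer)


memory = EpisodeReplayMemory(capacity=3)
for n in range(5):
    memory.store(Episode(rewards=[float(n)]))
batch = memory.sample(2)
```

Because the memory holds at most three episodes, only the last three stored (rewards 2.0, 3.0, 4.0) remain available for sampling.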
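Step 19 refers to the soft target update used in DDPG, θ' ← τθ + (1 - τ)θ', which lets each target network track its learned network slowly and stabilizes the critic's targets. The sketch below applies that rule to plain Python parameter dictionaries; the dictionary layout and the choice of `tau` are assumptions, and a deep-learning framework would apply the same rule tensor by tensor.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # hypothetical layout: parameter name -> flat values


def soft_update(target: Params, source: Params, tau: float) -> None:
    """In place, for every parameter: target <- tau * source + (1 - tau) * target."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must be in (0, 1]")
    for name, src_values in source.items():
        tgt_values = target[name]
        if len(tgt_values) != len(src_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        for j, (s, t) in enumerate(zip(src_values, tgt_values)):
            tgt_values[j] = tau * s + (1.0 - tau) * t


target_actor = {"w": [0.0, 2.0]}
actor = {"w": [1.0, 2.0]}
soft_update(target_actor, actor, tau=0.5)
```

A small `tau` (for example 0.01) makes the target move slowly; `tau = 1` reduces to a hard copy of the learned network.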
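Step 18 does not spell out the loss. For reference, the standard DDPG critic loss is the mean squared error between the critic's estimate Q and the target y = r + γQ'(·), where Q' is evaluated with the target networks. Whether MARDDPG uses exactly this form, and how the LSTM state enters Q, is not stated in the listing, so treat this as an illustration of the DDPG baseline only; `next_q` is assumed to hold the target critic's value at the next time step.

```python
from typing import List, Sequence


def td_targets(rewards: Sequence[float], next_q: Sequence[float],
               dones: Sequence[bool], gamma: float) -> List[float]:
    """Standard DDPG target: y = r + gamma * Q'(next), or just r at a terminal step."""
    return [r + (0.0 if d else gamma * q) for r, q, d in zip(rewards, next_q, dones)]


def critic_loss(q_values: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared TD error between the critic's estimates and the targets."""
    if not q_values or len(q_values) != len(targets):
        raise ValueError("q_values and targets must be non-empty and equally long")
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


y = td_targets(rewards=[1.0, 0.0], next_q=[2.0, 4.0], dones=[False, True], gamma=0.5)
loss = critic_loss(q_values=[1.0, 1.0], targets=y)
```

In a framework implementation the targets would be computed without gradient flow, and the actor would then be updated to increase the critic's value of its chosen actions.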