Research Article
An Intelligent Offloading System Based on Multiagent Reinforcement Learning
Algorithm 2: Multiagent recurrent deep deterministic policy gradient (MARDDPG) algorithm.
| Input: the environment | Output: |
| --- | --- |
| (1) | Initialize the parameters of the N actor networks and of the critic network |
| (2) | Initialize the replay memory M |
| (3) | For training step = 1 : all steps, do |
| (4) | m0 = initialized message, t = 0 |
| (5) | While t < T, do |
| (6) | For i = 1 : N, do |
| (7) | Select the action for agent i at time t |
| (8) | Receive the reward, which is calculated by the regional average preference function |
| (9) | Receive the observation |
| (10) | Update the message by |
| (11) | End for |
| (12) | |
| (13) | |
| (14) | End while |
| (15) | Store the episode {m0, a1, r1, m1, o2, a2, ...} in M |
| (16) | Sample a random minibatch of episodes from the replay memory M |
| (17) | For each episode and each time step, do |
| (18) | Update the critic, actor, and LSTM networks by minimizing the loss |
| (19) | Soft-update the target critic and target actor networks |
| (20) | End for |
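Steps (2), (15), and (16) store and sample whole episodes rather than single transitions, which is what allows the LSTM to be unrolled over a full sequence during the update in step (18). The sketch below illustrates one way such an episode-level replay memory could work. It is an illustration under stated assumptions, not the authors' implementation: the class names `Episode` and `EpisodeReplayMemory`, the `capacity` and `seed` parameters, and the first-in-first-out eviction policy are all hypothetical, since the algorithm does not specify them.

```python
import random
from collections import deque
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Episode:
    """Hypothetical container for one trajectory {m0, a1, r1, m1, o2, a2, ...}, kept in time order."""
    messages: List[Any] = field(default_factory=list)
    actions: List[Any] = field(default_factory=list)
    rewards: List[float] = field(default_factory=list)
    observations: List[Any] = field(default_factory=list)


class EpisodeReplayMemory:
    """Hypothetical replay memory M that stores whole episodes.

    Assumption: a fixed capacity with first-in-first-out eviction
    (the source does not say how M is bounded).
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque with maxlen silently drops the oldest episode when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, episode: Episode) -> None:
        # Step (15): append the finished episode to M.
        self.buffer.append(episode)

    def sample(self, batch_size: int) -> List[Episode]:
        # Step (16): sample distinct episodes without replacement,
        # never requesting more episodes than are currently stored.
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self) -> int:
        return len(self.buffer)
```

For example, storing five episodes in a memory of capacity 3 keeps only the three most recent ones, and `sample(2)` then returns two distinct episodes drawn from those three. Sampling whole episodes keeps each sequence contiguous, so the recurrent state can be rebuilt from m0 during training.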
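Step (19) soft-updates the target critic and target actor networks. In DDPG-style methods, this is done with Polyak averaging, θ′ ← τθ + (1 − τ)θ′, where θ denotes the online parameters and θ′ the target parameters. The algorithm does not state the averaging rule or the value of τ, so the sketch below assumes this standard form. It also represents each network's parameters as a hypothetical flat list of floats, so the example stays self-contained. The function name `soft_update` is illustrative only.

```python
from typing import List


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Return Polyak-averaged target parameters: tau * online + (1 - tau) * target.

    Assumption: the standard DDPG soft-update rule; tau is not given in the source.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    if len(target) != len(online):
        raise ValueError("target and online parameter vectors must have the same length")
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target, online)]


# Hypothetical flattened parameters of one critic and its target copy.
critic = [1.0, -2.0, 0.5]
target_critic = [0.0, 0.0, 0.0]
target_critic = soft_update(target_critic, critic, tau=0.01)
print(target_critic)
```

A small τ makes the target networks trail the online networks slowly, which stabilizes the bootstrapped critic targets. Setting τ = 1 reduces the update to a hard copy of the online parameters.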