Wireless Communications and Mobile Computing

Research Article

Joint Radio Map Construction and Dissemination in MEC Networks: A Deep Reinforcement Learning Approach

Actor-critic-based joint offloading and resource allocation algorithm.

Input: actor network parameters , critic network parameters , actor target network parameters , critic target network parameters , discount factor , replay buffer , batch size , epsilon greedy
Output: the best strategy
1: Initialize: randomly initialize and , , ,
2: for to do
3: Initialize state
4: for to do
5: Actor output .
6: Add noise on with -greedy on and Gaussian distribution with mean .
7: Get action with exploration variance .
8: Take action , observe reward and next state .
9: Store transition in .
10: Sample a random batch of transitions from .
11: Set .
12: Update the critic with minimizing the loss by Equation (18).
13: According to the loss , update the actor through the continuous part training phase and the discrete part training phase by Equations (15) and (16)
14: Update the target networks: , .
15: end for
16: end for