Research Article

Joint Radio Map Construction and Dissemination in MEC Networks: A Deep Reinforcement Learning Approach

Algorithm 1

Actor-critic-based joint offloading and resource allocation algorithm.
Input: actor network parameters, critic network parameters, actor target network parameters, critic target network parameters, discount factor, replay buffer, batch size, and epsilon-greedy exploration rate
Output: the best strategy
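1: Initialize: randomly initialize the actor and critic network parameters; initialize the target network parameters and the replay buffer
2: for each episode do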
3: Initialize state
4: for each time step do
5:  The actor outputs an action for the current state.
6:  Add exploration noise to the actor output: epsilon-greedy on the discrete part and Gaussian noise on the continuous part.
7:  Obtain the action to execute under the current exploration variance.
8:  Take the action, observe the reward and the next state.
9:  Store the transition in the replay buffer.
10:  Sample a random batch of transitions from the replay buffer.
11:  Set the target value for each sampled transition.
12:  Update the critic by minimizing the loss in Equation (18).
13:  Using the loss, update the actor through the continuous-part training phase and the discrete-part training phase by Equations (15) and (16).
14:  Update the actor and critic target networks.
15: end for
16: end for
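
The building blocks of Algorithm 1 that do not depend on a particular network architecture can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names ReplayBuffer, explore, td_target, critic_loss, and soft_update are hypothetical, the Gaussian noise is assumed to be zero-mean, continuous actions are assumed to be clipped to [0, 1], the target is assumed to be the usual one-step bootstrapped value, and the target networks are assumed to be updated by Polyak averaging with a hypothetical rate tau. Equations (15), (16), and (18) are not reproduced here.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # A bounded deque evicts the oldest transition once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def store(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the current size.
        k = min(batch_size, len(self.storage))
        return random.sample(list(self.storage), k)


def explore(discrete_scores, continuous_values, epsilon, sigma):
    """Steps 5-7: epsilon-greedy on the discrete part, Gaussian noise on the
    continuous part. Zero mean and clipping to [0, 1] are assumptions."""
    n = len(discrete_scores)
    if random.random() < epsilon:
        choice = random.randrange(n)  # explore: uniformly random option
    else:
        choice = max(range(n), key=discrete_scores.__getitem__)  # exploit: argmax
    noisy = [min(1.0, max(0.0, v + random.gauss(0.0, sigma)))
             for v in continuous_values]
    return choice, noisy


def td_target(reward, next_q, gamma):
    """Step 11 (assumed form): reward plus discounted target-critic estimate."""
    return reward + gamma * next_q


def critic_loss(targets, q_values):
    """Step 12 (assumed form): mean squared error over the sampled batch."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def soft_update(target_params, online_params, tau):
    """Step 14 (assumed form): move target parameters toward online ones."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

In a full training loop these helpers would be called once per time step, in the order of steps 5 to 14, with the actor and critic forward passes supplying discrete_scores, continuous_values, and next_q.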