Input: historical data set H, replay memory D, maximum number of training episodes N, a constant Z, an initialized evaluation network Q and target network Q̂
Output: trained evaluation network Q
1: for episode = 1 to N do
2:   Initialize the worker state s
3:   while s is not the termination state do
4:     if s is the initial state then
5:       Take the tasks with the corresponding period in the historical data set H as the action set of the current state s
6:     else
7:       Obtain the action set of the current state s from the historical data set H according to the spatiotemporal constraints of Equations (6) and (7)
8:     end if
9:     if the action set of s is empty then
10:      The worker executes the virtual task, the state transitions to s′, and the reward r is 0
11:      Store (s, a, r, s′, done) in the replay memory D, where s′ is the next state, r is the reward, and done indicates whether s′ is the termination state
12:    else
13:      Take s as input to the evaluation network Q to get the Q value of each state-action pair
14:      Use the ε-greedy method to select the corresponding action a from the current output Q values
15:      Get s′, r, done according to action a, and store (s, a, r, s′, done) in the replay memory D
16:    end if
17:    if the replay memory D is full then
18:      Overwrite the oldest piece of data in D and randomly sample a mini-batch for learning
19:      Calculate the target value y according to Equation (12)
20:      Update the parameters of the evaluation network Q by gradient descent on the loss function of Equation (11)
21:      Update the target network Q̂ parameters every Z steps
22:    end if
23:    s = s′
24:  end while
25: end for
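To make the control flow above concrete, the following is a minimal sketch of the training loop in Python with PyTorch. It assumes states are fixed-size feature vectors and actions are indices into a candidate task set; the spatiotemporal filtering of Equations (6) and (7), the environment transition, and the exact target and loss of Equations (12) and (11) are not given in this excerpt, so `feasible_actions`, `env_step`, and the standard DQN target/MSE loss below are placeholders rather than the paper's definitions.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, MAX_ACTIONS = 16, 32   # assumed sizes of the state vector / task set
N, Z = 500, 100                   # episodes, target-sync period (the constant Z)
GAMMA, EPS = 0.99, 0.1            # discount factor, exploration rate (assumed)
BATCH, CAPACITY = 64, 1_000       # mini-batch size, replay memory size (assumed)

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, MAX_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, MAX_ACTIONS))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=CAPACITY)   # a full deque discards its oldest entry (step 18)

def feasible_actions(state):
    # Placeholder for filtering H by the spatiotemporal constraints of Eqs. (6)-(7).
    return [a for a in range(MAX_ACTIONS) if random.random() < 0.5]

def env_step(state, action):
    # Placeholder transition: random next state/reward, ~10% chance of termination.
    reward = 0.0 if action is None else random.random()
    return torch.randn(STATE_DIM), reward, random.random() < 0.1

grad_steps = 0
for episode in range(N):                          # step 1
    state, done = torch.randn(STATE_DIM), False   # step 2 (placeholder initial state)
    while not done:                               # step 3
        actions = feasible_actions(state)         # steps 4-8
        if not actions:                           # steps 9-11: virtual task, reward 0
            next_state, _, done = env_step(state, None)
            replay.append((state, 0, 0.0, next_state, done))  # index 0 stands in for the virtual task
        else:                                     # steps 12-15
            with torch.no_grad():
                q_values = q_net(state)           # step 13: Q value per action
            if random.random() < EPS:             # step 14: epsilon-greedy selection
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q_values[a].item())
            next_state, reward, done = env_step(state, action)
            replay.append((state, action, reward, next_state, done))
        if len(replay) == CAPACITY:               # steps 17-18: learn once D is full
            batch = random.sample(replay, BATCH)
            s = torch.stack([b[0] for b in batch])
            a = torch.tensor([b[1] for b in batch])
            r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
            s2 = torch.stack([b[3] for b in batch])
            d = torch.tensor([float(b[4]) for b in batch])
            with torch.no_grad():                 # step 19: standard DQN stand-in for Eq. (12)
                y = r + GAMMA * (1 - d) * target_net(s2).max(dim=1).values
            q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(q, y)   # step 20: MSE stand-in for Eq. (11)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            grad_steps += 1
            if grad_steps % Z == 0:               # step 21: periodic target-network sync
                target_net.load_state_dict(q_net.state_dict())
        state = next_state                        # step 23
```

A `deque` with `maxlen` reproduces the overwrite behaviour of step 18: appending to a full deque silently discards the oldest transition, so no explicit eviction logic is needed.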