Research Article

Deep Reinforcement Learning for Scheduling in an Edge Computing-Based Industrial Internet of Things

Algorithm 1

Procedures of DDQN-based DISA.
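1. Initialize the evaluate network with random weights and biases as $\theta$;
2. Initialize the target network as a copy of the evaluate network weights and biases as $\theta^{-} = \theta$;
3. Initialize replay memory $D$;
4. for $i = 1$ to $N$ do
5. Initialize state $s_t$ in equation (6);
6. Input the system state $s_t$ into the evaluate DQN;
7. Compute the value $Q(s_t, a; \theta)$ for every action $a$;
8. With probability $\varepsilon$, choose a random action $a_t$; otherwise choose $a_t = \arg\max_{a} Q(s_t, a; \theta)$;
9. Execute action $a_t$, receive a reward $r_t$ and observe the next state $s_{t+1}$;
10. Store interaction tuple $(s_t, a_t, r_t, s_{t+1})$ in $D$;
11. for $j = 1$ to $J$ do
12.  Sample a random transition $(s_j, a_j, r_j, s_{j+1})$ from $D$;
13.  Compute the target value
   $y_j = r_j + \gamma \, Q\bigl(s_{j+1}, \arg\max_{a'} Q(s_{j+1}, a'; \theta); \theta^{-}\bigr)$;
14.  Train the network to minimize the loss function
   $L(\theta) = \bigl(y_j - Q(s_j, a_j; \theta)\bigr)^{2}$;
15.  Perform gradient descent with respect to $\theta$;
16.  Update the target network every $C$ steps
   $\theta^{-} \leftarrow \theta$;
17. end for
18. end for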
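
The core computations of Algorithm 1 (action selection in step 8, the double-DQN target in step 13, and the loss in step 14) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the discount factor `gamma`, the exploration rate `epsilon`, the memory capacity, and the Q-value arrays are all hypothetical, chosen only for illustration, and the neural networks are replaced by fixed arrays of Q-values. Terminal-state handling is omitted because the listing does not mention it.

```python
import random
from collections import deque

import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Step 8: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def ddqn_target(reward, next_q_eval, next_q_target, gamma):
    """Step 13: the evaluate network selects the next action,
    the target network supplies its value."""
    best_action = int(np.argmax(next_q_eval))
    return reward + gamma * next_q_target[best_action]


def squared_loss(target, q_value):
    """Step 14: squared temporal-difference error for one transition."""
    return (target - q_value) ** 2


# Replay memory (steps 3 and 10) holding (s, a, r, s') tuples.
# The capacity of 1000 is an assumed value.
memory = deque(maxlen=1000)
memory.append((0, 1, 1.0, 2))

# Hypothetical Q-values of the next state under the two networks.
next_q_eval = np.array([0.5, 2.0, 1.0])    # evaluate network prefers action 1
next_q_target = np.array([3.0, 0.8, 1.5])  # target network values action 1 at 0.8

# Step 12: sample a stored transition, then compute target and loss.
state, action, reward, next_state = random.choice(list(memory))
y = ddqn_target(reward, next_q_eval, next_q_target, gamma=0.9)
loss = squared_loss(y, q_value=1.0)
```

Here the target uses the value 0.8 rather than the target network's own maximum of 3.0. Decoupling action selection from action evaluation in this way is what distinguishes the double-DQN target from the standard DQN target, and it reduces overestimation of Q-values.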