Wireless Communications and Mobile Computing

Research Article

Deep Reinforcement Learning for Scheduling in an Edge Computing-Based Industrial Internet of Things

Procedures of DDQN-based DISA.

1. Initialize the evaluate network with random weights and biases $\theta$;
2. Initialize the target network as a copy of the evaluate network, with weights and biases $\theta^{-} = \theta$;
3. Initialize replay memory $D$;
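4. for $i = 1$ to $I$ do
5. Initialize state $s$ as in equation (6);
6. Input the system state $s$ into the evaluate DQN;
7. Compute the value $Q(s, a; \theta)$ for every action $a$;
8. With probability $\epsilon$, choose a random action $a$; otherwise choose $a = \arg\max_{a'} Q(s, a'; \theta)$;
9. Execute action $a$, receive a reward $r$, and observe the next state $s'$;
10. Store the interaction tuple $(s, a, r, s')$ in $D$;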
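11. for $j = 1$ to $J$ do
12. Sample a random transition $(s_j, a_j, r_j, s'_j)$ from $D$;
13. Compute the target value
$y_j = r_j + \gamma \, Q\big(s'_j, \arg\max_{a'} Q(s'_j, a'; \theta); \theta^{-}\big)$;
14. Train the network to minimize the loss function
$L(\theta) = \big(y_j - Q(s_j, a_j; \theta)\big)^2$;
15. Perform gradient descent on $L(\theta)$ with respect to $\theta$;
16. Update the target network every $C$ steps
$\theta^{-} \leftarrow \theta$;
17. end for
18. end for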
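
The procedure above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The evaluate and target "networks" are replaced by lookup tables, the state initialization of equation (6) is replaced by a random draw, and the environment (`toy_step`, a five-state chain) along with all hyperparameter values are hypothetical stand-ins chosen only to make the double-Q target and the periodic target update concrete.

```python
import random
from collections import deque


def greedy_action(q_row):
    """Index of the largest Q-value in one state's row."""
    return max(range(len(q_row)), key=q_row.__getitem__)


def ddqn_target(q_eval, q_target, reward, next_state, done, gamma):
    """Step 13: the evaluate table selects the action, the target table scores it."""
    if done:
        return reward
    best = greedy_action(q_eval[next_state])
    return reward + gamma * q_target[next_state][best]


def toy_step(state, action, n_states):
    """Hypothetical chain environment: action 1 moves right, action 0 moves left.
    Reaching the last state ends the interaction with reward 1."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def train(iterations=1000, updates_per_iter=8, sync_every=10, gamma=0.9,
          lr=0.5, epsilon=0.2, n_states=5, n_actions=2, seed=0):
    rng = random.Random(seed)
    # Steps 1-3: evaluate table, target copy, bounded replay memory.
    q_eval = [[rng.uniform(-0.01, 0.01) for _ in range(n_actions)]
              for _ in range(n_states)]
    q_target = [row[:] for row in q_eval]
    memory = deque(maxlen=1000)
    n_updates = 0
    for _ in range(iterations):                        # step 4
        state = rng.randrange(n_states - 1)            # step 5 (assumed stand-in)
        if rng.random() < epsilon:                     # steps 6-8: epsilon-greedy
            action = rng.randrange(n_actions)
        else:
            action = greedy_action(q_eval[state])
        nxt, reward, done = toy_step(state, action, n_states)   # step 9
        memory.append((state, action, reward, nxt, done))       # step 10
        for _ in range(updates_per_iter):              # step 11
            s, a, r, s2, d = rng.choice(memory)        # step 12
            y = ddqn_target(q_eval, q_target, r, s2, d, gamma)  # step 13
            # Steps 14-15: one gradient step on 0.5 * (y - Q(s, a))^2.
            q_eval[s][a] += lr * (y - q_eval[s][a])
            n_updates += 1
            if n_updates % sync_every == 0:            # step 16
                q_target = [row[:] for row in q_eval]
    return q_eval
```

Calling `train()` returns the learned table; in this toy chain the greedy action in every non-terminal state should be "move right". With a real neural network, the table update in steps 14-15 would become an optimizer step on the network weights, and the target copy would become a weight copy.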