Research Article
Deep Reinforcement Learning for Scheduling in an Edge Computing-Based Industrial Internet of Things
Algorithm 1
Procedures of DDQN-based DISA.
1. Initialize the evaluate network with random weights and biases $\theta$;
2. Initialize the target network as a copy of the evaluate network, with weights and biases $\theta^{-} = \theta$;
3. Initialize replay memory $D$;
4. for i = 1 to the maximum number of iterations do
5.     Initialize state $s$ in equation (6);
6.     Input the system state $s$ into the evaluate DQN;
7.     Compute the value $Q(s, a; \theta)$;
8.     With probability $\epsilon$, choose an action $a$;
9.     Execute action $a$, receive a reward $r$ and observe the next state $s'$;
10.    Store interaction tuple $(s, a, r, s')$ in $D$;
11.    for j = 1 to the number of training steps do
12.        Sample a random transition $(s_j, a_j, r_j, s'_j)$ from $D$;
13.        Compute the target value
               $y_j = r_j + \gamma \, Q\big(s'_j, \arg\max_{a'} Q(s'_j, a'; \theta); \theta^{-}\big)$;
14.        Train the network to minimize the loss function
               $L(\theta) = \big(y_j - Q(s_j, a_j; \theta)\big)^2$;
15.        Perform gradient descent with respect to $\theta$;
16.        Update the target network every fixed number of steps:
               $\theta^{-} \leftarrow \theta$;
17.    end for
18. end for
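The procedure in Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: the four-state chain environment (`toy_step`), the linear one-hot stand-in for the evaluate and target networks (`q_values`), the random initial state, every hyperparameter value, and the reading of step 8 as epsilon-greedy (a random action with probability epsilon, the greedy action otherwise). The target in `ddqn_target` follows the standard double DQN form, in which the evaluate network selects the next action and the target network scores it. Comments map each line to the step numbers of Algorithm 1.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): a 4-state chain with 2 actions.
N_STATES, N_ACTIONS = 4, 2
GAMMA, EPSILON, LR = 0.9, 0.1, 0.05   # assumed hyperparameters
TARGET_SYNC = 20                      # assumed number of updates between target refreshes


def q_values(theta, s):
    """Linear stand-in for the network: Q(s, .; theta) = phi(s)^T theta."""
    phi = np.zeros(N_STATES)
    phi[s] = 1.0
    return phi @ theta


def toy_step(s, a):
    """Action 1 moves right, action 0 moves left; reward 1 when the next state is the last one."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s_next == N_STATES - 1 else 0.0), s_next


def ddqn_target(theta, theta_target, r, s_next):
    """Double DQN target: evaluate network picks the action, target network scores it."""
    a_star = int(np.argmax(q_values(theta, s_next)))
    return r + GAMMA * q_values(theta_target, s_next)[a_star]


def train(iterations=3000, updates_per_iteration=2, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.01, size=(N_STATES, N_ACTIONS))  # step 1
    theta_target = theta.copy()                                  # step 2
    memory = []                                                  # step 3
    n_updates = 0
    for _ in range(iterations):                                  # step 4
        s = int(rng.integers(N_STATES))                          # step 5 (random start is an assumption)
        q = q_values(theta, s)                                   # steps 6-7
        if rng.random() < EPSILON:                               # step 8 (epsilon-greedy, assumed)
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(np.argmax(q))
        r, s_next = toy_step(s, a)                               # step 9
        memory.append((s, a, r, s_next))                         # step 10
        for _ in range(updates_per_iteration):                   # step 11
            idx = int(rng.integers(len(memory)))
            sj, aj, rj, sj_next = memory[idx]                    # step 12
            y = ddqn_target(theta, theta_target, rj, sj_next)    # step 13
            td_error = y - q_values(theta, sj)[aj]               # step 14: loss = td_error ** 2
            theta[sj, aj] += LR * 2.0 * td_error                 # step 15: gradient step on the loss
            n_updates += 1
            if n_updates % TARGET_SYNC == 0:                     # step 16
                theta_target = theta.copy()
    return theta, theta_target
```

Calling `theta, _ = train()` and inspecting `np.argmax(theta, axis=1)` gives the greedy action per state; in this toy chain the learned policy is expected to choose action 1 (move right) everywhere. Replacing `q_values` with a neural network and `toy_step` with the scheduling environment would leave the loop structure unchanged.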