Research Article
Reinforcement Learning for Security-Aware Workflow Application Scheduling in Mobile Edge Computing
Algorithm 1
Deep Q network-based security-aware workflow scheduling scheme.
BEGIN
(1) Initialize the replay memory with a fixed capacity, and set the size of the minibatch of state transition experiences;
(2) for each episode do
(3)     Reset the system state;
(4)     for each time slot do
(5)         At the beginning of the time slot, observe the current state of the system;
(6)         Based on the current state, select a random action with the exploration probability; otherwise, select the action with the largest Q value;
(7)         Calculate the immediate reward and observe the system state in the next time slot;
(8)         Form the state transition experience and store it in the replay memory;
(9)         Add the immediate reward to the cumulative reward of the episode;
(10)        Randomly sample a minibatch of state transition experiences from the replay memory to train the Q network;
(11)        Calculate the expectation of the mean-squared error between the current evaluated Q value and the target Q value;
(12)    end for
(13) end for
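The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The names `ToyEnv` and `train_dqn`, all hyperparameter values, the toy reward rule, the linear Q function over one-hot states, and the periodically synchronized target weights are assumptions introduced here so that the loop runs without a deep learning library. A real deployment would replace `ToyEnv` with the mobile edge computing scheduling environment and the linear map with a neural network.

```python
from collections import deque

import numpy as np


class ToyEnv:
    """Hypothetical stand-in for the scheduling environment (assumption)."""

    def __init__(self, n_states=4, n_actions=3, horizon=20, seed=0):
        self.n_states, self.n_actions, self.horizon = n_states, n_actions, horizon
        self.rng = np.random.default_rng(seed)

    def _obs(self):
        x = np.zeros(self.n_states)
        x[self.s] = 1.0  # one-hot encoding of the current state
        return x

    def reset(self):
        self.t, self.s = 0, 0
        return self._obs()

    def step(self, action):
        # Toy rule: reward 1 when the action matches the state index mod n_actions.
        reward = 1.0 if action == self.s % self.n_actions else 0.0
        self.s = int(self.rng.integers(self.n_states))
        self.t += 1
        return self._obs(), reward, self.t >= self.horizon


def train_dqn(env, episodes=300, memory_size=2000, batch_size=32,
              gamma=0.5, epsilon=0.2, lr=0.5, sync_every=50, seed=0):
    rng = np.random.default_rng(seed)
    W = np.zeros((env.n_states, env.n_actions))  # linear Q function: Q(s) = s @ W
    W_target = W.copy()                          # target weights (assumption)
    memory = deque(maxlen=memory_size)           # step (1): replay memory
    returns, steps = [], 0
    for _ in range(episodes):                    # step (2)
        state = env.reset()                      # step (3)
        total, done = 0.0, False
        while not done:                          # step (4)
            # steps (5)-(6): epsilon-greedy action selection
            if rng.random() < epsilon:
                action = int(rng.integers(env.n_actions))
            else:
                action = int(np.argmax(state @ W))
            next_state, reward, done = env.step(action)  # step (7)
            memory.append((state, action, reward, next_state, float(done)))  # step (8)
            total += reward                      # step (9)
            state = next_state
            steps += 1
            if len(memory) >= batch_size:
                # step (10): sample a minibatch uniformly at random
                idx = rng.choice(len(memory), size=batch_size, replace=False)
                batch = [memory[int(i)] for i in idx]
                s, a, r, s2, d = map(np.array, zip(*batch))
                # step (11): mean-squared error between evaluated and target values
                target = r + gamma * (1.0 - d) * (s2 @ W_target).max(axis=1)
                q = (s @ W)[np.arange(batch_size), a]
                td = target - q
                onehot = np.eye(env.n_actions)[a]
                grad = -2.0 * s.T @ (onehot * td[:, None]) / batch_size
                W -= lr * grad                   # gradient step on mean(td ** 2)
            if steps % sync_every == 0:
                W_target = W.copy()
        returns.append(total)
    return W, returns


if __name__ == "__main__":
    W, returns = train_dqn(ToyEnv())
    print("greedy action per state:", np.argmax(W, axis=1))
    print("mean return, last 20 episodes:", np.mean(returns[-20:]))
```

In this sketch the end of an episode is treated as a terminal state, so the bootstrap term is dropped on the final time slot. Whether the scheduling problem should treat the episode boundary this way is a modeling choice that the algorithm listing does not settle.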