Security and Communication Networks

Research Article

Reinforcement Learning for Security-Aware Workflow Application Scheduling in Mobile Edge Computing

Deep Q network-based security-aware workflow scheduling scheme.

	BEGIN
(1)	Initialize the replay memory with a fixed capacity, and fix the size of the minibatch of state transition experiences sampled from it;
(2)	for each episode do
(3)	Reset the system state;
(4)	for each time slot do
(5)	At the beginning of each time slot, observe the current state of the system;
(6)	Based on the current state, select an action at random with the exploration probability; otherwise, with the complementary probability, select the action with the largest Q value;
(7)	Calculate the immediate reward and observe the system state in the next time slot;
(8)	Obtain the state transition experience (current state, action, immediate reward, next state) and store it in the replay memory;
(9)	Add the immediate reward of each step to the accumulated reward;
(10)	Randomly sample a minibatch of state transition experiences from the replay memory to train the Q network;
(11)	Calculate the expectation of the mean-squared error between the current evaluated Q value and the target Q value;
(12)	end for
(13)	end for
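
For illustration only, the steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the scheme's actual implementation: the environment `env_step` is a hypothetical toy problem standing in for the mobile edge computing system, a small Q table replaces the deep Q network, the target values are assumed to come from a periodically synchronized copy of that table, and all hyperparameter names and values (`epsilon`, `gamma`, `lr`, `sync_every`, and so on) are chosen for the example.

```python
import random
from collections import deque

N_STATES, N_ACTIONS = 4, 2  # hypothetical toy problem, not the paper's MEC model


def env_step(state, action):
    """Toy stand-in for the system: reward 1 when the action matches state parity."""
    reward = 1.0 if action == state % 2 else 0.0
    return reward, (state + 1) % N_STATES


def greedy_action(q, state):
    """Index of the action with the largest Q value in the given state."""
    return max(range(N_ACTIONS), key=lambda a: q[state][a])


def train(episodes=200, slots=20, capacity=500, batch_size=16,
          epsilon=0.1, gamma=0.9, lr=0.5, sync_every=50, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=capacity)                       # step (1): replay memory
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]      # Q table in place of a deep network
    q_target = [row[:] for row in q]                      # assumed target copy, synced periodically
    returns, losses, updates = [], [], 0
    for _ in range(episodes):                             # step (2)
        state, total = 0, 0.0                             # step (3): reset the system state
        for _ in range(slots):                            # step (4); `state` is the observation of step (5)
            if rng.random() < epsilon:                    # step (6): explore with probability epsilon
                action = rng.randrange(N_ACTIONS)
            else:                                         # otherwise take the largest Q value
                action = greedy_action(q, state)
            reward, next_state = env_step(state, action)  # step (7)
            memory.append((state, action, reward, next_state))  # step (8)
            total += reward                               # step (9): accumulate rewards
            if len(memory) >= batch_size:
                batch = rng.sample(list(memory), batch_size)    # step (10)
                # step (11): errors between target values and current evaluated values
                errors = [r + gamma * max(q_target[s2]) - q[s][a]
                          for s, a, r, s2 in batch]
                losses.append(sum(e * e for e in errors) / batch_size)
                # gradient step on the mean-squared error (constant factor folded into lr)
                for (s, a, _, _), e in zip(batch, errors):
                    q[s][a] += lr * e / batch_size
                updates += 1
                if updates % sync_every == 0:
                    q_target = [row[:] for row in q]
            state = next_state
        returns.append(total)
    return q, returns, losses
```

Calling `train()` returns the learned Q table together with the accumulated reward and the loss values recorded during training. In this toy setting the greedy policy read from the table is expected to pick action 0 in even states and action 1 in odd states. A real implementation would replace the table with a neural network and the toy environment with the workflow scheduling model.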