Wireless Communications and Mobile Computing

Research Article

Time-Driven Scheduling Based on Reinforcement Learning for Reasoning Tasks in Vehicle Edge Computing

Scheduling algorithm.

	Input: initial state, maximum number of rounds, maximum number of iterations in a single round
	Output: the scheduling strategy for reasoning tasks
(1)	Initialize the experience pool with a fixed storage capacity, the action-value function with random weights, and the corresponding target action-value function
(2)	for round = 1 to maximum number of rounds do
(3)	Set the current state to the initial state
(4)	for iteration = 1 to maximum number of iterations in a single round do
(5)	With the given probability, choose the action with the largest historical reward; otherwise, choose a random action
(6)	Execute the chosen action to obtain the next state, and use Algorithm 2 to calculate the reward
(7)	Store the resulting transition in the experience pool
(8)
(9)	Randomly sample a batch of transitions from the experience pool
(10)	Construct the error function according to equation (17), and update the parameters by back-propagation
(11)	Update the target action-value function every few steps
(12)	If the next state is a termination state, end the current round
(13)	end for
(14)	end for
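
The control flow of the steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the five-state chain environment, the `step` function and its placeholder reward (standing in for Algorithm 2), the squared temporal-difference error (standing in for equation (17)), and all parameter names and values such as `greedy_prob` and `sync_every`. A small table replaces the neural network, so a plain gradient step takes the place of back-propagation.

```python
import random
from collections import deque

N_STATES, N_ACTIONS = 5, 2  # hypothetical toy problem: a chain of 5 states


def step(state, action):
    """Toy environment (assumption): action 1 moves right, action 0 moves left."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01  # placeholder for the reward of Algorithm 2
    return next_state, reward, done


def train(max_rounds=200, max_iters=50, capacity=1000, batch_size=16,
          greedy_prob=0.9, gamma=0.9, lr=0.1, sync_every=20, seed=0):
    rng = random.Random(seed)
    # (1) fixed-capacity experience pool, randomly initialized action values,
    #     and a target copy of those values
    pool = deque(maxlen=capacity)
    q = [[rng.uniform(-0.01, 0.01) for _ in range(N_ACTIONS)]
         for _ in range(N_STATES)]
    q_target = [row[:] for row in q]
    steps = 0
    for _ in range(max_rounds):                      # (2)
        state = 0                                    # (3) initial state
        for _ in range(max_iters):                   # (4)
            # (5) greedy action with probability greedy_prob, else random
            if rng.random() < greedy_prob:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            else:
                action = rng.randrange(N_ACTIONS)
            next_state, reward, done = step(state, action)          # (6)
            pool.append((state, action, reward, next_state, done))  # (7)
            # (9) random sampling from the experience pool
            batch = rng.sample(list(pool), min(batch_size, len(pool)))
            # (10) gradient step on the squared TD error (assumed loss)
            for s, a, r, s2, d in batch:
                target = r if d else r + gamma * max(q_target[s2])
                q[s][a] += lr * (target - q[s][a])
            steps += 1
            if steps % sync_every == 0:              # (11) periodic target update
                q_target = [row[:] for row in q]
            state = next_state                       # advance to the next state
            if done:                                 # (12) termination state
                break
    return q
```

Calling `train()` returns the learned table of action values; in this toy chain, a greedy policy read from it should move right toward the terminal state. Keeping a separate target copy that is refreshed only every few steps is the usual way to stabilize the regression target in deep Q-learning, which is what step (11) appears to describe.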