Research Article

Time-Driven Scheduling Based on Reinforcement Learning for Reasoning Tasks in Vehicle Edge Computing

Algorithm 2

Scheduling algorithm.
Input: initial state, maximum number of rounds, maximum number of iterations in a single round
Output: the scheduling strategy for reasoning tasks
(1)  Initialize the experience pool with a fixed storage capacity, the action-value function with random weights, and the corresponding target action-value function with the same weights
(2)  for round = 1 to maximum number of rounds do
(3)    Initialize the state with the initial state
(4)    for iteration = 1 to maximum number of iterations in a single round do
(5)      With a preset probability, choose the action with the largest estimated (historical) reward; otherwise, choose a random action
(6)      Execute the action to obtain the next state, and use Algorithm 2 to calculate the reward
(7)      Store the transition (state, action, reward, next state) in the experience pool
(8)      Set the current state to the next state
(9)      Randomly sample a minibatch of transitions from the experience pool
(10)     Construct the error function according to equation (17), and update the parameters of the action-value function by back-propagation
(11)     Every few steps, update the target action-value function with the parameters of the action-value function
(12)     If the termination state is reached, end the current round
(13)   end for
(14) end for
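The training loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The environment (`TASK_SIZES`, `SERVER_SPEEDS`, `step`) is a hypothetical toy in which each round assigns three tasks, one at a time, to one of three edge servers, and the reward is the negative processing time. A lookup table stands in for the action-value network. The error function of equation (17) is assumed to be the standard squared temporal-difference error bootstrapped from a target copy, and all hyperparameter values are illustrative. Comments mark the corresponding algorithm steps.

```python
import random
from collections import deque

# Hypothetical toy environment (an assumption, not from the paper): each round
# schedules the tasks in TASK_SIZES in order; the action picks an edge server,
# and the reward is the negative processing time, size / speed.
TASK_SIZES = [4.0, 2.0, 6.0]
SERVER_SPEEDS = [1.0, 3.0, 2.0]


def step(state, action):
    """Return (next_state, reward, done) for the toy environment."""
    reward = -TASK_SIZES[state] / SERVER_SPEEDS[action]
    next_state = state + 1
    return next_state, reward, next_state == len(TASK_SIZES)


def train(max_rounds=300, max_iters=10, capacity=500, batch_size=16,
          gamma=0.9, lr=0.1, epsilon=0.1, target_every=20, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = len(TASK_SIZES) + 1, len(SERVER_SPEEDS)
    # (1) Experience pool with fixed capacity; a table stands in for the
    # action-value network, and q_target is its target copy.
    pool = deque(maxlen=capacity)
    q = [[rng.uniform(-0.01, 0.01) for _ in range(n_actions)]
         for _ in range(n_states)]
    q_target = [row[:] for row in q]
    total_steps = 0
    for _ in range(max_rounds):                      # (2)
        state = 0                                    # (3)
        for _ in range(max_iters):                   # (4)
            # (5) Epsilon-greedy: random action with probability epsilon,
            # otherwise the action with the largest estimated value.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)            # (6)
            pool.append((state, action, reward, next_state, done))    # (7)
            state = next_state                                        # (8)
            batch = rng.sample(list(pool), min(batch_size, len(pool)))  # (9)
            # (10) Assumed error: 0.5 * (y - Q(s, a))**2 with a target-network
            # bootstrap y; one gradient step moves Q(s, a) toward y.
            for s, a, r, s2, d in batch:
                y = r if d else r + gamma * max(q_target[s2])
                q[s][a] += lr * (y - q[s][a])
            total_steps += 1
            if total_steps % target_every == 0:      # (11)
                q_target = [row[:] for row in q]
            if done:                                  # (12)
                break
    return q


if __name__ == "__main__":
    q = train()
    for s in range(len(TASK_SIZES)):
        best = max(range(len(SERVER_SPEEDS)), key=lambda a: q[s][a])
        print(f"task {s}: server {best}")
```

Because every reward in this toy is negative and the table starts near zero, the initial estimates are optimistic, which pushes the greedy rule in step (5) to try every server early on. In a realistic vehicle edge computing setting, a neural network trained by back-propagation would replace the table, while the experience pool and the periodically synchronized target copy would play the same roles.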