Research Article
Time-Driven Scheduling Based on Reinforcement Learning for Reasoning Tasks in Vehicle Edge Computing
| | Input: initial state, maximum number of rounds, maximum number of iterations in a single round |
| | Output: the scheduling strategy for reasoning tasks |
| (1) | Initialize the experience pool with constant storage space, the action-value function with random weights, and the corresponding target action-value function |
| (2) | for each round up to the maximum number of rounds do |
| (3) | Set the current state to the initial state |
| (4) | for each iteration up to the maximum number of iterations in a single round do |
| (5) | With a given probability, choose the action with the largest historical reward; otherwise choose a random action |
| (6) | Execute the action to get the next state and use Algorithm 2 to calculate the reward |
| (7) | Store the resulting transition in the experience pool |
| (8) | |
| (9) | Randomly sample from the experience pool |
| (10) | Construct an error function according to equation (17) and back-propagate to update the parameters |
| (11) | Update the target action-value function every few steps |
| (12) | If the state satisfies the termination condition, end the current iteration |
| (13) | end for |
| (14) | end for |
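The training loop listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the chain environment `toy_step` stands in for the vehicle edge computing scheduler, its fixed reward stands in for the reward computed by Algorithm 2, a lookup table stands in for the neural action-value function, and a plain squared temporal-difference error stands in for the error function of equation (17). All names and default values (`greedy_prob`, `sync_every`, `pool_size`, and so on) are hypothetical.

```python
import random
from collections import deque


def toy_step(state, action, n_states):
    """Hypothetical environment: move left (0) or right (1) on a chain."""
    next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0  # placeholder for the reward of Algorithm 2
    return next_state, reward, done


def train(n_states=5, n_actions=2, max_rounds=500, max_iters=50,
          pool_size=500, batch_size=16, greedy_prob=0.6, gamma=0.9,
          lr=0.1, sync_every=20, seed=0):
    rng = random.Random(seed)
    # Experience pool of constant storage space: old entries are evicted.
    pool = deque(maxlen=pool_size)
    # Action-value table with small random weights, plus a target copy.
    q = [[rng.uniform(-0.01, 0.01) for _ in range(n_actions)]
         for _ in range(n_states)]
    q_target = [row[:] for row in q]
    steps = 0
    for _ in range(max_rounds):
        state = 0  # initial state
        for _ in range(max_iters):
            # Greedy action with probability greedy_prob, otherwise random.
            if rng.random() < greedy_prob:
                action = max(range(n_actions), key=lambda a: q[state][a])
            else:
                action = rng.randrange(n_actions)
            next_state, reward, done = toy_step(state, action, n_states)
            # The done flag is stored as well (an addition for this sketch).
            pool.append((state, action, reward, next_state, done))
            state = next_state  # advance to the next state
            # Random sampling from the experience pool.
            batch = rng.sample(list(pool), min(batch_size, len(pool)))
            for s, a, r, s2, d in batch:
                # Assumed stand-in for equation (17): squared TD error
                # against the target table; this is its gradient step.
                target = r if d else r + gamma * max(q_target[s2])
                q[s][a] += lr * (target - q[s][a])
            steps += 1
            if steps % sync_every == 0:
                q_target = [row[:] for row in q]  # periodic target update
            if done:
                break
    return q


if __name__ == "__main__":
    table = train()
    print([max(range(2), key=lambda a: table[s][a]) for s in range(4)])
```

Keeping a separate target table that is refreshed only every few steps means the regression targets stay fixed between refreshes, which is the usual motivation for the periodic update in step (11).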