Wireless Communications and Mobile Computing

Research Article

Minimizing the Cost of Spatiotemporal Searches Based on Reinforcement Learning with Probabilistic States

QDP’s training method

Input:, , , ,
Output: value function
1 Calculate the vehicle transition probability at and at the moment according to ;
2 Calculate the expected cost of spatiotemporal searches according to Equation (6);
3 Initialize the value function;
4 while the value function has not converged:
5 Randomize spatiotemporal point and the moment decision ;
6 Calculate according to Equation (9);
7 Update the value function according to Equation (8);
8 end.
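
The training loop in steps 3 to 8 can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the authors' implementation: Equations (8) and (9) are not reproduced in this listing, so the target and update below use a generic expected Bellman backup for cost minimization as a stand-in. The function name `train_value_function`, the discount `gamma`, the step size `alpha`, the table layouts of `P` and `cost`, and the convergence test are all hypothetical choices made for illustration. The outputs of steps 1 and 2 (transition probabilities and expected search costs) are assumed to be precomputed and passed in.

```python
import random


def train_value_function(P, cost, gamma=0.9, alpha=0.5, tol=1e-6,
                         check_every=1000, max_iters=200_000, seed=0):
    """Tabular training loop mirroring steps 3 to 8 of the listing.

    Assumed (hypothetical) inputs:
      P[s][a][s2]  transition probability from point s to s2 under decision a
                   (stand-in for the result of step 1)
      cost[s][a]   expected search cost at point s under decision a
                   (stand-in for the result of step 2)
    """
    rng = random.Random(seed)
    n_states, n_actions = len(cost), len(cost[0])

    # Step 3: initialize the value function (here: all zeros).
    Q = [[0.0] * n_actions for _ in range(n_states)]

    largest_change = 0.0
    # Step 4: repeat until the value function stops changing.
    for it in range(1, max_iters + 1):
        # Step 5: pick a random spatiotemporal point and decision.
        s = rng.randrange(n_states)
        a = rng.randrange(n_actions)

        # Step 6 (stand-in for Equation (9)): expected cost-to-go target,
        # taking the cheapest decision at each possible next point.
        expected_next = sum(p * min(Q[s2]) for s2, p in enumerate(P[s][a]))
        target = cost[s][a] + gamma * expected_next

        # Step 7 (stand-in for Equation (8)): move the estimate toward the target.
        change = alpha * (target - Q[s][a])
        Q[s][a] += change
        largest_change = max(largest_change, abs(change))

        # Assumed convergence test: no update in the last window exceeded tol.
        if it % check_every == 0:
            if largest_change < tol:
                break
            largest_change = 0.0

    # Step 8: return the trained value function.
    return Q
```

Because the state and decision are sampled uniformly at random, every table entry is revisited repeatedly, which is what lets a windowed "largest change" check serve as a simple convergence criterion in this sketch. A real implementation would substitute the paper's own update rule and probabilistic state representation.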