Research Article

Minimizing the Cost of Spatiotemporal Searches Based on Reinforcement Learning with Probabilistic States

Table 2

Accumulative search cost of QDP with different learning rates.

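| Setting \ Start moment | Training time (h) | 8:00 | 10:00 | 12:00 | 14:00 | 16:00 | 18:00 | 20:00 |
|---|---|---|---|---|---|---|---|---|
| α = 1.0, γ = 1 | 11.50 | 32.81 | 32.90 | 34.85 | 32.72 | 34.06 | 30.15 | 34.97 |
| α = 0.5, γ = 1 | 30.36 | 32.81 | 32.90 | 34.85 | 32.72 | 34.06 | 30.15 | 34.97 |
| α = 0.3, γ = 1 | 54.12 | 32.81 | 32.90 | 34.85 | 32.72 | 34.06 | 30.15 | 34.97 |
| α = 0.1, γ = 1 | 98.34 | 32.81 | 32.90 | 34.85 | 32.72 | 34.06 | 30.15 | 34.97 |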