Research Article
Minimizing the Cost of Spatiotemporal Searches Based on Reinforcement Learning with Probabilistic States
Table 2
Accumulative search cost of QDP with different learning rates.
| Start moment | Training time | 8:00 | 10:00 | 12:00 | 14:00 | 16:00 | 18:00 | 20:00 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| α = 1.0, γ = 1 | 11.50 h | 32.81 | 32.90 | 34.85 | 32.72 | 34.06 | 30.15 | 34.97 |
| α = 0.5, γ = 1 | 30.36 h | 32.81 | 32.90 | 34.85 | 32.72 | 34.06 | 30.15 | 34.97 |
| α = 0.3, γ = 1 | 54.12 h | 32.81 | 32.90 | 34.85 | 32.72 | 34.06 | 30.15 | 34.97 |
| α = 0.1, γ = 1 | 98.34 h | 32.81 | 32.90 | 34.85 | 32.72 | 34.06 | 30.15 | 34.97 |