Research Article

Minimizing the Cost of Spatiotemporal Searches Based on Reinforcement Learning with Probabilistic States

Figure 4

Training of QDP. The figure labels the current state, the action performed at that state, the reward of that action at that state (reward is the inverse of cost), the probability distribution over next states, and the reward of the optimal action at each possible next state.
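The quantities named in the caption can be combined into a one-step training target. The sketch below is an illustration under stated assumptions, not the authors' exact update rule: it assumes a standard Bellman-style backup in which the immediate reward is added to the discounted, probability-weighted reward of the optimal action at each possible next state. The function name `expected_backup`, the discount factor `gamma`, the state and action labels, and all numeric values are hypothetical. Rewards are taken here as negative costs, which is one possible reading of "inverse".

```python
def expected_backup(reward, next_state_probs, q_table, gamma=0.9):
    """Return reward + gamma * E[max_a' Q(s', a')].

    The expectation runs over the probability distribution of next states,
    so the next state is treated as probabilistic rather than observed.
    (Assumed form; the source does not give the exact QDP update.)
    """
    expected_best = sum(
        prob * max(q_table[next_state].values())
        for next_state, prob in next_state_probs.items()
    )
    return reward + gamma * expected_best


# Hypothetical toy values: two possible next states, two actions each.
q_table = {
    "s1": {"left": -4.0, "right": -2.0},
    "s2": {"left": -1.0, "right": -3.0},
}
next_state_probs = {"s1": 0.25, "s2": 0.75}

target = expected_backup(
    reward=-1.0,
    next_state_probs=next_state_probs,
    q_table=q_table,
    gamma=0.5,
)
print(target)
```

In a tabular setting, a value such as `target` would typically be used to move the stored estimate for the current state and action toward it. How QDP actually applies the target is not stated in the caption.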