Research Article

A Cooperative Q-Learning Path Planning Algorithm for Origin-Destination Pairs in Urban Road Networks

Algorithm 1

The policy iteration algorithm.
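Initialize $Q(s, a)$ arbitrarily
Repeat (for each episode):
  Initialize $s$
  Repeat (for each step of episode):
    Choose a direction $a$ from $s$ using policy derived from $Q$
      (e.g., $\varepsilon$-greedy)
    Take action $a$, observe $r$, $s'$
    $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$; $s \leftarrow s'$
  Until $s$ is the terminal state.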
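The loop in Algorithm 1 can be sketched in Python as tabular Q-learning on a small road graph. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph `GRAPH`, the reward scheme (+10 on reaching the destination, -1 per move), the hyperparameters `alpha`, `gamma`, and `epsilon`, and the helper names `q_learning` and `greedy_path` are all hypothetical, and the update step is assumed to be the standard one-step Q-learning rule.

```python
import random

# Hypothetical toy road network: each node maps to the nodes reachable from it.
# A "direction" (action) is simply the neighbouring node to move to.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": [],  # destination, treated as the terminal state
}


def q_learning(graph, origin, destination, episodes=500,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; rewards and hyperparameters are assumptions."""
    rng = random.Random(seed)
    # Initialize Q(s, a) arbitrarily (zeros here).
    q = {s: {a: 0.0 for a in nbrs} for s, nbrs in graph.items()}
    for _ in range(episodes):
        s = origin  # Initialize s
        while s != destination:
            actions = graph[s]
            # Epsilon-greedy policy derived from Q.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: q[s][x])
            # Take action a, observe reward r and next state s'.
            s_next = a
            r = 10.0 if s_next == destination else -1.0
            # Value of the best action in s' (0 if s' is terminal).
            best_next = max(q[s_next].values(), default=0.0)
            # Assumed standard one-step Q-learning update.
            q[s][a] += alpha * (r + gamma * best_next - q[s][a])
            s = s_next
    return q


def greedy_path(q, origin, destination, max_steps=20):
    """Follow the greedy policy from origin; stop at destination or max_steps."""
    path = [origin]
    s = origin
    while s != destination and len(path) <= max_steps:
        s = max(q[s], key=q[s].get)
        path.append(s)
    return path


if __name__ == "__main__":
    q_table = q_learning(GRAPH, "A", "E")
    print(greedy_path(q_table, "A", "E"))
```

Calling `greedy_path` on the learned table reads off a route from origin to destination by always taking the highest-valued direction. The step penalty is what pushes the learned route toward fewer moves; a real road network would replace it with a cost such as travel time.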