Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 146070, 10 pages

http://dx.doi.org/10.1155/2015/146070

## A Cooperative Q-Learning Path Planning Algorithm for Origin-Destination Pairs in Urban Road Networks

School of Information Science and Engineering, Central South University, 22 South Shaoshan Road, Changsha 410075, China

Received 25 May 2015; Accepted 21 September 2015

Academic Editor: Chronis Stamatiadis

Copyright © 2015 Xiaoyong Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

As an important part of intelligent transportation systems, path planning algorithms have been extensively studied in the literature. Most existing studies focus on the global optimization of paths to find the optimal path between Origin-Destination (OD) pairs. However, in urban road networks, the optimal path may not always be available when unforeseen emergencies occur on it. A more practical method is therefore to calculate several suboptimal paths instead of finding only one optimal path. In this paper, a cooperative Q-learning path planning algorithm is proposed to seek a suboptimal multipath set for OD pairs in urban road networks. First, the road model is abstracted into a form to which Q-learning can be applied. Then the gray prediction algorithm is combined with Q-learning to find suboptimal paths under reliability constraints. Simulation results are provided to show the effectiveness of the proposed algorithm.

#### 1. Introduction

Recent years have seen growing interest in route-guidance systems within intelligent transportation systems, owing to their advantages in reducing traffic congestion and CO₂ emissions, minimizing travel time, and conserving energy [1]. More and more vehicle manufacturers install route-guidance systems in their products to assist drivers.

As an essential part of the route-guidance system, path planning is usually modeled as the shortest-path problem in graph theory [2–8]. When a vehicle departs from its origin and travels to its destination, the map it moves through can be abstracted as a graph by treating streets as edges and intersections as nodes. The weight of an edge represents the average travel time over the street, which may change dynamically as traffic flows fluctuate. For the graph of a static network, the most efficient one-to-one shortest path algorithm is Dijkstra’s algorithm [2]. When a dynamic graph is considered, the A* algorithm might be a better choice for solving the Origin-Destination shortest path problem [3]. The A* algorithm estimates the minimum distance between the destination and a node to determine whether the node is on the optimal route.
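As an illustration of the graph abstraction described above, the following minimal sketch runs Dijkstra's algorithm on a small weighted directed graph. The network `toy_graph`, its node names, and its travel times are hypothetical values chosen for the example and do not come from the paper.

```python
import heapq

def dijkstra(graph, origin, destination):
    """Return (total_time, path) for the fastest route, or (inf, []) if unreachable."""
    best = {origin: 0.0}           # best known travel time to each node
    previous = {}                  # predecessor on the best known route
    queue = [(0.0, origin)]
    while queue:
        time_so_far, node = heapq.heappop(queue)
        if node == destination:
            break
        if time_so_far > best.get(node, float("inf")):
            continue               # stale queue entry
        for neighbor, travel_time in graph.get(node, {}).items():
            candidate = time_so_far + travel_time
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if destination not in best:
        return float("inf"), []
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return best[destination], path[::-1]

# Hypothetical network: nodes are intersections, weights are average
# travel times in minutes along each directed street.
toy_graph = {
    "A": {"B": 4.0, "C": 2.0},
    "B": {"D": 5.0},
    "C": {"B": 1.0, "D": 8.0},
    "D": {},
}

print(dijkstra(toy_graph, "A", "D"))
```

In this toy network the fastest route from A to D goes through C and B. When edge weights change with traffic, the search has to be repeated with the updated weights.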

However, even if the generated route is the shortest one, it may not always be available because of traffic emergencies such as sudden accidents. It may therefore be more practical to provide a number of candidate paths rather than just one optimal path. Lee showed that finding multiple paths instead of one is a good way to avoid the path overload phenomenon [4]. When overload occurs, guiding vehicles onto the single optimal path can even accelerate the deterioration of the road network.

Traditionally, alternative paths are calculated by two categories of algorithms in graph theory, namely, the k-shortest path algorithms proposed by Eppstein [5] and Jiménez and Marzal [6] and the totally disjoint path algorithms proposed by Dinic [7] and Torrieri [8]. These so-called alternative path planning methods typically find the optimal path using Dijkstra’s algorithm first. The candidate path set is then generated by applying link weight increment methods. These algorithms seek the next suboptimal path iteratively until the generated alternative path satisfies some given constraints.
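The iterative scheme described above can be sketched as follows. This is a simplified illustration of a link weight increment approach, not the exact procedure of any cited work: the multiplicative `penalty`, the stopping rule (a fixed number of distinct paths), and the toy network are assumptions made for the example.

```python
import heapq

def fastest_path(graph, origin, destination):
    """Plain Dijkstra search returning the node list of the fastest route ([] if none)."""
    best, previous, queue = {origin: 0.0}, {}, [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue
        for neighbor, weight in graph.get(node, {}).items():
            if cost + weight < best.get(neighbor, float("inf")):
                best[neighbor] = cost + weight
                previous[neighbor] = node
                heapq.heappush(queue, (cost + weight, neighbor))
    if destination not in best:
        return []
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return path[::-1]

def alternative_paths(graph, origin, destination, count, penalty=1.5):
    """Collect up to `count` distinct paths by penalising the links of each path found."""
    weights = {node: dict(links) for node, links in graph.items()}  # leave input intact
    paths = []
    for _ in range(4 * count):                 # bounded number of recalculations
        path = fastest_path(weights, origin, destination)
        if not path:
            break
        if path not in paths:
            paths.append(path)
        if len(paths) == count:
            break
        for u, v in zip(path, path[1:]):       # make the used links less attractive
            weights[u][v] *= penalty
    return paths

# Hypothetical network with travel times in minutes.
toy_graph = {
    "A": {"B": 4.0, "C": 2.0},
    "B": {"D": 5.0},
    "C": {"B": 1.0, "D": 8.0},
    "D": {},
}

for path in alternative_paths(toy_graph, "A", "D", count=3):
    print(" -> ".join(path))
```

Each additional path requires another full shortest-path search on the reweighted graph, so the cost grows with the number of alternatives requested.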

However, the way these algorithms generate alternative paths unavoidably lengthens the response time, especially when the network is large and the traffic load is heavy and time varying. They need to adjust the link weights of the generated optimal path and then recalculate the suboptimal paths using Dijkstra’s algorithm repeatedly, which leads to a heavy computational burden. In addition, these algorithms generally plan the path of just one vehicle, whereas in practical city road networks it is essential to consider the paths of all vehicles simultaneously.

With the development of intelligent science, some researchers have focused on path planning using reinforcement learning in guidance systems. Reinforcement learning is a category of machine learning algorithms in which a group of agents decide how to behave according to their interaction with the environment in order to achieve an optimal objective [9]. Recently, multiagent reinforcement learning has been proposed to find the shortest path between the origin and the destination. Some studies treat each intersection as one agent, which requires a large amount of information exchange between traffic intersections to find the optimal path [10], while more studies cast each intersection as a state and each link as an action in the model, which allows the road network to be handled as a whole [11, 12]. Our proposed Q-learning algorithm adopts the latter method, treating the intersections as states in the model.

With Q-learning, the computational complexity of the path planning algorithm can be reduced significantly and its efficiency improved. While most existing Q-learning algorithms in the literature solve optimal path planning for just one OD pair, the Q-learning algorithm proposed in this paper seeks multiple paths for different OD pairs simultaneously. By choosing the suboptimal Q-value at every intersection, it can conveniently provide alternative paths rather than seeking every alternative route incrementally. In particular, this paper makes the following contributions.

First, the multipath set is found for different OD pairs simultaneously using Q-learning. Compared with other multipath algorithms, the proposed algorithm significantly reduces the computational complexity.

Second, reliability constraints are introduced to choose suboptimal paths in Q-learning. It would not be appropriate to enlarge the multipath set without considering its overall reliability, which ensures that at least one alternative path is available at all times [13].

Third, FNN prediction is combined with Q-learning. Short-term traffic prediction is essential for improving real-time capability [14, 15]. This paper adopts the FNN prediction mechanism in the Q-learning scheme to predict the traffic condition, so that the reward of an action can be computed in advance.

Fourth, a multiagent cooperative mechanism is applied to path planning. The cooperative mechanism introduced in Q-learning coordinates the actions and strategies among agents with different OD pairs for long-term benefits.
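The multipath idea behind the first contribution can be illustrated with a minimal tabular sketch in which intersections are states and outgoing links are actions. Everything specific here is an assumption made for illustration rather than the paper's algorithm: the reward is the negative travel time, exploration is replaced by systematic sweeps over every state-action pair, the parameters `alpha`, `gamma`, and `sweeps` are arbitrary, every intersection other than the destination is assumed to have at least one outgoing link, and the alternative path deviates from the greedy choice only at the origin. Prediction, reliability constraints, and cooperation between agents are omitted.

```python
def learn_q_values(graph, destination, sweeps=50, alpha=0.5, gamma=1.0):
    """Apply the Q-learning update to every (intersection, link) pair repeatedly."""
    q = {state: {action: 0.0 for action in links} for state, links in graph.items()}
    for _ in range(sweeps):
        for state, links in graph.items():
            if state == destination:
                continue
            for next_state, travel_time in links.items():
                # Value of the best follow-up action; zero once the destination is reached.
                future = 0.0 if next_state == destination else max(q[next_state].values())
                target = -travel_time + gamma * future
                q[state][next_state] += alpha * (target - q[state][next_state])
    return q

def ranked_path(q, origin, destination, rank_at_origin=0):
    """Follow the best Q-value at each intersection, except at the origin,
    where the action with the given rank (0 = best, 1 = second best) is taken."""
    path, state, rank = [origin], origin, rank_at_origin
    while state != destination and len(path) <= len(q):   # guard against cycles
        ordered = sorted(q[state], key=q[state].get, reverse=True)
        state = ordered[min(rank, len(ordered) - 1)]
        path.append(state)
        rank = 0
    return path

# Hypothetical network with travel times in minutes.
toy_graph = {
    "A": {"B": 4.0, "C": 2.0},
    "B": {"D": 5.0},
    "C": {"B": 1.0, "D": 8.0},
    "D": {},
}

q_table = learn_q_values(toy_graph, destination="D")
print(ranked_path(q_table, "A", "D"))                     # greedy path
print(ranked_path(q_table, "A", "D", rank_at_origin=1))   # alternative path
```

Once the table is learned, an alternative path is read off by ranking the stored Q-values, so no further shortest-path search is needed.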

In this paper, we propose a new multiagent reinforcement learning (MARL) algorithm that uses Q-learning with prediction for multipath planning for OD pairs in the road navigation system. Compared with traditional multipath algorithms, it reduces computational complexity and, through traffic prediction, improves the efficiency of vehicle guidance. The scheme could improve the overall performance of urban traffic networks and balance the traffic flow.

The rest of the paper is organized as follows. Section 2 describes the model of road networks. The Q-learning based cooperative multiagent multipath planning algorithm for OD pairs is proposed in Section 3. The simulation results are shown and analyzed in Section 4. The conclusion is drawn in Section 5.

#### 2. Model of Road Networks

##### 2.1. Graph Abstraction of Road Networks

For urban areas, the two important elements of traffic guidance are intersections and roads. In the model, each intersection is treated as a node and each road as an edge connecting two nodes. The weight on an edge stands for the traffic condition of the road, and the arrows indicate the directions in which vehicles are allowed to travel. Through this abstraction, a graph with a nonempty finite set of intersections (nodes) and a set of roads can be used to describe the road map. Once we have the model and the routing algorithm, we can find the required optimal route.

For instance, the Eastern Town of Changsha, China, whose map is shown in Figure 1, can be taken as an example. The abstract graph model of Figure 1 is shown in Figure 2. Each node stands for an intersection, which is taken as one state in reinforcement learning. Each intersection has three or four directions to neighboring intersections, including a loop direction that returns to the intersection itself. For example, if a vehicle at an intersection whose loop direction is west drives west, it returns to the same intersection. This setting makes it convenient to model complex road networks.
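One possible way to encode the loop direction is a transition table keyed by intersection and direction, as in the sketch below. The intersection names `s1` to `s3`, the `roads` layout, and the choice to give every intersection all four compass directions (with any direction lacking a road treated as a loop) are assumptions made for illustration; the actual layout in Figures 1 and 2 is not reproduced here.

```python
DIRECTIONS = ("north", "east", "south", "west")

def build_transitions(intersections, roads):
    """Map every (intersection, direction) pair to the next intersection.

    A direction without a road becomes a loop that returns to the same
    intersection, so every state exposes the same set of actions.
    """
    transitions = {}
    for node in intersections:
        for direction in DIRECTIONS:
            transitions[(node, direction)] = roads.get((node, direction), node)
    return transitions

# Hypothetical three-intersection map: s1 and s2 are joined east-west,
# s2 and s3 are joined north-south.
roads = {
    ("s1", "east"): "s2",
    ("s2", "west"): "s1",
    ("s2", "south"): "s3",
    ("s3", "north"): "s2",
}
transitions = build_transitions(["s1", "s2", "s3"], roads)

print(transitions[("s1", "east")])   # moves to the neighboring intersection
print(transitions[("s1", "west")])   # no road to the west, so the vehicle stays at s1
```

Giving every state the same action set keeps the table rectangular, which is one reason a loop convention can simplify modeling irregular road networks.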