Research Article  Open Access
Dang Khoa Vo, Tran Vu Pham, Nguyen Huynh Tuong, Van Hoai Tran, "Least Expected Time Paths in Stochastic Schedule-Based Transit Networks", Mathematical Problems in Engineering, vol. 2016, Article ID 7609572, 13 pages, 2016. https://doi.org/10.1155/2016/7609572
Least Expected Time Paths in Stochastic Schedule-Based Transit Networks
Abstract
We consider the problem of determining a least expected time (LET) path that minimizes the number of transfers and the expected total travel time in a stochastic schedule-based transit network. A time-dependent model is proposed to represent the stochastic transit network, in which vehicle arrival times are fully stochastically correlated. An exact label-correcting algorithm is developed, based on a proposed dominance condition under which Bellman’s principle of optimality holds. Experimental results on the Ho Chi Minh City bus network show that the running time of the proposed algorithm is suitable for real-time operation and that the resulting LET paths are robust against uncertainty, such as unknown traffic scenarios.
1. Introduction
The routing problem in a schedule-based transit network involves scheduling decisions made by a traveler, for example, accessing a stop (station), walking between stops, waiting to board, traveling in-vehicle, alighting, and egressing. These decisions guide the traveler from an origin to a destination with minimum travel costs, such as number of transfers, total travel time, walking time, and waiting time. The traveler's decisions are constrained not only by the network configuration, that is, the transit routes (lines), but also by the schedules of transit vehicles. However, due to the stochastic and time-varying nature of vehicle travel time, and because the arrival of a transit vehicle at an upstream stop affects its arrivals at downstream stops, the arrival times of transit vehicles usually do not follow their schedules. Therefore, the determination of robust routing decisions can greatly affect the quality of the routing service provided under uncertain conditions.
Along with the stochasticity of vehicle travel times and the relationship between vehicle arrival times on the same transit route, there might also exist overlaps between transit routes in the network. Therefore, the arrival times of transit vehicles would be not only stochastic but also fully stochastically correlated. The routing problem with stochastically correlated link travel times has been investigated intensively in highway networks [1–6]. However, its counterpart in transit networks, where vehicle arrival times are stochastically correlated, has not been addressed; existing works in the literature assume vehicle arrival times to be deterministic [7–13] or statistically independent [14]. The main issue in designing a routing algorithm for a schedule-based transit network with correlated vehicle arrival times is how to model the stochastic correlation of vehicle arrival times and how to incorporate it into the routing process, in which constraints on both transit routes and vehicle arrival times are taken into account. The contributions of this paper are as follows:
(i) A time-dependent model is proposed for stochastic schedule-based transit networks where the correlation of all vehicle arrival times is represented as a scenario. The graph model captures travelers’ decisions, namely, boarding, traveling in-vehicle, alighting, and walking, and the time constraints of these decisions in each scenario.
(ii) A new dominance condition for paths is established with respect to number of transfers and travel times over a set of possible scenarios. A formal proof is then presented that Bellman’s principle of optimality is valid with nondominated paths. This theoretical result enables the use of pretrip online information to reduce uncertainties for more robust LET paths.
(iii) An exact link-based routing algorithm is proposed for efficiently determining LET paths, based on Bellman’s principle of optimality.
The results from experiments, which are conducted using data from a real-size bus network in Ho Chi Minh City, show that the running time of the proposed algorithm is feasible for online applications. Also, LET paths are shown to be robust in the presence of unknown scenarios.
The remainder of the paper is organized as follows. We present related research on the routing problem in transit networks in Section 2. In Section 3, we define components used to develop the algorithm for our routing problem. Then we propose the solution algorithm for determining the LET path in Section 4. Various experiments are conducted, and their results are discussed in Section 5. Finally, the conclusion is given in Section 6.
2. Related Work
Transit routing algorithms in the literature have been built on the notion of path [7, 8, 15] or hyperpath [16–18]. A path consists of fixed decisions made by a traveler at stops, which are determined before he/she leaves the origin. In contrast, a hyperpath represents routing strategies in which the traveler is allowed to change his/her decision at each intermediate stop, depending on the previous decisions and what is likely to happen in the future. Routing based on hyperpaths was shown to yield lower travel costs under uncertainty but requires the incorporation of online information and has high computational complexity [19].
The treatment of the routing problem in a transit network differs depending on the type of transit services, that is, either headway-based [15–17] or schedule-based [7, 8, 11, 12]. In the former, transit services are represented by transit routes, and arrival/departure times of transit vehicles are not explicitly considered. This results in an approximation in calculating boarding times and in-vehicle travel times. In the latter, transit services are explicitly specified in terms of trips (runs), in which arrival/departure times of transit vehicles at stops are considered. A routing algorithm in a headway-based transit network can employ shortest path algorithms, for example, Dijkstra’s algorithm [20], which are the same as those used for highway networks. In contrast, a schedule-based transit network requires a time-dependent network representation in which routing processes of travelers are constrained not only by the network topology but also by scheduled arrival/departure times of transit vehicles. Therefore, modeling transit services is the first and most important task in solving the routing problem in a schedule-based transit network. As classified by Nuzzolo and Crisalli [21], the representation of a schedule-based transit network can take one of three forms: the diachronic (time-expanded) network [9, 10, 13], the dual network [22], and the mixed line-based/database supply model (time-dependent model) [11, 23, 24].
In the context where transit services are insufficiently reliable, headways and arrival/departure times of transit vehicles are commonly modeled as random variables with well-known forms of probability distribution, for example, exponentially distributed headways [25, 26], Gaussian distributed headways [27, 28], and Gaussian distributed scheduled times [14]. Along with the stochasticity of transit services, the uncertainty in travelers’ perceptions of different types of travel costs can also be regarded as a source of stochasticity in a transit routing problem [8, 27, 28]. In these works, random weights for different travel cost components, such as transfer penalty, walking time, and waiting time, were incorporated into the routing process.
The routing problem in transit networks has been investigated under various assumptions on many aspects, such as capacity limitation, congestion and overcrowding issues, vehicle capacity, and boarding failures [29]. Nuzzolo and Crisalli [21] investigated various routing models for low- and high-frequency schedule-based transit networks. In the former, for example, in regional bus or railway networks, routing processes are based on arrival/departure times of transit vehicles [30, 31]. In the latter, typically in urban areas, travelers usually have a large number of options at stops to reach their destination. In this case, arrivals of travelers at stops do not rely heavily on vehicle arrival/departure times but are significantly affected by vehicle congestion, which is defined in the literature as a situation in which a traveler cannot board the first arriving vehicle and has to wait for subsequent vehicles. Vehicle congestion can be modeled implicitly as increasing discomfort functions [11, 12, 32] or explicitly with vehicle capacity or seat availability constraints [33–35].
3. Network Modeling
In this section, we define components used to develop the algorithm for determining LET paths in stochastic schedule-based transit networks.
3.1. Stochastic Schedule-Based Transit Network
We consider transit network , where is the set of stops and is the set of routes. A route, , is a fixed sequence of stops through which transit vehicles run periodically with fixed trips and defined by a set where is the th stop and is the number of stops on route . Let be the number of trips of route over a set of time intervals , where is unit of time and is the last time interval. The universal stochastic scenario set is a set of all known possible scenarios in the network such that where is the occurrence probability of scenario . Each scenario can be defined by a set of stop timeswhere denotes the stop time (scheduled arrival time of a transit vehicle) at the th stop of the th trip on route in scenario and is the universal set of all stop times in all possible scenarios such that
In the context of transit networks, there might exist overlaps among routes. A scenario represents a stochastic correlation of not only stop times on the same route but also stop times on routes sharing the same physical links. The probability of a scenario is the full joint probability of all of its stop times occurring together, and the stop times of each scenario are known a priori. This allows us to account explicitly for delays resulting from transfer failures due to late arrivals and for their effects on the total travel time in each scenario.
For example, consider the transit network shown in Figure 1 with and . In this network, there are three routes in which routes and provide services from stop to stop with and , and route provides services from stop to stop with and . The stop times of transit vehicles in the network are shown in Table 1 with , , and in which each of the routes has two trips and each scenario has an occurrence probability of .

In this network, a traveler, who starts from origin stop at time , has two choices of routes to reach destination stop , namely, and . With Assumption (4), the choices of trips for the earliest arrival time at stop in different scenarios are shown in Table 2. In particular, if the traveler uses the choice of routes , his/her expected arrival time at stop equals . In this case, the choice of trips for can be interpreted that, at stop , the traveler transfers to trip 1 of route 3 successfully in scenarios and but misses this trip in scenario . This leads to a later arrival time, that is, 16 instead of 10, at stop , which contributes to the expected arrival time of the choice of routes . Similarly, we have the expected arrival time at stop of the choice of routes that equals . Note that transfer failures might spread over several later trips, depending on scenario.
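The effect of a missed transfer on the expected arrival time can be illustrated with a minimal Python sketch. The numbers below are hypothetical and are not the values of Table 1: three equally likely scenarios are assumed, the first leg reaches the transfer stop at a scenario-dependent time, and the connecting route offers two trips. The names `PROB`, `ARRIVE_AT_TRANSFER`, and `CONNECTING_TRIPS` are illustrative only, and alighting penalties are ignored for brevity.

```python
from fractions import Fraction

# Hypothetical data (NOT Table 1): three equally likely scenarios.
PROB = {"s1": Fraction(1, 3), "s2": Fraction(1, 3), "s3": Fraction(1, 3)}

# Time at which the traveler reaches the transfer stop in each scenario.
ARRIVE_AT_TRANSFER = {"s1": 4, "s2": 5, "s3": 7}

# Connecting route: per scenario, one (departure at transfer stop,
# arrival at destination) pair per trip, in trip order.
CONNECTING_TRIPS = {
    "s1": [(6, 10), (12, 16)],
    "s2": [(6, 10), (12, 16)],
    "s3": [(6, 10), (12, 16)],
}


def arrival_in_scenario(s):
    """Arrival time at the destination using the first catchable trip."""
    ready = ARRIVE_AT_TRANSFER[s]
    for departure, arrival in CONNECTING_TRIPS[s]:
        # The traveler must be at the stop one time unit before departure.
        if departure - ready >= 1:
            return arrival
    return None  # no trip can be boarded in this scenario


def expected_arrival():
    """Probability-weighted arrival time over the scenario set."""
    return sum(PROB[s] * arrival_in_scenario(s) for s in PROB)


print(expected_arrival())
```

In this sketch the transfer succeeds in the first two scenarios and fails in the third, so the traveler arrives at 16 instead of 10 in that scenario, which raises the expected arrival time.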

In this paper, the following assumptions are adopted:
(1) Actual travel times of transit vehicles between stops on a given route are nonnegative; that is, in every scenario, the stop time of a trip at a downstream stop is not earlier than its stop time at the preceding stop.
(2) Actual arrival times of transit vehicles for later trips cannot be earlier than those of earlier trips; that is, in every scenario and at every stop of a route, the stop time of a later trip is not earlier than that of an earlier trip.
(3) Arrival times of transit vehicles in different scenarios are statistically independent.
(4) Vehicle capacity, overcrowding, and fare issues are not considered. In other words, it is assumed that passengers always board any arriving transit vehicle successfully.
(5) Passengers perceive different time components, such as waiting time, walking time, and in-vehicle time, in the same way.
Assumptions (1) and (2) are expected to be valid in practice, where transit vehicles serving trips on the same route conventionally keep a certain distance from each other and their travel times are always positive. Assumption (3) is equivalent to the assumption used in the routing problem in highway networks with correlated link travel times [4–6]; that is, link travel times in different scenarios are stochastically independent. Assumptions (4) and (5) have been widely adopted in the literature, for example, [14, 18, 19, 25, 26].
3.2. Time-Dependent Model
A time-dependent graph model (similar to [24]) is used to represent the transit network as a directed graph whose arcs model travelers’ decisions, namely, boarding, traveling in vehicles, alighting, and walking.
Let denote the graph modeling the transit network , where the set of nodes and the set of arcs are defined as(i),(ii),
in which subsets of and are defined as follows: is the set of stop nodes—a node represents the location of stop . is the set of route nodes associated with route —a node represents a transfer point where route visits stop . is the set of boarding arcs—arc , where and , represents the action of a traveler boarding an arriving vehicle of route at stop . is the set of invehicle arcs of route —arc , where and , represents the action of a traveler being invehicle of route from stop to stop . is the set of walking arcs—arc , where and , represents the action of a traveler walking from stop to stop . is the set of alighting arcs—arc , where and , represents the action of a traveler alighting the current vehicle of route at stop .
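The node and arc sets described above can be assembled as in the following Python sketch. It is only an illustration under stated assumptions: the function name `build_graph`, the tuple encoding of nodes, and the arc labels are hypothetical, and the sketch assumes that no boarding arc is created at the last stop of a route and no alighting arc at its first stop, which the text does not state explicitly.

```python
def build_graph(routes, walk_pairs):
    """Build stop nodes, route nodes, and the four arc types.

    routes:     dict mapping a route id to its ordered list of stop ids
    walk_pairs: iterable of (from_stop, to_stop) walking shortcuts
    """
    stop_nodes, route_nodes, arcs = set(), set(), []
    for route, stops in routes.items():
        last = len(stops) - 1
        for i, stop in enumerate(stops):
            s_node = ("stop", stop)
            r_node = ("route", route, i)  # route visits stop at position i
            stop_nodes.add(s_node)
            route_nodes.add(r_node)
            if i < last:
                # Assumption: boarding is useless at the terminal stop.
                arcs.append((s_node, r_node, "board"))
                arcs.append((r_node, ("route", route, i + 1), "ride"))
            if i > 0:
                # Assumption: nothing to alight from at the first stop.
                arcs.append((r_node, s_node, "alight"))
    for u, v in walk_pairs:
        stop_nodes.update({("stop", u), ("stop", v)})
        arcs.append((("stop", u), ("stop", v), "walk"))
    return stop_nodes | route_nodes, arcs


# Hypothetical toy network: two routes sharing stop "b", one walking link.
NODES, ARCS = build_graph({"R1": ["a", "b", "c"], "R2": ["b", "d"]},
                          [("c", "d")])
print(len(NODES), len(ARCS))
```

Keeping one route node per (route, stop position) pair is what lets a transfer be expressed as an alighting arc followed by a boarding arc at the same stop node.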
Figure 2 presents the graph model for the transit network as shown in Figure 1. Let denote all paths connecting node and node in graph or all paths in short. A welldefined path , defined in Definition 1, in graph represents a choice of routes when he/she travels from origin stop to destination stop within the transit network .
Definition 1 (welldefined path). Path is welldefined if and , , .
3.3. Arc Time and Transfer Weights
Note that only travelers’ decisions are captured in Section 3.2. For modeling constraints on times when the schedules of transit vehicles are taken into account, times are then assigned to arcs as arc weights.
Let be the time weight on arc with time at node in scenario . Depending on the type of arc , the time weight is either boarding penalty, invehicle travel time, alighting penalty, or walking time. In particular, can be assigned according to the four following cases.
Case 1. If , where and , the traveler stands at the th stop at time and boards a vehicle of arriving trip of route . Due to unlimited vehicle capacity assumption, the boarded trip is commonly the first arriving one [24, 36, 37]. For boarding an arriving trip, the traveler must be at the stop before the bus of that trip leaves the stop by at least units of time (note that herein is set to one and will be omitted for convenience in the rest of the paper). The boarding penalty for the first arriving trip if the traveler stands at the th stop of route at time is expressed by
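As a hedged illustration of Case 1, the boarding weight can be computed as the waiting time until the first trip that the traveler can still catch. The function name `boarding_penalty` and the parameter `min_lead` are hypothetical. The stop times 7 and 10 used below are inferred from the boarding penalties quoted for the example arc in this section (penalty 7 at time 0 for trip 1, penalty 3 at time 7 for trip 2); they are not read from Table 1.

```python
def boarding_penalty(t, stop_times, min_lead=1):
    """Waiting time at time t until the first catchable trip.

    stop_times: stop times of all trips of the route at this stop
                in one scenario
    min_lead:   the traveler must be at the stop at least this many
                time units before the vehicle leaves
    Returns None when no remaining trip can be boarded (restricted arc).
    """
    waits = [tau - t for tau in stop_times if tau - t >= min_lead]
    return min(waits) if waits else None


# Stop times inferred from the penalties quoted in the text (assumption).
STOP_TIMES = [7, 10]
print([boarding_penalty(t, STOP_TIMES) for t in range(11)])
```

With these stop times the sketch returns penalties 7 down to 1 for times 0 to 6 and 3 down to 1 for times 7 to 9, after which the arc is restricted. In a shortest path computation the `None` value would be replaced by a very large weight.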
Case 2. If , , where and , the traveler rides on a vehicle serving a certain trip, for example, the th trip, and travels from the th stop to th stop on route . The time weight on arc is therefore the invehicle travel time of the th trip from the th to th stop. The invehicle travel time of the th trip from the th stop to th stop on route iswhere the traveler’s arrival time at the th stop is the stop time of the vehicle in scenario .
Case 3. If , where and , the traveler alights from a vehicle serving a trip, for example, the th trip, at the th stop of route . The arc time weight can be expressed bywhere is the alighting time for the th trip at the th stop on route .
Case 4. If , where and , the traveler walks from stop to stop . Let be the minimum time required for walking between stops and , and the walking time weight is given by
Let be the weight for the number of transfers on arc . Note that does not depend on time and scenario. The arc weight for number of transfers equals one if the arc is a boarding arc and zero for otherwise. Therefore,
In summary, Table 3 shows the arc time weights in the example graph model in Figure 2 after applying (7), (8), (9), and (10) with boarding penalty and alighting penalty . Each arc with symbol “—” at a given time and in a given scenario means the traveler’s action associated with that arc is restricted at that time and scenario. For example, considering arc , in scenario , from times to the arc represents the traveler’s action of boarding route at stop with different boarding penalties; that is, from times 0 to 6 the traveler boards trip 1 with penalties from 7 to 1, and from times 7 to 9 the traveler boards trip 2 with penalties from 3 to 1. After time the traveler’s boarding action is restricted since no trip of route 2 will arrive at stop in scenario (see the timetable in Table 1). Note that only walking arcs, that is, and , are available at any time since travelers can walk freely. For shortest path problems, restricted actions can be set with very large integer weights.

4. Least Expected Time (LET) Path Problem
In Section 3, we proposed a graph model of the transit network that captures travelers’ actions, namely, boarding, traveling in-vehicle, alighting, and walking, together with the time constraints associated with travelers’ decisions. Below, we study the LET path problem in stochastic schedule-based transit networks using this graph model.
4.1. Problem Definition
The LET path problem in this paper is studied from one origin node for a fixed departure time to all destination nodes over a scenario set . The criteria used for evaluating a path include the number of transfers and the expected total travel time across the set of scenarios .
Let be the travel time on  path , , in scenario . Let us consider  path that is expanded from  path via arc , denoted by . The relationship between travel time on path and that of its subpath for departure time in scenario is given bywhere is the time weight on arc at time . Depending on the type of arc , arc weight is determined by one of (7), (8), (9), and (10). When Assumption (3) holds, the expected (mean) travel time of  path with departure time over scenario set , denoted by , is given bywhere is the occurrence probability of scenario and .
We also have the relationship between the number of transfers on path , that is, , and the number of transfers on its subpath , that is, , in the following:where weight for the number of transfers on arc is given by (11), and .
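The two recursions above, for the per-scenario travel time and for the number of transfers, together with the expected travel time, can be sketched in Python as follows. The label layout `(transfers, times)`, the function names, and the toy arc weights are assumptions made for illustration; the arc weight is passed in as a callable of the arrival time at the tail node and the scenario index.

```python
def extend_label(label, arc_time, is_boarding, t0):
    """Extend a path label by one arc.

    label:       (transfers, times) where times[s] is the travel time of
                 the path in scenario s
    arc_time:    callable (t, s) -> time weight of the arc when its tail
                 node is reached at time t in scenario s
    is_boarding: True if the arc is a boarding arc (transfer weight 1)
    t0:          departure time at the origin
    """
    transfers, times = label
    new_times = tuple(
        elapsed + arc_time(t0 + elapsed, s)  # weight depends on arrival time
        for s, elapsed in enumerate(times)
    )
    return (transfers + (1 if is_boarding else 0), new_times)


def expected_time(times, probs):
    """Expected travel time over the scenario set."""
    return sum(p * x for p, x in zip(probs, times))


# Hypothetical two-scenario example: board, then ride for 4 time units.
STOP_TIME = [7, 10]                      # stop time of the trip per scenario
start = (0, (0, 0))                      # empty path at the origin
boarded = extend_label(start, lambda t, s: STOP_TIME[s] - t, True, t0=2)
arrived = extend_label(boarded, lambda t, s: 4, False, t0=2)
print(arrived, expected_time(arrived[1], [0.5, 0.5]))
```

Because the arc weight is evaluated at the scenario-specific arrival time, the same path generally has a different travel time in each scenario, which is why a label carries one time per scenario rather than a single value.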
From the transit travelers’ perspective, it is more useful to minimize the number of transfers first and then the expected travel time across the scenario set. The LET path is given by Definition 2.
Definition 2 (LET  path). The LET  path with departure time over scenario set , , is given by
4.2. Dominance Condition
A LET  path problem with departure time over scenario set defined in (15) can be solved by enumerating all possible  paths and then minimizing the number of transfers and the expected travel time of each  path in for departure time over using (12) and (14). Such a brute force algorithm is inefficient. We therefore propose a dominance condition by which the optimal LET path is satisfied. First, we define a dominance condition in Definition 3. Then, the LET  path for departure time over scenario set is found in the set of nondominated  paths at time over by Proposition 4.
Note that the dominance condition in Definition 3 is not as strict as the one with at least one scenario such that . This is because, in the constructed graph described in Section 3.2, there might exist many nondominated paths, which present the same choice of routes and are only different from each other in transfer locations.
Definition 3 (nondominated  path). Given a departure time ,  path , , dominates another  path over the scenario set , if Then  path is nondominated in for departure time over scenario set if is not dominated by any  path at time over .
Proposition 4. Given a departure time and a set of scenarios , the LET  path at time over , , belongs to the set of nondominated  paths at time over .
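A minimal Python sketch of the dominance test and of Proposition 4 is given below. It assumes the weak form of dominance suggested by the remark preceding Definition 3: a path dominates another if it has no more transfers and is no slower in every scenario, with no strict inequality required. The label layout `(transfers, times)`, the function names, and the sample labels are hypothetical.

```python
from fractions import Fraction


def dominates(a, b):
    """Weak dominance (assumed form): no more transfers and no slower
    than b in every scenario."""
    return a[0] <= b[0] and all(x <= y for x, y in zip(a[1], b[1]))


def nondominated(labels):
    """Filter a list of labels down to a nondominated set."""
    kept = []
    for lab in labels:
        if any(dominates(k, lab) for k in kept):
            continue  # lab is dominated (or duplicates a kept label)
        kept = [k for k in kept if not dominates(lab, k)]
        kept.append(lab)
    return kept


def let_label(labels, probs):
    """Fewest transfers first, then least expected travel time."""
    return min(labels, key=lambda lab: (
        lab[0], sum(p * x for p, x in zip(probs, lab[1]))))


# Hypothetical labels over three equally likely scenarios.
PROBS = [Fraction(1, 3)] * 3
LABELS = [(1, (10, 10, 16)), (1, (11, 11, 12)),
          (2, (9, 9, 9)), (1, (12, 13, 16))]
FRONT = nondominated(LABELS)
print(FRONT, let_label(FRONT, PROBS))
```

In this example the last label is dominated by the first and is discarded, and the LET label chosen from the nondominated set coincides with the one chosen from all labels, as Proposition 4 states.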
By Definition 3, the problem of determining nondominated  paths can be treated as multicriteria shortest path problem with independent criteria, namely, number of transfers, as well as travel times corresponding to scenarios. Theorem 7 below implies that Bellman’s principle of optimality is valid when nondominated paths are defined with respect to their nondominated subpaths. We later develop a forward labelcorrecting algorithm to solve the LET path problem based on Theorem 7. Note that Theorem 7 is established on the grounds of Lemmas 5 and 6, being only valid when Assumptions (1) and (2) hold.
Lemma 5. For any given arc , , and , , .
Lemma 6. For any given arc , if ,
Theorem 7. Given departure time and a set of scenarios , every nondominated  path is made up from nondominated  subpaths, where is any intermediate node on path .
4.3. Relationship between Nondominated, LET, and the Fastest Paths
Definition 8 (the fastest  path). Given departure time and scenario , the fastest  path at time in scenario , , is given by
Proposition 9. Given the fastest  path at time in scenario , , and a set of nondominated  paths at time over the scenario set , if , belongs to the set of nondominated  paths at time over .
Proposition 10. Given departure time and two sets of scenarios , if , the set of nondominated  paths at time over is a subset of the set of nondominated  paths at time over , .
By Propositions 4, 9, and 10, we can establish the relationship between nondominated, LET, and the fastest  paths for departure time over universal scenario set and its subset as shown in Figure 3. By determining the set of nondominated  paths over the universal scenario set at departure time , we can obtain LET  path at time over any subset of . The relationship is beneficial when prejourney online information is used to determine these subsets.
4.4. Label-Correcting Algorithm
The algorithm for determining the LET paths follows the link-based approach, using the optimality condition stated in Theorem 7, which is valid only when Assumptions (1) and (2) hold. Since the proposed approach avoids enumerating all possible origin-destination paths, it is feasible in real-time applications. Note that Theorem 7 remains valid, with simple modifications to the dominance condition, when other criteria such as fare and walking distance are taken into account, as long as the arc weights for these criteria are positive and time-independent. Consequently, we can incorporate travelers’ weightings on different criteria, such as number of transfers, total travel time, fare, and walking distance, in the routing process.
Nevertheless, one drawback of our approach is that it cannot take into account travelers’ weightings on different time components, such as boarding time and in-vehicle travel time, since the arc weights for these time components are time-dependent. Several works in the literature address this issue using the path enumeration method [27, 28] or the branch and bound method [8]. However, these solution approaches are infeasible for real-time applications, especially in the stochastic transit networks considered here, where stop times of transit vehicles are stochastically correlated. Our proposed algorithm is developed as follows.
Given departure time , for each node and each  path , the algorithm maintains a vector label Let be the set of nondominated labels corresponding to the set of nondominated  paths at time over the set of scenarios . According to Theorem 7, each label contains the information of nondominated  path that has potential to be a nondominated origindestination path at time over when the algorithm terminates, where label is nondominated in at time over if is a nondominated  path at time over (see Definition 3).
At each iteration of the algorithm, label is selected from queue that contains a nondominated candidate path . Path is expanded via arc . Depending on the type of arc , a temporary label for path is constructed with weights calculated by (12) and (14). To determine if a new label is nondominated, it is compared with the nondominated labels at node . Details for the algorithm are presented in Algorithm 1.

Algorithm 1 is equivalent to a multicriteria shortest path algorithm for independent criteria and terminates after a finite number of steps with a set of nondominated paths at each node [38]. The algorithm is computationally intractable in the worst case, since the number of nondominated paths it examines can grow exponentially [39]. However, the experiments in Section 5 show that the number of examined nondominated paths in a typical transit network is much smaller than in the worst case.
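The general shape of such a label-correcting procedure can be sketched in Python as follows. This is not a transcription of Algorithm 1; it is a simplified illustration under stated assumptions: weak dominance as sketched for Definition 3, a first-in first-out queue, a large constant `BIG` for restricted actions, and in-vehicle times that are fixed per scenario rather than looked up per trip. The toy network and all names are hypothetical.

```python
from collections import deque

BIG = 10**6  # stand-in weight for a restricted action


def dominates(a, b):
    """Weak dominance on (transfers, per-scenario times) labels."""
    return a[0] <= b[0] and all(x <= y for x, y in zip(a[1], b[1]))


def label_correcting(arcs, origin, t0, n_scen):
    """arcs: dict tail -> list of (head, is_boarding, weight(t, s))."""
    labels = {origin: [(0, (0,) * n_scen)]}
    queue = deque([(origin, labels[origin][0])])
    while queue:
        node, lab = queue.popleft()
        if lab not in labels[node]:
            continue  # label was dominated after it had been queued
        for head, is_boarding, weight in arcs.get(node, []):
            times = tuple(x + weight(t0 + x, s)
                          for s, x in enumerate(lab[1]))
            if min(times) >= BIG:
                continue  # action restricted in every scenario
            new = (lab[0] + int(is_boarding), times)
            current = labels.setdefault(head, [])
            if any(dominates(old, new) for old in current):
                continue
            current[:] = [old for old in current if not dominates(new, old)]
            current.append(new)
            queue.append((head, new))
    return labels


def board(stop_times):
    """Boarding weight: wait for the first catchable trip, per scenario."""
    def weight(t, s):
        waits = [tau - t for tau in stop_times[s] if tau - t >= 1]
        return min(waits) if waits else BIG
    return weight


def fixed(per_scenario):
    """Time-independent weight that only depends on the scenario."""
    return lambda t, s: per_scenario[s]


# Hypothetical network: one bus route from O to D plus a walking link.
ARCS = {
    "O": [("O@R1", True, board([[3, 13], [5, 15]])),
          ("D", False, fixed([14, 14]))],
    "O@R1": [("D@R1", False, fixed([6, 9]))],
    "D@R1": [("D", False, fixed([1, 1]))],
}
RESULT = label_correcting(ARCS, "O", 0, 2)
print(sorted(RESULT["D"]))
```

In this toy instance two labels survive at the destination: walking (no transfer, 14 time units in both scenarios) and the bus (one boarding, 10 or 15 time units depending on the scenario). Neither dominates the other, so both are kept and the LET path is then selected among them.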
4.5. Illustrative Example
Consider the transit network in Figure 1 and its schedules as shown in Table 1. The timedependent graph and its arc times are shown in Figure 2 and Table 3, respectively, with boarding penalty and alighting penalty . Figure 4 shows nondominated vector labels at all nodes in the timedependent graph for origin node , destination node , and departure time over the universal scenario set and two nondominated  paths: after the termination of the LETPath Algorithm 1. Each sequence of dashedline arrows in Figure 4 gives travel times along a nondominated path in the corresponding scenario. Note that each path corresponds to a choice of routes and each sequence of dashedline arrows corresponds to a choice of trips as shown in Table 2.
Table 4 gives the summary of the obtained LET  paths with respect to subsets of . As the relationship shown in Figure 3, the set of nondominated  paths over is the superset of all sets of nondominated  paths over and also contains the fastest  paths when each of scenarios , , and occurs.

5. Experiments
In this section we conduct large numerical experiments aiming to investigate (1) the average running time of the proposed LETPath algorithm; (2) the set of nondominated  paths; and (3) the robustness of LET paths in the presence of unknown scenarios.
5.1. Experiment Setups
The experiments are conducted on Ho Chi Minh City (HCMC) bus network (Figure 5). The network consists of 1,340 stops, 40 routes, and 1,445 physical links, that is, direct links connecting pairs of consecutive stops on routes. Walking shortcuts are available between stops in a radius of less than 500 meters, and the average walking speed is approximately 2 km/h. Intervals between consecutive trips are 15 minutes and scheduled stop times are generated from 7:00 am to 4:00 pm. The graph model has 5,943 nodes and 13,227 arcs with boarding penalty minute and alighting penalty minutes.
The data set for the experiments consists of 500 random user requests, where origin-destination pairs are generated randomly with the constraint that the distance between origin and destination is at least 5 km, and departure times are generated between 7:30 am and 1:00 pm to make sure that path times are not later than the ending time at 4:00 pm. The experimental environment is 2.6 GHz dual-core Intel Xeon ES405 2.00 Hz, 3 GB RAM, on CentOS under Java Runtime Environment 1.6 (JRE 1.6) and MySQL 5.2 database.
To study the robustness of the proposed scenario-based approach, we compare the path found by our approach with that of the certainty equivalence (CE) approximation [5], in which stop times of transit vehicles are deterministic.
The CE approximation replaces every stop time random variable by its expected value over the margin distribution. In particular, the expected stop time for the th trip at the th stop on route over the scenario set is calculated bywhere is the independent random stop time for the th trip at the th stop on route over . Thus, the stochastic network with scenarios is transformed into a deterministic network with only one scenario.
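The CE transformation can be sketched in Python as a probability-weighted average of each stop time over the scenarios. The nested-list layout `stop_times[s][k][i]` (scenario, trip, stop position) and the function name are assumptions made for this illustration, and the sample timetable is hypothetical.

```python
from fractions import Fraction


def certainty_equivalent(stop_times, probs):
    """Collapse a scenario timetable into one deterministic timetable.

    stop_times[s][k][i]: stop time of trip k at stop position i of one
                         route in scenario s
    probs[s]:            occurrence probability of scenario s
    """
    n_trips = len(stop_times[0])
    n_stops = len(stop_times[0][0])
    return [
        [sum(p * stop_times[s][k][i] for s, p in enumerate(probs))
         for i in range(n_stops)]
        for k in range(n_trips)
    ]


# Hypothetical route with one trip, two stops, two equally likely scenarios.
TIMES = [[[7, 10]], [[9, 14]]]
CE = certainty_equivalent(TIMES, [Fraction(1, 2), Fraction(1, 2)])
print(CE)
```

Averaging each stop time separately discards the joint structure of the scenarios, which is the information the scenario-based approach retains.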
So far, we assume that exact information on the probability distribution of stop times is not available, and therefore it is impossible to build a sufficient number of scenarios that can precisely describe the uncertainty of schedules. Suppose that is the unknown scenario that will actually happen and is the set of known scenarios. For a given  pair and departure time , let and be the LET  path at time over scenarios and the fastest  path at time in scenario , where paths and are given by (19) and (15), respectively. Then, when scenario actually happens, the desired optimal path will be . However, since only scenario set is known, the proposed approach is robust if the expected travel time of path does not deviate much from that of path . Hence, the robustness of proposed approach is evaluated using the deviation of travel times of paths and in all unknown scenarios for all triples . The evaluated criteria (or considered performance metrics) are as follows:(1)Precision, the ratio between the number of cases in which LET  path is also the fastest  path and the number of total cases.(2)Mean absolute percentage error (MAPE), the average deviation percentage between the actual and expected travel time of the LET  path .(3)The fastest path mean absolute percentage error (FMAPE), the average deviation percentage between the actual travel time of the fastest  path and the actual travel time of LET  path .
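The three criteria described above can be computed as in the following Python sketch. The exact normalizations are assumptions: MAPE is taken relative to the actual travel time of the LET path, and FMAPE relative to the actual travel time of the fastest path. The record fields and sample values are hypothetical.

```python
def precision(cases):
    """Share of cases in which the LET path is also the fastest path."""
    hits = sum(1 for c in cases if c["let_path"] == c["fastest_path"])
    return hits / len(cases)


def mape(cases):
    """Mean absolute percentage error between the actual and expected
    travel time of the LET path (assumed relative to the actual time)."""
    total = sum(abs(c["let_actual"] - c["let_expected"]) / c["let_actual"]
                for c in cases)
    return 100 * total / len(cases)


def fmape(cases):
    """Mean absolute percentage deviation of the LET path's actual time
    from the fastest path's actual time (assumed relative to the latter)."""
    total = sum(abs(c["let_actual"] - c["fastest_actual"])
                / c["fastest_actual"] for c in cases)
    return 100 * total / len(cases)


# Hypothetical evaluation records, one per (origin, destination, time) case.
CASES = [
    {"let_path": "A", "fastest_path": "A",
     "let_expected": 50, "let_actual": 40, "fastest_actual": 40},
    {"let_path": "B", "fastest_path": "C",
     "let_expected": 45, "let_actual": 50, "fastest_actual": 40},
]
print(precision(CASES), mape(CASES), fmape(CASES))
```

A high Precision together with low MAPE and FMAPE would indicate that the LET path computed from the known scenarios remains close to the best path in the scenario that actually occurs.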
The metrics Precision, MAPE, and FMAPE are given by