Abstract

An urban rail transit (URT) system is operated according to a relatively punctual schedule, which is one of the most important constraints on a URT passenger’s travel. Estimating passengers’ train choices is therefore key, because passenger route choices and the flow distribution on the URT network can be deduced from them. In this paper we propose a methodology that estimates an individual passenger’s train choices from real timetable and automatic fare collection (AFC) data. First, we formulate the problem using Manski’s paradigm for modelling choice. Then, an integrated framework for estimating an individual passenger’s train choices is developed through a data-driven approach. The approach links each passenger trip to the most feasible train itinerary. An initial case study on the Shanghai metro shows that the proposed approach works well and can be further used to deduce other important operational indicators such as route choices, section passenger flows, and train load factors.

1. Introduction

Passenger flow is the foundation for making and coordinating operation plans for an urban rail transit (URT) system, and assigning passenger flows on the URT network plays a paramount role in analyzing (calculating, predicting, and simulating) those flows. A number of transit assignment models have been developed using both theory and practical experience, and thorough reviews are available in the literature [1–3]. However, unlike urban road traffic systems, a URT system is operated according to a relatively punctual schedule, which is an important constraint on a URT passenger’s travel. Thus, the passenger flow distribution on the network is subject not only to passengers’ physical route choices but also to their individual train choices, especially in peak hours (Figure 1), which may be the more important issue [4]. For analyzing passenger flows on a schedule-based URT network, estimating passengers’ train choices is key for three reasons:

(1) On a schedule-based URT network, passenger route choices and the flow distribution on the network can be deduced once the train choices of passengers are obtained, but the converse does not hold.

(2) It gives more precise estimates in both the spatial and temporal dimensions, since URT passengers may fail to board a train under certain conditions, especially in peak hours, because of overcrowding.

(3) If each passenger’s train choice can be identified over a long period of time, this information is further useful for improving the customer relationship management of a URT company and for improving train timetables. For example, URT companies can check how passengers select trains after timetable improvements.

As mentioned, a number of transit assignment models have been developed for analyzing passenger flows on the network. To obtain passenger route choice preference data for those models, the conventional approach is to conduct field surveys in rail stations, asking passengers about the exact route they took to reach their destinations. However, the shortcomings of such manual methods have been identified by more and more researchers. For example, the resulting data may be subject to bias and error and are expensive and time consuming both to collect and to process [5]. In addition, manual methods usually focus only on a particular location and time [6, 7]. As a result, alternative concepts and methods need to be developed.

In recent years, automatic fare collection (AFC) data such as smart card data have been used by transit service providers to analyze passenger demand and system performance. These data have been used for O-D matrix estimation [8, 9], demand analysis [10, 11], travel behavior analysis [12], and operational management and public transit planning [13–15], among others. In particular, there are emerging studies dealing with AFC data of URT systems. Notable publications include works by Chan in 2007 [16], Kusakabe et al. in 2010 [17], Xu et al. in 2011 [7], Sun et al. in 2012 and 2016 [18, 19], Zhou and Xu in 2012 [20], Fu et al. in 2014 [21], Zhu et al. in 2014 [22], and Sun et al. in 2015 [23]. However, in spite of the widespread attention to the use of AFC data, few studies deal with passenger train choice behavior in a URT system. Kusakabe et al. [17] developed a methodology for estimating which train would be boarded by each smart card holder using long-term transaction data. Their approach was based on the assumption that smart card records that could not be uniquely identified with one of the possible train choices would be assigned among them with equal probability. Zhou and Xu [20] developed a passenger flow assignment model based on entry and exit time constraints from AFC data. The model includes an algorithm for generating a path’s boarding plan, which is similar to passenger train choice. However, the matching degree employed in this algorithm is more intuitive than rigorously defined. Sun and Schonfeld [19] proposed a schedule-based passenger path-choice estimation model using AFC data. The model uses the train schedule connection network (TSCN), which considers passengers’ boarding and alighting behaviors. However, the weighted assignment used by the model may not be appropriate for an actual travel choice process, in which a passenger uses only one route at a time rather than multiple routes. The problem becomes more pronounced for O-D pairs with fewer passenger trips.

To better understand passenger flows on the network, the objective of this paper is to propose a methodology that estimates passenger train choices from real timetable and AFC data. The contributions of this paper are as follows:

(1) We formulate the problem using Manski’s [24] paradigm for modelling choice, which consists of generating the consideration choice set and calculating the corresponding choice probability.

(2) An integrated framework for estimating passenger train choices is developed. The approach links each AFC transaction (a passenger trip) to the most feasible train itinerary (a boarding plan).

(3) Real timetable and AFC data are used as the inputs to the proposed methodology, instead of relying on manual methods.

The remainder of this paper is organized as follows. In Section 2, the estimation problem of passenger train choices is described and formulated. Section 3 presents the integrated estimation framework. In particular, methods of deducing passenger boarding plan, choice probability, and travel behavior parameters are developed with real timetable and AFC data. Section 4 demonstrates a case study of the proposed approach. Finally, Section 5 concludes the paper.

2. Formulating the Problem

The topic discussed in this paper falls within the scope of choice modelling. It is well known from a variety of studies [24, 25] that the size and composition of choice sets matter in choice model estimation. Incorrect choice sets can lead to misspecification of choice models [26, 27]. Furthermore, for a variety of reasons, the specification of choice sets for train choice modelling is different from, and more complex than, that for mode choice and route choice, which is why this topic deserves special attention.

To clearly formulate the estimation problem, Manski’s [28] paradigm on predicting choice is used. The essential conceptual contribution of this paradigm lies in its explicit treatment of the processes that make perfect prediction of choice behavior unattainable. To date, most of the existing literature on random utility models still imposes distributional assumptions directly. Because this practice leaves so much information implicit, it has often caused researchers to remain unaware of the restrictiveness of their models.

Manski’s paradigm states that the probability of passenger $n$ choosing alternative $i$ from the choice set $C$, which is also called his/her consideration set, is given by the following expression:

$$P_n(i \mid U_n) = \sum_{C \subseteq U_n} P_n(i \mid C)\, P_n(C \mid U_n),$$

where $P_n(i \mid U_n)$ is the probability that passenger $n$ will choose alternative $i$ from the universal set $U_n$ of all alternatives available to $n$, $P_n(i \mid C)$ is the conditional probability that passenger $n$ will choose alternative $i$ given that $C$ is his/her consideration set, where $C$ is a subset of $U_n$, and $P_n(C \mid U_n)$ is the probability that $C$ is the consideration set of passenger $n$ given his/her universal set $U_n$.
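To make the two-stage structure concrete, the following is a minimal Python sketch of the decomposition: the choice probability of an alternative is the sum, over candidate consideration sets, of the conditional choice probability times the probability of that consideration set. The function name, the data layout, and all numbers are illustrative assumptions, not values from the paper.

```python
def choice_probability(alternative, set_probs, cond_probs):
    """Sum over consideration sets C of P(alternative | C) * P(C).

    set_probs:  dict mapping a frozenset of alternatives to P(C)
    cond_probs: dict mapping (alternative, frozenset) to P(alternative | C)
    Both layouts are hypothetical and chosen only for this sketch.
    """
    total = 0.0
    for cset, p_set in set_probs.items():
        # An alternative outside C has conditional probability 0.
        total += cond_probs.get((alternative, cset), 0.0) * p_set
    return total


# Hypothetical example: two candidate consideration sets.
C1 = frozenset({"plan_a", "plan_b"})
C2 = frozenset({"plan_a"})
set_probs = {C1: 0.6, C2: 0.4}
cond_probs = {("plan_a", C1): 0.7, ("plan_b", C1): 0.3, ("plan_a", C2): 1.0}

p_a = choice_probability("plan_a", set_probs, cond_probs)
p_b = choice_probability("plan_b", set_probs, cond_probs)
```

In this toy case the two probabilities sum to one because every consideration set's conditional probabilities sum to one and the set probabilities do too.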

Thus, estimating an individual passenger’s train choices with schedule and AFC data consists of two tasks: one is generating the consideration set of his/her train choices, and the other is calculating the probability that he/she will choose each alternative from that consideration set.

This solution is depicted in Figure 2. The horizontal axis indicates the travel cost of each alternative, and the vertical axis indicates the probability that the passenger chooses that alternative. As we use travel time as the cost measure in this study, “cost” and “time” are used interchangeably throughout the paper. The red vertical line indicates the passenger’s observed travel time, extracted from his/her AFC transaction record. Each alternative in his/her consideration set can be plotted as a dot in the figure. How, then, can we estimate which train itinerary the passenger actually chose? It seems natural that an alternative that lies close to the red vertical line and has a higher probability is the most likely to have been used by the passenger.

3. Methodology

3.1. Overview of Estimation Procedure

For an individual passenger, his/her train choice during the trip can be described as a boarding plan, which is the order of trains that he/she can take to complete the trip. The overall framework of our estimation procedure for this kind of boarding plan is shown in Figure 3. At the beginning of the algorithm, denoted by “a,” the AFC data are extracted from the original transaction data and sorted by the fields of origin station, entry time, destination station, and exit time, which will be used later. After these data are sorted, several travel behavior parameters of passengers are extracted from abundant timetable and AFC data, which is denoted by “b.” Then, one record of the AFC transaction data (which is also a passenger trip) is extracted for estimation. To generate the consideration set, the boarding plan generation algorithm is applied at “c.” At “d,” the choice probability of each boarding plan is calculated. At “e,” the passenger’s train choice (which equals a boarding plan) is determined based on the probability of each alternative in the consideration set. These processes are repeated until all of the records have been estimated.
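The control flow of this procedure can be sketched as a short Python driver. This is only an outline under stated assumptions: the record fields (`id`, `origin`, `entry`, `destination`, `exit`), the two callback functions, and the rule of picking the highest-probability plan are placeholders invented for illustration, standing in for steps “c,” “d,” and “e.”

```python
from typing import Callable, Dict, List, Optional


def estimate_boarding_plans(
    records: List[dict],
    consideration_set: Callable[[dict], List[tuple]],
    plan_probability: Callable[[dict, tuple], float],
) -> Dict[object, Optional[tuple]]:
    """Assign one boarding plan to each AFC record (hypothetical layout)."""
    # Step "a": sort by origin station, entry time, destination, exit time.
    ordered = sorted(
        records,
        key=lambda r: (r["origin"], r["entry"], r["destination"], r["exit"]),
    )
    chosen: Dict[object, Optional[tuple]] = {}
    for rec in ordered:
        plans = consideration_set(rec)           # step "c"
        if not plans:
            chosen[rec["id"]] = None             # no feasible plan found
            continue
        # Steps "d" and "e": score each plan and keep the most probable one.
        chosen[rec["id"]] = max(plans, key=lambda p: plan_probability(rec, p))
    return chosen


# Toy inputs, invented for illustration only.
demo_records = [
    {"id": 1, "origin": "A", "entry": 100, "destination": "B", "exit": 900},
    {"id": 2, "origin": "A", "entry": 50, "destination": "B", "exit": 700},
]
demo_sets = {1: [("T1",), ("T2",)], 2: []}
demo_probs = {("T1",): 0.3, ("T2",): 0.7}
result = estimate_boarding_plans(
    demo_records,
    lambda r: demo_sets[r["id"]],
    lambda r, p: demo_probs[p],
)
```

Passing the set generator and the probability function as arguments keeps the loop independent of how those two steps are actually implemented.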

3.2. Generating Boarding Plan Set
3.2.1. Universal Set Generation

This part consists of two steps, as shown in Figure 4. Because the URT system is operated as a network, there may be several alternative routes for a given O-D pair, and in practice passengers choose not only the shortest route but also the second, third, …, kth shortest route because of their imperfect knowledge of the network, individual differences, congestion, and so forth. First, an improved Deletion Algorithm (DA) [29] based on Depth-First Traversal (DFT) is introduced to find the k shortest routes, and the initial route choice set of the O-D pair is obtained. Second, for each route in the initial route choice set, all the boarding choices of a given passenger at each boarding station (origin, destination, or transfer station) on the route are deduced from the corresponding schedule data. The universal set of the passenger’s boarding plans is then obtained.

The improved DA based on DFT is described as follows. Unlike other k-shortest path algorithms, it will not miss any possible route, including ring routes.

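Step 1. Determine the shortest tree of the directed graph $(N, A)$ rooted at origin $s$ based on the Dijkstra Algorithm. Let $p_k$ be the shortest path from origin $s$ to destination $t$ in $(N, A)$. Note that $k = 1$.

Step 2. If $k$ does not exceed $K$, which is the maximum number of shortest paths required, and there is still an alternative path in $(N, A)$, let $p = p_k$ and proceed to Step 3; otherwise, the algorithm stops.

Step 3. Let $A^{-}(v)$ denote the set of incoming arcs to node $v$. Let $v_h$ denote the first node of current path $p$ for which $|A^{-}(v_h)| > 1$. If the primed node $v_h'$ of node $v_h$ is not in $N$, proceed to Step 4; otherwise, let $v_i$ denote the first node of current path $p$ after $v_h$ whose primed node $v_i'$ is not in $N$ and proceed to Step 5.

Step 4. Add $v_h'$ to $N$, and add the arcs $(x, v_h')$ to $A$ for every arc $(x, v_h) \in A^{-}(v_h)$ except the arc $(v_{h-1}, v_h)$ of $p$. Let $d(v)$ denote the value of the shortest distance from $s$ to $v$, and let $c(x, v)$ denote the cost of arc $(x, v)$. Compute $d(v_h') = \min\{d(x) + c(x, v_h) : (x, v_h') \in A\}$ and find the shortest path from $s$ to $v_h'$. Let $v_i = v_{h+1}$.

Step 5. Let $v_j$ denote any node of $p$ from $v_i$ onward. Then execute as follows.
Step 5.1. Add the primed node $v_j'$ of node $v_j$ to $N$.
Step 5.2. Add the arcs $(v_{j-1}', v_j')$ and $(x, v_j')$, for every $(x, v_j) \in A^{-}(v_j)$ except $(v_{j-1}, v_j)$, to $A$.
Step 5.3. Compute $d(v_j')$ and find the shortest path from $s$ to $v_j'$.

Step 6. Let $p_{k+1}$ be the shortest path from $s$ to the primed node $t'$ of node $t$ in $(N, A)$, so that $p_{k+1}$ is the best alternative path of $p$. Set $k = k + 1$ and proceed to Step 2.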
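For readers who want to experiment with route choice sets, the sketch below enumerates the k shortest routes on a small directed graph. It is not the improved Deletion Algorithm described above: it swaps in a simpler best-first enumeration of loopless paths with a priority queue, so it excludes ring routes and can be slow on large networks. The graph, node names, and travel times are invented for illustration.

```python
import heapq


def k_shortest_routes(graph, origin, destination, k):
    """Return up to k loopless routes in order of nondecreasing cost.

    graph: dict mapping node -> dict of neighbour -> non-negative travel time.
    Best-first search over partial paths; NOT the paper's Deletion Algorithm.
    """
    heap = [(0.0, [origin])]          # (cost so far, path so far)
    routes = []
    while heap and len(routes) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == destination:
            routes.append((cost, path))
            continue
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in path:       # forbid revisiting a node (no rings)
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return routes


# Hypothetical network: travel times in minutes.
demo_graph = {
    "A": {"B": 2, "C": 5},
    "B": {"C": 1, "D": 7},
    "C": {"D": 2},
    "D": {},
}
routes = k_shortest_routes(demo_graph, "A", "D", k=3)
```

Because all arc costs are non-negative, a complete route is popped from the queue only after every cheaper partial path has been expanded, which is what guarantees the cost ordering.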

Moreover, because of congestion, a passenger may fail to board a train and have to wait for the next one. The maximum “fail to board” (FtB) number is set to 3 based on investigations in China, which means that a passenger can board a train within four runs even if peak-hour congestion prevents him/her from boarding the first train.
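The effect of the FtB limit on the boarding choices at one station can be illustrated with a small Python sketch: given the sorted departure times at a platform and the time a passenger reaches it, the candidate trains are the first four departures from that moment on. The function name, the timetable values, and the convention that a train departing exactly at the arrival time can still be boarded are assumptions made for this example.

```python
from bisect import bisect_left

MAX_FTB = 3  # maximum "fail to board" count stated in the text


def candidate_trains(departures, platform_arrival, max_ftb=MAX_FTB):
    """Return the departures a passenger could board at this platform.

    departures: sorted list of departure times in seconds (hypothetical).
    A passenger who fails to board at most `max_ftb` times can take any of
    the first `max_ftb + 1` trains departing at or after platform_arrival.
    """
    first = bisect_left(departures, platform_arrival)
    return departures[first:first + max_ftb + 1]


# Hypothetical timetable with a 180-second headway.
demo_departures = [100, 280, 460, 640, 820, 1000]
options = candidate_trains(demo_departures, platform_arrival=300)
```

Repeating this at the origin and at each transfer station of a route, and combining the options, would give the boarding plans of that route.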

3.2.2. Consideration Set Generation

A boarding plan for a given passenger is the order of trains that the passenger can take to complete his/her travel. Obviously, it is difficult to determine which train the passenger actually boarded. However, a passenger is rarely delayed while walking out of the destination station, and consequently the train he or she alighted from can be determined accurately. Thus, we can calculate backward from the destination station to the origin station. For a given trip record (an AFC transaction) obtained from the URT system, a boarding plan is considered unreasonable and should be removed from the universal set if its boarding time at the origin station is impossible for the passenger given the constraint of his/her entry time (Figure 5).

Therefore, a filtering algorithm can be developed to further narrow the universal set and get the consideration set. The algorithm is described as follows.

Step 1. Obtain possible boarding plans (universal set). For an actual passenger trip, with the corresponding train diagram, the passenger’s exit time, and walking time at the destination station, possible boarding plans for each route can be easily deduced.

Step 2. Calculate the departure time of a possible boarding plan. Based on the passenger’s travel chain combined with train diagrams, the departure time of the possible boarding plan of each route can be calculated from the destination station to the origin station backward.

Step 3. Compare and remove. As shown in Figure 5, the calculated departure time of the possible boarding plan at the origin station is compared with the passenger’s arrival time on the platform. If the calculated departure time is not earlier than the passenger’s arrival time on the platform, the boarding plan is reasonable for the passenger to choose; otherwise, the boarding plan is unreasonable and is removed from the universal set.
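The compare-and-remove step can be sketched in a few lines of Python. The sketch assumes that each boarding plan already carries its backward-calculated departure time at the origin station, and that the earliest possible arrival on the platform is the entry time plus the minimum access walking time; the field names and all times are hypothetical.

```python
def filter_boarding_plans(universal_set, entry_time, min_access_walk):
    """Keep only plans whose origin departure the passenger could reach.

    universal_set:   list of dicts with an "origin_departure" time (seconds)
    entry_time:      gate entry time from the AFC record (seconds)
    min_access_walk: minimum access walking time at the origin (seconds)
    Assumption: platform arrival = entry_time + min_access_walk.
    """
    earliest_on_platform = entry_time + min_access_walk
    return [
        plan for plan in universal_set
        if plan["origin_departure"] >= earliest_on_platform
    ]


# Hypothetical plans; times are seconds after midnight.
demo_plans = [
    {"name": "plan_1", "origin_departure": 7 * 3600 + 60},
    {"name": "plan_2", "origin_departure": 7 * 3600 + 240},
    {"name": "plan_3", "origin_departure": 7 * 3600 + 420},
]
kept = filter_boarding_plans(
    demo_plans, entry_time=7 * 3600 + 100, min_access_walk=90
)
kept_names = [plan["name"] for plan in kept]
```

Here the passenger cannot be on the platform before 07:03:10, so the plan departing at 07:01:00 is dropped and the other two remain in the consideration set.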

3.3. Calculating Choice Probability of Boarding Plan
3.3.1. Point Probability Calculation

For a given boarding plan in the obtained consideration set, we call a boarding station (origin, destination, or transfer station) in the boarding plan a boarding point. The point probabilities of a boarding plan need to be calculated first.

It should be noted that passengers may fail to board a train under certain conditions, especially in peak hours, because of overcrowding, although they are usually inclined to board the first train. Therefore, without loss of generality, we use “point probability” to denote the probability that a passenger boards the train specified by a given boarding plan. For boarding point $j$ in plan $i$, the probability that a passenger leaves on that train is $p_{ij}$, which can be obtained directly from the StB (success to board) rates, as shown in Figure 6.

3.3.2. Plan Probability Calculation

The plan probability is a function of the point probabilities. Considering that the boarding point with the minimum probability is the bottleneck for the boarding plan to be chosen, we adopt the following function instead of the product of the probabilities at all boarding points:

$$P_i = \min_{j} p_{ij},$$

where $P_i$ is the probability of plan $i$ and $p_{ij}$ is the point probability at boarding point $j$ of plan $i$.

For example (as shown in Figure 6), suppose there are two boarding plans in the consideration set. For plan 1, the point probability is 0.66 for the train within the given boarding plan at the origin station and 0.27 at the transfer station. For plan 2, the point probability is 0.34 at the origin station and 0.73 at the transfer station. Taking the minimum point probability of each plan, the probabilities of the two plans can be calculated easily as follows:

$$P_1 = \min(0.66, 0.27) = 0.27, \qquad P_2 = \min(0.34, 0.73) = 0.34.$$
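The bottleneck rule is easy to check numerically. The Python sketch below applies it to the point probabilities quoted in the example; it takes the minimum literally and applies no normalisation across plans, which is an assumption of this sketch, and the plan labels are illustrative.

```python
def plan_probability(point_probs):
    """Bottleneck rule: a plan is as likely as its least likely boarding."""
    return min(point_probs)


# Point probabilities from the example: (origin station, transfer station).
demo_point_probs = {
    "plan_1": [0.66, 0.27],
    "plan_2": [0.34, 0.73],
}
plan_probs = {
    name: plan_probability(probs) for name, probs in demo_point_probs.items()
}
# The plan with the highest bottleneck probability is selected.
best_plan = max(plan_probs, key=plan_probs.get)
```

Under this rule plan 2 is preferred, because its weakest boarding point (0.34) is more likely than the weakest boarding point of plan 1 (0.27).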

3.4. Extracting Travel Behavior Parameters

As mentioned, we also need to extract several travel behavior parameters of passengers in advance from the abundant AFC data, for both boarding plan set generation and choice probability calculation. These parameters include, for each station on the network, the minimum and maximum access walking times, the minimum and maximum egress walking times, the minimum and maximum transfer walking times, and the “success to board” (StB) rate. The walking time parameters are used for generating the consideration sets, while the StB rate parameter is used for calculating the choice probabilities of boarding plans.

3.4.1. Access/Egress Walking Time Extraction

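First, we deduce the minimum and maximum egress walking times at every station on the network based on AFC data. It should be noted that passengers may be delayed at the origin station and transfer stations by passenger flow, the capacity utilization rate of the train, and other factors, but they are rarely delayed while walking out of the destination station. Thus, the egress parameters are easier to deduce. By matching the train’s arrival time derived from schedule data with the passenger’s exit time derived from AFC data, each passenger’s egress walking time can be obtained, and its distribution can be extracted too:

$$w_{\mathrm{e}} = t_{\mathrm{exit}} - t_{\mathrm{arr}}, \qquad w_{\mathrm{e}} \sim N(\mu, \sigma^2),$$

where $t_{\mathrm{exit}}$ is the passenger’s exit time at the destination station, $t_{\mathrm{arr}}$ is the arrival time of the train he/she alighted from, and $\mu$ and $\sigma$ are the mean and standard deviation of the egress walking time at the station.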

It is a kind of normal distribution and can be calibrated with the AFC data. Then, we set the minimum egress walking time to the 5th percentile of the calibrated distribution and the maximum egress walking time to the 95th percentile.
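As an illustration of this calibration step, the following Python sketch computes egress walking times as exit time minus train arrival time, fits a normal distribution by sample mean and standard deviation, and reads off the 5th and 95th percentiles. The sample values are invented, and fitting by simple moments is an assumption; the paper does not state its calibration procedure.

```python
from statistics import NormalDist, mean, stdev


def egress_walk_thresholds(exit_times, train_arrivals):
    """Return (min, max) egress walking time as 5th/95th percentiles.

    exit_times:     gate exit times from AFC records (seconds)
    train_arrivals: arrival times of the matched trains (seconds)
    Assumption: egress walking times are normal, fitted by sample moments.
    """
    walks = [e - a for e, a in zip(exit_times, train_arrivals)]
    fitted = NormalDist(mean(walks), stdev(walks))
    return fitted.inv_cdf(0.05), fitted.inv_cdf(0.95)


# Hypothetical matched records: five passengers off the same train.
demo_arrivals = [1000, 1000, 1000, 1000, 1000]
demo_exits = [1050, 1060, 1070, 1080, 1090]
walk_min, walk_max = egress_walk_thresholds(demo_exits, demo_arrivals)
```

With these toy values the sample mean is 70 seconds, and the two thresholds lie symmetrically about 1.645 standard deviations below and above it.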

Second, we derive the minimum and maximum access walking times at every station on the network. Passengers may be delayed while walking to the platform, which makes the distribution of access walking times different from that of egress walking times, and passengers’ exact arrival times on the platform cannot be obtained directly either. However, we can still suppose that the minimum and maximum access walking times equal the minimum and maximum egress walking times, respectively, since a passenger’s access and egress processes are to some extent symmetrical, and we only want to obtain the thresholds rather than the exact distribution.

3.4.2. Transfer Walking Time Extraction

In order to extract the minimum and maximum transfer walking times, the following assumptions are adopted in advance:

(1) The walking speed of the same passenger should be on the same level throughout his or her trip chain. In other words, for a given passenger, the walking speeds at stations (origin station, destination station, or transfer station) should not differ greatly from each other.

(2) The delay caused by crowding, high capacity utilization of the train, and similar factors for an individual passenger happens at the origin station as well as at transfer stations with equal probability.

(3) Last but not least, we only try to extract the thresholds rather than the exact distribution.

Then, the minimum and maximum transfer walking times at a transfer station can be calculated as follows.

Step 1. Aggregate the AFC data whose O-D flows use the given transfer station as their unique transfer point.

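Step 2. Calculate the egress walking speeds from the egress walking times and distances ($d_{\mathrm{e}}$), and set the transfer walking speeds equal to the egress walking speeds; that is,

$$v_{\mathrm{t}}^{\max} = v_{\mathrm{e}}^{\max} = \frac{d_{\mathrm{e}}}{w_{\mathrm{e}}^{\min}}, \qquad v_{\mathrm{t}}^{\min} = v_{\mathrm{e}}^{\min} = \frac{d_{\mathrm{e}}}{w_{\mathrm{e}}^{\max}},$$

where $w_{\mathrm{e}}^{\min}$ and $w_{\mathrm{e}}^{\max}$ are the minimum and maximum egress walking times.

Step 3. Calculate the transfer walking times at the transfer station from the calculated transfer walking speeds and distances ($d_{\mathrm{t}}$); that is,

$$w_{\mathrm{t}}^{\min} = \frac{d_{\mathrm{t}}}{v_{\mathrm{t}}^{\max}}, \qquad w_{\mathrm{t}}^{\max} = \frac{d_{\mathrm{t}}}{v_{\mathrm{t}}^{\min}}.$$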
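Steps 2 and 3 amount to a simple speed transfer, which the Python sketch below works through: egress walking speeds are derived from the egress distance and time thresholds, reused as transfer walking speeds, and converted back into transfer time thresholds. The function name, distances, and times are hypothetical values chosen for illustration.

```python
def transfer_walk_thresholds(egress_dist, egress_min, egress_max, transfer_dist):
    """Return (min, max) transfer walking time in seconds.

    egress_dist:   egress walking distance in metres (hypothetical)
    egress_min:    minimum egress walking time in seconds
    egress_max:    maximum egress walking time in seconds
    transfer_dist: transfer walking distance in metres (hypothetical)
    Assumption (1) in the text: a passenger walks at the same speed level
    when transferring as when leaving the station.
    """
    fastest_speed = egress_dist / egress_min   # metres per second
    slowest_speed = egress_dist / egress_max
    return transfer_dist / fastest_speed, transfer_dist / slowest_speed


# Hypothetical station: 120 m egress walk taking 60 to 150 s, 200 m transfer.
transfer_min, transfer_max = transfer_walk_thresholds(120, 60, 150, 200)
```

The fastest egress speed (2.0 m/s here) gives the minimum transfer time, and the slowest (0.8 m/s) gives the maximum.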

3.4.3. StB Rate Parameter Extraction

Finally, we deduce the StB (success to board) rate parameter. Assuming that StB is a direct outcome of overcrowding, which is mostly true in peak periods, we can conclude that the StB parameter is the same for all passengers who depart from the same station in the same direction and period. In that case, we can use O-D flows without any transfers (and hence with no alternative route) to estimate the StB parameter and then apply the estimated parameters to O-D flows with transfers.

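The parameter of StB can be defined as a vector as follows:

$$\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \lambda_3, \lambda_4),$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are the probabilities that passengers succeed in boarding the first, second, third, and fourth train, respectively, and all items in the vector sum to 1.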

As an example from the Shanghai metro network, suppose we want to calculate the StB of the down direction during 8:00 AM–9:00 AM at Yanchang Rd. Station of Line number 5. We can use the data of those O-D flows without any transfers, including Yanchang Rd. → Zhongshan Bei Rd., Yanchang Rd. → Shanghai Railway Station, Yanchang Rd. → Hanzhong Rd., Yanchang Rd. → Xinzha Rd., and Yanchang Rd. → People’s Square. Table 1 shows the distribution of passengers boarding different trains during 8:00 AM–9:00 AM at Yanchang Rd. Based on Table 1, the StB of the down direction during 8:00 AM–9:00 AM at Yanchang Rd. can be deduced (Table 2).
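One way such a rate vector could be computed from no-transfer trips is sketched below in Python. For each passenger, the train boarded is known from the backward calculation, so its position among the trains departing after the passenger reached the platform gives a boarding order of 1 to 4; the shares of each order form the StB vector. The timetable, passenger data, and function names are hypothetical and do not reproduce Tables 1 and 2.

```python
from bisect import bisect_left
from collections import Counter


def boarding_order(departures, platform_arrival, boarded_departure):
    """Return 1 if the first available train was boarded, 2 for the second, ..."""
    first = bisect_left(departures, platform_arrival)
    return departures.index(boarded_departure) - first + 1


def stb_rates(orders, max_trains=4):
    """Share of passengers boarding the 1st..max_trains-th available train."""
    counts = Counter(o for o in orders if 1 <= o <= max_trains)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid boarding observations")
    return [counts.get(k, 0) / total for k in range(1, max_trains + 1)]


# Hypothetical timetable (seconds) and (platform arrival, boarded departure).
demo_departures = [0, 180, 360, 540, 720]
demo_passengers = [(100, 180), (120, 180), (200, 360), (100, 360)]
orders = [boarding_order(demo_departures, a, d) for a, d in demo_passengers]
rates = stb_rates(orders)
```

In this toy sample three of four passengers board the first available train and one boards the second, so the vector is 0.75, 0.25, 0, 0.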

4. Case Study

4.1. Test O-D Pair

To test the approach, a case study is conducted on a specific O-D pair (from Xingzhi Rd. to Xinzhuang) on the Shanghai metro network. As shown in Figure 7, there are three routes connecting the origin station (Xingzhi Rd.) and the destination station (Xinzhuang), which can be obtained by the improved DA based on DFT. The first route runs through Line 7 and Line 1, with the transfer station Changshu Rd. The second route runs through Line 7, Line 4, and Line 1, with the transfer stations Dongan Rd. and Shanghai Indoor Stadium. The third route runs through Line 7, Line 9, and Line 1, with the transfer stations Zhaojiabang Rd. and Xujiahui. The theoretical travel times of the three routes are 2970 seconds, 3485 seconds, and 3488 seconds, respectively.

4.2. Data Used in the Test

In the test, 57 passenger trip records between 07:00 AM and 08:00 AM, obtained from the AFC system, are used to verify the proposed approach. Table 3 gives a sample record from these 57 passenger trips.

Moreover, as another important input of the proposed approach, the corresponding real timetables of the relevant URT lines (e.g., Line 1, Line 4, Line 7, and Line 9) were obtained from the automatic train supervision (ATS) system. Tables 4–6 show samples of these data.

4.3. Results and Discussions

Using the above input data, the boarding plan estimation for these 57 passenger trips is performed with the proposed approach. Table 7 gives a sample of the estimation results. As can be seen in the table, each passenger trip (which equals an AFC transaction record) derived from the AFC system can be assigned to a unique boarding plan by the proposed approach.

As mentioned, for a schedule-based URT system, the result in Table 7 is the key to passenger flow analysis, from which other important indicators (e.g., route choices, section passenger flows, and train load factors, as shown in Table 8 and Figure 8) can be further deduced.
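How such indicators follow from the assigned boarding plans can be sketched with a simple aggregation in Python. The sketch assumes that each assigned plan has been expanded into a list of (train, section) legs and that a train capacity is known; the identifiers, sections, and the capacity value are placeholders and are not taken from Table 8 or Figure 8.

```python
from collections import Counter


def train_section_loads(assigned_plans):
    """Count passengers on each (train, section) pair.

    assigned_plans: one list of (train_id, section) legs per passenger
    (hypothetical layout derived from the estimated boarding plans).
    """
    loads = Counter()
    for legs in assigned_plans:
        for train_id, section in legs:
            loads[(train_id, section)] += 1
    return loads


def load_factor(load, capacity):
    """Ratio of passengers on board to train capacity."""
    return load / capacity


# Three hypothetical passengers and their assigned legs.
demo_assignments = [
    [("L7_train_01", "S1-S2"), ("L1_train_05", "S8-S9")],
    [("L7_train_01", "S1-S2")],
    [("L7_train_02", "S1-S2")],
]
loads = train_section_loads(demo_assignments)
factor = load_factor(loads[("L7_train_01", "S1-S2")], capacity=4)
```

Summing the counts over all trains on a section in a time window would give the section flow, and grouping passengers by the lines they use would give the route shares.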

Along with the above case study, some extended discussion can be made. Previous studies use discrete choice analysis extensively to predict passenger choice behavior. Such a model requires preference data and still displays great variability in real-world estimation. Recently, in this context, some researchers have tried to reveal route choice from observed passenger travel times derived from smart card systems (Kusakabe et al., 2010; Sun and Xu, 2012; Zhou and Xu, 2012; Zhu et al., 2014; Fu et al., 2014; Sun et al., 2015). As demonstrated in the case study, we address the key issue of estimating passenger boarding plans, from which route choice, section flow, load factor, and so forth can all be deduced. The approach no longer depends on the assumption that smart card data that cannot be identified with one of the possible train choices are assigned with equal probability (Kusakabe et al., 2010). Furthermore, the proposed approach improves on the methodologies of Sun and Schonfeld [19] and Zhou and Xu [20] for calculating passenger boarding plans. On the other hand, compared with the studies presented in [21, 23], our approach models the problem of interest considering the temporal dynamics induced by demand profiles, service timetables, and crowdedness.

5. Conclusions

A URT system is operated based on its schedules. Unlike in urban road traffic systems, it is therefore more important to estimate passengers’ train choices, from which passenger route choices and the flow distribution on the network can be deduced. Developments in the application of AFC systems have made it possible to collect detailed passenger trip data in a URT network, and these data can be used to gain a more in-depth understanding of passenger travel behavior. In this paper, we formulate the problem of estimating passenger train choices and then propose an integrated approach for this estimation that combines real timetable and AFC data.

Advantages of the proposed approach include the following:

(1) An a posteriori estimation framework is proposed, which uses revealed information combining the real timetable and AFC data of URT systems rather than a priori knowledge.

(2) The approach links each AFC transaction (a passenger trip) to the most feasible train itinerary (a boarding plan). It is more appropriate for an actual travel choice process, in which a passenger uses only one route at a time rather than multiple routes.

(3) The travel behavior parameters used in the approach are extracted from abundant timetable and AFC data rather than from manual surveys. Meanwhile, the approach avoids the need for exact information that is difficult to measure, such as the distributions of passengers’ walking speeds and times.

Furthermore, the proposed approach in this paper can be used for other challenges in the field of URT operation and management such as validation of rail transit assignment models, time-dependent train load estimation, and integrated simulation of passenger flows on network.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The study was financially supported by the National Natural Science Foundation of China (71271153), Scientific Research Foundation for Returned Scholars (Ministry of Education of China), Program for Young Excellent Talents in Tongji University (2014KJ015), and Fundamental Research Funds for the Central Universities of China (1600219249). The authors are also appreciative of Shanghai Shentong Metro Co., Ltd., for providing useful information.