Journal of Advanced Transportation

Volume 2019, Article ID 6830450, 15 pages

https://doi.org/10.1155/2019/6830450

## Data-Driven Approaches to Mining Passenger Travel Patterns: “Left-Behinds” in a Congested Urban Rail Transit Network

^{1}Department of Transportation Management Engineering, School of Traffic and Transportation, Beijing Jiaotong University, China

^{2}National Research Center of Rail Transit and Transportation Training and Accreditation, Beijing Jiaotong University, China

Correspondence should be addressed to Zixi Bai; 11114230@bjtu.edu.cn

Received 20 November 2018; Accepted 6 March 2019; Published 1 April 2019

Academic Editor: Eneko Osaba

Copyright © 2019 Xing Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The “left-behind” phenomenon occurs frequently in Urban Rail Transit (URT) networks with booming travel demand, especially during peak hours in a complex URT network, which makes passenger travel patterns more complicated. This paper proposes a methodology to mine passenger travel patterns based on fare transaction records from automatic fare collection (AFC) systems and Automatic Vehicle Location (AVL) data from Communication Based Train Control (CBTC) Systems or tracking systems. By introducing the concept of a sequence, a space-time-sequence trajectory model is proposed to simulate a passenger’s travel activities, including when they are left-behind. The paper analyzes passenger travel trajectory links and estimates the weight of each feasible trajectory under tap-in/tap-out constraints. The station time parameters, including access/egress and transfer walking-time parameters, are important inputs for the model. The paper also presents a maximum-likelihood approach to estimate these parameters from AFC transaction data and AVL data. The methodology is applied to a case study using AFC and AVL data from the Beijing URT network during peak hours to test the proposed model and algorithm. The estimation results are consistent with the results obtained from the authorities, and this finding verifies the feasibility of our approach.

#### 1. Introduction

During the last decade, Urban Rail Transit (URT) in Mainland China has developed from a total system length of only 763 kilometers ten years ago to 5033 kilometers by the end of 2017 [1]. With the rapid development of the URT network, travel demand has also experienced a booming increase. In the past 10 years, the average daily passenger traffic of the Beijing URT system has increased from 1.92 million in 2007 to 10.35 million in 2017, an increase of 439% [2]. The Mass Transit Railway system (MTR) in Hong Kong has increased approximately 131.6% in patronage since 2006 [3].

The significant increase in travel demand has resulted in congestion and overcrowding both in stations and in train vehicles; this has become a serious problem for URT operators to address, particularly during peak hours. On the one hand, congestion brings security risks. On the other hand, congestion and overcrowding reduce the attractiveness of the URT network, and some passengers will choose other modes of transportation. Additionally, a new phenomenon appears that we term “left-behind”: some passengers fail to board the first departing train after their arrival at a platform and must wait for a later one. This occurs mainly because, during a given operational time interval, travel demand exceeds the supply that the limited train vehicle capacity can provide.

To address the above problems, numerous methods have been proposed and adopted from both the operator’s perspective and the passenger’s perspective. Guan (2013) [4] and Yu et al. (2015) [5] developed network design models with the objective of minimizing the number of transfers. Niu and Zhou [6] presented a methodology to optimize the timetable to minimize waiting time under time-dependent travel demands and oversaturation. These studies analyzed the characteristics of passenger flow and formulated models that minimize either passengers’ waiting times or the number of transfers. To better grasp the distribution of URT network passenger flow, methodologies to study passengers’ travel patterns have been developed. These facilitate a number of applications, including (i) analysis of passengers’ path-choice preferences, such as minimum time and minimum number of transfers, (ii) prediction of individual passengers’ locations and the future distribution of URT network passenger flow, (iii) optimization of train scheduling both at the subway line level, by identifying the most congested stations and sections, and at the network level, by identifying the transfer hot-spots, and (iv) guidance that helps passengers avoid congested sections as much as possible by providing route suggestions, congestion levels, etc.

Thus, this paper attempts to mine passenger travel patterns based on automatic fare collection (AFC) transaction data and Automatic Vehicle Location (AVL) data. It builds on our prior work on the problem [7] and reconstructs a passenger’s trajectory by introducing the concept of a sequence to describe left-behind. The prior work ignored train vehicle capacity constraints and assumed that all passengers left a platform for another subway station by the first departing train after their arrival at the platform. We propose the concept of a sequence to describe the relationship between passengers’ arrivals at a platform and the departures of trains. By separating time periods into segments according to stations, train directions, and the departure times of trains passing stations, this paper reconstructs a passenger’s trajectory and proposes a space-time-sequence trajectory model. The model generates a set of feasible space-time-sequence trajectories that indicate a passenger’s precise travel patterns and the expected number of times a passenger is left-behind on a platform. Then, a methodology is presented to estimate the number of left-behinds and the probability of each trajectory based on the distributions of station access/egress walking time and transfer walking time. These distributions can be obtained from manual surveys, but such surveys require substantial labor. The paper therefore presents a maximum-likelihood estimation methodology based on passengers’ trajectories.

The main contributions of this study include the following:

(i) A space-time-sequence trajectory model to simulate a passenger’s travel itinerary in a congested URT system. The model introduces the concept of sequence and provides a means to indicate the left-behind phenomenon.

(ii) A maximum-likelihood estimation methodology based on AFC and AVL data that estimates passengers’ travel patterns. The methodology relies on automatically collected data rather than empirical or manual survey data; this minimizes the occasional deviation caused by human factors. Additionally, it reduces the difficulty of obtaining data.

(iii) A data-driven method to estimate station walking-time parameters and the expected number of times passengers are left-behind. Station walking-time parameters, including access walking time and egress walking time, are the basic parameters for a station. They are usually obtained by manual survey or observation; however, these require excessive labor and cost. The method proposed in this paper estimates these parameters using statistical methods based on passengers’ space-time-sequence trajectories.

The remainder of this study is organized as follows. Section 2 reviews relevant studies in the literature. Section 3 introduces the main idea of mining passengers’ travel patterns and illustrates an example. Section 4 describes the model, followed by the introduction of the solution algorithm in Section 5. Section 6 presents a numerical experiment on a real-world network. The final section provides our conclusions and suggestions for future research directions.

#### 2. Literature Review

Many scholars and researchers have studied URT network passenger travel patterns during the last decades. At the beginning of these studies, it was generally not possible to obtain bulk data, including passengers’ tap-in/tap-out information and actual train movement data. Because of the lack of data, numerous methods were proposed at the macroscale level. These methodologies mainly analyzed passengers’ travel patterns from a network-flow perspective. They generated sets of feasible paths for each origin-destination (OD) pair and assigned passenger flows to each path following specific principles. The three most well-known principles are the all-or-nothing principle, the stochastic-assignment principle [8–12], and the user-equilibrium-assignment principle [13–18].

With the wide adoption of AFC systems and rapid development of train tracking systems such as the Communication Based Train Control (CBTC) System, massive detailed data was collected and saved to databases. Most AFC systems record passenger tap-in/tap-out information accurately except for some cases such as the New York City Transit Authority (NYCT) system, in which exit swipe information is not recorded. The Automatic Vehicle Location (AVL) system records train arrival and departure times at stations accurately and in detail. With bulk data collected daily automatically and continuously, some novel methodologies for transit performance [19–21] and management [6, 22, 23] have been developed.

Dai (2015) [22] presented a multimodal evacuation model for metro disruptions based on AFC data in Shanghai, China. Using AFC data of stations in urban areas of Hong Kong, Wang (2015) [21] developed a methodology to analyze metro trip patterns at an aggregate level. Kusakabe and Asakura (2014) [24] estimated passengers’ behavioral attributes of trips with a data fusion methodology using smart cards. They observed and compared continuous long-term changes in passengers’ trips as well as personal trip survey data and constructed a Bayes probabilistic model to estimate the purposes of passengers’ trips. Jin (2015) [19] evaluated transit service performance by developing a data-mining logic methodology based on transit smart-card data.

Some scholars and researchers have analyzed network passenger flow at the individual level. Some have proposed a specific trajectory model to simulate a passenger’s trip activities and estimated the maximum-likelihood path based on tap-in/tap-out constraints; however, numerous assumptions are embedded into stages of the models’ building process. Poon et al. (2004) [25] assumed that all passengers have full predictive information about present and future network conditions. Chen (2018) [7] assumed that all passengers can always board the first arriving train. In addition, these models need some additional input parameters, which have a significant impact on the accuracy of the estimation result.

Sun (2016) [26] presented a schedule-based passenger path-choice estimation model for a multioperator rail transit network using automatic fare collection data. In that study, a Train Schedule Connection Network (TSCN) was constructed, and the path-choice estimation problem was converted to the problem of generating a feasible set of network paths. The Fail-to-Board (FtB) phenomenon was modeled, and the weight of each path could be calculated based on the set of feasible paths. The accuracy of the result was highly dependent on inputs, such as FtB parameters. However, these parameters cannot be obtained directly and are not easy to calculate.

Poon et al. (2004) [25] presented a schedule-based transit model to solve passenger assignments for a congested network. They assumed that all passengers have full predictive information about present and future network conditions and always travel by the minimum-cost path. However, it is not possible for passengers to have full information about network conditions. Furthermore, the minimum-cost path is time-dependent. Frequent URT passengers have their respective perceptions for choosing paths, gaining experience day by day. Additionally, the tap-out times are not taken into account when loading network flow.

Stasko (2015) [20] analyzed passengers’ ridership at the train level using actual train movement data. He built a customized network representation by estimating train movements and developing an origin-destination table. The methodology formulates a trip trajectory with 10 types of arcs. It assigns passengers to trains using a Frank-Wolfe approach, with customizations designed for transit. However, inferring a destination is unnecessary for most URT networks, whose AFC systems record tap-out information. Finally, the accuracy of the result is highly dependent on boarding penalties.

#### 3. Problem Description

Time and space are two important attributes of passenger travel. Although AFC systems record passenger transaction information in detail, including precise transaction times and locations when the passenger swipes his/her smart card, detailed information on a passenger’s itinerary is not included. This section describes methods for estimating a passenger’s detailed travel information in both time and space.

Space-time models attempt to integrate travellers’ time-dependent movements/trajectories with the transportation network and are widely used in the transportation geography modeling literature. A space-time trajectory indicates a passenger’s movements among activity locations with respect to time, providing a useful means to describe both the spatial and temporal aspects of a passenger’s travel status. However, the key focus of this paper, the number of left-behinds due to crowding, cannot be obtained directly from a specific space-time trajectory. To estimate the number of times passengers are left-behind, a parameter defined as a sequence is introduced, and a passenger space-time-sequence trajectory model is developed.

The study period is segmented into a set of successive intervals, separately for each station and each direction (such as the upward and downward directions), based on trains’ departure times. For example, suppose that a total of 55 trains pass a station in the upward direction between 7:00 AM and 9:00 AM, and that the first and last trains depart at 07:02:30 and 08:59:00, respectively. The 55 departure times then divide the time period into 56 successive intervals in the upward direction. The first time interval, from 07:00:00 to 07:02:30, has sequence number 1, and the last time interval, from 08:59:00 to 09:00:00, has sequence number 56. Sequence numbers are consecutive, so the sequence number of any time interval is smaller than that of any later time interval.
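The segmentation above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `to_seconds` and `sequence_number`, the evenly spaced timetable, and the convention that a time exactly equal to a departure belongs to the interval ending at that departure are all assumptions made for illustration. Only the number of trains and the first and last departure times come from the example in the text.

```python
from bisect import bisect_left


def to_seconds(hhmmss: str) -> int:
    """Convert an 'HH:MM:SS' string to seconds since midnight."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


def sequence_number(departures: list, t: int) -> int:
    """Return the 1-based sequence number of the interval containing time t.

    `departures` is the ascending list of departure times (in seconds) for
    one station and one direction. Intervals are treated as right-closed
    (an assumption): a time equal to a departure time falls in the interval
    that ends at that departure.
    """
    # The number of departures strictly before t gives the 0-based interval index.
    return bisect_left(departures, t) + 1


# Hypothetical timetable: 55 evenly spaced departures between the first and
# last departure times quoted in the text. Real headways are not given there.
first, last = to_seconds("07:02:30"), to_seconds("08:59:00")
n_trains = 55
departures = [
    first + round(i * (last - first) / (n_trains - 1)) for i in range(n_trains)
]

n_intervals = len(departures) + 1
print(n_intervals)                                          # 56
print(sequence_number(departures, to_seconds("07:01:00")))  # 1
print(sequence_number(departures, to_seconds("08:59:30")))  # 56
```

A binary search is used because the departure list is already sorted, so each lookup costs logarithmic time even for a full day of AVL records. In practice one such list would be kept per station and direction.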

Assume that there is a passenger travelling from Station A to Station G through the URT system shown in Figure 1 during peak hours.