Special Issue: Sensors in Connected Vehicle Technology: How Sensors Play a Critical Role
Research Article | Open Access
Shi An, Lei Wang, Haiqiang Yang, Jian Wang, "Discovering Public Transit Riders’ Travel Pattern from GPS Data: A Case Study in Harbin", Journal of Sensors, vol. 2017, Article ID 5290795, 8 pages, 2017. https://doi.org/10.1155/2017/5290795
Discovering Public Transit Riders’ Travel Pattern from GPS Data: A Case Study in Harbin
This paper proposes a method for measuring public transit riders’ travel patterns based on divided cells and public transit vehicles’ GPS data. The method consists of two parts: detecting urban origin and destination areas and measuring the public transit riders’ travel pattern. Moreover, a series of indicators are proposed to reflect the public transit riders’ travel pattern. A case study using GPS data collected from taxis and buses in Harbin, China, is carried out to evaluate the method. The study is expected to provide a better understanding of public transit riders’ travel patterns.
1. Introduction
The dramatic increase in urban vehicles leads to many serious problems, including traffic congestion, road accidents, and air pollution, which have become common problems in metropolises all over the world and even in smaller cities. Public transit is considered one of the most effective solutions to these problems. Public transit includes various services that provide mobility to the general public, including buses, trains, ferries, shared taxis, and their variations [1, 2]. It has many obvious advantages, such as lower cost, more effective mobility, and shorter travel time, which lead more and more urban residents to turn to public transit. The proportion of travelers and commuters in Beijing who use the public transit system (including bus, taxi, and rail transit) has increased continuously over the last decade, reaching 54.2% in 2015, which represents more than 15.5 million trips per day [3]. This trend has also occurred in other metropolises and smaller cities all over the world.
Public transit riders usually exhibit a fixed travel pattern. At the macrolevel, a fairly fixed set of Origin-Destination (OD) pairs is located in the same places in the urban area, and the number of trips between these OD pairs stays steady from day to day [4–6]. The usage ratio of each kind of public transit also stays steady, and the trips in the morning and evening rush hours account for a large proportion of the total trips every day. At the microlevel, a single traveler moves from a residential area to the workplace in the morning and moves back in the evening each day. If the travel pattern of public transit riders can be identified, urban public transit managers can benefit from it. For example, analyzing the riders who prefer public transit vehicles to private cars helps transit authorities improve their strategies and even make new policies to attract new riders [7]. With a better understanding of the transfer behavior of public transit riders, transit agencies can adjust bus routes to make transfers easier, which can enhance riders’ satisfaction [8]. By calculating the shortest path lengths between all station pairs, the origin-destination matrix, and trip lengths, transit agencies can develop fare change plans to manage demand or raise revenue [9, 10].
The Origin-Destination (OD) matrix is a typical representation of residents’ travel pattern, which reflects travel demand, trip generation, trip distribution, and so on. Traditional models for establishing the OD matrix usually rely on travel behavior surveys. In practice, household travel surveys are conducted in many countries. However, survey data have many limitations and errors. For example, some metropolises in Japan (Tokyo, Kyoto, Osaka, etc.) survey residents’ travel behavior every 10 years. Since the cities are growing rapidly, the survey data quickly become out of date. Furthermore, the sampling rate is usually very low, which introduces sampling errors. Meanwhile, many human factors may affect the accuracy of the OD matrix, such as deliberately omitting some trips, forgetfulness, and other related factors.
Compared with traditional survey data, GPS data and smart card data exhibit wider coverage, lower cost, and higher accuracy. With the rapid development of data-based technology, various intelligent transportation systems are widely applied in public transit systems. These systems can collect residents’ mobility data every day, including longitude, latitude, boarding time, and dropping off time. In the last decade, various studies based on these data have been carried out, for example, mining urban recurrent congestion evolution patterns from GPS-equipped vehicle mobility data [14], comparing accessibility in urban slums using smart card and bus GPS data [15, 16], discovering functional zones using bus smart card data [17], and partitioning bus operating hours into time of day intervals based on bus GPS data [18], which has made data-based transportation research a hot topic in the transportation field [19].
GPS data and smart card data are usually collected from different subsystems of one intelligent transportation system, or even from different systems, because bus, taxi, and rail transit usually belong to different public transit companies. Therefore, most previous data-based studies of transit traveler behavior utilize either smart card data [20–22] or GPS data [23, 24]. Accordingly, they derive either the travel pattern of bus and rail transit riders [20–22] or the travel behavior of taxi riders [23, 24]. To the best of our knowledge, few studies of public transit riders’ travel patterns utilize both smart card and GPS data. However, smart card data only provide riders’ boarding and dropping off information and lack location information [21, 22]. As a result, only an approximate location can be acquired, which causes inaccuracy in origin and destination inference. Moreover, at the microlevel, the trips made by riders of GPS-equipped public transit vehicles are far fewer than those made on other public transit vehicles. Such an insufficient sample leads to an inaccurate trip distribution. In this light, smart card data and GPS data should be integrated to discover public transit riders’ travel patterns.
The aim of this paper is to propose an effective method to explore the public transit riders’ travel pattern in an urban area. There are two subgoals identified: detecting urban origin and destination areas at a cell level and measuring public transit riders’ travel pattern.
This paper is organized as follows. Section 2 discusses the definition of cells and locating points that will be applied to this research. Section 3 describes the proposed urban public transit riders’ travel pattern measuring method. Section 4 applies the proposed public transit riders’ travel pattern measuring methodology using taxi and bus GPS data and the urban road network of Harbin. Section 5 provides conclusions and recommendations for future research.
2. Definition of Cells and Locating Points
In this part, we define some parameters of cells and locating points. Firstly, the urban area is divided into small cells of the same size. $c_{i,j}$ denotes one of these cells, where $i$ and $j$ are its row and column indices in the grid.
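The assignment of a GPS point to a cell can be sketched as follows. This is only a minimal illustration: the south-west corner `(lat0, lon0)`, the flat-Earth (equirectangular) approximation, and the function name `cell_index` are assumptions made here, while the 200 m cell length is the value used in the case study.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used for a local flat-Earth approximation


def cell_index(lat, lon, lat0, lon0, cell_len_m=200.0):
    """Return the (row, col) index of the square cell containing (lat, lon).

    (lat0, lon0) is a hypothetical south-west corner of the study area.
    """
    # Metres north and east of the corner (equirectangular approximation).
    north_m = math.radians(lat - lat0) * EARTH_RADIUS_M
    east_m = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    return int(north_m // cell_len_m), int(east_m // cell_len_m)


# Hypothetical corner and point, chosen only for illustration.
row, col = cell_index(45.702, 126.55, lat0=45.70, lon0=126.55)
```

Over an area of roughly 10 km by 10 km the approximation error is small compared with the cell length, which is why no map projection library is used in this sketch.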
According to the taxi GPS dataset, there are four types of occupation status. In this paper, we are going to study the public transit riders’ travel pattern, so we defined two types of locating points according to the boarding and dropping off status, which are described as follows:
Type 1. $P^{\mathrm{taxi}}_{b}$ represents taxi vehicles’ locating points whose occupation status value is 768 (i.e., the occupation status is boarding).
Type 2. $P^{\mathrm{taxi}}_{d}$ represents taxi vehicles’ locating points whose occupation status value is 16640 (i.e., the occupation status is dropping off).
The locating points whose occupation status value is 256 (represents the taxi vehicles being vacant) are of no use to the public transit riders’ travel pattern, so they are not utilized in our research.
$p^{\mathrm{taxi}}_{v,t}$ represents a specific locating point of a taxi vehicle, where $v$ and $t$ are the Taxi ID and Timestamp from the taxi GPS dataset.
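Selecting the two types of taxi locating points amounts to a filter on the status field. The sketch below assumes each record is a `(taxi_id, timestamp, lat, lon, status)` tuple; the record layout, the function name, and the sample rows are hypothetical, and only the status codes come from the dataset description.

```python
# Status codes of the taxi GPS dataset that are relevant to riders' trips.
BOARDING = 768
DROPPING_OFF = 16640


def split_taxi_points(records):
    """Split taxi GPS records into boarding and dropping off locating points.

    Each record is assumed to be (taxi_id, timestamp, lat, lon, status).
    Records with any other status (e.g. 256, vacant) are ignored.
    """
    boarding = [r for r in records if r[4] == BOARDING]
    dropping_off = [r for r in records if r[4] == DROPPING_OFF]
    return boarding, dropping_off


# Hypothetical sample rows, made up for illustration only.
sample = [
    ("T001", "2015-08-03 07:58:30", 45.7501, 126.6402, 256),
    ("T001", "2015-08-03 08:00:00", 45.7512, 126.6411, 768),
    ("T001", "2015-08-03 08:00:30", 45.7530, 126.6440, 17152),
    ("T001", "2015-08-03 08:12:30", 45.7703, 126.6598, 16640),
]
boarding_points, dropping_off_points = split_taxi_points(sample)
```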
In many cities in China, such as Guangzhou, Xi’an, and Harbin, a passenger only needs to tap the smart card once, when boarding the bus, whereas in Beijing the passenger must tap the card again when getting off. Both the boarding and the dropping off locating points are therefore directly known in Beijing. In Harbin, the dropping off locating points can be inferred from the boarding points and the time period. To simplify the method, in this paper we use only the data of the most regular commuters from the bus datasets. That is to say, only the locating points of bus riders who travel to work every morning and return home every evening are included in our research. This type of bus rider generates two trips each day. In this light, we define two types of bus locating points as follows.
Type 1. $P^{\mathrm{bus}}_{b}$ represents boarding passengers’ locating points.
Type 2. $P^{\mathrm{bus}}_{d}$ represents dropping off passengers’ locating points.
$p^{\mathrm{bus}}_{k,t}$ represents a specific locating point of a bus vehicle, where $k$ and $t$ are the Card ID and Timestamp from the bus GPS dataset.
The boarding and dropping off locating points are identified from the smart card data as follows.
Step 1. Extract the locating points with the same card ID $k$ (i.e., $p_{k,1}, p_{k,2}, \ldots, p_{k,n_k}$) from the whole smart card dataset, where $p_{k,m}$ is the $m$-th locating point whose card ID is $k$ and $n_k$ is the number of such locating points.
Step 2. Extract the locating points whose and and .
Step 3. Let and .
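As an illustration of how dropping off points might be inferred for periodic commuters, the following minimal sketch assumes that a commuter is a card with exactly two boarding records in a day, one before noon and one after, and that each of the two trips ends where the other begins. These criteria, the record layout, and the function name are assumptions for illustration only; they are not the conditions stated in Steps 2 and 3.

```python
from collections import defaultdict
from datetime import datetime


def infer_commuter_trips(records):
    """Infer two daily trips per commuter from boarding-only smart card records.

    records: iterable of (card_id, timestamp, lat, lon) for a single day,
    where timestamp is a datetime. Assumed commuter rule: exactly two
    boardings, the first before 12:00 and the second at or after 12:00.
    """
    by_card = defaultdict(list)
    for card_id, ts, lat, lon in records:
        by_card[card_id].append((ts, lat, lon))

    trips = []
    for card_id, taps in by_card.items():
        taps.sort()  # chronological order
        if len(taps) != 2:
            continue
        (t1, lat1, lon1), (t2, lat2, lon2) = taps
        if not (t1.hour < 12 <= t2.hour):
            continue
        # Assumption: the morning trip ends where the evening trip starts,
        # and the evening trip ends where the morning trip starts.
        trips.append({"card_id": card_id, "time": t1,
                      "board": (lat1, lon1), "alight": (lat2, lon2)})
        trips.append({"card_id": card_id, "time": t2,
                      "board": (lat2, lon2), "alight": (lat1, lon1)})
    return trips


# Hypothetical records, made up for illustration only.
sample = [
    ("A", datetime(2015, 8, 3, 7, 30), 45.75, 126.64),
    ("A", datetime(2015, 8, 3, 17, 40), 45.77, 126.66),
    ("B", datetime(2015, 8, 3, 9, 0), 45.70, 126.60),  # single tap: not a commuter here
]
commuter_trips = infer_commuter_trips(sample)
```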
3. Methodology
The methodology for discovering public transit riders’ travel pattern is described in this section. Two stepwise methods are proposed to achieve the main goal: detecting origin and destination areas, and measuring the commuter pattern between each OD pair.
3.1. Detecting Origin and Destination Areas
In this part, we detect origin and destination areas from public transit riders’ GPS data (i.e., taxi and bus GPS data). Because of urban planning and construction in recent decades, the urban area is usually divided into many different functional zones. Therefore, the origin points and destination points of passengers’ trips tend to cluster into several origin areas and destination areas. In this light, we use a clustering algorithm to detect the origin areas and destination areas in the urban area. Moreover, the number of clusters is not known in advance, and clusters of this type are usually not spherical. Therefore, we apply a customized Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to solve this problem.
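For a specific cell $c_{i,j}$, we define a parameter $n_{i,j}(T)$ as the number of locating points (i.e., taxi and bus locating points) in this cell during a specific period $T$. Taking a small block of cells as an example (as shown in Figure 1(a)), the parameters of all cells in one period can be easily calculated. For two specific cells $c_{i_1,j_1}$ and $c_{i_2,j_2}$, the distance between them is calculated as follows:
$$d\left(c_{i_1,j_1}, c_{i_2,j_2}\right) = l\sqrt{(i_1 - i_2)^2 + (j_1 - j_2)^2},$$
where $l$ is the length of the cell. For example, under this definition two cells that share an edge are at distance $l$, and two cells that share only a corner are at distance $\sqrt{2}\,l$ (see Figure 1(b)).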
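The cell parameter and the cell distance can be sketched as follows. The sketch assumes that cells are identified by integer `(row, col)` indices and that the distance is the Euclidean distance between cell centres, which is consistent with the radius of 282.8 m used for 200 m cells in the case study; the function names are hypothetical.

```python
import math
from collections import Counter


def cell_counts(point_cells):
    """Count locating points per cell.

    point_cells: iterable of (row, col) indices, one per locating point
    observed during the period of interest.
    """
    return Counter(point_cells)


def cell_distance(a, b, cell_len_m=200.0):
    """Centre-to-centre distance in metres between cells a and b (assumed Euclidean)."""
    return cell_len_m * math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical points already mapped to cells.
counts = cell_counts([(0, 0), (0, 0), (1, 1)])
edge_neighbour = cell_distance((0, 0), (0, 1))      # cells sharing an edge
corner_neighbour = cell_distance((0, 0), (1, 1))    # cells sharing only a corner
```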
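Some related parameters are defined as follows. Object: an object represents a cell $c_{i,j}$. Core object: a cell $c_{i,j}$ is a core object if it satisfies $n_{i,j}(T) \geq \theta$, where $n_{i,j}(T)$ is the number of locating points in the cell during period $T$ and $\theta$ is the threshold of this parameter. $\varepsilon$-neighborhood: the $\varepsilon$-neighborhood of a core object is the space within a radius $\varepsilon$ centered at that object.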
Figure 2 illustrates the flow chart of the customized DBSCAN algorithm in this research. The original DBSCAN algorithm is based on point density. By comparison, the density-reachable objects in our customized algorithm are defined by the cell parameter, that is, the number of locating points in a cell. We set the minPts value to 1 in this paper.
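One possible reading of such a cell-based variant of DBSCAN is sketched below. It is not the algorithm of Figure 2: the sketch assumes that only cells whose count reaches the threshold are clustered, that two such cells are linked when their centres lie within the radius, and that all other cells are treated as noise. The function name, arguments, and sample counts are hypothetical.

```python
import math
from collections import deque


def cluster_cells(counts, threshold, eps_m, cell_len_m=200.0):
    """Group core cells into clusters with a breadth-first search.

    counts: dict mapping (row, col) -> number of locating points.
    Returns a dict mapping each core cell to an integer cluster label.
    """
    core = {cell for cell, n in counts.items() if n >= threshold}
    reach = int(eps_m // cell_len_m)  # largest index offset that can be within eps
    labels = {}
    next_label = 0
    for seed in sorted(core):
        if seed in labels:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    nb = (r + dr, c + dc)
                    within = cell_len_m * math.hypot(dr, dc) <= eps_m + 1e-9
                    if within and nb in core and nb not in labels:
                        labels[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return labels


# Hypothetical counts; a radius of sqrt(2) * 200 m links the 8 surrounding cells.
sample_counts = {(0, 0): 2500, (1, 1): 2100, (5, 5): 3000, (0, 1): 100}
labels = cluster_cells(sample_counts, threshold=2000, eps_m=200.0 * math.sqrt(2))
```

Because the clusters are grown by linking neighbouring cells, the number of clusters does not have to be fixed in advance and the clusters can take arbitrary shapes.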
3.2. Measuring Public Transit Riders’ Travel Pattern
The public transit riders’ travel pattern can be reflected by several indicators, such as the trip number between each OD pair, the proportion of each transit mode, the path between each OD pair, and the travel time of the path. In order to express these indicators, we define some preliminary terms as follows.
$N^{m}_{o,d}$ is the trip number of a specific transit mode $m$ (i.e., taxi or bus) from cluster $o$ to cluster $d$.
Based on the clusters of origin and destination areas, given a specific period $T$, the public transit OD matrix can be calculated.
$P^{m}_{o,d}$ is the proportion of a specific transit mode $m$ (i.e., taxi or bus) from cluster $o$ to cluster $d$, which is calculated as follows:
$$P^{m}_{o,d} = \frac{N^{m}_{o,d}}{N_{o,d}},$$
where $N_{o,d}$ is the trip number of all transit modes from cluster $o$ to cluster $d$.
$\tau^{m}_{o,d}$ is the travel time by the specific transit mode $m$ (i.e., taxi or bus) from cluster $o$ to cluster $d$.
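The three indicators can be computed from a list of trips as in the sketch below. The trip layout `(mode, origin_cluster, destination_cluster, travel_time_min)` and the function name are hypothetical, and summarizing the travel time by its mean is an assumption made here, since several trips may share the same mode and OD pair.

```python
from collections import defaultdict


def od_indicators(trips):
    """Compute trip number, mode proportion, and mean travel time per mode and OD pair.

    trips: iterable of (mode, origin_cluster, dest_cluster, travel_time_min).
    Returns {(mode, origin, dest): {"trips", "proportion", "mean_time"}}.
    """
    mode_trips = defaultdict(int)      # trips per (mode, origin, dest)
    mode_minutes = defaultdict(float)  # summed travel time per (mode, origin, dest)
    all_trips = defaultdict(int)       # trips of all modes per (origin, dest)
    for mode, origin, dest, minutes in trips:
        mode_trips[(mode, origin, dest)] += 1
        mode_minutes[(mode, origin, dest)] += minutes
        all_trips[(origin, dest)] += 1

    return {
        key: {
            "trips": n,
            "proportion": n / all_trips[key[1:]],
            "mean_time": mode_minutes[key] / n,
        }
        for key, n in mode_trips.items()
    }


# Hypothetical trips between origin cluster 0 and destination cluster 1.
sample_trips = [
    ("taxi", 0, 1, 12.0),
    ("bus", 0, 1, 30.0),
    ("bus", 0, 1, 34.0),
    ("bus", 0, 1, 32.0),
]
indicators = od_indicators(sample_trips)
```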
4. Case Study
We apply the proposed methods to the city of Harbin, China. First, the datasets used in this case study are described; then the stepwise methods are implemented one by one.
4.1. Data Description
The smart card data and GPS data are collected from all operating buses and taxis in Harbin. Harbin is the capital of Heilongjiang province in the northeast of China. With a population of 4.74 million and an urban area of 7,086 km² [25], Harbin is a typical developing city in China. The public transit system of Harbin consists of taxi, bus, metro, and other rail transit. In addition, all taxis are equipped with GPS devices and all buses are equipped with a smart card system, which makes it possible to collect GPS data and smart card data.
Approximately 16,000 operating taxis equipped with GPS devices run around Harbin’s urban area day and night. The location information is uploaded to the management system every 30 s during the day and every 2 min at night. The data amount to about 2 GB and around 25 million rows each day. The taxi GPS data collected from 3rd Aug. to 7th Aug. 2015 are used, consisting of taxi ID, timestamp, latitude, longitude, and status, as shown in Table 1. There are 4 kinds of “status” in the table: 17152, 16640, 256, and 768, which represent occupied, dropping off, vacant, and boarding, respectively.
Note (Table 1). The columns, from left to right, represent the taxi identification, the taxi sampling time, the latitude and longitude of the position, and the occupation status.
Nearly 1,500 buses, belonging to 100 routes, operate in the urban area of Harbin. The sampling frequency is 30 s. The bus IC records collected from 3rd Aug. to 7th Aug. 2015 are used, consisting of Route ID, Bus ID, Card ID, Timestamp, Latitude, and Longitude, as shown in Table 2.
Note (Table 2). The columns, from left to right, represent the bus route number, the bus ID, the smart card ID, the card sampling time, and the latitude and longitude of the position.
Another dataset used in our research is the digital map of Harbin, which covers most of the urban area of Harbin. The research area is nearly 100 square kilometers and just covers the area within the 2nd Ring Road of Harbin, as shown in Figure 3. About 80 percent of the GPS points are located in this research area. We divided this area into 250 square cells of the same size. Each cell is 200 × 200 meters, as shown in Figure 3.
4.2. Detecting Origin and Destination Areas
In this paper, we only measure the public transit riders’ travel pattern on working days. According to Section 3.1, we can calculate the parameter values (the numbers of locating points) of all cells. Taking one week’s dropping off data (consisting of taxi and bus dropping off locating points) as an example, that is, from 3rd Aug. to 7th Aug. 2015, the related data are given in Table 3. The sampling time is from 07:00 to 19:00.
Note (Table 3). Total is the number of dropping off points in all cells, average is the average number of dropping off points in one cell, minimum is the minimum number of dropping off points in one cell, and maximum is the maximum number of dropping off points in one cell.
We then apply the customized DBSCAN algorithm to detect the origin and destination clusters. We set the $\varepsilon$ value to 282.8 meters (i.e., $\varepsilon = \sqrt{2}\,l$, where the cell length $l$ is 200 meters). We apply different threshold values in our experiment in order to find the optimal value. Table 4 illustrates the clustering results computed with different threshold values. According to the results, when the threshold value is larger than 1500, the number of clusters becomes stable. We therefore set the threshold value to 2,000 for each day and 10,000 for the 5 days. In this light, we can detect the origin clusters (i.e., boarding clusters) and destination clusters (i.e., dropping off clusters) for the 5 days, as shown in Figure 4.
Figure 4: (a) Origin clusters (boarding clusters); (b) destination clusters (dropping off clusters).
4.3. Measuring Public Transit Riders’ Travel Pattern
For a specific OD pair, we can measure the travel pattern between its origin and destination, that is, the trip number, the proportion of each transit mode, and the travel time, as mentioned in Section 3.2. Taking a small area of cells as an example, as shown in Figure 5, there are one typical origin cluster and one typical destination cluster. The values of the three indicators over the 5 surveyed days can be calculated, as shown in Table 5.
The Origin Cluster in this case study consists of a single cell, and the Destination Cluster consists of five cells. The origin cell is located near the Harbin Hongqi Resident District, the Heping Resident District, the Yuanda Central Park Resident District, and so on. There is a bus station in this cell, which serves 12 bus routes (e.g., 14, 25, 31, and 44). The Destination Cluster is located near the Harbin International Golf Club, Harbin Wanda Plaza, several banks, and so on. This cluster lies in an important commercial district of Harbin and attracts many residents’ trips. In this cluster, there are several bus stations serving more than 20 bus routes (e.g., 17, 27, 34, and 71). From the Origin Cluster to the Destination Cluster, there are at least three direct bus routes (i.e., 71, 82, and 209).
5. Conclusions
This paper presented a cell-based method for measuring urban public transit riders’ travel pattern. The method uses the locating data of GPS-equipped public transit vehicles, which are more realistic and easier to collect than survey data. We proposed a customized DBSCAN algorithm to detect the origin and destination areas. We computed three indicators for each OD pair, which reflect the relationship between the origin area and the destination area. We carried out a numerical case study using taxi and bus GPS data from Harbin, China, to evaluate our method. The results reflect the OD pairs and the relationship between the origin and destination of each pair.
In the future, we will improve the accuracy and efficiency of the customized DBSCAN algorithm. The travel pattern measured in this paper is relatively simple, so we will investigate further indicators in more depth. We also plan to extend this research by utilizing more kinds of GPS data from different public transit vehicles.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This research is supported by the National Natural Science Foundation of China (Project no. 51478151). This work was performed at the Key Laboratory of Advanced Materials & Intelligent Control Technology on Transportation Safety, Ministry of Communications, China.
References
[1] T. Litman, Evaluating Public Transit Benefits and Costs, Victoria Transport Policy Institute, British Columbia, Canada, 2015.
[2] C. Ding, Y. Lin, and C. Liu, “Exploring the influence of built environment on tour-based commuter mode choice: a cross-classified multilevel modeling approach,” Transportation Research Part D: Transport and Environment, vol. 32, pp. 230–238, 2014.
[3] Beijing Transport Annual Report, Beijing Transportation Research Center, Beijing, China, 2015.
[4] M. A. Munizaga and C. Palma, “Estimation of a disaggregate multimodal public transport origin-destination matrix from passive smartcard data from Santiago, Chile,” Transportation Research Part C: Emerging Technologies, vol. 24, pp. 9–18, 2012.
[5] C. Ding, S. Mishra, G. Lu, J. Yang, and C. Liu, “Influences of built environment characteristics and individual factors on commuting distance: a multilevel mixture hazard modeling approach,” Transportation Research Part D: Transport and Environment, vol. 51, pp. 314–325, 2017.
[6] C. Ding, D. Wang, C. Liu, Y. Zhang, and J. Yang, “Exploring the influence of built environment on travel mode choice considering the mediating effects of car ownership and travel distance,” Transportation Research Part A: Policy and Practice, vol. 100, pp. 65–80, 2017.
[7] A. Imaz, K. M. N. Habib, A. Shalaby, and A. O. Idris, “Investigating the factors affecting transit user loyalty,” Public Transport, vol. 7, no. 1, pp. 39–60, 2014.
[8] N. Nassir, M. Hickman, and Z.-L. Ma, “Activity detection and transfer identification for public transit fare card data,” Transportation, vol. 42, no. 4, pp. 683–705, 2015.
[9] Z.-J. Wang, X.-H. Li, and F. Chen, “Impact evaluation of a mass transit fare change on demand and revenue utilizing smart card data,” Transportation Research Part A: Policy and Practice, vol. 77, pp. 213–224, 2015.
[10] C. Ding, X. Wu, G. Yu, and Y. Wang, “A gradient boosting logit model to investigate driver's stop-or-run behavior at signalized intersections using high-resolution traffic data,” Transportation Research Part C: Emerging Technologies, vol. 72, pp. 225–238, 2016.
[11] Q. Ge and D. Fukuda, “Updating origin-destination matrices with aggregated data of GPS traces,” Transportation Research Part C: Emerging Technologies, vol. 69, pp. 291–312, 2016.
[12] A. Santos, N. McGuckin, H. Y. Nakamoto, D. Gray, and S. Liss, Summary of travel trends: 2009 national household travel survey (No. FHWA-PL-11-022), 2011.
[13] K. M. Currans and K. J. Clifton, “Using household travel surveys to adjust ITE trip generation rates,” Journal of Transport and Land Use, vol. 8, no. 1, pp. 85–119, 2015.
[14] S. An, H. Yang, J. Wang, N. Cui, and J. Cui, “Mining urban recurrent congestion evolution patterns from GPS-equipped vehicle mobility data,” Information Sciences, vol. 373, pp. 515–526, 2016.
[15] R. O. Arbex, B. B. Alves, and M. A. Giannotti, “Comparing accessibility in urban slums using smart card and bus GPS data,” in Proceedings of the Transportation Research Board 95th Annual Meeting (No. 16-5614), 2016.
[16] J. Hu, B. Park, and A. E. Parkany, “Transit signal priority with connected vehicle technology,” Transportation Research Record, vol. 2418, pp. 20–29, 2014.
[17] Y. Long and Z. Shen, “Discovering functional zones using bus smart card data and points of interest in Beijing,” in Geospatial Analysis to Support Urban Planning in Beijing, vol. 116 of GeoJournal Library, pp. 193–217, Springer International Publishing, Beijing, China, 2015.
[18] Y. Bie, X. Gong, and Z. Liu, “Time of day intervals partition for bus schedule using GPS data,” Transportation Research Part C, vol. 60, pp. 443–456, 2015.
[19] J. Hu, M. D. Fontaine, B. B. Park, and J. Ma, “Field evaluations of an adaptive traffic signal—using private-sector probe data,” Journal of Transportation Engineering, vol. 142, no. 1, Article ID 04015033, 2016.
[20] X. Ma, Y. J. Wu, Y. Wang, F. Chen, and J. Liu, “Mining smart card data for transit riders travel patterns,” Transportation Research Part C: Emerging Technologies, vol. 36, no. Part C, p. 12, 2013.
[21] S. Tao, D. Rohde, and J. Corcoran, “Examining the spatial-temporal dynamics of bus passenger travel behaviour using smart card data and the flow-comap,” Journal of Transport Geography, vol. 41, pp. 21–36, 2014.
[22] L.-M. Kieu, A. Bhaskar, and E. Chung, “A modified density-based scanning algorithm with noise for spatial travel pattern analysis from smart card AFC data,” Transportation Research Part C: Emerging Technologies, vol. 58, pp. 193–207, 2015.
[23] R. Paleti, P. Vovsha, D. Givon, and Y. Birotker, “Impact of individual daily travel pattern on value of time,” Transportation, vol. 42, no. 6, pp. 1003–1017, 2015.
[24] J. Cui, F. Liu, J. Hu, D. Janssens, G. Wets, and M. Cools, “Identifying mismatch between urban travel demand and transport network services using GPS data: a case study in the fast growing Chinese city of Harbin,” Neurocomputing, vol. 181, pp. 4–18, 2016.
[25] Harbin Statistics Yearbook Editorial Department, Energy Statistics Yearbook, China Statistics Press, Harbin, China, 2014.
Copyright © 2017 Shi An et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.