Journal of Advanced Transportation

Volume 2017 (2017), Article ID 6374858, 10 pages

https://doi.org/10.1155/2017/6374858

## Estimating Bus Loads and OD Flows Using Location-Stamped Farebox and Wi-Fi Signal Data

^{1}Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, Shanghai 201804, China

^{2}School of Automotive Studies, Clean Energy of Automotive Engineering Research Center, Tongji University, Shanghai 201804, China

Correspondence should be addressed to Yuchuan Du

Received 2 February 2017; Revised 7 April 2017; Accepted 23 April 2017; Published 23 May 2017

Academic Editor: Wai Yuen Szeto

Copyright © 2017 Yuxiong Ji et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Electronic fareboxes integrated with Automatic Vehicle Location (AVL) systems can provide location-stamped records to infer passenger boarding at individual stops. However, bus loads and Origin-Destination (OD) flows, which are useful for route planning, design, and real-time control, cannot be derived directly from farebox data. Recently, Wi-Fi sensors have been used to collect passenger OD flow information, but the data are insufficient to capture the variation of passenger demand across bus trips. In this study, we propose a hierarchical Bayesian model to estimate trip-level OD flow matrices and a period-level OD flow matrix using sampled OD flow data collected by Wi-Fi sensors and boarding data provided by fareboxes. Bus loads on each bus trip are derived directly from the estimated trip-level OD flow matrices. The proposed method is evaluated empirically on an operational bus route, and the results demonstrate that combining farebox and Wi-Fi signal data provides detailed route-level passenger demand information.

#### 1. Introduction

Bus loads and OD flow matrices are commonly used to represent transit passenger demand. Bus loads, which depict the number of passengers on a bus, reflect bus crowding along a bus route. They are useful for schedule optimization and performance analyses [1, 2]. Origin-Destination (OD) flow matrices, which provide the number of passengers travelling between two specific stops, are important inputs in the design of multiple service patterns (e.g., short-turning, limited-stop) [3, 4] and real-time controls [5].

Transit agencies have increasingly deployed automated technologies to collect passenger demand information. For example, Automatic Passenger Counter (APC) systems collect boarding and alighting instances at each bus stop automatically. APC data are mainly used to estimate total ridership, average journey length, and bus loads. They could also be used to estimate OD flow matrices [6–10].

Automatic Fare Collection (AFC) systems have been deployed to reduce the cost of fare collection. The fare media used in AFC systems include tokens, paper tickets, magnetic stripe cards, and smartcards. The unique ID assigned to each smartcard makes it possible to track the transit movements of each smartcard holder, and smartcard data have been used to derive passenger OD flows [11–15]. However, many AFC systems for bus transit are access-based: they record passengers entering the system but not passengers leaving it. As a result, additional information or assumptions are needed to infer the destinations of passengers.

Traditional electronic farebox equipment is still widely used to collect transit fares. Electronic fareboxes provide transactional records. Each record includes fare category, fare medium (e.g., cash, card, or transfer), current identifiers (i.e., operator, route, and run number), and a time stamp [5, 16]. Similar information could also be extracted from AFC systems, but passengers who pay by cash are not counted in AFC systems. When integrated with Automatic Vehicle Location (AVL) systems, the fareboxes can offer location-stamped data, which can be used to infer passenger boarding at individual bus stops [16]. Although passenger boarding instances are useful for understanding ridership trends, they are insufficient for scheduling, planning, and control applications. Navick and Furth proposed methods to estimate passenger miles, OD patterns, and bus loads using farebox data, assuming that the pattern of passenger alighting instances in one direction is symmetric with the pattern of passenger boarding in the opposite direction [5]. Using similar assumptions, Lu and Reddy developed algorithms to measure daily bus passenger miles using farebox data [17]. Nevertheless, the symmetry assumption tends to be invalid if passengers use different bus routes or modes for their return trips.

Electronic fareboxes are more prevalent than APC techniques in bus systems. For example, the Shanghai bus system served approximately 7.3 million passengers daily in 2014 using 16,717 buses on 1,354 bus routes [18]. Almost all buses are equipped with Global Positioning System (GPS) based AVL systems and fareboxes consisting of smartcard POS machines and coin fare collectors. However, only 265 of the buses are equipped with APC systems. On some bus routes, passenger boarding at individual stops has become available in real time, thanks to the integration of the farebox and AVL systems. However, efficient and economical solutions to obtain loads and OD flows are still unavailable.

Recently, some researchers attempted to collect passenger OD flows using Wi-Fi signal sensors [19, 20]. The sensors detect the unique Wi-Fi media access control (MAC) address of each device (e.g., smartphone). By detecting the MAC address at multiple locations over time, the origin and destination stops of the corresponding device can be inferred. Nevertheless, the detected devices only represent a sample of passengers. The resulting OD flow matrices may not be sufficient to capture the variation of passenger demand across bus trips.

In this study, we develop a hierarchical Bayesian model to estimate bus trip-level OD flow matrices and a period-level OD flow matrix based on farebox and Wi-Fi signal data. The period-level OD flow matrix represents passenger travel patterns on a bus route in a period. Bus loads and average journey length on each bus trip are derived directly from the estimated trip-level OD flow matrices. The performance of the proposed method is evaluated empirically on an operational bus route.

The remainder of this paper is organized as follows. The Wi-Fi signal sensor is introduced in Section 2 and the methodology of incorporating the farebox and Wi-Fi signal data is presented in Section 3. In Section 4, we show the empirical study followed by a discussion of the results and directions for future research.

#### 2. Wi-Fi Signal Sensor

Wi-Fi signal sensors detect the signals emitted from Wi-Fi modules installed in various mobile devices. The Wi-Fi modules are defined based on the IEEE 802.11 standard [21]. The basic unit for information exchange between devices is the frame. The standard protocol defines multiple types of frames, such as Beacon, Acknowledgment (ACK), Data, and Probe. An access point periodically sends Beacon frames to announce its presence. When a mobile device is connected to an access point, information is exchanged via Data or ACK frames. If a mobile device is not connected to an access point, it sends Probe frames to search for available access points. The information detected by Wi-Fi signal sensors includes the MAC addresses of the mobile device and the access point, frame type, time stamp, and signal strength. Signal strength is correlated with the distance between the Wi-Fi sensor and the mobile device.
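As a minimal sketch of how such sensor records might be handled, the following Python snippet stores the fields listed above and summarizes, for each device, the first and last time it was heard. The class name `WifiRecord`, the function `detection_windows`, and the signal-strength threshold of -80 dBm are illustrative assumptions, not part of the sensor described in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WifiRecord:
    """One detected frame (field names are hypothetical)."""
    device_mac: str    # MAC address of the mobile device
    ap_mac: str        # MAC address of the access point ("" if not connected)
    frame_type: str    # e.g. "Probe", "Data", "ACK", "Beacon"
    timestamp: float   # seconds since the start of the bus trip
    rssi_dbm: int      # signal strength in dBm


def detection_windows(records, min_rssi_dbm=-80):
    """Return {device_mac: (first_seen, last_seen, n_signals)}.

    Beacon frames are skipped because they are sent by access points,
    and weak signals are skipped because they may come from devices
    far from the sensor. The threshold is an assumed value.
    """
    windows = {}
    for r in records:
        if r.frame_type == "Beacon" or r.rssi_dbm < min_rssi_dbm:
            continue
        first, last, n = windows.get(r.device_mac, (r.timestamp, r.timestamp, 0))
        windows[r.device_mac] = (min(first, r.timestamp), max(last, r.timestamp), n + 1)
    return windows
```

The first and last detection times per device are the raw ingredients for inferring where the device boarded and alighted once they are compared with bus arrival times at stops.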

It is worth mentioning that Apple Inc. introduced random MAC addresses with the release of the iOS 8 mobile operating system to protect user location privacy [22]. The iOS 8 system provides a random address when the device is searching for a Wi-Fi network. Nevertheless, the feature works only on the iPhone 5S and iPhone 6, and only when the phone wakes up from sleep mode and is not associated with a Wi-Fi network [23]. In reality, the phone is seldom in sleep mode: many applications that use location services keep the phone awake even when the screen is switched off. In addition, a specific combination of settings is required for the random MAC address feature to work. Therefore, unless the user turns off Wi-Fi or switches to airplane mode, he or she is still likely to be tracked.

#### 3. Methodologies

##### 3.1. A Hierarchical Bayesian Model

On a given bus route, it is assumed that a farebox system, an AVL system, and a Wi-Fi signal sensor have been installed on the buses. For each bus trip, farebox data provide stop-level boarding and Wi-Fi signal data provide sampled OD flows. Based on passenger boarding and sampled OD flows, we develop a hierarchical Bayesian model to estimate trip-level OD flow matrices and a period-level OD flow matrix. The period-level OD flow matrix is defined by an Alighting Probability (AP) matrix $\boldsymbol{\alpha} = \{\alpha_{ij}\}$, where $\alpha_{ij}$ represents the probability of a passenger alighting at stop $j$ conditional on having boarded at stop $i$. The AP matrix is assumed to be stable over bus trips in a homogeneous period (see Ji et al. [24] for the definition of homogeneous periods). The volume OD flow matrix for bus trip $k$ is denoted by $T^k = \{t_{ij}^k\}$, where $t_{ij}^k$ represents the number of passengers travelling from stop $i$ to stop $j$. Note that $T^k$ is unobservable in the current problem; we observe only two related quantities: the row totals of $T^k$, in the form of passenger boarding obtained from the farebox system, and a sample of $T^k$ observed using Wi-Fi signal sensors.

It is reasonable to assume that the estimates of the $T^k$ are correlated and that the relationship can be captured by taking the $T^k$ as samples from a common population distribution with parameter $\boldsymbol{\alpha}$, the AP matrix. Therefore, we model the current problem hierarchically. That is, the observations are modelled conditional on parameters such as the trip-level OD flow matrices $T^k$, and these parameters are modelled conditional on hyperparameters such as the AP matrix $\boldsymbol{\alpha}$.

Some notation is necessary to present the hierarchical model. Let $T$ be the collection of $T^k$ on all bus trips. Denote $b_i^k$ as passenger boarding at stop $i$ on bus trip $k$, $B^k$ as the collection of $b_i^k$ on bus trip $k$, and $B$ as the collection of $B^k$. Let $s_{ij}^k$ be the number of sampled passengers boarding at stop $i$ and alighting at stop $j$ on bus trip $k$, $S^k$ be the sampled OD flow matrix on bus trip $k$, and $S$ be the collection of $S^k$.

The trip-level volume OD flow matrix $T^k$ is determined by the AP matrix $\boldsymbol{\alpha}$ and passenger boarding $B^k$. The associations among observed data on different bus trips are captured using a joint probability distribution for the volume OD flow matrices on different bus trips.

The hierarchical model structure could be better understood in the context of the assumed distributions on variables and observed data. The distributions adopted in this study have been widely used in the literature to depict the randomness of the observed data or process. Specifically, (1) conditional on the alighting probabilities and passenger boarding at a given stop on a given bus trip, the OD flows originating from the given stop and destined to downstream stops on the given bus trip are assumed to follow a multinomial distribution [7, 25]; (2) conditional on the volume OD flows on a given bus trip, the sampled OD flows are assumed to follow a hypergeometric distribution [26]; (3) the prior distribution of the alighting probabilities is assumed to be Dirichlet, which is a conjugate prior distribution of the multinomial distribution [27].
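The three distributional assumptions can be illustrated by simulating data for a single origin stop. The sketch below is only an illustration under stated assumptions: the function names, the three downstream stops, the boarding count of 12, and the sample size of 4 are all hypothetical, and the Wi-Fi sample is assumed to be drawn without replacement from the passengers boarding at that stop.

```python
import random


def sample_dirichlet(theta, rng):
    """Draw alighting probabilities from Dirichlet(theta) via normalized gamma draws."""
    draws = [rng.gammavariate(t, 1.0) for t in theta]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_trip(alpha, boardings, detected, rng):
    """Simulate one bus trip for one origin stop.

    alpha     : alighting probabilities over the downstream stops
    boardings : passengers boarding at the origin (the farebox count)
    detected  : number of those passengers seen by the Wi-Fi sensor
    """
    n = len(alpha)
    # Multinomial step: each passenger independently picks a downstream stop.
    destinations = rng.choices(range(n), weights=alpha, k=boardings)
    flows = [destinations.count(j) for j in range(n)]
    # Hypergeometric step: the sensor observes a sample without replacement.
    seen = rng.sample(destinations, detected)
    sampled = [seen.count(j) for j in range(n)]
    return flows, sampled


rng = random.Random(2017)
# Dirichlet step: one set of alighting probabilities shared by all trips in the period.
alpha = sample_dirichlet([1.0, 1.0, 1.0], rng)
trips = [simulate_trip(alpha, boardings=12, detected=4, rng=rng) for _ in range(3)]
```

Each simulated trip has its own volume flows, but all trips share the same alighting probabilities, which is what ties the trips together in the hierarchy.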

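Based on the above assumptions, the joint posterior likelihood of the AP matrix $\boldsymbol{\alpha}$ and the trip-level OD flow matrices $T$ is given by

$$p(\boldsymbol{\alpha}, T \mid B, S) \propto \prod_{i}\left[\frac{\Gamma\left(\sum_{j>i}\theta_{ij}\right)}{\prod_{j>i}\Gamma\left(\theta_{ij}\right)}\prod_{j>i}\alpha_{ij}^{\theta_{ij}-1}\right]\prod_{k}\prod_{i}\left[\frac{b_i^k!}{\prod_{j>i}t_{ij}^k!}\prod_{j>i}\alpha_{ij}^{t_{ij}^k}\right]\prod_{k}p\left(S^k \mid T^k\right), \tag{1}$$

where $\Gamma(\cdot)$ represents the gamma function, $b_i^k$ is the passenger boarding at stop $i$ on bus trip $k$, $t_{ij}^k$ is the OD flow from stop $i$ to stop $j$ on bus trip $k$, and $p(S^k \mid T^k)$ is the hypergeometric probability of the sampled OD flows on bus trip $k$ given the volume OD flows. $\theta_{ij}$ represents the hyperparameter of the prior distribution of the alighting probabilities and is positive for any feasible OD pair.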

The hyperparameter $\theta_{ij}$ can be interpreted as the number of observations for each OD pair that have already been observed. Prior information about the probability OD matrix, such as a model-derived OD matrix or historical OD flow data, is incorporated through the hyperparameter $\theta_{ij}$. When no prior information is available, a uniform prior distribution is adopted by setting $\theta_{ij} = 1$ for all feasible OD pairs. That is, we assign equal probability density to every vector of alighting probabilities whose entries sum to one.

##### 3.2. Estimates of OD Flow Matrices

Point estimates of the AP matrix $\boldsymbol{\alpha}$ and the trip-level OD flow matrices $T$ are valuable for various applications in practice. It is natural to choose the marginal posterior mode as the point estimate [28]. The derivations are presented in the following.

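The marginal posterior likelihood of the AP matrix $\boldsymbol{\alpha}$ is derived from (1) by summing it over all feasible OD matrices for each bus trip:

$$p(\boldsymbol{\alpha} \mid B, S) \propto \prod_{i}\prod_{j>i}\alpha_{ij}^{\theta_{ij}+\sum_{k}s_{ij}^k-1}, \tag{2}$$

where, for each stop $i$, the vector $\boldsymbol{\alpha}_i = \{\alpha_{ij}\}_{j>i}$ represents the alighting probabilities at stops downstream of stop $i$ for passengers having boarded at stop $i$, $\theta_{ij}$ is the prior hyperparameter, and $s_{ij}^k$ is the sampled OD flow from stop $i$ to stop $j$ on bus trip $k$. The marginal posterior likelihood of $\boldsymbol{\alpha}$ is determined by the prior and the sampled OD flows on all bus trips. Equation (2) is the density of a Dirichlet distribution for each $\boldsymbol{\alpha}_i$ [27], and its mode is given by

$$\hat{\alpha}_{ij} = \frac{\theta_{ij}+\sum_{k}s_{ij}^k-1}{\sum_{j'>i}\left(\theta_{ij'}+\sum_{k}s_{ij'}^k\right)-n_i}, \tag{3}$$

where $n_i$ represents the number of stops downstream of stop $i$.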
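The mode of a Dirichlet density with parameters $a_1, \dots, a_n$ is $(a_j - 1)/(\sum_{j'} a_{j'} - n)$ whenever every $a_j \ge 1$ and the denominator is positive. The sketch below applies this to one origin stop, assuming that each posterior parameter is the prior hyperparameter plus the sampled flow summed over all bus trips in the period. The function name and the example numbers are hypothetical.

```python
def alighting_probability_mode(theta, sampled_by_trip):
    """Posterior-mode alighting probabilities for one origin stop.

    theta           : Dirichlet hyperparameters, one per downstream stop
    sampled_by_trip : list of per-trip sampled flow vectors from this origin
    """
    n = len(theta)
    # Posterior parameter = prior hyperparameter + sampled flow over all trips.
    totals = [theta[j] + sum(trip[j] for trip in sampled_by_trip) for j in range(n)]
    denom = sum(totals) - n
    if denom <= 0 or any(t < 1 for t in totals):
        raise ValueError("mode formula needs every parameter >= 1 and a positive denominator")
    return [(t - 1) / denom for t in totals]


# Uniform prior (all hyperparameters equal to 1) and two trips of sampled flows.
mode = alighting_probability_mode([1, 1, 1], [[2, 1, 0], [1, 0, 0]])
```

With the uniform prior, the mode reduces to the pooled sample proportions (here 3/4, 1/4, and 0), which shows why larger hyperparameters act like additional prior observations.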

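The marginal posterior likelihood of the trip-level OD flow matrices $T$ derived from (1) by integrating over all feasible $\boldsymbol{\alpha}$ is given by

$$p(T \mid B, S) \propto \prod_{i}\left[\frac{\prod_{j>i}\Gamma\left(\theta_{ij}+\sum_{k}t_{ij}^k\right)}{\Gamma\left(\sum_{j>i}\theta_{ij}+\sum_{k}b_i^k\right)}\right]\prod_{k}\prod_{i}\frac{b_i^k!}{\prod_{j>i}t_{ij}^k!}\prod_{k}p\left(S^k \mid T^k\right), \tag{4}$$

where $\theta_{ij}$ is the prior hyperparameter, $t_{ij}^k$ and $b_i^k$ are the OD flow and the passenger boarding on bus trip $k$, and $p(S^k \mid T^k)$ is the hypergeometric probability of the sampled OD flows given the volume OD flows.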

The mode of (4) can be obtained by solving the maximization problem (5), whose objective function approximately equals the logarithm of (4); Stirling's approximation is applied to the logarithm of the gamma function [29].

Analyzing the first-order necessary conditions of the model yields the optimal value of the trip-level OD flow $t_{ij}^k$ in (7), which involves the sum of the sampled OD flows originating from stop $i$ on bus trip $k$. Equation (7) reveals that the sampled OD flows on bus trips other than trip $k$ also provide valuable information for the estimation of the OD flows on bus trip $k$.

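The trip-level passenger alighting instances, bus loads, and average journey length can be derived from the trip-level OD flow matrices. Let $t_{ij}^k$ be the OD flow from stop $i$ to stop $j$ and $b_i^k$ the passenger boarding at stop $i$ on bus trip $k$. Specifically, the alighting count, $a_j^k$, at stop $j$ on bus trip $k$ is given by

$$a_j^k = \sum_{i<j}t_{ij}^k. \tag{8}$$

Bus load, $l_j^k$, between stop $j$ and stop $j+1$ on bus trip $k$ is given by

$$l_j^k = \sum_{i \le j}\left(b_i^k - a_i^k\right). \tag{9}$$

And the average journey length, $\bar{D}^k$, on bus trip $k$ is given by

$$\bar{D}^k = \frac{\sum_{j}D_j\left(a_j^k - b_j^k\right)}{\sum_{i}b_i^k}, \tag{10}$$

where $D_j$ is the cumulative distance of stop $j$ from the departure terminal. The numerator of (10) represents the total distance all passengers on bus trip $k$ travelled. Note that it is not necessary to know the origin and destination stops of each passenger to obtain the total distance. The denominator represents the total number of passengers travelling on bus trip $k$.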
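The derived quantities can be computed with a few lines of code. The following sketch assumes an OD matrix stored as a nested list with zero entries for infeasible pairs and stops indexed from 0. The function name, the three-stop matrix, and the distances are made-up illustrations, not data from this study.

```python
def trip_summaries(od, cum_dist):
    """Alighting counts, bus loads, and average journey length for one trip.

    od[i][j]    : passengers travelling from stop i to stop j (zero unless j > i)
    cum_dist[j] : cumulative distance of stop j from the departure terminal
    """
    n = len(od)
    boardings = [sum(od[i]) for i in range(n)]                        # row totals
    alightings = [sum(od[i][j] for i in range(n)) for j in range(n)]  # column totals
    loads, on_board = [], 0
    for j in range(n - 1):
        on_board += boardings[j] - alightings[j]
        loads.append(on_board)  # load between stop j and stop j + 1
    # Total distance needs only stop-level boardings and alightings.
    total_distance = sum(cum_dist[j] * (alightings[j] - boardings[j]) for j in range(n))
    total_passengers = sum(boardings)
    average_length = total_distance / total_passengers if total_passengers else 0.0
    return alightings, loads, average_length


od = [[0, 2, 3],
      [0, 0, 4],
      [0, 0, 0]]
alightings, loads, average_length = trip_summaries(od, cum_dist=[0.0, 1.0, 3.0])
```

In this example the total distance from stop-level counts (19 passenger-kilometres if distances are in kilometres) matches the passenger-by-passenger sum 2 × 1 + 3 × 3 + 4 × 2, illustrating why individual origins and destinations are not needed for the numerator.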

##### 3.3. Inferring OD Flows from Wi-Fi Data

If a mobile device emits signals at a high frequency, we could infer the origin and destination stops for the device with high confidence. However, the time intervals between consecutive signals are random and could be long. Thus, we propose a probabilistic method to quantify the uncertainty about the OD pair along which the detected passenger travels. For illustration, we consider a given device on a general bus trip and let $N$ represent the number of signals emitted from the device and $\tau_n$ represent the time interval between the $n$th and $(n+1)$th signals. Note that the subscript indicating bus trips is omitted for convenience in the following.

It is assumed that the time interval $\tau_n$ between consecutive signals follows a distribution with parameter $\lambda$. Conditional on the Wi-Fi signals and passenger boarding $B$, (11) gives the posterior likelihood that the passenger carrying the given device boards at stop $i$ and alights at stop $j$. Its terms are the origin and destination stops of the given passenger, $o$ and $d$; a proportionality constant that satisfies the totality axiom; the prior distribution of $\lambda$; the probability of the given passenger originating from stop $i$, conditional on passenger boarding $B$; and the probability of the given passenger being destined to stop $j$. Equation (12) additionally involves two time intervals: the interval between the first Wi-Fi signal and the bus arrival time at stop $i$, and the interval between the bus arrival time at stop $j$ and the last Wi-Fi signal.

Substituting (12) into (11) yields (13).

The proportionality constant is the value such that the posterior likelihoods over all feasible stop pairs sum to one. Let $p_{ij}^m$ represent the posterior likelihood of travelling between stop $i$ and stop $j$ for passenger $m$. The sampled passenger flow $s_{ij}$ between stop $i$ and stop $j$ is estimated by aggregating $p_{ij}^m$ over $m$:

$$s_{ij} = \sum_{m}p_{ij}^m. \tag{14}$$
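The normalization and aggregation steps can be sketched as follows. The snippet takes, for each detected passenger, unnormalized posterior weights over feasible stop pairs as given inputs; how those weights are computed from the signal timing is not reproduced here. The function name and the example weights are hypothetical.

```python
def sampled_od_flows(weights_by_passenger):
    """Aggregate per-passenger posterior likelihoods into sampled OD flows.

    weights_by_passenger : list of dicts {(i, j): unnormalized weight},
                           one dict per detected passenger
    Returns {(i, j): expected number of sampled passengers on that pair}.
    """
    flows = {}
    for weights in weights_by_passenger:
        total = sum(weights.values())
        if total <= 0:
            continue  # no feasible pair has positive weight for this passenger
        for pair, w in weights.items():
            # Dividing by the total plays the role of the proportionality constant.
            flows[pair] = flows.get(pair, 0.0) + w / total
    return flows


flows = sampled_od_flows([
    {(0, 2): 3.0, (1, 2): 1.0},   # passenger 1: most likely boarded at stop 0
    {(0, 1): 1.0, (0, 2): 1.0},   # passenger 2: destination is ambiguous
])
```

Because each passenger's likelihoods sum to one, the aggregated flows sum to the number of detected passengers, and the resulting sampled flows are generally fractional rather than integer counts.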

For simplicity, (14) is used in (3) and (7) to estimate the period-level AP matrix and trip-level OD flow matrices. Incorporating the uncertainty in Wi-Fi OD flows into the estimation of OD flow matrices is left for future research.

#### 4. Empirical Evaluation

##### 4.1. Data

The data used in this study were collected on Route Jiahuang in Jiading District of Shanghai between 8 and 9 am on weekdays in June 2016 (see Figure 1 for the route map). The bus route is 19 km long. Buses operate with a headway of 18 minutes. Along the bus route, Huangdu, Laozhai, and Fangtai are three densely populated towns, and downtown Jiading is the center of Jiading District. This study focuses on passengers travelling from Huangdu town to North Jiading. The route in this direction has 20 bus stops and 190 feasible OD pairs.
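The count of feasible OD pairs follows from the number of stops if, as assumed here, a passenger can travel from any stop to any later stop in this direction. The short sketch below enumerates the pairs; the function name and the 1-based stop numbering are illustrative choices.

```python
from itertools import combinations


def feasible_od_pairs(n_stops):
    """All (origin, destination) pairs with the destination downstream of the origin."""
    return list(combinations(range(1, n_stops + 1), 2))


pairs = feasible_od_pairs(20)
n_pairs = len(pairs)  # 20 * 19 / 2 pairs for a 20-stop direction
```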