Journal of Advanced Transportation

Volume 2017, Article ID 4341532, 18 pages

https://doi.org/10.1155/2017/4341532

## Origin-Destination Estimation Using Probe Vehicle Trajectory and Link Counts

^{1}Department of Civil, Construction & Environmental Engineering, San Diego State University, San Diego, CA, USA

^{2}Baidu Online Network Technology Co., Ltd., Beijing, China

^{3}University Transportation Research Center, City College of New York, New York, NY, USA

Correspondence should be addressed to Xianfeng Yang; xyang@mail.sdsu.edu

Received 23 June 2016; Revised 13 October 2016; Accepted 14 November 2016; Published 23 January 2017

Academic Editor: Dongjoo Park

Copyright © 2017 Xianfeng Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This paper presents two origin-destination flow estimation models using sampled GPS positions of probe vehicles and link flow counts. The first model, named the SPP model (scaled probe OD as prior OD), uses the scaled probe vehicle OD matrix as the prior OD matrix and applies the conventional generalized least squares (GLS) framework to conduct OD correction using link counts; the second model, the PRA model (probe ratio assignment), is an extension of SPP in which the observed link probe ratios are also included as additional information in the OD estimation process. For both models, the study explored a new way to construct assignment matrices directly from sampled probe trajectories to avoid a sophisticated traffic assignment process. Then, for performance evaluation, a comprehensive numerical experiment was conducted using a simulation dataset. The results showed that when the distribution of probe vehicle ratios is homogeneous among different OD pairs, both proposed models achieved a similar degree of improvement compared with the prior OD pattern. However, when the distribution of probe vehicle ratios is heterogeneous across different OD pairs, the PRA model achieved a more significant reduction in OD flow estimation error than the SPP model. Grounded on both theoretical derivations and empirical tests, the study provided in-depth discussions regarding the strengths and challenges of probe vehicle based OD estimation models.

#### 1. Introduction

The origin-destination flow matrix (referred to as the OD matrix) is an essential input to many dynamic traffic assignment and traffic simulation systems. Conventionally, a prior OD matrix (or seed OD matrix) is estimated from a large-scale travel survey, which is usually conducted only every several years because of its prohibitively high cost. As a result, both the sampling rate and the update frequency of travel surveys are constrained, which sometimes leads to a biased estimate of the OD matrix. To contend with this problem, researchers have developed various OD correction models using link flow counts, which are collected routinely by detectors. According to Cascetta and Nguyen [1], most popular OD estimation models utilizing link flow counts can be formulated as an optimization problem of the following generalized form:

$$\min_{\mathbf{x}} \; F_1(\mathbf{x}, \tilde{\mathbf{x}}) + F_2(\mathbf{v}, \hat{\mathbf{v}}) \quad \text{s.t.} \quad \mathbf{v} = A(\mathbf{x}), \quad \mathbf{x} \in \Omega, \tag{1}$$

where $\mathbf{x}$ is the unknown OD flow matrix and $\tilde{\mathbf{x}}$ is the prior OD matrix (or seed matrix) representing the modeler's prior belief regarding the temporal and spatial distribution of the travel pattern. $\mathbf{v}$ and $\hat{\mathbf{v}}$ are the estimated and observed link flow vectors, respectively; $F_1(\cdot)$ ($F_2(\cdot)$) is a distance metric function that measures the discrepancy between $\mathbf{x}$ and $\tilde{\mathbf{x}}$ ($\mathbf{v}$ and $\hat{\mathbf{v}}$). In the first constraint of (1), the abstract function $A(\cdot)$ essentially represents the traffic assignment process through which the OD flows $\mathbf{x}$ are mapped to the estimated link flows $\mathbf{v}$. $A$ is usually referred to as the assignment matrix. $\Omega$ is the feasible domain of $\mathbf{x}$.

Depending on the form of metric functions, previous studies can be categorized into three groups: (1) generalized least squares models (GLS) studied by Bell [2, 3], Cascetta [4], Cascetta et al. [5], and Cascetta et al. [6]; (2) maximum likelihood models (ML) such as Spiess [7] and Cascetta and Nguyen [1]; and (3) Bayesian inference models (BI) studied by Maher [8]. In the above methods, the reference time period is divided into a sequence of uniform intervals and OD flows of all intervals are estimated simultaneously. This type of modeling approach is called simultaneous OD estimation method which is developed primarily for offline applications. Meanwhile, for online applications, sequential OD estimation technique based on Kalman filter technique received intensive research attention during the last several decades. Starting from Okutani and Stephanedes [9], subsequent studies along this direction include Nihan and Davis [10], Chang and Wu [11], Ashok and Ben-Akiva [12, 13], and Antoniou et al. [14]. The major difference between sequential and simultaneous OD estimation is that sequential estimation performs OD adjustment recursively at each interval based on measurements of current interval and estimated results from previous intervals. Consequently, sequential estimation methods are more suitable for online traffic state estimation applications due to higher computational efficiency.

Recently, the application of emerging surveillance technologies in the OD estimation field has drawn increasing attention from the research community. Many researchers have developed models that combine information from conventional detectors with emerging sensing technologies. Representative studies along this line include AVI systems (Dixon and Rilett [15], Zhou and Mahmassani [16], and Chen et al. [17]), vehicle plate scanning (Castillo et al. [18]), sporadic routing data (Parry and Hazelton [19]), GPS probe vehicles (Eisenman and List [20], Cao et al. [21]), floating car data (Ásmundsdóttir [22], Yang et al. [23]), and cell phone data (Sohn and Kim [24], Calabrese et al. [25], and Iqbal et al. [26]). The basic idea behind these studies is to improve OD estimation accuracy using extra information that was not available before. The conventional OD estimation model given by formulation (1) faces three major challenges: (1) the underspecified nature of the OD estimation problem; (2) the reliability of the prior OD matrix; and (3) the accuracy of the assignment matrix estimation.

First of all, in most transportation networks, the number of OD pairs exceeds the number of sensors. Therefore, without additional constraints such as a prior OD matrix, the OD estimation problem is underspecified. Using both laboratory experiments and real-world datasets, Marzano et al. [27] and Cascetta et al. [6] concluded that “a satisfactory updating, regardless of the quality of the prior estimate, can be obtained generally only when the ratio between the number of equations (independent observed link flows) and the number of unknowns (i.e., OD flows) is close to one.” Hence, increasing the number of observations is a major motivation for exploiting emerging data sources. The second challenge is the availability and accuracy of the prior OD matrix. Both Marzano et al. [27] and Cascetta et al. [6] demonstrated the importance of the prior OD matrix; Frederix et al. [28] also discussed the possibility of falling into locally optimal solutions given an inappropriate seed OD matrix. Moreover, depending on the area of analysis, a prior OD matrix with an acceptable level of reliability may not even exist. The third challenge is the computation of the assignment matrix, which usually involves a traffic assignment process. A benchmark study in this regard is the bilevel optimization model proposed by Yang et al. [29, 30]. Although the method can to some extent capture the impact of congestion on drivers’ route choice, some of the assumptions of the user equilibrium condition (such as perfect information) may not be satisfied in reality. To the best of our knowledge, no systematic way has been proposed to correct such potential bias with observed data, owing to the low observability of the assignment matrix.

In view of the above challenges and potential of GPS location data, this study discusses two GLS based estimation frameworks for OD flows. The basic idea is to take full advantage of the sampled probe vehicle trajectory data to tackle all three above-mentioned difficulties. First of all, by aggregating the origin and destination zones of each probe vehicle trace, one can obtain observed probe vehicle OD flows. Then by scaling up the probe OD matrix using certain set of penetration ratios (the proportion of probe vehicles among the entire vehicle population), a crude estimation of OD flows can be obtained. Such scaled probe OD matrix can serve as a perfect supplement or replacement of the target OD matrix. Eisenman and List [20] conducted an exploratory study in which the scaled probe OD matrix is used in conjunction with prespecified target OD matrix in the GLS formulation with externally computed assignment matrix. Van Aerde et al. [31] focused on the computation of probe penetration ratios by averaging the observed probe ratios at different sensor locations across the network. Although the model does not require target OD matrix as input, unsatisfactory estimation results were reported even under 20% probe penetration rate.

Another relevant study was conducted by Iqbal et al. [26] using mobile phone call records. The phone call records were first used to generate tower-to-tower transient OD matrix which is then scaled up using an optimization model in conjunction with microscopic simulation model. Also, Cao et al. [21] proposed a two-step framework to incorporate probe vehicle data: in the first phase, link flows without traffic sensors are estimated based on observed link speed (from probe vehicles) and precalibrated macroscopic speed-density relationship; in the second phase, a bilevel GLS estimator is formulated to estimate OD flows. Similarly, Tan et al. [32] developed a dynamic OD estimation model using Automatic Vehicle Location information, where DTA is used for obtaining traffic assignment matrix. An important issue that is overlooked by above studies is the heterogeneity of probe penetration ratios among different OD pairs. Such situation may occur especially when probe vehicles are certain type of commercial vehicles. In practice, both delivery vehicles and taxi vehicles could be used as probes. When using delivery vehicles as probes, their restricted OD distribution in a given network would inevitably be an issue and it may affect the accuracy of OD estimation. So taxis would be a better choice. However, the proportion of taxi trips between distant OD pairs may still significantly differ from the other OD pairs. Such probe ratio heterogeneity is considered explicitly in this study. Another key concept proposed in this study is to estimate the assignment matrix directly from sampled trajectory data instead of running some complex traffic assignment process. There are two benefits in doing so: first is by replacing traffic assignment with map matching and data processing, one can avoid sophisticated traffic assignment computation and parameter calibration; moreover, the model does not depend on any theoretical assumption regarding drivers’ behavior.

Based on above discussions, two models are presented in this paper. The first model, SPP model (stands for scaled probe OD matrix as prior OD), uses scaled probe vehicle OD matrix as prior OD matrix and applies conventional GLS framework to conduct OD correction using link counts; the second model, PRA model (stands for probe ratio assignment), is an extension of SPP in which the observed link probe ratios are included as additional information in order to explicitly account for heterogeneity of probe penetration ratios. The remaining part of this paper is organized as follows: Section 2 explains basic concepts and notations. Section 3 contains detailed model specifications and is organized into three subsections: the development of SPP and PRA model and computation of assignment matrices. Section 4 discusses the solution algorithm. Section 5 presents the numerical experiment. Finally, Section 6 summarizes the conclusions.

#### 2. Some Definitions and Notations

Consider a road network represented by a directed graph $G = (N, L)$, where $N$ is the node set and $L$ is the link set. The analysis period is divided into $H$ uniform intervals, each of which is called a demand interval. Let $W$ denote the set of origin-destination pairs; then the travel demand pattern of the network during the analysis period is represented by the OD flow matrix $\mathbf{x} = \{x_w^h\}$, $w \in W$ and $h = 1, \ldots, H$. Let $S$ denote the collection of links equipped with sensors; $S$ is a subset of $L$. $\hat{v}_l^h$, $l \in S$ and $h = 1, \ldots, H$, are the observed traffic flow counts of all vehicles at the sensor locations.

It is assumed that two types of vehicles travel in the network: probe vehicles and regular vehicles. Each probe vehicle is able to actively report its position in the form of GPS coordinates. Through a map matching algorithm, those GPS coordinates are transformed into the corresponding locations in the network. In reality, the actual locations of probe vehicles usually cannot be determined exactly because of GPS measurement error. Since this study places more emphasis on the theoretical aspects, it is assumed that the exact location is available whenever a probe vehicle reports its GPS coordinates.

The numbers of probe vehicles traveling between different OD pairs within each interval are called probe vehicle OD flows (simplified as probe OD flows) and are denoted by $\hat{g}_w^h$, $w \in W$ and $h = 1, \ldots, H$; for each OD pair, the proportion of probe vehicles in the total vehicle population within the same interval is called the OD probe vehicle penetration ratio (simplified as OD probe ratio) and is denoted by $\theta_w^h$, $w \in W$ and $h = 1, \ldots, H$. Note that OD probe ratios are both time- and OD-dependent.

The observed numbers of probe vehicles passing each sensor location are called probe vehicle link flows (simplified as probe link flows) and are represented by $\hat{q}_l^h$, $l \in S$ and $h = 1, \ldots, H$. The ratio of the probe link flow to the corresponding link traffic flow during each interval is called the link probe vehicle penetration ratio (simplified as link probe ratio) and is represented by $\hat{\rho}_l^h = \hat{q}_l^h / \hat{v}_l^h$, $l \in S$ and $h = 1, \ldots, H$, where $\hat{v}_l^h$ is the observed link flow count.

#### 3. Model Specifications

This section introduces two models for OD estimation. Note that traffic congestion may pose a great challenge to the estimation process. For simplicity, this study does not account for the dynamic routing that may occur in practice.

##### 3.1. Scaled Probe OD Matrix as Prior Matrix (SPP) Method

The SPP method consists of two steps: first, the prior OD matrix is estimated by scaling up the probe OD flows with the corresponding OD probe ratios, which are estimated by averaging the link probe ratios across the network; then the OD flows are estimated by correcting the prior OD matrix with a GLS formulation.

To make the paper self-contained, the “direct scaling” method proposed by Van Aerde et al. [31] is summarized briefly in this section. To estimate the OD probe ratios, the average link probe ratio across the entire network within each interval is computed by the following expression:

$$\bar{\theta}^h = \frac{\sum_{l \in S} \hat{q}_l^h}{\sum_{l \in S} \hat{v}_l^h}, \quad h = 1, \ldots, H, \tag{2}$$

where $\hat{q}_l^h$ are the observed link probe flows, $\hat{v}_l^h$ are the observed link flows at the sensor links $l \in S$, and $\bar{\theta}^h$ is the common value of the OD probe ratios $\theta_w^h$. Equation (2) computes the ratio of the total number of probe vehicles to the total number of vehicles observed across the entire network during one interval. Therefore expression (2) implicitly assumes that OD probe ratios are homogeneous among all OD pairs. Then $\bar{\theta}^h$ is used to compute the prior OD flows $\tilde{x}_w^h$ using the following equation:

$$\tilde{x}_w^h = \frac{\hat{g}_w^h}{\bar{\theta}^h}, \quad w \in W, \; h = 1, \ldots, H, \tag{3}$$

where $\hat{g}_w^h$ are the observed probe OD flows.
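As an illustration of the direct scaling step, the following minimal Python sketch computes the network-wide probe ratio for one interval and scales the probe OD flows by it. The function names, the dictionary layout, and the numbers are hypothetical and chosen only for illustration; they are not taken from the paper's dataset.

```python
def average_probe_ratio(probe_link_flows, link_flows):
    """Network-wide probe ratio for one interval:
    total probes counted at sensors / total vehicles counted at sensors."""
    total_vehicles = sum(link_flows)
    if total_vehicles <= 0:
        raise ValueError("observed link flows must sum to a positive number")
    return sum(probe_link_flows) / total_vehicles


def scale_probe_od(probe_od_flows, ratio):
    """Scale observed probe OD flows up to a prior OD estimate
    using one common probe ratio for all OD pairs."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("probe ratio must lie in (0, 1]")
    return {od: flow / ratio for od, flow in probe_od_flows.items()}


# Hypothetical counts at three sensors during one interval.
probe_link = [12, 8, 20]      # probe vehicles seen at each sensor
all_link = [120, 80, 200]     # all vehicles seen at each sensor
ratio = average_probe_ratio(probe_link, all_link)

# Hypothetical probe OD flows for three OD pairs.
probe_od = {("1", "4"): 6, ("2", "4"): 4, ("3", "4"): 2}
prior_od = scale_probe_od(probe_od, ratio)
```

Because a single ratio is applied to every OD pair, any OD pair whose true probe share differs from the network average is scaled incorrectly, which is the bias the GLS correction is meant to reduce.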

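Note that the prior OD flows $\tilde{x}_w^h$ are themselves an estimator of the OD flows. This method is referred to as the direct scaling (DS) model in this study. Since the prior OD matrix is only a crude estimate, it is then adjusted using the following GLS formulation:

SPP model:

$$\begin{aligned}
\min_{\mathbf{x}} \; Z &= \sum_{h=1}^{H} \sum_{w \in W} \frac{\left(x_w^h - \tilde{x}_w^h\right)^2}{\sigma^2_{\tilde{x}_w^h}} + \sum_{h=1}^{H} \sum_{l \in S} \frac{\left(v_l^h - \hat{v}_l^h\right)^2}{\sigma^2_{\hat{v}_l^h}} && \text{(4a)} \\
\text{s.t.} \quad v_l^h &= \sum_{k=\max(1,\,h-K)}^{h} \sum_{w \in W} a_{lw}^{kh}\, x_w^k, \quad l \in S, \; h = 1, \ldots, H, && \text{(4b)} \\
\left| x_w^h - x_w^{h-1} \right| &\le \delta\, x_w^{h-1}, \quad w \in W, \; h = 2, \ldots, H, && \text{(4c)} \\
x_w^h &\ge 0, \quad w \in W, \; h = 1, \ldots, H, && \text{(4d)}
\end{aligned}$$

where $x_w^h$ and $\tilde{x}_w^h$ are the unknown and prior OD flows, respectively; $v_l^h$ and $\hat{v}_l^h$ are the estimated and observed link flows, respectively; $\sigma^2_{\tilde{x}_w^h}$ and $\sigma^2_{\hat{v}_l^h}$ are, respectively, the variances of $\tilde{x}_w^h$ and $\hat{v}_l^h$; and $\delta$ is the maximum percentage change of OD flows between two consecutive intervals.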

Constraint (4b) essentially represents the flow assignment process. $a_{lw}^{kh}$ is the proportion (or probability) of vehicles that depart during the $k$th interval, travel between OD pair $w$, and pass link $l$ during the $h$th interval; $a_{lw}^{kh}$ is referred to as a flow assignment fraction in this study. The estimated link flows, $v_l^h$, are then expressed as the weighted sum of all OD flows that departed no later than interval $h$. $K$ is the maximum travel time among all journeys, converted to a number of demand intervals. In this study, the flow assignment fractions are computed directly from probe vehicle trajectories.

The basic idea of the SPP model is to reduce the estimation bias of the direct scaling method using sensor count information. Let $n_W$ and $n_S$ be the numbers of OD pairs and traffic sensors, respectively; then the ratio of the total number of unknowns to the number of independent observations for the SPP model is $n_W / n_S$. Note that the target OD flows are obtained from GPS probe vehicle data; no additional target OD matrix from other sources is required by the model.

Finally, formulations (4a)~(4d) constitute a nonlinear optimization problem with convex objective function and linear constraints. Therefore the global optimal solution exists and any solution satisfying local optimality condition is also the global optimal solution of the problem. A gradient based searching algorithm as discussed in Appendix A is adopted to solve SPP model.
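To make the structure of such a GLS correction concrete, the sketch below runs a plain projected gradient iteration on a single-interval version of the problem (so the between-interval change constraint does not apply and only nonnegativity is enforced). It is a simplified stand-in under these assumptions, not the algorithm of Appendix A; the function names, the scalar variances, and the example numbers are hypothetical.

```python
def gls_objective(x, prior, counts, assign, var_x=1.0, var_v=1.0):
    """Weighted squared distance to the prior OD flows plus weighted
    squared distance between estimated and observed link flows."""
    od_term = sum((xi - pi) ** 2 for xi, pi in zip(x, prior)) / var_x
    est = [sum(a * xi for a, xi in zip(row, x)) for row in assign]
    link_term = sum((e - c) ** 2 for e, c in zip(est, counts)) / var_v
    return od_term + link_term


def spp_single_interval(prior, counts, assign, var_x=1.0, var_v=1.0, iters=2000):
    """Projected gradient descent for a one-interval GLS OD correction.

    prior  : prior OD flows, length = number of OD pairs
    counts : observed link counts, length = number of sensors
    assign : sensors x OD pairs matrix of flow assignment fractions
    """
    n_od, n_link = len(prior), len(counts)
    # Step size from an upper bound on the gradient's Lipschitz constant
    # (squared Frobenius norm bounds the largest eigenvalue of A^T A).
    frob_sq = sum(a * a for row in assign for a in row)
    step = 1.0 / (2.0 * (1.0 / var_x + frob_sq / var_v))
    x = list(prior)
    for _ in range(iters):
        resid = [sum(a * xi for a, xi in zip(row, x)) - c
                 for row, c in zip(assign, counts)]
        grad = [2.0 * (x[w] - prior[w]) / var_x
                + 2.0 * sum(assign[l][w] * resid[l] for l in range(n_link)) / var_v
                for w in range(n_od)]
        # Gradient step followed by projection onto x >= 0.
        x = [max(0.0, x[w] - step * grad[w]) for w in range(n_od)]
    return x


# Hypothetical example: three OD pairs observed by two sensors.
prior = [60.0, 40.0, 20.0]
counts = [110.0, 70.0]
assign = [[1.0, 1.0, 0.0],
          [0.0, 1.0, 1.0]]
x_hat = spp_single_interval(prior, counts, assign)
```

Since the objective is convex and the feasible set is a convex cone, this iteration converges to the global minimizer of the simplified problem; the relative sizes of `var_x` and `var_v` control how far the estimate moves away from the prior toward the link counts.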

##### 3.2. Probe Ratios Assignment Model (PRA)

The second formulation proposed in this study is called the probe ratio assignment model (referred to as PRA). The underlying idea of PRA is to explicitly consider the correlation between OD probe ratios and observed link probe ratios.

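Thus, there exists some function $B(\cdot)$ that links the OD probe ratios $\boldsymbol{\theta}$ and the estimated link probe ratios $\boldsymbol{\rho}$:

$$\boldsymbol{\rho} = B(\boldsymbol{\theta}). \tag{5}$$

The structure of $B$ is similar to that of the flow assignment matrix $A$; the difference is that $B$ represents the assignment of probe vehicle ratios instead of flows. In this study, $B$ is called the probe ratio assignment matrix. Based on this discussion, we can now define the following equation:

$$\rho_l^h = \sum_{k=\max(1,\,h-K)}^{h} \sum_{w \in W} b_{lw}^{kh}\, \frac{\hat{g}_w^k}{x_w^k}, \quad l \in S, \; h = 1, \ldots, H. \tag{6}$$

Here $b_{lw}^{kh}$ are the probe ratio assignment fractions. $b_{lw}^{kh}$ represents the contribution of the probe ratio of OD pair $w$ during the $k$th interval to the probe ratio of link $l$ during the $h$th interval. Note that in the above equation, the OD probe ratios are estimated by taking the ratio between $\hat{g}_w^k$, the observed probe OD flows, and $x_w^k$, the OD flows we want to estimate. Let $\hat{\rho}_l^h$ be the observed link probe ratios at all sensor locations. Then the PRA model can be obtained by extending the SPP formulation to consider $\hat{\rho}_l^h$ and to incorporate equation (6):

$$\begin{aligned}
\min_{\mathbf{x}} \; Z &= \sum_{h=1}^{H} \sum_{w \in W} \frac{\left(x_w^h - \tilde{x}_w^h\right)^2}{\sigma^2_{\tilde{x}_w^h}} + \sum_{h=1}^{H} \sum_{l \in S} \frac{\left(v_l^h - \hat{v}_l^h\right)^2}{\sigma^2_{\hat{v}_l^h}} + \sum_{h=1}^{H} \sum_{l \in S} \frac{\left(\rho_l^h - \hat{\rho}_l^h\right)^2}{\sigma^2_{\hat{\rho}_l^h}} && \text{(7a)} \\
\text{s.t.} \quad v_l^h &= \sum_{k=\max(1,\,h-K)}^{h} \sum_{w \in W} a_{lw}^{kh}\, x_w^k, \quad l \in S, \; h = 1, \ldots, H, && \text{(7b)} \\
\rho_l^h &= \sum_{k=\max(1,\,h-K)}^{h} \sum_{w \in W} b_{lw}^{kh}\, \frac{\hat{g}_w^k}{x_w^k}, \quad l \in S, \; h = 1, \ldots, H, && \text{(7c)} \\
\left| x_w^h - x_w^{h-1} \right| &\le \delta\, x_w^{h-1}, \quad w \in W, \; h = 2, \ldots, H, && \text{(7d)} \\
x_w^h &\ge 0, \quad w \in W, \; h = 1, \ldots, H. && \text{(7e)}
\end{aligned}$$

In the above formulation, note that $\sigma^2_{\hat{\rho}_l^h}$ is the variance of $\hat{\rho}_l^h$. The other notations were introduced previously.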

The objective function of PRA model (7a) adds a third term to that of SPP (4a) which is the sum of weighted distances between estimated and observed link probe ratios. And the first and second terms of (7a) are identical to those of (4a). Constraints (7b) and (7c) represent the assignment of OD flows and OD probe ratios.

The primary feature of PRA as given by (7a)~(7e) is the utilization of a new set of field observations: the observed link probe ratios, which combine the information in flow counts and probe vehicle trajectories. The OD probe ratio assignment matrix is also a new concept introduced in this study. Let $n_W$ and $n_S$ be the numbers of OD pairs and traffic sensors, respectively; then the ratio of the total number of unknowns to the number of independent observations for the PRA model is $n_W / (2 n_S)$, which is lower than that of the SPP model. Essentially, once the probe vehicle trajectories are considered, each sensor provides two observations instead of only one: the first is the flow count of the entire vehicle population and the second is the proportion of probe vehicles passing the sensor location. The computation of the probe ratio assignment matrix is discussed in the next subsection.
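The additional term in the PRA objective can be illustrated with a short single-interval sketch: the estimated link probe ratio is a weighted sum of the OD probe ratios (probe OD flow divided by candidate OD flow), and the term penalizes its squared distance from the observed ratio. The function names, the single scalar variance, and the numbers are hypothetical.

```python
def estimated_link_probe_ratios(probe_od, od_flows, ratio_assign):
    """Estimated link probe ratios for one interval.

    probe_od     : observed probe OD flows, length = number of OD pairs
    od_flows     : candidate total OD flows (the unknowns), same length
    ratio_assign : sensors x OD pairs matrix of probe ratio assignment fractions
    """
    if any(x <= 0 for x in od_flows):
        raise ValueError("OD flows must be positive to form probe ratios")
    od_ratios = [g / x for g, x in zip(probe_od, od_flows)]
    return [sum(b * r for b, r in zip(row, od_ratios)) for row in ratio_assign]


def probe_ratio_term(rho_est, rho_obs, var_rho=1.0):
    """Weighted squared distance between estimated and observed link probe ratios."""
    return sum((e - o) ** 2 for e, o in zip(rho_est, rho_obs)) / var_rho


# Hypothetical case: one sensor link carrying all traffic of two OD pairs.
probe_od = [6.0, 2.0]          # probe vehicles per OD pair
od_flows = [60.0, 40.0]        # candidate total flows per OD pair
ratio_assign = [[0.6, 0.4]]    # share of the link flow from each OD pair
rho_est = estimated_link_probe_ratios(probe_od, od_flows, ratio_assign)
penalty = probe_ratio_term(rho_est, [0.08])
```

In this example the link carries 8 probe vehicles out of 100 vehicles, so an observed ratio of 0.08 is matched by the candidate OD flows. Because the unknown flows appear in a denominator, the term is not convex in the OD flows.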

Note that the PRA formulation is no longer a convex optimization problem due to the existence of (7c); therefore solving the problem using gradient based searching algorithm faces the possibility of being trapped in local optimums. The solution algorithm of PRA model is discussed in Section 4.

##### 3.3. Computation of Assignment Matrices Using Probe Vehicle Trajectories

In this study, both the flow assignment fractions and the probe ratio assignment fractions are estimated by analyzing the GPS trajectories of probe vehicles. Compared with conventional methods (i.e., traffic assignment models), the main feature of the proposed approach is that it replaces the dynamic traffic assignment process with a map matching procedure for GPS coordinates.

The underlying concept is relatively straightforward. The entire vehicle population is divided into two groups: probe vehicles and regular vehicles. And all the assignment fractions of probe vehicle population can be obtained from their time-dependent location information. Then one can use those to approximate the assignment fractions of the entire vehicle population assuming that probe vehicles are randomly sampled.

Define $n_{lw}^{kh}$ as the observed number of probe vehicles that travel between OD pair $w$, depart during interval $k$, and pass sensor $l$ during interval $h$. Then the flow assignment fractions of probe vehicles can be computed as

$$a_{lw}^{kh} = \frac{1}{H} \sum_{d=1}^{H} \left( \frac{n_{lw}^{d,\,d+\tau}}{\hat{g}_w^{d}} \right), \quad \tau = h - k. \tag{8}$$

In the above equation, the denominator $\hat{g}_w^{d}$ is the total number of probe vehicles that departed during interval $d$ between OD pair $w$. Therefore the fraction inside the parentheses is the proportion of vehicles passing link $l$ $\tau$ intervals after their departure interval. Note that probe vehicles that departed in different time intervals are aggregated to obtain a single estimate of the assignment matrix. The underlying assumption is that there exists some time period during which drivers’ route choice behavior and network traffic conditions remain approximately stable, so that a single flow assignment matrix (or OD probe ratio assignment matrix) can represent the relationship between OD flows and link flows (or OD probe ratios and link probe ratios). The purpose is to increase the estimation accuracy of the assignment matrix by aggregating probe vehicles from multiple demand intervals. Extending (8) to a time-dependent form is also straightforward: essentially, probe vehicles are grouped according to their departure time. Suppose the entire period of analysis is divided into $T$ assignment intervals, each containing $M$ demand intervals. Let $A_t$, $t = 1, \ldots, T$, be the assignment matrices; then the elements of $A_t$ can be estimated from probe data collected during assignment interval $t$ using

$$a_{lw}^{\tau}(t) = \frac{1}{M} \sum_{m=1}^{M} \left( \frac{n_{lw}^{h_m^t,\,h_m^t+\tau}}{\hat{g}_w^{h_m^t}} \right), \quad t = 1, \ldots, T, \tag{9}$$

where $h_m^t$ is the $m$th demand interval during the $t$th assignment interval. Note that the level of traffic congestion on the network directly affects the values of the assignment fractions.
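The counting procedure can be sketched in Python as follows. The sketch assumes that each map-matched probe trip is available as a tuple of OD pair, departure interval, and a list of (sensor link, passing interval) records, and that per-interval proportions are averaged over the departure intervals in which the OD pair has at least one probe; this data layout, the function name, and the averaging choice are assumptions made for illustration.

```python
from collections import defaultdict


def flow_assignment_fractions(trips):
    """Estimate flow assignment fractions from probe trips.

    trips : iterable of (od, depart_interval, passes), where passes is a
            list of (sensor_link, pass_interval) records for one vehicle.
    Returns {(sensor_link, od, lag): fraction}, with lag = pass - depart.
    """
    departures = defaultdict(int)     # (od, k) -> probes departing in k
    pass_counts = defaultdict(int)    # (link, od, k, lag) -> probes counted
    for od, k, passes in trips:
        departures[(od, k)] += 1
        # Count each vehicle at most once per (link, interval).
        for link, h in set(passes):
            pass_counts[(link, od, k, h - k)] += 1

    intervals_per_od = defaultdict(set)
    for od, k in departures:
        intervals_per_od[od].add(k)

    # Sum the per-interval proportions, then average over departure intervals.
    sums = defaultdict(float)
    for (link, od, k, lag), n in pass_counts.items():
        sums[(link, od, lag)] += n / departures[(od, k)]
    return {key: total / len(intervals_per_od[key[1]])
            for key, total in sums.items()}


# Hypothetical trips: six probes on OD pair 1-4 split over two routes,
# and four probes on OD pair 2-4 reaching sensor "b" one interval later.
od14, od24 = ("1", "4"), ("2", "4")
trips = ([(od14, 0, [("a", 0)])] * 4
         + [(od14, 0, [("b", 0)])] * 2
         + [(od24, 0, [("b", 1)])] * 4)
fractions = flow_assignment_fractions(trips)
```

Here `fractions[("a", od14, 0)]` is the share of OD pair 1-4 probes that pass sensor "a" within their departure interval. With few probes per OD pair and interval these shares are noisy, which is the motivation for pooling several demand intervals.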

The probe ratio assignment fractions can be computed using the following equation:

Similar to (9), the probe ratio assignment fractions in (10) can be approximated using the observed numbers of probe vehicles that departed during interval $k$ and passed sensor $l$ during interval $h$, together with the link flows detected by the sensors.
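The following short derivation sketches why a linear relation between OD probe ratios and link probe ratios can be expected. It assumes that probe vehicles and regular vehicles of the same OD pair and departure interval share the same flow assignment fractions; the symbols ($a$ for flow assignment fractions, $b$ for probe ratio assignment fractions, $\theta$ for OD probe ratios, $x$ for OD flows, $v$ for link flows, $q$ for probe link flows, $\rho$ for link probe ratios) are this sketch's notation, and the result is not claimed to be identical to the paper's equation (10).

```latex
% Probe link flow: probes of each OD pair follow the same fractions as all vehicles.
q_l^h = \sum_{k} \sum_{w} a_{lw}^{kh}\, \theta_w^k\, x_w^k ,
\qquad
v_l^h = \sum_{k} \sum_{w} a_{lw}^{kh}\, x_w^k .

% Dividing by the link flow gives the link probe ratio as a weighted sum of OD probe ratios.
\rho_l^h = \frac{q_l^h}{v_l^h}
         = \sum_{k} \sum_{w} \underbrace{\frac{a_{lw}^{kh}\, x_w^k}{v_l^h}}_{b_{lw}^{kh}} \, \theta_w^k ,
\qquad
\sum_{k} \sum_{w} b_{lw}^{kh} = 1 .
```

Under this assumption each weight is the share of the link flow contributed by one OD pair and departure interval, so every link probe ratio is a convex combination of OD probe ratios. When all OD probe ratios are equal, every link probe ratio equals that common value, which matches the homogeneity assumption behind direct scaling.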

This section provides an illustrative example to show how (9) and (10) work. Consider a hypothetical network consisting of four nodes and five links (shown in Figure 1(a)). Among the nodes, nodes 1, 2, and 3 are demand generation nodes and node 4 is a demand absorption node. Therefore there are three OD pairs, i.e., 1-4, 2-4, and 3-4. It is assumed that traffic sensors are installed at the middle of each link. A total of 12 probe vehicles departed during one demand interval (therefore the number of demand intervals is one); their distribution over the three OD pairs is 6, 4, and 2 vehicles, respectively.