Research Article  Open Access
Xianfeng Yang, Yang Lu, Wei Hao, "Origin-Destination Estimation Using Probe Vehicle Trajectory and Link Counts", Journal of Advanced Transportation, vol. 2017, Article ID 4341532, 18 pages, 2017. https://doi.org/10.1155/2017/4341532
Origin-Destination Estimation Using Probe Vehicle Trajectory and Link Counts
Abstract
This paper presents two origin-destination flow estimation models that use sampled GPS positions of probe vehicles and link flow counts. The first model, the SPP model (scaled probe OD as prior OD), uses the scaled probe vehicle OD matrix as the prior OD matrix and applies the conventional generalized least squares (GLS) framework to correct the OD matrix using link counts; the second model, the PRA model (probe ratio assignment), is an extension of SPP in which the observed link probe ratios are also included as additional information in the OD estimation process. For both models, the study explores a new way to construct assignment matrices directly from sampled probe trajectories, avoiding a sophisticated traffic assignment process. For performance evaluation, a comprehensive numerical experiment was conducted using a simulation dataset. The results show that when the distribution of probe vehicle ratios is homogeneous among OD pairs, both proposed models achieve a similar degree of improvement over the prior OD pattern. However, when the distribution of probe vehicle ratios is heterogeneous across OD pairs, the PRA model achieves a more significant reduction in OD flow estimation error than the SPP model. Grounded in both theoretical derivations and empirical tests, the study provides in-depth discussions of the strengths and challenges of probe-vehicle-based OD estimation models.
1. Introduction
The origin-destination flow matrix (referred to as the OD matrix) is an essential input to many dynamic traffic assignment and traffic simulation systems. Conventionally, a prior OD matrix (or seed OD matrix) is estimated from a large-scale travel survey, which is usually conducted only once every several years because of its prohibitively high cost. As a result, both the sampling rate and the update frequency of travel surveys are constrained, which sometimes leads to a biased estimate of the OD matrix. To contend with this problem, researchers have developed various OD correction models using link flow counts, which are collected routinely by detectors. According to Cascetta and Nguyen [1], most popular OD estimation models utilizing link flow counts can be formulated as an optimization problem taking the following generalized form:

$$\min_{x} \; F_1(x, \hat{x}) + F_2(v, \hat{v}) \quad \text{s.t.} \quad v = A(x), \quad x \in \Omega, \tag{1}$$

where $x$ is the unknown OD flow matrix and $\hat{x}$ is the prior OD matrix (or seed matrix) representing the modeler’s prior belief regarding the temporal and spatial distribution of the travel pattern. $v$ and $\hat{v}$ are the estimated and observed link flow vectors, respectively; $F_1(\cdot)$ and $F_2(\cdot)$ are distance metric functions that measure the discrepancy between $x$ and $\hat{x}$ (and between $v$ and $\hat{v}$). In the first constraint of (1), the abstract function $A(\cdot)$ essentially represents the traffic assignment process through which the OD flows $x$ are mapped to the estimated link flows $v$. $A$ is usually referred to as the assignment matrix. $\Omega$ is the feasible domain of $x$.
Depending on the form of the metric functions, previous studies can be categorized into three groups: (1) generalized least squares models (GLS) studied by Bell [2, 3], Cascetta [4], Cascetta et al. [5], and Cascetta et al. [6]; (2) maximum likelihood models (ML) such as Spiess [7] and Cascetta and Nguyen [1]; and (3) Bayesian inference models (BI) studied by Maher [8]. In the above methods, the reference time period is divided into a sequence of uniform intervals and the OD flows of all intervals are estimated simultaneously. This type of modeling approach is called the simultaneous OD estimation method and is developed primarily for offline applications. Meanwhile, for online applications, sequential OD estimation techniques based on the Kalman filter have received intensive research attention during the last several decades. Starting from Okutani and Stephanedes [9], subsequent studies along this direction include Nihan and Davis [10], Chang and Wu [11], Ashok and Ben-Akiva [12, 13], and Antoniou et al. [14]. The major difference between sequential and simultaneous OD estimation is that sequential estimation performs OD adjustment recursively at each interval, based on the measurements of the current interval and the estimated results from previous intervals. Consequently, sequential estimation methods are more suitable for online traffic state estimation applications because of their higher computational efficiency.
Recently, the application of emerging surveillance technologies in the OD estimation field has drawn increasing attention from the research community. Many researchers have developed their own models to combine the information from conventional detectors and emerging sensing technologies. Representative studies along this line include AVI systems (Dixon and Rilett [15], Zhou and Mahmassani [16], and Chen et al. [17]), vehicle plate scanning (Castillo et al. [18]), sporadic routing data (Parry and Hazelton [19]), GPS probe vehicles (Eisenman and List [20], Cao et al. [21]), floating car data (Ásmundsdóttir [22], Yang et al. [23]), and cell phone data (Sohn and Kim [24], Calabrese et al. [25], and Iqbal et al. [26]). The basic idea behind these studies is to improve OD estimation accuracy using extra information that was not available before. The conventional OD estimation model given by formulation (1) faces three major challenges: (1) the underspecified nature of the OD estimation problem; (2) the reliability of the prior OD matrix; and (3) the accuracy of the assignment matrix estimation.
First of all, in most transportation networks, the number of OD pairs exceeds the number of sensors. Therefore, without specifying additional constraints, such as a prior OD matrix, the OD estimation problem is underspecified. Through both laboratory experiments and real-world datasets, Marzano et al. [27] and Cascetta et al. [6] concluded that “a satisfactory updating, regardless of the quality of the prior estimate, can be obtained generally only when the ratio between the number of equations (independent observed link flows) and the number of unknowns (i.e., OD flows) is close to one.” Hence, increasing the amount of observation is a major motivation for exploiting those emerging data sources. A second challenge is the availability and accuracy of the prior OD matrix. Both Marzano et al. [27] and Cascetta et al. [6] demonstrated the importance of the prior OD matrix; Frederix et al. [28] also discussed the possibility of falling into local optima given an inappropriate seed OD matrix. Besides, depending on the area of analysis, a prior OD matrix with an acceptable level of reliability may not even exist. The third challenge is the computation of the assignment matrix, which usually involves a traffic assignment process. A benchmark study in this regard is the bilevel optimization model proposed by Yang et al. [29, 30]. Although the method can to some extent capture the impact of congestion on drivers’ route choice, some of the assumptions of the user equilibrium condition (such as perfect information) may not be satisfied in reality. To the best of our knowledge, no systematic way has been proposed to correct such potential bias with observed data, because of the low observability of the assignment matrix.
In view of the above challenges and the potential of GPS location data, this study discusses two GLS-based estimation frameworks for OD flows. The basic idea is to take full advantage of the sampled probe vehicle trajectory data to tackle all three above-mentioned difficulties. First of all, by aggregating the origin and destination zones of each probe vehicle trace, one can obtain observed probe vehicle OD flows. Then, by scaling up the probe OD matrix using a certain set of penetration ratios (the proportion of probe vehicles among the entire vehicle population), a crude estimate of the OD flows can be obtained. Such a scaled probe OD matrix can serve as a supplement to, or a replacement for, the target OD matrix. Eisenman and List [20] conducted an exploratory study in which the scaled probe OD matrix is used in conjunction with a prespecified target OD matrix in the GLS formulation with an externally computed assignment matrix. Van Aerde et al. [31] focused on the computation of probe penetration ratios by averaging the observed probe ratios at different sensor locations across the network. Although the model does not require a target OD matrix as input, unsatisfactory estimation results were reported even under a 20% probe penetration rate.
Another relevant study was conducted by Iqbal et al. [26] using mobile phone call records. The phone call records were first used to generate a tower-to-tower transient OD matrix, which is then scaled up using an optimization model in conjunction with a microscopic simulation model. Also, Cao et al. [21] proposed a two-step framework to incorporate probe vehicle data: in the first phase, flows on links without traffic sensors are estimated based on the observed link speed (from probe vehicles) and a precalibrated macroscopic speed-density relationship; in the second phase, a bilevel GLS estimator is formulated to estimate OD flows. Similarly, Tan et al. [32] developed a dynamic OD estimation model using Automatic Vehicle Location information, where DTA is used to obtain the traffic assignment matrix. An important issue overlooked by the above studies is the heterogeneity of probe penetration ratios among different OD pairs. Such a situation may occur especially when the probe vehicles are a certain type of commercial vehicle. In practice, both delivery vehicles and taxis could be used as probes. When delivery vehicles are used as probes, their restricted OD distribution in a given network would inevitably be an issue and may affect the accuracy of OD estimation, so taxis would be a better choice. However, the proportion of taxi trips between distant OD pairs may still differ significantly from that of the other OD pairs. Such probe ratio heterogeneity is considered explicitly in this study. Another key concept proposed in this study is to estimate the assignment matrix directly from sampled trajectory data instead of running a complex traffic assignment process. There are two benefits in doing so: first, by replacing traffic assignment with map matching and data processing, one can avoid sophisticated traffic assignment computation and parameter calibration; second, the model does not depend on any theoretical assumption regarding drivers’ behavior.
Based on above discussions, two models are presented in this paper. The first model, SPP model (stands for scaled probe OD matrix as prior OD), uses scaled probe vehicle OD matrix as prior OD matrix and applies conventional GLS framework to conduct OD correction using link counts; the second model, PRA model (stands for probe ratio assignment), is an extension of SPP in which the observed link probe ratios are included as additional information in order to explicitly account for heterogeneity of probe penetration ratios. The remaining part of this paper is organized as follows: Section 2 explains basic concepts and notations. Section 3 contains detailed model specifications and is organized into three subsections: the development of SPP and PRA model and computation of assignment matrices. Section 4 discusses the solution algorithm. Section 5 presents the numerical experiment. Finally, Section 6 summarizes the conclusions.
2. Some Definitions and Notations
Consider a road network represented by a directed graph $G = (N, A)$, where $N$ is the node set and $A$ is the link set. The analysis period is divided into $H$ uniform intervals; each interval is called a demand interval. Let $W$ denote the origin-destination pair set; then the travel demand pattern of the network during the analysis period is represented by the OD flow matrix $x = \{x_w^h\}$, $w \in W$ and $h = 1, \ldots, H$. Let $S$ denote the collection of links equipped with sensors; $S$ is a subset of $A$. $\hat{v}_l^h$, $l \in S$, are the observed traffic flow counts of all vehicles at the sensor locations.
It is assumed that two types of vehicles travel in the network: probe vehicles and regular vehicles. Each probe vehicle is able to actively report its position in the form of GPS coordinates. Through a map matching algorithm, those GPS coordinates are transformed to the corresponding locations in the network. In reality, the actual locations of probe vehicles usually cannot be fully determined because of GPS measurement error. Since this study focuses on the theoretical aspects, it is assumed that the exact location is available whenever a probe vehicle reports its GPS coordinates.
The numbers of probe vehicles traveling between different OD pairs within each interval are called probe vehicle OD flows (probe OD flows for short) and are denoted by $\hat{p}_w^h$, $w \in W$ and $h = 1, \ldots, H$; for each OD pair, the proportion of probe vehicles in the total vehicle population within the same interval is called the OD probe vehicle penetration ratio (OD probe ratio for short) and is denoted by $\theta_w^h$, $w \in W$ and $h = 1, \ldots, H$. Note that OD probe ratios are both time-dependent and OD-dependent.
The observed numbers of probe vehicles passing each sensor location are called probe vehicle link flows (probe link flows for short) and are represented by $\hat{q}_l^h$, $l \in S$ and $h = 1, \ldots, H$. The ratio of the probe link flow to the corresponding link traffic flow during each interval is called the link probe vehicle penetration ratio (link probe ratio for short) and is represented by $\hat{\rho}_l^h$, $l \in S$ and $h = 1, \ldots, H$.
3. Model Specifications
This section introduces two different models for OD estimation. It is worth noting that traffic congestion may pose a great challenge to the estimation process. For simplicity, this study does not account for the dynamic rerouting that may occur in practice.
3.1. Scaled Probe OD Matrix as Prior Matrix (SPP) Method
The SPP method consists of two steps: first, the prior OD matrix is estimated by scaling up the probe OD flows with the corresponding OD probe ratios, which are estimated by averaging the link probe ratios across the network; then the OD flows are obtained by correcting the prior OD matrix with a GLS formulation.
To make the paper self-contained, the “direct scaling” method proposed by Van Aerde et al. [31] is summarized briefly in this section. To estimate the OD probe ratios, the average link probe ratio across the entire network within each interval is computed by the following expression:

$$\bar{\theta}^h = \frac{\sum_{l \in S} \hat{q}_l^h}{\sum_{l \in S} \hat{v}_l^h}, \tag{2}$$

where $\hat{q}_l^h$ are the observed link probe flows, $\hat{v}_l^h$ are the observed link flows, and $\bar{\theta}^h$ is the common value of the OD probe ratios during interval $h$. Equation (2) computes the average ratio of the total number of probe vehicles to the total number of vehicles observed across the entire network during one interval. Therefore expression (2) implicitly assumes that the OD probe ratios are homogeneous among all OD pairs. Then $\bar{\theta}^h$ is used to compute the prior OD flows $\hat{x}_w^h$ using the following equation:

$$\hat{x}_w^h = \frac{\hat{p}_w^h}{\bar{\theta}^h}, \tag{3}$$

where $\hat{p}_w^h$ are the observed probe OD flows.
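The two direct scaling steps can be illustrated with a minimal Python sketch. The function names, the dictionary layout, and the sample counts are hypothetical and chosen only for illustration; the sketch assumes a single demand interval and a single network-wide probe ratio, as in the homogeneous case described above.

```python
def average_probe_ratio(probe_link_flows, link_flows):
    """Network-wide probe ratio for one interval:
    total probe vehicles counted at sensors / total vehicles counted at sensors."""
    total = sum(link_flows)
    if total == 0:
        raise ValueError("no vehicles observed at any sensor")
    return sum(probe_link_flows) / total


def direct_scaling(probe_od_flows, ratio):
    """Scale observed probe OD flows up to a crude prior OD estimate."""
    if ratio <= 0:
        raise ValueError("probe ratio must be positive")
    return {od: count / ratio for od, count in probe_od_flows.items()}


# Hypothetical counts at three sensors during one demand interval.
probe_counts = [12, 30, 18]      # probe vehicles seen at each sensor
all_counts = [100, 200, 100]     # all vehicles seen at each sensor

theta = average_probe_ratio(probe_counts, all_counts)
prior = direct_scaling({("1", "4"): 6, ("2", "4"): 3}, theta)
```

Because a single ratio is applied to every OD pair, any OD pair whose true probe ratio differs from the network average is scaled incorrectly, which is the bias the subsequent GLS correction is meant to reduce.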
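It is worth noting that the prior OD flows are themselves an estimator of the OD flows. This method is referred to as the direct scaling (DS) model in this study. Since the prior OD matrix is only a crude estimate, it is then adjusted using the following GLS formulation (SPP model):

$$\min_{x} \; \sum_{h=1}^{H} \sum_{w \in W} \frac{(x_w^h - \hat{x}_w^h)^2}{\mathrm{var}(\hat{x}_w^h)} + \sum_{h=1}^{H} \sum_{l \in S} \frac{(v_l^h - \hat{v}_l^h)^2}{\mathrm{var}(\hat{v}_l^h)} \tag{4a}$$

subject to

$$v_l^h = \sum_{k=h-K}^{h} \sum_{w \in W} a_{lw}^{kh} x_w^k, \tag{4b}$$

together with linear constraints that limit the relative change of each OD flow between two consecutive intervals to at most $\delta$. Here $x_w^h$ and $\hat{x}_w^h$ are the unknown and prior OD flows, respectively; $v_l^h$ and $\hat{v}_l^h$ are the estimated and observed link flows, respectively; $\mathrm{var}(\hat{x}_w^h)$ and $\mathrm{var}(\hat{v}_l^h)$ are, respectively, the variances of $\hat{x}_w^h$ and $\hat{v}_l^h$; and $\delta$ is the maximum percentage change of OD flows between two consecutive intervals.
Constraint (4b) essentially represents the flow assignment process. $a_{lw}^{kh}$ is the proportion (or probability) of vehicles that departed during the $k$th interval traveling between OD pair $w$ and pass link $l$ during the $h$th interval; $a_{lw}^{kh}$ is referred to as the flow assignment fraction in this study. The estimated link flows, $v_l^h$, are then expressed as the weighted sum of all OD flows that departed no later than interval $h$. $K$ is the maximum travel time among all journeys, converted to a number of demand intervals. In this study, the flow assignment fractions are computed directly from probe vehicle trajectories.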
The basic idea of the SPP model is to reduce the estimation bias of the direct scaling method using sensor count information. Let $n_{OD}$ and $n_S$ be the numbers of OD pairs and traffic sensors, respectively; then the ratio of the total number of unknowns to independent observations for the SPP model is $n_{OD}/n_S$. Note that the target OD flows are obtained from GPS probe vehicle data; no additional target OD matrix from other sources is required by the model.
Finally, formulations (4a)~(4d) constitute a nonlinear optimization problem with convex objective function and linear constraints. Therefore the global optimal solution exists and any solution satisfying local optimality condition is also the global optimal solution of the problem. A gradient based searching algorithm as discussed in Appendix A is adopted to solve SPP model.
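To make the structure of the SPP objective concrete, the following Python sketch evaluates the two GLS terms for a candidate OD matrix. It is an illustration under simplifying assumptions that are not taken from the paper: the assignment fractions depend only on the lag between departure and passage (`a[l][w][lag]`), and a single scalar variance is used for each term. All names and numbers are hypothetical.

```python
def estimated_link_flows(x, a, max_lag):
    """Map OD flows to link flows.

    x[w][h]      : OD flow of OD pair w departing in interval h
    a[l][w][lag] : fraction of that flow passing link l `lag` intervals later
    """
    n_links, n_od, n_int = len(a), len(x), len(x[0])
    v = [[0.0] * n_int for _ in range(n_links)]
    for l in range(n_links):
        for h in range(n_int):
            for w in range(n_od):
                # Only departures from the last `max_lag` intervals can reach the link.
                for lag in range(min(max_lag, h) + 1):
                    v[l][h] += a[l][w][lag] * x[w][h - lag]
    return v


def spp_objective(x, x_prior, v_obs, a, var_x, var_v, max_lag):
    """Weighted squared deviation from the prior OD flows plus
    weighted squared deviation from the observed link counts."""
    v = estimated_link_flows(x, a, max_lag)
    od_term = sum((xi - pi) ** 2
                  for row, prior_row in zip(x, x_prior)
                  for xi, pi in zip(row, prior_row)) / var_x
    link_term = sum((vi - oi) ** 2
                    for row, obs_row in zip(v, v_obs)
                    for vi, oi in zip(row, obs_row)) / var_v
    return od_term + link_term


# One OD pair, one sensor, two intervals; half of each departure passes the
# sensor in the departure interval and half in the next one.
x = [[100.0, 80.0]]
a = [[[0.5, 0.5]]]
flows = estimated_link_flows(x, a, max_lag=1)
value = spp_objective(x, [[90.0, 80.0]], [[50.0, 100.0]], a,
                      var_x=100.0, var_v=25.0, max_lag=1)
```

The objective is a sum of squares of quantities that are linear in the OD flows, which is why the problem is convex when the constraints are linear.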
3.2. Probe Ratios Assignment Model (PRA)
The second formulation proposed in this study is called the probe ratio assignment model (referred to as PRA). The underlying idea of PRA is to explicitly consider the correlation between OD probe ratios and observed link probe ratios.
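Thus, there exists some function $B(\cdot)$ that links the OD probe ratios $\theta$ and the estimated link probe ratios $\rho$:

$$\rho = B(\theta). \tag{5}$$

The structure of $B$ is similar to that of the flow assignment matrix $A$; the difference is that $B$ represents the assignment of probe vehicle ratios instead of flows. In this study, $B$ is called the probe ratio assignment matrix. Based on this discussion, we can now define the following equation:

$$\rho_l^h = \sum_{k=h-K}^{h} \sum_{w \in W} b_{lw}^{kh} \, \frac{\hat{p}_w^k}{x_w^k}, \tag{6}$$

with $K$ the maximum travel time expressed in demand intervals.
Here $b_{lw}^{kh}$ are the probe ratio assignment fractions. $b_{lw}^{kh}$ represents the contribution of the probe ratio of OD pair $w$ during the $k$th interval to the probe ratio of link $l$ during the $h$th interval. Note that in the above equation, the OD probe ratios are estimated by taking the ratio between $\hat{p}_w^k$, the observed probe OD flows, and $x_w^k$, the OD flows we want to estimate. Let $\hat{\rho}_l^h$ be the observed link probe ratios at all sensor locations. Then the PRA model can be obtained by extending the SPP formulation to include $\hat{\rho}_l^h$ and incorporating equation (6):

$$\min_{x} \; \sum_{h=1}^{H} \sum_{w \in W} \frac{(x_w^h - \hat{x}_w^h)^2}{\mathrm{var}(\hat{x}_w^h)} + \sum_{h=1}^{H} \sum_{l \in S} \frac{(v_l^h - \hat{v}_l^h)^2}{\mathrm{var}(\hat{v}_l^h)} + \sum_{h=1}^{H} \sum_{l \in S} \frac{(\rho_l^h - \hat{\rho}_l^h)^2}{\mathrm{var}(\hat{\rho}_l^h)} \tag{7a}$$

subject to

$$v_l^h = \sum_{k=h-K}^{h} \sum_{w \in W} a_{lw}^{kh} x_w^k, \tag{7b}$$

$$\rho_l^h = \sum_{k=h-K}^{h} \sum_{w \in W} b_{lw}^{kh} \, \frac{\hat{p}_w^k}{x_w^k}, \tag{7c}$$

together with the remaining constraints on the OD flows carried over from the SPP formulation.
In the above formulation, please note that $\mathrm{var}(\hat{\rho}_l^h)$ is the variance of $\hat{\rho}_l^h$. The other notations were introduced previously.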
The objective function of PRA model (7a) adds a third term to that of SPP (4a) which is the sum of weighted distances between estimated and observed link probe ratios. And the first and second terms of (7a) are identical to those of (4a). Constraints (7b) and (7c) represent the assignment of OD flows and OD probe ratios.
The primary feature of PRA as given by (7a)~(7e) is the utilization of a new set of field observations: the observed link probe ratios, which combine the information of flow counts and probe vehicle trajectories. The OD probe ratio assignment matrix $B$ is also a new concept introduced in this study. Let $n_{OD}$ and $n_S$ be the numbers of OD flows and traffic sensors, respectively; then the ratio of the total number of unknowns to independent observations for the PRA model is $n_{OD}/(2 n_S)$, which is lower than that of the SPP model. Essentially, each sensor provides two observations instead of only one after considering the probe vehicle trajectories: the first is the flow count of the entire vehicle population and the second is the proportion of probe vehicles passing the sensor location. The computation of $B$ is discussed in the next subsection.
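The additional term that PRA contributes can be sketched in Python as follows. The sketch assumes, for illustration only, that the probe ratio assignment fractions depend on the lag between departure and passage (`b[l][w][lag]`) and that one scalar variance applies to all link probe ratios; the function names and numbers are hypothetical.

```python
def estimated_link_probe_ratios(x, p_obs, b, max_lag):
    """Link probe ratio as a weighted sum of OD probe ratios.

    x[w][h]      : candidate OD flow (must be positive)
    p_obs[w][h]  : observed probe OD flow
    b[l][w][lag] : probe ratio assignment fraction
    """
    n_links, n_od, n_int = len(b), len(x), len(x[0])
    rho = [[0.0] * n_int for _ in range(n_links)]
    for l in range(n_links):
        for h in range(n_int):
            for w in range(n_od):
                for lag in range(min(max_lag, h) + 1):
                    k = h - lag
                    # OD probe ratio = observed probe OD flow / estimated OD flow.
                    rho[l][h] += b[l][w][lag] * p_obs[w][k] / x[w][k]
    return rho


def probe_ratio_term(rho, rho_obs, var_rho):
    """Third GLS term: weighted squared gap between estimated and observed link probe ratios."""
    return sum((r - o) ** 2
               for row, obs_row in zip(rho, rho_obs)
               for r, o in zip(row, obs_row)) / var_rho


# Two OD pairs with different probe ratios (10% and 30%) sharing one sensor.
x = [[100.0], [100.0]]
p_obs = [[10.0], [30.0]]
b = [[[0.75], [0.25]]]
rho = estimated_link_probe_ratios(x, p_obs, b, max_lag=0)
penalty = probe_ratio_term(rho, [[0.2]], var_rho=0.01)
```

Because the unknown OD flow appears in the denominator of each OD probe ratio, this term is not a convex function of the OD flows in general.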
Note that the PRA formulation is no longer a convex optimization problem due to the existence of (7c); therefore solving the problem with a gradient-based search algorithm faces the possibility of being trapped in local optima. The solution algorithm of the PRA model is discussed in Section 4.
3.3. Computation of Assignment Matrices Using Probe Vehicle Trajectories
In this study, both the flow assignment fractions and the probe ratio assignment fractions are estimated by analyzing the GPS trajectories of probe vehicles. Compared with conventional methods (i.e., traffic assignment models), the main feature of the proposed approach is that it replaces the dynamic traffic assignment process with a map matching procedure applied to GPS coordinates.
The underlying concept is relatively straightforward. The entire vehicle population is divided into two groups: probe vehicles and regular vehicles. And all the assignment fractions of probe vehicle population can be obtained from their timedependent location information. Then one can use those to approximate the assignment fractions of the entire vehicle population assuming that probe vehicles are randomly sampled.
Defining as the observed number of probe vehicles traveling between OD pair and departed during interval passed sensor during interval , then the flow assignment fractions of probe vehicles can be computed as
In the above equation, the denominator is essentially the total number of probe vehicles of the given OD pair that departed during the given interval. Therefore the fraction inside the parentheses is the proportion of vehicles passing the link a given number of intervals after their departure interval. Note that probe vehicles that departed at different time intervals are aggregated to obtain a single estimate of the assignment matrix. The underlying assumption is that there exists some time period during which drivers’ route choice behavior and network traffic conditions remain approximately stable, so that one can use a single flow assignment matrix (or OD probe ratio assignment matrix) to represent the correlation between OD flows and link flows (or OD probe ratios and link probe ratios). The purpose is to increase the estimation accuracy of the assignment matrix by aggregating probe vehicles from multiple demand intervals. Extending (8) into a time-dependent form is also straightforward: essentially, probe vehicles are grouped according to their departure time. Suppose that the entire analysis period is divided into several assignment intervals, each containing several demand intervals. Each assignment interval then has its own assignment matrix, which can be estimated by applying (8) to the probe data collected during the demand intervals of that assignment interval; this time-dependent form is referred to as (9). It is worth noting that the traffic congestion level on the network directly affects the values of the assignment fractions.
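The tabulation described above can be sketched in Python. This is one plausible reading rather than the exact estimator: probes of the same OD pair are pooled across departure intervals, and the fraction for a given sensor and lag is the pooled number of probes passing that sensor at that lag divided by the pooled number of probes of that OD pair. The record layout, sensor names, and trips are hypothetical.

```python
from collections import Counter


def flow_assignment_fractions(trips):
    """Estimate lag-based flow assignment fractions from probe trips.

    trips: iterable of (od, depart_interval, passages), where passages is a
    list of (sensor, pass_interval) pairs recorded for one probe vehicle.
    Returns {(sensor, od, lag): fraction}.
    """
    departures = Counter()        # probes per OD pair, pooled over departure intervals
    passages_by_lag = Counter()   # probes per (sensor, od, lag)
    for od, depart, passages in trips:
        departures[od] += 1
        # set() guards against a vehicle being logged twice at the same sensor and interval.
        for sensor, passed in set(passages):
            passages_by_lag[(sensor, od, passed - depart)] += 1
    return {key: count / departures[key[1]]
            for key, count in passages_by_lag.items()}


# Four probes of OD pair 1-4: three use sensor s1 and one uses s2 in the
# departure interval; all pass s3 one interval later.
trips = [
    ("1-4", 0, [("s1", 0), ("s3", 1)]),
    ("1-4", 0, [("s1", 0), ("s3", 1)]),
    ("1-4", 1, [("s1", 1), ("s3", 2)]),
    ("1-4", 1, [("s2", 1), ("s3", 2)]),
]
fractions = flow_assignment_fractions(trips)
```

Grouping the trips by assignment interval before calling the function would give one set of fractions per assignment interval, which corresponds to the time-dependent extension.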
The probe ratio assignment fractions can be computed using the following equation:
Similar to (9), the estimation of probe ratio assignment fractions in (10) can be approximated by using the observed number of probe vehicles departed during interval passed sensor during interval and sensor detected link flows.
This section provides an illustrative example to show how (9) and (10) work. Consider a hypothetical network consisting of four nodes and five links (shown in Figure 1(a)). Nodes 1, 2, and 3 are demand generation nodes and node 4 is a demand absorption node. Therefore there are three OD pairs, i.e., 1-4, 2-4, and 3-4. It is assumed that traffic sensors are installed at the middle of each link. A total of 12 probe vehicles departed during one demand interval; the distribution across the three OD pairs is 6, 4, and 2 vehicles, respectively.
(a) Network topology and sensor locations
(b) Initial network condition
Note that the initial locations of all probe vehicles are plotted in Figure 1(b).
Suppose that all vehicles finished their trip within the following four time intervals and each vehicle is observed at least once during each interval. The positions of all vehicles at the end of each subsequent interval are visualized by Figures 2(a)~2(d). Note that vehicles traveling between different OD pairs are painted with different patterns.
(a) Probe vehicle locations at the end of interval 1
(b) Probe vehicle locations at the end of interval 2
(c) Probe vehicle locations at the end of interval 3
(d) Probe vehicle locations at the end of interval 4
According to observed vehicle positions at the end of each interval, one can identify, from individual vehicle’s viewpoint, the sensor location passed by each probe vehicle during each interval which is summarized in Table 1.

According to Table 1, are computed and summarized in Table 7. Then the flow assignment fractions and probe ratio assignment fractions are computed according to (9) and (10). The results are summarized in Table 8.
4. Computation Procedures
This section presents the numerical solution algorithm for the proposed models. Our objective is a nonlinear optimization problem with equality and inequality constraints. Mathematically, such optimization problem takes the following general form:
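In formulation (11), the objective function $f$ is a scalar function of the $n$-dimensional vector $x$. Since $f$ is continuous and differentiable in this study, its gradient can be obtained analytically. There are also linear equality constraints and linear inequality constraints. The gradient vector of $f$ is denoted by $\nabla f$:

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)^{T}. \tag{12}$$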
The solution algorithm is summarized as follows.
Step 1 (initialization). Determine an initial feasible solution that satisfies all constraints. Let be the initial feasible solution and set the current iteration ; then enter the main optimization loop consisting of Steps 2~6.
Step 2 (gradient computation). Based on , compute the gradient of objective function . The specific formulas used for gradient computation of SPP and PRA model are given by equations (A.3) and (B.3) in Appendices A and B, respectively.
Step 3 (optimal search direction calculation). Based on the current gradient, compute an optimal search direction considering all constraints. Let be the search direction of current iteration; is computed by solving the following LP model:In the above LP problem, is the coefficient vector of th constraint, ; and are the equality constraint set and bounded inequality constraint set.
Step 4 (optimal search step length calculation). Along the optimal search direction, perform a line search to determine the optimal step length. The algorithm follows an iterative procedure and is quite similar to the traditional Frank-Wolfe algorithm.
Step 5 (update ). .
Step 6 (check convergence criteria). If , then terminate the computation; otherwise repeat the process from Step 2 to Step 6.
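The loop of Steps 2~6 can be illustrated with a small Frank-Wolfe-style sketch in Python. It is a deliberately simplified stand-in, not the algorithm of this study: the feasible set is a box, so the direction-finding LP is solved by inspection (each coordinate moves to the bound favored by the gradient sign), the objective is a separable weighted least squares function, and the step length comes from an exact line search for that quadratic. All names and numbers are hypothetical.

```python
def frank_wolfe_box(center, weight, lower, upper, x0, tol=1e-9, max_iter=1000):
    """Minimize sum_i weight[i] * (x[i] - center[i])**2 subject to lower <= x <= upper."""
    x = list(x0)
    for _ in range(max_iter):
        # Step 2: analytical gradient of the quadratic objective.
        grad = [2.0 * w * (xi - ci) for xi, ci, w in zip(x, center, weight)]
        # Step 3: the LP min grad . y over a box is solved by picking a bound per coordinate.
        vertex = [lo if g > 0 else hi for g, lo, hi in zip(grad, lower, upper)]
        direction = [yi - xi for yi, xi in zip(vertex, x)]
        # Step 6 (checked early): a vanishing Frank-Wolfe gap means no feasible descent direction.
        gap = -sum(g * d for g, d in zip(grad, direction))
        if gap <= tol:
            break
        # Step 4: exact line search for a quadratic, clipped to stay feasible.
        curvature = sum(2.0 * w * d * d for w, d in zip(weight, direction))
        step = min(1.0, gap / curvature)
        # Step 5: move along the search direction.
        x = [xi + step * d for xi, d in zip(x, direction)]
    return x


# The unconstrained minimizer (12, -3) lies outside the box [0, 10] x [0, 10],
# so the constrained solution is the nearest corner.
solution = frank_wolfe_box(center=[12.0, -3.0], weight=[1.0, 1.0],
                           lower=[0.0, 0.0], upper=[10.0, 10.0], x0=[5.0, 5.0])
```

With general linear equality and inequality constraints, the direction-finding step would require an LP solver instead of the per-coordinate rule used here, and for a nonconvex objective such as that of PRA the result can depend on the starting point.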
5. Numerical Examples
5.1. Simulation Setup and Results
To evaluate the effectiveness of the proposed models, numerical experiments are conducted using VISSIM as a laboratory experiment tool. A synthetic dataset is used because of the lack of a real-world dataset. A road network in the northern part of Maryland State (near the I-495 beltway) is selected as the test site. The network consists of 28 nodes and 74 links. The bird's-eye view map of the target area and the network topology constructed in VISSIM are shown in Figures 3 and 4.
The simulation period is set to 3 hours, which is divided into 18 demand intervals (each interval is 10 minutes). To simplify the simulation process, 39 major OD pairs are selected in the simulation. As shown in Figure 4, 10 out of 38 links are installed with traffic sensors for traffic data collection. Table 10 summarizes the origin and destination nodes of each OD pair along with its time-dependent demand volumes; Table 11 summarizes the route choice probability of all paths between all OD pairs. For convenience of study, between each OD pair shown in Table 11, we only selected those paths whose lengths are obviously shorter than the others. Given the OD flows and route choice in the simulation network, it should be noted that the volume-to-capacity (v/c) ratios on all links are below 0.4. By running the simulation network in VISSIM, we collected the GPS trajectory of each vehicle and the traffic flow rate on those links with sensors. Notably, the GPS trajectories of all vehicles directly yield the ground truth OD flows. For model evaluations and comparisons, this study uses a part of the trajectory dataset (based on the preset probe ratio in each scenario) as model inputs, representing the probe vehicles.
To reveal the model properties under different network conditions, two scenarios are simulated. The first scenario (referred to as scenario A) represents the situation in which the probe vehicle penetration ratios are approximately homogeneous among different OD pairs; the second scenario (referred to as scenario B) represents the case in which the probe ratios are heterogeneous across OD pairs. For both scenarios, the average probe vehicle penetration ratio is set to 15%. The probe ratios across different OD flows in scenario B range from 5% to 30%.
For each model (DS, SPP, and PRA), the estimation accuracy of four sets of parameters is examined: (1) OD flows; (2) OD probe ratios; (3) link flow counts at sensor locations; and (4) link probe vehicle ratios at sensor locations. The estimated values are compared with ground truth values extracted from the simulator. The estimation quality is then quantified by the following five performance indicators: MSE (Mean Square Error), RMSE (Root Mean Square Error), MAPE (Mean Absolute Percentage Error), MSPE (Mean Square Percentage Error), and RMSPE (Root Mean Square Percentage Error). The definitions of the performance indicators are summarized by the following equations.
Performance Indicators

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\tilde{y}_i - y_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}},$$

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\tilde{y}_i - y_i}{y_i} \right|, \qquad \mathrm{MSPE} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{\tilde{y}_i - y_i}{y_i} \right)^2, \qquad \mathrm{RMSPE} = \sqrt{\mathrm{MSPE}}.$$

$\tilde{y}_i$ and $y_i$ are the estimated and ground truth values. $n$ is the number of estimates.
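A small Python sketch of the five indicators is given below, using their standard textbook definitions with percentage errors expressed as fractions (multiply by 100 for percent). The function name and the sample values are hypothetical; ground truth values of zero are rejected because the percentage errors are undefined for them.

```python
import math


def performance_indicators(estimated, truth):
    """Return MSE, RMSE, MAPE, MSPE, and RMSPE for paired estimates and ground truth values."""
    if len(estimated) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in truth):
        raise ValueError("percentage errors are undefined for zero ground truth")
    n = len(truth)
    errors = [e - t for e, t in zip(estimated, truth)]
    relative = [err / t for err, t in zip(errors, truth)]
    mse = sum(err ** 2 for err in errors) / n
    mspe = sum(r ** 2 for r in relative) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAPE": sum(abs(r) for r in relative) / n,
        "MSPE": mspe,
        "RMSPE": math.sqrt(mspe),
    }


# Two OD flows, one overestimated and one underestimated by 10 vehicles.
scores = performance_indicators([110.0, 90.0], [100.0, 100.0])
```

The squared indicators weight large deviations more heavily than MAPE does, so a model can rank differently under MAPE and MSPE when a few estimates are far off.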
Tables 2~5 summarize all the performance indicators for OD flows, OD probe ratios, link flows, and link probe ratios given by the DS, SPP, and PRA models under scenarios A and B. Using the SPP model as the benchmark, the improvement achieved by PRA is also presented. With the proposed solution algorithm, the estimation for every model can be completed within 3 minutes.

5.2. Results Interpretation
Figures 5, 6, and 7 show the comparison between estimated and ground truth values of different parameters displayed in the form of scatter plots.
Several important observations made from the model outputs are discussed in this section. First of all, the DS model tends to produce biased OD estimates in both scenarios, which can be observed from Table 2 and Figures 5(a) and 5(d). Particularly when the probe vehicle ratios are not homogeneous across different OD pairs, the percentage error of the DS model reached as high as 82% in MAPE and 125% in RMSPE. Therefore, directly scaled OD flows based solely on observed link probe ratios are not a reliable estimator of OD flows and require additional adjustment.
Secondly, in scenario A, both the SPP and PRA models offered significantly higher estimation quality compared with the DS model. Using SPP as the benchmark, in scenario A, the PRA model reduced the MAPE and MSPE of the OD matrix by 1.3% and 7.5%, respectively. Based on the performance indicators reported in the tables, one can argue that when the probe ratios are homogeneous across OD pairs, the two models have a similar degree of accuracy.
As for scenario B, where the probe ratios are not homogeneous, the results showed that the PRA model is more effective than the SPP model. According to Table 2, the PRA model reduced the MAPE and MSPE of the SPP model by 19.8% and 36.5%, respectively. This indicates that the PRA model offers additional correction capability compared with the SPP model. This conclusion can also be inspected visually through Figures 5(e) and 5(f). Similar results can also be found in Tables 3~4. In Table 5, both SPP and PRA outperform DS in estimating link probe ratios, while the MSE and RMSE show that the two models are almost identical. However, in terms of MAPE, MSPE, and RMSPE, the PRA model yields a significant improvement, which validates the model’s effectiveness.



Another important finding of the experiment is that it is generally more challenging to correct link probe ratios than flow counts. From Figures 6(b), 6(c), 6(e), and 6(f), one can observe that the regression line between estimated and observed link flow counts is very close to the 45-degree line for both models, while, in Figures 7(b), 7(c), 7(e), and 7(f), the dispersions of the link probe ratio scatter plots are obviously larger. One viable explanation is that the computation of the probe ratio assignment matrix is more difficult than that of the flow assignment matrix, and a better correction of link probe ratios can be achieved only when the probe ratio assignment matrix is more precisely specified.
To summarize, as a direct enhancement of the DS method, SPP is very effective when the distribution of OD probe ratios is homogeneous; on the other hand, the PRA model, by considering additional link observations, generally outperforms the SPP model when the OD probe ratios vary significantly among OD pairs. However, compared with the SPP and DS models, the PRA method also introduces several additional complexities. To maximize the benefit of PRA, one needs to correctly specify the probe ratio assignment matrix and the link probe ratio variances and also take care with the solution algorithm to avoid being trapped in local optima.
To further evaluate the effectiveness of the PRA model in estimating OD flows under different probe penetration ratios, this study conducted a sensitivity analysis in which the ratio ranges from 5% to 30%. As shown in Table 6, the estimation accuracy of PRA is quite sensitive to the average probe ratio: higher probe ratios result in more accurate OD flow estimates. In addition, when the probe ratio drops to 5%, the estimation results from PRA are not sufficiently reliable. Under such conditions, increasing the number of link sensors would help improve the model’s performance.
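One reason low penetration hurts can be seen from sampling noise alone. The sketch below assumes, purely for illustration, that each vehicle of an OD pair is a probe independently with probability equal to the probe ratio (binomial sampling). It computes the coefficient of variation of the scaled probe count for a hypothetical OD flow; it does not reproduce the PRA model or the values in Table 6.

```python
import math

def scaled_estimate_cv(od_flow, probe_ratio):
    """Coefficient of variation of (probe count / probe_ratio) under binomial sampling.

    With N ~ Binomial(od_flow, probe_ratio), Var(N / p) = od_flow * (1 - p) / p,
    so the standard deviation relative to od_flow is sqrt((1 - p) / (od_flow * p)).
    """
    return math.sqrt((1.0 - probe_ratio) / (od_flow * probe_ratio))

hypothetical_flow = 200  # vehicles per period for one OD pair (assumption)
for ratio in (0.05, 0.10, 0.20, 0.30):
    cv = scaled_estimate_cv(hypothetical_flow, ratio)
    print(f"probe ratio {ratio:.0%}: relative std of scaled estimate = {cv:.1%}")
```

Under this assumption the relative noise in the prior OD is roughly three times larger at a 5% ratio than at 30%, so the link observations have more error to correct.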



6. Conclusions and Future Research
This paper presented the development of two offline OD estimation models using probe vehicle data: the SPP and PRA models. Both the mathematical formulations and the solution algorithms are discussed in detail. The study also explored the possibility of computing assignment matrices directly from GPS trajectories to avoid a complex traffic assignment process.
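The idea of reading assignment fractions directly from trajectories can be sketched as follows. This is a simplified static illustration under stated assumptions, not the paper's procedure: each probe trip is assumed to be already map-matched to an origin, a destination, and a list of link identifiers, and the fraction for a link and OD pair is taken as the share of that pair's probe trips that traverse the link. All names and data are hypothetical.

```python
from collections import defaultdict

def assignment_fractions(trajectories):
    """Share of each OD pair's probe trips that traverse each link.

    trajectories: iterable of (origin, destination, [link ids]) tuples.
    Returns {(link, (origin, destination)): fraction}.
    """
    trips_per_od = defaultdict(int)
    link_hits = defaultdict(int)
    for origin, dest, links in trajectories:
        od = (origin, dest)
        trips_per_od[od] += 1
        for link in set(links):  # count a link once per trip
            link_hits[(link, od)] += 1
    return {key: hits / trips_per_od[key[1]] for key, hits in link_hits.items()}

def link_flows(fractions, od_flows):
    """Map OD flows to link flows using the assignment fractions."""
    flows = defaultdict(float)
    for (link, od), share in fractions.items():
        flows[link] += share * od_flows.get(od, 0.0)
    return dict(flows)

# Hypothetical map-matched probe trips (not data from the paper).
probe_trips = [
    ("A", "B", ["l1", "l2"]),
    ("A", "B", ["l1", "l3"]),
    ("A", "B", ["l1", "l2"]),
    ("C", "B", ["l3"]),
]
fractions = assignment_fractions(probe_trips)
flows = link_flows(fractions, {("A", "B"): 300.0, ("C", "B"): 100.0})
print(fractions)
print(flows)
```

With few probe trips per OD pair these fractions are noisy, which is consistent with the caution in this paper that probe-based fractions only approximate the true flow assignment fractions.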
Then, through a comprehensive numerical experiment, the performance of the proposed models was analyzed. It is shown that the distribution of OD probe ratios can affect the correction power of different models when probe vehicle data are used. When the OD probe ratios are approximately homogeneous across OD pairs, SPP and PRA performed equally well, reducing the relative error of the DS method by about half; however, when the OD probe ratios are nonhomogeneous, the PRA model outperformed the SPP model to some extent. The results also implied that when the OD probe ratios are heterogeneous, incorporating observed link probe ratios into the objective function can improve the overall estimation accuracy. However, unlike link flows, the correction of link probe ratios turns out to be much more challenging, and one needs to carefully specify the relationship between OD and link probe ratios, which is captured by the probe ratio assignment matrix in this study.
As for future research, the proposed models can be integrated with the quasi-dynamic approach proposed by Cascetta et al. [6] to further reduce the unknown-to-observation ratio; sensitivity analysis can also be performed to investigate the impact of different input parameters on the final outcomes; moreover, a numerical experiment using a real-world dataset is another important piece of future work needed to better assess model performance. In addition, one can note that the probe OD ratios are used as approximations of the flow assignment fractions, which may not hold in practice. Hence, it is also critical to develop a more advanced model to overcome this limitation and to apply a state-of-the-art method to estimate the traffic assignment matrix.
Appendix
A. Gradient Computation of SPP Model
According to the objective function given by (4a), define the following functions: Essentially, the original objective function is divided into two parts; therefore .
Taking derivatives of and with respect to , one can obtain the following expressions:
Note that the second derivative is computed based on the correlation between and given by constraint (4b). To summarize,
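As a point of reference, suppose the objective in (4a) takes the standard GLS form. The symbols below are assumptions introduced for illustration and may differ from the paper's notation: x is the OD flow vector, x-tilde the prior (scaled probe) OD vector, y-hat the observed link count vector, A the assignment matrix mapping OD flows to link flows as in a linear constraint of the type (4b), and W and V symmetric positive definite weighting matrices. Under these assumptions, the two parts of the objective and their gradients would read:

```latex
\begin{aligned}
Z(\mathbf{x}) &= Z_1(\mathbf{x}) + Z_2(\mathbf{x}),\\
Z_1(\mathbf{x}) &= (\mathbf{x}-\tilde{\mathbf{x}})^{\top}\mathbf{W}^{-1}(\mathbf{x}-\tilde{\mathbf{x}}),\\
Z_2(\mathbf{x}) &= (\mathbf{A}\mathbf{x}-\hat{\mathbf{y}})^{\top}\mathbf{V}^{-1}(\mathbf{A}\mathbf{x}-\hat{\mathbf{y}}),\\
\nabla_{\mathbf{x}} Z_1 &= 2\,\mathbf{W}^{-1}(\mathbf{x}-\tilde{\mathbf{x}}),\\
\nabla_{\mathbf{x}} Z_2 &= 2\,\mathbf{A}^{\top}\mathbf{V}^{-1}(\mathbf{A}\mathbf{x}-\hat{\mathbf{y}}),\\
\nabla_{\mathbf{x}} Z &= 2\,\mathbf{W}^{-1}(\mathbf{x}-\tilde{\mathbf{x}}) + 2\,\mathbf{A}^{\top}\mathbf{V}^{-1}(\mathbf{A}\mathbf{x}-\hat{\mathbf{y}}).
\end{aligned}
```

The second gradient follows from the chain rule applied through the linear map from OD flows to link flows, which is why the transpose of the assignment matrix appears.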
B. Gradient Computation of PRA Model
According to the objective function given by (6), define the following three functions:
Essentially, the original objective function is divided into three parts; therefore . Taking derivatives of , , and with respect to , one can obtain the following expressions: Therefore, to summarize,
C. Assignment Fractions of Example



D. OD Flows and Turning Ratios in the Network
Competing Interests
The authors declare that they have no competing interests.
References
[1] E. Cascetta and S. Nguyen, “A unified framework for estimating or updating origin/destination matrices from traffic counts,” Transportation Research Part B: Methodological, vol. 22, no. 6, pp. 437–455, 1988.
[2] M. G. Bell, “Estimation of an origin-destination matrix from traffic counts,” Transportation Science, vol. 17, no. 2, pp. 198–217, 1983.
[3] M. G. H. Bell, “The real time estimation of origin-destination flows in the presence of platoon dispersion,” Transportation Research Part B: Methodological, vol. 25, no. 2–3, pp. 115–125, 1991.
[4] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research Part B: Methodological, vol. 18, no. 4–5, pp. 289–299, 1984.
[5] E. Cascetta, D. Inaudi, and G. Marquis, “Dynamic estimators of origin-destination matrices using traffic counts,” Transportation Science, vol. 27, no. 4, pp. 363–373, 1993.
[6] E. Cascetta, A. Papola, V. Marzano, F. Simonelli, and I. Vitiello, “Quasi-dynamic estimation of o–d flows from traffic counts: formulation, statistical validation and performance analysis on real data,” Transportation Research Part B: Methodological, vol. 55, pp. 171–187, 2013.
[7] H. Spiess, “A maximum likelihood model for estimating origin-destination matrices,” Transportation Research Part B, vol. 21, no. 5, pp. 395–412, 1987.
[8] M. J. Maher, “Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach,” Transportation Research Part B: Methodological, vol. 17, no. 6, pp. 435–447, 1983.
[9] I. Okutani and Y. J. Stephanedes, “Dynamic prediction of traffic volume through Kalman filtering theory,” Transportation Research Part B, vol. 18, no. 1, pp. 1–11, 1984.
[10] N. L. Nihan and G. A. Davis, “Recursive estimation of origin-destination matrices from input/output counts,” Transportation Research Part B, vol. 21, no. 2, pp. 149–163, 1987.
[11] G.-L. Chang and J. Wu, “Recursive estimation of time-varying origin-destination flows from traffic counts in freeway corridors,” Transportation Research Part B: Methodological, vol. 28, no. 2, pp. 141–160, 1994.
[12] K. Ashok and M. E. Ben-Akiva, “Alternative approaches for real-time estimation and prediction of time-dependent origin-destination flows,” Transportation Science, vol. 34, no. 1, pp. 21–36, 2000.
[13] K. Ashok and M. E. Ben-Akiva, “Estimation and prediction of time-dependent origin-destination flows with a stochastic mapping to path flows and link flows,” Transportation Science, vol. 36, no. 2, pp. 184–198, 2002.
[14] C. Antoniou, M. Ben-Akiva, and H. N. Koutsopoulos, “Dynamic traffic demand prediction using conventional and emerging data sources,” IEE Proceedings-Intelligent Transport Systems, vol. 153, no. 1, pp. 97–104, 2006.
[15] M. P. Dixon and L. R. Rilett, “Real-time OD estimation using automatic vehicle identification and traffic count data,” Computer-Aided Civil and Infrastructure Engineering, vol. 17, no. 1, pp. 7–21, 2002.
[16] X. Zhou and H. S. Mahmassani, “Dynamic origin-destination demand estimation using automatic vehicle identification data,” IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 1, pp. 105–114, 2006.
[17] R. Chen, J. Sun, and Y. Feng, “A novel OD estimation method based on automatic vehicle identification data,” in Intelligent Computing and Information Science, vol. 135, pp. 461–470, Springer, Berlin, Germany, 2011.
[18] E. Castillo, J. M. Menéndez, and P. Jiménez, “Trip matrix and path flow reconstruction and estimation based on plate scanning and link observations,” Transportation Research Part B: Methodological, vol. 42, no. 5, pp. 455–481, 2008.
[19] K. Parry and M. L. Hazelton, “Estimation of origin-destination matrices from link counts and sporadic routing data,” Transportation Research Part B: Methodological, vol. 46, no. 1, pp. 175–188, 2012.
[20] S. M. Eisenman and G. F. List, “Using probe data to estimate OD matrices,” in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 291–296, Washington, DC, USA, October 2004.
[21] P. Cao, T. Miwa, T. Yamamoto, and T. Morikawa, “Bilevel generalized least squares estimation of dynamic origin-destination matrix for urban network with probe vehicle data,” Transportation Research Record, vol. 2333, pp. 66–73, 2013.
[22] R. Ásmundsdóttir, Dynamic OD matrix estimation using floating car data [Ph.D. dissertation], Delft University of Technology, 2008.
[23] Y. Yang, H. P. Lu, and Q. Hu, “A Bilevel programming model for origin-destination estimation based on FCD,” in Proceedings of the 10th International Conference of Chinese Transportation Professionals (ICCTP '10), pp. 117–124, American Society of Civil Engineers, Beijing, China, August 2010.
[24] K. Sohn and D. Kim, “Dynamic origin-destination flow estimation using cellular communication system,” IEEE Transactions on Vehicular Technology, vol. 57, no. 5, pp. 2703–2713, 2008.
[25] F. Calabrese, G. Di Lorenzo, L. Liu, and C. Ratti, “Estimating origin-destination flows using mobile phone location data,” IEEE Pervasive Computing, vol. 10, no. 4, pp. 36–44, 2011.
[26] M. S. Iqbal, C. F. Choudhury, P. Wang, and M. C. González, “Development of origin-destination matrices using mobile phone call data,” Transportation Research Part C: Emerging Technologies, vol. 40, pp. 63–74, 2014.
[27] V. Marzano, A. Papola, and F. Simonelli, “Limits and perspectives of effective O–D matrix correction using traffic counts,” Transportation Research Part C: Emerging Technologies, vol. 17, no. 2, pp. 120–132, 2009.
[28] R. Frederix, F. Viti, R. Corthout, and C. M. J. Tampère, “New gradient approximation method for dynamic origin-destination matrix estimation on congested networks,” Transportation Research Record, vol. 2263, pp. 19–25, 2011.
[29] H. Yang, T. Sasaki, Y. Iida, and Y. Asakura, “Estimation of origin-destination matrices from link traffic counts on congested networks,” Transportation Research Part B: Methodological, vol. 26, no. 6, pp. 417–434, 1992.
[30] H. Yang, Q. Meng, and M. G. H. Bell, “Simultaneous estimation of the origin-destination matrices and travel-cost coefficient for congested networks in a stochastic user equilibrium,” Transportation Science, vol. 35, no. 2, pp. 107–123, 2001.
[31] M. Van Aerde, B. Hellinga, L. Yu, and H. Rakha, Vehicle Probes as Real-Time ATMS Sources of Dynamic OD and Travel Time Data, Queen's University, Department of Civil Engineering, 1993.
[32] G. Tan, L. Liu, F. Wang, and Y. Wang, “Dynamic OD estimation using automatic vehicle location information,” in Proceedings of the 2011 6th IEEE Joint International Information Technology and Artificial Intelligence Conference (ITAIC '11), vol. 1, pp. 352–355, Chongqing, China, August 2011.
Copyright
Copyright © 2017 Xianfeng Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.