#### Abstract

This paper explores the travel time distributions of different types of urban roads and the methods for estimating link and path average travel time and variance by analyzing a large-scale travel time dataset collected by automatic number plate readers installed throughout Beijing. The results show that the best-fitting travel time distribution for different road links in 15 min time intervals differs for different traffic congestion levels. The average travel time for all links on all days can be estimated with acceptable precision by using the normal distribution. However, this distribution is not suitable for estimating travel time variance under some types of traffic conditions. Path travel time can be estimated with high precision by summing the travel time of the links that constitute the path. In addition, the path travel time variance can be estimated from the travel time variance of the links, provided that the travel times on all the links along a given path are generated by statistically independent distributions. These findings can be used to develop and validate microscopic simulations or online travel time estimation and prediction systems.

#### 1. Introduction

Traffic congestion during peak hours has become unavoidable in numerous cities worldwide because of the rapid increase in car ownership and the lack of resources for proportionately increasing the supply capacity of road systems. This problem is causing travel time to be highly unreliable. Travel time and travel time reliability have become important performance measures in assessing traffic system conditions. A previous study [1] suggests that travel time reliability may be more important than travel time savings; that is, road users may choose a reliable route over an unreliable one despite the longer travel time of the former. The reliability and variability of travel time have attracted significant attention in the past decade.

Traffic systems are complex and stochastic. For instance, a number of notable sources of traffic congestion (traffic incidents, work zones, bad weather, special events, and traffic demand fluctuations) can disrupt system performance and lengthen travel time. A high variability indicates the unpredictability of travel time and the reduced reliability of traffic service [2]. Accurate estimation and prediction of travel time are therefore essential to traffic operation and traveler information systems.

Because actual travel time data were scarce, several previous studies relied on either loop detector data or microsimulation techniques to analyze travel time reliability. In the past decade, numerous cities have implemented various direct travel time measurement techniques, such as automatic number plate readers (ANPR), automated vehicle identification (AVI) systems, GPS-equipped vehicles, smart phone devices, and Bluetooth [3]. All of these techniques provide accurate individual vehicle travel time data for analysis. For instance, gathering travel times with moving GPS-equipped vehicles produces accurate, continuous, and automated point-to-point measures, which are more representative of road performance compared with the point estimates of speed from fixed detectors. However, the sampling rate is occasionally low (no more than 10% of the traffic flow) without the wide use of GPS equipment in private cars.

The remainder of the paper is organized as follows. Section 2 provides a brief review of the literature with respect to modeling travel time distribution. A description of the data set used in this study and data preprocessing are then presented in Section 3. Section 4 explains the methodology used to investigate the statistical properties of the travel time distributions and results and discusses whether the normal distribution can be used for all travel times to obtain the mean and variance, which usually indicates the travel time variability. Finally, Section 5 provides the conclusion of the study.

#### 2. Literature Review

Travel time distribution is an important basis for modeling travel time variability and reliability, which can be measured using several travel time distribution properties, such as standard deviation and coefficient of variation. Previous studies on travel time variability and reliability assumed that travel time distribution may follow either the normal distribution or the log-normal distribution [4], particularly for freeways. However, several studies suggested that travel time data are skewed and have a long upper tail [5, 6].

Numerous studies have been conducted to investigate the probability distribution of travel time on freeways and signalized arterial roads. Recently collected empirical travel time data exhibit positive skews and long tails. Therefore, normal distribution is not suitable for these data. During the 1950s, Wardrop [7] reported that travel times follow a skewed distribution with a long tail. Using the data of trip times to and from work on 25 routes in Detroit collected over 20 months, Herman and Lam [8] subsequently verified this observation with an empirical study on travel time variability. This study showed that the to-work trip time histogram tended to follow a normal distribution, whereas the from-work trip time histogram closely resembled a uniform distribution. Using the travel time data collected in Michigan, Polus [9] concluded that travel times fit a gamma distribution relatively well. Richardson and Taylor [10] analyzed the travel time data in Melbourne and found that the observed travel time variability may be represented by a log-normal distribution.

Dandy and McBean suggested a log-normal or gamma distribution [11] and proposed that the log-normal distribution is a useful descriptor of in-vehicle travel times. A log-normal fit has been derived in several studies, such as those by Mogridge and Fry [12] and Arroyo and Kornhauser [13]. Montgomery and May [14] conducted a survey in Leeds city and found that individual travel times were fit to a log-normal distribution with means and dispersions that are considerably more stable on a number of routes than others. Rakha et al. [15] studied the AVI data from San Antonio through goodness-of-fit tests and demonstrated that normal distribution is inconsistent with the field travel time observations. A previous study [16] showed that a lognormal distribution is highly representative of travel times, particularly under steady state conditions, and that a mixture distribution is appropriate for modeling travel times under nonsteady state conditions. Chen et al. [17] modeled the travel rate (travel time per unit distance) distribution and found that a log-normal distribution exhibits good approximation performance for the travel rate distribution at an a.m. peak hour on Wednesday.

To obtain a better view of the travel time distribution, Susilawati et al. [18] analyzed the GPS travel time data from the Adelaide database and found that the Burr distribution fits the empirical travel time data, that such distribution can be used to represent the observed data at two urban arterial roads in Adelaide, and that the bimodal distribution is appropriate for travel time distribution in a number of short arterial links [19]. Fosgerau and Fukuda [20] studied the minute-by-minute travel times for a congested urban road over five months and found that a stable distribution fits the standardized travel time of a link or a sequence of links. A number of recent studies have focused on the link travel time distribution prediction to generate reliable real-time traffic condition forecasts [21, 22].

In most of these studies, the data derived from GPS-based probe cars only provide the travel time for the probe vehicles, resulting in a small sampling rate of traffic flow and preventing the collection of a large sample for various routes and times of day. Moreover, this low sampling rate may not reflect the real distribution of travel time during a short period, such as 15 min. With the rapid development of traffic flow data collection techniques, the ANPR system allows traffic management engineers to record the time when a vehicle passes a specific location on roads. The difference between the passing times at two consecutive locations can be directly used as the travel time of this vehicle. The ANPR system captures almost the entire traffic flow, apart from vehicles lost to camera identification errors. These accurate travel time data are beneficial to the studies on travel time variations in urban environments.

#### 3. Data Preprocessing

Different travel time monitoring techniques have been developed over the past decades. These techniques include ANPR cameras, Bluetooth scanners, GPS-based in-car devices, smart phones, and speed sensors. The first two have been proven to be promising methods. Various models for estimating or predicting travel time distribution have been proposed on the basis of the data obtained from these techniques, and these models have exhibited excellent performance on freeways.

Approximately 200 ANPR identification detectors are currently being mounted throughout Beijing city to collect vehicle passing time. These detectors allow the travel time on the target road link to be obtained through the comparison and analysis of the passing time between two consecutive detectors.

However, the major problem of ANPR in urban networks lies in the difficulty in determining whether a vehicle has traveled exactly along the route between two locations without making unexpected stops. Thus, a number of invalid travel times from individual vehicles are inevitably observed. These invalid travel times do not represent the average traffic conditions on the link considered at the time the vehicle was detected. For instance, an invalid travel time that is considerably longer than the average travel time of vehicles can be observed from a vehicle making unexpected stops between two detection stations or from a bus that must stop at bus stops to load and unload passengers. Such travel times must be removed from the dataset to avoid bias in the analysis results.

The data must be preprocessed to remove outliers before the data distribution is estimated. A quartile screening method based on the quartile range is applied in this study to reflect the scale of variation.

Specifically, the quartile range is the difference between the upper and lower quartiles. Data lying outside the valid interval are identified as abnormal and deleted:

$$D = \left[Q_1 - kR,\; Q_3 + kR\right], \qquad R = Q_3 - Q_1,$$

where $D$ indicates the valid data interval, $Q_1$ and $Q_3$ are the lower and upper quartile values (bounding the lowest and highest 25% of the data), respectively, $R$ is the quartile range, and $k$ is the screening multiplier (1.5 in the conventional quartile rule).
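The quartile screening step can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the function name `quartile_screen`, the multiplier `k = 1.5`, the "inclusive" quantile method, and the sample values are all assumptions made for the example.

```python
from statistics import quantiles


def quartile_screen(travel_times, k=1.5):
    """Keep values inside [Q1 - k*R, Q3 + k*R], where R = Q3 - Q1.

    k is an assumed multiplier (1.5 is the conventional choice); the value
    used in the study is not stated in the text.
    """
    q1, _, q3 = quantiles(travel_times, n=4, method="inclusive")
    r = q3 - q1
    lower, upper = q1 - k * r, q3 + k * r
    return [t for t in travel_times if lower <= t <= upper]


# Hypothetical travel times in seconds; 900 s mimics a vehicle that
# stopped between the two detectors.
sample = [118, 120, 122, 125, 127, 130, 133, 135, 140, 900]
cleaned = quartile_screen(sample)
```

In this hypothetical sample only the 900 s observation falls outside the valid interval, so the remaining nine values are kept.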

Figure 1 presents a comparison diagram that shows the before and after preprocessing results obtained through quartile screening method for the travel time dataset of a single day in link 69–68. Several scattered and unreasonable values were discarded.

**Figure 1:** Travel time data of a single day on link 69–68: (a) before and (b) after quartile screening.

#### 4. Estimation

##### 4.1. Estimation of Travel Time Distribution

The influence of signal timing and other parameters causes the travel time on arterial road links to show unimodal, bimodal, or multimodal distribution shape. Therefore, in this study, the distribution patterns of the travel times of different links were analyzed in order of priority. A total of 17 unimodal distribution links were selected: seven signalized arterials with lengths ranging from 417 m to 2028 m and 10 nonsignalized urban expressways with lengths ranging from 1600 m to 4100 m.

A 15 min interval is used to study how the travel time distribution on actual road varies over time. This interval provides sufficient data for most times of a day, and these data can be used for travel time distribution estimation. In addition, this interval is short enough to capture short-term variations in travel time.

The data collected on June 16, 2011, were selected for use in this paper. For each road link, a total of 96 15 min travel time datasets may be used for analysis. Normal, log-normal, gamma, and Weibull distributions are fitted to these 15 min travel time datasets, and the chi-square test is used to assess goodness of fit.
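The fitting and testing step can be illustrated with a small standard-library sketch. It covers only the normal and log-normal candidates (gamma and Weibull fitting need numerical optimization and are omitted), uses equally probable bins, and compares raw chi-square statistics instead of p-values. The function names, the bin count, and the synthetic data are assumptions for illustration, not the study's implementation.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def chi_square_equiprobable(values, n_bins=8):
    """Pearson chi-square statistic of `values` against a normal fitted to
    them, using bins that are equally probable under the fitted normal."""
    fitted = NormalDist(fmean(values), pstdev(values))
    edges = [fitted.inv_cdf(i / n_bins) for i in range(1, n_bins)]
    counts = [0] * n_bins
    for v in values:
        counts[bisect_right(edges, v)] += 1
    expected = len(values) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)


def compare_normal_lognormal(travel_times):
    """Return the chi-square statistic for each candidate distribution.

    Testing log(travel time) against a normal is equivalent to testing the
    travel time against a log-normal with the same equiprobable bins.
    """
    logs = [math.log(t) for t in travel_times]
    return {
        "normal": chi_square_equiprobable(travel_times),
        "log-normal": chi_square_equiprobable(logs),
    }


# Synthetic, right-skewed travel times (seconds) standing in for one
# 15 min dataset; real data would come from the ANPR records.
rng = random.Random(0)
synthetic = [rng.lognormvariate(5.0, 0.6) for _ in range(1000)]
fit_stats = compare_normal_lognormal(synthetic)
best = min(fit_stats, key=fit_stats.get)
```

With two fitted parameters and eight bins, each statistic would be compared against a chi-square distribution with five degrees of freedom; the smaller statistic indicates the better fit.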

To determine the relationship between the distribution pattern and the congestion degree of the road link, travel speed is used to indicate the congestion degree of the road link. The average travel speed on each link is calculated for each 15 min interval using the following formula:

$$\bar{v}_l = \frac{L_l}{\bar{t}_l},$$

where $L_l$ is the length of link $l$ and $\bar{t}_l$ is the estimated travel time on link $l$.
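As a sketch of this calculation, the snippet below groups travel time records into 15 min intervals and divides the link length by the mean travel time of each interval. The record layout `(entry_time_s, travel_time_s)`, the use of the entry time to assign an interval, and the km/h unit are assumptions made for the example.

```python
from collections import defaultdict


def interval_speeds(records, link_length_m, interval_s=900):
    """Average speed (km/h) per interval from (entry_time_s, travel_time_s)
    records, computed as link length / mean travel time of the interval."""
    by_interval = defaultdict(list)
    for entry_time_s, travel_time_s in records:
        by_interval[int(entry_time_s // interval_s)].append(travel_time_s)
    speeds = {}
    for idx, times in sorted(by_interval.items()):
        mean_time_s = sum(times) / len(times)
        speeds[idx] = link_length_m / mean_time_s * 3.6  # m/s -> km/h
    return speeds


# Hypothetical records on a 1000 m link; times are seconds after midnight.
records = [(28800, 100.0), (29000, 140.0), (29700, 200.0), (30000, 160.0)]
speeds = interval_speeds(records, link_length_m=1000.0)
```

The resulting speed per interval can then be mapped to a congestion class once the speed thresholds of the classification are known.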

The relationship between the travel speed and the travel time distribution can then be examined. The traveler information system of the Beijing Traffic Management Bureau was used to classify the traffic conditions of various types of roads in Beijing according to the average travel speed, as shown in Table 1.

Table 1 lists the number and proportion of 15 min intervals best fitted by each distribution for each traffic condition and road type, that is, arterials and urban expressways. A total of 660 15 min intervals without adequate data to test the distribution are not shown in Table 1.

Table 1 shows that when the average speed on certain road links is relatively low (i.e., congested traffic), the proportion of intervals best fitted by the Weibull distribution is the largest (arterial: 0.432; urban expressway: 0.378). As the average travel speed increases, the Weibull proportion decreases and the log-normal proportion increases. Under free flow traffic conditions on urban expressways, the log-normal proportion is 0.569, indicating that the log-normal distribution best fits the distribution of the 15 min travel time for most intervals under free flow conditions. Evidently, different traffic conditions have different travel time distributions, suggesting that different distributions should be adopted for travel time reliability and variability estimation.

##### 4.2. 15 min Travel Time Estimation

For a 15 min travel time and its variance estimation, the specific distribution of travel time for certain road links during certain times must first be obtained using distribution fitting process; then, the average travel time and the standard deviation of travel time can be derived from the corresponding distribution function. However, in actual application, the parameter estimation of log-normal, gamma, and Weibull distributions is complicated, thereby highly constraining online real-time application. By contrast, the parameter estimation of normal distribution is simple. Therefore, the average travel time and variance estimation results for the normal distribution were compared with those for the other distributions (i.e., log-normal, gamma, and Weibull). In actual application, such as travel time estimation for dynamic traffic assignment or travel time reliability estimation, if the difference is within an acceptable range, then normal distribution can be considered a substitute for other distributions to obtain higher real-time calculation efficiency and reduce computer workload.

First, we consider the link “Nanheyan Intersection -> Wangfujing Intersection” as an analysis example. The dataset consists of four 15 min intervals from 8 AM to 9 AM on June 16, 2011. The average travel time and variance estimation results of different distributions in various 15 min intervals are shown in Table 2, with the errors defined as

$$e_\mu = \frac{\left|\mu_N - \mu_d\right|}{\mu_d} \times 100\%, \qquad e_\sigma = \frac{\left|\sigma_N - \sigma_d\right|}{\sigma_d} \times 100\%,$$

where $d$ represents the log-normal, gamma, or Weibull distribution; $\mu_d$ and $\sigma_d$ indicate the estimated average travel time and standard deviation of travel time from distribution $d$, respectively; $\mu_N$ and $\sigma_N$ indicate the estimated average travel time and standard deviation of travel time from the normal distribution, respectively; and $e_\mu$ and $e_\sigma$ are the absolute percentage errors of the estimated average travel time and standard deviation of travel time, respectively.

The following observations are drawn from Table 2. (1) For the estimation of average travel time, the errors between the estimation results obtained from the normal distribution and those obtained from other complex distributions are insignificant, with the maximum being 2.35%. (2) For the estimation of standard deviation of travel time, the errors are large (ranging from 1.47% to 26.49%). Therefore, whether the normal distribution can substitute other distributions for parameter estimation must be decided on the basis of the accuracy requirements for actual application.
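The comparison can be sketched for the log-normal case, whose mean and standard deviation have closed forms. The code below is an illustration under stated assumptions: parameters are estimated by simple moment and maximum-likelihood formulas, the non-normal estimate is taken as the reference in the percentage error, and the travel times are hypothetical.

```python
import math
from statistics import fmean, pstdev


def normal_moments(times):
    """Mean and standard deviation under a fitted normal distribution."""
    return fmean(times), pstdev(times)


def lognormal_moments(times):
    """Mean and standard deviation implied by a fitted log-normal."""
    logs = [math.log(t) for t in times]
    mu, sigma = fmean(logs), pstdev(logs)
    mean = math.exp(mu + sigma ** 2 / 2)
    std = mean * math.sqrt(math.exp(sigma ** 2) - 1)
    return mean, std


def ape(estimate, reference):
    """Absolute percentage error of `estimate` relative to `reference`."""
    return abs(estimate - reference) / reference * 100.0


# Hypothetical right-skewed travel times (seconds) for one 15 min interval.
times = [100.0, 110.0, 120.0, 130.0, 150.0, 180.0, 240.0]
mean_n, std_n = normal_moments(times)
mean_ln, std_ln = lognormal_moments(times)
ape_mean = ape(mean_n, mean_ln)
ape_std = ape(std_n, std_ln)
```

On this small sample the two mean estimates differ by well under 1%, whereas the standard deviation estimates differ by several percent, which mirrors the pattern reported in Table 2.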

Without loss of generality, 17 road links observed on June 16, 2011, were used to determine the estimation error between the best-fitting and normal distributions. A total of 962 valid 15 min travel time datasets were left for analysis after the invalid datasets were removed. The errors of average travel time estimation and standard deviation estimation derived using normal distribution with other distributions are shown in Figures 2(a) and 2(b), respectively.

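**Figure 2:** Estimation errors of the normal distribution relative to the other distributions: (a) average travel time; (b) standard deviation of travel time.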

As shown in Figures 2(a) and 2(b), the average travel time estimated using the normal distribution is either higher or lower than that estimated using other distributions, and the maximum error is ±2%. However, the errors between the standard deviation estimates of different distributions are relatively large, with the maximum error reaching –90%. Nevertheless, the mean absolute relative error (MARE) of the standard deviation estimates was only 6.9%, and more than 85% of the standard deviation estimation errors fall within (–0.1, 0.1); that is, under most traffic situations, the accuracy of the average travel time and standard deviation estimates obtained from the normal distribution for various road links is acceptable.
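The two summary measures used here, the MARE and the share of errors inside (–0.1, 0.1), can be computed as in the following sketch. The function names and the numbers are hypothetical and serve only to show the arithmetic.

```python
def relative_errors(estimates, references):
    """Signed relative error of each estimate against its reference."""
    return [(e - r) / r for e, r in zip(estimates, references)]


def mare(errors):
    """Mean absolute relative error."""
    return sum(abs(e) for e in errors) / len(errors)


def share_within(errors, bound=0.1):
    """Fraction of errors inside the open interval (-bound, bound)."""
    return sum(1 for e in errors if abs(e) < bound) / len(errors)


# Hypothetical standard deviation estimates (seconds): normal-based
# estimates against those of the best-fitting distribution.
normal_std = [10.5, 9.0, 20.0, 4.0]
best_fit_std = [10.0, 10.0, 20.0, 8.0]
errors = relative_errors(normal_std, best_fit_std)
summary = (mare(errors), share_within(errors))
```

A single large error (here –50%) raises the MARE noticeably even when most errors are small, so both measures are worth reporting together.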

These results suggest that the average travel time under most traffic situations can be estimated using normal distribution. For the estimation of the standard deviation of travel time, that is, travel time variability or reliability, the errors are occasionally large. For high accuracy requirements, for example, more than 95%, the accuracy of normal distribution estimation cannot be ensured.

##### 4.3. Trip Travel Time Estimation

In actual trips, travelers are more concerned with the travel time of a path or trip, that is, the travel time from the origin to the destination. For a path $p$ composed of links $l$ that form the set $A_p$, the path travel time can be computed using the following two methods.

*Method 1*. If there is a set of $N_p$ vehicles travelling along the path $p$, then the path travel time at a certain time interval can be calculated as follows:

$$\bar{T}_p = E\left[t_{p,i}\right] = \frac{1}{N_p}\sum_{i=1}^{N_p} t_{p,i},$$

where $E[\cdot]$ denotes the conditional expectation over all the realizations and $t_{p,i}$ is the travel time experienced by a vehicle $i$ along the path $p$.

Similarly, the path travel time variance can be computed using the path travel time realizations [16]:

$$\sigma_p^2 = E\left[\left(t_{p,i} - \bar{T}_p\right)^2\right].$$

This equation serves as the ground truth used for comparison purposes.

*Method 2.* The majority of vehicles do not travel the entire trip length; thus, the path travel time should be estimated from the travel times of the individual road segments or links. Without loss of generality, the path travel time can be determined as the sum of the experienced link travel times along the set of links constituting the path, as shown in the following:

$$\bar{T}_p = \sum_{l \in A_p} \bar{t}_l,$$

where $\bar{t}_l$ is the travel time experienced by vehicles along link $l$ and $A_p$ is the set of links constituting path $p$.

The expected travel time along a link $l$ can be computed as follows:

$$\bar{t}_l = \frac{1}{n_l}\sum_{i=1}^{n_l} t_{l,i},$$

where a total of $n_l$ vehicles are traveling along link $l$ and $t_{l,i}$ is the travel time of vehicle $i$ on that link.

The travel time variance for link $l$ can be computed as follows:

$$\sigma_l^2 = E\left[\left(t_{l,i} - \bar{t}_l\right)^2\right].$$

For Method 2, three methods were used to compute the path travel time variance from the link travel time variance [16].

*Method 2.1*. The current state-of-the-practice technique for estimating path travel time variability assumes that the travel times on all the links along a given path are generated by statistically independent distributions. Therefore, the path variance can be computed as the summation of the link travel time variances for all links along a path as follows (Method 2.1, noted with subscript 1):

$$\sigma_{p,1}^2 = \sum_{l \in A_p} \sigma_l^2.$$

*Method 2.2*. Sherali et al. [23] used the maximum and minimum link travel time coefficients of variations (CV) to construct bounds on the path CV because the CV is independent of the length of the link. The following equation is derived to estimate the path variance (Method 2.2, noted with subscript 2):
where $CV$ is the coefficient of variation, that is, the ratio of the standard deviation of the travel time to its mean.

*Method 2.3*. Another method for estimating the trip variance [16] (Method 2.3, noted with subscript 3) is to compute the expected path CV as the conditional expectation over all realizations of the various roadway links making up a path. The path variance estimation can be computed as follows:

The majority of vehicles do not travel the entire trip length; thus, the travel time data of a path within a short-term interval is limited. To retrieve a sufficient amount of travel time data, the sampling time interval is set to 30 min in this study. Five paths are used for analysis. The main considerations in choosing these five paths are as follows. (1) In the observed period, the datasets for these paths have enough valid records to support the analysis. (2) These paths have consecutive links within the path. The dataset has a travel time for each link and a travel time for the entire path, thus allowing the correlation analysis of the links and error estimation. The data of the selected paths consist of 24 hour data obtained from September 3, 2008. This dataset is then divided into 48 groups, one for every 30 min.
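For one 30 min group, the comparison between Method 2.1 and the Method 1 ground truth can be sketched as follows. This is an illustrative example only: the function names are hypothetical, the population variance is an assumed normalization, and the data are constructed so that the two link travel times are perfectly correlated.

```python
from statistics import fmean, pvariance


def link_stats(link_times):
    """Mean and variance of the travel times observed on one link."""
    return fmean(link_times), pvariance(link_times)


def path_estimate_independent(links):
    """Method 2.1: sum the link means and the link variances along the
    path, assuming statistically independent link travel times."""
    stats = [link_stats(times) for times in links]
    return sum(m for m, _ in stats), sum(v for _, v in stats)


def path_ground_truth(path_times):
    """Method 1: mean and variance of observed whole-path travel times."""
    return fmean(path_times), pvariance(path_times)


def relative_error(estimate, truth):
    return (estimate - truth) / truth


# Hypothetical data (seconds): three vehicles traverse both links, and the
# slow vehicle on link A is also the slow vehicle on link B.
link_a = [60.0, 70.0, 80.0]
link_b = [100.0, 120.0, 140.0]
path_times = [a + b for a, b in zip(link_a, link_b)]

mean_est, var_est = path_estimate_independent([link_a, link_b])
mean_true, var_true = path_ground_truth(path_times)
mean_error = relative_error(mean_est, mean_true)
var_error = relative_error(var_est, var_true)
```

The mean is recovered exactly, whereas the summed variance underestimates the observed path variance because the covariance between the two links is ignored; with independent link travel times the two variances would agree in expectation.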

The travel time estimation errors of the five paths are illustrated in Figures 3(a) and 3(b). The reference path travel time for error estimation is obtained directly from the path travel time and its standard error, as described in Method 1 and shown in (4).

**Figure 3:** Path travel time estimation errors of the five paths, panels (a) and (b).

As shown in Figures 3(a) and 3(b), most of the related errors of the five paths are lower than 10%, except for a few values exceeding 10%.

Figure 4 shows the travel time variance estimation errors of the five paths derived by the three methods. As shown in Figure 4, Method 2.1 has the minimum estimation error of path travel time variance in most cases. For path 18-17-16 (Chang’an Street, signalized arterials) and path 136-135-134 (Jing-Kai Expressway), Methods 2.2 and 2.3 have similar errors for most intervals. For the other three paths (all of which are expressways), no evident regular patterns are found. The MARE of the three methods for path travel time variance estimation of the five paths is shown in Table 3.

**Figure 4:** Path travel time variance estimation errors of the three methods, panels (a)–(e), one per path.

The following observations are drawn from Table 3. (1) For the entire day, the MARE of paths 18-17-16, 70-69-68, and 136-135-134 show that Method 2.1 has the lowest error, whereas the MARE of paths 133-134-135 and 140-139-138 show that Method 2.3 has the lowest error. (2) For the a.m. peak hours, the MARE of four paths (i.e., 18-17-16, 70-69-68, 140-139-138, and 136-135-134) show that Method 2.1 has the lowest error. For path 133-134-135, Methods 2.1 and 2.3 have similar precision. (3) For the p.m. peak hours, the MARE of the two paths with data show that Method 2.1 has the lowest error. For nonpeak hours, a similar regular pattern throughout the entire day is found.

These results are relatively different from those described in [16], which utilized field data and simulated data. In this study, Method 2.2 is shown to have the highest estimation errors, whereas Method 2.1 is shown to have the best estimation performance under most situations.

These findings suggest that the path average travel time can be estimated by summing the travel time along the set of links constituting the path. On the basis of the performance of Method 2.1, the assumption that travel times on all links along a path are generated by statistically independent distributions is reasonable under most traffic conditions, particularly for congested periods. Nevertheless, for a number of paths, Method 2.3 is more suitable than Method 2.1 for calculating the path travel time variance.

#### 5. Conclusion

This paper discussed three issues concerning travel time variability and reliability estimation on the basis of the ANPR data in the Beijing road network. The first observation involves link travel time distribution estimation through goodness-of-fit tests within short-term time intervals of 15 min. The results show that for congested periods, approximately 40% of the 15 min travel time distributions followed the Weibull distribution (43.2% for signalized arterials and 37.8% for urban expressways). As the average travel speed increased, the proportion of 15 min travel time distributions following the log-normal distribution gradually increased, reaching 45.5% for signalized arterials and 56.9% for urban expressways under uncongested traffic conditions.

The second observation involves the quick estimation process of the average travel time for 15 min intervals. The comparison between estimation results based on normal distribution and those based on other more fitting distributions shows that the estimation results based on normal distribution for the average travel time estimation are acceptable, with errors of no more than 2% and less than 1% for more than 90% of the estimations. For travel time variance estimation, although the MARE is 6.9%, approximately 1/3 of the absolute relative error is more than 50%. Thus, under most traffic conditions, normal distribution can be used to estimate the average travel time, whereas the most fitting distribution can be used to estimate the travel time variance.

The third observation involves the estimation process of the path travel time and variance on the basis of the travel time of the links constituting the path. The results show that the path travel time can be computed by summing the average travel time of links constituting the path. In addition, Method 2.1 produces the best results for path travel time variance estimation under most traffic conditions, particularly during congested periods.

In the future, more travel time data should be collected to conduct similar studies based on large-scale data sets and to further understand the travel time distribution pattern and relationship between path travel time and travel time of links constituting a given path, particularly for long distance paths or trips with more than three links.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 71361130015). The authors are personally grateful to Beijing Traffic Management Bureau for its support.