#### Abstract

The considerable impact of congestion on transportation networks is reflected in the vast amount of research dedicated to congestion identification, modeling, and alleviation. Despite this, the statistical characteristics of congestion, and particularly of its duration, have not been systematically studied, even though they can offer significant insights into its formation, effects, and alleviation. We extend previous research by proposing the autoregressive conditional duration (ACD) approach for modeling congestion duration in urban signalized arterials. Results based on data from a signalized arterial indicate that a multiregime nonlinear ACD model best describes the observed congestion duration data and that, when congestion lasts longer than 18 minutes, traffic exhibits persistence and a slow recovery rate.

#### 1. Introduction

The impact of congestion on transportation networks is of particular importance to both planners and users. Congestion leads to longer travel times and reduced reliability of the transportation system; further, during congestion, users experience delays, an inability to forecast travel conditions, and increased transportation costs and emissions, among other effects [1]. The dynamics of congestion duration may contain useful information about intraday traffic operations and should be further explored.

Despite its significance for all aspects of traffic operations, the investigation of the statistical characteristics of congestion duration has attracted only limited attention in the literature. During the last two decades, research has focused on developing short-term prediction models that accurately forecast volume, occupancy, and speed and determine the anticipated traffic flow conditions in urban road networks; the existing literature is quite extensive and covers a broad range of statistical and artificial intelligence models for short-term prediction [2]. However, short-term prediction research has certain shortcomings. First, the approaches are not efficient during congested, unstable traffic conditions [3, 4]. Second, most traffic prediction models do not cope with the transitional nature of traffic flow and, consequently, may fail to accurately detect traffic congestion. Recent empirical evidence has clearly demonstrated that traffic possesses complex statistical characteristics and strong transitional behavior that cannot be captured by a single prediction model of traffic variables [5, 6]. Third, even the more sophisticated prediction approaches fail to incorporate the effects of signalization on traffic prediction (for a recent review, see [7]).

These shortcomings underline the importance of predicting congestion duration in urban signalized road networks. To this end, Stathopoulos and Karlaftis [8] applied the principles of parametric duration modeling to analyze the duration of congestion in signalized arterials. They tested several functional forms for the duration models and concluded that the log-logistic functional form best describes congestion duration. Their approach, however, resulted in static models that do not take into account the effects of previous congestion occurrences. In this paper, we extend the conceptual framework of past research by introducing autoregressive conditional duration models for urban congestion duration that account for the time-series dimension of congestion events and their duration. We propose and evaluate several duration models that allow the conditional expected congestion duration to be a nonlinear function of past information, and we also address nonlinearity issues and the manner in which they affect congestion duration.

#### 2. The Autoregressive Conditional Duration Models

Duration models have been applied to problems in many scientific fields, such as medicine, economics, and transportation, to investigate phenomena whose temporal characteristics (durations, for example) are of primary importance [9–13]. However, classical econometric techniques usually treat time series as sequences of data points separated by uniform time intervals; as noted by Tsay [14], "traditional" duration models do not account for the possibility of a time-series dimension in these models and in the related phenomena (stock transactions, discussed in [15, 16], are an excellent example of such a phenomenon). Congestion events exhibit exactly this behavior: congestion occurs at unequally spaced time intervals, so classical duration modeling may not be adequate for such events.

Using concepts similar to those of the generalized autoregressive conditional heteroskedastic (GARCH) models, Engle and Russell [15] proposed the autoregressive conditional duration (ACD) model to describe the evolution of data that arrive at unequally spaced time intervals. Let $t_i$ be the time of the $i$th congestion occurrence, with $0 = t_0 \le t_1 \le t_2 \le \cdots$, and let $x_i = t_i - t_{i-1}$ be the duration between congestion events. Associated with the arrival times is the counting function $N(t)$, which is the number of events that have occurred by time $t$. Let $\psi_i$ be the expectation of the $i$th duration, given by

$$\psi_i = E\left(x_i \mid x_{i-1}, \ldots, x_1\right) = \psi\left(x_{i-1}, \ldots, x_1; \theta\right),$$

with parameter vector $\theta$. The basic assumption of the ACD model is that the standardized durations are independent and identically distributed [15]:

$$\varepsilon_i = \frac{x_i}{\psi_i} \sim \text{i.i.d. } p(\varepsilon; \phi),$$

where $p$ is a general distribution over $(0, \infty)$ with mean equal to one and parameter vector $\phi$. It follows that many ACD models are possible, differing in the specification of the expected durations $\psi_i$ and in the distribution of $\varepsilon_i$.
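
As a minimal illustration of this notation, the sketch below turns a sorted list of event times $t_i$ into durations $x_i = t_i - t_{i-1}$ (taking $t_0 = 0$) and evaluates the counting function $N(t)$. The event times are hypothetical values in minutes, and the helper names are illustrative only.

```python
import bisect


def durations_from_times(times):
    """Return x_i = t_i - t_{i-1} for non-decreasing event times, with t_0 = 0."""
    durations = []
    previous = 0.0
    for t in times:
        if t < previous:
            raise ValueError("event times must be non-decreasing")
        durations.append(t - previous)
        previous = t
    return durations


def counting_function(times, t):
    """N(t): number of events with t_i <= t; `times` must be sorted."""
    return bisect.bisect_right(times, t)


# Hypothetical congestion occurrence times (minutes after the start of the day)
event_times = [3.0, 4.5, 9.0, 10.5]
print(durations_from_times(event_times))  # [3.0, 1.5, 4.5, 1.5]
print(counting_function(event_times, 9.0))  # 3
```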

To define the conditional intensity, or hazard function, let $p_0(\varepsilon; \phi)$ be the density function of $\varepsilon_i$ and let $S_0(\varepsilon; \phi) = \int_{\varepsilon}^{\infty} p_0(s; \phi)\, ds$ be the associated survival function. Then

$$\lambda_0(\varepsilon; \phi) = \frac{p_0(\varepsilon; \phi)}{S_0(\varepsilon; \phi)}$$

is the baseline hazard, which is used to express the hazard function as [15]

$$\lambda\left(t \mid N(t), t_1, \ldots, t_{N(t)}\right) = \lambda_0\left(\frac{t - t_{N(t)}}{\psi_{N(t)+1}}; \phi\right) \frac{1}{\psi_{N(t)+1}}. \tag{2.3}$$

Equation (2.3) implies that the past history influences the conditional intensity both through a multiplicative effect and through a shift in the baseline hazard [15]; Engle and Russell call this an "*accelerated failure time*" model because past information influences the rate at which time passes [15]. This is consistent with observations in traffic flow theory: sometimes flow changes rapidly and the time between congestion occurrences passes quickly, while in other cases the opposite applies. In this case, the rate at which time passes depends on the past event arrival times through the function $\psi$.
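
Equation (2.3) can be evaluated directly once a baseline hazard is chosen. The sketch below assumes a unit-mean exponential baseline (constant hazard equal to one), for which the intensity reduces to $1/\psi_{N(t)+1}$; the function names and numbers are hypothetical.

```python
def acd_intensity(t, last_event_time, psi_next, baseline_hazard):
    """Eq. (2.3): baseline hazard at the rescaled elapsed time, divided by psi.

    t               -- current time
    last_event_time -- t_{N(t)}, time of the most recent event
    psi_next        -- psi_{N(t)+1}, expected duration of the next spell
    baseline_hazard -- function lambda_0 of the standardized elapsed time
    """
    if t < last_event_time:
        raise ValueError("t must not precede the most recent event")
    elapsed_standardized = (t - last_event_time) / psi_next
    return baseline_hazard(elapsed_standardized) / psi_next


def exponential_baseline(s):
    """Hazard of a unit-mean exponential distribution: constant and equal to 1."""
    return 1.0


# A larger expected duration psi "slows down time": the intensity falls as psi grows.
print(acd_intensity(5.0, 3.0, 2.0, exponential_baseline))  # 0.5
print(acd_intensity(5.0, 3.0, 4.0, exponential_baseline))  # 0.25
```

Passing a different `baseline_hazard` (for example, a Weibull one) changes only the shape of the hazard; the rescaling by $\psi$ is what makes past information speed up or slow down time.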

An ACD($m$, $q$) model specifies the conditional mean duration as a linear function of lagged durations and their conditional expectations [14]:

$$\psi_i = \omega + \sum_{j=1}^{m} \alpha_j x_{i-j} + \sum_{j=1}^{q} \beta_j \psi_{i-j}.$$

Once a parametric distribution of $\varepsilon_i$ has been specified, maximum likelihood estimates of $\theta$ can be obtained by using numerical optimization algorithms. When the error distribution is exponential, the resulting model is called an EACD model; similarly, if $\varepsilon_i$ follows a Weibull distribution, the model is referred to as a WACD model, and so on. We focus on the WACD model, which (a) is flexible in terms of hazard function features, (b) is straightforward to estimate, and (c) can account for serial dependence in high-frequency data [15, 17].

In the case of a Weibull distribution with parameters $(1, \gamma)$, the hazard function is

$$\lambda_0(t) = \gamma t^{\gamma - 1},$$

and the conditional intensity function is [15]

$$\lambda\left(t \mid x_{N(t)}, \ldots, x_1\right) = \left[\frac{\Gamma(1 + 1/\gamma)}{\psi_{N(t)+1}}\right]^{\gamma} \gamma \left(t - t_{N(t)}\right)^{\gamma - 1}, \tag{2.6}$$

where $\Gamma(\cdot)$ is the Gamma function. Equation (2.6) shows that the conditional intensity now depends on two parameters, so either increasing or decreasing hazard functions may result; this makes especially long durations more or less likely than under the exponential distribution, depending on whether $\gamma$ is less than or greater than unity, respectively [15]. The log-likelihood for the Weibull ACD model is [14]

$$\ell(x \mid \theta, \gamma) = \sum_{i=1}^{n} \left\{ \gamma \ln \Gamma\!\left(1 + \frac{1}{\gamma}\right) + \ln\frac{\gamma}{x_i} + \gamma \ln\frac{x_i}{\psi_i} - \left[\frac{\Gamma(1 + 1/\gamma)\, x_i}{\psi_i}\right]^{\gamma} \right\}.$$

The Weibull distribution reduces to the exponential distribution if $\gamma$ equals 1, but allows for an increasing (decreasing) hazard function if $\gamma > 1$ ($\gamma < 1$) [14].
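
A minimal sketch of the WACD building blocks follows. It assumes an ACD(1,1) specification, $\psi_i = \omega + \alpha x_{i-1} + \beta \psi_{i-1}$, and initializes $\psi_1$ at the sample mean of the durations (a common but not unique choice, and an assumption here). The log-likelihood is the Weibull ACD expression of Tsay [14], restated in the docstring; maximizing it over $(\omega, \alpha, \beta, \gamma)$ numerically would give maximum likelihood estimates. Function names and data are hypothetical.

```python
import math


def acd11_filter(x, omega, alpha, beta, psi_init=None):
    """Conditional means psi_i = omega + alpha * x_{i-1} + beta * psi_{i-1}."""
    psi = [psi_init if psi_init is not None else sum(x) / len(x)]
    for i in range(1, len(x)):
        psi.append(omega + alpha * x[i - 1] + beta * psi[i - 1])
    return psi


def wacd_loglik(x, psi, gamma):
    """Sum of gamma*ln G + ln(gamma/x_i) + gamma*ln(x_i/psi_i) - (G*x_i/psi_i)^gamma,
    with G = Gamma(1 + 1/gamma)."""
    log_g = math.lgamma(1.0 + 1.0 / gamma)  # ln Gamma(1 + 1/gamma)
    g = math.exp(log_g)                     # Gamma(1 + 1/gamma)
    total = 0.0
    for xi, pi in zip(x, psi):
        total += (gamma * log_g
                  + math.log(gamma / xi)
                  + gamma * math.log(xi / pi)
                  - (g * xi / pi) ** gamma)
    return total


# Hypothetical durations (minutes) and parameter values, for illustration only
durations = [4.5, 7.5, 3.0, 12.0, 6.0]
psi = acd11_filter(durations, omega=0.6, alpha=0.2, beta=0.7)
print(wacd_loglik(durations, psi, gamma=0.95))
```

With $\gamma = 1$ the expression collapses to the exponential (EACD) log-likelihood $\sum_i (-\ln \psi_i - x_i/\psi_i)$, which is a convenient sanity check. In practice the negative log-likelihood could be handed to a general optimizer such as `scipy.optimize.minimize`, subject to $\omega > 0$, $\alpha, \beta \ge 0$, and $\gamma > 0$.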

The ACD models can be modified to account for nonlinearities, which are quite common in high-resolution datasets. Zhang et al. [18] extend the ACD model to account for nonlinearity and structural breaks in the data. A threshold autoregressive conditional duration (TACD) model allows the expected duration to depend nonlinearly on past information variables. A positive stochastic process $\{x_i\}$ follows a $K$-regime threshold ACD model if, when the threshold variable $x_{i-d} \in R_k = [r_{k-1}, r_k)$,

$$x_i = \psi_i \varepsilon_i^{(k)}, \qquad \psi_i = \omega^{(k)} + \sum_{j=1}^{m} \alpha_j^{(k)} x_{i-j} + \sum_{j=1}^{q} \beta_j^{(k)} \psi_{i-j},$$

where the delay parameter $d$ is a positive integer, $\psi_i$ is the conditional mean of $x_i$, $\varepsilon_i^{(k)}$ are the regime-specific standardized innovations, and $0 = r_0 < r_1 < \cdots < r_K = \infty$, for $K$ a positive integer, are the threshold values [18]. In the TACD formulation, the different regimes of a time series are allowed to have different duration persistence and error distributions, which makes modeling more flexible and efficient. However, in such models the proper selection of the number of regimes is critical, as it significantly affects the estimation process; Zhang et al. [18] underline the computational difficulty of estimating a 3-regime TACD(1,1) model for financial transaction duration data. Further specifications and details on ACD model development, estimation, and testing can be found in [14, 16, 17].
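
The regime-switching recursion can be sketched as follows for a TACD(1,1) model, assuming the regime is selected by the lagged duration $x_{i-d}$ falling in $[r_{k-1}, r_k)$. The 18-minute threshold and the per-regime $(\omega, \alpha, \beta)$ values in the example are hypothetical (the threshold merely echoes the boundary reported later in this paper), and the regime-specific error distributions, which would enter the likelihood, are omitted.

```python
import bisect


def tacd11_filter(x, thresholds, params, d=1, psi_init=None):
    """Conditional means of a K-regime TACD(1,1) model.

    thresholds -- increasing interior cut points r_1 < ... < r_{K-1}
    params     -- K tuples (omega, alpha, beta), one per regime
    d          -- delay of the threshold variable x_{i-d}
    Returns (psi, regime), where regime[i] is the 0-based regime index or None.
    """
    if len(params) != len(thresholds) + 1:
        raise ValueError("need exactly one parameter tuple per regime")
    start = psi_init if psi_init is not None else sum(x) / len(x)
    psi, regime = [start], [None]
    for i in range(1, len(x)):
        if i - d < 0:  # not enough history to select a regime yet
            psi.append(start)
            regime.append(None)
            continue
        k = bisect.bisect_right(thresholds, x[i - d])  # x_{i-d} in [r_{k-1}, r_k)
        omega, alpha, beta = params[k]
        psi.append(omega + alpha * x[i - 1] + beta * psi[i - 1])
        regime.append(k)
    return psi, regime


# Hypothetical two-regime example with a single threshold at 18 minutes
psi, regime = tacd11_filter([10.0, 20.0, 5.0], thresholds=[18.0],
                            params=[(1.0, 0.1, 0.8), (0.5, 0.3, 0.8)],
                            psi_init=10.0)
print(regime)  # [None, 0, 1]
```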

#### 3. Modeling Congestion Duration

##### 3.1. The Data

Data from Athens (Greece) are used for modeling traffic congestion duration. The urban arterial examined has signalized intersections with link lengths varying from 150 to 500 m and three through lanes per direction. Traffic data on the arterial links are collected by loop detectors located about 90 m from the stop lines; volume and occupancy data are extracted every 90 s. Figure 1 depicts the volume and occupancy time series for a typical day. We examine the temporal dependence of both time series via the autocorrelation function (Figure 2), which shows that both volume and occupancy possess strong long-memory characteristics, reflected in the hyperbolic decay of their autocorrelation structure [19].
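
The autocorrelation functions discussed here are sample ACFs. A minimal sketch of the estimator is given below; under long memory the values decay slowly (hyperbolically) with the lag rather than geometrically. Visual inspection of the ACF is a diagnostic rather than a formal long-memory test, and the occupancy values are hypothetical.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a numeric series."""
    n = len(series)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(e * e for e in dev)  # n times the (biased) sample variance
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


# Hypothetical occupancy values (%) at consecutive 90 s intervals
occupancy = [8.0, 9.5, 11.0, 14.0, 18.5, 22.0, 21.0, 17.5, 13.0, 10.0]
print([round(r, 3) for r in sample_acf(occupancy, 3)])
```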

To determine the periods of congestion from the available data, congestion occurrences must first be identified. Congestion is identified with a methodology based on an advanced artificial intelligence approach that detects and clusters the transitional characteristics of volume and occupancy [20]; this approach is consistent with a kinematic wave theoretic model and can identify the spillover region (the region where queues are longer than the signalized arterial links). As can be observed in Figure 3, congested conditions (spillover region) and the critical area before congestion are separated by the spillover line, which is expressed in terms of the volume (veh/time interval), the typical vehicle length, the free-flow speed (km/h), the cycle length, and the red phase duration. Equation (2.5) is applied to extract the duration of congestion episodes. Table 1 summarizes the parameter values used for congestion (spillover) detection. Figure 4 demonstrates the distributions of congestion duration, Figure 5 shows the distribution of the observed congestion durations, and Figure 6 shows the autocorrelation function of the observed congestion duration time series. Summary statistics for the congestion data are presented in Table 2; the high Ljung-Box statistics show strong serial correlation. The Ljung-Box statistic is $\chi^2$ distributed and is given by

$$Q(h) = T(T+2) \sum_{l=1}^{h} \frac{\hat{\rho}_l^2}{T - l},$$

where $T$ is the series length, $\hat{\rho}_l$ is the sample ACF at lag $l$, and $h$ is the number of lags, which gives the degrees of freedom.
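
Once each 90 s interval has been classified as congested or not (here by a hypothetical boolean flag; the spillover-line classification itself is not reproduced), episode durations follow from runs of consecutive congested intervals. The Ljung-Box statistic $Q(h) = T(T+2)\sum_{l=1}^{h} \hat{\rho}_l^2/(T-l)$ can then be computed on those durations and compared with a $\chi^2_h$ critical value (for example via `scipy.stats.chi2.sf`). This is an illustrative sketch, not the authors' implementation.

```python
def congestion_episodes(flags, interval_s=90):
    """Durations (minutes) of runs of consecutive congested intervals."""
    durations, run = [], 0
    for congested in flags:
        if congested:
            run += 1
        elif run:
            durations.append(run * interval_s / 60.0)
            run = 0
    if run:  # series ends while still congested
        durations.append(run * interval_s / 60.0)
    return durations


def ljung_box(series, h):
    """Ljung-Box Q(h) = T(T+2) * sum_{l=1..h} r_l^2 / (T - l)."""
    T = len(series)
    mean = sum(series) / T
    dev = [v - mean for v in series]
    denom = sum(e * e for e in dev)
    q = 0.0
    for lag in range(1, h + 1):
        r = sum(dev[t] * dev[t + lag] for t in range(T - lag)) / denom
        q += r * r / (T - lag)
    return T * (T + 2) * q


flags = [0, 1, 1, 0, 1, 1, 1, 1, 0, 1]  # hypothetical interval classifications
print(congestion_episodes(flags))  # [3.0, 6.0, 1.5]
```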

##### 3.2. Congestion Duration Models: Specifications, Estimation, and Diagnostics

For the estimation of congestion duration, an autoregressive conditional duration model with a Weibull error distribution (WACD) is considered. The estimated parameters of the fitted models are presented in Table 3; all parameters are significant at the 5% level. The estimated Weibull shape parameter $\gamma$ is very close to 1, indicating a conditional hazard function that decreases monotonically at a slow rate. The sum of $\alpha$ and $\beta$ is less than 1, pointing to a stationary (ergodic) process. The Ljung-Box statistics for the residual series show that the standardized innovations are not significantly correlated.
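
The condition $\alpha + \beta < 1$ matters because, for an ACD(1,1) model $\psi_i = \omega + \alpha x_{i-1} + \beta \psi_{i-1}$, it guarantees a finite unconditional mean duration $E(x_i) = \omega/(1 - \alpha - \beta)$, and forecasts of $\psi$ then revert to that mean at the geometric rate $\alpha + \beta$. A small sketch of both facts, with hypothetical parameter values and function names:

```python
def acd11_unconditional_mean(omega, alpha, beta):
    """E(x_i) = omega / (1 - alpha - beta) for a stationary ACD(1,1) model."""
    persistence = alpha + beta
    if persistence >= 1.0:
        raise ValueError("alpha + beta >= 1: no finite unconditional mean")
    return omega / (1.0 - persistence)


def acd11_forecast(psi_next, omega, alpha, beta, steps):
    """k-step-ahead forecasts of psi, reverting to the mean at rate alpha + beta."""
    mu = acd11_unconditional_mean(omega, alpha, beta)
    return [mu + (alpha + beta) ** (k - 1) * (psi_next - mu)
            for k in range(1, steps + 1)]


# Hypothetical values: persistence 0.9, long-run mean about 5 minutes
print(acd11_unconditional_mean(0.5, 0.2, 0.7))
print(acd11_forecast(8.0, 0.5, 0.2, 0.7, steps=3))
```

The closer $\alpha + \beta$ is to one, the more slowly a long duration "wears off" in the forecasts.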

The fitted WACD model is compared with two parametric duration models previously applied in congestion modeling [8]. The comparisons are based on the adjusted Anderson-Darling (AD) test statistic and the correlation coefficient (COR); the best fitting model has the lowest AD value and the highest COR value. Table 4 shows the goodness-of-fit tests for the three models. Although all COR values are relatively high, the AD values differ, and the best fitting model is the WACD. The results in Tables 3 and 4 lead to several remarks. First, none of the models in Table 4 has an AD value below the critical value at the 95% confidence level (2.492) [21]. Second, the nonlinearity test [22] (whose null hypothesis is that the true model is an AR process) suggests that some nonlinearities remain in the residuals (Table 3). Third, Engle's LM ARCH test [23] (whose null hypothesis is that there is no ARCH effect, i.e., constant conditional variance, in the time series under study) points towards heteroscedastic behavior of the residuals (Table 3). All of the above indicate the need to further refine the model.
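
For reference, the (unadjusted) Anderson-Darling statistic $A^2$ against a fully specified CDF $F$ can be computed as below. This is a sketch under that assumption: the adjusted AD statistic reported in Table 4, and its 2.492 critical value, come from the cited procedure [21] and may include corrections not shown here. The exponential CDF and the sample are hypothetical.

```python
import math


def anderson_darling(sample, cdf):
    """A^2 = -n - (1/n) * sum_i (2i - 1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]."""
    u = [cdf(v) for v in sorted(sample)]
    n = len(u)
    if any(not 0.0 < p < 1.0 for p in u):
        raise ValueError("CDF values must lie strictly between 0 and 1")
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


# Hypothetical congestion durations (minutes) against an exponential with mean 6
durations = [1.5, 3.0, 4.5, 6.0, 9.0, 13.5]
print(anderson_darling(durations, lambda v: 1.0 - math.exp(-v / 6.0)))
```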

To account for the remaining nonlinearities and the ARCH effect in the residuals, a threshold autoregressive conditional duration model with a Weibull error distribution (T-WACD) is developed. A recursive approach is used to identify the number of regimes, and their boundaries in terms of congestion duration, that best describe the available congestion duration data. Results for the model are summarized in Table 3. Figure 7 shows the scatter plot of actual versus estimated congestion durations for the WACD and T-WACD models; both models fit the data well. The mean absolute percentage error (MAPE) is also calculated for both models (Table 3). Although the results show the superiority of the T-WACD model over the WACD model, the error levels cannot be fully evaluated, as no MAPE results for duration modeling have been reported in previous research.
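
The MAPE used to compare the WACD and T-WACD fits is the mean of the absolute errors relative to the observed values, expressed in percent. A minimal sketch with hypothetical actual and estimated durations:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


print(mape([10.0, 20.0], [9.0, 22.0]))  # 10.0
```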

A thorough investigation of the results shows that the T-WACD model provides a better fit to the original data than the single-regime WACD model. The T-WACD model explains most of the temporal dependence in the congestion duration periods; additionally, most of the nonlinearities and ARCH effects are efficiently addressed. Differences can also be identified between the regimes. For example, the estimated Weibull shape parameter implies a conditional hazard that decreases monotonically at a slow rate for congestion episodes lasting up to 18 minutes, whereas for congestion episodes of 18 minutes and above the estimated Weibull hazard is also monotonically decreasing (Table 3). Moreover, an important observation concerns the value of $\alpha + \beta$ across the two identified regimes: for regime 1 the estimated model returns $\alpha + \beta < 1$, while the opposite ($\alpha + \beta \geq 1$) holds for the second regime. This suggests that the first regime, which describes congestion events that last up to 18 minutes, is a stationary process, while congestion durations longer than 18 minutes are governed by nonstationary dynamics.

##### 3.3. Understanding the Nonlinear Dynamics of Congestion and Its Implications on Traffic

Modeling the duration of traffic congestion as both a linear and a nonlinear autoregressive process enables a more efficient description of the duration between consecutive congestion occurrences and yields useful information on the dynamics of traffic flow. A first important finding concerns the need to study the structural properties of congestion duration data with respect to structural breaks. When the WACD model, which accounts for the overall congestion duration process, is contrasted with the 2-regime T-WACD model, it becomes evident that stationarity of the system (i.e., of the overall congestion duration data) does not imply stationarity of each subsystem. This is a critical finding, as it reflects the occurrence of structural breaks in traffic flow evolution or possible changes in traffic demand at an intraday, microscale level.

The sum $\alpha + \beta$, which indicates the stability of the congestion duration process, differs significantly across regimes, indicating differences in traffic flow evolution and persistence. Based on the estimated models for each regime, when congestion lasts 18 minutes or less, congestion is not persistent and traffic does not remain congested for long. For longer congestion durations, the large value of $\alpha + \beta$ suggests that traffic becomes persistent and congestion is magnified at an exponential rate. Figure 8 depicts the hazard function for the two regimes of the T-WACD congestion duration model. The conditional intensities of the two regimes differ, suggesting different congestion duration dynamics in each; for long congestion durations (over 18 minutes), traffic becomes persistent. These results have significant implications for short-term traffic flow modeling and are in line with recent evidence on the memory properties of traffic flow time series [19].
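
Hazard curves of the kind shown in Figure 8 can be produced by evaluating the Weibull ACD conditional intensity, Equation (2.6), $\left[\Gamma(1 + 1/\gamma)/\psi\right]^{\gamma} \gamma \, \tau^{\gamma - 1}$ with $\tau$ the time elapsed since the last event, on a grid of elapsed times. The sketch below uses hypothetical $(\psi, \gamma)$ pairs for two regimes, not the estimates in Table 3. With $\gamma < 1$, the intensity decreases with elapsed time: the longer the current spell has lasted, the lower the instantaneous probability that it ends.

```python
import math


def wacd_intensity(elapsed, psi, gamma):
    """Eq. (2.6): [Gamma(1 + 1/gamma) / psi]^gamma * gamma * elapsed^(gamma - 1)."""
    if elapsed <= 0 or psi <= 0 or gamma <= 0:
        raise ValueError("elapsed, psi and gamma must be positive")
    scale = math.gamma(1.0 + 1.0 / gamma) / psi
    return scale ** gamma * gamma * elapsed ** (gamma - 1.0)


grid = [1.0, 5.0, 10.0, 20.0, 30.0]  # minutes since the last event
# Hypothetical regime settings, not the estimated values
for label, psi, gamma in [("regime 1", 8.0, 0.95), ("regime 2", 25.0, 0.80)]:
    print(label, [round(wacd_intensity(t, psi, gamma), 4) for t in grid])
```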

#### 4. Conclusions

High-frequency traffic flow data have offered transportation researchers the opportunity to understand the dynamics of certain nonrecurrent events, such as the duration of congestion. Studying the related microstructures of traffic flow time series has proven quite informative with respect to the pattern-based evolution of traffic flow and the volatility of traffic phenomena. An important consideration in traffic flow analysis is the frequency and the manner in which congestion occurs in urban networks; to this end, we proposed linear and nonlinear autoregressive conditional duration models, a novel methodological approach in transportation applications.

Results from an urban signalized arterial indicated that congestion duration data are typically nonlinear and volatile. We showed that a multiregime nonlinear ACD model fits the observed data best. For this specific application, the estimated model suggests the existence of two distinct congestion duration regimes: in congestion incidents that last up to 18 minutes, traffic is most likely to exit congestion quickly, whereas in congestion lasting longer than 18 minutes, traffic exhibits persistence and congestion is expected to last. Although the results cannot claim transferability across differences in arterial geometry, traffic demand, and signalization plans, the multiregime nature of traffic flow is both mathematically well established in the proposed models and supported by the estimation results. From a methodological perspective, the novelty of the paper is that the models applied allow the conditional expected duration to be a nonlinear function of past duration incidents. Additionally, we considered possible nonstationarity in the microstructure of traffic congestion occurrences. Nevertheless, despite the flexibility of the nonlinear ACD models, important information remains to be considered in modeling; for example, other error distributions, including the gamma and log-logistic forms, should be examined.