#### Abstract

Accurate prediction of incident duration not only provides important information for a Traffic Incident Management System but also serves as an effective input for travel time prediction. In this paper, hazard-based prediction models are developed for both incident clearance time and arrival time. The data are obtained from the Queensland Department of Transport and Main Roads’ STREAMS Incident Management System (SIMS) for the one year ending in November 2010. The best fitting distributions are identified for both clearance time and arrival time for 3 types of incident: crash, stationary vehicle, and hazard. The results show that the Gamma, Log-logistic, and Weibull distributions are the best fit for crash, stationary vehicle, and hazard incidents, respectively. The significant impact factors are given for crash clearance time and arrival time. The quantitative influences of these factors for crash and hazard incidents are presented for both clearance time and arrival time. The model accuracy is analyzed at the end.

#### 1. Introduction

Traffic incidents are considered one of the major causes of traffic congestion and delay. At the same time, traffic incidents have a wide negative impact on both the traffic system and related social activities. Many studies have attempted to predict, estimate, or reduce incident duration, because accurate prediction of incident duration can reduce the duration itself, help Traffic Incident Management (TIM) respond quickly to incidents and mitigate their impact (Chou [1]), and improve travel time reliability by predicting travel time during an incident (Tsubota et al. [2]).

Traffic incidents are nonrecurrent events that cause a capacity reduction or an abnormal increase in traffic demand, such as crashes, stalled vehicles, debris, fires, construction, and sporting events. The general incident timeline shown in Figure 1 reveals that, by recording timestamps at various stages of an incident, incident duration can be divided into verification, response, clearance, and recovery periods.

However, most prediction models did not include all four periods or did not give an exact definition of incident time. Each period of incident time follows its own statistical distribution and is affected by different factors. An example is given in Figure 2.

Figure 2 is the occupancy contour plot from 3:00 a.m. to 11:00 a.m., in which the x-axis denotes the distance from the start point of the freeway and the y-axis indicates the time. Green indicates low occupancy (free-flow conditions), yellow indicates increasing congestion, and red represents heavy congestion on the links. The incident appears as the red area, on which the incident occurrence time (7:00 a.m.), clearance time (7:53 a.m.), and traffic recovery time (8:20 a.m.) are clearly marked. According to the incident database (SIMS), a multiple-vehicle crash on 22 July 2011 was verified at 7:39 a.m., the tow assistance arrived at 7:47 a.m., and the crash was cleared at 7:53 a.m. The total duration calculated from real traffic flow data was 80 minutes, whereas the duration according to the incident database was only 14 minutes. The real duration was therefore almost 6 times that recorded in the incident database, which introduces errors into prediction models.
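The gap between the two durations is plain timestamp arithmetic. A minimal Python sketch, using only the clock times quoted for this crash (the date is omitted because all events fall on the same morning; the variable names are illustrative):

```python
from datetime import datetime

# Clock times quoted for the 22 July 2011 crash.
FMT = "%H:%M"
occurred = datetime.strptime("07:00", FMT)   # from the occupancy contour plot
verified = datetime.strptime("07:39", FMT)   # SIMS verification time
cleared = datetime.strptime("07:53", FMT)    # SIMS clearance time
recovered = datetime.strptime("08:20", FMT)  # from the occupancy contour plot

def minutes(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in minutes."""
    return (end - start).total_seconds() / 60.0

flow_based = minutes(occurred, recovered)  # duration seen in the traffic flow data
sims_based = minutes(verified, cleared)    # duration recorded in SIMS
ratio = flow_based / sims_based

print(f"flow-based: {flow_based:.0f} min, SIMS: {sims_based:.0f} min, ratio: {ratio:.1f}")
```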

However, the accuracy of incident prediction models has proved difficult to improve, partly because of how incident duration is defined. It is difficult to obtain the real incident duration shown in Figure 1, because the times at which an incident occurs and ends cannot be collected exactly. In this paper, the clearance time and arrival time are therefore investigated individually. The major impact factors for clearance time and arrival time are studied, two prediction models are then developed, and the impact factors of the two models are compared.

The rest of this paper is structured as follows. Section 2 reviews previous incident duration models. Section 3 introduces the STREAMS Incident Management System (SIMS) in the Pacific Motorway, Brisbane, Australia, and the properties of the incident records. Section 4 introduces the hazard-based model and the candidate Gamma, Weibull, and Log-logistic distributions. Section 5 presents the results: first, the best fitting statistical distribution for 3 types of incident is identified for both clearance time and arrival time; second, the significant impact factors of the hazard-based model are given separately after data filtering; finally, the quantitative impact of each factor and the prediction accuracy are evaluated. The conclusions are summarized in Section 6.

#### 2. Literature Review

Over the past several decades, various models have been developed to predict traffic incident duration. Most of these studies are based on statistical theory. The representative prediction models can be classified into the following categories: probabilistic distributions, linear regression analysis, time sequential methods, decision tree models, neural network models, Bayesian classifiers, hazard analysis models, and so forth.

Golob et al. [3] provided the first probabilistic distribution of incident duration. They found that each part of the duration was related to the preceding part and demonstrated that the duration fits a lognormal distribution. Giuliano [4] extended Golob’s work by applying a lognormal distribution to similar types of incidents and reported that whether a truck was involved was a key factor in incident duration. Garib et al. [5] and Sullivan [6] conducted similar research. Jones et al. [7] developed a double logarithmic distribution of incident duration. Nam and Mannering [8] found that the Weibull distribution can also be used to describe incident data.

Linear regression models have been widely used because of their simplicity and effectiveness. Regression models clearly reflect the relationship between impact factors and incident duration [5, 9]. Ozbay and Kachroo [10] developed a regression model of incident clearance time using 121 traffic incident records collected in Chicago, USA. They identified nine statistically significant variables but did not report the accuracy or effectiveness of the model. Kau [11] estimated freeway incident duration using multiple linear regression with a 95% confidence interval. He defined the incident clearance time as the period from the arrival of a police vehicle or freeway service patrol vehicle at the scene until the vehicles are cleared from the scene.

The advantage of time sequential models is that they can make regression predictions from a few incident property variables at the early stage of an incident and update the prediction as more information is collected. Khattak et al. [12] found that incident type and severity were the most significant factors. They applied a time sequential model to a small incident sample, but its practicability was not demonstrated because of poor accuracy and limited incident data.

Decision tree models do not depend on the distribution of the dependent variable. Sethi et al. [13] reported an average duration of 21 minutes based on 801 incident records covering traffic interruptions, disabled vehicles, and incident severity. The results showed that incident type was the most significant factor affecting duration. Ji et al. [14] built a Bayesian decision tree model using 1853 incident records from Utrecht, the Netherlands, and improved the precision to 73.39% compared with the decision tree model. Stephen et al. [15] developed an incident duration model based on a naïve Bayesian classifier. They emphasized that incident duration is a highly variable quantity; although the model performed better than linear regression, its classification was still correct only about half of the time.

Artificial neural networks (ANN) have been widely used for prediction and pattern classification problems. Lopes et al. [16] presented an adaptive model to forecast the clearance time of traffic incidents in real time. The solution comprised four models, which were calibrated and tested with incident records from Portuguese highways. The model estimated 72% of incidents with an error of less than 10 minutes and about 92% with an error of less than 20 minutes. Other examples can be found in [17, 18].

Hazard analysis, a common topic in fields such as the life sciences, biomedicine, and reliability engineering, has also been used in traffic engineering. It is particularly effective for time-related problems and is generally used to analyze data measuring the time from a well-defined origin until the occurrence of a particular event or end point [19]. Examples include the time between incident occurrence and clearance [8, 20], the time between the planning and execution of an activity [21], and urban travel time [22].

Hazard-based models have also been used to model incident duration. Chung [23] presented an accident duration model using a 2-year dataset (2006 to 2007) from Korean freeways and selected the Log-logistic distribution for an accelerated failure time model. Although the model had a large prediction error, statistical tests indicated that it was stable over time. Tavassoli et al. [24] developed parametric accelerated failure time survival models of incident duration and found that each type of incident has a distinctly different duration that responds to different factors.

One distinctive feature of hazard-based models is that their precision improves when the best fitting distribution of the time variable is chosen. In this study, hazard-based models, in particular the accelerated failure time (AFT) metric, are used to model both incident clearance time and arrival time.

#### 3. Description of Incident Database

Incident data were collected by the Queensland Department of Transport and Main Roads’ STREAMS Incident Management System (SIMS) for the South East Queensland urban networks from November 2009 to November 2010. SIMS is an incident management system used throughout Queensland, Australia, to capture traffic incidents that affect traffic flow on the road network. There are 35103 incident records for the year, classified into 9 types: alert, congestion, crash, fault, flood, hazard, planned incident, road works, and stationary vehicle. The SIMS incident database contains many detailed properties, but some of them, such as location, SIMS ID, and status, are not closely related to incident time prediction. The major properties of each incident record are therefore shown in Table 1. However, not all of these properties are recorded for every incident. For example, “number of vehicles involved,” “number of people injured,” and “number of fatalities” apply only to crash records.

Although 9 types of incident are recorded in the SIMS database, only 3 types (crash, stationary vehicle, and hazard) are used for model development, because the other types are rarely recorded. Consequently, the clearance time and arrival time prediction models are developed only for these 3 types of incident.

#### 4. Model Development

Hazard-based duration models, a class of statistical methods for studying the occurrence and timing of events, were originally used for problems in biomedicine, engineering, and the social sciences. More recently, they have been used to model time-related issues in transportation. A review of applications of hazard-based duration models in transportation up to the early 1990s is given in [25].

The incident time in the hazard-based model is a realization of a continuous random variable $T$ with cumulative distribution function $F(t)$, which is called the failure function. The probability density function $f(t)$, survival function $S(t)$, and hazard function $h(t)$ are given in (2)–(4). The relationships among these four functions are formulated in (1)–(4), where $P$ denotes probability:

$$F(t) = P(T \le t) = \int_0^t f(u)\,du \tag{1}$$

$$f(t) = \frac{dF(t)}{dt} \tag{2}$$

$$S(t) = P(T > t) = 1 - F(t) \tag{3}$$

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} \tag{4}$$

For $h(t)$ in (4), three fully parametric distributional alternatives, namely the Gamma, Log-logistic, and Weibull distributions, are considered and tested to find the best fit to incident clearance time and arrival time. The functional form of the hazard function for each model can be derived from the corresponding distribution and the general relationships above.

##### 4.1. Gamma Distribution

The Gamma distribution is a two-parameter family of continuous probability distributions with scale parameter $\lambda$ and shape parameter $p$, where $\lambda > 0$ and $p > 0$. The Gamma function is mathematically defined as [26]

$$\Gamma(p) = \int_0^\infty u^{p-1} e^{-u}\,du$$

After algebraic transformation, the p.d.f. (probability density function) of the Gamma distribution, generally written as $G(\lambda, p)$, is given by

$$f(t;\lambda,p) = \frac{\lambda^{p} t^{p-1} e^{-\lambda t}}{\Gamma(p)}, \quad t > 0 \tag{5}$$

When $p = 1$, (5) reduces to the exponential density function

$$f(t;\lambda) = \lambda e^{-\lambda t} \tag{6}$$

so the exponential distribution is a special case of the Gamma distribution.

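When $\lambda = 1$, (5) reduces to the one-parameter Gamma distribution, also referred to as the standard Gamma distribution and written as $G(p)$:

$$f(t;p) = \frac{t^{p-1} e^{-t}}{\Gamma(p)} \tag{7}$$

with c.d.f. (cumulative distribution function)

$$F(t;p) = \frac{1}{\Gamma(p)} \int_0^t u^{p-1} e^{-u}\,du = I(t,p) \tag{8}$$

The survival and hazard functions of the Gamma distribution are specified through (8), which is called the incomplete Gamma function. The survival function is given by

$$S(t) = 1 - I(\lambda t, p) \tag{9}$$

and the Gamma hazard function can be expressed as

$$h(t) = \frac{f(t)}{S(t)} = \frac{\lambda^{p} t^{p-1} e^{-\lambda t}}{\Gamma(p)\left[1 - I(\lambda t, p)\right]} \tag{10}$$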
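The Gamma survival and hazard functions (9) and (10) require the regularized incomplete Gamma function $I(\cdot, p)$, which Python's standard library does not provide. The sketch below evaluates it with the textbook power series, an implementation choice that is not taken from the paper (in practice `scipy.special.gammainc(a, x)` computes the same quantity). Parameters follow the notation of (5): rate-form scale λ (`lam`) and shape p; the numeric values are illustrative only, not SIMS estimates.

```python
import math

def reg_lower_gamma(p: float, x: float, tol: float = 1e-14, max_terms: int = 100_000) -> float:
    """Regularized lower incomplete Gamma function I(x, p) = gamma(p, x) / Gamma(p).

    Power series: gamma(p, x) = x**p * exp(-x) * sum_{n>=0} x**n / (p (p+1) ... (p+n)).
    Adequate for moderate x; for very large x a continued fraction is more accurate.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / p  # n = 0 term of the series
    total = term
    n = 0
    while n < max_terms:
        n += 1
        term *= x / (p + n)
        total += term
        if term < tol * total:
            break
    return math.exp(p * math.log(x) - x - math.lgamma(p)) * total

def gamma_pdf(t: float, lam: float, p: float) -> float:
    """Eq. (5): lam**p * t**(p-1) * exp(-lam*t) / Gamma(p), evaluated in log space."""
    return math.exp(p * math.log(lam) + (p - 1) * math.log(t) - lam * t - math.lgamma(p))

def gamma_survival(t: float, lam: float, p: float) -> float:
    """Eq. (9): S(t) = 1 - I(lam*t, p)."""
    return 1.0 - reg_lower_gamma(p, lam * t)

def gamma_hazard(t: float, lam: float, p: float) -> float:
    """Eq. (10): h(t) = f(t) / S(t)."""
    return gamma_pdf(t, lam, p) / gamma_survival(t, lam, p)

# Illustrative parameters only (lam per minute, shape p); not estimates from SIMS.
for t in (10, 30, 60, 120):
    print(t, round(gamma_hazard(t, lam=1 / 25, p=1.8), 5))
```

For p > 1 the Gamma hazard increases toward λ, and for p < 1 it decreases toward λ; with p = 1 it is the constant exponential hazard λ.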

##### 4.2. Weibull Distribution

The Weibull distribution is arguably the most widely applied parametric function in survival analysis, because of its flexibility and simplicity among all families of parametric time distributions [26].

The Weibull distribution of event time $T$, a continuous function, is characterized by two parameters: a scale parameter $\lambda$ and a shape parameter $p$. The survival function of the Weibull distribution is given by

$$S(t) = \exp\left[-(\lambda t)^{p}\right] \tag{11}$$

Given the close relationships among the lifetime functions, the hazard function of the Weibull distribution can be readily derived from (11).

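Consider

$$h(t) = -\frac{d \ln S(t)}{dt} = \lambda p (\lambda t)^{p-1} \tag{12}$$

The cumulative hazard function can be expressed in terms of $S(t)$, given by

$$H(t) = \int_0^t h(u)\,du = -\ln S(t) \tag{13}$$

Therefore, the cumulative hazard function can be written as

$$H(t) = (\lambda t)^{p} \tag{14}$$

Taking natural logarithms on both sides of (14), (14) can be written as

$$\ln H(t) = p \ln \lambda + p \ln t \tag{15}$$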

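Specifications of $S(t)$ and $h(t)$ lead to the following Weibull p.d.f.:

$$f(t) = h(t)\,S(t) = \lambda p (\lambda t)^{p-1} \exp\left[-(\lambda t)^{p}\right] \tag{16}$$

Likewise, the c.d.f. at time $t$ is derived as

$$F(t) = 1 - S(t) = 1 - \exp\left[-(\lambda t)^{p}\right] \tag{17}$$

Since $h(t) = f(t)/S(t)$, the Weibull hazard function can be re-expressed as

$$h(t) = \frac{f(t)}{S(t)} = p \lambda^{p} t^{p-1} \tag{18}$$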
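The Weibull relationships (11)–(17) can be checked numerically. A minimal Python sketch, assuming the (λ, p) parameterization used in this section (`lam` is the scale λ, `p` the shape); the parameter values are illustrative, not SIMS estimates. It confirms the property behind (15): $\ln H(t)$ is linear in $\ln t$ with slope p.

```python
import math

# Weibull functions in the (lam, p) form of eqs. (11)-(17); lam > 0 scale, p > 0 shape.
def weibull_survival(t: float, lam: float, p: float) -> float:
    return math.exp(-((lam * t) ** p))                                # eq. (11)

def weibull_hazard(t: float, lam: float, p: float) -> float:
    return lam * p * (lam * t) ** (p - 1)                             # eq. (12)

def weibull_cum_hazard(t: float, lam: float, p: float) -> float:
    return (lam * t) ** p                                             # eq. (14), equals -ln S(t)

def weibull_pdf(t: float, lam: float, p: float) -> float:
    return weibull_hazard(t, lam, p) * weibull_survival(t, lam, p)    # eq. (16)

def weibull_cdf(t: float, lam: float, p: float) -> float:
    return 1.0 - weibull_survival(t, lam, p)                          # eq. (17)

# Illustrative parameters only (not SIMS estimates).
lam, p = 1 / 40, 1.3
t1, t2 = 10.0, 60.0
# Eq. (15): ln H(t) is linear in ln t, so the slope between any two points equals p.
slope = (math.log(weibull_cum_hazard(t2, lam, p)) - math.log(weibull_cum_hazard(t1, lam, p))) / (
    math.log(t2) - math.log(t1))
print(f"slope of ln H(t) against ln t: {slope:.4f}")
```

Plotting $\ln(-\ln \hat S(t))$ from an empirical survival curve against $\ln t$ is a common visual check of the Weibull assumption: the points should lie roughly on a straight line.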

##### 4.3. Log-Logistic Distribution

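Like the lognormal distribution, the Log-logistic distribution is widely used to describe events whose rate increases initially and decreases consistently afterwards. The Log-logistic distribution of $T$ is the antilogarithm of the familiar logistic distribution. Let $Y = \ln T$. The density function of $Y$ is the familiar logistic distribution [26]:

$$f(y) = \frac{\exp\left[(y-\mu)/\sigma\right]}{\sigma\left\{1 + \exp\left[(y-\mu)/\sigma\right]\right\}^{2}} \tag{19}$$

where $\mu$ and $\sigma$ are the location and scale parameters of the logistic distribution of $Y$, written as $Y \sim \mathrm{Logistic}(\mu, \sigma)$. Let $\lambda = \exp(-\mu)$ and $p = 1/\sigma$. The antilogarithm of (19) gives the density function of $T$:

$$f(t) = \frac{\lambda p (\lambda t)^{p-1}}{\left[1 + (\lambda t)^{p}\right]^{2}} \tag{20}$$

where $\lambda$ and $p$ are the parameters of the Log-logistic distribution, written as $LL(\lambda, p)$. The c.d.f. of $T$ is then

$$F(t) = \frac{(\lambda t)^{p}}{1 + (\lambda t)^{p}} \tag{21}$$

Therefore, the survival and hazard functions of $T$ can be readily derived as follows:

$$S(t) = \frac{1}{1 + (\lambda t)^{p}} \tag{22}$$

$$h(t) = \frac{f(t)}{S(t)} = \frac{\lambda p (\lambda t)^{p-1}}{1 + (\lambda t)^{p}} \tag{23}$$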
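The rise-then-fall shape of the Log-logistic hazard can be made concrete. Setting the derivative of the hazard to zero gives $(\lambda t)^p = p - 1$, so for $p > 1$ the hazard peaks at $t^* = (p-1)^{1/p}/\lambda$, while for $p \le 1$ it decreases monotonically. A minimal Python sketch in the (λ, p) notation of this section (`lam`, `p`); the function names and parameter values are illustrative assumptions, not SIMS estimates.

```python
import math

# Log-logistic LL(lam, p) functions, eqs. (20)-(23); lam > 0 scale, p > 0 shape.
def loglogistic_pdf(t: float, lam: float, p: float) -> float:
    z = (lam * t) ** p
    return lam * p * (lam * t) ** (p - 1) / (1.0 + z) ** 2        # eq. (20)

def loglogistic_cdf(t: float, lam: float, p: float) -> float:
    z = (lam * t) ** p
    return z / (1.0 + z)                                          # eq. (21)

def loglogistic_survival(t: float, lam: float, p: float) -> float:
    return 1.0 / (1.0 + (lam * t) ** p)                           # eq. (22)

def loglogistic_hazard(t: float, lam: float, p: float) -> float:
    return lam * p * (lam * t) ** (p - 1) / (1.0 + (lam * t) ** p)  # eq. (23)

def hazard_peak_time(lam: float, p: float) -> float:
    """For p > 1 the hazard rises then falls; dh/dt = 0 gives (lam*t)**p = p - 1."""
    if p <= 1:
        raise ValueError("the hazard is monotonically decreasing when p <= 1")
    return (p - 1) ** (1.0 / p) / lam

# Illustrative parameters: median duration 1/lam = 30 min (S = 0.5 there), shape p = 2.5.
lam, p = 1 / 30, 2.5
t_star = hazard_peak_time(lam, p)
print(f"hazard peaks at about {t_star:.1f} min")
```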

#### 5. Model Result

##### 5.1. Goodness of Fit of Distributions

Understanding incident characteristics and patterns is essential for establishing an appropriate prediction model, so a statistical analysis is carried out first. A total of 4966 crash records, 15791 stationary vehicle records, and 3847 hazard records of clearance time are used in the distribution fitting analysis. Four probability density functions (Gamma, Log-logistic, Weibull, and lognormal) are fitted to the clearance time of crash, stationary vehicle, and hazard incidents (see Figures 3(a), 3(b), and 3(c), respectively). Thick solid lines indicate the best fitting distribution. The figures indicate that each incident type has its own best fitting distribution function. Four statistics of the fitted clearance time probability density distributions, namely log-likelihood, domain, mean, and variance, are listed in Table 2. The log-likelihood and variance statistics indicate the goodness of fit of each distribution.

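**Figure 3:** Probability density functions fitted to clearance time for **(a)** crash, **(b)** stationary vehicle, and **(c)** hazard incidents.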

The smaller the variance, the better the distribution fit. For example, the variance of the Gamma distribution for crash clearance time is 967.94, the smallest among the candidate distributions. The results clearly show that the Gamma distribution is the best fit for crash clearance time, the Log-logistic distribution for stationary vehicles, and the Weibull distribution for hazards.
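Ranking candidate distributions by log-likelihood can be sketched in a few lines. The Python below is a minimal illustration, not the authors' fitting procedure: it uses the closed-form lognormal MLE, a Weibull MLE found by bisection on the profile equation for the shape p, and an approximate Gamma MLE (a standard closed-form approximation for the shape, not an exact fit). The Log-logistic fit is omitted because it needs a two-dimensional numerical optimizer, and the variance criterion used in the text is not reproduced. Because every candidate has two parameters, ranking by log-likelihood is equivalent to ranking by AIC. The function names and the synthetic data are hypothetical.

```python
import math
import random

def fit_gamma(x):
    """Approximate Gamma MLE: closed-form shape approximation, then rate lam = p / mean."""
    n = len(x)
    mean = sum(x) / n
    s = math.log(mean) - sum(math.log(v) for v in x) / n
    p = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    lam = p / mean
    ll = n * (p * math.log(lam) - math.lgamma(p)) + (p - 1) * sum(math.log(v) for v in x) - lam * sum(x)
    return {"p": p, "lam": lam}, ll

def fit_lognormal(x):
    """Closed-form lognormal MLE: mean and standard deviation of ln t."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)
    ll = -sum(logs) - n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi) - 0.5 * n
    return {"mu": mu, "sigma": sigma}, ll

def fit_weibull(x, iters=60):
    """Weibull MLE in the (lam, p) form of (16): bisection on the profile score for p."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / n
    top = max(x)
    ratios = [v / top for v in x]  # rescaling avoids overflow in t**p

    def score(p):  # increasing in p; its root is the MLE of the shape
        w = [r ** p for r in ratios]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / p - mean_log

    lo, hi = 1e-2, 50.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    lam = 1.0 / (top * (sum(r ** p for r in ratios) / n) ** (1.0 / p))
    ll = sum(math.log(p) + p * math.log(lam) + (p - 1) * l - (lam * v) ** p for v, l in zip(x, logs))
    return {"p": p, "lam": lam}, ll

def select_best(x):
    """Fit each candidate and return all fits plus the name with the largest log-likelihood."""
    if any(v <= 0 for v in x):
        raise ValueError("durations must be strictly positive")
    fits = {"Gamma": fit_gamma(x), "Weibull": fit_weibull(x), "lognormal": fit_lognormal(x)}
    best = max(fits, key=lambda name: fits[name][1])
    return fits, best

# Usage with synthetic data (not SIMS records): 2000 draws from a Weibull distribution.
random.seed(42)
sample = [random.weibullvariate(30.0, 1.5) for _ in range(2000)]
fits, best = select_best(sample)
for name, (params, ll) in fits.items():
    print(f"{name:9s} log-likelihood = {ll:10.1f}  params = {params}")
print("best by log-likelihood:", best)
```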

A total of 4569 crash records, 14665 stationary vehicle records, and 3382 hazard records of arrival time are used in this distribution fitting. There are fewer arrival time records than clearance time records because SIMS contains a large number of invalid arrival time records; all invalid and defective records are filtered out. Figures 4(a), 4(b), and 4(c) show the probability density distributions of arrival time for crash, stationary vehicle, and hazard incidents, respectively. Table 3 lists the parameter estimates of the arrival time probability density distribution for each incident type. Both the estimated parameters and the figures indicate that the Gamma distribution is the best fit for crash arrival time, the Log-logistic distribution for stationary vehicles, and the Weibull distribution for hazards.

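**Figure 4:** Probability density functions fitted to arrival time for **(a)** crash, **(b)** stationary vehicle, and **(c)** hazard incidents.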

In summary, the clearance time and arrival time for the same incident type follow the same probability distribution, but with different estimated parameters, and different incident types have different probability distribution characteristics for both clearance time and arrival time. Based on these results, the best fitting distribution was selected for each incident type to develop the hazard-based prediction models.

##### 5.2. Hazard Based Model for Crash Clearance Time and Arrival Time

The crash clearance time and arrival time models are developed based on the Gamma distribution survival model. Data are filtered before model development to improve model precision. For example, clearance times longer than 3 hours, which account for less than 5% of the crash dataset, are removed, as shown by the green bars in Figure 5.
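The 3-hour filter can be expressed as a small helper that also reports the share of records removed, which makes the "less than 5%" check explicit. A minimal Python sketch; the function name and the sample durations are hypothetical, and dropping non-positive durations is an added assumption (they cannot enter log-based likelihoods), not a rule stated in the text.

```python
from typing import List, Tuple

THREE_HOURS_MIN = 180.0  # cutoff quoted in the text, in minutes

def filter_long_durations(durations_min: List[float],
                          cutoff_min: float = THREE_HOURS_MIN) -> Tuple[List[float], float]:
    """Drop records longer than the cutoff and report the share of records removed.

    Non-positive durations are also dropped (an assumption: they are treated as
    invalid records because they cannot be used with log-based likelihoods).
    """
    if not durations_min:
        raise ValueError("no records to filter")
    kept = [d for d in durations_min if 0.0 < d <= cutoff_min]
    removed_share = 1.0 - len(kept) / len(durations_min)
    return kept, removed_share

# Hypothetical clearance times in minutes (not SIMS data).
kept, share = filter_long_durations([12.0, 35.0, 240.0, 58.0, 95.0, 0.0, 41.0, 190.0])
print(kept, f"{share:.1%} removed")
```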

Tables 4 and 5 list the parameter estimates of the models for crash clearance time and arrival time, respectively. A positive coefficient indicates that an increase in that property variable increases the incident time and decreases the hazard function.
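This sign convention follows from the accelerated failure time (AFT) form. As a sketch, assuming the standard AFT specification with covariate vector $\mathbf{x}_i$ and coefficient vector $\boldsymbol{\beta}$ (the symbols are introduced here for illustration; the error term follows the fitted distribution):

$$\ln T_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \sigma\varepsilon_i$$

$$S(t \mid \mathbf{x}_i) = S_0\!\left(t\,e^{-\mathbf{x}_i^{\top}\boldsymbol{\beta}}\right), \qquad h(t \mid \mathbf{x}_i) = e^{-\mathbf{x}_i^{\top}\boldsymbol{\beta}}\, h_0\!\left(t\,e^{-\mathbf{x}_i^{\top}\boldsymbol{\beta}}\right)$$

Here $S_0$ and $h_0$ are the baseline survival and hazard functions of $e^{\sigma\varepsilon_i}$. A one-unit increase in covariate $x_k$ multiplies the time scale by $e^{\beta_k}$; with $\beta_k > 0$ the survival probability at any $t$ does not decrease, so the incident tends to last longer.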

All variables listed are statistically significant at the 95% confidence level. All 14 significant property variables for crash clearance time are listed in Table 4, whereas only 8 variables are significant for arrival time, considerably fewer than for clearance time. For example, priority, blockage type, and weather have a significant effect on clearance time but not on arrival time, because the two time periods have different characteristics. Another reason is that SIMS records less information relevant to arrival time than to clearance time.
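The text does not state which test is used for significance; a common choice for parametric survival models is the two-sided Wald test, $z = \hat\beta / \mathrm{SE}(\hat\beta)$ compared against the standard normal. A minimal Python sketch under that assumption; the coefficient and standard-error pairs are hypothetical, not values from Tables 4 or 5.

```python
import math
from typing import Tuple

def wald_test(beta: float, se: float, alpha: float = 0.05) -> Tuple[float, float, bool]:
    """Two-sided Wald test of H0: beta = 0 under the normal approximation.

    Returns (z, p_value, significant_at_alpha).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value, p_value < alpha

# Hypothetical coefficient / standard-error pairs (not values from Tables 4 or 5).
for name, beta, se in [("variable_a", 0.42, 0.11), ("variable_b", 0.09, 0.07)]:
    z, p, sig = wald_test(beta, se)
    print(f"{name}: z = {z:.2f}, p = {p:.4f}, significant = {sig}")
```

At the 95% level this is the familiar rule $|z| > 1.96$.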

##### 5.3. Influence of Property Parameters on the Prediction Time

Table 6 presents the percentage change in clearance time and arrival time for crash and hazard incidents. A negative percentage indicates that the incident time decreases as that property variable increases, and “—” means that the variable has no significant effect on the incident time. For instance, when priority increases by one, crash clearance time is 16.13% shorter and hazard clearance time is 7.03% shorter, whereas arrival time is not significantly affected. When the number of injured people increases by one, crash clearance time is 109.83% longer and crash arrival time is 57.02% longer, whereas hazard clearance time and arrival time are not significantly affected.
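In an AFT model, a one-unit increase in a covariate with coefficient β multiplies the expected duration by $e^{\beta}$, so a common way to report it is the percentage change $(e^{\beta} - 1) \times 100$. Assuming Table 6 was computed this way (the text does not say so explicitly), the coefficients implied by the quoted percentages can be recovered by inverting the formula. A minimal Python sketch; the coefficients printed are back-computed, not values reported in Tables 4 or 5.

```python
import math

def pct_change(beta: float) -> float:
    """AFT model: percentage change in duration for a one-unit increase in a covariate."""
    return (math.exp(beta) - 1.0) * 100.0

def implied_beta(pct: float) -> float:
    """Inverse of pct_change: the coefficient implied by a reported percentage change."""
    return math.log(1.0 + pct / 100.0)

# Percentages quoted from Table 6 in the text; the coefficients are back-computed, not reported.
reported = {
    "priority -> crash clearance": -16.13,
    "injured -> crash clearance": 109.83,
    "injured -> crash arrival": 57.02,
}
for label, pct in reported.items():
    beta = implied_beta(pct)
    print(f"{label}: implied beta = {beta:.4f}, round trip = {pct_change(beta):.2f}%")
```

Under this assumption, the priority effect on crash clearance corresponds to a coefficient of roughly -0.176 and the injury effect to roughly 0.741.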

Table 7 gives the evaluation of prediction accuracy for crash clearance time as an example. Of the 4966 crash clearance time records used to build the hazard-based model, 30% are used to evaluate prediction accuracy. Table 7 reports the absolute difference between the predicted and measured clearance times. For example, for 543 incidents the difference between predicted and measured time is less than 10 minutes, accounting for 39.68% of the evaluation data. The accuracy of the model is similar to that of Chung [23].
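The evaluation procedure (hold out 30% of the records, then count the share of incidents whose absolute error falls below a threshold) can be sketched as follows. This is a minimal Python illustration; the helper names, the random shuffle, and the 20- and 30-minute thresholds are assumptions for illustration, and the prediction values are hypothetical rather than model output.

```python
import random
from typing import Dict, List, Sequence, Tuple

def holdout_split(records: Sequence, test_share: float = 0.3, seed: int = 0) -> Tuple[List, List]:
    """Shuffle and split records into (training, evaluation) sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy_bands(predicted: Sequence[float], observed: Sequence[float],
                   thresholds: Sequence[float] = (10.0, 20.0, 30.0)) -> Dict[float, float]:
    """Share of incidents whose absolute prediction error is below each threshold (minutes)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and of equal length")
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    return {th: sum(e < th for e in errors) / len(errors) for th in thresholds}

train_set, eval_set = holdout_split(range(10))
print(len(train_set), len(eval_set))
# Hypothetical predicted vs. measured clearance times in minutes.
print(accuracy_bands([10.0, 20.0, 30.0, 40.0], [15.0, 45.0, 30.0, 41.0]))
```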

#### 6. Conclusions

Hazard-based prediction models are developed for both incident clearance time and arrival time, and three types of incident are investigated using data collected from SIMS. Before model development, the best fitting distribution was selected for each incident type based on log-likelihood and variance. The results show the following:

(1) Clearance time and arrival time follow the same distribution, with different estimated parameters, for each incident type.

(2) The Gamma, Log-logistic, and Weibull distributions are the best fit for crash, stationary vehicle, and hazard incidents, respectively.

After data filtering, the hazard-based prediction model is developed for crash incidents as an example.

There are 14 significant incident property variables for clearance time but only 8 for arrival time, which indicates that clearance time and arrival time have different impact factors.

The percentage changes in clearance time and arrival time for crash and hazard incidents are given, quantifying the impact of each property variable on clearance time and arrival time. Finally, the accuracy of the model is evaluated.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This study has been substantially supported by the National Natural Science Foundation Council of China (no. 71201103). The first author completed this work during postdoctoral research at Tongji University, and the support of a Project of Shanghai Shuguang Program (13SG23) is acknowledged. The first author thanks the Smart Transport Research Center at the Queensland University of Technology for providing real traffic incident data.