Abstract

In this study, a random parameter Tobit regression model was used to account for the censoring problem and unobserved heterogeneity in accident data. We used accident rate data (continuous data) instead of accident frequency data (discrete count data) to address the zero-cell problem that arises when roadway segments have no recorded accidents over the observed time period. Unobserved heterogeneity was addressed by using random parameters, which are parameter estimates that vary across observations, instead of fixed parameters, which are constant across observations. Nine years (1999–2007) of panel data on severe injury accidents in Washington State, USA, were used to develop the random parameter Tobit model. The results showed that the Tobit regression model with random parameters is a better approach than its fixed parameter counterpart for exploring the factors that influence severe injury accident rates on roadway segments when unobserved heterogeneity is present.

1. Introduction

Over the last decade, numerous studies have explored the factors that cause accidents on roadway segments, and various statistical modeling techniques have been employed, especially count models. Initially, simple linear regression models were used. Although linear regression is the simplest and easiest model, accident data generally violate its basic assumption of homoscedasticity (constant error variance), because the variance tends to increase as the variable increases. In addition, such models can predict negative accident counts, whereas counts must be greater than or equal to zero in reality [1]. To address these problems, previous investigators suggested a Poisson regression model in which accident frequency is treated as a discrete random variable [2]. However, this model imposes an important constraint, namely, that the mean must equal the variance; the standard errors are biased when this restriction does not hold. Real accident data were found to be overdispersed, meaning that the variance is greater than the mean [3, 4]. As a result, the Poisson regression model incorrectly estimates the likelihood of accident frequencies. To overcome this overdispersion problem, a negative binomial model was suggested, which relaxes the constraint that the mean equals the variance. Negative binomial models have been shown to be more appropriate than the Poisson model for describing the relationships between accident frequency and geometric elements [5–8]. A variety of other approaches to analyzing accident frequencies have also been developed in support of accident reduction and prevention: random effects models [9, 10], zero-inflated count models [11–13], and random parameters count models [14–18].

Aside from methodologies that address accident frequency data, a Tobit regression method that uses accident rates as the dependent variable has also been suggested [19]. This method employs a continuous variable, that is, the number of accidents per vehicle-mile traveled, instead of accident frequency (discrete count data). In addition, the possibility that segments/spots have no accident records during some period can be taken into account by treating the data as left-censored at zero. In particular, frequencies of severe injury accidents (fatal and disabling injuries) are lower than those of other types of accidents such as EPDO, possible injuries, and evident injuries, so that zero records occur in many cases. This makes it difficult to analyze severe injury accidents using the traditional frequency method, which is why the Tobit model is a more appropriate approach for analyzing the causes of severe injury accidents than the traditional frequency models.

In addition, a further problem remains, namely, that the methods mentioned above do not account for heterogeneity. Since aggregated accident data do not contain information on residual environmental effects or on socioeconomic, driver, and vehicle characteristics, unobserved heterogeneity may be present in accident data, which creates variation in the effect of observed variables on accident frequencies [20]. This problem cannot be addressed in the traditional count models, in which the estimated parameters are fixed. Therefore, we must consider models that include random parameters, which allow some or all parameters to vary randomly across observations. Relevant research has shown that the random parameters approach can account for such variation in the effects of the variables [14–16, 18, 21].

Here, we developed a random parameter Tobit model that allowed us to address the unobserved heterogeneity in accident data for severe injuries, and we compared its results with those of a fixed parameter Tobit model. The models were estimated for interstate roadway segments, excluding interchange segments. To the best of the authors’ knowledge, this is the first attempt to model severe injury accidents using a random parameters Tobit model. This accident rate approach could also be applied in the process of selecting performance measures in the HSM (Highway Safety Manual), whose framework and modeling architecture have been introduced in [22].

2. Methodology

The Tobit model is a regression model proposed by Tobin (1958) in which the dependent variable is either left- or right-censored. Here, left-censored means that the data are censored at a low threshold, while right-censored data are censored at a high threshold. In accident data, the data will be left-censored with clustering at zero since vehicle crashes may not be observed on all/some segments during the observation period. Using this information, the Tobit model was constructed as follows:

$$y_i^{*} = \beta X_i + \varepsilon_i, \quad i = 1, 2, \ldots, N, \qquad y_i = \begin{cases} y_i^{*} & \text{if } y_i^{*} > 0, \\ 0 & \text{if } y_i^{*} \le 0. \end{cases} \tag{1}$$

Here, $N$ is the number of observations, $y_i$ is the dependent variable (severe injury accident rate), $\beta$ is a vector of estimable parameters, $X_i$ is a vector of independent variables (e.g., traffic volumes and segment geometrics), and $\varepsilon_i$ is a normally and independently distributed error term with zero mean and constant variance $\sigma^2$. Here, there is an implicit and stochastic index (latent variable) expressed as $y_i^{*}$, which is observed only when the value of $y_i^{*}$ is greater than zero (positive). Hence, the likelihood function for the Tobit model over zero and positive observations is as follows:

$$L = \prod_{0} \left[ 1 - \Phi\left( \frac{\beta X_i}{\sigma} \right) \right] \prod_{1} \frac{1}{\sigma}\, \phi\left( \frac{y_i - \beta X_i}{\sigma} \right) \tag{2}$$

Here, the first product is taken over the zero observations and the second over the positive observations, $\Phi$ refers to the standard normal distribution function, and $\phi$ is the standard normal density function.
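As an illustration of the likelihood for the fixed parameter Tobit model, the following minimal sketch evaluates its logarithm for data left-censored at zero. The function name `tobit_loglik` and the list-based data layout are assumptions made for this example and are not taken from the study; the sketch uses only the Python standard library.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of a Tobit model left-censored at zero.

    beta  : list of parameters (hypothetical layout: one per column of X)
    sigma : standard deviation of the error term (must be > 0)
    X     : list of rows, each a list of independent variables
    y     : list of observed rates (zero means censored)
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, x_i))
        if y_i <= 0:
            # Censored observation: P(y* <= 0) = 1 - Phi(xb / sigma)
            total += math.log(STD_NORMAL.cdf(-xb / sigma))
        else:
            # Uncensored observation: (1 / sigma) * phi((y - xb) / sigma)
            z = (y_i - xb) / sigma
            total += math.log(STD_NORMAL.pdf(z) / sigma)
    return total


# Tiny made-up example: a constant-only model with one zero and one positive rate.
example_ll = tobit_loglik(beta=[0.0], sigma=1.0, X=[[1.0], [1.0]], y=[0.0, 1.0])
```

In practice this function would be passed to a numerical optimizer to obtain the maximum likelihood estimates; the example values above are invented purely to show the call.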

The model described above is the traditional (fixed parameter) Tobit model. However, it is difficult to account for heterogeneity (unobserved factors that may vary across observations) in this model. In order to account for heterogeneity using random parameters, Greene [23] developed a simulated maximum likelihood estimation procedure, which has been shown to be an acceptable method [15, 16, 18, 21].

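Estimable parameters that allow for random parameters are as follows:

$$\beta_i = \beta + \varphi_i \tag{3}$$

Here, $\beta_i$ indicates the estimated parameters for observation $i$ and $\varphi_i$ is a randomly distributed term. Uniform, normal, lognormal, and other forms are considered to be potential density functions for random parameter estimation. The latent variable mentioned in (1) becomes $y_i^{*} = \beta_i X_i + \varepsilon_i$, and the likelihood function from (2) is as follows in log-likelihood form:

$$LL = \sum_{i} \ln \int_{\varphi_i} g(\varphi_i)\, P\left( y_i^{*} \mid \varphi_i \right) d\varphi_i \tag{4}$$

Here, $g(\cdot)$ refers to the probability density function of $\varphi_i$.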

To estimate the random parameters, simulation-based maximum likelihood estimation with Halton draws was employed; Halton draws provide an efficient distribution of draws for numerical integration [24, 25]. In summary, the random parameter Tobit model can account for unobserved factors while making full use of the available left-censored severe injury accident data.
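The simulation step can be sketched as follows, under several assumptions that are not stated in the study: the random parameters are independent and normally distributed, each observation receives its own set of draws (the panel structure is ignored), and a fixed number of draws is used. The helper names `halton` and `simulated_loglik` are hypothetical; only the Python standard library is used.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]  # one Halton base per random parameter


def halton(index, base):
    """Return element `index` (index >= 1) of the Halton sequence in `base`."""
    result, fraction = 0.0, 1.0
    while index > 0:
        fraction /= base
        result += fraction * (index % base)
        index //= base
    return result


def simulated_loglik(beta_mean, beta_sd, sigma, X, y, n_draws=200):
    """Simulated Tobit log-likelihood with normally distributed parameters.

    Assumes at most len(PRIMES) parameters and independent draws per
    observation; both are simplifications made for this sketch.
    """
    k = len(beta_mean)
    total = 0.0
    for obs, (x_i, y_i) in enumerate(zip(X, y)):
        prob_sum = 0.0
        for r in range(n_draws):
            idx = obs * n_draws + r + 1  # distinct Halton points per observation
            # beta_i = mean + sd * (standard normal draw via inverse CDF)
            beta_i = [
                beta_mean[j] + beta_sd[j] * STD_NORMAL.inv_cdf(halton(idx, PRIMES[j]))
                for j in range(k)
            ]
            xb = sum(b * x for b, x in zip(beta_i, x_i))
            if y_i <= 0:
                prob_sum += STD_NORMAL.cdf(-xb / sigma)
            else:
                prob_sum += STD_NORMAL.pdf((y_i - xb) / sigma) / sigma
        # Average over draws approximates the integral over the random term
        total += math.log(prob_sum / n_draws)
    return total
```

When all standard deviations are set to zero, the simulated value reduces to the fixed parameter Tobit log-likelihood, which is a convenient sanity check for an implementation of this kind.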

3. Data

Vehicle crash data for roadway segments on interstates in Washington State (I-5, I-82, I-90, I-182, I-205, I-405, and I-705) were collected over 9 years (1999 to 2007) to investigate the effects of geometrics and traffic flow conditions, such as number of lanes, right and left shoulder width, number of horizontal and vertical curves, and traffic volumes, on severe injury accident rates per 100-million vehicle-miles traveled (VMT).

First, the collected data were divided into data on roadway segments and data on interchange segments of the interstate highways. In this study, only crash data on roadway segments were used, because crashes at interchanges generally occur under conditions that differ from those on roadway segments, including traffic flow changes, weaving maneuvers, complex geometrics, and driver responses to traffic signs. Consequently, the 589 roadway segments used for the analysis, observed over a continuous period of nine years, yielded a panel of 5,301 observations.

Accident rate, the dependent variable, was calculated using the following equation:

$$\text{Accident rate}_{it} = \frac{\text{Accident}_{it}}{\left( \text{AADT}_{it} \times 365 \times L_i \right) / 100{,}000{,}000} \tag{5}$$

Here, $\text{Accident rate}_{it}$ is the total number of severe accidents per 100-million VMT on segment $i$, $t$ is the year of observed data, $\text{Accident}_{it}$ is the number of severe accidents on segment $i$ in year $t$, $\text{AADT}_{it}$ is the average annual daily traffic volume on segment $i$ in year $t$, and $L_i$ is the length of segment $i$. Since we sought to determine the effects of geometrics on severe accidents, the dependent variable is defined as the summation of disabling and fatal injuries.
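A minimal sketch of the rate calculation is given below, assuming the usual definition of an accident rate per 100-million VMT for one segment in one year, with annual VMT computed as AADT times 365 days times segment length in miles. The function name and the example inputs are hypothetical and are not values from the study data.

```python
DAYS_PER_YEAR = 365            # assumed conversion from daily to annual traffic
RATE_BASE_VMT = 100_000_000    # rate is expressed per 100-million VMT


def severe_accident_rate(accidents, aadt, length_miles):
    """Severe accidents per 100-million VMT for one segment-year."""
    annual_vmt = aadt * DAYS_PER_YEAR * length_miles
    if annual_vmt <= 0:
        raise ValueError("AADT and segment length must be positive")
    return accidents / (annual_vmt / RATE_BASE_VMT)


# Made-up inputs: 2 severe accidents on a 1-mile segment carrying 10,000 vehicles/day.
example_rate = severe_accident_rate(accidents=2, aadt=10_000, length_miles=1.0)

# A segment-year with no severe accidents has rate 0.0, which is the
# left-censored value handled by the Tobit model.
zero_rate = severe_accident_rate(accidents=0, aadt=10_000, length_miles=1.0)
```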

Descriptive statistics for the primary variables are shown in Table 1. The average length of the roadway segments was 1.837 miles, and the mean average annual daily traffic volume on the study segments during the study period was 13,052 vehicles. On average, there were 2.6 lanes per direction, with a minimum of one lane and a maximum of five lanes. The mean shoulder width was 6.9 ft for both the left and the right shoulder. In terms of curves, the roadway segments had on average 1.8 horizontal curves and 3.2 vertical curves.

4. Model Estimation Results

Two model specifications were estimated: one in which the parameters are fixed across observations (fixed parameters, left side in Table 2) and one in which they vary across observations (random parameters, right side in Table 2). For the random parameter estimation, Halton draws were used, which have been shown to produce accurate parameter estimates [25]. Among the normal, uniform, and lognormal distributions mentioned in the Methodology, the normal distribution gave the best statistical fit for the random parameters.

The log-likelihood at convergence of the random parameter Tobit model (−3,169.62) was higher than that of the fixed parameter Tobit model (−3,191.58), indicating a better fit. As described in Table 2, a total of seven variables with random parameters were found to affect the severe accident rates: segment length, average annual daily traffic volume, number of lanes, left and right shoulder width, and number of horizontal and vertical curves. A parameter was treated as random when both the mean and the standard deviation of its distribution were statistically significant (≠0). Conversely, a parameter whose standard deviation is not statistically significant (=0) indicates that the effect is fixed across all segments. All derived variables with random parameters showed statistically significant mean and standard deviation values. On the other hand, some variables were statistically insignificant in the fixed parameters model, which illustrates the flexibility of the random parameters approach: it relaxes the restriction that the effect of each covariate must be constant/fixed across all observations [21].
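One common way to compare two nested models of this kind is a likelihood ratio test, which the text does not report. The sketch below computes the statistic from the two log-likelihood values given above. The degrees of freedom (seven, one for each estimated standard deviation) are an assumption made for illustration, and the 5% chi-square critical value for seven degrees of freedom is hard-coded to keep the example free of third-party dependencies.

```python
LL_FIXED = -3191.58    # fixed parameter Tobit, log-likelihood at convergence
LL_RANDOM = -3169.62   # random parameter Tobit, log-likelihood at convergence

# Assumption: the random parameter model adds seven standard deviations.
ASSUMED_DEGREES_OF_FREEDOM = 7
CHI2_CRITICAL_5PCT_DF7 = 14.067  # chi-square critical value, alpha = 0.05, df = 7


def likelihood_ratio_statistic(ll_restricted, ll_unrestricted):
    """Return -2 * (LL_restricted - LL_unrestricted)."""
    return -2.0 * (ll_restricted - ll_unrestricted)


lr_statistic = likelihood_ratio_statistic(LL_FIXED, LL_RANDOM)
random_model_preferred = lr_statistic > CHI2_CRITICAL_5PCT_DF7
```

Under these assumptions the statistic exceeds the critical value, which is in line with the conclusion that the random parameter specification fits the data better.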

The results of modeling, the marginal effect, and elasticity of the random parameters and fixed parameters models are presented in Tables 2 and 3, respectively. The logarithms of segment length, traffic volumes, and number of lanes were shown to have statistically significant fixed and random parameters with positive signs. This is consistent with the expectation of increased frequency of severe injury accidents with higher exposure (longer length, higher traffic volumes, and more lanes) on the roads.

The random parameter for segment length is normally distributed with a mean of 1.033 and a standard deviation of 0.989, which indicates that segment length decreases the severe injury accident rate on 14.83% of the observed segments and increases the rate on 85.17% of the observed segments. In terms of elasticity, a 1% increase in segment length was associated with a 1.015% (fixed parameter) and a 1.033% (random parameter) increase in the severe injury accident rate.

We found that traffic volumes have a normally distributed random parameter with a mean of 0.353 and a standard deviation of 0.303. Given this distribution, the effect of traffic volume decreases the severe accident rate on 12.12% of segments and increases the severe accident rate on 87.88% of segments.

The number of lanes variable had a random parameter with a normal distribution with a mean of 0.358 and a standard deviation of 0.288. Given these distribution values, 10.69% of segments showed a decrease in the severe injury rate, and 89.31% of segments showed an increase in the severe injury rates.

With regard to shoulder width, the random parameters for both shoulders had negative means (severe injury rate decrease). The left shoulder width had a normally distributed random parameter with a mean of −0.020 and a standard deviation of 0.024. These parameter values indicate that increasing the left shoulder width decreases the severe injury rate on 79.86% of segments and increases it on 20.14% of segments. The right shoulder width variable shows a positive sign and is not statistically significant in the fixed parameter model. In the random parameter model, however, the right shoulder width variable was found to have a normally distributed random parameter with a mean of −0.024 and a standard deviation of 0.016. These distribution values mean that increasing the right shoulder width increases severe injury rates on 7.04% of the main line segments and decreases them on 92.96% of the main line segments. These results are consistent with previous studies [15, 17], which have shown that accident probability decreases as shoulder width increases.

The number of horizontal curves variable had a random parameter with a mean of −0.025 and a standard deviation of 0.032. This variable decreases severe accident rates on 78.28% of the roadway segments and increases them on 21.72% of the roadway segments. In the fixed parameters model, the effect of the number of horizontal curves on the severe injury accident rate was positive for all interstate roadway segments considered but was not statistically significant, which lends additional support to the use of random parameters. Finally, the variable for the number of vertical curves was statistically significant in both the fixed parameter model and the random parameter model, where it followed a normal distribution with a mean of −0.021 and a standard deviation of 0.019. This random parameter result indicates that the number of vertical curves decreases the severe injury rate on 85.49% of all observed segments and increases it on 14.51% of all observed segments. Results for these curve-related variables are similar to those of a previous study showing that some variation in roadway geometrics may improve driver alertness, resulting in more careful driving [26].
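The shares of segments with negative and positive effects reported in this section follow from the normal distribution of each random parameter: the share with a negative effect is the normal cumulative probability at zero. The sketch below recomputes these shares from the means and standard deviations quoted in the text. The dictionary layout and function name are assumptions made for this example.

```python
from statistics import NormalDist

# (mean, standard deviation) of each random parameter, as quoted in the text
RANDOM_PARAMETERS = {
    "segment length": (1.033, 0.989),
    "traffic volume": (0.353, 0.303),
    "number of lanes": (0.358, 0.288),
    "left shoulder width": (-0.020, 0.024),
    "right shoulder width": (-0.024, 0.016),
    "horizontal curves": (-0.025, 0.032),
    "vertical curves": (-0.021, 0.019),
}


def share_negative(mean, sd):
    """Share of segments whose parameter is below zero: P(beta_i < 0)."""
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)


shares = {name: share_negative(m, s) for name, (m, s) in RANDOM_PARAMETERS.items()}

for name, share in shares.items():
    print(f"{name}: {100 * share:.1f}% negative, {100 * (1 - share):.1f}% positive")
```

The recomputed shares agree with the reported percentages to within roughly one percentage point; the remaining gaps are plausibly due to the published estimates being rounded to three decimal places.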

5. Conclusions

We used a Tobit regression model with fixed and random parameters to examine the geometric factors that influence the rate of accidents that result in severe injury (fatal and disabling injury). In the Tobit regression model, the dependent variable (severe injury accident rate) was treated as a continuous variable left-censored at zero, which is an alternative to the traditional discrete accident frequency approach. Unobserved heterogeneity was also considered in the parameter estimation process by employing random parameters, which is difficult to do in traditional fixed parameter estimation. In other words, by using the random parameter Tobit method, heterogeneity arising from factors such as vehicle type, weather, individual driver characteristics, and other unobserved factors not captured in the data collection was accounted for when identifying significant factors affecting the severe injury accident rates on the interstates.

Nine years of severe injury accident and geometrics data from seven interstate main lines in Washington State, USA, were used to develop the models. Seven variables were found to have random parameters with statistically significant standard deviation values. The effects of these variables vary across observations: segment length, annual average daily traffic volume, number of lanes, left and right shoulder width, and numbers of horizontal and vertical curves.

While this study is exploratory in nature, the random parameters Tobit regression model has the potential to provide a fuller understanding of the factors affecting severe accident rates, and it outperformed the fixed parameters model. Although the model fit was improved (as shown by the log-likelihood values), other possible variables that may affect the likelihood of severe accidents, such as pavement conditions, were not considered. In addition, interchange segments, which have more complex infrastructure and more diverse traffic flow types (merge, diverge, and weave), were not considered in this study. We recommend that future studies include those variables and segments and conduct further analyses to explore additional causes of traffic accidents.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.