Analysis of Work-Zone Crashes Using the Ordered Probit Model with Factor Analysis in Egypt
Work-zones are inherently prone to hazardous situations because construction work is conducted in or near vehicular traffic. This exposure to danger underscores the need for a proper understanding of how work-zone crashes occur and of the risk factors associated with them. The main aim of this paper is to develop a hybrid approach that combines a factor analysis method and an ordered probit model to carry out a comprehensive analysis of work-zone crashes. The paper analyzes work-zone crash data from 2010 to 2015 for Egyptian long-term highway maintenance and rehabilitation projects. Factor analysis is used to identify the main and common factors that influence work-zone crashes and to calculate the weight of every factor. An ordered probit model is then developed, using the factor scores as inputs, to examine the contribution of the common factors to the severity of work-zone crashes. The most influential factors contributing to work-zone crashes are weather condition, number of lane closures, type of surface construction, road character, day of week, and so forth. In addition, the results indicate that four common factors significantly affect the severity of work-zone crashes in Egypt.
In recent years, Egypt has recorded such a large number of traffic crashes that “death on the road” is frequently seen on signs describing highway safety in the country. Road traffic crashes caused 10,466 fatalities in 2013, a heavy burden on the emerging economy. As highway infrastructure ages, regular maintenance and rehabilitation become increasingly necessary. This increases the number of work-zones on highways, with the attendant disruption of traffic flow and the resulting traffic safety problems. Fully closing highways to traffic while maintenance and rehabilitation work is ongoing is usually unrealistic, so only partial closures are used, with traffic redirected into alternative lanes and routes. Such scenarios are typical; they create work-zones with constricted space for traffic. As a result, work-zone safety is of extreme importance in Egypt and in the Middle East generally, and the pressure on government and road safety managers to introduce measures that reduce the frequency of road crashes and the severity of injuries is increasing. However, no previous research on work-zone crashes in Egypt has been published. Research on crash severity may reveal the factors that influence changes in severity, which could help in developing traffic controls that minimize the incidence of severe highway crashes.
This paper mainly aims to develop a hybrid approach that combines factor analysis and an ordered probit model to carry out a comprehensive analysis of work-zone crashes. In the first stage, factor analysis is applied to the work-zone crash data elements to identify the common factors, classify the main factors, and determine the weight of each factor. In the second stage, a severity model is developed from the common factor scores to determine the relative effect of each common factor. An ordered probit model is appropriate for identifying the characteristics that contribute to crash severity because crash severity, as a dependent variable, is ordinal in nature [2–4].
2. Literature Review
Around the world, significant research effort has been devoted to identifying the factors that influence the severity and frequency of work-zone crashes. A substantial portion of this research shows that statistical modeling techniques are an essential part of effective and efficient work-zone safety management, because they help determine the contributing factors associated with work-zone crashes.
Dissanayake and Lu developed a set of sequential binary logistic regression models to identify the factors affecting the severity of young driver crashes. However, the methodology does not account for multivehicle collisions instigated by young drivers or for the condition of the damaged vehicles. Abdel-Aty et al. used the log-linear technique to examine the relationship between driver age and crash characteristics and concluded that injury severity was positively associated with driver age. Another study used negative binomial regression to analyze cross-median crashes and developed two predictive models based on three factors: median width, average daily traffic (ADT), and segment length. Donnell and Mason developed logistic regression models using both ordinal and nominal responses to represent crash severity; the ordinal response model performed better for cross-median crashes, whereas the nominal response model gave better results for median-barrier crashes. Osman et al. investigated the contribution of several factors to the severity of work-zone collisions; their model showed that roadway class, crashes during weekends and evening times, and road alignment significantly affected the severity of work-zone crashes. In New Zealand, by contrast, a study concluded that the time period, heavy vehicles, and driver vulnerability determined the severity of work-zone collisions. Li and Bai examined work-zone crash severity using crash severity index models and recommended that such models be used during work-zone safety inspections and work-zone planning. Theofilatos et al. applied meta-analysis to examine the relationship between crash frequency and work-zone length and duration. Yang et al. [13, 14] developed a crash frequency model and evaluated the measurement errors in work-zone length using full Bayesian estimation, rather than the negative binomial (NB) model, to identify risk factors in work-zone safety. Another study used the negative binomial model to estimate monthly crashes in work-zones and concluded that night work resulted in fewer crashes than daytime work; however, it could not identify an acceptable relationship between injury frequency and presence. Chang and Mannering studied the relationship between vehicle occupancy and crash injury severity and found that truck-involved crashes, rear-end crashes, and high speed limits increased injury severity. A further study used a logit model to analyze driver injury severity in single-vehicle crashes and concluded that several major factors increase driver injury severity, including daylight, higher speed limits, and dry (as opposed to slippery) pavement.
Khattak and Targa and Khattak et al. [19, 20] applied ordered probit models to predict injury levels in construction-zone crashes involving trucks and to identify the characteristics that contribute to the severity of crashes involving older drivers. Another study applied this technique to compare the factors contributing to crash injury severity on roadway sections, at signalized intersections, and at toll plazas; the author concluded that the model produced the best results and is attractive for its simplicity, and therefore recommended the ordered probit approach for modeling driver injury severity. The ordered probit model has also been used to model crash severity levels, and that study encouraged brief reporting of crashes because this makes their retention and storage in crash databases easier. Similarly, another study used the same model to investigate the risk of different injury levels in two-vehicle and single-vehicle crashes, concluding that pickups and sport utility vehicles were less safe than passenger cars and that males and younger drivers, in newer vehicles and at lower speeds, suffered less severe injuries. The same technique was also applied to examine the factors influencing the injury severity of motor-vehicle occupants; according to the authors, crashes in rural areas were more serious than those in urban areas, and women were more likely than men to be involved in serious or fatal injury crashes. Qi et al. studied rear-end crashes in work-zones, using stepwise elimination to select variables when calibrating an ordered probit model. A limitation of their methodology is that the severity model omits vital information, such as vehicle characteristics, external factors such as light and weather conditions, and the drivers’ age and sex. Another study found that surface conditions contributed to the severity of work-zone crashes; for instance, crashes under dry conditions were more dangerous and severe than those under wet conditions. Furthermore, middle-aged drivers have been reported to be more susceptible to work-zone crashes.
Xu applied decision tree analysis, correlation analysis, and cluster analysis as data mining methods to analyze traffic crash data. Sun et al. developed an Analytic Hierarchy Process (AHP) model of traffic crashes that considers their causes and the influence of people, roads, and the environment. Cheng analyzed the correlation between traffic crashes and different factors and concluded that traffic crashes were significantly correlated with drivers, number of vehicles, and population. Chen et al. used factor analysis to study the major factors contributing to road traffic crashes and ranked them by their weights; these factors included faulty behavior, driving experience, the condition of the vehicle, the purpose of the vehicle, and so on. Similarly, Yuan et al. used the same technique to determine the weights of the main factors through the factor score matrix.
This study is an attempt to bridge the gap in knowledge of work-zone crashes in Egypt. First, to our knowledge, no research on work-zone crashes in Egypt has been published before, so this study is among the first to investigate the factors affecting the severity of work-zone crashes on Egyptian highways. Second, its results and conclusions can provide insightful information for stakeholders in the planning and management of work-zones.
3. Data Collection
This paper focuses on work-zone crashes recorded during long-term highway maintenance and rehabilitation projects (duration greater than one year), with both day and night work, on urban and rural highway facilities. In Egypt, the traffic department of the Ministry of the Interior has a research unit responsible for managing the national road crash database. The General Authority for Roads, Bridges and Land Transport (GARBLT), a subsidiary of the Ministry of Transport, also regularly collects crash data, especially for crashes on federal highways. Data on 345 crashes from 2010 through 2015 were used in this paper. The crash variables extracted from the database for each incident included driver information, vehicle information, time, road characteristics, work-zone information, and environmental conditions. The data were organized into six tables: at-fault driver, environment, work-zone information, road character, vehicle and crash characteristics, and crash severity. Table 1 presents the frequency distribution and descriptive statistics of the influential factors.
4.1. Factor Analysis
Factor analysis is a statistical method that aims to identify the most important underlying factors that can explain an observed phenomenon. It is a collection of methods used to study the underlying constructs that may influence the responses on a set of observed variables, and its main aim is to determine the most important common factors behind a set of observed measures. In this study, we apply the principal component method to extract the maximum variance from the set of work-zone crash variables and group the variables into common factors that serve as indices for the analysis. The general factor analysis model is

$$X = AF + \varepsilon, \qquad \text{i.e.,} \quad X_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + \varepsilon_i, \quad i = 1, \ldots, p.$$

Here $X = (X_1, \ldots, X_p)^{T}$ is the observable random vector (i.e., the original observed variables); $F = (F_1, \ldots, F_m)^{T}$ is the vector of common factors of $X$; $A = (a_{ij})$ is the $p \times m$ factor loading matrix, in which the loading $a_{ij}$ is the correlation coefficient between $X_i$ and $F_j$ when the variables are standardized; and $\varepsilon$ is the special (specific) factor of $X$, which also absorbs the error and is usually ignored in the analysis.

A correlation coefficient measures the degree or strength of the linear relationship between the variables in the corresponding row and column of the correlation matrix; the higher the correlation, the stronger the relationship. The factor analysis procedure first standardizes the data so that each variable has a mean of 0 and a variance of 1. It then assumes that $F$ and $\varepsilon$ are independent of each other; with $X$ having $m$ common factors ($m \le p$), the model above is known as the factor model. Next, the correlation coefficient matrix $R$ and its eigenvalues (latent roots) $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ are obtained. Finally, the number $m$ of relevant common factors is determined, the cumulative proportion of common variance $\sum_{j=1}^{m} \lambda_j / \sum_{j=1}^{p} \lambda_j$ is computed, and the common factors are obtained by rotating the loading matrix.
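The extraction steps above (standardize, build the correlation matrix, compute its eigenvalues, keep factors with eigenvalue greater than 1, form the loadings, then rotate) can be sketched with NumPy. This is a minimal illustration, not the SPSS routine used in the study: the function names are hypothetical, the data are synthetic, and the rotation follows the common SVD-based varimax formulation (varimax is the rotation the study reports using).

```python
import numpy as np


def pc_factor_extraction(X):
    """Principal-component extraction of the loading matrix A in X = AF + eps.

    X: (n observations, p variables) array of coded crash variables.
    Returns the correlation matrix R, eigenvalues in descending order,
    the unrotated p x m loading matrix A (Kaiser criterion: eigenvalue > 1),
    and the cumulative share of variance explained by the m factors.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # mean 0, variance 1
    R = np.corrcoef(Z, rowvar=False)                     # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                 # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    m = int(np.sum(eigvals > 1.0))                       # retained common factors
    A = eigvecs[:, :m] * np.sqrt(eigvals[:m])            # loadings a_ij
    cumulative = eigvals[:m].sum() / eigvals.sum()
    return R, eigvals, A, cumulative


def varimax(A, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a p x m loading matrix."""
    p, m = A.shape
    T = np.eye(m)
    d_old = 0.0
    for _ in range(max_iter):
        L = A @ T
        u, s, vt = np.linalg.svd(
            A.T @ (L ** 3 - L @ np.diag(np.sum(L ** 2, axis=0)) / p)
        )
        T = u @ vt                                       # orthogonal rotation
        d = s.sum()
        if d_old != 0.0 and d < d_old * (1.0 + tol):
            break
        d_old = d
    return A @ T


# Example on synthetic data with two underlying factors (not the study data).
rng = np.random.default_rng(0)
F = rng.standard_normal((345, 2))
true_loadings = np.array([[0.9, 0.0], [0.8, 0.0], [0.85, 0.0],
                          [0.0, 0.9], [0.0, 0.8], [0.0, 0.75]])
X = F @ true_loadings.T + 0.4 * rng.standard_normal((345, 6))

R, eigvals, A, cumulative = pc_factor_extraction(X)
A_rot = varimax(A)
print(A.shape, round(cumulative, 3))
print(np.round(A_rot, 2))
```

Because the rotation matrix is orthogonal, each variable's communality (row sum of squared loadings) is unchanged by varimax; only the split of that variance across factors changes, which is what makes the rotated factors easier to name.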
4.2. Ordered Probit Model
The ordered probit model is the second statistical modeling method used in this paper. It is built on the idea of a latent (unobserved) variable representing the injury risk propensity of a road crash, which determines the observed ordinal injury severity. The general specification of the model is

$$y_i^{*} = \beta^{T} X_i + \varepsilon_i,$$

where $y_i^{*}$ is the latent risk propensity for crash victim $i$, $\beta$ is the vector of estimated parameters, $X_i$ is the vector of observed nonrandom explanatory variables measuring the attributes of crash victim $i$, and $\varepsilon_i$ is a random error term that follows the standard normal distribution, so that its mean is 0 and its variance is 1 for all $i$.
Ordinary least squares regression cannot be applied here because the dependent variable of this model, $y_i^{*}$, is unobserved. Instead, it is assumed that a higher latent risk propensity corresponds to a more severe observed outcome, denoted by $y_i$. This relationship is described by the following equations [35, 36]:

$$y_i = \begin{cases} 1 & \text{if } y_i^{*} \le \mu_1, \\ k & \text{if } \mu_{k-1} < y_i^{*} \le \mu_k, \quad k = 2, \ldots, K-1, \\ K & \text{if } y_i^{*} > \mu_{K-1}, \end{cases}$$

where $\mu_1, \ldots, \mu_{K-1}$ are the threshold values that define $y_i$, corresponding to integer ordering, and $K$ is the highest-ordered severity level. The probability that crash victim $i$ experiences severity level $k$ is therefore equal to the probability that the latent risk propensity $y_i^{*}$ falls between two fixed thresholds. Given $X_i$, the probability that the severity faced by crash victim $i$ belongs to each level is

$$P(y_i = 1) = \Phi(\mu_1 - \beta^{T} X_i), \quad P(y_i = k) = \Phi(\mu_k - \beta^{T} X_i) - \Phi(\mu_{k-1} - \beta^{T} X_i), \quad P(y_i = K) = 1 - \Phi(\mu_{K-1} - \beta^{T} X_i),$$

where $\Phi$ is the cumulative standard normal distribution function. For estimation, this can be written compactly as

$$P(y_i = k) = \Phi(\mu_k - \beta^{T} X_i) - \Phi(\mu_{k-1} - \beta^{T} X_i),$$

where $\mu_{k-1}$ is the lower threshold and $\mu_k$ is the upper threshold for severity level $k$, with $\mu_0 = -\infty$ and $\mu_K = +\infty$. For all probabilities to be positive, the thresholds must satisfy $\mu_1 < \cdots < \mu_k < \cdots < \mu_{K-1}$. Interpreting the effect of individual estimated parameters requires computing these probabilities. SPSS was used to develop the crash model. Crash severity was considered at three levels in this paper: (i) no injury, (ii) injury, and (iii) fatal injury.
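The compact probability formula can be checked numerically with the standard library alone. The sketch below computes $P(y_i = k)$ for all levels from a coefficient vector and increasing thresholds; the function names and the example coefficient and threshold values are hypothetical, chosen only to illustrate the three-level case (no injury, injury, fatal injury).

```python
import math


def std_normal_cdf(z):
    """Phi(z) computed from the error function; works for +/- infinity."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(x, beta, mu):
    """P(y = k | x), k = 1..K, for y* = beta'x + eps with eps ~ N(0, 1).

    mu holds the K - 1 thresholds mu_1 < ... < mu_{K-1}; mu_0 = -inf and
    mu_K = +inf are added internally.
    """
    if any(lo >= hi for lo, hi in zip(mu, mu[1:])):
        raise ValueError("thresholds must be strictly increasing")
    xb = sum(b * v for b, v in zip(beta, x))
    cuts = [-math.inf, *mu, math.inf]
    return [std_normal_cdf(cuts[k] - xb) - std_normal_cdf(cuts[k - 1] - xb)
            for k in range(1, len(cuts))]


# Hypothetical example: one explanatory variable, three severity levels
# (no injury, injury, fatal injury).
probs = ordered_probit_probs(x=[1.0], beta=[0.5], mu=[0.0, 1.5])
print([round(p, 4) for p in probs])   # [0.3085, 0.5328, 0.1587]
```

The probabilities telescope to 1 by construction, and raising $\beta^{T} X_i$ always moves probability mass from the lowest level toward the highest one.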
5. Results and Discussion
5.1. Factor Analysis
Factor analysis is a statistical approach used to reduce data dimensionality. In this study, factor analysis was conducted on the high-dimensional work-zone data to simplify its interpretation. The factor analysis model was estimated using SPSS Statistics. To verify that the data were suitable for the analysis, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett’s test of sphericity were applied. The KMO measure assesses sampling adequacy for each variable in the model and for the complete model; its value ranges between 0 and 1. A KMO value close to 1 indicates that the correlations between variables are large relative to their partial correlations, so the data are suitable for factor analysis. In this paper, the KMO value of 0.67 exceeds the commonly recommended minimum of 0.6, suggesting that the data are adequate for factor analysis [37, 38]. For Bartlett’s test of sphericity, the chi-square value of 1786.156 with 78 degrees of freedom was significant (p < 0.001), indicating that the correlation matrix differs significantly from an identity matrix and that the variables are sufficiently correlated for factor analysis. To ensure reliable results, three criteria (the eigenvalue, the scree plot, and the parallel analysis test) were used to identify the number of factors that reasonably explain a large percentage of the data variability. The eigenvalue (EV) of a factor represents the amount of variance accounted for by that factor. As Table 2 shows, based on the initial eigenvalues, the first principal factor explained 24.466% of the variance, and the second, third, fourth, and fifth factors explained 17.407%, 11.684%, 11.177%, and 8.543%, respectively. The cumulative percentage of variance explained by the five factors was 73.278%. Following the Kaiser criterion, factors with eigenvalues greater than 1 were retained as common factors. With 13 standardized variables, an eigenvalue greater than 1 corresponds to explaining more than 1/13 ≈ 7.7% of the total variance, a condition met by the first five factors.
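Both adequacy checks can be computed directly from a correlation matrix. The sketch below is an illustration on synthetic data (one common factor, 345 cases, 13 variables, chosen to mirror the study's dimensions), not a reproduction of the SPSS output; the function names are hypothetical. It uses Bartlett's statistic $-\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|$ with $p(p-1)/2$ degrees of freedom, which gives the 78 degrees of freedom reported above for $p = 13$, and the overall KMO as the ratio of squared correlations to squared correlations plus squared partial correlations.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity (H0: the correlation matrix is identity)."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)                 # stable log-determinant
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof, chi2.sf(statistic, dof)


def kmo_overall(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R_inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / scale                         # partial correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + q2)


# Example: 345 synthetic cases on 13 variables sharing one common factor.
rng = np.random.default_rng(1)
f = rng.standard_normal((345, 1))
X = 0.7 * f + rng.standard_normal((345, 13))
R = np.corrcoef(X, rowvar=False)

stat, dof, p_value = bartlett_sphericity(R, n=345)
kmo = kmo_overall(R)
print(round(stat, 1), dof, p_value < 0.001, round(kmo, 2))
```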
The number of common factors can also be determined from the parallel analysis test and the scree plot. Figure 1 shows the scree plot and the parallel analysis results for the 13 factors; beyond the fifth factor, the eigenvalues decline more slowly. Table 3 lists the results of the parallel analysis test for each factor. According to these tests, the first five factors should be selected as common factors. It is worth noting that one of the motives for conducting factor analysis in this study is to reduce the large number of variables describing a complex phenomenon such as work-zone crash severity to a limited number of interpretable latent variables, known as factors. In essence, this section seeks a smaller number of interpretable factors that explain the maximum amount of variability in the data. The principal component method was used to extract the factor loading matrix, and varimax orthogonal rotation was then applied to make the common factors easier to interpret. After rotation, the loadings became more clearly differentiated: high loadings were concentrated near the diagonal of the matrix, suggesting that each common factor is mainly associated with a few influential variables. Each common factor was named according to the shared theme of the influential variables that load highly on it, which clarifies the practical meaning of each common factor. As shown in Table 4, factor one has high loadings on four variables (number of lane closures = .901, type of surface construction = .830, road character = .830, and road class = .813), all related to work-zone information, which points to a feasible and functional countermeasure for reducing the severity of crashes in work-zones. Similarly, the three variables with high loadings on factor two (day of week = .887, month = .852, and time = .843) reveal information about visibility and traffic conditions.
Factor three reveals information about the crash characteristics (crash type = .869, number of vehicles = -.603, and vehicle type = .593). For factor four, the variables with high loadings are gender and age, both related to driver skills and the ability to operate safely under different work-zone configurations. Finally, factor five represents the weather condition. The names assigned to the five common factors are shown in Table 5.
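Horn’s parallel analysis, one of the three criteria used above to fix the number of common factors, retains a factor only while its observed eigenvalue exceeds the eigenvalue obtained at the same rank from random, uncorrelated data of the same size. The NumPy sketch below is an illustration under stated assumptions: the 95th-percentile benchmark, 200 simulations, the function name, and the synthetic two-factor data are our choices, not details reported for the study’s analysis.

```python
import numpy as np


def parallel_analysis(X, n_iter=200, percentile=95, seed=0):
    """Horn's parallel analysis on the correlation matrix of X (n x p)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    simulated = np.empty((n_iter, p))
    for i in range(n_iter):
        Z = rng.standard_normal((n, p))      # uncorrelated data of the same size
        simulated[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    benchmark = np.percentile(simulated, percentile, axis=0)
    n_factors = 0
    for obs, ref in zip(observed, benchmark):   # count leading factors only
        if obs <= ref:
            break
        n_factors += 1
    return n_factors, observed, benchmark


# Example: synthetic data with two underlying factors (not the study data).
rng = np.random.default_rng(42)
F = rng.standard_normal((345, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.0], [0.85, 0.0],
                     [0.0, 0.9], [0.0, 0.8], [0.0, 0.75]])
X = F @ loadings.T + 0.4 * rng.standard_normal((345, 6))
n_factors, observed, benchmark = parallel_analysis(X)
print(n_factors)
```

Unlike the fixed eigenvalue-greater-than-1 rule, the random benchmark grows with the number of variables relative to the sample size, which is why the two criteria are often reported side by side.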
The factor score coefficient matrix was extracted using the regression method, so that each influential factor has a score on every common factor. The weight $A_i$ of each common factor was calculated by normalizing the variance percentages of the five common factors, estimated from the sums of squared rotated loadings. The absolute values $A_k$ (where $k$ indexes the influential factors) of the influential factor scores on each common factor, listed in Table 6, were then normalized; the normalized score represents the weight of the influential factor within the relevant common factor. From these, the overall weight $W_k$ of each influential factor in work-zone crashes was deduced. The weights of the main influential factors are shown in Table 7, and Figure 2 shows the weights $W_k$ of the 13 influential factors.
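The weighting procedure can be sketched as follows, but note that the exact combination formula is not spelled out in the text, so the last step is our assumption. The sketch computes regression-method (Thurstone) score coefficients $B = R^{-1}A$, takes each common factor’s weight $A_i$ as its share of the summed squared rotated loadings, normalizes $|b_{ki}|$ within each common factor, and then assumes $W_k = \sum_i A_i \, |b_{ki}| / \sum_k |b_{ki}|$. The loading values and function names are hypothetical.

```python
import numpy as np


def score_coefficients(R, rotated_loadings):
    """Regression-method (Thurstone) factor score coefficients: B = R^-1 A."""
    return np.linalg.solve(R, rotated_loadings)


def influential_factor_weights(R, rotated_loadings):
    """Hypothetical weighting: W_k = sum_i A_i * |b_ki| / sum_k |b_ki|."""
    ss = np.sum(rotated_loadings ** 2, axis=0)   # rotated variance per common factor
    A_i = ss / ss.sum()                          # normalized common-factor weights
    B = np.abs(score_coefficients(R, rotated_loadings))
    B_norm = B / B.sum(axis=0, keepdims=True)    # normalize within each common factor
    return B_norm @ A_i                          # W_k for each influential factor


# Illustrative 4-variable, 2-factor example (hypothetical numbers).
A_rot = np.array([[0.90, 0.00], [0.80, 0.10], [0.10, 0.85], [0.00, 0.70]])
R = A_rot @ A_rot.T
np.fill_diagonal(R, 1.0)          # unit variances; uniqueness = 1 - communality
W = influential_factor_weights(R, A_rot)
print(np.round(W, 3), round(W.sum(), 6))
```

Under this assumed scheme the weights sum to 1 across the influential factors, so a threshold such as 8% can be read directly as a share of the total weight.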
The results show that weather condition is the most important influential factor, with the largest weight among the 13 original variables, followed by number of lane closures, type of surface construction, road character, day of week, and so forth. These first five influential factors have a noticeable effect on work-zone crashes because each has an individual weight of at least 8%.
These results are consistent with findings in the literature. The impact of weather condition on work-zone crash severity is consistent with previous research. Regarding traffic safety at freeway work-zones, Ozturk et al. and Wu et al. concluded that the number of lane closures has a significant relationship with safety. Road character and day of week have also been found to be significantly associated with work-zone crash severity [9, 10, 37, 39], consistent with the findings of this paper.
5.2. Estimations of Ordered Probit Model
The ordered probit model was estimated using SPSS. The analysis covered 345 observations of work-zone crashes on 13 variables. The scores of the five common factors identified by the factor analysis were used as inputs to the ordered probit model, and each factor was tested for statistical significance (p < 0.05) to find the factors that significantly influence injury severity in crashes at highway construction work-zones in Egypt. The results of the ordered probit model are presented in Table 8. Four factors were identified as significantly associated with crash injury severity in work-zones.
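Outside SPSS, the same kind of model can be estimated by maximizing the ordered probit log-likelihood directly. The sketch below uses SciPy’s `minimize` on simulated data; the two “factor scores”, the true coefficients, and the thresholds are invented for illustration. The thresholds are kept strictly increasing by parameterizing the gaps between them on the log scale, and no intercept is estimated because it is absorbed by the thresholds.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def _unpack(params, n_x):
    """Split params into beta and strictly increasing thresholds mu."""
    beta = params[:n_x]
    first, log_gaps = params[n_x], params[n_x + 1:]
    mu = np.concatenate(([first], first + np.cumsum(np.exp(log_gaps))))
    return beta, mu


def neg_log_likelihood(params, X, y):
    """Negative ordered probit log-likelihood; y coded 0..K-1."""
    beta, mu = _unpack(params, X.shape[1])
    cuts = np.concatenate(([-np.inf], mu, [np.inf]))
    xb = X @ beta
    prob = norm.cdf(cuts[y + 1] - xb) - norm.cdf(cuts[y] - xb)
    return -np.sum(np.log(np.clip(prob, 1e-300, None)))


def fit_ordered_probit(X, y, n_levels):
    """Maximum-likelihood ordered probit; y coded 0..n_levels-1."""
    n_x = X.shape[1]
    start = np.zeros(n_x + n_levels - 1)
    res = minimize(neg_log_likelihood, start, args=(X, y), method="BFGS")
    beta, mu = _unpack(res.x, n_x)
    return beta, mu, res


# Simulated example: two standardized "factor scores", three severity levels
# (0 = no injury, 1 = injury, 2 = fatal). All values are hypothetical.
rng = np.random.default_rng(7)
X = rng.standard_normal((2000, 2))
true_beta, true_mu = np.array([0.5, -0.3]), np.array([-0.5, 0.8])
y_star = X @ true_beta + rng.standard_normal(2000)
y = np.digitize(y_star, true_mu)          # latent propensity -> ordered level
beta_hat, mu_hat, res = fit_ordered_probit(X, y, n_levels=3)
print(np.round(beta_hat, 2), np.round(mu_hat, 2))
```

Significance testing at p < 0.05, as in the study, additionally needs standard errors (for example from the inverse Hessian of the log-likelihood); statsmodels’ `OrderedModel` with `distr="probit"` is one existing library route that reports them directly.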
The ordered probit estimates indicate that the crash characteristics factor performs better in terms of safety, reducing the probability of the average injury risk, compared with the advanced information, driver skills, and weather condition factors. In other words, crashes that occurred under the influence of the crash characteristics factor are more likely to result in a nonfatal injury than those under the other factors. The visibility and traffic factor does not have a significant impact on the injury severity of work-zone traffic crashes. The coefficient of the advanced work-zone information factor is positive, which suggests that inadequate advance information about construction work-zones on Egyptian highways tends to increase the probability of fatal injury crashes in work-zones. In this regard, previous work has shown that road traffic crashes occur for one of three key reasons. The first, related to information flow, is perceptual error, which occurs, for instance, when work-zone conditions are not perceived early enough to allow sufficient driver perception-reaction time.
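The link between the sign of a coefficient and the probability of a fatal crash can be made explicit through marginal effects: for level $k$, $\partial P(y_i = k)/\partial x_j = [\phi(\mu_{k-1} - \beta^{T} X_i) - \phi(\mu_k - \beta^{T} X_i)]\,\beta_j$, where $\phi$ is the standard normal density. A positive coefficient therefore always lowers the probability of the lowest level and raises that of the highest, while the sign for the middle level depends on the thresholds. In the sketch below, the coefficients .161 and .132 are those reported later in this section for the advanced information and weather condition factors, but the thresholds are hypothetical because they are not given in the text.

```python
import numpy as np
from scipy.stats import norm


def marginal_effects(x, beta, mu):
    """dP(y = k)/dx_j for an ordered probit with increasing thresholds mu.

    Returns a (K, n_x) array: row k is the change in the probability of
    severity level k per unit increase in each explanatory variable.
    """
    xb = float(np.dot(x, beta))
    cuts = np.concatenate(([-np.inf], mu, [np.inf]))
    dens = norm.pdf(cuts - xb)                 # phi(+/-inf) evaluates to 0
    return np.outer(dens[:-1] - dens[1:], beta)


# Coefficients .161 and .132 as reported for the advanced information and
# weather condition factors; the thresholds below are hypothetical.
beta = np.array([0.161, 0.132])
mu = np.array([-0.5, 1.0])                     # hypothetical cut points
me = marginal_effects(np.zeros(2), beta, mu)
print(np.round(me, 4))   # rows: no injury, injury, fatal
```

Each column sums to zero, since a shift in one factor score only redistributes probability among the three severity levels.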
An increase in the driver skills factor decreases the probability of fatal injury in work-zone crashes. One possible explanation is that drivers lack sufficient skills for safe maneuvering under unfavourable traffic and road conditions. Several studies have investigated the driver-skill factors that are influential in predicting fatal injury crashes. One study found that young drivers are overrepresented in road crashes and fatalities and that one approach to improving their safety is the provision of advanced training. Beanland et al. reviewed the literature on the efficacy of advanced driver training to assess its effectiveness in reducing young drivers’ crash involvement; their report indicates that various forms of prelicense and postlicense training have been found effective for skill acquisition and enhanced driving performance. The results also revealed that crashes that occurred under the influence of the weather condition factor have a higher probability of being fatal. This finding is consistent with previous studies [15, 39] but contrasts with another previous study.
The probability of a no-injury or injury crash, rather than a fatal crash, increases significantly when the crash occurs under the influence of the crash characteristics factor. This may be explained by the fact that crashes involving heavy vehicles often result in multiple-vehicle crashes, as opposed to crashes without heavy vehicles, because heavy vehicles have reduced braking abilities. This is related to the excessive weight carried by trucks in Egypt, where more than 96% of goods are transported by truck.
The influence of the factors on crash severity risk can be ranked by comparing the magnitudes of their estimated coefficients, since the factor scores that enter the model are on a common standardized scale. On this basis, the advanced information factor appears to have the greatest impact on the injury level of a crash occurring at a work-zone (β = .161), whereas the weather condition factor has a relatively lower effect (β = .132). In terms of magnitude, the effect of a one-unit increase in the advanced information factor score on the latent injury risk propensity is about 1.22 times (.161/.132 ≈ 1.22) that of the weather condition factor, all other factors being equal.
6. Conclusion and Recommendations
In this paper, a hybrid approach combining factor analysis and an ordered probit model was developed to carry out a comprehensive analysis of the injury severity of crashes in highway construction work-zones in Egypt. The factor analysis revealed that the most important factors influencing crash severity were weather condition, number of lane closures, type of surface construction, road character, day of the week, and so forth. Five common factors were identified: the advanced work-zone information factor, the visibility and traffic factor, the crash characteristics factor, the driver skills factor, and the weather condition factor, which together explain 73.278% of the variance. The factor scores were then used to calibrate the ordered probit model and examine the influence of these factors on the injury severity of work-zone crashes. The model estimation results showed that four factors (advanced work-zone information, crash characteristics, driver skills, and weather condition) are significant risk factors associated with work-zone crash severity. In addition, the results showed that the weather condition factor had a strong influence on severe work-zone crashes in Egypt.
Based on these results, four factors were found to have a strong influence on severe-injury work-zone crashes. Together, these four common factors account for 55.87% of the variance in the 13 original variables; this equals the 73.278% explained by all five common factors minus the 17.407% explained by the second factor (visibility and traffic), which was not significant. Accordingly, this paper offers the following suggestions to help reduce the frequency and severity of work-zone crashes and improve traffic safety in highway construction work-zones in Egypt.
Driver skills training: programmes offering regular skills training to prelicensed and postlicensed drivers are needed to reduce bad driving behavior and improve road safety. In addition, countermeasures that work well in other countries can be adapted to Egypt, such as graduated driver licensing programmes [46–48] and additional restrictions to minimize the exposure of novice drivers to work-zone crashes [49, 50].
Provision of advanced information: the observation that most crashes were associated with human error implies that advance knowledge of road and traffic conditions helps drivers prepare and act adequately to avoid crashes. In this regard, we recommend developing effective education programmes for the general public who travel through work-zones. In addition, Intelligent Transportation Systems (ITS) technologies such as Dynamic Message Signs (DMS), placed at a reasonable distance ahead of the work-zone, may be an effective means of compensating for adverse design, human factors, and roadway conditions.
This study has limitations that may affect the results and their interpretation: some information (e.g., seat belt use and alcohol involvement) was not considered because it is absent from the database. It is recommended that such information be collected in the work-zone crash database and used for model calibration in the future.
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The authors thank the Ministry of Commerce of China (MOFCOM) for funding the Ph.D. program at Southwest Jiaotong University and Dr. Essam Radwan of the University of Central Florida for valuable comments and suggestions that helped improve this article.
World Health Organization, Global Status Report on Road Safety 2015, World Health Organization, 2015.
S. Dissanayake and J. Lu, “Analysis of severity of young driver crashes: sequential binary logistic regression modeling,” Transportation Research Record, no. 1784, pp. 108–114, 2002.
O. Ozturk, K. Ozbay, and H. Yang, “Estimating the impact of work zones on highway safety,” TRB 93rd Annual Meeting, Transportation Research Board, vol. 116, pp. 1–16, 2014.
A. J. Khattak, R. J. Schneider, and F. Targa, “Risk factors in large truck rollovers and injury severity: analysis of single-vehicle collisions,” Transportation Research Record: Journal of the Transportation Research Board, vol. 40, 2003.
V. Katta, Development of Crash Severity Model for Predicting Risk Factors in Work Zones for Ohio, 2013.
H. Xu, “Study on road traffic accident data analyzing and mining,” J Chin People Public Sec Univ, vol. 4, pp. 69–73, 2008.
P. Sun, R. Song, and H. Wang, “Prevention and analysis of the traffic accidents,” Safe Environment Engineering, vol. 14, pp. 97–100, 2007.
W. Cheng, Research Based on Model and Method of Urban Traffic Accidents and Traffic Conflict Technique, Jilin University, Jilin, China, 2004.
Q. Yuan, X. Li, C. Wang, Y. Li, and Y. Gao, “Cluster and factor analysis on data of fatal traffic crashes in China,” in Proceedings of the 2017 4th International Conference on Transportation Information and Safety (ICTIS), pp. 211–224, Banff, AB, Canada, August 2017.
T. W. Anderson, An Introduction to Multivariate Statistical Analysis, John Wiley & Sons, New York, NY, USA, 1984.
S. P. Washington, M. G. Karlaftis, and F. L. Mannering, Statistical and Econometric Methods for Transportation Data Analysis, Chapman & Hall, Boca Raton, FL, USA, 2003.
S. A. Mulaik, Foundations of Factor Analysis, Statistics in the Social and Behavioral Sciences Series, CRC Press, Boca Raton, FL, USA, 2nd edition, 2010.
B. G. Tabachnick, Using Multivariate Statistics, Allyn & Bacon, Boston, MA, USA, 2001.
O. Ozturk, K. Ozbay, H. Yang, and B. Bartin, “Crash frequency modeling for highway construction zones,” 2013.
B. Wu, H. Xu, and W. Zhang, “Identifying the cause and effect factors of traffic safety at freeway work zone based on DEMATEL model,” in Proceedings of the Second International Conference on Transportation Engineering, pp. 2183–2188, Southwest Jiaotong University, Chengdu, China.
M. Green and J. Senders, Human Error in Road Accidents, Visual Expert, Canada, CRC Press, 2018.
S. Gargett, L. B. Connelly, and S. Nghiem, “Are we there yet? Australian road safety targets and road traffic crash fatalities,” BMC Public Health, vol. 11, 2011.