Abstract
Traffic accidents result from a combination of factors and lead to casualties and injuries. By identifying the most influential factors, safety authorities can provide appropriate solutions for decreasing accident severity and implement preventive measures. The aim of this research was to present models to predict accident severity on two-lane two-way (TLTW) rural highways in Iran over a one-year period from 2019 to 2020. To this end, the occurrence probability of each type of accident was determined by artificial neural network (ANN)-based prediction models using nine independent variables affecting accident severity. This study developed several ANN structures trained with back-propagation to model the potentially nonlinear relationship between accident severity and accident-related factors. Results indicated that, among the models, the multilayer perceptron neural network (MLPNN) model with a 60%–20%–20% (training–testing–hold-out) partition had the best performance and predictive power. This model was developed using the standardized rescaling method for covariates and batch training. Nine, five, and two units were selected automatically for the input, hidden, and output layers, respectively, and the hyperbolic tangent and softmax were used as the activation functions in the hidden and output layers, respectively. This model had the lowest cross-entropy error (39.6) and the highest percentage of correct predictions (82.5%) in the training sample, and the area under the receiver operating characteristic (ROC) curve was 0.852. Moreover, among all the influential variables, pavement condition index, roadside hazard, shoulder width, and passing zone ratio had the greatest impact on accident severity. Finally, safety strategies were proposed to increase safety and reduce accidents along these roads.
1. Introduction
Vehicle accidents are among the most important public health issues and have been ranked as the eighth leading cause of death [1]. According to World Health Organization (WHO) statistics, 1.35 million people die in road traffic accidents annually and almost 50 million are injured [2]. The expansion of transportation infrastructure in Iran has dramatically increased the damage caused by accidents. According to the latest official statistics, 800,000 accidents occur annually in this country [3]. In 2016, road traffic fatalities in Iran numbered 15,932, of which 78% were male and the rest female. Moreover, rural highways account for nearly 69% of all accident fatalities in Iran [4]. According to recent research, traffic accidents are projected to become the fifth leading cause of death worldwide by 2030 [5]. Furthermore, forensic medicine statistics show that road accidents in rural areas affected 101,996 people between 2011 and 2016 [6]. Therefore, to reduce accidents, safety authorities should adopt appropriate solutions to enhance road safety and reduce the likelihood of accidents [7, 8]. Determining the factors associated with accident severity enables traffic engineers to reduce accidents in developing countries such as Iran, and traffic safety can thus be improved by analyzing accident characteristics [9, 10].
In recent years, various studies have been performed on traffic safety on rural roads to detect the influential variables. Abdel-Aty and Abdelwahab investigated driver injury severity and the contributory factors that affect the likelihood of injury severity. Using the multilayer perceptron as well as other artificial neural network (ANN) models, they showed that accident location, point of collision, vehicle type, seatbelt usage, vehicle speed, and gender were significant contributory factors to the injury severity of accidents [11]. Labinjo et al. explored the epidemiology of traffic injury in Nigeria to identify risk factors affecting road traffic injury. Poisson regression analysis was used to measure relative risks for the associated factors. Results indicated that an increased risk of injury was associated with males aged 18 to 44 years and with post-secondary education [12]. Montella et al. conducted an exploratory analysis of rural road accidents in Italy to detect interdependencies and dissimilarities between accident patterns and to provide insights for developing safety improvement strategies. Using data-mining techniques such as classification trees and association rules, they revealed that the most influential factors in the occurrence of rural road accidents were road type, pedestrian age, lighting conditions, and vehicle type [13]. Hu and Xiang investigated the properties of rural road accidents and revealed that 92.68% and 5.42% of fatalities occurred on straight and curved roads, respectively. Moreover, daytime accidents were more severe than nighttime accidents, and driving a motor vehicle was the main cause of fatalities [14]. Zangooei Dovom et al. examined the distribution of fatal accidents in Khorasan Razavi province, Iran. According to their results, males accounted for more casualties than females, and people aged 21 to 30 years of both genders were involved in the most accidents. They also indicated that male motorcyclists were the riskiest group among all road users [15]. López et al. investigated the contributory factors in vehicle accidents on two-lane rural highways in Spain by identifying decision rules extracted from decision trees. Their results indicated that the conditions associated with the highest accident rate were good weather, daylight, regular working days, the 28–60 years age group, the summer season, male gender, and the period between 12:00 and 18:00 [16]. Sahebi et al. aimed to predict accident severity and its contributing factors on rural roads of Tehran province, Iran. An ordered logit model was applied to analyze the factors influencing death or injury in accidents. Results indicated that the type of vehicles involved, the location of the accident, and the time of the accident played a significant role in the severity of rural road accidents [17]. Chen and Fan investigated significant contributing factors to pedestrian-vehicle accidents in rural areas of the United States using a mixed logit model. Results showed that factors such as poor driver physical condition, heavy trucks, dark lighting conditions, and speed limits between 35 and 50 mph or above 50 mph significantly increased injury severity [18]. Casado-Sanz et al. considered various factors in rural road accidents in Spain by applying a logistic regression model. Results indicated that driver age of 30–45 years, male drivers, the middle of the week, good weather conditions, and daylight were associated with the highest accident risk on rural roads [19]. Kamboozia et al. examined various factors affecting the severity of accidents on rural roads of Guilan province, Iran, and indicated that male drivers made a significant contribution to the occurrence of vehicle accidents. Moreover, the period from 12 pm to 6 pm had the highest accident risk, and the middle of the week recorded the greatest accident occurrence [20]. Ghasedi et al. estimated the variables affecting rural accidents in Guilan province, Iran, using factor analysis, logit regression, and ANN models. Results showed that exceeding the allowable speed, driver age of 30 to 50 years, and rainy weather had the highest impact on vehicle accident severity, while rainy weather and lighting conditions were the most contributory factors in pedestrian accident severity [21].
A review of these studies shows that most researchers have focused on gaining a deeper understanding of the contributory factors affecting the frequency and severity of accidents on rural roads, often using machine learning approaches. However, most of these studies overlooked the influence of other types of variables, such as pavement condition, roadside installations, and geometric features, on accident severity. To address this gap, the current paper examined these variables using machine learning approaches to detect contributory factors and quantify the relative importance of each parameter, which can help safety authorities take preventive measures to reduce the occurrence of accidents.
The current study used various ANNs to identify the factors influencing accident severity. Accordingly, several ANN models were developed using the influential variables to determine the magnitude of the effect of each variable on the severity of accidents occurring on two-lane two-way (TLTW) rural highways. Finally, the model relating accident-related factors to two accident severity levels, fatality/injury and damage, is presented. This research attempts to evaluate the impact of environmental, pavement, and geometric features on the severity of accidents occurring on TLTW rural highways. To examine the presented models, accident data collected from 2019 to 2020 on the TLTW rural highway segments of Tehran-Qom and Tehran-Saveh in Iran were used.
2. Materials and Methods
2.1. Data Collection
The dataset used in this research was gathered from 65 km of the Tehran-Qom and 50 km of the Tehran-Saveh two-way rural roads in Iran. These highways are among the major corridors connecting important provinces of the country, including the southern, southeastern, and southwestern provinces. Data were collected over a one-year period from 2019 to 2020 on the monitored roadways. Accident data were obtained from the official reports of the Traffic Police Centers of Qom and Saveh. It should be mentioned that, for injury accidents, although police officers reported people as injured at the scene in some cases, these people may have died during or after transport to a hospital. Also, some damage accidents are settled by mutual consent of both parties without being recorded by police officers. Fatal accidents, however, were accurately and completely recorded. In total, data on 197 accidents were collected, of which 156 (79.2%) were damage accidents and 41 (20.8%) were fatal or injury accidents. Various details were recorded for each accident, including the number of accidents, the accident location, and the accident type and severity (fatal, injury, or damage). Other required information, including road characteristics, was collected by experts in field surveys; further information about the road features is given in Section 2.2. The target variable in this study was accident severity, originally divided into three categories: fatal, injury, and damage accidents. Since the number of fatal accidents was small compared with the total, and satisfactory significance and goodness of fit could not be achieved with a three-category dependent variable, fatal accidents were combined with injury accidents, and the dependent variable was split into two categories: damage and fatal/injury accidents [22].
In addition, the locations of interest were recorded using global positioning system (GPS) equipment. The survey used traditional paper questionnaires in the Persian language. Before the survey, all participants were briefed and trained by the team leader. The field survey teams consisted of experienced, independent people qualified in safety issues. Two teams, each with four safety inspectors, performed the field surveys, for a total of eight inspectors. All inspectors were relatively young (average age 34.6 years), held academic degrees in road and transportation engineering, and had sufficient background in the safety field. The research therefore relied on manual data collection, visual inspection, and subjective expert judgment.
2.2. Independent Variables
The selection of variables depends on several criteria, and some considerations are required when deciding which variables to include in a model. Variables believed to be related to the research question should be included. In this study, the independent variables were selected based on experience, engineering judgment, the gaps in previous studies, and the availability of data [23, 24]. A survey was conducted on rural two-lane highway segments in Iran to gather the required information. On this basis, nine factors were used to develop a database for ANN modeling of accidents: roadside hazard, pavement condition index, speed limit, access density, passing zone ratio, shoulder width, roadway width, centerline rumble strips, and shoulder rumble strips. All data gathered in this study were inventory data for safety evaluation. Table 1 reports the symbols used for each variable in the accident prediction models.
2.2.1. Roadway Segmentation
Extensive research has been performed on using roadway segmentation to estimate accident models, some of which has considered roadway sections of fixed length. In this paper, roadway segmentation was done in the same way as described by Abdel-Aty and Radwan, Cafiso et al., and Mayora and Rubio [25–27]. The 65 km of the TLTW rural highway of Tehran-Qom and the 50 km of the TLTW rural highway of Tehran-Saveh in Iran were divided into 13 and 10 sections, respectively. In total, 115 km of TLTW rural highways in Iran, comprising 23 sections, were considered in this research. The segmentation used a fixed length of 5 km for the entire roadway. The safety features used as inputs were calculated for each segment separately. These features include:
(i) Roadside hazard
(ii) Pavement condition index
(iii) Speed limit
(iv) Access density
(v) Passing zone ratio
(vi) Shoulder width
(vii) Roadway width
(viii) Centerline rumble strips
(ix) Shoulder rumble strips
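The fixed-length segmentation reduces to simple arithmetic. The Python sketch below is illustrative only: it assumes that each accident is located by a kilometre post measured from the start of its route, and the function names are hypothetical rather than part of the study's tooling.

```python
SEGMENT_LENGTH_KM = 5.0  # fixed segment length used in the study


def num_segments(route_length_km: float, seg_len_km: float = SEGMENT_LENGTH_KM) -> int:
    """Number of fixed-length segments covering a route (must divide exactly)."""
    count, remainder = divmod(route_length_km, seg_len_km)
    if remainder:
        raise ValueError("route length is not a multiple of the segment length")
    return int(count)


def segment_index(km_post: float, route_length_km: float,
                  seg_len_km: float = SEGMENT_LENGTH_KM) -> int:
    """0-based index of the segment containing a kilometre post (hypothetical input)."""
    if not 0.0 <= km_post <= route_length_km:
        raise ValueError("kilometre post lies outside the route")
    # A point exactly at the route end belongs to the last segment.
    return min(int(km_post // seg_len_km), num_segments(route_length_km, seg_len_km) - 1)


routes = {"Tehran-Qom": 65.0, "Tehran-Saveh": 50.0}
counts = {name: num_segments(length) for name, length in routes.items()}
print(counts, sum(counts.values()))  # {'Tehran-Qom': 13, 'Tehran-Saveh': 10} 23
```

An accident at kilometre post 12.3 of the Tehran-Qom route would fall in segment index 2 (the third 5 km section).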
2.2.2. Roadside Hazard Index
The roadside hazard (RSH) index used in this research was introduced by Cafiso et al. [26, 28]. The index is evaluated over 200 m units on both the left and right sides of the route. Using the checklists designed for this method, safety inspectors surveyed each direction separately. Three scores (0, 1, and 2) were used for the three risk levels: without risk, low risk, and high risk, respectively. These scores were assigned to five RSH items (trees and other rigid obstacles, dangerous terminals and transitions, bridges, embankments, and ditches), and the RSH of each unit is calculated as the weighted average in equation (1):

$$\mathrm{RSH}_i = \frac{\sum_{k=1}^{2}\sum_{j=1}^{5} w_j\,\mathrm{score}_{ijk}}{\sum_{k=1}^{2}\sum_{j=1}^{5} w_j} \qquad (1)$$

in which $w_j$ is the relative weight of item $j$, determined based on AASHTO standards: 2 for trees and rigid obstacles, 2 for dangerous terminals and transitions, 5 for bridges, 3 for embankments, and 1 for ditches [29]. In equation (1), $k$ is the direction of route inspection (right = 1, left = 2), and $\mathrm{score}_{ijk}$ denotes the score of item $j$ ($j = 1, \ldots, 5$) in the $i$th visit unit on side $k$. The roadside hazard is thus calculated with the designed checklists for each 200 m unit, and the mean of the unit values in each section is used as the section's RSH index.
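As an illustration of how the unit-level scores roll up into a section value, the Python sketch below computes the RSH of each 200 m unit as a weighted average of the 0/1/2 item scores over both directions and then averages the units in a section. It assumes the weighted average is normalized by the sum of the weights across both sides; the exact normalization in Cafiso et al. [26, 28] may differ, and the item keys are hypothetical labels.

```python
# Item weights as listed in the text (AASHTO-based).
WEIGHTS = {
    "rigid_obstacles": 2,        # trees and other rigid obstacles
    "terminals_transitions": 2,  # dangerous terminals and transitions
    "bridges": 5,
    "embankments": 3,
    "ditches": 1,
}


def unit_rsh(right: dict, left: dict) -> float:
    """Weighted average of 0/1/2 item scores for one 200 m unit, both sides."""
    weighted_sum = 0.0
    weight_total = 0.0
    for side in (right, left):  # k = 1 (right), k = 2 (left)
        for item, weight in WEIGHTS.items():
            score = side[item]
            if score not in (0, 1, 2):
                raise ValueError(f"invalid score {score!r} for {item}")
            weighted_sum += weight * score
            weight_total += weight
    return weighted_sum / weight_total


def section_rsh(units: list) -> float:
    """Section RSH: mean of the unit values; units is a list of (right, left) pairs."""
    values = [unit_rsh(right, left) for right, left in units]
    return sum(values) / len(values)


no_risk = {item: 0 for item in WEIGHTS}
high_risk = {item: 2 for item in WEIGHTS}
bridge_only = dict(no_risk, bridges=2)  # high-risk bridge on the right side only
print(round(unit_rsh(bridge_only, no_risk), 3))                   # 0.385
print(section_rsh([(high_risk, high_risk), (no_risk, no_risk)]))  # 1.0
```

With this normalization the index stays on the same 0 to 2 scale as the individual scores, which keeps section values directly comparable with the checklist levels.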
2.2.3. Pavement Condition Index
Road surface characteristics were described by the pavement condition index (PCI), developed by the U.S. Army Corps of Engineers. PCI is measured along the highway for both the right and left lanes, that is, for each direction of travel. In this research, the data were provided by safety engineers through road surveys, using an approximate method to characterize the condition of the entire segment: the inspection was conducted at the network level by examining a sample of units in each segment [30]. PCI values were determined through field inspections; PCI was calculated for the visited units using designed checklists, and the mean of the unit values in each segment was used as its pavement condition index.
2.2.4. Speed Limit
Speed data were obtained from road surveys, and an approximate method was used to estimate the speed limit on each roadway segment. Since the posted speed limits ranged from about 60 km/h to 90 km/h, the average value within each section was used as its speed limit. The speed limits in this study were the posted limits on the selected sections.
2.2.5. Road Characteristics and Geometric Data
To collect geometric data for the selected sections, safety inspectors carried out field surveys to establish the road characteristics. Lane width and shoulder width were determined for each section separately; where these varied within a section, their average values were used. An access point is any road connecting private properties or buildings to public roads, and access density was calculated by dividing the number of access points by the section length. Passing zone ratio is one of the most important elements, playing an important role in providing drivers with minimum sight distance; it was obtained by dividing the total length of the passing zones by the length of the entire segment. Centerline rumble strips and shoulder rumble strips are other safety features of two-lane roads, warning drivers to prevent vehicles from drifting out of their lane or off the roadway. These variables were treated as categorical data.
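The two ratio-type variables can be computed per segment as below. This is a minimal Python sketch under stated assumptions: passing zones are given as hypothetical (start, end) kilometre intervals within the segment for a single direction of travel, since the text does not state how the two directions were combined.

```python
def access_density(n_access_points: int, segment_length_km: float) -> float:
    """Access points per kilometre of segment."""
    if segment_length_km <= 0:
        raise ValueError("segment length must be positive")
    return n_access_points / segment_length_km


def passing_zone_ratio(zones: list, segment_length_km: float) -> float:
    """Total passing-zone length divided by segment length.

    zones: list of (start_km, end_km) intervals inside the segment, assumed
    non-overlapping and measured along one direction of travel.
    """
    total = 0.0
    for start, end in zones:
        if end < start:
            raise ValueError("zone end precedes zone start")
        total += end - start
    if total > segment_length_km + 1e-9:
        raise ValueError("passing zones exceed the segment length")
    return total / segment_length_km


print(access_density(7, 5.0))                                       # 1.4
print(round(passing_zone_ratio([(0.0, 1.2), (2.5, 3.3)], 5.0), 2))  # 0.4
```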
3. Description of Data
Making the data uniform is an important step before entering them into the software. In this research, some of the data were continuous and others categorical, and all data were converted into categorical form for modeling. Coding the data into ranges was intended to improve modeling accuracy and efficiency. To make the data uniform, each continuous variable was converted into a categorical one using a standard statistical procedure: its range (maximum value minus minimum value) was computed and divided into class intervals, and a code was assigned to each interval. Table 2 shows the variables affecting the occurrence of accidents on the Tehran-Qom and Tehran-Saveh rural roads and the coding used for each variable. The steps of the research method are presented in Figure 1.
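The range-based coding can be sketched as equal-width binning in Python. The number of classes is left as a parameter because the text does not state how many ranges were used for each variable, and equal-width intervals are an assumption; the function name is hypothetical.

```python
def equal_width_codes(values: list, n_classes: int) -> list:
    """Assign codes 1..n_classes using equal-width intervals over the range."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes  # class width = range / number of classes
    if width == 0:
        return [1] * len(values)  # constant variable: a single class
    codes = []
    for value in values:
        code = int((value - lo) // width) + 1
        codes.append(min(code, n_classes))  # the maximum falls in the last class
    return codes


speed_limits = [60, 70, 80, 90]  # illustrative values within the reported 60-90 km/h range
print(equal_width_codes(speed_limits, 3))  # [1, 2, 3, 3]
```

Values that land exactly on an interior boundary are placed in the upper class here; any such convention should match the ranges actually listed in Table 2.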

4. ANN Approach
ANNs are considered one of the most powerful modeling tools and are capable of solving complex problems in various fields [31–33]. Compared with other techniques, they are more flexible with respect to the dataset and make fewer assumptions about the modeling procedure, so no advanced statistical analysis is needed prior to modeling [34–36]. ANN models can identify patterns in the problem specification and the factors affecting an accident, and they have occasionally been applied to determine the effect of independent variables on a target variable [37–39]. The reason for focusing on ANN approaches is their strong ability to model and predict both the occurrence and the severity of complex accidents, as well as to describe the level and importance of the factors affecting accident occurrence. ANN models are designed as networks of interconnected artificial neurons arranged in three types of layers: input, hidden, and output. Researchers can create one or more hidden layers between the input and output layers [40, 41]. Each neuron in a layer can be connected to all neurons in the subsequent layer, but there are no connections between neurons within the same layer [42, 43]. The input layer collects the variable values from the constructed dataset; the hidden layer then processes the information received from the input layer and transmits it to the output layer, which computes either a categorical class label or a continuous value. The inputs to each hidden node are multiplied by weights [44–46], and the weighted values are then combined into a single number, which is passed through a nonlinear activation function (AF) to produce the node's output. Various AFs can be used in ANN models.
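The computation performed by a single hidden node (a weighted sum followed by a nonlinear AF) is short enough to show directly. The Python sketch below is generic; the bias term is an assumption, since the description above mentions only weights, and the hyperbolic tangent is used as the default AF.

```python
import math
from typing import Callable, Sequence


def neuron_output(inputs: Sequence[float], weights: Sequence[float],
                  bias: float = 0.0,
                  activation: Callable[[float], float] = math.tanh) -> float:
    """Multiply inputs by weights, sum them (plus a bias), and apply the AF."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


print(neuron_output([1.0, 2.0], [0.5, -0.25]))  # 0.0 (the weighted sum is 0)
print(round(neuron_output([1.0], [1.0]), 4))    # 0.7616 (tanh(1))
```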
4.1. Multilayer Perceptron Model
The multilayer perceptron (MLP) is one of the most common network types. It has a feed-forward structure, in which information flows from the input layer through the hidden layers to the output layer to generate an output. As many studies have shown, the MLP is a universal approximator: with a single hidden layer, it can approximate any continuous function on a bounded domain with arbitrarily high precision [47]. To determine the weights, the error is calculated by comparing the computed output for each input with the expected output. In the MLP model, the softmax AF was applied at the output layer, and cross-entropy was used as the error function for model validation [48]. The outputs are calculated at the output layer as

$$\hat{y}_k = f\left(\sum_{j=1}^{J} v_{jk}\, f\left(\sum_{i=1}^{I} w_{ij}\, x_i\right)\right) \qquad (2)$$

where $\mathbf{v} = (v_{1k}, \ldots, v_{Jk})$ and $\mathbf{w} = (w_{1j}, \ldots, w_{Ij})$ represent the weights in the second and first layers, respectively, $x_i$ are the inputs, and $f$ is the AF. The hyperbolic tangent was used as the AF in the hidden layer:

$$f(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \qquad (3)$$

Softmax was applied as the AF in the output layer:

$$f(z_k) = \frac{e^{z_k}}{\sum_{m=1}^{K} e^{z_m}} \qquad (4)$$

where $K$ is the number of output units.
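A forward pass through a 9-5-2 network of the kind described (tanh hidden units, softmax outputs) can be sketched in plain Python. The weights here are random placeholders, not the trained ANN1 weights; bias terms are included as an assumption, since the text does not mention them; and the helper names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Softmax AF; subtracting the max keeps exp() numerically stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer: tanh hidden units followed by softmax outputs.

    w_hidden: one weight vector (length len(x)) per hidden unit.
    w_out:    one weight vector (length = number of hidden units) per output unit.
    """
    hidden = [math.tanh(sum(xi * wi for xi, wi in zip(x, w)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    logits = [sum(hj * vj for hj, vj in zip(hidden, v)) + b
              for v, b in zip(w_out, b_out)]
    return softmax(logits)


N_IN, N_HIDDEN, N_OUT = 9, 5, 2  # units reported for the selected model
rng = random.Random(0)           # placeholder weights, not trained values
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_OUT)]
b_out = [0.0] * N_OUT

x = [rng.gauss(0.0, 1.0) for _ in range(N_IN)]  # one standardized input case
probs = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(probs)  # two pseudo-probabilities summing to 1
```

Because softmax normalizes the output layer, the two outputs can be read directly as the pseudo-probabilities of the two severity categories.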
5. Results
Establishing an appropriate architecture is crucial for obtaining accurate results. To develop a suitable model, it is important to choose an appropriate number of hidden layers and neurons: too many neurons can cause overfitting, whereas too few neurons are insufficient to process the data. The current research examined three MLP neural network models to identify the most influential variables. Table 3 reports the data processing results of the developed models, including the number of neurons in each layer and the nine independent variables introduced above: RSH, PCI, S, AD, P, SW, LW, CLRS, and SRS. The output layer had two nodes (fatality/injury and damage accidents), with softmax as the AF. Accordingly, the models were evaluated using the cross-entropy error.
The MLP model was developed using automatic architecture selection, which computed five units in a single hidden layer. Different AFs were used in different layers: the automatic selection structure chose the hyperbolic tangent for the hidden layer and the softmax function for the output layer. In addition, cross-entropy was used as the error function for model validation, as is appropriate with the softmax function. A total of 197 samples (rows) were used in the models, each sample representing one accident recorded within the 23 roadway segments.
Before training, all covariates were standardized using (x − mean)/s, where s is the standard deviation. The hyperbolic tangent AF in the hidden layer, chosen by the automatic selection structure, was intended to accelerate training. These settings were used in all three models. The MLP was trained by back-propagation, and a gradient-based technique, the scaled conjugate gradient method, was used to update the weights and gradually minimize the error function. The dataset was randomly partitioned into training, testing, and hold-out samples in three different ratios: 60%–20%–20%, 70%–20%–10%, and 50%–30%–20%. The purpose of this analysis was to assess the performance of the model under different conditions [49]. The hold-out samples were not used to develop the models. Table 4 provides information about the optimization method used to build the ANN models.
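The preprocessing steps, standardization and random partitioning, can be sketched with the Python standard library. Two assumptions are flagged: s is taken as the sample standard deviation, and the split uses exact fractions after shuffling, whereas software that assigns cases to partitions at random by probability produces only approximately proportional samples (the training sample reported with Table 6 has 89 + 25 = 114 of 197 cases, about 58%). Function names are hypothetical.

```python
import random
import statistics


def standardize(column):
    """Rescale a covariate as (x - mean) / s, with s the sample standard deviation."""
    mean = statistics.mean(column)
    s = statistics.stdev(column)
    return [(x - mean) / s for x in column]


def random_partition(n_cases, fractions=(0.6, 0.2, 0.2), seed=None):
    """Shuffle case indices and split them into training, testing and hold-out sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_cases)
    n_test = round(fractions[1] * n_cases)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    holdout = indices[n_train + n_test:]
    return train, test, holdout


print(standardize([2.0, 4.0, 6.0]))  # [-1.0, 0.0, 1.0]
for split in [(0.6, 0.2, 0.2), (0.7, 0.2, 0.1), (0.5, 0.3, 0.2)]:
    train, test, holdout = random_partition(197, split, seed=1)
    print(split, len(train), len(test), len(holdout))
```

The hold-out indices returned here would be set aside entirely during model development, matching the role of the hold-out sample described above.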
Table 5 presents the specifications of the models with different structures, including the cross-entropy error, the percentage of incorrect predictions, and other information on the training, testing, and hold-out samples.
To choose the best model, the cross-entropy error and the percentage of incorrect predictions were calculated for each ANN model. According to Table 5, the ANN1 model had the smallest cross-entropy error values, 39.6 and 16.87 for the training and testing datasets, respectively, indicating the model's capability to predict accident severity. The percentage of incorrect predictions of the ANN1 model was 17.5% for the training sample and 22.0% for the testing sample. Training continued until one consecutive step with no decrease in the error function was reached. A predicted probability of 0.5 was used as the classification cutoff: a case is assigned to a category when its predicted probability for that category exceeds 0.5.
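The error measure and the 0.5 cutoff can be illustrated for the two-category outcome. The Python sketch assumes a hypothetical coding (0 = damage, 1 = fatal/injury) and a cross-entropy error summed over cases, which is one common convention; it is not taken from the study's software.

```python
import math

DAMAGE, FATAL_INJURY = 0, 1  # hypothetical class coding


def cross_entropy_error(observed, probs):
    """Summed cross-entropy: -sum of log p(observed class) over all cases."""
    return -sum(math.log(p[c]) for c, p in zip(observed, probs))


def classify(p, cutoff=0.5):
    """Assign fatal/injury when its predicted probability exceeds the cutoff."""
    return FATAL_INJURY if p[FATAL_INJURY] > cutoff else DAMAGE


observed = [DAMAGE, FATAL_INJURY, DAMAGE]
probs = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]  # illustrative model outputs
print(round(cross_entropy_error(observed, probs), 4))  # 1.2448
print([classify(p) for p in probs])                    # [0, 1, 1]
```

In this toy example the third case is misclassified, and its low probability for the observed class (0.4) contributes most of the error.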
Table 6 reports the percentage of correct predictions of the ANN1 model for each accident severity. Among the 89 damage accidents in the training sample, 78 were predicted correctly by the ANN1 model (87.6% correct). Among the 25 fatal/injury accidents in the training sample, 16 were predicted correctly (64.0% correct). Overall, the ANN1 model correctly classified 82.5% of the training cases. In the testing and hold-out samples, the overall percentages correct were 78.0% and 69.0%, respectively. The ANN1 model therefore performed better in classifying damage accidents than fatal/injury accidents.
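The percentages in Table 6 follow directly from the reported counts. The short Python check below reproduces the training-sample figures from the numbers given in the text.

```python
def percent_correct(correct: int, total: int) -> float:
    """Share of cases classified correctly, in percent."""
    return 100.0 * correct / total


damage = percent_correct(78, 89)             # damage accidents, training sample
fatal_injury = percent_correct(16, 25)       # fatal/injury accidents, training sample
overall = percent_correct(78 + 16, 89 + 25)  # all 114 training cases
print(f"{damage:.1f} {fatal_injury:.1f} {overall:.1f}")  # 87.6 64.0 82.5
```

The complement of the overall figure, 100 − 82.5 = 17.5%, is the training error rate reported with Table 5.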
The box plot in Figure 2 also confirms Table 6 by showing the predicted pseudo-probabilities (PPPs) for each observed category of the dependent variable (Y) in the combined training and testing samples. PPP values above 0.5 indicate correct predictions in each category. In the damage category, the PPP of damage for cases observed as damage accidents (the blue box) ranged from 0.78 to just under 1, well above 0.5. Given the 0.5 cutoff, the network performed well in predicting cases in the damage category; only a few points at the end of the box indicate incorrectly predicted cases. The second box in the damage category shows the predicted probability of fatal/injury for cases observed as damage accidents, ranging approximately from 0 to less than 0.3, again with a few incorrectly predicted cases at the end of the box. The third box shows the predicted probability of damage for cases observed as fatal/injury accidents, ranging approximately from 0.23 to 0.6. Finally, for cases observed as fatal/injury accidents, the predicted probability of fatal/injury ranged from 0.4 to less than 0.8 (the green box). These observations are consistent with Table 6.

The ANN1 model was further validated using the receiver operating characteristic (ROC) curve, a technique for visualizing, organizing, and selecting classifiers based on their performance. ROC curves can also be used to check and visualize the performance of multiclass classifiers. Figure 3 plots sensitivity against 1 − specificity for all classification levels and displays two curves for the two categories (fatality/injury and damage accidents). Since there were only two classes, the two curves are symmetric about the 45° line running from the upper-left corner to the lower-right corner. The closer a curve bends toward the upper-left corner, the more accurate the network's predictions [50]. As Figure 3 shows, the ROC curve constructed on the training and testing datasets indicates high predictive power, and both categories of the dependent variable were classified by the designed model with high accuracy.

In binary classification problems, the area under the ROC curve (AUC) is an effective measure of the performance of a prediction model. Since the AUC is a portion of the area of the unit square, its value lies between 0 and 1, where 0 represents entirely incorrect classification and 1 entirely correct classification. Generally, an AUC of 0.5 indicates no discrimination, 0.7–0.8 is regarded as satisfactory, 0.8–0.9 as excellent, and more than 0.9 as outstanding [51]. The area under the diagonal line of the ROC plot, which corresponds to random classification, is always 0.5; therefore, a properly performing classification model should not have an AUC below 0.5 [52]. For a probabilistic classifier to assign a positive or negative response, a cutoff must be specified: if the predicted probability exceeds it, the response is classified as positive; otherwise, it is negative [53]. As shown in Table 7, the AUC for both categories was 0.852, well above 0.5 and within the excellent range.
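A useful way to see what the AUC measures is its rank interpretation: it equals the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case, with ties counted as one half. The Python sketch below computes it this way and maps the result onto the interpretation scale quoted above; the example scores are illustrative, not the study's data.

```python
def auc(scores_pos, scores_neg):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def describe_auc(value):
    """Label an AUC using the scale cited in the text [51]."""
    if value > 0.9:
        return "outstanding"
    if value >= 0.8:
        return "excellent"
    if value >= 0.7:
        return "satisfactory"
    return "below the satisfactory range"


print(auc([0.9, 0.3], [0.4, 0.1]))  # 0.75
print(describe_auc(0.852))          # excellent
```

The pairwise loop is quadratic in the number of cases, which is negligible for a dataset of 197 accidents.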
Figure 4(a) shows the cumulative gain chart, which indicates, when a given percentage of all cases (the combined training and testing samples) is selected by the model, what percentage of the cases in each category of the dependent variable is captured. In other words, it shows the performance of the developed model for each accident category (fatality/injury and damage accidents). For example, if the cases are sorted by the predicted pseudo-probability of the fatal/injury category and the top 10% are selected, they would be expected to include about 29% of all fatal/injury cases (point 10%, 29%); selecting about 90% of the cases covers all fatal/injury cases. If the cases are instead sorted by the predicted pseudo-probability of the damage category and the top 10% are selected, about 13% of damage accidents are captured, and selecting 100% of the cases captures all damage accidents (point 100%, 100%).

Since these charts reflect the probability of each accident type (fatality/injury and damage) along the entire roadway, they can help prioritize accident types for safety improvement. The lift chart in Figure 4(b) is derived from the cumulative gain chart and provides a clearer view of the results. The vertical (lift) axis gives the ratio of the percentage of the target category captured (the gain) to the percentage of all cases selected. For instance, the point (30%, 2.6) on the fatal/injury curve means that selecting the top 30% of cases captures 78% of the fatal/injury cases, so the lift equals 78%/30% = 2.6. The maximum lift values for fatal/injury and damage accidents were 2.8 and 1.3, respectively.
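Both charts can be computed from the predicted pseudo-probabilities by ranking the cases. The Python sketch below returns the cumulative gain and the lift for a selected top fraction; the toy data and the function name are illustrative, and the final line reproduces the lift of 2.6 from the gain of 78% at 30% quoted above.

```python
def gain_and_lift(probs, is_target, fraction):
    """Cumulative gain (% of target cases captured) and lift for the top fraction.

    probs:     predicted pseudo-probability of the target category for each case.
    is_target: True if the case actually belongs to the target category.
    """
    n = len(probs)
    k = max(1, round(fraction * n))  # number of cases selected
    ranked = sorted(zip(probs, is_target), key=lambda pair: pair[0], reverse=True)
    captured = sum(1 for _, target in ranked[:k] if target)
    total = sum(1 for target in is_target if target)
    if total == 0:
        raise ValueError("no cases of the target category")
    gain = 100.0 * captured / total
    lift = gain / (100.0 * k / n)  # gain % divided by selected %
    return gain, lift


probs = [0.9, 0.8, 0.7, 0.2, 0.1]
is_target = [True, True, False, False, False]
print(gain_and_lift(probs, is_target, 0.4))  # (100.0, 2.5)
print(round(78 / 30, 1))                     # 2.6
```

A lift of 1 corresponds to random selection, so the reported maxima of 2.8 (fatal/injury) and 1.3 (damage) show that ranking is most informative for the minority fatal/injury class.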
Table 8 presents the relative and normalized importance of each independent variable for the target variable (accident severity), an important outcome of the ANN models. Normalized importance was obtained by dividing each importance value by the largest importance value and expressing the result as a percentage. The four most influential variables are shown in Table 8.
Figure 5 shows how much the network's predicted values change as the value of each independent variable changes, which reflects each variable's importance in predicting accident severity. As Figure 5 shows, pavement condition index (0.153), roadside hazard (0.144), shoulder width (0.127), and passing zone ratio (0.123) had the greatest effects on accident severity, in that order. Pavement condition index and roadside hazard therefore had the highest importance in the designed neural network model, and improving these variables can help reduce the likelihood of severe accidents on rural roads.
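Because the normalization used in Table 8 is a simple rescaling, it can be reproduced from the importance values quoted above. The Python sketch below divides each value by the largest one and expresses it as a percentage. It uses only the four rounded values given in the text, so its percentages may differ slightly from those in Table 8, and it relies on the text's statement that pavement condition index has the largest importance of all nine variables; the function name `normalized_importance` is hypothetical.

```python
# Relative importance values reported in the text (rounded to three decimals).
importance = {
    "pavement condition index": 0.153,
    "roadside hazard": 0.144,
    "shoulder width": 0.127,
    "passing zone ratio": 0.123,
}


def normalized_importance(values):
    """Divide each importance by the largest one and express it as a percentage."""
    largest = max(values.values())
    return {name: 100 * v / largest for name, v in values.items()}


normalized = normalized_importance(importance)
for name, pct in sorted(normalized.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
# pavement condition index: 100.0%
# roadside hazard: 94.1%
# shoulder width: 83.0%
# passing zone ratio: 80.4%
```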

The results of this research are consistent with previous studies that reported a positive effect of pavement condition on the severity of road accidents [54, 55], although other studies found this factor to have a negligible effect on road safety [23, 56]. Roadside hazard was also found to affect accident severity, which contradicts previous research [56]. Shoulder width was another important factor in this research; its significant effect has been reported in previous studies [57, 58], although some researchers found it to be of low importance [56, 59]. The positive effect of passing zone ratio on road safety found here is consistent with previous research [60, 61]. The low impact of access density on accident severity observed in this research agrees with some studies [56, 59], while others reported a positive effect of this factor depending on the type of median, the number of lanes, and the speed limit [62, 63]. Several studies have emphasized the safety benefit of rumble strips in reducing the frequency and severity of accidents [57, 64–68], whereas in this research this factor had a smaller effect. In addition, some previous research showed a significant effect of speed limit on accident severity [9, 56], which contradicts the low impact of this factor found in this study, while other researchers reported a low effect consistent with this study [59]. Finally, similar to the results of this study, a low effect of roadway width on road safety has been reported [56], although some studies have shown that this factor helps reduce accidents [58].
6. Conclusion
The dataset used in this research was collected from a 65 km TLTW rural highway (Tehran-Qom) and a 50 km TLTW rural highway (Tehran-Saveh) in Iran, 115 km in total, divided into 23 segments of 5 km each. Some of the data on the independent variables were collected by an expert team through a field survey. Accident severity was categorized into two classes (fatality/injury and damage) and defined as the dependent variable. Roadside hazard, pavement condition index, speed limit, access density, passing zone ratio, shoulder width, roadway width, centerline rumble strips, and shoulder rumble strips were considered as predictors of accident severity and collected from the 23 segments of the selected rural highways. After the variables were coded, three MLPNN models with different structures were automatically trained using back-propagation. To develop the models, the dataset was divided into three subsets: training, testing, and holdout. The holdout subset was not used in modeling; it served only to assess the model's ability to predict the dependent variable. Among the three models, the MLPNN model with the 6-2-2 partition (ANN1) had the highest accuracy, with the smallest cross-entropy error (39.6) and an incorrect prediction rate of 17.5%. ANN1 was developed using the standardized rescaling method for covariates, batch training, and the back-propagation algorithm, with 9, 5, and 2 units in the input, hidden, and output layers, respectively. The hyperbolic tangent and softmax functions were used as the activation functions (AFs) in the hidden and output layers, respectively. Overall, ANN1 predicted the data comparatively well, with 82.5% correct predictions.
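The ANN1 configuration summarized above can be illustrated with a minimal NumPy sketch of the data partition and the forward pass of a 9-5-2 network. It assumes that the "6-2-2 partition" means relative case counts of 6:2:2 for training, testing, and holdout, that standardized rescaling means subtracting the mean and dividing by the standard deviation of the training cases, and that the reported cross-entropy error is summed over cases. The function names, data, and weights are hypothetical placeholders, and the back-propagation training step is omitted, so the printed values say nothing about the paper's results.

```python
import numpy as np


def partition(n_cases, ratio=(6, 2, 2), seed=0):
    """Randomly split case indices into training/testing/holdout in proportion to `ratio`."""
    idx = np.random.default_rng(seed).permutation(n_cases)
    total = sum(ratio)
    n_train = n_cases * ratio[0] // total
    n_test = n_cases * ratio[1] // total
    # Any rounding remainder goes to the holdout subset.
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(X, W1, b1, W2, b2):
    hidden = np.tanh(X @ W1 + b1)     # 9 inputs -> 5 tanh hidden units
    return softmax(hidden @ W2 + b2)  # 5 hidden units -> 2 softmax outputs


def cross_entropy_error(probs, y):
    """Cross-entropy summed over cases; y holds class indices (0 or 1)."""
    return float(-np.log(probs[np.arange(len(y)), y]).sum())


# Hypothetical data: 20 cases, 9 predictors, binary severity label.
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 9))
y = rng.integers(0, 2, size=20)

train_idx, test_idx, holdout_idx = partition(len(X))

# Standardized rescaling fitted on the training subset only.
mean, std = X[train_idx].mean(axis=0), X[train_idx].std(axis=0)
Xs = (X - mean) / std

# Untrained random weights; back-propagation training is not shown.
W1, b1 = rng.normal(scale=0.1, size=(9, 5)), np.zeros(5)
W2, b2 = rng.normal(scale=0.1, size=(5, 2)), np.zeros(2)

probs = forward(Xs[holdout_idx], W1, b1, W2, b2)
error = cross_entropy_error(probs, y[holdout_idx])
pct_correct = 100 * np.mean(probs.argmax(axis=1) == y[holdout_idx])
print(f"holdout cross-entropy: {error:.2f}, correct: {pct_correct:.1f}%")
```

Fitting the rescaling on the training cases only keeps information from the testing and holdout subsets out of model development, which matches the role the text gives the holdout subset.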
According to the variable importance results of the ANN1 model, pavement condition index and roadside hazard had the greatest impact on accident severity. The study also has practical implications: its findings can help safety authorities identify the risk factors most closely associated with accident severity. The results suggest that improving pavement condition on rural roads, reducing roadside hazards, and building wider shoulders that provide additional room for nonmotorized travel (3 feet or more) can significantly reduce accident severity, consistent with other research. Regular maintenance and repair of pavement distress on rural roads can reduce vehicle repair costs (since damage to vehicles caused by pavement distress will decrease) and lower the numbers of fatal, injury, and damage accidents. Installing barriers (installing or repairing guardrails) and keeping fixed objects at an adequate distance from the roadway are also recommended for two-lane rural roads; these measures can also provide appropriate guidance for nonmotorized traffic.
This study, however, has limitations. It focused only on the effects of environmental, geometric, and pavement condition features on accident severity on two-lane rural roadways in Iran; other types of variables (such as time of accident, vehicle type, and driver gender) could not be investigated because the sample of severe accidents recorded in the test segments during the survey was small. A larger sample of severe accidents would make it possible to detect interactions such as that between weather condition and shoulder width: drivers are much more attentive in stormy or rainy weather, so the effect of shoulder width or pavement condition may vary across weather conditions. For future research, the use of modern equipment in the field survey is recommended to reduce the possibility of human error. Moreover, other machine learning techniques could be incorporated into the proposed approach for predicting accident severity [69–71], and deep learning models may yield more accurate results [72–74]. Combined with a survey analysis of users' perceptions of facility improvements, these methods can also help improve work zone safety [75]. In addition, researchers can investigate other parameters affecting accident severity [76].
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest. No Iranian governmental organizations were partners or sponsors of this study, which is purely academic in nature.