Factors related to drivers and their driving habits dominate the causation of traffic crashes. An in-depth understanding of the human factors that influence risky driving could be of particular importance to facilitate the application of effective countermeasures. This paper sought to investigate the effects of human-centered crash contributing factors on crash outcomes. To select the methodology that best accounts for unobserved heterogeneity between crash outcomes, a latent class (LC) logit model and a random parameters logit (RPL) model were developed. Model estimation results generally show that serious injury crashes were more likely to involve unemployed drivers, no seatbelt use, old drivers, fatigued driving, and drivers with no valid license. Comparison of model fit statistics shows that the LC logit model outperformed the RPL model, as an alternative to the traditional multinomial logit (MNL) model.

1. Introduction

Road traffic crashes result from a combination of factors related to elements of the transportation system, made up of the road and its environment, vehicles, and road users, with crash outcomes ranging from property damage to death. Some factors contribute to crash occurrence, others influence the outcome (or severity) of the crash, and some do both. While the effects of some crash causal factors such as speed are fairly obvious, they may be linked to other unobserved factors, such as a sensation-seeking nature of the driver, which are not typically accounted for during the crash reporting process. A holistic understanding of crash causal factors and how they affect severities is necessary to develop and target countermeasures.

There is a significant body of road safety literature dedicated to the study of factors affecting crash occurrence and severities. Proposed countermeasures have ranged from roadway reengineering and improved vehicle safety features to strategies to influence driver behavior. The development of these countermeasures has been anchored on understanding the factors that affect the likelihood of crash occurrence and/or the circumstances that influence the severity of the crash outcome. A critical component of road traffic crash analyses has been the examination of the driver. Some drivers have habits or choose to drive in ways that increase their likelihood of getting into a crash. For instance, driving styles such as choice of speed, threshold for overtaking, tolerance for gap acceptance, and adherence to traffic control have been strongly linked to certain groups of drivers [1]. According to [1], while certain groups of drivers may be disproportionately represented in crash statistics, this may be due to reasons not related to their risk of crash. One of the early attempts by researchers to gain an in-depth understanding of crash causal factors was the Indiana Tri-Level Study. From this study, [2] observed that human errors and deficiencies were a definite or probable cause in over 90% of the crashes examined. The leading direct human causes identified in the study included improper lookout (probable cause in 23% of accidents), excessive speed (17%), inattention (15%), improper evasive action (13%), and internal distraction (9%). In a similar study, [3] investigated specific driver behaviors and unsafe driving acts that lead to crashes. The study further assessed the situational, driver, and vehicle characteristics associated with these behaviors. They found human error to be the most frequently cited contributing factor, present in 99.2% of crashes, followed by environmental (5.4%) and vehicle factors (0.5%). Thus, most crashes and their associated injuries and fatalities can be linked to some form of unsafe driving habits [3]. It is therefore important to examine the characteristics of causal drivers and to assess the driving behaviors that increase the likelihood of crash occurrence.

This paper investigates the effects of human-centered crash causal factors on crash outcomes. This is achieved by developing latent class logit (LC) and random parameters logit (RPL) models to identify how the human-related factors influence injury severity of crashes.

2. Human-Centered Traffic Safety

Driver-related behavioral factors and human errors dominate the causation of traffic crashes [2, 3, 6]. Driving behaviors and styles are influenced by external and driver-specific factors. Individual and societal characteristics which influence driving behavior in a way which can affect the chances of crash occurrence collectively constitute human factors in traffic safety. Driver characteristics (e.g., gender, race, and age), attitudes, beliefs, and personality traits (e.g., tolerance, caution, inattentiveness, perception of risk, and sensation seeking) are some human factors that influence driving habits [1, 7, 8]. Societal norms and cultural practices, such as adherence to traffic rules and regulations, on the other hand also play important roles in shaping driver attitudes and beliefs. These have impacts on driving styles and can affect traffic safety [9–14]. The National Highway Traffic Safety Administration (NHTSA) observed that cultural differences and sensitivities correlate with motor vehicle fatality and injury rates. In the US, for instance, racial and ethnic groups are disproportionately killed in traffic crashes compared with the much larger non-Hispanic White population [9]. The American Automobile Association explicitly studies traffic safety culture in the US [12]. Reference [15] documented differences in traffic safety culture in Iowa, [16] documented differences in traffic safety behavior across geographic regions in Alabama, and others [e.g., [13, 17]] have even compared traffic safety cultures across international boundaries. This means that, with other things being equal, some human-centered characteristics and behaviors put some groups of the driving population at greater risk of getting into traffic crashes.

In an attempt to explore the causal link between human factors and the likelihood of crashes, [18] distinguished behavior-related factors into two major categories: those that reduce the capability of a driver to perform driving tasks (e.g., inexperience, accident proneness, and alcohol and drug use) and those factors that influence risk taking while driving (e.g., habitual disregard of traffic laws and regulations). Differences in the behavioral factors exist among different demographic groups. For instance, [19] observed that alcohol was less likely to be a factor in traffic crashes involving older drivers, while the primary problems with young drivers are risk taking and lack of skill. Crashes among young drivers are more likely to involve a single vehicle, one or more driving errors, and speed as a factor or involve alcohol abuse. Reference [20] has also observed that young males are more prone to excessive speeding influenced by peer pressure. Female drivers on the other hand are more prone to driving errors [21]. Reference [22] studied the impact of distracted driving on safety and traffic flow. Their study has shown that drivers are likely to drive in a manner that negatively affects traffic safety and traffic flow if they are distracted, regardless of driver age. Other studies have shown that inexperienced drivers are more susceptible to errors and are slower to recover after being distracted [21, 23]. Reference [24] conducted a study to examine the effects of personality factors assessed during adolescence on persistent risky driving behavior and traffic crash involvement among young adults. They found that, for males, aggression, traditionalism, and alienation were the personality traits most frequently associated with risky driving behavior and crash risk. Willfully flouting driving laws and regulations may be indicative of risk taking behavior. 
Reference [25] identified that unlicensed drivers were at significantly higher risk of car crash injury than those holding a valid license. Beyond the individual characteristics, certain driving styles and behaviors also affect the severity of the crash. For instance, seatbelt nonuse has been associated with increased risk of injury and death in a crash. Estimates in [26] reveal that more than half of teen drivers (13–19 years) and adults aged 20–44 years who died in crashes in 2014 were unrestrained at the time of the crash. Faster driving speeds are also known to increase both the likelihood of crash occurrence and the severity of the crash consequences. Speeding-related fatalities constituted approximately a third of total traffic fatalities across the United States between 2005 and 2014 [27]. Impairment by alcohol and other drugs, driver distraction, and inattention have been cited frequently as contributing factors in crashes, and these can also affect the severity of the crash outcome [e.g., [2, 3, 28, 29]]. Statistics show that alcohol-impaired-driving fatalities accounted for a third of all crash fatalities in the United States in 2014 [29]. Driver inattention has also been extensively linked to crash occurrence. Nearly 10 percent of fatal crashes, 18 percent of injury crashes, and 16 percent of all police-reported motor vehicle traffic crashes in 2014 were reported to be distracted-driving related [30].

Considering that human factors are responsible for the highest proportion of traffic crashes, it would seem that human-centered countermeasures would be worth pursuing. Indeed, [31] reported that crash countermeasures achieve best results when they influence driver behavior. Human-centered countermeasures may take the form of improved driver training and testing, education campaigns aimed at changing driving practices, legislation to control driver behavior, and improvements to the design of road systems and automobiles [1]. Promoting a culture of safe road user behavior is required to achieve sustained reductions in road traffic injuries.

3. Crash Injury Severity Models

The primary emphasis of crash injury severity studies is to identify factors that influence the severity of crash outcomes. Safety researchers have relied on myriad statistical modeling techniques, applied to postcrash records and other noncrash specific data, to gain data-driven knowledge and understanding into crash causal circumstances. Reference [32] has shown that interest in identifying factors that affect crash injury severity has increased considerably in the last few years, perhaps, due to the availability of data and proliferation of advanced statistical packages. Depending on data characteristics and scope of studies, researchers have the option of choosing from a wide range of statistical tools for crash severity studies.

Discrete-choice (logit and probit) models have been used extensively over the years to analyze crash injury severity due to the classification of the severities into discrete outcomes. These methodologies have been applied to study safety of different roadway facilities and have included variables that describe the crash circumstances, environmental conditions, roadway, vehicle, and driver characteristics. For instance, [33] used a nested logit formulation to predict crash severity on a section of rural interstate in Washington State. This study investigated the effect of environmental conditions, highway design, crash type, driver characteristics, and vehicle attributes on crash severity. References [34, 35] also applied nested logit techniques to analyze crash severity at unsignalized intersections and at roundabouts, respectively. Other logit modeling techniques that have been used in injury severity studies include binary logistic models [36–40], ordered logit models [41–44], multinomial logit [45, 46], mixed logit [5, 47], and heterogeneous models [44]. Logit models are however not able to handle random variations and are not applicable to panel data with temporally correlated errors. They also do not allow any pattern of substitution [48]. Probit models address these limitations. The ordered probit model is the most widely used probit model in crash severity analysis [e.g., [34, 49–51]]. Reference [52] used ordered probit modeling techniques to isolate factors that contribute to injuries in older drivers involved in crashes. Reference [53] analyzed crashes at signalized intersections to determine the expected injury severity level using an ordered probit model. Data mining techniques have also been used to analyze traffic crash injury severity. For instance, [54, 55] used classification and regression trees and [56] used Chi-squared automatic interaction detection to analyze crash severities. Other advanced methodologies used in the literature include Bayesian networks [e.g., [57, 58]], neural networks [e.g., [59, 60]], and linear genetic programming [e.g., [61]]. The latent class approach has recently been used for analyzing driver injury severities [62–64].

The fundamental characteristics of crash data and purpose of study dictate the choice of tool or methodology [65]. Many other methods have been used for crash injury severity studies. This discussion is by no means exhaustive on the subject. Reference [65], for instance, presents a comprehensive review of crash injury severity models and methodological approaches. Similarly, [32] undertook a meta-analysis and presented documentation on the characteristics and limitations of different modeling methods for safety researchers.

4. Data Description

This study is based on 2011–2015 injury-related crash data for the State of Alabama, obtained from the Critical Analysis Reporting Environment (CARE) system developed by the Center for Advanced Public Safety at the University of Alabama for the Alabama Department of Transportation (ALDOT). Each crash record contained all details related to a crash recorded by the police at the time of the crash, including details of the drivers (e.g., gender, age, and race) and vehicles (e.g., make, model, and age) involved, description of the roadway environment (facility type, presence of curvature or grade, traffic control, etc.), and environmental conditions (weather, lighting, rural versus urban, etc.). The data were filtered to select crashes that were reported to have human-centered factors as their primary contributing circumstance. These human-centered factors consist of driving styles, decisions, and activities undertaken by the driver, which led to the crash. For each crash event, information on the driver’s license status and seatbelt use was obtained. Demographic information of the causal driver was also obtained. Observations with missing values were omitted from the dataset, resulting in a total of 87,326 observations. Table 1 shows the summary statistics of the variables available for model building and analysis.

Two categories of severity were adopted, as is often done in crash injury severity studies [e.g., [16, 36–40]]. Serious injury crashes (defined as fatal or incapacitating injury, where an incapacitating injury implies that the victim is unable to leave the scene of the crash without physical assistance) comprised 30% of the data, and minor injury crashes (defined as nonincapacitating injury or possible injury) made up 70% of the crash observations. Crashes involving some form of driver error (defined to include aggressive driving, failure to yield, following too close, and running a traffic control device) made up approximately half of injury crashes. About 44% of injury crashes were reported to involve women. A third of the drivers involved in injury crashes were unemployed and about 42% of the drivers were less than 30 years old. Some 9% of the drivers were under the influence of drugs, alcohol, or medication, while 11% of the crashes involved speeding.

5. Methodology

Unobserved heterogeneity is a critical issue in traffic safety research. Ignoring the moderating effect of unobserved variables can lead to biased estimates and incorrect inferences if inappropriate methods are used [66, 67]. Limiting the impact of a variable to its statistical significance in a model can mean eliminating some otherwise risky factors. Reference [68] observed that an insignificant variable in one model may be due to lack of observations. On the other hand, significance of a variable in an injury severity model is not an automatic indication that it is an important etiologic factor.

The ordinal nature of reporting crash injury severities makes ordered probit and logit models appropriate [51, 69]. However, these model forms can restrict the way variables influence outcome probabilities, possibly leading to incorrect inferences [37, 70]. Compared to the traditional ordered probability models, multinomial logit (MNL) models have a flexible structure which allows each severity outcome to have a different function for capturing the probabilities of injury severities [66, 71, 72]. Notwithstanding this, the MNL model is deficient in its application as it is susceptible to correlation of unobserved effects from one crash severity level to the next. Such correlation leads to a violation of the model’s independence of irrelevant alternatives (IIA) property [70]. Also, the assumption that random terms in the crash severity functions in MNL models are independent and identically distributed (IID) is often violated in practice because crash severity functions do not contain a complete list of all contributing factors. Even though nested logit models can capture some unobserved effects shared by some injury severity outcomes, they cannot address unobserved heterogeneity in the data. Random parameters (mixed logit) models and latent class (finite mixture) logit models have the ability to capture the unobserved heterogeneity by allowing parameters to differ across observations [47, 67, 73]. For this study, injury severity analysis was performed to investigate the effects of some human-related explanatory factors on the likelihood of the occurrence of serious or minor injury severities.
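As a minimal sketch of the MNL structure described above, the following Python snippet computes outcome probabilities from one linear severity function per outcome. The coefficient values, the single no-seatbelt indicator, and the function names are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

def severity_function(beta, x):
    """Linear severity function: dot product of coefficients and covariates."""
    return sum(b * v for b, v in zip(beta, x))

def mnl_probabilities(utilities):
    """Multinomial logit probabilities from a list of severity-function values."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients: [constant, no-seatbelt indicator].
beta_serious = [-1.0, 0.8]
beta_minor = [0.0, 0.0]   # minor injury taken as the base outcome
x = [1.0, 1.0]            # constant term and an unbelted driver

probs = mnl_probabilities([
    severity_function(beta_serious, x),
    severity_function(beta_minor, x),
])
print(probs)  # [P(serious), P(minor)]
```

Because every observation shares the same coefficient vector, this form cannot represent a variable whose effect differs across crashes, which is the limitation the RPL and LC models are meant to relax.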

5.1. Injury Severity Analysis

A traditional MNL injury severity model was first developed to identify how the human-centered variables influence crash outcomes. RPL and LC logit models were then estimated to address the heterogeneity challenges inherent in the MNL model. Estimation results for the RPL and the LC logit models are then compared to select the best fitting alternative model to the MNL model.

5.1.1. Random Parameters Logit Model

The RPL model allows for heterogeneity within observed crash data by varying the elements of the vector of estimable parameters, $\beta$. The outcome-specific constants and elements of $\beta$ may either be fixed or randomly distributed over all parameters with fixed means. The random parameters logit model formulation is obtained from the standard MNL by introducing random parameters with density $f(\beta \mid \varphi)$, where $\varphi$ is a vector of parameters of the chosen density function (mean and variance) [48, 70, 74], as

$$P_n(i \mid \varphi) = \int \frac{\exp(\beta_i X_{in})}{\sum_{I} \exp(\beta_I X_{In})} \, f(\beta \mid \varphi) \, d\beta,$$

and $P_n(i \mid \varphi)$ is the probability of injury severity $i$ conditional on $f(\beta \mid \varphi)$.

For model estimation, $\beta$ can now account for unobserved heterogeneity of the impact of $X$ on injury severity outcome probabilities, with the density function $f(\beta \mid \varphi)$ used to determine $\beta$. Random parameters logit probabilities are a weighted average over different values of $\beta$ across observations, where some elements of the parameter vector $\beta$ are fixed and some may be randomly distributed. A continuous distribution relating how parameters vary across crash observations is assumed by the researcher. For this study, the normal distribution is assumed for model estimation [5].
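The weighted-average interpretation can be illustrated with a small simulation. The sketch below assumes a two-outcome model with one normally distributed coefficient and approximates the RPL probability by averaging logit probabilities over random draws. The parameter values, the number of draws, and the function names are hypothetical; the source does not describe its simulation settings.

```python
import math
import random

def logit_prob_serious(beta_const, beta_x, x):
    """Binary logit probability of the serious outcome (minor is the base)."""
    v = beta_const + beta_x * x
    return 1.0 / (1.0 + math.exp(-v))

def simulated_rpl_prob(beta_const, mean, std, x, draws=5000, seed=0):
    """Average the logit probability over normal draws of one random parameter."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        beta_x = rng.gauss(mean, std)   # one draw from the assumed normal density
        total += logit_prob_serious(beta_const, beta_x, x)
    return total / draws

def share_positive(mean, std):
    """Share of the normal distribution above zero, i.e., P(beta > 0)."""
    return 0.5 * (1.0 + math.erf(mean / (std * math.sqrt(2.0))))

# Hypothetical values: constant -1.0, random coefficient ~ N(0.5, 1.0).
p_random = simulated_rpl_prob(-1.0, 0.5, 1.0, x=1.0)
p_fixed = simulated_rpl_prob(-1.0, 0.5, 0.0, x=1.0)   # zero spread: fixed parameter
share = share_positive(0.5, 1.0)
print(p_random, p_fixed, share)
```

With a zero standard deviation the simulated probability collapses to the ordinary fixed-parameter logit value. The `share_positive` helper shows how an estimated mean and standard deviation translate into the proportion of crashes for which a variable raises, rather than lowers, the likelihood of a serious outcome.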

5.1.2. Latent Class Logit Model

The LC logit model offers an alternative perspective to the random parameters logit model in terms of accommodating heterogeneity [67, 73, 75]. This model replaces the continuous distribution assumption of the random parameters model with a discrete distribution in which unobserved heterogeneity is captured by membership of distinct classes [75, 76]. A latent class logit model allows the driver injury severity to have $C$ different classes so that each of the classes will have its own parameters, with the class probability given by [77]

$$P_n(c) = \frac{\exp(\alpha_c Z_n)}{\sum_{c'=1}^{C} \exp(\alpha_{c'} Z_n)},$$

where $Z_n$ represents a vector that shows the probabilities of $c$ for crash $n$, $C$ is the number of possible classes $c$, and $\alpha_c$ represents the estimable (class-specific) parameters. The probability of a driver having injury severity $i$ is given by

$$P_n(i) = \sum_{c=1}^{C} P_n(c) \, P_n(i \mid c),$$

where $P_n(i \mid c)$ is the probability of a driver having injury severity level $i$ for crash $n$ in class $c$. Based on the two equations above, the latent class logit model for class $c$ will be

$$P_n(i \mid c) = \frac{\exp(\beta_{ic} X_{in})}{\sum_{I} \exp(\beta_{Ic} X_{In})},$$

where $I$ represents the possible number of injury severity levels and $\beta_c$ is a class-specific parameter vector that takes a finite set of values.
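A minimal sketch of how these three probabilities combine is given below, assuming two latent classes, two severity outcomes, and class membership driven by constants only. All coefficient values and names are hypothetical and are not the estimates reported in this paper.

```python
import math

def softmax(values):
    """Logit probabilities from a list of linear-function values."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))

def latent_class_probs(alphas, betas_by_class, z, x):
    """Return class probabilities and the class-weighted severity probabilities."""
    class_probs = softmax([dot(alpha, z) for alpha in alphas])
    n_outcomes = len(betas_by_class[0])
    mixed = [0.0] * n_outcomes
    for p_c, betas in zip(class_probs, betas_by_class):
        # Within-class multinomial logit probabilities for each severity outcome.
        within = softmax([dot(beta, x) for beta in betas])
        for i, p_i in enumerate(within):
            mixed[i] += p_c * p_i
    return class_probs, mixed

# Hypothetical parameters: class 2 and the minor outcome are the bases.
alphas = [[0.9], [0.0]]
z = [1.0]
betas_by_class = [
    [[-0.5, 0.6], [0.0, 0.0]],   # class 1: [serious, minor]
    [[-2.0, 1.5], [0.0, 0.0]],   # class 2: [serious, minor]
]
x = [1.0, 1.0]                   # constant term and one indicator variable

class_probs, severity_probs = latent_class_probs(alphas, betas_by_class, z, x)
print(class_probs, severity_probs)
```

The same indicator can carry a different coefficient in each class, which is how the model lets a factor push toward serious injury in one class and toward minor injury in another.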

The latent class logit model can be estimated with maximum likelihood procedures [75]. The latent class method however does not account for the possibility of variation within a class since it assumes homogeneous characteristics of the within-class observations [76]. References [78, 79] present the random parameter latent class model as an extension of the latent class logit model to capture interactions with observed contextual effects within the latent classes.

Marginal effects are typically computed to reveal the relative impact of explanatory variables on the dependent variable. The marginal effect in a latent class logit model is computed for each class as the difference in the estimated probabilities with the indicator changing from zero to one, while keeping all the other variables at their means. Reference [80] has shown that the direct and cross-marginal effects can be computed, respectively, as follows:

$$\frac{\partial P_n(i)}{\partial x_{nik}} = \left[1 - P_n(i)\right] P_n(i) \, \beta_{ik}, \qquad \frac{\partial P_n(j)}{\partial x_{nik}} = -P_n(i) \, P_n(j) \, \beta_{ik}.$$

The direct marginal effect shows the effect of a unit change in $x_{nik}$ on the probability, $P_n(i)$, for crash $n$ to result in severity $i$. The cross-marginal effect shows the impact of a unit change in variable $k$ of alternative $i$ ($i \neq j$) on the probability $P_n(j)$ for crash $n$ to result in outcome $j$. According to [80, 81], the final marginal effect of an explanatory variable is the sum of the marginal effects for each class weighted by their posterior latent class probabilities.

It should be noted that there are no definitive rules for selecting the number of latent classes to be modeled [81]. It is documented, however, that too many classes can negatively affect model convergence and complicate model interpretation [82]. It has been suggested to add one class at a time until further addition does not enhance intuitive interpretation and data fit [75, 82]. To select the model that best fits the data, likelihood ratio tests may be performed to compare models with different numbers of classes [75], or the selection may be based on the Bayesian Information Criterion (BIC) computed for the competing models [83–86]. Recent studies of crash injury severities have used the BIC measure to determine the number of classes [63, 64, 66]. The BIC for a given empirical model is equal to

$$\mathrm{BIC} = -2\,LL + K \ln(N),$$

where $LL$ is the log-likelihood at convergence, $K$ is the number of parameters, and $N$ is the number of observations. Lower BIC values indicate a better model fit.
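The quantities discussed in this subsection can be sketched in a few lines of Python. The snippet assumes the standard logit derivative forms for the direct and cross-marginal effects, a class-weighted sum for the final effect, and the usual BIC definition of minus twice the log-likelihood plus the number of parameters times the log of the sample size. The function names and all numeric inputs are hypothetical.

```python
import math

def direct_marginal_effect(p_i, beta_ik):
    """Effect of a unit change in variable k on the probability of outcome i."""
    return (1.0 - p_i) * p_i * beta_ik

def cross_marginal_effect(p_i, p_j, beta_ik):
    """Effect of a unit change in variable k of outcome i on the probability of outcome j."""
    return -p_i * p_j * beta_ik

def class_weighted_effect(effects, posterior_probs):
    """Final marginal effect: class effects weighted by posterior class probabilities."""
    return sum(e * w for e, w in zip(effects, posterior_probs))

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate a better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical inputs for a two-outcome model.
direct = direct_marginal_effect(0.3, 0.8)
cross = cross_marginal_effect(0.3, 0.7, 0.8)
final = class_weighted_effect([0.02, -0.01], [0.72, 0.28])
print(direct, cross, final)

# Hypothetical comparison: same log-likelihood, more parameters gives a higher BIC.
print(bic(-100.0, 5, 1000), bic(-100.0, 8, 1000))
```

With only two outcomes the direct and cross effects cancel, since any probability gained by one outcome is lost by the other. The BIC comparison shows the penalty at work: extra classes must improve the log-likelihood enough to offset the added parameters.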

6. Estimation Results

Examination of the classes of human-centered factors among injury crashes revealed interesting information on what behaviors contribute to injury crashes and, to some extent, what types of drivers exhibit them. In order to develop a more nuanced understanding of how human-centered factors affected crash severity, a series of analyses were conducted to examine the extent to which the various parameters are useful in estimating crash injury severity. A total of 12 variables were used for model building. Table 2 shows the estimation results for the RPL and the LC logit models. Since the RPL and LC logit models are improved extensions of the standard MNL model, results for the MNL model have also been shown to confirm this.

The MNL model reveals that crashes involving fatigue, drivers with an invalid license, no seatbelt use, and old and unemployed drivers were more likely to result in serious injury, while driver error, DUI, speed, and distracted driving-related crashes were more likely to lead to minor injuries. The MNL model also shows that female drivers, young drivers, and African American drivers were more likely to be involved in minor injury crashes. The effects of the parameters in the MNL model are fixed across severity levels. This implies that variables are assumed to influence either minor injuries or serious injuries, not both. The RPL model, however, reveals that the driver error, speeding, distracted driving, no seatbelt use, and young driver indicators had random parameters. These random variables significantly contributed to both serious and minor injury crashes. This means that some proportion of crashes involving a random variable, for instance, driver error, resulted in serious injuries and some proportion resulted in minor injuries.

Two distinct classes with homogeneous attributes were identified to be significant for the LC logit model: latent class 1 (LC 1) with probability of 0.72 and latent class 2 (LC 2) with probability of 0.28. The two-class model was selected over an estimated three-class model based on BIC: the two- and three-class models had BIC values of 98032 and 98154, respectively. An inspection of the constant term defined for the serious injury function indicates that a crash in LC 1 is more likely to result in serious injury than a crash in LC 2. One interesting observation was that old drivers had a high chance of being involved in serious injury crashes regardless of the latent class. Driver error, DUI, speed, and distracted driving-related crashes were less likely to lead to minor injuries in LC 2 but more likely to result in minor injuries in LC 1. Similarly, crashes involving female, African American, and young drivers were likely to result in serious injury in LC 2 and minor injury in LC 1. Unemployed drivers were more likely to be involved in serious injury crashes in LC 1 but less likely to be involved in the same in LC 2.

The marginal effects (Table 3) show that older drivers and crashes involving no seatbelt use, respectively, had 0.73% and 1.89% higher likelihood of resulting in serious injury. Injury crashes involving unemployed drivers, drivers with an invalid license, and fatigued driving, respectively, had 4.19%, 0.32%, and 0.05% higher chance of leading to a serious injury outcome. This result also suggests that drivers with no employment are perhaps more likely to drive with an invalid license. Another interesting result from the marginal effects is that, though a high proportion of the injury crashes were attributed to driver error, DUI, and speeding, their outcomes were more likely to be minor injury.

A comparison of the fit statistics (e.g., McFadden pseudo-$R^2$ = 0.069, 0.183, and 0.193 for the MNL, RPL, and LC logit models, respectively) suggests stronger support for the LC logit model over the MNL and RPL models. Similar conclusions have been reported by other researchers [e.g., see [75, 81, 87, 88]]. An attempt was made to develop an LC random parameters logit model for this study. However, none of the random parameters had statistically significant standard deviations. There was also no significant improvement in model fit statistics when compared with the LC logit model.
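For readers unfamiliar with the fit statistic, the sketch below assumes the common definition of the McFadden pseudo-R-squared as one minus the ratio of the model log-likelihood at convergence to the log-likelihood of a baseline model. The source does not state which baseline was used, and the log-likelihood values here are hypothetical.

```python
def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden pseudo-R-squared: 1 - LL(model) / LL(baseline)."""
    return 1.0 - ll_model / ll_null

# Hypothetical log-likelihoods; a value closer to 1 indicates a better fit.
r2_small = mcfadden_pseudo_r2(-95.0, -100.0)
r2_large = mcfadden_pseudo_r2(-80.0, -100.0)
print(r2_small, r2_large)
```

Under this definition, a model that raises the log-likelihood further above the baseline receives a larger value, which is the sense in which 0.193 for the LC logit model indicates a better fit than 0.183 for the RPL model.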

7. Conclusion

In this paper, latent class logit and random parameters models were developed as alternatives to the traditional multinomial logit model for human-centered crash injury severity analysis to account for unobserved heterogeneity. The study was based on 2011–2015 injury-related crash data, for the State of Alabama, and considered only crashes that had human-centered primary causal factors. Two crash injury outcomes were examined: serious injury (fatal and incapacitating injury) and minor injury (non-incapacitating and possible injuries). Twelve variables were used to build the models.

Comparison of fit statistics shows that the two-class latent class logit model outperformed the random parameters model, as an alternative to the traditional MNL model. This result is generally in line with past studies in this area. An attempt was made to identify random parameters for the LC logit model. However, none of the random parameters had statistically significant standard deviations. There was also no significant improvement in model fit statistics when compared with the LC logit model.

Both the RPL and LC models showed that six specific driving behaviors significantly contributed to the occurrence of serious crashes: driver error, speeding, DUI, distracted driving, fatigued driving, and not wearing a seatbelt. These conclusions suggest that targeted outreach and education campaigns designed to address these specific behaviors (or combinations thereof) could reduce serious crashes [e.g., [1, 7, 18, 23, 24]]. The analyses also showed that focusing education efforts on specific driver types (i.e., demographic groups) may be effective in reducing serious crashes in Alabama. Finally, some of the behaviors may be positively impacted by increased or enhanced enforcement [e.g., [1, 14, 89]].

Human-centered (i.e., driving behavioral related) inferences from the current study are limited to the driving population of the State of Alabama. Nonetheless, there are general observations and conclusions documented herein that expand the understanding of the relationship between drivers and the severity outcomes of crashes.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.


Acknowledgments

The authors would like to acknowledge the Southeastern Transportation Center and the Alabama Transportation Institute for support of this research.