Spatial-Temporal Analysis of Injury Severity with Geographically Weighted Panel Logistic Regression Model
This study investigates the factors influencing injury severity while accounting for heterogeneity from unobserved factors on different arterials and for the spatial attributes captured by geographically weighted regression models. To this end, a geographically weighted panel logistic regression model was developed, in which the geographically weighted logistic regression component addresses injury severity from the spatial perspective, while the panel data component accommodates the heterogeneity attributed to unobserved factors from the temporal perspective. Geo-coded crash data for the Las Vegas metropolitan area from 2014 to 2016 were collected, covering 27 arterials and 25,029 injury samples. Compared with the conventional logistic regression model and the geographically weighted logistic regression model, the geographically weighted panel logistic regression model provided the best fit. Results revealed that four main groups of factors, namely, human beings (drivers/pedestrians/cyclists), vehicles, roadway, and environment, were potentially significant in increasing injury severity. The findings provide useful insights for practitioners and policy makers to improve safety along arterials.
During the last decade, a variety of approaches and perspectives [1–3] have been presented in safety evaluation, and regression analysis has been widely applied to investigate the relationship between crash injury severity and influencing factors. The widely utilized regression approaches, e.g., linear regression, logistic regression, and probit regression, assume that all sampled data share a uniform relationship with the influencing factors, which may not always be true, especially in multivariate systems. In urban areas, crashes may occur at different geographical locations and at different times, and the relationships between crash injury severity and influencing factors may not follow constant covariate effects; thus the spatial and temporal issues need to be addressed and accommodated by an appropriate method.
The earliest temporally varying coefficient methods were developed through the analysis of longitudinal data, which is widely utilized in injury severity studies. Beginning with a binary logit model by Young and Liesman, longitudinal data were integrated with different logit models to accommodate the unobserved heterogeneity issue. The most commonly used method to investigate injury severity under various conditions is the mixed logit model [6–17], which can analyze the relationship between injury severity and the influencing factors while addressing the heterogeneity issue appropriately. Anastasopoulos and Mannering provided a comparison of fixed and random parameter logit models using two types of injury severity data. The results showed that the random parameter logit model was superior to the fixed-parameter one, and the models based on individual crash data provided better overall fit than the models based on the proportion of crashes by severity type. Xie et al. then extended the injury severity analysis to a Bayesian binary logit model with random effects, but some of the findings were counterintuitive, and comparison with the mixed logit model or random parameter model is recommended [20–23] so as to investigate real-time data more effectively.
Temporal variation is one critical aspect to address in injury severity analysis, and some studies have integrated temporal nonstationarity into injury severity models to account for it. Initially, Behnood and Mannering explored the temporal stability of factors affecting driver-injury severities in single-vehicle crashes using a mixed logit model to capture potential unobserved heterogeneity. The results showed that the possible presence of temporal instability in injury severity models can have significant consequences in highway-safety practice. Depicting the spatial feature by the location coordinates, Wang et al. introduced the geographically weighted regression (GWR) model to investigate noncrossing rail-trespassing crash injury severity. Correlates of injury severity were found to be nonstationary across space, which motivates the problem addressed in this study. The study by Zeng et al. developed three temporal multivariate random parameters Tobit models to analyze crash rate by injury severity, which accommodated temporal correlation and unobserved heterogeneity across observations and correlations across injury severity. The findings supported the model with independent temporal effects as a good alternative for traffic safety analysis.
The spatial feature is another significant aspect to address in injury severity analysis. Castro et al. proposed a spatial generalized ordered response model to examine highway crash injury severity, which addressed the unobserved heterogeneity in the effects of contributing factors as well as spatial dependence in the injury severity levels. The results underscored the value of the proposed model for data fit as well as for accurately estimating variable effects. A similar study by Narayanamoorthy et al. accommodated spatial dependence in bicycle and pedestrian injury counts by severity level. After that, Xu et al. investigated the risk of pedestrian injuries involved in traffic crashes at signalized intersections in Hong Kong with a Bayesian spatial logit model. The proposed Bayesian hierarchical logit model with uncorrelated and spatially correlated random effects increased the model goodness-of-fit. From both macro- and micro-perspectives, Huang et al. developed a Bayesian spatial model with conditional autoregressive prior and a Bayesian spatial joint model, respectively. The results showed that the micro-level model has better overall fit and predictive performance, while the macro-level crash analysis provides better insights for non-traffic engineering issues. Subsequently, Bhat et al. proposed a spatial random coefficients flexible multivariate count model to examine pedestrian injuries by injury severity level. The analysis accommodated spatial dependency through a spatial autoregressive lag structure and captured spatial drift effects through the spatial structure on the constants and the slope heterogeneity effects, which devotes considerable effort to the spatial feature analysis. Zeng et al. presented a Bayesian spatial random parameters Tobit model, in which both spatial correlation between adjacent sites and unobserved heterogeneity across observations were addressed. Prato et al. estimated the probability of pedestrian injury severity by considering built environment and spatial correlation with a linearized spatial logit model. The findings showed positive spatial correlations of crashes with the same severity outcomes and emphasized the role of built environment in the proximity of the crash. The latest study by Zeng et al. developed a Bayesian spatial generalized ordered logit model with conditional autoregressive priors to examine the severity of freeway crashes. The ordered nature of discrete crash severity levels and the spatial correlation among adjacent crashes were accounted for.
Recently, interest has grown in modeling both spatial and temporal data in injury severity. Ouni and Belloumi described the spatial pattern of vulnerable road users’ collisions according to different temporal scales and investigated the related risk factors for injury severity in Tunisia, but the method still employed a standard multinomial logit model to consider both the geographical and temporal variation. In order to address geographically nonstationary coefficients, Atkinson et al. proposed the GWR model, whose basic assumption is Tobler’s first law of geography: “everything is related to everything else, but near things are more related than distant things”, and this is consistent with crashes occurring along roadways or around signalized intersections. So far, the GWR model, one of several spatial regression techniques, has been increasingly used in various areas [37–39], especially in geography [40–43]; however, the application of GWR in the safety area [44, 45] is limited.
To this end, the purpose of this study is to propose the geographically weighted panel logistic regression (GWPLR) model, as an extension of GWR model, for the analysis of binary outcome data with spatial and temporal information. It can incorporate both spatial and temporal information into the logistic regression model, which accommodates the potential temporal correlation among the observations within specific locations as well as the unobserved heterogeneity. Using this model, we can figure out the influencing factors of injury severity in closely related locations at any time within the observation period.
2.1. Geographically Weighted Regression Model
The geographically weighted regression (GWR) model introduces a spatial feature to the ordinary regression model by including X and Y coordinates, so that each regression parameter depends on the geographical location of the data. According to Fotheringham et al., the basic expression for the GWR model can be described as

$$y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \tag{1}$$

where $y_i$ denotes the predicted response variable, $\beta_0(u_i, v_i)$ represents the intercept parameter, in which $(u_i, v_i)$ is the coordinate point (latitude, longitude) for the location of $i$ shown in ArcGIS, $\beta_k(u_i, v_i)$ is a set of values of parameters at the $i$-location and $i = 1, 2, \ldots, n$, $x_{ik}$ refers to the vector of independent variables in the $i$-observation, and $\varepsilon_i$ refers to the error term.
Since the assumptions of the classical linear regression remain in place for the GWR model, the estimation of the GWR model can be conducted with the weighted maximum likelihood method, and the geographical factor is considered as the weighting factor, which has a different value for each location in ArcGIS. The matrix form for estimating the GWR parameters can be given as

$$\hat{\beta}(u_i, v_i) = \left(X^{T} W(u_i, v_i)\, X\right)^{-1} X^{T} W(u_i, v_i)\, Y \tag{2}$$

where $W(u_i, v_i)$ is a diagonal matrix that is different for each point $i$ of coordinates $(u_i, v_i)$, with the weights on its main diagonal obtained through the weighting function, or kernel.
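As a minimal sketch of the matrix estimator above, the local coefficients at one regression point can be computed by weighted least squares. The function name `gwr_local_fit` and the toy data are assumptions made for illustration; the weights would normally come from a kernel evaluated at the point of interest.

```python
import numpy as np

def gwr_local_fit(X, y, weights):
    """Weighted least squares at one regression point: (X'WX)^{-1} X'Wy."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    XtW = X.T * w  # equals X.T @ diag(w) without building the n x n matrix
    return np.linalg.solve(XtW @ X, XtW @ y)

# Toy data (hypothetical): intercept column plus one covariate, y = 1 + 2x exactly.
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = 1.0 + 2.0 * np.arange(5.0)
beta = gwr_local_fit(X, y, weights=[1.0, 0.8, 0.5, 0.2, 0.1])
print(beta)
```

Because the toy response is exactly linear, any positive weights recover the same intercept and slope; with real data the estimates change from point to point as the weights change.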
In essence, there are two main kernel functions, Gaussian and bisquare, and two kernel types, fixed and adaptive, i.e., four options for geographical kernel weighting. For the fixed kernel, the distance is constant but the number of nearest neighbors varies, so using the Gaussian function is safe, while for the adaptive kernel, the distance varies but the number of neighbors remains constant, so the bisquare kernel is preferable. However, different combinations of kernel functions and kernel types can be utilized; e.g., when the regression points are evenly distributed, a bisquare function is better even for a fixed kernel option, and in the case of binary logistic regression with an unbalanced outcome distribution, the adaptive Gaussian kernel can be employed as a better option, which can be described as

$$w_{ij} = \exp\left[-\frac{1}{2}\left(\frac{d_{ij}}{h}\right)^{2}\right] \tag{3}$$

where $d_{ij}$ represents the distance between location $i$ and location $j$, which is equal to $\sqrt{(u_i - u_j)^2 + (v_i - v_j)^2}$, and $h$ here refers to the nonnegative parameter, called bandwidth (fixed or adaptive). For more estimation details about the GWR model, the reader can refer to Fotheringham et al., Huang et al., and Albuquerque et al.
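The two kernel functions can be compared with a short sketch. The Gaussian form below assumes the common scaling exp(-0.5 (d/h)^2); some software uses a different constant in the exponent, so the exact values are illustrative rather than a reproduction of any particular package.

```python
import math

def gaussian_weight(d, h):
    """Gaussian kernel: the weight decays smoothly and never reaches zero."""
    return math.exp(-0.5 * (d / h) ** 2)

def bisquare_weight(d, h):
    """Bisquare kernel: the weight is exactly zero at and beyond the bandwidth h."""
    if d >= h:
        return 0.0
    return (1.0 - (d / h) ** 2) ** 2

# Weights at a few distances for a bandwidth of 1 (arbitrary units).
for d in (0.0, 0.5, 1.0, 2.0):
    print(d, round(gaussian_weight(d, 1.0), 4), round(bisquare_weight(d, 1.0), 4))
```

The Gaussian kernel keeps every observation in the local sample with a small positive weight, whereas the bisquare kernel drops observations beyond the bandwidth entirely, which is why the choice interacts with how evenly the regression points are spread.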
2.2. Geographically Weighted Logistic Model
Correspondingly, the geographically weighted logistic regression (GWLR) model is a special case of the GWR model, focusing on the binary logistic regression model from a spatial aspect by including X and Y coordinates. The model estimates a probability between 0 and 1 for a binary outcome: injury and fatality (1) or property damage only (PDO) (0). The cut-off value is typically set as 0.5, indicating that if the injury severity probability is greater than 0.5, the outcome is assumed to be 1 (injury and fatality), and if it is below 0.5, it is assumed to be 0 (PDO). Particularly, the GWLR equation can be formulated as

$$p_i = \frac{\exp\left(\beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik}\right)}{1 + \exp\left(\beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik}\right)} \tag{4}$$

where $p_i$ is the probability of the injury severity, and $\beta_k(u_i, v_i)$ are the parameters (coefficients) of the $k$ variables in the model, which vary with the location of coordinates $(u_i, v_i)$, and the remaining variables are the same as above. Specifically, the surrounding crash points within the kernel distance are weighted against the crash data point being computed so that the crash points close to the subject point have more influence than the crash points farther away; thus the value of $\beta_k(u_i, v_i)$ is the result of the weighting function applied to the location of the crash data points.
The estimation of the GWLR model can follow that of the GWR model and be represented by the following expression with the natural logarithm transformation (ln).

$$\ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} \tag{5}$$

The matrix $W(u_i, v_i)$ features weights as shown in (3) and is used to geographically weight the observations in the estimation of each set of parameters $\beta(u_i, v_i)$, meaning that the matrix is responsible for assigning a larger weight to the observations geographically closest to point $i$, while assigning a smaller weight to the observations most distant from point $i$ in the estimation of its parameters $\beta(u_i, v_i)$. The matrix enters the likelihood function as follows.

$$\ln L\left(\beta(u_i, v_i)\right) = \sum_{j=1}^{n} w_{ij}\left[y_j \ln p_j + (1 - y_j)\ln(1 - p_j)\right] \tag{6}$$

Here $p_j$ is evaluated with the parameters of point $i$. After differentiating (6) with respect to $\beta(u_i, v_i)$ and equating to zero, the model parameters can be estimated using iterative numerical methods. The model can be implemented in the GWR4 software. More estimation details can be found in Fotheringham et al., Wu and Zhang, Martinez-Fernandez et al., Rodrigues et al., and Albuquerque et al.
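The iterative estimation can be illustrated with a small Newton-Raphson sketch that maximises a geographically weighted log-likelihood at a single regression point. This is not the GWR4 implementation; the function name, the toy data, and the Gaussian weights centred at x = 0 are assumptions chosen only to make the example runnable.

```python
import numpy as np

def gwlr_local_fit(X, y, weights, n_iter=25, tol=1e-8):
    """Newton-Raphson for a weighted logistic log-likelihood at one regression point."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))    # fitted probabilities
        score = X.T @ (w * (y - p))              # gradient of the weighted log-likelihood
        info = (X.T * (w * p * (1.0 - p))) @ X   # negative Hessian (weighted information)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data (hypothetical): one covariate, binary outcome that is not perfectly separable.
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])
w = np.exp(-0.5 * (np.abs(x) / 2.0) ** 2)  # Gaussian weights around x = 0, bandwidth 2
beta = gwlr_local_fit(X, y, w)
print(beta)
```

Repeating the fit with weights centred on each crash location in turn would give one coefficient vector per location, which is what makes the coefficients vary over space.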
2.3. Geographically Weighted Panel Logistic Regression Model
The two models aforementioned both belong to cross-sectional GWR case, which performs repeated estimation of a local regression at each point in space with a subsample of the data properly weighted according to the distance of each location to the target point. However, the time series of observations at a particular geographic location should be considered as a realization of a spatiotemporal process; thus the cross-sectional GWLR can be expanded into panel data (control for heterogeneity at the individual level), and the fixed or random effects GWLR models can be applied to obtain the coefficients of the explanatory variables at that specific location.
The specification of geographically weighted panel logistic regression (GWPLR) model can be expressed asor the vector formwhere represents explained variables vector, which can be expressed as the temporal levels , denotes spatial autoregressive coefficient, W represents spatial weight matrix, WY means the spatially lagged dependent variable, X is independent variable matrix, is the parameter matrix, t=1,…T periods, collects the effects of omitted variables and departures from the assumptions of the theoretical model, and are vectors of disturbance and individuals, and is the error term. Thus, (8) can interpret fixed effects and random effects model, and the difference between the proposed model and conventional panel data model lies in the nonstationarity of coefficients, which may vary from location to location.
Specifically, the GWPLR estimation can be conducted in the following steps: (1) a kernel function and the bandwidth h will be selected similar to the one in cross-sectional GWLR model; (2) subsample the spatial observations for i’s local estimation; (3) weigh all the time observations of j’s variables in levels for i’s location estimation; (4) estimate a panel data model using weighted subsampled data; (5) repeat steps 2 to 4 for all target locations i in the sample of data; (6) map the significant local coefficients for each explanatory variable; and (7) repeat the procedure (steps 1 to 6) for different h and compare the final results. More estimation steps can be found in the studies by Yu  and Bruna and Yu .
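Steps 1 to 5 above can be sketched in a few lines. This is a simplified illustration, not the authors' R implementation: the panel model is approximated by a pooled weighted logit with period dummy variables, and every name (`fit_weighted_logit`, `gwplr`, `min_weight`) and all data are hypothetical. A true fixed- or random-effects estimator would replace `fit_weighted_logit`.

```python
import numpy as np

def fit_weighted_logit(X, y, w, n_iter=50, ridge=1e-6):
    """Weighted logistic regression by Newton-Raphson; the tiny ridge keeps the Hessian invertible."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p)) - ridge * beta
        hess = (X.T * (w * p * (1.0 - p))) @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def gwplr(coords, site, period, x, y, h, min_weight=0.05):
    """Local coefficients per site: [intercept, slope of x, one dummy per non-baseline period]."""
    periods = np.unique(period)
    dummies = (period[:, None] == periods[None, 1:]).astype(float)
    X = np.column_stack([np.ones(len(y)), x, dummies])
    local = []
    for i in range(len(coords)):                        # step 5: every target location i
        d = np.linalg.norm(coords - coords[i], axis=1)  # distance from site i to each site j
        w_site = np.exp(-0.5 * (d / h) ** 2)            # step 1: Gaussian kernel, bandwidth h
        w_obs = w_site[site]                            # step 3: same weight for all periods of site j
        keep = w_obs >= min_weight                      # step 2: subsample nearby observations
        local.append(fit_weighted_logit(X[keep], y[keep], w_obs[keep]))  # step 4
    return np.array(local)

# Synthetic panel (hypothetical): 4 sites on a line, 3 periods, 50 crashes per site-period.
rng = np.random.default_rng(0)
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
site = np.repeat(np.arange(4), 150)
period = np.tile(np.repeat(np.arange(3), 50), 4)
x = rng.normal(size=site.size)
true_slope = 0.5 + 0.5 * coords[site, 0]  # the effect of x grows along the line
y = (rng.random(site.size) < 1.0 / (1.0 + np.exp(-true_slope * x))).astype(float)
betas = gwplr(coords, site, period, x, y, h=1.0)
print(betas[:, 1])  # local slope estimates, one per site
```

Steps 6 and 7 would then map the columns of `betas` and repeat the call for several values of `h` to see how the local estimates smooth out as the bandwidth grows.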
To examine the goodness-of-fit of the proposed models, there are two criteria: one is the cross-validation score (CV) (only applicable to Gaussian models) and the other is the Akaike Information Criterion (AIC). In order to compare the GWLR and GWPLR models, AIC is selected, and to correct for small-sample bias in the AIC definition, a corrected AIC, referred to as AICc, is used. The formulations are as follows:

$$AIC(b) = D(b) + 2K(b)$$

$$AICc(b) = D(b) + 2K(b) + \frac{2K(b)\left(K(b) + 1\right)}{n - K(b) - 1}$$

where $D$ and $K$ are the deviance and the effective number of parameters in the model with bandwidth $b$ and $n$ represents the number of observations.
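A short numerical sketch shows how the small-sample correction behaves. The deviance and parameter counts below are hypothetical values chosen for easy arithmetic, not results from this study.

```python
def aic(deviance, k):
    """AIC from the model deviance D and the effective number of parameters K."""
    return deviance + 2.0 * k

def aicc(deviance, k, n):
    """Corrected AIC: adds the penalty 2K(K+1)/(n-K-1), which vanishes as n grows."""
    return aic(deviance, k) + 2.0 * k * (k + 1.0) / (n - k - 1.0)

# Hypothetical model with deviance 100 and 5 effective parameters.
small_sample = aicc(100.0, 5.0, 26)     # correction = 60 / 20 = 3
large_sample = aicc(100.0, 5.0, 25029)  # correction is close to zero
print(aic(100.0, 5.0), small_sample, round(large_sample, 4))
```

With a sample as large as the one used here, AIC and AICc are nearly identical unless the effective number of parameters is itself large, which can happen in geographically weighted models with small bandwidths.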
The implementation of GWR and GWLR models and spatial analysis can be performed with software package GWR4, and the GWPLR model can be realized with the statistics software package R .
3. Data Description
The dataset was collected from the GIS open data site maintained by the Nevada Department of Transportation (NDOT) and covers 2014 to 2016. The target population in this study was the major and minor arterials in the metropolitan Las Vegas area, including the City of Las Vegas, City of North Las Vegas, City of Henderson, and Clark County. The Las Vegas area attracts millions of tourists every year as an entertainment destination, but most of them concentrate in the City of Las Vegas and Clark County (located in the central part), where the crash injuries account for 35% and 38%, respectively, while the injuries in the City of North Las Vegas (located in the northwest part) and City of Henderson (located in the southeast part) account for 15% and 12%, respectively. The four areas cover different types of residents (locals and visitors) and a variety of lifestyles (business, regular work, and casual fun), thus leading to different driving styles and injury severity levels. The sample was composed of 27 major and minor arterials and 25,029 injuries from the four areas. Four main components from the Traffic Safety Crash Data were included: the crash features, the vehicle (motorized and nonmotorized) profiles, roadway characteristics, and the crash environment.
In this study, 27 arterials and 25,029 injuries were selected as shown in Figure 1, in which 7,647 injuries in 2014 account for 31%, 6,938 injuries in 2015 account for 28%, and 10,444 injuries in 2016 account for 42%. In Nevada, the injury severity is typically categorized as property damage only (PDO), injury, and fatality. In our selected sample, fatalities accounted for only 0.5%. Given that the two adjacent injury categories were quite similar, merging the injury and fatality categories was not expected to substantially affect the inference. Consequently, the dependent variable in the proposed model was binary injury severity, in which the response of interest referred to PDO, and injury and fatality were treated as the contrast. As required by the GWR model, each injury should include the X (latitude) and Y (longitude) coordinates, and the injuries within 100 ft of arterials were buffered; thus not only the time and type of injury but also the location of each injury was collected and displayed in ArcGIS directly.
According to the vehicles involved during the injury, the explanatory variables reflecting the vehicle profiles include motorized and nonmotorized types; the former refers to the total vehicle, vehicle types, vehicle direction, vehicle action (e.g., changing lanes, making U-turn, and passing other vehicles), vehicle conditions (e.g., hit-and-run, mechanical defects, and driving too fast), and vehicle driver’s age and driver’s conditions (e.g., normal, fatigue, physical impairment, and distraction), while the latter includes pedestrian, pedal cyclist, and motorcyclist. In this study, according to the classification by NDOT, when the crash happens, if there are two or more vehicles involved, the vehicle with the main responsibility here is considered as vehicle 1, and the rest with minor responsibility is/are vehicle 2. After the dataset was cleaned, crashes involving two vehicles account for 87% injuries, which verifies the classification reasonably.
The roadway characteristics contain the number of vehicle lanes and roadway conditions (e.g., dry, wet, ice, and snow), and the crash environment covers the weather, lighting conditions, and first harm (e.g., median, fence, and pedestrian).
In order to evaluate the proposed models in GWR4 and R software, the categorical variables are digitalized, and all the variables collected are listed and summarized in Table 1 with the proportions of the categorical variables and the descriptive statistics of the continuous/indicator variables.
4. Results and Discussion
Before the proposed model estimation, the Spearman correlation test is conducted to examine the strength of the association between two ordinal variables (not relying on the assumption of normally distributed data) so as to avoid the multicollinearity issues among the independent variables. The test result shows that driver conditions of vehicle 2 are highly related to total vehicles involved, driver’s age of vehicle 2, and actions of vehicle 2; moreover, lighting condition is highly related to road conditions; thus they are not adopted at the same time as the independent variables. Other variables, such as highway factor and conditions of vehicle 2, did not show up in the results because they are not significant for the injury severity.
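The screening step can be sketched as a rank correlation computed by hand. The ordinal codes below are made-up examples, not values from Table 1, and the helper names are assumptions; tied values receive their average rank, which matters for coded categorical variables.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    return cov / (va * vb) ** 0.5

# Hypothetical ordinal codes for two candidate predictors.
lighting = [1, 1, 2, 2, 3, 3, 3, 1]
road_condition = [1, 1, 2, 3, 3, 3, 2, 1]
print(round(spearman(lighting, road_condition), 3))
```

Pairs whose coefficient is close to 1 or -1 carry largely the same information, so only one variable of such a pair would be entered in the model at a time.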
In order to address the spatial feature, conventional binary logistic regression model is performed to be compared with the proposed GWLR and GWPLR models, and geographical variability test is conducted to examine the spatial heterogeneity. Due to the unbalanced outcome distribution of binary logistic regression, adaptive Gaussian is considered as the kernel type. Bandwidth selection method follows the golden section search of software GWR4, and best bandwidth value is 122 nearest neighbors.
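Golden section search narrows an interval around the minimum of a single-valley criterion while reusing one function evaluation per step. The sketch below is a generic version of the technique, not the GWR4 routine, and the objective is a hypothetical stand-in for an AICc curve whose minimum is placed at 122 neighbors purely for illustration.

```python
import math

def golden_section_min(f, lo, hi, tol=0.5):
    """Minimise a unimodal function f on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = float(lo), float(hi)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; the old c becomes the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def toy_aicc(k):
    """Hypothetical AICc curve over the number of nearest neighbors k."""
    return (k - 122.0) ** 2 + 1000.0

best_k = round(golden_section_min(toy_aicc, 10, 500))
print(best_k)
```

In practice each evaluation of the criterion requires refitting all local models at that bandwidth, so keeping the number of evaluations low is the main reason to prefer this search over a full grid.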
As one major asset, GWR can be used as an exploratory tool, and a series of bandwidths can be selected so that the resulting parameter surfaces are inspected at different levels of smoothing. For three different adaptive bandwidths of 15, 60, and 130 nearest neighbors, in the kernel function, $h_i$ will be the distance from each region $i$ to its 15th, 60th, and 130th neighbor, respectively. Consequently, 14, 59, and 129 regions were actually considered for each bandwidth, because these bandwidths were selected in a fairly exploratory manner to cover a relatively small, a somewhat average, and a fairly large number of nearest neighbors. The larger the distance considered by a bandwidth, the greater the smoothing, since near locations play a less important role in each local estimation. Therefore, about 10 nearest neighbors at each selected bandwidth can reflect what the adaptive kernel function does, and the optimal bandwidth value of 122 in this study makes sense.
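An adaptive bandwidth can be sketched as the distance to the k-th nearest neighbor of each point. The convention below counts the point itself as the first neighbor, which is one reading of why 15, 60, and 130 neighbors correspond to 14, 59, and 129 other regions; the function name and coordinates are hypothetical.

```python
import math

def adaptive_bandwidths(points, k):
    """Return h_i for each point: the distance to its k-th nearest neighbor (self counted first)."""
    bandwidths = []
    for p in points:
        distances = sorted(math.dist(p, q) for q in points)  # distances[0] is 0.0 (the point itself)
        bandwidths.append(distances[k - 1])
    return bandwidths

# Four hypothetical locations on a line, unevenly spaced.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
print(adaptive_bandwidths(points, 2))  # distance to the nearest other point
print(adaptive_bandwidths(points, 3))
```

Points in sparse areas receive larger bandwidths than points in dense areas, so every local model is fitted on the same number of observations.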
Table 2 gives the final results of the three models with variables significant at the 95% confidence level. In terms of goodness-of-fit, the conventional binary logistic regression model has the largest AIC value, the GWPLR model has the smallest, and the GWLR model is in the middle, which illustrates that the GWPLR model performs better than the other two models. The geographical variability test is carried out by model comparison, and the AICc indicator suggests no spatial variability in terms of model selection criteria, which indicates that the coefficients of the original GWLR and GWPLR models and the corresponding switched models do not vary over space. This implies that the coefficients of the proposed models are stable after fitting; i.e., the fixed effects are considered for the proposed GWPLR model.
The significant variables of the GWPLR model are explained as follows. The variable crash type is significantly related to injury severity: with unknown crash type as the base, rear-end and sideswipe are the two significant types, but they show positive and negative relations with injury severity, respectively. The positive coefficient of rear-end indicates that rear-end crashes may increase injury and fatality, while the negative coefficient of sideswipe means that sideswipe crashes are more likely to result in PDO, which is consistent with Das and Abdel-Aty, and Meng and Qu.
The total number of vehicles involved in the crash has a positive impact on injury severity, implying that the more vehicles are involved in the crash, the more severe the injury would be. Generally speaking, if more vehicles are involved, there might be more than one crash; perhaps a second crash happens after the first conflict, thus leading to more severe injuries and fatalities. This is consistent with common sense, especially on urban arterials.
Considering the motorcycle as the base, vehicle 1 and vehicle 2 types in the crash are negatively related to injury severity, implying that, compared to motorcycle injury, the injury and fatality occurring in car, truck/bus, and pickup/van tend to be less severe. This, from the perspective of motorcycles, shows that the motorcycles are the most dangerous among the vehicle types since there is not much protection from injury. Albalate and Fernandez-Villadangos  confirmed the relationship between motorcycle injury and vehicle type, which supports our finding.
Compared to the driver’s age less than 16 years old, the drivers’ ages of vehicle 1 and vehicle 2 are positive to the injury severity, meaning that the drivers who are more than 16 years old tend to cause severe injury or fatality, which is a little disputable. The crash happening to the drivers who are more than 65 years old may lead to severe injury or fatality easily [29, 53], but this is not true for the drivers from 26 to 60 years old because during the period the driving experience and the physical function tend to be mature under normal conditions; thus further exploration is required to confirm this.
The actions of vehicle 1 involved in the crash, going straight, turning left and turning right are significant to the injury severity, but going straight and turning left are positive while turning right is negative, indicating that going straight and turning left may cause injury and fatality and turning right may lead to PDO. The finding is consistent with Xu et al. . Generally, turning right is allowed under permitted signal phases, and the conflict points are less; thus the injury is less severe, but going straight and turning left may cause more conflict points, even during the protected signal phases, so the injury may be more severe. However, as for the actions of vehicle 2 involved in the crash, going straight, stopping, and turning left are negatively associated with the injury severity, which is a little different from vehicle 1 involved. However, the injury caused by actions of vehicle 2 may be acceptable due to minor responsibility, and going straight, stopping, and turning left produce less injury severity than those of vehicle 1.
For the driver conditions of vehicle 1, inattention/distraction is positively related to injury severity, which implies that inattention/distraction easily leads to severe injury or fatality. Inattention/distraction has received more attention recently, especially since the advent of smartphones. Various studies have verified that inattention/distraction is one major contributing factor to crashes, particularly among young drivers [55–57]. This finding is in line with them.
As for conditions of vehicle 1 involved, disregarding traffic signs/signals/road markings, driving too fast, failure-to-yield right-of-way, and hit-and-run are significant to injury severity, whereas conditions of vehicle 2 are not significant, which implies that vehicle 1 takes main responsibilities compared to vehicle 2. However, disregarding traffic signs/signals/road markings and failure-to-yield right-of-way are positively associated with injury severity while driving too fast and hit-and-run are negatively related to injury severity. It is understandable for the former two vehicle conditions causing severe injury or fatality, but it is controversial for the latter two conditions, as driving too fast easily produces severe injury or fatality and the same with hit-and-run, whereas the results show the opposite, so further verification is required in next step.
Among the first harms, motor vehicle in transport and slow/stopped vehicle are positively significant to the injury severity, implying that the two factors may produce severe injury or fatality. Usually, motor vehicles in transport are heavy trucks or vans, and the roadway space occupied by the maneuver is larger; thus the injury caused is more severe than that by cars. There is no doubt about this. However, slow/stopped vehicles, as common sense, are supposed to be less injured when the crash occurs, but the situation may depend on whether the slow/stopped vehicles are active or passive. The passive vehicles may be more injured since they are driven by external force, while the active ones may not generate severe injury due to the slow speed.
In comparison with the unknown weather condition, cloudy condition shows a positive relation with injury severity. As confirmed by previous studies by Yu and Abdel-Aty  and Li et al. , weather conditions, especially adverse weather, have significant impact on injury severity, which supports our finding. In Las Vegas area, there are not many rainy days (only accounting for 1.9% in three years), so cloudy days may be adverse, except the clear days; thus it may increase the injury severity at certain degree.
Another significant factor is lighting condition, in which daylight and dark are positively associated with injury severity. In accordance with the studies by Anarkooli and Hosseinlou, and Uddin and Huynh, different lighting conditions may affect injury severity on rural roads, work zones, and urban roadways. This is in line with our study. Due to the comprehensive impact of various factors, such as traffic volume, travel speed, roadway conditions, and pedestrians, most injuries occur in daylight (accounting for about 60%), which is reasonable. As for dark conditions (accounting for about 30% of injuries), it takes longer for drivers to visually detect vehicles or pedestrians and then control the maneuver promptly; thus it makes sense that the injury severity is increased.
The last three significant factors, pedal cyclist, pedestrian, and motor cyclist, are vulnerable road users during the crash, and all are positively related to injury severity, which is reasonable. The finding conforms with the studies by Tin et al. , and Badea-Romero and Lenard , and more serious casualties come from head injury for pedestrians and pedal cyclists; hence helmet use is promoted for the vulnerable road users .
After examining the results of estimates for the significant variables, the local (one city) estimates are similar to those of the global (the four cities) model, implying that the geographically weighted method actually localizes the global results, regardless of the bandwidth selected. Next, the spatial heterogeneity can be reflected from the spatial distribution of the local estimates. The significant fixed effects estimates of vehicle 1 and 2 driver age are always positive; however, the magnitudes of the local estimates of these variables are extremely different for North Las Vegas and Las Vegas since the drivers of the two areas are mainly from local residents and visitors, respectively. Moreover, the adaptive Gaussian kernel was employed here to emphasize the analysis of the estimated level of each regional fixed effect when the local model allows the spatial heterogeneity of the global model so as to enumerate the levels of the dependent variable.
In order to verify the proposed model, the validation revealed that GWPLR model allows studying potential spatial heterogeneity in models controlling for individual heterogeneity, which can be regarded as one approach to addressing spatial nonstationarity in any model on the basis of pooled time observations for each location. Consequently, the proposed GWPLR model expands the range of spatial-temporal analysis.
In this study geographically weighted panel logistic regression model was proposed to analyze injury severity to identify the influencing factors. To address the heterogeneity issue due to unobserved factors and spatial attributes at different locations, panel logistic regression model was employed and estimated within geographically weighted model. The results showed that four main factors, human-beings (drivers/pedestrians/cyclists), vehicles, roadway, and environment, were potentially significant factors of increasing the injury severity.
Two key findings can be drawn from the results of the study. First, the geographically weighted regression model can be transferred from the field of geography to identify potential injury severity factors in urban areas. Second, the geographically weighted panel logistic regression model, from the spatial-temporal point of view, can address the heterogeneity issue through the panel data model while accommodating spatial features, which extends the range of spatial-temporal analysis.
Generally, the results of this analysis provide potential insights for practitioners and policy makers concerning safety along urban arterials from various aspects. Planners may need to improve arterial design to accommodate the activities of pedestrians and cyclists, while operators may optimize signal coordination to avoid conflicts. Safety officials need to provide education programs and take suitable measures to reduce the risks to drivers, pedestrians, and cyclists, while drivers should pay more attention to traffic signs and the roadway environment and avoid inattention and distraction as much as possible. In practical terms, practitioners and policy makers can take various efficient, coherent, and suitable measures to improve safety along arterials. Such measures can include building median alternatives to separate motorized vehicles from nonmotorized ones, using different light intensities during different time periods and weather conditions, or setting up electronic signs during certain weekdays and vague weather conditions.
Specifically, the exploratory analysis with the GWPLR model indicates that when the data are subsampled and weighted, the local estimates lie close to the global estimates, which is expected. Although bandwidth optimization was not considered here, the results still suggest relevant findings: the spatial distribution of the estimates reveals heterogeneity. The magnitudes of the driver age estimates differ across regions, which reflects the spatial heterogeneity of the model in explaining the dependent variable. Policy makers may be interested in examining a particular region under both the global and local models in order to design corresponding measures for specific influencing factors.
Some weaknesses remain in this study. First, because fatalities accounted for only a small proportion of the samples, fatality and injury were merged into one category; future efforts could treat the small proportion of fatal crashes as a separate category. Second, since the results were based on a dataset from the Las Vegas area, it is worthwhile to test different data sources in future studies to confirm the findings and their transferability. Future work on the proposed GWPLR model includes optimizing the estimation procedure, visualizing the model results, developing random effects models, and exploring geographically weighted time series models.
Injury severity data were obtained from Traffic Safety Crash Data. The data that support the findings of this study are maintained by the Nevada Department of Transportation and can be accessed through the following website: https://data-ndot.opendata.arcgis.com/search.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
The authors would like to thank the Nevada Department of Transportation (NDOT) for providing the GIS open crash data. This study was supported by the Fundamental Research Fund for the Central Universities [HUST: 2018KFYYXJJ001].
F. Ye and D. Lord, “Investigation of effects of underreporting crash data on three commonly used traffic crash severity models,” Transportation Research Record, no. 2241, pp. 51–58, 2011.
D. N. Moore, W. H. Schneider IV, P. T. Savolainen, and M. Farzaneh, “Mixed logit analysis of bicyclist injury severity resulting from motor vehicle crashes at intersection and non-intersection locations,” Accident Analysis & Prevention, vol. 43, no. 3, pp. 621–630, 2011.
M. Xie, W. Cheng, G. S. Gill, J. Zhou, X. Jia, and S. Choi, “Investigation of hit-and-run crash occurrence and severity using real-time loop detector data and hierarchical Bayesian binary logit model with random effects,” Traffic Injury Prevention, vol. 19, no. 2, pp. 207–213, 2018.
J. Martínez-Fernández, E. Chuvieco, and N. Koutsias, “Modelling long-term fire occurrence factors in Spain by accounting for local variations with geographically weighted regression,” Natural Hazards and Earth System Sciences, vol. 13, no. 2, pp. 311–327, 2013.
L. Wu, F. Deng, Z. Xie et al., “Spatial analysis of severe fever with thrombocytopenia syndrome virus in China using a geographically weighted logistic regression model,” International Journal of Environmental Research and Public Health, vol. 13, no. 11, p. 1125, 2016.
P. Widyaningsih, D. R. Sari Saputro, and A. N. Putri, “Fisher scoring method for parameter estimation of geographically weighted ordinal logistic regression model,” Journal of Physics: Conference Series, vol. 855, no. 1, Article ID 012060, 2017.
A. S. Fotheringham, C. Brunsdon, and M. Charlton, Geographically Weighted Regression: The Analysis of Spatially Varying Relationships, John Wiley & Sons, Chichester, UK, 2002.
D. Yu, “Exploring spatiotemporally varying regression relationships: the geographically weighted panel regression analysis,” The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Science, vol. 38, pp. 134–139, 2010.
F. Bruna and D. Yu, “Geographically weighted panel regression and development accounting for European Regions,” in Proceedings of the 6th Seminar Jean Paelinck in Spatial Econometrics, Madrid, Spain, October 2013.
A. S. McIntosh, K. Curtis, T. Rankin et al., “Associations between helmet use and brain injuries amongst injured pedal-and motor-cyclist: A case series analysis of trauma centre presentations,” Journal of the Australasian College of Road Safety, vol. 24, no. 2, pp. 11–20, 2013.