Abstract

Red-light running by cyclists at signalized intersections leads to a large number of traffic conflicts and a high collision potential. The primary objective of this study is to model cyclists’ red-light running frequency within the framework of Bayesian statistics. Data were collected at twenty-five approaches at seventeen signalized intersections. Poisson-gamma (PG) and Poisson-lognormal (PLN) models were developed, compared, and validated using Bayesian p values based on posterior predictive checking indicators. Both models fit the observed red-light running frequency well, and the PLN model outperformed the PG model. The estimation results showed that the number of cyclists’ red-light running events is significantly influenced by bicycle flow, conflict traffic flow, pedestrian signal type, vehicle speed, and the proportion of e-bikes. The validation results demonstrated the reliability of the PLN model. The research results can help transportation professionals predict the expected number of cyclists’ red-light running events and develop effective guidelines or policies to reduce red-light running by cyclists at signalized intersections.

1. Introduction

In recent years, the bicycle has been widely used as an important traffic mode, especially for commuting and recreational trips [1, 2]. Bicycles provide users with convenient, flexible, and affordable mobility and are an important supplement to the urban transit system. The bicycle has also been recognized as an environmentally friendly mode of transport [3–6]. In China, bicycle use has increased significantly during the past several decades, mainly because of increasing congestion in most large cities. A study in 2010 showed that the average bicycle modal share of urban trips in China was 38% [7]. Because bicycles produce no pollutant emissions, little noise, and low carbon output, the government has shown interest in promoting them [3–6, 8].

Despite these obvious advantages, the use of bicycles has also raised concerns about safety. In 2011, 8,776 cyclists were killed and 35,552 were seriously injured in road accidents in China [9]. Cyclists accounted for 82.3% of nonmotorized traffic fatalities and 84.6% of nonmotorized traffic injuries in China in 2011 [9]. Accident analysis reveals that approximately 43% of fatal crashes involving bicycles result from traffic rule violations [9]. Red-light running at signalized intersections, one of the most overt illegal behaviors, is very common in China. According to a previous study, e-bikes and bicycles contributed to 22.4% of the incidents examined, and red-light running was found to be the predominant factor in these incidents [10].

Several studies have investigated the red-light running behavior of cyclists [11–15]. For example, a recent study by Wu et al. [11] observed 451 two-wheelers facing a red light in China and found that 56% of them crossed the intersection against the red light. A cross-sectional survey by Bacchieri et al. [12] in Brazil showed that the red-light infringement rate among male commuter cyclists reached 38.4%. By contrast, observational studies in Australia reported relatively low infringement rates, from 7% to 9% [13, 14].

Some studies have focused on the factors influencing cyclists’ red-light running violations [11, 13, 16–20]. Wu et al. [11] developed a binary logit model to identify the significant factors affecting the likelihood that two-wheeler riders run a red light. They found that age was the main factor, with young and middle-aged riders more likely than older riders to run against a red light. A further study by Zhang and Wu [16] in 2013 showed that sunshields installed at intersections could reduce red-light running by cyclists and e-bike riders: riders were 1.376 times more likely to run against the red light at intersections without sunshields than at intersections with them. Studies of cyclists in Australia [11, 18] found that the three main factors in cyclists’ infringement were travel direction, the presence of other road users, and the volume of cross traffic. Cyclists turning left were 28.3 times more likely to run a red light than cyclists continuing straight through the intersection. These studies also found that males are more likely to offend than females, that older cyclists are less likely to infringe than younger cyclists, and that cyclists who had not previously been involved in a bicycle-vehicle crash while riding are more likely to infringe at red lights.

From previous research on red-light violations, a general conclusion is that individual characteristics, such as gender and age, are important factors affecting red-light running behavior. Other factors, such as the presence of other road users, group size, and traffic volume, have also been found to affect crossing behavior. Until recently, however, little work has been documented on modeling cyclists’ red-light running frequency at an aggregate level. Previous studies are therefore limited in their capacity to explore whether cyclists’ red-light running frequency can be modeled, what factors affect this frequency, and what kind of model suits cyclists’ red-light running counts. Research is needed to better understand these issues.

The primary objectives of this research are to develop cyclists’ red-light running frequency models within the framework of an advanced Bayesian statistical approach and to validate the reliability of the developed models. The study focuses on signalized intersections, where red-light running by cyclists often constitutes a safety concern.

2. Data Collection

A field survey was designed to obtain cyclists’ red-light running frequency and possible influencing factors, including road geometric design, environmental conditions, and traffic conditions. Field data were collected at twenty-five approaches at seventeen signalized intersections in the city of Nanjing, China. Nanjing is one of the largest cities in East China; as of 2012, it had a population of 8.16 million and an area of 6,597 square kilometers.

The sites were carefully selected so that their geometric design and traffic control features represent the most common situations in major Chinese cities. More specifically, (a) there had to be a reasonably high number of bicycles during the observation period for the data extraction effort to be efficient, and (b) each intersection had to have pedestrian signals, so that red-light running behavior could be judged.

Field data were collected only during weekday peak periods, in fine weather, and when no traffic police were present. Two synchronized video cameras were set up in the field. One camera (camera A) was placed beside the crosswalks to film the cyclists’ whole crossing process, and the other (camera B) was set up on top of a roadside building to record traffic and bicycle volumes. The cameras were carefully placed so that the cyclists were unaware that they were being observed. A total of 47.5 hours of video was recorded at the selected sites.

The recorded videos were then reviewed in the lab for data reduction. A single trained graduate student reviewed all the videos to ensure that consistent criteria were applied to identify crossing behaviors at different sites. From camera A, information on cyclists’ crossing behavior, including red-light running and non-red-light running, was extracted. From camera B, information on traffic conditions, such as bicycle flow volume and conflict traffic volume, was extracted. Since e-bikes travel much faster than conventional bicycles, the type of bicycle (e-bike or conventional bicycle) was recorded. Bicycle speed and vehicle speed were also extracted from camera B. Both speeds were estimated using the VideoStudio software, which processes video files frame by frame at 25 frames per second, so that the observer can estimate the speed of a bicycle or vehicle by comparing its locations in different frames. Road geometric and environmental conditions, such as lane width, roadway width, and pedestrian signal type, were also recorded by the investigators during the survey.
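The frame-based speed estimate described above reduces to distance divided by elapsed time, where elapsed time is the number of frames divided by the frame rate. The Python sketch below assumes that the distance between two reference positions (for example, painted markings visible in the video) is known in meters; the function name `speed_kmh` and the reference-distance approach are illustrative assumptions, not the documented VideoStudio procedure.

```python
FPS = 25  # frame rate stated for VideoStudio in the text


def speed_kmh(distance_m: float, frame_start: int, frame_end: int, fps: float = FPS) -> float:
    """Average speed (km/h) of a road user covering distance_m meters between
    two video frames. Hypothetical helper for illustration only."""
    n_frames = frame_end - frame_start
    if n_frames <= 0:
        raise ValueError("frame_end must come after frame_start")
    elapsed_s = n_frames / fps            # seconds between the two frames
    return distance_m / elapsed_s * 3.6   # m/s -> km/h


# A cyclist covering 10 m in 50 frames (2 s) travels at 5 m/s, i.e. 18 km/h.
print(speed_kmh(10.0, 100, 150))
```

At 25 frames per second each frame spans 0.04 s, so the timing resolution is roughly one frame at each end of the measurement; longer reference distances reduce the relative error.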

Note that three crossing behaviors, defined relative to the traffic signal, were classified as “red-light running”: (1) cyclists who cross the intersection during the red signal; (2) cyclists who begin to cross when the signal is green but do not finish during the green signal; and (3) cyclists who cross part of the intersection during the red signal and then continue crossing during the green signal.
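The three-way definition above can be written as a small decision rule. This Python sketch assumes that each crossing is summarized by its start and end times on the same clock as a single green interval, and that a crossing overlaps at most one green/red boundary; the function name and the labels are hypothetical.

```python
def classify_crossing(t_start, t_end, green_start, green_end):
    """Classify one crossing against a single green interval [green_start, green_end].

    Assumes the crossing overlaps at most one green/red boundary, i.e. it does
    not start before the green and end after it (not expected for real crossings).
    """
    starts_green = green_start <= t_start <= green_end
    ends_green = green_start <= t_end <= green_end
    if starts_green and ends_green:
        return "compliant"
    if not starts_green and not ends_green:
        return "type1"  # whole crossing during red
    if starts_green:
        return "type2"  # started on green, finished after the green ended
    return "type3"      # started on red, finished during green


# Green from t = 0 s to t = 30 s.
for crossing in [(5, 10), (40, 50), (25, 35), (-5, 5)]:
    print(crossing, classify_crossing(*crossing, 0, 30))
```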

In total, 2,961 red-light running behaviors were identified at the selected signalized intersections. The original data, collected in 1-minute intervals, were then aggregated into 5-minute intervals, resulting in a sample size of 570.
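Aggregating 1-minute counts into 5-minute counts amounts to summing non-overlapping blocks of five consecutive minutes within each recording. The sketch below uses hypothetical counts and a hypothetical helper name. It also checks that 47.5 hours of video correspond to 2,850 minutes, or 570 five-minute intervals, which is consistent with the stated sample size if each site’s recording length is a multiple of 5 minutes.

```python
def aggregate_counts(counts_1min, width=5):
    """Sum consecutive 1-minute counts into non-overlapping bins of `width` minutes."""
    if len(counts_1min) % width != 0:
        raise ValueError("series length must be a multiple of the bin width")
    return [sum(counts_1min[i:i + width]) for i in range(0, len(counts_1min), width)]


# Ten hypothetical 1-minute counts from one approach -> two 5-minute counts.
print(aggregate_counts([1, 0, 2, 0, 1, 3, 0, 0, 1, 1]))  # [4, 5]

# 47.5 hours of video = 2,850 minutes = 570 five-minute intervals.
total_minutes = int(47.5 * 60)
print(total_minutes, total_minutes // 5)  # 2850 570
```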

3. Methodology

3.1. Poisson-Gamma (PG) Model and Poisson-Lognormal (PLN) Model

In this study, the Poisson-gamma (PG) and Poisson-lognormal (PLN) models were used to fit the cyclists’ red-light running frequency observed during a particular time interval. Let $y_i$ represent the number of cyclists’ red-light running events in the $i$th time period. It is assumed that the counts are independent and that, given its Poisson mean $\lambda_i$,

$$y_i \mid \lambda_i \sim \text{Poisson}(\lambda_i).$$

The Poisson distribution assumes that the mean equals the variance. This assumption does not suit the present case, because the variance of red-light running counts is often greater than the mean. To deal with this overdispersion, which arises from unobserved or unmeasured heterogeneity, it is assumed that

$$\lambda_i = \mu_i \exp(\varepsilon_i),$$

where $\mu_i$ is the expected number of red-light running events for the $i$th time period and $\exp(\varepsilon_i)$ is a multiplicative random effect that models possible overdispersion in cyclists’ red-light running counts.
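A quick descriptive check for this kind of overdispersion is the ratio of the sample variance to the sample mean, which is close to 1 for Poisson counts. The sketch uses Python’s standard `statistics` module and hypothetical counts; it is a screening heuristic, not a formal test.

```python
import statistics


def dispersion_index(counts):
    """Sample variance (n - 1 denominator) divided by the sample mean of a list of counts."""
    mean = statistics.fmean(counts)
    if mean == 0:
        raise ValueError("mean count is zero")
    return statistics.variance(counts) / mean


# Hypothetical 5-minute red-light running counts.
counts = [0, 2, 1, 9, 3, 0, 12, 4, 1, 8]
print(round(dispersion_index(counts), 2))  # 4.44, well above 1
```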

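A logarithmic link function connects $\mu_i$ to a linear predictor:

$$\ln(\mu_i) = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},$$

where $\beta_0, \dots, \beta_k$ are coefficients and $x_{i1}, \dots, x_{ik}$ are explanatory variables.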

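The PG model is obtained by assuming that the random effect follows a gamma distribution with shape $\phi$ and rate $\phi$ (mean 1, variance $1/\phi$):

$$\exp(\varepsilon_i) \sim \text{Gamma}(\phi, \phi),$$

where $\phi$ is the inverse dispersion parameter. The dispersion parameter is usually given as $\alpha = 1/\phi$.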

For the PG model, the mean and variance are given as follows [21]:

$$E(y_i) = \mu_i, \qquad \mathrm{Var}(y_i) = \mu_i + \frac{\mu_i^2}{\phi}.$$
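The PG moments, $E(y_i) = \mu_i$ and $\mathrm{Var}(y_i) = \mu_i + \mu_i^2/\phi$, can be checked by Monte Carlo simulation. This NumPy sketch draws the multiplicative random effect from a gamma distribution with shape $\phi$ and rate $\phi$; NumPy parameterizes the gamma by shape and scale, so scale $= 1/\phi$. The values of $\mu$ and $\phi$ are arbitrary illustrative choices, not estimates from this study.

```python
import numpy as np


def simulate_pg(mu, phi, size, rng):
    """Poisson-gamma counts: y ~ Poisson(mu * g) with g ~ Gamma(shape=phi, rate=phi)."""
    g = rng.gamma(shape=phi, scale=1.0 / phi, size=size)  # mean 1, variance 1/phi
    return rng.poisson(mu * g)


rng = np.random.default_rng(2016)
mu, phi = 5.0, 2.0
y = simulate_pg(mu, phi, 200_000, rng)

print("simulated mean/var:", y.mean(), y.var())
print("theoretical mean/var:", mu, mu + mu**2 / phi)  # 5.0 and 17.5
```

The simulated variance clearly exceeds the mean, which is exactly the overdispersion the gamma random effect is meant to capture.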

Similarly, the PLN model is obtained by assuming

$$\varepsilon_i \sim N(0, \sigma^2),$$

where $\sigma^2$ represents the extra-Poisson variance.

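For the PLN model, the mean and variance are given as follows [21]:

$$E(y_i) = \mu_i \exp\left(\frac{\sigma^2}{2}\right), \qquad \mathrm{Var}(y_i) = E(y_i) + E(y_i)^2\left[\exp(\sigma^2) - 1\right].$$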
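The PLN moments, $E(y_i) = \mu_i \exp(\sigma^2/2)$ and $\mathrm{Var}(y_i) = E(y_i) + E(y_i)^2[\exp(\sigma^2) - 1]$, can be checked the same way. Unlike the PG case, the marginal mean is inflated by $\exp(\sigma^2/2)$, so $\mu_i$ is not the marginal mean of $y_i$. The NumPy sketch below uses arbitrary illustrative values of $\mu$ and $\sigma$.

```python
import numpy as np


def simulate_pln(mu, sigma, size, rng):
    """Poisson-lognormal counts: y ~ Poisson(mu * exp(eps)) with eps ~ N(0, sigma^2)."""
    eps = rng.normal(loc=0.0, scale=sigma, size=size)
    return rng.poisson(mu * np.exp(eps))


rng = np.random.default_rng(2016)
mu, sigma = 5.0, 0.6
y = simulate_pln(mu, sigma, 200_000, rng)

mean_theory = mu * np.exp(sigma**2 / 2)
var_theory = mean_theory + mean_theory**2 * (np.exp(sigma**2) - 1)
print("simulated mean/var:", y.mean(), y.var())
print("theoretical mean/var:", mean_theory, var_theory)
```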

3.2. Bayesian Estimation and Prior Distributions

The PG and PLN models are estimated in a fully Bayesian framework via Markov chain Monte Carlo (MCMC) simulation. To obtain the posterior distributions of the model parameters $\boldsymbol{\beta}$, $\phi$ (under the PG model), and $\sigma^2$ (under the PLN model), prior distributions for these parameters must first be specified. Prior distributions are meant to reflect, to some extent, prior knowledge about the parameters of interest. If such prior information is available, it should be used to formulate so-called informative priors; in contrast, uninformative (vague) priors are usually used to reflect a lack of prior information. Since prior information about the PG and PLN model parameters is not available, uninformative priors are used: a multivariate normal prior for $\boldsymbol{\beta}$ with mean $\mathbf{0}$ (a vector of zeros) and a covariance matrix based on the identity matrix $\mathbf{I}$, together with vague priors for $\phi$ and $\sigma^2$.

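Let $\mathbf{y} = (y_1, \dots, y_n)$ denote the observed counts and $\mathbf{X}$ the matrix of explanatory variables. Based on the specified priors, the joint posterior distributions under the PG and PLN models follow from Bayes’ theorem:

$$\pi(\boldsymbol{\beta}, \phi \mid \mathbf{y}, \mathbf{X}) \propto L(\mathbf{y} \mid \boldsymbol{\beta}, \phi, \mathbf{X})\,\pi(\boldsymbol{\beta})\,\pi(\phi) \quad \text{(PG model)},$$

$$\pi(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X}) \propto L(\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2, \mathbf{X})\,\pi(\boldsymbol{\beta})\,\pi(\sigma^2) \quad \text{(PLN model)},$$

where $L(\cdot)$ is the likelihood of the observed counts and $\pi(\cdot)$ denotes a prior density.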

3.3. Model Comparison

The deviance information criterion (DIC) was used for model comparison; among the candidate models, the one with the lowest DIC is considered the best. DIC can be calculated as [22]

$$\mathrm{DIC} = \bar{D} + p_D, \qquad p_D = \bar{D} - D(\hat{\theta}),$$

where $\bar{D}$ is the posterior mean of the unstandardized deviance of the model, $\hat{\theta}$ is the point estimate of the model’s parameters, and $p_D$ is the effective number of parameters in the model.
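To make the DIC definition concrete, the NumPy sketch below computes $\bar{D}$, $p_D$, and DIC from posterior draws of the Poisson means $\lambda_i$, using the Poisson deviance $D = -2 \sum_i \log p(y_i \mid \lambda_i)$. It is a simplified illustration under an assumption: the plug-in point $\hat{\theta}$ is taken as the posterior mean of each $\lambda_i$. The choice of plug-in point affects $p_D$, so values from this sketch need not match those reported by WinBUGS. Function names are hypothetical.

```python
import math

import numpy as np


def poisson_deviance(y, lam):
    """-2 times the Poisson log-likelihood of counts y given positive means lam."""
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    log_y_fact = np.array([math.lgamma(v + 1.0) for v in y])  # log(y!)
    loglik = y * np.log(lam) - lam - log_y_fact
    return -2.0 * float(loglik.sum())


def dic(y, lam_draws):
    """Return (DIC, pD) from an (S, n) array of S posterior draws of the n Poisson means."""
    lam_draws = np.asarray(lam_draws, dtype=float)
    d_bar = np.mean([poisson_deviance(y, lam) for lam in lam_draws])  # posterior mean deviance
    d_hat = poisson_deviance(y, lam_draws.mean(axis=0))                # deviance at plug-in estimate
    p_d = d_bar - d_hat                                                # effective number of parameters
    return float(d_bar + p_d), float(p_d)
```

Because the Poisson deviance is convex in $\lambda_i$, this version of $p_D$ is never negative; with no posterior uncertainty (identical draws) it is zero and DIC equals the deviance.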

3.4. Data Description

The dependent variable of the model was the number of cyclists’ red-light running events in a 5-minute interval, with a sample size of 570. Data at the 1-minute level were not used because they contained too many zeros (intervals with no red-light running). Thirteen explanatory variables, covering road geometric design, environmental conditions, and traffic conditions, were initially considered for model specification. Descriptive statistics of these variables are given in Table 1.

4. Results

4.1. Model Specification

The models were estimated using the software package WinBUGS. MCMC sampling was used to approximate the posterior distributions (mean and standard deviation) of the model parameters. Two Markov chains were run for 20,000 iterations each, with the first 10,000 iterations discarded as burn-in. Monitoring convergence is important because posterior summaries are valid only once the chains have converged to the posterior distribution. Convergence was monitored in several ways. Convergence of the two chains was assessed using the Brooks-Gelman-Rubin (BGR) statistic, with a BGR value below 1.2 indicating convergence [23]. Convergence was also monitored by visual inspection of the MCMC trace plots.
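The convergence check can be illustrated with the basic Gelman-Rubin potential scale reduction factor, computed from the within-chain and between-chain variances of the post-burn-in draws. This NumPy sketch implements that variance-based form; the BGR diagnostic reported by WinBUGS is an interval-based variant, so the two numbers are related but not identical. The chains here are hypothetical random draws.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor for an (m, n) array: m chains, n draws each."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    if m < 2 or n < 2:
        raise ValueError("need at least two chains with two draws each")
    w = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    var_hat = (n - 1) / n * w + b / n        # pooled estimate of the posterior variance
    return float(np.sqrt(var_hat / w))


# Two hypothetical chains of 20,000 iterations; discard the first 10,000 as burn-in.
rng = np.random.default_rng(1)
raw = rng.normal(size=(2, 20_000))
kept = raw[:, 10_000:]
print(round(gelman_rubin(kept), 3))  # close to 1 for well-mixed chains
```

Chains stuck in different regions inflate the between-chain variance and push the statistic well above the 1.2 threshold.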

The estimation results are presented in Table 2. All parameters in the final models are significant at the 95% credible level (i.e., their 95% credible intervals do not include zero). In general, the two models provide similar parameter estimates. The final models, whose estimated coefficients are listed in Table 2, express the expected number of cyclists’ red-light running events during a 5-minute period (under the PG and PLN models, respectively) as a function of five explanatory variables: bicycle flow volume, conflict traffic volume, pedestrian signal type (1: countdown signal; 0: flashing signal), average vehicle speed, and proportion of e-bikes.

As shown in Table 2, the DIC value is 1097 for the PG model and 1063 for the PLN model with the same variables. A difference of more than 10 in DIC is generally taken to rule out the model with the higher DIC [22]. Since the PLN model reduces DIC by 34, the DIC comparison suggests that the PLN model outperforms the PG model.

A positive coefficient indicates that cyclists’ red-light running frequency increases as the corresponding variable increases, whereas a negative coefficient indicates that the frequency decreases as the variable increases. The coefficient for bicycle flow volume was approximately 1.63, which means that cyclists’ red-light running count increases more rapidly than bicycle flow volume. The proportion of e-bikes was highly significant with a positive sign in the models, indicating that a higher proportion of e-bikes among bicycles increases the total red-light running frequency. The coefficient for pedestrian signal type was positive, ranging between 0.514 and 0.564, which means that the expected red-light running count at approaches with a countdown signal is 67.2% to 75.7% higher than at approaches with a flashing signal (i.e., $e^{0.514} \approx 1.67$; $e^{0.564} \approx 1.76$); equivalently, replacing a countdown signal with a flashing signal reduces the expected count by about 40% to 43%. The coefficients for conflict traffic volume and vehicle speed are significantly negative, implying that red-light running decreases as conflict traffic volume and vehicle speed increase.
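Because of the log link, a coefficient $\beta$ on a 0/1 indicator multiplies the expected count by $e^{\beta}$ when the indicator switches from 0 to 1. The short Python sketch below converts the reported pedestrian-signal coefficients into percentage changes in both directions; the helper names are illustrative.

```python
import math


def pct_increase(beta):
    """Percent change in the expected count when a 0/1 indicator goes from 0 to 1."""
    return 100.0 * (math.exp(beta) - 1.0)


def pct_reduction(beta):
    """Percent reduction in the expected count when the indicator goes from 1 to 0."""
    return 100.0 * (1.0 - math.exp(-beta))


for beta in (0.514, 0.564):  # pedestrian signal type: 1 = countdown, 0 = flashing
    print(f"beta={beta}: countdown vs flashing +{pct_increase(beta):.1f}%, "
          f"flashing vs countdown -{pct_reduction(beta):.1f}%")
```

Small differences in the last digit relative to the text can arise from rounding of the reported coefficients.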

4.2. Model Validation

Bayesian posterior predictive p values were used to assess the goodness of fit of the model. Only the PLN model was considered in the validation procedure, since it had the lower DIC, which suggests that it provides a better fit to the data set. The procedure first generates replicated (simulated) data sets from the postulated model and then compares them with the observed data set through discrepancy statistics. The probability that the simulated data are more extreme than the observed data is measured by

$$p_B = \Pr\left[D(y^{\text{rep}}, \theta) \geq D(y^{\text{obs}}, \theta) \mid y^{\text{obs}}\right],$$

where $D(\cdot)$ is the discrepancy statistic and $\theta$ denotes the model parameters. A model is considered suspect if this tail-area probability is close to 0 or 1 [24].
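The posterior predictive p value $p_B = \Pr[D(y^{\text{rep}}, \theta) \geq D(y^{\text{obs}}, \theta) \mid y^{\text{obs}}]$ can be approximated by looping over posterior draws: for each draw, simulate a replicated data set, evaluate the discrepancy on both the replicated and observed data, and record whether the replicate is at least as extreme. This NumPy sketch uses a Pearson chi-square discrepancy as an illustrative choice, not necessarily one of the three statistics used in this study, and hypothetical posterior draws of the Poisson means.

```python
import numpy as np


def chi_square_discrepancy(y, lam):
    """Pearson chi-square discrepancy for Poisson counts y with means lam (illustrative)."""
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    return float(np.sum((y - lam) ** 2 / lam))


def bayesian_p_value(y_obs, lam_draws, rng):
    """Share of posterior draws with D(y_rep, theta) >= D(y_obs, theta)."""
    lam_draws = np.asarray(lam_draws, dtype=float)
    exceed = 0
    for lam in lam_draws:         # one posterior draw of the Poisson means per row
        y_rep = rng.poisson(lam)  # replicated data set under this draw
        exceed += chi_square_discrepancy(y_rep, lam) >= chi_square_discrepancy(y_obs, lam)
    return exceed / len(lam_draws)


rng = np.random.default_rng(7)
lam_draws = rng.gamma(shape=50.0, scale=0.1, size=(1000, 30))  # hypothetical draws, mean 5
y_obs = rng.poisson(5.0, size=30)
print(bayesian_p_value(y_obs, lam_draws, rng))
```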

Three discrepancy statistics were selected to check for potential model failures; in each statistic, $y$ denotes either the simulated data $y^{\text{rep}}$ or the observed data $y^{\text{obs}}$.

The Bayesian p values for the three discrepancy statistics are reported in Table 2 and lie between about 0.5 and 0.6. Because these values are far from 0 and 1, the discrepancies of the simulated data are neither systematically larger nor systematically smaller than those of the observed data, which suggests that the model fits the observed red-light running counts well. Therefore, cyclists’ red-light running frequency replicated with the developed model is likely to be close to the frequency observed on site.

5. Conclusion and Discussions

This study evaluated the application of PG and PLN models, developed using Bayesian statistical techniques, to modeling the frequency of cyclists’ red-light running at signalized intersections. Data were collected at seventeen signalized intersections in the city of Nanjing. In total, 2,961 cyclists’ red-light running behaviors were observed at the selected sites. The number of red-light running events was modeled as events occurring randomly in a given time interval (5 minutes) under a Poisson assumption. Overdispersion of the count data was accounted for by adding a multiplicative random effect with a gamma or lognormal distribution to the Poisson regression model, resulting in the PG and PLN models, respectively. Within the Bayesian framework, the estimation results demonstrated that both models can fit the observed data; however, the PLN model outperformed the PG model based on DIC. The developed model was validated using Bayesian posterior predictive p values computed for three discrepancy statistics. Since these p values are far from 0 and 1, they indicate the reliability of the model.

The cyclists’ red-light running frequency model developed in this study can be used by researchers or agencies to estimate the expected red-light running frequency given information such as bicycle flow volume, conflict traffic volume, and traffic control. In addition, the results can help guide policies and countermeasures aimed at reducing red-light running by cyclists. For example, since red-light running frequency was lower with flashing pedestrian signals than with countdown signals, flashing signals could be installed at signalized intersections instead of countdown signals. A higher proportion of e-bikes was also found to increase red-light running frequency at signalized intersections; e-bike license management could therefore be established by law to strengthen enforcement for e-bikes and improve road safety [8, 11, 25].

There are several limitations in the present study. The data used for model specification had a relatively small sample size of 570. Data need to be collected at more signalized intersections with more varied traffic, geometric, and traffic control conditions, and more variables should be added to the prediction models. Further research could also apply state-of-the-art models, such as random-parameter Bayesian models, to account for heterogeneity. In addition, the survey was conducted only in fine weather. A further study could be conducted under different weather conditions to evaluate their impacts on the red-light running frequency of cyclists; other factors, such as pavement markings [26] and traffic conflicts [27], could also be considered in future work.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Acknowledgments

This research was sponsored by the National Natural Science Foundation of China (no. 51478110 and no. 51508122), the Scientific Research Foundation of Graduate School of Southeast University (no. YBJJ1533), the Fundamental Research Funds for the Central Universities, and the Scientific Innovation Research of College Graduates in Jiangsu Province (no. KYLX_0173).