Abstract

Free flow speed is a fundamental measure of traffic performance and has been found to affect the severity of crash risk. However, previous studies lack analysis and modelling of the factors that affect bicycle free flow speed. The main focus of this study is to develop multilayer back propagation artificial neural network (BPANN) models for the prediction of free flow speed and crash risk on separated bicycle paths. Four models considering different combinations of input variables (e.g., path width, traffic condition, bicycle type, and cyclists’ characteristics) were developed. A total of 459 field data samples were collected from eleven bicycle paths in Hangzhou, China; 70% of the samples were used for training, 15% for validation, and 15% for testing. The results show that including bicycle types and cyclists’ characteristics as input variables effectively improves the accuracy of the prediction models. Moreover, the bicycle type parameters have a more significant effect on the predicted bicycle free flow speed than the cyclists’ characteristics. The findings could contribute to the evaluation, planning, and management of bicycle safety.

1. Introduction

Traffic safety and the crash risk of both motorized vehicles and bicycles are high-priority issues for traffic engineers and researchers [1–4]. Recently, with the rapid growth of bicycle use (including classic bicycles and electric bicycles) in developing countries such as Vietnam, Malaysia, Indonesia, and China, many efficiency and safety problems have emerged in bicycle traffic flow. Although cycling brings significant environmental, climate, congestion, and public health benefits, bicycle crashes are still a serious issue [5]. According to statistical data from the Ministry of Public Security in China [6], the shares of cyclists among deaths and injuries across all travel modes have been increasing, up to around 15% and 17%, respectively. In 2012, nearly 9,000 people died in bicycle traffic crashes in China. Therefore, improving bicycle safety is important and urgent for both traffic engineers and researchers.

Speed is a fundamental measure of the traffic performance of a highway system; it is widely used to describe the condition of the traffic flow and as an input for travel time, delay, and level of service determination [7]. Speed is also an important factor in road safety. Many studies have found that speed not only affects the severity of a crash but is also related to the risk of being involved in a crash [8]. There is a strong relationship between the crash risk of motorized vehicles and speed under free flow conditions [9]. Similar conclusions have been reached for bicycle traffic flow [10]. Therefore, modelling and analysing the factors that affect bicycle free flow speed or crash risk is useful and will provide a basis for improving bicycle traffic safety.

Previous studies on bicycle speed focus on the determination of bicycle free flow speed and speed distribution. Liu et al. [11] reported that the mean observed bicycle free flow speed was approximately 14 kph. Wei et al. [12] reported that the peak-hour free flow speeds of bicycles with and without a separating barrier are 18.2 kph and 13.9 kph, respectively. According to Allen et al. [13], the bicycle free flow speed appears to be somewhere between 10 kph and 28 kph, with the majority of the observations being between 12 kph and 20 kph. Cherry [14] found that free flow speeds in Shanghai were 18.2 kph for electric bicycles and 13.0 kph for classic bicycles; free flow speeds in Kunming were similar, at 17.9 kph for electric bicycles and 12.8 kph for classic bicycles. Lin et al. [15] found that the free flow speeds of electric bicycles and classic bicycles in Kunming were 21.86 kph and 14.81 kph, respectively. Similar results (21.86 kph for electric bicycles and 14.81 kph for classic bicycles) were also found in Hangzhou by Jin et al. [16]. In terms of bicycle speed distribution, Dey et al. [17] proposed a speed distribution curve model under mixed traffic conditions, including both fast-moving vehicles (e.g., cars/jeeps, trucks/buses, two-wheelers, and three-wheelers) and slow-moving vehicles (e.g., bicycles and tractors). Lin et al. [15] used the lognormal distribution to fit heterogeneous bicycle speed data. Wang et al. [18] analysed the impact of various factors on the speed of heterogeneous bicycle flow and used the normal distribution to fit the bicycle speed samples.

Most studies emphasize modelling the relationships between free flow speed and factors such as geometric features, traffic characteristics, traffic control, environmental features, weather conditions, and drivers’ experience and characteristics [19–23]. However, the majority of existing models are only applicable to predicting the speed of cars [23]. The factors that affect free flow speed for motorized vehicles differ significantly from those for bicycle traffic. To the best of our knowledge, little research has focused on modelling the factors that affect bicycle free flow speed. The authors believe that this research would be helpful in evaluating and improving the safety of bicycle traffic flow, particularly at locations with highly heterogeneous bicycle flow.

The contribution of this paper is the development of artificial neural network (ANN) models that predict free flow speed for bicycle traffic while taking several impact factors into account. Four models, namely, Model 1, Model 2, Model 3, and Model 4, were developed considering different categories of contributing factors. The characteristics of the models were analysed and compared. It is expected that the developed models may be useful for future prediction of bicycle free flow speed or crash risk under different cycleway features, traffic conditions, bicycle types, and/or characteristics of cyclists.

2. Data Collection

2.1. Model Parameters

Selection of model parameters is a critical task in modelling the relationship between bicycle free flow speed and its contributing factors. Based on the literature review and analysis of bicycle traffic flow [16], the input parameters of the proposed models can be divided into the following four groups: cycleway features (e.g., cycleway width, pavement condition, and geometric features), traffic conditions (e.g., flow, speed, and density), bicycle types (e.g., electric bicycles), and characteristics of cyclists (e.g., age, gender, and alcohol consumption). Because the cycleways at the survey sites are in good condition and are separated from motorized vehicles by barriers, the pavement conditions and geometric features have little effect on cyclists. Therefore, only cycleway width (CW) was considered in the proposed models. The traffic conditions category includes only bicycle flow per hour per meter (BF). Bicycles in China typically fall into three categories: classic bicycle (CE), bicycle-style electric bicycle (BSEB), and scooter-style electric bicycle (SSEB). The bicycle type parameters hence include the percentage of BSEBs (PBS) and the percentage of SSEBs (PSS). Considering the difficulty of bicycle data collection, four characteristic parameters of cyclists were selected in this paper: percentage of male cyclists (PMC), percentage of young cyclists (PYC), percentage of middle-aged cyclists (PMAC), and percentage of loaded cyclists (PLC). The selected input parameters of the models are listed in Table 1.

2.2. Data Survey

Field bicycle data used in this study were collected from eleven cycleways in Hangzhou, China. The cycleway widths range from 2.27 to 4.60 m. All of the survey sites are straight with low gradients, located at least 100 meters away from an intersection, and separated from the motorized vehicle lanes. Cameras were set up at the roadside of each cycleway to record the operation of bicycle traffic. A video surveillance application recorded the movement of bicycles, from which flow and speed were calculated automatically. The other parameters (e.g., bicycle type and the age and gender of cyclists) were recorded and coded manually. In this paper, bicycle type consists of three categories: CE, BSEB, and SSEB. Cyclists’ genders were easily distinguished and recorded by the investigator. According to their age, cyclists were divided into three groups: the young (under 40), the middle-aged (between 40 and 60), and the elderly (over 60). A loaded cyclist is a cyclist who is carrying something (an object or a person) on his/her bicycle. The descriptive statistics of the model parameters obtained from the collected data are given in Table 1. The average bicycle flow is 590 bicycles/h/m, and the average percentages of BSEB, SSEB, male, young, middle-aged, and loaded cyclists are 16.78%, 53.11%, 65.64%, 64.14%, 28.10%, and 11.19%, respectively. Each parameter has a wide range and is suitable for modelling its relationship with bicycle free flow speed.

2.3. Estimation of Free Flow Speed for Bicycle Traffic

The free flow speed of bicycle flow is the speed of bicycles under low volumes and low densities and is the most important parameter for cycleway capacity estimation, level of service (LOS), and speed limits. Because it is difficult to determine which traffic conditions count as low volume and low density, the 85th percentile speed is usually used as the free flow speed [24]. The 85th percentile speed of bicycles is the speed at or below which 85 percent of cyclists travel, and it is the value most frequently used for speed limit design. The TRB special report also shows that the 85th percentile speed is an important descriptive statistic in evaluating road safety [25]. Therefore, in this study, we use the 85th percentile speed of bicycle flow as the free flow speed and as the crash risk indicator for the evaluation of bicycle safety.
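As an illustration of the 85th percentile estimate, the following minimal sketch computes it from a list of observed speeds. The function name `percentile`, the linear interpolation between ranks, and the sample speeds are assumptions made for this example; the study does not state which percentile convention was used.

```python
import math

def percentile(values, pct):
    """Return the pct-th percentile using linear interpolation between ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Fractional rank of the requested percentile in the sorted sample.
    rank = (len(ordered) - 1) * pct / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight

# Hypothetical spot speeds (kph) observed in one survey interval.
speeds_kph = [12.0, 13.5, 14.0, 15.2, 16.8, 17.5, 18.0, 19.4, 21.0, 23.5, 24.0]
free_flow_speed = percentile(speeds_kph, 85)
```

With eleven observations the fractional rank is 8.5, so the estimate lies halfway between the ninth and tenth sorted speeds.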

3. Artificial Neural Network Models

Artificial neural networks are a family of statistical learning models inspired by biological neural networks and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown [26]. ANN models are widely used in modelling the free flow speed of motorized vehicles [27–31].

3.1. Description of Models

An ANN model can include multiple input variables to predict multiple output variables. In this study, four ANN models were developed with or without considering different input variables such as bicycle types and characteristics of cyclists. The purpose of including different input variables in the modelling of bicycles’ free flow speed and crash risk is to analyse and assess whether or not the selection of input variables will affect the performance of the developed ANN models.

Table 2 lists the four categories of input parameters used for each model. Cycleway features and traffic flow parameters have been shown to have an important effect on free flow speed; hence, these two categories are included in all four models. Model 2 and Model 3 additionally include the input parameters of bicycle types and characteristics of cyclists, respectively, while Model 4 uses both bicycle types and characteristics of cyclists as input variables. The tick marks (✓) in Table 2 indicate that the input parameters are included in the modelling process as input variables. The four ANN models provide selectivity and flexibility in choosing suitable input variables for the prediction of the free flow speed of bicycle flow.
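The input sets described above can be written down as a small lookup structure. This is only a sketch: the names `MODEL_INPUTS` and `select_inputs` are hypothetical, the abbreviations follow Section 2.1, and the sample uses the average values quoted in Section 2.2 together with an assumed cycleway width of 3.5 m.

```python
# Input groups, using the parameter abbreviations defined in Section 2.1.
CYCLEWAY = ["CW"]
TRAFFIC = ["BF"]
BICYCLE_TYPE = ["PBS", "PSS"]
CYCLIST = ["PMC", "PYC", "PMAC", "PLC"]

# Cycleway and traffic inputs are common to all four models.
MODEL_INPUTS = {
    "Model 1": CYCLEWAY + TRAFFIC,
    "Model 2": CYCLEWAY + TRAFFIC + BICYCLE_TYPE,
    "Model 3": CYCLEWAY + TRAFFIC + CYCLIST,
    "Model 4": CYCLEWAY + TRAFFIC + BICYCLE_TYPE + CYCLIST,
}

def select_inputs(sample, model_name):
    """Pick the input vector for one model from a dict of named parameters."""
    return [sample[name] for name in MODEL_INPUTS[model_name]]

# Averages from Section 2.2; the cycleway width of 3.5 m is an assumed value.
sample = {
    "CW": 3.5, "BF": 590.0, "PBS": 16.78, "PSS": 53.11,
    "PMC": 65.64, "PYC": 64.14, "PMAC": 28.10, "PLC": 11.19,
}
model_2_inputs = select_inputs(sample, "Model 2")
```

The resulting vectors have 2, 4, 6, and 8 entries for Models 1 to 4.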

3.2. Network Architectures

A back propagation ANN (BPANN) model, as shown in Figure 1, was introduced for modelling bicycle free flow speed. BPANN model is one of the most well-known ANN models applied in many areas [26]. The goal and motivation for developing the backpropagation algorithm are to find a way to train a multilayered neural network such that it can learn the appropriate internal representations to allow it to learn any arbitrary mapping of input to output.

The three-layer BPANN architecture used in this study is shown in Figure 1. A multilayer BPANN is a layered parallel processing system consisting of an input layer, a hidden layer, and an output layer [32]. In Figure 1, $i$, $h$, and $o$ are the subscripts for the input, hidden, and output layers, respectively. The numbers of input parameters, output parameters, and hidden nodes are $n$, $q$, and $p$, respectively. The numbers of nodes in the input and output layers ($n$ and $q$) correspond to the numbers of input and output variables. The number of nodes in the hidden layer ($p$) is determined by the network designer, taking the numbers of input and output variables into account. The weight factors for the hidden layer and the output layer are $w_{ih}$ and $w_{ho}$, respectively. The values of $n$ and $q$ are problem dependent. In this study, the values of $n$ are 2, 4, 6, and 8 for Models 1, 2, 3, and 4, respectively. The only output parameter is free flow speed, and thus $q$ is one.

The number of nodes in the hidden layer has a significant effect on the performance of BPANN models. According to the previous research, the number of nodes in hidden layer should meet the following conditions: where is an integer between 0 and 10. Because the four models have different numbers of input variables, the same number of hidden nodes is used in all of them so that they can be compared. Therefore, in this paper, the number of nodes in the hidden layer was set to 10 for all models.

3.3. BPANN Algorithm Process

The backpropagation learning algorithm for ANN can be divided into two phases: propagation and weight update. The detailed algorithm processes are listed as follows.

3.3.1. Initialization of BPANN

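The input variables are expressed as $x = (x_1, x_2, \ldots, x_n)$; the input vector in the hidden layer is $hi = (hi_1, hi_2, \ldots, hi_p)$; the output vector in the hidden layer is $ho = (ho_1, ho_2, \ldots, ho_p)$; the input vector in the output layer is $yi = (yi_1, yi_2, \ldots, yi_q)$; the output vector in the output layer is $yo = (yo_1, yo_2, \ldots, yo_q)$; and the expected output result is $d = (d_1, d_2, \ldots, d_q)$. The error function of each output sample is expressed as

$$e = \frac{1}{2} \sum_{o=1}^{q} \left(d_o - yo_o\right)^2,$$

where $e$ is the error of each output sample.

The interconnecting weights are initialized with random numbers. The required precision $\varepsilon$ and the maximum learning number $M$ are set.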

3.3.2. Calculating the Inputs and Outputs in Hidden Layer

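The input and output values in the hidden layer, and the corresponding values in the output layer, can be calculated by the following equations:

$$hi_h = \sum_{i=1}^{n} w_{ih} x_i - b_h, \quad ho_h = f(hi_h), \quad h = 1, 2, \ldots, p,$$

$$yi_o = \sum_{h=1}^{p} w_{ho}\, ho_h - b_o, \quad yo_o = f(yi_o), \quad o = 1, 2, \ldots, q,$$

where $b_h$ and $b_o$ are the critical values (thresholds) of neurons in the hidden layer and in the output layer, respectively, and $f(\cdot)$ is the logarithmic sigmoid transfer function, which is represented by the following equation:

$$f(x) = \frac{1}{1 + e^{-x}}.$$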
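The forward calculation through a three-layer network can be sketched as follows. The function names `logsig` and `forward`, the weight layout, the subtraction of the thresholds from the weighted sums, and the use of the same transfer function in the output layer are assumptions of this example rather than details confirmed by the study.

```python
import math

def logsig(x):
    """Logarithmic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_ih, b_h, w_ho, b_o):
    """Propagate one input vector through a three-layer network.

    w_ih[h][i] links input i to hidden node h; w_ho[o][h] links hidden
    node h to output node o. Thresholds are subtracted from the weighted sums.
    """
    hidden_in = [sum(w * xi for w, xi in zip(row, x)) - b
                 for row, b in zip(w_ih, b_h)]
    hidden_out = [logsig(v) for v in hidden_in]
    output_in = [sum(w * ho for w, ho in zip(row, hidden_out)) - b
                 for row, b in zip(w_ho, b_o)]
    output_out = [logsig(v) for v in output_in]
    return hidden_out, output_out

# Hypothetical 2-2-1 network chosen so that every weighted sum equals zero.
hidden, output = forward(
    x=[1.0, 0.0],
    w_ih=[[1.0, 0.0], [0.0, 1.0]], b_h=[1.0, 0.0],
    w_ho=[[2.0, 2.0]], b_o=[2.0],
)
```

Because the transfer function maps zero to 0.5, both hidden nodes and the single output node return 0.5 in this example.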

3.3.3. Calculating Partial Derivative of Error Function

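The partial derivative of the error function for each neuron in the output layer can be expressed as follows:

$$\frac{\partial e}{\partial w_{ho}} = \frac{\partial e}{\partial yi_o} \frac{\partial yi_o}{\partial w_{ho}} = -\delta_o\, ho_h, \quad \delta_o = \left(d_o - yo_o\right) f'(yi_o).$$

The partial derivative of the error function for each neuron in the hidden layer can also be expressed as follows:

$$\frac{\partial e}{\partial w_{ih}} = \frac{\partial e}{\partial hi_h} \frac{\partial hi_h}{\partial w_{ih}} = -\delta_h\, x_i, \quad \delta_h = \left(\sum_{o=1}^{q} \delta_o w_{ho}\right) f'(hi_h).$$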

3.3.4. Adjustment of Interconnecting Weights

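The adjustment of the interconnecting weights for the hidden layer and the output layer is expressed as

$$\Delta w_{ih} = -\eta \frac{\partial e}{\partial w_{ih}}, \quad w_{ih}^{N+1} = w_{ih}^{N} + \Delta w_{ih},$$

$$\Delta w_{ho} = -\eta \frac{\partial e}{\partial w_{ho}}, \quad w_{ho}^{N+1} = w_{ho}^{N} + \Delta w_{ho},$$

where $\Delta w_{ih}$ and $\Delta w_{ho}$ are the changes of the weight values for the hidden and output layers; $N$ is the number of iterations; and $\eta$ is the learning rate, a parameter that sets the magnitude of the change in the interconnecting weights.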

3.3.5. Calculating the Total Error

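The total error of all training samples can be calculated as

$$E = \frac{1}{2m} \sum_{k=1}^{m} \sum_{o=1}^{q} \left(d_o(k) - yo_o(k)\right)^2,$$

where $k$ is the serial number of training samples, $m$ is the number of training samples, and $d_o(k)$ and $yo_o(k)$ are the expected and calculated outputs of output node $o$ for sample $k$.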

3.3.6. Iteration Termination Conditions

If $E < \varepsilon$ or the number of iterations is larger than the preset maximum learning number $M$, then stop the ANN algorithm and output the results. Otherwise, return to the second step and begin the next learning iteration.
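The six steps can be put together in a compact training routine. The sketch below is written under stated assumptions and is not the MATLAB implementation used in the study: weights are updated after every sample, the learning rate is fixed, the output layer uses the logarithmic sigmoid, thresholds are subtracted from the weighted sums, and the function name `train_bpann`, its default values, and the toy data are hypothetical.

```python
import math
import random

def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bpann(samples, targets, n_hidden, eta=0.5, epsilon=1e-4,
                max_iter=2000, seed=0):
    """Train a three-layer network with per-sample gradient descent.

    Returns (weights, history), where history[j] is the total error
    measured after iteration j.
    """
    rng = random.Random(seed)
    n_in, n_out = len(samples[0]), len(targets[0])
    # Step 1: random initial weights and thresholds in (-1, 1).
    w_ih = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_ho = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    b_o = [rng.uniform(-1, 1) for _ in range(n_out)]

    def forward(x):
        ho = [logsig(sum(w * xi for w, xi in zip(row, x)) - b)
              for row, b in zip(w_ih, b_h)]
        yo = [logsig(sum(w * h for w, h in zip(row, ho)) - b)
              for row, b in zip(w_ho, b_o)]
        return ho, yo

    def total_error():
        squared = 0.0
        for x, d in zip(samples, targets):
            _, yo = forward(x)
            squared += sum((di - yi) ** 2 for di, yi in zip(d, yo))
        return squared / (2 * len(samples))

    history = []
    for _ in range(max_iter):
        for x, d in zip(samples, targets):
            # Step 2: inputs and outputs of the hidden and output layers.
            ho, yo = forward(x)
            # Step 3: error terms; the sigmoid derivative is f * (1 - f).
            delta_o = [(di - yi) * yi * (1 - yi) for di, yi in zip(d, yo)]
            delta_h = [
                sum(delta_o[o] * w_ho[o][h] for o in range(n_out))
                * ho[h] * (1 - ho[h])
                for h in range(n_hidden)
            ]
            # Step 4: adjust weights and thresholds along the negative gradient.
            for o in range(n_out):
                for h in range(n_hidden):
                    w_ho[o][h] += eta * delta_o[o] * ho[h]
                b_o[o] -= eta * delta_o[o]
            for h in range(n_hidden):
                for i in range(n_in):
                    w_ih[h][i] += eta * delta_h[h] * x[i]
                b_h[h] -= eta * delta_h[h]
        # Step 5: total error over all training samples.
        history.append(total_error())
        # Step 6: stop once the required precision is reached.
        if history[-1] < epsilon:
            break
    return (w_ih, b_h, w_ho, b_o), history

# Hypothetical toy data: two scaled inputs and one scaled target per sample.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
targets = [[0.2], [0.4], [0.6], [0.8]]
weights, history = train_bpann(samples, targets, n_hidden=3)
```

Tracking the total error per iteration makes it straightforward to draw a learning performance curve of the kind shown for the four models.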

4. Results and Discussion

The BPANN code was developed in the commercial software MATLAB. The field bicycle data for training, validation, and testing were collected from eleven bicycle paths in Hangzhou, China [33]. In total, 459 samples were collected; 70% of the samples (321 samples) were used for training, 15% (69 samples) for validation, and 15% (69 samples) for testing. Detailed descriptive statistics of the field data can be found in [16] and Table 1. BPANNs with 2-10-1, 4-10-1, 6-10-1, and 8-10-1 architectures were trained and validated for Models 1–4. The trained models were then tested on the 69 samples that were not used in the training and validation stages.
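The sample counts follow from rounding the stated percentages of 459. The sketch below reproduces the split sizes; the function name `split_samples`, the random shuffling, and the seed are assumptions, since the text does not state how samples were assigned to each subset.

```python
import random

def split_samples(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    # The remaining indices form the test set.
    test = indices[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_samples(459)
```

For 459 samples this yields subsets of 321, 69, and 69 indices with no overlap.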

Before training, in order to improve the training performance of the BPANN, it is often useful to scale the field input variables so that they always fall within a specified range. Therefore, in this study, field sample data is normalized in the range by using the following formula:
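Min-max scaling is one common way to bring each input variable into a fixed range. The sketch below leaves the target range as a parameter; the function name `min_max_scale`, the default range of 0 to 1, and the two intermediate widths in the example are assumptions and are not taken from the study.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly map values so the minimum becomes lo and the maximum hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant column")
    span = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * span for v in values]

# Cycleway widths in metres: the end points are the surveyed range,
# the two intermediate values are hypothetical.
widths_m = [2.27, 3.00, 3.50, 4.60]
scaled = min_max_scale(widths_m)
```

The same minimum and maximum would normally be reused to scale the validation and testing samples, so that all three subsets share one mapping.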

The strength of each training, validation, and testing stage was evaluated by calculating the error and the regression coefficient $R$. Learning performance plots of the four BPANN models are shown in Figure 2, and the regression analysis plots of the four models for training, validation, and testing are presented in Figures 3–6.

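Two performance indicators, the mean absolute percentage error (MAPE) and the root mean square error (RMSE), were used for the testing samples [34]. These two indicators are given by the following equations:

$$\mathrm{MAPE} = \frac{1}{T} \sum_{t=1}^{T} \left| \frac{\hat{v}_t - v_t}{v_t} \right| \times 100\%,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left(\hat{v}_t - v_t\right)^2},$$

where $\hat{v}_t$ is the predicted bicycle free flow speed for the $t$th testing sample; $v_t$ is the observed free flow speed for the $t$th testing sample; and $T$ is the number of testing samples.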
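Both indicators are simple to compute from paired predictions and observations. The following sketch uses the standard definitions of MAPE and RMSE; the function names and the four speed pairs are hypothetical and are not results from the study.

```python
import math

def mape(predicted, observed):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((p - o) / o) for p, o in zip(predicted, observed))
    return 100.0 * total / len(observed)

def rmse(predicted, observed):
    """Root mean square error, in the units of the inputs."""
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))

# Hypothetical observed and predicted free flow speeds (kph).
observed_kph = [20.0, 25.0, 16.0, 20.0]
predicted_kph = [21.0, 24.0, 16.0, 18.0]
example_mape = mape(predicted_kph, observed_kph)
example_rmse = rmse(predicted_kph, observed_kph)
```

For these four pairs the MAPE is 4.75% and the RMSE is about 1.22 kph. MAPE is relative to the observed speed, whereas RMSE keeps the speed unit and penalizes large deviations more heavily.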

The correlation coefficients ($R$), MAPEs, and RMSEs of the four models are listed in Table 3, and the observed and predicted free flow speeds are illustrated in Figure 7. From the figure and the table, we have the following findings:

(1) All four BPANN models predict free flow speed with small errors, and the absolute speed differences are less than 2 kph. The results indicate that these models all perform well in predicting the free flow speed. Model 1, which includes the fewest input variables, also performs well.

(2) Model 2 and Model 3 have higher accuracies than Model 1. The inclusion of bicycle types and characteristics of cyclists greatly improves the performance of Model 2 and Model 3 compared to Model 1. Unlike the characteristics of drivers of motorized vehicles, the characteristics of cyclists can be observed and analysed. Model 3 shows that using cyclists’ characteristics as input variables produces a slightly higher accuracy compared to Model 1.

(3) Comparing Model 2 and Model 3, the performance of Model 2 is better than that of Model 3. This implies that bicycle type has a more significant effect on bicycle free flow speed and crash risk than the characteristics of cyclists. Because electric bicycles travel at higher speeds, the free flow speed and bicycle crash risk are significantly correlated with the percentage of electric bicycles. Therefore, the management of electric bicycles and their speed limits is very important for improving the safety of bicycle paths.

(4) Considering both bicycle types and characteristics of cyclists as input categories, Model 4 performs best on the testing dataset. The MAPE and RMSE of the testing data are 4.13% and 1.09 kph, respectively. This model provides a theoretical foundation for analysing the factors that affect the free flow speed and crash risk of bicycle traffic flow.

5. Conclusions

The free flow speed of bicycle traffic flow is a very important parameter for determining the speed limit of a cycleway and evaluating the crash risk of bicycle traffic flow. The BPANN models developed in this paper are expected to be a useful and robust method to help traffic engineers improve the safety of bicycle traffic flow. Four models, with or without considering the impact factors (e.g., bicycle types and characteristics of cyclists), were used to predict the free flow speed and crash risk of heterogeneous bicycle traffic flow. The BPANN models were trained, validated, and tested using MATLAB software. On the testing datasets, the correlation coefficients ($R$) of the four models trained with adaptive learning were 0.72, 0.85, 0.82, and 0.87, respectively. The results imply that the proposed ANN methods have acceptable accuracy in predicting the free flow speed of bicycles, and that including bicycle types and characteristics of cyclists effectively improves the accuracy of the prediction models. The study is limited to predicting the free flow speed from only four categories of factors. Other parameters such as percentage of passing, geometric features, and environmental features may be included in future modelling work.

Conflict of Interests

The authors declare that there is no conflict of commercial or associative interests regarding the publication of this work.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 51278220 and 51208462), the Fundamental Research Funds for the Central Universities (2014QNA4018), the Projects in the National Science & Technology Pillar Program (2014BAG03B05), and the Key Science and Technology Innovation Team of Zhejiang Province (2013TD09).