Abstract

Prediction of bus arrival time is an important part of intelligent transportation systems. Accurate prediction can help passengers make travel plans and improve travel efficiency. Given the nonlinearity, randomness, and complexity of bus arrival time, this paper proposes the use of a wavelet neural network (WNN) model with an improved particle swarm optimization algorithm (IPSO) that replaces the gradient descent method. The proposed IPSO-WNN model overcomes the limitations of the gradient-based WNN, which can easily produce local optimum solutions and stop the training process, and thus improves prediction accuracy. Application of the model is illustrated using operational data of an actual bus line. The results show that the proposed model is capable of accurately predicting bus arrival time, where the root-mean-square error and the maximum relative error were reduced by 49% and 42%, respectively.

1. Introduction

In recent years, with the accelerated pace of China’s urbanization, urban transport problems have become increasingly prominent. Public transport is widely regarded as the best choice to solve these traffic problems and improve the urban environment [1]. The Chinese government proposed the “Give Priority to the Development of Urban Public Transport” policy in 2004 and released the “Give Priority to the Development of Urban Public Traffic Guidance” in 2012. Advanced urban public transport systems are under construction and will continue to improve. Bus arrival time prediction is the core content of such systems for bus travel information and bus travel-route guidance, and it is an important part of the urban public transport system.

At present, there are many models for predicting bus arrival time of public transit, such as nonparametric regression models, support-vector machine (SVM) models, Kalman filters, artificial neural network (ANN) models, and hybrid models. Lin et al. [2] used the historical data mean method to predict the average bus arrival time delay. Patnaik et al. [3] used automatic passenger counts of bus data to establish a prediction model of multivariable regression. Sun et al. [4] proposed a model to predict the arrival time using the weighted mean of historical data and real-time global positioning system (GPS) data. Padmanaban et al. [5] proposed an arrival time prediction model that is based on real-time bus data and bus operation delay. Xue et al. [6] developed a mathematical model based on the analysis of the process of bus operation and bus station characteristics.

He et al. [7] proposed a new bus arrival time prediction model with multi-index evaluation which is based on SVM and verified its feasibility. Yu et al. [8] developed an SVM prediction model considering the time period and segment, weather, and operation time of current and downstream sections. Li [9] developed a prediction model for road-section operation time based on real-time correction of bus speed. Zuo and Wang [10] developed a finite-state machine forecasting model based on real-time GPS data. Shalaby et al. [11] used a Kalman filter to predict bus running time based on GPS data. Chien and Kuchipudi [12] developed a Kalman filter to predict the arrival time at bus station based on road and stop characteristics. Vanajakshi et al. [13] proposed the use of automatic vehicle location (AVL) data and Kalman filter to predict bus arrival time in mixed traffic environments, where model parameters were adjusted in real time according to the prediction error.

Park and Rilett [14] argued that the ANN model can provide better prediction performance than the Kalman filter. Chien et al. [15] proposed an adaptive feedback ANN model based on the operation time of arterial segment and stop station. The model can automatically adjust the parameters according to the real-time prediction error. Lin et al. [16] proposed a two-layer ANN model that considered the effect of time and intersection signal lights, but this model required a large amount of training data. The effect of different weather conditions on bus travel time was analyzed by Bladikas et al. [17].

Hybrid models have also been developed for the analysis of bus operation. Ran [18] proposed a hybrid model that combined multivariable regression and ANN based on real-time bus AVL data. Liu [19] proposed a hybrid prediction model, based on a Kalman filter and ANN, that effectively combined historical and real-time data. Among the existing techniques, ANN has the characteristic of nonlinear adaptive information processing, which provides a great advantage in prediction. In particular, a wavelet neural network (WNN), which combines ANN and wavelet analysis, exhibits good time-frequency localization characteristics and the self-learning function of a neural network. Therefore, WNN has strong abilities of recognition, fault tolerance, and accurate prediction of bus arrival time. However, the traditional WNN has used the gradient descent learning method to correct the weighting parameters, which results in slow training and the possibility of being trapped in a local optimum solution.

To address the preceding issues, this paper proposes a hybrid model of bus arrival time prediction that combines WNN and an improved particle swarm optimization (IPSO) algorithm. The next sections present the IPSO algorithm, the proposed IPSO-WNN model, and its implementation for bus arrival time prediction. Application of the model to an actual case study is then presented, followed by the conclusions.

2. Improved Particle Swarm Optimization Algorithm

2.1. Traditional Particle Swarm Optimization

The traditional particle swarm optimization (PSO) is a stochastic computational intelligence method with a simple structure and only a few parameters to adjust [1]. Similar to other evolutionary algorithms, PSO is initialized with random particles (potential solutions). However, in PSO, each particle is assigned a random velocity and then flies in the N-dimensional space, where its velocity is dynamically adjusted according to the flying experiences of the other particles in the group and its own experience. Through an iterative update of the positions and speeds of the particles, the optimal solution is found. A schematic of the PSO algorithm is shown in Figure 1.

Let the position and speed of particle i of the population in the N-dimensional solution space be expressed as Xi = (xi1, xi2, …, xiN) and Vi = (vi1, vi2, …, viN), respectively. Then, the speed and position of particle i are updated as follows:

Vi(t + 1) = ω·Vi(t) + C1·rand()·(Pbesti − Xi(t)) + C2·rand()·(Nbesti − Xi(t)) (1)

Xi(t + 1) = Xi(t) + Vi(t + 1) (2)

where Vi = speed of particle i, ω = inertia weight, C1 and C2 = learning factors, which refer to the acceleration weights of particles that fly to the individual and group extremums, respectively, rand() = random number between 0 and 1, Pbesti = position of the optimal solution that particle i has found so far (personal best), Xi = position of particle i, and Nbesti = position of the optimal solution that the neighborhood of particle i has found so far (global best).

Appropriate values of C1 and C2 can accelerate convergence and help avoid falling into a local optimum, and a larger Vmax can guarantee the global search ability of the particle population. The coefficients ω, C1, and C2 determine the particle’s capacity to search the space. The preceding PSO is the standard algorithm and is the basis for the improvements developed in this research.
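The standard PSO cycle just described can be sketched as follows. This is a minimal NumPy illustration, not the implementation used in this study; the function name pso_step, the sphere objective, and all parameter values are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(X, V, pbest, nbest, w=0.7, c1=1.5, c2=1.5, v_max=1.0):
    """One application of equations (1) and (2): velocities are pulled toward
    each particle's personal best and the neighborhood best, clipped to Vmax,
    and the positions are then advanced."""
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (nbest - X)
    V = np.clip(V, -v_max, v_max)  # Vmax bounds the step, preserving global search
    return X + V, V

# Toy usage: minimize f(x) = ||x||^2 with 20 particles in 3 dimensions.
def f(X):
    return np.sum(X * X, axis=1)

X = rng.uniform(-5.0, 5.0, (20, 3))
V = np.zeros_like(X)
pbest, pbest_val = X.copy(), f(X)
for _ in range(200):
    X, V = pso_step(X, V, pbest, pbest[np.argmin(pbest_val)])
    val = f(X)
    better = val < pbest_val
    pbest[better], pbest_val[better] = X[better], val[better]
```

Here the neighborhood best is simply the best personal best of the whole swarm, which corresponds to the fully connected (global) topology of the standard algorithm.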

2.2. PSO Algorithm Improvements

In PSO, based on the experiences of the group and the particle’s own experiences, a particle flies toward the best particle, which gives the method a strong global search ability and guides it to better solution areas. However, in the optimization of complex high-dimensional problems, the traditional PSO algorithm has stronger global search ability at the start of the process and stronger local search ability at the end. Therefore, PSO is more likely to become trapped in local optimum solutions at the end. In addition, the search performance of the algorithm depends on the values of its parameters. To address these limitations, two improvements to the traditional algorithm were adopted: (1) an improved subgroup strategy and (2) updated particle velocity and learning factors.

For the subgroup strategy improvement, let the total number of particles N be divided into subgroups of size M (that is, N is a multiple of M). Initialize the particle swarm, calculate the fitness value of each particle, and sort the particles by fitness value from large to small, numbering the sorted particles 1, 2, …, N. With interval i = N/M (the number of subgroups), extract the particle subgroups in turn, so that subgroup j contains the particles {j + i × k | k = 0, 1, …, M − 1}. This process effectively avoids uneven grouping of the subgroups. In addition, the better particles can drive the bad particles in all groups, resulting in a balanced evolution of each subgroup [8].
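The interleaved grouping described above can be sketched as follows. This is an illustrative reading of the subgroup strategy; the function name and the NumPy round-robin slicing are assumptions.

```python
import numpy as np

def partition_subgroups(fitness, group_size):
    """Sort particle indices by fitness from large to small and deal them
    round-robin into N/M subgroups of size `group_size`, so every subgroup
    mixes strong and weak particles instead of clustering the strong ones."""
    n = len(fitness)
    assert n % group_size == 0, "N must be a multiple of M"
    n_groups = n // group_size            # interval i = N/M
    order = np.argsort(fitness)[::-1]     # particle indices, best fitness first
    return [order[j::n_groups] for j in range(n_groups)]
```

For six particles and subgroups of size three, the best, third-best, and fifth-best particles land in one subgroup and the rest in the other, so neither subgroup is uniformly strong or weak.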

For the improvement related to updating the particle velocity and learning factors, the particle velocity update of equation (1) is revised as follows:

Vi(t + 1) = ω·Vi(t) + C1·rand()·(Pbesti − Xi(t)) + C2·rand()·(Nbesti − Xi(t)) + C3·rand()·(NLbesti − Xi(t)) (3)

where C3 = learning factor and NLbesti = position of the optimal solution that the particles of the subgroup have found so far. Then, the position of particle i, Xi, is updated using equation (2). The learning factors vary with the iterations:

C1 = C1s + (C1e − C1s)·t/T, C2 = C2s + (C2e − C2s)·t/T, C3 = C3s + (C3e − C3s)·t/T (4)

where t = current iteration number, T = maximum number of iterations, C1s, C2s, and C3s = corresponding values of C1, C2, and C3 at the start of the algorithm, and C1e, C2e, and C3e = corresponding values of C1, C2, and C3 at the end of the algorithm.

At the start of the algorithm, the value of C1 is larger and the values of C2 and C3 are smaller. This is advantageous to the search of the particles in the whole space and provides a stronger global searching ability. At the end of the algorithm, C1 becomes smaller and C2 and C3 become larger, which helps the particles achieve a strong local searching ability and, in turn, find the global optimal solution.
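If the start-to-end variation of each learning factor is taken to be linear in the iteration count (an assumption consistent with the behavior described above), the schedule can be sketched as:

```python
def learning_factor(t, T, c_start, c_end):
    """Learning factor at iteration t of T, moving linearly from its start
    value c_start to its end value c_end (assumed linear schedule)."""
    return c_start + (c_end - c_start) * t / T
```

With a start value larger than the end value the factor decreases as the iterations proceed, shifting the swarm from global toward local search; a start value smaller than the end value produces the opposite transition.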

The process of the improved PSO algorithm is shown in Figure 2. The specific implementation steps are as follows:

Step 1: initialize the particle swarm. The position and velocity of the initial particles are randomly generated within the specified range, and the Pbesti coordinates of each particle are set to its current position. The optimal particle of each subgroup is the individual with the best value in the subgroup, and NLbesti is set to the current position of that particle. The optimal particle of the entire neighborhood is the best individual among the optimal particles of the subgroups, and Nbesti is set to the current position of that particle.

Step 2: calculate the fitness value of each particle. The current fitness of each particle is compared to the fitness of the best position, Pbesti, that it has experienced; if it is better than the previous value, Pbesti is updated; otherwise, it remains unchanged. The fitness of each particle in this iteration is compared to the fitness of NLbesti experienced by the subgroup in which it is located; if it is better than the previous value, NLbesti is updated; otherwise, it remains unchanged. The fitness of each particle in this iteration is compared to the fitness of the best Nbesti experienced by the whole group; if it is better than the previous value, Nbesti is updated; otherwise, it remains unchanged.

Step 3: update particle speed and position. The speed and position of each particle are updated according to equations (2) and (3).

Step 4: check whether the end condition is met. When the maximum number of iterations is reached or the minimum error is satisfied, the optimal solution is output; otherwise, return to Step 2.

3. Proposed IPSO-WNN Model

As previously mentioned, the proposed model of bus arrival time prediction combines the improved PSO with WNN and is called IPSO-WNN. A description of the WNN technique and the IPSO-WNN model is presented in this section.

3.1. Wavelet Neural Network

The WNN is a mathematical model that combines wavelet analysis and a neural network. It adopts the topology of the backpropagation (BP) neural network but uses a wavelet basis function, instead of the original sigmoid function, as the transfer function of the hidden layer nodes. In other words, the wavelet function is introduced as the transfer function of the BP network. The transfer function of the WNN incorporates shift and scaling factors, allowing a stronger ability for recognition, fault tolerance, and prediction. The WNN structure is shown in Figure 3.

Given the input sample data Xi (i = 1, 2, …, k), the mathematical expression of the hidden layer output is expressed as

h(j) = ψ[(Σ_{i=1}^{k} ωij·Xi − bj)/aj], j = 1, 2, …, l (5)

where h(j) = output value of node j in the hidden layer, ψ = wavelet basis function, ωij = linked weights between the input and hidden layers, bj = shift factor of the wavelet basis function, aj = scaling factor of the wavelet basis function, and l = number of nodes in the hidden layer.

The mathematical expression of the output layer is given by

y(k) = Σ_{j=1}^{l} ωjk·h(j), k = 1, 2, …, m (6)

where y(k) = value of output node k, ωjk = weight between the hidden layer and the output layer, and m = number of nodes of the output layer.

The method of modifying the weights and thresholds of the traditional WNN is similar to the correcting algorithm for BP neural network weights. The gradient correction method constantly corrects the network weights and the thresholds of the wavelet basis function to reduce the gap between the expected and predicted outputs; when the error reaches a specified limit, the correction stops. The WNN correction process involves two steps, as follows:

Step 1: calculate the network prediction error:

e = Σ_{k=1}^{m} [yn(k) − y(k)] (7)

where e = prediction error of WNN and yn(k) = expected output value of node k.

Step 2: correct the weights of WNN and the coefficients of the wavelet according to the network prediction error e, as follows:

ωij^(n+1) = ωij^(n) + Δωij^(n+1), aj^(n+1) = aj^(n) + Δaj^(n+1), bj^(n+1) = bj^(n) + Δbj^(n+1) (8)

where Δωij^(n+1), Δaj^(n+1), and Δbj^(n+1) are calculated based on the network prediction error as follows:

Δωij^(n+1) = −η·∂e/∂ωij^(n), Δaj^(n+1) = −η·∂e/∂aj^(n), Δbj^(n+1) = −η·∂e/∂bj^(n) (9)

where η is the learning rate.
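The forward pass of equations (5) and (6) can be sketched as follows. This is an illustrative NumPy version using the Morlet wavelet of Section 4.2 as ψ; the function names are assumptions.

```python
import numpy as np

def morlet(x):
    """Morlet mother wavelet, used here as the hidden-layer transfer function."""
    return np.cos(1.75 * x) * np.exp(-x ** 2 / 2)

def wnn_forward(x, w_in, b, a, w_out):
    """Equations (5)-(6): input x (n_in,) -> wavelet hidden layer -> linear output.
    w_in: (n_hidden, n_in) input weights, b and a: (n_hidden,) shift and scale
    factors, w_out: (n_out, n_hidden) output weights."""
    h = morlet((w_in @ x - b) / a)   # shifted and scaled wavelet activations
    return w_out @ h                 # linear output layer of equation (6)
```

For the 9-10-1 structure used later in the paper, x has 9 components, the hidden arrays have 10 rows, and the output is a single value.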

3.2. Procedures of the IPSO-WNN Model

The fitness function, which indicates the accuracy of the neural network, is used to evaluate the quality of each particle. The following training error (mean squared deviation) of WNN is chosen as the fitness function of PSO:

fitness = (1/N)·Σ_{i=1}^{N} [y(i) − yn(i)]² (10)

where N = number of training samples and yn(i) and y(i) = expected and actual output values of sample i, respectively. The specific optimization steps are as follows:

(1) Data normalization: normalize the sample data for input and output to produce dimensionless quantities.

(2) Parameter initialization: initialize the parameters of WNN and PSO, such as particle swarm iterations, population size, location, and maximum speed.

(3) Population initialization: randomly initialize the position and velocity of each particle and calculate the initial fitness values according to the fitness function.

(4) Finding initial extremums: determine the individual and group extremums according to the initial particle fitness values.

(5) Iterative optimization: use the PSO algorithm to update the positions and velocities of the particles and the individual and group extremums according to the new fitness values. When the fitness value converges or the specified number of iterations is reached, go to Step 6.

(6) Output optimal weights and thresholds: set the position of the global optimal particle as the optimal WNN weights and thresholds.

(7) Prediction of WNN: use the optimal weights and thresholds to predict the new samples.
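The fitness evaluation of equation (10) can be sketched as follows. This is illustrative only; the flat layout that unpacks a particle's position vector into the network parameters is an assumed encoding, not necessarily the one used in this study.

```python
import numpy as np

def morlet(x):
    # Morlet wavelet used as the hidden-layer transfer function.
    return np.cos(1.75 * x) * np.exp(-x ** 2 / 2)

def wnn_fitness(params, X_train, t_train, n_in=9, n_hidden=10):
    """Mean squared training error (equation (10)) of the WNN whose weights,
    shift factors, and scale factors are packed into the flat vector `params`."""
    i = 0
    w_in = params[i:i + n_hidden * n_in].reshape(n_hidden, n_in)
    i += n_hidden * n_in
    b = params[i:i + n_hidden]; i += n_hidden      # shift factors
    a = params[i:i + n_hidden]; i += n_hidden      # scale factors
    w_out = params[i:i + n_hidden]                 # single-output weights
    pred = np.array([w_out @ morlet((w_in @ x - b) / a) for x in X_train])
    return np.mean((pred - t_train) ** 2)
```

Each particle's position is then scored by this function, so PSO searches directly over the WNN weights and wavelet coefficients instead of applying gradient corrections.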

4. Implementing IPSO-WNN Model for Bus Arrival Time Prediction

Using the improved PSO algorithm, the WNN model was optimized, and the bus arrival time prediction model was coded in Matlab. Details on preparing the input data, processing the input data and selecting the transfer function, and determining the number of hidden layer nodes are described in this section.

4.1. Preparing Input Data

The input data to the proposed model were determined based on the relevant literature [20–22] and historical data on weather, date, time, and real-time bus operation. The bus arrival time at the next stop was selected as the output target. The input data include the sample data vector, training dataset, weather factors, date factors, and time factors.

4.1.1. Sample Data Vector

This vector includes the following nine input variables:

h = (tb1, tb2, tb3, th1, th2, th3, w, d, s) (11)

where tbi = travel times of the three buses ahead of the bus under consideration from stop (k − 1) to stop k in the same time period of the day (i = 1, 2, 3), thi = travel times of the buses whose departure times are in the same period in the previous three weeks from stop (k − 1) to stop k (i = 1, 2, 3), w = weather conditions, d = date factor, and s = period factor.

4.1.2. Training Dataset

The training dataset is D = {(hl, trl)}, l = 1, 2, …, n, where n is the number of training samples and trl = actual operation time of the current bus from stop (k − 1) to stop k.

4.1.3. Weather Factors

Weather conditions of one day are expressed as w ∈ {0, 1, 2}, in which 0 means a rainy day, 1 means a sunny day, and 2 means other weather conditions.

4.1.4. Date Factors

Bus arrival time varies not only between working days and weekends, but also among the seven days of the week. The seven days of the week are expressed as d = 1, 2, …, 7.

4.1.5. Time Factors

The study period of the day (5:00–23:00) was divided into seven time periods, expressed as s = 1, 2, …, 7, where 1 represents 5:00–7:00, 2 represents 7:00–9:00, 3 represents 9:00–11:30, 4 represents 11:30–14:30, 5 represents 14:30–17:00, 6 represents 17:00–19:00, and 7 represents 19:00–23:00.

4.2. Input Data Processing and Transfer Function

The normalized function mapminmax of Matlab used in this study is given by

y = (ymax − ymin)·(x − xmin)/(xmax − xmin) + ymin (12)

where y = normalized data, ymax = 1, ymin = −1, and xmax and xmin = maximum and minimum values of the samples, respectively.
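Equation (12) corresponds to the following minimal re-implementation of mapminmax on a single row of samples (illustrative only):

```python
import numpy as np

def mapminmax(x, y_min=-1.0, y_max=1.0):
    """Equation (12): linearly rescale the samples to [y_min, y_max],
    mirroring the behavior of Matlab's mapminmax on one data row."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min
```

The minimum of the sample maps to −1, the maximum to 1, and intermediate values scale linearly between them.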

For the transfer function, in practice, the Morlet wavelet function is widely used and has achieved good results. This function is a single-frequency cosine modulated by a Gaussian envelope, given by

ψ(x) = cos(1.75x)·exp(−x²/2) (13)

4.3. Determining Number of Hidden Layer Nodes

The structure of the neural network is composed of an input layer, a hidden layer, and an output layer. The number of input layer nodes according to the preceding analysis was identified as 9. The output layer represents bus arrival time as the output value, and therefore, this layer has only one node. Finding the optimum number of nodes in the hidden layer requires many iterations during the training process. A reference formula for selecting the optimal number of nodes in the hidden layer is given by [23]

l = √(n + m) + a (14)

where l = optimum number of nodes in the hidden layer, m = number of nodes in the output layer, n = number of nodes in the input layer, and a = constant (0 to 10).
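The candidate range produced by equation (14) can be checked as follows (illustrative; the function name is an assumption):

```python
import math

def hidden_node_range(n_in, n_out, a_max=10):
    """Candidate hidden-layer sizes from equation (14), l = sqrt(n + m) + a,
    for the integer constants a = 0, 1, ..., a_max, rounded to integers."""
    base = math.sqrt(n_in + n_out)
    return sorted({round(base + a) for a in range(a_max + 1)})
```

With n = 9 input nodes and m = 1 output node, this yields candidate hidden-layer sizes of 3 through 13, matching the range used in this study.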

According to equation (14), the number of hidden layer nodes should be between 3 and 13. The idea of hidden-node optimization is as follows. When the network training is insufficient, increasing the number of hidden layer nodes reduces the training and prediction errors. If the number of hidden layer nodes continues to increase beyond the optimum, however, the prediction error increases again. This behavior can therefore be used to determine whether the number of hidden layer nodes is appropriate.

The training error of the sample represents the error obtained when the training set is taken as the input sample, and the prediction error of the sample is the error obtained when the test dataset is taken as the input sample. In addition, an expected error E can be set, where E is a constant threshold between 0 and 1 (0.5 was selected in this study).

The training and prediction errors of the wavelet neural network when the number of hidden layer nodes is M are expressed as Etr(M) and Epr(M), respectively. When the number of hidden layer nodes is (M − 1), the training and prediction errors are expressed as Etr(M − 1) and Epr(M − 1), respectively. When the number of hidden layer nodes is (M + 1), the training and prediction errors are expressed as Etr(M + 1) and Epr(M + 1), respectively.

Finally, whether M is the best number of hidden layer nodes is determined according to the following formulas:

Etr(M) > E (15)

Epr(M) > Epr(M + 1) (16)

Etr(M) ≤ E (17)

Epr(M) > Epr(M − 1) (18)

Epr(M) ≤ Epr(M − 1) (19)

Epr(M) ≤ Epr(M + 1) (20)

To determine the optimal number of hidden layer nodes, the process is as follows. When the training error with M nodes exceeds the expected error E and adding a node reduces the prediction error, the number of nodes in the current hidden layer is less than the optimal number and should be increased. When the training error satisfies the expected error but removing a node reduces the prediction error, the number of nodes in the current hidden layer is larger than the optimal number and should be decreased. When the prediction error with M nodes is no larger than with either (M − 1) or (M + 1) nodes and the training error is within the expected error E, the current node number M is the optimal number of hidden layer nodes. After several iterations of training, the training and prediction errors were obtained and substituted into equations (15)–(20); the optimal number of hidden layer nodes was found to be 10. Therefore, the structure of the wavelet neural network was finally determined as 9-10-1; that is, 9 nodes in the input layer, 10 nodes in the hidden layer, and 1 node in the output layer.

5. Case Study

The proposed model was applied to bus line 102 in Suzhou city to predict bus arrival times. This bus line runs from Baodai West Road to South Railway Station Square. The route is 12.8 km long and includes 20 bus stops. Bus operational data, which involved 1790 bus trips, were collected in 2019 during May 7–9, May 14–16, May 21–23, and May 27–30 (every week from Thursday to Saturday). The collected data were converted to 34,010 segment operational times (1790 trips × 19 inter-stop segments along the 20-stop route). After data processing, a total of 472 sets of sample data were obtained; 382 groups were selected as the training data, and the remaining 90 groups were used as the test data. Part of the training sample data is shown in Table 1.

5.1. Training Samples

The following input data were assumed: maximum number of iterations of the algorithm = 500, population size = 50, and accelerating factors C1s = 2.5, C2s = 2.0, C3s = 1.5, C1e = 1.5, C2e = 2.0, and C3e = 2.5. The output fitness curves of the WNN and IPSO-WNN models are shown in Figure 4. For the WNN model (Figure 4(a)), the early training speed was slow, and 500 iterations were needed to achieve convergence with an error of 0.05. In contrast, the proposed IPSO-WNN model (Figure 4(b)) converged rapidly within the first 50 training iterations; when the number of iterations was about 310, the solution converged and the error precision reached 0.01. Compared to WNN, the proposed IPSO-WNN model has a faster convergence rate and a smaller convergence error.

5.2. Establishing Bus Arrival Time Prediction Model

As previously mentioned, PSO was used to replace the traditional gradient descent method. In the training process, the weights and thresholds of the WNN were optimized. Based on the optimization results, the parameters of the calibrated IPSO-WNN model were determined. Then, the prediction model of bus arrival time was established as follows:

y = Σ_{j=1}^{10} ωj·ψ[(Σ_{i=1}^{9} ωij·xi − bj)/aj] (21)

where y = predicted bus arrival time at the next stop and ωij, ωj, aj, and bj = optimized weights, scaling factors, and shift factors of the 9-10-1 network.

The calibrated parameter values of the IPSO-WNN model are presented in Table 2. Given the nine input variables shown previously in equation (11), the model yields the predicted bus arrival times for line 102.

5.3. Model Validation

The 90 sets of test samples were normalized, and the optimal network weights and thresholds were then used in the model. The prediction errors are shown in Figures 5 and 6. The root-mean-square error, used to evaluate model accuracy, is given by

RMSE = √[(1/n)·Σ_{i=1}^{n} (yi − ŷi)²] (22)

where RMSE = root-mean-square error, yi = actual value, ŷi = predicted value, and n = number of test samples. The smaller the RMSE, the higher the accuracy. The RMSE values of the WNN and IPSO-WNN models were 20.9% and 10.6%, respectively, indicating that the overall accuracy of the IPSO-WNN model is better. In addition, the results show that the relative error of bus arrival time prediction of the WNN model ranged from 5.2% to 16.8%, while that of the IPSO-WNN model ranged from 3.5% to 9.8%. Thus, the proposed model reduced the maximum relative error of the WNN model by 42% and the RMSE by 49%. Clearly, the proposed IPSO-WNN model has obvious advantages in bus arrival time prediction.
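For completeness, the RMSE of equation (22) corresponds to the following computation (illustrative):

```python
import numpy as np

def rmse(actual, predicted):
    """Root-mean-square error of equation (22)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```

Identical series give an RMSE of zero, and larger deviations between actual and predicted arrival times increase the value, so smaller RMSE means higher accuracy.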

6. Conclusions

This paper has presented an improved particle swarm optimization algorithm that was integrated with a wavelet neural network to predict bus arrival time, resulting in a new IPSO-WNN model. The improvements to the PSO algorithm, which were intended to help find the optimal solution quickly and avoid local optimum solutions, were related to the subgroup strategy and the updating of particle velocity and learning factors. The IPSO algorithm overcomes the limitations of the traditional PSO. The IPSO-WNN model was applied to the prediction of bus arrival time on an actual bus line. Actual bus operational data were used for model training, and the proposed model was run to obtain the best network weights and thresholds. The application results show that the RMSE of the IPSO-WNN model was 10.6%, compared to 20.9% for the traditional WNN. This study has focused on the prediction of bus arrival time using the developed IPSO-WNN model. Future research will continue to improve the PSO algorithm, train it with other common traffic datasets, and compare it with other methods.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was financially supported by the Science and Technology Fund of Education Department of Fujian Province (JAT160079). The assistance of Shutian Xu, Ronglin Su, and Lilin Huang in data collection and analysis is gratefully acknowledged.