Abstract

Short-term traffic flow prediction provides a basis for traffic management and helps travelers make decisions. Accurate short-term prediction is also a necessary condition for the sustainable development of the traffic environment. Although deep learning methods have achieved good accuracy in traffic flow prediction, how to combine multiple deep learning methods so that the combination is more accurate than any single method still leaves room for in-depth research. In this article, a combined deep learning prediction (CDLP) model consisting of two parallel single deep learning models, a CNN-LSTM-attention model and a CNN-GRU-attention model, is established. In the model, a one-dimensional convolutional neural network (1DCNN) extracts local trend features of traffic flow, and RNN variants (LSTM and GRU) with an attention mechanism extract long-term temporal dependency features. Moreover, a dynamic optimal weighted coefficient algorithm (DOWCA) is proposed to calculate the dynamic weights of the CNN-LSTM-attention and CNN-GRU-attention models with the goal of minimizing the sum of squared errors of the CDLP model. Then, the neuron number, loss function, optimization algorithm, and other parameters of the CDLP model are discussed and set through experiments. Finally, the training set and test set for the CDLP model are built from traffic flow data collected in the field. The CDLP model is trained and tested, and the traffic flow prediction results are obtained and analyzed. The results indicate that the CDLP model fits the change trend of traffic flow well. Furthermore, on the same dataset, the CDLP model achieves higher prediction accuracy than the baseline models.

1. Introduction

With economic development, the number of motor vehicles in urban areas has increased rapidly, and traffic congestion and traffic accidents have become increasingly serious. To mitigate urban traffic problems, intelligent transportation systems have been widely implemented [1–4]. Short-term traffic flow prediction is one of the core parts of an intelligent transportation system: it provides the basis for traffic management, traffic control, and traffic guidance and also supports travelers’ decision-making. The prediction of short-term traffic flow has long been a hot topic for scholars in the field of traffic engineering.

For short-term traffic flow forecasting, early research mainly focused on statistical learning methods based on traditional mathematical models. Under the assumption of a certain probability distribution, the parameters of the statistical forecasting model are estimated through theoretical inference, and the model’s forecasting results have strong explanatory power. The traditional methods mainly include Kalman filter models, time series models, and nonparametric regression models.

Okutani and Stephanedes [5] proposed two prediction models based on the Kalman filter theory to predict the traffic flow of streets in Nagoya. In the models, the newest prediction error and the traffic data of multiple adjacent road sections are considered to improve the prediction accuracy. Xie et al. [6] used the discrete wavelet decomposition method to denoise the traffic flow data and then established a Kalman filter model to predict the traffic flow, which reduced the interference of local noise on the original data and obtained better prediction results. Guo et al. [7] proposed an adaptive Kalman filter model, which uses the adaptive update method of variance to improve the parameters of the model, and verified that the prediction accuracy of the model is better than the traditional Kalman filter model through a large amount of highway traffic data. Emami et al. [8] proposed a fade memory Kalman filter model based on real-time data from the Internet of vehicles and Bluetooth detectors. This model considers the influence of weights and reduces the errors caused by the measurement method. Experiments show that the model can improve the accuracy of the forecast data.

The autoregressive integrated moving average (ARIMA) model is widely used in traffic flow prediction. Ahmed and Cook [9] investigated the ARIMA model for representing freeway time series data and found that ARIMA was more accurate than moving average and double-exponential smoothing models. Hamed et al. [10] applied the ARIMA model to forecast traffic volume on urban arterials; it turned out to be the most adequate model for reproducing all original time series and is computationally tractable. In addition to the ARIMA model, variants such as the autoregressive integrated moving average model with explanatory variables and the seasonal autoregressive moving average model have also been applied in the field of traffic flow forecasting [11, 12].

The K-nearest neighbors (KNN) method does not require complex prior knowledge and precise function expressions. It has the advantages of a simple algorithm and good portability and has been applied in the field of traffic flow prediction. Zhang et al. [13] used the mean KNN and weighted KNN to establish traffic flow prediction models and comparative analysis was made. Cheng et al. [14] proposed an adaptive spatiotemporal KNN model, which comprehensively considers spatiotemporal weights, time windows, and other parameters, and simulation results demonstrated that the prediction effect of traffic flow has been further improved. The core content of the KNN is to design an appropriate search mechanism, and its prediction results rely on historical data. When the historical data are large, the search efficiency of this method will have a greater impact on the real-time performance of the prediction model.

The basic idea of the support vector machine (SVM) method for traffic flow prediction is to map the original traffic flow data to a high-dimensional feature space through a kernel function and to find a linear separating hyperplane in the mapped space, so that nonlinear problems in traffic flow data can be solved. Yang et al. [15] proposed a short-term traffic flow prediction model based on spatiotemporal correlation and adaptive multicore SVM to address the nonlinearity and randomness of traffic flow. Luo et al. [16] used least squares SVM to predict traffic flow, with a hybrid optimization algorithm to select the optimal parameters; the experimental results show that the model improves prediction ability and computational efficiency. Tang et al. [17] proposed a traffic flow prediction model that combines denoising schemes and SVM algorithms to improve prediction accuracy. The results show that the model outperforms the one without a denoising strategy. In addition to the traditional SVM model, variant SVM algorithms, such as seasonal SVM [18], which considers the seasonality of traffic data, and Online-SVR [19], which deals with special events, have also been applied in traffic flow prediction with good results.

The development and wide application of traffic information collection technologies, such as inductive detectors, geomagnetic detectors, radio frequency identification, radar detection, video detection, and floating car detection [20–24], provide a large amount of data for traffic flow prediction. At the same time, with the rapid development of artificial intelligence technology, deep learning, which has powerful data feature mining and nonlinear data fitting capabilities, has been successfully applied in many fields, such as image processing and speech recognition [25–27], and is gradually being used in traffic parameter forecasting [28–31].

Moreover, the focus of traffic flow forecasting research has also shifted from traditional statistical learning forecasting methods and shallow neural networks [32–34] to deep learning forecasting methods. Shallow neural networks, which have only a single hidden layer, cannot learn the deeper features of traffic flow data, and their prediction accuracy is often lower than that of deep learning networks. Deep learning methods have therefore been gradually applied to the field of traffic flow prediction.

The deep belief network (DBN) is one of the earlier deep learning methods used for traffic flow prediction. Huang et al. [35] designed a combined prediction model with an unsupervised DBN at the bottom layer and a multitask learning layer at the top layer for supervised prediction. The multitask learning layer can make full use of the weight sharing in the DBN and improve the prediction results. Koesdwiady et al. [36] incorporated weather conditions and traffic flow data into the feature space at the same time and designed a DBN for unsupervised pretraining; data from San Francisco were used in experiments to verify the effectiveness of the proposed method. Xu and Jiang [37] proposed a DBN-support vector regression model for short-term traffic flow, in which the DBN learns the internal characteristics of traffic flow and support vector regression predicts the traffic flow. Experiments show that the model can effectively predict traffic flow with good accuracy. Han and Huang [38] proposed a traffic flow prediction model combining a DBN and a kernel extreme learning classifier, in which the internal characteristics of traffic flow data are extracted by the DBN and the kernel extreme learner is used to predict traffic flow. Experiments show that the model can improve the accuracy of traffic flow prediction and reduce simulation time.

The convolutional neural network (CNN) is also a typical deep learning structure. It is a feedforward neural network suited to data with a grid-like structure, and it can accurately extract data features while reducing the complexity of the model. This efficient local feature extraction capability helps capture the spatial correlation between traffic flow data, so CNN is widely used in traffic flow prediction [39]. Zhang et al. [40] proposed a short-term traffic flow prediction model based on CNN, in which a spatiotemporal feature selection algorithm determines the optimal input data time lags and amounts of traffic flow data; then, CNN learns these spatiotemporal features. The effectiveness of the model was verified by comparing the prediction results with actual traffic data. An et al. [41] proposed a fuzzy-based CNN traffic flow prediction model, in which the fuzzy approach is used to represent the features of traffic accidents. The experimental results show that the model has superior performance. Liu et al. [42] proposed a CNN-attention model to predict traffic speed. Experimental results show that the model has a great advantage in traffic flow prediction and that the impact of different temporal and spatial traffic flow data can be identified by visualizing the weights generated by the attention model. Peng et al. [43] proposed a spatial-temporal incidence dynamic graph recurrent CNN to predict urban traffic passenger flow, and experiments show that the predictive performance of this network is superior to that of traditional predictive methods.

The LSTM network is a deep learning structure and a variant of the recurrent neural network (RNN). RNN can be applied to the forecasting of time series data [44]. However, RNN suffers from the vanishing gradient problem, which can be overcome by LSTM [45]. LSTM has been applied in the field of traffic flow prediction. Ma et al. [46] applied LSTM to establish a traffic speed prediction model. The results show that the LSTM network effectively captures the time correlation and nonlinearity of the traffic state, and the prediction accuracy is better than that of most statistical methods. Zhao et al. [47] proposed a traffic forecast model based on LSTM considering temporal-spatial correlation in traffic systems. The results validate that the model can obtain better prediction performance compared with other representative forecast models. Tian et al. [48] proposed a multiscale smoothing method to fill in the missing values in traffic flow data and established an LSTM model to predict traffic flow. Experiments show that the LSTM model has better prediction performance than other prediction methods. Zhao et al. [49] established an LSTM model to predict traffic flow speed and validated that the prediction accuracy is higher than that of the support vector regression prediction method. Wang et al. [50] constructed an LSTM encoding and decoding model based on the attention mechanism for time series prediction, which includes a periodic mode and a recent time mode. Experiments show that the model is effective and reliable in long-term prediction of time series.

In addition, combination algorithms for traffic flow prediction, especially deep learning algorithms, have received more attention from scholars and produced a series of achievements. Zhou et al. [51] combined LSTM and SVR to build a model for short-term traffic flow prediction, in which a genetic algorithm is used to optimize the parameters of SVR. The results indicate that the prediction model has higher accuracy than LSTM and CNN. Zhang et al. [52] proposed a model for short-term traffic forecasting, which integrates a graph convolution operator and a residual LSTM structure. The model is evaluated on a traffic speed dataset and better prediction results than six baselines are obtained. Li et al. [53] developed a deep learning-based method, including CNN and LSTM, for real-time movement-based traffic volume prediction at signalized intersections. In the model, CNN is applied to learn the spatial features of traffic volume and LSTM to learn the temporal dependencies. Xia et al. [54] proposed a distributed LSTM weighted model combined with a time window and normal distribution to enhance the prediction capability for traffic flow. Furthermore, the experimental results indicate that the model achieved accuracy improvement.

In summary, deep learning methods have been widely applied to short-term traffic flow prediction and have achieved a series of results. Moreover, the literature reviewed above shows that combining multiple deep learning methods, such as CNN and LSTM, can improve the performance of the prediction model. LSTM is a variant of RNN that can capture the time series characteristics of traffic flow. GRU, another variant of RNN, can also capture these characteristics and be used for traffic flow prediction [55, 56]. Combined models of LSTM and GRU for predicting traffic flow parameters have been discussed and applied in [57, 58], where they showed outstanding prediction accuracy and stability. In both studies, LSTM and GRU are connected in series: either LSTM first learns the spatial-temporal characteristics of the data and GRU then predicts the traffic parameters, or LSTM first predicts a value and an encoder built from GRUs then further captures the relationship between the input sequence and the output sequence. However, such a sequential combination of LSTM and GRU does not use the advantages of the two simultaneously so that they complement each other, and it also lacks CNN’s guidance on the local trend of traffic flow. It is therefore necessary to study traffic flow prediction with a combination of the three deep learning methods. In addition, the attention mechanism [59] improves the feature extraction capability of deep learning by imitating human vision to assign weights to data features, and it has been widely used in image processing and speech recognition [60–63]. Applying it to CNN, LSTM, and GRU models for traffic flow prediction is also worthy of discussion.

In this article, a DOWCA is presented, and a combined prediction model with CNN, LSTM, GRU, and attention mechanism for short-term traffic flow is proposed and discussed. The main contributions of this study are as follows:
(1) In order to build a combined traffic flow prediction model, a dynamic optimal weighted coefficient algorithm (DOWCA) is proposed, in which the weights of each single prediction method are recalculated dynamically as new prediction results are added.
(2) A combined deep learning model for short-term traffic flow prediction, namely CDLP, is established based on CNN, LSTM, GRU, and the attention mechanism. It consists of a CNN-LSTM-attention model and a CNN-GRU-attention model in parallel. In CDLP, the dynamic weights for the two single models are calculated by DOWCA.
(3) After the parameters are set through experimental comparison and analysis, the CDLP model is trained and tested using traffic flow data from the field. The results indicate that the CDLP model outperforms baseline models.

The rest of the article is organized as follows. In Section 2, the methodologies of CNN, LSTM, GRU, and attention mechanism are introduced. In Section 3, a DOWCA is proposed and the CDLP model is constructed. In Section 4, the experiment results and analysis are presented. Finally, a brief conclusion and recommendations for future work are presented in Section 5.

2. Methodology

2.1. CNN

CNN is a feedforward neural network with a deep structure, mainly composed of convolutional layers, pooling layers, and fully connected layers [64]. Among them, the convolutional layer is the most important part of CNN: it uses a convolution kernel to carry out a convolution over the data from the input layer and outputs the convolutional features of the data. If the CNN model contains multiple convolutional layers, the number of feature parameters output by the convolutional layers is large. In order to reduce the number of parameters, a pooling layer is often used to subsample the convolutional features, which keeps part of the information and helps prevent the model from overfitting. The fully connected layer is usually used at the end of the CNN model to reduce unnecessary feature loss; it integrates all features and computes the final output.
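
As a minimal illustration of the convolution and pooling steps described above, the following NumPy sketch applies one kernel to a short series, passes the result through ReLU, and max-pools it. The series, the kernel values, and the function names are hypothetical; in a trained CNN the kernel is learned from data. Like most deep learning libraries, the sketch slides the kernel over the input without flipping it.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """Slide `kernel` over the 1-D series `x` (no padding, stride 1)."""
    # Each output value is the dot product of the kernel with one window of x.
    windows = np.lib.stride_tricks.sliding_window_view(x, len(kernel))
    return windows @ kernel + bias

def relu(z):
    """Keep positive responses, set negative ones to zero."""
    return np.maximum(z, 0.0)

def max_pool1d(z, size=2):
    """Non-overlapping max pooling; values that do not fill a window are dropped."""
    n = (len(z) // size) * size
    return z[:n].reshape(-1, size).max(axis=1)

flow = np.array([10.0, 12.0, 15.0, 14.0, 11.0, 9.0, 13.0])  # made-up flow values
kernel = np.array([-1.0, 0.0, 1.0])  # hypothetical kernel that responds to a rising trend

features = relu(conv1d_valid(flow, kernel))  # local trend features
pooled = max_pool1d(features, size=2)        # subsampled features
```

Pooling halves the number of features passed to the next layer, which is the parameter reduction the paragraph refers to.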

2.2. LSTM Network

LSTM is a variant structure of RNN, which can mitigate the vanishing and exploding gradient problems of RNN and is better suited to time series prediction. The LSTM network is composed of a series of basic cells. The basic cell structure is shown in Figure 1, which includes three gate structures: input gate, output gate, and forget gate.

The orange lines in Figure 1 represent the input gate. The main function of the input gate is to control the input of all information at time t. The information input process mainly includes two parts. One part updates the current time information through the tanh function to obtain a new candidate state vector, and the other part superimposes the current input and the hidden-layer output at the previous time through the sigmoid function. The specific implementation process can be expressed as follows:

$$\begin{aligned} i_t &= \sigma\left(W_i X_t + U_i h_{t-1} + b_i\right), \\ \tilde{C}_t &= \tanh\left(W_c X_t + U_c h_{t-1} + b_c\right), \end{aligned} \tag{1}$$

where $W_i$, $W_c$, $U_i$, and $U_c$ are the weights of the input gate; $b_i$ and $b_c$ are the biases of the input gate; and $\sigma$ and $\tanh$ are activation functions, whose formulas are as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}. \tag{2}$$

The red lines in Figure 1 represent the forget gate, whose main function is to determine the redundant information to be discarded in the unit. The input of the forget gate includes the input $X_t$ and the output $h_{t-1}$ of the unit at the previous time. The output process is shown in formula (3):

$$f_t = \sigma\left(W_f X_t + U_f h_{t-1} + b_f\right), \tag{3}$$

where $W_f$ and $U_f$ are the weights of the forget gate and $b_f$ represents the bias of the forget gate.

The forget gate uses the sigmoid function to superimpose the input values Xt and ht − 1, and the output value is limited to the range of [0, 1]; finally, the output value is multiplied by the output unit state Ct − 1 at the previous moment. When the output value is 0, it means that the information will be completely discarded. When the output value is 1, it means that the information will be completely retained.

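The outputs of the forget gate and the input gate are multiplied by the previous cell state and the candidate state, respectively, and the two products are added to obtain the current cell state. The specific calculation process is as follows:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \tag{4}$$

where $f_t$ is the output of the forget gate, $i_t$ is the output of the input gate, $\tilde{C}_t$ is the candidate state, and $\odot$ denotes element-wise multiplication.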

It can be seen from this formula that Ct represents the long-term memory of all historical information at the current moment.

The purple lines in Figure 1 represent the output gate. The output gate determines the output of the entire basic cell, which is related to the cell state $C_t$ at the current moment. First, the sigmoid function processes the cell inputs to obtain the output $O_t$ of the output gate, and then the tanh function processes the information in $C_t$. The two results are multiplied to obtain the final output $h_t$. The specific calculation formula is as follows:

$$\begin{aligned} O_t &= \sigma\left(W_o X_t + U_o h_{t-1} + b_o\right), \\ h_t &= O_t \odot \tanh\left(C_t\right), \end{aligned} \tag{5}$$

where $W_o$ and $U_o$ are the weights of the output gate and $b_o$ is its bias.
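
The three gates described in this subsection can be collected into a single time step. The sketch below is a plain NumPy illustration, not the implementation used in the experiments: the dictionary layout, the parameter names (`W_*` act on the input, `U_*` on the previous hidden state), the layer sizes, and the random initial weights are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; `p` maps parameter names to arrays."""
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])    # input gate
    c_hat = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # candidate state
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])    # forget gate
    c_t = f_t * c_prev + i_t * c_hat                                 # new cell state
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])    # output gate
    h_t = o_t * np.tanh(c_t)                                         # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4  # hypothetical sizes
params = {}
for g in "icfo":  # input gate, candidate, forget gate, output gate
    params[f"W_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
    params[f"U_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    params[f"b_{g}"] = np.zeros(n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in [0.2, 0.5, 0.4]:  # a short, made-up normalized flow sequence
    h, c = lstm_step(np.array([x]), h, c, params)
```

Because the cell state is updated additively and scaled by the forget gate, a forget-gate value near 1 carries information across many steps, which is the long-term memory role of the cell state.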

2.3. GRU Network

Similar to LSTM, GRU is also a variant structure of the RNN algorithm, and it also has the function of dealing with the problem of gradient disappearance in RNN and ineffective long-term sequence memory. Compared with LSTM, GRU reduces the complexity of the structure by reducing the gates in the architecture. The cyclic structure of GRU consists of two gate structures, an update gate (purple lines) and a reset gate (red lines), and its cell structure is shown in Figure 2.

The update gate zt determines how much of the memory from the previous time and how much of the information at the current time is kept, and it passes the retained information on to future time steps so that long-term dependence is captured across the whole network. The reset gate rt mainly captures short-term dependence: it controls how the hidden state ht−1 at the previous moment is combined with the current input value xt and decides how much past information is forgotten.

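Formulas (6)–(9) represent the calculation process of each state within each time step in the GRU cell:

$$z_t = \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right), \tag{6}$$

$$r_t = \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right), \tag{7}$$

$$\tilde{h}_t = \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right), \tag{8}$$

$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t, \tag{9}$$

where $W_z$, $W_r$, and $W_h$ are input-related weight matrices; $U_z$, $U_h$, and $U_r$ are cyclically connected weight matrices; and $b_z$, $b_r$, and $b_h$ are the related biases.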
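
A single GRU time step can be sketched in the same way. This is an illustrative NumPy version under stated assumptions: the parameter names and sizes are hypothetical, and the final interpolation uses the convention in which the update gate weights the new candidate state (some libraries swap the roles of `z_t` and `1 - z_t`).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU time step; `p` maps parameter names to arrays."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])            # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])            # reset gate
    h_hat = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])  # candidate
    # Assumed convention: z_t weights the candidate, (1 - z_t) the old state.
    return (1.0 - z_t) * h_prev + z_t * h_hat

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 3  # hypothetical sizes
params = {}
for g in "zrh":  # update gate, reset gate, candidate state
    params[f"W_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
    params[f"U_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    params[f"b_{g}"] = np.zeros(n_hidden)

h = np.zeros(n_hidden)
for x in [0.2, 0.5, 0.4]:  # a short, made-up normalized flow sequence
    h = gru_step(np.array([x]), h, params)
```

In this formulation the GRU has three sets of weights where an LSTM cell has four, so for equal input and hidden sizes it has roughly three quarters as many parameters.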

2.4. Attention Mechanism

The attention mechanism focuses on important information by assigning different weights to input features. Focusing on important information therefore amounts to calculating weights: the more important a piece of information is, the larger the weight it receives. When the attention mechanism is applied in a deep learning model, the context vector and the weights are calculated as follows.

The output hidden state of the deep learning model is supposed as , and the context vector Ct can be calculated as follows:

In formula (10), is the weight for hi, and the sum of the weights is 1. It can be calculated as follows:where et,i is an alignment model, and its calculation formula is as follows:where , Ua, and ba are the network parameters of deep learning model and can be calculated as follows:where (·) denotes the deep learning network.

Based on formula (13), the output of the attention mechanism is expressed as follows:where softmax is activation function.
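
The weighting step can be illustrated with a small NumPy sketch. It assumes a common additive scoring function, `v · tanh(W h_i + b)`, which is not necessarily the alignment model used in this article; the names `W_score`, `b_score`, and `v_score` are hypothetical. What the sketch does share with the text is the structure: one score per hidden state, a softmax that turns scores into weights summing to 1, and a context vector that is the weighted sum of the hidden states.

```python
import numpy as np

def softmax(e):
    e = e - np.max(e)  # shift for numerical stability; the result is unchanged
    w = np.exp(e)
    return w / w.sum()

def attention_context(H, W_score, b_score, v_score):
    """H has shape (T, d): one hidden state per time step.

    Returns the attention weights (T,) and the context vector (d,).
    """
    scores = np.tanh(H @ W_score.T + b_score) @ v_score  # one alignment score per step
    alpha = softmax(scores)                               # non-negative, sums to 1
    context = alpha @ H                                   # weighted sum of hidden states
    return alpha, context

rng = np.random.default_rng(0)
T, d, k = 5, 4, 3  # hypothetical sequence length, hidden size, scoring size
H = rng.normal(size=(T, d))
alpha, context = attention_context(
    H, rng.normal(size=(k, d)), np.zeros(k), rng.normal(size=k)
)
```

If every score is equal, the weights become uniform and the context vector reduces to the plain average of the hidden states; training moves the weights away from this baseline toward the more informative time steps.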

3. Model

3.1. Dynamic Optimal Weighted Coefficient Algorithm

Compared with a single prediction model, a combined prediction model can comprehensively utilize the advantages of multiple prediction models, improve prediction accuracy, and offer better robustness. In the combined prediction model, the calculation of the weighted coefficient of each single prediction model is the key. Generally, the optimal weighted coefficient algorithm (OWCA) is used, in which the weighted coefficient of each single prediction method is calculated with the goal of minimizing the sum of squared errors of the combined prediction [65–68]. The calculation principle is as follows.

Suppose there are m prediction methods; the prediction value of the ith method at time t is $y_{it}$, where i = 1, 2, …, m; t = 1, 2, …, N. Then, the prediction error of the ith prediction method can be expressed by the following:

$$e_{it} = y_t - y_{it}, \tag{15}$$

where $y_t$ is the actual value at time t.

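Let $l_1, l_2, \ldots, l_m$ be the weighted coefficients of the m prediction methods, respectively, with $\sum_{i=1}^{m} l_i = 1$. The prediction result of the combined prediction method, labeled as $\hat{y}_t$, can be calculated as follows:

$$\hat{y}_t = \sum_{i=1}^{m} l_i y_{it}, \tag{16}$$

and the prediction error $e_t$ of the combined prediction method at time t, relative to the actual value $y_t$, can be obtained:

$$e_t = y_t - \hat{y}_t = \sum_{i=1}^{m} l_i e_{it}. \tag{17}$$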

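Let J represent the sum of squared errors of the combined prediction method; then the problem of solving the optimal weights can be expressed as the following optimization model:

$$\min J = \sum_{t=1}^{N} e_t^{2} = \sum_{t=1}^{N} \left( \sum_{i=1}^{m} l_i e_{it} \right)^{2} \quad \text{s.t.} \quad \sum_{i=1}^{m} l_i = 1. \tag{18}$$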

Formula (18) can be expressed in matrix form as follows:

$$\min J = L^{T} E L \quad \text{s.t.} \quad R^{T} L = 1, \tag{19}$$

where $L = (l_1, l_2, \ldots, l_m)^{T}$ represents the weighted coefficient column vector; $R = (1, 1, \ldots, 1)^{T}$ represents the m-dimensional column vector with all elements equal to 1; and E is the combined prediction information error matrix, $E = (E_{ij})_{m \times m}$, with $E_{ij}$ expressed as follows:

$$E_{ij} = e_i^{T} e_j = \sum_{t=1}^{N} e_{it} e_{jt}, \tag{20}$$

where $e_i$ represents the prediction error column vector of the ith single prediction method, $e_i = (e_{i1}, e_{i2}, \ldots, e_{iN})^{T}$.

If the prediction error vectors of the m prediction methods are linearly independent, then the combined prediction information error matrix E is invertible. According to the Lagrange multiplier method [69], the optimal solution of model (18) can be obtained as follows:

$$L^{*} = \frac{E^{-1} R}{R^{T} E^{-1} R}, \tag{21}$$

where $L^{*}$ is the optimal weight vector, namely, the optimal weighted coefficients of the m prediction methods.
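
For two prediction methods (m = 2), which is the case used later for the two single models, minimizing $L^{T} E L$ subject to $R^{T} L = 1$ has a short closed form. The following worked example is an illustration only; the numerical entries of E are made up.

```latex
% Two-model case: E is symmetric with entries E_{11}, E_{22}, E_{12}.
E = \begin{pmatrix} E_{11} & E_{12} \\ E_{12} & E_{22} \end{pmatrix},
\qquad
E^{-1} R = \frac{1}{E_{11}E_{22} - E_{12}^{2}}
\begin{pmatrix} E_{22} - E_{12} \\ E_{11} - E_{12} \end{pmatrix}.

% Normalizing so that the weights sum to 1:
l_1^{*} = \frac{E_{22} - E_{12}}{E_{11} + E_{22} - 2E_{12}},
\qquad
l_2^{*} = \frac{E_{11} - E_{12}}{E_{11} + E_{22} - 2E_{12}}.

% Illustrative numbers: E_{11} = 4, E_{22} = 9, E_{12} = 1.
l_1^{*} = \tfrac{8}{11}, \qquad l_2^{*} = \tfrac{3}{11},
\qquad
J^{*} = l_1^{*2} E_{11} + 2 l_1^{*} l_2^{*} E_{12} + l_2^{*2} E_{22}
      = \tfrac{35}{11} \approx 3.18 .
```

In this example the combined sum of squared errors (about 3.18) is smaller than that of the better single method ($E_{11} = 4$), and the method with the smaller historical error receives the larger weight. Nothing in the model constrains the weights to be non-negative, so a weight can become negative when $E_{12}$ exceeds $E_{11}$ or $E_{22}$.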

According to the OWCA and the historical prediction errors of each single prediction method, the optimal weighted coefficient of each single prediction method can be obtained so as to carry out the combined prediction. In the OWCA, the weighted coefficient of each single prediction method is fixed. However, in the prediction of time series such as traffic flow, the number of prediction results of each single prediction method grows as time advances. More importantly, the prediction errors of each single prediction method also vary. If the weighted coefficient of each single prediction method is fixed, it cannot reflect the influence of the newly added prediction results on the combined forecast, which also affects the accuracy of the combined forecasting results.

Therefore, based on the optimal weighted coefficient algorithm, a dynamic optimal weighted coefficient algorithm, namely, DOWCA, is proposed. In the DOWCA, as time advances and the amount of historical prediction error data increases, the weighted coefficient of each single prediction method, namely, the dynamic weighted coefficient, is recalculated by the OWCA. The dynamic weighted coefficients are applied to each single prediction method and the combined prediction results are obtained. The whole process of the DOWCA is shown in Figure 3, and the pseudocode of DOWCA is shown in Algorithm 1.

Algorithm 1: Pseudocode of DOWCA.
Input: the predicted values $y_{it}$ of the single models at time $t$ and the actual data $y_t$
Output: the combined prediction value $\hat{y}_t$
(1) begin
(2)   calculate the prediction error $e_{it}$ of the $i$th prediction method by equation (15)
(3)   for each time $t$ do
(4)     construct the combined prediction information error matrix $E$ by equation (20)
(5)     $R \leftarrow (1, 1, \ldots, 1)^{T}$
(6)     calculate the inverse matrix $E^{-1}$
(7)     calculate the optimal weights by equation (21)
(8)     calculate the combined prediction result by equation (16)
(9)     output $\hat{y}_t$
(10)  end
(11) end

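
The loop in Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the `warmup` parameter, and the sample numbers are hypothetical, and the sketch assumes that the weights for time t are computed only from errors observed before t, so that the actual value at t is not needed to form the prediction at t.

```python
import numpy as np

def owca_weights(errors):
    """Optimal weights for historical errors of shape (m, n), one row per model."""
    E = errors @ errors.T          # E[i, j] = sum over time of e_i * e_j
    R = np.ones(E.shape[0])
    x = np.linalg.solve(E, R)      # E^{-1} R without forming the inverse explicitly
    return x / (R @ x)             # normalize so the weights sum to 1

def dowca_predict(preds, actual, warmup=2):
    """Combine single-model predictions with weights recomputed at every step.

    preds: (m, N) predictions of m models; actual: (N,) observed values.
    Entries before `warmup` are NaN because E is singular with too little history.
    """
    m, N = preds.shape
    combined = np.full(N, np.nan)
    for t in range(warmup, N):
        errors = actual[:t] - preds[:, :t]   # assumption: only errors before time t
        weights = owca_weights(errors)
        combined[t] = weights @ preds[:, t]
    return combined

# Made-up example with two models.
actual = np.array([10.0, 12.0, 11.0, 13.0, 14.0])
preds = np.array([
    [9.0, 13.0, 11.5, 12.0, 14.5],
    [11.0, 11.5, 10.0, 13.5, 13.0],
])
combined = dowca_predict(preds, actual, warmup=2)
```

`np.linalg.solve` raises an error when E is singular, which corresponds to the linear-independence condition on the error vectors; at least as many historical steps as models are needed before the first weights can be computed.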
3.2. Combined Deep Learning Prediction Model

CNN has the ability to obtain local trend features of data sequences, while LSTM and GRU have the ability to obtain long-term dependent features of data sequences. At the same time, the attention mechanism can make the deep learning model pay attention to important features. Based on this, a combined deep learning prediction model with CNN, LSTM, GRU, and DOWCA is designed for traffic flow prediction, namely, CDLP model.

In the CDLP model, CNN, LSTM, and attention are connected sequentially to form a sequential combination structure, named the CNN-LSTM-attention model, which is one single traffic flow prediction model in the CDLP model. Likewise, CNN, GRU, and attention are connected sequentially and named the CNN-GRU-attention model, which is the other single traffic flow prediction model in the CDLP model. Then, the two sequential combination structures are arranged in parallel and combined by DOWCA. From a layer standpoint, the CDLP model has three layers: input layer, hidden layer, and output layer. The hidden layer includes four layers: CNN layer, LSTM and GRU layer, attention layer, and dropout layer. The whole structure of the CDLP model is shown in Figure 4.

The input layer of the CDLP model is the processed traffic flow data sequence, including the training set and test set, which is fed simultaneously to the two parallel CNN layers in the hidden layer of the CDLP model.

The hidden layer of CDLP includes two CNN layers, LSTM and GRU layers, two attention layers, and two dropout layers in sequence, arranged as two parallel branches. For the CNN layer, because traffic flow data are periodic and sequential, 1DCNN is used and the output of 1DCNN is computed by the ReLU activation function. The formula of ReLU is as follows:

$$\mathrm{ReLU}(x) = \max(0, x).$$

For the LSTM and GRU layers, if too many network layers are selected, the computation of the entire network will be large and more training time will be needed. According to [70], when both the accuracy of the prediction model and the training time are considered, two LSTM network layers are suitable, so two network layers are selected in LSTM. Similarly, two network layers are selected in GRU. The input of the first LSTM and GRU network layer is the local trend features extracted by 1DCNN, and its output is the state of the neural units of the current LSTM and GRU layer. The second LSTM and GRU network layer mines the characteristics of the data and outputs the hidden layer state to the attention layer.

About the attention layer, the input state comes from LSTM and GRU. Correspondingly, (·) in formula (13) denotes LSTM and GRU.

The last layer in the hidden layer, the dropout layer, is designed to prevent the occurrence of overfitting after the attention layer, which is the output from the hidden layer of CDLP to the output layer. Moreover, the input of the dropout layer is the output , …, , from the attention layer.

The CDLP model aims to predict the traffic flow at the next moment based on the historical data. Therefore, the output layer includes two parallel neural units, which are the outputs of the two single models, the CNN-LSTM-attention model and the CNN-GRU-attention model, respectively. The two output neural units are fully connected with the dropout layer. In addition, the output layer of the CDLP model also includes the weight calculation for the outputs of the CNN-LSTM-attention model and the CNN-GRU-attention model, in which the DOWCA is used. Finally, the traffic flow prediction values and the dynamic optimal weights for the outputs of CNN-LSTM-attention and CNN-GRU-attention are obtained.

4. Experiment

4.1. Data Processing and Dataset

The traffic flow data at the intersection of Jiangxi Road and South Fuzhou Road in Qingdao, China, were collected by inductive detectors and used as the dataset for the verification of the CDLP model. The original dataset contains three consecutive months of traffic flow data for each entrance road segment at the intersection, from February 15, 2019, to May 15, 2019. The statistical time interval of these data is 5 minutes, and a total of 25920 pieces of data are obtained.

First, the abnormal and missing data in the original data are processed, in which the abnormal data are regarded as missing data. The Lagrangian interpolation method is used to process the missing data. In the process, four adjacent data before and after the missing datum are selected for interpolation to ensure the reliability of the interpolation data.
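
The interpolation step can be sketched as below. The sketch reads "four adjacent data before and after" as four valid points on each side of the gap, which is an interpretation rather than something the text states explicitly; the function name `lagrange_fill`, the parameter `k`, and the sample series are hypothetical.

```python
import numpy as np

def lagrange_fill(series, idx, k=4):
    """Estimate series[idx] from up to k valid neighbours on each side (NaN = missing)."""
    n = len(series)
    # Nearest valid positions to the left and to the right of the gap.
    left = [j for j in range(idx - 1, -1, -1) if not np.isnan(series[j])][:k]
    right = [j for j in range(idx + 1, n) if not np.isnan(series[j])][:k]
    xs = np.array(sorted(left + right), dtype=float)
    ys = np.array([series[int(j)] for j in xs])
    # Lagrange form: sum_j y_j * prod_{m != j} (x - x_m) / (x_j - x_m), evaluated at x = idx.
    total = 0.0
    for j in range(len(xs)):
        others = np.delete(xs, j)
        total += ys[j] * np.prod((idx - others) / (xs[j] - others))
    return total

# Made-up series with one missing value (the true value would be 16).
series = np.array([0.0, 1.0, 4.0, 9.0, np.nan, 25.0, 36.0, 49.0, 64.0])
filled = series.copy()
for idx in np.flatnonzero(np.isnan(series)):
    filled[idx] = lagrange_fill(series, idx)
```

With eight neighbours the interpolating polynomial has degree up to seven, which reproduces smooth data exactly but can oscillate on noisy counts, so a smaller `k` may be worth comparing in practice.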

Then, the Min-Max method is used to normalize the data, and the calculation formula is as follows:

$$y' = \frac{y - y_{\min}}{y_{\max} - y_{\min}},$$

where $y_{\min}$ and $y_{\max}$ are the minimum and maximum values of traffic flow, respectively, and $y$ and $y'$ are the traffic flow data before and after being normalized, respectively.
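A minimal sketch of Min-Max normalization and its inverse is given below. The function names are illustrative, and the guard against a constant series is an added assumption.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return the minimum and maximum for later inversion."""
    y_min, y_max = min(values), max(values)
    if y_max == y_min:
        raise ValueError("cannot normalize a constant series")
    normalized = [(y - y_min) / (y_max - y_min) for y in values]
    return normalized, y_min, y_max


def denormalize(normalized, y_min, y_max):
    """Map normalized values back to the original traffic flow scale."""
    return [v * (y_max - y_min) + y_min for v in normalized]
```

The inverse mapping is what allows predictions made on the normalized scale to be reported in the original traffic flow units.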

The normalized data are divided into the training set and test set. The data from February 15, 2019, to May 1, 2019, are used as the training set, and the dataset from May 2, 2019, to May 15, 2019, is the test set.
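The sizes implied by this split can be checked with a short calculation. The sketch assumes a single complete series with one record per 5-minute interval and date ranges that include both endpoints; the variable names are illustrative.

```python
from datetime import date

SAMPLES_PER_DAY = 24 * 60 // 5  # 288 five-minute intervals per day


def days_inclusive(start, end):
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


train_days = days_inclusive(date(2019, 2, 15), date(2019, 5, 1))
test_days = days_inclusive(date(2019, 5, 2), date(2019, 5, 15))
train_samples = train_days * SAMPLES_PER_DAY
test_samples = test_days * SAMPLES_PER_DAY
total_samples = train_samples + test_samples
```

Under these assumptions the training set covers 76 days and the test set 14 days, and the total agrees with the 25920 records reported for the dataset.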

4.2. Experimental Environment and Selection of Evaluation Indicators

The hardware and software conditions in the experimental environment of this article are shown in Table 1.

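In order to evaluate the traffic flow prediction performance of the CDLP model, three evaluation indicators are selected: MAPE, MAE, and RMSE. Their calculation formulas are as follows:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%,$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$

where $n$ is the total number of samples in the test set, $y_i$ is the actual value of the $i$th sample, and $\hat{y}_i$ is the predicted value of the $i$th sample.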
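The three indicators can be computed as in the sketch below, which follows their standard definitions. The function names are illustrative, and the MAPE function assumes that no actual value is zero.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / n


def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
```

MAPE is scale-free, while MAE and RMSE are expressed in the units of the traffic flow data, and RMSE penalizes large errors more heavily than MAE.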

4.3. CDLP Model Parameter Setting

4.3.1. Loss Function

The loss function quantifies how far the outputs of a neural network are from the target values it is trained on. The mean absolute error function and the mean square error function are commonly used as loss functions. Because the mean square error function is convenient to calculate, it is selected as the loss function of the CDLP model, and the calculation formula is as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,$$

where $y_i$ is the actual value, $n$ is the total number of samples, and $\hat{y}_i$ is the prediction value.

4.3.2. The Neuron Number in the CDLP Model

The neuron numbers of the input layer and hidden layer should be set before the model is trained (the number of neurons in the output layer has been determined in Section 3.2). The following is the process of setting the number of neurons in the input layer and the hidden layer.

In order to obtain the appropriate neuron number of the input layer, 6, 12, 18, and 24 are selected, respectively, to train the model, and the optimal neuron number is obtained through error analysis of the test set. Similarly, for the setting of the neuron number of LSTM layers and GRU layers, four numbers of 16, 32, 64, and 128 are selected, respectively, to train the model. Moreover, the optimal neuron number is determined through the error analysis of the test set.

Regarding the error analysis of the test set, MAPE is selected as the main evaluation indicator, while MAE and RMSE are used as auxiliary evaluation indicators. The evaluation indicator results of the test set under different neuron numbers in input and LSTM layers are obtained, which include the MAPE, MAE, and RMSE, as shown in Table 2.

From Table 2, it can be seen that when the neuron number of the input layer is set to 12 and the neuron numbers of the two LSTM layers are set to 128 and 128, respectively, the MAPE, MAE, and RMSE of the model on the test set are all the smallest. This indicates that, among the settings tested, these neuron numbers for the input layer and hidden layer give the best training effect. Moreover, the neuron numbers of the two GRU layers are set the same as those of the LSTM layers, i.e., 128 and 128, respectively.
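If the 12 input-layer neurons are read as the 12 most recent 5-minute observations (an interpretation assumed here, not stated explicitly in this section), the training samples can be built with a sliding window. The sketch below is illustrative, and the function name and its `window` parameter are hypothetical.

```python
def make_windows(series, window=12):
    """Split a series into (input window, next value) pairs for one-step-ahead prediction."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets
```

With a window of 12 and a 5-minute interval, each sample uses the preceding hour of traffic flow to predict the next interval, and a series of N records yields N - 12 samples.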

4.3.3. Optimization Algorithm

In the training process of a deep learning model, an optimization algorithm iteratively updates the model parameters to reduce the loss function value, so that training tends to stabilize as the number of iterations increases. Commonly used optimization algorithms include RMSProp and Adam. Both algorithms are applied to train the CDLP model, and the better one is selected as the optimization algorithm according to the prediction results. After the CDLP model is trained with the RMSProp algorithm and the Adam algorithm, respectively, the results of the three evaluation indicators are obtained and shown in Table 3.

It can be found from Table 3 that when the Adam algorithm is used to train the CDLP model, the MAPE, MAE, and RMSE are all lower than those obtained with the RMSProp algorithm. This indicates that the Adam algorithm is more effective than the RMSProp algorithm for this model, so it is selected as the optimization algorithm of the CDLP model.

4.3.4. Other Parameters

In the 1DCNN layer, the convolution operation is implemented by convolution kernels, and 64 convolution kernels with a size of 2 × 1 are used, i.e., filters = 64, size = 2. In the dropout layer, the dropout rate is set to 20%. In addition, the number of training epochs is set to 500, and the batch size is set to 128.
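To make the effect of a size-2 kernel concrete, the sketch below applies a single one-dimensional kernel to a short sequence. It assumes no padding and a stride of 1, which this section does not specify, and it omits the activation function; the function name and the example kernel are illustrative. The layer described above applies 64 such kernels in parallel, each with learned weights.

```python
def conv1d_valid(sequence, kernel, bias=0.0):
    """Slide one kernel over the sequence without padding (stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so this is technically a cross-correlation.
    """
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, sequence[i:i + k])) + bias
        for i in range(len(sequence) - k + 1)
    ]


# A kernel of size 2 with weights [-1, 1] yields the change between adjacent values,
# which is one simple example of a local trend feature.
local_trend = conv1d_valid([1, 2, 4, 7], [-1, 1])
```

With a kernel of size 2 and no padding, an input of length 12 produces an output of length 11 per kernel.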

4.4. Results and Analysis

After the above model parameters are determined, the CDLP model is trained and tested with the designed training set and test set. In order to verify the advantages of the CDLP model, the prediction results of the single CNN-LSTM-attention model and the single CNN-GRU-attention model are also extracted during the training and testing of the CDLP model. Figure 5 shows the loss function curves of the training set and test set for CNN-LSTM-attention and CNN-GRU-attention. Figure 6 shows the prediction results of the CDLP model for the test set.

From Figure 5(a), it can be seen that the training loss of the CNN-LSTM-attention model decreases rapidly and steadily as the number of iterations increases and finally reaches a stable state. The test loss fluctuates initially, then quickly approaches the training loss and stabilizes. Figure 5(b) shows similar behavior for the CNN-GRU-attention model: the training loss decreases rapidly and steadily before stabilizing, and the test loss gradually approaches the training loss after initial fluctuations and then remains stable. These loss function curves indicate that the design of the CNN-LSTM-attention and CNN-GRU-attention networks in the CDLP model is reasonable.

Figure 6 compares the traffic flow predicted by the CDLP model with the actual values (top figure) and shows the error of the predicted traffic flow (bottom figure). From the top figure, it can be found that the CDLP model fits the change trend of traffic flow very well, indicating that the model learns the temporal characteristics of the traffic flow series and achieves good prediction. From the bottom figure, it can be seen that the overall errors remain stable and that most of them lie between -20 and 20. Based on the error of the predicted traffic flow, the MAPE curve is obtained and shown in Figure 7. The MAPE curve first rises quickly to its maximum value, then decreases quickly, and gradually becomes stable, finally tending to 5.12%. This shows that the CDLP model is robust and yields a small error, further confirming that it predicts traffic flow well.

In order to further verify the prediction effect of the CDLP model, Figure 8 compares the absolute errors of the traffic flow values predicted by the CDLP model, the CNN-LSTM-attention model, and the CNN-GRU-attention model. Figures 8(a) and 8(b) show the traffic flow prediction errors of the three models for the first week and the second week of the test set, respectively. As can be seen from the figure, the fluctuation range of the prediction error curve of the CDLP model is the smallest, followed by CNN-LSTM-attention and then CNN-GRU-attention. This indicates that the prediction accuracy of the CDLP model is better than that of either CNN-LSTM-attention or CNN-GRU-attention and shows the advantage of the combined model over a single model.

Meanwhile, several baseline models published in recent years, namely LSTM, GRU, CNN, CNN-LSTM, CNN-GRU, CNN-LSTM-attention, and CNN-GRU-attention, are used to verify the accuracy of the CDLP model. The evaluation indicators of the CDLP model and the baseline models, i.e., MAPE, MAE, and RMSE, are shown in Table 4, and their training times are shown in Table 5. It can be seen from Table 4 that the evaluation indicators of the CDLP model are the smallest among all models, which shows that its prediction accuracy is the best. Moreover, it can be found from Table 5 that the training time of the CDLP model is as long as that of the CNN-LSTM-attention model, but its prediction accuracy is higher. The training time of the CNN model is the shortest, but its prediction accuracy is the lowest, so the robustness of the CDLP model is relatively high.

In addition, according to the DOWCA, the weights of CNN-LSTM-attention model and CNN-GRU-attention model in the CDLP model are calculated, as shown in Figure 9.

Figure 9 shows that the weights of CNN-LSTM-attention and CNN-GRU-attention are dynamic and constantly changing, which indicates that the two methods produce different prediction results for the same traffic flow data. Moreover, it can be seen from Figure 9 that the changes in the weights of the two models are large at the beginning, gradually decrease, and eventually become stable, which reflects the convergence, and hence the feasibility, of the dynamic weighted coefficient algorithm. Furthermore, the weights of the CNN-LSTM-attention model are greater than those of the CNN-GRU-attention model, indicating that the prediction accuracy of the CNN-LSTM-attention model is higher than that of the CNN-GRU-attention model, which is consistent with the results in Table 4.

5. Conclusion and Future Work

Traffic flow prediction is an important part of the intelligent transportation system. In this article, a dynamic optimal weighted coefficient algorithm for combined prediction models, namely, DOWCA, is presented. Furthermore, based on CNN, LSTM, GRU, and DOWCA, a combined deep learning model for short-term traffic flow prediction, namely, the CDLP model, is proposed. The structure of the CDLP model, with an input layer, a hidden layer, and an output layer, is designed. As a combined model, the CDLP model includes two parallel single models, i.e., the CNN-LSTM-attention model and the CNN-GRU-attention model. The parameters of the CDLP model, including the loss function, the neuron number, and the optimization algorithm, are determined by experiment.

The data from a field intersection are collected, and the dataset for the CDLP model is obtained through abnormal and missing data processing and normalization and is then divided into the training set and test set. The CDLP model is trained and tested. The results show that the CDLP model is feasible and can predict traffic flow with high accuracy. Moreover, in order to further verify the performance of the established model, the baseline models are used to predict the traffic flow on the same dataset and with the same parameter settings as the CDLP model. The analysis of the prediction results shows that the accuracy of the CDLP model is higher than that of the baseline models. In addition, DOWCA is validated as a means of dynamically obtaining the optimal weighted coefficients for CNN-LSTM-attention and CNN-GRU-attention in the CDLP model.

The structure of the CDLP model is designed and its parameters are set in this article. However, some parameters, for example, the numbers of nodes in the input layer and hidden layer, are obtained through experiments whose candidate values are based on past choices of short-term traffic flow parameters. How to optimize the parameters in a combined deep learning model needs to be further studied. Furthermore, traffic flow prediction involves several parameters, and the deep learning structures based on the combinatorial algorithm can be expanded to multidimensional input variables, such as traffic speed and occupancy.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by China Postdoctoral Science Foundation Funded Project (2019M652437), the Scientific Research Foundation of Shandong University of Science and Technology for Recruited Talents (2019RCJJ014), Shandong Postdoctoral Innovation Project (201903030), and Key Research and Development Project of Shandong Province (2019GGX101008).