#### Abstract

Traffic flow forecasting is key to an intelligent transportation system (ITS). Current deep-learning methods for short-term traffic flow forecasting still leave room for improvement in accuracy and computational efficiency. This paper therefore proposes GA-TCN, a short-term traffic flow forecasting model in which a genetic algorithm (GA) optimizes a temporal convolutional network (TCN). The prediction error is used as the fitness value, and the genetic algorithm optimizes the filters, kernel size, batch size, and dilations hyperparameters of the TCN to find the prediction model with the best fitness. The model was tested on the public dataset PEMS. The mean absolute error of the proposed GA-TCN was 34.09%, 22.42%, and 26.33% lower than that of LSTM, GRU, and TCN on working days, and 24.42%, 2.33%, and 3.92% lower on weekend days, respectively. The results indicate that the proposed model has better adaptability and higher prediction accuracy in short-term traffic flow forecasting than the existing models. The proposed model can provide important support for the formulation of dynamic traffic control schemes.

#### 1. Introduction

Urban road resources are limited in time and space, and the continuous growth of motor vehicle ownership increases the pressure on urban traffic. Meanwhile, residents tend to choose similar travel routes and travel times, which aggravates traffic congestion. Urban traffic accidents, traffic congestion, and environmental pollution have drawn wide concern from governments and traffic professionals. To improve the efficiency of traffic operation, many countries are committed to developing intelligent transportation systems based on communication, sensor, computer, and other technologies [1, 2]. Traffic flow data is important for intelligent transportation system analysis. Historical traffic flow data can be used to predict future traffic flow, so that the traffic management department can intervene in advance to ease congestion and reduce accidents. Numerous studies have proposed and optimized traffic flow forecasting methods, extracted traffic flow information, mined the underlying patterns of traffic flow data, and provided theoretical support for intelligent transportation [3].

Most of the early research on traffic flow forecasting is based on linear theoretical models, such as the historical average model (HA), the autoregressive integrated moving average model (ARIMA) [4] and its variants [5, 6], and the Markov chain (MC) [7]. These models assume that the conditional variance of the time series is static, an assumption that does not hold in real traffic conditions. Later, a variety of data-driven traffic flow forecasting algorithms were developed based on machine learning methods, such as the Bayesian network [8], K-nearest neighbor [9], and the support vector machine (SVM) [10]. However, these algorithms cannot adapt to dynamic traffic, because they cannot capture the nonlinear characteristics of complex traffic flows and cannot handle large structured and unstructured traffic flow datasets.

In recent years, the timeliness, accuracy, and reliability of traffic data have improved significantly thanks to the rapid development of big data and intelligent transportation technology and the continuous improvement of traffic facilities [11–13]. Deep neural network models, which can process high-dimensional data and mine nonlinear features, have gained significant attention, and deep-learning prediction models have gradually become the mainstream in traffic flow forecasting [14, 15]. The convolutional neural network (CNN) [16] and the recurrent neural network (RNN) [17] are two popular networks widely used by researchers [18]. Because its recurrent mechanism applies the same operation to each element of a sequence, the RNN is well suited to sequential processing over time and is therefore widely used to model the temporal characteristics of traffic flow. However, the RNN is susceptible to vanishing gradients and has difficulty capturing long-term temporal characteristics. To address the RNN’s long-term dependence problem, Ma et al. proposed a long short-term memory (LSTM) network, compared it with traditional prediction methods, and verified the accuracy and generalization of the LSTM prediction model [19]. Considering the temporal and spatial correlation of traffic flow, Zhao et al. established a new traffic prediction model based on an LSTM network with multiple memory cells [20]. To reduce the computational complexity, Wu et al. proposed the gated recurrent unit (GRU) network based on the LSTM, reducing the three gate functions of the LSTM to two [21]. The GRU performed slightly better owing to its simpler gating structure and relatively low computational complexity. These neural network models use recurrent feedback between the input sequence and the time steps to incorporate temporal dependence naturally, and they show better performance.

The performance of a single neural network model is limited. In order to further improve the adaptability of the prediction model, the LSTM, the GRU, and the other methods have been combined to form a composite model in recent years [22]. Wu et al. used the LSTM to extract the characteristics of the time structure, combined with the ResNet to optimize the overall model, and reduced the occurrence of gradient disappearance or explosion in network degradation [23]. Sun et al. proposed a GRU neural network model, called SSGRU, based on the linear regression coefficient for road network traffic [24].

Although the RNN and the improved models based on it have made substantial progress in traffic flow forecasting, these models still suffer from low accuracy and low training efficiency. Therefore, to capture the time dependence of traffic flow data more efficiently and improve the applicability of short-term traffic flow forecasting, this paper proposes a traffic flow forecasting model based on a temporal convolutional network optimized by a genetic algorithm. Taking the prediction error as the fitness value, the genetic algorithm determines the optimal hyperparameter combination of the TCN and thereby improves its prediction accuracy.

The rest of this paper is organized as follows. Section 2 introduces temporal convolutional networks. In Section 3, the GA-TCN model, a temporal convolutional network optimized by a genetic algorithm, is proposed. The proposed model is verified experimentally in Section 4. Finally, Section 5 concludes the paper.

#### 2. Temporal Convolutional Networks

The temporal convolutional network (TCN) [25], derived from the CNN, can directly use convolution to extract features across time steps. As a result, the TCN performs better than both the LSTM and the GRU in many sequential tasks. The main structure of a TCN consists of causal convolution for sequence modeling, and dilated convolution and residual modules for memorizing historical data (see Figure 1).

##### 2.1. Causal Convolution

Causal convolution is a strictly time-constrained model: the value at time *t* in one layer depends only on the values at time *t* and earlier in the layer below it.

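Given the filter $F = (f_1, f_2, \ldots, f_K)$ and the sequence $X = (x_1, x_2, \ldots, x_T)$, the causal convolution at $x_t$ is

$$(F * X)(x_t) = \sum_{k=1}^{K} f_k \, x_{t-K+k}.$$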
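As a minimal sketch of the causal convolution just described, the following pure-Python function computes each output from the current and earlier inputs only. The function name `causal_conv1d`, the zero-padding on the left edge, and the sample values are assumptions made for illustration; they are not taken from the authors' implementation.

```python
def causal_conv1d(x, f):
    """Causal 1-D convolution: out[t] uses only x[t], x[t-1], ..., x[t-K+1].

    Positions before the start of the sequence are treated as zeros
    (left zero-padding, an assumption), so the output has the length of x.
    """
    K = len(f)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(K):
            idx = t - (K - 1) + k   # f[K-1] multiplies x[t], f[0] multiplies x[t-K+1]
            if idx >= 0:
                acc += f[k] * x[idx]
        out.append(acc)
    return out


flow = [10.0, 12.0, 15.0, 11.0, 9.0]     # made-up flow values
print(causal_conv1d(flow, [0.5, 0.5]))   # [5.0, 11.0, 13.5, 13.0, 10.0]
```

Changing a later input, for example `flow[4]`, leaves all earlier outputs untouched, which is the property that makes the operation usable for forecasting.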

##### 2.2. Dilated Convolution

Dilated convolution allows the input of convolution to have interval sampling, and the sampling rate is controlled by dilations. The higher the level is, the larger the size of *d* will be, which can make the size of the effective window grow exponentially with the number of layers.

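Given the filter $F = (f_1, f_2, \ldots, f_K)$ and the sequence $X = (x_1, x_2, \ldots, x_T)$, the dilated convolution at $x_t$ with dilation *d* is

$$(F *_d X)(x_t) = \sum_{k=1}^{K} f_k \, x_{t-(K-k)d}.$$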
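The effect of the dilation *d* can be illustrated with a short pure-Python sketch. The function name `dilated_causal_conv1d`, the left zero-padding, and the input values are illustrative assumptions, not part of the paper.

```python
def dilated_causal_conv1d(x, f, d):
    """Dilated causal convolution: filter taps are spaced d time steps apart.

    With d = 1 this reduces to an ordinary causal convolution.
    Inputs before the start of the sequence are treated as zeros (assumption).
    """
    K = len(f)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(K):
            idx = t - (K - 1 - k) * d   # every tap lies at or before time t
            if idx >= 0:
                acc += f[k] * x[idx]
        out.append(acc)
    return out


x = [float(v) for v in range(1, 9)]               # 1.0 .. 8.0, made-up input
print(dilated_causal_conv1d(x, [1.0, 1.0], d=1))  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
print(dilated_causal_conv1d(x, [1.0, 1.0], d=4))  # [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0, 12.0]
```

With the same two-tap filter, *d* = 4 reaches four steps back instead of one, which is how stacking layers with growing *d* widens the effective window without adding filter weights.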

##### 2.3. Residual Module

The training becomes extremely difficult with the increase of network depth mainly because in the process of network training based on stochastic gradient descent, the multilayer backpropagation of error signals can easily cause the phenomenon of “gradient dispersion” or “gradient explosion.” A residual network (ResNet) [26] can well solve the problem of difficult training caused by network depth.

In the TCN, the simple connections between the layers are replaced by the residual structures shown in Figure 2. In addition, the input $x$ is transformed by a 1 × 1 convolution (1 × 1 Conv), because $x$ and the residual-branch output $F(x)$ cannot be added directly when their channel numbers differ.
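The channel-matching role of the 1 × 1 convolution can be sketched in a few lines of pure Python. The data layout (a list of channels, each a list of time steps), the names `conv1x1` and `residual_add`, and the weights are hypothetical; the activation that normally follows the addition is omitted for brevity.

```python
def conv1x1(x, w):
    """1x1 convolution: mix channels independently at every time step.

    x: list of C_in channels, each a list of T values.
    w: C_out x C_in weight matrix given as a list of rows.
    """
    T = len(x[0])
    return [[sum(row[c] * x[c][t] for c in range(len(x))) for t in range(T)]
            for row in w]


def residual_add(x, fx, w=None):
    """Return shortcut(x) + fx.

    The shortcut is x itself when channel counts match, and a 1x1
    convolution of x (weights w) when they differ.
    """
    shortcut = x if len(x) == len(fx) else conv1x1(x, w)
    return [[a + b for a, b in zip(s_ch, f_ch)]
            for s_ch, f_ch in zip(shortcut, fx)]


x = [[1.0, 2.0, 3.0]]                      # 1 input channel, 3 time steps
fx = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]    # branch output with 2 channels
w = [[1.0], [2.0]]                         # hypothetical 1x1 weights (2 x 1)
print(residual_add(x, fx, w))              # [[1.5, 2.5, 3.5], [3.0, 5.0, 7.0]]
```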

#### 3. GA-TCN Prediction Model

##### 3.1. Model Structure

A TCN, being based on a convolutional neural network, can extract the temporal features of traffic flow across time steps. In this paper, the TCN is selected as the traffic flow forecasting network mainly because of its simple structure and strong performance on sequence modeling tasks. The prediction accuracy of the TCN is closely related to the values of its hyperparameters. The filters, kernel size, batch size, and dilations need to be optimized within the search space to determine the best hyperparameter combination.

This paper uses the genetic algorithm to optimize the TCN hyperparameters. The traffic flow time series data is used as the input, the prediction error is used as the fitness value, and the predicted traffic flow of the next stage is taken as the output matrix. The TCN is trained iteratively, adjusting its weights adaptively, to obtain the optimal solution in the search space. The structure of the GA-TCN traffic flow forecasting model is shown in Figure 3.

##### 3.2. GA Optimization Module

A genetic algorithm (GA) is a computational model of biological evolution that simulates the natural selection and genetic mechanisms of Darwinian evolution [27]. It searches for the optimal solution by simulating the natural evolution process: the target problem is mapped onto a process of biological evolution, new populations are generated through crossover, mutation, replication, and other operations, and solutions with low fitness are eliminated. In this paper, the traditional GA is modified to work with the temporal convolutional network. The main steps of the modified optimization process are as follows.

(1) In population initialization, each chromosome is set to the same length. A chromosome that falls short of the required length is padded with zeros. The chromosome is not binary coded; instead, elements are only exchanged between different genes.

(2) The mean absolute error (MAE) of the TCN is used as the fitness function:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,$$

where $y_i$ and $\hat{y}_i$ are the true and predicted traffic volumes and $n$ is the number of samples.

(3) Individuals are selected for crossover and mutation. The modified crossover function has two steps. First, identify the positions on the two chromosomes (A and B) that are to be exchanged; then traverse the genes of both chromosomes at these positions. If the gene at a position is zero on either chromosome, or the gene to be swapped belongs to the dilations layers, the swap at that position is cancelled. Because the dilation generally increases by a factor of 2 from layer to layer, only the filters, kernel size, batch size, and number of dilations layers are mutated.

(4) If the fitness function target value reaches the optimal value, move to the next step; otherwise, return to step 3.

(5) Obtain the fitness target value and the best parameters.

(6) Calculate the MAE of the prediction based on the best parameters.

(7) If the required number of population iterations has been reached, the calculation stops, and the global optimal hyperparameter combination [filters, kernel size, batch size, and dilations] of the TCN is obtained. Otherwise, return to step 6.
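The steps above can be sketched as a small, self-contained loop. This is an illustrative reading of the procedure, not the authors' code: the chromosome layout, the selection scheme (keep the fitter half), the mutation rate, and every function name are assumptions. In particular, `toy_fitness` is a made-up surrogate standing in for "train a TCN and return its MAE", with its minimum placed at the combination reported in Section 4.4 only so that the demo has a known target. The candidate batch sizes are also assumed, since the source elides part of that list.

```python
import random

FILTERS = list(range(5, 31))           # search range [5-30]
KERNELS = list(range(2, 11))           # search range [2-10]
BATCHES = [8, 16, 32, 64, 128, 256]    # assumption: the source elides the middle of this list
LAYERS = list(range(2, 9))             # number of dilation layers, [2-8]
N_HEAD = 3                             # genes 0..2: filters, kernel size, batch size
LENGTH = N_HEAD + max(LAYERS)          # fixed chromosome length


def make(filters, kernel, batch, layers):
    """Build a fixed-length chromosome; unused dilation slots are zero-padded."""
    dilations = [2 ** i for i in range(layers)]   # dilation doubles per layer
    return [filters, kernel, batch] + dilations + [0] * (LENGTH - N_HEAD - layers)


def n_layers(ind):
    return sum(1 for g in ind[N_HEAD:] if g != 0)


def crossover(a, b, rng, n_points=2):
    """Swap genes at random positions, cancelling swaps on zero or dilation genes."""
    a, b = a[:], b[:]
    for pos in rng.sample(range(LENGTH), n_points):
        if pos >= N_HEAD or a[pos] == 0 or b[pos] == 0:
            continue
        a[pos], b[pos] = b[pos], a[pos]
    return a, b


def mutate(ind, rng, rate=0.2):
    """Mutate filters, kernel size, batch size, and the number of dilation layers."""
    genes = [ind[0], ind[1], ind[2], n_layers(ind)]
    pools = [FILTERS, KERNELS, BATCHES, LAYERS]
    genes = [rng.choice(p) if rng.random() < rate else g for g, p in zip(genes, pools)]
    return make(*genes)


def toy_fitness(ind):
    """Hypothetical surrogate for the TCN's MAE (lower is better)."""
    return (abs(ind[0] - 20) + abs(ind[1] - 3)
            + abs(ind[2] - 32) / 8 + abs(n_layers(ind) - 6))


def run_ga(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [make(rng.choice(FILTERS), rng.choice(KERNELS),
                rng.choice(BATCHES), rng.choice(LAYERS)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=toy_fitness)
        parents = pop[: pop_size // 2]            # eliminate low-fitness solutions
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = crossover(*rng.sample(parents, 2), rng)
            children += [mutate(a, rng), mutate(b, rng)]
        pop = parents + children[: pop_size - len(parents)]
    return min(pop, key=toy_fitness)


best = run_ga()
print(best, toy_fitness(best))
```

In a real run, `toy_fitness` would be replaced by a function that builds a TCN from the chromosome, trains it, and returns the MAE on held-out data, which is by far the most expensive part of each generation.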

##### 3.3. TCN Prediction Module

The TCN prediction module mainly uses one-dimensional causal convolution with a filter *f* of width *k*. For the input traffic flow data $X$ of length *m*, the convolution operation yields a sequence [A B] of length *m* − *k* + 1 with the same number of channels:

$$[A\ B] = f * X,$$

where $*$ represents the convolution operation.

Dropout randomly deactivates a selection of neurons during training to prevent overfitting and speed up model convergence. The Dropout coefficient is set to 0.1 in the TCN prediction module.

The Adam optimizer is used in the TCN prediction module to optimize the model parameters and reduce the loss. Adam is an adaptive algorithm for first-order gradient-based optimization of stochastic objective functions. It combines the advantages of the AdaGrad and RMSProp optimizers.

ReLU, the rectified linear unit, is used as the activation function of the TCN:

$$\mathrm{ReLU}(x) = \max(0, x).$$

After adding the residual, the activation function becomeswhere and are the weight matrices and is the bias vector.

The valid information in A is extracted by activating the function and made accessible at the next level. The final output is O:where ⊙ represents the Hadamard product.

#### 4. Experiments

##### 4.1. Data Description

The experiment in this paper used the public dataset PEMS, real-time California highway traffic data collected by the Caltrans Performance Measurement System. PEMS data is sampled every 30 seconds and aggregated into a final record every 5 minutes. The PEMS data used in the experiment was collected from a station on highway I05, as shown in Figure 4. The time range was 00:00:00–23:59:59 from May 1 to June 30, 2014, a total of 61 days and 17568 records.
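The record count follows directly from the date range and the 5-minute aggregation, as the short check below shows. The variable names are illustrative only.

```python
from datetime import date

# Inclusive number of days from May 1 to June 30, 2014.
days = (date(2014, 6, 30) - date(2014, 5, 1)).days + 1

records_per_day = 24 * 60 // 5        # one record per 5-minute interval
samples_per_record = 5 * 60 // 30     # 30-second samples folded into each record

print(days, records_per_day, days * records_per_day)   # 61 288 17568
print(samples_per_record)                              # 10
```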

The main conditions of the PEMS dataset are shown in Table 1. The main data used in the experiment were timestamp and total flow.

Through descriptive statistical analysis of traffic data, distribution histograms and cumulative frequency graphs were drawn, as shown in Figure 5. We can see that the flow is mainly distributed within 20–360. Among them, the proportion in the range of 20–60 and 240–300 is relatively high, and the proportion in the range of 60–240 and 300–360 is relatively low.

##### 4.2. Data Preprocessing

Because of equipment failure, noise interference, improper storage, human error, and other unexpected events, data collection is never complete, and the data contains missing values and outliers. It is therefore necessary to preprocess the original data before establishing the prediction model. Descriptive statistical analysis of the original data shows no large outliers, only some 0 values. In the experiment, 0 values were treated as missing values. Tree models have low sensitivity to missing values and are suitable for moderate or large datasets. Therefore, a random forest was used to fill in the missing values in this experiment.

The traffic flow data is a nonstationary random sequence. In the experiment, min-max normalization was adopted to map the data to the interval [0, 1]. The normalization is defined as

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

where $x'$ is the normalized data, $x$ is the original data, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of $x$, respectively.
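A minimal sketch of min-max normalization and its inverse follows. The inverse is needed to turn model outputs back into vehicle counts before computing errors. The function names and the sample values are assumptions for illustration; in practice the minimum and maximum would normally be taken from the training set only.

```python
def min_max_normalize(values):
    """Map values linearly onto [0, 1]; also return the bounds for later inversion."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant series: avoid division by zero
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(scaled, lo, hi):
    """Invert min_max_normalize using the stored bounds."""
    return [s * (hi - lo) + lo for s in scaled]


flow = [20.0, 60.0, 240.0, 360.0]      # made-up flow values
scaled, lo, hi = min_max_normalize(flow)
print(scaled[0], scaled[-1])           # 0.0 1.0
print(denormalize(scaled, lo, hi))
```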

There were obvious differences in daily travel characteristics between working days and weekend days. Therefore, the data was divided into working days and weekend days. There were 43 working days and 18 weekend days. The data of the last working day and the last weekend day were used as the test set, and the rest of the data was used as the training set.
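The split can be reproduced from the calendar alone. The sketch below assumes that "working day" simply means Monday to Friday, with no special handling of public holidays, which is consistent with the counts stated above; the variable names are illustrative.

```python
from datetime import date, timedelta

start, end = date(2014, 5, 1), date(2014, 6, 30)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# weekday(): Monday is 0, Sunday is 6.
working = [d for d in days if d.weekday() < 5]
weekend = [d for d in days if d.weekday() >= 5]
print(len(working), len(weekend))          # 43 18

# Hold out the last day of each group as the test set.
working_train, working_test = working[:-1], working[-1]
weekend_train, weekend_test = weekend[:-1], weekend[-1]
print(working_test, weekend_test)          # 2014-06-30 2014-06-29
```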

##### 4.3. Experiment Settings

###### 4.3.1. Baseline Methods

In order to better verify the prediction effect of the model proposed in this paper, the following three baseline methods are added in the experiment for comparison.

The first method is LSTM. It is a special RNN that uses a gating structure to modify the RNN, alleviates the vanishing and exploding gradient problems that the RNN faces in regression on long time series, and can learn long-term dependencies.

The second method is GRU, a variant of the RNN. The LSTM has three gates (input gate, forget gate, and output gate), whereas the GRU has only update and reset gates.

The final method is TCN. The temporal convolutional network is a newer CNN-based algorithm for time series prediction.

###### 4.3.2. Evaluation Metrics

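In this experiment, the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE) are used as the evaluation indexes of the model:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|,$$

where $y_i$ and $\hat{y}_i$ are the true and the predicted values of the traffic volume, respectively, and $n$ is the number of samples.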
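The three metrics are straightforward to compute. The sketch below uses only the standard library; the function names and the sample values are illustrative, and `mape` assumes that no true value is zero.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes every true value is nonzero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100.0, 200.0, 300.0]     # made-up traffic volumes
y_pred = [110.0, 190.0, 330.0]
print(round(rmse(y_true, y_pred), 3))   # 19.149
print(round(mae(y_true, y_pred), 3))    # 16.667
print(round(mape(y_true, y_pred), 3))   # 8.333
```

Because RMSE squares the errors, the single 30-vehicle miss pulls it above the MAE, which is why the two are usually reported together.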

##### 4.4. Parameter Optimal Value

The search range of filters, kernel size, batch size, and dilations layer is set to [5–30], [2–10], [8, 16, ..., 128, 256], and [2–8], respectively. The optimal combination of TCN is determined to be [filters = 20, kernel size = 3, batch size = 32, and dilations = [1, 2, 4, 8, 16, 32]]. With the increase of the number of iterations, the training error of the GA-TCN model gradually converges, and the mean square error of the hyperparameter optimal solution decreases to achieve the optimal solution of the search space, as shown in Figure 6.
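To make the link between the "dilations layer" count and the dilation list concrete, the sketch below builds the list by doubling and estimates how many past time steps a stacked TCN can see. The receptive-field formula assumes one dilated causal convolution per level by default; many TCN implementations use two per residual block, which the `convs_per_level` parameter covers. All names here are illustrative assumptions.

```python
def dilations_for(layers):
    """Dilation list that doubles per layer: 1, 2, 4, ..."""
    return [2 ** i for i in range(layers)]


def receptive_field(kernel_size, dilations, convs_per_level=1):
    """Number of input time steps that can influence one output value."""
    return 1 + convs_per_level * (kernel_size - 1) * sum(dilations)


best = dilations_for(6)
print(best)                                          # [1, 2, 4, 8, 16, 32]
print(receptive_field(3, best))                      # 127
print(receptive_field(3, best, convs_per_level=2))   # 253
```

With 5-minute records, 127 steps correspond to roughly 10.6 hours of history, so the reported configuration looks back over a large part of a day under these assumptions.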

##### 4.5. Experimental Results

Figures 7 and 8 compare the prediction results of the four models on working and weekend days, respectively. All four models fit the real data well. During the morning and evening peaks on working days, the flow changes abruptly. We choose 6:30 to 9:00 as an example to compare and analyze the prediction models. As the enlarged figure of this period shows, when the traffic volume changes dramatically, the trend of the GA-TCN model is the most similar to the actual traffic volume. The prediction error of each model is obtained by calculating the difference between the predicted value and the actual flow. The average errors of LSTM, GRU, TCN, and GA-TCN are 8.40, 5.81, 7.29, and 5.18, respectively. The GA-TCN model has the lowest error and the highest prediction accuracy. On weekend days, the peak traffic flow lasts longer and is relatively stable. We choose 13:00 to 18:30 as an example to compare and analyze the prediction models. The predicted results of the GA-TCN model are consistent with the actual traffic flow, while the predicted results of the LSTM and GRU models show an obvious time lag. The average errors of LSTM, GRU, TCN, and GA-TCN are 3.82, 4.47, 3.16, and 3.47, respectively. TCN has the smallest mean error, followed by GA-TCN. Although the average error of GA-TCN is larger than that of TCN, the error value of GA-TCN is basically the same; that is to say, the prediction result of TCN is completely consistent with the change trend of the actual flow. This shows that the GA-TCN model proposed in this paper has strong adaptability and higher prediction accuracy.

Three evaluation indicators, RMSE, MAE, and MAPE, were computed from the working day and weekend day predictions of the four models, as shown in Table 2. On all of these indicators, the proposed GA-TCN model gives the best results. On working days, the RMSE, MAE, and MAPE of the second-best model are 18.36%, 28.90%, and 24.69% higher than those of the GA-TCN model, respectively. On weekend days, the RMSE, MAE, and MAPE of the second-best model are 3.27%, 2.38%, and 5.68% higher than those of the GA-TCN model, respectively.
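The MAE percentages here and those in the Abstract appear to describe the same gaps from two directions: "GA-TCN is p% lower than the second-best model" versus "the second-best model is q% higher than GA-TCN". The conversion is q = p / (1 − p), which the following sketch checks against the reported MAE figures. The function name is illustrative, and small differences come from rounding in the published values.

```python
def lower_to_higher(pct_lower):
    """Convert 'A is pct_lower % below B' into 'B is this many % above A'."""
    p = pct_lower / 100.0
    return 100.0 * p / (1.0 - p)


# MAE reductions relative to GRU quoted in the Abstract: 22.42% and 2.33%.
print(round(lower_to_higher(22.42), 2))   # 28.9, matching the 28.90% working day figure
print(round(lower_to_higher(2.33), 2))    # 2.39, close to the 2.38% weekend figure
```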

##### 4.6. Analysis of Influencing Factors

In order to more intuitively show that the proposed GA-TCN model has better accuracy than the other models, Figure 9 shows the RMSE and MAE distribution histograms of the four models according to the data in Table 2. The working day MAE of LSTM, GRU, and TCN is higher than that of the weekend days, while the working day MAE of the GA-TCN is lower than that of the weekend days. Since the data volume on weekdays was more than twice that of the weekend days, the results demonstrate the GA-TCN proposed in this paper is more suitable for the processing of big data, can more effectively capture the short-term dependence of traffic flow, and improve the accuracy of short-term traffic flow forecasting.

Generally, the time interval for short-term traffic flow forecasting is [5–30] min. For further evaluation, the performances of four models were tested for short-term traffic flow forecasting at different time intervals. The results are shown in Tables 3 and 4.

As the prediction time interval increases, prediction becomes gradually more difficult. The comparison of prediction performance at different time intervals in Figure 10 shows that the prediction error of the four models increases roughly linearly. The trend and degree of change of RMSE and MAE are the same for the four models on working days. On weekend days, the RMSE, MAE, and MAPE of LSTM increase by 2.93, 1.91, and 0.5%, respectively, the most significant changes, while the RMSE, MAE, and MAPE of the GA-TCN increase by 1.34, 1.07, and 0.37%, respectively, the smallest changes. The results show that the proposed GA-TCN performs better than the existing models at all time intervals. The proposed model retains relatively good prediction performance when the data interval is large and the data sample is small, and it can be better applied to short-term traffic flow forecasting at various time intervals.

##### 4.7. Influence of Parameters on Model Optimization

Searching for the optimal hyperparameter values with a genetic algorithm reduces the prediction efficiency of the model. To limit the influence of the search process on prediction efficiency, we first selected some samples to preliminarily determine the search range of the TCN hyperparameters. We then compiled statistics on the optimal individuals over 200 iterations and analyzed the relationship between the hyperparameters and both error and efficiency, as shown in Figure 11.

Figure 11(a) shows that MAE increases with the number of filters and that MAE is relatively low and stable when the number of filters is within 5–15. Figure 11(b) shows that kernel size has some influence on MAE, but the influence is small. The impact of batch size on the prediction results depends on the actual data size, so the batch size needs to be adjusted to the data size. For the same number of epochs, a larger batch size requires fewer batches and a shorter training time; from the standpoint of forecasting efficiency alone, the larger the batch size, the better. However, increasing the batch size reduces the generalization performance of the model, and the model performance declines. This is consistent with Figure 11(c): when the batch size is 32 or 64, the prediction effect is good, but as the batch size increases further, the prediction performance gradually decreases. Figure 11(d) shows that as the number of dilations layers increases, the model accuracy first improves, reaches its optimum at dilations layer = 4 (dilations = [1, 2, 4, 8]), and then gradually decreases. More dilations layers give a larger receptive field, whose benefit is that longer temporal dependence of the traffic flow can be captured. However, more dilations layers also mean a deeper network and more computation and complexity, which makes the model harder to train, so the model accuracy decreases.
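The efficiency side of the batch-size trade-off is simple arithmetic: the number of weight updates per epoch is the training set size divided by the batch size, rounded up. The sketch below assumes the 42 working days of training data at 288 records per day and ignores the few samples lost when sliding windows are formed; the names are illustrative.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of weight updates needed to pass over the training set once."""
    return math.ceil(n_samples / batch_size)


n_train = 42 * 288     # assumed working day training records (12096)
for batch_size in (32, 64, 128, 256):
    print(batch_size, batches_per_epoch(n_train, batch_size))
# 32 378
# 64 189
# 128 95
# 256 48
```

Going from a batch size of 32 to 256 cuts the updates per epoch by roughly a factor of eight, which shortens training but also gives the optimizer far fewer steps, in line with the accuracy drop described for Figure 11(c).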

#### 5. Conclusion

In this paper, a short-term traffic flow forecasting model based on a temporal convolutional network optimized by a genetic algorithm is proposed. In the proposed model, the prediction error is taken as the fitness value, and the filters, kernel size, batch size, and dilations of the temporal convolutional network are optimized using the genetic algorithm to determine the prediction model with the best accuracy and efficiency. The model is experimentally verified on the PEMS dataset. The experimental results show that the proposed model outperforms the baseline methods LSTM, GRU, and TCN and offers better adaptability and higher accuracy in short-term traffic flow forecasting.

The experiment conducted to verify the effectiveness of the proposed model was limited to certain parts of the city. In order to establish a more accurate prediction model, the entire urban road network should be used as the research object. In future research, many influencing factors such as traffic accidents, temperature and humidity, and precipitation will be considered.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This research was supported in part by the Pacific Northwest Transportation Consortium, US Department of Transportation University Transportation Center for Federal Region 10, the Science and Technology Planning Project of the Shandong Province (Grant no. 2016GGB01539), and the Science and Technology Planning Project of the Zibo City (Grant no. 2019ZBXC515).