Abstract

As an important part of data management, network traffic evaluation and prediction can not only find network anomalies but also judge the future trends of the network. To predict network traffic more accurately, a novel hybrid model, integrating Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) with long short-term memory neural network (LSTM) optimized by the improved particle swarm optimization (IPSO) algorithm, is established for network traffic prediction. Firstly, an LSTM prediction model for the real-time mutation and dependence of network traffic is constructed, and the IPSO is applied to optimize the hyperparameters. Then, CEEMDAN is introduced to decompose sequences of raw network traffic data into several different modal components containing different information to reduce the complexity of the network traffic sequence. Finally, the evaluation of the experiments shows the feasibility and effectiveness of the proposed method by comparing it with other deep neural architectures and regression models. The results show that the proposed model CEEMDAN-IPSO-LSTM produced a significantly superior performance with a reduction of the prediction error.

1. Introduction

Data centers serve as the information storage terminals of the Internet of Things (IoT). With the continuous expansion of data center scale, the network structure of the data center is becoming increasingly complex, network business and network data flow are growing rapidly, and network congestion is becoming more frequent [1–3]. The real-time analysis of network traffic is of great significance for network traffic monitoring, network resource optimization, network congestion avoidance, and network security strategy [4–8]. When the network is overloaded or congested, accurate prediction can ensure high-quality execution of network services with high importance or priority [9]. In recent years, predictive analysis based on historical network traffic has become a major research topic in the academic field. Establishing an accurate prediction model to describe network traffic characteristics contributes to optimizing network topology structure and route planning, reducing energy consumption, and providing more reliable service quality assurance.

Network links change dynamically with limited node processing resources. Network traffic prediction mainly depends on the statistical characteristics of flow and the strong correlation between time-sequence values. Modeling analysis based on network traffic time series is an effective method for network traffic research, which has been widely used in network traffic prediction and network performance evaluation. How to fully consider the complex characteristics of network traffic and improve both the prediction accuracy and the real-time performance of network traffic prediction has always been a hot and difficult topic of network traffic research [10].

In this study, we propose a novel hybrid model based on the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), the Long Short-Term Memory (LSTM) neural network model, and the improved particle swarm optimization (IPSO) algorithm to predict and analyze the network traffic. Firstly, combined with the real-time mutability, dependence, and highly nonlinear characteristics of network traffic, we establish the LSTM network traffic prediction model to extract the dynamic characteristics of network traffic. Then, the IPSO is utilized for hyperparameter optimization. In addition, the CEEMDAN method is employed to decompose the network traffic data into several simplified modes. Finally, we compare the prediction accuracy of different models to evaluate the prediction effects of the CEEMDAN-IPSO-LSTM neural network model.

The main contributions of our work are presented as follows:
(1) CEEMDAN is introduced to decompose network traffic data into several components, which alleviates modal confusion and avoids making larger impacts on the original signal when the white noise is added.
(2) A network traffic prediction model based on LSTM is constructed to capture the real-time mutation of network traffic.
(3) An improved PSO algorithm is proposed to optimize the hyperparameters of the LSTM network traffic model, which improves the prediction performance of the model.

The rest of this paper is organized as follows. In Section 2, we review the related research on network traffic prediction. In Section 3, we construct a network traffic prediction model based on LSTM. In Section 4, we study the hyperparameter optimization of LSTM by the Improved Particle Swarm Optimization and the decomposition of network traffic data by CEEMDAN. Section 5 presents our evaluation of the proposed method. Finally, the main conclusions and future work are drawn in Section 6.

2. Related Work

Due to the importance of network traffic prediction, there has been much research on network traffic prediction methods in recent years. Generally, network traffic prediction can be divided into short-term prediction, medium-term prediction, and long-term prediction according to the prediction cycle [11], while network traffic prediction models are mainly divided into two categories: parametric models and nonparametric models [12].

2.1. Parametric model

The parametric model has the advantage of being simple and easy to understand. Moreover, it does not have high standards for training data, and the solving process is easy compared to nonparametric model, which consumes less time. However, the parametric model is suitable for the prediction of small data volumes with obvious features and stable structure, while the network traffic has real-time mutability and dependency characteristics, and the parametric model will lead to higher prediction errors than the nonparametric model.

ARMA network traffic model can effectively analyze network data with a stable flow in a short period, obtain network traffic characteristics at the corresponding scale, and realize data flow decomposition [13]. In the network traffic model based on ARMA, the deployment of the multi-scale fitting process can obtain high accuracy under any expiration delay, simplify the ARMA model, and enhance the integration effect of the ARMA framework in network traffic modeling [14].

However, ARMA is not suitable for long-term network traffic data with network anomalies because the premise of ARMA modeling is that the data analyzed is a stationary random process. Most actual network traffic data are nonstationary [15], but they can be transformed into stationary data by finite differencing. Therefore, some scholars proposed the Autoregressive Integrated Moving Average (ARIMA) model [16].

2.2. Nonparametric model

Nonparametric model refers to a model with no fixed structure and fixed parameters. Common nonparametric models include Support Vector Machine (SVM), k-Nearest Neighbor (kNN) [8], Artificial Neural Network (ANN), etc. The nonparametric model can automatically fit a variety of function forms without assumption, and the training effect is good, which is suitable for predicting large data volume.

Due to the real-time variability and dependence of network traffic, traditional network traffic prediction models have some disadvantages such as weak generalization ability and limited prediction accuracy. Therefore, more and more researchers use nonparametric models to predict network traffic data. The Support Vector Regression model (SVR) and its variant MK-SVR were first used to predict network traffic [17–19]. They effectively predict the changing trend of network traffic data but do not consider the temporal correlation of time series data, which limits the prediction accuracy.

Methods based on the artificial neural network, such as Convolutional Neural Network (CNN) [20], improve the effect of flow classification by autonomous feature learning of data [21]. LSTM neural network and Gated Recurrent Unit (GRU) neural network have a superior effect over existing SVM and ANN models in predicting network traffic, which is more suitable for random nonlinear network traffic prediction [12]. LSTM neural network was originally used for short-term flow prediction, which can better learn the abstract representation of nonlinear flow data and capture the inherent characteristics of long-term dependence relationship in continuous data, thus improving the accuracy of flow prediction [22]. LSTM neural network is used for network traffic prediction, and the auto-correlation coefficient is added to the model to describe the trend of network traffic change better, which improves the accuracy of the prediction model [23]. On this basis, the improved Particle Filter (PF) algorithm is used to optimize the LSTM model, which improves the training rate and overcomes the shortcoming of convergence to local optimal in the traditional LSTM network [24].

The experiments of many neural network methods to predict the network traffic data show that in a real-time network data set, LSTM is of better performance than Recurrent Neural Network (RNN), the Feed-forward Neural Network (FFN), and other classic methods. LSTM neural network can more accurately simulate time series and its long-term dependencies than the traditional RNN, in large network traffic matrix prediction, and obtain a faster convergence rate [25]. The variants of LSTM neural network, GRU neural network, and identity-RNN (IRNN) have comparable performance with LSTM [26]. Minimal Gated Unit (MGU) overcomes the shortcoming of the high computing cost of the LSTM network and achieves relatively predictable performance with less model training time [27]. In addition, LSTM neural network has achieved good prediction results in financial data forecast [28, 29], metal price prediction [30], air quality index prediction [31], modular temperature prediction [32], and bridge health monitoring [33].

In summary, a single parametric or nonparametric model has its own problems and defects, while a hybrid prediction model can overcome the shortcomings of a single model by combining two or more models. The hybrid model mainly combines decomposition algorithms, prediction algorithms, and optimization algorithms in the data preprocessing, prediction, and result correction stages of network traffic prediction, respectively. Although combinatorial prediction has achieved good results in other studies [34, 35], there are still some problems, such as how to choose the prediction model and its parameters, how to integrate the prediction results reasonably, and how to choose the appropriate decomposition algorithm or optimization algorithm. For network traffic prediction, using the combined prediction model and overcoming the above problems is a research direction worthy of further study.

3. Network Traffic Prediction Based on LSTM

3.1. LSTM Neural Network Model

LSTM neural network (hereinafter referred to as LSTM) is an improvement of the recurrent neural network, which aims to overcome the defects of the recurrent neural network in processing long-term memory [36]. LSTM introduces the concept of the cell state, which determines which states should be preserved and which should be forgotten. The basic principle of LSTM is shown in Figure 1.

As shown in Figure 1, xt is the input at time t, ht-1 is the output of the hidden layer at time t-1, and Ct-1 is the historical information at time t-1; f, i, and o are, respectively, the forgetting gate, input gate, and output gate at time t, and C̃t is the internal hidden state, namely, the transformed new information. LSTM learns the parameters of these quantities during training. Ct is the updated historical information at time t, and ht is the output of the hidden layer at time t.

Firstly, the input xt at time t and the output ht-1 of the hidden layer are duplicated four times, and different weights are randomly initialized for each copy, to calculate the forgetting gate f, input gate i, and output gate o, as well as the internal hidden state C̃t. Their calculation methods are shown in formulas (1)–(4), where W is the parameter matrix from the input layer to the hidden layer, U is the self-recurrent parameter matrix from the hidden layer to the hidden layer, b is the bias parameter matrix, and σ is the sigmoid function, so that the output of the three gates remains between 0 and 1.

Secondly, forgetting gate f and input gate i are used to control how much historical information Ct-1 is forgotten and how much new information is saved, to update the internal memory cell state Ct. The calculation method is shown in formula (5).

Finally, output gate o is used to control how much of the information Ct in the internal memory cell is output to the hidden state ht, and its calculation method is shown in formula (6).
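Formulas (1)–(6) are referenced above but are not reproduced in this text. For orientation, the standard LSTM update written with the symbols defined in this section (W, U, b, σ) takes the following form. The gate subscripts on W, U, and b, the candidate state symbol, the element-wise product ⊙, and the assignment of equation numbers are assumptions that follow the order in which the quantities are described, not a copy of the original formulas.

```latex
\begin{align}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{1} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{2} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{3} \\
\tilde{C}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \tag{4} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5} \\
h_t &= o_t \odot \tanh\left(C_t\right) \tag{6}
\end{align}
```

In this form, the forgetting gate scales the previous cell state, the input gate scales the new candidate information, and the output gate decides how much of the updated cell state reaches the hidden output.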

3.2. Network Traffic Prediction Model Based on LSTM

Network traffic data are modeled as an N × T nonnegative matrix X, where N represents the number of nodes, T represents the number of time slots sampled, and each column in the data matrix represents the network traffic values at the different nodes in a specific time interval.

Network traffic prediction obtains the predicted value at a future time from the historical time series. X (i, j) represents the N × T flow matrix, and xn,t represents the network traffic value in row n and column t. Network traffic prediction is defined as using a series of historical network traffic data (xn,t-1, xn,t-2, xn,t-3, …, xn,t-k), where k is the length of the history window, to predict the network traffic at time t. In the network traffic prediction model based on LSTM (Figure 2), it is assumed that the network traffic of a certain node in time slot t is to be predicted; the input of the model is (xn,t-1, xn,t-2, xn,t-3, …, xn,t-k), and the output is the predicted value of the network traffic of this node in time slot t.
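The mapping from a history window to the next value can be sketched as a sliding-window transformation. The snippet below is a minimal illustration for a single node, not the authors' implementation; the function name `make_windows` is hypothetical, and `look_back` plays the role of the history length used later in the experiments.

```python
def make_windows(series, look_back):
    """Turn a 1-D traffic series into (history window, next value) pairs."""
    if look_back < 1 or look_back >= len(series):
        raise ValueError("look_back must be in [1, len(series) - 1]")
    inputs, targets = [], []
    for t in range(look_back, len(series)):
        inputs.append(list(series[t - look_back:t]))  # x[t-look_back] ... x[t-1]
        targets.append(series[t])                     # value to predict at slot t
    return inputs, targets


# Toy traffic values for one node (illustrative numbers only).
traffic = [10, 12, 11, 15, 14, 18]
X, y = make_windows(traffic, look_back=3)
```

Each row of `X` holds the values of the preceding `look_back` slots in chronological order, and the matching entry of `y` is the value the model is asked to predict.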

In Figure 2, we summarize the process of network traffic prediction based on LSTM. It mainly includes network traffic data preparation, data preprocessing (data resampling and null filling), data normalization, data division, prediction network building, network compilation, network evaluation, and data prediction and evaluation.

The detailed contents of each process for network traffic prediction are as follows:
(1) Network traffic data preparation and preprocessing. To meet the time and frequency requirements (second, minute, hour, day, etc.) of network traffic data prediction, the original data must be resampled, that is, the time series is converted from one frequency to another. To ensure an even time interval, data with uneven time intervals are converted to equal-interval data. There are generally two methods of data resampling: downsampling and upsampling. The former converts high-frequency data into low-frequency data, while the latter converts low-frequency data into high-frequency data. In addition, if there are missing values in the resampled data sequence, they must be filled. The commonly used methods include the direct deletion method, the statistically based filling method, and the machine-learning-based filling method. The direct deletion method may discard some important information in the data, and the statistically based filling method ignores the timing information of the data [37]. Therefore, this paper adopts a machine-learning-based filling method, K-Nearest Neighbor (KNN), to fill the missing values of network traffic data.
(2) Data normalization. The range standardization method is used to process the network traffic data so that the sample data values lie between 0 and 1. The calculation method of the range standardization method is shown in formula (7):
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{7}$$
where $x_{\max}$ represents the maximum value of network traffic data and $x_{\min}$ represents the minimum value of network traffic data.
(3) Data division. After preprocessing and normalization, the network traffic data are divided into a training set and a test set according to simple cross-validation. With the time order of the network traffic data kept unchanged, the training set and the test set are divided by fivefold cross-validation [38] and are used for the training and prediction of the LSTM network traffic prediction model.
(4) LSTM network traffic prediction model construction. An LSTM neural network is defined and its parameters are set, including the time step size, number of network layers, number of neurons in each layer, dropout, activation function, type and number of the return value, dimension size of the hidden layer, learning rate, batch size, number of iterations, etc.
(5) Network compilation. The optimizer, error measurement index, and training record parameters are set, and the LSTM network traffic prediction model is compiled.
(6) Network evaluation. The training data are fed into the model for training, and the error of the established prediction model is evaluated. According to the results, the parameter settings of the model are fine-tuned to get a better prediction effect.
(7) Prediction and evaluation. The optimized network traffic prediction model is used to make predictions, and the prediction errors are calculated by comparing the prediction results with the real data.
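The normalization and division steps above can be sketched in a few lines. This is a minimal illustration that assumes range standardization means min-max scaling to [0, 1] and that the split keeps chronological order; the helper names `min_max_scale`, `inverse_scale`, and `chronological_split` are hypothetical.

```python
def min_max_scale(values):
    """Scale values to [0, 1]; also return the bounds needed to invert."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant series: avoid division by zero
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def inverse_scale(scaled, lo, hi):
    """Map scaled predictions back to the original traffic units."""
    return [s * (hi - lo) + lo for s in scaled]


def chronological_split(values, train_ratio=0.8):
    """Split without shuffling so that the time order is preserved."""
    cut = int(len(values) * train_ratio)
    return values[:cut], values[cut:]


raw = [2.0, 4.0, 6.0, 10.0, 8.0, 6.0, 4.0, 2.0, 4.0, 6.0]
scaled, lo, hi = min_max_scale(raw)
train, test = chronological_split(scaled, train_ratio=0.8)
```

Keeping `lo` and `hi` allows predictions made on the scaled data to be mapped back to traffic units before the errors are computed.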

4. The LSTM Network Traffic Prediction Model Optimized by IPSO and CEEMDAN

4.1. Improved Particle Swarm Optimization

Particle Swarm Optimization (PSO) is a swarm intelligence optimization algorithm with simple rules and fast convergence [39, 40]. It regards every individual as a particle with no volume and no mass in an n-dimensional search space, which flies at a certain speed. The search is improved through group cooperation and competition among the particles under the guidance of swarm intelligence.

Consider particle swarm optimization in an n-dimensional continuous search space. For the i-th (i = 1, 2, …, m) particle, the n-dimensional position vector xi(k) = [xi1, xi2, …, xin]T represents the current position of the particle in the search space, and the n-dimensional velocity vector vi(k) = [vi1, vi2, …, vin]T represents the search direction of the particle. The optimal position (pbest) experienced by the i-th particle is denoted as pi(k) = [pi1, pi2, …, pin]T, and the optimal position (gbest) experienced by all particles in the group is denoted as pg(k) = [pg1, pg2, …, pgn]T. The basic PSO algorithm is shown in formulas (8) and (9), where ω is the inertia weight factor and c1 and c2 are the acceleration constants, all of which are nonnegative values; r1 and r2 are uniformly distributed random numbers whose ranges are set by the corresponding control parameters.
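Formulas (8) and (9) are not reproduced in this text. The textbook form of the basic PSO update, written with the position, velocity, pbest, and gbest notation of this section, is given below as a reference sketch; the component index j and the symbols r1 and r2 for the random numbers are assumptions.

```latex
\begin{align}
v_{ij}(k+1) &= \omega\, v_{ij}(k) + c_1 r_1 \left(p_{ij}(k) - x_{ij}(k)\right) + c_2 r_2 \left(p_{gj}(k) - x_{ij}(k)\right) \tag{8} \\
x_{ij}(k+1) &= x_{ij}(k) + v_{ij}(k+1) \tag{9}
\end{align}
```

The first term keeps the particle moving in its previous direction, the second pulls it toward its own best position, and the third pulls it toward the best position found by the swarm.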

In the PSO algorithm, ω preserves the inertia of particle motion, which gives the particle a tendency to expand the search space and the ability to explore new areas. The ω value usually follows the linear inertia weight method, that is, it increases or decreases linearly with the number of iterations. Compared with a fixed ω value, the linear method improves the optimization ability and convergence speed of the PSO algorithm to some extent, but the improvement is limited. The nonlinear inertia weight method can further improve the optimization ability and convergence speed of the PSO algorithm [41]. Therefore, the ω calculation in this paper is improved by using the nonlinear inertia weight method, as shown in formula (10).

In formula (10), ωmax and ωmin, respectively, represent the maximum inertia weight and the minimum inertia weight, i is the current iteration number, and iter_max is the maximum iteration number.

In the PSO algorithm, c1 and c2 are used to adjust the step size of particle movement. In this paper, the sine function is used to improve the acceleration constant [29]. The calculation method is shown in formulas (11) and (12).

4.2. LSTM Hyperparameter Optimization Based on Improved PSO

The selection of hyperparameters of the LSTM prediction model has an important influence on prediction accuracy. The current hyperparameter selection method based on the empirical method has randomness, blindness, and nonuniversality in the parameter setting. Therefore, multiple hyperparameters are formed into a multidimensional solution space, and the optimal parameter combination is obtained by traversing the solution space, which can reduce the randomness and blindness of parameter selection. Multiple hyperparameter selections are in a larger scope, which needs a better performance optimization algorithm to obtain the global optimal solution quickly, so we introduce the improved particle swarm algorithm (Improved PSO, IPSO) to optimize LSTM model parameters. With the quick convergence speed, the IPSO promotes the scientific nature of the model parameter selection and further improves the prediction accuracy of the models.

It is assumed that n hyperparameters of the LSTM network traffic prediction model are optimized, and each particle represents one set of hyperparameters in the solution space. Suppose that in the n-dimensional continuous search space there are m groups of hyperparameter combinations, indexed by i (i = 1, 2, …, m). The n-dimensional position vector xi(k) = [xi1, xi2, …, xin]T represents the current value of the i-th group of hyperparameters in the solution space. The n-dimensional velocity vector vi(k) = [vi1, vi2, …, vin]T represents the search direction of this group of hyperparameters.

The goal of network traffic prediction is to make the predicted value close to the actual value, that is, the error between the predicted value and the actual value is as small as possible. Therefore, the Root Mean Square Error (RMSE) of training data in the network traffic prediction model is selected as the objective function. Let fitness = RMSE, then the objective function is to minimize RMSE. The RMSE calculation method is shown in formula (13).

In formula (13), ŷ is the predicted value and y is the real value.

Two important hyperparameters of the LSTM network traffic prediction model are optimized by IPSO: the time step size and the number of neurons in each layer. The single-layer and bilayer LSTM models are taken as the research objects to carry out the hyperparameter optimization. For the single-layer LSTM model, node denotes the number of neurons and lookback denotes the time step, so fitness = RMSE (node, lookback); for the bilayer LSTM model, fitness = RMSE (node1, node2, lookback).

According to the algorithm flow of IPSO, the process of IPSO optimized LSTM network traffic prediction model hyperparameter mainly includes six steps.

Step 1. The IPSO parameters are set. The particle swarm size is set as the number of hyperparameter combinations m. For each particle, the initial value and speed of the corresponding group of hyperparameters are set randomly within the allowed range. The maximum number of iterations iter_max and the prediction error Pre_error are also set.

Step 2. The fitness of each particle is evaluated, that is, the fitness value of the objective function of each group of hyperparameters is calculated.

Step 3. The optimal objective function value Pi for each set of hyperparameters is set. For the i-th group hyperparameter, its current target function value current_fitness is compared with Pi. If it is less than Pi, then current_fitness is used as the best target function value Pi for the ith group hyperparameter, namely, Pi = current_fitness.

Step 4. The global optimal value Pg is set. For the i-th group of hyperparameters, Pi is compared with Pg. If it is less than Pg, then Pi is taken as the optimal value of the whole group, namely, Pg = Pi.

Step 5. The search direction and value of each set of hyperparameters are updated according to formulas (8) and (9).

Step 6. The termination conditions are checked. If the set condition (default error or the maximum number of iterations) is not met, the algorithm returns to Step 2 and continues execution.
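The six steps can be sketched as a compact optimization loop. The code below is an illustrative sketch under stated assumptions, not the authors' implementation: the nonlinear inertia schedule in `inertia` and the sine-based schedules in `accelerations` are hypothetical stand-ins for formulas (10)–(12), which are not reproduced in this text, the random numbers are drawn from [0, 1], and `toy_fitness` is a cheap placeholder for RMSE (node, lookback), which in practice would require training an LSTM for every evaluation.

```python
import math
import random


def inertia(t, t_max, w_max=0.9, w_min=0.4):
    # Hypothetical nonlinear decay standing in for formula (10).
    return w_min + (w_max - w_min) * (1.0 - t / t_max) ** 2


def accelerations(t, t_max, c_min=1.0, c_max=2.0):
    # Hypothetical sine-based schedules standing in for formulas (11)-(12).
    phase = 0.5 * math.pi * t / t_max
    c1 = c_min + (c_max - c_min) * math.sin(0.5 * math.pi - phase)  # shrinks
    c2 = c_min + (c_max - c_min) * math.sin(phase)                  # grows
    return c1, c2


def ipso_minimize(fitness, bounds, m=20, iter_max=200, pre_error=1e-8, seed=0):
    rng = random.Random(seed)
    n = len(bounds)
    v_max = [0.2 * (hi - lo) for lo, hi in bounds]
    # Step 1: random initial values and speeds inside the allowed range.
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(m)]
    v = [[rng.uniform(-vm, vm) for vm in v_max] for _ in range(m)]
    # Steps 2-4: evaluate fitness, set each particle's best and the global best.
    p_fit = [fitness(xi) for xi in x]
    p_best = [xi[:] for xi in x]
    g = min(range(m), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]
    for t in range(iter_max):
        if g_fit <= pre_error:  # Step 6: stop once the target error is reached
            break
        w = inertia(t, iter_max)
        c1, c2 = accelerations(t, iter_max)
        for i in range(m):
            for j in range(n):  # Step 5: update speed and value, then clamp
                r1, r2 = rng.random(), rng.random()
                vij = (w * v[i][j] + c1 * r1 * (p_best[i][j] - x[i][j])
                       + c2 * r2 * (g_best[j] - x[i][j]))
                v[i][j] = max(-v_max[j], min(v_max[j], vij))
                lo, hi = bounds[j]
                x[i][j] = max(lo, min(hi, x[i][j] + v[i][j]))
            f = fitness(x[i])  # Step 2
            if f < p_fit[i]:  # Step 3
                p_fit[i], p_best[i] = f, x[i][:]
                if f < g_fit:  # Step 4
                    g_fit, g_best = f, x[i][:]
    return g_best, g_fit


def toy_fitness(params):
    # Placeholder objective with its minimum at node = 64, lookback = 5.
    node, lookback = params
    return math.sqrt(((node - 64.0) / 64.0) ** 2 + ((lookback - 5.0) / 5.0) ** 2)


best, best_fit = ipso_minimize(toy_fitness, [(1.0, 128.0), (1.0, 20.0)])
```

Because the number of neurons and the time step are integers, a real fitness function would round each particle position before building the model; the continuous search here only illustrates the update loop.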

4.3. Network Traffic Data Decomposition by CEEMDAN

The empirical mode decomposition algorithm (EMD) is a data processing method commonly used for nonstationary time series signals [42]. It can decompose the nonstationary signals into a series of intrinsic mode function (IMF) components with different time scales. However, modal confusion exists in this method. The Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) algorithm improves the EMD algorithm by adding a set of white noise with equal size and opposite signs before decomposing the data via EMD [43]. CEEMDAN both alleviates modal confusion and avoids making larger impacts on the original signal when the white noise is added. The main steps of CEEMDAN are as follows:
(1) Add a group of Gaussian white noise sequences ɛi (t) with opposite signs to the original sequence x (t), and obtain a new set of time series;
(2) Decompose each time series via EMD as in formula (15), and obtain n intrinsic mode function components, where cij is the j-th modal component obtained by EMD decomposition after adding white noise for the i-th time;
(3) Add different adaptive noises and repeat steps (1) and (2) (formulas (14) and (15)) m times to obtain m groups of intrinsic mode function (IMF) components, in which the last group is the trend term (Res);
(4) Calculate the ensemble average of all components to obtain the final modal component group ci (t).

The process of network traffic prediction based on IPSO-LSTM combined with CEEMDAN is shown in Figure 3.

The process of data decomposition and prediction includes three main steps.
(1) The network traffic data are decomposed by CEEMDAN into several different modal components to obtain the subsequences IMF1, IMF2, IMF3, …, IMFn;
(2) The IPSO-LSTM model is used to predict each subsequence and obtain result1, result2, result3, …, resultn;
(3) The subsequence prediction results result1, result2, result3, …, resultn are superposed to output the network traffic prediction result.
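The predict-then-superpose structure of these three steps can be sketched independently of the decomposition and the predictor. In the snippet below, the components are assumed to be already available as lists of equal length, and `persistence` is a hypothetical placeholder predictor used only to keep the example runnable; in the proposed method each component would be predicted by its own IPSO-LSTM model.

```python
def superpose(component_predictions):
    """Add the per-component forecasts element by element."""
    lengths = {len(p) for p in component_predictions}
    if len(lengths) != 1:
        raise ValueError("all component forecasts must have the same length")
    return [sum(step) for step in zip(*component_predictions)]


def predict_by_components(components, predict_fn):
    """Forecast every subsequence separately, then superpose the results."""
    results = [predict_fn(c) for c in components]
    return superpose(results)


def persistence(component, horizon=2):
    # Placeholder predictor: repeat the last observed value.
    return [component[-1]] * horizon


# Three toy components standing in for IMF1, IMF2, and the trend term.
imfs = [[1.0, -1.0, 1.0], [0.5, 0.0, -0.5], [10.0, 10.5, 11.0]]
forecast = predict_by_components(imfs, persistence)
```

Since the components sum to the original series, summing the per-component forecasts returns a forecast in the units of the original traffic data.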

4.4. Network Traffic Prediction Algorithm Based on CEEMDAN-IPSO-LSTM

According to the process of IPSO for hyperparameter optimization and data decomposition by CEEMDAN, and based on the network traffic prediction steps of LSTM, the CEEMDAN-IPSO-LSTM network traffic prediction algorithm is obtained. The pseudo-code of the algorithm is shown in Algorithm 1.

(1)Network traffic data preparation and preprocessing
(2)Decompose the raw data into several different modal components and obtain some subsequences of IMF1, IMF2, IMF3, …, IMFn
(3)Divide each subsequence into a training set and a test set
(4)Construct the LSTM network traffic prediction model. Set partial parameters and fix the number n of the optimized parameter
(5)IPSO parameter initialization (particle swarm size m, solving space dimension d, the maximum number of iterations iter_max, learning factors c1, c2, weight ω)
(6)Initialize the values of n-dimensional parameter combinations of m groups randomly in the solution space
(7)Initialize the global optimal parameter combination gbest_parameters, the partial optimal parameter combination pbest_parameters and the best fitness function value Pg
(8)While the end condition is False
(9) Apply the n-dimensional parameter combinations of m groups, respectively, to the LSTM network traffic prediction model for training, and calculate the current fitness function value;
(10) Get the current best fitness value Pi and the corresponding parameter combination pbest_parameters;
(11) if Pi < Pg;
(12)  Pg = Pi;//Update the best fitness value
(13)  gbest_parameters = pbest_parameters;//Update the global optimal parameter combination
(14) end if;
(15) for each parameter combination
(16)  Calculate the search direction and position of the new parameter combination according to equations (8) and (9)
(17)  Fix the updated parameter in the selected values;
(18) end for;
(19) The number of iterations + 1;
(20)end while;
(21)Return gbest_parameters;
(22)Introduce gbest_parameters into the LSTM network traffic prediction model;
(23)Predict test data of each subsequence and gain result1, result2, result3, …, resultn;
(24)Superpose the subsequence prediction results result1, result2, result3, …, resultn and output the network traffic prediction result.

Algorithm 1 firstly prepares network traffic data and decomposes the raw data into several subsequences, and then divides each subsequence into a training set and a test set. Then, it uses the IPSO-LSTM network traffic model to obtain the optimal parameter combination. Finally, the optimal parameters are substituted into the LSTM model to complete the prediction of each subsequence and output the network traffic prediction result by superposing subsequence prediction results.

CEEMDAN-IPSO-LSTM network traffic prediction algorithm contains three processes, the time complexity of data decomposition is , k is the size of the predicted data set; the time complexity of hyperparameter optimization process is ; and the time complexity of the prediction process is , h is the hidden_size, p is the input_size. Therefore, the time complexity of CEEMDAN-IPSO-LSTM is

In the running process of the algorithm, the parameter optimization process consumes the most time with the highest computational complexity, but its time cost is acceptable because this process needs to run only once to obtain the optimal combination of hyperparameters. Once the hyperparameters are determined, the main time complexity is reflected in the prediction process. The time of the prediction process is mainly spent in the training. As long as the training is completed, the prediction can be finished by substituting the input data into the equation.

5. Experiment Evaluation and Discussion

5.1. Experimental Environment Configuration and Parameter Setting

The experiments use the measured flow data of BC-Oct89Ext provided by Bell Laboratory. The flow data are Ethernet data collected in the Bell Morristown study and contain one million packets. This paper selects some data segments of the BC-Oct89Ext flow data for model analysis.

For the prediction results of the network model, three error analysis indicators were used to verify the prediction accuracy, which were Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), respectively. MAE and MAPE calculation methods are shown in equations (18) and (19).

According to equation (13), the smaller the RMSE value, the smaller the average error between the prediction results and the actual data, the higher the prediction accuracy of the model, and the better the prediction performance of the model. Similarly, it can be seen from equations (18) and (19) that the closer the MAE and MAPE values are to 0, the better the prediction effect of the model. Conversely, the greater the value, the greater the error and the worse the prediction effect of the model.
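The three indicators can be computed directly from paired lists of real and predicted values. The following sketch uses the usual definitions of RMSE, MAE, and MAPE; expressing MAPE as a percentage (the factor 100) is an assumption, since equation (19) is not reproduced in this text, and the sample values are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent; undefined for zero actuals."""
    if any(a == 0 for a in y_true):
        raise ValueError("MAPE is undefined when a real value is 0")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
errors = (rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

RMSE penalizes large deviations more strongly than MAE, while MAPE is scale-free, which makes it convenient for comparing components of different magnitude but unusable when the real value is zero.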

5.2. Network Traffic Prediction Results Based on LSTM
5.2.1. Data Processing

(1) Data resampling. As the original network traffic data in BC-Oct89Ext were collected multiple times per second with unequal time intervals, the data collected within each second were aggregated with the mean value method, and then the K-Nearest Neighbor (KNN) algorithm was used to fill the missing values. Figure 4 shows 1800 flow data points after packet resampling and null value processing.
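The resampling and filling step can be sketched as follows. This is a simplified illustration rather than the procedure actually used: timestamps are assumed to be non-negative seconds from the start of the trace, and because the features used by the KNN filling are not specified here, the sketch takes the k nearest observed points in time as neighbors. The names `resample_mean` and `knn_fill` are hypothetical.

```python
def resample_mean(timestamps, values, n_seconds):
    """Average all samples that fall in the same one-second bin."""
    sums = [0.0] * n_seconds
    counts = [0] * n_seconds
    for ts, val in zip(timestamps, values):
        sec = int(ts)  # bin index for a non-negative timestamp in seconds
        if 0 <= sec < n_seconds:
            sums[sec] += val
            counts[sec] += 1
    # Seconds without any sample are marked as missing (None).
    return [s / c if c else None for s, c in zip(sums, counts)]


def knn_fill(series, k=2):
    """Fill None entries with the mean of the k nearest observed points in time."""
    known = [(i, v) for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is None:
            nearest = sorted(known, key=lambda item: abs(item[0] - i))[:k]
            filled[i] = sum(val for _, val in nearest) / len(nearest)
    return filled


# Toy packet records: arrival time in seconds and packet size (illustrative).
timestamps = [0.1, 0.7, 1.2, 3.4, 3.9]
sizes = [100.0, 200.0, 300.0, 500.0, 700.0]
per_second = resample_mean(timestamps, sizes, n_seconds=4)
clean = knn_fill(per_second, k=2)
```

In this toy input no packet arrives during the third second, so that bin is missing after resampling and is filled from its two nearest observed neighbors.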

(2) Data decomposition. It can be seen from Figure 4 that network traffic data have obvious nonlinearity and nonstationarity, which makes prediction difficult. The original time series is therefore decomposed by the CEEMDAN method into several more predictable time subseries, and six groups of modal subsequences are obtained from high frequency to low frequency. Decomposition results are shown in Figure 5. It can be seen that the fluctuation from IMF1 to the Res subsequence gradually flattens out and the frequency becomes lower and lower.

(3) Data division. The data after normalization was divided into a training set and a test set according to simple cross-validation. The first 80% of the data were used as training data for LSTM network model training. The remaining 20% of the data were used as prediction data to verify the efficiency of the model.

5.2.2. Network Traffic Prediction Based on Basic LSTM

(1) Network definition. In this forecast, the network structures of three-layer LSTM (one input layer, one hidden layer, and one output layer) and four-layer LSTM (one input layer, two hidden layers, and one output layer) are, respectively, adopted.

The specific connection mode of the three-layer LSTM is as follows. The first layer is an LSTM layer with a timestep of 1, an input data dimension of 3, and 64 neurons. The second layer is a hidden (dense) layer that takes the output of the first LSTM layer as its input. The third layer is the output layer, which takes the output of the hidden layer as its input and connects to a fully connected layer. The fully connected layer outputs a one-dimensional vector of length 360, which is the final result and represents the predicted values of the next 360 data points. To prevent overfitting, a dropout layer was added between the first layer and the hidden layer for regularization. Repeated tests in this experiment showed that a dropout rate of 0.3 gave the highest accuracy on the training set.

Compared with the three-layer LSTM network, the four-layer LSTM network has one additional hidden layer. This hidden layer takes the output of the first layer as its input for training and passes its own output to the next hidden layer. It has the same number of neurons as the first layer. A dropout layer with a rate of 0.3 is added after both the first and second layers to prevent overfitting.
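As a rough check on the size of these layers, the number of trainable weights in one LSTM layer can be worked out from the layer widths. The sketch below assumes the standard LSTM formulation with bias terms and no peephole connections, and it assumes that the added hidden layer is itself an LSTM layer fed by the 64 outputs of the first layer. The helper name `lstm_param_count` is illustrative and does not come from the experiment code.

```python
def lstm_param_count(input_dim, units):
    """Trainable weights of one standard LSTM layer with biases.

    Each of the four gates (input, forget, cell candidate, output) has an
    input kernel (input_dim x units), a recurrent kernel (units x units),
    and a bias vector (units).
    """
    return 4 * (units * (input_dim + units) + units)

# First LSTM layer: input dimension 3, 64 neurons.
first_layer = lstm_param_count(input_dim=3, units=64)

# Assumed second LSTM layer: fed by the 64 outputs of the first layer.
second_layer = lstm_param_count(input_dim=64, units=64)

print(first_layer, second_layer)  # 17408 33024
```

Under these assumptions the additional layer roughly triples the number of recurrent weights, which is one reason the deeper network is not automatically the better choice on a series of only 1800 points.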

(2) Network compilation. LSTM network compilation uses the adaptive moment estimation (Adam) algorithm as the optimizer and the mean square error loss function as the objective function.

(3) Network fitting. The LSTM network was trained on 1440 samples, and 360 samples were used for testing. The number of training epochs was 50, look_back was set to 1, 5, and 10 in turn, and batch_size was 128.
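The role of look_back can be illustrated with a short windowing sketch: each training input consists of the previous look_back observations, and the target is the value that follows them. The one-step-ahead target and the function name `make_windows` are assumptions made for illustration, not details confirmed by the text.

```python
def make_windows(series, look_back):
    """Turn a series into (input window, next value) training pairs."""
    inputs, targets = [], []
    for i in range(len(series) - look_back):
        inputs.append(series[i:i + look_back])   # previous look_back values
        targets.append(series[i + look_back])    # value to be predicted
    return inputs, targets

# Toy series for illustration.
inputs, targets = make_windows([1, 2, 3, 4, 5, 6], look_back=3)
print(inputs)   # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(targets)  # [4, 5, 6]
```

A larger look_back gives the network more history per sample but leaves fewer samples to train on, since a series of length n yields n - look_back pairs.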

(4) Network evaluation. When look_back takes 1, 5, and 10, respectively, and the number of hidden layers (LN) is 1 and 2, respectively, the loss data of the model training process is shown in Figure 6.

(5) Network traffic forecast. The 360 test data points were predicted, and the first 100 predicted results are shown in Figure 7. TestOriginal_result represents the original data, and testPredict_result_101, testPredict_result_105, and testpredict_110 represent the prediction results when LN = 1 and look_back takes 1, 5, and 10, respectively. TestPredict_result_201, testPredict_result_205, and testPredict_210 represent the prediction results when LN = 2 and look_back takes 1, 5, and 10, respectively.

(6) Evaluate the prediction error of the model. The LSTM model was run for network traffic prediction under different parameter combinations, and the RMSE, MAE, and MAPE values for each validation set were calculated. The results are shown in Table 1.

It can be seen from Table 1 that the prediction error of the model changes as look_back increases, and that the prediction errors of the single-layer LSTM and the double-layer LSTM differ under the same look_back. Building on these experiments, four further groups of experiments were added, with look_back set to 15 and 20 for both the single-layer and the double-layer network. Multiple predictions were made in each case and the corresponding error values were calculated. The test results are shown in Figure 8.

It can be seen from Figure 8 that the number of hidden layers and the time step have a great impact on the fitting effect of LSTM. When a hidden layer is added, the prediction error changes, but whether it increases or decreases depends on the time step. Likewise, when the time step is changed, that is, when the look_back value is increased, the trend of the prediction error is not fixed. For example, when the look_back value changes from 5 to 10, the prediction error of the single-layer LSTM model decreases, while the prediction error of the double-layer LSTM model increases.

For network traffic data, therefore, the prediction effect of a parameter combination set by the empirical method is unstable and cannot achieve the optimal prediction performance. For this reason, the Improved Particle Swarm Optimization (IPSO) algorithm is adopted to optimize the model, that is, an intelligent algorithm is used to efficiently obtain the parameter combination with the best prediction effect.

5.2.3. Parameter Optimization of LSTM Network Traffic Prediction Model Based on IPSO

The IPSO algorithm was used to optimize the LSTM network traffic prediction model, and parameters were optimized for single-layer LSTM and double-layer LSTM, respectively. The fitness value of the LSTM prediction model changed as the number of iterations increased during the experiment, as shown in Figure 9.

In Figure 9, fitness12, fitness23, and fitness22 correspond to the fitness values of the models IPSO-LSTM12 (single layer, 2 parameters: node1 and look_back), IPSO-LSTM23 (double layer, 3 parameters: node1, node2, and look_back), and IPSO-LSTM22 (double layer, 2 parameters: node1 and look_back), respectively. The second-layer parameter of IPSO-LSTM22 is fixed at node2 = 4 according to the optimization results of LSTM23.

It can be seen from Figure 9 that the final convergence value of fitness12 is smaller than those of fitness23 and fitness22, that fitness12 converges faster than fitness23, and that the final convergence value of fitness22 is only slightly smaller than that of fitness23. This shows that, for the long-term prediction of network traffic data, the single-hidden-layer LSTM optimized by the particle swarm algorithm reaches a slightly smaller fitness value than the two-hidden-layer LSTM optimized by the same algorithm, and it converges faster.

It can be seen that, compared with setting the LSTM parameters empirically, setting them with IPSO reduces the RMSE by 20%. This means that the IPSO algorithm can effectively find a well-performing parameter combination for LSTM network traffic prediction and reduce the prediction error.

In addition, Figure 10 shows how the node number and time step size change during the optimization of the IPSO-LSTM12 model, that is, the process by which the improved PSO algorithm determines the optimal parameter values of the LSTM network traffic model.

It can be seen from Figure 10 that the optimal parameters of the LSTM12 model are node1 = 8 and look_back = 1. Therefore, for the network traffic data used in this paper, the optimal configuration of the single-layer LSTM model is 8 neurons and a time step of 1.

The changes in node number and time step size in IPSO-LSTM23 model optimization are shown in Figure 11.

It can be seen from Figure 11 that the optimal parameters of the LSTM23 model are node1 = 16, node2 = 4, and look_back = 1. Therefore, for the network traffic data used in this paper, the optimal configuration of the two-layer LSTM model is 16 neurons in the first layer, 4 neurons in the second layer, and a time step of 1.
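The kind of search described in this subsection can be sketched with a plain particle swarm over integer hyperparameters. This is a generic PSO with a fixed inertia weight, not the IPSO variant used in this paper, whose specific improvements are not reproduced here. The fitness function is a toy stand-in: in the real setting it would train an LSTM with the candidate (node1, look_back) and return its validation error. The bounds, coefficients, and names are all assumptions for illustration.

```python
import random

def pso_minimize(fitness, bounds, n_particles=10, n_iterations=30,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize fitness over integer points inside bounds with basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)

    def decode(position):
        # Hyperparameters are integers, so round the continuous position.
        return tuple(int(round(x)) for x in position)

    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]                 # personal bests
    best_fit = [fitness(decode(p)) for p in positions]
    g = min(range(n_particles), key=best_fit.__getitem__)
    global_pos, global_fit = best_pos[g][:], best_fit[g]  # swarm best

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (best_pos[i][d] - positions[i][d])
                    + c2 * r2 * (global_pos[d] - positions[i][d]))
                lo, hi = bounds[d]
                # Move the particle and keep it inside the search range.
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)
            fit = fitness(decode(positions[i]))
            if fit < best_fit[i]:
                best_fit[i], best_pos[i] = fit, positions[i][:]
                if fit < global_fit:
                    global_fit, global_pos = fit, positions[i][:]
    return decode(global_pos), global_fit

def toy_fitness(params):
    """Stand-in for validation error; NOT a trained LSTM.

    Its minimum is placed at (8, 1) only so the behaviour is easy to inspect.
    """
    node1, look_back = params
    return (node1 - 8) ** 2 + (look_back - 1) ** 2

search_bounds = [(1, 64), (1, 20)]  # assumed ranges for node1 and look_back
best_params, best_value = pso_minimize(toy_fitness, search_bounds)
print(best_params, best_value)
```

Because every fitness evaluation in the real setting means training a network, the swarm size and iteration count dominate the cost of the search, which is why the convergence speed of the fitness curves matters in practice.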

To evaluate the prediction performance of the LSTM model after parameter optimization by IPSO, network traffic data samples at 180 time points are used for verification. Four IPSO-optimized models are compared: the single-layer LSTM model IPSO-LSTM12, the two-parameter optimization model IPSO-LSTM22 of the double-layer LSTM, the three-parameter optimization model IPSO-LSTM23-1 of the double-layer LSTM (no dropout in training), and the three-parameter optimization model IPSO-LSTM23-2 of the double-layer LSTM (dropout in training). Figure 12 shows the model prediction results for the last 180 test data points.

It can be seen from Figure 12 that the prediction results of the LSTM model with different parameter combinations have a good fitting effect, and the prediction results of the single-layer LSTM dual-parameter optimization model IPSO-LSTM12 are better than those of other parameter configuration models. To compare the predictive performance of the four models more clearly, the predictive performance evaluation index values of the four models in Figure 12 are obtained, respectively, and the results are shown in Table 2.

As can be seen from Table 2, compared with the single-layer LSTM12, the two-layer LSTM22 has slightly smaller RMSE and MAE values, while its MAPE is slightly larger. If only the RMSE or MAE indicator is considered, LSTM22 is better than LSTM12; if only the MAPE indicator is considered, LSTM12 is better than LSTM22. On the whole, the prediction errors of LSTM12 and LSTM22 are smaller than those of the other two prediction models, that is, LSTM12 and LSTM22 predict the network traffic data better than the other two models. The prediction error of LSTM23-2 is smaller than that of LSTM23-1, which indicates that adding dropout during training reduces the prediction error of the model and improves its prediction performance.

5.2.4. Network Traffic Prediction Based on CEEMDAN-IPSO-LSTM

The model was tested on a data set of 500 time points, and the predictive performance for each IMF is shown in Figure 13, which gives the prediction results and training loss of the eight IMFs. Most IMFs are predicted well by the LSTM. IMF0 and IMF7 are predicted somewhat less well: their training loss remains high throughout the training process, and the loss of IMF0 is particularly large. Despite this, the overall results were excellent once the per-IMF predictions were integrated.

After all IMFs have been predicted, the final prediction result is obtained by superimposing the predicted results of the individual IMFs. Figure 14 shows the forecasting flowchart of CEEMDAN-IPSO-LSTM.
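The superposition step amounts to an element-wise sum over the per-component predictions. The sketch below uses made-up numbers and a hypothetical function name, and it assumes that all component predictions are on the same scale and cover the same time points.

```python
def superimpose(component_predictions):
    """Sum per-component prediction sequences point by point."""
    lengths = {len(c) for c in component_predictions}
    if len(lengths) != 1:
        raise ValueError("all component predictions must have the same length")
    return [sum(values) for values in zip(*component_predictions)]

# Toy predictions for three components (illustrative values only).
imf_predictions = [
    [1.0, 2.0, 3.0],     # high-frequency component
    [0.5, -0.5, 0.0],    # lower-frequency component
    [10.0, 10.0, 10.0],  # residue (trend)
]
print(superimpose(imf_predictions))  # [11.5, 11.5, 13.0]
```

If each component were normalized separately before training, its prediction would need to be mapped back to the original scale before the sum is taken.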

5.3. Result Analysis

To evaluate the prediction effect of the proposed hybrid method CEEMDAN-IPSO-LSTM, it is compared with other neural network prediction methods, namely CEEMDAN-LSTM, IPSO-LSTM, and LSTM, and with other predictive models, namely ARIMA, Support Vector Regression (SVR), Decision Tree Regressor (DTR), and Multivariate Linear Regression (MLR). As before, the network traffic data samples at 180 time points were used for verification, and the prediction results of the eight models are shown in Figure 15.

Figure 15 shows the prediction effects of the different models. The hybrid prediction model fits the data more closely, which indicates that the prediction results of the CEEMDAN-IPSO-LSTM model are better than those of the other models. To compare the prediction performance of the eight models more clearly, their evaluation index values were calculated and are shown in Table 3.

It can be seen from Table 3 that the prediction errors of the LSTM-based models are all smaller than those of the regression prediction models, which indicates that the LSTM network traffic prediction models predict better than the regression models. In other words, LSTM is more suitable for long-term network traffic prediction and for handling the real-time variability of network traffic data. In addition, the RMSE, MAE, and MAPE values of the CEEMDAN-IPSO-LSTM prediction model are all smaller than those of the other neural network prediction models, indicating that the proposed hybrid model CEEMDAN-IPSO-LSTM is better than the other prediction models in network traffic prediction.

In addition, we compare the decomposition methods EMD, EEMD, and CEEMDAN. First, using network flow data at 500 time points, we decompose the original data into several IMFs and compare the decomposition results of the three methods. Then, we make predictions with LSTM combined with each of the three decomposition methods to determine which method works better.

In Figure 16, there are seven IMFs of EMD, eight of EEMD, and eight of CEEMDAN, including residue. We only know that different decomposition results make the prediction accuracy different, but it is hard to see which one produces the better prediction. So, we make predictions by LSTM combining the three decomposition methods and the results are in Figure 17.

Figure 17 shows that the red lines fit the raw data more closely, which indicates that the predicted result of CEEMDAN-LSTM is closer to the real value. To further verify the effect of the different decomposition methods, Table 4 gives the prediction errors of CEEMDAN-LSTM, EEMD-LSTM, and EMD-LSTM.

In Table 4, the prediction error of CEEMDAN-LSTM is significantly less than the other two methods, which indicates CEEMDAN can decompose data more effectively so that LSTM can predict better. That is to say, the results verify the superiority of CEEMDAN for data decomposition.

Also, based on the same network flow data at 500 time points, we compare CEEMDAN-IPSO-LSTM with three state-of-the-art prediction models, ST-LSTM, SA-ARIMA-BPNN, and INGARCH, to verify the effectiveness of the proposed network traffic prediction model. The prediction data of the four methods for the last 100 time points are shown in Figure 18.

In Figure 18, all four prediction methods do a good job of forecasting network traffic. The purple and green lines match the raw data, represented by the blue line, more closely, which demonstrates that the proposed method and SA-ARIMA-BPNN make predictions closer to reality. To compare the prediction accuracy of the four methods more clearly, their prediction errors are calculated in the same way and shown in Table 5.

Consistent with Figure 18, Table 5 shows that CEEMDAN-IPSO-LSTM has the lowest prediction error. Together, Figure 18 and Table 5 once again confirm the superiority of the CEEMDAN-IPSO-LSTM model proposed in this paper.

In summary, CEEMDAN-IPSO-LSTM has a better prediction effect and higher reliability for the future prediction of network traffic.

6. Conclusion and Future Work

Network traffic prediction can be applied to network resource optimization and network congestion avoidance, which is of great significance for network business planning, data management, fault detection, resource allocation, and other operations. In this paper, a hybrid deep interval prediction model has been proposed for network traffic forecasting to improve the prediction accuracy. Firstly, the nonparametric LSTM neural network is used to establish the network traffic prediction model, and the Improved Particle Swarm Optimization algorithm is used to optimize the hyperparameters of the established model. The resulting optimized LSTM prediction models (IPSO-LSTM12, IPSO-LSTM22, and IPSO-LSTM23) reduce the RMSE by 20% compared with the empirically configured LSTM. In addition, the prediction performance of the single-layer LSTM is better than that of the double-layer LSTM in network traffic prediction. Then CEEMDAN is introduced to decompose the network traffic time series into different modes to reduce the complexity of the network traffic sequence. To verify its effectiveness, the proposed CEEMDAN-IPSO-LSTM model is applied to network traffic prediction and compared with other neural network prediction methods and regression methods. The experimental results show that, compared with other prediction models and the traditional LSTM model, the CEEMDAN-IPSO-LSTM model reduces the prediction error and obtains a better fitting effect, which demonstrates that the proposed hybrid method improves network traffic prediction accuracy.

In future work, we plan to enhance the prediction model in two respects to further improve the prediction accuracy of network traffic. On the one hand, in the data preprocessing stage, we will try other data decomposition methods, such as Variational Mode Decomposition (VMD), wavelet packet decomposition, and combined methods, to improve the stability and regularity of network traffic data decomposition. On the other hand, we will focus more on the error correction strategy of the hybrid network traffic forecasting model, such as analyzing different error correction strategies or re-decomposing the IMF data, to enhance the prediction performance.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was funded by the National Natural Science Foundation of China under Grant No. 62072363, the National Natural Science Foundation of China under Grant No. 61672416, Industrialization projects of the Education Department of Shaanxi Province under Grant No. 21JC017, Industrial research project of Science and Technology Department of Shaanxi Province under Grant No. 2020GY-012, and Science and Technology Plan Project of Yulin under Grant No. 2019-175.