Special Issue: Advances in Machine Learning for Cybersecurity
A Novel Hybrid Network Traffic Prediction Approach Based on Support Vector Machines
Network traffic prediction plays a major role in characterizing network performance. An approach that can accurately capture the salient characteristics of network traffic is very useful for network analysis and simulation. Network traffic prediction methods can be divided into two classes: single models and hybrid models. Hybrid models integrate the merits of several single models and can consequently improve prediction accuracy. In this paper, a new hybrid network traffic prediction method (EPSVM) based on Empirical Mode Decomposition (EMD), Particle Swarm Optimization (PSO), and Support Vector Machines (SVM) is presented. EPSVM first uses EMD to eliminate the impact of noise signals. Then, SVM is applied for model training and fitting, with its parameters optimized by PSO. The effectiveness of the presented method is examined by comparing it with three other methods: basic SVM (BSVM), SVM applied to EMD-processed data (ESVM), and SVM optimized by Particle Swarm Optimization (PSVM). Case studies demonstrate that EPSVM performs better than the other three network traffic prediction models.
1. Introduction
It is generally known that network traffic prediction can provide a variety of practical information for Internet organizations, for example, in travel, rental, and smart search services. Network traffic prediction is a procedure whereby a webmaster captures the network traffic and inspects it closely to anticipate what will happen on the network in the coming period. It can assist the webmaster in establishing reasonable network planning and controlling network traffic congestion effectively. Precise network traffic prediction captures the notable attributes of the traffic, and thus it plays a vital role in network traffic analysis and simulation and helps customers understand the network dynamics. Consequently, in recent years, researchers in China and abroad have proposed numerous methods to improve network traffic prediction precision.
In general, network traffic prediction methods can be divided into two categories: single models and combination (i.e., hybrid) models, which integrate the merits of several single models. Dickinson has demonstrated that combined and hybrid models can achieve better forecasting results than individual models. Besides, the topology and geometry of a network are always very complex, which often influences the network traffic prediction accuracy. Evidence of network traffic complexity shows up in numerous circumstances; for instance, long-range dependence and self-similarity have been found in statistical analyses of traffic measurements. The complexity revealed by traffic measurements has driven the development of network traffic prediction and suggests that a single model cannot yield satisfactory prediction results [4–6]. The main reason is that network traffic displays numerous characteristics, such as trend, periodicity, self-similarity, and long-range dependence, and a single model cannot capture all of them. A combination model, by contrast, can capture both the linear and the nonlinear characteristics of the network traffic data (NTD). Therefore, a combination model is applied in this paper.
Over the past few decades, researchers in China and abroad have presented many strategies to predict network traffic in diverse areas [7, 8]. Some were more inclined to improve existing models. For example, one study extended the concept of the widely used fractional Brownian traffic model. Qing-Fang et al. used a BIC-based neighboring point selection method to select the number of nearest neighboring points for the local Support Vector Machine. To obtain faster convergence in the training of the BiLinear Recurrent Neural Network (BLRNN), another study applied two procedures to the network. Other researchers preferred a combination of existing models. For example, one study developed a novel combined model to predict the network traffic at National Taitung University and Shu-Te University. Chen et al. developed a new flexible neural tree structure using Genetic Programming, with the parameters optimized by the Particle Swarm Optimization algorithm. Another study presented a novel approach that integrated the wavelet transform, grey theory, and chaos theory; the numerical experiment demonstrated that the proposed model obtains better prediction results. Recent papers about network traffic prediction methods can be found in [14, 15].
Although the approaches mentioned above can produce adequately precise predictions in various cases, they have generally focused on evaluating the precision of the approaches without considering the internal characteristics of the network traffic data (NTD). In fact, NTD are normally influenced by unstable factors, which introduce noise signals that increase the difficulty of forecasting. Therefore, in this paper, EMD is first applied to eliminate the noise signals before SVM is applied to predict the network traffic, and the parameters of SVM are optimized by PSO. The presented method thus integrates EMD, PSO, and SVM, hence its abbreviated name EPSVM. To examine the effectiveness of EPSVM, we compare it with three other approaches: (1) the original NTD processed directly by SVM (named BSVM), (2) NTD processed by EMD and then modeled by SVM on the denoised data (named ESVM), and (3) the original NTD processed directly by SVM whose parameters are optimized by PSO (named PSVM). The NTD were gathered from the Network Center of Lanzhou University.
The rest of this paper is organized as follows. The theoretical background of the EMD, PSO, and SVM models is given in Section 2. In Section 3, the presented approach is introduced. Section 4 illustrates the experimental results. Finally, Section 5 concludes this paper.
2. Theoretical Background of EMD, PSO, and SVM Models
In this section, the theories underlying the proposed method (EPSVM) are introduced, namely EMD, PSO, and SVM.
2.1. Empirical Mode Decomposition
Empirical Mode Decomposition (EMD) is a nonlinear signal processing method developed by Huang et al. It decomposes a signal into a sum of intrinsic mode functions (IMFs) and a residue. An IMF must satisfy two conditions: (1) the number of extrema and the number of zero-crossings are either equal or differ by at most one; (2) the mean value of the envelopes defined by the local maxima and the local minima is zero at all points. According to [16–18], any signal $x(t)$ can then be decomposed as follows:
(1) Identify all the local extrema, and then connect all the local maxima with a cubic spline as the upper envelope.
(2) Repeat the procedure for the local minima to produce the lower envelope. The upper and lower envelopes should cover all the data between them.
(3) The mean of the upper and lower envelopes is designated as $m_1$, and the difference between the signal $x(t)$ and $m_1$ is the first component $h_1$:
$$h_1 = x(t) - m_1. \tag{1}$$
Ideally, if $h_1$ satisfies the definition of an IMF, then it is the first IMF.
(4) If $h_1$ is not an IMF, $h_1$ is treated as the original signal, and by repeating steps (1), (2), and (3), $h_{11} = h_1 - m_{11}$ is acquired. After repeating the sifting process up to $k$ times, $h_{1k}$ becomes an IMF, i.e.,
$$h_{1k} = h_{1(k-1)} - m_{1k}. \tag{2}$$
Then, it is designated as
$$c_1 = h_{1k}. \tag{3}$$
The first IMF component $c_1$ from the original data should contain the finest scale or the shortest period component of the signal.
(5) Separating $c_1$ from the original signal $x(t)$, we get the residue $r_1 = x(t) - c_1$. Treating $r_1$ as new data and repeating the above process $n$ times gives $x(t) = \sum_{i=1}^{n} c_i + r_n$, where $c_1, c_2, \ldots, c_n$ are the IMFs that were obtained and $r_n$ is the final residue.
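The sifting procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it swaps the cubic spline envelopes for piecewise-linear interpolation so that it needs no external libraries, it uses a fixed number of sifting passes instead of an IMF stopping test, and the function names (`local_extrema`, `envelope`, `sift_once`, `extract_imf`, `emd`) are hypothetical.

```python
def local_extrema(x):
    """Return the indices of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, idx):
    """Piecewise-linear envelope through the points (i, x[i]) for i in idx.

    Assumption: linear interpolation stands in for the cubic spline named in
    the text; the envelope is held constant outside the first/last knot.
    """
    env = []
    for t in range(len(x)):
        if t <= idx[0]:
            env.append(x[idx[0]])
        elif t >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            j = max(k for k in range(len(idx)) if idx[k] <= t)
            a, b = idx[j], idx[j + 1]
            w = (t - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    return env


def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes.

    Returns None when x has too few extrema to build both envelopes.
    """
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    upper = envelope(x, maxima)
    lower = envelope(x, minima)
    return [xi - (u + l) / 2 for xi, u, l in zip(x, upper, lower)]


def extract_imf(x, n_sifts=10):
    """Repeat the sifting pass a fixed number of times (assumed stop rule)."""
    h = list(x)
    for _ in range(n_sifts):
        nxt = sift_once(h)
        if nxt is None:
            break
        h = nxt
    return h


def emd(x, max_imfs=5, n_sifts=10):
    """Decompose x into a list of IMF-like components and a final residue."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        if sift_once(residue) is None:  # residue has too few extrema left
            break
        c = extract_imf(residue, n_sifts)
        imfs.append(c)
        residue = [r - ci for r, ci in zip(residue, c)]
    return imfs, residue
```

By construction, adding the returned components and the residue back together recovers the input series, which mirrors the decomposition in step (5).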
2.2. Particle Swarm Optimization
Particle Swarm Optimization (PSO) is a meta-heuristic technique proposed by Kennedy and Eberhart, inspired by the natural flocking and swarming behaviors of birds and insects. Consider an optimization problem of $D$ variables. A swarm consists of $N$ particles flying in a $D$-dimensional search space. Let $x_i = (x_{i1}, \ldots, x_{iD})$ denote the position of particle $i$ and $v_i = (v_{i1}, \ldots, v_{iD})$ its flight velocity over the solution space. Each individual in the swarm is scored using a scoring function that yields a fitness value representing how well it solves the problem. The best previous position of particle $i$ is $p_i$. The index of the best particle among all particles in the swarm is $g$. Each particle records its own personal best position ($\mathrm{pbest}_i$) and knows the best position found by all particles in the swarm ($\mathrm{gbest}$). Then, the velocity and position of particle $i$ at iteration $k+1$ can be calculated as
$$v_i^{k+1} = w v_i^{k} + c_1 r_1 \left(\mathrm{pbest}_i - x_i^{k}\right) + c_2 r_2 \left(\mathrm{gbest} - x_i^{k}\right), \tag{4}$$
$$x_i^{k+1} = x_i^{k} + v_i^{k+1}, \tag{5}$$
where $w$ is the inertia weight factor, $r_1$ and $r_2$ are two independent random variables in the range [0, 1], and $c_1$ and $c_2$ are two positive constants called acceleration coefficients.
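A single PSO iteration of the velocity and position update can be sketched as below. This is an illustrative sketch only: the default values of `w`, `c1`, and `c2` are assumptions (the text does not give them), the random factors are drawn independently per dimension, which is one common convention, and the name `pso_step` is hypothetical.

```python
import random


def pso_step(positions, velocities, pbest, gbest,
             w=0.7, c1=1.5, c2=1.5, rng=random):
    """Update every particle in place.

    velocity: v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)
    position: x <- x + v
    positions, velocities, pbest: lists of per-particle coordinate lists.
    gbest: coordinate list of the best position found by the swarm.
    """
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()  # uniform draws in [0, 1)
            v[d] = (w * v[d]
                    + c1 * r1 * (pbest[i][d] - x[d])
                    + c2 * r2 * (gbest[d] - x[d]))
            x[d] = x[d] + v[d]
```

When a particle already sits on both its personal best and the swarm best, the two attraction terms vanish and only the inertia term moves it, which makes the effect of `w` easy to check by hand.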
2.3. Support Vector Machines
Support Vector Machine (SVM) is a set of classification and regression techniques designed to systematically optimize its structure based on the input training data. More details about SVM can be found in [22–24].
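Consider the training data $\{(x_1, y_1), \ldots, (x_n, y_n)\} \subset \mathcal{X} \times \mathbb{R}$, where $\mathcal{X}$ denotes the space of the input patterns and $y_i$ is the output value associated with $x_i$. In $\varepsilon$-SVR, the goal is to produce a function $f(x) = \langle w, \phi(x) \rangle + b$ based on the training data set to approximate the unknown function $g(x)$. By introducing different slack variables for violating the “tube” constraint from above and from below, we arrive at the formulation stated in Vapnik’s article for $\varepsilon$-SVR:
$$\min_{w,\, b,\, \xi,\, \xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right)$$
$$\text{subject to} \quad y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \quad \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, n,$$
where $n$ denotes the number of samples, $\xi_i$ and $\xi_i^*$ are the allowed errors “above” and “below” the $\varepsilon$-insensitive tube, and $\frac{1}{2}\|w\|^2$ is the regularization term. The empirically selected constant $C$ determines the tradeoff between these two terms.
To preserve the sparse property of the solution, Vapnik used the $\varepsilon$-insensitive loss function described by
$$|y - f(x)|_{\varepsilon} = \max\{0,\ |y - f(x)| - \varepsilon\}.$$
Instead of minimizing the observed training error, $\varepsilon$-SVR attempts to minimize the generalization error bound so as to achieve generalized performance, and this makes $\varepsilon$-SVR extremely robust to outliers. Finally, we get the following explicit form by introducing Lagrange multipliers $\alpha_i$ and $\alpha_i^*$, the kernel trick with kernel function $K$, and employing the optimality constraints:
$$f(x) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^*\right) K(x_i, x) + b.$$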
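The two ingredients described above, the insensitive loss and the kernel expansion of the fitted function, can be illustrated with a short sketch. It makes several assumptions that the text does not state: a Gaussian (RBF) kernel with width parameter `gamma` is used, the dual coefficients are taken as already known rather than solved for, and the function names are hypothetical.

```python
import math


def eps_insensitive_loss(y, f, eps):
    """Zero inside the tube |y - f| <= eps, linear outside it."""
    return max(0.0, abs(y - f) - eps)


def rbf_kernel(a, b, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2) (assumed kernel choice)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Evaluate the kernel expansion: sum_i coef_i * K(x_i, x) + b.

    dual_coefs[i] holds the difference of the two Lagrange multipliers
    belonging to support_vectors[i]; training them is out of scope here.
    """
    return sum(coef * rbf_kernel(sv, x, gamma)
               for sv, coef in zip(support_vectors, dual_coefs)) + b
```

Errors smaller than `eps` cost nothing, which is why only the samples on or outside the tube end up with non-zero coefficients in the expansion.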
3. The Proposed Method
The proposed method (EPSVM) first uses EMD to eliminate the noise signal; the denoised data are then fed into the SVM, whose parameters are optimized by PSO. Section 3.1 introduces SVM optimized by PSO, and Section 3.2 presents the specific prediction procedure of EPSVM.
3.1. SVM Optimized by PSO
The parameters of SVM have a strong effect on the forecasting precision, so it is very important to optimize the two SVM parameters in the forecasting procedure. PSO is therefore utilized to optimize the parameters of SVM (the resulting method is named PSVM). The detailed process of PSVM is depicted in Figure 1 and includes the following five steps:
(1) Initialization: the population size is initialized, and the initial position and velocity of each particle are randomly assigned.
(2) Fitness assessment: the fitness of each particle is assessed with a fitness function computed from the actual values $y_i$ and the forecast values $\hat{y}_i$.
(3) Update $\mathrm{pbest}$ and $\mathrm{gbest}$ according to the fitness function results.
(4) Update the velocity of each particle according to Equation (4) and the position of each particle using Equation (5).
(5) Termination: the velocity and position of the particles are updated until the stop conditions are met.
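Steps (2) and (3) above can be sketched as follows. Two assumptions are made for illustration only: the fitness is taken to be the mean squared error between actual and forecast values (the exact fitness formula is not reproduced here), and each particle is taken to encode a pair of SVM parameters such as a hypothetical `(C, gamma)`. The names `mse` and `update_bests` are likewise hypothetical.

```python
def mse(actual, forecast):
    """Assumed fitness: mean squared error (lower is better)."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def update_bests(positions, fitness, pbest, pbest_fit, gbest, gbest_fit):
    """Keep each particle's best position so far and the swarm-wide best.

    positions[i] is the current parameter vector of particle i and
    fitness[i] its current fitness. pbest and pbest_fit are updated in
    place; the (possibly new) gbest and gbest_fit are returned.
    """
    for i, (x, fit) in enumerate(zip(positions, fitness)):
        if fit < pbest_fit[i]:
            pbest[i], pbest_fit[i] = list(x), fit
        if fit < gbest_fit:
            gbest, gbest_fit = list(x), fit
    return gbest, gbest_fit
```

In a full loop, each particle's parameters would be used to train an SVM, the resulting forecasts would be scored with the fitness function, and this bookkeeping would run before the velocity and position update of step (4).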
3.2. The Specific Process of the Proposed Method
To address the noise signals caused by many uncertainties, EMD (described in Section 2.1) is first applied to remove the noise component of the NTD. After EMD processing, EPSVM applies the PSVM described above to the processed NTD series to obtain the final results. The detailed procedure of EPSVM is depicted in Figure 2:
(1) Noise reduction: use EMD to remove the noise component of the original data.
(2) Feed the data processed in the first step into the PSVM model to obtain the final forecasting result.
4. Experimental Results and Discussion
4.1. Criteria for Measuring Accuracy
In time series prediction, the question always arises of which criterion should be used to measure the accuracy of the predicted results. The performance evaluation of a time series forecast depends heavily on which criteria are chosen. To assess the forecast accuracy reasonably, three well-known criteria are selected: the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). They are expressed as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2},$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|y_i - \hat{y}_i\right|,$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%,$$
where $n$ is the number of periods in forecasting and $y_i$ and $\hat{y}_i$ are the actual value and the forecasted value, respectively.
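The three criteria, which Section 4.3 names as RMSE, MAE, and MAPE, can be computed with a few lines of Python. The sketch below uses their standard definitions; the function names are illustrative, and `mape` returns a percentage and assumes that no actual value is zero.

```python
import math


def rmse(actual, forecast):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mae(actual, forecast):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero.
    """
    n = len(actual)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n
```

For all three measures, lower values indicate a better forecast, which is how the comparisons in the results tables are read.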
4.2. Network Traffic Data (NTD)
The presented EPSVM approach is evaluated on real NTD from the Network Center of Lanzhou University (LZU). The data are gathered every 5 minutes, so each hour has 12 NTD samples and one day amounts to 288 (12 × 24) samples. The NTD of a full week (7 days), covering both workdays and nonworkdays, are used to examine the effectiveness and feasibility of the proposed EPSVM method. Therefore, the total number of NTD samples used in this paper is 2016 (288 × 7). To track network traffic fluctuations promptly and at the same time facilitate website monitoring, two types of NTD, namely, inflow and outflow NTD, are used. As a result, the prediction process contains two procedures: inflow NTD prediction and outflow NTD prediction. Figures 3 and 4 show the 2016 inflow and outflow NTD samples, respectively.
From Figures 3 and 4, it can be seen that the NTD used in this paper are divided into seven groups. Each group has 288 samples, which means that each group represents one day. Of the 288 samples, the data from 12 p.m. to 7 a.m. (15 hours) are used for model training and fitting, and the fitted model is then used to predict the NTD from 8 p.m. to 9 p.m.
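The sample counts follow directly from the 5-minute sampling interval, and a week-long series can be cut into daily groups accordingly. The sketch below is illustrative; the constant and function names are hypothetical, and the series passed in is a stand-in for the real data, which is not publicly available.

```python
SAMPLES_PER_HOUR = 60 // 5                 # one sample every 5 minutes -> 12
SAMPLES_PER_DAY = SAMPLES_PER_HOUR * 24    # 12 * 24 = 288
SAMPLES_PER_WEEK = SAMPLES_PER_DAY * 7     # 288 * 7 = 2016


def split_into_days(series, samples_per_day=SAMPLES_PER_DAY):
    """Cut a series into consecutive day-long groups of equal length."""
    if len(series) % samples_per_day != 0:
        raise ValueError("series length is not a whole number of days")
    return [series[i:i + samples_per_day]
            for i in range(0, len(series), samples_per_day)]
```

Applied separately to the inflow and outflow series, this yields seven groups of 288 samples each.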
4.3. Prediction Results
As discussed above, the EPSVM approach first applies Empirical Mode Decomposition to remove the noise interference from the original data. After noise removal, the data are referred to as denoised NTD. Figure 5 shows the noise removal process for the inflow and outflow data. It is worth noting that all of the NTD were processed by EMD so that its effect could be observed.
Comparing the original NTD with the denoised data in Figures 5 and 6, it can be seen that the denoised data become a little smoother. So, instead of the original series, the proposed EPSVM approach uses the denoised data for model training and fitting. After acquiring the denoised data, EPSVM uses the SVM model optimized by PSO for prediction. Here, each group of data is normalized, and every normalized group is divided into a training set and a testing set, where the training set consists of the NTD from 12 p.m. to 7 a.m. (15 hours) of each day and the testing set consists of the NTD from 8 p.m. to 9 p.m. EPSVM obtains the final seven-day prediction results by predicting the 24 values of one day and then doing the same for the other six days of the week.
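The normalization step can be sketched as follows. The text does not say which normalization was used, so min-max scaling to [0, 1] is assumed here purely for illustration, and the function names are hypothetical.

```python
def minmax_fit(values):
    """Return the (minimum, maximum) of one group of samples."""
    return min(values), max(values)


def minmax_apply(values, lo, hi):
    """Scale values to [0, 1] using the given minimum and maximum."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant group: map everything to 0
    return [(v - lo) / span for v in values]


def minmax_invert(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]
```

Keeping `lo` and `hi` for each day's group allows the forecasts to be mapped back to traffic units before the accuracy criteria are computed.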
As for the other three methods, their common characteristic is that they all use SVM for model training. The difference is that ESVM models the denoised data, whereas BSVM and PSVM model the original data directly, with the parameters of PSVM optimized by PSO. Figures 6 and 7 show the final inflow NTD and outflow NTD prediction results for each day by these four approaches, respectively.
Figures 7 and 8 give only a rough comparison between the predicted results of the four methods and the original NTD. To confirm that EPSVM performs better than the other three approaches, the three evaluation metrics defined in Section 4.1 (RMSE, MAE, and MAPE) were computed for the four methods and are recorded in Tables 1 and 2.
Tables 1 and 2 show the results of the three criteria for the four approaches (BSVM, PSVM, ESVM, and EPSVM) on each day, together with the average values of the three criteria. Because the results in Table 1 differ from those in Table 2, we discuss the two tables separately. From Table 1, the following observations can be made:
A comparison between BSVM and PSVM: Table 1 shows that, when the three evaluation metrics of all seven days are taken into consideration, PSVM has lower values than BSVM except on Wednesday, Friday, and Saturday. BSVM has lower values than PSVM on only these three days of the week; comparing the average values, PSVM performs better than BSVM.
A comparison between PSVM and ESVM: Table 1 shows that, when the three metrics of each day are taken into consideration, ESVM has lower values than PSVM on every day of the week except Sunday. Comparing the average values over the week, ESVM also has lower values than PSVM.
To sum up, ESVM performs better than PSVM, and PSVM is better than BSVM for most days of the week. If the advantages of PSVM and ESVM are combined, the result should be even better. Thus, the proposed EPSVM approach, which combines PSVM and ESVM, can obtain better forecasting results. Table 1 shows that, comparing the three metrics of each day, EPSVM has lower values than the other three methods on most days; the exceptions are that the three metrics of EPSVM are higher than those of PSVM on Sunday and higher than those of ESVM on Tuesday. In addition, the average values of EPSVM are lower than those of the other three methods.
From Table 2, the following observations can be made:
A comparison between BSVM and PSVM: Table 2 shows that, when the three metrics of each day are taken into consideration, PSVM has lower values than BSVM except on Monday and Sunday. BSVM has lower values than PSVM on only these two days of the week; comparing the average values, PSVM performs better than BSVM.
A comparison between PSVM and ESVM: Table 2 shows that, when the three metrics of each day are considered, ESVM has lower values than PSVM except on Saturday and Sunday. PSVM has lower values than ESVM on only these two days of the week; comparing the average values, ESVM performs better than PSVM.
All in all, from the above observations on Tables 1 and 2, the following conclusion can be reached: the proposed EPSVM method performs better than the other three methods, and all four methods have an acceptable performance on each day of the week.
5. Conclusions
Network traffic prediction offers useful data for website administrators to tailor the content hosted on Internet servers in order to reach a bigger audience. To enhance the capability of real-time network traffic analysis, it is vital to develop a highly accurate network traffic prediction technique that helps the webmaster manage bandwidth allocation effectively. In view of this, an artificial intelligence-based hybrid method, EPSVM, is presented in this article. EPSVM first uses EMD to process the original NTD so as to remove the noise part of the NTD. Then, it employs SVM to model the denoised network traffic series, with the parameters of SVM optimized by PSO. Experiments with the NTD from the LZU network center verify that EPSVM can significantly enhance network traffic prediction accuracy. As part of real-time and reliable analysis of smart grids, EPSVM will help the webmaster better monitor websites or, in other words, help network engineers optimize their websites, maximize online marketing, track user behavior, and push ads to users.
Data Availability
The data used to support the findings of this study were supplied by Lanzhou University under license and so cannot be made freely available. Requests for access to these data should be made to the corresponding author.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors would like to thank the Natural Science Foundation of China (61672469, 61472370, and 61822701), the Key Science and Technology Foundation of Gansu Province (1102FKDA010), and the Science and Technology Support Program of Gansu Province (1104GKCA037) for supporting this research.
References
B. R. Chang and H. F. Tsai, “Improving network traffic analysis by foreseeing data-packet-flow with hybrid fuzzy-based model prediction,” Expert Systems with Applications, vol. 36, no. 3, pp. 6960–6965, 2009.
J. M. Bates and C. W. J. Granger, “The combination of forecasts,” Journal of the Operational Research Society, vol. 20, no. 4, pp. 451–468, 1969.
J. P. Dickinson, “Some comments on the combination of forecasts,” Operational Research Quarterly, vol. 26, no. 1, pp. 205–210, 1975.
G. Orosz, B. Krauskopf, and R. E. Wilson, “Bifurcations and multiple traffic jams in a car-following model with reaction-time delay,” Physica D: Nonlinear Phenomena, vol. 211, no. 3-4, pp. 277–293, 2005.
G. Orosz, R. E. Wilson, and B. Krauskopf, “Global bifurcation investigation of an optimal velocity traffic model with driver reaction time,” Physical Review E, vol. 70, no. 2, Article ID 026207, 2004.
I. Gasser, G. Sirito, and B. Werner, “Bifurcation analysis of a class of “car following” traffic models,” Physica D: Nonlinear Phenomena, vol. 197, no. 3-4, pp. 222–241, 2004.
X. Wang, C. Zhang, and S. Zhang, “Modified Elman neural network and its application to network traffic prediction,” in Proceedings of 2012 IEEE 2nd International Conference on Cloud Computing and Intelligence Systems, vol. 2, pp. 629–633, IEEE, Hangzhou, China, October-November 2012.
W. Pan, Z. Shun-Yi, and C. Xue-Jiao, “SFARIMA: a new network traffic prediction algorithm,” in Proceedings of 2009 First International Conference on Information Science and Engineering, pp. 1859–1863, IEEE, Nanjing, China, 2009.
F. H. T. Vieira, G. R. Bianchi, and L. L. Lee, “A network traffic prediction approach based on multifractal modeling,” Journal of High Speed Networks, vol. 17, no. 2, pp. 83–96, 2010.
M. Qing-Fang, C. Yue-Hui, and P. Yu-Hua, “Small-time scale network traffic prediction based on a local support vector machine regression model,” Chinese Physics B, vol. 18, no. 6, pp. 2194–2199, 2009.
D. C. Park, “Structure optimization of BiLinear recurrent neural networks and its application to ethernet network traffic prediction,” Information Sciences, vol. 237, pp. 18–28, 2013.
Y. Chen, B. Yang, and Q. Meng, “Small-time scale network traffic prediction based on flexible neural tree,” Applied Soft Computing, vol. 12, no. 1, pp. 274–279, 2012.
S. Han-Lin, J. Yue-Hui, C. Yi-Dong, and C. Shi-Duan, “Network traffic prediction by a wavelet-based combined model,” Chinese Physics B, vol. 18, no. 11, pp. 4760–4768, 2009.
L. Nie, X. Wang, L. Wan et al., “Network traffic prediction based on deep belief network and spatiotemporal compressive sensing in wireless mesh backbone networks,” Wireless Communications and Mobile Computing, vol. 2018, Article ID 1260860, 10 pages, 2018.
N. E. Huang, Z. Shen, S. R. Long et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, 1998.
L. Qian, G. Xu, W. Tian, and J. Wang, “A novel hybrid EMD-based drift denoising method for a dynamically tuned gyroscope (DTG),” Measurement, vol. 42, no. 6, pp. 927–932, 2009.
C. Junsheng, Y. Dejie, and Y. Yu, “A fault diagnosis approach for roller bearings based on EMD method and AR model,” Mechanical Systems and Signal Processing, vol. 20, no. 2, pp. 350–362, 2006.
L. Yu, S. Wang, and K. K. Lai, “Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm,” Energy Economics, vol. 30, no. 5, pp. 2623–2635, 2008.
R. C. Eberhart and J. Kennedy, “A new optimizer using particle swarm theory,” in Proceedings of the Sixth International Symposium on Micro Machine and Human Science, vol. 1, pp. 39–43, Nagoya, Japan, October 1995.
M. S. Kıran, E. Özceylan, M. Gündüz, and T. Paksoy, “A novel hybrid approach based on particle swarm optimization and ant colony algorithm to forecast energy demand of Turkey,” Energy Conversion and Management, vol. 53, no. 1, pp. 75–83, 2012.
B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152, ACM, Pittsburgh, PA, USA, July 1992.
A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.
C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 1995.
F. Diebold, Elements of Forecasting, Cengage Learning, Boston, MA, USA, 2006.