Research Article | Open Access
Usage of Modified Holt-Winters Method in the Anomaly Detection of Network Traffic: Case Studies
The traditional Holt-Winters method is used, among others, in behavioural analysis of network traffic for development of adaptive models for various types of traffic in sample computer networks. This paper is devoted to the application of extended versions of these models for development of predicted templates and intruder detection.
1. Intruder Detection Systems
Intruder Detection Systems (IDSs) are software or hardware solutions aimed at detection of intrusion attempts to a protected network or a host. This is done by monitoring network traffic, usage of the resources of a protected computer system or by the analysis of system logs in order to detect suspicious actions and then take appropriate actions, which in the majority of cases is the generation of an alert informing about the detected danger. In the literature, the following are usually distinguished: Intruder Detection Systems, Active Response Systems, and Intruder Protection Systems (IPSs).
The next generation of security devices is the so-called Unified Threat Management (UTMs), which integrate, apart from the traditional IPS, also mechanisms such as Gateway Antivirus, Gateway Antispam, Content Filtering, Parental Control, Load Balancing, Bandwidth Management, and On-Appliance reporting, while obviously not every UTM system must have all of the above mechanisms implemented.
Another type of specialized security solutions, which can be implemented in UTM systems or constitute standalone solutions, is Information Leak Prevention systems, also known as Data Loss Prevention, Data Leak Prevention (DLP), or Information Loss Prevention (ILP),
Book [1, page 179] presents a listing of Intruder Detection Systems and Intruder Protection Systems, which includes more than 60 systems. Issues relating to IDS are also presented in many other research works (see e.g., [2–4]).
Anomaly detection is one of the three groups of methods, including misuse detection systems and integrity verification, used in Intruder Detection Systems.
Misuse detection is the detection of specific behaviours which confirm that an attack occurred, whereas anomaly detection involves predictive pattern of behaviours, deviations from which instances of an attack on a protected system are considered. Misuse detection has, in the majority of cases, deterministic character (the rules matching the observed phenomena or action is found or not), and it is easier to algorithmize, whereas anomaly detection necessarily refers to uncertain observations and has to use statistical methods (statistical methods have been used in IDS systems since 1987, and the first IDS in which they were implemented was the “Haystack” project conducted in Los Alamos National Laboratory (see e.g., [5, page 432]).
Paper mentioned in  describes the application of the traditional Holt-Winters method in behavioral analysis of network traffic for development of adaptive models for various types of traffic in four sample computer networks. The next obvious step, after evaluation of the model, is the development of and predicted pattern and alert generation algorithm (see e.g., [5, 7, 8]).
2. Holt-Winters Model: Brutlag’s Anomaly Detection Algorithm
The Holt-Winters model, called also the triple exponential smoothing model, is a well-known adaptive model used to modeling time series characterized by trend and seasonality (The Holt model was formulated in 1957 and the Winters model in 1960. See [9, 10], a comprehensive review of the literature about this and other models based on exponential smoothing is given in ). In its additive version, it presents the smoothed variant of the time series as the sum of three constituents where is the value estimated by the model of the variable in moment , is the length of the seasonal periodicity, is the constituent smoothing out the level of the time series, represents the increase of the time series resulting from the trend, is the seasonal component of the time series, , , and are smoothing parameters, estimated for the particular time series, while is the real value of the variable in moment , and the parameters , , and belong to interval.
Estimation of model parameters is iterative, usually though minimization of arbitrarily selected measures of error (e.g., the Mean Squared Error of expired estimations or the sum of absolute values of the residuals of the model see e.g., [12, page 187], [13, page 226], [14, page 77], and [6, 15], [16, page 223]).
Holt-Winters method was used to detect network traffic anomalies as described in . In the paper concept of “confidence bands” was introduced. As described in the paper, confidence bands measure deviation for each time point in the seasonal cycle, and this mechanism bases on expected seasonal variability.
The estimated deviation of the real value of the dependent variable is where is the estimated deviation of the real value of the dependent variable in moment from the estimated value , where the value of parameter is the estimated value in the model described above in (4). The event when the real value of the dependent variable differs from the estimated value by more than multiplied by the scaling factor is considered an anomaly (an alert is triggered in the IDS system). In  the extension of the RRDtool is presented, covering real-time determination and marking of values and on the chart and generating information on occurring anomalies. The author assumed an arbitrary method of determining the initial values of parameters , , and as well as the iterative method of adapting only parameter (see: ), which, from a statistical point of view, may provoke doubts, as it leads to development of suboptimal models from the perspective of minimization of the value of any measure of error. Additionally, in the Brutlag method, the calculated value of the parameter for the purposes of determining the value above or below which anomalies will be reported is multiplied by intuitively selected scaling factor of value between 2 and 3, which makes the model even more arbitrary (see e.g., ).
Thirdly, an important feature of the Holt-Winters model is the assumption on single seasonality (periodicity) of the given series, while in the case of network traffic one could expect double seasonality: daily and weekly. Anyone intending to use the Holt-Winters model to develop an anomaly detection system needs to select which periodicity should be used in the model.
3. Adaptative Models with Double and Triple Seasonality (Taylor Models)
In,  a suggestion is made to extend the Holt-Winters method to cover series with double, while in  with triple seasonality. In  the Taylor model with double seasonality was used to modelling internet traffic. Obviously it is theoretically possible to develop analogous models for time series with multiple periodicity; however, issues are raised in the literature (see ) concerning the unstable behaviour of such models, as well as the doubtful impact of third and further seasonalities on the calculated value of the predicted variable. Similar reservations also apply to double-seasonal Taylor models, in which the duration of the first period is considerably longer than that of the second one.
Double-seasonal Holt-Winters-Taylor model (referred to as HWT2 in subsequent sections) is determined by the following equations: where is the length of the seasonal 1 (day) periodicity, is the length of the seasonal 2 (week) periodicity, is the constituent smoothing out the level of the series, corresponds to the increase of the series resulting from the trend, is a seasonal component of the series for seasonality 1 (day), and is a seasonal component of the series for seasonality 2 (week).
The initial values of components were arbitrarily set as
4. Application of Brutlag’s Anomaly Detection Algorithm in the HWT2 Model
In order to identify indications of anomalies in the modelled system, an analogous solution to the one presented in  can be used. In view of the double seasonality in the Taylor model, one might imagine two types of scatter permitted for the value of the predicted variable—one based on the parameter , and the other on The initial values of components were arbitrarily set as
One needs to remember that the parameters of the exponential smoothing models may be interpreted as a measure of the impact of the last measurement (parameters , , , and ) or earlier measurements (values , , , and ) on predicted values. Contrary to descriptive models, where the estimated values of parameters given the appropriate dependent variable has an intuitive meaning (in the case of single-equation additive model, the impact of the explanatory variable on the value of the dependent variable), and the criterion of minimizing the adopted measure of adjustment is decisive, the parameters of adaptive models of time series with exponential smoothing may be interpreted as a measure of smoothing—the greater the values of and , the greater the impact of values of the last measurements (i.e., measured correspondingly one day and one week earlier), the lower the values of the parameters, the better the model “remembers” the previous values (whose impact is weighted with and coefficients). One might, at least to a certain extent, that is if does not have too great an influence on the adopted measure of adjustment, decide to arbitrarily change the values of smoothing parameters, especially if the given series displays periodicity.
The existence of two types of permitted scatter results in the necessity of distinguishing between two types of alerts: the first one related to exceeding the thresholds determined by the parameters of daily seasonality and the second one—of weekly seasonality. As the thresholds may intertwine it is necessary to distinguish in the alerts (see Figure 1) all three possible events (“daily” threshold exceeded, “weekly” threshold exceeded, both thresholds exceeded).
5. Results and Conclusions
For the networks described in , the HWT2 model analysis was carried out for which the parameters were estimated through minimization of the expression where where is the length of the data series.
We decided to use MAE rather than mean squared error (MSE)-based measure because there were a lot of so-called outliers noted in the analysed samples and MSE-based measure that can be oversensitive in those cases (see, e.g., ).
In the referenced article network traffic for aggregated series was modelled (hourly data) and in the case of smaller-scale research series resulting from measurements of traffic every 10 minutes were used.
The obtained results and the number of alerts generated on exceeding the data threshold (“gamma” alert and “delta” alert) is presented in Table 1. For comparison, the table also contains the number of alerts obtained in modelling traffic using the classic Holt-Winters model (analogous to , this is with an iterative estimation of the value of parameters with minimization of MAE/M error, however, for 10-minute interval measurement, which is a departure from the referenced article).
|Source: own research.|
As compared to the traditional Holt-Winters model, the magnitude of error is virtually unchanged (the table presents results with accuracy down to one per cent. In reality, there were differences between models in adjustment measures, at no more than one thousandth percentage point, which in practice is insignificant) whereas in all cases the number of alerts generated for the same scaling factor has lowered. The potential application of the HWT2 model with the so-determined two types of thresholds may prove useful for reducing the number of false positives.
The application of the parameters and as weights determining the permissible scale makes the model have a relatively “short memory.” Therefore, if the present value of the periodical constituent is closely related to its previous value, the extent of permitted thresholds will be relatively high. In the case of models with “better memory,” they will be more sensitive to unitary changes in traffic. In the analyzed network traffic series, together with the high value of parameter , the usually estimated value of parameter was relatively low ( or less), the consequence of which was a relatively high number of alerts, the vast majority of which, as it would seem, would be deemed false positives. In the examined series, noted was greater sensitivity of the model to changes in the level parameter than the seasonality impact ( and ), that had relatively low values. One might thus consider, instead of increasing the scaling factor, to arbitrarily increase the value of these parameters to the maximum values which do not cause significant deterioration of the model-matching score (e.g., below 1 percentage point). As it would seem, this is an approach which is better substantiated from a statistical point of view than manipulating the value of the scaling factor (an interesting challenge of the method adopted in  for determination of the values not provoking alerts contained ).
6. Further Actions
Among the methods used in anomaly detection, the following may be mentioned:(i)entropy measurement—see, for example, [23, 24],(ii)the so-called correlation of packets—see [25, 26]—where algorithms of simulated annealing were used,(iii)principal components analysis—see, for example, [27, 28],(iv)support vector machines—see, for example, ,(v)adaptive threshold algorithm and the cumulative sum algorithm—see, for example, [29–31],(vi)data clustering—see also ,(vii)k-nearest neighbors method—see , decision trees—see, for example, ,(viii)artificial neural networks (ANNs)—see, for example, .(ix)distributed ANN—see, for example, [35, 36],(x)decision rule induction—see, for example, ,(xi)immune algorithms—see, for example ,(xii)genetic algorithms—see, for example, ,(xiii)fuzzy logic—see, for example [40, 41],(xiv)zero-one models—see  and so forth.
As presented in  the characteristics of various networks or even the various types of network traffic in the same network are very different. Therefore, even if one of the widely used models of traffic or methods for their creation finds even the slightest application in test trials, its research work is practically useful.
Presently, our works are carried out on implementing both models (traditional Winters and HWT2) in the Anomaly Detection preprocessor, referred to in articl .
- A. Fadia and M. Zacharia, “Network intrusion alert. An ethical hacking guide to intrusion detection,” in Proceedings of the Thomson Source Technology, Boston, Mass, USA, 2008.
- S. Sooyeon, K. Taekyoung, J. Gil-Yong, P. Youngman, and H. Rhy, “An experimental study of hierarchical intrusion detection for wireless industrial sensor networks,” IEEE Transactions on Industrial Informatics, vol. 6, no. 4, pp. 744–757, 2010.
- E. A. Patkowski, “Mechanizmy wykrywania anomalii jako element bezpieczeństwa,” Biuletyn Instytutu Automatyki i Robotyki nr 26/2009, Wydawnictwo Wojskowej Akademii Technicznej, Warsaw, Poland, 2009.
- F. Palmieri and U. Fiore, “Network anomaly detection through nonlinear analysis,” Computers and Security, vol. 29, no. 7, pp. 737–755, 2010.
- J. Pieprzyk, T. Hardjono, and J. Seberry, Teoria Bezpieczeństwa Systemów Komputerowych, Helion, 2005.
- M. Szmit and A. Szmit, “Use of holt-winters method in the analysis of network traffic: case study,” Communications in Computer and Information Science, vol. 160, pp. 224–231, 2011.
- L. Fillatre, D. Marakov, and S. Vaton, “Forecasting seasonal traffic flows,” in Proceedings of the Workshop on QoS and Traffic Control, Paris, France, December 2005.
- I. Klevecka, “Forecasting network traffic: a comparison of neural networks and linear models,” in Proceedings of the 9th International Conference “Reliability and Statistics in Transportation and Communication” (RelStat '09), Riga, Latvia, October 2009.
- P. Goodwin, “The holt-winters approach to exponential smoothing: 50 years old and going strong,” in Proceedings of the FORESIGHT Fall, pp. 30–34, 2010, http://www.forecasters.org/pdfs/foresight/free/Issue19_goodwin.pdf.
- B. Guzik, D. Appenzeller, and W. Jurek, Prognozowanie i Symulacje. Wybrane Zagadnienia, Wydawnictwo AE w Poznaniu, Poznań, Poland, 2004.
- E. S. Gardner, “Exponential smoothing: the state of the art-Part II,” International Journal of Forecasting, vol. 22, no. 4, pp. 637–666, 2006.
- J. Gajda, Prognozowanie i Symulacja a Decyzje Gospodarcze, C. H. Beck, Warsaw, Poland, 2001.
- A. Zeliaś, B. Pawełek, S. Wanat et al., Prognozowanie Ekonomiczne. Teoria, Przykłady, Zadania, Wydawnictwo Naukowe PWN, Warszawa, Poland, 2004.
- M. Cieślak, Ed., Prognozowanie Gospodarcze, Wydawnictwo AE Wrocław, 1998.
- P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting, Springer, New York, NY, USA, 2nd edition, 2002.
- R. J. Hyndman, A. B. Koehler, J. K. Ord, and R. D. Snyder, Forecasting with Exponential Smoothing: The State Space Approach, Springer, Berlin, Germany, 2008.
- J. D. Brutlag, “Aberrant behavior detection in time series for network monitoring,” in Proceedings of the 14th System Administration Conference, pp. 139–146, New Orleans, Fla, USA, 2000.
- E. Miller, “Holt-Winters Forecasting Applied to Poisson Processes in Real-Time,” August, 2010, http://www.scribd.com/doc/35521051/Miller-Automated-Error-Detection-in-Web-Production-Environment.
- J. W. Taylor, “Short-term electricity demand forecasting using double seasonal exponential smoothing,” Journal of Operational Research Society, vol. 54, pp. 799–805, 2003.
- J. W. Taylor, “Triple seasonal methods for short-term electricity demand forecasting,” European Journal of Operational Research, vol. 204, pp. 139–152, 2010.
- S. Gelper, R. Fried, and C. Croux, “Robust forecasting with exponential and holt-winters smoothing,” Journal of Forecasting, vol. 29, no. 3, pp. 285–300, 2010.
- R. Lawton, “On the Stability of the Double Seasonal Holt-Winters Method,” http://forecasters.org/submissions09/LawtonRichardISF2009.pdf.
- G. Nychis, V. Sekar, D. G. Andersen, H. Kim, and H. Zhang, “An empirical evaluation of entropy-based traffic anomaly detection,” in Proceedings of the Association for Computing Machinery (ACM '08), 2008.
- Y. Gu, A. McCallum, and D. Towsley, “Detecting anomalies in network traffic using maximum entropy estimation,” in Proceedings of the IMC Conference, http://conferences.sigcomm.org/imc/2005/papers/imc05efiles/gu/gu.pdf.
- SPADE 092200, http://rpmfind.net/linux/RPM/mandriva/9.2/i586/Mandrake/RPMS/snort-2.0.1-3mdk.i586.html.
- T. J. Kruk and J. Wrzesień, “Korelacja w wykrywaniu anomalii,” in Proceedings of the Materiały Konferencji CERT Secure, Warsaw, Poland, 2003.
- H. Ringberg, A. Soule, J. Rexford, and C. Diot, “Sensitivity of PCA for Traffic Anomaly Detection,” San Diego, Calif, USA, 2007, http://www.haakonringberg.com/work/papers/pca_tuning.pdf.
- A. Lakhina, M. Cronvella, and C. Diot, “Diagnosis network-wide traffic anomalies,” in Proceedings of the ACC SIGCOMM, February 2004, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.7011&rep=rep1&type=pdf.
- V. A. Siris and F. Papaglou, “Application of anomaly detection algorithms for detecting syn floodinfg attacks,” in Proceedings of the IEEE Global Telecommunications Conference, vol. 4, pp. 2050–2054, 2004.
- R. Mbabazi, Victim-based defense against ip packet flooding denial of service attacks, M.S. thesis, Makerere University, 2009.
- R. Blazek, H. Kim, B. Rozovskii, and A. Tartakovsky, “A novel approach to detection of “Denial-of-Service” attacks via adaptive sequential and batch-sequential change-point detection methods,” in Proceedings of the IEEE Systems, Man, and Cybernetics Information Assurance Workshop (West Point '01), June 2001.
- O. Siriporn and S. Benjawan, “Anomaly detection and characterization to classify traffic anomalies case study: TOT public company limited network,” Proceedings of World Academy of Science, Engineering and Technology, vol. 37, pp. 706–714, 2009.
- A. Sharma, A. K. Pujari, and K. K. Paliwal, “Intrusion detection using text processing techniques with a kernel based similarity measure,” Computers and Security, vol. 26, no. 7-8, pp. 488–495, 2007.
- S. O. Al-Mamory and H. Zhang, “New data mining technique to enhance IDS alarms quality,” Journal in Computer Virology, vol. 6, no. 1, pp. 43–55, 2010.
- D. Tian, Y. Liu, and Y. Xiang, “Large-scale network intrusion detection based on distributed learning algorithm,” International Journal of Information Security, vol. 8, no. 1, pp. 25–35, 2009.
- Snort+AI, http://snort-ai.sourceforge.net/.
- R. Cichocki, “Algorytmy indukcji reguł decyzyjnych w Systemach Wykrywania Intruzów,” in Proceedings of the XII Konferencja Sieci Komputerowe, Zakopane, Poland, 2005.
- D. Dasgupta, “Immunity-based intrusion detection system: a general framework,” in Proceedings of the 22nd National Information Systems Security Conference (NISSC '99), 1999.
- W. Li, “Using genetic algorithm for network intrusion detection,” in Proceedings of the United States Department of Energy Cyber Security Group 2004 Training Conference, Kansas City, Mo, USA, 2004.
- J. Luo, S. Bridges, and R. Vaughn, “Fuzzy frequent episodes for real time intrusion detection,” International Journal of Intelligent Systems, vol. 15, no. 8, pp. 687–704, 2000.
- S. Bridges and R. Vaughn, “Fuzzy data mining and genetic algorithms applied to intrusion detection,” in Proceedings of the National Information Systems Security Conference (NISSC '00), Baltimore, Md, USA, October 2000.
- M. Szmit, Využití nula-jedničkových modelů pro behaviorální analýzu sít’ového provozu, [w:] Internet, competitiveness and organizational security, Tomas Bata University Zlín, pp. 266–299, 2011.
Copyright © 2012 Maciej Szmit and Anna Szmit. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.