Abstract

The traditional Holt-Winters method is used, among other applications, in behavioural analysis of network traffic to develop adaptive models for various types of traffic in sample computer networks. This paper is devoted to the application of extended versions of these models to the development of predicted templates and to intruder detection.

1. Intruder Detection Systems

Intruder Detection Systems (IDSs) are software or hardware solutions aimed at detection of intrusion attempts to a protected network or a host. This is done by monitoring network traffic, usage of the resources of a protected computer system or by the analysis of system logs in order to detect suspicious actions and then take appropriate actions, which in the majority of cases is the generation of an alert informing about the detected danger. In the literature, the following are usually distinguished: Intruder Detection Systems, Active Response Systems, and Intruder Protection Systems (IPSs).

The next generation of security devices is the so-called Unified Threat Management (UTM) systems, which integrate, apart from the traditional IPS, mechanisms such as Gateway Antivirus, Gateway Antispam, Content Filtering, Parental Control, Load Balancing, Bandwidth Management, and On-Appliance reporting, although not every UTM system must have all of the above mechanisms implemented.

Another type of specialized security solution, which can be implemented in UTM systems or constitute a standalone solution, is the Information Leak Prevention system, also known as Data Loss Prevention, Data Leak Prevention (DLP), or Information Loss Prevention (ILP).

Book [1, page 179] presents a listing of Intruder Detection Systems and Intruder Protection Systems, which includes more than 60 systems. Issues relating to IDS are also presented in many other research works (see e.g., [24]).

Anomaly detection is one of the three groups of methods used in Intruder Detection Systems, the other two being misuse detection and integrity verification.

Misuse detection is the detection of specific behaviours which confirm that an attack occurred, whereas anomaly detection involves building a predicted pattern of behaviour, deviations from which are considered instances of an attack on the protected system. Misuse detection has, in the majority of cases, a deterministic character (a rule matching the observed phenomenon or action is either found or not) and is easier to algorithmize, whereas anomaly detection necessarily refers to uncertain observations and has to use statistical methods. Statistical methods have been used in IDSs since 1987, and the first IDS in which they were implemented was the “Haystack” project conducted in Los Alamos National Laboratory (see e.g., [5, page 432]).

Paper [6] describes the application of the traditional Holt-Winters method in behavioural analysis of network traffic for the development of adaptive models for various types of traffic in four sample computer networks. The next obvious step, after evaluation of the model, is the development of a predicted pattern and an alert generation algorithm (see e.g., [5, 7, 8]).

2. Holt-Winters Model: Brutlag’s Anomaly Detection Algorithm

The Holt-Winters model, also called the triple exponential smoothing model, is a well-known adaptive model used for modelling time series characterized by trend and seasonality. (The Holt model was formulated in 1957 and the Winters model in 1960; see [9, 10]. A comprehensive review of the literature about this and other models based on exponential smoothing is given in [11].) In its additive version, it presents the smoothed variant of the time series $y_t$ as the sum of three constituents:

$$\hat{y}_t = L_t + T_t + S_{t-r}, \tag{1}$$

where $\hat{y}_t$ is the value of the variable estimated by the model at moment $t$, $r$ is the length of the seasonal period,

$$L_t = \alpha\,(y_t - S_{t-r}) + (1-\alpha)\,(L_{t-1} + T_{t-1}) \tag{2}$$

is the constituent smoothing out the level of the time series,

$$T_t = \beta\,(L_t - L_{t-1}) + (1-\beta)\,T_{t-1} \tag{3}$$

represents the increase of the time series resulting from the trend, and

$$S_t = \gamma\,(y_t - L_t) + (1-\gamma)\,S_{t-r} \tag{4}$$

is the seasonal component of the time series. Here $y_t$ is the real value of the variable at moment $t$, and $\alpha$, $\beta$, and $\gamma$ are smoothing parameters from the interval $[0, 1]$, estimated for the particular time series.
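The recursions (1)-(4) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `holt_winters_additive`, the zero-based indexing, and the initial values (level equal to the first observation, zero trend, seasonal components held at zero during the first cycle) are not taken from the text for this model and are chosen here by analogy with the initial values used later for the double-seasonal model.

```python
def holt_winters_additive(y, r, alpha, beta, gamma):
    """Additive Holt-Winters smoothing following equations (1)-(4).

    Assumptions (not specified in the text for this model):
    level starts at y[0], trend starts at 0, and the seasonal
    components of the first cycle are held at 0.
    Returns (fitted, level, trend, season), each of length len(y);
    fitted[0] is None because no smoothed value is formed there.
    """
    n = len(y)
    level = [0.0] * n
    trend = [0.0] * n
    season = [0.0] * n
    fitted = [None] * n
    level[0] = float(y[0])
    for t in range(1, n):
        # S_{t-r}; treated as zero while no full cycle is available
        s_prev = season[t - r] if t >= r else 0.0
        # equation (2): level
        level[t] = alpha * (y[t] - s_prev) + (1 - alpha) * (level[t - 1] + trend[t - 1])
        # equation (3): trend
        trend[t] = beta * (level[t] - level[t - 1]) + (1 - beta) * trend[t - 1]
        # equation (4): seasonal component, updated once a full cycle has passed
        if t >= r:
            season[t] = gamma * (y[t] - level[t]) + (1 - gamma) * s_prev
        # equation (1) as written: level and trend at t plus season one cycle back
        fitted[t] = level[t] + trend[t] + s_prev
    return fitted, level, trend, season
```

For a constant series the level stays at the observed value and both the trend and the seasonal components remain zero, which is a convenient sanity check for an implementation of this kind.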

Estimation of model parameters is iterative, usually through minimization of an arbitrarily selected measure of error (e.g., the Mean Squared Error of ex post forecasts or the sum of absolute values of the residuals of the model; see e.g., [12, page 187], [13, page 226], [14, page 77], [6, 15], and [16, page 223]).

The Holt-Winters method was used to detect network traffic anomalies in [17]. That paper introduced the concept of “confidence bands”: they measure deviation for each time point in the seasonal cycle, and the mechanism is based on expected seasonal variability.
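One simple way to picture such an estimation is an exhaustive grid search over the unit interval. The sketch below is a stand-in for whichever iterative procedure is actually used, not a description of it; the names `grid_search` and `error_fn` are hypothetical, and `error_fn` is assumed to fit the model for the given smoothing parameters and return the chosen error measure.

```python
from itertools import product


def grid_search(error_fn, n_params, steps=11):
    """Try every combination of parameter values on an even grid in [0, 1].

    error_fn : hypothetical callable taking n_params floats and returning
               the error measure to be minimized (e.g. MSE or MAE)
    steps    : number of grid points per parameter (11 gives 0.0, 0.1, ..., 1.0)
    Returns (best_params, best_error).
    """
    grid = [i / (steps - 1) for i in range(steps)]
    best_params, best_err = None, float("inf")
    for params in product(grid, repeat=n_params):
        err = error_fn(*params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err
```

The cost grows as `steps ** n_params`, so for three or four smoothing parameters a coarse grid followed by local refinement is a common compromise.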

The estimated deviation of the real value of the dependent variable is

$$d_t = \gamma\,|y_t - \hat{y}_t| + (1-\gamma)\,d_{t-r}, \tag{5}$$

where $d_t$ is the estimated deviation of the real value of the dependent variable $y$ at moment $t$ from the estimated value $\hat{y}_t$, and $\gamma$ is the value of the parameter estimated in the model described above in (4). The event in which the real value $y_t$ differs from the estimated value $\hat{y}_t$ by more than $d_t$ multiplied by the scaling factor $m$ is considered an anomaly (an alert is triggered in the IDS). In [17] an extension of RRDtool is presented, covering real-time determination and marking of the values $\hat{y}_t + m\,d_t$ and $\hat{y}_t - m\,d_t$ on the chart and generation of information on occurring anomalies. The author assumed an arbitrary method of determining the initial values of parameters $\alpha$, $\beta$, and $\gamma$ as well as an iterative method of adapting only parameter $\alpha$ (see [17]), which, from a statistical point of view, may raise doubts, as it leads to models that are suboptimal with respect to minimization of any measure of error. Additionally, in the Brutlag method, the calculated value of $d_t$, used to determine the values above or below which anomalies will be reported, is multiplied by an intuitively selected scaling factor $m$ with a value between 2 and 3, which makes the model even more arbitrary (see e.g., [18]).

Thirdly, an important feature of the Holt-Winters model is the assumption of a single seasonality (periodicity) in the given series, whereas in the case of network traffic one could expect double seasonality: daily and weekly. Anyone intending to use the Holt-Winters model to develop an anomaly detection system needs to select which periodicity should be used in the model.
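The deviation recursion (5) and the alert rule described above can be sketched as follows. This is a minimal illustration, not the RRDtool implementation from [17]: the name `brutlag_alerts` is hypothetical, and the handling of the first seasonal cycle (where no earlier deviation exists, the deviation is simply set to the current absolute error, so no alert can be raised there) is an assumption.

```python
def brutlag_alerts(y, y_hat, r, gamma, m=3.0):
    """Deviation d_t from equation (5) and the resulting alert flags.

    y, y_hat : observed and estimated values (same length)
    r        : length of the seasonal period
    gamma    : seasonal smoothing parameter, reused as the deviation weight
    m        : scaling factor (the text mentions values between 2 and 3)
    Assumption: during the first cycle d_t is set to |y_t - y_hat_t|.
    """
    n = len(y)
    d = [0.0] * n
    alerts = [False] * n
    for t in range(n):
        err = abs(y[t] - y_hat[t])
        if t < r:
            d[t] = err  # assumed start-up value, no alert in the first cycle
        else:
            d[t] = gamma * err + (1 - gamma) * d[t - r]
            # anomaly: the observation leaves the band y_hat +/- m * d_t
            alerts[t] = err > m * d[t]
    return d, alerts
```

Because $d_t$ already contains the current error weighted by $\gamma$, the rule as stated can only fire when $m\gamma < 1$; with the small values of $\gamma$ reported later in the paper this condition holds comfortably.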

3. Adaptive Models with Double and Triple Seasonality (Taylor Models)

In [19] a suggestion is made to extend the Holt-Winters method to cover series with double seasonality, and in [20] to cover series with triple seasonality. In [21] the Taylor model with double seasonality was used to model internet traffic. It is theoretically possible to develop analogous models for time series with multiple periodicity; however, issues are raised in the literature (see [22]) concerning the unstable behaviour of such models, as well as the doubtful impact of the third and further seasonalities on the calculated value of the predicted variable. Similar reservations also apply to double-seasonal Taylor models, in which the duration of the first period is considerably longer than that of the second one.

The double-seasonal Holt-Winters-Taylor model (referred to as HWT2 in subsequent sections) is determined by the following equations:

$$\hat{y}_t = L_{t-1} + T_{t-1} + D_{t-r_1} + W_{t-r_2}, \tag{6}$$

where $r_1$ is the length of the seasonal 1 (day) periodicity, $r_2$ is the length of the seasonal 2 (week) periodicity,

$$L_t = \alpha\,(y_t - D_{t-r_1} - W_{t-r_2}) + (1-\alpha)\,(L_{t-1} + T_{t-1}) \tag{7}$$

is the constituent smoothing out the level of the series,

$$T_t = \beta\,(L_t - L_{t-1}) + (1-\beta)\,T_{t-1} \tag{8}$$

corresponds to the increase of the series resulting from the trend,

$$D_t = \gamma\,(y_t - L_t - W_{t-r_2}) + (1-\gamma)\,D_{t-r_1} \tag{9}$$

is the seasonal component of the series for seasonality 1 (day), and

$$W_t = \delta\,(y_t - L_t - D_{t-r_1}) + (1-\delta)\,W_{t-r_2} \tag{10}$$

is the seasonal component of the series for seasonality 2 (week).

The initial values of the components were arbitrarily set as

$$L_1 = y_1,\quad T_1 = 0,\quad D_1 = D_2 = \cdots = D_{r_1} = 0,\quad W_1 = W_2 = \cdots = W_{r_2} = 0. \tag{11}$$
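As an illustration, equations (6)-(11) can be written as a short Python routine. This is a minimal sketch rather than the authors' implementation: the function name `hwt2`, the zero-based indexing, and the treatment of the start-up phase (the daily and weekly components are held at their zero initial values until a full day or week of data is available, while level and trend are updated from the second observation on) are assumptions.

```python
def hwt2(y, r1, r2, alpha, beta, gamma, delta):
    """Double-seasonal additive smoothing following equations (6)-(10).

    r1, r2 : lengths of the daily and weekly periods, in observations
    Initial values follow (11): L = y[0], T = 0, first r1 daily and
    first r2 weekly components equal to 0.
    Returns (y_hat, L, T, D, W); y_hat[0] is None (nothing to predict from).
    """
    n = len(y)
    L = [0.0] * n
    T = [0.0] * n
    D = [0.0] * n
    W = [0.0] * n
    y_hat = [None] * n
    L[0] = float(y[0])
    for t in range(1, n):
        d_prev = D[t - r1] if t >= r1 else 0.0  # D_{t-r1}
        w_prev = W[t - r2] if t >= r2 else 0.0  # W_{t-r2}
        y_hat[t] = L[t - 1] + T[t - 1] + d_prev + w_prev                      # (6)
        L[t] = alpha * (y[t] - d_prev - w_prev) + (1 - alpha) * (L[t - 1] + T[t - 1])  # (7)
        T[t] = beta * (L[t] - L[t - 1]) + (1 - beta) * T[t - 1]               # (8)
        if t >= r1:
            D[t] = gamma * (y[t] - L[t] - w_prev) + (1 - gamma) * d_prev      # (9)
        if t >= r2:
            W[t] = delta * (y[t] - L[t] - d_prev) + (1 - delta) * w_prev      # (10)
    return y_hat, L, T, D, W
```

For traffic measured every 10 minutes, the daily and weekly periods would correspond to `r1 = 144` and `r2 = 1008` observations.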

4. Application of Brutlag’s Anomaly Detection Algorithm in the HWT2 Model

In order to identify indications of anomalies in the modelled system, a solution analogous to the one presented in [17] can be used. In view of the double seasonality in the Taylor model, one might imagine two types of scatter permitted for the value of the predicted variable, one based on the parameter $\gamma$ and the other on $\delta$:

$$d_t = \gamma\,|y_t - \hat{y}_t| + (1-\gamma)\,d_{t-r_1}, \qquad w_t = \delta\,|y_t - \hat{y}_t| + (1-\delta)\,w_{t-r_2}. \tag{12}$$

The initial values of these components were arbitrarily set as

$$w_{r_2+1} = d_{r_2+1} = |y_{r_2+1} - \hat{y}_{r_2+1}|. \tag{13}$$

One needs to remember that the parameters of exponential smoothing models may be interpreted as a measure of the impact of the last measurement (parameters $\alpha$, $\beta$, $\gamma$, and $\delta$) or of earlier measurements (values $1-\alpha$, $1-\beta$, $1-\gamma$, and $1-\delta$) on predicted values. In descriptive models, the estimated value of each parameter has an intuitive meaning (in a single-equation additive model, the impact of the explanatory variable on the value of the dependent variable), and the criterion of minimizing the adopted measure of adjustment is decisive. By contrast, the parameters of adaptive time series models with exponential smoothing may be interpreted as a measure of smoothing: the greater the values of $\gamma$ and $\delta$, the greater the impact of the most recent measurements (i.e., those taken one day and one week earlier, respectively); the lower the values of the parameters, the better the model “remembers” earlier values (whose impact is weighted with the coefficients $1-\gamma$ and $1-\delta$). One might therefore, at least to a certain extent (that is, if it does not have too great an influence on the adopted measure of adjustment), decide to change the values of the smoothing parameters arbitrarily, especially if the given series displays periodicity.

The existence of two types of permitted scatter makes it necessary to distinguish between two types of alerts: the first related to exceeding the thresholds determined by the parameters of daily seasonality, and the second related to those of weekly seasonality. As the thresholds may intertwine, it is necessary to distinguish in the alerts (see Figure 1) all three possible events (“daily” threshold exceeded, “weekly” threshold exceeded, both thresholds exceeded).
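The two deviation series can be computed side by side, as in the sketch below. Several points are assumptions rather than part of the text: the name `hwt2_deviations`, the zero-based indexing, the reading that $w_t$ is smoothed against its own value one week earlier, and the start-up rule that whenever the lagged deviation is not yet defined, the current absolute error is used in its place (which reproduces the initial values (13) at observation $r_2+1$).

```python
def hwt2_deviations(y, y_hat, r1, r2, gamma, delta):
    """Daily (d) and weekly (w) deviation estimates in the spirit of (12)-(13).

    Computation starts at zero-based index r2, i.e. observation r2 + 1.
    Assumption: a lagged deviation that is not yet defined is replaced by
    the current absolute error, so d and w start out equal to that error.
    Entries before the start index are None.
    """
    n = len(y)
    start = r2
    d = [None] * n
    w = [None] * n
    for t in range(start, n):
        err = abs(y[t] - y_hat[t])
        d_lag = d[t - r1] if t - r1 >= start else err
        w_lag = w[t - r2] if t - r2 >= start else err
        d[t] = gamma * err + (1 - gamma) * d_lag   # daily scatter
        w[t] = delta * err + (1 - delta) * w_lag   # weekly scatter
    return d, w
```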
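The distinction between the three events can be expressed as a small helper. The function name `classify_alert` and the string labels are hypothetical; the comparison itself follows the rule that an observation is anomalous when it differs from the estimate by more than the deviation multiplied by the scaling factor.

```python
def classify_alert(y_t, y_hat_t, d_t, w_t, m=3.0):
    """Return 'none', 'daily', 'weekly' or 'both' for a single observation.

    d_t, w_t : daily and weekly deviation estimates at the same moment
    m        : scaling factor applied to both deviations
    """
    err = abs(y_t - y_hat_t)
    daily = err > m * d_t    # "gamma" threshold exceeded
    weekly = err > m * w_t   # "delta" threshold exceeded
    if daily and weekly:
        return "both"
    if daily:
        return "daily"
    if weekly:
        return "weekly"
    return "none"
```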

5. Results and Conclusions

For the networks described in [6], an analysis using the HWT2 model was carried out, with the parameters estimated through minimization of the expression

$$\frac{\text{Mean\_Absolute\_Error}}{\text{Mean}}, \tag{14}$$

where

$$\text{Mean} = \frac{\sum_{t=1}^{n} y_t}{n}, \tag{15}$$

and $n$ is the length of the data series.

We decided to use an MAE-based rather than a mean squared error (MSE)-based measure because many so-called outliers were noted in the analysed samples, and an MSE-based measure can be oversensitive in such cases (see, e.g., [21]).

In the referenced article, network traffic was modelled for aggregated (hourly) series, whereas in this smaller-scale study, series of traffic measurements taken every 10 minutes were used.

The obtained results and the numbers of alerts generated on exceeding the respective threshold (“gamma” alert and “delta” alert) are presented in Table 1. For comparison, the table also contains the number of alerts obtained when modelling traffic using the classic Holt-Winters model (analogously to [6], that is, with iterative estimation of the parameter values by minimization of the MAE/M error, but for measurements at 10-minute intervals, which is a departure from the referenced article).

As compared to the traditional Holt-Winters model, the magnitude of error is virtually unchanged, whereas in all cases the number of alerts generated for the same scaling factor decreased. (The table presents results with an accuracy of one per cent; in reality, the adjustment measures of the models differed by no more than one thousandth of a percentage point, which in practice is insignificant.) The application of the HWT2 model with the two types of thresholds so determined may prove useful for reducing the number of false positives.

The application of the parameters $\gamma$ and $\delta$ as weights determining the permissible scatter makes the model have a relatively “short memory.” Therefore, if the present value of the periodical constituent is closely related to its previous value, the extent of permitted thresholds will be relatively high. Models with “better memory” will be more sensitive to unitary changes in traffic. In the analysed network traffic series, the estimated value of parameter $\alpha$ was high, while the estimated value of parameter $\gamma$ was usually relatively low ($10^{-2}$ or less); the consequence was a relatively high number of alerts, the vast majority of which, it would seem, would be deemed false positives. In the examined series, the model proved more sensitive to changes in the level parameter $\alpha$ than to changes in the seasonality parameters ($\gamma$ and $\delta$), which had relatively low values. One might thus consider, instead of increasing the scaling factor, arbitrarily increasing the values of these parameters to the maximum values which do not cause significant deterioration of the model-matching score (e.g., deterioration below 1 percentage point). This approach seems better substantiated from a statistical point of view than manipulating the value of the scaling factor (an interesting challenge to the method adopted in [17] for determining the values that do not provoke alerts is contained in [18]).
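The error measure (14)-(15) is straightforward to compute. In the sketch below the function name `mae_over_mean` is hypothetical, and one detail is an assumption: moments for which no estimate exists (marked `None`, e.g. the first observation) are skipped when averaging the absolute errors, while the mean in the denominator is taken over the whole series as in (15).

```python
def mae_over_mean(y, y_hat):
    """Mean absolute error divided by the mean of the series, as in (14)-(15).

    y     : observed values
    y_hat : estimated values; None marks moments without an estimate
            (assumption: such moments are left out of the MAE)
    """
    pairs = [(a, f) for a, f in zip(y, y_hat) if f is not None]
    mae = sum(abs(a - f) for a, f in pairs) / len(pairs)
    mean = sum(y) / len(y)
    return mae / mean
```

Dividing by the mean makes the measure dimensionless, so values obtained for networks with very different traffic volumes can be compared directly.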
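The idea of raising a seasonal parameter as far as the fit allows can be sketched as a simple scan. Everything named here is hypothetical: `largest_admissible_value`, and `error_fn`, which is assumed to refit the model with the given parameter value (the other parameters held fixed) and return the error measure as a fraction, so that `tolerance=0.01` corresponds to one percentage point.

```python
def largest_admissible_value(error_fn, optimum, tolerance=0.01, steps=100):
    """Largest grid value above `optimum` whose error stays within tolerance.

    error_fn  : hypothetical callable mapping a parameter value in [0, 1]
                to the model error (e.g. MAE divided by the mean)
    optimum   : the value found by minimizing the error
    tolerance : permitted deterioration of the error measure
    Returns `optimum` itself if no larger value qualifies.
    """
    base = error_fn(optimum)
    best = optimum
    for i in range(steps + 1):
        candidate = i / steps
        if candidate > optimum and error_fn(candidate) - base <= tolerance:
            best = candidate
    return best
```

The scan checks every grid point rather than stopping at the first failure, so it does not rely on the error growing monotonically with the parameter.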

6. Further Actions

Among the methods used in anomaly detection, the following may be mentioned:
(i) entropy measurement (see, for example, [23, 24]),
(ii) the so-called correlation of packets, where algorithms of simulated annealing were used (see [25, 26]),
(iii) principal components analysis (see, for example, [27, 28]),
(iv) support vector machines (see, for example, [4]),
(v) adaptive threshold algorithm and the cumulative sum algorithm (see, for example, [29–31]),
(vi) data clustering (see also [32]),
(vii) k-nearest neighbors method (see [33]) and decision trees (see, for example, [34]),
(viii) artificial neural networks (ANNs) (see, for example, [35]),
(ix) distributed ANN (see, for example, [35, 36]),
(x) decision rule induction (see, for example, [37]),
(xi) immune algorithms (see, for example, [38]),
(xii) genetic algorithms (see, for example, [39]),
(xiii) fuzzy logic (see, for example, [40, 41]),
(xiv) zero-one models (see [42]), and so forth.

As presented in [6], the characteristics of various networks, or even of various types of network traffic in the same network, are very different. Therefore, if any of the widely used traffic models, or methods for creating them, finds even the slightest application in test trials, research on it is of practical use.

Presently, work is being carried out on implementing both models (traditional Holt-Winters and HWT2) in the Anomaly Detection preprocessor referred to in article [6].