Abstract

Distributed denial-of-service (DDoS) flood attacks remain a major threat to the Internet. To ensure network usability and reliability, accurate detection of these attacks is critical. Based on Li's work on DDoS flood attack detection, we propose a DDoS detection method that monitors the Hurst variation of long-range dependent traffic. Specifically, we use an autoregressive system to estimate the Hurst parameter of normal traffic. If the actual Hurst parameter deviates significantly from the estimate, we assume that a DDoS attack is occurring. We also propose two methods to determine the change point of the Hurst parameter that indicates the occurrence of DDoS attacks. The detection rate associated with one method and the false alarm rate for the other method are derived. Test results on DARPA intrusion detection evaluation data show that the proposed approaches achieve better detection performance than some well-known self-similarity-based detection methods.

1. Introduction

DDoS flood attacks are among the most frequently occurring attacks and badly threaten the stability of the Internet. In a DDoS flood attack, an intruder undermines the availability of computer systems or services by exploiting the inherent weakness of the Internet system architecture and overwhelming the target with a huge amount of traffic launched through multiple zombies. The attack process is relatively simple, yet it is a very powerful technique for attacking Internet resources. Therefore, accurate detection of these attacks is critical to the Internet community.

As shown by Leland et al. [1], and supported by a number of later studies [2-7], measurements of local and wide-area network traffic, both wire-line and wireless, all demonstrate self-similarity and long-range dependence (LRD) at large time scales. The work in [8] points out that the self-similarity of Internet traffic is attributed to a mixture of the actions of many individual users and the hardware and software behaviors at their originating hosts, multiplexed through an interconnection network. In other words, this self-similarity exists regardless of the network type, topology, size, protocol, or the type of services the network is carrying. On the other hand, it is reported in [9-15] that when a DDoS attack happens, the self-similarity of network traffic changes significantly. Thus, DDoS attacks may be detected by monitoring the change of the Hurst parameter, the key parameter describing the self-similarity of a self-similar process.

Much work in the literature detects DDoS attacks by recognizing the pattern of self-similarity. In [16], Li deduced the statistical characteristics of the network traffic autocorrelation function under normal conditions and under DDoS attack, and gave the detection threshold based on the preselected detection rate and false alarm rate. In [11], Li quantitatively described the statistics of abnormal traffic and suggested that the Hurst parameter of network traffic under DDoS attack tends to be significantly smaller than that of normal traffic. Li also demonstrated in [11] that the average Hurst parameter of a fixed number of normal traffic pieces follows a Gaussian distribution at large time scales and that, when an attack occurs, this statistical property may in general change.

Based on Li's work, we propose a DDoS detection method that monitors the Hurst variation. Specifically, we use an autoregressive (AR) system to estimate the Hurst parameter of normal traffic. If the actual Hurst parameter deviates from the estimate by more than a threshold, we assume that a DDoS attack is occurring. We then propose two methods to determine the change point of the Hurst parameter, that is, the threshold of Hurst variation used to distinguish attack traffic from normal traffic. The detection rate associated with one method and the false alarm rate for the other method are also derived. The experimental results on Defense Advanced Research Projects Agency (DARPA) data sets indicate that the proposed detection methods are effective in detecting DDoS flood attacks and achieve better detection performance than some well-known self-similarity-based detection methods.

The rest of this paper is organized as follows. Section 2 briefly introduces the concept of self-similarity and the Hurst parameter estimation. Section 3 explains the proposed detection process based on the Hurst variation. Section 4 discusses the two methods for determining the change point of LRD traffic. Section 5 presents the performance evaluation and analysis of the proposed detection methods with traffic data from DARPA, followed by a brief conclusion in Section 6.

2. Preliminaries

2.1. Self-Similar Network Traffic

Self-similarity means that the sample paths of the process $X(t)$ and those of the rescaled version $c^{H}X(t/c)$, obtained by simultaneously dilating the time axis $t$ by a factor $c > 0$ and the amplitude axis by a factor $c^{H}$, cannot be statistically distinguished from each other. Equivalently, it implies that an affine dilated subset of one sample path cannot be distinguished from its whole. $H$ is called the Hurst parameter. For a general self-similar process, $H$ measures the degree of self-similarity.

The network traffic arrival process is a discrete-time process, so the discrete-time definition of self-similarity is given below. Let $X = \{X_i, i = 1, 2, \ldots\}$ be a wide-sense stationary discrete stochastic traffic time series with constant mean $\mu$, finite variance $\sigma^2$, and autocorrelation function $r(k)$, $k = 0, 1, 2, \ldots$. Let $X^{(m)} = \{X^{(m)}_i, i = 1, 2, \ldots\}$ be the $m$-order aggregate process of $X$; then
$$X^{(m)}_i = \frac{1}{m}\left(X_{mi-m+1} + \cdots + X_{mi}\right). \tag{2.1}$$
For each $m$, $X^{(m)}$ defines a wide-sense stationary stochastic process with autocorrelation function $r^{(m)}(k)$.

Definition 2.1. A second-order stationary process $X$ is called exact second-order self-similar (ESOSS) with Hurst parameter $H$ ($0.5 < H < 1$) if the autocorrelation function satisfies
$$r^{(m)}(k) = r(k) = \frac{1}{2}\left[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right], \tag{2.2}$$
where $k = 1, 2, \ldots$ and $m = 1, 2, \ldots$.

Definition 2.2. A second-order stationary process $X$ is called asymptotical second-order self-similar (ASOSS) with Hurst parameter $H$ ($0.5 < H < 1$) if the autocorrelation function satisfies
$$\lim_{m \to \infty} r^{(m)}(k) = \frac{1}{2}\left[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right], \tag{2.3}$$
where $k = 1, 2, \ldots$.

In the field of network traffic theory, it is more practical to use ASOSS.
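The aggregate process and the second-order self-similar autocorrelation used in Definitions 2.1 and 2.2 can be illustrated with a short sketch. The function names `aggregate` and `esoss_acf` are ours and are introduced only for illustration; the sketch assumes the standard block-average form of the aggregate process and the standard autocorrelation $\frac{1}{2}[(k+1)^{2H} - 2k^{2H} + |k-1|^{2H}]$.

```python
def aggregate(x, m):
    """m-order aggregate process: means of consecutive non-overlapping blocks of length m."""
    n_blocks = len(x) // m
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def esoss_acf(k, hurst):
    """Autocorrelation at lag k of an exactly second-order self-similar process."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


print(aggregate([1, 3, 5, 7, 9, 11], 2))  # [2.0, 6.0, 10.0]
print(esoss_acf(0, 0.8))                  # 1.0
print(esoss_acf(1, 0.5))                  # 0.0 (H = 0.5 means no correlation)
```

For $H$ between 0.5 and 1 the lag-1 value is positive and grows with $H$, which is why a change in $H$ is visible in the autocorrelation function.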

2.2. Hurst Parameter Estimation

To date, several methods have been proposed to estimate the Hurst parameter. Some of the most popular ones include the aggregated variance, local Whittle, and wavelet-based methods [17-21]. In this paper, we use the method proposed by Li [11] to estimate the Hurst parameter of network traffic. The estimation process is summarized as follows; for more information, please refer to [11].

Let $r(k)$ be the autocorrelation function of $X$. Then
$$r(k) \sim c\,k^{2H-2}, \tag{2.4}$$
where $\sim$ stands for asymptotic equivalence under the limit $k \to \infty$, $c > 0$, and $0.5 < H < 1$.

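By taking fractional Gaussian noise as an approximate model of $X$, one has
$$\sigma_m^2 = \sigma^2 m^{2H-2}, \tag{2.5}$$
where $\sigma_m^2$ and $\sigma^2$ are the variances of the $m$-order aggregate process $X^{(m)}$ and $X$, respectively.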

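Divide the traffic series into $N$ nonoverlapping sections, and further divide each section into $M$ nonoverlapping segments. Then the autocorrelation function of the $w$th segment in the $n$th section is given by
$$r\bigl(k; H_w(n)\bigr) = \frac{1}{2}\left[(k+1)^{2H_w(n)} - 2k^{2H_w(n)} + |k-1|^{2H_w(n)}\right], \tag{2.6}$$
where $H_w(n)$ is the Hurst parameter of the $w$th segment in the $n$th section. Let $J(H) = \sum_{k}\bigl[r(k; H) - \hat r_w(k)\bigr]^2$ be the cost function, where $\hat r_w(k)$ denotes the autocorrelation measured on that segment. Then one has
$$H_w(n) = \arg\min_{0.5 < H < 1} J(H). \tag{2.7}$$

Averaging $H_w(n)$ in terms of $w$ yields
$$H(n) = \frac{1}{M}\sum_{w=1}^{M} H_w(n), \tag{2.8}$$
where $H(n)$ represents the Hurst parameter in the $n$th section.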
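The estimation procedure summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [11]: we assume a least-squares cost between the fractional Gaussian noise autocorrelation and the measured autocorrelation, and we minimize it with a plain grid search over $0.5 < H < 1$. The names `sample_acf`, `fgn_acf`, `estimate_hurst`, and `section_hurst`, the grid step, and the lag range are hypothetical choices.

```python
def sample_acf(x, max_lag):
    """Normalized sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    energy = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / energy
            for k in range(max_lag + 1)]


def fgn_acf(k, hurst):
    """Model autocorrelation of fractional Gaussian noise at lag k."""
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def estimate_hurst(acf, grid_step=0.001):
    """Grid search for the H in (0.5, 1) minimizing the squared ACF mismatch."""
    best_h, best_cost = None, float("inf")
    for j in range(1, int(round(0.5 / grid_step))):
        h = 0.5 + j * grid_step
        # Lag 0 is skipped: both autocorrelations equal 1 there.
        cost = sum((fgn_acf(k, h) - acf[k]) ** 2 for k in range(1, len(acf)))
        if cost < best_cost:
            best_h, best_cost = h, cost
    return best_h


def section_hurst(segment_hursts):
    """Average the per-segment estimates to get the section-level Hurst parameter."""
    return sum(segment_hursts) / len(segment_hursts)


# Sanity check on an ideal autocorrelation generated with H = 0.8.
ideal = [fgn_acf(k, 0.8) for k in range(21)]
print(round(estimate_hurst(ideal), 3))  # 0.8
```

On real traffic, each segment would first be passed through `sample_acf`, and the resulting per-segment estimates would then be averaged with `section_hurst`.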

3. DDoS Detection Based on Hurst Variation

Given the discrete network traffic time series $X$, $Y$, and $Z$, let $X$ and $Y$ be the normal traffic and the abnormal traffic, respectively, and let $Z$ be the DDoS flood attack traffic during the transition process of attacking. $X$ and $Z$ are uncorrelated [11], so $Y$ can be expressed as $Y = X + Z$.

Figure 1 illustrates the components of normal traffic, attack traffic, and abnormal traffic. $x_i(t)$ represents the number of bytes sent out by node $i$ at time $t$ for normal network services, $z_i(t)$ stands for the number of bytes sent out by node $i$ at time $t$ for the DDoS flood attack, and $y(t)$ is the total traffic the target receives at time $t$.

Based on the theorems in [22], no matter whether $Z$ is a self-similar process or not, as long as $X$ is a second-order stationary self-similar process, $Y$ will be a self-similar process, but the degree of self-similarity may change. Let $r_{xx}$, $r_{yy}$, and $r_{zz}$ be the autocorrelation functions of $X$, $Y$, and $Z$, respectively. Li proved in [11] that during the transition process of attacking, $\|r_{yy} - r_{xx}\|$ is significant. For each value of the Hurst parameter in the range $(0.5, 1)$, there is exactly one corresponding autocorrelation function [23]. Therefore, a significant $\|r_{yy} - r_{xx}\|$ means that the Hurst parameter changes significantly when an attack occurs, that is, $H_y$ differs significantly from $H_x$, where $H_x$ and $H_y$ are the Hurst parameters of $X$ and $Y$, respectively. Based on this observation, we propose a DDoS detection method that monitors the Hurst variation. The details of the detection process are explained as follows.

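After estimating the Hurst parameter of each section using (2.7), we apply an autoregressive (AR) model to describe the self-similarity of traffic without attacks. That is,
$$\hat H(n) = \sum_{i=1}^{p} a_i H(n-i), \tag{3.1}$$
where $\hat H(n)$ is the estimated Hurst parameter of normal traffic section $n$, $p$ is the order of the AR model, and $a_i$ ($i = 1, \ldots, p$) are the coefficients of the AR model, which can be obtained by using the least-squares method [24]. Other models, such as the moving average (MA) model and the autoregressive moving average (ARMA) model, can also be used in our method in the same way.

Since the Hurst parameter without any attack follows a Gaussian distribution in most cases [11], the probability density function of $H(n)$ is given by
$$f(H) = \frac{1}{\sqrt{2\pi}\,\sigma_H}\exp\left[-\frac{(H - \mu_H)^2}{2\sigma_H^2}\right], \tag{3.2}$$
where
$$\mu_H = \frac{1}{N}\sum_{n=1}^{N} H(n), \qquad \sigma_H^2 = \frac{1}{N}\sum_{n=1}^{N}\bigl[H(n) - \mu_H\bigr]^2, \tag{3.3}$$
where $N$ is the number of traffic sections. $\mu_H$ and $\sigma_H^2$ are the mean and variance of the Hurst parameter $H(n)$, respectively.

Using linear estimation, the change of self-similarity is given by
$$\Delta H(n) = H(n) - \hat H(n) = H(n) - \sum_{i=1}^{p} a_i H(n-i), \tag{3.4}$$
which can be regarded as the sum of independent Gaussian variables. So $\Delta H(n)$ also follows a Gaussian distribution. The mean and variance of $\Delta H(n)$ are obtained by
$$\mu_\Delta = \Bigl(1 - \sum_{i=1}^{p} a_i\Bigr)\mu_H, \qquad \sigma_\Delta^2 = \Bigl(1 + \sum_{i=1}^{p} a_i^2\Bigr)\sigma_H^2. \tag{3.5}$$
So the probability density function of $\Delta H(n)$ is expressed by
$$f(\Delta H) = \frac{1}{\sqrt{2\pi}\,\sigma_\Delta}\exp\left[-\frac{(\Delta H - \mu_\Delta)^2}{2\sigma_\Delta^2}\right]. \tag{3.6}$$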
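The AR-based prediction and the resulting Hurst variation can be sketched in a few lines. This is an illustrative sketch only: it assumes that the Hurst variation is the signed difference between the measured Hurst parameter and its AR prediction, and it solves the least-squares normal equations with a small hand-written Gaussian elimination so that no external library is needed. The names `fit_ar`, `solve`, and `hurst_variation` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ar(series, p):
    """Least-squares AR(p) coefficients a_1..a_p for h(n) ~ sum_i a_i * h(n - i)."""
    rows = [[series[n - i] for i in range(1, p + 1)] for n in range(p, len(series))]
    targets = series[p:]
    # Normal equations: (A^T A) a = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve(ata, atb)


def hurst_variation(series, coeffs):
    """Signed difference between each value and its AR prediction (first p values skipped)."""
    p = len(coeffs)
    return [series[n] - sum(coeffs[i] * series[n - 1 - i] for i in range(p))
            for n in range(p, len(series))]


# Synthetic series that follows an exact AR(2) recursion, for checking the fit.
h = [1.0, 2.0]
for _ in range(8):
    h.append(0.5 * h[-1] + 0.3 * h[-2])
coeffs = fit_ar(h, 2)
print([round(c, 6) for c in coeffs])  # close to [0.5, 0.3]
```

In the detection setting, the coefficients would be fitted on the Hurst parameters of attack-free sections and then applied to the monitored sections.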

The attack detection can be formulated as the following hypothesis testing problem.

(A0) The change of self-similarity is within a threshold, indicating normal network traffic.

(A1) The change of self-similarity is outside the threshold, indicating abnormal network traffic caused by DDoS attacks.

It can be seen that a proper threshold on the Hurst variation is the key to successfully detecting DDoS attacks. The threshold is also the change point of the Hurst parameter: a Hurst variation beyond this point implies a DDoS attack. In the next section, we propose two methods for change point detection, one based on order statistics and the other based on maximum likelihood estimation.

4. Determining Change Point of LRD Traffic

In the following discussion, the change point of self-similarity is equivalent to the threshold that is used to distinguish attack traffic from normal traffic. We propose two methods to determine the change point and calculate the associated detection rate for one method and false alarm rate for the other method.

4.1. Order Statistic-Based Detection

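For order statistic-based detection, the Hurst variations $\Delta H(n)$, $n = 1, \ldots, N$, are first sorted in increasing order into reference cells as
$$\Delta H_{(1)} \le \Delta H_{(2)} \le \cdots \le \Delta H_{(N)}. \tag{4.1}$$
The detection threshold is obtained by selecting the $k$th-order-ranked cell $\Delta H_{(k)}$ to represent the normal traffic plus measured noise. That cell is multiplied by a scale factor $\alpha$, and the threshold is expressed by
$$T = \alpha\,\Delta H_{(k)}. \tag{4.2}$$

The traffic in section $n$ is considered normal if the change of self-similarity $\Delta H(n) \le T$; otherwise, the traffic is considered abnormal, indicating possible attacks in that section. $T$ is a random variable, and for $\alpha > 0$ its probability density function is expressed by
$$f_T(t) = \frac{k}{\alpha}\binom{N}{k}\left[F\!\left(\frac{t}{\alpha}\right)\right]^{k-1}\left[1 - F\!\left(\frac{t}{\alpha}\right)\right]^{N-k} f\!\left(\frac{t}{\alpha}\right), \tag{4.3}$$
where $f$ is the probability density function of $\Delta H$, and $F$ is the distribution function of $\Delta H$.

We define the term detection as correctly recognizing an abnormal sign. The detection rate $P_d$ is obtained by averaging the conditional probability of detection under the given threshold over all possible values of the threshold. That is,
$$P_d = \int_{-\infty}^{\infty} P\bigl(\Delta H > t \mid T = t\bigr)\, f_T(t)\, dt. \tag{4.4}$$

Substituting (3.6) and (4.3) into (4.4) yields
$$P_d = \int_{-\infty}^{\infty} \bigl[1 - F(t)\bigr]\, f_T(t)\, dt, \tag{4.5}$$
where $F$ is the Gaussian distribution function corresponding to (3.6).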
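The order statistic-based threshold can be illustrated with the following sketch. It assumes that the threshold is the $k$th smallest Hurst variation multiplied by the scale factor, and that a section is flagged when its variation exceeds the threshold. The function names and the sample values, including the rank and scale factor used in the demonstration, are hypothetical and are not taken from the experiments.

```python
def order_statistic_threshold(variations, k, alpha):
    """Threshold = alpha times the k-th smallest variation (k is a 1-based rank)."""
    if not 1 <= k <= len(variations):
        raise ValueError("k must be between 1 and the number of sections")
    ranked = sorted(variations)
    return alpha * ranked[k - 1]


def flag_sections(variations, threshold):
    """Indices of sections whose Hurst variation exceeds the threshold."""
    return [i for i, d in enumerate(variations) if d > threshold]


# Hypothetical Hurst variations for six sections.
variations = [0.05, 0.01, 0.30, 0.02, 0.04, 0.03]
threshold = order_statistic_threshold(variations, k=4, alpha=2.0)
print(flag_sections(variations, threshold))  # [2]
```

A smaller rank picks a smaller reference cell and therefore a lower threshold, so more sections are flagged.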

4.2. Maximum Likelihood Estimate-Based Detection

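Considering the independence between $\Delta H(i)$ and $\Delta H(j)$, $i \ne j$, the joint probability density function of $\Delta H(1), \ldots, \Delta H(N)$ is obtained by
$$L(\mu_\Delta, \sigma_\Delta^2) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma_\Delta^2}}\exp\left[-\frac{\bigl(\Delta H(n) - \mu_\Delta\bigr)^2}{2\sigma_\Delta^2}\right]. \tag{4.6}$$
Taking the natural logarithm on both sides of (4.6), we have
$$\ln L = -\frac{N}{2}\ln\bigl(2\pi\sigma_\Delta^2\bigr) - \frac{1}{2\sigma_\Delta^2}\sum_{n=1}^{N}\bigl(\Delta H(n) - \mu_\Delta\bigr)^2. \tag{4.7}$$
In order to get the maximum likelihood estimate (MLE) of $\mu_\Delta$ and $\sigma_\Delta^2$, we have
$$\frac{\partial \ln L}{\partial \mu_\Delta} = 0, \qquad \frac{\partial \ln L}{\partial \sigma_\Delta^2} = 0. \tag{4.8}$$
By solving (4.8), one has
$$\hat\mu_\Delta = \frac{1}{N}\sum_{n=1}^{N}\Delta H(n), \qquad \hat\sigma_\Delta^2 = \frac{1}{N}\sum_{n=1}^{N}\bigl(\Delta H(n) - \hat\mu_\Delta\bigr)^2. \tag{4.9}$$
So the probability density function of $\Delta H$ is expressed by
$$f(\Delta H) = \frac{1}{\sqrt{2\pi\hat\sigma_\Delta^2}}\exp\left[-\frac{(\Delta H - \hat\mu_\Delta)^2}{2\hat\sigma_\Delta^2}\right]. \tag{4.10}$$

Let the detection threshold be $T$. The traffic in section $n$ is considered normal if the change of self-similarity $\Delta H(n) \le T$; otherwise, the traffic is considered abnormal, indicating possible attacks in that section.

Define a false alarm as mistakenly recognizing normal traffic as abnormal traffic. The false alarm rate of the proposed detection system is expressed by
$$P_f = P\bigl(\Delta H > T\bigr) = 1 - \Phi\!\left(\frac{T - \hat\mu_\Delta}{\hat\sigma_\Delta}\right). \tag{4.11}$$
So, given the preselected false alarm rate $P_f$, the detection threshold is given by
$$T = \hat\mu_\Delta + \hat\sigma_\Delta\,\Phi^{-1}(1 - P_f), \tag{4.12}$$
where $\Phi$ is the standard normal distribution function.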
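The maximum likelihood estimate-based threshold can be sketched as follows. The sketch assumes a one-sided test in which a section is declared abnormal when its Hurst variation exceeds the threshold, so that the threshold is the estimated mean plus the estimated standard deviation times the standard normal quantile at one minus the false alarm rate. The function names are hypothetical, and `statistics.NormalDist` from the Python standard library supplies the inverse normal distribution function.

```python
import math
from statistics import NormalDist


def mle_gaussian(samples):
    """Maximum likelihood estimates (mean, variance) of a Gaussian sample."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((s - mu) ** 2 for s in samples) / n  # MLE divides by n, not n - 1
    return mu, var


def mle_threshold(samples, false_alarm_rate):
    """One-sided threshold T such that P(variation > T) equals the false alarm rate."""
    mu, var = mle_gaussian(samples)
    return mu + math.sqrt(var) * NormalDist().inv_cdf(1.0 - false_alarm_rate)


# Hypothetical Hurst variations of attack-free sections.
normal_variations = [-0.02, 0.01, 0.03, -0.01, 0.00, 0.02, -0.03, 0.01]
for rate in (0.01, 0.05, 0.10):
    print(rate, round(mle_threshold(normal_variations, rate), 4))
```

The printed thresholds decrease as the preselected false alarm rate increases, since a larger tolerated false alarm rate moves the quantile closer to the mean.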

5. Experiments and Analysis

5.1. Data Preparation

To evaluate the proposed detection methods, we use two traffic data sets from DARPA 1999 [25]. The DARPA 1999 data sets are from the Information Systems Technology Group, MIT Lincoln Laboratory, under DARPA ITO and Air Force Research Laboratory. These traffic data sets are the first standard for the evaluation of computer network intrusion detection systems. The first traffic set, collected from 8:20:00.0 to 11:10:39 on 1 March (Monday), 1999, and named DARPA1999-week1-Monday-inside, is an attack-free series. The second traffic set, collected from 8:20:00.0 to 16:24:41.5 on 8 March (Monday), 1999, and named DARPA1999-week2-Monday-inside, is an attack-contained series. This data set contains three types of attacks, namely pod, back, and land. For short, we rename the attack-free traffic set D99-W1-1-i and the attack-contained traffic set D99-W2-1-i. The traffic traces for these two data sets are displayed in Figure 2. The merging time scale is 100 ms.

5.2. Test Results and Analysis

After the 100 ms merging, the number of data points in D99-W1-1-i is 102400 and the number of data points in D99-W2-1-i is 290816. We combine these two traffic sets into one and name it D99. D99 is divided into 64 sections ($N = 64$), and each section is further divided into 12 segments ($M = 12$). So the length of each traffic segment is 512. We use (2.7) to estimate the Hurst parameter $H_w(n)$ of the $w$th traffic segment in the $n$th section ($w = 1, \ldots, 12$ and $n = 1, \ldots, 64$), then average the $H_w(n)$ in terms of $w$. After that, we obtain the Hurst parameter $H(n)$ in the $n$th section, as shown in Figure 3.
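The segment length quoted above follows directly from the stated sample counts, as the following short check shows. The variable names are ours; the numbers are those given in the text.

```python
n_attack_free = 102400       # samples in D99-W1-1-i after 100 ms merging
n_attack_contained = 290816  # samples in D99-W2-1-i after 100 ms merging
sections, segments_per_section = 64, 12

total = n_attack_free + n_attack_contained
segment_length, remainder = divmod(total, sections * segments_per_section)
print(total, segment_length, remainder)  # 393216 512 0
```

The combined trace therefore splits into 768 segments of 512 samples with nothing left over.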

We apply an AR model of order $p$ to estimate the Hurst parameter of the traffic. The Hurst variation $\Delta H(n)$ of the $n$th traffic section is obtained using (3.4). The results are shown in Figure 4.

For the order statistic-based detection method, we first sort the Hurst variations $\Delta H(n)$ in increasing order and then choose the scale factor $\alpha$. After selecting a value of the rank $k$, the detection threshold is calculated according to (4.2). Figure 5 shows the thresholds when $k$ is 40, 45, and 50, respectively.

From Figure 5, we can see that when the rank $k$ is smaller, the detection threshold is lower. In this case, more traffic sections have Hurst variations above the threshold, and thus more attacks are declared. However, note that a smaller $k$ may also introduce more false alarms by mistakenly recognizing more normal traffic as attack traffic.

For the maximum likelihood estimate-based detection, we compute the detection threshold using (4.12). Figure 6 shows the resulting thresholds when the preselected false alarm rate $P_f$ is 1%, 5%, and 10%, respectively.

From Figure 6, we can see that when the preselected false alarm rate is higher, the resulting threshold is lower. This is in accordance with our expectation: when the preselected false alarm rate is high, more normal traffic is allowed to be mistakenly treated as attacks, and thus the detection threshold is low.

Figure 7 shows the detection rate $P_d$ versus the false alarm rate $P_f$ for both detection methods. We can see from the figure that both detection methods achieve a reasonable detection rate, but the maximum likelihood estimate-based method performs better than the order statistic-based method. Meanwhile, for both detection methods, a minor increase in $P_f$ results in a significant increase in $P_d$ when $P_f$ is lower than 0.1, which means that if we allow slightly more false alarms, the detection rate improves significantly. We can also observe from Figure 7 that when $P_d$ is higher than 0.9, a minor increase in $P_d$ requires a significant increase in $P_f$. That is, if we want to improve the detection rate in the range above 0.9, we have to tolerate many more false alarms.

5.3. Comparison with Existing Detection Methods

In this section, we compare our two proposed detection methods with Allen's method [26] and Ren's method [27], two well-known self-similarity-based detection methods in the literature. Both of these methods define a range of the Hurst parameter for normal traffic. In Allen's method, the Hurst range is from 0.5 to 0.99, and in Ren's method the range is from 0.65 to 0.85. A traffic section with a Hurst parameter outside the range is treated as abnormal traffic.
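The range-based rule used by both reference methods, and the way a detection rate and a false alarm rate are computed from labeled sections, can be sketched as follows. The function names, the sample Hurst values, and the attack labels are hypothetical; treating the range boundaries as inclusive is also an assumption.

```python
def range_detector(hursts, low, high):
    """Flag a section as abnormal when its Hurst parameter lies outside [low, high]."""
    return [not (low <= h <= high) for h in hursts]


def detection_and_false_alarm(flags, is_attack):
    """Return (detection rate, false alarm rate); needs at least one attack and one normal section."""
    attacks = sum(is_attack)
    normals = len(is_attack) - attacks
    hits = sum(1 for f, a in zip(flags, is_attack) if f and a)
    false_alarms = sum(1 for f, a in zip(flags, is_attack) if f and not a)
    return hits / attacks, false_alarms / normals


# Hypothetical per-section Hurst parameters and ground-truth labels.
hursts = [0.70, 0.80, 0.90, 0.60, 0.95, 0.75]
is_attack = [False, False, False, True, True, False]

narrow = range_detector(hursts, 0.65, 0.85)  # narrower normal range
wide = range_detector(hursts, 0.50, 0.99)    # wider normal range
print(detection_and_false_alarm(narrow, is_attack))  # (1.0, 0.25)
print(detection_and_false_alarm(wide, is_attack))    # (0.0, 0.0)
```

On this toy data the narrower range detects more attacks but also raises more false alarms, which is the trade-off between the two ranges.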

Table 1 compares the detection performance of Allen's method, Ren's method, and our proposed methods. Ren's detection method achieves a higher detection rate than Allen's method at the cost of a slightly higher false alarm rate. We first use Allen's false alarm rate of 34% as the false alarm rate of the two proposed detection methods. The proposed order statistic-based detection method achieves a detection rate as high as 87%, and the maximum likelihood estimate-based detection method achieves a detection rate as high as 92%, both higher than the detection rate of Allen's method. Similarly, we use Ren's false alarm rate of 38% as the false alarm rate of the two proposed detection methods. The detection rates of the proposed detection methods are also higher than that of Ren's method.

6. Conclusion

In this paper, we have proposed a DDoS detection method that monitors Hurst variation, based on Li's work on DDoS attack detection. We have also discussed two methods for determining the change point of LRD traffic, which can be used to distinguish attack traffic from normal traffic. Experiments have been conducted to evaluate the performance of the proposed scheme, and the test results show that the proposed detection methods outperform the existing self-similarity-based detection methods considered here and can enhance the reliability and robustness of DDoS flood attack detection.

Acknowledgments

This work was supported in part by the National High Technology Research and Development Program of China under Grant no. 2007AA01Z473 and the National Natural Science Foundation of China (NSFC) under Grants no. 60573125, no. 60873264, no. 60605019 and no. 60702047. The authors would also like to thank the reviewers for their constructive comments that have considerably increased the quality of this paper.