Abstract

This paper presents a novel traffic feature for identifying abnormal variation of traffic under DDOS flood attacks: the histogram of the maxima of the bounded traffic rate on an interval-by-interval basis. We apply it to the traffic data provided by MIT Lincoln Laboratory under the Defense Advanced Research Projects Agency (DARPA) in 1999. The experimental results add to the evidence that the traffic rate under DDOS attacks is statistically considerably higher than that of normal traffic. They show that the pattern of the histogram of the maxima of the bounded rate of attack-contained traffic differs greatly from that of attack-free traffic. In addition, the present traffic feature is mathematically simple and easy to use in practice.

1. Introduction

People nowadays depend heavily on the Internet, which serves as an infrastructure of modern society. However, distributed denial-of-service (DDOS) flood attacks remain a great threat to it. By consuming the resources of an attacked site, an attacker may overwhelm the victim so that it denies the services it should offer or its service performance is significantly degraded. Therefore, intrusion detection systems (IDSs) for detecting DDOS flood attacks are greatly desired.

There are two categories of IDSs: misuse detection and anomaly detection. Alerts given by misuse detection are primarily based on matching network traffic against a library of known signatures, see, for example, [15]. Thus, attacks with unknown signatures, such as new variants of an attack, can escape detection by signature-based IDSs with probability one, see, for example, [6], which makes this category of IDSs ineffective against them at the protected site. Anomaly detection, in contrast, identifies abnormal variations of traffic as potential intrusions, so this category of IDSs receives particular attention for identifying new attacks, see, for example, [7–13]. For simplicity, in what follows, the term IDS is used in the sense of anomaly detection.

Note that detection accuracy is a key issue for an anomaly detector, see, for example, [14, 15]. To be effective, IDSs require appropriate features for accurately detecting an attack and distinguishing it from normal activity, as can be seen from [10, Section IV]. Hence, developing new traffic features for anomaly detection is essential.

The literature on traffic features for IDS use is rich. For example, 86 features for clustering normal activities are discussed in [9]. Note that the selection of a feature is methodology-dependent. In this regard, [16] uses packet header data. The paper [17] utilizes the autocorrelation function of long-range dependent (LRD) traffic time series in packet size, and [18] employs the Hurst parameter. Scherrer et al. adopt scaling properties of LRD traffic [19].

The traffic models used in [17–23] are fractal. In general, fractal models may be somewhat complicated in practical engineering applications in comparison with the traffic feature proposed in this paper.

Recall that there are two categories of traffic modeling [24, Section XIV]. One is statistical modeling (e.g., LRD processes). The other is bounded modeling, which has particular applications to modeling traffic at connection level, see, for example, [25–30]. Bounded models, in conjunction with a class of service disciplines, are feasible and relatively efficient in applications such as connection admission control (CAC) in guaranteed quality-of-service (QoS). In addition, such models are mathematically simple and relatively easy to use in practice in comparison with fractal models. This paper aims at providing a new traffic feature for anomaly detection based on bounded modeling of traffic. The main contributions of this paper are as follows.

(i) We present the histogram of the maxima of the bounded traffic rate on an interval-by-interval basis as a traffic feature for exhibiting abnormal variation of traffic under DDOS flood attacks.

(ii) The experimental results show that the maxima of the rate bound of attack-contained traffic are statistically much greater than those of attack-free traffic.

The rest of the paper is organized as follows. Experimental data and related work are briefed in Section 2. The histogram of the maxima of traffic rate bound is proposed in Section 3. Experimental results are demonstrated in Section 4, which is followed by discussions and conclusions.

2.1. Experimental Data

While DDOS attacks continue to be a problem, not much quantitative data is currently available for researchers to study the behavior of DDOS flood attacks. The 1998-1999 DARPA data (http://www.ll.mit.edu/IST/ideval) are valuable and among the few available for public use, though some points about them are worth further discussion [31]. Those data were obtained under conditions of realistic background traffic and examples of realistic attacks [32, 33]. The 1999 data sets contain more than 200 instances of 58 attack types; see [34] for details. Two data sets are explained below.

2.1.1. Set One: Attack-Free Traffic (1999 Training Data—Week 1)

The first data set contains 5 traces. We name them OM-W1-i-1999AF (i = 1, 2, 3, 4, 5), meaning Outside-MIT-week1-i-1999-attack-free. Table 1 indicates the actual times at which the first packet and the last one were extracted for each trace.

2.1.2. Set Two: Attack-Contained Traffic (1999 Training Data—Week 2)

Five traces are included in the second data set. They are named OM-W2-i-1999AC (i = 1, 2, 3, 4, 5), meaning Outside-MIT-week2-i-1999-attack-contained. The actual times at which the first packet and the last one were extracted for each trace are listed in Table 2.

2.2. Traffic Rate under DDOS Flood Attacks

Roughly speaking, a high rate is the fundamental feature of attack-contained traffic. Real events in 2000 were reported in [35], which noted that “the attacks inundated servers with 1 gigabit per second of incoming data, which is much more traffic than they were built to handle [35, page 12].” The analysis given by Moore et al. says that “to load the network, an attacker generally sends small packets as rapidly as possible since most network devices (both routers and NICs) are limited not by bandwidth but by packet processing rate [36, Section 2.1].” They infer that traffic rate is usually the best measure of network load during an attack. In short, computer scientists consider a high rate to be a basic feature of attack-contained traffic; see also, for example, [37–42]. The experimental results in this paper concern only the 1999 DARPA data in the case of high-rate attacks.

2.3. Traffic Bounds

In this subsection, we brief the deterministic bounds of accumulated traffic and traffic rate with the help of demonstrations using the traffic traces OM-W1-1-1999AF and OM-W2-1-1999AC.

Let x(t_i) be the series, indicating the number of bytes in the ith packet (i = 0, 1, 2, ...) of arrival traffic at time t_i. Then, x(i) is a discrete series, indicating the number of bytes in the ith packet of arrival traffic. Figure 1 shows a plot of x(i) for the first 1024 points of OM-W1-1-1999AF.

According to [27, 43], an upper bound of arrival traffic is given below.

Definition 2.1. Let x(t) be the arrival traffic function. Then,

F(I) = max_{t ≥ 0} [x(t + I) − x(t)]    (2.1)

is called the traffic upper bound of x(t) over the duration of length I.

Note 1. The physical meaning of F(I) is that the accumulated amount of arrival traffic over any duration of length I is upper bounded by F(I). The unit of F(I) is bytes. F(I) is an increasing function of I. Figure 2 indicates F(I) of OM-W1-1-1999AF over a range of I.

Definition 2.2. Let x(t) be the arrival traffic function. Then,

GAMA(I) = F(I) / I    (2.2)

is called the upper bound of traffic rate (traffic rate bound for short) of x(t).

Note 2. Equation (2.2) specifies that GAMA(I) is the maximum arrival rate at a specific point in the network over any duration of length I. The unit of GAMA(I) is bytes per unit of I. GAMA(I) is a decreasing function of I. Figure 3 demonstrates GAMA(I) of OM-W1-1-1999AF over a range of I.
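
As a rough illustration of Definitions 2.1 and 2.2, the following Python sketch computes the traffic upper bound and the rate bound of a short series. It assumes, for simplicity, that a duration is measured as a number of consecutive samples rather than in seconds. The function names, the duration written as I, and the sample values are illustrative assumptions and are not taken from the DARPA traces.

```python
from itertools import accumulate


def traffic_upper_bound(x, I):
    """Largest total of any I consecutive samples of x (in bytes)."""
    if not 1 <= I <= len(x):
        raise ValueError("I must satisfy 1 <= I <= len(x)")
    # c[k] is the traffic accumulated over the first k samples.
    c = [0, *accumulate(x)]
    # Maximum increase of the accumulated traffic over any window of length I.
    return max(c[t + I] - c[t] for t in range(len(x) - I + 1))


def rate_bound(x, I):
    """Traffic upper bound divided by the duration (bytes per sample step)."""
    return traffic_upper_bound(x, I) / I


# Made-up packet sizes in bytes (assumption, for illustration only).
x = [40, 1500, 60, 1500, 1500, 40, 576, 40]
for I in (1, 2, 4):
    print(I, traffic_upper_bound(x, I), rate_bound(x, I))
```

In this toy series the upper bound grows with I while the rate bound does not increase, which agrees with Notes 1 and 2.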

3. Histogram of Maxima of Traffic Rate Bound: A Feature for Identifying Abnormal Variation of Traffic under DDOS Attacks

In this section, we first introduce the time series of traffic rate bound. Then, we establish the maxima of traffic rate bound. Finally, we obtain the histogram of the maxima of traffic rate bound. Demonstrations with the experimental data are used to facilitate the discussions.

3.1. Traffic Bound Series

Theoretically, I can be any positive real number. In practice, however, I is selected as a finite positive integer. Fix the value of I and observe traffic bounds interval by interval. Then, we express traffic bounds as a function of the interval index n. Considering the index n, we express the traffic upper bound by F(I, n), which is a series.

Note that x(i) is a stochastic series and so is F(I, n). That is, F(I, m) ≠ F(I, n) for m ≠ n. We term F(I, n) the traffic upper bound series. Similarly, we use GAMA(I, n) to represent the traffic rate bound series. Figure 4 shows the traffic upper bound series. Figure 5 plots the rate bound series.

Since the rate bound series is random, identification in a single interval is not enough. We use Figure 6 to explain this point. From Figure 6, we see that the rate bound of attack-contained traffic is greater than that of attack-free traffic in some intervals, for example, in the second and third intervals. However, it is less than the rate bound of attack-free traffic in other intervals, for example, in the first and fourth intervals. Therefore, we study how the bound series of traffic rate varies statistically under DDOS flood attacks. For this reason, we study the maxima of traffic rate bound.

3.2. Maxima of Traffic Rate Bound

Denote by MGAMA(n) the maximum of the rate bound series GAMA taken over the index within each interval n. Then, MGAMA(n) represents a series that describes the maximum value of GAMA in each interval n. In other words, MGAMA(n) stands for the maxima of GAMA. The unit of MGAMA(n) is the same as that of GAMA. Here and below, we use the notation MGAMA_F(n) for attack-free traffic and MGAMA_C(n) for attack-contained traffic. Figures 7(a) and 7(b) give the plots of MGAMA_F(n) and MGAMA_C(n) for OM-W1-1-1999AF and OM-W2-1-1999AC, respectively.
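
The construction of the maxima series can be sketched as follows. This is only one possible reading under stated assumptions: the series is cut into consecutive non-overlapping blocks that play the role of the intervals, the rate bound is evaluated for a small set of durations inside each block, and the largest value is kept. The names `block_len` and `durations`, and the sample data, are hypothetical and do not come from the paper.

```python
from itertools import accumulate


def rate_bound(x, I):
    """Largest total of any I consecutive samples of x, divided by I."""
    c = [0, *accumulate(x)]
    return max(c[t + I] - c[t] for t in range(len(x) - I + 1)) / I


def maxima_of_rate_bound(x, block_len, durations):
    """For each consecutive block of x, the largest rate bound over `durations`."""
    if any(I < 1 or I > block_len for I in durations):
        raise ValueError("each duration must lie in 1..block_len")
    maxima = []
    # Trailing samples that do not fill a whole block are ignored.
    for start in range(0, len(x) - block_len + 1, block_len):
        block = x[start:start + block_len]
        maxima.append(max(rate_bound(block, I) for I in durations))
    return maxima


# Made-up byte counts forming three blocks of four samples each.
x = [10, 20, 30, 40, 100, 0, 0, 100, 5, 5, 5, 5]
print(maxima_of_rate_bound(x, block_len=4, durations=(2, 4)))  # -> [35.0, 50.0, 5.0]
```

The result is one value per interval, which is the form of series that the histogram in the next subsection summarizes.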

3.3. Histogram of Maxima

Denote by Hist[MGAMA_F(n)] and Hist[MGAMA_C(n)] the histograms of MGAMA_F(n) and MGAMA_C(n), respectively. They represent the empirical distributions of MGAMA_F(n) and MGAMA_C(n). Figures 8(a) and 8(b) indicate Hist[MGAMA_F(n)] and Hist[MGAMA_C(n)] for OM-W1-1-1999AF and OM-W2-1-1999AC, respectively. From Figure 8(c), we see that the pattern of Hist[MGAMA_F(n)] differs considerably from that of Hist[MGAMA_C(n)]. To investigate this phenomenon quantitatively, we need a measure of the similarity or dissimilarity between the pattern of Hist[MGAMA_F(n)] and that of Hist[MGAMA_C(n)], which is explained in the next subsection.
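
A minimal sketch of building two comparable histograms is given below. It assumes equal-width bins over a range shared by both series, so that the two histograms can be compared bin by bin, and it normalizes counts to relative frequencies. The bin count and the sample maxima are made up for illustration; the paper does not state its binning.

```python
def histogram(values, lo, hi, bins):
    """Relative-frequency histogram with `bins` equal-width bins on [lo, hi]."""
    if hi <= lo or bins < 1:
        raise ValueError("need hi > lo and bins >= 1")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # A value equal to hi is placed in the last bin.
            k = min(int((v - lo) / width), bins - 1)
            counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Made-up maxima series for attack-free and attack-contained traffic.
maxima_f = [100, 120, 110, 130, 105]
maxima_c = [100, 400, 450, 500, 480]

# Use one common range so both histograms share the same bin edges.
lo, hi = min(maxima_f + maxima_c), max(maxima_f + maxima_c)
hist_f = histogram(maxima_f, lo, hi, bins=4)
hist_c = histogram(maxima_c, lo, hi, bins=4)
print(hist_f, hist_c)
```

Sharing the bin edges is a design choice of this sketch: without it, the bins of the two histograms would not refer to the same rate ranges.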

3.4. Correlation Coefficient Used as a Similarity Measure for Pattern Matching

There are many measures to characterize the similarity or the dissimilarity of two patterns in the field of pattern matching, see, for example, [44, 45]. Among them, the correlation coefficient between two patterns is commonly used in engineering, see, for example, [46]. We use it to measure the pattern similarity in this research. Denote

Corr_FC = corr{Hist[MGAMA_F(n)], Hist[MGAMA_C(n)]},

where corr denotes the correlation operation.

It is known that 0 ≤ Corr_FC ≤ 1. The larger the value of Corr_FC, the more similar the pattern of Hist[MGAMA_F(n)] is to that of Hist[MGAMA_C(n)]. Mathematically, Corr_FC = 1 implies that the pattern of Hist[MGAMA_F(n)] is exactly the same as that of Hist[MGAMA_C(n)]. On the contrary, Corr_FC = 0 means that the pattern of Hist[MGAMA_F(n)] is totally different from that of Hist[MGAMA_C(n)]. From the point of view of engineering, however, the extreme cases Corr_FC = 1 and Corr_FC = 0 do not make much sense due to errors and uncertainties in measurement and digital computation. In practical terms, one uses a threshold for Corr_FC to evaluate the similarity between the two patterns. The concrete value of the threshold depends on the requirements set by researchers, but it is quite common to take 0.7 as the smallest value of the threshold for pattern matching purposes. Suppose that we take 0.8 as the threshold value. Then, we say that the pattern of Hist[MGAMA_F(n)] is similar to that of Hist[MGAMA_C(n)] if Corr_FC ≥ 0.8 and dissimilar otherwise.
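
The thresholding rule can be sketched as follows. The sketch assumes the Pearson correlation coefficient between two histograms with the same bins, which is only one possible choice of the correlation operation. Note that the Pearson coefficient can be negative, so it need not reproduce the range stated above. The function names and the sample histograms are illustrative.

```python
from math import sqrt


def corr(h1, h2):
    """Pearson correlation coefficient between two equal-length histograms."""
    if len(h1) != len(h2) or len(h1) < 2:
        raise ValueError("histograms must have the same length (at least 2)")
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    v1 = sum((a - m1) ** 2 for a in h1)
    v2 = sum((b - m2) ** 2 for b in h2)
    if v1 == 0 or v2 == 0:
        raise ValueError("correlation is undefined for a constant histogram")
    return cov / sqrt(v1 * v2)


def is_similar(h1, h2, threshold=0.8):
    """Apply the rule: similar if the coefficient reaches the threshold."""
    return corr(h1, h2) >= threshold


# Made-up relative-frequency histograms over the same four bins.
h_f = [0.7, 0.2, 0.1, 0.0]   # mass at low rates
h_c = [0.0, 0.1, 0.2, 0.7]   # mass at high rates
print(corr(h_f, h_c), is_similar(h_f, h_c))
```

With these made-up inputs the two patterns are classified as dissimilar, while a histogram compared with itself is classified as similar.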

By computation, we obtain Corr_FC = 0.01751 for OM-W1-1-1999AF and OM-W2-1-1999AC, implying that the pattern of Hist[MGAMA_F(n)] differs considerably from that of Hist[MGAMA_C(n)], as indicated in Figure 8(c). We further demonstrate this phenomenon in the next section.

4. Experimental Results

The value of Corr_FC for OM-W1-1-1999AF and OM-W2-1-1999AC has been given above. In this section, we illustrate experimental results describing Corr_FC for OM-W1-2-1999AF and OM-W2-2-1999AC. The plots illustrating Corr_FC for OM-W1-3-1999AF and OM-W2-3-1999AC, OM-W1-4-1999AF and OM-W2-4-1999AC, and OM-W1-5-1999AF and OM-W2-5-1999AC are given in the appendices.

Figures 9(a) and 9(b) are the plots of the first 1024 points of OM-W1-2-1999AF and OM-W2-2-1999AC, respectively. Figures 10(a) and 10(b) indicate the series of traffic rate bound for OM-W1-2-1999AF and OM-W2-2-1999AC, respectively. Figures 11(a) and 11(b) demonstrate the maxima of rate bound for both traffic traces. Figures 12(a) and 12(b) show the histograms of the maxima of traffic rate bound for both traces. Figure 12(c) compares the two. By computation, we have Corr_FC = 0.163261, meaning that the pattern of Hist[MGAMA_F(n)] differs considerably from that of Hist[MGAMA_C(n)] for OM-W1-2-1999AF and OM-W2-2-1999AC.

Note that the values of Corr_FC for the other three pairs of test traces, see Figures 16(c), 20(c), and 24(c), also show that the pattern of Hist[MGAMA_F(n)] is noticeably different from that of Hist[MGAMA_C(n)]. We summarize the values of Corr_FC of all five pairs of traces in Table 3, which shows that Corr_FC < 0.2 for all pairs of test traces.

5. Discussions and Conclusions

The maxima of the rate bound of attack-contained traffic are not always higher than those of attack-free traffic, see Figure 7. Statistically, however, they are significantly higher, as can be seen from the experimental results illustrated by Figures 8(c), 12(c), 16(c), 20(c), and 24(c). In addition, the results in Table 3 indicate that the pattern of Hist[MGAMA_F(n)] is clearly different from that of Hist[MGAMA_C(n)]. Thus, the results in this paper suggest that the histogram of the maxima of traffic rate bound may serve as a traffic feature for identifying abnormal variation of traffic under DDOS flood attacks.

In comparison with fractal models of traffic, the present feature has an apparent advantage. Recall that statistical models such as LRD processes are usually intended for aggregate traffic, and there is a lack of evidence for using them to characterize statistical patterns of real traffic at connection level. As a matter of fact, finding statistical patterns of traffic at connection level may be a tough task. To overcome the difficulties in describing traffic at connection level, bounded modeling is introduced [25–29]. Thus, for the flows going through a server from an input link, together with their maximum traffic constraint function, the present analysis method is technically sound and usable, whereas fractal models may not be. Since bounded models of traffic are mainly used at connection level in some applications, such as real-time admission control, this suggests that the present traffic feature for identifying abnormal variation of traffic under DDOS flood attacks can be extracted at an early stage of attacks.

Appendices

These appendices give experimental results for three pairs of traces: OM-W1-3-1999AF and OM-W2-3-1999AC, OM-W1-4-1999AF and OM-W2-4-1999AC, and OM-W1-5-1999AF and OM-W2-5-1999AC. The values of Corr_FC for each pair of traces are given in the captions of Figures 16(c), 20(c), and 24(c), respectively.

A. Experiments for OM-W1-3-1999AF and OM-W2-3-1999AC

See Figures 13, 14, 15, and 16.

B. Experiments for OM-W1-4-1999AF and OM-W2-4-1999AC

See Figures 17, 18, 19, and 20.

C. Experiments for OM-W1-5-1999AF and OM-W2-5-1999AC

See Figures 21, 22, 23, and 24.

Acknowledgments

This work was supported in part by the 973 plan under the project number 2011CB302801/2011CB302802, by the National Natural Science Foundation of China under the project grant numbers, 60873264, 61070214, 61173096, by Zhejiang Provincial Natural Science Foundation of China (R1110679), and by the University of Macau.