Mathematical Problems in Engineering

Volume 2008, Article ID 475878, 11 pages

http://dx.doi.org/10.1155/2008/475878

## Detection of Variations of Local Irregularity of Traffic under DDOS Flood Attack

^{1}School of Information Science and Technology, East China Normal University, No. 500, Dong-Chuan Road, Shanghai 200241, China

^{2}Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180-3590, USA

Received 24 March 2008; Accepted 1 April 2008

Academic Editor: Cristian Toma

Copyright © 2008 Ming Li and Wei Zhao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The aim of distributed denial-of-service (DDOS) flood attacks is to overwhelm the attacked site, or to degrade its service performance considerably, by sending flood packets to the target from machines distributed all over the world. This is a kind of local behavior of traffic at the protected site, because the attacked site recovers to its normal service state sooner or later even though it is in reality overwhelmed during the attack. From a mathematical point of view, it can be taken as a kind of short-range phenomenon in computer networks. In this paper, we use the Hurst parameter (*H*) to measure the local irregularity or self-similarity of traffic under DDOS flood attack, provided that fractional Gaussian noise (fGn) is used as the traffic model. As flood attack packets of DDOS make the *H* value of arrival traffic vary significantly away from that of traffic normally arriving at the protected site, we discuss a method to statistically detect signs of DDOS flood attacks with predetermined detection probability and false alarm probability.

#### 1. Introduction

IP networks are subject to electronic attacks [1]. An intrusion detection system (IDS) collects information from a variety of systems and network sources and analyzes it for signs of attack. A network-based IDS uses the traffic on its network as its data source [2]. In a distributed denial-of-service (DDOS) flood attack, an intruder bombards a site (the victim) with a huge amount of traffic whose sources are distributed over the world [3]. Hence the pattern of traffic under DDOS flood attack may suddenly differ significantly from the normal pattern of the arrival traffic. From the perspective of dynamics over a limited time interval in physics [4], one may regard this sudden change as a specific “pulse.” Though a DDOS flood attack may not be the sole factor that makes the traffic pattern vary significantly, we assume that security officers can distinguish significant variation of the monitored traffic pattern caused by other known factors (e.g., normally heavy traffic) from that caused by a DDOS flood attack. To avoid confusion, the term abnormal traffic used in this paper specifically means a traffic series that has significant variation of traffic pattern caused by a DDOS flood attack.

In this research, we consider two fundamental issues in detection. One is feature extraction of the monitored traffic time series. The other is a detection scheme that can be used to assure a predetermined detection probability and false alarm probability. The first issue is discussed in Section 2 from the viewpoint of feature extraction based on the self-similarity of traffic. The second is discussed in Section 3 based on statistical detection. Section 4 explains the performance analysis of the present detection system. A case study is demonstrated in Section 5. Discussions are given in Section 6, which is followed by conclusions.

#### 2. Feature Extraction of Traffic

##### 2.1. Self-Similar Traffic

Computer scientists in the last decade discovered that traffic is a type of fractal time series. It has the properties of self-similarity, long memory, and multiscales (see e.g., [5]). A commonly used model in traffic engineering is fractional Gaussian noise (fGn) (see e.g., [6–8]).

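Let $B(t)$ be Wiener Brownian motion. Let $B_H(t)$ be fractional Brownian motion with the Hurst parameter $H \in (0, 1)$ [9]. Let $\Gamma(\cdot)$ be the Gamma function. Then, by using fractional calculus, $B_H(t)$ is expressed by

$$B_H(t) - B_H(0) = \frac{1}{\Gamma(H + 1/2)} \left\{ \int_{-\infty}^{0} \left[ (t - u)^{H - 1/2} - (-u)^{H - 1/2} \right] dB(u) + \int_{0}^{t} (t - u)^{H - 1/2}\, dB(u) \right\}. \tag{2.1}$$

Let $G(t)$ be the increment series of $B_H(t)$:

$$G(t) = B_H(t + a) - B_H(t), \tag{2.2}$$

where *a* is a real number. Then $G(t)$ is fGn [9]. The autocorrelation function (ACF) of fGn in the discrete case is given by

$$r(k) = \frac{\sigma^2}{2} \left[ (|k| + 1)^{2H} - 2|k|^{2H} + \big||k| - 1\big|^{2H} \right], \tag{2.3a}$$

where $\sigma^2$ is the intensity of fGn [10]. The normalized ACF of fGn is given by

$$r(k) = \frac{1}{2} \left[ (|k| + 1)^{2H} - 2|k|^{2H} + \big||k| - 1\big|^{2H} \right]. \tag{2.3b}$$

The relationship between the fractal dimension $D$ of fGn and *H* is given by

$$D = 2 - H. \tag{2.4}$$

Approximating the right side of (2.3b) with the second-order differential of $0.5\,k^{2H}$, see [9, H15, page 350], for large $k$, yields

$$r(k) \approx H(2H - 1)\, k^{2H - 2}. \tag{2.5}$$

Let *y* and *R* be a traffic series and its ACF, respectively. Then, according to (2.5),

$$R(k) \sim c\, k^{2H - 2} \quad (k \to \infty), \tag{2.6}$$

where ~ implies asymptotic equivalence under the limit $k \to \infty$ and $c$ is a constant [11].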
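The following is a minimal sketch, not the authors' code, that evaluates the standard normalized fGn ACF $0.5\,[(|k|+1)^{2H} - 2|k|^{2H} + ||k|-1|^{2H}]$ and its large-lag approximation $H(2H-1)k^{2H-2}$. The function names `fgn_acf` and `fgn_acf_approx` are hypothetical and chosen only for illustration.

```python
def fgn_acf(k: int, hurst: float) -> float:
    """Normalized ACF of fractional Gaussian noise at integer lag k."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def fgn_acf_approx(k: int, hurst: float) -> float:
    """Large-lag approximation H(2H - 1) k^(2H - 2); requires k != 0."""
    return hurst * (2.0 * hurst - 1.0) * abs(k) ** (2.0 * hurst - 2.0)


if __name__ == "__main__":
    # Compare the exact and approximate forms for a long-memory case.
    for lag in (1, 10, 100):
        print(lag, fgn_acf(lag, 0.8), fgn_acf_approx(lag, 0.8))
```

For $H = 0.5$ the exact form vanishes at every nonzero lag, which corresponds to uncorrelated noise, while for $H > 0.5$ the slow power-law decay is what makes the ACF nonsummable.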

The ACF (2.5) is nonsummable for $H \in (0.5, 1)$, implying long-range dependence (LRD). Hence *H* is a measure of LRD of traffic. Note that LRD of traffic does not mean that a DDOS attack is a long-range phenomenon. On the contrary, a DDOS attack and its detection are short-range phenomena, since both sides, namely the attacker and the defender, are engaged with each other during a short period of time. Such a battle makes the local irregularity of traffic vary dramatically [12].

Without loss of generality, we consider the traffic series *y* in the discrete case. By dividing *y* into nonoverlapping blocks of size *L* and averaging over each block, we obtain another series given by

$$y^{(L)}(i) = \frac{1}{L} \sum_{j = (i - 1)L + 1}^{iL} y(j). \tag{2.7}$$

According to the analysis in [5, 9, 11], in the fGn sense, one has

$$\operatorname{Var}\left(y^{(L)}\right) \approx L^{2H - 2} \operatorname{Var}(y), \tag{2.8}$$

where Var denotes the variance operator. Thus the self-similarity is measured by *H*.
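As an illustration of how block averaging relates to *H*, the sketch below averages a series over nonoverlapping blocks of size *L* and fits the slope of log-variance against log *L*, using the fGn scaling law in which the variance of the aggregated series scales as $L^{2H-2}$. This variance-time fit is a common textbook estimator and is shown here only as an assumption-laden example; it is not the ACF-fitting estimator the paper goes on to use, and the function names are hypothetical.

```python
import math
import statistics


def aggregate(series, block_size):
    """Average the series over nonoverlapping blocks of length block_size."""
    n_blocks = len(series) // block_size
    return [
        sum(series[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]


def hurst_variance_time(series, block_sizes):
    """Estimate H from the least-squares slope of log Var versus log L.

    Under the fGn scaling law the slope equals 2H - 2, so H = 1 + slope / 2.
    """
    xs, ys = [], []
    for size in block_sizes:
        aggregated = aggregate(series, size)
        xs.append(math.log(size))
        ys.append(math.log(statistics.pvariance(aggregated)))
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return 1.0 + slope / 2.0
```

Applied to uncorrelated Gaussian noise, the estimate should come out near 0.5; values clearly above 0.5 indicate long memory.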

A series encountered in engineering is usually of finite length. Let *y* be a series of length *P*. Divide it into *N* nonoverlapping sections. Each section is divided into *M* nonoverlapping segments. Divide each segment into *K* nonoverlapping blocks, each of length *L*. Let $y^{(L)}_{m,n}$ be the series with aggregated level *L* in the *m*th segment of the *n*th section. Let $H_m(n)$ be the *H* value of $y^{(L)}_{m,n}$. Let $R(k; H_m(n))$ be the measured ACF of $y^{(L)}_{m,n}$ in the normalized case. The corresponding theoretical ACF in the fGn sense is given by

$$r(k; H_m(n)) = \frac{1}{2} \left[ (|k| + 1)^{2H_m(n)} - 2|k|^{2H_m(n)} + \big||k| - 1\big|^{2H_m(n)} \right]. \tag{2.9}$$

The above expression exhibits the multifractal property of traffic, as can be seen from [13].
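The nested partition described above (sections, then segments, then blocks of length *L*) can be sketched as follows. This is only an illustrative layout under the assumption that the series holds at least *N·M·K·L* points; the function name `partition` and its argument names are hypothetical.

```python
def partition(series, n_sections, n_segments, n_blocks, block_len):
    """Split a series into sections and segments and aggregate each segment.

    Returns a nested list result[n][m] holding the block-averaged series
    (n_blocks points) of the m-th segment in the n-th section.
    """
    seg_len = n_blocks * block_len
    sec_len = n_segments * seg_len
    if len(series) < n_sections * sec_len:
        raise ValueError("series is too short for the requested partition")
    result = []
    for n in range(n_sections):
        section = []
        for m in range(n_segments):
            start = n * sec_len + m * seg_len
            segment = series[start:start + seg_len]
            section.append([
                sum(segment[i * block_len:(i + 1) * block_len]) / block_len
                for i in range(n_blocks)
            ])
        result.append(section)
    return result
```

Each inner list is one aggregated segment, so an *H* value can then be estimated per segment and averaged per section.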

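Let

$$J(H) = \sum_{k} \left[ R(k) - r(k; H) \right]^2 \tag{2.10}$$

be the cost function, where $R(k)$ is the measured normalized ACF of the aggregated series in the *m*th segment of the *n*th section and $r(k; H)$ is the theoretical normalized ACF of fGn. Then one has

$$H_m(n) = \arg\min_{H} J(H). \tag{2.11}$$

Averaging in terms of index *m* yields

$$H(n) = \frac{1}{M} \sum_{m} H_m(n), \tag{2.12}$$

representing the *H* estimate of the series in the *n*th section.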
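A minimal sketch of this kind of least-squares fit is given below. It assumes a squared-error cost between a measured normalized ACF and the standard normalized fGn ACF, and it minimizes that cost by a plain grid search over $H \in [0.5, 1)$. The cost form, the grid search, the search range, and all function names are assumptions made for illustration, not details confirmed by the source.

```python
def fgn_acf(k, hurst):
    """Normalized ACF of fractional Gaussian noise at integer lag k."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def sample_acf(series, max_lag):
    """Biased sample ACF, normalized so that the lag-0 value is 1."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    energy = sum(c * c for c in centered)
    return [
        sum(centered[i] * centered[i + k] for i in range(n - k)) / energy
        for k in range(max_lag + 1)
    ]


def cost(measured_acf, hurst):
    """Sum of squared differences between measured and theoretical ACF."""
    return sum((r - fgn_acf(k, hurst)) ** 2 for k, r in enumerate(measured_acf))


def estimate_hurst(measured_acf, grid_step=0.001):
    """Grid search for the H in [0.5, 1) that minimizes the cost."""
    n_points = int(round(0.5 / grid_step))
    candidates = [0.5 + i * grid_step for i in range(n_points)]
    return min(candidates, key=lambda h: cost(measured_acf, h))


def section_estimate(segment_estimates):
    """Average the per-segment H values to get the section estimate."""
    return sum(segment_estimates) / len(segment_estimates)
```

In use, `sample_acf` would be applied to each aggregated segment, `estimate_hurst` to each resulting ACF, and `section_estimate` to the *M* values of one section.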

Usually, the section estimates differ from one another, that is, $H(n) \neq H(l)$ for $n \neq l$, where $H(n)$ denotes the *H* estimate in the *n*th section. However, stationarity of the traffic time series implies that $H(n)$ at a specific site is a number falling within a certain confidence interval [5, Paragraph 5, Section 5, page 966]. In practical terms, a normality assumption for $H(n)$ is quite accurate in most cases for sufficiently large *M*, regardless of the probability distribution function of *H* [14]. Thus we take

$$\bar{H} = E[H(n)] \tag{2.13}$$

as a mean estimate of *H* of *x*, where *E* is the mean operator. It can be taken as a template of *H* of *x* for the purpose of statistical detection. The appendix gives a case of the *H* estimation of a real-traffic series to illustrate the reasonableness of using *H* as a feature of traffic time series.

##### 2.2. Characterizing Traffic Time Series with *H*

Let *x* be a normal traffic time series. Normally, the site serves *x* without difficulty, though *x* may sometimes be delayed because of ordinary traffic congestion. The arrival traffic *x* is contributed by many connections distributed all over the world. Figure 1 shows *x* contributed by traffic from *d* connections. From the previous discussion, we see that *x* can be characterized by the Hurst parameter, which we denote as $H_x$.

Assume that the site is under a DDOS flood attack. Then the actual arrival traffic *y* (abnormal traffic) consists of normal traffic *x* and attack traffic *a*, see Figure 2, where *a* is contributed by *e* connections. We use $H_y$ as a feature of *y*.

#### 3. Detection Method and System Structure

To explain our detection principle, we introduce three terms. Correctly recognizing an abnormal sign is termed *detection*; failing to recognize it is termed a *miss*; mistakenly recognizing normal traffic as abnormal is termed a *false alarm*.

Let . Then represents the deviation of *H* of
monitored traffic time series. Let be the threshold. Then the
detection hypotheses are as follows. , implies detection, while represents false alarm, where stands for *H* which is not used as the template
but obtained when there is no attacking. Clearly, and are random variables. Mathematically, there
are many distance measures available [15–17], but the
following works well:

According to the previous discussions, we give the system diagram in Figure 3. The measured arrival traffic first passes through an *H* estimator. The output of the *H* estimator goes to the template database to produce the template of *H*. In addition, the estimator outputs an online estimate of *H* of the monitored traffic. The template and the online estimate are compared in the distance detector. The comparison result is fed into the threshold detector, which compares it with a given threshold *V*. In the decision analysis stage, the output of the threshold detector is analyzed, and the result gives a sign of detection according to the preset detection probability and false alarm probability.

#### 4. Performance Analysis

With the
partition explained in Section 2, we see that there is a value of representing the deviation of *H* of *y* in each segment. Therefore, in each
section, is a random sequence of *M* length. Denote as the expectation of in each section. Then is a random sequence of *N* length. In the case of well obeys Gaussian distribution [14]. For the simplicity, we still denote as .

##### 4.1. Detection Probability

Let and be the expectation and the variance of , respectively. Then Let Then detection probability is given by

##### 4.2. False Alarm Probability

Let and be the mean and the variance of . Then false alarm probability is given by

##### 4.3. Miss Probability

Let be miss probability. Then

Generally, . Besides, the numeric computation in data processing can be arranged such that . In this case, three probabilities are given by Figures 4–6 show the curves of three distributions, respectively. As , high implies low and vice versa.
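To make the three probabilities concrete, the sketch below assumes that the deviation of *H* is Gaussian, as stated above, and that an alarm is raised when the deviation exceeds the threshold *V*. The closed forms used here (one minus a Gaussian CDF for detection and false alarm, the CDF itself for a miss) are an assumption consistent with that description rather than a transcription of the paper's equations, and all names and example parameters are hypothetical.

```python
from statistics import NormalDist


def detection_probability(threshold, mean_dev, std_dev):
    """P(deviation under attack > threshold) for a Gaussian deviation."""
    return 1.0 - NormalDist(mean_dev, std_dev).cdf(threshold)


def miss_probability(threshold, mean_dev, std_dev):
    """P(deviation under attack <= threshold); complement of detection."""
    return NormalDist(mean_dev, std_dev).cdf(threshold)


def false_alarm_probability(threshold, mean_normal, std_normal):
    """P(deviation without attack > threshold) for a Gaussian deviation."""
    return 1.0 - NormalDist(mean_normal, std_normal).cdf(threshold)


if __name__ == "__main__":
    # Hypothetical parameters: attack deviations centred at 2.0, normal at 0.0.
    v = 1.0
    print(detection_probability(v, 2.0, 0.5))
    print(miss_probability(v, 2.0, 0.5))
    print(false_alarm_probability(v, 0.0, 0.3))
```

Because detection and miss are complementary events, raising the threshold lowers the false alarm probability at the price of a higher miss probability.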

##### 4.4. Threshold and Detection Region

As can be
seen from the previous discussions, the selection of a threshold value is
crucial to our system. In fact, given a false alarm probability *f*, we want to find the threshold such that . Clearly, If and when the selected precision is 4, we obtain Given a detection probability *d*, we want to find the threshold such that . Clearly, In the case of , Therefore, when and and are assured. That is, In the case of and , The constraint of (4.12) is given
by .

Obviously, the detection region is the intersection of three probability functions. Under the condition of and , the detection region is shown in Figure 7.
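Threshold selection of this kind can be sketched with the inverse Gaussian CDF. The example assumes Gaussian deviations and an alarm rule of the form "deviation greater than *V*": a required false alarm probability *f* gives a lower bound on *V*, a required detection probability *d* gives an upper bound, and the detection region is the interval between the two when it is nonempty. The function names and parameters are hypothetical, and the formulas are an assumption rather than the paper's exact expressions.

```python
from statistics import NormalDist


def threshold_for_false_alarm(f, mean_normal, std_normal):
    """Smallest V such that P(normal deviation > V) <= f."""
    return NormalDist(mean_normal, std_normal).inv_cdf(1.0 - f)


def threshold_for_detection(d, mean_dev, std_dev):
    """Largest V such that P(attack deviation > V) >= d."""
    return NormalDist(mean_dev, std_dev).inv_cdf(1.0 - d)


def detection_region(f, d, mean_normal, std_normal, mean_dev, std_dev):
    """Interval of thresholds meeting both targets, or None if empty."""
    lower = threshold_for_false_alarm(f, mean_normal, std_normal)
    upper = threshold_for_detection(d, mean_dev, std_dev)
    return (lower, upper) if lower <= upper else None
```

When the normal and attack distributions overlap too much, `detection_region` returns `None`, meaning no single threshold satisfies both requirements.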

#### 5. A Case Study

Suppose the template as described in the appendix. Assume that the confidence level is 99.9999%. Thus we suppose or during the transition process of intrusion. In this case study, 1000 points of in or (0.7673,0.9900) are randomly selected to simulate the abnormal traffic deviating from the normal one. The error sequence is indicated in Figure 8. By the numeric computation, we obtain and . Therefore, we obtain the probability distributions for detection, false alarm and miss as shown in Figure 9. Under the conditions of and , we obtain and . Hence when we select , we have 99.9999% confidence to say that and are assured, which can be easily observed from Figure 9.

#### 6. Discussions

Since Yahoo servers were successfully attacked in 2000, the detection of DDOS flood attacks has received much attention. Various methods and systems have been proposed; see, for example, [18–25]. As is known, traffic under DDOS flood attack must be significantly different from normal traffic [25]. Otherwise, the DDOS flood attack would have no effect. From this point of view, the value of *H* of traffic under DDOS flood attacks is considerably different from that of normal traffic; see [12] for details.

For a stationary random time series of finite length, the ACF and the power spectrum density (PSD) function are commonly used in engineering for feature extraction in statistical classification [16, 17]. However, the PSD of traffic does not exist in the domain of ordinary functions since traffic has long memory [8]. To avoid this mathematical difficulty, the ACF of traffic was considered for feature extraction in our earlier work [25]. This paper focuses on the detection of local variations of traffic based on the self-similarity of traffic. Thus it suggests a new method that substantially develops the work of [25] from the point of view of traffic pattern matching, because feature extraction of traffic time series by using a single parameter *H* makes pattern matching more efficient.

#### 7. Conclusions

We have discussed the characterization of the local irregularity of traffic by *H*. We have explained a principle of statistical detection to capture signs of DDOS flood attacks with predetermined detection probability and false alarm probability, based on the variation of the local irregularity of traffic.

#### Appendix

#### Demonstration of *H* Estimation of a Real-Traffic Series

This appendix gives a demonstration with a real-traffic series, named LBL-PKT-4 [26, 27]. Denote by $x(i)$ the series of LBL-PKT-4, indicating the number of bytes in the *i*th packet. The length of that series is 1.3 million. The first 1024 points of that series are plotted in Figure 10(a). Divide $x(i)$ into 32 nonoverlapping sections. Computing *H* in each section yields $H(n)$, as shown in Figure 10(b). Its histogram is indicated in Figure 10(c).

According to (2.13), we have a mean *H* estimate of about 0.7671. The confidence interval with 95% confidence level is [0.7670, 0.7672]. Hence we have 95% confidence to say that the *H* estimate in each section of that series takes 0.7671 as its approximation with fluctuation not greater than $10^{-4}$. Further, it is easy to obtain that the confidence interval with 99.9999% confidence level is [0.7669, 0.7673]. Hence we have 99.9999% confidence to say that the *H* estimate in each section of that series takes 0.7671 as its approximation with fluctuation not greater than $2 \times 10^{-4}$.
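A confidence interval of this kind can be computed from per-section *H* estimates as sketched below. The sketch assumes that the mean of the section estimates is approximately Gaussian and uses the usual normal-theory interval (sample mean plus or minus a quantile times the standard error). The function name is hypothetical, and the example is not claimed to reproduce the LBL-PKT-4 figures quoted above.

```python
import math
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(estimates, confidence):
    """Two-sided normal-theory confidence interval for the mean estimate.

    estimates  -- per-section H estimates (at least two values)
    confidence -- confidence level in (0, 1), for example 0.95
    """
    n = len(estimates)
    center = fmean(estimates)
    quantile = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = quantile * stdev(estimates) / math.sqrt(n)
    return center - half_width, center + half_width
```

Raising the confidence level from 95% to 99.9999% widens the interval around the same mean, which is the qualitative behaviour seen in the two intervals reported in this appendix.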

#### Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under the project Grant no. 60573125. Wei Zhao’s work was also partially supported by the NSF (USA) under Contracts no. 0808419, 0324988, 0721571, and 0329181. Any opinions, findings, conclusions, and/or recommendations in this paper, either expressed or implied, are those of the authors and do not necessarily reflect the views of the agencies listed above.

#### References

1. G. Coulouris, J. Dollimore, and T. Kindberg, *Distributed Systems: Concepts and Design*, Addison-Wesley, Reading, Mass, USA, 3rd edition, 2001.
2. E. G. Amoroso, *Intrusion Detection: An Introduction to Internet Surveillance, Correlation, Traps, Trace Back, and Response*, Intrusion.Net Book, Sparta, NJ, USA, 1999.
3. L. Garber, “Denial-of-service attacks rip the Internet,” *Computer*, vol. 33, no. 4, pp. 12–17, 2000.
4. G. Toma, “Practical test functions generated by computer algorithms,” in *Proceedings of the International Conference on Computational Science and Its Applications (ICCSA '05)*, vol. 3482 of *Lecture Notes in Computer Science*, pp. 576–584, Singapore, May 2005.
5. W. Willinger and V. Paxson, “Where mathematics meets the Internet,” *Notices of the American Mathematical Society*, vol. 45, no. 8, pp. 961–970, 1998.
6. M. Li, W. Zhao, W. Jia, D. Long, and C.-H. Chi, “Modeling autocorrelation functions of self-similar teletraffic in communication networks based on optimal approximation in Hilbert space,” *Applied Mathematical Modelling*, vol. 27, no. 3, pp. 155–168, 2003.
7. B. Tsybakov and N. D. Georganas, “Self-similar processes in communications networks,” *IEEE Transactions on Information Theory*, vol. 44, no. 5, pp. 1713–1725, 1998.
8. A. Adas, “Traffic models in broadband networks,” *IEEE Communications Magazine*, vol. 35, no. 7, pp. 82–89, 1997.
9. B. B. Mandelbrot, *Gaussian Self-Affinity and Fractals*, Springer, New York, NY, USA, 2002.
10. M. Li and S. C. Lim, “A rigorous derivation of power spectrum of fractional Gaussian noise,” *Fluctuation and Noise Letters*, vol. 6, no. 4, pp. C33–C36, 2006.
11. J. Beran, *Statistics for Long-Memory Processes*, vol. 61 of *Monographs on Statistics and Applied Probability*, Chapman and Hall, New York, NY, USA, 1994.
12. M. Li, “Change trend of averaged Hurst parameter of traffic under DDOS flood attacks,” *Computers & Security*, vol. 25, no. 3, pp. 213–220, 2006.
13. M. Li and S. C. Lim, “Modeling network traffic using generalized Cauchy process,” *Physica A*, vol. 387, no. 11, pp. 2584–2594, 2008.
14. J. S. Bendat and A. G. Piersol, *Random Data. Analysis and Measurement Procedures*, John Wiley & Sons, New York, NY, USA, 3rd edition, 2000.
15. M. Basseville, “Distance measures for signal processing and pattern recognition,” *Signal Processing*, vol. 18, no. 4, pp. 349–369, 1989.
16. K. S. Fu, Ed., *Digital Pattern Recognition*, Springer, Berlin, Germany, 2nd edition, 1980.
17. A. R. Webb, *Statistical Pattern Recognition*, Edward Arnold, London, UK, 1999.
18. M. Li and W. Zhao, “A statistical model for detecting abnormality in static-priority scheduling networks with differentiated services,” in *Proceedings of the International Conference on Computational Intelligence and Security (CIS '05)*, vol. 3802 of *Lecture Notes in Computer Science*, pp. 267–272, Springer, Xi'an, China, December 2005.
19. V. Paxson, “Bro: a system for detecting network intruders in real time,” in *Proceedings of the 7th USENIX Security Symposium*, San Antonio, Tex, USA, January 1998.
20. W. Yu, D. Xuan, and W. Zhao, “Middleware-based approach for preventing distributed deny of service attacks,” in *Proceedings of IEEE Military Communications Conference (MILCOM '02)*, vol. 2, pp. 1124–1129, Anaheim, Calif, USA, October 2002.
21. P. Innella and O. McMillan, “An introduction to intrusion detection systems, tetrad digital integrity, LLC,” December 2001, http://www.securityfocus.com/infocus/1520/.
22. http://en.wikipedia.org/wiki/Denial-of-service_attack/.
23. http://www.sans.org/dosstep/index.php/.
24. R. Bettati, W. Zhao, and D. Teodor, “Real-time intrusion detection and suppression in ATM networks,” in *Proceedings of the 1st USENIX Workshop on Intrusion Detection and Network Monitoring*, Santa Clara, Calif, USA, April 1999.
25. M. Li, “An approach to reliably identifying signs of DDOS flood attacks based on LRD traffic pattern recognition,” *Computers & Security*, vol. 23, no. 7, pp. 549–558, 2004.
26. http://www.acm.org/sigcomm/ITA/.
27. V. Paxson and S. Floyd, “Wide area traffic: the failure of Poisson modeling,” *IEEE/ACM Transactions on Networking*, vol. 3, no. 3, pp. 226–244, 1995.