Abstract

Network anomaly detection and localization are of great significance to network security. Compared with the traditional methods of host computer, single link and single path, the network-wide anomaly detection approaches have distinctive advantages with respect to detection precision and range. However, when facing the actual problems of noise interference or data loss, the network-wide anomaly detection approaches also suffer significant performance reduction or may even become unavailable. Besides, researches on anomaly localization are rare. In order to solve the mentioned problems, this paper presents a robust multivariate probabilistic calibration model for network-wide anomaly detection and localization. It applies the latent variable probability theory with multivariate t-distribution to establish the normal traffic model. Not only does the algorithm implement network anomaly detection by judging whether the sample’s Mahalanobis distance exceeds the threshold, but also it locates anomalies by contribution analysis. Both theoretical analysis and experimental results demonstrate its robustness and wider use. The algorithm is applicable when dealing with both data integrity and loss. It also has a stronger resistance over noise interference and lower sensitivity to the change of parameters, all of which indicate its performance stability.

1. Introduction

Network traffic anomalies are unusual and significant changes at network’s traffic level. Intrusions such as DDos attacks and zombie networks significantly jeopardize the Internet security, and network jams and malfunctions have unpleasant impact on service quality; therefore it is critical to detect and locate network anomalies for both network operators and end users. It is a challenging task to detect and locate them because one must extract and interpret anomalous patterns from large amounts of high-dimensional, intricate, and noisy background traffic data.

There are a great number of researches on anomaly detection. Host-based anomaly detection system monitors and analyzes the internals of a computing system by applying data mining of the system logs and audit records [1, 2]; another detection method based on performance measurement data such as end to end round-trip time and packet loss probability in a single path can be implemented by single variable time series analysis [3, 4]; network anomaly detection based on traffic measurements from single link can be implemented by applying machine learning and signal analysis [5, 6]. All of these methods have their limits, because they only concentrate on a part of the information and their detection area is limited. When the scale of network enlarges and data transfer rate speeds up, many network anomalies often exhibit strong network-wide characteristics [7, 8], and their impact may always spread to multiple links, while their local characteristics may not be that obvious. It is difficult to conduct network-wide analysis with above methods and their accuracy cannot be guaranteed.

In order to solve the problems mentioned above, Lakhina et al. come up with network-wide anomaly detection based on subspace construction via PCA [9]. This method employs network traffic of many Origin-Destination (OD) flows to establish a model of normal behavior and detects anomalies by measuring deviations from that model. With the concept of network-wide detection, researches are conducted in space-time expansibility [1013], robustness [1416], real-time processing [17, 18], and anomaly measure [19, 20], which enrich network-wide anomaly detection. This kind of methods uses whole-network traffic data, which has huge performance advantages over single point, single path, and single link. At the same time, compared with other methods, it sets up normal behavior model which enables us to avoid building anomalous feature library. Thus it can be implemented to detect known anomalies as well as unknown anomalies, and it can be used widely. Network-wide anomaly detection improves detection performance by introducing wider and multidimensional network information, but it also faces some real problems when implemented in large scale and high speed backbone network. Firstly, because of its wider collection area, more collection equipment, and faster network speed, it might not be applicable if collected data were lost during collection or transfer process [21]; secondly, traffic flows in backbone network continue to grow in volume and complexity, and hidden noise like anomalous traffic could degrade performance of the anomaly detection algorithms [22], while some of the attacks might even pollute the detectors [1416]; thirdly, the above anomaly detection methods can only find when anomalies happen, but they still have some defects on locating those anomalies [22].

Therefore, we propose an approach named RMPCM based on robust multivariate probabilistic calibration model to overcome these problems discussed above. This anomaly detection and locating algorithm introduces a latent variable probabilistic model based on -distribution instead of a Gaussian distribution to establish a normal traffic model. By judging if the sample’s Mahalanobis distance from the normal model exceeds the threshold, traffic anomaly detection is achieved. Locating anomalies is attained with contribution analysis. RMPCM approach is more robust; not only can it be widely used for processing complete data as well as missing data, but also it acquires stronger robustness under the noise interference and lower sensibility of model parameters. The contributions of this paper consist of following 4 aspects; the RMPCM approach can

solve anomaly detection problem when data loss occurs by establishing a latent variable probabilistic model,

increase detection accuracy by introducing multivariate -distribution to relieve noise interference in modeling the normal traffic behavior,

correctly locate the underlying Origin-Destination (OD) flows being the source of the anomaly,

reduce the amount of work involved in implementing complicated parameter testing, because RMPCM has a better stability and lower sensibility for model parameters.

This paper is organized as follows. We begin in Section 2 with a discussion of the related work. We describe data source model and problems that need to be solved in Section 3. In Section 4, we describe the RMPCM approach in detail and solve three problems raised in Section 3. We validate our approach in three different ways of experiments and contrast our RMPCM with existing approach on traffic anomaly detection in Section 5. A discussion of several details is presented in Section 6. Concluding remarks and our ongoing work are presented in Section 7.

Back in 1987 Denning had demonstrated statistic model for detecting network anomalies [23]. And it is becoming more and more important with the development of the Internet. There are many traditional anomaly detection approaches based on host computer, single path, and single link. Researches in [7] indicated the generation and development of network anomalies have exhibited a tendency of network-wide characteristic. They found that the performance of the anomaly detectors increases with enlarging the range of detection beyond linear growth, which sets up the foundation for network-wide anomaly detection. The authors of [9] proposed network-wide anomaly detection algorithm based on network traffic for the first time, which illustrated low dimensionality of OD flows. They also integrated traffic statistics of multiple OD flows to build up a model of normal behavior and detected anomalies by measuring deviations from that model. The authors of [11] came along to improve the anomaly detection approach based on PCA by applying stochastic matrix perturbation theory and proposed a PCA-based distributed approach for network-wide anomaly detection. Reference [13] expanded the classical PCA and proposed the Karhunen-Loeve expansion for network-wide anomaly detection. Reference [17] proposed an online anomaly detection approach using kernel recursive least squares algorithm to solve the problem of online detection. All of them did not address the problems of how to detect anomalies in the condition of noise interference and data loss and how to locate the anomalies in the actual network environment.

The authors of [22] took in-depth study on the influence of anomalous traffic on the performance of detector and indicated that large anomalies may cause the offset of normal model based on PCA, which increased the false positive rate (FPR) of anomaly detection. References [14, 15] took further steps to study poisoning attacks on anomaly detectors and evaluated poisoning techniques and developed defense. The authors of [16] listed 3 mechanisms of poisoning attacks and proposed defense based on robust PCA with projection pursuit. All of the above only focus on the poisoning and defense techniques based on PCA detector, but there is lack of researches on common modeling approaches in the intricate and noisy environment.

Data loss is very common in many fields, and the question of how to get enough information from missing data needs to be answered. The authors of [24, 25] proposed an algorithm to solve the problem of goodness-of-fit test for varying coefficient models with missing data. The authors of [21] gave their opinion on data loss problem when network traffic flows were measured in large scale and high speed backbone network; they proposed an approach of sparsity regularized matrix factorization (SRMF) to make the data complete. This is applicable in traffic engineering, capacity planning, forecasting, and so forth, but it did not research deeply in network anomaly detection when facing data loss.

Network anomaly detection can determine when anomalies take place, but locating anomaly is an extremely challenging task. The authors of [22] point out the deficiency of network-wide detection algorithm based on PCA in locating anomaly. Reference [26] then proposed an approach of BasisDetect for network-wide anomaly detection and locating, but it could only locate anomalies to border router; it was unable to pinpoint the position.

In this paper we propose a network-wide anomaly detection algorithm based on RMPCM, which will later be proved to have a better performance in solving problems of noise interference, data loss, and locating anomalies.

3. Data Model and Problems Description

3.1. Data Model

Conventionally the researches of the Internet traffic flow mainly focused on temporal characteristics of data package on a single link, which help in developing concepts of self-similar stochastic processes, long-range dependence, and so forth. One ISP (Internet service provider), however, consists of hundreds of those links which are connected all over, and the Internet contains several thousand ISPs. In such a vast background the spatial characteristics of network traffic come to people’s attention inevitably. However, it is difficult to analyze traffic flow data of all links in the network simultaneously, because it amplifies the complexity of modeling traffic on a single link which is itself a complicated task. As compact and elegant descriptions of traffic flows between nodes in a certain network structure, traffic matrix is a constantly employed model to conduct explorations on the spatiotemporal component of network-wide traffic. Traffic matrix is an overview of network-wide traffic. Instead of studying traffic on all links, applying traffic matrix provides more straightforward and fundamental insights into network-wide traffic study [8]. We employ traffic matrix at PoP (point of presence) level as data source in our research.

Traffic matrix at PoP level: assume that an autonomous system (AS) has PoPs. Continuous measuring of the traffic between each PoP pair at a certain period can obtain the traffic of Origin-Destination (OD) flows. An OD flow denotes the collection of all traffic that enters the network from an ingress node and departs from an egress node. Arranging these point-to-point measured values in matrix can obtain this AS’s traffic matrix . As shown in Figure 1, denotes the number of measurement periods, and denotes the number of measured value of OD flows at each measurement . The element in denotes the volume of a certain traffic measure at the th period and the th OD. Traffic volume (the number of bytes, packets, or flows) is adopted as a traffic measure in this paper.

3.2. Problem Description
3.2.1. Data Loss

In the process of collecting data, data loss may occur. This is because massive data in high speed backbone network may increase burden of collecting equipment and reduce its stability. Another reason is due to network congestion, equipment or link malfunctions when transferring data.

Traffic data loss is not all completely random, and many of the cases are highly structural. In order to describe the scenario of data loss in the process of data collection and transfer, four kinds of loss mechanisms are adopted.

(1) PureRandLoss. The elements in are missing independently at random with probability . This may be due to unexpected congestion by chance in measuring equipment or unreliable transfer mechanism.

(2) PeriodRandLoss. The rows of traffic matrix are corresponding to measurement periods, and PeriodRandLoss means the rows in are missing at random with probability . The reason of this type of structured loss may be that storage devices are overloaded or program breaks down when centralized processing large number of measured data during this period.

(3) ODRandLoss. The columns of traffic matrix are corresponding to OD flows, and ODRandLoss means the columns in are missing at random with probability . This type of structured loss could stem from OD identification error caused by either flow filtration or links/routers’ malfunction.

(4) PieceRandLoss. PieceRandLoss means the submatrixes in are missing at random with probability . This type of structured loss may be caused when storage devices are full but they are still keeping collecting data for several periods or devices’ breakdown for some time, which is corresponding to data loss of traffic matrix multiple adjacent columns. Note that the case single row PieceRandLoss corresponds to PeriodRandLoss, and single column PieceRandLoss corresponds to ODRandLoss.

3.2.2. Noise Interference

Model parameter estimation often uses maximum likelihood estimation (MLE) when samples are known, and the probability distribution of the samples is needed. However, it is very troublesome to accurately describe the distribution of data; therefore it is always assumed that the data generally follows the Gaussian distribution because its nice analytical property always yields tractable algorithms. The MLE is equivalent to the least square estimation in the linear Gaussian regression model which is noted for its unduly sensitive to atypical samples such as outliers and it would affect the accuracy of the model [27]. In real network measurement collected traffic data contains anomalous traffic which is outliers and will have a huge interference in the establishment of model if a Gaussian noise mode is selected. The question of how to set up a more accurate model of normal traffic under this circumstance needs to be addressed promptly.

3.2.3. Anomaly Localization

The time at which anomalies take place can be determined by implementing network anomaly detection, but locating anomalies is crucial if we want to pinpoint and solve security problems more precisely. Locating anomalies in this paper is corresponding to pointing out the intersections of the rows and columns of traffic matrix when anomalies occur. The row of corresponds to the time of anomaly occurrence, and the column of corresponds to the position where anomaly occurs (i.e., on one OD or a few OD).

4. RMPCM

The relationship between anomalous event generation and its detection and localization is shown in Figure 2. Anomaly events will affect part of traffic flows in a network and thus cause changes of corresponding statistics of overall network traffic. The network anomaly detectors can analyze these changes of statistics and raise an alarm; then network administrators can analyze anomalous traffic by locating anomalies to determine anomaly event. RMPCM which we propose consists of the process of anomaly detection and localization: firstly, model normal traffic with collected traffic data and then determine whether or not this sample is an anomaly by judging if the sample’s Mahalanobis distance exceeds the threshold. Secondly, contribution analysis is applied to anomalous samples in order to locate anomalies. The RMPCM method can be divided into 3 steps in detail which include normal traffic modeling, anomaly detection, and anomaly localization.

4.1. Introducing the Model

Applying network anomaly detection algorithm in the real world may encounter difficulties such as data loss in the process of transferring and collecting and modeling deviation caused by noise interference.

Traditional network anomaly detection algorithms will not be applicable any longer in the condition of incomplete data. It is considered to adopt Bayes method, but because of the complexity of network traffic data posterior mean estimation and asymptotic variance cannot be directly derived from this method. Therefore a latent variable probabilistic model is to be introduced, meaning some “latent data” are to be added in the known data in order to simplify the parameter estimation. In this process missing data along with unknown parameters treated as “latent data” will be solved by applying expectation-maximization (EM) algorithm to achieve the maximum likelihood estimation (MLE) of model parameters.

When computing MLE, probabilistic distribution of known data is required. Normally it is assumed that they are normally distributed, but because of some anomalous traffic, this assumption will cause parameter estimation to have a large deviation; therefore, multivariate Gaussian distribution is replaced by -distribution in our paper. Compared with Gaussian distribution, -distribution has heavier tails, which is a desirable property to handle data sets in the presence of anomalies. The explanation is as follows.

Specifically, suppose sample data are recorded ( represents the number of the samples), and one assumes that they are independent identically distributed Gaussian random vectors:in which is the dimension of the sample, is the mean vector function with parameter , and is the covariance matrix with parameter .

-distribution model replaces the above model, which is is also the dimension of the sample, is the location vector, is the scale matrix, and is degree of freedom for -distribution; probabilistic density function (p.d.f.) is calculated as where is the squared Mahalanobis distance and is Gamma function. When , -distribution model has a better robust against outlier’s interference over normal distribution by applying MLE. This is due to outliers that have relatively large Mahalanobis distance but comparably small contribution for model parameter estimation. The estimation of is taken as example to demonstrate the following.

For Gaussian distribution model (1), the equation for its MLE is , where is the corresponding log-likelihood and is the matrix of partial derivatives of with respect to ; the MLE of under -distribution model (2) satisfies , whereis the weight assigned to sample . The weight is a function of Mahalanobis distance which indicates that decreases while increases. The anomaly sample has a relatively larger ; therefore is smaller, meaning adopting -distribution in model parameter estimation lowers the sensitivity to anomaly samples. Compared with Gaussian distribution, -distribution model has a stronger robustness.

4.2. Normal Traffic Modeling
4.2.1. Normal Traffic Modeling in Noisy Traffic

RMPCM models normal traffic in noisy traffic by establishing a latent variable probabilistic model based on multivariate -distribution, and the procedure is illustrated in Figure 3.

In order to solve the problem that Gaussian noise models are too sensitive to atypical observations such as anomalous traffic observations, we suppose the noise is drawn from -distribution model instead of a Gaussian noise model. Assume each -dimensional latent vector comes from a linear probabilistic projection of -dimensional eigenvector , and we build a latent variable probabilistic model where we select a unit variance -distribution as the prior distribution on the latent vectors. The probabilistic model iswhere is the projection matrix, is the location vector, and is the unit matrix.

It is inapplicable to be analyzed and resolved using MLE directly. As noted in [28] -distribution model can be extended to an infinite Gaussian mixture model with the same mean where the prior distribution on is a Gamma distribution with parameters depending only on -distribution’s degrees of freedom :where is the probability density function of Gamma distribution. Based on (6), latent variable is introduced, and is a scalar, . As can be known in [29], if , then the conditional distribution of is Gaussian: . The latent variable model can be derived as follows. Figure 4 indicates the relationship of variables in the model:

In order to calculate model parameters, and are also needed to be calculated, and from (8) and (9), it can be solved that

Set , which is matrix, and the marginal distribution of follows -distribution, which is .

From (7) and (10) it can be calculated thatwhere .

From (8) and (9) it can be computed that where is matrix and follows -distribution, which is .

The above procedure establishes the latent variable probabilistic model by replacing Gaussian distribution model with -distribution model.

The model parameters need to be estimated. Parameter is the intrinsic dimension of traffic matrix . As shown in Figure 3 the value of is determined by scree plot based on PCA, and the details are in Section 6.1. Parameters are estimated by MLE, where corresponding log-likelihood isMLE can be calculated by applying EM algorithm. In order to simplify the calculation, REM (rapid expectation-maximization) is employed in this paper. REM remarkably increases the algorithm’s rate of convergence. This algorithm consists of 2 stages, estimating different parameters by applying EM algorithm in every stage, and does 2 stages’ iteration until satisfying the condition of convergence.

The First Stage. It is only estimating , ignoring in the first stage [30].

The log-likelihood isE-Step. As can be known from Jensen inequality, when , the inequality becomes equality, and the lower bound of is set. Its expectation can be calculated from (10), (11), and (14) and get (15), where is the expectation operator: where

M-Step. Maximization of with respect to can give the estimated value of . This process can be implemented by setting the partial derivative of with respect to at 0:

The Second Stage. In the second stage, REM algorithm introduces the latent variable and estimates parameters . The estimated value from the first stage is used in the second stage.

The log-likelihood is

E-Step. When , the expectation of can be calculated from (7), (8), and (9):where is in (16). Considerwhere denotes the digamma function in (23).

M-Step. Maximizing with respect to and gives the following updating formulas of :

The MLE of can be computed by solving the following equation by line search:

4.2.2. Normal Traffic Modeling with Data Loss

If data loss occurs on some dimensions, -dimensional data can still be used to conduct estimation for model parameters. The sample data can be divided into observed data and missing data; that is,Its mean and covariance can be also divided into blocks:

As noted in [31], , and then .

Thus

The procedures for normal traffic model establishment with data loss are shown in Figure 5. When calculating model parameters, the method of determining is the same as Section 4.2.1, and REM is also used to implement estimations of model parameters under the condition of data loss.

The First Stage

E-Step. Considerwhere

M-Step. Maximization of with respect to results in the updating formula of :

The Second Stage. In the second stage, the latent variable is introduced, and then the algorithm estimates parameters . The estimated value from the first stage is used in the second stage.

E-Step. Considerwhere is in (31). Consider

M-Step. Maximizing with respect to and gives the following updating formulas of :

The updating formula of is the same as (25).

The algorithm in Section 4.2.2 is applicable when data loss occurs as well as noise interference.

4.3. Anomaly Detection

Anomaly traffic flow samples of complex traffic flow data need to be determined by choosing measurement standards. There are mainly 2 strategies for determining anomaly samples: one is to determine whether samples are leverage outliers by judging if Hotelling’s exceeds the threshold; the other is to determine whether samples are orthogonal outliers by judging squared prediction error (SPE) which exceeds the threshold [32]. Because probabilistic model is established in this paper and Mahalanobis distance is employed to conduct anomaly measurement, there is no need for the two strategies [33].

For intact data samples, the squared Mahalanobis distance is

For samples containing some of dimension loss, the squared Mahalanobis distance is to be computed using its expectation:

Normal distribution “” control chart is adopted to determine anomaly, and the choosing reason is discussed in Section 6.2. When anomaly occurs in time series, corresponding distribution goes beyond the control boundary.

Establishing time series of the squared Mahalanobis distance as , its mean is , and the variance is , and then the configuration of “” control chart is as follows:Centre line: .Upper control limit: .Lower control limit: .

According to the Gaussian distribution, where is the probability distribution function of standard normal distribution.

Adopting normal distribution “” control chart, it is certain that anomaly occurs when the values deviate from by more than 3 times standard deviation , and its confidence coefficient is 99.74%.

4.4. Anomaly Localization

After confirming anomalous samples, it is needed to be determined which dimension (corresponding to OD) of the selected anomalous sample should be responsible for the anomaly, which is anomaly localization. The anomalous sample is a -dimensional vector, and is the th-dimensional variable of , adopting following contribution analysis (as shown in Figure 6) to locate OD of anomaly occurrence, which is divided into 2 stages.

Stage One. Eliminate any in , and get and .

Compute the conditional distribution of the selected anomalous sample with the th-dimensional variable missing from formula (28) to derive corresponding and , and compute the squared Mahalanobis distance with formula (37).

Calculate the contribution of using . It represents the degree of change in anomaly measurement after a certain dimension of anomaly sample missing.

Take each different value of , and repeat . If is smaller than the threshold of anomaly discriminant, the corresponding dimension is mainly responsible for the anomaly, since its elimination would bring the sample back to normal.

Stage Two. If there is not any in Step of Stage 1 smaller than the threshold of anomaly discriminant, it is indicated that the anomaly is not caused by anomalous traffic of only one OD flow but multiple OD, and then Stage 2 is to be carried out.

Arrange of every dimension of in descending order.

Remove the first two dimensions of after the arrangement in , and if is smaller than the threshold, the corresponding two dimensions are responsible for the anomaly.

Otherwise, sequentially increase the number of missing dimensions in the order arranged in until corresponding is smaller than the threshold, and then several missing dimensions are jointly responsible for the anomaly.

4.5. Algorithm Complexity Analysis

In RMPCM the major overheads are the inverse of and iterations of REM. is matrix; is dimensions of corresponding to the number of columns of traffic matrix, which is the number of OD . In the calculation process, directly computing affects the algorithm complexity significantly, and Woodbury matrix identity is adopted in the paper: , where is matrix. is determined by applying PCA dimensionality reduction, and (refer to Section 6.1). Calculating the inverse of matrix is changed into inverting matrix , which simplifies the algorithm complexity noticeably, and the time complexity is . The time complexity is also related with the iterations of EM algorithm. Iterative times are smaller than 15 in the paper. Applying Matlab to run RMPCM anomaly detection algorithm on selected data, execution time is shown in Table 1, and computer specifications are Windows 7, i7 3.5 GHz CPU, and 4 GB RAM. In the process of implementation the calculation will be faster when converting Matlab code into binary executable program.

5. Evaluation

Normally there are 3 methods that can be applied to assess the performance of network anomaly detection algorithm: network traffic simulation experiment [10, 1416], testbed experiment [19], and real network data analysis [9, 10, 20, 32, 34]. There are pros and cons regarding each method. For example, network traffic simulation experiment produces synthetic data which can be entirely controlled by researchers, but it is not so close to reality; real network data analysis is closest to reality but it is difficult to establish a benchmark; testbed experiment gives attention to both reality and controllability, but the real output data might also deviate from the experiment settings. In order to assess the performance of RMPCM more objectively, all the three of them were combined to conduct analysis. Experimental content and method are shown in Table 2.

RMPCM will be compared with the anomaly detection method based on subspace construction via PCA and its improved method ANTIDOTE [15] to assess its real performance. As it is known, the anomaly detection method based on PCA is accepted and applied commercially (a commercial anomaly detection system (NetReflex by Guavus) is based on the well-known anomaly detector using PCA [34]). Therefore, the results should be persuasive.

5.1. Network Traffic Simulation Experiment

Network anomalous traffic, especially some poisoning attack traffic, may cause the skewing of detection model, which would significantly decrease the performance of anomaly detectors [14, 22]. In this section the performance of RMPCM under poisoning will be assessed.

5.1.1. Anomaly Generation

The structure of network-wide traffic was revealed for the first time by Lakhina et al., which is that OD flows consist of large number of periodic and deterministic trends, some of the noises, and few of spikes [8]. In order to be closer to the reality, the real network data as shown in Table 1 is chosen as the blueprint of simulation, and Discrete Wavelet Transform is adopted to obtain periodic and deterministic trends of the real data which are the normal daily mode by eliminating noises and spikes [10]. After that, artificial noises and anomalies are synthetically injected into the periodic daily mode components. The process is as follows.

Daubechies 5 orthogonal wavelet is adopted for multiresolution analysis, and high-frequency components containing noise and anomalies are removed by filtration from 121 OD flows’ data in the traffic matrix, which computes the periodic smoothing components as the basis, as shown in Figure 7(a).

Zero-mean Gaussian white noise is injected after , as shown in Figure 7(b).

Typical anomalies are injected after and , as shown in Figure 7(c).

121 OD flows are produced in this way. Data collection occurs every 5 minutes, which is marked as one collection cycle. Traffic matrix with 2016 rows and 121 columns is created with one week’s collections in 121 OD flows.

As we mainly focused on traffic volume anomalies, five kinds of typical anomalies were simulated: DoS, DDoS, ALPHA, ingress/egress shift, and flash crowd. Their brief description is shown in Table 3. Table 4 indicates how we have them injected.

5.1.2. Poisoning Generation

The poisoning method Add-More-If-Bigger discussed in [15] was employed in this section. It suggests injecting poisoning traffic flow into targeted OD flow to increase its variance, which skews the detector’s normal traffic model and thereby increases the escape probability of anomalies. This poisoning method requires attackers to grasp the real-time traffic volume of background flows. Add-More-If-Bigger means attackers inject poisoning traffic when the background traffic volume exceeds its mean value. The volume of poisoning traffic is , where , is the volume of targeted OD traffic at the moment of . is a function with parameter which determines the degree of poisoning. In this experiment means medium poisoning, and means high poisoning.

5.1.3. Anomaly Detection under Poisoning

Anomalies were injected according to the method in Section 5.1.1. Different anomalies usually have different characteristics; thus injection should also be used in a different way. Usually DoS or DDoS attacks are the gradually increasing traffic volume of single or multiple OD flows, which normally lasts 5 to 30 minutes; ALPHA is always the sharp rise of single OD flow with uncertain duration; flash crowd occurs when traffic volume of multiple OD flows uprises fast and goes back to normal gradually; ingress/egress shift is likely to show a step change of 2 OD flows’ volume, which normally lasts a long time until the routing policy change. According to their own properties, 5 ALPHA attacks were injected in 100~500 with its volume accounting for 50% of the mean volume of OD flows; each of them lasted 20 minutes; at 600, an hour’s flash crowd anomaly was injected into OD flows from 3 origin nodes to one destination node, the volume of which was 10%~40% of the mean volume of OD flows; at the time of 700, 200 minutes’ ingress/egress shift were injected, and the volume of transferred traffic was 50% of the mean of OD flows; 5 Dos attacks were injected in 1000~1400 with its volume accounting for 30%~50% of the mean volume of OD flows, and every one of them lasted 30 minutes; at the time of 1700, 30 minutes’ DDoS attacks were injected into OD flows from 4 origin nodes to 1 destination node, and the volume of the anomalies is from 5% to 30% of the mean volume of OD flows.

Subspace construction via PCA and ANTIDOTE was selected in order to be compared with RMPCM under exactly the same circumstance to evaluate its detection performance under poisoning, and we plot the squared norm of the residual vector as a function of time to show PCA and ANTIDOTE method results. Receiver Operations Characteristic (ROC) curve is also applied to the overall performance estimation of three above methods. The axis means false positive rate (FPR), and axis represents true positive rate (TPR). ROC curve can display the relationship between TPR and FPR with the change of thresholds. The area under the curve is the performance indicator. The larger the area is, the better the performance is.

The experiment was conducted with the data generated in one week to compare those three methods. Three ROC curves are drawn corresponding to no poisoning, medium poisoning, and high poisoning (Figure 10), and part of the experiment results is shown in Figures 8 and 9. Before poisoning (Figure 8) these three methods have a similar performance. After medium poisoning (Figure 9) the volume of the residual vector of background traffic increases significantly in PCA method, and the performance of PCA reduces dramatically because lots of anomalies submerge in the background traffic (Figure 9(b)); ANTIDOTE performs well against poisoning, but low volume anomalies could easily escape its detection (Figure 9(c)); RMPCM has a stable performance, which does not decline too much after injection (Figure 9(a)). Figure 10 reveals that poisoning has a large impact on PCA anomaly detector, and FPR increases quickly after TPR exceeded 60%; RMPCM and ANTODOTE have a better antipoisoning ability; the detection performance of RMPCM is better than ANTIDOTE because its ROC curve rises up more quickly.

5.1.4. Anomaly Localization Test

Table 5 illustrates the setting of anomaly locating test, and a single OD or multiple OD flows were injected with anomalies generated in Section 5.1.3. The result of anomaly localization test in RMPCM method is shown in Figure 11, the histograms represent anomaly contribution degree of each OD at the time of anomaly occurrence, and the result agrees with the setting from Table 5. As shown in Figure 11(a), anomaly contribution degree of OD50 noticeably exceeds the contribution degree threshold; therefore it is certain that anomaly at the time of 300 occurs on OD50; from Figures 11(b), 11(c), and 11(d) it is found that there is no single OD contribution degree exceeding the threshold, and this requires locating multiple OD when anomalies occur.

5.2. Testbed Experiment

Cyber-defense technology experimental research laboratory testbed (DETERLab) [35] proposed by the University of Southern California is chosen to conduct the testbed experiment. Not only can DETERLab interconnect nodes in the prototype system with any topological structure, but also its configuration for experiment condition is flexible. It provides researchers with needed background traffic and attack traffic injection for conducting network attack-defense experiments and then deploys and evaluates feasible solutions. It can fully integrate local hardware resources and has a better simulation advantage over software such as NS2.

In our experiment attacking tools based on Metasploit frame was integrated into DETERLab’s security experimentation environment (SEER) toolset to generate multiple anomalies on the DETERLab platform. Our experiment sets up 10 PoP nodes and chooses the adjacent node of each PoP node as collection device, and the topological configuration is shown in Figure 12. The duration of the experiment is one week, data are collected every 5 minutes which is a collection cycle, and there are total 2016 cycles for a week. The collecting data is byte count of traffic flows.

5.2.1. Anomaly Detection in Noisy Traffic

In order to verify RMPCM’s performance in noisy traffic, the experiment set up three situations to compare with the detection method based on subspace construction via PCA, which test the detection accuracy, factors impacting performance, and poisoning of large anomalies, respectively, in the two methods.

At the time of 500 and 1000 DoS attacks using TCP SYN Flood were initiated from PoP1 to PoP2, at the time of 1800 DoS attacks were carried out from PoP3 to PoP4, and the duration of DoS attacks was all 4 cycles; at the time of 800 port scan was initiated from PoP1 to PoP2, PoP5, and PoP6 applying by Nmat, and the duration was 5 cycles; at the time of 1200 ingress/egress shift was initiated by transferring 50% of the traffic volume of the OD path (PoP1 to PoP2) to another (PoP7 to PoP8), and the duration was 40 cycles; at the time of 1500 DDoS attacks using UDP Flood on PoP10 were initiated from PoP2, PoP4, PoP5, and PoP8 simultaneously, and it lasted 6 cycles. Figure 13(a) and Table 6(a) illustrate the test results applying RMPCM and subspace construction via PCA. Both methods succeed in detecting 6 anomalies, but the latter does not detect anomalies every time in the corresponding cycles of anomaly occurrence, especially for the 1200–1239 ingress/egress shifts. The anomaly cycles detected by PCA are shorter than the settings, while RMPCM made a notable improvement over PCA.

In order to reveal the impacting factors and performance differences of two methods further, anomalies were adjusted. The volume of DoS attacks beginning at 500 and 1800 was reduced by 50%; the range of port scan beginning at 800 decreased to the range only from PoP1 to PoP2, and scanning frequency was deducted by 50% as well; the duration of ingress/egress shift beginning at 1200 was cut down by 20 cycles; DDoS attacks beginning at 1500 were narrowed down to the range from PoP2 and PoP4 to PoP10 and kept the same attack volume; DoS attacks beginning at 1000 remained unchanged. Figure 13(b) and Table 6(b) indicate that the change of volume and range of anomalies would influence the performance of both of the methods, and the impact of duration deduction is hardly noticeable. PCA gives us false positive alarms at total of 10 time points including 1272, 1407, and 1451, which is higher than RMPCM obviously.

The attack volume of DoS increased by 220% to produce a large anomaly, and other factors remained the same as the first test. A large anomaly could raise variance level of the path that it was on, and other anomalies causing small variance would be mistaken for a normal event in this situation. Figure 13(c) and Table 6(c) show that the large anomaly leads to the dramatic decrease in the detection accuracy of PCA method. OD paths of anomaly occurrence contain the OD path of the large anomaly (PoP1 to PoP2), and then the corresponding anomaly detection rate is affected: Dos beginning at 1000 and the large anomaly are on the same path, and the Dos attack is completely missing from the figure; ingress/egress shift involves 2 OD paths which contain the OD path of the large anomaly, and the residual vector also seriously decays and does not reach the detection threshold; port scan and DDoS also contain OD path where the large anomaly occurs, and its residual vector is affected in varying degrees; DoS beginning at 1800 does not involve the OD path of the large anomaly, thus the detection to the above DoS attack is not impacted. Comparably RMPCM has a strong robustness, and detection accuracy is not affected.

The above experiment demonstrates RMPCM has a better detection performance than the method based on subspace construction via PCA. Add-More-If-Bigger poisoning experiment in Section 5.1.3 and large anomaly poisoning experiment verify that RMPCM has a strong robustness under the poisoning of variance injection.

5.2.2. Anomaly Detection with Data Loss

Data loss may be caused by network malfunction, device breakdown, and so on; thus algorithms based on conventional nonstatistical model become inapplicable because of the incomplete input data. RMPCM based on the latent variable probability model of multivariate -distribution has its distinct advantage when solving the problem of data loss. The detection performance of RMPCM with data loss is tested on DETERLab. The scenarios are set to the occurrence of data loss when facing link fault, collection device breakdown, and PoP node malfunction. Link fault will incur data loss when OD flows go through the link; collection device breakdown will cause the data loss of OD flows whose source node is the node connected with the broken-down device; data loss caused by PoP node malfunction is related with malfunction types, network topology, and router strategies. Figure 12 displays experimental topology configuration, and 100 anomalies were injected in the way that was used in Section 5.2.1. In order to eliminate occasionality 10 experiments were conducted for each type of malfunctions, 403 cycles and 1008 cycles were randomly selected (which accounted for 20% and 50% of all 2016 cycles in one week resp.), in which a randomly selected link (collection device, PoP node) broke down, and the malfunction duration lasted 20 cycles. The mean of 10 experiment results was used to draw the ROC curves. The experiments results indicate that link fault, collection device breakdown, and PoP node malfunction regularly and increasingly deteriorate the accuracy of anomaly detection, but overall RMPCM has better robustness in this situation. Its TPR is close to 70% when FPR is 20% even in the worst scenario (PoP node malfunction occurred in 50% cycles) (Figure 14(c)).

5.2.3. Anomaly Localization Test

Table 7 illustrates the setting of anomaly locating test, and a single OD or multiple OD flows were injected with anomalies generated in Section 5.2.1. The result of anomaly localization test in RMPCM method is shown in Figure 15, the histograms represent anomaly contribution degree of each OD at the time of anomaly occurrence, and the results agree with the setting from Table 7. As shown in Figure 15(a), anomaly contribution degree of OD2 noticeably exceeds the contribution degree threshold; therefore it is certain that anomaly at the time of 502 occurs on OD2; from Figures 15(b) and 15(c) it is found that there is not a single OD contribution degree exceeding the threshold, and this requires locating multiple OD when anomalies occur. Some problems of anomaly localization will be discussed in Section 6.5.

5.3. Analysis of Real Network Data
5.3.1. Data Sets

Real network data sets are obtained from backbone network Abilene [2, 9, 10, 16, 18, 20], whose main users are American universities and scientific institutions. We choose the data from Abilene data sets dating from 15 December 2003 to 21 December 2003 for its completeness and convenience for reference. The raw data is NetFlow data from 11 PoP nodes, the access point and exit point of each flow are obtained on the basis of BGP and ISIS routing table, and then the volume of OD flows and traffic matrix can be calculated, as shown in Table 8. Detection performance evaluation with data loss and sensitivity analysis were conducted with the data set.

5.3.2. Anomaly Detection with Data Loss

When conducting anomaly detection with data loss, data set was chosen to implement four kinds of data loss mechanisms discussed in Section 3.2.1. In order to clearly identify the different performance of RMPCM in different data loss mechanisms and evaluate the factors influencing the performance, nearly the same loss rate was assigned to PureRandLoss, PeriodRandLoss, and ODRandLoss; the influence of different volume of missing piece on the detection performance was analyzed under PieceRandLoss. In order to eliminate accidents, 10 experiments were conducted, and mean values were calculated.

The true anomalies in the real network traffic data could be hardly obtained exactly, and RMPCM and PCA generated very similar alerts with the complete data set (as shown in Figure 16), so the benchmark was set to be the detecting result of RMPCM with complete data in order to test the performance of RMPCM under the situations of data loss.

In the experiment of PureRandLoss, three kinds of loss rates were selected which accounted for 10%, 20%, and 50% of total data; in PeriodRandLoss the missing periods were set to be 200, 400, and 1000, as the total periods were 2010; data loss rates in here were close to 10%, 20%, and 50%; in ODRandLoss we chose half adjacent data of a certain column to be empty because of the algorithm limitation that column of traffic matrix could not be empty entirely, lost OD numbers were set to be 24, 48, and 121, and the loss rate per OD was 50%; therefore, the total loss rates of ODRandLoss were still 10%, 20%, and 50%. Parts of the results and ROC curves under three kinds of loss mechanisms, respectively, are displayed in Figures 17(a), 17(b), and 17(c). Three tests above indicate the relatively high accuracy of RMPCM: when the loss rate is 10%, the effect on anomaly detectors is slight, and the detection performance is very similar to that of RMPCM and PCA with complete data; as the loss rate increasing the performance begins to decline, even when the loss rate achieves 50%, test results are still applicable. The tests also demonstrate that ODRandLoss has the biggest impact on the performance of anomaly detectors, followed by PeriodRandLoss and PureRandLoss. The total data loss remains the same when applying these three methods, but the performance of anomaly detectors deteriorates regularly, and it is highly probable that this is due to the increase of structured loss of data. The volume of missing piece in PureRandLoss is smallest, while volume of missing piece in ODRandLoss is largest.

In order to verify the impact of structured loss on the detection performance, we conducted experiment with PieceRandLoss, and the total volume of loss is set to be the same while the volume of each missing piece is set at different sizes, which are 5 5, 16 16, and 40 40, and the numbers of missing pieces were 2000, 200, and 30, respectively; the volume of total loss accounts for 20% of whole data. Parts of the results and ROC curves in Figure 17(d) indicate that the performance of anomaly detector declines significantly when the volume of structured loss increases while the total loss rate remains the same. However, RMPCM achieves about 70% TPR with less than 20% FPR when applying PieceRandLoss overall, which means the performance could satisfy the requirement.

5.3.3. Sensitivity Analysis

The authors of [22] point out that the detection algorithm based on subspace construction via PCA is too sensitive to the change of intrinsic dimensions and traffic measures; therefore, the same issues should also be considered with RMPCM. The test will be conducted in two scenarios: one is to analyze the sensitivity to the change of intrinsic dimension ; the other is to analyze the sensitivity to the change of traffic measures. The results of RMPCM are to be compared with the algorithm based on subspace construction via PCA.

Real network data set in Table 8 was selected to conduct the sensitivity to the change of intrinsic dimension analysis. Intrinsic dimension varies depending on the accumulative variance contribution of principal components which is defined beforehand. The experiment results are displayed in Figure 18. Figures 18(a) and 18(b), respectively, represent detection curves when PCA model’s intrinsic dimension was set to be 4 and 5, as it can be seen that the detection results vary largely; Figures 18(c) and 18(d) represent corresponding curves of RMPCM, and the detection results agree with each other. The further procedures are taken to test varying from 2 to 10 when using RMPCM, and test results are almost identical, which suggest that RMPCM is highly robust.

Real network data sets , , and in Table 8 were employed to analyze the sensitivity to the change of traffic measures. The accumulative variance contribution of principal components was set to be the same value of 0.85 when using different traffic measures. The experiment results are shown in Figure 19. The detection results in three kinds of measures (, , ) are completely different by applying PCA. While using RMPCM the baselines of three curves are similar to each other, and three test results are highly correlative. Although benchmarks of anomalies in real network data sets could not be acquired, the statistical data in three kinds of measures (flow, packet, and byte) has interrelationship; therefore corresponding anomaly detection results should somewhat coincide. PCA produces too much conflict, which indicates it is much sensitive to the change of traffic measures, while RMPCM generates more satisfying results and shows a strong robustness.

To summarize, RMPCM is convenient for practical implementation for its lower sensibility and higher robustness to the change of parameters such as intrinsic dimensions and traffic measures.

6. Discussion

Issues like parameter selection, distribution characters of data source, and so forth will be discussed further in this section.

6.1. Intrinsic Dimension

The Internet traffic matrix has this characteristic of low dimensionality. The intrinsic dimension of the matrix needs to be determined at the beginning in the process of modelling normal traffic by applying RMPCM. The scree plot of data sets of , , and in Table 8 is drawn using PCA, as is shown in Figure 20. The accumulative variance contribution threshold is set to 0.85, and then intrinsic dimension of these three data sets are all close to 5, which is far from their dimension of traffic matrix. We then go further to Abilene traffic data in the following three weeks and find that their intrinsic dimension is considerably stable. What is more, the experiment in Section 5.3.3 also shows that RMPCM performance keeps high stability when varies from 2 to 10, so we chose in this paper. In fact, when implementing RMPCM in real situation, if network does not change a lot, the intrinsic dimension of traffic matrix doesn’t need to be changed, and there is no need to determine the intrinsic dimension every time before detecting. This significantly reduces the complexity of computation.

6.2. Anomaly Detection Threshold

When network traffic is in normal status, the distribution of the squared Mahalanobis distance of traffic samples is close to Gaussian distribution. In order to verify this, the normal probability plot of in Table 8 is drawn by normplot of Matlab, as is shown in Figure 21. Samples are mainly distributed on the normal line, which are concentrated in the middle and sparse on both ends. A Jarque-Bera test is employed in order to further verify its normality. Since the samples’ squared Mahalanobis distance almost follows Gaussian distribution, “” control chart is taken for anomaly detection in Section 4.3.

6.3. Distribution Characters of Data Source

Conventional network-wide traffic anomaly detection algorithms usually assume that the traffic matrix elements are drawn from a Gaussian or Gaussian-like distribution [13]. We conduct statistical analysis for real network data sets in Table 8 and draw the normal probability plot. We find that the data (traffic volume) exhibit a highly skewed distribution, many of which are diverged from the normal line, and the density of samples scattered on two ends is higher than Gaussian distribution (Figure 22 illustrates some of drawn results). Frequency histograms with superimposed normal density curves are then plotted (as partially shown in Figure 23), which indicates that real traffic volume’s heavy-tailed feature is more noticeable compared with Gaussian distribution. This is naturally one of the reasons why Gaussian distribution is replaced by multivariate -distribution when modeling the traffic matrix.

6.4. False Positives and False Negatives

Compared with other methods listed in the paper, RMPCM achieves a better performance on reducing false positives and false negatives in the experimental scenarios. One reason is that it is more accurate for RMPCM to describe the traffic data. Conventional traffic anomaly detection methods usually assume that the traffic data are drawn from a Gaussian or Gaussian-like distribution. We conduct statistical analysis for real network data sets, and it indicates that real traffic volume’s heavy-tailed feature is more noticeable compared with Gaussian distribution. So we use the multivariate -distribution as prior distribution when modeling the traffic data. Another reason is that introducing a latent variable probabilistic model based on -distribution can relieve the noise interference and achieve a better performance. In addition, RMPCM method is within a probabilistic framework, making it attain advantages over traditional method of nonprobabilistic model (such as subspace construction via PCA (Lakhina et al.)) in handling missing data.

However, RMPCM also suffers from false positives and false negatives to some degree. This is due to the dynamic nature of the network traffic flows, which inevitably leads to some deviation when describing the change of traffic by applying this algorithm based on baseline. Moreover, false positives and false negatives will occur under the circumstances in which anomalous traffic generated in the experiments is excessively small or large along with the interference of background traffic. But on the whole the false positives and false negatives of RMPCM can meet the needs of engineering practices.

6.5. Problems on Anomaly Localization

The anomaly localization in scenarios with more anomalous OD is discussed here. The experimental environment is DETERLab with 10 PoP nodes the same as in Section 5.2. The benchmark is set up with the scenario where DDoS attacks took place from 1~9 nodes to one node. When the number of anomalous OD is from 1 to 4, the results agree with the setting. When the number of anomalous OD is 5, the preset anomalous OD are OD20, OD40, OD50, OD60, and OD80, but the localization results are OD20, OD39, OD40, OD60, and OD80, as shown in Figure 24. During the experiment, it is found that the accuracy of anomaly localization decreases with the increasing number of anomalous OD, as shown in Figure 25. This problem will be researched further in depth in the future. There are several preliminary considerations for this taking place.

(a) With the increasing number of anomalous OD, the volume of anomalous traffic also increases significantly, if the volume of anomalous traffic is excessively large which exceeds the anomalous observations tolerance of the proposed approach, and then it is highly likely to impact the detection performance and location results of the proposed approach.

(b) In addition, with the increasing number of anomalous OD and the volume of anomalous traffic, the impact on every single OD becomes relatively smaller, which brings negative impact on the accuracy of anomalous OD localization.

7. Conclusion

In conclusion, traditional network-wide anomaly detection methods have actual problems of performance reduction or being unavailable when noise interference or data loss takes place, and in order to solve these problems and advance anomaly detection and localization, we propose a network-wide approach based on robust multivariate probabilistic calibration model in this paper. The analysis conducted with simulations, DETERLab experiments, and real data from the Internet indicates that the performance of RMPCM is better than PCA and ANTIDOTE and has a better robustness. Regardless of data loss or noise interference, RMPCM demonstrates a stable performance and less sensitivity to the change of parameters. RMPCM can be also applied to locating anomalies. In the future, we will take our researches on more accurate and fine-grained anomaly localization and online RMPCM algorithm.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by the National Grand Fundamental Research 973 Program of China under Grant no. 2012CB315901 and no. 2013CB329104; Science and Technology Commission of Shanghai Municipality Research Program under Grant no. 13DZ1108800; Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory Research Fund.