Abstract

Distributed denial of service (DDoS) attacks have caused huge economic losses to society and have become one of the main threats to Internet security. Most current detection methods rely on a single feature and fixed model parameters, so they cannot effectively detect early DDoS attacks in cloud and big data environments. In this paper, an adaptive DDoS attack detection method (ADADM) based on multiple-kernel learning (MKL) is proposed. Based on the burstiness of DDoS attack flow, the distribution of addresses, and the interactivity of communication, we define five features to describe the characteristics of network flow. Within an ensemble learning framework, the weight of each dimension is adaptively adjusted by increasing the interclass mean with gradient ascent and reducing the intraclass variance with gradient descent. The classifier that identifies early DDoS attacks is established by training two simple multiple-kernel learning (SMKL) models, one driven by interclass mean squared difference growth (M-SMKL) and one driven by intraclass variance descent (S-SMKL). A sliding window mechanism is used to coordinate S-SMKL and M-SMKL to detect the early DDoS attack. The experimental results indicate that this method can detect DDoS attacks early and accurately.

1. Introduction

In recent years, the security of computer networks, chips, virtual networks, and mobile devices has been of wide concern [1–3]. As an important platform for information exchange, the computer network has attracted particular attention to its security. Within computer network security, the distributed denial of service (DDoS) attack has remained an unsolved problem for a long time. DDoS is a traditional network attack method. The attacker controls a large number of zombie machines that send a large number of invalid network request packets to a target host. These packets consume and needlessly occupy the resources of the server, so that normal users are unable to use the services provided by the target host [4]. Although the DDoS attack mode is relatively simple, its destructive power is far greater than that of other network attacks. Moreover, this traditional attack method can still cause great damage to the Internet, and the frequency of attacks, the losses they cause, and their complexity, diversity, and difficulty of defense have all increased in recent years [5]. In June 2016, an ordinary U.S. online jewelry sales website was flooded with 35,000 HTTP requests (spam requests) per second, making the site unable to provide normal services. In October of the same year, DynDNS, which provides dynamic DNS services in the United States, was subjected to large-scale DDoS attacks, resulting in access problems for multiple websites using DynDNS services, including GitHub, Twitter, Airbnb, Reddit, Freshbooks, Heroku, SoundCloud, Spotify, and Shopify. Twitter was even left with zero visits for nearly 24 hours. DDoS attacks have such great destructive power because they use a large number of zombie machines to attack a single target. Each zombie machine has powerful computing capability, and the combined distributed processing capability of these machines can easily leave a server unable to serve normal users [6].
On the other hand, DDoS attacks are easy to implement. Unlike other network attacks, DDoS attacks require only a large number of zombie machines and a small amount of network security knowledge to launch an effective attack. This easy-to-grasp network attack method makes the DDoS attack more powerful.

At present, under the traditional network environment, methods for defense against DDoS attacks mainly include attack detection and attack response [7]. DDoS attack detection is based on attack signatures, congestion patterns, protocols, and source addresses as an important basis for detecting attacks, thereby establishing an effective detection mechanism. The detection model can be roughly divided into two categories: misuse-based detection and anomaly-based detection. Misuse-based detection is a technique based on feature-matching algorithms. It matches the collected and extracted user behavior features with the known feature database of DDoS attacks to identify whether an attack has occurred. Anomaly-based detection is adopted by monitoring systems. By establishing the target system and the user's normal behavior model, the monitoring systems can determine whether the states of the system and the user's activities deviate from the normal profile and can judge whether there is an attack. The attack response is to properly filter or limit the network traffic after the DDoS attack is initiated. The attack traffic to the attack target host is reduced as much as possible to mitigate the influence of the denial of a service attack.

With the rise of cloud computing technologies and software-defined networking (SDN) concepts, DDoS attack detection based on cloud computing environments and software-defined networks has received widespread attention [8, 9]. As a new computing model, cloud computing has powerful distributed computing capabilities, massive storage capabilities, and diverse service capabilities [10, 11]. It has become an important means of solving big data problems [12]. Therefore, establishing a cloud platform system is a necessary measure to effectively ensure cloud computing's reliability, stability, and security [13–15].

In recent years, machine learning has been applied to the field of security [17], and constructing attack detection models with machine learning has become widespread [18, 19]. Machine-learning methods play an important role in the traditional network environment, the cloud environment, and the software-defined network architecture, because they can mine the important information hidden in the data and combine it with prior knowledge to discriminate and predict new data [20]. Therefore, compared with traditional detection methods, machine-learning methods can achieve better detection accuracy [21–25]. The above analysis of defense measures shows that the traditional network environment, the cloud environment, and the software-defined network architecture all involve attack detection in their defense mechanisms against DDoS. Therefore, studying the use of machine-learning methods to identify DDoS attacks is of great significance. However, the data generated by a DDoS attack are often bursty and diverse, and the volume of background traffic also has a considerable impact on the detection model, which reduces the model's detection accuracy.

To solve the above problems, we propose a multiple-kernel learning DDoS attack detection method. The method uses dedicated algorithms to extract five features and combines two multiple-kernel learning models with adaptive feature weights to distinguish attack flows from normal flows. To further improve the accuracy of DDoS attack detection, a sliding window mechanism is employed to coordinate the detection results of the two multiple-kernel learning models. Experiments show that our method can better distinguish DDoS attack flow from normal flow and can detect DDoS attacks earlier.

2. Related Work

DDoS attacks can cause tremendous damage to a network and often subject the attacked party to great economic losses. They are one of the main ways that hackers initiate cyberattacks.

To reduce the damage of DDoS attacks, researchers have proposed a large number of attack detection methods in recent years. According to the application scenario, these methods can be divided into three categories: the detection method in the conventional network environment, the detection method in the cloud environment, and the detection method in the software-defined network (SDN) environment.

The conventional network environment refers to the Internet environment generally built on the Open Systems Interconnection (OSI) reference model. In this regard, Saied et al. proposed a method for detecting known and unknown DDoS attacks using artificial neural networks [26]. Bhuyan et al. proposed an empirical evaluation of information metrics for detecting low-rate and high-rate DDoS attacks [27]. Tan et al. proposed a DDoS attack detection method based on multivariate correlation analysis [28]. Yu et al. proposed a DDoS attack detection method based on the traffic correlation coefficient [29]. Wang et al. conducted an in-depth analysis of the characteristics of DDoS botnets [30]. Kumar et al. used the Jpcap API to monitor and analyze DDoS attacks [31]. Khundrakpam et al. proposed an application-layer DDoS attack detection method combining entropy and an artificial neural network [32].

The cloud environment refers to the network service platform with cloud computing as the core technology. In this regard, Karnwal et al. proposed a defense method for XML DDoS and HTTP DDoS attacks under cloud computing platforms [33]. Sahi et al. proposed a detection and defense method for TCP-flood DDoS attacks in the cloud environment [34]. Rukavitsyn et al. proposed a self-learning DDoS attack detection method in the cloud environment [35].

A software-defined network is a new network architecture that adopts OpenFlow as the communication protocol and uses a controller to specify the data exchange rules of routers and switches [36]. In this regard, Ashraf used machine learning to detect DDoS attacks in software-defined networks [37]. Mihai-Gabriel proposed an intelligent elastic risk assessment method based on the neural network and risk theory in the SDN environment [38]. Yan et al. proposed an effective controller scheduling method to reduce DDoS attacks in software-defined networks [39]. Chin et al. proposed a method for detecting DDoS flood attacks under SDN through selective packet inspection [40]. Dayal et al. analyzed the behavioral characteristics of DDoS attacks under SDN [41]. Ye et al. proposed a method of using SVM to detect DDoS attacks under the SDN environment [42]. Besides the above detection methods used to ensure the security of the system, some efficient cryptography techniques can be applied to achieve privacy of the system [43–46].

In summary, the core issue of DDoS attack detection research is the construction of feature extraction and classification models. The attack detection methods in the above three environments can effectively detect DDoS attacks corresponding to the environment. However, in the detection of early DDoS attack, these defense methods do not have a good detection effect. In addition, most of these methods use a single feature and do not consider the impact of multidimensional features on the classifier. Therefore, an adaptive DDoS attack detection method is proposed in this paper. Firstly, we design the algorithms to extract five features. Secondly, through an ensemble learning framework, the five features are used to train two multikernel learning models and obtain the adaptive feature weights with gradient method. Finally, the sliding window mechanism is used to coordinate the two models to improve the detection accuracy.

3. DDoS Attack Feature Extraction

3.1. Analysis of DDoS Attack Behavior

In the cloud environment, the botnets of DDoS attacks have distributed characteristics. Each zombie machine has the ability to independently calculate, send, and process data packets, and the source IP address of the packets can also be forged. The advantage of these DDoS attacks makes defense more difficult. However, under the background of time series, the characteristics of data packets generated by DDoS attacks are still quite different from those of normal users. The difference is reflected in the following three aspects.

(1) Asymmetry. A DDoS attack is often caused by multiple zombie hosts sending a large number of packets to a host without the host's response. These useless packets quickly consume the host's service resources so that the host can no longer provide services to other users. As a result, a large number of packets are sent from the zombie hosts to the host, while few or no packets are sent from the host back to the zombie hosts. The IP data packets often present a situation in which multiple source IP addresses point to the same or several destination IP addresses, which is expressed as an asymmetry between source IP and destination IP in sending and receiving.

(2) Interactivity. Assume that there are two hosts, A (zombie host) and B (attacked host). When an attack occurs, there are two main ways of communicating: A sends packets to B (denoted as A→B), and A and B send packets to each other (denoted as A↔B). The number of packets sent in the A→B way is much larger than the number sent in the A↔B way. Therefore, compared with normal flow, DDoS attack flow differs in both the direction and the amount of communication.

(3) Distribution. When a DDoS attack occurs, the number of hosts that launch the attack is much larger than the number of attacked hosts. Accordingly, the number of source IP addresses is much larger than the number of destination IP addresses, so the source addresses and the destination addresses have different distribution characteristics. In addition, because DDoS attacks generate useless requests, the host ports accessed by the attack requests are more dispersed than those accessed by normal flows. Therefore, the distribution of the ports differs between normal flows and attack flows.

Because a single feature has limited ability to express the data, it cannot fully reflect the characteristics of a DDoS attack. Therefore, to express these characteristics effectively, this paper selects five feature extraction methods based on the above characteristics, as follows. The address correlation degree (ACD) combines the traffic burstiness, flow asymmetry, and source IP address distribution of DDoS attacks; the IP flow features value (FFV) exploits the asymmetry of attack flows and the distribution of source IP addresses; the IP flow's interaction behavior feature (IBF) uses the difference in interactivity between normal flows and attack flows on the network; the IP flow multifeature fusion (MFF) exploits the different behavioral characteristics of normal flows and DDoS attack flows and integrates multiple characteristics of DDoS attack flows; the IP flow address half interaction anomaly degree (HIAD) focuses on the characteristics of aggregated attack flows that are mixed with a large number of normal background flows. To make the feature representation richer, we refer to several articles, combine the five feature extraction algorithms, and remove the less impactful parameters to form a multidimensional feature for DDoS attack detection [45–51].

3.2. DDoS Attack Feature Extraction

In the cloud environment, assume that the network flow $F$ in a certain unit of time $T$ is $\langle (t_1, s_1, d_1, p_1), (t_2, s_2, d_2, p_2), \ldots, (t_n, s_n, d_n, p_n) \rangle$, where $t_i$, $s_i$, $d_i$, and $p_i$ denote the time, source IP address, destination IP address, and port of the $i$-th data packet, respectively. All data packets with source IP address $A_i$ and destination IP address $A_j$ are denoted as class $SD(A_i, A_j)$. All data packets with source IP address $A_i$ are denoted as class $IPS(A_i)$. All data packets with destination IP address $A_j$ are denoted as class $IPD(A_j)$. The packets with source IP address $A_i$ for which both class $IPS(A_i)$ and class $IPD(A_i)$ exist are denoted as $IF(A_i)$. The packets with source IP address $A_i$ for which class $IPS(A_i)$ exists and class $IPD(A_i)$ does not exist are denoted as $SH(A_i)$. The number of different ports in $SH(A_i)$ is denoted as $Port(SH(A_i))$. The packets with destination IP address $A_i$ for which class $IPS(A_i)$ does not exist and class $IPD(A_i)$ exists are denoted as $DH(A_i)$. The number of different ports in $DH(A_i)$ is denoted as $Port(DH(A_i))$.
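The grouping described above can be illustrated with a short sketch. It assumes one reading of the definitions: an address that appears as both a source and a destination within the time unit gives an interactive (IF) class, an address that appears only as a source gives a source half-interaction (SH) class, and an address that appears only as a destination gives a destination half-interaction (DH) class. The `Packet` type and the function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """One packet record: time, source IP, destination IP, port."""
    time: float
    src: str
    dst: str
    port: int


Classes = Dict[str, List[Packet]]


def classify_flows(packets: Iterable[Packet]) -> Tuple[Classes, Classes, Classes]:
    """Split the packets of one time unit into IF, SH, and DH classes.

    Assumed reading of the text: the classes are keyed by IP address.
    """
    by_src: Classes = defaultdict(list)  # all packets sent by each address
    by_dst: Classes = defaultdict(list)  # all packets received by each address
    for pkt in packets:
        by_src[pkt.src].append(pkt)
        by_dst[pkt.dst].append(pkt)

    # IF: the address both sends and receives packets.
    interactive = {a: p for a, p in by_src.items() if a in by_dst}
    # SH: the address sends packets but never receives any.
    source_half = {a: p for a, p in by_src.items() if a not in by_dst}
    # DH: the address receives packets but never sends any.
    dest_half = {a: p for a, p in by_dst.items() if a not in by_src}
    return interactive, source_half, dest_half


def distinct_ports(packets: Iterable[Packet]) -> int:
    """Number of different ports seen in one class."""
    return len({pkt.port for pkt in packets})


if __name__ == "__main__":
    sample = [
        Packet(0.0, "A", "V", 80),
        Packet(0.1, "B", "V", 81),
        Packet(0.2, "B", "V", 82),
        Packet(0.3, "C", "V", 80),
        Packet(0.4, "V", "C", 5000),
    ]
    if_cls, sh_cls, dh_cls = classify_flows(sample)
    print(sorted(if_cls), sorted(sh_cls), sorted(dh_cls))
    print(distinct_ports(sh_cls["B"]))
```

In this toy input, hosts A and B only send packets to V, so they fall into SH classes, which is the one-way pattern the asymmetry analysis associates with attack flows.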

Definition 1. If there are different destination IP addresses and , making classes and both non-null, then delete the class where all source IP address packets reside.
Assume that the last remaining classes are denoted as and are statistically calculated to gain the ACD. The detailed formulation is as follows.In this part , where is the number of different ports in class , is the number of data packets in class , and is the weighted value.

Definition 2. If all the packets whose destination IP address is form the unique class , delete the class where the packet with the destination IP address is .
Assume that the last remaining classes are denoted as , all packets in these remaining classes with the destination IP address are denoted as , and all the classes are denoted as . The FFV is defined as follows: in formula (2) is presented as follows:In this equation, , is the number of different source IP addresses in ; is the number of source IP addresses in , and is the threshold of the number of packets: is the number of different destination ports in , is the threshold of the number of ports, and is the sampling time.

Definition 3. Assume that the IF flow is , the SH class is denoted as , and the DH class is denoted as . Then, define IBF as follows:, where is the threshold of the amount of port. in formula (6) is the number of IF flows within , and is the absolute value of the difference value between the number of source IP addresses and the number of destination IP addresses for all SH and DH flows in .

Definition 4. Assume that the resulting SD classes are and IF classes are . The number of packets of source IP address in class is denoted as , where ; the number of packets of all interworking flow classes is denoted as SN; and the source semi-interactive flow class is denoted as . The number of different ports in class is denoted as , where ; the destination semi-interactive class is denoted as ; and the number of different ports in class is denoted as , where .
The weighted value of all packets in SH class is defined as follows:The weighted value of all packets in SD classes is defined as follows:The weighted value of the number of packets of network flow F in unit time T is as follows:In these equations, is sampling time, and and are SH-type packet number abnormality thresholds; is the number of packets in , . The weighted value of the number of different ports in the SH and DH classes is as follows: where , is sampling time, and is the SH-type port number abnormality threshold.
In this part we define the MFF as follows:where .

Definition 5. The number of SH flows with different source IP addresses and the same destination IP address is denoted as . The SH class with the same destination IP address flow is denoted as , where .
Assume that all HSD classes are , and the number of different destination ports in the class is expressed as , where .
The HIAD is defined as follows: In (13), , is sampling time, and is the threshold for different destination ports.

4. The DDoS Attack Detection Model

The establishment of an attack detection model is an important part of the whole detection process. Based on the behavior of DDoS attacks, we extract the ACD, IBF, MFF, HIAD, and FFV features to express the inherent rules of attack flows. The disadvantages of current DDoS attack detection models can be summarized as follows: some models depend heavily on the choice of kernel function; some models require data with highly stable values; some models can only fit linear rules, whereas DDoS attacks can generate linearly inseparable data because of their abrupt, unstable, and stochastic characteristics. Considering that the multiple-kernel learning model has a low requirement for data stability, can be used for nonlinear fitting, and can flexibly handle both linear and nonlinear data, this paper proposes an adaptive DDoS attack detection method based on the ensemble learning framework.

4.1. The Multiple-Kernel Learning Model

The multiple-kernel learning (MKL) model is developed from the original single-kernel SVM. A single-kernel SVM uses only one kernel function to map the samples to a high-dimensional space. By comparison, the multiple-kernel learning model uses several weighted kernel functions to map the samples to a high-dimensional space. Therefore, it has higher flexibility and adaptability on heterogeneous data.
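The idea of a weighted combination of kernels can be sketched in a few lines. The example below is only an illustration: it assumes Gaussian and polynomial base kernels and nonnegative weights that sum to one, and all function names and parameter values are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def gaussian_kernel(gamma: float) -> Kernel:
    """Return k(x, y) = exp(-gamma * ||x - y||^2)."""
    def k(x: Vector, y: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


def polynomial_kernel(degree: int, coef0: float = 1.0) -> Kernel:
    """Return k(x, y) = (x . y + coef0) ** degree."""
    def k(x: Vector, y: Vector) -> float:
        return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree
    return k


def combined_kernel(kernels: Sequence[Kernel], weights: Sequence[float]) -> Kernel:
    """Return the weighted sum of the base kernels.

    Assumption: weights are nonnegative and sum to one.
    """
    if len(kernels) != len(weights):
        raise ValueError("one weight is needed per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to one")

    def k(x: Vector, y: Vector) -> float:
        return sum(w * kern(x, y) for w, kern in zip(weights, kernels))
    return k


if __name__ == "__main__":
    k = combined_kernel([gaussian_kernel(0.5), polynomial_kernel(2)], [0.5, 0.5])
    print(k([1.0, 0.0], [1.0, 0.0]))
```

A single-kernel SVM corresponds to the special case in which one weight is 1 and all the others are 0.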

The multiple-kernel learning is defined as follows: given training set ,

testing set , , , , R is real-number set, d is data dimension, . are kernel functions in , and is a kernel mapping for each function. In the classic multiple-kernel learning SimpleMKL [52], the objective function of the hyperplane is as follows:

where is the weight for each kernel function, and is bias. The relaxation factor is . According to the principle of minimum structure, the objective function can be optimized as follows:

By the two-order alternation optimization, the formula (15) can be converted to the optimization problem with as the variable:

The Lagrange function of is as follows:

where are Lagrange operators. First, are calculated for partial derivatives. Then, the extremums are gained when the partial derivatives are “0.” Finally, extremums are brought into the Lagrange function, which can be further changed to

The gradient descent method is used to adjust on , update , and optimize the as well as alternately. Then, an optimal solution is obtained:

; that is, the original objective function eventually turns into (22). The detailed formulation is as follows:

. When the test set data as is inputted to , the object function can determine the category of test set data.
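For reference, the standard SimpleMKL formulation from the cited work [52] can be written as below. The notation ($Q$ kernels $K_m$ with weights $d_m$, functions $f_m$, slack variables $\xi_i$, penalty $C$, and dual variables $\alpha_i$) is the conventional one and is an assumption here; it may differ from the symbols and equation numbers used in this paper.

```latex
% Primal problem: each kernel K_m has a function f_m and a weight d_m.
\begin{aligned}
\min_{\{f_m\},\, b,\, \xi,\, d}\quad
  & \frac{1}{2}\sum_{m=1}^{Q}\frac{1}{d_m}\,\lVert f_m\rVert_{\mathcal{H}_m}^{2}
    + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad
  & y_i\Bigl(\sum_{m=1}^{Q} f_m(x_i) + b\Bigr) \ge 1 - \xi_i,\qquad
    \xi_i \ge 0,\qquad i = 1,\dots,n, \\
  & \sum_{m=1}^{Q} d_m = 1,\qquad d_m \ge 0 .
\end{aligned}

% For fixed weights d, the inner problem is a standard SVM whose dual is
J(d) = \max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j y_i y_j
    \sum_{m=1}^{Q} d_m K_m(x_i, x_j)
\quad \text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C .

% Gradient used to update the kernel weights, with alpha^* the SVM solution.
\frac{\partial J}{\partial d_m}
  = -\frac{1}{2}\sum_{i,j=1}^{n}\alpha_i^{*}\alpha_j^{*} y_i y_j K_m(x_i, x_j).

% Decision function for a test sample x.
f(x) = \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i^{*} y_i
  \sum_{m=1}^{Q} d_m^{*} K_m(x_i, x) + b^{*}\Bigr).
```

The alternation described in the text corresponds to solving the SVM for fixed $d$ and then taking a gradient step on $d$ while keeping it on the simplex.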

4.2. The Attack Detection Model Based on Multiple-Kernel Learning

The SimpleMKL model in effect treats every feature dimension with a weight of "1", so it cannot fully exploit the differences among the features. This paper uses feature weights to control the effect of different features on the model. To obtain appropriate feature weights in the SimpleMKL model, we use the gradient method to optimize the weight parameters, so that the detection accuracy is further improved.

We marked ACD as , IBF as , MFF as , HIAD as , and FFV as , then the feature value vector is , and the marked weight vector is . Combinatorial features are , and the mean value of each dimension of normal flow is , or . Note the mean value of each dimension of the attack flow is , , , , or .

The interclass mean squared difference is expressed as follows:

The normal intraclass variance is denoted:

The attack intraclass variance is denoted:

The intraclass variance is . To improve classification accuracy and ensure a rapid convergence of functions, on the one hand, we should try to improve the mean difference between positive and negative samples, so that the two kinds of samples are far away from each other; that is, we should increase the M value. On the other hand, we should minimize the differences between samples. The variance corresponding to each dimension should be as small as possible, thus reducing the S value. Therefore, the classification model needs to train two different classifiers to classify the samples. One classifier is interclass mean squared difference growth (M-SMKL) and the other classifier is intraclass variance descent (S-SMKL). In combination with the SimpleMKL framework formula (15), the above problems can be transformed into (26). The detailed formulation is as follows:

If , the objective function is M-SMKL. If , the objective function is S-SMKL. and are converted to the learning rate of formula (35).
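The two quantities M and S discussed above can be illustrated with a small sketch. Because the exact formulas are not reproduced here, the code assumes that M is the sum over dimensions of the squared difference between the weighted class means, and that S is the sum over dimensions of the weighted population variances of the normal and attack classes. The function names are hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def apply_weights(samples: Matrix, weights: Sequence[float]) -> List[List[float]]:
    """Multiply every feature dimension by its weight."""
    return [[w * x for w, x in zip(weights, row)] for row in samples]


def interclass_mean_sq_diff(normal: Matrix, attack: Matrix,
                            weights: Sequence[float]) -> float:
    """Assumed form of M: squared gap between class means, summed over dimensions."""
    n_cols = zip(*apply_weights(normal, weights))
    a_cols = zip(*apply_weights(attack, weights))
    return sum((fmean(n) - fmean(a)) ** 2 for n, a in zip(n_cols, a_cols))


def intraclass_variance(normal: Matrix, attack: Matrix,
                        weights: Sequence[float]) -> float:
    """Assumed form of S: normal variance plus attack variance, summed over dimensions."""
    total = 0.0
    for cls in (normal, attack):
        total += sum(pvariance(col) for col in zip(*apply_weights(cls, weights)))
    return total


if __name__ == "__main__":
    normal = [[0.0, 0.0], [2.0, 0.0]]
    attack = [[4.0, 2.0], [6.0, 2.0]]
    print(interclass_mean_sq_diff(normal, attack, [1.0, 1.0]))
    print(intraclass_variance(normal, attack, [1.0, 1.0]))
```

Under these assumptions, a larger M means the two class centers are farther apart, and a smaller S means the samples of each class are more tightly clustered.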

To solve the above problems, we use the way of updating iterative weights to get the objective function. The details are as follows. Firstly, the weights of each feature are assigned initial values. Secondly, they are combined with (26) and (27) to gain optimal function of this time. The mathematical form is expressed as follows:

The optimal equation obtained using (28) and (29) is as follows:

To further determine whether the optimal equation has achieved good results, this paper sets two constraint conditions for M-SMKL and S-SMKL, respectively, without conflict with formula (27) constraint conditions. These constraint conditions are expressed as follows.

The constraint conditions of M-SMKL are as follows:

The constraint conditions of S-SMKL are as follows:where the values of , , , and are close to “0”; the values of , , and are close to “1”; the values of , , and are close to “7.5”. If the constraint condition is satisfied, the algorithm will be stopped and formula (30) will become the optimal function; otherwise, each dimension weight will be updated iteratively. The gradient of M and S corresponding to each dimension weight is as follows:

where is the number of the normal flow feature of the training sample; is the number of the attack flow feature of the training sample. According to gradients in (33) and (34), the weight of each dimension is updated as follows (35):where is the learning rate of gradient ascent; is the learning rate of gradient descent. has the same function as and has the same function as . Each updated weight is multiplied by each original feature accordingly and the next round of iteration is carried out.
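One round of the weight update can be sketched as follows. This is not the paper's exact update rule: it assumes M and S are quadratic in the weights, so that the gradient of M with respect to a weight is twice the weight times the squared mean gap of that dimension, and the gradient of S is twice the weight times the summed class variances of that dimension. The function name, argument names, and learning rates are hypothetical.

```python
from typing import List, Sequence


def update_weights(weights: Sequence[float],
                   mean_gap_sq: Sequence[float],
                   variance_sum: Sequence[float],
                   lr_ascent: float,
                   lr_descent: float) -> List[float]:
    """Apply one combined gradient-ascent (on M) and gradient-descent (on S) step.

    mean_gap_sq[k]  : squared difference of the unweighted class means in dimension k
    variance_sum[k] : normal variance plus attack variance in dimension k
    """
    updated = []
    for w, gap, var in zip(weights, mean_gap_sq, variance_sum):
        grad_m = 2.0 * w * gap   # assumed dM/dw_k
        grad_s = 2.0 * w * var   # assumed dS/dw_k
        updated.append(w + lr_ascent * grad_m - lr_descent * grad_s)
    return updated


if __name__ == "__main__":
    # A larger ascent rate emphasises class separation (M-SMKL-like behaviour);
    # a larger descent rate emphasises compact classes (S-SMKL-like behaviour).
    print(update_weights([1.0, 1.0], [16.0, 4.0], [2.0, 0.0], 0.01, 0.1))
```

In a full training loop, the updated weights would be multiplied into the original features, the model would be retrained, and the loop would stop once the constraint conditions on M and S are satisfied.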

4.3. Framework of Multiple-Kernel Learning Detection Based on Ensemble Learning

We input the weighted multidimensional data and set the learning rates. Then two different classifiers are trained. M-SMKL is trained mainly by increasing the M value and secondarily by reducing the S value, whereas S-SMKL is trained mainly by reducing the S value and secondarily by increasing the M value. During the training process, the M value and the S value are updated continually by gradient ascent and gradient descent until the constraint conditions are met. The flowchart is provided in Figure 1.

The detection process is as follows: firstly, the test data are multiplied by the two weight vectors obtained during training; secondly, the weighted data are input to the corresponding M-SMKL and S-SMKL models; finally, the sliding window mechanism is used to coordinate the two models. The sliding window mechanism is described as follows. Firstly, a sliding window of size n is created. Secondly, the trained M-SMKL classifies the test data to obtain the first classification results, and the trained S-SMKL classifies the test data to obtain the second classification results. Finally, the first and second classification results are combined according to the following four cases.

(1) If both M-SMKL and S-SMKL identify the current data as normal, the current data are judged to be normal.

(2) If both M-SMKL and S-SMKL identify the current data as attack, the current data are judged to be attack.

(3) If M-SMKL identifies the current data as normal but S-SMKL identifies them as attack, the current data are judged to be attack.

(4) If M-SMKL identifies the current data as attack but S-SMKL identifies them as normal, the following steps are applied. Step 1. Move the starting point of the sliding window to the position of the current test data in the first classification results, and place the end point of the window n − 1 positions after it. Step 2. If the results in the sliding window are all attack, the current data are judged to be attack; otherwise, the current data are judged to be normal.

The flow chart is provided in Figure 2.
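The coordination rule can be expressed as a short function. The sketch below follows the four cases described in the text and makes two assumptions that the text does not settle: the window starts at the current position and covers the next n − 1 positions of the M-SMKL results, and it is simply truncated when it runs past the end of the data. The labels and the function name are hypothetical.

```python
from typing import List, Sequence

NORMAL = 0
ATTACK = 1


def fuse_predictions(m_pred: Sequence[int], s_pred: Sequence[int],
                     window_size: int) -> List[int]:
    """Combine M-SMKL and S-SMKL labels with the sliding window rule."""
    if len(m_pred) != len(s_pred):
        raise ValueError("both classifiers must label the same samples")
    if window_size < 1:
        raise ValueError("window_size must be at least 1")

    fused = []
    for i, (m, s) in enumerate(zip(m_pred, s_pred)):
        if m == s:
            # Cases 1 and 2: the two models agree.
            fused.append(m)
        elif s == ATTACK:
            # Case 3: M-SMKL says normal, S-SMKL says attack.
            fused.append(ATTACK)
        else:
            # Case 4: M-SMKL says attack, S-SMKL says normal.
            # Inspect the M-SMKL results inside the window (truncated at the end).
            window = m_pred[i:i + window_size]
            all_attack = all(label == ATTACK for label in window)
            fused.append(ATTACK if all_attack else NORMAL)
    return fused


if __name__ == "__main__":
    m_results = [0, 1, 1, 1, 0, 1]
    s_results = [0, 0, 0, 1, 1, 0]
    print(fuse_predictions(m_results, s_results, window_size=3))
```

With this rule, an isolated attack label from M-SMKL is accepted only when M-SMKL keeps reporting attack across the whole window, which limits the effect of its more dispersed intraclass data.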

Two kinds of SMKL are trained for the following reason. S-SMKL focuses on reducing the differences within each dimension of the data and gathers the two types of samples around their respective centers. However, S-SMKL does not consider the location of the two sample centers. Although it maintains good classification performance on the whole, it cannot identify DDoS attacks early, because the distance between the centers of the normal flow and the attack flow is small. M-SMKL focuses on the difference between the two class centers and maximizes the distance between them, making the two classes as separate as possible. M-SMKL thus enlarges the distance between the classes so that the attack flow can be identified earlier, but it disperses the intraclass data, which causes erroneous results. Therefore, the sliding window mechanism is adopted to coordinate the two models so that early DDoS attacks are detected accurately.

5. Experimental Analysis

5.1. Experimental Data Sets and Evaluation Standards

The data set used for this experiment is the CAIDA "DDoS Attack 2007" data set [53]. This data set contains approximately one hour (20:50:08 UTC–21:56:16 UTC) of anonymized traffic from a distributed denial of service (DDoS) attack on August 4, 2007. The total size of the data set is 21 GB. The attack began around 21:13, causing the network load to grow rapidly (within minutes) from approximately 200 kbits/s to 80 Mbits/s. The one hour of attack traffic is divided into 5-minute files stored in PCAP format. The contents of this data set are TCP network traffic packets. Each TCP packet contains the source address, destination address, source port, destination port, packet size, and protocol type. The normal flow data used in this paper last 2 minutes in total, and the attack data last 5 minutes in total.

The experiments were run on a computer with an Intel Core i7 processor, 8 GB of memory, and a 64-bit Windows 10 system; the development environment is MATLAB 2014a and Codeblocks 10.05. The evaluation criteria used in this paper are the detection rate (DR), the false alarm rate (FR), and the total error rate (ER).

Assume that TP denotes the number of normal test samples that are correctly marked, FP denotes the number of normal test samples that are incorrectly marked, TN denotes the number of attack test samples that are correctly marked, and FN denotes the number of attack test samples that are incorrectly marked. The three criteria are then

$$\mathrm{DR} = \frac{TN}{TN + FN}, \qquad \mathrm{FR} = \frac{FP}{TP + FP}, \qquad \mathrm{ER} = \frac{FN + FP}{TP + FP + TN + FN}.$$
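The three criteria can be computed directly from the four counts. The sketch below assumes that the detection rate is the fraction of attack samples correctly marked, the false alarm rate is the fraction of normal samples incorrectly marked, and the total error rate is the fraction of all samples incorrectly marked. Note that, in this paper's convention, normal samples are the positive class. The function name is hypothetical.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute DR, FR, and ER from the counts defined in the text.

    tp: normal samples marked correctly    fp: normal samples marked incorrectly
    tn: attack samples marked correctly    fn: attack samples marked incorrectly
    """
    attack_total = tn + fn
    normal_total = tp + fp
    if attack_total == 0 or normal_total == 0:
        raise ValueError("both classes need at least one test sample")
    total = attack_total + normal_total
    return {
        "DR": tn / attack_total,   # share of attacks that were detected
        "FR": fp / normal_total,   # share of normal samples flagged as attack
        "ER": (fp + fn) / total,   # share of all samples that were misclassified
    }


if __name__ == "__main__":
    print(detection_metrics(tp=90, fp=10, tn=180, fn=20))
```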

We used the above five feature extraction algorithms to extract features from the data set. The extracted feature values are normalized and used as a training set. The data in the training set can be regarded as capturing the regular pattern of change in network traffic. Network traffic is abrupt and volatile by nature. Therefore, although newly collected network data resemble the conventional data, they still differ from them to a certain degree. To simulate this phenomenon and verify the effectiveness of the presented method, three types of data are generated as follows: both the normal flow feature values and the attack flow feature values are multiplied by a random number; only the attack flow feature values are multiplied by a random number; and only the normal flow feature values are multiplied by a random number.
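The three perturbed data sets can be generated along the following lines. The text does not state the range of the random number or whether one number is drawn per sample or per value, so the sketch assumes an independent uniform factor per feature value in a hypothetical range of 0.9 to 1.1. The function names and the seed are also hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def perturb(rows: Sequence[Sequence[float]], rng: random.Random,
            low: float, high: float) -> Matrix:
    """Multiply every feature value by its own random factor in [low, high]."""
    return [[x * rng.uniform(low, high) for x in row] for row in rows]


def make_scenarios(normal: Sequence[Sequence[float]],
                   attack: Sequence[Sequence[float]],
                   low: float = 0.9, high: float = 1.1,
                   seed: int = 0) -> Dict[str, Tuple[Matrix, Matrix]]:
    """Build the three test scenarios as (normal, attack) pairs."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    untouched_normal = [list(row) for row in normal]
    untouched_attack = [list(row) for row in attack]
    return {
        "both": (perturb(normal, rng, low, high), perturb(attack, rng, low, high)),
        "attack_only": (untouched_normal, perturb(attack, rng, low, high)),
        "normal_only": (perturb(normal, rng, low, high), untouched_attack),
    }


if __name__ == "__main__":
    scenarios = make_scenarios([[1.0, 2.0]], [[3.0, 4.0]])
    for name, (normal_rows, attack_rows) in scenarios.items():
        print(name, normal_rows, attack_rows)
```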

5.2. Experimental Results and Analysis

The five feature extraction algorithms are applied to the attack data and the normal data to obtain positive and negative sample sets. The sampling time is set to 1 s, and the remaining parameters of the five feature extraction methods are set as follows: = 0.5, = 0.5, = 3, = 3, = 3, = 3, = 3, = 3, and = 3. The total number of normal feature values is 211 and the total number of attack feature values is 280. Figures 3–9 illustrate the feature values extracted by the five algorithms.

As illustrated in Figure 3, the early attack feature values of DDoS attack are close to the normal feature values. This is because there are a large number of bidirectional flows in the early stage of the DDoS attack and these bidirectional flows gradually decrease with the increase of the attack degree. Therefore, using the ACD as a feature after 70 seconds can significantly reflect the difference between the attack flow and the normal flow. ACD can reflect the difference between normal flow and attack flow the earliest.

As illustrated in Figure 4, compared with ACD, although IBF does not recognize the attack flow earlier, the distribution range of its feature values is more uniform and presents a certain degree of volatility. This makes the feature less susceptible to individual outliers.

As illustrated in Figure 5, the FFV feature behaves very similarly to the ACD. However, as illustrated in Figures 6 and 7, in the initial stage the FFV reflects the difference between the attack flow and the normal flow better than the ACD does.

As illustrated in Figure 8, although the MFF feature cannot distinguish the attack flow from the normal flow at the earliest stage, its feature values in the attack stage are more stable, which makes it robust to outliers in attack flows.

As illustrated in Figure 9, the range of the ordinate shows that the HIAD best reflects the difference between the normal flow and the attack flow, and it is also more stable in the latter half of the attack flow. Beyond the early data, this feature strongly separates normal flow from abnormal flow, so it has more influence on the classifier and leads to better decisions.

In summary, each of the five features has its own characteristics. To make full use of them, the feature values extracted by the five algorithms are combined into a five-dimensional feature data set. Using this data set as the training set, two multiple-kernel learning models, one dominated by gradient ascent and the other by gradient descent, are trained, and the corresponding five-dimensional feature weight vectors are obtained. Finally, following the framework in Figure 2, the classification results on the test set are obtained and used to verify the effectiveness of the method. The parameters of M-SMKL are set as follows: , , , , , , and . The parameters of S-SMKL are set as follows: , , , , , , and . The size of the sliding window is 8. The parameters for multiple-kernel learning are all default values, and the kernel functions comprise two Gaussian functions and two polynomial functions. The SVM parameters are all default values, and its kernel function is linear. The experimental results are illustrated in Figures 10 to 18.
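To make the kernel configuration above concrete, the following sketch builds a combined Gram matrix from two Gaussian and two polynomial base kernels applied to feature-weighted samples. It is an illustration under stated assumptions rather than the authors' implementation: the kernel widths (0.5 and 2.0), the polynomial degrees (2 and 3), the offset `coef0`, and all function names are hypothetical, and the kernel weights are simply normalized to sum to one as in common multiple-kernel learning formulations.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def polynomial_kernel(X, Y, degree, coef0=1.0):
    """Polynomial kernel matrix (x . y + coef0) ** degree."""
    return (X @ Y.T + coef0) ** degree

def combined_kernel(X, Y, kernel_weights, feature_weights):
    """Weighted sum of two Gaussian and two polynomial base kernels.

    feature_weights rescales each of the five feature dimensions before
    the kernels are evaluated; kernel_weights mixes the base kernels.
    All kernel hyperparameters below are assumed values.
    """
    Xw = X * feature_weights
    Yw = Y * feature_weights
    base = [
        gaussian_kernel(Xw, Yw, sigma=0.5),
        gaussian_kernel(Xw, Yw, sigma=2.0),
        polynomial_kernel(Xw, Yw, degree=2),
        polynomial_kernel(Xw, Yw, degree=3),
    ]
    d = np.asarray(kernel_weights, dtype=float)
    d = d / d.sum()  # keep the mixing weights on the simplex
    return sum(w * K for w, K in zip(d, base))
```

The resulting matrix can be passed to any SVM solver that accepts a precomputed kernel. Adjusting `feature_weights` changes how strongly each feature dimension shapes the decision boundary, which is the role the adaptive weight vectors play in the method.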

As shown in Figures 10 to 18, across the three types of experiments and the three evaluation criteria, the overall performance of the algorithms from highest to lowest is: the ADADM, the SVM method, the SMKL method, and the method of Nezhad et al. [16].

Although the method of Nezhad et al. [16] is visibly superior to the other methods in terms of DR, it is far worse on the other indicators. The reason is that it relies excessively on the first reference point: when the first reference point fluctuates, the method easily misclassifies normal samples as attack samples.

Its classification accuracy on attack samples is therefore high, but a large number of normal samples are misjudged, which is why it performs the worst overall. The SVM generally performs better than the SMKL method because, although the SMKL method coordinates multiple kernel functions to map the samples into a high-dimensional Hilbert space, the linear kernel is evidently better suited to these samples, so the linear-kernel SVM establishes a better hyperplane for identifying data containing early DDoS attacks. Nevertheless, even without a linear kernel, the multiple-kernel learning method maintains high accuracy, which indicates that multiple-kernel learning depends less on the choice of kernel function than a single-kernel SVM does.
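The trade-off described above, a high DR obtained at the cost of misjudging normal samples, can be illustrated with a short sketch. The definitions below are the usual confusion-matrix ones and are an assumption here, since this section does not restate the evaluation criteria; the function name and the labels `FR` (false alarm rate) and `ER` (error rate) are likewise hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Compute detection rate, false alarm rate, and error rate.

    Labels: 1 = attack sample, 0 = normal sample.
    Assumes both classes occur in y_true (otherwise a rate is undefined).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "DR": tp / (tp + fn),            # attacks correctly detected
        "FR": fp / (fp + tn),            # normal samples flagged as attacks
        "ER": (fp + fn) / len(pairs),    # all misclassified samples
    }

# Extreme case: a detector that labels every sample as an attack.
y_true = [0] * 211 + [1] * 280
always_attack = [1] * len(y_true)
metrics = detection_metrics(y_true, always_attack)
```

In this extreme case the detection rate is 1.0 while every normal sample is misjudged, which shows why DR alone cannot rank the methods.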

We also compared the ADADM with the SVM method. The ADADM uses the same kernel functions as the SMKL method. Because multiple-kernel learning is flexible and adaptable, the hyperplane can be continuously optimized by adjusting the weight of each feature dimension so that DDoS attacks are recognized as early as possible, with attack flow data and normal flow data lying on opposite sides of the hyperplane.

In addition, training two different classifiers in the spirit of ensemble learning and using the sliding window mechanism to combine the advantages of each classifier improves the algorithm’s performance in all three types of experiments. The proposed method outperforms not only the SVM method but also the other DDoS attack detection methods. The experimental data are presented in Tables 1, 2, and 3.

6. Conclusion

In this paper, five features are defined to describe the burstiness of DDoS attack flows, the distribution of IP source addresses, and the interactivity of DDoS attack flows. Based on these five-dimensional features and the ensemble learning framework, adaptive feature weights are obtained, and the M-SMKL and S-SMKL multiple-kernel learning models are trained to detect DDoS attacks. To identify early attacks effectively, the sliding window mechanism is used to coordinate the S-SMKL and the M-SMKL when processing the detection results. Experimental results show that, compared with similar methods, our method produces more accurate results in detecting early DDoS attacks.

In follow-up work, we will study how to transform the multidimensional weight adaptation problem based on multiple-kernel learning into a convex optimization problem, and how to improve the detection rate and convergence speed of the method.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

There are no conflicts of interest in this paper.

Acknowledgments

This work was supported by the Hainan Provincial Natural Science Foundation of China (2018CXTD333, 617048); the National Natural Science Foundation of China (61762033, 61702539); the Hainan University Doctor Start Fund Project (kyqd1328); and the Hainan University Youth Fund Project (qnjj1444).