Abstract

Distributed denial of service attacks seriously threaten the availability of highly resilient software-defined networking systems, such as data center networks. A traceback scheme is an effective means of mitigating attacks by identifying the location of the attacker and the attacking path. However, traditional traceback schemes suffer from low traceability success rates, high packet header overheads, and high communication traffic overheads; in addition, logically centralized traceability schemes make the control plane a prime target for attacks. To overcome these challenges, we propose a low-overhead and high-precision traceback scheme, which is divided into two stages: packet marking and path reconstruction. The first stage of the traceback scheme utilizes programmable switches in the data plane to selectively mark the actual physical path information that the packet was forwarded on. The marking method is adaptive to the path length and utilizes a combined Bloom filter so that the packet length does not grow with the length of the attacking path. The proposed probabilistic packet marking algorithm effectively reduces the number of packets collected to reconstruct the attacking path. The second stage of the traceback scheme utilizes the distributed victim host to reconstruct the attacking path without the controller and locate the source of the attacker. Theoretical analysis and experimental results show that the proposed scheme ensures high tracing accuracy and minimizes the traffic overhead and storage overhead required for the traceback process.

1. Introduction

Distributed denial of service (DDoS) attacks are a main means used by attackers and a major threat to the security of the Internet. The number of DDoS attacks has grown by 200% every year, causing losses of up to $100,000 an hour for the targeted service providers [1]. The problem of DDoS attacks is becoming so pervasive that researchers are motivated to find a reliable method of identifying the origin of attackers. Internet Protocol (IP) traceback is a method of reconstructing the attacking path and identifying the source of the attacker, using the information provided by the network devices through which the attacking packets are forwarded. However, the effectiveness of traditional traceback schemes is affected by different problems, such as IP spoofing, packet length limits, and additional communication overhead [2–5]. IP spoofing means that attackers often forge the source IP address of packets, resulting in low traceability accuracy. The amount of marked path information is limited by the length of the packet header field. Internet Control Message Protocol (ICMP)-based technologies need to send additional trace messages, imposing additional overhead on the communication channel. Furthermore, the operability of these schemes based on traditional IP network architectures is generally limited to the traditional network infrastructure, such as forwarding equipment with high storage space, computing power, and the ability to tag packets. The cost of replacing hardware equipment is often too high, which limits the use of most IP traceback technologies.

Stanford University proposed the software-defined network (SDN) in 2008 [6, 7], which is a centralized, hierarchical architecture that separates forwarding and control. The SDN paradigm can be deployed in traditional network architectures, such as highly resilient data center networks [8, 9], or in new complex dynamic network architectures, such as SDN-based heterogeneous vehicular networks [10–14]. SDN decouples the data plane and control plane of network devices, making the data plane devices simple forwarding elements, and it places the logically centralized control plane at the network controller. This architecture provides scholars with a new approach to IP traceback. SDN-based logging traceback schemes [15–17] utilize the control plane to handle DDoS attacks centrally and identify the attackers. However, there is a key challenge in this scenario: the existing traceability tasks are overly concentrated on the control plane. It is a serious burden for the controllers to handle illegal data packets, store log information, and maintain the routing table for dynamic links. As a result, the control plane is more likely to be a prime target for attacks. Once attacks threaten the control plane, the stability of the entire network is greatly compromised, and the traceability scheme becomes virtually ineffective.

We propose a new traceback scheme that aims to overcome the above-mentioned challenges, including identifying the attack sources with high accuracy and reducing the overreliance on the control plane, for highly resilient software-defined network architectures such as data center networks. The overall traceback scheme is divided into two stages: packet marking and path reconstruction. The first stage of the traceback scheme utilizes the programmable switch in the data plane to collect the actual physical path information of packet forwarding and mark it as the tracing information, which solves the problem of IP spoofing in the traditional tracing task. We consider that the packet field length is limited and must not grow with the attacking path length. For this reason, we develop a programmable field based on a combination of Bloom filters, which provides great flexibility for encoding various traceback information and is adaptive to the path length. The new algorithm for probabilistic packet marking proposed in this paper effectively reduces the number of packets collected to reconstruct the attacking path. The second stage of the traceback scheme reconstructs the attacking path on the victim host. The victim host then completes the task of locating the network attacker directly without the controller.

The following are the overall contributions of this paper:
(i) The proposed traceback scheme is a fine-grained traceability scheme with high accuracy and low traffic overhead, which can be deployed in the existing SDN frameworks.
(ii) The problem of packet header length being sensitive to path length is overcome by using a programmable combination of Bloom filters. Packets are designed to be adaptive to the attacking path length. The packets can then carry more marked path information while saving significant memory space.
(iii) Compared to existing SDN-based traceback schemes, the dynamic probabilistic packet marking (PPM) method reduces the expected number of packets required to reconstruct the attacking path, speeds up traceability convergence time, and reduces traceability traffic overhead.
(iv) The design of the path identifier field improves the accuracy of the traceback scheme and supports the identification of multiple attackers outside the network.
(v) The experimental results show that our scheme is not overly concentrated on the SDN control plane, which makes it a more stable solution. The marking part is done with the help of the SDN data plane, and the path reconstruction part is done at the victim host.

The rest of the paper is organized as follows: Section 2 summarizes related work on attack traceback schemes based on the traditional IP network and on SDN. Section 3 describes our design principles, system architecture, and the details of our proposed scheme. Section 4 presents the implementation details of our scheme and evaluates its specific techniques. Finally, conclusions and future work are given in Section 5.

2. Related Work

As introduced previously, the first step in holding the attacker accountable is to identify the source of the attacking packets. Attack traceback is a means of defending against serious DDoS attacks. After calculating the attacking path followed by the attacking flow, the victim takes effective defensive measures. Traditional IP traceback schemes are commonly divided into six categories. (1) Link testing [18, 19] is often used to trace back the attacking path and the attacking source in real time. (2) The packet marking methods [2, 3, 20] are characterized by inserting the required traceability data into the IP packet to be traced, thus marking the packet as it travels through the forwarding network facilities (switches or routers) to the destination host. The packet marking schemes only use the information embedded in packets without generating additional traffic. The packet marking methods include PPM methods, deterministic packet marking (DPM) methods, and algebraically encoded packet marking methods. (3) ICMP-based technology [4] is a scheme in which routers generate ICMP messages for the packets they transmit with very low probability. The messages are sent to the attacking host or the victim host. (4) Logging-based schemes [5, 21, 22] utilize the packet digests or signatures stored on network devices to identify the hop-by-hop log information of the routers in the attacking path and then determine the closest router to the attacker. (5) Finding the source of forged IP datagrams in a large high-speed network is difficult due to the design of the IP protocol and the lack of sufficient capacity in most high-speed, high-capacity router implementations. An overlay network for IP traceback was proposed to solve this problem. CenterTrack [23] is a typical scheme that implements IP traceback with overlay networks and existing techniques (e.g., tunneling and input debugging). (6) Hybrid IP traceback schemes combine packet marking and logging-based schemes. A high-precision single-packet IP traceback (HPSIPT) scheme [24] is a typical hybrid traceback scheme.

The SDN-based traceback schemes discussed in the rest of this section exploit the features of the SDN paradigm to reconstruct the attacking path and identify the attacker source with the help of the SDN control plane.

Agarwal et al. [15] proposed the SDN traceroute, which utilizes the forwarding mechanism of SDN to trace the forwarding path of packets through the network by sending probe messages with a specific tag and comparing the probe messages hop by hop to trace back to the source. Hadem et al. [17] introduced an intrusion detection system (IDS) using Support Vector Machines (SVM) along with selective logging for IP traceback based on the SDN. Handigol et al. [25] proposed the Netshark tool, a tool similar to Wireshark that allows users to set filters on the entire packet history, recording the path and packet header information at each hop. Users can view packet attributes at a particular hop, thus enabling the tracing of attack packets. Ren et al. [26] proposed a global flow table algorithm based on SDN, which periodically traverses all switches to obtain the flow table through the controller interface, maintaining each flow in the SDN and enabling the tracing of abnormal traffic by analyzing the global flow table. Zhang et al. [27] suggested utilizing the controller language to complete packet traceback, predicting packet action expressions based on the current packet processing policy, and further calculating all preceding forwarding policies for any packet. Francois et al. [28] proposed an SDN-based anonymous IP traceback method, which models the SDN network topology with a directed graph and maintains the information of the flow table in this graph. Hadem et al. [29] proposed a traceback method based on SDN and multiprotocol label switching (MPLS). This method uses the MPLS technology to design a short path flag that represents the attack path information, which requires only tens of bits, and maintains a MAC table and an ARP table to record the attack path information. Sahay et al. [30] proposed a method that marks the packet header of each data stream in the network for identification purposes. When a network anomaly is detected, its source is identified by looking at the label of the anomalous stream. The tag consists of, for example, source IP information obtained by the switch. The reliability of the tag information is reduced when the source IP address is spoofed. Sahay et al. [31] addressed the problem of traceback in a multicontroller scenario by dealing with traffic identification in ISP networks. Dayal and Srivastava [16] solved the problem of cross-domain and intradomain traceability for SD-WAN scenarios.

According to six common evaluation indexes, we compare ten representative traceback schemes (ingress filtering [18], SPIE [22], PPM [20], iTrace [4], CenterTrack [23], HPSIPT [24], SDN traceroute [15], SDN selective logging method [17], SMITE [29], and cross-domain traceback scheme [16]). The comparison results are listed in Table 1. According to Table 1, there is no method whose six evaluation indexes are all the best or all the worst. Logging-based schemes and the SDN-based logging traceability techniques are storage overhead techniques, packet marking methods are the typical computing overhead techniques, and hybrid IP traceback is a compromise between storage and computing overheads. SDN-based traceback schemes are suitable for new network architectures, while other traceability techniques are designed for traditional network architectures. Where traceability tasks require the location of attacks across different autonomous domains, only packet marking methods do not require the support of Internet service providers (ISPs). This means that only this one technique is feasible for forensics in certain demanding network environments. The pros and cons of a method depend on the method itself and the requirements of end users.

As mentioned above, the traditional schemes are outstanding in their own respects; however, they have several drawbacks. Our proposed scheme significantly reduces the reliance on the control plane. Traceback tasks that were overly concentrated in the control plane are broken down and decentralized to the data plane and victim hosts. Each terminal that accesses the network is used to prestore topology information, which is a virtual computation independent of the underlying SDN infrastructure, eliminating the additional information exchange and storage overhead required for network transmission. In addition, a new dynamic probabilistic marking method in the traceback scheme reconstructs the attacking path with minimal traffic overhead instead of setting the same marking probability for every switch in the network.

3. Proposed Scheme

The traceback scheme described in this paper consists of two parts: the marking process and the path reconstruction process. In the first part, the marking process, the programmable switch in the data plane marks path information related to its address into the attacking packet with a certain probability. The encoded path information consists of the physical information of the switch ID, switch order, and port number. In the second part, the tracing procedure, the marked path information carried by the packet is decoded in the victim host. When the victim host receives packets, it decodes packet headers to reconstruct paths using the path reconstruction algorithm. Then, the source of the attackers is accurately identified.

3.1. Basic Assumptions

The notations used in this paper are listed in Table 2. The proposed traceback scheme is based on the following assumptions:
(i) The probability of an attack threatening the programmable switches in the data plane is almost zero.
(ii) There is a risk of malicious tampering with the marked path information, but the probability is almost zero.
(iii) Multiple attackers can simultaneously attack the same victim host.
(iv) The controller of the control plane is informed of the network topology and sends flow-mod messages to install flow entries on the programmable switches. In this way, the switches automatically update packet contents and actively write the path information into the packet without further intervention from the control plane.

3.2. Problem Statement

An actual network topology is illustrated in Figure 1. Multiple attackers attack the victim host, host 5, from host 1 and host 2, which gives the two possible attacking paths shown in the figure.

The problem of finding the source of the attacker is to reconstruct the attacking path from the collected path information. In most cases, the packets that attack the victim host are sent by different hosts. In addition, the path between the same pair of attacking and victim hosts may change in real time. In this process, the switch IDs on the forwarding path are linked together with their orders to constitute the corresponding path segments. Each forwarded packet carries the switch ID, the switch order, the ingress port ID, and the egress port ID gathered into a path segment label, and the attacking path label consists of several path segment labels. We also introduce the path identifier in the packet header design to improve the accuracy of the traceback scheme, identify different attackers, and authentically reconstruct the fine-grained forwarding path. The switch ID, the switch order, the ingress port ID, and the egress port ID are information on the real physical path written directly by the switch, which avoids the low traceability success rate caused by DDoS attackers forging path information. The attacking path label is formally expressed as follows:

The above equation represents the fine-grained attacking path of a packet forwarded with the assistance of the path identifier from the ingress port of the first switch to the egress port of the last switch.
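To make the label structure concrete, the following minimal sketch models a path segment label and an attacking path label as plain data. The class name `PathSegment`, the function `build_path_label`, and all numeric IDs and ports are hypothetical and chosen only for illustration; the sketch is not the packet layout used in this paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PathSegment:
    """One path segment label: the four fields written by a switch."""
    switch_id: int
    order: int          # position of the switch on the forwarding path
    ingress_port: int
    egress_port: int


def build_path_label(path_id: int,
                     segments: List[PathSegment]) -> Tuple[int, Tuple[PathSegment, ...]]:
    """Combine a path identifier with its segments, sorted by switch order."""
    ordered = tuple(sorted(segments, key=lambda s: s.order))
    return path_id, ordered


# Segments may be decoded from different packets and arrive out of order.
segs = [
    PathSegment(switch_id=3, order=2, ingress_port=1, egress_port=2),
    PathSegment(switch_id=6, order=1, ingress_port=1, egress_port=3),
    PathSegment(switch_id=1, order=3, ingress_port=2, egress_port=1),
]
path_id, label = build_path_label(7, segs)
switch_sequence = [s.switch_id for s in label]
print(path_id, switch_sequence)
```

The sorted tuple runs from the ingress port of the first switch to the egress port of the last switch, which is the reading of the path label given in the text.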

3.3. Advanced Marking Methods

In the marking procedure, the packet header utilizes a programmable combination of Bloom filters so that it can hold complete traceability information regardless of changes in the actual topological path length; that is, the header is adaptive to the path length. The probabilistic packet marking algorithm effectively reduces the number of marked packets needed to reconstruct the attacking path.

3.3.1. Marking Mechanism Based on a Combination of Bloom Filters

The Bloom filter is a probabilistic data structure with high space efficiency. During the marking procedure, the fast insertion and query functions provided by the combined Bloom filter are used to complete the tagging task for different path lengths, improving the coding efficiency and reducing the high cost associated with storing the entire attacking path in multiple packets.

We transform the path information storage and recovery problem into a key-value pair storage and query problem. The traceback information field encapsulated in the packet contains four types of information: the switch ID, switch order, ingress port ID, and egress port ID. The switch ID and the ingress and egress port IDs are abstracted as keys in the Bloom filter, and the order is abstracted as the value. For example, in Figure 2, the information for attacking path 1 can be represented by the ordered switch sequence in which S6 is the first hop, S3 is the second hop, S1 is the third hop, S5 is the fourth hop, and S10 is the fifth hop. At each hop, the key-value pairs of path information are encapsulated in packets as they propagate across the network. When a packet reaches the victim host, the host needs to recover not only the key but also the value to reconstruct the entire attacking path. To recover the key-value pairs while maintaining the space efficiency of the Bloom filter, a combination of Bloom filters, namely BF1, BF2, and a Bloom filter table (BFT), is chosen to store the key-value pairs. The process of storing and parsing the path information is shown in Figure 2. A programmable switch within the network encodes the path information in the packet header, and the victim host decodes and reconstructs the path. Considering the size of the existing network and the number of bits in the packet header option field, BF1 and BF2 are each designed with a fixed bit length and their own set of hash functions. The information written to BF1 concerns all the programmable switches through which the packet is forwarded. The information written to BF2 concerns the switch labels (IDs and ports) through which the packet passes on the attacking path. The sequence number (order) is written by the switch directly to the traceback information field in the packet as it travels across the network. The algorithm for path information encoding is given in Algorithm 1.
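The encoding step can be sketched as follows. This is only an illustration under stated assumptions: the 16-bit and 12-bit filter lengths follow the IP_ID and VLAN_ID fields mentioned in Section 4.2, whereas the hash construction (salted SHA-256), the hash counts `K1` and `K2`, the string key format, and the port numbers are all hypothetical.

```python
import hashlib

BF1_BITS = 16  # assumed to match the 16-bit IP_ID field
BF2_BITS = 12  # assumed to match the 12-bit VLAN_ID field
K1 = 3         # hypothetical number of BF1 hash functions
K2 = 3         # hypothetical number of BF2 hash functions


def bit_positions(data: str, k: int, m: int):
    """Derive k bit positions in [0, m) from salted SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{data}".encode()).digest()[:4], "big") % m
        for i in range(k)
    ]


def encode_hop(bf1: int, bf2: int, switch_id: int, in_port: int,
               out_port: int, order: int, mark: bool):
    """Write one hop: key plus order always goes to BF1, key goes to BF2 if selected."""
    key = f"{switch_id}/{in_port}/{out_port}"
    for pos in bit_positions(f"{key}#{order}", K1, BF1_BITS):
        bf1 |= 1 << pos
    if mark:  # outcome of the probabilistic marking decision
        for pos in bit_positions(key, K2, BF2_BITS):
            bf2 |= 1 << pos
    return bf1, bf2


def contains(bf: int, data: str, k: int, m: int) -> bool:
    """Bloom filter membership query (may return false positives)."""
    return all((bf >> pos) & 1 for pos in bit_positions(data, k, m))


# Path S6 -> S3 -> S1 -> S5 -> S10; only the third hop is selected for BF2.
path = [(6, 1, 2), (3, 1, 2), (1, 1, 2), (5, 1, 2), (10, 1, 2)]
bf1 = bf2 = 0
for order, (sid, inp, outp) in enumerate(path, start=1):
    bf1, bf2 = encode_hop(bf1, bf2, sid, inp, outp, order, mark=(order == 3))
print(f"BF1={bf1:016b} BF2={bf2:012b}")
```

Because both filters are fixed-width integers, the header size stays constant however many hops are encoded, which is the adaptivity property the marking mechanism relies on.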

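Algorithm 1: Path information encoding.
procedure INFORMATION ENCODE
    Key = (switch ID, ingress port ID, egress port ID)
    Value = switch order
    For each hash function of BF1 do
        compute the hash of (Key, Value)
        set the corresponding bit of BF1 to 1
    End for
    x = DynamicProbabilityMarking (switch)
    If switch is in x then
        For each hash function of BF2 do
            compute the hash of Key
            set the corresponding bit of BF2 to 1
        End for
    End if
End procedure

3.3.2. Dynamic Probabilistic Packet Marking Algorithm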

A key problem here is how to determine which switches are selected so that the fewest packets encode the different parts of the path with a high probability. The victim host combines this path information and decodes the complete path. To solve this problem, traditional SDN-based traceback schemes, which depend heavily on the control plane, use a large number of packets to reconstruct the entire attacking path, which also consumes a large amount of data storage resources. We transform this problem into a dynamic probabilistic packet marking problem.

In this problem, the convergence time of the algorithm performed by the network infrastructure is related to the number of packets required by the victim host to reconstruct the complete attacking path. The recovery of the complete attacking path is called convergence, and the convergence time represents the speed of route tracing. We therefore measure the performance of the traceback scheme by the minimum number of packets required for convergence.

In the traditional deterministic marking method, each router explicitly marks data packets by appending its label to the IP packet field. However, this method results in significant traffic overhead. Unlike traditional deterministic marking methods, the dynamic PPM methods set a marking probability for each router and reserve space in the packet for a router label. In this paper, the PPM method is redefined so that each packet in the SDN needs to be marked with a programmable switch label only once to mark the attacking path, provided that there is no abnormal packet loss. From an attacker’s perspective, the packets marked further away from the victim host are more likely to be dropped or overwritten in the log store. Therefore, in order to improve the marking rate of the access switch, the following function is used in this paper as the probability of a packet being marked by the switch:

The minimum number of packets to be obtained (desired value) is as follows:

In our scheme, the marking probability of the programmable switch is set as follows:

where c is a constant given according to the actual network topology. The proposed dynamic probabilistic packet marking method differs from the static probabilistic packet marking method (ST) and the traditional dynamic probabilistic packet marking method (DY), where the marking probability in the static probabilistic packet marking method is as follows:

The marking probability in the traditional dynamic probabilistic packet marking method (DY) is as follows:

According to equations (2) and (4), the value of d determines the value of the marking probability during the marking process, which means that the marking value increases as the value of d increases. This design of the probabilistic algorithm is superior to traditional marking methods because it increases the number of packets carrying marking information from the access switch, effectively reducing the number of packets stored on the victim host that are overwritten. To determine the number of packets needed to reconstruct a meaningful attacking path, we need its expected value. A packet carrying the label of the farthest switch has the lowest probability of being received, and the convergence time of the marking process depends mainly on the distance of the tagged packet from the victim. The probabilistic marking problem discussed here is a classic coupon collection problem. The coupon collection problem refers to repeatedly drawing an object from a set of objects and finding the number of draws needed until every object has been drawn at least once. We use the property that the expectation of a sum of random variables is equal to the sum of the expectations of these random variables. This yields the expectation of the coupon collection problem and, from it, the expected number of packets required to construct a meaningful attack graph.
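The coupon collection argument can be illustrated numerically. The sketch below assumes the classic overwrite-style PPM model, in which a later switch may overwrite an earlier mark; this is a simplification for illustration and not necessarily the redefined marking rule or the probability function of this paper. The helper names `survive_probs` and `coupon_expectation` are hypothetical.

```python
from fractions import Fraction


def survive_probs(mark_probs):
    """Probability that a packet reaching the victim carries the mark of switch i.

    mark_probs[0] belongs to the switch nearest the attacker; later switches
    overwrite earlier marks (classic PPM assumption).
    """
    result = []
    for i, p in enumerate(mark_probs):
        keep = Fraction(1)
        for q in mark_probs[i + 1:]:
            keep *= 1 - q
        result.append(p * keep)
    return result


def coupon_expectation(n: int) -> Fraction:
    """Expected draws to see n equally likely coupons: n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(Fraction(1, k) for k in range(1, n + 1))


D = 5  # path length in hops (illustrative)

# Classic dynamic marking: the i-th switch on the path marks with probability 1/i.
dynamic = survive_probs([Fraction(1, i) for i in range(1, D + 1)])

# Static marking with the same constant probability at every switch.
static = survive_probs([Fraction(1, 5)] * D)

print("dynamic:", dynamic)            # every switch is equally likely
print("static farthest:", static[0])  # the farthest switch is rarely seen
print("expected packets (uniform case):", float(coupon_expectation(D)))
```

Under these assumptions the dynamic probabilities make every switch label equally likely to arrive, so the expected number of packets follows the harmonic sum, whereas a static probability starves the farthest switch and slows convergence.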

3.4. Path Reconstruction Method

In the second part of the scheme, the path reconstruction process, the victim host decapsulates each packet to restore the information that the programmable switches encoded into the traceback information field of the packet header. This information is then used to restore the attack path segments, and the entire attacking path is reconstructed from several attacking path segments.

A Bloom filter table (BFT) is preset in the network access terminal. The design idea of the BFT is to prewrite all switch information in the network topology into the BFT of the victim host, which greatly reduces the delay of the decoding process. We set the parameters of the Bloom filter table to be the same as those of BF2. Each cell in the Bloom filter table stores the collection of all switch IDs and switch port numbers that have been hashed to this location. All switch IDs and switch port numbers in the topology are inserted into the Bloom filter table, so that searching the table returns a set containing these two pieces of information. The victim host first recovers the ID and port number of the switch inserted into BF2. When the packet arrives at the victim host, the host checks which locations in BF2 are set to 1. If a pair of switch ID and switch port number appears k times in the cells at these locations, we consider that the pair has been inserted into BF2; that is, the packet was forwarded through the switch. As shown in Figure 3, positions “0,” “1,” “3,” “6,” “8,” and “10” in the Bloom filter table are set to “1.” After searching the Bloom filter table, the victim host gets a set of candidate pairs, and a pair that appears three times, which is the number of hash functions used in BF2 and the Bloom filter table, is considered to have been written in BF2. The selected pair is then hashed k times together with the order information in the unpacked packet inside the victim host. The result is matched against the positions of “1” in BF1 to check for false positives caused by collisions. If the positions of “1” in BF1 are the same, the corresponding mapping relation of the attacking path segment can be established.
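The decoding step can be sketched in the same spirit. The code below is a minimal illustration, assuming salted SHA-256 hashes, three hash functions, a 12-bit BF2, a 16-bit BF1, and hypothetical key strings such as `S3:p1`; none of these choices are taken from the paper's implementation.

```python
import hashlib
from collections import Counter

M1, M2, K = 16, 12, 3  # assumed BF1 length, BF2 length, number of hash functions


def positions(data: str, k: int, m: int):
    """Derive k bit positions in [0, m) from salted SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{data}".encode()).digest()[:4], "big") % m
        for i in range(k)
    ]


def insert(bf: int, data: str, m: int) -> int:
    for pos in positions(data, K, m):
        bf |= 1 << pos
    return bf


def member(bf: int, data: str, m: int) -> bool:
    return all((bf >> pos) & 1 for pos in positions(data, K, m))


def build_bft(all_keys):
    """Prewrite every switch key into the cells it hashes to."""
    table = [set() for _ in range(M2)]
    for key in all_keys:
        for pos in positions(key, K, M2):
            table[pos].add(key)
    return table


def candidates(bf2: int, table):
    """Keys that appear in every cell their hashes point to among the set bits of BF2."""
    hits = Counter()
    for pos, cell in enumerate(table):
        if (bf2 >> pos) & 1:
            hits.update(cell)
    # Compare with the number of distinct positions in case two hashes collide.
    return {key for key, n in hits.items() if n == len(set(positions(key, K, M2)))}


all_keys = [f"S{i}:p1" for i in range(1, 11)]  # ten switches, one port each
table = build_bft(all_keys)

bf2 = insert(0, "S3:p1", M2)    # the marking switch wrote its key to BF2
bf1 = insert(0, "S3:p1#2", M1)  # and its key together with order 2 to BF1

found = candidates(bf2, table)
confirmed = {key for key in found if member(bf1, f"{key}#2", M1)}
print(sorted(found), sorted(confirmed))
```

The lookup in the prewritten table replaces a scan over all switches, and the BF1 check filters out candidates that were returned only because of BF2 collisions.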

The victim host divides packets into different packet groups by path identifiers. Here, we use BF1 as the identifier for each path. We establish the mapping between BF1 and the path ID and then the mapping between the path ID and the path reconstruction result. When the victim host receives a packet, it first extracts BF1 from the packet header and queries the first mapping. If an invalid path ID is returned, the packet has passed through a new forwarding path; therefore, we create a new path ID and assign it to the packet. Otherwise, we use the valid path ID to query the second mapping to obtain the previous path reconstruction result. The reconstructed result is then updated with the decoded path segment. In this way, decoding the expected number of packets in each path group restores the real attacking path.
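The two mappings can be sketched with ordinary dictionaries. The class name `PathGrouper`, the BF1 values, and the switch names below are hypothetical and serve only to illustrate the grouping logic.

```python
from typing import Dict, List, Set, Tuple


class PathGrouper:
    """Group decoded segments by path, using the BF1 value as the path identifier."""

    def __init__(self) -> None:
        self.path_id_of: Dict[int, int] = {}                  # BF1 -> path ID
        self.result_of: Dict[int, Set[Tuple[int, str]]] = {}  # path ID -> segments

    def update(self, bf1: int, segment: Tuple[int, str]) -> int:
        """Record one decoded (order, switch) segment and return its path ID."""
        path_id = self.path_id_of.get(bf1)
        if path_id is None:  # unseen BF1: the packet used a new forwarding path
            path_id = len(self.path_id_of) + 1
            self.path_id_of[bf1] = path_id
            self.result_of[path_id] = set()
        self.result_of[path_id].add(segment)
        return path_id

    def path(self, path_id: int) -> List[str]:
        """Current reconstruction result, sorted by switch order."""
        return [switch for _, switch in sorted(self.result_of[path_id])]


grouper = PathGrouper()
packets = [
    (0xA5A5, (2, "S3")),
    (0x0F0F, (1, "S7")),
    (0xA5A5, (1, "S6")),
    (0xA5A5, (2, "S3")),  # a duplicate segment does not change the result
]
ids = [grouper.update(bf1, seg) for bf1, seg in packets]
print(ids, grouper.path(1), grouper.path(2))
```

Keeping the segments of each path group in a set makes repeated marks harmless, so the reconstruction result only grows as new segments are decoded.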

4. Experimental Results and Analysis

4.1. Experimental Environment

The traceback scheme proposed in this paper was tested in the following simulation environment: the Ubuntu operating system running in a VMware Workstation virtual machine configured with four processor cores, 8 GB RAM, and a 40 GB hard disk, on an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz. The network topology was simulated using the Mininet [32] simulation tool. Background traffic and DDoS attack traffic were generated by the Scapy library [33] and the hping3 [34] tool. Detailed data traffic was captured and analyzed using Wireshark software.

Mininet simulated a simple topology for the SDN network, as shown in Figure 4. The network topology consists of ten P4 programmable switches, five terminals, and a controller. We select Host B, Host C, Host D, and Host E as the attacking hosts and Host A as the victim host. The background traffic consisted of 75% TCP packets, 20% UDP packets, and 5% ICMP packets. The terminal of each host is opened on the Mininet side via the XTerm command. When the host terminal is opened, a shell script is automatically started, which runs background traffic generation code written in Python to generate and send random packets to one of the hosts in the network to simulate the background traffic. The packet interval for DDoS traffic is set to 25 times the packet interval for background traffic. The relevant settings of the device are listed in Table 3.

In order to demonstrate the authenticity and validity of the experimental topology of this scheme [35–38], we analyzed the topologies of different existing networks, as shown in Figure 5, where the Facebook Fabric Topology is the topology of the latest generation of the Facebook data center network and the Stanford Backbone network is the backbone topology of the Stanford University campus network. In contrast, the other three entries give the average path length of IPv4 packets forwarded in traditional networks from the CAIDA host. Our topology is designed to meet the requirements of network topologies based on SDN architectures.

4.2. Effect of Bloom Filter Parameters

Every Bloom filter suffers from false positives; in our proposed scheme, a false positive matters only when it may cause a wrong path to be constructed. By correctly setting the BF1 and BF2 parameters, we minimize the false alarm rate of the Bloom filters, improve the path restoration rate, and limit the effect of introducing Bloom filters on the traceability success rate. We analyze how much information, such as switch IDs, should be written to BF2 to ensure successful decoding by the victim host, depending on the false alarm rates of BF1 and BF2 in the scenario. The length of a Bloom filter depends on the available packet header space. We choose to use the 16-bit IP_ID and the 12-bit VLAN_ID as the lengths of BF1 and BF2, respectively. Equations (7) and (8) represent the respective false alarm rates for BF1 and BF2:

According to the following equations [39], the optimal number of hash functions and the optimal number of switches written to the Bloom filter are determined to achieve the goal of the lowest false positive rate:
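As a rough numerical illustration, the sketch below evaluates the widely used approximation of the Bloom filter false positive rate, (1 − e^(−kn/m))^k, and the corresponding optimal number of hash functions, (m/n)·ln 2, for the 16-bit and 12-bit filter lengths. Here m is the filter length in bits, n the number of inserted elements, and k the number of hash functions; these symbols and the function names are assumptions of this sketch, and the approximation is known to be coarse for such short filters.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Standard approximation of the Bloom filter false positive rate."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_hash_count(m: int, n: int) -> float:
    """Number of hash functions that minimizes the approximate false positive rate."""
    return (m / n) * math.log(2)


for name, m in (("BF1", 16), ("BF2", 12)):
    for n in (1, 2, 3):
        k = max(1, round(optimal_hash_count(m, n)))
        rate = false_positive_rate(m, n, k)
        print(f"{name}: m={m} n={n} k={k} false positive rate={rate:.4f}")
```

For a fixed length m, the rate rises quickly with the number of inserted elements n, which is consistent with writing only a small number of switches into BF2.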

As the length of the attacking path from the victim increases, more packets are needed to reconstruct the path, as shown in Figure 6. We can observe that the optimal number of switches to insert into BF2 in order to ensure successful path recovery is either 1 or 2.

Here, the success rate of path information restoration can be defined by the following formula. When the victim host queries BF1, it generates a possible key-value pair. However, only one of these key-value pairs matches the one in BF1, which means that only this one has the probability of false positives, and the remaining key-value pairs have no false positives. Therefore, the failure probability of path reconstruction is shown as follows:

The success rate of path reconstruction is shown as follows:

When there are N switches in the network topology, the success rate of path information restoration can be expressed as follows:

The above equation is consistent with the binomial distribution . We plot the change in the success rate of path information restoration when the reduction is small, as shown in Figure 7. Note that when , the success probability is greater than 0.99. Furthermore, when , the success probability is greater than 0.9999.

The traceability scheme proposed in this paper is evaluated using four metrics: traceability success rate, traffic overhead, storage overhead, and controller CPU utilization.

4.3. Accuracy

The success rate of the traceability of the proposed scheme refers to the accuracy of identifying the switch closest to the attack source in the case of IP spoofing or multiple attack sources simultaneously attacking the victim host:

Here, the numerator is the number of times that the switch closest to the attacker is traced within a period of time, and the denominator is the total number of traceback attempts performed within the same period. Under the above experimental conditions, the traceability accuracy of the proposed scheme reaches about 98.93%, higher than that of the existing SDN-based traceability scheme, as shown in Figure 8. When multiple attack sources spoof their IP addresses, attack sources located on attack paths of different lengths can still be traced accurately. In practice, the success rate of tracing is affected by operational factors, such as packet loss on links, packet fragmentation, repeated marking of packets, and overwriting of log records.

4.4. Traffic Overhead

Figure 9 compares the expected number of packets required to reconstruct meaningful attacking paths under different marking methods. When parameter C is set within the range [0.8, 1], the rule that each programmable switch has at least one flag is met. For different path lengths D, the proposed new dynamic probabilistic packet marking scheme requires a smaller expected number of packets than the traditional static and dynamic probabilistic packet marking methods.
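For context, the two baseline trends can be approximated with well-known results from the packet marking literature. The sketch below computes the classical upper bound ln(D) / (p(1 - p)^(D-1)) for static marking with a fixed probability p, and the coupon-collector expectation D · H_D for dynamic marking in which the i-th hop marks with probability 1/i, so that every switch's mark reaches the victim with equal probability 1/D. Equating these models with the baselines in Figure 9 is an assumption, the function names are hypothetical, and the proposed scheme with parameter C is not modeled.

```python
import math


def static_ppm_bound(path_len: int, p: float) -> float:
    """Upper bound ln(D) / (p * (1 - p)**(D - 1)) on the expected packets, for D >= 2."""
    return math.log(path_len) / (p * (1.0 - p) ** (path_len - 1))


def dynamic_ppm_expected(path_len: int) -> float:
    """Expected packets when hop i marks with probability 1/i: D * H_D (coupon collector)."""
    harmonic = sum(1.0 / i for i in range(1, path_len + 1))
    return path_len * harmonic


# Trend over path length for an assumed static marking probability of 0.04.
for d in (5, 10, 15, 20):
    print(f"D={d}: static bound={static_ppm_bound(d, 0.04):.1f}, "
          f"dynamic expectation={dynamic_ppm_expected(d):.1f}")
```

The first value is an upper bound and the second is an exact expectation, so the comparison shows the trend with path length rather than a like-for-like measurement.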

When , we select three different attack path lengths. The relationship between the number of packets required to recover the path and the success rate of attacking path reconstruction is shown in Figure 10. As expected, the longer the reconstructed attacking path, the more packets are required. The number of packets that the proposed scheme requires to find the attack source is far fewer than 100, and its traffic cost is lower than that of the other schemes.

4.5. Storage Requirement

If the number of packets required for the path reconstruction is , the storage size required for logging in our proposed mechanism is the amount of memory needed to store the switch ID, switch order, ingress port, and egress port fields, in addition to the fields normally required by a traceback scheme, as follows:

The storage size required for the same fields carried by the packets in the traditional scheme is as follows:

Figure 11 shows the resulting reduction in storage for the above scenario. As the length of the attacking path in the topology increases, the storage overhead of the traceback information fields in the proposed scheme is less than that of the same fields in the traditional traceback scheme.

4.6. Controller CPU Utilization

The controller CPU utilization metric measures the CPU utilization of the ONOS controller while the proposed tracing scheme and the other tracing schemes are running. Figure 12 shows that the attack traffic is detected at T1 and how the CPU usage of the controller changes over the tracing time after the different tracing programs are started. The traceability scheme proposed in this paper does not depend on the controller. After the traceability task is started, the CPU usage of the controller does not surge over time. The stable fluctuation indicates that the controller only performs its normal interaction with the data plane switches at this time.

5. Conclusions and Future Work

This paper presents a scheme for reconstructing the attacking path and identifying attack sources based on a combined programmable Bloom filter and a new dynamic probabilistic packet marking algorithm. The method adaptively encodes the path information in two Bloom filters encapsulated in the packet header. The path information is converted into a set of key-value pairs in which the switch ID, ingress port, and egress port are abstracted as keys in the Bloom filters and the order is abstracted as a value. The new dynamic probabilistic packet marking algorithm effectively reduces the number of packets that must be collected to reconstruct the attacking path. An experimental study implementing the traceback scheme in a network emulation system shows a significant performance improvement of the low-overhead and high-precision traceback scheme over other reported works.

In future work, we plan to develop an improved packet traceability algorithm and combine it with a large-scale autonomous domain incentive deployment algorithm. The goal is a traceability strategy in a federated model that performs cross-domain traceability more efficiently and reduces maintenance costs across autonomous domains.

Data Availability

The data used in this paper are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key R&D Program of China (2022YFB2901600).