Abstract

As the most competitive solution for next-generation networks, SDN and its dominant implementation, OpenFlow, are attracting more and more interest. Besides convenience and flexibility, however, SDN/OpenFlow also introduces new kinds of limitations and security issues. Among these limitations, the most obvious and perhaps the most neglected one is the flow table capacity of SDN/OpenFlow switches. In this paper, we propose a novel inference attack targeting SDN/OpenFlow networks, motivated by the limited flow table capacity of SDN/OpenFlow switches and the measurable network performance degradation that results from frequent interactions between the data and control planes when the flow table is full. To the best of our knowledge, this is the first inference attack model of this kind proposed for SDN/OpenFlow. We implemented an inference attack framework according to our model and examined its efficiency and accuracy. The evaluation results demonstrate that our framework can infer the network parameters (flow table capacity and usage) with an accuracy of 80% or higher. We also propose two possible defense strategies for the discovered vulnerability: a routing aggregation algorithm and a multilevel flow table architecture. These findings give us a deeper understanding of SDN/OpenFlow limitations and serve as guidelines for future improvements of SDN/OpenFlow.

1. Introduction

By decoupling the control plane from the data plane, Software-Defined Networking (SDN) makes programmability a built-in feature of networks, thereby introducing automation and flexibility into network management. SDN has therefore been foreseen as the key technology enabling the next generation of networking paradigms. Despite its promise, one of the most significant barriers to SDN's wide practical deployment resides in overwhelming security concerns [1]. Proactively detecting, quantifying, and mitigating its security vulnerabilities is therefore of fundamental importance.

In spite of its novelty, SDN reuses various design and implementation elements, ranging from architectures and protocols to systems, from traditional networks. It is not surprising that SDN inherits the vulnerabilities intrinsic to these elements. For example, similar to any networked service, the secure channels between controllers and switches might be disrupted by DDoS attacks; like firewall rules, flow entries may conflict with each other, leaking unwanted traffic; malicious ARP spoofing may poison the controller's MAC table, disturbing normal topology discovery and packet forwarding; and untrusted applications may instrument the SDN controller to perform malicious behaviors in the absence of proper access control, a protection that is one of the basic design objectives of modern operating systems. In response, existing research on SDN security mainly focuses on detecting and mitigating these vulnerabilities. For example, [2] evaluates man-in-the-middle attacks that target SDN/OpenFlow secure channels; FortNOX [3] brings a security enforcement module into NOX [4] and enables real-time flow entry conflict checks; VeriFlow [5] detects network-wide invariant violations by acting as a transparent layer between the control plane and the data plane.

In this paper, we introduce a novel SDN vulnerability. The novelty of this vulnerability stems from the feedback-loop nature of SDN, a fundamental difference from traditional networks. This vulnerability can be extremely severe in SDN-based networks where traffic from different sources shares the same switch's flow table, for example, among different tenants in an SDN-based cloud computing network.

Specifically, most commercial SDN/OpenFlow switches have limited flow table capacities, ranging from hundreds to thousands of entries [6]. Such capacity is usually insufficient to handle the millions of flows that are typical for enterprise and data center networks [7]. Nevertheless, flow table capacity has previously been considered only as a potential bottleneck for resource-consuming attacks, motivating research on flow caching systems [8–10]. According to our analysis, however, the limited flow table capacity can lead to inference attacks and privacy leakage under certain circumstances.

As a consequence of the limited flow table capacity, the SDN controller needs to dynamically maintain the flow table by inserting and deleting flow entries. The maintenance process typically includes packet information transfer, routing rule calculation, and flow entry deployment, which leads to a measurable decrease in network performance.

In particular, once the flow table is full, extra interactions between the controller and the switch are needed to remove existing flow entries and make room for newly generated ones, resulting in a further decrease in network performance. An attacker can therefore leverage the perceived performance change to deduce the internal state of the SDN. More specifically, we consider the scenario in which an attacker resides in a network managed by SDN. The attacker can actively generate network traffic, triggering interactions between the controller and the switch with respect to flow entry insertion and deletion, and then measure the resulting change in network performance to estimate the internal state of the SDN, including the flow table capacity and flow table usage. We have designed algorithms to exploit this vulnerability and quantified their effectiveness through extensive evaluation.

Additionally, to mitigate this vulnerability, we propose two possible defense strategies. The first is a new routing aggregation algorithm that compresses flow entries so that they consume less flow table space. The second is a multilevel flow table architecture, which can provide flow tables with larger capacities without introducing additional power consumption or cost.

To summarize, in this paper we make the following contributions:
(i) We have identified a novel vulnerability introduced by the limited flow table capacities of SDN/OpenFlow switches and formalized the threat.
(ii) We have designed effective algorithms that successfully exploit this vulnerability to accurately infer the internal state of an SDN network, including the flow table capacity and flow table usage.
(iii) We have performed extensive evaluation to quantify the effectiveness of the proposed algorithms. The experimental results demonstrate that the discovered vulnerability indeed leads to significant security concerns: our algorithm can infer the network parameters with an accuracy of 80% or higher across various network settings.
(iv) We have proposed two possible defense strategies for the discovered vulnerability: a routing aggregation algorithm that compresses flow entries, and a multilevel flow table architecture that provides flow tables with larger capacities.

The rest of this paper is organized as follows. Section 2 gives an overall statement of the inference attack problem. Section 3 presents detailed inference algorithms targeting the FIFO and LRU replacement algorithms, respectively. Section 4 gives a detailed evaluation of the simulation results. Section 5 proposes two possible defense methods against this kind of inference attack. Section 6 is a brief discussion of our findings and future research. Section 7 describes related work in this area. Finally, Section 8 concludes the paper.

2. Problem Statement

The vulnerability of flow table overflow in SDN potentially exists in SDN-based cloud computing networks and other important SDN-based networking systems [11, 12].

Analyzing the current structure and implementation of SDN/OpenFlow, we find that its decoupled nature offers an opening: interactions between the control plane and the data plane lead to a network performance decrease that can be measured through performance parameters such as round trip time (RTT). If a packet matches a flow entry, it is forwarded directly according to the matched entry; this process is fast and costs little time. If there is no matching entry, the controller has to calculate a routing rule and send a new flow entry to the switch, which involves more interactions between the controller and the switch and costs more time. If, in addition, the flow table is full, an existing flow entry must first be removed, making this the most expensive case of all.

Figure 1 gives an overall flowchart of packet processing in an OpenFlow switch. The three rectangular regions surrounded by dotted lines stand for three possible packet processing branches. When the switch receives an incoming packet, it parses it and sends the parsed packet into the subsequent processing pipeline.

Then as the first step of the pipeline, the switch will look up its flow table to search flow entries matching the packet. When there is a match, the switch will directly forward the packet according to actions associated with the corresponding flow entry. This branch is illustrated in the innermost rectangle of Figure 1.

When there is no corresponding flow entry in the flow table, extra steps will be introduced into the procedure. Additional interactions between the switch and the controller will happen to acquire corresponding routing rules, including packet information transferring, routing rule calculation, and flow entry deployment. The middle rectangle of Figure 1 illustrates this process.

Before the switch inserts the newly generated flow entry, it has to check the flow table status to make sure there is enough space. When the flow table is full, the controller has to perform a flow table replacement operation to make room for the upcoming flow entry: it decides which old flow entry to delete according to a certain flow table replacement algorithm and then deletes it. The outermost rectangle in Figure 1 stands for this branch.

This is exactly where the vulnerability lies. In traditional networks, switches and routers are autonomous: they maintain their routing tables locally without interacting with an external device. Due to the decoupled nature of SDN/OpenFlow, however, maintaining switch flow tables requires frequent interactions between switches and controllers, making it possible for an attacker to leverage the perceived performance change to deduce the internal state of the SDN network.

As shown in Figure 1, the rectangular regions surrounded by dotted lines correspond to the different packet processing branches. The larger a rectangle is, the longer the processing time of that branch, because of the extra steps the rectangle contains. When there is a match in the flow table, the processing time is the shortest; when there is no match and the flow table is not full, the processing time is longer because of the additional routing calculation and flow entry deployment; when there is no match and the flow table is full, the processing time is the longest because a flow table replacement operation has to be performed. As a network parameter directly influenced by the processing time, the RTT of a packet can therefore serve as an indicator of the flow table state and the flow entry state.

The process of deciding RTT thresholds for flow table state detection is shown in Figure 2.

Figures 2(a) and 2(b) represent two cooperating threads; the x-axis represents the packet sequence, and the y-axis represents the recorded RTT of every packet. First, in the upper thread, we generate a packet with a specific combination of header fields (e.g., src_ip and dst_ip), calling it P_0. We send P_0 to the target OpenFlow switch and record the corresponding RTT as RTT_miss; because P_0 is a new packet, there is no corresponding flow entry in the OpenFlow switch yet. After a time span T, we send P_0 to the target OpenFlow switch again and record the corresponding RTT as RTT_hit. If T is chosen properly, the newly installed flow entry matching P_0 still exists in the OpenFlow switch. Next, in the lower thread, we continuously generate packets P_1, P_2, ..., each with a different combination of header fields, and send these packets to the target OpenFlow switch at a fixed interval. Because no flow entries match these packets, the recorded RTTs are approximately the same as RTT_miss. We keep generating and sending packets until we observe a sudden increase of the RTT, which indicates that the flow table is full. Then, in the upper thread, we immediately send P_0 again and record the RTT as RTT_full (at this point the flow entry matching P_0 no longer exists, while the flow table is full). To achieve higher precision, we can repeat the process and use the average values of RTT_hit, RTT_miss, and RTT_full as the final results.

From the process above we can see that RTT_hit, RTT_miss, and RTT_full serve as thresholds for flow table state detection: when the measured RTT is around RTT_hit, we can infer that a corresponding flow entry exists in the flow table; when the measured RTT is around RTT_miss, we can infer that there is no corresponding flow entry and the flow table is not full; when the measured RTT is around RTT_full, we can infer that there is no corresponding flow entry and the flow table is full.
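To make the detection step concrete, the following minimal Python sketch shows one way a measured RTT could be mapped to the three states once RTT_hit, RTT_miss, and RTT_full have been estimated; the midpoint decision rule and the numeric thresholds are illustrative assumptions rather than the exact values used by our framework.

```python
# Illustrative sketch (not the exact decision rule of our framework):
# map a measured RTT to one of the three flow table / flow entry states,
# given the averaged thresholds RTT_hit, RTT_miss, and RTT_full.

HIT = "flow entry exists"
MISS_NOT_FULL = "no flow entry, flow table not full"
MISS_FULL = "no flow entry, flow table full"

def classify_rtt(rtt, rtt_hit, rtt_miss, rtt_full):
    """Classify a measured RTT (in ms) using midpoints between thresholds."""
    assert rtt_hit < rtt_miss < rtt_full
    if rtt < (rtt_hit + rtt_miss) / 2.0:
        return HIT
    if rtt < (rtt_miss + rtt_full) / 2.0:
        return MISS_NOT_FULL
    return MISS_FULL

# Hypothetical thresholds (ms), roughly in line with the ranges reported in
# Section 4.2; the value for a miss with a full flow table is an assumed placeholder.
print(classify_rtt(0.25, rtt_hit=0.25, rtt_miss=4.0, rtt_full=12.0))  # flow entry exists
print(classify_rtt(4.3,  rtt_hit=0.25, rtt_miss=4.0, rtt_full=12.0))  # no flow entry, flow table not full
print(classify_rtt(15.0, rtt_hit=0.25, rtt_miss=4.0, rtt_full=12.0))  # no flow entry, flow table full
```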

We model the SDN/OpenFlow network as a black box and observe its response (RTT) to different inputs (network packets). We then use the response to estimate the flow table state and flow entry state and to perform further inference. The whole process consists of three steps.

First, we send probing packets into the network to trigger the interaction. As there is still no mature routing aggregation algorithm or hierarchical routing rule solution, current SDN/OpenFlow switches typically use exact match rules. That means that if we send packets with different forged header fields such as src_ip and dst_ip, newly generated flow entries will be inserted into the flow table. If we send enough probing packets in a short period of time, the flow table will overflow and the replacement process will be triggered. Second, we measure the RTTs of the replied packets and infer the flow table state and flow entry state. Third, we use the observed flow table states and flow entry states as control signals in our inference algorithm and perform flow table capacity inference.
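As a small illustration of the first step, the sketch below enumerates probing packets whose src_ip/dst_ip combinations are all distinct, so that each probe triggers the installation of a new exact-match flow entry. The send_probe callback, which would actually transmit the packet and return its RTT (our framework uses libnet and libpcap for this), is left abstract here; the base addresses are assumptions for illustration.

```python
import ipaddress

def probe_headers(count, src_base="10.0.0.0", dst_base="10.128.0.0"):
    """Yield `count` distinct (src_ip, dst_ip) pairs; base addresses are illustrative."""
    src0 = int(ipaddress.IPv4Address(src_base))
    dst0 = int(ipaddress.IPv4Address(dst_base))
    for i in range(count):
        yield (str(ipaddress.IPv4Address(src0 + i)),
               str(ipaddress.IPv4Address(dst0 + i)))

def run_probing(send_probe, count):
    """Send `count` probes and collect (src_ip, dst_ip, rtt) tuples.

    `send_probe(src_ip, dst_ip)` is an abstract callback that forges and sends
    one probing packet, captures the reply, and returns the measured RTT.
    """
    results = []
    for src_ip, dst_ip in probe_headers(count):
        results.append((src_ip, dst_ip, send_probe(src_ip, dst_ip)))
    return results
```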

Because it has to achieve as high a hit rate as possible within rather limited space, the flow table serves as a "cache," much like those in operating systems and web proxy servers. In this paper we consider the FIFO and LRU replacement algorithms because they are common and popular [13].

3. Inference Algorithm

The logical structure of our inference algorithm is shown in Figure 3. The inference algorithm consists of two main parts: flow table state detection and flow table state control. For flow table state detection, we perform RTT measurements to classify the different states of the flow table and of a specific flow entry. For flow table state control, we generate specific sequences of attacking network packets to manipulate the state of flow entries. For different flow table replacement algorithms, the relation between the network traffic sequence and the flow entry state differs, so we use different traffic generation strategies for different replacement algorithms such as FIFO and LRU. We introduce the inference algorithms for FIFO and LRU, respectively, below.

3.1. FIFO Inference Algorithm

As mentioned in Section 2, the inference process for the FIFO algorithm is as follows: we generate and send a large number of probing packets, each with a different combination of src_ip, dst_ip, src_mac, and dst_mac, and the newly inserted flow entries matching the generated packets "push" the other users' flow entries out of the flow table. We can detect whether the flow table is full and whether our flow entries still exist. Combined with the recorded number of inserted flow entries, we can infer the flow table capacity and flow table usage. The process of flow table state transformation is shown in Figure 4.

We use N_us(t) to denote the number of flow entries inserted by us and N_other(t) to denote the number of flow entries from other users in the flow table; both are functions of time. We use t_1, t_2, t_3, and t_4 to denote the four time points corresponding to the four subfigures (a)–(d) of Figure 4, and we use C to denote the flow table capacity.

Figure 4(a) shows the flow table and the flow entries it contains just before the experiment starts (time t_1). The rectangular items represent the flow entries from other users sharing the OpenFlow switch. The current number of other users' flow entries can be expressed as N_other(t_1).

Figure 4(b) illustrates the time t_2 when we start to send generated packets, inserting new flow entries into the flow table. The grey rectangles represent the flow entries inserted by us. As we can see, our flow entries keep pushing other users' flow entries toward the front of the FIFO queue. During the experiment, we keep a record of the generated packets, including their attributes and serial numbers.

Figure 4(c) shows the time t_3 when we detect that the flow table is full. At this point, the flow entries from us and from other users add up to exactly fill the whole flow table. We have

N_us(t_3) + N_other(t_3) = C.

Figure 4(d) shows the time t_4 when we detect that one of our inserted flow entries has been deleted. That means the flow table is now full of our flow entries, without any flow entries from other users. We have

N_us(t_4) = C.

Combining the two equations above, we have

N_other(t_3) = N_us(t_4) - N_us(t_3).

Because we record every probing packet we send, N_us(t_3) and N_us(t_4) are known, so both the flow table capacity C and the flow table usage N_other can be obtained.

According to the analysis above, we describe the inference process for the FIFO algorithm in Algorithm 1.

Require:
  Packet-Sending Function: SENDPACKET;
  List of IP: IPList;
Ensure:
  The flow table capacity: C;
  The number of other users' flow entries: N_other;
i ← 0
while i < length(IPList) do
  SENDPACKET(IPList[i]); i ← i + 1
  if flow table is detected full for the first time then
    n_full ← i
    continue
  end if
  if one of our flow entries is deleted then
    C ← i; N_other ← C − n_full
    break
  end if
end while
return C, N_other
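To complement the pseudocode, the following Python sketch replays the same counting logic against a simulated FIFO flow table. The FIFOFlowTable class, its capacity, and the pre-installed entry count are purely illustrative stand-ins for a real switch, and the full/eviction signals stand in for the RTT-based detection described in Section 2.

```python
from collections import deque

class FIFOFlowTable:
    """Toy FIFO flow table used only to illustrate the inference logic."""

    def __init__(self, capacity, other_entries):
        self.capacity = capacity
        self.queue = deque(("other", i) for i in range(other_entries))

    def insert(self, entry):
        """Insert one of our entries; return the evicted entry, if any."""
        evicted = None
        if len(self.queue) >= self.capacity:
            evicted = self.queue.popleft()          # FIFO: evict the oldest entry
        self.queue.append(("ours", entry))
        return evicted

    def is_full(self):
        return len(self.queue) >= self.capacity

def fifo_inference(table, max_probes=100000):
    """Infer the capacity C and other users' usage by counting our insertions."""
    n_full = None
    for i in range(1, max_probes + 1):
        evicted = table.insert(i)
        if n_full is None and table.is_full():
            n_full = i                              # flow table first observed full
            continue
        if evicted is not None and evicted[0] == "ours":
            capacity = i - 1                        # table held only our entries
            return capacity, capacity - n_full      # estimates of C and N_other
    return None, None

# Hypothetical switch: capacity 1500, with 300 entries already installed by others.
print(fifo_inference(FIFOFlowTable(capacity=1500, other_entries=300)))  # -> (1500, 300)
```

Under the idealized assumption that other users install nothing during the run, the sketch recovers both parameters exactly; real measurements additionally suffer from the underestimation discussed next.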

The main error of the inference comes from flow entries inserted by other users while our insertion is in progress. We assume that our flow entry insertion speed is fast enough that, during the experiment, the newly inserted flow entries are all from us. But that is not always the case. Ignoring the flow entries possibly inserted by other users makes our inference results smaller than the actual values.

Considering the flow entries inserted by other users, the actual equations are listed below.

When we detect that the flow table is full, if we use ΔN_3 to denote the number of flow entries newly inserted by other users from time point t_1 to time point t_3, the equation becomes

N_us(t_3) + N_other(t_1) + ΔN_3 = C.

And when we detect that one of our inserted flow entries has been deleted, with ΔN_4 denoting the number of flow entries inserted by other users from t_1 to t_4, the equation becomes

N_us(t_4) + ΔN_4 = C.

Combining the two equations above, we have

N_other(t_1) = (N_us(t_4) - N_us(t_3)) + (ΔN_4 - ΔN_3).

So the actual equations considering flow entry insertions during the inference should be

C = N_us(t_4) + ΔN_4,
N_other(t_1) = (N_us(t_4) - N_us(t_3)) + (ΔN_4 - ΔN_3).

Compared with our former equations, which ignore these flow entry insertions,

C = N_us(t_4),
N_other(t_1) = N_us(t_4) - N_us(t_3),

we can see that, since ΔN_4 ≥ ΔN_3 ≥ 0, the inferred flow table usage and the inferred flow table capacity will both be smaller than the actual values.

3.2. LRU Inference Algorithm

The inference principle for the LRU algorithm has something in common with that for FIFO: in both cases we can keep our flow entries at the back of the cache queue through certain operations. However, there are still differences in the flow entry maintaining process.

The nature of the FIFO algorithm ensures that the position of a flow entry depends only on the time it was inserted; earlier inserted flow entries are always nearer to the front of the cache queue than later inserted ones. In the LRU algorithm, however, the positions of flow entries depend not only on when they were inserted but also on when they were last accessed. In order to keep our flow entries at the back of the cache queue, we need to continuously access the previously inserted flow entries.

During the maintaining process, every time we insert a new flow entry, we need to access all previously inserted flow entries once to "lift" them to the back of the cache queue. The access history therefore looks like P_1; P_1, P_2; P_1, P_2, P_3; and so on, which is why we call it a "rolling" maintaining process. The maintaining algorithm is shown in Algorithm 2. According to the analysis above, we describe the inference process for LRU in Algorithm 3.

Require:
  Packet-Sending Function: SENDPACKET;
  List of Inserted IP: InsertedIPList;
function ROLLINGPACKETSENDER(InsertedIPList)
  while maintenance of our flow entries is required do
    for j ← 0; j < length(InsertedIPList); j ← j + 1 do
      SENDPACKET(InsertedIPList[j])
    end for
  end while
end function
Require:
  Packet-Sending Function: SENDPACKET;
  List of IP: IPList;
Ensure:
  The flow table capacity: C;
  The number of other users' flow entries: N_other;
i ← 0; InsertedIPList ← ( )
while i < length(IPList) do
  append IPList[i] to InsertedIPList; i ← i + 1
  ROLLINGPACKETSENDER(InsertedIPList)
  if flow table is detected full for the first time then
    n_full ← i
    continue
  end if
  if one of our flow entries is deleted then
    C ← i; N_other ← C − n_full
    break
  end if
end while
return C, N_other
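For LRU, a toy simulation again illustrates why the rolling maintenance keeps our entries at the most-recently-used end of the queue. The LRUFlowTable class (built on OrderedDict), its capacity, and the eviction-based detection are illustrative stand-ins for a real switch and for the RTT-based detection of Section 2.

```python
from collections import OrderedDict

class LRUFlowTable:
    """Toy LRU flow table: the least recently used entry is evicted on overflow."""

    def __init__(self, capacity, other_entries):
        self.capacity = capacity
        self.table = OrderedDict((("other", i), None) for i in range(other_entries))

    def access(self, key):
        """Access (install or refresh) an entry; return the evicted entry, if any."""
        if key in self.table:
            self.table.move_to_end(key)                 # refresh: key becomes most recent
            return None
        evicted = None
        if len(self.table) >= self.capacity:
            evicted, _ = self.table.popitem(last=False)  # evict least recently used
        self.table[key] = None
        return evicted

def lru_inference(table, max_probes=100000):
    inserted = []
    n_full = None
    for i in range(1, max_probes + 1):
        evicted = table.access(("ours", i))
        inserted.append(("ours", i))
        for key in inserted[:-1]:                        # rolling maintenance: re-access
            table.access(key)                            # every previously inserted entry
        if n_full is None and len(table.table) >= table.capacity:
            n_full = i
            continue
        if evicted is not None and evicted[0] == "ours":
            capacity = i - 1
            return capacity, capacity - n_full
    return None, None

# Hypothetical switch: capacity 800, with 200 entries already installed by others.
print(lru_inference(LRUFlowTable(capacity=800, other_entries=200)))  # -> (800, 200)
```

Note that the rolling maintenance makes the probing cost grow quadratically with the number of inserted entries; this is the price of keeping our entries most recently used.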

The feasibility and error analysis for the LRU algorithm are similar to those for FIFO: the inferred flow table usage and the inferred flow table capacity will both be smaller than the actual values because the flow entries inserted by other users during the experiment are ignored.

4. Evaluation

4.1. Implementation

The emulation environment of our experiment consists of three parts: a network prototyping system used to emulate hosts and switches, a network controller, and our inference attack toolkit.

We choose Mininet [14] as the network prototyping system because it encapsulates host and switch emulation and is thus easy to use. Our emulated network prototype for evaluation uses a star topology, consisting of hosts connected to a single OpenFlow switch. We build FIFO and LRU controller applications in Python on the basis of the POX [15] OpenFlow controller. As for the inference attack toolkit, we use libnet [16] to generate probing packets and libpcap [17] to capture replied packets. To simulate the background traffic of a real network, we built an SDN testbed using Mininet and POX. On this testbed, we performed a series of basic SDN operations, including building a customized SDN network topology, setting up the links between SDN switches, and performing ping tests between all SDN nodes. We captured the network traffic generated during these operations and used it as the background network traffic sample.
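As a rough illustration of this setup, the script below brings up a single-switch star topology in Mininet and attaches it to a remote POX controller; the host count and the controller address/port (127.0.0.1:6633) are assumptions made for this example rather than the exact values used in our evaluation.

```python
#!/usr/bin/env python
"""Minimal Mininet sketch of a star topology with one OpenFlow switch.
Assumes a POX controller is already listening on 127.0.0.1:6633."""
from mininet.net import Mininet
from mininet.node import RemoteController
from mininet.topo import SingleSwitchTopo

def run():
    topo = SingleSwitchTopo(k=4)    # 4 hosts on a single switch (illustrative count)
    net = Mininet(topo=topo,
                  controller=lambda name: RemoteController(
                      name, ip="127.0.0.1", port=6633))
    net.start()
    net.pingAll()                   # basic reachability check, also creates traffic
    net.stop()

if __name__ == "__main__":
    run()
```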

4.2. RTT Measurement

As mentioned in Section 2, the difference between traditional networks and SDN/OpenFlow networks in handling previously unseen packets gives us a possible indicator of the flow table state and the flow entry state: RTT. When there is no corresponding flow entry in the flow table, the RTT of a packet increases significantly due to the interactions between the controller and the switch needed to acquire a new flow entry; this is the case when there is still space in the flow table. Once the flow table is full, the RTT increases further as a result of the extra flow table replacement operations. To prove the effectiveness of using RTT as the flow table state and flow entry state indicator, we measured packet RTTs corresponding to the different flow table and flow entry states.

Figure 5 gives the RTT measurement results showing the difference. The points with different symbols represent a total of 300 RTT measurements, 100 for each combination of flow table state and flow entry state. The square points stand for RTTs when the flow entry exists in the flow table. The circle and triangle points both stand for RTTs when the flow entry does not exist in the flow table; the circle points are measured when the flow table is full, and the triangle points when it is not.

As can be seen from the figure, when the flow entry exists in the flow table, the packet RTTs are highly concentrated in the range of 0.2~0.3 ms; when the flow entry does not exist and the flow table is not full, the packet RTTs increase to about 3~5 ms; when the flow entry does not exist and the flow table is full, the packet RTTs are the highest. These three groups of RTTs each fall within a small range without overlapping the other groups, showing the excellent discrimination obtained by using RTT as a flow table state and flow entry state indicator.

4.3. Timeout
4.3.1. Default Timeout Values

According to our previous analysis, the feasibility of our inference attack depends on whether we can generate enough flow entries to fill the flow table within a single timeout cycle. That means we must be able to generate as many flow entries as the flow table can hold during one timeout period. We therefore analyzed several popular open-source controllers and searched for the default timeout values in their built-in applications. The results are presented in Table 1. The zero values in the table mean that the corresponding timeout does not take effect, or in other words that the entry is "permanent." As can be seen from the table, most available controllers have timeout values in the range of 5 s to 30 s.

The minimum required packet generating speed equals the flow table capacity divided by the timeout value, while libnet can generate tens of thousands of packets per second. So the default timeout values ensure the feasibility of our inference attack.
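For a concrete sense of scale, consider a hypothetical switch (values assumed for illustration only, not taken from our measurements) with a flow table capacity of 2000 entries and a 10 s timeout: the attacker would need to sustain at least 2000 / 10 = 200 new flows per second, roughly two orders of magnitude below what libnet can generate.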

4.3.2. Timeout Measurement

Although the default timeout values of mainstream OpenFlow controllers can be read from their source code, it is still possible for SDN network administrators to change the defaults manually. In order to handle nondefault timeout values and provide a basis for adjusting the packet generating speed, it is essential to examine the accuracy of passive timeout measurement.

Figure 6 illustrates the relative errors (see equation (9)) of the hard_timeout and idle_timeout measurements, respectively. We manually modify the hard_timeout and idle_timeout values of the POX OpenFlow controller to 5 s, 10 s, 15 s, 20 s, 25 s, and 30 s, and then we use the timeout measurement algorithm mentioned in Section 2 to measure these timeout values and calculate the relative errors:

relative error = (measured timeout − real timeout) / real timeout. (9)

Every line in Figures 6(a) and 6(b) corresponds to repeated measurements conducted under a certain timeout setting from 5 s to 30 s. The relative error stays within plus-or-minus 10 percent, showing the effectiveness and high accuracy of our timeout measurement algorithm.
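The timeout measurement procedure itself is only referenced above; as one possible (assumed) realization, the sketch below estimates idle_timeout by re-probing a previously installed flow entry after progressively longer idle gaps until the probe misses. The send_probe abstraction is the same as in the earlier sketches, and hit_threshold would be derived from the RTT thresholds of Section 2.

```python
import time

def estimate_idle_timeout(send_probe, probe, hit_threshold, max_timeout=60, step=1):
    """Estimate the idle_timeout (in seconds) of the flow entry matching `probe`.

    send_probe(src_ip, dst_ip) -> measured RTT of one probing packet;
    probe                      -> one (src_ip, dst_ip) pair reused throughout;
    hit_threshold              -> RTT above which the probe is treated as a miss.
    Returns the largest idle gap after which the entry was still installed.
    """
    send_probe(*probe)                      # install (or re-install) the flow entry
    gap = step
    while gap <= max_timeout:
        time.sleep(gap)                     # stay idle for `gap` seconds
        if send_probe(*probe) > hit_threshold:
            return gap - step               # entry expired between gap-step and gap
        gap += step                         # entry survived; try a longer idle gap
    return max_timeout
```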

4.4. Flow Table Capacity

Flow table capacity is the primary target of our inference attack; it reflects the hardware specification of an OpenFlow switch. Figure 7 illustrates the flow table capacity measurement results when the controller adopts the FIFO replacement algorithm. We manually limited the switch flow table capacity to a range of different values and used our framework to perform the inference.

The dark bars represent the manually set, or real, flow table capacities. The light bars represent the measured flow table capacities. For every manually set flow table capacity, we conduct repeated measurements and take their mean value as the final result. From the figure we can see that the measured capacities are quite close to the real capacities, indicating the high accuracy of our inference framework. As the real capacity grows, the required packet generating speed becomes faster, placing higher requirements on packet sending, receiving synchronization, and accurate timing. Even so, our inference algorithm shows remarkable stability and accuracy, with only small errors at the largest capacities we tested.

Like Figure 7, Figure 8 also illustrates the flow table capacity measurement results, the only difference being that they are obtained under the LRU replacement algorithm instead of FIFO.

According to our previous analysis, the inference principle for the LRU replacement algorithm is more complex because of the unavoidable mixing of flow entries in the flow table and the rolling maintaining process. Nevertheless, our inference framework still shows high accuracy and reliability: even when the real flow table capacities are set to rather large values, the errors of our measured capacities remain small.

Only illustrating the mean value of the measured flow table capacities may not be enough: the mean value may be the result of error compensation and hide the measurement error of each separate experiment. So in Figure 9 we illustrate the relative error of every single flow table capacity measurement.

We choose several groups of different flow table capacities and perform repeated measurements for every single capacity value. Figure 9(a) shows the relative errors of the flow table capacity measurements conducted under the FIFO replacement algorithm; the errors stay within a small margin. Figure 9(b) shows the relative errors of the measurements conducted under the LRU replacement algorithm. Due to the more complex inference principle and the rolling maintaining process, the margin becomes larger but remains bounded even in the worst case.

The above inference attacks are performed without any background network traffic. When performing inference attack in real networks, the impact of background network traffic cannot be ignored. So it is necessary to examine the efficiency of our inference algorithm under these circumstances.

In this evaluation, we use the background network traffic dataset captured from our SDN testbed. Figures 10 and 11 have the same experiment settings as Figures 7 and 8, the only difference being that the background traffic captured from the SDN testbed is replayed during the inference attack. Even with the impact of background traffic, our inference algorithm still shows high accuracy.

4.5. Flow Table Usage

In this section we evaluate our framework's efficiency in inferring the number of flow entries from other users sharing the same flow table, that is, the flow table usage. Flow table usage is our secondary inference target, and it reflects the network resource consumption of other tenants in the same SDN network. Figures 12 and 13 illustrate the flow table usage measurement results conducted under the FIFO and LRU replacement algorithms, respectively.

Again we manually set different flow table usage values by generating and inserting the corresponding number of flow entries into the flow table beforehand. Then we use our inference algorithm to infer the flow table usage and take the mean value of repeated measurements as the final result. The errors of all these measurements demonstrate the high accuracy, stability, and reliability of our inference algorithm.

We also conducted the flow table usage inference attack with background traffic. The results with testbed background traffic are shown in Figures 14 and 15. Our inference algorithm handles the impact of background traffic smoothly, which accounts for the stability and robustness demonstrated in the experiment results.

The relative errors are shown in Figure 16. We emulate several groups of different flow table usage values and conduct repeated flow table usage inferences for every single value. For both the FIFO and LRU replacement algorithms, the relative errors of the flow table usage inference stay within a quite small range. The results prove that our algorithm can infer other tenants' flow table usage with high accuracy.

5. Defense

From the previous sections we can conclude that the inference attack is rooted in flow table overflow. To defend against this kind of inference attack, we have to prevent flow table overflow from two directions: one is to compress the flow entries to save flow table space, and the other is to implement a larger flow table that can store more flow entries.

5.1. Routing Aggregation

Routing aggregation combines multiple entries in the flow table without changing the next hops for packet forwarding. This approach is particularly appealing because it can be done by a software upgrade at the OpenFlow switch and its impact is limited to that switch. Routing aggregation has already been used in traditional networks, but it has not been deployed in SDN/OpenFlow networks. To fully utilize the flexibility of the SDN/OpenFlow network under scenarios such as load balancing, we propose a global routing schedule using packing optimization algorithms.

Traditional routing aggregation algorithms [18] can be used to compress the flow table, but their effectiveness cannot be guaranteed. If the matching fields and next hops are dispersed enough, chances are that we may not be able to perform any routing aggregation, because we cannot find flow entries sharing common matching fields and next hops. This is often the case when dealing with web traffic, for example, in load balancing services. For that reason, we introduce an extra stage of routing aggregation: global routing schedule optimization. First we model the routing aggregation problem as a packing optimization problem and solve it; then we perform a global routing reschedule by rewriting the flow entries according to the optimization result; finally we perform another round of traditional routing aggregation on the new flow entries. After the global routing reschedule, there are many more aggregatable flow entries, so the effectiveness of routing aggregation is ensured.
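As a small illustration of the traditional aggregation stage (not of the packing-based global reschedule itself), the sketch below groups flow entries by next hop and collapses contiguous destination prefixes using Python's ipaddress module; the (dst_prefix, next_hop) rule representation is an assumption made for this example.

```python
import ipaddress
from collections import defaultdict

def aggregate(entries):
    """Collapse contiguous destination prefixes that share the same next hop.

    entries: iterable of (dst_prefix, next_hop) pairs, e.g. ("10.0.0.1/32", "port1").
    Returns an equivalent, usually shorter, list of (dst_prefix, next_hop) pairs.
    """
    by_next_hop = defaultdict(list)
    for prefix, next_hop in entries:
        by_next_hop[next_hop].append(ipaddress.ip_network(prefix))
    aggregated = []
    for next_hop, prefixes in by_next_hop.items():
        for net in ipaddress.collapse_addresses(prefixes):
            aggregated.append((str(net), next_hop))
    return aggregated

rules = [("10.0.0.0/32", "port1"), ("10.0.0.1/32", "port1"),
         ("10.0.0.2/32", "port1"), ("10.0.0.3/32", "port1"),
         ("10.0.1.0/32", "port2")]
print(aggregate(rules))   # the four port1 entries collapse into ("10.0.0.0/30", "port1")
```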

5.2. Multilevel Flow Table Architecture

It is important to note that routing aggregation is not a replacement for long-term architectural solutions because it does not address the root cause of the flow table scalability problem and the resulting inference attack. To eliminate the inference attack vulnerability, a flow table architecture with larger capacity is required, which can be achieved through a multilevel flow table consisting of both TCAM and SRAM.

The original single-level flow table architecture is shown in Figure 17. In this architecture, the flow table is implemented entirely in TCAM. An input packet traverses the tables one by one, adding the corresponding actions to its action set. Then all actions in the action set are executed and the packet is forwarded according to these actions.

Our proposed multilevel flow table architecture is shown in Figure 18. Besides the flow table implemented in TCAM, we add another flow table implemented in SRAM, which is cheaper and can provide a larger flow table space. Under this multilevel architecture, the packet processing pipeline changes: an input packet first looks for matching flow entries in the TCAM flow table, just as in the original single-level architecture. If there is a match, the packet executes the corresponding actions and is forwarded. If there is no match, the lookup continues in the SRAM flow table. If there is a match in the SRAM flow table, the packet can then be forwarded; otherwise it is sent to the controller.

From the process above, we can see that if the capacity of the TCAM flow table is C_TCAM and the capacity of the SRAM flow table is C_SRAM, then the multilevel flow table has a capacity of C_TCAM + C_SRAM. In practice, C_SRAM is far larger than C_TCAM, so this approach can greatly increase the flow table capacity, thus preventing flow table overflow.
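The lookup pipeline of the proposed architecture can be summarized by the small sketch below; the dictionary-backed tables, the miss handler, and the install-into-TCAM-first policy are simplifications standing in for the TCAM/SRAM hardware and the controller round trip.

```python
class MultilevelFlowTable:
    """Two-level flow table: a small, fast TCAM level backed by a larger SRAM level."""

    def __init__(self, tcam_capacity, sram_capacity):
        self.tcam, self.tcam_capacity = {}, tcam_capacity
        self.sram, self.sram_capacity = {}, sram_capacity

    @property
    def capacity(self):
        return self.tcam_capacity + self.sram_capacity   # C_TCAM + C_SRAM

    def lookup(self, match, ask_controller):
        """Return the actions for `match`, falling back TCAM -> SRAM -> controller."""
        if match in self.tcam:
            return self.tcam[match]            # first level: TCAM hit
        if match in self.sram:
            return self.sram[match]            # second level: SRAM hit
        actions = ask_controller(match)        # miss in both levels: ask the controller
        if len(self.tcam) < self.tcam_capacity:
            self.tcam[match] = actions         # install into TCAM while space remains
        elif len(self.sram) < self.sram_capacity:
            self.sram[match] = actions         # otherwise overflow into the SRAM level
        return actions
```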

6. Discussion

SDN/OpenFlow has become a competitive solution for next-generation networks and is being more and more widely used in modern data centers. But considering its key role as fundamental infrastructure, we have to admit that the security issues of SDN/OpenFlow have not been explored to a large extent. In particular, the flow table capacity of SDN/OpenFlow switches has only been considered as a vulnerable point for DDoS and flooding attacks in published research. According to our analysis in the previous sections, however, the flow table capacity can lead to a potential inference attack when combined with reasonable assumptions and RTT measurements.

First, we found in Section 2 that exact match flow entries, together with the lack of route aggregation, consume a great deal of flow table space, making it impossible to process millions of flows per second using SDN/OpenFlow. Second, we found in Section 4 that assigning the flow table replacement decisions to the controller leads to a significant network performance decrease, which has to be addressed in time. Third, there is currently no mature attack detection mechanism for SDN/OpenFlow networks, so it is quite easy for criminals to exploit system vulnerabilities or launch DDoS attacks.

The inference method proposed in this paper uses only basic elements and parameters of OpenFlow, such as idle_timeout and hard_timeout, which are essential to the implementation of SDN. These features will not be removed unless very substantial changes are made. On the other hand, although some security frameworks [3] have been proposed to detect the malicious insertion of flow rules, attackers can bypass such detection with well-designed insertion strategies.

All these security issues call for improvements to the current OpenFlow switch and flow table design. The improvements should at least cover the following aspects:
(1) New OpenFlow switch architectures, such as embedding local caches in the switch or implementing a multilevel flow table to achieve a much larger flow table capacity. With a larger flow table capacity, the switch will not have to query the controller for flow entries as often, which will reduce the interaction latency to a large extent.
(2) New flow table maintaining mechanisms, such as transferring the flow entry deletion workload from the controller to the switch. The switch itself can decide which flow entry to delete and then synchronize state with the controller, so the controller's intervention is not needed during the deletion process. In the widely used OpenFlow Switch Specification 1.4.0 [19], this mechanism has been added as an optional feature, but without any mature implementation so far.
(3) Routing aggregation. Routing aggregation can match a group of flows with one flow entry, which reduces flow table consumption significantly compared with exact matching.
(4) Inference attack detection. Administrators can develop inference attack detection applications and then perform defenses such as port speed limiting or network address validation.

From the discussion above, we can see that there is still a long road ahead before SDN/OpenFlow becomes a truly mature and reliable network paradigm. There are still urgent and severe issues to solve, some of which have been neglected in the past. Only by solving these security issues and architectural vulnerabilities can SDN/OpenFlow be widely deployed in real-world commercial data centers and fully demonstrate its revolutionary flexibility and intelligence.

7. Related Work

The inference attack proposed in this paper is motivated by the limited flow table capacity of SDN/OpenFlow switches. The flow table capacity issue has been discussed in many previous works [20–24], which all point out the limitation of switch flow table memory and the resulting scalability and security issues. However, these works do not give further analysis of the inference attack and information leakage caused by the limited flow table capacity.

Klöti et al. [25] present potentially problematic issues in SDN/OpenFlow, including information disclosure through timing analysis. However, this information disclosure requires revealing existing flows through a side channel attack, which is hard to perform in the real world. Compared with their approach, our inference attack is self-contained and requires no prior knowledge.

Gong et al. [26] present an inference attack that uses RTT measurements to infer which website a victim is browsing. They recover the victim's network traffic patterns based on the queuing side channel that occurs at an Internet router. However, their scenario is the public Internet, while our approach focuses on SDN/OpenFlow infrastructures in cloud computing networks. Compared with the public Internet and website inference, the inference attack and information leakage in modern data centers are more sensitive and valuable.

Shin and Gu [27] demonstrate a novel attack targeting SDN networks. The attack fingerprints SDN networks and then floods the data plane flow table by sending specifically crafted fake flow requests at high speed. In the fingerprinting phase, header field change scanning is used to collect the different response times (RTTs) for new and existing flows. The fingerprinting result is then analyzed to determine whether the target network uses SDN technology. The RTT measurement and analysis they use in fingerprinting are similar to our approach, but they only perform DoS attacks on the SDN network, without any further information leakage or network parameter inference.

As for flow table overflow defense strategies, Shelly et al. [28] and Katta et al. [29] introduce a flow entry caching mechanism into SDN/OpenFlow networks by inserting a transparent intermediate layer between the controller and the switch. Yan et al. [9] use CAB to generate wildcard flow entries dynamically and reactively to handle bursty network traffic. Kannan and Banerjee [30] present a flow entry compaction algorithm to save TCAM flow table space; this algorithm uses flow entry tags instead of matching fields as forwarding rules. Kim et al. [31] develop a new flow entry management scheme to reduce the controller overhead.

8. Conclusion

In this paper, we have explored the structure of SDN/OpenFlow networks and some of the possible security issues they bring. After a detailed analysis of the SDN/OpenFlow network, we proposed a novel inference attack model targeting SDN/OpenFlow networks, which, to the best of our knowledge, is the first inference attack model of this kind in the SDN/OpenFlow area. This inference attack is enabled by the OpenFlow switch, especially by its limited flow table capacity. The attack is self-contained, requires no prior knowledge or privileged access, and uses only ordinary probing traffic, making it hard to detect and defend against. We also implemented the inference attack framework and examined its efficiency and accuracy using network traffic data from different sources. The simulation results show that the inference attack framework can infer the network parameters (flow table capacity and flow table usage) with an accuracy of 80% or higher. We also proposed two possible defense strategies for the discovered vulnerability: a routing aggregation algorithm and a multilevel flow table architecture.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research presented in this paper is supported in part by the National Key Research and Development Program of China (no. 2016YFB0800100), the Fund of China National Aeronautical Radio Electronics Research Institute (PM-12210-2016-001), the National Natural Science Foundation (61572397, U1766215, U1736205, 61502383, 61672425, and 61702407), and State Grid Corporation of China (DZ71-16-030).