Wireless Communications and Mobile Computing

Special Issue

Enabling Techniques for 6G Aerial Access Networks (AANs)

Research Article | Open Access

Volume 2021 |Article ID 9926619 | https://doi.org/10.1155/2021/9926619

Na Li, Yue Zhao, Ning Hu, Jing Teng, "Disruption-Free Load Balancing for Aerial Access Network", Wireless Communications and Mobile Computing, vol. 2021, Article ID 9926619, 15 pages, 2021. https://doi.org/10.1155/2021/9926619

Disruption-Free Load Balancing for Aerial Access Network

Academic Editor: Haitao Xu
Received 03 Mar 2021
Accepted 11 May 2021
Published 24 May 2021

Abstract

A fundamental issue of 6G networks with aerial access networks (AAN) as a core component is that user devices will send high-volume traffic via AAN to backend servers. It is therefore critical to load balance this traffic so that it does not cause network congestion or disruption and degrade users’ experience in 6G networks. Motivated by the success of software-defined networking-based load balancing, this paper proposes a novel system called Tigris to load balance high-volume AAN traffic in 6G networks. Unlike existing load balancing solutions for traditional networks, Tigris tackles two challenges in 6G networks: the disruption-resistant challenge of avoiding disruption of continuing flows, and the control-path update challenge posed by the limited throughput for updating load balancing instructions. Tigris achieves disruption-free and low-control-path-cost load balancing for AAN traffic by developing an online algorithm to compute disruption-resistant, per-flow load balancing policies and a novel bottom-up algorithm to compile the per-flow policies into a highly compact rule set, which remains disruption-resistant and has a low control-path cost. We use extensive evaluation to demonstrate that Tigris achieves zero disruption of continuing AAN flows and an extremely low control-path update overhead, while existing load balancing techniques for traditional networks such as ECMP cause high load variance and disrupt almost 100% of continuing AAN flows.

1. Introduction

The next-generation cellular networks (i.e., 6G networks) can interconnect devices in space, air, and ground networks with the help of aerial access networks (AAN) to provide users Internet access with substantially higher coverage than traditional network technologies (e.g., 5G networks) and hence have drawn much attention from both academia and industry [1–5]. Figure 1 gives an abstract architecture of 6G networks with AAN as a key component. After receiving data traffic from end devices, AAN forwards the traffic to edge servers, which process it and forward it to backend servers.

A fundamental issue of this 6G architecture is that, with higher coverage of user devices and more ultra-high-bandwidth, ultra-low-latency applications such as VR/AR and remote medical services, AAN needs to deliver high-volume data traffic involving a large number of flows from user devices to the corresponding backend servers via edge and backbone networks. It is therefore critical to load balance this traffic before it enters the backbone networks, so that it does not cause congestion or network disruption in 6G networks.

Motivated by the recent success of flexible load balancing using software-defined networking (SDN), e.g., [6–14], in this paper we investigate the feasibility and benefits of load balancing AAN traffic in 6G networks using a software-defined edge (SDE), which we term SDE-LB. Continuing the trend of moving from dedicated load balancing appliances to native network switches for load balancing (e.g., [6–8, 12, 15–19]), SDE-LB takes advantage of the flexibility of a logically centralized controller to collect load statistics from both the network and the servers, compute load balancing flow rules, and install them on programmable switches (also called load balancing switches). The TCAM flow tables of these switches allow high-speed packet processing [20–22] and forward packets to different backend servers based on the matching results [12, 16, 18, 23, 24].

Despite the potential of SDE-LB, key challenges remain. In particular, we found that directly applying existing SDN load balancing solutions to SDE-LB encounters substantial challenges under dynamicity, which can arise either on the data path or on the control path.

The key dynamicity challenge on the data path is the disruption-free challenge. Specifically, as the load of AAN from the flows contained in each flow rule changes over time, imbalance occurs and rebalancing is needed. Although one may apply previous studies [16, 18, 25] to conduct rebalancing, these studies do not consider the existing assignments of continuing flows from AAN, i.e., flows with open TCP connections, and can therefore result in unacceptable disruptions of continuing flows. Although one may use migration to reduce the impact of disruption, migration is considered to add substantial system complexity and hence is not preferred by many operators. Another possibility is to install exact-match rules to pin the assignment of continuing flows, but flow table size constraints make this approach infeasible. For example, one commodity edge server equipped with a Dell Z9100 switch has only 2304 flow entries [26], while the number of data flows with different source IP addresses is much larger. Utilizing TCAM wildcard rules to aggregate flow rules may reduce the number of rules (e.g., [16, 18, 21, 24, 27–29]) but would again result in a mass disruption of continuing flows.

The key dynamicity challenge on the control path is the control-path update challenge. Specifically, rebalancing AAN traffic load among servers requires an SDE-LB controller to send flow-mod instructions to update flow tables on the load balancing switches. Unfortunately, state-of-the-art SDN controllers have limited throughput in sending flow-mod instructions, putting a limit on control-path update frequencies.

In this paper, we cope with both dynamicity challenges by proposing Tigris, the first disruption-resistant, low-control-path-cost, dynamic SDE load balancer for AAN traffic in 6G networks. Tigris provides two key novel insights for addressing the dynamicity challenges: (1) it shifts a small number of incoming flows among servers to achieve load balancing without disrupting continuing flows and (2) it leverages the small number of shifted flows, the decomposition of the aggregation of a large rule set into parallel aggregation of multiple smaller rule sets, and the cached intermediate rule aggregation results from previous time slots to substantially improve the efficiency of flow rule aggregation and reduce the control-path update cost. This work sheds light on future research on system and protocol design in AAN and 6G networks, such as traffic engineering, resource orchestration, and network-application integration [30].

The main contributions of this paper are as follows.
(i) We study a fundamental problem for 6G networks with AAN, the disruption-free load balancing problem for AAN traffic, identify the disruption-resistant and the control-path-cost challenges, and design a novel dynamic load balancer at the edge called Tigris to address these challenges.
(ii) We design DR-LB, an online algorithm, as the first phase of Tigris to compute per-flow disruption-resistant load balancing policies and prove that DR-LB achieves a competitive ratio of $2 - 1/N$ for load balancing, where $N$ is the number of backend servers.
(iii) We develop Tree-Agg, an incremental, bottom-up rule aggregation algorithm, as the second phase of Tigris to compile per-flow load balancing policies into a highly compact, disruption-resistant rule set which also has a low control-path update cost.
(iv) We conduct extensive evaluations to show that Tigris achieves zero disruption of continuing flows in AAN of 6G networks, a close-to-1 competitive ratio on load balancing, a less than 5% rule update ratio between time slots, and up to 53x flow rule compression ratio, while other state-of-the-art approaches such as ECMP can cause 2-4x load variance and disrupt almost 100% of continuing flows in AAN.

The remainder of this paper is organized as follows. We discuss related work on different load balancing approaches in Section 2. We present the system settings and problem definition in Section 3. In Section 4, we propose the Tigris load balancer, introduce its overall workflow and the details of its key components, and discuss its generality and overhead. We evaluate the performance of Tigris with extensive simulation in Section 5 and finally conclude our work in Section 6.

2. Related Work

Data traffic load balancing is a well-studied problem, for which many algorithms have been designed and thoroughly analyzed [31–33] and many systems have been developed to support scalable, reliable network services [6–8, 12, 15–19]. Modern networks increasingly rely on switches to develop load balancing solutions. In this section, we give a brief review of existing studies on load balancing and discuss the key limitations of these techniques for load balancing traffic from AAN.

2.1. Hash-Based Load Balancing

Among various load balancing systems, hash-based solutions such as Equal-Cost Multipath (ECMP) [25] and Weighted-Cost Multipath (WCMP) [7] are the most widely used ones. ECMP-based systems [25] evenly partition the flow space into hash buckets and use a hardware switch to redirect incoming packets to different software load balancers based on the hash value of the packet header. WCMP-based systems [7] achieve an unevenly weighted partition of the flow space by repeating the same software load balancers as the next hop multiple times in an ECMP group. However, these hash-based solutions split the traffic based on the size of flow space rather than the actual volume of flows, resulting in load imbalance due to the unequally distributed and dynamically changing traffic contribution from different flow spaces. Because ECMP hash functions are usually proprietary [18, 25], users have a limited customization capability in rebalancing to adapt to such dynamic flow statistics.
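
The imbalance described above can be illustrated with a minimal sketch. It assumes a stand-in hash (CRC32 over the source IP string), because real ECMP hash functions are usually proprietary, and a made-up workload in which one flow carries far more traffic than the rest. The helper names `ecmp_bucket` and `per_bucket_load` are hypothetical.

```python
import zlib

def ecmp_bucket(src_ip, n_buckets):
    """Map a source IP to a hash bucket (CRC32 is only a stand-in hash)."""
    return zlib.crc32(src_ip.encode()) % n_buckets

def per_bucket_load(flows, n_buckets):
    """Sum the traffic volume that lands in each bucket."""
    loads = [0.0] * n_buckets
    for src_ip, volume in flows.items():
        loads[ecmp_bucket(src_ip, n_buckets)] += volume
    return loads

# Hypothetical workload: 199 small flows plus one very large flow.
flows = {f"10.0.0.{i}": 1.0 for i in range(1, 200)}
flows["10.0.0.200"] = 500.0

loads = per_bucket_load(flows, 4)
# The flow space is split roughly evenly, but whichever bucket receives the
# large flow carries at least 500 of the 699 units of traffic.
print(loads)
```

The split is even in the number of source addresses per bucket, not in traffic volume, which is the limitation the paragraph above points out.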

2.2. SDN Load Balancing

Compared with hash-based load balancing, SDN load balancing is a more powerful and flexible load balancing technique. As such, it is a more suitable solution to AAN load balancing, because balancing the dynamic traffic in AAN and 6G networks requires more flexible load balancing policies. SDN supports the programming of the flow rule table on switches using wildcards and hence enables a more flexible, fine-grained load balancing service [12, 16, 18, 23, 24]. The select group table defined in the OpenFlow specification [21] can be used for load sharing, but it requires the controller’s guidance to respond to events not detectable by the switch, e.g., server failures. Solutions in [23, 24] send the first packet of every flow back to the controller for calculating the flow rules, which adds extra delay for these packets. Benson et al. [12] integrate per-flow load balancing rule generation with WCMP to minimize traffic congestion. Kang et al. [18] and Wang et al. [16] study the TCAM size-constrained traffic splitting problem and design an algorithm to generate efficient flow rules to partition the flow space into weighted subspaces for different servers. In addition, TCAM wildcards are also utilized to reduce the number of flow rules used for expressing network policies (e.g., TCAM Razor [27] and CacheFlow [28]), including the load balancing policy.

One important limitation of applying existing SDN load balancing solutions to load balance AAN traffic at an SDE of 6G networks is that they do not consider the key dynamic SDE load balancing challenges, i.e., the disruption-resistant challenge and the control-path update challenge, leading to unacceptable disruption of continuing flows and a high control-path update cost. Miao et al. [34] design a stateful data plane load balancer, but it requires the support of expensive, customized programmable hardware. In contrast, Tigris addresses the above challenges to design the first disruption-resistant, low-control-path-cost SDE load balancer using commodity SDN switches.

3. Dynamic SDE Load Balancing: System Settings and Problem Definition

We consider a dynamic SDE-LB system shown in Figure 2: it provides a service using multiple servers, indexed by $i = 1, \dots, N$. Clients access the service through a single public IP address, reachable via AAN. For any incoming packet, the load balancing switch at the SDE finds the matched flow rule based on the packet's source IP address, rewrites its destination IP address to that of the assigned server, and forwards it accordingly to achieve load balancing. One may extend our system to multiple switches and to matching on other fields to support more complex load balancing scenarios with better scalability and fault tolerance, but we focus on a single load balancing switch and source IPv4 address matching for clarity.

We consider that our system operates in discrete time, where time is divided into slots indexed by $t = 1, 2, \dots$. Different time slots can have different durations to support a combination of periodic and event-driven operations, where events can include servers going up or down and bursts of data flows. At the beginning of each slot $t$, the SDE controller computes a set of load balancing flow rules, denoted as $R^t$, and updates the flow table of the load balancing switch accordingly.

Given a flow rule $r$, it has three basic attributes: the set of source IP addresses it matches, the forwarding action, and the priority, denoted as $r.match$, $r.action$, and $r.priority$, respectively. Given a packet, its source IP may be matched by multiple rules. In this case, the switch rewrites the packet's destination IP and forwards it following the action of the rule with the highest priority, as specified in the OpenFlow specification [21]. We use an attribute named $r.load$ to denote the total estimated load of the flows that follow the action of $r$. Our system allows the load metric to be a generic metric, considering factors such as data volume and CPU. The controller collects related data from servers and switches to compute the load metric.
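
The highest-priority matching behavior described above can be sketched as follows. This is an illustrative model rather than switch code: the `Rule` class, the 32-character match string over `0`, `1`, and `*`, and the helper names are assumptions made for the example.

```python
from dataclasses import dataclass
import ipaddress

@dataclass(frozen=True)
class Rule:
    match: str     # 32 characters, each '0', '1', or '*' (wildcard); hypothetical encoding
    action: str    # name of the backend server the packet is forwarded to
    priority: int

def ip_bits(ip):
    """Return the 32-bit binary string of an IPv4 address."""
    return format(int(ipaddress.IPv4Address(ip)), "032b")

def matches(rule, ip):
    """A rule matches when every non-wildcard bit equals the address bit."""
    return all(m in ("*", b) for m, b in zip(rule.match, ip_bits(ip)))

def lookup(rules, ip):
    """Return the action of the highest-priority matching rule, or None."""
    hits = [r for r in rules if matches(r, ip)]
    return max(hits, key=lambda r: r.priority).action if hits else None

rules = [
    Rule("*" * 32, "s1", 0),               # default rule covering the whole IP space
    Rule(ip_bits("10.0.0.1"), "s2", 32),   # exact-match rule pinning one source IP
]
print(lookup(rules, "10.0.0.1"))  # the exact-match rule wins
print(lookup(rules, "10.0.0.2"))  # only the default rule matches
```

Here the priority equals the number of exact bits in the match, which mirrors how the aggregated rules in Section 4.3 are prioritized.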

A set of flow rules $S$ can be aggregated into a single rule $r$ using wildcards. We call $S$ the source rules of $r$, denoted as $r.source$. We also define some parameters to assist rule aggregation. We use $r.level$ to record the aggregation level of rule $r$, where the first $32 - r.level$ matching bits of $r.match$ must not contain any wildcard. As we will show in Section 4.3, $r.level$ can be increased even if $r$ fails to be aggregated with another rule.

An efficient SDE-LB system must satisfy multiple constraints, which we separate into two categories: (1) traditional constraints (i.e., table size, full-coverage, and zero-ambiguity) and (2) dynamicity constraints (i.e., disruption resistance and low control-path update).

3.1. Traditional Constraints

The first traditional constraint is the table size constraint. Specifically, the set of flow rules $R^t$ installed on the load balancing switch in any time slot $t$ must not exceed its flow table capacity $C$, as expressed in Inequality (1):

$$|R^t| \le C, \quad \forall t. \tag{1}$$

Secondly, for any incoming packet, there must be at least one flow rule that matches its source IP address and does not have punt as its action. This is the full-coverage constraint. It ensures that no packet will be punted back to the controller, eliminating the switch-to-controller delay for all incoming data traffic, and is expressed as

$$\bigcup_{r \in R^t,\; r.action \neq \text{punt}} r.match = \mathcal{I}, \quad \forall t, \tag{2}$$

where $\mathcal{I}$ denotes the whole source IP address space.

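Thirdly, for any incoming packet, if there exists more than one flow rule matching its source IP, only one rule’s action will be taken. This is the zero-ambiguity constraint. We use $r.dom$ to denote the set of source IP addresses that belong to $r.match$ but will not follow $r.action$ and express this constraint as

$$(r_1.match \setminus r_1.dom) \wedge (r_2.match \setminus r_2.dom) = \emptyset, \quad \forall r_1, r_2 \in R^t,\; r_1 \neq r_2, \tag{3}$$

where $\wedge$ is the logical conjunction.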

3.2. Dynamicity Constraints

Other than the traditional constraints, SDE-LB under dynamicity also introduces additional constraints on both the data path and the control path. For the data path, any flow whose TCP connections to the servers are currently open should not be shifted to another server unless the current server fails. We call it the disruption-resistant constraint. Denoting a continuing flow as $f$ and its assigned server in time slot $t$ as $s_f^t$, this constraint can be expressed as

$$s_f^t = s_f^{t-1} \quad \text{for every continuing flow } f \text{ whose server } s_f^{t-1} \text{ is available in slot } t. \tag{4}$$

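For the control path, the number of rules updated (i.e., deleted or inserted) in each time slot should be within the capacity of the controller, which is called the control-path update constraint. Using $\Delta$ to denote this capacity and $R^t$ to denote the rule set installed in slot $t$, we have

$$|R^t \setminus R^{t-1}| + |R^{t-1} \setminus R^t| \le \Delta, \quad \forall t. \tag{5}$$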
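
As a small illustration of how the number of updated rules could be counted, the sketch below models each rule as a hashable `(match, action, priority)` tuple and treats the update as a set difference in both directions. The tuple encoding and the function names are assumptions for the example, not part of Tigris.

```python
def plan_update(old_rules, new_rules):
    """Split an update into rules to delete and rules to install."""
    old, new = set(old_rules), set(new_rules)
    to_delete = old - new    # installed before, no longer wanted
    to_install = new - old   # wanted now, not yet installed
    return to_delete, to_install

def update_cost(old_rules, new_rules):
    """Number of deleted plus inserted rules in one time slot."""
    to_delete, to_install = plan_update(old_rules, new_rules)
    return len(to_delete) + len(to_install)

# Hypothetical rule sets for two consecutive slots.
old = [("10.0.0.0/31", "s1", 31), ("0.0.0.0/0", "s2", 0)]
new = [("10.0.0.0/31", "s3", 31), ("0.0.0.0/0", "s2", 0)]

cost = update_cost(old, new)
budget = 5  # assumed control-path capacity per slot
print(cost, cost <= budget)
```

The more rules two consecutive rule sets share, the smaller this count, which is why reusing cached aggregation results matters for the control path.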

Denoting the data flows forwarded to server $i$ in time slot $t$ as $F_i^t$ and the number of time slots that server $i$ is working till time slot $t$ as $T_i^t$, we formulate the disruption-resistant, low-control-path-cost, dynamic load balancing problem as

$$\min \; \max_{i} \; \frac{1}{T_i^t} \sum_{\tau=1}^{t} \sum_{f \in F_i^{\tau}} f.load \tag{6}$$

subject to constraints (1), (2), (3), (4), and (5), where $f.load$ is the load of flow $f$. We prove the NP-hardness of this problem via a transformation from the classic multicore balancing problem [31]. Note that in Equation (6), we use the time-averaged load balancing as the system objective. This is the typical objective of load balancing systems [31–33], and it reflects the requirement of dynamic SDE-LB, i.e., long-term, online load balancing. As we will discuss in Section 4.4, we design the Tigris system to be modular so that it provides users the flexibility to define different load balancing objectives and methods while maintaining the benefits of Tigris, e.g., low control-path cost and disruption resistance.

4. Tigris: A Disruption-Resistant, Low-Control-Path-Cost, Dynamic SDE Load Balancer

We now fully specify Tigris, in which a controller computes disruption-resistant load balancing policies and generates compact flow rules with low control-path cost correspondingly. We will start with an overview of Tigris in Section 4.1. Then, we will give the design details of its key components: policy generation in Section 4.2 and rule aggregation in Section 4.3. We also discuss the generalization of Tigris in Section 4.4.

4.1. Overview of Tigris

In practice, Tigris operates in both periodic mode and event-driven mode in response to events such as servers going up or down and bursts of data flows. For simplicity of presentation, we assume a periodic operation mode, where the controller executes Tigris at the beginning of every slot $t$. Figure 3 presents the workflow of Tigris. Specifically, each invocation of Tigris can be divided into two phases: (1) policy generation and (2) rule compilation.

4.1.1. Policy Generation

During the policy generation phase, Tigris first collects load statistics and related flow information, e.g., TCP connection status, from each server $i$, and estimates the target load of $i$, i.e., the estimated load of incoming flows which should be assigned to $i$ in time slot $t$ for load balancing purposes (Step 1). A flow is identified by a TCP/IP 5-tuple. It then uses the DR-LB algorithm to compute a set of disruption-resistant load balancing policies for all servers available in time slot $t$ and generates a set of per-flow rules $P^t$ to express the computed load balancing policies (Step 2). In particular, DR-LB keeps the forwarding action of continuing flows on available servers and adopts an online, greedy approach to shift incoming flows from overloaded servers to underloaded servers based on the target load of each server and the similarity of source IPs between continuing flows and incoming flows. This design is disruption-resistant. It achieves load balancing by shifting a small number of flows among servers and increases the chance of reusing rule aggregation results from the last slot during the rule aggregation phase, substantially increasing the aggregation efficiency and reducing the control-path update cost. We also prove that it achieves a competitive ratio of $2 - 1/N$, where $N$ is the number of servers, for Equation (6) when all servers are available.

4.1.2. Rule Compilation

In the second phase, Tigris adopts a bottom-up rule aggregation algorithm, Tree-Agg, to iteratively aggregate the large per-flow rule set $P^t$ into a highly compact rule set $R^t$ along a binary IP tree (Step 3). Tree-Agg has three novel design points: (1) only traversing from the IP leaves of shifted flows to the root of the IP tree, (2) decomposing the aggregation of a large rule set into parallel aggregation of multiple smaller sets, and (3) using cached intermediate rule aggregation results from slot $t-1$ during aggregation. These design points substantially increase the efficiency of rule aggregation and yield a compact rule set $R^t$ with a high similarity with $R^{t-1}$, which reduces the control-path cost for updating the data plane. After getting $R^t$, Tigris updates the data plane by deleting the set of obsolete rules $R^{t-1} \setminus R^t$ from the switch and installing the set of new rules $R^t \setminus R^{t-1}$ (Step 4).

4.1.3. Addressing the SDE-LB Constraints

We show in the next subsections that the compact rule set $R^t$ expresses the same policies as the per-flow rule set $P^t$ and satisfies the disruption-resistant constraint, the full-coverage constraint, and the zero-ambiguity constraint. Because it is NP-hard to decide if a rule set can be aggregated into a smaller set of a given size [35], Tigris uses a software switch as a safety measure to install extra rules if the compact rule set still exceeds the table size of the hardware switch. In this way, when packets arrive, the hardware switch first tries to find a matching rule. If no matching rule is found, the packet is forwarded to the software switch (e.g., Open vSwitch) for matching and processing. We show through extensive evaluation in Section 5 that this measure is rarely needed in practice since the aggregated rule set computed by Tigris is highly compact and outperforms the state-of-the-art rule aggregation solution, i.e., TCAM Razor. Furthermore, we also show that the compact rule set computed by Tigris has an extremely low control-path update cost.

4.2. Online, Disruption-Resistant Policy Generation

We now give the details of the policy generation phase of Tigris. It involves two steps: (1) statistics collection and target load adjustment and (2) online, disruption-resistant load balancing, marked as Step 1 and Step 2 in Figure 3, respectively.

Step 1 (statistics collection and target load adjustment). At the beginning of time slot $t$, Tigris first collects $L_i^{t-1}$, the actual load of each server $i$ in slot $t-1$, and related information about its flows, e.g., TCP connection status. In practice, such statistics can be retrieved from the logs or the monitoring process of the servers. It then estimates the total incoming load for all servers in slot $t$ as $\hat{L}^t$, the load of continuing flows (i.e., flows with open TCP connections) at each server $i$ in slot $t$ as $\hat{C}_i^t$, and the load of incoming flows at each server $i$ in slot $t$ as $\hat{I}_i^t$ when the load balancing policy stays the same in $t$ as in $t-1$, using methods adopted in [21, 28]. Next, Tigris calculates a key metric $\theta^t$, the “target share” of each available server at each slot till the current time slot $t$, as

$$\theta^t = \frac{\sum_{\tau=1}^{t-1} \sum_{i} L_i^{\tau} + \hat{L}^t}{\sum_{i} T_i^t},$$

where $T_i^t$ is the number of time slots in which server $i$ is available among all $t$ slots. Note that one can extend it to the case where different servers have different capacities.

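With $\theta^t$, Tigris then computes the target load assigned to each individual server $i$ that is available in slot $t$, denoted as $G_i^t$, as

$$G_i^t = \theta^t \, T_i^t - \sum_{\tau=1}^{t-1} L_i^{\tau} - \hat{C}_i^t.$$

Specifically, according to $\theta^t$, server $i$ should serve $\theta^t T_i^t$ in all slots until $t$. Since it has already served $\sum_{\tau=1}^{t-1} L_i^{\tau}$ and will serve $\hat{C}_i^t$ in slot $t$ for continuing flows, these “credits” are deducted. Note that the way Tigris computes $G_i^t$ depends on the load balancing objective. This paper focuses on Equation (6).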
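
The target load computation can be sketched numerically. The code below follows one reading of the description above: the target share is the total load (already served plus estimated for the current slot) divided by the total number of server-slots, and each server's target load is its share over its available slots minus what it has already served and what its continuing flows will consume. The function names, the dictionary-based inputs, and the sample numbers are assumptions for illustration.

```python
def target_share(past_loads, est_total_load, avail_slots):
    """Average load one server should carry per available slot (assumed form)."""
    served = sum(sum(loads) for loads in past_loads.values())
    return (served + est_total_load) / sum(avail_slots.values())

def target_load(server, share, past_loads, avail_slots, continuing_load):
    """Load of incoming flows the server should receive in the current slot."""
    return (share * avail_slots[server]
            - sum(past_loads[server])      # credit: load already served
            - continuing_load[server])     # credit: continuing flows in this slot

# Hypothetical state at the start of slot 2 with two always-available servers.
past_loads = {"s1": [6.0], "s2": [2.0]}   # load served in slot 1
avail_slots = {"s1": 2, "s2": 2}          # available slots, current one included
continuing = {"s1": 1.0, "s2": 1.0}       # load of continuing flows in slot 2
est_total = 8.0                           # estimated total load in slot 2

share = target_share(past_loads, est_total, avail_slots)
targets = {s: target_load(s, share, past_loads, avail_slots, continuing)
           for s in past_loads}
print(share, targets)
```

In this example the server that carried more load in the previous slot receives a smaller target, so the time-averaged loads converge.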

Step 2 (online, disruption-resistant load balancing (DR-LB)). With the target load $G_i^t$ for every server $i$, we design the DR-LB algorithm, summarized in Algorithm 1, to compute a set of disruption-resistant, per-flow load balancing policies and the corresponding per-flow rule set $P^t$.

4.2.1. Basic Idea

We design DR-LB as an online algorithm to achieve load balancing by shifting a small number of incoming flows among servers. This design ensures disruption resistance of continuing flows. It also increases the chance of reusing cached aggregation results from the last time slot during the rule aggregation phase (Section 4.3), substantially increasing the aggregation efficiency and reducing the control-path cost to update the flow table on the switch.

When the system starts, i.e., at $t = 0$, DR-LB sets the load balancing policies to evenly divide flows from the whole source IP space among all servers and generates a per-flow rule set $P^0$ (Line 1). At each time slot $t$, it takes the collected load statistics from Step 1 and $P^{t-1}$, the per-flow rule set for slot $t-1$, as input (Line 3). It then iteratively shifts flows from overloaded servers to underloaded servers to achieve the target load for each server (Lines 4-18). During this process, it replaces the per-flow rule for every shifted flow with a new per-flow rule, eventually getting the new per-flow rule set $P^t$ (Lines 19-23).

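Algorithm 1: DR-LB.
1 $P^0$: a set of per-flow rules which approximately evenly divide the whole IP space to all servers, where each rule $r$ has $r.priority = 32$ and $r.level = 0$
2 foreach $t = 1, 2, \dots$ do
3  $P^t \leftarrow P^{t-1}$, $\mathcal{O} \leftarrow \emptyset$, $\mathcal{U} \leftarrow \emptyset$
4  foreach available server $i$ do
5   if $G_i^t < \hat{I}_i^t$ then
6    $\mathcal{O} \leftarrow \mathcal{O} \cup \{i\}$
7   else if $G_i^t > \hat{I}_i^t$ then
8    $\mathcal{U} \leftarrow \mathcal{U} \cup \{i\}$
9  foreach unavailable server $i$ do
10   $G_i^t \leftarrow 0$, $\hat{I}_i^t \leftarrow$ load of all flows destined to $i$
11   $\mathcal{O} \leftarrow \mathcal{O} \cup \{i\}$
12  foreach $i \in \mathcal{O}$ do
13   while $\hat{I}_i^t > G_i^t$ and $\mathcal{U} \neq \emptyset$ do
14    $f \leftarrow$ the incoming flow of $i$ with the largest estimated load
15    $j \leftarrow$ the server in $\mathcal{U}$ assigned to the flow with the longest IP prefix match of $f$
16    $\hat{I}_i^t \leftarrow \hat{I}_i^t - f.load$, $\hat{I}_j^t \leftarrow \hat{I}_j^t + f.load$
17    if $\hat{I}_j^t \geq G_j^t$ then
18     $\mathcal{U} \leftarrow \mathcal{U} \setminus \{j\}$
19    $r \leftarrow$ a new per-flow rule
20    $r.match \leftarrow$ source IP of $f$, $r.action \leftarrow j$
21    $r.priority \leftarrow 32$, $r.level \leftarrow 0$, $r.new \leftarrow 1$, $r.load \leftarrow f.load$
22    $r.source \leftarrow \emptyset$, $r.dom \leftarrow \emptyset$
23    $P^t \leftarrow (P^t \setminus \{r' : r'.match = r.match\}) \cup \{r\}$
24  return $P^t$
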
4.2.2. Categorizing Overloaded and Underloaded Servers

DR-LB first categorizes servers into overloaded and underloaded based on their target load (Lines 4-11). Given an available server $i$ at time slot $t$, it is overloaded (underloaded) if its target load $G_i^t$ is smaller (larger) than $\hat{I}_i^t$, the estimated load of incoming flows when load balancing policies stay the same in slot $t$ as in $t-1$ (Lines 4-8). For any server that is unavailable in slot $t$, we set its target load to 0 and consider all the flows destined to it as incoming (Lines 9-10). Hence, every unavailable server is an overloaded server (Line 11).

4.2.3. Shifting Incoming Flows from Overloaded Servers to Underloaded Servers

Next, DR-LB adopts a greedy approach to balance the load among servers. For every overloaded server $i$, it iteratively finds the incoming flow $f$ with the largest estimated load and shifts it to an underloaded server $j$, which is the assigned server for the flow with the longest IP prefix match with $f$ (Lines 14-18). This process stops when $i$ is no longer overloaded or there is no underloaded server (Line 13). The rationale of this approach is that (1) it achieves load balancing among servers by shifting a small number of flows and (2) it increases the chance of rule aggregation for the flow rules of shifted data flows. For instance, suppose $f_1$ has the source IP 10.0.0.0 and was forwarded to an overloaded server $i$, and there exists a flow $f_2$ whose source IP is 10.0.0.1 and was forwarded to an underloaded server $j$. DR-LB will make the load balancing decision to forward $f_1$ to $j$ and generate a flow rule $r_1$ to express this decision. In this way, the rules for $f_1$ and $f_2$ can be aggregated into a single rule matching 10.0.0.0/31 during the rule aggregation process.

During flow shifting, DR-LB also generates the per-flow rule set $P^t$ for slot $t$. It initializes $P^t$ as $P^{t-1}$, the per-flow rule set for the last slot (Line 3). For each shifted data flow, DR-LB generates a flow rule $r$ with a priority of 32 and a level of 0 (Lines 19-22). The rule $r$ is also assigned a property $r.new = 1$ to indicate that $r$ represents a newly generated load balancing policy. Then, it finds the flow rule in $P^t$ that has the same match field as $r$ and replaces it with $r$ (Line 23). In the end, we get the per-flow rule set $P^t$ representing the load balancing policies for slot $t$.
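
The greedy shifting step can be sketched as below. This is a simplified illustration, not the authors' implementation: flows are keyed by source IP, the function and argument names are hypothetical, and the fallback used when no flow is currently assigned to an underloaded server (picking the alphabetically first underloaded server) is an assumption added to keep the sketch total.

```python
import ipaddress

def common_prefix_len(ip_a, ip_b):
    """Number of leading bits shared by two IPv4 addresses."""
    x = int(ipaddress.IPv4Address(ip_a)) ^ int(ipaddress.IPv4Address(ip_b))
    return 32 - x.bit_length()

def dr_lb_shift(incoming, continuing, target):
    """Shift incoming flows from overloaded to underloaded servers.

    incoming:   {src_ip: (server, load)} under the previous policy
    continuing: {src_ip: server}, never modified (disruption resistance)
    target:     {server: target load of incoming flows}
    """
    assign = {ip: srv for ip, (srv, _) in incoming.items()}
    load = {ip: l for ip, (_, l) in incoming.items()}
    est = {s: 0.0 for s in target}
    for ip, srv in assign.items():
        est[srv] += load[ip]
    over = [s for s in target if est[s] > target[s]]
    under = {s for s in target if est[s] < target[s]}
    shifted = set()
    for s in over:
        while est[s] > target[s] and under:
            candidates = [ip for ip in assign
                          if assign[ip] == s and ip not in shifted]
            if not candidates:
                break
            f = max(candidates, key=lambda ip: load[ip])  # largest incoming flow
            # Flows currently served by underloaded servers, used for prefix matching.
            peers = {ip: srv for ip, srv in {**continuing, **assign}.items()
                     if srv in under and ip != f}
            if peers:
                j = peers[max(peers, key=lambda ip: common_prefix_len(ip, f))]
            else:
                j = min(under)  # assumed fallback, not from the paper
            assign[f] = j
            shifted.add(f)
            est[s] -= load[f]
            est[j] += load[f]
            if est[j] >= target[j]:
                under.discard(j)
    return assign, shifted

# Hypothetical slot: server A is overloaded, server B is underloaded.
incoming = {"10.0.0.0": ("A", 5.0), "10.0.0.2": ("A", 1.0), "192.168.0.1": ("B", 1.0)}
continuing = {"10.0.0.1": "B"}
target = {"A": 2.0, "B": 5.0}
new_assign, shifted = dr_lb_shift(incoming, continuing, target)
print(new_assign, shifted)
```

Only the flow from 10.0.0.0 moves, and it moves to the server that already serves 10.0.0.1, so the two per-flow rules become candidates for aggregation in the next phase.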

4.2.4. Performance Analysis of DR-LB

It is easy to see that the per-flow rule set $P^t$ satisfies the disruption-resistant and the full-coverage constraints. We then state the following proposition on the performance of DR-LB.

Proposition 2. When all servers are available, repeatedly generating load balancing policies via DR-LB at every time slot achieves a competitive ratio of $2 - 1/N$ on the objective function in Equation (6), where $N$ is the number of backend servers.

Proof. When all servers are available, any instance of our load balancing problem can be transformed into an instance of the classic load balancing problem, which aims to minimize the maximal task completion time across servers. In the transformed instance, all tasks arrive at the same time. With this transformation, we can see that the load balancing policy computed by DR-LB is in the set of all possible policies computed by Graham’s greedy algorithm [31]. Hence, the competitive ratio of $2 - 1/N$ of DR-LB for Equation (6), where $N$ is the number of servers, is a direct result of applying the technique used in proving the competitive ratio of Graham’s algorithm.

Although the complexity of DR-LB depends on the number of arriving flows, which can be large due to mouse flows, in practice we can focus only on elephant flows when computing the load balancing decisions and assign mouse flows evenly across different servers.

4.3. Incremental, Bottom-Up Rule Compilation

We now give the details of the rule compilation phase of Tigris. It involves two steps: incremental, bottom-up rule aggregation and data plane update, marked as Step 3 and Step 4 in Figure 3, respectively.

Step 3 (incremental, bottom-up rule aggregation (Tree-Agg)). Directly installing the initial disruption-resistant per-flow rule set $P^t$ computed by DR-LB into the load balancing switch is infeasible because the size of $P^t$ is much larger than the size of the switch flow table (tens of thousands of rules vs. a few hundred or a few thousand entries). We therefore develop the Tree-Agg algorithm, which adopts a bottom-up approach to aggregate $P^t$ into a highly compact flow rule set $R^t$ expressing the same load balancing policies as $P^t$ does.

4.3.1. Basic Idea

We design Tree-Agg to iteratively aggregate rules in $P^t$ along the 32-level binary tree for IPv4 addresses, starting from the leaf nodes, i.e., level 0. At first glance, this approach is impractical since the complete IPv4 address tree has $2^{32}$ leaf nodes, an extremely large number to traverse. However, we propose three novel design points in Tree-Agg. First, $P^t$ and $P^{t-1}$ typically differ in only a small number of flow rules, i.e., rules where $r.new$ is set to 1 in DR-LB, due to the online nature of DR-LB. Hence, at each time slot $t$, Tree-Agg only needs to traverse from the leaves representing the new rules in $P^t$ to the root of the binary tree. Secondly, Tree-Agg decomposes the aggregation of a large rule set into parallel aggregation on multiple disjoint subsets. Thirdly, Tree-Agg uses the cached intermediate aggregation results from the last time slot $t-1$ to aggregate with the new rules from $P^t$. These design points have two major benefits: (1) they substantially reduce the traversal and aggregation scale, hence significantly increasing the efficiency of aggregation, and (2) they reuse the cached intermediate aggregation results from $t-1$ as much as possible to ensure that $R^t$ has a high similarity with $R^{t-1}$, the installed rule set for time slot $t-1$, hence substantially reducing the control-path cost to update the flow table of the switch.

4.3.2. Rule Aggregation Operations

Before we present the details of the Tree-Agg algorithm, we first introduce some basic operations for rule aggregation, which are summarized in Algorithm 2. To aggregate a set of flow rules $S$ into one rule $r$, we first need to decide the matching field of $r$. This is computed in the matchFieldAgg function, where a wildcard is used at every matching bit that is not the same across every rule $r' \in S$, and an exact bit 0 or 1 is used otherwise (Lines 1-7). In this way, it is guaranteed that every flow that matches at least one rule in $S$ will also match $r$.

After computing the matching field of $r$, we use the ruleAgg function to set the other properties of $r$, including action and priority. Note that during the whole rule aggregation process, we only consider aggregating rules that have the same forwarding action to avoid altering the original forwarding policies (Line 10). The priority of the aggregated rule is set as the number of exact matching bits in its matching field (Line 11). The aggregated $r.source$ and $r.load$ are set based on the definitions in Section 3, and the aggregation level of $r$ is also increased by 1 (Lines 12-15).

1 Function matchFieldAgg()
2  for to do
3   if is the same then
4    ;
5   else
6    ;
7  return;
8 Function ruleAgg()
9  
10  
11  
12  
13  
14  
15  return;
16 Function findDom()
17  
18  foreach do
19   if and and then
20    
21  return
22 Function ruleLevUp()
23  foreach do
24   ;
25  return

Given an aggregated rule and a set of rules, we can also compute the set of rules that direct a subset of the flow space of the aggregated rule to a different server, using the findDom function. This is done by checking the matching field, the action, and the priority of every rule in the set (Lines 18-20). In addition, as we show next in the Tree-Agg algorithm, if an aggregated rule cannot be inserted into the flow rule set because it would violate the no-ambiguity constraint, the aggregation level of every rule it was built from still needs to be increased by 1, using the ruleLevUp function.
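A possible reading of the findDom check is sketched below. The exact three conditions of the listing are not recoverable here, so this sketch assumes they are: overlapping flow space, a different forwarding action, and a strictly higher priority. The names `overlaps` and `find_dom` are hypothetical.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    match: str      # string over '0', '1', '*'
    action: str     # forwarding server
    priority: int


def overlaps(a: str, b: str) -> bool:
    """Two matching fields overlap if no bit position has conflicting exact bits."""
    return all(x == y or "*" in (x, y) for x, y in zip(a, b))


def find_dom(agg: Rule, rules: List[Rule]) -> List[Rule]:
    """Rules that send part of agg's flow space to a different server."""
    return [
        r for r in rules
        if overlaps(r.match, agg.match)   # matching field check
        and r.action != agg.action        # action check
        and r.priority > agg.priority     # priority check (assumed: strictly higher)
    ]
```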

4.3.3. Bottom-Up Rule Aggregation along an IP Tree

Having introduced the basic operations for aggregating rules, we now present the details of the Tree-Agg algorithm. Tree-Agg iteratively aggregates the per-flow rules along the 32-level binary tree for IPv4 addresses, starting from the leaf nodes, i.e., level 0. As stated earlier, Tree-Agg is an incremental aggregation algorithm. At each time slot, it only traverses from the leaves representing the new rules, i.e., the rules that DR-LB marks as new, to the root of the binary tree. During the traversal, it aggregates the new rules with the cached intermediate aggregation results from the previous time slot.

The pseudocode of Tree-Agg is shown in Algorithm 3. In the beginning, we construct a working rule set by adding all newly generated per-flow rules (Lines 2-4). This initialization ensures that the rule aggregation process only traverses from the leaves representing the new rules to the root of the binary IP tree. Because the load balancing policies of the current time slot differ from those of the previous slot, cached intermediate aggregated rules that conflict with the current policies need to be removed (Line 5). For instance, suppose a cached rule and a new rule have the same location on the IP tree, i.e., the same leaf node; the cached rule then needs to be removed from the cache. We define a function that removes all cached rules that have the same location in the IP tree as rules in the working set.
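The conflict-removal step can be illustrated with the short sketch below. It assumes, for illustration only, that two rules have "the same location" exactly when their matching fields are identical; the function name `remove_conflicts` and the tuple encoding are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (matching field, forwarding server)


def remove_conflicts(cache: List[Rule], new_rules: List[Rule]) -> List[Rule]:
    """Drop cached rules that sit at the same IP-tree location as a new rule."""
    # Assumption: same location means identical matching field.
    new_locations = {match for match, _ in new_rules}
    return [rule for rule in cache if rule[0] not in new_locations]
```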

4.3.4. Decomposition of Rule Set into Disjoint Subsets Using 3-Node Subtree Representation

In each iteration of the main loop (Lines 6-49), we aggregate the set of flow rules at the current level of the IP tree into a more compact set and send it to the next level. In particular, we repeatedly select a remaining flow rule at random from the current working rule set (Line 9). We define a function that returns, from the union of the working rule set and the cache, all the rules that share the same leading matching bits as the selected rule but differ from it in the next bit. Using this function, we construct two subsets, one for the tree node of the selected rule and one for its sibling node, and remove the rules they contain from the working rule set and the cache (Lines 10-12). We can view these two subsets as two nodes with the same parent in the IP tree. Figure 4(a) gives an example of this subtree. Denoting the aggregated rule set of the two subsets as a third set and placing it in the parent node of this 3-node subtree, we can see that the aggregation process in each iteration of the main loop can be decomposed into the aggregation of multiple 3-node subtrees. We prove the following property on the IP tree.

Proposition 4. If two rule sets belong to different 3-node subtrees on the same level of the IP tree, then for any rule from the first set and any rule from the second set, no rule intersects the flow space of both.

Proof. Any two such rules share only some leading bits and differ on the bit that follows. Moreover, the two rule sets only contain rules of lower levels. Therefore, it is impossible for a rule to intersect the flow space of both rules.

With this proposition, the aggregation of each subtree on the same level of the IP tree can be performed in parallel without violating the no-ambiguity constraint. Hence, such a decomposition substantially reduces the problem scale and improves the efficiency of rule aggregation. Next, we describe the aggregation process for this 3-node subtree.
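The decomposition into the two child sets of a 3-node subtree can be sketched as below. The sketch is illustrative only: matching fields are strings over '0', '1', '*', `k` is the (hypothetical) number of leading bits shared by both children, and every rule in `pool` is assumed to have an exact bit at position `k`.

```python
from typing import List, Tuple


def split_subtree(selected: str, pool: List[str], k: int) -> Tuple[List[str], List[str]]:
    """Split `pool` into the rules at the selected rule's node and at its sibling.

    Both groups share the first k bits with `selected`; bit k (0-indexed)
    tells the two sibling nodes apart.
    """
    prefix, bit = selected[:k], selected[k]
    same_node = [m for m in pool if m[:k] == prefix and m[k] == bit]
    sibling = [m for m in pool if m[:k] == prefix and m[k] != bit]
    return same_node, sibling
```

Rules that do not share the first `k` bits fall into neither group, which is what allows different 3-node subtrees on the same level to be processed independently.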

4.3.5. Aggregating a 3-Node Subtree When the Sibling Set Is Empty

If the sibling set is empty, i.e., the node of the selected rule has no sibling with the same parent in the IP tree, no flows with nonzero estimated load are expected to arrive at the load balancing switch from the sibling's flow space in the next time slot. In this case, we generate a new set of aggregated rules by changing the bit that separates the two siblings to a wildcard in the matching field of every rule in the selected rule's set (Lines 14-15). The newly generated aggregated rules do not increase the size of the rule set, cause no violation of the no-ambiguity constraint, and help ensure the full-coverage constraint by covering the flow space that has zero expected load. As a result, the aggregation is successful, and we insert the updated rules, with their level increased, into the aggregated rule set (Line 16).
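A minimal sketch of this empty-sibling case follows, assuming rules are encoded as (matching field, server, level) tuples and `k` is the 0-indexed position of the bit that separates the two siblings; both the encoding and the name `lift_without_sibling` are assumptions for illustration.

```python
from typing import List, Tuple

Rule = Tuple[str, str, int]  # (matching field, forwarding server, aggregation level)


def lift_without_sibling(rules: List[Rule], k: int) -> List[Rule]:
    """Wildcard bit k of every rule and raise its aggregation level by one."""
    return [(m[:k] + "*" + m[k + 1:], server, level + 1) for m, server, level in rules]
```

The output has exactly as many rules as the input, which reflects the observation that this step never increases the size of the rule set.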

1 ,
2 foreach do
3  if is 1 then
4   
5 
6 while do
7  
8  while do
9   Randomly select one rule
10   ,
11   ,
12   
13   if then
14    foreach do
15     ,
16    ,
17   else
18    foreach server do
19     foreach where do
20      
21      foreach where do
22       
23       
24       
25       if then
26        continue
27       
28       if then
29        ,
30        ,
31        break
32       else
33        if
34         continue
35        else if then
36         
37         ,
38         
39         break
40      if then
41       
42       ,
43     foreach where do
44      
45      ,
46   
47  
48  
49  ,
50 return as

4.3.6. Aggregating a 3-Node Subtree When the Sibling Set Is Nonempty

If the sibling set is not empty, we move on to aggregate rules with the same forwarding server across the two child sets of the 3-node subtree (Lines 18-45). To do this, we randomly select one rule from each of the two child sets such that their forwarding actions are the same, and generate the aggregated rule (Line 22). Note that Tree-Agg always selects the two rules from different sets. This is justified by the following proposition.

Proposition 6. Given any two rules with the same forwarding action that both come from the same child set, they cannot be aggregated to reduce the number of rules without violating the no-ambiguity constraint.

Proof. Without loss of generality, assume that two such rules with the same forwarding action both come from the same child set. If aggregating them could reduce the number of flow rules, this would already have been done during the aggregation of the 3-node subtree rooted at that child's node. The only reason they were not aggregated is that doing so cannot reduce the number of flow rules, i.e., aggregating them requires extra rules to eliminate ambiguity. So, at that node itself, they should not be aggregated. Now, assume that there is another rule that shares the same forwarding action. Aggregating all three rules together would cause the same issue, because the third rule shares only a common leading prefix with the first two. Hence, the two rules cannot be aggregated to decrease the number of flow rules.

After generating the aggregated rule, we compute its set of flow-space intersection rules using the findDom function defined in Algorithm 2 (Lines 23-24). Leveraging Proposition 4, we search only within the rule sets of the current 3-node subtree instead of the whole rule set, which makes this computation efficient. Having computed this set, we can check whether directly inserting the aggregated rule would violate the no-ambiguity constraint. To this end, we first check whether there exists a rule in the subtree that violates this constraint together with the aggregated rule and its set of flow-space intersection rules. If so, the aggregated rule cannot be inserted into the aggregated rule set, since such ambiguity cannot be removed even if that rule is later aggregated with another rule (Lines 25-26). If no such rule exists, we then check whether any ambiguity would arise between the aggregated rule and the other newly aggregated rules in the aggregated rule set (Line 27), and take different actions in the following cases.

Case 1. If no such conflicting rule exists, the aggregated rule can be directly inserted into the aggregated rule set, as no ambiguity is introduced by this insertion (Lines 28-31).

Case 2. If there are two or more such rules, we do not insert the aggregated rule into the aggregated rule set, because at least 2 rules already in that set would have to be unaggregated to avoid ambiguity, which would increase the size of the rule set (Lines 33-34).

Case 3. If there is exactly one such rule in the aggregated rule set, we compare the sizes of the two rules' sets of flow-space intersection rules and keep only the rule with the smaller set (Lines 35-39). The rationale of this strategy is that an aggregated rule with a smaller set has a smaller chance of causing ambiguity with future aggregations in the two child sets.
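The three cases above amount to a small decision procedure, sketched here under stated assumptions: the inputs are only the sizes of the sets of flow-space intersection rules, the names `Decision` and `decide` are hypothetical, and a tie in Case 3 keeps the already inserted rule (the text does not specify tie handling).

```python
from enum import Enum
from typing import List


class Decision(Enum):
    INSERT = "insert"    # Case 1: insert the new aggregated rule
    SKIP = "skip"        # do not insert the new aggregated rule
    REPLACE = "replace"  # Case 3: unaggregate the existing rule, insert the new one


def decide(new_dom_size: int, conflicting_dom_sizes: List[int]) -> Decision:
    """Pick an action from the sizes of the intersection-rule sets.

    `conflicting_dom_sizes` holds one entry per already aggregated rule
    that would be ambiguous with the new aggregated rule.
    """
    if not conflicting_dom_sizes:
        return Decision.INSERT
    if len(conflicting_dom_sizes) >= 2:
        return Decision.SKIP
    # Exactly one conflicting rule: keep whichever has the smaller set.
    if new_dom_size < conflicting_dom_sizes[0]:
        return Decision.REPLACE
    return Decision.SKIP  # assumption: ties keep the existing rule
```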

If the aggregated rule eventually cannot be inserted into the aggregated rule set, we move on to select the next pair of rules from the two child sets for an aggregation trial. For any rules in the two child sets that have failed all possible aggregations, we increase their aggregation level by 1 and insert them directly into the aggregated rule set (Lines 40-45). The aggregation process of a 3-node subtree stops when both child sets are empty. We insert the resulting aggregated rule set for this subtree into a temporary set and move on to select the next 3-node subtree for aggregation. The aggregation process for the current level stops when every nonempty 3-node subtree on this level has been aggregated. We then repeat the whole aggregation process for the next level on the complete aggregated set (Lines 46-49) until we reach the root of the IP tree. Note that before each iteration of the main loop, the conflict-removal function is invoked to remove obsolete, conflicting rules from the cache (Line 48). We also cache the intermediate aggregation results to assist the rule aggregation of the next time slot (Line 49). In the end, Tree-Agg returns the final aggregated set as the compact rule set (Line 50).

4.3.7. An Example

We use the example in Figure 4 to illustrate the whole flow rule aggregation process on a subtree. The original 3-node subtree is shown in Figure 4(a), where we omit the first 29 bits for simplicity. We start by aggregating the two rules that forward to the same server and obtain an aggregated rule with priority 1. Because this rule does not cause any ambiguity violation with the rules of either child set, we insert it into the aggregated rule set, as shown in Figure 4(b). Next, we generate another aggregated rule, for a different server, also with priority 1. Although this rule does not cause any ambiguity violation with the remaining rules, it conflicts with the newly inserted rule in the aggregated rule set. To decide which rule to keep, we compare the sizes of their sets of flow-space intersection rules. The set of the first aggregated rule contains two rules, while that of the second contains only one. Therefore, we keep the latter to reduce the probability of ambiguity conflicts with future aggregated rules, unaggregate the former back to its source rules, and obtain the new aggregated rule set shown in Figure 4(c). Last, we generate an aggregated rule for another server and find that it has no ambiguity conflict with other rules. In this way, we get the minimal aggregated rule set with only 5 rules in Figure 4(d). Note that if we had kept the first aggregated rule instead, the minimal size of the aggregated rule set would increase to 6, wasting limited flow rule space.

4.3.8. Performance Analysis of Tree-Agg

From the previous propositions, we see that Tree-Agg does not change the action of any rule in the per-flow rule set when compressing it into the compact rule set. Hence, the compact rule set expresses the same load balancing policies as the per-flow rule set does, i.e., it satisfies the disruption-resistant and the full-coverage constraints. Propositions 4 and 6 ensure that the compact rule set satisfies the no-ambiguity constraint. One may notice that this algorithm has a complexity that is polynomial in the number of shifted flows. However, because the number of shifted flows is usually small, the decomposition of flow rule aggregation and the cached intermediate aggregation results from the previous slot ensure that Tree-Agg is computationally efficient and that the new compact rule set is highly similar to the previous one, substantially reducing the control-path update cost of Tigris.

Step 4 (data plane update). After computing the compact rule set, Tigris deletes the obsolete rules (those installed in the previous slot but absent from the new compact rule set) from the switch and then installs the new rules (those in the new compact rule set that are not yet installed). It has been proved NP-hard to decide whether a given set of flow rules can be aggregated into a smaller set of a given size [35]. Hence, Tigris uses a software switch as a safety measure to install extra rules if the compact rule set still exceeds the flow table size of the hardware switch. As we will show in Section 5, however, this is rarely needed in practice since the computed rule set is highly compact. We will also show that the new compact rule set is highly similar to the previously installed one, i.e., it has a low rule update ratio, yielding an extremely low control-path update cost.
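The delete-then-install step can be viewed as two set differences. The sketch below assumes rules are hashable tuples and that the installed rule set of the previous slot is known to the controller; `plan_update` is a hypothetical helper, not part of any switch API.

```python
from typing import Set, Tuple

Rule = Tuple[str, str, int]  # (matching field, forwarding server, priority)


def plan_update(installed: Set[Rule], compact: Set[Rule]) -> Tuple[Set[Rule], Set[Rule]]:
    """Return (rules to delete, rules to install) for one time slot."""
    to_delete = installed - compact   # obsolete rules
    to_install = compact - installed  # new rules
    return to_delete, to_install
```

Rules that appear in both sets are left untouched on the switch, which is why a high similarity between consecutive compact rule sets translates into a low control-path cost.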

4.4. Generalization of Tigris
4.4.1. Supporting Heterogeneous Servers and Other Load Balancing Objectives

For simplicity, we assume identical server machines in this paper, and the DR-LB algorithm makes online, disruption-resistant load balancing policies that achieve a good competitive ratio on the load balancing objective in Equation (6). However, Tigris can also be applied to scenarios where servers have different computation resources, e.g., CPU and memory. This is because Tigris adopts a modular design which separates the load balancing decision process from the rule compression process. With this design, users have the flexibility to define and implement different load balancing algorithms, while still leveraging the high rule set compressing capability of Tree-Agg.

4.4.2. Prefix vs. Suffix and Per-Flow Load Balancing vs. Per-Network Load Balancing

Though in the Tree-Agg algorithm we assume an aggregation process based on the IP prefix, it is straightforward to apply it to suffix-based aggregation. In addition, Tigris supports not only per-flow load balancing but also per-network load balancing. In the latter case, the DR-LB algorithm can make load balancing decisions for each network, i.e., the traffic from the same network will be grouped and forwarded to the same server. With this type of load balancing decision, the Tree-Agg algorithm starts the aggregation from the network level, instead of from the leaf level of the IP tree. The computation overhead for per-network load balancing can hence be substantially reduced, at the cost of coarser-grained load balancing policies.

5. Evaluations

We implement a prototype of Tigris and carry out extensive simulations to evaluate the efficiency of Tigris in achieving size-constrained, disruption-resistant load balancing. The evaluation is performed on a MacBook Pro with four 2.2 GHz Intel i7 cores and 16 GB of memory.

5.1. Methodology

In our evaluations, we assume the setting of Figure 1, where a load balancer switch directs clients' requests to a set of servers. We generate flows for each time slot. Each flow has a source IP address chosen uniformly at random from the whole IP address space. A given ratio of these flows are treated as continuing flows, each with a randomly chosen forwarding server in the previous slot, and hence cannot be disrupted (i.e., cannot be redirected to another server). We assume that the load of each flow is its traffic volume and that the volume is uniformly distributed between 1 MB and 1000 MB. We evaluate different settings of the number of flows, the number of servers, and the ratio of continuing flows. We compare Tigris with the state-of-the-art ECMP [25] system on the following load balancing metrics:
(i) Load variance: the standard deviation of servers' load over the load balancing target
(ii) Disruption ratio: the percentage of disrupted continuing flows
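The two load balancing metrics can be computed as in the sketch below. It rests on assumptions that the text does not spell out: the load variance is read as the root-mean-square deviation of the server loads from a given target load, and flow-to-server assignments are stored in dictionaries. Function and parameter names are hypothetical.

```python
import math
from typing import Dict, List


def load_variance(server_loads: List[float], target: float) -> float:
    """Root-mean-square deviation of the server loads from the target load."""
    return math.sqrt(sum((x - target) ** 2 for x in server_loads) / len(server_loads))


def disruption_ratio(previous: Dict[str, str], current: Dict[str, str]) -> float:
    """Fraction of continuing flows (keys of `previous`) whose server changed."""
    if not previous:
        return 0.0
    disrupted = sum(1 for flow, server in previous.items() if current.get(flow) != server)
    return disrupted / len(previous)
```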

We also compare Tigris with the state-of-the-art TCAM Razor system [27] on the following rule aggregation metrics:
(i) Rule compression ratio: the ratio of the size of the per-flow rule set generated by DR-LB to that of the aggregated rule set computed by Tree-Agg
(ii) Rule set size: the size of the compact flow rule set computed by Tigris

For these metrics, we run Tigris for multiple time slots and report the average results. In addition, we also study the following metric to measure the control-path cost of Tigris:
(i) Rule update ratio: the fraction of compact flow rules in a time slot that differ from those of the previous slot
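The rule aggregation and control-path metrics defined above reduce to simple ratios, as in this sketch. The function names are hypothetical, and the rule update ratio is assumed to be normalized by the size of the current compact rule set.

```python
from typing import Hashable, Set


def compression_ratio(per_flow_size: int, compact_size: int) -> float:
    """Size of the per-flow rule set divided by the size of the compact rule set."""
    return per_flow_size / compact_size


def rule_update_ratio(previous: Set[Hashable], current: Set[Hashable]) -> float:
    """Fraction of rules in the current compact set that were not in the previous one."""
    if not current:
        return 0.0
    return len(current - previous) / len(current)
```

For example, compressing 8000 per-flow rules into 400 compact rules corresponds to a compression ratio of 20.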

5.2. Results

Due to the limited space, we only present the results for the server setting that has the larger load balancing solution space and the more stringent requirement on rule aggregation, and is therefore more representative, and omit the similar results for the other server settings. Figure 5 summarizes the load balancing performance of Tigris under different numbers of flows and ratios of continuing flows. From Figure 5(a), we see that Tigris achieves a stable, small load variance, i.e., between 1000 MB and 2500 MB, and the competitive ratio of the Tigris algorithm in all settings is close to 1, as shown in Figure 5(b). This is consistent with our theoretical finding in Proposition 2. Taking the case of 8000 flows as an example, we also compare the load variance of Tigris with that of ECMP in Figure 5(c). We see that the load variance of ECMP is 2 to 4 times that of Tigris. These observations demonstrate the inflexibility of hash-based load balancing solutions in adapting to the dynamics of flow statistics, the necessity of developing a per-flow-based load balancing system, and the efficiency of Tigris in generating dynamic load balancing policies. We then plot the ratio of disrupted flows of ECMP and Tigris in Figure 5(d). We observe that Tigris achieves a disruption ratio of 0 for continuing flows in all evaluated cases, while ECMP has a close-to-100% disruption ratio of continuing flows in all cases. This is because continuing flows are randomly distributed across all flows with random current servers, but ECMP simply divides the flow space evenly among the servers. This huge difference in the disruption ratio between Tigris and ECMP demonstrates the efficacy of Tigris for disruption-resistant load balancing.
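The high disruption ratio of a hash-based scheme can be made plausible with a short back-of-the-envelope calculation. The following is a sketch under simplifying assumptions that are not taken from the evaluation itself: there are n servers, and the previous server of a continuing flow is uniformly distributed and independent of the server selected by the hash.

```latex
% Assumptions (for illustration only): n servers; S_prev is uniform on
% {1, ..., n} and independent of the hash-selected server S_hash.
\Pr[\text{flow is not disrupted}]
  = \Pr[S_{\mathrm{hash}} = S_{\mathrm{prev}}]
  = \frac{1}{n},
\qquad
\mathbb{E}[\text{disruption ratio}] = 1 - \frac{1}{n}.
```

Under these assumptions, the expected disruption ratio approaches 100% as the number of servers grows, which is in line with the trend reported for ECMP.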

We then study the capability of Tigris in generating highly compact flow rule sets in Figure 6. We see from Figure 6(a) that in most cases, Tigris is able to compute an aggregated rule set of fewer than 400 rules. Even in the worst case with 2000 continuing flows and a total of 8000 flows, Tigris yields an aggregated rule set of fewer than 800 rules. Figure 6(b) shows that Tigris achieves a high rule compression ratio, i.e., between 8 and 52. These observations show that Tigris is capable of computing a highly compact flow rule set that fits into typical commodity switches without the need to truncate any rules and hence achieves efficient utilization of the limited TCAM resource on commodity switches. Taking the case of 8000 flows as an example, we also compare the compression ratio of Tigris with that of TCAM Razor in Figure 6(c). We see that when the ratio of continuing flows is small, i.e., 0.05, Tigris and TCAM Razor give almost the same compression ratio. As the ratio of continuing flows increases, Tigris outperforms TCAM Razor by yielding a higher rule compression ratio. This is because Tigris shifts only a small number of flows during rebalancing and leverages the cached rule aggregation results from the previous time slot for better rule aggregation performance, while TCAM Razor has to start from the per-flow rule set for a clean-slate aggregation. Furthermore, we plot the rule update ratio of Tigris over a sequence of time slots, for one setting of the number of flows and the ratio of continuing flows, in Figure 6(d). We observe that Tigris yields a rule update ratio of less than 5%, implying an extremely low control-path update cost.