Abstract
Network survivability, the ability to maintain operation when one or a few network components fail, is indispensable for present-day networks. In this paper, we characterize three main components in establishing network survivability for an existing network, namely, (1) determining network connectivity, (2) augmenting the network, and (3) finding disjoint paths. We present a concise overview of network survivability algorithms, focusing on a few polynomial-time algorithms that could be implemented by practitioners, and give references to more involved algorithms.
1. Introduction
Given the present-day importance of communications systems and infrastructures in general, networks should be designed and operated in such a way that failures can be mitigated. Network nodes and/or links might for instance fail due to malicious attacks, natural disasters, unintentional cable cuts, planned maintenance, equipment malfunctioning, and so forth. Resilient, fault tolerant, survivable, reliable, robust, and dependable are different terms that have been used by the networking community to capture the ability of a communications system to maintain operation when confronted with network failures. Unfortunately, the terminology has overlapping meanings or contains ambiguities, as pointed out by Al-Kuwaiti et al. [1]. In this paper, we will use the term survivable networks to refer to networks that, when a component fails, may "survive" by finding alternative paths that circumvent the failed component. Three ingredients are needed to reach survivability.
(1) Network connectivity, that is, the network should be well connected (connectivity properties are discussed in Section 1.1).
(2) Network augmentation, that is, new links may need to be added to increase the connectivity of a network.
(3) Path protection, that is, a procedure to find alternative paths in case of failures.
These three ingredients will be explained in the following sections.
1.1. Network Connectivity
A network is often represented as a graph $G(\mathcal{N}, \mathcal{L})$, where $\mathcal{N}$ is the set of $N$ nodes (which for instance represent routers) and $\mathcal{L}$ is the set of $L$ links (which for instance represent optical fiber lines or radio channels). Links may be characterized by weights representing for instance their capacity, delay, length, cost, and/or failure probability. A graph is said to be connected if there exists a path between each pair of nodes in the graph, else the graph is said to be disconnected. In the context of survivability, the notion of connectivity may be further specified as $k$-connectivity, where at least $k$ disjoint paths exist between each pair of nodes. Depending on whether these paths are node or link disjoint, we may discriminate between node and link connectivity. The link connectivity $\lambda(G)$ of a graph $G$ is the smallest number of links whose removal disconnects $G$. Correspondingly, the node connectivity $\kappa(G)$ of a graph $G$ is the smallest number of nodes whose removal disconnects $G$.
In 1927, Menger provided a theorem [2] (in German) that could be interpreted as follows.
Theorem 1 (Menger's theorem). The maximum number of link/node-disjoint paths between A and B is equal to the minimum number of links/nodes that would separate A and B.
Menger's theorem clearly relates to the link/node connectivity of a graph, in the sense that a $k$-link/node-connected graph has at least $k$ link/node-disjoint paths between any pair of nodes in the graph. The minimum number of links/nodes separating two nodes or sets of nodes is referred to as a minimum cut. In order to assess the link/node connectivity of a network, we therefore need to find its minimum cut.
A somewhat less intuitive notion of connectivity stems from the spectrum of the Laplacian matrix of a graph and is denoted as algebraic connectivity. The algebraic connectivity was introduced by Fiedler in 1973 [3] and is defined as follows.
Definition 2 (algebraic connectivity). The algebraic connectivity $\alpha(G)$ equals the value of the second smallest eigenvalue of $Q$, where $Q$ is the Laplacian matrix $Q = \Delta - A$, with $A$ an $N \times N$ adjacency matrix with elements $a_{ij} = 1$ if there is a link between nodes $i$ and $j$, else $a_{ij} = 0$, and $\Delta$ an $N \times N$ diagonal matrix with $\delta_{ii}$ the degree of node $i$.
The algebraic connectivity has many interesting properties that characterize how strongly a graph is connected (e.g., see [4, 5]). Moreover, the multiplicity of the smallest eigenvalue (of value 0) of the Laplacian is equal to the number of components in the graph $G$. Hence, if the algebraic connectivity is larger than 0, the network is connected; otherwise the algebraic connectivity is 0 and the network is disconnected. We have that $\alpha(G) \le \kappa(G) \le \lambda(G) \le \delta_{\min}(G)$, where $\delta_{\min}(G)$ is the minimum degree in the network. For ease of notation, when $G$ is not specified, we use $\alpha$, $\kappa$, $\lambda$, and $\delta_{\min}$.
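As a small worked example (added here for illustration and not part of the original text), consider the 4-node ring, in which every node has degree 2. Its Laplacian and spectrum can be written out by hand:

```latex
Q = \Delta - A =
\begin{pmatrix}
  2 & -1 &  0 & -1 \\
 -1 &  2 & -1 &  0 \\
  0 & -1 &  2 & -1 \\
 -1 &  0 & -1 &  2
\end{pmatrix},
\qquad
\text{eigenvalues of } Q:\; 0,\; 2,\; 2,\; 4 .
```

The eigenvalue 0 occurs once, so the ring consists of a single component. The second smallest eigenvalue is 2, so the algebraic connectivity equals 2. Removing any two links (or any two non-adjacent nodes) disconnects the ring, so its link connectivity, node connectivity, and minimum degree are all 2 as well; in this particular graph the four quantities coincide.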
Connectivity properties may be less obvious when applied to multilayered networks [6, 7], like IP over WDM networks, where a 2-link-connected IP network operated on top of an optical WDM network, with multiple IP links sharing (e.g., groomed on) the same WDM link, could still be disconnected by a single-link failure at the optical layer.
In probabilistic networks, links and/or nodes $x$ are available with a certain probability $p_x$, which is often computed as $p_x = \mathrm{MTTF}_x / (\mathrm{MTTF}_x + \mathrm{MTTR}_x)$, with $\mathrm{MTTF}_x$ the mean time to failure of $x$ and $\mathrm{MTTR}_x$ the mean repair time of $x$. Often the term network availability is used to denote the probability that the network is connected (e.g., see [8]). When the node probabilities are all one and all the link probabilities are independent and of equal value $p$, then a reliability polynomial (a special case of the Tutte polynomial, e.g., see [9]) is a polynomial function in $p$ that gives the probability that the network remains connected after its links fail with probability $1 - p$.
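The availability formula and the reliability polynomial can be illustrated on a very small graph. The sketch below is not taken from the paper: the function names (`availability`, `reliability`) and the exhaustive enumeration over all link states are assumptions chosen for clarity, and the enumeration is exponential in the number of links, so it is only meant for toy examples.

```python
from itertools import product


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability p = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def is_connected(num_nodes, links):
    """Depth-first search connectivity check on nodes 0..num_nodes-1."""
    adj = {n: [] for n in range(num_nodes)}
    for u, v in links:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == num_nodes


def reliability(num_nodes, links, p):
    """Probability that the graph is connected when every link is
    independently available with probability p (brute force, O(2^L))."""
    total = 0.0
    for state in product([0, 1], repeat=len(links)):
        up = [link for link, s in zip(links, state) if s]
        if is_connected(num_nodes, up):
            k = len(up)
            total += p ** k * (1 - p) ** (len(links) - k)
    return total


# Hypothetical example: a triangle whose links all have MTTF 999 h, MTTR 1 h.
triangle = [(0, 1), (1, 2), (0, 2)]
p = availability(mttf=999.0, mttr=1.0)
print(p, reliability(3, triangle, p))
```

For the triangle, the enumeration reproduces the closed form $3p^2(1-p) + p^3$: the network stays connected if at most one of its three links fails.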
1.2. Network Augmentation
The outcome of testing for network connectivity could be that the network is not sufficiently robust (connected). Possibly, rewiring the (overlay) network could improve its robustness properties [10]. However, this is more involved when applied to the physical network, and improving network performance or network robustness is therefore often established by adding new links and possibly also nodes to the network. Adding links or nodes can be costly (which could be reflected by link/node weights), and the new links/nodes should therefore be placed wisely, such that the desired network property is obtained with the fewest links/nodes or such that the addition of a fixed number of links/nodes maximizes the desired network property. This class of problems is referred to as (network) augmentation problems, and within this class the problems only differ in their objectives. For instance, connectivity is an important property in the context of network robustness, and reaching it through link additions might be one such objective. The alternative objective of algebraic connectivity augmentation leads to an NP-hard problem [11]. Similarly, adding a minimum number of links to make a graph chordal is also NP-hard [12] (a graph is chordal if each of its cycles of four or more nodes has a link connecting two nonadjacent nodes in the cycle).
1.3. Path Protection
Network protocols like OSPF are deployed in the Internet to obtain a correct view of the topology and, in case of changes (like the failure of a link), to converge the routing towards the new (perturbed) situation. Unfortunately, this process is not fast, and applications may still face unacceptable disruptions in performance. In conjunction with MPLS, an MPLS fast reroute mechanism can be used that, as the name suggests, provides the ability to switch over in sub-second time from a failed primary path to an alternate (backup) path. This fast reroute mechanism is specified in RFC 4090 [13], May 2005, and has already been implemented by several vendors. The concept has also been extended to pure IP networks and is referred to as IP fast reroute [14]. RFC 4090 defines RSVP-TE extensions to establish backup label-switched path (LSP) tunnels for local repair of LSP tunnels. The backup path can be configured to protect against either a link or a node failure. Since the backup paths are precomputed, no time is lost in computing backup paths or performing signalling in the event of a failure. The fast reroute mechanism as described in RFC 4090 assumes that MPLS (primary and backup) paths are computed and explicitly routed by the network operator. Hence, there is a strong need for efficient algorithms to compute disjoint paths.
Depending on whether backup paths are computed before or after a failure of the primary path, survivability techniques can be broadly classified into restoration or protection techniques.
(i) Protection scheme: protection is a proactive scheme, where backup paths are precomputed and reserved in advance. In 1:1 protection, traffic is rerouted along the backup path upon the failure of the primary path. In 1+1 protection, the data is duplicated and sent concurrently over the primary and backup paths.
(ii) Restoration scheme: restoration is a reactive mechanism that handles a failure after it occurs. Thus, the backup path is not known a priori. Instead, a backup path is computed only after the failure in the primary path is sensed.
In general, protection has a shorter recovery time since the backup path is precomputed, but it is less efficient in terms of capacity utilization and less flexible. Restoration, on the other hand, provides increased flexibility and efficient resource utilization, but it may take a longer time for recovery, and there is no guarantee that a backup path will be found. As a compromise between the two schemes, Banner and Orda [15] considered designing a low-capacity backup network (using spare capacity or by installing new resources) that is fully provisioned to reroute traffic on the primary network in case of a failure. The backup network itself is not used to transport "primary" traffic. Backup networks with specific topological features have also been addressed in the literature, for instance protection [16] and preconfigured [17] cycles or redundant trees [18].
Depending on how rerouting is done after a failure in the primary path, there are three categories of survivability techniques.
(i) Path-based protection/restoration: in path-based protection, a link- or node-disjoint backup path is precomputed and takes over when the primary path fails. In path-based restoration, a new path is computed between the source and destination nodes of the failed path. If such a backup path cannot be found, the request is blocked.
(ii) Link-based protection/restoration: in link-based protection, each link is preassigned a local route that is used when the link fails, and in link-based restoration, the objective is to compute a detour between the two ends of the failed link for all paths that are using the link. Since link-based protection/restoration requires signaling only between the two ends of the failed link, it has a smaller recovery time than path-based protection/restoration, which requires end-to-end signaling between the source and destination nodes.
(iii) Segment-based protection/restoration: the segment-based scheme (e.g., see [19]) is a compromise between path-based and link-based schemes. Thus, in segment-based protection, backup routes are precomputed for segments of the primary path. In segment-based restoration, a detour of the segment containing the failed link is computed following a failure.
Depending on whether sharing of resources is allowed among backup paths, protection schemes can be of two types.
(i) Dedicated protection: in this scheme, resources (e.g., links, wavelength channels, etc.) are not shared among backup paths and are exclusively reserved for a given path request.
(ii) Shared protection: in this scheme, backup paths may share resources as long as their primary paths do not share links. In $M{:}N$ protection, $M$ backup paths are used to protect $N$ primary paths. The shared scheme provides a better resource utilization; however, it is more complicated and requires more information, such as the shareability of each link.
In general, path protection requires less capacity than link protection, while shared protection requires less capacity than dedicated protection. However, path protection is more vulnerable to multiple link failures than link protection, and so is shared protection compared to dedicated protection.
1.4. Paper Outline and Objective
The remainder of this paper is structured as follows. In Section 2, we give an overview of several methods for determining the connectivity properties of a network. In case a network is found to be insufficiently connected from a survivability perspective, links may have to be added to increase the connectivity. In Section 3, we list key results in network connectivity augmentation. Once a network is designed to withstand some failures, proper path protection/restoration schemes should be in place that can quickly defer traffic to alternate routes in case of a failure. In Section 4, we survey work on finding disjoint paths in a network. We conclude in Section 5.
Throughout the paper, the objective is not to list and explain all the relevant algorithms. Rather, we aim to briefly explain some fundamental concepts and some polynomial-time algorithms that could easily be deployed by practitioners or which can be (and have been) used as building blocks for more advanced algorithms, and to provide pointers to further reading.
2. Determining Network Connectivity
In Section 1.1, we indicated that Menger's theorem implies that finding a minimum cut corresponds to finding the connectivity of a network. In this section, we will look further at finding cuts in a network.
Definition 3 (link (edge) cut). A link cut refers to a set of links whose removal separates the graph into two disjoint subgraphs, and where all links in the removed cutset have an endpoint in both subgraphs.
The two subgraphs need not be connected themselves.
Definition 4 (node (vertex) cut). A node cut refers to a set of nodes whose removal separates the graph into two disjoint subgraphs, and where all nodes in the removed cutset have at least one adjacent link to both subgraphs.
Definition 5 (minimum link/node cut). A minimum cut is a cut whose cardinality is not larger than that of any other cut in the network.
Definitions for a cut also have a variant in which a source node and a terminating node need to be separated.
Definition 6 ($s$-$t$ cut). An $s$-$t$ cut refers to a cut that separates two nodes $s$ and $t$ in the graph such that both belong to different subgraphs.
Often, when referring to a cut, a link cut is meant. In the remainder of this paper, we will use the same convention and only specify the type of cut for node cuts.
Definition 7 (maximum cut). A maximum cut is a cut whose cardinality is not exceeded by that of any other cut in the network.
Definition 8 (sparsest cut). The sparsest cut (sometimes also referred to as the (Cheeger) isoperimetric constant) is a cut for which the ratio of the number of links in the cutset divided by the number of nodes in the smaller subgraph is not larger than that of any other cut in the network.
Finding a maximum or sparsest cut is a hard problem (the maximum-cut problem is APX-hard [20] and the sparsest-cut problem is NP-hard [21, 22]), but fortunately a minimum cut, and consequently the network's connectivity, can be computed in polynomial time as will be indicated below. The algebraic connectivity can be used to approximate the sparsest cut [4, 21]. Dinh et al. [23] investigated the notion of pairwise connectivity (the number of connected pairs, which bears similarities to the sparsest-cut problem), and proved that finding the smallest set of nodes/links whose removal degrades the pairwise connectivity to a certain degree is NP-complete.
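To make the cut definitions concrete, the following sketch enumerates every bipartition of a tiny graph and reports the minimum cut, the maximum cut, and the sparsest-cut ratio. It is an exhaustive illustration under assumed names (`cut_size`, `brute_force_cuts`), exponential in the number of nodes, and not one of the algorithms discussed in this paper.

```python
from itertools import combinations


def cut_size(links, side):
    """Number of links with exactly one endpoint in the node set `side`."""
    return sum((u in side) != (v in side) for u, v in links)


def brute_force_cuts(nodes, links):
    """Return (minimum cut, maximum cut, sparsest-cut ratio) by enumeration.

    Only node subsets of at most half the nodes are enumerated, so `side`
    is always the smaller of the two subgraphs (as in Definition 8).
    """
    nodes = list(nodes)
    sizes, ratios = [], []
    for k in range(1, len(nodes) // 2 + 1):
        for side in combinations(nodes, k):
            c = cut_size(links, set(side))
            sizes.append(c)
            ratios.append(c / k)
    return min(sizes), max(sizes), min(ratios)


# 4-node ring 0-1-2-3-0
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(brute_force_cuts(range(4), ring))  # (2, 4, 1.0)
```

On the ring, isolating a single node gives the minimum cut of 2, separating {0, 2} from {1, 3} cuts all 4 links, and splitting the ring into two adjacent pairs gives the sparsest ratio 2/2 = 1.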
2.1. Determining Link Connectivity
In the celebrated paper from Ford and Fulkerson [24] (and independently by Elias et al. [25]) a maximum flow from a source $s$ to a terminal $t$ in a network, where the links have a given capacity, is shown to be equal to the minimum-weight $s$-$t$ link cut in that network, where the weight of the cut is the sum of the capacities of the links in the cutset; this is the so-called max-flow min-cut theorem. By using a max-flow algorithm and setting the capacity of all links to 1, one can therefore compute the minimum $s$-$t$ link cut, or the minimum link cut when repeated over all possible $s$-$t$ pairs. It is not our goal to give an overview of all maximum-flow algorithms (an excellent treatment of the subject is presented in the book by Ahuja et al. [26]), but we will present Dinitz' algorithm, which can be used to determine the minimum $s$-$t$ link cut in a network of $N$ nodes and $L$ links in $O(\min\{N^{2/3}, L^{1/2}\} L)$ time. We will subsequently present the algorithm of Matula for determining the minimum link cut in $O(NL)$ time.
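The unit-capacity max-flow approach can be sketched in a few lines. The code below is a simple illustration and not Dinitz' or Matula's algorithm: it uses plain breadth-first augmenting paths, and it relies on the observation that in an undirected graph any minimum cut separates an arbitrary fixed node from some other node, so fixing the source and varying only the terminal suffices. The function names are hypothetical.

```python
from collections import deque


def max_flow_unit(nodes, links, s, t):
    """Maximum s-t flow in an undirected graph with unit link capacities,
    using shortest augmenting paths (breadth-first search)."""
    cap = {}
    adj = {n: set() for n in nodes}
    for u, v in links:
        # An undirected unit link is modeled as two directed unit arcs.
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap[(v, u)] = cap.get((v, u), 0) + 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left
        v = t
        while parent[v] is not None:  # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def link_connectivity(nodes, links):
    """Link connectivity as the smallest max flow from a fixed node."""
    nodes = list(nodes)
    s = nodes[0]
    return min(max_flow_unit(nodes, links, s, t) for t in nodes[1:])


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(link_connectivity(range(4), ring))  # 2
```

This needs $N - 1$ max-flow computations rather than one per node pair; the algorithms discussed in the remainder of this section are considerably faster.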
2.1.1. Dinitz' Algorithm
Dinitz' algorithm, published in 1970 by Yefim Dinitz, was the first maximum-flow algorithm to run in polynomial time (contrary to the pseudo-polynomial running time of the Ford-Fulkerson algorithm [24]). The algorithm is sometimes referred to as Dinic's or Dinits' algorithm, and different variants are also known. A historical perspective of the different variants is presented by Dinitz himself in [27]. In order to describe Dinitz' algorithm, as presented in Algorithm 1, some definitions are given.

Definition 9. The residual capacity $c_f$ of a link $(u, v)$ is interpreted in two directions as follows:
$$c_f(u, v) = c(u, v) - f(u, v), \qquad c_f(v, u) = f(u, v),$$
where the flow $f(u, v)$ over a link $(u, v)$ cannot exceed the capacity $c(u, v)$ of that link.
Definition 10. The residual graph $G_f$ of $G$ is the graph in which a directed link $(u, v)$ exists if $c_f(u, v) > 0$.
Definition 11. A blocking flow $f$ is an $s$-$t$ flow such that any other $s$-$t$ flow would have to traverse a link already saturated by $f$.
A blocking flow could be obtained by repeatedly finding (via depth-first search [28]) an augmenting flow along an $s$-$t$ path (or pruning the path from the graph in unit-capacity networks). In unit-capacity networks, the algorithm runs in $O(\min\{N^{2/3}, L^{1/2}\} L)$ time, which therefore also is the time complexity to determine a minimum $s$-$t$ link cut with Dinitz' algorithm (for unit node capacities, a complexity of $O(\sqrt{N} L)$ can be obtained [29]).
For further reference, in Table 1, we present some key achievements in computing minimum $s$-$t$ link cuts.
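The level-graph and blocking-flow idea can be sketched as follows. This is a compact textbook-style rendering of Dinitz' approach written for this overview, not a transcription of Algorithm 1; the class and method names are assumptions, and the recursive depth-first search limits it to small graphs.

```python
from collections import deque


class Dinitz:
    """Max flow via repeated blocking flows in level graphs."""

    def __init__(self, num_nodes):
        self.n = num_nodes
        self.adj = [[] for _ in range(num_nodes)]  # arc indices per node
        self.to, self.cap = [], []

    def add_arc(self, u, v, c=1):
        # Arc i and its residual (reverse) arc i ^ 1 are stored in pairs.
        self.adj[u].append(len(self.to)); self.to.append(v); self.cap.append(c)
        self.adj[v].append(len(self.to)); self.to.append(u); self.cap.append(0)

    def _levels(self, s):
        """Breadth-first distances from s in the residual graph."""
        level = [-1] * self.n
        level[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for i in self.adj[u]:
                if self.cap[i] > 0 and level[self.to[i]] < 0:
                    level[self.to[i]] = level[u] + 1
                    queue.append(self.to[i])
        return level

    def _push(self, u, t, limit, level, it):
        """Depth-first search for one augmenting path in the level graph."""
        if u == t:
            return limit
        while it[u] < len(self.adj[u]):
            i = self.adj[u][it[u]]
            v = self.to[i]
            if self.cap[i] > 0 and level[v] == level[u] + 1:
                pushed = self._push(v, t, min(limit, self.cap[i]), level, it)
                if pushed > 0:
                    self.cap[i] -= pushed
                    self.cap[i ^ 1] += pushed
                    return pushed
            it[u] += 1  # this arc is exhausted for the current phase
        return 0

    def max_flow(self, s, t):
        flow = 0
        while True:
            level = self._levels(s)
            if level[t] < 0:
                return flow  # t unreachable: the flow is maximum
            it = [0] * self.n
            while True:  # saturate the level graph (blocking flow)
                pushed = self._push(s, t, float("inf"), level, it)
                if pushed == 0:
                    break
                flow += pushed


def min_st_link_cut_size(num_nodes, links, s, t):
    """Size of a minimum s-t link cut in an undirected unit-capacity graph."""
    d = Dinitz(num_nodes)
    for u, v in links:
        d.add_arc(u, v, 1)
        d.add_arc(v, u, 1)
    return d.max_flow(s, t)


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(min_st_link_cut_size(4, ring, 0, 2))  # 2
```

Each outer iteration rebuilds the level graph, and the per-node arc pointer `it` ensures that an arc found to be useless within a phase is never examined again in that phase.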
2.1.2. Matula's Algorithm
In this section, we describe the algorithm from Matula [43] for determining the link connectivity of an undirected network. Matula's algorithm is based on the following lemma.
Lemma 12. Let $G$ be a graph with a minimum cut of size $\lambda < \delta_{\min}$ that partitions the graph into two subgraphs $G_1$ and $G_2$, then any dominating set $S$ of $G$ contains nodes of both $G_1$ and $G_2$ (a dominating set $S$ is a subset of the nodes in $G$, such that every node in $G$ is either in $S$ or adjacent to a node in $S$).
Proof. For subgraph $G_i$, $i = 1, 2$, with $N_i$ nodes, the sum of the nodal degrees in $G_i$ is bounded by
$$N_i \delta_{\min} \le \sum_{n \in G_i} d(n) \le N_i (N_i - 1) + \lambda. \quad (3)$$
The upper bound occurs if all nodes in $G_i$ are connected to each other and some of the nodes have a link that is part of the cutset. The lower bound stems from each node having a degree larger than or equal to the minimum degree $\delta_{\min}$. From the bounds in (3), we can derive that
$$(N_i - \delta_{\min})(N_i - 1) \ge \delta_{\min} - \lambda.$$
Since $\lambda < \delta_{\min}$ is assumed, $(N_i - \delta_{\min})(N_i - 1) > 0$, and consequently both terms on the left-hand side cannot be smaller than 1. Hence, $N_i > \delta_{\min} > \lambda$. Because the cut contains only $\lambda$ links, fewer than $N_i$ nodes of $G_i$ can be incident to a cut link, which means that, under the assumption that $\lambda < \delta_{\min}$, there is at least one node in $G_i$ that does not have a neighbor in $G_j$, $j \ne i$ (and vice versa). In other words, any dominating set of $G$ should contain nodes of both $G_1$ and $G_2$.
The algorithm of Matula (see Algorithm 2) starts with a node of minimum degree (which lies in one of the two subgraphs, say $G_1$) and gradually builds a dominating set by adding nodes that are not yet part of or adjacent to the growing set. Since at some point a node from $G_2$ needs to be added, keeping track of the minimum cut between each newly added dominating node and the nodes already in the set will result in finding the overall minimum cut. The algorithm is presented below.

In the algorithm of Matula, an augmenting path is a path in the residual network, where a residual network is the network that remains after pruning the links of a previous augmenting path. There are no 1hop paths from to , because then . If has neighbors that belong to , then there exist 2hop paths from to , for which either the first hop from to or the second hop from to is part of the minimum cut. These paths form the first augmenting paths, after which remains. These remaining augmenting paths can be found in time each and since there are at most such paths, the complexity of the algorithm is bounded by . Finally, if , then the initialization guarantees that that value would be found.
For directed multigraphs, Shiloach [44] provided a theorem that is stronger than Menger's theorem.
Theorem 13. Let $G$ be a directed $k$-link-connected multigraph, then for all $s_1, \ldots, s_k$, $t_1, \ldots, t_k$ (not necessarily distinct) there exist link-disjoint paths $P_i$ from $s_i$ to $t_i$ for $i = 1, \ldots, k$.
We refer to Mansour and Schieber [45] for an $O(NL)$-time algorithm for determining the link connectivity in directed networks.
For further reference, in Table 2 we present some key achievements in computing minimum link cuts.
2.2. Determining Node Connectivity
Maximum-flow algorithms can also be used to determine the node connectivity, as demonstrated by Dantzig and Fulkerson [46] (and also discussed in [47]), by transforming the undirected graph $G$ to a directed graph $H$ as follows.
For every node $n$ in $G$ place two nodes $n'$ and $n''$ in $H$ and connect them via a directed link $(n', n'')$, using the convention that the link starts at $n'$ and ends at $n''$. For every undirected link $(u, v)$ in $G$ place directed links $(u'', v')$ and $(v'', u')$ in $H$. All links are assigned unit capacity.
The $s$-$t$ node connectivity in $G$ can be computed by finding a maximum flow from $s''$ to $t'$ in $H$. This can be seen as follows. Assume that there are $k$ node-disjoint paths between $s$ and $t$ in $G$, then there are also $k$ corresponding node-disjoint paths from $s''$ to $t'$ in $H$. Since each link has unit capacity, there thus exists a flow of at least $k$. Since all flow entering a node $n'$ has to traverse the single unit-capacity link $(n', n'')$, at most one unit of flow can pass through a node, so each unit of flow corresponds to a node-disjoint path. Since there are only $k$ node-disjoint paths, the maximum flow in $H$ is equal to $k$.
By using Dinitz' algorithm, one may compute the $s$-$t$ node connectivity in $O(\sqrt{N} L)$ time, and the algorithm of Mansour and Schieber [45] can be used to determine the node connectivity of the network. We refer to Henzinger et al. [48] and Gabow [49] for more advanced algorithms to compute the node connectivity in directed and undirected graphs and to Yoshida and Ito [50] for a node-connectivity property testing algorithm (in property testing the objective is to decide, with high probability, whether the input graph is close to having a certain property; these algorithms typically run in sublinear time).
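The node-splitting transformation can be sketched as follows. This is an illustrative implementation under assumed names: each node becomes an ("in", "out") pair joined by a unit-capacity arc, and a plain breadth-first augmenting-path max flow (not Dinitz' algorithm) is used to keep the example short.

```python
from collections import deque


def st_node_connectivity(nodes, links, s, t):
    """Maximum number of node-disjoint s-t paths in an undirected graph,
    computed as a unit-capacity max flow in the node-split digraph."""
    cap, adj = {}, {}

    def add_arc(a, b):
        cap[(a, b)] = cap.get((a, b), 0) + 1
        cap.setdefault((b, a), 0)          # residual arc
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for n in nodes:
        add_arc((n, "in"), (n, "out"))     # at most one path per node
    for u, v in links:
        add_arc((u, "out"), (v, "in"))
        add_arc((v, "out"), (u, "in"))

    source, sink = (s, "out"), (t, "in")
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            a = queue.popleft()
            for b in adj[a]:
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow
        b = sink
        while parent[b] is not None:       # augment by one unit
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a
        flow += 1


# Two triangles sharing node 2: two link-disjoint paths exist between
# nodes 0 and 4, but every path passes through node 2.
bowtie = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
print(st_node_connectivity(range(5), bowtie, 0, 4))  # 1
```

The example graph shows why node and link connectivity have to be computed separately: removing the single shared node disconnects the two terminals even though no single link failure does.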
3. Network Connectivity Augmentation
In the previous section, we have provided an overview of several algorithms to determine the connectivity of a network. In this section, we will give an overview of several network augmentation algorithms that can be deployed to increase the connectivity (or some other metric) of a network by adding links. Network augmentation problems seem closely related to network deletion problems (e.g., see [51]), where the objective is to remove links in order to reach a certain property. However, there may be significant differences in terms of complexity. For instance, finding a minimum-weight set of links whose removal disconnects a graph is solvable in polynomial time (as discussed in Section 2.1), while adding a minimum-weight set of links to raise the link connectivity of a graph to a given target is NP-hard, as shown in Section 3.1. When both link deletions and link additions are permitted, we speak of link modification problems, for example, see [52].
3.1. Link Connectivity Augmentation
In this section, we consider the following link augmentation problem.
Problem 1 (the link connectivity augmentation (LCA) problem). Given a graph $G$ consisting of $N$ nodes and $L$ links, link connectivity $\lambda$ and an integer $\delta$, the link connectivity augmentation problem is to add a minimum-weight set of links, such that the link connectivity of the graph is increased from $\lambda$ to $\lambda + \delta$.
We can discriminate several variants based on the graph (directed, simple, planar, etc.) or if link weights are used or not (i.e., in the unweighted case all links have weight 1). Let us start with the weighted link connectivity augmentation problem.
Theorem 14. The weighted LCA problem is NPhard.
We will use the proof due to Frederickson and JáJá [53] to show that the 3-dimensional matching (3DM) problem is reducible to the weighted LCA problem (an earlier proof has been provided by Eswaran and Tarjan [54], but since it aims to augment a network without any links to one that is 2-connected and has $N$ links (a cycle), it has the characteristics of a design rather than an augmentation problem).
Problem 2 (3-dimensional matching (3DM)). Given a set $M \subseteq X \times Y \times Z$ of triplets, where $X$, $Y$, and $Z$ are disjoint sets of $q$ elements each, is there a matching subset $M' \subseteq M$ that contains all $3q$ elements, such that $|M'| = q$, and thus no two elements of $M'$ agree in any coordinate?
Proof. For a 3DM instance , with , , , and , we create the graph of the corresponding instance of the weighted LCA problem as follows:
The graph as constructed above forms a tree and therefore is 1 connected. Links from the complement of can be used to augment the graph to 2link connectivity. The weights of the links in are for , and for the remaining links in , the weight is 2.
contains a matching if and only if there is a set of weight such that is 2link connected. Assuming exists, then adding links and for each triple will establish the (2connected) cycle . Since , the weight of these added links is . The remaining nodes that are not yet on a cycle are the nodes and belonging to . These nodes will be directly connected, thereby creating the cycle . In total, additional links will be added, leading to a total weight of links that have been added of . Since the graph is a tree with leaves and the minimum link weight is 1, a network augmentation solution of weight is indeed the lowest possible. It remains to demonstrate that an augmentation of weight will lead to a valid matching . Since, in a solution of weight , each leaf will be connected by precisely one link from , a link will prevent adding a link , and therefore also link must be added. The corresponding triple was not augmented before and is, therefore, part of a valid matching. The remaining links do not contribute to the matching.
Frederickson and JáJá also used the construction of this proof to prove that the node-connectivity and strong-connectivity variants of the weighted LCA problem are NP-hard (in a directed graph strong connectivity is used, which means that there is a directed path from each node to every other node in the graph). We remark that the unweighted simple-graph-preserving LCA problem was claimed to be NP-hard by Jordán (reproduced in [55]) by using a reduction to one of the problems treated by Frederickson and JáJá. However, Jordán appears to be using an unweighted problem of which only the weighted version is proved to be NP-hard in the paper referred to [53], and it is therefore not clear whether the unweighted problem is indeed NP-hard. For a fixed target connectivity, the unweighted simple-graph-preserving problem can be solved in polynomial time [55].
Eswaran and Tarjan [54] were the first to report on augmentation problems. They considered augmenting a network towards either 2-link connectivity, 2-node connectivity, or strong connectivity, and provided for each unweighted problem variant an algorithm of complexity $O(N + L)$ (Raghavan [56] pointed out an error in the strong connectivity algorithm and provided a fix for it). Since most protection schemes only focus on protecting against one single failure at a time (by finding two disjoint paths as discussed in Section 4), we will first present the 2-link-connectivity augmentation algorithm of Eswaran and Tarjan [54].
3.1.1. Eswaran and Tarjan Algorithm
The algorithm of Eswaran and Tarjan as presented in Algorithm 6 makes use of preorder (Algorithm 4) and postorder (Algorithm 3) numbering of nodes in a tree (the label of a node denotes its number as a result of the ordering) and a procedure (Algorithm 5) to find 2-link-connected components.




We have assumed that the initial graph was connected. Eswaran and Tarjan's algorithm can also start from a disconnected graph, by first augmenting the forest of condensed 2-link-connected components to a tree.
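The first stage of this approach, finding the 2-link-connected components and the leaves of the condensed tree, can be sketched as follows. The code is an illustration under assumed names and does not reproduce Algorithms 3 to 6: it finds bridges with a recursive depth-first search (suitable for small graphs only), condenses the components, and reports the number of links that any augmentation of a connected graph to 2-link connectivity must add, namely half the number of leaves, rounded up. Choosing which leaves to pair is the part handled by the preorder/postorder numbering and is not shown.

```python
from math import ceil


def find_bridges(nodes, links):
    """Indices of links whose removal disconnects the graph."""
    adj = {n: [] for n in nodes}
    for idx, (u, v) in enumerate(links):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    order, low, found = {}, {}, set()

    def dfs(u, via):
        order[u] = low[u] = len(order)
        for v, idx in adj[u]:
            if idx == via:
                continue                   # do not reuse the arrival link
            if v in order:
                low[u] = min(low[u], order[v])
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > order[u]:
                    found.add(idx)         # nothing bypasses link idx

    for n in nodes:
        if n not in order:
            dfs(n, None)
    return found


def leaf_components(nodes, links):
    """Representatives of 2-link-connected components that are leaves
    (incident to exactly one bridge) of the condensed tree."""
    bridges = find_bridges(nodes, links)
    adj = {n: [] for n in nodes}
    for idx, (u, v) in enumerate(links):
        if idx not in bridges:
            adj[u].append(v)
            adj[v].append(u)
    comp = {}
    for n in nodes:
        if n in comp:
            continue
        comp[n] = n
        stack = [n]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp[v] = n
                    stack.append(v)
    degree = {}
    for idx in bridges:
        for end in links[idx]:
            degree[comp[end]] = degree.get(comp[end], 0) + 1
    return [c for c, d in degree.items() if d == 1]


def links_needed(nodes, links):
    """Number of links to add to a connected graph for 2-link connectivity."""
    return ceil(len(leaf_components(nodes, links)) / 2)


# A triangle 0-1-2 with three pendant nodes 3, 4, 5.
g = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 4), (1, 5)]
print(sorted(leaf_components(range(6), g)), links_needed(range(6), g))
```

In the example, the three pendant nodes are the leaves of the condensed tree, so two added links (for instance between nodes 3 and 4 and between nodes 5 and 3) are required.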
3.1.2. Cactus Representation of All Minimum Cuts
The algorithm of Eswaran and Tarjan uses a tree representation of all the 2-link-connected components in $G$, which is subsequently used to find a proper augmentation. By using a so-called cactus representation of all minimum cuts in a network, a similar strategy could be deployed to augment a network to a connectivity larger than 2. A graph is defined to be a cactus graph if any two distinct simple cycles in it have at most one node in common (or equivalently, any link belongs to at most one cycle). In this section, we will present the cactus representation.
We will use the notation $(X, Y)$ to represent a set of links that connect nodes in $X$ to nodes in $Y$. The linkset $(X, \bar{X})$, with $\bar{X} = \mathcal{N} \setminus X$, refers to a cutset of links whose removal separates the graph into two subgraphs of nodes $X$ and nodes $\bar{X}$. Dinitz et al. [58] have proposed a cactus structure to represent all the minimum cuts of a graph $G$ (possibly with parallel links) and have shown that there can be at most $\binom{N}{2}$ such minimum cuts. The structure possesses the following properties.
(1) It is a cactus graph, that is, any two distinct simple cycles of the structure have at most one node in common.
(2) Each proper cut in the cactus is a minimum cut (a cut is called proper if the removal of the links in that cut partitions the graph in precisely two subgraphs. A minimum cut is always proper).
(3) Any link that is part of a cycle in the cactus has weight $\lambda/2$; any other link has weight $\lambda$.
(4) The minimum-weight link cut of the cactus has the same weight $\lambda$ as the minimum link cut of $G$.
A cactus graph without cycles is a tree, and if $\lambda$ is odd, then the cactus representation is a tree. Cycles in the cactus graph reflect so-called crossing cuts in $G$.
Definition 15. Two cuts $(X, \bar{X})$ and $(Y, \bar{Y})$, with $X \neq Y$, are crossing cuts, if all four sets $X \cap Y$, $X \cap \bar{Y}$, $\bar{X} \cap Y$, and $\bar{X} \cap \bar{Y}$ are nonempty.
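The crossing-cut condition translates directly into a set computation. The helper below is a minimal sketch with an assumed name (`are_crossing`); each cut is given by one of its two node sets.

```python
def are_crossing(x, y, all_nodes):
    """True if the cuts (x, complement of x) and (y, complement of y)
    cross, i.e. all four pairwise intersections are nonempty."""
    x, y, all_nodes = set(x), set(y), set(all_nodes)
    x_bar, y_bar = all_nodes - x, all_nodes - y
    return all([x & y, x & y_bar, x_bar & y, x_bar & y_bar])


ring_nodes = {1, 2, 3, 4}                         # 4-node ring 1-2-3-4-1
print(are_crossing({1, 2}, {2, 3}, ring_nodes))   # True
print(are_crossing({1}, {1, 4}, ring_nodes))      # False: the cuts are nested
```

In the 4-node ring, the two minimum cuts that each split the ring into two adjacent pairs cross each other, which is the situation a cycle in the cactus representation captures.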
Karzanov and Timofeev [59] have outlined an algorithm to compute the cactus representation that consists of two parts: (1) computing all minimum cuts and (2) constructing the corresponding cactus representation. However, Nagamochi and Kameda [60] reported that their cactus representation may not be unique. We assume that all minimum cuts are already known (e.g., by computing minimum $s$-$t$ cuts between all possible source-destination pairs, by the Gomory-Hu tree algorithm [61], or with Matula's algorithm as explained in [62]) and focus on explaining, following the description of Fleischer [63], how to build a unique cactus graph for the graph $G$.
Karzanov and Timofeev [59] observe that for a link , any two minimum cuts and that separate and are nested, which means that (or vice versa). If we assign the nodes of a preorder labelling , such that node is adjacent to a node in the set , and define to be the set of minimum cuts that contain but not , then it follows that all cuts in are noncrossing for each . For instance, consider a 4node ring , where three minimum cuts separate nodes and , namely, , , and . Clearly , which allows us to represent them as a path graph . The three possibilities to cut this chain correspond to the three minimum cuts that separate and in the ring graph. For each there is a corresponding path graph . These path graphs are used to create a single cactus graph. We proceed to present the algorithm as described by Fleischer [63] (for an alternative description we refer to [64]), see Algorithm 7. We define to be the function that maps nodes of to nodes in , and we define to be the graph with nodes contracted to a single node (and any resulting selfloops removed). Let be the smallest graph that has a minimum cut of value , where corresponds to the largest index of such a graph. is a path graph. The algorithm builds from until is obtained.

Figure 1 gives an example of the execution of the algorithm on a 4node ring.
3.1.3. NaorGusfieldMartel Algorithm
Naor et al. [65] have proposed a polynomialtime algorithm to augment the link connectivity of a graph from to , by adding the smallest number of (possibly parallel) links. The authors first demonstrate how to augment the link connectivity by one in time, after which it is explained how executing this algorithm times could optimally augment the graph towards link connectivity (Cheng and Jordán [66] further discuss link connectivity augmentation by adding one link at a time). In practice, as a result of the costs in network augmentation, a network's connectivity is likely not augmented with . We will therefore only present the algorithm to augment the link connectivity by one, see Algorithm 9, and refer to [65] for the extended algorithm. The algorithm uses the cactus structure that was presented in the previous section to represent all the minimum cuts of a graph . The algorithm is similar in approach to the EswaranTarjan algorithm, since a cactus representation of a 1connected network is the tree representation used by Eswaran and Tarjan, and the algorithm connects “leafs” as Eswaran and Tarjan have done. Naor et al., however, use a different definition of leafs for cactus graphs.
Definition 16 (cactus leaf). A node in a cactus representation is a cactus leaf if it has degree 1 or is a cycle node of degree 2.
Similarly to a tree, if the cactus has ℓ leaves, then ⌈ℓ/2⌉ links need to be added to increase the connectivity by 1.
The algorithm uses a depth-first-search-like procedure, see Algorithm 8, to label the nodes of the cactus graph.


For further reference, in Table 3, we present some key achievements in augmenting link connectivity in unweighted graphs.
Splitting off a pair of links (s, u) and (s, v) refers to deleting those links and adding a new link (u, v). A pair of links is said to be splittable if the min-cut values remain unaffected after splitting off the pair of links; this notion is central to Mader's theorem.
Theorem 17 (Mader [68, 69]). Let G be a connected undirected graph where for some node s the degree d(s) ≠ 3, and the removal of one of the adjacent links of s does not disconnect the graph, then s has a pair of splittable links.
Mader's theorem has been used by, for instance, Cai and Sun [75] and Frank [77] in developing network augmentation algorithms. The algorithms (as already outlined in 1976 by Plesník [84]) attach a new node s to the graph with parallel links between s and all other nodes in the graph and subsequently proceed to split off splittable links.
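To make the splitting-off operation concrete, the following minimal Python sketch applies it to a small undirected multigraph and checks by brute force whether the min-cut values between all pairs of nodes other than s are preserved. The function names, the `Counter`-based multigraph representation, and the choice to check only pairs that exclude s are assumptions made for illustration; the exhaustive cut enumeration is only practical for very small graphs and is not the method used in the cited augmentation algorithms.

```python
from collections import Counter
from itertools import combinations

def cut_value(edges, side):
    """Number of links (with multiplicity) that have exactly one endpoint in `side`."""
    return sum(m for e, m in edges.items() if len(e & side) == 1)

def min_cut(edges, nodes, x, y):
    """Brute-force min-cut value between x and y (exponential; tiny graphs only)."""
    others = [n for n in nodes if n not in (x, y)]
    return min(cut_value(edges, {x, *extra})
               for r in range(len(others) + 1)
               for extra in combinations(others, r))

def split_off(edges, s, u, v):
    """Delete links (s, u) and (s, v) and add link (u, v); assumes u != v."""
    new = Counter(edges)
    new[frozenset((s, u))] -= 1
    new[frozenset((s, v))] -= 1
    new[frozenset((u, v))] += 1
    return +new  # unary plus drops links whose multiplicity fell to zero

def is_splittable(edges, nodes, s, u, v):
    """True if splitting off (s, u), (s, v) keeps all min-cut values among nodes != s."""
    after = split_off(edges, s, u, v)
    rest = [n for n in nodes if n != s]
    return all(min_cut(edges, nodes, x, y) == min_cut(after, nodes, x, y)
               for x, y in combinations(rest, 2))

def graph(*links):
    """Hypothetical helper: build a multigraph from a list of (a, b) pairs."""
    return Counter(frozenset(link) for link in links)

nodes = ["s", "u", "v", "w"]
# Triangle u-v-w plus a node s of degree 2 attached to u and v.
triangle_plus_s = graph(("u", "v"), ("v", "w"), ("w", "u"), ("s", "u"), ("s", "v"))
# Complete graph on four nodes, where s has degree 3.
k4 = graph(("s", "u"), ("s", "v"), ("s", "w"), ("u", "v"), ("u", "w"), ("v", "w"))

print(is_splittable(triangle_plus_s, nodes, "s", "u", "v"))
print(is_splittable(k4, nodes, "s", "u", "v"))
```

In the complete graph on four nodes the node s has degree 3, which is exactly the case excluded by the degree condition in Mader's theorem, and the pair is indeed not splittable there.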
As indicated by Theorem 14, the weighted LCA problem is NP-complete for both undirected graphs and directed graphs. Frederickson and JáJá [53] provided an algorithm to make a weighted graph 2-connected. The algorithm is a 2-approximation algorithm if the starting graph is connected, else it is a 3-approximation algorithm. Khuller and Thurimella [85] proposed a 2-approximation algorithm for increasing the connectivity of a weighted undirected graph to that has a complexity of . Taoka et al. [86] compare via simulations several approximation and heuristic algorithms, including their own maximum-weight-matching-based algorithm.
Under specific conditions, the weighted LCA problem may be polynomially solvable, as shown by Frank [77] for the case that link weights are derived from node weights.
3.2. Node Connectivity Augmentation
In this section, we consider the following node augmentation problem.
Problem 3. The Node Connectivity Augmentation (NCA) problem. Given a graph consisting of nodes and links, node connectivity and an integer , the node connectivity augmentation problem is to add a minimumweight set of links, such that the node connectivity of the graph is increased from to .
As with the LCA problem, the weighted variant of the NCA problem is hard.
Theorem 18. The weighted NCA problem is NP-hard.
Proof. The proof of Theorem 14 also applies here.
The unweighted undirected NCA problem has received most attention. The specific case of making a graph 2-node-connected was treated by Eswaran and Tarjan [54] and by Rosenthal and Goldner [87] (a correction to the latter algorithm has been made by Hsu and Ramachandran [88]). Watanabe and Nakamura [89] and Jordán [90] solved the case for achieving 3-node-connectivity, while Hsu [91] developed an algorithm to upgrade a 3-node-connected graph to a 4-node-connected one. Increasing the connectivity of a node-connected graph (where can be any integer) by 1 was studied by many researchers [90, 92–97], since it was long unknown whether the problem was polynomially solvable. In 2010, Végh [98] provided a polynomial-time algorithm to increase the connectivity of any node-connected unweighted undirected graph by .
Augmenting the node connectivity of directed graphs has been treated by Frank and Jordán [99]. They found a min-max formula that gives the minimum number of new links required to make an unweighted digraph node connected. Frank and Végh [100] developed a polynomial-time algorithm to make a node-connected directed graph node connected.
As the weighted NCA problem is NP-complete, special cases have been considered [88–91, 98, 101]. Most of these articles discuss specific connectivity targets ( and/or have specific values) or specific topologies, like trees. Heuristic and approximation algorithms have also been proposed [85, 102–107].
4. Disjoint Paths
When a network is (made to be) robust, algorithms should be in place that can find link- or node-disjoint paths to protect against a link or node failure. There can be several objectives associated with finding link- or node-disjoint paths.
Problem 4. Given a graph , where and , a weight and a capacity associated with each link , a source node and a terminal node , and two bounds and , find a pair of disjoint paths from to that meets one of the following objectives.
Min-Sum Disjoint Paths Problem
The total weight of the pair of disjoint paths is minimized.
Min-Max Disjoint Paths Problem
The maximum path weight of the two disjoint paths is minimized.
Min-Min Disjoint Paths Problem
The smallest path weight of the two disjoint paths is minimized.
Bounded Disjoint Paths Problem
The weight of the primary path should be less than or equal to , and the weight of the backup path should be less than or equal to .
Widest Disjoint Paths Problem
The smallest capacity over all links in the two paths is maximized.
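As an illustration of how the five objectives differ, the following Python sketch evaluates each of them for a given pair of paths. The dictionary-based link representation, the function names, and the convention that the lighter of the two paths is treated as the primary path are assumptions made for this example only.

```python
def path_links(path):
    """Consecutive node pairs (links) along a path given as a node list."""
    return list(zip(path, path[1:]))

def path_weight(path, weight):
    """Sum of the link weights along a path."""
    return sum(weight[link] for link in path_links(path))

def objectives(p1, p2, weight, capacity):
    """Objective values of a pair of paths under four of the problem variants."""
    w1, w2 = path_weight(p1, weight), path_weight(p2, weight)
    return {
        "min_sum": w1 + w2,       # total weight of the pair
        "min_max": max(w1, w2),   # weight of the heavier path
        "min_min": min(w1, w2),   # weight of the lighter path
        "widest": min(capacity[l] for l in path_links(p1) + path_links(p2)),
    }

def within_bounds(p1, p2, weight, bound_primary, bound_backup):
    """Bounded variant; assumes the lighter path serves as the primary path."""
    primary, backup = sorted((path_weight(p1, weight), path_weight(p2, weight)))
    return primary <= bound_primary and backup <= bound_backup

# Hypothetical four-link example with two disjoint paths from s to t.
weight = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 2, ("b", "t"): 3}
capacity = {("s", "a"): 10, ("a", "t"): 5, ("s", "b"): 7, ("b", "t"): 8}
values = objectives(["s", "a", "t"], ["s", "b", "t"], weight, capacity)
print(values)
```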
The most common and simplest of these is the min-sum disjoint paths problem. If the two paths are used simultaneously for load-balancing purposes (or protection), then the min-max objective is desirable. Unfortunately, the min-max disjoint paths problem is NP-hard [108]. If failures are expected to occur only sporadically (and in case of 1 : 1 protection), then it may be desirable to minimize the weight of the primary (shorter) path (min-min objective), which also leads to an NP-hard problem [109]. The min-max and min-min disjoint paths problems could be considered as extreme cases of the bounded disjoint paths problem, which was shown to be NP-hard [110] and later proven to be APX-hard by Bley [111] (the graph structure referred to as lobe that was used by Itai et al. [110] to prove NP-completeness has since often been used to prove that other disjoint paths problems are NP-complete, e.g., [112–114]). Finding widest disjoint paths can easily be done by pruning “low-capacity” links from the graph and finding disjoint paths. When the capacity requirements for the primary and backup paths are different, disjoint paths problems usually become NP-complete [115].
Beshir and Kuipers [116] investigated the min-sum disjoint paths problem with min-max, min-min, bounded, and widest as secondary objectives in case multiple min-sum paths exist between and . Of these variants, only the widest min-sum link-disjoint paths problem is not NP-hard.
Li et al. [112] studied the min-sum disjoint paths problem, where the link-weight functions are different for the primary and backup paths, and showed that this problem is hard to approximate. Bhatia [117] demonstrated that the problem remains hard to approximate in the case that the weights for the links of the backup path are a fraction of the normal link weights (for the primary path).
Sherali et al. [118] investigated the time-dependent min-sum disjoint paths problem, where the link weights are time-dependent. They proved that the problem is NP-hard, even if only one link is time-dependent and all other links are static.
4.1. Min-Sum Disjoint Paths
Finding min-sum disjoint paths is equivalent to finding a minimum-cost flow in unit-capacity networks [26]: a minimum-cost flow of will traverse disjoint paths. In fact, Suurballe's algorithm, which is most often cited as an algorithm to compute two disjoint paths, is an algorithm that uses augmenting paths, like in several max-flow algorithms. The original Suurballe algorithm as presented in [119] allows computing node (or link) disjoint paths between a single source-destination pair, by using shortest path computations. Later, this approach was used by Suurballe and Tarjan [120] to find two link (or node) disjoint paths from a source to all other nodes in the network (i.e., source-destination pairs), by using only two shortest-paths computations, that is, in time. Both papers focus on directed networks, but the algorithms can also be applied to undirected networks.
In directed networks, a link-disjoint paths algorithm can be used to compute node-disjoint paths, if we split each node u into two nodes u' and u'', with a directed link (u', u''), and the incoming links of u connected to u' and the outgoing links of u departing from u''.
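The node-splitting transformation for directed networks can be sketched in a few lines of Python. The representation of links as a dictionary keyed by node pairs, the `"in"`/`"out"` labels for the two copies of a node, and the zero weight on the internal link are assumptions chosen for this illustration.

```python
def split_nodes(links):
    """Node-splitting transformation for a directed graph.

    `links` maps (u, v) to a weight. Every node u becomes (u, "in") and
    (u, "out"), joined by an internal link of weight 0 (an assumption);
    every original link (u, v) becomes ((u, "out"), (v, "in")).
    """
    nodes = {n for link in links for n in link}
    split = {((u, "in"), (u, "out")): 0 for u in nodes}
    for (u, v), w in links.items():
        split[((u, "out"), (v, "in"))] = w
    return split

# Two link-disjoint routes that both pass through node "a".
links = {("s", "a"): 1, ("a", "t"): 2, ("s", "b"): 1, ("b", "a"): 1, ("a", "c"): 1, ("c", "t"): 1}
transformed = split_nodes(links)
print(len(transformed))
```

Because every route through a node must cross that node's single internal link, link-disjoint paths from (s, "out") to (t, "in") in the transformed graph correspond to node-disjoint paths from s to t in the original graph.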
In undirected networks, a link-disjoint paths algorithm can be used to compute node-disjoint paths by the transformation described in Section 2.2.
We will present the Suurballe-Tarjan algorithm, see Algorithm 10, for computing two link-disjoint paths between and every other node in the network.

Instead of finding an augmenting path for each source-destination pair, Suurballe and Tarjan have found a way to combine these augmenting flow computations into two Dijkstra-like shortest-paths computations. First a shortest paths tree is computed in line 1, and based on the computed shortest path lengths, the link weights are modified in line 2. This link weight modification was also used by Suurballe and is to assure that for all links, with equality if is in . In Suurballe's original algorithm the direction of the links on the shortest path from to was reversed, after which a shortest (augmenting) path in the newly modified graph was computed. In the Suurballe-Tarjan algorithm the links maintain their direction, but an additional parameter is used instead. The algorithm proceeds in a Dijkstra-like fashion. Lines 3–6 correspond to the initialization of the smallest length from to found so far, its corresponding predecessor list and . The algorithm repeatedly extracts a node of minimum length (in line 8) and removes that node from the tree (in line 9). A slightly different relaxation procedure is used (lines 10–12). Upon termination of the algorithm, the disjoint paths between the source and a destination can be retrieved via the lists and with Algorithm 11.
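For a single source-destination pair, the link-reversal approach attributed above to Suurballe's original algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 10: it assumes a directed graph with strictly positive link weights, uses the standard reduced weight w(u, v) + d(u) - d(v) as the weight modification, and all function names are hypothetical.

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances and predecessors from source; adj[u][v] = weight >= 0."""
    dist, pred, done = {source: 0}, {}, set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in adj.get(u, {}).items():
            if v not in done and (v not in dist or d + w < dist[v]):
                dist[v], pred[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, pred

def trace(pred, source, target):
    """Rebuild the node list of the path from source to target."""
    path = [target]
    while path[-1] != source:
        path.append(pred[path[-1]])
    return path[::-1]

def two_link_disjoint_paths(links, source, target):
    """Min-sum pair of link-disjoint paths, or None if no such pair exists."""
    adj = {}
    for (u, v), w in links.items():
        adj.setdefault(u, {})[v] = w
    dist, pred = dijkstra(adj, source)
    if target not in dist:
        return None
    p1_links = set(zip(*(lambda p: (p, p[1:]))(trace(pred, source, target))))
    # Modified graph: links of the first path are reversed (reduced weight 0),
    # all other reachable links keep their direction with reduced weights.
    residual = {}
    for (u, v), w in links.items():
        if u not in dist:
            continue
        if (u, v) in p1_links:
            residual.setdefault(v, {})[u] = 0
        elif (v, u) not in p1_links:
            residual.setdefault(u, {})[v] = w + dist[u] - dist[v]
    dist2, pred2 = dijkstra(residual, source)
    if target not in dist2:
        return None
    p2 = trace(pred2, source, target)
    # Combine both paths; a reversed link in p2 cancels the link of the first path.
    used = set(p1_links)
    for u, v in zip(p2, p2[1:]):
        if (v, u) in p1_links:
            used.remove((v, u))
        else:
            used.add((u, v))
    succ = {}
    for u, v in used:
        succ.setdefault(u, []).append(v)
    paths = []
    for _ in range(2):
        path = [source]
        while path[-1] != target:
            path.append(succ[path[-1]].pop())
        paths.append(path)
    return paths

# Example in which removing the shortest path s-a-b-t would disconnect s from t.
links = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1, ("s", "b"): 4, ("a", "t"): 4}
pair = two_link_disjoint_paths(links, "s", "t")
print(sorted(pair))
```

In this example the second shortest-path computation traverses the reversed link from b to a, which cancels link (a, b) of the first path and yields the pair s-a-t and s-b-t.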

Taft-Plotkin et al. [121] extended the approach of Suurballe and Tarjan in two ways: (1) they return maximally disjoint paths, and (2) they also take bandwidth into account. Their algorithm, called MADSWIP, computes maximum-bandwidth maximally disjoint paths and minimizes the total weight as a secondary objective. Consequently, by assigning all links equal bandwidth, the MADSWIP algorithm returns the min-sum maximally disjoint paths.
For a distributed disjoint paths algorithm, we refer to the work of Ogier et al. [122] and Sidhu et al. [123].
Roskind and Tarjan [124] presented an algorithm for finding link-disjoint spanning trees of minimum total weight. Xue et al. [125, 126] have considered quality of service and quality of protection issues in computing two disjoint trees (Quality of Protection (QoP) as used by Xue et al. refers to the number of link failures that can be survived; QoP is sometimes also used to refer to probabilistic survivability, as discussed in the following section, or to protection differentiation, as overviewed by Cholda et al. [127]). Ramasubramanian et al. [128] proposed a distributed algorithm for computing two disjoint trees. Guo et al. [129] considered finding two link-disjoint paths subject to multiple Quality-of-Service constraints.
4.2. Probabilistic Survivability
When two disjoint primary and backup paths are reserved for a connection, any failure on the primary path can be survived by using the backup path. The backup path therefore provides a 100% survivability guarantee against a single failure. When no backup paths are available, that is, unprotected paths are used, then the communication along a path will fail if there is a failure on that path. Banner and Orda [130] have introduced the term survivable connection to denote a connection for which there is a probability that all its links are operational (the related notion of Quality of Protection (QoP), as defined by Gerstel and Sasaki [131], was argued to be difficult to apply to general networks). The previous two cases correspond to and , respectively. Banner and Orda proved that, under the single-link failure model, at most two paths are needed to establish a survivable connection, if it exists. Based on this observation, they studied and proposed algorithms for several problem variants, namely establishing survivable-bandwidth, most survivable, and widest survivable connections for 1 : 1 and 1 + 1 protection architectures (the MADSWIP algorithm [121] can also be used to find the most survivable connection). The survivable-bandwidth problem asks for a connection with survivability and bandwidth and solving it provides a foundation for solving the other problems. We will therefore discuss the solution proposed by Banner and Orda for the survivable-bandwidth problem.
The approach by Banner and Orda to solve the survivable-bandwidth problem is twofold. First, the graph is transformed, after which a minimum-cost flow is found on the resulting graph. The graph transformation is depicted in Figure 2 and slightly differs for the 1 : 1 and 1 + 1 cases. Clearly, if a link does not have sufficient spare capacity to accommodate the requested bandwidth , then it does not need to be considered further (Figure 2(a)). If, for 1 : 1 protection, , then there is sufficient bandwidth for both disjoint paths, since the backup path is only used after failure of the primary path. To allow for both paths to share that link, it is transformed to two links (Figure 2(b)). If the original link is only used by one path, then that link is protected, and hence the weight is assigned to the top link. If both paths have to use the original link, then the connection's survivability is affected by the failure probability of that link, which is why the weight is assigned to the lower link (the logarithm is used to transform a multiplicative metric to an additive metric). The same applies to the 1 + 1 case, with the exception that the concurrent transmission of data over both paths requires twice the requested bandwidth. For the remaining range for 1 + 1 protection (Figure 2(c)), it holds that only one of the paths can use that link, which is why there is no weight penalty.
Figure 2: Graph transformation for the survivable-bandwidth problem, panels (a), (b), and (c).
In the transformed graph, a minimum-cost flow of units corresponds to two maximally disjoint paths of each bandwidth. The minimum-cost flow could, for instance, be found with the cycle-canceling algorithm of Goldberg and Tarjan [132], while the corresponding maximally disjoint paths could be returned via a flow-decomposition algorithm [26].
Luo et al. [133] studied the min-sum survivable connection problem, where each link is characterized by a weight and a failure probability, and the problem is to find a connection of least weight and survivability ≥. Contrary to the min-sum maximally-disjoint paths problem, this problem is NP-hard, since it contains the NP-hard restricted shortest paths problem (e.g., see [134]). Luo et al. proposed an ILP and two approximation algorithms for this problem. Chakrabarti and Manimaran [135] studied the min-sum survivable-bandwidth problem, for which they considered a segment-based protection strategy.
She et al. [114] have considered the problem of finding two link-disjoint paths, for which the probability that at least one of these paths will not fail is maximized. They refer to this problem as the maximum-reliability (max-rel) link-disjoint paths problem. The rationale behind this problem is to establish two disjoint paths that give 100% protection against a single-link failure, while reducing the failure probability of the connection as much as possible when multiple failures may occur. Assuming that the link-failure probabilities are independent, then the reliability of a connection (consisting of two link-disjoint paths and ) is defined as , with , for . The max-rel link-disjoint paths problem is proven to be NP-complete. She et al. [114] evaluated two simple heuristic algorithms that both transform the link probabilities to link weights , with , for , and . Based on these weights, one heuristic finds a shortest path, prunes its links from the graph, and finds a shortest path in the pruned graph. This is often referred to as an active-path-first (APF) approach. The second heuristic uses Suurballe's algorithm to find two link-disjoint paths. Contrary to the first heuristic, the second always returns link-disjoint paths if they exist.
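The reliability of a pair of link-disjoint paths and the active-path-first heuristic can be illustrated with a short Python sketch. Under the independence assumption stated above, the probability that at least one path survives is 1 - (1 - R1)(1 - R2), where Ri is the product of (1 - p) over the links of path i. The weight transformation w = -log(1 - p) used below is an assumption (it is the usual way to turn the multiplicative reliability into an additive weight) and may differ in detail from the transformation of She et al.; the function names are hypothetical.

```python
import heapq
import math

def shortest_path(links, source, target):
    """Dijkstra on a dict (u, v) -> non-negative weight; node list or None."""
    adj = {}
    for (u, v), w in links.items():
        adj.setdefault(u, []).append((v, w))
    dist, pred, done = {source: 0.0}, {}, set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in adj.get(u, []):
            if v not in done and (v not in dist or d + w < dist[v]):
                dist[v], pred[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(pred[path[-1]])
    return path[::-1]

def path_reliability(path, p_fail):
    """Probability that every link of the path is operational."""
    return math.prod(1.0 - p_fail[link] for link in zip(path, path[1:]))

def connection_reliability(p1, p2, p_fail):
    """Probability that at least one of two link-disjoint paths is operational."""
    return 1.0 - (1.0 - path_reliability(p1, p_fail)) * (1.0 - path_reliability(p2, p_fail))

def active_path_first(p_fail, source, target):
    """APF heuristic: shortest path, prune its links, shortest path again."""
    weights = {link: -math.log(1.0 - p) for link, p in p_fail.items()}  # assumes p < 1
    primary = shortest_path(weights, source, target)
    if primary is None:
        return None
    used = set(zip(primary, primary[1:]))
    pruned = {link: w for link, w in weights.items() if link not in used}
    backup = shortest_path(pruned, source, target)
    return None if backup is None else (primary, backup)

p_fail = {("s", "a"): 0.1, ("a", "t"): 0.1, ("s", "b"): 0.2, ("b", "t"): 0.2}
primary, backup = active_path_first(p_fail, "s", "t")
print(primary, backup, connection_reliability(primary, backup, p_fail))

# A topology in which APF fails although two link-disjoint paths exist.
trap = {("s", "a"): 0.01, ("a", "b"): 0.01, ("b", "t"): 0.01, ("s", "b"): 0.3, ("a", "t"): 0.3}
print(active_path_first(trap, "s", "t"))
```

The second example shows why the first heuristic is not guaranteed to return a solution: its primary path s-a-b-t uses the links that both disjoint paths s-a-t and s-b-t would need.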
4.3. Multiple Failures
The single-link failure model has been most often considered in the literature, but multiple failures may occur for the following reasons. (i) Due to lengthy repair times of network equipment, there is a fairly long time span in which new failures could occur. (ii) In case of terrorist attacks, several targeted parts of the network could be damaged. With Suurballe's algorithm, link/node-disjoint paths can be found to establish full protection against link/node failures. (iii) In layered networks, for instance IP-over-WDM, one failure on the lowest-layer network may cause multiple failures on higher-layer networks. Similarly, the links of a (single-layered) network may share the same duct, in which case damage to the duct may damage all the links inside. These links are often said to belong to the same shared risk link group (SRLG) (the node variant SRNG also exists; when both nodes and links can belong to a shared risk group, the term Shared Risk Resource Group (SRRG) is used, e.g., see [136]). Finding two SRLG-disjoint paths (paths of which the links in one path may not share a risk link group with links from the other path) is an NP-complete problem [137]. In specific cases, the SRLG-disjoint paths problem is polynomially solvable, as discussed by Bhandari [138], Datta and Somani [139], and Luo and Wang [140]. In those cases, for instance, when the links in a SRLG share the same endpoint, a graph transformation can be made that reflects the shared risk groups, and on which a simple link-disjoint paths algorithm can be run. Lee et al. [141] have generalized the SRLG problem to include failure probabilities. In the deterministic SRLG scenario, when a SRLG fails (e.g., a cable in the physical network is cut) all higher-layer links that belong to that group fail. In the probabilistic SRLG (PSRLG) scenario, the links in that PSRLG fail with probability . If , for , then the problem of finding PSRLG-disjoint paths reduces to the NP-complete problem of finding SRLG-disjoint paths.
For SRLG types of problems, often an integer programming formulation is provided (e.g., [137, 142–144]) or an active-path-first (APF) approach is used as a heuristic. Hu [137] provided a basic ILP formulation to return a min-sum pair of SRLG-disjoint paths. Xu et al. [143] gave an ILP and an APF heuristic for the case of shared (backup paths) protection. (iv) Natural disasters may affect all nodes and links within a certain geographical area. Work on multi-link geographic failures has mostly focused on determining the geographic max-flow and min-cut values of a network under geographic failures of circular shape (e.g., Sen et al. [145], Agarwal et al. [146], and Neumayer et al. [147]). Trajanovski et al. [148] proved that the problem of finding two region-disjoint paths is NP-hard, and they proposed a heuristic for it.
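The SRLG-disjointness condition from item (iii) is easy to verify for a given pair of paths, even though finding such a pair is hard. The following Python sketch checks it; the mapping from links to sets of risk-group identifiers and the function names are assumptions made for illustration.

```python
def risk_groups_of(path, srlg):
    """Union of the risk groups of all links on a path (node list)."""
    groups = set()
    for link in zip(path, path[1:]):
        groups |= srlg.get(link, set())
    return groups

def srlg_disjoint(p1, p2, srlg):
    """True if the paths share neither a link nor a risk link group."""
    links1, links2 = set(zip(p1, p1[1:])), set(zip(p2, p2[1:]))
    groups1, groups2 = risk_groups_of(p1, srlg), risk_groups_of(p2, srlg)
    return links1.isdisjoint(links2) and groups1.isdisjoint(groups2)

# Links (s, a) and (s, b) leave s through the same (hypothetical) duct "d1".
srlg = {("s", "a"): {"d1"}, ("s", "b"): {"d1"}, ("s", "c"): {"d2"}}
print(srlg_disjoint(["s", "a", "t"], ["s", "b", "t"], srlg))  # link-disjoint, same duct
print(srlg_disjoint(["s", "a", "t"], ["s", "c", "t"], srlg))
```

The first pair is link-disjoint but would fail together if duct "d1" is damaged, which is the situation that plain link-disjoint path algorithms cannot detect.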
5. Conclusion
We have provided an overview of algorithms for network survivability. We have considered how to verify that a network has certain connectivity properties, how to augment an existing network to reach a given connectivity, and, lastly, how to find alternative paths in case network failures occur. Our focus has been on algorithms for general networks, although much work has also been done for specific networks, such as optical networks, where additional constraints like wavelength continuity and signal impairments induce an increased complexity, for example, see our work [149–151]. We have not discussed how to design a survivable network from scratch. Typically network design problems are hard to solve and involve many constraints, but since they only need to be solved sporadically, longer computation times are permitted. Predominantly, integer programming is used to design a network, as we have done in [152].
Acknowledgment
The author would like to thank Professor Piet Van Mieghem for his constructive comments on an earlier version of this paper.