Abstract
We introduce two new combinatorial optimization problems, the Maximum Spider Problem and the Spider Cover Problem; we study their approximability and illustrate their applications. In these problems we are given a directed graph $G = (V, E)$, a distinguished vertex $s \in V$, and a family $D$ of subsets of vertices. A spider centered at vertex $s$ is a collection of arc-disjoint paths all starting at $s$ but ending at pairwise distinct vertices. We say that a spider covers a subset of vertices $X$ if at least one of the endpoints of the paths constituting the spider, other than $s$, belongs to $X$. In the Maximum Spider Problem the goal is to find a spider centered at $s$ that covers the maximum number of elements of the family $D$. Conversely, the Spider Cover Problem consists of finding the minimum number of spiders centered at $s$ that cover all subsets in $D$. We motivate the study of the Maximum Spider and Spider Cover Problems by pointing out a variety of applications. We show that a natural greedy algorithm gives a 2-approximation algorithm for the Maximum Spider Problem and an $O(\log |D|)$-approximation algorithm for the Spider Cover Problem.
1. Introduction
Given a digraph $G = (V, E)$ and a vertex $s \in V$, a spider centered at $s$ is a subgraph of $G$ consisting of arc-disjoint paths sharing the initial vertex $s$ and ending at pairwise distinct vertices. The vertex $s$ is called the center of the spider. The endpoints of the paths composing the spider, other than the center $s$, are called the terminals of the spider. In other words, a spider is a subdivision of the star $K_{1,t}$, where $t$ is the number of terminals. Given a spider $S$, we say that $S$ reaches a vertex $v$ if $v$ is a terminal of $S$; we say that the spider $S$ covers a subset of vertices $X \subseteq V$ if $S$ reaches at least one vertex in $X$.
In this paper we consider the approximability of the following problems.
Maximum Spider Problem (MSP)
We are given a digraph $G = (V, E)$, a distinguished node $s \in V$, and a family $D$ of subsets of vertices. The objective is to find a spider $S$ centered at $s$ such that the number of subsets in $D$ covered by $S$ is maximum among all possible spiders centered at $s$.
We also consider the related minimization problem, where one wants to cover all the elements of .
Spider Cover Problem (SCP)
As before, we are given a digraph $G = (V, E)$, a distinguished vertex $s \in V$, and a family $D$ of subsets of vertices. The goal is to find a minimum cardinality collection of spiders centered at $s$ such that each subset in $D$ is covered by at least one spider in the collection.
1.1. Motivations
The Maximum Spider and the Spider Cover Problems are far-reaching generalizations and unifications of several Maximum Coverage and Set Cover Problems which, in turn, are fundamental algorithmic and combinatorial problems that arise frequently in a variety of settings [3]. To start, recall that in the basic formulation of the Maximum Coverage Problem [3], one is given a ground set $U$, a collection of sets $\mathcal{S} = \{S_1, \ldots, S_m\}$, where each $S_j \subseteq U$, for $j = 1, \ldots, m$, and an integer $k$. The goal is to find $k$ sets $S_{j_1}, \ldots, S_{j_k}$ such that the cardinality of their union is maximum. To see that the Maximum Coverage Problem is a very particular case of the Maximum Spider Problem, consider the digraph $G$ of Figure 1, with node set $\{s\} \cup \{x_1, \ldots, x_k\} \cup \{S_1, \ldots, S_m\}$. The vertex $s$ is connected to each of the nodes $x_1, \ldots, x_k$, and each $x_i$ is connected to every $S_j$, for $i = 1, \ldots, k$ and $j = 1, \ldots, m$. The family $D$ is defined as $D = \{D_u : u \in U\}$, where $D_u = \{S_j : u \in S_j\}$. One can see that the Maximum Spider Problem in $G$ is equivalent to the Maximum Coverage Problem on the original instance $U$, $\mathcal{S}$, and $k$, as follows. Let $T$ be a spider in $G$ that covers a maximum number of subsets $D_u$. Let $D_{u_1}, \ldots, D_{u_t}$ be these subsets. By our definition of spider cover, the (at most $k$) terminals of $T$ in $\{S_1, \ldots, S_m\}$ correspond to some $S_{j_1}, \ldots, S_{j_k}$ such that for any $D_{u_i}$ there exists $j_\ell$ for which $S_{j_\ell} \in D_{u_i}$. This implies that for any $u_i$ there exists $j_\ell$ such that $u_i \in S_{j_\ell}$; consequently $\{u_1, \ldots, u_t\} \subseteq \bigcup_{\ell} S_{j_\ell}$ and $|\bigcup_{\ell} S_{j_\ell}| \geq t$. Conversely, let $S_{j_1}, \ldots, S_{j_k}$ be a solution to the Maximum Coverage Problem on the original instance $U$, $\mathcal{S}$, and $k$. Let $\bigcup_{\ell} S_{j_\ell} = \{u_1, \ldots, u_t\}$. Consider now the spider in $G$ starting at $s$ and having terminal nodes equal to $S_{j_1}, \ldots, S_{j_k}$. By definition, this spider covers at least the subsets $D_{u_1}, \ldots, D_{u_t}$.
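The reduction from Maximum Coverage can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `max_coverage_to_msp`, the label `"s"` for the center, and the node labels `("x", i)` and `("S", j)` are choices made here, not notation from the paper.

```python
def max_coverage_to_msp(universe, sets, k):
    """Build an MSP instance (arcs, center, family) from a Maximum Coverage
    instance: `sets` is a list of subsets of `universe`, `k` is the budget.
    Node labels are illustrative assumptions."""
    s = "s"
    arcs = set()
    for i in range(k):
        # the center has exactly k outgoing arcs, so a spider has at most k branches
        arcs.add((s, ("x", i)))
        for j in range(len(sets)):
            # every intermediate node can continue to every set node
            arcs.add((("x", i), ("S", j)))
    # one subset per ground element u: the set nodes whose set contains u
    family = {u: {("S", j) for j, S in enumerate(sets) if u in S}
              for u in universe}
    return arcs, s, family


arcs, center, family = max_coverage_to_msp({1, 2, 3}, [{1, 2}, {2, 3}, {3}], k=2)
print(len(arcs), sorted(family[2]))
```

Because the center has only $k$ outgoing arcs, any spider reaches at most $k$ set nodes, which is how the budget of the Maximum Coverage instance is enforced.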
Thus, the Maximum Coverage Problem corresponds to the Maximum Spider Problem in a very simple digraph $G$. By allowing more flexibility in the structure of $G$, one can describe many more combinatorial optimization problems in this framework. For instance, Chekuri and Kumar [4] considered the following generalization of Maximum Coverage.
Maximum Coverage with Group Budget Constraints (MCG) (see [4])
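We are given a ground set $U$ and a collection $\mathcal{S} = \{S_1, \ldots, S_m\}$ of subsets of $U$. We are also given sets $\mathcal{G}_1, \ldots, \mathcal{G}_\ell$, each $\mathcal{G}_i \subseteq \mathcal{S}$, with $\mathcal{G}_i \cap \mathcal{G}_j = \emptyset$ for $i \neq j$, and integer bounds $k, k_1, \ldots, k_\ell$. A solution is a subset $H \subseteq \mathcal{S}$ such that $|H| \leq k$ and $|H \cap \mathcal{G}_i| \leq k_i$, for $i = 1, \ldots, \ell$. The goal is to find a solution $H$ such that $|\bigcup_{S_j \in H} S_j|$ is maximized.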
Before showing how MCG easily fits into our scenario, let us mention that the MCG problem itself was introduced and studied in [4] since it represents a useful generalization of several combinatorial optimization problems, like the multiple depot traveling repairmen problem with covering constraints [5] and the orienteering problem with time windows [6–8].
Given an instance of MCG, consider the digraph with vertex set There is an edge from to each . Moreover, there is a complete bipartite graph between and (with orientation of the edges going from the 's to the 's). Finally, there is a complete bipartite graph between the set and the set , for and, in case , there is a complete bipartite graph between and . As before, the family is defined as consisting of subsets of vertices , for each . Figure 2 below depicts the situation.
Again, it is not hard to see that MCG is equivalent to the Maximum Spider Problem in the graph $G$. At this point it should be clear that by varying the structure of the graph between the vertex $s$ and the family of subsets $D$, one can describe many more covering problems.
Just as the Maximum Spider Problem encompasses a variety of coverage problems formulated in terms of maximization of the objective function, the related Spider Cover minimization problem includes, as particular cases, variants and extensions of the well-known Set Cover Problem. One such extension was considered in [4, 9, 10].
Set Cover with Group Budget (SCG)
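We are given a ground set $U$ and a family $\mathcal{S}$ of subsets of $U$. The family $\mathcal{S}$ is partitioned into subfamilies $\mathcal{G}_1, \ldots, \mathcal{G}_\ell$. The goal is to find an $H \subseteq \mathcal{S}$ such that all elements of $U$ are covered by sets in $H$, and $\max_i |H \cap \mathcal{G}_i|$ is minimized.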
Elkin and Kortsarz [9] studied the SCG problem as a preliminary tool for their multicasting algorithm in synchronous directed networks. Gargano et al. [10] studied the SCG problem in the context of multicasting in optical networks. Interestingly, Gargano et al. [10] also noticed that SCG naturally arises in airline scheduling problems [11]. We trust that the experienced reader can now appreciate the flexibility of our approach by checking that SCG is equivalent to the Spider Cover Problem in the graph shown in Figure 3. The family to cover is $D = \{D_u : u \in U\}$, where for each $u \in U$ we have $D_u = \{S \in \mathcal{S} : u \in S\}$.
In general, we expect the capability of our approach to easily describe and deal with diverse requirements in covering problems to be quite useful. In any case, it seems to provide a nice and unified view of many different questions.
1.2. Our Results in Comparison with Previous Work
To the best of our knowledge, the Maximum Spider and the Spider Cover Problems have not been considered before, apart from the different special cases mentioned in the previous section. Our results are the following.
(1) We show that the greedy approach yields a 2-approximation algorithm for the Maximum Spider Problem. (In this paper, approximation ratios for both maximization problems and minimization problems will be greater than 1.) It is remarkable that we achieve the same approximation ratio obtained in [4] for Maximum Coverage with Group Budget Constraints, although our Maximum Spider Problem is much more general. Since the Maximum Spider Problem contains the classical Maximum Coverage Problem as a particular case, from the results of [12] it follows that it is hard to approximate within any factor smaller than $e/(e-1)$, unless $NP \subseteq DTIME(n^{O(\log \log n)})$. In [4] it is additionally proved that the approximation factor 2 is tight for their problem in the oracle model. This tightness of analysis transfers to our Maximum Spider Problem.
(2) We give a greedy algorithm for the Spider Cover Problem with approximation ratio $O(\log |D|)$. Again, we match the results of [4, 9, 10], who obtained the same result in case the graph is the simple tree of Figure 3. Since the Spider Cover Problem includes the Set Cover Problem as a particular case, from [12] one gets a factor $(1 - \epsilon) \ln |D|$ for the hardness of its approximation, for any $\epsilon > 0$. We also observe that our algorithm for the Spider Cover Problem provides an $O(\log |D|)$-approximation algorithm for the Multicasting-to-Groups Problem considered in [10], extending the main result of the same paper from trees to general networks. The problem considered therein was to find a set of paths from a source node to at least one node in each subset of a set of groups, and an assignment of wavelengths to paths, so that paths sharing the same physical link of the network are assigned different wavelengths. The goal is to minimize the number of wavelengths.
It can be seen that the paths constituting the spiders covering the family $D$, and an assignment of different wavelengths to paths in different spiders, give an admissible solution to the Multicasting-to-Groups Problem in general optical networks.
2. A Greedy Algorithm for the Maximum Spider Problem
In this section we will present a greedy 2-approximation algorithm for the Maximum Spider Problem (MSP).
Given an instance $(G, s, D)$ of the MSP, where $G = (V, E)$ is a digraph, $s$ is a designated vertex in $V$, and $D$ is a family of subsets of $V$, we say that a subset of vertices $X \subseteq V$ is reachable if there exists a spider in $G$, with center in $s$, such that each node $x \in X$ is reached by such a spider. In other words, $X$ is reachable if there is a spider in $G$ whose set of terminals includes $X$. For any set $X \subseteq V$, not necessarily reachable, we define $f(X)$ as the number of elements in $D$ covered by $X$, that is,
$$f(X) = |\{D_i \in D : D_i \cap X \neq \emptyset\}|.$$
In terms of the function $f$, our original objective is essentially that of finding a reachable set $X$ of maximum value $f(X)$.
For any $X, Y \subseteq V$, we define the covering improvement of $Y$ over $X$ as
$$f_X(Y) = f(X \cup Y) - f(X).$$
Definition 2.1. Given a reachable set $X$ we say that: (1) a node $v \notin X$ improves on $X$ if $X \cup \{v\}$ is reachable; (2) a node $v$ maximally improves on $X$ if $f_X(\{v\}) = \max_u f_X(\{u\})$, where the maximum is taken over all nodes $u$ that improve on $X$; (3) the set $X$ is maximal if no node improves on $X$.
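As a small illustration of the coverage function and of the covering improvement, consider the following sketch. The names `coverage` and `improvement` and the toy family are assumptions introduced here for the example only.

```python
def coverage(X, family):
    """Number of subsets in `family` containing at least one node of X."""
    X = set(X)
    return sum(1 for D_i in family if X & set(D_i))


def improvement(X, Y, family):
    """Covering improvement of Y over X: coverage(X | Y) - coverage(X)."""
    return coverage(set(X) | set(Y), family) - coverage(X, family)


# Toy family of three subsets over the nodes a, b, c, d.
family = [{"a", "b"}, {"b", "c"}, {"d"}]
print(coverage({"b"}, family))            # 2
print(improvement({"b"}, {"d"}, family))  # 1
print(improvement({"b"}, {"a"}, family))  # 0
```

Node `a` brings no improvement over `{b}` because the only subset containing `a` is already covered by `b`.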
We can now describe the skeleton of our 2-approximation algorithm. (We point out that the algorithm could also stop as soon as it finds a first node maximally improving on $X$ whose covering improvement over $X$ is zero. However, we let MAX_SP generate a maximal set to make the analysis cleaner.)
In the rest of this section we will show how to efficiently implement step 2 of the above greedy algorithm and how to compute a spider centered at $s$ with set of terminals $X$, and we will also show that the number of sets in $D$ covered by the terminals in $X$ is at least half of the optimum number.
Let us first check that the algorithm is polynomial.
Lemma 2.2. The algorithm MAX_SP is polynomial.
Proof. In order to compute the node $v$ that maximally improves on $X$ we proceed as follows. First, for each $v \in V \setminus X$ we check whether $X \cup \{v\}$ is reachable, that is, whether there is a spider centered at $s$ and with set of terminals equal to $X \cup \{v\}$. This can be done by constructing a flow network (for undefined terminology about flows in networks, see for example [13]) from $G$, assigning the source node at $s$, connecting all nodes in $X \cup \{v\}$ to a sink node $t$, setting all flow capacities equal to 1, and verifying whether or not in this flow network there exists a flow of value $|X| + 1$. This entire procedure can clearly be performed in polynomial time. Subsequently, among all $v$'s for which $X \cup \{v\}$ is reachable, we compute the one that maximally improves on $X$ by evaluating the difference $f(X \cup \{v\}) - f(X)$. Finally, the spider that reaches the set $X$ output by the algorithm MAX_SP is computed from the executions of the maximum flow algorithm, and it consists of all the flow paths from $s$ to $X$ with assigned flow value equal to 1.
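The reachability test and the greedy loop described in this proof can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1 verbatim: the function names, the sink label `("sink",)`, the tie-breaking rule, and the choice of a plain BFS augmenting-path routine for the maximum flow are all assumptions made here. The sketch returns only the terminal set and does not extract the flow paths forming the spider.

```python
from collections import defaultdict, deque


def coverage(X, family):
    """f(X): number of subsets in `family` hit by at least one node of X."""
    return sum(1 for D_i in family if set(D_i) & set(X))


def max_flow_unit(arcs, s, t):
    """Value of a maximum s-t flow when every arc has capacity 1."""
    cap = defaultdict(int)   # residual capacities
    adj = defaultdict(set)   # residual adjacency (both directions)
    for u, v in arcs:
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        parent = {s: None}   # BFS tree in the residual graph
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:   # push one unit along the path found
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def is_reachable(arcs, s, X):
    """True if a spider centered at s has every node of X among its terminals."""
    t = ("sink",)   # assumed not to clash with a node of the digraph
    network = set(arcs) | {(x, t) for x in X}
    return max_flow_unit(network, s, t) == len(X)


def max_sp(arcs, s, family):
    """Greedy sketch: repeatedly add a node that maximally improves on X."""
    nodes = {v for arc in arcs for v in arc} - {s}
    X = set()
    while True:
        candidates = sorted((v for v in nodes - X
                             if is_reachable(arcs, s, X | {v})), key=repr)
        if not candidates:
            return X   # X is maximal
        X.add(max(candidates, key=lambda v: coverage(X | {v}, family)))


arcs = {("s", "a"), ("a", "b"), ("a", "c"), ("s", "d")}
family = [{"b"}, {"c"}, {"d"}]
X = max_sp(arcs, "s", family)
print(sorted(X), coverage(X, family))
```

In this toy digraph every path to `a`, `b`, or `c` uses the single arc from `s` to `a`, so at most one of them can be a terminal; the greedy set therefore covers two of the three subsets.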
In order to show that the algorithm MAX_SP is a 2-approximation algorithm for the Maximum Spider Problem, we first need the following technical result.
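Lemma 2.3. Let $(G, s, D)$ be an instance of the Maximum Spider Problem, and let $\mathcal{F}$ denote the family of reachable subsets of $V$. For any $X, Y \in \mathcal{F}$ with $|X| < |Y|$ there exists $y \in Y \setminus X$ such that the set $X \cup \{y\} \in \mathcal{F}$.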
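Proof. Consider two arbitrary sets $X, Y \in \mathcal{F}$ such that $|X| < |Y|$. Let $S_X$ denote a spider reaching $X$, and let $S_Y$ be a spider reaching $Y$. We will show that there exists a new spider $S$, with terminals $X \cup \{y\}$, where $y \in Y \setminus X$. Hence, we will get that $X \cup \{y\} \in \mathcal{F}$.
Starting from $G = (V, E)$, let us construct the flow network $N = (V', E')$ with
$$V' = V \cup \{t\}, \qquad E' = E \cup \{(v, t) : v \in X \cup Y\},$$
where $s$ is the source of the flow network, $t$ is the sink, and each arc has capacity 1.
The existence of the spider $S_X$ in $G$ centered in $s$ and reaching all nodes in $X$ implies the existence of a flow $\phi_X$ in $N$ such that
$$\phi_X(e) = \begin{cases} 1 & \text{if } e \text{ is an arc of } S_X \text{ or } e = (x, t) \text{ with } x \in X, \\ 0 & \text{otherwise.} \end{cases} \qquad (2.4)$$
The value of $\phi_X$ is $|X|$.
In the same way, the existence of the spider $S_Y$ in $G$ implies the existence of a flow $\phi_Y$ of value $|Y|$ in $N$. Since $|Y| > |X|$, we know that the maximum flow in $N$ is at least $|X| + 1$. Hence, the flow $\phi_X$ given in (2.4) can be augmented. Consider then the residual graph obtained starting from the initial flow $\phi_X$; it must contain an augmenting path $P$ from $s$ to $t$. Moreover, the path $P$ must use the arc $(y, t)$ for some $y \in Y \setminus X$ (since $N$ only contains the arcs $(v, t)$ for $v \in X \cup Y$, and the arcs $(x, t)$ with $x \in X$ are saturated by $\phi_X$). Consider then the augmented flow $\phi$ implied by $\phi_X$ and $P$. Since it modifies the values of $\phi_X$ only on arcs of $P$, we get that $\phi$ induces a set of arc-disjoint paths in $G$ from $s$ to the nodes in $X \cup \{y\}$. This gives the desired spider reaching $X \cup \{y\}$.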
We notice that the family $\mathcal{F}$ is hereditary, that is, any subset of a reachable set is reachable. This fact and Lemma 2.3 give the following.
Lemma 2.4. The pair $(V, \mathcal{F})$ forms a matroid.
However, the set system associated to our optimization problem is not , but it is , where ; which is hereditary but not a matroid.
Nonetheless, the fact that $(V, \mathcal{F})$ is a matroid is useful for us. Indeed, our coverage function $f$ is submodular, that is, for any $X, Y \subseteq V$ it holds
$$f(X) + f(Y) \geq f(X \cup Y) + f(X \cap Y).$$
Hence the Maximum Spider Problem corresponds to the maximization of the submodular function $f$ on the independent sets of the matroid $(V, \mathcal{F})$. By a well-known result of Nemhauser et al. [14] we have that the greedy algorithm MAX_SP given in Algorithm 1 returns a set $X$ such that
$$f(X) \geq \frac{1}{2} f(X^*),$$
where $X^*$ represents an optimal solution to the problem. Hence, we have proved the desired approximation result.

Theorem 2.5. The algorithm MAX_SP is a 2-approximation algorithm for the Maximum Spider Problem.
3. The Spider Cover Problem
In this section we will build on the results for the Maximum Spider Problem in order to design an $O(\log |D|)$-approximation algorithm for the Spider Cover Problem. Recall that in this latter problem we are given a digraph $G$, a vertex $s$, and a family $D$, and the goal is to cover all elements in $D$ by using the minimum number of spiders centered at $s$. Our first step will be to introduce a parametrized family of digraphs $G_k$ and reduce the problem of determining the minimum number of spiders in $G$ necessary to cover all elements of $D$ to the problem of determining the minimum value of $k$ for which $G_k$ contains a single spider covering all vertices in a designated subset of vertices of $G_k$. Subsequently, using iteratively the approximation algorithm MAX_SP on certain $G_k$'s, plus some additional constructions, will allow us to construct an approximation algorithm for the Spider Cover Problem.
3.1. Constructing the Digraph $G_k$
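Let $(G, s, D)$ be an instance of the Spider Cover Problem, with $G = (V, E)$, and let $k \geq 1$ be an integer. We first construct $k$ graphs $G^1, \ldots, G^k$ as follows: for any $v \in V \setminus \{s\}$ the vertex set of the $i$th digraph $G^i$ contains a corresponding vertex $v^i$, for $i = 1, \ldots, k$. Vertex $v^i$ will be called the $i$th copy of $v$ in the final digraph $G_k$. If the designated vertex $s$ is connected to $d$ vertices $w_1, \ldots, w_d$ in $G$, then each $G^i$ contains $d$ copies of $s$; let $s_{i,1}, \ldots, s_{i,d}$ be such copies, for $i = 1, \ldots, k$.
Now for the arcs in the $G^i$'s. For each arc $(u, v) \in E$, $u, v \neq s$, we insert a corresponding arc $(u^i, v^i)$ in $G^i$. We also insert in $G^i$ the arcs $(s_{i,j}, w_j^i)$, for $j = 1, \ldots, d$, where, we recall, $(s, w_j) \in E$.
For the final construction of $G_k$ we introduce new nodes $z_v$, for each $v \in V \setminus \{s\}$, and a special node $s'$. There are arcs between $s'$ and each $s_{i,j}$, and for each $v \in V \setminus \{s\}$ there is an arc from $v^i$ to $z_v$, for each $i = 1, \ldots, k$.
Formally, $G_k = (V_k, E_k)$ is a directed graph where
$$V_k = \{s'\} \cup \{s_{i,j}\} \cup \{v^i : v \in V \setminus \{s\}\} \cup \{z_v : v \in V \setminus \{s\}\},$$
$$E_k = \{(s', s_{i,j})\} \cup \{(s_{i,j}, w_j^i)\} \cup \{(u^i, v^i) : (u, v) \in E,\ u, v \neq s\} \cup \{(v^i, z_v) : v \in V \setminus \{s\}\},$$
with $i$ ranging over $1, \ldots, k$ and $j$ over $1, \ldots, d$.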
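One reading of this construction can be sketched in code: $k$ disjoint copies of the digraph, one private copy of the center per outgoing arc of the center and per copy, a new super-center joined to all center copies, and one collector node per original vertex receiving an arc from each of its copies. The function name `build_gk` and the labels `"s'"`, `("src", i, j)`, `("copy", v, i)`, and `("term", v)` are assumptions introduced here for illustration.

```python
def build_gk(arcs, s, k):
    """Return (arcs of G_k, super-center) for a digraph given as a set of arcs.
    All node labels of G_k are illustrative assumptions."""
    out_nbrs = sorted({v for (u, v) in arcs if u == s}, key=repr)
    nodes = {v for arc in arcs for v in arc} - {s}
    root = "s'"
    new_arcs = set()
    for i in range(1, k + 1):
        # copy every arc that does not touch the center
        for (u, v) in arcs:
            if u != s and v != s:
                new_arcs.add((("copy", u, i), ("copy", v, i)))
        # one copy of the center per outgoing arc of the center
        for j, w in enumerate(out_nbrs, start=1):
            new_arcs.add((root, ("src", i, j)))
            new_arcs.add((("src", i, j), ("copy", w, i)))
        # each copy of v feeds the single collector node of v
        for v in nodes:
            new_arcs.add((("copy", v, i), ("term", v)))
    return new_arcs, root


arcs = {("s", "a"), ("a", "b"), ("s", "b")}
arcs_k, root = build_gk(arcs, "s", k=2)
print(len(arcs_k))
```

Since collector nodes have no outgoing arcs, a path from the super-center stays inside a single copy, which is why the branches of one spider in $G_k$ split into $k$ groups, one per copy.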
An example of a digraph $G$ and the associated graph $G_k$ is presented in Figure 4. The relevance of the digraph $G_k$ to our questions is explained by the following two evident results.
[Figure 4, panels (a) and (b): a digraph $G$ and the associated digraph $G_k$.]
Lemma 3.1. Let $(G, s, D)$ be an instance of the Spider Cover Problem. There are $k$ spiders centered at $s$ in $G$ that altogether reach a set of nodes $X$ if and only if there exists a spider centered at the special node $s'$ in the digraph $G_k$ reaching the corresponding set of new nodes $\{z_v : v \in X\}$.
Notice that the $k$ spiders in $G$ can also be easily constructed from the “big” spider in $G_k$ and vice versa.
Given an instance $(G, s, D)$ of the Spider Cover Problem, let $D_k$ be the family of subsets of nodes of the digraph $G_k$ consisting of all subsets $\{z_v : v \in D_i\}$, for any $D_i \in D$.
Theorem 3.2. An instance $(G, s, D)$ of the Spider Cover Problem admits an optimal solution with $k$ spiders if and only if $k$ is the minimum integer for which an optimal solution of the Maximum Spider Problem on the instance $(G_k, s', D_k)$ consists of a spider covering all elements in the family of subsets $D_k$.
3.2. The Spider Cover Algorithm
Our spider cover algorithm SP_COV is presented in Algorithm 2. The algorithm consists of successive iterations, based on the algorithm MAX_SP. At each iteration a certain set of spiders is constructed in order to cover as many subsets in $D$ as possible. Namely, at each iteration, if $D'$ is the subfamily of subsets not covered yet, the algorithm seeks the minimum number $k$ for which the algorithm MAX_SP returns a spider in $G_k$ that covers at least half of the subsets in $D'$. The minimum number $k$ can be obtained by applying the algorithm MAX_SP in a binary search fashion over $k$. Thereafter, via Lemma 3.1, one obtains $k$ spiders in $G$ from the “big” spider in $G_k$.

The total number of used spiders will be the sum of the number of spiders used at each iteration.
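The iteration structure just described can be sketched as follows. This is an illustration under explicit assumptions, not the paper's Algorithm 2: `max_sp_k` is a hypothetical oracle standing in for a run of MAX_SP on $G_k$ that returns the original nodes reached, `k_max` is an assumed upper limit on $k$, and a linear scan over $k$ replaces the binary search for simplicity. The toy oracle models a digraph in which every spider has a single branch, so $k$ spiders reach $k$ nodes.

```python
from collections import Counter


def sp_cov(family, max_sp_k, k_max):
    """Count the spiders used: each round takes the least k whose spider
    in G_k covers at least half of the still-uncovered subsets."""
    uncovered = list(range(len(family)))
    total = 0
    while uncovered:
        for k in range(1, k_max + 1):
            reached = max_sp_k(k, [family[i] for i in uncovered])
            covered = [i for i in uncovered if family[i] & reached]
            if 2 * len(covered) >= len(uncovered):
                break
        else:
            raise ValueError("no k up to k_max covers half of the subsets")
        total += k
        uncovered = [i for i in uncovered if i not in covered]
    return total


def toy_oracle(k, subsets):
    """Hypothetical stand-in for MAX_SP on G_k when each spider of G has a
    single branch: greedily pick up to k nodes by covering improvement."""
    reached = set()
    for _ in range(k):
        rest = [D_i for D_i in subsets if not (D_i & reached)]
        if not rest:
            break
        counts = Counter(v for D_i in rest for v in D_i)
        reached.add(max(sorted(counts), key=counts.get))
    return reached


family = [{1}, {1, 2}, {2, 3}, {4}]
print(sp_cov(family, toy_oracle, k_max=4))  # 3
```

In this toy run each of the three rounds uses a single spider, and three single-branch spiders are also necessary, since the subsets `{1}`, `{2, 3}`, and `{4}` are pairwise disjoint.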
We now show that the number of spiders returned by the algorithm SP_COV is at most $O(\log |D|)$ times the optimal number of spiders necessary for the given instance of the Spider Cover Problem.
Theorem 3.3. The number of spiders returned by the algorithm SP_COV is $O(k^* \log |D|)$, where $k^*$ is the number of spiders in an optimal solution for the given instance of the problem.
Proof. Consider any iteration of the cycle, and let $D'$ be the family of subsets not covered yet. The algorithm computes the minimum integer $k$ such that MAX_SP outputs a spider covering at least $|D'|/2$ elements of the family $D'$. This means that the current size of the family of yet uncovered groups is decreased by at least half of its value during each iteration. Hence, the algorithm SP_COV consists of $O(\log |D|)$ iterations.
Moreover, at each iteration the minimum integer $k$ computed by the algorithm is upper-bounded by $k^*$. In fact, it is certain that in $G_{k^*}$ there exists a spider covering all $|D'|$ elements of $D'$, for any $D' \subseteq D$, and the algorithm MAX_SP is guaranteed to find a spider that covers at least $|D'|/2$ elements of $D'$.
We can then conclude that the total number of spiders used by SP_COV, which is the sum of all the values $k$ obtained at the various iterations, is upper-bounded by $O(k^* \log |D|)$.
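The counting in this proof can be spelled out explicitly. In the sketch below, the notation $D^{(j)}$ for the family of uncovered subsets at the start of iteration $j$, $k_j$ for the value computed at iteration $j$, and $t$ for the number of iterations is introduced here for illustration, and logarithms are assumed to be base 2.

```latex
\begin{align*}
|D^{(1)}| &= |D|, \qquad |D^{(j+1)}| \le \tfrac{1}{2}\,|D^{(j)}|
  \;\Longrightarrow\; |D^{(j)}| \le \frac{|D|}{2^{\,j-1}}, \\
|D^{(t)}| \ge 1 &\;\Longrightarrow\; 2^{\,t-1} \le |D|
  \;\Longrightarrow\; t \le \lfloor \log_2 |D| \rfloor + 1, \\
\sum_{j=1}^{t} k_j &\le t \cdot k^* \le \bigl(\lfloor \log_2 |D| \rfloor + 1\bigr)\, k^*
  = O(k^* \log |D|).
\end{align*}
```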
4. Final Comments
We have provided a general framework for covering problems and shown that several seemingly different problems naturally fit in our scenario. We have given approximation algorithms with best possible approximation ratios, under widely believed computational complexity assumptions. We would like to point out that we can easily extend our results to undirected graphs or to spiders defined as a collection of vertex disjoint paths sharing only a common vertex, using standard tricks.
In case the graph $G$ is undirected, we can consider the corresponding directed symmetric graph $G' = (V, E')$, where $E'$ contains the pair of arcs $(u, v)$ and $(v, u)$ if and only if $u$ and $v$ are neighbors in $G$. One must only be careful in the case in which one could get a spider containing both the opposite arcs, say $(u, v)$ and $(v, u)$, corresponding to one edge of $G$. However, if two branches of a spider are of the form $P_1, (u, v), Q_1$ and $P_2, (v, u), Q_2$, where $P_1$ ends at $u$ and $P_2$ ends at $v$, one can modify the spider so as to contain the branches $P_1, Q_2$ and $P_2, Q_1$ instead. This implies that we can always get spiders in $G$ with edge-disjoint branches. We can then apply the result of the present paper to the directed graph $G'$.
In case we are interested in spiders made of vertex-disjoint paths sharing a single vertex, we can obtain the same results as for arc-disjoint spiders by substituting in $G$ each node $v$ with a pair of nodes $v'$ and $v''$, connected by the arc $(v', v'')$. Moreover, each arc entering $v$ in $G$ now enters $v'$, and each arc leaving $v$ in $G$ now leaves $v''$.
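The node-splitting transformation can be sketched as follows. The function name `split_nodes` and the labels `(v, "in")` and `(v, "out")` are assumptions made here; the sketch also assumes that the center is left unsplit, since all branches are allowed to share it.

```python
def split_nodes(arcs, s):
    """Replace every node v other than the center s by the pair
    (v, "in") -> (v, "out"); arcs entering v now enter (v, "in") and
    arcs leaving v now leave (v, "out")."""
    nodes = {v for arc in arcs for v in arc}

    def tail(u):
        return u if u == s else (u, "out")

    def head(v):
        return v if v == s else (v, "in")

    new_arcs = {(tail(u), head(v)) for (u, v) in arcs}
    # the single internal arc lets at most one branch pass through v
    new_arcs |= {((v, "in"), (v, "out")) for v in nodes if v != s}
    return new_arcs


arcs = {("s", "a"), ("a", "b"), ("s", "b")}
print(sorted(split_nodes(arcs, "s"), key=repr))
```

Arc-disjoint paths in the transformed digraph use each internal arc at most once, so they correspond to paths of the original digraph that share no vertex other than the center.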