Abstract
This paper extends the well-known CLP with one server to CLP with multiple identical servers. We propose the closest server orienting protocol (CSOP), under which every client connects to the closest server to itself via a shortest route on the given network. We abbreviate CLP under CSOP to CSOP CLP and show that CSOP CLP on a general network is equivalent to that on a forest and, further, to multiple CLPs on trees. The case of two identical servers is the focus of this paper. We first devise an improved parallel exact algorithm for CLP on a tree and then present a parallel exact algorithm, with a bounded worst-case running time, for CSOP CLP on a general network. Furthermore, we extend the idea of the parallel algorithm to the case of more servers to obtain an exact algorithm whose worst-case running time we bound. At the end of the paper, we first give an example to illustrate our algorithms and then conduct a series of numerical experiments to compare their running times.
1. Introduction
Caching has become an important tool to improve network performance and efficiency, reducing delays for every client and alleviating the overload on the server [1–4]. Initially, a large number of studies considered how to optimize cache performance [5–7], cache hierarchies [5], and cooperation among multiple web servers [8, 9]. Subsequently, the question of how to locate caches or proxies optimally in networks to alleviate the server load became more popular [2, 10–13]. The most common practice in the past was to place caches on the edges of networks, acting as the network browser and proxy or as part of cache hierarchies [1, 3–5]. Later, Danzig et al. [2] discovered that placing caches on the nodes of networks instead of on the edges greatly reduces overall network congestion. In this paper, we only discuss how to place caches on the nodes of networks.
The focus of placing caches in networks is how to enhance the effect and efficiency of caching as much as possible. This problem can be modeled as the cache location problem (abbreviated to CLP) or the proxy problem. Both of their initial models can essentially be reduced to the median problem [14, 15]. Throughout this paper, the parameters of interest are the number of network nodes, the number of network edges, the number of caches or proxies, and the height of the tree. Later, Abrams et al. [7] observed that almost all current cache products contain a transparent operation mode, called a transparent en-route cache (TERC). When using TERCs in networks with one server, all clients connect to the server and caches are placed on the routes from clients to the server. Heddaya and Mirdad [10] suggested making use of TERCs to balance load due to their manageability. Further, Krishnan et al. [11] proposed the cache location problem involving TERCs, studied the problem in several special networks, and presented polynomial-time exact algorithms. In the rest of this paper, all CLPs involve TERCs.
The known algorithms for the proxy problem also apply to CLP. For a linear network, Li et al. studied the proxy problem and presented an exact algorithm [12]. Later, Woeginger used the Monge property to obtain an algorithm with improved time complexity [16]. For a tree network, Li et al. devised an exact dynamic programming algorithm [13] and Chen et al. presented an improved algorithm [17]. For a general tree-of-rings network, Chen et al. showed an exact algorithm [18]. Moreover, a variety of objectives have been considered, such as the overall time, cost, and hop count. Du [19] and Jia et al. [20] studied the proxy problem with read-write operations. In [21], Liu and Yang considered the delay-constrained proxy problem. Given a general network, provided that all clients are intelligent and connect to the server via a shortest route on the network, we claim that CLP in this situation is equivalent to CLP on a tree and thus is solvable in polynomial time [11, 13, 17], since all the shortest routes between clients and the server form a shortest-paths tree rooted at the server node.
All past works on CLP considered a single server. This paper is the first to study CLP with multiple identical servers. In this model, every client connects to some server via a route and all caches lie on such routes. We further suppose that every client is intelligent; that is, it connects to the closest server to itself via a shortest route. This produces the closest server orienting protocol (abbreviated to CSOP). Under CSOP, all the shortest routes connecting clients and servers form a forest, each component of which is rooted at a server node. Therefore, CLP with multiple identical servers under CSOP on a general network is equivalent to that on a forest.
We abbreviate CLP with multiple identical servers to CLP and, further, CLP under CSOP to CSOP CLP. In this paper, we first propose an improved parallel exact algorithm for CLP on a tree, which reduces the running time of the algorithm in [17]. Based on the improved algorithm, we design a parallel exact algorithm for CSOP CLP on a general network and bound its running time both in general and in the worst case. Furthermore, we extend the algorithmic idea to CSOP CLP with more servers on a general network and obtain a parallel exact algorithm with corresponding bounds.
The rest of this paper is organized as follows. In Section 2, we define frequently used notation and CSOP CLP formally. In Section 3, we present some fundamental preliminaries, develop an improved algorithm for CLP on a tree, and then devise a parallel exact algorithm for CSOP CLP based on the improved algorithm. In Section 4, we define the multi-server CSOP CLP formally and devise an efficient parallel exact algorithm. In Section 5, we first give an example to illustrate our algorithms and then conduct a series of numerical experiments to compare their running times. In Section 6, we conclude this paper with some future research topics.
2. Problem Description
Let a graph represent a communication or computer network, with a node set and an edge set. Every node represents a processing or switching element and every edge represents a communication link [22]. Every node has a weight representing its demand amount, and every edge has a weight representing the cost per unit demand. For any pair of nodes, we consider the edge between them and the shortest path connecting them. The cost of a path is equal to the sum of the costs of all its edges.
Let two identical servers be allocated to a pair of nodes of the network in advance, and let a set denote the cache locations. Suppose that CSOP is in effect and each cache is a duplicate of the server. Given any set of cache locations, the cost a node pays per unit demand depends on the locations, and the cost it pays for its overall demand is this per-unit cost multiplied by its demand amount. The total cost is the sum, over all nodes, of the costs they pay for their overall demand.
The cache location problem with two identical servers under CSOP (i.e., CSOP CLP) aims to find cache locations in the network that minimize the total cost of all the nodes paying for their overall demand. In other words, the aim of CSOP CLP reduces to finding an optimal set of cache locations that minimizes this total cost.
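To make the cost model above concrete, the following minimal sketch computes the total cost on a rooted tree, where each client's route to its server is the unique tree path and the client pays only up to the first cache it meets. All names (`parent`, `edge_cost`, `weight`) are illustrative assumptions, not the paper's notation.

```python
def total_cost(parent, edge_cost, weight, server, caches):
    """Sum over clients of weight[v] times the cost of the path from v
    up to the first cache encountered (or to the server, if none)."""
    total = 0.0
    for v in weight:
        if v == server:
            continue
        cost, u = 0.0, v
        # walk toward the server until a cache (or the server) is reached
        while u != server and u not in caches:
            cost += edge_cost[u]   # cost of the edge (u, parent[u])
            u = parent[u]
        total += weight[v] * cost
    return total
```

On a tiny three-client example, placing a single cache on the branching node lowers the total, which is exactly the trade-off the optimization problem captures.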
3. A Parallel Exact Algorithm for CSOP CLP
In the scenario of CSOP CLP, every client knows the location of the closest server to itself and connects to it via a shortest route. If its service request encounters the closest cache on the route, it gets the information therein; otherwise, it gets the information from the server. Therefore, CSOP CLP can be viewed as the combination of two CLPs once the two servers are placed at predesignated locations of the network, one CLP for each server.
3.1. Preliminaries
Once the two servers are fixed at predesignated locations, it is certain that some nodes of the network are closer to one server and the other nodes are closer to the other. Let one set collect the nodes closer to the first server and another set collect the nodes closer to the second. Thus, Lemma 1 follows immediately. We can use Dijkstra's algorithm [23] to compute the two single-source shortest-paths trees, each with one server as the origin and spanning the corresponding node set, and without loss of generality we regard each of them as a rooted tree with its server as the root. Furthermore, for any node, the subtree rooted at that node in the corresponding tree is defined as usual.
Lemma 1. The sets of nodes closer to the two servers partition the node set of the network.
Specifically, when the network is a tree, consider the unique tree path connecting the two server nodes. Split the nodes on this path into those closer to the first server and those closer to the second, and for any node on the path, let a subset collect the nodes that reach the servers via it. We observe that every node routed through a path node closer to the first server belongs to the first node set, every node routed through a path node closer to the second server belongs to the second node set, and vice versa. So, Lemma 2 follows. By Lemmas 1 and 2, we can compute the two node sets by applying the depth-first search (DFS) procedure to the tree, which only takes linear time.
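On a general network, the partition of clients by closest server can be obtained with a multi-source run of Dijkstra's algorithm, growing all servers' shortest-paths trees at once. A hedged sketch, in which `adj` maps each node to `(neighbor, cost)` pairs and all names are illustrative:

```python
import heapq

def closest_server_partition(adj, servers):
    """Label every node with its closest server via multi-source Dijkstra."""
    dist, owner = {}, {}
    pq = [(0.0, s, s) for s in servers]   # (distance, node, owning server)
    heapq.heapify(pq)
    while pq:
        d, v, s = heapq.heappop(pq)
        if v in dist:                     # already settled with a shorter path
            continue
        dist[v], owner[v] = d, s
        for u, c in adj[v]:
            if u not in dist:
                heapq.heappush(pq, (d + c, u, s))
    return owner
```

The popped edges of each owner's region form exactly the rooted shortest-paths tree described above.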
Lemma 2. Each of the two node sets is exactly the union of the subsets routed through the path nodes closer to the corresponding server.
By Lemma 1, each cache is placed in exactly one of the two node sets. Denote the sets of cache locations on the two sides accordingly; they are disjoint and together form the full set of cache locations. Under CSOP, every node connects to its closer server, so the cost it pays for its overall demand is determined entirely within its own side. We can rewrite (2) accordingly.
Let the numbers of caches on the two sides be given; obviously, they sum to the total number of caches. Such a combination is called a cache allocation scheme (abbreviated to CAS), in which the two sides cannot be exchanged. Clearly, CSOP CLP admits one CAS for each way of splitting the caches between the two sides. For any CAS, CSOP CLP is composed of two subproblems, namely a CLP on each side. The minimum cost of CSOP CLP for a given CAS is equal to the sum of the minimum costs of the two one-sided CLPs, and the overall optimum results from the optimal CAS.
3.2. Preprocessing
In this subsection, we give a new method of transforming an arbitrary rooted tree into a binary tree. For any non-leaf node of a rooted tree, the subgraph composed of the edges between the node and all its children is called a star with that node as center. We process each star in the following way: (i) if the center has exactly one child, then we add a new child to it and set both the weight of the new node and the cost of the new edge to zero; (ii) if the center has exactly two children, then no work is needed; (iii) if the center has three or more children, then we delete the edges between the center and its children, add two new nodes as children of the center with zero weight and zero-cost edges, and attach roughly half of the original children to each new node via new edges carrying the original edge costs.
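The three rules above can be sketched as follows, under the assumption that the tree is given as a `children` map; the `_d` prefix for dummy nodes and all other names are our own illustrative conventions, not the paper's.

```python
def binarize(children, weight, edge_cost):
    """Recursively bisect every node's children so each node has at most
    two children; dummy nodes and their edges get weight/cost zero."""
    counter = [0]

    def dummy():
        counter[0] += 1
        d = "_d%d" % counter[0]
        weight[d] = 0.0      # rule: dummy nodes carry no demand
        edge_cost[d] = 0.0   # rule: edges to dummy nodes cost nothing
        children[d] = []
        return d

    def fix(v):
        kids = list(children.get(v, []))
        if len(kids) == 1:                       # rule (i): add a dummy sibling
            kids.append(dummy())
        elif len(kids) > 2:                      # rule (iii): bisect the star
            half = len(kids) // 2
            dl, dr = dummy(), dummy()
            children[dl], children[dr] = kids[:half], kids[half:]
            kids = [dl, dr]
        children[v] = kids
        for c in kids:                           # recurse top-down
            fix(c)

    roots = set(children) - {c for ks in children.values() for c in ks}
    for r in list(roots):
        fix(r)
```

Applied to a star with three children, this adds three dummy nodes (one group of two, one group of one plus its dummy sibling), matching the worst case analyzed in Theorem 3.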
We apply the above procedure recursively to every node top-down to obtain a binary tree. This idea can be described as algorithm BINY. Our method improves on the one proposed by Chen et al. [18]. Moreover, we analyze the performance of BINY below, whereas they provided no analysis of their algorithm [18].
Theorem 3. For each star of the tree, the subtree derived from transforming it by BINY has height logarithmic in the number of children and, in the worst case, the number of dummy nodes added is linear in the number of children.
Proof. The essence of BINY is to bisect all the children of a node recursively. At the final step, two dummy nodes are added if four children are bisected into two groups of two, and three dummy nodes are added if three children are bisected into one group of two and one group of one. So, BINY adds the most dummy nodes in the latter worst case. Since each bisection adds one level, the subtree derived from a star has height logarithmic in the number of children, and summing the dummy nodes over all bisections shows that the number added in the worst case is linear in the number of children.
Theorem 4. BINY can transform a rooted tree into a binary tree in time linear in the number of nodes, adding at most a linear number of dummy nodes; the height of the resulting tree exceeds the original height by at most a logarithmic factor.
Proof. Suppose that the tree has several stars, each with its own number of children. Obviously, these numbers of children sum to at most the number of nodes. By Theorem 3, BINY adds, for every star, a number of dummy nodes linear in its number of children in the worst case. So, the number of dummy nodes added by BINY over the whole tree is linear in the number of nodes.
Next, we discuss the height of the resulting binary tree. We construct a worst-case tree consisting of three-children stars chained one after another. In other words, for every star other than the bottom one, two children of the star are leaves and the other is the center of the next star. Since the subtree derived from transforming a three-children star has height 2, the height of the resulting tree in this case is twice the original height.
Therefore, we obtain the stated worst-case bounds and conclude that BINY transforms the tree into a binary tree in linear time.
Theorem 5. CLP on a general tree is equivalent to CLP on the binary tree obtained by BINY.
Proof. This theorem is equivalent to the proposition that no cache is located at a dummy node in any optimal solution to CLP on the binary tree. Suppose that a cache is located at a dummy node added by BINY in an optimal solution. Since all edges incident to dummy nodes have zero cost, the solution obtained by moving this cache from the dummy node to a suitable original node costs no more than the supposed optimum, which establishes the claim.
The binary tree obtained by applying Tamir's algorithm [15] to a general tree can have a larger height than the one obtained by applying BINY. In terms of the height of the binary tree, BINY is therefore superior to Tamir's algorithm. This helps to reduce the running time of algorithm SUB (Algorithm 1) shown in Section 3.3.

3.3. Algorithm for CLP on Trees
By Theorem 5, we only need to discuss CLP on binary trees. Let a binary tree with the server as the root node be given. For any node, the subtree rooted at it is defined as usual, with the subtrees rooted at its left and right children defined accordingly. We use the height of the tree to label its levels bottom-up. Let a table entry denote the minimum cost of the subproblem of CLP on a subtree when the closest cache above the subtree is located at a given ancestor and a given number of caches are placed inside the subtree. Similar to the idea of solving the proxy problem in [18], we propose our way of computing these entries and give a new proof, shown in Theorem 6.
Theorem 6. For each non-leaf node, each node on the path from it to the root, and each cache budget, the table entry satisfies the recurrence established in the proof below.
Proof. For any node, we need to consider whether a cache is located at it or not when discussing the subproblem of CLP on its subtree. (1) If a cache is located at the node, then the node pays nothing for its overall demand, so the cost of the subproblem is equal to the sum of the costs on the left and right subtrees. The node itself is now the closest cache above both subtrees. One cache of the budget is spent at the node, so the number of caches in the left subtree plus the number in the right subtree equals the budget minus one. The key work is to find the optimal split of the remaining caches between the two subtrees, which yields the first candidate cost. (2) If no cache is located at the node, then the node pays for its overall demand the product of its weight and the cost of the path to the closest cache above. That cache is also the closest cache above both child subtrees, and the number of caches in the left subtree plus the number in the right subtree equals the budget. The key work is again to find the optimal split, which together with the node's own cost yields the second candidate cost. The table entry is the minimum of the two candidates.
Obviously, the case of spending a cache from an empty budget is forbidden. From the definition of the table entries, the entry at the root with the full cache budget is just the minimum cost of CLP. Initially, we set the entries for each leaf node and each of its ancestors.
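The heart of the Theorem 6 recurrence is, for each node, a minimization over ways to split the cache budget between the two child subtrees, with or without a cache at the node itself. A small sketch with illustrative names (`left_cost[i]` stands for the optimal cost of the left subtree using `i` caches; `extra` is the node's own payment when it hosts no cache):

```python
def combine(left_cost, right_cost, q, cache_here, extra):
    """One candidate of the recurrence: split q caches between the two
    child subtrees and keep the cheapest split. If a cache sits at the
    current node, one cache of the budget is spent on it; otherwise the
    node itself pays `extra` for its own demand."""
    budget = q - 1 if cache_here else q
    if budget < 0:
        return float("inf")   # forbidden: no cache available to place here
    best = min(left_cost[i] + right_cost[budget - i] for i in range(budget + 1))
    return best if cache_here else best + extra
```

The table entry of Theorem 6 is then the minimum of `combine(..., cache_here=True, ...)` and `combine(..., cache_here=False, ...)`.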
We can first use the depth-first search (DFS) based algorithm in [24] to traverse the tree, by which we record the parent of every node (and thus the root-to-node paths step by step) and compute the cost of the path connecting any node and its ancestor. Based on Theorem 6 and the above discussion, we devise a bottom-up dynamic programming algorithm, which can be described as the parallel algorithm SUB by using the techniques in [25].
Theorem 7. Given any binary tree, SUB computes CLP on it in time polynomial in the number of nodes, the height, and the number of caches.
Proof. Step 0 uses the DFS-based algorithm in [24] to traverse the tree, which runs in linear time. In Step 1, for each level, the processor at every leaf node uses the algorithm in [24] to initialize its entries. In Step 2, for each level, the processor at every non-leaf node uses the algorithm in [24] to compute the path costs and then computes its entries by (6) for every ancestor and every cache budget. We can use the method in [15] to bound the practical running time of Step 2. Summing over the steps gives the claimed running time of SUB.
3.4. Algorithm for CSOP CLP on General Graphs
Based on (5) and the discussion there, we can solve CSOP CLP on a general graph by first computing, for any CAS, the minimum costs of the two one-sided CLPs and then determining an optimal CAS such that their sum is minimized. We observe that the output of SUB is just the required one-sided minimum cost when we apply SUB to the corresponding binary tree. This forms our algorithm for CSOP CLP on a general graph, described as algorithm GLOB (Algorithm 2).
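GLOB's final combination step can be sketched as follows, assuming `cost_A[q]` and `cost_B[q]` hold the per-side optima returned by SUB for each cache count (illustrative names, not the paper's notation):

```python
def glob_combine(cost_A, cost_B, p):
    """Try every cache allocation scheme (q caches on side A, p - q on
    side B) and return the best scheme together with its total cost."""
    best_q = min(range(p + 1), key=lambda q: cost_A[q] + cost_B[p - q])
    return best_q, p - best_q, cost_A[best_q] + cost_B[p - best_q]
```

Since the per-side tables are computed once and reused across all schemes, this final scan adds only linear overhead in the number of caches.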

Suppose the two sides have given numbers of nodes, which clearly sum to the total number of nodes. GLOB uses BINY to transform each of the two shortest-paths trees into a binary tree. From Theorem 4, each binary tree has at most a linear number of nodes, including the added dummy nodes, and a suitably bounded height. For any CAS, GLOB applies SUB to the CLP on each of the two binary trees. It follows from Theorem 7 that each application is bounded by the corresponding polynomial. Summing over all CASs bounds the running time of GLOB, and substituting the node counts yields the worst-case bound.
Theorem 8. Given an undirected graph with two server nodes, GLOB solves CSOP CLP on it in polynomial time, with the worst-case bound derived above.
4. Generalization
In this section, we discuss the cache location problem with multiple identical servers under CSOP (abbreviated to CSOP CLP) on an undirected graph. Let a collection of identical servers be given. For any set of cache locations, the cost a node pays per unit demand and the total cost of all the nodes paying for their overall demand are defined as before. The aim of CSOP CLP is to find cache locations in the graph that minimize this total cost; in essence, it aims to find an optimal set of cache locations minimizing the objective.
In CSOP CLP, every client connects to the closest server to itself via a shortest route and gets the information from the closest cache on the route or from the server. For each server, let a subset collect the nodes to which it is the closest server, let the single-source shortest-paths tree with that server as the origin span the subset, and let a subset of caches be those placed in the tree. Hence, we can rewrite the objective as a sum over the servers' trees.
Let the number of caches in each tree be given; these numbers sum to the total. Such a combination is denoted as a CAS. Thus, once the servers are placed at predesignated locations of the graph, CSOP CLP consists of one CLP per tree. For any CAS, CSOP CLP is composed of the corresponding subproblems, namely a CLP in each tree, and its minimum cost is the sum of the minimum costs of these subproblems, minimized over all CASs.
Based on (12), for any CAS, we can solve CSOP CLP on the graph by first computing the minimum cost of CLP in each tree and then determining an optimal CAS such that the sum of these costs is minimized. This idea can be described as algorithm EXTD.
Lemma 9. The number of CASs in CSOP CLP equals the number of ways of putting the caches, regarded as identical balls, into as many distinct boxes as there are servers' trees.
Proof. The problem of allocating the caches to the distinct subtrees can be reduced to the model of putting identical balls into distinct boxes. We first draw dots one by one in a line and then select some dots to place balls and the other dots to place baffles. The line is partitioned into sections (boxes) by these baffles together with two immaterial baffles at the two ends of the line. Counting the selections gives the number of ways to partition the line into boxes. Every way of partitioning the line produces a CAS. Therefore, the number of CASs is as claimed.
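Lemma 9's stars-and-bars count can be checked directly by enumerating all weak compositions of the caches among the servers' trees and comparing with the binomial coefficient. A sketch with illustrative names; as an arithmetic check, 20 caches over 5 trees give 10626 CASs, the figure reported in the experiments of Section 5.

```python
from math import comb

def compositions(p, k):
    """Enumerate all CASs: weak compositions of p caches into k ordered
    parts, one part per server's tree."""
    if k == 1:
        yield (p,)
        return
    for first in range(p + 1):
        for rest in compositions(p - first, k - 1):
            yield (first,) + rest

def num_cas(p, k):
    """Lemma 9's count via stars and bars: p balls into k distinct boxes."""
    return comb(p + k - 1, k - 1)
```

EXTD then evaluates the per-tree optima once and scans these compositions for the cheapest total.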
For any CAS, the combination of Theorems 7 and 4 bounds the running time of EXTD for solving all the CLPs in the trees. By Lemma 9, multiplying by the number of CASs bounds the total running time of EXTD. Since the tree sizes sum to the number of nodes, substituting these bounds yields the worst-case running time of EXTD.
Theorem 10. Given an undirected graph with multiple servers, EXTD solves CSOP CLP on it within the running time bounded above, including the stated worst case.
5. Numerical Experiments
5.1. An Illustrative Example
In this subsection, we first give an example to illustrate our algorithm GLOB for computing CSOP CLP. Considering that CSOP CLP on a general network can be reduced to that on a corresponding tree network, we select a tree network as our example for ease of illustration, shown in Figure 1. The example tree has 25 client nodes and two server nodes. The number on each client node represents its demand amount, and the number on every edge represents the cost one node pays per unit demand. For instance, one client node has a demand amount of 0.76, and the total cost it pays for its overall demand is this amount multiplied by the cost of its route.
First, we carry out some preparatory work. We identify the unique path of the tree connecting the two servers and determine which of its nodes are closer to each server. Based on Lemma 2, we use the DFS-based approach to obtain the two node sets, shown in the left subfigures of Figures 2 and 3, respectively. Both shortest-paths trees have height three. We apply BINY to transform the first into a binary tree shown in the right subfigure of Figure 2, where the eight dummy nodes added by BINY are labelled accordingly. Similarly, we obtain the second binary tree, shown in the right subfigure of Figure 3, where the nine dummy nodes are labelled. All the dummy nodes have weight zero and all the edges between a dummy node and its parent have cost zero. Both binary trees have height five.
Next, we use GLOB to solve CSOP CLP on the example. For any CAS, GLOB computes the minimum cost of the CLP on each side together with the corresponding set of cache locations. The data for the first setting are listed in Table 1; the optimal value of CSOP CLP is 33.2950, attained by the corresponding optimal set of cache locations. Similarly, the data output by GLOB for the second setting are listed in Table 2; the optimal value is 27.7770.
5.2. Comparison of Running Times
In this subsection, we conduct a large number of numerical experiments to compare the running times of our algorithms GLOB and EXTD. In view of the fact that CSOP CLP on a general graph can be reduced to multiple CLPs on binary trees, we select a series of complete binary trees as inputs for ease of comparison. All the binary trees are generated randomly and have almost the same number of nodes. We build a centralized parallel computer system (i.e., a star network with one central computer and five parallel computers) by connecting six identical PCs, each equipped with 2 GB RAM and an Intel Core i5 CPU and running the Windows 7 operating system. Our numerical experiments were carried out on this computer system.
For the two-server CSOP CLP, we consider different inputs of the tree size and the number of caches. All the binary trees we select have an odd number of nodes. For each setting, there are 100 different parameter combinations. All the running times of GLOB for the 100 combinations are depicted in Figure 4(a). For each combination, we record the average running time over the 100 runs. In Figure 4(b), the average times with the number of nodes fixed are depicted; in Figure 4(c), the average times with the number of caches fixed are depicted; and in Figure 4(d), the averages over all combinations are depicted.
Figure 4: (a) the number of nodes in each subtree; (b) the number of nodes; (c) the number of caches; (d) averages over all combinations.
For the multi-server CSOP CLP, we consider two tree sizes, one of which is 201 nodes, together with different inputs of the number of servers and the number of caches. All the binary trees we select have an odd number of nodes. For each setting, there are 10626 different CASs. All the running times of EXTD for the 10626 CASs are depicted in Figure 5(a). For each combination, we record the average running time over the 10626 CASs. In Figure 5(b), the average times with the number of servers fixed are depicted; in Figure 5(c), the average times with the number of caches fixed are depicted; and in Figure 5(d), the averages over all combinations are depicted.
Figure 5: (a) cache allocation schemes; (b) the number of servers; (c) the number of caches; (d) averages over all combinations.
Next, we present the running times of EXTD for different inputs of the tree size. For the first size, we consider six different combinations with one parameter fixed. In Figure 6(a), the corresponding average times are depicted. Also, we depict the average times over all combinations in Figure 6(b). Note that every complete binary tree has an odd number of nodes. For the second size, we consider four different combinations with one parameter fixed