Abstract

Graphs are an important complex network model for describing the relationships among entities in real applications, including knowledge graphs, social networks, and traffic networks. The shortest path query is an important problem over graphs and has been well studied. This paper studies a special case of the shortest path problem: finding the shortest path that passes through a set of vertices specified by the user, which is NP-hard. Most existing methods calculate all permutations of the given vertices and then find the shortest one among these permutations. However, the computational cost is extremely high when the graph or the given set of vertices is large. In this paper, we first propose a novel exact heuristic algorithm based on best-first search and then give two optimizing techniques to improve its efficiency. Moreover, we propose an approximate heuristic algorithm that runs in polynomial time for this problem over large graphs. We prove that the ratio bound of our approximate algorithm is 3. We confirm the efficiency of our algorithms by extensive experiments on real-life datasets. The experimental results validate that our algorithms always outperform the existing methods even when the graph or the given set of vertices is large.

1. Introduction

Graphs are an important complex network model for describing the relationships among entities in real applications, including knowledge graphs, RDF graphs, linked data, social networks, biological networks, and traffic networks [1-4]. The shortest path query is a basic problem on the graph model. For example, in knowledge graphs, it finds the closest connection between two entities or concepts; in social networks, it finds the closest relationship, such as friendship, between two individuals; in traffic networks, it computes the shortest route between two locations.

Shortest path routing is an important problem in location-based services (LBS) and has been well studied in the past decades [5-7]. However, a special kind of shortest path query with vertex constraint is becoming increasingly important in real life. For instance, in knowledge graphs, a data miner may be interested in investigating the closest relationship between two entities connected by some specified entities or concepts. In traffic networks, carpooling has become a common business with the rapid development of the sharing economy. A driver may carry several colleagues on the way home from the company, and the colleagues get off at distinct locations. Thus a critical problem is how to find a route with the minimum length passing through these locations. In the above examples, both the knowledge graph and the traffic network can be modeled as a large graph $G$. The query of shortest path with vertex constraint can be defined as follows: given a starting vertex $s$, an ending vertex $t$, and a subset $V_c$ of the vertices of $G$, find a path with the minimum length among all the paths from $s$ to $t$ passing through every $v \in V_c$. The subset $V_c$ is called the vertex constraint; that is, the shortest path must pass through every vertex in $V_c$.

The above problem is a special case of the Generalized Traveling Salesman Path (GTSP) problem [8], which is known to be NP-hard. In the GTSP problem, all the vertices in the graph are partitioned into several categories. The objective is to find a path that visits at least one vertex of every category specified by the user. For example, a tourist plans to travel through three kinds of locations, e.g., a coffee shop, a gas station, and a bank. Because the tourist may have several choices for every location category, it is necessary to find an optimal route. The basic idea of most existing works on the GTSP problem is as follows. They first compute all permutations of the given categories. Each permutation represents a class of paths that visit the categories in the same order. Next, for every permutation, these methods enumerate all possible paths from source to destination by concatenating the subpaths between vertices in two successive categories. Finally, they find the optimal one among these paths. In our problem, every vertex in the vertex constraint represents a category of its own. Thus these methods need to calculate all the permutations of the vertices to be visited, which incurs a heavy computational cost. However, most of these permutations are unnecessary for computing the shortest path. Therefore, the main challenge is how to avoid computing unnecessary permutations when finding the shortest path with vertex constraint. In this paper, we propose a novel efficient algorithm based on best-first search to compute the shortest path with vertex constraint. The main idea of our method is to prune unnecessary permutations as early as possible. We also propose an approximate algorithm in polynomial time which is more efficient for large graphs.
The contributions of this paper are summarized below.
(i) We propose a novel and efficient exact heuristic algorithm with two optimizing techniques to find the shortest path with vertex constraint.
(ii) We also propose an approximate algorithm in polynomial time for our problem over large graphs. We prove that the ratio bound of our approximate algorithm is 3.
(iii) We conduct extensive experiments on several real-life datasets. We compare our algorithms with the state-of-the-art methods. The experimental results validate the efficiency and effectiveness of our algorithms.

The rest of this paper is organized as follows. Section 2 gives the problem statement. Section 3 introduces the CH technique for preprocessing graphs. Section 4 proposes the best-first searching algorithm with two optimizing techniques. Section 5 proposes the approximate algorithm and analyzes the ratio bound. The experimental results are presented in Section 6. The related work is in Section 7. Finally, we conclude this paper in Section 8.

2. Problem Statement

An undirected weighted graph is denoted as $G = (V, E, w)$ (or $G$ for short), where $V$ is the set of vertices and $E$ is the set of edges in $G$. $w$ is a function that assigns a nonnegative weight to every edge $(u, v) \in E$; i.e., $w(u, v) \ge 0$. Note that $(u, v)$ is equivalent to $(v, u)$ because $G$ is an undirected graph. The number of vertices (or edges) in $G$ is denoted as $n = |V|$ (or $m = |E|$). A path $p$ in $G$ is a sequence of vertices; i.e., $p = \langle v_1, v_2, \ldots, v_l \rangle$, where every $(v_i, v_{i+1})$ is an edge in $G$ for $1 \le i < l$. The weight of path $p$, denoted as $w(p)$, is the sum of the weights of all the edges in $p$; i.e., $w(p) = \sum_{i=1}^{l-1} w(v_i, v_{i+1})$. We say a path $p$ is simple if and only if there is no repeated vertex in $p$. The shortest path between $u$ and $v$ is a path with the minimum weight among all the paths between $u$ and $v$. For simplicity, in the following, we use $\delta(u, v)$ to denote the weight of the shortest path between $u$ and $v$ in $G$.

In this paper, we study the problem of finding the shortest path with vertex constraint. Table 1 summarizes the symbols in this paper. We first give the definition below.

Definition 1 (shortest path with vertex constraint). Given a graph $G$, a vertex subset $V_c \subseteq V$, a starting vertex $s$, and an ending vertex $t$ in $G$, a path $p$ is called the shortest path between $s$ and $t$ with vertex constraint of $V_c$, denoted as $p^*$, if it satisfies the following two conditions: (1) $p$ travels through all the vertices in $V_c$; i.e., $v \in p$ for every vertex $v \in V_c$; and (2) $p$ has the minimum weight among all the paths between $s$ and $t$ satisfying condition (1).
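The two conditions of Definition 1 can be checked mechanically for any candidate path. The following is a minimal sketch, assuming the graph is stored as a dictionary of dictionaries `adj[u][v] = weight`; the function names `path_weight` and `satisfies_constraint` are hypothetical and not part of the paper.

```python
def path_weight(adj, path):
    """Sum the edge weights along `path`; raises KeyError if an edge is missing."""
    return sum(adj[u][v] for u, v in zip(path, path[1:]))


def satisfies_constraint(path, s, t, constraint):
    """Check that `path` runs from s to t and visits every constrained vertex.

    The path need not be simple: repeated vertices are allowed.
    """
    return (
        bool(path)
        and path[0] == s
        and path[-1] == t
        and set(constraint) <= set(path)
    )


# A small undirected path graph a - b - c - d with weights 1, 2, 3.
adj = {
    "a": {"b": 1},
    "b": {"a": 1, "c": 2},
    "c": {"b": 2, "d": 3},
    "d": {"c": 3},
}
candidate = ["a", "b", "c", "d"]
print(path_weight(adj, candidate), satisfies_constraint(candidate, "a", "d", ["c"]))
```

Finding the path of minimum weight among all candidates that pass this check is the hard part of the problem; the check itself takes time linear in the path length.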

Figure 1 illustrates an example of the shortest path with vertex constraint. In this example, is and these vertices are colored with yellow in Figure 1(b). Two gray vertices, and , are the starting vertex and the ending vertex, respectively. Therefore, the shortest path between and with vertex constraint of is , which is shown as the green path in Figure 1(b).

The Hamiltonian path problem is a special case of our problem, so the following theorem follows directly.

Theorem 2. The problem of finding the shortest path with vertex constraint over graphs is NP-hard.

Proof. We prove it by a reduction from the Hamiltonian path problem, which is NP-complete. Given an undirected graph $G = (V, E)$, let $s$ and $t$ denote the starting vertex and the ending vertex, respectively. The weight of every edge in $G$ is set to one. The vertex subset is set to $V \setminus \{s, t\}$. Any path from $s$ to $t$ that visits all $|V|$ vertices uses at least $|V| - 1$ edges, and it uses exactly $|V| - 1$ edges only if no vertex is repeated. Therefore, there exists a Hamiltonian path from $s$ to $t$ in $G$ if and only if the shortest path from $s$ to $t$ with this vertex constraint has length $|V| - 1$. This reduction can be done in polynomial time. Therefore, the problem of finding the shortest path with vertex constraint over graphs is NP-hard.

3. CH Technique for Preprocessing Graphs

Contraction Hierarchies (CH) proposed in [9] is a well-known technique for speeding up the traditional shortest path query effectively. It essentially builds an index by maintaining the shortest paths for some pairs of vertices. In this paper, we use CH technique for preprocessing graphs to make our method more efficient.

Given a graph $G$, CH first sorts all vertices in an ascending order and then contracts the vertices one by one in this order. Contracting a vertex $v$ removes $v$ from the graph and adds new edges that represent the shortest paths between pairs of vertices adjacent to $v$. Such edges are called shortcut edges. Specifically, for each pair of an incoming edge $(u, v)$ and an outgoing edge $(v, x)$ of $v$, if $\langle u, v, x \rangle$ is the unique shortest path from $u$ to $x$, then a new shortcut edge $(u, x)$ is added with weight $w(u, v) + w(v, x)$ to obtain a new graph $G'$.
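The contraction step described above can be sketched as follows. This is an illustrative sketch only, not the implementation of [9]: it assumes an undirected graph stored as `adj[u][v] = weight`, and it decides whether a shortcut is needed with a plain Dijkstra "witness" search that avoids the contracted vertex. The names `dijkstra` and `contract` are hypothetical.

```python
import heapq


def dijkstra(adj, src, skip=None):
    """Shortest distances from src, ignoring the vertex `skip`."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, weight in adj[u].items():
            if v == skip:
                continue
            nd = d + weight
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def contract(adj, v):
    """Remove v from adj in place and return the shortcut edges that were added."""
    neighbors = list(adj[v].items())
    shortcuts = []
    for i, (u, wu) in enumerate(neighbors):
        witness = dijkstra(adj, u, skip=v)  # distances that avoid v
        for x, wx in neighbors[i + 1:]:
            via_v = wu + wx
            # A shortcut is needed only if every path avoiding v is strictly longer,
            # i.e. the path through v is the unique shortest path.
            if witness.get(x, float("inf")) > via_v:
                shortcuts.append((u, x, via_v))
    for u, _ in neighbors:
        del adj[u][v]
    del adj[v]
    for u, x, weight in shortcuts:
        adj[u][x] = adj[x][u] = weight
    return shortcuts


adj = {
    "a": {"v": 1, "c": 1},
    "b": {"v": 1, "c": 1},
    "c": {"a": 1, "b": 1},
    "d": {"v": 1},
    "v": {"a": 1, "b": 1, "d": 1},
}
print(sorted(contract(adj, "v")))
```

In this toy graph there is a second shortest path between `a` and `b` that avoids `v`, so no shortcut is added for that pair, while `d` can only be reached through `v`, so two shortcuts of weight 2 are created.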

We use an example in Figure 2 to illustrate the process of vertex contraction. Figure 2(a) shows a graph before the contraction of . Note that there are two shortest paths between and , which are and , respectively. Thus it is unnecessary to add the edge from to when removing . We also note that there is only one shortest path from to . Because this path goes through , a new edge from to can be constructed by removing . Similarly, a new edge from to also can be constructed. Both the weights of such two new edges are 2. The result graph after contraction of is shown in Figure 2(b).

After contracting all vertices, CH divides the resulting graph into an upward graph $G_\uparrow$ and a downward graph $G_\downarrow$. Shortest paths can then be calculated on $G_\uparrow$ and $G_\downarrow$. Given a starting vertex $s$ and an ending vertex $t$, a forward Dijkstra [10] search from $s$ and a backward Dijkstra search from $t$ are executed on $G_\uparrow$ and $G_\downarrow$, respectively. More details about the CH technique are given in [9].

4. Permutation-Expanding Algorithm

In this section, we propose an algorithm to find the shortest path with vertex constraint. We first introduce the definition of permutation expanding, which is the basis of our algorithm, and then we explain the algorithm PERMUTATION-EXPANDING. Two optimizing techniques are proposed in Section 4.3, and we analyze the time and space complexity of our algorithm in Section 4.4.

4.1. Permutation-Expanding

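Given a vertex subset $V_c$ on $G$ with $k = |V_c|$, a permutation $\pi$ of $V_c$ is a sequence $\langle v_1, v_2, \ldots, v_k \rangle$ of all vertices in $V_c$, where every $v_i \in V_c$ and $v_i \ne v_j$ for $i \ne j$. Obviously, there are $k!$ permutations for a given $V_c$. We use $v_i \prec v_j$ to denote that $v_i$ is before $v_j$ in $\pi$; a permutation is essentially an order of the vertices in $V_c$. We say a path $p$ from $s$ to $t$ is under a permutation $\pi$ if it satisfies the following two conditions: (1) $v \in p$ for every $v \in V_c$ and (2) there exists a subpath of $p$ from $v_i$ to $v_j$ if $v_i \prec v_j$ in $\pi$. Given a permutation $\pi$, a path $p$ under $\pi$ can be divided into $k + 1$ subpaths $p_0, p_1, \ldots, p_k$, where $v_i$ and $v_{i+1}$ are the starting vertex and the ending vertex of $p_i$, respectively, with $v_0 = s$ and $v_{k+1} = t$. Each $p_i$ is called a "segment" of $p$. We use $S(p)$ to denote the set of all the segments of $p$.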

In the example of Figure 1, is a path under permutation . Here, .

A path is called the shortest path between $s$ and $t$ under permutation $\pi$, denoted as $p_\pi$, if every segment of the path is a shortest path. Then we have the following theorem.

Theorem 3. Given an undirected graph $G$, a vertex subset $V_c$, a starting vertex $s$, and an ending vertex $t$ in $G$, the shortest path $p^*$ between $s$ and $t$ with vertex constraint of $V_c$ is exactly the $p_\pi$ with the minimum weight among all the permutations of $V_c$; i.e., $w(p^*) = \min_{\pi \in \Pi} w(p_\pi)$, where $\Pi$ is the set of all permutations of $V_c$.
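Theorem 3 directly yields the exhaustive baseline that the rest of this section tries to avoid: enumerate every permutation and sum the shortest distances of its segments. A minimal sketch, assuming the pairwise shortest distances are already available in a nested dictionary `dist[u][v]` (for example, computed by CH queries); the function name `brute_force` is hypothetical.

```python
from itertools import permutations


def brute_force(dist, s, t, constraint):
    """Return (weight, order) minimizing the summed segment distances over all permutations."""
    best = (float("inf"), None)
    for pi in permutations(constraint):
        stops = (s, *pi, t)
        weight = sum(dist[a][b] for a, b in zip(stops, stops[1:]))
        if weight < best[0]:
            best = (weight, pi)
    return best


# Four vertices on a line, so the shortest distance is the absolute difference.
dist = {x: {y: abs(x - y) for y in range(4)} for x in range(4)}
print(brute_force(dist, 0, 3, [2, 1]))
```

The loop runs once per permutation, so the cost grows factorially with the size of the vertex constraint, which is why pruning permutations matters.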

Proof. Let $p_{\pi^*}$ be the path with the minimum weight among the shortest paths under all permutations of $V_c$. Assume, for contradiction, that $p'$ is a path from $s$ to $t$ under a permutation $\pi'$ and that the weight of $p'$ is less than that of $p_{\pi^*}$. There are four cases.
(1) If $\pi'$ and $\pi^*$ are the same permutation and every segment of $p'$ is a shortest path, then $p'$ and $p_{\pi^*}$ have the same weight. This contradicts the assumption.
(2) If $\pi'$ and $\pi^*$ are the same permutation and not all of the segments of $p'$ are shortest paths, then the weight of $p'$ is not less than that of $p_{\pi^*}$. This contradicts the assumption.
(3) If $\pi'$ and $\pi^*$ are different permutations and every segment of $p'$ is a shortest path, then, because $p_{\pi^*}$ has the minimum weight among all the permutations, the weight of $p_{\pi^*}$ is not greater than that of $p'$. This contradicts the assumption.
(4) If $\pi'$ and $\pi^*$ are different permutations and not all of the segments of $p'$ are shortest paths, then the weight of $p'$ is not smaller than that of the shortest path under $\pi'$, which in turn is not smaller than that of $p_{\pi^*}$. This contradicts the assumption.
To sum up, $p_{\pi^*}$ is the shortest path between $s$ and $t$ with vertex constraint of $V_c$.

For two vertex subsets and on , if , for every permutation of , there must exist a permutation of , such that in if in . is a subpermutation of , denoted as . If , is also called a -permutation of vertex set . Specifically, is called a prefix of , denoted as , if and is at the beginning of . For example, is a prefix of .

Given a permutation , is an expanded permutation with one vertex from , where is the concatenation operator appending at the end of . Obviously, and . This process is called permutation expanding. For the example in Figure 1, given a permutation , and are two expanded permutations with one vertex and , respectively.

4.2. Main Algorithm

We propose an algorithm, PERMUTATION-EXPANDING, to find the shortest path with vertex constraint by expanding permutations incrementally. The main idea of the algorithm is a best-first search over the shortest paths under the 1-permutations up to the $k$-permutations of the vertex constraint, where $k$ is the size of the vertex constraint; the search stops as soon as the optimal path has been found.

The pseudocode of PERMUTATION-EXPANDING is shown in Algorithm 1. Algorithm 1 utilizes a min priority queue $Q$ to maintain a set of tuples $(\pi, d)$ (line 1). $\pi$ is a subpermutation of $V_c$. $d$ is the weight of the shortest path under $\pi$ from $s$ to the last vertex $u$ of $\pi$. If $\pi = \langle v \rangle$, then $d = \delta(s, v)$. Here $\delta(s, v)$ represents the weight of the shortest path without vertex constraint and can be easily calculated by the CH technique as discussed in Section 3. Initially, $Q$ contains all the 1-permutations of $V_c$ with their weights (lines 2-3). Algorithm 1 dequeues entries iteratively in ascending order of $d$. In each iteration, the entry $(\pi, d)$ with the minimum $d$ is dequeued from $Q$ (line 11). Let $V_\pi$ be the vertex set of $\pi$. If $V_\pi \ne V_c$, the algorithm generates every permutation $\pi \oplus v$ by appending every vertex $v \in V_c \setminus V_\pi$ at the end of $\pi$ and enqueues $(\pi \oplus v, d + \delta(u, v))$ into $Q$. Otherwise, $\pi$ is a permutation of $V_c$; Algorithm 1 generates $\pi \oplus t$ with weight $d + \delta(u, t)$ and enqueues it into $Q$ (lines 6-10). Algorithm 1 terminates when a permutation $\pi \oplus t$ is dequeued for the first time, where $\pi$ is a permutation of $V_c$ (line 5). At this moment, $d$ is the weight of the shortest path with vertex constraint of $V_c$, and we can obtain the path itself by the CH technique (line 12). There is a special case in which no path exists between $s$ (or $t$) and some $v \in V_c$. Algorithm 1 can detect this case when computing the shortest path between two vertices, and it then reports that the problem has no solution.

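Input: $G$, $V_c$, $s$, $t$.
Output: $p^*$.
// Input: $G$: an undirected weighted graph
// $V_c$: a vertex subset of $G$
// $s$, $t$: starting vertex and ending vertex respectively
// Output: $p^*$: the shortest path between $s$ and $t$ with vertex
// constraint of $V_c$
1: Let $Q$ be a min priority queue with entries in the form $(\pi, d)$, sorted in ascending order of $d$;
2: for each $v \in V_c$ do
3:   Enqueue an entry $(\langle v \rangle, \delta(s, v))$ into $Q$; // $\delta$: shortest distance in $G$
4: Dequeue the first entry $(\pi, d)$ from $Q$ and let $u$ be the last vertex of $\pi$;
5: while $u \ne t$ do
6:   if $|\pi| < |V_c|$ then
7:     for each $v \in V_c$ not in $\pi$ do
8:       Enqueue an entry $(\pi \oplus v, d + \delta(u, v))$ into $Q$;
9:   else
10:    Enqueue an entry $(\pi \oplus t, d + \delta(u, t))$ into $Q$;
11:  Dequeue the first entry $(\pi, d)$ from $Q$ and let $u$ be the last vertex of $\pi$;
12: Generate the shortest path $p^*$ between $s$ and $t$ under the permutation $\pi$;
13: return $p^*$;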
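The best-first search of Algorithm 1 can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' C++ implementation: the shortest distance is passed in as a callable `dist(u, v)` (standing in for a CH query), vertices are assumed to be mutually comparable so that ties in the heap can be broken, and a boolean flag marks entries that already end at the ending vertex instead of comparing the last vertex with it. The function name `permutation_expanding` is hypothetical.

```python
import heapq


def permutation_expanding(dist, s, t, constraint):
    """Best-first search over partial permutations.

    Returns (weight, order), where `order` is the visiting order of the
    constrained vertices on an optimal route from s to t.
    """
    constraint = tuple(constraint)
    k = len(constraint)
    if k == 0:
        return dist(s, t), ()
    # Heap entries are (d, pi, done): d is the weight of the shortest path
    # under pi from s to the last vertex of pi (or to t when done is True).
    heap = [(dist(s, v), (v,), False) for v in constraint]
    heapq.heapify(heap)
    while heap:
        d, pi, done = heapq.heappop(heap)
        if done:
            # First complete entry dequeued: weights are nonnegative,
            # so no other entry in the queue can lead to a shorter route.
            return d, pi
        u = pi[-1]
        if len(pi) < k:
            for v in constraint:
                if v not in pi:
                    heapq.heappush(heap, (d + dist(u, v), pi + (v,), False))
        else:
            heapq.heappush(heap, (d + dist(u, t), pi, True))
    return float("inf"), None


# Vertices on a line, so the shortest distance is the absolute difference.
print(permutation_expanding(lambda a, b: abs(a - b), 0, 3, [2, 1]))
```

If some constrained vertex is unreachable, `dist` would return infinity and the returned weight is infinite, which corresponds to the "no solution" case discussed above. The actual vertex sequence of the route can then be recovered by concatenating the shortest paths of the consecutive segments.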

Example 4. Given a graph shown in Figure 1(a), let , , and . Algorithm 1 first enqueues and into and then dequeues the first entry from . is enqueued into . Then the entry is dequeued from . is enqueued into . Then the entry is dequeued from . Due to where , is enqueued into . Then the entry is dequeued from . Due to the fact that the last vertex of is the ending vertex , where , Algorithm 1 returns as the shortest path with vertex constraint of .

4.3. Optimizing Techniques

We give two optimizing techniques to improve the efficiency of PERMUTATION-EXPANDING algorithm.

Cache Mechanism. Given two different permutations and , there may exist the overlapping segments for the shortest paths under and . The weights of these overlapping shortest subpaths are unnecessary to be calculated for many times during the permutation expanding. Cache Mechanism is utilized to maintain these values. For the example in Figure 1(a), and are the starting and ending vertices, respectively, and is the vertex constraint. Let and . Obviously, and are two permutations of . When calculating the shortest path between and for the first time, the distance between and is maintained and it only needs to be calculated once when and are both expanded in PERMUTATION-EXPANDING. The experimental results validate that Cache Mechanism can avoid redundant calculation effectively.
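The idea behind the Cache Mechanism can be illustrated with a memoized distance function. This is a minimal sketch, assuming the underlying shortest-distance routine is an arbitrary callable (in the paper it would be a CH query) and that vertices are mutually comparable so that an unordered pair can be normalized; the names `make_cached_distance` and `calls` are hypothetical.

```python
from functools import lru_cache


def make_cached_distance(shortest_distance):
    """Wrap a distance routine so each unordered vertex pair is computed once."""
    calls = {"count": 0}  # number of times the wrapped routine actually ran

    @lru_cache(maxsize=None)
    def _cached(u, v):
        calls["count"] += 1
        return shortest_distance(u, v)

    def dist(u, v):
        # The graph is undirected, so (u, v) and (v, u) share one cache entry.
        return _cached(u, v) if u <= v else _cached(v, u)

    return dist, calls


dist, calls = make_cached_distance(lambda a, b: abs(a - b))
dist(1, 5)
dist(5, 1)
print(calls["count"])
```

Two permutations that share a segment then trigger only one real shortest-path computation for that segment, however many times the segment is revisited during permutation expanding.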

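Permutation Filtering. When a permutation $\pi$ is dequeued from the priority queue $Q$ in an iteration, PERMUTATION-EXPANDING generates all expanded permutations $\pi \oplus v$ by appending every vertex $v$ of the vertex constraint $V_c$ that is not yet in $\pi$ at the end of $\pi$. Note that it is unnecessary to enqueue every $\pi \oplus v$ into $Q$ in this iteration. For two expanded permutations $\pi \oplus u$ and $\pi \oplus v$, $u \ne v$, if the shortest path under $\pi \oplus u$ between $s$ and $u$ is a subpath of the shortest path under $\pi \oplus v$ between $s$ and $v$, then permutation $\pi \oplus v$ can be filtered and it does not need to be enqueued into $Q$. The following theorem guarantees the correctness of permutation filtering.

Theorem 5. For two expanded permutations $\pi \oplus u$ and $\pi \oplus v$, $u \ne v$, if the shortest path under $\pi \oplus u$ from $s$ to $u$ is a subpath of the shortest path under $\pi \oplus v$ from $s$ to $v$, then for any permutation $\pi_v$ of $V_c$ that has $\pi \oplus v$ as a prefix, there exists a permutation $\pi_u$ of $V_c$ that has $\pi \oplus u$ as a prefix, such that the weight of the shortest path under $\pi_v$ from $s$ to $t$ is not less than the weight of the shortest path under $\pi_u$ from $s$ to $t$.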

Proof. Given a shortest path under from to , is obvious a prefix subpath of . Let denote the shortest path from to . Consider the path obtained by concatenating and , because is a subpath of , then we have , where represents the weight of path . Next, we consider the subpath of ; must go through . Let and represent the precursor and successor of in subpath . A new path can be obtained by utilizing the shortest path from to to replace the part in . It is obvious that . We concatenate , , and to get a path from to . Obviously, is a path under a permutation of and we have . Theorem 5 has been proved.

The conclusion of Theorem 5 is obvious. For the example in Figure 1, let , , and . We consider two permutations and , which are expanded from . The shortest paths under and are and , respectively. Because is a subpath of , does not need to be enqueued into in the iteration when is dequeued from . The reason is that all the paths under the permutations expanded from cannot be the shortest path with vertex constraint.

4.4. Complexity Analysis

In this section, we analyze the complexity of Algorithm 1. We first analyze the time complexity and then analyze the space complexity.

Time Complexity. Because Algorithm 1 may calculate the shortest path for every two vertices in in the worst case, it needs at most calculations for the shortest paths, where . For each shortest path calculation, CH runs in time where and . In addition, at most permutations of may be created and every permutation is maintained as a tuple which can be done in O time. Therefore, Algorithm 1 runs in time. It is worth noting that is always far less than in real applications.

Space Complexity. Algorithm 1 mainly needs to maintain the expanded permutations and expand at most permutations. Therefore, the space complexity of Algorithm 1 is .

5. Approximate-Path Algorithm

In this section, we propose an approximate algorithm APPROXIMATE-PATH to find the shortest path with vertex constraint in polynomial time. In the following, we first define query graph and then explain our approximate algorithm in detail. Next, we prove that the ratio bound of our approximate algorithm is 3. Finally, we analyze the time and space complexity of APPROXIMATE-PATH.

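Given a graph $G$, a vertex subset $V_c$, a starting vertex $s$, and an ending vertex $t$ in $G$, a query graph $G_q = (V_q, E_q, w_q)$ is a complete graph on $V_q$, where $V_q = V_c \cup \{s, t\}$ and $E_q = \{(u, v) \mid u, v \in V_q, u \ne v\}$. In $G_q$, the weight $w_q(u, v)$ of every edge $(u, v)$ is the shortest distance $\delta(u, v)$ between $u$ and $v$ in $G$. Here, $\delta(u, v)$ is the weight of the shortest path between $u$ and $v$ without vertex constraint in $G$.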
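Building the query graph amounts to one shortest-distance computation per pair of relevant vertices. The sketch below is illustrative only: it uses a plain Dijkstra search in place of the CH technique used in the paper, and it assumes the input graph is stored as `adj[u][v] = weight`. The names `dijkstra` and `build_query_graph` are hypothetical.

```python
import heapq


def dijkstra(adj, src):
    """Shortest distances from src to every reachable vertex."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, weight in adj[u].items():
            nd = d + weight
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def build_query_graph(adj, s, t, constraint):
    """Complete graph on {s, t} plus the constraint, weighted by shortest distances."""
    nodes = list(dict.fromkeys([s, *constraint, t]))  # keep order, drop duplicates
    query = {}
    for u in nodes:
        dist = dijkstra(adj, u)
        # Unreachable pairs get an infinite weight.
        query[u] = {v: dist.get(v, float("inf")) for v in nodes if v != u}
    return query


adj = {
    "a": {"b": 1},
    "b": {"a": 1, "c": 2},
    "c": {"b": 2, "d": 3},
    "d": {"c": 3},
}
print(build_query_graph(adj, "a", "d", ["c"]))
```

The query graph has only as many vertices as the constraint plus the two endpoints, so all later steps are independent of the size of the input graph.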

The following theorem indicates that we only need to find the shortest path with vertex constraint over .

Theorem 6. The weight of the shortest path between the starting vertex and the ending vertex with the given vertex constraint is the same in the input graph and in the query graph.

The main idea of APPROXIMATE-PATH is as follows. We first compute the minimum spanning tree $T$ of the query graph $G_q$ and then "adjust" some edges in $T$ such that $T$ is converted into a path satisfying the vertex constraint. The pseudocode of APPROXIMATE-PATH is shown in Algorithm 2. In Algorithm 2, the minimum spanning tree $T$ of $G_q$ is first generated in a similar way to Prim's algorithm [11] (lines 1-14). Next, Algorithm 2 executes a preorder traversal on $T$ and obtains a permutation $\pi$ corresponding to the order of vertices in this preorder traversal (line 15). Note that the ending vertex $t$ may not be the last one in $\pi$. In this case, $t$ is moved to the end of $\pi$ and we get a new permutation $\pi'$ (line 16). Finally, Algorithm 2 returns the shortest path under permutation $\pi'$ as a result (lines 17-18), which is an approximate solution for our problem.

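Input: $G$, $V_c$, $s$, $t$.
Output: $p$.
// Input: $G$: an undirected weighted graph
// $V_c$: a vertex subset of $G$
// $s$, $t$: starting vertex and ending vertex respectively
// Output: $p$: the approximate shortest path between $s$ and $t$
// with vertex constraint of $V_c$
1: Let $Q$ be a min priority queue with the entries in the form $(u, v, d)$, sorted in the ascending order of $d$, where $d = \delta(u, v)$ is the shortest distance between $u$ and $v$;
2: $V_q \leftarrow V_c \cup \{s, t\}$;
3: for each $v \in V_q \setminus \{s\}$ do
4:   Enqueue an entry $(s, v, \delta(s, v))$ into $Q$;
5: $V_T \leftarrow \{s\}$, $E_T \leftarrow \emptyset$;
6: while $V_T \ne V_q$ do
7:   Dequeue the first entry $(u, v, d)$ from $Q$;
8:   if $v \in V_T$ then
9:     continue;
10:  else
11:    $V_T \leftarrow V_T \cup \{v\}$, $E_T \leftarrow E_T \cup \{(u, v)\}$;
12:    for each $x \in V_q \setminus V_T$ do
13:      Enqueue an entry $(v, x, \delta(v, x))$ into $Q$;
14: $T \leftarrow (V_T, E_T)$;
15: Traverse $T$ by preorder and let $\pi$ be a permutation corresponding to the order of vertices in the preorder traversal on $T$;
16: Move the ending vertex $t$ to the end of $\pi$ to get $\pi'$;
17: Generate the shortest path $p$ between $s$ and $t$ under the permutation $\pi'$;
18: return $p$;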
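The steps of Algorithm 2 can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: the shortest distance is a callable `dist(u, v)` (standing in for a CH query or a lookup in the query graph), the tree is rooted at the starting vertex, and vertices are assumed to be mutually comparable so that ties in the heap can be broken. The function name `approximate_path` is hypothetical.

```python
import heapq


def approximate_path(dist, s, t, constraint):
    """Return (weight, order): a visiting order from s to t and its total weight."""
    nodes = list(dict.fromkeys([s, *constraint, t]))  # vertices of the query graph
    children = {v: [] for v in nodes}
    in_tree = {s}
    # Prim-style construction of a minimum spanning tree rooted at s.
    heap = [(dist(s, v), s, v) for v in nodes if v != s]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(nodes):
        d, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue
        in_tree.add(v)
        children[u].append(v)
        for x in nodes:
            if x not in in_tree:
                heapq.heappush(heap, (dist(v, x), v, x))
    # Preorder traversal of the tree.
    order, stack = [], [s]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    # Move the ending vertex to the end (or return to s for an STS query).
    if t != s:
        order.remove(t)
    order.append(t)
    weight = sum(dist(a, b) for a, b in zip(order, order[1:]))
    return weight, order


line = lambda a, b: abs(a - b)
print(approximate_path(line, 0, 1, [2, 3]))
```

The returned order lists the stops only; the full route is obtained by concatenating the shortest paths between consecutive stops. The whole procedure is polynomial in the number of stops, in contrast to the factorial number of permutations an exact search may face.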

Example 7. Figures 3(a) and 3(b) show the query graph and the minimum spanning tree of , respectively. Let be a permutation corresponding to the preorder traversal on shown in Figure 3(c). Then APPROXIMATE-PATH removes the ending vertex to the end of to get . The path between and under in is shown in Figure 3(d), and its weight is 12. The shortest path with vertex constraint for the input graph is shown in Figure 1(b) and its weight is 10.

Next, we prove that APPROXIMATE-PATH is a 3-approximation algorithm for shortest path problem with vertex constraint.

Theorem 8. APPROXIMATE-PATH is a 3-approximation algorithm for finding the shortest path with vertex constraint.

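Proof. Let $G_q$ be the query graph with vertex set $V_q$ and edge weights $w_q$, and let $p^*$ denote a shortest path with vertex constraint in $G_q$. Obviously, $p^*$ is a spanning tree of $G_q$. Therefore, the weight of the minimum spanning tree $T$ of $G_q$, computed by APPROXIMATE-PATH, provides a lower bound on the weight of $p^*$:

$$w(T) \le w(p^*). \tag{1}$$

The preorder traversal of $T$ is essentially a vertex permutation $\pi$ of $V_q$. Let $\pi = \langle v_1, v_2, \ldots, v_{|V_q|} \rangle$, where $v_i \in V_q$ for $1 \le i \le |V_q|$. We use $W$ to denote a path on $T$ under permutation $\pi$. Note that $W$ may not be a simple path and every edge in $T$ appears at most twice in $W$. Figure 3 gives an example in which some edges of $T$ appear twice in $W$. Because $W$ travels through every edge in $T$ at most twice, we have

$$w(W) \le 2\,w(T). \tag{2}$$

Based on inequalities (1) and (2), we have

$$w(W) \le 2\,w(p^*). \tag{3}$$

Because $G_q$ is a complete graph, we can generate a simple path $P_\pi$ on $G_q$ under permutation $\pi$. Note that if $\pi = \langle v_1, v_2, \ldots, v_{|V_q|} \rangle$, then $P_\pi = \langle v_1, v_2, \ldots, v_{|V_q|} \rangle$ and every $(v_i, v_{i+1})$ is an edge in $G_q$ for $1 \le i < |V_q|$. Additionally, the weight of every edge $(v_i, v_{i+1})$ in $G_q$ is equal to the weight of the shortest path between $v_i$ and $v_{i+1}$ in the input graph; thus, the weight of edge $(v_i, v_{i+1})$ cannot be larger than the weight of the subpath between $v_i$ and $v_{i+1}$ in $W$. It means

$$w(P_\pi) \le w(W). \tag{4}$$

Given the permutation $\pi$ of the preorder traversal of $T$, Algorithm 2 obtains another permutation $\pi'$ by moving the ending vertex $t$ to the end of $\pi$. For the last two vertices $u$ and $t$ of $\pi'$, if $(u, t)$ is an edge in $T$, its weight cannot exceed the weight of $T$. Otherwise, there must exist a simple path between $u$ and $t$ in $T$, and its weight cannot be less than the shortest distance between $u$ and $t$. Therefore, in both cases, $w_q(u, t) \le w(T)$, and then we have

$$w(P_{\pi'}) \le w(P_\pi) + w_q(u, t) \le 2\,w(p^*) + w(T) \le 3\,w(p^*). \tag{5}$$

Because $w(P_{\pi'})$ is exactly the weight of the approximate shortest path returned by Algorithm 2, the proof is completed.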

Complexity Analysis. We first analyze the time complexity for Algorithm 2. In order to construct the minimum spanning tree of , we utilize the CH technique to calculate the weight of shortest path between any two vertices in . It needs time, where , , and , then the time complexity of Algorithm 2 is . In order to construct the minimum spanning tree, Algorithm 2 needs to maintain the weight of shortest path for any two vertices in , then the space complexity of Algorithm 2 is .

6. Experiments

This section experimentally evaluates our algorithms against the current state-of-the-art methods. Section 6.1 explains the experimental settings. Section 6.2 presents the performance of algorithms.

6.1. Experimental Settings

All methods are implemented in C++ and tested on a Linux machine with an Intel(R) Core(TM) i7-4770K and 32GB RAM. We repeat each experiment 100 times and report the average result. If a method requires more than 24 hours or more than 32GB RAM to preprocess a dataset, we omit the method from the experiments on that dataset.

Datasets. We test 4 real road networks from the 9th DIMACS Implementation Challenge (http://www.dis.uniroma1.it/challenge9/index.shtml) and an email network (http://snap.stanford.edu/data/), as shown in Table 2. For each road network, each vertex represents a road junction and each edge represents a road segment. Table 2 describes the properties of the datasets: the number of vertices, the number of edges, and the average vertex degree. The full name of each road network is given in the description.

Query Set. In this paper, we investigate the query efficiency by varying the size of the vertex constraint, i.e., the number of vertices in the given vertex subset. We test 15 query sets, Q1 to Q15, where every query set is a set of queries with a fixed size of the vertex constraint. For each query set, we test 100 random queries and report the average querying time and space consumption as the results for that query set. Specifically, the sizes of the vertex constraint for Q1-Q5 are 4, 5, 6, 7, and 8, respectively, and the sizes for Q6-Q10 are 12, 14, 16, 18, and 20, respectively. The starting and ending vertices of every query are also selected at random. Q11-Q15 are generated as follows. We first randomly select 500 pairs of starting vertex and ending vertex and then calculate the distance for every pair. We sort these distances in ascending order and generate Q11-Q15 by dividing the pairs into five query sets. For example, Q11 contains the queries for the 100 pairs with the smallest distances, and so on. For each query, we randomly select six vertices as the vertex constraint; that is, its size is 6.

For a query, if the starting vertex and the ending vertex are the same, we call it a starting-to-starting query (STS query); otherwise, we call it a starting-to-ending query (STE query). In this paper, we present the experimental results of our algorithms for both STS queries and STE queries.

Compared Methods. For each experiment, we compare PERMUTATION-EXPANDING (PE) and APPROXIMATE-PATH (AP) against three algorithms: unidirectional Dijkstra Search (U.Dijkstra) [8], Level-Sweeping Search (LESS) [8], and the Nearest Neighbor Algorithm (ANN) [12]. We use the CH technique to preprocess the input graphs. The first two compared algorithms are exact algorithms and the last one is an approximation algorithm. The other methods are not included in our comparison for the following reasons. INC [13] computes a simple path, which contains no repeated vertex, whereas our problem does not require a simple path. P-LESS [8] is an optimization of LESS that mainly reduces the size of the search space, which typically grows in proportion to the density of each category; when each category contains only one vertex, P-LESS is equivalent to LESS.

6.2. Experimental Results

Exp-1. Query Efficiency. We investigate the impact of the size of the vertex constraint and show the experimental results of STE query in Figure 4(a). On each dataset, U.Dijkstra has the longest querying time for every query. PE outperforms LESS by large margins that depend on the size of the vertex constraint, and their maximum difference is close to two orders of magnitude. The reason is that LESS calculates all the permutations of the vertex constraint. In contrast, PE finds the shortest path with vertex constraint by expanding permutations incrementally, which avoids calculating unnecessary permutations. We can see that PE begins to degrade as the size of the graph increases. Despite this degradation, it requires no more than 3 seconds in the worst case (for Q5 on FLA).

For each dataset, AP has the lowest time cost among all algorithms on every query. Specifically, AP outperforms ANN by one order of magnitude. When the size of the vertex constraint is small, our exact algorithm PE takes less time than the approximate algorithm ANN, and AP answers these queries in subsecond time. Figure 4(a) also shows that the querying times of ANN and AP are insensitive to the size of the vertex constraint.

As shown in Figure 4(b), the query efficiency of STS query is similar to that of STE query. PE is better than the other exact algorithms, and AP has the lowest time cost among all algorithms on every query. For the same dataset and the same size of the vertex constraint, the querying time of STE query is less than that of STS query. The reason is that, given a starting vertex, PE performs a best-first search over the shortest paths under the 1-permutations up to the full permutations of the vertex constraint until the optimal one has been found. PE gradually expands the path, and finally each vertex in the vertex constraint is arranged according to its shortest distance from the starting vertex. However, an STS query eventually returns to the starting vertex, so it generates more permutations than an STE query, which increases the running time of the algorithm.

When the size of the vertex constraint becomes large (queries Q6-Q10), the runtime of the exact algorithms is too long, so we only compare the query efficiency of the approximate algorithms. Figure 5 shows the results of these queries. AP again outperforms ANN by an order of magnitude, and the querying time of AP does not exceed 2 seconds in the worst case for both STE query and STS query.

Q11-Q15 have the same size of vertex constraint, and their query times are shown in Figure 6. As the distance between the starting vertex and the ending vertex increases, the time required for the query does not increase. This shows that the query time is not related to the distance between the starting vertex and the ending vertex but only to the size of the vertex constraint and the scale of the graph. PE finds the shortest path with vertex constraint by expanding permutations incrementally, which avoids calculating unnecessary permutations. Moreover, AP can quickly give a solution to the problem by using the query graph. Therefore, AP and PE are more efficient than the other algorithms.

Figure 7 shows the space consumption of the algorithms on Q1-Q5. The space consumptions of STE query and STS query are nearly the same on every dataset. For every dataset, U.Dijkstra has the largest space consumption. PE has the smallest space consumption among all the exact algorithms, and ANN has the smallest space consumption among all algorithms. Because ANN only needs to calculate the shortest subpaths and does not save any intermediate results, it consumes less space than AP. Apart from ANN, our approximation algorithm AP has the least space consumption.

Exp-2. Effectiveness of Optimizing Techniques. For PE, we design two optimizing techniques, and their effectiveness is shown in Figure 8. The speedup ratio compares the query time with the optimizing techniques against the query time without them. We can see that the optimizing techniques greatly reduce the query time. Figure 8(a) shows the effectiveness of the optimizing techniques on STE query. The results show that the optimizing techniques make PE several times faster, depending on the size of the vertex constraint for each dataset. Except for COL, the speedup ratio increases with the size of the vertex constraint. COL has a larger diameter but a narrower width, which means that the traffic network is strip-shaped, so PE performs well even without any optimizing technique. In the extreme case in which the network degenerates into a line, PE also achieves its best performance without any optimizing technique. Of course, this kind of network is very rare in real life. Figure 8(b) shows the speedup ratio on STS query. Since an STS query needs to calculate more permutations than an STE query, the speedup ratio on STS query is relatively small.

Exp-3. Relative Error. The relative error is the difference between the weight of the approximate solution and the weight of the optimal solution, divided by the weight of the optimal solution. For every query in this group of experiments, we first use PE to calculate the optimal result and then use ANN and AP to calculate the approximate results. Figure 9 shows the relative errors of these two approximation algorithms on the different datasets. For the STE query, the relative errors on the two datasets NY and FLA are not much different. On BAY and COL, the relative errors of ANN are lower than those of AP. As the size of the vertex constraint increases, the relative errors of both algorithms gradually increase. On all datasets, the relative errors of AP do not exceed 25%. For the STS query, however, the relative errors are smaller than for the STE query, and those of AP do not exceed 15%. On FLA, the relative errors of AP are lower than those of ANN.
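The relative error can be computed as in the following minimal sketch. It assumes the usual definition, namely the excess weight of the approximate solution over the optimal one divided by the optimal weight; the function name and the sample weights are hypothetical.

```python
def relative_error(approx_weight, optimal_weight):
    """Relative error of an approximate path weight against the optimum.

    Assumes the standard definition (approx - optimal) / optimal and a
    strictly positive optimal weight.
    """
    if optimal_weight <= 0:
        raise ValueError("optimal weight must be positive")
    return (approx_weight - optimal_weight) / optimal_weight


# Hypothetical weights: an approximate path of weight 125 against an
# optimal path of weight 100 corresponds to a relative error of 25%.
err = relative_error(125.0, 100.0)
```

Under this definition, a ratio bound of 3 means the relative error of the approximate algorithm can never exceed 200%, so the observed errors of at most 25% are far below the worst-case guarantee.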

7. Related Work

In this section, we introduce existing works and categorize them as follows.

Traveling Salesman Problem (TSP). The traveling salesman problem is a classic problem in graph theory. Many algorithms have been proposed for it, including exact and approximate algorithms [14]. TSP can be transformed into a linear programming problem and solved by linear programming methods [15-17]. Dorigo [18] solves TSP using an ant colony algorithm. In this work, the ants of an artificial colony deposit pheromone on the edges of the graph. The shorter a path is, the more pheromone is deposited on it, so as time progresses the pheromone accumulates on shorter paths and the pheromone trail yields a shorter feasible solution of TSP. There are also approximate algorithms that can quickly give a good solution to TSP [19-21]. However, TSP is a special case of the problem studied in this paper. The methods for TSP cannot solve our problem when the vertex constraint does not contain every vertex of the graph. Additionally, these methods cannot be used for large graphs.

Generalized Traveling Salesman Problem (GTSP). The Generalized Traveling Salesman Problem is a variant of the classical Traveling Salesman Problem. It was first introduced in the late 1960s [22]. There are some exact algorithms to solve the GTSP [23-25]. A related variant is the Traveling Salesman Problem with Precedence Constraint (TSPPC): a salesman visits a set of cities, each exactly once, and eventually returns to the starting city; given the distances between cities, the goal is to find an optimal route that satisfies certain precedence constraints (for example, the salesman must have visited city 2 and city 3 before visiting city 1). Ascheuer et al. [26] propose a branch-and-cut algorithm to solve the asymmetric traveling salesman problem with precedence constraints. Moon et al. [27] and Wang et al. [28] solve the traveling salesman problem with constraints by a genetic algorithm and integer programming, respectively. The Hamiltonian path problem with precedence constraints, also known as the sequential ordering problem, is to find the shortest path between a specified starting point and a specified ending point that passes through every point exactly once and satisfies the sequence constraints. Karan et al. [29] propose a branch-and-bound algorithm to solve the sequential ordering problem. The existing algorithms for GTSP essentially enumerate every possible path and therefore cannot be applied to large graphs, whereas our algorithms scale to large graphs.

Trip Planning Query (TPQ). All vertices in a graph are divided into groups, each representing a category. A Trip Planning Query finds a minimum-cost route that contains at least one vertex from each given category. Li et al. [12] introduce four algorithms for answering TPQ; these algorithms achieve various approximation ratios expressed in terms of the number of categories and the maximum cardinality of any category. Our algorithm is a 3-approximation algorithm, and its ratio bound is lower than that of the algorithms in [12]. Rice et al. [8] present two exact algorithms to solve this problem. These algorithms search for the optimal path exhaustively, which adds many unnecessary calculations and greatly increases their running time. Hars et al. [13] propose a heuristic algorithm that follows the divide-and-conquer approach to compute a simple path passing through all vertices specified by the user. The original problem is divided into two subproblems, and the algorithm consists of two main steps: (1) for a given set of must-visit vertices and the corresponding visiting order, treat each pair of consecutive vertices as a subpath of the entire end-to-end path and calculate all candidate subpaths; (2) concatenate candidate subpaths, one for each pair of consecutive vertices, to establish a simple path from the starting vertex to the ending vertex. Since the path we are looking for is not required to be simple, this algorithm does not apply to our problem. Cao et al. [30] introduce algorithms for solving Keyword-aware Optimal Route (KOR) queries. A KOR query adds a cost constraint on top of the category constraint; that is, the optimal path returned should satisfy the user-specified cost budget. Shang et al. [31] propose and study the problem of dynamically monitoring the shortest path in a spatial network, with the aim of accelerating shortest path computation in a dynamic spatial network. Shang et al. [32] design an exact algorithm and an approximation algorithm to solve the Collective Travel Planning query problem. The query finds the lowest-cost route connecting multiple sources and a destination with a bounded number of meeting points.

8. Conclusion

To find the shortest path with vertex constraint, we propose an exact algorithm named PERMUTATION-EXPANDING and give two optimizing techniques to improve its efficiency. We also propose a polynomial-time approximate algorithm named APPROXIMATE-PATH for this problem over large graphs. We conduct extensive experiments on real-life datasets and compare our algorithms with the state-of-the-art methods. The experimental results validate that our algorithms always outperform the existing methods, even when the graph or the given set of vertices is large. In future work, we will study index techniques to facilitate the queries so that our algorithms become more time and space efficient on larger graphs.

Data Availability

The road network datasets used to support the findings of this study are included within the article. They can be downloaded from http://www.dis.uniroma1.it/challenge9/index.shtml.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the grants of the National Natural Science Foundation of China nos. 61402323, 61572353, and U1736103, the Opening Project of State Key Laboratory of Digital Publishing Technology, and the Australian Research Council Discovery Grant DP130103051.