Special Issue: Analysis and Applications of Location-Aware Big Complex Network Data
Research Article | Open Access
Shuang Wang, Yingchun Xu, Yinzhe Wang, Hezhi Liu, Qiaoqiao Zhang, Tiemin Ma, Shengnan Liu, Siyuan Zhang, Anliang Li, "Semantic-Aware Top-k Multirequest Optimal Route", Complexity, vol. 2019, Article ID 4047894, 15 pages, 2019. https://doi.org/10.1155/2019/4047894
Semantic-Aware Top-k Multirequest Optimal Route
Abstract
In recent years, research on location-based services has attracted considerable interest in both industry and academia because of its wide range of potential applications. One active topic is route planning on a point-of-interest (POI) network. We study top-k optimal route querying on large, general graphs whose edge weights may not satisfy the triangle inequality. The query seeks the top-k optimal routes from a given source that visit a set of vertices covering all the services the user needs. Existing POI query methods mainly focus on textual similarity and ignore the semantics of keywords in spatial objects and queries. To address this problem, this paper studies the semantic similarity of POI keyword search in route planning. Another problem is that most previous studies assume that a POI belongs to a single category and do not consider that a POI may provide several kinds of services, even within the same category. We therefore propose a novel top-k optimal route planning algorithm based on semantic perception (KORSP). In KORSP, we define a dominance relationship between two partially explored routes, which leads to a smaller search space, and we take into account both the semantic similarity of keywords and the number of services offered by a single POI. We use an efficient label indexing technique for shortest path queries to further improve efficiency. Finally, we perform an extensive experimental evaluation on multiple real-world graphs; the results indicate that the proposed methods perform well.
1. Introduction
In recent years, rapid advances in wireless communication, the Global Positioning System (GPS), and smart mobile devices have enabled many location-based services (LBS). Among them, one popular problem is path/route planning on a point-of-interest (POI) network [1, 2]. LBS users often want short routes that pass through multiple POIs; consequently, trip planning queries that find the shortest routes passing through user-specified categories have attracted considerable attention [3, 4]. Although the problem of computing an optimal route has been studied extensively and many efficient techniques have been developed over the past several decades, most past studies on route planning focused on origin-destination route planning and did not consider the user’s specific requirements.
Recently, some approaches have been proposed that find routes from the user’s queries. However, these approaches may return a longer route than one that would meet the user’s actual needs, because query keywords express what the user means rather than a string that must be matched literally. A major problem with the existing approaches is that they only output routes that perfectly match the given categories [5–8]. Take Figure 1 as an example: each object can be viewed as a POI that has a spatial location and additional keywords. Consider a user who wants to watch a film and issues a keyword query with her current location and the keyword film. A traditional spatial keyword query returns only the object whose keywords literally contain the query keyword. However, another object in the figure also meets the user’s requirement, since the user only wants to watch a film. To overcome this problem, we introduce flexible semantic matching based on POI keywords to find shorter routes. In addition, each POI contains many keywords and provides multiple services, so a single POI may cover several query keywords; otherwise, each POI would correspond to only one query keyword. For example, a POI called WanDa Plaza has keywords such as cinema, popcorn, and food. If a user is looking for places where she can eat something and see a movie, she issues a query with the keywords movie and food. This single POI may meet all of the user’s requirements; otherwise, she must visit a restaurant and a cinema separately. We can therefore shorten the route by reducing the number of POIs on it.
Existing approaches find a single shortest route, namely the optimal sequenced route, which leaves little flexibility in route planning and gives the user no choice. We therefore propose a top-k algorithm that offers more choices and satisfies users as far as possible. Moreover, compared with a keyword-ordered query, a keyword-unordered query is more flexible: we only need to consider the distance between a POI and the current point, provided that the user’s requirements are met. An unordered query also avoids forcing a distant POI to be chosen as the nearest neighbor, because the order of the keywords is no longer considered during the query process.
In this paper, solving the top-k route search problem faces three challenges. The first challenge is the larger search space of the query. Because we consider the semantic relation between queries and POIs rather than string matching, the number of candidate objects is larger than in existing approaches. This calls for effective methods that filter candidates and avoid exhaustive search. The second challenge is the strategy for extending a route. Many existing methods only consider the nearest neighbor of the current point, but a route extended through that neighbor may not become the final optimal result. In this paper, we consider not only the distance between the current object and its neighbors but also, taking each neighbor in turn as the current object, the distance to its own nearest neighbor. A route extended in this way has a higher probability of being the final optimal route. Here, we consider the semantic distance and the spatial distance simultaneously. To compute the distance cost efficiently, we use the 2-hop labeling technique [9–12]. The third challenge is the route refinement mechanism. The POIs in the final route found by our method may be redundant; that is, two or more POIs in one route may provide the same services, since our algorithm is a greedy approach. We therefore propose a refinement mechanism to further improve route quality.
The main contributions of this paper can be summarized as follows:
(1) We introduce semantic similarity into the route search query, which allows us to search for routes flexibly
(2) We propose the top-k optimal route based on semantic perception (KORSP), which finds all preferred routes related to the query keywords with semantic perception
(3) We propose a method to find the x-th nearest neighbor based on semantic perception
(4) We use real-world POI datasets to test the algorithm and demonstrate its advantages.
The remainder of this paper is organized as follows. In Section 2, we briefly review the related work. In Section 3, we formally state the problem. In Section 4, we introduce the KORSP algorithm and explain how to find the x-th nearest neighbor. The empirical performance study is presented in Section 5. The conclusion and future work are presented in Section 6.
2. Related Work
We review the related work in this section. Route planning is one of the hot topics in LBS [13, 14]. Algorithms for destination-oriented route planning can be divided into single-destination and multidestination route planning. For single-destination route planning, a number of algorithms, such as Dijkstra’s algorithm [15] and the algorithm in [16], have been proposed to find the shortest route between two locations. An increasing number of approaches to multidestination route planning have also been proposed [17–19], such as the Traveling Salesman Problem (TSP) [19] and TSP with Neighborhood (TSPN) [17]. All of the above are destination-oriented; requirement-oriented route planning is another kind of routing problem. Li et al. [18] proposed the Trip Planning Query (TPQ) together with a Nearest Neighbor Algorithm and a Minimum Distance Algorithm. The former repeatedly visits the POI nearest to the last visited POI, and the latter finds a “good” POI for each unvisited category and traverses these POIs in nearest-neighbor order. However, the results of these algorithms are not good, since the routes they find may be tortuous, that is, full of twists, turns, or bends. Based on TPQ, Ahmadi and Nascimento [20] studied Sequenced Group Trip Planning Queries (SGTPQs), which find a sequence of POIs belonging to the specified categories and minimize the total distance travelled by all groups of users. Sharifzadeh et al. [7] proposed a related query named Optimal Sequenced Route (OSR), which retrieves the shortest route from a given source via several locations of different categories in a particular order. Based on OSR, Liu et al. [21] proposed the top-k optimal sequenced route (KOSR) query, which finds the top-k optimal routes from a given source to a given destination that visit a number of POIs of specific categories in a particular order. However, the above works assume that a POI belongs to only one category.
In an urban area, a POI may not only belong to a category but also provide various services. A better routing approach should consider whether the provided services on the route satisfy user’s requests rather than the categories.
In addition, there is also a large body of research on POIs. POI recommendation is one of the hot topics, and it can supply better POIs for route planning. Tang et al. [22] proposed a personal POI recommendation method based on destination prediction. Li [23] proposed Personalized Influential Topic Search, or more succinctly PIT-Search, whose goal is to find how important topics and influential users can be better leveraged to meet a specific user’s information need.
Route planning has attracted a lot of attention. So far, most work still builds on user preferences or POI categories. In reality, however, a POI’s categories cannot sufficiently represent the services it provides. To better meet the user’s needs, the services provided by POIs must be considered when designing the algorithm. In this paper, we therefore design a multirequest route planning algorithm that takes POI services into account.
3. Problem Statement
We formalize the KORSP problem in this section. Frequently used notation is summarized in Table 2.
3.1. Some Definitions
In this section, we first define some terms used in this paper and then specify our research problem.
Definition 1 (graph). An undirected weighted graph $G = (V, E)$ consists of a vertex set $V$ and an edge set $E$, together with a set of edge weights that represent the distance between two POIs. The weight function takes an edge as input and returns its nonnegative cost; Figure 2 shows an example graph with its edge weights. Note that the edge weights can be arbitrary and may not satisfy the triangle inequality. The formulation also applies to directed weighted graphs, and the edge weights can represent distance, time, and so on.
Definition 2 (request). A request is a thing, a need, a requirement, or a service that the user wants. $Q$ denotes the collection of requests.
Definition 3 (POI). A POI (point of interest) represents a specific location on the map. It carries two kinds of information: spatial information and keywords. It is defined as follows:
$$p = (l, K),$$
where $p$ is the POI, $l$ is the POI’s spatial information, usually in the form of its latitude and longitude, and $K$ is the set of keywords.
It is worth mentioning that each POI may contain multiple keywords, from which the basic information of the POI can be obtained. For example, a POI named WanDa contains keywords such as movie and food. These keywords describe the basic characteristics of the POI. The keywords of a POI can be defined as follows:
$$K = \{k_1, k_2, \ldots, k_n\},$$
where $k_i$ is the $i$th keyword of the POI and $n$ is the total number of the POI’s keywords.
In addition, we need to consider the semantics of a keyword when querying POIs. For each keyword, we acquire its topics through Latent Dirichlet Allocation (LDA) [24] and set up a collection that holds these topics and every topic’s probability. This collection can be written as a pair $(T, P)$, where $T$ and $P$ are the set of topics and the topics’ probabilities, respectively.
It is worth mentioning that we use a probabilistic topic model to transform textual descriptions into semantic representations, which we then use to quantify the semantic correlation between textual descriptions. By applying a popular probabilistic topic model, LDA, we obtain a topic distribution for each object that describes the semantic correlation between the object’s keywords and a limited set of latent topics. Given a query and an object, their semantic similarity can be measured from their topic distributions.
Definition 4 (keyword similarity). Given a query keyword and a POI’s keyword, their similarity is calculated by an arbitrary function, such as the Wu and Palmer similarity or length [7, 15]. We assume the following relation for the similarity: if the POI’s keyword is relevant to the query keyword and the corresponding probability passes a threshold, the similarity takes one value; otherwise, it takes another. The value itself is calculated from the two topic probability distribution vectors that represent the query keyword and the POI’s keyword, respectively.
For example, each tuple in Table 3 is a topic distribution over five topics. Consider a user who wants to watch a movie and issues a keyword query with her current location and the keyword cinema. The value computed from the corresponding topic distributions is larger than 0.5, which determines the semantic similarity.
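The thresholded comparison of topic distributions can be illustrated with a minimal sketch. It rests on assumptions that the text does not confirm: cosine similarity stands in for the comparison function, the similarity is reported as 1 or 0, and the five-topic vectors are invented for illustration (they are not the values of Table 3). Only the 0.5 threshold is taken from the example above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two topic-probability vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_similarity(query_topics, poi_topics, threshold=0.5):
    """Assumed rule: 1 if the vectors are closer than the threshold, else 0."""
    return 1 if cosine(query_topics, poi_topics) > threshold else 0


# Hypothetical distributions over five topics (NOT the values of Table 3).
FILM = [0.70, 0.10, 0.10, 0.05, 0.05]
CINEMA = [0.60, 0.20, 0.10, 0.05, 0.05]
BANK = [0.05, 0.05, 0.10, 0.10, 0.70]
```

With these made-up vectors, film and cinema load on the same topic and are matched, while film and bank are not, which is the behavior a purely textual comparison would miss.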
Definition 5 (PRQ and QRP). PRQ returns the set of POIs whose services can meet a given query keyword of the user, and QRP returns the set of services provided by a given POI. Thus, given a request and a POI, PRQ of the request is a set of POIs and QRP of the POI is a set of services. Table 1 gives an example.
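As a small illustration, PRQ and QRP can be read as two lookups over the same POI-to-services table. The table contents and the function names `prq` and `qrp` below are hypothetical; they do not reproduce Table 1.

```python
# Hypothetical POI table: POI id -> set of services (not the data of Table 1).
SERVICES = {
    "p1": {"movie", "food"},
    "p2": {"food"},
    "p3": {"bank"},
}


def qrp(poi, services=SERVICES):
    """QRP: the set of services provided by one POI."""
    return set(services[poi])


def prq(request, services=SERVICES):
    """PRQ: the set of POIs whose services can meet one request."""
    return {poi for poi, offered in services.items() if request in offered}
```

Here `prq("food")` yields both `p1` and `p2`, while `qrp("p1")` shows that `p1` alone covers two requests.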



Definition 6 (route). A route is an ordered collection of POIs. It is defined as follows:
$$R = \langle p_1, p_2, \ldots, p_n \rangle,$$
where $p_i$ is the $i$th POI in the route and $n$ is the number of POIs in the route.
Each route also has its own keywords: because the route contains several POIs, it also contains the keywords of all of them, which can be expressed as follows:
$$K_R = \bigcup_{i=1}^{n} K_{p_i},$$
where $K_{p_i}$ is the keyword set of the $i$th POI in the route. In other words, the keywords of the route are the collection of the keywords of all the POIs in the route.
Definition 7 (route cost). Given a set of the user’s requests and a route $R = \langle p_1, \ldots, p_n \rangle$, the cost of the route is the spatial distance travelled from a given source $s$ through all the POIs of the route in order. We denote the cost of route $R$ as follows:
$$\mathrm{cost}(R) = d(s, p_1) + \sum_{i=1}^{n-1} d(p_i, p_{i+1}),$$
where $\mathrm{cost}(R)$ is the spatial distance of the route, $d(s, p_1)$ is the distance between the starting point and the first POI in the route, and $d(p_i, p_{i+1})$ is the distance from the $i$th POI to the $(i+1)$th POI in the route.
Definition 8 (route average cost). Because one POI may provide multiple services and satisfy multiple requests, we should consider the number of services in the route when extending a partial route. We therefore compute the average cost, that is, the route cost divided by the number of services, and extend the partial route with the least average cost. Given a route $R$ and a set of the user’s requests $Q$, we denote the average cost as follows:
$$\mathrm{avg}(R) = \frac{\mathrm{cost}(R)}{|\mathrm{QRP}(R) \cap Q|},$$
where $\mathrm{cost}(R)$ is the route cost and $|\mathrm{QRP}(R) \cap Q|$ is the number of requested services that the route provides.
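The two cost definitions can be sketched in a few lines. The distance table, the service sets, and the function names are hypothetical, and returning infinity for a route that covers no requested service is an assumption made here so that such a route is never preferred.

```python
def route_cost(source, route, dist):
    """Distance from the source through every POI of the route, in order."""
    stops = [source] + list(route)
    return sum(dist[a][b] for a, b in zip(stops, stops[1:]))


def average_cost(source, route, dist, services, requests):
    """Route cost divided by the number of requested services it covers."""
    covered = set()
    for poi in route:
        covered |= services[poi] & requests
    if not covered:
        return float("inf")  # assumption: never prefer a route covering nothing
    return route_cost(source, route, dist) / len(covered)


# Hypothetical data for illustration only.
DIST = {"s": {"a": 100, "b": 60}, "a": {"b": 120}, "b": {"a": 120}}
SERVICES = {"a": {"movie", "food"}, "b": {"bank"}}
REQUESTS = {"movie", "food", "bank"}
```

For the route s, a, b the cost is 100 + 120 = 220 and three requests are covered, so the average cost is 220 / 3. The one-stop route s, b has a lower average cost of 60, so it would be extended first.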
3.2. Two-Hop Labeling Technique
The POI map data are stored on disk. To answer user queries rapidly with few disk accesses and to speed up distance computation, we build an index HI, which is also stored on disk.
Table 4 shows the HI for the POI map in Figure 2. For each vertex $v$, 2-hop labeling maintains a label HI($v$). In particular, HI($v$) consists of a set of label entries, each pairing a vertex that is able to reach $v$ with the corresponding distance.

We note that it is NP-hard to construct a 2-hop labeling of minimum size that satisfies the cover property. Therefore, the existing methods [9–11, 25] are all heuristic and only approximate the minimal 2-hop labeling index. Alternatively, we could run an all-pairs shortest path algorithm to generate the index. Although this works, it requires an index size of $O(|V|^2)$, which is unacceptable for large graphs.
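To make the use of the index concrete, the following sketch shows the standard 2-hop distance query: the distance between two vertices is the minimum, over the hubs their labels share, of the two stored distances. The labels below are hand-built for a hypothetical three-vertex path graph (a to b with weight 1, b to c with weight 2) and are not the contents of Table 4; how the labels are constructed is outside the scope of the sketch.

```python
def two_hop_distance(label_u, label_v):
    """Shortest distance from two labels: minimum over their common hubs."""
    best = float("inf")
    for hub, d_u in label_u.items():
        d_v = label_v.get(hub)
        if d_v is not None:
            best = min(best, d_u + d_v)
    return best


# Hypothetical labels for the path graph a -1- b -2- c (hub -> distance).
HI = {
    "a": {"a": 0, "b": 1},
    "b": {"b": 0},
    "c": {"b": 2, "c": 0},
}
```

A query only touches the two labels involved, which is why it suits a disk-resident index better than repeated graph searches.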
4. Proposed Solutions
In this section, we present an effective method for solving KORSP. We describe a route planning method based on semantic perception that satisfies multiple requests of the user. The method defines a dominance relationship on candidate routes to filter them and thus reduce the search space. In addition, by combining it with an optimization technique, we can efficiently find the x-th nearest neighbor of the current vertex.
4.1. KORSP Algorithm
We first introduce the domination relationship. Among routes with the same starting point and the same end point, the route with the smaller average cost dominates the one with the larger average cost. For example, in Figure 2, consider two routes described by the pairs (2, 250) and (2, 220), where the first component is the number of requested services provided and the second is the distance cost. The two routes have the same destination and provide the same number of services, but the second has the smaller distance cost, so it is the dominating route and the first is the dominated route.
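Definition 9 (domination). Given a user’s request and two partially explored candidate routes $R_1$ and $R_2$, if $R_1$ and $R_2$ end at the same POI, provide the same requested services, and $R_1$ has the smaller cost, then $R_1$ dominates $R_2$.
Lemma 10. Given a KORSP query and two partially explored routes $R_1$ and $R_2$, if $R_1$ dominates $R_2$, then the optimal feasible route extended from $R_1$ costs no more than the optimal feasible route extended from $R_2$.
Proof. Suppose $R_1$ and $R_2$ end at the same POI $v$, and let $R_2^*$ be the optimal feasible route extended from $R_2$. The part of $R_2^*$ after $v$ must then be an optimal route from $v$ that covers the remaining requests. Because $R_1$ dominates $R_2$, the two routes provide the same services, so the same continuation can be appended to $R_1$ and the result is again feasible. Since the cost of $R_1$ is no larger than that of $R_2$, the extended route costs no more than $R_2^*$, and therefore neither does the optimal feasible route extended from $R_1$.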
According to Lemma 10, a dominated route need not be extended until the optimal feasible route extended from its dominating route has become one of the top-k optimal routes. Based on this dominance relationship, we put forward the KORSP method (Algorithm 1).
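The domination test itself is small enough to sketch directly. The record type `Partial`, its field names, and the choice to let the first route win ties on cost are assumptions made for illustration; the costs 220 and 250 echo the example given earlier in this section.

```python
from collections import namedtuple

# A partially explored route, reduced to the fields the check needs.
Partial = namedtuple("Partial", ["end", "services", "cost"])


def dominates(r1, r2):
    """True if r1 dominates r2: same end POI, same services, no larger cost.

    Ties on cost are resolved in favour of r1 (an assumption of this sketch).
    """
    return (r1.end == r2.end
            and r1.services == r2.services
            and r1.cost <= r2.cost)


a = Partial("v", frozenset({"movie", "food"}), 220)
b = Partial("v", frozenset({"movie", "food"}), 250)
c = Partial("v", frozenset({"movie", "bank"}), 200)
```

Route `a` dominates `b`, whereas `c` is cheaper but provides different services and therefore dominates neither of them.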
Algorithm 1: KORSP.
Input: Graph: G(V,E); Request: ; number of routes: K;  
Output: topk routes  
1 , initialize and ;  
2 Ψ←Ø;  
3 priority queue ;  
4 while R is not empty and do  
5 P=(,x)←R.extractMin();  
6  
7 if then  
8 Ψ←ΨU;  
9 for each i=1,…,q1 do  
10 if QRP()=QRP(.getValue())  
11 then  
12 P’=(,) ←.getValue().extractMin();  
13 R.insert(P’);  
14 .remove();  
15 else  
16 if QRP(p)=QRP () then  
17 .add;  
18 NN(,QPR(QQ’),1);  
19 R.insert((,1));  
20 else  
21 .add(,P);  
22 if q>0 then  
23 NN(,PRQ(RQRP()),x+1);  
24 R.insert((,x+1));  
25 return Ψ; 
To check the domination relationship and to store dominated routes, we maintain two hash tables of (key, value) pairs for each POI. The first table saves dominating routes: the key is the number of services meeting the user’s requests that are provided by a partially explored dominating route reaching the POI, and the value is the route itself. The second table saves dominated routes: the key is the number of services meeting the user’s requests that are provided by partially explored dominated routes reaching the POI, and the value is a priority queue of these routes ordered by average cost in ascending order. We also keep a result set Ψ for the top-k optimal routes and a global priority queue R of partially explored routes sorted by average cost in ascending order. In addition, each route in the queue carries an additional attribute x, meaning that its last POI is the x-th nearest neighbor of the preceding POI when the route was generated. Initially, only the source is added to the queue. Then we loop until the queue is empty or the top-k optimal routes have been found.
Pruning Dominated Routes. At each iteration, the algorithm selects the route P with the minimum average cost for examination. If P satisfies all of the user’s requests, we add it to Ψ and reconsider the routes it dominated (lines 5–14). Otherwise, we check whether P is dominated. If P is the first route with its set of services QRP(P) to reach its last vertex, we add P to the dominating table of that vertex and extend it via the vertex’s nearest neighbor (lines 14–17). Otherwise, if QRP(P) is already present in the dominating table of the vertex, another route with the same services and a smaller average cost has already been recorded and extended there, so P is dominated. According to Lemma 10, P need not be extended for now; we therefore add it to the dominated table of the vertex rather than to the priority queue (line 19). Then we generate a new candidate route from P. Because the last vertex of P was generated as the x-th nearest neighbor of its predecessor in an earlier iteration, we find the (x+1)-th nearest neighbor of that predecessor by invoking algorithm FindNN, create a candidate route with the incremented x, and insert it into the priority queue (lines 22–24).
Rethink Dominated Routes. After finding an optimal route, we need to reconsider the partial routes that have been explored and were dominated by subroutes of it, because these routes are now more likely to be extended into another optimal route. Therefore, for every POI on the optimal route, if the corresponding subroute dominates routes with the same QRP in the dominated table of that POI (line 10), we only consider the dominated route P’ with the least average cost, because the other dominated routes ending at that POI are dominated by P’. This also explains why the values in the dominated hash table are priority queues. Since the next nearest neighbor for P’ was already computed when it was dominated, we set its x to a value indicating that no further candidate route should be generated from it, and add it to the priority queue (lines 10–13). Meanwhile, we remove the subroute from the dominating table of the POI, so that the next candidate route can be extended (line 14).
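The bookkeeping described in the last three paragraphs can be sketched as one small class. This is a simplified illustration rather than the structure of Algorithm 1: the class name and methods are hypothetical, a single table keyed by (end POI, set of services) replaces the per-POI tables keyed by the number of services, and routes are plain tuples.

```python
import heapq
from collections import defaultdict


class DominanceTable:
    """Tracks, per (end POI, services) key, one dominating route and a
    min-heap of dominated routes ordered by average cost."""

    def __init__(self):
        self.dominating = {}                # key -> route currently being extended
        self.dominated = defaultdict(list)  # key -> heap of (avg_cost, route)

    def offer(self, end, services, avg_cost, route):
        """Return True if the route should be extended, False if it is parked."""
        key = (end, frozenset(services))
        if key not in self.dominating:
            # First arrival for this key; routes are assumed to arrive in
            # ascending order of average cost, so it dominates later ones.
            self.dominating[key] = route
            return True
        heapq.heappush(self.dominated[key], (avg_cost, route))
        return False

    def release(self, end, services):
        """After a result is found: drop the dominating route for this key and
        hand back the cheapest parked route, or None if there is none."""
        key = (end, frozenset(services))
        self.dominating.pop(key, None)
        if self.dominated[key]:
            return heapq.heappop(self.dominated[key])
        return None
```

Keeping the parked routes in a heap mirrors the remark that only the dominated route with the least average cost needs to be reconsidered.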
Example 11 (consider Figure 2). Table 5 shows the routes in the priority queue at each step for an example query. At step 1, the route containing only the source is added to the queue; it is then extended via the source’s nearest neighbor in C, the collection of POIs offering the unfinished services, and no other candidate route can be generated. At step 2, the extended route is examined: it is extended via the nearest neighbor of its last POI in C, and a candidate route is generated via a further nearest-neighbor lookup in C. The process continues in this way until the exit condition is met.

Finding the x-th Nearest Neighbor. Next, we explain how to find the x-th nearest neighbor, the core operation in KORSP.
Definition 12 (neighbor distance). Because the POI keywords and the query keywords differ semantically, we cannot choose the nearest neighbor by the actual distance alone. Instead, we calculate a neighbor distance that combines the semantic difference with the actual distance, and we choose the nearest neighbor according to it. The neighbor distance between the current vertex $v$ and a possible nearest neighbor $v'$ depends on the actual distance between them and on the number of keyword similarities, each between a keyword of $v'$ and a query keyword, that are not equal to 0. Given a route, when we extend it from the current node to one of that node’s neighbors, we can estimate the cost of the extended route from the cost of the route and the neighbor distance. Figure 2 provides an example.
A straightforward way to find the x-th nearest neighbor of a vertex in the collection C of POIs that offer the unfinished services is to use the 2-hop labeling technique rather than Dijkstra’s search: FindNN is invoked frequently, and frequent Dijkstra searches on large graphs are inefficient in practice. When the number of unfinished services is greater than 1, we proceed as follows (lines 3–7). We start from the current vertex and evaluate each candidate vertex with the equation in line 6, which calculates the average distance between two points, and each candidate’s average distance to the current vertex is stored in the queue N, which is sorted in ascending order (line 7). When the number of unfinished services is at most 1, we proceed as follows (lines 9–11). We directly take the actual distance between the current point and the possible nearest neighbor as the average distance (line 10) and store it in the queue N (line 11). Finally, we output the corresponding vertex as needed (lines 13–14).
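A minimal sketch of this lookup follows. It is not the listing of Algorithm 2: the exact averaging equation is not recoverable from the text, so the sketch assumes that the average distance is the actual distance divided by the number of unfinished services the candidate offers. The function name, the dictionary inputs, and the sample numbers are hypothetical, and the actual distances are taken as given (in the paper they would come from the 2-hop index).

```python
def find_nn(distances, matched, x, unfinished):
    """Return the x-th nearest candidate (1-based), or None if there is none.

    distances:  candidate -> actual distance from the current vertex
    matched:    candidate -> number of unfinished services it offers
    unfinished: number of services the route still has to cover
    """
    scored = []
    for cand, d in distances.items():
        n = matched.get(cand, 0)
        if n == 0:
            continue  # the candidate offers none of the unfinished services
        # Assumption: with several services left, average the distance per
        # matched service; with one left, use the actual distance.
        score = d / n if unfinished > 1 else d
        scored.append((score, cand))
    scored.sort()
    return scored[x - 1][1] if 1 <= x <= len(scored) else None


# Hypothetical candidates around the current vertex.
DISTANCES = {"a": 100, "b": 150, "c": 90}
MATCHED = {"a": 1, "b": 2, "c": 1}
```

With two services still unfinished, `b` ranks first because its 150 units are shared by two services (75 each), ahead of `c` at 90; with a single service left, the plain distance decides and `c` ranks first.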
4.2. Approximate KORSP
4.2.1. Domination Conditions
The approximate variant differs from KORSP in its dominating relationship, which no longer requires the partially explored routes to provide the same services. Next, we introduce the new dominating relationship in detail.
We reconsider the dominating relationship, because the original requirements are too strict: first, the partially explored routes must have the same end; second, they must provide the same number of services required by the user. For example, in Figure 2, the routes described by (2, 250) and (3, 250) have the same destination, but the numbers of services they provide are not equal; according to the original definition, they do not satisfy the dominating relationship. We now relax this restriction. The first route provides 2 services, one fewer than the second. For the first route to reach the same number of services as the second, namely 3, it would have to be extended by one more edge. We assume that the cost of the newly added edge is ave_weight, the average weight of all the edges in the graph:
$$\mathit{ave\_weight} = \frac{1}{m} \sum_{i=1}^{m} w_i,$$
where $m$ is the number of edges and $w_i$ is the $i$th edge’s weight.
After this addition, the route with three services has the smaller distance cost, so it is the dominating route, and the route with two services is dominated.
Definition 13 (optimize domination). Given a user’s request and two partially explored candidate routes $R_1$ and $R_2$, if $R_1$ and $R_2$ end at the same POI, the number of services of $R_2$ is less than or equal to that of $R_1$, and the cost of $R_1$ is no larger than the cost of $R_2$ plus $n$ average edge weights, then $R_1$ dominates $R_2$, where $n$ is the difference between the numbers of services of the two routes.
For example, if one route provides two services and the other provides four, we add two average edge weights to the cost of the first route as its estimated cost of reaching the same number of services. After that, we decide the dominating relationship.
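The relaxed test can be sketched as follows, under one reading of the definition: the route with fewer services is charged one average edge weight per missing service before the costs are compared. The record type, the field names, and the edge weights 30, 40, 50 are hypothetical; the pairs (3, 250) and (2, 250) come from the example above.

```python
from collections import namedtuple

Route = namedtuple("Route", ["end", "n_services", "cost"])


def average_edge_weight(weights):
    """ave_weight: the mean weight over all edges of the graph."""
    return sum(weights) / len(weights)


def relaxed_dominates(r1, r2, ave_weight):
    """True if r1 dominates r2 under the relaxed rule (one reading of it):
    same end POI, r2 has no more services than r1, and r1 costs no more than
    r2 after r2 is charged ave_weight for each service it is missing."""
    if r1.end != r2.end or r2.n_services > r1.n_services:
        return False
    gap = r1.n_services - r2.n_services
    return r1.cost <= r2.cost + gap * ave_weight


AVE = average_edge_weight([30, 40, 50])  # hypothetical edge weights
three = Route("v", 3, 250)
two = Route("v", 2, 250)
```

With an average edge weight of 40, the two-service route is estimated at 290, so the three-service route dominates it and not the other way round.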
Lemma 14. Given a query and two partially explored routes R_{1} and R_{2}, if , then , where and are the optimal feasible routes that are extended from and , respectively.
Proof. Suppose , , , and . and are the optimal route from R_{1} and R_{2}, respectively, where and have the same service and . According to Algorithm 2, and , so . Because of and , we can know that .
Algorithm 2: FindNN.
Input: vertex v; collection C of POIs with the required services; integer x.  
Output: the x-th nearest neighbor of v in C.  
1  
2 ;  
3 if then  
4 for in C  
5  
6 ;  
7 N.add(v′,average);  
8 else  
9 for v′ in C  
10 average =  
11 N.add(v′,average);  
12 N.ascending(average);  
13 if then  
14 return ; 
4.2.2. Priority Query
We introduce query priority in this section. By analyzing the travel routes of most users, we find that some services inherently have a higher priority than others. We keep these services and their priorities in the dictionary D. For example, a large amount of data shows that users first go to the bank to withdraw money and then spend it, so withdrawing money has a higher priority than consumption. Services without a priority relationship are considered to have the same priority. After the priorities are classified, services with higher priority are queried first, and services with the same priority are queried in random order. We call this kind of query a partially ordered query based on priority (POQP). To facilitate the classification of request priorities, we determine the priorities of services with an obvious priority relationship offline.
At each iteration, the algorithm classifies the priority of the user’s requests and queries the high-priority requests first. Algorithm 3 assigns priorities to the query keywords and stores the result in a set named pre_set (line 1). In the initial step, we check every keyword in Q. Some keywords have a priority relationship with others. If a keyword’s priority is the highest among these keywords, we set its priority level to 1; otherwise, we set it to 0 (lines 2–10). The other keywords have no priority relationship, so they are independent and have no effect on other keywords; we also set their priority level to 1 (line 12). After that, we add the keywords with priority level 1 to the result set and assign the result set to pre_set (lines 14–17). After the initial step, the algorithm uses the last finished service to find the services with the next lower priority and stores them in the result set (lines 19–28). Finally, the algorithm outputs the result set (line 29). Algorithm 3 is used in Algorithm 1: when the algorithm queries the nearest neighbor, Algorithm 3 reduces the number of candidate POIs.
Algorithm 3: Priority assignment for query keywords.
Input: Request: ; : the number of services;  
Priority dictionary: D(keyword, priority); initialize S=Q;  
Output: priority set  
1 n=1, , , t=0, , ;  
2 if ==N then  
3 for in Q do  
4 if (Q)∩. then  
5 HQ= (Q)∩.D  
6 t=max(HQ.priority)  
7 if .priority then  
8 .priority=1  
9 else  
10 .priority=0  
11 else  
12 .priority=1  
13 QH.add(,.priority)  
14 for in QH do  
15 if .priority==1 then  
16 set.add()  
17 pre_set←set;  
18 ;  
19 else  
20 r’=pre_setS;  
21 HQ=(Q r’)∩r’.D;  
22 if then  
23 t=max(HQ.priority)  
24 for r in HQ do  
25 if r.priority==t then  
26 S.add(r)  
27 pre_set←S;  
28 set←pre_set;  
29 return set; 
By classifying the queries according to their priority, we can avoid the possibility that the route formed is inconsistent with the actual situation, and at the same time we can reduce the number of candidate POIs.
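The effect of the priority classification can be sketched as a single function that returns the requests eligible to be served next. This condenses the idea rather than transcribing Algorithm 3: the function name and the sample dictionary are hypothetical, and a larger number is assumed to mean a higher priority, in line with the use of a maximum in the listing.

```python
def eligible_requests(requests, priority, finished=frozenset()):
    """Requests that may be served next.

    requests: all keywords of the query
    priority: dictionary D, keyword -> priority (larger = served earlier)
    finished: requests already satisfied by the partial route
    """
    pending = set(requests) - set(finished)
    ranked = {r for r in pending if r in priority}
    independent = pending - ranked  # no priority relation: always eligible
    if not ranked:
        return independent
    top = max(priority[r] for r in ranked)
    return independent | {r for r in ranked if priority[r] == top}


# Hypothetical dictionary: withdrawing money comes before spending it.
D = {"bank": 2, "shopping": 1}
Q = {"bank", "shopping", "movie"}
```

Before anything is finished, only `bank` and the independent `movie` are eligible; once `bank` is served, `shopping` becomes eligible. Restricting the nearest-neighbor search to POIs that offer eligible requests is what shrinks the candidate set.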
4.3. Route Replacement
We propose a route optimization mechanism, namely, route replacement, to check whether the route cost can be reduced.
After obtaining the final route produced by Algorithm 1, we apply a postprocessing mechanism named route replacement to refine it. In the example of Figure 2, the route between two consecutive POIs must pass through a third POI that provides the same services as one of the POIs already on the route, and going through this POI directly costs less than the detour via the POI it duplicates. We can therefore replace the existing POI with it and obtain a refined route R’ whose cost is smaller than that of R. When we refine the route, we store the points that satisfy the following requirements: (1) they have not been searched, (2) they must be passed by the refined route, and (3) they provide the same services as existing points in the refined route. In the refining process, we judge whether using one of these points to replace an existing point reduces the route cost. We prefer the replacing point that provides more services satisfying the user’s requests, so that the replacement not only reduces the cost but also makes the route more concise by reducing the number of points in it. Algorithm 4 describes the process of route replacement in detail.
Algorithm 4: Route replacement.
Input: route: R=; query: Q  
Output: route: R’  
1 ,  
2 for in R and do  
3 V.add(between(,));  
4 for in V do  
5 if QRP() then  
6 for in R do  
7 if QRP()∩Q=QRP()∩Q and cost()< cost()+cost() then  
8 R’=R.del();  
9 R’=R’.add(v);  
10 else continue;  
11 else continue;  
12 return R’; 
Algorithm 4 gives the pseudocode of route replacement. First, we define V, which stores candidate POIs (line 1); these POIs may be used to replace POIs in the route. For each pair of consecutive POIs in the route R, we look for the POIs between them and add these POIs to V (lines 2–3). Next, we check each POI v in V and keep those that can meet the user’s query (lines 4–5). For each such v, we look for a POI in the route that provides the same requested services as v and compare the route cost with v against the route cost with the existing POI. If the cost with v is smaller, the existing POI is replaced by v (lines 6–10).
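The replacement step can be sketched as follows. The sketch simplifies Algorithm 4 in ways that are assumptions: the candidate set is passed in directly instead of being collected from the POIs lying between consecutive stops, and the cost comparison is made on the two legs around the stop being replaced. All names and distances are hypothetical.

```python
def leg_cost(prev, poi, nxt, dist):
    """Cost of going prev -> poi -> nxt (nxt is None at the end of the route)."""
    cost = dist(prev, poi)
    if nxt is not None:
        cost += dist(poi, nxt)
    return cost


def replace_route(source, route, candidates, dist, services, requests):
    """Swap a stop for a candidate offering the same requested services
    whenever the swap shortens the legs around that stop."""
    refined = list(route)
    for i in range(len(refined)):
        prev = source if i == 0 else refined[i - 1]
        nxt = refined[i + 1] if i + 1 < len(refined) else None
        need = services[refined[i]] & requests
        for v in candidates:
            if v in refined:
                continue
            same = (services[v] & requests) == need
            if same and leg_cost(prev, v, nxt, dist) < leg_cost(prev, refined[i], nxt, dist):
                refined[i] = v
    return refined


# Hypothetical symmetric distances and services.
D = {
    frozenset(("s", "a")): 100, frozenset(("a", "c")): 120,
    frozenset(("s", "b")): 60, frozenset(("b", "c")): 80,
}


def dist(u, v):
    return D[frozenset((u, v))]


SERVICES = {"a": {"food"}, "b": {"food"}, "c": {"movie"}}
```

In this toy case, stop `a` (cost 100 + 120 around it) is replaced by `b` (60 + 80), which offers the same service, so the route s, a, c becomes s, b, c.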
5. Experimental Evaluation
5.1. Experimental Setup
5.1.1. Datasets
We use two real-world datasets from Zeng [26]. Singapore denotes the Foursquare check-in data collected in Singapore, and Austin denotes the Gowalla check-in data collected in Austin. Singapore has 189,306 check-ins, 5,412 locations, and 2,321 users. Austin has 201,525 check-ins, 6,176 locations, and 4,630 users. As suggested in [26, 27], we built an edge between two locations if they were visited on the same date by the same user. Locations not connected by any edge were ignored. We filled in the edge costs by querying the travel time in minutes from the Google Maps API under driving mode. The statistics of the datasets are shown in Table 6.

Both datasets were used in [26], which also studied a route planning problem. They are not small for the scenario of a daily trip in a city where the user has a limited cost budget: even with only 150 POIs to choose from, the number of possible routes consisting of 5 POIs reaches about 70 billion. By comparison, Jeffrey [28] evaluated itinerary recommendation methods on theme park data, where each park contains only 20 to 30 attractions.
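The figure of roughly 70 billion matches the number of ordered selections of 5 distinct POIs out of 150, assuming this is the count intended here:

```python
import math

# Ordered routes of 5 distinct POIs chosen from 150: 150 * 149 * 148 * 147 * 146
routes = math.perm(150, 5)
print(f"{routes:,}")  # 70,992,003,600
```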
5.1.2. Algorithms
We compared the following algorithms. PACER [8] models the personalized diversity requirement, retrieves POIs through indexes over the feature space and the route space, and prunes the search space with user preferences and constraints to return the optimal solution of the top-k route search problem. PruningKOSR [21] uses a dominance relationship to filter temporarily unnecessary routes. KORSP is our proposed optimal algorithm.
5.1.3. Queries
For each KORSP query, we randomly select a source and an integer, and then issue the query on all the graphs. Each experiment uses 50 random query instances and reports the average query time. If a query does not finish within 4200 seconds or fails with a memory overflow exception, we report its query time as INF.
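A minimal sketch of this measurement protocol is given below. The helper names, the `(source, param)` query shape, and the injectable `clock` are hypothetical. The sketch checks the time limit only after a query returns rather than interrupting it, and it assumes that a single INF query makes the reported average INF.

```python
import math
import random
import time

def random_queries(vertices, max_param, count=50, seed=0):
    """Draw `count` (source, integer) pairs uniformly at random."""
    rng = random.Random(seed)
    return [(rng.choice(vertices), rng.randint(1, max_param)) for _ in range(count)]

def average_query_time(run_query, queries, time_limit=4200.0, clock=time.perf_counter):
    """Average wall-clock time per query; INF on timeout or memory failure."""
    times = []
    for query in queries:
        start = clock()
        try:
            run_query(query)
            elapsed = clock() - start
        except MemoryError:
            elapsed = math.inf
        if elapsed > time_limit:
            elapsed = math.inf  # the query exceeded the budget
        times.append(elapsed)
    return sum(times) / len(times)
```

Passing the clock as a parameter keeps the harness deterministic under test; in a real run the default `time.perf_counter` would be used.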
5.1.4. Evaluation Criteria
We evaluate the performance of different methods in four different aspects: the query runtime, the number of examined routes (witnesses), the number of (next) nearest neighbor (shortened as kNN) queries executed by calling Algorithm FindNN, and the cost of the routes.
5.2. Experimental Results
We first evaluate the efficiency of the different algorithms in answering KOR queries under the default parameter setting on the two real graphs and then evaluate the impact of the parameters on the results.
5.2.1. Overall Performance under Default Parameter Settings
Figures 3–6 show the performance of the three algorithms on the two graphs. Figure 3 displays the runtime. Because all the algorithms reduce the search space, they return results on both graphs; all of them also answer shortest-path queries efficiently by using the 2-hop label index. Figures 4 and 5 show the number of examined routes and NN queries, respectively. The number of routes examined by KORSP is much smaller than that of PruningKOSR on both graphs, while its number of NN queries is larger than that of the other algorithms. This shows the importance of a rich candidate set: because of semantic matching, KORSP has more candidate POIs and can complete the route query while examining fewer routes, which makes it better than PruningKOSR. Figure 6 shows the cost of the routes. The route cost of KORSP is much smaller than that of the other algorithms, because semantic matching enlarges the candidate set with POIs that are closer.
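Figure 3: Runtime of the methods on (a) Singapore and (b) Austin.
Figure 4: Number of examined routes of the methods on (a) Singapore and (b) Austin.
Figure 5: Number of NN queries of the methods on (a) Singapore and (b) Austin.
Figure 6: Route cost of the methods on (a) Singapore and (b) Austin.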
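The 2-hop label index used by all three algorithms answers a distance query by scanning the hubs shared by the two endpoint labels [11]. The sketch below assumes an undirected graph whose labels have already been computed; the label contents are made up for a three-vertex path a - b - c with edge costs 2 and 3.

```python
import math

def two_hop_distance(labels, u, v):
    """Distance query on a 2-hop labeling.

    labels[x] maps each hub of x to the shortest distance between x and that hub.
    The answer is the minimum of d(u, h) + d(h, v) over the common hubs h.
    """
    lu, lv = labels[u], labels[v]
    if len(lu) > len(lv):
        lu, lv = lv, lu  # iterate over the smaller label
    best = math.inf
    for hub, d in lu.items():
        if hub in lv:
            best = min(best, d + lv[hub])
    return best

# Hypothetical labels: b is the hub covering every shortest path.
labels = {
    "a": {"a": 0, "b": 2},
    "b": {"b": 0},
    "c": {"c": 0, "b": 3},
}
```

With these labels, the query for a and c finds the single common hub b and returns 2 + 3 = 5 without traversing the graph.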
5.2.2. Effect of k
Figures 7–10 show the influence of the parameter k on the performance of the three methods on the two graphs. As shown in Figure 7, all three methods complete the queries within the specified time, and the advantage of KORSP becomes more obvious as k increases. Figures 8 and 9 show the number of examined routes and NN queries for the different methods on the two graphs. KORSP examines far fewer routes and issues far fewer NN queries, so it has a significant advantage over the other algorithms for all values of k. Figure 10 shows the influence of k on the routing cost: owing to semantic matching, the routing cost of KORSP is the lowest for every k.
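Figure 7: Effect of k on runtime on (a) Singapore and (b) Austin.
Figure 8: Effect of k on the number of examined routes on (a) Singapore and (b) Austin.
Figure 9: Effect of k on the number of NN queries on (a) Singapore and (b) Austin.
Figure 10: Effect of k on route cost on (a) Singapore and (b) Austin.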
5.2.3. Effect
Figure 11 shows the effect of the parameter on the KORSP algorithm. Different values have a large influence: the smaller the value, the lower the route cost, because a smaller value admits more candidate POIs, so that nearer neighbors can be found to extend the route.
Figure 11: Effect of the parameter on (a) running time, (b) examined routes, (c) NN queries, and (d) route cost.
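The relationship between the parameter and the size of the candidate set can be illustrated with a short sketch. It assumes that the parameter acts as a lower bound on the semantic similarity between a POI and the query keyword, and that similarity is the cosine between topic vectors; both are assumptions made for illustration, and the vectors are made up.

```python
import math

def cosine(p, q):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def candidates(poi_topics, query_topics, threshold):
    """POIs whose similarity to the query reaches the threshold."""
    return sorted(
        name
        for name, topics in poi_topics.items()
        if cosine(topics, query_topics) >= threshold
    )

# Hypothetical two-topic vectors for three POIs and one query keyword.
poi_topics = {"cafe": [1.0, 0.0], "bakery": [0.8, 0.6], "bank": [0.0, 1.0]}
query_topics = [1.0, 0.0]
strict = candidates(poi_topics, query_topics, 0.9)
loose = candidates(poi_topics, query_topics, 0.5)
```

Lowering the threshold from 0.9 to 0.5 admits the bakery in addition to the cafe, which mirrors the observation that a smaller value yields more candidate POIs.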
Figure 12 compares four variants of the KORSP algorithm: KORSP is the basic algorithm, PKORSP is KORSP combined with the priority relationship, OKORSP is KORSP combined with optimized domination, and RKORSP is KORSP combined with route replacement. From Figure 12(a), we can see that PKORSP has the best running time, because the priority relationship helps reduce the size of the query. Route replacement only refines the result of KORSP, so the running time of RKORSP is longer than that of KORSP. From Figure 12(b), we can see that OKORSP examines the fewest routes, while the other variants differ little from one another, which indicates that the optimized domination is effective.
Figure 12: Comparison of the KORSP variants on (a) running time and (b) examined routes.
6. Conclusion
In this paper, we study the top-k optimal sequenced routes problem. We propose an efficient algorithm called KORSP, based on a novel route dominance relationship and on semantic matching with the LDA model. Extensive experiments on real-world graphs demonstrate that the proposed algorithms are efficient. KORSP improves the flexibility of POI queries and provides rich candidate sets through keyword semantic matching. It quickly finds the next nearest neighbor of the current POI by using the FindNN algorithm and reduces the route search space through the dominance relationship. In addition, the algorithm uses a route refinement mechanism to improve route quality.
As future work, we plan to study queries in which the keywords are unordered as a whole, while a subset of keywords that have causal relationships must still be visited in order.
Data Availability
The graph data used to support the findings of this study are from [8, 26], and the datasets are available at https://github.com/LazyAir/SIGIR18.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This research is partially funded by National Natural Science Foundation of China, under Grant no. 61602102 and no. 61872069, and the Fundamental Research Funds for the Central Universities, under Grant no. N161704004.
References
S. H. Fang, E. H. Lu, and V. S. Tseng, “Trip recommendation with multiple user constraints by integrating point-of-interests and travel packages,” in Proceedings of the 2014 15th IEEE International Conference on Mobile Data Management (MDM), pp. 33–42, 2014.
E. H. Lu, C. Lin, and V. S. Tseng, “TripMine: an efficient trip planning approach with travel time constraints,” in Proceedings of the 2011 12th IEEE International Conference on Mobile Data Management (MDM), pp. 152–161, Lulea, Sweden, June 2011.
J. Dai, C. Liu, J. Xu, and Z. Ding, “On personalized and sequenced route planning,” World Wide Web, vol. 19, no. 4, pp. 679–705, 2016.
F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, and S.-H. Teng, “On trip planning queries in spatial databases,” in Proceedings of the 9th International Symposium on Spatial and Temporal Databases, SSTD 2005, pp. 273–290, Brazil, August 2005.
J. Eisner and S. Funke, “Sequenced route queries: getting things done on the way back home,” in Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 502–505, USA, 2012.
Y. Ohsawa, H. Htoo, N. Sonehara, and M. Sakauchi, “Sequenced route query in road network distance based on incremental Euclidean restriction,” Database and Expert Systems Applications, vol. 7446, no. 1, pp. 484–491, 2012.
M. Sharifzadeh, M. Kolahdouzan, and C. Shahabi, “The optimal sequenced route query,” The VLDB Journal, vol. 17, no. 4, pp. 765–787, 2008.
H. Liang and K. Wang, “Top-k route search through submodularity modeling of recurrent POI features,” in Proceedings of the 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018, pp. 545–554, USA, July 2018.
I. Abraham, D. Delling, A. V. Goldberg, and R. F. Werneck, “Hierarchical hub labelings for shortest paths,” Algorithms – ESA 2012, vol. 7501, pp. 24–35, 2012.
T. Akiba, Y. Iwata, and Y. Yoshida, “Fast exact shortest-path distance queries on large networks by pruned landmark labeling,” in Proceedings of the 2013 ACM SIGMOD Conference on Management of Data, pp. 349–360, USA, 2013.
E. Cohen, E. Halperin, H. Kaplan, and U. Zwick, “Reachability and distance queries via 2-hop labels,” SIAM Journal on Computing, vol. 32, no. 5, pp. 937–946, 2003.
R. Bramandia, B. Choi, and W. K. Ng, “On incremental maintenance of 2-hop labeling of large graphs,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 5, pp. 682–698, 2010.
J. J. Ying, W. Kuo, V. S. Tseng, and E. H. Lu, “Mining user check-in behavior with a random walk for urban point-of-interest recommendations,” ACM Transactions on Intelligent Systems and Technology, vol. 5, no. 3, pp. 1–27, 2014.
E. H.-C. Lu, W.-C. Lee, and V. S.-M. Tseng, “A framework for personal mobile commerce pattern mining and prediction,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 5, pp. 769–782, 2012.
E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.
P. E. Hart, N. J. Nilsson, and B. Raphael, “A formal basis for the heuristic determination of minimum cost paths,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 4, no. 2, pp. 100–107, 1968.
E. M. Arkin and R. Hassin, “Approximation algorithms for the geometric covering salesman problem,” Discrete Applied Mathematics: The Journal of Combinatorial Algorithms, Informatics and Computational Sciences, vol. 55, no. 3, pp. 197–218, 1994.
F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, and S. Teng, “On trip planning queries in spatial databases,” in Proceedings of the International Symposium on Spatial and Temporal Databases, vol. 31 of Lecture Notes in Computer Science, no. 1, pp. 273–290, Springer, Boston, MA, USA, 2005.
K. Menger, “Ergebnisse eines mathematischen Kolloquiums,” Monatshefte für Mathematik, vol. 39, no. 1, 1932.
E. Ahmadi and M. A. Nascimento, “A mixed breadth-depth first search strategy for sequenced group trip planning queries,” in Proceedings of the 16th IEEE International Conference on Mobile Data Management, pp. 24–33, USA, 2015.
H. Liu, C. Jin, B. Yang, and A. Zhou, “Finding top-k optimal sequenced routes,” in Proceedings of the IEEE International Conference on Data Engineering (ICDE), pp. 569–580, 2018.
 L. Tang, D. Cai, Z. Duan, J.