Special Issue: Analysis and Applications of Location-Aware Big Complex Network Data
Semantic-Aware Top-k Multirequest Optimal Route
In recent years, research on location-based services has attracted considerable interest in both industry and academia because of its wide range of potential applications. One active topic is route planning on a point-of-interest (POI) network. We study top-k optimal route queries on large, general graphs whose edge weights may not satisfy the triangle inequality. The query seeks the top-k optimal routes from a given source that visit a set of vertices which together provide all the services the user needs. Existing POI query methods mainly focus on textual similarity and ignore the semantics of keywords in spatial objects and queries. To address this problem, this paper studies the semantic similarity of POI keywords when searching for a route. Another problem is that most previous studies assume that a POI belongs to a single category and do not consider that a POI may provide several kinds of services, even within the same category. We therefore propose a novel top-k optimal route planning algorithm based on semantic perception (KOR-SP). In KOR-SP, we define a dominance relationship between two partially explored routes, which leads to a smaller search space, and we consider both the semantic similarity of keywords and the number of services provided by a single POI. We use an efficient label indexing technique for shortest path queries to further improve efficiency. Finally, we perform an extensive experimental evaluation on multiple real-world graphs to demonstrate that the proposed methods perform well.
In recent years, rapid advances in wireless communication, the Global Positioning System (GPS), and smart mobile devices have enabled many location-based services (LBS). One popular problem among them is path/route planning in a point-of-interest (POI) network [1, 2]. LBS users often want short routes that pass through multiple POIs; consequently, trip planning queries that find the shortest routes passing through user-specified categories have attracted considerable attention [3, 4]. Although computing the optimal route has been studied extensively and many efficient techniques have been developed over the past several decades, most past studies focused on origin-destination route planning and did not consider the user’s specific requirements.
Recently, some approaches have used the user’s query to find the route. However, these approaches may return a longer route than the user’s actual demands require, because the query keywords express the meaning of the user’s request and do not need to be matched literally. A major problem with the existing approaches is that they only output routes that perfectly match the given categories [5–8]. Take Figure 1 as an example: each object can be viewed as a POI that has a spatial location and a set of keywords. Consider a user who wants to watch a film and issues a keyword query with her current location and the keyword film. If we apply the traditional spatial keyword query method, only an object whose keywords contain the query keyword is returned. However, another object in the figure also meets the user’s requirement, since the user only wants to watch a film. To overcome this problem, we introduce flexible semantic matching based on POI keywords so that shorter routes can be found. In addition, each POI has many keywords and provides multiple services. A single POI may therefore cover several query keywords, whereas existing methods assume that each POI corresponds to only one of them. For example, a POI called WanDa Plaza has keywords such as cinema, popcorn, and food. If a user is looking for a place where she can eat something and see a movie, she issues a query with the keywords movie and food. This POI may meet all of the user’s requirements; otherwise, she must visit a restaurant and a cinema separately. We can thus shorten the route by reducing the number of POIs in it.
Existing approaches return a single shortest route, the optimal sequenced route, which makes route planning inflexible and leaves the user no choice. We propose a top-k algorithm in order to offer more choices and better satisfy users. Moreover, compared with a keyword-ordered query, a keyword-unordered query is more flexible: we only need to consider the distance between a POI and the current point, provided that the user’s requirements are met. An unordered query also avoids treating a distant POI as the nearest neighbor, because the order of the keywords is no longer considered during the query process.
Solving the top-k route search problem in this paper faces three challenges. The first challenge is the larger search space of the query. Because we consider the semantic relation between queries and POIs rather than string matching, the number of candidate objects is larger than in existing approaches. This calls for effective methods that filter candidates and avoid exhaustive search. The second challenge is the strategy for extending a route. Many existing methods consider only the nearest neighbor of the current point, but the route generated by extending to that neighbor may not become the final optimal result. In this paper, we consider not only the distance between the current object and its neighbors but also, taking each neighbor as the current object, the distance to that neighbor’s own nearest neighbor. A route extended in this way has a higher probability of being the final optimal route. Here, we consider the semantic distance and the spatial distance simultaneously. To compute the distance cost efficiently, we use the 2-hop labeling technique [9–12]. The third challenge is the route refinement mechanism. The POIs in the final route found by our method may be redundant; that is, two or more POIs in one route may provide the same services, because our algorithm is a greedy approach. We therefore need a refinement mechanism to further improve route quality.
The main contributions of this paper can be summarized as follows:
(1) We introduce a semantic similarity to the route search query, which allows us to search for routes flexibly
(2) We propose the top-k optimal route query based on semantic perception (KOR-SP), which finds the preferred routes whose keywords semantically match the query
(3) We propose a method to find the -th nearest neighbor based on semantic perception
(4) We use real-world POI datasets to test and prove the superiority of the algorithm.
The remainder of this paper is organized as follows. In Section 2, we briefly review the related work. In Section 3, we formally state the problem. In Section 4, we first introduce the KOR-SP algorithm and how to find the -th nearest neighbor. The empirical performance study is presented in Section 5. Conclusion and future work are presented in Section 6.
2. Related Work
We review the related work in this section. Route planning is one of the hot topics in LBS [13, 14]. Algorithms for destination-oriented route planning can be split into single-destination route planning and multidestination route planning. A number of single-destination algorithms, such as Dijkstra’s algorithm, have been proposed to find the shortest route between two locations. An increasing number of approaches to multidestination route planning have also been proposed [17–19], such as those for the Traveling Salesman Problem (TSP) and the TSP with Neighborhood (TSPN). All of the above are destination-oriented; requirement-oriented route planning is another kind of routing problem. Li et al. proposed the Trip Planning Query (TPQ) together with the Nearest Neighbor Algorithm and the Minimum Distance Algorithm. The Nearest Neighbor Algorithm repeatedly visits the nearest neighbor of the last visited POI, whereas the Minimum Distance Algorithm finds a “good” POI for each unvisited category and traverses these POIs in nearest neighbor order. However, the routes found by these algorithms may be tortuous, that is, full of twists, turns, or bends. Based on TPQ, Ahmadi and Nascimento studied Sequenced Group Trip Planning Queries (SGTPQs), which find a sequence of POIs belonging to the specified categories and minimize the total distance travelled by all groups of users. Sharifzadeh et al. proposed a related query named Optimal Sequenced Route (OSR), which retrieves the shortest route from a given source via several locations of different categories in a particular order. Based on OSR, Liu et al. proposed the top-k optimal sequenced route (KOSR) query, which finds the top-k optimal routes from a given source to a given destination that visit a number of POIs of specific categories in a particular order. However, the above works assume that a POI belongs to only one category.
In an urban area, a POI may not only belong to a category but also provide various services. A better routing approach should consider whether the provided services on the route satisfy user’s requests rather than the categories.
In addition, there is a large body of research on POIs. POI recommendation is one of the hot topics, and it can provide better POIs for route planning. Lei Tang et al. proposed a personal POI recommendation method based on destination prediction. Jianxin Li proposed Personalized Influential Topic Search, or PIT-Search for short. The goal of PIT-Search is to find how important topics and influential users can be better leveraged to meet a specific user’s information need.
Route planning has attracted a lot of attention. So far, most work has been built on user preferences or POI categories. In reality, however, a POI’s category cannot sufficiently represent the services the POI provides. To better meet the user’s needs, an algorithm must consider the services provided by each POI. In this paper, we therefore design a multirequest route planning algorithm that takes POI services into account.
3. Problem Statement
We formalize the KOR-SP problem in this section. Frequently used notation is summarized in Table 2.
3.1. Some Definitions
In this section, we first define some terms used in this paper and then specify our research problem.
Definition 1 (graph). An undirected weighted graph consists of a set of edge weights that represent the distance between two POIs including a vertex set and an edge set . Weight function takes an edge as input and returns a nonnegative cost of the edge . For example, in Figure 2, we have . Note that the edge weights can be arbitrary and may not satisfy the triangle inequality. At the same time, it is also applicable to directed weighted graph and the edge weights can represent distance, time and so on.
Definition 2 (request). Request means a thing, a need, a requirement, or a service that the user wants. denotes the collection of requests.
Definition 3 (POI). POI, the abbreviation of point of interest, represents the specific location in the map. It includes two aspects information: spatial information and key words. It is defined as follows:where is the POI, is the POI’s spatial information, usually in the form of latitude and longitude of POI, and is the set of keywords.
It is worth mentioning that each POI may contain multiple keywords, and through these keywords it can get the basic information of the POI. For example, there is a POI named WanDa that contains keywords as movie, food, and so on. These keywords can describe the basic characteristics of the POI. And these keywords of POI can be defined as follows:where is the -th keyword of POI and is the total number of POI’s keywords.
In addition, we need to consider the semantics of the keyword when querying the POI. For each keyword, we acquire its topics through the Latent Dirichlet Allocation (LDA)  and set up a collection to hold these topics and every topic’s probability. This can be defined as follows:where and are the set of topics and topic’s probability, respectively.
It is worth mentioning that we use the probabilistic topic model to transform the textual description into their semantic representations, and then we can use them to quantify the semantic correlation between textual descriptions. By applying a popular probabilistic topic model called LDA, we can obtain a topic distribution of each object to describe the semantic correlation between the object keywords and a limited set of potential topics. Given a query and an object, it is possible to measure their semantic similarity based on their topic distributions.
Definition 4 (keyword similarity). Given a query keyword and a POI’s keyword , the similarity is calculated by an arbitrary function such as the Wu and Palmer similarity or length [7, 15]. We assume the relations in the similarity as follows:where is the query keyword and is the POI’s keyword. If is relevant to and corresponding probability , we set ; otherwise, And when we calculate the value as in the following, and are topic probability distribution vectors representing the query keyword and the POI’s keyword, respectively:
For example, each tuple in Table 3 is a topic distribution over five topics. Considering a user who wants to watch a movie and she issues a keyword query with her current location and keyword cinema, we can get that and the value is larger than 0.5, so the semantic similarity is .
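To make the idea of comparing topic distributions concrete, the following minimal Python sketch scores a query keyword against a POI keyword. It assumes cosine similarity as the comparison function and 0.5 as the decision threshold; both are assumptions made for illustration, since the exact formula is not reproduced in the text above. The five-topic vectors are hypothetical and are not the values of Table 3.

```python
import math

def cosine_similarity(p, q):
    """Cosine similarity between two topic-probability vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)

def semantically_matches(query_topics, poi_topics, threshold=0.5):
    """Assumed rule: the keywords match when the similarity exceeds the threshold."""
    return cosine_similarity(query_topics, poi_topics) > threshold

# Hypothetical topic distributions over five topics.
film = [0.70, 0.10, 0.10, 0.05, 0.05]
cinema = [0.60, 0.20, 0.10, 0.05, 0.05]
bank = [0.05, 0.05, 0.10, 0.10, 0.70]

print(semantically_matches(film, cinema))  # similar distributions
print(semantically_matches(film, bank))    # dissimilar distributions
```

With these toy vectors, film and cinema concentrate their probability on the same topic and are treated as a match, while film and bank are not.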
Definition 5 (PRQ and QRP). provides the set of POIs that is the services with which it can meet user’s querying keyword and provides the set of point’s services. Given a and a , we can get that and . Take Table 1 as an example: and .
Definition 6 (route). Route refers to a collection containing several POIs. POI in the collection has a certain order to form a route, so the route is defined as follows:where is the -th POI in the route and is the number of POIs in the route.
And each route also has its keywords, because the route contains several POIs, so the route also contains the keywords of all POIs, which can be expressed as follows:
That is, the keyword set of a route is the collection of the keywords of all POIs in the route.
Definition 7 (route cost). Given a set of user’s requests and a route , the cost of a route is the spatial distance through all the POIs in the entire route from a given source . We can denote the cost of route as follows:where is the spatial distance of the route, is the distance between the starting point and the first POI in route, and is the distance of the -th POI to the -th POI in the route.
Definition 8 (route average cost). Because one POI may provide multiple services and thus satisfy multiple requests, we should consider the number of services in a route when extending a partial route. We therefore compute the average cost, that is, the route cost divided by the number of services, and extend the partial route with the least average cost. Given a route and a set of user’s requests , we denote the average cost as follows:
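Definitions 7 and 8 can be illustrated with a short Python sketch. The function names, the toy distances, and the service sets are hypothetical. Counting the "number of services" as the number of distinct requested services covered by the route is our reading of the definition, and returning infinity for a route that covers no request is an assumption made to keep the division well defined.

```python
def route_cost(source, route, dist):
    """Travel cost: source -> first POI -> ... -> last POI."""
    total = 0.0
    prev = source
    for poi in route:
        total += dist(prev, poi)
        prev = poi
    return total

def covered_requests(route, requests, services):
    """Requested services provided by at least one POI on the route."""
    covered = set()
    for poi in route:
        covered |= services[poi] & requests
    return covered

def average_cost(source, route, requests, services, dist):
    """Route cost divided by the number of covered requests (assumed reading)."""
    n = len(covered_requests(route, requests, services))
    if n == 0:
        return float("inf")  # assumption: a route covering nothing is never preferred
    return route_cost(source, route, dist) / n

# Hypothetical toy data.
D = {("s", "a"): 4.0, ("a", "b"): 6.0, ("s", "b"): 5.0}

def dist(u, v):
    if u == v:
        return 0.0
    return D[(u, v)] if (u, v) in D else D[(v, u)]

services = {"a": {"food", "movie"}, "b": {"bank"}}
requests = {"food", "movie", "bank"}

print(average_cost("s", ["a"], requests, services, dist))  # 2.0
print(average_cost("s", ["b"], requests, services, dist))  # 5.0
```

In this toy case the partial route through the multi-service POI a has the lower average cost, so it would be extended first.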
3.2. Two-Hop Labeling Technique
The POI map data is stored on disk. To answer user queries quickly with few disk accesses and to speed up distance computation, we build an index HI that is also stored on disk.
Table 4 shows the HI for the POI map in Figure 2; for each vertex , 2-hop labeling maintains a label HI(). In particular, HI() consists of a set of label entries in the form of , where is a vertex that is able to reach and .
We note that it is NP-hard to construct a 2-hop labeling of minimum size that satisfies the cover property. Therefore, the existing methods [9–11, 25] are all heuristic and approximate the minimal 2-hop labeling index. Alternatively, we could use an all-pairs shortest path algorithm to generate the index. Although this works, it requires an index of size $O(|V|^2)$, which is unacceptable for large graphs.
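How a 2-hop label index answers a distance query can be sketched as follows: the distance between two vertices is the minimum, over the hubs shared by their labels, of the sum of the two stored hub distances. The sketch below is a minimal illustration of that standard query; the function name and the hand-built labels (for a path a-b-c with weights 2 and 3, plus an isolated vertex d) are hypothetical and are not the contents of Table 4.

```python
def two_hop_distance(labels, u, v):
    """Shortest distance between u and v from 2-hop labels.

    labels[x] maps each hub in the label of x to the distance between x and that hub.
    Returns infinity when the two labels share no hub (no path is covered).
    """
    lu, lv = labels[u], labels[v]
    if len(lu) > len(lv):
        lu, lv = lv, lu  # iterate over the smaller label
    best = float("inf")
    for hub, d_u in lu.items():
        d_v = lv.get(hub)
        if d_v is not None:
            best = min(best, d_u + d_v)
    return best

# Hand-built labels for the path a --2-- b --3-- c and an isolated vertex d.
labels = {
    "a": {"a": 0.0, "b": 2.0},
    "b": {"b": 0.0},
    "c": {"c": 0.0, "b": 3.0},
    "d": {"d": 0.0},
}

print(two_hop_distance(labels, "a", "c"))  # 5.0, via hub b
```

Because a query only scans two label sets, its cost depends on the label sizes and not on the size of the graph, which is why the index suits an operation that is invoked very frequently.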
4. Proposed Solutions
In this section, we present an effective method for solving KOR-SP. We describe a route planning method based on semantic perception that satisfies multiple user requests. The method defines a dominance relationship on candidate routes to filter them and thus reduces the search space. In addition, by combining it with an optimization technique, we can efficiently find the -th nearest neighbor of the current vertex.
4.1. KOR-SP Algorithm
We first introduce the domination relationship. It holds between routes that share the same starting point and end point: the route with the smaller average cost dominates the one with the larger average cost. Consider Figure 2 and two candidate routes with (number of services, cost) values (2, 250) and (2, 220). They have the same destination and provide the same number of requested services, but the second has the smaller distance cost, so it is the dominating route and the first is the dominated route.
Definition 9 (domination). Given a user’s request and two partially explored candidate routes and , if and and holds, dominates , denoted as .
Lemma 10. Given a KOR-SP query and two partially explored routes and , if , then ,where and are the optimal feasible routes that are extended from and , respectively.
Proof. Suppose , , and ; since is the optimal feasible route extended from , must be the optimal route from to . Because , we have , and the services provided by route are the same; thus, can be represented by , and then and , and since , we have and the services are the same.
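The dominance test used for pruning can be sketched in Python as follows. This is a minimal illustration under stated assumptions: we read Definition 9 as requiring the same end vertex, the same set of satisfied requests, and a cost that is no larger. The class name PartialRoute, its fields, and the example routes are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PartialRoute:
    pois: tuple          # visited vertices in order; the last one is the end vertex
    covered: frozenset   # requested services satisfied so far
    cost: float          # travel cost from the source

def dominates(r1, r2):
    """Assumed reading of Definition 9: same end, same covered requests, cost no larger."""
    return (
        r1.pois[-1] == r2.pois[-1]
        and r1.covered == r2.covered
        and r1.cost <= r2.cost
    )

done = frozenset({"food", "movie"})
route_a = PartialRoute(("s", "p1", "p3"), done, 250.0)
route_b = PartialRoute(("s", "p2", "p3"), done, 220.0)

print(dominates(route_b, route_a))  # True: same end and services, lower cost
print(dominates(route_a, route_b))  # False
```

A dominated route such as route_a is set aside instead of being extended, and it is reconsidered only after a route extended from its dominating route has been reported.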
According to Lemma 10, a dominated route need not be extended until the optimal feasible route extended from its dominating route has been reported as one of the top-k optimal routes. Based on this dominating relationship, we propose the KOR-SP method (Algorithm 1).
|Input: Graph: G(V,E); Request: ; number of routes: K;|
|Output: top-k routes|
|1 , initialize and ;|
|3 priority queue ;|
|4 while R is not empty and do|
|7 if then|
|9 for each i=1,…,q-1 do|
|10 if QRP()=QRP(.getValue())|
|12 P’=(,-) ←.getValue().extractMin();|
|16 if QRP(p)=QRP () then|
|22 if q>0 then|
|25 return Ψ;|
To check relationship of domination and store dominated route, for each POI , we recommend two hash tables in the shape of (key, value) pairs. The first is for saving dominating route, where key is the number of services that meet the user’s requests, provided by the partially dominating route that has been extended from current POI and explored, and value is the route itself. Another one is for saving dominated routes, where key is the number of services which meet the request of user, provided by the partially explored dominated route that has been extended from , and value is the route itself, and the dominated routes are ordered according to their average costs in an ascending order. We also keep as a result set to save top-k optimal routes and a global priority queue for partially explored routes sorted by their average costs in an ascending order. In addition, for each , we introduce an additional attributes to represent that is the -th nearest neighbor of when generating . Initially, only the source with is added to the queue . Then, we begin a loop until is empty or top-k optimal sorting route has been found.
Pruning Dominated Routes. At each iteration, the algorithm chooses the route with the minimum average cost to be checked. If it has completed all of the user’s requests, we will add it to Ψ and reconsider dominated routes (lines 5-14). Otherwise, we inspect if it is dominated or not. For a route to be examined, if is the first route with QRP() that reaches vertex , we add to of and extend it via ’s nearest neighbor (lines 14-17). Otherwise, if its QRP() belongs to the of , it signifies that existing other route with QRP() and smaller average cost has been maintained and expanded to the , so that is dominated. According to Lemma 10, there is no need to extend anymore; therefore, we add it into of rather than the priority queue (lines 19). Then, we generate a new candidate route from . Because the -th nearest neighbor of has generated in the previous iteration, we need to find the -th nearest neighbor of by invoking algorithm FindNN and create candidate route with incremental and insert it into the priority queue (lines 22-24).
Rethink Dominated Route. After finding the optimal route , we need to rethink the partial routes that has been explored and been dominated by subroutes of , because these routes are more likely to be extended to be another optimal route now. Therefore, for every POI in , if dominates the routes with QRP() in the of (line 10), we only consider the dominated route ’ with the least average cost and at the end of , because other routes at the end of are dominated by ’. This also accounts for why we use a priority queue as value in hash table . Since p’s nearest neighbor has been computed after it is dominated, we set its to “-” (which means it makes no sense generating candidate route) and add it to the priority queue (lines 10-13). Meanwhile, we remove from the of ; thus, the next candidate route can be extended (line 14).
Example 11 (consider Figure 2). Suppose the given query is . Table 5 shows the routes in the priority queue at each step. At step 1, route is added to the queue, and then it is extended via (’s nearest neighbor in C) that is the collection of POIs with the unfinished services in , and no candidate route can be generated. At step 2, is examined, it is extended via (’s nearest neighbor in C) that is the collection of POIs with the unfinished services and candidate route is generated via ’s nearest neighbor in C that is the collection of POIs with the unfinished services. And so on until the exit condition is met.
Finding the -th Nearest Neighbor. Next, we explain how to find the -th nearest neighbor, the core operation in KOR-SP.
Definition 12 (neighbor distance). Because the POI keywords and the query keywords have a semantic difference, we are not able to choose the nearest neighbor according to the actual distance. And then, we calculate the neighbor distance by combining the semantic difference with the actual distance. And we choose the nearest neighbor according to the neighbor distance. We measure the neighbor distance bywhere is the current vertex, ’ is the possible nearest neighbor, and is the number of that is not equal to 0. Besides, is keyword of ’ and is the query keyword. Now, given a route , when we extend to , we can estimate the cost of as follows:where is the current node and represents one of the neighbors of . For example, in Figure 2, we can know that , , and , so .
A straightforward way to find the -th nearest neighbor of vertex in collection of POIs with services in is by using 2-hop labeling technique rather than Dijkstra’s search, since FindNN is frequently invoked. Frequent Dijkstra searches on large graphs are practically inefficient. When the number of unfinished services is greater than 1, we do the following steps (lines 3-7). We start from and extend vertices via the equation (line 6) that calculates the average distance between two points, and each vertex’s average distance with the current vertex is stored in the ascending sorting queue N (line 7). When the number of unfinished services is less than 1, we perform the following steps (lines 9-11). We directly consider the actual distance between the current point and the nearest possible neighbor as the average distance (line 10) and store it in the queue N (line 11). Finally, we output the corresponding vertex as needed (lines 13-14).
4.2. Approximate KOR-SP
4.2.1. Domination Conditions
The approximate variant differs from KOR-SP in its dominating relationship: the partially explored routes are no longer required to provide the same services. We introduce the new dominating relationship in detail below.
We reconsider the dominating relationship, because the original requirements are too strict. First, the partially explored routes must have the same end vertex; second, they must provide the same number of the services required by the user. For example, in Figure 2, two routes with (number of services, cost) values (2, 250) and (3, 250) have the same destination but provide different numbers of services, so under the original definition they do not satisfy the dominating relationship. We now relax this restriction. The first route provides 2 services, one fewer than the second. To reach 3 services, the first route must be extended by at least one more edge. We assume that the cost of the newly added edge is ave_weight, the average weight of all edges in the graph, where $m$ is the number of edges and $w_i$ is the weight of the $i$-th edge:

$$\text{ave\_weight} = \frac{1}{m} \sum_{i=1}^{m} w_i$$

After adding this estimate to the first route, the second route has the smaller distance cost, so the second route is the dominating route and the first route is dominated.
Definition 13 (optimize domination). Given a user’s request and two partially explored candidate routes and , if and the service number of is less than or equal to , and holds, dominates , denoted as , where is the difference value of the number of services of two routes.
For example, if one route provides two services and the other provides four, we add two average edge weights to the cost of the first route as its estimated cost of reaching the same number of services. We then decide the dominating relationship using this estimated cost.
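The relaxed test can be sketched in Python as follows. This is an illustration under assumptions, not the exact definition: we take the dominating route to be the one that ends at the same vertex, provides at least as many services, and costs no more than the other route after the other route is charged one average edge weight per missing service. The function names and the numeric values of the edge weights are hypothetical.

```python
def average_edge_weight(edge_weights):
    """Mean weight over all edges of the graph."""
    weights = list(edge_weights)
    if not weights:
        raise ValueError("the graph has no edges")
    return sum(weights) / len(weights)

def relaxed_dominates(end1, n_services1, cost1, end2, n_services2, cost2, ave_weight):
    """Assumed reading of the relaxed rule: route 1 dominates route 2."""
    if end1 != end2 or n_services1 < n_services2:
        return False
    gap = n_services1 - n_services2          # services route 2 still has to catch up on
    return cost1 <= cost2 + gap * ave_weight  # compare against route 2's estimated cost

ave = average_edge_weight([10.0, 20.0, 30.0])  # hypothetical edge weights, mean 20.0

# Routes with (services, cost) = (3, 250) and (2, 250) ending at the same vertex.
print(relaxed_dominates("p", 3, 250.0, "p", 2, 250.0, ave))  # True
print(relaxed_dominates("p", 2, 250.0, "p", 3, 250.0, ave))  # False
```

Because ave_weight is only an estimate of the cost of the missing edges, pruning with this rule can discard a route that would have been optimal, which is why the variant is approximate.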
Lemma 14. Given a query and two partially explored routes R1 and R2, if , then , where and are the optimal feasible routes that are extended from and , respectively.
Proof. Suppose , , , and . and are the optimal route from R1 and R2, respectively, where and have the same service and . According to Algorithm 2, and , so . Because of and , we can know that .
|Input: Vertex , collection of POIs with service in C, integer .|
|Output: The -th nearest neighbor of in C.|
|3 if then|
|4 for in C|
|9 for v′ in C|
|10 average =|
|13 if then|
|14 return ;|
4.2.2. Priority Query
We introduce query priority in this section. By analyzing the travel routes of most users, we find that some services inherently have a higher priority than others. We keep these services and their priorities in the dictionary D. For example, a large amount of data shows that users first go to the bank to withdraw money and then spend it, so withdrawing money has a higher priority than consumption. If two services have no priority relationship, they are treated as having the same priority. After the requests are classified by priority, the services with higher priority are queried first, and services with the same priority are queried in arbitrary order. We call this kind of query partial ordered query based on priority (POQP). To make it easier to classify user requests by priority, we assign priorities offline to services that have an obvious priority relationship.
At each iteration, the algorithm classifies the user’s requests by priority and queries the high-priority requests first. Algorithm 3 assigns a priority to each query keyword and stores the result in a set named pre_set (line 1). Initially, S is equal to Q, and we check every keyword in Q. Some keywords have a priority relationship with others. If a keyword has the highest priority among these keywords, we set its priority level to 1; otherwise, we set it to 0 (lines 2-10). The remaining keywords have no priority relationship and are therefore independent. They have no effect on other keywords, so we also set their priority level to 1 (line 12). After that, we add the keyword to the result set and assign the result set to pre_set (lines 14-17). After this initial step, the algorithm uses the last finished service to find the services with lower priority and stores them in the result set (lines 19-28). Finally, the algorithm outputs the result set (line 29). Algorithm 3 is used in Algorithm 1: when the algorithm queries the nearest neighbor, Algorithm 3 reduces the number of candidate POIs.
|Input: Request: ; : the number of services;|
|Priority dictionary: D(keyword, priority); initialize S=Q;|
|Output: priority set|
|1 n=1, , , t=0, , ;|
|2 if ==N then|
|3 for in Q do|
|4 if (Q-)∩. then|
|5 HQ= (Q-)∩.D|
|7 if .priority then|
|14 for in QH do|
|15 if .priority==1 then|
|21 HQ=(Q- r’)∩r’.D;|
|22 if then|
|24 for r in HQ do|
|25 if r.priority==t then|
|29 return set;|
By classifying the queries according to their priority, we can avoid the possibility that the route formed is inconsistent with the actual situation, and at the same time we can reduce the number of candidate POIs.
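The effect of priorities on the candidate set can be illustrated with a simplified Python sketch. It is not Algorithm 3 itself: it assumes that the priority knowledge is available as a mapping from a service to the services that must be finished before it, which is a hypothetical encoding of the dictionary D, and it returns the unfinished requests that may be queried next.

```python
def queryable_requests(remaining, prerequisites):
    """Unfinished requests whose higher-priority prerequisites are all finished.

    remaining: the requests not yet satisfied by the partial route.
    prerequisites: hypothetical encoding of D; maps a request to the set of
    requests that should be served before it.
    """
    remaining = set(remaining)
    return {
        r for r in remaining
        if not (prerequisites.get(r, set()) & remaining)
    }

# Hypothetical priority knowledge: withdraw money at a bank before shopping.
prerequisites = {"shopping": {"bank"}}

first = queryable_requests({"bank", "shopping", "movie"}, prerequisites)
print(sorted(first))   # ['bank', 'movie']

later = queryable_requests({"shopping", "movie"}, prerequisites)
print(sorted(later))   # ['movie', 'shopping']
```

Only POIs that provide a currently queryable request need to be considered when searching for the next neighbor, which is how the priority classification shrinks the candidate set.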
4.3. Route Replacement
We propose a route optimization mechanism, namely, route replacement, that checks whether the route cost can be reduced.
After obtaining the final route produced by Algorithm 1, we propose a postprocessing mechanism named route replacement to refine it. As shown in Figure 2, the route from to must go through , and we can find that the services provided by are the same as those provided by , and we can know that is smaller than the sum of and . So, we can use to replace . Finally, the shortest route is R’ . Obviously, the routing cost of R’ is smaller than that of . When we refine the route, we should store these points which satisfies the following requirements: (1) those that have not been searched, (2) those that must be passed by the refined route, and (3) those that provide the same services as the existing points in the refined route. In the refining process, we justify whether to use these points to replace the existing point, which can reduce the route cost. We should choose the replacing point with more services that satisfies the user’s requests, so it can not only reduce the cost but also make the route more concise by reducing the point in the route at the same time. We can understand the process of route replacement in detail through Algorithm 4.
|Input: route: R=; query: Q|
|Output: route: R’|
|2 for in R and do|
|4 for in V do|
|5 if QRP() then|
|6 for in R do|
|7 if QRP()∩Q=QRP()∩Q and cost()< cost()+cost() then|
|10 else continue;|
|11 else continue;|
|12 return R’;|
As shown in Algorithm 4, the pseudocode has described the process of route replacement. Firstly, we define the notion that V is for storing POIs (line 1). These POIs are going to be used to replace the POI in the route. For each POI in the route R, we look for the POIs between and . Then, we add these POIs to the V (lines 2-3). Next, we check each POI in the V and seek out the POI that is able to meet the user’s query (line 4-5). We look for a POI in the route. If and are able to provide the same services, we compare the cost between and . If is less than , is able to be replaced by (lines 6-10).
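The replacement step can be sketched in Python in a simplified form. The sketch is an assumption-laden illustration and not Algorithm 4 itself: it takes the candidate POIs as given (the algorithm collects them from the segments of the route), and it replaces a route POI with a candidate when the candidate provides the same requested services and lowers the cost of the two adjacent legs. The function name, the coordinates, and the Euclidean distance are hypothetical.

```python
import math

def replace_pois(source, route, requests, services, dist, candidates):
    """Greedy single-POI replacement that keeps the covered requests unchanged."""
    refined = list(route)
    for i, poi in enumerate(refined):
        prev = source if i == 0 else refined[i - 1]
        nxt = refined[i + 1] if i + 1 < len(refined) else None
        needed = services[poi] & requests

        def detour(x):
            # Cost of the legs entering and leaving position i if x is placed there.
            cost = dist(prev, x)
            if nxt is not None:
                cost += dist(x, nxt)
            return cost

        best, best_cost = poi, detour(poi)
        for c in candidates:
            if c in refined:
                continue
            c_cost = detour(c)
            if services[c] & requests == needed and c_cost < best_cost:
                best, best_cost = c, c_cost
        refined[i] = best
    return refined

# Hypothetical toy instance on the plane.
coords = {"s": (0, 0), "a": (0, 5), "b": (10, 0), "c": (5, 0)}
services = {"a": {"food"}, "b": {"bank"}, "c": {"food", "tea"}}

def euclid(u, v):
    return math.dist(coords[u], coords[v])

refined = replace_pois("s", ["a", "b"], {"food", "bank"}, services, euclid, ["c"])
print(refined)  # ['c', 'b']
```

In the toy instance, c lies on the straight line from the source to b and offers the requested service of a, so swapping a for c removes the detour without changing what the route provides.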
5. Experimental Evaluation
5.1. Experimental Setup
We use two real-world datasets from Zeng. Singapore contains Foursquare check-in data collected in Singapore, and Austin contains Gowalla check-in data collected in Austin. Singapore has 189,306 check-ins, 5,412 locations, and 2,321 users. Austin has 201,525 check-ins, 6,176 locations, and 4,630 users. As suggested in [26, 27], we built an edge between two locations if they were visited on the same date by the same user. Locations not connected by any edge were ignored. We filled in the edge costs by querying the travel time in minutes from the Google Maps API in driving mode. The dataset statistics are shown in Table 6.
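The graph construction rule can be sketched in Python as follows. The record layout (user, date, location) and the sample check-ins are hypothetical, and the sketch only derives the edge set; it does not query travel times, so edge costs are left out.

```python
from collections import defaultdict
from itertools import combinations

def build_edges(checkins):
    """Connect two locations if the same user visited both on the same date.

    checkins: iterable of (user, date, location) records (hypothetical layout).
    Returns a set of undirected edges as sorted (location, location) pairs.
    """
    visited = defaultdict(set)
    for user, date, location in checkins:
        visited[(user, date)].add(location)
    edges = set()
    for locations in visited.values():
        for u, v in combinations(sorted(locations), 2):
            edges.add((u, v))
    return edges

def connected_locations(edges):
    """Locations that appear in at least one edge; all others are ignored."""
    return {loc for edge in edges for loc in edge}

# Hypothetical sample records.
checkins = [
    ("u1", "2012-01-01", "A"),
    ("u1", "2012-01-01", "B"),
    ("u1", "2012-01-02", "C"),
    ("u2", "2012-01-01", "C"),
    ("u2", "2012-01-01", "A"),
    ("u3", "2012-01-03", "D"),
]

edges = build_edges(checkins)
print(sorted(edges))                       # [('A', 'B'), ('A', 'C')]
print(sorted(connected_locations(edges)))  # ['A', 'B', 'C']
```

Location D is visited alone on its date, so it has no edge and is dropped from the graph.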
Both datasets were used in earlier work that also studied a route planning problem. The datasets are not small for the scenario of a daily trip in a city where the user has a limited cost budget. Even with 150 POIs to choose from, the number of possible routes consisting of 5 POIs can reach 70 billion. Compared to our work, Jeffrey evaluated itinerary recommendation methods using theme park data, where each park contains only 20 to 30 attractions.
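The size of this search space can be checked with a short calculation, assuming that a route is an ordered sequence of 5 distinct POIs chosen from 150 (the variable names are ours):

```python
import math

n_pois = 150
route_length = 5

# Ordered selections of 5 distinct POIs: 150 * 149 * 148 * 147 * 146.
ordered_routes = math.perm(n_pois, route_length)
print(f"{ordered_routes:,}")  # 70,992,003,600
```

The count is roughly 71 billion, consistent with the figure quoted above, and it grows by another factor of about 145 for each additional POI in the route.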
We compared the following algorithms. PACER models the personalized diversity requirement by retrieving POI indexes related to the feature space and the route space, applies various strategies for pruning the search space with user preferences and constraints, and returns the optimal solution of the top-k path search problem. PruningKOSR uses a dominance relationship to filter temporarily unnecessary routes. KOR-SP is our proposed optimal algorithm.
For each KOR-SP query, we randomly select a source and an integer, and then we issue the query on all the graphs. In each experiment, 50 random query instances were constructed and the average query time was reported. If a query does not terminate within 4200 seconds or fails due to a memory overflow exception, we report its query time as INF.
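The measurement protocol above can be sketched as a small Python harness. This is an illustration only: `run_query`, `sources`, and `params` are hypothetical stand-ins for the solver and its randomly chosen inputs, and the time limit is checked after each query returns rather than enforced by interrupting it.

```python
import math
import random
import time

TIME_LIMIT_S = 4200  # limit stated in the text
NUM_QUERIES = 50     # random query instances per experiment


def average_query_time(run_query, sources, params,
                       num_queries=NUM_QUERIES, time_limit=TIME_LIMIT_S, seed=0):
    """Average wall-clock time of `run_query` over random instances.

    Returns math.inf (reported as INF) if any query exceeds the time limit
    or raises MemoryError.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_queries):
        source = rng.choice(sources)  # random source vertex
        param = rng.choice(params)    # random integer query parameter
        start = time.perf_counter()
        try:
            run_query(source, param)
        except MemoryError:
            return math.inf
        elapsed = time.perf_counter() - start
        if elapsed > time_limit:
            return math.inf
        total += elapsed
    return total / num_queries
```

A production harness would more likely run each query in a separate process so that a query exceeding the limit can be stopped instead of being allowed to finish.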
5.1.4. Evaluation Criteria
We evaluate the performance of different methods in four different aspects: the query run-time, the number of examined routes (witnesses), the number of (next) nearest neighbor (shortened as kNN) queries executed by calling Algorithm FindNN, and the cost of the routes.
5.2. Experimental Results
We first evaluate the efficiency of the different algorithms in answering KOR queries under the default parameter setting on the two real graphs, and then evaluate the impact of the parameters on the results.
5.2.1. Overall Performance under Default Parameter Settings
Figures 3–6 show the performance of the three algorithms on the two graphs. Figure 3 displays the runtime of the algorithms on each graph. Since all the algorithms reduce the search space, they can return results on all graphs. All the algorithms also answer queries efficiently by using the 2-hop label index. Figures 4 and 5 show the number of examined routes and NN queries, respectively. The number of examined routes in KOR-SP is much smaller than in PruningKOSR on all graphs, while its number of NN queries is larger than that of the other algorithms. This shows the importance of a rich candidate set: because of semantic matching, KOR-SP has more candidate POIs and can complete the route query while examining fewer routes. This result indicates that KOR-SP outperforms PruningKOSR. Figure 6 shows the cost of the routes. The route cost of KOR-SP is much smaller than that of the other algorithms, because semantic matching gives it more candidate POIs, some of which are closer.
[Figures 3–6: runtime, number of examined routes, number of NN queries, and route cost. Each figure has two panels: (a) methods on Singapore; (b) methods on Austin.]
Figures 7–10 show the influence of the parameter k on the three methods on the two graphs. As shown in Figure 7, all three methods complete the query within the specified time, and as k increases, the advantage of KOR-SP becomes more and more obvious. Figures 8 and 9 show the number of routes and NN queries examined by the different methods on the two graphs. KOR-SP examines far fewer routes and NN queries, and it keeps a significant advantage over the other algorithms for all values of k. Figure 10 shows the influence of k on the route cost. Due to semantic matching, the route cost of KOR-SP is the lowest for every k.
[Figures 7–10: effect of k on runtime, number of examined routes, number of NN queries, and route cost. Each figure has two panels: (a) effect of k on Singapore; (b) effect of k on Austin.]
Figure 11 shows the effect of the parameter on the KOR-SP algorithm, which is considerable. The smaller its value, the lower the route cost, because a smaller value admits more candidate POIs, so nearer neighbors can be found to extend the route.
[Figure 11: (a) running time; (b) examined routes; (c) NN queries; (d) route cost.]
Figure 12 shows the differences among the four KOR-SP variants. In the figure, KOR-SP is the basic algorithm, PKOR-SP is KOR-SP combined with the priority relationship, OKOR-SP is KOR-SP combined with optimized domination, and RKOR-SP is KOR-SP combined with route replacement. From Figure 12(a), we can see that PKOR-SP has the best running time, because the priority relationship helps reduce the size of the query. Route replacement only refines the result of KOR-SP, so the running time of RKOR-SP is longer than that of KOR-SP. From Figure 12(b), we can see that OKOR-SP examines the fewest routes, while the other variants differ little from one another. This comparison suggests that optimized domination is effective.
[Figure 12: (a) running time; (b) examined routes.]
In this paper, we study the top-k optimal sequenced routes problem. We propose an efficient algorithm called KOR-SP, based on a novel route dominance relationship and on semantic matching using the LDA model. Extensive experiments on real-world graphs demonstrate that the proposed algorithms are efficient. Through keyword semantic matching, KOR-SP improves the flexibility of POI queries and provides rich candidate sets for them. KOR-SP also quickly finds the next nearest neighbor of the current POI by using the FindNN algorithm and reduces the route search space through the dominance relationship. In addition, the algorithm uses route refinement mechanisms to improve route quality.
As future work, we plan to study partially ordered keyword queries, in which the keywords are unordered overall but an order is imposed on those keywords that have a causal dependency.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This research is partially funded by the National Natural Science Foundation of China, under Grant nos. 61602102 and 61872069, and by the Fundamental Research Funds for the Central Universities, under Grant no. N161704004.
F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, and S.-H. Teng, “On trip planning queries in spatial databases,” in Proceedings of the 9th International Symposium on Spatial and Temporal Databases, SSTD 2005, pp. 273–290, Brazil, August 2005.
J. Eisner and S. Funke, “Sequenced route queries: getting things done on the way back home,” in Proceedings of the 20th ACM Sigspatial International Conference on Advances in Geographic Information Systems, pp. 502–505, USA, 2012.
Y. Ohsawa, H. Htoo, N. Sonehara, and M. Sakauchi, “Sequenced route query in road network distance based on incremental Euclidean restriction,” Database and Expert Systems Applications, vol. 7446, no. 1, pp. 484–491, 2012.
H. Liang and K. Wang, “Top-k route search through submodularity modeling of recurrent POI features,” in Proceedings of the 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018, pp. 545–554, USA, July 2018.
T. Akiba, Y. Iwata, and Y. Yoshida, “Fast exact shortest-path distance queries on large networks by pruned landmark labeling,” in Proceedings of the 2013 ACM SIGMOD Conference on Management of Data, pp. 349–360, USA, 2013.
F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, and S. Teng, “On trip planning queries in spatial databases,” in Proceedings of the International Symposium on Spatial and Temporal Databases, vol. 31 of Lecture Notes in Computer Science, no. 1, pp. 273–290, Springer, Boston, MA, USA, 2005.
E. Ahmadi and M. A. Nascimento, “A mixed breadth-depth first search strategy for sequenced group trip planning queries,” in Proceedings of the 16th IEEE International Conference on Mobile Data Management, pp. 24–33, USA, 2015.
D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
Y. Zeng, X. Chen, X. Cao, S. Qin, M. Cavazza, and Y. Xiang, “Optimal route search with the coverage of users' preferences,” in Proceedings of the 24th International Joint Conference on Artificial Intelligence, IJCAI 2015, pp. 2118–2124, Argentina, July 2015.
X. Cao, L. Chen, G. Cong, and X. Xiao, “Keyword-aware optimal route search,” VLDB Endowment, vol. 5, no. 11, pp. 1136–1147, 2012.
K. H. Lim, J. Chan, S. Karunasekera, and C. Leckie, “Personalized itinerary recommendation with queuing time awareness,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017, pp. 325–334, Japan, August 2017.