Abstract
Given a set of positive-weighted points and a query rectangle r (specified by a client) of given extents, the goal of a maximizing range sum (MaxRS) query is to find the optimal location of r such that the total weight of all the points covered by r is maximized. All existing methods for processing MaxRS queries assume the Euclidean distance metric. In many location-based applications, however, the motion of a client may be constrained by an underlying (spatial) road network; that is, the client cannot move freely in space. This paper addresses the problem of processing MaxRS queries in a road network. We propose an external-memory algorithm that is suited to a large road network database. In addition, in contrast to the existing methods, which retrieve only one optimal location, our proposed algorithm retrieves all the possible optimal locations. Through simulations, we evaluate the performance of the proposed algorithm.
1. Introduction
With the widespread use of mobile computing devices [1–7], location-based services [8] have attracted much attention as one of the most promising applications, whose main functionality is to process location-related queries on spatial databases. Most traditional research in spatial databases has focused on finding nearby data objects (e.g., range queries, nearest neighbor queries [9], etc.), rather than finding the best location to optimize a certain objective. Recently, a maximizing range sum (MaxRS) query was introduced in [10]. This query is useful in many location-based applications, such as finding the most representative place in a city for a tourist with a limited reachable range or finding the best location for a pizza store with a limited delivery range. Given a set of positive-weighted points and a query rectangle r (specified by a client) of a given size, the goal of a MaxRS query is to find the optimal location of r such that the sum of the weights of all the points covered by r is maximized.
Figure 1 shows an example of the MaxRS query, where the query rectangle r has the given size and all the points are assumed to have the same weight, equal to 1. In the figure, the center of the solid-lined rectangle is the optimal location of r because the solid-lined rectangle covers the largest number of points (i.e., 3).
To process MaxRS queries, Choi et al. [10] proposed an external-memory algorithm, while Imai and Asano [11] proposed an internal-memory algorithm. Tao et al. [12] proposed a solution for approximate MaxRS queries, each of which retrieves a rectangle whose covered weight is at least (1 − ε) times the optimal covered weight, where ε is an arbitrary constant between 0 and 1. All of these studies target Euclidean spaces. In many real-life location-based services, however, the motion of a client may be constrained by an underlying (spatial) road network; that is, the client cannot move freely in space. Consider a tourist service as an example, where a tourist (i.e., client) tries to find the hotel whose location is close to as many sightseeing spots as possible (e.g., at most 1.5 km walking distance from the hotel). A MaxRS query fits this scenario. However, the existing MaxRS query processing methods cannot be applied because the distance between the hotel and each sightseeing spot is determined by the underlying (spatial) road network, and thus the actual distance between two locations can differ significantly from their Euclidean distance. We can see this difference in Figure 2, where the Euclidean distance between two locations is about 1.24, whereas the actual route between them passes through two other points of the network and has a total length of about 3.74, which is about three times the Euclidean distance. With this problem in mind, we study, for the first time to the best of our knowledge, the problem of processing MaxRS queries in a road network, where the distance between two points is determined by the length of the shortest path connecting them (i.e., network distance [13]).
Figure 2 shows an example of a road network, which consists of 5 nodes (square vertices) and 7 edges. In the figure, there are 4 facilities (weighted points), each of which is associated with a positive weight indicating its importance. The numbers that appear in parentheses next to nodes and facilities show their respective coordinates. Note that it is assumed in this paper that all the facilities are located on edges of the road network. Then, a MaxRS query in a road network is defined as follows. Given a set of facilities and a network radius, the MaxRS query finds all the locations on the road network that maximize the total weight of the facilities whose network distance to the location is less than or equal to the radius.
For the road network in Figure 2, Figure 3 shows an example of a MaxRS query with the radius 1.5 (km), where the weight of each facility is 1. The network distance from every point on the stage to each of three facilities is less than or equal to 1.5. The total weight of the facilities within network distance 1.5 of every point on the stage is therefore 3, which is the maximum in this scenario. Hence, the stage is an optimal result of this MaxRS query, and the user can choose any hotel on it.
In this paper, we propose an external-memory algorithm for MaxRS queries in a road network. The proposed algorithm is suitable for a large road network database. In addition, in contrast to the existing methods, which find only one optimal location, our proposed algorithm finds all the possible optimal locations. This can help clients with diverse interests choose their own best locations by considering additional conditions.
The remainder of this paper is organized as follows. In Section 2, the problem is formally defined, and in Section 3, the details of the proposed algorithm are provided. In Section 4, the performance evaluation results are presented. In Section 5, some related work is reviewed. Finally, Section 6 concludes the paper.
2. Problem Formulation
A road network is represented by an undirected graph that consists of a set of vertices (i.e., nodes) and a set of edges. Each facility in the facility set is located on an edge of the graph and is associated with a positive weight.
Definition 1 (network range and network radius). The network range of a point in a road network consists of all points in the network whose network distance to that point is less than or equal to a given value, which is called the network radius of the point.
Definition 2 (a MaxRS query in road network). Given a road network, a set of positive-weighted points (facilities), and a network radius value, consider for each point in the network its network range and the set of facilities covered by that range. Then, a maximizing range sum (MaxRS) query in a road network finds all points in the network that maximize the total weight of the facilities covered by their network range.
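Definition 2 can also be written compactly. The notation below is assumed here for illustration only and is not necessarily the paper's own: G is the road network, F the facility set, w(f) the weight of facility f, d_N the network distance, and r the network radius.

```latex
\[
  R_p = \{\, q \in G : d_N(p, q) \le r \,\}, \qquad
  F_p = \{\, f \in F : d_N(p, f) \le r \,\}, \qquad
  W(p) = \sum_{f \in F_p} w(f)
\]
\[
  \mathrm{MaxRS}(G, F, r) = \{\, p \in G : W(p) = \max_{q \in G} W(q) \,\}
\]
```

Here R_p is the network range of p, F_p is the set of facilities it covers, and the query result is the set of all points whose covered weight W(p) attains the maximum.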
3. The Proposed Method
3.1. Preliminaries
In this subsection, we review the idea of transforming the max-enclosing rectangle query into the rectangle intersection query discussed in [14], which is the fundamental idea for processing MaxRS queries in Euclidean space [10].
Definition 3 (max-enclosing rectangle query). Given a set of points and a rectangle of a given size, a max-enclosing rectangle query finds a location of the rectangle at which it encloses the maximum number of points in the set.
The MaxRS query computes the total weight of the covered points, whereas the max-enclosing rectangle query counts the number of points in the rectangle. Note that when all points have weight 1, the result of the MaxRS query equals that of the max-enclosing rectangle query.
Definition 4 (rectangle intersection query). Given a set of rectangles, a rectangle intersection query finds the area where the largest number of rectangles overlap.
Figure 4 shows examples of the max-enclosing rectangle query and the rectangle intersection query. The transformation centers a rectangle of the query size at each point: a query rectangle centered at some location encloses a point exactly when that location lies inside the rectangle centered at the point. It can be observed from the figure that the optimal location in the max-enclosing rectangle query can be any point in the most overlapped area (i.e., the gray area, where 3 rectangles overlap), which is the outcome of the rectangle intersection query.
Our solution is based on the above idea. Consider an example of a MaxRS query in a road network shown in Figure 5. To simplify our discussion, we use a simple road network that consists of two edges and two facilities located on these edges.
In this example, we assume that the weight of each facility is 1 and the network radius is 1. The gray solid segments in Figure 5 indicate the network range of one facility, and the gray dotted segments indicate the network range of the other. Consider the set of all segments that make up the network ranges of all facilities in the road network. With regard to this segment set, we define the following two important notions for the MaxRS query in the road network.
Definition 5 (location-weight). The location-weight of a location in the road network, with regard to the segment set, equals the total weight of all segments in the set that cover the location.
Definition 6 (max-segment). A max-segment with regard to the segment set is a segment such that every point on it has the same location-weight and no point in the network has a higher location-weight.
From the transformation described above, we can see that the overlapping segment in Figure 5 is a max-segment. Because the max-segments together contain all the optimal locations (i.e., the result of the MaxRS query in the road network), we need to find all max-segments in the network to evaluate the MaxRS query.
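As a small illustration of Definition 5, the sketch below computes the location-weight of a point from a list of segments. The tuple layout (edge, start offset, end offset, weight), the edge name, and the offsets are assumptions made for this example, not the paper's data format; the two overlapping segments only mimic the situation of Figure 5.

```python
def location_weight(segments, edge, offset):
    """Sum the weights of all segments on `edge` that cover `offset`.

    Each segment is a hypothetical tuple (edge, start, end, weight), where
    start and end are offsets measured from one fixed end of the edge.
    """
    return sum(
        weight
        for seg_edge, start, end, weight in segments
        if seg_edge == edge and start <= offset <= end
    )


# Two facilities of weight 1 whose network ranges overlap on one edge.
segments = [
    (("n1", "n2"), 0.0, 1.0, 1),  # part of the range of the first facility
    (("n1", "n2"), 0.5, 1.5, 1),  # part of the range of the second facility
]

inside_overlap = location_weight(segments, ("n1", "n2"), 0.75)   # 2
outside_overlap = location_weight(segments, ("n1", "n2"), 0.25)  # 1
```

In this toy setting, the stretch from offset 0.5 to offset 1.0 has location-weight 2 everywhere and no point has a higher one, so it plays the role of a max-segment in the sense of Definition 6.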
3.2. Storage System
Similar to the disk-based storage model proposed in [13], the road network and the facility set are stored in secondary storage.
Figure 6 shows the files and indexes for the network and the facility set. In this storage model, the network (adjacency list) is stored in a flat file, which is indexed by a B+-tree. For each node, besides the information of the node itself (i.e., node identifier and coordinates), we also store information about all adjacent nodes, including the adjacent node identifier and the Euclidean distance between the node and its adjacent node (e.g., an edge of length 2.236). Similarly, the facility list is stored in a flat file and indexed by a B+-tree. To support the algorithm efficiently, besides the information of each facility (i.e., facility identifier, coordinates, and weight), we store information about the edge that contains the facility, including the start node identifier, the end node identifier, and the Euclidean distance (offset) between the start node and the facility (e.g., an offset of 1.0).
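The record layout described above might be modeled as follows. This is only an in-memory sketch: the class names, field names, and coordinates are hypothetical, and the flat files and B+-tree indexes of the actual storage model are not reproduced. The coordinates were chosen so that the edge length comes out as 2.236, the value quoted in the text.

```python
import math
from dataclasses import dataclass, field


@dataclass
class NodeRecord:
    node_id: int
    x: float
    y: float
    # adjacent node identifier -> Euclidean length of the connecting edge
    neighbors: dict = field(default_factory=dict)


@dataclass
class FacilityRecord:
    facility_id: int
    x: float
    y: float
    weight: float
    start_node: int   # start node of the edge containing the facility
    end_node: int     # end node of that edge
    offset: float     # Euclidean distance from the start node


def edge_length(a, b):
    """Euclidean distance between two node records."""
    return math.hypot(a.x - b.x, a.y - b.y)


def facility_on_edge(facility_id, start, end, offset, weight):
    """Place a facility `offset` away from `start` on the edge start-end."""
    t = offset / edge_length(start, end)
    return FacilityRecord(
        facility_id,
        start.x + t * (end.x - start.x),
        start.y + t * (end.y - start.y),
        weight,
        start.node_id,
        end.node_id,
        offset,
    )


n1 = NodeRecord(1, 0.0, 0.0)   # hypothetical coordinates
n2 = NodeRecord(2, 1.0, 2.0)
length = round(edge_length(n1, n2), 3)   # 2.236
n1.neighbors[n2.node_id] = length
n2.neighbors[n1.node_id] = length
f1 = facility_on_edge(1, n1, n2, offset=1.0, weight=1.0)
```

Storing the offset with the facility means the segment generation step can start from the facility record alone, without recomputing distances from coordinates.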
3.3. Main Algorithm
3.3.1. Overview
Our algorithm is based on the idea described in Section 3.1. From each facility, we generate segments that cover its network range. The segments generated from a facility carry the weight of that facility. These segments are organized in a seg-file. Then, we process the seg-file to find all max-segments. The proposed algorithm consists of the following three main steps:
(1) generating segments;
(2) inserting segments into the seg-file;
(3) processing the seg-file to find the max-segments.
3.3.2. Generating Segments
In this step, we generate segments for all facilities in the facility flat file. For each facility, we generate the segments that cover its entire network range. This process is described in Algorithm 1. First, we retrieve the information of the edge that contains the facility, namely, its start node and end node. Then, we generate the segments on the start node side (lines 8–16), after which we generate the segments on the end node side (lines 17–26). If the distance between the facility and the start node is greater than or equal to the network radius, we only need to generate one segment whose length equals the network radius (lines 9–10). Otherwise, we generate the segment between the facility and the start node (its length equals the offset of the facility, lines 13–14) and continue generating segments from the start node with the remaining network radius by calling the function recursiveGenerateSegs (line 15), which is described in Algorithm 2. We generate the segments on the end node side in the same way (the new offset is the length from the facility to the end node, line 17). Each newly generated segment carries the weight of the facility and the facility identifier. The facility identifier supports the merging process when more than one segment of the same facility is generated on one edge. The newly generated segments are inserted into the seg-file together with the edge that contains them. In our algorithm, we use a list (the finished-edge-list) to hold the edges that have been completely processed while generating the segments of a facility. The edges in this finished-edge-list are not processed again during the invocation of the function recursiveGenerateSegs. After the segments of a facility have been generated, we clear the finished-edge-list before generating the segments of the next facility (line 27).


After the segments from a facility to the start node (and to the end node) have been generated in Algorithm 1, if the network radius is greater than the distance between the facility and the start node (or the end node), the generation of segments continues from this start node (end node) with the shortened network radius (lines 15 and 25). This process is described in Algorithm 2, which lets the segments spread out over the network range of the facility.
In Algorithm 2, we enumerate all edges incident to the current node (i.e., the node from which we start generating segments). These edges are obtained from the neighbor list of the current node, excluding the old node, which has already been processed (line 1). To process an edge, we need to consider two situations. In the first situation, the edge does not exist in the finished-edge-list (line 5). If the length of this edge is greater than or equal to the new network radius, we only need to create a new segment that starts at the current node and extends toward the neighbor node, with its length equal to the new radius. Then, we insert this segment into the seg-file (lines 6–8). If the length of the edge is smaller than the new network radius, we create a new segment between the current node and the neighbor node and insert it into the seg-file, after which we continue generating segments from the neighbor node with the newly shortened network radius (line 13). In the second situation, the edge already exists in the finished-edge-list (lines 15–19). If the length of the edge is smaller than the new network radius, we only need to generate segments from the neighbor node with the newly shortened network radius (line 17). This process continues until the generated segments cover the network range of the original facility.
Figure 7 shows the process of generating the segments of one facility in the road network shown in Figure 3. In this example, the network radius is 1.5. First, we generate a segment of length 1 and then two segments of length 0.5 on two edges. After that, we generate a segment of length 0.803 and three segments, each of length 0.697, on three edges. Note that 1 + 0.5 = 1.5 and 0.803 + 0.697 = 1.5, so each branch uses up the network radius. The numbers next to the segments show the order in which they are generated.
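The segment generation step can be sketched in memory as follows. Note that this sketch deliberately substitutes a Dijkstra-style expansion for the recursive procedure with a finished-edge-list described above, and it is not the paper's Algorithm 1 or Algorithm 2. The function names, the adjacency-dictionary input, and the convention that offsets on an edge are measured from the endpoint with the smaller identifier are all assumptions of this example.

```python
import heapq
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def generate_segments(adj, edge, offset, radius):
    """Return {edge: [(start, end), ...]} covering the facility's network range.

    adj    : node -> {neighbor: edge length} (undirected, hypothetical layout)
    edge   : (start node, end node) of the edge containing the facility
    offset : distance from the start node to the facility
    radius : network radius
    """
    u, v = edge
    length = adj[u][v]

    # Network distance from the facility to every node within the radius.
    dist = {}
    heap = [(offset, u), (length - offset, v)]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > radius or node in dist:
            continue
        dist[node] = d
        for nxt, w in adj[node].items():
            if nxt not in dist:
                heapq.heappush(heap, (d + w, nxt))

    covered = defaultdict(list)

    # Part of the facility's own edge that is reached directly.
    a, b = (u, v) if u < v else (v, u)
    pos = offset if (a, b) == (u, v) else length - offset
    covered[(a, b)].append((max(0.0, pos - radius), min(length, pos + radius)))

    # From each reached node, spend the remaining radius on incident edges.
    for node, d in dist.items():
        reach = radius - d
        for nxt, w in adj[node].items():
            a, b = (node, nxt) if node < nxt else (nxt, node)
            if node == a:
                covered[(a, b)].append((0.0, min(w, reach)))
            else:
                covered[(a, b)].append((max(0.0, w - reach), w))

    return {e: merge_intervals(parts) for e, parts in covered.items()}


# A path A-B-C with edges of length 2; the facility sits in the middle of A-B.
adj = {"A": {"B": 2.0}, "B": {"A": 2.0, "C": 2.0}, "C": {"B": 2.0}}
ranges = generate_segments(adj, ("A", "B"), offset=1.0, radius=1.5)
# ranges == {("A", "B"): [(0.0, 2.0)], ("B", "C"): [(0.0, 0.5)]}
```

In the usage example, the range covers the whole edge A-B and the first 0.5 units of edge B-C, which matches the intuition that the remaining radius after reaching a node is spent on its incident edges.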
3.3.3. Inserting Segments into Seg-File
Segments generated in step 1 are inserted into the seg-file (together with the information of the containing edge). Algorithm 3 describes this insertion process. One important property of the seg-file is that all segments on the same edge are grouped into one record (edge-record). Thus, each edge-record in the seg-file consists of an edge together with the list of segments on that edge. The seg-file is indexed by a B+-tree. This structure of the seg-file helps to find max-segments efficiently.

When we insert a segment into the seg-file, if the seg-file contains no edge-record for the edge of that segment, we create a new edge-record for it and insert it into the seg-file (lines 3–4). If an edge-record for that edge already exists in the seg-file, we need to check whether this edge-record contains any segments of the same facility. If this is the case, we need to merge these existing segments with the new segment (lines 7–13). The mergeSegment function merges two segments on the same edge (line 8). Figure 8 shows some possible positions of two segments on an edge. In the first three situations, the mergeSegment function returns one new segment, whereas in the last situation, it returns null (the two segments cannot be merged). After updating the segment list of the edge-record, we update this edge-record in the seg-file (lines 15–16).
Figure 8: Positions of two segments on an edge, cases (a)–(d).
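The insertion step can be illustrated with a small in-memory sketch. The names merge_segment and insert_segment, the dictionary standing in for the B+-tree-indexed seg-file, and the tuple layout of a segment are assumptions of this example. It also assumes that two segments that merely touch at one point count as mergeable; the exact cases of Figure 8 are not reproduced here.

```python
def merge_segment(s, t):
    """Merge two (start, end) segments on the same edge.

    Returns the combined segment if they overlap or touch, else None.
    """
    if max(s[0], t[0]) > min(s[1], t[1]):
        return None  # disjoint: cannot be merged
    return (min(s[0], t[0]), max(s[1], t[1]))


def insert_segment(seg_file, edge, facility_id, weight, segment):
    """Insert a segment into the edge-record of `edge`.

    seg_file maps an edge to its edge-record, a list of
    (facility_id, weight, (start, end)) entries. Segments of the same
    facility that overlap the new one are replaced by their merge.
    """
    record = seg_file.setdefault(edge, [])
    kept = []
    for other_id, other_weight, other_segment in record:
        merged = None
        if other_id == facility_id:
            merged = merge_segment(segment, other_segment)
        if merged is None:
            kept.append((other_id, other_weight, other_segment))
        else:
            segment = merged  # absorb the existing segment
    kept.append((facility_id, weight, segment))
    seg_file[edge] = kept


seg_file = {}
edge = ("n1", "n2")
insert_segment(seg_file, edge, "f1", 1, (0.0, 1.0))
insert_segment(seg_file, edge, "f1", 1, (0.5, 2.0))  # merged with the first
insert_segment(seg_file, edge, "f2", 1, (0.5, 1.5))  # other facility: kept apart
# seg_file[edge] == [("f1", 1, (0.0, 2.0)), ("f2", 1, (0.5, 1.5))]
```

Merging only segments of the same facility matters for correctness: without it, a location covered twice by the range of one facility would have that facility's weight counted twice in its location-weight.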
Figure 9 shows the records in the seg-file after the segment generation step and the segment insertion step have finished. In the figure, the segments generated from each of the four facilities are drawn in a distinct style: gray dotted, gray solid, black solid, and black dotted. Each record is associated with one edge (drawn as a thin solid line). For example, the first record in Figure 9 is associated with one edge and contains one segment generated from one facility.
3.3.4. Finding Max-Segments
After the seg-file has been constructed, Algorithm 4 is invoked to find the max-segments from the seg-file.

In this algorithm, we first find the locally optimal segments in each edge-record (line 4), after which we compare the maximum weights of the segments on these edge-records; the segments that have the overall maximum weight are added to the list that forms the final result (lines 6–14). The locally optimal segments are found by the function lineSweep, which is the one-dimensional (line) version of the plane-sweep algorithm proposed in [11].
Figure 10 illustrates the line-sweep algorithm on the record associated with one edge. Assume that we are sweeping along an edge. If we meet the start point of a segment (e.g., position 1 in the case of segment 2), the weight of this segment is included in the calculation of the local maximum-weight segment; if we meet the end point of a segment (e.g., position 4 in the case of segment 2), we remove the weight of this segment from the calculation. In the figure, the segment from position 3 to position 4 on the edge is the local maximum-weight segment of this record.
After the max-segment finding step has finished, we can see from Figure 11 that two segments, located on two different edges, are max-segments with the maximum weight (i.e., 3) in the example of Figure 3 (we assume that the weight of each facility is 1).
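The two stages described above, a line sweep per edge-record followed by a comparison across records, can be sketched as follows. This is an in-memory illustration under assumptions: the function names, the (start, end, weight) tuple layout, and the example offsets are hypothetical, segments are treated as closed intervals (so a start is processed before an end at the same position), and weights are assumed to be exactly comparable (e.g., integers).

```python
def line_sweep(segments):
    """Find the maximum location-weight on one edge and where it is attained.

    segments: list of (start, end, weight) on a single edge.
    Returns (max_weight, [(lo, hi), ...]).
    """
    events = []
    for start, end, weight in segments:
        events.append((start, 0, weight))   # kind 0: start point, add weight
        events.append((end, 1, -weight))    # kind 1: end point, remove weight
    events.sort()                           # starts sort before ends on ties

    best, current, result = 0, 0, []
    open_at = None                          # start of a stretch at weight `best`
    for pos, kind, delta in events:
        if kind == 1 and current == best and open_at is not None:
            result.append((open_at, pos))   # a maximal stretch ends here
            open_at = None
        current += delta
        if kind == 0:
            if current > best:              # new maximum: discard old stretches
                best, result, open_at = current, [], pos
            elif current == best:
                open_at = pos
    return best, result


def find_max_segments(seg_file):
    """Compare the local results of all edge-records and keep the global best."""
    best, result = 0, []
    for edge, segments in seg_file.items():
        local_best, local_segments = line_sweep(segments)
        if local_best > best:
            best, result = local_best, []
        if local_best == best:
            result.extend((edge, lo, hi) for lo, hi in local_segments)
    return best, result


# Hypothetical seg-file with three edge-records and unit weights.
seg_file = {
    ("n1", "n2"): [(0.0, 2.0, 1), (1.0, 3.0, 1), (1.5, 4.0, 1)],
    ("n2", "n3"): [(0.0, 1.0, 1), (0.5, 1.5, 1), (0.5, 2.0, 1)],
    ("n3", "n4"): [(0.0, 1.0, 1)],
}
best, max_segments = find_max_segments(seg_file)
# best == 3
# max_segments == [(("n1", "n2"), 1.5, 2.0), (("n2", "n3"), 0.5, 1.0)]
```

As in the example of Figure 11, the toy input yields two max-segments of weight 3 on two different edges. Because each edge-record is swept independently, only one record needs to be held in memory at a time, which is the property the grouped seg-file is designed to exploit.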
4. Performance Evaluation
4.1. Simulation Setup
We use two real datasets, namely, the North America (NA) road network and the San Francisco (SF) road network. These datasets are depicted in Figure 12. The NA dataset is obtained from http://www.cs.fsu.edu/~lifeifei/SpatialDataset.htm and the SF dataset is obtained from [15]. The cardinalities of the datasets are shown in Table 1.
Because this is the first work on processing MaxRS queries in a road network database, we develop a naive algorithm to compare with our proposed algorithm. The naive algorithm uses an unstructured seg-file, and thus the generated segments are inserted directly into the seg-file in step 2 (segments on the same edge are not grouped into one edge-record). In step 3, the naive algorithm reads the segments from the seg-file, groups the segments on the same edge, and finds the max-segments.
We use a disk-based storage model to store very large road network databases, so in our simulation the performance metric is the number of I/Os, that is, the number of blocks read from or written to files. We do not consider CPU time because it is dominated by the I/O cost [10, 12, 16]. The default values of the parameters are shown in Table 2.
4.2. Simulation Results
4.2.1. Effect of the Number of Facilities
Figure 13 shows the effect of the number of facilities on the I/O cost. For both datasets NA and SF, when the number of facilities increases, the I/O cost increases. However, the proposed method is much less sensitive to this parameter than the naive algorithm.
Figure 13: Effect of the number of facilities on the I/O cost: (a) NA; (b) SF.
4.2.2. Effect of the Network Radius
Figure 14 shows the results for varying network radius (network range). When the network radius increases, the number of segments increases, and thus the I/O cost also increases. The increase in I/O cost is greater for the SF dataset than for the NA dataset because the density of edges in SF is higher than in NA; therefore, more segments are generated for SF than for NA.
Figure 14: Effect of the network radius on the I/O cost: (a) NA; (b) SF.
4.2.3. Effect of the Buffer Size
Figure 15 shows the results for varying buffer size. Although both algorithms perform better as the buffer size increases, the proposed algorithm is more sensitive to the buffer size than the naive algorithm.
Figure 15: Effect of the buffer size on the I/O cost: (a) NA; (b) SF.
4.2.4. Effect of the Block Size
Figure 16 shows the results for varying block size. We can see that when the block size increases, the I/O cost decreases. This is because as the block size increases, the number of objects stored in a block also increases, which reduces the number of blocks that are read or written. As in the buffer size case, the proposed algorithm is more sensitive to the block size than the naive algorithm.
Figure 16: Effect of the block size on the I/O cost: (a) NA; (b) SF.
5. Related Work
In this section, we review related work on the facility location optimization problem in general and the MaxRS problem in particular.
Facility Location Optimization Problem. The MaxRS problem can be seen as an instance of the facility location optimization problem, which has been studied extensively in recent years. The aim of the facility location optimization problem is to find an optimal location that maximizes or minimizes an objective function. Cabello et al. introduced and investigated optimization problems according to the bichromatic reverse nearest neighbor (BRNN) rule [17], while Wong et al. [18] studied a related problem called MaxBRNN, which finds an optimal region that maximizes the size of BRNNs. These two problems are studied in space. Du et al. [19] proposed the optimal-location query, which returns a location with maximum influence, where the influence of a location is the total weight of its RNNs. In an extended version of [19], Zhang et al. [20] proposed and solved the min-dist optimal-location query.
There are also some studies specifically about facility location optimization in road network databases. Xiao et al. [21] studied optimal location queries in road networks and introduced three important types of optimal location queries: the competitive location query, the MinSum location query, and the MinMax location query. Yan et al. also proposed algorithms for finding an optimal meeting point, which has the smallest sum of network distances to all the points of a given point set in a road network [22].
MaxRS Problem. Imai and Asano proposed an optimal algorithm for the max-enclosing rectangle problem [11] with time complexity O(n log n), where n is the number of rectangles. Nandy and Bhattacharya presented another algorithm, based on the interval tree data structure, with the same cost [14]. Those algorithms are internal-memory algorithms. Choi et al. [10] proposed an algorithm for solving the MaxRS problem in external memory with optimal I/O cost. Tao et al. [12] proposed a new problem called (1 − ε)-approximate MaxRS, which returns a solution whose covered weight is at least (1 − ε) times the optimal covered weight, where ε is an arbitrarily small constant between 0 and 1.
Another version of the MaxRS problem is the maximizing circular range sum (MaxCRS) problem, in which the query range is a circle instead of a rectangle. Chazelle and Lee [23] proposed an algorithm for solving the max-enclosing circle problem with time complexity O(n^2). As the max-enclosing circle problem is 3SUM-hard [24], for which the best known algorithm takes O(n^2) time, many studies have used approximate approaches to solve the max-enclosing circle problem. Aronov and Har-Peled [25] gave a Monte Carlo (1 − ε)-approximation algorithm for unweighted point sets that runs in O(n ε^{-2} log n) time; this algorithm can be extended to the weighted case, giving an algorithm that uses O(n ε^{-2} log^2 n) time. de Berg et al. [26] proposed another approximation algorithm for the max-enclosing circle problem with time complexity O(n log n + n ε^{-3}). The MaxCRS problem is also addressed in [10] by a novel reduction that converts the MaxCRS problem to the MaxRS problem.
6. Conclusions
The MaxRS problem can be used in location-based applications to find the most profitable or the most serviceable place. All previous studies assume the Euclidean distance; however, in many location-based applications, the network distance is used instead. This paper proposed an efficient algorithm for solving the MaxRS problem in a road network database. We proposed an external-memory algorithm, which is suitable for large road network datasets. Our algorithm returns all optimal locations (max-segments) on the network, whereas all previous methods return only one result. This can help clients with diverse interests choose their own best locations by considering additional conditions. As future work, we plan to improve our method and analyze the complexity of the algorithm.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF2013R1A1A2061269) and this research was funded by the MSIP (Ministry of Science, ICT & Future Planning), Korea, in the ICT R&D Program 2013.