Abstract

A group nearest neighbor (GNN) query enables a group of location-based service (LBS) users to retrieve the point of interest (POI) with the minimum aggregate distance to them. Because of resource constraints and privacy concerns, the LBS provider outsources the encrypted POIs to a powerful cloud server. This encryption-and-outsourcing mechanism makes data utilization challenging. Previous work based on anonymity techniques leaks the contents of the POIs and returns an answer set that incurs redundant communication cost, so the LBS system cannot work properly with those privacy-preserving schemes. In this paper, we propose a secure group nearest neighbor query scheme, referred to as SecGNN. It supports GNN queries from LBS users and assures data privacy and query privacy. Since SecGNN only achieves linear search complexity, an efficiency-enhanced scheme (named Sec) is introduced by taking advantage of the KD-tree data structure. Specifically, we convert the GNN problem to the nearest neighbor problem for the users' centroid, which can be computed by the anonymous veto network and Burmester–Desmedt conference key agreement protocols. Furthermore, the Sec scheme is built from the KD-tree data structure and a purpose-designed tool that supports the computation of inner products over ciphertexts. Finally, we run experiments on a real database and a random database to evaluate the performance of our SecGNN and Sec schemes. The experimental results show that the Sec scheme answers a query in about 0.2 s on a database with one million items.

1. Introduction

With the prevalence of mobile phones and the rapid development of wireless networks, location-based services (LBS) allow users to issue location queries according to their interests [1–3]. For instance, in an unfamiliar place, LBS users can find the nearest restaurant by sending an LBS query to Google Maps. A new employee can likewise find the shortest path to the company by sending an LBS query. The group nearest neighbor (GNN) query, a common operation in the LBS system, lets a group of LBS users retrieve a meeting place from the points of interest (POIs) with the minimum sum of distances to them.

With limited resources and frequent queries, the LBS provider usually outsources the heavy POI database to a powerful cloud server [4]. The LBS provider can thus enjoy rich storage and computation resources, but database outsourcing raises privacy problems for sensitive information. In such a case, the LBS provider adopts the encryption-before-outsourcing mechanism, which avoids the potential leakage of sensitive information. However, database encryption makes data utilization in the LBS system challenging.

Hashem et al. [5] proposed the first privacy-preserving group nearest neighbor query scheme based on an anonymity technique. Users compute and send location regions instead of actual locations to the cloud server to assure the privacy of their actual locations. Then, the cloud server searches the POIs over plaintexts and returns an answer set that contains the group nearest neighbor (namely, the meeting place) for the actual locations. To extract the real GNN from the answer set, the authors proposed a private filter technique that avoids leaking the actual locations and the meeting place. With a spatial technique, users in scheme [6] obfuscate their actual locations in cloak regions and construct a single centroid region for the nearest neighbor query. In the second phase, users compute the meeting place from the received answer set by using a special secure multiparty computation. In the above schemes, the cloud server returns a superset instead of the real answer, which increases the communication cost.

As an improvement, Wu et al. [7, 8] introduced another scheme that mixes actual locations with dummy locations. Users put their actual locations at a specific position, which is shared among them, and send location sets to the cloud server. The cloud server searches and generates candidate results for the locations at the same index in the location sets. Furthermore, it retrieves and returns the real GNN as a ciphertext by leveraging generalized Paillier encryption [9]. Finally, users decrypt the ciphertext and obtain the meeting place. Unfortunately, all of these solutions [5–8] rely on anonymity techniques to assure the privacy of sensitive locations. It is noteworthy that $k$-anonymity only assures that an attacker can identify the actual location with an advantage of at most $1/k$, but the content may still be leaked. For instance, all candidate results in scheme [8] remain visible, and an attacker can identify actual locations with advantage $1/k$. Such privacy-preserving methods cannot satisfy the security requirements in practice.

Huang and Vishwanathan [13] proposed two GNN schemes, for the centralized and the distributed model, based on cryptographic techniques. The authors leverage garbled circuits (GC) and oblivious transfer (OT) to construct a secure multiparty computation protocol, with which users can jointly compute a meeting place. However, the scheme [13] suffers from a heavy computation and communication burden. Therefore, we seek a privacy-preserving GNN scheme based on cryptographic tools that offers full security protection and high efficiency.

1.1. Our Contributions

In this paper, we design two privacy-preserving group nearest neighbor query schemes, the SecGNN scheme and the Sec scheme, for the LBS system in cloud computing. The SecGNN scheme preserves data privacy and query privacy while achieving linear search complexity. To enhance query efficiency, we introduce the Sec scheme based on a tree structure. The main contributions are as follows.
(1) We convert the problem of finding the group nearest neighbor to the problem of finding a point in the POIs with the minimum aggregate inner products. Thus, the construction of the secure GNN scheme is modeled as designing a privacy-preserving scheme that preserves inner products. Based on the asymmetric scalar-product-preserving encryption (ASPE) technique [15], we design a basic tool to compute a special inner product and then introduce a secure group nearest neighbor query scheme, which is referred to as the SecGNN scheme.
(2) Furthermore, we prove that finding the group nearest neighbor for LBS users is equivalent to finding the nearest neighbor for their centroid, which can be secretly computed by using the anonymous veto network protocol and the Burmester–Desmedt conference key agreement protocol. Thus, the Sec scheme is built on the nearest neighbor search for the centroid and further improves the search efficiency of the SecGNN scheme. The proposed Sec scheme achieves faster-than-linear search complexity that depends on the number of data items in the database, the dimension of the data items, and the size of the answer set.
(3) Moreover, we give the correctness analysis and security proof of both schemes. The security proof shows that our SecGNN and Sec schemes achieve data privacy and query privacy.
(4) Finally, we evaluate the performance of the SecGNN and Sec schemes through experimental simulation on a real database with 62,556 data items and a random database with 1,000,000 data items. The experimental results show that the Sec scheme achieves practical search efficiency, about 0.2 s on a database with one million items.

1.2. Related Work

A group nearest neighbor query enables a group of LBS users to retrieve a point with the minimum aggregate distance. The authors of [16] presented the multiple query method (MQM), single-point method (SPM), and minimum bounding method (MBM) for processing GNN queries. Moving a step forward, Papadias et al. [17] generalized the GNN query to minimize the maximum or minimum distance. Because of the heavy storage and computation burden, the resource-constrained LBS provider usually outsources the database to a cloud server with powerful storage and computation resources. Since the cloud server is not fully trusted and may steal sensitive information, privacy-preserving methods are adopted to assure data security. As shown in Table 1, numerous studies [5, 6, 13, 18, 19] have sought to protect sensitive location information from the dishonest cloud server and outside attackers.

Hashem et al. [5] introduced the first privacy-preserving GNN scheme by leveraging the k-anonymity technique. They described the privacy assurance solution in three steps: sending the query request, searching the GNN, and finding the actual meeting place. That is, each user first sends a cloaked rectangle that contains his or her actual location to the cloud server. After receiving the query rectangles, the cloud server processes the group nearest neighbor query for the rectangles over plaintexts and returns an answer set that contains the GNN with respect to the users' actual locations. For each point in the answer set, the cloud server also returns the sum of its maximum distances to the query rectangles. Each LBS user in turn updates this sum by replacing the maximum distance to his or her own rectangle with the distance to his or her actual location. This process continues until all LBS users have updated the sum. The point with the minimum updated sum in the answer set is the desired GNN. However, the scheme [5] is expensive in communication cost and fails to assure query privacy.

Subsequently, Ashouri-Talouki et al. [18] presented an efficiency-enhanced group nearest neighbor query scheme. Users compute either the minimum bounding rectangle (MBR) that contains all of them or their centroid by means of the anonymous veto network (AV-net) and the Paillier encryption scheme [20]. Then, the cloud server searches the nearest neighbor (NN) over plaintexts for the MBR or centroid. It is worth noting that the authors take the NN for the centroid as an approximate GNN, and the scheme [18] fails to protect the location privacy of the meeting place. The authors designed another privacy-preserving group nearest neighbor query scheme from cloaked-centroid protocols [6]. LBS users hide their precise locations in cloaked regions by generating rectangles that contain their exact locations. The cloaked region can be adjusted according to the time and privacy requirements. LBS users publish their cloaked regions on a public bulletin board. After publishing, LBS users compute the cloaked-centroid region and submit it to the cloud server for an NN query. In the next phase, the cloud server searches the POIs and returns an answer set to them. LBS users compute the approximate meeting place by using the AV-net protocol. However, the cloud server returns a superset instead of the real answer, which increases the communication cost.

Wu et al. [7, 8] presented a privacy-preserving GNN scheme based on the $k$-anonymity method. LBS users use dummy locations to blur their actual locations and store the exact location at a specific position that is shared among them. A chosen LBS user runs the generalized Paillier encryption to generate the ciphertext of the indicator vector. The cloud server searches the group nearest neighbor over plaintexts and computes the encrypted answer by a private selection. Finally, LBS users decrypt the ciphertext and obtain the real meeting place. Unfortunately, all these solutions only protect query privacy with an advantage of at most $1/k$, where $k$ is the number of dummy locations. Furthermore, they fail to protect the privacy of the POIs, since the cloud server searches the GNN or NN over plaintexts.

Some privacy-preserving GNN schemes [13, 21] based on cryptographic tools have been introduced. Huang and Vishwanathan [13] proposed a secure GNN scheme from garbled circuits and oblivious transfer. In solution [13], LBS user Charlie constructs a garbled circuit for the GNN, and user David executes the computation task. Other users obtain ciphertexts from Charlie through oblivious transfer and send the ciphertexts to David. Finally, David obtains the GNN by evaluating the garbled circuit. LBS users in [21] jointly compute the aggregate distance for all POIs by using the anonymous veto network and Burmester–Desmedt key establishment protocols. Thus, users can obtain the precise meeting place. However, both schemes [13, 21] impose high computation and communication costs on LBS users and are impractical in the real world.

1.3. Organization

The rest of this work is organized as follows. Some preliminaries and their definitions will be demonstrated in Section 2. In Section 3, we present the system model and privacy requirements of the proposed schemes. The introduced SecGNN scheme and its security analysis are illustrated in Section 4. We further describe the efficiency-enhanced Sec scheme and its security analysis in Section 5. Finally, the performance evaluation and conclusion will be discussed in Sections 6 and 7.

2. Preliminaries

In this section, we introduce some preliminaries of KD-tree, anonymous veto network protocol, and Burmester–Desmedt conference key agreement protocol.

2.1. KD-Tree

The KD-tree is a hierarchical data structure that is widely used to improve the efficiency of range queries and nearest neighbor queries. It has $O(n)$ storage cost, $O(n \log n)$ construction time cost, and $O(n^{1-1/d} + k)$ query time cost, where $n$ is the size of the database, $d$ is the dimension of the data records in the database, and $k$ is the size of the answer set.

The KD-tree is a binary tree in which every nonleaf node has two children. The children are split based on the $x$ coordinate or the $y$ coordinate of their father node. From the root to the leaf nodes, the splitting is done on the $x$ coordinate at the root, on the $y$ coordinate for the children, next on the $x$ coordinate for the grandchildren, and so on. The KD-tree answers a nearest neighbor query as follows:
(1) The search descends according to the splitting line (e.g., the $x$ coordinate or $y$ coordinate of the father node) and quickly arrives at a leaf node. We take the leaf node value as a temporary nearest neighbor and record the path from the root to this leaf node.
(2) We check whether the father node is closer to the query point than the temporary nearest neighbor. If it is, we update the temporary nearest neighbor to the father node. Furthermore, we check whether the distance from the query point to the splitting line is smaller than the distance to the temporary nearest neighbor. If so, we search the other child. The checking continues back along the path until it reaches the root.
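The two steps above can be sketched in a few lines of Python. This is only an illustrative plaintext sketch under our own assumptions: the names `Node`, `build`, `sq_dist`, and `nearest` are hypothetical, the tree is built by median splitting, and the search is written recursively, so each node is compared with the temporary nearest neighbor on the way down rather than while walking back along a stored path. The pruning test against the splitting line is the same as in step (2).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    point: Point                    # POI stored at this node
    axis: int                       # 0: split on x, 1: split on y
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a 2-D KD-tree by median splitting, alternating x and y."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(node: Optional[Node], q: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Return the stored point closest to q."""
    if node is None:
        return best
    # Update the temporary nearest neighbor if this node is closer.
    if best is None or sq_dist(q, node.point) < sq_dist(q, best):
        best = node.point
    # Signed offset of q from the splitting line through this node.
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, best)
    # Visit the other child only if the circle around q with the current
    # best radius crosses the splitting line.
    if diff * diff < sq_dist(q, best):
        best = nearest(far, q, best)
    return best
```

For a small database, the result can be compared with a brute-force scan over all points, which is a convenient sanity check for the pruning rule.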

Figure 1 illustrates an example of a KD-tree. Given a query point, the search first descends to a leaf node and takes its value as the temporary nearest neighbor. On backing up to the father node, we find that the father node is closer to the query point than the temporary nearest neighbor, so we update the temporary nearest neighbor to the father node. Next, we check whether the circle centered at the query point with radius 0.14 (the distance from the query point to the temporary nearest neighbor) intersects the splitting line through the father node. Since it does, we check whether the point in the other child is closer than the temporary nearest neighbor. It is not, so we go back to check the next father node and finally obtain the nearest neighbor for the query point.

2.2. Anonymous Veto Network Protocol

In the anonymous veto network protocol and the following Burmester–Desmedt protocol, it is assumed that $G$ is a finite cyclic group of prime order $q$ in which the decisional Diffie–Hellman (DDH) problem is intractable. The generator of $G$ is $g$, and all computations take place in $G$. There are $n$ members in the group, $U_1, U_2, \ldots, U_n$, and they agree on $(G, g)$.

The anonymous veto network (AV-net) protocol was introduced by Feng et al. [22, 23] to solve the problem of anonymous veto. The AV-net protocol contains two phases. In the first phase, users $U_1, \ldots, U_n$ choose random numbers $x_1, \ldots, x_n \in \mathbb{Z}_q$ and publish the random ephemeral public keys $g^{x_1}, \ldots, g^{x_n}$, respectively. Then, each user $U_i$ computes $g^{y_i}$ by multiplying all the random ephemeral public keys before $g^{x_i}$ and dividing by all the random ephemeral public keys after $g^{x_i}$:

$$g^{y_i} = \prod_{j=1}^{i-1} g^{x_j} \Big/ \prod_{j=i+1}^{n} g^{x_j}.$$

In the next phase, users compute and broadcast $g^{c_1 y_1}, g^{c_2 y_2}, \ldots, g^{c_n y_n}$, where $c_i = x_i$ if user $U_i$ does not veto; otherwise, $c_i$ is a random number $r_i$ ($r_i \neq x_i$). If no one vetoes, we have $\prod_{i=1}^{n} g^{c_i y_i} = 1$ because of the vanishing property of the AV-net exponents, $\sum_{i=1}^{n} x_i y_i = 0$. Otherwise, $\prod_{i=1}^{n} g^{c_i y_i} \neq 1$. Meanwhile, vetoing users remain anonymous.
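The two AV-net phases can be tried out with a toy group. The sketch below is a minimal illustration, not a secure implementation: the parameters `P = 2039`, `Q = 1019`, and `G = 4` (the order-1019 subgroup of the integers modulo the safe prime 2039) are our own assumptions chosen for readability, and the function names `mask`, `round2`, and `tally` are hypothetical. Secrets are passed in explicitly so that the behavior is reproducible.

```python
# Toy parameters (assumption, far too small for real use):
# P is a safe prime, Q = (P - 1) / 2 is prime, G generates the order-Q subgroup.
P, Q, G = 2039, 1019, 4


def inv(a: int) -> int:
    """Modular inverse in Z_P^* via Fermat's little theorem."""
    return pow(a, P - 2, P)


def mask(i: int, pubs: list) -> int:
    """g^{y_i}: product of keys before user i divided by product of keys after."""
    num = 1
    for pk in pubs[:i]:
        num = num * pk % P
    den = 1
    for pk in pubs[i + 1:]:
        den = den * pk % P
    return num * inv(den) % P


def round2(xs: list, vetoes: list, rs: list) -> list:
    """Second-round broadcasts g^{c_i y_i}; c_i = x_i (no veto) or r_i (veto)."""
    pubs = [pow(G, x, P) for x in xs]        # first-round ephemeral keys g^{x_i}
    out = []
    for i, x in enumerate(xs):
        c = rs[i] if vetoes[i] else x
        out.append(pow(mask(i, pubs), c, P))
    return out


def tally(broadcasts: list) -> int:
    """Product of all broadcasts; it equals 1 when nobody vetoed."""
    prod = 1
    for b in broadcasts:
        prod = prod * b % P
    return prod
```

With secrets `xs = [5, 17, 123, 700]`, `tally(round2(xs, [False] * 4, [0] * 4))` evaluates to 1, while a veto by any single user with a fresh random exponent makes the product differ from 1 except with small probability in this toy group.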

2.3. Burmester–Desmedt Conference Key Agreement Protocol

The Burmester–Desmedt (BD) protocol was developed by Burmester and Desmedt to establish a conference key [24, 25]. That is, users $U_1, U_2, \ldots, U_n$ aim to establish a common conference key. The indices are taken in a cycle, namely, $U_{n+1} = U_1$ and $U_0 = U_n$. As shown in Figure 2, the BD protocol includes two processes. In the first process, users $U_1, \ldots, U_n$ choose random numbers $r_1, \ldots, r_n \in \mathbb{Z}_q$ and broadcast $z_i = g^{r_i}$. In the second process, each user $U_i$ publishes $X_i = (z_{i+1}/z_{i-1})^{r_i}$ and later computes the conference key by the following equation:

$$K_i = z_{i-1}^{n r_i} \cdot X_i^{n-1} \cdot X_{i+1}^{n-2} \cdots X_{i-2}.$$

Thus, users $U_1, U_2, \ldots, U_n$ have the same conference key:

$$K = g^{r_1 r_2 + r_2 r_3 + \cdots + r_n r_1}.$$

Considering the intractability of the Diffie–Hellman problem in $G$, the conference key is only computable by group users, and adversaries can learn no information about it.
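As a sanity check of the key equation, the following sketch runs both BD processes for all users at once. It is a toy illustration under our own assumptions: the group parameters `P`, `Q`, `G` are the same small values as one might use for any classroom example, and `bd_keys` is a hypothetical helper name; indices are zero-based and taken modulo the number of users.

```python
# Toy parameters (assumption, NOT secure): order-Q subgroup of Z_P^*, generator G.
P, Q, G = 2039, 1019, 4


def inv(a: int) -> int:
    """Modular inverse in Z_P^* via Fermat's little theorem."""
    return pow(a, P - 2, P)


def bd_keys(rs: list) -> list:
    """Return the conference key computed by every user from secrets rs."""
    n = len(rs)
    # First process: each user broadcasts z_i = g^{r_i}.
    z = [pow(G, r, P) for r in rs]
    # Second process: each user publishes X_i = (z_{i+1} / z_{i-1})^{r_i}.
    X = [pow(z[(i + 1) % n] * inv(z[(i - 1) % n]) % P, rs[i], P)
         for i in range(n)]
    keys = []
    for i in range(n):
        # K_i = z_{i-1}^{n r_i} * X_i^{n-1} * X_{i+1}^{n-2} * ... * X_{i-2}
        k = pow(z[(i - 1) % n], n * rs[i], P)
        for j in range(n - 1):
            k = k * pow(X[(i + j) % n], n - 1 - j, P) % P
        keys.append(k)
    return keys
```

Every entry of `bd_keys(rs)` should coincide with `pow(G, sum(rs[i] * rs[(i + 1) % n] for i in range(n)) % Q, P)`, which is the closed form of the conference key.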

3. Problem Formulation

In this section, we first illustrate the system model and then describe the privacy requirements of the group nearest neighbor query schemes.

3.1. System Model

In this paper, we aim to design a secure and efficient group nearest neighbor query scheme. As shown in Figure 3, the system model of our scheme consists of three entities: the cloud server, the LBS provider, and the LBS users. The cloud server has powerful storage and computation resources. It provides storage and computation services for the LBS provider and searches the encrypted database after receiving search tokens from LBS users. Because of constrained resources and privacy concerns, the LBS provider (e.g., a company), which owns the database, chooses the encryption-and-outsourcing mechanism to enjoy the powerful resources provided by the cloud server. According to their requirements, LBS users (e.g., employees) generate and submit search tokens to the cloud server. Subsequently, they obtain the encrypted results and decrypt them to get the final results. In this paper, we suppose that LBS users always submit a query request for a meeting place. That is, LBS users want to find a meeting place that has the minimum aggregate distance to them.

Furthermore, we assume that the cloud server is honest-but-curious. That is, the cloud server follows the protocol faithfully but tries to learn information about the sensitive database. We also assume that the LBS provider is honest and that LBS users may try to learn other users' actual locations but do not collude with one another. These assumptions are common in other papers [26, 27].

Based on the assumption, we consider two threat models, known ciphertext model and known background model. In the known ciphertext model, only ciphertexts and encrypted index are known to the cloud server. In the known background model, the cloud server holds more information, such as the distribution of data records and queries. Both models are considered in some existing works [28, 29].

3.2. Privacy Requirements

Under the above assumptions, the proposed scheme should protect the confidentiality of the outsourced database and the query data. The privacy requirements are as follows.
(1) Data privacy: the LBS provider owns the original database and builds an index to improve the search efficiency. Both the data and the index contain sensitive information. Thus, a secure group nearest neighbor query scheme should ensure that outside attackers (e.g., the cloud server) cannot infer the exact information from the encrypted database and index.
(2) Query privacy: the authorized LBS users submit GNN queries to the cloud server, and the latter executes search operations. However, the cloud server must not be able to infer the specific contents of query requests from the search token and encrypted results.

3.3. Notations

For the convenience of describing SecGNN and Sec schemes, we illustrate some notations in Table 2.

4. Secure Group Nearest Neighbor Query

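In this section, we present a secure group nearest neighbor query scheme called SecGNN in the LBS system. SecGNN enables LBS users to retrieve a meeting place from the POIs that has the minimum aggregate distance. For a point $p = (x, y)$ in the POIs, the aggregate squared distance to the LBS users is computed as follows:

$$D(p) = \sum_{i=1}^{n} \left[ (x - x_i)^2 + (y - y_i)^2 \right] = \sum_{i=1}^{n} (x_i^2 + y_i^2) + \sum_{i=1}^{n} (x^2 + y^2 - 2 x x_i - 2 y y_i),$$

where $(x_1, y_1), \ldots, (x_n, y_n)$ are the coordinates of LBS users $U_1, \ldots, U_n$. Assume that $\hat{p} = (x^2 + y^2, x, y)$ and $\hat{q}_i = (1, -2x_i, -2y_i)$ for $i = 1, \ldots, n$, so that the second sum equals $\sum_{i=1}^{n} \hat{p} \cdot \hat{q}_i$. Since $\sum_{i=1}^{n} (x_i^2 + y_i^2)$ is a fixed number, $D(p)$ achieves the minimum value if the sum of inner products $\hat{p} \cdot \hat{q}_1, \ldots, \hat{p} \cdot \hat{q}_n$ achieves the minimum value.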
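The reduction from aggregate distance to aggregate inner products can be checked numerically on plaintext data. The sketch below is illustrative only and rests on our own assumptions: it uses the squared Euclidean distance, it encodes a POI $(x, y)$ as the vector $(x^2 + y^2, x, y)$ and a user location $(x_i, y_i)$ as $(1, -2x_i, -2y_i)$, and the function names are hypothetical.

```python
def aggregate_sq_dist(p, users):
    """Sum of squared Euclidean distances from POI p to all users."""
    return sum((p[0] - x) ** 2 + (p[1] - y) ** 2 for x, y in users)


def aggregate_inner(p, users):
    """Sum of inner products between the POI vector and the user vectors."""
    p_hat = (p[0] ** 2 + p[1] ** 2, p[0], p[1])
    total = 0
    for x, y in users:
        q_hat = (1, -2 * x, -2 * y)
        total += sum(a * b for a, b in zip(p_hat, q_hat))
    return total


def gnn_by_distance(pois, users):
    """GNN found by minimizing the aggregate squared distance."""
    return min(pois, key=lambda p: aggregate_sq_dist(p, users))


def gnn_by_inner(pois, users):
    """GNN found by minimizing the aggregate inner product."""
    return min(pois, key=lambda p: aggregate_inner(p, users))
```

For any POI, the two aggregates differ by the same constant, the sum of $x_i^2 + y_i^2$ over all users, so both functions select the same point.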

In the following, we adopt a privacy-preserving method to find the point that has the minimum aggregate inner products. Specifically, we design a basic tool, named CompInner, that supports the computation of inner products over ciphertexts, and then use the BD conference key agreement to generate a common random value. Finally, we construct the SecGNN scheme from the above protocols.

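The CompInner tool consists of five polynomial-time algorithms, Setup, Encrypt, GenToken, GenInner, and Decrypt. First, the user runs the system setup algorithm (Setup) to generate the secret key and then encrypts data items with the encryption algorithm (Encrypt). Next, the user runs the token generation algorithm (GenToken) to output a computation token for the query point, and inner products are computed by the inner generation algorithm (GenInner). Finally, the user runs the decryption algorithm (Decrypt) to decrypt the encrypted results.
(1) Setup: the user runs the system setup algorithm to generate two invertible matrices, a bit string that controls the splitting, and random values. Together, these form the secret key.
(2) Encrypt: given the secret key and a data point, the data encryption algorithm performs the following steps:
(a) Adding random numbers: the user first computes an intermediate vector from the point and then extends it with additional entries.
(b) Random splitting: split the extended vector into two parts. For each position, depending on the corresponding bit of the secret string, either both parts are set equal to the entry of the extended vector, or the two parts are random numbers whose sum equals that entry.
(c) Data encryption: compute the ciphertexts of the two parts with the two secret matrices.
(3) GenToken: given the secret key and a query point, the token is generated as follows:
(a) Adding random numbers: the user first computes an intermediate vector from the query point and then extends it, where the additional entries are random numbers that satisfy a prescribed constraint.
(b) Random splitting: split the extended vector into two parts. For each position, the roles of the two bit values are reversed with respect to Encrypt: where the data vector is copied, the two parts of the query vector are random numbers whose sum equals the entry, and where the data vector is split, both parts are set equal to the entry.
(c) Data encryption: compute the ciphertexts of the two parts with the two secret matrices.
(4) GenInner: given a ciphertext and a token, the inner generation algorithm computes and outputs the inner product value.
(5) Decrypt: given the secret key and a ciphertext, the decryption algorithm first undoes the matrix encryption to recover the two parts and then, for each position, recombines them according to the corresponding bit of the secret string.

Finally, the user obtains the coordinates of the point.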

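Correctness: the correctness of the CompInner tool follows from its construction. Because the two secret matrices are invertible, they cancel when a ciphertext part is multiplied with the matching token part. Because the data vector and the query vector are split on complementary positions, the two resulting products add up to the inner product of the extended data vector and the extended query vector.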
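The cancellation argument behind this correctness claim can be illustrated with a plain ASPE-style split-and-multiply sketch. The code below is not the CompInner tool itself: it omits the added random numbers, it assumes that a data part is encrypted as $M^{T} p$ and a token part as $M^{-1} q$, and it assumes that bit 1 means "split the data entry" while bit 0 means "split the query entry". All function names are hypothetical, and exact rational arithmetic is used so that the identity holds without rounding.

```python
import random
from fractions import Fraction


def rand_invertible(n, rng):
    """Random integer matrix with determinant 1 (unit lower times unit upper)."""
    lo = [[1 if i == j else (rng.randint(-3, 3) if j < i else 0)
           for j in range(n)] for i in range(n)]
    up = [[1 if i == j else (rng.randint(-3, 3) if j > i else 0)
           for j in range(n)] for i in range(n)]
    return [[sum(lo[i][k] * up[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def mat_inv(m):
    """Exact inverse by Gauss-Jordan elimination over the rationals."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        piv = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[piv] = a[piv], a[c]
        pv = a[c][c]
        a[c] = [v / pv for v in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c]
                a[r] = [u - f * w for u, w in zip(a[r], a[c])]
    return [row[n:] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def split(v, bits, split_on, rng):
    """Split entries whose bit equals split_on into random shares; copy the rest."""
    a, b = [], []
    for x, bit in zip(v, bits):
        if bit == split_on:
            r = Fraction(rng.randint(-50, 50))
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b


def encrypt(p, key, rng):
    m1, m2, bits = key
    pa, pb = split(p, bits, 1, rng)           # assumption: bit 1 splits data
    return mat_vec(transpose(m1), pa), mat_vec(transpose(m2), pb)


def gen_token(q, key, rng):
    m1, m2, bits = key
    qa, qb = split(q, bits, 0, rng)           # assumption: bit 0 splits queries
    return mat_vec(mat_inv(m1), qa), mat_vec(mat_inv(m2), qb)


def gen_inner(cipher, token):
    """Recover the plaintext inner product from ciphertext and token parts."""
    return dot(cipher[0], token[0]) + dot(cipher[1], token[1])
```

With a key `(rand_invertible(3, rng), rand_invertible(3, rng), [1, 0, 1])`, `gen_inner(encrypt(p, key, rng), gen_token(q, key, rng))` returns exactly `dot(p, q)` for any three-dimensional integer vectors `p` and `q`, although neither vector appears in the clear in the ciphertext or the token.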

4.1. The Main Scheme

The secure GNN scheme, named SecGNN, consists of six polynomial-time algorithms, Setup, EncDB, ShareRan, GenToken, Query, and Decrypt. The details of these algorithms are described as follows.
(1) Setup: on input a security parameter $\lambda$, it outputs a cyclic group $G$ with prime order $q$ and generator $g$. Furthermore, the LBS provider runs CompInner.Setup to output the secret key.
(2) EncDB: suppose $p$ is a point in the POI database. The LBS provider runs EncDB to encrypt the point with CompInner.Encrypt under the secret key.
(3) ShareRan: LBS users run this algorithm to compute a random value that is kept secret from outside attackers. In the first phase, LBS users choose random numbers and publish the corresponding public values to the other LBS users. In the second phase, users compute and broadcast their second-round values. Finally, each LBS user computes the shared value from the broadcast values. It is easy to see that all users obtain the same value, which is shared by all group users.
(4) GenToken: after obtaining the shared secret value, LBS users $U_1, \ldots, U_n$ compute their query tokens with CompInner.GenToken, respectively. Note that the random number in the above algorithm is chosen as the shared value.
(5) Query: consider two encrypted points in the encrypted database. To determine whether the sum of inner products between the query tokens and the first point is smaller than that for the second point, the cloud server first runs GenInner for every token on both points and then compares the two sums. If the first sum is smaller, the aggregate inner product with the first point is smaller than that with the second point. Otherwise, the aggregate inner product with the second point is smaller. The cloud server linearly scans all points in the encrypted database and obtains the point with the minimum aggregate inner products.
(6) Decrypt: in the query phase, the cloud server finds the encrypted point with the minimum aggregate inner products. The LBS users run the CompInner decryption algorithm on this ciphertext under the secret key.

Finally, LBS users obtain the GNN location, namely, the meeting place.

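Correctness: assume that $C_1$ and $C_2$ are two encrypted POIs in the encrypted database. Note that, by the correctness of CompInner, summing the GenInner outputs for one encrypted POI over all users' tokens yields a value that preserves the order of the aggregate inner products between the POIs and the users' query vectors.

Thus, if the sum for $C_1$ is smaller than the sum for $C_2$, then the aggregate inner product for $C_1$ is smaller than that for $C_2$. That is, the aggregate distance for $C_1$ is smaller than that for $C_2$. In such a case, the cloud server can find the group nearest neighbor (GNN) by linearly searching the encrypted database.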

4.2. Security Analysis

In this section, we first illustrate the security analysis of our CompInner tool and our SecGNN scheme. The security proofs are described in the following.

Theorem 1. The proposed CompInner tool preserves data privacy and query privacy under the known background model. Specifically, outside attackers cannot learn the data items from the encrypted database and cannot obtain the query point from the query token.

Proof 1. We prove the theorem in two parts, data privacy and query privacy. The proof of data privacy is as follows.
Suppose that outside attackers know the encrypted database, the encrypted queries submitted by group users, and the trace, which contains the access pattern, search pattern, path pattern, and identifiers. To recover the original data from an encrypted data tuple, an adversary would have to recover the two invertible matrices from the ciphertexts and further guess the bit string that is part of the key. In such a scenario, an outside attacker has to model the two parts of each split data vector as two random vectors and set up equations in these unknowns. Since the splitting string is a bit string, the number of possible ways to split a vector is exponential in the length of the string, which is large enough to prevent an adversary from correctly guessing it. Consequently, the advantage of an adversary in correctly guessing the string is negligible. Moreover, the unknowns comprise the entries of the two parts and the entries of the two invertible matrices. Since the number of equations is less than the number of unknowns, an outside attacker cannot solve the equations for the unknowns.
For query privacy, we pad random numbers to the query point in the token generation algorithm, so the same point generates different tokens. Furthermore, we split the extended query vector into two random vectors, using the same technique as in the encryption phase. Thus, query privacy is achieved in the sense of semantic security.
Hence, our CompInner tool achieves data privacy and query privacy.

Theorem 2. The proposed SecGNN scheme preserves the data privacy and query privacy from outside attackers.

Proof 2. In the SecGNN scheme, an outside attacker tries to obtain the plaintext from the encrypted database. The encryption algorithm in SecGNN is the same as that in the CompInner tool. Therefore, data privacy follows from Theorem 1.
In the following, we prove query privacy in two parts. First, we prove that an outside attacker fails to learn the shared random number (note that leaking the random number would not help an outside attacker obtain actual locations; the shared number only helps the LBS users hide the query frequency). Learning it requires the outside attacker to remove the AV-net mask. Specifically, the outside attacker would have to compute the mask from the public values published in the AV-net protocol. Under the decisional Diffie–Hellman (DDH) assumption, an outside attacker fails to learn the shared key.
Furthermore, query tokens are produced by the CompInner.GenToken algorithm. Since CompInner achieves query privacy by Theorem 1, the actual locations are protected from outside attackers.
Note that LBS users can obtain the actual locations of other users in the SecGNN scheme. We further improve query privacy in the enhanced scheme Sec: there, the actual location of an LBS user cannot be obtained by other LBS users unless all the remaining LBS users collude.

5. Efficiency-Enhanced Group Nearest Neighbor Query

In this section, we construct an efficiency-enhanced group nearest neighbor query scheme (called Sec) that protects the privacy of actual locations from other LBS users and achieves faster-than-linear search complexity. The Sec scheme is designed by leveraging a tree structure, such as a KD-tree, R-tree, or quad-tree. In this work, we use the KD-tree to describe the details of the proposed scheme.

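Assume that $c = (\bar{x}, \bar{y})$ is the centroid of the LBS users and $(x, y)$ is the coordinate of a point $p$ in the POIs. Then the squared distance from the centroid to point $p$ is

$$\|c - p\|^2 = (\bar{x}^2 + \bar{y}^2) + (x^2 + y^2 - 2x\bar{x} - 2y\bar{y}).$$

Since $\bar{x}^2 + \bar{y}^2$ is a fixed number, finding the NN for the centroid amounts to finding a point that has the minimum inner product $x^2 + y^2 - 2x\bar{x} - 2y\bar{y}$. The GNN problem in Section 4 requires finding a point with the minimum aggregate inner products:

$$\sum_{i=1}^{n} (x^2 + y^2 - 2x x_i - 2y y_i) = n \, (x^2 + y^2 - 2x\bar{x} - 2y\bar{y}),$$

where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$.

Therefore, the GNN problem requires finding a point in the POIs with the minimum inner product $x^2 + y^2 - 2x\bar{x} - 2y\bar{y}$. That is, the GNN problem can be converted to the NN problem for the centroid:

$$\arg\min_{p} \sum_{i=1}^{n} \|p - l_i\|^2 = \arg\min_{p} \|p - c\|^2,$$

where $l_i = (x_i, y_i)$ is the location of user $U_i$.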
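The conversion from GNN to NN for the centroid can be verified on a small plaintext example. The following sketch is an illustration under our own assumptions: distances are squared Euclidean distances, for which the conversion is exact, the function names are hypothetical, and the centroid is kept as exact fractions to avoid rounding effects.

```python
from fractions import Fraction


def centroid(users):
    """Centroid of the user locations as exact fractions."""
    n = len(users)
    return (Fraction(sum(x for x, _ in users), n),
            Fraction(sum(y for _, y in users), n))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def gnn(pois, users):
    """POI with the minimum aggregate squared distance to all users."""
    return min(pois, key=lambda p: sum(sq_dist(p, u) for u in users))


def nn_of_centroid(pois, users):
    """POI nearest to the centroid of the users."""
    c = centroid(users)
    return min(pois, key=lambda p: sq_dist(p, c))
```

For every POI, the aggregate squared distance equals $n$ times the squared distance to the centroid plus a term that does not depend on the POI, so `gnn` and `nn_of_centroid` return the same point.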

In the following, we describe how to use the KD-tree to enhance efficiency while preserving privacy. Solving the NN problem with a KD-tree requires comparison and computation operations: the order comparison between query points and tree nodes must be performed, and the distances from query points to tree nodes and to splitting lines must also be computed.

From the root to the leaf nodes, the search algorithm checks whether $x_q < x_t$ or $x_q \geq x_t$, where $x_q$ is the $x$ coordinate of the query point and $x_t$ is the $x$ coordinate of the current tree node. The check is done in each dimension. For this purpose, a comparison function on the $x$ coordinate is defined that involves a constant much bigger than the coordinates, so that its value is a positive number.

The squared distance from the query point $q = (x_q, y_q)$ to the tree node $t = (x_t, y_t)$ is defined as

$$d^2(q, t) = (x_q - x_t)^2 + (y_q - y_t)^2.$$

The distance from the query point to the splitting line (e.g., the $x$ coordinate) is given, in squared form, by

$$(x_q - x_t)^2 = x_q^2 - 2 x_q x_t + x_t^2.$$

Thus, the order comparison and the distance computation are converted to inner product computations. Following the CompInner tool of the SecGNN scheme, we first introduce an extended version of the CompInner tool to compute the above inner products. Its basic idea is similar to that of the CompInner tool in Section 4. The differences occur in data encryption and token generation.

5.1. Basic Tool

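The extended CompInner tool contains five polynomial-time algorithms, Setup, Encrypt, GenToken, GenInner, and Decrypt. The details are described as follows.
(1) Setup: it is the same as the system setup algorithm in CompInner and outputs the secret key.
(2) Encrypt: given a data record, this algorithm performs the following steps.
(a) Adding random numbers: the user computes intermediate values from the data record and then extends them into two vectors.
(b) Random splitting: split each of the two extended vectors into two parts. For each position, depending on the corresponding bit of the secret string, either both parts are set equal to the entry of the extended vector, or the parts are random numbers whose sum equals that entry.
(c) Data encryption: compute the ciphertexts of the parts with the secret matrices. The ciphertext of the data point consists of the resulting encrypted parts.
(3) GenToken: given a query point, the token is generated as follows.
(a) Adding random numbers: the user extends the query point into several vectors, whose additional entries are random numbers that satisfy prescribed constraints.
(b) Random splitting: split each extended vector into two parts. For each position, depending on the corresponding bit of the secret string, either the parts are random numbers whose sum equals the entry of the extended vector, or both parts are set equal to that entry.
(c) Data encryption: compute the ciphertexts of the parts with the secret matrices. The token generation algorithm outputs the token consisting of the resulting encrypted parts.
(4) GenInner: given a ciphertext and a token, the inner generation algorithm computes and outputs the corresponding inner product values.
(5) Decrypt: given the secret key and a ciphertext, the decryption algorithm first undoes the matrix encryption to recover the parts and then, for each position, recombines them according to the corresponding bit of the secret string.

Finally, the user obtains the coordinates of the data record.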

5.2. The Detailed Description

From above, we convert the problem of finding GNN for a group of users to the problem of finding NN for their centroid. In the following, we illustrate an efficiency-enhanced group nearest neighbor query scheme called based on KD-tree and CompInner tool. The introduced scheme consists of seven polynomial-time algorithms, Setup, BuildTree, EncDB, CompCen, GenToken, Query, and Decrypt. The detailed description is illustrated as follows:(1)Setup (): on inputting a security parameter , it outputs a cyclic group with order and generator . Furthermore, this algorithm runs CompInner tool and outputs a secret key:(2)BuildTree (): on inputting the database of POIs, this algorithm proceeds the KD-tree algorithm and stores the data items in a KD-tree .(3)EncDB (): suppose that is a data record stored in KD-tree . The LBS provider runs CompInner.Encrypt() to compute the ciphertext as(4)CompCen (): suppose that , , , are coordinates of LBS users , , , . LBS users evaluate the centroid as follows.(1)Each LBS user chooses two random numbers and publishes to other LBS users. Then, computes and further broadcasts .(2)The user computes(3)User computes and publishes . Multiplying all ,User can get the value through dividing by the secret key . Since is a small number, LBS users can compute the discrete logarithm by using the Pohlig–Hellman algorithm [30]. Considering the coordinates are six- or seven-decimal digits, they can be represented as a 32 bit data. Therefore, is a small number and can be efficiently computed from .(4)Finally, user can compute the centroid as and .(5)GenToken (): after receiving the centroid , LBS user runs CompInner.GenToken() to compute the token as , where , , , and .(6)Query (): the query algorithm consists of two phases. In the first phase, the cloud server searches over the encrypted KD-tree to find a leaf node whose corresponding region includes the query point. 
The cloud server computes  = CompInner.GenInner() (or  = CompInner.GenInner()) and checks whether . If it holds, the cloud server searches the left child; otherwise, it searches the right child. The cloud server writes down the route from root to the leaf node in a list .

After arriving at the leaf node, the cloud server takes this leaf node as the temporary NN and proceeds to the second phase to check whether the temporary NN is the true NN. In the second phase, the cloud server walks the list from the leaf node back to the root. For each node, it computes and checks whether , where is the temporary NN and is the current node in the list . If it holds, the cloud server moves on to the parent node; otherwise, the cloud server updates the current node as the temporary NN and further checks whether (or ). If it holds, the cloud server searches the other child of the temporary NN. Otherwise, it backtracks to the parent node, continuing until it reaches the root.
(7)Decrypt(): the NN for the centroid is computed as in the query algorithm. LBS users run CompInner.Decrypt() to compute the plaintext. Finally, the LBS users obtain the location, namely, the NN for the centroid and the GNN for the group of LBS users.
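The two-phase search in the Query algorithm follows the usual KD-tree nearest neighbor procedure. A minimal plaintext sketch is given here for orientation, assuming two-dimensional points and squared Euclidean distance; in the scheme itself each comparison would be evaluated over ciphertexts through CompInner.GenInner rather than on raw coordinates. The names `Node`, `build_tree`, and `nearest` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Node:
    point: Point
    axis: int                      # 0 splits on x, 1 splits on y
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_tree(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Median split, alternating the splitting axis at each level."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build_tree(pts[:mid], depth + 1),
                build_tree(pts[mid + 1:], depth + 1))

def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest(root: Optional[Node], q: Point) -> Optional[Point]:
    best: Optional[Point] = None

    def visit(node: Optional[Node]) -> None:
        nonlocal best
        if node is None:
            return
        # Phase 1: descend towards the region that contains the query point.
        if q[node.axis] < node.point[node.axis]:
            near, far = node.left, node.right
        else:
            near, far = node.right, node.left
        visit(near)
        # Phase 2 (on the way back): update the temporary NN if this node is closer.
        if best is None or sq_dist(node.point, q) < sq_dist(best, q):
            best = node.point
        # Search the other child only if the splitting line is closer than the temporary NN.
        if (q[node.axis] - node.point[node.axis]) ** 2 < sq_dist(best, q):
            visit(far)

    visit(root)
    return best
```

The recursion stack plays the role of the route list recorded by the cloud server: unwinding it corresponds to walking from the leaf back to the root.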

Correctness: during the .Query phase, the cloud server evaluates  = CompInner.GenInner() (or  = CompInner.GenInner()) and checks whether :

If , then we have , namely, . In such a case, the cloud server searches the left child. For coordinate, we have the same conclusion. Therefore, the cloud server can arrive at the leaf node correctly:

We also compute , , and . Thus, we have

If , it is easily known that the current node is closer than the temporary NN. Thus, we update the current node as the new temporary NN. Furthermore, the cloud server checks whether to search the other child through checking or . Therefore, the cloud server can search the NN for the query point over ciphertexts from the core idea of KD-tree.
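To make the distance comparison concrete, the following sketch shows the standard ASPE-style trick for deciding which of two points is closer to a query using inner products only. It is an assumption that the scheme's extended vectors follow this pattern; the exact extension used by CompInner is not reproduced here, and `extend_point`, `extend_query`, and `closer` are hypothetical names. The identity relied upon is that p·q − ½‖p‖² equals −½‖p − q‖² plus a term that depends only on q.

```python
import numpy as np

def extend_point(p):
    """Append -0.5 * ||p||^2 to the data point."""
    p = np.asarray(p, dtype=float)
    return np.append(p, -0.5 * (p @ p))

def extend_query(q, r=1.0):
    """Append 1 to the query and scale by a positive random factor r."""
    q = np.asarray(q, dtype=float)
    return r * np.append(q, 1.0)

def closer(p1, p2, q, r=3.0):
    """True if p1 is strictly closer to q than p2, decided from inner products only."""
    tq = extend_query(q, r)
    return bool(extend_point(p1) @ tq > extend_point(p2) @ tq)
```

The larger inner product belongs to the closer point, and the positive factor r hides the magnitude of the score without changing the comparison result.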

5.3. Security Analysis

In this section, we illustrate the security analysis of the proposed CompInner tool and Sec scheme from data privacy and query privacy (e.g., actual location privacy of LBS users and centroid privacy).

Theorem 3. The introduced CompInner tool achieves data privacy and query privacy from outside attackers.

Proof 3. The security analysis of the introduced tool is similar to that of CompInner, which is presented in Theorem 1. Thus, we omit the details of the proof for brevity.

Theorem 4. The proposed Sec scheme preserves the data privacy and the query privacy.

Proof 4. In the following, we present the proof from data privacy against outside attackers and query privacy (actual locations and the centroid) against outside attackers and other LBS users.
First, we consider the data privacy of the proposed Sec scheme. Data items are encrypted by using CompInner.Encrypt(). Since data privacy is assured by Theorem 3, the Sec scheme preserves data privacy. We omit the details for brevity.
Next, we consider the centroid privacy against outside attackers. In the CompCen algorithm, an outside attacker can learn the secret key only by solving the DDH problem (as described in the proof of Theorem 2). Thus, the centroid privacy is assured against outside attackers in the CompCen algorithm. In the GenToken algorithm, the centroid is encrypted as where is the centroid of the group of users. Since the CompInner tool achieves query privacy, the centroid privacy is assured in the GenToken algorithm.
Moreover, the introduced Sec scheme preserves the privacy of actual locations against outside attackers and other LBS users. The centroid computation (CompCen) algorithm consists of the AV-net and BD conference key agreement protocols. That is, the centroid privacy is based on these two schemes. An outside attacker fails to obtain the centroid because learning the location requires the attacker to remove the AV-net mask and to learn the BD conference key. Specifically, an outside attacker needs to compute from and in the AV-net protocol and compute from and in the BD conference key agreement, respectively. Under the decisional Diffie–Hellman (DDH) assumption, an outside attacker fails to obtain the centroid. Now suppose that an LBS user takes part in the attack game and wants to learn ’s actual location. Since LBS user knows and , he can compute . Based on the difficulty of the DDH problem, LBS user fails to compute from and . Thus, LBS user cannot learn the actual location of LBS user .
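The AV-net masking that this argument relies on can be sketched as follows. This is a toy illustration under stated assumptions: the group parameters `P`, `Q`, and `G` are tiny and insecure, the BD conference key that the scheme additionally layers on top is omitted, and the small discrete logarithm is recovered by a plain linear scan instead of the Pohlig–Hellman algorithm. All function names are hypothetical.

```python
import secrets

# Toy parameters (assumption, far too small for real use):
# P = 2*Q + 1 is a safe prime and G = 4 generates the subgroup of order Q.
P, Q, G = 1019, 509, 4

def round1(n):
    """Each user i picks a secret x_i and publishes g^{x_i}."""
    xs = [secrets.randbelow(Q - 1) + 1 for _ in range(n)]
    return xs, [pow(G, x, P) for x in xs]

def mask_base(i, pubs):
    """g^{y_i} = prod_{j<i} g^{x_j} / prod_{j>i} g^{x_j}; sum_i x_i*y_i = 0 mod Q."""
    num, den = 1, 1
    for j, gx in enumerate(pubs):
        if j < i:
            num = num * gx % P
        elif j > i:
            den = den * gx % P
    return num * pow(den, -1, P) % P

def round2(values, xs, pubs):
    """User i publishes g^{v_i} * (g^{y_i})^{x_i}, which hides v_i behind the mask."""
    return [pow(G, v, P) * pow(mask_base(i, pubs), x, P) % P
            for i, (v, x) in enumerate(zip(values, xs))]

def small_dlog(target, bound):
    """Linear scan for s in [0, bound] with g^s == target."""
    acc = 1
    for s in range(bound + 1):
        if acc == target:
            return s
        acc = acc * G % P
    raise ValueError("sum is outside the searched range")

def masked_sum(values):
    """Multiplying all published shares cancels the masks and leaves g^{sum(values)}."""
    xs, pubs = round1(len(values))
    product = 1
    for share in round2(values, xs, pubs):
        product = product * share % P
    return small_dlog(product, Q - 1)
```

For example, `masked_sum([12, 40, 7, 100])` recovers the total 159 without any single published share revealing an individual value; dividing the total by the group size then gives one coordinate of the centroid.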

6. Performance Evaluation

In this section, we first illustrate the performance of the basic tools (i.e., CompInner and CompInner) and then demonstrate that of the introduced SecGNN and Sec schemes through experimental simulation. Specifically, we implement the evaluation in Python 3 on a machine with an Intel(R) Core(TM) i7-9750H CPU, 16 GB of memory, and 1 TB of storage, as listed in Table 3. Matrix operations are provided by the NumPy library, and discrete logarithms are computed with the Pohlig–Hellman algorithm [30]. Throughout the experiment, we simulate the LBS provider, the LBS users, and the cloud server on the same machine. Furthermore, we set the extended length as , which is equivalent to the security of 1024-bit RSA keys.

Database: in the experiment, we evaluate the performance on a real database and a random database, respectively. The real database is the Sequoia database (Sequoia database. http://chorochronos.datastories.org/?q=node/58) that contains 62,556 data items in California. The random database contains 1,000,000 data items which are randomly generated. The descriptions of the Sequoia database and random database are presented in Figure 4.

6.1. Performance of the Basic Tools

CompInner and CompInner include system setup (Setup), data encryption (Encrypt), token generation (GenToken), inner computation (GenInner), and data decryption (Decrypt) algorithms. Figure 5 evaluates the performance of the above algorithms as the expanded number increases (set , namely, 1000 runs each of Setup, Encrypt, GenInner, GenToken, and Decrypt). The time costs of the GenToken and Decrypt algorithms are dominated by the computation of the inverse matrices for secret keys and . If the inverse matrices and are precomputed, the GenToken and Decrypt algorithms perform similarly to the Encrypt algorithm in both tools. The detailed performance is presented in the following experiments.

The LBS provider runs the CompInner.Setup() and CompInner.Setup() algorithms to produce the secret key , which contains two invertible matrices , an 80-bit string S, and random numbers . To enhance the efficiency, we precompute the inverses of the matrices . The system setup algorithm finishes within 10 ms and is thus very efficient.

As illustrated in Figure 6, we evaluate the performance of the CompInner tool and the CompInner tool over the Sequoia database and a random database. All algorithms are efficient in both tools. Since the GenInner algorithm in the CompInner tool is the same as that in CompInner, its time cost is the same in both tools. Specifically, the GenInner algorithm is the most efficient, taking about 0.5 s for 60,000 inner computations and only 7.6 s for 1,000,000 inner computations for CompInner and CompInner. The GenToken algorithm has the highest time cost: 8.6 s for 60,000 tokens and 141.2 s for 1,000,000 tokens on the CompInner tool, and 32 s for 60,000 tokens and 513 s for 1,000,000 tokens on the CompInner tool. However, since GenToken is performed only once per query, its cost is small in the following group nearest neighbor query scheme: only 0.14 ms for one token in the CompInner tool and 0.51 ms for one token in the CompInner tool. The Encrypt algorithm is a one-time algorithm for database construction, and its cost is acceptable for both tools. In conclusion, both tools are efficient enough for data utilization in practice.

6.2. Performance of the Introduced Schemes

For the convenience of description, we adopt some notations in this section. We denote by PlainNN the linear search nearest neighbor on plaintexts, by PlainGNN the linear search group nearest neighbor on plaintexts, by KDNN the KD-tree used in the nearest neighbor query on plaintexts, by SecGNN the SecGNN scheme in Section 4, and by Sec the efficiency-enhanced scheme in Section 5.

The introduced SecGNN scheme contains six algorithms, system setup, database encryption, random share, token generation, database query, and result decryption, and the Sec scheme consists of seven algorithms, system setup, KD-tree build, database encryption, centroid computation, token generation, database query, and result decryption. In the following, we will describe the performance of above algorithms, respectively.

System setup: for the introduced SecGNN and Sec schemes, the LBS provider runs the setup algorithm to output a cyclic group with order and generator . Furthermore, the setup algorithm outputs a secret key , namely, two invertible matrices , an 80-bit string , and random numbers . For the convenience of computation, the inverses of the matrices are precomputed in this phase. The experimental simulation shows that the setup algorithm can be done in 5 ms.

Tree construction: the BuildTree algorithm is run by the LBS provider to store POIs in a KD-tree data structure. The performance is illustrated in Figure 7. Figure 7(a) illustrates how the time cost increases with the number of data items in the Sequoia database, and Figure 7(b) presents the same for a random database. Moreover, KDNN and Sec show the same efficiency, because the BuildTree algorithm only requires the LBS provider to store POIs in a KD-tree structure.

Database encryption: the LBS provider runs the EncDB algorithm to assure the privacy of data items. Figure 8 presents the encryption efficiency of the SecGNN and Sec schemes. As illustrated in Figures 8(a) and 8(b), both schemes are efficient: about 99.27 s for 1,000,000 data items with the SecGNN scheme and 182.4 s for 1,000,000 data items with the Sec scheme. SecGNN thus achieves better encryption efficiency than the Sec scheme. Because database encryption is a one-time operation, the cost is acceptable for the Sec scheme.

Shared random and centroid computation: LBS users run the ShareRan algorithm to share a random number in the SecGNN scheme and run the CompCen algorithm to compute the centroid for the following algorithms. LBS users compute the shared random number through the BD protocol and compute their centroid by using the AV-net protocol and the BD protocol. Figure 9 presents the time cost as the group size increases from 10 to 100 users. It costs about 25 ms to share a random number within a group of 100 LBS users in the SecGNN scheme and 180 ms to compute the centroid with 100 LBS users in the Sec scheme. In the CompCen algorithm and the following algorithms, it is enough for one special LBS user to compute the centroid and perform the group nearest neighbor query. Finally, the special LBS user broadcasts the meeting place using the conference key. As demonstrated in Figure 9, the centroid computation is very efficient: 18 ms for 10 users and 180 ms for 100 users.

Token generation: in the SecGNN and Sec schemes, LBS users generate the query token for the cloud server to search over the encrypted POIs. The cloud server directly searches for the group nearest neighbor in PlainNN, PlainGNN, and KDNN, and therefore these do not need to execute the GenToken algorithm. After receiving the shared random number in the SecGNN scheme, each LBS user runs the GenToken algorithm to compute a search token. The Sec scheme requires only one special LBS user to compute the search token and send it to the cloud server. As illustrated in Figure 6, the search token can be computed quickly, in about 0.15 ms for SecGNN and 0.5 ms for the Sec scheme.

Group nearest neighbor query: the cloud server runs the query algorithm to search for the group nearest neighbor over encrypted POIs. In Figure 10, we illustrate the time cost for PlainNN, PlainGNN, and KDNN on plaintexts and the time cost for SecGNN and Sec on encrypted POIs. Figure 10 shows that the PlainGNN scheme is a little slower than the other plaintext schemes in both the Sequoia database and the random database. Since PlainGNN must compute the aggregate distance from each point in the POIs to all group users () and linearly search the entire set of POIs, it has the worst efficiency among the plaintext schemes. The KDNN scheme has the best efficiency for the group nearest neighbor query among the above schemes. The time cost of SecGNN scales linearly with the number of data items. Thus, SecGNN is impractical for databases with millions of data items. As illustrated in Figure 10, the Sec scheme achieves practical search efficiency at this scale, about 0.2 s on a database of one million items.
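The difference between a PlainGNN-style scan and a search around the centroid can be illustrated with a small plaintext sketch. It assumes that the aggregate distance is the sum of squared Euclidean distances, for which the nearest neighbor of the centroid and the group nearest neighbor coincide; the function names are hypothetical and the linear scan in `centroid_nn` stands in for the KD-tree search.

```python
def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def plain_gnn(pois, users):
    """Linear scan: aggregate distance from every POI to every group member."""
    return min(pois, key=lambda p: sum(sq_dist(p, u) for u in users))

def centroid(users):
    n = len(users)
    return (sum(u[0] for u in users) / n, sum(u[1] for u in users) / n)

def centroid_nn(pois, users):
    """One distance per POI: nearest neighbor of the group's centroid."""
    c = centroid(users)
    return min(pois, key=lambda p: sq_dist(p, c))
```

With n group members, `plain_gnn` evaluates n distances per POI while `centroid_nn` evaluates one, and replacing the scan by a KD-tree search removes the need to visit every POI.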

Decryption: after receiving the query result, LBS users decrypt it and obtain the meeting place. The decrypt algorithm only contains two matrix multiplications, which are very fast and can be computed in less than 2 ms.

7. Conclusions

In this work, we study the problem of the privacy-preserving group nearest neighbor query, namely, finding a meeting place with the minimum aggregate distance to a group of LBS users. The introduced SecGNN scheme supports group nearest neighbor (), while preserving the data privacy and query privacy from outside attackers. Unfortunately, it only achieves linear search complexity. To achieve high search efficiency, we design an efficiency-enhanced Sec scheme by leveraging the KD-tree structure. More specifically, we convert the GNN problem to the NN problem for the centroid and then leverage AV-net protocol and BD-conference key agreement to compute the centroid. Furthermore, the KD-tree structure and ASPE algorithm are introduced to construct the efficiency-enhanced Sec scheme. Note that the solution is compatible with other tree structures (e.g., R-tree and Quad-tree). Finally, we present the performance evaluation to show the high efficiency of our proposed schemes.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 61960206014) and the National Crypto Development Foundation (no. MMJJ20180110).