Abstract

Protecting location privacy has become an irreversible trend. However, the system structures adopted by existing location privacy protection schemes suffer from a single point of failure or mobile device performance bottlenecks, and these schemes can neither resist single-point attacks and inference attacks nor achieve a tradeoff between privacy level and service quality. To solve these problems, we propose a k-anonymous location privacy protection scheme based on dummy locations and the Stackelberg game. First, we analyze the merits and drawbacks of existing location privacy preservation system architectures and propose a semitrusted third party-based location privacy preservation architecture. Next, taking into account location semantic diversity, physical dispersion, and query probability, we design a dummy location selection algorithm based on location semantics and physical distance, which can protect users' privacy against single-point attacks. We then propose a location anonymity optimization method based on the Stackelberg game to improve the algorithm. Specifically, we formalize the mutual optimization of user-adversary objectives within the framework of the Stackelberg game to find an optimal dummy location set. The optimal dummy location set can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy. Finally, we provide an exhaustive simulation evaluation of the proposed scheme against existing schemes in multiple aspects, and the results show that the proposed scheme effectively resists single-point attacks and inference attacks while balancing service quality and location privacy.

1. Introduction

With the rapid development of mobile devices and social networks, location-based services (LBSs) have become a vital part of our daily activities in recent years. With smartphones or tablets, users can download location-based applications from the App Store or Google Play Store. With the help of these applications, users can easily send queries to a service provider and obtain LBSs related to points of interest. For example, users can check the bus schedule or the price information of nearby restaurants or gas stations. Undoubtedly, by submitting LBS queries, users can enjoy the convenience provided by LBSs. However, since the untrusted service provider holds all the information about users, such as where they are at which time, what kind of queries they submit, and what they are doing, it may track users in various ways or release their personal data to third parties. Thus, we need to take appropriate measures to protect users' privacy.

Many approaches [1-3] have been taken to address such privacy problems, among which location anonymity and location perturbation are commonly used. The existing location anonymity technology usually adopts a structure based on a trusted third party [3]. This structure introduces a trusted third party, called the centralized location anonymizer, between the user and the service provider, and uses the location anonymizer to make the target user's information indistinguishable from that of at least k-1 other users, so that the probability of location leakage is at most 1/k. Specifically, to achieve k-anonymity, an LBS query is submitted to the service provider via the centralized location anonymizer, which enlarges the queried location into a bigger cloaking region geographically covering many other users. As a result, it is hard for the untrusted service provider to distinguish the user's real location within this region. However, these k-anonymity approaches have a fatal flaw: they rely heavily on the location anonymizer, which suffers from a single point of failure [4]. If the adversary gains control of it, the privacy of all users is compromised.

In response to the problems existing in this structure, some researchers have proposed a dummy location technology that can also achieve k-anonymity and uses an independent system structure [2]. The independent structure contains only the user and the service provider: the user's mobile terminal generates dummy locations and then sends them, together with the user's real location, to the service provider. As a result, it is hard for the untrusted service provider to distinguish the user's real location from the dummy locations. Since this structure performs location anonymity and query-result filtering on the mobile terminal instead of a location anonymizer, there is no single point of failure caused by the location anonymizer. In 2008, with the birth of Bitcoin [5], blockchain technology began to be widely used in finance, medical care, supply chains, and other fields. Blockchain [6-8], as the underlying technology of Bitcoin, realizes distributed information interaction and collective maintenance of data in a decentralized and autonomous way, offering decentralization, tamper resistance, autonomy, and traceability. Simultaneously, the security of consensus protocols [9] and the protection of user privacy [10] in blockchains have become new research hotspots. Chen et al. [11] proposed a dynamic multikey fully homomorphic encryption scheme. The decentralized nature of blockchain opens a new door for location privacy protection. Based on this idea, the authors of [12] propose a distributed k-anonymity location privacy protection scheme based on blockchain, which also achieves k-anonymity without the help of a location anonymizer.

In LBSs, the user first adopts approaches based on perturbing the information reported to the service provider so as to prevent the disclosure of her location. Clearly, perturbing the information sent to the service provider degrades service quality, and consequently there is a trade-off between the level of privacy the user wishes to guarantee and the service quality loss she will have to accept. Meanwhile, the adversary formulates corresponding strategies based on the privacy protection method adopted by the user and infers the user's real location by observing the perturbed information. Since the relationship between users and adversaries objectively conforms to the relationship between the participants of the Stackelberg game model, introducing the Stackelberg game into location privacy protection is an important research direction. Shokri et al. [13] took the lead in introducing the Stackelberg game into location privacy protection and proposed a location privacy protection scheme based on it. The solution assumes that the adversary has acquired prior knowledge and lets the user and the adversary play the game in turn: the user maximizes the level of privacy protection while keeping the service quality loss below a given threshold, whereas the adversary strives to minimize the level of privacy protection based on prior knowledge and the offset location. Through this game, the strategy ultimately optimizes the level of privacy protection while ensuring that the service quality loss stays below the given threshold.

Based on the analysis above, existing location privacy protection schemes have the following shortcomings: (1) they adopt either a structure that has a single point of failure or the independent system structure; however, users in the independent structure rely on mobile terminals to perform location anonymity algorithms and filter query results, which greatly increases the client's load and in turn affects service quality. (2) On the one hand, these schemes do not fully consider location semantics, physical dispersion, and query probability when selecting dummy locations; on the other hand, they do not fully consider the background knowledge the adversary may have, which the adversary can use to infer the user's location privacy. Thus, they cannot effectively resist single-point attacks and inference attacks. (3) Since such schemes sacrifice service quality to improve the privacy protection level, they do not achieve a trade-off between service quality and privacy protection level. To address these shortcomings, this paper comprehensively considers side information, location semantics, and the physical dispersion of locations, combined with dummy locations, k-anonymity, and the Stackelberg game, and designs a k-anonymous location privacy protection scheme based on the Stackelberg game and dummy locations, which can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy. Our contributions are mainly as follows:
(1) A semitrusted third party-based location privacy protection structure (STTP) is proposed. STTP extends the traditional trusted third party structure by adding an encryption server and a Wi-Fi AP and stores the user's private information on the three-party servers through a certain mechanism. In STTP, even if the adversary steals the information on the location anonymizer, he still cannot locate the user or obtain the user's complete private information, which effectively solves the single point of failure of the trusted third party structure. Meanwhile, the location anonymizer remains responsible for running the location anonymity algorithms and filtering query results, which also removes the mobile device performance bottleneck of the independent structure.
(2) We propose a dummy location selection algorithm based on location semantics and physical distance (SPDDS). Compared with existing dummy location selection algorithms, SPDDS takes into account location semantic diversity, physical dispersion, query probability, and the offset location when selecting dummy locations, which effectively protects users' location privacy against single-point attacks. Furthermore, we propose a location anonymity optimization method based on the Stackelberg game, which introduces the Stackelberg game to improve the dummy location selection algorithm. More specifically, we formalize the mutual optimization of user-adversary objectives (location privacy vs. correctness of inferring the location) within the framework of the Stackelberg game and find an optimal dummy location set by solving the game equilibrium. The optimal dummy location set can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy.
(3) We conduct a comprehensive experiment to evaluate the proposed scheme. Experimental results show that, compared with other dummy-based schemes, our scheme can effectively resist single-point attacks and inference attacks while effectively balancing service quality and location privacy.

The rest of the paper is organized as follows. We discuss related work in Section 2. Section 3 presents some preliminaries. Section 4 presents the structure of STTP and the interactive process. We present the SPDDS algorithm and a location anonymity optimization method based on the Stackelberg game in Section 5. Section 6 presents the experimental process and results. We conclude the paper in Section 7.

2. Related Work

In this section, we first analyze the merits and drawbacks of the mainstream location privacy protection system structures. We then review the major existing techniques for preventing location privacy leakage, including dummy-based privacy protection schemes and Stackelberg game-based privacy protection schemes, in Sections 2.2 and 2.3, respectively.

2.1. Location Privacy Protection System Structure

As a large body of location privacy protection technologies has been proposed, the system structures on which these technologies depend have diverged into distinct categories. As the carrier on which privacy protection technologies are implemented, the system structure itself has been extensively researched and developed.

Currently, there are two mainstream system structures: the central-server structure based on a trusted third party and the independent system structure. In the central-server structure [14-16], the location anonymizer obtains the location information of all users and is responsible for implementing the location privacy protection mechanism. It is currently the more commonly used privacy protection system structure; its advantage is that the location anonymizer can obtain the location information of all users and assist users in filtering service data. Its disadvantage is that it relies on the location anonymizer to enlarge the queried location into a bigger cloaking region, so the location anonymizer becomes a single point of failure. References [2, 17, 18] proposed an independent system structure, in which users protect their location privacy according to their own capabilities and knowledge. This architecture treats each user as an independent individual: the user's device implements the location privacy protection mechanism, directly sends service requests to the service provider, and receives the query results. The advantage of this system structure is that deployment is simple and it is convenient for users to adjust the privacy protection granularity according to their needs. However, the implementation of privacy protection algorithms is limited by the performance of mobile devices. Meanwhile, filtering query results also increases the burden on the mobile client, which in turn affects service quality.

2.2. Privacy Protection Scheme Based on Dummy

Location dummies aim to protect users' accurate locations by sending false locations ("dummies") together with the true location, so that the probability of location leakage is reduced to 1/k. Compared with traditional k-anonymity, this approach sends exact locations rather than cloaked regions to the service provider, which allows more precise query results and avoids a single point of failure.

Kido et al. [19, 20] first proposed using dummy locations to achieve k-anonymity without employing a central server. However, they concentrated only on reducing communication costs. Moreover, they employed a random walk model to generate dummy locations, which cannot resist side information attacks because factors such as query probability are not considered. Subsequently, Dapeng et al. [21] proposed an algorithm based on query probability that can resist side information attacks; however, it cannot resist homogeneity attacks and location similarity attacks because it does not consider physical dispersion and location semantic diversity. The algorithm proposed by Chang et al. [22] protects location privacy to a certain extent but does not consider location semantic diversity, so it cannot deal with location similarity attacks. Niu et al. [23] selected dummy locations based on an entropy metric and proposed a dummy location selection scheme together with an improved version. Although these schemes can resist side information attacks and homogeneity attacks, they cannot resist location similarity attacks because they do not consider location semantic diversity. References [24, 25] both consider the user's location semantic diversity and can effectively resist location similarity attacks, but both suffer from cloaking regions that are too large, which in turn affects service quality. Although [26, 27] fully consider location semantic diversity and physical dispersion and can effectively resist homogeneity attacks and location similarity attacks, they cannot resist side information attacks because they do not consider query probability.

2.3. Privacy Protection Scheme Based on Stackelberg Game

In a big data environment, an adversary can use the various data it collects to infer the user's location privacy [28], which is called the location inference attack. Because traditional dummy-based privacy protection schemes cannot effectively resist this kind of inference attack, location privacy protection mechanisms based on probabilistic reasoning [29, 30] have gradually attracted the attention of scholars. Such methods perturb the locations a user reports to the service provider in order to increase the adversary's uncertainty about the user's true whereabouts. However, the perturbation of the information sent to the service provider degrades service quality, and consequently there is a trade-off between the level of privacy the user wishes to guarantee and the service quality loss she will have to accept. The Stackelberg game has therefore become an important means of balancing the level of privacy protection and service-quality requirements in such methods.

Based on [13] and combining the ideas of k-anonymity and dummy locations, Xingyou et al. [31] proposed a scheme and an improved version of it. Although these schemes can effectively resist inference attacks and better balance service quality and location privacy, they cannot resist location similarity attacks because they do not consider location semantic diversity. Bordenabe et al. [32] introduced differential privacy on the basis of [13] and constructed a privacy protection mechanism that optimizes service quality. Since differential privacy does not depend on the prior, this mechanism can minimize the service quality loss under the premise of satisfying location indistinguishability. Shokri [18] further proposed using the two indicators of differential privacy and distortion privacy to optimize the privacy protection strategy based on the Stackelberg game. Differential privacy limits the extent of user privacy leakage, while distortion privacy measures the error rate of inferring the user's privacy. By combining these two standards, this privacy protection strategy can resist more kinds of inference attacks while ensuring the privacy protection requirements.

3. Preliminaries

In this section, we first introduce some relevant definitions for the location privacy protection algorithm and summarize the notations used throughout the section in Table 1; we then introduce the relevant concepts of the Stackelberg game.

3.1. Relevant Definitions for the Location Privacy Protection Algorithm

Definition 1. According to [27], a location semantic tree is a tree structure used to represent the semantic relations between locations within the range of a Wi-Fi access point (Wi-Fi AP), and it satisfies the following requirements: (a) each nonleaf node stands for the category of its children, and each leaf node stands for a real location; (b) the depth of the tree is equal to the maximum number of category layers plus 1; (c) the semantic distance between two locations is the number of hops from one leaf node to the other.
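To make Definition 1 concrete, the following is a minimal Python sketch, under the assumption that the semantic tree is stored as a child-to-parent map; the function name and the toy tree are illustrative only (the category names echo those used in Section 6.1).

import math

def hop_distance(parent, a, b):
    """parent: dict mapping each node to its parent (the root maps to None);
    returns the number of hops between leaves a and b."""
    def path_to_root(n):
        path = [n]
        while parent[n] is not None:
            n = parent[n]
            path.append(n)
        return path
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors_a = {n: i for i, n in enumerate(pa)}
    for j, n in enumerate(pb):
        if n in ancestors_a:            # lowest common ancestor
            return ancestors_a[n] + j   # hops up from a plus hops down to b
    return None

# Example: "hospital" and "clinic" share the parent "Medical care" (distance 2),
# while "hospital" and "restaurant" only meet at the root (distance 4).
tree = {"root": None, "Medical care": "root", "Catering": "root",
        "hospital": "Medical care", "clinic": "Medical care", "restaurant": "Catering"}
print(hop_distance(tree, "hospital", "clinic"), hop_distance(tree, "hospital", "restaurant"))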

Definition 2. User’s privacy requirements , represented by two-tuple that has the following meanings:(a) denotes the anonymous degree of our location privacy preservation model. More specifically, each query is sent with at least dummy locations and its offset location (we use offset location instead of the real location), making that the probability of offset location leakage is therefore .(b) represents the minimum acceptable value of semantic distance between two locations in dummy location set . In other words, it satisfies the inequality:

Definition 3. (location map distance). Consider the map information within the range of the current Wi-Fi AP. For any two locations, the location map distance is the physical distance between the two locations on this map; its value ranges from tens of meters to hundreds of meters.

Definition 4. (location query probability). As shown in Figure 1, the map is divided into cells of equal size. Each cell has a query probability based on the previous query history, and the query probabilities of all cells sum to 1. The depth of the color in the figure indicates the query probability (the darker the color, the greater the probability), and the white areas indicate locations that have never issued a location service request; such locations may be rivers, barren mountains, and other places that are easily filtered by the adversary.
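As a rough illustration of how such per-cell query probabilities could be estimated from historical queries (the helper name and inputs below are assumptions, not the paper's notation):

from collections import Counter

def cell_query_probabilities(history, cell_size):
    """history: iterable of (x, y) query coordinates; returns {(row, col): probability}."""
    counts = Counter((int(y // cell_size), int(x // cell_size)) for x, y in history)
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}

# Cells that were never queried simply do not appear and thus have probability 0,
# matching the white (easily filtered) areas described above.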

Definition 5. The probability of exposing the real location, which is used to measure the effectiveness of the algorithm against side information attacks, is computed as 1/(k - n), where k denotes the anonymity degree and n represents the number of dummy locations filtered out by the adversary through side information attacks. The larger this probability, the less effective the algorithm is at resisting side information attacks; the smaller it is, the better the algorithm resists side information attacks.

Definition 6. Location physical dispersion, which is used to measure the effectiveness of the algorithm against location homogeneity attacks, is obtained by computing the minimum physical distance between any two locations in a dummy location set, as shown in (4). The greater the minimum distance between any two locations in the set, the greater the dispersion and the coverage of the set, and the better the algorithm resists location homogeneity attacks.
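A compact sketch of this dispersion metric, assuming Euclidean distance on the map as a stand-in for the location map distance of Definition 3:

import math
from itertools import combinations

def dispersion(locations):
    """locations: list of (x, y) points in the dummy location set; Python 3.8+."""
    return min(math.dist(p, q) for p, q in combinations(locations, 2))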

Definition 7. Secure set of dummy locations. Consider a dummy location set consisting of k-1 dummy locations and the offset location, in which the semantic distance between any two locations is no less than the user's minimum semantic distance requirement; we call such a set a secure set. In our experimental analysis in Section 6, we use the ratio of the number of location pairs in the set that satisfy this semantic requirement to the total number of pairs, given by the combination formula C(k, 2), as the privacy protection index of location semantics. Our aim is to maximize this index, i.e., to make it equal to 1, such that any two locations in the set belong to different categories.

Definition 8. The adversary uses background knowledge to run an inference attack on the anonymized output in order to estimate the user's actual location; we denote the attack result as the inferred location, and we denote by LPPM the location privacy protection mechanism that the adversary knows. Then, following the definition in [33], we quantify the user's privacy level as the adversary's expected error in his inference attack, i.e., the expected distortion in the reconstructed event, where the expectation is taken over all actual locations, anonymized outputs, and inferred locations. The privacy level directly reflects the adversary's ability to infer the user's actual location: the larger it is, the less accurate the adversary's inference and the better the algorithm resists the inference attack.
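As a sketch of this expected-error metric in the spirit of [33] (the symbols below are chosen here for illustration: \pi is the prior over actual locations r, f(o \mid r) is the LPPM's output distribution over anonymized outputs o, h(\hat{r} \mid o) is the adversary's inference, and d_p is a distance between locations):

\mathrm{Privacy} = \sum_{r} \sum_{o} \sum_{\hat{r}} \pi(r)\, f(o \mid r)\, h(\hat{r} \mid o)\, d_p(\hat{r}, r).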

Definition 9. We define the process of generating the offset location as part of the LPPM. Then, following the definition in [13], the LBS response quality depends on the offset location output by the LPPM and not on the user's actual location. The distortion introduced in the offset location determines the quality of service that the user experiences: the more similar the actual and offset locations are, the higher the service quality. The expected quality loss is computed as an average of this distortion over all actual locations and offset locations. We set a service quality threshold that represents the maximum service quality loss the user can accept. The location privacy protection scheme designed in this paper must guarantee that the expected quality loss does not exceed this threshold, because otherwise the returned location service results cannot satisfy the user's requirements.
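Under the same illustrative notation as above, with d_q measuring the dissimilarity between the offset location o and the actual location r and Q_max denoting the threshold, the expected quality loss and its constraint could be sketched as:

Q_{\mathrm{loss}} = \sum_{r} \sum_{o} \pi(r)\, f(o \mid r)\, d_q(o, r) \le Q_{\max}.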

3.2. Stackelberg Game

The classic Stackelberg game is a two-player game composed of a leader and a follower [34]. The leader first determines her strategy, and after observing the leader's strategy, the follower chooses the strategy that maximizes his utility. In the field of location privacy protection, the terms "protector" and "leader" and the terms "adversary" and "follower" can be used interchangeably. For simplicity of expression, the leader (protector) is referred to as she, whereas the follower (adversary) is referred to as he.

Definition 10. In the field of location privacy protection, the strong Stackelberg equilibrium (SSE) is generally used as the solution of the Stackelberg game. The SSE is defined as follows.
A strategy combination is an SSE if and only if it satisfies the following conditions: (a) the leader's mixed strategy is a best response, i.e., it maximizes the leader's utility given that the follower best-responds to it; (b) the follower's mixed strategy is a best response to the leader's strategy; (c) if the follower has multiple best responses, the follower breaks the tie by choosing the strategy most favorable to the leader.
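For reference, the three conditions above can be sketched formally as follows, where x denotes the leader's mixed strategy, q a follower strategy, BR(x) the follower's set of best responses, and U_l, U_f the leader's and follower's utilities (this notation is chosen here for illustration and is not the paper's):

(a)\; x^{*} \in \arg\max_{x} \; \max_{q \in \mathrm{BR}(x)} U_l(x, q), \qquad
(b)\; \mathrm{BR}(x) = \arg\max_{q} U_f(x, q), \qquad
(c)\; q^{*} \in \arg\max_{q \in \mathrm{BR}(x^{*})} U_l(x^{*}, q).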

4. System Model

We first give the definitions of the single-point attack model and the inference attack model in Sections 4.1 and 4.2, respectively, and then introduce the structure of STTP in Section 4.3. Finally, we present the interactive process of our scheme in Section 4.4.

4.1. Single-Point Attack Model

In the time dimension, the adversary relies on a single intercepted location service request to infer the user's private information; this is called the single-point attack model [35]. In this model, the adversary's main attack methods include side information attacks, homogeneity attacks, and location similarity attacks.

Side information refers to information used by adversaries to filter dummy locations and reduce anonymity, including map information and location query probability. For example, for a randomly generated dummy location set, some locations may be in a river or no man's land, and adversaries can easily filter out these locations based on the map information. Assuming that the location anonymity requirement is k, when some of the dummy locations are filtered out by the adversary based on side information, the k-anonymity requirement is no longer satisfied, resulting in a decrease in the level of privacy protection.

A homogeneity attack means that the adversary analyzes the distances between the locations in a dummy location set to infer the user's privacy. Specifically, if any two locations are very close to each other, for example, in the same building, then although the set satisfies k-anonymity, the user's location privacy cannot be well protected because the cloaking region is too small.

The location similarity attack means that the adversary analyzes the semantic information in the cloaking region to infer a user’s privacy. More specifically, if the region contains only one kind of semantic information, such as a hospital or school, the adversary can infer the user’s behavior.

4.2. Inference Attack Model

In a big data environment, an adversary can use the various data collected to infer the privacy information of the user’s location [28], which is called the location inference attack.

In the location inference model, the adversary has certain background knowledge, such as the user's service request history records, the LPPM, etc. Using the user's service request history records, the adversary can calculate the user's query probability distribution. When the user sends a query request again, if the location query probability distribution in the anonymous set is not uniform, the adversary can infer that the user is likely to be located at a location with a higher probability. As for the LPPM, the adversary can analyze the intercepted location request in combination with the anonymity algorithm to infer the probability that each location in the anonymous set is the user's true location, making the inference attack more accurate.

4.3. The Structure of STTP

It can be seen from Section 2.1 that, of the two current mainstream location privacy protection system structures, the central-server structure has the problem of a single point of failure, while the independent structure has the problem of mobile device performance bottlenecks. In view of these problems, we have designed a semitrusted third party-based location privacy protection structure (STTP). STTP extends the traditional central-server structure by adding an encryption server and a Wi-Fi AP and stores the user's private information on the three-party servers through a certain mechanism, so that even if the location anonymizer is controlled by adversaries, STTP still protects the user's private information to a certain extent. Furthermore, the location anonymizer is responsible for implementing the privacy protection algorithms and filtering query results, so there is no mobile device performance bottleneck. STTP is shown in Figure 2 and consists of the following five entities:
User: uses a mobile terminal to initiate a location service request when needed.
Wi-Fi AP: provides network support and computes and stores the location query probabilities and the location semantic tree within its coverage.
Encryption server (ES): provides the encryption and decryption key pair corresponding to the user's pseudonym.
Location anonymizer (LA): converts the user's actual location into a dummy location set and, after the service provider returns the query results, extracts the appropriate service information and returns it to the user.
Service provider (SP): returns the corresponding service results according to the location query request.

The proposed scheme assumes that the ES, LA, and SP are "honest but curious." On the one hand, they will not disrupt the protocol and will faithfully complete their work according to the agreement; on the other hand, they all want to learn additional sensitive information about the user from what they hold. Meanwhile, we further assume that the ES, LA, and SP cannot collude with each other, that is, they will not be controlled by an adversary simultaneously. If the three parties conspired, the user would have no secrets left, so this assumption is reasonable.

4.4. Interactive Process

There are eight steps in the interactive process of the proposed scheme (as shown in Figure 2). The specific implementation of each step is described below:
(1) Before initiating a location service request, the user first requests the map information, the location query probabilities, and the location semantic tree from the Wi-Fi AP.
(2) The Wi-Fi AP generates the map information, computes and stores the query probabilities of all locations within its current coverage area, generates and saves the location semantic tree by collecting location semantic information within its radio range, and then sends these to the user. It should be noted that, for any Wi-Fi AP, the locations within its coverage area are relatively stable, so the semantic tree and the query probabilities do not need to change frequently.
(3) The user then requests a pseudonym and key pair from the ES. Specifically, if there are multiple service requests at the same location within a limited time, the user applies for the pseudonym and key pair only once; when the time limit is exceeded or the user's real location changes, she reapplies for a new pseudonym and key so as to confuse her identity.
(4) The ES generates the corresponding pseudonym and RSA key pair, returns the pseudonym and the public key to the user, and sends the pseudonym and the private key to the SP. It should be noted that, as an example, the solution uses the classic RSA algorithm for encryption, which can be replaced by other encryption algorithms according to actual requirements. In addition, the solution requires the ES to act only as a provider of pseudonyms and keys, so the ES does not store the pseudonyms and keys locally.
(5) The user first encrypts her query content with the public key and then sends her current pseudonym, the encrypted query content, her current real location, her privacy requirement, and the information obtained from the Wi-Fi AP to the LA together.
(6) After receiving this information, the LA runs the corresponding location anonymity algorithm, which generates a dummy location set to hide the real location, and then sends the pseudonym, the encrypted query content, and the dummy location set to the SP.
(7) After receiving the location service request, the SP first looks up the corresponding private key according to the pseudonym and uses it to decrypt the query content, then produces the service results according to the query content and the dummy location set, and finally returns them to the LA.
(8) After receiving the results, the LA first identifies the location corresponding to the user, then filters out the relevant query results, and finally returns them to the user.
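The following minimal Python sketch illustrates steps (4), (5), and (7) above, assuming RSA-OAEP from the cryptography package as one concrete instantiation of the "classic RSA" encryption mentioned in step (4); pseudonym handling and message transport are simplified.

import uuid
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

# Step (4): the ES generates a pseudonym and an RSA key pair; conceptually,
# (pseudonym, public key) goes to the user and (pseudonym, private key) to the SP.
pseudonym = uuid.uuid4().hex
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Step (5): the user encrypts the query content with the public key.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
ciphertext = public_key.encrypt(b"price of nearby gas stations", oaep)

# Step (7): the SP looks up the private key by the pseudonym and decrypts the query.
assert private_key.decrypt(ciphertext, oaep) == b"price of nearby gas stations"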

5. Proposed Scheme

In this section, we first introduce a dummy location selection algorithm based on location semantics and physical distance and then present a location anonymity optimization method based on the Stackelberg game.

5.1. A Dummy Location Selection Algorithm Based on Location Semantics and Physical Distance

Based on the analysis above, the final dummy location set must not only avoid places that are easy for adversaries to filter out, such as rivers and no man's land, but must also satisfy semantic diversity while keeping the locations as dispersed as possible. In other words, the final dummy location set needs to simultaneously satisfy (11)-(13). This can be formulated as a multiobjective optimization problem (MOP), since three factors are considered simultaneously. However, considering the complexity of MOPs, we put forward a simpler objective formula: in each dummy location set, we make sure that (14) is satisfied. Consequently, we propose a dummy location selection algorithm based on location semantics and physical distance (SPDDS). In (14), a small constant is added to avoid the situation where two locations have the same query probability, that is, where the difference between their query probabilities is 0. A controllable factor balances the shares of the semantic distance, the physical distance, and the query probability distance: the semantic distance is bounded by the depth of the semantic tree and is hence usually less than 10, the physical distance, bounded by the Wi-Fi transmission range, ranges from hundreds of meters to thousands, and the query probability distance is always less than 1. Consequently, we set the balancing factor accordingly.
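As an illustrative order-of-magnitude check of this balancing (the concrete numbers and the weight 0.01 below are assumed purely for illustration and are not the value used in (14)):

d_s \approx 5 \ \text{hops}, \qquad 0.01 \times d_m \approx 0.01 \times 500\,\mathrm{m} = 5, \qquad |q_i - q_j| < 1,

so a weight on the order of 10^{-2} on the physical-distance term brings the three contributions to a comparable scale.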

Meanwhile, in order to balance service quality while making the proposed algorithm resist inference attacks effectively, we must take into account both the privacy level and the expected quality loss. We therefore propose a location anonymity optimization method based on the Stackelberg game, which introduces the Stackelberg game to optimize the dummy location selection algorithm. More specifically, we formalize the mutual optimization of user-adversary objectives (location privacy vs. correctness of inferring the location) within the framework of the Stackelberg game to find an optimal dummy location set. The optimal dummy location set can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy.

The main idea of SPDDS is as follows: first, select an offset location to replace the user's real location; second, select all locations that satisfy the semantic difference requirement with respect to the locations already in the current dummy location set as the dummy location candidate set; then, select from the candidate set the optimal location, i.e., the one that best satisfies (14); finally, output a set consisting of the offset location and k-1 dummy locations. Algorithm 1 shows the formal description of the SPDDS algorithm.

Input: the user's real location; the user's privacy requirement; the map information of the current Wi-Fi AP; the location semantic tree; the location query probabilities;
Output: the dummy location set;
(1)Divide the map, as the sample space, into grids;
(2)Generate the semantic distance matrix according to the location semantic tree, the geographic distance matrix according to the map, and the probability distance matrix according to the query probabilities, respectively;
(3)According to the query probabilities and the map, choose the locations closest to the user's real location that satisfy the selection condition, and generate an offset location set consisting of these locations and the real location;
(4)Randomly choose a location from the offset location set as the offset location;
(5)Generate a dummy location candidate set from all locations in the whole grid space;
(6)
(7)Remove the selected offset location from the candidate set;
(8)whiledo
(9)ifthen
(10)  anonymity failed;
(11)else
(12)  ; ;
(13)  for eachindo
(14)   ifthen
(15)    anonymity failed;
(16)    go back to line 13;
(17)   else
(18)    
(19)    compute the maximum according to , and , which is recorded with , and then assign the corresponding to
(20)   end
(21)  end
(22)  
(23)  remove from ;
(24)end
(25)end
(26)output ;
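For concreteness, the following Python sketch shows one possible greedy selection loop in the spirit of Algorithm 1; the grid representation, the toy semantic_distance stand-in, the weight alpha, and the way the three distances are combined are illustrative assumptions and do not reproduce the exact objective (14).

import math
import random

def semantic_distance(cat_a, cat_b):
    # Toy stand-in for the hop count in the location semantic tree.
    return 0 if cat_a == cat_b else 2

def spdds_sketch(cells, real_idx, k, min_sem_dist, alpha=0.01, eps=1e-6):
    """cells: list of dicts with keys x, y, prob, cat; returns indices of k chosen locations."""
    real = cells[real_idx]
    # Pick an offset location whose query probability is close to that of the real location.
    by_prob = sorted(range(len(cells)), key=lambda i: abs(cells[i]["prob"] - real["prob"]))
    offset_idx = random.choice([i for i in by_prob[:6] if i != real_idx])
    chosen = [offset_idx]
    # Greedily add dummies that keep semantic diversity and score well on a
    # combined semantic/physical/probability criterion.
    while len(chosen) < k:
        best, best_score = None, -1.0
        for i in range(len(cells)):
            if i in chosen or i == real_idx:
                continue
            sems = [semantic_distance(cells[i]["cat"], cells[j]["cat"]) for j in chosen]
            if min(sems) < min_sem_dist:
                continue  # enforce the minimum semantic distance requirement
            score = 0.0
            for j, sem in zip(chosen, sems):
                phys = math.hypot(cells[i]["x"] - cells[j]["x"], cells[i]["y"] - cells[j]["y"])
                prob_diff = abs(cells[i]["prob"] - cells[j]["prob"])
                # Prefer large semantic and physical distances and similar query probabilities.
                score += (sem + alpha * phys) / (prob_diff + eps)
            if score > best_score:
                best, best_score = i, score
        if best is None:
            raise RuntimeError("anonymity failed: not enough semantically diverse locations")
        chosen.append(best)
    return chosen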
5.2. A Location Anonymity Optimization Method Based on Stackelberg Game

We propose a location anonymity optimization method based on the Stackelberg game, which optimizes the dummy location selection algorithm by introducing the Stackelberg game. More specifically, we formalize the mutual optimization of user-adversary objectives (location privacy vs. correctness of inferring the location) within the framework of the Stackelberg game and construct the related linear programs on this basis. By solving the linear programs, we can find an optimal dummy location set that resists single-point attacks and inference attacks while effectively balancing service quality and location privacy.

5.2.1. Location Inference Model

In the location inference model, the adversary has certain background knowledge, such as the user's service request history records, the LPPM, etc. Using the user's service request history records, the adversary can calculate the user's query probability distribution. When the user sends a query request again, if the location query probability distribution in the anonymous set is not uniform, the adversary can infer that the user is likely to be located at a location with a higher probability. As for the LPPM, the adversary can analyze the intercepted location request in combination with the anonymity algorithm to infer the probability that each location in the anonymous set is the user's true location, making the inference attack more accurate.

Based on the existing knowledge (the user's query probability distribution, the LPPM, etc.), the adversary can form a posterior distribution over the true location of the user, conditional on the observed anonymous set:

The adversary’s objective is then to choose to minimize the user’s conditional expected privacy, where the expectation is taken under . The user’s conditional expected privacy for an arbitrary isand for the minimizing , it is

If there are multiple estimates that satisfy (17), the adversary randomizes arbitrarily among them; the probability with which each estimate is chosen in this randomization constitutes the adversary's attack strategy.
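A sketch of this inference step, using the same illustrative notation as in the sketches above (\pi the prior, f the LPPM, d_p the privacy distance, o the observed anonymized output):

\Pr(r \mid o) = \frac{\pi(r)\, f(o \mid r)}{\sum_{r'} \pi(r')\, f(o \mid r')}, \qquad
\hat{r}^{*}(o) \in \arg\min_{\hat{r}} \sum_{r} \Pr(r \mid o)\, d_p(\hat{r}, r).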

5.2.2. Stackelberg Game Optimization Process

Here, we assume that the adversary has some background knowledge. Specifically, he will infer the user's actual location as accurately as possible according to the query probability distribution, the LPPM used by the LA, the anonymized result, and other background knowledge. Correspondingly, we can assume that all the background knowledge known to the LA will also be used by the adversary, so the LA can use the adversary's optimal attack strategy as a parameter to optimize the generation process of the dummy location set.

We formalize the process above by using the framework of the Stackelberg game. In a Stackelberg game, the leader, in our case the LA, plays first by outputting the dummy location set according to the location privacy protection algorithm. The follower, in our case the adversary, plays next by estimating the user's true location with the help of his background knowledge.

We use the distance between the adversary's inferred location and the user's actual location to measure the utility of the participants in the game: the greater the distance, the greater the LA's payoff, indicating that the anonymity algorithm is more effective in resisting inference attacks; conversely, the smaller the distance, the greater the adversary's payoff and the more effective the adversary's attack strategy.

The game is also an instance of a zero-sum game, as the adversary's gain (or loss) of utility is exactly balanced by the loss (or gain) of the user's utility: the information gained (lost) by the adversary is the location privacy lost (gained) by the user.

The purpose of the Stackelberg game optimization is to find a dummy location set such that the adversary cannot obtain more benefit by optimizing his attack strategy (that is, the adversary cannot make a more accurate inference about the user's actual location). In this paper, we refer to the dummy location set optimized by the Stackelberg game as the optimal dummy location set.

It should be noted that the optimal dummy location set is the result obtained by further anonymization, so it can be further expressed accordingly.

In some cases, the two are equal; the reason is that, in such cases, the adversary can filter out all the dummy locations except one.

We see that, for a given output of the LPPM, the user's conditional expected privacy is given by (17). The probability that a particular output is produced is determined by the LPPM itself. Hence, the user's unconditional expected privacy, averaged over all possible outputs, is given by (19).

To facilitate the computations, we define

Incorporating this definition into (19), we can write the unconditional expected privacy of the user as (20), which the user aims to maximize by choosing the optimal LPPM. To facilitate the computations, (20) can be transformed into a series of linear constraints:

In addition, the LPPM needs to conceal the user's real location on the premise of ensuring the user's service quality. In order to ensure the quality of service, we set a service quality threshold to limit the maximum service quality loss. The specific constraint is as follows:

In summary, the optimal dummy location generation strategy can be obtained by solving a linear program. The final linear program is given in (24), where the first group of constraints captures the adversary's utility-maximizing response; the second reflects the service quality constraint; the third requires that the generation probabilities of the candidate dummy location sets sum to 1; and the fourth requires that the generation probability of each candidate dummy location set be greater than zero.
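A sketch of such a linear program, in the spirit of [13] and under the illustrative notation assumed above (S ranges over candidate dummy location sets, f(S \mid r) is the probability of generating S for actual location r, x_S is an auxiliary variable for the privacy guaranteed when S is observed, \pi the prior, d_p the privacy distance, d_q the quality distance, and Q_max the service quality threshold):

\max_{f,\,x} \; \sum_{S} x_S
\quad \text{s.t.} \quad x_S \le \sum_{r} \pi(r)\, f(S \mid r)\, d_p(\hat{r}, r) \;\; \forall S, \forall \hat{r},
\quad \sum_{r} \sum_{S} \pi(r)\, f(S \mid r)\, d_q(S, r) \le Q_{\max},
\quad \sum_{S} f(S \mid r) = 1 \;\; \forall r, \qquad f(S \mid r) \ge 0 \;\; \forall S, r.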

Solving the objective function under the constraints in (24) yields the optimal dummy location set, which can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy.

6. Simulations and Results

In this section, we use Python to simulate the experiments. First, we give the relevant experimental parameters. Then, we present and analyze the experimental results of the proposed scheme.

6.1. Simulation Setup

Our scheme is implemented in MATLAB and runs on a Windows 10 PC with an Intel Core i5-8500 3.00 GHz CPU and 8.00 GB of main memory. We use a real road map of Guangzhou from Google Maps, since Guangzhou, as a provincial capital in southern China, is a big city with enough LBS users, and its central urban area has been covered by Wi-Fi APs since 2016. The coverage area of each Wi-Fi AP serves as the sample space and is divided into cells of equal size, and a total of 13,559 sample trajectories are used as historical data to calculate the historical query probability of each cell. In addition, all locations in our experiments are divided into 6 semantic categories: Education and Science, Administration and Housing, Medical Care, Shopping Malls, Public Places, and Catering and Entertainment. The value ranges of the two main experimental parameters, the anonymity degree and the service quality loss threshold, are specified in the corresponding experiments.

6.2. Results and Analysis

We first evaluate the effectiveness of our proposed scheme in resisting single-point attacks using the following three assessment metrics: (1) the probability of exposing the real location; as shown in Definition 5, it reflects the effectiveness of the algorithm in resisting side information attacks. (2) The location physical dispersion; as shown in Definition 6, the larger it is, the more dispersed the dummy locations in the set and the better the algorithm resists homogeneity attacks. (3) The privacy protection index of location semantics; as shown in Definition 7, it reflects the level of semantic diversity in the anonymous set and thus the effectiveness of the algorithm in resisting location similarity attacks. Next, we evaluate the effectiveness of the scheme in resisting inference attacks while balancing location privacy and service quality using two assessment metrics: (1) the privacy level; as shown in Definition 8, the larger it is, the better the LPPM resists inference attacks. (2) The expected service quality loss; as shown in Definition 9, it reflects the effectiveness of the algorithm in balancing location privacy and service quality.

6.2.1. Effectiveness of the Scheme against Single-point Attacks

(1) Probability of exposing the real location. In Figure 3(a), we compare the probability of exposing the real location for our scheme and the schemes of [27], [27], [23], and [31]. As we can see, this probability shows a downward trend for all five schemes as the anonymity degree increases, which means that the larger the anonymity degree, the more difficult it is for adversaries to filter out invalid locations in the anonymous set through side information attacks and the better the schemes resist side information attacks. The exposure probability of the three schemes that consider query probability is lower than that of the other two, and the values of the former three are basically the same. The reason is that these three schemes avoid selecting locations with low access probability, such as lakes and forests, when forming the anonymous set, whereas the other two do not consider query probability, so invalid locations may be selected and the adversary can filter them out through side information attacks. In summary, the proposed scheme can effectively resist side information attacks.

(2) Location physical dispersion. Figure 3(b) compares the dispersion of the five schemes. As we can see, the dispersion values of the schemes are close for small anonymity degrees, and under the same anonymity degree some schemes achieve slightly larger dispersion than others. In addition, as the anonymity degree increases, the dispersion of all five schemes gradually decreases. The reason is obvious: it becomes harder to maintain a high level of dispersion with more and more dummies. In summary, one of the compared schemes achieves the largest dispersion and the others decrease in order, which means that it resists homogeneity attacks better than the other four schemes, but the dispersion of the proposed scheme is also acceptable.

(3) Location semantic diversity. Figure 3(c) compares the privacy protection index of location semantics for the five schemes. As shown in Figure 3(c), as the anonymity degree increases, the index of the three schemes that consider location semantics hardly changes and stays close to the maximum value of 1, whereas that of the other two schemes is always relatively low. The reason is that the former three schemes consider the semantic information of locations when selecting dummy locations, thereby ensuring semantic diversity, while the latter two consider only the query probability of each location and ignore the possibility that different locations may carry the same semantic information. Moreover, locations with higher query probabilities are often in hotspot areas, where the semantic information is very similar and semantic diversity is therefore not satisfied. Consequently, the latter two schemes perform so badly in semantic diversity that they cannot resist location similarity attacks. In summary, the proposed scheme can effectively resist location similarity attacks.

The experimental results above show that, compared with the four baseline schemes, the proposed scheme can effectively resist homogeneity attacks, location similarity attacks, and side information attacks simultaneously, thereby effectively resisting single-point attacks.

6.2.2. Effectiveness of the Scheme against Inference Attacks and in Balancing Privacy and Service Quality

Combining the location inference model with (6), it can be seen that the adversary can perform inference attacks whose purpose is to choose, based on existing knowledge, an attack strategy that minimizes the expected user privacy; (25) defines this attack strategy:

Combining (6) and (25), we can construct the following linear program to find the optimal attack strategy:
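A hedged sketch of such an attack program, again under the illustrative notation used above (h(\hat{r} \mid S) is the adversary's randomized guess given the observed dummy set S); it is given only to indicate the shape of the optimization and does not reproduce (26):

\min_{h \ge 0} \; \sum_{S} \sum_{\hat{r}} h(\hat{r} \mid S) \sum_{r} \pi(r)\, f(S \mid r)\, d_p(\hat{r}, r)
\quad \text{s.t.} \quad \sum_{\hat{r}} h(\hat{r} \mid S) = 1 \;\; \forall S.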

We use the attack model defined by (26) to run inference attacks on the proposed scheme, the schemes of [31] and [13], and a dummy-based baseline that does not consider the adversary's attack strategy, and we compare them in terms of the privacy level and the service quality loss to evaluate the effectiveness of the proposed scheme.

(1) Privacy level. The definition of the privacy level is given in (6): the larger it is, the better the LPPM resists inference attacks. As shown in (24), the preset service quality loss threshold and the anonymity degree have a large impact on the privacy level, so we evaluate the effectiveness of the KLPPS scheme against inference attacks with respect to these two parameters.
(a) We compare the privacy level of the four schemes under different service quality loss thresholds in Figure 4(a). As shown in the figure, we can draw the following three conclusions. First, the privacy level of the three schemes that consider the adversary's strategy is significantly better than that of the dummy-based baseline, because the baseline does not take the adversary's inference attack strategy into account. Second, the two schemes that use dummy location anonymity perform better than the scheme of [13], because the dummies increase the difficulty of the adversary's inference. Finally, as the threshold increases, the privacy level of all four schemes increases, but the growth slows down after a certain level, which is related to the query probability distribution of the user's locations and indicates that the influence of the threshold on the privacy level is limited.
(b) We compare the privacy level of the four schemes under different anonymity degrees in Figure 4(b). As we can see, three conclusions can be drawn. First, as the anonymity degree increases, the privacy levels of the three dummy-based schemes improve significantly, whereas that of [13] does not, because [13] provides only offset locations and does not use dummy location anonymity. Second, the two game-based schemes improve the privacy level more effectively than the dummy-based baseline, because the former two consider the adversary's inference attack strategy while the latter does not, so the baseline is less effective in resisting inference attacks. Finally, the growth of the privacy level of the four schemes slows down after a certain level, because all four schemes use the offset location instead of the real location to protect the user's privacy. More specifically, as the anonymity degree increases, the service quality loss caused by the choice of the offset location gradually approaches the threshold, and all four schemes maximize the privacy level under the premise that the loss does not exceed the threshold; hence, when the loss approaches the threshold, the growth of the privacy level slows down until the loss reaches the threshold and the privacy level reaches its maximum.

(2) Service quality loss. The service quality loss is closely related to the privacy level: in some cases, users are willing to sacrifice a certain amount of service quality in exchange for higher privacy. In the experiment, we set the maximum service quality loss that the user can accept and analyze the relationship between the privacy level and the service quality loss under this condition. Figure 5 shows the experimental results. First, it can be seen from the figure that both the privacy level and the service quality loss of the three schemes increase as the maximum acceptable service quality loss increases. The reason is clear: on the one hand, Figure 4(a) shows that the privacy level increases to a certain extent as the threshold increases; on the other hand, from the definitions of the privacy level and the service quality loss, it is obvious that the quality loss also increases to a certain extent with the threshold. Second, under the same threshold, the privacy level of the two game-based schemes is significantly higher than that of the third scheme, but their service quality loss is also somewhat higher; this is because both game-based schemes make full use of the allowed maximum service quality loss to optimize the selection of the dummy location set, improving location privacy as much as possible while ensuring that the loss does not exceed the service quality constraint, thereby effectively balancing service quality and location privacy. In addition, under the same threshold, the privacy level of our scheme is slightly higher than that of the other game-based scheme while its service quality loss is slightly smaller, indicating that our scheme balances service quality and location privacy better.

The experimental results above show that, compared with the three comparison schemes, the proposed scheme can effectively resist inference attacks while effectively balancing service quality and location privacy.

7. Conclusion

Traditional k-anonymity location privacy protection schemes suffer from problems such as a single point of failure and the inability to effectively resist single-point attacks and inference attacks. To solve these problems, by analyzing the merits and drawbacks of existing location privacy protection system architectures, we propose a semitrusted third party-based location privacy protection architecture that overcomes both the performance bottleneck of mobile devices and the single point of failure. Then, comprehensively considering side information, semantic diversity, and the physical dispersion of locations, combined with the ideas of dummy locations and the offset location, we propose a dummy location selection algorithm based on location semantics and physical distance to effectively resist single-point attacks. Finally, we propose a location anonymity optimization method based on the Stackelberg game to optimize the dummy selection algorithm. Specifically, we formalize the mutual optimization of user-adversary objectives (location privacy vs. correctness of inferring the location) within the framework of the Stackelberg game to find an optimal dummy location set, which can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy. The experimental results further verify the effectiveness of the proposed scheme. However, our work still has the following shortcomings. First, more and more people use continuous query services such as navigation, while our scheme applies only to the snapshot query scenario and not to the continuous query scenario. Second, in different application scenarios, users have different requirements for the privacy protection level and service quality, so the balance between data utility and privacy level needs to be improved as much as possible. In future work, we hope to extend our scheme to continuous query scenarios and, for users with different needs in different scenarios, to design a personalized location privacy protection scheme that further balances service quality and the privacy protection level.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (grant number 61962009); the Major Scientific and Technological Special Project of Guizhou Province (grant number 20183001); the Science and Technology Support Plan of Guizhou Province (grant number [2020]2Y011); and the Foundation of Guangxi Key Laboratory of Cryptography and Information Security (grant number GCIS202118).