Abstract
Location-based services (LBS) applications bring convenience to people’s life and work, but the collection of location information may expose users’ privacy. Because the collected data contain much private information about users, a privacy protection scheme for location information is urgently needed. In this paper, a protection scheme named DPLHc is proposed. First, the users’ locations on the map are mapped into one-dimensional space by using the Hilbert curve. Then, Laplace noise is added to the location information in one-dimensional space for perturbation; the scheme also takes into account more than 70% of the users’ non-location information, which is likewise perturbed by adding noise. Finally, the disturbed location is submitted to the service provider in place of the user’s real location to protect the user’s location privacy. Theoretical analysis and simulation results show that the proposed scheme can effectively protect users’ location privacy without a trusted third party. It has advantages in data availability, the degree of privacy protection, and the generation time of anonymous data sets, and it basically achieves a balance between privacy protection and service quality.
1. Introduction
With the rapid development of intelligent mobile devices and wireless communication technology, location-based services (LBS) applications not only bring convenience to users but also cause serious privacy and security risks. In LBS, users provide their location information to location service providers while acquiring location services, which may lead to the leakage of users’ sensitive information [1, 2]; for example, the frequency with which users visit interest points can be used to infer their preferences and economic status. At the same time, if an attacker combines the users’ location information with non-location information, more personal information about the user will be exposed [3]. As people’s awareness of privacy protection continues to grow, protecting users’ location information has become an urgent problem.
At present, location privacy protection methods mainly include k-anonymity technology and (α, k)-anonymity technology. k-anonymity technology [4–6] uses a trusted third-party (TTP) server to expand the user’s real location into an anonymous geographic area that includes k − 1 other users, so that an untrusted location service provider (LSP) cannot distinguish the user’s real location from the locations of the other k − 1 users; the TTP then sends the obfuscated location information to the LSP. k-anonymity technology provides the basis for privacy protection research. However, it does not restrict the sensitive attributes in the users’ data set and is vulnerable to link attacks. To solve these problems, the authors of [7, 8] propose (α, k)-anonymity technology based on k-anonymity. (α, k)-anonymity presets a threshold α so that the proportion of sensitive attribute values in each equivalence class does not exceed this threshold, which blurs the link between sensitive attributes and quasi-identifiers and enhances the anonymity of the users’ data set. (α, k)-anonymity can enhance users’ location privacy protection by anonymizing the data set through the TTP. However, the TTP holds all the knowledge of the users’ LBS queries and is a single point of failure: if an attacker compromises the TTP, the users’ location privacy will be leaked. To protect the users’ location privacy without relying on a third party, differential privacy technology [9–11] was proposed. In [2, 12], differential privacy and an anonymous set of k locations are used to calculate the perturbed location, which can resist attacks by an adversary with useful background information. In [13], a location set based on differential privacy is proposed to protect the user’s real location at each time point under temporal correlation.
However, this approach ignores attacks that combine location data with non-location data. If attackers combine the user’s location information at different times with some non-location information, the user’s private information will be seriously exposed.
The rest of the paper is organized as follows: the second section describes some definitions related to location privacy. In the third section, architecture and threat model of LBS system are analyzed in detail, and then the specific implementation algorithm of differential privacy location protection mechanism based on Hilbert curve (DPLHc) is introduced. In the fourth section, the DPLHc scheme is evaluated, including privacy analysis, security analysis, and algorithm complexity analysis. The fifth section verifies the effectiveness of the algorithm from the availability of published data sets, the degree of privacy protection of data sets, and the generation efficiency of anonymous data sets. The sixth section summarizes the DPLHc scheme, in which privacy protection is strengthened and the balance between privacy protection degree and service quality is solved effectively. The seventh section is devoted to the references used in this paper.
2. Related Definitions
Definition 1 (Hilbert curve). The Hilbert curve is used as a mapping from a $d$-dimensional space $S$ to the one-dimensional space $R$, denoted as $H: S \to R$. If point $p \in S$, then $H(p) \in R$; that is, $H(p)$ is the H value of point $p$. For a point set $P \subseteq S$, $H(P) = \{H(p) \mid p \in P\}$. Coding rules of the first-order, second-order, and third-order Hilbert curves are shown, respectively, in Figure 1.
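As a minimal illustration of Definition 1 for the two-dimensional case, the following Python sketch computes the H value of a grid cell with the widely used iterative bit-manipulation formulation of the Hilbert curve. The function name `hilbert_index` and the curve orientation (starting corner and direction) are assumptions made for illustration and may differ from the coding rules drawn in Figure 1.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Return the position of grid cell (x, y) along a Hilbert curve.

    The grid has 2**order cells per side, so the result lies in
    [0, 4**order - 1].
    """
    n = 1 << order  # number of cells per side
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell lies outside the grid")
    d = 0
    s = n >> 1  # side length of the quadrants at the current level
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes s*s cells times its rank (0..3) on the curve.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


if __name__ == "__main__":
    # Print the H values of a third-order (8 x 8) grid, top row first.
    for row in range(7, -1, -1):
        print(" ".join(f"{hilbert_index(3, col, row):2d}" for col in range(8)))
```

Cells that are consecutive along the curve are always neighbours in the grid, which is the locality property the scheme relies on when it perturbs locations in one-dimensional space.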
Definition 2 (location differential privacy). For two data sets $D$ and $D'$ that differ by at most one location record, namely, $|D \,\Delta\, D'| \le 1$, given a differential privacy algorithm $A$, $\mathrm{Range}(A)$ is the range of $A$. Algorithm $A$ provides $\varepsilon$-differential privacy, where $\varepsilon$ is the privacy budget, which represents the degree of privacy protection. If any output set $S \subseteq \mathrm{Range}(A)$ obtained by algorithm $A$ from any pair of adjacent trajectory data sets $D$ and $D'$ satisfies the following inequality, then algorithm $A$ satisfies $\varepsilon$-differential privacy.

$$\Pr[A(D) \in S] \le e^{\varepsilon} \cdot \Pr[A(D') \in S] \tag{1}$$

The probability $\Pr[\cdot]$ represents the risk of users’ location privacy being exposed, and it is controlled by the randomness of algorithm $A$; the parameter $\varepsilon$ is the privacy budget. It can be seen from the above formula that the smaller the parameter $\varepsilon$ is, the more similar the probability distributions of the query results returned by the $\varepsilon$-differential privacy algorithm on a pair of adjacent trajectory data sets are, and the more difficult it is for the attacker to distinguish this pair of adjacent trajectory data sets. In the extreme case, when $\varepsilon = 0$, the degree of privacy protection is the highest. Conversely, the higher the value of the parameter $\varepsilon$, the lower the degree of privacy protection.
Definition 3 (Global Sensitivity). For any function $f: D \to R^{d}$, the global sensitivity of function $f$ is

$$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_{1} \tag{2}$$

where $\lVert f(D) - f(D') \rVert_{1}$ represents the first-order norm distance between the function output values of adjacent data sets $D$ and $D'$, and sensitivity refers to the maximum change in the output value of the function caused by adding or deleting any record in the data set. The global sensitivity of the query function is determined by the properties of the function itself and is independent of the data set.
Definition 4 (Laplace Mechanism). Given a function $f$, the Laplace mechanism is defined as

$$A(D) = f(D) + \eta \tag{3}$$

where $\eta$ is a random variable of the Laplace distribution $\mathrm{Lap}(\Delta f / \varepsilon)$. The location parameter of the Laplace distribution is 0, the scale parameter is $b = \Delta f / \varepsilon$, and its probability density function is

$$p(\eta) = \frac{1}{2b} \exp\left(-\frac{|\eta|}{b}\right) \tag{4}$$

The added noise is proportional to the global sensitivity and inversely proportional to the privacy budget. The Laplace mechanism is limited to functions whose return value is real; the exponential mechanism is proposed for non-numerical query functions.
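The Laplace mechanism of Definition 4 can be sketched in a few lines of Python. The sketch below is illustrative only: the function names are assumptions, and the noise is drawn as the difference of two independent unit exponential variables, which is one standard way to sample a zero-mean Laplace variable with the standard library.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from the Laplace distribution with location 0.

    The difference of two independent Exp(1) variables follows a standard
    Laplace distribution; multiplying by `scale` gives Laplace(0, scale).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Return the true value perturbed with scale b = sensitivity / epsilon."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    # A smaller privacy budget gives a larger scale and hence noisier answers.
    for eps in (0.1, 1.0, 10.0):
        answers = [laplace_mechanism(100.0, 1.0, eps, rng) for _ in range(5)]
        print(eps, [round(a, 2) for a in answers])
```

Running the example shows the trade-off stated in the definition: the spread of the released values shrinks as the privacy budget grows.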
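Definition 5 (exponential mechanism). Given a utility function $u: (D \times O) \to R$, $r$ is an entity object in the output domain $O$ of the utility function. If the output of function $u$ satisfies equation (5), then $\varepsilon$-differential privacy is satisfied.

$$A(D, u) = \left\{ r : \Pr[r \in O] \propto \exp\left(\frac{\varepsilon\, u(D, r)}{2 \Delta u}\right) \right\} \tag{5}$$

where $\Delta u$ is the global sensitivity of the utility function, and the exponential mechanism returns the entity object $r$ with a probability proportional to $\exp\left(\varepsilon\, u(D, r) / (2 \Delta u)\right)$.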
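To make Definition 5 concrete, the following Python sketch selects one candidate with probability proportional to $\exp(\varepsilon u / (2\Delta u))$. The function names and the example utilities are hypothetical; subtracting the maximum utility before exponentiating is a numerical-stability choice that leaves the resulting probabilities unchanged.

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def exponential_mechanism_probs(utilities: Sequence[float], epsilon: float,
                                sensitivity: float) -> List[float]:
    """Selection probabilities proportional to exp(eps * u / (2 * sensitivity))."""
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    max_u = max(utilities)
    # Shifting by max_u avoids overflow and cancels out after normalisation.
    weights = [math.exp(epsilon * (u - max_u) / (2.0 * sensitivity))
               for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates: Sequence[T], utility: Callable[[T], float],
                          epsilon: float, sensitivity: float,
                          rng: random.Random) -> T:
    """Return one candidate drawn according to the exponential mechanism."""
    probs = exponential_mechanism_probs([utility(c) for c in candidates],
                                        epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    categories = ["cinema", "hospital", "school"]
    scores = {"cinema": 2.0, "hospital": 1.0, "school": 0.0}  # hypothetical utilities
    print(exponential_mechanism_probs(list(scores.values()), 1.0, 1.0))
    print(exponential_mechanism(categories, scores.get, 1.0, 1.0, rng))
```

With $\varepsilon = 0$ every candidate is equally likely, and as $\varepsilon$ grows the mechanism concentrates on the highest-utility candidate.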
3. Differential Privacy Location Protection Mechanism Based on the Hilbert Curve
3.1. System Architecture and Threat Model Analysis
The system architecture used in this paper is shown in Figure 2. It is mainly composed of four parts: positioning system, mobile terminal, communication network, and LBS server. The mobile user holds a mobile positioning device (a smartphone or vehicle-mounted mobile terminal), obtains accurate personal geographic location information through the Internet, GPS, or other positioning technologies, and then sends the query request to the LBS server through Wi-Fi or other communication networks. After the LBS server receives the query request from the user, it sends the service information to the user’s mobile terminal as a response, completing one request-response cycle.
A user makes a query request through the mobile terminal and may be exposed to privacy leakage while the LBS responds to the request. Several types of attackers in the system are shown in Figure 3. Attackers include the untrusted LSP itself, internal attackers, and external attackers. Generally speaking, the LSP is honest but curious: it provides query service to users honestly according to the agreement, but, to further its own commercial interests, it collects users’ location information by observing the disturbed locations it receives. In addition, there may be attackers inside the organization that provides the service who send users’ disturbed locations to external organizations for their own benefit. External attackers can be individuals or organizations that eavesdrop on users’ data or attack the server to gain access to it.
When the user sends a query request, the differential perturbation mechanism is executed on the user’s mobile device: Laplace noise is added to the user’s geographic location, and the disturbed location is provided to the LSP. When the LSP responds to the query and returns the results to the user, it cannot infer the user’s real geographic location even when it combines the disturbed location with some background information, which largely protects the user’s location privacy.
3.2. Implementation Mechanism and Algorithm
In this paper, interest points are defined as the real interest points in geographic space. Each interest point L can be approximately expressed as L(x, y, z), where (x, y) is the location coordinate of the interest point L and z represents the semantic location of the interest point L. User privacy protection should be context-aware. The DPLHc scheme obtains context (such as the same density) through the distribution of users’ interest points. The optimal data clustering and distance-preserving characteristics of the Hilbert curve make two adjacent points in two-dimensional space more likely to be adjacent in one-dimensional space; that is, all interest points in one-dimensional space have homogeneous context. When users request LBS query service, the users’ two-dimensional interest points are first mapped to one-dimensional space; then Laplace noise is added to the locations, which include the user’s real location. The probability of producing a given disturbed location is almost the same for every point in the location set, so the attacker cannot distinguish the user’s real location. As shown in Figure 4, the third-order Hilbert curve coding rule is adopted in this paper, and the circle points in the figure are the interest points after projection. The number in each atomic unit represents the Hilbert value of that atomic unit.
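The map-then-perturb idea described above can be sketched as follows. This is an illustrative sketch rather than the DPLHc implementation: the function names, the `sensitivity` parameter, and the decision to round and clamp the noisy Hilbert value before mapping it back to a grid cell are all assumptions. Rounding and clamping are post-processing of the noisy value, so they do not weaken the differential privacy guarantee of the Laplace step.

```python
import random
from typing import Tuple


def _rot(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate or flip a quadrant so the sub-curve has the standard orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def xy2d(order: int, x: int, y: int) -> int:
    """Map grid cell (x, y) to its Hilbert value on a 2**order x 2**order grid."""
    n = 1 << order
    d, s = 0, n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s >>= 1
    return d


def d2xy(order: int, d: int) -> Tuple[int, int]:
    """Inverse of xy2d: map a Hilbert value back to its grid cell."""
    n = 1 << order
    x = y = 0
    t, s = d, 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def perturb_cell(order: int, x: int, y: int, epsilon: float,
                 sensitivity: float, rng: random.Random) -> Tuple[int, int]:
    """Perturb a cell by adding Laplace noise to its one-dimensional Hilbert value."""
    d = xy2d(order, x, y)
    # Difference of two Exp(1) draws is a standard Laplace sample.
    noise = (sensitivity / epsilon) * (rng.expovariate(1.0) - rng.expovariate(1.0))
    max_d = (1 << (2 * order)) - 1
    noisy_d = min(max(int(round(d + noise)), 0), max_d)  # post-processing only
    return d2xy(order, noisy_d)


if __name__ == "__main__":
    rng = random.Random(2021)
    # Third-order curve (8 x 8 grid); the sensitivity value is an assumed parameter.
    print([perturb_cell(3, 2, 5, epsilon=1.0, sensitivity=4.0, rng=rng)
           for _ in range(5)])
```

Because neighbouring Hilbert values correspond to neighbouring cells, small noise in one dimension usually moves the reported location to a nearby cell, although cells that are close on the map can occasionally be far apart on the curve.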
In this paper, we first use the quadtree index structure to index the users’ locations: the area containing all users’ locations is divided recursively, which yields a quadtree QT with location set L. Then the location data is processed. Using the Hilbert curve described above, the users’ two-dimensional geographic locations are mapped into one-dimensional space, and the semantic information of the location elements is retained to obtain the corresponding complete tree data structure, which improves the efficiency of later searches for a target point. The quadtree after Hilbert curve mapping is shown in Figure 5.
Grid division of geographic areas is one of the effective methods to describe the location of interest points. As shown in Figure 4, the DPLHc scheme starts from the area containing the users’ locations, divides the geographic space into 4 grids at a time, and then iterates to obtain $2^h \times 2^h$ grids ($h$ is the division height) until a certain granularity is met. The space is thus divided into atomic regions that cannot be divided further, and the size of an atomic region is determined by the number of users’ locations C that the region can accommodate. In Figure 4, the area containing the users’ locations is divided into 8 × 8 grids, and each interest point belongs to exactly one grid. In this paper, we set . The DPLHc scheme uses the quadtree structure to generate the users’ location index, as shown in Algorithm 1. For the convenience of description, the symbol definitions involved in this algorithm are shown in Table 1.

The algorithm divides the geographic area into four parts according to the set of interest points P and the number of layers of quadtree and indexes the users’ locations. If the number of divided layers creates four new subnodes for the node of the layer, as shown in steps 1–6, and if it creates four new subnodes for the nodes of other segmentation layers, as shown in step 10, then, for each interest point belonging to , if the interest point is stored in the region of the ith child node of the quadtree node , these interest points are moved to their respective child nodes, as shown in steps 11–13. Finally, confirm which subnode the point p belongs to, and then recursively call to insert p into node D, as shown in steps 16 and 17. The above statements (1–17) are executed circularly until all interest points are inserted into node D, and finally the quadtree QT with location set is output.
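The recursive subdivision described above can be illustrated with a generic point-region quadtree. The Python sketch below is not a transcription of Algorithm 1: it assumes that a node is split into four children when it holds more than `capacity` points and has not yet reached `max_depth`, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class QuadNode:
    x0: float      # lower-left corner of the square region
    y0: float
    size: float    # side length of the region
    depth: int
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadNode"]] = None


def _split(node: QuadNode) -> None:
    """Create the four equally sized child regions of a node."""
    half = node.size / 2.0
    node.children = [
        QuadNode(node.x0 + dx * half, node.y0 + dy * half, half, node.depth + 1)
        for dy in (0, 1) for dx in (0, 1)
    ]


def _child_for(node: QuadNode, p: Point) -> QuadNode:
    """Return the child whose region contains point p."""
    half = node.size / 2.0
    col = 1 if p[0] >= node.x0 + half else 0
    row = 1 if p[1] >= node.y0 + half else 0
    return node.children[row * 2 + col]


def insert(node: QuadNode, p: Point, capacity: int, max_depth: int) -> None:
    """Insert p, splitting a leaf that overflows unless it is at max_depth."""
    if node.children is not None:
        insert(_child_for(node, p), p, capacity, max_depth)
        return
    node.points.append(p)
    if len(node.points) > capacity and node.depth < max_depth:
        _split(node)
        moved, node.points = node.points, []
        for q in moved:  # push the stored points down to the new children
            insert(_child_for(node, q), q, capacity, max_depth)


def build_quadtree(points: Sequence[Point], origin: Point, size: float,
                   capacity: int, max_depth: int) -> QuadNode:
    root = QuadNode(origin[0], origin[1], size, 0)
    for p in points:
        insert(root, p, capacity, max_depth)
    return root


def leaves(node: QuadNode) -> Iterator[QuadNode]:
    """Yield all leaf nodes (the atomic regions) of the tree."""
    if node.children is None:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


if __name__ == "__main__":
    pts = [(0.5, 0.5), (0.6, 0.4), (7.2, 7.9), (4.1, 1.3), (3.9, 6.0)]
    tree = build_quadtree(pts, origin=(0.0, 0.0), size=8.0, capacity=1, max_depth=3)
    for leaf in leaves(tree):
        if leaf.points:
            print(leaf.depth, (leaf.x0, leaf.y0), leaf.size, leaf.points)
```

With `max_depth=3` the finest leaves form an 8 × 8 grid, so each leaf at that depth corresponds to one atomic cell of a third-order Hilbert curve.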
The DPLHc scheme traverses and searches the tree formed after the location processing, obtains the location data L and non-location data B of each marker point, and applies differential privacy technology to add noise separately to the location data and to over 70% of the non-location data.
The DPLHc scheme needs to combine the users’ location information with more than 70% of the non-location information to protect the users’ privacy information. Adding noise to each coordinate of an interest point separately provides a higher degree of privacy protection than adding noise to the point as a whole. Formally, let $P = \{p_1, p_2, \ldots, p_k\}$ denote $k$ interest points, where $p_r$ is the user’s real interest point and $p'$ is the disturbed interest point. For any two interest points $p_i$ and $p_j$, their probabilities of generating the perturbed interest point $p'$ should satisfy the following equation:

$$\Pr(p' \mid p_i) \le e^{\varepsilon} \cdot \Pr(p' \mid p_j) \tag{6}$$
For more than 70% of the non-location information, given the privacy budget $\varepsilon$, noise is added to the non-location information collected by traversal to meet the requirements of differential privacy. For the attribute set of non-location data D, its continuous-valued attributes are marked as $B_c$, and its non-continuous-valued attributes are marked as $B_d$. Different noise mechanisms are applied to non-location information with different attribute types: Laplace noise is added to the continuous values to be disturbed, and exponential noise is added to the non-continuous values to be disturbed. The exponential mechanism outputs discrete values with a probability proportional to $\exp\left(\varepsilon\, u(D, r) / (2 \Delta u)\right)$. In the DPLHc scheme, the anonymous data processing procedure is shown in Algorithm 2.

The input parameters of the algorithm are the users’ location data set L, the given privacy budget $\varepsilon$, the users’ non-location data set D, the non-location data attribute set B, the continuous attribute data set $B_c$, the discrete attribute data set $B_d$, and the tree height h. The processing objects are the location data set L and the non-location data set D. The algorithm first allocates the privacy budget in step 2; then the location data and non-location data are classified. For any element of the location data set L, Laplace noise is added to its abscissa and ordinate, respectively, for differential perturbation in steps 3–5; next, the non-location data set is divided according to attributes in step 8. If the non-location data belongs to a continuous-valued attribute, Laplace noise is added for differential perturbation in steps 9 and 10; if the non-location data belongs to a discrete-valued attribute, exponential noise is added for differential perturbation in steps 11 and 12. The above statements (3–12) are executed repeatedly until anonymous processing has been performed for all location data and more than 70% of the non-location data. Finally, the anonymous data set is output in step 16.
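The processing flow just described can be sketched in Python as follows. This is a hedged illustration, not Algorithm 2 itself: the text does not state how the privacy budget is allocated, so the sketch assumes an even split between location and non-location data and a further even split across coordinates and attributes (sequential composition). The sensitivities (`coord_range`, the attribute ranges), the 0/1 utility used for discrete attributes, and the random choice of which records to perturb are likewise assumptions, and all names are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def exp_mech(domain: Sequence[str], true_value: str, epsilon: float,
             rng: random.Random) -> str:
    """Exponential mechanism with utility 1 for the true value, 0 otherwise."""
    weights = [math.exp(epsilon * (1.0 if v == true_value else 0.0) / 2.0)
               for v in domain]  # utility sensitivity is 1
    return rng.choices(list(domain), weights=weights, k=1)[0]


def anonymize(locations: Sequence[Tuple[float, float]],
              records: Sequence[Dict[str, object]],
              cont_ranges: Dict[str, float],
              disc_domains: Dict[str, List[str]],
              epsilon: float, ratio: float = 0.75,
              coord_range: float = 1.0,
              seed: Optional[int] = None):
    rng = random.Random(seed)
    # Assumed budget allocation: half for locations, half for non-location data.
    eps_loc = eps_non = epsilon / 2.0
    eps_coord = eps_loc / 2.0  # abscissa and ordinate are perturbed separately
    noisy_locations = [
        (x + laplace(coord_range / eps_coord, rng),
         y + laplace(coord_range / eps_coord, rng))
        for x, y in locations
    ]
    n_attrs = max(len(cont_ranges) + len(disc_domains), 1)
    eps_attr = eps_non / n_attrs
    # Perturb a randomly chosen fraction `ratio` of the non-location records.
    chosen = set(rng.sample(range(len(records)), math.ceil(ratio * len(records))))
    noisy_records = []
    for i, rec in enumerate(records):
        out = dict(rec)
        if i in chosen:
            for attr, width in cont_ranges.items():      # continuous: Laplace
                out[attr] = rec[attr] + laplace(width / eps_attr, rng)
            for attr, domain in disc_domains.items():    # discrete: exponential
                out[attr] = exp_mech(domain, rec[attr], eps_attr, rng)
        noisy_records.append(out)
    return noisy_locations, noisy_records


if __name__ == "__main__":
    locs = [(0.25, 0.75), (0.60, 0.10)]
    recs = [{"age": 20 + i, "job": "teacher"} for i in range(8)]
    print(anonymize(locs, recs, {"age": 60.0},
                    {"job": ["teacher", "doctor", "driver"]},
                    epsilon=1.0, seed=3))
```

The default `ratio=0.75` mirrors the proportion of anonymized non-location data used in the experiments; other allocations of the budget would be equally compatible with the description above.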
4. Theoretical Analysis of Algorithm
4.1. Privacy Analysis
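In this paper, the users’ location points are represented by specific abscissa and ordinate. Given the users’ location point set $P = \{p_1, p_2, \ldots, p_k\}$, one of which is the user’s real location $p_r$, Laplace noise is added to the users’ location points to disturb the user’s real location, and $p'$ is the interest point after the disturbance. For any two interest points $p_i$ and $p_j$ among the $k$ interest points, according to the Laplace mechanism, there is

$$\Pr(p' \mid p_i) = \frac{1}{2b} \exp\left(-\frac{|p' - p_i|}{b}\right), \qquad \Pr(p' \mid p_j) = \frac{1}{2b} \exp\left(-\frac{|p' - p_j|}{b}\right) \tag{7}$$

Let $b$ be $b_x = (x_{\max} - x_{\min})/\varepsilon$ to produce $x'$ and $b_y = (y_{\max} - y_{\min})/\varepsilon$ to produce $y'$, where $x_{\max}$ ($y_{\max}$) is the maximum and $x_{\min}$ ($y_{\min}$) is the minimum of the corresponding coordinate in $P$. Thus, the perturbed interest point $p' = (x', y')$ can be obtained.

For interest points $p_i$, $p_j$, and $p'$, we can get the following result from the triangle inequality:

$$|p' - p_j| \le |p' - p_i| + |p_i - p_j| \tag{8}$$

Rearrange formula (8), divide both sides by $b$, exponentiate both sides with base $e$, and then multiply both sides by $1/(2b)$ to get

$$\frac{1}{2b} \exp\left(-\frac{|p' - p_i|}{b}\right) \le \exp\left(\frac{|p_i - p_j|}{b}\right) \cdot \frac{1}{2b} \exp\left(-\frac{|p' - p_j|}{b}\right) \tag{9}$$

Combining equations (7) and (9), we have the following inequality:

$$\Pr(p' \mid p_i) \le \exp\left(\frac{|p_i - p_j|}{b}\right) \cdot \Pr(p' \mid p_j) \tag{10}$$

For coordinates $x$ and $y$, equation (10) can be expressed as

$$\Pr(x' \mid x_i) \le \exp\left(\frac{|x_i - x_j|}{b_x}\right) \cdot \Pr(x' \mid x_j) \tag{11}$$

$$\Pr(y' \mid y_i) \le \exp\left(\frac{|y_i - y_j|}{b_y}\right) \cdot \Pr(y' \mid y_j) \tag{12}$$

By bounding the exponents in equations (11) and (12), using $|x_i - x_j| \le x_{\max} - x_{\min}$ and $|y_i - y_j| \le y_{\max} - y_{\min}$, we can get

$$\exp\left(\frac{|x_i - x_j|}{b_x}\right) \le e^{\varepsilon}, \qquad \exp\left(\frac{|y_i - y_j|}{b_y}\right) \le e^{\varepsilon} \tag{13}$$

that is,

$$\Pr(x' \mid x_i) \le e^{\varepsilon} \cdot \Pr(x' \mid x_j), \qquad \Pr(y' \mid y_i) \le e^{\varepsilon} \cdot \Pr(y' \mid y_j) \tag{14}$$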
It can be seen from the above that the anonymous data processing algorithm in the DPLHc scheme satisfies differential privacy.
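The bound can also be checked numerically. The Python sketch below assumes, for one coordinate, that the Laplace scale is set to $b = (x_{\max} - x_{\min})/\varepsilon$, where $x_{\max}$ and $x_{\min}$ are the largest and smallest coordinate among the candidate points; this choice of scale and the function names are assumptions made for illustration. It evaluates the Laplace density of many possible outputs under every candidate point and reports the worst-case ratio, which should not exceed $e^{\varepsilon}$.

```python
import math
from typing import Sequence


def laplace_pdf(z: float, mu: float, b: float) -> float:
    """Density at z of the Laplace distribution with location mu and scale b."""
    return math.exp(-abs(z - mu) / b) / (2.0 * b)


def max_density_ratio(points: Sequence[float], epsilon: float,
                      outputs: Sequence[float]) -> float:
    """Largest ratio Pr(z | p_i) / Pr(z | p_j) over all outputs and point pairs."""
    b = (max(points) - min(points)) / epsilon  # assumed scale: range / epsilon
    worst = 0.0
    for z in outputs:
        densities = [laplace_pdf(z, p, b) for p in points]
        worst = max(worst, max(densities) / min(densities))
    return worst


if __name__ == "__main__":
    xs = [2.0, 3.5, 7.0, 9.0]            # hypothetical x-coordinates of k = 4 points
    eps = 0.8
    grid = [i / 10.0 for i in range(-500, 501)]
    print(max_density_ratio(xs, eps, grid), "<=", math.exp(eps))
```

The worst case is attained for outputs that lie outside the interval spanned by the candidate points, where the ratio between the two extreme candidates equals $e^{\varepsilon}$.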
4.2. Security Analysis
In the location privacy protection method provided in this paper, the user submits a query to the LBS server in order to find the interest point closest to the current location, for example, the movie theater closest to the user’s location. Ideally, because of the disturbance, the attacker cannot identify any connection between the disturbed location and the user’s real location. However, when the attacker knows the density of interest points on the map, the user’s approximate location, and the noise distribution, the attacker may try to infer the user’s real location from this information. In the anonymization process, the Laplace mechanism with scale parameter $b$ is used, so the probabilities of generating the same disturbed location from any two points in the location point set differ by at most a small constant factor $e^{\varepsilon}$. Given the perturbation point $p'$, no matter which interest point is used to implement the perturbation, the probability that an interest point produces the perturbation is the same (within the constant factor $e^{\varepsilon}$), and the attacker cannot use the above information to improve the probability of guessing the user’s real location. Therefore, the DPLHc scheme can effectively protect the user’s location privacy.
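The argument above can be illustrated from the attacker’s point of view with a small Bayesian calculation. The sketch below assumes a one-dimensional setting, a uniform prior over the $k$ candidate points, and a Laplace scale of range/$\varepsilon$; these assumptions and the function name are illustrative. Under them, the posterior probability of every candidate stays between $e^{-\varepsilon}/k$ and $e^{\varepsilon}/k$, so observing the disturbed location changes the attacker’s belief by at most the factor $e^{\varepsilon}$.

```python
import math
from typing import List, Optional, Sequence


def posterior(points: Sequence[float], observed: float, b: float,
              prior: Optional[Sequence[float]] = None) -> List[float]:
    """Posterior over candidate points given one Laplace-perturbed observation."""
    k = len(points)
    prior = list(prior) if prior is not None else [1.0 / k] * k
    # The 1/(2b) factor of the Laplace density cancels in the normalisation.
    likelihoods = [math.exp(-abs(observed - p) / b) for p in points]
    joint = [l * q for l, q in zip(likelihoods, prior)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    xs = [2.0, 3.5, 7.0, 9.0]             # hypothetical candidate locations
    eps = 0.8
    b = (max(xs) - min(xs)) / eps          # assumed scale: range / epsilon
    for z in (-5.0, 3.0, 8.5, 20.0):
        post = posterior(xs, z, b)
        print(z, [round(q, 3) for q in post])
    print("bounds:", math.exp(-eps) / len(xs), math.exp(eps) / len(xs))
```

A smaller $\varepsilon$ pulls every posterior closer to the uniform prior $1/k$, which matches the statement that the attacker cannot improve the probability of guessing the real location by more than a constant factor.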
4.3. Algorithm Complexity Analysis
Firstly, the anonymous data processing algorithm in this paper uses a greedy method to recurse the quadtree from top to bottom, and the time complexity is . Then the algorithm classifies the data information contained in each node, and the time complexity of adding noise to the location data set is . For the data with continuous attributes in the non-location data set, the time complexity of adding Laplace noise is ; for the data with discrete attributes in the non-location data set, the time complexity of adding exponential noise is .
5. Experimental Results and Analysis
5.1. Experimental Setup
In order to study the feasibility of the algorithm proposed in this paper, the system hardware configuration adopted is an Intel(R) Core(TM) i7 compatible PC with a main frequency of 3.4 GHz, 4 GB of memory, and more than 200 GB of free disk space; the software configuration platform is Windows 7 operating system, Microsoft SQL Server database system, and C/S structure operating mode. The experiment is based on three data sets: Geolife, Amazon Access Samples, and Diversification data set Div400. The source databases of the data sets are Geolife GPS Trajectory stores, UCI Machine Learning Repository, and UMass Trace Repository. The experimental data sets include the users’ location information and nonlocation information. The data set size and attribute characteristics are shown in Table 2.
5.2. Experimental Results
Commonly used spatial indexing technology can improve the operational efficiency of spatial information databases. The DPLHc scheme adds differential privacy anonymity technology based on the quadtree spatial index technology and divides location space recursively into a tree structure of different levels. When spatial data objects are evenly distributed, the quadtree offers high spatial data insertion and query efficiency. The KDCKmedoids algorithm [14] adds differential privacy anonymity technology based on the k-d tree spatial index technology. The k-d tree is a structural form of multidimensional retrieval. It divides the location points in the k-dimensional space and, in each layer, makes branching decisions for the corresponding objects according to the discriminator of that layer. It has the same good performance as a binary tree for matching and finding exact points (the average search length is ). The PRCAN algorithm [15] introduces R-tree spatial index technology to make local indexes meet the requirements of differential privacy. All leaf nodes with overlapping regions in the R-tree are redivided into disjoint regions. For leaf nodes with mutually exclusive regions, an independent noise-adding mechanism makes the PR-tree meet the requirements of differential privacy. In this paper, the data availability, the degree of privacy protection, and the generation time of anonymous data sets are used to verify the effectiveness of the scheme.
5.2.1. Data Availability
In the simulation experiment, under the condition of a gradual increase of privacy budget, we test three data sets that meet the privacy requirements, compare the accuracy of anonymous data output by the algorithm, and analyze the data availability of the algorithm under different privacy budgets. For different scale parameters $b$, the amount of noise is proportional to the global sensitivity and inversely proportional to the privacy budget $\varepsilon$. To verify the impact of the DPLHc scheme, the KDCKmedoids algorithm, and the PRCAN algorithm on data availability under the requirements of differential privacy protection, the performance of the algorithms under different data sets and different privacy budget requirements is tested in this paper. The precision of the published data reflects how usable the processed data set remains while the query requirements are met. The published data precision of each experimental data set is shown in Figures 6–8.
The degree of privacy protection decreases as the privacy budget increases. When the privacy budget is smaller, the degree of data protection is greater, and when $\varepsilon = 0$, perfect privacy is achieved. As can be seen from Figures 6–8, with the increase of the privacy budget, the protection degree of the differential privacy algorithms on published data decreases, so the data availability of the three algorithms increases. The location space tree structure generated by the quadtree index in this paper is independent of the nature of the experimental data set, so the precision of differential privacy publishing data based on the quadtree index improves with the increase of privacy budget. Compared with the KDCKmedoids algorithm and the PRCAN algorithm, the DPLHc scheme has higher data availability while maintaining a certain algorithm stability. As the privacy budget increases, the precision of the KDCKmedoids algorithm and the PRCAN algorithm remains lower than that of the DPLHc scheme; that is, they have poorer data availability and algorithm stability.
5.2.2. The Degree of Privacy Protection
(1) The Relationship between the Degree of Privacy Protection and the Laplace Scale Parameter b. To study the privacy protection performance of the algorithm and find the best balance between data availability and the degree of privacy protection, the experiment analyzes and compares the average privacy protection degree of each data set under different Laplace scale parameters b; the larger the scale parameter b, the higher the degree of privacy protection. The results of applying differential privacy anonymization to the experimental data sets with the DPLHc scheme, the KDCKmedoids algorithm, and the PRCAN algorithm are shown in Figures 9–11. From the experimental results, we can see that, as the Laplace scale parameter b varies, the privacy protection degree of the DPLHc scheme can basically reach more than 80%, and, as anonymity requirements increase, the execution efficiency of the algorithm is not greatly reduced. However, the privacy protection degree of the KDCKmedoids algorithm and the PRCAN algorithm is less than 80% when the scale parameter b is relatively small. It can also be seen from the figures that, for a fixed scale parameter b, the privacy protection degree of the DPLHc scheme is higher than those of the KDCKmedoids algorithm and the PRCAN algorithm, and, with the increase of scale parameter b, the privacy protection performance of the DPLHc scheme is more stable.
(2) The Relationship between the Degree of Privacy Protection and Anonymized Non-location Data Ratio k. To protect users’ location privacy better, the DPLHc scheme takes into account the inference of the user’s location privacy from non-location information and performs differential anonymous processing on more than 70% of non-location data. The greater the proportion k of anonymized non-location data, the higher the user’s location privacy protection. Assuming that the sensitivity of each data item in the users’ non-location data set is equal, Figure 12 shows the influence of the DPLHc scheme on the degree of location privacy protection under different anonymized non-location data ratios k and data sets.
Through the experimental comparison, we can see that, with the increase of the proportion k of anonymized non-location data, the degree of privacy protection of each data set is improved. It can also be seen from the experiment that when the anonymized non-location data ratio k is fixed, the privacy protection degree of each data set is not much different; that is, the DPLHc scheme has good algorithm stability on the premise of ensuring the privacy protection degree. However, the anonymized non-location data ratio k also affects data availability, so k should not be too high; k is set to 75% in this paper.
5.2.3. The Generation Time of Anonymous Data Sets
Considering the choice of adding differential privacy anonymity methods to different spatial index trees, the average times taken by the DPLHc scheme, the KDCKmedoids algorithm, and the PRCAN algorithm to generate anonymous data sets are shown in Tables 3–5. Figures 13–15 show the comparison results of the generation time of the three methods under different data sets and different privacy budgets, where .
As can be seen from Figures 13–15, with the increase of privacy budget $\varepsilon$, that is, the reduction of privacy protection degree, the DPLHc scheme takes much less time to generate anonymous data sets than the PRCAN algorithm and is therefore more efficient. Because the DPLHc scheme uses the quadtree spatial index, whose construction is independent of the data, it avoids unnecessary overhead while the quadtree is built. When , the time of generating anonymous data set by the DPLHc scheme is slightly higher than that by the KDCKmedoids algorithm. When , the time of generating anonymous data set by the DPLHc scheme is lower than that by the KDCKmedoids algorithm.
Through the comparison, we can also see that, with the decrease of $\varepsilon$, the time for the three algorithms to generate anonymous data sets increases, but the time of the DPLHc scheme is clearly smaller than those of the KDCKmedoids algorithm and the PRCAN algorithm. In other words, with the increase of $\varepsilon$, the DPLHc scheme takes less time, has more obvious advantages, and is more practical.
Through the above experimental comparison, it is found that the DPLHc scheme has higher data availability and shorter generation time of anonymous data set on the premise of satisfying privacy protection as far as possible. The DPLHc scheme can protect the user’s location privacy and improve the quality of location services effectively.
6. Conclusion
Aiming at balancing the degree of privacy protection and the quality of services in the LBS system, a differential privacy location protection scheme based on the Hilbert curve is proposed in this paper on the basis of the existing differential privacy model. The scheme no longer relies on a TTP and adds Laplace noise to the users’ locations in the one-dimensional space mapped by the Hilbert curve. It can prevent attacks by adversaries with background information and provides strong privacy protection. It can effectively solve the balance problem between the degree of privacy protection and the quality of services. Experimental results show that the DPLHc scheme has obvious advantages in data availability, the degree of privacy protection, and the generation time of anonymous data sets.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (grant no. 61702316) and Shanxi Provincial Natural Science Foundation (Grant nos. 201801D221177 and 201901D111280).