Abstract

The Internet of Things (IoT) connects billions of physical devices around the world to the Internet to collect and share massive data. Location privacy leakage has received considerable attention in the field of security. In order to implement the -anonymity location privacy protection mechanism, a previous work constructed an anonymous location set by retrieving historical request record database from the trusted third anonymous server, which advantageously protects user’s location privacy. However, the performance of location-based services (LBSs) is weakened by the time overhead of continual retrieving database. Moreover, with the increasingly flourishing of the positioning technique, user location can be accurate to the user’s altitude beyond the two-dimensional latitude and longitude coordinates. In this paper, we present a location privacy protection strategy in LBSs based on Alt-Geohash coding (LPP-AGC) method synthetically considering the altitude of user location and time overhead. Specifically, we give the Alt-Geohash coding algorithm (AGCA) to retrieve the historical request record database, which greatly reduces the time overhead and ensures the immediacy of LBSs. In addition, we propose the dummy location generating algorithm (DLGA) and location filtering algorithm (LFA) to provide users with autonomous -anonymity location privacy protection. Extensive simulations are performed to verify the performance and security of the proposed strategy.

1. Introduction

With the prosperity of smart devices and communication technology, the Internet of Things (IoT) has become one of the most important areas of future technology. As a new technology paradigm, IoT is a large-scale interconnected network composed of various sensors, radiofrequency identification (RFID), global positioning system (GPS), and other interactive machines, equipment, vehicles, electrical appliances, and other things with communication and computing [13]. IoT is widely used in transportation and logistics, industrial manufacturing, healthcare, smart environment (home, office, and factory), smart city, etc. Meanwhile, location-based service (LBS) has shown a significant growth trend in recent years [4, 5]. LBS, along with the mobile networks, can determine the accurate geographical locations of the mobile users and provide convenient location services to users for navigation based on the dynamic geospatial database. As a result, the wide pervasiveness of LBSs significantly changes our lifestyle.

Although LBS plays a critical role in maintaining the diversity of public life, the potential leakage of privacy resulting from no standardized LBS market system may occur when users utilize their location information in exchange for corresponding services. Specifically, numerous users request the LBS service as well as tightly coupled with their identity, precise location, and query content to location service providers (LSPs). Unfortunately, deliberate LSPs may disclose these information to other malicious ones for their benefits [6, 7], resulting in certain things unaccomplished, disruption of fame, and physical damage. In view of this reason, the vulnerability of privacy may cause users to panic while enjoying LBSs. Nevertheless, since the public or industry users have an extensive demand for LBSs, the issue of imminently improving the security of location privacy has become a rigorous challenge [810].

Recently, many location privacy protection methods have been proposed, such as location -anonymity [1113], dummy location generation [14, 15], and encryption techniques [1618].

The location -anonymity technique [11] constructs an anonymous location set containing the real user and the remaining users’ locations and sends the anonymous location set to LSPs by means of the trusted third anonymous server, which ensures the probability of the user’s real location recognized is . However, generating the remaining locations may bring about tremendous time overhead via retrieving huge location records. Moreover, as the number of location services requested by users increases, retrieving data in location records will become increasingly time-consuming, which will affect the anonymity protection effect and degrade the instant superiority of location services.

The dummy location generation technique [14] can satisfy the users’ privacy protection requirement by generating dummy locations and usually does not rely on anonymous servers, reducing the communication overhead. Nevertheless, research has consistently shown that the generated dummy locations lack the rationality of spatial distribution, causing the anonymous protection to defeat [15].

The encryption technique [16] maps the user location and request content into the ciphertext space for calculating and searching, so that LSPs cannot obtain the decoded query content. However, the encryption technique has accentuated the issue of excessive communication overhead and complicated algorithm design.

The idea of the coding method is employed in the encryption technique, which is an effective solution to location privacy protection as well. The coding method can reduce the accuracy of the location by dimensionality reduction of the spatial location data. Especially, in the geospatial filed, Geohash coding [1921] is often adopted for nearest neighbor search and location privacy protection.

Generally, Geohash coding converts the two-dimensional latitude and longitude of a location into a region represented by a one-dimensional string, which can obscure the precision of the location. Moreover, since the locations that have the same Geohash code are in the same region, Geohash coding can perform fast retrieval within this region as well. It is worth noting that as the positioning method becomes more and more accurate, the location of users in LBS includes not only latitude and longitude coordinates but also altitude.

It is encouraging that employing the user locations within the same altitude for location privacy protection will increase the difficulty for attackers to identify the real user and enhance the effect of anonymous protection. This finding will make a considerable contribution to the field of location privacy protection. For example, there are multiple users using LBS in a multistorey building. The dummy locations generated by the location privacy protection method combined with the altitude has the same longitudes and latitudes as the users’ real locations, but the floors (altitudes) are different. Thus, this method can better protect the privacy of the users. To the best of our knowledge, considering synthetically the altitude of location to protect location privacy is largely under explored domain. For this motivation, in this paper, we propose the Alt-Geohash coding method, which is a novel coding method combining the altitude of the user location and Geohash coding.

The Alt-Geohash coding method converts the three-dimensional location of the user’s latitude, longitude, and altitude into a one-dimensional string. Since user location still retains the characteristics through Alt-Geohash coding, the candidate anonymous location set can be constructed via selecting locations with encoded characters similar to real user. It can avoid time delay by means of string retrieval and provide users with exceedingly faster and safer location services.

To further provide users with autonomous personalized -anonymity location privacy protection, we present dummy location generating algorithm (DLGA) and location filter algorithm (LFA), which allow the user to set the value of according to their own preferences and needs. If the number of locations in the candidate anonymous location set is less than , we use DLGA to divide the anonymous region into parts and randomly select in each subregion, meanwhile avoiding the unreasonable location. This not only ensures the uniform distribution of the generated dummy locations but also increases the difficulty for the attacker to identify the real user. If the number of locations in the candidate anonymous location set is greater than , we adopt LFA to select locations. LFA divides user locations into four categories and then selects the location similar to the real user based on the corresponding ratio of every category. Thus, the constructed anonymous locations can be evenly distributed, and it is difficult for attackers to identify the user’s real location.

Our main contributions are summarized as follows: (1)To the best of our knowledge, we are the first to combine altitude of the user location into Geohash coding and propose Alt-Geohash coding method(2)We present a location-based privacy protection strategy based on Alt-Geohash coding (LPPS-AGC) in LBS, which synthetically considers the altitude of the user location and time overhead(3)Our strategy includes location generalization algorithm (LGA), Alt-Geohash coding algorithm (AGCA), reverse retrieval algorithm (RRA), dummy locations generating algorithm (DLGA), and location filtering algorithm (LFA) to effectively protect the user location privacy in LBSs(4)Our strategy provides customizable -anonymity model with users for protecting location privacy by virtue of the trusted third anonymous server

The rest of this paper is organized as follows. Section 2 begins by laying out the related work on location privacy protection and Geohash coding application. Section 3 is concerned with the preliminaries used for this study. The details of algorithms design and the security analysis are presented in Section 4. Then, the proposed algorithms are evaluated with extensive simulation experiment in Section 5. Finally, Section 6 depicts conclusions and future work.

2.1. Spatial Cloaking Method

The spatial cloaking method [12, 2226] signifies that it can decrease the accuracy of user’s real location through enlarging the revealed spatial region. The most influential technique for spatial cloaking is location -anonymity, which conceals the real user’s location by other locations in the same spatial region.

In [22], Gruteser and Grunwald presented an adaptive interval cloaking algorithm to accomplish location -anonymity via utilizing qaud-tree for spatial and temporal cloaking. However, this approach is unrealistic in practice as most users have varied privacy requirements under different contexts.

In [23], Chow et al. proposed a Casper model-based spatial cloaking method to achieve personalized privacy preserving. This method divides the space into layers by a quad tree, and each layer maintains the information of the whole spatial structure to improve the performance of anonymity effectively. Unfortunately, since it needs the cooperation of trustworthy users, the failure to construct the anonymous region due to lack of sufficient dummy locations in a region with few users becomes a major defect.

In order to solve the location privacy problem in the region with few users, in [24], Ni et al. presented -SDCA algorithm in a sparsely populated region to construct anonymous region. This algorithm achieves -anonymity by selecting dummy users with high historical query probability, relatively uniform geographical distribution, and large difference in request contents by anonymous server, which had high security and better efficiency. However, since this method takes no account of the overhead of retrieving historical query records, it may spend a lot of time, affecting the service quality of LBSs.

In [25], Sun et al. proposed a location label-based (LLB) algorithm for privacy preservation in the scenario where the locations of users are nearby, similar or identical. In addition, the request aggregation protocol and the pseudoidentity exchange protocol were presented to reduce the response time of LBS. However, LLB takes more uncontrolled factors and is difficult to realize.

In [26], Ruchika and Udai presented an interesting scheme VIC-PRO (vicinity protection) to fortify the location privacy of the client alongside vicinity protection by concealing location coordinates using geometrical transformations in LBS.

In [12], Simon and Alouini proposed a -anonymity location privacy algorithm KABC based on clustering to eliminate outliers and establish the anonymous group in the anonymous model. This algorithm balances the conflict between the privacy preserving security and query quality of the location service. However, the time overhead is not optimized due to the iterative process of the clustering algorithm.

2.2. Dummy Location Generation Technique

The dummy location generation (DLG) technique [2733] which does not rely on the trusted third anonymous server is one of the ubiquitous research effort to protect location privacy. For each request associated with a user’s specific location, DLG generates multiple dummy locations close to the real one and submits them to the anonymous server as the service request. Thus, LSP cannot distinguish user’s real location.

In [27], Kido et al. defined the dummy location generation approach where the user adopts a random walking model to generate dummy locations. However, in [28], Lu et al. pointed out that, when the generated dummy locations had high density distribution, the protection degree of the user’s location privacy could be reduced. To solve this weakness, they split the circle/grid region into equal-sized subregions and distributed all the locations on different radiuses/vertices. Unfortunately, this method ignores the background information possessed by the adversary. If the adversary exploits the public knowledge such as the urban map, some dummy locations generated by this method could be identified easily because they may be in a river or on a mountain.

SpaceTwist is a classical privacy preserving method without the trusted third anonymous server, which employs an anchor to replace the user’s real location [30]. In addition, users can gain the accurate result according to their location information. Although this algorithm realizes location privacy protection, the degree of privacy preserving is not high.

In [29], Niu et al. presented two rules to generate dummy locations: (1) the query probability of each dummy should be similar to that of the user’s real location and (2) under the premise that the query probability of the locations remained unchanged. However, these locations should be spread as far as possible. Moreover, since the procedure of the algorithms runs on mobile clients, the amount of computing is relatively large. Thus, they have high requirements on computing power and storage space of mobile clients.

In [31], Hara et al. proposed a location privacy preservation method that can be applicable to a real environment, considering the traceability of the dummy locations to simulate the mobility of real user. This method can anonymize the users’ locations even against continuous observation.

2.3. Geohash Coding Application

Geohash coding is proverbially applied in many fields with the advantages of rapid retrieval. In [19], Liu et al. proposed the Geohash to build a distributed spatial index for the distributed memory in geographical information system (GIS) fields. This method promotes the reading and writing performance of spatial data and provides a solid technical foundation for high-performance geographical computation.

In [34], Li et al. presented a location-aware wireless body area network (WBAN) data monitoring system on the not only structured query language (NoSQL) database system based on Geohash spatial index, which can efficiently process location-based queries for medical signal monitoring. This novel data analysis method advantageously promotes the development of medical analysis services.

In [35], Guo et al. proposed an adaptive Hilbert-Geohash meshing and coding method called AHG in the geographic meshing system, which could represent both the location and the approximate size of the coded object directly by the meshing hierarchy and the corresponding coding length. AHG shows favorable stability and scalability besides its capability in accelerating spatial query. The method is now applied successfully to several spatial query tools in a high-performance geographic information system called HiGIS.

The Geohash coding technique [19] can encode the geographic coordinates and perform fast spatial neighbor search, avoiding the time overhead of complex space calculation and speeding up the retrieval. In addition, by mapping the user’s precise location into the spatial region by encoding, the user’s location privacy information can be effectively protected, which is very practical. Therefore, in this paper, we improve the Geohash coding technique to reduce time overhead and provide secure location services.

Geographical indistinguishability means that the user’s real location and approximate location are indistinguishable within a specified range, which is an extension of traditional differential privacy. In order to protect user’s privacy when dealing with LBS, Chatzikokolakis et al. [36] proposed two mechanisms to achieve geographic indistinguishability and extended them to location tracking, while providing a restriction due to the correlation between points sequence leads to the degradation of privacy.

If there is no privacy protection in the geographic location, users may be unwilling to share spectrum with secondary users. Therefore, Dong et al. [37] proposed the LpriDSS method to realize the geographical indistinguishability of the primary user while the first spectrum manager selects the spectrum sharing auxiliary user. User’s privacy leakage is a common but fatal problem. Taking into account the randomness of task arrival and the complexity of task allocation, Liu et al. [38] proposed an online control mechanism to maximize the profit of the system platform while ensuring the stability of the system, providing personalized location privacy protection.

Existing location privacy protection schemes have their own advantages. However, most methods ignore the time overhead of retrieving historical data, which causes the speed tardiness of providing location services and affects the experience of users for LBS. In this paper, combining the convenience of the Geohash coding, our LPPS-AGC can be capable of encoding the three-dimensional location of user. Through coding, it can quickly retrieve and single out the candidate locations with the same code. Meanwhile, we execute corresponding process for locations in the candidate anonymous location set according to the value, providing an anonymous location set that satisfies user’s privacy requirements and achieving customized anonymous protection. As a result, our LPPS-AGC can not only speed up the anonymous processing time but also provide personalized privacy protection with great application prospects.

3. Preliminaries

In this section, we describe the related definitions and Alt-Geohash coding method.

3.1. Related Definitions

First, we declare the concept of location -anonymity.

Definition 1 (location -anonymity, see [22]). Location -anonymity, formulated by Gruteser and Grunwald, employs user’s location requesting service indistinguishable from other locations to guarantee the probability of identification successfully becomes .

Here, describes the desired anonymity degree, which is defined by the user. Larger implies higher degree of anonymity.

In this paper, the location query information sent by the user is defined as follows.

Definition 2 (location query information ()). Location query information is in the form of (, , , , ), which describes the information sent by the user to request information about LBS, where (i) is the user identity(ii) is the real location of the user. It is a triplet (, , ), where , , and represent the longitude, latitude, and altitude of the user’s location, respectively(iii) is the current query time(iv) is the content of the query(v) is the privacy demand

In our strategy, the anonymous server constructs an anonymous location set to replace user’s real location and sends it to LSPs, which protects the location query information . Anonymous location set is defined as follows.

Definition 3 (anonymous location set ()). Anonymous location set is defined as a set of locations containing user’s real location and the remaining locations generated by an anonymous server, which is in the form of (, , ,..., ) (1), where (i) is the real location of the user(ii) () represents the th location generated by an anonymous server

It is worth noting that becomes the anonymous request (, , , ) through the construction of by an anonymous server.

In order to protect the location privacy information, we transform the latitude and longitude of the user’s location to the interval region, which is defined as follows.

Definition 4 (interval region ()). Extending the latitude and longitude coordinates of location to the region represented by a range of intervals is called the interval region, denoted by ([, ], [, ]), where (i)[, ] () is the latitude interval(ii)[, ] () is the longitude interval

Figure 1 illustrates an example of interval region. As can be seen from the figure that the latitude and longitude coordinates of user marked by the green five-pointed star are (, ). Now, we extend and in latitude and longitude, respectively, which are indicated by white arrows in the figure. That is to say, expands to [, ], expands to [, ]. In this way, [, ], [, ]. Then, the interval region corresponding to the user is ([, ], [, ]), which is marked with a black border.

After extending the precise location to the interval region, Geohash coding is applied to retrieve the historical query locations in the region. Geohash coding is defined as follows.

Definition 5 (Geohash coding [20]). Geohash coding is the process of converting two-dimensional latitude and longitude coordinates into one-dimensional strings. The one-dimensional string is called Geohash code, representing a rectangular region.

In this way, user’s location privacy can be protected sufficiently. In Geohash encoding, the Base32 encoding method [39] is applied. Base32 encoding converts the 5-bit binary number into a decimal number and then encodes the 32 letters using 0 to 9, to (removing , , , ), thus reducing the length of the code.

Combining with the altitude of a location, in this paper, we present the Alt-Geohash code to indicate user’s location.

Definition 6 (Alt-Geohash code). Alt-Geohash code is a one-dimensional string that is converted from location (, , ). It contains user’s Geohash code and the code corresponding to the altitude of user’s location.

By the lights of reference [19], after encoding the location coordinates, we search for similar locations by reverse retrieval.

Definition 7 (reverse retrieval). Reverse retrieval describes the process of seeking historical query records from the trusted third anonymous server based on the Alt-Geohash code of a user and finding out the corresponding location of the same code.

Figure 2 gives an example of reverse retrieval. As the figure depicted, suppose that the Alt-Geohash code of the user’s location is “wtw3v4.” Now, we reversely retrieve the historical request record database in an anonymous server and then select the corresponding first and last location codes marked in red, which are the same as the real user’s Alt-Geohash code. Thus, (, , ) of these two records are (31.281991, 121.510708, 492) and (31.248284, 121.53277, 492), respectively.

3.2. Alt-Geohash Coding Method

Geohash encoding, which is often used to retrieve geospatial data, mainly employs the latitude and longitude of the encoded location to quickly match in order to find similar locations in geographic space. Considering the altitude of location, we propose Alt-Geohash coding which can convert the user’s spatial three-dimensional location into a one-dimensional string. Specifically, the Alt-Geohash encoding process takes the following five steps:

Step 1: divide the earth latitude range (-90, 90) into two intervals (-90, 0), (0, 90). Here, negative number represents south latitude, while positive number represents north latitude. If the target latitude is in the previous interval, the code is 0; otherwise, the code is 1. Divide the interval into two intervals and code in the same way until the accuracy meets the requirements, attaining the latitude code.

Step 2: divide the earth longitude range (-180, 180) into two intervals (-180, 0) and (0, 180) and encode the target longitude via the same method as Step 1, obtaining the longitude code. Here, negative number represents west longitude, while positive number represents east longitude.

Step 3: merge the 0-1 codes of longitude and latitude codes from left to right to attain the merged 0-1 code, where the odd bits are latitude codes, the even bits are longitude codes, and the even numbers are started from 0.

Step 4: due to Base32 encoding method is employed, the merged 0-1 code obtained in Step 3 converts the 5-bit binary number into a decimal number from left to right. Utilize Base32 code in Figure 3 [39] to obtain the latitude and longitude codes.

Step 5: divide the target altitude by 100 and then connect to the codes obtained in Step 4, attaining the final Alt-Geohash code.

The following example illustrates the Alt-Geohash encoding process.

Suppose that the location of user is (39.92324, 116.3906, 492). The latitude range (-90, 90) is divided into two intervals (-90, 0) and (0, 90). Since the latitude 39.92324 belongs to (0, 90), the code is taken as 1. Then, we divide (0, 90) into two intervals of (0, 45) and (45, 90), and 39.92324 belongs to (0, 45), so the code is 0. By analogy, get the latitude code as 1011 1000 1100 0111 1001 by Step 1. Similarly, the encoding of longitude 116.3906 is 1101 0010 1100 0100 0100 by Step 2. Next, the longitude and latitude codes are combined to obtain the merged 0-1 code 11100 11101 00100 01111 00000 01101 01011 00001 by Step 3. Then, Base32 encoding is performed to obtain the encoding of (39.92324, 116.3906) as “wx4g0ec1” by Step 4. Thus, for the altitude , the user’s Alt-Geohash code is “wx4g0ec14” by Step 5.

4. Algorithm Design

In this section, the LPPS-AGC system model and algorithm design are elaborated.

4.1. System Model

We adopt a centralized system architecture [40, 41], which contains the location request users, the trusted third anonymous server, and LSPs. Among them, the trusted third anonymous server conceals the user’s location into to request LBS, achieving location privacy protection. Then, the anonymous server filters the returned responses to reduce the communication overhead. The system architecture of LPPS-AGC is shown in Figure 4. As depicted in the figure, the LBS request process is as follows: (1)Users who need the LBSs transmit (, , , , ) through the mobile terminals to the trusted third anonymous server(2)Anonymous server constructs an (, , , …, ) through LPPS-AGC(3)The anonymous server transmits (, , , ) to LSPs(4)LSPs query the location database to return the query results according to the transmitted request information by the anonymous server(5)The trusted third anonymous server filters the returned query results(6)The filtered results are returned to the users, finishing LBSs

The trusted anonymous server [30] plays an important role in protecting the location privacy of user. Specifically, LPPS-AGC includes the following 5 steps:

Step 1 (location generalization): receiving launched by users, the anonymous server generalizes the latitude and longitude of locations into an .

Step 2 (Alt-Geohash coding): the generalized location is encoded to facilitate the retrieval of neighbor locations by executing Alt-Geohash coding.

Step 3 (reverse retrieval): perform a reverse retrieval based on the code generated in Step 2 to find the same code of locations and construct the candidate anonymous location set .

After that, we compare the number (that is, value, as shown in Figure 4) of locations in with the user-defined value. If -1, execute Step 4. If , execute Step 5. If , take generated in Step 3 as to generate and transmit it to LSPs.

Step 4 (dummy location generating): is evenly divided into subregions. A dummy location is randomly generated in each subregion and an is formed.

Step 5 (location filtering): multiple locations in are classified based on user’s real location, and the number of locations in each category is selected according to the proportion of each category in the total numbers of locations in . Thus, which satisfies user’s privacy requirement is constructed.

The detailed algorithms implementation of LPPS-AGC is described in the following.

4.2. Location Generalization Algorithm

The location generalization algorithm (LGA) extends the latitude and longitude of location into the interval region defined in this paper. Moreover, the purpose is to obscure the location of user and reduce the precision of latitude and longitude coordinates. The detailed pseudocode of LGA is presented in Algorithm 1.

Input: user’s location (, , ).
Output: interval region ([, ], [, ]).
1: Generate four random numbers from 0 to 0.025: , , , and , which are used to extend latitude and longitude coordinates
2:
3:
4:
5:
6: return ([, ], [, ])

The algorithm generates four random numbers, each of which is randomly assigned to add/subtract with the user’s longitude/latitude, ensuring that the latitude and longitude are only subjected to one addition and one subtraction. According to the principle that the small number is in front and the large number is in the latter, the longitude interval and the latitude interval are formed, respectively, obtaining the final interval region.

For the setting of the random number range, the following two points need to be explained: (1)The reason why the range of generating random numbers is set from 0 to 0.025 is that the one degree of latitude of geographic location corresponds to 111 kilometers. Thus, 0.025 latitudes correspond to 2.5 kilometers. We can obtain that the maximum length of the longitude interval and the latitude interval is 0.05. Therefore, the region indicated is approximately: , which is enough to ensure a wide anonymous protection region(2)The purpose of randomly generating four random numbers is to prevent the attacker from predicting the user’s precise location. If merely one random number is generated, the longitude and latitude fluctuate forward and backward, respectively. According to the symmetry of the interval, the amplitude of the fluctuation can be easily calculated. Thus, the user’s latitude and longitude coordinates are also exposed. Similarly, if two random numbers are generated, the latitude and longitude can also be predicted by attackers. Therefore, four random numbers are randomly generated here, which can protect the location of the user triumphantly

Obviously, the time complexity of LGA is , which is simple and easy to implement.

4.3. Alt-Geohash Coding Algorithm

After the interval region is determined by LGA, the Alt-Geohash code is generated by the Alt-Geohash coding algorithm (AGCA) for location retrieval.

Concretely, AGCA initializes two variables and , respectively. stores the Geohash code of the generated interval region, while preserves the intermediate generated 0-1 code temporarily.

Then, AGCA takes the median of the earth latitude range and longitude range , respectively. The longitude interval is processed as follows: (i)If all the numbers in the longitude interval are less than or equal to the median, adds 0 and updates the upper limit of the longitude range of the earth to the median(ii)If all the numbers in the longitude interval are greater than the median, adds 1 and updates the lower limit of the longitude range of the earth to the median(iii)If there exist partial numbers in the longitude interval which are greater or less than the median, the encoding ends

Encoding the latitude interval is the same as the longitude interval encoding process. Iteratively repeating the coding process, we can get the 0-1 encoding of the interval region.

Next, we encode the height of the user’s location, divide it by 100, and round it up. The resulting value is added to to get the final Alt-Geohash. Note that the division 100 here is determined by the characteristics of the Geolife dataset [42] we employed, and the purpose is to obtain the encoding of 0-9. In addition, the length of is calculated for the next stage location retrieval to protect user’s location privacy.

The detailed pseudocode of AGCA is presented in Algorithm 2.

Input: interval region , altitude of user’s location .
Output: code of interval region Alt-Geohash, length of Geohash code .
1: ,
// is the range of latitude, is the range of longitude
2: ,
3: whiledo
4: sum
5: sum
6: while TRUE do
7:  if and then//longitude
8:  
9:  
10:  else if and then
11:  
12:  
13:  else break
14:  end if
15:  if and then//latitude
16:  
17:  
18:  else if and then
19:  
20:  
21:  else break
22:  end if
23: end while
24: end while
25: According to the Base32 code shown in, will be encoded as
26: //altitude
27: -
28: //len() represents the length of code
29: return Alt-Geohash,

To illustrate the process of of Alt-Geohash coding, we give an example in the following.

Suppose the real location is (121.51071, 31.281991, 397). The interval region obtained in the location generalization stage is ([121.51050, 121.51105], [31.281801, 31.282003]). According to AGCA, we iterate the latitude interval processing via the dichotomy for 0,1 coding.

Specifically, according to AGCA, the earth latitude range is divided into two intervals and , and the latitude interval falls in the latter half , coded as 1. Then, we divide into and , and the latitude interval falls in the first half [], coded as 0. By analogy, the dichotomy is used to divide the earth’s latitude range for 0-1 coding, and the iteration is stopped until the partition interval cannot completely cover the interval region.

Table 1 describes the process of latitude interval coding of the example. As table shown, in 19th round, cannot be completely covered by the divided interval 1 [] and the divided interval 2 . The longitude interval adopts the same method. After the 0-1 encoding is completed, the Base32 code is used for further encoding, obtaining the code of this interval region: “wtw3vn9.” Finally, the altitude is divided by 100, that is, is obtained. Therefore, the Alt-Geohash code is “wtw3vn93.”

Theorem 8. The time complexity of AGCA is , where is the number of encoding.

Proof. The time for encoding the interval region is spent on the division of the interval, that is, step 3 to step 25 in AGCA. Steps 6 to 23 deal with the latitude and longitude encoding of the location. In the worst case, when the accuracy of the divided interval is smaller than the range of the interval region, the encoding ends. That is, the number of iterations satisfies the following two conditions: where and are the length of the longitude interval and the latitude interval, respectively. Simplifying it, we obtain . Step 26 handles the altitude of the location, and its time overhead is . Therefore, the time complexity of AGCA is (max{, }).☐

4.4. Reverse Retrieval Algorithm

After the interval region of the user is encoded by AGCA, the history database of the third party anonymous server is retrieved based on the code. Specifically, for each location in the history request record, the reverse retrieval algorithm (RRA) calculates the Alt-Geohash code according to AGCA and the Geohash code length , denoted as . Then, if is the same as the interval region code Alt-Geohash, add this location to , and accumulates 1. Otherwise, the next location in historical request record location data set is performed. Finally, the algorithm returns the candidate anonymous location set and the location number .

The pseudocode of RRA is elaborated in Algorithm 3.

Input: code of interval region Alt-Geohash, length of Geohash code , anonymity degree , historical request record location data set (), ( is the three-dimensional location, is the number of locations).
Output: candidate anonymous location set , number of locations within the candidate anonymous location set .
1:
2:
3: fortodo
4: Call Algorithm 2 (AGCA), calculate the Alt-Geohash code of , denoted as ; calculate the code length
5: if == Alt-Geohash then
6:  
7:  
8: else
9:  continue
10: end if
11: end for
12: return,

Obviously, the time complexity of RRA is .

Since the number of locations in retrieved by RRA does not necessarily meet the privacy requirements of the user, we propose Algorithm 4 and Algorithm 5 in the following to provide personalized location privacy protection for the user. If the output of RRA , the real user is added to as , LPPS-AGC anonymous processing ends, and is sent to the LSP. If then execute Algorithm 4 dummy location generating algorithm (DLGA). If , Algorithm 5 location filtering algorithm (LFA) is executed.

4.5. Dummy Location Generating Algorithm

When the number of location in is less than the user’s privacy requirement value, we employ DLGA to generate locations for location privacy protection. The core of this algorithm is to uniformly divide the interval region into subregions and generate a dummy location in each subregion to protect user’s real location. It is worth noting that DLGA can generate dummy locations according to user’s preference, that is, the value set by the user himself, which can provide users with personalized privacy protection services and avoid singularization of the services provided.

DLGA first calculates the lengths of the longitude interval and the latitude interval and , respectively.

If is an even number and (that is, the length of longitude interval is longer), DLGA divides the longitude interval into equal parts and randomly selects a location in each subinterval. Among them, the longitude is within the range of the longitude subinterval, and the latitude is within the latitude interval, while the altitude is the same as the altitude of the user’s location. If is even and (that is, the length of latitude interval is longer), DLGA divides the latitude interval into equal parts. Meanwhile, the longitude of the selected location is within the longitude interval, and the latitude is in the latitude subinterval, the altitude is the same as above.

If is an odd number and , DLGA divides the longitude interval into equal parts. In order to ensure that locations are generated, the latitude interval is divided into 2 equal parts, and a location is randomly selected in each subinterval. Similarly, if is an odd number and , the latitude interval is divided into equal parts. To ensure that locations are generated, the longitude interval is equally divided into two parts, and one location is randomly selected in each subregion.

Finally, the user’s real location is added to as well, and the containing locations is successfully constructed. The pseudocode of DLGA is presented in Algorithm 4.

Input: interval region , anonymity degree , user’s real location .
Output: anonymous location set .
1: , ,
2: ,
3: if and then
4: forto-do
5:  
6:  
7:  
8:  
9:  
10: end for
11: else if and then
12: fortodo
13:  
14:  
15:  
16:  
17:  
18: end for
19: else if and then
20: fortodo
21:  fortodo
22:  
23:  
24:  
25:  
26:  
27:  end for
28:  
29: end for
30: else
31: fortodo
32:  fortodo
33:  
34:  
35:  
36:  
37:  
38:  end for
39:  
40: end for
41: end if
42: return

Now, we give the time complexity analysis of RRA in the following.

Theorem 9. The time complexity of DLGA is .

Proof. In Algorithm 4, steps 1 to 2 are simple calculations, in which time overhead is , preparing for later calculations. Steps 3 to 42 are the core of the algorithm, where the interval region is divided according to the parity of the value and the location is selected within the subregion. Steps 3 to 18 are actions performed when is even. It is divided into two processes by the length comparison result between the longitude interval and the latitude interval. Since the longer length intervals are also evenly divided into subregions, while the shorter length interval is not divided, and dummy locations are generated iteratively, the time overhead is . Steps 19 to 41 are actions performed when is an odd number. Similarly, there are two cases. The longer interval is divided into , and the shorter length is divided into 2 parts. Then, the location points are selected, respectively. At this time, the time overhead is . Therefore, the total time complexity of DLGA is .

4.6. Location Filtering Algorithm

The candidate anonymous location set (, ,…, ,…, ) is generated by RRA algorithm and contains locations, where consists of longitude , latitude , altitude . These locations are candidates for anonymous protection.

When is greater than privacy requirement value, we need to filter the locations in the interval region to select the locations that are most similar to real user to improve privacy. Therefore, we design location filtering algorithm (LFA) to implement it. Specifically, LFA divides the locations in into four categories , , , and according to the latitude and longitude values of the real user and then put them in four lists. After that, the number of locations (, , , ) in each list is counted, respectively. Accordingly, the proportion of the locations number of every category in the total numbers of is , , , and , respectively.

Note that in order to meet the privacy requirements of user and implement -anonymity protection, LFA selects , , , and locations in , , , and , respectively, which are added to . Finally, the real location of the user is added to as well to complete the anonymous processing. The detailed pseudocode of LFA is elaborated in Algorithm 5.

Input: candidate anonymous location set () ( ), location number of candidate anonymous location set , anonymity degree , user’s location .
Output: anonymous location set .
1: //initialize four empty lists to hold the four categories of locations.
2: Fortodo
3: If and then
4:  
5: Else if and then
6:  
7: Else if and then
8:  
9: Else
10:  
11: End if
12: End for
13: Count the number of locations in and , respectively, denoted as , , ,
14: Randomly select , , , and locations from the list , , , and , respectively
15:
16: Return

Theorem 10. The time complexity of LFA is , where is the number of locations in .

Proof. In LFA, the main work is to divide the locations in the candidate anonymous location set into four categories according to the real location of user. Steps 1 to 12 compare the latitude and longitude of each location in with the latitude and longitude of the user’s real location, which the time overhead is . Then, steps 13 to 14 randomly select the corresponding locations in the four categories of lists according to different proportions, which the time overhead is . Since , thus, the time complexity of this algorithm is .

4.7. Security Analysis

In some scenarios, a malicious attacker may collude with some users or LSPs to obtain the available private information about the legitimate user. Fortunately, our strategy can triumphantly resist some attacks of adversaries, such as map-matching attack [43], centre-of-ASR (anonymized spatial region) attack [43] and colluding attack [44], demonstrating superior security. In the following, we give the security analysis of our strategy.

Map-matching attack means that an attacker restricts the obfuscation region to certain locations where users can be located to find the real user by eliminating specific dummies based on map details [43]. Since our strategy adds height to the location information, there will be multiple users with different heights in the same location point. Even if a malicious attacker locates the user’s location, that is, the user’s longitude and latitude, through the positioning system, he cannot distinguish between real users and dummies. At the same time, our strategy also satisfies -anonymity. Therefore, our strategy can resist map-matching attacks.

ASR (anonymized spatial region) is the region used to hide the user’s actual location information during the anonymization process. In case of dummy generation-based techniques, if the actual user’s location is in the center of the region composed of dummy locations and the real user location, a centre-of-ASR attack [43] will occur. In our strategy, after adding the height to the user’s location information, more dummy users’ locations can be constructed in an anonymous region. Moreover, these dummy users’ locations are more similar, which is more confusing. In this way, it is difficult for an attacker to attack real user. Hence, our strategy can resist centre-of-ASR attack.

Colluding attack means that the probability of successfully deducing the real location of user out of anonymous location information generated by certain strategy does not increase with the growth of the size of the colluding group [44]. Since our strategy employs random numbers in the interval region, there are random factors in DLGA and LFA. Note that the generation of anonymous location set is irregular. Thus, the attacker cannot infer the real location of the user based on the anonymous location set. The probability of inferring the real user is still . At the same time, users are independent of each other and do not affect each other; therefore, this strategy can resist colluding attack by attackers and malicious users.

5. Simulations

In this section, we verify the performance of our proposed LPPS-AGC through simulation experiments. The experimental configuration settings and the analysis of the experimental results are described in detail.

5.1. Experimental Settings

The experimental environment is 64 bit Windows 7 system with Intel (R) Core i5 Processor (2.30 GHz) and 8 GB memory. The programming language is Python on PyCharm.

The location data is from the Geolife dataset [42], which collects the trajectory data for 182 users from April 2007 to August 2012. The dataset includes a series of time sequence locations, covering latitude, longitude, altitude, and so on. It contains 17,621 trajectories with a total distance of more than 1.2 million kilometers and a total time of more than 48,000 hours. These data record not only the location of user’s home and work but also a wide range of outdoor activities such as shopping, traveling, hiking, and cycling.

Without loss of generality, we apply the first record of a user in the Geolife dataset as the real location, and the remaining records as historical records to construct anonymous location set in our experiment. The relevant parameters of attributes are shown in Table 2.

5.2. Comparison of Time Overhead in Processing Data

Most existing techniques apply distance calculation to construct candidate anonymous location set. Suppose two locations are and , respectively. Thus, the universally accepted Manhattan distance [22, 45] and Euclidean distance [46] between and are measured by

In our experiment, we execute the algorithms 100 times and average the output results for comparison. Then, we evaluate the efficiency of our Alt-Geohash encoding technique.

Figure 5 shows the comparison of the time in processing different data volumes of Alt-Geohash encoding technique, Manhattan distance, and Euclidean distance measures. The result reveals that as the amount of data increases, the time overhead of the three methods all increases. Obviously, our Alt-Geohash encoding technique has an excellent advantage in terms of time overhead.

We can see that once the amount of data reaches 900, the other two methods take more than 1 s, while Alt-Geohash coding takes less than 0.2 s. It is because that Alt-Geohash coding simply turns three-dimensional location into one-dimensional strings without any redundant calculating. As for distance-based retrieval methods, distance calculating is indispensable, and the time taken is greatly grown with the increasing of the processed data volume. In particular, since Euclidean distance measure exists in multiplication operation, it immensely increases the time overhead. Therefore, Alt-Geohash coding retrieval is faster and more suitable for processing large volumes of data.

5.3. Comparison of Time Overhead in Generating Dummy Locations

Next, we consider the impact of the number of generated dummy locations on time overhead. We compare our dummy locations generating algorithm (DLGA), dummy-based user location anonymization algorithm (DBULA) [31], and random generating dummy locations strategy [47].

As shown in Figure 6, the time overhead of these three algorithms aggravates with the increasing of the number of dummy locations. Since the random strategy randomly selects locations within the specified region, its time overhead is the smallest. However, its anonymous protection effect is very inferior. Furthermore, since the randomly generated dummy locations do not consider the rationality of the locations, an attacker may filter out many locations, leading to the failure of anonymous protection. DBULA generates dummies around the user in a grid shape to satisfy the anonymous region requirement. The distance between two adjacent locations is , which is calculated as follows: where is the area of the anonymous region and indicates the anonymity degree. The generating dummy location satisfies where is the th dummy location to be generated, is the location of the user, is the number of user location, and is the index of the th dummy location.

Because DBULA carries out aforementioned iterative calculations when generating dummy locations, its time overhead is longer than that of our DLGA. Recall that DLGA divides the interval region into parts and then randomly selects among them. In addition, in an actual scene, the position of the generated dummy may not be located on the road. In that case, our strategy can adjust the generation of dummies so that they are located at the nearest intersection closest to the road, which ensures the rationality of the dummy locations.

5.4. Comparison of Anonymous Processing Time on Different Algorithms

Now, we compare the anonymous processing time with the value of for different anonymous algorithms, including Spacetwist [30], KABC [12], DLS [29], Casper [23], and our LPPS-AGC. Here, the historical request record in the simulation is 500.

As depicted in Figure 7, with the increase of value, the time overhead of all algorithms or strategies increases. The only fortunate that the time overhead is minimal in our strategy.

Concretely, since SpaceTwist utilizes an anchor near the client to replace user’s real location and every user obtains the accurate result according to his location information, it has the longest anonymous processing time without the trusted third anonymous server.

When Casper algorithm generates dummy locations, the generated anonymous region is too large, which will generate a large number of redundant locations. Thus, the time overhead is more excessive than the rest of the algorithms.

The anonymous processing time of KABC is similar to that of Casper. It is because that KABC applies a clustering algorithm and has a large time overhead. By virtue of historical request record database, DLS fully considers the background information. Based on the query probability, a round of dummy locations selection is performed. Thus, the number of redundant locations shrinks extremely, and the algorithm time complexity of DLS is better than that of Casper algorithm.

Our strategy executes AGCA, which tremendously reduces the time overhead. In addition, DLGA and LFA in our strategy can quickly provide secure personalized location services. Therefore, our strategy has a huge advantage in terms of time overhead.

5.5. Comparison of Anonymous Protection Effect on Different Algorithms

We employ anonymous entropy to evaluate anonymous protection effectiveness, which indicates the degree of location anonymity protection [24], the formula is as follows: where is the probability that the real user is recognized and is the anonymous entropy. The greater the value of the anonymous entropy, the better the effect of anonymous protection.

Figure 8 illustrates the comparison of the anonymous entropy on value of different algorithms, including random strategy [47], Casper [23], DLS [29], enhanced-DLS [29], and our LPPS-AGC.

It can be seen from Figure 8 that as the value of increases, the anonymous entropy of different algorithms also increases. The random strategy performs worse than other schemes, since it just generates dummy locations randomly without considering the background information.

The Casper algorithm has a better degree of anonymity protection than random strategy because it greatly deduces the probability that the real user can be recognized by means of the history request information of the third party anonymous server. However, since Casper generates many redundant locations, its anonymity effect is not very good as well.

In contrast, since the enhanced-DLS and DLS algorithms consider the background information and the generated dummy locations are based on similar probability of queries in history request record, both algorithms have higher anonymous entropy. Moreover, because enhanced-DLS further considers the degree of dispersion of the locations, it has higher anonymous entropy.

In our LPPS-AGC strategy, DLGA generates dummy locations uniformly distributed in the interval region. Meanwhile, fully considering the rationality of the generated locations, the user sets the value of according to his preference such that the generated locations can be dynamically adjusted according to the actual road situation information. Moreover, LFA can further select locations similar to the real user to improve the anonymous protection. Therefore, LPPS-AGC has more advantageous anonymous protection effect than that of other algorithms.

6. Conclusions and Future Work

The existing -anonymity mechanism employs anonymous server to construct an anonymous region for users, protecting location privacy. However, most work rarely ponders the huge time overhead of retrieving the database. Moreover, with the boom of IoT and positioning technology, user’s location has also evolved from the original two-dimensional to three-dimensional latitude and longitude and altitude. At present, few researches have considered the location protection by altitude.

In this paper, we propose a location privacy protection strategy based on Alt-Geohash coding (LPPS-AGC), which can effectively protect the three-dimensional coordinates. The strategy first generalizes user’s latitude and longitude into the interval region and then retrieves the location of the same code utilizing our Alt-Geohash coding algorithm (AGCA). After that, it provides user with personalized -anonymous privacy protection service according to user’s privacy requirements by dummy locations generating algorithm (DLGA) and location filter algorithm (LFA). In doing so, it not only considers the altitude of location but also speeds up the process of retrieval and reduces time overhead. Comprehensive analysis and simulation experiments show that our LPPS-AGC strategy has better superiority in performance and can achieve exceptional anonymous protection.

In the future, we will further investigate the location privacy protection scheme. Since mining a large amount of location information can analyze the social attributes of users, our future work will adequately consider the time dimension of users requesting location services and analyze user’s behavior models. A predictable dynamic location privacy protection approach may be proposed, which can resist the uncertain insecure factors in advance.

Data Availability

The data used to support the findings of this study are available from the website: “https://www.microsoft.com/en-us/download/confirmation.aspx?id=52367”.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the Humanity and Social Science Fund of the Ministry of Education under Grant 20YJAZH078 and 20YJAZH127, Open Project of Tongji University Embedded System and Service Computing of Ministry of Education of China under Grant ESSCKF 2019-06 and ESSCKF 2019-08, and National Innovation and Entrepreneurship Training Program for College Students under Grant 201910424014.