Abstract
Applying the proliferating location-based services (LBSs) to social networks has spawned mobile social network (MSN) services that allow users to discover potential friends around them. While enjoying the convenience of MSN services, mobile users are also confronted with the risk of location disclosure, which is a severe privacy concern. In this paper, we focus on the problem of location privacy preservation in MSNs. In particular, we propose a repartitioning anonymous region for location privacy preserving (RPAR) scheme based on the central anonymous location, which minimizes the traffic between the anonymous server and the LBS server while protecting the privacy of the user location. Furthermore, our scheme enables users to get more accurate query results, thus improving the quality of the location service. Simulation results show that our proposed scheme can effectively reduce the area of anonymous regions and minimize the traffic.
1. Introduction
The Internet of Things (IoT), a trend of future networks, is embedded in many aspects of our personal and working lives and provides more comprehensive intelligent services. Social networks, widely used on the mobile Internet, have given rise to mobile social networks (MSNs). Users in an MSN can not only acquire their own location information and check in at a location but also find nearby friends and access location-based services (LBSs) such as finding the nearest hotel, finding directions, sharing action tracks, obtaining information from body area networks, and so on [1–7]. However, while enjoying the convenience of LBS and MSN services, mobile users are also confronted with the risk of location disclosure, which is a severe privacy concern [8–12].
MSN services (MSNS) have a wide range of applications in people’s daily life, where location-aware information plays a very important role [13]. In addition to supporting real-time services, location-based applications can predict the behavior of the users by analyzing the user’s position traces to obtain the user’s interest preferences and make user interest recommendations. With the development of MSNS, location privacy issues are gradually attracting more and more people’s attention. The location information includes the user’s identity information, location coordinates information, time stamp, and other sensitive information. Although analyzing the users’ locations and trajectories can better support MSNS and recommended services, it is easier for attackers to attack the user’s location information so as to expose user’s privacy. With the use of MSNS, mobile users are increasingly aware of the risk of privacy disclosure when enjoying location services [14–18].
Location-based privacy protection based on LBS is designed to prevent malicious attackers from gaining access to the mobile user’s location (or motion trajectory) to prevent user information from being compromised. Since the concept of LBS has been proposed, location-based privacy protection has quickly been raised and developed into academic research hot spots. The user’s location privacy mainly includes location privacy, trajectory privacy, and user identity privacy. Aiming at these three aspects, quite a few privacy protection methods have been proposed, including commonly used dummy location technology, temporal and spatial anonymous technology, pseudonym technology and other methods protecting the users’ location privacy, trajectory privacy, and identity information privacy [19–21]. Although the existing location privacy protection methods that apply to MSN can resist common privacy attacks [22], there are still some weaknesses to be resolved.
In this paper, we focus on location privacy preservation in MSNs, targeting the large communication overhead, large anonymous range, and inaccurate query results of traditional anonymous schemes. The main contributions of this paper are summarized below.
(1) We propose a repartitioning anonymous region for location privacy preserving (RPAR) scheme based on the central anonymous location. The anonymous region is divided into several subregions, users’ real locations are replaced by the central location, and a repartition is carried out to handle the tail anonymity user set left after the anonymous region partition.
(2) We analyze the advantages of the RPAR algorithm. RPAR minimizes the communication traffic between the anonymous server and the LBS server while protecting the privacy of the user location. Furthermore, our scheme enables users to obtain more accurate query results, thus improving the quality of LBS.
(3) We simulate the RPAR algorithm in extensive experiments. Simulation results show that our proposed scheme can effectively reduce the area of anonymous regions and minimize the traffic.
The rest of the paper is organized as follows. Section 2 reviews the related work. In Section 3, we give the preliminaries of location privacy preservation. Following that, in Section 4, the anonymous region repartition algorithm is proposed. In Section 5, simulations are given to verify the effectiveness of our proposed models and algorithms. Finally, we draw our conclusions and give the future work in Section 6.
2. Related Work
Recently, there have been considerable interests in the research of privacy preservation technologies for location-based services (LBSs) [23].
The $K$-anonymity scheme was first proposed by Sweeney for relational databases [24]. It protects sensitive attributes from being stolen by adversaries by generalizing some of the nonsensitive attributes in the database. The generalization process blurs the correlation between users and their related records, so that every record in a published data table is indistinguishable from other records. The scheme hides the link between a user and the corresponding information, protecting the user’s private information.
To protect location privacy, Gruteser et al. first proposed applying the $K$-anonymity model to location privacy preservation [25]. By constructing a location region composed of a query user and common users, the users’ geographical position information is generalized to a $K$-anonymity region. The generalized users cannot be distinguished from one another in terms of identity information, geographical position information, and so on. Even if adversaries can access a user’s location information in the anonymous region, they cannot identify the correspondence between the location from which a query is issued and the relevant user, which protects the users’ private information. The location anonymity scheme proposed in [26] splits a contiguous anonymous region into several scattered subanonymity regions and demonstrates its effectiveness.
Privacy preserving schemes include not only the widely applicable $K$-anonymity technology but also location privacy preserving schemes based on fake information. Bamba adopts virtual location information to hide the real information in order to preserve location privacy in [27]. The main idea is as follows: first, users put forward location service requests and privacy requirements; then, the central anonymous servers send fake position information together with the real position to the LBS servers; finally, the LBS servers return the query results to the anonymous servers, which compute the correct results and return them to the corresponding users. Because true and fake information are mixed, it is difficult for adversaries to distinguish the users’ actual query information; thus the users’ private information is protected.
Duckham et al. proposed to use obfuscation technique as an effective location privacy preserving mechanism [28] as well as position fuzzy region to process fuzzy query, satisfying the needs of privacy protection while sacrificing the quality of the service. Hence, how to strike a balance between location privacy preservation and service queries deserves our further study.
Ni et al. studied location privacy neighbor query based privacy preference in [29]; both the contradiction between the location protection and personal privacy preference and the balance between location privacy protection and query service quality problems are analyzed and summarized. Xu et al. proposed the location privacy region generation algorithm, centroid migration method [30], based on the spatial confusion location privacy protection technology; thus a certain deviation of centroid in the location privacy region was made. Ye et al. put forward a kind of location privacy protection scheme [31] based on active sharing mechanism; the main idea of this method is actively sharing the user’s location information by sharing mechanism, so as to reduce the users’ dependency on LBS servers and enhance the protective effect of the location privacy information.
In recent years, researchers and institutions at home and abroad have paid widespread attention to location-based privacy preservation, and increasingly in-depth studies have been carried out. Besides the location privacy preserving technologies described above, there is a wealth of methods such as location data randomization [32], fuzzification of space or time data [33, 34], methods based on strategies and encryption [35–37], and sensitive-semantics-based security anonymity mechanisms [33, 38–40].
In [41], we have proposed a preliminary location privacy preserving scheme via repartitioning anonymous region in MSN. However, this scheme did not elaborate the design motivation and the algorithm analysis. Also, this scheme did not implement the simulations to verify its performance.
3. Preliminaries
The $K$-anonymity privacy preserving scheme was first applied to data release; it was subsequently studied and applied extensively in the field of location-based privacy preservation. Thereafter, the anonymous region segmentation method was proposed to reduce the large communication overhead of $K$-anonymity. However, in many cases, anonymous regions that are partitioned only once cannot achieve the ideal effect; we therefore put forward the repartitioning anonymous region scheme for location privacy preserving (RPAR).
3.1. System Model
At present, although many communication modes are unsuitable for central anonymous servers, current location privacy preserving schemes mainly adopt the central server mode [42, 43]. Our scheme uses the classic central server mode, and the anonymous queries are processed through both the central anonymous servers and the LBS servers. As shown in Figure 1, the principles of the model are as follows:
(1) When the user requests the location query service, all the query contents, location information, and parameters that need to be set are sent to the central anonymous servers.
(2) After receiving the query information sent by users, the central anonymous servers generate, according to certain rules, an anonymous user set that meets the requirements, determine the number of subanonymity regions, and then partition the set, yielding a few scattered subanonymity regions. When the subanonymity regions meet the requirements, their central locations are computed and used in place of the corresponding subanonymity regions to send requests to the LBS server.
(3) The LBS server handles the query information sent by the central anonymous servers and returns the query results.
(4) After the refinement process, the central anonymous servers return the corresponding results to the users.
3.1.1. Relevant Definitions
To describe the RPAR scheme, it is necessary to introduce some background on location anonymity in the division process. According to the requirements of the scheme, we refer to relevant definitions such as location $K$-anonymity and the nearest neighbor principle, and we also propose several definitions, such as the central location of an anonymous region and the tail anonymity user set.
Definition 1 (location $K$-anonymity [44]). Suppose that there exists a mobile user whose location coordinates are $(x, y)$. If, after generalization, the user cannot be distinguished from at least $K-1$ other users by location information, we say that the users’ locations satisfy location $K$-anonymity.
The users’ information forms a user anonymity set. Note that the smallest rectangular region that includes all the location $K$-anonymity users is called the location $K$-anonymity region. In Figure 2, the rectangular box is the anonymous region enclosing the anonymous user set.
Formally, we use $R$ to represent a location $K$-anonymity region. According to certain rules, $R$ can be divided into $n$ discrete rectangular anonymity regions, represented by $R_1, R_2, \ldots, R_n$. These small scattered anonymity regions are all subanonymity regions of the continuous anonymity region $R$. Figures 2 and 3 depict the status before and after the anonymous region is partitioned, respectively.
Definition 2 (central location of anonymous region). The intersection point of the two diagonals of a rectangular subanonymity region is said to be its central location, which is represented by the coordinates $(x_c, y_c)$.
We will take central location as a fake location to issue location service requests by replacing the subanonymity regions.
Definition 3 (tail anonymity user set). After partitioning the location $K$-anonymity region into $n$ subanonymity regions, if $K/n$ is not an integer, then the supernumerary users form a tail anonymity user set.
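As a small illustration of Definition 3, the number of users per subanonymity region and the size of the tail anonymity user set can be obtained by integer division. The sketch below assumes that each subanonymity region initially receives $\lfloor K/n \rfloor$ users; the function name `split_sizes` and the sample values are hypothetical and are not taken from the paper.

```python
def split_sizes(K: int, n: int) -> tuple[int, int]:
    """Return (users per subanonymity region, size of the tail anonymity user set).

    Assumption: every subanonymity region first receives floor(K / n) users,
    and the K mod n users left over form the tail anonymity user set.
    """
    if K <= 0 or n <= 0:
        raise ValueError("K and n must be positive")
    per_region, tail = divmod(K, n)
    return per_region, tail


# Hypothetical values: 14 users split into 3 subanonymity regions.
print(split_sizes(14, 3))  # (4, 2)
```

When $K$ is a multiple of $n$, the second component is 0 and no tail anonymity user set exists.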
Definition 4 (nearest neighbor principle). Take a location point as the center, and find other location points in accordance with the priority principle of the nearest Euclidean distance from the center point.
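The nearest neighbor principle of Definition 4 can be sketched as a sort by Euclidean distance. This is a minimal illustration, not the authors' implementation; the function name `nearest_neighbors` and the sample points are hypothetical.

```python
import math

Point = tuple[float, float]


def nearest_neighbors(center: Point, candidates: list[Point], count: int) -> list[Point]:
    """Return the `count` candidates closest to `center` by Euclidean distance."""
    return sorted(candidates, key=lambda p: math.dist(center, p))[:count]


# Hypothetical sample points.
points = [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
print(nearest_neighbors((0.0, 0.0), points, 2))  # [(1.0, 0.0), (0.0, 2.0)]
```

Sorting the whole candidate list is sufficient for the small user sets considered here; a spatial index would be a natural replacement for larger populations.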
4. Location Repartitioning Anonymous Region Scheme
In order to overcome the drawbacks of the existing schemes, including oversized subanonymity regions and inaccurate query results during the anonymous region partition, we propose the repartitioning anonymous region for location privacy preserving (RPAR) scheme. The oversized subanonymity regions are further partitioned, and the central locations of the subanonymity regions replace those subanonymity regions to issue queries to the servers, so as to reduce the communication overhead and achieve relatively accurate query results.
4.1. Motivation
In traditional anonymity schemes, when users request LBSs, their true locations are replaced with an anonymous region, which is used to issue requests to the LBS servers. However, if the anonymity parameter $K$ is relatively large, then the anonymous region is correspondingly large, which is especially apparent in scenarios where users are sparse. Although the level of privacy preservation is high, the accuracy of the request results is low, resulting in poor service quality.
Roman et al. [45] proposed the anonymous partition algorithm for the first time, splitting the continuous $K$-anonymity region into several discrete subanonymity regions. Compared with traditional continuous anonymity regions, the method in [45] narrows the anonymous regions, improving the service quality to a certain extent. However, the candidate result set finally returned by querying through subanonymity regions is still fuzzy, and although the communication traffic decreases somewhat, the overhead is still large.
In addition, when users remain after the subanonymity regions are partitioned (i.e., a tail anonymity user set remains), the method in [45] puts all the remaining users into one subanonymity region. In the anonymous region shown in Figure 2, the red solid dots represent the users who issue the query requests, and the white dots represent the other real users in the $K$-anonymity region. We set the segmentation parameter $n = 3$, indicating that the anonymous user set is divided into three subanonymity regions, and the number of users in each subanonymity region is $\lfloor K/n \rfloor = 4$, as shown in Figure 3. However, the following problems occur in a scenario like that of Figure 3:
(1) The real locations of the users are replaced by the scattered subanonymity regions to issue queries, which can still produce a large candidate result set and require a large amount of communication overhead.
(2) When the tail anonymity user set, that is, the value of $K \bmod n$, is too large, the subanonymity region that absorbs the tail anonymity user set contains many more users than the other subanonymity regions, resulting in a great difference in query accuracy.
(3) As shown in Figure 3, if one of the subregions is oversized or even close to the area of the $K$-anonymity region, so that the total area of the subanonymity regions is greater than the user-defined minimum area $A_{\min}$, the segmentation algorithm degenerates into the $K$-anonymity algorithm with fewer users. The locations of the real users are then far from the central region, making the queries inaccurate.
In order to overcome these weak points, we construct the RPAR scheme. By using the central locations of the subanonymity regions instead of the subanonymity regions themselves to issue queries to the servers, the communication traffic is greatly reduced. When the total area of the subanonymity regions is greater than the user-defined minimum area $A_{\min}$, the largest subanonymity region is repartitioned into the other subanonymity regions, as shown in Figure 4, until the total area of the subanonymity regions is no longer greater than $A_{\min}$.
For the query accuracy problem caused by the oversized granularity of the tail anonymity user set, the users in the tail anonymity user set are partitioned into other subanonymity regions in accordance with the nearest neighbor principle.
4.2. Basic Idea of RPAR
Combined with Figures 2, 3, and 4, the basic idea of the RPAR scheme is elaborated as follows:
(1) Starting from the user initiating the query (solid red dot), the $K-1$ nearest users are found according to the nearest neighbor principle so that, together with the query user, they can form a $K$-anonymity region, and the user information set is recorded, as illustrated in Figure 2.
(2) According to the parameter $n$, that is, the number of subanonymity regions, the users are divided into $n$ subanonymity regions, so the number of users each subanonymity region contains is $\lfloor K/n \rfloor$. The mobile user (red dot) is taken as the center to search for the other nearest users and form the first subanonymity region.
(3) A user who is not in the first subanonymity region is randomly selected from the rest of the users as the central point. According to the nearest neighbor principle, a subanonymity region is formed from this user and 3 other users in the anonymous region who have not yet been assigned to a subregion. This is repeated until the number of remaining users is 0 or below $\lfloor K/n \rfloor$.
(4) The tail anonymity user set is repartitioned into the other subanonymity regions according to the nearest neighbor principle, and the subanonymity regions are updated.
(5) The area of the subanonymity regions is calculated. If the total area of the subanonymity regions is greater than the user-defined minimum area $A_{\min}$, the largest subanonymity region is repartitioned until the total area of the subanonymity regions is not greater than $A_{\min}$.
(6) The central locations of all the subanonymity regions are computed and used in place of the subanonymity regions to issue queries to the LBS servers.
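Steps (2) to (4) and (6) above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the authors' Algorithm 1: each tail user is assumed to join the subanonymity region whose central location is nearest, the central location is taken to be the mean of the user coordinates, and the area check of step (5) is omitted. The names `rpar_partition` and `centroid`, the seeded random generator, and the sample grid of users are all hypothetical.

```python
import math
import random
from typing import Optional

Point = tuple[float, float]


def centroid(points: list[Point]) -> Point:
    """Mean of the user coordinates, used here as the central location."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def rpar_partition(query_user: Point, others: list[Point], n: int,
                   rng: Optional[random.Random] = None) -> list[list[Point]]:
    """Split the K users (query user plus `others`) into n subanonymity regions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = rng or random.Random(0)      # seeded only to make the sketch repeatable
    remaining = list(others)
    size = (len(remaining) + 1) // n   # floor(K / n) users per region
    if size < 1:
        raise ValueError("need at least one user per subanonymity region")

    regions: list[list[Point]] = []
    seed = query_user                  # step (2): first region is centered on the query user
    for index in range(n):
        remaining.sort(key=lambda p: math.dist(seed, p))
        regions.append([seed] + remaining[:size - 1])
        remaining = remaining[size - 1:]
        if index < n - 1:              # step (3): random center for the next region
            seed = remaining.pop(rng.randrange(len(remaining)))

    # Step (4): assumed rule, each tail user joins the region with the nearest center.
    centers = [centroid(region) for region in regions]
    for user in remaining:
        nearest = min(range(n), key=lambda i: math.dist(user, centers[i]))
        regions[nearest].append(user)
    return regions


# Hypothetical data: 14 users on a small grid, 3 subanonymity regions.
users = [(float(i % 5), float(i // 5)) for i in range(14)]
regions = rpar_partition(users[0], users[1:], n=3)
fake_locations = [centroid(region) for region in regions]  # step (6)
```

The list `fake_locations` holds one central location per subanonymity region, which is what would be sent to the LBS server in place of the regions themselves.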
4.3. Algorithm Design
Let $Q = (id, q, loc, P)$ be a user’s location service query. For each $Q$, $id$ represents the user identity, $q$ represents the query information, $loc$ represents the location information, such as the user’s location coordinates $(x, y)$, and $P$ is the set of privacy parameters, including the anonymous region parameter $K$, the homogeneity parameter $l$ (the degree of difference in anonymous query target regions), the subanonymity region number $n$, and the user’s smallest total subanonymity area $A_{\min}$. Parameters such as $K$, $n$, and $A_{\min}$ are all user-defined. In practice, users can determine the values of these parameters according to their own requirements and sensitivity. Thus, the algorithm can generate different degrees of anonymity to guarantee the performance.
For minimum area of anonymous regions, calculation probability is inferred through the Bayesian networks and the related background knowledge based on social network model, and minimum anonymous area is obtained by means of the maximum likelihood estimator. Figure 5 is the Bayesian network diagram for anonymous region estimation.
In the Bayesian network diagram, and represent users’ query information, represents the minimum anonymous area, User is the user’s query information, is the user’s query content such as hotel, hospital, and , respectively, represent the user’s identity and location information, represents the query time, and is the diversity parameter of query content which is not discussed in our paper.
In the process of calculation and estimation, without loss of generality, the relatively weaker nodes are ignored in order to simplify the computation complexity. The reasoning process of node is described as follows:
The probability distribution of the second and third nodes in the second layer is calculated:
Through the probability distribution of the first node in the first layer,
By (1) to (3), we havewhere represents all the background knowledge and represents the background knowledge of the child nodes. Here, the background knowledge refers to maps, historical query records, and historical query probability of a certain area, etc.
When a user issues a location service query $Q$, all the parameters need to be sent to the anonymous servers. The parameters include the identity information $id$, the location coordinates $(x, y)$, the anonymous region parameter $K$, the subanonymity region number $n$, the minimum anonymity region area $A_{\min}$, the query content $q$, and other users’ location coordinates. After the parameters are processed by the anonymous servers, the fake locations are sent to the LBS servers to request service queries. In practice, the area of an anonymous region is calculated from the coordinates of its upper left and lower right corners, which are determined by the positions of the users in the region.
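The corner-based area calculation mentioned above can be sketched as follows. The sketch assumes an axis-aligned rectangle in a coordinate system where $y$ grows upward, so the upper left corner has the smallest $x$ and the largest $y$; the function names and sample points are hypothetical.

```python
Point = tuple[float, float]


def bounding_corners(points: list[Point]) -> tuple[Point, Point]:
    """Upper left and lower right corners of the smallest axis-aligned rectangle
    that contains all user positions (assumes y grows upward)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), max(ys)), (max(xs), min(ys))


def rectangle_area(upper_left: Point, lower_right: Point) -> float:
    """Area of the rectangle spanned by the two corners."""
    width = lower_right[0] - upper_left[0]
    height = upper_left[1] - lower_right[1]
    return width * height


# Hypothetical users of one anonymous region.
corners = bounding_corners([(1.0, 1.0), (4.0, 2.0), (2.0, 5.0)])
print(corners)                   # ((1.0, 5.0), (4.0, 1.0))
print(rectangle_area(*corners))  # 12.0
```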
According to the algorithm description, parameter settings, and the algorithm process, the pseudocode of the detailed RPAR algorithm is elaborated in Algorithm 1.
4.4. Algorithm Explication
RPAR builds on the existing $K$-anonymity algorithm and on partitioned subanonymity regions. It addresses two problems of scattered subanonymity regions: an oversized total area of the anonymous regions and an oversized tail anonymity user set, that is, a large value of $K \bmod n$.
In the process of the subanonymity region partition, the nearest neighbor principle is used, which means that qualified users are searched for in order of increasing distance, with the nearest user having the highest priority. The distance in the nearest neighbor principle is the Euclidean distance, namely,
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2},$$
where $(x_1, y_1)$ and $(x_2, y_2)$ are the location coordinates of two points and $d$ is the distance between the two points.
According to the location and parameters of the user who issued the location service request, the servers find the other users nearest to this user who satisfy the conditions. Note that, when the subanonymity regions are partitioned, the nearest neighbor principle is still used: the requesting user is taken as the center, and the first subanonymity region is formed by searching for the $m - 1$ nearest users, where $m$ is the number of users in the first subanonymity region. Then the coordinates of the central position of the first subanonymity region are calculated according to the following formula:
$$x_c = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad y_c = \frac{1}{m}\sum_{i=1}^{m} y_i,$$
where $x_i$, $y_i$ are the location coordinates of each user in the first subanonymity region and $m$ is the number of users contained in this region.
After the first subanonymity region is formed, a user’s location is selected randomly as the next center. The other subanonymity regions are formed in accordance with the nearest neighbor principle, and the areas and central location coordinates of the subanonymity regions are calculated. Once the subanonymity regions are divided up, if $K \bmod n = 0$, we directly check whether the total area of the subanonymity regions is greater than the user-defined minimum area $A_{\min}$. If so, the largest anonymous region is repartitioned until the total area is not larger than $A_{\min}$. If $K \bmod n \neq 0$, then the users in the tail anonymity user set are first repartitioned into the other subanonymity regions. After all the subanonymity regions are partitioned, they are replaced with their central locations to issue the location service requests to the LBS servers.
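The area check can be sketched as a loop. The text does not fully specify how the largest subanonymity region is repartitioned, so the sketch below adopts one possible reading as an explicit assumption: the region with the largest bounding rectangle is dissolved, and each of its users joins the remaining region whose central location (mean of user coordinates) is nearest. The names `repartition_largest` and `a_min`, and the sample data, are hypothetical.

```python
import math

Point = tuple[float, float]


def area(region: list[Point]) -> float:
    """Area of the smallest axis-aligned rectangle containing the region's users."""
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def centroid(region: list[Point]) -> Point:
    xs, ys = zip(*region)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def repartition_largest(regions: list[list[Point]], a_min: float) -> list[list[Point]]:
    """Dissolve the largest region until the total area is at most a_min.

    Assumed reading of the scheme, not the authors' Algorithm 1. The loop also
    stops when a single region is left, because merging users into another
    region can enlarge it and the threshold may be unreachable.
    """
    regions = [list(r) for r in regions]
    while len(regions) > 1 and sum(area(r) for r in regions) > a_min:
        largest = regions.pop(max(range(len(regions)), key=lambda i: area(regions[i])))
        centers = [centroid(r) for r in regions]   # centers before any user is moved
        for user in largest:
            target = min(range(len(regions)), key=lambda i: math.dist(user, centers[i]))
            regions[target].append(user)
    return regions


# Hypothetical regions with areas 1, 1, and 400 and a threshold of 250.
example = [[(0.0, 0.0), (1.0, 1.0)],
           [(10.0, 10.0), (11.0, 11.0)],
           [(0.0, 10.0), (20.0, 30.0)]]
result = repartition_largest(example, a_min=250.0)
print(len(result), sum(area(r) for r in result))  # 2 210.0
```

In this example one round is enough; with a stricter threshold the same data would collapse into a single large region, which illustrates why the termination guard is needed under this reading.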
4.5. Algorithm Superiority Analysis
The traditional $K$-anonymity scheme makes $K$ users form an anonymous region to issue queries to the LBS servers. The result of such a query is generally a candidate set. In scenarios with sparse users, the area of the anonymous region is large, and the returned candidate set is correspondingly large. Li et al. [26] divided the anonymous region into a few scattered subanonymity regions and used each subanonymity region to query; the query candidate set is smaller and the precision is improved, but a large communication overhead is incurred.
Our RPAR scheme directly splits users into several distributed subanonymity regions instead of forming a $K$-anonymity region. Each region is replaced with its central location to issue requests to the LBS servers, which greatly reduces the communication overhead. The query results are obtained with respect to the central location of the subanonymity region, so each user can derive an accurate query result by referring to that location. Therefore, compared with the traditional methods, RPAR has clear advantages, especially when users are sparse.
After dividing the anonymous regions, the method in [26] puts the rest of the users into one of the anonymous regions if $K \bmod n \neq 0$. In contrast, the RPAR scheme repartitions the rest of the users into their nearest subanonymity regions in accordance with the nearest neighbor principle.
When the value of $K \bmod n$ is relatively small, however the remaining users are divided, the number of users in each anonymous region does not differ much, so the precision of the returned query results does not differ much either.
When the value of $K \bmod n$ is larger, however, the method in [26] leads to one anonymous region that is far larger than the others, so the query precision differs greatly between regions. In our RPAR, after the users are repartitioned according to the nearest neighbor principle, the numbers of users in the subanonymity regions are much the same, so the precision of the final query results does not differ too much.
In this paper, the case in which the total area of the subanonymity regions is greater than $A_{\min}$ is also considered in the subanonymity region partition. The value of $A_{\min}$ is defined by the user and is generally the area of the $K$-anonymity region. As shown in Figure 3, if the area of some subanonymity region is too large or even close to that of the $K$-anonymity region, the partition algorithm degenerates into the $K$-anonymity algorithm with fewer users. Therefore, when the total area of the subanonymity regions is greater than $A_{\min}$, the largest subanonymity region is repartitioned until the total area of the subanonymity regions is not greater than $A_{\min}$.
We use the central locations of the subanonymity regions instead of subanonymity regions to issue query requests to the LBS servers. As fake positions, the central locations well hide the true users’ locations. Considering the condition that some central positions would coincide with users’ actual locations, due to the scheme that all users in the subanonymity regions use fake positions, even if the query results are intercepted by adversaries, identification probability of actual users is at least .
5. Simulations and Performance Analysis
In this section, we implement RPAR and analyze its performance.
5.1. Simulation Environment
As for the experimental environment, Windows 7 with 64-bit operating system with 4G RAM and Intel (R) Pentium (R) CPU [email protected] GHz processor is adopted. Thomas Brinkhoff’s mobile object generator [46] is used to generate the spatiotemporal data as the resulting dataset. The privacy preserving parameters of the experiment are determined according to the users’ requirements.
5.2. Simulations and Analysis
Assume that the mobile users are deployed in a rectangular region with an area of . Since the benefit of our RPAR scheme is most evident in scenarios with sparse users, the simulation results are generated in such a scenario.
In the following, we compare our RPAR algorithm with the traditional $K$-anonymity, FCR [26], and DSCR [47] algorithms. FCR directly generates the subanonymity regions during the partition, and the remaining users are put in the same subanonymity region. In contrast, DSCR first generates a contiguous anonymous region, which is then divided into several subanonymity regions that may overlap to a large extent.
We analyze the performance of RPAR by comparing the anonymous area, anonymous area percentage, and anonymization time of these methods under the same anonymity parameters. We assume that the value of the parameter $K$ in the simulation is an integer in the range from 6 to 15. We also set the number of subanonymity regions to $n = 3$. The comparison diagrams are illustrated in Figures 6, 7, and 8.
As shown in Figure 6, the area of the anonymous regions produced by our RPAR is not much different from that of FCR, whereas it is greatly reduced compared with traditional $K$-anonymity and DSCR. FCR, however, puts the rest of the users into a single subanonymity region; this region is therefore oversized, causing the query precision of this region and that of the other subanonymity regions to vary greatly. Our RPAR effectively solves this problem. Note that, as the value of $K$ increases, the area grows more slowly in RPAR, and the reduction in area relative to the other three schemes becomes increasingly large, which makes the advantage of RPAR evident.
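Figure 7 compares the ratio of the effective area of the anonymous regions generated by DSCR, FCR, and RPAR to that of the traditional $K$-anonymity, namely,
$$\text{area ratio} = \frac{\text{area of the anonymous regions generated by the scheme}}{\text{area of the traditional } K\text{-anonymity region}}.$$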
It can be seen from Figure 7 that the area ratio of DSCR is larger, and the area ratio of FCR is closer to that of RPAR. As a whole, the size of is positively related to , where . With the increase of , the area of valid anonymous regions becomes increasingly large and stable. However, our RPAR scheme performs better than FCR as the value of is larger gradually. We observe that RPAR can also obtain stable and higher quality service while reducing the area of anonymous regions, whereas FCR scheme is not stable.
Figure 8 compares the time of anonymous regions in the algorithms. Traditional -anonymity scheme directly generates consecutive -anonymity regions; DSCR firstly generates -anonymity regions which are then partitioned into subanonymity regions, and the tail-anonymity regions are processed; therefore, the anonymous time is longer than the traditional -anonymity. More specifically, anonymous time is relatively short when while the anonymous time is relatively long when . FCR directly generates subanonymity regions, which results in the anonymous time being shorter. Note that our RPAR needs to deal with the tail anonymous user set in the process of directly generating subanonymity regions; therefore the time may be longer than both traditional -anonymity and FCR, but less than DSCR. Fortunately, the additional average time is less than 0.1 , which is quite small, and it will not make a big difference to the algorithm performance.
6. Conclusions and Future Work
Aiming at large communication overhead, large range, and inaccuracy of query results for traditional anonymous schemes, this paper proposed an anonymous region repartition algorithm by studying the users’ location privacy preservation. The anonymous region is divided into several subregions, the users’ real locations are replaced by the central location, and a repartition is carried out to solve the remaining users’ region set after the anonymous region segmentation. Finally, the algorithm is analyzed and the privacy degree is evaluated, and the simulation experiment is carried out. Experimental results show that the proposed scheme has some advantages in privacy anonymity.
In the future, we will research the location privacy preserving in the scenario of dense region. Furthermore, we will research the location privacy preserving under distributed environment.
Data Availability
Thomas Brinkhoff’s mobile object generator is used to generate the spatiotemporal data as the resulting dataset in our paper, and the website is http://iapg.jade-hs.de/personen/brinkhoff/generator/.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work is supported by National Key R&D Programs Project of China under Grant 2017YFC0804406, NSF of China under Grant 61672321, 61771289, and 61373027, Training Program of the Major Research Plan of NSF of China under Grant 91746104, Project of Shandong Province Higher Educational Science and Technology Program under Grant J15LN19, and Open Project of Tongji University Embedded System and Service Computing of Ministry of Education of China under Grant ESSCKF 2015-02.