A Geographical and Social Society Attributes Based Privacy Preserving Recommendation Method for POIs
Point-of-interest (POI) recommendation based on users' check-in data has attracted great attention in recent years. However, check-in data often contains sensitive information such as time and location. For privacy reasons, many users are unwilling to share their check-in data with untrusted service providers, which severely degrades recommendation quality. To address this problem, this paper proposes a geographical and social society attributes based privacy preserving recommendation method for POIs, named GSSA-PPRM. In the proposed method, a locally differentially private matrix factorization algorithm is first designed to learn user preferences together with the social attribute in a client/server style. Then, based on the learned preferences and the geographical distance between POIs, a self-adaptive kernel density estimation algorithm is devised to model users' check-in behavior. Next, an algorithm is presented that tallies POI visit counts and computes POI popularity by securely collecting users' check-in data through the random response (RR) mechanism. Finally, a rating rule is given to predict users' ratings for POIs by integrating the kernel density estimate and POI popularity. Experimental results on two real datasets show that the proposed method achieves better POI recommendation quality while preserving user privacy.
Owing to the rapid development of the Internet and the widespread use of smart mobile devices, location-based social network services such as Twitter, Yelp, and Dianping are constantly emerging. These services strongly encourage users to post personal updates and share their geographic locations by checking in. Locations with certain semantics are collectively called points of interest (POIs). Service providers can use the massive volume of collected POI data to mine and analyze user preferences and potential needs in depth and, on this basis, provide a wide range of personalized services, including POI recommendation.
POI recommendation learns from users' check-in histories and other related data to find unvisited locations that may interest users. Popular social network services can list millions of POIs, but in real life a user visits only a very small fraction of them. This causes the data sparsity problem, which is one of the critical challenges limiting the accuracy of POI recommendation.
Many POI recommendation methods exist. Many of them combine different attributes, such as geographic attributes [1,2], social relationships, and temporal attributes, with collaborative filtering [5,6]. Although these attributes help to better understand users' preferences and alleviate data sparsity, how to integrate this information into the learning model to obtain better recommendation results remains a key challenge. Most of these methods consider either a single attribute or multiple attributes in isolation, which limits recommendation performance.
Moreover, while users enjoy personalized POI recommendation services, they are also exposed to a considerable risk of privacy leakage. Quite a few existing privacy preserving POI recommendation methods rely on a trusted third-party data aggregator, which securely collects users' data and then sends the aggregated data to the recommendation system. However, deploying a fully trusted third-party aggregator incurs considerable extra cost, and many recommendation service providers are unwilling to invest in it. In this situation, local differential privacy (LDP) [7,8] is a possible solution: it provides strong privacy protection locally (on the client/user side) for each user without any trusted data aggregator. LDP has already been adopted in some privacy preserving POI recommendation methods. Shin et al. proposed a matrix factorization algorithm based on LDP, in which each user first perturbed the profile matrix on the local device and then sent it to the recommendation system; the recommendation system aggregated the perturbed profile matrices and then performed the recommendation service. In addition, Kim et al. considered the transition patterns between different POIs and the visit counts of POIs.
Based on the above analysis of existing privacy preserving POI recommendation methods, two main challenges remain to be solved: (1) regarding the sparsity problem, how to effectively model users' preferences by combining contextual information (such as social and geographic data) to improve recommendation accuracy; and (2) how to securely collect user data while balancing data utility and privacy preservation in untrusted recommendation systems.
To address these problems, this paper proposes a geographical and social society attributes based privacy preserving recommendation method (GSSA-PPRM) for POIs. First, a locally differentially private matrix factorization algorithm is designed to learn user preferences together with the social attribute in a client/server style. Second, based on the learned preferences and the geographical distance between POIs, a self-adaptive kernel density estimation algorithm is devised to model users' check-in behavior. Then, to calculate POI popularity, an algorithm is presented that tallies POI visit counts by securely collecting users' check-in data through the random response (RR) mechanism. Finally, a rating rule is given to predict users' ratings for POIs by integrating the kernel density estimate and POI popularity, and the Top-N POIs are selected according to their predicted ratings in descending order.
Compared with similar existing POI recommendation solutions [9,10], GSSA-PPRM has two major differences: (1) although its first stage follows the differentially private gradient descent strategy proposed by Shin et al., it takes the social attribute into consideration and adopts the Trust-MF model for local rating prediction on the user side; (2) it integrates user preference, social attribute, check-in behavior, and the geographical attribute of POIs for comprehensive rating prediction, whereas the other two solutions mainly focus on user preference and check-in data.
The main contributions of this paper can be summarized as follows:
(1) To improve the accuracy of POI recommendation, the user's social attribute is used to learn the user's implicit preference, and the geographic distance between POIs is used to model the user's check-in behavior. By jointly considering the user's preference, check-in behavior, and POI popularity, the user's rating for each POI is estimated.
(2) To effectively preserve user privacy, the proposed method perturbs the item profile matrix and the user's check-in data, respectively, in a differentially private manner. In addition, random projection is used to reduce the dimensionality of the item profile matrix, and a randomization strategy is adopted to reduce the amount of Laplace noise added to the gradients.
(3) Extensive comparative experiments are carried out on two real datasets, Gowalla and Yelp. The experimental study demonstrates that the proposed method improves the accuracy of POI recommendation while preserving user privacy well.
The rest of this paper is organized as follows. Section 2 presents the related works. Section 3 introduces the preliminaries of differential privacy and matrix factorization. Section 4 describes the proposed GSSA-PPRM method in detail. Section 5 shows the experimental results and analysis of comparative experiments. Section 6 concludes the paper.
2. Related Works
2.1. Matrix Factorization Based Recommendation
In recent years, matrix factorization (MF), especially latent factor models based on MF, has been widely adopted in recommendation systems because it usually outperforms traditional methods, and a variety of MF algorithms have been proposed for different recommendation problems. Zhao et al. proposed a review-based model for social recommendation that leveraged user sentiment deviations and review reliability to depict users' internal influence and fused this influence into matrix factorization to improve the accuracy of rating prediction. Lian et al. proposed a geography-aware sequential recommender based on the self-attention network (GeoSAN) for location recommendation. In their work, a new loss function based on importance sampling was designed to address the sparsity issue, and a self-attention based geography encoder was introduced to represent exact GPS positions so as to make better use of geographical information. For emoji recommendation, Zhao et al. proposed a context-aware recommendation model, CAPER, which considers both contextual and personal information; it fuses text features, temporal features, user gender, user preference, and emoji co-occurrence features to achieve efficient emoji recommendation. For next POI recommendation, Wu et al. proposed a method named Personalized Long- and Short-term Preference Learning to capture users' preferences. For long-term preference, the contextual features of POIs in users' history records were extracted, and an attention mechanism was adopted to capture users' preferences. For short-term preference, a location-based sequence and a category-based sequence, each modeled by its own LSTM, were integrated to capture users' sequential behavior. For POI recommendation, Zhao et al. proposed a POI mining method and a personalized recommendation model. The POI mining method first mined POIs by fusing the sentimental and geographical attributes of locations; then, the sentiment similarity between POIs and the geographical distance between a user's multi-activity centers and POIs were used in a probabilistic matrix factorization based model for rating prediction.
2.2. Privacy Preserving Recommendation
With the popularity of recommendation services, the privacy preservation problem of POI recommendation has attracted increasing attention from researchers. Cryptographic techniques can provide strong privacy protection based on encryption and decryption mechanisms. Liu et al. designed two protocols (PPTR-S and PPTR-L) for privacy preserving POI recommendation based on partially homomorphic encryption. In the PPTR-S protocol, all data was encrypted with the public key, while the private key could be accessed only by the social networking website. In the PPTR-L protocol, the location based services (LBS) provider was responsible for generating the key pair and securely storing the private key. Wang et al. proposed a homomorphic encryption based protocol to protect users' check-in data, using the homomorphic properties of the Paillier cryptosystem to achieve effective recommendation on encrypted data; on this basis, two optimization strategies were presented to improve the protocol. Gedik et al. presented a customizable k-anonymity framework that supports a variable k to meet personalized location privacy requirements. However, the location of the target user can still be easily identified.
For both anonymization and encryption techniques, defenders need to carefully design different models for adversaries with different background knowledge. For the decentralized recommendation scenario, Chen et al. proposed a decentralized MF (DMF) framework for POI recommendation, together with a random walk based nearby collaborative training technique for training the DMF model on each user's end. Both the model learning process and the rating data are kept on the user's own side, and only gradients are exchanged between nearby users; thus, user privacy is well preserved.
2.3. Differentially Private Recommendation
Differential privacy (DP) was first proposed by Dwork. The core idea of differential privacy is to perturb data by adding random noise so as to achieve privacy preservation. The privacy leakage risk of a differentially private mechanism can be bounded through rigorous mathematical analysis, regardless of the adversary's background knowledge. To prevent personal data from being collected by an untrusted recommendation system, Liu et al. added noise to users' rating data by random perturbation on the user side. In addition, to prevent adversaries from inferring the sensitive data of any single user from the recommendation results, they also added noise when calculating the relationships between items. Meng et al. assumed that only part of users' historical rating data is sensitive. Based on this assumption, they classified users' historical ratings into sensitive and nonsensitive levels, adding a large amount of noise to sensitive ratings and a small amount to nonsensitive ones. As a result, sensitive data received stronger privacy protection, while nonsensitive data could be used to achieve better recommendation accuracy. However, the above methods could only protect information about items or ratings. Wei et al. proposed a differentially private trajectory community recommendation method, which first converted positions of the real trajectory into positions of a noisy trajectory and then used geographic distance to construct the noisy trajectory with the minimum distance to the real one, ensuring that the constructed noisy trajectory was highly similar to the real trajectory. Zhang et al. proposed a privacy preserving location recommendation framework that mainly used an n-order Markov chain and the user's sequential patterns to calculate probabilities and recommend locations of interest. They also designed a probabilistic differential privacy mechanism to achieve a good trade-off between high recommendation accuracy and strict location privacy protection.
For privacy preserving context-aware POI recommendation, Riboni and Bettini designed a differentially private recommendation system, which extracted statistics about personal preferences for POIs through the PINQ query engine and then generated recommendations from those statistics; users' location privacy was protected by simply generalizing each location into its corresponding grid cell. Chen et al. proposed a privacy preserving POI recommendation framework named PriRec. In this framework, users' private raw data and linear models were kept on the users' side, only public POI data and the feature interaction model were kept by the recommender, and local differential privacy was adopted to generate dynamic POI features. A secure gradient descent protocol based on secret sharing was designed to learn the linear models collaboratively, and a secure aggregation strategy was adopted to learn the feature interaction model. For privacy preserving next POI recommendation, Kuang et al. modeled users' check-in sequences with their latent states based on a hidden Markov model and proposed a weighted noise injection method. In this method, the distance between the user's current location and the nearest check-in center was calculated, and a different amount of noise was injected into the latitude and longitude of the user's location according to that distance. The noisy locations were finally used to predict the user's next movement.
3. Preliminaries
3.1. Local Differential Privacy
Centralized differential privacy requires a trusted third-party data collector to collect user data and apply privacy processing to it. However, in practical applications such a trusted collector is difficult to find and relatively costly. Local differential privacy (LDP) removes this requirement: it assumes that the server is untrusted and that each user processes his/her own data independently. In other words, privacy processing is moved from the data collector to the user side, without the intervention of a trusted third party.
Let $n$ be the number of users, and let the private data of the $i$-th user be represented by a tuple $t_i = (t_i^{1}, t_i^{2}, \ldots, t_i^{k})$ with $k$ dimensions. To protect privacy, the $i$-th user first perturbs $t_i$ with a random perturbation function $\mathcal{M}$ and then sends the noisy data $t_i^{*} = \mathcal{M}(t_i)$ to the recommendation system.
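Definition 1 ($\varepsilon$-LDP). Given a random function $\mathcal{M}$, if for any two input tuples $t, t'$ and any possible output $t^{*}$ of $\mathcal{M}$,
$$\Pr[\mathcal{M}(t) = t^{*}] \le e^{\varepsilon} \, \Pr[\mathcal{M}(t') = t^{*}], \tag{1}$$
then the random function $\mathcal{M}$ is said to satisfy $\varepsilon$-LDP, where $\varepsilon$ is the privacy budget.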
The definition bounds how much the output distribution of $\mathcal{M}$ can change between any two input tuples: the smaller $\varepsilon$ is, the more similar the outputs are, and the less an observer can infer about the true input from the output.
$\varepsilon$-LDP satisfies the following two composition theorems:
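Theorem 1 (Serial Combination Theorem). Suppose that there is a set of random algorithms $\{\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_s\}$, and each $\mathcal{M}_j$ satisfies $\varepsilon_j$-LDP on the dataset $D$. Then, applying these mechanisms in sequence to $D$ provides $\left(\sum_{j=1}^{s} \varepsilon_j\right)$-LDP.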
Theorem 2 (Parallel Combination Theorem). Suppose that the dataset $D$ can be divided into independent, nonoverlapping subsets $\{D_1, D_2, \ldots, D_s\}$, and there is a set of random algorithms $\{\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_s\}$. If each $\mathcal{M}_j$ satisfies $\varepsilon_j$-LDP on $D_j$, then the set of randomized algorithms achieves $\left(\max_{j} \varepsilon_j\right)$-LDP on $D$.
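The two theorems reduce to simple budget bookkeeping. The following minimal Python sketch (the helper names are hypothetical, not from the paper) shows how a total budget is consumed when mechanisms run one after another on the same data versus on disjoint parts of it.

```python
import math

def sequential_budget(epsilons):
    """Theorem 1: mechanisms applied one after another to the same data
    consume the sum of their individual budgets."""
    return sum(epsilons)

def parallel_budget(epsilons):
    """Theorem 2: mechanisms applied to disjoint subsets of the data
    cost only the largest individual budget."""
    return max(epsilons)

# Hypothetical split: T iterations on the same data, each spending eps_total / T
eps_total, T = 1.0, 10
spent = sequential_budget([eps_total / T] * T)
print(math.isclose(spent, eps_total))  # True

# Hypothetical: two mechanisms on disjoint parts of the data
print(parallel_budget([0.5, 0.8]))  # 0.8
```

Splitting the budget evenly across iterations, as above, is one common way to stay within a fixed total under Theorem 1.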
3.2. Matrix Factorization
Matrix factorization is an effective technique for alleviating the data sparsity problem of recommendation systems. Suppose that the number of users is $n$ and the number of items is $m$, $r_{ij}$ represents the rating given by the $i$-th user to the $j$-th item, and $R = [r_{ij}] \in \mathbb{R}^{n \times m}$ is the user-item rating matrix. The purpose of matrix factorization is to predict the missing entries of $R$. To achieve this, $R$ is decomposed into a user profile matrix $U \in \mathbb{R}^{n \times d}$ and an item profile matrix $V \in \mathbb{R}^{m \times d}$, where $d$ is the number of latent (underlying) features. The corresponding optimization function is
$$\min_{U,V} \sum_{(i,j) \in \Omega} \left(r_{ij} - u_i v_j^{T}\right)^2 + \lambda_u \sum_{i=1}^{n} \|u_i\|_2^2 + \lambda_v \sum_{j=1}^{m} \|v_j\|_2^2, \tag{2}$$
where $\Omega$ is the set of observed $(i, j)$ pairs, $u_i$ is the $i$-th row vector of $U$, $v_j$ is the $j$-th row vector of $V$, $v_j^{T}$ is the transpose of $v_j$, and $\lambda_u$ and $\lambda_v$ are regularization parameters (positive constants). The first term is a loss function that measures the distance between the observed ratings and their reconstructions, and the last two terms prevent the model from overfitting. An unobserved rating can then be predicted as $\hat{r}_{ij} = u_i v_j^{T}$. For computational efficiency, stochastic gradient descent (SGD) is often used to solve this model.
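As an illustration of the SGD solution mentioned above, the following NumPy sketch trains the plain factorization model of this section on observed entries only. The function name, initialization scale, and default hyperparameters are illustrative assumptions, not values from the paper; regularization is applied per sample, which is the usual SGD approximation of the objective.

```python
import numpy as np

def mf_sgd(R, mask, d=2, lr=0.02, lam_u=0.001, lam_v=0.001, epochs=500, seed=0):
    """Plain matrix factorization trained with SGD (sketch).

    R    : (n, m) rating matrix
    mask : (n, m) boolean array, True where r_ij is observed
    Returns U (n, d) and V (m, d); the predicted matrix is U @ V.T.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    observed = np.argwhere(mask)            # (num_observed, 2) index pairs
    for _ in range(epochs):
        rng.shuffle(observed)               # visit observed entries in random order
        for i, j in observed:
            err = R[i, j] - U[i] @ V[j]     # r_ij - u_i v_j^T
            u_old = U[i].copy()
            # per-sample gradient steps; the constant factor 2 is folded into lr
            U[i] += lr * (err * V[j] - lam_u * U[i])
            V[j] += lr * (err * u_old - lam_v * V[j])
    return U, V

# Toy usage (hypothetical data): a rank-1 matrix with one hidden entry
a = np.array([1.0, 2.0, 1.5, 0.5, 1.0])
b = np.array([1.0, 0.5, 2.0, 1.5])
R = np.outer(a, b)
mask = np.ones_like(R, dtype=bool)
mask[0, 0] = False                          # pretend r_00 is unobserved
U, V = mf_sgd(R, mask, d=1)
pred = U @ V.T                              # pred[0, 0] estimates the hidden R[0, 0]
```

With $d = 1$ the hidden entry is determined by the observed ones, which makes the toy example easy to check.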
4. Design of the GSSA-PPRM Method
4.1. Main Framework
The GSSA-PPRM method assumes that the third-party recommendation system is untrusted and that each user keeps his/her personal rating data, check-in data, and social data on the local device. To reduce the storage cost on users' devices, the recommendation system retains all public POI data, such as POI geographic coordinates and other static data.
The main idea of the GSSA-PPRM method is to estimate statistics of the original data without breaching user privacy. Each user first perturbs local data to satisfy LDP and then uploads the noisy data to the recommendation system, which aggregates the noisy data and then provides the recommendation service. The framework of the proposed method is depicted in Figure 1. It contains three main parts: (1) a privacy preserving social matrix factorization algorithm (PrivSMF) that considers social attributes; (2) geographical correlation calculation based on kernel density estimation (GeoKDE); and (3) POI popularity estimation based on the random response mechanism (RRPE).
The PrivSMF algorithm runs in a client/server (C/S) style to produce an initial user-POI rating prediction and is based on the differentially private gradient descent strategy proposed by Shin et al. In each iteration, each user updates the local rating prediction model according to the user profile matrix, the social profile matrix, and the item profile matrix and then sends the current noisy gradient of the item profile matrix to the recommendation system. After aggregating all users' noisy gradients, the recommendation system updates the global rating prediction model and sends the noisy gradient matrix back to each user for the next local model update. Different from the work of Shin et al., the PrivSMF algorithm takes users' social attribute into consideration and adopts the Trust-MF model for local rating prediction on the user side. Based on the result of the PrivSMF algorithm, the GeoKDE algorithm estimates kernel density to describe the impact of POI geographical distance on users' check-in behavior. In the RRPE algorithm, each user perturbs his/her check-in data using the random response mechanism of LDP and uploads the noisy data to the recommendation system. The recommendation system tallies the visit counts of POIs by aggregating the noisy data from all users and then estimates POI popularity to find popular POIs that may interest users. The product rule is then used to combine the kernel density from the GeoKDE algorithm with the POI popularity from the RRPE algorithm, yielding the final user-POI rating prediction. Finally, the Top-N POIs, ranked by predicted rating, are recommended to the user. The PrivSMF, GeoKDE, and RRPE algorithms are explained in detail in Sections 4.2 to 4.4, respectively, and the privacy analysis is presented in Section 4.5.
4.2. PrivSMF Algorithm
The goal of the PrivSMF algorithm is to update the item profile matrix on the recommendation system while keeping users' latent preferences private, by combining the differentially private gradient descent strategy and the randomization strategy proposed by Nguyen et al. In each round of iteration, each user first updates the user profile matrix and the social profile matrix locally, then perturbs the gradient of the item profile matrix and uploads only this noisy gradient to the recommendation system. The recommendation system aggregates the gradients uploaded by all users and then updates the item profile matrix. At the end of each round, the averaged noisy gradient matrix is sent back to each user. The iteration is repeated until the maximum number of iterations is reached.
Considering the social attribute, the objective function of the PrivSMF algorithm is
$$\min_{U,V,Z} \sum_{i=1}^{n}\sum_{j=1}^{m}\left(r_{ij} - u_i v_j^{T}\right)^2 + \sum_{i=1}^{n}\sum_{k=1}^{n}\left(s_{ik} - u_i z_k^{T}\right)^2 + \lambda\left(\sum_{i=1}^{n}\|u_i\|_2^2 + \sum_{j=1}^{m}\|v_j\|_2^2 + \sum_{k=1}^{n}\|z_k\|_2^2\right), \tag{3}$$
where $r_{ij}$ represents the rating of the $i$-th user for the $j$-th POI (that is, the number of check-ins at the $j$-th POI made by the $i$-th user), and $s_{ik}$ is the social attribute between the $i$-th user and the $k$-th user: $s_{ik} = 1$ if a social relationship exists between them; otherwise, $s_{ik} = 0$. $u_i, v_j, z_k \in \mathbb{R}^{d}$ represent the user latent feature vector, the item latent feature vector, and the social latent feature vector, respectively; $d$ is the dimension of the latent feature vectors, and $v_j^{T}$ and $z_k^{T}$ are the transposes of $v_j$ and $z_k$. $\lambda$ is a regularization parameter (a positive constant), $n$ and $m$ represent the number of users and the number of POIs, respectively, and $\|\cdot\|_2^2$ denotes the squared $\ell_2$ norm. The first two terms of (3) measure the error between the true and predicted values (for ratings and social relations, respectively), and the last term controls model complexity to avoid overfitting.
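The PrivSMF algorithm uses SGD to minimize (3). The vectors $u_i$, $v_j$, and $z_k$ are updated according to (4)-(6), respectively, where $u_i^{(t)}$ denotes the latent feature vector of the $i$-th user at iteration $t$, and $v_j^{(t)}$ and $z_k^{(t)}$ are defined similarly:
$$u_i^{(t+1)} = u_i^{(t)} - \gamma_t \nabla u_i, \tag{4}$$
$$v_j^{(t+1)} = v_j^{(t)} - \gamma_t \nabla v_j, \tag{5}$$
$$z_k^{(t+1)} = z_k^{(t)} - \gamma_t \nabla z_k, \tag{6}$$
where $\gamma_t$ is the learning rate at iteration $t$, and $\nabla u_i$, $\nabla v_j$, and $\nabla z_k$ are the gradients of (3) with respect to $u_i$, $v_j$, and $z_k$. Taking the derivatives of (3), with the constant factor 2 absorbed into $\gamma_t$, gives
$$\nabla u_i = -\sum_{j=1}^{m} e_{ij}\, v_j - \sum_{k=1}^{n} e^{s}_{ik}\, z_k + \lambda u_i, \qquad \nabla v_j = -\sum_{i=1}^{n} e_{ij}\, u_i + \lambda v_j, \qquad \nabla z_k = -\sum_{i=1}^{n} e^{s}_{ik}\, u_i + \lambda z_k,$$
where $e_{ij} = r_{ij} - u_i v_j^{T}$, $e^{s}_{ik} = s_{ik} - u_i z_k^{T}$, and $v_j^{T}$ is the transpose of $v_j$.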
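The gradients above can be checked numerically. The NumPy sketch below computes user $i$'s share of the gradients of objective (3) and applies updates (4)-(6). The function names are hypothetical, it assumes the user holds the needed rows of $V$ and $Z$ during the step, and it applies the full regularizer on $V$ and $Z$ inside one user's gradient, which is a simplifying assumption.

```python
import numpy as np

def local_gradients(r_i, s_i, u_i, V, Z, lam):
    """Gradients of user i's terms of objective (3) (sketch).

    r_i : (m,) ratings / check-in counts of user i for all POIs
    s_i : (n,) social row of user i (1 if a social tie exists, else 0)
    u_i : (d,) latent vector of user i
    V   : (m, d) item latent vectors;  Z : (n, d) social latent vectors
    The constant factor 2 is folded into the learning rate.
    """
    e_r = r_i - V @ u_i          # e_ij   = r_ij - u_i v_j^T, shape (m,)
    e_s = s_i - Z @ u_i          # e^s_ik = s_ik - u_i z_k^T, shape (n,)
    grad_u = -(e_r @ V) - (e_s @ Z) + lam * u_i
    grad_V = -np.outer(e_r, u_i) + lam * V   # row j: -e_ij u_i + lam v_j
    grad_Z = -np.outer(e_s, u_i) + lam * Z   # row k: -e^s_ik u_i + lam z_k
    return grad_u, grad_V, grad_Z

def sgd_step(u_i, V, Z, grads, lr):
    """Updates (4)-(6): move each parameter against its gradient."""
    grad_u, grad_V, grad_Z = grads
    return u_i - lr * grad_u, V - lr * grad_V, Z - lr * grad_Z
```

In PrivSMF, only the item-side gradient is sent (in perturbed form) to the server; the user and social vectors stay on the device.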
To reduce the dimensionality of the item profile gradient, the PrivSMF algorithm adopts the differentially private gradient descent strategy based on random projection. Given a positive integer $q$, a random matrix $\Phi \in \mathbb{R}^{q \times m}$ is generated, in which each element is an independent Gaussian random variable with mean 0 and standard deviation $1/\sqrt{q}$. Let $V \in \mathbb{R}^{m \times d}$ be the item profile matrix, whose $j$-th row is the item latent feature vector $v_j$, and let $\nabla V$ be its gradient. The low-dimensional gradient matrix is then $B = \Phi \nabla V \in \mathbb{R}^{q \times d}$. Instead of the high-dimensional gradient $\nabla V_i$, the $i$-th user works with its low-dimensional gradient $B_i = \Phi \nabla V_i$, from which the high-dimensional gradient is later restored.
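To protect both which items a user has rated and the rating values, the PrivSMF algorithm adopts the randomization strategy of Nguyen et al. The $i$-th user randomly selects one entry of the gradient matrix $B_i$, perturbs it, and uploads the noisy entry to the recommendation system. The recommendation system aggregates the noisy entries submitted by all users to generate the low-dimensional noisy gradient matrix $B^{*}$. Since $B = \Phi \nabla V$, the high-dimensional noisy gradient is restored as $\nabla V^{*} = \Phi^{\dagger} B^{*}$, where $\Phi^{\dagger}$ is the pseudoinverse of $\Phi$. With this perturbed high-dimensional gradient, (5) becomes
$$V^{(t+1)} = V^{(t)} - \gamma_t \, \Phi^{\dagger} B^{*}. \tag{9}$$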
The recommendation system updates the item profile matrix according to equation (9). The pseudocode of the PrivSMF algorithm is described as follows.
In the PrivSMF algorithm, each user randomly selects only one element of the gradient matrix $B_i$ and uploads its perturbed value to the server (the recommendation system); that is, each user uploads only one bit of information. Compared with the work of Hua et al., the PrivSMF algorithm also only requires the server to send the noisy gradient matrix $B^{*} \in \mathbb{R}^{q \times d}$ to each user, instead of the item profile matrix $V \in \mathbb{R}^{m \times d}$, where $q < m$. In this way, the communication cost between users and the server is greatly reduced.
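The projection, perturbation, and restoration steps described above can be sketched as follows. The section does not spell out the per-entry perturbation, so this sketch assumes the one-bit mechanism associated with Nguyen et al.'s randomization strategy, with inputs clipped to [-1, 1]; the $\mathcal{N}(0, 1/q)$ projection entries, the helper names `one_bit_perturb` and `privsmf_round`, and the toy sizes are likewise assumptions.

```python
import numpy as np

def one_bit_perturb(x, eps, rng):
    """Assumed one-bit eps-LDP mechanism in the style of Nguyen et al.

    x must lie in [-1, 1]^k. One coordinate j is sampled uniformly; the report
    is +k*c or -k*c on that coordinate (0 elsewhere), with
    c = (e^eps + 1) / (e^eps - 1). The report is an unbiased estimate of x.
    """
    k = x.size
    j = rng.integers(k)
    e = np.exp(eps)
    c = (e + 1) / (e - 1)
    p_plus = (x[j] * (e - 1) + e + 1) / (2 * e + 2)   # lies in [1/(e+1), e/(e+1)]
    out = np.zeros(k)
    out[j] = k * c if rng.random() < p_plus else -k * c
    return out

def privsmf_round(local_grads, Phi, eps, rng):
    """One hypothetical server round: users project, clip, and perturb their
    gradients; the server averages the reports and restores a high-dimensional
    gradient with the pseudoinverse of Phi."""
    q, d = Phi.shape[0], local_grads[0].shape[1]
    reports = []
    for g in local_grads:                       # g: (m, d) local gradient of V
        B_i = np.clip(Phi @ g, -1.0, 1.0)       # (q, d); clip to the mechanism's input range
        reports.append(one_bit_perturb(B_i.ravel(), eps, rng))
    B_star = np.mean(reports, axis=0).reshape(q, d)
    # minimum-norm least-squares restoration; exact only if Phi has full column rank
    return np.linalg.pinv(Phi) @ B_star, B_star

# Toy usage with hypothetical sizes and gradients
rng = np.random.default_rng(0)
m, d, q, eps = 30, 4, 10, 1.0
Phi = rng.normal(0.0, 1.0 / np.sqrt(q), size=(q, m))   # assumed N(0, 1/q) entries
grads = [rng.uniform(-0.05, 0.05, size=(m, d)) for _ in range(2000)]
grad_V_star, B_star = privsmf_round(grads, Phi, eps, rng)
V = rng.standard_normal((m, d))
V = V - 0.01 * grad_V_star                  # update (9) with a hypothetical learning rate
```

Because each report is unbiased, averaging over many users drives the noise down, while each user still reveals only a single signed value.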
After obtaining the item profile matrix $V$, each user can calculate predicted ratings locally by using the Trust-MF model. The predicted rating of the $i$-th user for the $j$-th POI is
$$\hat{r}_{ij} = r_{\max} \cdot g\!\left(\frac{(u_i + z_i)\, v_j^{T}}{2}\right),$$
where $g(x) = 1/(1 + e^{-x})$ is the logistic function; $u_i$ and $z_i$ are, respectively, the user latent feature vector and the social latent feature vector of the $i$-th user, both generated locally; $v_j^{T}$ is the transpose of the item latent feature vector $v_j$, which is produced iteratively by the user side and the recommendation system side; and $r_{\max}$ is the maximum value of all the rating data. The $i$-th user sends the calculated $\hat{r}_{ij}$ to the recommendation system, which aggregates the predicted ratings uploaded by all users into the noisy prediction rating matrix $\hat{R}^{*}$.
4.3. GeoKDE Algorithm
In location-based services, users often physically interact with POIs and consume related services, such as eating in a restaurant or watching a movie in a cinema. Therefore, a user's check-in behavior is likely to be affected by the geographical distance of POIs. In addition, the distribution of check-in data varies across users: for example, people who prefer indoor activities may mostly visit places around their living area, whereas people who prefer outdoor activities may travel to explore new and interesting places. Zhang et al. learned this distribution from users' historical check-in data with a nonparametric method that estimates kernel density with a fixed bandwidth. However, a fixed bandwidth does not accurately reflect real check-in behavior, because check-in density is high in densely populated areas and low in sparsely populated areas, so a single bandwidth tends to oversmooth the former and undersmooth the latter.
To better learn the impact of POI geographical distance on users' check-in behavior, the GeoKDE algorithm estimates kernel density based on the self-adaptive kernel density estimation algorithm and the predicted rating matrix obtained in Section 4.2.
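Let $n$ and $m$ denote the number of users and the number of POIs, respectively, and let $L_i \subseteq \{l_1, l_2, \ldots, l_m\}$ be the check-in dataset of the $i$-th user. A POI $l_j$ has longitude and latitude attributes and is represented by $l_j = (x_j, y_j)$. The self-adaptive kernel density estimate of the $i$-th user for a POI $l = (x, y)$ is calculated as
$$\hat{f}_i(l) = \frac{1}{W_i} \sum_{l_j \in L_i} \hat{r}_{ij}\, K_{h_j}(l - l_j), \qquad W_i = \sum_{l_j \in L_i} \hat{r}_{ij},$$
where $W_i$ is the sum of the rating values for the POIs in $L_i$, $\hat{r}_{ij}$ is the rating value of the $i$-th user for $l_j$, and $K_{h_j}$ is the kernel function
$$K_{h_j}(l - l_j) = \frac{1}{2\pi h_1 h_2 h_j^{2}} \exp\!\left(-\frac{(x - x_j)^2}{2 h_1^2 h_j^2} - \frac{(y - y_j)^2}{2 h_2^2 h_j^2}\right),$$
where $h_j$ is the local bandwidth, and the global bandwidths $h_1$ and $h_2$ are calculated from the standard deviations of the longitude and latitude values in the check-in dataset $L_i$, respectively.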
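A rough NumPy sketch of a rating-weighted adaptive kernel density estimate follows. It assumes a Gaussian product kernel, Silverman-style global bandwidths computed from the coordinate standard deviations, and Abramson-style local bandwidth factors $(\text{pilot}/g)^{-\alpha}$ with $\alpha = 0.5$; these are common choices for adaptive KDE, not details confirmed by the paper, and the function name is hypothetical.

```python
import numpy as np

def adaptive_kde(query, pois, weights, alpha=0.5):
    """Rating-weighted adaptive Gaussian KDE over (longitude, latitude) pairs.

    query   : (2,) coordinates of the candidate POI l
    pois    : (k, 2) coordinates of the POIs in the user's check-in set L_i
    weights : (k,) rating values used as kernel weights
    alpha   : sensitivity of the local bandwidths (assumed Abramson value 0.5)
    """
    k = len(pois)
    # global bandwidths h_1, h_2 from the coordinate standard deviations
    # (Silverman-style rule of thumb; an assumption)
    h = 1.06 * pois.std(axis=0) * k ** (-1 / 5)
    h = np.where(h > 0, h, 1e-3)                  # guard against zero spread
    w = weights / weights.sum()                   # normalise by the total rating

    def fixed_kde(points):
        # weighted product-Gaussian KDE using only the global bandwidths
        z = (points[:, None, :] - pois[None, :, :]) / h
        dens = np.exp(-0.5 * (z ** 2).sum(axis=-1)) / (2 * np.pi * h[0] * h[1])
        return dens @ w

    pilot = fixed_kde(pois)                       # pilot density at each check-in POI
    g = np.exp(np.mean(np.log(pilot)))            # geometric mean of the pilot values
    h_local = (pilot / g) ** (-alpha)             # local factor h_j: wider where density is low
    z = (query[None, :] - pois) / (h * h_local[:, None])
    dens = np.exp(-0.5 * (z ** 2).sum(axis=-1)) / (2 * np.pi * h[0] * h[1] * h_local ** 2)
    return float(dens @ w)

# Toy usage (hypothetical coordinates): three nearby check-ins and one outlier
pois = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
ratings = np.array([1.0, 2.0, 1.0, 1.0])
near = adaptive_kde(np.array([0.05, 0.05]), pois, ratings)
far = adaptive_kde(np.array([20.0, 20.0]), pois, ratings)
```

The isolated check-in receives a wider kernel than the clustered ones, which is the behavior the adaptive bandwidth is meant to provide.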
4.4. RRPE Algorithm
In the GSSA-PPRM method, POIs with higher visit counts are assumed to be more popular and more likely to interest users. To calculate POI popularity while preserving user privacy (a user does not want the recommendation system to know whether he/she has visited a certain POI), the RRPE algorithm is designed in this section.
The RRPE algorithm uses the random response mechanism (WRR) proposed by Warner to protect users' check-in data. Each user answers the request sent by the recommendation system truthfully only with a certain probability. The recommendation system tallies the visit counts of POIs by aggregating the responses of all users and then calculates POI popularity. The main process of the RRPE algorithm is as follows:
At the user side, the $i$-th user answers the recommendation system's question of whether he/she has checked in at POI $l$. If the $i$-th user has visited $l$, the answer is "yes" with response probability $p$ and "no" otherwise; if the $i$-th user has not visited $l$, the answer is "no" with response probability $p$ and "yes" otherwise. The response probability is $p = e^{\varepsilon}/(1 + e^{\varepsilon})$, where $\varepsilon$ is the privacy budget. The recommendation system aggregates the answers of all users and then calculates the popularity of $l$ as
$$pop(l) = \frac{\frac{1}{n}\sum_{i=1}^{n} a_{il} + p - 1}{2p - 1},$$
where $a_{il}$ represents the answer of the $i$-th user for $l$: if the answer is "yes", $a_{il} = 1$; otherwise, $a_{il} = 0$.
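The response step and the popularity estimate can be sketched as follows. The estimator is the standard unbiased Warner estimator for the fraction of users who visited $l$, obtained by solving $\mathbb{E}[\text{yes rate}] = \pi p + (1 - \pi)(1 - p)$ for $\pi$; the function names and the toy population are hypothetical.

```python
import numpy as np

def rr_answer(visited, eps, rng):
    """User side: report the true answer with probability p = e^eps / (1 + e^eps),
    otherwise report the opposite answer."""
    p = np.exp(eps) / (1 + np.exp(eps))
    return bool(visited) if rng.random() < p else not visited

def estimate_popularity(answers, eps):
    """Server side: unbiased estimate of the fraction of users who visited the POI.

    E[fraction of 'yes'] = pi * p + (1 - pi) * (1 - p), solved for pi.
    """
    p = np.exp(eps) / (1 + np.exp(eps))
    yes_rate = np.mean(answers)
    return (yes_rate + p - 1) / (2 * p - 1)

# Toy population (hypothetical): about 30% of 20,000 users visited POI l
rng = np.random.default_rng(42)
true_visits = rng.random(20000) < 0.3
answers = [rr_answer(v, 1.0, rng) for v in true_visits]
pop_l = estimate_popularity(answers, 1.0)   # close to true_visits.mean(), up to sampling noise
```

The estimate is unbiased but noisy; its variance grows as $\varepsilon$ shrinks because $2p - 1$ approaches zero.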
Finally, a rating rule for the $i$-th user and POI $l$ is designed, which uses the product rule to integrate the popularity $pop(l)$ and the kernel density estimate $\hat{f}_i(l)$ obtained in Section 4.3:
$$\tilde{r}_{il} = \hat{f}_i(l) \cdot pop(l),$$
where $\tilde{r}_{il}$ is the predicted rating of the $i$-th user for $l$. The larger $\tilde{r}_{il}$ is, the more likely the $i$-th user is to visit $l$. For the $i$-th user, POIs are sorted by predicted rating in descending order, and the Top-N POIs are selected as the recommendation list.
4.5. Privacy Analysis
In the GSSA-PPRM method, noise is injected in two places: the PrivSMF algorithm perturbs the uploaded gradients, and the RRPE algorithm perturbs users' check-in answers through random response. To prove that the GSSA-PPRM method satisfies differential privacy, it therefore suffices to prove that both the PrivSMF algorithm and the RRPE algorithm satisfy differential privacy.
4.5.1. The PrivSMF Algorithm Satisfies ε-Differential Privacy
Assume that $D$ and $D'$ are two adjacent datasets, $V$ is the item profile matrix, $\mathcal{O}$ is the output of the algorithm, $T$ is the maximum number of iterations, and $x_i^{(t)}$ and $x_i^{*(t)}$ are, respectively, the original tuple and the noisy tuple submitted by the $i$-th user at iteration $t$.
Therefore, the PrivSMF algorithm satisfies ε-differential privacy.
4.5.2. The RRPE Algorithm Satisfies ε-Differential Privacy
Suppose that $D$ and $D'$ are two adjacent datasets, $x$ and $x'$ are any two records, and $y$ is any output of the RRPE perturbation $\mathcal{M}$.
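Following the proof of the random response technique, since each answer is truthful with probability $p = e^{\varepsilon}/(1 + e^{\varepsilon})$ and flipped with probability $1 - p$,
$$\frac{\Pr[\mathcal{M}(x) = y]}{\Pr[\mathcal{M}(x') = y]} \le \frac{p}{1 - p} = \frac{e^{\varepsilon}/(1 + e^{\varepsilon})}{1/(1 + e^{\varepsilon})} = e^{\varepsilon}.$$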
Therefore, the RRPE algorithm satisfies ε-differential privacy.
4.5.3. The GSSA-PPRM Method Satisfies ε-Differential Privacy
Since the PrivSMF algorithm and the RRPE algorithm each satisfy ε-differential privacy and process disjoint datasets, the parallel combination theorem (Theorem 2) shows that the GSSA-PPRM method satisfies ε-differential privacy.
5. Experimental Study
5.1. Experimental Environment
Experiments are conducted on a machine with an Intel(R) Core i5-6500 CPU (3.2 GHz), 64 GB of memory, and 64-bit Windows 10; the programming language is Python (version 3.6.9).
Two real datasets, Gowalla and Yelp, are used in the experiments; their statistics are shown in Table 1. The Gowalla dataset provides users' check-in frequencies on POIs, and the Yelp dataset provides users' ratings of POIs.
5.2. Study of Experimental Results
To evaluate the impact of differential privacy on the recommendation results, two common metrics in recommender systems, Precision and Recall, are adopted to measure recommendation quality.
Precision: the ratio of the number of recommended POIs that the user actually visited to the number of recommended POIs, which is given as follows:
Recall: the ratio of the number of actually visited POIs that were recommended to the number of actually visited POIs, which is given as follows. In both formulas, represents the set of users, represents the set of POIs recommended to the i-th user, and represents the set of POIs actually visited by the i-th user.
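The following Python sketch computes the two metrics as defined above. It assumes that per-user precision and recall are averaged over all users who have at least one recommendation and at least one held-out visit; the paper's exact aggregation over the user set may differ, and `precision_recall_at_n` is a hypothetical helper.

```python
from typing import Dict, Sequence, Set, Tuple


def precision_recall_at_n(recommended: Dict[int, Sequence[int]],
                          visited: Dict[int, Set[int]]) -> Tuple[float, float]:
    """Average per-user precision and recall of Top-N recommendation lists."""
    precisions, recalls = [], []
    for user, rec in recommended.items():
        truth = visited.get(user, set())
        if not rec or not truth:
            continue  # skip users with no recommendations or no held-out visits
        hits = len(set(rec) & truth)          # recommended POIs that were actually visited
        precisions.append(hits / len(rec))    # divide by the number recommended
        recalls.append(hits / len(truth))     # divide by the number actually visited
    n = len(precisions)
    if n == 0:
        return 0.0, 0.0
    return sum(precisions) / n, sum(recalls) / n
```

For instance, if user 0 receives `[1, 2, 3, 4]` and visited `{2, 4, 9}`, while user 1 receives `[5, 6]` and visited `{7}`, the averaged precision is 0.25 and the averaged recall is 1/3.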
The performance of the GSSA-PPRM method is analyzed and compared with that of the following methods:
Nonprivate: a non-privacy-preserving baseline, that is, the GSSA-PPRM method with its privacy preserving components removed. This baseline uses SGD to factorize users' historical rating matrix and, on this basis, also considers geographical distance and POI popularity to predict users' preferences for POIs.
DP-SVD: a traditional matrix factorization recommendation method that adds Laplace noise during SGD to protect user privacy.
Private GD-DR: a recommendation method based on LDP and matrix factorization, proposed by Shin et al.
KDTree: a data-dependent space decomposition method proposed by Zhang et al., which mainly uses a differentially private trajectory dataset for location recommendation.
The GSSA-PPRM method has several predefined parameters. Before the experiments, the number of underlying features is set to , the learning rate is set to , and ; is set to 50; during iteration, the learning rate is set to when the current iteration number is ; and the regularization factors are set to .
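To make the SGD factorization behind the nonprivate baseline and the learning-rate and regularization parameters above more concrete, the following Python sketch trains latent user and POI factors with regularized SGD. All default hyperparameter values, the geometric learning-rate decay, and the name `sgd_mf` are hypothetical; the sketch omits the social attribute, noise, and geographical components of GSSA-PPRM.

```python
import numpy as np


def sgd_mf(ratings, num_users, num_items, k=10, lr0=0.01, reg=0.01,
           max_iter=50, decay=0.99, seed=0):
    """Factorize observed (user, item, rating) triples as R ~ U @ V.T by SGD.

    The defaults and the decay schedule lr_t = lr0 * decay**t are
    hypothetical placeholders, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((num_users, k))  # user latent factors
    V = 0.1 * rng.standard_normal((num_items, k))  # POI latent factors
    for t in range(max_iter):
        lr = lr0 * decay ** t
        for idx in rng.permutation(len(ratings)):  # visit ratings in random order
            u, i, r = ratings[idx]
            err = r - U[u] @ V[i]                   # prediction error for this rating
            u_row = U[u].copy()                     # pre-update user factors for the V step
            U[u] += lr * (err * V[i] - reg * U[u])  # L2-regularized gradient steps
            V[i] += lr * (err * u_row - reg * V[i])
    return U, V
```

DP-SVD, as described in the list of compared methods, would perturb this same SGD process with Laplace noise; the sketch shows only the noise-free core.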
Two groups of comparative experiments are conducted. The first group compares the performance of the above methods under different values of the privacy budget ε, and the second group compares their performance under different values of N (the number of POIs recommended to users). Both groups adopt a five-fold cross-validation strategy, and the average precision and recall over the five folds are reported as the final results.
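The five-fold averaging described above can be sketched as follows. The sketch assumes that individual check-in or rating records are shuffled and split into folds, and that a caller-supplied `evaluate(train, test)` function returns a (precision, recall) pair; whether the paper splits by record or by user is not stated, and both `k_fold_average` and `evaluate` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np


def k_fold_average(records: Sequence,
                   evaluate: Callable[[List, List], Tuple[float, float]],
                   k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Average (precision, recall) over k folds of randomly shuffled records."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(records))
    folds = [fold.tolist() for fold in np.array_split(order, k)]
    results = []
    for f in range(k):
        test = [records[j] for j in folds[f]]                              # held-out fold
        train = [records[j] for g in range(k) if g != f for j in folds[g]]  # other k-1 folds
        results.append(evaluate(train, test))
    precision = sum(p for p, _ in results) / k
    recall = sum(r for _, r in results) / k
    return precision, recall
```

Each record appears in exactly one test fold, so every observation is used once for evaluation and k − 1 times for training.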
5.2.1. Experiments under Different Privacy Budget Values
The privacy budget ε is an important parameter that determines the level of privacy preservation of differential privacy: the smaller its value, the stronger the privacy protection. In this group of experiments, N is fixed to 10, and ε is varied from 0.1 to 1.0.
For the Gowalla and Yelp datasets, the precision and recall of each method under different privacy budget values are shown in Figures 2 and 3, respectively. As the privacy budget ε increases, the precision and recall of the four privacy preserving methods also increase (since the KDTree method was not tested on the Yelp dataset, it is omitted from panel (b) of these figures). This result is expected, because the error introduced by the Laplace mechanism decreases as the privacy budget increases.
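The dependence on the privacy budget follows directly from the standard Laplace mechanism. In the formulas below, Δf denotes the sensitivity of the perturbed quantity, a symbol not defined in this section; the expected absolute noise and its variance both shrink as ε grows. For example, with Δf = 1, increasing ε from 0.1 to 1.0 reduces the expected absolute noise from 10 to 1.

```latex
\begin{gather*}
\tilde{x} = x + \eta, \qquad \eta \sim \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right), \\
\mathbb{E}\,|\eta| = \frac{\Delta f}{\varepsilon}, \qquad
\mathrm{Var}(\eta) = 2\left(\frac{\Delta f}{\varepsilon}\right)^{2}
\end{gather*}
```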
As Figure 2(a) shows, on the Gowalla dataset the precision of the KDTree method is higher than that of the GSSA-PPRM method when . This indicates that a small privacy budget has a greater impact on the GSSA-PPRM method. However, when , the precision of the GSSA-PPRM method is much higher than that of the KDTree method, and when , the GSSA-PPRM method stabilizes and maintains better precision. Figure 2(b) also shows that, on the Yelp dataset, except for and , the precision of the GSSA-PPRM method is higher than that of the other two privacy preserving methods.
In addition, Figure 2 shows that, on both the Gowalla and Yelp datasets, the precision of the GSSA-PPRM method is higher than that of the Private GD-DR method. The main reason is that the POI popularity and geographical attributes considered by the GSSA-PPRM method are important for the accuracy of POI recommendation and improve recommendation quality to a certain extent.
Figure 3(a) shows that, on the Gowalla dataset, the recall of the KDTree method is higher than that of the GSSA-PPRM method when . This again indicates that a small privacy budget has a greater impact on the GSSA-PPRM method. However, when , the recall of the GSSA-PPRM method is higher than that of the KDTree method, and when , the GSSA-PPRM method stabilizes and maintains better recall. Figure 3(b) shows that, on the Yelp dataset, the recall of the GSSA-PPRM method is noticeably higher than that of the other two privacy preserving methods when . When , the recalls of the GSSA-PPRM and DP-SVD methods are at a similar level, and when , the recalls of the GSSA-PPRM and Private GD-DR methods are at a similar level.
Figures 2 and 3 also show that the nonprivate method achieves better recommendation quality than the GSSA-PPRM method. This is expected, because differential privacy protection inevitably causes some loss of recommendation quality. Overall, compared with the other privacy preserving methods, the GSSA-PPRM method maintains relatively better recommendation quality as the privacy budget varies.
5.2.2. Experiments under Different N Values
In this group of experiments, ε is set to 0.5 for all privacy preserving methods, and N (the number of POIs recommended to users) is varied from 5 to 65.
For the Gowalla and Yelp datasets, the precision and recall of the four methods under different N values are shown in Figures 4 and 5, respectively (the KDTree method is omitted because it was not evaluated in this setting).
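Both figures show that as N increases, the precision of each method decreases while the recall of each method increases. This follows from the definitions: a longer list has a larger precision denominator, whereas the recall denominator (the number of actually visited POIs) stays fixed while more of those POIs can be covered.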
Figure 4(a) shows that, on the Gowalla dataset, the precision of the GSSA-PPRM method stays close to that of the nonprivate method when . The precision of the GSSA-PPRM method drops sharply when , but it remains much better than that of the other two privacy preserving methods. This demonstrates that the GSSA-PPRM method maintains better recommendation accuracy even for larger N. Figure 4(b) shows that, on the Yelp dataset, the precision of the GSSA-PPRM method stays close to that of the Private GD-DR method when , except at , and is much higher than that of the Private GD-DR method when . In addition, the overall precision of the GSSA-PPRM method is higher than that of the DP-SVD method.
Figure 5(a) shows that, on the Gowalla dataset, the recall of the GSSA-PPRM method stays close to that of the other two privacy preserving methods when . When , the recall of the GSSA-PPRM method is much higher than that of the other two privacy preserving methods, especially when . This means that as the number of POIs recommended to users increases, the GSSA-PPRM method better covers the set of POIs that users are interested in. Figure 5(b) shows that, on the Yelp dataset, the overall recall of the GSSA-PPRM method follows a trend close to that of the nonprivate method. Overall, the GSSA-PPRM method outperforms the other two privacy preserving methods.
5.2.3. Ablation Study
Here, we provide an ablation study that demonstrates the effectiveness of each component of the GSSA-PPRM method. We compare the GSSA-PPRM method with three weakened versions: (1) removing the social attribute and the social underlying feature vector from the PrivSMF algorithm, so that Equation (3) becomes an ordinary matrix factorization objective; (2) removing the self-adaptive kernel estimation of the i-th user for each POI from the GeoKDE algorithm and assigning the same fixed value to every kernel estimate instead; (3) removing the POI popularity calculation from the RRPE algorithm and assigning the same fixed value to every POI's popularity instead.
Two groups of comparative experiments are conducted: the first compares the precision of the above four versions of the GSSA-PPRM method, and the second compares their recall. In both, the privacy budget ε is set to and N is fixed to 10. The results of the ablation study are reported in Tables 2 and 3, respectively.
As shown in Tables 2 and 3, the original GSSA-PPRM method always achieves the best performance, and the first weakened version performs worst, which suggests the importance of social attributes. The first weakened version also performs worse on Yelp than on Gowalla, since the Yelp dataset contains more social relation labels than the Gowalla dataset.
For privacy preserving POI recommendation scenarios, a privacy preserving POI recommendation method named GSSA-PPRM is proposed in the paper. In the proposed method, the PrivSMF algorithm, an LDP-based matrix factorization algorithm that takes social attributes into consideration, is first presented to securely learn users' implicit preferences. Then, based on the output of the PrivSMF algorithm, the GeoKDE algorithm adopts a self-adaptive kernel density estimation strategy to model the impact of POI geographical distance on users' check-in behavior. The RRPE algorithm calculates POI popularity from the aggregated responses of all users collected through the random response mechanism. Finally, integrating the results of the GeoKDE and RRPE algorithms, a rating rule is given to predict users' ratings for POIs, and Top-N recommendation is achieved.
In future work, we plan to combine suitable deep learning techniques with traditional recommendation methods to learn users' implicit preferences more effectively. Improving the computational efficiency of the proposed method is another research direction.
Previously reported Gowalla and Yelp data were used to support this study and are available at http://snap.stanford.edu/. These prior studies and datasets are cited at the relevant places within the text as references [34, 35].
Conflicts of Interest
The authors declare that they have no conflicts of interest to report regarding the present study.
This work was supported by Key Research and Development Program of Shaanxi Province (No. 2021GY-090) and Science and Technology Program of Xi’an City (No. 2019216914GXRC005CG006-GXYD5.2).
B. Liu, Y. Fu, Z. Yao, and H. Xiong, “Learning geographical preferences for point-of-interest recommendation,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1043–1051, Association for Computing Machinery, Chicago, IL, USA, August 2013.
H. Gao, J. Tang, and H. Liu, “Mobile location prediction in spatio-temporal context,” Nokia Mobile Data Challenge Workshop, vol. 41, no. 2, pp. 1–4, 2012.
H. Gao, J. Tang, and H. Liu, “gSCorr: modeling geo-social correlations for new check-ins on location-based social networks,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1582–1586, Association for Computing Machinery, Queensland, Australia, May 2012.
Y. Koren, “Collaborative filtering with temporal dynamics,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 447–456, Association for Computing Machinery, Paris, France, July 2009.
J. S. Kim, J. W. Kim, and Y. D. Chung, “Successive point-of-interest recommendation with local differential privacy,” arXiv preprint, vol. 9, pp. 66371–66386, 2019, arXiv:1908.09485.
D. Lian, Y. Wu, Y. Ge, and X. Xie, “Geography-aware sequential location recommendation,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2009–2019, Association for Computing Machinery, New York, NY, USA, July 2020.
Y. Wu, K. Li, G. Zhao, and X. Qian, “Personalized long- and short-term preference learning for next POI recommendation,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, 2020.
B. Gedik and L. Liu, A Customizable K-Anonymity Model for Protecting Location Privacy, Georgia Institute of Technology, Atlanta, GA, USA, 2004.
C. Chen, Z. Liu, P. Zhao, and X. Li, “Privacy preserving point-of-interest recommendation using decentralized matrix factorization,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, New Orleans, LA, USA, February 2018.
X. Liu, A. Liu, X. Zhang et al., “When differential privacy meets randomized perturbation: a hybrid approach for privacy-preserving recommender system,” in Proceedings of the International Conference on Database Systems for Advanced Applications, pp. 576–591, Springer, Suzhou, China, March 2017.
X. Meng, S. Wang, K. Shu, Y. Zhang, and H. Liu, “Personalized privacy-preserving social recommendation,” in Proceedings of the AAAI Conference on Artificial Intelligence, no. 1, New York, NY, USA, February 2018.
J. D. Zhang and C. Y. Chow, “Enabling probabilistic differential privacy protection for location recommendations,” IEEE Transactions on Services Computing, vol. 1939, p. 1, 2018.
D. Riboni and C. Bettini, “Private context-aware recommendation of points of interest: an initial investigation,” in Proceedings of the 2012 IEEE International Conference on Pervasive Computing and Communications Workshops, pp. 584–589, IEEE, Lugano, Switzerland, March 2012.
J. Hua, X. Chang, and Z. Sheng, “Differentially private matrix factorization,” in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1763–1770, Buenos Aires, Argentina, July 2015.
J. D. Zhang and C. Y. Chow, “GeoSoCa: exploiting geographical, social and categorical correlations for point-of-interest recommendations,” in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 443–452, Association for Computing Machinery, Santiago, Chile, August 2015.
E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user movement in location-based social networks,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1082–1090, Association for Computing Machinery, San Diego, CA, USA, August 2011.
Yelp Challenge Data Set, 2014, http://www.yelp.com/dataset_challenge.
J. D. Zhang, G. Ghinita, and C. Y. Chow, “Differentially private location recommendations in geosocial networks,” in Proceedings of the 2014 IEEE 15th International Conference on Mobile Data Management, pp. 59–68, IEEE, Brisbane, Australia, July 2014.