Abstract
Point-of-interest (POI) recommendation technology using users' check-in data has attracted great attention in recent years. However, check-in data often contains sensitive information such as time and location. Due to privacy considerations, many users are unwilling to share their check-in data with untrusted service providers, which has a strongly negative impact on recommendation quality. To address this problem, this paper proposes a geographical and social society attributes based privacy preserving recommendation method for POIs, named GSSAPPRM. In the proposed method, a local differentially private matrix factorization algorithm is first designed to learn users' preferences with social attributes in client/server style. Then, according to the learned preferences and considering the geographical distance of POIs, a self-adaptive kernel density estimation algorithm is devised to model users' check-in behavior. Next, an algorithm is presented that tallies POI visit counts and computes POI popularity by securely collecting users' check-in data through the random response (RR) mechanism. Finally, a rating rule is given to predict users' ratings for POIs by integrating kernel density estimation and POI popularity. Experimental results on two real datasets show that the proposed method achieves better POI recommendation quality while preserving user privacy.
1. Introduction
Due to the rapid development of the Internet and the wide adoption of smart mobile devices, location-based social network application services such as Twitter, Yelp, and Dianping are constantly emerging. These services strongly encourage users to post personal updates and share geographic locations by checking in. Locations with certain semantics are collectively called points of interest (POIs). Social network application service providers may use the massive collected POI data to conduct in-depth mining and analysis of user preferences and potential needs. On this basis, they can provide a wide range of personalized services, including POI recommendation.
POI recommendation learns from users' check-in history and other related data in order to find unvisited locations that may be of interest to users. At present, popular social network application service providers can list millions of POIs, but in real life a user visits only a very limited number of them. This causes the data sparseness problem, which is one of the critical challenges limiting the accuracy of POI recommendation.
There exist many POI recommendation methods. Many of these methods combine different attributes, such as geographic attributes [1,2], social relationships [3], and temporal attributes [4], with collaborative filtering technology [5,6]. Although these attributes can help better understand users’ preferences and alleviate the data sparseness problem, how to integrate this information into the learning model and obtain better recommendation results is still a key challenge. Most of these recommendation methods consider either a single attribute or multiple isolated attributes, which leads to poor recommendation performance.
Moreover, while users enjoy personalized POI recommendation services, they also face considerable risks of privacy leakage. Among the existing privacy preserving POI recommendation methods, quite a few rely on a trusted third-party data aggregator, which securely collects users' data and then sends the aggregated data to the recommendation system for privacy preserving recommendation. However, deploying a fully trusted third-party data aggregator incurs extra cost, and many recommendation service providers are unwilling to invest in it. In view of this situation, local differential privacy (LDP) [7,8] is a possible solution. LDP can provide strong privacy protection locally (on the client/user side) for each user without any trusted data aggregator. At present, LDP has been adopted in some privacy preserving POI recommendation methods. Shin et al. [9] proposed a matrix factorization algorithm based on LDP: each user first perturbs the profile matrix on the local device and then sends it to the recommendation system, which aggregates the perturbed profile matrices and finally performs the recommendation service. In addition, Kim et al. [10] considered the transition patterns between different POIs and the visit counts of POIs.
Based on the above analysis of the existing privacy preserving POI recommendation methods, there mainly exist two challenges that need to be further solved: (1) regarding the sparsity problem, how to effectively model users’ preferences with combining contextual information (such as social data, geographic data, etc.), for better improving recommendation accuracy; (2) how to securely collect user data and ensure the balance between data utility and privacy preservation in untrusted recommendation systems.
To address the problems above, a geographical and social society attributes based privacy preserving recommendation method (GSSAPPRM) for POIs is proposed in this paper. Firstly, a local differentially private matrix factorization algorithm is designed to learn users' preferences with social attributes in client/server style. Secondly, based on the learned preferences and considering the geographical distance of POIs, a self-adaptive kernel density estimation algorithm is devised to model users' check-in behavior. Then, to calculate POI popularity, an algorithm is presented that tallies POI visit counts by securely collecting users' check-in data through the random response (RR) mechanism. Finally, a rating rule is given to predict users' ratings for POIs by integrating kernel density estimation and POI popularity, and the Top-N POIs are selected according to their predicted rating values in descending order.
Compared with the similar existing POI recommendation solution [9,10], we summarize the following two major differences: (1) though the first stage of the GSSAPPRM method follows the differentially private gradient descent strategy proposed by Shin et al. [9], social attribute is taken into consideration, and the TrustMF model is adopted for local rating prediction on user side. (2) Factors of user's preference, social attribute, checkin behavior, and geographical attribute of POIs are integrated for comprehensive rating prediction in the GSSAPPRM method, while the other two mainly focus on user's preference and checkin data.
The main contributions of this paper can be summarized as follows:
(1) To improve the accuracy of POI recommendation, the user's social attribute is used to learn the user's implicit preference, and the geographic distance of POIs is adopted to model the user's check-in behavior. Comprehensively considering the user's preference, check-in behavior, and POI popularity, the user's rating estimation for POIs is carefully calculated.
(2) To effectively preserve user privacy, the proposed method perturbs the item profile matrix and the user's check-in data, respectively, in a differentially private approach. Besides, random projection is used to implement dimensionality reduction for the item profile matrix, and a randomization strategy is adopted to reduce the amount of Laplace noise added to the gradients.
(3) Extensive comparative experiments are carried out on two real datasets, Gowalla and Yelp. The experimental study demonstrates that the proposed method improves the accuracy of POI recommendation while preserving user privacy well.
The rest of this paper is organized as follows. Section 2 presents the related works. Section 3 introduces the preliminaries of differential privacy and matrix factorization. Section 4 describes the proposed GSSAPPRM method in detail. Section 5 shows the experimental results and analysis of comparative experiments. Section 6 concludes the paper.
2. Related Works
2.1. Matrix Factorization Based Recommendation
In recent years, matrix factorization (MF), especially Latent factor models based on matrix factorization, is widely adopted for recommendation system since it usually outperforms traditional methods. And all kinds of MF algorithms have been proposed for solving different recommendation problems. Zhao et al. [11] proposed a reviewbased recommendation model for social recommendation. The proposed model leveraged factors of user sentimental deviations and review’s reliability to depict user internal influence, and this influence was fused into a matrix factorization to improve the accuracy of rating prediction. Lian et al. [12] proposed a geographyaware sequential recommender based on the selfattention network (GeoSAN) for location recommendation. In their work, a new loss function based on importance sampling was designed to address the sparsity issue, and for better use of geographical information, the selfattention based geography encoder was given to represent the exact GPS positions. For emoji recommendation, Zhao et al. [13] proposed a contextaware recommendation modelCAPER, by considering both contextual information and personal information. In CAPER model, text feature, temporal feature, user gender feature, user preference feature, and emoji’s cooccurrence feature were fused to achieve efficient emoji recommendation. For next POI recommendation, Wu et al. [14] proposed a method named Personalized Long and Shortterm Preference Learning, to capture user’s preference. For longterm preference, the contextual features of POIs in users’ history records were extracted, and the attention mechanism was adopted to capture users’ preference. For shortterm preference, both locationbased sequence and categorybased sequence, which were recognized by two LSTM models, respectively, were integrated to capture users’ sequential behavior. For POI recommendation, Zhao et al. [15] proposed a POI mining method and a personalized recommendation model. 
The given POI mining method firstly mined the POIs by fusing the sentimental and geographical attributes of locations. Then, sentiment similarity between POIs and geographical distance between user’s multiactivity centers and POIs were used in the probabilistic matrix factorization based recommendation model for rating prediction.
2.2. Privacy Preserving Recommendation
With the popularity of recommendation services, the privacy preservation problem of POI recommendation services has attracted more and more attention from many researchers. Cryptography technology can provide strong privacy protection based on encryption and decryption mechanisms. Liu et al. [16] designed two protocols (PPTRS and PPTRL) for privacy preserving POI recommendation, based on partial homomorphic encryption. In PPTRS protocol, all the data was encrypted by the public key, but the private key could only be accessed by the social networking website. In PPTRL protocol, the location based services (LBS) provider was responsible for generating the key pair and properly storing the private key. Wang et al. [17] proposed a homomorphic encryption based protocol to protect users’ checkin data. They used homomorphic properties of Paillier cryptosystem to achieve effective recommendation on encrypted data. And on this basis, two optimization strategies are presented to improve the proposed protocol. Gedik et al. [18] presented a kanonymous customization framework that supports variable k to meet the personalized needs of location privacy protection. However, the location of the target user can still be easily identified.
For both anonymity and encryption technology, defenders need to carefully design different models for different background knowledge of adversaries. For the decentralized recommendation scenario, Chen et al. [19] proposed a decentralized MF (DMF) framework for POI recommendation, together with a random walk based nearby collaborative training technique for DMF model training on each user's end. Both the model learning process and the rating data are kept on the user's own side, and only gradients are exchanged between nearby users; thus, the user's privacy is well preserved.
2.3. Differentially Private Recommendation
Differential privacy (DP) is firstly proposed by Dwork [20]. The core idea of differential privacy is to add noise to the original data, so as to realize data perturbation and achieve privacy preservation. Differential privacy can prove the risk of privacy leakage through rigorous mathematical analysis, regardless of the adversary’s background knowledge. In order to prevent personal data from being collected by untrusted recommendation system, Liu et al. [21] added noise to the user's rating data by using random perturbation technology on the user side. In addition, to prevent adversaries from inferring sensitive data of any single user according to the recommendation results, they also added noise during calculating the relationship between the items. Meng et al. [22] believed that only the historical rating data of some users is sensitive. Based on above assumptions, they classified the user's historical rating data into sensitive level and nonsensitive level. And a large amount of noise was added to the sensitive rating data, while a small amount of noise is added to the nonsensitive rating data. As a result, the sensitive data could be kept better privacy protection, and the nonsensitive data could be used to achieve better recommendation accuracy. However, the above methods could only protect information of items or ratings. Wei et al. [23] proposed a differentially private trajectory community recommendation method. This method firstly converted positions of the real trajectory into positions of the noisy trajectory and then used geographic distance to construct the noisy trajectory that had minimum distance with the real trajectory, so as to ensure that the constructed noisy trajectory was highly similar to the real one. Zhang et al. [24] proposed a privacy preserving location recommendation framework. The proposed framework mainly used norder Markov chain and user's sequential pattern to calculate the probability result and recommend interested location. 
Besides, they also designed a probabilistic differential privacy mechanism to reach a good tradeoff between high recommendation accuracy and strict location privacy protection. For privacy preserving contextaware POI recommendation, Riboni and Bettini designed a differentially private recommendation system [25], which extracted statistics about personal preferences for POIs through PINQ query engine and then generated recommendations from those statistics. Users’ location privacy was protected by simply generalizing location into corresponding grid cell. Chen et al. [26] proposed a privacy preserving POI recommendation framework named PriRec. In this framework, users’ private raw data and linear models were kept on users’ own side, only public POIs’ data and the feature interaction model were kept by the recommender, and local differential privacy was adopted to generate dynamic POI feature. A secure gradient descent protocol based on secret sharing was designed for collaboratively learning linear models, and a secure aggregation strategy was adopted to learn the feature interaction model. For privacy preserving next POI recommendation, Kuang et al. [27] modeled users’ checkin sequences with their latent states based on hidden Markov model and proposed a weighted noise injection method. In the proposed method, the distance between the user's current location and the nearest checkin center was calculated, and then a different amount of noise was injected to latitude and longitude of the user's location according to the distance. The noisy locations were finally used to predict user’s next movement.
3. Preliminaries
3.1. Local Differential Privacy
Centralized differential privacy needs a credible third-party data collector to collect user data and conduct privacy processing on it. However, in practical applications, it is difficult to find a credible third-party data collector, and its cost is relatively high. Local differential privacy (LDP) was introduced to remove this requirement. LDP assumes that the server is untrusted and that each user can process individual data independently. That is, the privacy processing is transferred from the data collector to the user side, without the intervention of a credible third party.
Let $n$ be the number of users, and let the private data of the $i$th user be represented by a tuple $t_i$, which contains $d$ dimensions: $t_i^1, t_i^2, \ldots, t_i^d$. To protect privacy, the $i$th user first uses a random perturbation function $f$ to perturb $t_i$ and then sends the noisy data $t_i^*$ to the recommendation system.
Definition 1 (LDP). Given a random function $f$, for any two input tuples $t$, $t'$ and any possible output $t^*$ of $f$, if
$$\Pr[f(t) = t^*] \le e^{\varepsilon} \cdot \Pr[f(t') = t^*], \tag{1}$$
then the random function $f$ is said to satisfy $\varepsilon$-LDP, where $\varepsilon$ is the privacy budget.
It can be seen from the above definition that LDP controls the similarity of the output results of any two records, so as to ensure that the random function satisfies LDP.
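As a concrete illustration of Definition 1, the following minimal sketch checks the LDP inequality for binary randomized response, a simple mechanism that keeps a bit with probability $e^{\varepsilon}/(1+e^{\varepsilon})$. The function names `rr_output_probs` and `satisfies_ldp` are hypothetical and are used here only for illustration.

```python
import math


def rr_output_probs(true_bit: int, epsilon: float) -> dict:
    """Output distribution of binary randomized response for one input bit."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {true_bit: p_keep, 1 - true_bit: 1.0 - p_keep}


def satisfies_ldp(epsilon: float, tol: float = 1e-9) -> bool:
    """Check Pr[f(t) = y] <= e^eps * Pr[f(t') = y] for all inputs t, t' and outputs y."""
    bound = math.exp(epsilon)
    for t in (0, 1):
        for t_prime in (0, 1):
            p = rr_output_probs(t, epsilon)
            q = rr_output_probs(t_prime, epsilon)
            for y in (0, 1):
                if p[y] > bound * q[y] + tol:
                    return False
    return True


if __name__ == "__main__":
    for eps in (0.5, 1.0, 2.0):
        print(eps, satisfies_ldp(eps))
```

For this mechanism the worst-case ratio of output probabilities equals $e^{\varepsilon}$ exactly, so the bound in the definition is tight.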
The two combination theorems of LDP are given as follows:
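Theorem 1 (Serial Combination Theorem). Suppose that there is a set of random algorithms $M_1, M_2, \ldots, M_k$, and each $M_i$ satisfies $\varepsilon_i$-LDP on the dataset $D$. Then, the sequence of privacy mechanisms $M_1, \ldots, M_k$ applied to $D$ provides $\left(\sum_{i=1}^{k} \varepsilon_i\right)$-LDP.
Theorem 2 (Parallel Combination Theorem). Suppose that dataset $D$ can be divided into a series of independent and nonoverlapping subsets $D_1, D_2, \ldots, D_k$ and there is a set of random algorithms $M_1, M_2, \ldots, M_k$. If each privacy mechanism $M_i$ satisfies $\varepsilon_i$-LDP on $D_i$, then the set of randomized algorithms achieves $\left(\max_i \varepsilon_i\right)$-LDP on $D$.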
3.2. Matrix Factorization
Matrix factorization is an effective technique for overcoming the data sparsity problem of recommendation systems. Suppose that the number of users is $n$ and the number of items is $m$, $r_{ij}$ represents the rating for the $j$th item given by the $i$th user, and the user-item rating matrix is $R \in \mathbb{R}^{n \times m}$. The purpose of matrix factorization is to predict the missing entries of $R$. To achieve this, $R$ is decomposed into a user profile matrix $U \in \mathbb{R}^{n \times K}$ and an item profile matrix $V \in \mathbb{R}^{m \times K}$, where the number of underlying features is $K$. The corresponding optimization function is
$$\min_{U,V} \sum_{(i,j):\, r_{ij}\ \mathrm{observed}} \left(r_{ij} - u_i v_j^{T}\right)^2 + \lambda_u \sum_{i=1}^{n} \|u_i\|^2 + \lambda_v \sum_{j=1}^{m} \|v_j\|^2, \tag{2}$$
where $u_i$ is the $i$th row vector of the user profile matrix $U$, $v_j$ is the $j$th row vector of the item profile matrix $V$, $v_j^{T}$ is the transposed vector of $v_j$, and $\lambda_u$ and $\lambda_v$ are constraint parameters, which are positive constants. The first term is a loss function that measures the distance between the two matrices, and the last two terms are used to prevent the model from overfitting. An unobserved rating can be calculated by $\hat{r}_{ij} = u_i v_j^{T}$. For computational efficiency, the stochastic gradient descent (SGD) strategy is often used to solve the above model.
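The factorization described above can be sketched in a few lines of NumPy. This is a minimal, non-private illustration under stated assumptions: the function names `mf_sgd` and `rmse`, the hyperparameter values, and the boolean `mask` of observed ratings are all hypothetical, and the constant factor 2 of the gradient is absorbed into the learning rate.

```python
import numpy as np


def mf_sgd(R, mask, d=2, lr=0.01, lam_u=0.05, lam_v=0.05, epochs=200, seed=0):
    """Factorize R (n x m) into U (n x d) and V (m x d) with plain SGD.

    Only entries where mask is True are treated as observed ratings.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    observed = np.argwhere(mask)
    for _ in range(epochs):
        rng.shuffle(observed)  # visit observed ratings in random order
        for i, j in observed:
            err = R[i, j] - U[i] @ V[j]
            u_old = U[i].copy()
            # Descent steps on the squared error plus L2 regularization.
            U[i] += lr * (err * V[j] - lam_u * U[i])
            V[j] += lr * (err * u_old - lam_v * V[j])
    return U, V


def rmse(R, mask, U, V):
    """Root mean squared error on the observed entries."""
    pred = U @ V.T
    return float(np.sqrt(np.mean((R[mask] - pred[mask]) ** 2)))


if __name__ == "__main__":
    R = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5],
                  [1, 0, 0, 4], [0, 1, 5, 4]], dtype=float)
    mask = R > 0  # zeros are treated as missing in this toy example
    U, V = mf_sgd(R, mask)
    print("training RMSE:", rmse(R, mask, U, V))
```

Missing ratings are then predicted by the corresponding entries of `U @ V.T`.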
4. Design of the GSSAPPRM Method
4.1. Main Framework
The GSSAPPRM method assumes that the third-party recommendation system is untrusted, and each user keeps personal rating data, check-in data, and social data on his/her local device. Besides, in order to reduce the storage cost on the user's device, the recommendation system still retains all public POI data, such as POI geographic coordinates and other static data.
The main idea of the GSSAPPRM method is to estimate the statistical analysis results of the original data without breaching user privacy. In the proposed method, each user firstly perturbs local data to satisfy LDP and then uploads the noisy data to the recommendation system. The recommendation system aggregates the noisy data and then carries out recommendation service. The framework of the proposed method is depicted in Figure 1. It contains three main parts: (1) privacy preserving social matrix factorization algorithm (PrivSMF) considering social attributes; (2) geographical correlation calculation based on kernel density estimation (GeoKDE); (3) random response mechanism according to POI popularity estimation (RRPE).
The PrivSMF algorithm runs in a C/S style to achieve initial userPOI rating prediction, which is based on differentially private gradient descent strategy proposed by Shin et al. [9]. In one iteration, each user updates local rating prediction model according to user profile matrix, social profile matrix, and item profile matrix and then sends current noisy gradient value of item profile matrix to the recommendation system. After aggregating all the users’ noisy gradient, the recommendation system updates the global rating prediction model and sends noisy gradient matrix back to each user for further local model update. Different from the work of Shin et al. [9], the PrivSMF algorithm takes user’s social attribute into consideration and adopts the TrustMF model for local rating prediction on user side. Based on the result of the PrivSMF algorithm, the GeoKDE algorithm estimates kernel density, to describe the impact of POI geographical distance on user's checkin behavior. In the RRPE algorithm, each user perturbs his/her checkin data by using the random response mechanism of LDP and uploads the noisy data to the recommendation system. The recommendation system tallies visit count of POIs by aggregating the noisy data from all the users and then performs POI popularity estimation to find popular POIs that may be interested by users. Considering both kernel density and POI popularity, the product rule is used to synthesize the results of the GeoKDE algorithm and the RRPE algorithm, so as to get final userPOI rating prediction result. In the end, the list of TopN POIs is recommended to the user according to the sorted prediction ratings. The PrivSMF, GeoKDE, and RRPE algorithms will be explained in detail from Sections 4.2 to 4.4, respectively. And privacy analysis is presented in Section 4.5.
4.2. PrivSMF Algorithm
The goal of the PrivSMF algorithm is to update item profile matrix on the recommendation system while preserving users’ underlying preference, by combining the differentially private gradient descent strategy [9] and the randomization strategy proposed by Nguyen et al. [28]. In one round of iteration, each user firstly updates user profile matrix and social profile matrix locally, then perturbs item profile matrix, and only uploads the gradient value of noisy item profile matrix to the recommendation system. The recommendation system aggregates the gradient values uploaded by all users and then updates item profile matrix. In the end of one round, the averaged noisy gradient matrix is sent back to each user. This iteration is repeated until maximum iteration number is reached, which means the end of update process.
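Considering the social attribute, the objective function of the PrivSMF algorithm is given as follows:
$$\min_{U,V,Z} \sum_{i=1}^{n}\sum_{j=1}^{m} \left(r_{ij} - u_i v_j^{T}\right)^2 + \sum_{i=1}^{n}\sum_{k=1}^{n} \left(s_{ik} - u_i z_k^{T}\right)^2 + \lambda\left(\sum_{i=1}^{n}\|u_i\|^2 + \sum_{j=1}^{m}\|v_j\|^2 + \sum_{k=1}^{n}\|z_k\|^2\right), \tag{3}$$
where $r_{ij}$ represents the rating data for the $j$th POI offered by the $i$th user (that is, the number of check-ins on the $j$th POI made by the $i$th user). $s_{ik}$ is the social attribute between the $i$th user and the $k$th user; if a social relationship exists between them, then $s_{ik} = 1$; otherwise, $s_{ik} = 0$. $u_i$, $v_j$, and $z_k$ represent the user underlying feature vector, item underlying feature vector, and social underlying feature vector, respectively. $K$ is the dimension of the underlying feature vectors, and $v_j^{T}$ and $z_k^{T}$ are the transposed vectors of $v_j$ and $z_k$, respectively. $\lambda$ is a regularization parameter, which is a positive constant. $n$ and $m$ respectively represent the number of users and the number of POIs. $\|\cdot\|^2$ represents the square of the second-order norm of a vector. The first two terms of (3) represent the error between the true value and the predicted value, and the last term is used to control model complexity, so as to avoid overfitting.
The PrivSMF algorithm uses the SGD strategy to minimize equation (3). $u_i$, $v_j$, and $z_k$ are learned and updated according to equations (4)–(6), respectively. $u_i^{t}$ represents the underlying feature vector of the $i$th user when the current iteration number is $t$, and $v_j^{t}$ and $z_k^{t}$ are defined similarly:
$$u_i^{t} = u_i^{t-1} - \gamma_t \nabla_{u_i}, \tag{4}$$
$$v_j^{t} = v_j^{t-1} - \gamma_t \nabla_{v_j}, \tag{5}$$
$$z_k^{t} = z_k^{t-1} - \gamma_t \nabla_{z_k}, \tag{6}$$
where $\gamma_t$ represents the learning rate when the current iteration number is $t$, and $\nabla_{u_i}$, $\nabla_{v_j}$, and $\nabla_{z_k}$ represent the gradients with respect to $u_i$, $v_j$, and $z_k$, respectively. Taking the derivative of equation (3) gives
$$\begin{aligned}
\nabla_{u_i} &= -2\sum_{j=1}^{m}\left(r_{ij} - u_i v_j^{T}\right)v_j - 2\sum_{k=1}^{n}\left(s_{ik} - u_i z_k^{T}\right)z_k + 2\lambda u_i,\\
\nabla_{v_j} &= -2\sum_{i=1}^{n}\left(r_{ij} - u_i v_j^{T}\right)u_i + 2\lambda v_j,\\
\nabla_{z_k} &= -2\sum_{i=1}^{n}\left(s_{ik} - u_i z_k^{T}\right)u_i + 2\lambda z_k,
\end{aligned} \tag{7}$$
where $z_k^{T}$ is the transposed vector of $z_k$.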
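The gradient updates can be illustrated with a small non-private sketch. It assumes an objective of the form: squared rating error $\sum (r_{ij} - u_i v_j^{T})^2$, plus squared social error $\sum (s_{ik} - u_i z_k^{T})^2$, plus $\lambda$ times the squared norms of all feature vectors. This form is an assumption inferred from the description above, sums are taken over all entries for simplicity, and the names `objective`, `gradients`, and `sgd_step` are hypothetical.

```python
import numpy as np


def objective(U, V, Z, R, S, lam):
    """Rating error + social error + L2 regularization (assumed form)."""
    rating_err = np.sum((R - U @ V.T) ** 2)
    social_err = np.sum((S - U @ Z.T) ** 2)
    reg = lam * (np.sum(U ** 2) + np.sum(V ** 2) + np.sum(Z ** 2))
    return rating_err + social_err + reg


def gradients(U, V, Z, R, S, lam):
    """Analytic gradients of the objective with respect to U, V, and Z."""
    E_r = R - U @ V.T  # n x m rating residuals
    E_s = S - U @ Z.T  # n x n social residuals
    grad_U = -2 * E_r @ V - 2 * E_s @ Z + 2 * lam * U
    grad_V = -2 * E_r.T @ U + 2 * lam * V
    grad_Z = -2 * E_s.T @ U + 2 * lam * Z
    return grad_U, grad_V, grad_Z


def sgd_step(U, V, Z, R, S, lam, lr):
    """One full-gradient descent step on all three profile matrices."""
    gU, gV, gZ = gradients(U, V, Z, R, S, lam)
    return U - lr * gU, V - lr * gV, Z - lr * gZ
```

In the private setting described in this section, only the user and social profiles would be updated locally in this way, while the item-profile gradient would be perturbed before leaving the device.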
In order to implement dimensionality reduction for item profile matrix, the PrivSMF algorithm adopts the differentially private gradient descent strategy, which is based on random projection [9]. Assume a positive integer and generate a random matrix , in which each element is a random variable that satisfies Gaussian distribution with mean of 0 and standard deviation of . Suppose that the lowdimensional gradient matrix , is item profile matrix, each row is item underlying feature vector, and is the gradient of item profile matrix. The ith user restores highdimensional gradient matrix by updating lowdimensional gradient matrix .
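The random projection step can be sketched as follows. The shapes are assumptions made for illustration: a $q \times m$ Gaussian matrix with entries of standard deviation $1/\sqrt{q}$, an $m \times d$ item-profile gradient, and restoration by multiplying with the transpose. The function names are hypothetical.

```python
import numpy as np


def make_projection(q, m, rng):
    """Random q x m matrix with i.i.d. N(0, 1/q) entries (std 1/sqrt(q))."""
    return rng.normal(0.0, 1.0 / np.sqrt(q), size=(q, m))


def project_gradient(phi, grad_V):
    """Compress an m x d gradient into a q x d low-dimensional gradient."""
    return phi @ grad_V


def restore_gradient(phi, grad_B):
    """Map a q x d low-dimensional gradient back to m x d."""
    return phi.T @ grad_B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi = make_projection(q=4, m=6, rng=rng)
    grad_V = rng.standard_normal((6, 3))
    grad_B = project_gradient(phi, grad_V)
    print(grad_B.shape, restore_gradient(phi, grad_B).shape)
```

With this scaling, the expectation of `phi.T @ phi` is the identity matrix, so the restored gradient equals the original gradient in expectation, at the cost of additional variance for small `q`.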
To protect the rated items and rating data, the PrivSMF algorithm adopts the randomization strategy [28]. The ith user randomly selects a tuple from gradient matrix and perturbs it and then uploads the noisy tuple to the recommendation system. The recommendation system aggregates noisy tuples submitted by all users to generate lowdimensional noisy gradient matrix . Since , is the pseudoreciprocal of . After perturbing highdimensional gradient , (5) could be changed to
That is,
The recommendation system updates item profile matrix according to equation (9). The pseudocode of the PrivSMF algorithm is described as follows.

In the PrivSMF Algorithm, each user only randomly selects one element from gradient matrix and uploads it to the server (recommendation system); that is, it only uploads one bit to the server. Compared with the work of Hua et al. [29], the PrivSMF Algorithm only requires that the server send a noisy gradient matrix to each user, instead of sending the item profile matrix , where . Through this way, the communication cost between users and the server is greatly reduced.
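The idea of uploading a single randomized element can be sketched with a one-element mechanism in the style of Nguyen et al. [28]. This is a stand-in chosen for illustration and is not necessarily the exact perturbation used by PrivSMF: the user samples one coordinate uniformly, clips it to $[-1, 1]$, and reports only a sign scaled by a fixed magnitude, so that the server-side average is unbiased. The names `perturb_one_element` and `aggregate` are hypothetical.

```python
import math

import numpy as np


def perturb_one_element(grad, epsilon, rng):
    """Report one randomly chosen coordinate of a flattened gradient.

    grad is a 1-D array whose entries are clipped to [-1, 1].
    Returns (index, noisy_value); noisy_value carries one bit of information.
    """
    k = grad.size
    j = int(rng.integers(k))  # index chosen independently of the data
    x = float(np.clip(grad[j], -1.0, 1.0))
    e = math.exp(epsilon)
    p_pos = (x * (e - 1.0) + e + 1.0) / (2.0 * e + 2.0)
    magnitude = k * (e + 1.0) / (e - 1.0)
    value = magnitude if rng.random() < p_pos else -magnitude
    return j, value


def aggregate(reports, k):
    """Server side: average the sparse reports into a length-k estimate."""
    total = np.zeros(k)
    for j, value in reports:
        total[j] += value
    return total / len(reports)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = np.array([0.5, -0.2, 0.0, 0.9])
    reports = [perturb_one_element(grad, 1.0, rng) for _ in range(50000)]
    print(aggregate(reports, grad.size))
```

The probability of a positive report ranges from $1/(e^{\varepsilon}+1)$ to $e^{\varepsilon}/(e^{\varepsilon}+1)$, so the ratio between any two inputs is at most $e^{\varepsilon}$, and the scaling by `k` compensates for each coordinate being sampled with probability $1/k$.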
After obtaining item profile matrix , each user can calculate the prediction rating matrix locally by using the TrustMF model [30]. The prediction rating equation of the ith user for the jth POI iswhere is the logistic function. and are, respectively, user underlying feature vector and social underlying feature vector of ith user, which are generated locally. is the transposed vector of item underlying feature vector , which is generated iteratively by user side and recommendation system side, and is the maximum value of all the rating data. The ith user sends the calculated to the recommendation system, and the recommendation system aggregates the prediction rating matrix uploaded by all users. Finally, the noisy prediction rating matrix is obtained.
4.3. GeoKDE Algorithm
In location-based services, users often physically interact with POIs and consume related services, such as eating in a restaurant or watching a movie in a cinema. Therefore, the user's check-in behavior is likely to be affected by the geographical distance of POIs. Besides, the distribution of check-in data varies from user to user. For example, people who prefer indoor activities may visit places around their living area, while people who prefer outdoor activities may travel and explore new and interesting places. Zhang et al. [31] learn the distribution from the user's historical check-in data based on a nonparametric estimation method, which estimates kernel density with a fixed bandwidth. However, a fixed bandwidth does not accurately reflect real check-in behavior: the check-in density is high in densely populated areas and low in sparsely populated areas.
To better capture the impact of POI geographical distance on the user's check-in behavior, the GeoKDE algorithm estimates kernel density based on the self-adaptive kernel density estimation algorithm [32] and the matrix obtained in Section 4.2.
Assuming that the checkin dataset of the ith user for POIs is , and represent the number of users and the number of POIs, respectively. A POI has longitude and latitude attributes, which is represented by . Then, the value of selfadaptive kernel estimation of the ith user for can be calculated as follows:where is the sum of rating value of all users for . is the rating value of the ith user for . is the kernel function.where is the local bandwidth [32], and are, respectively, calculated according to standard deviation of the longitude and latitude values of checkin dataset .
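A generic self-adaptive kernel density estimate over (longitude, latitude) pairs can be sketched as follows. This is a minimal illustration under stated assumptions rather than the exact GeoKDE formula: it uses a Gaussian product kernel, rule-of-thumb global bandwidths `1.06 * n**(-1/5) * std` per coordinate, and local bandwidth factors derived from a fixed-bandwidth pilot estimate with sensitivity `alpha = 0.5`. The `weights` argument plays the role of the normalized rating values, and all names are hypothetical.

```python
import numpy as np


def fixed_bandwidths(points):
    """Global bandwidths from the standard deviation of each coordinate."""
    n = len(points)
    h = 1.06 * n ** (-1 / 5) * points.std(axis=0)
    return np.maximum(h, 1e-6)  # guard against zero spread


def gaussian_kernel(query, points, H, local):
    """Product Gaussian kernel at query; local scales the bandwidth per sample."""
    z = (query - points) / (H * local[:, None])
    norm = 2 * np.pi * H[0] * H[1] * local ** 2
    return np.exp(-0.5 * np.sum(z ** 2, axis=1)) / norm


def adaptive_kde(query, points, weights, alpha=0.5):
    """Weighted adaptive-bandwidth density estimate at a query location."""
    weights = weights / weights.sum()
    H = fixed_bandwidths(points)
    ones = np.ones(len(points))
    # Pilot estimate with the fixed bandwidth at every check-in location.
    pilot = np.array([np.sum(weights * gaussian_kernel(p, points, H, ones))
                      for p in points])
    g = np.exp(np.sum(weights * np.log(pilot)))  # weighted geometric mean
    local = (pilot / g) ** (-alpha)  # narrower kernels in dense regions
    return float(np.sum(weights * gaussian_kernel(query, points, H, local)))
```

Samples in dense regions receive local factors below 1 and therefore sharper kernels, while isolated check-ins are smoothed over a wider area, which is the behavior a fixed bandwidth cannot provide.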
4.4. RRPE Algorithm
In the GSSAPPRM method, it is assumed that POIs with higher visit counts are more popular and more likely to interest users. In order to calculate the popularity of POIs while preserving user privacy (a user does not want the recommendation system to know whether he/she has visited a certain POI), the RRPE algorithm is designed in this section.
The RRPE algorithm uses the random response mechanism (WRR) proposed by Warner et al. [33], to protect user's checkin data. Each user responds to the request sent by the recommender system with a certain probability. The recommender system tallies the visit count of POIs by aggregating the responses of all users and then calculates the POI Popularity. The main process of the RRPE algorithm is described as follows:
At the user side, the $i$th user answers the recommendation system's query about whether he/she has checked in at a POI $l_j$. If the $i$th user has visited $l_j$, the answer is “yes” with response probability $p$ and “no” with probability $1 - p$; if the $i$th user has not visited $l_j$, the answer is “no” with response probability $p$ and “yes” with probability $1 - p$. In this response stage, the response probability is $p = e^{\varepsilon}/(1 + e^{\varepsilon})$, where $\varepsilon$ is the privacy budget. The recommendation system aggregates the responses of all users and then calculates the POI popularity of $l_j$ from the answers $a_{ij}$, where $a_{ij}$ represents the answer of the $i$th user for $l_j$: if the answer is “yes”, then $a_{ij} = 1$; otherwise, $a_{ij} = 0$.
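The response and aggregation stages can be sketched as follows. The sketch assumes the truthful-answer probability $e^{\varepsilon}/(1+e^{\varepsilon})$ per POI and uses the standard debiasing of randomized-response counts; the debiasing step is an assumption for illustration and is not necessarily the popularity formula of the RRPE algorithm. The function names are hypothetical.

```python
import math

import numpy as np


def keep_probability(epsilon):
    """Probability of answering truthfully."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def perturb_checkins(visited, epsilon, rng):
    """User side: flip each yes/no answer with probability 1 - p.

    visited is a boolean array (one entry per POI, or users x POIs).
    """
    p = keep_probability(epsilon)
    flip = rng.random(visited.shape) >= p
    return visited ^ flip


def estimate_visit_counts(noisy_answers, epsilon):
    """Server side: unbiased estimate of the true number of 'yes' per POI.

    Uses E[yes] = c * p + (n - c) * (1 - p), solved for the true count c.
    """
    p = keep_probability(epsilon)
    n_users = noisy_answers.shape[0]
    yes = noisy_answers.sum(axis=0)
    return (yes - n_users * (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.random((20000, 3)) < np.array([0.1, 0.5, 0.8])
    noisy = perturb_checkins(truth, 1.0, rng)
    print(truth.sum(axis=0), estimate_visit_counts(noisy, 1.0).round())
```

A popularity score could then be obtained by normalizing the estimated counts, for example by the number of users; smaller privacy budgets make the estimates noisier.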
Finally, a rating rule of the ith user for is designed, which uses the product rule to integrate the POI popularity and the kernel estimation obtained in above subsection. This rating rule is shown as follows:where is the predicted rating value of the ith user for . The larger the value of , the more likely the ith user will visit . For the ith user, POIs are sorted according to their predicted rating value in descending order, and then TopN POIs are selected as the recommendation list for the ith user.
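The product rule and the Top-N selection can be sketched as follows for a single user. The array names are hypothetical, and masking out already visited POIs is an assumption based on the goal of recommending unvisited locations.

```python
import numpy as np


def top_n(kde_scores, popularity, visited, n):
    """Rank POIs by kernel density times popularity and return the best n.

    kde_scores, popularity: 1-D float arrays over POIs for one user.
    visited: boolean array marking POIs the user has already checked in at.
    """
    scores = kde_scores * popularity
    scores = np.where(visited, -np.inf, scores)  # never recommend visited POIs
    order = np.argsort(-scores, kind="stable")   # descending predicted rating
    return [int(j) for j in order[:n] if np.isfinite(scores[j])]


if __name__ == "__main__":
    kde = np.array([0.2, 0.5, 0.1, 0.4])
    pop = np.array([0.5, 0.1, 0.9, 0.5])
    seen = np.array([False, False, False, True])
    print(top_n(kde, pop, seen, 2))
```

Because the two factors are multiplied, a POI needs both geographic plausibility for the user and sufficient overall popularity to rank highly.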
4.5. Privacy Analysis
In the GSSAPPRM method, random perturbation is applied in both the PrivSMF algorithm (noise added to the gradients) and the RRPE algorithm (random response on check-in data). To prove that the GSSAPPRM method satisfies differential privacy, we need to prove that both the PrivSMF algorithm and the RRPE algorithm satisfy differential privacy.
4.5.1. The PrivSMF Algorithm Satisfies Differential Privacy
Assuming that and are two adjacent datasets, is item profile matrix and is the output of the algorithm, is maximum iteration number, is the original tuple, and is the noisy tuple submitted by the ith user when the current iteration number is .
Proof:
Therefore, the PrivSMF algorithm satisfies differential privacy.
4.5.2. The RRPE Algorithm Satisfies Differential Privacy
Supposing that and are two adjacent datasets, and are any two records, and is certain output result of the algorithm.
Proof:
Referring to the proof process of random response technique [33],
Therefore, the RRPE algorithm satisfies differential privacy.
4.5.3. The GSSAPPRM Method Satisfies Differential Privacy
Assuming , the datasets processed by the PrivSMF algorithm and the RRPE algorithm belong to disjoint datasets. According to the parallel combination theorem of differential privacy, it can be proved that the GSSAPPRM method satisfies differential privacy.
5. Experimental Study
5.1. Experimental Environment
Experiments are conducted on an Intel(R) Core i5-6500/3.2 GHz/64 GB hardware platform with a 64-bit Windows 10 operating system, and the programming language is Python (version 3.6.9).
Two real datasets Gowalla [34] and Yelp [35] are used in the experiments, and the statistics information of these two datasets is shown in Table 1. Gowalla dataset provides the user's checkin frequency on POIs, and Yelp dataset provides the user's rating on POIs.
5.2. Study of Experimental Results
To effectively evaluate the impact of differential privacy on the recommendation result, two common indicators (Precision and Recall) in the recommendation system are adopted to evaluate the recommendation quality.
Precision: the ratio of the number of actually visited POIs in the recommended POI set to the number of recommended POIs, which is given as follows:
Recall: the ratio of the number of recommended POIs in the actually visited POI set to the number of actually visited POIs, which is given as follows:where represents the set of users, represents the set of POIs recommended to the ith user, and represents the set of POIs actually visited by the ith user.
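The two definitions above can be sketched in code as follows. This is an illustration under stated assumptions: the function name and the dictionary-based inputs are hypothetical, and the per-user ratios are averaged over the set of users, which is one common reading of the definitions (a pooled ratio over all users is another).

```python
def precision_recall(recommended, visited):
    """Average per-user precision and recall.

    recommended: dict mapping user id -> set of POIs recommended to that user
    visited:     dict mapping user id -> set of POIs the user actually visited
    """
    users = list(recommended)
    if not users:
        return 0.0, 0.0
    precision_sum = 0.0
    recall_sum = 0.0
    for user in users:
        rec = set(recommended[user])
        vis = set(visited.get(user, ()))
        hits = len(rec & vis)  # recommended POIs that were actually visited
        precision_sum += hits / len(rec) if rec else 0.0
        recall_sum += hits / len(vis) if vis else 0.0
    return precision_sum / len(users), recall_sum / len(users)
```

For example, a user who receives four recommendations, two of which are among three visited POIs, contributes a precision of 2/4 and a recall of 2/3 to the averages.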
The performance of the GSSAPPRM method is analyzed and compared with that of the following methods:
Nonprivate: a baseline without privacy preservation, that is, the GSSAPPRM method with its privacy preserving part removed. This baseline mainly uses the SGD strategy to factorize the user's historical rating matrix. On this basis, geographical distance and POI popularity are also considered to predict users’ preference for POIs.
DPSVD: a traditional matrix factorization recommendation method, which adds Laplace noise during SGD to protect user privacy.
Private GDDR: a recommendation method proposed by Shin et al. [9], based on LDP and matrix factorization.
KDTree: a data-dependent space decomposition method proposed by Zhang et al. [36], which mainly uses a differentially private trajectory dataset for location recommendation.
The GSSAPPRM method has some predefined parameters. Before carrying out experiments, the number of underlying features is set to , the learning rate is set to , and ; is set to 50, during iteration, the learning rate is set to when the current iteration number is , and the regularization factors are set to .
Two groups of comparative experiments are executed. The first group compares the performance of the above methods under different values of the privacy budget ε. The second group compares the performance of the above methods under different values of N (the number of POIs recommended to users). These experiments adopt a five-fold cross-validation strategy to test the performance of the above methods and take the average precision and recall over the folds as the final result.
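The evaluation protocol described above can be sketched as follows. This is a generic five-fold cross-validation loop, not the authors' experiment code; `k_fold_splits`, `cross_validate`, and the `evaluate` callback (which is assumed to return a precision and recall pair for one train/test split) are hypothetical names.

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists for each of k folds after a seeded shuffle."""
    items = list(items)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def cross_validate(items, evaluate, k=5, seed=0):
    """Average the (precision, recall) returned by evaluate over k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(items, k, seed)]
    precision = sum(s[0] for s in scores) / k
    recall = sum(s[1] for s in scores) / k
    return precision, recall
```

Each record appears in exactly one test fold, so every check-in or rating is used for testing once and for training four times.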
5.2.1. Experiments under Different Privacy Budget Values
The privacy budget ε is an important parameter that determines the level of privacy preservation of differential privacy; that is, the smaller the value, the higher the degree of privacy preservation. In this group of experiments, N is fixed to 10, and the value of ε is set from 0.1 to 1.0.
For the Gowalla and Yelp datasets, the precision and recall of each method under different privacy budget values are shown in Figures 2 and 3, respectively. As the privacy budget ε increases, the precision and recall of the four privacy preserving methods also increase (since the KDTree method has not been tested on the Yelp dataset, it is omitted from panel (b) of the following figures). This result is expected because the error generated by the Laplace mechanism decreases as the privacy budget increases.
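The relationship between the privacy budget (written ε here) and the Laplace error can be illustrated with a small sketch. It assumes the standard Laplace mechanism with noise scale equal to sensitivity divided by ε; the function names and the sensitivity value of 1 in the usage note are assumptions for illustration, not parameters from the paper.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Perturb value with Laplace noise of scale sensitivity / epsilon."""
    return value + laplace_noise(sensitivity / epsilon, rng)


def expected_abs_error(sensitivity: float, epsilon: float) -> float:
    """Mean absolute error of the mechanism, which equals the noise scale."""
    return sensitivity / epsilon
```

With a sensitivity of 1, the expected absolute error falls from 10 at ε = 0.1 to 1 at ε = 1.0, which matches the upward trend of precision and recall as the budget grows.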
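Figure 2: Precision of each method under different privacy budget values: (a) Gowalla; (b) Yelp.
Figure 3: Recall of each method under different privacy budget values: (a) Gowalla; (b) Yelp.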
It can be seen from Figure 2(a), for Gowalla dataset, the precision value of the KDTree method is higher than that of the GSSAPPRM method when . This clearly shows that when privacy budget is small, it has a greater impact on the GSSAPPRM method. But when , the precision value of the GSSAPPRM method is much higher than that of the KDTree method, and when , the GSSAPPRM method tends to balance and maintain a better precision. It also can be seen from Figure 2(b) that, for Yelp dataset, except for and , precision value of the GSSAPPRM method is higher than that of the other two privacy preserving methods.
In addition, for both Gowalla and Yelp datasets, it can be intuitively seen from Figure 2 that the precision value of the GSSAPPRM method is higher than that of the Private GDDR method. The main reason is that POI popularity and geographic attributes considered by the GSSAPPRM method are important for the accuracy of POI recommendation and can improve recommendation quality to a certain extent.
It can be clearly seen from Figure 3(a) that, for Gowalla dataset, recall value of the KDTree method is higher than that of the GSSAPPRM method when . This also shows that when privacy budget is small, it has a greater impact on the GSSAPPRM method. But when , the recall value of the GSSAPPRM method is higher than that of KDTree method. And when , the GSSAPPRM method tends to balance and can maintain a better recall. It is extremely obvious in Figure 3(b) that, for Yelp dataset, recall value of the GSSAPPRM method is noticeably higher than that of the other two privacy preserving methods when . When , recalls of both the GSSAPPRM method and the DPSVD method maintain a similar level. When , recalls of both the GSSAPPRM method and the Private GDDR method maintain a similar level.
Besides, it can also be clearly seen from Figures 2 and 3 that the nonprivate method achieves better recommendation quality than the GSSAPPRM method. This result is also expected, because differential privacy protection causes a certain loss of recommendation quality. Overall, compared with the other privacy preserving methods, the GSSAPPRM method proposed in the paper maintains relatively better recommendation quality as the privacy budget changes.
5.2.2. Experiments under Different N Values
In this group of experiments, for all the privacy preserving methods, ε is set to 0.5, and the value of N (the number of POIs recommended to users) is set from 5 to 65.
For the Gowalla and Yelp datasets, the precision and recall of four methods under different N values are shown in Figures 4 and 5, respectively (since no such test was performed for the KDTree method [36], it is omitted).
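Figure 4: Precision of each method under different N values: (a) Gowalla; (b) Yelp.
Figure 5: Recall of each method under different N values: (a) Gowalla; (b) Yelp.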
It can be clearly seen from the two figures that as the value of N increases, the precision of each method shows a downward trend, while the recall of each method shows an upward trend.
It can be seen from Figure 4(a) that, for Gowalla dataset, precision value of the GSSAPPRM method keeps close to that of the nonprivate method when . The precision value of the GSSAPPRM method shows a sharp downward trend when . However, it is still much better than the other two privacy preserving methods. This demonstrates that the GSSAPPRM method can maintain a better recommendation accuracy even when the value is larger. It can be seen from Figure 4(b) that, for Yelp dataset, precision value of the GSSAPPRM method keeps close to that of the Private GDDR method when except the point of . The precision value of the GSSAPPRM method is much higher than that of the private GDDR method when . Besides, the overall precision of the GSSAPPRM method is relatively higher than that of the DPSVD method.
It can be clearly seen from Figure 5(a) that, for Gowalla dataset, recall value of the GSSAPPRM method keeps close to that of the other two privacy preserving methods when . When , the recall value of the GSSAPPRM method is much higher than that of the other two privacy preserving methods, especially in the situation of . This means that when the number of POIs recommended to the users increases, the GSSAPPRM method can better cover the set of POIs that users are interested in. It can be seen from Figure 5(b) that, for Yelp dataset, the overall recall of the GSSAPPRM method keeps closer trend with that of the nonprivate method. Overall, the GSSAPPRM method proposed in the paper is better than the other two privacy preserving methods.
5.2.3. Ablation Study
Here, we provide an ablation study that demonstrates the effectiveness of the respective parts of our GSSAPPRM method. We compare the GSSAPPRM method with three weakened versions: (1) removing the social attribute and the social underlying feature vector from the PrivSMF algorithm, so that equation (3) becomes an ordinary optimization function for matrix factorization; (2) removing the self-adaptive kernel estimation of the ith user from the GeoKDE algorithm and assigning the same fixed value instead; (3) removing the POI popularity calculation from the RRPE algorithm and assigning the same fixed value instead.
Two groups of comparative experiments are executed. The first group of experiments compares precision value of the above four versions of the GSSAPPRM method, while the second one compares the recall values. In these two experiments, privacy budget is set to and is fixed to 10. The experimental results of the ablation study are reported in Tables 2 and 3, respectively.
As shown in Tables 2 and 3, the original GSSAPPRM method always achieves the best performance, and the first weakened version performs worst, which suggests that social attributes are significant. It can also be seen that the first weakened version performs worse on Yelp than on Gowalla, since the Yelp dataset contains more social relation labels than the Gowalla dataset.
6. Conclusion
Aiming at privacy preserving POI recommendation scenarios, this paper proposes a privacy preserving POI recommendation method named GSSAPPRM. In the proposed method, the PrivSMF algorithm, an LDP-based matrix factorization algorithm that takes social attributes into consideration, is first presented to securely learn users' implicit preferences. Then, based on the result of the PrivSMF algorithm, the GeoKDE algorithm is designed to study the impact of POI geographical distance on users' check-in behavior by adopting a self-adaptive kernel density estimation strategy. The RRPE algorithm is devised to calculate POI popularity from the aggregated responses of all users through the random response mechanism. Finally, by integrating the results of the GeoKDE and RRPE algorithms, a rating rule is given to predict users' ratings for POIs, and Top-N recommendation is achieved.
To learn users' implicit preferences more effectively, our future work will combine suitable deep learning techniques with traditional recommendation methods. Improving the computational efficiency of the proposed method is another research direction.
Data Availability
Previously reported (Gowalla and Yelp) data were used to support this study and are available at (http://snap.stanford.edu/). These prior studies and datasets are cited at relevant places within the text as references [34, 35].
Conflicts of Interest
The authors declare that they have no conflicts of interest to report regarding the present study.
Acknowledgments
This work was supported by Key Research and Development Program of Shaanxi Province (No. 2021GY090) and Science and Technology Program of Xi’an City (No. 2019216914GXRC005CG006GXYD5.2).