Abstract

Focusing on privacy issues in recommender systems, we propose a framework containing two perturbation methods for differentially private collaborative filtering to counter the threat of inference attacks against users. To conceal individual ratings while still providing valuable predictions, we consider several representative algorithms for calculating the predicted scores and give specific solutions for adding Laplace noise. The DPI (Differentially Private Input) method perturbs the original ratings and can be followed by any recommendation algorithm. By contrast, the DPM (Differentially Private Manner) method works on the original ratings, perturbing the intermediate measurements during the execution of the algorithms and releasing only the predicted scores. The experimental results show that both methods provide valuable prediction results while guaranteeing DP, which suggests that the framework is a feasible and competent solution for making private recommendations.

1. Introduction

In the Internet age, users are constantly troubled by information overload, since they cannot easily extract the genuinely useful parts of large amounts of information. As a promising solution, recommender systems with personalized technologies have been widely used to enhance the user experience in various online services. A typical example is Netflix, which works on recommending movies best suited to users' tastes. Collaborative filtering (CF for short) is one of the most dominant techniques used in recommender systems. The basic idea is to predict a user's preference based on the preferences of other, similar users. The methods are generally divided into two classes, memory-based methods and model-based methods [1]. However, the rating data collected for recommendation are a potential source of privacy leakage, since they can be used to infer users' sensitive information [2]. Calandrino et al. [3] developed algorithms that demonstrate several inference attacks based on continual observation of recommendations combined with auxiliary information. In this work, we focus on the privacy issues in recommender systems and seek feasible solutions based on differential privacy (DP for short) [4], which is widely recognized as a promising and rigorous privacy framework.

Zhu et al. [5] proposed a private neighbor collaborative filtering algorithm consisting of two major steps. First, a redesigned exponential mechanism is used to privately select neighbors of higher quality in order to enhance the performance of the recommendations; the recommendation-aware sensitivity involved is a new sensitivity based on the notion of local sensitivity. Then, the original ratings of the selected neighbors are perturbed by adding Laplace noise. Zhu et al. [6] designed two differentially private algorithms with sampling, named DP-IR and DP-UR, for item-based and user-based recommendation, respectively. Both algorithms use the exponential mechanism with a carefully designed quality function. Jorgensen et al. [7] proposed a privacy preserving framework for personalized social recommendations. There are two distinct graphs in the model setting, an unweighted preference graph and an insensitive social graph. The users are clustered according to the natural community structure of the social network, which significantly reduces the amount of noise required to guarantee DP. However, relationships in social networks are themselves sometimes considered sensitive information.

Friedman et al. [8] proposed a generic framework and evaluated several approaches to differentially private matrix factorization for recommender systems, namely input perturbation, stochastic gradient perturbation, and ALS with output perturbation. Their comparison and analysis showed that input perturbation yields the best recommendation results. McSherry et al. [9] adapted several leading algorithms from the Netflix Prize competition to the framework of DP. Concretely, Laplace noise is incorporated into various global effects and into the covariance matrix of user rating vectors based on item-item similarities. Given these noisy measurements, several algorithms (the k-nearest neighbor method [10] and the standard SVD-based prediction mechanism) are employed to make private recommendations directly. Liu et al. [11] proposed a hybrid approach for a privacy preserving recommender system that hides users' private data and prevents privacy inference. The users' original data are disguised through randomized perturbation (RP for short). Similar to [9], the covariance matrix and some averages are again masked with a particular amount of noise to achieve DP. Then, existing algorithms can run directly on the published noisy measurements.

Unlike the works in [5, 6], we calculate the predicted scores using all users' ratings rather than a recommended list, which makes full use of the data when estimating the noise error. In [6], detailed theoretical results are presented on both the privacy and the accuracy of the proposed method; however, experimental results are lacking. By comparison, this work provides more experimental results to demonstrate the relationship between privacy and accuracy. In this work, the similarity is calculated from the row vectors of the rating matrices; that is, the calculation is based on user-user similarities rather than item-item similarities as in [9]. This mainly reflects recommendation based on users with similar preferences. Furthermore, the experimental results of this paper show that the DPI method performs better than the DPM method, which is consistent with the conclusion in [8].

In this paper, we propose a differential privacy framework for collaborative filtering, which includes three existing algorithms to calculate the predicted scores and adopts two methods of adding Laplace noise to conceal individual ratings and provide valuable prediction results. The rest of the paper is organized as follows. Section 2 introduces the background knowledge. A detailed description of the framework is presented in Section 3. Section 4 reports the experimental evaluations. Finally, Section 5 concludes the study and provides further research directions.

2. Background

2.1. Differential Privacy

DP offers a mathematical definition of privacy and a provable privacy guarantee for each record in the dataset. Intuitively, the output of the computation should not reveal too much information about any record in the dataset; the probability of any output is insensitive to small input changes, whether or not a particular record is in the dataset. DP has been presented in a series of papers [12–16] and is mainly used in data publishing [17–19] and data mining [20–22].

Definition 1 (ε-DP [4]). A randomized computation K satisfies ε-DP if, for any neighboring datasets A and B differing in at most one record, and for all subsets S of possible outputs,
$\Pr[K(A) \in S] \le e^{\varepsilon} \cdot \Pr[K(B) \in S],$
where ε is the privacy budget that controls the trade-off between privacy and accuracy. The value of ε is generally set to a small positive value; the smaller it is, the higher the privacy and the lower the accuracy, and vice versa.
Specifically, in this context the neighboring datasets contain the same ratings except for one: the rating that a user u assigns to an item i in A differs from the corresponding rating in B.
The noise mechanism, which perturbs numerical outputs, is one of the most common ways to achieve DP. The amount of noise required depends on the global sensitivity of the function. Differentially private algorithms also satisfy important composition properties. Formally, the relevant definitions and propositions are described as follows.

Definition 2 (global sensitivity [4]). For a function $f: D \rightarrow \mathbb{R}^d$, the global sensitivity of $f$ is
$GS(f) = \max_{A,B} \lVert f(A) - f(B) \rVert_1,$
where $\mathbb{R}^d$ is the space of real vectors of d dimensions and A and B are neighboring datasets. The global sensitivity denotes the maximum extent to which a single record can affect the output results.

Definition 3 (Laplace mechanism [23]). For a function $f: D \rightarrow \mathbb{R}^d$, the randomized algorithm M satisfies ε-DP if
$M(D) = f(D) + (Y_1, \ldots, Y_d),$
where the $Y_i$ are i.i.d. random variables sampled from the Laplace distribution with mean 0 and scale parameter $GS(f)/\varepsilon$.

Proof (see [24]). Suppose $M(D) = f(D) + (Y_1, \ldots, Y_d)$ with $Y_i \sim \mathrm{Lap}(GS(f)/\varepsilon)$. For any output $z \in \mathbb{R}^d$,
$\dfrac{\Pr[M(A) = z]}{\Pr[M(B) = z]} = \prod_{i=1}^{d} \dfrac{\exp(-\varepsilon \lvert f(A)_i - z_i \rvert / GS(f))}{\exp(-\varepsilon \lvert f(B)_i - z_i \rvert / GS(f))} \le \prod_{i=1}^{d} \exp\!\Big(\dfrac{\varepsilon \lvert f(A)_i - f(B)_i \rvert}{GS(f)}\Big) = \exp\!\Big(\dfrac{\varepsilon \lVert f(A) - f(B) \rVert_1}{GS(f)}\Big) \le e^{\varepsilon}.$
Similarly, the ratio is bounded below by $e^{-\varepsilon}$. Thus, M satisfies ε-DP.
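
For concreteness, the following minimal sketch (Python/NumPy; the function name laplace_mechanism and the example query are illustrative, not from the paper) shows how a numeric answer could be released under the Laplace mechanism by adding noise with scale GS(f)/ε.

import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    # Release a numeric query answer under epsilon-DP by adding
    # i.i.d. Laplace noise with mean 0 and scale sensitivity/epsilon.
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon,
                        size=np.shape(true_value))
    return np.asarray(true_value, dtype=float) + noise

# Example: a counting query (global sensitivity 1) answered with epsilon = 0.5.
noisy_count = laplace_mechanism(42, sensitivity=1.0, epsilon=0.5)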

Proposition 4 (sequential composition [25]). Let each algorithm $A_i$ provide $\varepsilon_i$-DP. The combination algorithm $A(A_1(D), A_2(D), \ldots, A_k(D))$ over the dataset D provides $(\sum_{i=1}^{k}\varepsilon_i)$-DP.

Proof (see [24]). For any output $(o_1, \ldots, o_k)$ and neighboring datasets D and D',
$\Pr[A(D) = (o_1, \ldots, o_k)] = \prod_{i=1}^{k} \Pr[A_i(D) = o_i] \le \prod_{i=1}^{k} e^{\varepsilon_i} \Pr[A_i(D') = o_i] = e^{\sum_{i}\varepsilon_i} \Pr[A(D') = (o_1, \ldots, o_k)].$

Proposition 5 (parallel composition [25]). Let each algorithm $A_i$ provide $\varepsilon_i$-DP. The combination algorithm $A(A_1(D_1), A_2(D_2), \ldots, A_k(D_k))$ over disjoint subsets $D_1, \ldots, D_k$ of the dataset D provides $(\max_i \varepsilon_i)$-DP.

Proof (see [24]). Without loss of generality, assume that $D_j$ differs in one element from $D'_j$, while the other subsets are exactly the same. For any output $(o_1, \ldots, o_k)$,
$\dfrac{\Pr[A(D) = (o_1, \ldots, o_k)]}{\Pr[A(D') = (o_1, \ldots, o_k)]} = \dfrac{\Pr[A_j(D_j) = o_j]}{\Pr[A_j(D'_j) = o_j]} \le e^{\varepsilon_j} \le e^{\max_i \varepsilon_i}.$

In privacy preserving computations, the measurements need to be allocated a reasonable privacy budget according to these composition properties.

2.2. Collaborative Filtering

The CF system usually presents a sorted list of predicted items to the active user, and its performance can be evaluated by the expected utility of the items in the recommended list. Alternatively, the system may provide numeric scores directly for the predicted items, and the performance can be measured by a normed distance between the predicted scores and the actual preference values (i.e., the original ratings). There are two common ways of calculating similarity: the Pearson correlation coefficient (Pcc) and cosine-based similarity (Cos) [26].

Pcc is a linear correlation coefficient $\rho_{X,Y}$ used to measure the correlation between two random variables X and Y. Its value lies between -1 and 1, and the larger the absolute value, the stronger the correlation. When X is linearly dependent on Y, the value is 1 (positive linear correlation) or -1 (negative linear correlation). It is defined as the ratio of the covariance of X and Y to the product of their standard deviations:
$\rho_{X,Y} = \dfrac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \dfrac{E[(X - EX)(Y - EY)]}{\sigma_X \sigma_Y},$
where E denotes the expected value.

Cos evaluates similarity by calculating the cosine of the angle between two vectors X and Y, and therefore pays more attention to the difference in direction between the vectors. Its value lies between -1 and 1, and a value of 0 means the two vectors are orthogonal. When the directions of the two vectors coincide, the cosine attains its maximum value of 1, and vice versa. The vector similarity is defined as follows:
$\cos(X, Y) = \dfrac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert}.$
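
The two measures could be computed as in the following sketch (Python/NumPy; it assumes the two vectors already contain only the co-rated entries, and the function names are illustrative).

import numpy as np

def pearson_similarity(x, y):
    # Pearson correlation: covariance divided by the product of standard deviations.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    return float(xc @ yc / denom) if denom > 0 else 0.0

def cosine_similarity(x, y):
    # Cosine similarity: the cosine of the angle between the two vectors.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0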

3. The Proposed Method

DP is essentially a property that the system should maintain rather than a specific way of computing. Therefore, the designed framework includes different perturbation methods to carry out predictions in a differentially private manner. The simple technique of noise addition is well suited to predicting scores while protecting the original ratings from leakage.

As shown in Figure 1, the framework includes three different algorithms from previous research [1]. The calculation procedures of the Pcc and Cos algorithms are similar except for the different measurements of similarity between users; see Algorithm 1 for a detailed description. The Avg algorithm uses the average of the original ratings to predict the scores. According to where the noise is added, the perturbation methods take two forms. In the DPI method, the noise is added to each entry of the rating matrix to mask the original data. In the DPM method, the noise is added to the various measurements of the algorithms computed from the original rating matrix.

Input: the rating matrix, the privacy budget ε, the parameter k
Output: the predicted scores
1. divide ε into ε1 and ε2
2. simMat = similarity (activeMatTrain, otherMatTrain)
3. simMat = simMat + Lap (2/ε1)
4. otherAvgVec = mean (otherMatTrain) + Lap (Δf/(k·ε2))
5. otherMatPred = otherMatPred − otherMatAvg
6. for each active user j
7.   activeAvg = mean (activeMatTrain (j)) + Lap (Δf/(k·ε2))
     // The spdiags function transforms the similarity vector of user j into a diagonal matrix.
8.   activeMatPred (j) = sum (spdiags (simMat (j)) · otherMatPred) + activeAvg
9. end for
10. return activeMatPred

3.1. DPI Method

In this method, the original ratings are perturbed with Laplace noise before the prediction algorithms are executed. Input perturbation is a relatively simple and efficient strategy and can be followed by any recommendation algorithm.

The magnitude of noise added to the rating matrix is determined by the global sensitivity of the ratings. The range of ratings is $[r_{\min}, r_{\max}]$, which dictates the maximum amount by which a single rating can change, i.e., $\Delta f = r_{\max} - r_{\min}$. According to Definition 3, each entry of the rating matrix is perturbed by adding noise $\mathrm{Lap}(\Delta f / \varepsilon)$. As postprocessing, the noisy ratings are clamped to $[r_{\min}, r_{\max}]$ to limit the influence of excessive noise:
$r' = \min(\max(r + \mathrm{Lap}(\Delta f / \varepsilon), \; r_{\min}), \; r_{\max}).$
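
A minimal sketch of this input-perturbation step is given below (Python/NumPy; the function name dpi_perturb is illustrative, and the whole matrix is perturbed as described above, leaving the treatment of unobserved entries to the downstream algorithm).

import numpy as np

def dpi_perturb(ratings, r_min, r_max, epsilon, rng=None):
    # DPI: perturb every entry of the rating matrix with Laplace noise and
    # clamp the result back to [r_min, r_max] as postprocessing.
    rng = np.random.default_rng() if rng is None else rng
    sensitivity = r_max - r_min  # global sensitivity of a single rating
    noisy = ratings + rng.laplace(0.0, sensitivity / epsilon, size=ratings.shape)
    return np.clip(noisy, r_min, r_max)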

These algorithms take the noisy matrix as input and carry out predictions without accessing the original matrix. Therefore, the whole process of the DPI method guarantees ε-DP.

3.2. DPM Method

In this method, privacy protection must be guaranteed during the execution of the algorithms, which operate on the original rating matrix. Note that users generally rate only a subset of the items in the dataset, and the algorithms only consider the nonzero ratings in the matrix. The constraint is that each user has rated at least k items, where k is an important parameter of Algorithm 1; otherwise, the system will not make predictive recommendations for the user.

3.2.1. Description of Algorithm

As shown in Figure 2, the rating matrix is partitioned into four blocks in Algorithm 1. The users are randomly divided into active users and other users. Meanwhile, some of the items are used for training and the rest for predicting. The prediction task is to score the nonzero entries of activeMatPred.

Algorithm 1 provides a detailed description of adding Laplace noise to the Pcc (or Cos) algorithm.

Line 1 divides the privacy budget ε into two parts (i.e., ε1 and ε2) that are consumed sequentially. Line 2 calculates the similarity matrix simMat between activeMatTrain and otherMatTrain based on the measurement Pcc (or Cos). Line 3 perturbs simMat by adding noise Lap(2/ε1) to each entry of the matrix. Line 4 calculates the averages of the other users' ratings, otherAvgVec, adding noise Lap(Δf/(k·ε2)). Line 5 generates the deviation matrix otherMatPred by subtracting otherMatAvg, in which each nonzero entry is set to the corresponding noisy average. Note that the coordinates of the nonzero values correspond one-to-one in otherMatPred and otherMatAvg. Line 7 calculates the average of the ratings, activeAvg, adding noise Lap(Δf/(k·ε2)), for each active user j. Line 8 generates the predicted scores activeMatPred(j) by adding activeAvg to the predicted deviations calculated from simMat(j) and otherMatPred. Line 10 returns the matrix activeMatPred, whose nonzero values are the predicted scores.
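
Under the assumption that the four blocks of Figure 2 are available as dense arrays, Algorithm 1 could be sketched as follows (Python/NumPy; the function and variable names are illustrative, zero entries are not treated specially, and the deviations are combined by a plain similarity-weighted sum as in line 8 of the pseudocode).

import numpy as np

def dpm_predict(active_train, active_pred_mask, other_train, other_pred,
                r_min, r_max, k, epsilon, split=(0.2, 0.8), rng=None):
    # Sketch of Algorithm 1 (DPM with Pearson similarity) on dense blocks.
    rng = np.random.default_rng() if rng is None else rng
    eps1, eps2 = split[0] * epsilon, split[1] * epsilon          # line 1
    delta_f = r_max - r_min

    # Line 2: user-user Pearson similarity between active and other users.
    a = active_train - active_train.mean(axis=1, keepdims=True)
    o = other_train - other_train.mean(axis=1, keepdims=True)
    norms = np.outer(np.linalg.norm(a, axis=1), np.linalg.norm(o, axis=1))
    sim = (a @ o.T) / np.where(norms > 0, norms, 1.0)

    # Line 3: perturb the similarities (they lie in [-1, 1], so sensitivity 2).
    sim += rng.laplace(0.0, 2.0 / eps1, size=sim.shape)

    # Line 4: noisy averages of the other users' training ratings.
    other_avg = other_train.mean(axis=1) + rng.laplace(
        0.0, delta_f / (k * eps2), size=other_train.shape[0])
    # Line 5: deviation matrix of the other users' predicting block.
    other_dev = other_pred - other_avg[:, None]

    pred = np.zeros(active_pred_mask.shape, dtype=float)
    for j in range(active_train.shape[0]):                       # line 6
        # Line 7: noisy average of active user j's training ratings.
        active_avg = active_train[j].mean() + rng.laplace(0.0, delta_f / (k * eps2))
        # Line 8: similarity-weighted sum of deviations plus the noisy average.
        pred[j] = sim[j] @ other_dev + active_avg
    return np.where(active_pred_mask, pred, 0.0)                 # line 10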

In the Avg algorithm, the users in the rating matrix are no longer divided, and the items are still split into two parts, training and predicting. In other words, the rating matrix is divided vertically into two parts rather than the four blocks of Figure 2. The average of each user's ratings in the training block is used directly as the predicted score for each nonzero item in the predicting block. The intuition is that the ratings assigned by each user are relatively stable. The average of the ratings is the only measurement that needs to be perturbed with Laplace noise. Therefore, the privacy budget no longer needs to be divided, and the magnitude of the added noise is Lap(Δf/(k·ε)). A full description of the algorithm is omitted here, as it is relatively simple and easy to implement.
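
A corresponding sketch of the perturbed Avg algorithm, under the same assumptions (dense blocks, illustrative names), could look as follows.

import numpy as np

def dpm_avg_predict(train_block, pred_mask, r_min, r_max, k, epsilon, rng=None):
    # Perturbed Avg: each user's noisy training average is used as the
    # predicted score for every item to be predicted for that user.
    rng = np.random.default_rng() if rng is None else rng
    delta_f = r_max - r_min
    noisy_avg = train_block.mean(axis=1) + rng.laplace(
        0.0, delta_f / (k * epsilon), size=train_block.shape[0])
    return np.where(pred_mask, noisy_avg[:, None], 0.0)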

3.2.2. Analysis of Privacy

Theorem 6. The global sensitivity of the average of a user's ratings is at most $\Delta f / k$, where $\Delta f = r_{\max} - r_{\min}$ and k is the minimum number of ratings per user.

Proof. Let $\mathrm{avg}(D) = \frac{1}{n}\sum_{i=1}^{n} r_i$ denote the average of a user's n ratings. The global sensitivity is
$GS(\mathrm{avg}) = \max_{A,B} \lvert \mathrm{avg}(A) - \mathrm{avg}(B) \rvert = \dfrac{r_{\max} - r_{\min}}{n} = \dfrac{\Delta f}{n},$
where n is assumed to be constant and, by the constraint of Section 3.2, $n \ge k$. For simplicity, $\Delta f / k$ is used as a specified upper limit of the sensitivity. Therefore, the magnitude of noise added to the average of the ratings is Lap(Δf/(k·ε)).
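
As a numerical illustration (the rating scale [1, 5] and the value k = 3 are assumed here only for the example, not taken from the experiments):
$GS(\mathrm{avg}) \le \dfrac{\Delta f}{k} = \dfrac{5 - 1}{3} \approx 1.33, \qquad \text{noise scale} = \dfrac{\Delta f}{k\varepsilon} = \dfrac{4}{3\varepsilon}.$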

Theorem 7. The proposed Algorithm 1 guarantees $(\varepsilon_1 + \varepsilon_2)$-DP, that is, ε-DP.

Proof. The value of Pcc (or Cos) ranges from -1 to 1, so the maximum amount of change is 2. According to Definition 3, the magnitude of noise added to the similarity matrix computed in line 2 is Lap(2/ε1). According to Theorem 6 and Proposition 5, the magnitude of noise added in line 4 and line 7 is Lap(Δf/(k·ε2)); since the active users and the other users form disjoint subsets, these perturbations together consume ε2. The measurements used to calculate activeMatPred in line 8 are therefore completely perturbed by Laplace noise. According to Proposition 4, the proposed Algorithm 1 guarantees $(\varepsilon_1 + \varepsilon_2)$-DP.

Theorem 8. The perturbed Avg algorithm guarantees ε-DP.

Proof. As the predicted score, the average of the ratings is directly perturbed by adding Laplace noise. According to Theorem 6, the magnitude of noise added to the average of the ratings is Lap(Δf/(k·ε)), and hence the perturbed Avg algorithm guarantees ε-DP.

4. Experiments

In this section, we first conduct an experiment to provide an instance of Laplace noise, as shown in Table 1. The first column contains the original counts that need to be perturbed, the second column contains the generated random data r, and the remaining columns contain the noise generated for different values of ε. The last row gives the sum of squared errors (SSE) of the noise in each column. In the vertical direction, when r is around 0.5 the generated noise is minimal, and the further r moves from 0.5, the larger the noise. In the horizontal direction, the noise decreases as ε increases, and so does the SSE of the noise, while the level of privacy naturally becomes lower.
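
Assuming the random data r in Table 1 are uniform draws in (0, 1), the noise columns could be reproduced by inverse-transform sampling of the Laplace distribution, as in the following sketch (Python/NumPy; the function name and the example values of r and ε are illustrative).

import numpy as np

def laplace_from_uniform(r, scale):
    # Inverse-transform sampling: map a uniform random number r in (0, 1) to
    # Laplace noise with mean 0 and the given scale; the noise is smallest
    # when r is near 0.5 and grows as r approaches 0 or 1.
    r = np.asarray(r, dtype=float)
    return -scale * np.sign(r - 0.5) * np.log(1.0 - 2.0 * np.abs(r - 0.5))

# For the same uniform draws, the noise shrinks as epsilon increases.
r = np.array([0.07, 0.31, 0.52, 0.85])
for eps in (0.5, 1.0, 2.0):
    print(eps, laplace_from_uniform(r, scale=1.0 / eps))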

Then, we conduct experiments on three datasets [27], ml-20m, ml-latest-small, and ml-latest, which are rating datasets collected from the MovieLens website and made available by GroupLens Research. We randomly selected 700 users from ml-latest-small, 1000 users from ml-20m, and 2000 and 5000 users from ml-latest, respectively. Note that some users are removed in preprocessing if they have rated fewer than 3 items; that is, k = 3 in the experiments. The actual sizes of the original and experimental datasets are shown in Table 2. The ratings in the datasets lie in a bounded range $[r_{\min}, r_{\max}]$; accordingly, the global sensitivity is $\Delta f = r_{\max} - r_{\min}$. The privacy budget ε is set to a series of values. In Algorithm 1, we set ε1 : ε2 = 0.2 : 0.8. Meanwhile, the selected datasets are divided equally in both the horizontal and vertical directions. The evaluation measurements are the mean absolute error (MAE) and the mean squared error (MSE):
$\mathrm{MAE} = \dfrac{1}{N}\sum \lvert \mathrm{predval} - \mathrm{trueval} \rvert \quad \text{and} \quad \mathrm{MSE} = \dfrac{1}{N}\sum (\mathrm{predval} - \mathrm{trueval})^2,$
where predval is the predicted score and trueval is the original rating.
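
The two error measures could be computed over the predicted entries as in the following sketch (Python/NumPy; names are illustrative).

import numpy as np

def mae(pred, true):
    # Mean absolute error between predicted scores and original ratings.
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(true))))

def mse(pred, true):
    # Mean squared error between predicted scores and original ratings.
    return float(np.mean((np.asarray(pred) - np.asarray(true)) ** 2))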

As shown in Figures 3–14, the three algorithms are run on the two smaller datasets with both perturbation methods. In summary, the DPI method performs better than the DPM method, especially when evaluated by MSE. The MAE of the nonprivate method is around 0.75 and its MSE is around 1. For small values of ε, the MAE of DPM is slightly less than 1.5 and the MAE of DPI is around 1, so the error ratio between the two DP methods is probably less than 1.5; likewise, the MSE of DPM is around 3 and the MSE of DPI is around 1.5, so the error ratio between the two DP methods is approximately 2. When ε = 2, the error curves of the DP methods decrease greatly. When ε = 3, these error curves keep going down steadily. When a weaker privacy guarantee is acceptable, that is, ε = 5, the prediction accuracy of the DP methods gets close to that of the nonprivate method. The errors decrease as ε increases, which is consistent with the analysis of the noise data in Table 1.

The experimental results of the three algorithms are similar, and DPM-Avg performs slightly better than the other DPM algorithms. This is mainly due to the noise error introduced by perturbing the similarity matrix in Algorithm 1. The Pcc and Avg algorithms are then run on the two larger datasets, and the experimental results are shown in Figures 15–22. For small values of ε, the MAE of DPI is slightly greater than 1 and the MSE of DPI is close to 2, so the error of DPI is a little larger on these datasets. Beyond that, the trend of all error curves remains almost unchanged. Although four datasets of different sizes are tested, the number of selected users has little influence on the prediction errors. In conclusion, both perturbation methods scale well and yield consistent experimental results on larger datasets. The proposed framework is feasible and competent to make predictive recommendations while guaranteeing DP.

5. Conclusion

In this paper, we addressed the problem of differentially private collaborative filtering based on existing algorithms. The proposed framework includes two perturbation methods based on adding Laplace noise. The solution is essentially a way of processing the data and may therefore be applicable to other, more advanced CF algorithms. However, a potential problem is that the privacy budget may become larger, which may render the promised privacy protection ineffective. The experimental results showed that the number of selected users has little influence on the prediction errors. This can be viewed from two perspectives: on the one hand, a small amount of data seems sufficient to calculate such predicted scores; on the other hand, the amount of noise required remains tolerable as the number of users grows. Therefore, the proposed framework is feasible and competent to make differentially private recommendations.

For commercial applications, it is worthwhile to study more valuable recommender systems for social services. We will explore how to make predictive recommendations that combine the three degrees of influence, a strong type of connection that can trigger users' behaviors in social networks. To address privacy concerns, we will consider employing local differential privacy [28], a stronger variant, to eliminate users' worries about data collection. It dates back to the randomized response technique (RRT for short) [29], which can provide plausible deniability for individual ratings; the collected datasets could then be used without extra privacy protection. Finally, the performance of the new recommender system requires further experimental verification.

Data Availability

The data used to support the findings of this study can be accessed from https://grouplens.org/datasets/movielens/ without any restrictions.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by the National Natural Science Foundation of China (Nos. 61672179, 61370083, and 61402126), the Natural Science Foundation of Heilongjiang Province (No. F2015030), the Youth Science Fund of Heilongjiang Province (Nos. QC2016083 and QC2017079), the Postdoctoral Fellowship of Heilongjiang Province (No. LBH-Z14071), and the Fundamental Research Funds in Heilongjiang Provincial Universities (Nos. 135109245 and 135109314).