Special Issue: Security and Privacy Protection of Social Networks in the Big Data Era
Research Article | Open Access
Zhengzheng Xian, Qiliang Li, Gai Li, Lei Li, "New Collaborative Filtering Algorithms Based on SVD++ and Differential Privacy", Mathematical Problems in Engineering, vol. 2017, Article ID 1975719, 14 pages, 2017. https://doi.org/10.1155/2017/1975719
New Collaborative Filtering Algorithms Based on SVD++ and Differential Privacy
Collaborative filtering technology has been widely used in recommender systems, and its implementation depends on the large amount of real and reliable user data available in the big-data era. However, as users' information-security awareness increases, the data they provide shrink or decline in quality. Singular Value Decomposition (SVD) is one of the common matrix factorization methods used in collaborative filtering; it introduces bias information for users and items and is realized by algebraic feature extraction. SVD++, a derivative model of SVD, achieves better predictive accuracy by additionally incorporating implicit feedback information. Differential privacy has a rigorous, provable definition and has become an effective measure against attackers who indirectly deduce personal privacy information using background knowledge. In this paper, differential privacy is applied to the SVD++ model through three approaches: gradient perturbation, objective-function perturbation, and output perturbation. Through theoretical derivation and experimental verification, the proposed algorithms are shown to better protect the privacy of the original data while preserving predictive accuracy. In addition, an effective scheme is given that balances privacy-protection strength against predictive accuracy, and a reasonable range for selecting the differential privacy parameter is provided.
1. Introduction
The Internet has been widely used since the birth of Web 2.0, and it has greatly changed the human lifestyle. When a user opens a shopping website or a mobile application, an enthusiastic recommender system lists commodities he or she may be interested in, based on purchase history, browsing footprint, evaluation information, and so forth. Numerous intelligent applications like these exist today. If implicit feedback information such as historical browsing data, historical rating data, and evaluation timestamps can be fully exploited, predictive accuracy can be improved further. The Singular Value Decomposition (SVD) model is a common collaborative filtering method for providing personalized recommendation services, and its predictive accuracy can be improved by considering user and item bias information. As a derivative model of SVD, the SVD++ model [2–4] achieves better recommendation accuracy by adding implicit feedback information, such as which movies a user has evaluated, where the specific rating value does not matter.
While the Internet has brought much convenience to users, their daily medical, transportation, purchase, and browsing information, often neglected by the users themselves, is recorded and becomes a data resource from which Internet companies identify further business opportunities and benefits. Meanwhile, collecting this information carries a risk of leaking personal privacy. In recent years, Internet-triggered leaks of personal privacy information have arisen frequently. For example, in the Netflix Prize competition, the Netflix Corporation released an anonymized dataset; however, researchers from the University of Texas were able to identify real Netflix users by linking the ratings and timestamps in this dataset with public information on the Internet Movie Database (IMDB). As another example, in 2012, an American college student was recognized as homosexual by his roommate, who used a network tool to track the frequency of his visits to homosexual forums and websites. In item-based collaborative filtering, a transaction performed by a user increases the similarity between the new item and the user's previous commodity transactions. An attacker can therefore track the similar-commodity lists related to a target user (the attack target); when a new commodity appears in these lists, the attacker can deduce the item just added to the target user's records. Thus, the threat of indirect derivation of personal privacy information is receiving increasing attention.
In 2006, Dwork proposed differential privacy (DP), which can solve the background-knowledge-related privacy leakage issues mentioned above. DP has a very strict definition that is independent of background knowledge, so it can fundamentally overcome the defects of traditional privacy protection models and remove the possibility of privacy leakage at the data source. Although DP has been researched for ten years, the major achievements are academic theories. The Apple Corporation has always claimed that user privacy is its top priority. At the 2016 Worldwide Developers Conference (WWDC2016), Apple proposed applying DP to collect and analyse user data from the keyboard, Spotlight, and Notes in iOS 10, with the goal of ensuring that the Quality of Service (QoS) is not affected while the user's personal information is not leaked. This measure is pioneering work on DP in the application layer.
Today, it is urgent in the field of data mining to improve QoS while ensuring the security of personal privacy information, thereby eliminating users' worries and encouraging them to provide true and reliable data, which in turn guarantees the production of effective knowledge and rules [7, 8].
The contributions of our work are summarized as follows. First, we propose three new methods that apply differential privacy to SVD++ through gradient perturbation, objective-function perturbation, and output perturbation. Second, rigorous mathematical proofs are given to show that they all maintain differential privacy. Third, on two real datasets, we compare the predictive accuracy of our differentially private SVD++ algorithms with that of the same methods applied to SVD and with related methods in the literature; the results show that our methods strike a better balance between privacy and prediction. Finally, we propose a scheme for selecting the DP protection parameter that balances privacy strength against predictive accuracy and yields a reasonable range for the parameter.
The remainder of the paper is organized as follows. Section 2 surveys work related to privacy preservation in recommender systems. Section 3 introduces the SVD++ model and the DP model. Section 4 presents the three new methods, which apply DP to SVD++ using gradient perturbation, objective-function perturbation, and output perturbation. Section 5 presents the experimental evaluation of each method on two real datasets. Finally, Section 6 summarizes the key aspects of our work and briefly addresses directions for future work.
2. Related Work
The privacy protection of recommender systems became a popular research topic after Canny proposed in 2002 that the recommender should not exploit the user's data for financial benefit. Applying DP to personalized collaborative filtering has been a hot research topic since DP came to be considered the best privacy protection technology. McSherry and Mironov were the first to apply DP to collaborative filtering; the main idea of their paper was to use the Laplace mechanism to compute a differentially private item-to-item covariance matrix, which was then used to find neighbours and compute the SVD recommendation. However, it seems unreasonable that a user's contribution to the covariance decreases as the user's buying activity increases. Zhu et al. addressed the privacy issues of neighbourhood-based CF methods by proposing a Private Neighbour Collaborative Filtering (PNCF) algorithm. Hua et al. first proposed preventing untrusted recommenders from using a user's ratings while still allowing the user to leave or join the matrix factorization (MF) process, realizing DP protection by perturbing the objective function of MF. Liu et al. proposed a method that applies DP to Bayesian posterior sampling via Stochastic Gradient Langevin Dynamics (SGLD), thus avoiding the influence of Gaussian noise on the whole parameter space. Zhu and Sun proposed Differentially Private Item-Based Recommendation and Differentially Private User-Based Recommendation and designed a low-sensitivity metric to measure the similarities between both items and users. Yan et al. proposed a socially aware algorithm called DynaEgo to improve the performance of privacy-preserving collaborative filtering; DynaEgo utilizes the principle of DP together with social relationships to adaptively modify users' rating histories and prevent exact user information from being leaked.
Javidbakht and Venkitasubramaniam proposed using DP as a metric to quantify the privacy of an intended destination and investigated optimal probabilistic routing schemes under unicast and multicast paradigms. Balu and Furon proposed using sketching techniques to provide implicit DP guarantees by exploiting the inherent randomness of the data structure, an approach well suited to large-scale applications. Berlioz et al. applied DP to each step of latent factor MF; however, they provided no rigorous mathematical proofs and required preprocessing of the raw data, and their experimental results showed that a large DP parameter is needed to obtain good predictive accuracy.
Chaudhuri et al. proposed general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) Empirical Risk Minimization (ERM). They proposed output-perturbation and objective-function-perturbation DP models, but applied these methods only to logistic regression and SVM. Building on the above works, we take the SVD++ model, a derivative of SVD, as the research object and propose three new algorithms that apply DP to SVD++ using gradient perturbation, objective-function perturbation, and output perturbation. To improve predictive accuracy, SVD++ considers related information about the user and the item. Theoretical proofs are given, and the experimental results show that the new private SVD++ algorithms obtain better predictive accuracy than the same DP treatment of traditional MF and SVD.
The DP parameter is the key to the strength of privacy protection, but in previous studies it was selected by experience. Finally, an effective trade-off scheme is given that balances privacy protection against predictive accuracy to a certain extent and provides a reasonable range for parameter selection.
3.1. SVD++ Model
The “user-item” rating matrix is the core data used by the recommender system. MF is a good method of predicting the missing ratings in collaborative filtering. In brief, MF involves factorizing a sparse matrix and finding two latent factor matrices: the first is the user matrix to indicate the user’s features (i.e., the degree of preference of a user for each factor) and the other is the item matrix, which indicates the item’s features (i.e., the weight of an item for each factor). The missing ratings are then predicted from the inner product of these two factor matrices.
Let $R \in \mathbb{R}^{n \times m}$ be a rating matrix containing the ratings of $n$ users for $m$ items. Each matrix element $r_{ui}$ refers to the rating of user $u$ for item $i$. Given a lower dimension $d$, MF factorizes the raw matrix $R$ into two latent factor matrices: one is the user-factor matrix $P \in \mathbb{R}^{n \times d}$ and the other is the item-factor matrix $Q \in \mathbb{R}^{m \times d}$. The factorization is done such that $R$ is approximated as the inner product of $P$ and $Q$ (i.e., $R \approx P Q^{T}$), and each observed rating $r_{ui}$ is approximated by $\hat{r}_{ui} = q_i^{T} p_u$ (also called the predicted value). However, $q_i^{T} p_u$ only captures the interaction between the user $u$ and the item $i$. In the real world, the observed rating may also be affected by the preference of the user or the characteristics of the item. In other words, part of the rating can be explained by bias information rather than by the user-item interaction. For instance, suppose one wants to predict the rating of the movie "Batman" by the user "Tom." The average rating of all movies on one website is 3.5, and Tom tends to give a rating that is 0.3 lower than the average because he is a critical man, while the movie "Batman" is better than the average movie and tends to be rated 0.2 above the average. Therefore, considering the user and movie bias information, the calculation $3.5 - 0.3 + 0.2 = 3.4$ predicts that Tom will give the movie "Batman" a rating of 3.4. The user and item bias information can reflect the truth of the rating more objectively. SVD is a typical factorization technology (known as a baseline predictor in some works in the literature). Thus, the predicted rating is changed to
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{T} p_u, \tag{1}$$
where $\mu$ is the overall average rating and $b_u$ and $b_i$ indicate the observed deviations of user $u$ and item $i$, respectively.
The goal of a recommender system is to improve the predictive accuracy. In fact, a user leaves implicit feedback information, such as historical browsing data and historical rating data, on Web applications: whenever user $u$ has rated item $j$, the rating action itself is informative, no matter what the specific rating value is. To a certain extent, the rating operation already reflects the degree of a user's preference for each latent factor. Therefore, the SVD++ model introduces implicit feedback information into SVD; that is, it adds a factor vector $y_j \in \mathbb{R}^{d}$ for each item, and these item factors are used to describe the characteristics of the item, regardless of whether it has been evaluated. The user's factor vector is then remodelled, so that a better user representation can be obtained. Thus, the predictive rating of the SVD++ model is
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{T}\Bigl(p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j\Bigr), \tag{2}$$
where $N(u)$ is the set of items rated by user $u$ and $|N(u)|$ is the number of items rated by user $u$.
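As a concrete illustration, the SVD++ prediction for a single user-item pair can be sketched as follows (a minimal sketch with made-up toy factors; the variable names and numbers are our own, reusing the Tom/"Batman" bias values from the example above):

```python
import numpy as np

# Toy SVD++ prediction: r_hat = mu + b_u + b_i + q_i . (p_u + |N(u)|^-1/2 * sum_j y_j)
mu = 3.5                       # global average rating
b_u, b_i = -0.3, 0.2           # user and item biases (Tom / "Batman" example)
p_u = np.array([0.1, -0.2])    # user latent factors
q_i = np.array([0.3, 0.1])     # item latent factors
N_u = [np.array([0.05, 0.0]),  # implicit-feedback factors y_j for items Tom rated
       np.array([-0.1, 0.2])]

implicit = sum(N_u) / np.sqrt(len(N_u))
r_hat = mu + b_u + b_i + q_i.dot(p_u + implicit)
print(round(r_hat, 3))
```

The bias terms alone give 3.4, as in the text; the small latent-factor inner product then nudges the prediction slightly.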
To obtain the optimal factors and biases, the regularized squared error can be minimized as follows. The objective function of the SVD++ model is
$$\min_{p_*, q_*, b_*, y_*} \sum_{(u,i) \in \kappa} \bigl(r_{ui} - \hat{r}_{ui}\bigr)^{2} + \lambda \Bigl(b_u^{2} + b_i^{2} + \|p_u\|^{2} + \|q_i\|^{2} + \sum_{j \in N(u)} \|y_j\|^{2}\Bigr), \tag{3}$$
where $\kappa$ is the set of observed ratings and $\lambda$ is the regularization parameter used to regularize the factors and prevent overfitting.
With regard to $\mu$, $b_u$, and $b_i$, two methods can be used: fast empirical likelihood estimation (i.e., formula (4)) and Stochastic Gradient Descent (SGD). Considering the rate of convergence and the influence of the error in each iteration, the first method is used in this paper.
In formula (4), when user $u$ has rated item $i$, the indicator value is 1; otherwise, it is 0. In addition, the averages are shrunk towards zero by the regularization parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$, which are determined by cross-validation.
SGD and Alternating Least Squares (ALS) are two common optimization algorithms used to solve the objective function (formula (3)). The SGD algorithm combines randomness with optimization: it does not compute the exact gradient but uses an unbiased estimate of it.
Stochastic Gradient Descent. Let $e_{ui}$ represent the error between the true and the predicted values (i.e., $e_{ui} = r_{ui} - \hat{r}_{ui}$), where $p_u$ is any row of the user matrix $P$ and $q_i$ is any row of the item matrix $Q$. In SGD, the factors are learned by iteratively evaluating the error $e_{ui}$ for each rating $r_{ui}$, and the user and item vectors are updated by taking a step in the direction opposite to the gradient of the regularized loss function. The updating rules can be formulated as follows:
$$b_u \leftarrow b_u + \gamma (e_{ui} - \lambda b_u),$$
$$b_i \leftarrow b_i + \gamma (e_{ui} - \lambda b_i),$$
$$p_u \leftarrow p_u + \gamma (e_{ui}\, q_i - \lambda p_u),$$
$$q_i \leftarrow q_i + \gamma \Bigl(e_{ui} \bigl(p_u + |N(u)|^{-1/2} \textstyle\sum_{j \in N(u)} y_j\bigr) - \lambda q_i\Bigr),$$
$$y_j \leftarrow y_j + \gamma \bigl(e_{ui}\, |N(u)|^{-1/2} q_i - \lambda y_j\bigr), \quad \forall j \in N(u),$$
where the constant $\gamma$ is the learning rate and determines the rate of error minimization.
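The update rules above can be sketched as a single SGD step for one observed rating (a minimal sketch; the function name, default `gamma`, and `lam` values are our own illustrative choices, not the paper's settings):

```python
import numpy as np

def sgd_step(r_ui, mu, b_u, b_i, p_u, q_i, ys, gamma=0.01, lam=0.02):
    """One SVD++ SGD update for a single observed rating r_ui.
    ys: list of implicit-feedback vectors y_j for the items rated by user u."""
    n_u = max(len(ys), 1)
    implicit = sum(ys) / np.sqrt(n_u)          # |N(u)|^-1/2 * sum_j y_j
    e = r_ui - (mu + b_u + b_i + q_i.dot(p_u + implicit))  # prediction error
    b_u2 = b_u + gamma * (e - lam * b_u)
    b_i2 = b_i + gamma * (e - lam * b_i)
    p_u2 = p_u + gamma * (e * q_i - lam * p_u)
    q_i2 = q_i + gamma * (e * (p_u + implicit) - lam * q_i)
    ys2 = [y + gamma * (e * q_i / np.sqrt(n_u) - lam * y) for y in ys]
    return b_u2, b_i2, p_u2, q_i2, ys2, e
```

Each call moves the biases and factors a small step against the gradient; repeating over all observed ratings for several epochs constitutes the full SGD training loop.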
Alternating Least Squares. In ALS, the optimization problem is solved iteratively. In each iteration, one latent matrix (say $P$) is fixed, and the objective function of SVD++ (formula (3)) then becomes a convex optimization problem whose solution (say $Q$) can be found efficiently. Similarly, the other latent matrix can be found in the same way. These steps are repeated until convergence is achieved.
3.2. Differential Privacy
The privacy protection of a collaborative filtering algorithm needs not only to reduce the risk of leaking private information from the original data but also to ensure the availability of the data. DP defines an extremely strict attack model and provides a rigorous, quantitative representation and proof of the risk of privacy leakage. The amount of background knowledge the attacker has does not matter, since DP protects the user's potentially private information by adding noise, preventing the attacker from inferring the protected information even with other knowledge in hand: the attacker cannot tell whether a particular user's information exists in the original dataset. Because DP can make the recommendation results almost insensitive to any single record in the original dataset, it is applied to collaborative filtering based recommender systems to prevent indirect deduction of personal private information.
Definition 1 ($\varepsilon$-differential privacy). Given any two adjacent "user-item" rating matrices $R$ and $R'$, which differ by at most one score, if for any possible output result $S$ formula (6) is satisfied, the random algorithm $M$ provides $\varepsilon$-differential privacy:
$$\Pr[M(R) \in S] \le e^{\varepsilon} \cdot \Pr[M(R') \in S], \tag{6}$$
where $\Pr[\cdot]$ is the probability that private information will be disclosed, controlled by the randomness of algorithm $M$; it is independent of the background knowledge of the attacker. The parameter $\varepsilon$ indicates the strength of privacy protection, where a smaller value indicates stronger privacy protection. In addition, two rating matrices that differ by at most one score can also be understood as two matrices that differ by at most one record of a user.
The key technique of DP protection is to add noise satisfying the Laplace mechanism or the exponential mechanism. The former is applied to protect numerical results and the latter to protect nonnumerical ones. The amount of noise is related to the function's sensitivity and the privacy protection parameter $\varepsilon$. The sensitivity of a function is the maximum difference in its output over two datasets that differ by only one record. Sensitivity is divided into global sensitivity and local sensitivity: the former is determined by the function itself (different functions have different global sensitivities), while the latter is determined by both the specific given dataset and the function. The formal definition of global sensitivity, the Laplace mechanism, and the two composition properties of DP are given as follows.
Definition 2 (global sensitivity). Given any two adjacent "user-item" rating matrices $R$ and $R'$ that differ by at most one score, for any function $f: R \to \mathbb{R}^{d}$, the global sensitivity of function $f$ is
$$\Delta f = \max_{R, R'} \|f(R) - f(R')\|_{1},$$
where $d$ is the dimension of function $f$ and $\|\cdot\|_{1}$ denotes the $L_1$-norm.
If the global sensitivity of a function (such as the average or the median) is too large, enough noise must be added to protect privacy, but this reduces the availability of the data. To address this problem, Nissim et al. proposed local sensitivity. In this paper, global sensitivity is adopted because the sensitivity of our function is small.
Dwork et al. demonstrated that the Laplace mechanism can be used to obtain $\varepsilon$-differential privacy. The main idea is to add noise sampled from a Laplace distribution with a calibrated scale $b = \Delta f / \varepsilon$. The probability density function of the Laplace distribution with mean 0 and scale $b$ is
$$p(x \mid b) = \frac{1}{2b} \exp\Bigl(-\frac{|x|}{b}\Bigr).$$
In this paper, this distribution is denoted as $\mathrm{Lap}(b)$.
Theorem 3. Given any two adjacent "user-item" rating matrices $R$ and $R'$ that differ by at most one score, for any function $f$ with global sensitivity $\Delta f$, if the random noise $\eta \sim \mathrm{Lap}(\Delta f / \varepsilon)$ and the algorithm $M$ satisfies
$$M(R) = f(R) + \eta,$$
then the algorithm $M$ provides $\varepsilon$-differential privacy.
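The Laplace mechanism of Theorem 3 can be sketched directly (a minimal sketch; the sensitivity and epsilon values in the example are illustrative, using the fact that a single rating in [1, 5] can change a released rating by at most 4):

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Return f(R) + Lap(sensitivity / epsilon) noise, as in Theorem 3."""
    rng = rng if rng is not None else np.random.default_rng()
    return value + rng.laplace(0.0, sensitivity / epsilon)

# Example: releasing one rating from [1, 5] has global sensitivity 4;
# with epsilon = 1 the Laplace noise scale is 4.
noisy_rating = laplace_mechanism(3.5, sensitivity=4.0, epsilon=1.0)
```

Smaller epsilon means a larger noise scale and therefore stronger privacy protection, matching the interpretation of the parameter in Definition 1.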
This work also relies on the $L_2$-norm mechanism, which makes it possible to calibrate noise to the $L_2$-sensitivity of the evaluated function.
In this paper, the outputs of the new privacy algorithms are all numerical, so the Laplace mechanism is used to achieve DP.
Composition. Usually, a complex privacy-preserving problem requires DP technology to be applied multiple times. In this case, to ensure that the privacy protection level of the whole process stays within the budget given by the privacy protection parameter $\varepsilon$, two important composition properties of DP are required: the sequential composition property and the parallel composition property. The sequential composition property states that if a DP budget $\varepsilon$ is distributed over multiple random algorithms, each maintaining $\varepsilon_i$-differential privacy, then on the same dataset the composition of these algorithms maintains DP with the sum of the budgets (i.e., it maintains $\sum_i \varepsilon_i$-differential privacy). The parallel composition property states that, on disjoint datasets, the composition of these algorithms maintains DP with the maximum of the budgets (i.e., it maintains $(\max_i \varepsilon_i)$-differential privacy).
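Sequential composition is what makes iterative training possible under a fixed budget: splitting $\varepsilon$ evenly over $T$ queries to the same data gives each query $\varepsilon / T$, which inflates the per-query Laplace scale by a factor of $T$. A minimal sketch (the function name is ours):

```python
def laplace_scale_per_iteration(epsilon, T, delta_f):
    """Laplace scale for each of T sequential queries on the same data,
    so that the T queries together satisfy epsilon-differential privacy."""
    eps_per_iter = epsilon / T       # sequential composition: budgets add up
    return delta_f / eps_per_iter    # equals T * delta_f / epsilon
```

This is exactly the budget-splitting scheme used later in Section 4.1 for the gradient-perturbed SGD.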
4. Privacy-Preserving SVD++
The intuitive idea is that there should be some latent features that determine how a user rates an item, and traditional MF recovers them. However, if an attacker has some background knowledge, he or she can obtain the user's private data from the original rating matrix. For example, an attacker can infer that a user likes certain types of movies, which the user does not want other people to know. Thus, our goal is to protect the raw rating matrix by using DP reasonably. The main idea of SVD++ is to analyse, from the observed ratings and some implicit feedback, the user's preference for each factor and the extent to which each film contains the various factors, and then to predict the missing scores. In this paper, considering the fact that SVD can obtain good predictive accuracy, we apply DP to SVD++ flexibly. Similarly to traditional MF, the SVD++ process can be divided into the following four stages:
(i) Inputting of the original rating matrix
(ii) SVD++ factorization process by SGD or ALS
(iii) Outputting of the user characteristic matrix and the item characteristic matrix
(iv) Rating prediction (i.e., recommendation)
In [9, 10], DP was applied to these four stages, and some preprocessing of the original matrix was necessary. The work of  was an extension of , and several algorithms in these two works are the same. Compared with [9, 10], our algorithms have three advantages. First, our algorithms do not perform any DP preprocessing, in order to preserve the availability of the original data. Second, our algorithms adopt SVD++ to achieve MF, because the SVD++ model considers the user and item biases and the implicit feedback of users in order to improve recommendation accuracy. Third, the objective perturbation of ALS for SVD++ follows the idea of  and obtains better experimental results on two datasets than [9, 10].
4.1. SGD with Gradient Perturbation for SVD++
SGD with gradient perturbation for SVD++ applies DP to the error of each iteration in the SGD optimization algorithm. For a detailed description of the process, see Algorithm 1.
For Algorithm 1, a few explanatory points need to be stated as follows:
(1) To constrain the effect of noise, the obtained error can be clamped to a bounded range (in our experiments, the ratings lie between 1 and 5).
(2) The number of gradient descent iterations $T$ should be given in advance.
(3) According to the sequential composition property of DP, the noise at each iteration is calibrated to maintain $(\varepsilon / T)$-differential privacy, so that the overall SVD++ maintains $\varepsilon$-differential privacy after $T$ iterations.
Theorem 4. Given the differential privacy parameter $\varepsilon$ and the maximum value ($r_{\max}$) and minimum value ($r_{\min}$) in the "user-item" rating matrix, set $\Delta e = r_{\max} - r_{\min}$ and let the rating error in each iteration be $e_{ui} = r_{ui} - \hat{r}_{ui}$ ($r_{ui}$ is the raw rating and $\hat{r}_{ui}$ is the predictive rating). If the noise vector $\eta \sim \mathrm{Lap}(T \Delta e / \varepsilon)$, then Algorithm 1 provides $\varepsilon$-differential privacy after $T$ iterations.
Proof. First, the error is $e_{ui} = r_{ui} - \hat{r}_{ui}$, and the global sensitivity of the error, $\Delta e$, equals the largest possible difference between ratings, so $\Delta e = r_{\max} - r_{\min}$.
Second, over $T$ iterations, if the total differential privacy budget is $\varepsilon$, then the budget allocated at each iteration should be $\varepsilon / T$.
Third, $\eta$ is a noise vector added to $e_{ui}$ in each iteration, and its probability density is proportional to $\exp(-\varepsilon |x| / (T \Delta e))$. According to the Laplace mechanism, the new error becomes $\bar{e}_{ui} = e_{ui} + \eta$. Therefore, the error in each iteration maintains $(\varepsilon / T)$-differential privacy.
Finally, according to the sequential composition property of DP, Algorithm 1 provides $(T \cdot \varepsilon / T)$-differential privacy (i.e., it provides $\varepsilon$-differential privacy) after $T$ iterations.
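The per-iteration noisy error used by Algorithm 1 can be sketched as follows (a minimal sketch; the clamping to $\pm\Delta e$, the default rating range, and the hyperparameter values are our illustrative assumptions, not the paper's exact settings):

```python
import numpy as np

def private_error(r_ui, r_hat, r_max=5.0, r_min=1.0, epsilon=1.0, T=50, rng=None):
    """Clamped, Laplace-perturbed rating error for one iteration of DP-SGD.

    The error sensitivity is delta_e = r_max - r_min; the per-iteration
    budget is epsilon / T, giving a Laplace scale of T * delta_e / epsilon.
    """
    rng = rng if rng is not None else np.random.default_rng()
    delta_e = r_max - r_min
    e = np.clip(r_ui - r_hat, -delta_e, delta_e)   # bound the sensitivity
    return e + rng.laplace(0.0, T * delta_e / epsilon)
```

The noisy error returned here would replace the exact error in each SGD update, so the learned factors never see the raw residuals directly.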
4.2. Privacy-Preserving ALS for SVD++
Two approaches using DP for the design of privacy-preserving algorithms, namely, objective perturbation and output perturbation, were proposed in  and applied to logistic regression and SVM. Their experimental results showed that objective perturbation is optimal when balancing privacy protection and predictive accuracy. In this subsection, this approach is applied to the ALS optimization algorithm of SVD++. Algorithm 2 describes the process of ALS objective perturbation and Algorithm 3 describes the process of ALS output perturbation.
In the SVD++ model, considering the user's bias, the item's bias, and the implicit feedback from the items the user has rated, the predicted rating is given in Section 3.1, where the basic principle of ALS for solving SVD++ is also described. According to the principle of ALS, the raw objective function (formula (3)) becomes two convex optimization problems, one in the user factors with the item factors fixed and one in the item factors with the user factors fixed, each taken over the corresponding subsets of the raw observed ratings.
Then, the main idea of Algorithm 2 is to add noise to the objective function; that is, a linear noise term is added, where $\eta$ is a noise vector with $d$ components and $d$ is the number of features of $p_u$ or $q_i$. To solve the convex optimization problem, the idea of ERM  is used. So, from formula (13), we can obtain formulas (14) and (15).
According to Algorithm of , the regularization terms and avoid overfitting after perturbation, where is determined by the privacy parameter and the slack term parameter .
The ALS objective functions for SVD++ are convex and differentiable, so they satisfy the application conditions of Algorithm of . In this paper, our Algorithm 2 describes the DP protection process of ALS objective perturbation to solve for the latent factors of SVD++.
Regarding Algorithm 2, a few explanatory points should be stated as follows:
(1) First, to deduce and compute the value of the parameter in steps () and (), the value of is set to 2. The specific deduction process is similar to the deduction applied in logistic regression (Corollary ) and SVM (Corollary ) from .
(2) To solve for the values of $p_u$ and $q_i$ after objective perturbation, that is, to solve for the partial derivatives of formulas (14) and (15), respectively, where $n$ indicates the number of users and $m$ indicates the number of items in the raw matrix, the key steps are as follows.
When and , we can obtain
Then, we have where , and is a identity matrix.
Then, fixing and solving , we have
Similarly, given a fixed , when , we can solve as follows:where , .
Theorem 5. Given the differential privacy parameter and the parameter for computing the slack term , if , , and the loss functions of ALS are convex and differentiable, Algorithm 2 provides -differential privacy.
Proof. Our Algorithm 2 satisfies the application condition of Algorithm in , which was proven to provide -differential privacy; thus our Algorithm 2 also provides -differential privacy.
Another privacy-preserving ALS algorithm of SVD++ is the ALS output perturbation method, which is shown in Algorithm 3.
In the objective function of ALS (i.e., formula (11)), each user vector and item vector can be obtained by solving the corresponding risk minimization problem. The main idea of Algorithm 3 is that it guarantees DP by adding a random noise vector to the outputs $p_u$ and $q_i$.
Regarding Algorithm 3, a few explanatory points should be stated as follows:
(1) The stated bounds are the upper bounds on the norms of the user and item solution vectors. Because these bounds are the $L_2$-sensitivity values, the global sensitivities of the two outputs can be obtained from them.
(2) According to the Laplace mechanism, for a fixed matrix $Q$, a random noise vector with the corresponding pdf is generated and added to each user vector; for a fixed matrix $P$, a random noise vector is generated likewise for each item vector.
(3) For the ALS objective function of SVD++ (formula (11)), we have Corollary 6 and Theorem 7 as follows.
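The output-perturbation step can be sketched as sampling a noise vector whose density is proportional to $\exp(-\|\eta\|/\beta)$ and adding it to each learned factor vector (a minimal sketch following the ERM output-perturbation recipe of Chaudhuri et al.; the scale $\beta = 2/(n \lambda \varepsilon)$ and the function name are our illustrative assumptions and may differ from the paper's exact constants):

```python
import numpy as np

def perturb_vector(v, epsilon, n, lam, rng=None):
    """Add a noise vector with density proportional to exp(-||eta|| / beta),
    beta = 2 / (n * lam * epsilon): uniform direction, Gamma-distributed norm."""
    rng = rng if rng is not None else np.random.default_rng()
    d = len(v)
    beta = 2.0 / (n * lam * epsilon)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)   # uniform direction on the unit sphere
    norm = rng.gamma(shape=d, scale=beta)    # ||eta|| ~ Gamma(d, beta)
    return v + norm * direction
```

Sampling the norm and direction separately is the standard way to draw from this high-dimensional density; larger regularization `lam` or more ratings `n` lowers the sensitivity and hence the noise.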
Corollary 6. Let refer to the rating of user for item . The predictive rating in SVD++ is . is differentiable and -strongly convex and the loss function is convex and differentiable with . Then, the -sensitivity of is at most .
Proof. Let there be two rating matrices that differ in the value of the last entry:
Moreover, let
Second, due to the convexity of and the -strong convexity of , is -strongly convex.
In addition, due to the differentiability of and , and are also differentiable at all points. Then, we have
Then, the equation can be obtained. Hence, the -sensitivity of is less than or equal to . The proof now follows by an application of Lemma of .
Similarly, the -sensitivity of is at most .
Theorem 7. Let refer to the rating of user for item . The predictive rating in SVD++ is . and are differentiable and -strongly convex and the loss function is convex and differentiable with . Then, Algorithm 3 provides -differential privacy.
Proof. The proof of Theorem 7 follows from Corollary 6 and .
(1) According to the proof of Corollary 6, if the conditions on and the loss function hold, the -sensitivity of with the regularization parameter is at most .
(2) When is picked from the distribution , where , for a specific vector , the density at is proportional to
(3) Let and be any two rating matrices that differ in the value of the last entry. Then, for any , we have