Abstract

With the popularity of big data, the sheer volume of available data makes it harder for people to obtain useful information, which has given rise to the Recommender System. However, Recommender Systems still face great challenges in privacy and accuracy. To address these challenges, we propose an efficient personalized recommendation scheme based on Federated Learning with similarity calculation over ciphertext. In this paper, we first design a Similarity calculation algorithm based on Orthogonal Matrix in Ciphertext (SOMC), which can compute the similarity between users' demands and Items' attributes under ciphertext at a low computational cost. Based on SOMC, we construct an efficient recommendation scheme by employing the Federated Learning framework. The key feature of the proposed approach is that it improves recommendation accuracy while ensuring the privacy of both users and Agents. Furthermore, Agents with good performance are selected according to their Reliability scores to participate in the federated recommendation, further improving recommendation accuracy. Under the defined threat model, we prove that the proposed scheme meets the privacy requirements of users and Agents. Experiments show that the proposed scheme achieves better accuracy and efficiency than existing schemes.

1. Introduction

With the rise of big data, information overload means people receive more and more useless information, which has made the Recommender System increasingly popular. The purpose of a Recommender System is to provide users with personalized online product or service recommendations, and it has become an important way to resolve information overload, bringing both opportunities and challenges to education, medicine, and other industries. However, current Recommender Systems carry many potential risks, of which privacy disclosure is one of the primary concerns [1]. In general, a Recommender System consists of two parts: the recommender server and the users. To obtain a superior recommendation model, the traditional Recommender System uses a central architecture and usually collects a large amount of feedback information such as user preferences [2]. But this information is often sensitive, and collecting it can lead to serious privacy and security risks: the users' raw data may be disclosed through the feedback information of some programs [3]. For example, from nothing more than records of which movies users watch, a Recommender System can infer private information (e.g., age, income, medical history). In addition, a Recommender System may collect users' personal data and share it with a third party for profit [4]. Once this information is abused, the consequences are unimaginable.

As a result, people are increasingly concerned about their data privacy and hope that their private information will not become known to Internet applications. Existing research offers many methods to protect data privacy, such as anonymity, differential privacy, homomorphic encryption, and Federated Learning (FL). Federated Learning is a popular tool for reducing privacy risks and has therefore received increasing attention. For example, [5] uses FL to protect the privacy of users' healthcare data in big data scenarios. In [6], a recommendation scheme is proposed based on the FL framework; it defines multiple agents for collaborative recommendation and uses the FL framework both to protect users' data privacy and to make the recommended items more reliable. However, in the system model of that scheme, the cloud is assumed to be completely trusted, which is difficult to achieve in practice, and all records are stored in the cloud in plaintext, so there is still a risk of exposing privacy.

In other respects, the concept of similarity is often introduced to achieve better recommendations [7, 8]. Generally, to ensure the accuracy of recommended items, many Recommender Systems need to calculate two parameters [9]. One is the similarity between users' demands and the attributes of recommended items; the other is the users' evaluations of the recommended items. For the former, higher similarity implies that an item is more appropriate for the user's needs. For the latter, a higher evaluation gives an item higher recommendation priority. After receiving recommended items, the users evaluate them, i.e., submit feedback scores. The Recommender System collects these scores and calculates the reliability of the recommendation agents according to the evaluations.

Obviously, it is essential to protect users' privacy in current Recommender Systems. When users submit demands or evaluations, they wish this information to be protected, because users' demands and evaluations are usually privacy-related [10]. Some feasible solutions have been proposed, such as [9, 11], but they have problems: the encryption methods are too complicated and the computational load too heavy, and the server is assumed to be fully trusted, which is difficult to achieve in reality.

To address the above challenges, we first set the cloud to be semi-trusted, meaning it may try to spy on users' privacy; this makes our scheme more practical. Then we take similarity as the criterion for selecting recommendation items, improving recommendation accuracy. Next, each user evaluates the recommended items after receiving them, and the Cloud server assesses the reliability of the recommendation Agents according to these evaluations. Last, our scheme uses the FL framework to improve recommendation accuracy and protect users' privacy. Table 1 compares our scheme with related recommendation studies.

In this paper, we focus on how to perform a more secure and effective recommendation process within a Federated Learning framework. Our contributions are summarized as follows: (i) We design a Similarity calculation algorithm based on Orthogonal Matrix in Ciphertext (SOMC), which not only reduces the calculation overhead but also ensures users' privacy and security. (ii) Based on SOMC, we construct an efficient recommendation scheme by employing the Federated Learning model, named Efficient Recommendation Based on Federated Learning (ERBFL). ERBFL can securely aggregate the recommendation weights from multiple Agents and calculate the similarity between users' demands and Items' attributes under ciphertext, improving recommendation accuracy while ensuring the privacy of both users and Agents. Moreover, Agents with good performance are selected according to their Reliability scores to participate in the federated recommendation, further improving recommendation accuracy. (iii) Under the defined threat model, we prove that the proposed scheme meets the privacy requirements of users and Agents. In addition, we conduct experiments showing that our scheme achieves better accuracy and efficiency than existing schemes.

The rest of this paper is organized as follows. State-of-the-art solutions to the privacy protection problem of Recommender Systems are described in Section 2. We present the preliminaries of the proposed work in Section 3. The ERBFL scheme is discussed in detail in Section 4, and its security analysis is presented in Section 5. A comprehensive performance evaluation is given in Section 6. Conclusions are drawn in Section 7.

2. Related Work

In recent years, Recommender Systems have helped solve the problem of information overload while providing personalized information retrieval [14]. However, to improve recommendation efficiency, the system requires users' personal information, which is a serious privacy concern for many users [15, 16]. Recent research indicates two approaches to the privacy problem of Recommender Systems: architecture-based and algorithm-based [17].

Architecture-based solutions usually exploit distributed data storage to minimize the threat of data leakage, but some existing schemes [18, 19] still risk leaking users' privacy and increase the computing burden of local devices. Algorithm-based solutions differ in that they usually utilize an encryption algorithm to protect the original sensitive data. Several studies have shown that encryption-based solutions can reduce the risk of leaking users' privacy [17]. Researchers have proposed various privacy-preserving recommendation solutions, such as [20–22]. Erkin et al. [20] prevented malicious users and servers from accessing privacy-sensitive data by using homomorphic encryption. Kaur et al. [21] handled arbitrarily distributed data based on two techniques, multi-party random masking and polynomial aggregation, which can improve the efficacy of the recommender system while protecting users' privacy. Ma et al. [22] realized privacy-preserving friend recommendation by utilizing the social attributes and trust relationships of online social network users. However, these schemes all incur heavy computational overhead.

Besides, to ensure the accuracy of recommended items, current Recommender Systems take similarity as a reference parameter for recommendation. In [23], the similarity between users is calculated to obtain more accurate recommendation results. In [24], the similarity between users and items is calculated. To further improve accuracy, an asymmetric similarity method was proposed in [25], where weighted schemes made recommendations more accurate by taking into account the varying influence of each factor. But this raises privacy issues: the similarity calculation in the recommendation process involves the user's private information, so the recommender server may learn the user's personal interests, which contain sensitive information. Some studies have proposed solutions to this problem, such as [12, 13, 26]. Li et al. [26] proposed a privacy-preserving scheme for contextual recommendation in online social communications, which hides users' real identities from other users via pseudonyms. Zhang et al. [12] used the BGN cryptosystem to protect users' privacy in the recommendation process, exploiting its homomorphic property to calculate the similarity between two users in the ciphertext domain. However, that scheme places a heavy computing burden on the user side because it requires bilinear pairing computations. In addition, Xu et al. [13] proposed a privacy-preserving recommendation scheme focused on similarity and truthful evaluation. However, it uses a centralized framework and is therefore unsuitable for scenarios with multiple recommenders. FL can solve this problem very well, and many existing studies utilize FL to achieve collaboration while protecting data privacy. In [27], a scheme called EPPDA was proposed that resists reverse attacks by utilizing secret sharing, together with an efficient privacy-preserving data aggregation method for FL.

To address the above privacy problems and improve accuracy, the Federated Recommendation System (FedRec) was proposed. The goal of FedRec is for multiple parties to collaborate to complete a more efficient and accurate recommendation process without directly accessing each other's private data [4]. To perform recommendation in multi-data-owner scenarios, Zhou et al. [6] implemented a privacy-preserving contextual recommendation that uses the Federated Learning framework to protect users' privacy and increase recommendation efficiency. However, in that scheme the central server is assumed to be fully trusted, which is hard to meet in practice [28], and the central server processes sensitive data in plaintext, which raises privacy risks [29, 30]. For instance, a malicious Cloud server may apply gradient inference attacks or model inversion attacks to compromise clients' sensitive data, as described in [31, 32]. Hence, schemes that assume a fully trusted Cloud server are impractical.

3. Preliminaries

3.1. Orthogonal Matrices

The orthogonal matrix is an important type of matrix [33]. When the product of a matrix and its transpose is the identity matrix, the matrix is called an orthogonal matrix. Such matrices are useful in many scientific applications involving vectors.

Suppose $X$ and $Y$ are orthogonal matrices, which can also be represented by $X, Y \in \mathbb{R}^{n \times n}$. The symbol $\mathbb{R}^n$ represents the (infinite) set of real vectors with $n$ components, and $(\cdot)^T$ is the transpose symbol. $X$ is an orthogonal matrix if and only if:

$$X^T X = X X^T = E.$$

Meanwhile, let $X^T$ and $Y^T$ be the transposes of $X$ and $Y$, respectively. Then they have the following properties: (i) $X^T X = E$, $Y^T Y = E$; (ii) $X X^T = E$, $Y Y^T = E$; (iii) $X^{-1} = X^T$; (iv) $Y^{-1} = Y^T$;

where $E$ is the identity matrix of order $n$, and $X^{-1}$ denotes the inverse of matrix $X$.

Interestingly, the inverse and the transpose of an orthogonal matrix coincide, which makes inverting an orthogonal matrix simple and fast. In light of this, it is efficient to use orthogonal matrices as cipher keys.
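These properties are easy to check numerically. The sketch below generates a random orthogonal matrix (via QR decomposition, an illustrative choice; the paper's key generation may differ) and verifies the identities above:

```python
import numpy as np

def random_orthogonal(n, seed=0):
    # QR decomposition of a random Gaussian matrix yields an
    # orthogonal factor Q (illustrative key-generation stand-in).
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

n = 4
X = random_orthogonal(n)
E = np.eye(n)

# Defining property: X^T X = X X^T = E.
assert np.allclose(X.T @ X, E)
assert np.allclose(X @ X.T, E)

# The inverse equals the transpose, so no costly matrix
# inversion is needed -- a transpose suffices.
assert np.allclose(np.linalg.inv(X), X.T)
```

Because inversion reduces to a transpose, decryption with an orthogonal key costs only a matrix-vector product.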

3.2. Differential Privacy

Differential privacy (DP) is a promising technology for solving privacy problems [34]. We adopt the local version of DP for users, which can protect users' privacy under an untrusted Cloud server and Agents. We introduce several definitions of differential privacy as follows [35]:

Definition 1 ($\epsilon$-Differential Privacy, $\epsilon$-DP). A random mechanism $M$ is said to be $\epsilon$-indistinguishable if for all pairs of data sets $D, D'$ that differ in only one entry, for all adversaries, and for all transcripts $t$, where $D$ is a $p$-dimensional vector data set:

$$\Pr[M(D) = t] \le e^{\epsilon} \cdot \Pr[M(D') = t].$$

In this definition, the privacy parameter $\epsilon$ is predefined to control the privacy budget: the smaller $\epsilon$ is, the stronger the privacy protection.

Definition 2 ($\ell_1$-Sensitivity). Assume that $f$ is a numeric query function mapping a data set into a $p$-dimensional real space, $f: \mathcal{D} \to \mathbb{R}^p$. For any pair of adjacent data sets $D$ and $D'$, the sensitivity of $f$ is defined as:

$$\Delta f = \max_{D, D'} \|f(D) - f(D')\|_1,$$

where $\|\cdot\|_1$ denotes the $\ell_1$ norm.
Theorem (Random-Laplace Mechanism). Let $\epsilon > 0$, and let $f$ be a numeric query function mapping a $p$-dimensional domain to the $p$-dimensional real space, $f: \mathcal{D}^p \to \mathbb{R}^p$. The mechanism $M$ randomly selects $\theta$ elements of the vector $f(D)$, where $\theta$ satisfies $1 \le \theta \le p$, and perturbs each selected element with noise drawn from the Laplace distribution with scale parameter $b$, whose density function is

$$p(x \mid b) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right),$$

where $b = \Delta f / \epsilon$ is bounded by the privacy budget $\epsilon$ and the sensitivity $\Delta f$ of the function $f$; the unselected elements are left unchanged. $M$ provides $\epsilon$-Differential Privacy on the selected coordinates.
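The mechanism can be sketched as follows (a hedged illustration; the function name and signature are ours, not the paper's). A Laplace$(b)$ sample is generated as the difference of two independent Exp$(1)$ samples scaled by $b$:

```python
import random

def random_laplace(vec, theta, sensitivity, epsilon, rng=None):
    """Add Laplace(b) noise, b = sensitivity / epsilon, to theta
    randomly chosen coordinates of vec; the remaining coordinates
    are left untouched."""
    rng = rng or random.Random(42)
    b = sensitivity / epsilon
    noisy = list(vec)
    for j in rng.sample(range(len(vec)), theta):
        # Difference of two Exp(1) variates, scaled by b, is Laplace(b).
        noisy[j] += b * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return noisy

demand = [1.0, 0.0, 2.0, 1.0, 3.0]
blinded = random_laplace(demand, theta=2, sensitivity=1.0, epsilon=0.5)
```

Only $\theta$ of the $p$ coordinates are perturbed, which keeps the expected error smaller than noising the entire vector.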

3.3. Digital Signature

A digital signature is used to prevent information from being tampered with [36]. A digital signature scheme is composed of the following algorithms:

$KeyGen(1^\lambda)$: It takes as input $1^\lambda$, where $\lambda$ is a security parameter, and outputs a key pair $(pk, sk)$.

$Sign(sk, m)$: It takes as input the private key $sk$ and the message $m$, and outputs the signature $\sigma$.

$Verify(pk, m, \sigma)$: It takes as input the public key $pk$, the message $m$, and the signature $\sigma$, and outputs 0 if the signature is invalid and 1 otherwise.
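The three-algorithm interface can be sketched as follows. As a hedged stand-in (the text fixes no concrete scheme here), we use an HMAC rather than a true public-key signature, so `keygen` returns the same key for both roles; the names `keygen`, `sign`, and `verify` are ours:

```python
import hashlib
import hmac
import os

def keygen(security_param=32):
    # Stand-in only: one symmetric key for both roles; a real
    # public-key scheme (e.g., ECDSA) returns distinct (pk, sk).
    k = os.urandom(security_param)
    return k, k          # (pk, sk)

def sign(sk, message: bytes) -> bytes:
    return hmac.new(sk, message, hashlib.sha256).digest()

def verify(pk, message: bytes, sigma: bytes) -> int:
    expected = hmac.new(pk, message, hashlib.sha256).digest()
    return 1 if hmac.compare_digest(expected, sigma) else 0

pk, sk = keygen()
sigma = sign(sk, b"encrypted demand vector")
assert verify(pk, b"encrypted demand vector", sigma) == 1
assert verify(pk, b"tampered message", sigma) == 0
```

Any tampering with the message invalidates the tag, which is the property the scheme relies on when the Cloud server checks incoming requests.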

3.4. Federated Learning

In order to improve the accuracy of predicting users' next input, Google built a horizontal federated model, first proposing the concept of Federated Learning (FL) in 2017 [37]. FL is a distributed deep learning framework that allows multiple clients, such as IoT and mobile devices, to train a model while the sensitive private data remains on each device. The joint model trained by the server is sent back to each client, and the client continues learning from the new model; this process iterates to obtain the optimal model [38]. Therefore, the client's privacy is protected to a certain extent.
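The training loop described above centers on server-side aggregation. A minimal sketch of one FedAvg-style round (our simplification: clients are represented only by their trained weight vectors and local data sizes) is:

```python
def fed_avg(client_weights, client_sizes):
    """Server step of one FL round: average the clients' local
    weight vectors, weighted by each client's local data size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

# Two clients return locally trained weights; client 2 holds 3x the data.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_model = fed_avg(clients, sizes)  # [2.5, 3.5]
```

The server then broadcasts `global_model` back to the clients, and the raw training data never leaves the devices.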

4. Design of Proposed Scheme

Before introducing the proposed scheme, we first define the basic symbols involved. We define as 's user group, and as the set of user groups: where is the number of Agents. Note that is a subgroup of based on geographic location, in which is the number of users included in . For each user , we define the vector as 's demand vector, where refers to an attribute of 's demand.

Each has a corresponding , and holds an Item set , where is the size of 's Item set. We define the vector as 's attribute vector, where refers to an attribute of . Besides, 's recommendation weight matrix consists of weight parameters , denoted as , which is a diagonal matrix. Note that each represents the impact factor of the -th attribute on the recommended item. By the properties of diagonal matrices, we know .

4.1. System Architecture

As shown in Figure 1, our system consists of four components: users, Agents, the Cloud server, and the Trusted authority (TA). (i) User: the user submits an encrypted demand vector to an Agent and makes a recommendation request. After receiving the recommended Item, the user sends an encrypted feedback score to the Agent. (ii) Agent: an Agent has its own user group and Item set. It recommends appropriate Items to users in its user group after receiving their requests. (iii) Cloud server: it is considered honest-but-curious. It is responsible for calculating the similarity between users' demands and Items' attributes, and for calculating Agents' Reliability scores after receiving requests. It follows the proposed scheme when executing received requests, but we do not exclude the possibility that it will pry into users' privacy. Additionally, the server is not allowed to collude with Agents. (iv) Trusted authority (TA): it is a fully trusted third party responsible for generating and distributing keys to users, Agents, and the Cloud server.

Specifically, all users submit their recommendation requests to Agents in ciphertext, and the Agents cooperate with the Cloud server for recommendation. According to the Agents' Reliability scores, the server judges whether it needs to select several Agents to form a group for federated recommendation. After the recommendation is finished, the users give encrypted feedback scores to the Agents, the Agents forward them to the Cloud server, and the Cloud server updates the Agents' Reliability scores.

4.2. Similarity Calculation Algorithm Based on Orthogonal Matrix in Ciphertext

To improve the accuracy of recommended items, we need to calculate the similarity between users' demands and items' attributes. Meanwhile, to strengthen the privacy preservation of the recommendation, users' demands and items' information should be compared in ciphertext form. Therefore, we propose an efficient Similarity calculation algorithm based on Orthogonal Matrix in Ciphertext (SOMC), which protects sensitive information with lightweight encryption. SOMC comprises three algorithms, KeyGen, Enc, and Eval, detailed as follows.

SOMC.KeyGen : It takes as input , where is the dimension of the matrix, and outputs the secret keys and : (i) It first takes as input and generates orthogonal matrices and as follows: where . (ii) Then, and are generated according to the following formulas: and .

SOMC.Enc : It takes as input a secret key and a plaintext , where is or , and is a matrix (weight matrix) or an -dimensional vector (demand or item vector), and outputs the ciphertext . The encryption process is as follows:

SOMC.Eval : On input the secret key and the ciphertexts and , where and are obtained by SOMC.Enc, it outputs the similarity between (item vector) and (demand vector). Note that and are -dimensional vectors. The calculation process is as follows:

Therefore, we can calculate the similarity between and without obtaining their plaintexts.
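While the full SOMC construction also blinds a diagonal weight matrix with a second key, the core idea can be illustrated by the invariance of inner products under an orthogonal transform: encrypting both vectors with the same secret orthogonal key leaves their similarity computable from the ciphertexts alone. A minimal sketch under that assumption (names are ours):

```python
import numpy as np

def keygen(n, seed=1):
    # Secret orthogonal key from the QR decomposition of a random matrix.
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return X

def enc(X, v):
    # "Ciphertext" of a vector: rotate it by the secret key.
    return X @ v

def eval_similarity(c_demand, c_item):
    # Inner product is invariant under the orthogonal transform:
    # (X d) . (X a) = d^T X^T X a = d . a
    return float(c_demand @ c_item)

X = keygen(3)
d = np.array([1.0, 0.0, 2.0])   # demand vector
a = np.array([0.5, 1.0, 1.0])   # item attribute vector
assert np.isclose(eval_similarity(enc(X, d), enc(X, a)), d @ a)
```

The evaluator sees only rotated vectors, yet obtains the exact plaintext similarity, which is the property SOMC exploits.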

4.3. Efficient Recommendation Scheme Based on Federated Learning
4.3.1. System Initialization

This phase is divided into two parts: first, TA generates and distributes keys; then, the Agents encrypt attribute information. For ease of description, we take and users as an example.

Firstly, TA uses the algorithm SOMC.KeyGen described in the previous section to generate keys and system parameters as follows: (i) Step 1: TA exploits algorithm SOMC.KeyGen to generate , and then distributes {} to , {} to the Cloud server, and {} to each user over a secure channel. Note that for and , if , then , . Besides, each has the same . (ii) Step 2: TA generates a unique identifier for each Agent and each Item. We define as 's identifier and as Item 's identifier. (iii) Step 3: On input , where is the security parameter, TA runs the signature algorithm to generate . Then TA distributes to and to the Cloud server, and publishes . (iv) Step 4: TA sets each 's Reliability score to 0 and its update count to 0.

Secondly, , which has its own Item set , generates the encrypted Item information for : and then uploads to the Cloud server for storage.

4.3.2. Item Recommendation

In this subsection, we introduce how to recommend a suitable Item to a user when the Agent cooperates with the Cloud server. The whole process can be divided into three parts: (1) User sends demand; (2) The Cloud chooses Agents; (3) Item choosing and recommendation. The pseudo-code of the Item Recommendation is given in Algorithm 1.

Input: , , , , , , and
Output:
(1)The Cloud server does:
(2)if then
(3) if then
(4);
(5) end
(6) else
(7)   while (the number of collected != j) do
(8)   choose Agents which satisfy:
   collecting from ;
(9)  end
(10)  for do
(11)   calculate following equation (13);
(12)  end
(13)  calculate following equation (14);
(14) end
(15) calculate following equation (15)–(17);
(16) choose : ;
(17) Calculate ;
(18)end
(19)The Agent does:
(20)if then
(21)SOMC.Enc;
;
 send to the Cloud server
(22)end
(23)return

(1) User sends demand. To avoid disclosing user needs, which contain a large amount of private information, user exploits SOMC.Enc to encrypt message :

To resist brute-force attacks [39], we apply a Laplace mechanism [35] by adding noise to . Although this noise introduces an error, our later experiments show that the error is within an acceptable range. First, randomly selects elements of to form a set , where satisfies the following condition: , and then adds Laplace noise: where is the Laplace parameter. For simplicity, we express the noise as a vector . We can get . Then generates a signature .

Afterward, user submits to .

After receiving , exploits SOMC.Enc to encrypt its weight matrix: and sends , together with a request for collaboration on the recommendation, to the Cloud server.

(2) The Cloud chooses Agents. After receiving 's request, the Cloud server verifies the signature by performing the algorithm . If verification fails, the Cloud server refuses the request; otherwise, it determines whether is eligible to make a recommendation alone or needs a federated recommendation, according to its Reliability score . The higher is, the more credible the items recommends.

If , where is a threshold set by TA, the Cloud server considers that can recommend credible items to users alone, and goes straight to (3) Item choosing and recommendation. Otherwise, the Cloud server needs to choose some other Agents to participate in the federated recommendation. The specific steps are as follows: (i) Step 1: When , the Cloud server first sorts all Agents in descending order of Reliability score, then selects the Agents with higher scores to form a set under the following threshold condition: , where is set by TA. We denote by the -th largest Reliability score among the Agents. (ii) Step 2: For each , the Cloud server calculates , where is a random number, then sends and a recommendation request to each . (iii) Step 3: After receiving the message from the Cloud server, exploits SOMC.Enc to encrypt its own weight matrix, , and sends to the Cloud server. (iv) Step 4: After receiving all encrypted weight matrices from , the Cloud server calculates for each using the key shared between the Cloud server and the Agent:

Note that if the Cloud server finds some weight matrices missing (according to the ) while collecting , it will select other Agents and repeat the previous step.
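Step 1 above can be sketched as follows; `choose_agents`, `j`, and `tau_f` are our illustrative names for the selection count and the TA-set threshold:

```python
def choose_agents(reliability, j, tau_f):
    """reliability: dict mapping agent id -> Reliability score.
    Sort Agents in descending order of score, keep only those
    meeting the threshold tau_f, and select at most j of them."""
    ranked = sorted(reliability.items(), key=lambda kv: kv[1], reverse=True)
    return [aid for aid, r in ranked if r >= tau_f][:j]

scores = {"A1": 9.0, "A2": 4.0, "A3": 12.0, "A4": 7.5}
assert choose_agents(scores, j=2, tau_f=5.0) == ["A3", "A1"]
```

Only high-reliability Agents enter the federated group, so a low-scoring Agent such as "A2" is excluded even before the count cap applies.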

Then the Cloud server aggregates the weights for the federated recommendation: where denotes the sum of all Reliability scores of .

We define , then we can get .
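The reliability-weighted aggregation can be sketched as follows (diagonal weight matrices are represented by their diagonal vectors, and the names are ours): each Agent's weights are scaled by its share of the total Reliability score before summing.

```python
def aggregate_weights(weight_vecs, reliabilities):
    """Federated weight: sum over Agents k of (R_k / sum_R) * W_k,
    with each diagonal weight matrix W_k given by its diagonal."""
    total = sum(reliabilities)
    dim = len(weight_vecs[0])
    return [sum(r * w[i] for w, r in zip(weight_vecs, reliabilities)) / total
            for i in range(dim)]

# Agent 2 has 3x the Reliability of Agent 1, so its weights dominate.
W = [[0.0, 1.0], [1.0, 0.0]]
R = [1.0, 3.0]
agg = aggregate_weights(W, R)  # [0.75, 0.25]
```

In the actual scheme this sum is computed over ciphertext weight matrices; the arithmetic shown here is the plaintext analogue.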

(3) Item choosing and recommendation. The process is the same in this step whether recommends alone or a federated recommendation is made; therefore we use to represent both and , the weight matrices in ciphertext form. According to , , we define .

For more accurate recommendation, the Cloud server finds and computes the similarities as follows:

Then, according to , the cloud selects the Item with the best similarity and calculates: where is the unique identifier of . It then sends to , and forwards the recommendation to the user .
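The final selection step reduces to an argmax over ciphertext-domain similarity scores. A minimal sketch (with hypothetical item identifiers) is:

```python
def recommend(c_demand, encrypted_items):
    """encrypted_items: dict mapping item id -> encrypted attribute
    vector. Return the id whose (ciphertext-domain) inner product
    with the encrypted demand vector is largest."""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    return max(encrypted_items, key=lambda i: dot(c_demand, encrypted_items[i]))

items = {"I1": [0.1, 0.9], "I2": [0.8, 0.2]}
assert recommend([1.0, 0.0], items) == "I2"
```

Because the orthogonal encryption preserves inner products, the ranking computed on ciphertexts matches the plaintext ranking.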

4.3.3. Update Reliability Score

After receiving the recommendation, the user scores the recommended items and sends the scores to the Cloud server as feedback; the Cloud server then updates 's Reliability score according to the feedback. The whole process can be divided into two parts: (1) User scores for recommendation; (2) The Cloud calculates Reliability score. The pseudo-code of the Reliability score update is given in Algorithm 2.

(1) User scores for recommendation. First, the user verifies the signature by performing the algorithm . If verification fails, refuses the result; otherwise, generates a feedback score matrix by calculating the square root of the scores: , where represents the score for the -th attribute of the demand. Each score takes a value from 1 to 5.

To keep the Cloud server from knowing 's scores, adds noise to the feedback score matrix . We define a set containing scores selected by from the set , where satisfies the following condition: .

Then adds noise to the feedback score matrix to get : where is a random number. Then generates an indicator vector :

Then we can get , where

To prevent tampering, calculates as follows: and then sends to .
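Our reading of the noise-and-indicator construction can be sketched as follows; in the real scheme both the blinded scores and the indicator vector are additionally encrypted before being sent:

```python
import random

def blind_feedback(scores, theta, rng=None):
    """Replace theta randomly chosen entries of the score vector with
    random dummy values; the indicator vector q marks the genuine
    entries with 1 and the dummies with 0."""
    rng = rng or random.Random(7)
    noisy = list(scores)
    q = [1] * len(scores)
    for j in rng.sample(range(len(scores)), theta):
        noisy[j] = rng.uniform(1, 5)  # dummy score
        q[j] = 0
    return noisy, q

def true_total(noisy, q):
    # The inner product <noisy, q> recovers the sum of genuine scores.
    return sum(s * m for s, m in zip(noisy, q))

noisy, q = blind_feedback([3, 4, 5, 2], theta=2)
```

A party holding only the blinded scores learns nothing about which entries are genuine; a party holding both vectors recovers only the total, not the per-attribute preferences.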

(2) The Cloud calculates Reliability score. After receiving a number of from , calculates for each , where is defined as the set of users who sent feedback to . When the amount of feedback exceeds , which is set by TA, sends to the Cloud server.

After receiving , the Cloud server calculates to get :

Then the Cloud server verifies each signature by performing the algorithm . If verification fails, the Cloud server punishes by giving it a low Reliability score; otherwise, it calculates and updates the Reliability score for each as follows: where is defined as the number of users who sent feedback to , and is the number of times has been updated.
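Since the exact update formula is elided above, the following is only a plausible running-average sketch (an assumption of ours, not the paper's formula) of how a Reliability score could be folded together with an update counter:

```python
def update_reliability(r_old, t, feedback_scores):
    """Plausible running-average update: fold the mean of the new
    feedback totals into the score accumulated over t prior updates.
    (Illustrative assumption; the paper's formula may differ.)"""
    avg = sum(feedback_scores) / len(feedback_scores)
    return (r_old * t + avg) / (t + 1), t + 1

r, t = update_reliability(0.0, 0, [4, 5, 3])  # first round: r = 4.0, t = 1
```

Such a rule keeps the score bounded by the feedback range while letting a long good history dampen the effect of a single bad round.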

Input: , , ,
Output:
(24)Each does:
(25);
for do
(26) calculate following equation (22);
(27);
(28)if then
(29)  send to the Cloud server;
(30)  ;
(31) end
(32)end
(33)The Cloud server does:
(34)for each do
(35) for do
(36)  calculate following equation (23);
  if then
(37)   calculate following equation (24);
(38)  end
(39) end
(40) calculate following equation (25);
(41);
(42)end
(43)return

5. Security Analysis

In this section, we give a formal security proof of our proposed scheme; its security requirements include verifiability and privacy. We use a digital signature scheme proven secure in [36], so verifiability is guaranteed. The privacy of our proposed scheme is proved under the following threat model: (i) The Cloud server and Agents are honest-but-curious: we assume that the Agents and the Cloud server follow the protocol but may be curious about users' sensitive information. (ii) Agents do not collude with the Cloud server.

Privacy comprises users' data privacy and Agents' parameter privacy. We prove users' data privacy in Theorem 1 and Agents' parameter privacy in Theorem 2.

Theorem 1. Under the above threat model, our scheme meets the requirement of users’ data privacy.

Proof. In our scheme, users' privacy is involved in Item Recommendation and in the Reliability score update, so our proof is divided into two parts: (i) According to the Item Recommendation process in Algorithm 1, no additional information about the user (we take as an example) is sent except for the encrypted demand vector . If the attacker wants to recover the user's private demand information from , he needs to obtain both and . So we can set up the following equation to solve: .
However, ; if the attacker is , he has but not . Similarly, if the attacker is the Cloud server, he has but not . That means the Agent and the Cloud server cannot obtain and at the same time, so they cannot recover and .
Then we consider the known-sample attack [40] and the brute-force attack [39]:
Suppose the attacker's purpose is to recover plaintexts from encrypted demand vectors, and the attacker obtains a set of plaintext demand vectors , but he does not know which is the corresponding ciphertext of .
The attacker can mount a brute-force attack, trying every possible and to recover . We can simplify the problem slightly by setting all elements of to zero; then we get , and the attacker only needs to recover .
Next we assume the worst-case scenario for : there are at least linearly independent vectors in , and . The attacker can solve by setting up the following equations: for to . If the Cloud server, who has , acts as the attacker and needs to try every possible X to recover : , then the attacker chooses vectors from to form a matrix such that . He has to try every possible linearly independent permutation of the encrypted demand vectors he has received to form a matrix such that . Then the attacker has . Note that is invertible since the are linearly independent. He picks a set of encrypted demand vectors at random and sets up the hypothesis that it contains the corresponding encrypted demand vectors in . He can then solve the equation for and choose some vectors to verify the hypothesis: if , the hypothesis cannot be correct; otherwise, it may be true.
However, this attack is exponentially expensive for the attacker, because there are possible candidates for . For each candidate, the attacker performs to verify whether the recovered key is correct, which takes validations. For example, if and the attacker can perform validations per second, then trying all hypotheses would take more than 460 years. Therefore, even our simplified scheme ( ) can resist this known-sample brute-force attack. Notably, in our scheme is a random vector in which both the number of non-zero elements and their values are random; this increases the blindness of the demand vector and makes the above calculation even more difficult. There is another scenario: as the attacker wants to recover 's demand vector, but this is even harder since he does not have .
Similarly, if , who has , acts as the attacker, he needs to try every possible Y to recover : ; the proof is similar to the above.
Therefore, in the Item Recommendation phase, the privacy of users' demands is guaranteed. (ii) According to the Reliability score update process in Algorithm 2, no additional information about is sent except for the encrypted feedback vector and matrix . Suppose the attacker wants to recover the user's preference information and the total feedback score from and , where the non-zero elements of mark the true scores in , marks which scores in the feedback are true, and the sum of the elements of represents the total feedback score. He needs to obtain , and , so we can set up the following equations to solve: , .
However, , who has , knows only the ciphertext and the semi-decrypted result . He cannot learn or the sum of the elements of , because , and he cannot obtain without . Meanwhile, is encrypted in the same way as the demand vector in Algorithm 1, so he cannot recover it; therefore, he cannot recover from and . As for the total feedback score, define a -dimensional vector with all elements equal to 1; the attacker can compute the sum of by calculating , but the result is not the total feedback score, because contains random numbers. The total feedback score can only be obtained by calculating , where . Since he cannot learn , he cannot obtain the total feedback score. This means he can learn neither 's preference information nor the total feedback score.
Similarly, the Cloud server who has as attacker only knows the ciphertext and the semi-decrypted result . He cannot learn , but he can obtain the sum of the elements in to update the Reliability score for . This is because : he can recover , but he cannot recover , which is encrypted by the same method as in Algorithm 1. He only obtains , and the encryption method is the same as that of the demand vector in Algorithm 1. Therefore, he cannot recover , which contains ’s preference information, but he can obtain the total score of the feedback: . This means that he can learn the total score of the feedback but not ’s preference information.
Therefore, in the Update Reliability score phase, the preference privacy of users is guaranteed.(iii)To sum up, our scheme meets the requirement of users’ data privacy.

Theorem 2. Under the above threat model, our scheme meets the requirement of Agents’ parameter privacy.

Proof. Depending on our scheme, Agents’ parameter privacy is reflected in Item Recommendation.
According to the Item Recommendation process in Algorithm 1, no additional information about the Agent (we use as an example) is sent except for the encrypted weight matrix . We consider that the attacker is the Cloud server or another Agent, and he wants to recover . The encryption method is the same as that mentioned in Theorem 1, so the attacker cannot recover . The proof procedure is the same as in Theorem 1 and is not repeated here.
Thus, our scheme meets the requirement of Agents’ parameter privacy.

6. Evaluation

In this section, we conduct simulation experiments to compare the performance of the proposed SOMC and ERBFL schemes with existing schemes. We conduct all experiments on a server with an Intel(R) Core(TM) i7-10700 CPU @ 2.90 GHz and 16 GB of RAM, and use TensorFlow in Python to simulate the SOMC and ERBFL algorithms.

For the experimental parameters, we choose appropriate security parameters and given the initial parameters and . We can get , so the full Reliability score is 15. To facilitate testing, we choose the parameters , and .

6.1. Efficiency Comparison

To show the computation costs of encryption (SOMC) in ERBFL, we compare our scheme with PPO-NBR [12] and EPRT [10], both of which use homomorphic encryption. We compare the running time of encrypting vectors against a varying number of Items. As shown in Figure 2, the running time of all three schemes increases as the number of Items grows, but our scheme takes the least time regardless of the number of Items. This is because the encryption in PPO-NBR can only process a single bit per encryption and EPRT can process a single integer, whereas our scheme can process multiple integer messages at once. Thus, our encryption scheme runs faster and more efficiently.

Next, we conduct experiments to compare the running time of the recommendation phase in our scheme with that of PPO-NBR [12] and EPRT [10], as shown in Figure 3. As the number of Items increases, the running time of all schemes increases; however, the running time of our scheme is always the lowest.

6.2. Impact of Laplace Noise on Similarity Accuracy

To achieve better privacy protection, we add differential-privacy Laplace noise during encryption, which requires us to consider its impact on the recommended items. In fact, the accuracy of the recommended items can be derived from equation (17), so the impact is equivalent to the impact on the similarity calculation. We therefore conduct experiments under three scenarios: noise added after encryption, noise added before encryption, and noiseless encryption, as shown in Figure 4. The effect of adding noise after encryption is almost identical to that of no noise, so the effect of our ERBFL scheme on the similarity calculation is negligible; this noise therefore does not affect the accuracy of recommendations in our scheme.
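The deviation introduced by pre-encryption noise can be illustrated with a minimal plaintext sketch. This is not our SOMC ciphertext computation: the vectors, the privacy budget `epsilon`, and the use of cosine similarity as the similarity measure are all illustrative assumptions for measuring how Laplace noise perturbs a similarity score.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between a demand vector and an attribute vector."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical demand/attribute vectors (the real scheme operates on
# encrypted vectors; here we only illustrate the noise effect in the clear).
demand = rng.integers(1, 6, size=50).astype(float)
attrs = rng.integers(1, 6, size=50).astype(float)

epsilon = 1.0  # assumed privacy budget for the Laplace mechanism
noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=demand.shape)

clean = cosine_similarity(demand, attrs)
noisy = cosine_similarity(demand + noise, attrs)
print(abs(clean - noisy))  # deviation caused by pre-encryption noise
```

Repeating this for many random vectors gives a distribution of deviations analogous to the comparison in Figure 4.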

6.3. Recommendation Accuracy Comparison

We use users’ scores on the recommended items to indicate recommendation accuracy: the more accurate the recommended items, the more satisfied the users and the higher the feedback scores. We use the MovieLens-100k data set [41], which includes 100,000 movie ratings from about 900 users on over 1600 movies, where all scores range from 1 to 5. Since the data set does not include every user’s score for every movie, we fill in each missing score with the mean of two known values: the average of the scores the corresponding user has given, and the average of the scores the corresponding item has received. We use these data as the users’ scores; the more satisfied a user is with a recommended item, the higher the score. We repeat the recommendation process many times, look up the corresponding score in the data set each time, and compare the average score across the different schemes. In addition, we use an evaluation indicator called accuracy, defined as the fraction of correct recommendations among the total number of recommendations; accuracy is commonly used to evaluate whether a Recommender System recommends accurate items, and higher accuracy means the system recommends items that users find more satisfying. Since the scores in the MovieLens-100k data set [41] all come from real users, we regard an item with a score above 4 as a correct recommendation. We repeat the recommendation process many times, count the number of correct recommendations, and compare the accuracy of the different schemes.
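The two preprocessing and evaluation steps above (mean-based imputation of missing scores, and accuracy with a score threshold of 4) can be sketched as follows. The matrix below is a toy example, not MovieLens data, and the helper names are our own.

```python
import numpy as np

def impute(ratings: np.ndarray) -> np.ndarray:
    """Fill zeros (missing ratings) with the mean of the corresponding
    user's average rating and the corresponding item's average rating."""
    filled = ratings.astype(float).copy()
    missing = filled == 0
    # Averages over known (non-zero) entries only.
    user_avg = filled.sum(axis=1) / np.maximum((~missing).sum(axis=1), 1)
    item_avg = filled.sum(axis=0) / np.maximum((~missing).sum(axis=0), 1)
    for u, i in zip(*np.where(missing)):
        filled[u, i] = (user_avg[u] + item_avg[i]) / 2
    return filled

def accuracy(recommended_scores, threshold: float = 4) -> float:
    """Fraction of recommendations whose real-user score exceeds the
    threshold, i.e., the 'correct recommendation' rate."""
    scores = np.asarray(recommended_scores, dtype=float)
    return float((scores > threshold).mean())

# Toy 3-user x 3-item rating matrix, 0 = missing:
R = np.array([[5, 0, 3],
              [4, 2, 0],
              [0, 5, 1]])
print(impute(R))
print(accuracy([5, 3, 4.5, 2]))  # 2 of 4 scores exceed 4 -> 0.5
```

In the toy matrix, the missing entry for user 0 and item 1 becomes (4 + 3.5) / 2 = 3.75, the mean of that user’s and that item’s known averages.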

As shown in Figure 5, the items recommended by our scheme always receive a higher feedback score, and as shown in Figure 6, our scheme also always performs best in terms of accuracy. On the one hand, our scheme is designed on a federated learning framework, unlike EPRT [10] and PPO-NBR [12], which use centralized recommendation. On the other hand, our scheme uses similarity as the recommendation criterion, which differs from PPO-NBR [12]. Therefore, our scheme can recommend items that satisfy users better.

7. Conclusion

In this paper, we propose ERBFL, an Efficient Recommendation Based on Federated Learning that helps users find appropriate items. To protect privacy and improve recommendation accuracy, we design a privacy-preserving ciphertext calculation for similarity and build the recommendation scheme on the Federated Learning framework. We analyze the security of our scheme, showing that it achieves the stated security goals while preserving the accuracy of the recommended items. In addition, experiments show that ERBFL is more efficient than existing schemes. The primary constraint we faced is the high dimension of user and item information, which can affect the execution time and performance of our scheme. Furthermore, our scheme assumes that the Cloud and the Agents do not collude. Our future plans are therefore to extend the scheme to more complex scenarios and to make it resistant to collusion attacks.

Data Availability

The processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by National Natural Science Foundation of China under Grant No. 61932010. The authors appreciate the anonymous reviewers for their thorough comments and suggestions for this paper.