Support vector machine (SVM) is an important technique for data classification. Traditional SVM assumes free access to data. If the data are split and held by different users, then for privacy reasons the users are likely unwilling to submit their data to a third party for classification. In this paper, by using additive homomorphic encryption and random transformations (matrix transformation and vector decomposition), we design a privacy-preserving outsourcing scheme for conducting Least Squares SVM (LS-SVM) classification on vertically partitioned data. In our system, multiple data owners (users) submit their encrypted data to two non-colluding service providers, which run the SVM algorithm on them. During the execution of our algorithm, neither service provider learns anything about the input data, the intermediate results, or the predicted result. In other words, the data stay encrypted or masked throughout the whole process. Extensive theoretical analysis and experimental evaluation demonstrate the correctness, security, and efficiency of the method.

1. Introduction

1.1. Background

SVM [1] is a powerful machine learning algorithm. It uses some given data points to learn a model which separates the feature space into two parts. All data in each part are considered to belong to one category. Then, the model can be used to predict the category of a new data point. Thus far, SVM classification has been successfully used in many fields, such as pattern recognition [2], diagnosis and classification of diseases [3, 4], and financial forecasting [5]. Traditional SVM supposes that data are centralized and can be freely accessed. However, in today’s cloud computing environment it is common for the learning data to be split and held by multiple users, and for the learning process to be outsourced to cloud service providers. For example, multiple financial institutions (banks, insurance companies, etc.) may want to construct a model for assessing personal credit. In general, the more data are used for learning, the more accurate the model will be. To obtain a more accurate model, these institutions should contribute their data to a service provider for learning an SVM model. However, in this procedure, the traditional SVM will expose their data to the service providers. This is not permitted because of legal and moral constraints. The key to solving this problem is to develop a privacy-preserving SVM (PP-SVM) algorithm which obtains valid results without disclosing the users’ data.

PP-SVM is a privacy-preserving machine learning (PPML) algorithm [6]. Generally speaking, a PPML algorithm modifies the original machine learning algorithm in two respects: changing the system architecture and combining different security techniques with the original algorithm. For the first aspect, the architectures in use include one client and one server [7, 8], multiclient and one server [9, 10], and multiclient and multiserver [11] architectures. Thus far, it is still difficult to design a highly efficient algorithm for a one-server architecture. In contrast, a multiserver architecture provides more possibilities for designing a highly efficient algorithm, but it also increases communication costs. For the second aspect, the security techniques often used in PPML include fully homomorphic encryption [7, 12], additive homomorphic encryption [11, 13–15], secret sharing [15–17], garbled circuits [17, 18], and differential privacy [19]. In this paper, we mainly use additive homomorphic encryption to construct our PP-SVM on a multiclient/two-server architecture.

1.2. Related Work

In the last two decades, PP-SVM algorithms have received some attention.

Laur et al. [8] designed a cryptographically private SVM with additive homomorphic encryption [20], linear secret sharing [21], and conditional oblivious transfer [22]. Their algorithm only works on a system consisting of one server and one client, where the server owns the input data points and the client owns the corresponding class labels. Their system is not suitable for the multiclient setting.

Yu et al. [23] proposed an algorithm for PP-SVM based on secure set intersection cardinality [9] and commutative public-key encryption. Vaidya et al. [14] constructed three PP-SVM methods using secure addition, secure scalar product, and homomorphic encryption for vertically, horizontally, and arbitrarily partitioned data. The algorithms proposed by Yu et al. and Vaidya et al. are secure multiparty computation systems. Omer et al. [13] proposed a PP-SVM algorithm based on additive homomorphic encryption, which is suitable for multiclient and one-server systems. In the above three algorithms, the input data points are concealed, but the corresponding class labels and kernel matrix are exposed to other participants. Exposing this information can sometimes reveal private details of the input data (see the appendix in [24]). Therefore, we want to hide all this information in our algorithm.

Liu et al. [12] proposed their PP-SVM algorithm based on fully homomorphic encryption [25, 26] and secure addition method. In their algorithm, each user communicates with the server many times, and each user performs a lot of computation by itself. These operations increase the workload of users and make their system look more like a collaborative rather than an outsourcing computing system.

Park et al. [7] used the CKKS scheme [27], a fully homomorphic encryption scheme, to construct a training algorithm based on Least Squares SVM [28]. Their method is suitable for the case of one client and one server, but not directly for the case of multiple users.

Thus far, known fully homomorphic encryption schemes remain computationally expensive for practical use. We aim to design a sufficiently efficient algorithm that reduces the work of users as much as possible.

Wang et al. [11] first built a secure computation system and then designed a PP-SVM algorithm on it. Their computation system consists of eight secure operations, including secure integer multiplication, secure inner product, and secure floating-point addition/subtraction. They use a distributed two-trapdoor public-key cryptosystem [29] to set up this computation system, which ensures that their algorithm has better performance. This method can work on a multiclient/multiserver architecture. Compared with their methods, our method in this paper is simple and easy to understand.

In addition, because our algorithm is based on LS-SVM, the service providers need to solve a linear system of equations. To ensure data security, our algorithm employs random transformations to hide the coefficients of the equations. Similar data masking techniques have been used by other researchers. Lei et al. [30] proposed a secure outsourcing method to compute the inverse matrix. They use a sparse matrix to mask the original matrix and then send it to the cloud for computing the inverse matrix. Chen et al. [31] proposed a secure outsourcing scheme for solving large-scale systems of linear equations based on matrix transformation in the fully malicious model. Chen et al. [32] designed two protocols for secure outsourcing of linear regression. They use a random orthogonal matrix to mask the original equations and then outsource the equation solving process to the cloud. However, all these methods only work in a one-client, one-server architecture. In this paper, we combine homomorphic encryption and random transformation techniques to tackle the problem under a multiclient/two-server framework.

1.3. The Target Problem, Architecture, and Security Model
1.3.1. Target Problem

In this paper, we focus on the data privacy issues when multiple users submit their data to some cloud servers for SVM classification.

Specifically, suppose there are data pairs for training and a new data point for predicting, where each is the class label corresponding to the data point and is shared by all users. Each vector (including ) is vertically partitioned among users as shown in Figure 1, where

The is the private data of who is unwilling to share it with others. Meanwhile, all users do not want to reveal to any server.

We want to design a PP-SVM algorithm that enables the cloud server to perform SVM algorithm using all user data without learning any input data or prediction results.

1.3.2. Our System Architecture

Our system architecture consists of a requesting user, multiple data owners (users), and two service providers (SP1 and SP2), as shown in Figure 2. They each undertake the following tasks.
(1) Requesting user: It sends a service request to all data owners (users). Then, all users and service providers perform the PP-SVM algorithm. The requesting user may or may not be one of the data owners.
(2) Users: Each user provides different attribute values of the same set of objects to a service provider in an encrypted form. The data can be used for training or prediction.
(3) SP1, SP2: They collaboratively learn an SVM model and predict the category of a new data point. In the training process, SP1 plays the role of Evaluator and SP2 plays the role of Crypto Service Provider (CSP). In the predicting process, SP1 and SP2 switch their roles.
These two roles have the following responsibilities.
(1) CSP generates a key pair of the Paillier cryptosystem [20] and then sends the public key to all other participants. It also performs some computations to help Evaluator execute our algorithm.
(2) Evaluator executes our training and predicting algorithms by collaborating with CSP. It performs all homomorphic computations in our protocols.

Figure 2 shows that the training and predicting procedure as a whole can be divided into the following steps.
(1) CSP sends the public key to the other participants.
(2) All users send encrypted data to Evaluator.
(3) Evaluator and CSP collaboratively perform the privacy-preserving training or predicting protocols.
(4) Evaluator sends the predicted result to the requesting user.

1.3.3. Security Model

Each party in our system is semi-honest. This means that each service provider or user performs our protocols faithfully but may try to learn private information about others by observing the execution of the protocols.

Meanwhile, we suppose the two service providers are non-colluding. Service providers are generally large companies that are heavily regulated; we believe this condition is not difficult to achieve.

In addition, we found that our protocol for computing RBF kernel (Protocol 3) may be inefficient. Therefore, we designed a new protocol (Protocol 4) as an alternative. However, the new protocol requires that there is no collusion between any two participants.

1.4. Our Contributions

In this paper, we propose a privacy-preserving outsourcing scheme for SVM classification on vertically partitioned data. Our algorithm is based on LS-SVM. Paillier encryption and random transformation are the main techniques in this work. Our algorithm has the following features.

1.4.1. Secure Outsourced SVM Training

     Our algorithm enables two service providers to solve LS-SVM equation in an encrypted state. In this procedure, the two service providers cannot learn the input data (all and ), the solution of LS-SVM equation, and the training parameters.

1.4.2. Secure Outsourced SVM Prediction

     Our algorithm enables two service providers to make prediction for a new data point in an encrypted state. In this procedure, the two service providers learn nothing about the input data or the value of the decision function. The encrypted outcome will be sent to user , and then can use it to compute the value of the decision function by himself.

1.4.3. Concealing Important Intermediate Results

     In our algorithm, the kernel matrix and decision function are hidden, which makes it more difficult for the service provider to infer some information about the input data.

In addition, we conduct a comprehensive security analysis of the protocol and prove its security under various collusion scenarios. Experiments show that our algorithm has good accuracy and performance.

2. Preliminaries

In general, we use lowercase letters to indicate numbers (e.g., , , , ), lowercase boldface letters (e.g., , , ) for vectors, and uppercase boldface letters (e.g., , ) for matrices.

2.1. SVM and LS-SVM Overview

The SVM algorithm consists of two phases: training and predicting. In the training phase, it uses all training data to learn a linear or nonlinear model represented by a decision function . Then, in the prediction phase, it computes at a given . The function value is used to predict the category of the given data.

The SVM model is actually a hyperplane called the decision boundary, which separates the training data into two categories. The training process finds the hyperplane by maximizing the margin between the two categories of data; Figure 3 shows a two-dimensional case. It can be described as follows. Suppose there are data pairs for SVM training, where is a vector in the feature space and is the class label of . The SVM learns a decision boundary by solving the following optimization problem: where are the slack variables that allow some data points to lie within the margin. The classification rule induced by is . The problem (2) is usually solved in its dual form, which is often combined with the kernel trick to deal with a nonlinear decision boundary. Finally, the form of its dual problem is where is a kernel function, and is a basis function. The decision function is where are the solutions to the dual problem (3). can be obtained as follows: select that satisfies ; compute .
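For reference, the soft-margin SVM that this subsection describes is conventionally written as below: the primal problem, its kernelized dual, and the resulting decision function with the classification rule $\operatorname{sign}(f(\mathbf{x}))$. The symbols ($m$ training pairs, weights $\mathbf{w}$, bias $b$, slack variables $\xi_i$, penalty $C$, basis function $\varphi$, kernel $K$) follow the usual textbook convention and are assumptions here; the paper's own notation may differ.

```latex
\begin{aligned}
&\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{m}\xi_i
\quad \text{s.t.}\quad y_i\bigl(\mathbf{w}^{T}\varphi(\mathbf{x}_i)+b\bigr)\ \ge\ 1-\xi_i,\qquad \xi_i\ \ge\ 0,\\[4pt]
&\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j K(\mathbf{x}_i,\mathbf{x}_j)
\quad \text{s.t.}\quad \sum_{i=1}^{m}\alpha_i y_i=0,\qquad 0\le\alpha_i\le C,\\[4pt]
&f(\mathbf{x})=\sum_{i=1}^{m}\alpha_i y_i K(\mathbf{x}_i,\mathbf{x})+b,
\qquad K(\mathbf{x}_i,\mathbf{x}_j)=\varphi(\mathbf{x}_i)^{T}\varphi(\mathbf{x}_j),\\[4pt]
&b = y_j-\sum_{i=1}^{m}\alpha_i y_i K(\mathbf{x}_i,\mathbf{x}_j)\quad\text{for any } j \text{ with } 0<\alpha_j<C.
\end{aligned}
```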

Three popular choices for kernel function are

The matrix is the kernel matrix.
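As a plaintext illustration, the three kernel choices (linear, polynomial, RBF) and the kernel matrix can be sketched as follows. The parameter names `a`, `c`, `degree`, and `sigma`, and the exact parameterization of the polynomial and RBF kernels, are assumptions based on common textbook forms rather than the paper's definitions.

```python
import math

def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, a=1.0, c=1.0, degree=2):
    """(a * <x, y> + c) ** degree; parameter names are assumed."""
    return (a * linear_kernel(x, y) + c) ** degree

def rbf_kernel(x, y, sigma=1.0):
    """exp(-||x - y||^2 / (2 * sigma^2)); this scaling is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_matrix(points, kernel):
    """Matrix whose (i, j) entry is kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

points = [(1.0, 2.0), (3.0, 4.0), (0.0, -1.0)]
K = kernel_matrix(points, rbf_kernel)
```

The kernel matrix is symmetric, and for the RBF kernel its diagonal is all ones, which is one reason a revealed kernel matrix can leak structure about the inputs.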

Although problem (3) is only a quadratic program, it is still difficult to solve in the encrypted state, because the inequality constraints are difficult to deal with. The algorithm in this paper is therefore based on the following LS-SVM, which replaces the inequality constraints with equality constraints.

We can also transform this problem to its dual form with Lagrange multiplier method. Likewise, we can apply the kernel trick to its dual problem. The kernel functions (5) can also be used in LS-SVM. Most important of all, we can obtain the coefficients of its decision function by solving LS-SVM equation as follows:whereand is the th entry of . In this paper, we will solve this linear equation in an encrypted form and then secretly compute for a new data point . For more details about SVM, see [1, 28].
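To make the "solve a linear system" step concrete, here is a plaintext sketch of LS-SVM training. It assumes the standard LS-SVM classifier system with block matrix `[[0, y^T], [y, Omega + I/gamma]]`, unknowns `(b, alpha)`, and right-hand side `(0, 1, ..., 1)`, where `Omega[i][j] = y_i * y_j * K(x_i, x_j)`; the paper's own equation was lost in extraction and may be arranged differently. The helper names are hypothetical, and exact fractions are used only to keep the example deterministic.

```python
from fractions import Fraction

def solve(A, rhs):
    """Gauss-Jordan elimination over exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        M[col], M[pivot] = M[pivot], M[col]
        head = M[col][col]
        M[col] = [v / head for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def lssvm_train(X, y, kernel, gamma):
    """Build and solve the assumed LS-SVM system; returns (b, alpha)."""
    m = len(X)
    A = [[0] + list(y)]
    for i in range(m):
        row = [y[i]]
        for j in range(m):
            entry = y[i] * y[j] * kernel(X[i], X[j])
            if i == j:
                entry += Fraction(1, gamma)  # regularization on the diagonal
            row.append(entry)
        A.append(row)
    sol = solve(A, [0] + [1] * m)
    return sol[0], sol[1:]

def decision(x, X, y, kernel, b, alpha):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return b + sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X))

X = [(0, 0), (0, 1), (2, 2), (3, 2)]
y = [-1, -1, 1, 1]
b, alpha = lssvm_train(X, y, linear_kernel, gamma=10)
```

Unlike the quadratic program of the standard SVM, training here needs only additions, multiplications, and one linear solve, which is what makes it amenable to encrypted and masked computation.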

2.2. The Paillier Cryptosystem

The Paillier cryptosystem is an additive homomorphic encryption scheme which satisfies semantic security [33]. Semantic security makes it infeasible for any polynomial-time algorithm to gain extra information about a plaintext when given only its ciphertext and public key. As an asymmetric encryption scheme, the Paillier cryptosystem has a key pair , where is the public key and is the private key.

Let be the encryption algorithm and be the decryption algorithm. Given , the Paillier encryption satisfies the following properties:

For more details, see chapter 13 of [34].
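The additive properties can be checked with a toy Paillier implementation. This is a minimal sketch using tiny hard-coded primes and the common generator choice `g = n + 1`; it is insecure and for illustration only, and the function names (`keygen`, `encrypt`, `decrypt`, `he_add`, `he_scale`) are hypothetical. It shows that multiplying ciphertexts adds plaintexts and that raising a ciphertext to a constant multiplies the plaintext by that constant, both modulo `n`.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy key generation with tiny primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """E(m) = (1 + m*n) * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def he_add(n, c1, c2):
    """Ciphertext of m1 + m2 from ciphertexts of m1 and m2."""
    return c1 * c2 % (n * n)

def he_scale(n, c, k):
    """Ciphertext of k * m from a ciphertext of m and a plaintext constant k."""
    return pow(c, k % n, n * n)

n, sk = keygen()
ca, cb = encrypt(n, 123), encrypt(n, 456)
total = decrypt(n, sk, he_add(n, ca, cb))
scaled = decrypt(n, sk, he_scale(n, ca, 7))
```

Encryption is randomized, so two encryptions of the same value differ, yet both decrypt to the same plaintext.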

In our system, CSP generates the key pair and sends to other parties. To simplify the description, for any or , we use or to denote a matrix or vector whose each entry is or .

2.3. Data Representation

We use fixed-point representation in our system. More precisely, we represent a real number with a fixed-point integer. For a fixed-point real number which dedicates bits to the fractional part, we use the integer to represent it. Supposing and are fixed-point numbers in our system, we perform the arithmetic operations as follows:where the operation rounds a real number down to the nearest integer.

Because the cryptosystem works only for nonnegative integers, we use 2’s complement in our system to ensure all numbers are nonnegative. However, this causes another problem: the operation listed above is not correct for the complement of a negative number. Therefore, when performing a division operation, we first transform the 2’s complement to its true form and then perform division on it, and the result of division will be transformed back to the complement for subsequent computation.
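The fixed-point scheme can be sketched in plaintext as follows. The number of fractional bits (`FRAC_BITS = 16`) and the modulus (`MODULUS = 2**64`, standing in for the Paillier plaintext modulus) are assumptions for illustration. Negative values are stored in complement form, and multiplication and division convert to sign and magnitude before rescaling, following the description above.

```python
import math

FRAC_BITS = 16       # assumed number of fractional bits
MODULUS = 1 << 64    # assumed stand-in for the plaintext modulus

def encode(x):
    """Real number -> nonnegative integer in complement form."""
    return math.floor(x * (1 << FRAC_BITS)) % MODULUS

def to_signed(v):
    """Complement form -> ordinary signed integer."""
    return v - MODULUS if v >= MODULUS // 2 else v

def decode(v):
    return to_signed(v) / (1 << FRAC_BITS)

def fx_add(a, b):
    # Addition works directly on the complement representation.
    return (a + b) % MODULUS

def _sign_magnitude(op, a, b):
    # Apply op to the magnitudes, then restore the sign and the complement form.
    sa, sb = to_signed(a), to_signed(b)
    sign = -1 if (sa < 0) != (sb < 0) else 1
    return (sign * op(abs(sa), abs(sb))) % MODULUS

def fx_mul(a, b):
    # The raw product carries 2 * FRAC_BITS fractional bits; shift one set away.
    return _sign_magnitude(lambda p, q: (p * q) >> FRAC_BITS, a, b)

def fx_div(a, b):
    # Pre-scale the numerator so the quotient keeps FRAC_BITS fractional bits.
    return _sign_magnitude(lambda p, q: (p << FRAC_BITS) // q, a, b)
```

For example, `decode(fx_mul(encode(1.5), encode(-2.0)))` recovers the product of the two encoded values, because both operands are exactly representable with 16 fractional bits.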

2.4. The EncMul Protocol

The protocol EncMul is a two-party protocol that accomplishes the following task: Evaluator provides ; CSP keeps the private key; finally, Evaluator obtains , and the two servers cannot learn anything about or .

Elmehdwi et al. [35] proposed this protocol and Liu et al. [29] gave a similar one which used multiple keys. This protocol is used in our methods to compute the polynomial or RBF kernel functions. The description of the EncMul protocol is as follows:
(1) Evaluator chooses two random positive integers and ; computes , ; and then sends , to CSP.
(2) CSP obtains , ; then computes and returns to Evaluator.
(3) Evaluator computes

In this protocol, is the modular inverse of . This protocol is based on the equation .
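The blinding idea behind EncMul can be sketched with a toy Paillier scheme. This is a minimal single-process simulation under stated assumptions: tiny insecure primes, generator `g = n + 1`, hypothetical function names, and the identity `(a + r_a)(b + r_b) = ab + a*r_b + b*r_a + r_a*r_b`. The cross terms are removed here by raising ciphertexts to the exponent `-r mod n`, which yields the same plaintext as multiplying by a modular inverse of the ciphertext.

```python
import math
import random

def keygen(p=1009, q=1013):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))

def encrypt(n, m):
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def he_add(n, c1, c2):
    return c1 * c2 % (n * n)

def he_scale(n, c, k):
    return pow(c, k % n, n * n)

def enc_mul(n, sk, ca, cb):
    """Return a ciphertext of a*b given ciphertexts of a and b."""
    # Evaluator: blind both inputs additively before sending them to CSP.
    ra, rb = random.randrange(1, n), random.randrange(1, n)
    ca_blind = he_add(n, ca, encrypt(n, ra))
    cb_blind = he_add(n, cb, encrypt(n, rb))
    # CSP: decrypt the blinded values, multiply them, and re-encrypt.
    h = decrypt(n, sk, ca_blind) * decrypt(n, sk, cb_blind) % n
    ch = encrypt(n, h)
    # Evaluator: strip the cross terms a*rb, b*ra, and ra*rb.
    c = he_add(n, ch, he_scale(n, ca, -rb))
    c = he_add(n, c, he_scale(n, cb, -ra))
    return he_add(n, c, encrypt(n, -ra * rb))

n, sk = keygen()
product = decrypt(n, sk, enc_mul(n, sk, encrypt(n, 12), encrypt(n, 34)))
```

CSP only ever sees `a + r_a` and `b + r_b` modulo `n`, and Evaluator only ever sees ciphertexts, which is why neither learns the inputs or the product.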

3. Our PP-SVM Training Algorithm

In the training phase, SP1 is Evaluator and SP2 is CSP.

3.1. Main Steps of Our PP-SVM Training Algorithm

The key of our training algorithm is to solve the LS-SVM equation securely. Figure 4 shows the procedure of PP-SVM training, which includes several steps as follows:
(1) SP2 sends to other participants.
(2) All users send encrypted data to SP1.
(3) SP1 computes and and then sends to SP2 ( is used for masking ).
(4) SP2 decrypts , solves , decomposes into , and then sends to SP1 ( and ).
(5) SP1 computes , (removing perturbation).

Finally, SP1 has random vectors , ; SP2 has random numbers . These data satisfy , where is the solution of LS-SVM equation. Because the two service providers do not collude, neither of them knows .

Furthermore, during the training procedure, SP1 and SP2 should not learn anything about . The reason for this is that someone can learn the kernel matrix from , because the difference between them is small, and thus can learn some information about from the kernel matrix, especially when the kernel function is simple (see the appendix in [24]).

We use the notations and instead of SP1 and SP2 in this section to make their roles easy to identify.

3.2. Securely Computing Matrices and
3.2.1. Case 1: PP-SVM with Linear Kernel

For a linear-kernel matrix, its th entry . For , we have

Based on (13), we design Protocol 1.

Protocol 1. Compute for linear-kernel PP-SVM.
Input: provides ; each is shared by all users, where and .
Output: SP1 obtains .
(1) Each computes locally and sends them to SP1, where , .
(2) For any , SP1 computes as follows: .
(3) User sends and all to SP1, who computes each . Then, SP1 obtains ( is the parameter in (7)).
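The core of Protocol 1 can be sketched as follows: with vertically partitioned data, the inner product of two records is the sum of the partial inner products over each user's attribute block, so each user encrypts its partial result and SP1 combines the ciphertexts homomorphically. This is an illustrative simulation with a toy, insecure Paillier scheme; the function names and sample data are hypothetical, and the label and regularization handling of step 3 is omitted.

```python
import math
import random

def keygen(p=1009, q=1013):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))

def encrypt(n, m):
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def user_share(n, xi_block, xj_block):
    """One user's encrypted partial inner product over its own attributes."""
    return encrypt(n, sum(a * b for a, b in zip(xi_block, xj_block)))

def combine(n, shares):
    """SP1 multiplies ciphertexts, which adds the hidden partial products."""
    c = 1  # 1 is a trivial encryption of 0
    for share in shares:
        c = c * share % (n * n)
    return c

# Three users hold disjoint attribute blocks of the same two records.
xi_blocks = [[1, 2], [3], [4, 5]]
xj_blocks = [[2, 0], [1], [1, 3]]

n, sk = keygen()
shares = [user_share(n, u, v) for u, v in zip(xi_blocks, xj_blocks)]
kernel_entry = decrypt(n, sk, combine(n, shares))  # decrypted here only to check
```

In the protocol itself SP1 never decrypts; the final `decrypt` call exists only to confirm that the combined ciphertext hides the full inner product.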

3.2.2. Case 2: PP-SVM with Polynomial Kernel

For a polynomial kernel matrix, its th entry . Hence,Based on (14), we design Protocol 2.

Protocol 2. Compute for polynomial kernel PP-SVM.
Input: provides all , and in the kernel function are shared by all users .
Output: obtains .
(1) computes , , and ; other users compute . Then, all users send these data to .
(2) For , computes
(3) sends to . For , let . and repeatedly execute times. Then, obtains
(4) and compute Then, obtains .
(5) sends and all to , who computes . Then, obtains .

Remark 1. Because cannot be disclosed to either service provider, the two service providers must execute EncMul to generate , as shown in step 4. Alternatively, SP1 can send all to one user, who computes and returns it to SP1. In this way, the two service providers execute EncMul one time fewer.
In addition, the degree is the only parameter sent directly to SP1.

3.2.3. Case 3: PP-SVM with RBF Kernel

For a RBF kernel matrix, its th entry

Let ; we have

According to (9), we can design the following two methods for computing :
(A) collects and from users. Then, two servers use EncMul to compute .
(B) chooses some random ; all users try to compute . collects and ; then, it can compute with them.
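Both methods rest on the fact that the squared distance between two vertically partitioned records is the sum of the squared distances over each user's block, so the RBF kernel factorizes into a product of per-user factors. The sketch below checks this identity in plaintext. The parameterization `exp(-||x - y||^2 / (2 * sigma^2))` and the sample values are assumptions.

```python
import math

def rbf(x, y, sigma):
    """RBF kernel on full (or partial) vectors; the scaling is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def rbf_from_blocks(x_blocks, y_blocks, sigma):
    """Product of the factors that each user can compute locally."""
    result = 1.0
    for xk, yk in zip(x_blocks, y_blocks):
        result *= rbf(xk, yk, sigma)
    return result

# Three users hold disjoint attribute blocks of the same two records.
x_blocks = [[1.0, 2.0], [0.5], [3.0, -1.0]]
y_blocks = [[0.0, 2.5], [1.5], [2.0, 0.0]]
full_x = [v for blk in x_blocks for v in blk]
full_y = [v for blk in y_blocks for v in blk]

direct = rbf(full_x, full_y, sigma=2.0)
factored = rbf_from_blocks(x_blocks, y_blocks, sigma=2.0)
```

Method A multiplies the encrypted factors with EncMul, whereas method B multiplies masked plaintext factors along a chain of users; either way the quantity being assembled is this product.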

Protocol 3 is designed based on method A.

Protocol 3. (method A) Compute for PP-SVM with RBF kernel.
Input: provides , and each is shared by all users, where and .
Output: obtains .
(1) For any , computes and computes . Then, all users send these data to .
(2) Let . and repeatedly execute for . Then, obtains
(3) sends and all to , who computes . Then, obtains .

Remark 2. Protocol 3 will call EncMul times. This leads to inefficiency because EncMul is a time-consuming protocol.
Protocol 4 is designed based on the method B.

Protocol 4. (method B) Compute for PP-SVM with RBF kernel.
Input: provides , and each is shared by all users, where and .
Output: obtains .
(1) chooses random real numbers and a random integer , where . Then, computes for .
(2) sends each to , and computes and then sends to . One by one, other users do the same thing until computes and sends these data to .
(3) computes for ; sends to ; and sends to and .
(4) computes as follows: where .
(5) sends and all to , who computes . Then, obtains .

Remark 3. The data . To improve the accuracy, we multiply by and use the integer part of in subsequent computations. Proposition 1 will illustrate why it requires and .
In this paper, we use Linear-SVM, Poly-SVM, RBF-SVM(A), and RBF-SVM(B) to denote the PP-SVM methods which use Protocols 1–4, respectively.
RBF-SVM(B) is far more efficient than RBF-SVM(A), but it requires that no two participants collude with each other (see Section 3.5.2).

3.3. Securely Computing and Splitting It between Two Service Providers

The decision function, which we call SVM model, can be written as follows:where

In this section, the two service providers will securely compute . Because discloses the relationship between the input points and the hyperplane to some extent, we do not reveal to service provider. To this end, we will randomly decompose into a linear combination, then split this combination into two parts, and provide different parts to Evaluator and CSP. This ensures that both of the service providers learn nothing about .

In a word, we require that none of the service providers can learn anything about , or when executing our algorithm.

Protocol 5. Compute and split it between two servers.
Input: inputs ; CSP has the private key.
Output: obtains vectors , ; obtains real numbers , . These data satisfy .
(1) chooses an invertible random . For Linear-SVM, Poly-SVM, and RBF-SVM(A), let ; computes each entry of as follows: For RBF-SVM(B), let ; computes where , , are the elements of , , . Then, sends to .
(2) obtains . Especially for method RBF-SVM(B), SP2 computes . Then, SP2 solves and obtains .
(3) Then, randomly divides into a linear combination as follows: Then, sends vectors to .
(4) computes and .

Remark 4. Step 1 uses an invertible matrix to mask in an encrypted state. Step 3 decomposes into a linear combination. These random transformations prevent SP1 and SP2 from learning anything about or .
It is worth noting that keeping confidential is a key step to ensure the security of . If someone obtains , they can compute , where is the right-hand side of (6).
SP1 and SP2 will use , , , and to compute in prediction phase.
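The two random transformations in Protocol 5 can be illustrated in plaintext. This sketch rests on assumptions that the extracted text does not confirm: the mask is a right-multiplication `A * R` by a random invertible matrix `R`, the masked solution `z` is split as `z = t1*v1 + t2*v2`, and SP1 removes the mask by computing `u_k = R * v_k`. In the actual protocol the masking happens on Paillier ciphertexts; here everything is done on exact fractions, with hypothetical function names, so the algebra is easy to check.

```python
from fractions import Fraction
import random

def solve(A, rhs):
    """Gauss-Jordan elimination over exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        M[col], M[pivot] = M[pivot], M[col]
        head = M[col][col]
        M[col] = [v / head for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def random_invertible(n, rng):
    while True:
        R = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]
        try:
            solve(R, [0] * n)  # raises ValueError when R is singular
            return R
        except ValueError:
            continue

def masked_solve_and_split(A, c, rng):
    n = len(A)
    R = random_invertible(n, rng)        # SP1's secret mask (assumed form)
    z = solve(matmul(A, R), c)           # SP2 solves the masked system
    t1, t2 = (Fraction(rng.randint(1, 99)) for _ in range(2))  # kept by SP2
    v1 = [Fraction(rng.randint(-99, 99)) for _ in range(n)]
    v2 = [(zi - t1 * a) / t2 for zi, a in zip(z, v1)]          # z = t1*v1 + t2*v2
    u1, u2 = matvec(R, v1), matvec(R, v2)                      # SP1 removes the mask
    return (u1, u2), (t1, t2)

A, c = [[2, 1], [1, 3]], [3, 5]
(u1, u2), (t1, t2) = masked_solve_and_split(A, c, random.Random(1))
recombined = [t1 * a + t2 * b for a, b in zip(u1, u2)]
```

SP2 sees only the masked matrix and the masked solution, while SP1 sees only the vectors and never the scalars, so recombining the true solution requires data from both providers.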

3.4. Correctness of Our Training Algorithm

It is easy to verify the correctness of Protocols 1–3 by using the properties of Paillier encryption.

In Protocol 4, we use secure multiplication which uses to conceal each user’s . The integers and are used instead of and in this protocol, which introduces error into the system. Fortunately, the following proposition proves that the error is small.

Proposition 1. (precision)In Protocol 4, the use of and introduces errors into the system, but the actual effect on each element in is less than .
Proof: In Protocol 4, SP1 computes

We use instead of . Let be a matrix and its th entry be .

In step 2, SP2 computes . Clearly, it is equivalent to replacing with . This means the error for each entry in the matrix is . We have

For example, if the , the error is less than . The following proposition proves that the output of Protocol 5 is a linear combination of .

Proposition 2. (correctness)In Protocol 5, SP1 has and ; SP2 has and . These data satisfy , where is the solution of LS-SVM equation.

Proof: .

3.5. Security of Our Training Algorithm
3.5.1. Security of the Protocol Itself

Protocols 1–4 only use Paillier encryption and protocol EncMul. Paillier encryption is semantically secure, and the security of EncMul has been confirmed by previous studies. Therefore, Protocols 1–4 do not disclose input data. The following proposition proves the security of Protocol 5.

Proposition 3. (security) In Protocol 5, two semi-honest and non-colluding service providers learn nothing about and of the LS-SVM equation.

Proof: only has the plaintext and . These data have nothing to do with ; hence, SP1 cannot learn anything about . In addition, SP1 probably knows . This gives it equations but unknowns . Therefore, SP1 also cannot learn anything about .

only knows the matrix . Even if it knows , it still cannot obtain anything about or , because gives it equations but unknowns in and . Meanwhile, CSP learns nothing about without , because and it only has .

3.5.2. Analysis of Collusion

Suppose there exists collusion between some participants. In Protocols 1–3, all users only provide data, and there is no communication between users. Therefore, we only need to discuss the collusion between users and one service provider.

Proposition 4. (collusion) For Linear-SVM, Poly-SVM, and RBF-SVM(A), collusion between some users and SP1 or SP2 cannot help them to learn anything they do not know.

Proof: . There are two cases of collusion.
Case 1: Some users collude with SP1. These users can provide their data to SP1. However, SP1 still cannot obtain any part of because each entry of is divided between all users. SP1 and these users cannot learn anything they do not know.
Case 2: Some users collude with SP2. SP2 has and it knows . Without , matrix cannot help them learn anything else. Hence, they also cannot obtain because only some users collude with SP2. Without , SP2 cannot compute and . SP2 and these users cannot learn anything they do not know.

For Protocol 4, simply requiring that there is no collusion between the two service providers cannot prevent privacy leakage. Protocol 4 uses random to keep any user from learning another user’s . This is similar to the secure multiparty summation protocol proposed by Clifton et al. [36], except that our arithmetic operation is multiplication. If no two users collude, this multiparty multiplication is secure. However, if two users collude with each other, they may obtain another user’s data. For example, and can easily compute ’s . For more discussion on this problem, please see [37].
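The collusion risk in a chained masked multiplication can be seen in a small plaintext sketch. It assumes a simplified ring in the style of the secure-sum protocol: the first user multiplies its value by a random mask, every later user multiplies the running value by its own private factor, and the mask is divided out at the end. The function name and sample values are hypothetical, and the details of Protocol 4 (several masks, integer scaling) are omitted.

```python
from fractions import Fraction
import random

def chained_product(values, rng):
    """Return the product of all values and the message each user forwards."""
    mask = Fraction(rng.randint(2, 10**6))   # known only to the first user
    running = mask * values[0]
    forwarded = [running]                    # forwarded[k] is sent by user k
    for v in values[1:]:
        running = running * v
        forwarded.append(running)
    return running / mask, forwarded

values = [Fraction(3), Fraction(5), Fraction(7)]
product, forwarded = chained_product(values, random.Random(0))

# Collusion: the neighbours of user 1 compare what it received with what it sent.
leaked = forwarded[1] / forwarded[0]
```

A single honest-but-curious user sees only a masked running product, but the two neighbours of a user together recover that user's factor exactly, which is why method B needs the stronger assumption that no two participants collude.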

4. Our PP-SVM Prediction Algorithm

In this paper, each training data point is vertically partitioned between users. These users could be banks, insurance companies, e-commerce firms, etc., and they are unwilling to share their data with others. Meanwhile, the new data point for prediction also includes the same attributes as . Indeed, it is difficult for someone else to collect the data for prediction from these users. Accordingly, we assume the data is also vertically partitioned between these users.

4.1. Main Steps of Our PP-SVM Prediction Algorithm

In the prediction phase, wants to know about the category of object . Therefore, it sends a request to all data owners (users); then, all users provide data about , and service providers will perform PP-SVM prediction algorithm. To determine the category of , needs to know as follows:where and , and and are held by SP1 and SP2. Figure 5 shows the procedure of prediction, which includes several steps as follows:

Let , , and in the following steps.
(1) SP1 sends to other participants.
(2) All users send encrypted data to SP2.
(3) SP2 computes (possibly with the help of SP1).
(4) chooses and sends them to SP1.
(5) SP1 chooses ; computes , , ; and then sends them to SP2.
(6) SP2 chooses and computes , , , . Then, it keeps by itself and sends to SP1.
(7) SP1 decrypts the data received; computes and ; and then sends , to SP2.
(8) SP2 computes and and then sends to .
(9) computes the .

Finally, obtains the , which is the value of . Because the random vectors and are used in this procedure, neither service provider can learn the data held by the other party.

4.2. Our Protocol for PP-SVM Prediction

In this phase, SP1 and SP2 swap the roles they held in the training phase: SP1 becomes CSP and SP2 becomes Evaluator. We use the notations and instead of SP1 and SP2 in Protocol 6 to show their roles. Let the new data point , where is owned by .

Protocol 6. Privacy-preserving SVM prediction.
Input: provides ; and provide their data obtained in the training phase.
Output: only obtains the value of .
(1) Compute , where .
Case 1: Linear-SVM.
(a) Each computes locally and sends them to , where .
(b) computes the following data for .
Case 2: Poly-SVM.
(a) Each computes and computes all and . They send all of them to .
(b) For , computes
(c) Let . and repeatedly execute times. Then, obtains
(d) Finally, and compute Then, obtains for .
Case 3 (for PP-SVM with RBF kernel): Choose one from the following two methods.
Method RBF-SVM(A):
(a) Let . computes ; computes , where . Then, all these data are sent to .
(b) For any , let . Evaluator and CSP repeatedly execute for . Finally, obtains
Method RBF-SVM(B):
(a) Let . chooses random real numbers and computes for any .
(b) sends each to , and computes and then sends to . One by one, other users do the same thing until computes and sends these data to .
(c) chooses a random integer , computes , sends to , and sends to and .
(d) For , computes
(2) chooses two random real numbers , and sends them to Evaluator, where , .
(3) chooses positive random integers and computes and where ’s data and are obtained in Protocol 5. Then, sends all , , and to .
(4) chooses positive random integers . Let , . computes and sends all to .
(5) obtains , . For Linear-SVM, Poly-SVM, and RBF-SVM(A), computes For RBF-SVM(B), computes Then, sends to .
(6) For Linear-SVM, Poly-SVM, and RBF-SVM(A), computes For RBF-SVM(B), computes Then, sends to .
(7) computes as follows: We choose , to make the computation more accurate.

4.3. Correctness and Security of Prediction Algorithm

In Protocol 6, two service providers compute in step 1. This step is similar to computing in Protocols 1–4, and its correctness follows in the same way. The following proposition proves that the output of Protocol 6 is .

Proposition 5. (correctness) In Protocol 6, finally obtains the value

Proof: . For Linear-SVM, Poly-SVM, and RBF-SVM(A), finally obtains the following value:There is a similar proof process for RBF-SVM(B).
The next proposition guarantees the security of Protocol 6.

Proposition 6. (security) In Protocol 6, two non-colluding service providers cannot learn anything about the data , , , , and .

Proof: . We do not discuss step 1 because it is essentially the same as computing in Protocols 1–4. The following discussion is about the other steps.
knows plaintext , , , and . The and have nothing to do with and . In fact, even if it knowsand regards each as an unknown variable , the number of unknowns (including and ) still far exceeds the number of equations given by and . This ensures that SP1 learns nothing from and . Meanwhile, SP1 still does not know , which means it still cannot learn . Thus, SP1 knows nothing about , , , , or .
only receives plaintext , , , where . Other encrypted data it has are not helpful in understanding private information. Under this circumstance, even if it knows the following equations:the number of unknowns (including , , , , , , ) is still far more than the number of equations. It cannot obtain any new information from these equations. Hence, SP2 learns nothing about , ,