Abstract

With the rapid development of cloud computing and machine learning, using outsourced data and machine learning models for training and online-aided disease diagnosis has great application prospects. However, training and diagnosis in an outsourced environment pose serious challenges to data privacy. Many privacy preserving machine learning schemes have been proposed and considerable progress has been made, but it remains difficult to achieve strong security while keeping the client load low. In this paper, we propose a complete privacy preserving outsourced multiclass SVM training and aided disease diagnosis scheme. We first design efficient basic operation algorithms for encrypted data and use them to build an efficient and privacy preserving SVM model training protocol. We then propose a secure maximum finding algorithm and a secure comparison algorithm and, based on the BFV cryptosystem and a blinding technique, design an efficient online-aided disease diagnosis scheme. A detailed security analysis shows that our scheme protects the privacy of each participant. The experimental results show that our scheme significantly reduces the computation overhead compared with previous similar works. Most of the operations of aided disease diagnosis are completed by the cloud servers, and the client only needs to perform a small number of encryption and decryption operations. The overall computation overhead is 0.175 s, and the efficiency of online-aided disease diagnosis is improved by 85.4%. In addition, our scheme provides multiclass diagnosis results, which can better assist doctors in their treatment.

1. Introduction

Machine learning (ML) uses the computer system to build mathematical models on sample data with statistical methods and makes predictions or decisions without being explicitly programmed. Now, ML has shown significant advantages in the field of disease diagnosis and brings more and more convenience to the prevention and treatment of diseases.

With the rapid development of cloud computing technology, cloud service providers (CSP) have high-quality computation and huge storage space, which can provide data processing, model training, diagnosis services and deployment, and other intelligent solutions based on machine learning. In this context, the local clients will outsource their medical data and machine learning models to CSP without having to build their own large-scale infrastructure and computing resources. The cloud can train a machine learning model and provide aided disease diagnosis service by using the outsourced medical data and machine learning models, which can help improve doctors’ diagnosis, treatment decisions and provide patients an online disease diagnosis service. A typical cloud platform machine learning system architecture is shown in Figure 1.

However, the security and privacy of outsourced data face various threats, which makes people afraid to use the services of CSP. These threats mainly concern the leakage of the outsourced data, the machine learning model of the model owners, the users' requests, and the diagnosis results. The leakage of medical information may cause irreversible losses or become a major incident. Therefore, preserving security and privacy in cloud-based model training and diagnosis has become a major challenge.

To address the abovementioned challenges, many scholars have proposed various schemes, such as a secure outsourced classification based on logistic regression model [1], an electronic medical disease risk prediction scheme based on naive Bayes model [2], and other secure disease prediction schemes based on machine learning technology [3–5]. As a machine learning algorithm with high computational efficiency and good predictive accuracy, the support vector machine (SVM) has achieved high classification accuracy and efficiency in the medical field [6, 7]. However, the existing privacy preserving SVM schemes mainly implement secure prediction [8–11], and there are few privacy preserving SVM schemes for secure training. Most of the existing privacy preserving SVM schemes are designed for binary classification, which can only determine whether the patient has the disease [12] but cannot distinguish among multiple classes of a disease. In addition, multiclass SVM requires more computation, which reduces efficiency [13].

To solve the abovementioned problems, we propose an efficient and privacy preserving online disease diagnosis scheme based on the SVM algorithm. Our scheme achieves multiclass SVM training on the encrypted outsourced data from multiple data owners and provides users with privacy preserving disease diagnosis. In summary, our contributions are as follows:
(1) Efficient and secure basic operation algorithms: Based on the Paillier cryptosystem, we design several basic operation algorithms to realize secure outsourced data storage and computation, including a secure aggregation algorithm, a secure multiplication algorithm, and so on. These secure computation algorithms are the building blocks for our proposed training protocol.
(2) Completing the machine learning process under privacy preserving: Aiming at the general machine learning process and the goal of privacy preserving, we propose a privacy preserving outsourced multiclass SVM model training and online-aided disease diagnosis scheme. Different from the existing privacy preserving schemes that only support training or diagnosis, our proposed scheme extends the function of the privacy preserving machine learning system.
(3) Efficient and secure online-aided disease diagnosis: Based on the BFV cryptosystem, we design a secure maximum finding algorithm and a secure comparison algorithm. We provide an efficient and privacy preserving aided disease diagnosis scheme. Experimental results illustrate that our proposed scheme significantly reduces the computation cost compared with the existing similar schemes, which makes it suitable for practical application scenarios where a large number of users request diagnosis at the same time.
(4) Low overhead for the local client: The local client only needs to perform encryption and decryption operations in our proposed scheme, which reduces the storage and computation overhead of the local client to the greatest extent and makes full use of the computation power of the cloud servers.

The remainder of this paper is organized as follows. In Section 2, we review some related works. In Section 3, we review the Paillier cryptosystem, BFV cryptosystem, and SVM algorithm as preliminaries. In Section 4, we make a system overview. Then, we propose our scheme in Section 5. In Section 6, we analyze the security of our proposed scheme. In Section 7, we make a performance evaluation. Finally, we conclude this paper in Section 8.

2. Related Work

In this section, we summarize the privacy preserving machine learning schemes in recent years.

With the development of the big data era, machine learning has been widely used in many fields. Among them, the application of machine learning in the field of intelligent disease diagnosis has developed rapidly. Disease diagnosis schemes based on various machine learning classification algorithms have been proposed [14–17]. However, at the same time, the problem of privacy disclosure in the machine learning process is becoming more and more serious. Therefore, many scholars have carried out research on privacy preserving machine learning.

Triastcyn and Faltings [18] proposed the Bayesian differential privacy, considered the distribution of data and provided a more practical privacy guarantee. Laur et al. [19] proposed a privacy preserving scheme of support vector machine based on secure multiparty computation. For each training or testing phase, their scheme involves multiple parties holding encrypted data and secret sharing obtained during training. Based on additive homomorphic encryption, Mandal and Gong [20] designed a privacy preserving scheme that performs gradient descent on data owners and cloud server. They achieved secure linear and logistic regression model training. Shen et al. [21] used blockchain technology to establish a secure and reliable data sharing platform among multiple data providers and constructed a privacy preserving support vector machine training scheme based on the Paillier cryptosystem. However, in their scheme, the data provider needs to interact with the cloud server to complete the computation. The computation cost of the data provider is large. Liu et al. [22] proposed a privacy preserving clinical decision support system using the naive Bayes (NB) classifier. The BGV homomorphic encryption system significantly improved the performance. In work [23], a framework for securely and efficiently outsourcing decision tree inference was proposed. Tan et al. [24] proposed a system for privacy-preserving machine learning that implements all operations on the GPU, which makes full use of the computing power of GPU. Zheng et al. [25] combined random permutation and arithmetic secret sharing by the compute-after-permutation technique and built a privacy-preserving machine learning framework. Li et al. [26] proposed a verifiable privacy-preserving machine learning prediction scheme for the edge-enhanced HCPSs, which outputs the verifiable prediction results for users without privacy leakage. Ma et al. [27] designed a lightweight privacy-preserving medical diagnosis mechanism on edge called LPME.

Among them, the SVM algorithm is a research hotspot and has been widely used in different data mining and machine learning schemes. Most of the existing privacy preserving SVM schemes are based on three main privacy preserving technologies: differential privacy (DP), secure multi-party computation (SMC), and homomorphic encryption (HE). DP can significantly improve the computation and communication efficiency, but at the cost of sacrificing the accuracy of the model by adding random noise [28, 29]. Zhang et al. [30] proposed a general differential privacy model fitting method based on the genetic algorithm, but it reduces the decision accuracy of the model. SMC alleviates the limitation of computing but requires more interaction between participants, which leads to expensive communication overhead [31, 32]. Yu et al. [33] first proposed a privacy preserving SVM classification method based on vertically segmented data. They used SMC technology to obtain the global model, so as to protect the local private data and hide the classification model. However, this method requires at least three parties to participate in the computation, which is complex and inefficient. HE can compute directly on encrypted data, but it also incurs high computing costs [34, 35]. Bajard et al. [36] used HE technology to protect the decision model and medical data, but their scheme incurs a high computational load. Therefore, it is necessary to design an efficient and secure SVM scheme for cloud online disease diagnosis service. Wang et al. [37] proposed an efficient privacy preserving outsourced SVM scheme for Internet of medical things deployment, which protected training data privacy and guaranteed the security of the trained SVM model.

In this paper, we propose a new privacy preserving scheme for training and disease diagnosis of the multiclass SVM algorithm. We make a comparison analysis with the schemes in [38–40]. The experimental results demonstrate that our scheme has more practical application value.

3. Preliminaries

In this section, we describe some techniques as the basis of our scheme, including the Paillier cryptosystem, BFV cryptosystem, and SVM algorithm.

3.1. Paillier Cryptosystem

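In the training phase, the data are encrypted by the Paillier cryptosystem [41]. The Paillier cryptosystem is a public key cryptosystem with additive homomorphic operations. We introduce the Paillier cryptosystem as follows.
(i) Key generation: Set the security parameter $\kappa$. Choose two big primes $p$, $q$ and compute $N = pq$; $\lambda = \operatorname{lcm}(p-1, q-1)$ is the Carmichael function of $N$. Choose a random number $g \in \mathbb{Z}_{N^2}^{*}$, and compute $\mu = \left(L(g^{\lambda} \bmod N^2)\right)^{-1} \bmod N$, where $L(u) = (u-1)/N$. The public key is $pk = (N, g)$. The private key is $sk = (\lambda, \mu)$.
(ii) Encryption: Given $m \in \mathbb{Z}_N$. The message $m$ will be encrypted with $pk$. The ciphertext is expressed as $c = g^{m} r^{N} \bmod N^2$, where $r \in \mathbb{Z}_N^{*}$ is a random number.
(iii) Decryption: According to the key generation stage and Carmichael's theorem, $r^{N\lambda} \equiv 1 \pmod{N^2}$. So $c^{\lambda} \equiv g^{m\lambda} \pmod{N^2}$. Then, $m = L(c^{\lambda} \bmod N^2) \cdot \mu \bmod N$.
(iv) Homomorphic computation: Given two ciphertexts $c_1 = E(m_1)$, $c_2 = E(m_2)$ under the same public key $pk$. The homomorphic computations are defined as $c_1 \cdot c_2 \bmod N^2 = E(m_1 + m_2)$ and $c_1^{k} \bmod N^2 = E(k \cdot m_1)$.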
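The four steps above can be illustrated with a toy implementation. This is a minimal sketch under stated assumptions, not the implementation used in this paper: the function names are hypothetical, the generator is fixed to $g = N + 1$ (a common valid choice) instead of being drawn at random, and the two primes are small known Mersenne primes that are far too short for real security.

```python
import math
import random

def L(u, n):
    """The function L(u) = (u - 1) / n used in Paillier decryption."""
    return (u - 1) // n

def keygen(p, q):
    """Toy key generation from two given primes (no primality checks)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1  # assumption: fixed generator instead of a random g
    mu = pow(L(pow(g, lam, n * n), n), -1, n)  # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pk, m):
    """c = g^m * r^n mod n^2 with a fresh random r coprime to n."""
    n, g = pk
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(pk, sk, c):
    """m = L(c^lambda mod n^2) * mu mod n."""
    n, _ = pk
    lam, mu = sk
    return L(pow(c, lam, n * n), n) * mu % n

def add(pk, c1, c2):
    """Homomorphic addition: multiply the ciphertexts."""
    n, _ = pk
    return c1 * c2 % (n * n)

def scalar_mul(pk, c, k):
    """Homomorphic multiplication by a plaintext constant k."""
    n, _ = pk
    return pow(c, k, n * n)

# Demo with two known Mersenne primes (illustration only, not secure).
pk, sk = keygen(2**31 - 1, 2**61 - 1)
c1, c2 = encrypt(pk, 1234), encrypt(pk, 4321)
print(decrypt(pk, sk, add(pk, c1, c2)))        # decrypts to 1234 + 4321
print(decrypt(pk, sk, scalar_mul(pk, c1, 3)))  # decrypts to 3 * 1234
```

Only addition of plaintexts and multiplication by a known constant are available this way; multiplying two encrypted values requires an interactive protocol.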

3.2. BFV Cryptosystem

In the prediction phase, the data are encrypted by the BFV cryptosystem [34]. The BFV cryptosystem is a leveled FHE public key cryptosystem based on RLWE, which supports an unlimited number of additive homomorphic operations and a limited number of multiplicative homomorphic operations.
(i) Key generation: Generate a polynomial . The private key is defined as . Then, generate a polynomial from ciphertext polynomial space (polynomial ), . The polynomial is used to generate public key. Define a noise polynomial . The notation expresses the Gaussian distribution. The public key is .
(ii) Encryption: The message . Define . The ciphertext is computed as .
(iii) Decryption: To decrypt the ciphertext , define . The message is computed as .
(iv) Homomorphic computation: The BFV cryptosystem supports ciphertext batch processing. Define two -dimensional vectors encrypted under public key , . The homomorphic computations are defined as follows:

3.3. SVM Algorithm

SVM is a classical supervised learning algorithm for binary classification problems. The SVM algorithm finds the best separating hyperplane. The classifier is a decision function $f(x) = \operatorname{sign}(w^{T}x + b)$, where $+1$ expresses the positive class and $-1$ expresses the negative class.

There are two training methods for the SVM model: one is based on the SMO algorithm and the other is based on the gradient descent algorithm. Because the operation steps of the SMO algorithm are more complex and incur a high computation cost on encrypted data, we choose gradient descent to realize the privacy preserving SVM model training. In the SVM model training process based on the gradient descent method, the objective function needs to be minimized. When , it means that the classification is correct. The and the parameters do not need to be updated. When , it means that the classification is incorrect. The and the parameters need to be updated.
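To make the update rule concrete, the following plaintext sketch trains a binary SVM by gradient descent. It assumes the standard regularized hinge loss, under which a sample with $y(w^{T}x + b) \ge 1$ triggers no loss-driven update and a sample with $y(w^{T}x + b) < 1$ does. The function names, the hyperparameter values, and the toy data are hypothetical and are not taken from this paper.

```python
def train_svm(samples, labels, epochs=200, lr=0.01, lam=0.01):
    """Binary SVM trained by per-sample gradient descent on the hinge loss.

    Assumption: objective lam/2 * ||w||^2 + sum(max(0, 1 - y * (w.x + b))).
    """
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin >= 1:
                # Correctly classified with margin: only the regularizer acts.
                w = [wi - lr * lam * wi for wi in w]
            else:
                # Margin violated: move w and b toward classifying x correctly.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 for the positive class and -1 for the negative class."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Toy linearly separable data (hypothetical).
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
y = [1, 1, -1, -1]
w, b = train_svm(X, y)
print([predict(w, b, x) for x in X])
```

Each step uses only additions, multiplications, and one sign test on the margin, which is why this method maps more directly onto encrypted building blocks than SMO.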

4. System Overview

In this section, we will introduce our system model, security goals, and threat model.

4.1. System Model

Our system model should achieve the privacy preserving training and online disease diagnosis process. Therefore, our system model is designed as shown in Figure 2.

There are six participants in our system model, which are trusted authority (TA), medical centers (MCs), cloud storage server (CSS), cloud computation server (CCS), diagnosis service provider (DSP), and users.
(i) Trusted authority (TA): TA is the fully trusted party of the whole system, which is used to generate and distribute keys for other participants in the system. After initialization, TA will stay offline.
(ii) Medical centers (MCs): Each MC has its own local medical data. To reduce the local storage cost, MCs will outsource the medical data to CSS for storage.
(iii) Cloud storage server (CSS): CSS has the ability to store and manage outsourced data. CSS can perform privacy preserving computation with its powerful computation power.
(iv) Cloud computation server (CCS): CCS assists CSS to complete privacy preserving computation.
(v) Diagnosis service provider (DSP): DSP wants to train a machine learning model on the outsourced data from MCs and provides online aided disease diagnosis for users. Due to its limited computation and communication ability, DSP will outsource the training and diagnosis to CSS.
(vi) Users: Users are patients or doctors who have unlabeled samples and want to get the diagnosis results. The users will send encrypted diagnosis requests to CSS and obtain the encrypted results. The users can decrypt the results with their own private keys.

4.2. Security Goals

In order to meet the security requirements of outsourced training and diagnosis, our scheme will achieve the following security goals.
(i) Medical data privacy: The outsourced data of MCs will not be leaked to other participants in the whole machine learning process.
(ii) Model privacy: Other participants cannot learn any useful information about the model of DSP.
(iii) Users privacy: The diagnosis requests and results of users will not be acquired by other participants.
(iv) Intermediate results privacy: In the execution of protocols, no participant can infer other participants' sensitive information through the intermediate results.

In our scheme, the training and diagnosis processes are completed by CSS and CCS. All participants are semi-honest (or honest-but-curious). Specifically, they will honestly implement the secure computation protocols, but they will try to analyze the sensitive data and intermediate results to infer the useful information of other participants. Like the previous works, we assume that CSS and CCS will not collude. Because CSS and CCS belong to different commercial companies, they will not collude with each other for their own reputation.

4.3. Threat Model

In this paper, we define three attacks in our system model.
(i) Eavesdropping attack: This attack means that an adversary can eavesdrop and analyze data during the data transmission. The data transmission includes the outsourcing process and the interaction between participants in protocol implementation.
(ii) Honest-but-curious attack: All participants will implement the protocol honestly, but they will try to infer useful information during the execution of protocols.
(iii) Client-collusion attack: In the training and diagnosis process, some clients may collude to analyze the useful information of other participants.

5. Proposed Scheme

In this section, we describe the proposed scheme in detail. Our scheme mainly includes system initialization, privacy preserving machine learning training, and online disease diagnosis.

In order to accurately describe our proposed scheme, we give the description of used notations in Table 1.

5.1. System Initialization

In the system initialization phase, TA generates system parameters and distributes the parameters for MCs, CSS, CCS, and DSP, respectively. TA sends the parameters through the secure communication channel. Then, TA will stay offline. We assume that there are MCs in our system. Because the Paillier cryptosystem and BFV cryptosystem can only encrypt integers, the floating point numbers and negative numbers should be converted into integers. Therefore, all participants should make data conversion before encrypting their sensitive information.

5.1.1. Generate System Parameters

(1) Generate a public-private key pair of the Paillier cryptosystem and a public-private key pair of the BFV cryptosystem. The BFV plaintext space is . The public keys are public and the private keys are sent to the CCS.
(2) Generate a public-private key pair of the Paillier cryptosystem and a public-private key pair of the BFV cryptosystem for CSS. The BFV plaintext space is . The public keys are public and the private keys are sent to CSS.
(3) Generate a public-private key pair of the Paillier cryptosystem for DSP. The public key is public and the private key is sent to DSP.
(4) Generate a random integer . TA randomly splits to integers, satisfying and sends to . Then, generate two lists and . Each list has random integers, . Each element in and represents the ID of each MC. When sends authentication to CSS, will hide and with and , respectively. The is sent to . and are sent to CSS.

5.1.2. Data Conversion

In the machine learning application scenario, data and model parameters contain floating point numbers and negative numbers.

For a floating point number , we enlarge to ( is the precision of floating point numbers). For example, given a floating point number and the precision , we can convert into an integer . For a negative number , we divide the plaintext space ( is expressed the plaintext space of the Paillier or BFV cryptosystem) into two parts because all variables and intermediate results in the process of training and prediction are much smaller than . An integer in represents a positive integer and represents a negative integer. When encrypting the negative integer , it is converted to encrypt . If is both a floating point number and a negative number, is first converted into a negative integer.
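The conversion described above can be sketched as follows. This is an illustrative encoding only: the names `encode` and `decode`, the use of a power-of-ten scale, and the small modulus in the demo are assumptions introduced here, and the modulus stands in for the plaintext space of either cryptosystem.

```python
def encode(x, precision, n):
    """Map a real number to an integer in [0, n).

    The value is scaled by 10**precision and rounded. Non-negative results
    stay in the lower half of the plaintext space; a negative result -v is
    stored as n - v, which lands in the upper half.
    """
    scaled = round(x * 10 ** precision)
    if abs(scaled) >= n // 2:
        raise ValueError("value too large for the plaintext space")
    return scaled % n

def decode(v, precision, n):
    """Inverse of encode: the upper half of [0, n) is read as negative."""
    signed = v - n if v > n // 2 else v
    return signed / 10 ** precision

n = 10007  # hypothetical small plaintext modulus, for illustration only
print(encode(3.14, 2, n))                 # positive float, precision 2
print(encode(-2.5, 1, n))                 # negative float, stored as n - 25
print(decode(encode(-2.5, 1, n), 1, n))   # round trip
```

Because every product of two encoded values multiplies the scale factors, the scale of intermediate results has to be tracked so that values stay well below half of the plaintext space.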

5.2. Privacy Preserving Machine Learning Training

The privacy preserving machine learning training process is completed by CSS and CCS. We assume that the amount of outsourced data is .

5.2.1. Local Data Outsourcing

To protect the privacy of MCs’ local data, MCs will encrypt the data before outsourcing. The outsourcing process of is as follows.(1) generates a random integer . Computing as public key and the private key is .(2)Computing .(3)For each plaintext data, such as , will make a data conversion as mentioned in Section 5.1. Then, compute to encrypt and outsource the encrypted data to CSS for storage.

5.2.2. Secure Basic Building Blocks for Training

To complete the privacy preserving outsourced training, we construct some algorithms as basic building blocks based on the Paillier cryptosystem: secure data aggregation (Block_1), secure multiplication algorithm (Block_2), secure inner product algorithm (Block_3), secure scalar multiplication of vector algorithm (Block_4), and secure symbol judgment algorithm (Block_5). The algorithms will be executed with CSS and CCS.

(1) Secure data aggregation algorithm (Block_1). CSS needs to aggregate MCs’ outsourced data before starting machine learning training. The algorithm works as follows and is described in Algorithm 1.(1)CSS sends a training request to , .(2)After receiving the training request, computes as authentication (The indicates that CSS is allowed to use the outsourced data of for training) and sends to CSS.(3)CSS obtains the of through the and computes . It should be noted that can be obtained only after all MCs have sent their authentication. Then, CSS computes and completes the aggregation.

(2). Secure multiplication algorithm (Block_2). Given two encrypted integers and , the algorithm needs to compute . The algorithm works as follows and is described in Algorithm 2.(1)CSS generates two random integers and . Then, it computes by applying the additive homomorphism, obtaining the following results.Then, sending them to CCS.(2)CCS generates a random integer . It decrypts by using . Then, it encrypts with to get . Computing and encrypting with . Sending and to CSS.(3)CSS decrypts with and computes . Then, computing by applying the additive homomorphism, obtaining the following results.Computing the result,
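The idea behind this block can be sketched with additive blinding. The code below is a simplified illustration and not the exact protocol of Algorithm 2: it uses a single Paillier key pair whose private key is held by CCS, omits the extra random integer and re-encryption under a second key that CCS performs in our scheme, and reuses a toy Paillier with $g = N + 1$ and small Mersenne primes. All function names are hypothetical.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier keys with g = n + 1 (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam % n, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n, n + 1), (lam, mu)

def enc(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def dec(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def secure_multiply(pk, sk, cx, cy):
    """Return an encryption of x*y given encryptions of x and y."""
    n, _ = pk
    n2 = n * n
    # CSS: blind both inputs additively before they leave the server.
    r1, r2 = random.randrange(1, n), random.randrange(1, n)
    bx = cx * enc(pk, r1) % n2            # encrypts x + r1
    by = cy * enc(pk, r2) % n2            # encrypts y + r2
    # CCS: decrypt the blinded values, multiply them, re-encrypt.
    h = dec(pk, sk, bx) * dec(pk, sk, by) % n
    ch = enc(pk, h)                       # encrypts (x + r1)(y + r2)
    # CSS: strip the blinding, since xy = h - r2*x - r1*y - r1*r2.
    t1 = pow(cx, n - r2, n2)              # encrypts -r2 * x
    t2 = pow(cy, n - r1, n2)              # encrypts -r1 * y
    t3 = enc(pk, (-r1 * r2) % n)          # encrypts -r1 * r2
    return ch * t1 * t2 * t3 % n2

pk, sk = keygen(2**31 - 1, 2**61 - 1)
cxy = secure_multiply(pk, sk, enc(pk, 12), enc(pk, 34))
print(dec(pk, sk, cxy))  # decrypts to 12 * 34
```

In this simplified version CCS sees only the uniformly blinded values $x + r_1$ and $y + r_2$, and CSS sees only ciphertexts, which is the property the full two-key protocol is designed to keep.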

(3). Secure inner product algorithm (Block_3). Given two encrypted vectors . The algorithm will compute and is described in Algorithm 3.

Algorithm 1: Secure data aggregation algorithm (Block_1).
Input: the authentication and outsourced data of .
Output: the training data.
CSS:
(1) Send a training request to , .
MCs:
(2):
   sends to CSS
 end
CSS:
(3) CSS obtains ,
(4) Compute
(5):
  Compute
  For each outsourced data of , such as ,
  Compute to complete aggregate
 end

Algorithm 2: Secure multiplication algorithm (Block_2).
Input:
Output:
CSS:
(1)
(2)
(3) Send to CCS.
CCS:
(4)
(5) Decrypt
(6)
(7) Encrypt with
(8) Encrypt with
(9) Send and to CSS.
CSS:
(10) Decrypt with and Compute
(11)
(12)
(13)
(14)

Algorithm 3: Secure inner product algorithm (Block_3).
Input:
Output:
CSS:
(1) Define .
(2):
  
 end

(4). Secure scalar multiplication of vector algorithm (Block_4). Given an encrypted vector and an encrypted integer , the algorithm will compute and is described in Algorithm 4.

Algorithm 4: Secure scalar multiplication of vector algorithm (Block_4).
Input:
Output:
CSS:
(1) Define .
(2):
  
 end

Algorithm 5: Secure symbol judgment algorithm (Block_5).
Input:
Output:
CSS:
(1)Choose a random integer
(2)
(3)Send to CCS
CCS:
(4)Decrypt
(5)
(6)Encrypt with
(7)Send to CSS

(5). Secure symbol judgment algorithm (Block_5). Given an encrypted integer , the algorithm will compute the sign of . Let if else . The algorithm works as follows and is described in Algorithm 5.(1)CSS chooses a random integer . Then, it computes by applying the additive homomorphism and sends to CCS.(2)CCS decrypts . Let if else . Then, it sends to CSS.(3)CSS decrypts and obtains the symbol .
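One plausible reading of this block is multiplicative blinding, which hides the magnitude of the value while keeping its sign. The sketch below rests on several assumptions that are not stated in the text: the blinding factor is a positive random integer applied as a homomorphic scalar multiplication, negative numbers use the upper half of the plaintext space as in Section 5.1.2, the output is 1 for a non-negative input and 0 otherwise, and the encrypted return of the result from CCS to CSS is omitted. The toy Paillier helpers and all names are hypothetical.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier keys with g = n + 1 (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam % n, -1, n)
    return (n, n + 1), (lam, mu)

def enc(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def dec(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def secure_sign(pk, sk, cx):
    """Return 1 if the encrypted value is non-negative, else 0 (assumed mapping)."""
    n, _ = pk
    # CSS: scale x by a positive random r; r * |x| must stay below n / 2.
    r = random.randrange(1, 2**32)
    blinded = pow(cx, r, n * n)          # encrypts r * x mod n
    # CCS: decrypt the blinded value; the upper half of [0, n) means negative.
    v = dec(pk, sk, blinded)
    return 0 if v > n // 2 else 1

pk, sk = keygen(2**31 - 1, 2**61 - 1)
n = pk[0]
print(secure_sign(pk, sk, enc(pk, 25)))            # non-negative input
print(secure_sign(pk, sk, enc(pk, (-25) % n)))     # negative input
```

Under this reading CCS learns the sign of the value but not its magnitude, which matches the leakage acknowledged in the security analysis of Block_5.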

5.2.3. Privacy Preserving Outsourced Training with Multiclass SVM

In this section, we construct a privacy preserving outsourced training protocol to train a multiclass SVM model using the proposed building blocks. DSP outsources the training task to CSS and CSS completes the aggregation of outsourced data. Then, CSS and CCS complete the model training. After finishing the training, CSS transforms into . To achieve the transformation, we use the algorithm proposed in reference [38].

For multiclass SVM training, there are two methods: one-versus-rest (OvR) and one-versus-one (OvO). In order to improve the efficiency and reduce the number of iterations, we choose the OvR method for training. We need to construct binary SVM classifiers, each of which corresponds to one class. The process is described in Algorithm 6.
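The one-versus-rest decomposition itself is independent of the encryption layer and can be sketched in plaintext. In the code below, `train_binary` is a hypothetical callable standing in for any binary SVM trainer; in our scheme this role is played by the encrypted gradient descent loop built from the blocks above.

```python
def one_vs_rest_labels(labels, target):
    """Relabel a multiclass problem as binary: +1 for target, -1 for the rest."""
    return [1 if y == target else -1 for y in labels]

def train_one_vs_rest(samples, labels, train_binary):
    """Train one binary classifier per class and return them keyed by class.

    train_binary(samples, binary_labels) is supplied by the caller and may
    return any model object (for example a weight vector and a bias).
    """
    classes = sorted(set(labels))
    return {
        c: train_binary(samples, one_vs_rest_labels(labels, c))
        for c in classes
    }

# Demo with a stub trainer that simply records the binary labels it received.
models = train_one_vs_rest([[0.1], [0.5], [0.9]], [0, 1, 2],
                           lambda X, y: tuple(y))
print(models)
```

With OvR the number of binary classifiers equals the number of classes, whereas OvO would need one classifier per pair of classes.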

Algorithm 6: Privacy preserving outsourced multiclass SVM training.
Input: outsourced data of MCs
  ,
  iterations , learning rate ,
  regularization parameter
Output: encrypted binary SVM classifiers parameters
   
(1):
  :
   
   :
(2)    
(3)    
(4)    :
     
     
   end
(5)   
  end
 end
(6) return

5.3. Privacy Preserving Online-Aided Disease Diagnosis

In this section, our proposed scheme consists of four steps: diagnosis outsourcing, secret diagnosis request generation, diagnosis values computation, and diagnosis result generation. The privacy preserving online-aided disease diagnosis is completed by CSS and CCS.

5.3.1. Diagnosis Outsourcing

To reduce the computation and communication overhead, DSP outsources the SVM model parameters to CSS and authorizes CSS to provide diagnosis service for users.

The SVM parameters of DSP are expressed as (There are classifiers),

For and the corresponding class result , DSP generates a -dimensional random integer vector and a random integer , . Then, DSP computes and to hide the parameters and the class results.

According to the combination of subtraction of random integer vectors, DSP constructs a combination table. The combination table has values, as shown in Table 2. The values in combination table are used to eliminate the blinding factors in subsequent computation.

DSP encrypts with , with , with and all values of combination table with . Then, DSP sends them as the outsourced parameters to CSS. After receiving the outsourced parameters, CSS decrypts and the combination table with . CSS computes as follows:

5.3.2. Secret Diagnosis Request Generation

For , the symptom is expressed as (The last 1 is added to facilitate the computation of vector inner product). The generates a -dimensional random integer vector and . Then, hides plaintext symptoms .

The encrypts symptom with and encrypts with . Let as the secret prediction request of .

Then, the sends to CSS.

5.3.3. Diagnosis Value Computation

Disease diagnosis in our scheme is a multiclass classification problem, so it is necessary to compute the diagnosis value of each class. After receiving the secret prediction request , CSS decrypts with . Then, it computes by the homomorphic operation of the BFV cryptosystem.

According to the decision function of the SVM algorithm, a diagnosis value needs to be computed by one multiplication homomorphic operation and one addition homomorphic operation. Because the BFV encryption algorithm supports ciphertext packaging, batch operation can be realized and the computation efficiency is significantly improved. The process is described in Algorithm 7.
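In plaintext, the diagnosis value of each class is the inner product of that class's parameter vector with the symptom vector extended by a trailing 1, so that the bias is absorbed into the inner product. The sketch below shows only this arithmetic; it leaves out the blinding, the encryption, and the BFV batching, and the function name and the sample numbers are hypothetical.

```python
def diagnosis_values(classifiers, symptoms):
    """Compute one decision value per class.

    classifiers: list of parameter vectors, each holding the weights
                 followed by the bias as the last entry.
    symptoms:    feature vector of the user, without the trailing 1.
    """
    extended = list(symptoms) + [1]  # the last 1 multiplies the bias term
    return [
        sum(w * x for w, x in zip(params, extended))
        for params in classifiers
    ]

# Three hypothetical classifiers over two features.
classifiers = [[1, 0, -2], [0, 1, 0], [-1, -1, 5]]
values = diagnosis_values(classifiers, [3, 4])
print(values)
```

Each value needs one multiplication step and one addition step per coordinate, which is what lets the encrypted version pack all coordinates into one batched ciphertext operation.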

Algorithm 7: Diagnosis value computation.
Input: , ,
Output: encrypted diagnosis values
   
CSS:
(1):
  
 end

5.3.4. Diagnosis Result Generation

After computing the diagnosis values, CSS obtains encrypted diagnosis values and each value corresponds to a class result. Then, CSS needs to select the classification corresponding to the maximum value from the encrypted values as the diagnosis result.

Therefore, we design a secure maximum finding protocol and a secure comparison algorithm. In this process, CSS and CCS jointly execute the protocol.

(1). Secure maximum finding. CSS sets an initial maximum position . Then, CSS executes cycles and each cycle executes a secure comparison algorithm to continuously update the value.

After cycles, CSS obtains the final diagnosis result and converts into under the public key of . To achieve the transformation, we use the algorithm proposed in literature [38]. Then, CSS sends to . The decrypts the encrypted result with . The process is described in Algorithm 8.
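The control flow of the maximum finding loop can be sketched with the comparison abstracted away. In the code below, `is_greater` is a hypothetical callable standing in for the secure comparison algorithm run jointly by CSS and CCS; the sketch assumes one comparison per remaining candidate and does not model the final key transformation of the result.

```python
def find_max_position(values, is_greater):
    """Return the index of the largest value using pairwise comparisons only.

    is_greater(a, b) must return True when a is larger than b. In the secure
    setting, a and b would be encrypted diagnosis values and is_greater would
    be the interactive secure comparison.
    """
    max_pos = 0  # initial maximum position
    for i in range(1, len(values)):
        if is_greater(values[i], values[max_pos]):
            max_pos = i  # update the current maximum position
    return max_pos

# Plaintext stand-in for the comparison, with a counter for the demo.
calls = []
def plain_compare(a, b):
    calls.append((a, b))
    return a > b

class_results = ["class A", "class B", "class C"]  # hypothetical class labels
pos = find_max_position([1, 4, -2], plain_compare)
print(class_results[pos], len(calls))
```

CSS only ever learns the outcome of each comparison, so the privacy of the loop reduces to the privacy of the comparison step.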

Algorithm 8: Secure maximum finding.
Input: diagnosis values and corresponding class results
   ;
   initial
Output:
CSS:
:
(1)
(2):
  
end
(3) Transform into with CCS
(4) Send to .
:
(5) decrypts with .

(2). Secure comparison (SC). For the i-th cycle, CSS computes . Then, according to and , computing . The corresponds to the value in the combination table and computing as follows:

At this time, has eliminated in .

CSS chooses equal random integers and . Computing . Then, CSS chooses different random integers, and computing . Summing all elements in to get and encrypting it with . CSS sends to CCS.

CCS decrypts with and with , . Then, summing each dimension, .

CCS removes from by computing . Let if , else .

CCS encrypts with and sends it to CSS. CSS decrypts it and if , updates the value of .

The process is described in Algorithm 9.

Algorithm 9: Secure comparison (SC).
Input:
Output:
CSS:
(1)
(2)
(3)
(4)
(5) Generate .
(6)
(7) Generate .
(8)
(9)
(10) Send to CCS.
CCS:
(11) Decrypt .
(12),
(13)
(14)
(15) Encrypt . Send it to CSS
CSS:
(16) Decrypt

6. Security Analysis

In this section, we analyze the security of the proposed scheme. The focus is on the outsourced data of MCs, the SVM model parameters of DSP, the symptoms, and diagnosis results of users.

6.1. Security Analysis of Training

In the training phase, the outsourced data of MCs and the SVM model parameters of DSP need privacy preserving. The training protocol is composed of building blocks designed in Section 5.2.2, which are completed by CSS and CCS. According to the threat models proposed in Section 4.3, we analyze the security of the training protocol.

6.1.1. Eavesdropping Attack

The data transmission process in the training phase includes that MCs outsources the encrypted data to CSS and the interactions of training protocol between CSS and CCS.

In the outsourcing process, the data of have been encrypted. combines the system public key , parameter , and its own public key to ensure that the data are hidden while encrypting. Suppose an adversary obtains the private key and eavesdrops when MCs outsource their data to CSS. Because the data of have been encrypted, such as , the adversary cannot obtain any useful information. Similarly, the authentications are also hidden by random numbers. During the execution of the training protocol, CSS and CCS interact, and the transmitted data are encrypted and their real values are hidden by random numbers. Therefore, the adversary again cannot obtain any useful information.

6.1.2. Honest-But-Curious Attack

During the training phase, CSS and CCS will get some intermediate results from the proposed building blocks in Section 5.2.2.

In Block_2, CSS hides with by homomorphic operation before sending them to CCS. Then, CCS sends and to CSS after computing. Therefore, neither CSS nor CCS can learn any useful information about . Because Block_3 and Block_4 are built on Block_2, we do not analyze them separately. In Block_5, CSS hides with and sends to CCS. CCS can only learn the symbol of , but cannot obtain the real value of . CCS only returns the result (0 or 1) to CSS. From the above analysis, CSS and CCS cannot learn any useful information in the training process.

6.1.3. Client-Collusion Attack

For MCs, each only knows its own . Therefore, even if some MCs collude with each other to steal the private data of another MC, they cannot learn any useful information.

6.2. Security Analysis of Disease Diagnosis

In the diagnosis phase, the SVM parameters of DSP, the symptom and the diagnosis result of must be kept private. The diagnosis process consists of diagnosis outsourcing, secret diagnosis request generation, diagnosis value computation, and diagnosis result generation. We therefore analyze the security of these main steps under the threat model.

6.2.1. Eavesdropping Attack

Data are transmitted when DSP outsources and to CSS, when sends request to CSS, and when CSS and CCS interact during the diagnosis process.

From the encrypted data in the outsourcing process, it can be seen that the adversary (CCS) can only decrypt and with and . However, the adversary cannot learn because of the and the do not contain any useful information. When sends to CSS, the symptom may be eavesdropped and decrypted by the adversary, but is hidden by random numbers. In the interaction of the SC algorithm between CSS and CCS, all transmitted data are in ciphertext form and hidden by random numbers, so the adversary cannot learn any useful information.

6.2.2. Honest-But-Curious Attack

In the diagnosis value computation process, CSS can only obtain the encrypted diagnosis values under and does not know the corresponding classification meaning. The whole process is executed in the ciphertext state, so CSS cannot learn any useful information. The process of diagnosis result generation consists of the secure maximum finding protocol and the secure comparison algorithm. When CSS and CCS execute the secure comparison algorithm, CSS computes the difference between the two encrypted vectors to be compared. The resulting difference vector obscures the signs of the two original values in each dimension. In addition, random integers are used to hide the difference vector. After decrypting the difference vector, CCS can eliminate the random integers only by summing its entries. During this process, CSS and CCS cannot obtain any useful information.
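
The masking step described above can be sketched in plaintext Python. Encryption is omitted, and the sketch assumes that the random integers are chosen to sum to zero, which is one way for the masks to cancel only after summation; the function names are hypothetical.

```python
import secrets


def zero_sum_masks(length, bound=2**32):
    """Random integers that sum to zero (ASSUMPTION: masks cancel on summing)."""
    masks = [secrets.randbelow(2 * bound) - bound for _ in range(length - 1)]
    masks.append(-sum(masks))
    return masks


def css_masked_difference(u, v):
    """Hypothetical CSS step: elementwise difference hidden by zero-sum masks."""
    masks = zero_sum_masks(len(u))
    return [a - b + r for a, b, r in zip(u, v, masks)]


def ccs_compare(masked):
    """Hypothetical CCS step: the masks vanish only in the sum of all entries."""
    return 1 if sum(masked) >= 0 else 0
```

Each individual entry of the masked vector looks random, while the sum still equals the difference between the totals of the two original vectors, which is all the comparison needs.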

After CSS and CCS execute the secure maximum finding protocol, CSS obtains the diagnosis result . When performing key conversion on , CSP hides with a random integer and then sends to CCS. CCS can decrypt it, but because of the hidden random integer, CCS cannot obtain .

6.2.3. Client-Collusion Attack

Each user can obtain only the diagnosis results and no other information. Therefore, our proposed scheme can resist the client-collusion attack.

7. Performance Evaluation

In this section, we implemented our scheme and evaluated the performance of training and diagnosis.

Our experimental environment is shown in Table 3.

In our experiments, we evaluated our proposed scheme on the dermatology dataset, a real dataset from the UCI Machine Learning Repository. The dermatology dataset is a multiclass dataset with 6 categories and 34 symptoms.

7.1. Privacy Preserving Machine Learning Training Evaluation

7.1.1. Effect of Key Length on Computation Overhead

The key length of a cryptosystem has a great impact on both efficiency and security. Therefore, we measured the data encryption time and the running time of the main building blocks (Block_1 and Block_3), which have high computation overhead. The test results are shown in Table 4.

Table 4 shows that increasing the key length has a great impact on the computation overhead. Based on the experimental results and security considerations, the key length of the Paillier cryptosystem is set to 1024 bits in the training phase.

7.1.2. Privacy Preserving Multiclass SVM Training Analysis

In order to meet the requirements of data encryption, we convert all floating-point numbers to integers. The conversion accuracy of floating-point numbers has a great impact on the accuracy of the SVM model. We tested the accuracy of the SVM model under different values; the results are shown in Figure 3.
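
The float-to-integer conversion can be illustrated with a short Python sketch. It assumes a decimal scaling factor of 10 to the power `precision`; the parameter name and the helper functions are hypothetical and only show why the precision matters.

```python
def to_fixed(x, precision):
    """Scale a float by 10**precision and round to the nearest integer."""
    return round(x * 10**precision)


def from_fixed(v, precision, factors=1):
    """Undo the scaling; a product of `factors` scaled values carries
    a scale of 10**(precision * factors)."""
    return v / 10**(precision * factors)


def fixed_dot(w, x, precision):
    """Inner product computed entirely on scaled integers."""
    return sum(to_fixed(a, precision) * to_fixed(b, precision)
               for a, b in zip(w, x))
```

For example, with `precision = 2` the vectors `[0.5, -0.25]` and `[2.0, 1.0]` give the integer inner product 7500, which `from_fixed(7500, 2, factors=2)` maps back to 0.75. A smaller precision discards more digits during rounding, which is why the model accuracy depends on this parameter.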

Through the abovementioned experimental analysis, it can be seen that the larger the , the higher the accuracy of the model. With the increase of , the accuracy of the model tends to be stable. When , the accuracy of the model is the highest. At the same time, we also used the gradient descent method to train the SVM model in the plaintext state. We compared the accuracy with the model trained in ciphertext state and the results are shown in Table 5.
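
For reference, plaintext gradient descent training of a linear SVM can be sketched as follows. This is a generic full-batch subgradient method on the regularized hinge loss for a single binary classifier, not the configuration used in our experiments; the learning rate, regularization weight, and number of epochs are assumed values.

```python
def train_linear_svm(samples, labels, lr=0.1, reg=0.01, epochs=100):
    """Full-batch subgradient descent on the regularized hinge loss.

    labels must be +1 or -1. ASSUMPTION: lr, reg, and epochs are
    illustrative defaults.
    """
    dim = len(samples[0])
    n = len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

A multiclass model can be obtained from such binary classifiers, for example by training one classifier per class and choosing the class with the largest decision value.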

The above experimental analysis shows that the accuracy of our proposed scheme is the same as that of the model trained in the plaintext state (98.61%). This verifies that our proposed scheme is correct and usable.

7.2. Privacy Preserving Online-Aided Disease Diagnosis Evaluation

In the diagnosis phase, we implemented our proposed scheme using the SEAL library.

7.2.1. Noise Effect of BFV Cryptosystem

When using the BFV cryptosystem for homomorphic operations, the influence of noise needs to be considered. The noise in a ciphertext increases when a homomorphic multiplication is carried out. If the noise is too large after the computation, the correct result cannot be obtained after decryption.

Therefore, the BFV cryptosystem in SEAL sets a noise budget during initialization. If the noise budget is still greater than 0 after the computation, the ciphertext can be decrypted correctly. The value of the noise budget depends on the parameter settings. We evaluated the influence of the polynomial modulus degree on the encryption time, the change in noise budget after homomorphic operations, the computation time, and whether the decryption result is correct. The results are shown in Table 6. The BFV cryptosystem consumes a relatively large amount of noise budget for each homomorphic multiplication, so it can perform only a limited number of homomorphic multiplications. Computing the diagnosis values requires only one inner product operation and one addition operation, so using the BFV cryptosystem is entirely feasible.
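
The shallow structure of the diagnosis value computation can be seen in a plaintext Python sketch: each class needs one inner product and one addition, followed by a maximum search. The sketch assumes integer-scaled weights, biases, and symptoms; the names are hypothetical, and in the scheme itself these operations are carried out on ciphertexts.

```python
def diagnosis_values(weights, biases, symptoms):
    """One inner product and one addition per class."""
    return [sum(w * x for w, x in zip(row, symptoms)) + b
            for row, b in zip(weights, biases)]


def diagnose(values):
    """Index of the class with the largest diagnosis value."""
    return max(range(len(values)), key=values.__getitem__)
```

Because the multiplicative depth is one regardless of the number of classes, the noise consumed by the inner product does not grow with the number of classes.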

We comprehensively consider the encryption time and computation time and ensure that the computation results can be decrypted correctly. The parameter we set is .

7.2.2. Influence of Different Classification Numbers on Computation Overhead

When using the BFV cryptosystem to encrypt data, multiple plaintext values can be packed and encrypted into a single ciphertext. The number of classifications is .

We tested the impact of different on and DSP. The results are shown in Figure 4(a). With the increase of , the encryption time of DSP is gradually increasing, and the encryption time of can be considered as unchanged. We also tested the impact of different on the diagnosis values computation of CSS. The results are shown in Figure 4(b). With the continuous increase of , the computation time for CSS is also increasing. The process of generating diagnosis result is jointly completed by CSS and CCS. We tested the effect of different on the diagnosis result generation. The results are shown in Figure 4(c). With the continuous increase of , the time for CSS and CCS is also increasing.

7.2.3. Comparison Analysis of Secret Diagnosis Request Generation and Diagnosis Values Computation

In our proposed scheme, secret diagnosis request generation can be regarded as data encryption of and diagnosis value computation can be regarded as homomorphic operation. We compared our scheme with three other privacy preserving schemes. The results are shown in Table 7.

Through the comparison analysis, it can be seen that the time of data encryption in our proposed scheme is significantly reduced compared with [38, 39]. In the computation of decision function, our scheme has significantly reduced the computational cost compared with the scheme in [39, 40]. At the same time, it can be seen from the total time that our proposed scheme is significantly lower than the other three schemes.

Next, we make further analysis. The names of participants may be slightly different in different schemes. In order to facilitate analysis, we divided participants into cloud server and client. We compared the computation overhead of cloud server and client, respectively. The results are shown in Tables 8 and 9.

In our proposed scheme, the client only needs to encrypt the data and can be offline after uploading the data to the cloud server. The cloud server only needs to compute the decision function. This model reduces the computation overhead of the client to the greatest extent and performs privacy preserving computation through the powerful computing power of the cloud server. In scheme [38], the cloud server does not participate in the whole process, so it brings heavy computation overhead to the client. In scheme [39], the computation of the diagnosis values needs to be completed by the cloud server and the client. Therefore, it not only brings heavy computation overhead to the client but also requires the client to always stay online in this process.

7.2.4. Comparison Analysis of Diagnosis Result Generation

In our proposed scheme, after CSS completes the diagnosis values computation, it jointly executes the secure protocol with CCS to generate the diagnosis result. We again compare our scheme with the schemes in [38–40]. The results are shown in Table 10.

Through the comparison analysis in Table 10, it can be seen that the computation time of our proposed scheme is significantly lower than the scheme in references [38, 40]. In our proposed scheme, the client does not need to participate in the process of diagnosis result generation. The schemes in references [38, 39] require the participation of the client, which brings heavy computation overhead to the client.

7.2.5. Comprehensive Comparison Analysis

We made a comparison analysis of the whole privacy preserving online disease diagnosis process. It is divided into the secret diagnosis request generation (data encryption), diagnosis value computation, and diagnosis result generation. The results are shown in Table 11.

The comparison in Table 11 shows that the total time of our proposed scheme is significantly lower than that of the schemes in references [38–40]. In an actual application scenario, a large number of users will constantly initiate secret diagnosis requests, so it is very important to return diagnosis results to users quickly. Therefore, our scheme has more practical application value. A summary is given in Table 12.

8. Conclusion

In this paper, we propose an efficient and privacy preserving outsourced multiclass SVM training and online-aided disease diagnosis scheme. We design some secure basic operation algorithms for machine learning training over the outsourced data from multiple data owners. We achieve privacy preserving multiclass SVM training based on the basic operation algorithms. In the diagnosis phase, we achieve privacy preserving multiclass diagnosis through our proposed secure maximum finding algorithm and secure comparison algorithm. Security analysis proves that our proposed scheme ensures that outsourced data, model parameters, users’ symptoms, and diagnosis results will not be leaked. Experimental evaluation illustrates that our proposed scheme significantly reduces the computation overhead. In the future, we will study more efficient and privacy preserving machine learning schemes.

Data Availability

The data supporting the results of this study can be obtained from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The work was supported in part by the National Natural Science Foundation of China (61862052) and the Science and Technology Foundation of Qinghai Province (2019-ZJ-7065).