Abstract

With the continuous improvement of computation and communication capabilities, the Internet of Things (IoT) plays a vital role in many intelligent applications. As a result, IoT devices generate a large amount of data every day, which lays a solid foundation for the success of machine learning. However, the strong privacy requirements on IoT data make machine learning over such data very difficult. To protect data privacy, many privacy-preserving machine learning schemes have been proposed. At present, most schemes target only specific models and lack a general solution, which is undesirable in engineering practice. To meet this challenge, we propose an efficient and privacy-preserving machine learning training framework (ePMLF) in a fog computing environment. The ePMLF framework allows a software service provider (SSP) to perform privacy-preserving model training with the data on the fog nodes: the data on the fog nodes remains protected, and the model parameters are obtained only by the SSP. The proposed secure data normalization method in the framework further improves the accuracy of the trained model. Experimental analysis shows that our framework significantly reduces computation and communication overhead compared with the existing scheme.

1. Introduction

The IoT enables real-time collection of user data, and the large amount of data generated every day provides a good basis for training high-quality machine learning models. However, privacy issues hinder the application of machine learning [1, 2]. On the one hand, training data contains people's sensitive information, which must not be disclosed. On the other hand, the trained machine learning model is valuable property of the SSP, and leaking the model would bring serious economic losses to the SSP. The training process therefore faces a serious risk of privacy disclosure, and the strong privacy requirements on the data make training very difficult.

In order to solve the problem of privacy disclosure in the training process, many privacy-preserving machine learning training schemes have been proposed. There are two main approaches: secure collaborative training and secure outsourcing training. The main application scenario of secure collaborative training is federated learning [3]. In federated learning, clients train locally and then upload and download model parameters to and from the server; this frequent interaction leads to high communication overhead [4]. In addition, the goal of federated learning is to train a global model for each data provider, which differs from ours [5]. In this paper, the goal of our proposed framework is to train a private model for the SSP using the ciphertext data of the data providers (fog nodes). Secure outsourcing training is mainly based on cloud computing, where cloud servers provide abundant storage and computation resources. Data providers use homomorphic encryption or secret sharing to convert their data into ciphertext and outsource it to cloud servers. The schemes in [6–8] all adopt the dual cloud server model and assume that the two cloud servers do not collude; however, this assumption carries potential security hazards [5].

Through the above analysis, existing privacy-preserving schemes mainly have two problems. First, they target specific machine learning algorithms and cannot cover machine learning algorithms in general. Second, they lack global normalization of the data; normalization without disclosing the data helps the training process converge to the optimal solution. To address these problems, Zhu et al. [5] proposed a privacy-preserving ML training framework that contains multiple secure training protocols for the aggregation scenario and remains secure under collusion. However, the scheme in [5] does not perform global data normalization, and it has further shortcomings, including incomplete building-block functionality and high communication and computation overhead, which make the framework impractical.

Therefore, we propose an efficient and practical privacy-preserving machine learning training framework in a fog computing environment. Specifically, as data continuously accumulates, it imposes a heavy storage burden on resource-limited IoT devices, so deploying more capable fog nodes near the IoT devices has become an effective solution [9]. IoT devices store the collected data in nearby fog nodes. We assume that an SSP wants to use the data in the fog nodes to train its own private model. During training, the data in the fog nodes is not leaked to the SSP or to other fog nodes, and the model of the SSP is not leaked to the fog nodes. Our contributions are as follows:
(1) Based on the data privacy requirements of the IoT environment and the characteristics of fog computing, we propose ePMLF. Through ePMLF, fog nodes can protect the privacy of their data while an untrusted SSP trains different machine learning models. At the same time, ePMLF provides a model-updating function.
(2) ePMLF implements secure data processing. We propose two secure data normalization methods (secure z-score and secure min-max), which normalize the data distribution across all fog nodes and produce high-quality training data. Based on the OU cryptosystem, we propose a method to encrypt negative numbers. Unlike the existing scheme [10], in which only the party holding the private key can encrypt negative numbers, our method allows any party to do so.
(3) In ePMLF, we define a ciphertext as the encrypted data together with its precision, which prevents errors caused by inconsistent precision in ciphertext computation. We then design a precision control protocol that avoids the plaintext overflow caused by repeated homomorphic operations. Based on this ciphertext representation, we propose several secure algorithms as the basic building blocks of the framework. Experimental results show that these algorithms improve training efficiency.
(4) We implement the proposed framework. A rigorous security analysis shows that it meets the strong privacy requirements of all participants. Comparative evaluation shows that our scheme significantly reduces communication and computation overhead.

2. Related Work

The existing privacy-preserving machine learning training scenarios are mainly divided into two categories: secure collaborative training and secure outsourcing training.

In the scenario of secure collaborative training, each participant has some computation resources and its own data. Each participant therefore undertakes part of the computation and communication and cooperatively trains a global machine learning model on the joint training data, while keeping its data confidential during training [11–13]. Dani et al. [14] proposed protocols for solving the secure multi-party computation problem. Mehnaz et al. [15] proposed a secure sum protocol with strong security guarantees and used it to build two secure gradient-descent algorithms. Saha et al. [16] proposed FogFL, a fog-enabled federated learning framework that facilitates distributed learning and reduces the communication latency and energy consumption of resource-constrained edge devices. Xu et al. [17] presented a secure and verifiable federated learning scheme, with which federated deep learning is achieved and the final learning results are verifiable. Wang et al. [18] proposed a privacy-preserving federated learning scheme for regression training that is noninteractive throughout the training process. Zhao et al. [19] proposed a collaborative architecture based on orbital edge computing and low-orbit satellite network communication.

In the scenario of secure outsourcing training, data owners with limited computation resources outsource their ciphertext data to cloud servers, which carry out the privacy-preserving training and produce a private machine learning model for the training service requester. Liu et al. [6] designed a system for privacy-preserving decision tree training and evaluation in a twin-cloud architecture. Liu et al. [7] proposed a secure ML-kNN training and classification scheme. Wang et al. [8] proposed an efficient privacy-preserving outsourced SVM scheme that protects the privacy of both the training data and the SVM model. Zhang et al. [20] proposed a secure deep computation model that offloads the expensive operations to the cloud. Li et al. [21] proposed an outsourced privacy-preserving C4.5 algorithm over horizontally and vertically partitioned data for multiple parties. Li et al. [22] proposed a privacy-preserving multi-party machine learning framework in which data owners need not participate in the training process. Liu et al. [23] proposed a privacy-preserving clinical decision support system in the outsourced cloud computing environment. Deploying machine learning services in the cloud has become a flexible training solution; however, it generally relies on a noncollusion assumption, which is a serious security risk.

3. Preliminaries

3.1. Homomorphic Encryption
3.1.1. Okamoto-Uchiyama (OU) Cryptosystem

The OU cryptosystem is a public-key cryptosystem with additive homomorphism [24]. It works as follows:
(i) Key generation: choose two big primes $p, q$ ($|p| = |q| = k$) and compute $n = p^2 q$. Pick a random $g \in \mathbb{Z}_n^*$ such that $g^{p-1} \not\equiv 1 \pmod{p^2}$, and compute $h = g^n \bmod n$. The public key is $(n, g, h, k)$; the private key is $(p, q)$.
(ii) Encryption: given a message $0 < m < 2^{k-1}$, choose a random $r \in \mathbb{Z}_n$ and compute the ciphertext $c = g^m h^r \bmod n$.
(iii) Decryption: given a ciphertext $c$, compute $m = \frac{L(c^{p-1} \bmod p^2)}{L(g^{p-1} \bmod p^2)} \bmod p$, where $L(x) = (x-1)/p$.
(iv) Homomorphic computation: given two ciphertexts $E(m_1), E(m_2)$ under the same public key, $E(m_1) \cdot E(m_2) \bmod n = E(m_1 + m_2)$ and $E(m_1)^t \bmod n = E(t \cdot m_1)$.
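A minimal Python sketch of the OU operations above, with toy-sized primes; prime selection and parameter sizes are simplified for readability and are not secure as written:

import math, random

def ou_keygen(p, q):
    n = p * p * q
    while True:
        g = random.randrange(2, n)
        # g must be a unit mod n and satisfy g^(p-1) != 1 (mod p^2)
        if math.gcd(g, n) == 1 and pow(g, p - 1, p * p) != 1:
            break
    h = pow(g, n, n)
    return (n, g, h), (p, q)

def ou_encrypt(pk, m):
    n, g, h = pk
    r = random.randrange(1, n)
    return (pow(g, m, n) * pow(h, r, n)) % n

def ou_decrypt(pk, sk, c):
    n, g, _ = pk
    p, _ = sk
    L = lambda x: (x - 1) // p          # L(x) = (x - 1) / p
    return (L(pow(c, p - 1, p * p)) * pow(L(pow(g, p - 1, p * p)), -1, p)) % p

pk, sk = ou_keygen(1009, 1013)           # toy primes; real keys are far larger
c1, c2 = ou_encrypt(pk, 15), ou_encrypt(pk, 27)
assert ou_decrypt(pk, sk, (c1 * c2) % pk[0]) == 42   # E(m1)*E(m2) = E(m1+m2)
assert ou_decrypt(pk, sk, pow(c1, 3, pk[0])) == 45   # E(m)^t = E(t*m)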

3.1.2. Cloud-ElGamal Cryptosystem

In this paper, we select an enhanced version of ElGamal called Cloud-ElGamal [25], which supports multiplicative homomorphism and resists confidentiality attacks.
(i) Key generation: choose a big prime $p$ and find a generator $g$ of $\mathbb{Z}_p^*$. Generate a random integer $x$ and compute $y = g^x \bmod p$. The evaluation key is $(p, g, y)$; the private key is $x$.
(ii) Encryption: given $m \in \mathbb{Z}_p^*$, generate a random integer $r$ and compute $c_1 = g^r \bmod p$ and $c_2 = m \cdot y^r \bmod p$. The ciphertext is $(c_1, c_2)$.
(iii) Decryption: given a ciphertext $(c_1, c_2)$, compute $m = c_2 \cdot (c_1^x)^{-1} \bmod p$.
(iv) Homomorphic computation: given two ciphertexts under the same key, $E(m_1) \cdot E(m_2) = E(m_1 \cdot m_2)$, where the multiplication is component-wise.
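A matching Python sketch of the multiplicative homomorphism, using plain ElGamal (on which Cloud-ElGamal builds; the hardening of [25] is omitted). The modulus and generator below are assumptions for the demo:

import random

p = 2 ** 127 - 1                         # toy prime modulus
g = 3                                    # assumed generator for the demo

def keygen():
    x = random.randrange(2, p - 1)       # private key x
    return pow(g, x, p), x               # public y = g^x mod p

def enc(y, m):
    r = random.randrange(2, p - 1)
    return pow(g, r, p), (m * pow(y, r, p)) % p

def dec(x, c):
    c1, c2 = c
    return (c2 * pow(pow(c1, x, p), -1, p)) % p

y, x = keygen()
a, b = enc(y, 6), enc(y, 7)
prod = ((a[0] * b[0]) % p, (a[1] * b[1]) % p)   # component-wise product
assert dec(x, prod) == 42                # E(m1)*E(m2) = E(m1*m2)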

3.2. Machine Learning Training

Machine learning training consists of data preprocessing and model training.

For data preprocessing, we focus on data normalization for continuous data (such as age and height). There are two main methods of data normalization: the min–max method and the z-score method.

The min-max method maps attribute values into $[0, 1]$ according to the maximum and minimum values of the attribute in the dataset. For attribute $A$, the min-max method computes the normalized value $x'$ of each value $x$ as follows: $x' = \frac{x - \min(A)}{\max(A) - \min(A)}$.

The z-score method standardizes an attribute based on its average and standard deviation in the dataset. For attribute $A$ with average $\mu_A$ and standard deviation $\sigma_A$, the z-score method computes: $x' = \frac{x - \mu_A}{\sigma_A}$.

Through data normalization, high-quality training data can be formed.
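For concreteness, a small plaintext illustration of both formulas (numpy for brevity; the sample values are illustrative):

import numpy as np

ages = np.array([21.0, 35.0, 52.0, 47.0, 19.0])
min_max = (ages - ages.min()) / (ages.max() - ages.min())   # values in [0, 1]
z_score = (ages - ages.mean()) / ages.std()                 # zero mean, unit variance
print(min_max)    # [0.061, 0.485, 1.0, 0.848, 0.0] approximately
print(z_score)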

For model training, there are many training algorithms involving both linear and nonlinear operations, for example, logistic regression, SVM, and naive Bayes. To train a logistic regression model, the parameter $\omega$ is updated as $\omega \leftarrow \omega - \alpha (h_\omega(x) - y)x$, where $h_\omega(x) = 1/(1 + e^{-\omega^{T}x})$ is the sigmoid function and $\alpha$ is the learning rate. To train an SVM model, the parameter $\omega$ is updated as $\omega \leftarrow \omega - \alpha(\lambda\omega - yx)$ when $y\,\omega^{T}x < 1$ and as $\omega \leftarrow \omega - \alpha\lambda\omega$ otherwise, where $\lambda$ is the regularization parameter. To train a naive Bayes model, we compute the class prior probability $P(c)$ and the conditional probability $P(x_i \mid c)$.

4. System Overview

In this section, we will introduce our proposed framework ePMLF, including the system model, design goal, and threat model.

4.1. System Model

In our system model, the goal is to train a private machine learning model for SSP. The training task is completed jointly by the fog nodes and SSP. At the same time, the private data of each fog node is not leaked to SSP or to other fog nodes, and SSP's trained model is not leaked to the fog nodes. Our system model is designed as shown in Figure 1.

There are four participants in our system model: a trusted authority (TA), IoT devices, fog nodes (FNs), and a software service provider (SSP).
(i) Trusted authority (TA): TA is fully trusted and generates system parameters for all participants.
(ii) IoT devices: IoT devices produce a large amount of IoT data, but their computation and storage resources are very limited.
(iii) Fog nodes (FNs): FNs have strong computation and storage capacity. They collect, store, and manage the data generated by IoT devices. The data owned by the FNs is sensitive and must not be leaked.
(iv) Software service provider (SSP): SSP wants to train machine learning models on the data held by the FNs. SSP is untrusted and will try to obtain the FNs' data during training.

4.2. Design Goal

Based on the system model, our design goals include privacy preservation, training high-accuracy machine learning models, and low computation and communication overhead.

4.2.1. Privacy Preserving
(1) The data of the FNs should be protected.
(2) The trained model of SSP should be protected.
4.2.2. Training High Accuracy Machine Learning Model

When using ciphertext data from FNs to train machine learning models, it is very important to ensure the high accuracy of the trained models. Therefore, we need to generate high-quality training data through data normalization and design a general computing framework.

4.2.3. Low Computation and Communication Overhead

To achieve privacy-preserving training, the proposed ePMLF is built on cryptographic techniques, which inevitably introduce significant computation and communication overhead. We therefore aim to make the framework as efficient as possible.

4.3. Threat Model

In our proposed ePMLF, we assume that the FNs and SSP are honest-but-curious (TA is fully trusted): they execute the protocols honestly but try to learn the data of other participants by analyzing the intermediate results.

At the same time, as in [5], we allow FNs to collude with each other and FNs to collude with SSP in an attempt to infer the private data of other participants. To prove that our scheme is secure, we also define an adversary who can eavesdrop on and analyze data during transmission, i.e., during the interactions between participants in protocol execution.

5. Our Proposed Framework

In this section, we describe our proposed framework in detail. The ePMLF mainly includes system initialization, data normalization, basic building blocks, privacy-preserving machine learning training, and machine learning model updating. The workflow of our proposed framework is shown in Figure 2.

To describe our scheme precisely, the notations used in this paper are listed in Table 1.

5.1. System Initialization

In the system initialization phase, TA generates system parameters for all participants.

Improved OU encryption. Zhang et al. [10] achieved negative-integer encryption based on the OU cryptosystem by splitting the plaintext space into two parts, one representing positive integers and one representing negative integers: a negative integer $m$ is converted to $p - |m|$ before encryption. Note, however, that $p$ belongs to the private key of the OU cryptosystem, so this method can only be carried out by the participant holding the private key, which is not practical. We therefore propose a new method that allows any participant to encrypt negative integers under the OU cryptosystem without disclosing the private key. Specifically, after generating $p$ and $q$, TA generates an additional large prime and publishes a conversion value derived from it. A participant without the private key converts a negative integer using this public value and encrypts the result; during decryption, the added term is eliminated by the modulus, and the range of integers that can be encrypted is restricted accordingly. The correctness proofs for negative-integer encryption and decryption follow directly from this conversion.
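The wrap-around decoding that makes such an encoding work can be illustrated in plaintext as follows (the modulus and helper names are ours, not the paper's construction):

p = 1_000_003                       # stands in for the plaintext modulus

def encode(m):                      # map a signed integer into Z_p
    return m % p

def decode(v):                      # re-interpret the upper half as negative
    return v - p if v > p // 2 else v

a, b = encode(-15), encode(27)
assert decode((a + b) % p) == 12    # addition with a negative operand
assert decode((encode(-6) * 7) % p) == -42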

Generate system parameters.
(1) We assume that there are $n$ fog nodes in the system. TA generates an OU public/private key pair and a Cloud-ElGamal evaluation/private key pair for each $FN_i$.
(2) TA generates two random integers $R_1, R_2$ and splits each of them into $n$ random shares. TA then distributes the shares $r_{1,i}, r_{2,i}$ to $FN_i$ and the totals $R_1, R_2$ to SSP over a secure communication channel.
(3) TA generates an OU public/private key pair for SSP.

5.2. Privacy Preserving Data Normalization

In our proposed framework, SSP uses the encrypted data of all FNs to train a machine learning model. Therefore, we need to normalize the data across all FNs, which improves the quality of the training data.

Each $FN_i$ holds a dataset of records $(x, y)$, where $x$ is the feature vector and $y$ is the label. During data normalization, no participant may learn the data of any other $FN$.

Secure z-score:
(1) For the $j$-th dimension, $FN_i$ computes its local statistics (the record count and the local sums needed for the mean and standard deviation), encrypts them with SSP's public key, and sends the ciphertexts to SSP.
(2) SSP homomorphically aggregates the ciphertexts of all FNs and sends the aggregate to each FN.
(3) Each $FN_i$ recovers the global mean $\mu$ and standard deviation $\sigma$ of the dimension and normalizes its data as $x' = (x - \mu)/\sigma$.
Secure min-max:

In our proposed secure min-max, the computations in steps (3) and (4) follow the method of [18].
(1) Each FN computes the maximum and minimum values of each data dimension, so $FN_i$ obtains its local maximum and minimum.
(2) The FNs input their local maxima (and, respectively, minima) pairwise into the comparison protocol of [26]; after all comparisons, the FNs holding the global maximum and minimum are determined.
(3) The FN holding each extreme value masks it with a random integer and publishes the masked value.
(4) After the global maximum and minimum are computed, each $FN_i$ standardizes its data as $x' = \frac{x - \min}{\max - \min}$.

5.3. Basic Building Blocks

In order to make our framework realize the privacy-preserving machine learning training, we will modularize the proposed framework by designing some general building blocks.

5.3.1. Precision Control

In the machine learning training process, much of the data consists of floating-point numbers, which must be converted to integers before encryption. The usual conversion multiplies a floating-point number by $10^{\sigma}$, where $\sigma$ is the precision [26]. This significantly expands the original data, while the plaintext space of the encryption algorithm is limited: homomorphic operations on the expanded data can overflow the plaintext space, and the precision of the data changes with each operation. For example, multiplying two values of precision $\sigma$ yields a value of precision $2\sigma$.

To solve this problem, we propose a method called precision control. We express encrypted data as a pair of the ciphertext and its precision: a value encrypted at precision $\sigma$ is written as $([x], \sigma)$, so the current precision of the data is always known. When the accumulated precision grows too large, we reduce the precision bits of the encrypted data to avoid plaintext overflow, and we require every participant to set the precision to $\sigma$ before encrypting data. The detailed method is as follows and is described in Algorithm 1 (a plaintext sketch of the precision bookkeeping follows the algorithm).
(1) SSP blinds the ciphertext with a random integer (additively for OU ciphertexts, multiplicatively for Cloud-ElGamal ciphertexts) and sends it to $FN_i$.
(2) $FN_i$ decrypts the blinded value, reduces its precision, re-encrypts the result, and sends it back to SSP.
(3) SSP removes the blind (in the multiplicative case, using the inverse of the random integer) and obtains the precision-reduced ciphertext.

Algorithm 1: Precision control.
Input: a blinded OU or Cloud-ElGamal ciphertext and the target precision $\sigma$
Output: the precision-reduced ciphertext
SSP: (1) send the blinded ciphertext to $FN_i$.
$FN_i$: (2) decrypt the blinded ciphertext; (3) reduce the precision of the plaintext and re-encrypt it; (4) send the result to SSP.
SSP: (5) remove the blind and obtain the precision-reduced ciphertext.
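A plaintext sketch of the (ciphertext, precision) bookkeeping; the scaling base and names are illustrative assumptions:

SIGMA = 3                                  # required precision before encrypting

def to_fixed(x, sigma=SIGMA):
    return round(x * 10 ** sigma), sigma   # (integer payload, precision tag)

def mul(a, b):
    # multiplying two scaled values adds their precision tags; this growth is
    # what the precision control protocol later reduces
    return a[0] * b[0], a[1] + b[1]

def to_float(v):
    return v[0] / 10 ** v[1]

z = mul(to_fixed(1.234), to_fixed(5.678))
print(z)            # (7006652, 6): precision doubled after one multiplication
print(to_float(z))  # 7.006652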
5.3.2. Secure Addition

Given two ciphertexts $[m_1]$ and $[m_2]$ encrypted with the same OU public key, SSP obtains $[m_1 + m_2] = [m_1] \cdot [m_2] \bmod n$.

5.3.3. Secure Subtraction

Given two ciphertexts $[m_1]$ and $[m_2]$ encrypted with the same OU public key, SSP obtains $[m_1 - m_2] = [m_1] \cdot [m_2]^{-1} \bmod n$.

5.3.4. Secure Multiplication (OU)

Given a ciphertext $[m]$ encrypted with an OU public key and a plaintext $t$ held by SSP, SSP obtains $[t \cdot m] = [m]^{t} \bmod n$.

5.3.5. Secure Multiplication (Cloud-ElGamal)

Given two ciphertexts $E(m_1)$ and $E(m_2)$ encrypted with the same Cloud-ElGamal key, SSP obtains $E(m_1 \cdot m_2) = E(m_1) \cdot E(m_2)$, where the multiplication is component-wise.

5.3.6. Secure Division

The algorithm is inspired by [27]. Given two ciphertexts $[m_1]$ and $[m_2]$ encrypted with $FN_i$'s OU public key, SSP obtains the quotient $m_1/m_2$. The computation proceeds as follows:
(1) SSP chooses two random integers $r_1, r_2$, blinds the two ciphertexts as $[r_1 m_1]$ and $[r_2 m_2]$, and sends them to $FN_i$.
(2) $FN_i$ decrypts the blinded ciphertexts, computes the quotient $(r_1 m_1)/(r_2 m_2)$, and sends it to SSP.
(3) SSP multiplies the received quotient by $r_2/r_1$ and obtains $m_1/m_2$.

Secure power computation

Given a ciphertext $[m]$ encrypted with $FN_i$'s OU public key and a plaintext exponent $t$ held by SSP, SSP obtains an encryption of $m^t$. The computation proceeds as follows:
(1) SSP chooses a random integer, blinds $[m]$ with it, and sends the blinded ciphertexts to $FN_i$.
(2) $FN_i$ decrypts, raises the blinded plaintext to the required power, encrypts the result, and sends it to SSP.
(3) SSP removes the blinding factor and obtains the encryption of $m^t$.

5.3.7. Secure Inner Product

Given an encrypted vector $([x_1], \ldots, [x_d])$ of $FN_i$ and a plaintext vector $(\omega_1, \ldots, \omega_d)$ of SSP, SSP obtains the encrypted inner product $[\langle \omega, x \rangle] = \prod_{j=1}^{d} [x_j]^{\omega_j} \bmod n$. A self-contained sketch follows.
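A sketch of this identity, with the toy OU parameters repeated from Section 3.1.1 so the block runs on its own:

import random
from functools import reduce

p, q = 1009, 1013
n = p * p * q
g = next(g for g in range(2, n) if pow(g, p - 1, p * p) != 1)
h = pow(g, n, n)
enc = lambda m: (pow(g, m, n) * pow(h, random.randrange(1, n), n)) % n
L = lambda x: (x - 1) // p

def dec(c):
    return (L(pow(c, p - 1, p * p)) * pow(L(pow(g, p - 1, p * p)), -1, p)) % p

x = [3, 1, 4]                  # fog node's data, encrypted below
w = [2, 7, 5]                  # SSP's plaintext weights
cts = [enc(v) for v in x]
ip = reduce(lambda acc, cw: (acc * pow(cw[0], cw[1], n)) % n, zip(cts, w), 1)
assert dec(ip) == 33           # <w, x> = 2*3 + 7*1 + 5*4 = 33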

5.3.8. Secure Summation

SSP wants to obtain the summation of the values $m_i$ held by the $n$ FNs. The process is described in Algorithm 2, and a plaintext sketch follows it.
(1) $FN_i$ masks $m_i$ with its random share from TA, encrypts the masked value with SSP's OU public key, and sends the ciphertext to SSP.
(2) SSP multiplies the received ciphertexts, decrypts the aggregate, and removes the total mask $R_1$ to obtain $\sum_{i=1}^{n} m_i$.

Algorithm 2: Secure summation.
Input: the values $m_i$ held by the FNs
Output: $\sum_{i=1}^{n} m_i$
$FN_i$: (1) compute $m_i + r_{1,i}$; (2) encrypt it with SSP's OU public key and send the ciphertext to SSP.
SSP: (3) multiply the $n$ ciphertexts; (4) decrypt the product and subtract $R_1$ to obtain $\sum_{i=1}^{n} m_i$.
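A plaintext sketch of the masking logic reconstructed above, with modular arithmetic standing in for the OU ciphertexts:

import random

p = 1_000_003                     # stands in for the OU plaintext modulus
data = [11, 22, 33]               # m_i held by three fog nodes

R = random.randrange(1, p)        # TA's total mask R_1, given to SSP
shares = [random.randrange(0, p) for _ in range(2)]
shares.append((R - sum(shares)) % p)          # shares of R_1 for the FNs

masked = [(m + r) % p for m, r in zip(data, shares)]   # what each FN_i encrypts
aggregate = sum(masked) % p       # the product of ciphertexts decrypts to this
assert (aggregate - R) % p == sum(data)       # SSP removes R_1 after decrypting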
5.3.9. Secure Sigmoid Function

To support the nonlinear computation of the sigmoid function $1/(1 + e^{-z})$, we propose an algorithm called Secure Sigmoid Function. The process is described in Algorithm 3, and a sketch of the blinding idea follows it.
(1) $FN_i$ encrypts its input with its OU public key and sends it to SSP.
(2) SSP combines the ciphertext with its own plaintext values, blinds the result with two random integers, and sends the blinded ciphertexts to $FN_i$.
(3) $FN_i$ decrypts, evaluates the exponential on the blinded value, and returns the result to SSP.
(4) SSP chooses two further random integers, re-blinds the intermediate values, and sends them to $FN_i$.
(5) $FN_i$ decrypts and returns the blinded quotient to SSP.
(6) SSP removes the blinding factors and obtains the encrypted sigmoid value.

Algorithm 3: Secure sigmoid function.
Input: $FN_i$'s encrypted input and SSP's model parameters
Output: the encrypted sigmoid value for SSP
$FN_i$: (1) send the encrypted input to SSP.
SSP: (2)–(5) combine the ciphertext with the model parameters; (6) choose two random integers; (7) send the blinded ciphertexts to $FN_i$.
$FN_i$: (8) decrypt and evaluate the exponential on the blinded value; (9) send the result to SSP.
SSP: (10) choose two random integers; (11)–(12) re-blind the intermediate values; (13) send them to $FN_i$.
$FN_i$: (14) decrypt; (15) compute the blinded quotient; (16) send it to SSP.
SSP: (17) remove the blinding factors and obtain the encrypted sigmoid value.
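The algebra that makes such blinding possible is that an additive mask commutes with the exponential: $e^{-(z+r)} \cdot e^{r} = e^{-z}$. A plaintext sketch of only this unblinding step (the exact randomization in Algorithm 3 differs in detail):

import math, random

z = 0.73                          # the value whose sigmoid is wanted
r = random.uniform(-5, 5)         # SSP's additive blind

masked_exp = math.exp(-(z + r))   # computed by the fog node on the masked value
e = masked_exp * math.exp(r)      # SSP unblinds: e = e^{-z}
sigmoid = 1.0 / (1.0 + e)
assert abs(sigmoid - 1.0 / (1.0 + math.exp(-z))) < 1e-9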
5.3.10. Secure Sign Computation

Given encrypted data $[u]$ computed by SSP, SSP needs to learn whether $u > 0$, while the value of $u$ must remain hidden: $FN_i$ may not learn $u$ or even its true sign.

SSP flips a coin $s \in \{0, 1\}$ and chooses a positive random integer $r$. If $s = 0$, SSP computes $[u \cdot r]$; if $s = 1$, SSP computes $[-u \cdot r]$. SSP then sends the resulting ciphertext to $FN_i$. $FN_i$ decrypts it, obtains a value $t$, sets $b = 1$ if $t > 0$ and $b = 0$ otherwise, and sends $b$ to SSP. If $s = 0$ and $b = 1$, SSP concludes $u > 0$; if $s = 0$ and $b = 0$, then $u < 0$; if $s = 1$ and $b = 1$, then $u < 0$; if $s = 1$ and $b = 0$, then $u > 0$.
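A plaintext sketch of this coin-flip blinding (u is assumed nonzero; the bound on r is illustrative):

import random

def secure_sign(u):
    s = random.randrange(2)                 # SSP's secret coin
    r = random.randrange(1, 1 << 32)        # positive random blind
    t = u * r if s == 0 else -u * r         # what FN_i would decrypt
    b = 1 if t > 0 else 0                   # the only bit FN_i returns
    return (b == 1) if s == 0 else (b == 0) # True iff u > 0

assert secure_sign(17) is True
assert secure_sign(-4) is False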

Converting Cloud-ElGamal to OU

For this building block, we cite building block 7 of [5].

Converting OU to Cloud-ElGamal:

For this building block, we cite building block 8 of [5].

5.4. Privacy Preserving Machine Learning Training

In this section, we build four training protocols for popular machine learning models on top of our proposed building blocks. We assume that the training data has been normalized before training.

5.5. Secure Logistic Regression (LR) Training

We use the stochastic gradient descent (SGD) algorithm to train a logistic regression model. SSP randomly selects records from the FNs (all records held by the FNs are numbered). The process is described in Algorithm 4; the plaintext update it evaluates is sketched after the algorithm.

Algorithm 4: Secure LR training.
Input: the selected data of the FNs, iterations $T$, learning rate $\alpha$
Output: model parameters $\omega$
$FN_i$: (1) encrypt the selected data with the OU public key and send the ciphertexts to SSP.
SSP, for each iteration: (2) compute the secure inner product of $\omega$ and the encrypted sample; (3) perform the secure sigmoid function with $FN_i$; (4) compute the encrypted gradient; (5) perform ciphertext conversion (Converting OU to Cloud-ElGamal or back, as needed); (6) update $\omega$.
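For reference, the plaintext update that Algorithm 4 evaluates over ciphertexts, with the corresponding secure building blocks noted in comments:

import math

def lr_sgd_step(w, x, y, alpha):
    z = sum(wj * xj for wj, xj in zip(w, x))    # secure inner product
    h = 1.0 / (1.0 + math.exp(-z))              # secure sigmoid function
    return [wj - alpha * (h - y) * xj for wj, xj in zip(w, x)]

w = [0.0, 0.0, 0.0]
samples = [([1.0, 0.5, 2.0], 1), ([1.0, -1.5, 0.3], 0)]
for x, y in samples * 200:
    w = lr_sgd_step(w, x, y, 0.1)
print(w)   # weights move to separate the two illustrative samples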
5.6. Secure SVM Training

For training an SVM model, we also use the SGD algorithm. The process is described in Algorithm 5.

Algorithm 5: Secure SVM training.
Input: the selected data of the FNs, iterations $T$, learning rate $\alpha$, regularization parameter $\lambda$
Output: model parameters $\omega$
$FN_i$: (1) encrypt the selected data with the OU public key and send the ciphertexts to SSP.
SSP, for each iteration: (2)–(3) compute the secure inner product of $\omega$ and the encrypted sample; (4) perform secure sign computation with $FN_i$ to judge the hinge-loss condition $y\,\omega^{T}x < 1$; (5) depending on the judgment, compute the corresponding encrypted gradient; (6) perform ciphertext conversion (Converting OU to Cloud-ElGamal or back, as needed); (7) update $\omega$.
5.7. Secure Naive Bayes (NB) Training

To train a naive Bayes model, SSP needs to aggregate the class prior probability $P(c)$ and the conditional probability $P(x_i \mid c)$ from the FNs. We assume a binary classification problem. The process is described in Algorithm 6; the plaintext parameter computation is sketched after the algorithm.

Algorithm 6: Secure NB training.
Input: the data of the FNs
Output: model parameters $P(c)$, $P(x_i \mid c)$
$FN_i$: (1) compute the local class counts and attribute-value counts, encrypt them with SSP's OU public key, and send them to SSP.
SSP: (2) aggregate the ciphertexts of all FNs homomorphically; (3) decrypt the aggregates with the private key; (4) compute the class prior probabilities $P(c)$; (5) compute the conditional probabilities $P(x_i \mid c)$.
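For reference, the plaintext computation SSP performs on the aggregated counts; Laplace smoothing is our assumption, not stated in the paper:

def nb_params(N_c, N, N_xc, num_values):
    prior = N_c / N                                   # P(c)
    cond = {v: (cnt + 1) / (N_c + num_values)         # P(x = v | c), smoothed
            for v, cnt in N_xc.items()}
    return prior, cond

prior, cond = nb_params(N_c=40, N=100, N_xc={0: 10, 1: 30}, num_values=2)
print(prior, cond)   # 0.4 {0: 0.262, 1: 0.738} approximately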
5.8. Secure Deep Neural Network Training

The training process of deep neural networks contains nonlinear computations such as ReLU. To compute a nonlinear activation function, we adopt the approximation method proposed in [28], which converts the nonlinear activation functions into polynomial functions; the training process can then be completed with our proposed building blocks (a small illustration follows). In addition, we use the training method proposed in [29] to train the deep neural network models.
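A small illustration of the approximation step, assuming a least-squares polynomial fit over a fixed interval (the concrete construction in [28] may differ):

import numpy as np

z = np.linspace(-4, 4, 200)
sigmoid = 1.0 / (1.0 + np.exp(-z))
coeffs = np.polyfit(z, sigmoid, deg=3)     # degree-3 polynomial approximation
approx = np.polyval(coeffs, z)
print(np.max(np.abs(approx - sigmoid)))    # max error on [-4, 4] stays small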

5.9. Privacy Preserving Machine Learning Model Updating

With the continuous growth of data on existing fog nodes and the addition of new fog nodes to the system, the quantity and quality of the overall data improve significantly, so updating SSP's trained model is important. We propose different updating methods for different types of models: those trained by gradient descent and those trained by nongradient-descent methods. To prevent a differential attack, a model trained by a nongradient-descent method is updated only after the data of the original fog nodes has increased by more than 50%.

A model trained by the gradient-descent method:

For a model trained by the gradient-descent method (e.g., LR and SVM), the updating process is as follows (a short code illustration follows this list):
(1) TA determines the fog nodes and SSP participating in the updating process and redistributes the system parameters for them (as in Section 5.1).
(2) The FNs and SSP train a new model $\omega'$ on the added data.
(3) Based on $\omega'$ and the original model $\omega$, SSP computes the updated model as the weighted average $\omega^{*} = \frac{N_1\omega + N_2\omega'}{N_1 + N_2}$, where $N_1$ is the number of original samples and $N_2$ is the number of added samples.
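In code, this weighted merge is simply (names illustrative):

def merge_models(w_old, w_new, n1, n2):
    return [(n1 * a + n2 * b) / (n1 + n2) for a, b in zip(w_old, w_new)]

print(merge_models([0.8, -0.2], [0.5, 0.1], n1=1000, n2=250))
# [0.74, -0.14]: the update is weighted toward the larger original dataset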

5.10. Model Trained by Nongradient-Descent Method

For a model trained by a nongradient-descent method (for example, naive Bayes), the updating process is as follows:
(1) TA redistributes system parameters for all fog nodes and SSP to update the model (as in Section 5.1).
(2) The FNs and SSP retrain a model on all the data, and SSP takes it as the updated model.

6. Security Analysis

In this section, we analyze the security of our proposed framework using the real/ideal paradigm and the composition theorem, as in [27]. We use simulators in the ideal world to simulate the views of honest-but-curious adversaries in the real world. We consider adversaries corrupting the FNs and SSP, denoted $\mathcal{A} = \{\mathcal{A}_{FN_i}, \mathcal{A}_{SSP}\}$, and prove the security of our proposed basic building blocks.

Theorem 1. The precision control protocol is secure against honest-but-curious adversaries $\mathcal{A}$.

Proof. We construct two simulators $\mathcal{S} = \{\mathcal{S}_{FN_i}, \mathcal{S}_{SSP}\}$ for $FN_i$ and SSP. $\mathcal{A}_{SSP}$ sees only ciphertexts, so by the semantic security of the OU cryptosystem (or the Cloud-ElGamal cryptosystem) it cannot distinguish the real execution from the ideal simulation. The view of $\mathcal{A}_{FN_i}$ consists of values masked by random integers, which are statistically indistinguishable from random, so $\mathcal{A}_{FN_i}$ cannot distinguish the real execution from the ideal simulation either.

Theorem 2. The secure addition, secure subtraction, secure multiplication (OU), secure multiplication (Cloud-ElGamal), and secure inner product are secure against honest-but-curious adversaries $\mathcal{A}$.

Proof. These building blocks are similar, so we analyze secure addition as an example. The simulator is the same as in Theorem 1. By the semantic security of the OU cryptosystem, $\mathcal{A}$ cannot distinguish the input ciphertexts and the resulting ciphertext from random encryptions, so the view of $\mathcal{A}$ is indistinguishable between the real and ideal worlds.

Theorem 3. The secure division is secure against honest-but-curious adversaries $\mathcal{A}$.

Proof. Because $r_1$ and $r_2$ are random integers, the blinded values seen by $FN_i$ are statistically indistinguishable from random, so the simulator $\mathcal{S}_{FN_i}$ can be constructed from random values alone and $\mathcal{A}$ obtains only the blinded quotient.

Theorem 4. The secure power computation is secure against honest-but-curious adversaries $\mathcal{A}$.

Proof. Because the blinding factors are random integers, the values seen by $FN_i$ are statistically indistinguishable from random, so $\mathcal{A}$ obtains only the blinded result.

Theorem 5. The secure summation is secure against honest-but-curious adversaries $\mathcal{A}$.

Proof. Each $m_i$ is masked by $FN_i$'s random share before encryption, so $\mathcal{A}_{SSP}$ cannot recover an individual $m_i$ from the decrypted aggregate: the masked values are statistically indistinguishable from random. At the same time, each masked value is encrypted with SSP's public key, so by the semantic security of the OU cryptosystem no other adversary can distinguish the real execution from the ideal simulation.

Theorem 6. The secure sigmoid function is secure against honest-but-curious adversaries $\mathcal{A}$.

Proof. $FN_i$ performs its computations only on blinded values, so $\mathcal{A}_{FN_i}$ cannot obtain the true inputs. Because the blinded intermediate values are statistically indistinguishable from random, $\mathcal{A}$ can only obtain the final result.

Theorem 7. The secure sign computation is secure against honest-but-curious adversaries $\mathcal{A}$.

Proof. $FN_i$ obtains only a randomly sign-flipped, blinded value, so by statistical indistinguishability $\mathcal{A}_{FN_i}$ cannot obtain the value of $u$, and the returned bit reveals the sign of $u$ only to SSP, which knows the coin $s$.

Theorem 8. The secure machine learning training protocols are secure against honest-but-curious adversaries $\mathcal{A}$.

Proof. We construct the machine learning training protocols from the designed building blocks in a modular way. The simulators $\mathcal{S}$ are the same as in Theorems 1–7. Since the security of each building block has been proved, by the composition theorem the view of the adversaries cannot distinguish the real execution from the ideal simulation.

7. Performance Evaluation

In this section, we evaluate our proposed ePMLF and compare it with the scheme of Zhu et al. [5], a privacy-preserving ML training framework for the aggregation scenario. As discussed above, that framework has incomplete building-block functionality and high communication and computation overhead, which make it impractical; our framework addresses these problems. Our experimental environment is shown in Table 2.

To train LR, SVM, and NB, we implement our framework on three datasets of the UCI machine learning library, as shown in Table 3.

For the deep neural network, we will use the MNIST dataset to train a LeNet model [30]. The MNIST dataset contains 60,000 training samples and 10,000 testing samples.

7.1. Performance of Our Proposed Framework
7.1.1. Building Blocks Evaluation

To test our proposed building blocks, we evaluate the three main ones: secure inner product, secure summation, and secure power computation, under different key lengths (256, 512, 1024, and 2048 bits). The computation time results are shown in Figure 3. As expected, longer keys increase the computation time; to balance security and efficiency, the key length is usually set to 1024 or 2048 bits.

Then, we evaluate the impact of the data dimension on the computation time of the secure inner product and the secure sigmoid function. Based on the above results, the key length is set to 1024 bits. The results are shown in Figure 4: as the dimension grows, the computation time of both building blocks increases.

7.1.2. Secure Machine Learning Training Analysis

To assess the quality of the models trained with our framework, we test the accuracy of the trained models (LR, SVM, NB, and LeNet). The results in Table 4 show that the trained models achieve high accuracy.

7.2. Comparative Analysis

In this section, we analyze the computation and communication overhead of our proposed framework and compare it with Zhu et al. [5]. According to [31], the computation cost of one exponentiation is equivalent to about $1.5|c|$ multiplication operations, where $|c|$ is the bit length of the ciphertext. We use $|c_{OU}|$, $|c_{CE}|$, $|c_{P}|$, and $|c_{CR}|$ to denote the ciphertext lengths of the OU, Cloud-ElGamal, Paillier, and Cloud-RSA cryptosystems, respectively.

7.2.1. Computation Complexity Analysis

In our proposed framework, we focus on exponentiation and multiplication operations. The computation complexity of each building block (secure addition, secure subtraction, the two secure multiplications, secure division, secure power computation, secure inner product, secure summation, secure sigmoid function, and secure sign computation), expressed in these operations, is summarized in Table 5 together with the corresponding costs of [5]. In our framework, data is encrypted with the OU and Cloud-ElGamal cryptosystems, while [5] uses the Paillier and Cloud-RSA cryptosystems. Note that, at the same security level, the OU and Cloud-ElGamal ciphertexts are shorter than the Paillier and Cloud-RSA ciphertexts, respectively. From Table 5, it can be seen that our proposed scheme has lower computation costs.

7.2.2. Computation Overhead Analysis

We test the computation time of our framework and of [5], with the key length set to 1024 bits. The results in Table 6 show that our framework is more efficient. We then compare the per-participant computation overhead with [5], where SSP is the model owner and the FNs are the data owners. The comparison results are shown in Table 7: the computation overhead of both SSP and the FNs is lower than in [5], giving our framework a clear advantage for secure machine learning training.

Remark: the computation process of the secure sigmoid function causes some loss of accuracy, so it is important to quantify it. To evaluate the accuracy of the secure sigmoid function, we run the building block on the same test case as [5]. The plaintext value is 1.83; our result is 1.81, while the result of [5] is 1.6588. Our result is therefore clearly more accurate.

7.2.3. Communication Analysis

We analyze the communication overhead and the number of interactions of our building blocks and compare them with [5]. The secure addition, secure subtraction, secure multiplication (OU), secure multiplication (Cloud-ElGamal), and secure inner product are computed locally by SSP, so these building blocks require no interaction. The communication overhead and interactions of the secure division, secure summation, and secure sigmoid function are summarized in Table 8. It can be seen that the communication overhead of our proposed secure summation is much lower than that of [5].

8. Conclusion

In this paper, we propose a privacy-preserving machine learning framework, including secure training data normalization, model training, and model updating. Based on our proposed framework, SSP can train different machine learning models. The trained models have high accuracy. Compared with the existing scheme, our proposed framework significantly reduces the computation and communication overhead. In future research, we will focus on the efficiency and robustness of the privacy-preserving machine learning training framework.

Data Availability

In our experiments, we implement our framework on four datasets, dermatology, heart disease, breast cancer, and MNIST.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The work was supported in part by the National Natural Science Foundation of China (61862052) and the Science and Technology Foundation of Qinghai Province (2019-ZJ-7065).