Abstract

Logistic regression is a statistical technique used to predict the probability that an event occurs. In scenarios where the storage capacity and computing resources of the data owner are limited, the data owner wants to train the logistic regression model on a cloud service provider, while the high sensitivity of the training data requires effective privacy protection methods that enable efficient model training without exposing any information about the training data to the untrusted cloud service provider. Recently, several works have used cryptographic techniques to implement privacy-preserving logistic regression in such application scenarios. However, on large-scale training datasets, the existing works still suffer from long model training times and poor model performance. To solve these problems, based on homomorphic encryption (HE), we propose an efficient privacy-preserving outsourced logistic regression (P2OLR) on encrypted training data, which enables data owners to utilize the powerful storage and computing resources of cloud service providers for logistic regression analysis without exposing data privacy. Furthermore, the proposed scheme can pack multiple messages into one ciphertext and perform the same arithmetic evaluations on multiple plaintext slots by using the batching technique and the single instruction multiple data (SIMD) mechanism in HE. On three public training datasets, the experimental results show that, compared with the existing schemes, the proposed scheme performs better in terms of the encryption and decryption time of the data owner, the storage of the encrypted training data, and the training time and accuracy of the model.

1. Introduction

Logistic regression (LR) [1] is a popular classification method that has been used in numerous practical applications, including cancer diagnosis [2], credit scoring [3], genome-wide association studies [4], and more. LR can not only be applied to the problem of predicting the probability of occurrence of various events, but is also competitive with other classification algorithms in terms of prediction accuracy. In some practical application settings, data owners have limited computing and storage resources and thus want to outsource some of the heavy computation in logistic regression model training. Outsourced data analysis [5] has therefore received considerable attention recently, as it enables data owners to train an LR model using the powerful storage capacity and computing resources of cloud service providers [6].

However, the high sensitivity of training data requires effective privacy protection methods [7–10] that enable efficient and secure logistic regression analysis without leaking information about the training data to the untrusted cloud service provider. Recently, to meet such application requirements, several works on privacy-preserving logistic regression (PPLR) [13–22] have been proposed based on cryptographic techniques such as secure multiparty computation (MPC) [11] and homomorphic encryption (HE) [12], which enable data owners to employ the service provider's powerful data storage and computing resources for logistic regression model training without exposing their own data privacy. Specifically, the data owner encrypts its training data and sends the encrypted training data to the service provider. The service provider trains a logistic regression model on the encrypted training data and returns the encrypted training result to the data owner. The data owner then decrypts the encrypted training result to obtain the final training result.

Unfortunately, on large-scale training datasets, the existing PPLR schemes [13–22] still suffer from the bottlenecks of high model training time and low model accuracy. To solve these problems, based on the HE cryptographic technique [23], which has the property that the results of operations on ciphertexts are consistent with those on plaintexts, we design an efficient privacy-preserving outsourced logistic regression (P2OLR). The main contributions are as follows:
(1) Firstly, we propose a method for achieving P2OLR on encrypted data from HE. To speed up the model training, the proposed P2OLR scheme employs the batching technique to pack multiple elements into multiple plaintext slots, encrypts them into one ciphertext, and performs the same arithmetic operations on multiple plaintext slots via the SIMD mechanism.
(2) Secondly, we evaluate the proposed P2OLR on three public datasets [18]. Under the same experimental environment, compared with the related P2OLR schemes [17, 18, 22], the model training time of the proposed P2OLR is reduced by more than 71.7%, and the proposed P2OLR achieves better model performance.

The rest of this paper is organized as follows. We present the related works in Section 2. We review the preliminaries related to our P2OLR in Section 3. In Section 4, our P2OLR is described. The performance evaluation of our P2OLR is presented in Section 5. The security analysis of our P2OLR is given in Section 6. Finally, we conclude in Section 7.

2. Related Works

There have been many works on achieving PPLR using cryptographic techniques. In this paper, we mainly focus on PPLR based on HE. To outsource LR model training to a cloud service provider in a privacy-preserving manner, based on the HE scheme FV [24], Bonte et al. [13] proposed an algorithm to train an LR model on a homomorphically encrypted dataset, which is implemented on top of the FV-NFLlib library [25]. However, the model accuracy is poor due to the use of a quadratic polynomial to approximate the sigmoid function. Furthermore, the training time grows linearly with the number of training samples. Using the HE scheme FV [24] and a 1-bit gradient descent (GD) method, Chen et al. [14] presented a method to train LR over encrypted data, which is implemented with the SEAL library [26] and allows an arbitrary number of iterations by using bootstrapping [27] in FV, but bootstrapping introduces a significant decrease in performance. Focusing on the prediction process of LR, based on the HE scheme BGV [28], Li and Sun [15] proposed a secure protocol to solve the data leakage problem during the LR prediction process, and implemented their scheme with the HElib library [29]. Based on the Chimera framework [30], which allows switching between the HE schemes TFHE [31] and CKKS [23], Carpov et al. [16] proposed a solution to achieve semi-parallel LR on encrypted genomic data, which performs bootstrapping [27] without re-encrypting the genomic data for an arbitrary number of iterations, and is implemented using the TFHE library [32] and the HEAAN library [33].

Adapting the packing and parallelization techniques of the approximate HE scheme CKKS [23], Kim et al. [17] proposed a PPLR, which is implemented using the HEAAN library [33] and uses least squares approximation to improve the accuracy and efficiency of LR model training. However, as the number of iterations increases, the parameters of the CKKS scheme also need to become larger, which makes the training time increase dramatically. Kim et al. [18] applied the HE scheme CKKS [23] to achieve PPLR. Their scheme is implemented using the HEAAN library [33]. Moreover, they devised an encoding method to decrease the storage of the encrypted training data and adapted Nesterov's accelerated GD method to reduce the number of iterations as well as the computational cost. However, their scheme requires the assumption that both the number of training samples and the number of features are powers of two, which makes the scheme unsuitable for practical applications. To reduce the number of iterations, Cheon et al. [19] proposed an ensemble GD method based on the HE scheme CKKS [23] and applied it to PPLR, in which they approximate the sigmoid function using a degree-5 polynomial obtained by least squares approximation. Their scheme is implemented on top of the HEAAN library [34]. To run a genome-wide association study on encrypted data, using the SIMD capabilities of the HE scheme CKKS and Nesterov's accelerated GD, Bergamaschi et al. [20] introduced a method for homomorphic training of LR models, which is implemented on top of the HElib library [29]. To protect the private information of both parties, based on the HE scheme CKKS [23] and a gradient sharing technique, Wei et al. [21] proposed a protocol to train an LR model on vertically distributed data between two parties, which does not require trusted third-party nodes and is implemented with the HElib library [29]. Based on the HE scheme CKKS [23], Fan et al. [22] offered a PPLR algorithm, where they approximate the sigmoid function in LR by Taylor's theorem and use row encoding to encrypt the training samples, but as the number of samples increases, this leads to longer model training times.

3. Preliminaries

3.1. System Model

As can be seen in Figure 1, the system model of the proposed P2OLR considers two entities, namely a data owner (DO) and a service provider (SP). For readability, the definitions of the notations in this paper are shown in Table 1. DO: it has limited computational resources and wants to use SP's data analysis service on encrypted data to train an LR model without revealing the privacy of its own training data. SP: it is a semi-trusted entity with powerful data storage and computing capabilities, and provides data analysis and statistical services on encrypted data for DO. Specifically, DO chooses the poly_modulus_degree and coeff_modulus, and runs the key_generation algorithm to generate the secret_key, public_key, relinearization_key, and galois_key. Next, DO encrypts the training data, the initial weights, and the learning rate into ciphertexts, and sends these ciphertexts, together with the public_key, relinearization_key, galois_key, the encryption parameters, and the iteration number Q, to SP. SP performs the training algorithm and returns the ciphertext result of the Q-th iteration to DO. DO decrypts the ciphertext result to obtain the final result.
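To make the setup step concrete, the following minimal sketch shows how DO could generate these parameters and keys with the Microsoft SEAL C++ API [26] (version 3.6 or later is assumed); the poly_modulus_degree and coeff_modulus bit sizes here are illustrative placeholders rather than the exact parameters of Section 5.

```cpp
#include "seal/seal.h"
#include <cmath>
using namespace seal;

int main() {
    // CKKS parameters: the degree and modulus chain below are placeholders;
    // Section 5 discusses the sizes used in the actual experiments.
    EncryptionParameters parms(scheme_type::ckks);
    size_t poly_modulus_degree = 16384;
    parms.set_poly_modulus_degree(poly_modulus_degree);
    parms.set_coeff_modulus(
        CoeffModulus::Create(poly_modulus_degree, {60, 40, 40, 40, 40, 60}));
    SEALContext context(parms);

    // key_generation: secret_key, public_key, relinearization_key, galois_key.
    KeyGenerator keygen(context);
    SecretKey secret_key = keygen.secret_key();
    PublicKey public_key;
    keygen.create_public_key(public_key);
    RelinKeys relin_keys;
    keygen.create_relin_keys(relin_keys);
    GaloisKeys galois_keys;
    keygen.create_galois_keys(galois_keys);

    // DO keeps secret_key; public_key, relin_keys, and galois_keys go to SP
    // together with the encrypted training data.
    double scale = std::pow(2.0, 40); // scaling factor for CKKS encoding
    return 0;
}
```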

3.2. Homomorphic Encryption

Homomorphic encryption (HE) is a cryptographic technique that allows operations on ciphertexts without decryption and guarantees that the computation results on ciphertexts are consistent with the computation results on plaintexts. We adopt the HE scheme CKKS [23] based on the Ring Learning with Errors (RLWE) problem, which can encrypt multiple elements in one ciphertext and supports single instruction multiple data (SIMD) operations. Suppose $\Phi_M(X)$ denotes the $M$-th cyclotomic polynomial, where $M$ is a power of two and $N = M/2$. $\mathcal{R} = \mathbb{Z}[X]/(\Phi_M(X))$ denotes the cyclotomic ring of polynomials, and $\mathcal{R}_q = \mathcal{R}/q\mathcal{R}$ denotes the residue ring of $\mathcal{R}$ modulo $q$. $\mathbb{H} \subseteq \mathbb{C}^N$ denotes a subring of the complex vector space that is isomorphic to $\mathbb{R}^N$. The canonical embedding $\sigma: \mathcal{R} \rightarrow \mathbb{H}$ transforms a plaintext polynomial $m(X) \in \mathcal{R}$ into a complex vector $\sigma(m) \in \mathbb{H}$, and the natural projection $\pi: \mathbb{H} \rightarrow \mathbb{C}^{N/2}$ transforms a complex vector in $\mathbb{H}$ to a vector in $\mathbb{C}^{N/2}$, so that one ciphertext packs $N/2$ plaintext slots. The CKKS scheme [23] supports the operations listed in the Appendix. For ease of description, we define Algorithms 1–9.

Algorithm 1: Enc.
Input: message vector m, scaling factor Δ
Output: ciphertext ct
(1) encode_double (m, Δ, pt)
(2) encrypt (pt, ct)
(3) return: ct

Algorithm 2: Dec.
Input: ciphertexts ct_1, …, ct_k
Output: message vectors m_1, …, m_k
(1) for (i ← 1 to k) do
(2) decrypt (ct_i, pt_i)
(3) decode_double (pt_i, m_i)
(4) append m_i to the output list
(5) end for
(6) return: m_1, …, m_k

Algorithm 3: Mul.
Input: ciphertexts ct_1 and ct_2, relinearization_key rlk, scaling factor Δ
Output: ciphertext ct_3
(1) mod_switch_to_inplace (ct_2, ct_1.parms_id())
(2) multiply (ct_1, ct_2, ct_3)
(3) relinearize_inplace (ct_3, rlk)
(4) rescale_to_next_inplace (ct_3)
(5) ct_3.set_scale (Δ)
(6) return: ct_3

Algorithm 4: MulPlain.
Input: ciphertext ct_1, message vector m, scaling factor Δ
Output: ciphertext ct_2
(1) encode_double (m, Δ, pt)
(2) mod_switch_to_inplace (pt, ct_1.parms_id())
(3) multiply_plain (ct_1, pt, ct_2)
(4) rescale_to_next_inplace (ct_2)
(5) ct_2.set_scale (Δ)
(6) return: ct_2

Algorithm 5: Add.
Input: ciphertexts ct_1 and ct_2
Output: ciphertext ct_3
(1) mod_switch_to_inplace (ct_2, ct_1.parms_id())
(2) add (ct_1, ct_2, ct_3)
(3) return: ct_3

Algorithm 6: AddPlain.
Input: ciphertext ct_1, message vector m, scaling factor Δ
Output: ciphertext ct_2
(1) encode_double (m, Δ, pt)
(2) mod_switch_to_inplace (pt, ct_1.parms_id())
(3) add_plain (ct_1, pt, ct_2)
(4) return: ct_2

Algorithm 7: Sub.
Input: ciphertexts ct_1 and ct_2
Output: ciphertext ct_3
(1) mod_switch_to_inplace (ct_2, ct_1.parms_id())
(2) sub (ct_1, ct_2, ct_3)
(3) return: ct_3

Algorithm 8: AddInplace.
Input: ciphertexts ct_1 and ct_2
Output: ciphertext ct_1
(1) mod_switch_to_inplace (ct_2, ct_1.parms_id())
(2) add_inplace (ct_1, ct_2)
(3) return: ct_1

Algorithm 9: SumSlots.
Input: ciphertext ct, slot count n, galois_key gk
Output: ciphertext ct with the sum of all slots in every slot
(1) i ← 1
(2) for (; i < n; i ← 2i) do
(3) rotate_vector (ct, i, gk, ct_rot)
(4) add_inplace (ct, ct_rot)
(5) end for
(6) return: ct
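As a concrete illustration of Algorithm 9, the following sketch, written against the Microsoft SEAL C++ API with hypothetical variable names, implements the rotate-and-sum pattern; it assumes galois_keys was generated with the default power-of-two rotation steps.

```cpp
#include "seal/seal.h"
using namespace seal;

// Rotate-and-sum over the packed slots (Algorithm 9, SumSlots).
// After the loop, every slot contains the sum of all original slot values.
void sum_slots(Evaluator &evaluator, const GaloisKeys &galois_keys,
               Ciphertext &ct, size_t slot_count)
{
    for (size_t step = 1; step < slot_count; step <<= 1)
    {
        Ciphertext rotated;
        evaluator.rotate_vector(ct, static_cast<int>(step), galois_keys, rotated);
        evaluator.add_inplace(ct, rotated); // fold the rotated copy back in
    }
}
```

Only ⌈log2(slot_count)⌉ rotations are needed, which is why the batched layout keeps the aggregation step cheap even for thousands of packed samples.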
3.3. Sigmoid Approximation

Since existing HE schemes can only efficiently support polynomial arithmetic, computing the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$ under HE is a barrier to the realization of P2OLR. To find an approximate polynomial of $\sigma(x)$, adapting the least squares method, we consider a degree-7 polynomial $g(x) = c_0 + c_1 x + c_3 x^3 + c_5 x^5 + c_7 x^7$ over a bounded input domain, where the coefficients $c_0$, $c_1$, $c_3$, $c_5$, and $c_7$ are obtained by the least squares fit. As can be seen in Figure 2, the maximum error between $\sigma(x)$ and $g(x)$ is about 0.032. Evaluating $g(x)$ over encrypted data from HE can be achieved by Algorithm 10.

Algorithm 10: Sigmoid.
Input: ciphertext ct_x, coefficients c_0, c_1, c_3, c_5, c_7, scaling factor Δ
Output: ciphertext ct_g encrypting g(x)
(1) ct_x2 ← Mul (ct_x, ct_x) (Algorithm 3)
(2) ct_x4 ← Mul (ct_x2, ct_x2)
(3) ct_x6 ← Mul (ct_x4, ct_x2)
(4) ct_t3 ← MulPlain (ct_x2, c_3) (Algorithm 4)
(5) ct_t5 ← MulPlain (ct_x4, c_5)
(6) ct_t7 ← MulPlain (ct_x6, c_7)
(7) ct_s ← AddPlain (ct_t3, c_1) (Algorithm 6)
(8) ct_s ← Add (ct_s, ct_t5) (Algorithm 5)
(9) ct_s ← Add (ct_s, ct_t7)
(10) ct_g ← Mul (ct_s, ct_x)
(11) ct_g ← AddPlain (ct_g, c_0)
(12) return: ct_g
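The following SEAL C++ sketch mirrors the steps of Algorithm 10, assuming the placeholder coefficients c0, c1, c3, c5, c7 from the least squares fit; the helpers mul and mul_plain encapsulate the relinearize/rescale/set-scale bookkeeping of Algorithms 3 and 4, and all names are ours. It is one plausible evaluation order, not necessarily the one used in our implementation.

```cpp
#include "seal/seal.h"
using namespace seal;

// Ciphertext-ciphertext product with the bookkeeping of Algorithm 3.
// Assumes b sits at a level no deeper than a's.
static Ciphertext mul(Evaluator &ev, const RelinKeys &rk, double scale,
                      const Ciphertext &a, Ciphertext b)
{
    ev.mod_switch_to_inplace(b, a.parms_id()); // align levels
    Ciphertext out;
    ev.multiply(a, b, out);
    ev.relinearize_inplace(out, rk);
    ev.rescale_to_next_inplace(out);
    out.scale() = scale; // re-normalize the scale, as in Algorithm 3
    return out;
}

// Ciphertext-plaintext product with the bookkeeping of Algorithm 4.
static Ciphertext mul_plain(Evaluator &ev, CKKSEncoder &enc, double scale,
                            const Ciphertext &a, double c)
{
    Plaintext p;
    enc.encode(c, scale, p);
    ev.mod_switch_to_inplace(p, a.parms_id());
    Ciphertext out;
    ev.multiply_plain(a, p, out);
    ev.rescale_to_next_inplace(out);
    out.scale() = scale;
    return out;
}

// g(x) = c0 + c1*x + c3*x^3 + c5*x^5 + c7*x^7 on an encrypted x.
Ciphertext sigmoid_approx(Evaluator &ev, CKKSEncoder &enc, const RelinKeys &rk,
                          double scale, const Ciphertext &ct_x,
                          double c0, double c1, double c3, double c5, double c7)
{
    Ciphertext x2 = mul(ev, rk, scale, ct_x, ct_x); // x^2
    Ciphertext x4 = mul(ev, rk, scale, x2, x2);     // x^4
    Ciphertext x6 = mul(ev, rk, scale, x4, x2);     // x^6

    Ciphertext t = mul_plain(ev, enc, scale, x2, c3);  // c3*x^2
    Ciphertext t5 = mul_plain(ev, enc, scale, x4, c5); // c5*x^4
    Ciphertext t7 = mul_plain(ev, enc, scale, x6, c7); // c7*x^6

    // Align all summands to the deepest level before adding.
    ev.mod_switch_to_inplace(t, t7.parms_id());
    ev.mod_switch_to_inplace(t5, t7.parms_id());
    ev.add_inplace(t, t5);
    ev.add_inplace(t, t7);

    Plaintext p1;
    enc.encode(c1, scale, p1); // t += c1
    ev.mod_switch_to_inplace(p1, t.parms_id());
    ev.add_plain_inplace(t, p1);

    // g = x * (c1 + c3*x^2 + c5*x^4 + c7*x^6) + c0
    Ciphertext g = mul(ev, rk, scale, t, ct_x);
    Plaintext p0;
    enc.encode(c0, scale, p0);
    ev.mod_switch_to_inplace(p0, g.parms_id());
    ev.add_plain_inplace(g, p0);
    return g;
}
```

The factored form keeps the multiplicative depth at five rescaling levels, which is what makes a degree-7 approximation affordable without bootstrapping.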
3.4. Logistic Regression

Logistic regression (LR) is a statistical analysis method for predicting the probability of an event. We consider the case where the predicted value is a binary dependent variable. Assuming that a dataset consists of $n$ samples of the form $(x_i, y_i)$ with $x_i \in \mathbb{R}^{d}$ and $y_i \in \{\pm 1\}$, the goal of LR is to find the optimal parameter vector $w$ that minimizes the negative log-likelihood (loss) function $L(w) = \frac{1}{n}\sum_{i=1}^{n}\log(1 + \exp(-y_i\, w^{T} x_i))$, where $\sigma(x) = 1/(1 + e^{-x})$. A common method for minimizing the loss function is the gradient descent (GD) algorithm, which finds a local extremum of a loss function by following the direction of the gradient. The gradient of $L(w)$ with respect to $w$ is calculated by $\nabla L(w) = -\frac{1}{n}\sum_{i=1}^{n}\sigma(-y_i\, w^{T} x_i)\, y_i\, x_i$. Let $w^{(t)}$ be the regression parameters and $\alpha_t$ the learning rate in the $t$-th iteration of the GD algorithm; the GD algorithm then updates $w$ by $w^{(t+1)} = w^{(t)} + \frac{\alpha_t}{n}\sum_{i=1}^{n}\sigma(-y_i\, w^{(t)T} x_i)\, y_i\, x_i$.
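For reference, the update rule above can be written in a few lines of plaintext C++ (a minimal sketch with our own variable names, labels y_i in {-1, +1}); Algorithm 11 in Section 4 reproduces exactly this computation over ciphertexts.

```cpp
#include <vector>
#include <cmath>

// One gradient-descent update for LR:
// w <- w + (alpha/n) * sum_i sigmoid(-y_i * <w, x_i>) * y_i * x_i
void gd_step(std::vector<double> &w,
             const std::vector<std::vector<double>> &X, // n x d, bias folded in
             const std::vector<int> &y,                 // labels in {-1, +1}
             double alpha)
{
    size_t n = X.size(), d = w.size();
    std::vector<double> grad(d, 0.0);
    for (size_t i = 0; i < n; ++i)
    {
        double ip = 0.0;
        for (size_t j = 0; j < d; ++j) ip += w[j] * X[i][j];
        double s = 1.0 / (1.0 + std::exp(y[i] * ip)); // sigmoid(-y_i * w^T x_i)
        for (size_t j = 0; j < d; ++j) grad[j] += s * y[i] * X[i][j];
    }
    for (size_t j = 0; j < d; ++j) w[j] += (alpha / n) * grad[j];
}
```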

4. Privacy-Preserving Outsourced Logistic Regression

Based on the HE scheme, we propose a P2OLR, where we employ the batching method to pack multiple elements into multiple plaintext slots, encrypt them into one ciphertext, and then perform the same arithmetic evaluations on multiple plaintext slots through the SIMD mechanism. To reduce the parameters of the HE scheme (CKKS) as well as improve the performance of P2OLR, the proposed P2OLR allows interaction between DO and SP during the iterative training. Specifically, SP returns the ciphertext training result to DO after a certain number of iterations Q. DO decrypts the ciphertext training result and determines whether the performance of the model has met the requirements. If so, DO stops the training; otherwise, DO sends freshly encrypted weights to SP to continue the training. The training dataset held by DO consists of n samples of the form (x_i, y_i) with x_i in R^d and y_i in {-1, +1}; the first column of the dataset denotes the label, and the other columns denote the features. Since DO has limited computational resources, DO wants to outsource the training to SP without disclosing the privacy of its own training data. The specific description of the proposed P2OLR is as follows (a sketch of the batched encryption in step (1) is given after this list):
(1) DO generates the keys and calls Algorithm 1 to encrypt the training data into ciphertexts ct_Z[i][j], the initial weights into ciphertexts ct_w[j], and the learning rate into one ciphertext ct_α, as sketched below. DO then sends these ciphertexts, together with the public_key, relinearization_key, galois_key, the encryption parameters, and the iteration number Q, to SP.
(2) SP prepares the ciphertext lists, calls Algorithm 11, and returns the ciphertext result of the Q-th iteration to DO.
(3) DO calls Algorithm 2 to decrypt the ciphertext result. Next, DO judges whether the result has met the requirements. If so, DO terminates the training; otherwise, DO calls Algorithm 1 to encrypt the updated weights into ciphertexts and sends them to SP to continue the training.
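As mentioned in step (1), a plausible SEAL C++ sketch of the batched encryption is given below: each ciphertext packs one feature column (the j-th feature of every sample) into the plaintext slots, so that SP's slot-wise operations act on all samples simultaneously. The column-per-ciphertext layout and all names are illustrative assumptions, since the exact encoding is only described at a high level above.

```cpp
#include "seal/seal.h"
#include <vector>
using namespace seal;

// Pack feature j of every sample into the slots of one CKKS ciphertext.
// Requires n <= encoder.slot_count(); shorter vectors are zero-padded.
std::vector<Ciphertext> encrypt_columns(const std::vector<std::vector<double>> &Z,
                                        CKKSEncoder &encoder, Encryptor &encryptor,
                                        double scale)
{
    size_t n = Z.size(), d = Z[0].size();
    std::vector<Ciphertext> cts(d);
    for (size_t j = 0; j < d; ++j)
    {
        std::vector<double> column(n);
        for (size_t i = 0; i < n; ++i)
            column[i] = Z[i][j]; // j-th feature across all n samples
        Plaintext pt;
        encoder.encode(column, scale, pt); // one plaintext holds n slot values
        encryptor.encrypt(pt, cts[j]);     // one ciphertext per feature column
    }
    return cts;
}
```

With this layout, DO uploads only d ciphertexts per block of samples, which is what keeps the encryption time and ciphertext storage in Table 2 small.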

Algorithm 11: Homomorphic LR training (executed by SP).
Input: encrypted training data ct_Z[i][j] (1 ≤ i ≤ m blocks, 1 ≤ j ≤ d features), encrypted weights ct_w[j], encrypted learning rate ct_α, coefficients c_0, c_1, c_3, c_5, c_7, relinearization_key rlk, galois_key gk, iteration number Q, sample count n
Output: ciphertext weights ct_w[1], …, ct_w[d] of the Q-th iteration
(1) for (t ← 1 to Q) do
(2) for (i ← 1 to m) do
(3)  for (j ← 1 to d) do
(4)   ct_p[j] ← Mul (ct_w[j], ct_Z[i][j]) (Algorithm 3)
(5)  end for
(6)  ct_ip ← ct_p[1]
(7)  for (j ← 2 to d) do
(8)   ct_ip ← Add (ct_ip, ct_p[j]) (Algorithm 5)
(9)  end for
(10)  ct_ip ← MulPlain (ct_ip, −1) (Algorithm 4)
(11)  ct_s ← Sigmoid (ct_ip) (Algorithm 10)
(12)  for (j ← 1 to d) do
(13)   ct_g[i][j] ← Mul (ct_s, ct_Z[i][j])
(14)  end for
(15) end for
(16) for (j ← 1 to d) do
(17)  ct_G[j] ← ct_g[1][j]
(18)  for (i ← 2 to m) do
(19)   ct_G[j] ← Add (ct_G[j], ct_g[i][j])
(20)  end for
(21)  ct_G[j] ← SumSlots (ct_G[j], n, gk) (Algorithm 9)
(22)  ct_G[j] ← Mul (ct_G[j], ct_α)
(23)  ct_G[j] ← MulPlain (ct_G[j], 1/n)
(24)  mod_switch_to_inplace (ct_w[j], ct_G[j].parms_id())
(25)  ct_w[j] ← Add (ct_w[j], ct_G[j])
(26) end for
(27) end for
(28) return: ct_w[1], …, ct_w[d]
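To illustrate lines (22)–(25) of Algorithm 11, the following fragment sketches the encrypted weight update in SEAL C++; mul and mul_plain are the hypothetical helpers from the sketch after Algorithm 10, and all variable names are ours.

```cpp
// Encrypted gradient step for one weight (lines (22)-(25) of Algorithm 11).
// mul and mul_plain are the hypothetical helpers sketched in Section 3.3.
void update_weight(Evaluator &ev, CKKSEncoder &enc, const RelinKeys &rk,
                   double scale, size_t n,
                   Ciphertext &ct_w_j, const Ciphertext &ct_G_j,
                   const Ciphertext &ct_alpha)
{
    Ciphertext step = mul(ev, rk, scale, ct_G_j, ct_alpha);  // (22) alpha * G_j
    step = mul_plain(ev, enc, scale, step, 1.0 / double(n)); // (23) average over n
    ev.mod_switch_to_inplace(ct_w_j, step.parms_id());       // (24) align levels
    ev.add_inplace(ct_w_j, step);                            // (25) w_j += step
}
```

Because every multiplication consumes one level of the modulus chain, after Q iterations the weights sit deep in the chain; this is exactly why the protocol lets DO decrypt and re-encrypt the weights periodically instead of enlarging the CKKS parameters.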

5. Performance Evaluation

We implement all experiments on a 32-core Intel Xeon CPU with 256 GB RAM. We compare the performance of the proposed P2OLR with the related P2OLR schemes [17, 18, 22]. We employ 5-fold cross-validation to validate the experimental results. For [17, 18], the implementations are publicly available at [35, 36], respectively, and use the HEAAN library [33] for the HE cryptographic operations. For [22] and the proposed P2OLR, we employ the Microsoft SEAL library [26] for the HE cryptographic operations. For all experiments, we set the same learning rate, random initial weight vector, maximum number of iterations, and scaling factor. To guarantee the desired bit security, the scheme [17] takes a coefficient-modulus of around 2204 to 2406 bits, and the schemes [18, 22] and the proposed P2OLR choose their respective polynomial-modulus-degree and coefficient-modulus accordingly. Using the three datasets of [18], namely the Umaru Impact Study, the Myocardial Infarction Study from Edinburgh, and Nhanes III, we compare the proposed P2OLR with the related P2OLR schemes [17, 18, 22] in terms of the encryption time (E. time) and decryption time (D. time) of DO, the storage of the encrypted training data, and the training time (T. time), accuracy, precision, recall, F1-score, and AUC of the model. All comparison results are reported as the average of 10 experiments. The performance comparisons of the proposed P2OLR and the related P2OLR schemes [17, 18, 22] are shown in Table 2.

From Table 2, we can see that, compared with the related P2OLR schemes [17, 18, 22], the proposed P2OLR performs better. Specifically, as shown in Figure 3, under the first training dataset, the encryption time of DO in the proposed P2OLR is 2.01 s, which is reduced by nearly 71.4%, 7.8%, and 93.3%, respectively, compared with the encryption time of DO in [17, 18, 22]; under the second training dataset, the encryption time of DO in the proposed P2OLR is 2.16 s, which is reduced by nearly 73.6%, 2.3%, and 96.8%, respectively, compared with the encryption time of DO in [17, 18, 22]; under the third training dataset, the encryption time of DO in the proposed P2OLR is 3.49 s, which is reduced by nearly 75.9%, 81.6%, and 75.0%, respectively, compared with the encryption time of DO in [17, 18, 22].

As can be seen in Figure 4, under the first training dataset, the decryption time of DO in the proposed P2OLR is 0.23 s, which is reduced by almost 95.3% and 41.0%, respectively, in comparison to the decryption time of DO in [17, 18]; under the second training dataset, the decryption time of DO in the proposed P2OLR is 0.26 s, which is reduced by almost 95.0% and 36.6%, respectively, in comparison to the decryption time of DO in [17, 18]; under the third training dataset, the decryption time of DO in the proposed P2OLR is 0.45 s, which is reduced by almost 96.1% and 6.1%, respectively, in comparison to the decryption time of DO in [17, 18]. The decryption time of DO in [22] is smaller than that of the proposed P2OLR.

As described in Figure 5, under the first training dataset, the storage of the encrypted training data in the proposed P2OLR is 72.00 MB, which is reduced by nearly 88.9% and 95.0% compared with the storage of the encrypted training data in [17, 22]; under the second training dataset, the storage of the encrypted training data in the proposed P2OLR is 80.00 MB, which is reduced by nearly 89.0% and 97.4% compared with the storage of the encrypted training data in [17, 22]; under the third training dataset, the storage of the encrypted training data in the proposed P2OLR is 128.00 MB, which is reduced by nearly 89.4%, 13.0%, and 99.7%, respectively, compared with the storage of the encrypted training data in [17, 18, 22]. Although the storage of the encrypted training data for the first and second datasets in [18] is smaller than that of the proposed P2OLR, as the number of samples and features increases, the storage of the encrypted training data in the proposed P2OLR for the third dataset becomes smaller than that of [18].

As displayed in Figure 6, under the first training dataset, the training time of the model in the proposed P2OLR is 2.64 min, which is reduced by almost 96.6%, 73.8%, and 90.1%, respectively, compared with the training time of the model in [17, 18, 22]; under the second training dataset, the training time of the model in the proposed P2OLR is 2.91 min, which is reduced by almost 96.5%, 71.7%, and 95.0%, respectively, compared with the training time of the model in [17, 18, 22]; under the third training dataset, the training time of the model in the proposed P2OLR is 4.21 min, which is reduced by almost 96.5%, 79.8%, and 99.4%, respectively, compared with the training time of the model in [17, 18, 22].

As illustrated in Figure 7, under the first training dataset, the average accuracy of the model in the proposed P2OLR is 80.6%, which is nearly 5.8%, 6.2%, and 6.2% higher, respectively, than the average accuracy of the model in [17, 18, 22]; under the second training dataset, the average accuracy of the model in the proposed P2OLR is 90.6%, which is nearly 9.0%, 7.6%, and 7.9% higher, respectively, than the average accuracy of the model in [17, 18, 22]; under the third training dataset, the average accuracy of the model in the proposed P2OLR is 83.7%, which is nearly 4.6%, 4.5%, and 5.8% higher, respectively, than the average accuracy of the model in [17, 18, 22].

As illustrated in Figure 8, under the first training dataset, the average precision of the model in the proposed P2OLR is 95.6%, which is nearly 3.3%, 4.7%, and 4.7% higher, respectively, than the average precision of the model in [17, 18, 22]; under the second training dataset, the average precision of the model in the proposed P2OLR is 95.1%, which is nearly 5.4%, 4.7%, and 4.7% higher, respectively, than the average precision of the model in [17, 18, 22]; under the third training dataset, the average precision of the model in the proposed P2OLR is 60.3%, which is nearly 10.3%, 10.1%, and 7.9% higher, respectively, than the average precision of the model in [17, 18, 22].

As illustrated in Figure 9, under the first training dataset, the average recall of the model in the proposed P2OLR is 77.4%, which is nearly 6.0%, 6.0%, and 6.0% higher, respectively, than the average recall of the model in [17, 18, 22]; under the second training dataset, the average recall of the model in the proposed P2OLR is 90.6%, which is nearly 8.2%, 7.1%, and 7.7% higher, respectively, than the average recall of the model in [17, 18, 22]; under the third training dataset, the average recall of the model in the proposed P2OLR is 64.2%, which is nearly 3.0%, 2.9%, and 2.0% higher, respectively, than the average recall of the model in [17, 18, 22].

As illustrated in Figure 10, under the first training dataset, the average F1-score of the model in the proposed P2OLR is 85.5%, which is nearly 5.0%, 5.5%, and 5.5% higher, respectively, than the average F1-score of the model in [17, 18, 22]; under the second training dataset, the average F1-score of the model in the proposed P2OLR is 92.8%, which is nearly 6.9%, 4.0%, and 4.3% higher, respectively, than the average F1-score of the model in [17, 18, 22]; under the third training dataset, the average F1-score of the model in the proposed P2OLR is 62.2%, which is nearly 7.2%, 7.0%, and 5.3% higher, respectively, than the average F1-score of the model in [17, 18, 22].

As demonstrated in Figure 11, under the first training dataset, the AUC of the model in the proposed P2OLR is 0.73, which is nearly 0.05, 0.08, and 0.07 higher, respectively, than the AUC of the model in [17, 18, 22]; under the second training dataset, the AUC of the model in the proposed P2OLR is 0.88, which is nearly 0.06, 0.02, and 0.02 higher, respectively, than the AUC of the model in [17, 18, 22]; under the third training dataset, the AUC of the model in the proposed P2OLR is 0.85, which is nearly 0.02, 0.14, and 0.14 higher, respectively, than the AUC of the model in [17, 18, 22].

6. Security Analysis

In the semi-honest adversary model, we assume that DO and SP both hold the public_key, relinearization_key, and galois_key, and that only DO holds the secret_key. For our P2OLR, which evaluates a deterministic training function f, following the simulation-based paradigm [37], we consider the following security model for the security analysis: DO encrypts its private data x and sends the ciphertext Enc(x) to SP; SP performs the homomorphic operations on Enc(x), homomorphically evaluates f on Enc(x) to obtain Enc(f(x)), and sends Enc(f(x)) to DO; DO decrypts Enc(f(x)) and obtains f(x).

Theorem 1. Assume that SP is a semi-honest entity and that DO and SP do not collude with each other. Let x be the private data of DO. If the HE scheme [23] provides semantic security, then after SP performs the homomorphic operations on Enc(x) and the evaluation of f on Enc(x), DO learns f(x) but nothing else, and SP learns nothing.
Security Proof. The security proof of the proposed P2OLR follows the simulation-based paradigm [37]. Let the views of DO and SP during the evaluation be view_DO and view_SP, respectively. The view of SP consists of the ciphertext Enc(x). We construct a simulator S as follows: S randomly chooses input data x' and simulates view_SP by Enc(x'). Since the HE scheme [23] provides semantic security by assumption, Enc(x) and Enc(x') are computationally indistinguishable. Therefore, the proposed P2OLR is secure against a semi-honest SP.

7. Conclusion

In this paper, we present a method for achieving P2OLR on encrypted training data, which enables data owners to utilize the powerful storage and computing resources of cloud service providers for logistic regression analysis without exposing the privacy of the training data. We take advantage of the batching technique and the SIMD mechanism in HE to speed up the training process. On the three public datasets, compared with the related P2OLR schemes [17, 18, 22], the model training time of the proposed P2OLR is reduced by more than 71.7%, and the proposed P2OLR achieves improvements of over 4.5%, 3.3%, 2.0%, 4.0%, and 0.02 in terms of the accuracy, precision, recall, F1-score, and AUC of the model, respectively. There are still some limitations in applying our scheme to arbitrary datasets and in performing an arbitrary number of iterations on encrypted training data. In the future, we will extend our scheme to efficiently support P2OLR with an arbitrary number of iterations.

Appendix

(1) key_generation(N, q) → {secret_key, public_key, relinearization_key, galois_key}: given the poly_modulus_degree N and coeff_modulus q, it returns the secret_key, public_key, relinearization_key, and galois_key.
(2) encode_double(m, Δ, pt): given the message vector m and scaling factor Δ, it expands m to π^{-1}(m), scales it by Δ, and outputs the plaintext pt.
(3) decode_double(pt, m): given the plaintext pt, it removes the scaling factor Δ, applies the projection π, and outputs the message vector m.
(4) encrypt(pt, ct): given the plaintext pt, it encrypts pt into a ciphertext and outputs the ciphertext ct.
(5) decrypt(ct, pt): given a ciphertext ct, it decrypts ct into a plaintext and outputs the plaintext pt.
(6) add(ct_1, ct_2, ct_3): given two ciphertexts ct_1 and ct_2, it computes their slot-wise sum and saves the result as a new ciphertext ct_3.
(7) add_inplace(ct_1, ct_2): given two ciphertexts ct_1 and ct_2, it computes their slot-wise sum and saves the result in the ciphertext ct_1.
(8) add_plain(ct_1, pt, ct_2): given a ciphertext ct_1 and a plaintext pt, it computes their slot-wise sum and saves the result as a new ciphertext ct_2.
(9) sub(ct_1, ct_2, ct_3): given two ciphertexts ct_1 and ct_2, it computes their slot-wise difference and saves the result as a new ciphertext ct_3.
(10) multiply(ct_1, ct_2, ct_3): given two ciphertexts ct_1 and ct_2, it computes their slot-wise product and saves the result as a new ciphertext ct_3.
(11) multiply_plain(ct_1, pt, ct_2): given a ciphertext ct_1 and a plaintext pt, it computes their slot-wise product and saves the result as a new ciphertext ct_2.
(12) mod_switch_to_inplace(ct'/pt', ct.parms_id()): given a ciphertext/plaintext ct'/pt' and the level ct.parms_id() of a ciphertext ct, it switches the level of ct'/pt' to ct.parms_id().
(13) relinearize_inplace(ct, relinearization_key): given a ciphertext ct and the relinearization_key, it relinearizes ct and saves the result in the ciphertext ct.
(14) rescale_to_next_inplace(ct): given a ciphertext ct, it switches the modulus of ct to the next level, scales the underlying plaintext down accordingly, and saves the result in the ciphertext ct.
(15) set_scale(ct, Δ): given a scaling factor Δ, it resets the scaling factor of the ciphertext ct by computing ct.set_scale(Δ) and outputs the ciphertext ct.
(16) rotate_vector(ct, r, galois_key, ct_rot): given a ciphertext ct, a rotation amount r, and the galois_key, it rotates the slots of ct left by r positions and saves the result as a new ciphertext ct_rot.
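To show how the above operations fit together, here is a minimal SEAL C++ round trip under the same setup as the sketch in Section 3.1 (context, keys, encoder, encryptor, evaluator, and decryptor are assumed to be in scope); values are illustrative.

```cpp
// encode -> encrypt -> square -> relinearize -> rescale -> decrypt -> decode.
std::vector<double> input{0.1, 0.2, 0.3};
Plaintext pt;
encoder.encode(input, scale, pt);               // (2) encode_double
Ciphertext ct;
encryptor.encrypt(pt, ct);                      // (4) encrypt

evaluator.square_inplace(ct);                   // slot-wise squaring, cf. (10) multiply
evaluator.relinearize_inplace(ct, relin_keys);  // (13) relinearize_inplace
evaluator.rescale_to_next_inplace(ct);          // (14) rescale_to_next_inplace

Plaintext out_pt;
decryptor.decrypt(ct, out_pt);                  // (5) decrypt
std::vector<double> output;
encoder.decode(out_pt, output);                 // (3) decode_double: output[i] ~ input[i]^2
```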

Data Availability

The previously reported Umaru Impact Study, Myocardial Infarction Study from Edinburgh, and Nhanes III datasets were used to support this study and are available at https://doi.org/10.1186/s12920-018-0401-7. These prior studies (and datasets) are cited at the relevant places within the text as reference [18].

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant no. U19B2021.