Abstract

Collaborative learning is an emerging distributed learning paradigm that enables multiple parties to jointly train a shared machine learning (ML) model without disclosing the raw data of any party. As one of the fundamental collaborative learning algorithms, privacy-preserving collaborative logistic regression, which utilizes cryptographic techniques to securely train joint logistic regression models across data from multiple parties, has recently gained attention from industry and academia. However, existing schemes incur high communication and computational overhead, cannot handle high-dimensional sparse samples, reduce model accuracy, or risk leaking private information. To overcome these issues, considering vertically distributed data, we propose a privacy-preserving vertical collaborative logistic regression (VCLR) scheme based on approximate homomorphic encryption (HE), which enables two parties to jointly train a shared model without a trusted third-party coordinator. Our scheme utilizes the batching method in approximate HE to encrypt multiple data items into a single ciphertext and enables parallel processing in a single instruction multiple data (SIMD) manner. We evaluate our scheme on three publicly available datasets; the experimental results indicate that our scheme outperforms existing schemes in terms of training time and model performance.

1. Introduction

Machine learning (ML) [1] is a method for analyzing large-scale data and is widely used to train predictive models for practical applications. As one of the basic machine learning algorithms, logistic regression (LR) [2] has attracted much attention for its powerful ability to solve classification problems in practical applications, such as disease diagnosis [3] and credit evaluation [4].

In recent years, in order to obtain massive data for training better-performing models [5], there has been growing interest in machine learning over combined data from different institutions [6]. For instance, different hospitals would like to combine health data to jointly train models that facilitate more accurate disease diagnosis; different financial companies want to collaborate to train more effective credit card scoring and fraud detection models. Unfortunately, due to regulatory and competitive reasons, it is difficult or even impossible to directly exchange data between different parties for model training in practice [7]. That is, the data of different organizations are isolated. To eliminate this issue of "data isolation", the idea of collaborative learning [8] was introduced. Its goal is to cooperatively train a shared ML model on distributed data while complying with regulations and protecting privacy; security, privacy, and efficiency remain the main challenges for practical applications. Recently, as a fundamental collaborative learning algorithm, privacy-preserving collaborative logistic regression (PPCLR) [9-24] has received considerable attention. It utilizes cryptographic primitives such as homomorphic encryption (HE) algorithms [25] and multi-party computation (MPC) protocols [26] to securely train a joint logistic regression model across data from multiple parties. However, for the HE-based schemes [9-11], model weights are exposed to all parties at each iterative update of the global model parameters during training, which can be exploited to deduce additional private information [27]; for the MPC-based schemes [12, 14], after applying secret sharing (SS) [28] to the training samples of all parties, even previously sparse samples become dense, so these schemes cannot efficiently handle sparse samples and incur high communication complexity when the training data become large.

To solve the problems mentioned above, in a two-party setting with two vertically distributed training datasets that share the same sample space but have different feature spaces, we construct a privacy-preserving vertical collaborative logistic regression (VCLR) scheme based on the HE for arithmetic of approximate numbers [29]. The main contributions are as follows:
(1) Firstly, we construct a VCLR framework for collaborative learning over vertically distributed features, which securely realizes the joint modeling of both parties without the assistance of a trusted third party (TTP) and hence greatly reduces the system complexity.
(2) Secondly, to improve training efficiency, using the batching technique in HE [29], the proposed scheme packs multiple samples into a single plaintext with multiple slots, encrypts it into a single ciphertext, and enables parallel computing via SIMD.
(3) Finally, we conduct performance evaluations on three datasets [30]; the experimental results indicate that our scheme achieves a significant improvement in efficiency and performance over existing schemes [9, 21]. Specifically, the training time of the model is decreased by almost 32.3%-72.5%, while the accuracy, F1-score, and AUC of the model improve by nearly 0.3%-3.0%, 0.1%-2.7%, and 0-0.03, respectively. Furthermore, the security analysis indicates that the proposed VCLR scheme is secure against semi-honest adversaries, and neither party can learn the other's raw data.

The rest of this work is organized as follows. Works related to our scheme are introduced in Section 2. In Section 3, we review some preliminaries. In Section 4, our scheme is described. In Section 5, the evaluations of our scheme are presented. The security analysis of our scheme is given in Section 6. In Section 7, we conclude this work.

2. Related Work

Several efforts have been made to jointly train an LR model across multiple data owners. In general, a common approach is to implement secure logistic regression using cryptographic primitives such as HE [25] and MPC [26]. The existing works [9-24] can be divided into two categories: PPCLR with a TTP coordinator [9-16] and PPCLR without a TTP coordinator [17-24]. A summary of the existing works [9-24] follows.

As for PPCLR with a TTP coordinator [9-16], Hardy et al. [9] described a privacy-preserving federated LR scheme employing an additively HE scheme [25], which centralizes two vertically distributed training datasets at one TTP coordinator, but its approximation of the non-polynomial function reduces the model accuracy. Yang et al. [10] presented a quasi-Newton method for achieving a vertical federated LR model based on the additively HE scheme [25]. Using an additive HE [31] and an aggregation method [32], Mandal et al. [11] built a privacy-preserving regression analysis protocol over horizontally distributed high-dimensional data. Employing an additive secret sharing technique [33], Zhang et al. [12] proposed a privacy-preserving collaborative learning scheme that protects local training data and model information. Liu et al. [13] introduced a collaborative learning platform that supports multiple institutions in building machine learning models collaboratively over large-scale horizontally and vertically partitioned data. By means of MPC from additive secret sharing [34, 35], Cock et al. [14] proposed a protocol for securely training an LR model over distributed parties, where a TTP initializer assigns relevant random values to two computing servers. Based on multi-key fully HE [36], Wang et al. [15] designed a secure cloud-edge collaborative LR system, which employs the cloud center and edge nodes to collaboratively train an LR model over encrypted data. Zhu et al. [16] proposed a value-blind LR training method in a collaborative setting based on HE [25], where the central server updates model parameters without access to the training data and intermediate values, and model parameters are shared between the central server and the collaborating parties. However, it is inherently difficult to establish a third party trusted by all data owners in real-world scenarios. Moreover, data interactions between the data owners and the TTP increase the risk of leaking the data owners' sensitive data.

To decrease the complexity of training a joint model between any two parties by removing the TTP coordinator, Yang et al. [17] constructed a parallel distributed LR method for vertical federated learning based on HE [25], which allows two parties to jointly train models without the help of a TTP coordinator. Using a secure MPC protocol and a ciphertext domain conversion protocol [37], Chen et al. [18] presented a collaborative learning system for jointly building better models over vertically partitioned data from multiple sources. Based on the HE scheme [29], Li et al. [19] introduced a collaborative learning method on encrypted data, which can securely train LR models over vertically distributed data from both data owners. Based on asynchronous gradient sharing and an HE algorithm [29], Wei et al. [20] designed a two-party collaborative LR protocol that can securely train a joint model on vertically partitioned data. Chen et al. [21] combined HE [25] and secret sharing [38] to securely build an LR model on large-scale sparse, vertically distributed training data. Over horizontally partitioned data, based on a secure MPC protocol, Ghavamipour et al. [22] described two methods to train an LR model in a privacy-preserving manner; however, each data owner must compute multiple shares of its sensitive training data and send them separately to each non-colluding computation party, which leads to heavy communication costs. He et al. [23] constructed a vertical federated LR method based on an HE algorithm [25], which uses a piecewise function to preserve the accuracy of the loss function, but this results in a loss of efficiency. With the HE scheme [25] and a differential privacy algorithm [39], Sun et al. [24] introduced a vertical federated LR solution, which alleviates the constraints on feature dimensions. Nevertheless, the existing PPCLR schemes without a TTP coordinator [17-24] still incur high communication and computational overhead.

3. Preliminaries

3.1. System Architecture

For ease of reading, the definitions of the symbols used in our VCLR scheme are listed in Table 1. As shown in Figure 1, the system architecture of our VCLR includes two semi-trusted entities, party A and party B, which hold the vertically distributed datasets D_A and D_B, respectively. D_A and D_B share the same sample space but have different feature spaces; namely, party A holds one part of the features, while party B holds the remaining features and the label. Party A cooperates with party B to train a shared LR model without disclosing the privacy of the training data. Specifically, party A generates the keys of the approximate HE scheme [29], sends the polynomial degree N, the coefficient modulus q, the scaling factor Δ, the public key pk, the relinearization key rlk, and the Galois key gk to party B, and securely stores the secret key sk. Then, party A encrypts its own data with pk and sends the encrypted data to party B. Finally, party A and party B jointly execute the VCLR algorithm to obtain the training result.
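To make this message flow concrete, the following Python sketch mocks the two parties with a toy stand-in for CKKS: no real encryption happens, and only the communication pattern of Figure 1 is modelled. All names here (ToyCiphertext, PartyA, refresh, and so on) are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ToyCiphertext:
    slots: np.ndarray  # stands in for the SIMD slots of a CKKS ciphertext

class PartyA:
    """Holds part of the features; generates keys and keeps the secret key."""
    def __init__(self, features):
        self.features = features
        self.sk = object()  # the secret key never leaves party A

    def setup(self):
        # Send the public material and the encrypted dataset to party B.
        return {"pk": object(), "rlk": object(), "gk": object(),
                "enc_features": ToyCiphertext(self.features.copy())}

    def refresh(self, masked_ct):
        # Decrypt the masked intermediate result and re-encrypt it fresh;
        # the additive mask hides the true values from party A.
        return ToyCiphertext(masked_ct.slots.copy())

class PartyB:
    """Holds the remaining features and the label; drives the computation."""
    def __init__(self, features, labels):
        self.features, self.labels = features, labels
        self.rng = np.random.default_rng(0)

    def masked_round_trip(self, ct, party_a):
        # Mask, ship to party A for a decrypt/re-encrypt refresh, unmask.
        mask = self.rng.normal(size=ct.slots.shape)
        fresh = party_a.refresh(ToyCiphertext(ct.slots + mask))
        return ToyCiphertext(fresh.slots - mask)

# Wiring: party A ships keys and encrypted features; party B computes.
a = PartyA(np.ones((4, 2)))
b = PartyB(np.zeros((4, 2)), np.array([0, 1, 0, 1]))
bundle = a.setup()
refreshed = b.masked_round_trip(bundle["enc_features"], a)
```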

3.2. Homomorphic Encryption

HE allows direct operations on ciphertexts without decryption and ensures that the computation on the ciphertexts is consistent with the computation on the plaintexts. Cheon et al. [29] introduced an approximate HE algorithm based on ring learning with errors (RLWE) [40]. Given the parameters (the polynomial degree N, the coefficient modulus q, and the scaling factor Δ), its key generation produces the secret key sk, the public key pk, the relinearization key rlk, and the Galois key gk. Encryption maps a message vector to a ciphertext under pk, and decryption recovers the message vector under sk. The scheme further supports slot-wise homomorphic addition, subtraction, and multiplication, each either between two ciphertexts or between a ciphertext and a plaintext message vector (including an addition over a list of ciphertexts), as well as a left rotation of the ciphertext slots by a rotation value using gk.
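Since a CKKS ciphertext behaves like a vector of slots, the operations above can be previewed in plaintext with numpy arrays. The snippet below is only a plaintext model of the SIMD semantics, not an HE implementation; a real one would use a library such as Microsoft SEAL [41].

```python
import numpy as np

# Plaintext stand-in for CKKS SIMD semantics: a ciphertext is modelled as a
# vector of slots, so homomorphic operations act element-wise on the slots.
a = np.array([1.0, 2.0, 3.0, 4.0])   # "encrypted" message vector (4 slots)
b = np.array([0.5, 0.5, 0.5, 0.5])

added   = a + b           # slot-wise homomorphic addition
product = a * b           # slot-wise homomorphic multiplication
rotated = np.roll(a, -1)  # left rotation by 1 -> [2., 3., 4., 1.]

# Rotate-and-add computes the sum of all slots (the basis of encrypted
# inner products): after log2(len) halving rotations, every slot holds it.
total = a.copy()
for step in (2, 1):
    total = total + np.roll(total, -step)
print(total[0])  # 10.0 == 1 + 2 + 3 + 4
```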

3.3. Logistic Regression

Let a dataset include $n$ samples $(x_i, y_i)$, where an input $x_i \in \mathbb{R}^d$ maps to a binary dependent variable $y_i \in \{0, 1\}$. The goal of binary LR is to compute the weights $w$ that minimize the log-likelihood loss function $J(w) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \sigma(w^\top x_i) + (1-y_i)\log(1-\sigma(w^\top x_i))\right]$, where $\sigma(z) = 1/(1+e^{-z})$ is the sigmoid function. Assuming that $w^{(t)}$ and $\alpha^{(t)}$ denote the model weights and learning rate at iteration $t$, respectively, gradient descent (GD) can be used to compute the extremum of $J(w)$ by $w^{(t+1)} = w^{(t)} - \alpha^{(t)} \nabla J(w^{(t)})$. Since the HE scheme (CKKS) [29] cannot effectively support non-polynomial arithmetic operations, we use a degree-7 polynomial function $f(x) = a_0 + a_1 x + a_3 x^3 + a_5 x^5 + a_7 x^7$ with fitted constant coefficients $a_0, a_1, a_3, a_5, a_7$ to approximate the sigmoid function over the domain $[-8, 8]$.
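Because the polynomial's coefficients come from a fitting step, a comparable approximation can be recomputed directly. The sketch below performs a degree-7 least-squares fit of the sigmoid on [-8, 8] with numpy; the resulting coefficients are illustrative and are not claimed to match the paper's exact values.

```python
import numpy as np

# Degree-7 least-squares fit of the sigmoid on [-8, 8]. The coefficients
# printed here are illustrative; the paper's exact values may differ.
xs = np.linspace(-8.0, 8.0, 4001)
sig = 1.0 / (1.0 + np.exp(-xs))

coeffs = np.polyfit(xs, sig, 7)      # coefficients, highest degree first
f = lambda z: np.polyval(coeffs, z)  # polynomial surrogate for sigmoid

print("coefficients:", np.round(coeffs, 6))
print("max |f - sigmoid| on [-8, 8]:", np.max(np.abs(f(xs) - sig)))
```

Note that the even-degree coefficients come out near zero, since the sigmoid minus 0.5 is an odd function; this matches the odd-plus-constant form of f above.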

4. Privacy-Preserving Vertical Collaborative Logistic Regression

Over the vertically distributed datasets D_A and D_B, we propose a VCLR scheme based on an approximate HE [29]. Using the batching method in approximate HE, the proposed scheme packs a message vector containing multiple messages into a plaintext with multiple plaintext slots and performs parallel training based on SIMD. For readability, the full algorithm is given in the Appendix. We assume that the samples of D_A and D_B held by party A and party B have been aligned; namely, D_A and D_B consist of n samples of the form x_i^A and (x_i^B, y_i), respectively, where i = 1, ..., n. Each column of D_A denotes a feature; the last column of D_B represents the label, and the other columns of D_B represent features. Party A cooperates with party B to train a shared LR model without revealing data privacy. The proposed VCLR proceeds in three phases (a plaintext sketch of the training logic follows this outline):

Input: D_A and D_B, held by party A and party B, respectively.
Output: the weight blocks w_A and w_B for party A and party B, respectively.

Preprocessing (steps 1-2): Party A generates the keys, packs and encrypts its dataset D_A into ciphertexts, encrypts the initial weights into one ciphertext, and sends the evaluation keys and all ciphertexts to party B. Party B packs its own dataset D_B, the initial weights, and the learning rate into message vectors.

Training (steps 3-31): In each iteration, party B homomorphically evaluates the batched gradient on the combined data, using the polynomial approximation of the sigmoid from Section 3.3. To refresh the intermediate ciphertext, party B chooses a random message vector, adds it to the ciphertext as a mask, and sends the result to party A; party A decrypts it with sk, re-encrypts it, and returns it; party B then removes the mask and updates the encrypted weights.

Reconstructing (steps 32-36): Party B sends the final weight ciphertexts to party A; party A decrypts them and obtains its weight block w_A, then sends the remaining part to party B, which recovers its weight block w_B.
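The following plaintext Python sketch mirrors this training logic under simplifying assumptions: the logit splits additively across the two parties' feature blocks, the degree-7 polynomial of Section 3.3 stands in for the sigmoid, and the masking and refresh round trips are omitted. All names (vclr_step, XA, wA, and so on) are hypothetical; this is not the paper's encrypted implementation.

```python
import numpy as np

# Degree-7 polynomial surrogate for the sigmoid (see Section 3.3).
_xs = np.linspace(-8.0, 8.0, 4001)
_POLY = np.polyfit(_xs, 1.0 / (1.0 + np.exp(-_xs)), 7)

def vclr_step(XA, wA, XB, wB, y, lr):
    """One plaintext gradient-descent step on vertically split features.

    In the actual protocol, party B evaluates these formulas homomorphically:
    party A's block arrives encrypted and batched into ciphertext slots, and
    the polynomial replaces the sigmoid because CKKS supports only polynomial
    arithmetic. Inputs are assumed scaled so logits stay within [-8, 8].
    """
    z = XA @ wA + XB @ wB              # logit splits additively across parties
    err = np.polyval(_POLY, z) - y     # approximate-sigmoid residual
    n = len(y)
    return wA - lr * (XA.T @ err) / n, wB - lr * (XB.T @ err) / n

# Toy run: party A holds 4 features; party B holds 4 features and the label.
rng = np.random.default_rng(1)
XA, XB = rng.normal(size=(200, 4)), rng.normal(size=(200, 4))
y = (XA[:, 0] - XB[:, 0] > 0).astype(float)
wA, wB = np.zeros(4), np.zeros(4)
for _ in range(50):
    wA, wB = vclr_step(XA, wA, XB, wB, y, lr=0.1)
```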

5. Performance Evaluation

We compare the performance of our VCLR scheme with the existing schemes [9, 21]. We perform all experiments on a 64-bit Linux machine with an i7 CPU and 16 GB of memory. For all experiments, the initial weights, the learning rate, and the maximum number of iterations are fixed to the same values across schemes. The schemes [9, 21] use the Paillier cryptosystem [25] to provide the additive HE operations, while the proposed scheme uses the Microsoft SEAL library [41] to instantiate the approximate HE operations [29]. To achieve the same security level, we set the bit lengths of the Paillier primes for the schemes [9, 21] and choose the polynomial degree N, the coefficient modulus q, and the scaling factor Δ for the proposed scheme accordingly. On three publicly available datasets [30], D1 (Umaru Impact Study), D2 (Myocardial Infarction from Edinburgh), and D3 (Nhanes III), we compare the proposed scheme with the schemes [9, 21] in terms of training time, accuracy, F1-score, and AUC. For D1, party A holds the first 4 features of all samples and party B holds the last 4 features and the labels; for D2, party A holds the first 5 features and party B holds the last 4 features and the labels; for D3, party A holds the first 8 features and party B holds the last 7 features and the labels. We validate the experimental results using 5-fold cross-validation, and all results are reported as the average of 10 runs. The performance comparison between the proposed scheme and the schemes [9, 21] is summarized in Table 2, in which "√" denotes "satisfied" and "×" denotes "unsatisfied". From Table 2, we can see that our VCLR scheme outperforms the existing schemes [9, 21] in both training time and model performance, and does not need a TTP coordinator.
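As a reference for how these metrics are computed, the snippet below runs 5-fold cross-validation and reports accuracy, F1-score, and AUC with scikit-learn on synthetic stand-in data (the real experiments use the datasets of [30]); it is an evaluation-harness sketch, not the encrypted training pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import KFold

# Hypothetical stand-in data; the paper evaluates on the Umaru Impact Study,
# Edinburgh Myocardial Infarction, and Nhanes III datasets [30].
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

accs, f1s, aucs = [], [], []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    prob = clf.predict_proba(X[test_idx])[:, 1]   # positive-class probability
    pred = (prob >= 0.5).astype(int)
    accs.append(accuracy_score(y[test_idx], pred))
    f1s.append(f1_score(y[test_idx], pred))
    aucs.append(roc_auc_score(y[test_idx], prob))

print(f"accuracy={np.mean(accs):.3f}  F1={np.mean(f1s):.3f}  AUC={np.mean(aucs):.3f}")
```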

From Figure 2, we can see that the training time of our scheme is 0.86 min on dataset D1, which is nearly 32.3% and 60.6% lower than that of [9] and [21], respectively; 1.49 min on dataset D2, which is almost 32.3% and 56.3% lower than that of [9] and [21], respectively; and 5.87 min on dataset D3, which is nearly 47.3% and 72.5% lower than that of [9] and [21], respectively.

From Figure 3, we can see that the accuracy of our scheme is 74.4% on dataset D1, which is nearly 0.3% and 0.3% higher than that of [9] and [21], respectively; 92.3% on dataset D2, which is almost 1.0% and 1.4% higher than that of [9] and [21], respectively; and 85.7% on dataset D3, which is nearly 3.0% and 3.0% higher than that of [9] and [21], respectively.

From Figure 4, we can see that the F1-score of our scheme is 85.2% on dataset D1, which is nearly 0.1% and 0.1% higher than that of [9] and [21], respectively; 78.0% on dataset D2, which is almost 0.5% and 2.7% higher than that of [9] and [21], respectively; and 61.9% on dataset D3, which is nearly 1.8% and 1.8% higher than that of [9] and [21], respectively.

From Figure 5, we can see that the AUC of our scheme is 0.58 on dataset D1, which is nearly 0.01 and 0.02 higher than that of [9] and [21], respectively; 0.96 on dataset D2, which is the same as that of [9, 21]; and 0.91 on dataset D3, which is nearly 0.03 and 0.02 higher than that of [9] and [21], respectively.

6. Security Analysis

In the semi-honest model [42], both party A and party B know the public parameters (N, q, Δ) and the public keys (pk, rlk, gk), while only party A holds the secret key sk. The proposed VCLR scheme is a secure two-party computation that computes an objective functionality f = (f_A, f_B). For the inputs (x, y), where x is from party A and y is from party B, f_A(x, y) is the output for party A and f_B(x, y) is the output for party B, and neither party can learn more private information than its own output. Following the simulation-based security paradigm [43], we perform a security analysis of our VCLR scheme.

Definition 1. Let f = (f_A, f_B) be a deterministic functionality and π be a two-party protocol for computing f. Given party A's input x, party B's input y, and the security parameter λ, the views of party A and party B in the protocol π are denoted as view_A^π(x, y) = (x, r_A, m_1, ..., m_t) and view_B^π(x, y) = (y, r_B, m'_1, ..., m'_t), where r_A and r_B are the internal random tapes and m_1, ..., m_t and m'_1, ..., m'_t are the messages received by party A and party B, respectively. We say that π securely computes f in the semi-honest model if there exist probabilistic polynomial-time (PPT) simulators S_A and S_B such that

{S_A(x, f_A(x, y))} ≡_c {view_A^π(x, y)} and {S_B(y, f_B(x, y))} ≡_c {view_B^π(x, y)},

where ≡_c denotes computational indistinguishability.

Theorem 1. Assuming that party A and party B do not collude with each other and that the HE scheme (CKKS) [29] satisfies semantic security, our VCLR scheme is secure in the semi-honest model.
Security Proof. The security proof of our VCLR scheme follows the simulation-based security paradigm [43]. We prove that we can build simulators S_A and S_B such that {S_A(x, f_A(x, y))} ≡_c {view_A^π(x, y)} and {S_B(y, f_B(x, y))} ≡_c {view_B^π(x, y)}, where view_A^π and view_B^π denote the views of party A and party B, respectively. Next, we show that these two equations hold for a corrupted party A and a corrupted party B, respectively.
Against a corrupted party A: We build S_A such that, when given party A's input x and party A's output f_A(x, y), S_A can simulate party A's view in the execution of the protocol. To this end, we analyze party A's view in the real execution. The only messages party A receives are the masked intermediate ciphertexts; let c denote such a ciphertext. Therefore, view_A^π(x, y) consists of party A's secret key sk, the masked message vector r recovered by decrypting c, and the ciphertext c itself. Since party B masks the intermediate result with a uniformly random message vector, r is uniformly distributed from party A's perspective. Given x, f_A(x, y), and pk, S_A samples a random message vector r', encrypts r' with pk into a ciphertext c', and outputs (sk, r', c'). Therefore, we obtain {S_A(x, f_A(x, y))} = {(sk, r', c')} ≡_c {(sk, r, c)} = {view_A^π(x, y)}. Through the above analysis, the probability distributions of party A's simulated view and real view are indistinguishable. Therefore, the proposed VCLR scheme is secure against a corrupted party A in the semi-honest model.
Against a corrupted party B: We build S_B such that, when given party B's input y and party B's output f_B(x, y), S_B can simulate party B's view in the execution of the protocol. We analyze party B's view in the real execution. Party B does not receive any plaintext message vectors from party A; every message it receives is a ciphertext under pk, and party B does not hold the secret key sk. Therefore, view_B^π(x, y) consists of party B's input y, its chosen random message vector r, and ciphertexts that, by the semantic security of the CKKS scheme, are computationally indistinguishable from encryptions of random message vectors. Given y, f_B(x, y), and pk, S_B generates a simulation of the view by outputting (y, r') for a random message vector r', together with encryptions of random vectors under pk. Therefore, we obtain {S_B(y, f_B(x, y))} ≡_c {view_B^π(x, y)}. Through the above analysis, the probability distributions of party B's simulated view and real view are indistinguishable. Therefore, the proposed VCLR scheme is secure against a corrupted party B in the semi-honest model.

7. Conclusion

In this paper, to improve the efficiency of collaborative LR, we propose a VCLR scheme based on an approximate HE algorithm over vertically distributed data, which protects both the training data and the model parameters of all parties. We evaluate the proposed scheme on three public datasets, and the evaluation results show that our VCLR scheme outperforms the existing schemes [9, 21] in both joint training time and model performance. Specifically, the training time of the model is decreased by almost 32.3%-72.5%, while the accuracy, F1-score, and AUC of the model improve by nearly 0.3%-3.0%, 0.1%-2.7%, and 0-0.03, respectively. In the future, we will extend our method to support more complex ML models and deploy our scheme in practical applications.

Appendix

(Algorithm listings: the detailed VCLR training procedure of Section 4 and its three looped subroutines.)

Data Availability

Previously reported datasets were used to support this study and are available at https://doi.org/10.1186/s12920-018-0401-7. These prior studies (and datasets) are cited at the relevant places within the text as reference [30].

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work is supported by the National Key R&D Program of China (Grant No. 2019YFE0113200), the National Natural Science Foundation of China (Grant Nos. U19B2021 and 61901317), and the Guangdong Basic and Applied Basic Research Foundation (Grant Nos. 2020A1515110496 and 2020A1515110079).