Table of Contents Author Guidelines Submit a Manuscript
Security and Communication Networks
Volume 2017, Article ID 3910126, 19 pages
https://doi.org/10.1155/2017/3910126
Research Article

Privacy-Preserving -Means Clustering under Multiowner Setting in Distributed Cloud Environments

State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, National University of Defense Technology, Changsha, China

Correspondence should be addressed to Hong Rong; moc.liamtoh@tdun_gnoh.r

Received 16 June 2017; Revised 15 September 2017; Accepted 27 September 2017; Published 13 November 2017

Academic Editor: Xiangyang Luo

Copyright © 2017 Hong Rong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

With the advent of big data era, clients who lack computational and storage resources tend to outsource data mining tasks to cloud service providers in order to improve efficiency and reduce costs. It is also increasingly common for clients to perform collaborative mining to maximize profits. However, due to the rise of privacy leakage issues, the data contributed by clients should be encrypted using their own keys. This paper focuses on privacy-preserving -means clustering over the joint datasets encrypted under multiple keys. Unfortunately, existing outsourcing -means protocols are impractical because not only are they restricted to a single key setting, but also they are inefficient and nonscalable for distributed cloud computing. To address these issues, we propose a set of privacy-preserving building blocks and outsourced -means clustering protocol under Spark framework. Theoretical analysis shows that our scheme protects the confidentiality of the joint database and mining results, as well as access patterns under the standard semihonest model with relatively small computational overhead. Experimental evaluations on real datasets also demonstrate its efficiency improvements compared with existing approaches.