Journal of Applied Mathematics
Volume 2014, Article ID 381361, 12 pages
http://dx.doi.org/10.1155/2014/381361
Research Article

Sharing Privacy Protected and Statistically Sound Clinical Research Data Using Outsourced Data Storage

Center for Information Security Technologies (CIST), Korea University, Anam-dong, Seongbuk-gu, Seoul 136-713, Republic of Korea

Received 14 November 2013; Accepted 28 April 2014; Published 18 May 2014

Academic Editor: Jongsung Kim

Copyright © 2014 Geontae Noh et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

It is critical to scientific progress to share clinical research data stored in outsourced, generally available cloud computing services. Researchers are able to obtain valuable information that they would not otherwise be able to access; however, privacy concerns arise when sharing clinical data in these outsourced, publicly available data storage services. HIPAA requires researchers to deidentify private information when disclosing clinical data for research purposes and describes two available methods for doing so. Unfortunately, both techniques degrade statistical accuracy. Therefore, the need to protect privacy presents a significant problem for data sharing between hospitals and researchers. In this paper, we propose a controlled secure aggregation protocol to preserve both privacy and accuracy when researchers outsource their clinical research data for sharing. Since clinical data must remain private beyond a patient’s lifetime, we take advantage of lattice-based homomorphic encryption to guarantee long-term security against quantum computing attacks. Using lattice-based homomorphic encryption, we design an aggregation protocol that aggregates outsourced ciphertexts under distinct public keys, enabling researchers to get aggregated results from ciphertexts outsourced by distinct data providers. To the best of our knowledge, our protocol is the first aggregation protocol that can aggregate ciphertexts encrypted with distinct public keys.

1. Introduction

Researchers can accelerate their learning curve if they are able to freely access clinical data from other studies. Such clinical data sharing in outsourced, publicly available services is crucial to scientific progress in clinical research. The benefits of clinical data sharing using these services have been widely reported, including reduced research costs, reduced management costs, improved quality control, and reduced time in discovering diseases and dealing with them effectively. Through shared data, researchers access valuable information that they would not ordinarily obtain. In its policy statement on grants, the U.S. National Institutes of Health (NIH) supports data sharing by requiring investigators to include a plan for data sharing or to explain why data sharing is not possible.

The problem with clinical data sharing in outsourced, publicly available services for research is that researchers can inadvertently violate patient privacy. HIPAA (Health Insurance Portability and Accountability Act) offers protection of patients’ personal health information, but it is difficult to avoid invading patient privacy while sharing clinical data in outsourced, publicly available data storage services [1]. Therefore, researchers would rather not make their data publicly available than run the risk of violating HIPAA.

To mitigate privacy concerns, HIPAA describes two ways to use and disclose clinical data for research purposes. Under the HIPAA Safe Harbor policy, clinical data should be deidentified so that patients are not individually identifiable. The HIPAA Safe Harbor policy stipulates that the data sharer deidentify data by removing 18 specific data attributes, such as name, address, and all dates related to the individual patient, which may include birth date and date of death. (In addition, some researchers continue to assert that combinations of other data excluded from the HIPAA Safe Harbor policy could individually identify a specific person with nonnegligible probability, so they insist that more than 18 specific data attributes should be included in the Safe Harbor policy [2–4].) Once identifying information has been removed, the deidentified data are no longer subject to Institutional Review Board (IRB) oversight. Alternatively, researchers may use anonymity techniques to deidentify patient information instead of removing all of the 18 or more data attributes that are required to be deidentified. To date, several anonymity techniques have been proposed, such as k-anonymity [5–7], ℓ-diversity [8], and t-closeness [9].

Protecting patient privacy with deidentification is useful when sharing clinical data in outsourced, publicly available data storage services, but doing so degrades statistical accuracy, since precise statistical results become difficult to obtain. In cases where accurate statistical data on patients are critical, anonymity techniques for deidentification are therefore not sufficient: poorly deidentified data can lead researchers to bad decisions. A privacy-preserving method that still yields accurate statistical data is needed.

In this work, we propose how to outsource clinical research data securely and how to control the outsourced data against potential breaches of privacy, while not compromising the accuracy of statistical results. For example, a malicious researcher could circumvent aggregation by requesting a statistic over a single patient’s record; by repeating such requests, the researcher could ultimately obtain each patient’s private information. We propose a method that foils such malicious attempts.

The system environment we propose for hospitals, an aggregator, and researchers is illustrated in Figure 1. In our system, each hospital outsources its own clinical data to cloud storage servers. The clinical data must be deidentified or encrypted to be stored publicly. We use a hybrid method to store the clinical data; that is, we deidentify the clinical data for approximate statistical data requests and encrypt numerical clinical data for accurate statistical data requests. Therefore, researchers can request both approximate and accurate statistical data. Researchers can obtain approximate statistical data directly from the cloud storage servers but cannot obtain accurate statistical data directly. When researchers would like to get accurate statistical data, they can get the data through the aggregator. The aggregator aggregates the requested data from the encrypted database stored in the cloud storage servers and then asks each hospital to decrypt the aggregated data by consent. Hospitals can refuse the aggregator’s request, for example when the initial consents obtained from patients do not allow secondary research. Since there are ethical and practical issues associated with aggregating databases [10], hospitals should ensure that they are following “best practices” for their outsourced data, such as determining whether the initial consents that have been obtained allow secondary research.

Figure 1: System environment ((a) store phase and (b) search phase).

Since clinical data should remain private beyond a patient’s lifetime, cryptographic long-term security is absolutely needed [11] in the area of managing clinical data. Therefore, we take advantage of lattice-based homomorphic encryption in order to encrypt clinical data. Lattice-based cryptography is believed to be secure against quantum computing attacks and guarantees long-term security. The RSA, ECC, and DLP cryptosystems, which have gained attention so far, could be attacked with quantum computers [12]. Quantum computing is not yet practical but may become so in our lifetime. Furthermore, lattice-based cryptographic algorithms are more efficient than others in computational overhead because they require only linear operations on matrices, such as addition, multiplication, and inversion.

In 2009, Gentry proposed the first fully homomorphic encryption scheme using ideal lattices [13]. In 2010, Gentry et al. proposed a novel homomorphic encryption scheme (referred to as the GHV homomorphic encryption scheme hereafter) that supports one multiplicative and polynomially many additive operations on encrypted data [14]. As a building block, we use a variant of the GHV homomorphic encryption scheme that supports only additive operations. This makes it possible to aggregate ciphertexts that are encrypted under distinct public keys. Due to this property, the aggregator can aggregate the outsourced encrypted data from hospitals. Therefore, once hospitals outsource their clinical data, they do not need to encrypt the clinical data again for individual researchers: each hospital encrypts its clinical data once and then outsources the encrypted data.

Contributions. In this paper, we propose a controlled secure aggregation protocol for sharing clinical research data that balances the interests of hospitals and researchers. The main contributions of this paper are as follows.
(i) Researchers can get approximate statistical data from deidentified clinical data directly. Researchers can also obtain accurate, aggregated clinical data from the encrypted database through the aggregator, subject to each hospital’s consent.
(ii) We take advantage of lattice-based homomorphic encryption, which is secure against quantum computing attacks. Therefore, our protocol resists quantum attacks and can remain secure in the long term.
(iii) The aggregator can aggregate clinical data encrypted with distinct public keys. Therefore, hospitals do not have to re-encrypt the clinical data whenever researchers send requests.

To the best of our knowledge, our protocol is the first to take advantage of lattice-based homomorphic encryption in order to share outsourced clinical research data.

Organization. The remainder of this paper is organized as follows. Section 2 provides related works and background. Section 3 presents our controlled secure aggregation protocol. We present our secure clinical data aggregation system in Section 4 and analyze it in Section 5. We provide our conclusions in Section 6.

2. Related Works and Background

In this section, we present related works and background.

2.1. Data Aggregation Based on Homomorphic Encryption

In 2004, Hacıgümüş et al. proposed an aggregation protocol over encrypted relational databases [15]. They designed the aggregation protocol using a PH (privacy homomorphism), which supports additive and multiplicative operations. In the aggregation protocol, permitted users can get accurate, aggregated data. However, Mykletun and Tsudik showed that the aggregation protocol using the PH is not secure against ciphertext-only attacks [16]. Since then, various aggregation protocols over encrypted data have been proposed in the literature [17–23]. Among those protocols, few have focused on the health-care environment, and most consider aggregation of a single provider’s data.

Molina et al. [22] designed an aggregation protocol, HICCUPS, using homomorphic encryption in the health-care environment. In HICCUPS, clinical data of multiple providers can be aggregated as follows: caregivers who store clinical data on their own databases are randomly chosen as the aggregator. When a researcher requests the aggregated result, the aggregator aggregates the encrypted clinical data from each caregiver and sends the aggregated result to the researcher.

Since HICCUPS is not based on an outsourcing system, caregivers have to provide clinical data whenever a researcher requests certain data. In addition, HICCUPS requires each caregiver to aggregate and encrypt clinical data with the researcher’s public key so that the aggregator can aggregate the encrypted clinical data. However, a malicious aggregator may try to mislead a researcher by intentionally excluding the encrypted clinical data of certain caregivers. Even if the malicious aggregator fabricates the aggregated result on purpose, there is no way for a researcher to detect the malicious behavior of the aggregator in HICCUPS.

To resolve the above issues, we design a controlled secure aggregation protocol that can aggregate outsourced ciphertexts under distinct public keys. Therefore, data providers (or hospitals) do not have to encrypt clinical data again once they have outsourced their clinical data. Our protocol also enables a researcher to detect malicious behavior by the aggregator. If the malicious aggregator excludes the encrypted clinical data of certain data providers on purpose, a researcher can detect it. Since each data provider (or hospital) collaboratively makes the aggregated data decryptable by a researcher, if the aggregated data is generated maliciously, the researcher cannot get a plausible result: the researcher obtains a random-looking value that is clearly not meaningful. Therefore, in our protocol, the researcher can be sure that the requested data are aggregated correctly.

2.2. Anonymity Techniques for Deidentification

Samarati and Sweeney introduced an anonymity technique called k-anonymity [5–7]. They considered a relational database that consists of unique identifiers, quasi-identifiers, and sensitive attributes. A unique identifier is any attribute that is able to identify only one private individual, such as a personal ID, an e-mail address, or a cell phone number. A quasi-identifier is any set of attributes that can be joined with additional information to identify only one private individual, such as a zip code and a birthday. A sensitive attribute is any attribute that a data owner does not want to publish, such as health-care data. In order to preserve privacy, all unique identifiers must be removed and all quasi-identifiers must be anonymized. In k-anonymity, each quasi-identifier is indistinguishable from at least k − 1 other quasi-identifiers. Tables 1 and 2 give an example of original health-care data and the corresponding 4-anonymous health-care data.

Table 1: Original health-care data.
Table 2: 4-anonymous health-care data.
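As an illustration of the k-anonymity condition, the following sketch checks whether every quasi-identifier combination occurs at least k times. The records below are hypothetical generalized values, not the paper’s actual tables:

```python
from collections import Counter

# Hypothetical generalized records; each tuple is a quasi-identifier
# combination (zip-code prefix, age band), not the paper's actual data.
records = [
    ("130**", "20-29"), ("130**", "20-29"), ("130**", "20-29"), ("130**", "20-29"),
    ("148**", "30-39"), ("148**", "30-39"), ("148**", "30-39"), ("148**", "30-39"),
]

def is_k_anonymous(rows, k):
    """A table is k-anonymous if every quasi-identifier combination
    appears in at least k rows."""
    return all(count >= k for count in Counter(rows).values())

print(is_k_anonymous(records, 4))   # True: each combination occurs 4 times
print(is_k_anonymous(records, 5))   # False
```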

However, k-anonymity is not secure against homogeneity attacks and background knowledge attacks [8]. For example, suppose that Alice knows that Bob is in his twenties and that his zip code is 13032; then Alice can infer from Table 2 that Bob must have a gastric ulcer.

To mitigate these attacks, Machanavajjhala et al. introduced a new anonymity technique called ℓ-diversity [8]. In ℓ-diversity, every equivalence class of records sharing the same quasi-identifiers must contain ℓ or more distinct sensitive attribute values. Table 3 shows 3-diverse health-care data.

Table 3: 3-diverse health-care data.
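The ℓ-diversity condition can be checked in the same spirit (the records are hypothetical, not the paper’s Table 3):

```python
from collections import defaultdict

def is_l_diverse(rows, l):
    """rows: (quasi_identifier, sensitive_value) pairs. A table is
    l-diverse if every equivalence class (rows sharing a quasi-identifier)
    contains at least l distinct sensitive values."""
    classes = defaultdict(set)
    for qid, sensitive in rows:
        classes[qid].add(sensitive)
    return all(len(values) >= l for values in classes.values())

# Hypothetical records for illustration.
rows = [
    ("130**", "gastric ulcer"), ("130**", "gastritis"), ("130**", "stomach cancer"),
    ("148**", "flu"), ("148**", "flu"), ("148**", "bronchitis"),
]
print(is_l_diverse(rows, 2))   # True
print(is_l_diverse(rows, 3))   # False: the second class has only 2 distinct diagnoses
```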

Despite this improvement, Li et al. showed that ℓ-diversity is insufficient for anonymity [9]. In ℓ-diversity, information can still be leaked when there is a significant difference between the distribution of sensitive attributes within an equivalence class and the distribution of all sensitive attributes. For example, if Alice knows Bob’s personal information, such as his age and zip code, she will be able to infer from Table 3 that Bob has a stomach-related disease (e.g., gastric ulcer, gastritis, or stomach cancer).

To mitigate this problem, Li et al. introduced another anonymity technique, called t-closeness [9]. t-closeness requires the distribution of sensitive attributes within any equivalence class to be close to the distribution of all sensitive attributes.
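Li et al. measure closeness with the Earth Mover’s Distance; as a hedged sketch, the check below uses total-variation distance as a simpler stand-in for categorical attributes (the function name and data are illustrative):

```python
from collections import Counter, defaultdict

def is_t_close(rows, t):
    """rows: (quasi_identifier, sensitive_value) pairs. Requires the
    sensitive-value distribution of every equivalence class to be within
    total-variation distance t of the global distribution."""
    n = len(rows)
    global_counts = Counter(s for _, s in rows)
    classes = defaultdict(list)
    for qid, s in rows:
        classes[qid].append(s)
    for members in classes.values():
        local = Counter(members)
        tv = 0.5 * sum(abs(local[v] / len(members) - global_counts[v] / n)
                       for v in global_counts)
        if tv > t:
            return False
    return True

rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(is_t_close(rows, 0.1))   # True: every class matches the global distribution
```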

2.3. GHV Homomorphic Encryption Scheme

The GHV homomorphic encryption scheme supports one multiplicative and polynomially many additive operations on encrypted data [14]. The security of the GHV homomorphic encryption scheme is based on the learning with errors (LWE) problem [24], which is among the most well-studied hardness assumptions in lattice-based cryptography.

Let k be the security parameter. The remaining parameters are a lattice dimension n = poly(k), a prime modulus q = poly(n), a positive integer m = O(n log q), and a Gaussian parameter r (as in [14]).

Then the IND-CPA secure [25] GHV homomorphic encryption scheme GHV = {GHV.Key, GHV.Enc, GHV.Dec, GHV.Add, GHV.Mul} is as follows.
(i) GHV.Key(1^k): given n, q, and m, output a public key A ∈ Z_q^{m×n} and a secret key T ∈ Z^{m×m} such that TA = 0 (mod q), T is invertible, and the elements of T are bounded by a small norm. (To generate the two matrices A and T, the trapdoor sampling algorithm in [26] can be used. For further details, please refer to [14].)
(ii) GHV.Enc(A, B): given A and a plaintext B ∈ Z_2^{m×m}, choose a uniformly random matrix S ∈ Z_q^{n×m} and a Gaussian error matrix X ∈ Z^{m×m}. Then output the ciphertext C = AS + 2X + B (mod q).
(iii) GHV.Dec(T, C): given T and a ciphertext C, compute E = TCT^t (mod q). Then output the plaintext B = T^{-1} E (T^t)^{-1} (mod 2).

(iv) GHV.Add: given ℓ ciphertexts C_1, …, C_ℓ, output C = C_1 + ⋯ + C_ℓ (mod q). The output of GHV.Dec(T, C) is then B_1 + ⋯ + B_ℓ (mod 2).
(v) GHV.Mul: given two ciphertexts C_1 and C_2, output C = C_1 · C_2^t (mod q). The output of GHV.Dec(T, C) is then B_1 · B_2^t (mod 2).

In this paper, we use as a building block a variant of the GHV homomorphic encryption scheme that supports only additive operations. We call this variant the GHV+ homomorphic encryption scheme hereafter. We can replace the matrices B, S, X, and C of the GHV homomorphic encryption scheme with vectors b, s, x, and c of the GHV+ homomorphic encryption scheme without any loss of security. Then the IND-CPA secure [25] GHV+ homomorphic encryption scheme GHV+ = {GHV+.Key, GHV+.Enc, GHV+.Dec, GHV+.Add} is as follows.
(i) GHV+.Key(1^k): given n, q, and m, output a public key A ∈ Z_q^{m×n} and a secret key T ∈ Z^{m×m} such that TA = 0 (mod q), T is invertible, and the elements of T are bounded by a small norm.
(ii) GHV+.Enc(A, b): given A and a plaintext b ∈ Z_2^m, choose a uniformly random vector s ∈ Z_q^n and a Gaussian error vector x ∈ Z^m. Then output the ciphertext c = As + 2x + b (mod q).
(iii) GHV+.Dec(T, c): given T and a ciphertext c, compute e = Tc (mod q). Then output the plaintext b = T^{-1} e (mod 2).
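To make the additive homomorphism concrete, here is a toy symmetric-key sketch in the same LWE style (message in the low-order part, noise carried in multiples of the plaintext modulus). It is not the GHV+ scheme itself: it has no public key or trapdoor, and the parameters are far too small to be secure. It only shows why adding ciphertexts adds the underlying plaintexts:

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, t = 32, 65537, 256   # toy LWE dimension, prime modulus, plaintext modulus

s = rng.integers(0, q, size=n)   # secret key (no public key/trapdoor in this sketch)

def enc(msg):
    """Encrypt msg in [0, t): hide it under <a, s> plus noise t*e."""
    a = rng.integers(0, q, size=n)
    e = int(rng.integers(-4, 5))             # small Gaussian-like error
    return a, (int(a @ s) + t * e + msg) % q

def add(c1, c2):
    """Homomorphic addition: add ciphertext components mod q."""
    return (c1[0] + c2[0]) % q, (c1[1] + c2[1]) % q

def dec(c):
    a, b = c
    v = (b - int(a @ s)) % q
    if v > q // 2:                           # recenter into (-q/2, q/2]
        v -= q
    return v % t                             # strip the noise multiple of t

print(dec(add(enc(12), enc(29))))   # 41
```

Decryption stays correct as long as the accumulated noise term t·Σe stays well below q/2, which is exactly the reason the number of homomorphic additions must be bounded.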

2.4. Ajtai’s One-Way Function

Ajtai constructed a one-way function whose security is based on well-known approximation problems in lattices [27, 28].

Let n be the security parameter, and let m and q be positive integers. For a uniformly random matrix A ∈ Z_q^{n×m} and a short vector x ∈ {0, 1}^m, Ajtai’s one-way function f_A is defined as f_A(x) = Ax (mod q).

Note that Ajtai’s one-way function f_A is regular [29]; that is, every output of f_A is uniformly distributed over Z_q^n [30].
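A minimal sketch of the function follows (toy sizes; real parameters are much larger). Evaluating f_A is a single matrix-vector product; inverting it, that is, finding a short preimage, is the hard direction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, q = 8, 32, 97                      # toy parameters

A = rng.integers(0, q, size=(n, m))      # uniformly random public matrix

def f_A(x):
    """Ajtai's one-way function: f_A(x) = A x mod q for a short vector x."""
    return (A @ x) % q

x = rng.integers(0, 2, size=m)           # a short (binary) input
y = f_A(x)                               # easy to compute; hard to invert

# Regularity: for a random short x, the output is close to uniform over Z_q^n.
```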

3. Controlled Secure Aggregation Protocol

In this section, we propose our controlled secure aggregation protocol (CSA protocol hereafter). Let k be the security parameter. The other parameters used in our CSA protocol are chosen as in Section 2.3: a lattice dimension n = poly(k), a prime modulus q = poly(n), a positive integer m = O(n log q), and a Gaussian parameter r.

Suppose that there are n users U_1, …, U_n, a receiver R, and an aggregator. Each user U_i outsources its own numerical data d_i in encrypted form. We assume that the receiver wants to know an aggregated value d = Σ_{i ∈ S} d_i, where S ⊆ {1, …, n} and |S| is the number of elements in S. We also assume that the receiver has a public key pk_R and a secret key sk_R obtained by performing GHV+.Key. Then the receiver can get d by performing our CSA protocol.

Our CSA protocol consists of the following phases, which are illustrated in Box 1: Key Generation, Encryption, Aggregation, re-Aggregation, and dec-Aggregation. In the Key Generation phase, each user generates a public key pair and a secret key. In the Encryption phase, each user encrypts its numerical data with its public key pair. In the Aggregation phase, ciphertexts generated under distinct public key pairs are aggregated; that is, to get an aggregated value, the receiver allows the aggregator to know the index set S, and the aggregator aggregates each ciphertext indexed by S. In the re-Aggregation phase, each user eliminates its own key component from a ciphertext and adds a fresh GHV+.Enc ciphertext under the receiver’s public key pk_R. In this phase, a ciphertext under the user’s public key is thus converted into a ciphertext under the public key pk_R encrypting the same plaintext; that is, a ciphertext decryptable by the user becomes a ciphertext decryptable by the receiver R while maintaining the same plaintext. This phase is needed so that the dec-Aggregation phase can make an aggregated ciphertext decryptable by the receiver R. In the dec-Aggregation phase, each user in turn makes the aggregated ciphertext decryptable by the receiver R. Through these phases, the receiver can get the aggregated value d.

Box 1: CSA Protocol.

For example, assume that n users participate in our controlled secure aggregation protocol CSA and that each user U_i has numerical data d_i. Each user U_i outsources its numerical data in encrypted form c_i using the Encryption algorithm. Suppose that the receiver wants to know the aggregated value d = d_1 + ⋯ + d_n. The receiver lets the aggregator know S = {1, …, n}. The aggregator then runs the Aggregation algorithm to get an aggregated ciphertext and passes it to the users. Each user in turn runs the re-Aggregation and dec-Aggregation algorithms, so that the final ciphertext is decryptable by the receiver. Decrypting it, the receiver obtains d_1 + ⋯ + d_n, since the accumulated error term is a sufficiently short value [14].

In the dec-Aggregation phase, any user can refuse to perform the dec-Aggregation algorithm, for example when the initial consents obtained from patients do not allow secondary research. In that case, the receiver cannot get the result. The receiver can get the result only if all users perform the dec-Aggregation algorithm. That means the receiver can get the aggregated value he/she is seeking only with the unanimous consent of all users whose data are aggregated. That is why we use the term “controlled” in the CSA protocol.
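The consent property can be illustrated with a toy multi-key analogue of the protocol (a simplification: each user’s “consent” is modeled as releasing a noisy partial decryption share rather than the paper’s re-Aggregation step, and all names and parameters below are illustrative). The receiver recovers the sum only if every key holder contributes its share:

```python
import numpy as np

rng = np.random.default_rng(7)
n, q, t = 32, 65537, 256          # toy LWE dimension, prime modulus, plaintext bound

def keygen():
    return rng.integers(0, q, size=n)            # each user's own secret key

def enc(s, msg):
    """LWE-style additive encryption under the user's own key."""
    a = rng.integers(0, q, size=n)
    e = int(rng.integers(-4, 5))
    return a, (int(a @ s) + t * e + msg) % q

def dec_share(s, a):
    """Consent step: release a noisy partial decryption for one's own ciphertext."""
    e = int(rng.integers(-4, 5))
    return (int(a @ s) + t * e) % q

data = [40, 27, 35]                              # one numeric value per user
keys = [keygen() for _ in data]
cts = [enc(s, d) for s, d in zip(keys, data)]

# Aggregation phase: the aggregator sums ciphertext bodies under DISTINCT keys.
B = sum(b for _, b in cts) % q

# dec-Aggregation phase: every user must consent by releasing its share.
shares = [dec_share(s, a) for s, (a, _) in zip(keys, cts)]

v = (B - sum(shares)) % q
if v > q // 2:
    v -= q
print(v % t)    # 102 = 40 + 27 + 35; without any one share, the result is garbage
```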

3.1. Security

We now analyze the security of our controlled secure aggregation protocol.

First, we show that our encryption scheme is IND-CPA secure. Intuitively, the only difference between our encryption scheme and the GHV+ homomorphic encryption scheme is how the random vector is generated. In the GHV+ homomorphic encryption scheme, the vector is chosen uniformly at random, whereas in our encryption scheme it is generated by computing Ajtai’s one-way function f_A on a randomly chosen vector. Since every output of Ajtai’s one-way function f_A is uniformly distributed, the vector in our encryption scheme is also uniformly distributed. Therefore, the security of our encryption scheme is the same as that of the GHV+ homomorphic encryption scheme.

Theorem 1. Our encryption scheme provides IND-CPA security if the GHV+ homomorphic encryption scheme provides IND-CPA security and every output of Ajtai’s one-way function f_A is uniformly distributed.

Proof of Theorem 1. Formally, we show that if there exists an adversary breaking the IND-CPA security of our encryption scheme, then there exists a challenger breaking the IND-CPA security of the GHV+ homomorphic encryption scheme.
Let the GHV+ public key be the instance given to the challenger. The challenger chooses a uniformly random matrix, forms a public key of our encryption scheme, and sends it to the adversary. The adversary chooses two plaintexts and sends them to the challenger, which forwards them to its own IND-CPA experiment and returns the resulting challenge ciphertext to the adversary. Finally, the adversary outputs its guess, and the challenger outputs the same guess.

In our controlled secure aggregation protocol CSA, ciphertexts generated under distinct public key pairs can be aggregated. To make an aggregated ciphertext generated from the ciphertexts of several users decryptable by the receiver, each user needs to eliminate its own key component and add a fresh encryption under the receiver’s key using the CSA.reAgg algorithm. Therefore, we should show that CSA.reAgg is secure.

Theorem 2. CSA.reAgg outputs an aggregation of secure ciphertexts if its input is an aggregation of secure ciphertexts that includes the ciphertext being converted.

Proof of Theorem 2. Expanding CSA.reAgg term by term, the user’s own key component is eliminated and replaced by a fresh ciphertext under the receiver’s public key, while all remaining terms are unchanged. Therefore, the output of CSA.reAgg is still an aggregation of secure ciphertexts.

In the fifth step of the CSA.reAgg algorithm, a fresh GHV+.Enc ciphertext is added in order to be secure against an adversary who can eavesdrop on our controlled secure aggregation protocol CSA. Assume that no fresh ciphertext were added in the fifth step of CSA.reAgg; then any adversary who can eavesdrop on the protocol would be able to collect the values sent before and after re-Aggregation and compute their difference. Since the error term is a sufficiently short value [14], this difference would reveal the plaintext. Therefore, the adversary could decrypt without the secret key.

In the dec-Aggregation phase, after all the users eliminate their own key components from the aggregated ciphertext, the result has the same form as a ciphertext generated under the public key pk_R. Therefore, the receiver can decrypt it.

4. Secure Clinical Data Aggregation System

In this section, we provide an overview of our system and how it works.

4.1. System Overview

The proposed system environment consists of hospitals, an aggregator, and researchers. In our system, each hospital outsources its clinical data to cloud storage servers. Hospitals use the following hybrid method to store data when outsourcing their clinical research data to cloud servers: they make anonymous data publicly available in the cloud servers using the anonymity techniques for deidentification described in Section 2.2. In addition, hospitals store their encrypted numerical data together with the anonymous data for statistical accuracy.

Suppose that there are three hospitals, H_1, H_2, and H_3, that want to share their clinical data and that have public and secret key pairs of our CSA protocol, respectively. Suppose also that there is an aggregator and a researcher R who has a public and secret key pair of the GHV+ homomorphic encryption scheme. The original clinical data of the hospitals are shown in Table 4. Each hospital outsources its clinical data to cloud storage servers; that is, each hospital stores deidentified nonsensitive data (such as zip code and age), sensitive data in the raw, and numerical data (such as age) encrypted with GHV+.Enc on the cloud servers. The anonymous and encrypted clinical data on the cloud servers are shown in Table 5, where each encrypted entry is an output of GHV+.Enc under the corresponding hospital’s key.

Table 4: Original clinical data.
Table 5: Anonymous and encrypted clinical data stored on cloud servers.

When the researcher R wants to know a rough estimate of the age of the hospitals’ cancer patients, R can get the estimate directly from the cloud servers. When R wants to figure out the exact average age of the hospitals’ cancer patients, R can ask the aggregator for an aggregated age. The aggregator sums up the ages of the cancer patients within each hospital and then totals the ages across hospitals. That is, the aggregator performs homomorphic additions on ciphertexts under the same public key using GHV+.Add and then runs the Aggregation algorithm to get an aggregated ciphertext. In order to allow R to know the aggregated age, each hospital in turn gives its consent: H_1, H_2, and H_3 in turn perform CSA.reAgg. After this agreement procedure, R obtains the aggregated ciphertext under his/her own public key. Then R performs GHV+.Dec to get the sum of the ages of the cancer patients and divides it by the number of patients to obtain the average age.

4.2. Attack Model

For designing a secure clinical data aggregation system, the following conditions should be considered.
(1) (Anonymity) Adversaries should not be able to identify any single private individual from the data on the cloud storage servers.
(2) (Confidentiality) Adversaries should not be able to learn any information from the encrypted numerical data on the cloud storage servers.
(3) (External security) Third parties (external adversaries) should not be able to learn any information from the protocol’s information flow.
(4) (Internal security) Hospitals and researchers (internal adversaries), other than the researcher who sends a request, should not be able to learn any information from the protocol’s information flow.

4.3. Our System

Now, we propose our secure clinical data aggregation system (SCDA system hereafter). Let k be the security parameter. The other parameters used in our SCDA system are chosen as in Section 3: a lattice dimension n = poly(k), a prime modulus q = poly(n), a positive integer m = O(n log q), and a Gaussian parameter r.

Suppose that there are hospitals H_1, H_2, …, researchers, and an aggregator. We assume that the i-th hospital holds a number of tuples and that the relational database in the cloud servers has n_C numerical clinical data attributes.

The building blocks of our SCDA system are our controlled secure aggregation protocol CSA and the GHV+ homomorphic encryption scheme. Our SCDA system consists of the following phases, which are illustrated in Box 2: Preparation, Data Publication, Query, Aggregation, Consent, and Acquisition. In the Preparation phase, each hospital and each researcher generates a public key (pair) and a secret key. In the Data Publication phase, each hospital encrypts its numerical clinical data with its public key pair and makes anonymous data using the anonymity techniques for deidentification; each hospital then stores them in the cloud servers. In the Query phase, one of the researchers asks the aggregator for aggregated clinical data. In the Aggregation phase, ciphertexts generated under distinct hospitals’ keys are aggregated. In the Consent phase, each hospital goes through the procedure for giving consent. In the Acquisition phase, the researcher gets the aggregated clinical data.

Box 2: SCDA Protocol.

5. Analysis

In this section, we analyze the security and efficiency in our protocol.

5.1. Secure Parameters

We follow the parameters defined for our SCDA system in Section 4.3. Let k be the security parameter; the other parameters are a lattice dimension n, a positive integer m, a prime modulus q, and a Gaussian parameter r.

Using the above parameters, a ciphertext pair is only six times as large as a plaintext. Our SCDA system supports additive operations, in common with [14]. In the Query phase of our SCDA system, therefore, the number of tuples included in a single request for aggregated data must be bounded so that the aggregated plaintext does not overflow. Table 6 provides examples of secure parameters.

Table 6: Examples of secure parameters.
5.2. Security

We now show that our SCDA system is anonymous and confidential and that it is secure against external and internal adversaries.

Theorem 3 (anonymity). Our SCDA system is anonymous if the anonymity techniques which are used in our SCDA system are anonymous.

Proof of Theorem 3. We use anonymity techniques for deidentification, which guarantee anonymity. In our SCDA system, each hospital outsources its clinical data to cloud storage servers using these techniques. Therefore, researchers, other hospitals, and third parties cannot identify any individual from the data stored on the cloud storage servers.

Besides the anonymity techniques mentioned in Section 2.2, we could use the technique used to make statistical databases differentially private. In 2006, Dwork introduced the concept of “differential privacy,” which provides a strong privacy guarantee in statistical databases [31]. To achieve differential privacy, we could add appropriately chosen random noise to statistical query results.
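As a sketch of that idea, the Laplace mechanism adds noise scaled to the query’s sensitivity divided by epsilon (the function name and parameters below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(true_count, epsilon):
    """epsilon-differentially private counting query via the Laplace
    mechanism: a count has sensitivity 1, so the noise scale is 1/epsilon."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Smaller epsilon means stronger privacy and noisier answers.
noisy = dp_count(128, epsilon=0.5)     # typically within a few patients of 128
```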

Theorem 4 (confidentiality). Our encrypted numerical data are confidential if the GHV+ homomorphic encryption scheme is IND-CPA secure and every output of Ajtai’s one-way function f_A is uniformly distributed.

Proof of Theorem 4. By Theorem 1 in Section 3.1, the encrypted numerical data are confidential.

Theorem 5 (external and internal security). Our SCDA system is secure against external and internal adversaries if the anonymity techniques for deidentification are anonymous and the GHV+ homomorphic encryption scheme is secure.

Proof of Theorem 5. All clinical data outsourced to cloud storage servers are anonymous and confidential, since all hospitals use the anonymity techniques for deidentification and the GHV+ homomorphic encryption scheme. All data transmitted in our SCDA system are encrypted by the GHV+ homomorphic encryption scheme with fresh random numbers. Therefore, our SCDA system is secure against external and internal adversaries if the anonymity techniques for deidentification are anonymous and the GHV+ homomorphic encryption scheme is secure.

5.3. Efficiency

Table 7 shows the complexity of our SCDA system. The parameters in Table 7 follow those defined in Box 2.

Table 7: Complexity analysis of our SCDA system.
5.4. Experimental Results

To demonstrate the efficiency of our system, we use MATLAB on a computer with an Intel(R) Core(TM) i3-2100 CPU (3.10 GHz) and 4 GB of RAM. Table 8 gives our experimental results. We assume a number of hospitals, each holding the same amount of clinical data. Each row in Table 8 represents the mean of 15 trials.

Table 8: Experimental results of our SCDA system.
5.5. Handling Overflows

In our SCDA system, each numerical datum is represented as a binary string. To handle overflows, we restrict the size of each individual value rather than the size of the aggregate. Under the additive operation of the homomorphic encryption scheme, the aggregated result can then be decoded in a manner similar to decoding a binary string.

As illustrated in Section 5.1, the number of tuples included in a request for aggregated data is bounded. If we use the above representation, our SCDA system has no overflow problem, because the aggregate of the bounded values cannot exceed the plaintext space.
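The overflow condition above can be sketched as follows. The plaintext modulus p, per-value bound B, and resulting limit on the number of aggregated tuples are hypothetical illustration values, not the paper's actual parameters:

```python
# Hypothetical parameters for illustration only.
p = 2**20        # plaintext modulus of the (assumed) additive scheme
B = 2**10        # each clinical value v must satisfy 0 <= v < B
t_max = p // B   # maximum number of values in one aggregate, so t_max * B <= p

def aggregate_mod_p(values):
    # The homomorphic sum is computed modulo p; it equals the true
    # integer sum precisely because sum(values) < t_max * B <= p,
    # i.e., no wrap-around (overflow) can occur.
    assert len(values) <= t_max and all(0 <= v < B for v in values)
    return sum(values) % p
```

For example, `aggregate_mod_p([5, 7, 1023])` returns the exact sum 1035, since the bound guarantees the modular reduction never triggers.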

5.6. Long-Term Confidentiality

In the area of managing sensitive information, cryptographic long-term confidentiality is essential [11]. Shor showed that the RSA cryptosystem can be broken by quantum attacks [12], and cryptosystems based on the DLP (Discrete Logarithm Problem), including ECC (Elliptic Curve Cryptography), which are the main alternatives to RSA, fall to the same quantum attacks.

In our SCDA system, we use the homomorphic encryption scheme, which is secure if the LWE (Learning with Errors) problem is hard. The LWE problem is at least as hard as worst-case lattice problems such as the SVP (Shortest Vector Problem), and these problems are believed to remain hard even against quantum attacks. Therefore, our SCDA system guarantees long-term confidentiality, because all algorithms in our SCDA system are secure against quantum attacks.
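The additive homomorphic property that underlies LWE-based aggregation can be illustrated with a toy Regev-style secret-key scheme. This sketch is not the paper's scheme: the parameters n, q, and P are far too small to be secure, it uses a single secret key rather than distinct public keys, and all names are hypothetical:

```python
import random

n, q = 16, 2**20   # toy lattice dimension and ciphertext modulus
P = 256            # plaintext space: messages in [0, P)
delta = q // P     # scaling factor separating message from noise

def keygen():
    # Secret key: a random vector in Z_q^n.
    return [random.randrange(q) for _ in range(n)]

def encrypt(sk, m):
    # Ciphertext (a, b) with b = <a, sk> + e + delta*m (mod q),
    # where e is a small noise term.
    a = [random.randrange(q) for _ in range(n)]
    e = random.randrange(-4, 5)
    b = (sum(x * y for x, y in zip(a, sk)) + e + delta * m) % q
    return (a, b)

def add(ct1, ct2):
    # Componentwise ciphertext addition adds the plaintexts,
    # as long as the accumulated noise stays below delta/2.
    a = [(x + y) % q for x, y in zip(ct1[0], ct2[0])]
    return (a, (ct1[1] + ct2[1]) % q)

def decrypt(sk, ct):
    # Recover delta*m + e (mod q) and round away the noise.
    a, b = ct
    phase = (b - sum(x * y for x, y in zip(a, sk))) % q
    return round(phase / delta) % P
```

Decrypting `add(encrypt(sk, 3), encrypt(sk, 5))` yields 8: the noise grows only additively, which is what makes LWE-based schemes suitable for aggregation queries.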

6. Conclusion

In this paper, we have proposed how to securely outsource clinical research data and how to control the outsourced data against potential breaches of privacy, while still sharing statistically accurate patient data. To achieve this, we designed the controlled secure aggregation protocol, which enables a researcher to obtain aggregated results from outsourced ciphertexts of distinct researchers. Since our protocol is built on lattice-based homomorphic encryption, it guarantees long-term security against quantum computing attacks and is very efficient in computational overhead.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research was partly supported by Basic Science Research Programs through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2012R1A1A3005550, 2013R1A2A2A01068200).

References

  1. "Health Insurance Portability and Accountability Act of 1996," Public Law 104-191, 104th Congress, August 1996.
  2. Z. Lin, A. B. Owen, and R. B. Altman, "Genomic research and human subject privacy," Science, vol. 305, no. 5681, p. 183, 2004.
  3. G. Loukides, J. C. Denny, and B. Malin, "The disclosure of diagnosis codes can breach research participants' privacy," Journal of the American Medical Informatics Association, vol. 17, no. 3, pp. 322–327, 2010.
  4. K. El Emam, "Methods for the de-identification of electronic health records for genomic research," Genome Medicine, vol. 3, no. 4, article 25, 2011.
  5. P. Samarati and L. Sweeney, "Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression," Tech. Rep. SRI-CSL-98-04, SRI Computer Science Laboratory, 1998.
  6. P. Samarati, "Protecting respondents' identities in microdata release," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 6, pp. 1010–1027, 2001.
  7. L. Sweeney, "k-anonymity: a model for protecting privacy," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557–570, 2002.
  8. A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam, "ℓ-diversity: privacy beyond k-anonymity," ACM Transactions on Knowledge Discovery from Data, vol. 1, no. 1, article 3, 2007.
  9. N. Li, T. Li, and S. Venkatasubramanian, "t-closeness: privacy beyond k-anonymity and ℓ-diversity," in Proceedings of the 23rd International Conference on Data Engineering (ICDE '07), pp. 106–115, April 2007.
  10. D. R. Karp, S. Carlin, R. Cook-Deegan et al., "Ethical and practical issues associated with aggregating databases," PLoS Medicine, vol. 5, no. 9, article e190, pp. 1333–1337, 2008.
  11. J. Buchmann, A. May, and U. Vollmer, "Perspectives for cryptographic long-term security," Communications of the ACM, vol. 49, no. 9, pp. 50–55, 2006.
  12. P. W. Shor, "Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer," SIAM Journal on Computing, vol. 26, no. 5, pp. 1484–1509, 1997.
  13. C. Gentry, "Fully homomorphic encryption using ideal lattices," in Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC '09), pp. 169–178, May 2009.
  14. C. Gentry, S. Halevi, and V. Vaikuntanathan, "A simple BGN-type cryptosystem from LWE," in Advances in Cryptology—EUROCRYPT 2010, vol. 6110 of Lecture Notes in Computer Science, pp. 506–522, Springer, Berlin, Germany, 2010.
  15. H. Hacıgümüş, B. Iyer, and S. Mehrotra, "Efficient execution of aggregation queries over encrypted relational databases," in Database Systems for Advanced Applications, vol. 2973 of Lecture Notes in Computer Science, pp. 125–136, 2004.
  16. E. Mykletun and G. Tsudik, "Aggregation queries in the database-as-a-service model," in Proceedings of the 20th Annual Conference on Data and Applications Security, pp. 89–103, July 2006.
  17. Z. Yang, S. Zhong, and R. N. Wright, "Privacy-preserving queries on encrypted data," in Proceedings of the 11th European Symposium on Research in Computer Security (ESORICS '06), pp. 479–495, September 2006.
  18. G. Amanatidis, A. Boldyreva, and A. O'Neill, "Provably-secure schemes for basic query support in outsourced databases," in Data and Applications Security XXI, pp. 14–30, 2007.
  19. T. Ge and S. Zdonik, "Answering aggregation queries in a secure system model," in Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07), pp. 519–530, September 2007.
  20. W. K. Wong, D. W. Cheung, B. Kao, and N. Mamoulis, "Secure kNN computation on encrypted databases," in Proceedings of the ACM International Conference on Management of Data (SIGMOD '09), pp. 139–152, June 2009.
  21. B. Thompson, S. Haber, W. G. Horne, T. Sander, and D. Yao, "Privacy-preserving computation and verification of aggregate queries on outsourced databases," in Proceedings of the 9th Privacy Enhancing Technologies Symposium (PETS '09), pp. 185–201, August 2009.
  22. A. D. Molina, M. Salajegheh, and K. Fu, "HICCUPS: health information collaborative collection using privacy and security," in Proceedings of the 1st ACM Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS '09), pp. 21–30, November 2009.
  23. R. Lu, X. Liang, X. Li, X. Lin, and X. Shen, "EPPA: an efficient and privacy-preserving aggregation scheme for secure smart grid communications," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 9, pp. 1621–1632, 2012.
  24. O. Regev, "On lattices, learning with errors, random linear codes, and cryptography," Journal of the ACM, vol. 56, no. 6, article 34, pp. 1–40, 2009.
  25. J. Katz and Y. Lindell, Introduction to Modern Cryptography: Principles and Protocols, Chapman & Hall, Boca Raton, Fla, USA, 1st edition, 2007.
  26. J. Alwen and C. Peikert, "Generating shorter bases for hard random lattices," Theory of Computing Systems, vol. 48, no. 3, pp. 535–553, 2011.
  27. M. Ajtai, "Generating hard instances of lattice problems (extended abstract)," in Proceedings of the 28th Annual ACM Symposium on the Theory of Computing (STOC '96), pp. 99–108, 1996.
  28. O. Goldreich, S. Goldwasser, and S. Halevi, "Collision-free hashing from lattice problems," in Studies in Complexity and Cryptography, vol. 6650 of Lecture Notes in Computer Science, pp. 30–39, Springer, Heidelberg, Germany, 2011.
  29. O. Goldreich, H. Krawczyk, and M. Luby, "On the existence of pseudorandom generators," SIAM Journal on Computing, vol. 22, no. 6, pp. 1163–1175, 1993.
  30. S. C. Ramanna and P. Sarkar, "On quantifying the resistance of concrete hash functions to generic multicollision attacks," IEEE Transactions on Information Theory, vol. 57, no. 7, pp. 4798–4816, 2011.
  31. C. Dwork, "Differential privacy," in Automata, Languages and Programming, vol. 4052 of Lecture Notes in Computer Science, pp. 1–12, 2006.