Privacy-Preserving Health Data Collection for Preschool Children
With the development of network technology, more and more data are transmitted over the network and privacy issues have become a research focus. In this paper, we study the privacy in health data collection of preschool children and present a new identity-based encryption protocol for privacy protection. The background of the protocol is as follows. A physical examination for preschool children is needed every year out of consideration for the children's health. After the examination, data are transmitted through the Internet to the education authorities for analysis. In the process of data collection, it is unnecessary for the education authorities to know the identities of the children. Based on this, we designed a privacy-preserving protocol, which delinks the children’s identities from the examination data. Thus, the privacy of the children is preserved during data collection. We present the protocol in detail and prove the correctness of the protocol.
1. Introduction
With computers and networks now an important tool in everyday life, more and more data need to be transmitted through networks. Meanwhile, privacy issues have drawn public attention. How to protect privacy in a network environment has become a research focus in the field of computer networking.
Privacy, broadly speaking, refers to private data held by organizations or individuals that are kept confidential from others. For individuals, information such as personal identification, physical condition, and geographical location is private. The spread of private information can cause many negative consequences and may even lead to crimes. Therefore, privacy-preserving technology has become an important research direction. At present, research on privacy-protection technology in networks includes at least the following areas.
Privacy Protection in Wireless Sensor Networks. Wireless sensor networks have broad application prospects in fields such as environmental monitoring, health care, and national defense. However, in practical applications, wireless sensor networks face a serious risk of data disclosure or tampering, which can lead to serious consequences [2–5]. For example, in the military field, data collected by wireless sensor networks often contain important intelligence which, if disclosed or tampered with, will pose a serious threat or lead to military missteps. Privacy-protecting technology is therefore an indispensable part of wireless sensor networks [6–10].
Privacy-Preserving Data Mining. Data mining is one of the most important knowledge discovery tools in today’s society. It can reveal the hidden rules behind large amounts of data. However, data sources used for data mining also contain a great deal of individual private information, business intelligence, or government secrets. If the data are used arbitrarily and without restraint in the data mining process, personal privacy and confidential information will be disclosed, and people’s daily lives and even social stability will be seriously affected [11–15]. Extracting potentially valuable knowledge from massive amounts of data while preserving privacy is a dilemma. The ideal solution is to transform the raw data so as to prevent direct and indirect access to private information, while the mining algorithms can still obtain from the transformed data almost the same information and knowledge as from the raw data [16–20].
Privacy Studies for Medical and Health Information. In the field of medicine, treatment and the handling of its results must respect patients’ privacy. With the application of information technology in the medical field, electronic medical records (EMRs) have become the main carrier of medical information. EMRs prevail in medical institutions because of their large storage capacity, resource savings, convenient querying, and easy sharing of information, which improve the efficiency of diagnosis and treatment. However, since EMRs contain a large amount of patients’ private information and are easy to copy and spread, privacy protection is of great significance in the field of medical and health information [22–24].
In addition to the above, as new network applications emerge, new privacy issues also need to be addressed. For example, in fields such as social networks, data publishing, cloud computing, and the Internet of Things, privacy has attracted attention [25–28]. In recent years, location privacy in mobile networks has also become a focus as location-based services develop [29, 30].
In this paper, we study the privacy in the health data collection for preschool children and present an identity-based encryption protocol to protect the identities of the children. The background of the protocol is as follows. A physical examination of preschool children is needed every year out of consideration for the children’s health. After examination, data need to be transmitted through the Internet to the education authorities for analysis. In the process of data collection, it is unnecessary for the education authorities to know the identities of the children. Based on this, we designed a privacy-preserving protocol, which delinks the children’s identities from the examination data. Thus, examination data can be transmitted over the network securely without the disclosure of the children’s identities.
The rest of this paper is organized as follows. In Section 2, we briefly review related works and discuss their relationship with our work. In Section 3, we describe the preliminaries and the cryptographic tools we use to build our protocol. In Section 4, we present the design of our protocol and analyze it. Finally, we conclude the paper in Section 5.
2. Related Works
With the application of information technology in the field of medicine and health, privacy issues have also begun to attract attention. At present, research on EMRs focuses on three areas: privacy protection of raw data, access control of EMRs, and privacy-protecting medical information systems [31–33].
Privacy Protection of Raw Data. Privacy protection of raw data means that techniques such as perturbation or anonymization are applied to the raw data to form a new data set before the data are provided to others. After transformation, the new data set maintains the same distribution characteristics as the raw data but no longer contains personal information, and therefore protects individual privacy. Most existing privacy-protecting technologies for raw data are based on anonymization. Anonymizing the raw data inevitably results in a loss of information. Therefore, research focuses on finding the tradeoff between data availability and privacy protection [34–37].
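The tradeoff between availability and privacy can be made concrete with k-anonymity, one common anonymization criterion. The following Python sketch is only an illustration: the records, the choice of quasi-identifiers (age and postal code), and the helper names `generalize` and `is_k_anonymous` are all made up for this example and are not taken from the cited works.

```python
from collections import Counter


def generalize(record, age_bucket=10, zip_digits=3):
    """Coarsen the quasi-identifiers: age becomes a range, postal code a prefix."""
    age, zip_code, diagnosis = record
    low = (age // age_bucket) * age_bucket
    age_range = f"{low}-{low + age_bucket - 1}"
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (age_range, masked_zip, diagnosis)


def is_k_anonymous(records, k):
    """True if every (age, postal code) combination occurs at least k times."""
    groups = Counter((age, zip_code) for age, zip_code, _ in records)
    return all(count >= k for count in groups.values())


# Hypothetical records: (age, postal code, diagnosis).
RAW = [
    (34, "21045", "asthma"),
    (36, "21047", "flu"),
    (38, "21049", "asthma"),
    (52, "21101", "flu"),
    (57, "21108", "diabetes"),
]
GENERALIZED = [generalize(record) for record in RAW]
```

With these made-up records, the raw table fails the check for k = 2 because every (age, postal code) pair is unique, while the generalized table passes. The price is that exact ages and postal codes are no longer available to the analyst, which is the information loss the paragraph above refers to.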
Access Control of EMRs. Based on centralized management of rights, access control technology is a defensive measure against unauthorized use of data. Its basic objective is to control users’ access rights to EMRs or medical information systems and thereby ensure that medical data are used only under authorization. Access control is an important measure for protecting electronic medical data in information systems; it determines who can access the system and how the data are used. Appropriate access controls can prevent unauthorized users from accessing data, whether accidentally or inadvertently; however, implementing access control is complex, and adjusting and managing rights is difficult [38–41].
Privacy-Protecting Medical Information System. In addition to the above privacy-protecting technologies, scholars have designed several privacy-protecting medical information systems. Song and Chung put forward a security framework for health privacy in an environmental service model; the system includes authentication, access control, and privacy-protecting services. Gardner and Xiong constructed an identity-conversion system to protect the health information of patients; the system uses the conditional random fields method to extract identity properties from unstructured data and converts identities by anonymization. Lin et al. proposed a privacy-protecting scheme for electronic health systems and proved by formal reasoning that the scheme can protect medical privacy and context information simultaneously. However, to date, there is no perfect system architecture for privacy protection.
On the whole, privacy-protecting technologies in the field of medicine and health have made considerable progress. However, the existing technologies still have some problems. For example, the security assumptions in the models are too strong to hold in real scenarios. In addition, existing privacy-protecting schemes are not general: each scheme addresses only a specific situation or a specific privacy issue. For the privacy problem in health data collection for preschool children considered in this paper, no existing scheme can be directly adopted. Thus, we design an identity-based encryption protocol for privacy protection.
3. Preliminaries
The identity-based encryption method is a kind of public key cryptosystem. In a public key cryptosystem, users need the public key of the other party to send encrypted messages or verify a digital signature. To ensure the legitimacy of the public key, a traditional public key cryptosystem adopts a public key infrastructure, in which a trusted party, called the certification authority (CA), is responsible for authenticating users and issuing their public key certificates. A public key certificate binds the identity of a user to its public key. In this kind of system, the CA is responsible for the generation, issuance, storage, maintenance, and revocation of users’ public key certificates, which requires a significant amount of computing and storage resources.
In 1984, the concept of identity-based encryption (IBE) was proposed by Shamir, which simplifies the management of public key certificates in traditional public key infrastructures. The IBE method directly adopts a user’s identity information as the public key. The private key is generated by a private key generator (PKG). Therefore, the communicating parties can use each other’s identities as public keys for encryption, without obtaining special public key certificates or authenticating identities. The IBE method no longer needs the support of the CA, avoiding the establishment and management of the public key infrastructure required by traditional public key cryptography. In 2000, Sakai, Ohgishi, and Kasahara suggested that bilinear maps on elliptic curves can be used to design identity-based cryptographic schemes. In 2001, Boneh and Franklin realized the first practical IBE scheme using bilinear maps on elliptic curves and proved that the scheme is resistant to chosen-ciphertext attacks in the random oracle model. Since then, bilinear maps on elliptic curves have gradually become the main tool for identity-based cryptographic schemes.
In applications, an IBE scheme is typically composed of four algorithms.
(a) Setup. Select a security parameter $k$, and get system parameters (params) and the master key. Params include a finite message space $\mathcal{M}$ and a finite ciphertext space $\mathcal{C}$, which are public. The master key is private to the PKG.
(b) Extract. Input params, the master key, and an identity $ID \in \{0,1\}^*$, and get the private key $d$. ID is an arbitrary string used as a public key; $d$ is the corresponding private key. The Extract algorithm is used to extract private keys from given public keys.
(c) Encrypt. Input params, ID, and $M \in \mathcal{M}$, and get the ciphertext $C \in \mathcal{C}$.
(d) Decrypt. Input params, $C \in \mathcal{C}$, and the private key $d$, and obtain $M \in \mathcal{M}$.
The above algorithms must be consistent. That is to say, for a given ID, when the private key $d$ is extracted by the Extract algorithm, Decrypt(params, $C$, $d$) = $M$, where $M \in \mathcal{M}$ and $C$ = Encrypt(params, ID, $M$).
Based on bilinear maps on elliptic curves, we design an IBE scheme that differs slightly from the Boneh-Franklin cryptosystem but is equivalent in terms of security. It consists of four algorithms as follows.
Initialization. Let $k$ be a security parameter and $q$ be a $k$-bit prime. Suppose $G_1$ and $G_2$ are two cyclic groups of prime order $q$ and $e: G_1 \times G_1 \to G_2$ is an admissible bilinear map with generator $g$ of group $G_1$. (See Boneh and Franklin for the definition of admissible bilinear maps.) Assume that identities are $n$-bit strings (where $n$ is polynomial in $k$). Consider a cryptographic hash function $H: \{0,1\}^n \to G_1$. The private key generator (PKG) chooses $s \in \mathbb{Z}_q^*$ uniformly at random and computes $g^s$. Here $s$ is the master private key, while all other parameters mentioned above are public.
Private Key Generation. For an identity ID, the private key is $d_{ID} = H(ID)^s$ and the public key is $Q_{ID} = H(ID)$.
Encryption. To encrypt $m \in G_2$ under identity ID, one can compute $C = (u, v) = (g^r, \; m \cdot e(H(ID), g^s)^r)$, where $r \in \mathbb{Z}_q$ is picked uniformly at random.
Decryption. Let $C = (u, v)$ be a valid ciphertext under identity ID. Then, $C$ can be decrypted as follows:
$$m = \frac{v}{e(d_{ID}, u)}.$$
The scheme is homomorphic. That is to say, the following equation is satisfied:
$$E_{ID}(m_1) \cdot E_{ID}(m_2) = E_{ID}(m_1 \cdot m_2),$$
where $E_{ID}(m)$ denotes an encryption of $m$ under identity ID and the product of the two ciphertexts is defined as taking the product of each component of the ciphertexts.
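To make the algebra of the four algorithms and the homomorphic property concrete, here is a toy Python model. It is a sketch under loud assumptions, not the authors' implementation. The pairing is simulated by storing each element of the source group as its discrete logarithm, which makes the map bilinear but completely insecure. The parameters `Q`, `P`, `GT` and all function names are hypothetical. The ciphertext layout (g^r, m · e(H(ID), g^s)^r) is one ElGamal-style layout consistent with the description, chosen for illustration.

```python
import hashlib
import secrets

# Toy parameters (assumptions, far too small to be secure): P = 2 * Q + 1.
Q = 1019   # prime order of both groups
P = 2039   # prime modulus; the target group G2 is the order-Q subgroup of Z_P^*
GT = 4     # generator of that subgroup, standing in for e(g, g)
G = 1      # generator g of G1, stored as its exponent


def pairing(a, b):
    """Toy bilinear map e(g^a, g^b) = GT^(a*b); G1 elements are stored as exponents."""
    return pow(GT, (a * b) % Q, P)


def hash_to_g1(identity):
    """H: map an identity string to a non-identity element of G1 (as an exponent)."""
    digest = hashlib.sha256(identity.encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1


def setup():
    """Return (master key s, public value g^s). In this toy model g^s reveals s."""
    s = secrets.randbelow(Q - 1) + 1
    return s, (G * s) % Q


def extract(s, identity):
    """Private key H(ID)^s; exponentiation in G1 is multiplication of exponents."""
    return (hash_to_g1(identity) * s) % Q


def encrypt(g_s, identity, m, r=None):
    """Ciphertext (g^r, m * e(H(ID), g^s)^r) for a message m in G2."""
    if r is None:
        r = secrets.randbelow(Q - 1) + 1
    mask = pow(pairing(hash_to_g1(identity), g_s), r, P)
    return (G * r) % Q, (m * mask) % P


def decrypt(d_id, ciphertext):
    """Recover m = v / e(d_ID, u), since e(H(ID)^s, g^r) = e(H(ID), g^s)^r."""
    u, v = ciphertext
    return (v * pow(pairing(d_id, u), -1, P)) % P


def multiply(c1, c2):
    """Component-wise product; in G1 the product of g^a and g^b is g^(a+b)."""
    return (c1[0] + c2[0]) % Q, (c1[1] * c2[1]) % P
```

In this model, decrypting `encrypt(g_s, ID, m)` with `extract(s, ID)` returns `m`, and decrypting `multiply(c1, c2)` returns the product of the two plaintexts modulo `P`. A real deployment would need a pairing library and cryptographically sized parameters.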
4. Protocol Design and Analysis
Suppose is the identity of the preschool child and is the identity of a volunteer helper (who could be one of the preschool children volunteering to contribute his computational resources). We assume the administrator does not collude with the volunteer helper (if there is a risk of collusion, we can extend this protocol by adding more helpers, which is straightforward).
Let $N$ be the total number of preschool children. Assume that, before the health data are transmitted, each preschool child $i$ has been assigned a unique number $\pi(i) \in \{1, \ldots, N\}$, such that no two preschool children have the same number, that is, for any $i \neq i'$, $\pi(i) \neq \pi(i')$.
The protocol includes two phases: a health data submission phase and a decryption phase.
In the th round of the health data submission phase, each preschool child first compares his own number with . If , then he submits. where is his health data and is picked uniformly at random. If , then he submits where is also picked uniformly at random.
Upon receiving the encryptions in the th round of the health data submission phase, the administrator computes
In the decryption phase, the administrator first forwards all to the helper, who computes and returns it to the administrator. Suppose . Then, the administrator computes
Theorem 1. The protocol is correct; that is, assuming all involved parties follow the protocol, then is a permutation of .
Proof. It is easy to see that, assuming all involved parties follow the protocol, Let be the value of such that . Hence, Since is a permutation on (), the result is a permutation of ().
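The two phases can be exercised end to end in a toy model. This sketch makes assumptions beyond the text and should not be read as the authors' exact construction. Each submission is assumed to be encrypted under the combined keys of the administrator and the helper, so that neither can decrypt alone. A child whose number does not match the round is assumed to submit an encryption of 1. The pairing is simulated insecurely by storing source-group elements as exponents. The identities "administrator" and "helper", the parameters, and the function names are hypothetical.

```python
import hashlib
import random

Q, P, GT = 1019, 2039, 4   # toy sizes: P = 2Q + 1, GT has order Q in Z_P^*


def pairing(a, b):
    """Toy bilinear map on exponents: e(g^a, g^b) = GT^(a*b). Insecure by design."""
    return pow(GT, (a * b) % Q, P)


def hash_to_g1(identity):
    digest = hashlib.sha256(identity.encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1


def encrypt(g_s, identities, m, rng):
    """Encrypt m under all identities at once: (g^r, m * prod_k e(H(ID_k), g^s)^r)."""
    r = rng.randrange(1, Q)
    y = 1
    for identity in identities:
        y = (y * pairing(hash_to_g1(identity), g_s)) % P
    return r, (m * pow(y, r, P)) % P   # first component is g^r stored as exponent r


def run_protocol(health_data, seed=0):
    rng = random.Random(seed)   # reproducible, NOT cryptographically secure
    n = len(health_data)
    s = rng.randrange(1, Q)     # master key; g^s is stored as the exponent s
    g_s = s
    ids = ("administrator", "helper")
    d_admin = (hash_to_g1(ids[0]) * s) % Q
    d_helper = (hash_to_g1(ids[1]) * s) % Q

    numbers = list(range(1, n + 1))
    rng.shuffle(numbers)        # numbers[i] is the unique number of child i

    # Submission phase: in round j only the child numbered j encrypts real data;
    # the administrator multiplies all ciphertexts of the round component-wise.
    aggregated = []
    for j in range(1, n + 1):
        u_total, v_total = 0, 1
        for i, m in enumerate(health_data):
            u, v = encrypt(g_s, ids, m if numbers[i] == j else 1, rng)
            u_total, v_total = (u_total + u) % Q, (v_total * v) % P
        aggregated.append((u_total, v_total))

    # Decryption phase: the helper returns its pairing values, and the
    # administrator combines them with its own to strip the mask.
    helper_shares = [pairing(d_helper, u) for u, _ in aggregated]
    results = []
    for (u, v), share in zip(aggregated, helper_shares):
        mask = (pairing(d_admin, u) * share) % P
        results.append((v * pow(mask, -1, P)) % P)
    return results
```

Here health data are encoded as elements of the order-`Q` subgroup, for example `pow(GT, x, P)`. The administrator recovers every child's value, ordered by the secret numbering rather than by child, which is how the data are delinked from identities in this sketch.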
5. Conclusion
With the development of network technology, more and more data need to be transmitted over the network, and the related privacy issues have become a hot research topic. We studied the privacy issue of health data transmitted over the network. For the sake of children’s health, a physical examination of preschool children is needed every year. After the examination, the data need to be transmitted to the education authorities over the Internet for health analysis. In the process of data collection, it is unnecessary for the education authorities to know the identities of the children. Therefore, we designed a privacy-preserving protocol for health data transmission, which delinks the children’s identities from the examination data. The protocol is composed of three algorithms: Setup, Encrypt, and Decrypt. Finally, we proved the correctness of the protocol.
References
R. Agrawal and R. Srikant, “Privacy-preserving data mining,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 29, no. 2, pp. 439–450, 2000.
C.-Y. Chow, M. F. Mokbel, and T. He, “A privacy-preserving location monitoring system for wireless sensor networks,” IEEE Transactions on Mobile Computing, vol. 10, pp. 94–107, 2011.
J. Jose, M. Princy, and J. Jose, “Integrity protecting and privacy preserving data aggregation protocols in wireless sensor networks: a survey,” International Journal of Computer Network and Information Security, vol. 5, article 66, 2013.
D. Bogdanov, R. Jagomägis, and S. Laur, “A universal toolkit for cryptographically secure privacy-preserving data mining,” in Intelligence and Security Informatics, Springer, 2012.
Y. Li, M. Chen, Q. Li, and W. Zhang, “Enabling multilevel trust in privacy preserving data mining,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, pp. 1598–1612, 2012.
J. Vaidya, Y. M. Zhu, and C. W. Clifton, Privacy Preserving Data Mining, Springer, 2006.
C. C. Aggarwal and P. S. Yu, A General Survey of Privacy-Preserving Data Mining Models and Algorithms, Springer, 2008.
L. Chen, J. J. Yang, and Q. Wang, “Privacy-preserving data publishing for free text Chinese electronic medical records,” in Proceedings of the IEEE 36th Annual Computer Software and Applications Conference (COMPSAC '12), pp. 567–572, 2012.
I. V. Goldberg, “Electronic medical records and patient privacy,” The Health Care Manager, vol. 18, no. 3, pp. 63–69, 2000.
Z. Hao, S. Zhong, and N. Yu, “A time-bound ticket-based mutual authentication scheme for cloud computing,” International Journal of Computers, Communications and Control, vol. 6, no. 2, pp. 227–235, 2011.
V. Oleshchuk, “Internet of things and privacy preserving technologies,” in Proceedings of the 1st International Conference on Wireless Communication, Vehicular Technology, Information Theory and Aerospace and Electronic Systems Technology (Wireless VITAE '09), pp. 336–340, May 2009.
B. Zhou, J. Pei, and W. Luk, “A brief survey on anonymization techniques for privacy preserving publishing of social network data,” ACM SIGKDD Explorations Newsletter, vol. 10, pp. 12–22, 2008.
E. Magkos, “Cryptographic approaches for privacy preservation in location-based services: a survey,” International Journal of Information Technologies and Systems Approach, vol. 4, pp. 48–69, 2011.
S. A. Buckovich, H. E. Rippen, and M. J. Rozen, “Driving toward guiding principles: a goal for privacy, confidentiality, and security of health information,” Journal of the American Medical Informatics Association, vol. 6, no. 2, pp. 122–133, 1999.
L. J. Damschroder, J. L. Pritts, M. A. Neblo, R. J. Kalarickal, J. W. Creswell, and R. A. Hayward, “Patients, privacy and trust: patients' willingness to allow researchers to access their medical records,” Social Science and Medicine, vol. 64, no. 1, pp. 223–235, 2007.
S. Zhong, Z. Yang, and R. N. Wright, “Privacy-enhancing k-anonymization of customer data,” in Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '05), pp. 139–147, ACM, June 2005.
G. Loukides, A. Gkoulalas-Divanis, and B. Malin, “Anonymization of electronic medical records for validating genome-wide association studies,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 17, pp. 7898–7903, 2010.
T.-S. Chen, C.-H. Liu, T.-L. Chen, C.-S. Chen, J.-G. Bau, and T.-C. Lin, “Secure dynamic access control scheme of PHR in cloud computing,” Journal of Medical Systems, vol. 36, pp. 4005–4020, 2012.
L. D. Martino, Q. Ni, D. Lin, and E. Bertino, “Multi-domain and privacy-aware role based access control in eHealth,” in Proceedings of the 2nd International Conference on Pervasive Computing Technologies for Healthcare 2008, PervasiveHealth, pp. 131–134, February 2008.
A. Shamir, “Identity-based cryptosystems and signature schemes,” in Advances in Cryptology, Springer, 1985.
D. Boneh and M. Franklin, “Identity-based encryption from the Weil pairing,” in Advances in Cryptology-CRYPTO 2001, Springer, 2001.
S. Chatterjee and P. Sarkar, Identity-Based Encryption, Springer, 2011.