Zhaowen Lin, Xinglin Xiao, Yi Sun, Yudong Zhang, Yan Ma, "A Privacy-Preserving Intelligent Medical Diagnosis System Based on Oblivious Keyword Search", Mathematical Problems in Engineering, vol. 2017, Article ID 8632183, 7 pages, 2017. https://doi.org/10.1155/2017/8632183
A Privacy-Preserving Intelligent Medical Diagnosis System Based on Oblivious Keyword Search
One concern people have is how to obtain a diagnosis online without jeopardizing their privacy. In this paper, we propose a privacy-preserving intelligent medical diagnosis system (IMDS) that addresses this problem efficiently. In IMDS, users submit their health examination parameters to the server in protected form; the submission process is based on the Paillier cryptosystem and reveals no information about their data. The server then retrieves the most likely disease (or several diseases) from the database and returns it to the user. For this search process we use oblivious keyword search (OKS) as the basic framework, so that the server retains its computational ability but learns no personal information from the users' data. In addition, this paper provides a preprocessing method for the data stored in the server, which makes our protocol more efficient.
1. Introduction
With increasing health consciousness, intelligent medical diagnosis systems have become popular around the world. Patients can now get personalized online medical services based on health data they submit themselves, without consulting a doctor. That is, patients can obtain a reasonable diagnosis anytime and anywhere, as long as they have health data and are willing to transfer them to the server. People are becoming more and more accustomed to the convenience that the IMDS makes possible. Moreover, this kind of intelligent medical service is inexpensive, so popularizing IMDS can reduce expenditure on public medical services. All of this suggests that IMDS has a bright future and will be an essential part of future life [1–3].
However, privacy disclosure has become a major obstacle to the development of IMDS. Patients are afraid that their sensitive personal information may fall into the wrong hands. In most cases, poor security mechanisms and weak safety awareness are the main reasons for information leakage. For example, in 2009, AvMed Health Plans, a large nonprofit US health plan organization, exposed the personal information of 200,000 subscribers and their dependents as a result of the theft of two company laptops that contained sensitive information [4]. The divulged personal information included names, addresses, phone numbers, social security numbers, and protected health information. For this reason, designing a privacy-preserving IMDS that ensures the privacy of patients is an urgent task. The security of the server cannot be ignored either. Various schemes have been proposed in this respect [5–7]. One example is [8], which designs a privacy-preserving recommendation system using homomorphic encryption. This system allows patients to rate physicians based on their satisfaction, so that other patients can choose a popular physician via these ratings. Another physician recommendation system, based on hybrid matrix factorization, is described in [9]. It applies text sentiment analysis to patient comments to increase the accuracy of physician grading.
In this paper, combining an additively homomorphic cryptosystem with oblivious keyword search, we propose a privacy-preserving IMDS that also ensures the security of the data stored in the server. After uploading their health examination data to the server, patients get an effective and reasonable diagnosis from our system. The diagnosis tells the patient which parameters are outside the normal range and which diseases he is likely to have. Moreover, our system contains a preprocessing phase that reduces the amount of computation required by the search.
The rest of the paper is organized as follows. Section 2 gives a brief overview of related works. In Section 3, we show the framework and notions. Section 4 provides the detailed description of our protocol. Section 5 analyzes the security of this system. In the end, Section 6 discusses the performance of this system and concludes the paper.
2. Related Works
2.1. Paillier Cryptosystem
In 1999, Paillier proposed a homomorphic public key cryptosystem, known as the Paillier cryptosystem, which is a common encryption scheme for data protection [10]. Its homomorphic property means that the sum of two plaintexts can be obtained by decrypting the product of their corresponding ciphertexts. In addition, the Paillier cryptosystem is semantically secure, which means that no information about a plaintext can be feasibly obtained from the corresponding ciphertext. Therefore, calculations over ciphertexts in the Paillier cryptosystem do not reveal any extra information. The Paillier cryptosystem is briefly described as follows.
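Let $m \in \mathbb{Z}_N$ be a message, let $p$ and $q$ be two large prime numbers, $N = pq$, let $g \in \mathbb{Z}_{N^2}^{*}$ be a random integer, and let $r \in \mathbb{Z}_N^{*}$ be a random number. The encryption of $m$ is defined as
$$c = E(m) = g^{m} r^{N} \bmod N^{2}. \tag{1}$$
The decryption of $c$ is defined as
$$m = D(c) = \frac{L(c^{\lambda} \bmod N^{2})}{L(g^{\lambda} \bmod N^{2})} \bmod N, \tag{2}$$
where $L(u) = (u - 1)/N$, $\lambda = \operatorname{lcm}(p - 1, q - 1)$, and $\gcd(L(g^{\lambda} \bmod N^{2}), N) = 1$.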
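The encryption and decryption rules above, together with the additive homomorphism, can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the primes 1009 and 1013 are far too small for real use, the choice g = N + 1 is one common valid choice rather than anything prescribed here, and the function names are hypothetical. It needs Python 3.8 or later for modular inverses via `pow`.

```python
import math
import secrets

def L(u: int, N: int) -> int:
    """Paillier's L function: L(u) = (u - 1) / N (the division is exact)."""
    return (u - 1) // N

def keygen(p: int, q: int):
    """Build a toy key pair from two primes p and q."""
    N = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = N + 1  # assumption: a common valid choice of g
    mu = pow(L(pow(g, lam, N * N), N), -1, N)  # inverse of the denominator
    return (N, g), (lam, mu)

def encrypt(pub, m: int) -> int:
    """c = g^m * r^N mod N^2 with a fresh random r coprime to N."""
    N, g = pub
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return pow(g, m, N * N) * pow(r, N, N * N) % (N * N)

def decrypt(pub, priv, c: int) -> int:
    """m = L(c^lam mod N^2) * mu mod N."""
    N, _ = pub
    lam, mu = priv
    return L(pow(c, lam, N * N), N) * mu % N

pub, priv = keygen(1009, 1013)  # toy primes, illustration only
N2 = pub[0] ** 2
c1, c2 = encrypt(pub, 120), encrypt(pub, 35)

# Multiplying ciphertexts adds plaintexts; exponentiation scales a plaintext.
total = decrypt(pub, priv, c1 * c2 % N2)
scaled = decrypt(pub, priv, pow(c1, 3, N2))
print(total, scaled)  # 155 360
```

Because a fresh r is drawn for every encryption, encrypting the same value twice gives different ciphertexts, which is what makes comparisons on ciphertexts alone uninformative.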
2.2. Oblivious Keyword Search (OKS)
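The oblivious keyword search (OKS) protocol was proposed by Wakaha Ogata and Kaoru Kurosawa in 2002; it is a secure keyword search scheme between two parties. In an OKS protocol, a server maintains some secret data, and a user is allowed to search for the data that contain a keyword of his choice; the chosen keyword remains secret from the server. Next, we introduce an efficient $k$-out-of-$n$ OKS protocol based on the RSA blind signature as follows.
In the protocol, the server stores $n$ data items $B_1, \dots, B_n$. Define
$$B_i = (w_i, c_i), \tag{3}$$
where $W$ is the set of keywords, $w_i \in W$, and $c_i$ is the corresponding content.
2.2.1. Submit Phase
The server generates a public key $(N_S, e)$ and a secret key $d$ of RSA and then, for $i = 1, \dots, n$, computes
$$K_i = H(w_i)^{d} \bmod N_S, \qquad E_i = G(w_i \,\|\, K_i) \oplus (0^{\ell} \,\|\, c_i), \tag{4}$$
where $H$ is a hash function, $G$ is a pseudorandom generator, and $0^{\ell}$ is a string of $\ell$ zero bits for a security parameter $\ell$. The server then sends $E_1, \dots, E_n$ to the user.
2.2.2. Transfer Phase
At each transfer round $j$ ($1 \le j \le k$), the user first chooses a keyword $w_j^{*} \in W$ and a random element $r$ to compute
$$Y = r^{e} H(w_j^{*}) \bmod N_S \tag{5}$$
and then sends $Y$ to the server. Secondly, the server computes $K' = Y^{d} \bmod N_S$ and sends it to the user. The user computes $K = K'/r = H(w_j^{*})^{d} \bmod N_S$. Finally, let $J = \emptyset$; $J$ is the set that stores the search results. For $i = 1, \dots, n$, the user computes
$$(a_i \,\|\, b_i) = E_i \oplus G(w_j^{*} \,\|\, K). \tag{6}$$
If $a_i = 0^{\ell}$, then add $b_i$ to $J$.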
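The two phases can be traced end to end with a short Python sketch of an RSA-blind-signature keyword search. Everything concrete in it is an assumption made for illustration: SHA-256 reduced modulo the RSA modulus stands in for the hash function, SHAKE-256 stands in for the pseudorandom generator, the zero marker is 16 bytes long, the modulus is built from two small Mersenne primes, and the keywords and contents are made up. It is a sketch of the idea, not a secure implementation.

```python
import hashlib
import math
import secrets

ELL = 16  # assumed byte length of the all-zero marker

def H(word: bytes, N: int) -> int:
    """Stand-in hash into Z_N (assumption: SHA-256 reduced mod N)."""
    return int.from_bytes(hashlib.sha256(word).digest(), "big") % N

def G(seed: bytes, length: int) -> bytes:
    """Stand-in pseudorandom generator (assumption: SHAKE-256)."""
    return hashlib.shake_256(seed).digest(length)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def seed(word: bytes, K: int, N: int) -> bytes:
    """Encode the concatenation of a keyword and its signature."""
    return word + b"|" + K.to_bytes((N.bit_length() + 7) // 8, "big")

def commit(data, N, d):
    """Server, submit phase: mask (zero marker || content) per item."""
    out = []
    for w, c in data:
        K = pow(H(w, N), d, N)  # RSA signature on the hashed keyword
        out.append(xor(G(seed(w, K, N), ELL + len(c)), bytes(ELL) + c))
    return out

def blind(word, N, e):
    """User: hide the hashed keyword behind a random factor r^e."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r, pow(r, e, N) * H(word, N) % N

def sign(Y, N, d):
    """Server: sign the blinded value without seeing the keyword."""
    return pow(Y, d, N)

def retrieve(word, r, K_blind, N, commitments):
    """User: unblind, then unmask every item and keep marker matches."""
    K = K_blind * pow(r, -1, N) % N
    hits = []
    for E in commitments:
        ab = xor(E, G(seed(word, K, N), len(E)))
        if ab[:ELL] == bytes(ELL):
            hits.append(ab[ELL:])
    return hits

# Toy RSA key from two Mersenne primes (illustration only, not secure).
p, q, e = 2**61 - 1, 2**89 - 1, 65537
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))

data = [(b"1:1", b"disease_A"), (b"3:2", b"disease_A"), (b"3:2", b"disease_B")]
commitments = commit(data, N, d)
r, Y = blind(b"3:2", N, e)
found = retrieve(b"3:2", r, sign(Y, N, d), N, commitments)
print(found)  # [b'disease_A', b'disease_B']
```

Unblinding works because the signed value equals r times the signature of the hashed keyword, so dividing by r leaves exactly the key that the server used when masking the matching items.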
2.3. Useful Tools
We introduce here two generic subprotocols from the literature [8, 11–13], which will be employed in our protocol. Both can be implemented using the Scalar Product Protocol, which is a standard protocol. Let $[x]$ denote an encrypted value of $x$. The subprotocols are described as follows:
(1) Bits: given $[x]$, return $[x_0], \dots, [x_{t-1}]$, the encryptions of the $t$ least significant bits of the plaintext of $[x]$; that is, $x \bmod 2^{t} = \sum_{i=0}^{t-1} x_i 2^{i}$;
(2) Min: given two encrypted values in this bitwise form, return an encrypted bit $[b]$, where $b = 1$ iff the first value is smaller than the second.
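To make the input and output behaviour of the two subprotocols concrete, the following Python sketch computes the same functions on plaintext values. It is only a reference for what the results mean: the real subprotocols operate on ciphertexts, and the names `bits` and `min_bit`, as well as the choice of a strict comparison, are assumptions of this sketch.

```python
def bits(x: int, t: int) -> list:
    """Return the t least significant bits of x, lowest bit first."""
    return [(x >> i) & 1 for i in range(t)]

def min_bit(x_bits: list, y_bits: list) -> int:
    """Return 1 iff x < y, using only bitwise steps on the two bit lists.

    The scan runs from the most significant bit down. `eq` stays 1 while
    all higher bits agree; `lt` becomes 1 at the first position where
    x has a 0 and y has a 1.
    """
    lt, eq = 0, 1
    for xi, yi in zip(reversed(x_bits), reversed(y_bits)):
        lt = lt | (eq & (1 - xi) & yi)
        eq = eq & (1 - (xi ^ yi))
    return lt

print(bits(13, 4))                       # [1, 0, 1, 1]
print(min_bit(bits(5, 4), bits(9, 4)))   # 1
print(min_bit(bits(9, 4), bits(5, 4)))   # 0
```

The comparison uses only AND, OR, XOR, and complement on single bits, which is the kind of step that can be carried over to encrypted bits with the help of a multiplication subprotocol.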
3. The Framework
In this section we develop the framework of our privacy-preserving IMDS. We begin by describing the two entities that interact with each other: the user and the server.
On the user side, each user measures a number of private health examination parameters of his own body in advance, such as blood glucose, vital capacity, and vitamin content, and uploads those health data to the server to obtain the health service. Each measured parameter is associated with its corresponding measured value.
On the server side, the server stores a wide range of diseases and generally more health examination parameters than a user submits. For each disease, we define a parameter vector, which represents the relationship between the disease and all the parameters stored in the server. If the $i$th entry of the parameter vector of a disease is 1, then the measured value of the $i$th parameter is smaller than the normal range when the disease occurs. Similarly, 2 means that the value is larger than the normal range, and 0 means that it is in the normal range. The vector therefore has one entry for each parameter stored in the server. Each parameter has an upper limit and a lower limit that together represent its normal range. In the server, each disease and its parameter vector are stored together as one record.
Next we introduce our framework. It contains three phases: the submit phase, the preprocessing phase, and the search phase. The submit phase builds on the Paillier cryptosystem. The user first uploads his health examination data to the server in encrypted form, using a key generated by the Paillier cryptosystem. At the same time, the user directly tells the server which parameters he has uploaded. Then, the server compares every uploaded value with the upper and lower limits of the corresponding parameter to obtain the parameter vector of the user, which represents whether each health examination parameter of the user is in the normal range. The generation procedure of this vector is depicted in Figure 1. The comparison is carried out by the functions Bits and Min introduced in Section 2. During this comparison, the server has no knowledge of the uploaded data but retains its computational ability. Finally, the server returns the vector to the user in encrypted form.
To facilitate the subsequent search operation and make it faster, we propose a transform method in the preprocessing phase. The server uses this method to split and reshape all disease parameter vectors into the keyword set WP and reorganizes the disease records into new data structures, which are pairs of a keyword and a disease. After that, to prepare for the oblivious keyword search (OKS) used in the search phase, the server computes an OKS ciphertext for each pair and sends these ciphertexts to the user. Each ciphertext can be understood as the encrypted form of its pair.
In the search phase, we utilize the oblivious keyword search (OKS) to realize a privacy-preserving search. Firstly, the user constructs a set of keywords on the basis of his parameter vector, successively computes a blinded value for each keyword, and sends these values to the server. A blinded value can be understood as the encrypted form of a keyword. Then, the server processes each blinded value using its secret key and returns the result to the user. Next, the user retrieves the keywords constructed by the server from the received ciphertexts via (6) and thus gets their corresponding diseases. Finally, according to the search results, the disease with the highest frequency is taken as the diagnosis for the user. The whole structure of our IMDS is depicted in Figure 2.
4. Our Protocol
In this section, we detail the implementation of our protocol.
4.1. Submit Phase
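The submit process is built on the Paillier cryptosystem, which is an additively homomorphic cryptosystem. First, the user generates a secret key and uses public key encryption to distribute the corresponding key information to the server. Then, for every uploaded parameter, the user sends the encrypted bits of its measured value to the server using Bits. Similarly, the server executes Bits to get the encrypted bits of the upper limit and the lower limit. Next, Min helps the server to compare the measured value with the upper and lower limits. The comparison results are transferred to the user after being encrypted. According to these results, the user constructs his parameter vector. We define the parameter vector of the user as follows: the entry for a parameter is 1 if its measured value is below the lower limit, 2 if it is above the upper limit, and 0 if it lies in the normal range.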
The algorithm is shown in Algorithm 1.
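As a plaintext reference for the encoding that the submit phase produces, the following Python sketch derives a user parameter vector from measured values and normal ranges. In the protocol itself the comparisons are carried out on encrypted data; here they are done in the clear purely to show the 0/1/2 encoding. The parameter names and ranges are made-up placeholders, not clinical values, and the function names are hypothetical.

```python
def classify(value: float, lower: float, upper: float) -> int:
    """Encode one parameter: 1 = below range, 2 = above range, 0 = normal."""
    if value < lower:
        return 1
    if value > upper:
        return 2
    return 0

def user_vector(measured: dict, normal_ranges: dict) -> list:
    """Build the parameter vector in the order the values were measured."""
    return [classify(measured[name], *normal_ranges[name]) for name in measured]

# Hypothetical parameters with (lower limit, upper limit) pairs.
normal_ranges = {"p1": (40, 60), "p2": (10, 20), "p3": (100, 140)}
measured = {"p1": 35, "p2": 15, "p3": 150}

vector = user_vector(measured, normal_ranges)
print(vector)  # [1, 0, 2]
```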
4.2. Preprocessing Phase
To make the subsequent search process faster, the data stored in the server need to be reorganized into a new structure. Let WP be the set of keywords. A keyword is obtained by concatenating the mark number of a parameter with its value in the parameter vector of a disease. Note that only abnormal parameters, that is, those whose value is not 0, are selected to create keywords. The new data structure is a pair that consists of a keyword from WP and one of the diseases whose parameter vector contains this keyword. The specific transformation method is described in Algorithm 2.
Next, the server generates a public key and a secret key of RSA and publishes only the public key. With the secret key and the hash value of each keyword, the server computes a key for the keyword and the ciphertext of the corresponding keyword-disease pair, as in the submit phase of Section 2.2.1. Finally, the server outputs the ciphertexts of all pairs to the user. This computation relies on a security parameter, a pseudorandom generator, and a hash function. The computational process is also described in Algorithm 2.
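The reorganization step can be sketched in a few lines of Python. The sketch covers only the plaintext transformation from disease parameter vectors to keyword-disease pairs, not the RSA-based encryption that follows it. The `index:value` keyword format with a colon separator is an assumption added here so that the concatenation stays unambiguous, and the disease names and vectors are placeholders.

```python
def build_pairs(disease_vectors: dict) -> list:
    """Turn each abnormal entry of each disease vector into a pair.

    A keyword joins the 1-based mark number of a parameter with its
    value (1 = below range, 2 = above range). Entries equal to 0 are
    normal and produce no keyword.
    """
    pairs = []
    for disease, vector in disease_vectors.items():
        for index, value in enumerate(vector, start=1):
            if value != 0:
                pairs.append((f"{index}:{value}", disease))
    return pairs

# Hypothetical parameter vectors over three stored parameters.
disease_vectors = {
    "disease_A": [1, 0, 2],
    "disease_B": [0, 0, 2],
}

pairs = build_pairs(disease_vectors)
print(pairs)
# [('1:1', 'disease_A'), ('3:2', 'disease_A'), ('3:2', 'disease_B')]
```

Several diseases may share a keyword, as `3:2` does here, which is why the structure is a list of pairs rather than a one-to-one mapping.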
4.3. Transfer Phase
In the search phase, the user successively finds the abnormal parameters by checking whether their values in his parameter vector equal 0 and constructs keywords for them. For example, if the $i$th entry of the vector equals 0, then the $i$th parameter is a normal parameter for the user; otherwise it is an abnormal parameter. At each transfer round, after choosing an abnormal parameter, the user creates a keyword by concatenating the mark number of the chosen parameter with its value in his parameter vector. Then, the user computes the blinded value of this keyword, which acts as its encryption, and sends it to the server. The server returns the search result. By decrypting this result, the user retrieves the keywords constructed by the server from the received ciphertexts via (6) and thus gets their corresponding diseases. These diseases are kept in a list of suspected diseases across the transfer rounds. When all abnormal parameters have been traversed, we compute the frequency of occurrence of each disease in this list and output the diseases with the highest frequency as the final search result. The algorithm is described in Algorithm 3.
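The final ranking step can be illustrated with a plaintext Python sketch. A plain list lookup stands in for the oblivious retrieval, so the sketch shows only how keywords are formed from the user's vector and how the most frequent diseases are selected. The `index:value` keyword format, the assumption that user and server number the parameters identically, and all names and data are hypothetical.

```python
from collections import Counter

def diagnose(user_vector: list, pairs: list) -> list:
    """Return the diseases that match the most abnormal parameters."""
    suspected = []
    for index, value in enumerate(user_vector, start=1):
        if value == 0:
            continue  # normal parameter, no keyword is built
        keyword = f"{index}:{value}"
        # Stand-in for one oblivious search round on this keyword.
        suspected.extend(d for k, d in pairs if k == keyword)
    if not suspected:
        return []
    counts = Counter(suspected)
    top = max(counts.values())
    return sorted(d for d, c in counts.items() if c == top)

# Hypothetical keyword-disease pairs held by the server.
pairs = [("1:1", "disease_A"), ("3:2", "disease_A"), ("3:2", "disease_B")]

print(diagnose([1, 0, 2], pairs))  # ['disease_A']
print(diagnose([0, 0, 2], pairs))  # ['disease_A', 'disease_B']
```

The second call shows the case where the submitted data do not single out one disease, so several candidates with the same frequency are returned.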
5. Safety Analysis
We present here the analysis of the security of our system.
Lemma 1. The problem of computing $N$th residue classes is believed to be computationally difficult.
Lemma 2. RSA known target inversion problem (RSA-KTI) is hard.
Firstly, we discuss the security of the user. In the submit phase, the health examination data of the patient are the data that the attacker wants to steal. However, the only message that the attacker may obtain is the ciphertext, which can be decrypted only with the private key of the user. This ciphertext is produced by the Paillier cryptosystem, which is the basis of this phase. If the attacker wants to get the plaintext without knowing the private key, his task is, given a composite $N$ and an integer $z$, to decide whether $z$ is an $N$-residue modulo $N^{2}$ or not. By Lemma 1, this task is hard. Therefore, the attacker cannot get the plaintext. In the search phase, the important private data of the user are the keywords, but the server has no information on them because they are blinded in the RSA blind signature scheme.
Next, we discuss the security of the server, assuming that the attacker is allowed to make at most a bounded number of queries to the server. At first, the attacker behaves as if it were the user. The server generates the ciphertexts of its data and sends them to the attacker. After that, the attacker randomly chooses values and sends them to the server as queries. From Lemma 2, it is hard for the attacker to get the plaintext.
6. Conclusion
In this paper we propose a privacy-preserving intelligent medical diagnosis system and discuss how privacy-preserving protocols can be used to protect sensitive patient data in medical scenarios. The system applies two cryptographic building blocks, the Paillier cryptosystem and oblivious keyword search (OKS), to medical diagnosis, and it can be put into practice. In addition, our system has the following advantages:
(1) The information security requirements mentioned previously are achieved. That is, the privacy of patient data and the security of the server are both protected. The server is blind to the personal information of patients, and patients learn nothing about the data maintained in the server other than the results of their own searches.
(2) Our system reduces the amount of computation in the search by adding the preprocessing phase. This phase links each keyword with the corresponding disease name. Hence, the disease name is obtained directly once the keyword is retrieved, without a further lookup in the database. In the search phase, the system only needs to retrieve the submitted parameters instead of every parameter in the database, which also makes the search phase more efficient.
(3) The system is able to return multiple possible diseases instead of a single result. When the submitted data are not enough to determine a single disease, the system shows patients several results for reference.
Therefore, our privacy-preserving intelligent medical diagnosis system is capable of providing efficient and reasonable diagnoses for patients. We view this work as the starting point of our follow-up work, and much remains to be done. In the future, we will focus on multiuser and multiserver IMDS, since it is necessary to develop a system that can provide service to many users simultaneously and securely.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work is supported by the National High Technology Research and Development Program of China (863 Program) (Grant no. 2013AA014702), the Fundamental Research Funds for the Central Universities (BUPT2016RC48, Grant 2014ZD03-03), and the National Natural Science Foundation of China (Grant no. 61601041).
References
- M. H. Tekieh and B. Raahemi, “Importance of data mining in healthcare: a survey,” in Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM '15), pp. 1057–1062, Paris, France, August 2015.
- L. Xu, C. Jiang, J. Wang, J. Yuan, and Y. Ren, “Information security in big data: privacy and data mining,” IEEE Access, vol. 2, pp. 1151–1178, 2014.
- V. S. Verykios, E. Bertino, I. N. Fovino, L. P. Provenza, Y. Saygin, and Y. Theodoridis, “State-of-the-art in privacy preserving data mining,” SIGMOD Record, vol. 33, no. 1, pp. 50–57, 2004.
- A. Moscaritolo, “Sensitive laptops stolen from Fla. Health insurance provider,” http://www.scmagazine.com//sensitive-laptops-stolen-from-fla-health-insurance-provider/article/163618/.
- M. Adjedj, J. Bringer, H. Chabanne, and B. Kindarji, “Biometric identification over encrypted data made feasible,” in International Conference on Information Systems Security, Lecture Notes in Computer Science, pp. 86–100, 2009.
- S. Katzenbeisser and M. Petkovic, “Privacy-preserving recommendation systems for consumer healthcare services,” in Proceedings of the 3rd International Conference on Availability, Security, and Reliability (ARES '08), pp. 889–895, Barcelona, Spain, March 2008.
- I. Song and N. V. Marsh, “Anonymous indexing of health conditions for a similarity measure,” IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 4, pp. 737–744, 2012.
- T. R. Hoens, M. Blanton, A. Steele, and N. V. Chawla, “Reliable medical recommendation systems with patient privacy,” ACM Transactions on Intelligent Systems and Technology, vol. 4, no. 4, article 67, 2013.
- Y. Zhang, M. Chen, D. Huang, D. Wu, and Y. Li, “IDoctor: personalized and professionalized medical recommendations based on hybrid matrix factorization,” Future Generation Computer Systems, vol. 66, pp. 30–35, 2017.
- P. Paillier, “Public-key cryptosystems based on composite degree residuosity classes,” in Advances in Cryptology - EUROCRYPT '99, vol. 1592 of Lecture Notes in Computer Science, pp. 223–238, Springer, Berlin, Germany, 1999.
- B. Schoenmakers and P. Tuyls, “Efficient binary conversion for Paillier encrypted values,” in Proceedings of Advances in Cryptology - EUROCRYPT 2006, International Conference on the Theory and Applications of Cryptographic Techniques, pp. 522–537, St. Petersburg, Russia, 2006.
- P. Bunn and R. Ostrovsky, “Secure two-party k-means clustering,” in Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS '07), pp. 486–497, Alexandria, Va, USA, November 2007.
- T. R. Hoens, M. Blanton, and N. V. Chawla, “A private and reliable recommendation system for social networks,” in Proceedings of the 2nd IEEE International Conference on Social Computing (SocialCom '10), pp. 816–825, Minneapolis, Minn, USA, August 2010.
Copyright © 2017 Zhaowen Lin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.