Abstract

As COVID-19 continues to spread around the world, the healthcare industry has accelerated the transformation to digital healthcare services. In the era of big data, many hospitals prefer to use remote cloud servers to store and manage massive electronic medical data. However, cloud-assisted medical data systems cannot guarantee the confidentiality, integrity, and availability of data. Searchable encryption can effectively address the above challenges by enabling data search on the ciphertext, which achieves the availability of medical data while ensuring data security and privacy. However, the search server may return mismatched search results due to economic interests or single points of failure. Blockchain is a decentralized computing paradigm with public verifiability, which provides an efficient solution to this problem. However, the existing blockchain-based searchable encryption solutions do not consider the flexible search function of multiple users and the restriction of encrypted data for medical scenarios. Therefore, we propose a blockchain-based multiuser normalized searchable encryption (BNSE) scheme and design a blockchain-based normalized searchable encryption system for medical data (BNSEM) based on the scheme. To verify the practicality of the system, we evaluate the performance from both theoretical and experimental aspects.

1. Introduction

With the rapid spread of COVID-19 around the world, the healthcare industry has accelerated the shift to digital healthcare services [1, 2]. In the era of big data, many hospitals prefer to use remote cloud servers to store and manage huge amounts of electronic medical data. However, due to the inherent properties of the cloud such as centralization and openness, cloud-assisted medical data systems will face new privacy and security challenges [3]. Firstly, medical data outsourced to cloud servers may be accessed or tampered with by unauthorized users. The confidentiality and integrity of the medical data cannot be guaranteed. Secondly, centralized cloud servers may suffer from a single point of failure, which will result in the unavailability of medical data. Although the traditional encryption technology can ensure security, it is difficult to take into account the availability of outsourced medical data.

Searchable encryption (SE) is a critical cryptographic technique to achieve the availability of data while ensuring security and privacy, which enables users to search ciphertext data [4]. In a searchable encryption scheme, the data owner uploads the encrypted data to the cloud server. Then, the user needs to construct a trapdoor and submit it to the cloud server to search for data containing the target keywords. In most cases, the server is regarded as a semihonest third-party entity [5, 6]; i.e., the server will perform the search operation correctly according to the protocol. However, in practical scenarios, the server may return mismatched search results due to economic interest or single point of failure.

Blockchain is a decentralized computing paradigm with public verifiability and tamper-proof features [7]. Applying blockchain to searchable encryption can effectively solve the problem of untrustworthy search results from centralized servers [8–10]. Smart contracts deployed on the blockchain can perform search functions instead of third-party servers and automatically execute search protocols based on trigger conditions to produce correct results. In addition, blockchain nodes record transaction results in an immutable ledger, which guarantees the integrity of the results and eliminates the need for further validation. Even if one or more nodes fail or are corrupted by malicious adversaries, the correctness of the results will not be affected, owing to the fault tolerance of the blockchain.

In the blockchain-based searchable system, data and search structure are stored in a ciphertext state, and thus, the legality of the data in the system cannot be guaranteed. In the medical scenario, the necessary supervision of encrypted medical data is needed to ensure the legality of medical data. For example, supervisors need to filter the search requests that contain illegal keywords to prevent the spread of false medical information. In addition, supervisors are supposed to check the legality of ciphertext data in the remote cloud when they suspect illegal data or when a user files a complaint with the supervisors. The controllability of medical data is key to maintaining a stable healthcare system, yet there is a lack of research related to the supervision of ciphertext data.

1.1. Motivation and Contributions

Inspired by the work [11], a blockchain-based searchable public key encryption with forward and backward privacy can be used to design a searchable encryption system for medical data. In medical scenarios, data sharing requires support for multiuser search and for dynamic updates of the set of authorized users. In addition, to ensure the stability of the medical searchable encryption system, the ciphertext data in the system need to be legally supervised. As for the blockchain platform, the consortium chain Hyperledger Fabric is a good option considering practicality and privacy. Based on the above analysis and practical application scenarios of medical data, we propose a blockchain-based multiuser searchable encryption scheme supporting supervision and design a searchable encryption system for medical data based on the scheme. The main contributions of this study are as follows:
(i) We propose a blockchain-based multiuser normalized searchable encryption (BNSE) scheme, which achieves efficient retrieval of ciphertext data in multiuser scenarios and supports the supervision of the ciphertext data.
(ii) We design a blockchain-based normalized searchable encryption system for medical data (BNSEM) based on the above scheme, which realizes retrieval over encrypted medical data.
(iii) We evaluate the theoretical performance of the scheme and test the practical performance of the system to verify its practicality.

1.2. Organization

In Section 2, we review the existing research work related to the security and functionality of searchable encryption. In Section 3, we introduce the blockchain technology and broadcast encryption, as well as the security definition. In Section 4, we describe the specific construction of our proposed scheme BNSE and prove its security. In Section 5, we present the design of our system BNSEM. In Section 6, we provide the security analysis of BNSEM. In Section 7, we conduct a performance evaluation of the system. Section 8 concludes this study.

In 2000, Song et al. [4] proposed the first SE scheme, which is a noninteractive single-keyword search scheme. The drawback of the scheme is that it is extremely inefficient when the number of documents is large. However, this pioneering work still greatly contributed to the research and development of searchable encryption. Later, many works [12, 13] focus on designing efficient security mechanisms to enhance the security. Meanwhile, some works also introduce searchable encryption schemes for functional extensions, including the multiuser SE scheme [14] and the dynamic SE scheme [15].

To balance security and efficiency, a practical SE scheme will leak some information to the adversary. However, the information leakage attack undermines the security of SE schemes [16]. Adaptive leakage exploit attacks have brought more attention to forward privacy [17]. Song et al. [12] proposed two schemes FAST and FASTIO, both of which have forward privacy. In addition, Bost et al. [6] presented a formal definition of backward privacy, and backward privacy gradually became a major security property of interest. Chamani et al. [18] proposed improvements in various aspects of performance to the work [6].

To avoid the problem of key management and distribution restrictions prevalent in symmetric searchable encryption (SSE) schemes, Boneh et al. [19] proposed the first searchable public key encryption (SPE) scheme, which is a noninteractive single-user search scheme. However, a significant limitation of SPE is that it contains a large number of time-consuming operations, such as bilinear pairs and exponential operations. In 2020, Chen et al. [11] proposed a lightweight SPE scheme with search performance close to the efficient SSE. However, the scheme does not implement multiuser search and cannot share data in multiuser scenarios.

From a functional point of view, most of the current research efforts focus on symmetric searchable encryption schemes that support only single-user search mode; i.e., the data user is the data owner. The few SSE schemes that support multiuser search also require the owner to calculate a search trapdoor online [20]. Multiuser searchable encryption (MUSE) [21] is an important research direction in SE with practical significance. In MUSE, a data owner uploads data to a cloud server and wants to share the data with multiple users. Attribute-based searchable encryption (ABSE) [22] can manage the retrieval of ciphertext data in a multiuser scenario, but it is computationally inefficient and lacks practicality.

Broadcast encryption (BE) [23] enables multiuser data sharing and is suitable for scenarios where data users are relatively fixed. Liu et al. [24] designed a multiuser searchable encryption scheme based on a single-user system prototype and inherited the functionality of adding, modifying, and deleting documents from the original dynamic scheme. However, scheme [24] requires online search trapdoor generation and multiple rounds of client-server interaction, which increases the communication overhead. Later, Liu et al. [25] combined public key authenticated encryption supporting keyword search with BE and proposed a broadcast authenticated encryption primitive supporting keyword search (BAEKS). However, the scheme reaches a performance bottleneck once the number of users grows beyond a certain point.

In most existing schemes, the search server is regarded as an honest third party that performs the prescribed search protocol [26]. However, the search server may be a malicious third party that returns partial or even mismatched search results due to profit or random failures. The main reason for these problems is that centralized servers have complete control over the data and execute the protocols independently without supervision. In view of this, blockchain technology [7], a decentralized computing paradigm with public verifiability and immutability, combined with searchable encryption [14] can effectively solve the problem of untrustworthy third-party search results.

There are two ways to combine blockchain with searchable encryption, one of which is to use the blockchain for storing credentials and the other is to use the blockchain’s smart contract to perform the search function. The first approach still follows the traditional server-side search by storing the transaction credentials on the blockchain [27]. Cai et al. [8] designed a dynamic and efficient searchable encryption scheme using blockchain. Tang [28] extends searchable encryption by saving essential messages on the blockchain and the scheme performs only a small number of operations on the blockchain, thus reducing the burden on the blockchain. When there are disputes and controversies, the misconduct of participants can be revealed through transactions on the blockchain. However, using the blockchain to store credentials still does not prevent the malicious behavior of servers.

Therefore, researchers have also proposed an alternative construction method: designing smart contracts that include search functions and perform keyword search operations in place of cloud servers [14]. Chen et al. [29] used electronic health record (EHR) file indexes to construct complex logical structures and store them on a blockchain so that data users can search the file indexes using these logical expressions. Hu et al. [20] enabled users to search private databases in a blockchain environment and implemented dynamic access control for searches. However, all of the above schemes outsource complex operations or encrypted data to the blockchain, which greatly degrades the performance of the system. Chen et al. [11] designed a blockchain-based searchable public key encryption scheme with only lightweight hash operations.

3. Preliminaries

In this section, we introduce the blockchain technology, broadcast encryption, system model, security definition, and design goals.

3.1. Blockchain Technology

In 2008, blockchain technology received widespread attention following the publication of the Bitcoin white paper [7]. Blockchain provides a distributed, immutable, secure, transparent, and auditable ledger. The blocks in a blockchain store transactions at a specific time, and their hash values are recorded by a Merkle tree. The transaction data on the blockchain are shared in a P2P (peer-to-peer) network, and the security of the transaction data is ensured by cryptographic primitives (Merkle tree, asymmetric encryption, and digital signatures).
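As a minimal illustration of how a Merkle tree commits to the transactions of a block, the sketch below folds a list of serialized transactions into one root hash. The function names are hypothetical, and duplicating the last node on odd-sized levels is an assumption borrowed from Bitcoin-style trees, not something prescribed by the text above.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list[bytes]) -> bytes:
    """Fold serialized transactions into a single root hash."""
    if not transactions:
        raise ValueError("a block needs at least one transaction")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last node when a level has odd size.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


root = merkle_root([b"tx-1", b"tx-2", b"tx-3"])
tampered_root = merkle_root([b"tx-1", b"tx-X", b"tx-3"])
```

Changing any single transaction changes the root, which is why storing only the root in the block header is enough to detect tampering with the block body.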

Since blockchain operates on a P2P network, a P2P network including a number of blockchain nodes (peer nodes, orderer nodes, etc.) needs to be created before deploying a blockchain platform. Each node provides two keys that can be used for encryption and signature. When a transaction is initiated, one node signs the transaction and broadcasts it to other peer nodes. When another node receives the signed transaction, it needs to verify the validity of the transaction before broadcasting it. The peer nodes (also known as miners) collect enough signatures of this transaction to pack it into a block and store it on the blockchain after passing consensus.

Smart contract: a smart contract contains a set of rules and logic and is decentralized, information-sharable program code deployed on the blockchain. The parties involved in signing a contract agree on the content of the smart contract and deploy it on the blockchain, which can automate the execution of the contract without relying on any third-party authority [30]. Smart contracts run automatically once started, without the intervention of any contract signatory.

3.2. Broadcast Encryption

A public key broadcast encryption scheme consists of four algorithms, namely system setup, key generation, encryption, and decryption, defined as follows:
(i) System setup: with the security parameter as input, the maximum capacity of the broadcast receiver group and the initial encryption key list are output.
(ii) Key generation: with the security parameter and the encryption key list as input, the user’s public-private key pair is output and the public key is added to the key list.
(iii) Encryption: the algorithm takes a subset of users, the encryption key list, and a plaintext message to be broadcast as input. The public keys corresponding to the users in the subset are selected from the encryption key list, and the broadcast ciphertext of the message encrypted under this key set is output. Note that the broadcast ciphertext can only be correctly decrypted by the receivers in the subset.
(iv) Decryption: taking a user’s private key and the broadcast ciphertext as input, if the user holding the private key belongs to the subset of receivers, then the user can use the private key to decrypt the broadcast ciphertext and output the broadcast message.

3.3. System Model

The system model of our BNSE scheme is shown in Figure 1. It consists of six entities: trusted institute (TI), cloud server (CS), blockchain (BC), data owner (DO), data user (DU), and supervisor (SUP). Although the TI and SUP are not involved in the main process of data search, they are still two indispensable entities that play an important role in the execution of the scheme and the maintenance of the ecosystem. Before the scheme runs, TI generates the parameters required for system initialization and issues public key certificates for users who join the system; TI is offline the rest of the time.

After the initialization is completed, the scheme performs five main steps, which are described as follows:
(1) Encrypt File. The DO first encrypts the data file using a symmetric encryption algorithm and then encrypts the symmetric key using a public key cryptography algorithm. Finally, DO uploads the ciphertext to CS.
(2) Generate Searchable Encrypted Data Structures. The DO extracts keyword-index pairs from files and generates searchable encrypted data structures. Then, DO uploads the structures to BC.
(3) Search for the Files that Contain the Target Keyword. DU generates a search trapdoor containing the target keyword and then sends the search request containing the trapdoor to a nearby blockchain node. The search request triggers the search process of the smart contract, which then returns the indexes of all matching encrypted files.
(4) Access the Files. The DU first decrypts the encrypted file indexes returned by the smart contract in step 3 and then accesses the data in the CS after getting the plaintext indexes.
(5) Return the Encrypted Data. Based on the file indexes submitted by DU, CS returns the corresponding files.
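The data flow of the five steps can be sketched with in-memory stand-ins for CS and BC. Everything below is an assumption made for illustration: the keys are hypothetical and assumed to be already shared between DO and the authorized DUs, the SHA-256 keystream is a toy stand-in for a real symmetric cipher, an HMAC stands in for the trapdoor, and the sample records are invented. It shows only who stores and returns what, not the actual cryptography of the scheme.

```python
import hashlib
import hmac


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 counter-mode keystream (toy stand-in for a real
    symmetric cipher); applying it twice with the same key decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def make_trapdoor(search_key: bytes, keyword: str) -> bytes:
    """Deterministic keyword token; HMAC is an assumption, not the scheme's trapdoor."""
    return hmac.new(search_key, keyword.encode(), hashlib.sha256).digest()


cloud_server: dict[str, bytes] = {}      # CS: file index -> encrypted file
ledger: dict[bytes, list[bytes]] = {}    # BC: trapdoor -> encrypted file indexes

# Hypothetical keys, assumed to be shared with authorized DUs beforehand.
file_key, index_key, search_key = b"k-file", b"k-index", b"k-search"

records = {
    "f1": (b"glucose 7.1 mmol/L", ["diabetes"]),
    "f2": (b"influenza A positive", ["flu"]),
    "f3": (b"HbA1c 6.9 percent", ["diabetes"]),
}

# Steps 1-2 (DO): encrypted files go to CS, the searchable structure goes to BC.
for index, (content, keywords) in records.items():
    cloud_server[index] = toy_cipher(file_key, content)
    for keyword in keywords:
        token = make_trapdoor(search_key, keyword)
        ledger.setdefault(token, []).append(toy_cipher(index_key, index.encode()))


def contract_search(token: bytes) -> list[bytes]:
    """Step 3 (BC): the smart contract returns the matching encrypted indexes."""
    return list(ledger.get(token, []))


# Steps 3-5 (DU): search, decrypt the indexes, then fetch and decrypt the files.
encrypted_hits = contract_search(make_trapdoor(search_key, "diabetes"))
indexes = [toy_cipher(index_key, hit).decode() for hit in encrypted_hits]
files = [toy_cipher(file_key, cloud_server[i]) for i in indexes]
```

Note that the blockchain side only ever sees tokens and encrypted indexes, while the cloud side only sees encrypted files, which mirrors the separation of roles between BC and CS.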

To ensure the legitimacy of the data handled by the scheme, the necessary supervision of the encrypted data by SUP is required. SUP has two main tasks: first, carrying out periodic audits of the encrypted data stored on CS, and second, scrutinizing the search requests of DUs. The purpose of the encrypted data audit is to detect data files that contain illegal or sensitive keywords, revoke illegal files hosted on CS in a timely manner, and alert, warn, or punish the corresponding DO. The purpose of scrutinizing search requests is to monitor, in real time, the keyword search requests sent by DUs to the BC and to intercept and warn about noncompliant search requests.

Based on the above system model, the following eight algorithms are defined in our scheme:
(i) Setup: executed by TI; it takes the security parameter as input and outputs the system public parameter.
(ii) KeyGen: executed by TI; it takes the public parameter as input and outputs the user’s public-private key pair.
(iii) Encrypt: executed by DO. The inputs are the system public parameter, the public keys of the authorized DUs, the database, and an empty mapping. The algorithm outputs the searchable encrypted database and the initialized mapping.
(iv) Update: executed by DO. The inputs are the system parameter, the public key set of the users to be authorized, the original broadcast ciphertext, the target keyword, and two secret values saved by DO: one associated with the version information and one involved in the encryption calculation. The algorithm outputs the updated broadcast ciphertext.
(v) Trapdoor: executed by DU. The inputs are the public parameter, the authorized user’s private key, and the target keyword; the output is the search trapdoor.
(vi) Search: automatically executed by the smart contract. It takes the system parameter, the search trapdoor for the keyword, and the encrypted database as input and outputs the matched search results.
(vii) Decrypt: executed by DU. It takes the private key and the search result set as input and outputs the decrypted file index set.
(viii) Supervise: executed by SUP. The inputs are the system parameter, the public keys of the authorized users, an illegal or sensitive keyword, and the private key of the supervisor; the output is the search trapdoor of the sensitive word.

3.4. Security Definition

Similar to [12], we demonstrate the confidentiality of our BNSE scheme with a real/ideal simulation paradigm. To achieve higher operational efficiency, searchable encryption schemes disclose some information to the server. The information leaked by our scheme is described by a leakage function. Informally, a searchable encryption scheme is confidential if no information about the database is revealed other than what the leakage function discloses. The formal definition of confidentiality is presented through a real/ideal simulation paradigm consisting of a real game and an ideal game.

Definition 1. Let $\Pi$ denote the BNSE scheme, $\mathcal{A}$ denote the adversary, and $\mathcal{S}$ be a simulator that takes the leakage function $\mathcal{L}$ as a parameter. The following two probabilistic games are defined:
(i) $\mathrm{Real}^{\Pi}_{\mathcal{A}}(\lambda)$: the game runs the system setup algorithm Setup to generate the system parameters and the key generation algorithm KeyGen to generate the user’s public-private key pair. The game publishes the public information and keeps the private key secret. Then, the adversary $\mathcal{A}$ selects a database and issues an encryption query based on the public information. Next, the game runs the encryption algorithm and returns the encrypted database to $\mathcal{A}$. $\mathcal{A}$ chooses a keyword for the trapdoor query, and the game runs the trapdoor generation algorithm and returns the trapdoor to $\mathcal{A}$. Then, $\mathcal{A}$ selects a trapdoor for the search query, and the game runs the search algorithm and returns the result set to $\mathcal{A}$. The adversary can repeat the above steps several times and finally outputs a bit.
(ii) $\mathrm{Ideal}^{\Pi}_{\mathcal{A},\mathcal{S}}(\lambda)$: the simulator $\mathcal{S}$ generates the system public parameter using the leakage function of the system setup. Then, $\mathcal{S}$ generates the user’s public-private key pair based on the public parameter and the leakage function and publishes the public key list. Next, the adversary $\mathcal{A}$ issues an encryption query, and the simulator generates an encrypted database and returns it to $\mathcal{A}$. Then, the simulator uses the leakage function of the trapdoor to generate a search trapdoor in response to a trapdoor query from $\mathcal{A}$. After the adversary issues a search query, the simulator returns the result using the leakage function of the search. Finally, the adversary outputs a bit.
Scheme $\Pi$ satisfies $\mathcal{L}$-adaptive security if for any probabilistic polynomial time (PPT) adversary $\mathcal{A}$, there exists a PPT simulator $\mathcal{S}$ such that
$$\left|\Pr\left[\mathrm{Real}^{\Pi}_{\mathcal{A}}(\lambda)=1\right]-\Pr\left[\mathrm{Ideal}^{\Pi}_{\mathcal{A},\mathcal{S}}(\lambda)=1\right]\right|\le \mathrm{negl}(\lambda),$$
where $\lambda$ is the security parameter and $\mathrm{negl}(\lambda)$ is a negligible function.

3.5. Design Goals

Combining the above system model and practical application requirements, our scheme should meet the following functional objectives:
(i) Supervisibility. Supervision can ensure the controllability of the encrypted data. The SUP needs to supervise the encrypted data in the CS and the DU’s search requests to ensure that the data can be stored and used in a legal and compliant manner.
(ii) Multiuser Search. Multiuser search is a basic function in data sharing scenarios. In this scenario, multiple DUs need to be authorized to access the encrypted data to provide easier data retrieval services.
(iii) Dynamic Update. It is an important function of the dynamic searchable encryption scheme. First, after DO generates data files containing preexisting keywords, the encrypted data structure corresponding to the keywords in BC and the data files in CS need to be updated. Second, in the multiuser scenario, dynamic update of the set of authorized DUs needs to be implemented.

4. A Multiuser Normalized Searchable Encryption Scheme via Blockchain

In this section, we describe the specific construction of BNSE in detail and present its security proof. The algorithms are constructed as follows.

Setup : the setup algorithm takes the security parameter as input. It generates parameters for the bilinear map system, where is an additive group and is a multiplicative group with the same prime order , is a generator of , and is a bilinear map. Then, the algorithm picks several secure hash functions , where involve the elements on or , , , , , , , , , and . Then, the algorithm selects a pseudorandom function , with the inverse permutation . Finally, it outputs the public parameter

KeyGen : is generated randomly, and is computed. The private key of data user is , and is a secret value to derive the public key. Given the public key of supervisor , is randomly selected and , , and are computed. Then, the key generation algorithm outputs the user’s public key

Encrypt : the input parameters of the encryption algorithm contain the system public parameter , the authorized data users’ public key , where is the number of authorized users, , where means add file and means delete files), , , is the number of files containing the keyword , and is a mapping that stores the keyword state pointer, which is able to trace back to the last update of the files including the keyword. Then, the following steps are performed:(1) is randomly selected, and the version information is computed for the encrypted database.(2)Knowing that the authorized users of the encrypted data are and each user’s public key is , is chosen at random, and let the vector , where is the coefficient of in the polynomial .(3)For each keyword in the keyword set :(1)The state pointer map of the keyword is retrieved. If the retrieval result is empty, then the state of the current keyword is initialized. Let and , where is not involved in information storage and is the number of times the keyword is updated. If the retrieval result is not empty, no initialization is required. The pseudorandom permutation key is randomly generated, and is calculated. Subsequently, the local mapping is updated.(2)Given the current keyword’s state pointer and the symmetric key of the encrypted file, and are computed, where .(3)For each file index in , , the encrypted index is computed, and then, and are computed.(4)The trapdoor is computed, and then, and are computed.(4)The encrypted database is obtained through the above calculation.

The encrypted database is uploaded in the form of key-value pairs , , and to the blockchain ledger via the smart contract as a searchable cryptographic data structure of keywords.

Update : the input parameters include the system parameter , the set of users to be authorized , where is the number of all authorized users, and the secret values saved by the data owner, where the random number involves the version information of keyword and the secret value is used to generate the trapdoor and encrypted index. Authorization update is performed on the file index set containing the keyword . The vector is computed, where is the coefficient of in the polynomial and is the total number of all authorized users.

To improve the computation efficiency, the original polynomial ciphertext can be used to perform the computation by first subtracting the polynomial from the secret value and then multiplying with the term generated by the public keys of the users to be authorized to get a new polynomial. Finally, the secret value is embedded into this polynomial to get a new authorized polynomial . The update process only needs to calculate the relevant terms of the user to be authorized based on the original secret text. In addition, the previously authorized users can still use the original vector to compute the trapdoor and decryption.
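The polynomial-based authorization and its incremental update can be illustrated with a small sketch over a prime field. This is a reading of the description above under stated assumptions, not the scheme's exact construction: the broadcast polynomial is assumed to have the form f(x) = (x - x_1)...(x - x_n) + s, where each x_i is a per-user value (in the scheme it would be derived from key material, here it is a hypothetical small integer) and s is the secret value. The modulus and all function names are illustrative.

```python
P = 2**127 - 1  # a Mersenne prime used as a toy field modulus (assumption)


def mul_linear(coeffs: list[int], root: int) -> list[int]:
    """Multiply a polynomial (lowest degree first) by (x - root) mod P."""
    out = [0] * (len(coeffs) + 1)
    for i, c in enumerate(coeffs):
        out[i] = (out[i] - root * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out


def build_vector(user_values: list[int], secret: int) -> list[int]:
    """Coefficient vector of prod(x - x_i) + secret."""
    coeffs = [1]
    for value in user_values:
        coeffs = mul_linear(coeffs, value)
    coeffs[0] = (coeffs[0] + secret) % P
    return coeffs


def authorize(coeffs: list[int], secret: int, new_value: int) -> list[int]:
    """Update step: subtract the secret, multiply by the new user's term,
    and embed the secret again."""
    base = coeffs.copy()
    base[0] = (base[0] - secret) % P
    updated = mul_linear(base, new_value)
    updated[0] = (updated[0] + secret) % P
    return updated


def recover(coeffs: list[int], user_value: int) -> int:
    """Evaluate the polynomial at the user's value (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * user_value + c) % P
    return acc


secret = 123456789
alice, bob, carol, outsider = 11, 22, 33, 44   # hypothetical per-user values
vector = build_vector([alice, bob], secret)
updated_vector = authorize(vector, secret, carol)
```

Because every authorized value is a root of the product term, evaluating the vector at that value leaves only the secret. The update touches only the term of the newly authorized user, and users authorized earlier recover the same secret from either the old or the new vector.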

Trapdoor : with the system parameter as input, only the authorized user can use his private key to compute the trapdoor of the keyword . The steps of the trapdoor calculation are as follows:
(1) The version information of the keyword is obtained. is computed such that , and are computed.
(2) Since is a root of the polynomial , is computed to get the secret value .
(3) The trapdoor of the keyword is output.

Search : the search algorithm is the inverse process of the encryption algorithm, with the public parameter , the trapdoor of the keyword , and the encrypted database as input parameters. An empty set is initialized to store the search results. Then, the following steps are performed:
(1) Given the trapdoor , is computed. is retrieved from the encrypted database. If , then the search algorithm is terminated and the search result is returned. Otherwise, is computed.
(2) is computed, and is retrieved. If , the search algorithm is terminated and the search result set is returned. Otherwise, is computed.
(3) For each , is computed and is retrieved. is computed, and then, is inserted into the search result set .
(4) Using the state pointer of the keyword and the pseudorandom permutation key obtained in step 2, the previous state pointer is computed. Let , and then, the algorithm returns to step 2.
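The backward walk over state pointers in steps 2 to 4 can be sketched as follows. This is a simplified illustration in the spirit of forward-private chained indexes, not the scheme's exact data structure: the 4-round Feistel network standing in for the pseudorandom permutation, the hash-derived addresses and masks, the record layout, and all names are assumptions, and the file indexes are left unencrypted here although the scheme encrypts them.

```python
import hashlib
import hmac
import os

HALF = 16  # state pointers are 32 bytes, split into two Feistel halves


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]


def permute(key: bytes, block: bytes) -> bytes:
    """Keyed permutation built as a 4-round Feistel network (illustrative)."""
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def permute_inv(key: bytes, block: bytes) -> bytes:
    """Inverse of permute: undo the rounds in reverse order."""
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def _address(token: bytes, state: bytes) -> bytes:
    return hashlib.sha256(b"addr" + token + state).digest()


def _mask(token: bytes, state: bytes, n: int) -> bytes:
    return hashlib.shake_256(b"mask" + token + state).digest(n)


def update(ledger: dict, states: dict, token: bytes, op: str, index: str) -> None:
    """Owner side: append one record; op is a single character, "a" or "d"."""
    prev_state = states.get(token, os.urandom(2 * HALF))
    key = os.urandom(32)                      # fresh permutation key per update
    state = permute(key, prev_state)          # new state pointer
    payload = op.encode() + key + index.encode()
    ledger[_address(token, state)] = _xor(payload, _mask(token, state, len(payload)))
    states[token] = state


def search(ledger: dict, token: bytes, state: bytes) -> list:
    """Contract side: walk back from the newest state until no entry is found."""
    results = []
    while (entry := ledger.get(_address(token, state))) is not None:
        payload = _xor(entry, _mask(token, state, len(entry)))
        op, key, index = payload[:1], payload[1:33], payload[33:]
        results.append((op.decode(), index.decode()))
        state = permute_inv(key, state)       # previous state pointer
    return results


ledger, states = {}, {}
token = os.urandom(32)  # stands in for the keyword trapdoor
update(ledger, states, token, "a", "file-1")
update(ledger, states, token, "a", "file-2")
update(ledger, states, token, "d", "file-1")
hits = search(ledger, token, states[token])
```

Each record carries the key needed to step back to the previous state, so the searcher can traverse all past updates from the latest pointer, while an entry added later cannot be linked to earlier tokens without the new state. The search returns records newest first, and the add/delete flags are resolved afterwards by the user.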

Decrypt : the decryption algorithm is used to decrypt the encrypted indexes in the search results . Using the secret value computed in step 2 of the trapdoor algorithm Trapdoor, for each record of the result set , is computed. If , the index is used to access the corresponding data ciphertext from the cloud server CS and decrypt the ciphertext using the key obtained in step 2 of the search algorithm to get the plaintext data file. If , it means this file index has been deleted and there is no need to access this file in the cloud server.

Supervise , : the input parameters include system parameter , public keys of authorized users, the private key of supervisor, and the set of sensitive words . is computed to obtain , and the steps are subsequently performed in the trapdoor algorithm Trapdoor to compute the secret value . After obtaining a set of secret values , the supervisor generates search trapdoors for each secret value of the sensitive word . Then, the hash value of the trapdoor is calculated and the hash values in the list are stored and uploaded to the BC through the smart contract to realize the supervision of search requests. Second, the trapdoor set is used to get the matching file index ciphertext by executing the search smart contract and the ciphertext is decrypted using the secret value to get the file index. Finally, the index is used to locate the illegal file containing the sensitive word in CS to achieve the supervision of the ciphertext data in CS.
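The first supervision task, filtering search requests, can be sketched as a contract-side check of each incoming trapdoor against the hash list uploaded by the supervisor. The class, method names, and the HMAC-based trapdoor below are hypothetical; the sketch assumes that the supervisor and an authorized user derive the same trapdoor for the same keyword, as the description above suggests.

```python
import hashlib
import hmac


def make_trapdoor(secret: bytes, keyword: str) -> bytes:
    """Stand-in trapdoor (assumption): HMAC of the keyword under the secret value."""
    return hmac.new(secret, keyword.encode(), hashlib.sha256).digest()


class SearchContract:
    """Toy model of the search smart contract with a supervisor block list."""

    def __init__(self) -> None:
        self.blocked_digests: set[str] = set()
        self.index: dict[bytes, list[bytes]] = {}
        self.warnings: list[str] = []

    def block(self, trapdoors: list[bytes]) -> None:
        """Supervisor uploads only hashes of the sensitive-word trapdoors."""
        for trapdoor in trapdoors:
            self.blocked_digests.add(hashlib.sha256(trapdoor).hexdigest())

    def put(self, trapdoor: bytes, encrypted_indexes: list[bytes]) -> None:
        self.index[trapdoor] = list(encrypted_indexes)

    def search(self, user: str, trapdoor: bytes):
        """Reject and record requests whose trapdoor hash is on the block list."""
        if hashlib.sha256(trapdoor).hexdigest() in self.blocked_digests:
            self.warnings.append(user)
            return None
        return self.index.get(trapdoor, [])


secret = b"shared-secret-value"           # hypothetical secret value
contract = SearchContract()
contract.put(make_trapdoor(secret, "insulin"), [b"enc-index-1"])
contract.put(make_trapdoor(secret, "miracle-cure"), [b"enc-index-2"])
contract.block([make_trapdoor(secret, "miracle-cure")])

allowed = contract.search("du-1", make_trapdoor(secret, "insulin"))
rejected = contract.search("du-2", make_trapdoor(secret, "miracle-cure"))
```

Storing hashes rather than the trapdoors themselves means the block list on the ledger cannot be replayed as a valid search request.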

Correctness analysis: when generating the searchable encrypted data structure, a broadcast polynomial is constructed. The authorized user is able to use his private key to compute

After obtaining the secret value by substituting into the broadcast ciphertext, the search trapdoor is computed. The trapdoor search steps are described in the soundness proof of the security proof subsection. As for the ciphertext data supervision, given the private key of the supervisor and the partial public key of the authorized user, is computed as follows:

After getting , the secret value used for searching and decryption can be calculated as formula (1).

Theorem 1. The BNSE scheme is an $\mathcal{L}$-adaptively secure searchable encryption scheme, where $\mathcal{L}$ is the leakage function, if the permutation used to derive state pointers is a pseudorandom permutation, the hash function is collision-resistant, the DBDH assumption holds, and the polynomial-based broadcast encryption algorithm is adaptively secure.

Proof. We demonstrate the adaptive security of the scheme through a sequence of games similar to reference [11]. The first game is the real-world game . Each game is slightly different from the previous one, but they are indistinguishable from the adversary, finally reaching the last ideal world game . According to the transmission property of indistinguishability, it can be concluded that is indistinguishable from , thus completing the proof of confidentiality.
In the second game , a list is maintained for storing state pointers; i.e., . The state pointers are used in the encryption algorithm, and the game randomly chooses a string to generate each state pointer instead of using the pseudorandom permutation function . Because the pseudorandom permutation function is indistinguishable from a truly random function, the games and are indistinguishable.

In the third game , all hash functions are modeled as random oracles, where each oracle maintains a list to store input/output pairs. For example, given a random oracle with input , the oracle randomly selects a string as output, where is the output length of the hash function, and stores in the list -. Because the hash function is collision-resistant, the games and are indistinguishable.

In the fourth game , it computes on the basis of by randomly choosing a secret value in the encryption phase. Also, the game needs to maintain a list for storing in response to the trapdoor query from the adversary . is a tuple based on the DBDH problem, and is a random tuple. If the adversary can distinguish the games and , it means that the adversary is able to distinguish the two tuples, i.e., solve the DBDH problem, which contradicts the hardness assumption. Thus, the games and are indistinguishable.

In the last game , the simulator maintains two structures: a list for simulating random oracle queries and a counter that keeps track of the number of encryption updates since the system was initialized. For each encryption query, two random strings are selected. The simulator uses the encryption history to determine the encryption queries for the keyword . Based on the encryption history, state pointers and keys can be generated and then the random oracle is updated. From the adversary’s perspective, the view generated by the simulator is completely indistinguishable from the view in the game .

Summing up, we can get , where the advantage of solving the difficult DBDH problem is negligible, so our proposed scheme is a -adaptive secure searchable encryption scheme.
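The "summing up" step of a game-based proof rests on the triangle inequality. The generic form is sketched below; the notation is an assumption for illustration and is not taken from the scheme: G_0 stands for the real-world game, G_n for the ideal-world game, and Pr[G_i = 1] for the probability that the adversary outputs 1 in game G_i.

```latex
\[
\bigl|\Pr[G_0 = 1] - \Pr[G_n = 1]\bigr|
\;\le\; \sum_{i=0}^{n-1} \bigl|\Pr[G_i = 1] - \Pr[G_{i+1} = 1]\bigr|
\]
```

Each summand is the distinguishing gap of one game transition, so if every transition is bounded by a negligible quantity and n is constant, the total gap between the real and ideal worlds is negligible as well.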

5. A Blockchain-Based Normalized Searchable Encryption System for Medical Data

In this section, we present our design of the BNSEM system based on the BNSE scheme presented in the preceding section.

5.1. System Architecture

We divide the BNSEM system into three layers: the medical data collection layer, the medical data processing layer, and the medical data access layer. The system architecture is shown in Figure 2. The entities in the system are roughly the same as those in the BNSE scheme; the difference is that the entities in the medical system are all medical service providers or users, including the medical data owner (MDO), medical data user (MDU), medical cloud server (MCS), and medical consortium blockchain platform (MCB).

In the medical data collection layer, medical data are mainly generated by doctors and patients. On the one hand, patients will generate corresponding medical data when they visit hospitals. On the other hand, the health data will be generated when patients use home medical tools or wearable medical monitoring devices, which can be used as reference indicators for the diagnosis of doctors.

In the medical data processing layer, the patients need to preprocess the data before uploading, which includes encrypting the medical data, establishing the index of each medical file, extracting the keywords in the medical file, and constructing a searchable structure based on the file indexes and the keywords. Finally, the ciphertexts of the medical records are uploaded to MCS and the searchable structure is uploaded to MCB.

In the medical data access layer, only authorized medical data users can access the patient’s medical data. First, the MDU generates a trapdoor for the target search keyword and sends the search request containing the trapdoor to MCB. Then, the smart contract matches the trapdoor with the searchable structure and returns the corresponding medical file index. Finally, the MDU uses the file index to access medical data in MCS and MCS returns the corresponding data to the MDU.

5.2. Medical Data Preprocessing

When a patient goes to the hospital, the doctor makes a diagnosis and generates an electronic medical record. The record includes the diagnosed disease, examination results (medical images, laboratory tests, etc.), medication prescriptions, and personal information (such as name, age, and gender). Each electronic medical record is treated as a file and has a unique file identifier. The doctor then shares the generated medical record with the patient, which completes the diagnosis process.

5.2.1. Building the Indexes of Medical Records

Once the patient holds a certain number of medical records, the patient can upload the record files. Before uploading, indexes corresponding to the files need to be constructed. For example, when the patient, i.e., MDO, receives medical files , several indexes will be constructed for these files. Information related to the files, such as their date and size, can be embedded into the indexes according to the actual situation. The file indexes built for medical data files are .

5.2.2. Extracting Keywords from Medical Records

MDO extracts the keywords contained in each file in . For medical data files, we mainly consider extracting the name, gender, and age from the basic information; the disease name and drug prescription from the medical indicators; and the doctor, hospital, and visit time from the treatment information.

5.2.3. Constructing the Inverse Indexes

The keywords extracted from the different medical data files in are integrated to obtain the keyword dictionary . Then, for each keyword in the keyword dictionary , the inverse index, i.e., the set of file indexes whose files contain the keyword, is constructed. A specific construction of the inverse index of medical record files is shown in Figure 3.
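The preprocessing described above can be illustrated with a minimal sketch. The file indexes (`ind1`, ...) and keywords are hypothetical sample values, and the function name `build_inverse_index` is introduced here for illustration only; the sketch shows the plaintext structure before any encryption is applied.

```python
from collections import defaultdict


def build_inverse_index(file_keywords):
    """Map each keyword to the sorted list of file indexes whose files contain it."""
    inverse = defaultdict(list)
    for file_index, keywords in file_keywords.items():
        # set() removes keywords repeated within one file
        for keyword in set(keywords):
            inverse[keyword].append(file_index)
    return {keyword: sorted(ids) for keyword, ids in inverse.items()}


# Hypothetical file indexes with the keywords extracted from each record
records = {
    "ind1": ["hypertension", "aspirin"],
    "ind2": ["diabetes", "metformin"],
    "ind3": ["hypertension", "metformin"],
}

inverse_index = build_inverse_index(records)
# The keyword dictionary is the set of all distinct keywords
keyword_dictionary = sorted(inverse_index)
```

Here `inverse_index["hypertension"]` lists the two records that mention hypertension, which is the per-keyword list that the encryption step later turns into a searchable structure.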

5.3. Medical Consortium Blockchain Platform

In the BNSEM system, Hyperledger Fabric is chosen as the medical consortium blockchain (MCB) platform. Because Fabric has a strict access mechanism, it can be managed collaboratively in a polycentric manner by entities from multiple organizations. In addition, compared with public and private blockchains, a consortium blockchain offers a better balance between the security and efficiency of the system. Initial access control can be achieved through the access mechanism of Fabric, and more fine-grained data access control can be realized by deploying Fabric smart contracts.

MCB is a federation of multiple healthcare providers, which is built and maintained by different entities such as hospitals, research institutions, regulatory bodies (e.g., healthcare commissions), and insurance and pharmaceutical companies. Organizations with high trust level preselect some peer nodes as consensus nodes according to their management policies (e.g., supervision institutions and hospital management nodes). These designated consensus nodes are responsible for managing and updating the distributed ledger, while other peer nodes can only generate or contribute healthcare data transactions. Consensus nodes require a certain amount of computing power to perform consensus algorithms on transactions. In addition, if the number of consensus nodes increases, the degree of decentralization of the system increases and security and scalability can be improved.

MCB enables the storage of searchable structures and the retrieval of encrypted medical data by invoking predesigned and deployed smart contracts. Before MCB operates, the consortium members need to define a number of contracts developed by different organizations covering common terminology, data, rules, and processes to specify the model of data storage and sharing. A client application invokes a smart contract to execute the search protocol. When the execution is complete, the smart contract records the results (i.e., state changes) in the distributed ledger of MCB. Together with the ledger, smart contracts form the core part of the MCB system.

5.4. System Design

5.4.1. System Setup

Before the system runs, TI sets security parameters and generates system public parameters . The system parameters include bilinear operation parameters , hash functions with different output lengths, and pseudorandom permutation functions, with reference to the setup algorithm in Section 4. The system selects the AES algorithm as the pseudorandom permutation function to encrypt medical data. The medical data users in the system mainly include the data users (MDUs) and the supervisory institution (SUP). Before the users join the system, they need to generate a set of public-private key pairs for data authorization. The public-private key pair of SUP is . The public-private key pair of MDU is , which is computed in the setup algorithm.

5.4.2. Encryption and Updating of Medical Data

After the system is initialized, MDO will store the encrypted medical data and the corresponding searchable structure to authorize access by multiple MDUs. When patients visit the hospital and get multiple electronic medical records, these medical records will be preprocessed as described in Section 5.2. Next, MDO gets the file set , the keyword dictionary , and the file index set . Let the database be , and initialize a locally stored mapping that keeps the latest state of each keyword (i.e., its state pointer). Then, the MDUs to be authorized are specified, denoted as , with public keys .

Taking the above parameters as input, the Encrypt data encryption algorithm in Section 4 is invoked to encrypt the medical record database to obtain the encrypted database; i.e., searchable data structure , where the variable is a reference to the subscript value and hides the information of the medical file indexes. After generating the encrypted database , the key-value pairs are uploaded to MCB through a smart contract.

Updates include medical data update and authorized user update. There are two types of medical data update: adding and deleting medical record files. When adding medical record files, MDO obtains the state pointer for the same keyword as in the previous keyword dictionary and invokes the Encrypt encryption algorithm in Section 4 to encrypt the newly added medical record database to update it. When deleting medical record files, MDO performs the operation differently by selecting the option from . The update of authorized medical data users is achieved by reconstructing the broadcast ciphertext. MDO adds the specified medical data user MDUs as , where is the total number of new and old authorized users, and the calculation method refers to the Update algorithm.

5.4.3. Retrieval of Encrypted Medical Data

When a patient (MDO) goes to another hospital for treatment, the authorized doctor (MDU) reviews the patient’s past medical records to assist in the diagnosis. The search process for medical records is as follows:

Step 1. MDU selects a keyword (e.g., hypertension) and generates the search trapdoor by invoking the trapdoor algorithm using his private key.

Step 2. MDU sends a search request containing the search trapdoor to the smart contract.

Step 3. The smart contract matches the trapdoor with the search structure stored in the blockchain to obtain the encrypted medical indexes according to the search algorithm.

Step 4. MDU uses the secret value to decrypt the ciphertext index to get the plaintext index and the option corresponding to the index and the file decryption key, .

Step 5. If the option is , it indicates that the file has been deleted and no access is needed. Otherwise, MDU will access the medical data stored in the MCS with the indexes.

Step 6. MDU decrypts the medical record ciphertext returned from MCS to get the medical record file using .
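Step 5 can be made concrete with a small sketch of how an MDU might reconcile decrypted search results before contacting MCS. This is an illustration under assumptions, not the scheme's algorithm: the option labels `"add"` and `"del"`, the function name `live_indexes`, and the rule that a later deletion cancels an earlier addition of the same index are all hypothetical.

```python
def live_indexes(decrypted_results):
    """Return the file indexes still worth fetching from MCS.

    decrypted_results: iterable of (file_index, option) pairs in update order,
    where option is the hypothetical label "add" or "del".
    """
    live = []
    for file_index, option in decrypted_results:
        if option == "add":
            if file_index not in live:
                live.append(file_index)
        elif option == "del":
            # A deleted file needs no access, so drop it if it was added earlier
            if file_index in live:
                live.remove(file_index)
        else:
            raise ValueError(f"unknown option: {option!r}")
    return live


# Hypothetical decrypted results for one keyword, oldest update first
results = [("ind1", "add"), ("ind2", "add"), ("ind1", "del")]
to_fetch = live_indexes(results)
```

Only the indexes left in `to_fetch` would then be sent to MCS in Step 5, which avoids requesting ciphertexts of files that no longer exist.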

5.4.4. Supervision of Medical Data and Search Requests

To ensure that the data in the BNSEM system are stored and used legally, supervisors such as the healthcare commission are required to regularly review the encrypted medical data in MCS and monitor the search requests of MDUs in real time. First, SUP maintains a sensitive word dictionary , which includes sensitive keywords such as prohibited drugs, illegal hospitals, and fake doctors. Next, SUP invokes the Supervise algorithm to locate the illegal files containing sensitive words in MCS using the private key . Then, SUP generates a trapdoor for each sensitive word in and hashes each trapdoor to obtain a trapdoor hash list . Finally, SUP uploads the hash list to MCB through smart contracts, which use it to filter the trapdoors in search requests and intercept illegal requests containing sensitive words.
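The filtering step can be sketched as a set-membership check on hashed trapdoors. The sketch below is a toy under stated assumptions: `toy_trapdoor` is a hypothetical stand-in (an HMAC-SHA256 tag under a shared key) for the scheme's trapdoor algorithm, chosen only so that the same keyword always yields the same trapdoor; the key and the sensitive words are made-up sample values.

```python
import hashlib
import hmac


def toy_trapdoor(key: bytes, keyword: str) -> bytes:
    # Hypothetical stand-in for the trapdoor algorithm: a deterministic keyed tag
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


def build_hash_list(key: bytes, sensitive_words):
    """SUP side: hash the trapdoor of every sensitive word."""
    return {hashlib.sha256(toy_trapdoor(key, w)).hexdigest() for w in sensitive_words}


def is_blocked(trapdoor: bytes, hash_list) -> bool:
    """Contract side: reject a request whose hashed trapdoor is on the list."""
    return hashlib.sha256(trapdoor).hexdigest() in hash_list


demo_key = b"demo-key-for-illustration-only"
hash_list = build_hash_list(demo_key, ["prohibited-drug-x", "fake-doctor-y"])

blocked = is_blocked(toy_trapdoor(demo_key, "prohibited-drug-x"), hash_list)
allowed = not is_blocked(toy_trapdoor(demo_key, "hypertension"), hash_list)
```

Because only hashes of trapdoors are uploaded, the contract can compare requests against the list without learning the sensitive words themselves. The check works only if the supervisor's trapdoor for a word equals the trapdoor an MDU would submit for it, which this toy ensures through the shared demo key.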

6. Security Analysis

6.1. Forward Privacy

The requirement of forward privacy is that given a previous search trapdoor, the update query does not reveal information about the keywords that were searched in the past; i.e., the previous keyword trapdoor cannot be used to search medical records newly added after the trapdoor was released. In the BNSEM system, the trapdoor is equivalent to a state pointer of keyword . With the help of this pointer, the smart contract will find the latest state of the keyword , which is used to locate the corresponding encrypted medical file index , where . The smart contract then computes the last updated state to search the previously updated medical files.

When updating the medical files containing the keyword , MDO computes a new state pointer , which is used to encrypt the file indexes and generate the searchable structure corresponding to the latest version information . Due to the security of the pseudorandom permutation function , the adversary cannot predict the next state pointer from the current state pointer and the version information. Therefore, a previous search trapdoor cannot be used to search medical data updated afterward, so forward privacy is guaranteed. By achieving forward privacy, the BNSEM system can effectively resist file injection attacks and prevent adversaries from inferring the keyword contained in a trapdoor.
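The idea that a state pointer lets the contract walk backward through past updates, but never forward, can be illustrated with a toy state chain. This is not the BNSE construction: it replaces the pseudorandom permutation with fresh random states and SHA-256-derived masks, omits the broadcast encryption of file indexes, and uses hypothetical names (`Owner`, `search`, `ledger`).

```python
import hashlib
import os

STATE_LEN = 16
ENTRY_LEN = 16
EMPTY = bytes(STATE_LEN)  # marks the start of a keyword's chain


def _h(label: bytes, state: bytes) -> bytes:
    return hashlib.sha256(label + state).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class Owner:
    """Data owner keeping the latest state pointer of each keyword locally."""

    def __init__(self):
        self.latest = {}

    def update(self, ledger, keyword, file_index):
        entry = file_index.encode("utf-8").ljust(ENTRY_LEN, b"\0")
        if len(entry) != ENTRY_LEN:
            raise ValueError("file index too long for this toy")
        prev = self.latest.get(keyword, EMPTY)
        new = os.urandom(STATE_LEN)  # fresh state, unpredictable from prev
        # Store (previous state || entry), masked by a hash of the new state
        ledger[_h(b"addr", new)] = _xor(prev + entry, _h(b"mask", new))
        self.latest[keyword] = new

    def trapdoor(self, keyword):
        return self.latest.get(keyword)


def search(ledger, state):
    """Walk the chain backward from the given state pointer, newest first."""
    results = []
    while state is not None and state != EMPTY:
        blob = ledger.get(_h(b"addr", state))
        if blob is None:
            break
        plain = _xor(blob, _h(b"mask", state))
        state, entry = plain[:STATE_LEN], plain[STATE_LEN:]
        results.append(entry.rstrip(b"\0").decode("utf-8"))
    return results


ledger = {}
owner = Owner()
owner.update(ledger, "hypertension", "ind1")
owner.update(ledger, "hypertension", "ind2")
old_trapdoor = owner.trapdoor("hypertension")
owner.update(ledger, "hypertension", "ind3")

old_view = search(ledger, old_trapdoor)
new_view = search(ledger, owner.trapdoor("hypertension"))
```

In this toy, `old_view` contains only the two entries that existed when the old trapdoor was issued, while `new_view` also reaches the later update. The old pointer reveals nothing about where the newer entry is stored, which is the property the section calls forward privacy.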

6.2. Backward Privacy

Backward privacy limits the updated information of a keyword that an adversary can obtain during a search query on the keyword . That is, a searchable encryption system satisfies backward privacy if, after a keyword-file index pair is added to the database and then deleted, a search query on the keyword does not disclose the index . In the BNSEM system, encrypting a medical file index yields , where the secret value is broadcast encrypted using the authorized MDUs’ public keys and can only be decrypted by the authorized MDUs. Since the search result is in the form of ciphertext, even if it is stored publicly on the MCB, the adversary cannot decrypt the broadcast ciphertext to recover the secret value and cannot learn any useful information about the indexes of medical files. Therefore, the backward privacy of BNSEM can be achieved.

6.3. Distribution

Although BNSEM requires the use of a centralized MCS to store encrypted data, the search process is accomplished by smart contracts, which ensures the reliability and correctness of search results. First, to achieve the retrieval of encrypted medical data, the MDO uploads the searchable data structure to the distributed MCB platform by invoking the smart contract with storage function. Second, the MDU runs the trapdoor algorithm and uploads the trapdoor to trigger the smart contract with the search function. The correctness of the whole search process does not rely on the MCS, enabling decentralized search.

The blockchain is distributed, and each blockchain node is relatively independent and must be authenticated to join the system. It is difficult for the adversary to manipulate a large number of nodes at the same time to change the network rules and damage the blockchain system, so Sybil attacks can be effectively resisted. In addition, since each search is recorded as an immutable transaction on the blockchain, the number of search requests sent by each MDU cannot be tampered with. The online keyword guessing attack (KGA) can therefore be effectively resisted by setting an upper limit on the number of requests per MDU.
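The request cap mentioned above amounts to a per-user counter checked before each search. The sketch below is a plain in-memory illustration with hypothetical names (`SearchQuota`, `allow`); in the described system the count would come from the immutable search transactions on the ledger rather than from a local object.

```python
from collections import Counter


class SearchQuota:
    """Allow each MDU at most `limit` search requests."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = Counter()

    def allow(self, mdu_id: str) -> bool:
        if self.used[mdu_id] >= self.limit:
            return False  # quota exhausted, request is rejected
        self.used[mdu_id] += 1
        return True


quota = SearchQuota(limit=3)
decisions = [quota.allow("mdu-1") for _ in range(5)]
other_user_ok = quota.allow("mdu-2")
```

An attacker who guesses keywords online must submit one trapdoor per guess, so a cap of this kind bounds the number of guesses that any single MDU identity can test.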

7. Performance Analysis

7.1. Performance Comparison

We compare the theoretical performance of our scheme with other multiuser searchable encryption schemes, where the MVSSE [24] and BAEKS [25] schemes are both based on public key cryptography, and [20] is a symmetric searchable encryption scheme. In this study, we compare the computational overheads of the main algorithms of searchable encryption schemes, including encryption algorithm, trapdoor algorithm, and search algorithm. The results of the performance comparison are given in Table 1.

The notations in Table 1 are explained as follows: denotes the number of authorized MDUs and denotes the number of indexes containing the keyword . Symbols , , , , , and denote general hash functions (e.g., SHA-256 and SHA-3), exponential operation, scalar multiplication on the group , multiplication on the group , a bilinear pair from groups to , and a map-to-point map. Although the hash functions used in our scheme differ in input/output lengths, they can all be obtained by simple transformations of the general hash functions and will not add additional complexity. In addition, denotes pseudorandom permutation function (i.e., symmetric cryptography, e.g., AES and DES algorithms). The time overhead of the above operations is shown in Table 2.

Table 1 shows that the computational overhead of the encryption algorithm in most schemes is linearly related to the number of indexes. The BAEKS scheme does not consider the number of indexes containing the keywords. In addition, the encryption computational complexity of BAEKS is linearly related to the number of users, so it is not shown in the computational overhead graph. The scheme does not describe the broadcast encryption algorithm it uses, so the broadcast encryption overhead cannot be calculated. The encryption computation overheads of our scheme and the MVSSE scheme are and , respectively. Although our scheme contains additional time-consuming operations, they are independent of the number of indexes. The theoretical computational overhead of the encryption algorithm for each scheme with respect to the number of file indexes is shown in Figure 4.

As for trapdoor algorithm, the computation overheads of MVSSE, , and our scheme are , , and , respectively. Although the trapdoor computation overhead of our scheme is slightly higher than other schemes, we avoid key management and distribution operations compared with the symmetric scheme MVSSE. Moreover, the user in the MVSSE scheme cannot generate search trapdoors independently and it requires interactive communication with the server. Similarly, the scheme requires the data owner to generate and distribute public-private key pairs for multiple recipients, which does not meet the key security specification. The theoretical computational overhead of trapdoor algorithm for each scheme is compared as shown in Figure 5. The computational overhead of our scheme is slightly higher than that of MVSSE scheme, and the trapdoor generation of scheme only involves pseudorandom permutation operation with minimal time overhead.

When performing search operations, the MVSSE scheme contains multiple scalar multiplication operations, which will incur a large computation overhead. The search computation overhead of our scheme is lower than that of the symmetric searchable encryption scheme because the computations in the main algorithm of our scheme are hash operations or symmetric cryptographic primitives. Therefore, our scheme is a searchable public key scheme with high search performance. Figure 6 shows the variation of the theoretical search computation overhead with the number of indexes for each scheme. Our scheme has the lowest computation overhead, and the MVSSE scheme has the highest time overhead with the number of indexes.

7.2. Prototype Implementation

We implement our BNSEM system using the MIRACL cryptographic library (C++) on a PC with 16 GB of RAM, an Intel Core i5-7500 CPU, and Windows 10, and deploy a Fabric consortium blockchain on a PC with 16 GB of RAM, an Intel Core i5-7500 CPU, and Ubuntu 16.04. In addition, we set the system security parameter to 128 bits, implement hash functions with different input and output lengths based on SHA-256, and use the AES algorithm in CBC mode as the pseudorandom permutation function with a key length of 128 bits. Finally, we choose a super-singular elliptic curve to achieve the AES-128 security level. Next, we perform three simulation tests: the time cost of the encryption algorithm with the number of indexes, the time cost of the search algorithm with the number of indexes, and the time cost of all algorithms under a certain number of indexes of our system.

To evaluate the overall efficiency of our system, we test the average time overhead of all algorithms under the condition that the number of indexes containing the keyword is 10000, as shown in Figure 7. The key generation requires multiple scalar multiplication operations on the G1 group, with a time overhead of about 85 ms. In addition, the time to generate a search trapdoor for a keyword is about 166 ms, which is high relative to the 287 ms needed to encrypt a search structure with 10000 file indexes, mainly because the trapdoor algorithm requires time-consuming operations such as bilinear pairings. The search algorithm in the smart contract is efficient, with an average time overhead of about 55 ms for 10000 matched results.
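For a rough sense of scale, the reported totals can be divided by the number of indexes. This is only a back-of-the-envelope reading that assumes the cost is spread evenly over the 10000 indexes and ignores any fixed per-call overhead, which the measurements above do not separate out.

```latex
\[
\frac{287\ \text{ms}}{10000} = 28.7\ \mu\text{s per encrypted index},
\qquad
\frac{55\ \text{ms}}{10000} = 5.5\ \mu\text{s per matched result}
\]
```

Under that assumption, a single trapdoor generation (about 166 ms) costs roughly as much as encrypting several thousand indexes, which is consistent with the observation that pairing operations dominate the trapdoor algorithm.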

8. Conclusion

In this study, we propose a blockchain-based searchable encryption scheme BNSE and design a searchable encryption system BNSEM for medical data based on the scheme. Firstly, the system adopts the smart contract of Fabric to guarantee the accuracy of search results. Secondly, we use polynomial-based broadcast cryptography to implement a multiuser search function. Then, the system achieves legal regulation of medical ciphertext data without violating the privacy of the private key. Finally, we provide the security analysis of BNSEM and perform a test of the time cost of each algorithm. For future work, we have considered functional extensions of multikeyword search and range queries on numerical data.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (2019QY0800), the Shandong Provincial Key Research and Development Program (2020CXGC010107 and 2021CXGC010107), the National Natural Science Foundation of China (U21A20466, 62172307, 61972294, and 61932016), the Blockchain Core Technology Strategic Research Program of Ministry of Education of China (2020KJ010301), the Special Project on Science and Technology Program of Hubei Province (2020AEA013), the Natural Science Foundation of Hubei Province (2020CFA052), the Wuhan Municipal Science and Technology Project (2020010601012187), the Foundation of Hangzhou Innovation Institute, Beihang University (2020-Y10-A-019), the Peng Cheng Laboratory Project (PCL2021A02), and the Foundation of Guangxi Key Laboratory of Trusted Software (kx202001).