Abstract

Nowadays, patients and medical institutions commonly outsource their data to cloud storage. This greatly reduces the burden of managing and storing medical information and improves the efficiency of the entire medical industry. Group-based cloud storage systems are also widely used. For example, in a medical enterprise, employees outsource their working documents to cloud storage and share them with colleagues. However, once the working documents are outsourced to cloud servers, ensuring their security is a challenging problem, because the data owners no longer physically control them. In particular, the integrity of the outsourced data must be guaranteed, and secure cloud auditing protocols are designed to solve this issue. Recently, a lightweight secure auditing scheme for shared data in cloud storage was proposed. Unfortunately, in this paper we find that this proposal is not secure. The cloud server can easily forge authentication labels, so it can delete all the outsourced data and still provide a correct data possession proof, which invalidates the security of the cloud auditing protocol. Building on the original auditing protocol, we propose an improved protocol for shared data and give a brief analysis of its security, which shows that the new protocol is secure.

1. Introduction

With the rapid development of network information technology in the medical field, Internet-based medical care has become a new mode of healthcare, and the informatization of medical and health services has accelerated. Large volumes of medical information have moved from traditional paper documents to fast and convenient digital storage, which brings great convenience to medical institutions and patients and greatly improves hospitals' diagnostic and treatment efficiency and service quality. With the growing promotion of "Internet + Medical", big data, cloud computing, and related technologies have been widely adopted in the medical field, so that the large amounts of medical data generated during collection, storage, and application can be stored centrally in the cloud. However, because medical big data, such as patients' basic information and medical records, is highly valuable, and because medical institutions may compete unfairly, problems such as information disclosure, device intrusion, data abuse, and tampering may occur frequently. At the same time, cloud service providers may be dishonest: they may delete rarely used data to reduce storage costs, or fail to report lost data to protect their reputation. These problems pose serious threats and hidden dangers to the security of medical information. Therefore, protecting the storage security of medical information has become an important part of Internet medical construction, and it is a difficult problem that the medical industry must face squarely and solve urgently. In recent years, cloud secure auditing technology has developed to address many of these problems. Secure auditing is a security mechanism that is independent of the cloud service provider.
Users or medical institutions only need to run a proof-of-knowledge protocol with the service provider to audit the data stored in the cloud, thereby ensuring the security of the medical data.

1.1. Related Work

Even before the concept of cloud computing was put forward, researchers explored methods for auditing remotely stored data. In 2004, Deswarte et al. [1] proposed a remote storage integrity checking scheme that uses RSA-based hash functions. This method requires the verifier to download the data locally and compare it with the signature after public key recovery. In cloud storage, because the amount of data is large, verifying integrity through signatures in this way causes huge communication overhead. For archived data that must be backed up or kept for a long time, it is unreasonable to download all the data just to check its integrity.

Cloud storage auditing mechanisms were therefore proposed to address this challenge. They do not require the user to download the data; instead, the service provider accesses the user's data to compute some integrity evidence and submits it to the user for verification, which greatly reduces the communication overhead. More importantly, a conclusion about data integrity that users reach through auditing is more convincing than one announced by the service provider.

In cloud storage, checking the integrity of outsourced data is a difficult problem. Only when this problem is solved effectively can the user's data security be truly guaranteed. To this end, extensive research has been conducted on cloud storage auditing that is both efficient and secure.

Based on the idea of sampling remote data, Juels et al. [2] and Ateniese et al. [3] proposed the primitives of PoR (Proofs of Retrievability) and PDP (Provable Data Possession), respectively, both of which are designed to verify the integrity of remote data. What these schemes have in common is that, through a challenge-response protocol, the cloud service provider accesses only part of the data (chosen at random by the auditor) to generate a probabilistic proof, which it submits to the auditor for verification. The difference is that PoR additionally uses error-correcting codes, so it can recover data when corruption is detected. Compared with PoR, PDP provides relatively weak security: it only detects data corruption and does not guarantee that the data can be recovered. However, the design of PDP is more flexible, which makes it easier to extend with additional functions (such as dynamic data updates and fairness).

Since most current cloud storage auditing schemes are based on these two schemes [3–14], in the following we mainly discuss these two schemes, as well as their derived schemes and applications [9, 15–18].

(1) PDP scheme. Based on the RSA and KEA-r assumptions, Ateniese et al. proposed the PDP scheme in 2007. It was the first probabilistic auditing scheme that is both secure and practical. The basic idea, following Sebé's scheme [19], is to first divide the file into blocks of fixed size and then compute a Homomorphic Verifiable Tag (HVT) for each data block. Finally, the user stores the original data blocks and the corresponding homomorphic tags on the server and deletes the local copy.

When initiating an audit, the auditor randomly chooses a subset of block indices and sends the audit request to the cloud server. The prover generates an integrity proof from the specified data blocks and their tags and sends it to the auditor for validation. The proof consists of two parts: a linear combination of the specified data blocks and an aggregate signature generated from the corresponding tags. The verifier checks the proof by verifying a consistency relationship between these two parts. Because only a small part of the original data is accessed, the proof is probabilistic. If it is correct, every data block specified by the auditor is intact, and, with high confidence, all the original data blocks are intact as well.
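To make the "high confidence" claim concrete, the following Python sketch computes the probability that a challenge of c distinct, uniformly sampled blocks hits at least one of t corrupted blocks among n. This is the standard hypergeometric argument for sampling-based auditing; the function name and the example parameters are illustrative assumptions, not values taken from the schemes discussed here.

```python
from math import comb


def detection_probability(n: int, t: int, c: int) -> float:
    """Probability that sampling c distinct blocks out of n (without
    replacement) hits at least one of the t corrupted blocks."""
    if not 0 <= t <= n or not 0 <= c <= n:
        raise ValueError("need 0 <= t <= n and 0 <= c <= n")
    # P(all c sampled blocks are intact) = C(n - t, c) / C(n, c)
    return 1 - comb(n - t, c) / comb(n, c)


# Example: 10,000 blocks, 1% of them corrupted, 460 blocks challenged.
p = detection_probability(10_000, 100, 460)
print(f"detection probability: {p:.4f}")  # roughly 0.99
```

The useful observation is that the required sample size depends on the corrupted fraction rather than on the file size, which is why a small challenge suffices even for large files.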

The HVT used in the PDP scheme and the homomorphic hash functions used in earlier schemes [20, 21] are essentially homomorphic functions. They share the following property: the HVT of the sum of two messages equals the product of the HVTs of the two messages. One obvious benefit is that, in the proof generation stage, the tags of multiple data blocks can be merged into a single tag. Verifying this aggregated tag ensures the validity of each individual tag, which greatly saves bandwidth.
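The homomorphic property can be checked with a toy tag T(m) = g^m mod N. This keeps only the g^m component of a PDP-style tag and uses deliberately tiny, insecure parameters chosen for illustration; real HVTs also bind the block index and a secret key.

```python
# Toy homomorphic tag T(m) = G^m mod N (insecure toy parameters, illustration only).
N = 3233  # 61 * 53, a tiny RSA-style modulus
G = 2


def tag(m: int) -> int:
    return pow(G, m, N)


def aggregate(tags, coeffs):
    """Merge several tags into one value: prod(T_i ^ v_i) mod N."""
    acc = 1
    for t, v in zip(tags, coeffs):
        acc = acc * pow(t, v, N) % N
    return acc


blocks = [1234, 5678, 42]
coeffs = [3, 7, 11]

# T(m1 + m2) == T(m1) * T(m2)
assert tag(blocks[0] + blocks[1]) == tag(blocks[0]) * tag(blocks[1]) % N

# One aggregated tag equals the tag of the linear combination sum(v_i * m_i),
# which is what lets a prover send a single tag instead of one per block.
mu = sum(v * m for v, m in zip(coeffs, blocks))
assert aggregate([tag(m) for m in blocks], coeffs) == tag(mu)
print("homomorphic checks passed")
```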

In the PDP scheme, Ateniese et al. first proposed the concept of public verifiability for cloud storage auditing: the audit can be conducted by a third-party auditor (TPA) or any other party, which is of special significance for the broader application of integrity auditing in the cloud environment. Generally speaking, the computing power of the data owner's devices is limited, while proof verification requires a lot of computation. If the audit task can be delegated to a third party trusted by the data owner, the owner's burden is greatly reduced. However, the TPA is usually regarded as the user's delegate, and the service provider's trust in it is limited. When the TPA and the service provider dispute a proof, how to arbitrate fairly and effectively is also an important issue.

(2) PoR scheme. Juels et al. proposed the PoR scheme in 2007 to check the integrity of remote data and to recover the data through error-correcting codes when corruption is found. The main idea of PoR is to add a number of marker blocks (sentinels) of the same length as the data blocks and then shuffle the positions of all data blocks and marker blocks with a random permutation. Because the marker blocks contain random values without semantic meaning, the data blocks must be encrypted so that marker blocks and data blocks are indistinguishable, which makes it difficult for an attacker to locate the marker blocks. During an audit, the auditor specifies the positions of some marker blocks, and the prover returns these blocks to the auditor, who checks them to determine whether the data are intact.
This idea relies on the fact that marker blocks and ciphertext data blocks are indistinguishable: if data blocks are damaged or lost, some marker blocks will, with high probability, be damaged as well.
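The sentinel idea can be illustrated with the simplified Python sketch below. It assumes the data blocks are already encrypted (random bytes stand in for ciphertext), uses Python's non-cryptographic `random.Random` for the permutation, and omits the error-correcting code; a real PoR uses a keyed pseudorandom permutation and proper encryption. The helper names `embed_sentinels` and `audit` are hypothetical.

```python
import random
import secrets

BLOCK_SIZE = 16  # bytes per block (toy value)


def embed_sentinels(data_blocks, num_sentinels, seed):
    """Append random sentinels, then permute all blocks with a seeded shuffle.

    Returns the stored file plus the verifier's secret: the sentinel values
    and their positions in the stored file.
    """
    sentinels = [secrets.token_bytes(BLOCK_SIZE) for _ in range(num_sentinels)]
    combined = list(data_blocks) + sentinels
    order = list(range(len(combined)))
    random.Random(seed).shuffle(order)
    stored = [combined[j] for j in order]  # stored[k] = combined[order[k]]
    where = {orig: new for new, orig in enumerate(order)}
    positions = [where[len(data_blocks) + s] for s in range(num_sentinels)]
    return stored, sentinels, positions


def audit(stored, sentinels, positions, unused, k, rng):
    """Challenge k not-yet-used sentinels; each sentinel can be revealed only once."""
    chosen = rng.sample(sorted(unused), k)
    unused.difference_update(chosen)
    return all(stored[positions[s]] == sentinels[s] for s in chosen)


data = [secrets.token_bytes(BLOCK_SIZE) for _ in range(100)]  # stand-in ciphertext
stored, sentinels, positions = embed_sentinels(data, num_sentinels=20, seed=2024)
unused = set(range(len(sentinels)))
print(audit(stored, sentinels, positions, unused, k=5, rng=random.Random(7)))  # True
```

The `unused` set makes limitation (b) of the PoR scheme visible: every audit consumes sentinels, so the number of audits is bounded by the number embedded.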

The advantage of Juels et al.'s PoR scheme is that it combines integrity checking with data recovery, so damaged data blocks can be recovered as soon as the damage is detected, which provides a higher level of data protection. However, the use of erasure codes and the embedding of marker blocks generally increase the amount of stored data by 15%, and the scheme has the following further limitations:
(a) To make the original data blocks and the marker blocks indistinguishable, the original data must be encrypted. Therefore, the scheme cannot be applied to auditing plaintext data, such as archived data (weather data, space data, etc.) that must be backed up for a long time.
(b) Each marker block can be used only once. Once its position is exposed, it cannot be used again in later audits, and each audit exposes the positions of some marker blocks, so the total number of audits that can be conducted is limited.
(c) Because the data are first encoded with an error-correcting code and then encrypted, PoR-based auditing schemes have great difficulty supporting data block insertion, deletion, and modification.

1.2. Motivation

Recently, Tian and Jing [1] proposed a lightweight secure auditing scheme for shared data in cloud storage. By introducing Hashgraph technology and designing a Third Party Medium (TPM) management strategy, their scheme achieves secure group management and lightweight computation for group members, and by blinding the data it protects the group members' private information. Unfortunately, in this paper we find that their scheme is not secure. Because the parameters of its signature algorithm are set improperly, an adversary can easily forge authentication labels. Even if the cloud server has deleted all the outsourced data, it can still give a correct data possession proof. A malicious cloud server can also modify data blocks and the corresponding authentication labels arbitrarily without being detected. We also notice that in the original scheme, after the authentication labels of the blinded data are uploaded to the cloud server, they must be converted into the authentication labels of the real data, which requires a large amount of computation. To solve these problems, we improve the signature algorithm of the original scheme, specifically by changing the key parameter settings. Our scheme still achieves lightweight computation, but a malicious cloud server cannot extract the private key information from the authentication labels it obtains, so it cannot forge authentication labels for forged data blocks. If the cloud server does not perform its storage duties honestly, it cannot pass the third-party auditor's integrity verification. We also improve how the cloud server processes the blinded data and their authentication labels, which reduces the computational cost and improves the efficiency of the scheme.
As the amount of medical data grows, a secure and efficient cloud storage solution is needed to manage these data and thereby reduce the storage burden on hospitals and patients. To address this need, we design a medical data management model based on our modified scheme to improve the efficiency and security of cloud-based medical data management. Our contributions can be summarized as follows:
(1) We first point out that Tian et al.'s lightweight auditing scheme for shared data in cloud storage is not secure: its authentication labels can easily be forged. We demonstrate two concrete attacks on their protocol.
(2) We propose an improved secure auditing protocol for shared data in cloud storage and analyze its security. We also analyze its performance and compare it with related work; the results show that our protocol can be used in practical settings.

1.3. Organization

The paper is organized as follows. In Section 2, we describe the system model of lightweight secure cloud storage auditing, and Sections 3 and 4 describe its data upload and audit stages. In Section 5, we review Tian et al.'s lightweight secure auditing scheme, and in Section 6 we present our attacks on it. In Section 7, we give our improved secure auditing protocol and roughly analyze its security. Section 8 describes an application to medical data, and Section 9 concludes the paper.

2. System Model

The system model of the lightweight secure cloud storage auditing protocol can be seen in Figure 1. There are four entities in this system model: group manager (GM), group member (M), the cloud (C), and the TPM.

Each group manager manages multiple group members. After the data owner creates a data file, he outsources it to the cloud server; afterwards, any group member can access and modify the corresponding shared data. The GM can be the original data owner. The functions of the four entities are as follows:
(1) The cloud (C) provides data storage services for the group members, as well as a cloud platform on which group members share data.
(2) The group member (M) (a) blinds data and (b) records the blinded data and broadcasts the record within the group.
(3) The group manager (GM) (a) specifies the TPM management strategy, (b) generates the TPM's public-private key pair, and (c) generates the secret seed, which the group members use to blind data and the cloud uses to recover the real data.
(4) The TPM (a) generates data authentication labels for the group members and (b) verifies the integrity of the cloud data on behalf of the group members.

The execution of the cloud storage auditing protocol can be described as follows:

3. Data Upload Stage

(a) The group members (data owners) generate data to be outsourced to the cloud server. The data are first blinded with the secret seed, recorded in the Hashgraph, and then sent to the group manager.
(b) According to the TPM management strategy, the group manager selects a TPM from the virtual TPM pool and authorizes it; within the authorization time, the authorized TPM computes the authentication labels for these blinded data.
(c) The authorized TPM then sends the blinded data and their authentication labels to the cloud. Before accepting these messages, the cloud checks whether the TPM's authorization is valid at the current time.
(d) If it is, the cloud verifies the correctness of the authentication labels. If they are correct, the cloud recovers the real data and computes their authentication labels. Finally, the cloud stores the real data and the authentication labels.
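A minimal Python sketch of step (a), under stated assumptions: the blind factor for block i is derived from the secret seed with HMAC-SHA256, blinding is addition modulo a prime Q (consistent with the later remark that blinding needs only an addition), and the Hashgraph record is reduced to a SHA-256 digest that the group manager recomputes. None of these concrete choices come from Tian et al.'s scheme, and the function names are hypothetical.

```python
import hashlib
import hmac

Q = 2**127 - 1  # toy prime modulus for block values (assumption)


def blind_factor(seed: bytes, index: int) -> int:
    """Hypothetical PRF choice: r_i = HMAC-SHA256(seed, i) mod Q."""
    mac = hmac.new(seed, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % Q


def blind(blocks, seed):
    """Group member: m'_i = (m_i + r_i) mod Q."""
    return [(m + blind_factor(seed, i)) % Q for i, m in enumerate(blocks)]


def unblind(blinded, seed):
    """Cloud (holding the seed): m_i = (m'_i - r_i) mod Q."""
    return [(b - blind_factor(seed, i)) % Q for i, b in enumerate(blinded)]


def record_digest(blinded) -> str:
    """SHA-256 over the blinded blocks, standing in for the Hashgraph event record."""
    h = hashlib.sha256()
    for b in blinded:
        h.update(b.to_bytes(16, "big"))  # every value is < Q < 2**128
    return h.hexdigest()


seed = b"shared-secret-seed-from-the-GM"
blocks = [11, 22, 33]
blinded = blind(blocks, seed)
digest = record_digest(blinded)          # sent to the GM with the blinded data
assert record_digest(blinded) == digest  # GM recomputes and compares
assert unblind(blinded, seed) == blocks  # cloud recovers the real data
```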

4. Audit Stage

(a) According to the TPM management strategy, the group manager selects a TPM and creates the authorization.
(b) The authorized TPM sends the challenge message to the cloud. The cloud checks whether the TPM's authorization is valid; if it is, the cloud generates a proof of possession of the shared data.
(c) Finally, the TPM verifies the integrity of the shared data in the cloud by checking the correctness of the proof.

5. Review of Tian et al.’s Scheme

Before we review Tian et al.’s scheme, we give the symbols and the corresponding description in Table 1.

In Tian et al.'s scheme, there are four parties: the group manager, the group members, the TPM, and the cloud. Concretely, their scheme involves the following algorithms:
(1) Key generation:
(a) A random is selected by the group manager as the TPM's private key. is also calculated by him.
(b) A random is selected by the group manager and is sent to the group members and the cloud.
(c) is randomly selected by the group manager, is also calculated, and the public key of is computed as follows:
(d) The group manager selects the interconnection function and function sequence, sets the input sending window and output sending window, and then sends them to the cloud.
(2) Data blind:
(a) The group members use the secret seed to calculate the blind factor, and the blind data is calculated as
(b) A group member sends a request to upload the data to the group manager, calculates (the hash value), and sends , , to the group manager securely. The group member then creates a new event.
(c) For the new event, a transaction record is used and broadcast within the group. After receiving the request, the group manager verifies according to the same hash algorithm. If the verification passes, it receives .
(3) Authorize:
(a) According to the TPM management strategy, the group manager calculates the output port in the virtual TPM pool corresponding to the input port (the requesting group member).
(b) The group manager generates the authorization message for as follows: where the group manager's identity is , denotes the time at which the group manager processes the request, and denotes the time authorized for the by the group manager.
(c) According to the authorization message, the group manager calculates the value as follows:
(d) The group manager then sends the authorization message to the cloud and sends and to .
(4) Authentication label generation:
(a) After getting the blind data block , generates the authentication label of as follows: e.g., 's authentication label is generated as Then, sends the data file and to the cloud.
(b) After receiving the corresponding group manager's authorization message and the corresponding 's , the cloud first calculates the output port. If the message was sent by at , the cloud calculates and compares it with the value from . If they are the same, the next algorithm is run; otherwise, execution stops.
(5) Authentication label check: the cloud verifies the correctness of label as follows: is received and stored if the above equation holds; otherwise, it is rejected.
(6) Data recovery: based on , the cloud calculates and computes the real data using the following equation: According to , the cloud calculates the real authenticator label, i.e., Finally, the cloud stores the real data blocks and their real authenticators.
(7) Challenge: when the group manager initiates a challenge to the cloud, he randomly selects as the authorization time to , where 's sending window on the input side is . The group manager sends an audit authorization command to the through at , and it sends as the audit authorization information to the cloud. The implements the audit process after receiving the group manager's authorization command, as follows:
(a) randomly selects blocks from all blocks of the shared data, and the indexes of the selected blocks are denoted as
(b) Two random numbers are generated by , and and are calculated.
(c) is calculated by
(d) outputs the challenge information: Then, sends to the cloud.
(8) Proof generation: after receiving the challenge information, the cloud first calculates the output port TPM according to and verifies the authorization message. The cloud then generates the proof of possessing shared data as follows:
(a) The index set is divided into subsets , where the selected blocks that are signed by are
(b) The cloud server calculates and for each subset ; here, and .
(c) The cloud server calculates and , and then is as follows; it is sent to as the proof.
(9) Proof check: based on the received and the challenge message , verifies the correctness of the following equation: where . That is, it checks outputs if the equation is true; otherwise, it outputs .
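Because the equations of the scheme are not reproduced above, the Python sketch below uses a hypothetical stand-in to show the shape of the challenge-response audit: tags of the form σ_i = (H(i)·g^{m_i})^x mod p, a challenge of c random indices with random coefficients, an aggregated proof (μ, σ), and a consistency check. Unlike a pairing-based scheme, this toy verifier holds the secret x (private verification), and the modulus is not a secure parameter choice.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (Mersenne prime M127); not a secure choice
G = 3


def H(i: int) -> int:
    """Toy hash-to-group: G raised to SHA-256(i) (assumption, illustration only)."""
    e = int.from_bytes(hashlib.sha256(b"idx" + i.to_bytes(8, "big")).digest(), "big")
    return pow(G, e, P)


def make_tag(x: int, i: int, m: int) -> int:
    """Hypothetical tag sigma_i = (H(i) * G^m_i)^x mod P."""
    return pow(H(i) * pow(G, m, P) % P, x, P)


def challenge(n: int, c: int):
    """Auditor picks c distinct indices and a random coefficient for each."""
    rng = secrets.SystemRandom()
    return [(i, rng.randrange(1, 2**32)) for i in rng.sample(range(n), c)]


def prove(blocks, tags, chal):
    """Server: mu = sum(v_i * m_i), sigma = prod(sigma_i ^ v_i) mod P."""
    mu = sum(v * blocks[i] for i, v in chal)
    sigma = 1
    for i, v in chal:
        sigma = sigma * pow(tags[i], v, P) % P
    return mu, sigma


def verify(x: int, chal, mu: int, sigma: int) -> bool:
    """Toy private verification: sigma == (prod(H(i)^v_i) * G^mu)^x mod P."""
    acc = pow(G, mu, P)
    for i, v in chal:
        acc = acc * pow(H(i), v, P) % P
    return sigma == pow(acc, x, P)


x = secrets.randbelow(P - 2) + 1  # the TPM's secret key in this toy
blocks = [secrets.randbelow(2**64) for _ in range(50)]
tags = [make_tag(x, i, m) for i, m in enumerate(blocks)]
chal = challenge(len(blocks), 10)
mu, sigma = prove(blocks, tags, chal)
print(verify(x, chal, mu, sigma))  # honest server: True
```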

6. Our Attack

6.1. Attack I

Attack I is based on the following observation: the public key of is and is known to everyone, so the adversary can easily use it to forge authentication labels. Concretely, the adversary launches the following attack:
(1) By querying the authentication label generation oracle, which is allowed in the security model of cloud auditing, the adversary observes many data blocks and their corresponding authentication labels. That is, the adversary obtains the following results:
(2) The above equations can be rewritten as
(3) Since the adversary knows the public key of , it can compute
(4) The adversary (a malicious cloud server) now replaces data blocks with arbitrary other data blocks and computes their authentication labels as shown in the next step.
(5) Since the adversary knows and , it can compute the forged authentication labels for the modified data blocks as follows:
(6) Since , , are correct authentication labels, the adversary only needs to follow the protocol specification in the challenge-response auditing protocol, using these modified data blocks and forged authentication labels. The forged proof and aggregated data blocks then pass the verification equation.
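The principle behind Attack I can be illustrated on a hypothetical toy tag σ_i = (H(i)·g^{m_i})^x mod p, assuming that Y = g^x is published as part of the public key. This is not Tian et al.'s exact construction (its equations are not reproduced here); it only shows why publishing such a value lets anyone shift a valid tag to a new block value, since σ_i·Y^{m'−m} = (H(i)·g^{m'})^x.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus; not a secure parameter choice
G = 3


def H(i: int) -> int:
    """Toy hash-to-group (assumption, illustration only)."""
    e = int.from_bytes(hashlib.sha256(b"idx" + i.to_bytes(8, "big")).digest(), "big")
    return pow(G, e, P)


def make_tag(x: int, i: int, m: int) -> int:
    """Hypothetical honest tag sigma_i = (H(i) * G^m_i)^x mod P."""
    return pow(H(i) * pow(G, m, P) % P, x, P)


def forge_tag(Y: int, m_old: int, tag_old: int, m_new: int) -> int:
    """Shift a valid tag from m_old to m_new using only the public value Y = G^x."""
    # Exponents work modulo P - 1, so a negative difference is reduced first.
    return tag_old * pow(Y, (m_new - m_old) % (P - 1), P) % P


# Honest setup: the secret x is used only to create the original tag.
x = secrets.randbelow(P - 2) + 1
Y = pow(G, x, P)  # assumed to be public
i, m = 5, 1000
tag = make_tag(x, i, m)

# Adversary: knows (i, m, tag) and Y, but not x.
m_fake = 999_999
forged = forge_tag(Y, m, tag, m_fake)
assert forged == make_tag(x, i, m_fake)  # indistinguishable from an honest tag
```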

6.2. Attack II

Attack II is based on the following observation: even if the cloud server stores no data blocks, it can directly forge proofs that pass the verification equation. Concretely, the attack proceeds as follows:
(1) In the cloud storage auditing protocol, all steps before proof generation run as normal. The adversary (a malicious cloud) does nothing in these steps except that, at the end, it deletes all the stored blocks and their corresponding authentication labels.
(2) Proof generation: after receiving the challenge information, the malicious cloud first calculates the output port TPM according to and verifies the authorization message. It then generates the proof of possessing shared data as follows:
(a) The index set is divided into subsets , where the selected blocks that are signed by are
(b) Let be randomly selected from , and let and be calculated by the cloud server for each subset ; here, and . Here, refers to the forged authentication label, which the malicious cloud server can calculate as in Attack I.
(c) The cloud server calculates and , and then is as follows; it is sent to as the proof.
(3) Proof check: based on the received and the challenge message , verifies the correctness of the following equation: where . That is, it checks outputs if the equation is true; otherwise, it outputs . We now show why this attack succeeds:
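Attack II can be sketched on the same hypothetical toy tag σ_i = (H(i)·g^{m_i})^x mod p with an assumed public Y = g^x. Once the server has stripped each observed tag down to K_i = σ_i / Y^{m_i} = H(i)^x, it can delete every data block and still answer any challenge with an arbitrary μ, because σ = ∏ K_i^{v_i} · Y^μ satisfies the toy check. The toy still keeps one group element per index; the point is only that no data are needed. This is an illustration of the principle, not a reproduction of the scheme's equations.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus; not a secure parameter choice
G = 3


def H(i: int) -> int:
    """Toy hash-to-group (assumption, illustration only)."""
    e = int.from_bytes(hashlib.sha256(b"idx" + i.to_bytes(8, "big")).digest(), "big")
    return pow(G, e, P)


def make_tag(x: int, i: int, m: int) -> int:
    return pow(H(i) * pow(G, m, P) % P, x, P)


def verify(x: int, chal, mu: int, sigma: int) -> bool:
    """Toy private verification: sigma == (prod(H(i)^v_i) * G^mu)^x mod P."""
    acc = pow(G, mu, P)
    for i, v in chal:
        acc = acc * pow(H(i), v, P) % P
    return sigma == pow(acc, x, P)


def strip_tag(Y: int, m: int, tag: int) -> int:
    """K_i = sigma_i / Y^m_i = H(i)^x, computed from one observed (m_i, sigma_i)."""
    return tag * pow(Y, (-m) % (P - 1), P) % P


def fake_proof(K, Y: int, chal):
    """Answer a challenge with an arbitrary mu and no data blocks at all."""
    mu = secrets.randbelow(2**64)
    sigma = pow(Y, mu, P)
    for i, v in chal:
        sigma = sigma * pow(K[i], v, P) % P
    return mu, sigma


x = secrets.randbelow(P - 2) + 1
Y = pow(G, x, P)  # assumed to be public
blocks = [secrets.randbelow(2**64) for _ in range(30)]
tags = [make_tag(x, i, m) for i, m in enumerate(blocks)]

K = {i: strip_tag(Y, m, t) for i, (m, t) in enumerate(zip(blocks, tags))}
del blocks, tags  # the malicious server discards all data and the real tags

chal = [(3, 17), (8, 5), (21, 99)]
mu, sigma = fake_proof(K, Y, chal)
print(verify(x, chal, mu, sigma))  # True, although no data were kept
```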

7. Our Improved Secure Auditing Protocol for Shared Data

In this section, we give an improved secure auditing protocol for shared data, as follows:
(1) Key generation:
(a) A random is selected by the group manager as the TPM's private key. is also calculated by him.
(b) A random is selected by the group manager and is sent to the group members and the cloud.
(c) is randomly selected by the group manager and is also calculated, and the public key of is computed as follows:
(d) The group manager selects the interconnection function and function sequence, sets the input sending window and output sending window, and then sends them to the cloud.
(2) Data blind:
(a) The group members use the secret seed to calculate the blind factor, and the blind data is calculated as
(b) A group member sends a request to upload the data to the group manager, calculates (the hash value), and sends , to the group manager securely. The group member then creates a new event.
(c) For the new event, a transaction record is used and broadcast within the group. After receiving the request, the group manager verifies according to the same hash algorithm. If the verification passes, it receives .
(3) Authorize:
(a) According to the TPM management strategy, the group manager calculates the output port in the virtual TPM pool corresponding to the input port (the requesting group member).
(b) The group manager generates the authorization message for as follows: where the group manager's identity is , denotes the time at which the group manager processes the request, and denotes the time authorized for the by the group manager.
(c) According to the authorization message, the group manager calculates the value as follows:
(d) The group manager then sends the authorization message to the cloud and sends and to .
(4) Authentication label generation:
(a) After getting the blind data block , generates the authentication label of as follows: e.g., 's authentication label is generated. Note that the public key of the TPM is not the previous ; in our scheme, not only is secret, but is also secret. Therefore, a malicious cloud server cannot obtain the value of from the TPM's public key and a large number of authentication labels . Without the values of and , the adversary cannot forge authentication labels, so our scheme effectively resists Attack I. Then, sends the data file and to the cloud.
(b) After receiving the corresponding group manager's authorization message and the corresponding 's , the cloud first calculates the output port. If the message was sent by at , the cloud calculates and compares it with the value from . If they are the same, the next algorithm is run; otherwise, execution stops.
(5) Authentication label check: the cloud verifies the correctness of label as follows: is received and stored if the above equation holds; otherwise, it is rejected.
(6) Data recovery: based on , the cloud calculates and computes the real data using the following equation: Note that the cloud server stores the original data together with the authentication labels of the corresponding blinded data blocks, rather than the original data blocks with their own authentication labels. In this way, the cloud server does not need to compute the authentication labels of the original data blocks. Finally, the cloud stores the real data blocks and the corresponding blinded authenticators.
(7) Challenge: when the group manager initiates a challenge to the cloud, he randomly selects as the authorization time to , where 's sending window on the input side is . The group manager sends an audit authorization command to the through at , and it sends as the audit authorization information to the cloud. The implements the audit process after receiving the group manager's authorization command, as follows:
(a) randomly selects blocks from all blocks of the shared data, and the indexes of the selected blocks are denoted as
(b) Two random numbers are generated by , and and are calculated.
(c) is calculated by
(d) outputs the challenge information: Then, sends to the cloud.
(8) Proof generation: after receiving the challenge information, the cloud first calculates the output port TPM according to and verifies the authorization message. The cloud then generates the proof of possessing shared data as follows:
(a) The index set is divided into subsets , where the selected blocks that are signed by are
(b) Based on , the cloud calculates and also computes ; it then computes and for each subset ; here, and . Unlike the original scheme, we blind the data again and use the blinded data blocks and their corresponding authentication labels to generate the proof. This replaces the operation of restoring the real authentication labels with the operation of blinding the data. Recovering the real authentication labels requires many multiplications and exponentiations, whereas blinding the data requires only an addition operation. Since addition is computationally cheaper, our scheme improves computational efficiency compared with the original one.
(c) The cloud server calculates and , and then is as follows; it is sent to as the proof.
(9) Proof check: based on the received and the challenge message , verifies the correctness of the following equation: where . That is, it checks outputs if the equation is true; otherwise, it outputs .
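The blind-then-aggregate step of the improved proof generation can be sketched with a hypothetical toy tag computed on blinded blocks m'_i = m_i + r_i. The server keeps the real blocks m_i and the blinded tags; at proof time it re-derives r_i from the seed and spends one addition per challenged block instead of converting tags. The HMAC-based blind factor, the tag form (H(i)·g^{m'_i})^x mod p, and the private-verification check are assumptions for illustration; the sketch demonstrates only the correctness of this proof path, not the revised key setting that resists Attack I.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # toy prime modulus; not a secure parameter choice
G = 3


def H(i: int) -> int:
    """Toy hash-to-group (assumption, illustration only)."""
    e = int.from_bytes(hashlib.sha256(b"idx" + i.to_bytes(8, "big")).digest(), "big")
    return pow(G, e, P)


def blind_factor(seed: bytes, i: int) -> int:
    """Hypothetical blind factor r_i = HMAC-SHA256(seed, i) as an integer."""
    return int.from_bytes(hmac.new(seed, i.to_bytes(8, "big"), hashlib.sha256).digest(), "big")


def make_tag(x: int, i: int, m: int) -> int:
    return pow(H(i) * pow(G, m, P) % P, x, P)


def verify(x: int, chal, mu: int, sigma: int) -> bool:
    acc = pow(G, mu, P)
    for i, v in chal:
        acc = acc * pow(H(i), v, P) % P
    return sigma == pow(acc, x, P)


def prove_blinded(real_blocks, blinded_tags, seed: bytes, chal):
    """Re-blind each challenged real block (one addition), then aggregate."""
    mu, sigma = 0, 1
    for i, v in chal:
        m_blinded = real_blocks[i] + blind_factor(seed, i)  # plain integer addition
        mu += v * m_blinded
        sigma = sigma * pow(blinded_tags[i], v, P) % P
    return mu, sigma


seed = b"shared-secret-seed-from-the-GM"
x = secrets.randbelow(P - 2) + 1
real = [secrets.randbelow(2**64) for _ in range(40)]
# The TPM tags the blinded blocks; the cloud stores these tags unchanged
# alongside the recovered real blocks.
blinded_tags = [make_tag(x, i, m + blind_factor(seed, i)) for i, m in enumerate(real)]

rng = secrets.SystemRandom()
chal = [(i, rng.randrange(1, 2**32)) for i in rng.sample(range(40), 8)]
mu, sigma = prove_blinded(real, blinded_tags, seed, chal)
print(verify(x, chal, mu, sigma))  # True
```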

Here, we show the correctness of our protocol:

This derivation shows that the protocol remains correct. At the same time, because of our revised setting of the TPM's public key parameters, the adversary cannot forge authentication labels at will. Therefore, if the cloud server generates a proof dishonestly, it cannot pass the TPM's verification, so our scheme effectively resists Attack II.

8. Application

We now apply this system model to a real-life scenario, as shown in Figure 2. Take a hospital as an example: patient information and other important information in the hospital must be kept confidential. When the volume of patient information becomes too large, the hospital needs to upload the data to a cloud platform and take advantage of the platform's convenience. To ensure data integrity, the system model is designed as shown in the figure. First, doctors use the hospital's computers to blind the medical data, integrate the blinded data, and upload them to a TPM. The hospital designs the TPM management strategy, and the two parties exchange data and information encrypted with public-private keys. The TPM processes the blinded data, generates authentication labels, and then sends the blinded data and the authentication labels to the cloud server. Before this, the hospital must send the authorization message to the TPM and the cloud server so that they can handle the blinded data accordingly. The cloud server that receives the blinded data then restores the data and stores the original data in the cloud database together with the authentication labels. When the hospital wants to verify the integrity of the data, it uses a third-party TPM to do so. In the challenge-response process, the cloud verifies the TPM's authorization and, if the verification succeeds, returns a proof of data possession to the TPM. Throughout the process, the hospital's computational cost is reduced, and the integrity of the data can be verified. Compared with the original scheme, security and efficiency are improved to a certain extent.

9. Conclusion

In this paper, we point out that Tian et al.'s lightweight secure auditing scheme for shared data in cloud storage is not secure. Its authentication labels can easily be forged, so the cloud server can launch the following attacks: modify data arbitrarily, delete data arbitrarily, and add data arbitrarily. In all these attacks, the cloud server can still give correct data possession proofs, which invalidates the security of the cloud auditing protocol. Then, by optimizing the method of generating the TPM's public key and the framework of the data integrity proof protocol, we give an improved secure cloud storage auditing protocol for shared data. Our comparative analysis shows that, to a certain extent, the improved scheme offers better security and efficiency. Finally, we present an application scenario in the medical field.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Key Research and Development Program of China under Grant No. 2017YFB0802000, National Natural Science Foundation of China (Grant Nos. 61572521, U1636114), National Cryptography Development Fund of China under Grant No. MMJJ20170112, and an Open Project from Guizhou Provincial Key Laboratory of Public Big Data under Grant No. 2019BDKFJJ008. This work is also supported by the Engineering University of PAP's Funding for Scientific Research Innovation Team (No. KYTD201805) and Engineering University of PAP's Funding for Key Researcher (No. KYGG202011).