Abstract

With the advent of data outsourcing, how to efficiently verify the integrity of data stored at an untrusted cloud service provider (CSP) has become a significant problem in cloud storage. In 2019, Guo et al. proposed an outsourced dynamic provable data possession scheme with batch update for secure cloud storage. Although their scheme is very novel, in this paper we show that their proposal is not secure: the malicious cloud server has the ability to forge the authentication labels, and thus it can forge or delete the user's data but still provide a correct data possession proof. Based on the original protocol, we propose an improved auditing protocol, which is efficient yet resistant to the presented attacks.

1. Introduction

Since 2007, as one of the most interesting topics in the computer field, cloud computing has experienced rapid development and has become a key research direction for large-scale enterprises and institutions. Its high flexibility, scalability, high performance-to-cost ratio, and other characteristics enable it to serve storage, healthcare, finance, education, and other fields [1–3]. Among them, cloud storage is an emerging technology that develops cloud computing in the direction of data storage [4]. Compared with traditional data storage methods, cloud storage has the advantages of high performance and low cost. Cloud storage takes data storage and data management as its basic functions, allowing users to connect from any location, store local data and information on the cloud, and manage their resources conveniently.

However, with the widespread application of cloud storage technology, its security has received more and more attention from users and has gradually become the key to the sustainable development of cloud storage. On the one hand, cloud service providers (CSPs) may delete users' stored data in order to free up storage space for their own benefit, or may try to pry into users' private data [5]. On the other hand, the CSP is highly open and complex, and it easily becomes a prime target of various malicious attacks, leading to the loss, leakage, tampering, or damage of users' data. Therefore, cloud storage integrity auditing has emerged to solve this problem: users regularly audit the integrity of their own data stored in the cloud, discover whether the data have been discarded or tampered with, and take corresponding remedial measures.

1.1. Related Work

In the early years, cloud-audit-related research mainly concerned the integrity verification of remote data: users do not hold the original data and can only verify the integrity of the data stored on the cloud server through a protocol. In 2003, Deswarte et al. [6] proposed the first audit scheme that supports remote data integrity verification. The scheme is based on the Diffie–Hellman key exchange protocol, using the homomorphic property of RSA signatures and the hardness of computing discrete logarithms as its security basis. The entire file is represented by a large number and then subjected to modular exponentiation to achieve remote data integrity auditing. However, this solution generates a large computational overhead, which is a heavy burden for users. In 2006, Filho et al. [7], building on RSA's homomorphic hash function, used the hash function to compress large data files into small hash values before performing the operations. This scheme reduces the computation cost, but it is still not suitable for large-scale data storage in a cloud storage environment; its biggest contribution is establishing the important role of homomorphic hash functions in remote data integrity verification. In 2008, Sebe et al. [8] improved the previous schemes based on the idea of partitioning. Their scheme divides a large data file into blocks and then computes over each data block, which greatly reduces the computational expense. But the prover still needs to access all the data when generating the evidence, so this scheme is also not suitable for large data files.

The above schemes all require the user, as the verifier, to maintain a set of metadata for verification. On the one hand, users can easily lose or leak these metadata, which leads to the disclosure of private data. On the other hand, for users with limited computing resources, huge amounts of outsourced data increase the computational overhead of the audit process. In addition, in the event of a data corruption accident, the user and the CSP may shift responsibility onto each other and cannot provide effective evidence of who should be held responsible. Thus, scholars introduced an impartial third-party auditor to audit on behalf of users. Auditors are more professional than users in terms of data preservation and computing performance, and in the event of an accident, they can assign accountability and resolve the problem in a fair manner. Therefore, audit schemes have gradually changed from private audits between the user and the CSP to public audits among users, the CSP, and third-party auditors (TPAs). In 2007, Shah et al. [9] proposed a public audit scheme, based on the hardness of computing discrete logarithms, to audit the integrity of ciphertext data and keys. The scheme uses a keyed hash function to precompute a certain number of response values stored by the auditor. During the audit process, the auditor only needs to match the evidence provided by the server against the prestored response values. However, the number of audits in this scheme is limited by the number of prestored response values.

The amount of computation required to audit the integrity of all data is a considerable expense even for professional third-party auditors. Scholars have been studying how to increase audit efficiency to reduce the computational overhead, but from another angle, reducing the amount of data that needs to be audited can also achieve this goal. In 2005, Naor and Rothblum [10] proposed an online memory checking scheme, which studied sublinear authentication and proposed related authentication protocols. The basic idea of sublinear authentication is to verify the integrity of all the original data by verifying the integrity of a small, randomly specified part of the data blocks. In 2007, Ateniese et al. [11] proposed the first probabilistic provable data possession (PDP) auditing scheme that is both secure and practical. The scheme is based on RSA homomorphic authentication tags and realizes the auditing of outsourced data. The metadata of multiple data blocks can be aggregated into one value, which effectively reduces the communication overhead, and a random sampling strategy is adopted to check the user's remote data instead of verifying all of it, so the computation cost is also effectively reduced.

With the continuous improvement of audit schemes, additional requirements have been raised, for example, support for privacy protection or batch auditing. In 2010, Wang et al. [12] proposed the first audit scheme supporting privacy protection, obtained by integrating homomorphic authentication tags with random mask technology, in which bilinear signatures are used to support batch auditing. In 2013, Yang et al. [13] proposed an audit solution based on index table technology that supports dynamic data updates, where tag aggregation is used to process multiple audit requests from different users and thereby support batch auditing in a multi-user, multi-cloud environment. In 2015, Hui et al. [14] proposed a public audit scheme based on a dynamic hash table (DHT), which records the attribute information of data blocks to support dynamic data updates and improve efficiency. The scheme also supports privacy protection and batch auditing.

In 2007, Juels and Kaliski [15] proposed the original proof of retrievability (POR) scheme. Different from the PDP schemes described above, a POR scheme can repair corrupted data once damage is detected. The scheme uses sampling and error-correcting codes to perform fault-tolerant preprocessing on the outsourced data files, so the data can be restored with a certain probability when damaged. In 2008, Shacham and Waters [16] proposed a compact POR scheme. It draws on the idea of homomorphic authentication tags and effectively aggregates the evidence into a smaller value, allowing the verifier to perform any number of audits while also reducing the communication overhead of the verification process. POR and PDP have their own application scenarios: the former can recover damaged data, and the latter is more flexible and can be applied to privacy protection, dynamic auditing, and batch auditing. Cloud auditing schemes are constantly being improved based on users' needs. While studying how to reduce computation and communication costs, scholars also try to extend functionality horizontally or combine auditing with different technologies for innovation. In 2013, Zhao et al. [17] proposed the first identity-based cloud audit scheme, which uses random mask technology to achieve privacy protection. In an identity-based cloud auditing scheme, only the private key generator (PKG) holds a public-private key pair and its public key certificate; the public keys of other users can be calculated from their identity information, and their private keys are generated by the PKG, which reduces the computation and communication overhead of the scheme. In 2015, Zhang and Dong [18] proposed the first certificateless cloud audit scheme that can resist malicious auditors, thereby introducing the concept of malicious auditors into cloud auditing for the first time. Certificateless cloud audit schemes solve the certificate management problem of certificate-based schemes and the key escrow problem of identity-based schemes. In 2016, Xin et al. [19] combined transparent watermarking technology with an auditing scheme, proposing a scheme to audit the integrity of static multimedia data, which can greatly reduce the computation and storage costs for multimedia data.

1.2. Our Contribution

Recently, an outsourced dynamic provable data possession scheme with batch update for secure cloud storage (ODPDP) was proposed by Guo et al. [20]. However, we find that there are security problems in their scheme. The adversary can easily forge authentication labels: even if all the outsourced data have been deleted by the cloud server, the CSP can still give a correct data possession proof. Moreover, a malicious auditor can skip the auditing work entirely and conspire with the cloud server to forge the audit log and deceive the client. Finally, we propose an improved secure auditing protocol, and a rough analysis shows that our new protocol is secure and can be used in practical settings.

1.3. Organization

This paper is organized as follows. In Section 2, we describe the system model of the scheme. In Section 3, we review Guo et al.'s outsourced dynamic provable data possession scheme with batch update for secure cloud storage. In Section 4, we present our attacks on the original scheme to show that it is not secure. In Section 5, we give our improved secure auditing scheme and briefly analyze its security. Finally, in Section 6, we draw some conclusions.

2. System Model

First of all, for ease of understanding, the notations used in this paper and their corresponding meanings are described in Table 1.

There are three entities in the system model of the ODPDP scheme, as depicted in Figure 1: the CSP, the client, and the auditor.
(1) CSP (cloud service provider): the service provider, which has abundant computing power and physical storage capacity and maintains and manages the data received from the client. This party is honest but curious.
(2) Client: the data owner, who outsources the data that needs to be computed and stored to the CSP, is concerned about the integrity of the outsourced data, and regularly checks whether the auditor is honest in the audit work.
(3) Auditor: the third-party auditor, who accepts the audit task from the client and is responsible for ensuring the integrity of the client's data stored at the CSP.

The protocols used in the ODPDP scheme are as follows (a toy sketch of this message flow is given after the list):
(1) Setup: the randomized key generation protocol. Each participant inputs a security parameter K and obtains a pair of signing-verifying keys (SKP, VKP). For convenience of expression, we assume that every participant in each subsequent protocol always takes the other owners' public keys and its own secret key as input.
(2) Store: the interactive protocol among the three parties. It takes the keys of the three participants and the data M owned by the client as input and outputs the processed data for the CSP. The tag vector Σ of M is generated by the client with her secret key. For the auditor, it outputs an RBMT T built on M. Besides, it outputs a public parameter P confirmed by the three participants and a contract C between the client and the auditor.
(3) AuditData: the interactive protocol between the CSP and the auditor, which assures the auditor that M is stored intact at the CSP. The auditor uses the Bitcoin blockchain to extract a pseudo-random challenge and sends it to the CSP. The CSP computes a proof of data possession based on the challenge and M and sends it to the auditor for verification. The auditor verifies the proof and outputs a binary value indicating whether it accepts the proof, together with a log entry L recording the auditing behavior.
(4) AuditLog: the interactive protocol between the client and the auditor, which helps the client audit a log file consisting of the log entries recorded by the auditor. The aim of this protocol is to check whether the auditor accomplished the auditing task or not. After the auditor receives the random subset B of Bitcoin block indices released by the client, it calculates the proof of the specified logs from its log file and sends the proof to the client. The client checks the received proof and outputs a binary value indicating whether it accepts the proof. Compared with the AuditData protocol, this protocol runs far less frequently and is computationally much cheaper.
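To make the above flow concrete, the following toy sketch models the parties and messages in Python. All type and method names are our own illustrative choices and the cryptographic operations are stubbed out; it is a reading aid, not the paper's actual interface.

```python
# Hypothetical message types and party roles for the ODPDP protocols.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Challenge:
    btc_index: int        # Bitcoin block index the challenge derives from
    k1: bytes             # pseudo-random key selecting challenged indices
    k2: bytes             # pseudo-random key selecting random coefficients

@dataclass
class Proof:
    mu: Tuple[int, ...]   # aggregated sector values
    sigma: int            # aggregated authentication tag

class CSP:
    def store(self, blocks, tags) -> None:           # Store protocol
        self.blocks, self.tags = blocks, tags
    def prove(self, chal: Challenge) -> Proof:       # AuditData, CSP side
        return Proof(mu=(), sigma=1)                 # stub; see Section 3.3

class Auditor:
    def __init__(self) -> None:
        self.log: List[tuple] = []                   # later checked in AuditLog
    def audit_data(self, csp: CSP, chal: Challenge) -> bool:
        proof = csp.prove(chal)
        accepted = True                              # stands in for verification
        self.log.append((chal.btc_index, proof, accepted))
        return accepted
```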

3. Review of Guo et al.’s Scheme

In Guo et al.'s scheme, three parties are involved, namely, the user, the auditor, and the CSP. The scheme uses the rank-based Merkle tree (RBMT) to protect the integrity of the data block hashes, while the hash values and tags protect the integrity of the data blocks themselves. The authors proposed a multi-leaf-authenticated (MLA) solution for RBMT to authenticate multiple leaf nodes and their indices all together without storing status values and height values. At the same time, they proposed an efficient homomorphic verifiable tag (EHVT) based on the BLS signature to reduce clients' log verification effort. For the specific implementation of these techniques, one can refer to the original paper [20]. Concretely, the following protocols are involved in their scheme.

3.1. Setup Protocol

Each participant runs the key generation algorithm to obtain its signing-verifying key pair. In addition, the client samples random elements and computes the corresponding public values. Then, the client chooses a random element that serves as her secret key, and the corresponding public key is computed from it.

3.2. Store Protocol

The data file is divided into n data blocks, and each data block consists of s sectors; that is, each block has the form m_i = m_{i1}‖m_{i2}‖⋯‖m_{is}, where ‖ denotes concatenation.

Constructing RBMT. With all data blocks, the client first computes the hash values h(m_1), …, h(m_n). Then, the client constructs the RBMT on top of the ordered hash values, meaning that each leaf node stores the corresponding hash value h(m_i).
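As a reading aid, the following is a minimal sketch of building a Merkle root over the ordered block hashes; the paper's RBMT additionally stores rank information at internal nodes, which is omitted here.

```python
# Minimal Merkle root over ordered block hashes (no rank information).
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks: list) -> bytes:
    level = [h(b) for b in blocks]                # leaf i stores h(m_i)
    while len(level) > 1:
        if len(level) % 2:                        # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"m1", b"m2", b"m3", b"m4"])  # Merkle root agreed in P
```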

Computing EHVT. Based on each data block m_i, its hash value, and her secret key, the client computes the authentication tag σ_i of m_i.

Then, the client generates the processed data, which consists of the data blocks together with their tags σ_1, …, σ_n.
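The concrete EHVT formula is given in [20]. As a rough guide to its shape, the following toy sketch computes a Shacham-Waters-style homomorphic tag over a small multiplicative group; the modulus, generator, and tag formula are illustrative assumptions only.

```python
# Toy homomorphic tag of the general EHVT shape; the real scheme works in
# a pairing-friendly group with BLS signatures, here replaced by
# arithmetic modulo a (Mersenne) prime for illustration.
import hashlib

p = 2**127 - 1                 # toy group modulus (assumption, not the paper's)
g = 3
alpha = 123456789              # client's secret key
u = [pow(g, e, p) for e in (17, 29, 41)]   # public elements u_1..u_s (s = 3)

def H(i: int) -> int:          # hash a block index into the group
    d = hashlib.sha256(str(i).encode()).digest()
    return pow(g, int.from_bytes(d, "big") % p, p)

def tag(i: int, sectors: list) -> int:
    """sigma_i = (H(i) * prod_j u_j^{m_ij})^alpha, one common tag shape."""
    base = H(i)
    for u_j, m_ij in zip(u, sectors):
        base = base * pow(u_j, m_ij, p) % p
    return pow(base, alpha, p)

sigma_1 = tag(1, [5, 7, 11])   # authentication tag for block m_1
```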

3.2.1. Outsourcing Data

The client sends the processed data and its signature to CSP. CSP verifies the signature, and if the verification passes, CSP accepts the data.

3.2.2. Outsourcing Auditing Work

The auditing work is outsourced to the auditor, and the RBMT T is sent to the auditor together with its signature. Then, the auditor verifies the signature.

3.2.3. Agreeing Parameters

A public parameter needs to be agreed on by the three participants; it contains the number of data blocks and the Merkle root of T. In addition, the client and the auditor also need to agree on a contract C that specifies the auditor's checking policy: the Bitcoin block index from which the auditing work starts, the auditing frequency, and the number of challenged data blocks for each check.

Then, the client deletes the data file and the tags from her local storage and maintains only a constant amount of metadata.

3.3. AuditData Protocol

The scheme leverages the Bitcoin blockchain as a time-dependent pseudo-random source to generate periodic challenges. The auditor inputs the time t to obtain the hash value of the latest Bitcoin block that has appeared by time t. Then, a PRBG is invoked on this hash value to acquire pseudo-random bits, which the auditor uses to select a pair of keys. At last, the auditor generates a challenge from these keys and sends it to CSP; the challenge is bound to the Bitcoin block corresponding to the time t.
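As an illustration of this step, the following sketch derives a pair of challenge keys from a placeholder Bitcoin block hash, with SHAKE-256 standing in for the PRBG; the key sizes are illustrative assumptions.

```python
# Derive per-period challenge keys from a (placeholder) Bitcoin block hash.
import hashlib

def derive_challenge_keys(btc_block_hash: bytes):
    stream = hashlib.shake_256(btc_block_hash).digest(64)  # pseudo-random bits
    k1, k2 = stream[:32], stream[32:]   # keys for indices and coefficients
    return k1, k2

k1, k2 = derive_challenge_keys(bytes.fromhex("00" * 32))
```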

Upon receiving the challenge, CSP first computes the challenged indices and coefficients from the two keys contained in it.

Then, CSP computes the proof of data possession, which demonstrates the integrity of the challenged data blocks.

Finally, CSP responds to the auditor with the proof, and the auditor verifies its correctness. First, the auditor computes the challenged indices and coefficients. Second, the auditor computes the verification value from the corresponding hash values stored in T.

Third, the auditor verifies the proof by checking the verification equation.

If the equation holds, the auditor is assured that the challenged data blocks are intact. Lastly, the auditor saves a log entry in the log file to record the auditing work.
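Putting the pieces together, the following toy sketch runs the whole challenge/prove/verify flow in the Shacham-Waters style that proofs of this shape generally follow. The group, the tag formula, and all parameters are illustrative assumptions rather than the paper's exact construction; in particular, the final check here uses the secret key directly, whereas the real scheme checks a pairing equation with only the public key.

```python
# Toy end-to-end challenge/prove/verify flow (assumed tag shape).
import hashlib
import random

p, g, s = 2**127 - 1, 3, 3
alpha = 123456789                                    # client's secret key
u = [pow(g, e, p) for e in (17, 29, 41)]             # public elements u_1..u_s

def H(i: int) -> int:                                # hash block index to group
    d = hashlib.sha256(str(i).encode()).digest()
    return pow(g, int.from_bytes(d, "big") % p, p)

def tag(i, sectors):                                 # assumed EHVT shape
    base = H(i)
    for u_j, m in zip(u, sectors):
        base = base * pow(u_j, m, p) % p
    return pow(base, alpha, p)

# Store: CSP holds the blocks and their tags.
blocks = {i: [random.randrange(p - 1) for _ in range(s)] for i in range(8)}
tags = {i: tag(i, blocks[i]) for i in blocks}

# Challenge: indices and coefficients from the pseudo-random keys.
rng = random.Random(b"bits derived from the Bitcoin block hash")
chal = [(i, rng.randrange(1, p - 1)) for i in rng.sample(sorted(blocks), 4)]

# Prove: mu_j = sum_i nu_i * m_ij and sigma = prod_i sigma_i^{nu_i}.
mu = [sum(nu * blocks[i][j] for i, nu in chal) % (p - 1) for j in range(s)]
sigma = 1
for i, nu in chal:
    sigma = sigma * pow(tags[i], nu, p) % p

# Verify: sigma == (prod_i H(i)^{nu_i} * prod_j u_j^{mu_j})^alpha.
rhs = 1
for i, nu in chal:
    rhs = rhs * pow(H(i), nu, p) % p
for u_j, mu_j in zip(u, mu):
    rhs = rhs * pow(u_j, mu_j, p) % p
assert sigma == pow(rhs, alpha, p)                   # proof accepted
```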

3.4. AuditLog Protocol

The client chooses a random subset B of indices of Bitcoin blocks and sends it to the auditor. Once receiving B, the auditor finds the corresponding recorded values in his log file for each index in B and aggregates them.

In addition, for each index in B, the auditor reads the challenge from the log file and recomputes the challenged indices by invoking the PRBG. After eliminating repetitive indices, the final ordered challenge index vector is obtained. Then, the auditor obtains the corresponding multi-proof from T. At last, the auditor generates the proof of the appointed logs and sends it to the client.

After verifying the signatures, for each index in B, the client first obtains the hash value of the corresponding Bitcoin block and reconstructs the challenged indices and coefficients. Then, the client verifies the correctness of the multi-proof. If the verification passes, all the challenged leaf nodes in T are authenticated, and the corresponding hash values stored in the leaf nodes can be accepted by the client. Finally, with all the authenticated hash values, the client verifies h(B) by checking the corresponding equation.

If this verification passes, the client checks the last equation by using her secret key and the verified hash values.

If the above equation holds, the client is assured that the auditor honestly audited CSP for all the past challenged data blocks appointed by B. The correctness of this check follows from the homomorphic property of the tags.
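The following toy sketch illustrates the idea that makes AuditLog work: challenges are derived deterministically from public Bitcoin block hashes, so the client can re-derive them for any sampled subset B and compare against the auditor's aggregated answer. Hash-based aggregation here stands in for the scheme's EHVT aggregation, and all sizes are illustrative assumptions.

```python
# Deterministic challenge re-derivation lets the client check the log.
import hashlib
import random

def derive_chal(btc_hash: bytes, n_blocks: int = 8, c: int = 4):
    rng = random.Random(btc_hash)                   # stands in for the PRBG
    return sorted(rng.sample(range(n_blocks), c))

def log_entry(btc_hash: bytes) -> bytes:            # what the auditor records
    chal = derive_chal(btc_hash)
    return hashlib.sha256(repr(chal).encode()).digest()

btc = {t: hashlib.sha256(str(t).encode()).digest() for t in range(100, 110)}
log = {t: log_entry(h) for t, h in btc.items()}     # auditor's log file

B = [101, 104, 107]                                 # client's random subset
proof = hashlib.sha256(b"".join(log[t] for t in B)).digest()
expected = hashlib.sha256(b"".join(log_entry(btc[t]) for t in B)).digest()
assert proof == expected                            # honest log accepted

bad_log = dict(log)                                 # a fabricated entry ...
bad_log[104] = hashlib.sha256(b"made up").digest()
forged = hashlib.sha256(b"".join(bad_log[t] for t in B)).digest()
assert forged != expected                           # ... is detected
```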

4. Our Attack

In Guo et al.'s auditing protocol, the security model claims that a malicious CSP cannot forge a false proof that passes the challenger's verification and that the client can resist collusion attacks by a malicious CSP and auditor. However, we find that some key information can be extracted from the client's public key, data blocks, and their corresponding tags, all of which are known to the CSP. In this section, we first show how the CSP extracts this key information and uses it to forge "correct" data blocks and their corresponding tags. Then, we show how a malicious CSP and auditor collude to use a false proof to pass the client's verification.

4.1. Attack I

Our attack is based on the following observation: the public key of the client is known to all, and thus the adversary can easily use it to forge authentication labels. Concretely, the adversary launches the following attack (a toy sketch of the key elimination step is given after this description):
(1) In the Store protocol, CSP receives the processed data from the client, which includes the client's data blocks and their corresponding authentication tags. The adversary thus obtains a large number of authentication tags.
(2) Each tag equation can be rewritten as a product of a few unknown components with known exponents. The CSP knows the client's data blocks and can calculate the corresponding hash values. In order to simplify the attack process, denote the unknown components by A, B, and C, and take three linearly independent tags.
(3) With these equations, the adversary can compute A, B, and C. Concretely, the adversary first eliminates one unknown between pairs of equations to obtain C, then substitutes the value of C back to obtain B, and finally substitutes the values of B and C to obtain A. Through this process, the adversary obtains the key parameters of the tags; when there are more sectors, the remaining parameters can be calculated in the same way.
(4) Now, the malicious cloud server modifies the stored data blocks to arbitrary other data blocks.
(5) The adversary knows the values of the key parameters, and thus it can compute forged authentication labels for the modified data blocks.
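The following toy sketch makes the elimination in step (3) concrete: once each tag is rewritten as A^{a_i} · B^{b_i} · C^{c_i} with coefficients the CSP knows, three tags with linearly independent coefficient vectors determine A, B, and C by Gaussian elimination in the exponents. The small prime-order subgroup and the coefficient vectors below are illustrative choices, not the paper's values.

```python
# Recover unknown tag components from three linearly independent tags.
import random

q, p, g = 11, 23, 4                       # order-q subgroup of Z_p* (toy sizes)
A, B, C = (pow(g, random.randrange(1, q), p) for _ in range(3))

# Known coefficient vectors (linearly independent mod q) and their tags.
coeffs = [(1, 2, 3), (2, 3, 1), (1, 1, 1)]
tags = [pow(A, a, p) * pow(B, b, p) * pow(C, c, p) % p for a, b, c in coeffs]

def solve(coeffs, tags):
    """Gaussian elimination mod q, mirrored multiplicatively on the tags."""
    M = [list(row) + [t] for row, t in zip(coeffs, tags)]  # [a b c | tag]
    n = 3
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % q)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, q)
        M[col][:n] = [x * inv % q for x in M[col][:n]]
        M[col][n] = pow(M[col][n], inv, p)                 # tag^(1/pivot)
        for r in range(n):
            if r != col and M[r][col] % q:
                f = M[r][col]
                M[r][:n] = [(x - f * y) % q for x, y in zip(M[r][:n], M[col][:n])]
                M[r][n] = M[r][n] * pow(M[col][n], q - f, p) % p
    return [M[r][n] for r in range(n)]

assert solve(coeffs, tags) == [A, B, C]   # secret tag components recovered
```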

4.2. Attack II

Our attack II is based on the following observation: even if the CSP does not store any data blocks, and the auditor neither carries out the audit work nor stores the RBMT T, the CSP and the auditor can conspire to generate a correct log file, which makes the client believe that CSP stores the data integrally and the auditor performs the audit work honestly. Concretely, the attack proceeds as follows (a toy sketch of the tag forgery used here is given after this description):
(1) Store: after receiving the client's data and its corresponding tag collection, the CSP first verifies the correctness of their signatures according to the original scheme. The malicious cloud server obtains the key parameters of the tags in the same way as in Attack I. Then, CSP deletes the client's data and its corresponding tags. After receiving the auditing work and T from the client, the auditor proceeds as in the original scheme but deletes T.
(2) AuditData: in this step, the malicious cloud server and the auditor complete the interactive challenge-response process, although the CSP does not hold the real data and the auditor does not perform the verification work. (a) The auditor generates a challenge as in the original scheme and sends it to CSP. (b) After receiving the challenge, CSP first computes the challenged indices. (c) To generate the proof, CSP randomly chooses fake block values and computes a combination of the challenged blocks from them. (d) For each challenged index, the malicious cloud server computes a forged tag from the extracted key parameters and aggregates the forged tags. (e) CSP responds to the auditor with the proof and its signature; note that CSP also needs to send the forged hash values to the auditor. (f) The auditor verifies the validity of the signature, and if it is correct, the auditor performs the next step: with the hash values received from CSP, the auditor computes the verification value. Finally, the auditor does not need to expend much computational expense to verify the proof but creates the corresponding log entry directly.
(3) AuditLog: here we show that the malicious auditor has the ability to generate a correct log file, which convinces the client that he has honestly performed the auditing work and that CSP has honestly stored all the data. (a) The auditor finds the recorded values in his log file for each index in the random subset B of Bitcoin block indices, computes the aggregated values, generates the proof of the appointed logs as in the original scheme, and sends it to the client. (b) The client first verifies the correctness of the signatures and then verifies the checking equations. The forged proof is valid because, by construction, it satisfies the verification equation.
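The following toy sketch illustrates the tag forgery that both attacks rely on, under the assumed Shacham-Waters-style tag shape from Section 3: with the components extracted as in Attack I, the CSP can forge a valid tag for any fabricated block without ever learning the secret key itself. The group parameters and tag formula are illustrative assumptions.

```python
# Forge a tag for a fabricated block from extracted components.
import hashlib

p, g, alpha = 2**127 - 1, 3, 987654321     # alpha stays unknown to the CSP
u = [pow(g, e, p) for e in (17, 29, 41)]

def H(i: int) -> int:
    d = hashlib.sha256(str(i).encode()).digest()
    return pow(g, int.from_bytes(d, "big") % p, p)

def tag(i, sectors):                        # the client's honest tag
    base = H(i)
    for u_j, m in zip(u, sectors):
        base = base * pow(u_j, m, p) % p
    return pow(base, alpha, p)

m1 = [5, 7, 11]
sigma1 = tag(1, m1)                         # tag received before deleting data

# Components assumed extracted via Attack I (simulated here directly).
U = [pow(u_j, alpha, p) for u_j in u]                        # u_j^alpha
Halpha1 = sigma1
for U_j, m in zip(U, m1):
    Halpha1 = Halpha1 * pow(U_j, (p - 1 - m) % (p - 1), p) % p  # divide out

fake = [100, 200, 300]                      # fabricated block m_1'
forged = Halpha1
for U_j, m in zip(U, fake):
    forged = forged * pow(U_j, m, p) % p
assert forged == tag(1, fake)               # passes the honest verification
```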

5. Improved Secure Auditing Protocol

In this section, we give an improved secure auditing protocol. It follows the same workflow as the original scheme, and only the construction of the authentication tags is changed.

5.1. Setup Protocol

Each participant runs the key generation algorithm to obtain its signing-verifying key pair. In addition, the client chooses s + 1 random elements as her secret keys and computes the corresponding public keys.

5.2. Store Protocol

The data file held by the client is divided into n data blocks, and each block consists of s sectors. More precisely, the data block has the form m_i = m_{i1}‖⋯‖m_{is}, where ‖ denotes concatenation.

Constructing RBMT. With all data blocks, the client computes the hash values h(m_1), …, h(m_n). Then, it constructs the RBMT on top of the ordered hash values, meaning that each leaf node stores the corresponding hash value h(m_i).

Computing EHVT. Based on each data block and her secret keys, the client computes the authentication tag σ_i of m_i.

Then, the client generates the processed data, which consists of the data blocks together with their tags.

5.2.1. Outsourcing Data

The client sends the processed data and its signature to CSP. CSP verifies the signature; if it passes, CSP accepts the data.

5.2.2. Outsourcing Auditing Work

The client outsources the auditing work to the auditor by sending the RBMT T together with its signature. If the verification passes, the auditor accepts T.

5.2.3. Agreeing Parameters

A public parameter needs to be agreed on by the three participants; it contains the total number of data blocks and the Merkle root of T. In addition, the client and the auditor need to further agree on a contract that specifies the checking policy for the auditor: the Bitcoin block index from which the auditing work starts, the auditing frequency, and the number of challenged data blocks for each audit. Now the client deletes the data file and the tags from her local storage.

5.3. AuditData Protocol

The scheme leverages the Bitcoin blockchain as a time-dependent pseudo-random source to generate periodic challenges. The auditor first inputs the time t to obtain the hash value of the latest Bitcoin block that has appeared by time t. Then, the PRBG is invoked on this hash value to obtain sufficiently long pseudo-random bits, which are used sequentially by the auditor to select a pair of keys. At last, the auditor generates a challenge from these keys and sends it to the CSP; the challenge is bound to the Bitcoin block corresponding to the time t.

Upon receiving the challenge, the CSP first computes the challenged indices and coefficients from the two keys contained in it.

Then, the CSP computes the proof of data possession for the challenged data blocks.

Finally, the CSP responds to the auditor with the proof, and then the auditor verifies its correctness. First, the auditor computes the challenged indices and coefficients. Second, with the corresponding hash values stored in his local T, the auditor computes the verification value.

Third, the auditor checks the verification equation to verify the proof.

If the equation holds, it means that the challenged data blocks are intact. Lastly, the auditor creates a log entry that records his auditing work and saves it in his local log file. The correctness of the verification equation follows from the homomorphic property of the tags.

5.4. AuditLog Protocol

The client chooses a random subset B of indices of Bitcoin blocks and sends it to the auditor. Once receiving B, the auditor finds the corresponding recorded values in his log file for each index in B and aggregates them.

In addition, for each index in B, the auditor reads the challenge from the log file and recomputes the challenged indices by invoking the PRBG. After eliminating repetitive indices, the final ordered challenge index vector is obtained. Then, the auditor runs the MLA algorithm on T to obtain the corresponding multi-proof. At last, the auditor generates the proof of the appointed logs and sends it to the client.
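As an illustration of the multi-proof idea, the following sketch authenticates several Merkle leaves with one combined proof. This plain binary Merkle tree omits the rank information that the paper's RBMT carries, so it is a simplification of the MLA technique rather than its implementation.

```python
# Authenticate multiple Merkle leaves with one combined sibling set.
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def build(leaves):
    levels = [[h(x) for x in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:                       # duplicate last node on odd levels
            cur = cur + [cur[-1]]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def multi_proof(levels, idxs):
    """Siblings needed to authenticate all leaves at positions idxs together."""
    proof, want = [], set(idxs)
    for lvl in levels[:-1]:
        sibs, nxt = [], set()
        for i in sorted(want):
            sib = i ^ 1
            if sib not in want and sib < len(lvl):
                sibs.append((sib, lvl[sib]))
            nxt.add(i // 2)
        proof.append(sibs)
        want = nxt
    return proof

def verify(root, leaf_map, proof, tree_size):
    cur = {i: h(x) for i, x in leaf_map.items()}
    size = tree_size
    for sibs in proof:
        cur.update(dict(sibs))
        if size % 2:
            if size - 1 in cur:                # mirror the duplicated last node
                cur[size] = cur[size - 1]
            size += 1
        cur = {i // 2: h(cur[i] + cur[i + 1])
               for i in sorted(cur) if i % 2 == 0 and i + 1 in cur}
        size //= 2
    return cur.get(0) == root

leaves = [b"h(m1)", b"h(m2)", b"h(m3)", b"h(m4)", b"h(m5)"]
levels = build(leaves)
proof = multi_proof(levels, [1, 4])
assert verify(levels[-1][0], {1: leaves[1], 4: leaves[4]}, proof, len(leaves))
```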

After verifying the signatures, for each index in B, the client first obtains the hash value of the corresponding Bitcoin block and reconstructs the challenged indices and coefficients. Then, the client verifies the correctness of the multi-proof by calling the MLA verification algorithm with her own reconstructed indices. If the verification passes, all the challenged leaf nodes in T are authenticated. Finally, with all the authenticated hash values, the client verifies h(B) by checking the corresponding equation.

If this verification passes, the client checks the last equation by using her secret keys and the verified hash values.

If the above equation holds, it means that the auditor audited the past challenged data blocks appointed by B honestly. The correctness of the equation again follows from the homomorphic property of the tags.

In our improved protocol, the data tags are constructed from the client's s + 1 secret elements rather than from values that can be computed from the public key.

Therefore, the adversary cannot forge the authentication tags as in Attack I. Furthermore, a malicious CSP and auditor cannot conspire to deceive the client through Attack II. A toy sketch illustrating why the forgery fails is given below.
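The following toy sketch shows, under our reading of the fix, why the forgery fails: the s + 1 random elements stay secret with the client, so the adversary can no longer rewrite tags over bases it can compute, and the linear system of Attack I cannot be set up. The modulus, generator, and tag shape below are illustrative assumptions, not the exact improved construction.

```python
# Tags built from s + 1 secret elements (assumed shape of the fix).
import hashlib

p, g = 2**127 - 1, 3
alphas = [123, 456, 789, 1011]        # alpha_0..alpha_s (s = 3), all secret

def H(i: int) -> int:
    d = hashlib.sha256(str(i).encode()).digest()
    return pow(g, int.from_bytes(d, "big") % p, p)

def tag(i: int, sectors: list) -> int:
    # assumed shape: sigma_i = (H(i) * g^{sum_j alpha_j * m_ij})^{alpha_0}
    e = sum(a * m for a, m in zip(alphas[1:], sectors)) % (p - 1)
    return pow(H(i) * pow(g, e, p) % p, alphas[0], p)

# The adversary sees only blocks and tags. Rewriting a tag as a product
# of fixed bases with known exponents, as in Attack I, now needs the
# secret alpha_j, so the linear system of Section 4.1 cannot be formed.
sigma = tag(1, [5, 7, 11])
```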

6. Conclusion

In this paper, we point out that Guo et al.'s outsourced dynamic provable data possession scheme with batch update for secure cloud storage is not secure. The authentication tags can be easily forged, and thus the cloud server can modify or delete the data arbitrarily, and the auditor can skip the auditing work entirely. In all these attacks, the cloud server can still give correct data possession proofs, and the auditor can still produce correct auditing log files. Finally, an improved secure cloud storage auditing protocol is given. We remark that Guo et al.'s scheme is very novel but has a design flaw, and we hope similar shortcomings can be avoided in future scheme designs to improve the security of public auditing protocols.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Key Research and Development Program of China (grant no. 2017YFB0802000), National Natural Science Foundation of China (grant nos. U1636114 and 61572521), Foundation of Guizhou Provincial Key Laboratory of Public Big Data (no. 2019BDKFJJ008), Engineering University of PAP’s Funding for Scientific Research Innovation Team (grant no. KYTD201805), and Engineering University of PAP’s Funding for Key Researcher (no. KYGG202011).