Abstract

Cloud storage technology is evolving rapidly, and effectively auditing the integrity of cloud data has become a focal point. Recently, Ming and Shi proposed a certificateless integrity auditing scheme with a privacy protection function. Their scheme uses a certificateless cryptosystem to avoid both the certificate management problem of auditing schemes based on public key infrastructure and the key escrow problem of identity-based auditing schemes. Although their scheme is novel and efficient, we find that it is not secure and cannot achieve integrity auditing of cloud data: a malicious cloud server can generate a valid proof from the blocks and tags sent by the user alone. On the basis of the original scheme, we propose an improved auditing scheme that is more secure and effective. In addition, to address the problem of idle tags in existing cloud data integrity auditing schemes, we propose the idea of intermediate tags and apply it to the improved scheme to improve audit efficiency.

1. Introduction

Users are increasingly inclined to store data in the cloud to obtain more convenient data management services. Cloud service providers (CSPs) centrally hold massive amounts of users' data, so a successful attack on a cloud server yields a great deal for an attacker, which makes CSPs natural targets of concentrated attacks. Dishonest CSPs may also deliberately delete users' data to reduce their own storage burden, or conceal security incidents that damage data integrity in order to protect their reputation. Cloud data integrity auditing schemes have therefore been proposed to address these problems [1].

Motivation: We note that existing audit schemes require users to compute data tags for all data blocks when preprocessing them and to upload the tags to CSP for storage. However, in the auditing process, generally only a few tags are used to generate the proof. Once the proof is verified, it ensures that each data block specified by the auditor is complete and guarantees the integrity of all original data with a high confidence probability. For 1,000,000 data blocks of size 4 kB, assuming that the server has deleted or tampered with a fraction of the data blocks, the auditor only needs to audit 460 data blocks to judge the integrity of all the data with a confidence probability higher than the bound given in [2]. Therefore, most of the data tags are idle during the audit process. In application scenarios where data blocks are frequently updated [3], a large number of tags are computed and stored in the cloud, but they are updated along with their data blocks before they are ever used, wasting considerable computing and storage resources. To solve the idle-tag problem, we propose the idea of intermediate tags: when processing the data before uploading, users only generate the key intermediate tags and upload them to CSP. When the third-party auditor (TPA) challenges the cloud data, CSP generates complete certification tags for the challenged data blocks, and then the normal audit process proceeds.

Recently, Ming and Shi [4] proposed a certificateless auditing scheme called CLPDP that supports privacy protection. In their scheme, CSP can use the tags to easily forge the proof: even if CSP deletes all outsourced data, it can still produce a correct proof that passes the audit. We point out this security problem in their scheme. In addition, we find that the idea of intermediate tags can be applied to the original scheme, so we also improve it.

Our contributions are as follows:

(1) We analyze Ming and Shi's scheme and find a security problem: CSP can forge the proof to pass the audit. We describe the attack method in detail.

(2) We propose the idea of intermediate tags, which reduces the computation overhead of users in audit schemes. After fixing the security of the original scheme, we apply the idea of intermediate tags to improve it.

(3) We perform a security analysis of the improved scheme and prove that it is secure. We also analyze its efficiency and compare it with the original scheme. The improved scheme is more efficient, which demonstrates the applicability of intermediate tags.

2. Related Work

Early data integrity audit schemes required users to download all their stored data and verify it locally. However, most users store large amounts of data, so downloading everything for verification incurs communication, storage, and computing costs that users generally cannot afford. Ateniese et al. [2] formally defined the Provable Data Possession (PDP) scheme: users divide data files into blocks, and only some blocks are sampled during verification, yet the integrity of all data can be verified with very high confidence. This method enables users to complete the audit task without downloading complete files, greatly reducing the communication cost of a data integrity audit. In 2013, Wang et al. [5] introduced TPA into the data integrity audit system; users can further reduce their own expenses by outsourcing audit tasks to TPA. At present, scholars add various functions to the basic data integrity audit scheme [2] to meet the requirements of different application scenarios.

Users will inevitably need to change their data after uploading data files, so the content of cloud data should be allowed to change dynamically. Considering the urgent need for dynamic updates in data integrity auditing, scholars have put forward audit schemes with dynamic update functions, and dynamic data update has gradually become a basic, indispensable function of cloud data integrity audit schemes in real scenarios. Data structures applied to dynamic data updates mainly include the Index Switcher, Index Hash Table, Merkle Hash Tree, Skip List, Dynamic Hash Table, and Red-Black Tree. In constructing a data integrity verification scheme supporting dynamic updates, the difficulty lies in avoiding the extra computation costs caused by index changes.

Jin et al. [6] constructed the mapping from the data block index to the tag index and designed the Index Switcher data structure to avoid the extra computational overhead caused by tag recalculation. In addition, the dispute arbitration function is added to the proposed audit scheme to ensure that users or the cloud will not commit improper acts during the audit process. Tian et al. [7] proposed the audit scheme supporting dynamic updates, privacy protection, and batch audit. The dynamic hash table data structure is designed to realize fast audit and efficient data updates by recording the attributes of files and data blocks at the audit ends. Shen et al. [8] proposed the whole/sampling audit method to solve the problem of distrust between users and the cloud and designed a double-linked information table to achieve efficient data update. Their scheme also supported the batch audit function. Guo et al. [9] constructed a multileaf authentication method based on the Merkle tree, which can simultaneously authenticate multiple leaf nodes and corresponding indexes and realize batch data updates. The scheme supports log auditing. By checking the log files generated by auditors, users can verify whether the auditors perform their audit work honestly. The public audit protocol designed by Hou et al. [10] supports blockless verification and batch verification. The scheme uses the chameleon authentication tree to realize the efficient and dynamic operation of outsourced data and reduces computing costs and improves the audit efficiency. Mishra et al. [11] used a binomial binary tree and indexed hash table data structure to construct an audit scheme supporting batch audit and efficient dynamic update based on BLS signature.

The reliability of data is the basis of its value and benefit. After the reliability of data is solved, other problems of data such as consistency, practicality, and availability are meaningful. Multicopy storage is the most straightforward and simple way to improve reliability. CSP provides storage services at low prices. Users can use the massive storage space it provides. More and more users choose multicopy storage to obtain more availability of data. The audit schemes supporting dynamic manipulation of multiple replicas while ensuring data integrity remain to be explored and further investigated. Curtmola et al. [12] constructed the first multicopy audit scheme, in which each copy can generate a corresponding integrity proof against challenges, and storing multiple copies is more efficient than storing each copy individually. Liu et al. [13] constructed the multiple-copy audit scheme supporting data dynamic updating. The Merkle hash tree node used in their scheme contains the node level parameters, which are allocated to each data block. It is more efficient when verifying multiple replica updates. The audit scheme of Guo et al. [14] reduces the storage burden of CSP by sharing an authenticated identity tree among multiple copies. The scheme supports multicopy and batch auditing, which also reduces the computational cost. Yaling and Li [15] proposed a flexible multicopy PDP scheme based on the characteristics of a multibranch tree. Their scheme ensures the integrity and reliability of multiple copies and implements the verification of any copy and supports dynamic update operation and privacy protection.

In recent years, in order to optimize audit performance and improve update efficiency, batch audit and batch update have become indispensable functions of cloud data integrity audit schemes. Qi et al. [16] applied the rank-based Merkle hash balanced tree to integrity verification and improved the dynamic update’s efficiency. Deng et al. [17] implemented batch auditing using BLS signature and rank-based Merkle hash tree.

Later, scholars introduced TPA to perform public audits on behalf of users to reduce their computation cost. However, TPAs are often not fully trusted [18], which can lead to the disclosure of users' privacy [19]. Li et al. [20] solved the key management problem based on fuzzy identity; their scheme takes the user's biometrics as the identity and designs a corresponding audit protocol to protect the data content. The scheme of Wang et al. [21] uses a ring signature to calculate the metadata required for verification; the authenticator and random mask technology are used to protect data privacy, and the scheme can also realize batch audits. The audit scheme of Wang et al. [22] is based on an algebraic signature and integrates forward error correction codes to enhance data possession assurance and recover data when a small number of blocks are deleted, thus significantly reducing communication complexity.

With the development of blockchain technology, many scholars apply blockchain technology to cloud data integrity audits [23]. The certificateless audit scheme proposed by Zhang et al. [24] can resist malicious TPA; the scheme uses Bitcoin as the source of pseudorandom numbers to help generate challenge information. Li et al. [25] proposed a lightweight audit scheme based on blockchain technology for integrity audits. In their scheme, the user and CSP are set as two mutually untrusting entities, and the TPA is removed. After the user stores the lightweight verification tags in the blockchain, the Merkle hash tree is constructed from the tags to generate the proof, so as to save computational power. Yang et al. [26] provided the mutual blockchain for outsourced cloud data and proposed a credit-based incentive mechanism, so that CSPs can supervise each other, which prevents collusion and realizes public audits efficiently. Yang et al. [27] proposed a certificateless multicopy and multicloud data public audit scheme based on blockchain technology; their scheme leverages the unpredictability of blocks in the blockchain to build fair challenge information, preventing malicious auditors from colluding with CSP to deceive users. Wang et al. [28] used blockchain to replace TPAs and designed a blockchain-based fair payment smart contract for cloud data audits: users and CSP run the smart contract to ensure that CSP periodically submits proofs of data possession, and only after verification can CSP be paid. Wei et al. [29] built a blockchain integrity protection mechanism; the scheme deploys a distributed virtual machine agent model on the cloud, allowing multitenant collaboration and achieving reliable storage, monitoring, and verification tasks.
Reference [30] proposed a protection model based on a private chain, which synchronously uploads modification records of files and hash values of files to blockchain for storage and judges whether the data is complete by comparing hash values.

Quantum computers use qubits to represent many possible states of 1 and 0 simultaneously and have more processing power than classical computers. Most cloud storage data auditing schemes are based on traditional cryptosystems. However, with the introduction of quantum algorithms for integer factorization, traditional cryptosystems lose their security. Lattice-based cryptography is generally considered resistant to quantum attacks. Xu et al. [31] designed the first lattice-based cloud data audit scheme, based on the small integer solution problem. The audit scheme designed by Liu and Cao [32] supports public verification but does not provide a rigorous security proof. Zhang et al. [33] designed a lattice-based ID-based public audit protocol using ID-based signature technology and further provided a solution to the key exposure problem [34], which protects the privacy of user data; moreover, TPA cannot obtain information about users' data during audit verification. Sasikala and Shoba Bindu [35] designed the first lattice-based certificateless public auditing protocol, but [36] pointed out that the scheme had security problems.

Organization: The remainder of our paper is organized as follows. Section 3 reviews the certificateless privacy-preserving secure cloud storage scheme of Ming and Shi. Section 4 describes our attacks on the original scheme. Section 5 proposes the concept of intermediate tags and gives the improved audit protocol. Section 6 analyzes the security and performance of the improved scheme to show that it is safer and more efficient. Finally, Section 7 summarizes our work.

3. Review of Ming and Shi’s Scheme

The system model of Ming and Shi's scheme is shown in Figure 1; it includes a key generation center (KGC), a data owner (DO), CSP, a data user (DU), and TPA. To facilitate understanding, the symbols and variables used in our paper are defined and explained in Table 1.

Specifically, the operation process of the original scheme is as follows:

(1) Setup: KGC first selects the cyclic group on the elliptic curve , defines the large prime number with the order of , and selects the generator . Then it selects the secure hash function , selects a random as the system master key, and calculates . Finally, KGC keeps the master key secret and publishes the parameters.

(2) PartialKeyGen: DO sends the real identity information to KGC. After KGC receives , it selects a random number and calculates , . Then KGC sends DO’s virtual identity to DO, randomly selects , and calculates , , . Finally, KGC sends DO’s partial keys to DO.

(3) SecretValueGen: DO randomly chooses and obtains the complete private key .

(4) PublicKeyGen: DO calculates and obtains the complete public key .

(5) TagGen: the data file is divided into blocks by DO as , where . DO selects a random number and calculates , , , and for , where is the identifier of . Thus the tags are generated by DO; they are sent together with the data blocks to CSP. DO then deletes the local data and tags.

(6) Challenge: after receiving DU’s audit request, TPA generates the challenge message. It first selects a random subset of . The subset includes elements. For , TPA randomly selects ; then it sends as the challenge message to CSP.

(7) ProofGen: after receiving , CSP calculates and ; then it sends as the proof to TPA.

(8) Verify: after receiving , TPA calculates . Then it calculates and for and verifies

If equation (1) holds, DO’s data is complete. The proof of the correctness of equation (1) is as follows:

4. Our Attack

In the scheme of Ming and Shi, we find that CSP can calculate the value of the aggregated data blocks needed at the ProofGen stage. In this way, even if CSP deletes DO’s cloud data, it can still generate a correct data possession proof at the ProofGen stage and pass the audit. In this section, we present two types of attacks and show how CSP forges the “correct” blocks.

4.1. The First Type of Attack

The first attack is caused by a design error in the verification equation; the detailed description is as follows:

Assume that the entities in the scenario run the audit scheme following the process described above. When the scheme reaches the ProofGen stage, CSP needs to generate the proof . We note that, in equation (1), CSP can obtain all values except and , so CSP just needs to randomly select ; it can then obtain by solving equation (1). Similarly, CSP can calculate the value of from equation (1) when it randomly selects . Thus, CSP does not need to store DO’s data to generate a proof that satisfies equation (1).
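A toy arithmetic analogue of this attack can make the point concrete. The sketch below uses a hypothetical linear verification equation (a·μ + b·σ) mod q = t — not the scheme's actual elliptic-curve equation, whose symbols are omitted above — where a, b, and t stand for the quantities the attack assumes CSP can compute from public values. Fixing one proof component at random and solving for the other then always yields a passing proof:

```python
# Toy sketch of the first attack (hypothetical linear form, not the
# real scheme): suppose Verify checks (a*mu + b*sigma) mod q == t,
# where a, b, t are all computable by CSP from public values.  Then
# CSP picks sigma at random and solves for mu -- no data needed.
import random

q = (1 << 61) - 1                        # toy prime modulus
a, b, t = (random.randrange(1, q) for _ in range(3))

sigma = random.randrange(q)              # chosen freely by CSP
mu = ((t - b * sigma) * pow(a, -1, q)) % q   # solved from the check

assert (a * mu + b * sigma) % q == t     # forged proof verifies
```

The fix in Section 5 works precisely by making one coefficient of the check depend on a value CSP never learns, so this solve-for-the-missing-variable step becomes impossible.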

4.2. The Second Type of Attack

At the TagGen stage, CSP receives the blocks and tags. CSP first calculates , so it gets the following equations: since and are DO’s public keys, CSP knows the values of , , and ; it can also calculate the values of and for , and it then obtains for to compute the following equations:

At the ProofGen stage, the CSP needs to calculate

Even if CSP deletes , it can still calculate the value of from , …, , and thus pass the audit.
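The algebra of the scheme is omitted above, but the essence of this second attack — CSP deriving a per-block value from each tag once, deleting the blocks, and still answering any linear aggregation challenge — can be illustrated with a deliberately simplified toy tag. The affine form tag_i = (x·m_i + r_i) mod q and the premise that CSP learns x and r_i are hypothetical stand-ins for the elided construction, not Ming and Shi's actual formulas:

```python
# Toy sketch (hypothetical parameters, not the actual scheme):
# tag_i = (x * m_i + r_i) mod q.  The attack's premise is that CSP
# can learn x and r_i from values it holds, so it recovers m_i mod q
# from each tag once, discards the blocks, and still answers any
# challenge {(i, v_i)} with mu = sum(v_i * m_i) mod q.
import random

q = (1 << 61) - 1                     # a prime modulus (toy choice)
x = random.randrange(1, q)            # value the attack assumes CSP learns
blocks = [random.randrange(q) for _ in range(8)]
rs = [random.randrange(q) for _ in blocks]
tags = [(x * m + r) % q for m, r in zip(blocks, rs)]

# CSP precomputes m_i mod q from the tags alone, then deletes blocks.
x_inv = pow(x, -1, q)
recovered = [((t - r) * x_inv) % q for t, r in zip(tags, rs)]

challenge = [(i, random.randrange(q)) for i in random.sample(range(8), 4)]
mu_forged = sum(v * recovered[i] for i, v in challenge) % q
mu_honest = sum(v * blocks[i] for i, v in challenge) % q
assert mu_forged == mu_honest         # forged proof matches the honest one
```

The improved scheme in Section 5 blocks this by keeping one tag ingredient secret from CSP, so the per-block recovery step in the middle of the sketch has no analogue there.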

5. The Improved Auditing Scheme

In this section, we first explain what an intermediate tag is and how to set an intermediate tag; then we give an improved secure auditing scheme.

We first analyze the probability of misbehavior detection in existing PDP schemes. For 4 kB data blocks, we assume that a fraction of the data blocks is corrupted; TPA can then specify 460 data blocks to obtain a confidence probability higher than . We denote the total number of data blocks by , the number of damaged data blocks by , and the number of data blocks randomly challenged during the audit by . Let the random variable denote the number of corrupted blocks among the challenged blocks, and let denote the corresponding probability. We have the following deduction:

Since , we have:

In the case of , when is 300, 460, or 688, is greater than , , or , respectively. Therefore, in an audit process, only a few data blocks are challenged, and only the corresponding tags are used to generate the proof. Most of the other data blocks and their tags remain idle.

Assume that there are data blocks and tags stored in the cloud in total, that 460 of them are challenged in each audit, and that the challenged data blocks differ across audits. Then it takes about 2,173 audits to use every data block and its corresponding tag at least once. In practical applications, owing to users' demand for data updates, many idle blocks and their tags are modified and updated before they are ever used, resulting in a large waste of computation.
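The sampling argument above can be checked numerically. The sketch below computes the exact detection probability for challenging c of n blocks without replacement; the 1% corruption rate is our assumption (it reproduces the thresholds 300, 460, and 688 quoted above, which follow the analysis of Ateniese et al. [2]):

```python
def detect_prob(n: int, m: int, c: int) -> float:
    """Probability that challenging c of n blocks (without replacement)
    hits at least one of the m corrupted blocks:
    P[X >= 1] = 1 - C(n - m, c) / C(n, c)."""
    p_miss = 1.0
    for i in range(c):
        p_miss *= (n - m - i) / (n - i)
    return 1.0 - p_miss

n = 1_000_000        # total 4 kB blocks, as in the text
m = n // 100         # assumption: 1% of the blocks are corrupted
for c in (300, 460, 688):
    print(c, round(detect_prob(n, m, c), 4))   # > 0.95, 0.99, 0.999

print(n // 460)      # disjoint audits needed to touch every block once -> 2173
```

At these scales the with-replacement approximation 1 − ((n − m)/n)^c gives nearly identical values, which is why the literature quotes the same challenge sizes regardless of n.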

Therefore, we propose the idea of intermediate tags: at the TagGen stage, users only generate intermediate tags composed of the private key and data blocks, instead of computing the mature tags that CSP uses when generating the proof, which reduces the users' computation overhead. At the ProofGen stage, CSP computes mature tags only for the few challenged data blocks, according to the challenge information from TPA, and uses them to generate the proof. The idea of intermediate tags is applied in the following improved scheme:

(1) Setup: KGC first selects the cyclic group on the elliptic curve , defines the large prime number with the order of , selects as the generator, as hash functions, and a random as the system master key, and calculates . Finally, KGC keeps secret and publishes the public parameters.

(2) PartialKeyGen: DO sends the real identity information to KGC. After KGC receives , it selects a random number and calculates , . Then KGC sends DO’s virtual identity to DO, randomly selects , and calculates , , . Finally, KGC sends DO’s partial keys to DO.

(3) SecretValueGen: DO randomly chooses and obtains the complete private key .

(4) PublicKeyGen: DO calculates and obtains the complete public key .

(5) TagGen: the data file is divided into blocks by DO as , where . DO randomly selects and calculates , . Note that here we have simplified the formula for calculating , and the intermediate tag in the improved scheme is different from the mature tag in the original scheme. Thus the tags are generated by DO; they are sent together with the data blocks to CSP. DO sends to TPA and deletes the local data and tags.

(6) Challenge: after receiving DU’s audit request, TPA generates the challenge message. It first selects a random subset of . The subset includes elements. For , TPA randomly chooses ; then it sends to CSP as the challenge message.

(7) ProofGen: after receiving , CSP calculates and for . Then it calculates and as the proof and sends to TPA.

(8) Verify: after receiving the proof , TPA calculates , and then calculates and for . Then it verifies

If equation (8) holds, DO’s cloud data is complete.

The proof of the correctness of equation (8) is as follows:

6. Analysis of the Improved Protocol

In this section, we first demonstrate that the improved scheme can resist the above attacks. Then we analyze the improved scheme's performance and compare the computation overhead of the two schemes, showing that our improved scheme is more efficient.

6.1. Security Analysis

CSP holds the following equations in the improved scheme:

In equation (10), and are unknown to CSP; there are always more unknowns than equations, so CSP cannot solve the equations for the values of and . At the ProofGen stage, CSP cannot know . When CSP mounts the second of the above attacks, it can list the following equations:

Since CSP does not know the value of , it cannot compute the value of . When generating , CSP cannot calculate the value of from the tags uploaded by DO. Only when CSP stores the correctly and completely can it generate the correct and pass TPA’s audit.
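The "more unknowns than equations" argument can be made concrete: over Z_q, an underdetermined linear system leaves the secrets undetermined, since every choice of one unknown extends to a full solution. A brute-force toy check with one equation in two unknowns modulo a small prime (coefficients a1, a2 and target c are arbitrary illustrative values):

```python
# With more unknowns than equations over Z_q, the equations CSP can
# write down do not pin the secrets: a toy 1-equation, 2-unknown
# system modulo a small prime already has q distinct solutions.
q = 97                       # small prime, for exhaustive search
a1, a2, c = 5, 11, 42        # toy public coefficients and target
solutions = [(s1, s2) for s1 in range(q) for s2 in range(q)
             if (a1 * s1 + a2 * s2) % q == c]
print(len(solutions))        # q = 97 candidate secret pairs
```

Because gcd(a2, q) = 1, every choice of s1 fixes exactly one s2, so all q pairs are equally consistent with what CSP sees and it gains no information about the true secrets.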

When CSP mounts the first of the above attacks, after randomly selecting one of the values of and , it attempts to obtain the other by solving equation (1). But in equation (9), is unknown to CSP, so CSP can compute neither from and equation (1) nor from and equation (1).

6.2. Performance Analysis

The idea of intermediate tags is to save computation overhead for DO. The difference in storage and communication costs between the two schemes is small, so we mainly analyze their computation costs.

In the original scheme, at the TagGen stage, DO needs to calculate , , , , and the calculation cost is . At the ProofGen stage, CSP calculates and ; letting denote the number of elements in , the calculation cost is . At the Verify stage, TPA calculates and for , then calculates and checks equation (1); the calculation cost is .

In the improved scheme, at the TagGen stage, DO only needs to calculate and , and the computation cost is . At the ProofGen stage, CSP calculates and for each challenged block, then calculates and , and the calculation cost is . At the Verify stage, TPA calculates , then calculates and for , and finally checks equation (8); the calculation cost is . The computation costs of the two schemes at each stage are compared in Table 2.

As we can see from Table 2, in the improved scheme, DO saves a computation overhead of at the TagGen phase. At the ProofGen phase, CSP bears an extra computation overhead of , and at the Verify phase, TPA bears an extra computation overhead of . Since the value of is much larger than that of , the extra overhead borne by CSP and TPA is far less than the overhead saved by DO, so the improved scheme is more user-friendly and more efficient.

7. Conclusion

In this paper, we point out that Ming and Shi's scheme is insecure: the aggregated data blocks required for the audit are easy to forge, so CSP can provide a correct integrity proof even after modifying or deleting the data, and TPA will still return a positive audit result. In addition, to solve the idle-tag problem in existing audit schemes, we propose the idea of intermediate tags, which saves computing power for users. Finally, we apply this idea to the improved scheme, which fixes the security problems of Ming and Shi's scheme and improves audit efficiency. We hope that the idea of intermediate tags can be used by more scholars to construct more efficient audit schemes and that the security issue we point out can be avoided in future designs.

Data Availability

The datasets of this article are available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Xiuguang Li and Ruifeng Li are responsible for the writing of the article and the construction of the improved scheme, Xu An Wang is responsible for the derivation of the formulas in the article and gives some significant ideas, Ke Niu is responsible for the verification of the security of this article, Xiaoyuan Yang is responsible for the polishing of the language of the article and the collection of the information related to this article, and Hui Li revised the finished manuscript.

Acknowledgments

This work was supported by National Key Research and Development Program of China (no. 2017YFB0802000); National Natural Science Foundation of China (nos. 62172436, 62102452, and 61732022); National Natural Science Foundation of China Key Program (U1836203); State Key Laboratory of Public Big Data (no. 2019BDKFJJ008); Engineering University of PAP’s Funding for Scientific Research Innovation Team (no. KYTD201805); and Engineering University of PAP’s Funding for Key Researcher (no. KYGG202011).