Abstract

Cloud storage provides an effective way to store and manage big data. However, because data ownership and data management are separated, users cannot easily check the integrity of their data by traditional means, which has led to the introduction of auditing techniques. This paper proposes a public auditing protocol with a self-certified public key system using blockchain technology. The user's operational information and the metadata of the file are packed into a block after being verified by the checking nodes, and the block is then put into the blockchain. The chain structure of the blocks ensures the security of the auditing data source. The security analysis shows that, in the presented scheme, attackers can derive neither the user’s secret key nor the users’ data from the collected auditing information. Furthermore, the scheme effectively resists both signature forging attacks and proof forging attacks. Compared with other public auditing schemes, our scheme based on the self-certified public key system improves storage overhead, communication bandwidth, and verification efficiency.

1. Introduction

Cloud storage, which provides a way to effectively store and manage big data [1], is an important branch of cloud computing. Because cloud storage offers low cost, scalability, location independence, and high performance [24], more and more individuals and businesses tend to outsource their data to the cloud. Although cloud storage services have many significant advantages, they still face a variety of security challenges [4–14].

For example, the security of data sharing and storage within the same group is an urgent issue in the cloud environment [6]. Since cloud users lose control over the management of their data, a cloud service provider (CSP) must satisfy users’ need for the security of stored data [7]. Moreover, users cannot verify the integrity of their data with traditional methods owing to the trust gap between users and the CSP. In addition, cloud storage faces many internal and external security threats [8–10]. Firstly, malicious attackers might try to retrieve users’ outsourced data, or even to destroy and delete it, which compromises the confidentiality, integrity, and availability of the stored data. Secondly, the user’s outsourced data might be illegally manipulated by the CSP. For instance, the CSP may selectively conceal certain errors in the user’s outsourced data due to Byzantine failures [11]. Furthermore, the CSP might deliberately delete data that are rarely accessed by ordinary users in order to reduce storage space and save bandwidth [12, 13]. Finally, users may not learn of data changes in time and may lack trust in the CSP. Disputes then arise, even though they may be caused by the users’ own improper operations [14]. Therefore, it is critical to develop efficient data auditing techniques to check the confidentiality, integrity, and availability of stored data.

After data are outsourced to the cloud, users typically delete their local copies and lose control over the outsourced data. Users can therefore use auditing techniques to remotely verify whether the outsourced data are correct. The core challenge of cloud data auditing is how to check the integrity of cloud data efficiently. To address this problem, a proof of retrievability (PoR) protocol [15] and a provable data possession (PDP) protocol [16] were both presented in 2007.

In a typical PoR protocol, the user first encodes the data file with an error-correcting code before outsourcing it to the CSP. Therefore, the user can reconstruct the entire file from the CSP’s partial response. However, the PoR protocol is applicable only to static data. It also does not support third-party auditing and is a typical private auditing scheme. In private auditing, the remote verification operation is performed directly between the user and the CSP. The user is the only source of verification results; because the CSP and users do not trust each other, users cannot provide convincing auditing results. Furthermore, auditing increases the burden on users, whose computing resources are limited. Since one of the important motivations for outsourcing data is to reduce the user’s burden of storage management, it is not recommended that users audit their data frequently.

To address this problem, a PDP protocol was first provided by Ateniese et al. [16]. In PDP, an RSA-based homomorphic authenticator is employed to check data integrity, and an independent authorized third-party auditor (TPA) is introduced. The TPA not only provides independent audit results but also bears the communication overhead and computation costs. Compared with PoR, PDP makes the verification process more convenient and efficient and is more suitable for public auditing [5, 7, 10, 11, 16–25].

Public auditing has advantages over private auditing, so it has attracted much attention from researchers. Since the idea of public auditing was raised in 2007 [16], many auditing protocols have been designed [10, 12, 18–28].

In 2010, Wang et al. [22] also provided a similar architecture for a public audit scheme with the privacy-preserving property. To prevent data leakage to the TPA, the CSP integrates the aggregate value of the data blocks with random masking. However, the lack of strict performance analysis has greatly limited the practical application of the scheme. Furthermore, the length of a data block must be equal to the size of the cryptosystem, which means that the storage space of the tags generated for the data blocks must be equal to the size of the original file [26]. This shows that the efficiency of the presented public auditing scheme is low. In order to improve the efficiency, Wang et al. [10] extended the above auditing protocol to multiuser settings. The extended protocol supports batch verification. However, the expected goal has not been achieved because verification and updating impose higher computing and communication costs on the TPA [27]. In 2011, Wang et al. [12] implemented complete data dynamics by using a Merkle hash tree (MHT), but verification and updating also raise the communication cost of the protocol [21].

In 2013, Wang et al. [10] found that there was a risk of data leakage in the previously proposed scheme with public auditability [16]. They then designed a privacy-preserving scheme that combines a homomorphic linear authenticator (HLA) with a random masking technique. Nevertheless, the designed scheme cannot protect the identity privacy of signers [28]. In order to reduce the computational cost and communication overhead, Zhu et al. [18] proposed a new public audit scheme based on an index hash table (IHT), which is employed to organize the data properties for auditing. However, the index table is a sequence table, so locating a certain element requires traversing, on average, half the total length of the table. This results in very inefficient update operations, such as insertion and deletion [21]. In addition, these update operations inevitably change the serial numbers of some blocks, so the tags of those blocks must be recalculated. As a result, the CSP incurs extra computational costs and unnecessary communication overhead [19].

Then, Tian et al. [19] designed a public auditing protocol based on dynamic hash table (DHT) to support data dynamics, which claims to address the problem in Zhu’s scheme [18]. The dynamic hash table is a single linked sequence table. Though the proposed public auditing protocol is efficient, there are still some drawbacks in this scheme. Firstly, because time stamps for verification are generated by the user and TPA only serves the user, CSP may suffer from the collusion attack launched by the user and TPA [21]. Secondly, there is no index switcher in the proposed scheme. Then, the relationship between the index number and the serial number of a certain data block cannot be clearly known. Finally, the proposed protocol still has relatively high computational costs.

In addition, Shen et al. [21] designed a novel public auditing protocol based on a new dynamic structure to overcome the drawbacks in [18]. The proposed dynamic structure consists of a doubly linked info table and a location array. Though the above protocols can effectively achieve public auditing, search operations in those schemes are relatively inefficient in the verification phase and the updating phase [27].

In 2018, Jin et al. [20] presented a scheme by employing an index switcher. Then, the relationship between the index number and the tag number of a certain data block can be clearly known. And there is no need to recalculate the tags caused by block update operations. Nevertheless, the index switcher needs to be periodically transformed among the systems, which will inevitably result in huge extra costs. Moreover, such an index switcher is not a complete structure. And how to switch between the two constituent tables is not explained in the proposed scheme [21].

In 2019, Ding et al. [29] proposed a public auditing protocol that is intrusion-resilient to mitigate the damage caused by key exposure problems. The protocol divides the lifetime of files stored in the cloud into several periods, each of which is further divided into several refreshing periods. The auditing key is updated every time period, and the secret value used to update the auditing key changes during each refreshing period. These two update operations are performed by the client and the third-party auditor (TPA).

In 2020, Garg et al. [30] proposed an efficient data integrity auditing method for cloud computing. The objective of this protocol is to minimize the computational complexity of the client during the system setup phase. Based on the properties of bilinear pairings, the protocol is publicly verifiable and supports dynamic manipulation of data. The security of the protocol relies on the hardness of the computational Diffie–Hellman problem (CDHP) in the random oracle model (ROM).

The nature of blockchain makes it particularly suitable for data accounting and auditing. Because it is a shared and fault-tolerant database, it has attracted the interest of the research community. Blockchain uses cryptography to build trust among peers and protect their interactions. Meanwhile, it adopts a consensus algorithm to ensure that block data are not changed, which makes it well suited to data security in the cloud. In the past few years, some cloud security schemes based on blockchain have been proposed. Li et al. proposed a security framework for cloud data audit using blockchain technology, in which the user’s operational information on the file is packed into a block after being validated by all checking nodes in the blockchain network and then put into the blockchain [31]. Linn et al. proposed a data auditing framework for health scenarios based on blockchain, in which the blockchain is used as an access moderator to control access to outsourced shared data [32]. Fu et al. introduced a privacy-aware blockchain-based auditing system for shared data in cloud applications [33]. Ghoshal et al. proposed an auditing mechanism based on the blockchain structure, in which any user can validate selected files efficiently [34]. Fu proposed a blockchain-based secure data-sharing protocol under a decentralized storage architecture [35]. Miao et al. proposed a decentralized and privacy-preserving public auditing scheme based on blockchain (DBPA), in which a blockchain is utilized as an unpredictable source for the generation of (random) challenge information, and the auditor is required to record the audit process onto the blockchain [36]. Li et al. proposed a public auditing scheme with blockchain technology to resist malicious auditors [37]. In addition, through the experimental analysis, we demonstrate that our scheme is feasible and efficient.
Due to the limited capacity of blocks in the blockchain, only very important security information is considered to be stored in blocks; otherwise, the system performance will not be acceptable.

This paper proposes a public auditing protocol with a self-certified public key system using blockchain technology. The chain structure of the blocks ensures the security of the auditing data source. The contributions of this paper are as follows. Firstly, recent related public auditing protocols are introduced. Secondly, we propose a public auditing protocol with a self-certified public key system using blockchain technology, in which both security and efficiency are taken into account. Finally, we conduct a detailed theoretical analysis of the security and efficiency of the new scheme.

The outline of the paper is as follows. The research background and the necessary preliminaries for the new public auditing system are introduced first. Next, the algorithms of the proposed scheme are described. Then, the security and efficiency of the new scheme are comprehensively analyzed from four aspects. Finally, a few concluding remarks are given in the last section.

2. System Model and Desired Objectives

In general, our public auditing scheme includes the following four entities: CSP, TPA, user, and blockchain. The system model is shown in Figure 1.

CSP, who has large-scale computing and storage resources, provides users with on-demand data storage services. CSP is considered as an untrustworthy party. For their own self-interest or maintaining their reputations, CSP may choose to conceal the data errors from the users. To reduce the amount of storage space and save bandwidth, CSP may deliberately delete some data that users rarely access. Furthermore, the CSP may launch some attacks on TPA. For example, CSP may try to forge some legitimate data blocks and their corresponding tags in order to pass verification phase.

TPA, who undertakes audit tasks for users, provides fair and objective audit results. TPA is supposed to be credible but curious. More concretely, TPA can perform auditing credibly in the verification phase, but it may be curious about the privacy information of users’ data and even may try to derive the users’ data contents.

User, who has large amounts of data, outsources the data to the cloud. Then, he/she can enjoy the reliability of data storage and high-performance services. The maintenance overhead can also be reduced. However, due to the loss of the management of outsourced data, users will have a strong desire to periodically check the integrity and correctness of those data.

We use blockchain to store user’s operations on the file and metadata information of the uploaded file. The system does not care where files are stored but only stores a file URL in metadata file. We take advantage of the blockchain’s tamper-resistant nature to ensure the reliability of operation logs and file metadata. Metadata information is used to audit the integrity of the data, and the analysis operation log can be used for behavioral audit.

Based on the above description of public auditing scheme model, the desired objectives to be achieved must be given for designing a secure and efficient public auditing scheme.

Public Auditing. Any authorized TPA is allowed to verify the correctness and integrity of user’s data stored in the cloud.

Blockless Verification. During the verification process, TPA does not need to audit cloud data by retrieving the data blocks.

Storage Correctness. CSP, who does not store the intact data as required, cannot pass the audit.

Privacy Preserving. TPA cannot derive users’ data contents from the collected auditing information during the verification phase.

Batch Auditing. TPA can efficiently deal with multiple audit tasks from different users. It not only reduces the number of communications between TPA and CSP during the auditing phase but also enhances the verification efficiency [17, 23].

Lightweight. The public auditing scheme should have less communication overhead and lower computation cost.

3. Preliminaries

3.1. Self-Certified Public Key

The notion of self-certified public key (SCPK) was first introduced by Girault [38]. In the SCPK system, the user’s public key is derived from a signature over the user’s secret key and his/her identity. The signature is produced by the system authority using the system’s secret key. The user’s identity, public key, and secret key satisfy a computationally unforgeable mathematical relationship. When the keys are used for encryption and decryption, signature verification, key agreement, or other cryptographic operations, the public key is implicitly authenticated in the course of the operation itself, for example during signature verification. In addition, no public key has a separate certificate, so the verifier does not need to authenticate a public key certificate. Consequently, the SCPK system can reduce the storage space and computational overhead in public key schemes. Moreover, the user’s private key is chosen by the user, so the system authority cannot obtain the private key from the transmitted data and cannot forge a signature as the user. Compared with the ID-based public key system, the SCPK system has higher security and is more suitable for applications in an open network environment.
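The relationship between identity, public key, and secret key can be illustrated with a small sketch. The code below follows the RSA-style construction commonly attributed to Girault [38]; it is only an illustration and is not claimed to match the exact formulas of the scheme in this paper. The primes, the base `G`, the integer encoding of the identity, and all function names are assumptions chosen for the example, and the parameters are far too small to be secure.

```python
# Toy sketch of an RSA-style self-certified public key (after Girault).
# Every constant and name here is illustrative; sizes are insecure on purpose.
from math import gcd

# Authority (TPA) parameters.
P_PRIME, Q_PRIME = 1009, 1013          # secret primes (toy size)
N = P_PRIME * Q_PRIME                  # public modulus
PHI = (P_PRIME - 1) * (Q_PRIME - 1)    # Euler's totient of N
E = 65537                              # authority's public exponent
assert gcd(E, PHI) == 1
D = pow(E, -1, PHI)                    # authority's private exponent
G = 2                                  # public base, coprime to N


def user_commitment(secret: int) -> int:
    """User side: v = g^(-secret) mod N, sent to the authority."""
    return pow(pow(G, secret, N), -1, N)


def issue_public_key(v: int, identity: int) -> int:
    """Authority side: sign (v - identity) with the private exponent."""
    return pow((v - identity) % N, D, N)


def recover_commitment(public_key: int, identity: int) -> int:
    """Anyone: recompute v from the public key and the identity alone."""
    return (pow(public_key, E, N) + identity) % N
```

In this sketch the user can check the issued key by comparing `recover_commitment(public_key, identity)` with the value produced by `user_commitment(secret)`. No certificate is involved: a verifier who recomputes the commitment from a wrong identity or a wrong public key simply obtains a value that does not match the user's secret.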

3.2. Bilinear Map

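Let $G_1$ and $G_2$ be two multiplicative cyclic groups of prime order $p$, and let $g$ be a generator of $G_1$. A bilinear map is a map $e: G_1 \times G_1 \to G_2$ with the following properties [39]:

Bilinearity: for all $u, v \in G_1$ and $a, b \in Z_p$, $e(u^a, v^b) = e(u, v)^{ab}$.

Computability: the map $e$ is efficiently computable.

Nondegeneracy: $e(g, g) \neq 1$.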
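The bilinearity and nondegeneracy properties can be checked numerically with a toy model. The sketch below is an assumption-laden illustration, not a real pairing: elements of the source group are stored as their exponents with respect to the generator, which makes the map trivial to evaluate and therefore completely insecure. The constants `P_ORDER`, `MODULUS`, and `GT` and the class `G1Element` are hypothetical names introduced only for this example.

```python
# Toy model of a symmetric bilinear map e: G1 x G1 -> G2.
# G1 elements are represented by their exponents, so this is NOT secure.
from dataclasses import dataclass

P_ORDER = 1019     # prime order p of both groups (toy size)
MODULUS = 2039     # prime with MODULUS = 2 * P_ORDER + 1
GT = 4             # generator of the order-p subgroup of Z_MODULUS^*


@dataclass(frozen=True)
class G1Element:
    exponent: int  # the element is g^exponent

    def __pow__(self, k: int) -> "G1Element":
        # (g^x)^k = g^(x*k), with exponents reduced modulo the group order
        return G1Element((self.exponent * k) % P_ORDER)


def pairing(u: G1Element, v: G1Element) -> int:
    """e(g^x, g^y) = GT^(x*y) mod MODULUS."""
    return pow(GT, (u.exponent * v.exponent) % P_ORDER, MODULUS)


g = G1Element(1)   # generator of G1 in this model
```

For example, with `u = g ** 5` and `v = g ** 7`, the values `pairing(u ** 11, v ** 13)` and `pow(pairing(u, v), 11 * 13, MODULUS)` coincide, which is the bilinearity property, and `pairing(g, g)` differs from 1, which is nondegeneracy.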

3.3. HVA

Homomorphic verifiable authenticator (HVA) is a basic component of public auditing [10, 19, 40]. Specifically, HVA can be generated based on digital signatures, such as RSA-based signature and BLS-based signature. Therefore, such HVAs can be considered as homomorphic verifiable signatures. Taking advantage of HVA, a public auditor can verify the integrity of outsourced data without downloading the original data. Generally speaking, HVA has the following properties [41, 42]:

Blockless Verifiability. Without knowing the actual data content, TPA can verify the integrity of the data blocks based on the proof constructed by HVAs.

Homomorphism. Let $G$ and $H$ be multiplicative groups, whose orders are a large prime $p$. Let “$\oplus$” and “$\otimes$” be operations in $G$ and $H$. If a map function $f: G \to H$ satisfies homomorphism, then for all $h_1, h_2 \in G$, $f(h_1 \oplus h_2) = f(h_1) \otimes f(h_2)$.

Nonmalleability. Let $\sigma_1$ and $\sigma_2$ be signatures of block $m_1$ and block $m_2$, respectively. Given a certain block $m' = \alpha_1 m_1 + \alpha_2 m_2$, where $\alpha_1$ and $\alpha_2$ are two random numbers in $Z_p$. For any user, if he/she does not know the private key, he/she cannot simply generate the legitimate signature $\sigma'$ of block $m'$ based on $\sigma_1$ and $\sigma_2$.
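Blockless verification can be made concrete with a minimal sketch of an RSA-based homomorphic authenticator. This is a generic textbook-style construction under stated assumptions, not the signature used by the scheme in this paper: the tag formula, the index hash `h`, the challenge format (a list of index and coefficient pairs), and the toy RSA parameters are all choices made for the illustration.

```python
# Toy RSA-based homomorphic verifiable authenticator (insecure sizes).
import hashlib

P_PRIME, Q_PRIME = 1009, 1013
N = P_PRIME * Q_PRIME                            # public modulus
E = 65537                                        # public exponent
D = pow(E, -1, (P_PRIME - 1) * (Q_PRIME - 1))    # owner's private exponent
G = 2                                            # public base


def h(index: int) -> int:
    """Hash a block index into Z_N (hypothetical helper)."""
    digest = hashlib.sha256(str(index).encode()).digest()
    return int.from_bytes(digest, "big") % N


def tag(index: int, block: int) -> int:
    """Owner: sigma_i = (h(i) * g^m_i)^d mod N."""
    return pow(h(index) * pow(G, block, N) % N, D, N)


def prove(blocks, tags, challenge):
    """Server: aggregate the challenged blocks and their tags."""
    mu = sum(c * blocks[i] for i, c in challenge)
    sigma = 1
    for i, c in challenge:
        sigma = sigma * pow(tags[i], c, N) % N
    return mu, sigma


def verify(challenge, mu, sigma) -> bool:
    """Auditor: check sigma^e == prod h(i)^c_i * g^mu mod N without any block."""
    expected = pow(G, mu, N)
    for i, c in challenge:
        expected = expected * pow(h(i), c, N) % N
    return pow(sigma, E, N) == expected
```

The auditor only sees the aggregate pair returned by `prove`, never the blocks themselves. In this toy version the aggregate `mu` is sent unmasked, so it still leaks a linear combination of the blocks; privacy-preserving schemes add a random mask on top of this idea.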

3.4. Merkle Tree

Merkle hash tree (MHT) is an authentication structure built based on hashes of data. The leaf node of Merkle tree stores the hashes of data elements (a file or a collection of files). The nonleaf node stores the hashes of its child nodes. MHT can identify whether the data were altered by comparing the calculated root hash with the value held by the validator. In blockchain network, MHT is used to store transaction’s hash and check transaction’s authenticity.
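A minimal sketch of the root computation is given below. It assumes SHA-256 as the hash function and, when a level has an odd number of nodes, pairs the last node with itself; both choices are assumptions made for the example, since the text does not fix them.

```python
# Minimal Merkle root computation (SHA-256, odd nodes duplicated).
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(items: list[bytes]) -> bytes:
    """Return the root hash over the given data elements."""
    if not items:
        raise ValueError("at least one item is required")
    # Leaf level: hashes of the data elements.
    level = [sha256(item) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])      # pair an odd node with itself
        # Parent level: hash of each pair of concatenated child hashes.
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

A validator that holds only the root can detect a change in any element: recomputing `merkle_root` over modified data yields a different value than the stored root.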

Figure 2 shows block structure in blockchain. Each block header saves the root hash of all transaction ti in this block. The root hash participates in the hash operation of block header, and thus any modification to transaction data will lead to the change of the root hash, which will result in the hash change of the block header. In this paper, the user’s operational information and metadata information of files are put into the blockchain. The chain structure of the block ensures the security of auditing data source.
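The way the chain structure protects stored records can be sketched as follows. The block layout, the field names, and the flat hash used as a stand-in for the Merkle root are assumptions made for brevity; a real blockchain header contains further fields such as a time stamp and consensus data.

```python
# Toy hash chain of blocks holding operation logs and file metadata.
import hashlib
from dataclasses import dataclass


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Block:
    prev_hash: bytes   # header hash of the previous block
    records: list      # operation logs / file metadata, as bytes

    def records_root(self) -> bytes:
        # Stand-in for the Merkle root: a hash over the record hashes.
        return digest(b"".join(digest(r) for r in self.records))

    def header_hash(self) -> bytes:
        # The root takes part in the header hash, as described above.
        return digest(self.prev_hash + self.records_root())


def build_chain(batches):
    """Create one block per batch of records, linked by header hashes."""
    chain, prev = [], b"\x00" * 32
    for records in batches:
        block = Block(prev, list(records))
        chain.append(block)
        prev = block.header_hash()
    return chain


def chain_is_valid(chain) -> bool:
    """Each block must reference the current header hash of its predecessor."""
    return all(chain[i].prev_hash == chain[i - 1].header_hash()
               for i in range(1, len(chain)))
```

Changing a record in an earlier block changes that block's header hash, so the link stored in the next block no longer matches and `chain_is_valid` returns `False`. The newest block has no successor in this sketch, so detecting changes to it would additionally require a trusted copy of the latest header hash.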

3.5. Security Assumptions

The security of our new public auditing scheme will be based on the CDH assumption and DL assumption.

3.5.1. Computational Diffie–Hellman (CDH) Problem

Let $G$ be a multiplicative cyclic group of large prime order $p$ with generator $g$. The CDH problem is described as follows: given $g^a$ and $g^b$ for two random numbers $a, b \in Z_p^*$, compute the value $g^{ab}$.

Definition 1. CDH assumption: the probability that any probabilistic polynomial-time adversary $\mathcal{A}$ solves the CDH problem is negligible, namely, $\Pr[\mathcal{A}(g, g^a, g^b) = g^{ab}] \le \varepsilon$, where $\varepsilon$ is negligible. In other words, it is computationally infeasible to solve the CDH problem in a limited time.

3.5.2. Discrete Logarithm (DL) Problem

Let $G$ be a multiplicative cyclic group of large prime order $p$ with generator $g$. The DL problem is described as follows: given $h \in G$, compute $a \in Z_p$ such that $h = g^a$.

Definition 2. DL assumption: the probability that any probabilistic polynomial-time adversary $\mathcal{A}$ solves the DL problem is negligible, namely, $\Pr[\mathcal{A}(g, h) = a : h = g^a] \le \varepsilon$, where $\varepsilon$ is negligible. In other words, it is computationally infeasible to solve the DL problem in a limited time.
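To give a feeling for what the two assumptions rule out, the sketch below solves both problems by exhaustive search in a deliberately tiny group. The group (the subgroup of order 1019 generated by 4 modulo 2039) and the function names are assumptions for this example. The search needs up to $p$ group operations, which is only feasible because $p$ is tiny; the assumptions state that no efficient algorithm exists when $p$ is large.

```python
# Exhaustive-search attacks on DL and CDH in a toy group (illustration only).

def discrete_log(g: int, h: int, modulus: int, order: int):
    """Return a in [0, order) with g^a = h (mod modulus), or None."""
    value = 1                       # g^0
    for a in range(order):
        if value == h:
            return a
        value = value * g % modulus
    return None


def cdh_via_dl(g: int, ga: int, gb: int, modulus: int, order: int) -> int:
    """Solve CDH by first recovering a from g^a, then computing (g^b)^a."""
    a = discrete_log(g, ga, modulus, order)
    if a is None:
        raise ValueError("ga is not in the subgroup generated by g")
    return pow(gb, a, modulus)
```

The second function also shows that an algorithm solving the DL problem immediately solves the CDH problem in the same group, so the DL problem is at least as hard as the CDH problem.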

4. Public Auditing Scheme Based on SCPK

Then, we describe how to construct our public auditing scheme based on the SCPK system in more detail.

(1) System Initialization Phase. Let and be two groups of a large prime order and be a generator of . Let be a bilinear map with . Let be a hash function expressed as . Suppose that the outsourced file is divided into data blocks, i.e., . Assume the identities of the user and file are and , respectively.

(2) Key Generation Phase. In the system, TPA can be used as a trusted authority who is responsible for the user’s registration and the generation of user’s public key. TPA first publishes the modulus and its public key . The private key of TPA is . The length of is more than 1024 bits, and , where is Euler’s totient function. Then, the user selects a random number as his private key and calculates . After that, the user sends the and his identity to TPA who will calculate user’s public key and send to user. After receiving , the user verifies the validity of equation . If the equation holds, then the running result of this stage is , where is the random element of .

(3) Signature Generation Phase. With the public parameter and his private key , the user generates a signature for each data block . The mentioned is ’s version number, and is ’s time stamp. Then, let the signature set of all blocks be .

(4) File Tag Generation Phase. To ensure the integrity of the unique file identifier , the user computes the file tag with his private key . In the equation, and , where is a random number chosen by the user. Finally, the user sends the data information to the TPA for auditing and uploads to the CSP for storage.

(5) Block Tag Generation Phase. After receiving , CSP further generates a tag for each block by using the bilinear map . Then, CSP stores the verification metadata along with the file .

(6) File Identifier Check Phase. The user delegates the verification task of a certain file to the TPA. Then, TPA requests the corresponding file tag from CSP and verifies the equation with user’s public key . If the verification fails, TPA informs the user that the files have been corrupted; otherwise, verification continues.

(7) Challenge Generation Phase. TPA launches the verification challenge to the CSP in this stage. TPA first chooses a random number and calculates , which is called random masking and is used to achieve privacy preserving [39]. Then, TPA sends the challenge information to CSP, where is the index of the blocks to be checked, is the random number, and is the selected number of the blocks to be checked [12].

(8) Proof Generation Phase. After receiving the challenge information, CSP would generate corresponding proofs of required blocks, which contain two parts: the tag proof and the data proof. More specifically, CSP generates the tag proof as follows: which can indicate the tags’ correctness. And CSP generates the data proof as follows: where and . The data proof can indicate the data’s integrity. Then, CSP sends the proof to TPA.

(9) Proof Verification Phase. After receiving the proof, TPA would check whether the proof is valid. More concretely, TPA checks whether holds. If the above verification equation holds, it shows that the outsourced data in the cloud are integral; otherwise, it shows that the data are incomplete.

The correctness of the above equation can be demonstrated as follows:

5. Security Proof and Performance Analysis

In the proposed public auditing scheme, CSP is assumed to be an untrustworthy party and TPA is considered credible but curious. CSP may conceal the data errors or deliberately delete some data. TPA may be curious about the privacy information of users’ data and even may try to derive the users’ data contents. Then, necessary security and performance analyses of the new scheme will be comprehensively demonstrated in this section.

First of all, let us analyze the security of the self-certified public key system.

If an attacker attempts to retrieve the user’s secret key from his/her public key , he/she must calculate the secret key from the equation . In this way, he/she will face the difficulty of computing discrete logarithm modulo . In other words, the attacker’s probability of success equals the probability of solving the discrete logarithm problem and the factorization problem. Moreover, even if TPA knows and , the difficulty for it to retrieve the user’s secret key is also equivalent to the difficulty of computing a discrete logarithm.

Another scenario is that an attacker tries to derive user’s secret key from the user’s signature. For the file tag , the attacker should obtain from or . However, the difficulty for him to achieve it is also equivalent to the difficulty of computing discrete logarithm problem. For the block signature , the attacker should compute from the equation. He also faces the difficulty of computing discrete logarithm problem.

The final scenario is that an attacker tries to impersonate the signer to forge a valid signature without knowing the signer’s secret key . For the file tag, the above analysis shows that the attacker cannot reveal the user’s secret key. Then, he cannot forge a valid signature that can pass the verification. For the block signature, Definition 2 indicates that the probability that any probabilistic polynomial‐time adversary solves the DL problem can be negligible. Then, it is computationally infeasible for the attacker to forge a valid HVA in a limited time. The proof, which is demonstrated in the security analysis of [12], is omitted in this paper.

Secondly, we discuss the unforgeability of proofs.

In the presented public auditing scheme, CSP sends the proof to TPA after the proof generation phase. The above analysis shows that the tag proof cannot be forged owing to the CDH assumption. Then, we only need to prove that the data proof cannot be forged. Suppose CSP sends a fake proof to TPA, where and . If CSP wants to pass the verification, the equation must hold. Then, we can deduce that according to the properties of bilinear maps. However, this contradicts the above assumption. That is to say, the data proof is unforgeable. In summary, our presented scheme can effectively resist the forging attacks launched by CSP.

Thirdly, we discuss the communication and computation overhead, which are reduced by introducing the batch auditing.

With the batch auditing, multiple verification tasks from different users can be handled concurrently. Suppose that TPA sends challenges to CSP. Then, the tag proof and the data proof are calculated separately. And CSP figures out the aggregate proofs according to the following equation: where , , , and . is the identity of the j-th user. Then, CSP sends the aggregate proofs to TPA. Once received, TPA checks whether the equation holds. and are the version number and time stamp of block for the j-th user. is the random number chosen by TPA for the j-th user. is the random masking calculated by TPA for the j-th user. is the random number chosen by TPA for the j-th user.

If the above verification equation holds, it shows that our scheme can realize the batch auditing. Then, its correctness can be demonstrated as follows:

Finally, our new scheme is based on the self-certified public key system. Compared with other public auditing schemes [10, 12, 18, 19, 21–28, 38, 39], no public key certificate is included in the public authentication parameters, and there is no need to store or transmit a public key certificate before the auditing interaction. The validation of public key certificates and the checking of their validity are therefore omitted. The verification of the public key is implicit in the verification of the signature. Consequently, storage space and communication bandwidth are saved, and the network load and transmission delay are reduced. The verification efficiency of the public key and the authentication efficiency of the scheme are improved.

6. Discussion and Conclusions

In this paper, we present a public auditing protocol with a self-certified public key system using blockchain technology, which differs from the state-of-the-art schemes. The user's operational information and the metadata of the file are packed into a block after being verified by the checking nodes, and the block is then put into the blockchain. The chain structure of the blocks ensures the security of the auditing data source. Comprehensive analyses show that attackers cannot derive the user’s secret key in the proposed scheme. TPA cannot derive users’ data from the collected auditing information during the verification phase. Attackers cannot impersonate the signer to forge a valid signature without knowing the signer’s secret key. The presented scheme can also effectively resist the forging attacks launched by CSP. The realization of batch auditing and the efficiency of the scheme are also discussed in this paper. Compared with other public auditing schemes, our public auditing scheme saves storage space and communication bandwidth. The network load is also reduced. In addition, the verification efficiency of the public key and the authentication efficiency of the scheme are improved.

However, in an actual cloud storage environment, many kinds of data need to be updated dynamically to meet various application requirements. For instance, users might insert data because the outsourced data are incomplete or might delete data that are no longer used. Our public auditing scheme does not specifically address dynamic data auditing; this could be handled by adopting the DHT [19] or by a new structure to be proposed in our future research. Furthermore, TPA may perform public auditing protocols dishonestly and may even collude with CSP to deceive users. Some existing public audit schemes use blockchain to resist malicious TPA. However, CSP may guess the challenge messages, and there is a risk that user information may be disclosed to TPA during the audit process. These questions will be the focus of our future research.

Data Availability

All data, models, and codes generated or used during the study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China under grant no. 61702316, the Natural Science Foundation of Shanxi Province under grant nos. 201801D221177 and 201901D111280, the Educational Research Projects of Young and Middle-Aged Teachers in Fujian Education Department under grant no. JAT170142, the Key Research and Development Project of Shandong Province under grant no. 2019JZZY010134, and the Graduate Education Reform Research Project of Shanxi Province under grant no. 2020YJJG145.