Abstract

Ensuring the integrity of remote data is a prerequisite for cloud-edge computing. Traditional data integrity verification schemes require users to spend considerable time regularly checking their data, which is impractical for large-scale IoT (Internet of Things) data. On the other hand, introducing a third-party auditor (TPA) may create new privacy and security issues. We use blockchain to address the problems caused by the TPA. However, implementing dynamic integrity verification on blockchain is a further challenge because of the low throughput and poor scalability of blockchain. More importantly, it is not yet known whether blockchain-based integrity verification introduces security problems of its own. In this paper, we propose a scalable blockchain-based integrity verification scheme that implements fully dynamic operations and blockless verification. The scheme builds scalable homomorphic verification tags based on ZSS (Zhang-Safavi-Susilo) short signatures. We exploit smart contract technology to replace the TPA for integrity verification tasks, which not only eliminates the risk of privacy leakage but also resists collusion attacks. Furthermore, we formally define a blockchain-based security model and prove that our scheme is secure under the security assumptions of its cryptographic primitives. Finally, mathematical analysis shows that both the computation complexity and the communication complexity of an audit are O(c), where c is the number of challenged blocks. We compare our scheme with other schemes, and the results show that ours has the lowest time consumption to complete an audit.

1. Introduction

The rapid development of the Internet of Things (IoT) generates huge amounts of data. IoT devices store data in the cloud for cloud-edge computing. However, ensuring the integrity of remote data is a prerequisite for implementing cloud-edge computing [1].

Traditional cloud data integrity verification schemes [2, 3] rely on techniques such as message authentication codes and hash functions to let users know the status of their data. Nonetheless, these heuristics incur large computation and communication overheads, since users need to retrieve all of their data. Some schemes [4, 5] reduce the verification overhead by constructing homomorphic verification tags. While these schemes enable quick auditing of data, users still need to spend a lot of time auditing their data periodically. To reduce the auditing burden on users, third-party auditors (TPAs) [6] have been introduced to perform auditing tasks on cloud data. However, in real-world scenarios, TPAs are not completely trustworthy, and two threats arise [7–9]. First, a malicious TPA may extract data privacy by auditing the same data blocks over and over again. Second, a malicious TPA may collude with cloud servers to produce fake audit results.

Fortunately, blockchain smart contract technology [10, 11] makes it possible to address these issues simultaneously. Smart contracts are encapsulated scripts that can be executed automatically. Therefore, we can use smart contracts to perform auditing tasks instead of TPAs. However, the low throughput and poor scalability of blockchain make it difficult to use blockchain for dynamic cloud storage. Therefore, addressing the scalability of integrity verification schemes in blockchain network environments is a huge challenge. More importantly, attention should be paid to whether the security of integrity verification schemes is affected in the open network environment of blockchain. To the best of our knowledge, no existing scheme gives a formal security proof. Therefore, it is essential to provide a security proof for blockchain-based integrity verification schemes.

In this paper, we propose a scalable blockchain-based integrity verification scheme that enables fully dynamic operations such as insertion, deletion, and modification to address the issues raised above. We create a scalable homomorphic verification tag based on the ZSS (Zhang-Safavi-Susilo) short signature, which uses basic cryptographic hash functions such as SHA-1 or MD5 and does not require expensive special hash algorithms to achieve scalability. The scheme supports blockless verification, which allows users to audit their data without retrieving all of it. In addition, we use blockchain smart contract technology instead of a TPA for the integrity verification task, which not only eliminates the risk of privacy leakage but also protects against collusion attacks. To evaluate the security of our scheme in a blockchain environment, we formally define a blockchain-based security model and demonstrate that the scheme is secure against adaptive chosen message attacks under the security assumptions of its cryptographic primitives.

1.1. Contributions

The following are the main contributions of this paper:

(1) We propose a scalable blockchain-based integrity verification (SBB-IV) scheme that implements fully dynamic operations and blockless verification. The scheme achieves scalability under blockchain networks by building scalable homomorphic verification tags (HVTs) based on ZSS short signatures, which use general cryptographic hash functions and do not require expensive special hash functions.

(2) We exploit smart contract technology to replace the TPA for integrity verification tasks, which not only eliminates the risk of privacy leakage but also resists collusion attacks. Furthermore, we formally define a blockchain-based security model that captures the semantic security of adaptive chosen message attacks (CMA). We show that the SBB-IV scheme is secure against adaptive CMA under the security assumption of the q-CAA problem.

(3) Mathematical analysis shows that both the computation complexity and the communication complexity of an audit are O(c), where c is the number of challenged blocks. In addition, we run a series of tests on Hyperledger Fabric v2.2 and compare our scheme with the current state of the art. Our scheme is more efficient: it takes only 2.3 seconds to complete an audit when 1% of the data blocks are faulty.

1.2. Paper Organization

The remainder of this work is organized as follows. We provide an overview of related work in Section 2. Preliminaries are presented in Section 3. The network model, threat model, framework, protocol, and security model are described in Section 4. The detailed algorithms are presented in Section 5. We examine correctness, dynamic operations, and security in Section 6. The mathematical analysis and experimental results are presented in Section 7. Section 8 concludes the paper.

2. Related Work

2.1. Traditional Data Integrity Verification

Provable data possession (PDP) [12] and proofs of retrievability (POR) [13] are two types of data integrity verification models. The PDP model was formally specified by Ateniese et al. [12], who presented an HVT based on RSA (Rivest-Shamir-Adleman) signatures. They separated the data into blocks and calculated an HVT for each one. The user then chose a fixed number of blocks at random for verification. Although this sampling approach decreases the computation cost from linear to constant, the scheme is not capable of dynamic operations due to the fixed indices of blocks. Juels et al. [13] presented a sentinel-based POR technique in which data segments (sentinels) were randomly inserted into the full data encoded with error correction codes. Due to the limited number of sentinels, it can only undertake a limited number of audits. BLS (Boneh-Lynn-Shacham) signatures were utilized by Shacham et al. [14] to create HVTs, which reduces communication overhead because a BLS signature is shorter than an RSA signature. Wang et al. [8] described how to build a dynamic PDP system using a Merkle tree, an authenticated data structure. Similarly, Erway et al. [15, 16] proposed a skip-list-based dynamic PDP (DPDP) system. Instead of using a fixed index, these data structures indicate block positions in terms of the order of leaf nodes, allowing blocks to be dynamically inserted at varied locations. However, because these data structures require auxiliary information to validate the leaf node positions, they have a computation and communication complexity of O(log n), making them unsuitable for large-scale data. By first encrypting the data and then providing some precomputed hashes of the encrypted data to the TPA, Shah et al. [6, 17, 18] introduced a TPA to audit the data. The TPA, however, is unable to continue auditing after the hashes run out. Furthermore, a malicious TPA may collect information by auditing the same data blocks over and over again. Although random mask approaches [5, 9, 19, 20] have been devised to obscure the linear combination of data and prevent the TPA from extracting it, they are still ineffective in preventing collusion attacks.

2.2. Blockchain-Based Data Integrity Verification

By replacing the integrity management service of centralized nodes with a fully decentralized data integrity service, Liu et al. [10] proposed a blockchain-based Internet of Things (IoT) data integrity service framework that eliminates the TPA. However, as they only implemented the proposed protocol's basic features, the efficiency of building smart contracts for IoT devices is insufficient for large-scale IoT data. To ensure data availability and privacy, Liang et al. [23] suggested a decentralized and dependable cloud data provenance protection architecture. The architecture used tamper-proof blockchain records and embedded data provenance in blockchain transactions, with auditors verifying the data's origins based on the information in the blocks. Paying the blockchain miners, on the other hand, would be prohibitively expensive for cloud customers. To address the problem of unreliability in traditional verification procedures, Yue et al. [22, 24] presented a blockchain-based P2P cloud storage data integrity verification methodology. The approach used Merkle trees to verify data integrity and examined system performance with various Merkle tree architectures. Wang et al. [25] proposed a decentralized architecture to tackle the traditional paradigm's single-point trust problem through communal trust. The architecture built a public protocol that maintains the data state under public scrutiny and prevents storage parties from engaging in fraudulent activities. For large-scale IoT data, Wang et al. [11] developed a blockchain-based data integrity verification system. They constructed a prototype system of edge computing processors near IoT devices to preprocess large-scale IoT data and performed data integrity verification in the form of transactions. None of the aforementioned approaches provide a formal proof of security, and the security of integrity verification in a blockchain network setting remains an open question. We compare our scheme with the state of the art in Table 1.

3. Preliminaries

3.1. Smart Contract

A blockchain is a distributed database that uses encryption, hashing, timestamping, consensus mechanisms, and other techniques [26]. All operations (transactions) are recorded on the blockchain, which is a chained data structure with tamper-proof features. A smart contract is a blockchain-based event-driven program [27]. It is contained within a virtual node that allows automated script execution and data processing in response to event triggers. Smart contracts, like transactions on the blockchain, offer distributed storage and tamper-proof characteristics. Unlike traditional executable programs, smart contracts are distributed and run according to preset rules that create communication protocols between the communicating parties [28]. As a consequence, smart contracts enable traceable and irreversible activities without the involvement of a third party.

3.2. ZSS Signature

Let P be a generator of the group G1, a cyclic additive group of large prime order p. Let G2 be a cyclic multiplicative group of the same order p. A map e: G1 × G1 → G2 is a bilinear pairing if it satisfies the following properties:

(1) Bilinearity: for all P, Q ∈ G1 and all a, b ∈ Z_p, the equation e(aP, bQ) = e(P, Q)^{ab} holds

(2) Computability: for all P, Q ∈ G1, there is an efficient algorithm to compute e(P, Q)

(3) Nondegeneracy: there exist P, Q ∈ G1 such that e(P, Q) ≠ 1, which means that the map does not send all pairs in G1 × G1 to the identity in G2

The ZSS signature [29] includes three algorithms: KeyGen, Sign, and Verify. Let H be a secure hash function.

(i) KeyGen: randomly select an integer x ∈ Z_p^* and compute P_pub = xP. The private key is x, and the public key is P_pub.

(ii) Sign: given a message m, the signature is S = (1/(H(m) + x))P.

(iii) Verify: given a signature S, a public key P_pub, and a message m, compute H(m)P + P_pub and check the equation e(H(m)P + P_pub, S) = e(P, P).

If the equation holds, the signature is valid; otherwise, the signature is invalid.
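The correctness of ZSS verification follows directly from bilinearity; the short derivation below (a standard check, stated in the notation introduced above) shows why a signature generated with the private key always passes verification.

```latex
\begin{align*}
e\bigl(H(m)P + P_{pub},\, S\bigr)
  &= e\Bigl((H(m)+x)P,\; \tfrac{1}{H(m)+x}P\Bigr) \\
  &= e(P, P)^{(H(m)+x)\cdot \frac{1}{H(m)+x}} \\
  &= e(P, P).
\end{align*}
```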

4. Approach Overview

4.1. Network and Threat Model

Figure 1 depicts the SBB-IV scheme's network model, which consists of three entities: data owner devices (DODs), cloud service providers (CSPs), and smart contracts.

(1) DODs: DODs act as nodes on the blockchain network, outsourcing users' data to CSPs and paying for execution through smart contracts.

(2) CSPs: CSPs provide data storage and maintenance services and are connected to the blockchain network as nodes.

(3) Smart contracts: Smart contracts are virtual nodes that contain automated scripts. They cannot be destroyed or modified by any adversary.

DODs outsource users' data to CSPs and pay for the execution through smart contracts on the blockchain network. Smart contracts issue challenges to audit cloud data integrity. Based on the proof created by the CSP, smart contracts use the verification algorithm to check the proof's validity and deliver the outcome to the DODs. Finally, all interactions are recorded on the blockchain. Collusion attacks involving a TPA are avoided in this process because smart contracts are automated execution scripts. As a result, only the threat model described below is considered in this paper.

4.2. Malicious CSP

The malicious CSP knows the data and the public information; its purpose is to cheat the smart contracts. That is, the malicious CSP possesses the stored data and all public parameters and wants to forge a proof that passes the verification of the smart contracts.

4.3. Protocol

The SBB-IV scheme is a collection of five polynomial-time algorithms: key generation, tag generation, challenge generation, proof generation, and verification (see the detailed algorithms in Section 5.2). Based on the scheme, we create an integrity verification protocol. Setup, Challenge, and Verify are the three stages of the protocol, as shown in Figure 2.

In the Setup stage, the DOD uses the key generation algorithm to create the key pair. It then runs the tag generation algorithm to compute an HVT sequence and a hash sequence, uploads the blocks and the HVT sequence to the CSP, and saves the hash sequence to the audit smart contract (A-SC).

In the Challenge stage, the DOD sends an audit request to the challenge smart contract (C-SC). Then, using the challenge generation algorithm, the C-SC produces a challenge and transmits it to the CSP and the A-SC.

In the Verify stage, the CSP creates a proof for the challenge using the proof generation algorithm and delivers it to the A-SC. The proof is then verified by the A-SC using the verification algorithm, and the result is sent to the DOD.
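To make the division of work concrete, the following Go sketch outlines the five algorithm interfaces and the entity that runs each of them. All type and function names (KeyGen, TagGen, ChalGen, ProofGen, Verify, and the placeholder types) are illustrative, not the paper's notation; the real scheme operates on pairing group elements.

```go
package sbbiv

// Illustrative placeholder types; the actual scheme works with
// pairing group elements and block hashes (see Section 5).
type (
	PublicKey  []byte
	PrivateKey []byte
	Block      []byte
	HVT        []byte // homomorphic verification tag
	Challenge  struct {
		Indices []int
		Coeffs  [][]byte
	}
	Proof []byte
)

// Run by the DOD in the Setup stage.
type DataOwner interface {
	KeyGen(securityParam int) (PublicKey, PrivateKey)
	TagGen(sk PrivateKey, blocks []Block) (tags []HVT, hashes [][]byte)
}

// Run by the challenge smart contract (C-SC) in the Challenge stage.
type ChallengeContract interface {
	ChalGen(numBlocks, c int, seed []byte) Challenge
}

// Run by the CSP in the Verify stage.
type CloudProver interface {
	ProofGen(chal Challenge, blocks []Block, tags []HVT) Proof
}

// Run by the audit smart contract (A-SC) in the Verify stage.
type AuditContract interface {
	Verify(pk PublicKey, chal Challenge, hashes [][]byte, proof Proof) bool
}
```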

4.4. Blockchain-Based Security Model

An interactive game between a challenger, a smart contract, and an adversary defines the blockchain-based security model. In the Setup phase, the challenged data is conveyed to the adversary to capture the semantic security of adaptive chosen message attacks. As a result, in the Query phase, the adversary may adaptively pick multiple data blocks for HVT queries. The game is played as follows:

(i) Setup: the challenger generates a key pair and sends the public key to the adversary. Then, the challenger sends a data file and its HVT sequence to the adversary. Finally, the challenger sends the hash sequence to the smart contract.

(ii) Query: the adversary adaptively makes queries by sending selected blocks to the challenger. For each query, the challenger computes the corresponding HVT and returns it. Note that a queried block can be a block of the challenged data.

(iii) Challenge: the smart contract generates a challenge for the data by running the challenge generation algorithm.

(iv) Forge: according to the challenge, the adversary calculates a proof and sends it to the smart contract.

(v) Verify: the smart contract verifies the proof by executing the verification algorithm. If the proof is valid, the adversary wins the game.

The game process is illustrated in Figure 3. The security definition of the scheme is as follows.

Definition 1. The SBB-IV scheme is secure against the adaptive chosen message attack if, when the integrity of the remote data is violated, no probabilistic polynomial-time adversary can win the above game with nonnegligible probability.

5. Our Schemes

5.1. Notations

In addition to the symbols specified in the preceding section, we employ a pseudorandom permutation function (PRP) and a pseudorandom function (PRF). Aside from that, H is a secure general-purpose hash function.

5.2. Scheme Detail

The SBB-IV scheme is described in full in this section:

(1) Key generation. According to the security parameter, the algorithm selects a random number x from Z_p^* as the private key and computes P_pub = xP as the public key.

(2) Tag generation. The algorithm first splits the data into n equal-length blocks m_1, ..., m_n. Next, for each block m_i (1 ≤ i ≤ n), it calculates the hash value h_i = H(m_i) and then computes the HVT as a ZSS-style signature on the block, σ_i = (1/(h_i + x))P (Equation (2)). Finally, it outputs the HVT sequence σ_1, ..., σ_n and the hash sequence h_1, ..., h_n.

(3) Challenge generation. Based on the audit request, the algorithm chooses two random numbers as keys and the number of challenged blocks c, where c ≤ n. For j = 1, ..., c, it computes a block index with the PRP function and a random coefficient with the PRF function. The output is the challenge consisting of the c index-coefficient pairs.

(4) Proof generation. According to the challenge, the algorithm aggregates the challenged blocks' HVTs with the challenge coefficients and outputs the proof.

(5) Verification. The algorithm accepts the public key, the challenge, the stored hash sequence, and the proof as input and checks whether the verification equation holds. If the equation holds, the algorithm outputs 1; otherwise, it outputs 0.
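As an illustration of step (3), the Go sketch below generates a challenge: it draws two fresh keys and derives c pseudorandom block indices and coefficients. HMAC-SHA256 stands in for the PRF, and index derivation by modulo reduction is a simplified stand-in for the PRP; all function and variable names are illustrative, not the paper's notation.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// prf keys HMAC-SHA256 with k and evaluates it on the counter j.
func prf(k []byte, j uint64) []byte {
	mac := hmac.New(sha256.New, k)
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], j)
	mac.Write(buf[:])
	return mac.Sum(nil)
}

// ChalGen picks c distinct pseudorandom block indices (out of n blocks)
// and c pseudorandom coefficients from two fresh keys.
func ChalGen(n, c int) (k1, k2 []byte, indices []uint64, coeffs [][]byte) {
	k1, k2 = make([]byte, 32), make([]byte, 32)
	rand.Read(k1)
	rand.Read(k2)
	seen := make(map[uint64]bool)
	for j := uint64(0); len(indices) < c; j++ {
		idx := binary.BigEndian.Uint64(prf(k1, j)) % uint64(n)
		if seen[idx] { // skip duplicates so exactly c distinct blocks are audited
			continue
		}
		seen[idx] = true
		indices = append(indices, idx)
		coeffs = append(coeffs, prf(k2, j))
	}
	return
}

func main() {
	_, _, idx, _ := ChalGen(100000, 5)
	fmt.Println("challenged block indices:", idx)
}
```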

5.3. Dynamic

Because the HVT built in the scheme (as shown in Equation (2)) is based solely on the block and excludes a fixed numerical index, the scheme can support completely dynamic operations such as modification, insertion, and deletion. The tag generation algorithm produces a hash sequence that records the location of each block. As a consequence, the following procedure is used to update the data:

Step 1: The DOD delivers an update request to the A-SC, consisting of the operation type, the position to update, and the updated content. Note that when a delete operation is performed, the updated content is empty.

Step 2: According to the request, the A-SC performs the corresponding update operation on the hash sequence and records the modification on the blockchain.

Step 3: When the blockchain record succeeds, the DOD sends the update to the CSP to complete the operation.
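A minimal Go sketch of this update flow is given below, assuming the audit contract keeps the hash sequence as an in-memory slice; the type and field names (UpdateRequest, Op, Position, NewHash) are hypothetical, and writing the record to the ledger is omitted.

```go
package main

import "fmt"

// UpdateOp is the operation type carried in a DOD update request.
type UpdateOp int

const (
	OpInsert UpdateOp = iota
	OpModify
	OpDelete
)

// UpdateRequest mirrors the request described above: an operation type,
// the position to update, and the new block hash (empty for deletion).
type UpdateRequest struct {
	Op       UpdateOp
	Position int
	NewHash  []byte
}

// applyUpdate shows how the audit contract could maintain its hash
// sequence for each operation before recording the change on the ledger.
func applyUpdate(hashes [][]byte, req UpdateRequest) [][]byte {
	switch req.Op {
	case OpInsert:
		hashes = append(hashes[:req.Position],
			append([][]byte{req.NewHash}, hashes[req.Position:]...)...)
	case OpModify:
		hashes[req.Position] = req.NewHash
	case OpDelete:
		hashes = append(hashes[:req.Position], hashes[req.Position+1:]...)
	}
	return hashes
}

func main() {
	hashes := [][]byte{{0x01}, {0x02}, {0x03}}
	hashes = applyUpdate(hashes, UpdateRequest{Op: OpInsert, Position: 1, NewHash: []byte{0xaa}})
	fmt.Println("blocks after insertion:", len(hashes)) // 4
}
```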

5.4. Implementation

In our scheme, we encapsulate the challenge generation algorithm and the verification algorithm into smart contracts to perform the auditing task instead of a TPA. Users trigger the execution of the smart contracts by sending an audit request. The request includes not only the number of audited blocks and security parameters; users can also set parameters such as the audit cycle time and the number of audits according to their needs. The parameter c is the number of randomly sampled blocks in one audit. The larger c is, the higher the audit confidence and the higher the computational overhead. Therefore, users set different values of c according to their needs to make a trade-off between confidence level and computation overhead. The tamper-proof nature of smart contracts eliminates the possibility of privacy leakage and collusion attacks. Owing to the collision resistance and one-way nature of the hash function, an attacker cannot recover the data from the hash values, even though we have placed the hash sequence on the public smart contract.
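Since the experiments run on Hyperledger Fabric 2.2, the following Go chaincode sketch illustrates how an audit-triggering request could be exposed as a smart contract transaction. It is a minimal sketch assuming the standard fabric-contract-api-go package; the function name RequestAudit and the stored fields are hypothetical, not the paper's actual contract interface.

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/hyperledger/fabric-contract-api-go/contractapi"
)

// AuditContract is a minimal chaincode that records audit requests on the ledger.
type AuditContract struct {
	contractapi.Contract
}

// AuditRequest carries the user-chosen parameters described above.
type AuditRequest struct {
	NumBlocks int `json:"numBlocks"` // c: number of sampled blocks per audit
	CycleSec  int `json:"cycleSec"`  // audit cycle time in seconds
	Rounds    int `json:"rounds"`    // number of audits to perform
}

// RequestAudit stores an audit request under the given key; the C-SC would
// later read it to generate a challenge.
func (a *AuditContract) RequestAudit(ctx contractapi.TransactionContextInterface,
	key string, numBlocks, cycleSec, rounds int) error {
	req := AuditRequest{NumBlocks: numBlocks, CycleSec: cycleSec, Rounds: rounds}
	data, err := json.Marshal(req)
	if err != nil {
		return err
	}
	return ctx.GetStub().PutState(key, data)
}

func main() {
	cc, err := contractapi.NewChaincode(&AuditContract{})
	if err != nil {
		fmt.Printf("error creating chaincode: %v\n", err)
		return
	}
	if err := cc.Start(); err != nil {
		fmt.Printf("error starting chaincode: %v\n", err)
	}
}
```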

6. Scheme Analysis

6.1. Correctness

The PRP and PRF functions used in the challenge generation algorithm of the SBB-IV scheme ensure that the blocks are randomly picked for each audit, making it impossible for a malicious CSP to prepare proofs ahead of time. If the remote data is intact, the proof generated by the proof generation algorithm always passes the check of the verification algorithm; this correctness follows from the bilinearity of the pairing, in the same way as the ZSS verification in Section 3.2.

6.2. Security

We treat the hash function as a random oracle and reduce the security of the SBB-IV scheme to the q-CAA problem [21].

Definition 2 (q-CAA problem). For an integer q, a random x ∈ Z_p^*, and a generator P ∈ G1, given P, xP, h_1, ..., h_q ∈ Z_p^*, and (1/(h_1 + x))P, ..., (1/(h_q + x))P, compute (1/(h + x))P for some h ∉ {h_1, ..., h_q}.

q-CAA assumption. The q-CAA problem is (t, ε)-hard if, for any t-time adversary A, the advantage of A in solving the problem is negligible:

Adv_A = Pr[A(P, xP, h_1, ..., h_q, (1/(h_1 + x))P, ..., (1/(h_q + x))P) = (1/(h + x))P for some h ∉ {h_1, ..., h_q}] ≤ ε,

where ε is a negligible probability.

Theorem 3. Suppose the (t, ε)-q-CAA assumption holds in the group G1; then our scheme is (t, ε)-secure against the adaptive chosen message attack under the random oracle model.

Proof. If an adversary A can break the security of the SBB-IV scheme, we show how a challenger C can use A to solve the q-CAA problem. The challenger is given P, xP, and q pairs (h_i, (1/(h_i + x))P) (the q-parameters), and her goal is to compute (1/(h + x))P for some h ∉ {h_1, ..., h_q}. The interactive game between the challenger C, a smart contract SC, and the adversary A proceeds as follows:

(i) Setup: C sets the public key to xP and sends it to A. Then, C selects a data file and constructs its HVT sequence and hash sequence as follows. C maintains a list L of tuples (block, hash value, HVT), which is initially empty. For each block, C selects an unused pair (h_i, (1/(h_i + x))P), programs the random oracle so that the block's hash value is h_i, and sets its HVT to (1/(h_i + x))P. C then adds the tuple to the list L and removes the pair from the q-parameters. Finally, C sends the HVT sequence to A and the hash sequence to SC.

(ii) Query: A adaptively selects blocks and sends them to C for HVT queries; a queried block may be a block of the challenged data. At any time, A may also query the hash oracle. C responds as follows:

(1) Hash query. C first checks whether the queried block already exists in the list L. If so, C responds with the recorded hash value; otherwise, C randomly selects an unused pair from the remaining q-parameters, responds with its hash value, adds the corresponding tuple to L, and removes the pair from the q-parameters.

(2) HVT query. C first checks whether the queried block already exists in the list L. If so, C responds with the corresponding HVT; otherwise, C randomly selects an unused pair from the remaining q-parameters, responds with the corresponding HVT, adds the tuple to L, and removes the pair from the q-parameters.

(iii) Challenge: SC generates a challenge for the data by running the challenge generation algorithm.

(iv) Forge: According to the challenge, A calculates a proof and delivers it to SC.

(v) Verify: SC verifies the proof by executing the verification algorithm. If it outputs 1, A wins the game.

Now suppose that one of the audited blocks is corrupted and that A can nevertheless forge a fake proof that passes the verification with nonnegligible probability. When SC verifies the fake proof, it evaluates the verification equation with the hash values stored in the hash sequence. Since the fake proof passes the verification, the forged tag contained in it must satisfy the ZSS verification relation for some hash value h*, that is, it equals (1/(h* + x))P. We distinguish two cases.

(i) Case 1: h* coincides with the hash value recorded for the original block. In this case, the forged tag equals the genuine HVT, so the block is in fact not corrupted, which contradicts our hypothesis. Therefore, this case shows that the block is not corrupted when the adversary wins the game.

(ii) Case 2: h* is not among the hash values taken from the q-parameters. This case shows that if the adversary can produce a fake HVT with nonnegligible probability ε in time t, then the challenger obtains (1/(h* + x))P for a fresh h* with the same nonnegligible probability in essentially the same time, which means C breaks the q-CAA problem.

In summary, suppose the (t, ε)-q-CAA assumption holds in the group G1; then our scheme is (t, ε)-secure against the adaptive chosen message attack under the random oracle model.

6.3. Scalability

In an IoT data storage system, with the continuous increase of IoT data, cloud storage needs to be scalable. In the SBB-IV scheme, we divide large data into smaller blocks, which benefits fine-grained control of the data and enhances the scalability of the cloud storage system. Meanwhile, the proposed scheme is fully dynamic, which means node devices can insert, modify, and delete uploaded data according to their needs. Furthermore, the scheme can be integrated into more systems without compromising efficiency, since HVTs are computed using general cryptographic hash functions rather than expensive elliptic curve hash functions [30]. As a result, the scheme is suitable for the integrity verification of large-scale IoT data.

In addition to IoT systems, our scheme can also be applied to a blockchain-based P2P (peer-to-peer) file system. In this system, an edge device is a peer node, and each peer node can act as a client or server. Our scheme alleviates the bandwidth problem of sharing files from a central server to clients: files can be shared through different nodes without requesting all files from a central server. At the same time, due to the homomorphism of the HVTs, the speed at which nodes verify file integrity is greatly improved. Therefore, the SBB-IV scheme greatly improves the scalability and efficiency of file sharing.

7. Evaluation

To justify the performance of the SBB-IV scheme, we conduct mathematical analysis and a series of experiments in this section. The pairing-based cryptography library (PBC, http://crypto.stanford.edu/pbc/) is used in our experiments. The experiments are implemented in the GoLang programming language and run on an Intel(R) Core(TM) i7-10700 CPU with 16 GB of RAM. The blockchain platform is Hyperledger Fabric 2.2.0. The security level is set to 80 bits, implying that the order of the pairing groups is a 160-bit prime. We set each block's size to 8 KB and generate 1,000, 5,000, 10,000, 50,000, and 100,000 blocks for the tests. Each experiment is repeated 10 times, and we report the average values throughout the evaluation.

7.1. Mathematical Analysis

We calculate the computation complexity of the SBB-IV scheme. Users execute the key generation and tag generation algorithms; smart contracts run the challenge generation and verification algorithms; CSPs run the proof generation algorithm. The complexity of each algorithm is shown in Table 2.

(i) Key generation: this algorithm performs only one multiplication, so its cost is O(1).

(ii) Tag generation: since an HVT is computed for each of the n blocks, the overhead of the algorithm is O(n).

(iii) Challenge generation: according to the parameter c, the C-SC uses the PRP function and the PRF function to compute two groups of c random numbers, so the computation overhead is O(c).

(iv) Proof generation: this algorithm aggregates the c challenged blocks and their HVTs with the challenge coefficients, so the computation cost is O(c).

(v) Verification: this algorithm checks the verification equation over the c challenged blocks, so the computation cost is O(c).

In the Setup phase, the user sends the data blocks, the HVT sequence, the hash sequence, and the public key, so the communication complexity is O(n). In the Challenge phase, the C-SC sends a challenge of O(c) bits. In the Verify phase, the CSP returns a proof of O(c) bits. Therefore, the communication complexity of an audit is O(c).

7.2. Experiments

In this section, we evaluate the actual performance of the scheme with a series of experiments.

7.2.1. Setup

In the Setup stage, the user's main computation overhead comes from the tag generation algorithm. At the same time, the smart contract needs to store the hash sequence. In our experiments, we set the number of blocks to 1,000, 5,000, 10,000, 50,000, and 100,000, respectively. Because each block is 8 KB in size, 100,000 blocks represent about 780 MB of data. As shown in Figure 4, the time consumption of the tag generation algorithm grows linearly with the number of blocks, but the algorithm only needs to be executed once. For 780 MB of data, the smart contract's storage consumption is only 15.04 MB, which is easily achievable for a distributed ledger (Figure 5).

7.2.2. Audit

As discussed in Section 5, in the challenge generation algorithm, the larger the parameter c, the higher the audit confidence and the higher the computational overhead. Therefore, users set different values of c according to their needs to make a trade-off between confidence level and computation overhead.

To fully understand the time consumption of an audit, we locally tested the overall time consumption of the three audit algorithms: challenge generation, proof generation, and verification. We select three other blockchain-based schemes (YLZ [22], WCF [25], and WHZ [11]) for comparison. In our experiments, we set the number of challenged blocks to 200, 300, 400, 500, and 700, respectively. The experimental results (presented in Figure 6) show that our scheme has the lowest overall time consumption for one audit.

In addition, we test the time consumption of the proof generation algorithm running locally and the time consumption of the verification algorithm running in the encapsulated smart contract. Our blockchain platform uses Hyperledger Fabric 2.2.0, and we build a test network on a virtual machine (Ubuntu 20.04). Let P_X denote the probability of detecting corruption and m denote the number of corrupted blocks among the n stored blocks; when c blocks are challenged, we get

P_X = 1 - ∏_{i=0}^{c-1} (n - m - i)/(n - i) ≥ 1 - ((n - m)/n)^c. (13)

Equation (13) shows that when m blocks are corrupted, different values of c produce different confidence levels. Therefore, assuming that 1% of the blocks are corrupted, we set c = 300 and c = 460 to obtain 95% and 99% confidence, respectively. As presented in Figure 7, the time cost of the proof generation and verification algorithms remains stable even as the number of blocks increases.
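To make the confidence calculation concrete, the short Go snippet below evaluates the detection probability from Equation (13) and the number of challenged blocks needed for a target confidence; it is an illustrative calculation, not code from the paper's test suite.

```go
package main

import (
	"fmt"
	"math"
)

// detectProb returns the probability of catching at least one corrupted
// block when c of n blocks are challenged and m blocks are corrupted
// (Equation (13), exact product form).
func detectProb(n, m, c int) float64 {
	p := 1.0
	for i := 0; i < c; i++ {
		p *= float64(n-m-i) / float64(n-i)
	}
	return 1 - p
}

// blocksForConfidence returns the smallest c achieving the target
// confidence, using the approximation 1 - ((n-m)/n)^c.
func blocksForConfidence(n, m int, target float64) int {
	r := float64(n-m) / float64(n)
	return int(math.Ceil(math.Log(1-target) / math.Log(r)))
}

func main() {
	n, m := 100000, 1000 // 1% of blocks corrupted
	fmt.Printf("c=300: %.3f confidence\n", detectProb(n, m, 300))
	fmt.Printf("c=460: %.3f confidence\n", detectProb(n, m, 460))
	fmt.Println("smallest c for 99% confidence:", blocksForConfidence(n, m, 0.99))
}
```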

7.2.3. Dynamic

We also run dynamic simulation tests. In our setting, the time it takes to update a block is determined by the time for the blockchain to write the record and the time to produce the new HVT. We configured an endorsement node in our test network to write operation records to the Hyperledger Fabric ledger (https://hyperledger-fabric.readthedocs.io/en/latest/index.html). As shown in Figure 8, the time consumption of insertion and modification grows linearly as the number of dynamic blocks rises, while the time consumption of deletion remains constant. Insertion and modification take longer than deletion because deletion does not involve creating new HVTs; the time spent on deletion is primarily the time the endorsing node spends writing records. Insertion and modification take almost the same amount of time because the procedure for both operations is the same.

8. Conclusion

This paper addresses three problems: the privacy leakage and collusion attacks associated with a TPA, the poor scalability of blockchain, and the security of blockchain-based integrity verification schemes. To address these problems, we propose a scalable blockchain-based integrity verification scheme that implements fully dynamic operations and blockless verification. The scheme builds scalable homomorphic verification tags based on ZSS short signatures. We exploit smart contract technology to replace the TPA for integrity verification tasks, which not only eliminates the risk of privacy leakage but also resists collusion attacks. Furthermore, we formally define a blockchain-based security model that captures the semantic security of adaptive chosen message attacks and show that our scheme is secure under the security assumptions of its cryptographic primitives. Finally, mathematical analysis shows that both the computation complexity and the communication complexity of an audit are O(c), where c is the number of challenged blocks. We compare our scheme with other schemes, and the results show that ours has the lowest time consumption to complete an audit.

Data Availability

The data that support the findings of this study are not openly available due to reasons of sensitivity and are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Key R&D Program of Zhejiang Province (Grant Nos. 2022C01055 and 2020C05005), the Hangzhou Innovation Institute, Beihang University (Grant No. 2020-Y5-A-022), and the Beijing Natural Science Foundation (No. 4202036). An earlier version of this paper was presented at the 2021 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Internet of People, and Smart City Innovation conference (SmartWorld/SCALCOM/UIC/ATC/IOP/SCI).