Abstract

The data outsourcing services provided by cloud storage have greatly reduced the burden of data management for users, but the issue of remote data integrity raises further security concerns and computing costs. The introduction of a third-party auditor (TPA) frees data owners from the auditing burden and alleviates disputes over audit results between data owners and cloud storage providers. However, malicious cloud servers may collude with TPAs to deceive users for financial profit. Hiring multiple auditors for a single audit assignment appears to be a way to address this problem, but the ensuing voting issues need further exploration. In this paper, we propose a smart contract-based outsourced data integrity auditing scheme for multiauditor scenarios. Unlike some existing schemes that use reputation-like factors as voting weights, auditors in our scheme vote equally and audit as they go, without any maintenance. This mechanism not only frees auditors from chores unrelated to the auditing itself but also avoids the centralization drawbacks associated with excessively high voting weights. The challenge used to check the integrity of the outsourced data is jointly generated by every involved auditor, so any collusion is detected as long as at least one honest auditor takes part in the audit. We implement and deploy the scheme as Ethereum smart contracts. With the help of blockchain, the entire auditing process is public and transparent. Both the generated data and the obtained results are persisted immutably, which ensures the traceability of all historical audits. Comprehensive theoretical and experimental analyses demonstrate that our scheme meets the claimed targets with high efficiency and low gas costs.

1. Introduction

With the rapid development of the information age, individuals and organizations produce enormous amounts of data. By 2025, the amount of data generated globally is expected to reach 463 exabytes each day [1]. Traditional local storage models can no longer meet the management needs of such a massive volume. Cloud storage is quickly attracting users' attention for its scalability, low cost, and location independence [2]. With technologies such as virtualization, cloud storage converges loosely coupled nodes into a powerful platform that provides unified services to users. Today, more and more people are willing to migrate their local data to leased cloud storage [3]. However, once the data is uploaded to cloud storage, the owner completely loses control over it. Owners are obliged to access their data through the interfaces provided by cloud storage servers, and they have to rely entirely on the cloud storage to ensure the integrity of their data. Unfortunately, even though cloud storage employs a variety of advanced technologies to guarantee the reliability and robustness of users' data, corruption caused by hardware failures, management errors, or external attacks still occurs [4]. Worse still, malicious servers may even delete data that is rarely accessed by users in order to free up storage space and gain greater profit. In addition, once data integrity has been compromised, intentionally or otherwise, dishonest storage servers tend to conceal the incident to prevent their reputation from being tarnished. How to effectively check the integrity of data stored in the cloud has therefore become a research hotspot.

To address this problem, several remote data integrity auditing schemes have been proposed [5–12]. These schemes enable users to efficiently audit their data's integrity without a complete download. To achieve this, a user divides the original file into blocks and generates a tag for each block, which is used to verify the integrity of its corresponding block. When a file audit is launched, a challenge is generated and sent to the storage server. The challenge contains a collection of selected block indexes and a collection of random numbers corresponding to those indexes. On receiving the challenge, the cloud server picks the data blocks specified in the challenge and computes them together with the random numbers to obtain an integrity proof. By verifying the proof, the data owner can determine whether the cloud server is actually keeping his data intact.
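To make the challenge-response flow concrete, the following minimal Python sketch models only the non-cryptographic skeleton of such a protocol: the verifier samples block indexes paired with random coefficients, and the server answers with a linear combination of the challenged blocks. The names (`gen_challenge`, `gen_proof`) and the modulus are illustrative assumptions; real schemes additionally bind the proof to per-block authenticators so that it can be verified without the original data.

```python
import secrets

P = 2**255 - 19  # illustrative large prime modulus (assumption)

def gen_challenge(num_blocks: int, c: int) -> list[tuple[int, int]]:
    """Verifier side: pick c distinct block indexes, each with a random coefficient."""
    indexes = secrets.SystemRandom().sample(range(num_blocks), c)
    return [(i, secrets.randbelow(P)) for i in indexes]

def gen_proof(blocks: list[int], challenge: list[tuple[int, int]]) -> int:
    """Server side: aggregate the challenged blocks with the coefficients."""
    return sum(v * blocks[i] for i, v in challenge) % P

# Toy run: 1000 blocks, 100 of them challenged.
blocks = [secrets.randbelow(P) for _ in range(1000)]
challenge = gen_challenge(len(blocks), 100)
print("aggregated proof value:", gen_proof(blocks, challenge))
```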

To get rid of tedious audit routines and complex calculations, data owners would like to delegate TPAs to conduct audit tasks. However, introducing a third party poses additional risks that malicious cloud servers may try to trick their users by colluding with auditors. Employing multiple auditors on an audit assignment and determining the final audit outcome based on the votes of all participants can mitigate this collusion, but how to design a reasonable voting mechanism with multiple untrusted participants is still challenging.

A common method of dealing with inconsistent voting results in a multiparticipant scenario is weighted voting, where a weight, typically represented by reputation, is maintained for each auditor. The weight of an auditor reflects the extent to which his vote influences the final result. In addition, when an auditor's vote is consistent with the final result, his reputation increases; otherwise, it decreases. Intuitively, weighted voting hopes to build a virtuous ecosystem in which honest auditors always tell the truth and their reputations keep rising. Conversely, dishonest auditors who are caught cheating receive a reduction in reputation as punishment, and their reputation keeps declining as the cheating continues. At this rate, a few reputable auditors in the system are bound to become “elders,” and their excessive voice will gradually centralize the system. In contrast to weighted voting, the result of nonweighted voting depends only on the number of votes each candidate receives, and the only thing that needs to be considered is membership, namely, who can vote and who is not allowed to vote in the system. If there is no threshold for voting, malicious attackers can easily generate a large number of accounts at very low cost and involve them in an audit, directly affecting the final result by an overwhelming numerical advantage. This type of attack is known as the Sybil attack [13], which is common on peer-to-peer networks.

In summary, the introduction of multiple auditors may somewhat mitigate the collusion, but the problem is not fundamentally solved and the following threats remain.
(1) In weighted voting, the mechanism gradually centralizes the system. The collusion of a few reputable auditors is enough to sway the final outcome, even if honest auditors outnumber them. This lowers the cost for malicious cloud servers to misbehave, while also weakening user confidence in the auditing system.
(2) In nonweighted voting, without a reasonable membership mechanism, Sybil attacks can easily be launched. Malicious cloud servers can generate or buy large numbers of audit accounts to promote their desired results.
(3) The collusion between malicious cloud servers and auditors makes the detection of corrupted data fail. This collusion is undetectable because there is no way to distinguish whether an auditor's challenge is randomly generated or deliberately constructed. With such a challenge, the cloud server can “truly” pass the proof verification while saving only a small part of specific data blocks.

1.1. Motivation

As mentioned above, in contrast to weighted voting, which inevitably leads to centralization, nonweighted voting only needs a reasonable membership mechanism to avoid Sybil attacks. In addition, the whole auditing process can be written in the form of smart contracts and deployed to Ethereum, where any externally owned account (also known as a user account) can participate by simply paying a deposit. Another benefit of using smart contracts as the carrier for the multiauditor scenario is that the process becomes public, transparent, and traceable. The participants, the details of the execution process, the intermediate data, and the final result of an audit assignment are permanently recorded on the blockchain. Anyone can look up any historical audit at any time without worrying about loss or manipulation.

1.2. Contribution

Based on the above motivation, we design a remote data integrity auditing scheme based on Ethereum smart contracts with the following features:
(i) Cheating resistance. Without complete retention of user data, no server spoofing can pass the data integrity audit. This is the basic security requirement for remote data integrity auditing.
(ii) Smart contract-based audit. The auditing process is scheduled by smart contracts. Any Ethereum externally owned account can participate in the audit with nothing to maintain, namely, audit as you go. Every single audit instance is persisted on the blockchain, which ensures public transparency and traceability.
(iii) Collusion resistance. We propose an aggregated challenge generation algorithm, where the final challenge is composed of shares independently submitted by each auditor. As a result, as long as there exists at least one honest auditor, the challenge cannot be steered to what malicious auditors expect. We also design a nonweighted voting mechanism, namely “one person, one vote.” When the audit results come out inconsistent, arbitration is enforced, and the honest are rewarded while the dishonest are punished.

1.3. Related Works

Traditional remote data integrity verification mechanisms fall into two main types: provable data possession (PDP) and proof of retrievability (PoR). In PDP, the user requests a proof over some randomly selected blocks from the server and then determines the integrity of the remote data by verifying that the proof is correct. A PoR scheme stores each encrypted file in the cloud server together with a set of pseudorandom blocks, and the client checks the integrity of the data by verifying that the server retains these pseudorandom blocks. In 2007, Ateniese et al. [5] first defined PDP and proposed the first PDP scheme. In their scheme, the data user randomly selects several blocks to verify the integrity of the data with low communication and computational cost. If the integrity verification of these selected blocks passes, it can be concluded with high probability that the server holds the complete data. Later, Juels et al. [14] proposed a PoR model whose main idea is to embed a set of random values called “sentinels,” so that the auditor can check the integrity of the data by checking the presence of sentinels at specific data points. Shacham and Waters proposed two PoR schemes based on homomorphic linear verifiers [15], which further improved the efficiency of the PoR scheme in [14]. To implement PDP on dynamic cloud data, Ateniese et al. proposed another PDP scheme [16], which supports all dynamic operations except insertion. Shen et al. [6] proposed a dynamic PDP scheme that supports fully dynamic operations. Subsequently, various PDP and PoR schemes were proposed to extend the performance or functionality of the traditional schemes, enriching the integrity checking capabilities of outsourced data with features such as deduplication [17], batch auditing [18, 19], and data updates [7, 20]. To reduce the computational burden on the user side, public auditing schemes [8, 10–12, 21–23] were proposed to allow TPAs to audit the integrity of cloud data on behalf of data owners. To guarantee the integrity of medical data and reduce the burden of the data owner, Li et al. [24] propose an efficient, privacy-preserving public auditing protocol for cloud-based medical storage systems that supports batch auditing and dynamic data updates. This scheme not only saves computation costs for the TPA and the data owner but also reduces the communication overhead between the TPA and cloud servers. Considering that key retention is a burden for data users, Shen et al. [25] propose a new paradigm called “data integrity auditing without private key storage,” which utilizes a linear sketch with coding and error correction processes to confirm the identity of the user. To enable data integrity auditing under the multiwriter model, He et al. [26] propose the first public auditing scheme for shared data that supports fully dynamic operations. To implement this new paradigm, they design a specially constructed authenticated structure, called the blockless Merkle tree, and a novel cryptographic primitive, called the permission-based signature. In edge computing scenarios, caching data on edge servers can minimize users' data retrieval latency; however, this new architecture poses challenges for traditional data audit models. Li et al. [27] propose a new data structure named the variable Merkle hash tree (VMHT) for generating the integrity proofs of data replicas during the audit, which solves the above problem.
Considering that existing schemes suffer from complex certificate management or key escrow problems, Gudeme et al. [28] propose a certificateless privacy-preserving public auditing scheme for dynamic shared data with group user revocation in cloud storage, which requires neither public key infrastructure (PKI) nor identity-based cryptography (IBC). To verify whether an untrusted CSP stores all replicas in different geographic locations, Yu et al. [29] propose a dynamic multireplica auditing scheme that verifies both the integrity and the geographic locations of a cloud user's data replicas.

Recently, blockchain has been considered one of the most promising technologies for providing security support to IoT systems [30]. It was initially used for digital payments [31] and is now commonly used for smart contracts [32, 33] and data storage. The trust issues associated with traditional data integrity verification make the integration of blockchain into data integrity verification an inevitable trend. Based on a distributed data storage blockchain, Zhang et al. [34] proposed a privacy-preserving electronic health record (EHR) public auditing scheme to prevent malicious behavior by the TPA; however, it supports neither batch auditing nor data updates. Liu et al. [35] proposed applying blockchain to avoid the use of a TPA, and Yue et al. [36] proposed a blockchain-based framework that attempts to obtain trustworthy audit results; both lack the considerations necessary to ensure the credibility of the results of off-chain events. Kun et al. [37] implemented private blockchain-based data validation in an untrustworthy environment, but their solution requires building and deploying a private blockchain, which is difficult in practice. Zhou et al. [38] proposed a witnessing model to credibly enforce smart contract-based off-chain cloud service level agreements (SLAs). Miao et al. [39] proposed a mechanism that generates challenges using block hashes, but the method does not guarantee that the audit results will not be tampered with off-chain. There are also some blockchain-based multiaudit models [37, 40]; however, their proof validation runs in smart contracts or on blockchains using proof of work, which can incur excessive public-chain costs or validation time. Zhang et al. [41] propose a certificateless public verification scheme against procrastinating auditors (CPVPA) using blockchain technology. CPVPA is built on certificateless cryptography and is free from the certificate management problem; it mitigates the impact of a TPA's laziness on the audit. To solve the problem of repeated auditing of data shared by multiple tenants, Xu et al. [42] propose a blockchain-based deduplicatable data auditing mechanism, which also addresses problems such as the high cost and the reliance on trusted third parties of traditional approaches. Chen et al. [33] proposed a blockchain-based crowdsourcing auditing approach to achieve trustworthiness in audit results. The model relies on an untrusted audit committee; however, the scheme maintains a reputation value as the voting weight for each auditor, which may introduce the disadvantage of centralization into integrity auditing.

1.4. Organization

The rest of the paper is organized as follows. We discuss the preliminaries in Section 2. Section 3 describes the subalgorithms executed by each participant and the scheduling framework of the scheme. The security analysis and formal proof are described in Section 4. Section 5 analyses the implementation and performance. Finally, Section 6 concludes the paper.

2. Preliminaries

2.1. Bilinear Map

Let $G_1$ and $G_2$ be two multiplicative cyclic groups with a large prime order $p$. $e: G_1 \times G_1 \rightarrow G_2$ is a bilinear map with the following properties:
(i) Bilinearity. $\forall u, v \in G_1$ and $\forall a, b \in \mathbb{Z}_p$, it holds that $e(u^a, v^b) = e(u, v)^{ab}$.
(ii) Non-degeneracy. Where $u$ and $v$ are generators of $G_1$, it holds that $e(u, v) \neq 1$.
(iii) Computability. $\forall u, v \in G_1$, there exists an efficient algorithm to calculate $e(u, v)$.
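For intuition, the bilinearity property can be checked numerically with the py_ecc BN128 pairing, as in the sketch below. This is an illustrative choice of library and curve (the paper's experiments use the PBC library), and BN128 is an asymmetric pairing $e: G_1 \times G_2 \rightarrow G_T$, whereas the definition above is stated for the symmetric case.

```python
import secrets
from py_ecc.bn128 import G1, G2, multiply, pairing, curve_order

a = secrets.randbelow(curve_order)
b = secrets.randbelow(curve_order)

# Bilinearity: e(a*P, b*Q) should equal e(P, Q)^(a*b).
lhs = pairing(multiply(G2, b), multiply(G1, a))
rhs = pairing(G2, G1) ** ((a * b) % curve_order)
print("bilinearity holds:", lhs == rhs)
```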

2.2. Complexity Assumption

Definition 1. (Computational Diffie–Hellman (CDH) problem). Suppose $G_1$ is a multiplicative cyclic group with a large prime order $p$, and $g$ is a generator of $G_1$. Given the tuple $(g, g^a, g^b)$ with the unknown elements $a, b \in \mathbb{Z}_p^*$, the CDH problem is to calculate $g^{ab}$.

Definition 2. (CDH assumption). The advantage of any probabilistic polynomial time (PPT) algorithm $\mathcal{A}$ in solving the CDH problem in $G_1$ is negligible. It is defined as $\Pr\left[\mathcal{A}(g, g^a, g^b) = g^{ab} : a, b \xleftarrow{R} \mathbb{Z}_p^*\right] \leq \epsilon$. Here, $\epsilon$ denotes a negligible value.

Definition 3. (Discrete logarithm (DL) problem). Given the tuple $(g, g^a) \in G_1^2$, where $a \in \mathbb{Z}_p^*$ is unknown, the DL problem is to calculate $a$.

Definition 4. (DL assumption). The advantage of any PPT algorithm $\mathcal{A}$ in solving the DL problem in $G_1$ is negligible. It is defined as $\Pr\left[\mathcal{A}(g, g^a) = a : a \xleftarrow{R} \mathbb{Z}_p^*\right] \leq \epsilon$. Here, $\epsilon$ denotes a negligible value.

2.3. Blockchain and Smart Contract

Blockchain technology enables decentralized peer-to-peer transactions, coordination, and collaboration without mutual trust through data encryption, timestamps, and distributed consensus. A “smart contract” is simply a program that runs on the blockchain: a collection of code (its functions) and data (its state) that resides at a specific address on the blockchain. Smart contracts are typically used to automate the execution of an agreement so that all participants can be immediately certain of the outcome, without an intermediary's involvement or loss of time. User accounts interact with a smart contract by submitting transactions that execute a function defined in the contract. Smart contracts cannot be deleted by default, and interactions with them are irreversible. Thanks to the blockchain's immutability, the execution of smart contracts and the data they generate cannot be changed afterwards, which is essential whenever the trustworthiness of a process must be established. The scheduling part of the audit assignments can therefore be stripped out of the overall audit logic and put into a smart contract, and the parties participate in the audit by interacting with the contract. The contract is responsible for driving the audit process, assigning calculation tasks to each participant, collecting the intermediate results of each participant's calculations, completing the vote tally, and outputting the final audit result.

2.4. Two-Phase Commit

The concept of two-phase commit (2PC) is derived from database management systems. It is a standardized protocol that ensures a database commit is carried out correctly when the commit operation must be broken into two separate parts. Since our audit scheme is based on smart contracts, any data submitted by the participants is publicly available, which poses a security risk to the operation of the protocol. The purpose of introducing 2PC into such a public system is to ensure that the data submitted by each participant remains hidden from the others until all commitments are in place.

3. Proposed Scheme

In this section, we introduce the components of the proposed system and then explain the subalgorithms related to data integrity auditing and their executors.

3.1. System Model

The system consists of a data owner, a storage provider, auditors, and smart contracts, where there can be any number of auditors.
(i) Data Owner. The data owner rents cloud storage services and outsources large amounts of data to the cloud storage. The data owner may be an individual or an organizational consumer.
(ii) Storage Provider. The storage provider provides cloud storage services to the data owner. It has significant storage capacity and powerful computing capability. When receiving a data auditing challenge, the storage provider should respond to auditors with an integrity proof.
(iii) Auditor. The auditor challenges the storage provider and determines the integrity of the user data by verifying the proof returned by the provider.
(iv) Smart Contract. The smart contracts stipulate the audit process. There are two smart contracts in the system: AMSC (assignment management smart contract) manages the audit assignments, while AASC (audit assignment smart contract) is instantiated by AMSC and performs a specific auditing assignment.

3.2. Notations

To make the proposed scheme more clearly understood, we summarize the main notations involved in Table 1.

3.3. Auditing Framework

This section introduces how the smart contract drives the interaction of each participant and achieves data security checks and aggregation. Note that all steps are executed on-chain except for the data outsourcing indicated by the dotted line in Figure 1, which is done off-chain.
(i) AMSC is deployed at the very beginning. All storage providers, data owners, and auditors in the system listen at its address for events.
(ii) Assuming data outsourcing has been done off-chain, the data owner submits the file identifier and his storage provider's address to AMSC for file enrollment.
(iii) The next three steps are performed consecutively. When an audit is launched, the data owner sends an auditing request to AMSC along with the fee he is willing to pay. The request includes the file identifier and the number of data blocks to be challenged. After a brief verification, an AASC instance for this file is deployed by AMSC. All participants listening to AMSC receive this event. Consequently, the data owner and his storage provider begin to listen at the newly deployed AASC's address.
(iv) At the same time, auditors also receive the above event. Any interested auditor can apply for the audit by generating two large random numbers and submitting them to AASC with 2PC. Meanwhile, a sufficient deposit is required. The detailed process of this 2PC is illustrated in Figure 2.
(v) AASC aggregates all the submitted random numbers into two values and sends them to the corresponding storage provider as the challenge (a sketch of this aggregation follows this list).
(vi) On receiving the challenge, the storage provider computes it together with the challenged data blocks to obtain the integrity proof.
(vii) After receiving the proof from the storage provider, AASC distributes it together with the two previously aggregated numbers to the auditors. Each auditor determines the integrity of the data by checking the validity of the proof and then sends the result back to AASC.
(viii) AASC compares all the received results; if they are consistent, this result is taken as the final result. The balance in the contract account (including the data owner's auditing fee and the deposits deducted from auditors who failed 2PC) is then distributed to the remaining auditors as their rewards.
(ix) If the auditors do not draw a unanimous conclusion, AASC sends the proof to the data owner, who then performs arbitration to obtain the final auditing result. Based on this result, AASC distributes the balance in the contract account to the auditors who reached the same result as the data owner as their rewards. The detailed process is illustrated in Figure 3.
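The AASC aggregation logic in steps (iv)–(v) can be modeled off-chain as follows. This is a minimal Python sketch of the contract behavior, not the deployed Solidity code; the names `submit_share` and `aggregate_challenge` and the choice of summing the shares modulo the group order are assumptions made for illustration.

```python
import secrets

GROUP_ORDER = 2**255 - 19  # illustrative prime order (assumption)

class AASCModel:
    """Off-chain model of the audit assignment contract's challenge aggregation."""

    def __init__(self):
        self.shares = {}  # auditor address -> (r_i, s_i)

    def submit_share(self, auditor: str, r_i: int, s_i: int) -> None:
        # In the real scheme each pair is submitted under 2PC so that
        # other auditors cannot see it before committing their own shares.
        self.shares[auditor] = (r_i % GROUP_ORDER, s_i % GROUP_ORDER)

    def aggregate_challenge(self) -> tuple[int, int]:
        # The final challenge seeds are the sums of all submitted shares,
        # so no single auditor controls them if at least one share is honest.
        r = sum(r for r, _ in self.shares.values()) % GROUP_ORDER
        s = sum(s for _, s in self.shares.values()) % GROUP_ORDER
        return r, s

# Toy run with three auditors.
aasc = AASCModel()
for addr in ("0xA1", "0xB2", "0xC3"):
    aasc.submit_share(addr, secrets.randbelow(GROUP_ORDER), secrets.randbelow(GROUP_ORDER))
print(aasc.aggregate_challenge())
```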

3.4. Algorithms

This section introduces the calculations that each participant needs to complete in an auditing assignment.
(1) This algorithm is executed by the data owner. With the security parameter, the data owner chooses two cyclic multiplicative groups $G_1$ and $G_2$ with the same prime order $p$ and one bilinear map $e: G_1 \times G_1 \rightarrow G_2$, and lets $g$ be the generator of $G_1$. The data owner then chooses a cryptographic hash function, a pseudorandom function, a pseudorandom permutation, a secure hash function, and a public random value. Finally, the data owner selects a random value as the secret key, calculates the corresponding public key, releases the public parameters, and keeps the secret key private.
(2) This algorithm is executed by the data owner. The file is divided into blocks, each with its own identifier. For each block, the data owner computes the corresponding authenticator from the block identifier and the block content under the secret key. The data owner then generates a file tag, which contains the signature of the file identifier, and uploads the data file together with the authenticators and the file tag to the storage provider.
(3) This algorithm is executed by the storage provider. Besides verifying the validity of the file identifier's signature, the storage provider checks the correctness of each authenticator via the authenticator verification equation and outputs the result of the verification, 1 for true and 0 for false.
(4) This algorithm is executed by the auditors together with AASC. Each auditor independently picks two big random numbers and sends them to AASC. AASC aggregates all the submitted numbers into two aggregated values and finally sends these two numbers to the storage provider as the data integrity challenge.
(5) This algorithm is executed by the storage provider. From the two aggregated numbers and the number of blocks to be challenged, the storage provider calculates the challenge index set and the corresponding random parameter set using the pseudorandom permutation and the pseudorandom function. After completing the corresponding calculations on the challenged blocks and their authenticators, the storage provider responds to AASC with the integrity proof.
(6) This algorithm is executed by each auditor. After receiving the proof and the challenge values transmitted by AASC, each auditor checks whether the proof verification equation holds and then outputs the auditing result, 1 for true and 0 for false.
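The algebra behind tag generation, proof generation, and proof verification follows the general pattern of pairing-based homomorphic linear authenticators. The sketch below is an illustrative Shacham–Waters-style instantiation over the py_ecc BN128 curve; the library choice, the toy hash-to-group mapping, and the exact formulas are assumptions for illustration and may differ in detail from the paper's PBC-based implementation.

```python
import hashlib
import secrets
from py_ecc.bn128 import G1, G2, add, multiply, pairing, curve_order

def h1(data: bytes):
    """Toy hash-to-group: map bytes to a G1 point by scalar multiplication."""
    return multiply(G1, int.from_bytes(hashlib.sha256(data).digest(), "big") % curve_order)

# Setup (data owner): secret key x, public key pk, public element u.
x = secrets.randbelow(curve_order)
pk = multiply(G2, x)
u = multiply(G1, secrets.randbelow(curve_order))

# Tag generation: sigma_i = x * (H(i) + m_i * u) for each block m_i.
blocks = [secrets.randbelow(curve_order) for _ in range(8)]
tags = [multiply(add(h1(str(i).encode()), multiply(u, m)), x) for i, m in enumerate(blocks)]

# Challenge: (index, coefficient) pairs derived from the aggregated seeds.
challenge = [(i, secrets.randbelow(curve_order)) for i in (1, 3, 6)]

# Proof generation (storage provider): mu = sum v_i*m_i, sigma = sum v_i*sigma_i.
mu = sum(v * blocks[i] for i, v in challenge) % curve_order
sigma = None
for i, v in challenge:
    sigma = multiply(tags[i], v) if sigma is None else add(sigma, multiply(tags[i], v))

# Proof verification (auditor): e(sigma, g2) == e(sum v_i*H(i) + mu*u, pk).
agg_h = None
for i, v in challenge:
    term = multiply(h1(str(i).encode()), v)
    agg_h = term if agg_h is None else add(agg_h, term)
print("proof verified:", pairing(G2, sigma) == pairing(pk, add(agg_h, multiply(u, mu))))
```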

4. Analysis of Our Scheme

4.1. Security Model

We consider our scheme to fulfill the following two security requirements. First, the integrity of the challenged files is properly verified if the storage provider and auditors execute the protocol honestly. Second, the scheme resists semitrusted storage providers from deceiving the auditors about the integrity of the challenged data. It means, if the storage provider does not have the intact data file, it cannot generate the correct proof of data integrity. The first security requirement is defined as follows.

Definition 5. The proposed scheme is correct for data integrity checking if, for any honestly generated keys, data file, and corresponding tag, the proof produced on any random challenge passes the proof verification equation.

The second security requirement aims to resist the three attacks mentioned in [43] that can be launched by the storage provider, namely the forge attack, the replay attack, and the replace attack. In each of these attacks, the semitrusted storage provider responds to auditors with an invalid proof. We capture the requirement through a security game that covers all three attacks. The game is played between an adversary and a challenger, where the adversary plays the role of a semitrusted storage provider who tries to trick auditors by forging a data integrity proof. The game is described as follows:
(1) The challenger runs the setup algorithm to generate the system parameters and releases the public parameters to the adversary.
(2) The adversary repeatedly makes tag queries to the challenger for some files, and the challenger returns the corresponding tags to the adversary.
(3) Finally, the adversary outputs a proof for a data file and data tag on a given challenge.
We define the advantage of the adversary as the probability that its forged proof passes verification. We say the adversary wins the above game if this advantage is non-negligible.

Definition 6. The proposed scheme is sound if there exists an efficient extraction algorithm such that, whenever the adversary outputs a proof for a data file and data tag on a challenge and wins the above game, the extraction algorithm can recover the data file from the proof and the tag.

4.2. Security Analysis

Theorem 1. (Auditing correctness). When the storage provider stores the user’s data correctly, the proof it generates can be verified by auditors.

Proof. Given a valid proof from the storage provider, the verification equation (1) in the proof verification algorithm holds. Based on the properties of the bilinear map, the verification equation (1) can be proved correct by deriving the left-hand side from the right-hand side.

We use the hybrid argument technique to prove soundness, as in [15]. Hybrid arguments have been used extensively in cryptography for many years; such an argument is essentially a sequence of transitions based on indistinguishability. First of all, we define the following games:
Game-0. Game-0 is the original game defined in Section 4.1.
Game-1. Game-1 is the same as Game-0, except that the challenger keeps a local list of all the tags he has signed. If the adversary ever submits a tag that carries a valid signature but has not been signed by the challenger, the challenger announces failure and aborts.
Game-2. Game-2 is the same as Game-1, except that the challenger records all of its responses to the adversary's queries. If the adversary succeeds but the aggregated authenticator it outputs is not equal to the expected one, the challenger announces failure and aborts.
Game-3. Game-3 is the same as Game-2, except that the challenger announces failure and aborts if at least one aggregated block value output by the adversary differs from the expected one.

Lemma 1. If there exists an algorithm that can distinguish between Game-0 and Game-1 with non-negligible probability, then we can construct an algorithm that breaks the existential unforgeability of the signature scheme with non-negligible advantage.
Analysis. If the adversary causes the challenger to abort in Game-1, then we can use the adversary to construct an algorithm against the existential unforgeability of the signature scheme.

Lemma 2. If there exists an algorithm that can distinguish between Game-1 and Game-2 with a non-negligible probability, then we can construct an algorithm to break the computational Diffie–Hellman assumption with non-negligible advantage.

Analysis. Suppose the elements of the given CDH instance are embedded into the public parameters. Suppose the adversary can respond with a signature that is different from the expected signature. From these two signatures we can compute a value from which the solution to the CDH instance is calculated.

Lemma 3. If there exists an algorithm that can distinguish between Game-2 and Game-3 with a non-negligible probability, then we can construct an algorithm to break the computational Diffie–Hellman assumption with non-negligible advantage.

Analysis. We assume that the hash function is a random oracle controlled by an extractor, which answers the hash queries posed by the adversary. For a challenge issued by the extractor, the adversary outputs a proof that satisfies the verification equation. The extractor then changes the corresponding hash response, and the adversary outputs a second proof that satisfies the new verification equation. Dividing the two equations, the resulting value can be taken as the extractor's response, which solves the underlying hard problem.

Theorem 2. (Soundness). Assume that the computational Diffie–Hellman problem is hard in bilinear groups and the digital signature scheme is existentially unforgeable. Then no probabilistic polynomial-time adversary can break the soundness of the scheme with a non-negligible probability.

Proof. Any adversary's advantage in Game-3 must be 0, because if the file is not intact, i.e., at least one challenged block is corrupted, the challenger always announces failure and aborts. According to the game sequence and Lemmas 1–3, the advantage of the adversary in the original game, Game-0, must be negligible.

4.3. Analysis of Collusion Resistance

As described in the challenge generation algorithm, the aggregated random number is used as the seed for the pseudorandom function that generates the indexes of the blocks to be challenged. This means that, as long as the pseudorandom function is secure, the indexes cannot be known without knowing the seed. As proven in Theorem 2, the probability that a storage provider generates a proof that passes verification without preserving the complete data is negligible. Malicious auditors could make the negotiated seed fall into a predesigned set by colluding with the storage provider, so that the indexes and random numbers of the challenged blocks are generated exactly as they expect; the storage provider would then only need to store a small part of the real data blocks to pass the proof verification. However, as long as at least one honest auditor is involved in the audit, the generation of the aggregated random numbers is not controlled by the malicious auditors, and the probability that an aggregated number happens to fall into the predesigned set is $d/p$, where $d$ is the size of the set and $p$ is the group order, which is negligible.

4.4. Discussion on the Data Owner's Trustworthiness

The only security assumption in our scheme is that the data owner is honest. The data owner performs arbitration when the auditors do not reach a consensus on the audit results. This is different from cutting out the auditors and letting the data owner perform the audit directly by himself: in a system with only two parties, the conclusions declared by either party are unconvincing. In our scheme, arbitration is performed only when the auditors' results are inconsistent, which means that each kind of result has been reached by multiple individuals. Moreover, any auditor who lies will definitely be discovered, because the arbitration lets the data owner check the result directly. This leaves auditors with no reason to lie, meaning that the arbitration should rarely need to be enforced.

4.5. Discussion on the Employability of Two-Phase Commit

Our scheme uses 2PC in two phases: the submission of challenge shares and the submission of audit results. The generation of the challenge in our scheme relies on two numbers submitted independently by each auditor, which must remain confidential to the other auditors. If a malicious auditor knew the other auditors' numbers, he could construct special numbers that prompt the smart contract to generate a challenge of his choosing, which would make the whole scheme fail. By introducing 2PC, we deliberately split the submission of the secret numbers into two steps: the first step submits the hash value, and the second step submits the corresponding hash key. Due to the one-way nature of hashing, a malicious auditor cannot derive the secret numbers from the hash values committed in the first step, and thus cannot influence the generation of the final challenge by constructing his own numbers. In the result submission phase, some auditors may be tempted to copy other auditors' results out of laziness. In the first step, an honest auditor concatenates the audit result with his blockchain address before calculating the hash value. In this way, the smart contract can determine whether an auditor has copied someone else's result by checking whether the value revealed in the second step matches the hash committed in the first step. A sketch of this commit-reveal pattern follows.
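The following Python sketch models this commit-reveal (2PC) pattern off-chain. The function names and the exact commitment layout (result concatenated with the auditor's address and a random key, hashed with SHA-256) are illustrative assumptions rather than the contract's actual encoding.

```python
import hashlib
import secrets

def commit(result: int, address: str) -> tuple[str, bytes]:
    """Phase 1: publish only the hash of (result || address || key)."""
    key = secrets.token_bytes(32)
    digest = hashlib.sha256(str(result).encode() + address.encode() + key).hexdigest()
    return digest, key  # digest goes on-chain now, key is revealed later

def reveal_ok(digest: str, result: int, address: str, key: bytes) -> bool:
    """Phase 2: the contract recomputes the hash and checks it against the commitment."""
    expected = hashlib.sha256(str(result).encode() + address.encode() + key).hexdigest()
    return expected == digest

# Honest auditor commits result 1 and later reveals it.
digest, key = commit(1, "0xA1")
print(reveal_ok(digest, 1, "0xA1", key))   # True

# A copycat who re-submits the same digest cannot produce a matching reveal
# for his own address, so the copy attempt is detected.
print(reveal_ok(digest, 1, "0xB2", key))   # False
```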

5. Implementation and Performance Analysis

In this section, we discuss the performance of the proposed scheme in terms of computation cost and gas cost, respectively. We carry out a series of simulation experiments to evaluate the performance of our scheme, and the code can be found at https://github.com/TDMaker/sc-paper. Note that, since the underlying layer of our scheme is a P2P overlay network, the network traffic required to maintain it is necessarily much larger than that of other end-to-end schemes, so we omit the comparison of communication costs.

5.1. Environment

The experiments were carried out on Ubuntu Desktop 20.04 with an Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz × 4 and 4 GB of RAM. For the local computation of each participant, we use the pairing-based cryptography library [44] and the GNU multiple precision arithmetic library [45] to implement the simulation experiments, with the pairing parameters taken from the library. The smart contracts run on the Ethereum test net, and each participant interacts with them through client programs built on the package in [46].

5.2. Computation Analysis

We analyze the computation costs of all subalgorithms of the proposed protocol. We chose the size of each data block to be 160 bits. Without loss of generality, we change the block count from 100 to 1000 with an increment of 100 in each test. The two algorithms that are executed only once for the same file, and whose time overhead is relatively large compared with the other algorithms, are shown in Figure 4; the remaining algorithms, which need to be executed repeatedly during each audit, are shown in Figure 5. The system setup algorithm generates the system parameters; since its time overhead is static and relatively small, we do not plot it in the figures and only note that it takes 4.715 ms on average over ten experiments. The algorithms that compute and check the data owner's outsourced authenticators take much longer than the others. Their time cost increases with the size of the user file and can become quite large; fortunately, this cost is incurred only once and can be paid offline. Challenge generation only determines two random numbers that constitute the challenged blocks' index sequence, which is very fast. Proof generation calculates the integrity proof by aggregating the challenged data blocks; its time cost depends mainly on the length of the challenge sequence and increases with the number of challenged blocks. Proof verification checks the integrity proof generated by the storage provider; its time cost also increases with the number of challenged blocks for the same reason. In our protocol, the most frequently executed algorithms are challenge generation, proof generation, and proof verification, which are periodically performed by the storage provider and the auditors. Thus, data owners in our protocol have little workload after data outsourcing, except when arbitration is needed.

Comparison: to show the efficiency advantage of our scheme, we compare it with the schemes proposed in [24, 25, 47–49] and list the results in Table 2. The computation cost of our scheme mainly lies in the expensive operations, namely multiplication, exponentiation, and pairing; other operations such as hashing and addition incur only negligible costs, so we omit them when analyzing the computation cost. Suppose a file consists of a given number of blocks, of which a subset is challenged in each audit. The overall efficiency of the scheme depends mainly on the efficiency of the authenticator generation, authenticator verification, proof generation, and proof verification algorithms. However, since the first two are run only once, their impact on the overall efficiency of the audit protocol is negligible; therefore, we compare only the proof generation and proof verification algorithms. In proof generation, our scheme takes the same number of multiplication operations as [48] but one more than the four remaining schemes. Our scheme has the same number of exponentiation operations as the first three schemes, while [25] needs one exponentiation fewer than ours and [24] one more. The pairing operation is the most time-consuming one, but in proof generation it occurs only in [24, 47, 48]. In proof verification, the schemes in [25, 47] need two pairing operations and the scheme in [49] needs three, while in [48] the number of pairing operations is linear in the number of challenged blocks; moreover, our scheme and [48] save one multiplication operation and two exponentiation operations compared with [47, 49]. Although [24] outperforms the other schemes in terms of exponentiation and pairing operations, its number of multiplication operations grows linearly.

Nonetheless, the above schemes all make different computational trade-offs to achieve their proposed functional properties on top of the basic security guarantees, so a bare comparison of computation costs can serve only as a rough reference.

5.3. Gas Cost Analysis

Gas is the fuel paid for running smart contracts on Ethereum. It measures how much “work” needs to be done for an operation or a series of operations; it prevents junk transactions from clogging the network and serves as additional income for miners. We deployed our smart contracts on Rinkeby [50], an Ethereum test net (or test network). The only difference between the case where all auditors reach the same conclusion and the case where they do not is an additional arbitration step by the data owner at the end; all other steps are exactly the same. Therefore, we only present the case that requires the data owner's arbitration. Because any number of lying auditors can be detected as long as there exists at least one honest auditor, and because the number of dishonest auditors has no effect on the final result, we involve only two auditors in the experiment: an honest one and a dishonest one.

Figure 6 illustrates such an audit assignment, where the vertical coordinates represent each participant, and the horizontal coordinates portray how many Wei of gas each participant spends to execute a certain operation. Wei is the smallest unit of currency in Ethereum; 1 Ether = 10^18 Wei. Note that some of the operations shown in the figure are emitted events (which are only auxiliary steps) or substeps of 2PC, so they were not listed in Section 3.3 for clarity. As the figure shows, the two operations with the highest gas cost are the deployments of AMSC and AASC, because they involve the deployment of smart contracts. The large amount of gas consumed by contract deployment comes from two sources: on the one hand, the CREATE opcode invoked during contract creation costs a fixed amount of gas; on the other hand, storing the contract costs 200 gas per byte of code, so more bytecode means more storage cost, which adds up quickly. Fortunately, AMSC is a management contract for audit assignments that is deployed only once in an audit system, whereas an AASC instance is deployed once for every executed audit assignment. The other operations require very little gas. In an audit assignment, the extra gas incurred by each additional auditor is small, and the gas overhead of the other participants is fixed, except when data owner arbitration is required, which needs a small amount of additional gas; this overhead is insignificant compared to the reward that can be earned.
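Per-operation gas figures like those in Figure 6 can be collected directly from transaction receipts. The snippet below is a hedged illustration using the web3.py client (v6-style API), which is an assumption rather than the paper's own tooling (the client package cited as [46]); the node endpoint, contract address, ABI, and the function name `submitResult` are placeholders.

```python
from web3 import Web3

# Connect to an Ethereum test-net node (the endpoint is a placeholder).
w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))

def gas_used(tx_hash) -> int:
    """Wait for the transaction to be mined and read the gas it consumed."""
    receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
    return receipt.gasUsed

# Hypothetical usage against a deployed AASC instance (address, ABI, and
# function name are placeholders, not the paper's actual contract interface):
# aasc = w3.eth.contract(address=AASC_ADDRESS, abi=AASC_ABI)
# tx = aasc.functions.submitResult(1).transact({"from": w3.eth.accounts[0]})
# print("gas used:", gas_used(tx))
```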

6. Conclusion

In this paper, we design a remote data integrity auditing scheme based on Ethereum smart contracts. The challenge of this scheme is jointly generated by all Ethereum users participating in the audit. When the auditing results are inconsistent, the data owner completes the final arbitration. Security proofs and experimental results show that our scheme is secure and efficient.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest with this study.

Acknowledgments

This work was funded by the National Natural Science Foundation of China under grant no. 61701190, the National Key R&D Plan of China under grant no. 2017YFA0604500, and the National Sci-Tech Support Plan of China under grant no. 2014BAH02F00.