Abstract

In the Internet of Things, user information is collected by all kinds of smart devices and stored in cloud storage, where it is at risk of leakage. To protect the security and privacy of user information, the user and the cloud provider periodically execute a protocol called a proof of retrievability (PoR) scheme. A PoR scheme ensures the security of the data by generating a proof that convinces the user that the cloud provider correctly stores the user information. In this paper, we construct a proof of retrievability scheme using blockchain technology. Because data stored in a blockchain cannot be tampered with, the integrity of the data is ensured. Specifically, we give the related definitions, a security model, and a blockchain-based construction of a proof of retrievability scheme, and then prove the validity and security of the scheme. As a result, user information can be protected by our scheme.

1. Introduction

1.1. Background

As information systems enter our daily life, many appliances and services handle private user information, such as surveillance cameras, smartwatches, smart door locks, and online supermarkets. They make life much more convenient. However, their providers collect user information and store it in the cloud, where new technologies are widely used [1–7]. Because the cloud is vulnerable, user information can be attacked by hackers and easily stolen if the cloud storage provider is compromised. Among the problems and challenges of cloud storage [8–10], this paper considers only the problem of how to ensure the security and integrity of the information. Three kinds of methods are used to solve it [11]: proof of ownership (PoW), provable data possession (PDP), and proof of retrievability (PoR). We focus on PoR; for the state of the art of PoR, the reader is referred to [11–25].

Generally, PoR schemes are designed under different settings and security models. On the one hand, some schemes [11–14] are for static data. Some schemes [15–17] address the multiserver setting; in these schemes, the client can identify machines and recover the data from the others by using the audit mechanism. Other schemes [18–25] are for dynamic data. On the other hand, the works in [18–21] are about security. The authors of [22–24] studied memory checking and how to authenticate remotely stored dynamic data. The scheme in [25] is for the multiserver and dynamic data setting.

Recently, blockchain has been used to eliminate the trusted third party in many protocols [26]. However, it is still unknown how to utilize blockchain in PoR schemes, which poses a new challenge in constructing a PoR scheme.

1.2. Motivation and Contribution

The concept of blockchain was first proposed in 2008 in “Bitcoin: a peer-to-peer electronic cash system” [27], published on a cryptography mailing list by a scholar known by the pseudonym “Satoshi Nakamoto.” The verification, bookkeeping, storage, maintenance, and transmission of data in a blockchain are all based on a distributed system structure, and the trust relationship between distributed nodes is established by purely mathematical methods instead of a central authority. Thus, a decentralized and reliable distributed system can be formed. The goal of blockchain is to provide trust for transactions between untrusted entities without the need for a trusted third party. At present, many institutions have combined their industry conditions with the characteristics of blockchain and made beneficial attempts in many fields, including payment, Internet of things, credit investigation, transaction settlement and clearing, crowdfunding, equity transaction, audit, supply chain, digital asset management, and notarization [28–33]. We consider using blockchain technology to solve the problem of the trusted third party in the verification of the PoR scheme.

In this paper, we first define a security model for the blockchain-based proof of retrievability by modifying the model in [14, 34, 35]. Secondly, we propose the first concrete PoR scheme based on blockchain. Finally, we demonstrate that the proposed scheme is provably secure in the new model.

1.3. Organization

The rest of the paper is organized as follows. Preliminaries are given in Section 2. In Section 3, we formally define the framework and security model for blockchain-based PoR schemes. Then a concrete construction of a blockchain-based scheme is presented in Section 4. We analyze the security of the proposed scheme in Section 5. Finally, conclusions are made in Section 6.

2. Preliminaries

In this section, some notions are introduced such as hash function, Merkle tree, blockchain, and bilinear pairing.

2.1. Hash Function

A hash function $H$ maps data $x$ of arbitrary length (input) to data $H(x)$ of fixed length (output). $H(x)$ is called the hash of $x$. Many hash functions [36] are publicly available and can be selected based on the context.

This transformation is a compression mapping, which has the following properties:
(i) The space of the hash value is usually much smaller than the space of the input.
(ii) Different inputs may hash into the same output, but it is hard to find two different inputs $x$ and $x'$ such that $H(x) = H(x')$.
(iii) It is infeasible to determine the input value $x$ from the hash value $H(x)$.

Assumption 1 (hash function preimage assumption). Given $H(x)$, it is hard to compute $x$.

Assumption 2 (hash function collision assumption). Given $x$, it is hard to compute $x' \neq x$ such that $H(x') = H(x)$.

2.2. Merkle Tree

A Merkle tree, also known as a hash tree, is, as the name implies, a tree that stores hash values. Each leaf node of a Merkle tree holds the hash value of a data block. Each nonleaf node holds the cryptographic hash of its child nodes.

Figure 1 presents a simple example of a Merkle tree with 4 pieces of data. Let be a hash function and denote the set of data used to generate the Merkle tree. A Merkle tree is generated as follows: firstly, for all leaf nodes, where and is the binary form of ; secondly, for all inside nodes, the value of the node is where and are the values of the left child and the right child, respectively. A Merkle tree is valid if and only if the value of each inside node equals . As a result, this example outputs the following:

In a Merkle tree, the value of the root node is called the hash of the Merkle tree. For the example in Figure 1, the hash of that tree with data is

In the rest of this paper, we use to denote the Merkle tree created by the data set and use to denote the hash of a Merkle tree , where is the underlying hash function. For example, the hash of the Merkle tree created by the data set can be denoted by .
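As a rough illustration of the construction above, the following Python sketch builds a Merkle tree over four data blocks and returns the hash of the tree. It assumes SHA-256 as the hash function, hashes each block directly at the leaves, and hashes the concatenation of the two child values at inner nodes. These encoding choices and the names `merkle_levels` and `merkle_root` are assumptions made for illustration and may differ from the exact leaf encoding used in this paper.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    """Hash function of the sketch (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def merkle_levels(blocks: List[bytes]) -> List[List[bytes]]:
    """Return all levels of the tree, from the leaves up to the root.

    Assumes the number of blocks is a power of two, as in the 4-block example.
    """
    if not blocks or len(blocks) & (len(blocks) - 1):
        raise ValueError("number of blocks must be a power of two")
    levels = [[h(b) for b in blocks]]  # leaf level
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each inner node hashes the concatenation of its two children.
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def merkle_root(blocks: List[bytes]) -> bytes:
    """Hash of the Merkle tree, i.e. the value of the root node."""
    return merkle_levels(blocks)[-1][0]


blocks = [b"x1", b"x2", b"x3", b"x4"]
root = merkle_root(blocks)
```

Changing any single block changes the root, which is the property the scheme relies on when it stores only the hash of the Merkle tree in the blockchain.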

2.3. Blockchain

Within a blockchain, the hash function is used to determine the state of the blockchain. Figure 2 shows the structure of a blockchain, which can be viewed as a linked list of blocks. Every block has four basic objects: the hash of the previous block, the timestamp of generation, a random number for security, and the hash of a Merkle tree. Usually, the corresponding Merkle tree is linked with the block too. Two neighboring blocks are linked by a hash pointer that points to the previous block, which creates a chain of connected blocks, hence the name blockchain. By linking blocks in this manner, the ordered hashes of all the blocks represent the entire state of the blockchain, namely, where is a hash function. A blockchain is valid if equals the value of the field hash of in the structure of the block , for all .

To utilize blockchain for a data set (see the example in Subsection 2.2), a corresponding Merkle tree is constructed from the data set . Then a new block denoted by can be generated with the help of a timestamp provider. Adding more parameters, we use to denote a block where is the data set used to generate the hash of the Merkle tree, is the timestamp of the current time, and is the random number. Moreover, a blockchain provides the following operations:
(i) NewBlock(X;ts): create a valid block .
(ii) AppendBlock(B): append the block to the blockchain by filling a suitable random number in the block.
(iii) FetchBlock(ts): return the block at a time in the blockchain. If there is no block at that time, then NULL is returned.
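The three operations can be sketched in Python as follows. This is only a minimal model under stated assumptions: SHA-256 is the hash function, the caller passes the already computed hash of the Merkle tree instead of the data set itself, and "a suitable random number" is interpreted as a proof-of-work style nonce with a small hypothetical difficulty. The class and method names are illustrative.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Block:
    prev_hash: bytes    # hash of the previous block
    ts: int             # timestamp of generation
    nonce: int          # random number, filled in by append_block
    merkle_hash: bytes  # hash of the Merkle tree built from the data set

    def digest(self) -> bytes:
        header = (self.prev_hash + self.ts.to_bytes(8, "big")
                  + self.nonce.to_bytes(8, "big") + self.merkle_hash)
        return h(header)


class Blockchain:
    def __init__(self) -> None:
        self.blocks: List[Block] = []

    def new_block(self, merkle_hash: bytes, ts: int) -> Block:
        """NewBlock: create a block linked to the current last block."""
        prev = self.blocks[-1].digest() if self.blocks else b"\x00" * 32
        return Block(prev, ts, 0, merkle_hash)

    def append_block(self, block: Block, difficulty_bits: int = 8) -> None:
        """AppendBlock: search for a nonce (assumed proof-of-work rule),
        then append the block."""
        target = 1 << (256 - difficulty_bits)
        while int.from_bytes(block.digest(), "big") >= target:
            block.nonce += 1
        self.blocks.append(block)

    def fetch_block(self, ts: int) -> Optional[Block]:
        """FetchBlock: return the block generated at time ts, or None."""
        return next((b for b in self.blocks if b.ts == ts), None)

    def is_valid(self) -> bool:
        """Each block must store the digest of its predecessor."""
        return all(cur.prev_hash == prev.digest()
                   for prev, cur in zip(self.blocks, self.blocks[1:]))


chain = Blockchain()
chain.append_block(chain.new_block(h(b"data set 1"), ts=1))
chain.append_block(chain.new_block(h(b"data set 2"), ts=2))
```

Modifying any field of an earlier block changes its digest, so the validity check fails for the chain; a real blockchain additionally relies on its network of nodes to make such modification hard.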

Recently, several issues in maintaining a blockchain have been studied, such as generating blocks [37, 38] and updating efficiently [39]. To summarize the characteristics of blockchain, we make the following assumption.

Assumption 3 (blockchain assumption). The state and all blocks of a blockchain are hard to modify after they have been generated.

2.4. Bilinear Pairing

Bilinear pairing, also called bilinear mapping, was first used to construct a tripartite key exchange protocol [40]. It involves three multiplicative cyclic groups $G_1$, $G_2$, and $G_T$ of prime order $p$. A bilinear pairing is a mapping $e: G_1 \times G_2 \to G_T$ satisfying the following conditions:
(1) For any $u \in G_1$, $v \in G_2$, and $a, b \in \mathbb{Z}_p$, it always holds that $e(u^a, v^b) = e(u, v)^{ab}$.
(2) There exist two elements $u \in G_1$ and $v \in G_2$ such that $e(u, v) \neq 1$, where $1$ is the identity in $G_T$.
(3) For any $u \in G_1$ and $v \in G_2$, it is feasible to compute $e(u, v)$.

Let $g_1$ and $g_2$ be generators of $G_1$ and $G_2$, respectively. There are two security assumptions related to bilinear pairing.

Assumption 4. (bilinear decisional Diffie–Hellman). Given a bilinear pairing , , and a randomly selected element , it is hard to distinguish from .

Assumption 5. (bilinear computational Diffie–Hellman). Given a bilinear pairing , , it is hard to compute .
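To make the bilinearity condition concrete, the following toy Python sketch checks it numerically. It is not a cryptographic pairing: the two source groups are modeled as the additive group of integers modulo a small prime q, so discrete logarithms in them are trivial and neither assumption above holds. The parameters `p`, `q`, `g` and the function `pairing` are assumptions chosen only for illustration.

```python
# Toy parameters (assumptions): p = 2q + 1 with q prime, and g generates
# the subgroup of order q in the multiplicative group modulo p.
q = 11
p = 23
g = 4


def pairing(u: int, v: int) -> int:
    """Toy bilinear map e(u, v) = g^(u*v) mod p.

    The source groups are written additively here, so raising an element
    to the power a in multiplicative notation corresponds to a*u mod q.
    """
    return pow(g, (u * v) % q, p)


u, v = 3, 7  # elements of the two source groups
a, b = 5, 9  # exponents

lhs = pairing(a * u % q, b * v % q)  # pairing of the a-th and b-th powers
rhs = pow(pairing(u, v), a * b, p)   # pairing of u and v, raised to a*b
```

Both sides evaluate to the same group element, which is the property that the verification equation of a pairing-based scheme relies on. A real deployment would use a pairing-friendly elliptic curve from an audited library instead of this toy map.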

3. Security Model

3.1. System Setting

Our system has three entities: the user, the cloud storage provider where user information is stored, and a blockchain in which several timestamp providers are available to all entities. The structure of the system setting is shown in Figure 3.
(i) The User. The user is the entity who wants to store data on the cloud storage. Whenever the user wants to check whether the data is correctly stored on the cloud storage, a PoR request is generated and sent to the cloud storage. With the help of the blockchain, the user can verify the retrievability of the stored data from the proof received from the cloud storage provider.
(ii) The Cloud Storage Provider. The cloud storage provider is the entity that actually stores the data for the user. In addition, the cloud storage provider generates and sends the proof of retrievability after receiving the request from the user.
(iii) Blockchain. The blockchain mainly keeps the transcripts of the PoR scheme immutable. Moreover, timestamp providers in the blockchain help the cloud storage provider to generate a PoR and the user to verify the generated PoR.

3.2. Timestamp Usage

Timestamps are provided to both the user and the cloud storage provider. A timestamp guarantees the existence of the data because the hash value computed over the data is included in the next timestamp. In our scheme, we modify the traditional timestamp computation: at the end of every proof generation, the timestamp provider computes a timestamp for the current time and publishes it on the blockchain. The cloud storage provider uses this timestamp to compute the hash value in the blockchain.

The use of timestamps benefits security. On the one hand, running a PoR scheme at two different moments yields a PoR for the duration between the two moments. On the other hand, it gives a timeline of PoR records, which can be used to analyze efficiency.

3.3. Definition

There are five algorithms in the blockchain-based PoR, described as follows:
(i) Keygen: The input of the algorithm is the security parameter, and the output is the public key and private key of the system and the user.
(ii) Outsource: The algorithm takes the private key and the user data as input and outputs a data set divided into blocks, with one tag for each block. It also generates new blocks for the data in the blockchain.
(iii) RequestChallenge: The user randomly selects a challenge and sends it to the cloud storage provider.
(iv) ResponseProof: The proof process is an interactive protocol. The input is a public key, the file name, and the tag of the file, and the output is a proof in response to the challenge.
(v) VerifyProof: The input is the system parameter and a proof, and the output of the algorithm is accept or reject.

Remark 1. Note that the system parameter includes the structure and the state of the selected blockchain, as well as other auxiliary public information such as the hash function and bilinear pairing implementations.

3.4. Security Model

Under the assumptions mentioned in Section 2, a blockchain-based PoR scheme is secure if it satisfies the following two properties.
(1) Correctness. A blockchain-based PoR scheme is correct if, for every valid proof generated by honestly running the algorithms defined above (Keygen, Outsource, RequestChallenge, ResponseProof, and VerifyProof), the verification algorithm outputs accept.
(2) Reasonableness. A blockchain-based PoR scheme satisfies reasonableness if a malicious cloud storage provider cannot generate a proof for which VerifyProof outputs accept unless it correctly stores the user data. That is, the user can trust that the cloud storage provider is able to generate an accepted proof only if it correctly stores the user data.

A blockchain-based PoR scheme satisfies reasonableness if the probability that any probabilistic polynomial-time adversary wins the game described below is negligible.
(a) Setup: The challenger runs the Keygen algorithm to obtain the public key and the private key. Then the public key is sent to the adversary.
(b) Outsource: The adversary selects a data set and sends it to the challenger, who runs the Outsource algorithm and responds with the output.
(c) ChallengeProof:
(1) In the RequestChallenge algorithm, the challenger randomly generates a challenge message and sends it to the adversary.
(2) The adversary first generates a data set by running an arbitrary algorithm that returns a proof. The proof is sent to the challenger in the ResponseProof algorithm.
(d) Verify: The challenger runs the VerifyProof algorithm to verify the proof received from the adversary. It outputs accept if and only if the proof is accepted by the challenger.

The adversary wins the game if accept is outputted in the last Verify step.

4. Our PoR Scheme

4.1. High-Level Description

In this section, we propose a blockchain-based PoR scheme. To cut costs, the cloud storage provider only needs to generate a Merkle tree for a data set and store the hash of the Merkle tree in the blockchain. The data set itself can be stored anywhere by the cloud storage provider. When the user requests a PoR challenge, the cloud storage provider fetches the Merkle tree and generates a PoR for the user with the help of the blockchain.

4.2. Blockchain-Based Proof of Retrievability Scheme

Our scheme consists of five algorithms, namely, Keygen, Outsource, RequestChallenge, ResponseProof, and VerifyProof.

4.2.1. Keygen

The user and the cloud storage provider agree on the public system parameters: a hash function , a blockchain , a block size , a prime number , a generator of the cyclic multiplicative group , and a bilinear pairing on .

The user randomly chooses a nonzero element as the private key, then computes and publishes as the public key.

4.2.2. Outsource

When a user wants to store a file on the cloud storage, the following interactive algorithm is run between the user and the cloud storage provider.
(1) Given a data set , the user uses an error correction code to get the encoded data . In case some blocks are lost by the cloud storage, the error correction code is used to reconstruct the original data set [41].
(2) Divide the encoded data into blocks, , where .
(3) For each data block , the user computes the authentication tag as follows:
(i) Randomly choose a nonzero element called the block nonce.
(ii)
(4) The user outsources and to the storage server.
(5) The cloud storage provider creates a Merkle tree by , and stores the hash of the Merkle tree into the blockchain by doing an operation .

Remark 2. When we compute , and are treated as big integers.

4.2.3. RequestChallenge

To verify that the provider has stored the data correctly, the user randomly selects an integer indicating which block should be checked. Then and are sent to the provider to request the challenge.
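The random selection of the block index can be sketched as follows. This minimal Python example assumes that blocks are numbered from 1 to the total number of blocks and uses the standard `secrets` module so that the provider cannot predict the index; the function name `request_challenge` is hypothetical, and the additional value sent together with the index is not modeled.

```python
import secrets


def request_challenge(n_blocks: int) -> int:
    """Pick a uniformly random block index in 1..n_blocks."""
    if n_blocks < 1:
        raise ValueError("there must be at least one block")
    # secrets.randbelow(n) returns an integer in [0, n).
    return 1 + secrets.randbelow(n_blocks)


index = request_challenge(8)
```

Using an unpredictable index matters because a provider that could guess the challenged block in advance would only need to keep that block.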

4.2.4. ResponseProof

For the cloud storage provider, there are blocks of data and the -th block is requested to be checked. When the provider receives a request challenge, a PoR can be generated as follows:
(1) Randomly select a nonzero element .
(2) Compute and .
(3) Fetch out and from storage devices to retain the hash of the Merkle tree .
(4) Send to the timestamp provider in the blockchain.
(5) The timestamp provider verifies that is valid when it receives it. If it is valid, then a timestamp is generated to run and is sent back to the cloud storage provider. Otherwise, the algorithm is terminated.
(6) The cloud storage provider generates the proof where and . Then is sent back to the user.

4.2.5. VerifyProof

After receiving the proof, the user performs the following operations in order:
(1) Send to the timestamp provider in the blockchain. If no accept is returned, then the algorithm is terminated with a reject.
(2) If the blockchain is invalid (see Section 2.3), then the algorithm is terminated with a reject.
(3) Run to obtain the corresponding Merkle tree and the hash of that Merkle tree from the blockchain.
(4) If , then the algorithm is terminated with a reject.
(5) If is invalid (see Section 2.2), then the algorithm is terminated with a reject.
(6) If does not equal the value of the corresponding leaf node in the Merkle tree, then the algorithm is terminated with a reject.
(7) If , then the algorithm is terminated with a reject.
(8) If , then the algorithm is terminated with a reject.
(9) If , then the algorithm is terminated with a reject.
(10) Return accept.

Remark 3. The above operations first check that the blockchain (without Merkle trees) and the Merkle tree related to the last block are valid. Then the existence of the -th block is checked.

5. Security Analysis

5.1. Correctness

Theorem 1. The verify process is correct. It means thatholds where .

Proof. It follows from the property of bilinear pairing that

Remark 4. Due to Assumptions 4 and 5, the private key is still secure even if the result of the bilinear pairing computation is public.

5.2. Reasonableness

Theorem 2. If the cloud storage provider is honest, the final proof must bewhere , and is the current timestamp.

Proof. If the cloud storage provider is honest, the following points hold true:
(i) and guarantee that the cloud storage provider stores at least the -th block, which is not revealed to the public in the bilinear pairing computation (See Remark 5.1).
(ii) The Merkle tree is created by the cloud storage provider at the time was required, and the leaf nodes of the tree are all part of the data set . It follows from Assumptions 1 and 2 that these hashes cannot be found without knowing the original data set .
(iii) generated by blockchain is trusted according to Assumption 3.
(iv) The consistency of the Merkle tree and the timestamp is assured by and , respectively.
To sum up, the cloud storage provider must store the data set correctly if VerifyProof returns accept.

5.3. Traceability

Theorem 3. The blockchain-based PoR scheme in Section 4 is traceable.

Proof. If the cloud server is dishonest, that is, the server modifies, deletes, or tampers with a piece of the file without the authorization of the user, it cannot compute the value of the root node correctly, so it cannot prove that it has completely stored the data. By verifying the Merkle tree, the user can finally determine which piece of the file has been modified.
For example, to verify whether the fifth file block has been modified, the following procedure can be followed, with the structure shown in Figure 4:
(i) Verify Node 1. Verify that the calculated value of node 1 is correct through the values of node 2 and node 3.
(ii) Verify the Value of Node 3. The receiver computes the value of node 3 through the values of node 6 and node 7 that he has received and verifies whether the calculated value of node 3 is correct.
(iii) Compute the Value of Node 6. The receiver computes the value of node 6 through the values of node 12 and node 13 that he has received and verifies whether the calculated value of node 6 is correct.
(iv) The receiver computes the value of node 12 from the value of and verifies that the calculated value of node 6 is correct.
The correct value can be determined by whether the value of node 6 is consistent. This allows the user to track down the file blocks that have been modified.
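The check along the path from node 12 to node 1 can be sketched in Python as follows. The sketch assumes a tree with eight leaves numbered as in a binary heap (root 1, children of node k are 2k and 2k+1, leaves 8 to 15), SHA-256 as the hash function, and concatenation of child values at inner nodes; it recomputes the path bottom-up rather than top-down, which checks the same equalities. All function names are illustrative.

```python
import hashlib
from typing import Dict, List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blocks: List[bytes]) -> Dict[int, bytes]:
    """Heap-indexed Merkle tree: leaves are nodes n..2n-1, root is node 1."""
    n = len(blocks)
    if n == 0 or n & (n - 1):
        raise ValueError("number of blocks must be a power of two")
    tree = {n + i: h(b) for i, b in enumerate(blocks)}
    for k in range(n - 1, 0, -1):
        tree[k] = h(tree[2 * k] + tree[2 * k + 1])
    return tree


def auth_path(tree: Dict[int, bytes], leaf: int) -> List[bytes]:
    """Sibling values needed to recompute the root from one leaf."""
    path = []
    k = leaf
    while k > 1:
        path.append(tree[k ^ 1])  # k ^ 1 is the sibling of node k
        k //= 2
    return path


def verify_block(block: bytes, leaf: int, path: List[bytes], root: bytes) -> bool:
    """Recompute the root from the block and its sibling path."""
    value = h(block)
    k = leaf
    for sibling in path:
        # An even index is a left child, an odd index is a right child.
        value = h(value + sibling) if k % 2 == 0 else h(sibling + value)
        k //= 2
    return value == root


blocks = [f"block {i}".encode() for i in range(1, 9)]
tree = build_tree(blocks)
path = auth_path(tree, 12)  # node 12 holds the fifth block
ok = verify_block(blocks[4], 12, path, tree[1])
tampered = verify_block(b"modified block", 12, path, tree[1])
```

Here the path for node 12 consists of the values of nodes 13, 7, and 2, matching the nodes used in the example above; a modified fifth block fails the check against the stored root.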

5.4. Resistance to Two Kinds of Attacks

In this subsection, two kinds of attacks are considered, i.e., replay attacks and collusion attacks.

5.4.1. Resistance to Replay Attack

In Section 4.2.4, note that there is a timestamp attached to the , where
(i) If a data storage provider uses an old timestamp , then it would be rejected in step (1) in Section 4.2.5, since the timestamp provider can easily find that such is expired. In other words, such may be valid only for a short time. However, the user could not run this protocol twice in such a short time.
(ii) If a data storage provider uses an old proof , then it would be rejected in step (7) in Section 4.2.5, since is attached with a challenge that is randomly generated by the user. should be different in two runs of this protocol.

In short, our protocol is resistant to replay attacks with old timestamps or old proof .
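The two rejection conditions can be illustrated with a small Python sketch. It does not reproduce the actual checks of the scheme: it assumes that a timestamp is a number of seconds, that the timestamp provider accepts it only within a hypothetical validity window `MAX_AGE_SECONDS`, and that the response echoes the challenge as a byte string. The function names are illustrative.

```python
import secrets

MAX_AGE_SECONDS = 60.0  # hypothetical validity window of a timestamp


def fresh(ts: float, now: float, max_age: float = MAX_AGE_SECONDS) -> bool:
    """A timestamp is accepted only if it is recent and not in the future."""
    return 0 <= now - ts <= max_age


def accept_response(sent_challenge: bytes, echoed_challenge: bytes,
                    ts: float, now: float) -> bool:
    """Reject expired timestamps and proofs bound to another challenge."""
    same = secrets.compare_digest(sent_challenge, echoed_challenge)
    return fresh(ts, now) and same


old_challenge = secrets.token_bytes(16)  # challenge of an earlier run
new_challenge = secrets.token_bytes(16)  # challenge of the current run

honest = accept_response(new_challenge, new_challenge, ts=1000.0, now=1010.0)
old_proof = accept_response(new_challenge, old_challenge, ts=1000.0, now=1010.0)
old_timestamp = accept_response(new_challenge, new_challenge, ts=1000.0, now=2000.0)
```

Only the first call is accepted: the second replays a proof bound to an earlier challenge, and the third presents a timestamp outside the validity window.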

5.4.2. Resistance to Collusion Attack

Consider the case in which the timestamp provider (and, by extension, the blockchain provider) colludes with the data storage provider. In effect, the data storage provider would then also act as a timestamp provider in the blockchain context. However, according to the security analysis of blockchain [42, 43], such malicious timestamp providers can be detected by the nodes in the blockchain network. Under Assumption 3 (blockchain assumption), our protocol is resistant to such collusion attacks, which can be reduced to an attack in the blockchain context.

6. Conclusion

To protect the security and integrity of user data, we formally defined a novel security model for a blockchain-based PoR scheme and proposed a scheme that is secure under this model. The properties of the PoR scheme and the characteristics of blockchain ensure the security and the integrity of the data, respectively. Furthermore, we proved the correctness and reasonableness of our scheme. In our scheme, blockchain plays an irreplaceable role in the privacy and security of user data. We believe that combining blockchain with PoR schemes will continue to promote progress in this area.

However, many attacks are still not considered, such as reset attacks and malicious attacks. To improve performance, it would be interesting to remove the bilinear mapping while preserving the same security level.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest in this work.

Acknowledgments

This work was supported by the Higher Education Technology Innovation Projects Foundation of Shanxi (Grant nos. 2021L467 and 2020L0560), National Statistical Science Research Program of China (Grant no. 2021LY047), Research Project of Yuncheng University (Grant nos. YQ-2020020 and XK-2020036), Key Discipline Project of Yuncheng University, Young Innovative Talents Project of General Colleges and Universities in Guangdong Province (Grant no. 2019KQNCX112), Talent Special Project of Research Project of Guangdong Polytechnic Normal University (Grant no. 2021SDKYA051), Opening Project of Guangdong Provincial Key Laboratory of Information Security Technology (Grant no. 2020B1212060078), and the Guangdong Basic and Applied Basic Research Foundation (Grant no. 2021A1515011954).