Abstract

Cloud storage services allow users to outsource their data remotely, which saves local storage space and lets them manage resources on demand. However, once users outsource their data to a remote cloud platform, they lose physical control of the data. Ensuring the integrity of outsourced data is a major concern of cloud users and a main challenge in cloud service deployment. Because of their communication and computation overheads, traditional hash-based integrity verification solutions designed for stand-alone systems cannot be directly adopted in remote cloud storage environments. In this paper, we improve the previous privacy preserving model and propose an effective integrity verification scheme for cloud data based on the BLS signature (EoCo), which ensures public auditing and data privacy preservation. In addition, EoCo supports batch auditing operations. We conduct a theoretical analysis of our scheme, demonstrate its correctness and security properties, and evaluate the system performance.

1. Introduction

In recent years, cloud storage services have become increasingly popular. They allow users to outsource their data remotely, which saves local storage space and lets them manage resources on demand. However, once users outsource their data to a remote cloud platform, they lose physical control of the data. Some cloud service providers (CSPs) may remove rarely accessed user data to gain more profit. In addition, data may be lost because of system crashes or operation errors [1, 2]. In such situations, CSPs may conceal the fact that users’ data are no longer stored correctly. Ensuring the integrity of outsourced data is therefore a major concern of cloud users. Because of their communication and computation overheads, traditional hash-based integrity verification solutions for stand-alone systems cannot be directly adopted in remote cloud storage environments. For example, a strawman solution is to compute and keep the message authentication code of the file before outsourcing, and later retrieve the whole file from the cloud server and check it against the message authentication code. However, this introduces unacceptable communication and computational overhead.
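The strawman solution can be sketched as follows. This is a minimal illustration, assuming HMAC-SHA256 as the message authentication code (the text does not fix a particular MAC); the function names and the demo file are hypothetical. It shows that the verifier must download and reprocess every byte of the file on each audit.

```python
import hashlib
import hmac
import secrets


def mac_of_file(key: bytes, data: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the entire file (assumed MAC)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def strawman_audit(key: bytes, stored_tag: bytes, downloaded: bytes) -> bool:
    """Re-MAC the whole downloaded file and compare with the stored tag."""
    return hmac.compare_digest(stored_tag, mac_of_file(key, downloaded))


# The user keeps only the key and the tag before outsourcing the file.
key = secrets.token_bytes(32)
original = b"".join(b"block-%d;" % i for i in range(1000))
tag = mac_of_file(key, original)

# Every audit transfers the complete file back to the verifier.
bytes_transferred_per_audit = len(original)
audit_ok = strawman_audit(key, tag, original)
audit_tampered = strawman_audit(key, tag, original[:-1] + b"X")
```

The per-audit transfer grows linearly with the file size, which is the overhead that sampling-based schemes avoid.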

To verify the integrity of cloud data without retrieving the whole outsourced file, Ateniese et al. [3] proposed a provable data possession (PDP) scheme, which provides remote integrity verification based on homomorphic tags and sampling techniques. Their improved approach supports public auditing but cannot ensure the privacy preserving property. Juels et al. [4] proposed a proof of retrievability (POR) protocol to audit remote data and ensure that they can be recovered when data corruption happens. Nevertheless, the scheme supports only a limited number of verification rounds and offers no privacy protection in public auditing. The scheme proposed by Shacham et al. [5] cannot protect data privacy either. Although many other public auditing solutions that preserve privacy during integrity auditing have been proposed [6–12], some schemes [6, 8, 9] only prevent data leakage rather than provide a rigorous privacy guarantee. The solutions proposed by Wang et al. and Worku et al. [6, 9] fail to achieve reliable data protection under their security models.

In this paper, we propose EoCo, an effective scheme for verifying cloud data integrity, based on the BLS (Boneh-Lynn-Shacham) signature with strong privacy protection. Besides efficient remote data integrity verification, our scheme supports public auditability, privacy protection, blockless verification, and batch auditing. We formally define a new integrity protection model based on the work of Worku et al. [9] and prove, based on the CDH (computational Diffie-Hellman) problem, that a dishonest server cannot bypass the verification. We also introduce a ZK-privacy (zero knowledge privacy) model and prove that our scheme is secure against CMA (adaptive chosen message attack) and CPA (chosen plaintext attack) adversaries under the security assumptions of the cryptographic primitives.

Contributions. Our contributions can be summarized as follows:
(i) We propose a remote data integrity protection scheme that ensures public auditability, privacy protection, and blockless verification. Our scheme also supports batch auditing operations.
(ii) We introduce and formally define a ZK-privacy model, where the adversary obtains zero knowledge from the auditing interactions. We prove the privacy preserving property of our scheme under the ZK-privacy model.
(iii) We evaluate the performance of our proposed scheme through mathematical analysis and compare it with related schemes in terms of communication and computation overhead. The communication overhead of EoCo is only .

Paper Organization. The rest of this paper is organized as follows. In Section 2, we discuss the related work and the key challenges; Section 3 presents the preliminaries of the proposed scheme; in Section 4, we describe the system model and the security goals; in Section 5, we propose our publicly auditable integrity verification scheme; we conduct theoretical analysis with security proof in Section 6; in Section 7, we evaluate the performance overhead of our approach; and Section 8 concludes the paper.

2. Related Work

2.1. Provable Data Possession and Proof of Retrievability

Ateniese et al. [3] first proposed a remote auditing system using a provable data possession (PDP) model to ensure data integrity in untrusted storage services. To deploy the RSA (Rivest-Shamir-Adleman) digital signature, the scheme splits data into small file blocks. A high-probability guarantee of data integrity is achieved by randomly selecting some blocks and checking their correctness using the homomorphic linear validation property of RSA. Ateniese et al. [13] introduced the retrievability property on the basis of PDP using error-correcting codes. However, because the linear combination of sampled file blocks is exposed to external auditors in the public auditing process, these two solutions cannot achieve provable privacy protection.

Juels et al. [4] proposed the POR model for data integrity verification. The scheme ensures data integrity and retrievability through spot-checking and error-correcting codes. Spot-checking randomly embeds special check blocks, called sentinels, into the data file and then randomly selects a number of sentinels to verify the integrity of the file. However, this scheme has a critical problem: once the number of validations exceeds a certain bound, the fixed sentinels are exposed and data integrity can no longer be guaranteed. In addition, the scheme cannot support public auditing. Bowers et al. [14] improved the POR model, and Shacham et al. [5, 15] also proposed an improved POR scheme based on the BLS signature [16]. However, these schemes cannot ensure privacy protection, for the same reason as the scheme proposed by Ateniese et al. [3].

Shah et al. [12, 17] introduced a third party auditor (TPA) and sent a number of precomputed symmetric-keyed hashes over the encrypted data to the TPA. Nevertheless, this scheme can only be applied to encrypted files and can be used only a limited number of times. When the keyed hashes are used up, the scheme places an additional online burden on the user.

Worku et al. [9] defined a new integrity protection model that gives a stronger definition of integrity by adding a second query phase, and proposed a scheme that claims to achieve provable security under the model. However, Liu et al. [11] proved that the scheme in [9] does not satisfy the definition of its own security model. Worku’s scheme [9] selects a unique identifier for each file, but the identifier is not well embedded into the scheme and is not tightly connected to it. Therefore, the adversary can extract important knowledge in the second query phase of the security model [18].
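The "high-probability guarantee" obtained by sampling in PDP-style schemes can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not part of any cited scheme: it assumes that t of n blocks are corrupted and that the auditor samples c distinct blocks uniformly at random, so detection follows the hypergeometric distribution. The function name and the example parameters are hypothetical.

```python
from math import comb


def detection_probability(n: int, t: int, c: int) -> float:
    """Probability that a uniform sample of c out of n blocks contains
    at least one of the t corrupted blocks (sampling without replacement)."""
    if not (0 <= t <= n and 0 <= c <= n):
        raise ValueError("require 0 <= t <= n and 0 <= c <= n")
    # The audit misses the corruption only if all c samples are intact.
    return 1.0 - comb(n - t, c) / comb(n, c)


# Example: 1% of 10,000 blocks corrupted, 460 blocks challenged.
example = detection_probability(n=10_000, t=100, c=460)
```

With these example parameters the detection probability is about 0.99, while the auditor touches fewer than 5% of the blocks; the number of sampled blocks needed depends on the corrupted fraction rather than on the file size.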

2.2. Public Audition

Wang et al. [6] proposed a public auditing scheme with a privacy protection property, under which an adversary cannot obtain information about the data in the PDP model. Worku et al. [9] showed that Wang’s scheme [6] is not secure against attacks from malicious servers and proposed a privacy preserving scheme. Wang et al. [8] proposed an improved scheme to achieve privacy protection. However, these schemes [6, 8, 9] cannot satisfy the strict privacy protection model, IND-privacy (indistinguishability-privacy), presented by Fan et al. [10]. The IND-privacy model achieves privacy protection by proving that the responses in the auditing process are indistinguishable to external auditors. The general idea and design of Worku’s scheme [9] and Wang’s scheme [8] are similar. The former uses to hide the linear combination of the blocks, while the latter is implemented by , where and are random numbers and stands for a hash function. These two approaches only ensure that external auditors cannot directly obtain the data; a malicious auditor can still distinguish between different responses and thereby gain knowledge about the cloud users’ data. The protocol proposed by Fan et al. [10] is inefficient, and the problem underlying its symmetric external Diffie-Hellman assumption can be solved in the presence of a bilinear mapping.

2.3. Dynamic Maintenance

In practical settings, support for dynamic maintenance is desirable in remote data attestation. Ateniese et al. [19] proposed the concept of dynamic operation and built a scheme based on a symmetric cryptosystem that does not require batch encryption. However, this scheme is limited in the number of queries and does not support fully dynamic scenarios. Wang et al. [7, 8] supported fully dynamic operations by using the Merkle hash tree (MHT) structure. Erway et al. [20] developed a skiplist-based scheme that protects data integrity under fully dynamic operations. Sookhak et al. [21] also proposed a dynamic scheme. Xin et al. [22] proposed an effective and secure access control approach for multiauthority cloud storage systems.

In addition, the communication and computational overheads are also critical metrics to evaluate the efficiency of cloud services [23, 24]. Among the schemes above, the communication overhead of [8] during validation is , while [9] is more efficient in computational overhead. We illustrate the comparison of the relevant solutions in Table 1.

3. Preliminaries

3.1. Bilinear Map and GDH Groups

Let , , and be multiplicative cyclic groups of the same large prime order , where . Let be the generators of , respectively. is a bilinear map if it satisfies the following properties:(i)Bilinear: and , holds.(ii)Nondegenerate: , such that , is the identity element of the cyclic group .

If there exists an isomorphism , with , is a Gap Diffie-Hellman (GDH) group pair. We can set and and take to be the identity map.

(a) Computational co-Diffie-Hellman (co-CDH) on GDH [16]. Given and as input and , compute .

(b) Decision co-Diffie-Hellman (co-DDH) on GDH [16]. Given and as input, if the output is yes; otherwise, the output is no. When the answer is yes, we say that is a co-Diffie-Hellman tuple.

When , these problems reduce to the standard CDH and DDH problems. The co-DDH problem is easy to solve, whereas the co-CDH problem is hard on the GDH group [16].
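For reference, the definitions in this subsection can be written out in the standard notation of the BLS construction [16]. The symbols below ($G_1$, $G_2$, $G_T$, $p$, $g_1$, $g_2$, $e$, $\psi$) are an assumption based on that standard presentation and may differ from the symbols used elsewhere in this paper.

```latex
% Assumed notation: G_1, G_2, G_T are multiplicative cyclic groups of prime
% order p, with generators g_1 of G_1 and g_2 of G_2.
\begin{align*}
  &e : G_1 \times G_2 \rightarrow G_T
     && \text{(bilinear map)} \\
  &e(u^{a}, v^{b}) = e(u, v)^{ab}
     \quad \forall\, u \in G_1,\ v \in G_2,\ a, b \in \mathbb{Z}_p
     && \text{(bilinearity)} \\
  &e(g_1, g_2) \neq 1_{G_T}
     && \text{(nondegeneracy)} \\
  &\psi : G_2 \rightarrow G_1, \quad \psi(g_2) = g_1
     && \text{(isomorphism)} \\[4pt]
  &\text{co-CDH: given } g_2,\ g_2^{a} \in G_2 \text{ and } h \in G_1,
     \text{ compute } h^{a} \in G_1 \\
  &\text{co-DDH: given } g_2,\ g_2^{a} \in G_2 \text{ and } h,\ h^{b} \in G_1,
     \text{ decide whether } a = b
\end{align*}
```

In this notation, co-DDH is easy because $a = b$ holds exactly when $e(h, g_2^{a}) = e(h^{b}, g_2)$, which anyone can test with two pairing evaluations.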

3.2. BLS Signature

The BLS signature [16] includes three functions, KeyGen, Sign, and Verify. Let be a GDH group where . It makes use of a full-domain hash function .
(i) KeyGen. Randomly choose and compute . The public key is and the private key is .
(ii) Sign. Given a private key and a message , compute and . The signature is .
(iii) Verify. Given a public key , a message , and the signature , compute and verify whether is a valid co-Diffie-Hellman tuple. If so, output valid; otherwise, output invalid.
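In the standard notation of [16], the three functions can be sketched as follows. The symbols ($x$, $v$, $h$, $\sigma$, $H$) are assumed from the standard BLS presentation and are not taken from this paper's own notation.

```latex
% Assumed notation: H : \{0,1\}^{*} \rightarrow G_1 is a full-domain hash,
% g_2 generates G_2, and all exponents are taken in \mathbb{Z}_p.
\begin{align*}
  \textsf{KeyGen:}\quad & x \xleftarrow{R} \mathbb{Z}_p, \qquad v = g_2^{x}
      && (\text{public key } v,\ \text{private key } x) \\
  \textsf{Sign:}\quad   & h = H(M) \in G_1, \qquad \sigma = h^{x} \\
  \textsf{Verify:}\quad & h = H(M), \quad \text{accept iff }
      e(\sigma, g_2) = e(h, v)
\end{align*}
% Correctness: e(\sigma, g_2) = e(h^{x}, g_2) = e(h, g_2)^{x}
%              = e(h, g_2^{x}) = e(h, v).
```

The pairing check is the co-DDH test applied to the tuple $(g_2, v, h, \sigma)$, so verification needs only public values.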

4. Approach Overview

4.1. System Model

Our scheme, EoCo, is built on the system model presented in Figure 1. The model consists of three entities: users, CSP servers, and the third party auditor.
(i) Users. Cloud users own the data and want to save local storage and computing resources by uploading the data to the cloud.
(ii) CSP Servers. The CSP servers have a large amount of storage space available for users. They also provide cloud operations such as data update and query and serve retrieval requests from customers.
(iii) The Third Party Auditor (TPA). The TPA has more capability and expertise than the clients. Users can ask a TPA to audit the integrity of the outsourced data on their behalf.

In our system model, our remote data integrity protection scheme consists of five critical operations, KeyGen, TokenGen, Challenge, Response, and CheckProof. A cloud user runs KeyGen to generate her/his public key and private key (with a security parameter as its input). The user runs the TokenGen algorithm to generate a token for the file and outsources the file with corresponding token to the CSP server. When the user wants to check the correctness of her/his data, she/he may delegate the integrity audition to a TPA. The TPA creates using Challenge and sends it to the CSP server. The server runs Response and returns the proof to TPA. The TPA runs CheckProof to check whether it is correct, and if it is correct, the output is ; otherwise, the output is .

In the system shown in Figure 1, we mainly take two threats into account: the threat to integrity and the threat to privacy [25]. Accordingly, we classify attackers into two types based on the knowledge that they possess.
(i) Threat to Integrity. The attacker observes the data , the authentication identifier , and the public key , i.e., . The purpose of such an attacker is to produce a legitimate proof for forged data.
(ii) Threat to Privacy. The attacker observes only the public key and the proof , i.e., . The purpose of such an attacker is to acquire additional knowledge, such as the content or the type of the data.

4.2. Integrity Protection Model

We improve the integrity protection model proposed by Worku et al. [9], and through this model we demonstrate two objectives.
(1) If the data do not remain in their original state, an adversary cannot construct a valid proof in polynomial time with nonnegligible probability.
(2) If the adversary can always pass the verification, then it can be shown that the data remain intact.

Our integrity protection model allows an adversary to query large files . The adversary , may be a dishonest cloud service provider who interacts with a challenger (users or TPA). The integrity game consists of the following steps, and we illustrate the details in Figure 2.
(i) Setup. The challenger runs the key generation algorithm, sends the public key to the adversary , and keeps the private key secret.
(ii) Query1. The adversary adaptively makes tagging queries: it selects a file and sends it to . The challenger then computes the token and sends it back to . The adversary continues to query for the token on the files of its choice . In general, the challenger generates for some .
(iii) Challenge. The challenger generates a challenge and requests the adversary to provide a proof of integrity for the file (note that file must differ from the in the query phase).
(iv) Query2. Query1 is repeated; the challenger generates for some .
(v) Forge. The adversary computes a proof of integrity for the file according to challenge and returns to .

If the can pass the verification, the adversary wins the game.

Definition 1. An EoCo scheme guarantees data integrity if no polynomial-time adversary can win the game with nonnegligible probability.

4.3. Privacy Protection Model

In this subsection, we define a new privacy protection model, the ZK-privacy model, and prove that the scheme does not leak any information by showing that the attacker gains zero knowledge during the audit process. The ZK-privacy game takes place between a challenger , such as a cloud server, and an adversary , such as a malicious TPA. The game includes the following steps, and we summarize the critical operations in Figure 3.
(i) Setup. The challenger runs the key generation algorithm to generate a key pair and sends the public key to the adversary .
(ii) Phase 1 (Steps 1–3). The adversary adaptively makes queries: it selects a file and sends it to the challenger , then the challenger generates the token, , for the file , and sends it to the adversary .
(iii) Phase 2 (Steps 4–7). The adversary chooses two files , , and , which are different from the files in Phase 1. Then the challenger generates corresponding . Next, the challenger randomly selects and sends to . The adversary generates a challenge to . The challenger generates proof, , to the . Finally, the adversary outputs a bit as a guess of the . If , the adversary wins the game.

Definition 2. We define the advantage of the adversary as The indicates the probability distribution of the view of in the case of or . An EoCo scheme guarantees that there is no information leakage if .

5. Our Schemes

5.1. Definition and Framework

Our scheme mainly consists of five polynomial time algorithms: KeyGen, TokenGen, Challenge, Response, and CheckProof. The cloud user is represented by , the cloud server is represented by , and the third party auditor is represented by .
(i) KeyGen(. is a probabilistic key generation algorithm, which is set up and initialized by . It takes as input a security parameter and outputs a key pair .
(ii) TokenGen(. is a deterministic algorithm to compute tokens. It takes as inputs a private key , a public key and a file . The output is the token , where is a file tag which includes a file , is an ordered collection, and is the unique authentication identifier corresponding to the blocks in the file .
(iii) Challenge(. is a deterministic algorithm. It takes as input a security parameter . It outputs a challenge .
(iv) Response(. creates the proof of data integrity corresponding to the received challenge. It takes as inputs a public key , a file , a challenge , and an ordered set . The output is an integrity proof of the file .
(v) CheckProof( or . is the function that verifies the returned proof . It takes as inputs a public key , a challenge , and a proof . It outputs or .

An EoCo scheme can be summed up as two processes: a setup process that initializes the scheme and a verification process that confirms whether the data are intact.
(i) Setup. The cloud user C has a file and runs the algorithm KeyGen to generate the key pair . C then stores the key pair locally and uses it to run the algorithm TokenGen to generate a token, . Finally, C sends and to S and deletes and from its local storage.
(ii) Audit. The TPA runs the algorithm Challenge to generate a challenge corresponding to the target files. The TPA sends to the server . After receiving , computes the proof accordingly and returns the proof and to the TPA by using the function . The TPA validates and using the function CheckProof.
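The flow of the five algorithms across the setup and audit processes can be illustrated with a toy sketch. This is not the EoCo construction: it uses the simpler BLS-based homomorphic tag in the style of Shacham et al. [5], without any privacy masking, and it models each group element $g^a$ by its exponent $a$ so that the pairing $e(g^a, g^b)$ becomes $ab \bmod P$. That model exposes all discrete logarithms and is therefore insecure; it only checks the algebra of blockless verification. The modulus `P`, the helper names, and the demo data are assumptions made for illustration.

```python
import hashlib
import secrets

# Toy "exponent model": the group element g^a is stored as the integer a mod P.
# Multiplying elements adds exponents, raising to a power multiplies them, and
# the pairing e(g^a, g^b) = e(g, g)^(a*b) becomes a*b mod P.
# This exposes every discrete log, so it is NOT secure; it only checks algebra.
P = 2**127 - 1  # prime standing in for the group order (assumption)


def hash_to_group(file_id: bytes, index: int) -> int:
    """Model H(file_id || index) as a pseudorandom exponent."""
    digest = hashlib.sha256(file_id + index.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % P


def pairing(a: int, b: int) -> int:
    return (a * b) % P


def key_gen():
    x = secrets.randbelow(P - 1) + 1  # private signing exponent
    u = secrets.randbelow(P - 1) + 1  # public element u
    public_key = {"g": 1, "v": x, "u": u}  # v = g^x is modeled by x itself
    return public_key, x


def token_gen(sk, pk, file_id, blocks):
    """Tag each block m_i with sigma_i = (H(id, i) * u^{m_i})^x."""
    return [((hash_to_group(file_id, i) + pk["u"] * m) * sk) % P
            for i, m in enumerate(blocks)]


def challenge(num_blocks, sample_size):
    """Pick distinct block indices, each with a random nonzero coefficient."""
    indices = secrets.SystemRandom().sample(range(num_blocks), sample_size)
    return [(i, secrets.randbelow(P - 1) + 1) for i in indices]


def response(blocks, tags, chal):
    """Aggregate challenged blocks and tags; no raw block is sent."""
    mu = sum(nu * blocks[i] for i, nu in chal) % P
    sigma = sum(nu * tags[i] for i, nu in chal) % P  # prod sigma_i^{nu_i}
    return mu, sigma


def check_proof(pk, file_id, chal, proof):
    """Check e(sigma, g) == e(prod H(id, i)^{nu_i} * u^{mu}, v)."""
    mu, sigma = proof
    aggregated = (sum(nu * hash_to_group(file_id, i) for i, nu in chal)
                  + pk["u"] * mu) % P
    return pairing(sigma, pk["g"]) == pairing(aggregated, pk["v"])


# Setup: the user generates keys and tokens, then outsources blocks and tags.
pk, sk = key_gen()
file_id = b"demo-file"
blocks = [secrets.randbelow(P) for _ in range(8)]
tags = token_gen(sk, pk, file_id, blocks)

# Audit: the auditor challenges, the server responds, the auditor checks.
chal = challenge(len(blocks), len(blocks))  # challenge every block in this demo
honest = check_proof(pk, file_id, chal, response(blocks, tags, chal))
corrupted = list(blocks)
corrupted[3] = (corrupted[3] + 1) % P
cheating = check_proof(pk, file_id, chal, response(corrupted, tags, chal))
```

The auditor's check uses only the public key, the challenge, and the two aggregated values, which is what blockless public verification means. In this unmasked form the auditor learns the linear combination `mu` of the blocks, which is the leakage that privacy preserving schemes are designed to hide.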

5.2. Notations

In this subsection, we describe the parameters and notations used in our scheme. As listed in Table 2, is a large prime and the EoCo scheme is built on the group and supports a cyclic group with a bilinear setting. Let be an element of the cyclic group. In the setup phase, the data owner splits the file into small file blocks, that is, , and for , every block . is a secure hash function. The additional BLS signature used in our scheme is represented as . In addition, we use a pseudorandom permutation (PRP) and a pseudorandom function (PRF) with the following parameters. We write to denote keyed with key applied on input .(i).(ii).

5.3. EoCo Scheme

(i) KeyGen(). The algorithm first generates a secure BLS signature key pair . Next, the algorithm randomly selects an element in the , , and computes . Then, the algorithm randomly selects an element in the , , and computes . Finally, the algorithm returns public key and private key .
(ii) TokenGen(). The algorithm randomly selects an element in the as a unique identifier of file , , and calculates the file identification tag by BLS signature, . Next, the algorithm generates a unique authentication identifier for each block in the file, . For , . The algorithm then outputs token .
(iii) Challenge(). The algorithm first determines the number of file blocks . According to the security parameters , the algorithm randomly selects key for and key for . Finally, the algorithm returns .
(iv) Response(). The algorithm first selects a secret random value . For , the algorithm computes and , and the algorithm then computes the aggregated authentication identifier (note that is the -th value in ). , and . Finally, the algorithm outputs proof .
(v) CheckProof(). The algorithm first verifies whether the file tag is correct. If is incorrect, the algorithm outputs and quits; otherwise the algorithm extracts and for , the algorithm calculates , and . Finally, the algorithm verifies the following equation:
If (2) holds, returns ; otherwise, it returns .
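Sending two short keys instead of explicit index and coefficient lists keeps the challenge small, because the server and the auditor can each expand the keys into the same indices and coefficients. The sketch below shows one way such an expansion could work. It assumes HMAC-SHA256 as the PRF and uses a keyed pseudorandom ordering of the indices as a simple stand-in for a PRP; the paper does not specify these instantiations, and all names and parameters here are hypothetical.

```python
import hashlib
import hmac


def prf(key: bytes, label: bytes, value: int) -> int:
    """HMAC-SHA256 used as a PRF that returns a large integer (assumption)."""
    mac = hmac.new(key, label + value.to_bytes(8, "big"), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big")


def expand_challenge(k1: bytes, k2: bytes, num_blocks: int,
                     sample_size: int, order: int):
    """Derive (block index, coefficient) pairs from the two challenge keys."""
    # Keyed pseudorandom ordering of all indices: a simple stand-in for a
    # PRP on [0, num_blocks); ties are broken by the index itself.
    permuted = sorted(range(num_blocks),
                      key=lambda i: (prf(k1, b"perm", i), i))
    indices = permuted[:sample_size]
    # One pseudorandom coefficient modulo the group order per position.
    coefficients = [prf(k2, b"coef", j) % order for j in range(sample_size)]
    return list(zip(indices, coefficients))


# Both parties run the same expansion on the same two keys.
demo_keys = (b"k1" * 16, b"k2" * 16)
auditor_view = expand_challenge(*demo_keys, num_blocks=100,
                                sample_size=10, order=2**127 - 1)
server_view = expand_challenge(*demo_keys, num_blocks=100,
                               sample_size=10, order=2**127 - 1)
```

Sorting all indices costs time proportional to the file's block count on every challenge; a dedicated small-domain PRP would avoid that enumeration, at the cost of a more involved construction.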

6. Scheme Analysis

6.1. Correctness

In this subsection, we discuss the correctness of our scheme; that is, the CSPs will always pass the audit if they follow the protocol honestly. By signing the of a file with an additional BLS signature, we can prevent an adversary from tampering with the . We use a hash function to ensure that the file tag, , is embedded in the authentication identifier. We use a PRP function and a PRF function to ensure the randomness of the challenge content and the security of the response. If the data are kept properly, the correctness of the verification equation (2) can be shown as follows:

6.2. Integrity Protection

The integrity protection of EoCo is based on the symmetric co-CDH problem or standard CDH problem.

Theorem 3. Under the CDH assumption, EoCo guarantees data integrity protection in the random oracle model.

We first propose a simplified scheme, S-EoCo. By proving the security of S-EoCo, we can prove the security of EoCo. S-EoCo and EoCo differ only in the KeyGen and Response algorithms as follows:
(i) KeyGen(). The algorithm first generates a secure BLS signature key pair . Next, the algorithm randomly selects an element in the , , and computes . Then, the algorithm randomly selects an element in the , . Finally, the algorithm returns public key and private key .
(ii) Response(). For , the algorithm computes . Then, for , the algorithm computes the aggregated authentication identifier , , where . Finally, the algorithm outputs proof .

Proof. We reduce the security of our S-EoCo scheme to the hardness of the CDH problem. We model the hash function as a random oracle.
If an adversary can break the integrity protection of the S-EoCo scheme, we show how to construct an adversary that uses in order to solve the CDH problem.
For the CDH problem, is given and needs to calculate . Then will play the role of the challenger in the integrity protection game and will interact with as follows.
(i) Setup. The challenger selects a secret key pair and gets , where , , . Then sends public key to the adversary .
(ii) Query1. The adversary adaptively presents queries in the way that it selects different files and sends them to to create s. In order to answer queries, simulates in a random oracle machine as follows:
(1) For , randomly selects an element in group as the unique identifier of file , , and calculates file identification tags .
(2) (a) When queries for a hash value, first checks whether is in the hash tuple list . If it is in the list, sends to as the answer; otherwise, randomly selects and replies with an element in , , as , and adds to the hash tuple list .
(b) When performs the identifier query, calculates . If , announces failure; otherwise, randomly selects , calculates , and builds a list of identifiers . can compute using the formula below.
(3) sends the file identifier to .
(iii) Challenge. For files , sends challenge to , and is different from above that have been queried.
(iv) Query2. Query1 is repeated, but cannot query the file blocks included in .
(v) Forge. creates and returns to according to challenge . If can pass the verification, wins the game.

Suppose that an honest cloud server computes a proof, , then the following verification formula should be satisfied:

Suppose that the adversary wins the game with nonnegligible probability. The proof output by is that can pass the verification with nonnegligible probability, thus formula (6) is satisfied.

According to the game, . Let . Because is a multiplicative cyclic group, we can find an inverse element of in the group. We substitute for in formula (5) to get the deformation of formula (5).

We multiply by and obtain the following equation:

Because , we replace in formula (8).

The above equation can be further transformed as follows:

It can be derived from formula (10) that . takes back from the adversary and calculates . It means that as long as can solve the CDH problem. As , the probability that is , which is nonnegligible. The probability that the simulation of fails is , which is negligible. So if the adversary wins the game with a nonnegligible probability, the CDH problem can be solved with a nonnegligible probability. Aggregation tags in Response in the EoCo scheme is that and . We add on the basis of S-EoCo, which makes the scheme stronger.
This completes the proof of Theorem 3.

6.3. Privacy Preserving

Theorem 4. According to Definition 2, an adversary is completely zero knowledge if, for any polynomial time algorithm , .

Proof. To prove Theorem 4, we construct a simulator to interact with an adversary following the steps below.
(1) selects two different files and , for , which satisfy .
(2) The simulator generates and for the files and , respectively. Then randomly selects and sends to .
(3) Upon receiving , randomly generates a challenge and sends to request proof of
(4) The simulator calculates and sends response proof to .
(5) The identifier in is that , , and , and it satisfies

From the perspective of , can only see . Since cannot know the value of (unless can solve the discrete logarithm (DL) problem) and is randomly and uniformly distributed, for the adversary , and also are uniformly distributed.
Since is public, and have the same probability distribution. The advantage of the adversary is as follows:

The above equation shows that the protocol is strictly zero knowledge to the adversary , and thus the adversary cannot get any information through the auditing process.
This completes the proof.
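The core of the argument, that a value masked with a uniform secret is itself uniformly distributed regardless of what it hides, can be checked exhaustively on a small example. The sketch below assumes additive masking of the form $\mu + r \cdot h \bmod p$ with a nonzero public $h$ and a uniform secret $r$; this form and the tiny prime are illustrative assumptions, not the exact expressions of the EoCo proof.

```python
from collections import Counter


def masked_distribution(mu: int, h: int, p: int) -> Counter:
    """Count the outputs of mu + r*h mod p over every possible mask r."""
    return Counter((mu + r * h) % p for r in range(p))


p = 101  # tiny prime so the whole distribution can be enumerated
dist_a = masked_distribution(mu=7, h=13, p=p)
dist_b = masked_distribution(mu=55, h=13, p=p)
same_view = dist_a == dist_b
```

Because $r \mapsto \mu + r h \bmod p$ is a bijection for any nonzero $h$, each output occurs exactly once for every $\mu$, so an observer of the masked value alone cannot tell which of the two underlying values was used.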

6.4. Batch Auditing

Sometimes, a TPA audits multiple files on behalf of different users. Because validating them one by one is tedious and inefficient, it is desirable for the TPA to audit the integrity of multiple cloud users’ data in parallel. Suppose that cloud users delegate auditing to the same TPA. We slightly modify our scheme by using the BLS signature to aggregate different signatures and thus verify all users’ files simultaneously. The purpose of batch auditing is to improve the audit efficiency of the TPA. Therefore, in the EoCo scheme, we only need to make minor modifications to the CheckProof algorithm.
(i) CheckProof(). For , the algorithm first verifies whether the file tag is correct. The algorithm then outputs if it is wrong; otherwise, the algorithm extracts and continues. Next, for , the algorithm calculates and and computes the aggregate authentication identifier . Finally, the algorithm verifies the following equation:

We validate the correctness of batch audition scheme below and show that the honest CSPs will pass the audition successfully if they follow the protocol.
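As a hedged illustration of why aggregation reduces the auditor's work, the generic BLS aggregation identity for $K$ signers is shown below. It is stated in the standard notation of [16] ($v_k = g_2^{x_k}$, $\sigma_k = H(M_k)^{x_k}$), which is an assumption, and it is not necessarily the exact batch verification equation of EoCo.

```latex
% Generic BLS aggregation over K signers (assumed standard notation):
% each signer k holds x_k, publishes v_k = g_2^{x_k},
% and signs M_k as \sigma_k = H(M_k)^{x_k}.
\begin{align*}
  e\!\left(\prod_{k=1}^{K} \sigma_k,\; g_2\right)
    &= \prod_{k=1}^{K} e(\sigma_k, g_2)
     = \prod_{k=1}^{K} e\!\left(H(M_k)^{x_k}, g_2\right) \\
    &= \prod_{k=1}^{K} e\!\left(H(M_k), g_2^{x_k}\right)
     = \prod_{k=1}^{K} e\!\left(H(M_k), v_k\right)
\end{align*}
```

Checking the aggregate this way takes $K + 1$ pairing evaluations instead of the $2K$ needed to verify each signature separately.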

7. Evaluation

In this section, we evaluate the performance of our proposed scheme in the following three aspects:
(1) Computational complexity: the cost of the setup phase, the cost for S of producing the response proof, and the cost for an auditor of verifying the response proof.
(2) Block complexity: the number of file blocks that S needs to access according to the challenge.
(3) Communication complexity: the data size and bandwidth of communication between an auditor and a CSP.

Suppose that EoCo chooses file blocks for auditing and the security parameter is . The large prime number should be according to the analysis by Shacham et al. [5]. If the security level is 80 bits, then bits. This determines the block size required to achieve the desired security level. We compare EoCo with the schemes in [5, 8, 9] in terms of computation complexity and communication complexity in this section.

7.1. Communication Complexity

We can see from Table 3 that the number of bits required by SW-[5] is bits. The number of bits required by Wang-[8] is bits. The number of bits required by Worku-[9] is bits. The number of bits required by EoCo is bits. From this, we can see that EoCo has the smallest communication overhead, which can greatly reduce the I/O burden of cloud providers and increase bandwidth utilization.

7.2. Computation Complexity

To calculate the computation cost for both the auditor and the cloud service provider, we detail the basic operations and their symbols in Table 4. Table 5 compares several schemes in three aspects.
(i) Server computation overhead: for cloud service providers, the overhead of Worku-[9] is clearly lower than that of Wang-[8]. Compared with Worku-[9], EoCo adds one exponentiation but removes one hash operation. Generally speaking, one hash is more time-consuming than the exponential operation, so EoCo is more efficient on the server side.
(ii) Auditor computation overhead: for the auditor, Wang-[8] and Worku-[9] differ only slightly in efficiency; the latter uses one fewer exponentiation than the former. Although our scheme requires one more bilinear mapping than Worku-[9], it requires one fewer hash operation, three fewer exponentiations, and two fewer multiplications. In this respect, EoCo offers a considerable improvement.
(iii) Setup phase overhead: because the schemes are all based on BLS short signatures, their computational overhead in the setup phase is equal.

According to the theoretical analysis above, our scheme ensures the desired security properties with relatively low performance overhead.

8. Conclusion

In this paper, we proposed EoCo, a secure and effective cloud data integrity verification scheme with privacy protection, which is based on the BLS short signature. We presented a more practical data integrity protection model and introduced a ZK-privacy model to ensure the privacy preserving property in public auditing. We theoretically analyzed our approach and proved the security of EoCo based on the CDH problem. We also demonstrated that our scheme ensures zero knowledge leakage through indistinguishable views of the attacker. We evaluated the performance of EoCo and compared it with related solutions. The analysis results show that our approach has relatively small communication and computational complexity.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Key R&D Program of China (no. 2015BAG15B01) and the National Natural Science Foundation of China (nos. U17733115, 61402029, 61379002, and 61370190).