Abstract

Remote data auditing enables mobile clients to verify the integrity of their outsourced data stored at the cloud side. To relieve mobile clients of the nonnegligible burden incurred by frequent data auditing, a growing body of literature proposes migrating the execution of such auditing from the mobile client to a third-party auditor (TPA). However, existing public auditing schemes assume that the TPA is reliable, which poses a potential risk to outsourced data security. Although Outsourced Proofs of Retrievability (OPOR) were proposed to further protect against a malicious TPA and collusion between any two entities, the original OPOR scheme applies only to static data, a limitation that must be overcome to enable data dynamics. In this paper, we design a novel authenticated data structure called bv23Tree, which enables the client to batch-verify the indices and values of any number of appointed leaves all at once for efficiency. By combining the bv23Tree with a hierarchical storage structure, we present the first solution for Dynamic OPOR (DOPOR), which extends the OPOR model to support dynamic updates of the outsourced data. Extensive security and performance analyses show the reliability and effectiveness of our proposed scheme.

1. Introduction

In today’s information era of data explosion, ever-increasing big data storage demands are an inevitable trend for most people. Storage outsourcing through the cloud has become a promising technology paradigm that populates the recent literature [1–3] and has been regarded as a profit growth point [4] by various IT industry giants (e.g., Google Drive, Microsoft OneDrive, and Amazon EC2 and S3). Cloud storage not only allows mobile clients to access their outsourced data from anywhere at any time, but also provides them with many benefits, such as inexpensive storage cost and elastic configuration of storage capacity, which attract more and more mobile clients (e.g., smartphones and laptops) to join the cloud for a convenient lifestyle.

However, at the side of the cloud storage server (CSS), there still exist all kinds of internal and external threats against the storage security of outsourced data, such as Byzantine failures, monetary incentives, and hacker attacks [5, 6]. It is thus well known that CSS should be considered a potentially malicious entity that might try to hide an accident when data loss occurs, or even deliberately delete the client's data to save storage cost. In this case, for a mobile client who no longer physically possesses her data after storage outsourcing, an urgent requirement is how to guarantee the correctness and retrievability of the outsourced data. Here, the correctness guarantee means that any appointed data returned from CSS should be the latest version of the authentic data, and the retrievability guarantee means that the whole outsourced data can be correctly retrieved by the client without any data loss.

Based on the application of erasure codes and periodic auditing against CSS, the security model Proof of Retrievability (POR) [7, 8] is defined to offer clients the above two guarantees in the context of a malicious CSS. However, given that most mobile clients have only limited capacity and are thus unlikely to stay online all the time to perform frequent auditing, various public auditing schemes [5, 6, 9–11] have been proposed for auditing migration, which enables a mobile client to free herself by moving the heavy auditing tasks to a third-party auditor (TPA). But existing public schemes rely on the hypothesis that TPA is trusted to complete the migrated auditing tasks, meaning that these schemes provide no security guarantee against a malicious TPA that might break the auditing protocols, which is exactly the potential risk that has not been covered by current public auditing schemes [12].

The first Outsourced Proof of Retrievability (OPOR) solution, proposed by Armknecht et al. and called Fortress [12], is a stronger security model that protects against any malicious entity (e.g., a malicious TPA) and against collusion between any two entities. Fortress enables auditing migration, but it is just a static scheme, whereas supporting dynamic updates is an essential requirement for numerous practical cloud storage applications [13]. Although all kinds of authenticated data structures [6, 9, 13–16] have been proposed to support data dynamics, there still exist research gaps in these structures. For one thing, most existing dynamic authenticated data structures [9, 14–16] are designed based on the Merkle Hash Tree (MHT). Unfortunately, the use of MHT is not efficient in some cases, since MHT is an unbalanced tree. For example, the height of MHT increases linearly when many new data blocks are continuously inserted at the same leaf position, so in this worst case the expected performance of authenticating the index and value of an appointed leaf node via MHT cannot be ensured. For another, although some other authenticated structures such as the rb23Tree [6] and the Skip List [13] have been proposed for dynamism, when there is a need to verify multiple leaf nodes of different data blocks (i.e., to authenticate the indices and values of these leaf nodes), all the above-mentioned dynamic methods merely adopt the straightforward way of verifying these different leaf nodes one by one with their respective proof paths. Since the verification upon a single proof path incurs O(log n) bandwidth and computation costs, as the number of verified leaf nodes increases, separately verifying that many leaf nodes one by one is clearly inefficient. A balanced authenticated data structure (e.g., the rb23Tree [6]) built upon a balanced tree can avoid the worst case of MHT discussed above; however, to the best of our knowledge, there is no balanced authenticated structure that can efficiently batch-verify any number of appointed leaf nodes altogether, a limitation that we address in this paper.

Furthermore, to support data dynamics when applying erasure codes, the scheme of [6] relies on local coding, that is, encoding each raw data block individually to ensure that an update upon any raw block only affects a small number of encoded blocks. However, such a local coding solution is vulnerable to the selective deletion attack from malicious CSS [14]: once a targeted raw block has been updated under local coding, CSS learns which congenetic encoded blocks correspond to this targeted block. CSS can then selectively delete these congenetic encoded blocks to actually cause the loss of the targeted block, and simultaneously pass the data auditing with significant probability, since the auditing relies on sampling, which can hardly cover the selectively deleted encoded blocks if the amount of deleted data is tiny. Although both private and public Dynamic POR (DPOR) schemes [14] have been proposed to resist this selective deletion attack, directly applying these two DPOR schemes to the OPOR model results in security or efficiency problems. On the one hand, within the private DPOR of [14], the auditing can only be executed using the client's secret key. But TPA is prohibited from obtaining such a secret in OPOR, or otherwise a malicious TPA might share the client's secret key with malicious CSS [12] so that CSS can break the auditing protocols without actually holding the outsourced data. On the other hand, in order to support public auditing, the public DPOR of [14] cannot apply the blockless verification technique [8, 9, 13, 17] that combines multiple challenged blocks into a single aggregated block for efficiency, so it has to use the straightforward way of requiring TPA to retrieve all randomly challenged actual blocks during each POR audit. As shown in [9], this straightforward way leads to a large communication overhead and is thus inefficient and should be avoided.

From the above, various problems arise if current dynamic schemes are directly ported to the OPOR model. In this paper, to solve these problems, we propose a concrete Dynamic Outsourced Proofs of Retrievability (DOPOR) scheme enabling auditing migration. DOPOR not only defends against a malicious TPA and collusion, but also enables efficient data dynamics under the setting of erasure codes. Specifically, our contributions are summarized as follows.

Different from traditional authenticated structures that can only verify different leaves one by one, we propose a novel authenticated data structure called bv23Tree, which is built on the balanced 2-3 tree to ensure logarithmic complexity in any case of updates and simultaneously enables the verifier to batch-verify the indices and values of multiple appointed leaves all at once for efficiency.

To defend against the selective deletion attack, we utilize a hierarchical storage structure with same-sized levels for the unified management of the outsourced encoded data and the encoded update operations. Based on this hierarchical structure and the bv23Tree, we resolve the open questions of [12] by transforming another secure public POR into the OPOR model and by designing an appropriate dynamic scheme to efficiently integrate dynamic updates with OPOR.

We analyze the security of our solution and conduct an extensive experimental study. The experimental results demonstrate the effectiveness of our scheme.

The rest of this paper is organized as follows: Section 2 states the background and introduces the architecture and system model of DOPOR. Section 3 presents the novel dynamic structure bv23Tree in detail, based on which we present the detailed DOPOR solution in Section 4. Section 5 provides the security analysis, and Section 6 evaluates the experimental performance. Section 7 overviews the related work. Finally, this paper is concluded in Section 8.

2. Background and System Architecture

2.1. Problem Statement

We begin with the background of OPOR, as shown in [12]. There are three entities involved in an outsourced auditing environment: the mobile client (i.e., the data owner), the cloud storage server (CSS), and the third-party auditor (TPA). Because of their limited local storage capacity, mobile clients are motivated to outsource their large data files to CSS and can then make use of various on-demand cloud storage services. However, because CSS might misbehave, it is very important to design a periodic remote data auditing mechanism against CSS, which gives the mobile client the assurance that her outsourced data is always available and can be completely retrieved from CSS if necessary. Further, to liberate the mobile client from the endless online data auditing, OPOR also introduces a TPA that has the expertise and capability to perform the above frequent auditing tasks on behalf of the mobile client, so that the mobile client can remain offline most of the time.

Although the OPOR model involves the same three entities as the existing public auditing model [9, 10, 15, 16], one of the main differences between the two is that TPA might also be malicious within OPOR [12]. In other words, TPA might violate the auditing protocols, for example, by claiming that he has honestly performed all past audits when actually he has not. Furthermore, any two entities might collude within OPOR [12]. For example, first, a malicious CSS might collude with a malicious TPA to deceive the honest client when the outsourced data has been lost. Second, a malicious client might collude with a malicious CSS to frame the honest TPA, by asserting that TPA did not correctly perform the required auditing work so as to drag TPA into a compensation lawsuit. Since any entity might be malicious, in order to solve the problem of securely sampling the periodic challenges for frequent auditing, the time-based bitcoin pseudorandom source is introduced into the OPOR model. As demonstrated in [12], due to the fact that the bitcoin pseudorandom source cannot be manipulated by any entity, it is secure to use it for generating the continuous random seeds for periodic challenges.

OPOR inherits the retrievability guarantee of POR [8] by applying erasure codes, meaning that the client can retrieve the whole outsourced data in case of minor data corruption. However, when both data dynamics and erasure coding are considered, efficiently performing updates becomes intractable. Within the first OPOR scheme, Fortress [12], the client's original file (including all raw data blocks) is entirely encoded before outsourcing, so an update upon any single raw data block affects the whole outsourced encoded data. In this case, the only way for the client is to download and decode the whole outsourced encoded data and then re-encode and re-upload all the data after performing the updates, which means unbearable bandwidth and computation costs. This is why Fortress is just a static scheme that cannot support efficient dynamic updates.

2.2. Dynamic OPOR (DOPOR) Architecture

The representative DOPOR architecture is presented in Figure 1. Based on the bitcoin source that controls the random sampling of periodic challenges, once TPA accepts the migrated auditing tasks from the client, TPA must generate the corresponding logs after completing each specified POR audit against CSS. In this case, the client is able to check TPA's work at any point in time by verifying TPA's logs, and she can then judge whether TPA did his auditing work correctly in the past. As shown in [12], such checking by the client against TPA can be much less frequent than TPA's POR audits against CSS, since the client can batch-check a number of accumulated TPA logs all at once.

To support dynamic updates, within our solution we apply an idea similar to that of [14], which is to place all accumulated update operations into an erasure-coded buffer at the CSS side, rather than immediately executing these update operations upon the outsourced encoded data. As shown in Figure 1, the CSS-side storage is organized in two different buffers denoted by U (i.e., the unencoded buffer) and E (i.e., the erasure-coded buffer). Buffer U independently stores an up-to-date copy of all the raw data blocks, organized by our proposed bv23Tree, to support efficient batch reads from cloud storage without struggling with the erasure code. On the other side, buffer E is further divided into two parts, ED and EO, which store the whole outsourced original encoded data blocks and all the accumulated encoded update operations, respectively. In case of data loss, the client can recover the up-to-date copy of the whole outsourced raw data by decoding the entire buffer E and combining both ED and EO, so the periodic POR audits only need to be performed upon buffer E for the retrievability guarantee.

As will be described in Section 4.1, ED and EO together constitute a complete hierarchical storage structure in which all levels have the same size, different from the levels of exponentially growing capacity in [14]. After the client performs a batch of update operations upon buffer U, benefitting from the same-sized levels of DOPOR, this batch of update operations can be wholly encoded and then directly placed into EO to fill up the corresponding level, improving the computation cost and avoiding the level rebuilding of [14], which incurs an amortized rebuilding cost for each update operation. More importantly, based on the same-sized levels, our DOPOR solution can build upon the public verification POR scheme of [8], whose aggregation technique provides the support for the client's checking against a malicious TPA.

Finally, after every n update operations, when the size of EO grows to the same size as ED, buffer E will be rebuilt. The rebuilding of E rewrites the whole ED with the encoded version of all the up-to-date raw data blocks and meanwhile empties the whole EO. Since E is only rebuilt once in every n update operations, the amortized complexity of rebuilding E is O(1) blocks per update operation. Moreover, different from the existing schemes [14, 18, 19], which require client-side temporary memory proportional to the data size for such rebuilding, as shown in Section 4.3, based on the ability of batch reads and the same-sized levels, the client of DOPOR only requires O(λ) client-side memory to gradually rebuild E, where λ is the security parameter that is independent of the data size n. So, DOPOR further reduces the required client memory when rebuilding and is thus suitable for client-side mobile devices.

2.3. System Model

Formally, the complete DOPOR system can be described by the following ten protocols:

(i) KeyGen(1^λ): on input the security parameter λ, this protocol outputs a public-private key pair (pk_e, sk_e) for each entity e.

(ii) Encode(sk_C, F): on input the client's secret key sk_C and the original file F, an ordered set of n raw data blocks {f_1, ..., f_n}, this protocol encodes F into the encoded file F̃ and outputs F̃ as an ordered set of n' codeword blocks {c_1, ..., c_{n'}}. It also outputs the tag set Φ = {σ_i}, where each σ_i is computed based on c_i and sk_C.

(iii) Outsource(F, F̃, Φ): on input the original file F, the encoded file F̃, and the tags Φ, it outputs the bv23Tree that is constructed based on F. After outsourcing all the input data and the bv23Tree to CSS, this protocol also outputs the public parameter set pub for POR audits.

(iv) Read(φ_R, I, S_css): on input the root hash φ_R of the bv23Tree, the set I of any ρ block indices, and the CSS state S_css, this protocol outputs the appointed data block set B, or ⊥ otherwise.

(v) Write(φ_R, O, sk_C, S_css, ptr): on input the tree root hash φ_R, the set O of update operations, the client's secret key sk_C, the CSS state S_css, and the state pointer ptr that is related to S_css, it outputs a new root hash φ'_R, a new CSS state S'_css, and a new pointer ptr' showing that all the operations in O are correctly executed in a batch, or ⊥ otherwise.

(vi) Rebuild(φ_R, sk_C, S_css, ptr): on input the tree root hash φ_R, the client's secret key sk_C, the CSS state S_css, and the state pointer ptr, it outputs a new CSS state S'_css and a new pointer ptr' showing that the rebuilding is completed, or ⊥ otherwise.

(vii) Challenge(B_src, t, pub): on input the bitcoin source B_src, the time t, and the public parameters pub, it outputs a bitcoin-based challenge Q_t for a POR audit.

(viii) Prove(Q_t, S_css): on input the challenge Q_t and the CSS state S_css, it outputs CSS's proof P_t, enabling TPA to perform a POR audit.

(ix) Audit(pub, Q_t, P_t): on input the public parameter set pub, the challenge Q_t, and CSS's proof P_t, it outputs TPA's log L_t and a decision d. d is true if the TPA audit passes, or false otherwise.

(x) CheckLog(B_src, T_set, L_set, pub, sk_C): on input the bitcoin source B_src, a point-in-time set T_set, the corresponding set L_set of TPA's logs, the public parameters pub, and the client's secret key sk_C, this protocol outputs the client's decision d_C. d_C is true if the client's batch-checking upon L_set passes, or false otherwise.
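To make the interplay of these protocols concrete, the following minimal Python sketch mirrors them as one interface. All method names and signatures here are illustrative reconstructions of the ten protocols above, not a normative API.

from abc import ABC, abstractmethod
from typing import Any, List, Tuple

class DOPOR(ABC):
    """Illustrative interface for the ten DOPOR protocols (signatures assumed)."""

    @abstractmethod
    def keygen(self, security_param: int) -> dict: ...                  # (i)
    @abstractmethod
    def encode(self, sk_client: Any, blocks: List[bytes]) -> Tuple[list, list]: ...  # (ii) codewords, tags
    @abstractmethod
    def outsource(self, blocks, codewords, tags) -> Tuple[Any, dict]: ...  # (iii) tree root, public params
    @abstractmethod
    def read(self, root_hash, indices: List[int], css_state) -> List[bytes]: ...  # (iv)
    @abstractmethod
    def write(self, root_hash, ops, sk_client, css_state, ptr): ...     # (v)
    @abstractmethod
    def rebuild(self, root_hash, sk_client, css_state, ptr): ...        # (vi)
    @abstractmethod
    def challenge(self, bitcoin_src, time_t, pub) -> list: ...          # (vii)
    @abstractmethod
    def prove(self, chal, css_state) -> dict: ...                       # (viii)
    @abstractmethod
    def audit(self, pub, chal, proof) -> Tuple[dict, bool]: ...         # (ix) log, decision
    @abstractmethod
    def check_log(self, bitcoin_src, times, logs, pub, sk_client) -> bool: ...  # (x)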

3. Balanced Authenticated Data Structure

Within the rb23Tree of [6], for verifying a single leaf, the client must retrieve from the adversary a corresponding proof path that consists of O(log n) marks. However, the limitation of the rb23Tree method is that a lot of duplicated information exists in the proof paths of different leaves, which wastes considerable communication cost when the client verifies many leaves one after another by retrieving different proof paths. To solve this problem, we propose the batch-verifications 2-3 Tree, called bv23Tree, for efficiency.

3.1. Batch-Verifications 2-3 Tree

Following the definition of the 2-3 tree [20], each nonleaf node of a bv23Tree has two or three children. Let F be an original data file consisting of n raw blocks {f_1, ..., f_n}. With a collision-resistant hash function h(·), the bv23Tree on file F can be constructed by storing at each tree node w a 3-element tuple (s_w, r_w, φ_w), defined as follows:

(i) s_w is the status of node w. Let par(w) denote the parent node of w, and let x_1, x_2, x_3 be the three left-to-right children of par(w), respectively. Specifically, if par(w) only has two children, then x_3 = ⊥. So, s_w is defined as

s_w = k if w = x_k (k ∈ {1, 2, 3}), and s_w = 0 if w is the root.

(ii) r_w is the rank value of node w, which is similar to the concept defined in the existing rank-based MHT scheme [16]. Namely, r_w stores the number of leaf nodes that belong to the subtree with node w as the root. If w is a leaf node, we define r_w = 1.

(iii) φ_w represents the authentication hash value of node w. The value of φ_w is defined by the following cases.

Case 0. w = ⊥; then φ_w is the empty string.

Case 1. w is the i-th leaf node; then φ_w = h(s_w ‖ r_w ‖ h(f_i)).

Case 2. w is a nonleaf node with children x_1, x_2, and x_3 as above (sometimes x_3 will be ⊥); then φ_w = h(s_w ‖ r_w ‖ φ_{x_1} ‖ φ_{x_2} ‖ φ_{x_3}), where ‖ denotes the concatenation operation.

We show an example of a bv23Tree in Figure 2, which is constructed on 16 file blocks {f_1, ..., f_16}. Next, we use a concise shorthand for some symbols: given a bv23Tree node w, we write its status, rank, and hash values simply as s_w, r_w, and φ_w, so each tree node and its corresponding 3-element tuple are not distinguished. We also use h_i to denote the hash value h(f_i) for convenience.
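As an illustrative sketch of this construction, the following Python code computes the 3-element tuples bottom-up for a small tree, using SHA-256 for h; the byte encoding of statuses and ranks, and the status value 0 for the root, follow the reconstruction above and are our assumptions.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class Node:
    def __init__(self, children=None, block=None, status=0):
        self.children = children or []   # 2 or 3 children for nonleaf nodes
        self.block = block               # raw data block for leaves
        self.status = status             # 0 for the root, else 1/2/3 (left to right)
        if self.children:
            self.rank = sum(c.rank for c in self.children)  # leaves below this node
            kids = [c.hash for c in self.children] + [b""] * (3 - len(self.children))
            payload = b"".join(kids)     # a missing third child contributes "" (Case 0)
        else:
            self.rank = 1
            payload = h(self.block)      # Case 1: leaf hashes h(f_i)
        self.hash = h(str(self.status).encode() + str(self.rank).encode() + payload)

# tiny example: 4 blocks under two 2-child internal nodes and a root
leaves = [Node(block=bytes([i]), status=(i % 2) + 1) for i in range(4)]
left = Node(children=leaves[:2], status=1)
right = Node(children=leaves[2:], status=2)
root = Node(children=[left, right], status=0)
print(root.rank, root.hash.hex()[:16])   # rank 4, plus the authentication root hash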

3.2. Batch Queries

The integrity (i.e., authenticity and freshness) of file blocks can be protected by the corresponding hash values stored in the leaves, while the integrity of the leaves themselves is protected by the bv23Tree. Now, suppose that all the file blocks and a bv23Tree built on them have been stored at CSS. The client wants to verify the integrity of any ρ ordered file blocks read from CSS and thus batch-queries CSS by issuing the ordered indices set I = {i_1, ..., i_ρ} that appoints ρ ordered leaves. Then, CSS calls Algorithm 1 to generate the corresponding proof table P and responds to the client with the appointed leaves and their proof table.

Proof_Table(T, I). With the bv23Tree T of L levels, and the ordered indices set I = {i_1, ..., i_ρ} that appoints ρ
ordered leaves, this algorithm generates the corresponding proof table P of ρ rows and L − 1 columns.
(1) initialize the node array A[1..ρ] by A[k] ← the i_k-th leaf of T, 1 ≤ k ≤ ρ;
(2) // par(w) tracks the parent of the current node w
(3) v ← 1; // v tracks the column of the proof table
(4) while v ≤ L − 1 do
(5)   for k = 1 to ρ do
(6)     if A[k] = ⊥ then // row k was merged into an earlier row
(7)       P[k][v] ← ⊥;
(8)     else
(9)       // let w denote the current tree node stored in A[k]
(10)      if node w only has one sibling node, which is denoted by y, then
(11)        if y exists in A, in this case y = A[j] for the nearest active row j > k, then
(12)          P[k][v] ← ↑j; A[j] ← ⊥;
(13)        else // y does not exist in the current array A
(14)          P[k][v] ← y;
(15)        end if
(16)      else // w has two sibling nodes, which are denoted by y_1 and y_2
(17)        // the sequence of w, y_1, y_2 must follow the left-to-right principle as in Figure 2; e.g., the two siblings of
              node x_2 must be denoted by x_1 and x_3
(18)        if only y_1 (or y_2) exists in A, say at the nearest active row j > k, then
(19)          P[k][v] ← (↑j, y_2) (or (↑j, y_1));
(20)          A[j] ← ⊥;
(21)        else if both y_1 and y_2 exist in A, at the nearest active rows j_1 and j_2 (k < j_1 < j_2), respectively, then
(22)          P[k][v] ← (↑j_1, ↑j_2); A[j_1] ← ⊥; A[j_2] ← ⊥;
(23)        else // both y_1 and y_2 do not exist in the current array A
(24)          P[k][v] ← (y_1, y_2);
(25)        end if
(26)      end if
(27)      A[k] ← par(w); // move to the parent node of the current node
(28)    end if
(29)  end for
(30)  v ← v + 1;
(31) end while
(32) return the proof table P;

An example of a proof table is shown in Table 1. Without loss of generality, suppose the number of levels of a bv23Tree is L. Due to the property of a balanced tree, L must be of O(log n) complexity. Let P denote the proof table of any ρ appointed leaves; then P has the following characteristics: (i) P contains ρ rows and L − 1 columns, respectively (i.e., P consists of ρ(L − 1) items). (ii) Each item can have one or two components, and each component can be a tree node, a mark ↑j, or ⊥. Since ↑j only denotes a pointer that points to the j-th row of the table itself, the communication cost of a mark is little when compared to the cost of a node. (iii) The more leaves are batch-verified, the more marks exist in P. And no matter how many leaves are batch-verified, each necessary auxiliary tree node only appears once in P.

In the context of batch verifications upon ρ appointed leaves, compared to the proof path method of [6], the communication cost of the proof table is much less than that of transferring ρ different proof paths, respectively. This is because the proof table avoids the limitation of proof paths that repetitive node information (e.g., node hash values) appears in the proof paths of different leaves with high probability, so there are plenty of marks in the proof table to save communication cost. Furthermore, compared to the 8-element tuple marks in the proof paths of [6], each tree node included in the proof table only carries a 3-element tuple, which further reduces the communication cost.
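The saving can be quantified with a toy example. The following Python sketch counts the auxiliary nodes needed to verify several leaves together versus one by one, using a small binary Merkle tree as a stand-in; the bv23Tree applies the same deduplication idea to 2-3 nodes.

import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

# a full binary Merkle tree over 8 toy blocks, kept small for illustration
leaves = [h(bytes([i])) for i in range(8)]
levels = [leaves]
while len(levels[-1]) > 1:
    prev = levels[-1]
    levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])

def aux_nodes(indices):
    """Return the set of (level, position) auxiliary nodes needed to verify
    the given leaf indices together, deduplicating shared ancestors."""
    needed, frontier = set(), set(indices)
    for lvl in range(len(levels) - 1):
        next_frontier = set()
        for pos in frontier:
            sib = pos ^ 1
            if sib not in frontier:          # a sibling inside the batch costs nothing
                needed.add((lvl, sib))
            next_frontier.add(pos // 2)
        frontier = next_frontier
    return needed

batch = aux_nodes({1, 2, 3})
individual = sum(len(aux_nodes({i})) for i in (1, 2, 3))
print(len(batch), "nodes for the batch vs", individual, "for separate paths")  # 2 vs 9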

3.3. Batch Verifications

Upon receiving from CSS the appointed leaves set Λ and the required proof table P, the client can run Algorithm 2 to batch-verify the indices and values of these leaves in Λ all at once, by using her local metadata φ_R, the root hash of the bv23Tree. An example of batch verifications upon multiple leaves is shown in Table 2.

Batch_Verify(Λ, P, I, φ_R). This algorithm batch-verifies not only the
hash values of all ρ ordered leaves Λ = {λ_1, ..., λ_ρ} provided by CSS, but also that the
indices of these leaves exactly match the appointed indices set I. Each active row k maintains a
triple (e_k, r_k, Ī_k): the hash contribution of the node the row currently covers, its rank, and the
ordered set of the relative positions of the appointed leaves inside that node's subtree. After every
combination, e_k holds the pending payload r ‖ φ_{x_1} ‖ φ_{x_2} ‖ φ_{x_3}, which is finalized as
e_k ← h(s ‖ e_k) once the status s of the covered node can be inferred from its siblings at the next column.
(1) for each leaf λ_k, 1 ≤ k ≤ ρ, do
(2)   initialize e_k ← φ_{λ_k}; r_k ← 1; Ī_k ← {1};
(3) end for
(4) initialize the active row set R ← {1, 2, ..., ρ};
(5) for v = 1 to L − 1 do
(6)   for each row k ∈ R, in increasing order, do
(7)     if item P[k][v] only has one component c then
(8)       call Deal_One_Component(k, c);
(9)     else // P[k][v] = (c_1, c_2)
(10)      call Deal_Two_Components(k, c_1, c_2);
(11)    end if
(12)  end for
(13) end for
(14) finalize e_1 ← h(0 ‖ e_1); // the root has status 0
(15) if e_1 = φ_R and Ī_1 = I then
(16)   return accept;
(17) else
(18)   return reject;
(19) end if
Function Deal_One_Component(k, c)
(1) if c is ⊥ then
(2)   return; // row k keeps its state at this column
(3) else if c is a node y then // the only sibling is not appointed
(4)   finalize e_k ← h((3 − s_y) ‖ e_k) if pending; // a 2-child parent: the current node takes the status y does not occupy
(5)   if s_y = 1 then // y lies to the left of the current node
(6)     e_k ← (r_k + r_y) ‖ φ_y ‖ e_k; Ī_k ← Ī_k + r_y;
(7)   else // y lies to the right of the current node
(8)     e_k ← (r_k + r_y) ‖ e_k ‖ φ_y;
(9)   end if
(10)  r_k ← r_k + r_y;
(11) else // c is a mark ↑j: the only sibling is covered by row j, which lies to the right of row k
(12)  finalize e_k ← h(1 ‖ e_k) and e_j ← h(2 ‖ e_j) if pending;
(13)  e_k ← (r_k + r_j) ‖ e_k ‖ e_j;
(14)  Ī_k ← merge(Ī_k, Ī_j + r_k);
(15)  r_k ← r_k + r_j;
(16)  delete row j from the active set R;
(17) end if
Function Deal_Two_Components(k, c_1, c_2)
(1) if c_1 and c_2 are two nodes y_1 and y_2, respectively, then
(2)   finalize e_k ← h((6 − s_{y_1} − s_{y_2}) ‖ e_k) if pending; // a 3-child parent: the current node takes the remaining status
(3)   arrange φ_{y_1}, φ_{y_2}, e_k in the left-to-right order given by the three statuses, obtaining the concatenation E;
(4)   Ī_k ← Ī_k + (the sum of the ranks of the siblings lying to the left of the current node);
(5)   r_k ← r_k + r_{y_1} + r_{y_2}; e_k ← r_k ‖ E;
(6) else if c_1 is a node y and c_2 is a mark ↑j (or vice versa) then
(7)   // by the left-to-right principle, rows k and j take the two statuses that y does not occupy, with row k on the left of row j
(8)   finalize e_k and e_j with these inferred statuses if pending;
(9)   arrange φ_y, e_k, e_j in the left-to-right order given by the three statuses, obtaining E;
(10)  Ī_k ← merge(Ī_k + (ranks of the siblings left of row k), Ī_j + (ranks of the siblings left of row j));
(11)  r_k ← r_k + r_y + r_j; e_k ← r_k ‖ E;
(12)  delete row j from the active set R;
(13) else // c_1 and c_2 are two marks ↑j_1 and ↑j_2 (k < j_1 < j_2)
(14)  finalize e_k ← h(1 ‖ e_k), e_{j_1} ← h(2 ‖ e_{j_1}), and e_{j_2} ← h(3 ‖ e_{j_2}) if pending;
(15)  e_k ← (r_k + r_{j_1} + r_{j_2}) ‖ e_k ‖ e_{j_1} ‖ e_{j_2};
(16)  Ī_k ← merge(Ī_k, Ī_{j_1} + r_k, Ī_{j_2} + r_k + r_{j_1});
(17)  r_k ← r_k + r_{j_1} + r_{j_2};
(18)  delete rows j_1 and j_2 from the active set R;
(19) end if

Within Algorithm 2, the function merge(·) merges different ordered sets of numbers while preserving the order of these numbers. For example, given two sets {1, 3} and {2, 5}, merge({1, 3}, {2, 5}) = {1, 2, 3, 5}. In addition, the operator notation "+" adds a number to every element of a set; for example, {1, 3} + 2 = {3, 5}. Algorithm 2 applies each non-⊥ item of P to iteratively compute the triples (e_k, r_k, Ī_k). If the returned leaves set Λ and table P are right, we will get the following results after the outermost for-loop of Algorithm 2 is finished: (i) the value e_1, once finalized with the root status, is equal to φ_R, that is, the authentication hash value of the root of the bv23Tree; (ii) the set Ī_1 is exactly the same as I, that is, the indices set of the ρ appointed ordered leaves in Λ.
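A minimal Python sketch of these two helpers, assuming list-based ascending index sets:

def merge(*sets):
    # merge ordered index sets while preserving ascending order
    return sorted(x for s in sets for x in s)

def shift(s, d):
    # the "+" operator of Algorithm 2: add d to every element of the set
    return [x + d for x in s]

print(merge([1, 3], [2, 5]))  # [1, 2, 3, 5]
print(shift([1, 3], 2))       # [3, 5]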

At a high level, Algorithm 2 also gradually constructs the partial bv23Tree that precisely covers all appointed leaves in Λ, the paths from these appointed leaves to the root, and all the siblings of the nodes on these paths. Based on this partial bv23Tree, the batch updates upon outsourced original raw blocks can be supported, as shown in Algorithms 3 and 4.

Batch_Updates(O, T, F). Input parameters are a batch O = {o_1, ..., o_b} of update operations, the bv23Tree T,
and the whole set F of outsourced original blocks. This algorithm outputs an ordered
leaves set Λ, the proof table P, and the updated root hash value φ'_R of the final state bv23Tree T'.
(1) extract from O the largest ordered targeted indices set I, by removing the duplicate indices;
(2) read the leaves set Λ appointed by I from T;
(3) obtain P ← Proof_Table(T, I);
(4) update the file blocks set F according to the sequential executions of all update operations in O;
(5) perform each of the update operations in O in sequence on T and then obtain the final state T'; more specifically,
transform T in terms of each operation, and update the status s, rank r, and hash value φ of the affected tree nodes
during each transformation; (for a modification operation, which causes no transformation, only the
hash values of the nodes on the path from the targeted leaf to the root need to be updated)
(6) return (Λ, P, φ'_R), where φ'_R is the authentication hash value of the root of the final state T';
Verify_Updates(φ_R, O, Λ, P, φ'_R). Input parameter φ_R is the client's local metadata, the
update operations set O is generated by the client herself, and Λ, P, φ'_R are provided
by CSS as computed in Algorithm 3. This algorithm outputs true if the batch updates are successful, or
false otherwise.
(1) extract I from O as in Algorithm 3;
(2) if Batch_Verify(Λ, P, I, φ_R) = accept then
(3)   construct the partial bv23Tree with φ_R as the root hash;
(4) else
(5)   return false;
(6) end if
(7) perform each update operation of O upon the above partial bv23Tree by the same transformations as
in Algorithm 3, and then compute the final state root hash φ''_R;
(8) if φ''_R = φ'_R then
(9)   replace the local φ_R with φ''_R, and return true;
(10) else
(11)   return false;
(12) end if
3.4. Batch Updates

Three basic types of dynamic update operations are modification (M), insertion (I), and deletion (D) [9]. Any block-level update operation can be defined by the form o = (T, i, f*), where T ∈ {M, I, D} denotes the operation type, i is the index of the targeted block, and f* is the new data block that will be stored according to the targeted index (f* = ⊥ for deletion). For example, (M, 2, f*) modifies the 2nd block to f*, (I, 3, f*) inserts f* after the 3rd block, and (D, 4, ⊥) deletes the 4th block.
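A minimal Python representation of these operations (the names are illustrative):

from collections import namedtuple

UpdateOp = namedtuple("UpdateOp", ["op_type", "index", "block"])  # the form (T, i, f*)

ops = [
    UpdateOp("M", 2, b"new-2nd-block"),   # modify the 2nd block
    UpdateOp("I", 3, b"inserted-block"),  # insert after the 3rd block
    UpdateOp("D", 4, None),               # delete the 4th block (f* is absent)
]
targets = sorted({op.index for op in ops})  # the deduplicated ordered index set I
print(targets)  # [2, 3, 4]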

In the setting of dynamism, the update operations should be performed not only on the data blocks but also on the bv23Tree. Note that only insertion and deletion can cause a structural transformation of the bv23Tree, and the maintenance of this transformation is essentially identical to the maintenance of a standard 2-3 tree [20], except that the updating of each affected tree node should be considered in terms of the 3-element tuple (s, r, φ). In Figure 3, we give an example of the structural transformation of a bv23Tree after repeatedly inserting (or deleting) appointed data blocks at the same index position.

Now, suppose that the client caches a batch of b ordered update operations O = {o_1, ..., o_b}, which refer to an ordered indices set I, |I| ≤ b, after the removal of duplicate indices. To batch-update the remote data, the client first issues I to CSS and obtains the returned ordered leaves set Λ along with the proof table P. As shown in Section 3.3, while executing Algorithm 2 to batch-verify Λ, the client can construct the corresponding partial bv23Tree. Then, after sequentially performing each operation of O upon the partial bv23Tree, the client can compute by herself what the authentication hash value of the final-state tree root should be. Finally, if CSS outputs the same final-state root hash value as the one computed by the client herself, the client outputs true, meaning that CSS correctly performed the batch updates according to O. Otherwise, the client outputs false. Algorithms 3 and 4 outline the Batch_Updates algorithm performed by CSS and the Verify_Updates algorithm performed by the client, respectively.
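The client-side acceptance logic can be summarized by the following sketch, reusing the UpdateOp representation above; batch_verify, build_partial_tree, and apply_ops are placeholders for the procedures just described.

def verify_updates(local_root, ops, leaves, proof_table, css_new_root,
                   batch_verify, build_partial_tree, apply_ops):
    """Client-side acceptance test for a batch update (schematic)."""
    targets = sorted({op.index for op in ops})          # the deduplicated ordered set I
    if not batch_verify(leaves, proof_table, targets, local_root):
        return False                  # forged leaves or proof table detected
    partial = build_partial_tree(leaves, proof_table)   # as built by Algorithm 2
    expected_root = apply_ops(partial, ops)             # replay the updates locally
    return expected_root == css_new_root                # accept iff CSS's root matches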

4. DOPOR Solution

4.1. Cloud Server Storage Configuration

As shown in Figure 1, there are two different buffers, U and E, at the CSS side for outsourced data storage. The client's original file F, consisting of n raw blocks {f_1, ..., f_n}, will be separately stored into U and E in different formats, detailed as follows.

U (Unencoded Buffer). To support efficient reads, buffer U always stores the up-to-date copy of the raw data blocks, and thus the update operations issued by the client must be immediately performed upon the appointed blocks of U. All the blocks in U are organized by the bv23Tree proposed in Section 3, whose batch-verification property enables the client to batch-read a group of raw blocks from U with improved performance.

E (Erasure-Coded Buffer). Buffer E is organized as a hierarchical structure with same-sized levels, as in Figure 4, where each level has the same size, holding m encoded blocks together with their block tags (m = O(λ), where λ is the security parameter). Moreover, E is equally divided into two parts, ED and EO, with the same capacity. ED stores the encoded data blocks, and EO stores the encoded operation blocks. At the beginning, the client applies the erasure code to encode the entire file F into encoded data blocks, computes the block tags, and sequentially stores them into ED in terms of the block indices shown in Figure 4. Subsequently, the content of ED will not be changed until buffer E is rebuilt. As shown in Section 3.4, each update operation is of the form (T, i, f*), which can also be regarded as an operation block. In this case, after a batch of b update operations is performed upon U, the client will encode this batch into m encoded operation blocks and store them, along with their block tags, into the corresponding level of EO. So, EO is empty in the initial state and will be incrementally filled up level by level over time. At last, once EO is full, the rebuilding of E is triggered, as will be described later.

At a high level, by decoding the whole E and sequentially performing the accumulated update operations of EO upon the original data blocks of ED, the latest version of the whole client's outsourced raw data can be recovered. Therefore, the periodic POR audits only need to be deployed against buffer E, which functions as a backup storage providing the retrievability guarantee when data loss occurs. To resist the selective deletion attack mentioned before, every POR audit will sample ℓ challenged blocks from each filled level of E. Because each level is entirely encoded, malicious CSS must corrupt a significant portion of the encoded blocks of one level to actually cause data loss. However, if malicious CSS corrupts that many encoded blocks of one level, it will fail the POR audits with overwhelming probability.
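The level bookkeeping can be sketched as follows; the class layout and the names m and ptr are illustrative, and the encoder and tags are abstracted away.

class ErasureCodedBuffer:
    """Schematic layout of buffer E: ED and EO as equal runs of same-sized levels."""
    def __init__(self, num_data_levels: int, level_size: int):
        self.m = level_size                    # encoded blocks per level
        self.ed = [None] * num_data_levels     # ED: the encoded original file
        self.eo = []                           # EO: levels of encoded update operations
        self.capacity = num_data_levels        # EO holds at most as many levels as ED

    @property
    def ptr(self):
        """State pointer: the number of filled levels (ED plus filled EO levels)."""
        return len(self.ed) + len(self.eo)

    def append_op_level(self, encoded_ops_with_tags):
        assert len(encoded_ops_with_tags) == self.m
        self.eo.append(encoded_ops_with_tags)
        return len(self.eo) == self.capacity   # True means a rebuild of E is due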

4.2. Initialization

We work in the bilinear setting, where G is a multiplicative cyclic group of prime order p and g is a generator of G. Let e: G × G → G_T be the same nondegenerate bilinear map as in [8], which has the following property: for any u, v ∈ G and a, b ∈ Z_p, e(u^a, v^b) = e(u, v)^{ab}. Let H: {0, 1}* → G be the secure BLS hash function. The initialization of our scheme is described as follows.

(1) KeyGen(1^λ): each entity generates a signing key pair for its respective signatures. In addition, the client samples a random element x ← Z_p and computes v ← g^x. x is kept secret by the client, but v is public. So, x belongs to the client's private key sk_C, and v belongs to her public key pk_C.

(2) Encode(sk_C, F): the client applies the erasure code to encode F into n' codeword blocks {c_1, ..., c_{n'}}, and each c_i is s sectors long: c_i = (c_{i1}, ..., c_{is}) ∈ Z_p^s. The client then generates a name for F and samples s random elements u_1, ..., u_s ← G. For each index i, 1 ≤ i ≤ n', with her secret key x in sk_C, the client computes the corresponding tag

σ_i ← (H(name ‖ i) · ∏_{j=1}^{s} u_j^{c_ij})^x

and attaches σ_i to c_i.

(3) Outsource(F, F̃, Φ): based on F, the client generates the corresponding bv23Tree with the root hash φ_R, as shown in Section 3.1. Then, the client outsources F into buffer U and outsources all encoded data blocks (along with their tags) into the ED part of buffer E, the layout of which is shown in Figure 4. In addition, let ptr be a state pointer that denotes the number of the filled levels of buffer E; clearly, ptr ranges from the number of ED levels (EO still empty) up to twice that number (EO full), as in Figure 4. Let ℓ denote the number of challenged blocks from a filled level, and let pub = (g, v, u_1, ..., u_s, name, ℓ) be the public parameter set for POR audits. Finally, the client keeps sk_C and φ_R locally, sends pub and ptr to TPA, and deletes all of the above outsourced data from her local storage.
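For illustration only, the following sketch instantiates the tag formula in a toy multiplicative group Z_q* in place of the pairing-friendly group G; the modulus q, generator g, and hash-to-group mapping below are our simplifications, and a real deployment uses a pairing library such as Pypbc [23].

import hashlib, random

q = 2**127 - 1          # a Mersenne prime; stand-in for a pairing-friendly group order
g = 3                   # toy generator

def H(data: bytes) -> int:
    """Hash into the toy group (stand-in for the BLS hash function H)."""
    return pow(g, int.from_bytes(hashlib.sha256(data).digest(), "big"), q)

s = 4                                    # sectors per codeword block
x = random.randrange(2, q - 1)           # client's secret key
v = pow(g, x, q)                         # public key v = g^x
u = [pow(g, random.randrange(2, q - 1), q) for _ in range(s)]

def tag(name: bytes, i: int, sectors) -> int:
    """sigma_i = (H(name || i) * prod_j u_j^{c_ij})^x in the toy group."""
    acc = H(name + i.to_bytes(8, "big"))
    for u_j, c_ij in zip(u, sectors):
        acc = acc * pow(u_j, c_ij, q) % q
    return pow(acc, x, q)

block = [11, 22, 33, 44]                 # one codeword block c_i as s sectors
sigma = tag(b"file-name", 1, block)
print(hex(sigma)[:18])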

4.3. Data Access Mechanisms

Based on the bv23Tree of buffer U and the hierarchical configuration of same-sized levels of buffer E, DOPOR supports batch updates that enable the client to perform a batch of update operations upon the outsourced storage, which is suitable for the common scenario of [14] where writes are frequent. Now, after completing the initialization of DOPOR, the client can access and update her outsourced data by the following three protocols.

Read. With her local root hash φ_R of the bv23Tree, the client can batch-read any ρ appointed raw blocks from CSS, by sending the ordered block indices set I as the query to CSS. Here, let buffer U be the CSS state. In terms of Section 3, upon receiving I, CSS accesses the ρ appointed raw blocks B and the tree leaves Λ from U, generates P ← Proof_Table(T, I), and returns (B, Λ, P) to the client. Then, the client batch-verifies the authenticity of Λ by calling Batch_Verify(Λ, P, I, φ_R) and finally checks the integrity of all raw blocks of B according to the corresponding hash values stored in Λ.

Write. Suppose that the client keeps local storage to cache b ordered update operations O = {o_1, ..., o_b}. Then, the client sends O to CSS for performing these operations in a batch. As shown in Section 3.4, on receiving O, with the raw data and the bv23Tree stored in buffer U, CSS can execute Batch_Updates(O, T, F) to return to the client the results (Λ, P, φ'_R) of the batch updates, and the client can call Verify_Updates(φ_R, O, Λ, P, φ'_R) to authenticate these returned results.

If the above results pass the client's authentication, the client then applies the erasure code to encode O into m encoded operation blocks. Based on the local state pointer ptr, the client first computes the indices of these encoded blocks in order and computes the tag σ_i for each encoded block by the same tag formula as in Encode of the initialization. Secondly, the client outsources these blocks along with their tags into the (ptr + 1)-th level of buffer E (i.e., the corresponding empty level of EO). Finally, the client sends to TPA the updated state pointer ptr ← ptr + 1 with her signature and empties her local storage for caching the next b ordered update operations. Overall, through this protocol, the CSS state consists of U and EO as above.

Rebuild. Once in every n update operations, when the EO part of buffer E gets filled (i.e., ptr reaches its maximum, as in Figure 4), the periodic rebuilding of E is triggered. Since buffer U stores the up-to-date copy of all raw data blocks, the client can carry out this rebuilding based on U for improved performance, instead of decoding the whole E and applying all operations of EO on the original data of ED. (ED and EO will not be decoded and combined unless the client detects data corruption within CSS and wants to retrieve the whole data.)

Benefitting from the ability of batch-reading multiple blocks from CSS and our hierarchical configuration, the client can rebuild E with only O(λ) local memory, the same size as one level of the hierarchical structure, which is a significant improvement compared to existing schemes [14, 18, 19] that require client local memory proportional to the data size for such rebuilding. In this protocol, the CSS state consists of U and E. At the beginning, the client runs the Read protocol to batch-read the first b raw blocks from U. After encoding these blocks into m codeword blocks, the client computes the corresponding tags in the same way as in Write, where the indices of the codeword blocks of each level are shown in Figure 4. Then, the client outsources to CSS these codeword blocks and tags, which will be stored in the first level of ED. Subsequently, the client can empty her local memory and batch-read from U the next batch of b blocks, which are processed and outsourced to CSS by the same procedures as above, except that each batch of processed codeword blocks and tags will be stored in the corresponding different level of ED (e.g., the second batch will be stored in the second level of ED, etc.). After the last batch of blocks from U is processed and outsourced, the client authorizes CSS to update the whole ED with all the above outsourced codeword blocks and tags, and CSS simultaneously empties the whole EO. Finally, the client publishes the updated state pointer ptr to TPA, meaning that the rebuilding is over. Overall, the amortized bandwidth of rebuilding E is O(1) blocks per update operation, since this rebuilding is executed only once in every n update operations.
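A schematic of this streaming rebuild loop follows; batch_read, encode_level, and upload_level stand in for the Read protocol, the erasure encoder with tag computation, and the upload to CSS, with b and m as the batch and level sizes from above.

def rebuild_E(num_levels: int, b: int, batch_read, encode_level, upload_level):
    """Rebuild buffer E level by level, holding only one level in client memory."""
    for level in range(num_levels):
        raw = batch_read(first=level * b, count=b)   # one batch of raw blocks from U
        encoded = encode_level(raw)                  # m codeword blocks plus their tags
        upload_level(level, encoded)                 # stage this level at CSS
        # client memory is cleared here; only O(lambda)-sized state persists
    # afterwards the client authorizes CSS to swap in the new ED and empty EO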

4.4. Outsourced Proof of Retrievability (OPOR)

(1) Challenge. The periodic random challenges are generated based on the time-based bitcoin pseudorandom source B_src. More specifically, given the current time t, the tool of [12] built on the bitcoin source can output the hash of the latest block that has arisen since time t in the bitcoin block chain. As shown in [12], for a future time t, no adversary can predict the hash of a bitcoin block that will arise in the future. In addition, for a past time t, the hash of the previous bitcoin block returned by this tool is objective and irrefutable against any adversary. Thus, let η_t denote this output; then η_t can be considered as a secure pseudorandom coin for time t.

With the public parameter ℓ, TPA can generate the random POR challenge of length ℓ · ptr, by calling the same probabilistic sampling algorithm as in [12], seeded by η_t, for each different time t. Specifically, to ensure that a POR challenge samples the same amount of blocks from each filled level of buffer E (i.e., ED and EO) at CSS, TPA first uses η_t to choose a random ℓ-element subset {c_1, ..., c_ℓ} of [1, m], and then computes the (ℓ · ptr)-element index set as follows:

Idx = {c_k + (z − 1) · m : 1 ≤ k ≤ ℓ, 1 ≤ z ≤ ptr}.

Moreover, for each i ∈ Idx, as in [12], TPA also depends on η_t to choose a corresponding random element ν_i ∈ Z_p. Finally, let Q_t = {(i, ν_i)}_{i ∈ Idx} denote this (ℓ · ptr)-element set, which is regarded as the POR challenge at time t and sent to CSS by TPA.
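A sketch of this derivation, using HMAC-SHA256 as the pseudorandom function seeded by the bitcoin-based coin; the PRF choice and encoding are ours, not mandated by [12].

import hmac, hashlib

def prf(seed: bytes, label: bytes, counter: int) -> int:
    msg = label + counter.to_bytes(4, "big")
    return int.from_bytes(hmac.new(seed, msg, hashlib.sha256).digest(), "big")

def gen_challenge(seed: bytes, m: int, ell: int, ptr: int, p: int):
    """Sample ell distinct offsets within a level, replicate them across all ptr
    filled levels, and attach a coefficient nu_i in Z_p to every challenged index."""
    offsets, ctr = [], 0
    while len(offsets) < ell:                 # distinct offsets in [1, m]
        c = prf(seed, b"idx", ctr) % m + 1
        ctr += 1
        if c not in offsets:
            offsets.append(c)
    challenge = []
    for z in range(ptr):                      # the same offsets in each filled level
        for c in offsets:
            i = c + z * m
            challenge.append((i, prf(seed, b"coef", i) % p))
    return challenge

print(gen_challenge(b"bitcoin-block-hash", m=8, ell=2, ptr=3, p=97)[:4])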

(2) Prove. Here, let buffer E be the CSS state. Upon receiving the challenge Q_t, for each index i specified by Q_t, CSS reads the corresponding block c_i and tag σ_i from E. CSS then proceeds exactly as in the public verification scheme of [8], computing the values μ_j, 1 ≤ j ≤ s, along with the aggregated tag σ as follows:

μ_j = Σ_{(i, ν_i) ∈ Q_t} ν_i · c_{ij} mod p, for 1 ≤ j ≤ s;    σ = ∏_{(i, ν_i) ∈ Q_t} σ_i^{ν_i}.

At last, CSS sends to TPA its response P_t = (μ, σ, sig_CSS), where μ = (μ_1, ..., μ_s) and sig_CSS is CSS's signature over the response to provide nonrepudiation.

(3) Audit. Based on the public parameter set pub and the challenge Q_t, TPA can compute his own auditing parameter π_t as follows:

π_t = ∏_{(i, ν_i) ∈ Q_t} H(name ‖ i)^{ν_i}.

Then, after TPA parses CSS's response P_t to obtain (μ_1, ..., μ_s, σ), TPA audits by checking the following equation:

e(σ, g) ?= e(π_t · ∏_{j=1}^{s} u_j^{μ_j}, v).    (8)

If this verification does not pass, TPA informs the client of this abnormal situation, meaning that data loss has occurred. Finally, TPA must generate and store the following log that corresponds to the challenge at time t:

L_t = (t, μ_1, ..., μ_s, σ, sig_CSS).
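Continuing the toy-group sketch of Section 4.2, the following shows CSS's aggregation and the audit equation. Because the toy group has no pairing, the last line checks σ = (π_t · ∏ u_j^{μ_j})^x directly with the secret x, whereas the real scheme checks (8) via the pairing so that TPA needs only the public v.

import hashlib, random

q, g, s = 2**127 - 1, 3, 4                              # toy group, as in Section 4.2's sketch
x = random.randrange(2, q - 1); v = pow(g, x, q)        # client keys
u = [pow(g, random.randrange(2, q - 1), q) for _ in range(s)]

def H(d: bytes) -> int:
    return pow(g, int.from_bytes(hashlib.sha256(d).digest(), "big"), q)

def tag(name: bytes, i: int, sectors) -> int:
    acc = H(name + i.to_bytes(8, "big"))
    for u_j, c in zip(u, sectors):
        acc = acc * pow(u_j, c, q) % q
    return pow(acc, x, q)

blocks = {1: [11, 22, 33, 44], 9: [55, 66, 77, 88]}     # two challenged codeword blocks
tags = {i: tag(b"file-name", i, c) for i, c in blocks.items()}
chal = [(1, 5), (9, 7)]                                 # challenge pairs (i, nu_i)

# CSS aggregates: mu_j = sum_i nu_i * c_ij and sigma = prod_i sigma_i^{nu_i}
mu = [sum(nu * blocks[i][j] for i, nu in chal) for j in range(s)]
sigma = 1
for i, nu in chal:
    sigma = sigma * pow(tags[i], nu, q) % q

# verifier recomputes pi_t = prod_i H(name || i)^{nu_i} and checks the equation
rhs = 1
for i, nu in chal:
    rhs = rhs * pow(H(b"file-name" + i.to_bytes(8, "big")), nu, q) % q
for u_j, mu_j in zip(u, mu):
    rhs = rhs * pow(u_j, mu_j, q) % q
print(sigma == pow(rhs, x, q))                          # True for an honest proof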

(4) CheckLog. To protect against a malicious TPA who might violate the above auditing process or even collude with CSS, the client can verify TPA's work by checking TPA's logs. However, instead of checking the accumulated TPA logs one by one, the client is able to batch-check multiple TPA logs all together, so such a client's batch-checking against TPA only seldom needs to be performed in practice, as shown in [12].

The client can check just the latest TPA log for a minimal check, since this log reflects the latest retrievability status of the outsourced data [12]. More generally, to perform a batch-checking against TPA, the client selects a point-in-time set T_set = {t_1, ..., t_z} and sends it to TPA, where each t_k marks the time of a past challenge.

Upon receiving T_set, for each t ∈ T_set, in terms of the values μ_{t,1}, ..., μ_{t,s} and σ_t stored in the log L_t, TPA computes

μ_j = Σ_{t ∈ T_set} μ_{t,j}, for 1 ≤ j ≤ s;    σ = ∏_{t ∈ T_set} σ_t.

Then, TPA responds to the client with his proof P_TPA = (μ_1, ..., μ_s, σ, {L_t}_{t ∈ T_set}), which is also signed by TPA.

Based on the public parameters pub and the bitcoin source B_src, for each t ∈ T_set, the client is able to reconstruct alone each past challenge Q_t as described in the Challenge protocol, by using the pseudorandom coin η_t obtained from B_src. So, the client can compute her own checking parameter π as follows:

π = ∏_{t ∈ T_set} ∏_{(i, ν_i) ∈ Q_t} H(name ‖ i)^{ν_i}.

After verifying TPA's signature on the proof P_TPA, the client first checks whether the aggregated σ is equal to ∏_{t ∈ T_set} σ_t recomputed from the CSS-signed logs L_t. If not, the client outputs false, confirming that TPA was irresponsible for the past POR audits. Finally, the client checks the following equation with her secret key x:

σ ?= (π · ∏_{j=1}^{s} u_j^{μ_j})^x.    (12)

If the above client's check fails, the client outputs false, which means that there exists collusion between TPA and CSS and that data corruption has occurred within CSS. The correctness of (12) is demonstrated as follows:

σ = ∏_{t ∈ T_set} σ_t = ∏_{t ∈ T_set} ∏_{(i, ν_i) ∈ Q_t} σ_i^{ν_i}
  = ∏_{t ∈ T_set} ∏_{(i, ν_i) ∈ Q_t} (H(name ‖ i) · ∏_{j=1}^{s} u_j^{c_ij})^{x · ν_i}
  = (∏_{t ∈ T_set} ∏_{(i, ν_i) ∈ Q_t} H(name ‖ i)^{ν_i} · ∏_{j=1}^{s} u_j^{Σ_t Σ_i ν_i c_ij})^x
  = (π · ∏_{j=1}^{s} u_j^{μ_j})^x.

5. Security Analysis

Similar to the analysis of [8, 12], we evaluate the soundness of our DOPOR scheme in three parts: unforgeability, liability, and extractability.

Theorem 1 (unforgeability). It is computationally infeasible for any adversary to forge a proof that can pass the verifier's check, if the Computational Diffie-Hellman (CDH) problem and the Discrete Logarithm (DL) problem are hard.

Proof. Since CSS does not check any proof throughout the whole execution of DOPOR, there are only two cases to be discussed.
Case 1. TPA plays the role of verifier to check the proof returned from CSS when executing the protocols Prove and Audit, as shown in Section 4.4. In this case, observe that both CSS and TPA perform exactly as in the BLS-based public verification scheme of [8], so the unforgeability guarantee immediately follows from the work of [8]. As shown in [21], the BLS scheme is secure when the CDH problem is hard in bilinear groups, based on which the unforgeability of the BLS-based public scheme has been proven in [8] and is thus omitted here.
Case 2. The client acts as the verifier to check TPA's logs as in the CheckLog protocol of Section 4.4. To pass the client's check with (12), TPA should return the correct proof (μ_1, ..., μ_s, σ). Now, assume that TPA is able to forge the proof. As shown in [8], due to the security of the BLS scheme, the BLS-based homomorphic verifiable tag is unforgeable, and thus the aggregated tag σ is also unforgeable. So, the only choice for TPA is to generate a forged aggregated block, denoted by (μ'_1, ..., μ'_s), as the response to the client's check. Then, for (12) to be satisfied, we have

σ = (π · ∏_{j=1}^{s} u_j^{μ'_j})^x.    (14)

In addition, according to the correct proof, we have

σ = (π · ∏_{j=1}^{s} u_j^{μ_j})^x.    (15)

Note that π is the parameter computed by the client herself, and the security of π is ensured by the security of the bitcoin pseudorandom source of [12]. Based on the security of the BLS scheme, we can learn that

∏_{j=1}^{s} u_j^{Δμ_j} = 1,    (16)

where Δμ_j = μ'_j − μ_j, 1 ≤ j ≤ s, and at least one Δμ_j ≠ 0.
For any two given elements g, h ∈ G, we have u_j = g^{α_j} · h^{β_j}, where α_j, β_j ∈ Z_p. Hence, (16) is transformed as follows:

h = g^{−(Σ_{j=1}^{s} α_j Δμ_j) / (Σ_{j=1}^{s} β_j Δμ_j)}.    (17)

Obviously, (17) means that the malicious TPA can solve the DL problem, which conflicts with the assumption that the DL problem is hard. Therefore, it is infeasible for TPA to forge a proof to pass the client's check. This completes our proof.

Theorem 2 (liability). If any adversary attempts to cheat or frame an honest entity who has been well behaved, the honest entity can output incontestable evidence to confirm the misbehavior of the adversary in case of a lawsuit.

Proof. It is clear that if an honest entity can protect against the collusion of the other two malicious entities, then this honest entity can certainly protect against any single malicious entity. Hence, to prove Theorem 2, it suffices to consider the following three cases where only one entity is honest.
Case 1. The honest client defends against the collusion of CSS and TPA. Obviously, CSS has an incentive to collude with TPA only when the outsourced data corruption has occurred at the cloud side. In this case, once the corrupted data blocks are challenged, according to Theorem 1, CSS and TPA cannot forge an effective proof to pass the client's check against TPA's logs, unless they can solve the DL problem (but this probability is negligible). Therefore, the false output by the client when executing the CheckLog protocol is the incontestable evidence identifying the collusion of CSS and TPA.
Case 2. The honest TPA defends against the collusion of the client and CSS. As shown in the protocols of Section 4.4, TPA completes his auditing work by computing the auditing parameter π_t and verifying (8), where all these processes are reproducible and undeniable for the malicious entities. More specifically, on the one hand, the nonrepudiation of the parameter π_t is derived from the objectivity of the challenge Q_t, which is computed based on the secure bitcoin pseudorandom source and the public parameters pub. On the other hand, all the other inputs involved in verifying (8) are also undeniable for both the client and CSS; for example, the values μ_1, ..., μ_s and σ are signed by CSS's signature, and g, v, u_1, ..., u_s are the public parameters confirmed by all entities. Hence, the honest TPA can provide his logs in case of a lawsuit, which include all the above incontestable evidence enabling the playback of all past TPA auditing work to prove the innocence of TPA.
Case 3. The honest CSS defends against the collusion of the client and TPA. When the malicious client colludes with TPA to falsely accuse CSS of corrupting the i-th block c_i, CSS can output the intact c_i and its tag σ_i as the incontestable evidence. As shown in [8], each tag constructed and outsourced by the client herself is unforgeable. Based on the security of the BLS signature scheme, as long as the above c_i and σ_i output by CSS satisfy (18), CSS is innocent:

e(σ_i, g) = e(H(name ‖ i) · ∏_{j=1}^{s} u_j^{c_ij}, v).    (18)

This completes the proof of Theorem 2.

According to Theorem 2, all three entities have to behave properly in DOPOR. In this case, the extractability during TPA's audits against CSS immediately follows from the work of [8], since the procedure of TPA's audits in DOPOR corresponds to the public verification scheme of [8]. As for the extractability when performing the CheckLog protocol, we have the following theorem.

Theorem 3 (extractability). During the client's checking against TPA, if the client does not output false after checking TPA's logs, then there exists a deterministic extraction algorithm by which the client can extract the challenged file blocks through repeated interactions with TPA.

Proof. According to Theorem 1, to pass the client's check during the execution of the CheckLog protocol, TPA has to respond to the client with the correct proof that includes the aggregated block (μ_1, ..., μ_s), and each μ_j is a linear combination of the following form:

μ_j = Σ_{t ∈ T_set} Σ_{(i, ν_i) ∈ Q_t} ν_i · c_{ij},

where T_set is a set of point-in-times chosen by the client herself, and all the coefficients ν_i are dominated by these point-in-times as shown in the Challenge protocol of Section 4.4.
Now, suppose the client chooses appropriate point-in-times in the past to generate different sets T_set and checks TPA's logs a polynomial number of times by sending different T_set to TPA; then, the client can obtain a system of linearly independent equations built upon the challenged target blocks. Finally, by solving this system, the client can extract all target blocks.
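To illustrate the extraction step, the following sketch recovers two unknown sector values from two independent aggregated responses over Z_p; the toy sizes and freely chosen coefficients replace the bitcoin-derived ones.

p = 97                      # toy field; the real scheme works over Z_p for a large prime p
secret = [42, 77]           # two unknown challenged sector values held by CSS/TPA

# two check interactions yield two linear equations mu = sum_i nu_i * c_i (mod p)
coeffs = [[5, 7], [3, 11]]  # coefficients nu_i from two different point-in-time sets
mus = [sum(nu * c for nu, c in zip(row, secret)) % p for row in coeffs]

# solve the 2x2 system over Z_p with Cramer's rule
det = (coeffs[0][0] * coeffs[1][1] - coeffs[0][1] * coeffs[1][0]) % p
inv_det = pow(det, -1, p)   # modular inverse (Python 3.8+)
c0 = (mus[0] * coeffs[1][1] - mus[1] * coeffs[0][1]) * inv_det % p
c1 = (coeffs[0][0] * mus[1] - coeffs[1][0] * mus[0]) * inv_det % p
print([c0, c1] == secret)   # True: the client has extracted the challenged values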

When referring to dynamic updates, as shown in Section 4.3, the procedure of performing updates is divided into two parts: (1) performing the batch updates upon buffer U according to the algorithms Batch_Updates and Verify_Updates of Section 3.4, whose essence is the batch-verification property of the bv23Tree, the security of which is ensured by Theorem 4; and (2) outsourcing a batch of encoded operation blocks and their tags to buffer E, whose security is directly ensured by the unforgeable tags, the periodic executions of POR audits against buffer E, and the erasure code scheme.

Theorem 4. Assuming the existence of a collision-resistant hash function h, for any ordered indices set I appointed by the client, the corresponding proof table generated from the bv23Tree ensures the integrity of all the appointed leaves returned from CSS with overwhelming probability.

Proof. As shown in Section 3.2, upon receiving the appointed ordered indices set I, CSS should respond to the client with the corresponding leaves Λ and proof table P. Now, suppose that CSS tries to act dishonestly; the possible ways for CSS to misbehave are covered by the following two cases.
Case 1. Malicious CSS either forges some leaves within Λ or forges some items within the proof table P. As shown in Section 3.3, the client possesses the public tree root hash φ_R and will verify both Λ and P by calling the algorithm Batch_Verify, whose procedure recalculates the public φ_R by iteratively hashing the values of all tree nodes included in Λ and P in terms of the specified order. For a forged Λ or P to make Batch_Verify output the same root hash value as φ_R, CSS would have to find collisions against the hash function, which contradicts the assumption that h is collision-resistant. Therefore, the probability that malicious CSS successfully forges Λ or P is negligible.
Case 2. Malicious CSS launches the replacing attack; that is, CSS returns a replaced Λ' in which some appointed leaves are replaced with other existing leaves of the bv23Tree, along with the corresponding proof table P' that is correctly generated based on Λ'. Without loss of generality, suppose that some returned leaf is an existing leaf whose index i' is not among the appointed ones, that is, i' ∉ I. In this case, although the final hash value output by Batch_Verify is equal to the public φ_R, the final value of the variable Ī_1 within Batch_Verify will contain i' instead of the specified index in I, which contradicts the expected results shown in Section 3.3. So, the client will still output reject, meaning that the above malicious attack is detected by the client.
In short, if there exists a collision-resistant hash function, malicious CSS has to return all the appointed leaves along with the correct proof table to pass the client's batch verifications upon these leaves.

6. Performance Evaluation

Our experiments were deployed in Python on a Linux system with an Intel Xeon E5-2609 CPU running at 2.40 GHz, 16 GB of RAM, and a 7200 RPM 600 GB Serial ATA drive with a 32 MB buffer. The cryptographic operations were implemented with the Python Cryptography Toolkit [22] and the Pypbc library [23], and we used the 80-bit security parameter, which means the order p of group G is of 160-bit length. We chose a 1 GB raw data file for testing and relied on the erasure code for encoding. For ease of comparison, all block sizes are set to 4 KB as in [6, 14]. Our results are an average of 20 rounds.

6.1. POR Audits Cost

During each POR audit, since the number of challenged blocks is far less than the total number of encoded file blocks (e.g., a small constant percentage, as in [12]), the time consumed in proof computation (or proof verification) will not be the bottleneck for CSS (or TPA). The POR audit phase of DOPOR corresponds to the execution of the public verification construction of [8], whose efficient computation performance has been confirmed in previous studies [5, 6, 10]. Therefore, the computation time of the POR audit phase is not the primary concern in our DOPOR scheme, and we will focus on evaluating the bandwidth cost of this phase.

Figure 5 depicts the total TPA-CSS bandwidth cost of executing a POR audit once, for various percentages of challenged blocks. Here, with regard to the given parameters ℓ and ptr in DOPOR, the percentage of challenged blocks is equal to ℓ · ptr divided by the total number of encoded blocks. It is obvious that the public DPOR of [14] results in a large communication overhead, since it must transfer all challenged blocks during each audit, which greatly affects the bandwidth performance. By relying on the techniques of blockless verification and homomorphic authenticators (tags) to compress the proof size, the bandwidth costs of both our DOPOR and the static OPOR scheme (i.e., Fortress) of [12] are only dominated by the sizes of the challenges released by TPA and thus gradually increase with the percentage of challenged blocks. Note that the bandwidth cost of DOPOR is always less than that of Fortress, since TPA only needs to send a single challenge for each audit in DOPOR, whereas TPA must send two parallel challenges in Fortress. This is due to the fact that, during each audit, Fortress requires CSS to respond with two different responses [12]: one is used by TPA for auditing CSS, and the other will be used by the client for checking TPA's work; these two responses therefore correspond to two parallel challenges in Fortress. However, as shown in Section 4.4, DOPOR enables CSS to respond with only one response, which is based on the public key cryptosystem to support both TPA's auditing and the client's checking. So, within DOPOR, there is only one challenge corresponding to the above sole response of CSS.

6.2. Read Cost

When the client reads a batch of raw blocks from CSS, the integrity of these blocks is guaranteed by the authenticated data structure of the up-to-date buffer U. In Figure 6, we evaluate the extra bandwidth cost (i.e., not including the bandwidth of transferring the blocks themselves) incurred on the client side for batch-verifying the leaves of all returned raw blocks with the proposed proof table of the bv23Tree, compared to the costs of the rb23Tree method [6] and the standard MHT method [14], which can only verify the appointed leaves by transferring their respective proof paths. As shown, with an increasing number of blocks batch-read from CSS, the extra bandwidth cost caused by the rb23Tree is much higher than that of the MHT, since the basic component of a proof path of the rb23Tree is an 8-element tuple mark with a larger size. However, owing to the proof table that avoids transferring the repetitive node values within different proof paths, our bv23Tree incurs the lowest extra bandwidth cost among the three dynamic structures. Likewise, as shown in Figure 7, based on the bv23Tree, the client also further reduces the computation time spent on verifying the integrity of all returned blocks, because the proof table enables the client to batch-verify all returned leaves together by computing the tree root hash only once, avoiding the straightforward way of the rb23Tree and the standard MHT, in which the client has to verify different leaves one after another by repeatedly computing the tree root hash with different proof paths.

In conclusion, our results show that the more blocks are batch-read from CSS, the more repetitive node values are omitted from transfer and computation thanks to the proof table, and thus the more costs the client saves with the bv23Tree, improving the performance of reads.

6.3. Write Cost

Now, we evaluate the performance of writes for DOPOR and the public DPOR of [14], both of which apply the client-side caching measure for performing writes; that is, the client locally caches a group of raw blocks (contained in the update operations of modification or insertion, as in Section 3.4) and writes these blocks in a batch to CSS, as shown in the Write protocol of Section 4.3. For DOPOR in this experiment, the parameter b is the number of client-side cached update operations, which determines the level size m according to the erasure-coding rate.

Figure 8 depicts the client-CSS amortized bandwidth cost for writing each 4 KB raw block. With an increasing number of client-side cached blocks, our results show that DOPOR incurs 17%~48% more amortized bandwidth than the public DPOR, due to the fact that DOPOR needs to transfer the additional encoded operation blocks besides the raw blocks. However, recall that the bandwidth cost required for frequent POR audits in the public DPOR is orders of magnitude higher than that in DOPOR (Figure 5), and DOPOR achieves a stronger security level than the public DPOR by protecting against a malicious TPA and collusion. Furthermore, as shown in Figure 9, the public DPOR incurs an average of 45% more computation time at the CSS side than DOPOR, since while performing writes the public DPOR must rebuild the corresponding levels of an MHT-based hierarchical structure located on CSS's disk, which results in a lot of additional disk I/O time compared to DOPOR, which does not need such rebuilding during writes.

6.4. Client-Side Checking Cost

As shown in [12], although client should not be involved in the frequent POR audits, it is necessary to give client the capability of checking TPA's past work so as to protect against a malicious TPA.

Since both DOPOR and the Fortress scheme of [12] adopt aggregation techniques to compress the proof size, the client-TPA bandwidth costs during checking TPA are similar for the two schemes; therefore, in this experiment we focus on measuring the client's computation time of the two investigated schemes when batch-checking TPA's logs, as shown in Figure 10. Here, one log corresponds to a past POR audit performed by TPA that challenges 1% of the total encoded blocks. Compared to Fortress, DOPOR requires more time at client side for checking TPA, because the exponentiation operation on the elliptic curve in DOPOR is more expensive than the modular operation in Fortress. However, recall that DOPOR enables the efficient data dynamics that Fortress cannot support, so the additional client-side time cost incurred by DOPOR, that is, the gap between the DOPOR and Fortress curves in Figure 10, can be regarded as the price of dynamism under the OPOR setting. This additional cost is tolerable for client to a great extent, since the client's checking against TPA is an optional verification that is seldom executed in practice [12]; for example, client might batch-check TPA's logs only once in several months or even a year.
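
To illustrate the idea of batch-checking, the following sketch reuses the simple linear tag scheme from the earlier sketch rather than DOPOR's elliptic-curve construction (so it is our illustration, not the paper's protocol): client combines all logged audits with fresh random coefficients and verifies one aggregated equation instead of checking each log separately:

```python
import hmac, hashlib, secrets

P = 2**127 - 1
prf = lambda k, i: int.from_bytes(hmac.new(k, i.to_bytes(8, 'big'),
                                           hashlib.sha256).digest(), 'big') % P

key = secrets.token_bytes(32)
alpha = secrets.randbelow(P)
blocks = [secrets.randbelow(P) for _ in range(1000)]
tags = [(prf(key, i) + alpha * m) % P for i, m in enumerate(blocks)]

def audit(chal):
    # CSS's honest aggregated response to one logged challenge
    mu = sum(nu * blocks[i] for i, nu in chal) % P
    sigma = sum(nu * tags[i] for i, nu in chal) % P
    return mu, sigma

# TPA's log: one entry (challenge, mu, sigma) per past audit
logs = []
for _ in range(50):
    chal = [(secrets.randbelow(len(blocks)), secrets.randbelow(P))
            for _ in range(10)]
    logs.append((chal, *audit(chal)))

# client batch-checks all 50 logs through one aggregated equation
r = [secrets.randbelow(P) for _ in logs]
lhs = sum(rj * sigma for rj, (_, _, sigma) in zip(r, logs)) % P
rhs = sum(rj * ((sum(nu * prf(key, i) for i, nu in chal) + alpha * mu) % P)
          for rj, (chal, mu, _) in zip(r, logs)) % P
assert lhs == rhs
```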

7. Related Work

Nowadays, with the rapid development of cloud computing, more and more cloud applications are built upon the big data stored at CSS side, such as service quality evaluation [24] and cloud service recommendation [25, 26]. However, guaranteeing the storage security of such big data is a critical challenge for mobile clients in the cloud computing setting. Proof of Retrievability (POR) is a security measure that builds upon cryptographic proofs to ensure the correctness and retrievability of client's big data outsourced to the cloud. Juels and Kaliski Jr. [7] proposed the first POR scheme by utilizing the "sentinels" technique, where client conceals some sentinel blocks among the original data blocks before outsourcing her data, for use in remote POR audits. However, this proposal supports only a limited number of POR audits, since performing an audit exposes the corresponding sentinels, and frequent audits cannot be sustained once all sentinels are exhausted. Based on pseudorandom functions (PRFs) and BLS signatures [21], Shacham and Waters [8] proposed two improved POR schemes with private verification and public verification, respectively. Both schemes enable an unlimited number of audits against CSS and compress the response of CSS into one aggregated block along with a small authenticator value for optimized auditing bandwidth. Subsequently, Dodis et al. [27] generalized the constructions of [7, 8] by combining the concepts of POR with coding and complexity theory. In view of the importance of data dynamics, Cash et al. [18] provided a Dynamic POR (DPOR) scheme based on the ORAM technique. Because ORAM incurs heavy bandwidth overhead for client when performing dynamic updates under the POR setting, Shi et al. [14] replaced ORAM with the FFT-based constructible code and a hierarchical storage structure to design a more efficient private DPOR scheme than that of [18], and they further applied the MHT structure to turn this private DPOR into a public DPOR scheme. Furthermore, based on observations made upon previous POR studies, Etemad and Küpçü [19] proposed a general framework for constructing efficient DPOR and defending against the selective deletion attack described in [14].

On the other hand, Provable Data Possession (PDP), first proposed by Ateniese et al. [17], is a closely related research direction that focuses on ensuring the integrity of outsourced data. The essential difference between POR and PDP is that POR applies erasure coding while PDP does not. As shown in [19], the security level of POR is stronger than that of PDP, since POR ensures that the whole outsourced data can be retrieved by client, whereas PDP only guarantees the integrity of most of the outsourced data. Given that some existing public auditing schemes [5, 15, 16] are designed without involving erasure code, these schemes can be classed as variants of PDP. Zhu et al. [4] presented a cooperative PDP (CPDP) scheme for the distributed multicloud storage setting. Wang et al. [5] designed a random masking technique to prevent client's outsourced data from leaking to TPA during the audits. Erway et al. [13] proposed the first Dynamic PDP (DPDP) scheme to support efficient data updates using a skip list. Various authenticated structures were subsequently proposed for data dynamics, such as the standard MHT method [9], rank-based MHT [15], multireplica MHT [16], and rb23Tree [6]. However, all these authenticated structures can only verify different leaves one by one, which is inefficient for client when many leaves need to be verified.

In addition, as shown in [12], when referring to public verification (auditing), the potential security risk is that TPA itself might be malicious, a risk that none of the above public schemes considers. Outsourced Proof of Retrievability (OPOR), proposed by Armknecht et al. [12], is the first scheme to protect against a malicious TPA under the public POR setting. However, the OPOR construction of [12] supports only static data, a limitation that remains to be solved.

8. Conclusions

As a stronger security model in the context of remote data auditing, Outsourced Proof of Retrievability (OPOR) focuses on the dilemma that client hopes to resort to TPA for assessing the storage security of her outsourced data, while TPA might be malicious and even collude with CSS to cheat client. In this paper, we propose a concrete DOPOR scheme to support data dynamics under the OPOR model. Our DOPOR scheme is built upon a newly designed authenticated data structure, called bv23Tree, which not only relies on the balanced-tree property to guarantee logarithmic complexity in any case of dynamic updates, but also enables client to batch-verify multiple appointed leaves all together for improved performance. Under the erasure-coded setting, by separating the updated data from the original data and adopting a hierarchical structure of same-sized levels to uniformly store all encoded data, DOPOR efficiently supports batch reads and updates upon the outsourced storage, leveraging the batch-verification capability of bv23Tree. Compared to the state of the art, our experiments show that DOPOR incurs a lower bandwidth cost for TPA's frequent audits than the original static OPOR scheme, and the overall read and write performance of DOPOR is comparable to that of the existing public Dynamic POR scheme.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant no. 61502044.