Abstract

With the growth of cloud computing, more and more Cloud Service Providers (CSPs) offer cloud computing services and ask users for permission to use their data to improve the quality of service (QoS). Because these data are stored as plain text, users worry about the risk of privacy leakage. Existing watermark embedding and encryption technologies, however, are not suitable for protecting the Right to Be Forgotten. We therefore propose a new Cloud-User protocol as a solution to the plain text outsourcing problem. Users and CSPs are only allowed to embed a ciphertext watermark, which is generated by a Trusted Third Party (TTP), into the ciphertext data for transfer. The receiver then decrypts the result and obtains the watermarked data in plain text. In the arbitration stage, feature extraction and the identity of the user are used to identify the data. A fixed Hamming distance code raises the system’s capacity for watermarks as much as possible. The extracted watermark can locate the unauthorized distributor and protect the rights of an honest CSP. Experimental results demonstrate the security and validity of our protocol.

1. Introduction

1.1. Problem Background

The Right to Be Forgotten (RTBF) is a right proposed to protect people’s privacy; it was mentioned as early as 1995 in the Data Protection Directive of the EU [1]. Article 17 of the General Data Protection Regulation (GDPR) [2], which was proposed in the EU in 2012 to strengthen data protection for individuals, defines RTBF as the right of people to obtain from the data controller, or have erased, data that are expired or related to their privacy. In 2013, Senate Bill 568 of California [3] was signed to protect the RTBF of children. In 2014, the European Court of Justice compelled Google to delete links about a Spanish man’s bankruptcy from its search results, which confirmed that RTBF is a basic right of people living in the EU. Since then, Google, Facebook, and YouTube have erased tens of thousands of links at the request of EU citizens [4]. However, users cannot technically confirm the erasure of data if they do not trust their Cloud Service Providers (CSPs). Meanwhile, cloud computing is becoming more powerful and more economical. Companies such as Amazon, Alibaba, and Microsoft provide cloud computing services that help people manipulate their data more cheaply and easily. Users who want to lodge their data in cloud servers to lower expenses must think carefully about the risk of data leakage. Confirmed deletion and several related ideas can be introduced to deal with this problem, which is also the target of our protocol.

1.2. Related Works

There are two kinds of methods used in confirmed deletion. The first is encryption. The User (U) encrypts his data and transfers it to the CSP for storage [5–7]. When U wants to delete his data, he only needs to abandon the encryption key and inform the CSP that the related data are useless. Key management can be delegated to several third parties, with secret sharing used to prevent conspiracy [8]. Under ideal circumstances, encryption protects both the privacy of data and RTBF, but it destroys the value of the data: U can only use the storage space of the CSP and wastes its computing ability.

To solve this dilemma, homomorphic encryption (HE) [9] was introduced into this field [10]. Once the data are encrypted by an HE algorithm, the CSP can compute on the data as the user orders while knowing nothing about them. However, HE has other flaws. For example, it requires the user to know which operations to perform on the data before knowing their results. Moreover, full-HE, which supports both addition and multiplication on ciphertext, is very slow and costly, while semi-HE, which is faster, supports only restricted operations. In short, using encryption to protect RTBF is so far neither convenient nor economical. In our solution, data are stored as plain text in cloud servers so that U can make full use of the CSP’s computing ability to manage his data.

Compared with confirming deletion, it is easier to confirm that data have not been deleted, which suggests the second approach: tracing the unauthorized distribution of data. Watermarking is used for copy deterrence and for tracing the distribution of illegal copies [11–13]. This indicates that a watermark can protect RTBF by proving the misconduct of a CSP. As successful cases have shown, Google and Facebook were forced to delete links that infringe people’s privacy [4] once U reported them and proved the infringement. But this method faces a new problem: a user can use his own data to defraud the CSP if he can benefit from a lawsuit, for example by defaming a specific CSP or by obtaining an indemnity dishonestly. If the CSP requires embedding another watermark so that he can identify whether a copy was stored in his server, the CSP can leak the copy carrying both watermarks to avoid being charged. If the embedding process is outsourced to a third party (TP), the risk of information leakage from the TP rises. In short, watermark technology cannot be used to protect RTBF directly.

In this paper, we design a new Cloud-User protocol based on the buyer-seller protocol [14, 15]. We generate and embed the watermark in ciphertext so that the watermark can be erased during downloading. By using only one watermark, we increase the signal-to-noise ratio (SNR) of the data. Moreover, we introduce a feature-extraction function (FEF) and a fixed Hamming distance code into the protocol to reduce the cost of searching and increase the capacity of the system while maintaining the robustness of the watermark system.

The rest of this paper is organized as follows. Section 2 briefly introduces the problem model, the threat model, and the design goals. Preliminaries are introduced in Section 3. The proposed solution is described in Section 4, and its security is analyzed in Section 5. Section 6 explains the design of the experiments and the subalgorithms used to build the demo. The experimental results are analyzed in Sections 7 and 8. Sections 9 and 10 contain the conclusions and future work.

2. Problem Formulation

2.1. Problem Model

The problem model in this paper involves three parties: User (U), Cloud Service Provider (CSP), and the Trusted Third Party (TTP).

User. U possesses large quantities of valuable data that need considerable computing power to exploit. U therefore wants to store his data on the CSP’s servers and requires the CSP to perform complex operations for him. In our scheme, only U and the CSP can access and manipulate these data. According to RTBF, U has the right to retrieve his data and to require the CSP to delete them at any time. If U finds a copy of data that should have been deleted, U can suspect that a CSP has distributed his data illegally for profit and require the TTP to verify where it comes from. Once this is confirmed, U can sue the CSP and ask for compensation.

Cloud Service Provider. The CSP controls many servers with large storage space and powerful computing ability. U can store, manipulate, and delete his data on the CSP’s servers only if he pays for it according to the contract. Although the CSP controls all the data in his servers, he does not own the data and is responsible for their security. The CSP may never distribute the data, whether U cares or not, and needs to back them up in case the servers crash.

Trusted Third Party. The TTP is an arbitration agency responsible for generating a valid watermark for every single trade between U and the CSP. The TTP must be trustworthy so that its verification can be used as evidence. Besides, the TTP should know nothing about U’s data unless U requires it to verify whether a specific copy has been marked for deletion.

Our solution is designed to ensure that each of the three parties can only know what it is allowed to know and do what it is required to do. Whoever violates the contract will suffer a loss.

2.2. Threat Model

In the proposed solution, we assume that the TTP is selected by U and the CSP, so we do not consider that the TTP will conspire with anyone, and no one can obtain payment from it. Conspiracy is therefore outside our threat model. We consider the threat that the CSP or U benefits at the expense of the other.

CSP’s Attack. The CSP controls all data stored in his servers and should follow U’s orders to manipulate U’s data according to the contract. However, he may keep a copy of U’s data as a backup even after U requires deletion. Since the CSP has full access to these data in plain text, we cannot prevent him from analyzing the data; that should be covered by the contract. On the one hand, the CSP may fail to delete the data as required, and the data may then leak through careless management. On the other hand, the CSP may deliberately sell the data after U’s deletion request and even try to alter them so that U and the TTP cannot trace them.

U’s Attack. U owns the data. The benefit that U can gain from the CSP is compensation. On the one hand, if the CSP is innocent, U can only use the retrieved data and the original data to create a copy. On the other hand, once the CSP has leaked some of U’s data files, U may use them to guess the watermarks of other data files and forge a leak attributed to the CSP.

2.3. Design Goals

This paper aims to design a solution among the CSP, U, and the TTP that allows U to store his data in the CSP’s servers as plain text while retaining remote control according to the contract. In particular, we formally detail the goals as follows.

Data Privacy. As stated in the problem model, the TTP is responsible for generating watermarks for given data but should have no access to the data’s content. Hence, we carefully design our solution so that the TTP can embed the identity watermark directly in the encrypted data.

Nonrepudiation. Any unauthorized copy of the data must be identifiable so that the illegal distributor can be found.

Fairness. The proposed solution is secure and fair to all parties. Nobody can frame an honest party.

The Right to Be Forgotten. Acceptable deletion requires that no information about the data remains in the CSP’s servers. If the CSP does not follow the requirements and U can prove the misbehavior (i.e., an unauthorized copy is detected), U can require the TTP to verify the watermark of the leaked copy and provide the result as evidence that, under the contract, cannot be denied.

3. Preliminaries

Our solution uses four kinds of technology. Each can be adapted to fit all kinds of data (D). To simplify the presentation, we use image data as an example to introduce our solution and carry out our experiments. This section briefly introduces these technologies and the restrictions our solution places on them.

3.1. Feature-Extraction Function (FEF)

FEF is used to identify the content of data without revealing its details. It is an important part of our solution and defines the validity of data for U. The input of FEF is a data file and the output is a feature (Fea). If a data file $A$ and its adjusted copy $A'$ satisfy $\mathrm{FEF}(A) = \mathrm{FEF}(A')$, we call $A'$ a derivative copy (DC) of $A$. The set of all DCs of $A$ is its derivative set (DS).

In our solution, FEF must meet the following requirements.

(1) One-Way Function. For $\mathrm{Fea} = \mathrm{FEF}(A)$, no one can create a DC of $A$ knowing only Fea. This matters because the Fea of the stored data is shared among all three parties in our solution. This property ensures that only U and the CSP can distribute a DC of the data.

(2) Content-Based. Since no digital watermark algorithm can promise to resist all attacks, we use FEF to restrict the watermark extraction algorithm so that our solution balances validity and security. U should carefully select the FEF to make sure that all valuable copies of the original data belong to its DS.

(3) Equiprobability. The set of possible values of Fea must be large enough, and each value must be equally probable. This property protects the efficiency of the searching process.

3.2. Digital Watermark

Digital watermark (W) is a signal embedded into data to identify some attribute of the data (e.g., ownership). According to the embedding domain, digital watermark embedding algorithms are divided into time-spatial embedding, which is fast and relatively easy to operate but easily erased by geometrical attacks, and transform domain embedding [16–19], which resists geometrical attacks well but is fragile against filtering. Moreover, according to the preknowledge about the data required before embedding, we classify embedding methods into preknowledge dependent and preknowledge independent embedding. In most cases, dependent embedding is more robust than independent embedding. We recommend the preknowledge dependent transform embedding method to enhance the security of our solution. Furthermore, our solution requires the digital watermark embedding algorithm to have the following properties.

(1) Markov Property. For a given watermark $W$ of length $n$, the embedding and extracting process of bit $w_i$ has no effect on the process of bit $w_j$ for $i \neq j$.

(2) Predictability. The embedding positions can be determined solely by the number of embeddable positions and the bit length of the watermark.

(3) Robustness. Based on U’s requirements, the watermark algorithm should guarantee that the watermark can be extracted from any member of the DS of the embedded data.

3.3. Homomorphic Encryption (HE)

Encryption is the best-known method in information security. Homomorphic encryption [18] translates some operations on plain text into other operations on ciphertext. In our solution, data must be encrypted during the transferring and embedding process. Because full-HE is slow and costly, we use semi-HE as a compromise between efficiency and security. Our requirements for the semi-HE are listed as follows ($E(\cdot)$ is the encryption function, KEY is the encryption key, and $m$ is the target information).

(1) Addition Homomorphism. Some operation $\oplus$ on ciphertexts corresponds to addition on plain text:
$$E(m_1, \mathrm{KEY}) \oplus E(m_2, \mathrm{KEY}) = E(m_1 + m_2, \mathrm{KEY}). \tag{1}$$

(2) Multimap. The value of each encryption result depends on the random number used each time, so encrypting the same $m$ with different random numbers $r_1 \neq r_2$ gives different ciphertexts:
$$E(m, \mathrm{KEY}, r_1) \neq E(m, \mathrm{KEY}, r_2). \tag{2}$$

4. Solution Framework

Our solution contains three protocols based on a Public-Key Infrastructure (PKI), which distributes a public and private key pair bound to each registered ID. The notation used in the protocols is listed at the end of the paper.

4.1. Uploading Stage

In this subsection, we describe the uploading stage in detail, including the generation and embedding of the watermark.

Before the outsourced data are transferred to the CSP, U needs to embed a watermark into his data, as shown in Figure 1.

All transferred data are encrypted with the CSP’s public key or the TTP’s special key. The details of each process are presented in Figure 2 and introduced in the following steps.

Step 1. U sends the CSP his ID and a service contract to apply for storing and computing on his data. The first contract details the responsibilities and obligations of U and the CSP, as well as the subalgorithms used in the whole solution, including the parameters that the CSP needs to know. The contract is signed with U’s private key to ensure its integrity. If the CSP does not accept the contract, it can reject U’s request and the protocol ends.

Step 2. The CSP sends its ID and the contract back to U, signed with the CSP’s private key. This step means that U’s request has been accepted. The contract has now been signed twice, which ensures that its content has not been changed and allows it to be used as evidence in the future.

Step 3. U selects a watermark algorithm matching the requirements declared before and threshold according to Algorithm 1, which will be signed and attached behind , to embed and into his data, where is the watermark capacity of and calculates the differences between original data and embedded data as and according to (3). TESTD is the data embedded with a random sequence of length , which is used to test the robustness of watermark algorithm; the test round can be done more than 1 time for security purpose. U should make sure that the production of embedding process still belongs to DS().Then, U sends IDs, , FEF(), , and to TTP for recording and generating watermark .

Algorithm 1
Input:
Output: the threshold of watermark matching
  if    then
  return  100
  end
  if  embed   or will change Fea  then
  return  100
  end
  
  
  while  there is attack method has not been tested  do
   AD = attacked TESTD
  
  end

Step 4. TTP generates according to existing data of U that share the same Fea. We present Algorithm 2 as an example for generating watermark here. TTP creates by , , and like Algorithm 3.

Algorithm 2
Input: SW (set of existing watermarks of U with the same Fea), , L (capacity of file)
Output: A new watermark
  
  while  flag  do
   randomly generate a sequence t shorter than . for
All item in SW  do
  if  sumthen
   break
  end
  end
  
  end
  return  

Algorithm 3
Input:
Output:
  n length of W;
  ;
  embedding positions random sequence
from[1,2,...step] of length
  for    to    do
  for    to  step  do
  if    then
   Break;
  end
  else
  if   is in embedding positions  then
  if    then
  
  end
  else
  
  end
  end
  else
   is equal to
  end
  end
  end
  end
  return  

According to Markov property and predictability, Algorithm 3 guarantees that TTP can create an additive watermark based on and . is the same as the difference between original data and its copy embedded with by the selected watermark algorithm. Then, TTP sends the encrypted to U as well as signature. Here, we suggest that TTP use two keys to encrypt . encrypted copy is for embedding, and TTP’s KEY encrypted copy is for erasing in the future which will release the storage burden of TTP.

Step 5. U verifies TTP’s signature to make sure that is valid. Then U uses to encrypt and embeds into according to the addition homomorphism of encryption algorithm as the following proof has shown, which will get the encrypted file (ED) that contains .

Proof.
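The identity behind Step 5 can be sketched as follows. This is only an illustration under assumed notation, not the paper’s own equation: $E(\cdot)$ denotes encryption under the key used for embedding, $\oplus$ the ciphertext operation that corresponds to plain text addition, $D$ the original data, $\mathrm{EW}(D, W)$ the data embedded with watermark $W$, and $\Delta W$ (a hypothetical symbol) the additive watermark created by TTP.

```latex
\begin{aligned}
\Delta W &= \mathrm{EW}(D, W) - D, \\
E(D) \oplus E(\Delta W) &= E(D + \Delta W) = E\bigl(\mathrm{EW}(D, W)\bigr).
\end{aligned}
```

Under these assumptions, decrypting the result yields the watermarked data, while U only ever handles $\Delta W$ in encrypted form.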

U sends encrypted by TTP’s KEY and ED to CSP. CSP decrypts ED and stores it. Then, the uploading stage is finished.

4.2. Downloading Stage

The downloading stage is much simpler than the uploading protocol, for has been stored encrypted in CSP’s servers. The data flows are shown in Figure 3.

All data are still encrypted. The details of the downloading stage are presented in Figure 4 and introduced as follows.

Step 1. U sends and to CSP. contains the requirement of retrieving or deletion which need erasing from ED. U can use FEF() to help CSP and TTP search the exact data that he wants.

Step 2. After verifying U’s signature, CSP sends and along with encrypted to TTP so that TTP can create reversed watermark to erase from ED.

Step 3. TTP verifies all the information stored in his database. If the information is correct, TTP first decrypts and creates the reversed watermark . Then TTP encrypts it by and sends it back to CSP.

Step 4. CSP embeds encrypted into ED and then sends it to U. U decrypts receiving file to get his data according to the following proof.

Proof.
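The erasure in Step 4 can be sketched in the same spirit. The notation is assumed for illustration only: $E(\cdot)$ is encryption under the key used for the download, $\oplus$ is the ciphertext operation that corresponds to plain text addition, $D$ is the original data, and $\Delta W$ (a hypothetical symbol) is the additive watermark, so that the stored watermarked data equal $D + \Delta W$ and the reversed watermark is $-\Delta W$.

```latex
\begin{aligned}
E(D + \Delta W) \oplus E(-\Delta W)
  &= E\bigl((D + \Delta W) + (-\Delta W)\bigr) \\
  &= E(D).
\end{aligned}
```

Under these assumptions, U recovers $D$ after decryption, and CSP never sees the reversed watermark in plain text.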

Once requires CSP to delete U’s data, U cannot download that data in future again, and TTP will create the log of this data and abandon the KEY of .

4.3. Arbitration Stage

When U finds an unauthorized file that belongs to DS(), U can identify the illegal distributor and bring a suit against it.

U should first execute FEF function to get Fea about the leaked data and then provide the , FEF() as well as to TTP. After verifying the information about U, TTP searches the data based on FEF() and to get logs of possible leaked data as set . If is empty, TTP tells U that this data is not recorded in his database. Otherwise, TTP executes according to watermark algorithm and embedding positions of each item of and calculates the bit error ratio (BER) of and as . If there is any below declared in of that data, TTP believes the CSP signed violate U’s RTBF or privacy. TTP will provide a proof with digital signature to U as a legal evidence.
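The matching step of the arbitration stage can be illustrated with a short Python sketch. It assumes that watermarks are sequences of bits and that each candidate log carries its own threshold from the contract; the function names `bit_error_ratio` and `arbitrate`, the tuple layout, and the log identifiers are hypothetical and not part of the protocol specification.

```python
def bit_error_ratio(extracted, recorded):
    """Fraction of positions where two equal-length bit sequences differ."""
    if len(extracted) != len(recorded):
        raise ValueError("watermarks must have the same length")
    errors = sum(a != b for a, b in zip(extracted, recorded))
    return errors / len(recorded)


def arbitrate(candidates):
    """Return (log_id, ber) for every candidate whose BER is below its threshold.

    Each candidate is a tuple (log_id, recorded_bits, extracted_bits, threshold).
    The extracted bits are assumed to come from running the extraction
    algorithm with the embedding positions stored in that log.
    """
    matches = []
    for log_id, recorded, extracted, threshold in candidates:
        ber = bit_error_ratio(extracted, recorded)
        if ber < threshold:
            matches.append((log_id, ber))
    return matches
```

An empty result corresponds to the case in which TTP cannot attribute the leaked copy to any recorded trade.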

5. Solution Analysis

The solution proposed above solves the problem described in the problem model. Its safety relies critically on the security of subalgorithms such as the watermarking and encryption algorithms. In this section, we analyze the properties described in the design goals and the requirements on each party.

5.1. Effectiveness

Our solution can solve the RTBF problem as described. If the CSP wants to violate U’s RTBF, he needs to distribute DC(D) to others. If U finds that copy, he can send it to the TTP and ask for arbitration, and the CSP’s misconduct will be proved. If U wants to defraud an innocent CSP, U must create a copy that belongs to DS(D) and contains W. However, U has no information about W in plain text, so this is technically impossible for U if the encryption algorithm is secure enough.

5.2. Security

The security of our solution rests on the fact that neither U nor the CSP can obtain information about W. We assume that all subprotocols satisfy the properties we require.

CSP possesses embedded data (ED), and encrypted by KEY or . CSP wants to create a copy of ED-, which is impossible if the encryption algorithm is strong enough. Besides, CSP can try to attack ED so that are larger than . In this case, the robustness of watermark algorithm and FEF function is tested. With the help of and FEF, CSP cannot create a useful copy while maintaining the validity of data for distributing.

U possesses , and in plain text, , ED encrypted by , encrypted by KEY. According to the multimap property, U cannot use and to create in polynomial time. Besides, embedding positions will make it harder for both U and CSP to get information about , though it sacrifices the robustness to some extent. Moreover, considering CSP may leak a part of data and be found by U, U can get a message containing . U may try to use it to guess other watermarks. The watermark generation is completely random and each watermark shares different length and embedding positions. The possibility of creating a DC to match the watermark is . Here, we neglect the possibility that a extract watermark can be recognized as two embedded watermarks.In conclusion, our solution can make sure that U and CSP cannot get DC of the other one’s copy. The robustness of watermark is controlled by U according to the FEF function and watermark algorithm.

5.3. Consumption

U. U outsources local data to the CSP to reduce local storage and the cost of complex computing. In our scheme, after uploading the data, U can keep FEF(D) to reduce the cost of searching. U must also perform some computation to encrypt and decrypt data.

TTP. The TTP has enough storage space to keep the records of contracts, IDs, Fea, and watermarks needed for arbitration. In this paper, the TTP is designed with memory and the necessary computing power. The TTP can charge a fee for arbitration requests so that it is not burdened by frivolous requests and can balance its expenses.

CSP. CSP provides large storage space and strong computing power as service. It is reasonable to put the burden of storing outsourced data as well as encrypted on CSP.

6. System Design

In this section, we introduce the experiment used to verify the validity and security of our solution. We choose images as U’s data because they are the most popular kind of data used in outsourcing services. Before introducing the experiment, we first clarify the subalgorithms used in our solution.

6.1. Watermark Algorithm

The watermark scheme used in our experiment is Dither Modulation-Quantization Index Modulation (DM-QIM) [20]. It is a classical watermarking scheme and easy to use. Although it has been shown to be not secure enough [21, 22], it satisfies the requirements we set for the watermark algorithm.

DM-QIM embeds the watermark in the transform domain. It adjusts the values of some coefficients, which constitute the preknowledge, to embed the message according to (7), where step is the quantizer and dither is the dither value.

In the extracting process, DW() operates on the output of EW(). According to (8), different judgments (Jud), which are guesses of the embedded bit made before extraction, lead to different extracting processes and yield different values because of the quantizer. We can add up all DW() outputs for coefficients that embed the same watermark bit to measure whether Jud equals that bit. In either case, we obtain the watermark embedded in the picture.

This watermark scheme embeds 1 or 0 into each selected coefficient as in Algorithm 3, which means that it satisfies the Markov property. In our solution, we first split the image into nonoverlapping 8 × 8 blocks and use the DCT to transform these blocks into the transform domain, so that every coefficient has a well-defined position and the TTP can determine where to embed. Thus, this scheme has predictability.

In our solution, DCT coefficients below 0.4 are chosen for watermarks. Embedding positions are selected according to the value of , where and are the coordinates of the coefficient. In our demo, the step is 100 and the dither is 25.
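The quantization idea can be illustrated on a single coefficient with a minimal Python sketch that uses the demo’s step of 100 and dither of 25. It follows the textbook form of scalar dither modulation, in which bit 0 and bit 1 use lattices shifted by −dither and +dither and extraction picks the nearer lattice; this is an assumption and may differ in detail from (7) and (8). The function names are illustrative.

```python
STEP = 100.0    # quantizer step used in the demo
DITHER = 25.0   # dither used in the demo


def _quantize(value, offset, step=STEP):
    """Nearest point of the lattice {k * step + offset : k integer}."""
    return round((value - offset) / step) * step + offset


def embed_bit(coeff, bit, step=STEP, dither=DITHER):
    """Move the coefficient onto the lattice that encodes `bit`."""
    offset = dither if bit else -dither
    return _quantize(coeff, offset, step)


def extract_bit(coeff, step=STEP, dither=DITHER):
    """Decide which lattice the (possibly attacked) coefficient is closer to."""
    dist0 = abs(coeff - _quantize(coeff, -dither, step))
    dist1 = abs(coeff - _quantize(coeff, +dither, step))
    return 1 if dist1 < dist0 else 0
```

With these values the two lattices are 50 apart, so in this sketch a perturbation smaller than 25 on a coefficient leaves the extracted bit unchanged, while embedding moves a coefficient by at most 50.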

6.2. Encryption

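In our system, we use AES and Paillier [23] as the encryption algorithms that fit our solution’s requirements. AES is a well-known symmetric encryption algorithm [24] that is fast and secure. Paillier is a semi-HE scheme that supports addition on ciphertext according to Algorithm 4, where $L(u) = (u - 1)/n$ and $\lambda = \mathrm{lcm}(p - 1, q - 1)$.

Algorithm 4
Generate $n = pq$, where $p$ and $q$ are both random large primes;
Generate $g$ as a random number of $\mathbb{Z}^{*}_{n^2}$
Public key $(n, g)$, private key $(p, q)$
#encryption:
randomly select $r < n$
$c = g^{m} \cdot r^{n} \bmod n^{2}$
return  $c$
#decryption:
$m = \dfrac{L(c^{\lambda} \bmod n^{2})}{L(g^{\lambda} \bmod n^{2})} \bmod n$
return  $m$
#addition:
return  $c_1 \cdot c_2 \bmod n^{2}$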
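The additive property of Paillier can be tried out with a small Python sketch. It is a toy illustration under stated assumptions, not the demo’s implementation: the primes are tiny and insecure, g is fixed to n + 1 (a common valid choice) instead of being drawn at random, and the private key is stored as the derived pair (λ, μ) rather than (p, q). All function names are hypothetical.

```python
import math
import random


def _L(u, n):
    """The function L(u) = (u - 1) / n used in Paillier decryption."""
    return (u - 1) // n


def keygen(p, q):
    """Build a key pair from two distinct primes (toy sizes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    g = n + 1                                           # assumed simple generator
    mu = pow(_L(pow(g, lam, n * n), n), -1, n)          # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(pub, m, rng=random):
    """Encrypt 0 <= m < n with fresh randomness r coprime to n."""
    n, g = pub
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return (_L(pow(c, lam, n * n), n) * mu) % n


def add(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plain texts modulo n."""
    n, _ = pub
    return (c1 * c2) % (n * n)
```

For example, with `pub, priv = keygen(1009, 1013)`, decrypting `add(pub, encrypt(pub, 1234), encrypt(pub, 4321))` gives 5555. Because r is drawn afresh for every call, two encryptions of the same message generally differ, which is the multimap behavior required in Section 3.3.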

Although Paillier supports addition, it cannot operate on negative numbers or decimals directly. Because the values to be encrypted often include negative numbers and decimals, we suggest that U and the CSP preprocess and restore them as shown in Algorithms 5 and 6.

Algorithm 5
Input: message m, amp (integer amplifier that decides the accuracy of data)
Output: pretreated m, balance
  
  if  contains negative  then
  
  end
  else
  
  end
  for    to    do
  
  end
  return  , balance, amp

Algorithm 6
Input: decrypted dm, amp, balances
Output: message m
  FB=sum(balances)
  for    to    do
  
  end
    return  
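One way to read Algorithms 5 and 6 is sketched below in Python. It assumes that the pretreatment multiplies each value by amp, rounds to an integer, and, if any value is negative, adds a common offset (the balance) so that everything is nonnegative; the restoration subtracts the sum of all balances involved and divides by amp. The exact expressions in the algorithms are not reproduced here, and the names `pretreat` and `restore` are hypothetical.

```python
def pretreat(values, amp):
    """Scale to integers and shift so that no value is negative."""
    scaled = [round(v * amp) for v in values]
    lowest = min(scaled)
    balance = -lowest if lowest < 0 else 0
    return [s + balance for s in scaled], balance, amp


def restore(decrypted, amp, balances):
    """Undo the shifts of all summed operands, then undo the scaling."""
    final_balance = sum(balances)
    return [(d - final_balance) / amp for d in decrypted]


# Plain integer addition stands in for the homomorphic addition here.
data, b_data, amp = pretreat([12.5, 200.0], 100)
delta, b_delta, _ = pretreat([-3.25, 1.5], 100)
summed = [x + y for x, y in zip(data, delta)]
recovered = restore(summed, amp, [b_data, b_delta])
```

In this example `recovered` equals `[9.25, 201.5]`, the element-wise sum of the two original lists, although only nonnegative integers were added.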

Besides, and may leak some information about the image; we recommend that U adds a mark to and which can be subtracted after decrypting .

6.3. Perceptual Hash Algorithm (PHA)

We choose PHA as the FEF because it reflects the content of an image through its low-frequency coefficients; it is also used in search engines [25]. Algorithm 7 shows the pHash that we used as the FEF.

Algorithm 7
Input: I
Output: hash
  
  =DCT()
  sum=0
  for    to  8  do
  
  
  end
  mid=
  for    to  8  do
  
  end
  return  
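A DCT-based perceptual hash of this kind can be sketched in pure Python as follows. The sketch assumes that the caller has already converted the image to a square grayscale matrix (for example 32 × 32) given as nested lists, uses an unnormalized DCT-II, and thresholds the 8 × 8 low-frequency block at its mean; the resizing step and the exact threshold of Algorithm 7 are not reproduced, and the function names are illustrative.

```python
import math


def dct_2d(block):
    """Unnormalized 2-D DCT-II of a square matrix given as nested lists."""
    n = len(block)
    # basis[u][x] = cos(pi * (2x + 1) * u / (2n)), computed once.
    basis = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
             for u in range(n)]

    def dct_1d(vec):
        return [sum(v * b for v, b in zip(vec, basis[u])) for u in range(n)]

    rows = [dct_1d(row) for row in block]                  # transform every row
    cols = [dct_1d([rows[y][u] for y in range(n)])         # then every column
            for u in range(n)]
    # cols[u][v] holds horizontal frequency u and vertical frequency v.
    return [[cols[u][v] for u in range(n)] for v in range(n)]


def perceptual_hash(gray, low=8):
    """Return low * low bits: 1 where a low-frequency coefficient exceeds the mean."""
    coeffs = dct_2d(gray)
    low_freq = [coeffs[v][u] for v in range(low) for u in range(low)]
    mean = sum(low_freq) / len(low_freq)
    return [1 if c > mean else 0 for c in low_freq]
```

Because only the relative size of the low-frequency coefficients matters, uniformly scaling the brightness of an image leaves this hash unchanged, which is the kind of content-based behavior the FEF is meant to provide.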

7. Tolerance about System

We first evaluate the tolerance of the system. We assume that U has uploaded a large number of images to the CSP, all registered with the TTP. One of his images, whose deletion has been required, is attractive enough that the CSP wants to distribute it for profit. The CSP needs to erase the watermark embedded in the image while knowing nothing about the watermark algorithm, so it can only use basic functions to attack it. Geometric attacks are not considered, because in most cases the user can recover the image and obtain information about the watermark, which is dangerous for the CSP: U can use this information to create a copy of ED easily. We consider three types of attack, JPEG compression (JC), Gaussian filter (GF), and White Gaussian Noise (WGN), to represent lossy compression, filtering, and noise attacks in the following parts.

We use peak signal-to-noise ratio (PSNR) and bit error ratio (BER) as two indicators to evaluate the performance of our solution. In this section, we evaluate DCs of the watermarked Lena image produced by each attack with different parameters. For comparison, the PSNR of the retrieved image (Figure 5) remains 313 dB in our solution.
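For reference, PSNR is conventionally computed from the mean squared error between two images. The following minimal Python sketch assumes 8-bit grayscale images of equal size given as nested lists; the function name `psnr` and the peak value of 255 are illustrative assumptions and are not taken from the demo.

```python
import math


def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size images."""
    flat_o = [p for row in original for p in row]
    flat_d = [p for row in distorted for p in row]
    if len(flat_o) != len(flat_d):
        raise ValueError("images must have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_d)) / len(flat_o)
    if mse == 0:
        return math.inf          # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

A higher PSNR means the attacked or retrieved image is closer to the reference, so attacks that keep the BER high usually pay for it with a low PSNR.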

7.1. JPEG Compression Test

JPEG compression is one of the most popular compression methods; it keeps the main information in a smaller size. We examine whether the CSP can illegally distribute a compressed version of the data.

Figure 6 shows that the BER decreased rapidly as the quality factor (QF) grows. When QF is 5, which is not a normal choice for compression, Fea of the attacked picture (Figure 8) has changed. This means that our solution can be against the JPEG compression if .

7.2. Gaussian Filter

Filter is the riskiest attack for DM-QIM, since it erases the details within each block of selected scale by adjusting DCT coefficients. As grows, picture will become more and more smooth. The mid one in Figure 8 is attacked by GF with scale = 8 and . It suggests that PHA we have used is not the best way to represent the content of image.

Table 1 and Figure 7 show the PSNR and BER affected by GF in different scales and . We can notice that GF with scale of 2 has no risk to our solution. When is close to 1, BER of attacked image decreases to nearly 0 and the PSNR grows. We consider this as a kind of tolerance to GF. As grows continually, BER grows rapidly and the watermark and the detail of picture are erased.

To resist these attacks, we can change the watermark algorithm or enlarge the step and the dither, which introduces more noise into the embedded picture. This is purely a trade-off between security and the validity of data. The restriction on the threshold in this experiment is 30%.

7.3. White Gaussian Noise (WGN)

Noise is another kind of attack, which quickly decreases the PSNR of an image. We use Gaussian noise to attack our picture. GF and compression erase details of images, which helps the attacker reduce the noise introduced by the watermark and raises the PSNR of the picture to some degree. WGN, however, introduces more noise into the picture (Figure 8) to cover the watermark, which decreases the PSNR quickly and changes the Fea.

Table 2 shows how the BER changes under WGN. The PSNR of Figure 8 is 15.88, which suggests that the threshold should be at least 26% to defend against WGN to some extent in our solution.

8. Capacity of System

In this section, we take the arbitration stage into consideration. We assume that U finds a picture Y that may be a DC of his deleted data. Because U may not have backed up his data, U uploads Y, FEF(Y), and his ID to the TTP so that the TTP can determine which CSP may have leaked his data. In this experiment, we test the capacity of our solution, that is, the number of watermarks that can be embedded into a user’s data with the same Fea. There are several ways to generate a secure watermark [26]. For simplicity, we use Algorithm 8. We call the output of Algorithm 8 a fixed Hamming distance code: the Hamming distance between any two of its elements is no less than the input limit. A fixed Hamming distance code allows us to identify the log of the data as well as its contracts, and the watermark can withstand the attacks that U has tested according to Algorithm 1.

Algorithm 8
Input: T (minimum distance ratio), L (length of watermark)
Output: S (set of watermarks)
  S = empty set
  threshold = T × L
  while  not reach time-limit  do
   randomly generate a temp watermark t of length L
   flag = 1
   for  each w in S  do
    if  sum(XOR(t, w)) < threshold  then
     flag = 0
     break
    end
   end
   if  flag  then
    Add t into S
   end
  end
  return  S
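The greedy construction of Algorithm 8 can be written as a short Python sketch. To keep the result reproducible, it replaces the time limit with a fixed number of attempts and uses a seeded random generator; both are assumptions of this sketch, as are the names `hamming` and `generate_code` and the interpretation of the first input as a ratio of the watermark length.

```python
import random


def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def generate_code(ratio, length, attempts=2000, seed=0):
    """Greedily collect random watermarks that are pairwise far apart.

    A candidate is kept only if its Hamming distance to every accepted
    watermark is at least ratio * length.
    """
    rng = random.Random(seed)
    min_distance = ratio * length
    code = []
    for _ in range(attempts):
        candidate = tuple(rng.getrandbits(1) for _ in range(length))
        if all(hamming(candidate, w) >= min_distance for w in code):
            code.append(candidate)
    return code
```

Two random 128-bit strings differ in about 64 positions on average, so a 30% limit (about 38 bits) rejects few candidates while a 40% limit (about 51 bits) rejects many more, which is consistent with the drop in capacity reported in Section 8.1.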

The final capacity of a TTP will be calculated by (9) where cap(D) means the span of D’s value.

8.1. Result

We first use 30% as the threshold, 10 seconds as the time limit, and 128 bits as the watermark length, and obtain at least 1000 watermarks. We select the 500th watermark as the embedded watermark and apply the attacks listed in Table 3.

The results are presented in Figure 9 (one axis represents BER and the other represents the sequence number of images). They suggest that our protocol can identify the specific data within our database and charge the CSP successfully under the predicted attacks. The fixed Hamming distance code ensures that the robustness of the protocol is determined only by the watermark algorithm and the encryption method. The third picture in each row of Figure 9 shows that if the picture is overattacked, we cannot determine its source from the watermark.

In addition, we raise the threshold to 40%, which quickly decreases the capacity: we obtain only 60 watermarks within 10 seconds. We select the 30th watermark for embedding and run the same tests. The results are shown in Figure 10 (one axis represents BER and the other represents the sequence number of images). They suggest that raising the threshold is not an economical way to increase the robustness of the watermark algorithm, because it greatly reduces the number of watermarks.

9. Conclusion

In this paper, we propose a Cloud-User protocol as a technical solution to the Right to Be Forgotten problem. Our solution supports confirmed deletion of plain data stored in the CSP’s servers. To achieve the security goals, it combines existing homomorphic cryptography, watermark techniques, fixed Hamming distance coding, and content-based feature extraction so that an innocent party does not suffer losses from the other party’s attack. We implement a prototype of our solution to demonstrate its availability and practicality.

10. Future Work

For future work, several aspects are worth considering. First, the algorithms used in the prototype are not the best fit for our solution. Choosing a better encryption algorithm and watermark scheme may decrease the communication and computation costs for U and the TTP.

Second, a better FEF can help protect U’s rights and raise the robustness of our protocol. We regard the combination of FEF and watermark scheme as the most challenging question for our solution.

Third, every retrieval of data is costly for all three parties. Designing a better retrieval protocol can raise the efficiency of our solution.

Finally, since a large amount of data is stored as plain text in the CSP’s servers, how to provide a content-based preview of the data, as existing systems do [27], at low cost and without leaking information about the watermarks remains to be solved.

Notations

CSP:Cloud Service Provider
U:User
TTP:Trusted Third Party
D:Data
W:Digital watermark
:Derivative copy of
:Derivative set of
:Watermark extraction function that extracts watermark from
:Watermark embedding function that embeds into
:Feature of
:The th round contract
:Identity of
:Key for symmetric cryptography
:Public key of
:Private key of
:A string generated by combination of that has length
:Decryption function for while key is KEY/PUB
:Encryption function for while key is KEY/PUB
:The difference between and as .

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the Natural Science Foundation of China under Grants U1636201 and 61572452.