Abstract

Outsourcing data to clouds is being adopted by more and more companies and individuals because of the benefits of data sharing and parallel, elastic, and on-demand computing. However, it forces data owners to give up control of their own data, which raises privacy concerns for sensitive data. Sorting is a common operation in many areas, such as machine learning, service recommendation, and data query. Sorting encrypted data without leaking sensitive information is a challenge. In this paper, we propose privacy-preserving sorting algorithms based on the logistic map. Secure comparable codes are constructed by logistic map functions and can be used to compare the corresponding encrypted data items without knowing their plaintext values. Data owners first encrypt their data, generate the corresponding comparable codes, and then outsource both to clouds. Cloud servers can then sort the outsourced encrypted data according to the comparable codes by using the proposed privacy-preserving sorting algorithms. Security analysis and experimental results show that the proposed algorithms protect data privacy while providing efficient sorting on encrypted data.

1. Introduction

With the benefits of data sharing and parallel, elastic, and on-demand computing, clouds are becoming more and more popular with companies and individuals. Cloud service providers (CSPs) offer many kinds of services, such as Amazon EC2 and Alibaba Cloud. As one of the most important technologies, machine learning is very useful and widely adopted in many areas, such as prediction [1, 2] and multimedia data processing [3, 4]. It usually relies on huge volumes of data, such as wireless multimedia data and human health data, to build intelligent models and systems for practical applications. Because such data need large and elastic storage and computing resources, they are usually processed in clouds [5–7]. Data owners (DOs) outsource their data to the cloud server (CS) for on-demand services, which improve the efficiency of complex computations such as machine learning and save hardware and software costs.

However, in the cloud environment, DOs lose direct control of their own data placed in the remote CS, which raises concerns that the outsourced data may be illegally acquired or abused by CSPs, especially for sensitive data such as national defence data and human health data. Although many CSPs claim that they have deployed several safety measures in CS, such as access control, firewalls, or intrusion detection, doubts about the privacy of outsourced data obstruct the promotion and application of cloud computing. How to preserve the security and privacy of DOs' outsourced data while CS provides reliable and efficient computing services has become a hot issue [8–10].

Data encryption is a common technique to protect the privacy of outsourced data on clouds, such as sensitive wireless multimedia data and human health data. Sorting is one of the basic operations in practical applications, such as machine learning, service recommendation, and data query. However, sorting encrypted data on clouds without leaking private information is a challenge. The existing privacy-preserving sorting algorithms based on order-preserving encryption (OPE) [11–14] have security problems [15]. In addition, privacy-preserving sorting algorithms based on fully homomorphic encryption (FHE) [16–19] are too slow because of the complexity of FHE. It is therefore important to study efficient privacy-preserving sorting algorithms for clouds.

In this paper, we assume that the honest-but-curious threat model [20] is adopted where CS strictly abides by established protocols but has the curiosity to snoop on DO’s private data. On the basis of the threat model, we propose privacy-preserving sorting algorithms based on the logistic map. The main contributions of this paper are as follows. Firstly, by introducing the logistic map, we propose a secure comparison model which can be utilized to compare data without knowing their real values. Secondly, we give a data preprocessing algorithm. Data owners preprocess their private data by a symmetric encryption and logistic map. The encrypted data and corresponding comparable codes are generated and then outsourced to clouds, where the former is to protect data privacy, while the latter is to support secure comparisons. Finally, on the basis of secure comparison model, we propose privacy-preserving sorting algorithms for clouds. Also, security analysis and performance experiment are given, where results show that the proposed algorithms can protect data privacy from curious cloud administrators while providing efficient sorting on encrypted data.

The paper is organized as follows. Section 2 describes the related work. Section 3 gives the problem descriptions. Section 4 gives notations and necessary preliminaries. In Section 5, we firstly present secure comparison model on the basis of the logistic map, and then the data preprocessing algorithm and privacy-preserving sorting algorithms are given. Section 6 analyzes the security of our schemes. Section 7 gives evaluations on correctness, correlation coefficient, and performance of our proposed schemes.

2. Related Work

There are two kinds of methods for sorting encrypted data on clouds: sorting algorithms based on order-preserving encryption (OPE) [11–14] and sorting algorithms based on fully homomorphic encryption (FHE) [16–19].

Agrawal et al. [11] originally proposed an OPE method, a deterministic encryption scheme whose encryption function preserves the numerical ordering of the plaintext. Because indistinguishability against chosen-plaintext attack (IND-CPA) is unachievable in [11], Boldyreva et al. proposed an efficient OPE scheme [12] that is based on a natural relation between a random order-preserving function and the hypergeometric probability distribution. Jaiman et al. [13] proposed an OPE algorithm that introduces shuffling, impurity insertion, and randomness into order-preserving functions. Liu et al. [14] proposed a new OPE model that uses message space expansion and nonlinear space split to hide data distribution and frequency. Any OPE scheme is suitable for privacy-preserving sorting in clouds if data security is ensured. However, OPE is vulnerable to ciphertext-only attacks [15], especially when the encrypted data are massive. Therefore, sorting algorithms based on OPE have potential security risks.

Gentry et al. [21] proposed fully homomorphic encryption (FHE), a special kind of encryption that allows computation (such as addition and multiplication) on the ciphertext. Melchor et al. [16] gave an idea for sorting encrypted data by FHE. Chatterjee et al. [17] proposed a sorting algorithm over encrypted data on the basis of FHE and tried to achieve higher sorting efficiency by reducing the cost of reencryption. Afterwards, they applied the algorithm to clouds [18, 19]. The volume of encrypted data generated by FHE is very large, because it includes big floating-point numbers that occupy a great deal of storage space. Thus, the comparison operation for sorting based on FHE is very complex and slow. Since FHE-based sorting requires CS to reencrypt data frequently, it is not suitable for storing and managing big data on clouds.

To support efficiency and privacy in sorting algorithms for cloud environments, we propose logistic map based privacy-preserving sorting algorithms in this paper, the abstract of which has been shown in [22].

3. Problem Description

The model of the privacy-preserving sorting for clouds, proposed in this paper, is similar to recent works [18, 19]. It mainly consists of two entities, data owner (DO) and cloud server (CS). The interactions between DO and CS are introduced as follows: firstly, DO encrypts its sensitive data and generates corresponding codes which are used for privacy-preserving sorting. Then DO outsources the encrypted data and codes to CS. Secondly, CS stores the data uploaded by DO and performs data sorting over the received data. Any proved secure symmetric encryption could be adopted, such as DES and AES. If authorized users want to access DO’s sensitive data, they can get the encrypted data from CS and perform decryption to obtain the plaintext data by using the shared key with DO.

In this paper, we assume that CS provides services under the honest-but-curious threat model [20]. CS is assumed to strictly follow the established protocols, but it attempts to snoop on DO's private data. There are two kinds of attacks: (1) CS already knows DO's preprocessing algorithms but does not know their initial parameters, and it tries exhaustive attacks against the encrypted data to obtain plaintext information; (2) because of the massive quantity of outsourced encrypted data, CS can use statistical attacks to analyze the distribution of the ciphertext and infer the relationship between ciphertext and plaintext in order to obtain plaintext information.

We focus on privacy-preserving sorting for clouds, and the key issues of this paper are as follows: (1) privacy protection of outsourced data: DO preprocesses its plaintext data to keep it confidential, so that CS cannot obtain DO's plaintext information from the outsourced data; (2) privacy-preserving sorting on encrypted data: if a symmetric encryption algorithm is adopted, privacy is guaranteed, but the encrypted data are hard to sort. Therefore, privacy-preserving sorting algorithms have to support sorting on encrypted data without knowing the plaintext values.

4. Preliminaries and Notations

4.1. Preliminaries

Chaos theory [23, 24] originated in the 1960s and has been widely adopted in medicine, astrophysics, image encryption, and hydromechanics. The basic characteristic of chaotic motion is its extreme sensitivity to the initial value. The difference between two chaotic motions with different initial values becomes larger and larger over time. Therefore, even under given initial conditions, the long-term behavior of chaotic motion is unpredictable in practice.

Logistic map [25–27] is one of the important and practical chaotic systems and has been widely used in data encryption [28–32]. The equation of the logistic map is

$$x_{n+1} = \mu x_n (1 - x_n), \qquad (1)$$

where $x_n \in (0, 1)$, $\mu$ is called the bifurcation parameter, and $n$ is the iteration number. Studies [25–27] show that the sequence generated from (1) is chaotic if $3.5699456 < \mu \le 4$. The output of such a logistic map is extremely sensitive to the initial parameters. Any minor change of the initial parameters leads to a tremendous difference in the outputs. Therefore, the sequences generated by the logistic map are unpredictable.
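As a minimal sketch of (1), the following Python function iterates the logistic map and prints how far two nearby initial values drift apart. The function name `logistic_map`, the value 3.99 for the bifurcation parameter, and the initial value 0.3 are illustrative assumptions and are not taken from the authors' implementation.

```python
def logistic_map(x0, mu, n):
    """Return the n-th iterate of x_{k+1} = mu * x_k * (1 - x_k), starting from x0."""
    x = x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
    return x


if __name__ == "__main__":
    mu = 3.99  # assumed value inside the chaotic range
    a = logistic_map(0.3, mu, 50)
    b = logistic_map(0.3 + 1e-10, mu, 50)  # tiny perturbation of the initial value
    print(a, b, abs(a - b))
```

Running the demo with a larger number of iterations typically shows the two trajectories becoming unrelated, which is the sensitivity property described above.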

4.2. Notations

The notations used in this paper are described as shown in Table 1.

5. Privacy-Preserving Sorting Algorithms

5.1. Secure Comparison Model Based on Logistic Map

The chaotic characteristic of the logistic map can be used to compare data values secretly under certain conditions. Current work such as [28–32] mainly focuses on data encryption with logistic map algorithms, but no literature discusses the secure comparison of encrypted data with the logistic map.

If we use the logistic map for data comparison directly, we may get wrong comparison results during sorting. But if proper constraint factors are introduced into the logistic map, we always get correct comparison results. The main idea of the proposed secure comparison model based on the logistic map is given by the following three lemmas.

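Lemma 1. For any given data $x$, where $x > 1$, $t/x$ is set as the initial value for the logistic map function, i.e., (1), where $t$ is the constraint factor, $0 < t < 1/(2\mu^n)$, and $3.5699456 < \mu \le 4$. Then we have $0 < L(t/x, n) < 1/2$, where $L(t/x, n)$ denotes the $n$th iteration result of (1).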

Proof. We use mathematical inductions to prove Lemma 1 as follows.
(1) When , according to (1), we haveAccording to the given assumptions and , we have , and then is deduced.
(2) When , according to (1), we haveAccording to the conclusion of (1), we have 1/2 < 1-L(t/x, 1) < 1 and . In addition, because of , then we deduce thatAccording to the given assumption when and the deduced result in (3), we deduce (6) from (5), whereDue to , then we have that holds.
(3) We assume that Lemma 1 holds when , i.e., . According to the assumption /(2·μn) and the deduced result in (3), we haveWhen , according to (1), we haveAccording to the assumption of (3), i.e., , we haveOn the basis of (7), (8), and (9), we haveTherefore, we have that holds.
According to the above mathematical induction proofs, we have that Lemma 1 holds.

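Lemma 2. For any given data $x_i$ and $x_j$, where $x_i > x_j > 1$, let $t/x_i$ and $t/x_j$ be the initial values for the logistic map function, i.e., (1), and let the $n$th iteration results be $L(t/x_i, n)$ and $L(t/x_j, n)$, respectively. Then we have $L(t/x_i, n) < L(t/x_j, n)$, where $0 < t < 1/(2\mu^n)$, $3.5699456 < \mu \le 4$, and $n \ge 1$.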

Proof. We also use mathematical inductions to prove Lemma 2 as follows.
(1) When , according to (1), we haveandAfter applying subtraction between and , we have Since and /(2·μ) are given conditions when , we can easily deduceTherefore, holds.
(2) When , according to (1), we have andAfter applying subtraction between and , we haveIn accordance with the given conditions , /(2·μn), and , we haveThen we have and According to (18), (20), and (21), we deduce thatTherefore, holds.
(3) We assume that Lemma 2 holds when ; then we haveWhen , according to (1), we haveand After applying subtraction between and , we haveAccording to Lemma 1, we have < 1/2 and < 1/2; then we deduceAccording to (7), (23), and (27) and the given condition , then we haveTherefore, holds.
According to the above mathematical induction proofs, we have that Lemma 2 holds.
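The bound and the ordering behind Lemmas 1 and 2 can also be summarized in a few lines. The following is a compact sketch of one possible argument, not the authors' derivation. It assumes that $L(a, k)$ denotes the $k$th iterate of (1) started from $a$, and that $x > 1$, $\mu > 1$, and $0 < t < 1/(2\mu^n)$.

```latex
% Bound (cf. Lemma 1): for every 0 \le k \le n,
0 < L(t/x, k) \le \mu^{k}\,\frac{t}{x} \le \mu^{n}\,\frac{t}{x} < \frac{1}{2x} < \frac{1}{2}.

% Induction step: if 0 < y = L(t/x, k) < 1/2, then 0 < 1 - y < 1, hence
0 < L(t/x, k+1) = \mu\, y\,(1 - y) < \mu\, y \le \mu^{k+1}\,\frac{t}{x}.

% Ordering (cf. Lemma 2): for iterates a = L(t/x_i, k) and b = L(t/x_j, k) in (0, 1/2),
\mu a (1 - a) - \mu b (1 - b) = \mu\,(a - b)\,(1 - a - b), \qquad 1 - a - b > 0,

% so each iteration preserves the sign of a - b, and therefore
x_i > x_j \;\Rightarrow\; \frac{t}{x_i} < \frac{t}{x_j} \;\Rightarrow\; L(t/x_i, n) < L(t/x_j, n).
```

In words, the constraint on $t$ keeps every iterate in the interval where the map is increasing, so the order of the initial values is carried through all $n$ iterations.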

Definition 3. For a given data item $x$, $c = L(t/x, n)$ is denoted as the corresponding comparable code of $x$, where $L$ is the logistic map function as (1).

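Lemma 4. For a given data set $X = \{x_1, x_2, \ldots, x_m\}$, where $x_i \in X$ and $x_i > 1$, we can get the corresponding comparable codes set $C = \{c_1, c_2, \ldots, c_m\}$, where $c_i = L(t/x_i, n)$. Then we have

$$x_i > x_j \Leftrightarrow c_i < c_j,$$

where $x_i, x_j \in X$ and $c_i, c_j \in C$.

Proof. To prove Lemma 4, we have to prove its sufficiency and necessity, respectively, i.e., $x_i > x_j \Rightarrow c_i < c_j$ and $c_i < c_j \Rightarrow x_i > x_j$.
(Sufficiency) According to Lemma 2, we can easily deduce $c_i < c_j$ when $x_i > x_j$, where $c_i = L(t/x_i, n)$ and $c_j = L(t/x_j, n)$. The sufficiency of Lemma 4 is proved.
(Necessity) We prove the necessity of Lemma 4 by contradiction. Assume that $x_j > x_i$ holds when $c_i < c_j$, where $c_i = L(t/x_i, n)$ and $c_j = L(t/x_j, n)$; then we have $L(t/x_j, n) < L(t/x_i, n)$ according to Lemma 2, i.e., $c_j < c_i$. This is inconsistent with the given hypothesis $c_i < c_j$. The case $x_i = x_j$ is excluded as well, because it gives $c_i = c_j$. Therefore, if we have $c_i < c_j$, where $c_i = L(t/x_i, n)$ and $c_j = L(t/x_j, n)$, then $x_i > x_j$ holds.
In accordance with the sufficiency and necessity proofs, we have that Lemma 4 holds.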

According to Lemma 4, the comparable code is order-reversing: it decreases as the input data increases. For any two real numbers both larger than 1, we can compare them by comparing their corresponding comparable codes. Such a comparison does not need to know their real values. If the given real numbers are less than 1, they are still comparable by using our proposed secure comparison model based on the logistic map. For example, if they are less than -1, the corresponding absolute values will be bigger than 1. And if they are between -1 and 1, adding the constant 2 gives a result that is also bigger than 1. Therefore, any two real numbers can be compared. As a result, the proposed secure comparison model based on the logistic map is capable of performing data comparison without knowing the corresponding values. For convenience of description, we focus on data larger than 1 in the subsequent sections.
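The order-reversing behaviour can be checked numerically. The sketch below is an illustration under stated assumptions: the names `comparable_code` and `compare_by_code` are hypothetical, the bifurcation parameter 3.67435 is the one quoted in Section 7.1, and the iteration count 5 and the particular choice of the constraint factor are assumed for the demo. It only covers data larger than 1.

```python
def comparable_code(x, t, mu, n):
    """Iterate the logistic map n times from the initial value t / x (requires x > 1)."""
    y = t / x
    for _ in range(n):
        y = mu * y * (1.0 - y)
    return y


def compare_by_code(c_a, c_b):
    """Compare two hidden values a and b using only their comparable codes.

    Returns -1 if a < b, 1 if a > b, and 0 if the codes are equal.
    A larger code corresponds to a smaller plaintext value.
    """
    if c_a > c_b:
        return -1
    if c_a < c_b:
        return 1
    return 0


if __name__ == "__main__":
    mu, n = 3.67435, 5             # n = 5 is an assumed iteration count
    t = 0.5 / (2 * mu ** n)        # any value in (0, 1 / (2 * mu**n)) would do
    for a, b in [(3, 7), (120.5, 120.4), (42, 42)]:
        result = compare_by_code(comparable_code(a, t, mu, n),
                                 comparable_code(b, t, mu, n))
        print(a, b, result)
```

The comparison function never sees the plaintext values or the parameters, only the two codes, which is the property the sorting algorithms in the following sections rely on.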

On the basis of secure comparison model, the privacy-preserving sorting mechanism is proposed in the next sections, including the data preprocessing algorithm and privacy-preserving sorting algorithm. The brief flowchart of our proposed work is shown in Figure 1.

5.2. Data Preprocessing Algorithm

DO preprocesses its outsourced data with encryption and logistic map in order to protect private data from CS and support privacy-preserving sorting in CS. We use a symmetric encryption algorithm such as DES and AES to preserve data privacy and the logistic map is utilized to generate comparable codes for secure comparison.

We assume that the outsourced data of DO are . After data preprocessing, a data pair = (e, c) will be generated for , where and are the corresponding encrypted data and comparable code of . In addition, we assume that is a private key, while and are bifurcation parameter and number of iterations of the logistic map function, respectively. And , , and are all owned by DO privately. The data preprocessing algorithm based on the logistic map (DP-LM) is shown in Algorithm 1.

Begin
(1) ;
(2) FOR    DO
(3)  Create a data pair for ;
(4)  ;  //generate encrypted data
(5)  ;  //generate comparable code
(6) END FOR
(7) Upload and outsource to CS;
END

In Algorithm 1, is to randomly pick a float number between 0 and , is to get the minimum of the set , is to encrypt with private key by a symmetric encryption, and is a logistic map function as (1). After finishing data preprocess, the generated data pairs will be uploaded and outsourced to CS.
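The preprocessing step can be sketched in Python as follows. This is a hedged illustration rather than a transcription of Algorithm 1: the names `dp_lm`, `comparable_code`, and `toy_encrypt` are hypothetical, the constraint factor is drawn from the interval required by the lemmas (0 to 1/(2·μ^n)) instead of the exact bound used in Algorithm 1, and `toy_encrypt` is an insecure placeholder standing in for a real symmetric cipher such as AES from a vetted library.

```python
import random


def comparable_code(x, t, mu, n):
    """Iterate the logistic map n times from the initial value t / x."""
    y = t / x
    for _ in range(n):
        y = mu * y * (1.0 - y)
    return y


def toy_encrypt(value, key):
    """Placeholder only, NOT secure: XOR the text form of value with the key bytes."""
    data = repr(value).encode("utf-8")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def dp_lm(data, key, mu, n, encrypt=toy_encrypt, rng=random):
    """Return (t, pairs), where pairs holds one (ciphertext, comparable code) per item.

    The data owner keeps key, mu, n, and t private and uploads only the pairs.
    """
    if any(x <= 1 for x in data):
        raise ValueError("this sketch assumes every data item is larger than 1")
    bound = 1.0 / (2.0 * mu ** n)
    t = rng.uniform(0.0, bound)
    while not 0.0 < t < bound:      # uniform() may return an endpoint
        t = rng.uniform(0.0, bound)
    pairs = [(encrypt(x, key), comparable_code(x, t, mu, n)) for x in data]
    return t, pairs


if __name__ == "__main__":
    secret_t, outsourced = dp_lm([15.2, 3.0, 1200.0], b"demo-key", 3.67435, 4)
    for ciphertext, code in outsourced:
        print(ciphertext.hex(), code)
```

Passing the cipher as a parameter keeps the sketch independent of any particular encryption library; only the comparable codes matter for the sorting step.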

5.3. Privacy-Preserving Sorting Algorithm

Privacy-preserving sorting is performed in CS after receiving the outsourced data from DO. Obviously, traditional sorting algorithms (e.g., merge sort, quicksort, and heap sort) cannot solve the problem of sorting encrypted data items, but by introducing the proposed secure comparison model, the encrypted data will be sorted by using the corresponding comparable codes.

We give the privacy-preserving quick sorting algorithm based on the logistic map (PQS-LM) for sorting over encrypted data in CS. The specific implementation of PQS-LM is shown in Algorithm 2.

Begin
(1) IF  start < end  THEN
(2) = start, = end +1;
(3)WHILE TRUE DO
(4)WHILE    DO
(5)++;
(6)END WHILE
(7)WHILE    DO
(8)--;
(9)END WHILE
(10)IF    THEN
(11);
(12)ELSE
(13)Finish the current loop and start the next loop;
(14)END IF
(15)END WHILE
(16);
(17)PQS-LM(, start, j-1);
(18)PQS-LM(, , end);
(19) END IF
END

In Algorithm 2, the swap operation exchanges the positions of two elements. The comparable codes are compared during the sorting procedure, and the number of comparisons determines the efficiency of sorting. The cost of one comparison in our proposed model is equivalent to that of a plaintext comparison. Therefore, the time complexity of PQS-LM is $O(m \log m)$ on average, where $m$ is the number of data items.
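The idea of sorting by comparable codes can be sketched as a small in-place quicksort. This is an illustrative variant and not a line-by-line transcription of Algorithm 2: the function name `pqs_lm`, the representation of each item as a `(ciphertext, code)` tuple, and the last-element pivot rule are assumptions. The ciphertexts are never inspected; only the codes are compared.

```python
def pqs_lm(pairs, start=0, end=None):
    """Sort (ciphertext, code) pairs in place so that the hidden plaintexts ascend.

    Because a larger code means a smaller plaintext, the codes end up in
    descending order.
    """
    if end is None:
        end = len(pairs) - 1
    if start >= end:
        return
    pivot = pairs[end][1]
    i = start
    for j in range(start, end):
        if pairs[j][1] > pivot:          # larger code == smaller plaintext
            pairs[i], pairs[j] = pairs[j], pairs[i]
            i += 1
    pairs[i], pairs[end] = pairs[end], pairs[i]
    pqs_lm(pairs, start, i - 1)
    pqs_lm(pairs, i + 1, end)


if __name__ == "__main__":
    # 1.0 / x is used here only as a stand-in for a decreasing comparable code.
    demo = [("ct(%d)" % x, 1.0 / x) for x in (7, 3, 11, 2, 5)]
    pqs_lm(demo)
    print([label for label, _ in demo])
```

The same substitution of the comparison step applies to merge sort and heap sort, since they also interact with the data only through pairwise comparisons.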

Other classic sorting algorithms, such as merge sort and heap sort, can also be adapted into corresponding privacy-preserving sorting algorithms in the same way as Algorithm 2 on the basis of the proposed secure comparison model. We denote the privacy-preserving merge sort and heap sort as PMS-LM and PHS-LM, respectively. Because the implementation ideas are similar to PQS-LM, we omit the details of those algorithms. The analysis and performance evaluations of our proposed privacy-preserving sorting algorithms are given in the following sections.

6. Security Analysis

There are two types of data outsourced in CS. One is the encrypted data generated by a symmetric encryption, and the other is the comparable code generated by the logistic map. The former is to protect data privacy, while the latter is to support secure comparisons. For encrypted data, it has an identical security level with the adopted symmetric encryption. For comparable codes generated by the logistic map, we conduct security analysis as follows.

(1) Space of initial parameters: the data preprocessing algorithm in this paper is based on a logistic chaotic system. The corresponding parameters are initialized before preprocessing, including the number of iterations $n$, the constraint factor $t$, and the bifurcation parameter $\mu$. We assume that the attacker uses an exhaustive attack against the initial parameters. The precision of $t$ and $\mu$ is assumed to be $10^{-p}$ and $10^{-q}$, respectively. The space of initial parameters is $10^{p+q} n$. For example, if we take , and randomly pick from the interval , then the space of initial parameters is $10^{67}$. It is computationally infeasible to mount a successful attack by exhaustive search in such a large space.

(2) Sensitivity of initial parameters: since the sequence generated by the logistic map is extremely sensitive to initial parameters, any small modification of them leads to completely different results. For example, we take the same bifurcation parameter and number of iterations, where μ = 3.95362 and , while we take two different constraint factors which are very close to each other, such as $1 \times 10^{-8}$ and $2 \times 10^{-8}$. For the real number 11, we get two completely different comparable codes, $5.62 \times 10^{-8}$ and $1.12 \times 10^{-7}$.

(3) Antistatistic ability: the logistic map has good cryptographic properties such as sensitivity to initial parameters, driven by white noise, unpredictability, etc. [31]. Even if an attacker obtains some statistical information about the input data and the corresponding comparable codes, he or she still cannot recover the configuration of the initial parameters. Many simulation cases show that the data generated by the logistic map with different initial parameters are equidistributed [29, 32], which can prevent statistical attacks. In addition, we give the correlation coefficient evaluation in the next section to analyze the antistatistic ability quantitatively.

As a result, we have that our proposed algorithms can support sorting over encrypted data while preserving data privacy.
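The two codes quoted in item (2) can be reproduced with a few lines of Python. The sketch assumes an iteration count of 3, a value chosen here because it reproduces the quoted codes; the function name `comparable_code` is likewise hypothetical.

```python
def comparable_code(x, t, mu, n):
    """Iterate the logistic map n times from the initial value t / x."""
    y = t / x
    for _ in range(n):
        y = mu * y * (1.0 - y)
    return y


mu, n, x = 3.95362, 3, 11           # n = 3 is an assumption, see the note above
c1 = comparable_code(x, 1e-8, mu, n)
c2 = comparable_code(x, 2e-8, mu, n)
print(f"{c1:.2e}")                  # 5.62e-08
print(f"{c2:.2e}")                  # 1.12e-07
```

Changing only the constraint factor therefore changes the code assigned to the same plaintext, so codes produced under different parameter sets are not directly comparable with one another.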

7. Experiments

In this section, we give the correctness, correlation coefficient, and performance evaluations of our proposed method. The experimental datasets are generated by a random number generator. The software environment of the experiment is Windows 10 and NetBeans 8, and the hardware environment is Core i5 5200U and 8 GB DDR3 RAM.

7.1. Correctness Evaluation on Secure Comparison Model

We proposed the secure comparison model which is on the basis of the logistic map. It is the foundation of achieving the privacy-preserving sorting algorithms. Theoretical proofs are given to prove the correctness of the secure comparison model in the above sections, such as the proofs in Lemmas 1, 2, and 4. Additionally, we give the correctness evaluation on the proposed model by quantitative experiments.

In this evaluation, almost 100 thousand random numbers are generated as the input, and the corresponding comparable codes are calculated by the logistic map function with the initial parameter configuration as μ=3.67435, , and /(2×μn). The diagram of the input data and the corresponding comparable codes are shown in Figure 2.

Figure 2 shows that the values of the comparable codes decrease as the input data values increase. This indicates that the comparable code computation is monotonic (order-reversing), which is consistent with the conclusions of our proposed secure comparison model. Therefore, the experimental result verifies the correctness of the secure comparison model quantitatively.

7.2. Correlation Coefficient Evaluation on Secure Comparison Model

We use the Pearson correlation coefficient formula [33] to analyze the correlation between the input data and the corresponding comparable codes generated in the secure comparison model. The correlation coefficient formula is

$$r = \frac{\sum_{x_i \in X} (x_i - \bar{x})\,(L(t/x_i, n) - \bar{L})}{\sqrt{\sum_{x_i \in X} (x_i - \bar{x})^2}\,\sqrt{\sum_{x_i \in X} (L(t/x_i, n) - \bar{L})^2}},$$

where $x_i$ and $L(t/x_i, n)$ are the input data and corresponding comparable codes, respectively, $\bar{x}$ and $\bar{L}$ are their means over $X$, $r$ is the correlation coefficient, and $X$ is the evaluated dataset. We calculate correlation coefficients on the basis of five datasets which are generated by a random number generator, and the results are shown in Table 2.
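A correlation check of this kind can be sketched as follows. The code is an illustration under assumptions: the function names, the dataset size, the value range of the random inputs, the iteration count, and the constraint factor are all hypothetical, and the numbers it prints are not the values reported in Table 2.

```python
import math
import random


def comparable_code(x, t, mu, n):
    """Iterate the logistic map n times from the initial value t / x."""
    y = t / x
    for _ in range(n):
        y = mu * y * (1.0 - y)
    return y


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))


if __name__ == "__main__":
    rng = random.Random(2018)               # fixed seed for repeatability
    mu, n = 3.67435, 5                      # n = 5 is an assumed iteration count
    t = 0.5 / (2 * mu ** n)                 # assumed constraint factor
    data = [rng.uniform(1.0, 1e6) for _ in range(10000)]
    codes = [comparable_code(x, t, mu, n) for x in data]
    print(pearson(data, codes))
```

The coefficient measures only linear dependence, so a value close to zero does not by itself rule out other relationships between inputs and codes.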

According to the results in Table 2, the average correlation coefficient decreases as the dataset size increases. The average correlation coefficient is also very small, which indicates that the correlation between the input data and the corresponding comparable codes is negligible. Therefore, our proposed secure comparison model has antistatistic ability.

7.3. Performance Evaluation on Algorithms

We implement our proposed logistic map-based data preprocessing and privacy-preserving sorting algorithms. To make comparisons with related works, we adopt the classic Boldyreva’s order-preserving symmetric encryption (OPE) [12] to implement privacy-preserving sorting, which is more secure than [11] and more efficient than [13, 14]. We denote the OPE based data preprocessing as DP-OPE and denote the OPE based privacy-preserving quick sorting, merge sorting, and heap sorting algorithms as PQS-OPE, PMS-OPE, and PHS-OPE, respectively. Then we evaluate and compare the time cost performance of those algorithms.

It is worth noting that fully homomorphic encryption (FHE) based privacy-preserving sorting schemes are proposed in [18, 19]. But they are too slow because of the complexity of FHE. Their experiments show that thousands of seconds are consumed even when sorting only 40 encrypted data items. Thus, we do not include them in the performance comparisons.

7.3.1. Evaluation on Time Cost of Data Preprocessing

The time cost of DP-OPE and DP-LM is evaluated on the basis of five given datasets. The experimental result is shown in Table 3.

Table 3 shows that the time costs of DP-LM and DP-OPE both increase as the datasets grow, but DP-LM is much faster than DP-OPE. The reason is as follows. DP-OPE needs to execute order-preserving encryption on the plaintext by mapping a number of consecutive integers in a domain to integers in a much larger range. Each integer is assigned a pseudorandom value in its subrange. The OPE algorithm recursively bisects the range and samples from the domain at each recursion until it hits the input plaintext value. Thus, the computation load of OPE is relatively high, which makes DP-OPE much slower than DP-LM.

7.3.2. Evaluation on Time Cost of Data Sorting

We also use the same datasets to evaluate the privacy-preserving sorting algorithms based on the logistic map and OPE. The result is shown in Table 4.

The experimental result in Table 4 shows that each of our proposed privacy-preserving sorting algorithms performs better than its OPE-based counterpart. The reason is that the output data of OPE, which is used for privacy-preserving sorting, is more complex than the comparable codes generated by the logistic map.

8. Conclusions

When the clouds provide outsourcing services, the privacy of outsourced data, such as national defence data and human health data, can be protected by common encryption. However, those encrypted data are useless for data sorting which is a common operation in many areas, such as machine learning, service recommending, and data query. It is a challenge to achieve privacy-preserving sorting in clouds. In this paper, we introduce a secure comparison model based on the logistic map and propose privacy-preserving sorting algorithms. The security analysis and experimental result show that the proposed algorithms can protect data privacy while providing efficient sorting on encrypted data.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure

The abstract of this paper appeared in the 4th International Conference on Cloud Computing and Security (ICCCS 2018), June 8-10, Haikou. This version is the full paper.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Natural Science Foundation of China under Grant nos. 61872197, 61572263, 61672297, 61502251, and 61472193; the Natural Science Foundation of Jiangsu Province under Grant nos. BK20151511, BK20141429, and BK20161516; the Postdoctoral Science Foundation of China under Grant no. 2015M581794; the Natural Science Foundation of Anhui Province under Grant no. 1608085MF127; and the Natural Research Foundation of Nanjing University of Posts and Telecommunications under Grant no. NY217119.