Abstract

Order-preserving encryption (OPE), which preserves the numerical ordering of plaintexts, is one of the promising solutions for cloud security. In 2013, an ideally secure OPE, which reveals no information other than the order of the underlying plaintexts, was proposed, along with the notion of mutable encryption, in which ciphertexts can be changed. Unfortunately, even an ideally secure OPE is vulnerable to an adversary who infers the frequency of repeated plaintexts. To solve this problem, in 2015, Kerschbaum designed a frequency-hiding OPE (FH-OPE) scheme based on the notion of a randomized order under a strengthened security model. Later, Maffei et al. showed that Kerschbaum’s model is imprecise, which means that no such OPE scheme can exist. Moreover, they provided a new FH-OPE scheme under a corrected security model. However, their scheme requires the order information of all the encrypted plaintexts as an input; therefore, it causes relatively high overhead during encryption. In this work, we propose a more efficient FH-OPE based on Maffei et al.’s security model and also present an improved update algorithm suitable for duplicate plaintexts.

1. Introduction

Cloud storage has become common practice in recent years, but it still raises privacy concerns with respect to the service provider hosting the data. In these data-outsourcing scenarios, encryption is one of the most reliable solutions. However, conventional encryption schemes have limitations; for instance, it is impossible to perform operations such as range queries on encrypted data. To perform such operations, the client has to download all the encrypted data and decrypt them. To overcome these limitations, a few solutions have been proposed that slightly weaken the security of conventional encryption schemes. Order-preserving encryption is one of the promising solutions: because it maintains the ordering of plaintexts in the ciphertexts, it allows a client to perform efficient range queries on the encrypted data.

1.1. Related Works

The concept of order-preserving encryption was first introduced by Agrawal et al. [1]. In 2009, Boldyreva et al. [2] presented the first formal security notion of OPE, which is called indistinguishability against ordered chosen plaintext attacks (IND-OCPA). Moreover, they showed that no stateless OPE can guarantee IND-OCPA security unless the ciphertext space is exponentially large in the plaintext space. They also presented a weaker security notion, which is known as pseudorandom order-preserving function advantage under chosen ciphertext attacks (POPF-CCA). However, this security model does not precisely quantify the information leaked about the plaintexts. Later, Boldyreva et al. [3] and Xiao and Yen [4] showed that the ciphertexts of the scheme of [2] leak approximately the first half of the bits of the underlying plaintexts. Yum et al. [5] improved Boldyreva’s construction by extending it to nonuniformly distributed plaintexts, but their scheme remained at the same security level as random order-preserving functions. Subsequently, a few OPE schemes [6–15] were proposed that provide no formal security proof but only an ad hoc security analysis.

Recently, some ideally secure (IND-OCPA secure) OPE schemes [16–20], which are stateful or interactive, have been proposed. Popa et al. [18] developed an interactive model for clients and servers as a two-party protocol. The client encrypts plaintexts using a deterministic OPE algorithm and sends them to the server, which maintains a search tree where the ciphertexts are stored. When the client wants to perform range queries on the encrypted data, the server exploits the search tree. Moreover, they presented the notion of mutable encryption, which means that ciphertexts can be updated to achieve IND-OCPA security. Their interactive scheme requires a large amount of communication. In 2014, Kerschbaum and Schröpfer [19] presented a revised ideally secure OPE scheme in which the client stores the search tree. This approach allows their scheme to incur a lower communication cost than the scheme of [18].

To solve the problem that deterministic OPE schemes [1, 3, 18, 19, 21, 22] are vulnerable to frequency analysis, sorting, and cumulative attacks [23], Kerschbaum [16] presented a new frequency-hiding OPE that applies randomization to duplicate plaintexts. In addition, he introduced a stronger security notion than IND-OCPA, which is known as indistinguishability against frequency-analyzing ordered chosen plaintext attacks (IND-FA-OCPA). In 2017, Maffei et al. [17] showed that Kerschbaum’s security model is imprecise. Therefore, they designed a new construction based on a corrected security model. However, their scheme causes relatively high overhead during encryption because it requires the order information of all encrypted plaintexts. Moreover, we observe that the update algorithm used in [16, 17] is not guaranteed to produce a perfectly balanced search tree when duplicate plaintexts are encrypted.

Yang et al. [24] presented a semiorder-preserving encryption (SOPE) scheme, although at the cost of the precision of order preservation. In this scheme, two different plaintexts may be encrypted to the same ciphertext; thus, the ciphertext sequence cannot be mapped to a plaintext. Dyer et al. [25] presented an OPE scheme based on the general approximate common divisor problem (GACDP). This is the first OPE scheme whose security is based on a computational hardness assumption rather than on a security game. Like the scheme of Liu and Wang [9], their scheme adds random noise to the initial plaintext so that the ciphertexts of duplicate plaintexts appear distinct. Kim [26] presented a new OPE scheme based on order-revealing encryption (ORE) and improved the round and client-side storage complexities of the existing ideally secure OPE schemes [16, 20]. Tueno and Kerschbaum [27] introduced an oblivious OPE (OOPE) as an equivalent of a public-key OPE; they also showed a protocol for OOPE that combines existing ideally secure OPE [16, 19] with Paillier’s homomorphic encryption and garbled circuits. In [28], Taigel et al. presented a real-life use case that combines OPE and decision tree classification to enable privacy-preserving forecasting of demand for spare parts based on distributed condition data. In [29], Meng and Feigenbaum described an application that combines OPE, pseudorandom functions (PRFs), and additively homomorphic encryption (AHE) to design a privacy-preserving XGBoost inference algorithm, that is, to create an encrypted regression tree.

1.2. Our Contributions

Table 1 compares the existing FH-OPE schemes. As mentioned before, the original definition of IND-FA-OCPA in [16] is imprecise, and the FH-OPE scheme of [16] is insecure under the security model proposed there. In fact, no FH-OPE scheme can be proven secure under that security model. The scheme of [17] guarantees IND-FA-OCPA security under a definition that has been revised to be feasible, but the client has to maintain the order information of all the plaintexts encrypted to date; this maintenance burdens the client’s persistent storage and degrades the encryption performance.

To summarize, our contributions are as follows:
(i) We propose a more practical FH-OPE scheme than the previous schemes. Our scheme does not require the order information of all the encrypted plaintexts; thus, the client does not need to maintain it. The security of the proposed scheme can be proven under the IND-FA-OCPA security model of [17].
(ii) We observe that the update algorithm in [16, 17] is not suitable for random duplicate distributions. Moreover, it is not guaranteed to produce a perfectly balanced search tree when duplicate plaintexts are encrypted. To overcome this problem, we propose an improved update algorithm. The proposed algorithm always produces a perfectly balanced search tree regardless of the distribution of plaintexts and positively affects the overall performance of FH-OPE.
(iii) We implement the schemes of [16, 17] and the proposed scheme and evaluate them on different plaintext distributions. The implementation results demonstrate the efficiency of our scheme.

1.3. Outline

In Section 2, we recall the formal notion of (stateful) OPE and its security definitions. In Section 3, we analyze the scheme proposed in [17] and show that it still needs to be improved in terms of storage and computational complexity. Section 4 proposes a new practical FH-OPE scheme and an improved update algorithm and shows that the proposed scheme achieves IND-FA-OCPA security. We present the experimental results in Section 5. Finally, we conclude our work in Section 6.

2. Preliminaries

This section briefly recalls the formal notion of OPE and its security definitions.

2.1. Order-Preserving Encryption

An OPE scheme can be defined in two ways: stateless and stateful. It is difficult for a stateless scheme to achieve IND-OCPA security. A stateful OPE scheme is keyless; instead, its state operates as a secret key.

Definition 1. (stateful OPE). A stateful OPE scheme consists of the following three algorithms (Setup, Encryption, and Decryption):
(i) $S \leftarrow \mathsf{Setup}(\lambda)$: the Setup algorithm takes as input a security parameter $\lambda$ and outputs a state $S$.
(ii) $(y, S') \leftarrow \mathsf{Encryption}(x, S)$: the Encryption algorithm takes as input a plaintext $x$ and a state $S$. It outputs a ciphertext $y$ and updates the state to $S'$.
(iii) $x \leftarrow \mathsf{Decryption}(y, S)$: the Decryption algorithm takes as input a ciphertext $y$ and a state $S$. It outputs a plaintext $x$.

Definition 2. (order-preserving). An OPE scheme is order-preserving if for any two ciphertexts $y_1$ and $y_2$ with corresponding messages $x_1$ and $x_2$, we have $y_1 < y_2 \Rightarrow x_1 \le x_2$.

2.2. Security Definitions

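The standard security notion of OPE is IND-OCPA [2]. It means that an adversary learns nothing about the plaintexts except for their order. Let $n$ be the number of necessarily distinct plaintexts in a sequence $X = (x_1, \ldots, x_n)$, where $x_i \in \mathbb{N}$ for all $i$. The security game $\mathsf{Game}^{\mathrm{IND\text{-}OCPA}}_{\mathcal{A}}(\lambda)$ between the adversary $\mathcal{A}$ and the challenger $\mathcal{C}$ for the security parameter $\lambda$ proceeds as follows:
(1) The adversary $\mathcal{A}$ prepares two sequences $X_0$ and $X_1$ of $n$ necessarily distinct plaintexts with the same order. He sends them to the challenger $\mathcal{C}$.
(2) The challenger $\mathcal{C}$ randomly chooses $b \in \{0, 1\}$, executes the Setup $S \leftarrow \mathsf{Setup}(\lambda)$, and runs $(y_i, S) \leftarrow \mathsf{Encryption}(x_{b,i}, S)$ for $i = 1, \ldots, n$. He sends $Y = (y_1, \ldots, y_n)$ to the adversary $\mathcal{A}$.
(3) The adversary $\mathcal{A}$ guesses which sequence is encrypted and accordingly outputs a guess $b'$.

We say that the adversary $\mathcal{A}$ wins if $b' = b$. Let $\Pr[\mathcal{A}\ \text{wins}]$ be the winning probability of $\mathcal{A}$ in $\mathsf{Game}^{\mathrm{IND\text{-}OCPA}}_{\mathcal{A}}(\lambda)$.

Definition 3. (IND-OCPA). A stateful OPE scheme is IND-OCPA secure if for any PPT adversary $\mathcal{A}$, the advantage $\left|\Pr[\mathcal{A}\ \text{wins}] - \tfrac{1}{2}\right|$ is negligible in the security parameter $\lambda$, i.e.,
$$\Pr[\mathcal{A}\ \text{wins}] \le \frac{1}{2} + \mathsf{negl}(\lambda).$$

Now, we review IND-FA-OCPA, which was originally presented in [16] and modified in [17]. To capture “frequency-hiding” security, it allows duplicate plaintexts, e.g., , and . However, the two challenge sequences $X_0$ and $X_1$ must have at least one common randomized order $\Gamma$. A randomized order of $X$ is a permutation of $\{1, \ldots, n\}$ arranged according to the order of $X$, in which the order of duplicate plaintexts is determined randomly. For example, with , the randomized order for can be any of , , , or . The randomized order is precisely defined as follows.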

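Definition 4. (randomized order). Let $n$ be the number of plaintexts in a sequence $X = (x_1, \ldots, x_n)$ that are not necessarily distinct, where $x_i \in \mathbb{N}$ for all $i$. For a randomized order $\Gamma = (\gamma_1, \ldots, \gamma_n)$ of sequence $X$, where $1 \le \gamma_i \le n$ and $\gamma_i \neq \gamma_j$ for all $i \neq j$, it holds that
$$\forall i, j:\ \left(x_i > x_j \Rightarrow \gamma_i > \gamma_j\right) \wedge \left(\gamma_i > \gamma_j \Rightarrow x_i \ge x_j\right).$$
Two sequences and have only two common randomized orders: and . denotes the order of the sequences . For instance, of means .

The security game $\mathsf{Game}^{\mathrm{IND\text{-}FA\text{-}OCPA}}_{\mathcal{A}}(\lambda)$ between an adversary $\mathcal{A}$ and a challenger $\mathcal{C}$ for a security parameter $\lambda$ proceeds as follows:
(1) The adversary $\mathcal{A}$ prepares two sequences $X_0$ and $X_1$ that have at least one common randomized order. He sends them to the challenger $\mathcal{C}$.
(2) The challenger $\mathcal{C}$ randomly chooses $b \in \{0, 1\}$ and selects $\Gamma$ from the common randomized orders of $X_0$ and $X_1$. Then, the challenger executes the Setup $S \leftarrow \mathsf{Setup}(\lambda)$ and runs the Encryption algorithm on each plaintext of $X_b$, based on $\Gamma$. It means that the relative order of duplicate plaintexts is determined by $\Gamma$. He sends the resulting ciphertext sequence $Y$ to the adversary $\mathcal{A}$.
(3) The adversary $\mathcal{A}$ guesses which sequence is encrypted and accordingly outputs the guess $b'$.

We say that the adversary $\mathcal{A}$ wins if $b' = b$. Let $\Pr[\mathcal{A}\ \text{wins}]$ be the winning probability of $\mathcal{A}$ in $\mathsf{Game}^{\mathrm{IND\text{-}FA\text{-}OCPA}}_{\mathcal{A}}(\lambda)$.

Definition 5. (IND-FA-OCPA). An FH-OPE scheme is IND-FA-OCPA secure if for any PPT adversary $\mathcal{A}$, the advantage $\left|\Pr[\mathcal{A}\ \text{wins}] - \tfrac{1}{2}\right|$ is negligible in the security parameter $\lambda$, i.e.,
$$\Pr[\mathcal{A}\ \text{wins}] \le \frac{1}{2} + \mathsf{negl}(\lambda).$$

As the randomized order of distinct plaintexts is equal to its order, the IND-FA-OCPA security is stronger than IND-OCPA.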
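
To make the notion of a randomized order more concrete, the following Python sketch enumerates all randomized orders of a short sequence by brute force and intersects them for two sequences, which is the condition the adversary's challenge sequences must satisfy in step (1) of the game. It assumes the usual formulation of the definition (a larger plaintext always receives a larger rank, and a larger rank never belongs to a smaller plaintext); the function names and the example sequences are hypothetical, and the exhaustive search is only meant for tiny inputs.

```python
from itertools import permutations


def is_randomized_order(xs, gamma):
    """Check whether gamma (a tuple of ranks 1..n) is a randomized order of xs."""
    n = len(xs)
    if len(gamma) != n or sorted(gamma) != list(range(1, n + 1)):
        return False
    for i in range(n):
        for j in range(n):
            # A strictly larger plaintext must receive a strictly larger rank.
            if xs[i] > xs[j] and not gamma[i] > gamma[j]:
                return False
            # A larger rank must never belong to a strictly smaller plaintext.
            if gamma[i] > gamma[j] and not xs[i] >= xs[j]:
                return False
    return True


def randomized_orders(xs):
    """Enumerate every randomized order of xs (exponential; for small n only)."""
    n = len(xs)
    return {g for g in permutations(range(1, n + 1)) if is_randomized_order(xs, g)}


def common_randomized_orders(x0, x1):
    """Randomized orders shared by two equally long sequences."""
    if len(x0) != len(x1):
        raise ValueError("sequences must have the same length")
    return randomized_orders(x0) & randomized_orders(x1)
```

For a sequence of distinct plaintexts the function returns exactly one order, which agrees with the remark that the randomized order of distinct plaintexts equals their order.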

3. Maffei et al.’s FH-OPE Scheme

We review the FH-OPE scheme of [17] in detail. The main idea is that the client maintains the randomized order of all encrypted plaintexts and uses it as one of the inputs of the encryption algorithm. It externally determines their relative order for duplicate plaintexts in the encryption algorithm. A search tree that maps plaintexts to ciphertexts is stored as a state on the client side and used in the decryption algorithm. For a node of of , and represent a plaintext and a corresponding ciphertext. and denote the left and right child of , respectively. Every node in stores its index based on the plaintext sequence. is the number of distinct plaintexts, and is the number of plaintexts in the sequence to be encrypted, which also means . is the number of plaintexts encrypted and stored on the server so far. denotes the number of distinct ciphertexts, and its bit length is expanded by a factor of , i.e., . As described in Algorithm 1, the state comprises , , and . When the search tree is empty, the state is initialized as . The update (tree rebalancing) algorithm is as described in [16].

Input: , , and
Output:
State:
if is empty then
ifthen
  rebalance the tree
 return
ifthen
ifthen
 Encryption
else
ifthen
  Encryption

To review their scheme, we present a concrete example. Figure 1 shows the encryption results of , where . Here, we set the possible randomized order of to .

One can verify that the algorithm produces the ciphertexts correctly based on the randomized order $\Gamma$. However, $\Gamma$ must be updated with every encryption. For mutable OPE schemes whose state is stored on the client side, the computational cost of rebalancing is similar; thus, it is excluded from the following efficiency analysis.

Computational Cost. The encryption algorithm [17] has computational complexities and , except the rebalancing in the best and worst cases, respectively. This is because in the search tree, the cost of finding an empty node and placing a plaintext based on required for each encryption is . In addition, the cost of updating is required. In the case of increasing sequential plaintexts, e.g., , then ; , then ; and , then ; the cost of updating is . On the other hand, for decreasing sequential plaintexts, e.g., , then ; , then ; and , then ; the cost of updating is .
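
The difference between increasing and decreasing inputs can be illustrated with a small Python sketch that counts how many stored ranks must be rewritten when the randomized order is kept as a plain list. The list representation, the helper names `insert_rank` and `total_shifts`, and the rule that a duplicate is ranked after its equal predecessors are assumptions of this sketch, not details taken from [17].

```python
def insert_rank(gamma, new_rank):
    """Insert a plaintext with rank new_rank into gamma (in place).

    Every stored rank >= new_rank moves up by one. Returns the number of
    stored ranks that had to be rewritten.
    """
    shifted = 0
    for i, rank in enumerate(gamma):
        if rank >= new_rank:
            gamma[i] = rank + 1
            shifted += 1
    gamma.append(new_rank)
    return shifted


def total_shifts(plaintexts):
    """Encrypt plaintexts one by one and count all rank rewrites.

    Assumption: a duplicate is ranked after the equal plaintexts seen so far;
    a frequency-hiding scheme would pick this position at random.
    """
    gamma, seen, total = [], [], 0
    for x in plaintexts:
        rank = 1 + sum(1 for p in seen if p <= x)
        total += insert_rank(gamma, rank)
        seen.append(x)
    return gamma, total
```

In this sketch an increasing sequence rewrites no stored rank, whereas a decreasing sequence rewrites every stored rank at each step. The naive list is still scanned in full on every insertion, so the sketch only illustrates the number of rewrites, not an optimized data structure.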

Storage Cost. There are elements in , and each can be represented by bits. Thus, the client requires bits for additional persistent storage, except for .

Rebalancing Tree. In [16, 17], if there is no available ciphertext in , the tree has to be rebalanced by calling the update algorithm. However, the algorithm presented in [19] was designed under the assumption that there are no duplicate plaintexts. Therefore, it is not guaranteed to produce a perfectly balanced tree when duplicate plaintexts are encrypted. The quality of the tree produced by the update algorithm significantly affects the overall performance; thus, an improved algorithm is needed.

4. Proposed Scheme

We propose a practical FH-OPE scheme, described in Algorithm 2, that achieves IND-FA-OCPA security together with an improved update algorithm. Our search tree does not need to store the index of the encrypted plaintexts. Let $H$ be a hash function with 1-bit output, modeled as a random oracle. Our main idea is to replace the inefficient input, the randomized order $\Gamma$, with the combination of a single random value and $H$. In our scheme, the selection of empty nodes for duplicate plaintexts is determined by the output of $H$. It means that the order of duplicate plaintexts is not determined internally but is set externally. The other notations and the initialization are as defined in the previous sections.

Input: , , and
Output:
State:
ifthen
ifthen
 Update()
else
if is Empty then
  
  
  return
else
  ifthen
   Encryption
   
  else if then
   
    ++
   ifthen
    Encryption ,
    
   else
    Encryption ,
    
  else
   Encryption ,
   

Figure 2 shows the encryption of duplicate plaintexts {1, 1, 1, 1} based on our scheme. We can check that the algorithm produces distinct ciphertexts {64, 96, 80, 112} based on the chosen random values, e.g., has the same role as in [17].
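
The following Python sketch illustrates the idea of the proposed encryption under several assumptions: the search tree is an ordinary binary search tree, a new ciphertext is the midpoint of the free interval, and at every node that holds the same plaintext a 1-bit hash of a fresh random value decides whether to descend left or right. The exact input of the hash function is not recoverable from the text, so hashing the random value together with the node's ciphertext (`hash_bit`) is a hypothetical choice, as are the class name `FhOpeSketch` and the injectable `bit_source` parameter used for testing.

```python
import hashlib
import secrets


class Node:
    def __init__(self, plain, cipher):
        self.plain, self.cipher = plain, cipher
        self.left = self.right = None


def hash_bit(r: bytes, cipher: int) -> int:
    """1-bit hash of (r, node ciphertext); this input encoding is an assumption."""
    digest = hashlib.sha256(r + b"|" + str(cipher).encode()).digest()
    return digest[0] & 1


class FhOpeSketch:
    def __init__(self, max_cipher, bit_source=hash_bit):
        self.root = None
        self.max_cipher = max_cipher  # ciphertexts lie in [0, max_cipher)
        self.bit_source = bit_source

    def encrypt(self, x, r=None):
        r = secrets.token_bytes(16) if r is None else r  # fresh random value
        lo, hi = -1, self.max_cipher
        parent, go_right, node = None, False, self.root
        while node is not None:
            if x == node.plain:
                # Duplicate plaintext: the hash bit picks the branch.
                go_right = self.bit_source(r, node.cipher) == 1
            else:
                go_right = x > node.plain
            if go_right:
                lo = node.cipher
            else:
                hi = node.cipher
            parent, node = node, (node.right if go_right else node.left)
        if hi - lo < 2:
            raise OverflowError("no free ciphertext: the update algorithm must run first")
        new = Node(x, lo + (hi - lo + 1) // 2)
        if parent is None:
            self.root = new
        elif go_right:
            parent.right = new
        else:
            parent.left = new
        return new.cipher
```

With `max_cipher = 128` (an assumed value) and the branch bits fixed to right; right, left; right, right for the second, third, and fourth encryption, four encryptions of the plaintext 1 in this sketch return 64, 96, 80, and 112, the same values quoted for Figure 2.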

The existing update algorithm for FH-OPE [16, 17, 19] sorts the plaintext sequence in ascending order and simply re-encrypts the sequence. It is not guaranteed to produce a perfectly balanced tree because the node positions for duplicate plaintexts are selected randomly. The idea of our improved update algorithm (Algorithm 3) is simple. We build a new search tree $T'$ on $\{1, \ldots, n\}$, where $n$ is the number of nodes in the current search tree $T$, and replace each $i$ with $x_i$, where $x_1 \le \cdots \le x_n$. The resulting $T'$ is a perfectly balanced tree because it has been built from distinct plaintexts.

Input: a set of in tree
Output: a balanced search tree
Initialization: a new empty search tree
the number of nodes in
Encryption
ifthen
 Encryption
ifthen
 Encryption
 Encryption
ifthen
 Update
 Update
if End recursively iterate then
 Call in ascending order
 Call in ascending order
fordo
  
 return tree
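
The idea behind the improved update can be sketched in Python as follows. The sketch treats the existing nodes as the distinct positions of their sorted order, assigns new ciphertexts by always encrypting the median position first, and then attaches the plaintexts to these positions. The input format (plaintexts listed in the order of their current ciphertexts), the midpoint rule, and the function name `rebalance` are assumptions of this sketch rather than a transcription of Algorithm 3.

```python
def rebalance(plains_in_cipher_order, max_cipher):
    """Return (plaintext, new ciphertext) pairs forming a perfectly balanced tree.

    plains_in_cipher_order: plaintexts of all nodes, sorted by current ciphertext
    (an in-order traversal of the old tree). Duplicates are allowed.
    """
    n = len(plains_in_cipher_order)
    new_ciphers = [None] * n

    def assign(first, last, lo, hi):
        # Positions first..last receive ciphertexts inside the open interval (lo, hi).
        if first > last:
            return
        if hi - lo < 2:
            raise OverflowError("ciphertext space too small for this many nodes")
        mid = (first + last) // 2           # median position becomes the subtree root
        cipher = lo + (hi - lo + 1) // 2    # midpoint of the free interval
        new_ciphers[mid] = cipher
        assign(first, mid - 1, lo, cipher)  # left subtree
        assign(mid + 1, last, cipher, hi)   # right subtree

    assign(0, n - 1, -1, max_cipher)
    return list(zip(plains_in_cipher_order, new_ciphers))
```

Because the positions are distinct even when the plaintexts are not, the shape of the resulting tree depends only on the number of nodes and not on how the duplicates are distributed.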

In stateful OPE, the description of the decryption algorithm is often omitted because the state stored on the client side already maps ciphertexts to plaintexts; this omission does not affect the correctness of the OPE scheme. To decrypt a given ciphertext, the client performs a binary search on the search tree, finds the node that holds this ciphertext, and returns the plaintext stored in that node.
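
A minimal Python sketch of this lookup is given below. The immutable `Node` tuple and the function name `decrypt` are hypothetical and only serve the illustration; the tree is assumed to be ordered by ciphertext.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    plain: int
    cipher: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def decrypt(root: Optional[Node], y: int) -> int:
    """Binary search for ciphertext y; return the plaintext stored in its node."""
    node = root
    while node is not None:
        if y == node.cipher:
            return node.plain
        node = node.left if y < node.cipher else node.right
    raise KeyError(f"ciphertext {y} is not in the state")
```

The number of visited nodes is bounded by the height of the tree, which is one reason why a balanced tree matters for performance.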

Next, we will prove the security of our proposed scheme with regard to the IND-FA-OCPA security model and analyze our construction in terms of efficiency.

Theorem 1. Let denote any possible randomized order of the plaintext sequence . denotes the ciphertext sequence when is used as plaintexts. Then, the challenger in IND-FA-OCPA can always simulate the ciphertexts of , which is identical to .

Proof. In the encryption algorithm of , let be the outputs of , where . Then, we can compute , which is identical to the search tree as if is encrypted. As shown in Figure 2, we can obtain , , , , and for . Finally, the challenger chooses random values and simulates the random oracle as ; otherwise, it returns a bit chosen randomly.
Based on Theorem 1, if the challenger outputs in step 3 of IND-FA-OCPA, where is the chosen common randomized order in the step 2, there is no advantage for an attacker to distinguish and .
Efficiency. The security column of Table 1 indicates that the scheme of [16] achieves only IND-OCPA security, reflecting that Kerschbaum’s model is imprecise, which means no such FH-OPE can exist. Furthermore, neither [16] nor [17] provides an improved update algorithm. The experiments in Section 5 indicate that the proposed update algorithm positively affects the computational performance. Compared with [17], our scheme does not require the order information of all the encrypted plaintexts, while Maffei’s scheme requires bits for additional persistent storage, except for a state. Moreover, the repeated re-sorting of the elements in the randomized order makes the computational performance of [17] very low, because these operations occur whenever a plaintext is passed to the encryption algorithm of [17].

5. Experiments

We analyze the performance of [16, 17] and the proposed scheme on a system with an AMD Ryzen 5 3600 6-core processor (3.59 GHz) and 16 GB of RAM, using Python 3.9.5. We vary the plaintext size and the number of plaintexts to be encrypted , while the ciphertext size is fixed at 2048.

5.1. Random Duplicate Plaintexts

Figure 3 compares the encryption of plaintexts that are randomly selected in , allowing duplicates, where and the corresponding . Because the main operation of [17] is to maintain the order of all encrypted plaintexts, their scheme requires additional updates besides the ciphertext updates. Figure 3 shows that the scheme of [17] exhibits lower performance than the others.

5.2. Random Distinct Plaintexts

Here, we encrypt plaintexts that are selected randomly in , not allowing duplicates where . Figure 4 shows that the overall speed improves in all the schemes owing to the absence of duplicate plaintexts, but the encryption time of [17] remains longer than that of the others.

5.3. Sequential Plaintexts

We encrypt plaintexts , where . As these plaintexts trigger tree rebalancing most frequently, Figure 5 shows that all the schemes take more time to encrypt the data owing to the frequent updates of the search tree. Moreover, as the number of plaintexts to be encrypted increases, the encryption time of the scheme of [17] increases sharply.
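
For readers who wish to repeat measurements of this kind, the three workloads of Sections 5.1 to 5.3 can be generated along the following lines. The plaintext domain {1, ..., N}, the function names, and the use of a seeded generator are assumptions of this sketch; the sizes used in our experiments are not encoded here.

```python
import random


def random_duplicates(n, N, rng):
    """n plaintexts drawn uniformly from {1, ..., N}; duplicates are allowed."""
    return [rng.randint(1, N) for _ in range(n)]


def random_distinct(n, N, rng):
    """n pairwise distinct plaintexts drawn from {1, ..., N}; requires n <= N."""
    return rng.sample(range(1, N + 1), n)


def sequential(n):
    """The increasing sequence 1, 2, ..., n."""
    return list(range(1, n + 1))


# Hypothetical usage with a fixed seed so that runs are repeatable.
rng = random.Random(2021)
workloads = {
    "random duplicates": random_duplicates(1000, 100, rng),
    "random distinct": random_distinct(1000, 10000, rng),
    "sequential": sequential(1000),
}
```

Each workload can then be fed to the encryption routine under test while recording the elapsed time, for example with `time.perf_counter`.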

5.4. Ciphertext Update Cost

In this section, we show experimentally that our update algorithm requires fewer updates than the other algorithms. We encrypt plaintexts that are selected randomly in , allowing duplicates where and the corresponding . Our update algorithm never produces the worst case, in which the update outputs a skewed search tree. In contrast, the update algorithm in [16, 17] may produce an unbalanced search tree after executing the ciphertext updates. Accordingly, Figure 6 shows that the number of updates in our case is relatively small.

6. Conclusion

We reviewed the construction presented by Maffei et al. and concluded that the scheme still needs to be improved in terms of storage and computational complexity. We then proposed a more practical FH-OPE scheme with a formal IND-FA-OCPA security proof. Moreover, we observed that the previous update algorithms are not suitable for duplicate plaintexts and proposed an improved update algorithm that produces a perfectly balanced search tree regardless of the distribution of the plaintexts. Finally, we presented experimental results that demonstrate the efficiency of the proposed scheme.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1G1A1097540).