Abstract

A searchable encryption scheme can perform keyword search operations directly over encrypted data without decryption. This capability is crucial to cloud storage and has attracted much attention in recent years. However, developing an efficient public key encryption scheme that supports conjunctive and disjunctive keyword search simultaneously remains an open problem. To achieve this goal, we introduce a keyword conversion method that transforms the query and index keywords into a vector space model. By applying this vector space model to a predicate encryption scheme supporting inner products, we propose a novel public key encryption scheme with conjunctive and disjunctive keyword search. The experimental results demonstrate that, compared with state-of-the-art schemes, our scheme is more efficient in both time and space and is more suitable for the mobile cloud.

1. Introduction

With the rapid development of cloud computing and the growing volume of data, more and more enterprises and individuals are willing to share their data on cloud platforms. Because the data stored in the cloud may be sensitive, such as medical records, the popularity of cloud storage inevitably raises security concerns for its users. Specifically, hacker attacks and theft by administrators can lead to data leakage. A common way to protect data privacy is to encrypt data before outsourcing it to the cloud server. However, users still face the problem of how to search the encrypted data stored in the cloud efficiently. A straightforward approach is to download all the encrypted data to the client and decrypt all of it. After obtaining the plaintext data, users can search the documents with common information retrieval techniques. Nevertheless, this strategy incurs tremendous transmission, storage, and computation costs, which raises a new issue: how to search encrypted data efficiently without decrypting it first.

Many searchable encryption (SE) schemes have been proposed to realize keyword search over encrypted data with various search functions. According to their applications, SE schemes fall into two main categories: searchable public key encryption and searchable symmetric key encryption. Over the last few years, many searchable symmetric key encryption schemes have been proposed that achieve complex search conditions such as Boolean keyword search, personal keyword search, and query result ranking [14]. However, searchable public key encryption has developed relatively slowly, since it is difficult to support advanced search functions without sacrificing security. In particular, a scheme supporting Boolean keyword search in the public key setting is still urgently needed. To meet this need, two subproblems must be solved: public key encryption with conjunctive keyword search (PECK) and public key encryption with disjunctive keyword search (PEDK). Park et al. presented the formal model and the security definition of PECK, followed by two constructions [6]. Then Hwang and Lee [5] introduced the concept of multiuser PECK as well as a more efficient scheme with less search time. However, all these schemes require keyword fields. To eliminate keyword fields, Boneh and Waters [11] presented a hidden vector encryption (HVE) scheme supporting conjunctive search, comparison queries, and subset queries on encrypted data. Recently, Zhang and Zhang proposed a new PECK scheme with lower storage space and query time [12].

However, research on PEDK has progressed slowly. To support disjunctive formulae, Katz et al. gave a predicate encryption scheme supporting inner products (IPE) [7]. This scheme enables more complex evaluations of disjunctions, conjunctions, and polynomial formulae. In an IPE scheme, the secret key corresponding to a predicate vector $\vec{v}$ can decrypt a ciphertext associated with an attribute vector $\vec{x}$ if and only if $\langle \vec{x}, \vec{v} \rangle = 0$. Unfortunately, the IPE scheme mentioned above was proved secure only in a selective security model. To improve its security level, Lewko et al. presented a fully secure IPE scheme [23] by using the dual system encryption technique introduced in [13].

Although we can create a PEDK scheme from an IPE scheme and a trivial method presented in [6], the time complexities of encryption and testing in this PEDK scheme are both $O(2^n)$, where $n$ is the number of keywords in a document. In addition, the storage cost of an index also grows exponentially. Therefore, this scheme is impractical when $n$ is large. Moreover, if a user wants to perform conjunctive and disjunctive keyword queries simultaneously, a PEDK scheme and a PECK scheme must be constructed and maintained together. To construct a more efficient PEDK scheme that also supports conjunctive keyword search, Zhang and Lu proposed an approach for converting an IPE scheme into a public key encryption with conjunctive and disjunctive keyword search (PECDK) scheme and gave a concrete scheme [15]. In their scheme, the size of each document's index, the size of a trapdoor, and the number of pairing operations in the test process are all $O(N)$. Since $N$ is a large integer, namely the number of keywords in a dictionary, there is still much room to improve the efficiency of the previous PECDK scheme.

In this paper, we first propose a new method that converts an IPE scheme into a PECDK scheme and then give an instance. Our contributions are summarized as follows.
(1) We design a new approach that converts an index keyword set and a query keyword set into an attribute matrix and a predicate vector, respectively. Technically, we construct univariate polynomial equations from the keyword sets (the one built from the index keyword set has degree $n$) and then use their coefficients and roots to create a predicate vector and an attribute matrix, respectively.
(2) We propose a construction of PECDK based on the method in (1) and the efficient IPE scheme proposed in [23]. We also prove the security of our PECDK scheme under a security definition adapted from [15]. The experiments show that, compared with the previous PECDK scheme, the time complexity of keyword search and that of index construction are both reduced from $O(N)$ to a cost that depends only on $n$, where $n \ll N$; the same holds for the space cost of the encrypted index. Since $N$ is much larger than $n$ (for example, $n$ is typically less than 20 while $N$ is larger than 10000), we argue that the proposal is suitable for the mobile setting.

1.1. Organization

The rest of this paper is organized as follows. Related work is discussed in Section 2, and Section 3 gives the model of PECDK together with its security model. Section 4 first introduces our transformation method, then proposes a concrete PECDK scheme based on our approach, and finally presents its security proof. We present the theoretical and experimental analysis in Section 5. Section 6 concludes the paper.

2. Related Work

Searchable encryption schemes enable clients to store encrypted data in the cloud and execute keyword search over the ciphertext domain; our solution belongs to this field. According to the underlying cryptographic primitives, searchable encryption schemes can be classified into public key systems and symmetric key systems.

Song et al. first introduced the definition of searchable symmetric encryption and proposed a concrete scheme [1]. Then Goh defined the concept of conjunctive keyword search over encrypted data and presented an effective scheme based on Bloom filters [2]. He also gave a formal security definition of searchable symmetric encryption. Building on this scheme, improved schemes with lower computation and communication costs were proposed in [3–5]. However, the search time in these schemes is linear in the number of documents in a dataset, since each document needs its own encrypted keyword index. To reduce the search time, some works used tree structures, such as R-trees and kd-trees, to obtain sublinear search efficiency [8, 9]. Because returning all related documents generates heavy network traffic and the previous schemes cannot sort search results, ranked search schemes that quickly retrieve the top-k relevant documents were proposed in [16–18]. These works supported only single keyword search because they rely on order-preserving encryption (OPE) [14]. Recently, some schemes achieving multikeyword ranked search were presented in [19, 20].

Having developed more slowly than searchable symmetric encryption, searchable public key encryption also struggles to support complex query conditions. Boneh et al. introduced the concept of PEKS and provided several constructions [10] related to the Identity-Based Encryption (IBE) presented in [21]. Based on that, Abdalla et al. specified the computational and statistical consistency of PEKS and proposed a statistically consistent scheme [22]. However, these works supported only single keyword search. Park et al. introduced the concept of PECK in [6]. They defined the security model of this mechanism, followed by two constructions: one needs more bilinear pairing operations, while the other needs more private keys. Then Hwang and Lee designed a more efficient scheme and introduced a multiuser PECK scheme that enables keyword search by multiple users [5]. To avoid the keyword fields used by all the constructions mentioned above, Boneh and Waters presented hidden vector encryption (HVE), a public key system supporting conjunctive keyword search, comparison queries, and subset queries on encrypted data [11]. To achieve disjunctive keyword search without keyword fields, Katz et al. proposed an IPE scheme [7]. To improve the security level and the decryption efficiency, fully secure IPE schemes were proposed in [23, 24]. However, none of these schemes supports conjunctive and disjunctive keyword search at the same time. To address this issue, Zhang and Lu proposed a PECDK scheme [15].

Another class of searchable encryption is range search over encrypted data. It can be used to test whether a multidimensional point lies inside a hyperrectangle. Related works were presented in [25–27].

3. Preliminaries

3.1. The System Model

Consider a data storage service in the cloud, where a data owner has a set of documents D to be outsourced to the cloud server in encrypted form. To enable efficient queries over the encrypted documents, we adopt a keyword-based index structure for the outsourced files. Specifically, the data owner builds an encrypted searchable index set C from each document's keywords, and then both the encrypted index set C and the encrypted document set Enc(D) are outsourced to the cloud server. For each query of an arbitrary keyword set Q, a data user computes a search token for Q and sends it to the cloud server. Upon receiving the token from the data user, the cloud server searches over the encrypted index set C and returns the candidate encrypted documents. Finally, the data user decrypts the candidate documents and verifies each one by checking whether it contains the queried keywords.

The application scenario of searchable public key encryption involves three roles: data senders, a data receiver, and a cloud server, as illustrated in Figure 1. In this scenario, the data receiver performs keyword search, generates a public key (pk) and a secret key (sk), and publishes pk. Anyone who can access pk is regarded as a data sender. Data senders send their encrypted documents, together with the related encrypted indexes, to the cloud server. The cloud server stores the encrypted documents and the encrypted searchable indexes for the data senders. In addition, as in many works on searchable encryption, we assume that the cloud server is “honest-but-curious.” When the data receiver performs a keyword search, it generates a trapdoor for these keywords and sends it to the cloud server. Upon receiving the trapdoor from the data receiver, the cloud server searches over each document's encrypted index and returns the candidate encrypted documents.

In this paper, we focus on searchable public key encryption supporting conjunctive and disjunctive keyword search. Formally, we define the PECDK model as follows, deriving it from the model proposed in [15].

Definition 1. A PECDK scheme consists of four polynomial-time algorithms, KeyGen, IndexBuild, Trapdoor, and Test:
(i) KeyGen(γ): Given a security parameter γ, the algorithm outputs a key pair (pk, sk), where pk is the public key and sk is the secret key.
(ii) IndexBuild(pk, W): The sender runs this algorithm to produce an encrypted index $I_W$ from a keyword set $W$ and pk.
(iii) Trapdoor(sk, Q, sym): The receiver runs this algorithm to produce a trapdoor. It takes sk, the keyword query $Q$, and a symbol $sym \in \{\wedge, \vee\}$ as input and outputs a trapdoor $T_Q$.
(iv) Test(pk, $T_Q$, $I_W$): It takes the trapdoor $T_Q$, the secure index $I_W$, and the public key pk as input. If sym is $\wedge$, the trapdoor is for conjunctive keyword search; in this case, the output is 1 if $Q \subseteq W$ and 0 otherwise. If sym is $\vee$, disjunctive keyword search is performed; in this case, the output is 1 if $Q \cap W \neq \emptyset$ and 0 otherwise.
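As a plaintext reference for the Test semantics in Definition 1, the following Python sketch computes the bit that Test must output, with no encryption involved. The function name `expected_test` and the string encodings "AND" and "OR" for the symbols ∧ and ∨ are hypothetical choices made only for illustration.

```python
from typing import AbstractSet

# Hypothetical encodings of the symbol sym in Definition 1.
CONJ = "AND"  # sym = ∧, conjunctive search
DISJ = "OR"   # sym = ∨, disjunctive search


def expected_test(query: AbstractSet[str], index_keywords: AbstractSet[str], sym: str) -> int:
    """Return the bit Test(pk, T_Q, I_W) should output for plaintext Q and W."""
    if sym == CONJ:
        # Conjunctive: every query keyword must appear in the index keyword set.
        return 1 if query <= index_keywords else 0
    if sym == DISJ:
        # Disjunctive: at least one query keyword must appear in the index keyword set.
        return 1 if query & index_keywords else 0
    raise ValueError(f"unknown symbol: {sym!r}")


W = {"cloud", "privacy", "search"}
print(expected_test({"cloud", "search"}, W, CONJ))   # 1
print(expected_test({"cloud", "pairing"}, W, CONJ))  # 0
print(expected_test({"cloud", "pairing"}, W, DISJ))  # 1
```

An encrypted instantiation is correct when its Test agrees with this reference on every input, except with negligible probability.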

3.2. Security Model

Generally speaking, the security of searchable encryption means that the cloud server can infer as little information as possible from the encrypted data and the search process without sacrificing search ability. Before introducing the adaptive security definition of our scheme, we first define the privacy leakage that is inevitably revealed to the cloud server.

3.2.1. Size Pattern

Since the encrypted documents and queries are submitted to the cloud server, the cloud server can obtain the basic size information of these encrypted data easily. This is called the leakage of size pattern.

3.2.2. Access Pattern

For each query, the cloud server can obtain the identifiers of data records that match this query. This is called the leakage of access pattern.

3.2.3. Search Pattern

Given a record set $\{R_1, \ldots, R_s\}$ and a query set $\{Q_1, \ldots, Q_t\}$, the cloud server can create an $s \times t$ matrix whose element in the $i$th row and $j$th column is 1 if record $R_i$ matches query $Q_j$, and 0 otherwise. This matrix can be seen as the leakage of the search pattern.
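The search-pattern matrix can be made concrete with a short Python sketch: given records and queries in the clear, it builds the 0/1 matrix described above, which the server can assemble simply by observing which records each query returns. The disjunctive matching rule in the demo is an illustrative assumption; any match predicate can be passed in.

```python
from typing import AbstractSet, Callable, List, Sequence

Record = AbstractSet[str]
Query = AbstractSet[str]


def search_pattern(records: Sequence[Record], queries: Sequence[Query],
                   matches: Callable[[Record, Query], bool]) -> List[List[int]]:
    """Build M with M[i][j] = 1 iff record i matches query j."""
    return [[1 if matches(r, q) else 0 for q in queries] for r in records]


def disjunctive(r: Record, q: Query) -> bool:
    # Hypothetical matching rule for the demo: any shared keyword.
    return bool(r & q)


records = [{"a", "b"}, {"c"}, {"a", "c"}]
queries = [{"a"}, {"c", "d"}]
print(search_pattern(records, queries, disjunctive))  # [[1, 0], [0, 1], [1, 1]]
```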

Oblivious RAM can be used to hide the access and search patterns, but this technique is too inefficient for real applications. In this paper, we do not consider how to protect the access pattern and the search pattern.

3.2.4. Query Privacy

The leakage of query privacy means that the keywords in an encrypted query are revealed to the cloud server. It commonly exists in the public key setting, since anyone can construct an encrypted index for arbitrary keywords and test a received trapdoor against it. Because our scheme belongs to the public key encryption category, it does not protect query privacy.

As in previous works, we denote the information leakage, including the size pattern, access pattern, search pattern, and query privacy, as the leakage function L(D, I, Q), where D is a record set, I is a secure index, and Q is a query set.

3.2.5. Adaptive Security

With the leakage function defined above, we introduce an adaptive security definition for the PECDK scheme, adapted from the one proposed in [15], as follows.

Definition 2. A PECDK scheme is adaptively index-hiding against chosen plaintext attacks if, for all probabilistic polynomial-time adversaries $\mathcal{A}$, the advantage of $\mathcal{A}$ in the following game is negligible.
(1) Setup: The challenger $\mathcal{C}$ runs the KeyGen(γ) algorithm to generate the public key pk and the secret key sk. Then $\mathcal{C}$ gives pk to the adversary $\mathcal{A}$.
(2) Phase 1: $\mathcal{A}$ can adaptively ask $\mathcal{C}$ for the trapdoor $T_Q$ of any query $Q$ of its choice.
(3) Challenge: $\mathcal{A}$ selects two keyword sets $W_0$ and $W_1$ and sends them to $\mathcal{C}$. Suppose that $Q_1, Q_2, \ldots, Q_k$ are the keyword queries issued in Phase 1; the only restriction is that each $Q_i$ must reveal the same information about $W_0$ and $W_1$ under the leakage function $L(D, I, Q)$. Then $\mathcal{C}$ picks a random bit $b \in \{0, 1\}$, produces $I_{W_b} = \text{IndexBuild}(pk, W_b)$, and sends $I_{W_b}$ to $\mathcal{A}$.
(4) Phase 2: $\mathcal{A}$ can continue to ask for trapdoors of queries of its choice, subject to the same restriction as before.
(5) Response: $\mathcal{A}$ outputs a bit $b'$ and wins the game if $b' = b$.
We define $\mathcal{A}$'s advantage in the above game as
$$\mathrm{Adv}_{\mathcal{A}} = \left| \Pr[b' = b] - \frac{1}{2} \right|.$$
Intuitively, as long as $W_0$ and $W_1$ leak the same information under the leakage function $L(D, I, Q)$, we must ensure that the encrypted forms of $W_0$ and $W_1$ are computationally indistinguishable to the adversary.
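To make the game concrete, the following Python sketch plays only the Challenge and Response steps of Definition 2 and estimates the advantage empirically. It deliberately omits KeyGen and the trapdoor queries of Phases 1 and 2, and both `index_build` stand-ins are hypothetical: `leaky_index` publishes the keywords in the clear (advantage exactly 1/2), while `opaque_index` outputs bytes independent of the keywords (advantage close to 0).

```python
import secrets
from typing import Callable, FrozenSet, Tuple

KeywordSet = FrozenSet[str]


def play_challenge(index_build: Callable[[KeywordSet], object],
                   choose: Callable[[], Tuple[KeywordSet, KeywordSet]],
                   guess: Callable[[object, KeywordSet, KeywordSet], int]) -> bool:
    """Challenge and Response steps of the game; True if the adversary wins."""
    w0, w1 = choose()
    b = secrets.randbelow(2)              # challenger's hidden bit
    challenge = index_build((w0, w1)[b])  # stands in for IndexBuild(pk, W_b)
    return guess(challenge, w0, w1) == b


def estimate_advantage(index_build, choose, guess, trials: int = 2000) -> float:
    """Empirical estimate of Adv = |Pr[b' = b] - 1/2|."""
    wins = sum(play_challenge(index_build, choose, guess) for _ in range(trials))
    return abs(wins / trials - 0.5)


def choose() -> Tuple[KeywordSet, KeywordSet]:
    # Hypothetical adversary: two single-keyword sets.
    return frozenset({"cloud"}), frozenset({"privacy"})


def guess(challenge: object, w0: KeywordSet, w1: KeywordSet) -> int:
    # Guess 0 iff the challenge looks like the plaintext of W_0.
    return 0 if challenge == tuple(sorted(w0)) else 1


def leaky_index(w: KeywordSet) -> object:
    return tuple(sorted(w))         # insecure: publishes the keywords


def opaque_index(w: KeywordSet) -> object:
    return secrets.token_bytes(16)  # idealized: independent of the keywords


print(estimate_advantage(leaky_index, choose, guess))   # 0.5
print(estimate_advantage(opaque_index, choose, guess))  # close to 0
```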

4. Proposed PECDK Scheme

Based on the system and security models described in the previous section, we now present a method that converts index and query keyword sets into a vector space model, which can easily be applied to an IPE scheme.

4.1. Conversion Method

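We suppose that any keyword $w$ can be expressed as a bit string and define a function $H: \{0,1\}^* \rightarrow \mathbb{Z}_p$. Since $p$ is a large prime that is larger than the number of all words, $H$ can be made collision-resistant. This means that $H(w_1) \neq H(w_2)$ for any two distinct keywords $w_1$ and $w_2$.

We first construct univariate polynomial equations from the query and index keyword sets. After that, we use the coefficients and the roots of these equations to create a query vector and an index matrix, respectively. Let $Q = \{q_1, \ldots, q_m\}$ and $W = \{w_1, \ldots, w_n\}$ be two keyword sets, where $m \le n$. The approach is described as follows:
(1) For the keyword set $Q$, the following function can be constructed:
$$f_Q(x) = \prod_{i=1}^{m} \bigl(x - H(q_i)\bigr) = \sum_{k=0}^{m} a_k x^k \pmod p.$$
Taking the coefficients of $x^0, x^1, \ldots, x^m$ and padding with zeros up to length $n+1$, a vector can be obtained:
$$\vec{v} = (a_0, a_1, \ldots, a_m, 0, \ldots, 0) \in \mathbb{Z}_p^{\,n+1}.$$
(2) For the keyword set $W$, the following function can be constructed:
$$f_W(x) = \prod_{j=1}^{n} \bigl(x - H(w_j)\bigr).$$

It is not difficult to see that the roots of the equation $f_W(x) = 0$ are $H(w_1), \ldots, H(w_n)$. From these roots, an $(n+1) \times n$ matrix can be constructed:
$$X = \bigl(\vec{x}_1, \ldots, \vec{x}_n\bigr), \qquad \vec{x}_j = \bigl(1, H(w_j), H(w_j)^2, \ldots, H(w_j)^n\bigr)^{\mathsf{T}}.$$
(3) Note that if there is a keyword $w_j \in W$ such that $w_j \in Q$, it is not difficult to verify from the two constructions above that
$$\langle \vec{x}_j, \vec{v} \rangle = \sum_{k=0}^{m} a_k H(w_j)^k = f_Q\bigl(H(w_j)\bigr) = 0.$$
Conversely, if $w_j \notin Q$, every factor $H(w_j) - H(q_i)$ is nonzero by the collision resistance of $H$, so their product $\langle \vec{x}_j, \vec{v} \rangle$ is nonzero modulo the prime $p$.

As a result, by checking whether $\langle \vec{x}_j, \vec{v} \rangle = 0$ for each column, a conjunctive query (every query keyword matches some column) or a disjunctive query (at least one column matches) can be evaluated with the method above. Based on this, a concrete PECDK scheme is proposed in the next subsection.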
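The conversion can be checked numerically in the clear. The Python sketch below follows steps (1) to (3) under two assumptions that the text leaves open: the keyword map `H` is instantiated as SHA-256 reduced modulo the toy prime `P = 2**127 - 1` (a Mersenne prime), and the query vector is zero-padded to dimension n + 1. No encryption is performed; only the inner-product identity is verified.

```python
import hashlib
from typing import List, Sequence

# Toy modulus: the Mersenne prime 2^127 - 1 (an assumption; the text only
# requires a large prime p).
P = 2**127 - 1


def H(keyword: str) -> int:
    """Hypothetical keyword map into Z_P: SHA-256 digest reduced mod P."""
    digest = hashlib.sha256(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P


def query_vector(query: Sequence[str], n: int) -> List[int]:
    """Step (1): coefficients a_0..a_m of f_Q(x) = prod_i (x - H(q_i)) mod P,
    lowest degree first, zero-padded to length n + 1."""
    coeffs = [1]  # the constant polynomial 1
    for q in query:
        r = H(q)
        shifted = [0] + coeffs                         # x * f(x)
        scaled = [(-r * c) % P for c in coeffs] + [0]  # -r * f(x)
        coeffs = [(a + b) % P for a, b in zip(shifted, scaled)]
    if len(coeffs) > n + 1:
        raise ValueError("query has more keywords than the index dimension allows")
    return coeffs + [0] * (n + 1 - len(coeffs))


def index_matrix(keywords: Sequence[str], n: int) -> List[List[int]]:
    """Step (2): column j is (1, H(w_j), H(w_j)^2, ..., H(w_j)^n) mod P."""
    return [[pow(r, k, P) for k in range(n + 1)] for r in map(H, keywords)]


def inner(x: Sequence[int], v: Sequence[int]) -> int:
    """Inner product <x, v> mod P."""
    return sum(a * b for a, b in zip(x, v)) % P


W = ["cloud", "privacy", "search"]
n = len(W)
X = index_matrix(W, n)
v = query_vector(["search", "pairing"], n)
matches = [inner(x, v) == 0 for x in X]
print(matches)                            # [False, False, True]
print("disjunctive:", any(matches))       # True
print("conjunctive:", sum(matches) == 2)  # False, "pairing" is not in W
```

Step (3) is visible in the output: only the column built from "search" yields a zero inner product with the query vector.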

4.2. Construction Details

According to the definition of IPE [13], let IPE.Setup, IPE.Enc, IPE.GenKey, and IPE.Dec be the four algorithms of the IPE scheme, where $pk_{IPE}$ and $msk$ are the public key and the master secret key generated by IPE.Setup, $\vec{x}$ is an attribute vector, $\vec{v}$ is a predicate vector, $c$ is the ciphertext generated by IPE.Enc($pk_{IPE}$, $\vec{x}$), and $sk_{\vec{v}}$ is the secret key generated by IPE.GenKey($msk$, $\vec{v}$). Our PECDK scheme works as follows:
(i) KeyGen: Run IPE.Setup to obtain $pk_{IPE}$ and $msk$. The algorithm sets $pk = pk_{IPE}$ and $sk = msk$ and outputs pk and sk.
(ii) IndexBuild: Given a keyword set $W = \{w_1, \ldots, w_n\}$, the algorithm generates the matrix $X = (\vec{x}_1, \ldots, \vec{x}_n)$ with the conversion method of Section 4.1. Then it generates $I_W = (c_1, \ldots, c_n)$, in which $c_i = \text{IPE.Enc}(pk, \vec{x}_i)$.
(iii) Trapdoor: Given a keyword query $Q$ and a symbol sym, the algorithm creates $\vec{v}$ with the conversion method of Section 4.1. Then it generates $sk_{\vec{v}} = \text{IPE.GenKey}(sk, \vec{v})$ and outputs the trapdoor $T_Q = (sk_{\vec{v}}, sym)$.
(iv) Test: Given pk, a trapdoor $T_Q$, and an index $I_W$, there are two situations.
(1) If the symbol in $T_Q$ is $\vee$ (disjunctive search), the algorithm works as follows.
(a) Set a counter $i = 1$.
(b) If $i > n$, go to step (c); otherwise, compute IPE.Dec($sk_{\vec{v}}$, $c_i$). If the decryption succeeds (that is, $\langle \vec{x}_i, \vec{v} \rangle = 0$), output 1 and end. Otherwise, set $i = i + 1$ and go to step (b).
(c) Output 0 and end.
(2) If the symbol in $T_Q$ is $\wedge$ (conjunctive search), the algorithm works as follows.
(a) Set two counters $i = 1$ and $j = 0$.
(b) If $i > n$, go to step (c); otherwise, compute IPE.Dec($sk_{\vec{v}}$, $c_i$). If the decryption succeeds, set $j = j + 1$; otherwise, do nothing. After that, set $i = i + 1$ and go to step (b).
(c) If $j = m$, output 1 and end. Otherwise, output 0 and end.
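The two branches of Test can be sketched in Python as below. IPE decryption is replaced by a hypothetical plaintext stand-in, `make_dec`, which reports success exactly when ⟨x_i, v⟩ = 0 modulo a toy prime P = 101; a real deployment would call the decryption algorithm of the IPE scheme in [23] instead. The index keywords are assumed to be already mapped into Z_P, and the conjunctive branch assumes they are distinct, so each query keyword matches at most one column.

```python
from typing import Callable, Sequence, TypeVar

C = TypeVar("C")

P = 101  # toy prime modulus (assumption for the demo)


def pecdk_test_or(index: Sequence[C], dec: Callable[[C], bool]) -> int:
    """Branch (1): scan c_1..c_n and output 1 at the first successful decryption."""
    for c in index:
        if dec(c):
            return 1
    return 0


def pecdk_test_and(index: Sequence[C], dec: Callable[[C], bool], m: int) -> int:
    """Branch (2): count successful decryptions j and output 1 iff j == m."""
    j = 0
    for c in index:
        if dec(c):
            j += 1
    return 1 if j == m else 0


def make_dec(v: Sequence[int]) -> Callable[[Sequence[int]], bool]:
    """Plaintext stand-in for IPE.Dec(sk_v, c_i): succeeds iff <x_i, v> = 0 mod P."""
    return lambda x: sum(a * b for a, b in zip(x, v)) % P == 0


# Index keywords mapped to 3, 7, 11 in Z_P; columns are (1, r, r^2, r^3).
index = [[pow(r, k, P) for k in range(4)] for r in (3, 7, 11)]
v_7_11 = [77, -18, 1, 0]  # (x - 7)(x - 11) = x^2 - 18x + 77
v_7_13 = [91, -20, 1, 0]  # (x - 7)(x - 13) = x^2 - 20x + 91

print(pecdk_test_and(index, make_dec(v_7_11), m=2))  # 1: both 7 and 11 are indexed
print(pecdk_test_and(index, make_dec(v_7_13), m=2))  # 0: 13 is not indexed
print(pecdk_test_or(index, make_dec(v_7_13)))        # 1: 7 is indexed
```

The disjunctive branch may stop early, while the conjunctive branch always scans all n ciphertexts.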

4.3. Security Proof

The proposed PECDK scheme is constructed from the fully secure IPE scheme. Therefore, we have the following proposition.

Proposition 1. If the IPE scheme is secure, then our PECDK scheme is secure.

Proof Sketch: Suppose there is a PPT algorithm $\mathcal{A}$ that can break the PECDK scheme; we show that $\mathcal{A}$ can then be used to break the IPE scheme. To create pk and sk in the PECDK scheme, the challenger $\mathcal{C}$ uses IPE.Setup to generate $pk_{IPE}$ and $msk$, and sets $pk = pk_{IPE}$ and $sk = msk$. $\mathcal{A}$ can adaptively query trapdoors of keyword sets $Q_i$. It is not difficult to see that these trapdoors can be viewed as a group of decryption keys for the IPE scheme. Then $\mathcal{A}$ outputs two challenge keyword sets $W_0$ and $W_1$, subject to the restriction of Definition 2 for every queried $Q_i$. $\mathcal{C}$ flips a coin $b \in \{0, 1\}$ and gives the index $I_{W_b}$ to $\mathcal{A}$. This index can be viewed as a set of challenge ciphertexts of the IPE scheme. After this, $\mathcal{A}$ can continue querying trapdoors that obey the same restriction; these trapdoors can still be viewed as decryption keys for the IPE scheme. Finally, $\mathcal{A}$ outputs a guess $b'$. If $\mathcal{A}$ can break the PECDK scheme, then $|\Pr[b' = b] - 1/2|$ is non-negligible, which means that the two challenge indices can be distinguished. Because the challenge index in the PECDK scheme consists of challenge ciphertexts of the IPE scheme, by the security definition of IPE, $\mathcal{A}$ can be used to break the IPE scheme.

5. Performance Evaluation

5.1. Cost Analysis

We denote the previous PECDK scheme [15] as PECDK-2 and the proposed scheme as PECDK-1. Let $F = \{F_1, \ldots, F_D\}$ be the file set and $W_i$ be the keyword set of the corresponding file $F_i$, where $1 \le i \le D$. For each keyword set $W_i$, we build an index $I_{W_i}$ of file $F_i$. We assume that $n$ is the number of keywords in $W_i$, $m$ is the number of keywords in a query $Q$, and $N$ is a large number, such as the number of keywords in a dictionary, where $m, n \ll N$. The index and query structures of PECDK-1 and PECDK-2 are illustrated in Figures 2(a) and 2(b), respectively. Note that the index structure in Figure 2 is for a single file $F_i$; the index of the whole file set is simply the collection of the per-file indices.

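From Figure 2(a), we can see that $W_i$ is converted into a matrix in which each column is associated with a keyword in $W_i$ and represented by a vector $\vec{x}_j$, where $1 \le j \le n$. Correspondingly, the query $Q$ is converted into a vector $\vec{v}$ by the approach introduced in Section 4. The test algorithm verifies whether there exists at least one $j$ such that $\langle \vec{x}_j, \vec{v} \rangle$ equals 0 (for a conjunctive query, whether $m$ such values of $j$ exist). Therefore, the cost of PECDK-1 is independent of the vocabulary size $N$. Since index building encrypts $n$ attribute vectors whose dimension grows with $n$, trapdoor generation derives a single key, and testing performs up to $n$ IPE decryptions, the time complexities of index building, trapdoor generation, and testing are $O(n^2)$, $O(n)$, and $O(n^2)$, respectively.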

From Figure 2(b), suppose that the vocabulary is $\{u_1, \ldots, u_N\}$. For the keyword set $W_i$, the index of $F_i$ is initialized as a zero vector of length $N$, denoted $\vec{y}_i$. After initialization, if $u_k \in W_i$, the $k$th position of $\vec{y}_i$ is set to 1, where $1 \le k \le N$. The vector for a query $Q$ is built in the same way as that for $W_i$. The test process in PECDK-2 checks whether a single inner product of length-$N$ vectors equals 0. It is therefore clear that the time complexities of index building, trapdoor generation, and testing are all $O(N)$.
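The difference between the two index layouts can be made concrete with a short Python sketch. `pecdk2_index_vector` follows the initialization just described (with 0-based positions), and `plaintext_entries` counts the Z_p entries each scheme must encrypt per document. The PECDK-1 count assumes (n + 1)-dimensional attribute vectors, and both counts are vector entries rather than the group-element sizes reported in Table 1; the values n = 20 and N = 10000 echo the typical ranges quoted in this paper.

```python
from typing import Dict, List, Sequence


def pecdk2_index_vector(keywords: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """PECDK-2 style index: length-N zero vector with 1 at each keyword's position."""
    position = {word: k for k, word in enumerate(vocabulary)}
    vec = [0] * len(vocabulary)
    for w in keywords:
        vec[position[w]] = 1
    return vec


def plaintext_entries(n: int, N: int) -> Dict[str, int]:
    """Z_p entries per document index before IPE encryption.

    PECDK-1: n attribute vectors of dimension n + 1 (assumed padding).
    PECDK-2: one vector of dimension N.
    """
    return {"PECDK-1": n * (n + 1), "PECDK-2": N}


vocab = ["cloud", "data", "pairing", "privacy", "search"]
print(pecdk2_index_vector(["cloud", "search"], vocab))  # [1, 0, 0, 0, 1]
print(plaintext_entries(n=20, N=10000))  # {'PECDK-1': 420, 'PECDK-2': 10000}
```

Since IPE ciphertexts typically grow with the vector dimension, the per-document index of PECDK-1 stays smaller as long as n(n + 1) remains below N.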

Let $|G|$ denote the size of an element in $G$, $T_p$ the time cost of a pairing operation [28], and $T_e$ the time cost of an exponentiation on $G$. The results of the comparison with the PECDK-2 scheme are shown in Table 1.

Since pairing and exponentiation operations dominate the time cost of the encryption and test phases, we ignore all other operations. According to Table 1, compared with the previous PECDK scheme (PECDK-2), the proposed scheme (PECDK-1) has better time and space complexity when $n \ll N$.

In the following, we argue that the assumption $n \ll N$ holds in the real world. We investigate data collections used in recent information retrieval works [31, 32]. The statistics of these data sets are shown in Table 2. From this table, we find that the vocabulary size ($N$) commonly grows roughly linearly with the collection size. However, the number of keywords in a document ($n$) is usually only 3–5 (e.g., in scientific papers). Even if the abstract field is used as keywords, the number of keywords in a single document is approximately 200. Therefore, we can empirically regard $N$ as much larger than $n$.

Moreover, we investigated the OHSUMED collection [30], which was created for information retrieval research. In this collection, each document contains several fields, such as title, abstract, and sequential identifier. In our experiment, we use the “title” and “abstract” fields of a file (file name: ohsumed.87) containing more than 54000 documents for statistics. The results are described in Table 3. Whether we use the “title” or the “abstract” field as keywords, the keyword set size $n$ is far less than the vocabulary size $N$. In conclusion, the assumption $n \ll N$ holds.
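The statistics in Table 3 reduce to two quantities: the vocabulary size N of a field and the number n of distinct keywords per document. A minimal Python sketch of that computation follows. The tokenizer (lowercased alphanumeric runs, no stemming or stop-word removal) is an assumption, since the paper does not state how the OHSUMED fields were tokenized, and the two titles are made-up examples rather than OHSUMED records.

```python
import re
from typing import Dict, Iterable, Set


def keyword_set(text: str) -> Set[str]:
    """Hypothetical tokenizer: lowercase alphanumeric runs, duplicates removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def collection_stats(fields: Iterable[str]) -> Dict[str, float]:
    """Vocabulary size N plus max and mean keyword-set size n over documents."""
    sets = [keyword_set(t) for t in fields]
    vocabulary = set().union(*sets)
    sizes = [len(s) for s in sets]
    return {"N": len(vocabulary), "max_n": max(sizes), "mean_n": sum(sizes) / len(sizes)}


titles = ["Secure search over encrypted data", "Encrypted keyword search"]
print(collection_stats(titles))  # {'N': 6, 'max_n': 5, 'mean_n': 4.0}
```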

In addition, since PECDK-1 tests each keyword in $W_i$ against the query $Q$ one by one, it leaks information about the keyword set $W_i$ (for example, which individual keywords match the query). Therefore, the proposed scheme is proven secure under a weaker security definition (Definition 2) than the previous one [15].

5.2. Experiment Results

For our experiments, we build artificial plaintext indices with different numbers of keywords in the dictionary (i.e., $N$), different numbers of documents (i.e., $D$), and different numbers of keywords per document (i.e., $n$). In these indices, each keyword is denoted as a unique integer in the range [0, N]. We encrypted the indices with PECDK-1 and PECDK-2, respectively, and stored the encrypted indices on our machine. We then executed random queries over these encrypted indices. We implemented our constructions in Java with the Java Pairing-Based Cryptography library (JPBC) [29]. Our experiments were run on an Intel(R) Core(TM) i7-3520M CPU at 2.90 GHz with 3537 MB of memory. In our implementation, the bilinear map is instantiated as a Type A pairing (with a 512-bit base field), which offers a level of security equivalent to 1024-bit DLOG [29].
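The artificial plaintext index described above can be generated along the lines of the Python sketch below. Uniform sampling without replacement, the function names, and the fixed seeds are assumptions made for reproducibility, since the paper does not specify the distribution it used. The query size m = 5 matches the five-keyword queries used in Section 5.2.1.

```python
import random
from typing import List


def make_artificial_index(num_docs: int, n: int, N: int, seed: int = 0) -> List[List[int]]:
    """D documents, each with n distinct keyword IDs drawn uniformly from [0, N]."""
    rng = random.Random(seed)
    return [rng.sample(range(N + 1), n) for _ in range(num_docs)]


def make_random_query(m: int, N: int, seed: int = 1) -> List[int]:
    """A random query of m distinct keyword IDs from the same range."""
    return random.Random(seed).sample(range(N + 1), m)


index = make_artificial_index(num_docs=1000, n=20, N=10000)
query = make_random_query(m=5, N=10000)
print(len(index), len(index[0]), len(query))  # 1000 20 5
```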

5.2.1. Performance Comparison Between PECDK-1 and PECDK-2

For a query with five keywords, Figures 3 and 4 show the following:
(1) Figures 3(b) and 3(c) and Figures 4(a) and 4(c) show that the execution time of key generation in PECDK-1 is independent of $N$ and $D$, while that in PECDK-2 is independent of $n$ and $D$.
(2) Figures 3(b) and 4(a) indicate that the running time of index construction and keyword search in PECDK-1 is independent of $N$, while that in PECDK-2 is independent of $n$.
(3) Figures 3(c) and 4(c) illustrate that both the search time and the indexing time in PECDK-1 and PECDK-2 are nearly linear in $D$.
(4) Figures 3(a) and 4(b) show that PECDK-1 is influenced mainly by $n$, while PECDK-2 is affected primarily by $N$.

According to item (4), since $n$ and $N$ are two different parameters, we need to investigate how strongly each of them affects the time and space costs.

5.2.2. Key Generation

The running time of key generation in PECDK-1 is lower than that in PECDK-2, because the former depends only on $n$ while the latter depends on $N$. Since $n$ is much less than $N$, PECDK-1 is more efficient than PECDK-2 in the key generation phase.

5.2.3. Index Construction

Figure 3(b) shows that the time cost of index construction in PECDK-2 is linear in $N$. Specifically, when $N = \ldots$, it takes around 7720.292 seconds to build the index. Since this cost is linear in $N$, increasing $N$ by a factor of 500 would raise the index-building time to about $7720.292 \times 500 \approx 3.86 \times 10^{6}$ seconds, which is significantly longer than the index construction time of PECDK-1 when $n = \ldots$. Accordingly, we conclude that the PECDK-1 scheme is more suitable and practical, since $n$ is typically less than 20 and $N$ is typically larger than 10000.
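As a back-of-the-envelope check of the linear extrapolation above, assuming exact linear scaling in N, the PECDK-2 index-building time and, by the same argument, the PECDK-2 search time quoted in Section 5.2.5 become:

$$
\begin{aligned}
T_{\text{index}} &\approx 500 \times 7720.292\ \text{s} = 3\,860\,146\ \text{s} \approx 44.7\ \text{days},\\
T_{\text{search}} &\approx 500 \times 2220\ \text{s} = 1\,110\,000\ \text{s} \approx 12.8\ \text{days}.
\end{aligned}
$$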

5.2.4. Trapdoor Generation

Tables 4 and 5 show that the time of trapdoor generation in PECDK-1 is linear in $n$, while that in PECDK-2 is linear in $N$. Since $n$ is much less than $N$, building a trapdoor in PECDK-1 costs less than in PECDK-2. Unlike key generation, trapdoor generation is not a one-time cost: the data receiver performs it for every query. Therefore, the PECDK-1 scheme is more suitable for the mobile cloud, where data users have limited computational capacity.

5.2.5. Search Efficiency

Generally speaking, Figures 3(c) and 4(c) indicate that the time costs of conjunctive and disjunctive keyword search in PECDK-1 both increase with $n$, while those in PECDK-2 both increase linearly with $N$. Specifically, when $N = \ldots$, it takes around 2220 seconds to perform a conjunctive or disjunctive keyword search in PECDK-2. Since the search time is linear in $N$, increasing $N$ by a factor of 500 would raise it to about $2220 \times 500 = 1.11 \times 10^{6}$ seconds. In contrast, the PECDK-1 scheme costs only approximately 5000 seconds to perform a conjunctive or disjunctive keyword search when $n = \ldots$. For the same reason as above, the PECDK-1 scheme is more practical than PECDK-2.

5.2.6. Storage Overhead

As shown in Figures 5(a) and 5(b), we put forward the following arguments:
(1) The storage cost of the index in PECDK-1 is independent of $N$, and that in PECDK-2 is independent of $n$.
(2) The storage of the index in PECDK-1 rises with the square of $n$, while that in PECDK-2 grows linearly with $N$. Since $n$ is commonly less than 20 and $N$ is commonly larger than 10000, we deem the proposal more useful in practice.
As illustrated in Figure 5(c), the storage overheads of both PECDK-1 and PECDK-2 grow linearly with $D$. Because $n^2$ is significantly less than $N$, the growth rate of PECDK-1 is lower than that of PECDK-2, so the storage cost of the proposed scheme is less than that of the previous one when $D$ is large.

5.3. More Comments

Although index structures such as inverted indexes and R-trees can improve search efficiency, they do not support dynamic operations in the public key setting. Because anyone who can access pk can construct an index in this setting, it is difficult to combine the indices obtained from different data senders into one structured index. Our proposal naturally supports dynamic document updates, since each document is associated with its own encrypted index.

In addition, because of the simple index structure, the search process can easily be accelerated with parallel computation: each document's index is tested independently, so different indices can be processed on different cores or machines. Thus, we argue that our scheme is practical on cloud platforms.
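Because each document carries its own index and Test runs on each index independently, the per-document tests form an embarrassingly parallel workload. A minimal Python sketch using the standard `concurrent.futures` module follows; the `test` argument is a placeholder for the Test algorithm, and the plaintext demo predicate is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Index = TypeVar("Index")


def parallel_search(indexes: Sequence[Index], test: Callable[[Index], int],
                    workers: int = 4) -> List[int]:
    """Run Test on every per-document index concurrently; return matching document IDs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(test, indexes))  # map preserves input order
    return [doc_id for doc_id, hit in enumerate(results) if hit == 1]


# Demo with plaintext keyword sets and a disjunctive stand-in for Test.
indexes = [{"cloud"}, {"privacy"}, {"cloud", "search"}]
print(parallel_search(indexes, lambda w: 1 if w & {"cloud"} else 0))  # [0, 2]
```

In CPython, threads only speed up a `test` that releases the GIL (for example, one that calls into native pairing code); for pure-Python pairing arithmetic, `ProcessPoolExecutor` is the drop-in alternative, provided `test` is picklable.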

6. Conclusion

In this paper, we proposed a new approach to constructing an efficient PECDK scheme with better time and space complexity under an adaptive security model. To demonstrate the efficiency of the proposed scheme, we compared it with the existing PECDK scheme presented in [15] through theoretical analysis and experiments.

Since $n$ and $m$ are much smaller than $N$, we believe that the proposed scheme is beneficial for mobile applications with limited computation and memory. In the future, we plan to design a new index structure to reduce the time cost of search and index construction.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors gratefully acknowledge the National Natural Science Foundation of China under Grant nos. 61402393 and 61601396 and Shanghai Key Laboratory of Integrated Administration Technologies for Information Security (no. AGK201607).