Abstract

As Internet services become widely used on mobile devices, the amount of data produced by users steadily increases. Meanwhile, the storage capacity of these devices is too limited to hold the growing amount of data. Internet-connected storage that can be accessed anytime and anywhere is therefore increasingly important for storing and using large amounts of data. To use remote storage, the data to be stored need to be encrypted for privacy. The storage manager should also be able to search the data in response to a query without decrypting them. Unlike in the traditional environment, a query to Internet-connected storage is conveyed through an open channel, and hence its secrecy should be guaranteed. We propose a secure symmetric keyword search scheme that provides query privacy and is tailored to the equality test on encrypted data. The proposed scheme is efficient since it is based on prime-order bilinear groups. We formally prove that our construction satisfies ciphertext confidentiality and keyword privacy based on the hardness of the bilinear Diffie–Hellman (DH) assumption and the decisional 3-party DH assumption, respectively.

1. Introduction

With the development of IT technologies, including communications and computation, the use of small devices in daily life is increasing. Along with this change, the so-called Internet of Everything (IoE) is becoming part of our lives. In the IoE world, billions of devices are used for various IT services, including social network websites and applications, which handle users’ personal data to provide better services [1]. Not all data can be stored and managed on small, low-powered devices, and thus we need to use Internet-connected storage. Although Internet-connected storage makes it possible to use much more data without keeping it in local storage, we must take care of the security of data that are stored and managed remotely.

The main security concern in using Internet-connected storage such as cloud storage is data privacy [2–4]. The storage inevitably stores and manages a growing amount of clients’ sensitive data. Clients of the storage service must entrust their data to a service provider [5]. Encryption is the classical method of providing data privacy. To provide encryption-based access control for clients’ sensitive data, a number of value-added encryption techniques have been studied, including attribute-based encryption techniques [6].

The General Data Protection Regulation (GDPR) has forced companies to encrypt personal data to reduce the probability of a data breach [7]. Accordingly, companies are encrypting, storing, and managing customers’ personal information. When encryption is used for data privacy, we face another obstacle. The storage server should be given a capability that allows it to identify exactly the documents a client wants to retrieve without decrypting them. As one of the basic steps to resolve this difficulty, secure keyword search over encrypted data is receiving much attention. Secure keyword search enables a user to search the encrypted data with a keyword without revealing any information on the data. When an encrypted document is uploaded to a server, a set of ciphertexts of the keywords in the document is appended to the encrypted document. Let $C_w$ denote the ciphertext of keyword $w$. For a given query (also called trapdoor) $T_{w'}$, the server runs the function Test with inputs $C_w$ and $T_{w'}$ to identify whether or not $w = w'$. Only the user who can generate query $T_{w'}$ such that $\mathrm{Test}(C_w, T_{w'}) = 1$ can retrieve the encrypted documents containing keyword $w$. Secure keyword search is a primitive for constructing various queries and can be extended to complex queries such as range queries and inner-product queries [8].

Secure keyword search systems can be classified into two settings: asymmetric and symmetric. In the asymmetric setting [9–11], known as public key encryption with keyword search (PEKS), ciphertext $C_w$ of keyword $w$ is generated under a public key and only the owner of the corresponding secret key can generate trapdoor $T_w$. Hence, PEKS is suitable for a store-and-forward system such as an e-mail system. In the symmetric setting [12–18], ciphertext $C_w$ of keyword $w$ is generated under a symmetric key and only the owner of the key can generate trapdoor $T_w$ using the symmetric key. Here, the symmetric key is not shared but owned by one client. The symmetric setting is suitable for personal storage services as well as blog and web-hard services, where the same client uploads and downloads his/her data.

1.1. Necessity of Keyword Privacy

The formal notion of secure keyword search has considered ciphertext confidentiality, i.e., a semantic security against an attacker who generates the ciphertexts of keywords of her choice. When given a query and a ciphertext of a keyword, the server can decide whether or not the ciphertext is related to the query by running the function test. Therefore, it is not possible to guarantee ciphertext confidentiality without guaranteeing the secrecy of the query (keyword privacy).

As stated in [19, 20], it is not possible to provide keyword privacy of the trapdoor in PEKS, which is a searchable encryption in the asymmetric setting, because anyone can generate the ciphertext of a guessed keyword under the public key. Hence, an adversary can obtain the test result of the ciphertext of the guessed keyword and a given trapdoor. In [11], Rhee et al. first defined the notion of keyword privacy in the asymmetric setting. They proposed an enhanced PEKS scheme in which keyword privacy is provided only in situations where only the server can test whether the ciphertext and trapdoor are related. There have been several works providing keyword privacy in the symmetric setting. Shen et al. [20] first proposed a symmetric predicate encryption scheme for an inner-product operation of two vectors, which are used in generating a ciphertext and a token (like a trapdoor in PEKS), and considered keyword privacy in a symmetric predicate encryption scheme. Their scheme is constructed on composite-order bilinear groups, in which exponentiations and pairing operations cost about 25 and 30 times as much, respectively, as in prime-order groups [21]. Recently, Blundo et al. [12] proposed a symmetric hidden vector encryption in asymmetric prime-order bilinear groups. However, there exists no efficiently computable morphism between the two different groups used, and its security depends on the hardness of a nonstandard assumption.

However, in general, a predicate encryption differs from a searchable encryption in that a decryption occurs at the same time as the test process. Since it is not common to trust the administrator of the server in various cloud environments, it is necessary to separate the decryption and test processes so that the unreliable server cannot perform the decryption. As noted above, the previous results in symmetric predicate encryption and symmetric hidden vector encryption cannot be immediately adopted for symmetric keyword search.

Protocols providing access pattern privacy were also proposed in [22, 23]. That is, no one can learn which documents contain the keyword. However, access pattern privacy does not provide keyword privacy for the given queries. Keyword privacy is the more intuitive notion of the two: once information about the keyword is revealed from the queries, the privacy of the corresponding ciphertext cannot be guaranteed even though pattern privacy is guaranteed. In addition, the protocols providing access pattern privacy do not satisfy search correctness. That is, they cause search errors and require additional effort to fix them.

The comparisons with [20, 21] are shown in Section 4.

1.2. Our Contributions

Our contributions in this paper are twofold:
(1) We first define “trapdoor indistinguishability” for keyword privacy in symmetric keyword search against an active adversary who is able to obtain trapdoors as well as ciphertexts for any nontarget keyword of his choice. This security notion guarantees that a trapdoor does not reveal any information on any keyword.
(2) We construct a practical and secure keyword search scheme, called secure symmetric keyword search (SSKS), which is tailored for Internet-connected storage services. To construct SSKS, we exploit well-known results of PEKS. Moreover, the proposed scheme achieves both ciphertext confidentiality and keyword privacy. Our construction is efficient since it is based on prime-order bilinear groups, unlike the scheme in [20]. The security depends on the hardness of standard assumptions: ciphertext confidentiality is based on the hardness of the bilinear DH assumption, and keyword privacy depends on the hardness of the decisional 3-party DH assumption.

2. Preliminaries

Our concrete description uses pairing-related operations. In this section, we therefore describe fundamental definitions for pairings, hard problems defined over them, and formal definitions for the scheme description and its security features.

2.1. Underlying Mathematical Problems

The pairing operation is defined over an elliptic curve. We give only a simple definition of the operation, since our scheme can be understood with knowledge of the bilinearity property of the pairing alone.

Definition 1 (bilinear map). The definition of bilinear groups appears in [9]. Let $\mathbb{G}$ and $\mathbb{G}_T$ be two (multiplicative) cyclic groups of prime order $p$. We assume that $g$ is a generator of $\mathbb{G}$. A map $e: \mathbb{G} \times \mathbb{G} \rightarrow \mathbb{G}_T$ is a bilinear map if it has the following properties:
(1) For all $u, v \in \mathbb{G}$ and $a, b \in \mathbb{Z}_p$, $e(u^a, v^b) = e(u, v)^{ab}$
(2) $e(g, g) \neq 1$, and there is an efficient algorithm to compute the map $e$

To prove the security of our scheme, we use the bilinear Diffie–Hellman assumption and the decision three-party Diffie–Hellman assumption, which are defined as follows.

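Definition 2 (bilinear Diffie–Hellman assumption (BDH)). The BDH problem [9] is as follows: given $(g, g^a, g^b, g^c) \in \mathbb{G}^4$ for random $a, b, c \in \mathbb{Z}_p$, where $g$ is the generator and $e$ the bilinear map of Definition 1, compute $e(g, g)^{abc} \in \mathbb{G}_T$. The BDH assumption is that all polynomial time algorithms have a negligible advantage in solving the BDH problem.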

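Definition 3 (decision 3-party Diffie–Hellman assumption (3-party DH)). The decision 3-party Diffie–Hellman problem [24] is as follows: given $(g, g^a, g^b, g^c, Z) \in \mathbb{G}^5$ for random $a, b, c \in \mathbb{Z}_p$, decide whether $Z = g^{abc}$ or $Z$ is a random element of $\mathbb{G}$. The 3-party DH assumption is that all polynomial time algorithms have a negligible advantage in solving the decisional 3-party DH problem.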
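
As a concrete illustration of Definitions 1 to 3, the following Python sketch builds a toy bilinear map and samples instances of the two problems. It is only a sketch under stated assumptions: the parameters P, Q, and H, the additive encoding of the source group, and all function names are hypothetical choices that do not appear in the paper. Discrete logarithms in this toy group are trivial, so it offers no security.

```python
import secrets

# Toy parameters (assumptions for illustration, not taken from the paper).
P = 1019   # prime order p shared by the source group G and the target group G_T
Q = 2039   # prime modulus with Q = 2 * P + 1, so Z_Q^* has a subgroup of order P
H = 4      # element of order P in Z_Q^*; plays the role of e(g, g)
G_GEN = 1  # generator g of G, where G is Z_P written additively


def power(x: int, a: int) -> int:
    """Return x^a in G; with the additive encoding this is a * x mod P."""
    return (a * x) % P


def pairing(x: int, y: int) -> int:
    """Return e(x, y) = H^(x * y) in G_T, which is bilinear in both arguments."""
    return pow(H, (x * y) % P, Q)


def random_exponent() -> int:
    """Sample a nonzero exponent from Z_P."""
    return secrets.randbelow(P - 1) + 1


def bdh_instance():
    """Sample (g, g^a, g^b, g^c), the exponents, and the BDH answer e(g, g)^(abc)."""
    a, b, c = random_exponent(), random_exponent(), random_exponent()
    instance = (G_GEN, power(G_GEN, a), power(G_GEN, b), power(G_GEN, c))
    answer = pow(pairing(G_GEN, G_GEN), (a * b * c) % P, Q)
    return instance, (a, b, c), answer


def three_party_dh_instance(real: bool):
    """Sample (g, g^a, g^b, g^c, Z) with Z = g^(abc) if real, else Z random in G."""
    a, b, c = random_exponent(), random_exponent(), random_exponent()
    z = power(G_GEN, a * b * c) if real else secrets.randbelow(P)
    instance = (G_GEN, power(G_GEN, a), power(G_GEN, b), power(G_GEN, c), z)
    return instance, (a, b, c)
```

Because the source group is encoded additively, anyone can read the exponents off a toy instance. A real instantiation would use a pairing-friendly elliptic curve, where recovering them is assumed to be infeasible.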

2.2. Formal Definitions for SKSS

We begin by reviewing the formal definition of symmetric keyword search scheme.

Definition 4. A symmetric keyword search scheme (SKSS) can be noted as (KG, SEKS, STd, Test), which consists of four algorithms. The algorithms are described as follows:
(1) KG takes security parameter $\lambda$ and generates secret key $sk$.
(2) SEKS takes as input secret key $sk$ and keyword $w \in \mathcal{W}$, where $\mathcal{W}$ is a keyword space. It returns ciphertext $C_w$.
(3) STd takes as input secret key $sk$ and keyword $w'$. It outputs trapdoor $T_{w'}$.
(4) Test takes as input ciphertext $C_w$ and trapdoor $T_{w'}$. If $w = w'$, output “1”; otherwise, output “0.”

For any scheme, it should be guaranteed that it works as intended. More precisely, if $w$ is identical to $w'$, then the test algorithm Test outputs 1. For a SKSS, we define its correctness as follows.

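Definition 5 (correctness). For security parameter $\lambda$, we say that a SKSS scheme (KG, SEKS, STd, Test) defined over a keyword space $\mathcal{W}$ and secret key $sk$ satisfies correctness if, for any keywords $w, w' \in \mathcal{W}$,

$$\Pr[\mathrm{Test}(C_w, T_{w'}) = 1] = 1 \text{ if } w = w', \qquad \Pr[\mathrm{Test}(C_w, T_{w'}) = 1] \le \epsilon(\lambda) \text{ if } w \neq w',$$

where $C_w$ is valid for a keyword $w$ and $T_{w'}$ is valid for a keyword $w'$. Here, $\epsilon$ is negligible if, for any constant $c$, there exists $\lambda_0$ such that $\epsilon(\lambda) < \lambda^{-c}$ for $\lambda > \lambda_0$.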
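
The four-algorithm interface of Definition 4 and the correctness condition can be illustrated with a short Python sketch. It is not the construction proposed in this paper: it swaps in a simple HMAC-based design, its trapdoors are deterministic, and the function names kg, seks, std, and run_test are hypothetical. It only shows how ciphertexts and trapdoors interact through Test.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def _prf(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 used as a pseudorandom function."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def kg(key_bytes: int = 32) -> bytes:
    """KG: generate a fresh symmetric secret key."""
    return secrets.token_bytes(key_bytes)


def seks(sk: bytes, keyword: str) -> Tuple[bytes, bytes]:
    """SEKS: randomized keyword ciphertext (nonce, tag)."""
    nonce = secrets.token_bytes(16)
    tag = _prf(_prf(sk, keyword.encode()), nonce)
    return nonce, tag


def std(sk: bytes, keyword: str) -> bytes:
    """STd: trapdoor for a keyword (deterministic in this toy design)."""
    return _prf(sk, keyword.encode())


def run_test(ciphertext: Tuple[bytes, bytes], trapdoor: bytes) -> bool:
    """Test: recompute the tag from the trapdoor and compare in constant time."""
    nonce, tag = ciphertext
    return hmac.compare_digest(_prf(trapdoor, nonce), tag)
```

A matching keyword makes run_test return True, while a different keyword yields False except with the negligible probability of an HMAC collision. Because std is deterministic here, two trapdoors for the same keyword are identical, which is one reason this sketch should not be read as achieving keyword privacy.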

2.2.1. Ciphertext Confidentiality

Our definition for a ciphertext confidentiality (SEKS-IND-CPA-security) follows the general framework of those given in [9, 12, 13].

Let $\mathcal{A}$ be a probabilistic polynomial time adversary whose running time is bounded by $t$, which is a polynomial in security parameter $\lambda$. In the experiment of Table 1, $\mathcal{A}$ chooses keywords $w_0$ and $w_1$ in the find stage. Given challenge ciphertext $C_{w_b}$ in the guess stage, $\mathcal{A}$ tries to correctly guess $b$. A string $s$ is used to retain some state information. $\mathcal{A}$ is allowed to obtain trapdoors and ciphertexts by querying trapdoor oracle $\mathcal{O}_{\mathrm{STd}}$ and encryption oracle $\mathcal{O}_{\mathrm{SEKS}}$, respectively. However, $\mathcal{A}$ is not allowed to obtain the trapdoor of $w_0$ or $w_1$. Otherwise, $\mathcal{A}$ could run $\mathrm{Test}(C_{w_b}, T_{w_0})$ to find out $b$.

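Here, the trapdoor oracle and the encryption oracle are defined as follows.

Oracle $\mathcal{O}_{\mathrm{STd}}(w)$: Return $\mathrm{STd}(sk, w)$
Oracle $\mathcal{O}_{\mathrm{SEKS}}(w)$: Return $\mathrm{SEKS}(sk, w)$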

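The advantage of $\mathcal{A}$ in attacking ciphertext confidentiality is defined as follows:

$$\mathrm{Adv}^{\text{SEKS-IND-CPA}}_{\mathcal{A}}(\lambda) = \left| \Pr[b' = b] - \frac{1}{2} \right|,$$

where $b$ is the challenge bit and $b'$ is the guess output by $\mathcal{A}$ in the experiment.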

Definition 6 (ciphertext confidentiality). We say that a scheme satisfies SEKS-IND-CPA-security against an adaptive chosen plaintext attack if, for any polynomial time adversary $\mathcal{A}$, the advantage of $\mathcal{A}$ is negligible in security parameter $\lambda$.

2.2.2. Keyword Privacy

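We newly define keyword privacy (KEY-IND-CPA-security) for SSKS. In the experiment of Table 2, adversary $\mathcal{A}$ tries to correctly guess $b$ of challenge trapdoor $T_{w_b}$. $\mathcal{A}$ is allowed to query the trapdoor oracle and the encryption oracle. However, $\mathcal{A}$ should not be allowed to obtain the ciphertext of $w_0$ or $w_1$. Otherwise, $\mathcal{A}$ could run $\mathrm{Test}(C_{w_0}, T_{w_b})$ to find out $b$. $\mathcal{A}$ also needs to be restricted in obtaining the trapdoor of $w_0$ or $w_1$. Otherwise, $\mathcal{A}$ might find ciphertext $C$ in the database such that $\mathrm{Test}(C, T_{w_b}) = 1$. If the trapdoor of $w_0$ (or $w_1$) is available, then $\mathcal{A}$ can easily decide the value of $b$.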

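The advantage of $\mathcal{A}$ in attacking keyword privacy is defined as follows:

$$\mathrm{Adv}^{\text{KEY-IND-CPA}}_{\mathcal{A}}(\lambda) = \left| \Pr[b' = b] - \frac{1}{2} \right|,$$

where $b$ is the challenge bit and $b'$ is the guess output by $\mathcal{A}$ in the experiment.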

Definition 7 (keyword privacy). We say that a scheme satisfies KEY-IND-CPA-security against an adaptive chosen plaintext attack if, for any polynomial time adversary $\mathcal{A}$, the advantage of $\mathcal{A}$ is negligible in security parameter $\lambda$.
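
The oracle restrictions in the keyword privacy experiment can be made concrete with a small Python sketch. The toy HMAC-based scheme, the class KeywordPrivacyGame, and the function ciphertext_attack are assumptions introduced for illustration and are not the paper's construction. The sketch shows that an adversary who may obtain a ciphertext of a challenge keyword recovers the challenge bit by running Test, which is why the experiment forbids such queries.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 used as a pseudorandom function."""
    return hmac.new(key, msg, hashlib.sha256).digest()


class KeywordPrivacyGame:
    """Challenger for a keyword privacy experiment over a toy scheme."""

    def __init__(self, enforce_restrictions: bool = True):
        self.sk = secrets.token_bytes(32)
        self.enforce = enforce_restrictions
        self.queried = set()   # keywords already sent to either oracle
        self.challenge = ()    # (w0, w1) once the challenge is issued
        self.b = None          # hidden challenge bit

    def _record(self, w: str) -> None:
        # After the challenge, queries on w0 or w1 are rejected.
        if self.enforce and w in self.challenge:
            raise PermissionError("oracle query on a challenge keyword")
        self.queried.add(w)

    def trapdoor_oracle(self, w: str) -> bytes:
        self._record(w)
        return prf(self.sk, w.encode())

    def encryption_oracle(self, w: str):
        self._record(w)
        nonce = secrets.token_bytes(16)
        return nonce, prf(prf(self.sk, w.encode()), nonce)

    def challenge_trapdoor(self, w0: str, w1: str) -> bytes:
        # Keywords queried before the challenge cannot be chosen as targets.
        if self.enforce and self.queried & {w0, w1}:
            raise PermissionError("challenge keyword was already queried")
        self.challenge = (w0, w1)
        self.b = secrets.randbelow(2)
        return prf(self.sk, self.challenge[self.b].encode())

    def finalize(self, guess: int) -> bool:
        return guess == self.b


def ciphertext_attack(game: KeywordPrivacyGame) -> bool:
    """Adversary that asks for a ciphertext of w0 and runs Test on it."""
    trapdoor = game.challenge_trapdoor("alice", "bob")
    nonce, tag = game.encryption_oracle("alice")
    matches_w0 = hmac.compare_digest(prf(trapdoor, nonce), tag)
    return game.finalize(0 if matches_w0 else 1)
```

With enforce_restrictions=False the attack guesses the bit correctly in every run, up to HMAC collisions. With the default setting the encryption oracle raises PermissionError, so the attack cannot be mounted.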

3. New Symmetric Keyword Search with Keyword Privacy

In this section, we give a detailed description for our symmetric keyword search with keyword privacy.

Since we use pairing operations for our scheme, we use the following notations. Let and be groups of prime order , and let be a bilinear map. We use hash functions and . Our construction works as follows:

KG takes the security parameter and picks a random exponent , generator , and a random value . It outputs secret key

SEKS takes as input secret key and keyword , where is a keyword space. It picks a random exponent and returns the ciphertext:

STd takes as input secret key and keyword . It picks a random exponent and outputs the corresponding trapdoor:

Test takes as input ciphertext and trapdoor and parses as and as . It checks if the following equality holds:

If so, output “1”; otherwise, output “0.”

4. Analysis

In this section, we analyze the proposed scheme in terms of security against the security notions discussed in Section 2.2. We also compare the proposed scheme with existing schemes to show that our scheme guarantees better security than the existing schemes.

4.1. Security

We now prove that our construction satisfies ciphertext confidentiality and keyword privacy. Ciphertext confidentiality is proved in the same manner as in [9].

Theorem 1. If the -decisional BDH assumption holds in , then our SSKS scheme is -secure.

Proof. Suppose is an adversary that has advantage in breaking ciphertext confidentiality. We construct that solves the BDH problem with probability at least , where is the base of the natural logarithm and (resp., ) is the number of hash function queries (resp., the number of trapdoor queries). Given , , , and , the goal of is to compute . interacts with as follows:

Setup. picks a random , and let and .

(1) ,- Queries. In the same manner as in [9], can simulate and queries. If there exists -list, then responds with . Otherwise, generates a random coin so that . If , then picks a random and sets . Otherwise, sets . adds the tuple to the -list and responds to with .
Similarly, if there exists such that -list, then responds with . Otherwise, responds to a query for by picking a random for and setting and adds to the -list.

Query Phase 1. makes trapdoor and ciphertext queries as follows:

(2) Trapdoor Queries. can obtain such that , where -list such that . If , then reports failure and terminates. Otherwise, since there exists such that , picks a random and sets and . gives back for to .

(3) Ciphertext Queries. can obtain such that , where -list such that . picks a random and sets , and , where is the value set in the setup phase. gives back for to .

Challenge. outputs challenge keywords and . To obtain such that and , queries and to -queries. Let be the corresponding tuples on the -list . If both and , then aborts. Otherwise, since there exists such that , picks a random and sets , and , where is the value selected in the setup phase. responds with .

Query Phase 2. answers the queries in the same manner as in phase 1 under the restriction of trapdoor queries of .

Output. outputs its guess, . The value was set with the probability in the setting of the queries. Since queries the oracle regarding the value of the form with the same probability , there exists one pair of the form -list. Therefore, picks a random pair -list and outputs as its guess for , where is the value selected in the challenge phase.

To show that correctly outputs with probability at least , we should analyze the probability that does not abort during the simulation. We define the following events:
(1) does not abort as a result of any of ’s trapdoor queries
(2) does not abort during the challenge phase
(3) does not issue a query for either one of in the real attack

We can show in the same manner as in [9] that We omit the detailed description.
To prove keyword privacy, we consider a hybrid game which differs on what challenge trapdoor is given by the challenger to . We suppose that is the real trapdoor , which is the challenge ciphertext given to the adversary during a real security game or is a random .

Theorem 2. If the -decisional 3-party DH assumption holds, then our SSKS scheme is -secure.

Proof. Suppose that there exists a -time adversary with nonnegligible difference between its advantage for challenge and its advantage for challenge . We construct an algorithm that solves the decisional 3-party DH problem in . Given a random challenge , outputs 1 if and 0 otherwise. interacts with as follows:

Init. outputs challenge keywords . flips a coin to obtain , internally.

Setup. chooses a random and sets unknown secret values and . chooses random exponents , and sets and .

Query Phase 1. makes trapdoor and ciphertext queries of the form as follows:
(1) Trapdoor Queries. If , then chooses a random exponent and sets . If , then aborts.
(2) Ciphertext Queries. If , then aborts. If , then chooses a random exponent and sets .

Challenge Phase. chooses a random exponent and outputs the challenge trapdoor for keyword as follows:

Query Phase 2. answers the queries in the same manner as in phase 1 under the restriction of trapdoor and ciphertext queries of .

Guess. outputs guess in response to the challenge trapdoor. If , then outputs 1. Otherwise, outputs 0.

’s advantage in solving the decisional 3-party DH problem is directly taken from ’s advantage to distinguish between and , except with negligible probability less than or equal to .

4.2. Comparison

Since the symmetric searchable encryption scheme is related to symmetric predicate encryption schemes and to symmetric searchable encryption providing access pattern privacy, we use the following two categories for a fair comparison.

4.2.1. Symmetric Predicate Encryption

There have been several studies providing keyword privacy in the symmetric setting. In [20], a symmetric predicate encryption scheme for an inner-product operation of two vectors was proposed, and keyword privacy was considered for symmetric predicate encryption.

Their scheme is constructed on composite-order bilinear groups, in which exponentiations and pairing operations cost about 25 and 30 times as much, respectively, as in prime-order groups. Recently, Blundo et al. [12] proposed a symmetric hidden vector encryption in asymmetric prime-order bilinear groups. However, there exists no efficiently computable morphism between the two different groups used, and its security depends on the hardness of a nonstandard assumption. The comparisons with [20, 21] are shown in Table 3. However, as mentioned before, predicate encryption cannot be immediately adopted for symmetric keyword search on cloud storage services.

4.2.2. Symmetric Searchable Encryption

In the previous works on symmetric searchable encryption [12–18], only ciphertext confidentiality was guaranteed. In searchable encryption schemes, if trapdoor $T_w$ is given, then the server can run the test function with $T_w$ and ciphertext $C_{w'}$. When keyword $w$ is revealed from a trapdoor, the confidentiality of the ciphertext associated with $w$ cannot be guaranteed. Once keyword privacy is ensured, even if we know that the given trapdoor and the ciphertext are associated, we cannot know which keyword is related to the trapdoor and the ciphertext. As shown in Table 4, we first defined keyword privacy in a symmetric searchable encryption scheme and presented an efficient SSKS scheme with ciphertext confidentiality as well as keyword privacy.

4.2.3. Efficiency

Since access pattern privacy is not the main security goal of this work, as mentioned in the beginning of Section 4.2, we compare our scheme with predicate encryption schemes. In Table 5, we compare our scheme and the other schemes in terms of functionalities and performance. The table shows that our scheme provides both ciphertext confidentiality and keyword privacy and enables searching on encrypted data without any decryption. In contrast, predicate encryption schemes require decryption operations to implement the test function. Therefore, predicate encryption schemes have the restriction that the test function can be processed only by a trusted party. In comparison with Shen et al.’s scheme and Blundo et al.’s scheme, our scheme requires a similar ciphertext size and the smallest trapdoor size. Most importantly, in the test phase, our scheme needs less computation than the other schemes.

5. Conclusions

In this paper, we proposed a practical and secure keyword search scheme, called secure symmetric keyword search (SSKS), which is tailored for cloud storage services. We first defined keyword privacy in a symmetric searchable encryption scheme and presented an SSKS scheme that guarantees ciphertext confidentiality as well as keyword privacy. Our SSKS scheme and the new security model for keyword privacy in the symmetric setting can be further exploited to construct symmetric searchable encryption schemes providing extended queries such as conjunctive and inner-product queries [25–27].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partly supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea Government (MSIT) (No. 2021-0-00779, Development of High-Speed Encryption Data Processing Technology That Guarantees Privacy Based Hardware, 50%) and National R&D Program through the National Research Foundation of Korea (NRF) funded by Ministry of Science and ICT (NRF-2021R1F1A1056115, 50%).