Abstract

Private set intersection (PSI) is a fundamental cryptographic primitive that allows two parties to compute the intersection of their data sets without exposing any additional private information. In cloud-based IoT systems, IoT-enabled devices would like to outsource their data sets to the cloud in encrypted form. In this scenario, how to delegate the set intersection computation over outsourced encrypted data sets to the cloud, and how to achieve fine-grained access control for PSI without divulging any additional information to the cloud, are still open problems. To address them, we combine key-policy attribute-based encryption (KP-ABE) and PSI to introduce a novel concept called delegated key-policy attribute-based set intersection over outsourced encrypted data sets (KP-ABSI). We then propose the first concrete KP-ABSI scheme and analyze its efficiency.

1. Introduction

The Internet of Things (IoT) is enabling Smart City initiatives all over the world. Recently, IoT-based applications such as smart grid and smart healthcare have been widely developed [1, 2]. With the growth of IoT, an enormous amount of data is generated by IoT-enabled devices, and these data need to be stored, processed, and accessed. To address these needs, Alessio et al. first merged cloud and IoT into a new paradigm named CloudIoT [3]. For cloud-based IoT, the security and privacy of IoT big data are an active research topic.

Private set intersection (PSI), first proposed in [4], is a special case of secure multiparty computation. It enables two parties to compute the intersection of their data sets while preserving privacy. It applies to many practical scenarios, such as IoT and internet-based personal health record (PHR) systems. In traditional PSI solutions, the data users hold their own data sets. However, in CloudIoT computing, data users (i.e., IoT-enabled devices) with limited computing power and storage resources would like to outsource their data sets to the cloud. For confidentiality and privacy, data sets should be encrypted before outsourcing. Cloud service providers provide flexible services to fulfill cloud users’ demand.

Based on this, we study PSI over outsourced encrypted data sets in the CloudIoT system. In this scenario, the data users (i.e., the IoT-enabled devices) encrypt and outsource their data sets to the cloud and then delegate the set intersection computation to the cloud. This setting has been studied in several works [5–7]. However, it raises the question of how to enforce fine-grained access control that limits the cloud’s capability to compute set intersections. To this end, Mohammad Ali et al. combined ciphertext-policy attribute-based encryption and private set intersection to propose an attribute-based private set intersection scheme [8]. However, in their solution, the data user who requests the set intersection operation must hold his/her data set in plaintext form, so the solution does not really address outsourced encrypted data sets. Besides, there is still no solution for the key-policy setting.

In this paper, we combine, for the first time, key-policy attribute-based encryption and private set intersection to introduce a novel concept called delegated key-policy attribute-based set intersection over outsourced encrypted data sets (KP-ABSI). KP-ABSI focuses on the problem of set intersection over outsourced data sets in the cloud paradigm. It allows each data owner to specify an attribute set for his/her data set and to encrypt the data set before outsourcing. A data user whose access control policy is satisfied by the attribute sets specified by the data owner and by himself/herself can generate a token to delegate to the cloud server the set intersection over his/her own and the data owner’s outsourced encrypted data sets. We formally give the definition and security notions for KP-ABSI and propose a concrete construction.

Our KP-ABSI scheme has three distinctive properties: (1) It realizes fine-grained authorization for set intersection over outsourced encrypted data sets by combining KP-ABE and PSI. (2) The cloud server cannot obtain any information about the plaintexts beyond the result of the set intersection, which is itself in ciphertext form. (3) Compared with existing PSI schemes, our scheme does not require interaction with the data owner or the trusted authority.

Although PSI has been studied extensively, the existing solutions cannot solve the problems considered in this paper. In the following, we briefly introduce the related work, which can be divided into three categories.

Two-Party Private Set Intersection. Traditional PSI has two participants, a data owner and a data user. Both of them hold their own data sets and interactively compute the set intersection [9, 10]. However, two-party PSI does not apply to cloud computing because the two parties must hold their data sets themselves.

Three-Party Private Set Intersection. Typically, three-party PSI involves three participants: a data owner, a data user, and the cloud server. The data user and the data owner outsource their data sets to the cloud and delegate the set intersection computation to the cloud [5, 7, 11]. Moreover, public key encryption with equality test [12–15] can also be used to attain this goal. However, these solutions lack a noninteractive authorization mechanism: the data owner must be online to authorize the data user. So, they are not practical in cloud computing.

Attribute-Based Encryption. ABE, which was introduced by Sahai and Waters, achieves fine-grained access control for outsourced data [16]. There are two variants of ABE: key-policy attribute-based encryption (KP-ABE), where the decryption key is associated with the access control policy (e.g., [17–19]), and ciphertext-policy attribute-based encryption (CP-ABE), where the ciphertext is associated with the access control policy (e.g., [20–22]). In 2017, Zhu et al. presented a key-policy attribute-based encryption scheme with equality test, which can be used to compute the set intersection over outsourced encrypted data sets of one element. After that, Wang et al. proposed the first ciphertext-policy attribute-based encryption with equality test scheme [23, 24]. Later, Cui et al. improved its efficiency [25].

However, attribute-based encryption with equality test handles only a single element. To overcome this, in 2020, Mohammad Ali et al. first combined CP-ABE and PSI to propose an attribute-based set intersection scheme [8]. It achieves fine-grained access control for set intersection computation. Unfortunately, their solution requires the data user to hold his/her data set in plaintext form, so it does not really address outsourced encrypted data sets in cloud computing. Moreover, there are no key-policy solutions for attribute-based set intersection.

Thus, in this paper, we combine KP-ABE with PSI to introduce a novel primitive: delegated key-policy attribute-based set intersection over outsourced encrypted data sets (KP-ABSI). For comparison, we summarize the properties of the KP-ABSI scheme in Table 1.

3. Problem Formulation

3.1. System Model

The system model for KP-ABSI is shown in Figure 1. There are three participants: the trusted attribute authority, the cloud users (e.g., data owner Alice, authorized data user Bob, and unauthorized data user Carlos), and the cloud server. The trusted attribute authority initializes the public parameters and issues private keys to data users according to their access control policies. The cloud server provides powerful storage and computing services for cloud users. The cloud users outsource their private data sets to the cloud server. Specifically, a cloud user, Alice, outsources her data set to the cloud in encrypted form, where the encryption is conducted under some attribute set. An authorized user, Bob, whose access control policy is satisfied by that attribute set, can delegate to the cloud the computation of the set intersection between Alice’s outsourced encrypted data set and his own outsourced encrypted data set (Bob naturally has the private key to decrypt his own outsourced encrypted data). Meanwhile, an unauthorized user, Carlos, can neither decrypt Alice’s outsourced encrypted data set nor delegate the set intersection operation to the cloud.

In this model, we assume that the cloud is semitrusted (i.e., honest-but-curious), which means that the cloud honestly executes the protocol for two honest users but tries to learn useful information beyond the ciphertexts through set intersection operations. Cloud users may be malicious and may collude with each other. We even allow a malicious user, say Bob, to collude with the semitrusted cloud. However, in this case, we cannot require that the cloud be unable to decrypt the encrypted data set of an honest user, say Alice, when the malicious colluding user, Bob, holds a private key that decrypts Alice’s data set (e.g., Bob can simply give his private key to the cloud).

3.2. Functional Definition

In this part, we introduce the formal definition of delegated key-policy attribute-based set intersection over outsourced encrypted data sets (KP-ABSI), where private keys are associated with access control policies. Throughout, an access policy is evaluated against an attribute set by a predicate that outputs 1 if and only if the attribute set satisfies the access policy in KP-ABSI.
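As an illustration of the satisfaction predicate, the following minimal sketch assumes that access policies are threshold access trees, the form used later in the construction. The `Node` class, the `satisfies` function, and the attribute names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """A node of a threshold access tree (hypothetical representation)."""
    threshold: int = 1                                  # children that must be satisfied
    children: List["Node"] = field(default_factory=list)
    attribute: Optional[str] = None                     # set only on leaf nodes


def satisfies(attrs: Set[str], node: Node) -> int:
    """Return 1 if the attribute set satisfies the subtree rooted at node, else 0."""
    if node.attribute is not None:
        # A leaf is satisfied exactly when its attribute is in the set.
        return int(node.attribute in attrs)
    satisfied = sum(satisfies(attrs, child) for child in node.children)
    return int(satisfied >= node.threshold)


def leaf(name: str) -> Node:
    return Node(attribute=name)


# Example policy: "doctor" AND ("cardiology" OR "icu").
policy = Node(threshold=2, children=[
    leaf("doctor"),
    Node(threshold=1, children=[leaf("cardiology"), leaf("icu")]),
])
```

An AND gate is a node whose threshold equals its number of children, and an OR gate is a node with threshold 1, so one recursive rule covers both.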

Definition 1. Delegated key-policy attribute-based set intersection over outsourced encrypted data sets (KP-ABSI) consists of five algorithms:

Setup: Given a security parameter as input, the trusted attribute authority initializes the system public parameters and the master secret key.

Key generation: Given the master secret key and an access control policy, the trusted attribute authority issues a private key for a data user.

Encryption: Given an attribute set, a data user encrypts his/her private data set into a ciphertext. The resulting ciphertext is outsourced to the cloud.

TokenGen: With his/her private key, the data user generates a token and delegates the set intersection computation to the cloud.

SI: The cloud uses the token to compute, on behalf of two data users, the set intersection only if the access control policy corresponding to the token is satisfied by both attribute sets, which are, respectively, specified by the two ciphertexts.

We say that a KP-ABSI scheme is correct if the following holds: given honestly generated public parameters, a private key for an access control policy, a token derived from that key, and the ciphertexts of two data sets under two attribute sets, if both attribute sets satisfy the policy, then the output of SI is the encrypted form of the intersection of the two data sets.
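To make the interface of Definition 1 concrete, the sketch below models the five algorithms as an idealized functionality with no cryptography at all: ciphertexts simply carry their contents, so it illustrates only who calls what and when SI refuses to answer. All names (`setup`, `key_gen`, `enc`, `token_gen`, `si`, and the attribute strings) are assumptions made for illustration, and a policy is represented as a plain Python predicate over attribute sets.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple

# A policy is modeled as a predicate over attribute sets (an assumption of this sketch).
Policy = Callable[[FrozenSet[str]], bool]


@dataclass(frozen=True)
class Ciphertext:
    attrs: FrozenSet[str]     # attribute set chosen by the encryptor
    payload: FrozenSet[str]   # placeholder only: NOT actually encrypted


def setup() -> Tuple[str, str]:
    """Authority: return placeholder public parameters and master secret key."""
    return "public-params", "master-secret"


def key_gen(master_secret: str, policy: Policy) -> Policy:
    """Authority: the mock private key is just the policy itself."""
    return policy


def enc(attrs, data_set) -> Ciphertext:
    """User: 'encrypt' a data set under an attribute set."""
    return Ciphertext(frozenset(attrs), frozenset(data_set))


def token_gen(private_key: Policy) -> Policy:
    """User: derive a token from the private key (identity in this mock)."""
    return private_key


def si(token: Policy, ct_a: Ciphertext, ct_b: Ciphertext) -> Optional[FrozenSet[str]]:
    """Cloud: intersect only if the token's policy accepts both attribute sets."""
    if not (token(ct_a.attrs) and token(ct_b.attrs)):
        return None                       # plays the role of the refusal output
    return ct_a.payload & ct_b.payload    # stands in for the encrypted intersection


_, msk = setup()
alice_ct = enc({"hospital", "cardiology"}, {"p1", "p2", "p3"})
bob_ct = enc({"hospital", "cardiology"}, {"p2", "p3", "p4"})
bob_token = token_gen(key_gen(msk, lambda s: {"hospital", "cardiology"} <= s))
carlos_token = token_gen(key_gen(msk, lambda s: "insurance" in s))
```

With these inputs, Bob's token yields the common elements of the two sets, while Carlos's token is refused because his policy is not satisfied by the ciphertexts' attribute sets. A real scheme must additionally hide the payloads from the cloud, which this mock does not attempt.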

3.3. Security Definitions

The security of KP-ABSI is captured by the following three properties.

3.3.1. Selective Security against Chosen-Plaintext Attack

This property indicates that a probabilistic polynomial-time (PPT) adversary, without being given the corresponding tokens, is not able to obtain any useful information about the encrypted data sets. Here, “selective” means that the adversary must choose the target attribute set that it wants to challenge before the public parameters are generated. The property is formalized via the following game between an adversary and a challenger.

Setup: The adversary selects a target attribute set and sends it to the challenger. The challenger runs Setup to initialize the public parameters and the master secret key, sends the public parameters to the adversary, and keeps the master secret key.

Phase 1: The adversary can make polynomially many queries to the following oracles:

Private key oracle: Given an access control policy, if the target attribute set satisfies the policy, the challenger aborts; otherwise, the challenger returns the corresponding private key to the adversary.

Token oracle: Given an access control policy, if the target attribute set satisfies the policy, the challenger aborts; otherwise, the challenger computes the corresponding private key and returns the token generated from it to the adversary.

Challenge: The adversary gives two distinct data sets of equal size to the challenger. Then, the challenger picks a bit at random, builds the challenge ciphertext of the corresponding data set under the target attribute set, and sends it to the adversary.

Phase 2: Same as Phase 1.

Guess: The adversary eventually outputs a guess of the challenger’s bit. If the guess is correct, we say that the adversary wins the game.

Definition 2. We say that a KP-ABSI scheme is selectively secure against chosen-plaintext attack if every PPT adversary wins the above game with only a negligible advantage over guessing the challenge bit at random.
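For a bit-guessing game of this shape, the advantage is conventionally written as follows. The symbols $\mathcal{A}$ (adversary), $b$ (challenge bit), $b'$ (guess), and $\lambda$ (security parameter) are notation assumed here for illustration.

```latex
\mathsf{Adv}_{\mathcal{A}}(\lambda) \;=\; \left|\, \Pr[\, b' = b \,] - \frac{1}{2} \,\right|
```

An adversary that ignores the challenge ciphertext guesses correctly with probability $1/2$ and therefore has advantage 0; the definition asks that no efficient adversary do noticeably better.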

3.3.2. One-Way Security against Chosen-Plaintext Attack

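This property says that a PPT adversary, even when given an appropriate token, cannot obtain the plaintexts corresponding to the ciphertexts. Here, “appropriate” means that the access control policy that generates the token is satisfied by the attribute set associated with the target ciphertext. Of course, the adversary can choose a plaintext data set, encrypt it with the public parameters, and then use the token to check whether the target ciphertext matches the ciphertext of its choice. This type of brute-force attack is inherent to the set intersection problem, so we can only demand that the adversary have no attack strategy significantly better than brute force. The property is captured via the following game between an adversary and a challenger.

Setup: The challenger runs Setup to initialize the public parameters and the master secret key, sends the public parameters to the adversary, and keeps the master secret key.

Phase 1: The adversary can make polynomially many queries to the following oracles. Meanwhile, the challenger maintains a list, which is initially empty.

Private key oracle: Given an access control policy, the challenger returns the corresponding private key to the adversary and records the policy in the list.

Token oracle: Given an access control policy, the challenger computes the corresponding private key and returns the token generated from it to the adversary.

Challenge: The adversary gives a target attribute set to the challenger, where no access control policy recorded in the list is satisfied by this attribute set. The challenger selects an access control policy that is satisfied by the target attribute set, picks a data set uniformly at random, encrypts it under the target attribute set, generates a token for the selected policy, and returns the ciphertext and the token to the adversary.

Phase 2: The adversary proceeds as in Phase 1, except that every policy submitted to the private key oracle must not be satisfied by the target attribute set.

Guess: The adversary outputs a guess of the challenge plaintext. If the guess is correct, we say that the adversary wins the game.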

Definition 3. We say that a KP-ABSI scheme achieves one-way security against chosen-plaintext attack if, for any PPT adversary, the advantage of winning the game is negligible, where the advantage is measured relative to the success probability of brute-force guessing, which is determined by the number of guess/brute-force attacks the adversary makes and by the size of the message space Msg of set elements.
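The brute-force baseline in this definition can be illustrated with a small simulation. The sketch below is a toy model, not the scheme itself: a SHA-256 hash stands in for whatever comparable form a token holder can derive from a ciphertext, and the function names and parameters are hypothetical. Under these assumptions, an adversary making `q` distinct guesses over a message space of size `msg_space` succeeds with probability close to `q / msg_space`.

```python
import hashlib
import random


def tag(element: int) -> str:
    """Stand-in for the comparable form derived from a ciphertext with a token."""
    return hashlib.sha256(str(element).encode()).hexdigest()


def brute_force(target_tag: str, guesses) -> bool:
    """Succeed if any guessed element maps to the target tag."""
    return any(tag(g) == target_tag for g in guesses)


def success_rate(msg_space: int, q: int, trials: int, seed: int = 0) -> float:
    """Empirical success rate of q distinct guesses against a uniform secret."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        secret = rng.randrange(msg_space)
        guesses = rng.sample(range(msg_space), q)   # q distinct guesses
        wins += brute_force(tag(secret), guesses)
    return wins / trials
```

For example, `success_rate(1000, 100, 2000)` should land near 0.1. One-way security asks that no efficient adversary beat this baseline by more than a negligible amount, which is why small message spaces remain guessable regardless of the scheme.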

3.3.3. Fine-Grained Authorization Security

This property says that the cloud is unable to use the given tokens to conduct set intersection over ciphertexts if no access control policy (associated with a data user’s private key) that generates the tokens is satisfied by both of the attribute sets associated with the two ciphertexts. More specifically, consider a first token that can be used to conduct set intersection over one pair of ciphertexts and a second token that can be used over another pair. If the access control policy used to generate the first token is not satisfied by the attribute set of some ciphertext, and the access control policy used to generate the second token is not satisfied by the attribute set of another ciphertext, then the cloud cannot compute the set intersection over those two ciphertexts by using either or both tokens. Fine-grained authorization security is defined via the following game between an adversary and a challenger.

Setup: The challenger runs Setup to initialize the public parameters and the master secret key, sends the public parameters to the adversary, and keeps the master secret key.

Phase 1: The adversary makes polynomially many queries to the following oracles. Meanwhile, the challenger maintains two lists, which are initially empty.

Private key oracle: Given an access control policy, the challenger returns the corresponding private key to the adversary and records the policy in the first list.

Token oracle: Given an access control policy, the challenger generates the corresponding private key and token, returns the token to the adversary, and records the policy in the second list.

Challenge: The adversary gives two target attribute sets to the challenger. Then the challenger chooses two data sets, picks a bit at random, runs the encryption algorithm to produce the two challenge ciphertexts, and returns them to the adversary. Here, we require that (i) for every policy in the first list, the predicate does not output 1 on both target attribute sets simultaneously, and (ii) for every policy in the second list, the predicate does not output 1 on both target attribute sets simultaneously.

Phase 2: The adversary proceeds as in Phase 1, except for the following: (i) when querying the private key oracle, the queried policy must not be satisfied by both target attribute sets simultaneously; (ii) when querying the token oracle, the queried policy must not be satisfied by both target attribute sets simultaneously.

Guess: The adversary eventually outputs a guess of the challenger’s bit. If the guess is correct, we say that the adversary wins the game.

Definition 4. We say that a KP-ABSI scheme achieves fine-grained authorization security if every PPT adversary wins the above game with only a negligible advantage over guessing the challenge bit at random.

4. Scheme Construction

4.1. Basic Idea

To illustrate the idea, let a data user’s private key be , which can be generated by running for some random and setting , where is a random number with respect to leaf . A set element is encrypted into two parts:
(i) The first part is related to ; namely, where is a generator of , and are two random numbers, are two hash functions, and are private keys.
(ii) The second part is related to the attribute set corresponding to the access control policy in question, namely, for .

A data user can generate the token as , by which the cloud is able to translate the ciphertext into an intermediate form once the attribute set satisfies the access control policy .

4.2. KP-ABSI Construction

Setup(): Given the security parameter as input, the public parameters and the master secret key can be generated as follows:
(i) Let .
(ii) Let and be two secure hash functions that are modeled as random oracles.
(iii) Select and set the public parameters and the master secret key as

Key generation: Given access tree , this algorithm selects , computes and , and runs . Then, for each leaf , the algorithm selects and sets and . The secret key is

Encryption: Given set for outsourcing, , this algorithm encrypts the set as follows: for each , it selects , sets , , and , and computes for each . The ciphertext of is The set of ciphertexts is .

TokenGen: Given secret key , this algorithm selects , sets and , and computes and for leaf . The token is

SI: Given , , and , this algorithm is executed as follows:
(i) Given , it selects an attribute set satisfying . If does not exist, it returns 0. Otherwise, it computes for all with , computes and sets .
(ii) Given , it selects an attribute set satisfying . If does not exist, it returns 0. Otherwise, it computes for all with , computes and sets .
(iii) Output the set intersection .
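Key generation over an access tree distributes a secret to the leaves, one random polynomial per node, with the polynomial degree fixed by the node's threshold. The sketch below shows that sharing step and the matching recovery by Lagrange interpolation over a toy prime field. It is a simplified illustration under stated assumptions: it works on plain field elements rather than in the exponent of a bilinear group, each attribute is assumed to appear at most once in the tree, and the names `Node`, `share`, `recover`, and the modulus `P` are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional

P = 2**127 - 1   # a Mersenne prime used as a toy field modulus


@dataclass
class Node:
    threshold: int = 1
    children: List["Node"] = field(default_factory=list)
    attribute: Optional[str] = None   # set only on leaves


def share(node: Node, secret: int, rng: random.Random, out: Dict[str, int]) -> None:
    """Assign `secret` to `node` and push shares down to the leaves."""
    if node.attribute is not None:
        out[node.attribute] = secret
        return
    # Random polynomial q of degree threshold-1 with q(0) = secret.
    coeffs = [secret] + [rng.randrange(P) for _ in range(node.threshold - 1)]
    for i, child in enumerate(node.children, start=1):
        q_i = sum(c * pow(i, e, P) for e, c in enumerate(coeffs)) % P
        share(child, q_i, rng, out)   # child i receives q(i)


def recover(node: Node, shares: Dict[str, int]) -> Optional[int]:
    """Return the node's secret if the available leaf shares suffice, else None."""
    if node.attribute is not None:
        return shares.get(node.attribute)
    points = []
    for i, child in enumerate(node.children, start=1):
        value = recover(child, shares)
        if value is not None:
            points.append((i, value))
    if len(points) < node.threshold:
        return None
    points = points[:node.threshold]
    total = 0
    for xi, yi in points:             # Lagrange interpolation at x = 0
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total
```

Only attribute sets that satisfy the tree collect enough points at every internal node to interpolate back to the root secret, which is what ties the access policy to the ability to use the key. The modular inverse `pow(den, -1, P)` requires Python 3.8 or later.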

The correctness of the above KP-ABSI scheme can be verified by following the protocol. In what follows, we analyze its security.

4.3. Security Analysis

Theorem 1. Under the DLN assumption, the above KP-ABSI scheme achieves selective security against chosen-plaintext attack in the random oracle model, as specified in Definition 2.

We first prove that our scheme achieves the security goal when the data set has a single element and then extend the proof to the general case.

Proof. We show that if there is a PPT adversary that wins the selective security game with a nonnegligible advantage , then a challenger that can solve the DLN problem with the advantage at least can be constructed. Specifically, given a DLN instance , where and are unknown, the game is simulated by the challenger as follows.

Setup: The adversary gives an attribute set to the challenger. Then the challenger produces the bilinear map , constructs and with and unknown, and sets . The challenger sends to . The challenger maintains two lists and , which are initially empty. The adversary can query the two hash oracles and polynomially many times as follows:

Hash oracle for attributes: Given attribute , the challenger responds as follows:
(i) The case was queried before: it retrieves from and returns to .
(ii) The case was not queried before: if , it selects , adds to , and returns to ; otherwise, it selects , adds to , and returns to .

Hash oracle for messages: Given a message as input, if was queried before, it retrieves from and returns to ; otherwise, it picks randomly, records to , and returns to .

Phase 1: The adversary makes polynomially many queries to the following oracles:

Private key query: If , the simulation is aborted; otherwise, the challenger produces according to the following two procedures:

First procedure: Given a secret value , this procedure builds the polynomial for each node of subtree , where . Suppose that the threshold value of node is ; it lets and chooses coefficients uniformly at random to uniquely determine the polynomial . Then it recursively runs to build the polynomial for each child node of , by letting .

Second procedure: Given an element , where is unknown to the challenger, this procedure is to build the polynomial for each node of subtree , where . Assume that the threshold value of node is and is the set of children of node such that, . Since , we have . For each , it selects and sets . It then determines the other points of polynomial such that . For each child node of node , it executes the following:
(i) If is a node with , it runs , where is known.
(ii) If is a node with , it runs , where is known.

Based on the two procedures above, the challenger executes by implicitly defining . Note that, for each , the challenger knows if and knows otherwise. Therefore, it generates credentials as follows by selecting and setting and (while noting that is the secret share of ):
(i) If for some , it selects and sets and .
(ii) If , where for some , it selects and sets and . Note that is valid because the challenger implicitly sets and the following:

Token query: The challenger queries to obtain and returns to .

Challenge: The adversary gives two messages and of equal length to the challenger. Then the challenger randomly picks , encrypts to , and sends to .

Phase 2: The adversary proceeds as in Phase 1 above.

Guess: The adversary will eventually output a guess of . If , the challenger outputs ; otherwise, it outputs .

The simulation is completed. In Challenge phase, if , then is indeed a valid ciphertext of and the probability that outputs is . Otherwise, if is a random element from , then is a random group element and the probability that outputs is . In conclusion, the probability that the challenger correctly guesses is . That is, if wins the game with the advantage , then the challenger solves the DLN problem with the advantage .
So far, we have shown that our KP-ABSI scheme is selectively secure against chosen-plaintext attack when . In what follows, we prove that our KP-ABSI scheme achieves selective security against chosen-plaintext attack for the general case of .

Suppose that gives two sets and . Denote by the encryption of , meaning that is the encryption of and is the encryption of . The Challenge phase is extended to accommodate an additional adversary as follows:
(i) picks a random index and presents to the challenger.
(ii) Then the challenger sends back to by encrypting if or returns if .
(iii) encrypts and and returns to .
(iv) outputs ’s output .

Note that sends to the ciphertext if and the ciphertext if . Denote by the guess of with ciphertexts . Then we show the probability that wins the game. Note that

Therefore, the probability that wins the game is

where is negligible because the advantage that wins the game is negligible. Thus, the probability that distinguishes from is

That is, the advantage of distinguishing from is at most . Therefore, the scheme achieves selective security against chosen-plaintext attack when .
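The extension from a single element to sets of several elements has the shape of a standard hybrid argument, which can be sketched as follows. The notation is assumed for illustration: $n$ is the set size, $H_i$ denotes the hybrid ciphertext in which the first $i$ elements come from one challenge set and the remaining $n-i$ from the other, and $\epsilon$ bounds the advantage in the single-element case.

```latex
\begin{aligned}
\left|\Pr[\mathcal{A}(H_0)=1]-\Pr[\mathcal{A}(H_n)=1]\right|
&= \left|\sum_{i=1}^{n}\bigl(\Pr[\mathcal{A}(H_{i-1})=1]-\Pr[\mathcal{A}(H_i)=1]\bigr)\right| \\
&\le \sum_{i=1}^{n}\left|\Pr[\mathcal{A}(H_{i-1})=1]-\Pr[\mathcal{A}(H_i)=1]\right| \\
&\le n\,\epsilon .
\end{aligned}
```

Adjacent hybrids differ in exactly one encrypted element, so each term is bounded by the single-element advantage; since $n$ is polynomial and $\epsilon$ is negligible, the total remains negligible.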

Theorem 2. Given one-way hash function , the above KP-ABSI scheme achieves the one-way security against chosen-plaintext attack as specified in Definition 3.

Proof. We prove this theorem by showing that if there is a PPT adversary winning the one-way security game against chosen-plaintext attack with a nonnegligible advantage , then a challenger breaking the one-way hash function can be simulated.
Given , the challenger can simulate the one-way security game as follows:

Setup: The challenger randomly picks , produces , and sends to .

Phase 1: The challenger maintains a list , which is empty initially. The adversary makes polynomially many queries to the following oracles:

Private key query: Given an access control policy , it returns to and adds to .

Token query: Given an access control policy , it runs and returns to .

Challenge: The adversary gives an attribute set to the challenger, where, , . The challenger picks and sets a data set , where are randomly chosen from the message space and is implicitly set as and generates the ciphertext as follows:
(i) If , is generated the same as in the real construction.
(ii) If , by randomly choosing and implicitly setting .
The challenger chooses an access tree satisfying , runs , and returns to the adversary .

Phase 2: The adversary proceeds as in Phase 1 while complying with the necessary requirements defined by the game.

Guess: The adversary will eventually output a guess to the challenger. The challenger wins if .

The simulation is completed. If the probability that outputs is , then . Since the data set size is , . Therefore, if wins the one-way security game against chosen-plaintext attack with a nonnegligible advantage , the one-way hash function can be broken by the challenger with a nonnegligible probability at least .

Theorem 3. The KP-ABSI achieves fine-grained authorization security in the generic bilinear group model as specified in Definition 4.

Proof. Similar to the proof of Theorem 1, we first prove that our KP-ABSI scheme achieves fine-grained authorization when the challenge size and then extend the proof to the case of challenge size .

Setup: The challenger randomly picks , produces , and sends to . The challenger maintains two lists and , which are empty initially. The adversary can make polynomially many queries to the following hash oracles and .

Hash oracle for attributes: Given an attribute , if was queried before, the challenger returns by retrieving from ; otherwise, the challenger picks , records to , and returns to .

Hash oracle for messages: Given a message , if was queried before, the challenger returns ; otherwise, the challenger picks , records to , and returns to .

Phase 1: The challenger keeps the two lists, and , which are initially empty. The adversary can make polynomially many queries to the following oracles.

Private key query: The challenger selects and runs . For each node , the challenger chooses and sets where . The challenger sends to and records to .

Token query: The challenger runs , selects , and sets where . It returns to and adds to .

Challenge: The adversary chooses two attribute sets and with the following restrictions: (1) , and cannot output 1 at the same time, and (2) , and cannot output 1 at the same time. Then, the adversary sends and to the challenger. The challenger chooses two sets of equal length. For , the challenger selects and computes The challenger randomly picks and for and sets

Phase 2: Same as Phase 1.

Guess: Finally, the adversary outputs a guess of .

If can determine whether is equal to or not, also can determine whether is equal to or not. The only way for to achieve this is to construct a query for some . To prove Theorem 3, we will show that can never construct a query for .
Table 2 shows all the possible queries of G by means of the bilinear map and group elements given to the adversary. Note that only incurs in terms and , respectively. Thus, must construct for obtaining . Moreover, since and are independent, must construct and for the same . Then we show that adversary can never build and for the same .
To construct for , as only appears in the term , we let for some . That is, needs to construct the term . In order to get that, the only way of constructing is to apply in Table 2 with of , which will result in , meaning that can construct the query . That is, can be written as for a known constant . Similarly, we can show that can be written as for a known constant to build . Since and are unknown to , then cannot be constructed, since cannot find a known constant that is the product of and .
In conclusion, can construct and for the same only with a negligible probability and therefore gains only a negligible advantage in the fine-grained authorization game.
Similar to the proof of Theorem 1, an adversary can be simulated, and it can be proved that if can break the fine-grained authorization security for , then can break the fine-grained authorization security for . This completes the proof.

4.4. Efficiency Analysis

Now we evaluate the efficiency of the schemes in terms of the asymptotic computational complexity. The asymptotic complexity is measured in terms of operations: denotes the operation of mapping a bit-string to an element of , denotes the group exponentiation operation in , denotes the group exponentiation operation in , and denotes the pairing operation. We ignore the multiplication operations because they are much more efficient than the operations mentioned above (Table 3).

We can see that TokenGen incurs a small cost compared with SI. This implies that it pays off for the data user to generate a token and outsource the set intersection operation to the cloud rather than perform it locally.

5. Conclusions

In this paper, we present a novel cryptographic primitive: delegated key-policy attribute-based set intersection over outsourced encrypted data sets (KP-ABSI). It simultaneously achieves the following: (1) Each data owner outsources his/her data set in encrypted form to a cloud, where the outsourced data set is associated with an attribute set. (2) A data user is associated with an access control policy that is satisfied by the attribute sets of two encrypted data sets (owned by two data owners, respectively) and can delegate to the cloud the set intersection computation over the two data owners’ outsourced encrypted data sets. (3) The cloud can conduct the set intersection operation on behalf of the data user without being able to obtain any useful information about the data owners’ plaintext data set.

Thus, our scheme can solve the PSI problem in the CloudIoT system. In our solution, however, the cloud is assumed to be semihonest. How to build a construction in the malicious model is still an open problem.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by Program for the Scientific Research Foundation of Nanjing Institute of Technology (YKJ201980), Program for Natural Science Research Projects of Universities (19KJB520033), and Program for Scientific Research Foundation for Talented Scholars of Jinling Institute of Technology (JIT-B-201726).