Abstract

The secure subset problem is important in secure multiparty computation, a vital field of cryptography. Most existing protocols for this problem keep the elements of only one set private while leaking the elements of the other set; in other words, they do not fully solve the secure subset problem. The few protocols that do protect both sets are mainly based on oblivious polynomial evaluation, which is computationally inefficient. In this study, we first design an efficient secure subset protocol for sets whose elements are drawn from a known set, based on a new encoding method and a homomorphic encryption scheme. If the elements of the sets are taken from a large domain, this protocol becomes inefficient. Using the Bloom filter and a homomorphic encryption scheme, we therefore present a second protocol whose computational complexity is linear in the cardinality of the larger set, which makes it practical for large inputs. However, this second protocol may yield a false positive; the probability of a false positive can be rapidly decreased by re-executing the protocol with different hash functions. Furthermore, we present an experimental performance analysis of these protocols.

1. Introduction

The rapid development of networks provides a great opportunity for multiparty cooperative computation, but it also challenges the privacy of the participants’ information. In a complex network environment, parties may not trust each other during computations, and they need to keep their information private. Secure multiparty computation is a key technology for preserving privacy in cooperative computations and therefore attracts increasing attention in the international cryptographic community.

Secure multiparty computation was first introduced by Yao [1] as a millionaires’ problem in 1982. The millionaires’ problem can be described as follows. Two millionaires, Alice and Bob, want to know who is richer, but neither Alice nor Bob wants to disclose her/his own wealth to the other. This is a secure two-party computation problem. After this, Ben-Or et al. [2] gave the first secure multiparty computation protocol. A secure multiparty computation involves any two or more parties who use their own private data to cooperatively compute a function in order to obtain the predetermined output while keeping their input information private. Secure multiparty computation is a general cryptographic protocol. Many cryptographic protocols for cooperative computations that contain two or more parties can be viewed as secure multiparty computation protocols, and these include key exchange protocols [3], digital signature protocols [4], secret sharing protocols [5], zero-knowledge proof protocols [6], and oblivious transfer protocols [7]. Secure multiparty computation is a key technology in network security, and it has been the focus of the international cryptographic community for many years. The Turing Award winner Goldwasser [8] predicted that “the field of multiparty computations is today where public key cryptography was ten years ago, namely, an extremely powerful tool and rich theory whose real-life usage is at this time only beginning but will become in the future an integral part of our computing reality.”

Goldreich et al. [9, 10] thoroughly studied the secure multiparty computation problem and established its theoretical foundation. They proved that secure multiparty computation problems are theoretically solvable and proposed a general solution to secure multiparty computation problems. Because the general solution is inefficient and impractical for special problems, they also noted that, to improve efficiency, special solutions should be developed for special problems. This observation motivates people to study solutions to various secure multiparty computation problems. The problems studied include millionaires’ problems [11, 12], secure computational geometry problems [13], comparisons of information without it being leaked [14], private bidding and auction problems [15], and privacy-preserving data mining problems [16]. In addition, there are many other new secure multiparty problems that need to be studied.

Because many problems can be abstracted as set problems, private set operations form a highly important field in secure multiparty computation. These problems include set intersection [18], set union [19], and subsets [17]. The set intersection and set union problems have been widely studied, whereas there are only a few studies of the subset problem. However, the subset problem has a variety of applications.
(1) In data mining, an important principle about association rules (the Apriori Principle) states that if an itemset is frequent, then all of its subsets must also be frequent [20]. Suppose that Alice and Bob are both suppliers of the same supermarket. Alice has a large frequent itemset generated by data mining from the supermarket’s transactions. Bob has an itemset of his own and wants to know whether it is also a frequent itemset. However, he cannot perform data mining on the supermarket’s transaction data (either he cannot obtain the transaction data or he does not have data mining knowledge). He therefore resorts to the Apriori Principle, but he does not want to disclose his itemset to Alice, and Alice likewise wishes to keep her itemset secret. In this application, they have to privately determine whether Bob’s itemset is a subset of Alice’s.
(2) In secret sharing, a secret is divided into $n$ shares that are privately given to $n$ parties, called the legal shareholders, and any $t$ or more shareholders can reconstruct the secret. During the reconstruction of the secret, some illegal shareholders may try to take part. To prevent this, the authenticity of the participating shareholders must be privately determined. This is where the secure subset protocol comes into play.

It is generally known that the subset problem is a special case of the set intersection problem. However, when applied to the subset problem, existing set intersection protocols lead to solutions that are both insecure and inefficient. For the subset problem, we only need to determine whether one set is contained in the other, whereas an intersection protocol has to compute every element that belongs to both sets. Used for the subset problem, this approach discloses the elements that the two sets have in common. Furthermore, the subset problem is a decision problem and does not require computing all the elements of the intersection. Thus, set intersection protocols are not suitable for the subset problem.

Given two sets, one of which may be contained in the other, most current private subset protocols fall into two cases. In the first, the two parties prove the subset relation while leaking the elements of one of the sets [21–23]. In the second, the two parties prove the subset relation without keeping the elements of the other set private [24–27].

In addition, Kissner and Song [17] proposed a secure solution to the subset problem based on the Paillier additively homomorphic encryption scheme [28], the representation of the elements of a set as the roots of a polynomial, and the mathematical properties of polynomials. In their solution, both sets can be kept private. The larger set is represented by a polynomial whose roots are its elements, and this polynomial is encrypted. An element belongs to the larger set exactly when the polynomial evaluates to zero at that element. The party who holds the smaller set evaluates the encrypted polynomial at each of its elements and combines the resulting ciphertexts into a single ciphertext; if this ciphertext is an encryption of 0, the smaller set is a subset of the larger one. However, the number of modular multiplications required by this protocol depends on the product of the cardinalities of the two sets (details are presented in Section 5.1), so the protocol is inefficient for large quantities of data.

Furthermore, Ye et al. [29] and Sang and Shen [30] separately gave their subset protocols, which are mainly based on the oblivious polynomial evaluations, and which are similar to Kissner’s protocol. The subset protocol of [29] was presented in the distributed setting. By using Shamir’s secret sharing scheme, the polynomial constructed based on the larger set was distributed to multiple servers. The party who had the smaller set interacted with at least servers to compute the subset problem based on the standard variant of the ElGamal encryption. The overall cost for the computation is , and the communication is bits. In the subset protocol of [30], Sang utilized a nonmalleable NonInteractive Zero-Knowledge (NIZK) argument, which is based on the Boneh-Goh-Nissim (BGN) cryptosystem to protect it against malicious attacks. Without considering the computational complexity of malicious attacks, the computational complexity of this protocol is besides the NIZK argument. Meanwhile, our protocols have a linear computational complexity in the cardinality of the large set (details are presented in Section 5.1).

Moreover, Blanton and Aguiar [31] created an efficient subset protocol based on the oblivious algorithms, such as oblivious sorting algorithms and oblivious equality algorithms. Unfortunately, this protocol is constructed using the circuit method and has the drawbacks of the circuit method [32].

Shundong et al. [12] described a secure subset protocol that retains the privacy of both sets; it is based on symmetric cryptography and is highly efficient. However, the smaller set can have only one element in this protocol. If the smaller set has more than one element, the parties have to execute the protocol once per element and choose new pseudorandom sequences on each occasion, which is tedious.

In this study, we propose two secure subset protocols for different situations using homomorphic encryption schemes, which can be multiplicative or additive. Because a multiplicatively homomorphic encryption scheme is more efficient than an additive one, we choose a multiplicative one to build our protocols. To the best of our knowledge, encryption schemes can currently encrypt only integer messages, whereas in many common applications the sets to be compared are drawn from a known set whose elements are not integers. For this case, we design an efficient protocol based on a new encoding method and a homomorphic encryption scheme. The computational complexity of this protocol is linear in the size of the larger set. For the situation in which the sets are taken from a large domain, we further present an efficient protocol based on Bloom filters and a homomorphic encryption scheme, which improves efficiency at the cost of a small loss of accuracy. Furthermore, we show that, by using the Bloom filter, we can solve the subset problem for sets that are taken from an exponentially large domain.

The rest of this paper is organized as follows. In Section 2, we introduce some preliminaries. In Section 3, we propose an efficient secure subset protocol for sets whose elements are drawn from a known set using a new encoding method and homomorphic encryption schemes. In Section 4, we show the secure subset protocol for sets within a large domain based on the Bloom filter and homomorphic encryption schemes, while in Section 5, we present an analysis of secure subset protocols and the experimental implementation. Finally, in Section 6, we conclude this paper.

2. Preliminaries

2.1. Secure Subset Problem

Alice has a set $X$, and Bob has a set $Y$. Alice and Bob want to determine whether $Y$ is a subset of $X$ without disclosing any information about the elements of their sets to each other. This can be abstracted as a secure subset problem.

2.2. Homomorphic Encryption Scheme

A homomorphic encryption scheme is an encryption scheme with special properties that make it a building block of many secure multiparty computation protocols. A conventional public key encryption scheme consists of three algorithms: KeyGen, Encrypt, and Decrypt.
(i) KeyGen takes a security parameter $\kappa$ as the input and outputs a secret key $sk$ and the corresponding public key $pk$, together with the definition of the plaintext space $\mathcal{M}$ and the ciphertext space $\mathcal{C}$.
(ii) Encrypt takes $pk$ and a plaintext $m \in \mathcal{M}$ as inputs and outputs a ciphertext $c \in \mathcal{C}$.
(iii) Decrypt takes a ciphertext $c$ and the secret key $sk$ as inputs and outputs the plaintext $m$.
In addition to the three conventional algorithms, a homomorphic encryption scheme has an efficient algorithm Evaluate, which takes as inputs the public key $pk$, an operation $\odot$, and a tuple of ciphertexts $(c_1, \ldots, c_t)$ (where $c_i$ is the ciphertext of $m_i$, $i = 1, \ldots, t$), and outputs a ciphertext of $m_1 \odot \cdots \odot m_t$.

Our construction uses semantically secure public key encryption schemes that preserve the group homomorphism under some computational complexity assumptions. This property is obtained by the Paillier encryption scheme [28] and the ElGamal encryption scheme [33] under the Composite Residuosity Class (CRC) assumption and the Computational Diffie-Hellman (CDH) assumption, respectively. Details are presented as follows.

Paillier Encryption Scheme

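(i) KeyGen. On inputting a security parameter $\kappa$, this algorithm generates two large primes $p$ and $q$, sets $N = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, and computes $g \in \mathbb{Z}_{N^2}^{*}$ such that $\gcd(L(g^{\lambda} \bmod N^2), N) = 1$, where $L$ is defined as $L(x) = (x-1)/N$. $\mathbb{Z}_{N^2}^{*}$ is the ciphertext space, and $\mathbb{Z}_N$ is the plaintext space. The public key is $(N, g)$, and the private key is $\lambda$.

(ii) Encrypt. To encrypt plaintext $m \in \mathbb{Z}_N$, the algorithm selects a random number $r \in \mathbb{Z}_N^{*}$ and computes
$$c = E(m) = g^{m} r^{N} \bmod N^2.$$

(iii) Decrypt. To decrypt the ciphertext $c$, the algorithm computes
$$m = \frac{L(c^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \bmod N.$$

(iv) Evaluate. For ciphertexts $E(m_1)$, $E(m_2)$, and $E(m)$ and a constant $k$, we have
$$E(m_1) \cdot E(m_2) \bmod N^2 = E(m_1 + m_2), \qquad E(m)^{k} \bmod N^2 = E(km).$$

In this encryption scheme, if $m = 0$, then $E(m)^{k} = E(0)$.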
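The additive homomorphism above can be checked numerically. The following is a minimal sketch, assuming toy primes (1009 and 1013), the common choice $g = N + 1$, and hypothetical helper names (`paillier_keygen`, `paillier_encrypt`, `paillier_decrypt`); none of these choices come from the paper, and the parameters are far too small for real use.

```python
import math
import secrets

def paillier_keygen(p, q):
    """Toy KeyGen from two small illustrative primes (assumption: g = N + 1)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    l_val = (pow(g, lam, n * n) - 1) // n      # L(g^lambda mod N^2)
    mu = pow(l_val, -1, n)                     # exists when gcd(L(...), N) = 1
    return (n, g), (lam, mu)

def paillier_encrypt(pk, m):
    n, g = pk
    r = secrets.randbelow(n - 1) + 1           # r in [1, n-1]
    while math.gcd(r, n) != 1:                 # keep r invertible modulo n
        r = secrets.randbelow(n - 1) + 1
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def paillier_decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

pk, sk = paillier_keygen(1009, 1013)
n2 = pk[0] ** 2
c1, c2 = paillier_encrypt(pk, 5), paillier_encrypt(pk, 7)
sum_plain = paillier_decrypt(pk, sk, c1 * c2 % n2)       # plaintexts add: 5 + 7
scaled_plain = paillier_decrypt(pk, sk, pow(c1, 3, n2))  # plaintext is scaled: 3 * 5
```

Multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a constant multiplies the plaintext by that constant, so an encryption of 0 stays an encryption of 0 under any exponent.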

ElGamal Encryption Scheme

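(i) KeyGen. On inputting a security parameter $\kappa$, the algorithm generates a large prime $p$ and a generator $g \in \mathbb{Z}_p^{*}$, and it randomly chooses a number $x$ as a private key. The public key is $(p, g, h)$, where $h = g^{x} \bmod p$.

(ii) Encrypt. Taking a plaintext $m$ and the public key as inputs, the algorithm selects a random number $r$ and computes
$$E(m) = (c_1, c_2) = (g^{r} \bmod p,\; m h^{r} \bmod p).$$

(iii) Decrypt. This algorithm takes the ciphertext $(c_1, c_2)$ and the private key $x$ as inputs and computes
$$m = c_2 \cdot (c_1^{x})^{-1} \bmod p.$$

(iv) Evaluate. Given ciphertexts $E(m_1)$, $E(m_2)$, and $E(m)$ and a constant $k$, we can compute that
$$E(m_1) \cdot E(m_2) = E(m_1 m_2), \qquad E(m)^{k} = E(m^{k}),$$
where ciphertexts are multiplied and exponentiated componentwise modulo $p$.

In this encryption scheme, if $m = 1$, then $E(m)^{k} = E(1)$.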
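The multiplicative homomorphism can be illustrated in the same way. This is a sketch under stated assumptions: the modulus $2^{127} - 1$ (a Mersenne prime) and the base 3 are toy choices made here for illustration, and the function names are hypothetical. A real deployment would use a vetted group with a generator of a large prime-order subgroup.

```python
import secrets

P = 2**127 - 1   # toy prime modulus (assumption, not from the paper)
G = 3            # toy base (assumption; not verified to be a generator)

def elgamal_keygen():
    x = secrets.randbelow(P - 2) + 1            # private key in [1, P-2]
    return (P, G, pow(G, x, P)), x              # public key (p, g, h = g^x)

def elgamal_encrypt(pk, m):
    p, g, h = pk
    r = secrets.randbelow(p - 2) + 1            # fresh randomness per ciphertext
    return pow(g, r, p), m * pow(h, r, p) % p

def elgamal_decrypt(pk, x, c):
    p = pk[0]
    c1, c2 = c
    return c2 * pow(c1, p - 1 - x, p) % p       # c2 * (c1^x)^(-1) mod p

def elgamal_mul(pk, ca, cb):
    p = pk[0]
    return ca[0] * cb[0] % p, ca[1] * cb[1] % p  # componentwise product

def elgamal_pow(pk, c, k):
    p = pk[0]
    return pow(c[0], k, p), pow(c[1], k, p)      # componentwise power

pk, sk = elgamal_keygen()
product_plain = elgamal_decrypt(
    pk, sk, elgamal_mul(pk, elgamal_encrypt(pk, 6), elgamal_encrypt(pk, 7)))
power_plain = elgamal_decrypt(pk, sk, elgamal_pow(pk, elgamal_encrypt(pk, 3), 4))
```

Because an encryption of 1 raised to any power is still an encryption of 1, a random exponent can blind a ciphertext without changing whether it decrypts to 1.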

These two schemes are semantically secure under the CRC assumption and the CDH assumption, respectively. That is, given two messages $m_0$ and $m_1$, as well as a ciphertext produced by one of these encryption schemes, no probabilistic polynomial-time algorithm can determine whether the ciphertext encrypts $m_0$ or $m_1$ with nonnegligible advantage.

2.3. Security of Secure Multiparty Computation

We assume that all parties are semihonest. In general, a semihonest party follows the prescribed protocol correctly, except that it keeps a record of all its intermediate computations and may try to derive the other party’s private inputs from the record. Goldreich [10] also designed a compiler that can force each party to either behave in a semihonest manner or be detected. Given a protocol $\pi$ that privately computes a function $f$ in the semihonest model, this compiler can produce a new protocol $\pi'$ that privately computes $f$ in the malicious model. This work demonstrates that the study of the semihonest model is very important. Therefore, our work focuses on solutions to the subset problem in the semihonest model.

Different methods are used to prove the security in different cryptographic fields. The proof method, which reduces the security to a difficult assumption in the standard model or the random oracle model, is suitable for verifying encryption schemes and signature schemes. The simulation paradigm is widely accepted and is used to prove the security of secure multiparty computation protocols. The basic idea behind the simulation paradigm is to compare a real secure multiparty computation protocol with an ideal one. The real protocol is considered as secure if the real secure multiparty computation protocol does not leak more information than the ideal one. The ideal secure multiparty computing protocol can be described as follows.

Assume that there is an absolutely trusted third party, denoted by Trent, who will neither lie nor leak any information that should not be revealed. Alice has a number $x$, Bob has a number $y$, and they want to securely compute a function $f(x, y)$. They can proceed as follows: (a) Alice and Bob send $x$ and $y$, respectively, to Trent, (b) Trent computes the function $f(x, y)$, and (c) Trent tells Alice and Bob the result.

Because most secure multiparty computation protocols are constructed using public key encryption schemes, the security proof for a secure multiparty computation protocol is to reduce the security of the protocol to the security of the public key encryption scheme on which the protocol is based. That is, to prove that a multiparty computation protocol is secure, we must prove that the real secure multiparty computation protocol does not leak more information than the ideal protocol with the assumption that the public key encryption scheme used in the real protocol is secure. In other words, the information that a party obtains in a real secure multiparty computation protocol can be simulated by a simulator that only obtains the result and one party’s input, and if the sets of information obtained from both methods are computationally indistinguishable, the real protocol is secure.

Intuitively, a protocol that computes $f$ is secure if whatever a set of semihonest parties can obtain after participating in the protocol could be obtained from the inputs and outputs of these same parties. In the simulation paradigm, this means that the VIEW (defined below) of a set of semihonest parties during a protocol execution can be simulated from their inputs and outputs.

Suppose that there are two parties, Alice and Bob, who have sets $X$ and $Y$, respectively. They want to privately compute $f(X, Y) = (f_1(X, Y), f_2(X, Y))$, which is a polynomial-time function. Further, suppose that $\pi$ is a protocol computing function $f$. The VIEW of Alice, who has the set $X$, during the execution of $\pi$ on the input $(X, Y)$ is denoted by $\mathrm{view}_1^{\pi}(X, Y) = (X, r^{1}, m_1^{1}, \ldots, m_t^{1})$, where $r^{1}$ is the result of Alice’s internal coin tosses and $m_i^{1}$ is the $i$-th message that Alice received. The output of Alice after the execution of $\pi$ is denoted as $\mathrm{output}_1^{\pi}(X, Y)$, which is implicit in Alice’s VIEW. Similarly, Bob’s VIEW and output during the execution of $\pi$ are $\mathrm{view}_2^{\pi}(X, Y)$ and $\mathrm{output}_2^{\pi}(X, Y)$.

Definition 1 (security in the semihonest model [10]). For a function $f$, we say that $\pi$ privately computes $f$ if there exist two probabilistic polynomial-time simulators, denoted by $S_1$ and $S_2$, such that
$$\{S_1(X, f_1(X, Y)),\, f_2(X, Y)\} \stackrel{c}{\equiv} \{\mathrm{view}_1^{\pi}(X, Y),\, \mathrm{output}_2^{\pi}(X, Y)\},$$
$$\{f_1(X, Y),\, S_2(Y, f_2(X, Y))\} \stackrel{c}{\equiv} \{\mathrm{output}_1^{\pi}(X, Y),\, \mathrm{view}_2^{\pi}(X, Y)\}, \qquad (12)$$
where $\stackrel{c}{\equiv}$ denotes computational indistinguishability.

3. Protocol for Sets Whose Elements Are Drawn from a Known Set

Suppose that Alice has a set $X$ and Bob has a set $Y$. A straightforward way to decide the subset relation between $X$ and $Y$, without worrying about privacy, is as follows: Alice sends her set $X$ to Bob; Bob computes whether $Y \subseteq X$ and then tells the result to Alice. Thus, Alice and Bob obtain the subset relation between $X$ and $Y$.

By the definition of a subset, if $Y \subseteq X$, then $y \in X$ for any element $y \in Y$. Thus, we can reduce the subset problem to checking whether all the elements of set $Y$ are in set $X$. If all the elements of $Y$ are elements of $X$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$.

Suppose Alice and Bob have sets $X$ and $Y$ ($|Y| \le |X|$), respectively. They want to determine whether or not $Y \subseteq X$ without disclosing either $X$ or $Y$.

3.1. Foundations of This Protocol

Before we describe the idea of our protocol, we first present the building blocks, a 1-$r$ encoding method and a 1-0 encoding method, which are based on the mathematical definition of the characteristic vector. Throughout, the known set is denoted by $U = \{u_1, \ldots, u_n\}$.

1-$r$ Encoding. A 1-$r$ encoding encodes a set $S \subseteq U$ as a 1-$r$ vector $V = (v_1, \ldots, v_n)$ in which every component is either 1 or $r_i$, where $r_i$ is a random number and $r_i \neq 1$. The principle for encoding a set as a 1-$r$ vector is as follows: if $u_i \in S$, then $v_i = 1$; otherwise, $v_i = r_i$. This can also be described by the following pseudocode:
For $i = 1$ to $n$
  If $u_i \in S$, $v_i = 1$
  Else $v_i = r_i$
End

1-0 Encoding. This method is similar to the 1-$r$ encoding, with a small difference. A set $S \subseteq U$ is encoded as a 1-0 vector $W = (w_1, \ldots, w_n)$ as follows: if $u_i \in S$, then $w_i = 1$; otherwise, $w_i = 0$. This can also be described by the following pseudocode:
For $i = 1$ to $n$
  If $u_i \in S$, $w_i = 1$
  Else $w_i = 0$
End

From a high-level perspective, the 1-$r$ encoding (1-0 encoding) encodes each $u_i \in S$ as a one component and each $u_i \notin S$ as a random (zero) component. Alice and Bob can use the above encoding methods to compute the subset problem.

Alice encodes her set $X$ as a 1-$r$ vector $V = (v_1, \ldots, v_n)$, and Bob encodes his set $Y$ as a 1-0 vector $W = (w_1, \ldots, w_n)$. Alice sends her vector $V$ to Bob. Bob chooses the components of $V$ corresponding to the one components of $W$ and computes their product $p = \prod_{i:\, w_i = 1} v_i$. If $p = 1$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. This is the principle of deciding the subset relation between sets $X$ and $Y$. We give a simple example in Table 1, where $V'$ is the vector chosen from $V$ according to the one components of $W$.

Alice and Bob can also decide the subset relation using another method. They encode $X$ as a 0-$r$ vector $V$ and $Y$ as a 1-0 vector $W$ based on a 0-$r$ encoding and the 1-0 encoding, respectively. The 0-$r$ encoding is similar to the 1-$r$ encoding and requires only that the one components be changed to zero components and that $r_i \neq 0$. Bob computes $s = \sum_{i=1}^{n} v_i w_i$. If $s = 0$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$.

Although the above approaches make it easy to determine whether $Y \subseteq X$, they are not secure. We therefore use semantically secure homomorphic encryption schemes to privately compute $p$ or $s$ in order to privately determine whether $Y \subseteq X$.
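The insecure plaintext version of the product method can be sketched as follows. The function names and the example sets are hypothetical, and the random components are drawn as plain integers of at least 2 (with no modular reduction), which is a simplification chosen here so that the product equals 1 exactly when every selected component is 1.

```python
import secrets

def encode_1r(universe, s):
    """1-r encoding: 1 for members of s, a random integer >= 2 otherwise."""
    return [1 if u in s else secrets.randbelow(2**32) + 2 for u in universe]

def encode_10(universe, s):
    """1-0 encoding: the characteristic vector of s over the known set."""
    return [1 if u in s else 0 for u in universe]

def subset_by_product(vec_alice, vec_bob):
    """Multiply Alice's components selected by the one components of Bob's vector."""
    product = 1
    for a, b in zip(vec_alice, vec_bob):
        if b == 1:
            product *= a
    return product == 1

universe = ["apple", "banana", "cherry", "date", "elderberry"]
alice = {"apple", "cherry", "date"}
inside = subset_by_product(encode_1r(universe, alice),
                           encode_10(universe, {"apple", "date"}))
outside = subset_by_product(encode_1r(universe, alice),
                            encode_10(universe, {"apple", "banana"}))
```

Here `inside` is true because both selected components are 1, while `outside` is false because the component for "banana" is a random number greater than 1.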

3.2. Protocol

We give a solution to the secure subset problem in Protocol 2 based on the above foundations. Because a multiplicatively homomorphic encryption scheme is more efficient than an additive one, we choose a multiplicative one and use the 1-$r$ encoding of Alice’s set to present this protocol. The ElGamal encryption scheme is semantically secure if the CDH assumption holds, which makes ciphertexts of the same plaintext indistinguishable. Therefore, we can have different ciphertexts of plaintext 1. In addition, the ElGamal encryption scheme is multiplicatively homomorphic, so we can obtain $E(m_1 m_2)$ from the ciphertexts $E(m_1)$ and $E(m_2)$. Furthermore, $E(1) \cdot E(1) = E(1)$ and $E(1)^{k} = E(1)$. Thus, we present Protocol 2 based on the ElGamal encryption scheme. For ease of explanation, we define the output $f(X, Y)$ for Alice’s set $X$ and Bob’s set $Y$ as follows: if $Y \subseteq X$, $f(X, Y) = 1$; otherwise, $f(X, Y) = 0$.

Protocol 2. Secure subset protocol for sets whose elements are drawn from a known set.
Inputs. Alice’s set $X$ and Bob’s set $Y$.
Output. $f(X, Y)$, that is, whether $Y \subseteq X$.
(1) Alice generates her private key $sk$ and the corresponding public key $pk$. She publishes $pk$ while keeping $sk$ private.
(2) Alice encodes set $X$ as a 1-$r$ vector $V = (v_1, \ldots, v_n)$. She further encrypts $V$ as $E(V) = (E(v_1), \ldots, E(v_n))$ with $pk$. She sends $E(V)$ to Bob.
(3) Bob encodes $Y$ as $W = (w_1, \ldots, w_n)$ using the 1-0 encoding. He computes
$$c = \prod_{i:\, w_i = 1} E(v_i).$$
Furthermore, he randomly chooses a number $a$ and computes
$$c' = c^{a}.$$
Then, he sends $c'$ to Alice.
(4) Alice decrypts $c'$ to obtain $d$. If $d = 1$, then Alice tells Bob that $Y \subseteq X$; otherwise, Alice tells Bob that $Y \not\subseteq X$.
Because the ciphertexts of random numbers are also random, Alice needs only to encrypt the one components of $V$ in step (2) and can use random ciphertexts for the remaining positions. That is, Alice needs only to encrypt her own elements. This reduces the computational complexity.
If the plaintext of $c$ is 1, then $d = 1$; otherwise, $d$ is a random number. Thus, the random number $a$ does not change the result. In this protocol, all the parties are semihonest and may try to derive information from the message sequences that they obtain. The random number $a$ randomizes Bob’s computation. In step (3), if Bob does not insert the random number $a$, Alice may deduce useful information from $c$. If the known set is small, Alice can find the ciphertexts that Bob used to compute the product ciphertext $c$. Thus, Alice obtains the 1-0 vector $W$ and, from it, the set $Y$. Even when the known set is large enough that Alice cannot derive Bob’s set from $c$, if $Y$ is not a subset of $X$, Alice may still derive which elements of $Y$ are not in $X$ from the decrypted product and the random components that she chose.
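The four steps of Protocol 2 can be traced end to end with a toy ElGamal instance. This is a sketch rather than the authors' implementation: the modulus $2^{127} - 1$, the base 3, and all function names are assumptions made here, and both parties run in one process purely for illustration.

```python
import secrets

P = 2**127 - 1   # toy prime modulus (assumption)
G = 3            # toy base (assumption)

def keygen():
    x = secrets.randbelow(P - 2) + 1
    return pow(G, x, P), x                       # (public h, private x)

def encrypt(h, msg):
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), msg * pow(h, r, P) % P

def decrypt(x, c):
    return c[1] * pow(c[0], P - 1 - x, P) % P

def alice_step2(universe, alice_set, h):
    """Encode Alice's set as a 1-r vector and encrypt every component."""
    vec = [1 if u in alice_set else secrets.randbelow(P - 2) + 2 for u in universe]
    return [encrypt(h, v) for v in vec]

def bob_step3(universe, bob_set, enc_vec):
    """Multiply the ciphertexts at Bob's one positions, then blind with a random power."""
    c1, c2 = 1, 1
    for u, (a, b) in zip(universe, enc_vec):
        if u in bob_set:                         # one component of Bob's 1-0 vector
            c1, c2 = c1 * a % P, c2 * b % P
    k = secrets.randbelow(P - 2) + 1             # Bob's random exponent
    return pow(c1, k, P), pow(c2, k, P)

def run_protocol2(universe, alice_set, bob_set):
    h, x = keygen()                                    # step (1)
    enc_vec = alice_step2(universe, alice_set, h)      # step (2)
    blinded = bob_step3(universe, bob_set, enc_vec)    # step (3)
    return decrypt(x, blinded) == 1                    # step (4)

universe = ["a", "b", "c", "d", "e"]
```

For example, `run_protocol2(universe, {"a", "c", "d"}, {"a", "d"})` reports a subset, whereas replacing Bob's set with `{"a", "b"}` does not, except with negligible probability over the random choices.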

3.3. Security of Protocol 2

In this manuscript, we prove the security of Protocol 2 using the simulation paradigm.

Theorem 3. Protocol 2, denoted by , for computing the subset problem is private.

Proof. To prove this theorem, we show that there exist two simulators and such that (12) holds. We first show the construction of .(1) receives as the input and randomly chooses a set such that . simulates the execution of Protocol 2 based on , . encodes sets and to and , respectively.(2) encrypts the vector using the public key to obtain ciphertexts , .(3) first computes further chooses a random number and computes .(4) decrypts and obtains .Let ( ). In this protocol, , (). Because the ElGamal encryption scheme is semantically secure with the CDH assumption, messages that are encrypted based on this scheme are computationally indistinguishable. This means that the message sequences that Alice obtained in Protocol 2 and the message sequences that simulated are computationally indistinguishable. As , it follows that Now, let us examine the construction of . Based on the inputs , proceeds as follows: (1) chooses a set such that , and it simulates the execution of Protocol 2 with sets and . Based on the 1- encoding and 1-0 encoding, encodes to and to , respectively.(2) encrypts the vector to obtain (3) computes Furthermore, chooses a random number and computes .(4) obtains from .Let ( ). In this protocol, (). Because messages that are encrypted using the ElGamal encryption scheme are computationally indistinguishable under the CDH assumption, the message sequences that Alice obtains in Protocol 2 and the message sequences that is simulating are computationally indistinguishable. As , we have

4. Protocol for Sets with Large Domains

In Protocol 2, we present a subset protocol for sets whose elements are drawn from a known set. Because its communication complexity is linear in the size of the known set, it becomes awkward when the known set is large. Therefore, we construct a secure subset protocol for sets taken from a large domain. This protocol is efficient, with a computational complexity that is linear in the cardinality of the larger set, whereas the computational complexity of Kissner’s protocol [17] is linear in the product of the cardinalities of the two sets. If the two sets have comparable sizes, Kissner’s protocol has quadratic computational complexity and thus cannot generally be considered practical for inputs consisting of a large number of data [34]. However, the protocol that we construct reduces the computational cost at the price of degraded accuracy. That is, our protocol has a small false-positive probability, and in Section 4.2, we show how to decrease it.

4.1. Foundations of This Protocol

The following secure subset protocol is based on the Bloom filter [35] and a variant Bloom filter. We present the building blocks of the protocol before giving the idea behind the protocol.

Bloom Filter. A Bloom filter $BF$ is a vector of $m$ bits that can represent a set $S$ of at most $n$ elements. There are $k$ independent uniform hash functions $h_1, \ldots, h_k$, and each maps the elements of $S$ to $\{1, \ldots, m\}$ uniformly. In this paper, we use $BF[i]$ to denote the bit at index $i$ in $BF$. Initially, all bits in the array are set to 0. To insert an element $x \in S$ into the filter, we compute each hash function $h_j(x)$ and set $BF[h_j(x)] = 1$ for $j = 1, \ldots, k$. After all the elements of $S$ are inserted in $BF$, the Bloom filter $BF$ represents set $S$.

To check if an item $y$ is in $S$, we check all components of $BF$ that are hashed by $h_1(y), \ldots, h_k(y)$. If any bit at these components is 0, then $y \notin S$; otherwise, $y \in S$ with high probability. A Bloom filter may thus yield a false positive, but it never yields a false negative. That is, if $y \in S$, all of these bits must be 1; if all of these bits are 1, it may still be that $y \notin S$. The probability of a false positive is
$$\left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}.$$
According to [36], it is about $\left(1 - e^{-kn/m}\right)^{k}$. We can choose $m$ and $k$ based on our practical applications. If the size of $S$ is $n$, the probability that a specific component is one is $1 - (1 - 1/m)^{kn}$ [37].
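A Bloom filter along these lines can be sketched in a few lines. The hash construction below (SHA-256 of the hash index and the item, reduced modulo the filter length) is an assumption made for illustration and is not the hash family used in the paper; indices are 0-based, as is usual in Python.

```python
import hashlib

def bloom_positions(item, m, k):
    """Indices produced by k illustrative hash functions (salted SHA-256, an assumption)."""
    return [int.from_bytes(hashlib.sha256(f"{j}:{item}".encode()).digest(), "big") % m
            for j in range(k)]

def build_bloom(items, m, k):
    bits = [0] * m                       # all bits start at 0
    for item in items:
        for pos in bloom_positions(item, m, k):
            bits[pos] = 1                # insert: set every hashed position to 1
    return bits

def might_contain(bits, item, k):
    """True if every hashed bit is 1 (possible false positive, never a false negative)."""
    return all(bits[pos] for pos in bloom_positions(item, len(bits), k))

def false_positive_rate(m, k, n):
    """Standard estimate (1 - (1 - 1/m)^(kn))^k for n inserted elements."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k

bf = build_bloom([134, 189, 258, 393], m=64, k=3)
```

Every inserted element passes `might_contain`, and `false_positive_rate(64, 3, 4)` gives a rough idea of how often an absent element would pass as well.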

Suppose Alice has a set $X$, and Bob has a set $Y$. Sets $X$ and $Y$ are taken from an exponentially large domain, and the parties can compute whether $Y \subseteq X$ using the Bloom filter as follows.

Protocol 4. Secure subset protocol for sets taken from an exponentially large domain.
Inputs. Alice and Bob input sets $X$ and $Y$, respectively.
Output. Whether $Y \subseteq X$.
(1) Alice and Bob negotiate the parameters for their Bloom filters.
(2) Alice and Bob represent sets $X$ and $Y$ as Bloom filters $BF_X$ and $BF_Y$, respectively.
(3) Alice sends $BF_X$ to Bob.
(4) Bob checks $BF_Y$ against $BF_X$. If $BF_X[i] = 1$ for every index $i$ with $BF_Y[i] = 1$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. He sends the result to Alice.

The above protocol has a low computational complexity because it relies only on hash functions. However, when the sets of the parties are not taken from an exponentially large domain, Bob can recover Alice’s set from her Bloom filter using an exhaustive search. Thus, we design a solution for sets taken from a large, but not exponentially large, domain based on the Bloom filter and a variant of the Bloom filter. Before presenting the principles behind this solution, we describe the variant of the Bloom filter.

Variant Bloom Filter. The variant Bloom filter is similar to the Bloom filter, with a small difference. In the Bloom filter, each component is either a 0 or a 1 bit, whereas each component of the variant Bloom filter is either a random number $r$ or 1. Similarly, to insert an element $x$ of a set $S$ into a variant Bloom filter $VBF$, we compute $h_j(x)$ and set $VBF[h_j(x)] = 1$ for $j = 1, \ldots, k$. After all the elements of $S$ are inserted, we let the remaining components of $VBF$ be random numbers other than 1. Because the variant Bloom filter only changes the 0 components of the Bloom filter to random numbers, its false-positive probability is the same as that of the Bloom filter.

Suppose that Alice and Bob have sets $X$ and $Y$, respectively, which are taken from a large domain, and they want to decide whether $Y \subseteq X$. Alice represents set $X$ as a variant Bloom filter $VBF_X$ and sends $VBF_X$ to Bob. Bob represents set $Y$ as a Bloom filter $BF_Y$. He computes
$$p = \prod_{i:\, BF_Y[i] = 1} VBF_X[i].$$
If $p = 1$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. This is the idea behind deciding whether $Y \subseteq X$. Alternatively, Alice can represent set $X$ as another variant Bloom filter $\widetilde{VBF}_X$, which is similar to $VBF_X$ except that the 1 components are replaced by 0 components. Bob then computes the sum $s = \sum_{i:\, BF_Y[i] = 1} \widetilde{VBF}_X[i]$ instead of $p$. If $s = 0$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$.

For simplicity, we give a simple example in Table 2 with the variant Bloom filter $VBF_X$. Suppose that Alice has the set $X = \{134, 189, 258, 393\}$ and Bob has the set $Y = \{134, 258\}$. Alice and Bob represent $X$ and $Y$ as a variant Bloom filter $VBF_X$ and a Bloom filter $BF_Y$, respectively, using the same length and the same hash functions for both filters. The hash functions map any value to an index of the filter uniformly, as in Table 3. Alice sets the components of $VBF_X$ corresponding to 134, 189, 258, and 393 to 1 and sets the other components, which are not mapped, to random numbers. Bob sets the components of $BF_Y$ corresponding to 134 and 258 to 1 and the other components to 0. $VBF'$ is the vector chosen from $VBF_X$ according to the 1 components of $BF_Y$. Thus, if the product of all the components of $VBF'$ is 1, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$.

Because the domain of the sets is not exponentially large, these ideas are insufficiently secure for strict applications. Fortunately, we can obtain a secure scheme using homomorphic encryption. The product-based method can be implemented with a multiplicatively homomorphic encryption scheme, and the sum-based method can be implemented with an additively homomorphic encryption scheme. Because multiplicatively homomorphic encryption schemes are usually more efficient than additive ones, we present our protocol in the next subsection with the product-based method and the ElGamal encryption scheme, which has a multiplicative homomorphism.
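The unencrypted product method on filters can be sketched with the two example sets $\{134, 189, 258, 393\}$ and $\{134, 258\}$. The filter length 1024, the three salted SHA-256 hash functions, and the function names are assumptions made here; they are not the parameters of Tables 2 and 3. Random components are plain integers of at least 2, so the product is 1 exactly when every selected component is 1.

```python
import hashlib
import secrets

def positions(item, m, k):
    """k illustrative hash positions (salted SHA-256, an assumption)."""
    return [int.from_bytes(hashlib.sha256(f"{j}:{item}".encode()).digest(), "big") % m
            for j in range(k)]

def variant_bloom(items, m, k):
    """Hashed positions hold 1; every other component is a random integer >= 2."""
    vbf = [secrets.randbelow(2**32) + 2 for _ in range(m)]
    for item in items:
        for pos in positions(item, m, k):
            vbf[pos] = 1
    return vbf

def bloom(items, m, k):
    bf = [0] * m
    for item in items:
        for pos in positions(item, m, k):
            bf[pos] = 1
    return bf

def subset_by_product(vbf_alice, bf_bob):
    """Multiply Alice's components at the positions where Bob's filter holds 1."""
    product = 1
    for a, b in zip(vbf_alice, bf_bob):
        if b == 1:
            product *= a
    return product == 1

M, K = 1024, 3
vbf_x = variant_bloom({134, 189, 258, 393}, M, K)
is_subset = subset_by_product(vbf_x, bloom({134, 258}, M, K))
not_subset = subset_by_product(vbf_x, bloom({134, 999}, M, K))
```

With these parameters `is_subset` is always true, while `not_subset` is false unless all positions of 999 happen to collide with Alice's elements, which is the Bloom filter false positive discussed above.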

4.2. The Protocol

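Protocol 5. Secure subset protocol for sets within a large domain.
Inputs. Alice inputs set $X$, and Bob inputs set $Y$.
Output. Whether or not $Y \subseteq X$.
(1) Alice and Bob negotiate the parameters to construct their Bloom filters.
(2) Alice performs the following:
(i) She generates her private key $sk$ and the corresponding public key $pk$.
(ii) She represents her set $X$ as a variant Bloom filter $VBF_X$ and encrypts it componentwise as $E(VBF_X) = (E(VBF_X[1]), \ldots, E(VBF_X[m]))$.
(iii) She sends $pk$ and $E(VBF_X)$ to Bob.
(3) Bob computes the following:
(i) He represents $Y$ as a Bloom filter $BF_Y$ and, for $i = 1$ to $m$, selects the ciphertext $E(VBF_X[i])$ whenever $BF_Y[i] = 1$.
(ii) He randomly chooses a number $a$ and evaluates
$$c = \Bigl(\prod_{i:\, BF_Y[i] = 1} E(VBF_X[i])\Bigr)^{a}.$$
He sends $c$ to Alice.
(4) Alice decrypts $c$ and obtains $d$. If $d = 1$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$.
The ElGamal encryption scheme is multiplicatively homomorphic, so multiplying the ciphertexts corresponds to multiplying the underlying plaintexts. Thus, $d = \bigl(\prod_{i:\, BF_Y[i] = 1} VBF_X[i]\bigr)^{a}$. In step (3), if the product of the selected plaintexts is 1, then $d = 1$; otherwise, $d \neq 1$. Thus, $a$ does not change the result. However, if Bob does not insert the random number $a$, Alice may obtain more information than she should. This is similar to Protocol 2 (we omit the details in this paper). To lower the computational complexity, Alice needs only to encrypt the 1 components of $VBF_X$, as in Protocol 2.

Analogously, Alice can also represent $X$ as $\widetilde{VBF}_X$, the variant Bloom filter whose 1 components are replaced by 0 components, and encrypt it with an additively homomorphic encryption scheme, such as the Paillier encryption scheme. She sends $E(\widetilde{VBF}_X)$ to Bob. Bob computes a ciphertext of the sum $s = \sum_{i:\, BF_Y[i] = 1} \widetilde{VBF}_X[i]$ based on the additive homomorphism and his Bloom filter $BF_Y$. He chooses a random number $a$ to randomize the result and obtains $c = E(s)^{a} = E(as)$. Bob sends $c$ to Alice. Alice decrypts $c$ to obtain $d$. If $d = 0$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. Thus, they obtain the subset relation.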
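The multiplicative (ElGamal) version of Protocol 5 can be traced with the following sketch. As before, the toy modulus $2^{127} - 1$, the base 3, the salted SHA-256 hash positions, the default filter parameters, and the function names are all assumptions introduced for illustration, and both parties are simulated in one process.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus (assumption)
G = 3            # toy base (assumption)

def positions(item, m, k):
    """k illustrative hash positions (salted SHA-256, an assumption)."""
    return [int.from_bytes(hashlib.sha256(f"{j}:{item}".encode()).digest(), "big") % m
            for j in range(k)]

def keygen():
    x = secrets.randbelow(P - 2) + 1
    return pow(G, x, P), x

def encrypt(h, msg):
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), msg * pow(h, r, P) % P

def decrypt(x, c):
    return c[1] * pow(c[0], P - 1 - x, P) % P

def alice_encrypt_filter(alice_set, m, k, h):
    """Step (2): encrypt the variant Bloom filter (1 at hashed positions, random elsewhere)."""
    ones = {pos for item in alice_set for pos in positions(item, m, k)}
    return [encrypt(h, 1 if i in ones else secrets.randbelow(P - 2) + 2)
            for i in range(m)]

def bob_evaluate(bob_set, m, k, enc_filter):
    """Step (3): multiply ciphertexts at Bob's Bloom filter positions, then blind."""
    c1, c2 = 1, 1
    for pos in {q for item in bob_set for q in positions(item, m, k)}:
        a, b = enc_filter[pos]
        c1, c2 = c1 * a % P, c2 * b % P
    r = secrets.randbelow(P - 2) + 1
    return pow(c1, r, P), pow(c2, r, P)

def run_protocol5(alice_set, bob_set, m=1024, k=3):
    h, x = keygen()
    enc_filter = alice_encrypt_filter(alice_set, m, k, h)
    return decrypt(x, bob_evaluate(bob_set, m, k, enc_filter)) == 1   # step (4)
```

A call such as `run_protocol5({134, 189, 258, 393}, {134, 258})` reports a subset; with a non-member in Bob's set the result is negative unless a Bloom filter false positive occurs.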

The success probability of Protocol 5 is stated in Theorem 6.

Theorem 6. Protocol 5 will succeed with probability .

Proof. According to [38], the probability that a particular component in the Bloom filter is 1 is . Because the variant Bloom filter is similar to the Bloom filter, the probability is also . In Protocol 5, Bob chooses components of to compute the product based on the 1 components of . The product is only if all of the components that Bob chose from encrypt 1. This shows that ; otherwise, . Thus, the success probability of Protocol 5 is .
The success probability of Protocol 5 can be increased for important applications. If Alice and Bob choose another set of different hash functions to reexecute Protocol 5 for the same sets and , the probability of a false positive is again . In addition, the two executions are performed in sequence and, because the hash functions differ, are treated as independent. Therefore, the success probability is . Thus, the success probability of executing Protocol 5 times is for the same sets with different hash functions on each occasion.
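The effect of reexecution can be made concrete with a short calculation. The sketch below assumes that each run independently yields a false positive with some probability `p_single` (the per-run value from Theorem 6, supplied here as a parameter rather than recomputed) and that the parties accept the subset relation only if every run reports it. The function names are hypothetical.

```python
def repeated_false_positive(p_single, runs):
    """Probability that all `runs` independent executions give a false positive."""
    return p_single ** runs

def runs_needed(p_single, target):
    """Smallest number of runs whose combined false-positive rate is <= target."""
    if not (0 < p_single < 1 and 0 < target < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    runs = 1
    while repeated_false_positive(p_single, runs) > target:
        runs += 1
    return runs
```

Because the combined false-positive probability shrinks geometrically in the number of runs, a modest number of reexecutions suffices even when a single run is fairly unreliable.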

Corollary 7. Secure subset protocol for sets within a large domain is private.

Corollary 7 follows readily from Theorem 3, and we omit the proof here.

5. Analysis of the Above Protocols

5.1. Efficiency Analysis

Because the subset protocols of [29, 30] rest on foundations similar to those of Kissner’s protocol [17], and Kissner’s protocol is the more efficient, this analysis compares the computational and communication complexity of Kissner’s protocol, Protocol 2, and Protocol 5.

Computational Complexity. In Protocol 2, Alice needs encryptions and one decryption. Because the messages to be encrypted are 1, each ElGamal encryption takes modular multiplications. For the ElGamal encryption scheme, each decryption takes modular multiplications. Thus, Alice needs modular multiplications. Bob computes using modular multiplications, and he requires to compute . The computational cost for Bob is modular multiplications. Therefore, the computational overhead of Protocol 2 is modular multiplications ().

Alice encrypts her variant Bloom filter using encryptions during the execution of Protocol 5. Because the components are 1, each encryption takes modular multiplications. Alice also needs to decrypt . Thus, Alice takes modular multiplications. Bob evaluates using modular multiplications. He computes based on taking modular multiplications, and modular multiplications are required during Protocol 5. The total computational cost is modular multiplications () in Protocol 5. Because is a constant, the computational cost is linear in .

In the protocol proposed by Kissner and Song [17], suppose that Alice has a set and Bob has a set and and , where . Alice needs encryptions to encrypt her polynomial in order to obtain the encrypted polynomial and 1 decryption to decrypt the ciphertext . Bob needs modular exponentiations and modular multiplications to evaluate the encrypted polynomial for every element . There are elements within , and this takes Bob modular exponentiations and modular multiplications. For the Paillier encryption scheme, every encryption and decryption require two modular exponentiations, and every modular exponentiation requires modular multiplications. This protocol takes modular multiplications ().

Communication Complexity. Communication complexity can be measured by the number of bits exchanged or by the number of communication rounds. In secure multiparty computation, the number of communication rounds is the widely used measure. Protocol 2, Protocol 5, and Kissner’s protocol each involve three communication rounds.

Based on the above discussion, we summarize the comparison in Table 4. In this table, the modular multiplication for Kissner’s protocol is mod , and for our proposed protocols, it is . In order to achieve the same security, .

5.2. Performance Evaluation

Following the efficiency analysis above, we now describe the experimental setting and the performance evaluation. Our experiments cover Kissner’s protocol, Protocol 2, and Protocol 5.

We implemented these protocols in the Java programming language. The experimental environment was a 64-bit Windows 10 operating system running on an Intel(R) Core(TM) i3-2100 CPU @ 3.10 GHz with 4 GB of memory. We set both the Paillier modulus and the ElGamal modulus to 1024 bits.

The experimental results of the subset protocols are shown in Figure 1. “KissnerP” is the protocol proposed by Kissner. Both Protocols 2 and 5 are based on the ElGamal encryption scheme, while Kissner’s protocol is based on the Paillier encryption scheme. Because the success probability of Protocol 5 is , we instantiate in the following implementation.

Figure 1 shows that both Protocol 2 and Protocol 5 have linear computational complexity, while the computational complexity of Kissner’s protocol is quadratic. Thus, our protocols are efficient and practical for large inputs. However, the probability of a false positive in Protocol 5 is . Therefore, can be chosen to be sufficiently large to make this probability negligible.

6. Conclusion

The subset problem is an important building block in secure multiparty computation, and it has many applications in privacy-preserving problems. In this study, we first presented an efficient subset protocol for sets whose elements are drawn from a known set. For sets whose elements are drawn from a large domain, we further designed an approximate but efficient subset protocol. These protocols have a computational complexity that is linear in the size of the large set. However, our protocols assume that all parties are semihonest. In future work, it will be necessary to solve similar problems under the malicious model.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant nos. 61272435 and 61373020), Fundamental Research Funds for the Central Universities (Grant no. 2016TS061), the National Foundation Fund of China (201706870028), Natural Science Foundation of Inner Mongolia (Grant no. 2017MS0602), and University Scientific Research Project of Inner Mongolia (Grant no. NJZY17164).