Security and Communication Networks

Volume 2017, Article ID 9717580, 11 pages

https://doi.org/10.1155/2017/9717580

## Efficient Secure Multiparty Subset Computation

^{1}School of Computer Science, Shaanxi Normal University, Xi’an 710062, China

^{2}School of Mathematics and Information Science, Shaanxi Normal University, Xi’an 710062, China

Correspondence should be addressed to Shundong Li; shundong@snnu.edu.cn

Received 7 January 2017; Revised 21 April 2017; Accepted 11 June 2017; Published 28 August 2017

Academic Editor: Jimson Mathew

Copyright © 2017 Sufang Zhou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The secure subset problem is important in secure multiparty computation, which is a vital field in cryptography. Most existing protocols for this problem can keep the elements of only one set private while leaking the elements of the other set. In other words, they do not solve the secure subset problem completely. A few studies have addressed the fully private subset problem, but their protocols are mainly based on oblivious polynomial evaluation and are computationally inefficient. In this study, we first design an efficient secure subset protocol for sets whose elements are drawn from a known set, based on a new encoding method and a homomorphic encryption scheme. If the elements of the sets are taken from a large domain, the existing protocol is inefficient. Using a Bloom filter and a homomorphic encryption scheme, we further present an efficient protocol whose computational complexity is linear in the cardinality of the large set, which makes it practical for inputs consisting of a large amount of data. However, this second protocol may yield a false positive. The probability of a false positive can be rapidly decreased by reexecuting the protocol with different hash functions. Furthermore, we present experimental performance analyses of these protocols.

#### 1. Introduction

The rapid development of networks provides a great opportunity for multiparty cooperative computation, but it also challenges the privacy of the participants’ information. In a complex network environment, parties may not trust each other during computations, and they need to keep their information private. Secure multiparty computation is a key technology for privacy preservation in cooperative computations and has therefore attracted increasing attention in the international cryptographic community.

Secure multiparty computation was first introduced by Yao [1] in 1982 as the millionaires’ problem. The millionaires’ problem can be described as follows. Two millionaires, Alice and Bob, want to know who is richer, but neither wants to disclose her or his own wealth to the other. This is a secure two-party computation problem. Later, Ben-Or et al. [2] gave the first secure multiparty computation protocol. A secure multiparty computation involves two or more parties who use their own private data to cooperatively compute a function in order to obtain the predetermined output while keeping their input information private. Secure multiparty computation is a general cryptographic protocol. Many cryptographic protocols for cooperative computations that involve two or more parties can be viewed as secure multiparty computation protocols, and these include key exchange protocols [3], digital signature protocols [4], secret sharing protocols [5], zero-knowledge proof protocols [6], and oblivious transfer protocols [7]. Secure multiparty computation is a key technology in network security, and it has been the focus of the international cryptographic community for many years. The Turing Award winner Goldwasser [8] predicted that “the field of multiparty computations is today where public key cryptography was ten years ago, namely, an extremely powerful tool and rich theory whose real-life usage is at this time only beginning but will become in the future an integral part of our computing reality.”

Goldreich et al. [9, 10] thoroughly studied the secure multiparty computation problem and established its theoretical foundation. They proved that secure multiparty computation problems are theoretically solvable and proposed a general solution to them. Because the general solution is inefficient and impractical for special problems, they also noted that, to improve efficiency, special solutions should be developed for special problems. This observation has motivated researchers to study solutions to various secure multiparty computation problems. The problems studied include millionaires’ problems [11, 12], secure computational geometry problems [13], comparing information without leaking it [14], private bidding and auction problems [15], and privacy-preserving data mining problems [16]. In addition, many other new secure multiparty problems remain to be studied.

Because many problems can be abstracted as set problems, private set operation is a highly important field in secure multiparty computation. These problems include set intersection [18], set union [19], and subsets [17]. The set intersection problem and the set union problem have been widely studied, whereas there are only a few studies of the subset problem. However, there are a variety of applications for the subset problem.

(1) In data mining, there is an important principle about association rules (the Apriori Principle), which states that if an itemset is frequent, then all of its subsets must also be frequent [20]. Suppose that both Alice and Bob are suppliers of a supermarket. Alice has a large frequent itemset generated by data mining from the supermarket's transactions. Bob has an itemset of his own, and he wants to know whether it is also a frequent itemset. However, he cannot perform data mining on the supermarket's transaction data (either he cannot obtain the transaction data or he does not have data mining knowledge). Therefore, he resorts to the Apriori Principle, but he does not want to disclose his itemset to Alice. As expected, Alice also wishes to keep her itemset secret. In this application, they have to privately determine whether Bob's itemset is a subset of Alice's.

(2) In secret sharing, a secret is divided into $n$ shares, which are privately given to $n$ parties called the legal shareholders, and any $t$ or more shareholders can reconstruct the secret. During the reconstruction of the secret, some illegal shareholders may try to take part. To prevent this, the authenticity of the participating shareholders must be privately determined. This is where the secure subset protocol comes into play.

It is generally known that the subset problem is a special case of the set intersection problem. However, when applied to the subset problem, existing set intersection protocols lead to solutions that are both insecure and inefficient. For the subset problem, we only need to determine whether one set is contained in the other. Intersection protocols, in contrast, compute every element that the two sets have in common, so they disclose the shared elements of the two sets. Furthermore, the subset problem is a decision problem, and it does not require computing all the elements of the intersection. Thus, set intersection protocols are not suitable for the subset problem.

If there are two sets and , where , in most current studies, many private subset operations can be classified into two different cases. First, two parties proved that , leaking the elements of set [21–23]. Second, two parties proved that without keeping the privacy of the elements of set [24–27].

In addition, Kissner and Song [17] proposed a secure solution to the subset problem based on the Paillier additively homomorphic encryption scheme [28], the representation of elements of a set as roots of a polynomial, and the mathematical properties of polynomials. In their proposed solution, both sets and can be kept private. Let be the encryption of the polynomial that represents the larger set . Note that if , then is true for every element (or vice versa). That is, . The party who has the smaller set evaluates the encrypted polynomial at each element to obtain ciphertexts, and it multiplies these ciphertexts to obtain . If is an encryption of , then . However, the computational complexity of this protocol takes (, ) modular multiplications (mod , details are presented in Section 5.1). This depends on the product of and . However, the protocol is inefficient for the computation of a large quantity of data.
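The polynomial representation behind this approach can be illustrated in the clear. The following is a minimal plaintext sketch, not the encrypted protocol of [17]: the modulus `P`, the function names, and the sample sets are all assumptions made for illustration. It shows only that every element of the smaller set is a root of the polynomial built from the larger set exactly when the subset relation holds, and why the work grows with the product of the two set sizes.

```python
# Plaintext illustration only: no encryption, hypothetical names throughout.
P = 2_147_483_647  # prime modulus 2^31 - 1, chosen arbitrarily for the demo


def poly_from_roots(roots, p=P):
    """Coefficients (lowest degree first) of the product of (x - a) mod p."""
    coeffs = [1]
    for a in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - a * c) % p       # term -a * c * x^i
            nxt[i + 1] = (nxt[i + 1] + c) % p   # term  c * x^(i+1)
        coeffs = nxt
    return coeffs


def evaluate(coeffs, x, p=P):
    """Horner evaluation: one multiplication per coefficient."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def is_subset_via_polynomial(small, large, p=P):
    """True when every element of `small` is a root of the polynomial of `large`."""
    f = poly_from_roots(large, p)
    return all(evaluate(f, b, p) == 0 for b in small)


inside = is_subset_via_polynomial({2, 3}, {1, 2, 3, 5})
outside = is_subset_via_polynomial({2, 4}, {1, 2, 3, 5})
```

Each evaluation costs roughly as many multiplications as the larger set has elements, and one evaluation is needed per element of the smaller set, which is where the product of the two cardinalities comes from. Elements are assumed to be distinct integers smaller than `P`.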

Furthermore, Ye et al. [29] and Sang and Shen [30] separately gave their subset protocols, which are mainly based on the oblivious polynomial evaluations, and which are similar to Kissner’s protocol. The subset protocol of [29] was presented in the distributed setting. By using Shamir’s secret sharing scheme, the polynomial constructed based on the larger set was distributed to multiple servers. The party who had the smaller set interacted with at least servers to compute the subset problem based on the standard variant of the ElGamal encryption. The overall cost for the computation is , and the communication is bits. In the subset protocol of [30], Sang utilized a nonmalleable NonInteractive Zero-Knowledge (NIZK) argument, which is based on the Boneh-Goh-Nissim (BGN) cryptosystem to protect it against malicious attacks. Without considering the computational complexity of malicious attacks, the computational complexity of this protocol is besides the NIZK argument. Meanwhile, our protocols have a linear computational complexity in the cardinality of the large set (details are presented in Section 5.1).

Moreover, Blanton and Aguiar [31] created an efficient subset protocol based on the oblivious algorithms, such as oblivious sorting algorithms and oblivious equality algorithms. Unfortunately, this protocol is constructed using the circuit method and has the drawbacks of the circuit method [32].

Shundong et al. [12] described a secure subset protocol that retains the privacy of both sets and , and it is based on symmetric cryptography and has high efficiency. However, the smaller set can only have one element in the protocol. If set has more than one element, the parties have to execute the protocol times and choose new pseudorandom sequences on each occasion, which is tedious.

In this study, we propose two secure subset protocols for different situations using homomorphic encryption schemes, which can be multiplicative or additive. Because a multiplicatively homomorphic encryption scheme is more efficient than an additive one, we choose a multiplicative one to build our protocols. To the best of our knowledge, current encryption schemes can encrypt only integer messages. However, the sets to be computed are often drawn from a known set whose elements, in many common cases, are not integers. For this case, we design an efficient protocol based on a new encoding method and a homomorphic encryption scheme. The computational complexity of this protocol is linear in the size of the large set. For the situation in which the sets are taken from a large domain, we further present an efficient protocol based on Bloom filters and a homomorphic encryption scheme, which improves efficiency without significantly compromising accuracy. Furthermore, we show that, by using the Bloom filter, we can solve the subset problem for sets that are taken from an exponentially large domain.

The rest of this paper is organized as follows. In Section 2, we introduce some preliminaries. In Section 3, we propose an efficient secure subset protocol for sets whose elements are drawn from a known set using a new encoding method and homomorphic encryption schemes. In Section 4, we show the secure subset protocol for sets within a large domain based on the Bloom filter and homomorphic encryption schemes, while in Section 5, we present an analysis of secure subset protocols and the experimental implementation. Finally, in Section 6, we conclude this paper.

#### 2. Preliminaries

##### 2.1. Secure Subset Problem

Alice has a set $X$, and Bob has a set $Y$. Alice and Bob want to determine whether $Y$ is a subset of $X$ without disclosing any information about the elements of their sets to each other. This can be abstracted as a secure subset problem.

##### 2.2. Homomorphic Encryption Scheme

A homomorphic encryption scheme is an encryption scheme with special properties that make it a building block of many secure multiparty computation protocols. A conventional public key encryption scheme consists of three algorithms: KeyGen, Encrypt, and Decrypt.

(i) KeyGen takes a security parameter as the input and outputs a secret key and the corresponding public key, together with the definition of the plaintext space and the ciphertext space.

(ii) Encrypt takes the public key and a plaintext as inputs and outputs a ciphertext.

(iii) Decrypt takes a ciphertext and the secret key as inputs and outputs the plaintext.

In addition to the three conventional algorithms, a homomorphic encryption scheme has an efficient algorithm Evaluate, which takes as inputs the public key, an operation, and a tuple of ciphertexts (each being the ciphertext of a corresponding plaintext), and it outputs a ciphertext of the result of applying the operation to those plaintexts.

Our construction uses semantically secure public key encryption schemes that preserve the group homomorphism under some computational complexity assumptions. This property is obtained by the Paillier encryption scheme [28] and the ElGamal encryption scheme [33] under the Composite Residuosity Class (CRC) assumption and the Computational Diffie-Hellman (CDH) assumption, respectively. Details are presented as follows.

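*Paillier Encryption Scheme*

*(i) KeyGen*. On inputting a security parameter, this algorithm generates two large primes $p$ and $q$, sets $N = pq$ and $\lambda = \operatorname{lcm}(p - 1, q - 1)$, and computes $g \in \mathbb{Z}_{N^2}^{*}$ such that $\gcd(L(g^{\lambda} \bmod N^2), N) = 1$, where $L$ is defined as $L(x) = (x - 1)/N$. $\mathbb{Z}_{N^2}^{*}$ is the ciphertext space, and $\mathbb{Z}_N$ is the plaintext space. The public key is $(g, N)$, and the private key is $\lambda$.

*(ii) Encrypt*. To encrypt plaintext $m \in \mathbb{Z}_N$, the algorithm selects a random number $r \in \mathbb{Z}_N^{*}$ and computes

$$c = E(m) = g^{m} r^{N} \bmod N^2.$$

*(iii) Decrypt*. To decrypt the ciphertext $c$, the algorithm computes

$$m = D(c) = \frac{L(c^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \bmod N.$$

*(iv) Evaluate*. For ciphertexts $E(m_1)$, $E(m_2)$, and $E(m)$ and a constant $k$, we have

$$E(m_1) \cdot E(m_2) = E(m_1 + m_2), \qquad E(m)^{k} = E(km).$$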

In this encryption scheme, if , then .
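The additive homomorphism of the Paillier scheme can be checked numerically. The sketch below is a toy instance under stated assumptions: the primes 1009 and 1013 are far too small to be secure, the choice $g = N + 1$ is one common valid generator rather than anything prescribed by the text, and the function names and the cached value `mu` (the inverse of $L(g^{\lambda} \bmod N^2)$) are introduced here only for illustration.

```python
import math
import secrets


def L(u, n):
    """The function L(u) = (u - 1) / n used in Paillier decryption."""
    return (u - 1) // n


def keygen(p=1009, q=1013):
    """Toy key generation from two small fixed primes (insecure sizes)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1  # assumed choice; a standard valid value for g
    # mu caches the inverse of L(g^lam mod n^2), which decryption divides by
    mu = pow(L(pow(g, lam, n * n), n), -1, n)
    return (g, n), (lam, mu)


def encrypt(pk, m):
    g, n = pk
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:           # r must be a unit modulo n
        r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pk, sk, c):
    _, n = pk
    lam, mu = sk
    return (L(pow(c, lam, n * n), n) * mu) % n


pk, sk = keygen()
n2 = pk[1] ** 2
c1, c2 = encrypt(pk, 20), encrypt(pk, 22)
sum_plain = decrypt(pk, sk, (c1 * c2) % n2)      # product of ciphertexts adds plaintexts
scaled_plain = decrypt(pk, sk, pow(c1, 3, n2))   # exponentiation scales the plaintext
```

Multiplying the two ciphertexts yields an encryption of 20 + 22, and raising a ciphertext to the power 3 yields an encryption of 3 · 20, matching the two Evaluate identities. The code needs Python 3.9 or later for `math.lcm`.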

*ElGamal Encryption Scheme*

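*(i) KeyGen*. On inputting a security parameter, the algorithm generates a large prime $p$ and a generator $g$ of $\mathbb{Z}_p^{*}$, and it randomly chooses a number $x$ as a private key. The public key is $(p, g, h)$, where $h = g^{x} \bmod p$.

*(ii) Encrypt*. Taking a plaintext $m$ and the public key as inputs, the algorithm selects a random number $r$ and computes

$$E(m) = (c_1, c_2) = (g^{r} \bmod p,\; m h^{r} \bmod p).$$

*(iii) Decrypt*. This algorithm takes the ciphertext $(c_1, c_2)$ and the private key $x$ as inputs and computes

$$m = c_2 \cdot (c_1^{x})^{-1} \bmod p.$$

*(iv) Evaluate*. Given ciphertexts $E(m_1)$, $E(m_2)$, and $E(m)$ and a constant $k$, we can compute that

$$E(m_1) \cdot E(m_2) = E(m_1 m_2), \qquad E(m)^{k} = E(m^{k}),$$

where ciphertexts are multiplied and exponentiated componentwise.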

In this encryption scheme, if , then .
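The multiplicative homomorphism of the ElGamal scheme can be checked in the same way. This is a toy sketch under stated assumptions: the modulus $2^{31} - 1$ is far too small for real use, the base 7 is a primitive root modulo that prime, and the helper names `multiply` and `power` are hypothetical names for the two Evaluate operations.

```python
import secrets

P = 2_147_483_647  # the prime 2^31 - 1 (toy size, insecure)
G = 7              # a primitive root modulo P


def keygen():
    x = secrets.randbelow(P - 2) + 1            # private key in [1, P - 2]
    return (P, G, pow(G, x, P)), x


def encrypt(pk, m):
    p, g, h = pk
    r = secrets.randbelow(p - 2) + 1            # fresh randomness per ciphertext
    return (pow(g, r, p), (m * pow(h, r, p)) % p)


def decrypt(pk, x, c):
    p = pk[0]
    c1, c2 = c
    return (c2 * pow(pow(c1, x, p), -1, p)) % p


def multiply(pk, c, d):
    """Componentwise product: an encryption of the product of the plaintexts."""
    p = pk[0]
    return ((c[0] * d[0]) % p, (c[1] * d[1]) % p)


def power(pk, c, k):
    """Componentwise power: an encryption of the plaintext raised to k."""
    p = pk[0]
    return (pow(c[0], k, p), pow(c[1], k, p))


pk, sk = keygen()
product_plain = decrypt(pk, sk, multiply(pk, encrypt(pk, 6), encrypt(pk, 7)))
power_plain = decrypt(pk, sk, power(pk, encrypt(pk, 3), 4))
```

Here `product_plain` recovers 6 · 7 and `power_plain` recovers 3 to the power 4. Plaintexts are assumed to be nonzero residues modulo `P`.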

These two schemes are semantically secure under the CRC assumption and the CDH assumption, respectively. That is, given two messages $m_0$ and $m_1$, as well as a ciphertext of one of them produced by these encryption schemes, no probabilistic polynomial-time algorithm can determine whether the ciphertext is a ciphertext of $m_0$ or $m_1$ with nonnegligible advantage.

##### 2.3. Security of Secure Multiparty Computation

We assume that all parties are semihonest. In general, a semihonest party follows the prescribed protocol correctly, except that it keeps a record of all its intermediate computations and may try to derive the other party’s private inputs from the record. Goldreich [10] also designed a compiler that can force each party to either behave in a semihonest manner or be detected. Given a protocol that privately computes a function in the semihonest model, this compiler can produce a new protocol that privately computes the same function in the malicious model. This result shows that studying protocols in the semihonest model is worthwhile. Therefore, our work focuses on solutions to the subset problem in the semihonest model.

Different methods are used to prove security in different cryptographic fields. The proof method that reduces security to a hardness assumption in the standard model or the random oracle model is suitable for encryption schemes and signature schemes. The simulation paradigm is widely accepted and is used to prove the security of secure multiparty computation protocols. The basic idea behind the simulation paradigm is to compare a real secure multiparty computation protocol with an ideal one. The real protocol is considered secure if it does not leak more information than the ideal one. The ideal secure multiparty computation protocol can be described as follows.

Assume that there is an absolutely trusted third party, denoted by Trent, who will neither lie nor leak any information that should not be revealed. Alice has a number $x$, Bob has a number $y$, and they want to securely compute a function $f(x, y)$. They can proceed as follows: (a) Alice and Bob send $x$ and $y$, respectively, to Trent, (b) Trent computes the function $f(x, y)$, and (c) Trent tells Alice and Bob the result.
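The three steps of the ideal protocol can be written as a very small sketch. The function names and the sample amounts are assumptions for illustration, and the millionaires' comparison is used as the function only because it is the running example of this paper.

```python
def trent(f, x, y):
    """Trusted third party: receives both inputs and reveals only the result."""
    return f(x, y)


def richer(x, y):
    """Millionaires' function: reports who holds the larger amount."""
    if x == y:
        return "equal"
    return "Alice" if x > y else "Bob"


# (a) Alice and Bob hand their private inputs to Trent,
# (b) Trent evaluates the function, (c) both parties learn only this value.
result = trent(richer, 5_000_000, 7_000_000)
```

A real protocol is judged against this baseline: each party should learn nothing beyond what `result` and its own input already imply.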

Because most secure multiparty computation protocols are constructed using public key encryption schemes, the security proof for a secure multiparty computation protocol is to reduce the security of the protocol to the security of the public key encryption scheme on which the protocol is based. That is, to prove that a multiparty computation protocol is secure, we must prove that the real secure multiparty computation protocol does not leak more information than the ideal protocol with the assumption that the public key encryption scheme used in the real protocol is secure. In other words, the information that a party obtains in a real secure multiparty computation protocol can be simulated by a simulator that only obtains the result and one party’s input, and if the sets of information obtained from both methods are computationally indistinguishable, the real protocol is secure.

Intuitively, a protocol that computes a function is secure if whatever a set of semihonest parties can obtain after participating in the protocol could be obtained from the inputs and outputs of these same parties. In the simulation paradigm, this means that the VIEW (this will be discussed later) of a set of semihonest parties during a protocol execution can be simulated from their inputs and outputs.

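Suppose that there are two parties, Alice and Bob, who have sets $X$ and $Y$, respectively. They want to privately compute $f(X, Y)$, where $f$ is a polynomial-time function. Further, suppose that $\pi$ is a protocol computing $f$. The VIEW of Alice, who has the set $X$, during the execution of $\pi$ on the input $(X, Y)$, is denoted by $\mathrm{view}_1^{\pi}(X, Y) = (X, r^{1}, m_1^{1}, \ldots, m_t^{1})$, where $r^{1}$ is the result of Alice’s internal coin tosses, and $m_i^{1}$ is the $i$-th message that Alice received. The output of Alice after the execution of $\pi$ is denoted as $\mathrm{output}_1^{\pi}(X, Y)$, which is implicit in Alice’s VIEW. Similarly, Bob’s VIEW and output during the execution of $\pi$ are $\mathrm{view}_2^{\pi}(X, Y)$ and $\mathrm{output}_2^{\pi}(X, Y)$.

*Definition 1 (security in the semihonest model [10]).* For a function $f = (f_1, f_2)$, we say that $\pi$ privately computes $f$ if there exist two probabilistic polynomial-time simulators, denoted by $S_1$ and $S_2$, such that

$$\{(S_1(X, f_1(X, Y)), f_2(X, Y))\} \stackrel{c}{\equiv} \{(\mathrm{view}_1^{\pi}(X, Y), \mathrm{output}_2^{\pi}(X, Y))\},$$

$$\{(f_1(X, Y), S_2(Y, f_2(X, Y)))\} \stackrel{c}{\equiv} \{(\mathrm{output}_1^{\pi}(X, Y), \mathrm{view}_2^{\pi}(X, Y))\},$$

where $\stackrel{c}{\equiv}$ denotes *computational indistinguishability*.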

#### 3. Protocol for Sets Whose Elements Are Drawn from a Known Set

Suppose that Alice has a set $X$ and Bob has a set $Y$. A straightforward way to solve the subset problem between $X$ and $Y$, without worrying about privacy, is as follows: Alice sends her set $X$ to Bob; Bob checks whether $Y \subseteq X$ and then tells the result to Alice. Thus, Alice and Bob obtain the subset relation between $X$ and $Y$.

By the definition of subset, if $Y \subseteq X$, then $y \in X$ for any element $y \in Y$. Thus, we can reduce the subset problem to checking whether all the elements of set $Y$ are in set $X$. If all the elements of $Y$ are elements of $X$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$.

Suppose Alice and Bob have sets $X$, $Y$ ($X, Y \subseteq U$, where $U = \{u_1, \ldots, u_n\}$ is the known set), respectively. They want to determine whether or not $Y \subseteq X$ without disclosing either $X$ or $Y$.

##### 3.1. Foundations of This Protocol

Before we describe the idea of our protocol, we first present its building blocks, a 1-$r$ encoding method and a 1-0 encoding method, which are based on the mathematical definition of the characteristic vector.

*1-$r$ Encoding.* A 1-$r$ encoding is used to encode a set to a 1-$r$ vector, where every component is either 1 or $r$, where $r$ is a random number and $r \neq 1$. The principle for encoding a set $X \subseteq U = \{u_1, \ldots, u_n\}$ to a 1-$r$ vector $V = (v_1, \ldots, v_n)$ is as follows: if $u_i \in X$, then $v_i = 1$; otherwise, $v_i = r_i$, where $r_i \neq 1$ is a random number. This can also be described by the following pseudocode: For $i = 1$ to $n$: If $u_i \in X$, $v_i = 1$; Else $v_i = r_i$; End.

*1-0 Encoding.* This method is similar to the 1-$r$ encoding, but with a small difference. Encoding a set $Y \subseteq U$ to a 1-0 vector $W = (w_1, \ldots, w_n)$ is as follows: if $u_i \in Y$, then $w_i = 1$; otherwise, $w_i = 0$. This can also be described by the following pseudocode: For $i = 1$ to $n$: If $u_i \in Y$, $w_i = 1$; Else $w_i = 0$; End.

From a high-level perspective, the 1-$r$ encoding (1-0 encoding) encodes a $u_i \in X$ ($u_i \in Y$) with a one component and a $u_i \notin X$ ($u_i \notin Y$) with a random (zero) component. Alice and Bob can use the above encoding methods to compute the subset problem.
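The two encodings can be sketched directly from their definitions. In this minimal sketch the names `encode_1r`, `encode_10`, and `universe`, the sample itemsets, and the 32-bit range for the random components are assumptions made for illustration; random components are drawn as integers of at least 2 so that they can never equal 1.

```python
import secrets


def encode_1r(s, universe, bound=2**32):
    """1-r encoding: 1 where the element is in s, a random r != 1 elsewhere."""
    return [1 if u in s else secrets.randbelow(bound - 2) + 2 for u in universe]


def encode_10(s, universe):
    """1-0 encoding: 1 where the element is in s, 0 elsewhere."""
    return [1 if u in s else 0 for u in universe]


# Hypothetical known set and private sets (elements need not be integers).
universe = ["apple", "bread", "milk", "eggs", "tea"]
alice_set = {"apple", "milk", "tea"}
bob_set = {"milk", "tea"}

v = encode_1r(alice_set, universe)   # e.g. [1, r, 1, r', 1] with fresh randoms
w = encode_10(bob_set, universe)     # [0, 0, 1, 0, 1]
```

Because both vectors are indexed by the known set, the elements themselves never need to be integers; only their positions matter.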

Alice encodes set $X$ to a 1-$r$ vector $V$, and Bob encodes set $Y$ to a 1-0 vector $W$. Alice sends her vector $V$ to Bob. Bob chooses the components of $V$ corresponding to the one components of $W$ and computes their product $P = \prod_{i:\, w_i = 1} v_i$. If $P = 1$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. This is the principle for deciding the subset relation between sets $X$ and $Y$. We give a simple example in Table 1, where $V'$ is the vector that is chosen from vector $V$ according to the one components of $W$.
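The decision rule can be sketched in the clear as follows. This shows only the unencrypted principle, not the secure protocol: the function name and the sample vectors are hypothetical, and the random components are assumed to be plain integers of at least 2, so a product of 1 can occur only when every selected component is 1.

```python
from math import prod


def subset_from_vectors(v, w):
    """Multiply the components of v selected by the ones of w; 1 means subset."""
    chosen = [vi for vi, wi in zip(v, w) if wi == 1]
    return prod(chosen) == 1


# Alice's 1-r vector over a five-element known set (illustrative random values).
v = [1, 48213, 1, 90177, 1]

contained = subset_from_vectors(v, [0, 0, 1, 0, 1])      # selects [1, 1]
not_contained = subset_from_vectors(v, [0, 1, 1, 0, 0])  # selects [48213, 1]
```

An all-zero selector picks no components, and the empty product is 1, which agrees with the empty set being a subset of every set.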