Abstract

Privacy-preserving string equality test is a fundamental operation of many algorithms, including privacy-preserving authentication in the Internet of Things (IoT). Existing secure equality test schemes can theoretically achieve string equality comparison while preserving the privacy of the compared strings. However, they suffer from heavy computation and communication costs, especially when the strings are hundreds of bits or longer, which is not suitable for IoT applications. In this paper, we propose an approximate Fast privacy-preserving equality Test Protocol (FTP), which can securely complete the string equality test and achieve high running efficiency at the cost of a little accuracy loss. We strictly analyze the accuracy of our proposed scheme and formally prove its security. Additionally, we leverage extensive simulation experiments to evaluate the running cost, which confirms our high efficiency; for instance, our proposed FTP can securely compare two long binary strings within seconds on ordinary laptops.

1. Introduction

In recent years, with the growth of privacy concerns, privacy-preserving computation [1–4] has received increasing attention, since various privacy-preserving computation schemes can support computation on private data while keeping the privacy of the involved data. Sensitive data collection and analysis over encrypted data have become the current trend [5–12]. In this context, Privacy-preserving Equality Test (PET) aims at securely comparing two binary strings that are privately held by two parties. That is, by a PET scheme, two participants can securely work out whether their binary strings are exactly equal; meanwhile, each participant obtains no useful information about the private binary string of the other participant, even if the two strings are the same. PET is a significant basic building block of many privacy-preserving schemes, such as privacy-preserving authentication [13–15], secure comparison of biological characteristics [16–18], privacy-preserving machine learning [19–21], secure cost comparison in wireless networks [22, 23], privacy-preserving threshold schemes in recommendation systems [3], attribute comparison in attribute-based encryption [24–26], and secure query in the cloud [27, 28]. For example, Internet-of-Things (IoT) applications may authenticate users in a privacy-preserving manner. To complete the authentication, a user needs to submit his/her authentication credential to the IoT system, and the system decides whether the user is legal by comparing the user's authentication credential with the authentication information stored in the system database. Due to privacy concerns, the user cannot reveal his/her authentication credential to the system, and the latter can access it only in encrypted form. Meanwhile, to protect the privacy of the IoT system, no user can learn useful information about the database stored in the IoT system. This dilemmatic problem can be solved by employing a PET protocol.

Owing to its wide applications, several works have been devoted to PET recently. Nateghizad et al.'s scheme [29], denoted as NEL16, is the state-of-the-art approach to PET, which is also the most efficient PET method up to now. NEL16 can be viewed as an improved method of Lipmaa and Toft's PET scheme in [30], denoted as LT13. In LT13 [30], Lipmaa and Toft compute the Hamming distance of the two private binary strings in encrypted form. Then, they generate a Lagrange interpolating polynomial that outputs 1 if the input equals 0 and outputs 0 otherwise. Finally, the comparison result is figured out in encrypted form by securely evaluating the Lagrange interpolating polynomial with the encrypted Hamming distance as input. Compared to LT13, NEL16 further computes, in encrypted form, the number of "1"s in the binary representation of the Hamming distance and uses this count, instead of the Hamming distance itself, to evaluate the Lagrange interpolating polynomial. Suppose the binary representation of the Hamming distance has m bits. The number of "1"s cannot exceed m, which can itself be represented using just ⌈log₂(m + 1)⌉ bits; while m ≥ 3, it always has ⌈log₂(m + 1)⌉ < m. Thus, NEL16 requires a lower-degree Lagrange polynomial and can reduce running time. However, NEL16 still cannot achieve practical running efficiency, since computing the number of "1"s in encrypted form is also time-consuming. As shown in [29], when implemented on an ordinary Linux machine to compare two long binary strings, LT13 and NEL16 both cost tens of seconds. Therefore, existing PET schemes still suffer from low efficiency.

In this paper, we propose a new PET scheme, named Fast privacy-preserving equality Test Protocol (FTP), which achieves high efficiency at the cost of a small error rate. In FTP, we randomly convert the original binary strings into much shorter ones; the shorter binary strings are then securely compared to decide whether the original ones are the same, by which we can dramatically reduce both computation cost and communication overhead. Although FTP compares only the shorter strings, we can ensure that the comparison result is exactly correct if the original binary strings are the same or have an odd number of different bits, and the comparison result has a low false-positive rate when they have an even number of different bits. For data privacy, our proposed FTP achieves provable security, and no private information is disclosed throughout the protocol. In general, our main contributions in this paper can be summarized as follows:
(i) We propose a Fast privacy-preserving equality Test Protocol, named FTP, which achieves much higher running efficiency than the state-of-the-art PET schemes. FTP guarantees an exactly correct comparison result when the involved binary strings are the same or have an odd number of different bits and has a low false-positive rate if the compared strings have an even number of different bits.
(ii) We formally prove the security of FTP and guarantee that no privacy is disclosed throughout the proposed protocol.
(iii) We strictly analyze the accuracy loss of FTP and leverage extensive experiments to evaluate the running cost. The results indicate that FTP is highly accurate and can dramatically reduce the running cost.

The rest of this paper is organized as follows. In Section 2, we describe the preliminaries and the system model. In Section 3, we present our approximate fast privacy-preserving equality test in detail and theoretically analyze its accuracy loss. In Section 4, we formally prove the security of our scheme, evaluate its running efficiency, and compare our scheme with previous ones. In Section 5, we briefly review the related work. At last, we conclude this paper in Section 6.

2. System Model and Preliminaries

2.1. Paillier Encryption System

In [31], Paillier proposes a probabilistic public key encryption scheme with semantic security (Indistinguishability under Chosen-Plaintext Attack, IND-CPA). Its steps are concisely described as follows.

Key Generation. Select two large enough primes p and q, and let n = pq. The secret key is λ = lcm(p − 1, q − 1), i.e., the least common multiple of p − 1 and q − 1. The public key is (n, g), where g ∈ Z_{n²}* is such that gcd(L(g^λ mod n²), n) = 1; that is, the greatest common divisor of L(g^λ mod n²) and n equals 1. Here, L(u) = (u − 1)/n.

Encryption. Let m be a number in the plaintext space Z_n. Select a random r ∈ Z_n* as the secret parameter; then the ciphertext of m is c = E(m, r) = g^m · r^n mod n².

Decryption. Let c be a ciphertext. The plaintext hidden in c is m = L(c^λ mod n²) / L(g^λ mod n²) mod n.

In the Paillier encryption system, it obviously holds that E(m₁, r₁) · E(m₂, r₂) mod n² = E(m₁ + m₂ mod n, r₁r₂), where E(m, r) denotes the encrypted result of m using public key (n, g) and random secret parameter r. That is, the product of ciphertexts of m₁ and m₂ is a ciphertext of m₁ + m₂. Thus, the Paillier encryption scheme is additively homomorphic. Further, for any constant k, there is E(m, r)^k mod n² = E(km mod n, r^k); i.e., the k-th power of a ciphertext of m is a ciphertext of km.

Besides, the Paillier cryptosystem has the self-blinding property, as it is a probabilistic encryption. For any plaintext m, it has E(m, r₁) · r₂^n mod n² = E(m, r₁r₂) and D(E(m, r₁) · r₂^n mod n²) = m, in which D denotes the corresponding decryption function.

The Paillier encryption system is a significant secure basic tool of our scheme, which will be utilized to encrypt private data and support the necessary computation. For simplicity, we use [m] to denote a ciphertext of m under the Paillier cryptosystem when the random parameter does not need to be pointed out.
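To make the homomorphic properties above concrete, the following toy sketch implements the Paillier operations with small demonstration primes. The primes and the choice g = n + 1 are illustrative assumptions; a real deployment uses primes of thousands of bits via a big-number library.

```python
# Toy Paillier sketch for illustration only; the small primes are insecure.
from math import gcd
import random

p, q = 293, 433                               # demo primes (assumption)
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lambda = lcm(p-1, q-1)
g = n + 1                                     # a standard valid choice of g

def L(u):                                     # L(u) = (u - 1) / n
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)           # precomputed decryption factor

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Additive homomorphism: E(m1) * E(m2) is a ciphertext of m1 + m2 (mod n).
assert decrypt((encrypt(37) * encrypt(5)) % n2) == 42
# Scalar homomorphism: E(m)^k is a ciphertext of k * m.
assert decrypt(pow(encrypt(7), 6, n2)) == 42
```

The assertions exercise exactly the two properties used later: ciphertext multiplication adds plaintexts, and ciphertext exponentiation multiplies the plaintext by a constant.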

2.2. System Model

In this paper, we consider privacy-preserving user authentication in IoT. A user (named Bob) submits an ℓ-bit authentication credential to the system (named Alice), and the system decides whether the user is legal by comparing Bob's authentication credential with the authentication information stored in the system database. Due to privacy concerns, Bob cannot reveal the authentication credential or the authentication result to Alice, and Alice just obtains them in encrypted form. Meanwhile, to protect the privacy of Alice, Bob cannot learn any information of Alice's database. This dilemmatic problem can be seen as a privacy-preserving equality test (PET) problem as follows.

Privacy-Preserving Equality Test (PET) Problem. PET involves two parties: Alice and Bob. Alice privately holds an ℓ-bit binary string a = a₁a₂…a_ℓ, and Bob privately holds an ℓ-bit binary string b = b₁b₂…b_ℓ. Here, a and b can also be considered as two integers that belong to [0, 2^ℓ − 1]. Besides, Bob has a key pair (pk, sk) of the Paillier encryption system, where pk is the public key and sk is the secret key. They want to securely compare a and b such that only Alice obtains the comparison result in encrypted form; i.e., Alice gains [t], in which t = 1 if a = b and t = 0 otherwise. Additionally, a should be privately kept by Alice throughout the protocol, and Bob's private string b cannot be disclosed to Alice or anybody else. Neither Alice nor Bob can learn the real value of t.

2.3. Security Model

In this paper, we assume that the participants Alice and Bob are semihonest. It means that each participant follows the protocol correctly but records all the received information in the protocol to infer as much information about the private data of the other participant as possible. In [32], Goldreich gives a formal definition of security against semihonest adversaries, which can be described as follows.

Definition 1 (privacy under the semi-honest model [32]). Let f = (f₁, f₂) be a functionality, where f₁(x, y) (resp., f₂(x, y)) denotes the first (resp., second) element of f(x, y). Let π be a two-party protocol for computing f such that the first (resp., second) party obtains f₁(x, y) (resp., f₂(x, y)). The view of the first (resp., second) party during an execution of π on (x, y), denoted as view₁^π(x, y) (resp., view₂^π(x, y)), is (x, r¹, m₁¹, …, m_k¹) (resp., (y, r², m₁², …, m_k²)), where x (resp., y) represents the input of the first (resp., second) party, r¹ (resp., r²) represents its random tape, and m_i¹ (resp., m_i²) represents the i-th message it has received. We say that protocol π privately computes f, i.e., π is secure against semihonest adversaries, if there exist probabilistic polynomial-time algorithms S₁ and S₂ such that {S₁(x, f₁(x, y))} ≡_c {view₁^π(x, y)} and {S₂(y, f₂(x, y))} ≡_c {view₂^π(x, y)}, where ≡_c represents computational indistinguishability.

2.4. Design Goal

For the PET problem shown in Section 2.2, we aim at proposing a new solution that achieves the following security and performance goals.
(i) High Accuracy. The protocol should arrive at a correct output with high probability while both participants exactly follow the protocol steps. That is, the solution should be of high accuracy to output a correct comparison result.
(ii) Input Privacy. Throughout the protocol, each bit of the private inputs a and b should be known to its owner only. That is, no useful information about a can be disclosed to Bob, and b cannot be revealed to Alice.
(iii) Result Privacy. Neither user can get the value of the result t in plaintext, and only Alice can obtain the encrypted output [t], which is encrypted by Bob's public key.
(iv) Efficiency. The protocol needs to employ a sublinear number of public key encryptions and decryptions such that it can achieve high running efficiency even when a and b are hundreds of bits long.

2.5. Review of LT13 Scheme

In the following, we briefly introduce the previous PET scheme LT13 [30].

Generally, LT13 consists of two stages. The first stage computes the encrypted Hamming distance [d] between a and b such that only Alice learns [d]. During this stage, Bob uses the public key to encrypt each of his private bits b_i for i = 1 to ℓ and sends each [b_i] to Alice. Then, based on the additively homomorphic property of the Paillier encryption scheme, Alice can obtain the encrypted Hamming distance [d] = ∏_{i=1}^{ℓ} c_i, where c_i = [b_i] if a_i = 0 and c_i = [1 − b_i] = [1] · [b_i]^{−1} if a_i = 1.

The second stage computes the final result [t], which is also known to Alice only. To this end, they first select an ℓ-degree public Lagrange interpolation polynomial P that satisfies P(0) = 1 and P(j) = 0 for j = 1, …, ℓ.

Namely, we can correctly attain the output by setting t = P(d), since d ∈ {0, 1, …, ℓ}. Second, Alice sets [z] = [d + s], i.e., [z] = [d] · [s], where s is randomly selected from Z_n and n is the large integer in the public key. After that, [z] is sent to Bob, who decrypts z, encrypts the powers z^j, and returns the ciphertexts [z^j] to Alice for j = 1 to ℓ. Finally, Alice can homomorphically recover the encrypted powers [d^j] via the binomial expansion of d^j = (z − s)^j and gain [P(d)], which is exactly [t] because t = P(d).
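The polynomial trick can be checked in the clear with a small sketch. LT13 evaluates P homomorphically on ciphertexts; here everything is plaintext, and the prime field modulus is an illustrative stand-in for the Paillier modulus n.

```python
# Plaintext sketch of the LT13 Lagrange-polynomial trick: build P with
# P(0) = 1 and P(j) = 0 for j = 1..ell, then P(hamming distance) is the
# equality bit. Evaluation is done directly in the Lagrange basis.
PRIME = 2**61 - 1   # demo field modulus (assumption; LT13 works mod n)

def eval_equality_poly(d, ell):
    # P(x) = prod_{j=1..ell} (x - j) / (0 - j), evaluated at x = d
    num, den = 1, 1
    for j in range(1, ell + 1):
        num = num * (d - j) % PRIME
        den = den * (0 - j) % PRIME
    return num * pow(den, -1, PRIME) % PRIME

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

a = [1, 0, 1, 1, 0, 0, 1, 0]
b = list(a)
assert eval_equality_poly(hamming(a, b), len(a)) == 1   # equal strings -> 1
b[3] ^= 1
assert eval_equality_poly(hamming(a, b), len(a)) == 0   # any difference -> 0
```

Any input d in {1, …, ℓ} hits a zero factor of the product, so the output collapses to 0 exactly when the strings differ.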

As can be seen, for a larger ℓ, LT13 needs more computation and communication cost. When ℓ reaches hundreds of bits, LT13 takes tens of seconds [29], which is far away from being practical. In this paper, we introduce a new PET scheme that reduces the number of invocations of the Paillier encryption system and thus dramatically lessens the running cost at the expense of a small accuracy loss.

3. Privacy-Preserving Equality Test

Assume r = (r₁, r₂, …, r_ℓ) is a uniform random vector from {1, −1}^ℓ. For two binary strings a = a₁a₂…a_ℓ and b = b₁b₂…b_ℓ, if setting S_A = Σ_{i=1}^{ℓ} r_i(−1)^{a_i} and S_B = Σ_{i=1}^{ℓ} r_i(−1)^{b_i}, we have the following observations. Here, we use d to denote the number of different bits of a and b. It is easy to see that d is exactly the Hamming distance between a and b.

Observation 1. If a = b, then S_A always equals S_B.

Proof. While a = b, each r_i(−1)^{a_i} equals r_i(−1)^{b_i}; thus S_A = S_B.

Observation 2. If a ≠ b and d is odd, then it must be S_A ≠ S_B.

Proof. Without loss of generality, we assume the first d bits of a and b are different from each other. That is, a_i ≠ b_i for i = 1 to d, and a_i = b_i for i = d + 1 to ℓ.
In this case, (−1)^{a_i} − (−1)^{b_i} = ±2 for i = 1 to d and (−1)^{a_i} − (−1)^{b_i} = 0 for i = d + 1 to ℓ. Then, S_A − S_B = Σ_{i=1}^{d} r_i((−1)^{a_i} − (−1)^{b_i}), in which each r_i((−1)^{a_i} − (−1)^{b_i}) ∈ {−2, 2}. Since d is odd, the number of 2's is impossibly equal to that of −2's.
Therefore, it must be S_A − S_B ≠ 0, i.e., S_A ≠ S_B, which completes the proof.

Observation 3. If a ≠ b and d is even, then S_A = S_B with the probability C(d, d/2)/2^d, and correspondingly S_A ≠ S_B with the probability 1 − C(d, d/2)/2^d.

Proof. Without loss of generality, we assume that a_i ≠ b_i for i = 1 to d and a_i = b_i for i = d + 1 to ℓ. Then, r_i((−1)^{a_i} − (−1)^{b_i}) ∈ {−2, 2} for i = 1 to d and r_i((−1)^{a_i} − (−1)^{b_i}) = 0 for i = d + 1 to ℓ. Further, we have S_A − S_B = Σ_{i=1}^{d} r_i((−1)^{a_i} − (−1)^{b_i}).
Let n₊, n₋ denote the number of 2's, −2's, respectively, for i = 1 to d. Then, n₊ + n₋ = d. Hence, S_A = S_B iff n₊ = n₋ = d/2. As r is uniformly randomly selected from {1, −1}^ℓ, the probability of n₊ = d/2 is C(d, d/2)/2^d, where C(d, d/2) denotes d choose d/2. Then, the probability of S_A = S_B is C(d, d/2)/2^d. Accordingly, the probability of S_A ≠ S_B is 1 − C(d, d/2)/2^d. It completes the proof.

Observations 1, 2, and 3 show that we can approximately determine whether a = b by comparing S_A and S_B. Besides, we have −ℓ ≤ S_A ≤ ℓ and −ℓ ≤ S_B ≤ ℓ, since each r_i ∈ {1, −1} and each (−1)^{a_i}, (−1)^{b_i} ∈ {1, −1}. Then, we can get an approximate scheme for securely comparing a and b with high efficiency as follows.

Basic Approach. Alice selects ℓ numbers r₁, …, r_ℓ uniformly at random from {1, −1} and computes S_A = Σ_{i=1}^{ℓ} r_i(−1)^{a_i}. Then, Alice sets an ℓ-bit binary vector u where u_i = 1 if r_i = 1 and u_i = 0 otherwise and sends u to Bob. On receiving u, Bob can recover r and locally compute S_B = Σ_{i=1}^{ℓ} r_i(−1)^{b_i}. Finally, Alice and Bob utilize LT13 [30] to securely compare the private numbers S_A and S_B such that Alice gains [t]; i.e., Alice obtains [1] if S_A = S_B and [0] otherwise.

As each r_i(−1)^{a_i} ∈ {1, −1}, it has −ℓ ≤ S_A ≤ ℓ and −ℓ ≤ S_B ≤ ℓ. For the security, Bob cannot learn any information about a from u, because u is determined only by r, which is uniformly randomly selected from {1, −1}^ℓ. Since S_A − S_B = Σ_{i: a_i ≠ b_i} r_i((−1)^{a_i} − (−1)^{b_i}), S_A = S_B always holds when a = b and holds only with small probability otherwise. That is, the basic scheme substantially determines a = b by checking S_A = S_B. We will analyze the accuracy of the basic approach in Theorem 2.

Due to −ℓ ≤ S_A, S_B ≤ ℓ, both S_A and S_B can be represented by using ⌈log₂(ℓ + 1)⌉ + 1 bits, in which ⌈log₂(ℓ + 1)⌉ bits represent the absolute value, and one bit is used to denote the sign. While ℓ ≥ 5, it always is ⌈log₂(ℓ + 1)⌉ + 1 < ℓ, and the gap widens rapidly; for example, when ℓ = 1024, only 12 bits suffice. Therefore, our basic scheme can dramatically reduce the running cost.

Theorem 2. For Alice and Bob's binary strings a and b, when a ≠ b, let the probability of d = j be q_j for j = 1, …, ℓ, where q_j ≥ 0 and Σ_{j=1}^{ℓ} q_j = 1. Namely, q_j denotes the conditional probability Pr[d = j | a ≠ b]. For simplicity, suppose each q_j is identical, i.e., each q_j = 1/ℓ. Then, for the basic scheme, we have
(1) if a = b, the basic approach always arrives at a correct result, i.e., Alice always gains [1];
(2) if a ≠ b, the basic approach returns a false result (i.e., Alice gains [1]) with the conditional probability P_e on average, and correspondingly the basic scheme returns a correct result (i.e., Alice gains [0]) with the probability 1 − P_e. Besides, it has P_e = (1/ℓ) · Σ_{even j, 2 ≤ j ≤ ℓ} C(j, j/2)/2^j. To simplify, we use p(ℓ) to denote the probability P_e, i.e., p(ℓ) = (1/ℓ) · Σ_{even j, 2 ≤ j ≤ ℓ} C(j, j/2)/2^j.

Proof. If a = b, it always has S_A = S_B according to Observation 1. Thus, Alice will gain [1]; i.e., the basic scheme will get an exactly correct result.
If a ≠ b, the correct result is t = 0. Thus, the basic approach will correctly complete the comparison only when S_A ≠ S_B. Hence, in this situation, the probability that the basic scheme returns a false result equals the conditional probability Pr[S_A = S_B | a ≠ b]. Since d may be 1 to ℓ while a ≠ b, then P_e = Σ_{j=1}^{ℓ} q_j · Pr[S_A = S_B | d = j]. On account of Observation 2, if j is odd, it always has S_A ≠ S_B, i.e., Pr[S_A = S_B | d = j] = 0 for each odd j. Besides, each q_j = 1/ℓ; therefore, P_e = (1/ℓ) · Σ_{even j, 2 ≤ j ≤ ℓ} Pr[S_A = S_B | d = j]. Observation 3 has shown Pr[S_A = S_B | d = j] = C(j, j/2)/2^j for even j. As a result, P_e = (1/ℓ) · Σ_{even j, 2 ≤ j ≤ ℓ} C(j, j/2)/2^j, which completes the proof.

Theorem 3. Let s(ℓ) = Σ_{even j, 2 ≤ j ≤ ℓ} C(j, j/2)/2^j, so that p(ℓ) = s(ℓ)/ℓ, where ℓ ≥ 2 is an integer. We have
(1) for any ℓ ≥ 2, p(ℓ) ≤ 1/4,
(2) if ℓ is even, then p(ℓ + 1) < p(ℓ) and p(ℓ + 2) < p(ℓ),
(3) if ℓ is odd, then p(ℓ) < p(ℓ − 1).

Proof. For even j, write t_j = C(j, j/2)/2^j, so that s(ℓ) = Σ t_j. It holds that t_2 = 1/2 and t_{j+2} = t_j · (j + 1)/(j + 2) < t_j; thus the terms t_j are strictly decreasing and each t_j ≤ 1/2.
Since s(ℓ) contains ⌊ℓ/2⌋ terms, each at most 1/2, we have s(ℓ) ≤ ⌊ℓ/2⌋/2 ≤ ℓ/4; hence p(ℓ) = s(ℓ)/ℓ ≤ 1/4, and (1) holds.
If ℓ is even, then s(ℓ + 1) = s(ℓ), so p(ℓ + 1) = s(ℓ)/(ℓ + 1) < s(ℓ)/ℓ = p(ℓ). Moreover, as the terms are strictly decreasing, t_{ℓ+2} < t_j for every even j ≤ ℓ; summing over the ℓ/2 terms of s(ℓ) gives s(ℓ) > (ℓ/2) · t_{ℓ+2}, i.e., t_{ℓ+2} < 2s(ℓ)/ℓ. Therefore, p(ℓ + 2) = (s(ℓ) + t_{ℓ+2})/(ℓ + 2) < (s(ℓ) + 2s(ℓ)/ℓ)/(ℓ + 2) = s(ℓ)/ℓ = p(ℓ). As a result, both claims of (2) are proved.
If ℓ is odd, then s(ℓ) = s(ℓ − 1), so p(ℓ) = s(ℓ − 1)/ℓ < s(ℓ − 1)/(ℓ − 1) = p(ℓ − 1). Consequently, (3) is correct. It completes the proof.

As we can see, p(ℓ) never exceeds 1/4, and Theorem 3 shows that p(ℓ) has a downward trend as ℓ increases. For short strings, however, the error probability may still be too high for real applications. We can reduce the probability by generating multiple pairs S_A and S_B with different random vectors r, which can exponentially reduce the error probability. For example, if we use double S_A and S_B, then the error probability will be p(ℓ)²; even in the worst case ℓ = 2, the error probability will be only 1/16. Figure 1 shows the error probability as ℓ grows, indicating that the error probability of our scheme becomes very small once the bit length ℓ is large.
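Under the uniform-distance assumption of Theorem 2, the averaged false-positive rate and the effect of doubling the random vectors can be computed directly (a sketch for illustration, not the paper's implementation):

```python
# p(ell): average of C(j, j/2)/2^j over d = 1..ell (odd d contributes 0),
# i.e., the false-positive rate of a single run under Theorem 2's
# uniform-distance assumption; k independent runs multiply the errors.
from math import comb

def p(ell):
    return sum(comb(j, j // 2) / 2**j for j in range(2, ell + 1, 2)) / ell

assert p(2) == 0.25                    # worst case: only d = 2 is possible
assert p(4) == (0.5 + 0.375) / 4
assert p(2) ** 2 == 0.0625             # doubling the vectors squares the error
assert all(p(l) < 0.25 for l in range(3, 200))   # Theorem 3's bound
```

The last assertion reflects Theorem 3: the averaged error stays below 1/4 and trends downward as ℓ grows, and running two independent instances drives it down quadratically.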

In general, the details of our scheme with double S_A and S_B are formally shown in Protocol 1. First, Alice randomly generates r and r′ and shares the corresponding binary vectors u and u′ with Bob. Second, Alice locally computes S_A and S′_A, and Bob gains S_B and S′_B. They decide a = b iff S_A = S_B and S′_A = S′_B. Third, they represent (S_A, S′_A) and (S_B, S′_B) by short bit strings x and y, respectively. Finally, Alice and Bob utilize the similar methods of LT13 to securely compare x and y such that Alice gains [t].

Input: Alice privately holds an ℓ-bit binary string a = a₁a₂…a_ℓ, and Bob holds a private ℓ-bit binary string b = b₁b₂…b_ℓ
and the key pair (pk, sk) of the Paillier encryption scheme, where pk and sk are the public key and private key, respectively.
Output: Alice obtains [t] where t = 1 if a = b and t = 0 otherwise, and Bob learns nothing.
1: Alice generates two random ℓ-dimension vectors r, r′ ∈ {1, −1}^ℓ. For each i, let r_i and r′_i denote the i-th dimension of r
and r′, respectively.
2: Alice computes two ℓ-dimension binary vectors u, u′ such that their i-th bits u_i and u′_i satisfy u_i = 1 if r_i = 1,
otherwise u_i = 0; u′_i = 1 if r′_i = 1, otherwise u′_i = 0. Alice sends u and u′ to Bob.
3: Alice computes S_A = Σ_{i=1}^{ℓ} r_i(−1)^{a_i} and S′_A = Σ_{i=1}^{ℓ} r′_i(−1)^{a_i}. Then, Alice utilizes m = ⌈log₂(ℓ + 1)⌉ + 1 bits to represent S_A, S′_A, respectively. The first bit
denotes the sign, where 0 denotes positive and 1 denotes negative, and the latter m − 1 bits represent their absolute value.
We use x = x₁x₂…x_{2m} to denote the bits of S_A and S′_A, in which the first m
bits correspond to S_A, and the latter m bits correspond to S′_A.
4: Bob recovers r and r′ from u and u′ and computes S_B = Σ_{i=1}^{ℓ} r_i(−1)^{b_i} and S′_B = Σ_{i=1}^{ℓ} r′_i(−1)^{b_i}. Then, Bob utilizes m bits to represent S_B, S′_B, respectively,
with the same sign convention. Similarly, y = y₁y₂…y_{2m} is used to denote the bits of S_B and S′_B, in which the first m
bits correspond to S_B, and the latter m bits correspond to S′_B.
5: For i = 1 to 2m, Bob uses his public key to encrypt each private bit y_i and sends [y_i] to Alice.
6: Alice computes the encrypted Hamming distance [d] = ∏_{i=1}^{2m} c_i, where c_i = [y_i] if x_i = 0 and
c_i = [1 − y_i] = [1] · [y_i]^{−1} if x_i = 1.
7: Alice and Bob select a 2m-degree public Lagrange interpolation polynomial P(·) over Z_n, in which n is the large integer
in the public key, such that P satisfies P(0) = 1 and P(j) = 0 for j = 1, …, 2m. Namely, we can correctly attain the output by
setting t = P(d), since d ∈ {0, 1, …, 2m} and d = 0 iff x = y.
8: Alice sets [z] = [d + s], i.e., [z] = [d] · [s], where s is randomly selected from Z_n. After
that, Alice sends [z] to Bob.
9: Bob decrypts [z], encrypts the powers z^j, and returns the ciphertexts [z^j] to Alice for j = 1 to 2m.
10: Alice homomorphically recovers the encrypted powers [d^j] from [z^1], …, [z^j] and s via the binomial expansion of d^j = (z − s)^j, and further gains the final output [t] = [P(d)].

4. Analysis and Evaluation

4.1. Security

We prove the security of our proposed scheme FTP through the following Theorem 4.

Theorem 4. Our proposed scheme FTP discloses nothing useful about the privacy of input values and the final result.

Proof. We will discuss the view of Alice and Bob, respectively.
In our scheme FTP, Alice receives the ciphertexts [y_i] for i = 1, …, 2m and [z^j] for j = 1, …, 2m. Based on the IND-CPA security of the Paillier encryption system [31], Alice can learn nothing useful about y_i and z^j from these ciphertexts. Thus, Bob's private data can be securely preserved.
Throughout FTP, Bob learns just u, u′, and z. For each bit u_i in u, it has u_i = 1 if r_i = 1; otherwise u_i = 0. That is, u is determined by the random vector r alone and is independent of Alice's string a; hence Bob can learn nothing about a from u. Similarly, it is provable that u′ discloses nothing about a. For z, based on the additively homomorphic property, we have [z] = [d] · [s], i.e., z = d + s mod n. As s is uniformly randomly selected from Z_n and is unknown to Bob, the conditional probability satisfies Pr[z = w | d] = Pr[z = w] for any w, which means Bob can infer no information about d from z. In general, u, u′, and z reveal nothing of Alice's private data.
To sum up, the privacy of Alice and Bob both can be preserved in our scheme FTP, which completes the proof.

4.2. Computation and Communication Cost

In this section, we will analyze the computation complexity and communication overheads of our proposed FTP in detail.

Computation Complexity. Since simple additions and multiplications are much cheaper than the encryption, decryption, and ciphertext multiplication of the Paillier cryptosystem, we ignore the simple additions and multiplications in the protocol. Throughout FTP, Bob encrypts each bit y_i and each power z^j for i, j = 1 to 2m and performs one decryption to gain z. Alice uses the [y_i] and [z^j] to compute [d] and [t], which requires O(m²) ciphertext multiplications. In total, both Bob and Alice employ the Paillier encryption system only O(m) = O(log ℓ) times, which is sublinear in ℓ.

Communication Overheads. In our scheme FTP, Alice and Bob need to transmit u, u′, [z], and the ciphertexts [y_i], [z^j] for i, j = 1 to 2m. If each ciphertext is K bits, then the total communication overhead is 2ℓ + (4m + 1)K bits; since m = ⌈log₂(ℓ + 1)⌉ + 1, the ciphertext part grows only logarithmically with ℓ.

4.3. Experiment Results

We implement our scheme and the two existing efficient algorithms, LT13 and NEL16, in the C language. During the execution of our scheme, we utilize the GMP library [33] and the Paillier library [34] with a standard key size. All experiments are performed on an Apple computer with macOS Sierra 10.12.6, an Intel Core i5 1.6 GHz CPU, and 4 GB memory. Alice and Bob communicate through a socket connection.

Figure 2 shows the runtime of LT13, NEL16, and our scheme FTP as the length ℓ of the compared strings grows. As can be seen, FTP dramatically reduces the running time compared to LT13 and NEL16: for long strings, LT13 and NEL16 cost tens of seconds, while FTP needs only a few seconds, and the larger the length, the more salient the advantage of FTP. The main reason is that we transform the original a, b into S_A, S′_A and S_B, S′_B, which are much shorter than the original strings. More importantly, our transformation involves only simple additions and multiplications and can be completed rapidly. In FTP, the Paillier encryption system is employed only to securely compare S_A, S′_A with S_B, S′_B. Therefore, FTP reduces the running cost, especially when ℓ is large. If the bit length is small, FTP has no significant advantage in running time, and LT13 or NEL16 is suitable for the short-string equality comparison scenario.

4.4. Improvement

Though our scheme FTP can reduce the cost, it still takes 2ℓ bits to transmit the vectors u and u′. We can further improve the scheme to avoid transmitting u and u′. Let F be a pseudorandom function. Alice and Bob, in advance, select a constant c. When they decide to compare their private binary strings, they can separately generate a random binary string v = F(c ∥ τ), where τ denotes the time at which they decide to implement the protocol and ∥ denotes concatenation. Then, they set r_i = (−1)^{v_i}, in which v_i denotes the i-th bit of v. Since both parties can locally derive v, and hence r, Alice and Bob can compute S_A and S_B, respectively, and Alice need not send the vector u at all. For r′, they can preestablish another constant c′ and avoid transmitting u′ by a similar method.
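A minimal sketch of this improvement, with HMAC-SHA256 standing in for the pseudorandom function F; the key, the constant c, and the timestamp encoding are illustrative assumptions, not fixed by the paper:

```python
# Both parties expand the agreed constant and timestamp into the +-1 vector
# r with a PRF, so the vector never has to be transmitted.
import hmac, hashlib

def derive_r(key: bytes, c: bytes, t: int, ell: int):
    """Derive r in {1, -1}^ell from F_key(c || t || counter)."""
    bits = []
    counter = 0
    while len(bits) < ell:
        msg = c + t.to_bytes(8, "big") + counter.to_bytes(4, "big")
        block = hmac.new(key, msg, hashlib.sha256).digest()
        for byte in block:
            bits.extend((byte >> i) & 1 for i in range(8))
        counter += 1
    # map each pseudorandom bit to a sign (the mapping is a convention)
    return [1 if bit else -1 for bit in bits[:ell]]

key, c, t = b"shared-secret", b"FTP-r", 1700000000   # illustrative values
r_alice = derive_r(key, c, t, ell=512)
r_bob = derive_r(key, c, t, ell=512)
assert r_alice == r_bob            # both parties derive the same vector
assert set(r_alice) <= {1, -1}
```

Because the derivation is deterministic in (c, τ), Alice and Bob obtain identical vectors without any transmission, while a fresh τ per run keeps r unpredictable in advance.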

5. Related Work

Privacy-preserving string equality test is one of the secure multiparty computation (SMC) problems, and it has wide applications in various privacy-preserving scenarios [35–38]. Up to now, a large number of works can be utilized to achieve privacy-preserving string equality test. We briefly discuss the previous schemes as follows.

In 1982, Yao [39] proposed the first SMC problem, the Millionaires' problem, and gave a secure solution. After that, the garbled circuits method [32, 40] was put forward to securely evaluate a general function. Nevertheless, the general approach is too expensive and can only theoretically solve the problem. The scalar product protocol (also known as the dot product protocol) focuses on computing the scalar product of two private vectors with privacy preservation. Privacy-preserving string equality test can be achieved by invoking a scalar product protocol; we thus review the main solutions of the scalar product protocol. In [41], Vaidya et al. proposed a scalar product protocol based on algebraic transformation. By using homomorphic encryption, two solutions for securely computing the dot product of private vectors are given in [42] and [43], respectively. A polynomial secret sharing-based scalar product protocol is presented by Shaneck and Kim [44]. Nevertheless, these schemes either are not provably secure or have heavy computation and communication overheads. Recently, Zhu et al. proposed two efficient solutions for the secure scalar product protocol [45, 46], which can be utilized to securely compute the Hamming distance of two private strings but cannot support the distance comparison. Cheng et al. [47] review the approaches to securing the Internet of Things in a quantum world. In [48], Li et al. leverage Paillier encryption to achieve a secure comparison protocol, based on which they also propose a secure SVM classification scheme. Nevertheless, the comparison scheme in [48] focuses on securely figuring out the bigger of two private integers but cannot directly support the equality comparison problem investigated in this paper.

In [30], Lipmaa and Toft propose a secure string equality test scheme based on the Paillier encryption scheme [31]. When comparing ℓ-bit strings, Lipmaa and Toft's scheme requires O(ℓ) invocations of the Paillier encryption system and is thus time-consuming. Nateghizad et al. [29] improve Lipmaa and Toft's scheme by reducing the degree of the Lagrange interpolation polynomial. As yet, the number of Paillier encryption invocations in Nateghizad et al.'s solution is also linear in ℓ, which is not suitable for a large ℓ either. In general, the existing privacy-preserving string equality test schemes are still far away from being practical.

6. Conclusions

In this paper, we considered efficient and privacy-preserving authentication in IoT applications. To this end, we proposed a new privacy-preserving equality test protocol, FTP, which can securely complete the string equality test and achieve high running efficiency at the cost of a little accuracy loss. We strictly analyzed the accuracy of our proposed scheme and formally proved its security. Additionally, we leveraged extensive simulation experiments to evaluate the running cost, which confirms its high efficiency.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is partly supported by the National Key Research and Development Program of China (no. 2017YFB0802300), the Natural Science Foundation of China (no. 61602240), the Natural Science Foundation of Jiangsu Province of China (no. BK20150760), Research Fund of Guangxi Key Laboratory of Cryptography and Information Security (no. GCIS201723), and Postgraduate Research & Practice Innovation Program of Jiangsu Province (no. KYCX18_0305).