Security and Communication Networks

Security and Communication Networks / 2020 / Article

Research Article | Open Access

Volume 2020 |Article ID 4938736 | https://doi.org/10.1155/2020/4938736

Ying Zou, Yanting Chai, Sha Shi, Lei Wang, Yunfeng Peng, Yuan Ping, Baocang Wang, "Improved Cloud-Assisted Privacy-Preserving Profile-Matching Scheme in Mobile Social Networks", Security and Communication Networks, vol. 2020, Article ID 4938736, 12 pages, 2020. https://doi.org/10.1155/2020/4938736

Improved Cloud-Assisted Privacy-Preserving Profile-Matching Scheme in Mobile Social Networks

Academic Editor: Emanuele Maiorana
Received24 Sep 2019
Revised03 May 2020
Accepted01 Aug 2020
Published21 Sep 2020

Abstract

Due to the transparency of the wireless channel, users in multiple-key environment are vulnerable to eavesdropping during the process of uploading personal data and re-encryption keys. Besides, there is additional burden of key management arising from multiple keys of users. In addition, profile matching using inner product between vectors cannot effectively filter out users with ulterior motives. To tackle the above challenges, we first improve a homomorphic re-encryption system (HRES) to support a single homomorphic multiplication and arbitrarily many homomorphic additions. The public key negotiated by the clouds is used to encrypt the users’ data, thereby avoiding the issues of key leakage and key management, and the privacy of users’ data is also protected. Furthermore, our scheme utilizes the homomorphic multiplication property of the improved HRES algorithm to compute the cosine result between the normalized vectors as the standard for measuring the users’ proximity. Thus, we can effectively improve the social experience of users.

1. Introduction

With the rapid development of Internet technology, mobile devices such as mobile phones and tablets have gradually become popular in people’s daily life in recent years. Some social networks such as WeChat, Twitter, and Facebook are gradually integrated into people’s life, and people would like to share some opinions, pictures, and videos with others. Therefore, people are more willing to find potential friends with similar interests in mobile social networks.

Profile-matching is the most effective way to measure the proximity between users’ personal profiles. The user’s personal profile is often defined as a vector in practical applications, and each dimension of the vector represents an attribute corresponding to a hobby, such as football, photography, and religion. Each attribute value is represented by an integer between 0 and 10 or a larger range. The attribute value indicates the degree of the interest. The value 0 means that the user has no interest in the item, and value 10 represents that the user is particularly fond of it. Moreover, social proximity is often defined as the inner product of two users’ vectors [13]. Since the inner product is the sum of the products of the corresponding attributes between the two vectors, it cannot accurately show the proximity between users. For example, the vector of user is , user is , and user is . User wants to find a person with a higher proximity between user and user . If the inner product of two vectors is chosen as a measurement, it is surprising that user meets higher requirements than user . To get more accurate social proximity results, it is preferred to adopt cosine proximity result between the two normalized vectors as the standard.

In addition, users’ profiles often contain sensitive data and they do not want to expose their personal information. One way to protect the users’ privacy is to encrypt the data by cryptographic technologies, but the structure of raw data will be essentially damaged after encryption, causing difficulties in reprocessing the data. The homomorphic encryption technology [47] has its unique advantages in encrypted data processing. In particular, partial homomorphic encryption technique [8] is more suitable for many realistic applications [9].

In many existing profile-matching works [1, 10, 11], a large amount of interactions are often required between user mobile terminals to obtain matching results, which will bring heavy computational costs and communication overhead to users. In addition, users also need to stay online during the matching process. Fortunately, with the development of cloud computing technology [12, 13], the cloud platform can provide users with huge storage space and abundant computing resources. Outsourcing computing and storage to the cloud can effectively reduce the burden on the users’ mobile terminals. Gao et al. [2] transfer users’ work to the cloud with the help of two cooperative but noncollusive clouds, and users can go offline after uploading their profiles to the cloud. Gao et al.’s scheme utilizes an ElGamal-like proxy re-encryption [14] algorithm with additive homomorphic property, which leads to the issues of key management and the leakage of re-encryption keys. Furthermore, the ElGamal-like algorithm requires that the size of the plaintext cannot exceed 40 bits in order to ensure the decryption efficiency; thus it cannot be applied to the scenarios requiring high data accuracy.

The main contributions of our work are as follows:(i)We improve the HRES algorithm [15] in order to avoid the drawbacks in [2]. The improved algorithm supports one homomorphic multiplication and arbitrarily many homomorphic additions, which can effectively avoid the key leakage and the key management issues caused during users’ uploading re-encryption keys.(ii)This paper utilizes the homomorphic multiplication property of the improved HRES algorithm to compute the cosine result between the normalized vectors as the standard for measuring proximity. Our proposal can effectively ensure the accuracy of the matching results and improve the social experience of the users. Furthermore, the improved HRES algorithm can prove to be semantically secure, and the profile-matching protocol is also secure in the sense that both clouds cannot get useful information about users’ data under the non-collusion security model.

1.1. Related Work

The research on privacy-preserving profile matching can be mainly divided into two categories, coarse-grained [1619] and fine-grained [1, 2, 10] profile-matching schemes.

1.1.1. Coarse-Grained Scheme

In the coarse-grained profile-matching schemes, the matching proximity is often defined as a set intersection or the cardinality of intersections of user’s attribute sets, but this solution cannot further distinguish the specific relevance between users. In 2011, Li et al. [16] proposed two distributed privacy-preserving profile-matching protocols, which deploy the homomorphic property of Shamir secret sharing scheme [20] to calculate the intersections of users’ private sets without relying on a trusted third party. However, the coarse-grained profile matching cannot accurately measure the proximity of users.

1.1.2. Fine-Grained Scheme

For the fine-grained profile-matching schemes [1, 2, 10], user’s preference or behavior pattern is usually regarded as a multidimensional vector, and the social proximity is usually measured by the inner product between two users’ vectors. For example, Zhang et al. [1] designed a fine-grained profile-matching protocol with three security levels. A user can initiate a matching query for their encrypted data with other users and finally obtains the proximity result by utilizing the homomorphic addition property of the Paillier cryptosystem [21]. However, it is required that both users stay online to perform multiple interactions during the execution of protocols, thus imposing a heavy communication and computational burden on mobile terminals. To tackle the above issues, Gao et al. [2] introduced a novel cloud-assisted profile-matching scheme under multiple keys. The cloud environment is composed of two cooperative but non-collusive clouds, and users in the social application could go offline after uploading their encrypted data. A friend finder can designate a target and initiate a matching query to the cloud; then, the two cloud servers return the matching result to the user through interactions. In their scheme, the clouds perform most of the computations, which effectively reduces the burden of users. And the data providers do not have to stay online all the time. However, the scheme [2] utilizes a secure ElGamal-like [14] proxy re-encryption algorithm to encrypt data, so each user needs to generate their secret keys and re-encryption keys. Although schemes in a multiple-key environment could benefit from some existing technologies [22] that could create secure communicating groups within a secure group with exchange of small information, multiple keys increase additional burden of key management in our consideration. Moreover, it seems impossible to guarantee users’ privacy once the user’s re-encryption key is leaked in the process of uploading to the cloud. What is worse, Gao et al.’s scheme uses the inner product of two vectors to measure the matching degree. However, the ElGamal-like algorithm requires the size of the plaintext to be less than 40 bits to make sure that the ciphertext can be efficiently decrypted using Pollard’s kangaroo algorithm [23] to solve the discrete logarithm problem for a relatively smaller integer. Therefore, the scheme fails to provide high precision computations on the users’ data.

1.2. Organization

The rest of the paper is organized as follows. Section 2 gives some notations and the main part of the HRES algorithm [15]. Section 3 introduces the system and adversary models. Section 4 provides the construction of the improved algorithm and the profile-matching scheme. Section 5 presents the security proof of the improved HRES algorithm and the security analysis of the profile-matching protocol. Section 6 draws conclusions.

2. Preliminaries

In this section, we briefly introduce some notations appearing in this paper. We also introduce a fundamental HRES algorithm with two cooperative and non-collusive clouds and a data normalization method.

2.1. Notation

In this paper, we write for assigning x a value X and denote the integer set as . In addition, indicates that is calculated by the algorithm . We use and to denote the Euler and the Carmichael function, respectively. The ring means the residue class ring of integers modulo , i.e., . We use bold lowercase letters to represent vectors, and the Euclidean norm for a vector is denoted as . In addition, the bit length of an integer is denoted as . We use the symbol to denote the ceiling function. Besides, means the inner product of the two vectors and , and we use to denote the encryption function with the public key of the improved HRES algorithm that will be introduced in the next section.

2.2. Homomorphic Re-Encryption

Re-encryption technology supports a proxy transferring decryption authority without decrypting ciphertext [24]. On this basis, homomorphic re-encryption supports homomorphic operation on ciphertext, which is more suitable for data dissemination in networks with special needs. In 2017, Ding et al. proposed a HRES homomorphic re-encryption scheme [15], which includes two non-collusive cloud servers, CA and CB, that jointly manage the ciphertext data. The data owner encrypts his data with the public key generated by negotiation between the two cloud servers, and the ciphertexts can be correctly decrypted only if the two cloud servers cooperate with each other. The HRES consists of the following algorithms.(i)KeyGen. Given a security parameter , let be a safe RSA modulus, where and are primes of the form of and , and and are primes of equal bit length. Let be an element of the maximal order in . The two cloud servers CA and CB, respectively, generate their own key pairs: and . Therefore, CA negotiates with CB to generate their Diffie–Hellman key .(ii)Enc. Given the Diffie–Hellman key and a message , output the ciphertext as follows: , where indicates the ciphertext encrypted with and is a random integer selected from .(iii)Partial Dec1. Given and a ciphertext , the cloud CA can transfer the above ciphertext into another ciphertext as where indicates that the ciphertext can be decrypted with .(iv)Partial Dec2. Given and a ciphertext , the cloud CB can decrypt the plaintext as follows:where the function is defined as .

2.3. Normalization Method

This section mainly introduces a common data standardization method: Z-score standardization method, which standardizes data based on the mean and variance of the original data. The processed data obeys the normal distribution; that is, the mean value of the data is 0 and the standard deviation is 1. The conversion formula is as follows:where represents the processed data, is the mean of the samples and denotes the standard deviation of the samples.

In the proposed scheme, the raw data need to be normalized before processing. The reason for this operation is that the values of data from different sources are quite different. In order to eliminate the influence of different numerical range and make the data comparable, data need to be normalized. Hence, Z-score method is adopted to make the data at the same level, which is convenient for analyzing the data.

3. System and Threat Model

3.1. System Model

As shown in Figure 1, our system model contains three entities: a friend finder (Alice), the cloud environment, and other users. Each entity is described as follows: Alice is marked as a friend finder who wants to find friends that have similar interests with her in the social network. The cloud environment includes two cloud servers, CA and CB, which can provide users with enormous storage space for storing personal profiles and a large amount of computing resources. Through cooperating, CA and CB can help Alice calculate the proximities with other users, and the privacy of the two users will be not compromised. There are many other users in social networks who prefer to outsource the encrypted data representing their preferences to the cloud CA.

Each user registers a personal account in the mobile social network and then fills in the personal information. The personal information contains user’s preferences that can be used as a measurement for profile matching. However, the user does not want to reveal his private data in order to avoid illegal use [25]. Personal profile is often defined as a multidimensional vector , and each dimension represents an attribute corresponding to a hobby, such as cooking and tourism. Each attribute value may be represented by an integer ranging from 0 to 10, or a larger range. The bigger the number is, the more the users like the item will be, and vice versa. To better measure the proximity between users, the cosine value of normalized vectors with Z-score method is taken in this paper.

3.2. Threat Model

In the honest-but-curious model, an external adversary and an internal adversary are considered. An external adversary mainly refers to an eavesdropper who can get some information (e.g., encrypted data) through the transparent channel by eavesdropping. An internal adversary is an honest-but-curious entity such that he faithfully follow the agreement but attempt to collect and reveal private information during the execution of the agreement. The friend finder may want to expose other users’ profiles, while the two clouds may want to reveal the users’ personal data in the social networks. Moreover, it is assumed that the two clouds will never collude with each other and the users will not deliberately attempt to guess the cosine result by adjusting the vector multiple times.

4. Our Construction

In this section, we give the improved HRES algorithm and our privacy-preserving profile-matching scheme.

4.1. Improved HRES Algorithm

In order to support homomorphic multiplication computations, a slight modification has been made on the original HRES algorithm. Here it is required that the two clouds CA and CB will not collude with each other and cooperate to perform decryption operations. The improved algorithm includes the following algorithms: KeyGen, Enc, Partial Dec1, Partial Dec2, and Evaluation.(i)KeyGen. Choose a large prime integer . Note that the multiplicative group has primitive roots of order , and hence the algorithm can randomly choose a generator with order from . Then the two cloud servers CA and CB, respectively, generate their own key pairs: and . Therefore, CA negotiates with CB to generate their Diffie–Hellman key .(ii)Enc. Given the Diffie–Hellman key and a plaintext message , where , output the ciphertext as follows: , where indicates the ciphertext encrypted with and is a random integer selected from .(iii)Partial Dec1. Given and a ciphertext , the cloud CA can transfer the above ciphertext into another ciphertext as , where indicates that the ciphertext can be decrypted with .(iv)Partial Dec2. Given and a ciphertext , the cloud CB can output the plaintext as follows:where the function is defined as .(v)Homomorphic evaluation. We first show that the improved scheme supports homomorphic additions. Given any ciphertexts, namely, for , with the underlying plaintext being for , we firstly show that our improved algorithm supports homomorphic additions.

Specifically, we compute . Note that

From the refreshed ciphertext , the cloud CA can use his secret key to partially decrypt the ciphertext as , where

The cloud CB can further decrypt the ciphertext using his secret key as follows. The cloud CB first computes

Accordingly, it is easy to verifywhere the cloud CB can obtain the plaintext

Note that should hold, i.e., , if is needed. Therefore, our scheme supports homomorphic additions on ciphertexts.

Now we show that the proposal can perform a homomorphic multiplication on two ciphertexts, and , where with and for . Specially, we compute . Note thatand . The cloud CA can partially decrypt the ciphertext as , where and . The cloud CB first computes

Noting that

The cloud CB computes

CB could obtain when . Then, CB calculateswhere the cloud CB obtains the results . Thus, our scheme supports a homomorphic multiplication operation on ciphertexts when .

4.2. Privacy-Preserving Profile Matching

In this part, the improved HRES algorithm is adopted to implement our privacy-preserving profile-matching scheme. Suppose that Alice’s vector is and that Bob’s vector is . The cosine value of the two vectors can be calculated as

In particular, since and are nonnegative integers. Users can encrypt their vectors with the Diffie–Hellman key and then outsource their encrypted data to CA. The procedure for the data outsourcing is presented in Algorithm 1 and the privacy-preserving profile-matching scheme is shown as Algorithm 2.

Input: A user Bob wants to outsource his personal profile to CA and holds a private vector .
Output: The encrypted result , and are sent to CA.
(1)CA executes KeyGen algorithm of the improved HRES algorithm with CB to generate their respective key pairs and their Diffie–Hellman key . Thereafter, is issued to the users in social networks.
(2)Firstly, Bob normalizes his vector with Z-score method and then converts each element of the normalized vector to an integer by multiplying it with a pretreatment public integer and ceiling. Finally, Bob gets his matching vector and calculates , .
(3)Bob negotiates with the CA to run the Diffie–Hellman key exchange protocol to generate a secret random integer .
(4)After encrypting the following values: , , and , Bob uploads these encrypted values to CA.
Input: Alice’s private vector , and the secret keys of CA and CB.
Output: Alice gets the cosine result .
(1)Alice executes the KeyGen algorithm of the improved HRES algorithm with CA to generate their respective key pairs , and their Diffie–Hellman key . Then, is assigned to the cloud CB.
(2)Alice sends her encrypted profile to CA for querying the proximity with Bob in the social networks.(i)Alice also needs to process her normalized vector with the public number to get the result , and calculates , .(ii)Alice negotiates with CA to run the Diffie–Hellman key exchange protocol to generate a secret random integer .(iii)Alice uploads the encrypted values , , and to CA.
(3)CA generates random integers and works out the following formulas:(i).(ii).
Next, CA computes ; executes the Partial Dec1 algorithm to process and with its secret key ; and then sends the processed results to CB.
(4)CB runs the Evaluation algorithm with its secret key to decrypt and gets where .
(5)CB runs the Partial Dec2 algorithm with its secret key to decrypt and compute . Next, CB sends the encrypted values to CA.
(6)CA computes(i).(ii).(iii).
After that, CA generates a random integer and calculates . Finally, CA runs Partial Dec1 algorithm to decrypt and sends the results to CB.
(7)CB computes the obfuscated proximity result and sends it to CA.(i)CB runs the Partial Dec2 algorithm to decrypt and obtains .(ii)CB processes the value with a pretreatment public integer , computes , and then gets an integer number where .(iii)CB computes with the public key and then sends it to CA.
(8)CA computes the formula , runs the Partial Dec1 algorithm with its secret key to process , and sends the results to Alice.
(9)Alice runs the Partial Dec2 algorithm with her secret key and uses to get which is very close to the matching result .

5. Correctness and Security

In this section, we firstly give the correctness analysis of our protocol and then prove that the improved HRES algorithm is semantically secure with rigorous method. At last, we prove that our profile-matching scheme is secure under the semihonest model.

5.1. Correctness

In our scheme, CB can correctly decrypt and obtain the result of without revealing any useful information. The demonstration is shown as follows. Firstly, CA computes

We can verify that

Then, CA generates the random integers to obfuscate , and the final result can be obtained through the following formula:where . Hence, CB can obtain . Then, CB calculates when , and it cannot reveal information about and .

On the other hand, we will prove that Alice can obtain a more accurate cosine result as follows.

Once CA receives , it can remove the blinding values by computing , and then it computes the following formulas:(i).(ii).(iii).

Then, CA obtains with the blinding value . CB decrypts the ciphertext, computes , and then encrypts the ceiling result with . Thus, CA can compute and get . We find that when the pretreatment number is large, is closer to the cosine result .

5.2. Security Proof of Improved Algorithm

In this subsection, the semantic security of the improved HRES algorithm is proved through three theorems. We first prove that the DDH problem in (the cyclic group of modulo ) is hard to solve. Based on this conclusion, the improved HRES algorithm could be semantically secure.

Theorem 1. Let be the cyclic group of modulo , and be a generator of . The discrete logarithm problem (in ) is hard to solve.

Proof. For the sake of contradiction, it is assumed that the discrete logarithm problem (in ) is not difficult; i.e., there exists an oracle that can solve the discrete logarithm problem (in ) in polynomial time. For example, given an input , the oracle returns the index as output.(1)Note that, if is a generator of and satisfies , is also a generator of . Our strategy is as follows. Let , we can assume that there exists an integer that satisfies , such that the equation holds. Multiply simultaneously both sides of the equation by and the equation can be obtained. Thus, it has .Since , can be denoted as where is an integer. Thus, can be written as the following formula:We can obtain . Then, we query the oracle for solving the discrete logarithm problem in to get the result with as input. And of course can be solved out. This implies that, if the discrete logarithm problem in were not difficult, the discrete logarithm problem in turns out to be easy as well. It is contradictory obviously.(2)If is a generator of and satisfies , is also the generator that belongs to both and . The proof process is similar to the above one and we will not describe it here.

Theorem 2. Let be a large prime. Let be the cyclic group of modulo , and be a generator of . The Decisional Diffie–Hellman problem (in ) is difficult.

Proof. For the sake of contradiction, it is assumed that the Decisional Diffie–Hellman (in ) is not difficult. It means that there exists an adversary who can find a random integer such that . We assume that is a random quadruple and is a DH quadruple, where and .(1)Note that, if is a generator of and satisfies , is also a generator of . If can find a random integer that satisfies , can be derived. Hence, equation holds. Besides, since is a generator of , we can also get the equation . We thus state that if there exists an adversary who can distinguish the random quadruple from DH quadruple , can also distinguish and in . However, the DDH assumption in is difficult, so the original hypothesis does not hold. Thus, we can get a conclusion that DDH assumption (in ) is also difficult.(2)If is a generator of and satisfies , then is also the generator that belongs to both and . The proof process is similar to the above one and we will not describe it here.

Theorem 3. If Decisional Diffie–Hellman assumption in holds, the improved HRES algorithm presented in Section 5.1 is semantically secure.

Proof. During the KeyGen phase, the cloud servers CA and CB negotiate with each other to generate their Diffie–Hellman key. Due to the difficulty of discrete logarithm problem in , it is negligible to get any information about, , or for any adversaries.
For the sake of contradiction, it is assumed that the scheme is not semantically secure; i.e., there exists a polynomial time adversary which can break semantic security with nonnegligible probability . constructs a distinguisher that can solve the DDH problem in . The construction is as follows.
Given a challenge quadruple , where , the goal of the distinguisher is to determine or , where is a random integer in . The challenger sends the public key to the adversary . Then, the adversary submits two challenge messages of equal length to the challenger based on his prior knowledge, and then the challenger returns as a challenge ciphertext to the adversary , in which . Finally, the adversary outputs as the guess result. If , the challenger outputs ; otherwise, it outputs . The discussion is as follows:(1)If , then is a valid ciphertext and the probability of adversary guessing correctly is equal to .(2)If , then is independent of the encrypted message because the random value is uniformly and randomly distributed among . Therefore, the probability that the adversary guesses correctly is .As a result, if the distinguisher can break the scheme with a nonnegligible probability , the adversary can attack the DDH assumption in with the same advantage. For the reason that the DDH assumption in is difficult, our improved scheme is semantically secure.

5.3. Security Analysis of Our Protocol

The security analysis of our privacy-preserving profile-matching scheme under the semihonest model will be presented in this subsection with a real and ideal paradigm [2, 26]. For any adversaries who attack a real protocol execution, there exists an adversary who attacks an ideal execution, such that the input and output distributions of the adversary and participants in both the real and the ideal executions are fundamentally the same.

Theorem 4. Our profile-matching scheme described in Section 5 can securely obtain the matching result through the calculations on ciphertexts under the semihonest and noncollusive adversaries.

Proof. In this scheme, there are mainly four parties: Alice, Bob, CA, and CB. We can construct four simulators against four types of adversaries that will corrupt the privacy of Alice, Bob, CA, and CB, respectively.
simulates as follows: After receiving the normalized vector , encrypts and , respectively, to get and . Then, chooses a random integer and encrypts into . randomly picks a vector , computes , and sends the partial decryption result processed by its secret key to . The view of involves the normalized vector , the encrypted results set , and the decryption result . The view of in real and ideal executions is indistinguishable owing to the semantic security of the improved HRES scheme mentioned above.
simulates as follows: After receiving the normalized vector , encrypts and , respectively, to get and . After that, chooses a random integer and encrypts into . Then sends the encrypted results to . The view of involves the normalized vector and the encrypted result set . The view of in real and ideal executions is indistinguishable owing to the semantic security of the improved HRES scheme mentioned above.
simulates as follows: randomly picks two vectors and . Then, encrypts them as , , , and . generates two random integers , ; respectively, obtains the encrypted values ; and partially decrypts the above result. Finally, it generates random integers