Abstract
Due to the transparency of the wireless channel, users in multiplekey environment are vulnerable to eavesdropping during the process of uploading personal data and reencryption keys. Besides, there is additional burden of key management arising from multiple keys of users. In addition, profile matching using inner product between vectors cannot effectively filter out users with ulterior motives. To tackle the above challenges, we first improve a homomorphic reencryption system (HRES) to support a single homomorphic multiplication and arbitrarily many homomorphic additions. The public key negotiated by the clouds is used to encrypt the users’ data, thereby avoiding the issues of key leakage and key management, and the privacy of users’ data is also protected. Furthermore, our scheme utilizes the homomorphic multiplication property of the improved HRES algorithm to compute the cosine result between the normalized vectors as the standard for measuring the users’ proximity. Thus, we can effectively improve the social experience of users.
1. Introduction
With the rapid development of Internet technology, mobile devices such as mobile phones and tablets have gradually become popular in people’s daily life in recent years. Some social networks such as WeChat, Twitter, and Facebook are gradually integrated into people’s life, and people would like to share some opinions, pictures, and videos with others. Therefore, people are more willing to find potential friends with similar interests in mobile social networks.
Profilematching is the most effective way to measure the proximity between users’ personal profiles. The user’s personal profile is often defined as a vector in practical applications, and each dimension of the vector represents an attribute corresponding to a hobby, such as football, photography, and religion. Each attribute value is represented by an integer between 0 and 10 or a larger range. The attribute value indicates the degree of the interest. The value 0 means that the user has no interest in the item, and value 10 represents that the user is particularly fond of it. Moreover, social proximity is often defined as the inner product of two users’ vectors [1–3]. Since the inner product is the sum of the products of the corresponding attributes between the two vectors, it cannot accurately show the proximity between users. For example, the vector of user is , user is , and user is . User wants to find a person with a higher proximity between user and user . If the inner product of two vectors is chosen as a measurement, it is surprising that user meets higher requirements than user . To get more accurate social proximity results, it is preferred to adopt cosine proximity result between the two normalized vectors as the standard.
In addition, users’ profiles often contain sensitive data and they do not want to expose their personal information. One way to protect the users’ privacy is to encrypt the data by cryptographic technologies, but the structure of raw data will be essentially damaged after encryption, causing difficulties in reprocessing the data. The homomorphic encryption technology [4–7] has its unique advantages in encrypted data processing. In particular, partial homomorphic encryption technique [8] is more suitable for many realistic applications [9].
In many existing profilematching works [1, 10, 11], a large amount of interactions are often required between user mobile terminals to obtain matching results, which will bring heavy computational costs and communication overhead to users. In addition, users also need to stay online during the matching process. Fortunately, with the development of cloud computing technology [12, 13], the cloud platform can provide users with huge storage space and abundant computing resources. Outsourcing computing and storage to the cloud can effectively reduce the burden on the users’ mobile terminals. Gao et al. [2] transfer users’ work to the cloud with the help of two cooperative but noncollusive clouds, and users can go offline after uploading their profiles to the cloud. Gao et al.’s scheme utilizes an ElGamallike proxy reencryption [14] algorithm with additive homomorphic property, which leads to the issues of key management and the leakage of reencryption keys. Furthermore, the ElGamallike algorithm requires that the size of the plaintext cannot exceed 40 bits in order to ensure the decryption efficiency; thus it cannot be applied to the scenarios requiring high data accuracy.
The main contributions of our work are as follows:(i)We improve the HRES algorithm [15] in order to avoid the drawbacks in [2]. The improved algorithm supports one homomorphic multiplication and arbitrarily many homomorphic additions, which can effectively avoid the key leakage and the key management issues caused during users’ uploading reencryption keys.(ii)This paper utilizes the homomorphic multiplication property of the improved HRES algorithm to compute the cosine result between the normalized vectors as the standard for measuring proximity. Our proposal can effectively ensure the accuracy of the matching results and improve the social experience of the users. Furthermore, the improved HRES algorithm can prove to be semantically secure, and the profilematching protocol is also secure in the sense that both clouds cannot get useful information about users’ data under the noncollusion security model.
1.1. Related Work
The research on privacypreserving profile matching can be mainly divided into two categories, coarsegrained [16–19] and finegrained [1, 2, 10] profilematching schemes.
1.1.1. CoarseGrained Scheme
In the coarsegrained profilematching schemes, the matching proximity is often defined as a set intersection or the cardinality of intersections of user’s attribute sets, but this solution cannot further distinguish the specific relevance between users. In 2011, Li et al. [16] proposed two distributed privacypreserving profilematching protocols, which deploy the homomorphic property of Shamir secret sharing scheme [20] to calculate the intersections of users’ private sets without relying on a trusted third party. However, the coarsegrained profile matching cannot accurately measure the proximity of users.
1.1.2. FineGrained Scheme
For the finegrained profilematching schemes [1, 2, 10], user’s preference or behavior pattern is usually regarded as a multidimensional vector, and the social proximity is usually measured by the inner product between two users’ vectors. For example, Zhang et al. [1] designed a finegrained profilematching protocol with three security levels. A user can initiate a matching query for their encrypted data with other users and finally obtains the proximity result by utilizing the homomorphic addition property of the Paillier cryptosystem [21]. However, it is required that both users stay online to perform multiple interactions during the execution of protocols, thus imposing a heavy communication and computational burden on mobile terminals. To tackle the above issues, Gao et al. [2] introduced a novel cloudassisted profilematching scheme under multiple keys. The cloud environment is composed of two cooperative but noncollusive clouds, and users in the social application could go offline after uploading their encrypted data. A friend finder can designate a target and initiate a matching query to the cloud; then, the two cloud servers return the matching result to the user through interactions. In their scheme, the clouds perform most of the computations, which effectively reduces the burden of users. And the data providers do not have to stay online all the time. However, the scheme [2] utilizes a secure ElGamallike [14] proxy reencryption algorithm to encrypt data, so each user needs to generate their secret keys and reencryption keys. Although schemes in a multiplekey environment could benefit from some existing technologies [22] that could create secure communicating groups within a secure group with exchange of small information, multiple keys increase additional burden of key management in our consideration. Moreover, it seems impossible to guarantee users’ privacy once the user’s reencryption key is leaked in the process of uploading to the cloud. What is worse, Gao et al.’s scheme uses the inner product of two vectors to measure the matching degree. However, the ElGamallike algorithm requires the size of the plaintext to be less than 40 bits to make sure that the ciphertext can be efficiently decrypted using Pollard’s kangaroo algorithm [23] to solve the discrete logarithm problem for a relatively smaller integer. Therefore, the scheme fails to provide high precision computations on the users’ data.
1.2. Organization
The rest of the paper is organized as follows. Section 2 gives some notations and the main part of the HRES algorithm [15]. Section 3 introduces the system and adversary models. Section 4 provides the construction of the improved algorithm and the profilematching scheme. Section 5 presents the security proof of the improved HRES algorithm and the security analysis of the profilematching protocol. Section 6 draws conclusions.
2. Preliminaries
In this section, we briefly introduce some notations appearing in this paper. We also introduce a fundamental HRES algorithm with two cooperative and noncollusive clouds and a data normalization method.
2.1. Notation
In this paper, we write for assigning x a value X and denote the integer set as . In addition, indicates that is calculated by the algorithm . We use and to denote the Euler and the Carmichael function, respectively. The ring means the residue class ring of integers modulo , i.e., . We use bold lowercase letters to represent vectors, and the Euclidean norm for a vector is denoted as . In addition, the bit length of an integer is denoted as . We use the symbol to denote the ceiling function. Besides, means the inner product of the two vectors and , and we use to denote the encryption function with the public key of the improved HRES algorithm that will be introduced in the next section.
2.2. Homomorphic ReEncryption
Reencryption technology supports a proxy transferring decryption authority without decrypting ciphertext [24]. On this basis, homomorphic reencryption supports homomorphic operation on ciphertext, which is more suitable for data dissemination in networks with special needs. In 2017, Ding et al. proposed a HRES homomorphic reencryption scheme [15], which includes two noncollusive cloud servers, CA and CB, that jointly manage the ciphertext data. The data owner encrypts his data with the public key generated by negotiation between the two cloud servers, and the ciphertexts can be correctly decrypted only if the two cloud servers cooperate with each other. The HRES consists of the following algorithms.(i)KeyGen. Given a security parameter , let be a safe RSA modulus, where and are primes of the form of and , and and are primes of equal bit length. Let be an element of the maximal order in . The two cloud servers CA and CB, respectively, generate their own key pairs: and . Therefore, CA negotiates with CB to generate their Diffie–Hellman key .(ii)Enc. Given the Diffie–Hellman key and a message , output the ciphertext as follows: , where indicates the ciphertext encrypted with and is a random integer selected from .(iii)Partial Dec1. Given and a ciphertext , the cloud CA can transfer the above ciphertext into another ciphertext as where indicates that the ciphertext can be decrypted with .(iv)Partial Dec2. Given and a ciphertext , the cloud CB can decrypt the plaintext as follows:where the function is defined as .
2.3. Normalization Method
This section mainly introduces a common data standardization method: Zscore standardization method, which standardizes data based on the mean and variance of the original data. The processed data obeys the normal distribution; that is, the mean value of the data is 0 and the standard deviation is 1. The conversion formula is as follows:where represents the processed data, is the mean of the samples and denotes the standard deviation of the samples.
In the proposed scheme, the raw data need to be normalized before processing. The reason for this operation is that the values of data from different sources are quite different. In order to eliminate the influence of different numerical range and make the data comparable, data need to be normalized. Hence, Zscore method is adopted to make the data at the same level, which is convenient for analyzing the data.
3. System and Threat Model
3.1. System Model
As shown in Figure 1, our system model contains three entities: a friend finder (Alice), the cloud environment, and other users. Each entity is described as follows: Alice is marked as a friend finder who wants to find friends that have similar interests with her in the social network. The cloud environment includes two cloud servers, CA and CB, which can provide users with enormous storage space for storing personal profiles and a large amount of computing resources. Through cooperating, CA and CB can help Alice calculate the proximities with other users, and the privacy of the two users will be not compromised. There are many other users in social networks who prefer to outsource the encrypted data representing their preferences to the cloud CA.
Each user registers a personal account in the mobile social network and then fills in the personal information. The personal information contains user’s preferences that can be used as a measurement for profile matching. However, the user does not want to reveal his private data in order to avoid illegal use [25]. Personal profile is often defined as a multidimensional vector , and each dimension represents an attribute corresponding to a hobby, such as cooking and tourism. Each attribute value may be represented by an integer ranging from 0 to 10, or a larger range. The bigger the number is, the more the users like the item will be, and vice versa. To better measure the proximity between users, the cosine value of normalized vectors with Zscore method is taken in this paper.
3.2. Threat Model
In the honestbutcurious model, an external adversary and an internal adversary are considered. An external adversary mainly refers to an eavesdropper who can get some information (e.g., encrypted data) through the transparent channel by eavesdropping. An internal adversary is an honestbutcurious entity such that he faithfully follow the agreement but attempt to collect and reveal private information during the execution of the agreement. The friend finder may want to expose other users’ profiles, while the two clouds may want to reveal the users’ personal data in the social networks. Moreover, it is assumed that the two clouds will never collude with each other and the users will not deliberately attempt to guess the cosine result by adjusting the vector multiple times.
4. Our Construction
In this section, we give the improved HRES algorithm and our privacypreserving profilematching scheme.
4.1. Improved HRES Algorithm
In order to support homomorphic multiplication computations, a slight modification has been made on the original HRES algorithm. Here it is required that the two clouds CA and CB will not collude with each other and cooperate to perform decryption operations. The improved algorithm includes the following algorithms: KeyGen, Enc, Partial Dec1, Partial Dec2, and Evaluation.(i)KeyGen. Choose a large prime integer . Note that the multiplicative group has primitive roots of order , and hence the algorithm can randomly choose a generator with order from . Then the two cloud servers CA and CB, respectively, generate their own key pairs: and . Therefore, CA negotiates with CB to generate their Diffie–Hellman key .(ii)Enc. Given the Diffie–Hellman key and a plaintext message , where , output the ciphertext as follows: , where indicates the ciphertext encrypted with and is a random integer selected from .(iii)Partial Dec1. Given and a ciphertext , the cloud CA can transfer the above ciphertext into another ciphertext as , where indicates that the ciphertext can be decrypted with .(iv)Partial Dec2. Given and a ciphertext , the cloud CB can output the plaintext as follows: where the function is defined as .(v)Homomorphic evaluation. We first show that the improved scheme supports homomorphic additions. Given any ciphertexts, namely, for , with the underlying plaintext being for , we firstly show that our improved algorithm supports homomorphic additions.
Specifically, we compute . Note that
From the refreshed ciphertext , the cloud CA can use his secret key to partially decrypt the ciphertext as , where
The cloud CB can further decrypt the ciphertext using his secret key as follows. The cloud CB first computes
Accordingly, it is easy to verifywhere the cloud CB can obtain the plaintext
Note that should hold, i.e., , if is needed. Therefore, our scheme supports homomorphic additions on ciphertexts.
Now we show that the proposal can perform a homomorphic multiplication on two ciphertexts, and , where with and for . Specially, we compute . Note thatand . The cloud CA can partially decrypt the ciphertext as , where and . The cloud CB first computes
Noting that
The cloud CB computes
CB could obtain when . Then, CB calculateswhere the cloud CB obtains the results . Thus, our scheme supports a homomorphic multiplication operation on ciphertexts when .
4.2. PrivacyPreserving Profile Matching
In this part, the improved HRES algorithm is adopted to implement our privacypreserving profilematching scheme. Suppose that Alice’s vector is and that Bob’s vector is . The cosine value of the two vectors can be calculated as
In particular, since and are nonnegative integers. Users can encrypt their vectors with the Diffie–Hellman key and then outsource their encrypted data to CA. The procedure for the data outsourcing is presented in Algorithm 1 and the privacypreserving profilematching scheme is shown as Algorithm 2.


5. Correctness and Security
In this section, we firstly give the correctness analysis of our protocol and then prove that the improved HRES algorithm is semantically secure with rigorous method. At last, we prove that our profilematching scheme is secure under the semihonest model.
5.1. Correctness
In our scheme, CB can correctly decrypt and obtain the result of without revealing any useful information. The demonstration is shown as follows. Firstly, CA computes
We can verify that
Then, CA generates the random integers to obfuscate , and the final result can be obtained through the following formula:where . Hence, CB can obtain . Then, CB calculates when , and it cannot reveal information about and .
On the other hand, we will prove that Alice can obtain a more accurate cosine result as follows.
Once CA receives , it can remove the blinding values by computing , and then it computes the following formulas:(i).(ii).(iii).
Then, CA obtains with the blinding value . CB decrypts the ciphertext, computes , and then encrypts the ceiling result with . Thus, CA can compute and get . We find that when the pretreatment number is large, is closer to the cosine result .
5.2. Security Proof of Improved Algorithm
In this subsection, the semantic security of the improved HRES algorithm is proved through three theorems. We first prove that the DDH problem in (the cyclic group of modulo ) is hard to solve. Based on this conclusion, the improved HRES algorithm could be semantically secure.
Theorem 1. Let be the cyclic group of modulo , and be a generator of . The discrete logarithm problem (in ) is hard to solve.
Proof. For the sake of contradiction, it is assumed that the discrete logarithm problem (in ) is not difficult; i.e., there exists an oracle that can solve the discrete logarithm problem (in ) in polynomial time. For example, given an input , the oracle returns the index as output.(1)Note that, if is a generator of and satisfies , is also a generator of . Our strategy is as follows. Let , we can assume that there exists an integer that satisfies , such that the equation holds. Multiply simultaneously both sides of the equation by and the equation can be obtained. Thus, it has . Since , can be denoted as where is an integer. Thus, can be written as the following formula: We can obtain . Then, we query the oracle for solving the discrete logarithm problem in to get the result with as input. And of course can be solved out. This implies that, if the discrete logarithm problem in were not difficult, the discrete logarithm problem in turns out to be easy as well. It is contradictory obviously.(2)If is a generator of and satisfies , is also the generator that belongs to both and . The proof process is similar to the above one and we will not describe it here.
Theorem 2. Let be a large prime. Let be the cyclic group of modulo , and be a generator of . The Decisional Diffie–Hellman problem (in ) is difficult.
Proof. For the sake of contradiction, it is assumed that the Decisional Diffie–Hellman (in ) is not difficult. It means that there exists an adversary who can find a random integer such that . We assume that is a random quadruple and is a DH quadruple, where and .(1)Note that, if is a generator of and satisfies , is also a generator of . If can find a random integer that satisfies , can be derived. Hence, equation holds. Besides, since is a generator of , we can also get the equation . We thus state that if there exists an adversary who can distinguish the random quadruple from DH quadruple , can also distinguish and in . However, the DDH assumption in is difficult, so the original hypothesis does not hold. Thus, we can get a conclusion that DDH assumption (in ) is also difficult.(2)If is a generator of and satisfies , then is also the generator that belongs to both and . The proof process is similar to the above one and we will not describe it here.
Theorem 3. If Decisional Diffie–Hellman assumption in holds, the improved HRES algorithm presented in Section 5.1 is semantically secure.
Proof. During the KeyGen phase, the cloud servers CA and CB negotiate with each other to generate their Diffie–Hellman key. Due to the difficulty of discrete logarithm problem in , it is negligible to get any information about, , or for any adversaries.
For the sake of contradiction, it is assumed that the scheme is not semantically secure; i.e., there exists a polynomial time adversary which can break semantic security with nonnegligible probability . constructs a distinguisher that can solve the DDH problem in . The construction is as follows.
Given a challenge quadruple , where , the goal of the distinguisher is to determine or , where is a random integer in . The challenger sends the public key to the adversary . Then, the adversary submits two challenge messages of equal length to the challenger based on his prior knowledge, and then the challenger returns as a challenge ciphertext to the adversary , in which . Finally, the adversary outputs as the guess result. If , the challenger outputs ; otherwise, it outputs . The discussion is as follows:(1)If , then is a valid ciphertext and the probability of adversary guessing correctly is equal to .(2)If , then is independent of the encrypted message because the random value is uniformly and randomly distributed among . Therefore, the probability that the adversary guesses correctly is .As a result, if the distinguisher can break the scheme with a nonnegligible probability , the adversary can attack the DDH assumption in with the same advantage. For the reason that the DDH assumption in is difficult, our improved scheme is semantically secure.
5.3. Security Analysis of Our Protocol
The security analysis of our privacypreserving profilematching scheme under the semihonest model will be presented in this subsection with a real and ideal paradigm [2, 26]. For any adversaries who attack a real protocol execution, there exists an adversary who attacks an ideal execution, such that the input and output distributions of the adversary and participants in both the real and the ideal executions are fundamentally the same.
Theorem 4. Our profilematching scheme described in Section 5 can securely obtain the matching result through the calculations on ciphertexts under the semihonest and noncollusive adversaries.
Proof. In this scheme, there are mainly four parties: Alice, Bob, CA, and CB. We can construct four simulators against four types of adversaries that will corrupt the privacy of Alice, Bob, CA, and CB, respectively.
simulates as follows: After receiving the normalized vector , encrypts and , respectively, to get and . Then, chooses a random integer and encrypts into . randomly picks a vector , computes , and sends the partial decryption result processed by its secret key to . The view of involves the normalized vector , the encrypted results set , and the decryption result . The view of in real and ideal executions is indistinguishable owing to the semantic security of the improved HRES scheme mentioned above.
simulates as follows: After receiving the normalized vector , encrypts and , respectively, to get and . After that, chooses a random integer and encrypts into . Then sends the encrypted results to . The view of involves the normalized vector and the encrypted result set . The view of in real and ideal executions is indistinguishable owing to the semantic security of the improved HRES scheme mentioned above.
simulates as follows: randomly picks two vectors and . Then, encrypts them as , , , and . generates two random integers , ; respectively, obtains the encrypted values ; and partially decrypts the above result. Finally, it generates random integers , and computes , , and . The view of in the real and ideal executions is indistinguishable owing to the semantic security of the improved HRES scheme mentioned above.
simulates as follows: chooses random integers , , , where , and then encrypts them as , . Then, it picks a random number and encrypts it as , . reencrypts them with the secret key of CA and sends the values , , , , , and to . The view of is the above encrypted values and some obfuscated data. Although can decrypt and obtain the obfuscated data, the random numbers selected by the simulator are uniformly and randomly distributed in the message space, so they are the obfuscated messages. The view of in real and ideal executions is indistinguishable owing to the semantic security of the improved HRES scheme mentioned above.
6. Evaluation
6.1. Comparison
In this subsection, we mainly discuss the advantages of our scheme compared with the existing privacypreserving profilematching schemes [1, 2, 11, 27] in Table 1. These schemes [1, 11, 27] require users to stay online simultaneously to obtain matching results through multiple interactions, resulting in additional computational costs and communication overheads on mobile devices of users. In our scheme, users only need to encrypt their personal profiles and upload them to the cloud, and then they can go offline. For a Friend finder, he can designate a target to initiate a matching query and ultimately get the matching result. Most computations are undertaken by the two cloud servers, which can greatly reduce the burden of users. Compared with the scheme in [2], the users do not need to upload the reencryption keys when uploading their encrypted data, thereby avoiding the risk of the users’ personal data leakage due to the reencryption key leakage and reducing the burden of the key management. We use the cosine scores of two vectors as the matching result instead of the intersection of two sets or the inner product of two vectors. In particular, the proposed scheme supports processing larger data. Therefore, the proposed scheme is finegrained, and the precision of matching result is high.
6.2. Simulation
In order to better demonstrate the performance of our scheme and compare it with Gao’s scheme [2], we simulated two profilematching protocols separately. Simulations are conducted using the PBC library on a computer equipped with a 2.5 GHz Intel core i53210 processor with 4 GB of RAM. The dimension of users’ vector is set as 20, and we compared the two system performances with the increase in the number of the data bits. The results are shown in Figures 2 and 3. More intuitive comparison results are shown in Figure 4. As the number of data bits increases, the decryption time of Gao’s scheme will increase exponentially, and the time spent by cloud servers and users increases. Specifically, our scheme has no limit on the length of the data. And as the length increases, the time consumed does not change significantly.
6.3. Complexity Analysis
In this subsection, we review some existing conclusions before analyzing the computational complexity of our scheme. The complexity of calculating modular multiplication and modular exponentiation is and , respectively, where is the modulus.
First, we analyze the computational complexity of the improved HRES algorithm. The Enc involves two modular multiplications, a modular addition and two modular exponentiations. Hence, the computational complexity of this part is . The Partial Dec1 only needs a modular exponentiation, so the computational complexity is . Similarly, the Partial Dec2 involves a modular exponentiation, a modular inversion, a modular multiplication, a subtraction, and a division. Hence, the corresponding computational complexity is . On this basis, we could deduce that the computational complexity of the procedure executed by Alice is , and the same for Bob, CA, and CB.
7. Conclusion
In this paper, we propose a privacypreserving profilematching scheme over improved HRES algorithm in mobile social networks. The improved algorithm can support onetime homomorphic multiplication and arbitrarily many homomorphic additions. Compared with the original scheme [2], the key management burden can be reduced, and the privacy problem of users caused by the reencryption keys leakage can be effectively solved. In addition, our scheme utilizes the cosine result between two normalized vectors as the standard for measuring the users’ proximity, which can effectively improve the social experience of the users. Even if users with ulterior motives collude with one of the clouds, the personal data of other users will not be revealed. At last, we prove that our scheme is secure under the semihonest model through strict security analysis.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was funded by the National Key R&D Program of China under grant no. 2017YFB0802000, the National Natural Science Foundation of China under grant nos. U19B2021 and U1736111, the National Cryptography Development Fund under grant no. MMJJ20180111, the Key Research and Development Program of Shaanxi under grant no. 2020ZDLGY0804, and the Program for Science and Technology Innovation Talents in the Universities of Henan Province under grant no. 18HASTIT022.