Abstract

As a review system, the Crowd-Sourced Local Businesses Service System (CSLBSS) allows users to publicly publish reviews for businesses that include a display name, an avatar, and review content. While these reviews help maintain business reputation and provide valuable references for others, an adversary can also legitimately obtain a user’s display name and a large number of historical reviews. We show that the adversary can launch a connecting user identities attack (CUIA) and a statistical inference attack (SIA) to obtain user privacy by exploiting the acquired display names and historical reviews. Existing methods based on anonymity and review suppression cannot resist these two attacks. Moreover, suppressing reviews may prevent some highly useful reviews from being published. To solve these problems, we propose a cross-platform strong privacy protection mechanism (CSPPM) based on partial publication and complete anonymity. In CSPPM, based on the consistency between the user score and the business score, we propose a partial publication mechanism that publishes the more useful reviews and filters out false or untrue reviews. This ensures that our mechanism does not suppress highly useful reviews and improves system utility. We also propose a complete anonymity mechanism that anonymizes the display names and avatars of publicly published reviews. This ensures that the adversary cannot obtain user privacy through CUIA and SIA. Finally, we evaluate CSPPM from both theoretical and experimental aspects. The results show that it can resist CUIA and SIA and improve system utility.

1. Introduction

With the development of positioning technology and the widespread use of smartphones, more and more social network applications provide Location-Based Services (LBSs). These applications, such as TripAdvisor, Yelp, and Dianping, are known as Location-Based Social Networks (LBSNs) [1]. We can use them to socialize online, plan travel routes, take part in spatial crowdsourcing [2, 3], and query surrounding Points of Interest (POIs), which greatly facilitates our lives [4]. Among these applications, Crowd-Sourced Local Businesses Service Systems (CSLBSSs), such as Yelp and Dianping, are interactive platforms that provide users with business information, consumer preferences, and consumer reviews in areas such as dining and shopping. CSLBSSs are also special LBSNs that crowdsource review lists of businesses and maintain their reputation [5, 6].

In CSLBSSs, a public review mainly includes attributes such as display name, avatar, and review content (text, image, video, etc.). By browsing the list of reviews, consumers can get a true picture of the quality of the services provided by the business without going to the physical store. That is, consumers can refer to the business reputation (i.e., business reputation score) and the review list to quickly and easily select POIs, such as restaurants with high scores. Note that the CSLBSSs evaluate the business reputation from both subjective and objective aspects: user rating and business reputation score. The user rating is a score that the system allows the user to make on the business reputation when a user publishes a review and is highly subjective. The business reputation score is a score that the system makes on the business reputation by calculating all user ratings and is highly objective.

However, while consumers enjoy the convenient services brought by CSLBSSs, they also face the risk of privacy leakage. In CSLBSSs, a business corresponds to a unique address. A review for a business implies that a user has visited the business or has related experiences associated with it. Moreover, reviews are public information, which can be obtained by anyone, even adversaries. By collecting and analyzing the reviews published by users [5, 6], malicious adversaries are able to infer users’ privacy. Furthermore, by using cross-social network fusion technology [7–9], adversaries can connect user identities across multiple social networks and infer more private information, such as occupation, address, and e-mail. At the same time, multiple types of platforms, including social networks, provide users with different services and publish different information about users. The published information often contains the users’ real profiles and can easily be used to infer users’ privacy.

In general, the process by which an adversary obtains a user profile in a CSLBSS includes the following: (1) the adversary uses the display name as an initial keyword to search for the user’s information, such as using search engines to obtain the user’s QQ, WeChat, and e-mail from social networks; (2) the adversary uses some information acquired from the first search (e.g., e-mail, which users consider less private than QQ and WeChat) as the keyword to further search for the user’s real name, educational background, and organization. We call this process of obtaining a user’s profile the connecting user identities attack (CUIA). At present, most research focuses on the protection of user privacy in terms of query content [2, 3, 10–12] and data publication [8, 13, 14]. However, for data publication, few studies have investigated the privacy protection of review publication in CSLBSSs. To the best of our knowledge, only Zheng et al. [5] and Yang et al. [6] have explored the issue. However, both focus on privacy protection within a single platform and do not consider privacy disclosure across platforms. Therefore, neither of these methods can resist CUIA. Moreover, although the schemes in [5, 6] can protect users’ privacy to some extent, they do not take into account the behavioral patterns of users. As a result, they cannot prevent adversaries from accessing long-term user behavior data and therefore cannot resist the statistical inference attack (SIA) [15, 16].

Currently, the schemes in [5, 6] mainly protect users’ privacy by combining suppression publication (partial publication) with anonymous publication. On the one hand, the adversary launches CUIA starting from the display name. While anonymous and partial publication can reduce the risk of compromise, adversaries can still obtain users’ display names and other private information from public reviews. On the other hand, CSLBSSs rely on users’ reviews to sustain the evolution of the platform; i.e., consumers are more likely to buy goods that have more credible reviews. That is, consumers’ willingness to purchase goods depends on how credible the reviews are. In this paper, we call the ability of a review to influence consumers’ willingness to purchase the usefulness of review. A CSLBSS aims to publish as many reviews that consumers consider credible as possible. In general, the more such reviews a system publishes, the better its ability to sustain development. In this paper, we call this ability to sustain development system utility. However, partial or anonymous publication reduces the usefulness of reviews: partial publication decreases the number of reviews that the system can publish, and anonymous publication reduces the credibility of reviews. Thus, how to balance privacy protection and system utility becomes an issue that needs to be addressed.

To address the above issues, we need to propose a method to effectively protect a user’s cross-platform privacy in the scenario of review publication while maintaining system utility. We first investigated the process of privacy disclosure and the usefulness of review in CSLBSSs. We found that adversaries generally can mine user’s profiles to infer user’s privacy by launching CUIA and SIA. In this process, the display name is usually the keyword exploited to mine a user’s profiles. Based on the information adoption model [17], we found that users’ real identity and consensus information, namely, the degree of consistency between user rating and business reputation score, are two key factors that determine the usefulness of the review. When little identity information of user is disclosed, if the user rating is consistent with the business reputation score, the consumer considers the review credible [18]. In other words, even if a user’s real identity is not disclosed in the review, the usefulness of the review will not be affected.

Based on the above research, we propose a cross-platform strong privacy protection mechanism (CSPPM) based on partial publication and complete anonymity to publish reviews. CSPPM publishes only part of the public reviews, and all published reviews are anonymous. It is a stricter privacy protection mechanism than those in [5, 6]. Here, complete anonymity refers to obscuring the display names and avatars, or directly replacing them with randomly assigned strings and uniform icons, in the reviews that are allowed to be published. It solves the leakage of display names and minimizes the possibility of users suffering from CUIA and SIA. However, considering the background knowledge of the adversary, completely anonymous reviews still risk being identified. For example, reviews often contain users’ photos and landmarks. In this case, people familiar with the user can easily identify that the review was published by the user. The more information a review discloses, the higher the risk to the user. Therefore, we adopt a partial publication mechanism to reduce the disclosure of less useful reviews. Whether a review is published depends on the difference between the user rating for a business and the business reputation score (score difference for short). It measures the degree of consistency between the user rating and the business reputation score. Under this mechanism, reviews whose score differences fall within the threshold range are published anonymously, while those that exceed the threshold range are not published (these are most likely false reviews or extreme positive and negative reviews). This ensures that the published reviews are those most consistent with the business reputation score, which best reflect the business reputation. It also ensures the reference value of the review list to users.
Besides, the list of reviews is sorted by a combination of score difference and user reputation score; namely, reviews are sorted by their usefulness.
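
The publication rule described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the `Review` fields, the `publish_reviews` function, the `user_` prefix, the `default.png` icon, and the concrete sort order (smaller score difference first, then higher user reputation) are all assumptions introduced here, since the text only states that the two quantities are combined.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Review:
    display_name: str
    avatar: str
    text: str
    rating: float            # user rating for the business
    user_reputation: float   # reputation score of the reviewer (assumed in [0, 1])


def publish_reviews(reviews, business_score, threshold):
    """Return the anonymized, sorted subset of reviews to publish."""
    published = []
    for r in reviews:
        diff = abs(r.rating - business_score)  # score difference
        if diff > threshold:
            continue  # suppressed: the rating deviates too much from the business score
        anon = Review(
            display_name="user_" + secrets.token_hex(4),  # randomly assigned string
            avatar="default.png",                          # uniform icon (hypothetical name)
            text=r.text,
            rating=r.rating,
            user_reputation=r.user_reputation,
        )
        published.append((diff, anon))
    # Assumed ordering: smaller score difference first, then higher reputation.
    published.sort(key=lambda p: (p[0], -p[1].user_reputation))
    return [anon for _, anon in published]
```

A fresh random name per published review, rather than one stable pseudonym per user, is chosen here so that an observer cannot group a user’s reviews by name; whether the mechanism in the paper does the same is not stated in this section.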

The main contributions of our paper are as follows:
(i) We identify and formalize CUIA and SIA in the scenario of review publication. We also find that the display name is a key factor in user identification (i.e., privacy leakage).
(ii) We propose a stricter privacy protection mechanism based on partial publication and complete anonymity to protect privacy in the scenario of review publication.
(iii) We propose a method to improve the usefulness of reviews based on consensus information. In cases where too little of the user’s real identity information is disclosed or where the user is anonymous, the score differences of reviews are used to decide which reviews to publish. In this method, reviews with a small score difference are published first.
(iv) We conduct experiments to verify the effectiveness of the proposed algorithms in terms of resisting CUIA and SIA and maintaining system utility.

2. Related Work

In view of the above scenarios, we review the existing technologies from three aspects: how the attacker identifies the user’s identity (user identification), how the existing methods protect location privacy (privacy protection), and how to evaluate the system utility under different schemes (system utility).

2.1. User Identification

User identification [19], also known as linking user identities [20] or connecting user identities [21], refers to connecting user identities across multiple social networks by mining user profiles, relationships, and user-generated content (UGC, i.e., user behavior data, such as social network check-ins, blog posts, and shared pictures) from different social networks and associating the accounts of the same natural person on different social networks. According to the type of information that the attacker can obtain, the existing research mainly focuses on user attributes, user relationships, and UGC.

User identification based on user attributes means that an attacker connects user identities based on user profiles (mainly user names). Because individuals are accustomed to using the same or similar user names, user names, as identifiers that uniquely identify a user, are often used to match users’ accounts across different social networks. Zafarani et al. [21] found that the user homepage URL usually contains the user name and that users tend to add a prefix or suffix to an existing user name to form a new one. This means that these different user names belong to the same person. Therefore, Liu et al. [20] considered seven characteristics, including the length of the user name, special characters, and numbers, to determine the user’s identity. In response to this problem, the display name is used to replace the user name and becomes a kind of public information. However, the display name can still be used to identify the user. Li et al. [22] designed a distributed crawler to obtain user profiles containing display names from Foursquare, Facebook, and Twitter and identified users by comparing the features extracted from the display names. Therefore, the user name or display name has become key information for the attacker to identify the user’s identity.

User identification based on user relationship is a method by which an attacker uses the user’s circle of friends to identify multiple different accounts belonging to the same user. The core of this method is to connect users based on overlapping subnets in different social networks and improve user identification accuracy. The higher the degree of subnet overlap is, the higher the identification accuracy is [7, 19]. At present, there are some methods to identify users: related user mining based on prior users, related user mining based on non-priori users [7], and non-priori knowledge user identification algorithm based on friend relationships (FRUI-P) [23]. All these methods show that although the same user has different accounts in different social networks, attackers can still use user relationships to infer that these accounts belong to the same user.

User identification based on UGC mainly uses user behavior data on social networks for cross-social network user identification. Li et al. [24] mined the similarity of space (extracting location), time (extracting timestamp), and content features (counting semantic similarity and the number of identical words in text content) from UGC and then used supervised machine learning method to match the user accounts. Zhang et al. [25] analyzed the user’s spoken language, content complexity, content standardization, and the characteristics of user pictures and user time series in multimedia content for the text content, multimedia content, and time series content published by users. And then they proposed text content analysis and identification methods, multimedia content identification methods, and time series content identification methods to identify user organizations/personal identities. As a result, UGC has become the key information for identifying users.

In addition, some scholars have tried user identification methods that combine user attributes, user relationships, and UGC content [9, 26, 27] to identify users’ multidimensional identity feature information, in order to solve the drawbacks of using a single attribute to identify user accounts. Although the method of combining user’s multiple attributes will result in a high degree of sparse data and a high degree of complexity in extracting features, it can extract more comprehensive user characteristics and increase the probability of recognizing a user’s identity.

2.2. Location Privacy Protection

Preventing a user’s identity from being associated with a precise location is the approach commonly taken by current location privacy protection technologies. These techniques fall into three categories: the obfuscation method, the dummy method, and the pseudonym method.

The obfuscation method [28–30] protects the user’s location privacy by using imprecise locations or areas instead of the user’s real or precise location. It requires users to submit imprecise locations or areas to the server; for example, Gedik et al. [28] submit an area, and Gedik et al. [29] submit an imprecise location. However, in the review publication scenario, the location of each business is precise. Therefore, the obfuscation method is not suitable for protecting location privacy in this scenario.

The dummy method [16, 31, 32] usually adds false users or false locations to achieve anonymity. For example, Li et al. [16] added fake locations, and Niu et al. [32] added fake users to achieve anonymity. However, in the review publication scenario, if a user has not visited a business, the user is not allowed to publish reviews on the business’s services. Therefore, the dummy method is not suitable for protecting location privacy in this scenario.

The pseudonym [15] realizes privacy protection by replacing the user’s identity identifier with a pseudonym. The basic assumption of this method is that the identity identifier is the only information that can be used by an attacker to identify a user’s identity. For example, to a certain extent, display name can be regarded as a pseudonym that replaces the user’s identity identifier. User reviews usually include photos, real-time location, and personal information. According to the aforementioned analysis, an attacker can identify the user’s identity through CUIA. Li et al. [14] and Zhang et al. [33] and others have proposed privacy protection methods to balance the needs of users for such information and privacy protection. However, in the review publication scenario, we cannot protect user privacy only by replacing the user ID.

Zheng et al. [5] pointed out that only when the user’s location obtained by the attacker exceeds the threshold will the user’s privacy be disclosed. Therefore, Zheng et al. [5] and Yang et al. [6] proposed two mechanisms combining partial publication and anonymous publication. These two mechanisms protect user privacy by reducing the number of public reviews published by users. However, they did not anonymize the display name and avatar in the public reviews, making it easy for attackers to use this information to carry out cross-platform CUIA to obtain more private information from users.

2.3. Usefulness of Review

Online consumer reviews (OCRs), known as electronic word-of-mouth (WOM), are the experience, usefulness, performance about the business, brand, product, service, etc., published on the Internet by consumers [34]. Good OCRs can effectively help consumers make choices without being familiar with the business, brand, product, or service. To study the mechanism by which the OCRs influence consumer purchasing behavior, Sussman et al. [17] proposed an information adoption model, as shown in Figure 1. Information usability is an important factor in determining consumers’ adoption of information. Therefore, we usually use usefulness to measure the effectiveness of reviews, which is embodied in two aspects: review quality and source credibility.

Generally speaking, the evaluation criteria of review quality [35] include the review content, as well as the accuracy, relevance, timeliness, and length of the description of the review content about businesses and products. In general, the more accurate, relevant, timely, and detailed the description is, the higher the quality of the review is. Existing CSLBSSs typically use ratings (e.g., 5 stars), scores (e.g., 10 points), and thumb-up counts (a thumb-up denotes agreement with the content described in the review) to measure a reviewer’s approval of businesses and products [5, 6].

For source credibility, since it is difficult to judge whether the reviewer is credible, the participants judge the credibility of the reviewer’s source by obtaining the reviewer’s profiles from OCRs. In addition, out of consideration of different interests, the CSLBSS platform, businesses, and consumers intentionally publish some false reviews [36]. This also reduces the credibility of the source to a certain extent. Xu et al. [37] pointed out that profiles, such as photos and reputation [38], can improve the credibility of the reviewer perceived by the recipient. The more profiles disclosed by the reviewer, the higher the credibility is, and the easier it is for consumers to adopt the review [18].

Starting from the privacy risks of the CSLBSS, we studied the strong privacy protection mechanism of partial publication (reviews that meet the publication conditions are anonymous) and how to ensure the system utility under this strong privacy protection mechanism.

3. Preliminary

3.1. Statistical Inference Attack

In typical LBSs, to enjoy the service, the user needs to submit a service request, containing ID, location, POI, etc., to the LBS service provider (LSP). The massive service requests make the user vulnerable to statistical inference attack. For example, LSA attacks and RSA attacks [15] are defined as collecting historical query information of target users, analyzing the geographical distribution probability and time period distribution probability of their query, and inferring the area where their family and company are located, including personal preferences and living habits. In CSLBSSs, each review corresponds to a business, and the business uniquely corresponds to a physical address. The distribution probability of user reviews can also reflect user’s sensitive locations, causing them to also easily suffer from statistical inference attack.

In CSLBSSs, reviews reflect where the user has been (a business corresponds to a unique geographic location, and a user review on a business means that he or she has been there or has related experience). Although CSLBSSs crowdsource reviews, users are still subject to statistical inference attacks, since the probability distribution of reviews can reflect their sensitive locations. For example, suppose a user reviews a Chinese restaurant in a mall at noon every day for a long time. If there are companies in the mall’s office buildings, it is likely that the target user works in one of the companies and is an employee who loves Chinese food. For statistical inference attacks, Zheng et al. [5] pointed out that adversaries can infer a user’s privacy based on the distribution of reviews. Then, Yang et al. [6] proposed the IEPP mechanism. In this mechanism, users with similar probabilities of reviews in an area are allowed to publish their reviews. For example, users A and B are allowed to publish reviews if the probability of publishing reviews in area $R$ is in the range $[p_l, p_u]$. However, neither of the two mechanisms takes into account the behavior patterns of users that characterize user behavior over the long term. Consider the anonymous group containing 3 users: A, B, and C. Assume their probabilities of publishing reviews in area $R$ are in the range $[p_l, p_u]$ during the period $T$. Then, we can formalize their probabilities of publishing reviews as $p_A^{T}, p_B^{T}, p_C^{T} \in [p_l, p_u]$.

When these users publish a large number of reviews over the long term, denoted as $nT$ ($T$ is the period for the system to update the reputation score of users and the business), we can compute the probabilities of publishing reviews, as shown in formula (1).

Since individual behavior patterns are not exactly the same, $p_A^{nT}$, $p_B^{nT}$, and $p_C^{nT}$ are not exactly the same at the time $nT$. For example, if $p_A^{nT} > p_B^{nT}$ and $p_A^{nT} > p_C^{nT}$, the adversary will find that user A has a higher probability of publishing reviews in area $R$ and infer that $R$ is the sensitive area of A.
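
The long-term comparison in this example can be sketched in a few lines. The sketch assumes that the probability of publishing reviews in an area is estimated as the empirical share of a user’s public reviews located there; this estimator, and the names `area_probabilities` and `most_exposed_user`, are illustrative assumptions rather than the paper’s formula.

```python
from collections import Counter


def area_probabilities(review_areas):
    """Empirical share of a user's public reviews that fall in each area."""
    counts = Counter(review_areas)
    total = len(review_areas)
    return {area: n / total for area, n in counts.items()}


def most_exposed_user(histories, area):
    """Return the user whose review share in `area` is largest, plus all shares."""
    shares = {
        user: area_probabilities(history).get(area, 0.0)
        for user, history in histories.items()
    }
    return max(shares, key=shares.get), shares
```

With three users whose short-term shares look alike, a long history such as eight of ten reviews in one mall for A against five of ten for B makes A stand out, which is the inference the adversary draws.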

3.2. Connecting User Identities Attack

In our scenario, by analyzing user personal information, relationships, and UGC, adversaries can launch CUIA to link user identities across social networks and obtain user privacy. The CUIA discussed in our paper includes two types: CUIA on the same social network (SSN-CUIA for short) and CUIA across social networks (ASN-CUIA for short). SSN-CUIA, as a special case of ASN-CUIA, refers to mining and analyzing all the information of a user on the same social network to determine as much of the user’s personal information as possible. To protect privacy, most users use pseudonyms and fake avatars when publishing reviews. However, privacy will inevitably be leaked when the user publishes reviews that include photos, organization names, etc. Because the user profiles on a single social network are limited, an adversary who wants to obtain more dimensions of the user’s profile needs to launch ASN-CUIA to link the user identities. Figure 2 shows the specific process of ASN-CUIA.

When user $u$ publishes a review in a CSLBSS, the adversary first obtains their attributes $A_u = \{a_1, a_2, \ldots, a_m\}$ ($a_j$ is the j-th attribute of $u$) on the CSLBSS based on the review. Let $u^{k}$ be the user on the social network $SN_k$. In the first stage, the adversary links user identities by obtaining the user’s information on other social networks. In this stage, the adversary uses the user’s display name as the keyword to search for the user’s information across different social networks, which is represented as $\{u^{1}, u^{2}, \ldots\}$. The retrieved information is highly similar to the display name and is likely to belong to the same user. Then, for each $u^{k}$, the adversary crawls the attributes of $u^{k}$ on $SN_k$ and connects identities for the first time. Through these processes, the adversary finally obtains consistent and higher-dimensional attributes of the user across different social networks. To distinguish them from the original attributes, the attributes finally obtained in the first stage are represented as $A_u'$. In the second stage, the adversary extracts the key attributes of $A_u'$, such as e-mail, phone number, QQ, and WeChat, and conducts the next round of search and user identification. In this stage, through data mining and analysis, the adversary can further obtain and determine attributes (represented as $A_u''$) with a higher level of user privacy than in the first stage.

3.3. The Consistency between the User Rating and the Business Reputation Score

Based on the above statement, consumers still consider a review credible based on the consistency between the user rating and the business reputation score (short for score consistency) when the real identities of most users have not been disclosed. In this section, we define the score consistency as follows.

Definition 1. (Score Consistency): assume the rating of user $u_i$ for business $b_j$ is $r_{i,j}^{t}$ at time $t$ and the score of business reputation is $S_j^{t}$. Then the score consistency of $r_{i,j}^{t}$ and $S_j^{t}$ refers to the difference between $r_{i,j}^{t}$ and $S_j^{t}$, which is expressed as

$$\Delta_{i,j}^{t} = \left| r_{i,j}^{t} - S_j^{t} \right|. \tag{2}$$

For simplicity, we abbreviate $\Delta_{i,j}^{t}$ as $\Delta$.
Generally speaking, a smaller $\Delta$ means a smaller difference between $r_{i,j}^{t}$ and $S_j^{t}$; namely, the user rating and the business reputation score are more consistent. According to [18], a smaller $\Delta$ also means higher credibility of the review. Next, we clarify this conclusion from the perspective of benefit.
Specifically, the reviews for businesses are divided into three categories: false-positive reviews, false-negative reviews, and real reviews. A false-positive review means that the business induces or hires users to give ratings that are significantly higher than the business reputation score in order to improve its own reputation score; that is, $\Delta$ is greater. A false-negative review means that consumers unfairly give a business a lower rating; that is, $\Delta$ is greater. A real review means that the user rating is basically the same as the business reputation score; that is, $\Delta$ is smaller. Therefore, the smaller $\Delta$ is, the closer the user rating is to the business reputation score and the more credible consumers consider a review. Note that the business reputation score reflects the majority of consumers’ ratings for the service of a business. Therefore, it can reflect the real service quality of the business more objectively than a single user rating. The extremely low or extremely high user ratings given by a very small number of users, although not false reviews, cannot reflect the real service quality of the business due to the large score difference. Therefore, in reality, they will not affect consumer decisions. That is, a review with a higher $\Delta$ has low credibility.
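
As a worked illustration, take the score difference to be the absolute difference between a rating $r$ and the business reputation score $S$, written $\Delta = |r - S|$. The absolute value, the 5-star scale, the cut-off $\delta = 1$, and all numbers below are assumptions chosen for illustration and do not come from the paper.

```latex
\begin{align*}
S = 4.2:\quad
  & r = 4.5 \;\Rightarrow\; \Delta = |4.5 - 4.2| = 0.3 \le \delta
    && \text{(consistent, treated as a real review)} \\
  & r = 1.0 \;\Rightarrow\; \Delta = |1.0 - 4.2| = 3.2 > \delta
    && \text{(large deviation, low credibility)}
\end{align*}
```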

3.4. Voting Decision Rule

Yang et al. [6] used the Voting Decision Rule [39] to improve the usefulness of reviews. In the current CSLBSSs, a 10-point scale or a 5-star rating is used to rate businesses, and the most commonly used is the 5-star rating. Our paper uses the 5-star rating to rate the business service. Let $r_{i,j}$ be the user rating with which user $u_i$ scores business $b_j$ and $\tau$ be the threshold at which each user agrees to recommend a business to other users. $u_i$ makes a decision $d_i$ (use 1 for approval and 0 for opposition) on whether or not to agree to recommend a business to other users, denoted by $d_i = 1$ (agree) and $d_i = 0$ (not agree). Then, for users $u_i$, $r_{i,j} \ge \tau$ means they agree to recommend and $d_i = 1$. In contrast, $r_{i,j} < \tau$ means they do not agree to recommend and $d_i = 0$.

We consider the case that $u_1$ and $u_2$ rate business $b_j$, respectively. For a constant $\tau$, there exist $r_{1,j} \ge \tau$ and $r_{2,j} < \tau$. However, since $u_1$ and $u_2$ have different subjective experiences with $b_j$’s service, $r_{1,j} \ge \tau$ does not mean that $u_1$ is in favor of recommending $b_j$. Further, the business reputation score is a dynamic process, and a constant $\tau$ cannot reflect the changing process of the business’s service quality. To address the problem, we consider using score consistency as a dynamic indicator to assess the usefulness of review. That is, as long as $\Delta \le \delta$, i.e., $|r_{i,j} - S_j| \le \delta$, where $\delta$ is the threshold on the score difference, the user rating is considered credible and useful. In other words, it also means that $u_i$ approves of $S_j$. Then, the approval or otherwise of the business in [6] becomes approval or otherwise of the current business reputation score.

Based on the dynamic $\Delta$ and the evaluation criteria, the overall binary decision of whether users agree or not is expressed as formula (3) [40].

$$D = \begin{cases} 1, & \sum_{i=1}^{n} d_i \ge k, \\ 0, & \text{otherwise}. \end{cases} \tag{3}$$

Among them, $D$ represents the overall decision. $D = 1$ represents that the choice of at least $k$ users out of $n$ users is 1, and $D = 1$ is the result of approval. This paper uses $k$ to indicate the threshold of approval.
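
The two steps above, a per-user binary decision based on score consistency followed by an at-least-k-out-of-n vote, can be sketched as follows. The function names and the parameter `max_diff` (the score-difference threshold) are assumptions made for illustration; the snippet is a sketch of the rule as described in the text, not the authors’ code.

```python
def individual_decisions(ratings, business_score, max_diff):
    """1 if a rating is within max_diff of the business score, else 0."""
    return [1 if abs(r - business_score) <= max_diff else 0 for r in ratings]


def overall_decision(decisions, k):
    """k-out-of-n rule: approve (1) when at least k individual decisions are 1."""
    return 1 if sum(decisions) >= k else 0
```

For ratings 4.5, 4.0, 1.0, and 5.0 against a business score of 4.2 with `max_diff = 1.0`, three of the four users approve, so the overall decision is approval for k = 3 but not for k = 4.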

3.5. Beta Reputation Mechanism

According to Section 3.4, the user’s choice determines the overall decision based on the number of approvals. Intuitively, everyone’s user reputation score is different, and the credibility of their reviews is also different. To reduce the influence of false-positive and false-negative reviews and improve the credibility of the overall decision, we compute the user reputation score and business reputation score based on Beta Reputation Mechanism [41]. Suppose that at time , publishes a review for and is the user rating. Then, we get the decision vector, as shown in the following formula:where represents the binary decision made by to at time . and represent opinions for approval or refusal, respectively. Then, at time , the rule of global decision is shown in the following formula:

Based on (5), we get the weight vector , as shown in

Here, denotes the weight of the decision made by in the overall decision and it is determined by the user reputation score of before . The formula for calculating the weight is shown inwhere represents the user reputation score of at , and the specific calculation is as in formula (10). According to formula (7), . Here, is a relative value that depends on the user reputation score of users who published reviews for the business at time before the time . Different users who published reviews for have different user reputation score. But for any user among them, the higher the user reputation score at the previous moment is, the greater its weight is, and the greater its impact on the overall decision is.

In addition, we define the positive rating and the negative rating of , as shown inwhere denotes the times of which ’s decision before time is consistent with the global decision . denotes whether ’s decision before time is consistent with the global decision , denoted by (they are not consistent) and (they are consistent). denotes the times of which ’s decision before time is not consistent with the global decision . denotes whether ’s decision before time is not consistent with the global decision , denoted by (they are consistent) and (they are not consistent).
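
The two counts defined above can be sketched as follows. The example assumes that a user's history is available as a list of pairs (own decision, global decision), one pair per earlier review; this representation and the name `consistency_counts` are assumptions for illustration only.

```python
from typing import List, Tuple


def consistency_counts(history: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (positive, negative): how often the user's decision matched
    or did not match the global decision in the given history."""
    positive = sum(1 for own, overall in history if own == overall)
    negative = len(history) - positive
    return positive, negative


# Hypothetical history: the user agreed with the global decision twice, disagreed once.
print(consistency_counts([(1, 1), (0, 1), (0, 0)]))
```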

Then, we can calculate the user’s score at time , as shown in

In formula (10), because each user publishes no reviews in the initial state, it is impossible to judge the consistency of their decision with the overall decision, so we set the initial value of the user reputation score of each user as 0.5. After calculating the user reputation score , the user rating determines the business reputation score. Considering the difference in the influence (i.e., weight) of users’ reviews for business at time on the overall decision (as it is known in formula (5)), we can get the method to calculate the business reputation score of business at time , as shown in

It can be seen from formula (11) that the business reputation score of a business is jointly determined by the business reputation score at time and the weighted sum of the user scores of all users at time for the business.
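
A minimal sketch of the two score updates is given below. The user score uses the standard expected value of the Beta reputation system, (p + 1) / (p + n + 2), which yields 0.5 when no reviews exist and is therefore consistent with the initial value stated above; whether formula (10) has exactly this form is an assumption. The business score is computed as the average of the previous business score and the weighted sum of the current user ratings, which is one reading of the description of formula (11) and is likewise an assumption. Both function names are hypothetical.

```python
from typing import Sequence


def user_reputation(positive: int, negative: int) -> float:
    """Expected value of a Beta distribution with parameters (positive + 1, negative + 1)."""
    return (positive + 1) / (positive + negative + 2)


def business_reputation(prev_score: float,
                        ratings: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Average the previous score with the weighted sum of the current ratings.

    Assumption: the weights already sum to 1.
    """
    if len(ratings) != len(weights):
        raise ValueError("ratings and weights must have the same length")
    weighted_sum = sum(w * r for w, r in zip(weights, ratings))
    return (prev_score + weighted_sum) / 2


print(user_reputation(0, 0))
print(business_reputation(4.0, [5.0, 3.0], [0.5, 0.5]))
```

Averaging with the previous score smooths the business reputation score, so a single batch of extreme ratings cannot move it by more than half of the gap between the old score and the new weighted rating.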

4. Motivation and Model

4.1. Motivation and Basic Idea

In CSLBSSs, users submit reviews to rate businesses, and consumers make decisions based on reviews and the business reputation score. Because the adversary can legally obtain reviews from the CSLBSSs and user profiles from other social networks, user privacy is inevitably leaked through CUIA and SIA. Moreover, the more user profiles the adversary obtains, the more accurately user privacy can be inferred. Although partial publication and anonymity mechanisms can reduce the risk of privacy leakage caused by excessively publishing reviews, users still suffer from CUIA and SIA, especially CUIA. As long as the display name is not anonymized, the adversary can exploit it as a keyword to launch CUIA. However, partial publication and anonymity mechanisms can also reduce the usefulness of reviews and the utility of the system. Besides, consumers need more public reviews and a more objective business reputation score to enjoy better service. Hence, how to balance privacy protection and system utility is a problem that urgently needs to be addressed.

To address the above problems, the basic idea of our paper is to partially publish the reviews whose usefulness is high. At the same time, we anonymize the display name and avatar of all reviews that are allowed to be publicly published. Together, these two mechanisms achieve a balance between privacy protection and system utility. As stated in the Introduction, the system hopes to publish as many reviews that consumers consider credible as possible. Whether consumers consider a review credible depends on the usefulness of the review, which is measured by score consistency. The partial publication mechanism publishes reviews whose usefulness is high and suppresses reviews whose usefulness is low. The complete anonymity mechanism ensures that the adversary cannot infer user privacy by exploiting the display name to launch CUIA. Therefore, the basic idea can protect user privacy by resisting CUIA and SIA while improving system utility.

4.2. Threat Model

The goal of this paper is to protect user privacy while improving system utility. We do this by a partial publication mechanism which anonymously publishes reviews with a high score consistency. In our threat model, the CSLBSSs are entities of “honest but curious”. In other words, the CSLBSSs execute the agreement honestly, but they also collect and analyze users’ data curiously and infer the POIs that users have visited to provide users with better personalized recommendation services. The CSLBSSs have no subjective maliciousness. However, to further attract users through the social relationship, CSLBSSs tend to establish a personal homepage for each user, which makes it easy for the adversary to get all reviews and personal profiles. By analyzing users’ data, the adversary can infer the user’s privacy. In addition, we assume that CSLBSSs are credible and cannot be hacked, which is also the basic trust or agreement between users and service providers.

Furthermore, adversaries are subjective, malicious, and highly motivated entities. Their goals are to infer as much privacy as possible about the users, including real identity and organization. The adversary may be any user who can access CSLBSSs. In our attack model, the adversaries can use any tools and methods to collect user reviews on the CSLBSSs, as well as the personal profiles on other social networks including demographic information. They use this information as background knowledge to implement CUIA and SIA such that they can identify as much real personal information of users as possible.

In this paper, the user’s privacy we consider mainly includes location privacy, identity privacy, and preference privacy. The user’s privacy is considered threatened if the following conditions are met:
(1) The adversary can directly or indirectly infer areas where users frequently visit and can even determine accurate information such as their home address and workplace
(2) The adversary directly or indirectly infers the personal profiles such as real name, photo, and phone number
(3) The adversary directly or indirectly infers the preference information such as consumption habits and behavior habits

4.3. System Model

In this paper, the system model consists of four parts: user/business, CSLBSS, reputation system, and review list. The general business data process includes 5 steps, as shown in Figure 3. Among them, the business publishes services on CSLBSS, and the user obtains services from CSLBSS and reviews the business’s services. CSLBSS is the management center of the entire system and mainly contains two functions: (1) provide a service interface for the users/business (including registration, login, and review) and (2) provide the users/business information and review information to the reputation system. The reputation system calculates the user reputation score and the business reputation score according to the user/business information and review information from the CSLBSS to determine the status of the review (published or not, public or anonymous) and give feedback of the calculation results to CSLBSS. The review list shows the calculation results from the reputation system which includes the user reputation score, the business reputation score, and the review content.
(i) User/business: in the CSLBSS, there are businesses, denoted as . Each business has a unique location denoted by coordinate , which is the longitude and latitude. There are users, denoted as . Each user publishes a review and gives a user rating (e.g., score or star rating, denoted as ) for . Generally speaking, users need to register an account on the system and successfully log in to obtain services and publish reviews. To ensure the authenticity of the user, a mobile phone number or Short Messaging Service (SMS) verification code is required when registering. At the same time, users are allowed to customize their pseudonym (namely the user name displayed in the review list, also called display name), instead of the real name to uniquely identify the user, so as to protect the privacy of the user’s identity. In this paper, for reasons such as user naming preference and reauthentication complexity, we assume that each user will not change the pseudonym for a long time. It is agreed that each user can only review on relevant experience to ensure the objectivity and authenticity of the review.
(ii) CSLBSS: the CSLBSS mainly includes 3 functions: (1) CSLBSS provides users with interfaces to register, log in, obtain services and reviews, store and update users’ pseudonyms, and bound mobile phone numbers, avatars, reviews, and the user reputation score and other profiles. It also provides a platform for the business to display products, store and update the business reputation score. (2) The CSLBSS provides the user reputation score, the business reputation score, and user review to the reputation system and updates relevant information about users and businesses in real-time. (3) Based on the user reputation score, the business reputation score, the score consistency, and the threshold , the CSLBSS selects reviews that meet the publication criteria and filters out some reviews with low usefulness and credibility. In addition, the published reviews are sorted according to and the user reputation score.
(iii) Reputation system: the reputation system is the core computing part of the entire system. It is responsible for calculating the business reputation score based on user rating and the user reputation score based on the usefulness of the review. Then, it determines the objectivity of the review and whether it will be published based on the score consistency. In a cycle, the reputation system will perform the above process to update the business reputation score, the user reputation score, and the status of reviews.
(iv) Review list: CSLBSS publishes the calculation results of the reputation system and sorts the reviews by the usefulness of review in the form of a web page, namely a review list for users to use. Therefore, the review list is the page display of the reputation system and an important reference for users when they consume. As shown in Figure 3, the review list contains the business reputation score and user reviews that can directly reflect the quality of business services, as well as the user reputation score that indirectly affects users’ acceptance of reviews (indicated by user membership levels in the figure).

5. Privacy Protection Framework and Core Algorithms

5.1. Privacy Protection Framework

Zheng et al. [5] and Yang et al. [6] pointed out that users are more willing to publicly publish as many reviews as possible to obtain a higher user reputation score. However, users will suffer CUIA and SIA if any one public review discloses the display name. Therefore, users need to publicly publish as many reviews as possible without disclosing the display name. In this paper, we propose two mechanisms to address the above problem: (1) the partial publication mechanism. This mechanism publishes the reviews of which the usefulness of review is high and suppresses reviews of which the usefulness of review is low according to . Specifically, we give two thresholds and for the score consistency. is related to voting decision rule and the overall decision. For a review, if there exists , consumers will consider the user rating and the business reputation score highly consistent. That is, consumers consider the business reputation score credible. determines which reviews are published publicly. A review can be published anonymously if . A review, whose user rating is not objective enough and that has low reference value to consumers, cannot be published if . On the one hand, reviews whose not being published can reduce disclosure of user reviews and privacy risks. On the other hand, reviews whose being retained can improve the usefulness of reviews that are publicly published. (2) Anonymize display names and avatars in publicly published reviews that meet the publication conditions. We find that the display name is the key factor of privacy leakage in review publication, and anonymous treatment of display name can prevent privacy leakage caused by display name in review publication scenario from the root.

Furthermore, we first sort the publicly published reviews according to the score consistency . Then, we sort the reviews with the same value of according to the user reputation score. The purpose of this step is to ensure that useful and reliable reviews are ranked first. The reason why is the main keyword for ranking is that it best reflects the degree of the score consistency. The user reputation score is calculated based on the reviews for different businesses. It reflects the degree of consistency between the user’s decision and the overall decision and does not reflect the degree of consistency between a specific user rating and the business reputation score. Moreover, even if the user reputation score is high, it does not mean that a certain user rating can accurately reflect the business service. However, if is the same, the higher the user reputation score is, the higher the usefulness and reliability are. Therefore, in this paper, the score consistency is used as the primary keyword for ranking, and the user reputation score is used as the secondary keyword for ranking. The specific privacy protection process is shown in Figure 4.

As shown in Figure 4, the key part of the privacy protection framework includes calculating the score difference and judging whether the privacy metrics are met. If the privacy metrics are not met, the review will not be published; if the privacy metrics are met, the review can be published after being anonymized. Next step, the system needs to update the user reputation score and the business reputation score and publish review list.

5.2. Core Algorithm
5.2.1. Calculate the Score Difference

In this paper, the score difference is a key parameter for realizing privacy protection and improving the usefulness of reviews, and it is calculated by the reputation system. At time , the reputation system first obtains the user rating from CSLBSS and then obtains the business reputation score at time . It then calculates the score difference . The specific process is shown in Algorithm 1.

(1)Obtain ;
(2)Obtain;
(3);
(4)while do
(i)
(5)return ;
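
The steps of Algorithm 1 can be sketched in Python as follows. The sketch assumes that the score difference is the absolute difference between each user rating and the business reputation score supplied by the reputation system; the function and parameter names are hypothetical.

```python
from typing import List, Sequence


def score_differences(ratings: Sequence[float], business_score: float) -> List[float]:
    """Return the absolute difference between each rating and the business score."""
    # Assumption: the score difference is |rating - business reputation score|.
    return [abs(rating - business_score) for rating in ratings]


# Three hypothetical ratings for a business whose reputation score is 3.5.
print(score_differences([5.0, 3.0, 1.0], 3.5))
```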
5.2.2. Determine the Privacy Metric

In this process, determining privacy metric is mainly to determine which reviews can be published according to the threshold . In our paper, whether a review can be published depends on the score difference being bound to the threshold . In the process, we first obtain and observe whether exceeds . For a review, if , we will publish it publicly, namely any user can browse it on the CSLBSS. If , we will not publish it publicly; namely, only the user who published it can browse it on his own page. Note that the reviews published in our paper are anonymous, so the threshold is used to roughly identify false or extreme reviews (nonobjective reviews) and filter those with low usefulness of review to reduce excessive disclosure of user privacy while improving the usefulness of review that can be published. Therefore, is an empirical parameter and is relatively larger than . In other words, in this case, there will be more reviews that satisfy the condition. Relative to , the threshold determines whether a user approves of the business reputation score. The smaller the is, the more the user approves of the review. If , the user will approve of the business reputation score. That is, the user considers the current business reputation score objective; otherwise, the user will make an opposing decision. That is, the user considers the current business reputation score inconsistent with the actual service. The specific privacy metric determination algorithm is shown in Algorithm 2.

(1)Obtain the set of the score difference and the threshold ;
(2), ;
(3)while do
 if
  ;//the set of the score difference that satisfies the privacy metric;
  ;//the number of published reviews
;
(4)Obtain all reviews corresponding to the score difference in and get the set of reviews needed to be published;
(5)return ;
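
A minimal sketch of Algorithm 2 is shown below. It assumes that each review is represented as a pair (review identifier, score difference) and that a review satisfies the privacy metric when its score difference does not exceed the threshold; the non-strict comparison and the names `select_publishable` and `theta` are assumptions for this example.

```python
from typing import List, Tuple

Review = Tuple[str, float]  # (review identifier, score difference)


def select_publishable(reviews: List[Review], theta: float) -> Tuple[List[Review], int]:
    """Return the reviews whose score difference is within theta, and their count."""
    selected = [review for review in reviews if review[1] <= theta]
    return selected, len(selected)


# Hypothetical reviews; with theta = 2.0 the review r3 is suppressed.
published, count = select_publishable([("r1", 0.5), ("r2", 2.0), ("r3", 3.5)], theta=2.0)
print(published, count)
```

A suppressed review is not deleted in the scheme described above: it remains visible to its author, so the filter only decides what appears in the public review list.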
5.2.3. Update the User Reputation Score and the Business Reputation Score

The process of updating the user reputation score and the business reputation score mainly involves calculating the user reputation score and the business reputation score at time . The user reputation score reflects the objectivity of the review for the business’s service, which is represented by the ratio of the number of times that the user’s decision is consistent with the overall decision to the total number of decisions made by the user. The business reputation score reflects the quality of the business’s service. It is the average of the business reputation score at time and the weighted sum of the user ratings at time . The specific process is shown in Algorithm 3.

(1)Obtain the set of the user reputation score at time;
(2)Obtain and ;
(3);
(4)while do
(5);
(6)while do
; before time ;
(7)Calculate the business reputation score of at time ;
(8)Return ;
5.2.4. Publish Review List

Publishing the review list is the final stage of privacy protection. After determining the privacy metric, the system has determined which reviews can be published. Our next step is to anonymize the display name and avatar and sort the review list to be published before publishing. In this paper, we first use uniform characters and pictures to replace the user’s original display name and avatar, so as to protect user privacy while reducing computational complexity. Then, according to , we sort publicly published user reviews and get a new review list. For the reviews whose score difference is the same, we then sort them according to the user reputation score and get a new review list. Through these two sorts, we can improve the usefulness of review. Finally, we publish the review list publicly; namely, any user can browse it on the CSLBSS. The specific process is shown in Algorithm 4.

(1)Input ,;
(2)According to the , sort the and get the new review list;
(3)According to the ,sort the reviews whose score difference is the same in and get the new review list;
(4)Return ;
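
The anonymization and two-key sort of Algorithm 4 can be sketched as follows. The record layout, the placeholder strings `ANON_NAME` and `ANON_AVATAR`, and the function name are hypothetical. Reviews are sorted by ascending score difference, since a smaller difference indicates a more credible review, and ties are broken by descending user reputation score.

```python
from dataclasses import dataclass, replace
from typing import List

ANON_NAME = "Anonymous user"        # uniform characters shown for every reviewer
ANON_AVATAR = "default_avatar.png"  # uniform picture shown for every reviewer


@dataclass(frozen=True)
class Review:
    display_name: str
    avatar: str
    text: str
    score_diff: float
    user_reputation: float


def publish_review_list(reviews: List[Review]) -> List[Review]:
    """Anonymize each review, then sort by score difference and user reputation."""
    anonymized = [replace(r, display_name=ANON_NAME, avatar=ANON_AVATAR) for r in reviews]
    # Primary key: smaller score difference first; secondary key: higher reputation first.
    return sorted(anonymized, key=lambda r: (r.score_diff, -r.user_reputation))


reviews = [
    Review("alice", "a.png", "Good noodles", 0.5, 0.6),
    Review("bob", "b.png", "Slow service", 1.5, 0.9),
    Review("carol", "c.png", "Friendly staff", 0.5, 0.8),
]
for review in publish_review_list(reviews):
    print(review.display_name, review.text, review.score_diff, review.user_reputation)
```

Because every published review carries the same name and avatar, the published list alone no longer provides a display name that could serve as a search keyword.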

6. Scheme Analysis

In this section, we focus on the privacy protection effect and the usefulness of the review of our scheme.

6.1. Privacy Protection Effect

Essentially, a user can be represented by a series of attributes that characterize the user’s characteristics, expressed as . Assume that there are attributes that can be collected across different social networks and the total number of user attributes is , and . Therefore, the adversary identifying a user is to identify the user attributes. For an individual user, the more publicly published reviews are and the more attributes that are collected, the higher the probability that the individual user will be identified. For the CSLBSS, the more publicly published user reviews are, the more users will be identified and the higher the probability of the recognition rate of the CSLBSS. In this paper, we use public publication rate , user attribute recognition rate , and user recognition rate to describe the privacy protection effect.

Definition 2. (Public Publish Rate) : we define the ratio of the number of publicly published reviews to the total number of reviews as the , expressed as where and represent the total number of reviews publicly published and the total number of reviews published by users, respectively.

Definition 3. (user attribute recognition rate). For a user, each of their attributes contains different privacy information, and we use the privacy level to express this difference. In this paper, we introduce the attribute weight to represent the privacy level. Then, the user attribute recognition rate is defined asHere, represents the privacy information contained in the attributes that can be collected across different social networks. represents the attribute weight corresponding to attribute . Whether the adversary has obtained the user’s attribute information is denoted by (i.e., adversary obtains user’s attribute information ) and (adversary does not obtain user’s attribute information ). represents the total amount of privacy information contained in all attributes, and . Then, formula (13) can be further simplified as

Definition 4. (user recognition rate). Assume that a user is identified if . Let the total number of users be and the number of users who has been identified be . Then, we define the user recognition rate as
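
Definitions 3 and 4 can be illustrated with a short Python sketch. It assumes that the user attribute recognition rate is the weight of the attributes obtained by the adversary divided by the total weight of all attributes, and that a user counts as identified when this rate reaches a threshold. The attribute weights, the threshold value, and the function names below are hypothetical.

```python
from typing import Mapping, Sequence, Set


def attribute_recognition_rate(weights: Mapping[str, float], obtained: Set[str]) -> float:
    """Share of the total attribute weight that the adversary has obtained."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total attribute weight must be positive")
    leaked = sum(w for attribute, w in weights.items() if attribute in obtained)
    return leaked / total


def user_recognition_rate(rates: Sequence[float], threshold: float) -> float:
    """Share of users whose attribute recognition rate reaches the threshold."""
    if not rates:
        return 0.0
    identified = sum(1 for rate in rates if rate >= threshold)
    return identified / len(rates)


# Hypothetical weights for three of the attributes considered in this paper.
weights = {"display name": 5.0, "gender": 1.0, "home address": 4.0}
print(attribute_recognition_rate(weights, {"display name"}))
print(user_recognition_rate([0.5, 0.2, 0.9, 0.1], threshold=0.5))
```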

Conclusion 1. For the same user, the larger the is, the more privacy of the user is leaked and the higher the probability of the user being identified is.

Proof. Based on the aforementioned analysis, the adversary can first obtain the set of attributes of on the . Let contain attributes and the corresponding privacy be denoted as . Then, the adversary searches for the display name of on the Internet and gets the set of attributes that can be visited. Let contain attributes and the corresponding privacy be denoted as . We can conclude that ( represents all the privacy of the user) due to , and then the is greater. In particular, all attributes of will be found if . In other words, if all the privacy of is leaked, the adversary can accurately identify them. Therefore, Conclusion 1 holds. Note that, although we use the CUIA as an example to verify the risk of privacy leakage, the obtained privacy is a conservative estimate. That is, the amount of privacy information leaked is at least . If there exists a better attack tool or method, more private information will be leaked.

Conclusion 2. In a CSLBSS, the greater the is, the more users will be identified. The greater the is, the greater is the risk that privacy will be compromised.

Proof. We consider CSLBSS with different , i.e., and . The number of users who have been identified and the corresponding to is and , respectively. The number of users who have been identified and the URR corresponding to is and , respectively. For the same privacy protection scheme, we can easily get the conclusion , and then , namely . Therefore, it can be proved that Conclusion 2 is true.
According to the derivation process of Conclusions 1 and 2, we know that there will be a potential risk of privacy leakage, as long as the user’s display name is disclosed (that is, the review is publicly published). In addition, even if all the display names of the user have not been disclosed, the attacker can obtain some personal privacy information from the review content. Therefore, we adopt a strong privacy protection mechanism of complete anonymity and partial publication. That is, user reviews that meet the threshold range of the score difference are published publicly, and the user’s display name and avatar must be anonymized before publication.

Conclusion 3. The privacy protection framework proposed in this paper can effectively reduce the risk of privacy leakage.

Proof. For the individual user, the user attribute set obtained without using the privacy protection framework proposed in this paper is , and the corresponding privacy information is . All reviews are processed anonymously and have no personalized characteristics after using our privacy protection framework. It is difficult to identify the regional characteristics, namely the distribution characteristics of the published reviews, and they are not vulnerable to SIA. In addition, the display name is anonymized. The adversary cannot implement CUIA based on attributes that can uniquely identify users, and the user information across social networks is not easy to obtain. Assume that the adversary can obtain the user attribute set and the corresponding privacy information is when using our privacy protection framework. Then, and . For a CSLBSS without using our privacy protection framework, the number of users that have revealed the display name is . The number of users who have been identified and the corresponding to the CSLBSS is and , respectively. For a CSLBSS using our privacy protection framework, the number of users that has revealed the display name is . The number of users who have been identified and the corresponding to it is and , respectively. We can easily conclude that . Then, there will be ; namely . Therefore, the privacy protection framework proposed in this paper can resist CUIA and SIA and can effectively reduce the risk of privacy leakage.

6.2. The Usefulness of Review

The factors affecting the usefulness of a review include the accuracy of the review content, relevance, timeliness, length, number of reviews, and reliability of the source. Considering the difference in the users’ subjective experience, it is difficult to ensure the accuracy, objectivity, and reliability of individual reviews [18]. Therefore, this paper chooses two indicators, i.e., the score difference and the user reputation score , to judge the usefulness of a review. The business reputation score is the overall score of a business by all users at a certain time and reflects the overall evaluation of the business services by historical users. The score difference reflects the difference between individual user rating and overall evaluation. In other words, it reflects whether a user rating is objective and credible. The lower the is, the more objective the score is and the more credible the review is. Furthermore, the user reputation score reflects a user’s reputation. Using it to measure the usefulness of review can better reflect the objectivity and credibility of a review. In addition, and are the primary keyword and the secondary keyword in sorting the usefulness of review, respectively. The reason is that reflects the reputation of the overall credibility of an individual user, but it does not mean that each review of the individual user is objective and credible. However, for the same , the higher the is, the more objective and credible the user’s current review is.

7. Evaluation

In this section, we evaluate the utility and privacy of CSPPM on the real review datasets.

7.1. Dataset

We use two datasets, i.e., Dataset1 and Dataset2, to evaluate our scheme. Dataset1 is collected from Dianping and contains businesses, user data, and reviews; the reviews contain text only and no pictures. It has been used in many scenarios [42]. In this paper, we use it to evaluate our scheme in terms of utility and privacy. The shortcoming of Dataset1 is that its reviews do not contain pictures, so it cannot be used to evaluate the ability of our solution to resist CUIA and SIA. Therefore, as a supplement to Dataset1, we design Dataset2 based on Dataset1. We randomly select 10 businesses and 560 different users from Dataset1. Then, we crawl the 560 users’ reviews, including pictures, from Dianping and their profiles from other websites to evaluate the ability to resist attacks. All these data make up Dataset2. The statistical information of Dataset1 and Dataset2 is shown in Tables 1–3.

7.2. Evaluation Metric

In CSLBSS, both users and businesses desire as many highly credible reviews as possible to be published. The goal of the business is that more reviews will attract more consumers. The goal of the user is that more reviews will build more objective reputations for businesses while protecting their privacy. Therefore, we evaluate our scheme with respect to three metrics: system utility, user utility, and privacy. Among existing studies, both LPA [5] and IEPP [6] are methods that protect user privacy in the scenario of review publication. Therefore, we select them for comparison with our scheme.

7.2.1. System Utility Metric

In the scenario of review publication, a basic fact is that the more public (nonanonymous) reviews a user has published, the more private information is leaked. To publish as many reviews as possible while protecting user privacy, a feasible idea needs to meet two requirements: (1) limit the number of reviews published publicly by each user; (2) each user publishes the maximum number of reviews allowed for (1). Therefore, all methods (i.e., our and [5, 6]) set different thresholds (the maximum number of reviews that each user is allowed to publish publicly) to implement this idea. Based on this idea, if a threshold is given, the system utility will depend on the number of reviews submitted by each user. For example, suppose the threshold is 3; that is, each user can publish no more than 3 reviews. In this case, the more reviews a user submits, the more reviews will be suppressed, and the lower the system utility will be. In other words, when users submit different numbers of reviews, the greater the difference in the ratio that the reviews are publicly published (Public Publish Rate, ), the lower the system utility is. Therefore, we use the to measure the system utility. Considering the different thresholds set by different methods, we analyze the impact of different thresholds on the difference in the under the same method.

Based on the above analysis, we divide the administrative area covered by Dataset1 into a grid. For LPA, we set the thresholds of the total number of reviews published publicly as 60, 70, 80, 90, 100, 110, 120, and 130, respectively. For IEPP, we set the threshold interval of user similarity as [1/2,2], [1/2,3], [1/2,4], and [1/2,5], respectively. For CSPPM, we set the thresholds for the score difference as 0.5, 1, 1.5, 2, 2.5, and 3.5, respectively. In addition, to evaluate the impact of the number of reviews submitted by individual users, Dataset1 is divided into Dataset3 (the number of reviews submitted by each user is less than 4) and Dataset4 (the number of reviews submitted by each user is not less than 4).
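
The split into Dataset3 and Dataset4 and the resulting public publish rate for one score-difference threshold can be sketched as follows. The sketch assumes that each review is a pair (user identifier, score difference) and that a review is published when its score difference does not exceed the threshold; the data and the function names are hypothetical and do not come from Dataset1.

```python
from collections import Counter
from typing import List, Tuple

Review = Tuple[str, float]  # (user identifier, score difference)


def split_by_review_count(reviews: List[Review], cutoff: int = 4) -> Tuple[List[Review], List[Review]]:
    """Split reviews by whether their author submitted fewer than `cutoff` reviews."""
    counts = Counter(user for user, _ in reviews)
    few = [r for r in reviews if counts[r[0]] < cutoff]
    many = [r for r in reviews if counts[r[0]] >= cutoff]
    return few, many


def public_publish_rate(reviews: List[Review], theta: float) -> float:
    """Ratio of publicly published reviews to all submitted reviews."""
    if not reviews:
        return 0.0
    published = sum(1 for _, diff in reviews if diff <= theta)
    return published / len(reviews)


reviews = [("a", 0.2), ("a", 1.8), ("b", 0.4), ("b", 0.1), ("b", 2.6), ("b", 0.9)]
few, many = split_by_review_count(reviews)
print(public_publish_rate(few, 1.0), public_publish_rate(many, 1.0))
```

Computing the rate separately on both parts makes it possible to see whether a threshold treats users with few reviews differently from users with many reviews.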

7.2.2. User Utility Metric

Users hope to publish as many credible reviews as possible. In the scenario of review publication, the usefulness of review reflects whether a review is credible. For a user, the greater the proportion of reviews considered credible, the higher the user utility is. For this, the existing literatures propose some metrics for evaluating the usefulness of review (number of thumbs-up [5] and user rating [6]). However, a metric that is considered credible should meet the two requirements: (1) users have evaluated most of reviews based on the metric; for example, 90% of the reviews are liked by users; (2) for a review, the evaluation result based on the metric should be consistent with the evaluation results of other metrics. Therefore, we measure user utility as the usefulness of a review and compare it with [5, 6] to prove that it is objective and credible.

Based on the above analysis, we evaluate the usefulness of reviews under different values of each metric. For the number of thumbs-up, we set the likes intervals as 0, [1, 100], [101, 200], [201, 300], and [301, +∞). For the user rating, we set the rating from 0 to 7.

7.2.3. Privacy Metric

The goals of the adversary include (1) identifying individual users with the acquired attributes and (2) identifying as many users in a CSLBSS as possible. The ability of a method to resist attacks reflects how difficult the adversary achieves the two goals. The stronger the ability to resist attacks is, the lower the risk of privacy leakage is. Therefore, we measure the ability to resist attacks as two metrics: and . Among them, is used to measure the degree of user attribute leakage. The greater is, the greater the probability of an individual user being identified. is defined as the ratio of the number of users identified to the total number of users in a CSLBSS and is used to measure the risk of privacy leakage of a CSLBSS. The larger the is, the higher the risk of privacy leakage of a CSLBSS is.

In our scheme, the set of attributes used to evaluate the ability of our system to resist the CUIA and the SIA includes 11 attributes: display name, avatar, real photo, name, location, gender, age, education background, organization, contact information, and home address. We assign attribute weight to each attribute according to its privacy level. The more likely an attribute can uniquely identify a user, the higher the privacy level and the greater its attribute weight. The greater the privacy level of an attribute, the more privacy information it will leak after being acquired by the adversary. Therefore, we use the privacy leakage level to measure the privacy level. The lower the privacy leakage level is, the greater the privacy level is. Specifically, we divide the privacy disclosure level into 5 levels, as shown in Table 4.

7.3. Results
7.3.1. System Utility

Figures 5(a)–5(c) show the PRR of LPA, IEPP, and CSPPM, respectively, on Dataset1, Dataset3, and Dataset4.

LPA sets the total number of reviews published publicly as the threshold. In Figure 5(a), as the threshold increases, the trend on Dataset3 is almost horizontal, while the trends on Dataset1 and Dataset4 gradually increase. The corresponding average number of reviews that each user can publish in each grid is about 2 and 3 when the thresholds are 60 and 80, respectively. In Dataset3, every user submitted fewer than 4 reviews, and users who submitted 1 or 2 reviews account for more than 90% of all reviews. Therefore, the PRR is 97.52% when the threshold is 60 and reaches 100% when the threshold is increased to 80. In Dataset4, since each user submitted at least 4 reviews and the total number of publicly published reviews cannot exceed the threshold, some reviews are not allowed to be published publicly, which makes the PRR in Dataset4 lower than in Dataset3. As the threshold increases, the number of reviews that users can publicly publish in each grid increases, and the overall PRR also increases. In Dataset1, the proportion of the number of submitted reviews less than 4 is 81%. Therefore, the PRR in Dataset1 is higher than in Dataset4 and lower than in Dataset3. The results show that the number of reviews submitted by individual users affects the overall review publication. The fewer reviews an individual user submits, the higher the probability that all of their reviews are published, and the higher the overall PRR. This is also the essence of LPA's privacy protection: privacy is protected by limiting the number of reviews published by individual users.

IEPP sets the user similarity interval as the threshold. In Figure 5(b), as the threshold increases, the trend on Dataset3 is almost horizontal, while the trends on Dataset1 and Dataset4 gradually increase. Dataset3 contains close to 200,000 users, each of whom submitted fewer than 4 reviews. This ensures that there are enough similar users in Dataset3 and that the similarities of all users lie in [1/2, 2]. Therefore, all reviews in Dataset3 can be published publicly; namely, the PRR is 100%. In Dataset4, each user submitted at least 4 reviews. This makes the differences between users in the number of reviews and in their distribution characteristics very pronounced, which reduces both the similarity between users and the PRR. However, widening the user similarity interval relaxes the condition for public publication. It allows reviews from users with greater differences to be published, so the PRR also increases. Dataset1 contains all users who submitted fewer than 4 reviews. Therefore, IEPP can publicly publish all reviews of these users when the user similarity interval is [1/2, 2]. For users who submitted at least 4 reviews, only part of their reviews can be publicly published. Thus, the PRR in Dataset1 is higher than in Dataset4 and lower than in Dataset3. The results show that the number of reviews submitted by individual users affects the overall PRR of IEPP. The fewer reviews an individual user submits and the more users there are, the more similar users there are, and the higher the overall PRR.

CSPPM sets the score difference interval as the threshold. In Figure 5(c), as the threshold increases, the curves on Dataset1, Dataset3, and Dataset4 are very close, and the trends on all three datasets gradually increase. On the one hand, widening the score difference interval means that more reviews meet the criteria for publication, so the overall PRR increases. On the other hand, the score difference determines whether a review can be publicly published, and it is computed from a single review; it has no direct relationship with the number of users or the number of reviews submitted by individual users. Therefore, the PRR of CSPPM is relatively close on the three datasets.
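The per-review nature of this criterion can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the score difference is the user score minus the business score on a normalized scale, that a review qualifies when the difference lies inside a closed interval, and that the rate is the share of qualifying reviews. All names and sample values are hypothetical.

```python
def score_difference(user_score: float, business_score: float) -> float:
    """Assumed definition: user score minus business score."""
    return user_score - business_score


def publication_rate(reviews, interval):
    """Share of reviews whose score difference lies inside `interval`.

    `reviews` is a list of (user_score, business_score) pairs and
    `interval` is a (low, high) pair. Each review is judged on its own,
    independently of who wrote it or how many reviews that user submitted.
    """
    if not reviews:
        return 0.0
    low, high = interval
    qualifying = [
        r for r in reviews
        if low <= score_difference(r[0], r[1]) <= high
    ]
    return len(qualifying) / len(reviews)


# Made-up (user_score, business_score) pairs on a normalized scale.
reviews = [(0.8, 0.7), (0.2, 0.9), (0.5, 0.5), (0.9, 0.4)]

narrow = publication_rate(reviews, (-0.2, 0.2))
wide = publication_rate(reviews, (-0.6, 0.6))
widest = publication_rate(reviews, (-1.0, 1.0))
```

Widening the interval can only admit more reviews, so the rate is non-decreasing in the interval width, which matches the increasing trend described for Figure 5(c).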

Note that LPA and IEPP only allow reviews that meet the conditions to be published publicly, while reviews that do not meet the conditions are published anonymously. Here, the PRR refers to the percentage of all published reviews that are not anonymous. Therefore, for LPA and IEPP, the greater the PRR, the more display names and avatars of users are disclosed, and the greater the risk of privacy disclosure caused by display names and avatars. Compared with them, CSPPM is a more stringent privacy protection scheme: reviews that meet the publication conditions are published anonymously, and reviews that do not meet the conditions are not published. It can effectively reduce the risk of privacy leakage caused by the display name and avatar. In addition, comparing the PRR of the three methods on the three datasets shows that our method is largely unaffected by the number of reviews submitted by individual users and by the total number of users, and that its privacy protection is more effective, more stable, and more widely applicable.

7.3.2. User Utility

LPA, IEPP, and CSPPM use the number of thumbs-up, the user rating, and the score difference, respectively, to evaluate the usefulness of a review. Figure 6(a) shows that the proportion of reviews with 0 thumbs-up is more than 80%, indicating that most reviews are not liked, so it is not feasible to rank the usefulness of reviews by relying solely on the number of thumbs-up. However, in Figures 6(b) and 6(c), although the proportion of reviews with more than 200 thumbs-up is small, the score difference of these reviews ranges from -0.84 to 0.79 and their user rating ranges from 4 to 7. This shows that some reviews with many thumbs-up also have a smaller score difference and a higher user rating.

As shown in Figure 7(a), the proportion of reviews with a user rating of no more than 4 is nearly 90%, and the distribution of the score difference for each rating is very close, but the corresponding reviews received relatively few thumbs-up. In addition, the proportion of reviews with a user rating of more than 4 is about 10%, but these reviews include most of the reviews with many thumbs-up and a small part of the reviews with few thumbs-up. This shows that, to a certain extent, the user rating reflects the usefulness of a review, but it is not the decisive factor. That is, users with high ratings may also make evaluations that are inconsistent with the facts, and users with low ratings may publish reliable reviews. In other words, it is not feasible to rely solely on the user rating to determine whether a review is reliable.

CSPPM preferentially publishes reviews with a small score difference, namely those within the score difference threshold (approximately 78.740%), as shown in Figure 8. These include the reviews with more than 200 thumbs-up, since the score difference reflects the consistency between user reviews and business services. In other words, consumers find reviews with a smaller score difference more objective and credible and thus give more thumbs-up to these reviews. Therefore, among the three metrics, the score difference is the most feasible one for evaluating the usefulness of a review. Although the number of thumbs-up reflects the objectivity of reviews to a certain extent, it is not feasible because few users evaluate reviews based on this metric. The user rating, under the same conditions, is suitable as a reference metric. Therefore, the score difference and the user rating can be selected together to evaluate the usefulness of a review. Specifically, when publishing reviews, CSPPM first sorts the published reviews according to the score difference and then sorts the reviews with the same score difference according to the user rating.
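The two-level ordering described above can be sketched with a composite sort key. The sketch assumes that a smaller magnitude of the score difference ranks first and that, among equal magnitudes, a higher user rating ranks first; the sort directions, the `Review` record, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Review:
    review_id: str           # hypothetical identifier
    score_difference: float  # user score minus business score (assumed)
    user_rating: int         # rating of the reviewing user, 0 to 7


def rank_reviews(reviews):
    """Order by |score difference| ascending, then by user rating descending."""
    return sorted(
        reviews,
        key=lambda r: (abs(r.score_difference), -r.user_rating),
    )


# Made-up reviews; "b" and "c" tie on the magnitude of the score difference.
reviews = [
    Review("a", 0.30, 5),
    Review("b", -0.10, 2),
    Review("c", 0.10, 6),
    Review("d", 0.00, 1),
]
ranked_ids = [r.review_id for r in rank_reviews(reviews)]
```

Here review "d" ranks first despite its low user rating, because the user rating is only consulted to break ties on the score difference.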

7.3.3. Privacy

Table 5 shows the proportion of users at different privacy leakage levels in Dataset1 and the corresponding UARR.

In Table 5, privacy leakage level 0 means that users publish reviews anonymously, which effectively reduces the risk of privacy leakage due to the display name and avatar. Privacy leakage levels 1 and 2 mean that reviews on Dianping are not anonymous but cannot be linked to the user's profiles on other social networks; these levels correspond to the risk of privacy leakage within the same CSLBSS. Privacy leakage levels 3, 4, and 5 correspond to the risk of privacy leakage caused by information such as the display name in the publicly published reviews. In particular, at privacy leakage levels 4 and 5 the adversary can even directly determine the user's name, gender, real photo, occupation, contact information, address, etc. To evaluate the ability of our scheme to resist CUIA and SIA, we evaluate the URR at different PRR values with UARR > 0.2 (0.2 corresponds to privacy leakage level 3). As shown in Figure 9, the URR of both LPA and IEPP increases as the PRR increases. The reason is that more reviews are published publicly as the PRR increases. Thus, the adversary can collect more users' display names and real identities and can more easily identify users across social networks. However, CSPPM uses a strong anonymity mechanism to anonymize the display name and avatar of reviews that are published publicly. Therefore, CSPPM avoids the privacy leakage caused by the display name, and its URR is close to 0. This shows that, compared with LPA and IEPP, CSPPM has better resistance to CUIA and SIA.

8. Discussion

We adopt a strong privacy protection mechanism. That is, to reduce the privacy risks caused by the display name, user reviews that meet the release conditions are uniformly anonymized before publication. This achieves a better privacy protection effect. However, users cannot independently choose whether to publish reviews publicly, so their personalized needs cannot be met. This is the limitation of the proposed privacy protection mechanism. To address this limitation, in future work we will use the ASN-CUIA identification method to build a user privacy risk identification system that supports personalized privacy protection according to user demand. This system will adopt artificial intelligence and big data techniques to roughly estimate a user's cross-platform privacy risks and feed the assessment result back to the user as a decision reference. Based on this feedback, users can decide whether to publicly publish a review. It should be noted that we will still consider anonymity for reviews that exceed privacy risk thresholds.

9. Conclusions

In this paper, we proposed a cross-platform strong privacy protection mechanism (CSPPM) based on the partial publication and complete anonymity mechanisms to resist the connecting user identities attack (CUIA) and the statistical inference attack (SIA) in the review publication scenario. Specifically, on the one hand, we used the consistency between the user score and the business score as a criterion to publicly publish reviews with higher usefulness and to filter false or untrue reviews; on the other hand, we anonymized the display names and avatars of reviews that are publicly published. We evaluated the performance of CSPPM from three aspects: the system utility metric (i.e., Public Publish Rate, PRR), the user utility metrics (i.e., number of thumbs-up, user rating, and score difference), and the privacy metrics (i.e., the privacy leakage level based on the user attribute recognition rate, UARR, and the user recognition rate, URR). Based on these metrics, we compared the effectiveness of our scheme with LPA and IEPP through the following experiments: (1) we analyzed the impact of different thresholds on the PRR under the same method, considering the different thresholds set by different methods; (2) we evaluated the usefulness of reviews under different numbers of thumbs-up, user ratings, and score differences; (3) we evaluated the URR at different PRR values with UARR > 0.2. The evaluation results show that CSPPM has better system utility and avoids privacy leakage better than LPA and IEPP in terms of resistance to CUIA and SIA. The evaluation results also show that, as a metric of the usefulness of a review, the consistency between the user score and the business score is more objective and credible than the number of thumbs-up and the user rating.

Data Availability

In this paper, we use restaurant review data on Dianping.com to evaluate CSPPM. The URL of our data is https://doi.org/10.18170/DVN/GCIUN4.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was funded by the Major Scientific and Technological Special Project of Guizhou Province (20183001), the Foundation of Guizhou Provincial Key Laboratory of Public Big Data (2017BDKFJJ015, 2018BDKFJJ020, and 2018BDKFJJ021), the National Natural Science Foundation of Guangxi (2019JJA170064), and the Basic Ability Improvement Program for Young and Middle-Aged Teachers in Guangxi (2021KY0615 and 2021KY0620).