Abstract

Incentive mechanisms are crucial for motivating enough users to provide reliable data in mobile crowdsensing (MCS) systems. However, the privacy leakage in most existing incentive mechanisms makes users unwilling to participate in sensing tasks. In this paper, we propose a privacy-preserving incentive mechanism based on truth discovery. Specifically, we use a secure truth discovery scheme to calculate the ground truth and the weight of users' data while protecting their privacy. Besides, to ensure the accuracy of the MCS results, a data eligibility assessment protocol is proposed to remove the sensing data of unreliable users before performing the truth discovery scheme. Finally, we distribute rewards to users based on their data quality. The analysis shows that our model can protect users' privacy and prevent malicious behavior by users and task publishers. In addition, the experimental results demonstrate that our model achieves high performance, reasonable reward distribution, and robustness to users dropping out.

1. Introduction

As more and more sensors are integrated into human-carried mobile devices, such as GPS locators, gyroscopes, environmental sensors, and accelerometers, these devices can collect various types of data [1]. Therefore, the MCS system [2–4] can utilize the sensors equipped in mobile devices to collect sensing data and complete various sensing tasks [5], such as navigation services [6], traffic monitoring [7], indoor positioning [8], and environmental monitoring [9]. In general, the MCS system consists of three entities: a task requester, a sensing server, and participating users, as shown in Figure 1. The task requester publishes sensing tasks and pays rewards for sensing results. The server recruits users according to the sensing task, processes the data from users, and sends the results to the task publisher. Users collect sensing data based on the requirements of the sensing task and get rewards.

In a practical MCS system, the sensing data collected by users are not always reliable [10, 11] due to various factors (such as poor sensor quality, lack of effort, and background noise). Therefore, the final result may be inaccurate if we treat the data provided by each user equally (e.g., by averaging). To solve this problem, truth discovery [12–14] has attracted wide attention from industry and academia. The main idea of most truth discovery schemes is that a user is given a higher weight (i.e., reliability) if the user's data are closer to the ground truth, and the data provided by a user count more in the aggregation procedure if the user has a higher weight. Recently, a number of truth discovery methods [15] have been proposed to calculate users' weights and aggregated results based on this idea. One problem with these methods, however, is that users have to stay online to interact with the server; otherwise, the MCS system may fail and have to restart. Therefore, a truth discovery scheme that allows users to exit makes the MCS system considerably more robust.

The proper functioning of truth discovery requires enough users and high-quality sensing data. Generally, the MCS system utilizes an incentive mechanism [16–18] to motivate sufficient users to participate in sensing tasks. However, because of monetary incentives, malicious users may attempt to earn rewards with little or no effort. Although truth discovery can assign low weights to malicious users, their continuous input of erroneous data can render the MCS system unavailable [19]. Consequently, the evaluation of data quality is critical to the MCS system. To improve data quality, users who provide incorrect data can be removed before the sensing data get aggregated [20]. On the one hand, this yields more accurate aggregation results; on the other hand, users who provide eligible data can get more monetary rewards.

Although incentive mechanisms have improved a lot, users' privacy protection remains inadequate. When users submit sensing data, their sensitive or private information [21–23] may be leaked, including identity privacy [24], location privacy, and data privacy. Privacy disclosure [25] in turn reduces users' willingness to participate in sensing tasks. Some incentive methods only consider the users' cost of collecting sensing data but ignore the potential cost of privacy disclosure. Recently, some researchers have designed privacy-preserving incentive mechanisms [26–28]. In [20], an incentive method is proposed to protect the user's identity and data privacy; still, the user's sensing data are submitted to the task publisher without regard for the privacy of the sensing data themselves. In [29], the incentive mechanism is designed under the assumption of a trusted platform, which may not hold in practice since the platform itself might be attacked by hackers.

To address these issues, we propose a privacy-preserving incentive mechanism based on truth discovery, called PAID. In our PAID, the task publisher sets data constraints, such as time, location [30], budget [31], and sensing data. If a user does not collect the sensing data at the required time and location, or the sensing data are not in the qualified range, we consider the user's sensing data not credible (i.e., unqualified). After removing unqualified users' data, the qualified users' sensing data are submitted to the server to calculate the ground truth and weights. We also design a secure truth discovery scheme, which uses secret sharing and a key agreement protocol and can still work when some users drop out. Moreover, our truth discovery ensures that no party other than the user can obtain the user's sensing data. Finally, we calculate every user's data quality according to the weight and distribute the rewards.

In summary, the main contributions of this paper are as follows:
(i) We introduce a privacy-preserving interval judgment scheme to remove users who provide unreliable data before performing the truth discovery scheme. Removing unqualified users in advance can greatly improve the quality of the sensing data used in the truth discovery scheme, improve the accuracy of results, and save the reward budget.
(ii) We introduce a secure truth discovery scheme so that our incentive mechanism model can obtain the ground truth and the weight of each user's data while protecting the user's privacy. Then, we design a reasonable reward distribution scheme based on the data weight of users. Moreover, our incentive mechanism model allows users to drop out at any time.
(iii) Analysis shows that our model is secure. Also, experimental results demonstrate that our model has high performance and can achieve reasonable reward distribution.

The remainder of this paper is organized as follows. In Section 2, we describe the problem statement. In Sections 3 and 4, we introduce cryptography primitives and intuitive technology in our model. Then, we discuss PAID in detail in Section 5. Next, Sections 6 and 7 carry out the analysis and performance evaluation. Finally, we discuss the related work and conclude the paper in Sections 8 and 9.

2. Problem Statement

In this section, we introduce the background of truth discovery and our system model. Then, we describe the threat model and our design goals. Table 1 summarizes the main notations in this paper.

2.1. Truth Discovery

Truth discovery [32] is widely used in the MCS system to solve the conflicts between sensing data collected from multiple sources. Although the methods of estimating weights and calculating ground truth are different, their general processes are similar. Specifically, truth discovery initializes a random ground truth and then iteratively updates the weight and ground truth until convergence.

2.1.1. Weight Update

Suppose that the ground truth $x^*$ of the object is fixed. If a user's sensing data are close to the ground truth, a higher weight should be assigned to that user. The weight $w_k$ of each user $u_k$ can be iteratively updated as follows:
$$w_k = \log\left(\frac{\sum_{k' \in \mathcal{U}} d(x_{k'}, x^*)}{d(x_k, x^*)}\right),$$
where $d(\cdot, \cdot)$ is a distance function and $d(x_k, x^*)$ measures how far the sensing data deviate from the estimated ground truth. We use $\mathcal{U}$ to represent the set of users, and $K = |\mathcal{U}|$ is the number of users in the set $\mathcal{U}$. The sensing data collected by the user $u_k$ are denoted as $x_k$ (a single sensing object is assumed for simplicity), and $x^*$ is the estimated ground truth.

2.1.2. Truth Update

Similarly, we assume that the weight of each user is fixed. Then, we can calculate the ground truth as follows:
$$x^* = \frac{\sum_{k \in \mathcal{U}} w_k x_k}{\sum_{k \in \mathcal{U}} w_k}.$$

The final ground truth is obtained by iteratively running the weight update and the truth update until the convergence condition is satisfied.
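To make the iteration concrete, the following minimal sketch (our own illustration, not the paper's implementation) assumes a single scalar object and a squared-distance function, and runs the two updates until convergence:

```python
import numpy as np

def truth_discovery(data, iters=20, eps=1e-9):
    """CRH-style truth discovery for one scalar object.

    data: array with one sensing value per user.
    Returns the estimated ground truth and the per-user weights.
    """
    truth = data.mean()  # initialization (a random value also works)
    for _ in range(iters):
        # Weight update: data closer to the current truth get a higher weight.
        dist = (data - truth) ** 2 + eps
        weights = np.log(dist.sum() / dist)
        # Truth update: weighted average of the users' data.
        new_truth = (weights * data).sum() / weights.sum()
        if abs(new_truth - truth) < 1e-6:  # convergence condition
            break
        truth = new_truth
    return truth, weights

# Five users report temperatures; the last one is unreliable.
vals = np.array([12.1, 12.0, 11.9, 12.2, 15.0])
truth, weights = truth_discovery(vals)
print(truth)    # pulled toward the consistent majority
print(weights)  # the outlier receives the smallest weight
```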

2.2. System Model

Similar to the general MCS system, our PAID comprises three entities: a task publisher (TP), a server ($S$), and users. In our PAID, the TP publishes tasks and requirements to $S$ and gets the ground truth of the object from $S$. The server $S$ recruits adequate users and removes the users who provide unqualified data. After receiving the sensing data of all users, $S$ performs the truth discovery scheme and gets the ground truth and the weight of each user. To prevent the TP from refusing to pay the reward, we require the TP to prepay the reward to $S$ as a guarantee. After getting the weight of each user, the server $S$ calculates the data quality and distributes the rewards. Users collect sensing data and earn monetary rewards by providing qualified data. Moreover, our PAID protects the privacy of users' time, location, identity, and sensing data. Unlike general MCS models, in our PAID, the TP and $S$ can only get the aggregated result instead of users' sensing data. Figure 2 shows the flow of our PAID. The specific process of our model is as follows.
(1) Task Publish. The TP publishes a sensing task to $S$, including sensing objects, data eligibility requirements, and budget.
(2) User Recruitment. The server $S$ broadcasts the sensing task and recruits participating users.
(3) Eligibility Assessment. The server $S$ judges whether every user's sensing data meet the qualification requirements.
(4) Prepayment. The TP prepays the monetary reward to avoid the denial of payment attack.
(5) Submission Notification. The server $S$ notifies qualified users to submit sensing data.
(6) Data Submission and Eligibility Confirmation. Users submit the masked sensing data to $S$, and the server needs to confirm whether the submitted sensing data are qualified to prevent malicious users from tampering with the data.
(7) Deviation Elimination. The server $S$ removes users who tamper with their sensing data and eliminates the deviation of data aggregation caused by these dropped users.
(8) Secure Truth Discovery. The server $S$ calculates the ground truth and the weight of each user by performing the secure truth discovery scheme.
(9) Reward Distribution. The server $S$ calculates the data quality of each user and distributes the rewards.
(10) Task Completion. The server $S$ sends the ground truth of the sensing object to the TP.

2.3. Threat Model

In this section, we mainly consider the potential threats from the TP, the server $S$, and users.

We suppose that the TP is dishonest. After getting data from $S$, the TP may launch a denial of payment attack (DoP) and refuse to pay rewards.

The server $S$ is considered honest-but-curious [33, 34]. Specifically, the server follows the protocol execution instructions, but it also attempts to spy on users' private data. In other words, the server may launch inference attacks (IAs) on users' private data.

We assume that users are untrusted. Some malicious users may provide erroneous data and launch a data pollution attack (DPA). Besides, untrusted users may forge multiple identities and initiate a Sybil attack (SA), to earn more monetary rewards.

2.4. Design Goals

In this section, we introduce the design goals of our PAID, which are divided into privacy and security goals and property goals.

The privacy goals protect the user's private data, and the security goals ward off malicious attacks. The details are as follows.
(i) Privacy Goals. PAID protects users' location privacy, data privacy, and identity privacy. Specifically, the location and sensing data of a user cannot be obtained by any party except the user himself, and users' real identities are not disclosed when performing a sensing task.
(ii) Security Goals. In our PAID, users can avoid the denial of payment attack (DoP) of the TP. The server $S$ cannot initiate an inference attack (IA) on users. The server can resist the data pollution attack (DPA) launched by malicious users. And our PAID guarantees fairness by resisting the Sybil attack (SA).

Our PAID also requires the following property goals.
(i) Eligibility. If users' data do not meet the eligibility requirements, they cannot pass the eligibility assessment. In other words, the sensing data adopted by our PAID must be eligible.
(ii) Zero Knowledge. When the server assesses whether users' data meet the eligibility requirements, it cannot obtain the content of users' private data.
(iii) Payment Rationality. Each user can get non-negative utility as long as the user provides qualified data.
(iv) Budget Rationality. The total monetary reward paid by the TP does not exceed the budget constraint.

3. Preliminaries

In this section, we review the cryptographic primitives used in our PAID.

3.1. Secret Sharing

We use Shamir's $t$-out-of-$n$ secret sharing protocol [35], which splits each user's secret $s$ into $n$ shares, where any $t$ shares can be used to reconstruct $s$. Still, it is impossible to get any information about $s$ if the shares obtained by attackers are fewer than $t$.

We assume that some integers can be identified with distinct elements in a finite field $\mathbb{F}_p$, where $p$ is a prime whose size is parameterized by the security parameter $\kappa$. These integers represent all users' IDs, and we use the symbol $\mathcal{U}$ to denote the set of users' IDs. Then, Shamir's secret sharing protocol consists of the two steps below.
(i) $\mathbf{SS.share}(s, t, \mathcal{U}) \rightarrow \{(u_i, s_i)\}_{u_i \in \mathcal{U}}$: the inputs of the sharing algorithm are a secret $s$, a threshold $t$, and a set of field elements denoting the users' IDs, where $t \le |\mathcal{U}| = n$. It outputs a set of shares $s_i$, each of which is associated with its corresponding user $u_i$.
(ii) $\mathbf{SS.recon}(\{(u_i, s_i)\}_{u_i \in \mathcal{V}}, t) \rightarrow s$: the inputs of the reconstruction algorithm are the shares corresponding to a subset $\mathcal{V} \subseteq \mathcal{U}$ and a threshold $t$, where $|\mathcal{V}| \ge t$, and it outputs the secret $s$.

Correctness requires that $\forall s \in \mathbb{F}_p$ and $\forall t, n$ with $t \le n$: if $\{(u_i, s_i)\}_{u_i \in \mathcal{U}} \leftarrow \mathbf{SS.share}(s, t, \mathcal{U})$ and $\mathcal{V} \subseteq \mathcal{U}$ with $|\mathcal{V}| \ge t$, then $\mathbf{SS.recon}(\{(u_i, s_i)\}_{u_i \in \mathcal{V}}, t) = s$.

Security requires that $\forall s, s' \in \mathbb{F}_p$ and any $\mathcal{V} \subset \mathcal{U}$ with $|\mathcal{V}| < t$, we have
$$\{(u_i, s_i)\}_{u_i \in \mathcal{V}} \equiv \{(u_i, s'_i)\}_{u_i \in \mathcal{V}},$$
where "$\equiv$" indicates that the two distributions are indistinguishable.
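A self-contained sketch of the share/reconstruct pair follows; the 61-bit Mersenne prime is our stand-in for $\mathbb{F}_p$, and production code would use a vetted library:

```python
import random

P = 2**61 - 1  # a prime modulus standing in for the field F_p

def share(secret, t, ids):
    """Split `secret` into len(ids) shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):  # random degree-(t-1) polynomial with f(0) = secret
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return {i: f(i) for i in ids}  # ids must be distinct nonzero field elements

def recon(shares):
    """Lagrange interpolation at x = 0 over the subset of shares given."""
    secret = 0
    for i, si in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        secret = (secret + si * num * pow(den, P - 2, P)) % P
    return secret

shares = share(123456789, t=3, ids=[1, 2, 3, 4, 5])
subset = {k: shares[k] for k in (2, 4, 5)}   # any 3 shares suffice
assert recon(subset) == 123456789
```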

3.2. Key Agreement

We utilize the Diffie–Hellman key agreement called SIGMA [36] in our PAID to generate a session key between two users. Typically, SIGMA is described in three parts as follows.
(i) $\mathbf{KA.param}(\kappa) \rightarrow (\mathbb{G}, g, q, H)$: the algorithm's input is a security parameter $\kappa$. It samples a group $\mathbb{G}$ of prime order $q$, along with a generator $g$ and a hash function $H$, where $H$ is set as SHA-256 for practicability in our model.
(ii) $\mathbf{KA.gen}(\mathbb{G}, g, q, H) \rightarrow (x, g^x)$: the algorithm's inputs are a group $\mathbb{G}$ of prime order $q$, along with a generator $g$ and a hash function $H$. It samples a random $x \in \mathbb{Z}_q$ and computes $g^x$, where $x$ and $g^x$ will be marked as the secret key and the public key in the following sections.
(iii) $\mathbf{KA.agree}(x_u, g^{x_v}, \sigma_v, v) \rightarrow s_{u,v}$: the algorithm's inputs are the user $u$'s secret key $x_u$, the user $v$'s public key $g^{x_v}$, and the signed signature $\sigma_v$ from the user $v$. It outputs a session key $s_{u,v} = H\big((g^{x_v})^{x_u}\big)$ between user $u$ and user $v$. For simplicity, we use $s_{u,v} \leftarrow \mathbf{KA.agree}(x_u, g^{x_v})$ to represent the above process in the following sections.

Correctness requires that $\mathbf{KA.agree}(x_u, g^{x_v}) = \mathbf{KA.agree}(x_v, g^{x_u})$ for any private and public keys generated by the users $u$ and $v$ if the two users use the same parameters. Security requires that the shared key $s_{u,v}$ is indistinguishable from a uniformly random string for any adversary who is given the public keys $g^{x_u}$ and $g^{x_v}$ (but does not have the corresponding secret keys $x_u$ and $x_v$).
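A toy sketch of KA.gen and KA.agree is shown below; the group parameters are illustrative stand-ins (a real deployment would use a standardized group), and the SIGMA signature exchange is omitted:

```python
import hashlib
import secrets

Q = 2**127 - 1  # stand-in Mersenne prime modulus (toy choice, not the paper's)
G = 5           # stand-in generator

def ka_gen():
    """Sample a secret key x and the matching public key g^x."""
    x = secrets.randbelow(Q - 2) + 1
    return x, pow(G, x, Q)

def ka_agree(x_u, gx_v):
    """Derive the session key s_{u,v} = H((g^{x_v})^{x_u})."""
    shared = pow(gx_v, x_u, Q)
    return hashlib.sha256(str(shared).encode()).digest()

xu, gxu = ka_gen()
xv, gxv = ka_gen()
assert ka_agree(xu, gxv) == ka_agree(xv, gxu)  # both sides derive the same key
```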

3.3. Paillier Cryptosystem

The Paillier cryptosystem [37] is a probabilistic public key cryptosystem. It consists of three parts as follows.
(i) $\mathbf{Paillier.gen}(p, q) \rightarrow (sk, pk)$: the key generation algorithm's inputs are two large primes $p$ and $q$, where $N = pq$ is their product. It outputs a secret key $sk = \lambda = \mathrm{lcm}(p-1, q-1)$ and a public key $pk = (N, g)$, where $g \in \mathbb{Z}^*_{N^2}$ (typically $g = N + 1$).
(ii) $\mathbf{Paillier.enc}(m, pk) \rightarrow c$: the encryption algorithm's inputs are a plaintext $m$ (where $m \in \mathbb{Z}_N$) and a public key $pk$. It outputs a ciphertext $c = g^m \cdot r^N \bmod N^2$ with a random $r \in \mathbb{Z}^*_N$.
(iii) $\mathbf{Paillier.dec}(c, sk) \rightarrow m$: the decryption algorithm's inputs are a ciphertext $c$ (where $c \in \mathbb{Z}^*_{N^2}$) and a secret key $sk$. It outputs the plaintext $m = L(c^\lambda \bmod N^2) \cdot \big(L(g^\lambda \bmod N^2)\big)^{-1} \bmod N$, where $L(x) = (x - 1)/N$.

The Paillier cryptosystem has the property of homomorphic addition:
$$E(m_1) \cdot E(m_2) \bmod N^2 = E(m_1 + m_2 \bmod N).$$

We assume that $E(\cdot)$ denotes the Paillier encryption function in the following sections.
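The following self-contained toy implementation (tiny primes, for illustration only; real keys are at least 2048 bits) shows key generation, encryption, decryption, and the additive homomorphism:

```python
import math
import random

p, q = 293, 433                 # toy primes; never use sizes like this in practice
N = p * q
N2 = N * N
g = N + 1                       # standard choice g = N + 1
lam = math.lcm(p - 1, q - 1)    # secret key lambda

def L(x):
    return (x - 1) // N

mu = pow(L(pow(g, lam, N2)), -1, N)   # precomputed decryption constant

def enc(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(g, m, N2) * pow(r, N, N2)) % N2

def dec(c):
    return (L(pow(c, lam, N2)) * mu) % N

# Homomorphic addition: E(m1) * E(m2) mod N^2 decrypts to m1 + m2 mod N.
c = (enc(41) * enc(1)) % N2
assert dec(c) == 42
```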

4. Technical Intuition

In this section, we first introduce how the interval judgment scheme judges users' data eligibility while protecting users' privacy. Then, we observe that truth discovery mainly involves aggregating multiple users' data in a secure manner. Therefore, we require that the server only obtain the sum of users' inputs, not their individual contents, and we propose a double-masking scheme to achieve this goal.

4.1. Interval Judgment Scheme for Privacy Protection

In our PAID, we use the interval judgment scheme [38] based on the Paillier cryptosystem to determine the sensing data eligibility. Every user $u_i$ provides sensing data $x_i$, and the server $S$ provides a continuous integer interval $[a, b]$. The server can judge whether the user $u_i$'s sensing data meet the interval range without knowing the data $x_i$. The user also cannot obtain any information about the integer interval. The scheme is divided into four steps as follows.
(i) The user $u_i$ gets the public key $pk_i$, then computes $E(x_i)$ using $pk_i$, and sends it to $S$.
(ii) The server $S$ picks two random numbers $k_1, k_2$ to construct a monotone increasing (or decreasing) linear function $f(x) = k_1 x + k_2$. Then, the server computes $E(f(a))$, $E(f(b))$, and, via the homomorphic property, $E(f(x_i))$ and sends them to $u_i$.
(iii) After receiving the information from the server $S$, the user $u_i$ decrypts the ciphertexts and then compares the sizes of $f(a)$, $f(x_i)$, and $f(b)$. Next, this size relationship is sent to the server $S$.
(iv) After receiving the message from $u_i$, the server judges whether the reported ordering matches the monotonicity of $f$, i.e., whether $f(a) \le f(x_i) \le f(b)$ for an increasing $f$ (or the reverse for a decreasing one). If so, $a \le x_i \le b$ by the monotonicity of the function $f$, i.e., the user passes the data eligibility assessment. Otherwise, the user fails to pass the eligibility assessment of the server $S$.

It should be noted that since the user does not know the monotonicity of the function $f$, it is impossible to infer from the size relationship whether the data are in the range of the interval. For simplicity, we formulate the above process as an interval judgment function denoted by $IJ(x_i, [a, b])$. If the user passes the eligibility assessment of the server $S$, $IJ = 1$; otherwise, $IJ = 0$.
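A sketch of $IJ$ using the third-party python-paillier (phe) library follows; the library is our choice, not the paper's, and the user's decrypt-and-compare round trip is collapsed into one function for brevity:

```python
import random
from phe import paillier  # pip install phe

def interval_judgment(enc_x, a, b, priv_user):
    """Decide whether the plaintext behind enc_x lies in [a, b] without the
    server learning x; priv_user stands in for the round trip to the user."""
    k1 = random.choice([-1, 1]) * random.randrange(1, 1000)  # secret slope
    k2 = random.randrange(1, 1000)
    f = lambda v: k1 * v + k2
    enc_fx = enc_x * k1 + k2          # E(f(x)) via the Paillier homomorphism
    # --- user side: decrypt f(x) and report only the ordering vs f(a), f(b) ---
    fa, fx, fb = f(a), priv_user.decrypt(enc_fx), f(b)
    # --- server side: the ordering matches x in [a, b] for either slope sign ---
    return fa <= fx <= fb or fb <= fx <= fa

pub, priv = paillier.generate_paillier_keypair(n_length=1024)
assert interval_judgment(pub.encrypt(13), 10, 15, priv) is True
assert interval_judgment(pub.encrypt(42), 10, 15, priv) is False
```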

4.2. One-Masking Scheme

Assume that all users are represented in sequence as integers $1, \dots, n$, and any pair of users $(u_i, u_j)$, $i < j$, agrees on a random value $s_{i,j}$. Let us add $s_{i,j}$ to the user $u_i$'s data and subtract it from the user $u_j$'s data to mask all users' raw data. In other words, each user computes $y_i$ as follows:
$$y_i = x_i + \sum_{j:\, i < j} s_{i,j} - \sum_{j:\, j < i} s_{j,i} \pmod{R},$$
where we assume $x_i$ and $s_{i,j}$ are in $\mathbb{Z}_R$ with a fixed user order for simplicity.

Then, each user submits $y_i$ to the server $S$, and $S$ computes
$$z = \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} x_i \pmod{R}.$$

However, this approach has two shortcomings. The first is that every user needs to exchange the value $s_{i,j}$ with all other users, which results in quadratic communication overhead if done naively. The second is that the protocol fails if any user $u_i$ drops out, since the server cannot eliminate the values $s_{i,j}$ associated with $u_i$ in the final aggregated result $z$.
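A sketch of this one-masking aggregation with four users (modulus and values are arbitrary) shows the pairwise cancellation:

```python
import random

R = 2**32  # modulus for masked arithmetic

def pairwise_masks(n):
    """Each unordered pair (i, j), i < j, agrees on a random value s[i][j]."""
    s = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s[i][j] = random.randrange(R)
    return s

x = [17, 4, 23, 9]                  # users' raw data
n = len(x)
s = pairwise_masks(n)
# y_i = x_i + sum_{j > i} s_ij - sum_{j < i} s_ji  (mod R)
y = [(x[i] + sum(s[i][j] for j in range(i + 1, n))
            - sum(s[j][i] for j in range(i))) % R
     for i in range(n)]
assert sum(y) % R == sum(x) % R     # masks cancel pairwise in the sum
```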

4.3. Double-Masking Scheme

To solve these security problems, we introduce a double-masking scheme [39, 40]. In the work [40], the double-masking scheme is used for privacy-preserving data aggregation. And the scheme in [40] can also protect location privacy and verify the aggregation results. In our model, location privacy protection is implemented by the interval judgment scheme, and our secure truth discovery will confirm the data consistency. The details of the double-masking scheme are as follows.

Every user $u$ can get a session key $s_{u,v}$ with every other user $v$ by engaging in the Diffie–Hellman key agreement after the server $S$ broadcasts all of the Diffie–Hellman public keys. Then, we can utilize a pseudorandom generator ($\mathrm{PRG}$) to reduce the high communication overhead by having the parties agree on a common seed $s_{u,v}$ instead of the whole mask $\mathrm{PRG}(s_{u,v})$.

We use the threshold secret sharing scheme to solve the issue that users are not allowed to drop out. Every user $u$ can send shares of his secret to the other users. Once some users fail to submit data in time, the other users can recover the masks associated with these users by submitting their shares of these users' secrets to $S$, as long as the number of surviving users is at least $t$ (i.e., the threshold of Shamir's secret sharing).

However, there is a problem that may leak users' data to $S$. Consider a scenario where a user $u$ is very slow in sending data to $S$. The server considers that the user has dropped out and asks all other users for their shares of the user $u$'s secret. Then, the server receives the delayed data $y_u$ after recovering $u$'s mask. At this point, the server can remove all the masks and get the plaintext $x_u$.

To improve the scheme, we introduce an additional random seed to mask the data. Specifically, each user $u$ selects a random seed $b_u$ in the round of generating $s_{u,v}$ and then creates and distributes shares of $b_u$ to all other users during the secret sharing round. Now, users calculate $y_u$ as follows:
$$y_u = x_u + \mathrm{PRG}(b_u) + \sum_{v:\, u < v} \mathrm{PRG}(s_{u,v}) - \sum_{v:\, v < u} \mathrm{PRG}(s_{v,u}) \pmod{R}.$$

Note that an honest user will never reveal both kinds of shares of the same user to the server $S$. During the recovery round, the server can request from each surviving user either a share of the pairwise secret $s_{u,v}$ or a share of the seed $b_u$, but not both. After gathering at least $t$ shares of the pairwise secrets for all dropped users and $t$ shares of $b_u$ for all surviving users, the server can eliminate the remaining masks and reveal the sum.
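The following sketch (our own simplification; the Shamir recovery of the seeds is elided and they are read directly) shows how the server cancels the pairwise masks of a dropped user and the self-masks of the survivors:

```python
import hashlib
import random

R = 2**32

def prg(seed):
    """Deterministic mask derived from a seed (stand-in for a real PRG)."""
    return int.from_bytes(hashlib.sha256(str(seed).encode()).digest()[:4], "big")

n = 4
x = [17, 4, 23, 9]
s = {(i, j): random.randrange(R) for i in range(n) for j in range(i + 1, n)}  # pairwise seeds
b = [random.randrange(R) for _ in range(n)]                                   # self seeds b_i

def mask(i):
    y = x[i] + prg(b[i])
    for j in range(n):
        if i < j:
            y += prg(s[(i, j)])
        elif j < i:
            y -= prg(s[(j, i)])
    return y % R

dropped, alive = {3}, [0, 1, 2]
y = {i: mask(i) for i in alive}  # user 3 never submits

# Server side: in the real protocol, b_i of survivors and the dropped user's
# pairwise seeds are reconstructed from Shamir shares (elided here).
total = sum(y.values())
total -= sum(prg(b[i]) for i in alive)             # strip survivors' self-masks
for u in dropped:                                   # cancel masks tied to user u
    for v in alive:
        total += prg(s[(u, v)]) if u < v else -prg(s[(v, u)])

assert total % R == sum(x[i] for i in alive) % R
```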

5. Our Proposed Scheme

In this section, we first provide an overview of our PAID. Then, we show the details of the three critical designs in our PAID, including eligibility assessment, truth discovery, and reward distribution. In the eligibility assessment stage, the server judges whether users’ sensing data meet the requirements of a sensing task. In the truth discovery stage, the server can calculate each user’s weight and the ground truth required by the sensing task without knowing their sensing data. In the reward distribution stage, the server computes the quality of sensing data by each user’s weight and then pays a reward to users.

5.1. Overview

For convenience, we introduce a simple case. We set up a sensing task to collect the temperature of urban roads in the evening. There are range requirements for time, location, and sensing data (i.e., temperature). To be more precise, the time range is required to be 5–8 pm on February 3rd, the location range is required to be 12.45–12.55 E and 41.79–41.99 N, and the temperature requirement is 10–15°C. In our PAID, we consider the range requirement as the data eligibility requirement $\Omega$. The data $x_i$ collected by a user $u_i$ meeting the eligibility requirement $\Omega$ means that $x_i \in \Omega$. Since the data collected by mobile devices are usually rational numbers, in our PAID, we transform the eligible interval into an integer interval by moving the decimal point right. The sensing task involves three entities: a task publisher (TP), a server ($S$), and users. The specific steps are as follows.
Step 1 (Task Publish). The task publisher TP initializes a public key $pk$ and a private key $sk$, a reward control parameter $\varepsilon$ (a small decimal number), a task budget $B$, the number of users $n$, and the eligibility requirement $\Omega$ for a sensing task. The public key $pk$ is used to encrypt the information that the server $S$ needs to send to the TP, and the TP decrypts the ciphertext using the private key $sk$. Then, the TP sends this information to $S$ as a task request.
Step 2 (User Recruitment). The server $S$ broadcasts the sensing task information and recruits users who request to participate in the sensing task. Then, $S$ generates a key pair using the key agreement scheme for every user $u_i$ and sends the public key to $u_i$.
Step 3 (Eligibility Assessment). Each user $u_i$ confirms whether his sensing cost $c_i$ exceeds the posted lowest reward $r_{\min}$. If $c_i \le r_{\min}$, $u_i$ starts the sensing task and collects the data $x_i$. The user $u_i$ then generates a key pair using the key agreement scheme and computes a session key as $u_i$'s anonymous identity. Then, the user $u_i$ performs the interval judgment scheme and sends his public key to $S$. Specifically, $\Omega$ is divided into a time interval, a longitude interval, a latitude interval, and a sensing data interval.
Step 4 (Prepayment). After recruiting eligible users, the server $S$ requests the TP to prepay the budget $B$ for the sensing task to prevent the denial of payment attack. And the server $S$ calculates the session key with each eligible user $u_i$.
Step 5 (Submission Notification). After getting the budget $B$, the server $S$ informs the eligible users to submit data.
Step 6 (Data Submission and Eligibility Confirmation). After receiving the submission notification, each user $u_i$ performs the double-masking scheme to mask the sensing data and, at the same time, executes the eligibility confirmation to prevent malicious users from modifying data. Then, $u_i$ encrypts the masked data with a symmetric encryption algorithm, keyed by the session key, and sends the ciphertext to $S$.
Step 7 (Deviation Elimination). For users who tamper with data during data submission, the server $S$ regards them as dropped users and discards their data. Then, $S$ requests the seeds and the pairwise noise between each dropped user and the surviving users to eliminate the impact on the aggregated result.
Step 8 (Secure Truth Discovery). The server $S$ computes each surviving user $u_i$'s weight $w_i$ and the ground truth $x^*$ of the sensing object using the truth discovery algorithm. The detailed algorithm will be introduced later.
Step 9 (Reward Distribution). The server $S$ calculates the sensing data quality $q_i$ of each surviving user from the weight $w_i$ and then pays a monetary reward $p_i$ accordingly (see Section 5.4).
Step 10 (Task Completion). The server $S$ encrypts the ground truth $x^*$ using $pk$ and sends it to the TP, and the TP decrypts the result using $sk$.

In our PAID, only users who pass the eligibility assessment and eligibility confirmation can obtain the monetary reward. Thus, users cannot cheat to get a reward with unreliable data. We can also ensure the quality of the sensing data used by the truth discovery algorithm and obtain a more accurate ground truth $x^*$. Moreover, since the TP pays the task reward to $S$ in advance and $S$ pays each user according to the quality of the user's sensing data after the task is accomplished, the TP cannot refuse to pay the reward. Besides, $S$ cannot get users' raw sensing data, time, and location information, which protects the users' privacy. The anonymous identity of each user is determined by both the user and $S$, and $S$ assigns only one random identity token to each user, so malicious users cannot forge multiple identities.

5.2. Eligibility Assessment

In our PAID, there are three benefits to the design of the eligibility assessment. First, it prevents users who provide unreliable or erroneous sensing data from receiving monetary rewards, which avoids wasting the budget. Second, filtering out unqualified sensing data improves the accuracy of the sensing task result. Third, the data quality of each user is related to the sensing object's ground truth $x^*$, and an inaccurate ground truth would lead to unfair incentives.

The processes of eligibility assessment and eligibility confirmation are similar. The purpose of the eligibility assessment is to preliminarily filter out unqualified users. Thus, the unqualified users do not need to communicate with other users to perform the double-masking scheme, which reduces the communication overhead. The eligibility confirmation is designed to prevent malicious users from altering the original qualified data. The detailed process of eligibility assessment and eligibility confirmation is as follows.
Step 1. Each user $u_i$ initializes a key pair $(pk_i, sk_i)$. Then, $u_i$ encrypts the sensing data using $pk_i$ and sends the ciphertext $E(x_i)$ to $S$. Generally, $x_i$ consists of four parts: time, longitude, latitude, and the sensing value.
Step 2. After receiving $E(x_i)$, the server $S$ picks different random numbers and constructs a monotone increasing (or decreasing) linear function for each value in the quadruple. The monotonicity of the four functions is inconsistent. For each eligibility requirement interval $[a, b]$, the server calculates $E(f(a))$, $E(f(b))$, and $E(f(x_i))$. Then, $S$ sends them to $u_i$. For convenience, we do not describe the four components separately in the following text.
Step 3. After receiving the ciphertexts from $S$, each user $u_i$ decrypts them and then compares the sizes of $f(a)$, $f(x_i)$, and $f(b)$. Next, the size relationship is sent to $S$.
Step 4. After the server $S$ receives the information from $u_i$, if the reported ordering is consistent with the monotonicity of the functions, then $x_i \in [a, b]$, and $S$ determines that $u_i$ passes the eligibility assessment. Otherwise, it fails.

Because users do not know the functions' monotonicity, they cannot infer the size relationship between the qualified data and the eligibility requirement. Therefore, malicious users have a very low probability of passing the eligibility assessment. Moreover, during the eligibility assessment, $u_i$ cannot learn the specific qualified interval, and $S$ cannot get $u_i$'s sensing data, which protects $u_i$'s privacy. The above process is represented by $IJ(\cdot)$: if $u_i$ passes the eligibility assessment, then $IJ = 1$; if not, $IJ = 0$.

5.3. Secure Truth Discovery

In the secure truth discovery scheme [15], data exchange takes place between the users and the server $S$. Each user $u_i$ collects sensing data $x_i$, performs the double-masking scheme to mask the raw input, and then sends the masked input to $S$. The server $S$ receives the masked input data from each user and aggregates the inputs of the online users. Each user can drop out at any time: as long as the number of surviving users is not less than the threshold $t$, $S$ can eliminate the deviation caused by the dropped users and restore the aggregation results. The detailed process is as follows.

Step 0 (Key Generation). Assume $n$ users submit sensing data in the data submission phase. Given the security parameter $\kappa$ and the threshold value $t$, a trusted third party creates three key pairs for each user $u_i$: the first pair is used for signatures, the second is used to generate session keys with other users for symmetric encryption, and the third is used to generate session keys with other users as the pairwise noise seeds $s_{i,j}$. Then, each user $u_i$ signs his two public keys with his signing key and sends them to $S$. When receiving messages from at least $t$ users (denote these surviving users as a set $\mathcal{U}_1$), $S$ broadcasts the signed public keys to all users. Otherwise, abort.

Step 1 (Key Sharing). After receiving the information from $S$, each user $u_i$ confirms whether $|\mathcal{U}_1| \ge t$ and then verifies whether the signature of every other user $u_j$ is valid using $u_j$'s public key. If not, abort. Next, $u_i$ selects a random seed $b_i$ and generates $t$-out-of-$n$ Shamir shares of his secret noise key and of $b_i$. Then, each user $u_i$ generates a session key with every other user and uses symmetric authenticated encryption to encrypt the two types of shares, where the symmetric authenticated encryption is indistinguishable under ciphertext integrity attack and chosen plaintext attack. It ensures the confidentiality and integrity of the messages exchanged between the two parties; we do not repeat the details here. If any of the above processes fails, abort. Otherwise, each user sends the ciphertexts to $S$. When receiving messages from at least $t$ users (denote these surviving users as a set $\mathcal{U}_2 \subseteq \mathcal{U}_1$), $S$ randomly initializes the ground truth $x^*$ and then broadcasts $x^*$ together with the ciphertexts to all users. Otherwise, abort.

Step 2 (Masking Input Data). After receiving $x^*$ and the ciphertexts from $S$, each user $u_i$ confirms whether $|\mathcal{U}_2| \ge t$, then computes the pairwise noise seed $s_{i,j}$ for every other user $u_j$, and gets the masked input data as follows:
$$y_i = x_i + \mathrm{PRG}(b_i) + \sum_{j:\, i < j} \mathrm{PRG}(s_{i,j}) - \sum_{j:\, j < i} \mathrm{PRG}(s_{j,i}) \pmod{R},$$
where $x_i$ is the input data of this round and $y_i$ indicates the masked input data. If any of the above processes fails, abort. Otherwise, each user sends $y_i$ to $S$. When receiving $y_i$ from at least $t$ users (denote these surviving users as a set $\mathcal{U}_3 \subseteq \mathcal{U}_2$), $S$ sends the list $\mathcal{U}_3$ to all users. Otherwise, abort.

Step 3 (Consistency Check). After receiving the list $\mathcal{U}_3$ from $S$, each user $u_i$ confirms whether $|\mathcal{U}_3| \ge t$. Then, $u_i$ calculates a signature over the list and sends it to $S$. When receiving signatures from at least $t$ users (denote these surviving users as a set $\mathcal{U}_4 \subseteq \mathcal{U}_3$), $S$ sends the signatures to all users. Otherwise, abort.

Step 4 (Unmasking). After receiving the list of signatures from $S$, each user $u_i$ confirms whether $|\mathcal{U}_4| \ge t$ and whether each signature is valid using the corresponding public key. Then, $u_i$ decrypts the share ciphertexts he holds and sends to $S$ the shares of the noise keys of the dropped users and the shares of the seeds $b_j$ of the surviving users. If any of the above processes fails, abort. After receiving the messages, $S$ performs the deviation elimination: it regards users who modified their data as dropped users and discards these users' data. Denote the surviving users as a set $\mathcal{U}_5$. If $|\mathcal{U}_5| \ge t$, the noise keys of the dropped users and the seeds $b_i$ of the surviving users can be reconstructed with $\mathbf{SS.recon}$, and the corresponding masks $\mathrm{PRG}(s_{i,j})$ and $\mathrm{PRG}(b_i)$ can be regenerated. Next, the aggregated result $\sum_{i \in \mathcal{U}_5} x_i$ can be recovered by removing all the masks. Then, $S$ selects a random positive noise value $r$ to mask the raw aggregation result so as to prevent users from obtaining weight information, and sends the noised aggregate to all users.

Step 5 (Masked Input Generation). After receiving the noised aggregate from $S$, each user $u_i$ computes the pairwise noise seed with every surviving user and locally derives his weight-related input from the noised aggregate following formula (1). Then, each user masks this input with the double-masking scheme in the same way as in Step 2. If any of the above processes fails, abort. Otherwise, each user sends the masked input to $S$.

Step 6 (Unmasking). After receiving the masked inputs from at least $t$ users (denote these surviving users as a set $\mathcal{U}_6 \subseteq \mathcal{U}_5$), $S$ sends the list $\mathcal{U}_6$ to all users. Otherwise, abort. Then, each user decrypts his share ciphertexts as in Step 4, and the shares of the dropped users' noise keys and the surviving users' seeds are sent to $S$. If any of the above processes fails, abort.

After receiving the information from at least $t$ users, $S$ restores the noise keys of the dropped users and the seeds of the surviving users, regenerates the corresponding masks, and calculates the aggregation results. Next, $S$ eliminates the random noise value $r$ injected in Step 4. Therefore, the current ground truth $x^*$ and the weight $w_i$ of every user can be calculated using formulas (1) and (2).

Thus, $S$ can get the final ground truth $x^*$ and the weight $w_i$ of every user by repeating Steps 0 to 6 until the convergence conditions are met. And the weight $w_i$ will be used to calculate the data quality $q_i$ of each user $u_i$.
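Putting the pieces together, the sketch below simulates the iterative protocol in a single process. It is our own simplification: networking, signatures, Shamir shares, and the server-side noise $r$ are omitted, and in the real protocol fresh seeds would be drawn for every masked sum.

```python
import hashlib
import math
import random

R = 2**40
SCALE = 10**6  # fixed-point scaling so real values can be masked as integers

def prg(seed):
    return int.from_bytes(hashlib.sha256(str(seed).encode()).digest()[:5], "big")

def masked_sum(values, pair, own, alive):
    """One aggregation round: users submit masked integers, the masks cancel
    (or would be recovered via shares), and only the sum is revealed."""
    def mask(i):
        y = values[i] + prg(own[i])
        for j in alive:
            if i < j:
                y += prg(pair[(i, j)])
            elif j < i:
                y -= prg(pair[(j, i)])
        return y % R
    total = sum(mask(i) for i in alive) - sum(prg(own[i]) for i in alive)
    return (total % R) / SCALE

users = [0, 1, 2, 3]
x = {0: 12.1, 1: 12.0, 2: 11.9, 3: 15.0}
pair = {(i, j): random.randrange(R) for i in users for j in users if i < j}
own = {i: random.randrange(R) for i in users}

truth = 13.0  # server's random initialization of the ground truth
for _ in range(10):
    # Round A: aggregate the distances so each user can derive a weight locally.
    d = {i: (x[i] - truth) ** 2 + 1e-9 for i in users}
    sum_d = masked_sum({i: int(d[i] * SCALE) for i in users}, pair, own, users)
    w = {i: math.log(sum_d / d[i]) for i in users}          # formula (1)
    # Round B: aggregate w_i * x_i and w_i to update the truth (formula (2)).
    sum_wx = masked_sum({i: int(w[i] * x[i] * SCALE) for i in users}, pair, own, users)
    sum_w = masked_sum({i: int(w[i] * SCALE) for i in users}, pair, own, users)
    truth = sum_wx / sum_w

print(truth)  # converges near the consistent readings (about 12)
```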

5.4. Reward Distribution

The weight calculated by truth discovery can represent the effective contribution of users. Still, to facilitate reward distribution, we need to further quantify the data quality of every user. Then, $S$ can compute the monetary reward $p_i$ according to the data quality $q_i$ of $u_i$.

To achieve the rationality of reward distribution, we normalize the weights, so the data quality of each surviving user $u_i$ can be calculated as follows:
$$q_i = \frac{w_i}{\sum_{j \in \mathcal{U}} w_j}.$$

Next, we calculate the monetary reward $p_i$ of each user as follows, and the higher the quality of $u_i$'s data, the more reward $u_i$ can get:
$$p_i = \frac{B}{m}\left(1 - \varepsilon + \varepsilon \cdot \frac{q_i}{\bar{q}}\right),$$
where $\bar{q}$ is the average quality of all surviving users, $\varepsilon$ represents the reward control parameter, which is a small rational number whose function is to ensure that the reward is non-negative, and $m$ is the number of surviving users.

Since $q_i \ge 0$ and $0 < \varepsilon < 1$, we know that the lowest reward a user can get is $(1 - \varepsilon)B/m$. When the number of final online users is $m = n$, each user $u_i$'s reward is $p_i = \frac{B}{n}(1 - \varepsilon + \varepsilon \cdot q_i/\bar{q})$. If some users drop out and $m < n$, $S$ distributes the task budget $B$ among the surviving users, and each user $u_i$'s reward is $p_i = \frac{B}{m}(1 - \varepsilon + \varepsilon \cdot q_i/\bar{q})$. Therefore, our reward distribution formula is applicable regardless of whether users go offline.
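Under the quality and reward formulas reconstructed above (our reading of the garbled originals), the distribution can be checked numerically:

```python
import numpy as np

def distribute_rewards(weights, budget, eps=0.1):
    """Split `budget` among surviving users in proportion to data quality.

    Quality is the normalized truth-discovery weight, and eps < 1 keeps
    every reward non-negative; the rewards always sum to the budget.
    """
    w = np.asarray(weights, dtype=float)
    m = len(w)
    q = w / w.sum()                    # data quality q_i
    q_bar = q.mean()                   # average quality of surviving users
    return (budget / m) * (1 - eps + eps * q / q_bar)

p = distribute_rewards([3.2, 2.9, 0.4], budget=90.0)
print(p, p.sum())   # rewards sum to the budget; higher weight => higher reward
```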

6. Analysis

In this section, we introduce property analysis, privacy analysis, and security analysis to illustrate the feasibility of our PAID.

6.1. Property Analysis

In this section, we introduce eligibility, zero knowledge, payment rationality, and budget rationality of our PAID.

Theorem 1 (eligibility). If the data $x_i$ collected by a user do not meet the eligibility requirement $\Omega$, the user cannot pass the eligibility assessment.

Proof. We assume that the user's data are denoted as $x_i$, and the eligibility requirement interval is $[a, b]$. The user produces the ciphertext $E(x_i)$ using homomorphic encryption. Then, $S$ picks different random numbers and constructs a monotone increasing (or decreasing) function $f(x) = k_1 x + k_2$. Then, $S$ computes $E(f(a))$, $E(f(b))$, and $E(f(x_i))$. When receiving them from $S$, the user decrypts to get $f(a)$, $f(x_i)$, and $f(b)$ and compares their sizes. Because the user does not know the monotonicity of the function, the user cannot tell which size relationship corresponds to a qualified value and thus cannot misreport the ordering to pass. Therefore, if the user's data are not qualified, the user cannot pass the qualification judgment.

Theorem 2 (zero knowledge). The server $S$ can determine whether a user's data meet the eligibility requirements, but it cannot learn the user's specific data content.

Proof. Similar to the description in Theorem 1, we assume that the user's data are $x_i$, and the server receives the user's homomorphic encrypted ciphertext $E(x_i)$. Since the Paillier cryptosystem is indistinguishable under the chosen plaintext attack, a malicious party has no way to recover the plaintext $x_i$ from $E(x_i)$. The server may be curious about each user's data, but it cannot obtain them without knowing the secret key.

Theorem 3 (payment rationality). If an honest user $u_i$ provides qualified data, $u_i$ can obtain a non-negative utility.

Proof. The utility of each user $u_i$ is determined by the cost $c_i$ of $u_i$ and the real reward $p_i$ from the task publisher TP, i.e., $\mu_i = p_i - c_i$.
If the data provided by an untrusted user are not qualified, the user cannot pass the eligibility assessment, so the untrusted user's utility is $\mu_i = 0$. However, when $c_i > r_{\min}$ (where $r_{\min}$ is the posted lowest reward), an honest user refuses to participate in the sensing task, so the honest user's utility is $\mu_i = 0$. When $c_i \le r_{\min}$, an honest user participates in the sensing task and earns a reward $p_i$. Since $q_i \ge 0$, $0 < \varepsilon < 1$, and $\bar{q} > 0$, we have
$$p_i = \frac{B}{m}\left(1 - \varepsilon + \varepsilon \cdot \frac{q_i}{\bar{q}}\right) \ge \frac{B}{m}(1 - \varepsilon) = r_{\min}.$$
Therefore, we can know that $\mu_i = p_i - c_i \ge r_{\min} - c_i \ge 0$. To summarize, a user's utility is always non-negative.

Theorem 4 (budget rationality). The total payment of the task publisher TP is no larger than the budget $B$ in our PAID.

Proof. The total reward for all users is calculated as follows:
$$\sum_{i=1}^{m} p_i = \sum_{i=1}^{m} \frac{B}{m}\left(1 - \varepsilon + \varepsilon \cdot \frac{q_i}{\bar{q}}\right) = B(1 - \varepsilon) + \varepsilon B \cdot \frac{\sum_{i=1}^{m} q_i}{m \bar{q}} = B.$$
Hence, $\sum_{i=1}^{m} p_i \le B$, i.e., our PAID is budget rational.

6.2. Privacy Analysis

In this section, we demonstrate the protection of user’s sensing data, location, and identity privacy in our PAID.

Theorem 5 (data and location privacy protection). Except for the user himself, no other party can obtain the user's sensing data and location data.

Proof. In PAID, the parties that may steal users' data and location privacy are mainly the server $S$ and external attackers. Specifically, the server may try to obtain users' sensing data and location privacy in the eligibility assessment and truth discovery, while external attackers steal data and location privacy by eavesdropping on the communication between the server and users.
According to Theorem 2, our PAID has the property of zero knowledge, so the server cannot learn users' sensing data and location data in the eligibility assessment. In truth discovery, users' sensing data are sent to $S$ only after performing the double-masking scheme, and the server cannot recover users' raw sensing data from the double-masked sensing data. Furthermore, the communication between a user $u_i$ and $S$ is encrypted by the AES symmetric encryption function. Therefore, as long as AES is secure, external attackers cannot steal the data by eavesdropping on the communication.

Theorem 6 (identity privacy protection). When users participate in a sensing task, they use an anonymous identity rather than their real identity. Therefore, no PPT adversary can link the anonymous identities to the users' real identities.

Proof. In PAID, the anonymous identity of a user $u_i$ is the Diffie–Hellman session key computed between $u_i$ and $S$, derived from $u_i$'s key pair and a token assigned by $S$. The user uses this anonymous identity rather than the real identity to participate in a sensing task. Because of the DDH problem, a PPT adversary cannot recover the real identity of the user from the anonymous identity. We omit the detailed proof; interested readers can learn more details in the literature [36].

6.3. Security Analysis

In this section, we describe the attacks our PAID can resist, including the denial of payment attack (DoP), inference attack (IA), data pollution attack (DPA), and Sybil attack (SA).
(1) Resistance to Denial of Payment Attack (DoP). We use the prepayment mechanism in our PAID. At the beginning of a sensing task, the task publisher TP pays the monetary rewards of users to $S$ in advance. If a malicious TP refuses to pay the monetary reward after receiving the data, $S$ can pay the reward to users according to the reward distribution formula. Therefore, the TP cannot refuse to pay users the reward.
(2) Resistance to Inference Attack (IA). The server $S$ cannot initiate an inference attack on users' data due to the zero-knowledge property of our PAID.
(3) Resistance to Data Pollution Attack (DPA). Our PAID introduces the eligibility assessment, and unqualified data submitted by users are not used in the truth discovery algorithm. Therefore, our PAID can resist the data pollution attack (DPA).
(4) Resistance to Sybil Attack (SA). The anonymous identity of a user requires both the information provided by the user and the token assigned by $S$. Each user can obtain only one token from $S$ and then get the anonymous identity using the key agreement algorithm. Hence, untrusted users cannot forge vast numbers of fake identities to launch the Sybil attack (SA).

7. Performance Evaluation

In this section, we use a temperature dataset from Roma for performance evaluation. First, we describe the computational and communication overhead of the eligibility assessment. Then, we show the performance of the truth discovery algorithm. Finally, the comparison with the related work shows that the quality quantification and incentive mechanism are effective.

In our experiment, the server $S$ has an Intel(R) Xeon(R) E3-1231 v3 3.4 GHz CPU, 16 GB RAM, a 256 GB SSD, and a 1 TB mechanical hard disk and runs the Ubuntu 18.04 operating system. The mobile devices run Android with a 2.2 GHz CPU and 4 GB RAM. The Roma temperature dataset includes users' IDs, dates, times, longitudes, latitudes, and temperatures. In particular, the range accuracy of location, time, and sensing data (temperature) is 1 meter, 1 second, and 0.01°C, respectively. Before performing the eligibility assessment, we convert each decimal interval to the corresponding integer interval by moving the decimal point to the right. Figure 3 shows the statistical results of the 232 qualified users. And we select 100 records from all qualified data for performance evaluation.

7.1. Evaluation of Eligibility Assessment

In this section, we analyze the computational and communication overhead in the eligibility assessment. Table 2 shows the performance comparison between our PAID and related work.

7.1.1. Computational Overhead

A Paillier homomorphic encryption requires two exponentiations, one multiplication, and one modular operation, and a decryption requires two exponentiations, three divisions, and two modular operations. In our interval judgment scheme, each user performs one encryption and one decryption, so the computational cost on the user side grows linearly with the number of users $n$. The server performs one encryption and computes $E(f(a))$, $E(f(b))$, and $E(f(x_i))$ for each user, so the computational overhead of the server is also linear in $n$. Consequently, the computation complexity of the interval judgment scheme is $O(n)$.

7.1.2. Communication Overhead

According to our interval judgment scheme, users need to send the encrypted data $E(x_i)$ to the server $S$, and this communication overhead is $\ell n$ bits, where $\ell$ denotes the bit length of a Paillier ciphertext, which is determined by $N$, the product of the two large primes $p$ and $q$ (a ciphertext lies in $\mathbb{Z}_{N^2}$). After receiving the encrypted data $E(x_i)$, the server calculates $E(f(a))$, $E(f(b))$, and $E(f(x_i))$ and sends them to the user, which costs $3\ell n$ bits. So, we can conclude that the total communication overhead is $O(\ell n)$ bits.

7.2. Evaluation of Truth Discovery

In this section, we select 100 users to participate in the performance comparison of truth discovery. We compare the truth discovery of our PAID with the related work in five aspects: accuracy, convergence, robustness to users dropping out, computational overhead, and communication overhead. The evaluation results show that our truth discovery algorithm has good accuracy, quick convergence, and high robustness to users dropping out. Besides, the computational overhead and communication overhead of our algorithm are better than those of the related work. Therefore, our truth discovery algorithm is reasonable.

7.2.1. Accuracy

We utilize the root mean squared error (RMSE) to measure the resulting accuracy of PAID and CRH [32]. Figure 4 shows that the accuracy rates of PAID and CRH are similar when different numbers of users participate in a sensing task.

7.2.2. Convergence

To prove the convergence ability of our truth discovery algorithm in PAID, we choose four different initial values to calculate the error rate of ground truth. As shown in Figure 5, our PAID can converge quickly in a few iterations when choosing different initial values.

7.2.3. Robustness to Users Dropping Out

To analyze the robustness of our PAID to dropped users, we count the number of PAID failures and compare it with the related work PPTD [41]. A failure means that the model cannot continue to run and has to restart because of users' exit. In PPTD, it is considered a failure once a user quits at any point in the truth discovery process. In our PAID, it is deemed a failure only when the number of online users falls below the threshold $t$ set in our experiment. We repeat the experiment 50 times to count the failure times of the two models. Figure 6 shows the failure times of the two models when different numbers of users participate in a sensing task. The number of PPTD failures increases as the number of users increases, whereas our PAID remains robust to dropped users as long as the number of online users stays above the threshold.

7.2.4. Computational Overhead

We compare the computational overhead of PAID and PPTD [41]. Figure 7 shows the running time of the two schemes for different users. It is evident that the running time of our PAID is far less than that of PPTD.

7.2.5. Communication Overhead

We count the communication overhead of users in a complete iterative process and compare our scheme with PPTD [41]. And we do not count the server’s communication overhead because we can regard the total communication overhead of all users as the communication cost of the server. Table 3 shows that the communication overhead of our PAID is far less than that of PPTD, although the number of users is different.

7.3. Evaluation of Incentive Mechanism

In this section, we compare the monetary rewards of our PAID and related work. In the experiment, we select 100 users, including 80 qualified users and 20 unqualified users, and fix the budget $B$ and the reward control parameter $\varepsilon$. DQTE [42] is a scheme that includes unqualified users in reward distribution, while DQTE+ removes unqualified users before reward distribution. As Figure 8 shows, users in DQTE get almost the same rewards. Although DQTE+ removes unqualified users, there is no obvious difference in users' rewards except for the increase in each user's monetary reward. However, our scheme provides higher monetary rewards for users who submit higher-quality data. Therefore, our scheme can effectively motivate users to provide high-quality sensing data.

8. Related Work

Truth discovery is an effective technology that can calculate the ground truth and users' data quality from conflicting sensing data. Li et al. [32] proposed a general truth discovery scheme, but privacy protection is not in their work scope. To protect users' private data, Miao et al. [41] proposed the first privacy-preserving truth discovery scheme using the Paillier cryptosystem, but the computational and communication costs are huge. Zheng et al. [43] designed a privacy-aware truth discovery scheme, which greatly reduces the computational and communication overhead through a secure sum protocol. Zhang et al. [44] designed a truth discovery scheme using a one-way hash chain to ensure privacy security, in which all truth discovery operations are completed by fog and cloud platforms. Tang et al. [45] used two servers to complete the calculation process of truth discovery, which can effectively protect users' sensing data privacy. However, these works do not take into account the failure of the MCS system caused by users' exit. Bonawitz et al. [39] proposed a double-masking scheme for secure data aggregation that allows users to exit. After that, Xu et al. [15] designed a privacy-preserving truth discovery scheme based on the double-masking scheme. However, these truth discovery schemes do not incorporate incentive mechanisms: if malicious users constantly input erroneous data, the reliability of the results in the MCS system suffers.

Another line of previous work [42, 46] related to this paper is the incentive mechanism in the MCS system. Zhang et al. [47] presented a reverse auction model which can motivate online users to participate in sensing tasks. Jin et al. [16] designed an incentive mechanism model based on reverse combinatorial auctions, which can maximize social welfare and effectively motivate users. Yang et al. [42] introduced a quality-aware incentive mechanism, which distributes rewards to users after calculating the data quality. However, these works do not consider the privacy of users. In [27], the authors designed a privacy-preserving incentive mechanism model; nevertheless, it cannot eliminate users who provide erroneous data. Zhao et al. [20] presented an incentive mechanism model to evaluate the reliability of users' data while protecting data privacy; still, the users' sensing data need to be submitted to the task publisher, so the privacy protection of the sensing data remains insufficient. Later, Zhao et al. [48] proposed a privacy-preserving incentive mechanism based on truth discovery, which uses two servers to achieve real-time reward distribution while protecting users' privacy. However, most existing works do not take users' exit into account.

9. Conclusion

In this paper, we propose a privacy-preserving incentive mechanism based on truth discovery in the MCS system. Specifically, we introduce an eligibility assessment scheme to estimate whether the data submitted by users are qualified. Next, the truth discovery scheme calculates the ground truth and the weight of each user. Then, we quantify the data quality of users by the weight and distribute the rewards. Besides, we also demonstrate that PAID meets eligibility, zero knowledge, payment rationality, and budget rationality. And the analysis shows that our PAID can resist the denial of payment attack, inference attack, data pollution attack, and Sybil attack. Finally, experiments illustrate that PAID is effective, efficient, and robust to dropped users. In future work, we will design an incentive mechanism model for the application of multidimensional sensing data collection.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (nos. 61962022 and 62062034) and Key Research and Development Plan of Jiangxi Province (no. 20192BBE50077).