Abstract

The Internet of Things (IoT) has attracted the interest of both academia and industry and enables various real-world applications. The acquisition of large amounts of sensing data is a fundamental issue in IoT. An efficient way to obtain sufficient data is mobile crowdsensing, a promising paradigm that leverages the sensing capacity of portable mobile devices. The crowdsensing platform is the key entity that allocates tasks to participants in a mobile crowdsensing system. The task allocation strategy is crucial for the crowdsensing platform, since it affects the data requesters' confidence, the participants' confidence, and the platform's own benefit. Traditional allocation algorithms disregard privacy preservation, which may cost the platform the confidence of participants. In this paper, we propose a novel three-step algorithm that allocates tasks to participants with privacy consideration. It maximizes the benefit of the crowdsensing platform and meanwhile preserves the privacy of participants. Evaluation results on both benefit and privacy aspects show the effectiveness of the proposed algorithm.

1. Introduction

The IoT is an efficient network that connects various devices through the Internet. It often consists of sensor-equipped devices that can sense, communicate, and react to environmental variations [1]. IoT is developing rapidly in both academia and industry because of its promising market value [2, 3]. A large number of IoT applications have been developed and deployed in society, such as the smart city [4, 5], smart grid [6, 7], and smart traffic [8, 9]. Most IoT applications require large amounts of sensing data for monitoring and computing. Therefore, methods of acquiring sensing data are fundamental in IoT. An efficient way to acquire large amounts of sensing data is mobile crowdsensing, a promising sensing paradigm that encourages crowds to use mobile devices to collect sensing data. Since small, portable mobile devices have become extremely prevalent in modern society, mobile crowdsensing shows high performance in the collection of sensing data [10–12]. There is a wide range of IoT applications based on mobile crowdsensing, such as environmental monitoring [13, 14], healthcare [15], and smart cities [16, 17].

A mobile crowdsensing system typically consists of a crowdsensing platform (CSP), a set of data requesters, and a set of participants. Data requesters publish requirements for the sensing data to the CSP. The CSP segments the requirements into tasks, allocates the segmented tasks to suitable participants, and releases the data uploaded by participants to the requesters. The goals of the CSP are maintaining sufficient participants and maximizing its own benefit. The task allocation strategy is crucial for the CSP, since it affects the data requesters' confidence, the candidate participants' confidence, and the CSP's own benefit. Specifically, participants care about their workloads and compensation. They like to collaborate with a CSP that always assigns proper sensing tasks and offers fair compensation. If a CSP always assigns improper tasks to participants, e.g., requires participants to go far away from their daily active regions, the CSP will lose many participants. This may decrease the CSP's capability of acquiring sensing data, lose the data requesters' confidence, and finally reduce the CSP's benefit. Thus, the CSP should allocate proper tasks to participants for the above reasons.

Moreover, privacy preservation has become important for the CSP, because crowds have come to care about the disclosure of their sensitive information in recent years [18–21]. Only a CSP that preserves privacy can maintain sufficient participants. In this study, we treat the CSP as a trusted entity for participants. However, data requesters are untrusted, and we treat them as potential adversaries. They may be curious about the sensitive information of participants. For example, the requested data is always associated with locations in crowdsensing. Adversaries can extract the movement patterns of participants from the acquired data. The movement patterns are sensitive, since adversaries may be able to identify the addresses of participants' homes, schools, or workplaces [22]. Adversaries usually choose the participants with abnormal profiles as vulnerable users and are likely to execute further attacks, so privacy preservation is important for task allocation as well.

Therefore, we investigate the strategy of task allocation with both the basic considerations and the privacy consideration. Specifically, we first formulate the problem of task allocation. This formulation carefully considers the utility of the CSP and the privacy disclosure of participants. A task allocation algorithm with privacy preservation (TAPP) is proposed. It consists of three phases: allocating tasks without privacy preservation, modifying allocations with the privacy consideration, and merging the allocations. Furthermore, a series of evaluations shows that the proposed algorithm achieves outstanding performance in many aspects. Our contributions are summarized as follows.

(1) We first formulate the problem of task allocation with privacy preservation on the CSP's site. We utilize the relative entropy to formulate the privacy disclosure of the participants. The problem formulation is based on a series of assumptions, such as limits on the participant's total time cost and privacy disclosure.

(2) A three-step algorithm, named TAPP, is proposed to allocate proper tasks to participants. The output allocation strategy gains a high benefit for the CSP and meanwhile preserves the privacy of participants.

(3) Extensive evaluations are executed based on real-world crowdsensing datasets. The evaluation results show that TAPP performs well on maximizing the CSP's benefit and preserving the privacy of participants simultaneously.

The remainder of the paper is organized as follows. Related works are introduced in Section 2. The problem formulation is presented in Section 3. Section 4 discusses the complexity of the formulated problem and introduces the proposed algorithm. Section 5 validates the effectiveness of the algorithm in several aspects. Section 6 concludes the paper.

2. Related Work

A large number of researchers have concentrated on task allocation in mobile crowdsensing. Traditional methods of task allocation lack privacy considerations. They make allocation strategies according to basic metrics, such as the quality of sensing data [23–25], the incentive cost [26, 27], the energy consumption [14, 28], and the travel distance [29–31]. The methods that focus on the quality of sensing data are mostly designed for monitoring the environment. They measure the quality of sensing data by a certain metric and attempt to maximize the data quality. The methods that focus on the incentive cost allocate tasks on the site of the CSP. Xiong et al. [26] propose an incentive mechanism which minimizes the total budget of the CSP. In their study, the CSP pays according to the number of participants. Zhang et al. [27] design a different incentive mechanism which assumes the CSP pays according to tasks. Xiong et al. [14] propose an energy-saving technique, named piggybacking. It is an optimal collaborative data sensing and uploading scheme which reduces the energy consumption. The travel distance is widely considered in previous studies as well. The methods in [29–31] measure the travel distances of participants by numerical values. They share the same objective of minimizing the overall travel distance for all sensing tasks.

Crowds have come to care more about the disclosure of their sensitive information in recent years [32]. Privacy preservation in mobile crowdsensing has thus attracted increasing research interest. Numerous preservation methods have been proposed regarding spatial privacy, one of the most important privacy concerns. Some traditional methods [33–36] based on spatial cloaking are suitable for preserving privacy in mobile crowdsensing. These methods hide the participants' spatial information by spatial transformations, generalization, or a set of dummy locations. Kazemi et al. [37] propose a privacy protection method which applies directly to mobile crowdsensing. This method considers the CSP untrusted and adjusts the spatial information of a participant group. To et al. [38] adopt the differential privacy mechanism and propose a method for spatial crowdsensing task allocation. This method sets up a trusted third-party entity to aggregate the spatial information of participants. Wang et al. [39] propose a truthful incentive mechanism which preserves privacy based on differential privacy and auction theory. Duan et al. [40] introduce the reverse auction to task allocation and design an allocating algorithm from a novel perspective.

A closely related work to ours is presented by Wang et al. [41]. They first preserve the spatial privacy on the participant's site and then allocate the tasks, which adds an extra procedure for the participants. Their preservation mechanism is based on differential privacy. In contrast, our study preserves the privacy on the CSP's site, which does not bother the participants.

3. Problem Formulation

This section introduces the general system model, system input, the definition of utility, and the requirements of privacy preservation. The list of main notations is shown in Table 1.

3.1. System Model

A general mobile crowdsensing system consists of three main entities: participants, the CSP, and data requesters, as shown in Figure 1. In this study, we consider the task allocation problem under several specific assumptions of these entities, described as follows.

Participants. The participants receive assigned crowdsensing tasks from the CSP. Each participant actively finishes the assigned tasks on time, provided the tasks are not excessive and their privacy is protected. After that, they get a reward from the CSP, paid daily or monthly.

Crowdsensing Platform (CSP). The CSP is trusted by participants and knows sensitive information of participants. The CSP receives the crowdsensing requirements from the data requesters and assigns tasks to the suitable participants. The CSP releases the requested data to data requesters and gets rewards from them.

Data Requesters. The data requesters acquire data from the CSP. They may extract the sensitive information of participants from the acquired data, which leads to a privacy leakage.

Regarding the payment assumption for participants, we consider the scenario in which each participant who finishes all assigned tasks is paid daily or monthly [42, 43]. This payment setting is helpful for the quality of the acquired sensing data, since it assigns tasks regularly to fixed participants.

3.2. System Input

Assume the crowdsensing system executes the crowdsensing work in a fixed area, which consists of $m$ subregions $R = \{r_1, r_2, \dots, r_m\}$. The crowdsensing system has $n$ participants $U = \{u_1, u_2, \dots, u_n\}$. Each participant $u_i$ has a personal movement pattern in real life. We call this pattern the participant's actual profile, defined as $P_i = (p_{i,1}, p_{i,2}, \dots, p_{i,m})$, where $p_{i,j} \in [\epsilon, 1]$, $\epsilon$ is an infinitely small quantity, and $\sum_{j=1}^{m} p_{i,j} = 1$. We use $\epsilon$ instead of zero as the lower bound, because this avoids an undefined term when we calculate the privacy disclosure. We call $r_j$ an inactive subregion of $u_i$ if $p_{i,j} = \epsilon$, which means the participant never goes to subregion $r_j$. Otherwise, we call $r_j$ an active subregion of $u_i$. The CSP acquires the actual profile of each participant by requiring this information in the registration procedure.

When the CSP receives original tasks from the data requesters, the CSP divides the original tasks into unit tasks. Each unit task requires the same time cost and is associated with a subregion $r_j$. We set the time cost of one unit task to one for simplicity. The CSP merges all unit tasks according to their associated subregions; thus the workload (required time cost) in each subregion is $w_j$, $j = 1, \dots, m$. For instance, $w_j = 1$ means there is only one unit task required in $r_j$.
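The unit-task merging step can be illustrated with a minimal Python sketch (the function name and the list-of-indices representation are our own, not from the paper): each unit task is reduced to its subregion index, and the workload $w_j$ is simply the count of unit tasks per subregion.

```python
def aggregate_workloads(unit_tasks, m):
    """Merge unit tasks by subregion: w[j] is the workload (required time
    cost, i.e., number of unit tasks) in subregion j, for m subregions."""
    w = [0] * m
    for region in unit_tasks:  # each unit task carries only its subregion index
        w[region] += 1
    return w
```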

The CSP allocates the unit tasks to the participants; the allocation of participant $u_i$ is denoted as $A_i = (a_{i,1}, a_{i,2}, \dots, a_{i,m})$, where $a_{i,j}$ is the time participant $u_i$ must spend to finish the allocated tasks in region $r_j$. Each participant has a time threshold $T_i$. Assume the participants care about both their workloads and their work regions. Therefore, the CSP should avoid the following situations to maintain sufficient participants.

(1) Total time cost: if the total time cost exceeds the time threshold of the participant, i.e., $\sum_{j=1}^{m} a_{i,j} > T_i$, he/she will quit the crowdsensing system. This means the participant is assigned too much work with insufficient rewards.

(2) Work regions: if the participant is assigned to a location where he/she never goes, i.e., $a_{i,j} > 0$ while $p_{i,j} = \epsilon$, he/she will quit the crowdsensing system. This means the participant is assigned unsuitable work with regard to his/her actual profile.
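The two quit conditions can be captured by a small acceptance check, sketched below in Python (a simplification under our own representation: profiles and allocations as lists, with a tiny constant standing in for the paper's infinitely small $\epsilon$).

```python
EPS = 1e-9  # stand-in for the paper's infinitely small epsilon

def acceptable(allocation, profile, time_threshold):
    """Check the two quit conditions from Section 3.2:
    (1) total time cost must not exceed the time threshold;
    (2) no task may fall in an inactive subregion (p_ij == epsilon)."""
    if sum(allocation) > time_threshold:
        return False
    for a_ij, p_ij in zip(allocation, profile):
        if a_ij > 0 and p_ij <= EPS:
            return False
    return True
```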

The CSP collects the data generated by participants and sends the data to the data requesters for rewards. The data requesters can then extract a movement pattern of each participant from the acquired data. We define this pattern as the observed profile $O_i = (o_{i,1}, o_{i,2}, \dots, o_{i,m})$ of each participant $u_i$, where $o_{i,j} \in [0, 1]$ and $\sum_{j=1}^{m} o_{i,j} = 1$. Each $o_{i,j}$ is calculated as follows: $o_{i,j} = a_{i,j} / \sum_{k=1}^{m} a_{i,k}$.
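The normalization above can be written as a one-step Python sketch (function name is illustrative); a participant with no allocated task has no observable profile.

```python
def observed_profile(allocation):
    """Normalize a participant's allocation into the observed profile
    o_ij = a_ij / sum_k a_ik (Section 3.2). Returns None when the
    participant has no allocated task and is thus unobservable."""
    total = sum(allocation)
    if total == 0:
        return None
    return [a / total for a in allocation]
```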

3.3. Utility

The utility of the CSP is its benefit. Recall the payment assumption: if a participant finishes all assigned tasks, the participant gets paid daily or monthly; if a participant has no assigned task, the participant gets no payment. The CSP gets a reward $B$ when all the requested tasks are finished; meanwhile it pays each recruited participant $c$.

The utility of the CSP is $\mathrm{Utility} = B - c \cdot |\{u_i \in U : \sum_{j=1}^{m} a_{i,j} > 0\}|$.
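Under the notation above, the utility is the reward minus the per-participant payment for every recruited participant (one with at least one assigned task); a minimal Python sketch:

```python
def csp_utility(allocations, reward, payment):
    """Utility = reward B for finishing all tasks minus payment c for
    every recruited participant (one with at least one unit task)."""
    recruited = sum(1 for a in allocations if sum(a) > 0)
    return reward - payment * recruited
```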

3.4. Privacy

Since the data requesters are curious and untrusted, they may act as adversaries. They infer the movement patterns of participants, which are the sensitive information of concern, from the observed profiles. Adversaries usually choose the participants with abnormal profiles as vulnerable users and are likely to execute further attacks. Thus, we define the difference between an individual and its community as the privacy disclosure in this study.

Assume participant $u_i$ is in a community $C$ with several other participants. The average profile of the community is $\bar{P} = (\bar{p}_1, \bar{p}_2, \dots, \bar{p}_m)$, where $\bar{p}_j = \frac{1}{|C|} \sum_{u_k \in C} p_{k,j}$.

Then we define the privacy disclosure of $u_i$ as the relative entropy between $O_i$ and $\bar{P}$: $D(u_i) = \sum_{j=1}^{m} o_{i,j} \log (o_{i,j} / \bar{p}_j)$. Furthermore, each participant may set a privacy threshold on this divergence, noted as $\eta_i$. When the CSP allocates tasks to participants, the relative entropy for each participant should be bounded by the privacy threshold, i.e., $D(u_i) \le \eta_i$, to guarantee that adversaries cannot learn significant private information from the observed profiles.
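As a small illustrative Python sketch (function names and the list-based profiles are our own), the community average profile and the relative-entropy disclosure check can be written as:

```python
import math

def community_profile(actual_profiles):
    """Average the actual profiles of all community members (Section 3.4)."""
    n = len(actual_profiles)
    m = len(actual_profiles[0])
    return [sum(p[j] for p in actual_profiles) / n for j in range(m)]

def privacy_disclosure(observed, community):
    """Relative entropy D(O_i || P_bar) between a participant's observed
    profile and the community's average profile; zero terms are skipped."""
    return sum(o * math.log(o / p) for o, p in zip(observed, community) if o > 0)

def is_safe(observed, community, threshold):
    """A participant stays safe when the disclosure is within eta_i."""
    return privacy_disclosure(observed, community) <= threshold
```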

3.5. Design Objective

Our objective is to derive an allocation scheme for the CSP, such that sufficient participants are maintained by allocating suitable tasks and preserving their privacy, while maximizing the benefit of the CSP. We formalize the problem as follows:

maximize $B - c \cdot |\{u_i \in U : \sum_{j=1}^{m} a_{i,j} > 0\}|$

subject to $\sum_{i=1}^{n} a_{i,j} = w_j, \; \forall r_j \in R$, (9)

$\sum_{j=1}^{m} a_{i,j} \le T_i \text{ and } a_{i,j} = 0 \text{ whenever } p_{i,j} = \epsilon, \; \forall u_i \in U$, (10)

$D(u_i) \le \eta_i, \; \forall u_i \text{ with } \sum_{j=1}^{m} a_{i,j} > 0$. (11)
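A checker for the three constraints can be sketched in Python (all names are illustrative, and we assume, as in Section 3.2, that tasks in inactive subregions are forbidden as part of constraint (10)):

```python
import math

def feasible(alloc, workloads, profiles, t_thr, p_thr, community, eps=1e-9):
    """Check a candidate allocation against the formulation:
    (9) every workload is covered exactly, (10) time thresholds and
    active-region limits hold, (11) recruited participants stay within
    their privacy thresholds."""
    m = len(workloads)
    # (9) coverage of every subregion's workload
    for j in range(m):
        if sum(a[j] for a in alloc) != workloads[j]:
            return False
    for a, prof, T, eta in zip(alloc, profiles, t_thr, p_thr):
        # (10) time threshold and no tasks in inactive subregions
        if sum(a) > T or any(x > 0 and p <= eps for x, p in zip(a, prof)):
            return False
        # (11) privacy threshold for recruited participants only
        if sum(a) > 0:
            obs = [x / sum(a) for x in a]
            d = sum(o * math.log(o / c) for o, c in zip(obs, community) if o > 0)
            if d > eta:
                return False
    return True
```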

4. Task Allocation Algorithm

In this section, we first analyze the complexity of the formulated problem. Then we give an overview of the proposed algorithm. In the remaining parts, the main phases of the whole algorithm are presented.

4.1. Complexity

The problem of achieving the maximum benefit for the CSP under constraints (9), (10), and (11) is NP-hard.

Consider an arbitrary instance of the minimum set cover problem, consisting of a universal set $E = \{e_1, \dots, e_m\}$ and a series of subsets $S_1, \dots, S_n \subseteq E$. We construct an instance of the maximum benefit problem corresponding to this instance of the minimum set cover problem. We set the privacy threshold $\eta_i = +\infty$ and the time threshold $T_i = |S_i|$ for each $u_i$, which means there is no privacy constraint and the time threshold equals the number of subregions where the participant goes. We construct unit workloads $w_j = 1$ for the crowdsensing regions $r_1, \dots, r_m$, which correspond to the elements of the universal set, and participants $u_1, \dots, u_n$ corresponding to the subsets. Each $u_i$ corresponding to $S_i$ is associated with the actual profile $P_i$, where $p_{i,j} = 1/|S_i|$ if $e_j \in S_i$; otherwise $p_{i,j} = \epsilon$, where $|S_i|$ is the element number of $S_i$. To maximize the benefit, an allocation should let each selected participant take one unit task in each of its covered regions and leave the remaining participants unallocated. Since using fewer participants increases the benefit of the CSP, an allocation that concentrates the tasks on fewer participants is more likely to maximize the benefit.

Thus, finding the minimum set cover is equivalent to finding an allocation which covers all the tasks and selects the minimum number of participants, i.e., achieving the maximum benefit. Given the reduction shown above, the formulated problem is NP-hard.
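The reduction above can be made concrete with a short Python sketch that builds the allocation instance from a set cover instance (the dictionary encoding of a participant is our own choice, not from the paper):

```python
EPS = 1e-9  # stand-in for the infinitely small epsilon

def reduction_instance(universe, subsets):
    """Build the allocation instance used in the NP-hardness proof:
    one subregion per element with unit workload, one participant per
    subset with a uniform profile over its elements, time threshold
    |S_i|, and no privacy constraint."""
    elems = sorted(universe)
    idx = {e: j for j, e in enumerate(elems)}
    workloads = [1] * len(elems)
    participants = []
    for s in subsets:
        profile = [EPS] * len(elems)
        for e in s:
            profile[idx[e]] = 1.0 / len(s)
        participants.append({
            "profile": profile,
            "time_threshold": len(s),
            "privacy_threshold": float("inf"),
        })
    return workloads, participants
```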

4.2. Overview of the Algorithm

Our task allocation algorithm with privacy preservation, called TAPP, runs on the site of the CSP. The CSP first receives original tasks from the data requesters, divides them into unit tasks, and merges all unit tasks according to their associated subregions. Then the algorithm analyzes the information uploaded by participants and acquires the time thresholds, privacy thresholds, actual profiles, and community profiles. After collecting all the inputs, the algorithm makes an allocation strategy in the following three phases: (i) allocating tasks without privacy preservation, as shown in Algorithm 1; (ii) modifying allocations with the privacy consideration, as shown in Algorithm 2; (iii) reducing the number of allocated participants by merging the tasks of two participants, as shown in Algorithm 3.

Input: participants $U$, workloads $W$, each community profile $\bar{P}$, each actual profile $P_i$, each time threshold $T_i$
Output: the allocation $A_i$ for each $u_i$
1:for each   do
2:set each , ;
3:compute by ;
4:for each   do
5:if    then  ;
6:else  ;
7:for each   do
8:if    then fails;
9:while    do
10:update ;
11:;
12:, ;
13:for each   do
14:if    then  ;
15:if    then fails;
16:choose with , minimum , and maximum ;
17:, , ;
18:if    then each ;
19:if   and   then fails;
Input: subregions $R$, participants $U$, each community profile $\bar{P}$, each workload $w_j$
Output: the allocation $A_i$ for each $u_i$
1:update each by ;
2:for each that   do
3:if    then  ;
4:each ;
5:sort by in descending order;
6:sort by in ascending order;
7:for each   do
8:compute ;
9:while    do
10:, , ;
11:for each   do
12:if   and
13:then compute ;
14:else continue;
15:if    then  ;
16:else if  
17:then  ;
18:if    then
19:;
20:else if    then
21:;
22:else fails;
23:, ;
24:, ;
25:if    then   deletes ;
26:if   or   then break;
27:compute ;
Input: subregions $R$, participants $U$, workloads $W$, each community profile $\bar{P}$, each actual profile $P_i$
Output: the allocation $A_i$ for each $u_i$
1:update each by ;
2:for each   do
3:for each that   do
4:if   and covers   then
5:;
6:compute by ;
7:if    then
8:, ;
9:each , ;
10:for each that   do
11:if    then fails;
4.3. Task Allocation Phase

In this part, we introduce the first phase of the TAPP algorithm. It iteratively picks a unit task according to the task priority and allocates it to a participant according to the participant priority. The algorithm only considers the time constraints of participants and ignores the privacy constraints. Generally, this phase properly allocates all tasks to participants with no consideration of privacy preservation.

Combined with Algorithm 1, the algorithm first initializes the allocations $A_i$, the total workload of each participant, and the indicator set $B_i = (b_{i,1}, \dots, b_{i,m})$ for each $u_i$. Each $b_{i,j}$ indicates whether $r_j$ is an active or inactive subregion of $u_i$; i.e., $b_{i,j} = 1$ if $p_{i,j} > \epsilon$; otherwise $b_{i,j} = 0$. Then the algorithm computes the set $V = \{v_1, \dots, v_m\}$ in Lines 4~6, where $M$ is a very large number. Each $v_j$ denotes the total available time of the participants who are active in subregion $r_j$. It checks the worst case that the tasks are unfinished even if all participants spend all of their time in one subregion (Lines 7 and 8). After this, it allocates one unit task to a participant step by step, until all tasks are allocated.

In each step, the algorithm chooses a subregion and a participant for task allocation. Specifically, it first updates the set $V$ by the method shown in Lines 4~6. Then it computes the priority of the tasks associated with different subregions. The priority set is $G = \{g_1, \dots, g_m\}$, where $g_j = v_j - w_j$. We choose $g_j$ to indicate the priority, since the tasks in the subregion with the minimum excess available time should be assigned first. For example, assume $w_j = 10$ for some subregion $r_j$ that is an active subregion only for participants $u_1$ and $u_2$, and the time thresholds of $u_1$ and $u_2$ are both 5. Then a feasible allocation can only let $u_1$ and $u_2$ spend all of their time to finish the tasks in $r_j$. Hence the algorithm chooses the subregion associated with the minimum $g_j$ in $G$. Among the participants for whom $r_j$ is an active subregion, it chooses the participant with the minimum number of active subregions and the maximum remaining time. Then, the algorithm allocates a unit task from $r_j$ to the chosen participant (Line 17). When a participant's workload reaches the time threshold, it sets all of that participant's indicators $b_{i,j}$ to 0. This makes a participant who has no remaining time never be chosen again.

The allocation iterates until all the tasks are allocated; otherwise, the algorithm fails (Line 19). The allocation loop is the main part of this algorithm and dominates the total running time, since the total workload is always much bigger than the number of participants in practice.
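The phase-one strategy can be condensed into the following simplified Python sketch; it keeps the priorities described above (least spare capacity first, then the participant with the fewest active subregions and the most remaining time) but omits the bookkeeping details, and possibly the exact tie-breaking, of Algorithm 1:

```python
def allocate_without_privacy(workloads, profiles, thresholds, eps=1e-9):
    """Phase-one sketch: repeatedly pick the subregion with the least
    spare capacity (available time of active participants minus its
    remaining workload) and give one unit task to the active participant
    with the fewest active subregions and the most remaining time."""
    m, n = len(workloads), len(profiles)
    w = list(workloads)
    alloc = [[0] * m for _ in range(n)]
    rest = list(thresholds)
    active = [[p > eps for p in prof] for prof in profiles]
    while any(w):
        # available time per subregion from still-active participants
        avail = [sum(rest[i] for i in range(n) if active[i][j]) for j in range(m)]
        candidates = [j for j in range(m) if w[j] > 0]
        if any(avail[j] < w[j] for j in candidates):
            return None  # infeasible even without privacy constraints
        j = min(candidates, key=lambda j: avail[j] - w[j])
        pool = [i for i in range(n) if active[i][j] and rest[i] > 0]
        i = min(pool, key=lambda i: (sum(active[i]), -rest[i]))
        alloc[i][j] += 1
        w[j] -= 1
        rest[i] -= 1
        if rest[i] == 0:
            active[i] = [False] * m  # exhausted participants are never chosen again
    return alloc
```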

4.4. Allocation Modification Phase

We introduce the second phase of TAPP in this part. We call a participant whose privacy leakage is bigger than the privacy threshold a dangerous participant. In this phase, the algorithm modifies the allocations among participants in order to reduce the number of dangerous participants. Specifically, it transfers some workloads from dangerous participants to safe participants, in order to make all participants safe.

Combined with Algorithm 2, the algorithm first updates each set $B_i$ by the actual profiles. Then it checks all allocated participants and adds the dangerous participants into a set $U_d$ (Lines 2~3). A second set $U_s$ contains two kinds of participants: (i) the safe participants who have some allocated tasks but whose workloads are less than the time threshold; (ii) the participants who have no allocated task. The algorithm sorts the participants in $U_d$ by their workloads in descending order, and it sorts the participants in $U_s$ by their remaining time in ascending order. This means the dangerous participants first choose safe participants with allocated tasks when they search for safe participants to transfer their workloads to. After these steps, the algorithm transfers some allocations from dangerous participants to safe participants.

For each dangerous participant $u_d$, the algorithm first computes the set $\Delta^- = \{\delta^-_1, \dots, \delta^-_m\}$. Each $\delta^-_j$ denotes the variation of the privacy leakage if $u_d$ gives up a unit task associated with $r_j$. Moreover, we set $\delta^-_j = M$ if there is no candidate safe participant to take over a unit task in $r_j$, where $M$ is a very large number. Thus, $\delta^-_j < 0$ means the privacy leakage will be reduced if a unit task in $r_j$ is removed from $u_d$. The algorithm iteratively transfers the unit task associated with the minimum $\delta^-_j$ to a candidate safe participant, until the privacy leakage of the dangerous participant is less than the threshold or its workload is zero (Lines 9~27). Specifically, a candidate safe participant $u_s$ should be active in $r_j$ and have remaining time, given the transferred task associated with $r_j$. Then the algorithm computes each $\delta^+_j$ (Lines 12~14), which is similar to $\delta^-_j$; the difference is that $\delta^+_j$ is the variation of the privacy leakage if $u_s$ takes an extra unit task associated with $r_j$. The algorithm chooses the candidate according to the remaining workload (Lines 18~22). Then the algorithm transfers a unit task in $r_j$ from the dangerous $u_d$ to the safe $u_s$.

The algorithm attempts to modify the allocations of all dangerous participants by the above transfer method. In the best case, there is no dangerous participant left after these modifications. The running time of this phase is dominated by the transfer loop over the dangerous participants.
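A single transfer of the modification phase might be sketched as follows (a simplification of Algorithm 2: we search regions and receivers jointly, rather than maintaining the $\Delta^-$ and $\Delta^+$ sets explicitly; all names are illustrative):

```python
import math

def kl(dist, ref):
    """Relative entropy between a profile and the community profile."""
    return sum(p * math.log(p / q) for p, q in zip(dist, ref) if p > 0)

def observed(a):
    t = sum(a)
    return [x / t for x in a] if t else None

def transfer_step(alloc, donor, profiles, community, t_thr, p_thr, eps=1e-9):
    """One transfer of the modification phase: move a single unit task off
    the dangerous `donor`, choosing the subregion whose removal lowers the
    donor's leakage most, to a receiver that stays within its time and
    privacy thresholds. Returns True when a transfer was made."""
    best = None  # (donor_kl_after, region, receiver)
    for j, a_j in enumerate(alloc[donor]):
        if a_j == 0:
            continue
        after = list(alloc[donor])
        after[j] -= 1
        d_kl = 0.0 if sum(after) == 0 else kl(observed(after), community)
        for r in range(len(alloc)):
            if r == donor or profiles[r][j] <= eps:
                continue  # receiver must be active in r_j
            if sum(alloc[r]) + 1 > t_thr[r]:
                continue  # receiver's time threshold must hold
            recv = list(alloc[r])
            recv[j] += 1
            if kl(observed(recv), community) > p_thr[r]:
                continue  # receiver must stay safe after the transfer
            if best is None or d_kl < best[0]:
                best = (d_kl, j, r)
    if best is None:
        return False
    _, j, r = best
    alloc[donor][j] -= 1
    alloc[r][j] += 1
    return True
```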

4.5. Allocation Mergence Phase

After the modification phase, we introduce the allocation mergence phase of TAPP in this part. The basic idea of this phase is that we can transfer all the allocations of a participant $u_i$ to another participant $u_k$, if the time and privacy constraints of $u_k$ are still satisfied. This procedure reduces the number of allocated participants and thus increases the utility of the CSP.

Combined with Algorithm 3, the algorithm first updates each set $B_i$ by the actual profiles. Then it iteratively checks each participant pair $(u_i, u_k)$ to determine whether all of $u_i$'s allocations can be merged and allocated to $u_k$ (Lines 2~9). For each participant pair $(u_i, u_k)$, the algorithm first checks the time constraint of $u_k$, where "$u_k$ covers $A_i$" means $u_k$ is active in every subregion where $u_i$ has allocated tasks (Line 4). The merged allocation is the mergence of the two allocations $A_i$ and $A_k$ (Line 5). Then the algorithm checks the privacy constraint of $u_k$ as if the mergence were allocated to $u_k$. The algorithm allocates the mergence to $u_k$ if the time and privacy constraints are satisfied and allocates no task to $u_i$ (Lines 7~9).

The algorithm checks all allocated participants at the end. If there are still some dangerous participants, the algorithm fails. The running time of this phase is dominated by the pairwise check over all allocated participants.
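A condensed sketch of the mergence phase, under the same illustrative list-based representation as before:

```python
import math

def kl(dist, ref):
    return sum(p * math.log(p / q) for p, q in zip(dist, ref) if p > 0)

def observed(a):
    t = sum(a)
    return [x / t for x in a] if t else None

def merge_phase(alloc, profiles, t_thr, p_thr, community, eps=1e-9):
    """Mergence phase (sketch): for each pair (u_i, u_k), move every task
    of u_i onto u_k when u_k is active in all of u_i's task regions and
    the merged allocation respects u_k's time and privacy thresholds.
    Fewer recruited participants means a higher CSP utility."""
    n = len(alloc)
    for i in range(n):
        if sum(alloc[i]) == 0:
            continue
        for k in range(n):
            if k == i or sum(alloc[k]) == 0:
                continue
            # u_k must cover u_i's allocated regions
            if any(a > 0 and profiles[k][j] <= eps
                   for j, a in enumerate(alloc[i])):
                continue
            merged = [x + y for x, y in zip(alloc[i], alloc[k])]
            if sum(merged) > t_thr[k]:
                continue  # time constraint of u_k
            if kl(observed(merged), community) > p_thr[k]:
                continue  # privacy constraint of u_k
            alloc[k] = merged
            alloc[i] = [0] * len(alloc[i])
            break
    return alloc
```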

5. Evaluation

We evaluate the performance of TAPP on a real-world dataset from Yelp (https://www.yelp.com/dataset/challenge). Yelp is a location-based service system where reviewers publish reviews and comments on nearby businesses. In our evaluation, we consider reviewers as participants and reviews as tasks. A review is associated with a business, and the business is associated with a location.

Three cities are considered in our evaluation: Cleveland, OH, USA; Tempe, AZ, USA; and Calgary, AB, Canada. The user activities in each city reflect different real-world situations; thus, these three cities are representative for the evaluations. Specifically, we focus on the active participants with more than 30 reviews. The numbers of active participants in each city are shown in Table 2. The area of each city is divided into a 3 × 3 grid. Since each review has a corresponding grid cell, the participant's actual profile is the ratio of reviews located in each cell. We consider the participants of a city to belong to one community. The community profile for each city is the average computed from all the participants' actual profiles. The cost for finishing a unit task is set to 1.
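The construction of actual profiles from review locations can be sketched in Python as follows (the grid indexing, bounding-box handling, and $\epsilon$ padding for empty cells are our own assumptions about the setup):

```python
def actual_profile_from_reviews(review_coords, bbox, grid=3, eps=1e-9):
    """Evaluation-setup sketch: divide a city's bounding box into a
    grid x grid lattice and take the ratio of a reviewer's reviews in
    each cell as the participant's actual profile; empty cells get the
    paper's epsilon marker for inactive subregions."""
    lat0, lon0, lat1, lon1 = bbox
    counts = [0] * (grid * grid)
    for lat, lon in review_coords:
        row = min(int((lat - lat0) / (lat1 - lat0) * grid), grid - 1)
        col = min(int((lon - lon0) / (lon1 - lon0) * grid), grid - 1)
        counts[row * grid + col] += 1
    total = len(review_coords)
    return [c / total if c else eps for c in counts]
```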

This evaluation focuses on two metrics: the number of dangerous participants and the utility of the CSP. A dangerous participant is one whose privacy leakage exceeds its privacy threshold at the end of allocation. A number of dangerous participants greater than zero means the allocation is infeasible. However, this metric helps us analyze the effectiveness of the allocation algorithm. Thus, we change Line 22 to "else break" in Algorithm 2 for the evaluation.

TAPP is compared with two baseline allocating methods. Baseline1 is the first phase of TAPP. Baseline2 is a greedy allocating method: it prefers to allocate the maximum workload to the participants who have the active subregion and the maximum time threshold.

5.1. General Performance

We validate the effectiveness of TAPP in this part. The time threshold and the privacy threshold are each set to the same value for all participants. Moreover, the privacy threshold ranges from 0.2 to 0.8.

Figure 2 shows the numbers of dangerous participants in each city. As we can see, when the privacy threshold is small, all three algorithms suffer large numbers of dangerous participants. However, the number of dangerous participants in TAPP is on average 76.23% less than in Baseline1 and Baseline2 when the privacy threshold ranges from 0.2 to 0.4, because the second and third phases of TAPP help reduce the dangerous participants. TAPP acquires feasible allocating solutions when the privacy threshold grows. Specifically, TAPP obtains feasible solutions once the privacy threshold reaches 0.5, 0.5, and 0.6 in the three cities, respectively. Note that TAPP first acquires a feasible solution in Calgary at a bigger privacy threshold than in the other cities. This is because in a city with a larger population the profiles are more heterogeneous, so the algorithm performs relatively worse.

Figure 3 shows the utility of the CSP in each city. For comparison, we treat every allocating solution as a feasible solution, even if it is infeasible. Specifically, given $n$ participants in total and $n'$ participants who have allocated tasks, the utility of the CSP is measured as $n - n'$ in this evaluation section. Because Baseline2 is based on the greedy strategy, its results are close to optimal when there is no privacy constraint. TAPP gets close to the results of Baseline2 when the privacy threshold grows bigger. By a further comparison between TAPP and Baseline1, the second and third phases of TAPP increase the utility in all three cities on average. Focusing on the results in Calgary, the heterogeneity of profiles affects the utility as well, since it affects the allocating solution.

5.2. Performance for Different Cases

In this part, we investigate the performance of TAPP for different types of task distributions. The results indicate the effectiveness of our algorithm under different task workloads. Specifically, we set requested tasks $W_c$ following the distribution of the community profile and requested tasks $W_u$ following the uniform distribution. $W_c$ and $W_u$ satisfy $\sum_{j} w^c_j = \sum_{j} w^u_j$; i.e., the total workloads are equal. The results under $W_c$ and $W_u$ are denoted as com-distribution and uni-distribution, respectively. The remaining settings are the same as in Section 5.1.

Figure 4 shows the numbers of dangerous participants under these two distributions. The privacy threshold at which TAPP first acquires a feasible solution is bigger under uni-distribution than under com-distribution, although only a little bigger. This is caused by the privacy constraint, which is based on the relative entropy between the observed profile and the community profile. Since $W_c$ follows the distribution of the community profile, the algorithm more easily acquires a solution which satisfies the privacy constraint. Comparing the results among the three cities, the heterogeneity of profiles affects the numbers of dangerous participants under different distributions as well.

Figure 5 shows the utilities of the CSP under these two distributions. The utilities under uni-distribution are less than those under com-distribution in all cases. This is because the task allocation is subject to the privacy constraint, which is strongly related to the community profile. Since the com-distribution follows the same distribution as the community profile, task allocation under the privacy constraint is easier for the com-distribution than for the uni-distribution. Given the stricter task distribution, TAPP tries to make the solution satisfy the privacy constraint by allocating more participants. The utility increases by similar amounts under the two distributions in Tempe and Calgary. Thus, the heterogeneity of profiles may not affect the utilities under different distributions.

6. Conclusion

Since crowds have come to care more about their privacy disclosure in recent years, the design of a task allocation algorithm should consider privacy preservation. In this study, we investigate an algorithm for task allocation with both the basic considerations and the spatial privacy consideration. The problem formulation of task allocation is first presented. After that, we propose a task allocation algorithm with privacy preservation on the CSP's site based on the formulation. It consists of three phases: allocating tasks without privacy preservation, modifying allocations with the privacy consideration, and merging the allocations. The algorithm maximizes the benefit of the CSP and meanwhile preserves the spatial privacy of participants. Evaluation results on the utility and privacy aspects show the effectiveness of the proposed algorithm.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research work was supported by the Key Research Plan of the State Commission of Science and Technology of China (2018YFC0807501, 2018YFC0807503), by the Foundation of the Science & Technology Department of Sichuan Province under Grants nos. 2017JY0027, 2017JY0007, 2018JY0067, 2017GFW0128, and 2016FZ0108, and by the Sichuan Provincial Economic and Information Commission (no. 2018DS010).