Abstract

Mobile crowdsensing (MCS) offers a novel paradigm for large-scale sensing with the proliferation of smartphones. Task assignment is a critical problem in MCS, where service providers attempt to recruit a group of capable users to complete the sensing task at a limited cost. However, selecting an appropriate set of users with high quality and low cost is challenging. Existing work on task assignment ignores the data redundancy of large-scale users and the individual preferences of service providers, resulting in a significant workload on the sensing platform and inaccurate assignment results. To tackle this issue, we propose a task assignment method based on user-union clustering and individual preferences, which considers the influence of clustering data quality and preference-based sensing cost. Firstly, we design a user-union clustering algorithm (UCA) by defining user similarity and setting user scale, which aims to balance user distribution, reduce data redundancy, and improve the accuracy of high-quality user aggregation. Then, we consider the individual preferences of service providers and construct a preference-based task assignment algorithm (PTA) to meet diversified sensing cost control needs. To evaluate the performance of the proposed solutions, extensive simulations are conducted. The results demonstrate that our proposed solutions outperform the baseline algorithms and realize individual preference-based task assignment while ensuring high-quality data.

1. Introduction

The pervasive adoption of mobile smart devices and the rapid development of communication network technologies have accelerated the unprecedented expansion of mobile crowdsensing (MCS) in many aspects of our daily lives. MCS [1] is a compelling paradigm that allows a large group of individuals to collaboratively sense data and extract information about social events and national phenomena with common interest using mobile devices (e.g., smartphones, smart glasses).

Task assignment is a critical problem in MCS, where service providers attempt to recruit a group of capable users to complete the sensing task at a limited cost. Thus, the core goal of task assignment is to strike a good balance between data quality and sensing cost [2]. Specifically, if the sensing platform always assigns inappropriate tasks to users and keeps them away from their daily activities, users will refuse to perform tasks, leading to revenue loss for service providers and reducing the sensing utility. In addition, service providers may have individual preferences (e.g., maximizing benefits or minimizing costs) when selecting user data, further increasing the complexity and diversity of the task assignment.

By now, various methods to improve data quality have been proposed for task assignment. While service providers can enjoy the convenience provided by user data, large-scale user data leads to an increase in redundant data. Data redundancy is a potential threat to data quality, as it increases the workload of the platform and reduces the accuracy of the task assignment. Shahraki et al. [3] pointed out that cluster analysis can solve the problem of data redundancy. The most common data clustering mechanism is the k-means algorithm. Despite its effectiveness, applying the k-means algorithm in MCS to realize task assignment still needs to tackle complexity and balance challenges. Firstly, the k-means algorithm may require a high calculation time. Unfortunately, most task assignment methods are time-sensitive. That is, the acquisition of high-quality data should not be at the expense of time. Secondly, such algorithms still carry a risk of poor data balance because they neglect differences in the scale of clustered users. Thus, leveraging clustering analysis to form balanced, low-redundancy, and high-quality data in sensing regions within a limited time is a crucial problem in the MCS task assignment.

Sensing cost is another important issue to consider in task assignment, which aims to select suitable users to achieve task assignment. Up to now, research on sensing cost has mainly focused on user recruitment cost [4–6], user travel cost [7–9], and data transferring cost [10–13]. Most of the existing works on sensing cost consider a homogeneous preference model, which assumes all service providers have the same preference. Each service provider selects user data independently and randomly according to the same task preference. However, this model is at best an approximation, because different service providers have various tastes and preferences. Such heterogeneity in the preferences of service providers has been observed in [14].

The shortcomings of existing works drive us to explore a new task assignment method that addresses both data quality and sensing cost for realistic MCS applications. Our research efforts aim to achieve a practical task assignment under different individual preferences for real MCS with varying user data quality, while ensuring high-quality clustering data and preference-based sensing cost. More specifically, we first formulate the problem of task assignment. This formulation carefully considers the quality of clustering data and the individual preference for sensing cost. Afterward, a task assignment method is proposed based on user-union clustering and individual preferences. Different from prior works on task assignment, we first consider the data redundancy caused by large-scale user data and leverage the clustering method to reduce data redundancy and improve the accuracy of high-quality user aggregation. Then, based on this solution, we analyze the impact of individual preferences and solve the diversified task assignment under the individual preference sensing cost.

In summary, this paper makes the following contributions:
(i) We formulate the task assignment problem from two perspectives. High-quality clustering data and the individual preference sensing cost are considered in our formulation.
(ii) A UCA-based solution is proposed to balance user data scale, reduce data redundancy, and ultimately improve platform efficiency and data quality.
(iii) A PTA-based solution is proposed to solve the task assignment under the individual preference sensing cost. To the best of our knowledge, this is the first work that validates, from different perspectives of the task assignment, the benefits of exploiting individual preferences and that gains insights through simulations based on real-world data.

The rest of the paper is organized as follows: Section 2 discusses related work. Section 3 introduces our system model and problem formulation. Our UCA and PTA solutions are presented in Section 4. In Section 5, we evaluate our proposed method and present evaluation results. Finally, we conclude this paper in Section 6.

2. Related Work

Data quality and sensing cost have become the main criteria for assigning tasks. Much work has been done to support efficient task assignment in MCS. In the following, we introduce existing work on these two criteria.

2.1. Data Quality

Improving the accuracy of data quality is an essential design objective for most task assignments. Several factors have a significant impact on data quality, including data collection times, task duration, and data spatial-temporal coverage.

Data collection times refer to the number of times a target phenomenon is expected to be sensed. On the one hand, multiple measurements can reduce sensor reading errors and make sensing results approach the ground truth. Gong et al. [15] pointed out data quality keeps increasing as the collection times increase, characterized by a non-decreasing sub-modular function. On the other hand, there are tiny fluctuations of sensing data even in short durations and small areas. Xiong et al. [16] proposed that data quality will no longer increase when the collected data exceeds a certain threshold. However, multiple measurements are necessary to improve data quality in most cases.

Task duration is the period from the instant a task is published to the deadline. Wang et al. [17] proposed a two-level heterogeneous pricing mechanism based on the timeliness and location dependence of random arrival in MCS. The proposed greedy task selection algorithm can help users choose the appropriate task to maximize the total revenue and realize task assignment. Zeng et al. [18] took the execution time of workers as the optimization goal, and proposed an adaptive Top-k worker selection algorithm to select the most appropriate workers and achieve efficient task assignment. Huang et al. [19] investigated and formulated the time-dependent task allocation problem, and characterized the cost of performing a sensing task for each mobile user. They proposed an efficient task assignment algorithm called the optimized allocation scheme of time-dependent tasks (OPAT), which can maximize the sensing capacity of each mobile user.

Data spatial-temporal coverage is another important metric to evaluate data quality and has been extensively studied. To evaluate the time coverage provided by a group of users over a period of time, Alagha et al. [20] considered users’ location and mobility mode. They designed a stable coverage recruitment parameter to realize task assignment. To reduce the system cost, Song et al. [21] migrated certain qualified users to less popular tasks to increase data coverage and optimize other performance factors. To satisfy both the service provider’s coverage sensing preference and the user’s revenue preference, Yucel et al. [22] proposed a coverage-aware stable task assignment method and proved that the user’s revenue is proportional to the task coverage scale. Experimental results show that this method achieved accurate task assignment on the premise of ensuring user satisfaction and coverage quality.

The above works are good references for addressing the data quality of task assignment. However, most of these studies have not considered the importance of clustering analysis. Guo et al. [23] analyzed some common problems in task assignment and pointed out that cluster-based task assignment is necessary for future MCS task assignment. In the research of user data clustering, Du et al. [24] combined the data quality of users and proposed a Bayesian co-clustering truth discovery model to capture the fine-grained reliability of users on different clusters. This model enhances the usability of each user under the most appropriate task, which is conducive to observing aggregated tasks. Jin et al. [25] proposed a novel MCS system framework that integrates an incentive, a data aggregation, and a data perturbation mechanism. The data aggregation mechanism incorporated workers’ reliability to generate highly accurate aggregated results. So far, the research of user clustering data is still in its infancy. It is crucial to consider clustering data quality evaluation to reduce redundancy and improve platform efficiency.

2.2. Sensing Cost

Sensing cost is the cost paid to perform tasks, including user recruitment cost, user travel cost, and data transferring cost. The first is paid by the sensing platform to recruited users for their involvement; the latter two are paid by users for their movements for data collection and data upload, respectively.

User recruitment cost includes per-user recruitment cost and per-data collection cost. To control the recruitment cost of users, Liu et al. [4] studied the user recruitment problem on both the user’s and subarea’s sides and proposed a three-step strategy, including user selection, subarea selection, and user-subarea-cross (US-cross) selection. Extensive experiments on two real-world data sets show that user recruitment algorithms can effectively enhance the data inference accuracy under a budget constraint. In practical application, Campioni et al. [5] improved recruitment algorithms for vehicular crowdsensing networks, which aims to select participants within a crowdsensing network such that the most sensing data is obtained for the lowest possible cost. Zhao et al. [6] classified the extrinsic utility into the task payoff shared with other participants and the resource cost incurred by participation. Based on this, they proposed a social-aware incentive mechanism by deep reinforcement learning (DRL-SIM) to control user recruitment cost and derive the optimal long-term sensing strategy for all vehicles.

User travel cost relies on the traveling paths of users, which could be fixed, predetermined, or predictable based on users’ historical trajectories [7]. In fixed/predetermined-path-based MCS, each user can perform tasks along or near their traveling path. In this case, the task assignment problem can often be transformed into a set cover problem or a bipartite graph matching problem. Wei et al. [8] considered user moving cost and sensing level. They proposed a greedy task assignment algorithm, GP-BS, to select the most cost-effective participant iteratively. In predictable-path-based MCS, the traveling path of each worker is not predetermined. It is difficult to accurately predict the specific locations of users in the future at a fine granularity. Wang et al. [9] proposed an approach that exploits the spatial-temporal causality among travel speeds of road sections by a time-lagged correlation coefficient function, which aims to overcome the uneven spatial-temporal distribution of vehicles and the variation of their data-offering intervals. For the sparse MCS scene, Wang et al. [10] proposed a deep learning-enabled industrial sensing and prediction scheme, aiming to achieve high-precision prediction of future moments under the hypothesis of sparse historical data.

Data transferring cost is the cost generated for uploading sensing data. Wang et al. [11] considered that the users’ main concern is the cost of data uploading, which affects their willingness to participate in a crowdsensing task. The proposed efficient prediction-based user recruitment for MCS can achieve a lower recruitment payment and the highest delivery efficiency. In [12], a data transfer solution for crowdsensing was proposed to minimize the number of users under the constraints of the quality of sensing data and coverage area of all cell towers. When multiple tasks share a pool of staff with bandwidth constraints, a multi-task allocation strategy is proposed in [13] to ensure platform revenue.

Task assignment algorithms for MCS were designed following the different sensing costs. However, the algorithms proposed in the existing works are usually designed based on a fixed choice. That is, they all neglect the individual preferences for sensing cost. In our previous work [26], we have pointed out the influence of individual preferences on selection. Therefore, it is necessary to consider the individual preferences to ensure the practicality of task assignment.

In summary, despite the variety of the literature on data quality and sensing cost in MCS task assignments, the goal is defined chiefly from the overall system’s point of view without considering the individual preferences and the importance of clustering data. Hence, they may not necessarily achieve high accuracy and rationality in the task assignment.

3. System Model and Problem Formulation

In this section, we first give the system model for task assignment in MCS. Then, we formulate the task assignment problem.

3.1. System Model
3.1.1. Model Construction

We consider a typical MCS architecture, including a trusted sensing platform, a set of sensing users, and a set of service providers, as shown in Figure 1.

For the task assignment, service providers can publish different sensing tasks and task centers to the sensing platform, denoted by , , respectively. The sensing platform assigns tasks to sensing users, denoted by . To reduce data redundancy and the burden on the sensing platform, users form the user-union before uploading data. Service providers select appropriate users to realize task assignment according to individual preferences.

In this paper, we make the following assumptions:
(i) The initial locations of users are uniformly distributed in a specific region.
(ii) The sensing platform is only responsible for the data calculation between users and service providers.
(iii) Service providers have different individual preferences and decide the final choice. Service providers can only select one sensing user to achieve the task assignment.

Such assumptions are practical in enterprise or agreement-based cooperation scenarios [27].

3.2. Problem Formulation
3.2.1. Data Quality Problem

(1) Data Quality Problem Formulation. In data quality research, considering a large number of sensing users, each user uploading data in an independent way will lead to a decrease in sensing utility. Therefore, we leverage the clustering method to reduce data redundancy and improve the accuracy of high-quality user aggregation. We evaluate the data quality and transform the user clustering problem into the maximum similarity matching problem, which can be expressed as follows:

The goal of Equation (1) is to form a union with the highest user similarity from large-scale participating users, so as to reduce data redundancy and improve the efficiency of the sensing platform. represents the similarity between and, which can be expressed as follows: where represents the value of under evaluation index a and represents the value of under evaluation index a.
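As an illustration of the similarity ranking described above, the following is a minimal sketch rather than the formulation of Equation (2). It assumes that each user and each task center is described by a vector of values over the evaluation indexes, and that similarity is an inverse-distance score; the function names `similarity` and `rank_users` and the sample values are hypothetical.

```python
import math


def similarity(user, center):
    """Assumed similarity in (0, 1]: inverse of the Euclidean distance
    between a user's index values and a task center's index values."""
    dist = math.sqrt(sum((u - c) ** 2 for u, c in zip(user, center)))
    return 1.0 / (1.0 + dist)


def rank_users(users, center):
    """Return user ids sorted by descending similarity to one task center."""
    return sorted(users, key=lambda uid: similarity(users[uid], center), reverse=True)


# Hypothetical users described by two evaluation indexes.
users = {"u1": [0.9, 0.2], "u2": [0.1, 0.8], "u3": [0.8, 0.3]}
center = [1.0, 0.2]
print(rank_users(users, center))
```

Any other similarity measure (for example, cosine similarity) could be substituted without changing the ranking step.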

(2) Method Construction. Step 1: Define the user similarity as a two-tuple.

Definition 1. Define the user similarity as .
is the participants of both clustering data, where denotes the user, and denotes the center of task .

Step 2: Calculate the similarity between and, sort the calculation results, and construct the user-union clustering.

Step 3: Set a maximum user limit τ in each user-union to ensure the balance of the union, i.e., each user-union contains at most τ users.

3.2.2. Sensing Cost Problem

(1) Sensing Cost Problem Formulation. In Section 3.2.1, we use user data to build user-unions, which realize user clustering, reduce data scale and ensure data quality. Based on this solution, we consider the diversity and individual preferences for service providers, and solve the diversified task assignment under the individual preference sensing cost.

For each task assignment problem, each user-union has n sets of user schemes and m sets of data sensing cost evaluation indexes, denoted by and . Each user scheme represents a sensing cost requirement, which can be evaluated by the sensing cost indexes, denoted by . According to the decision selection sample matrix, service providers select the appropriate user to realize task assignment. The decision selection sample matrix is expressed as follows:

Based on the above conditions, we normalize the decision information matrix, and use the prospect theory [28] to obtain the positive and negative prospect value matrix. Finally, the acceptability advantage solution is used to sort the schemes and select the most suitable users. Therefore, we transform the preference-based sensing cost problem into the maximum comprehensive prospect value, which can be expressed as follows:

The goal of Equation (4) is to solve the maximum comprehensive prospect value, so as to achieve the preference-based task assignment. The objective function in the first line is to solve the optimal evaluation index weight. The second and third lines define the range of each index, respectively.

(2) Method Construction. Step 1: Normalize the decision matrix of user scheme. We define the user sensing costs as the cost index and the benefit index, denoted by and , , , and , which can be expressed as:

Step 2: Determine the positive and the negative prospect value matrix. The normalized decision matrix is recorded as . We construct the positive and the negative prospect value matrix, which can be expressed as: where and represent the positive and the negative ideal scheme, respectively.
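Steps 1 and 2 can be sketched as follows. This is an illustrative sketch, not the paper's Equations (5)–(8): it assumes min-max normalization in which cost indexes are inverted so that larger normalized values are always better, and it takes the column-wise maximum and minimum as the positive and negative ideal schemes. The names `normalize` and `ideal_schemes` and the sample matrix are hypothetical.

```python
def normalize(matrix, is_benefit):
    """Min-max normalize each column (index) of a decision matrix.
    Cost columns are inverted so that larger normalized values are better."""
    cols = list(zip(*matrix))
    out = [[0.0] * len(cols) for _ in matrix]
    for j, col in enumerate(cols):
        lo, hi = min(col), max(col)
        span = hi - lo
        for i, x in enumerate(col):
            if span == 0:
                out[i][j] = 1.0  # all schemes are equal on this index
            elif is_benefit[j]:
                out[i][j] = (x - lo) / span
            else:
                out[i][j] = (hi - x) / span
    return out


def ideal_schemes(norm):
    """Return (positive ideal, negative ideal) as column-wise max and min."""
    cols = list(zip(*norm))
    return [max(c) for c in cols], [min(c) for c in cols]


# Hypothetical matrix: rows are user schemes, column 0 is a cost index,
# column 1 is a benefit index.
matrix = [[10, 3], [20, 5], [30, 4]]
norm = normalize(matrix, is_benefit=[False, True])
positive, negative = ideal_schemes(norm)
print(norm, positive, negative)
```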

Step 3: Calculate the correlation coefficient. A proper task assignment usually needs a reference node to measure the prospect value of the scheme, rather than the actual value of the decision result. Therefore, we use the values of positive and negative ideal schemes as reference points, which can be expressed as: where represent the positive and the negative correlation coefficients, respectively, , represents the resolution coefficient, define.
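The use of a resolution coefficient suggests a grey relational coefficient. The sketch below shows the standard grey relational form under that assumption; it is not necessarily identical to Equation (9). The name `relational_coefficients` and the default value `rho=0.5` are assumptions made for illustration.

```python
def relational_coefficients(norm, reference, rho=0.5):
    """Grey relational coefficient of every entry against a reference scheme.
    rho is the resolution coefficient (0.5 is a common default)."""
    deltas = [[abs(x - r) for x, r in zip(row, reference)] for row in norm]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # every scheme coincides with the reference
        return [[1.0] * len(row) for row in deltas]
    return [[(d_min + rho * d_max) / (d + rho * d_max) for d in row] for row in deltas]


# Hypothetical normalized matrix and positive ideal scheme.
norm_example = [[1.0, 0.0], [0.5, 1.0], [0.0, 0.5]]
coeff = relational_coefficients(norm_example, reference=[1.0, 1.0])
print(coeff)
```

Calling the same function with the negative ideal scheme as `reference` gives the negative coefficients.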

Step 4: Construct prospect decision matrix. We construct a prospect value function to represent the subjective feelings of service providers about the user scheme selection, which can be expressed as: where and represent the concave and the convex degree of the benefit and the cost value functions at the reference point, respectively, , . represents the degree of loss aversion of the service provider.

According to Equation (10), we achieve the positive and the negative values of , which are expressed as:

Probability weight is the subjective judgment made by the service provider according to the probability of the result of the task assignment, which can be expressed as: where , , , [28], and represent the fitting parameters of the probability weight function on the left and right of the reference point, respectively.
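For readers unfamiliar with prospect theory, the sketch below shows the classical value function and probability weighting function of cumulative prospect theory. The default parameter values (0.88, 0.88, 2.25, 0.61, 0.69) are the estimates reported by Tversky and Kahneman and are assumed here for illustration; how these functions are combined with the correlation coefficients in Equations (10)–(13) is not reproduced. The function names `value` and `weight` are hypothetical.

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect value of a deviation x from the reference point:
    concave for gains, convex and steeper (loss aversion lam) for losses."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** beta)


def weight(p, gamma):
    """Inverse-S-shaped probability weighting function.
    Classical estimates: gamma = 0.61 for gains, 0.69 for losses."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)


print(value(0.5), value(-0.5))             # a loss weighs more than an equal gain
print(weight(0.5, 0.61), weight(0.5, 0.69))
```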

We calculate the comprehensive prospect value of each user scheme, which can be expressed as:

Step 5: Weight optimization. The weight of the user scheme should be reasonably assigned, aiming to obtain the maximum comprehensive prospect value, which can be expressed as:

The multi-attribute hesitant fuzzy evaluation matrix is transformed into the multi-attribute comprehensive prospect matrix.

Step 6: Sort user schemes to determine the preference-based task assignment.

According to the comprehensive prospect matrix, we calculate the positive (i.e., ) and the negative (i.e.,) ideal solutions of each index, which can be expressed as:

We also need to calculate the group benefit value (i.e., ), individual regret value (i.e., ), and comprehensive index value (i.e., ). where represent the maximum and minimum group benefit value, represent the maximum and minimum individual regret value, and represents the decision preference. When , it means that the service provider adopts the maximum group benefit to formulate the task assignment scheme. When , it means that the service provider adopts the minimum individual regret to formulate the task assignment scheme. When , it means that the service provider adopts the balance principle to formulate the task assignment scheme.

According to the judgment criteria of the VIKOR method [29], the value of , , and are arranged in descending order. We use to determine the first (i.e., ) and second (i.e., ) user schemes and realize the preference-based task assignment.

Condition 1 (Acceptability advantage). , where is the number of options.

Condition 2 (Acceptability stable). has the best or .

When Condition 1 and Condition 2 are both satisfied, is the optimal user scheme to realize task assignment. When only Condition 1 is satisfied, and are compromise solutions. When only Condition 2 is satisfied, are approximate ideal schemes.
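The sorting step can be sketched with the textbook VIKOR procedure. The sketch assumes the common convention in which smaller group benefit, individual regret, and comprehensive index values are better, so the ordering used in the paper may differ. The names `vikor` and `check_conditions`, the preference parameter `v`, and the sample data are hypothetical.

```python
def vikor(matrix, weights, v=0.5):
    """matrix[i][j]: value of scheme i on index j (larger is better).
    Returns the group benefit S, individual regret R, and index Q lists."""
    cols = list(zip(*matrix))
    best = [max(c) for c in cols]
    worst = [min(c) for c in cols]
    S, R = [], []
    for row in matrix:
        terms = [
            w * (b - x) / (b - wst) if b != wst else 0.0
            for x, w, b, wst in zip(row, weights, best, worst)
        ]
        S.append(sum(terms))
        R.append(max(terms))

    def scale(x, lo, hi):
        return (x - lo) / (hi - lo) if hi != lo else 0.0

    Q = [
        v * scale(s, min(S), max(S)) + (1 - v) * scale(r, min(R), max(R))
        for s, r in zip(S, R)
    ]
    return S, R, Q


def check_conditions(S, R, Q):
    """Return (first scheme, second scheme, Condition 1, Condition 2)."""
    order = sorted(range(len(Q)), key=lambda i: Q[i])
    a1, a2 = order[0], order[1]
    cond1 = Q[a2] - Q[a1] >= 1.0 / (len(Q) - 1)      # acceptability advantage
    cond2 = S[a1] == min(S) or R[a1] == min(R)       # acceptability stability
    return a1, a2, cond1, cond2


# Hypothetical comprehensive prospect matrix for three schemes, two indexes.
S, R, Q = vikor([[0.9, 0.8], [0.5, 0.6], [0.1, 0.2]], weights=[0.5, 0.5])
print(check_conditions(S, R, Q))
```

In this toy example only Condition 2 holds, which corresponds to the case where several schemes are reported as approximate ideal schemes.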

4. Proposed Task Assignment Solutions

4.1. User-Union Clustering Algorithm

Traditional clustering algorithms are deficient in the efficiency and balance of clustering results. To solve this issue, we propose the user-union clustering algorithm (UCA), as shown in Algorithm 1.

Input: User data , task set , maximum user limit τ
Output: Set of K task clusters
1: ProperCluster(xi)
2: Determine the center of the initial task sets and user data evaluation indexes (
3: Calculate the user similarity by Equation (1), and sort data in descending order
4:  for DO
5:   if
6:      enter
7:      and are saved in
8:     break
9:   else
10:     if is less than the minimum similarity value in
11:      continue
12:     else
13:       joins the jth union and deletes edge user ()
14:      ProperCluster()
15: repeat
16:  for DO
17:   ProperCluster(xi)
18: until saturate task requirements or reach the maximum number of iterations
19: End

Algorithm 1 realizes the generation of user-unions. UCA guarantees the balance of clustering data by setting an upper limit on the number of users in each union. The function ProperCluster(xi) assigns user xi to a suitable user-union. The storage of the jth user-union is a two-tuple that holds the existing user data and the similarity value to the center task. Lines 1 to 4 calculate the similarity between each user and each task center, which aims to quantify the behavioral characteristics of each user. Lines 5 to 18 control the scale of users, which balances the number of users in each user-union.
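The capped assignment with edge-user replacement can be sketched as follows. This is a simplified illustration under stated assumptions, not a faithful implementation of Algorithm 1: similarities are assumed to be precomputed, an evicted edge user is simply queued for re-placement, and a user that fits nowhere is left unassigned. The name `uca`, the `max_iter` guard, and the sample data are hypothetical.

```python
def uca(sim, tau, max_iter=1000):
    """sim[u][j]: similarity of user u to task center j.
    Returns a list of unions, each a dict {user: similarity},
    holding at most tau users."""
    k = len(next(iter(sim.values())))
    unions = [dict() for _ in range(k)]
    pending = list(sim)
    steps = 0
    while pending and steps < max_iter:
        steps += 1
        u = pending.pop()
        # Try unions in order of decreasing similarity for this user.
        for j in sorted(range(k), key=lambda j: sim[u][j], reverse=True):
            members = unions[j]
            if len(members) < tau:
                members[u] = sim[u][j]
                break
            edge = min(members, key=members.get)  # least similar member
            if sim[u][j] > members[edge]:
                del members[edge]
                members[u] = sim[u][j]
                pending.append(edge)  # the evicted edge user is re-placed later
                break
        # If no union accepted u, the user stays unassigned.
    return unions


# Hypothetical similarities of four users to two task centers.
sim = {"a": [0.9, 0.1], "b": [0.8, 0.3], "c": [0.7, 0.6], "d": [0.2, 0.9]}
unions = uca(sim, tau=2)
print(unions)
```

Each eviction replaces a member with a strictly more similar user, so the loop terminates even without the iteration guard.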

Computational complexity. The k-means algorithm is a simple and efficient clustering algorithm, and the computational complexity of the algorithm is , where represents the number of iterations, represents the user scale, represents the type of user data evaluation index, and represents the number of clustering tasks. UCA is an improvement of the k-means algorithm, which uses user similarity to realize user clustering and improves the balance of user scale. First, each user needs to calculate the similarity with , and the complexity is . Next, the value of user similarity is compared with the edge point, when the number of users in the user-union reaches saturation, and the complexity is 1. In the worst case, UCA spends times for comparison. Therefore, the computational complexity of UCA is . In practical application scenarios, to ensure the clustering accuracy, the number of algorithm iterations (i.e., ) is usually greater than clustering tasks (i.e., ); therefore, .

Space complexity. The k-means algorithm needs to store user data and the clustering tasks data, and the space complexity of is . Like the k-means algorithm, UCA also needs to store user data and clustering task data, and the space complexity is .

4.2. Preference-Based Task Assignment Algorithm

Algorithm 1 provides high-quality data. Then, we propose the preference-based task assignment algorithm (PTA) to solve the diversified task assignment under the individual preference sensing cost, as shown in Algorithm 2.

Input: Decision sample matrix
Output: Optimal task assignment
1: Initialization
2: Normalize the sample matrix by Equations (5)–(7)
3: Determine and by Equation (8)
4: Calculate the correlation coefficient by Equation (9)
5: Build a prospective decision matrix and calculate the prospective value
6: Optimize the index weights to obtain the best comprehensive prospect value by Equations (10)–(14)
7: Calculate , , and by Equations (15)–(17), confirm the first and second value of (i.e., and )
8: for and do
9:  if only meet Condition 1 then
10:    and are compromise solutions
11:  end if
12:  if only meet Condition 2 then
13:   Calculate the largest N by , and are approximate ideal schemes
14:  end if
15:  if both meet Conditions 1 and 2 then
16:   is the optimal solution
17:  end if
18: end for

Algorithm 2 realizes reasonable and diverse task assignment by calculating the group benefit, individual regret, and comprehensive index values. It is a mode of task assignment selection driven by individual preference, which guides the decision of service providers. Lines 1 to 2 normalize the decision matrix. Lines 3 to 4 determine the positive and negative ideal solutions and calculate the correlation coefficient. Lines 5 to 6 construct the prospect decision matrix through optimized weights. Lines 7 to 18 sort user schemes and achieve preference-based task assignment based on the VIKOR method.

5. Performance Evaluation and Discussion

5.1. Basic Simulation Setup

In our experiments, the data we used came from the real Dartmouth College Wi-Fi campus trace data set [30], which was an experiment on the open-source middleware NSense. This data takes sound collection as an example, including timestamps, the distance between test points and sensing nodes, data collection methods, and data collection environments. We define data collection methods and environments as benefit indexes. Other metrics are defined as cost indexes. We consider two user distribution spaces [31] (i.e., sparse and dense regions) and employ different metrics to measure the performance in UCA and PTA.

In the research of task assignments, high-quality user data can improve the accuracy of assignments. User data clustering can reduce data redundancy and improve the overall quality of user data. Therefore, we first verify the performance of UCA. We compare the performance with three common clustering algorithms [14] (i.e., the k-means, k-means improve, and fuzzy c-means clustering algorithms) by calculating the accuracy (ACC), normalized mutual information (NMI), and running time. The k-means improve algorithm limits the number of users of the k-means algorithm, aiming to control the balance of user distribution scale.

ACC is used to measure the accuracy of the users’ classification after clustering, and compared to the actual classification in the prior knowledge, which can be expressed as: where is the number of users, is a mapping function that maps the classification of the clustering results to the original data set, is the original classification of user data in prior knowledge. When, the value of is 1. Otherwise, the value of is 0.

NMI is used to evaluate the similarity between the clustering results and the distribution of the original dataset, which can be expressed as: where represents mutual information between and . and represent the information entropy of distributions X and Y, respectively.
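Both clustering metrics can be computed with a few lines of standard-library code. The sketch below makes two assumptions that the text does not confirm: the mapping function in ACC is found by brute force over all label assignments (adequate for a handful of clusters), and NMI is normalized by the geometric mean of the two entropies. The names `acc`, `entropy`, and `nmi` and the sample labels are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def acc(true, pred):
    """Clustering accuracy under the best one-to-one label mapping."""
    p_labels = sorted(set(pred))
    t_labels = sorted(set(true))
    # Pad with None so every predicted label can be mapped to something.
    targets = t_labels + [None] * max(0, len(p_labels) - len(t_labels))
    best = 0
    for perm in permutations(targets, len(p_labels)):
        mapping = dict(zip(p_labels, perm))
        hits = sum(mapping[p] == t for t, p in zip(true, pred))
        best = max(best, hits)
    return best / len(true)


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def nmi(x, y):
    """Mutual information divided by sqrt(H(X) * H(Y))."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = sum(c / n * math.log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())
    hx, hy = entropy(x), entropy(y)
    return mi / math.sqrt(hx * hy) if hx > 0 and hy > 0 else 0.0


truth = [0, 0, 1, 1]
print(acc(truth, [1, 1, 0, 0]), nmi(truth, [1, 1, 0, 0]))
print(acc(truth, [0, 0, 0, 1]), nmi(truth, [0, 0, 0, 1]))
```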

Next, we verify the performance of PTA on the premise of obtaining high-quality user data, which aims to realize preference-based task assignment under the individual preference sensing cost. We compare the performance with two methods (i.e., VIKOR [29] and TOPSIS [32]) by calculating the compatibility degree and execution time. The VIKOR method determines the optimal task assignment scheme without the prospect value. The TOPSIS method is a common method to solve the ideal point.

Compatibility degree [33] is used to verify the rationality of task assignment, which can be expressed as: where represents the compatibility of the th method, represents the degree of correlation between and , represents the number of schemes, and represents the sorting difference of the th scheme in and .
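The description of the compatibility degree (a correlation between two rankings computed from per-scheme sorting differences) matches the Spearman rank correlation. The sketch below assumes that form and assumes that the compatibility of a method is the unweighted mean of its correlations with the other methods; the paper's exact weighting is not reproduced. The names `spearman` and `compatibility`, the method labels, and the rankings are hypothetical and are not experimental results.

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation between two rankings of the same schemes."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def compatibility(rankings):
    """rankings: {method: [rank of scheme 1, rank of scheme 2, ...]}.
    Returns the mean correlation of each method with all other methods."""
    result = {}
    for name, ranks in rankings.items():
        others = [spearman(ranks, r) for other, r in rankings.items() if other != name]
        result[name] = sum(others) / len(others)
    return result


# Made-up rankings of five schemes by three hypothetical methods.
rankings = {
    "m1": [1, 2, 3, 4, 5],
    "m2": [1, 2, 3, 5, 4],
    "m3": [2, 1, 3, 4, 5],
}
comp = compatibility(rankings)
print(comp)
```

A method whose ranking agrees most with the others obtains the highest compatibility degree.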

5.2. Experiment Results of UCA

It is meaningless to use UCA to reduce data redundancy for the small scale of users in remote regions. Therefore, for the experiment of UCA, we analyze the clustering effect of large-scale users, with the number of users varying from 100 to 1000. Figures 2–4 show the performance in terms of ACC, NMI, and running time achieved by the four algorithms.

Clearly, UCA outperforms the three baselines (i.e., higher ACC, higher NMI, and lower running time), no matter how the number of users varies. In Figure 2, the growth of user data decreases the accuracy of all clustering algorithms. The reason is that the increase of user scale leads to a rise in low-similarity users, which reduces the clustering accuracy. The accuracy of UCA is better than that of the three baselines, and its ACC is mostly above 0.82. Compared with the best-performing baseline, the k-means algorithm, the accuracy is improved by about 10%. The reason is that UCA calculates user similarity and sets boundary user replacement rules, which can balance the number of users in different unions and ensure that users with high similarity are clustered together as much as possible. In addition, the accuracy of the k-means improve algorithm is lower than that of the k-means algorithm. This means that a single restriction on the size of users is not conducive to the formation of high-quality user clustering.

We also perform extensive simulations to validate the reduction of running time achieved by UCA under various user scales, as shown in Figure 5. As seen, the running time of all four algorithms shows an upward trend, and UCA has the lowest running time. The reasons are as follows. Firstly, the k-means algorithm uses random clustering centers to achieve user clustering through multiple iterations. The growth of the user scale leads to more iterations and time overhead. In addition, setting user boundaries in this algorithm may cause more time costs. Unlike the k-means algorithm, UCA only needs to calculate the similarity between users and task centers and compare boundary users to realize the user-union, which can reduce running time. Secondly, the fuzzy c-means algorithm provides more flexible clustering results, but is more sensitive to boundary users. With the growth of user scale, the large number of boundary users requires a longer time overhead for this algorithm.

In general, for the large-scale user clustering scenario, the proposed user-union clustering algorithm achieves high classification accuracy and fast computation. It can therefore provide high-quality user data for the preference-based task assignment.

5.3. Experiment Results of PTA

On the premise of ensuring high-quality clustering users, we evaluate the performance of PTA in realizing diversified task assignment under individual preference sensing cost. Different user scales may exhibit different user characteristics, which affect execution time and compatibility. We therefore first use an example to demonstrate the feasibility of PTA on small-scale data. Then, we examine PTA performance in two scenarios by measuring execution time and compatibility degree.

5.3.1. Example

Using UCA, we obtain five alternative task assignment schemes (i.e., the sensing cost for five users), as shown in Table 1.

Step 1. Normalize the sample matrix by Eqs. (5)–(7), and are the cost index, and are the benefit index. The decision selection sample matrix is Then, we achieve the positive ideal assignment scheme (i.e., ), and the negative ideal assignment scheme (i.e., ).

Step 2. According to Equation (8), the correlation coefficient of positive and negative ideal scheme is respectively.

Step 3. According to Equation (11), the positive and negative prospect value matrix of each scheme is respectively.

Step 4. Optimize the index weights to obtain the best comprehensive prospect value by Equation (4), where . We achieve the optimal solution (i.e., ) and the comprehensive prospect matrix is

Step 5. Use the VIKOR method to sort the schemes. Calculate the value of , , and , as shown in Table 2.
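The ranking quantities used in Step 5 can be sketched with the standard VIKOR formulation, in which the group utility S, the individual regret R, and the compromise index Q are computed from weighted, normalized distances to the best value of each index. The sketch below is only an illustration under that assumption. It does not reproduce the values of Table 2, and the sample matrix, weights, and the parameter v = 0.5 are hypothetical.

```python
def vikor_scores(matrix, weights, is_benefit, v=0.5):
    """Return (S, R, Q) for each scheme using the standard VIKOR formulas.

    matrix     : rows are schemes, columns are indexes (criteria)
    weights    : one weight per index
    is_benefit : True for benefit indexes, False for cost indexes
    v          : weight of the group-utility strategy (assumed 0.5)
    """
    columns = list(zip(*matrix))
    best = [max(c) if b else min(c) for c, b in zip(columns, is_benefit)]
    worst = [min(c) if b else max(c) for c, b in zip(columns, is_benefit)]

    S, R = [], []
    for row in matrix:
        terms = []
        for x, w, fb, fw in zip(row, weights, best, worst):
            # Weighted, normalized distance to the best value of this index.
            terms.append(w * (fb - x) / (fb - fw) if fb != fw else 0.0)
        S.append(sum(terms))  # group utility
        R.append(max(terms))  # individual regret

    def scale(values):
        lo, hi = min(values), max(values)
        return [(x - lo) / (hi - lo) if hi != lo else 0.0 for x in values]

    Q = [v * s + (1 - v) * r for s, r in zip(scale(S), scale(R))]
    return S, R, Q


# Hypothetical schemes: (sensing cost, data quality).
schemes = [[10, 0.90], [20, 0.80], [15, 0.95]]
S, R, Q = vikor_scores(schemes, weights=[0.6, 0.4], is_benefit=[False, True])
ranking = sorted(range(len(Q)), key=lambda i: Q[i])  # smaller Q is better
print(ranking)
```

Schemes are ranked by ascending Q, so a scheme that is cheap and only slightly below the best quality can rank ahead of the highest-quality scheme when the cost index carries the larger weight.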

Table 2 presents the value of , , and for the five users. As seen, has the optimal value of and , which satisfies Condition 2. has the sub-optimal value of , and , which does not satisfy Condition 1. Obviously, and are both acceptable and ideal solutions. The service provider can choose or according to individual preference, and PTA implements the preference-based task assignment. In addition, we observed that PTA usually chooses low-cost and high-quality schemes. The reasons are as follows. First, PTA addresses diversified task assignment under individual preference sensing cost; that is, service providers play a decisive role in the task assignment. Given the profit orientation of service providers, low-cost and high-quality schemes are more competitive in selection. Second, sensing users compete with one another and try to improve the quality of their uploaded data to win tasks. Third, in the computed best comprehensive prospect value, the weight of the cost index is much greater than that of the benefit index, which further drives PTA to choose low-cost and high-quality user solutions.
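The role of Condition 1 and Condition 2 in the discussion above can be illustrated with the acceptance rules of the standard VIKOR method. The sketch assumes that Condition 1 is the acceptable-advantage test (the gap in Q between the two best schemes is at least 1/(m - 1) for m schemes) and that Condition 2 is the acceptable-stability test (the best scheme by Q is also best by S or by R). The input values are hypothetical and are not those of Table 2.

```python
def compromise_set(Q, S, R):
    """Return the indices of the compromise solutions under standard VIKOR rules."""
    m = len(Q)
    order = sorted(range(m), key=lambda i: Q[i])  # ascending Q, best first
    if m < 2:
        return order
    best, second = order[0], order[1]
    dq = 1.0 / (m - 1)

    # Condition 1 (acceptable advantage): the best scheme leads by at least dq.
    advantage = Q[second] - Q[best] >= dq
    # Condition 2 (acceptable stability): the best scheme also leads in S or R.
    stable = (
        best == min(range(m), key=lambda i: S[i])
        or best == min(range(m), key=lambda i: R[i])
    )

    if advantage and stable:
        return [best]
    if advantage:
        return [best, second]  # only Condition 2 fails
    # Condition 1 fails: keep every scheme that is "close enough" to the best.
    return [i for i in order if Q[i] - Q[best] < dq]


# Hypothetical values for five schemes.
Q = [0.00, 0.10, 0.60, 0.80, 1.00]
S = [0.20, 0.30, 0.50, 0.70, 0.90]
R = [0.10, 0.15, 0.30, 0.40, 0.50]
print(compromise_set(Q, S, R))
```

When Condition 1 fails, as in the example discussed above, more than one scheme is returned, and the final choice among them is left to the service provider's preference.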

5.3.2. Performance Comparison

Next, we present simulation results for the three methods in various scenarios.

(1) Compatibility Degree. According to our definition of location regions [31], we conduct simulations to observe the effect of compatibility degree on different solutions when users are in different regions (i.e., popular region and remote region), as shown in Figures 4 and 6.

Generally, a high compatibility degree means that the user data is representative and reliable, which implies higher task assignment accuracy. Figures 4 and 6 show the compatibility degree under different numbers of users and regions. As the number of users increases, the compatibility degree of all three methods decreases. PTA achieves a better compatibility degree than the other two methods, and the compatibility degree in the remote region is better than in the popular region. The reasons are as follows. First, as the number of users increases, more similar users participate in sensing tasks, especially in a popular region, which reduces the differences between users. As a result, it is challenging for the sensing platform to select suitable users, which lowers the compatibility of these solutions. Second, compared with the VIKOR method, PTA adds prospect theory to reflect that decision-makers are more sensitive to losses than to gains. Poor indexes are harder to offset with superior indexes, so the selected user data is more balanced, which helps ensure the accuracy of task assignment. Third, compared with the TOPSIS method, PTA does not need to satisfy both the optimal positive ideal solution and the worst negative ideal solution. The final selection meets the individual preferences of the service provider. Furthermore, all three methods perform better in remote regions than in popular regions. The reason is that large-scale users in popular regions lead to high similarity between data, which makes it difficult to assign tasks accurately. In contrast, the small scale of users in remote regions is conducive to accurate task assignment.
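The loss-sensitivity argument above can be made concrete with the value function commonly used in prospect theory. The sketch below is an assumption for illustration and is not the prospect value defined by the equations of this paper: it uses the widely cited parameter estimates of Tversky and Kahneman (curvature 0.88 and loss-aversion coefficient 2.25), and the function name is hypothetical.

```python
def prospect_value(delta, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value of a deviation `delta` from the reference point.

    Gains (delta >= 0) are valued as delta**alpha.
    Losses (delta < 0) are valued as -lam * (-delta)**beta, so a loss
    weighs more than a gain of the same size when lam > 1.
    """
    if delta >= 0:
        return delta ** alpha
    return -lam * ((-delta) ** beta)


# Hypothetical deviations of one index from the reference scheme.
gain = prospect_value(0.5)
loss = prospect_value(-0.5)
print(gain, loss, gain + loss)
```

With these parameters, a shortfall on one index is valued at 2.25 times the magnitude of an equal surplus on another, so the sum is negative. This is why a poor index is hard to offset with a superior one once prospect values replace raw distances.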

(2) Execution Time. We perform extensive simulations to validate the execution efficiency of PTA under various regions, compared with two solutions, as shown in Figures 7 and 8.

Figures 7 and 8 show the execution time of PTA in the different regions. The execution time of all three solutions increases steadily as the number of users grows, because more alternative users in the sensing platform lead to more computational overhead. In addition, the execution time in the popular region is generally higher than that in the remote region. The reason is that the popular region contains more similar users, so more calculations are needed to find suitable candidate users. PTA is slightly slower than the VIKOR and TOPSIS methods, because PTA selects users from multiple perspectives, which increases the execution time.

In general, the performance of PTA is acceptable for preference-based task assignment. The reasons are as follows. First, PTA has a clear advantage in the accuracy of user selection (i.e., the highest compatibility degree), which helps ensure the accuracy of task assignment. Second, as the other part of the task assignment method, UCA offers high classification accuracy and fast computation, which can offset the longer execution time of PTA.

6. Conclusions

In this paper, we addressed a task assignment problem in MCS. We proposed a task assignment method based on user-union clustering and individual preferences. Specifically, we analyzed and formulated the task assignment problem from two perspectives. We first defined the user similarity and proposed a user-union clustering algorithm (UCA) to reduce data redundancy and obtain high-quality clustering data. Based on this solution, we further considered the individual preferences of service providers and proposed a preference-based task assignment algorithm (PTA) to meet diversified sensing cost needs and achieve task assignment with individual preference. To evaluate the performance of the proposed solutions, we conducted extensive simulations. The results show that our method realizes individual preference-based task assignment while ensuring high-quality clustering data. However, our method usually chooses low-cost and high-quality user data, which may suppress the revenues of users. In addition, for the user-union, using exact values to evaluate data may reduce the accuracy of the evaluation. In future work, we will balance the revenues between users and service providers, improve the accuracy of clustering data quality evaluation, and develop a task assignment method with lower complexity.

Data Availability

The authors declare that all the data and materials in this manuscript are available.

Conflicts of Interest

The authors declare no conflict of interest.

Acknowledgments

This research was funded by the Nature Science Foundation of China, grant number 61872104, and Fundamental Research Fund for the Central Universities in China, grant number 3072020CF0603.