Abstract

In mobile crowdsensing task assignment, the critical issue addressed in this article is how to establish a suitable user recruitment mechanism when the data platform knows neither the users' perceived quality nor their cost values. The platform must learn each user's perceived quality during task execution while also striving to maximize the efficiency and profit of the mobile crowdsensing platform. Therefore, this paper proposes a mobile crowdsensing user recruitment algorithm based on the Combinatorial Multiarmed Bandit (CMAB) to solve the recruitment problem with both known and unknown user costs. Firstly, the user recruitment process is modeled as a combinatorial multiarmed bandit model: each arm represents the selection of a different user, and the reward obtained represents that user's perceived quality. Secondly, an upper confidence bound (UCB) algorithm is proposed that updates each user's perceived quality according to task completion. The algorithm sorts users' perceived quality values from high to low, selects the users with the largest ratio of perceived quality to recruitment cost under the budget constraint, assigns tasks, and updates their perceived quality. Finally, this paper introduces the regret value to measure the efficiency of the user recruitment algorithm and conducts extensive simulations on real data sets to verify the algorithm's feasibility and effectiveness. The experimental results show that the recruitment algorithm with known user costs is close to the optimal algorithm, while the recruitment algorithm with unknown user costs achieves more than 75% of the optimal result, with the gap narrowing as the budget increases, and outperforms the comparison algorithms by 25%. This demonstrates that the proposed algorithm has good learning ability and can independently select high-quality users to realize task assignment.

1. Introduction

Mobile crowdsensing (MCS) is a new type of sensing mode. One of its core ideas is to recruit a large number of users to complete spatiotemporal perception tasks and obtain data resources, so that each mobile user carrying an intelligent terminal device can be seen as a sensing node, meeting dynamic task types that traditional static sensing cannot achieve [1]. Today's intelligent terminal devices are equipped with different types of sensors, such as GPS, temperature sensors, distance sensors, and microphones, which can be used to collect various types of data closely related to people's lives [2]. The data platform then performs data processing (such as cleaning, evaluation, extraction, and aggregation) and outputs standardized, usable data to data consumers for various application fields, such as intelligent transportation, public services, environmental monitoring, and smart healthcare [3].

Many typical mobile crowdsensing tasks require long-term monitoring, such as monitoring local water quality and pollution, monitoring air quality data in the surrounding environment, and collecting noise data to generate corresponding noise pollution maps. For such long-term tasks, the focus is generally on the overall quality of the completed job rather than the quality of any single task. The data platform first divides the completion period of the original task into many short periods (such as 0.5 h), referred to as "rounds" [4]. In each round, the task is divided into many small perception tasks. In this case, the system does not need to handle the original job as a whole but only selects the appropriate users in each round to complete the small tasks after division. Many scholars have designed mobile crowdsensing systems [5]. Rana et al. introduced an end-to-end participatory urban noise mapping system, Ear-Phone, which uses urban participatory sensing to obtain incomplete and random data samples and draw up-to-date urban noise maps [6]. Gao et al. designed Mosaic, a low-cost urban PM2.5 monitoring system based on mobile sensing; a small number of air quality monitoring nodes are deployed on vehicles so that the system can accurately measure urban air quality with higher coverage and lower cost [7]. For the target tracking problem, Jing Yao et al. proposed a solution based on mobile crowdsensing, CrowdTracker, which realizes trajectory prediction and tracking of moving targets through crowd-based multiperson collaborative photography [8].

The above systems generally consist of a task provider, a perception data platform, and task users. The system is shown in Figure 1 and involves the following six steps [9]. The task provider submits a task to the data platform and pays a certain amount. The perception data platform is a centralized management platform responsible for data receiving, integration, cleaning and analysis, mining, visualization, data transactions, and so forth. Its main job is to receive the provider's request, organize and divide the proposal into small-scale perception tasks, and then send the task requirements and corresponding rewards to the mobile sensing users [10]. After the task users receive the task information released by the platform, they respond according to their own situation, reporting their ability to complete perception tasks, such as perceived quality. The data platform then selects some users to complete the perception tasks and pays the corresponding rewards according to the users' responses so as to maximize benefits; this process is called user recruitment [11]. The users perform the related perception tasks and return the results to the platform. Finally, the platform evaluates and records the perception quality achieved by the users so as to select more suitable users in the next round. This process assumes that the perceived quality reported by the users is known in advance; but, in actual situations, the platform often does not know the users' perceived quality, and users themselves are not very clear about the quality of the data they collect. A user's perceived task quality is related to the intelligent terminal device carried and to the user's behaviors and habits. What is more complicated is that the same user's perception quality often differs across periods, and some users even falsely report their perception ability to obtain higher rewards [12].

In current work, many mobile crowdsensing systems assume that the users' perceived quality is known and recruit users on this basis toward specific optimization goals [13]. However, in real life, the perceived quality of mobile users is often unknown. Not only does the platform not know the perceived quality, but it is also difficult for users to determine their own perceived quality, because it depends not only on the user's overall quality, professional skills, and sensing environment but also on the user's sensing equipment and participation interests. Although a user's perceived quality is not identical across completed tasks, the user's sensing equipment is unchanged and user habits do not change significantly, so the perceived quality satisfies a certain distribution; the more perception tasks are completed, the more accurately that distribution can be estimated. Therefore, this article studies how the data trading platform can establish a suitable user recruitment mechanism when the platform knows neither the users' perceived quality nor their cost values [11]. Not only must it learn the cost and perceived quality values during the users' execution, but it must also do its best to ensure the efficiency and profit maximization of the mobile crowdsensing system. The unknown perceived quality poses a significant challenge to user recruitment: should the platform select users already considered high-quality, or select other users who may turn out to be better? This is the classic "exploration versus exploitation" problem in reinforcement learning [14].
This paper models the user selection problem as a budgeted multiarmed bandit model and proposes an improved algorithm based on the upper confidence bound to measure the recruitment quality of the task. Finally, it presents an enhanced greedy recruitment algorithm to improve the efficiency of recruiting users on the platform [15].

Contributions made by this article are as follows:
(1) Comprehensively consider the user's reputation and willingness to participate, construct a user perceived quality model, take the estimate of the user's perceived quality as its expected value in each round of user selection, and design a corresponding compensation distribution system based on perceived quality to encourage users to improve the quality of their data.
(2) Consider the situation where the user's perceived quality changes dynamically, and solve the problem of maximizing the users' perceived quality under a limited budget. First, the problem is proved to be NP-hard; then the combinatorial multiarmed bandit from reinforcement learning is used to formalize the learning process of user perception, a user recruitment mechanism that selects users with the largest ratio of perceived quality to cost is proposed, and, finally, the regret value is introduced to measure the gap between the proposed user recruitment algorithm and the optimal algorithm.
(3) Experimental results under different settings show that the user recruitment algorithm based on the combinatorial multiarmed bandit can effectively learn the users' perceived quality, achieves more than 75% of the optimal algorithm, and is 25% higher than other comparison algorithms.

2. Related Work

In terms of mobile crowdsensing task allocation, scholars' work is mainly divided into two aspects: recruiting users, that is, selecting users with high perceived quality under cost constraints, and designing suitable incentive mechanisms that encourage users to actively share high-quality data [16]. In terms of user recruitment, Liu et al. proposed a new user selection method that allows users to participate in multiple perception tasks and offered a multitask strategy choice; the best user selection minimizes the total travel distance, thereby reducing the platform's cost [17]. Yang et al. selected the most suitable users within a specific budget to perform perception tasks to improve the efficiency of the crowdsensing system, designed the corresponding algorithm, learned the users' perceived costs, and selected the users providing the largest amount of data and information under a limited budget [18]. Wang et al. focused on user selection with budget constraints and coverage balance; the key is to select an appropriate subset of users so that the sensing range is as large and balanced as possible without violating the budget specified by MCS activities [19]. Song et al. observed that popular tasks are readily selected while unpopular tasks often cannot be assigned to suitable users; their task allocation strategy learns and exploits workers' task preferences to achieve task allocation under coverage constraints and to migrate unpopular tasks to qualified workers [20]. Since human behavior may be unreliable or malicious, information quality is difficult to guarantee; Restuccia et al. provided a new framework for defining and enforcing information quality in mobile crowdsensing [21]. Hu et al. studied the data quality-sensitive task assignment (QSTA) problem in mobile crowdsensing, which involves variable tasks and flexible rewards; each task is assigned to multiple users to ensure perceived quality, and a greedy algorithm is proposed to solve the problem [22]. On the incentive mechanism side, Zhan et al. studied the critical issue of maximizing revenue in the mobile crowdsensing system and incorporated personal qualities determined by the sensing platform into the incentive mechanism design, achieving the goal of obtaining high-quality sensor data at a lower cost [23]. Xiong et al. proposed a task-oriented user selection incentive mechanism (TRIM), establishing a task-centric design framework in the MCS system and constructing task vectors from multiple dimensions to meet task requirements to the greatest extent [24]. To balance task popularity, Hu et al. proposed a demand-driven dynamic incentive mechanism that can dynamically change the task reward in each sensing round as needed [25]. Zhang et al. optimized task reliability and spatial diversity to select users, established an incentive model with two optimization goals, and designed two online incentive mechanisms based on reverse auctions [26]. Jiang et al. proposed an incentive mechanism for group-oriented crowdsensing recruitment, addressing the problem that users' greedy pursuit of high returns drives recruitment costs up, so as to curb mobile users' tendency to overprice their data to increase their own profit [27].

Most of the scholars mentioned above optimize the number of users and travel distance based on known user perceived quality to reduce the platform budget. Still, they do not consider differences in users' overall quality, professional skills, and sensing environment, and the quality of the collected data changes in real time. In this case, we need to learn the users' perceived quality to obtain the most suitable task assignment. Therefore, this paper proposes a user recruitment algorithm based on CMAB, which interacts with the perception data platform in real time, uses reinforcement learning to learn the users' perceived quality, and solves the problem of maximizing the perceived quality of tasks under a limited budget.

3. User Recruitment Algorithm Model Based on CMAB

3.1. Basic Definition

In the user recruitment problem of the mobile crowdsensing system, the perception tasks submitted by the task publisher to the data platform are not necessarily short-term tasks but are often long-term tasks. According to the content, the data platform divides them into many small tasks over different periods. The total budget value is denoted by B; the task is divided into a number of rounds; there are n users and m perception tasks, with each round containing the tasks that mobile users need to perform. The importance of each perception task is expressed by a weight. Users can participate in multiple tasks and provide multiple options to the platform; each option consists of a specific set of tasks and the corresponding reward, and the options are arranged in order of reward. The maximum number of options for a user does not exceed L. Although the platform allows users to provide multiple options, only the option with the best perceived quality is selected in each round. The lth option of mobile user i consists of the set of perception tasks user i can complete and the remuneration required for user i to perform that option.

A user can participate in multiple tasks, and the maximum number of options does not exceed L. Each user i has at least two attributes: the set of tasks user i wants to complete and the reward required for completing them. Although each user can submit multiple options, the system will only select the option with the best perceived quality in each round. Table 1 shows the main symbols and their interpretations.

3.2. User's Perceived Quality Model

To select the best set of users to complete the perception task, it is necessary to preestimate each user's perceived quality in each round of recruitment. Existing work shows that the user's reputation, credibility, and willingness to participate all have a significant impact on the perceived quality of data. When any of these factors decreases, the user's perceived quality decreases as well, and vice versa. For example, the higher a user's reputation, the higher the perceived quality of that user's previously completed tasks: past completions have received more praise from the data platform, so it can be considered that the probability of completing the current task well is higher. In addition, the user's willingness to participate also determines service quality. The greater the willingness to participate, the more human resources the data platform can obtain; this also indirectly expands the pool of selectable users, thereby increasing the opportunity to obtain high-quality data. The lower the user's willingness to participate, the opposite holds. Therefore, this article considers reputation (the history of completed tasks) and the user's willingness to participate (in the current perception task) and models their product as the measure of the user's perceived quality; namely,

Here, the first factor represents the user's reputation. Recently, a variety of models have been proposed to evaluate reputation; for simplicity, this article uses the popular beta reputation system (BRS). However, the BRS model has limitations: when there are malicious users, it does not work well. For example, some malicious users may deliberately report and upload unverified information to obtain rewards with little effort, while their reputation is not significantly affected by a few such false reports. If there is no punishment for false reports in the long run, the data platform will not only fail to obtain high-quality sensing information but also suffer severe economic losses. To avoid this situation, this chapter makes appropriate improvements to the BRS model, calculated as follows:

Here, T and F are the historical counts of "correct" and "incorrect" results when completing perception tasks. In addition, the weight factor related to malicious events is calculated as follows:

Here, Å is the number of malicious events of a user in perception tasks, and K is the malicious event threshold set by the data platform. Once the number of malicious events exceeds this threshold, the user's reputation score is recorded as 0. λ is the attenuation factor within the threshold range, with λ ∈ (0,1); the default setting is 0.8. In addition to the above reputation factors, the influence of the user's willingness to participate on perceived quality is also apparent. For example, users far from the task location have a lower willingness to participate than users close to it, since a longer distance increases the user's travel cost, such as time expenditure. Based on the above analysis, this chapter models the user's willingness to participate as a function of the travel distance to the perception task, as in the following formula, where the distance is calculated as the Euclidean distance between the user coordinates and the task coordinates, in kilometers, and R is the range constraint of the sensing task. Sensing tasks are usually microtasks, and from the perspective of reducing platform cost it is not worth recruiting users from a long distance, so the data platform specifies a maximum range for each task. From (4), the willingness is a value between 0 and 1: within the constraint range of the task, the closer the user is to the task location, the closer the value is to 1; conversely, if the user's current location exceeds the maximum constraint range of the task, the value is 0.
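The reputation and willingness components above can be sketched in runnable form. The exact formulas are partly lost in this copy, so the beta-mean form (T + 1)/(T + F + 2) for BRS, the λ^Å attenuation, and the linear distance decay are assumptions for illustration; function and variable names are hypothetical.

```python
def reputation(T, F, A, K, lam=0.8):
    """Improved BRS reputation: beta-mean of correct/incorrect history,
    attenuated by lam**A for A malicious events; zero once A exceeds K."""
    if A > K:
        return 0.0                      # too many malicious events: reputation reset
    base = (T + 1) / (T + F + 2)        # standard beta reputation system mean
    return (lam ** A) * base            # attenuation within the threshold range

def willingness(d, R):
    """Distance-based willingness in [0, 1]: 1 at the task location,
    0 at or beyond the maximum range R (linear decay assumed)."""
    if d >= R:
        return 0.0
    return 1.0 - d / R

def perceived_quality(T, F, A, K, d, R, lam=0.8):
    """Perceived quality modeled as the product of reputation and willingness."""
    return reputation(T, F, A, K, lam) * willingness(d, R)
```

For example, a user with 8 correct and 2 incorrect records, no malicious events, standing 2.5 km from a task with a 5 km range would score 0.75 × 0.5 = 0.375 under these assumptions.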

In each round l, Y users are selected to complete the perception tasks, and the data platform determines the size of Y. If Y is larger, more users are recruited in each round and the completion time is shorter, but the cost is greater and the data redundancy is higher, since multiple users perform the same task. The smaller Y is, the more recruitment rounds there are and the longer it takes, but with less data redundancy. Here we assume that the platform determines the size of Y according to the specific needs of the task distributor. Therefore, for the platform, the main problem is how to design an effective user recruitment mechanism that maximizes the total perceived quality of all perception tasks under a limited budget. For convenience of description, the set of users recruited in the lth round is defined, so that membership in this set indicates that user i was recruited in the lth round. Since multiple users are recruited, the estimated perceived quality of each user can be calculated, and the largest perceived quality value among them is used to represent the final perceived quality. Here, for the set of perception tasks, the estimated perceived quality of task j completed by user i in the rth round and the final perceived quality value of task j in the rth round are defined. Moreover, the task initiator is concerned with the overall situation of all perception tasks rather than individual tasks; considering that each task has a weight value, the weighted completion quality of all perception tasks in the round is proposed. Each task may be executed by multiple users with uneven perceived quality, so we choose the largest perceived quality value as the final perceived quality of task j.

Definition 1. The definition of perceived quality is as follows:

Definition 2 (MaxQLimitB problem). The user's perceived quality in round r equals the perceived quality of each task j multiplied by the corresponding weight. Therefore, the problem of maximizing the perceived quality of tasks under a known budget can be defined as follows, where (7) and (8) express the maximization of the perceived quality and the budget constraint B, respectively; the indicator variable is 1 if the user is selected and 0 otherwise; and (9) and (10) indicate whether user i is chosen and that y users are recruited in each round to perform perception tasks.
For instance, consider an instance A of the 0-1 knapsack problem with its set of objects, where the value and weight of each item are given. The problem is to find a set of items that maximizes the total value while the total weight satisfies the capacity constraint, where η is the upper limit of the knapsack's carrying capacity. Next, transform the 0-1 knapsack instance A into a MaxQLimitB instance. Construct an instance B with a limited budget: for each user, the contributed perceived quality corresponds to an item's value and the cost corresponds to its weight. The MaxQLimitB problem is then to find a set of users that maximizes the total perceived quality while the total cost meets the budget constraint. Any solution to instance A corresponds to selecting the matching users in set P as a solution to the MaxQLimitB problem, and maximizing the value is equivalent to maximizing the perceived quality. Since the 0-1 knapsack problem is a proven NP-hard problem, the MaxQLimitB problem is also NP-hard.
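The reduction above can be checked on a toy instance: a brute-force solver over user subsets maximizes total perceived quality under a budget, exactly as a 0-1 knapsack maximizes value under a weight limit. The numbers below are illustrative, not from the paper.

```python
from itertools import combinations

def max_quality_limit_b(qualities, costs, budget):
    """Brute-force MaxQLimitB: pick the subset of users whose total cost
    stays within the budget and whose total perceived quality is maximal.
    Exponential in the number of users, so only usable for tiny instances."""
    n = len(qualities)
    best_q, best_set = 0.0, ()
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            cost = sum(costs[i] for i in subset)
            quality = sum(qualities[i] for i in subset)
            if cost <= budget and quality > best_q:
                best_q, best_set = quality, subset
    return best_q, best_set
```

With qualities [0.9, 0.6, 0.5], costs [5, 3, 3], and budget 6, the optimum is the pair of cheaper users (total quality 1.1) rather than the single highest-quality user, which illustrates why the problem is combinatorial.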

3.3. Remuneration Distribution System

To motivate users to complete perception tasks, the data platform should pay users rewards, which can be cash or noncash incentives and which constitute the platform's cost. Although each user can submit multiple options, only the option with the best perceived quality is selected in each round. Here, we assume that the candidate options offered by a user are sorted by required reward. Generally speaking, the reward required by a user is proportional to the number of completed perception tasks: f(x) is a positively correlated function, so the more perception tasks are completed, the greater the reward. ξ is the platform recruitment cost parameter; for users whose smart devices have higher configurations, ξ is larger. First, this article considers the simple situation where the users' cost parameters are known.

3.4. User Recruitment Model Based on Multiarm Bandit

The multiarmed bandit model is derived from the real-life slot machine game. The original single-armed bandit model has only one arm: the gambler pulls it and receives a payoff with some fixed but unknown probability. The multiarmed bandit model has multiple arms, and each time the gambler selects one arm; each arm corresponds to a different rate of return, some large and some small, but these probabilities are unknown to the gambler. The problem facing the gambler is therefore how to maximize his income within a limited number of pulls. Because the gambler must learn the payoff probability of each arm while at the same time deciding which arms maximize his interest, scholars named this learning model the MAB model. This article chooses the Combinatorial Multiarmed Bandit (CMAB) model because only one arm can be selected per round in the MAB model, whereas the CMAB model can choose multiple arms. During the gambler's play, there are two strategies, namely, "exploitation" and "exploration." "Exploitation" means the gambler chooses the arm with a high estimated payoff probability; "exploration" means the gambler tries other arms to find a potentially optimal one [16]. Adopting only the exploitation strategy or only the exploration strategy is not optimal; the key is to balance the two strategies to maximize profit [17]. Today's most representative strategies are the UCB (Upper Confidence Bound) algorithm and Thompson sampling. The idea of the UCB algorithm is to face uncertainty optimistically and use the upper confidence limit of each arm's mean return as its next estimated value [18].
Thompson sampling uses a Bernoulli distribution to model the return rate of each arm and then updates the distribution parameters according to the return observed in each round. However, Thompson sampling mainly targets the case where the return is 1 or 0 [19], while the UCB algorithm optimistically assumes that each arm's income can reach the upper end of its uncertainty range and thereby converges to the optimum [20]. Therefore, this paper uses the UCB algorithm to learn the users' perceived quality.

In the CMAB model, "regret" is used to measure the algorithm's performance. Suppose an oracle knows the payoff probability of each arm of the machine; it would choose the arm with the highest probability in each round, called the optimal arm. The CMAB algorithm, however, does not know the value of each arm, so with some probability it selects a nonoptimal arm in each round; the difference between the two cumulative benefits is the regret value [21]. In the process of user recruitment, the users' perceived quality is often unknown, and the platform must predict and update its expectation of each user's perceived quality based on the quality of each completed task. This problem is modeled as a CMAB model under budget constraints. Firstly, each mobile user is regarded as an arm of the model, and the user's perceived quality is regarded as the payoff of the machine; selecting different arms brings different benefits. Recruiting users with high perceived quality therefore means choosing the arms with the highest revenue to maximize the total benefit. In this article, we use an extended UCB formula to express the perceived quality value in the CMAB model, propose a UCB-based perceived quality function, and select Y mobile users in each round; each selected user performs all tasks in that round, so the platform learns the perceived quality of each task. First, each user's UCB-based perceived quality value is calculated, then the ratio of this quality value to the recruitment cost is computed, and the users with the largest ratios are selected each time.

For the perceived quality value based on UCB, the proposed improved formula is as follows: the first term is the average quality of user i over the perception tasks completed in the previous r rounds (the specific update is shown in (14)), Y is the maximum number of users recruited in each round, and the exploration term depends on user i's total number of recruitments in the previous r rounds.
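The improved formula itself is elided in this copy of the text; as a hedged illustration, a standard UCB index of the kind being extended here adds an exploration bonus to the empirical mean. Folding the per-round limit Y into the bonus via a (Y + 1) factor is an assumption, and the function name is hypothetical.

```python
import math

def ucb_quality(mean_q, n_i, r, Y):
    """UCB-style estimate of user i's perceived quality after r rounds:
    empirical mean plus an exploration bonus that shrinks as the user
    is recruited more often (n_i) and grows slowly with the round index r.
    The (Y + 1) factor mirroring the per-round limit Y is an assumption."""
    if n_i == 0:
        return float("inf")             # unexplored users are tried first
    return mean_q + math.sqrt((Y + 1) * math.log(r) / n_i)
```

The bonus makes rarely recruited users look optimistic, so they are occasionally re-tried, while frequently recruited users are judged almost entirely by their empirical mean.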

Based on the UCB perceived quality formula, we propose a perceived quality function, as shown in the following formula:

3.4.1. Greedy Repair Algorithm

The learning process of perceived quality under budget constraints is mainly divided into two parts. The first is the initialization phase, where the first option of every user, that is, the one with the lowest required reward, is selected as the recruitment result, and the defined parameters receive their initial values. Then, in the subsequent user recruitment process, the recruited user set is initialized to empty, and in each round the greedy algorithm selects the users with the largest ratio between the value of the perceived quality function and the recruitment cost. The specific expression is as follows:

However, in each selection round, there are always some participants whose costs exceed the remaining budget. This paper therefore proposes a novel greedy repair algorithm (GRA), given in Algorithm 1. First, all preselected users are sorted in descending order of the perceived quality-cost ratio and stored in the array Q[0,…n]. A Boolean flag array F[i] identifies the selection status of each user: when F[i] = 1, the user is selected; otherwise F[i] = 0. Participants are then selected in turn, the state of F[i] is updated, and the accumulated cost is calculated. GRA does not stop when the cumulative cost exceeds the budget; instead, it subtracts the cost of the currently selected participant from the total cost, marks that participant's status as F[i] = 0, and repeats the above steps so that as many participants as possible are selected. The time complexity of the GRA algorithm is O(), and the space complexity is O(n).

Input: , Q[0,…n], F[i], , B, Cost, j
Output: Q′[0,…n]
(1)  ⟵ 0,F[i] ⟵ 0, j ⟵ 0;
(2)Cost ⟵ 0,i ⟵ 0;
(3)In the lth round, sort the candidate participants in descending order and store them in the array Q[0,…n]
(4)for(i ⟵ 0 to n) do
(5)Cost=Cost+;
(6)If (Cost ≤ B and F[i] = 0)
(7)F[i]  ⟵ 1, Q′[j++] = Q[i]
(8)else
(9)Cost=Cost-
(10)return Q′[j]
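Algorithm 1 can be sketched in runnable form as follows. Array names follow the pseudocode; returning selected indices instead of the Q′ entries, and taking unsorted inputs, are simplifications.

```python
def greedy_repair(ratios, costs, budget):
    """Greedy Repair Algorithm (GRA): scan users in descending
    quality/cost ratio; tentatively add each user's cost, and if the
    cumulative cost exceeds the budget, roll that user back (F[i] = 0)
    and keep scanning so that later, cheaper users can still be selected."""
    order = sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)
    selected, cost = [], 0.0
    for i in order:
        cost += costs[i]
        if cost <= budget:
            selected.append(i)          # corresponds to F[i] <- 1
        else:
            cost -= costs[i]            # repair step: undo this user
    return selected, cost
```

For example, with ratios [0.9, 0.8, 0.7, 0.6], costs [5, 4, 2, 1], and budget 7, a plain greedy scan would stop after the first user, while GRA rolls back the unaffordable second user and still picks the third, using the budget more fully.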
3.4.2. User Recruitment Algorithm Based on CMAB

This article focuses on how the data platform recruits users with high perceived quality under a limited budget. Therefore, the user recruitment algorithm is designed according to the CMAB model. The first phase is initialization: the data platform selects the first option of every user to obtain the initial values of the relevant parameters for all users. Then, in the user recruitment stage, the data platform selects Y mobile sensing users each round within the limited budget, following the greedy selection strategy above combined with GRA, so that the cost is low and the perceived quality is high. As shown in Algorithm 2, Steps 1–3 represent the initialization phase: the platform first selects each user's first option with the lowest reward, obtains the average perceived quality, and obtains the relevant parameters of all users. Steps 4–17 represent the user recruitment phase: Step 6 computes the ratio of the perceived quality value to the cost value for all users, Step 7 sorts the candidate users in descending order and stores them in the array Q[0,…n], and Steps 8–17 represent the user selection process of GRA. If the number of recruits is smaller than Y and the cost of user i is less than the remaining budget, the user's option is added to the recruiting set; otherwise, the loop continues to the next user. The algorithm's time complexity is O(APLY), and, since the array needs temporary storage of size n, the space complexity is O(n).

Input: R, P, B, Y, Q[0,…n], Cost, task collection T
Output: selected user set S
(1)r ⟵ 1; select the first option of each user to obtain the initial parameters
(2)Update the corresponding estimates and counters, Cost ⟵ 0
(3)Compute the average value of each user's perceived task quality
(4)while true do
(5)r ⟵ r + 1
(6)Calculate the ratio between the perceived quality function value and the cost for every user
(7)Sort the candidate users in descending order of this ratio and store them in the array Q[0,…n]
(8)for(i ⟵ 1 to n) do
(9)Cost ⟵ Cost + c_i
(10)if (Cost ≤ B and |S| < Y and F[i] = 0)
(11)F[i] ⟵ 1
(12)Add user i to S
(13)else
(14)Cost ⟵ Cost − c_i
(15)Assign tasks to the users in S and observe their perceived quality
(16)Update the perceived quality estimates of the selected users
(17)B ⟵ B − Cost
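A compact simulation of the learn-then-recruit loop described above might look as follows. The UCB exploration bonus sqrt(ln r / n_i), the Gaussian observation noise, and all names are assumptions made for illustration, not the paper's exact index:

```python
import math
import random

def cmab_ucb_recruit(costs, true_quality, budget, Y, rng=random.Random(0)):
    """CMAB-UCB recruitment sketch with known user costs.

    costs[i]        : known recruitment cost of user i
    true_quality[i] : hidden mean perceived quality, used only to draw
                      noisy observations (a stand-in for task feedback)
    budget          : total platform budget B
    Y               : number of users recruited per round
    Returns the learned per-user quality estimates.
    """
    n = len(costs)
    counts = [1] * n
    # Initialization phase: observe every user's first option once.
    means = [min(1.0, max(0.0, true_quality[i] + rng.gauss(0, 0.1)))
             for i in range(n)]
    spent = float(sum(costs))
    r = 1
    while True:
        r += 1
        # Optimistic quality index per user (assumed UCB form).
        ucb = [means[i] + math.sqrt(math.log(r) / counts[i])
               for i in range(n)]
        # Rank by index-to-cost ratio; greedily keep Y users in budget.
        order = sorted(range(n), key=lambda i: ucb[i] / costs[i],
                       reverse=True)
        chosen = []
        for i in order:
            if len(chosen) >= Y:
                break
            if spent + costs[i] <= budget:
                chosen.append(i)
                spent += costs[i]
        if not chosen:
            return means          # budget exhausted
        # Observe rewards and update empirical quality estimates.
        for i in chosen:
            obs = min(1.0, max(0.0, true_quality[i] + rng.gauss(0, 0.1)))
            means[i] = (means[i] * counts[i] + obs) / (counts[i] + 1)
            counts[i] += 1
```

With enough rounds the estimates separate high-quality users from low-quality ones, which is the learning ability the experiments later evaluate.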

The problem of maximizing the perceived quality under a limited budget has been proved to be NP-hard. However, selecting in each round the users with the largest ratio of perceived-quality function value to recruitment cost yields approximately optimal results. Therefore, we introduce the regret value to measure the gap between the optimal and approximate solutions. First, Hoeffding's inequality is introduced; it applies to bounded random variables and is used to obtain the upper confidence bound in the regret analysis.

Lemma 1 (Hoeffding's inequality). Let X_1, …, X_n be independent random variables taking values in [0, 1], and let X̄ = (1/n) ∑_{i=1}^{n} X_i denote their empirical mean.

Then, for any δ > 0, P(|X̄ − E[X̄]| ≥ δ) ≤ 2 exp(−2nδ²).
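Lemma 1 can be checked numerically. The sketch below draws bounded Uniform(0, 1) samples (an illustrative choice; the lemma holds for any [0, 1]-valued variables) and compares the observed deviation frequency of the empirical mean with the bound 2·exp(−2nδ²):

```python
import math
import random

def hoeffding_check(n=50, delta=0.15, trials=20000, seed=1):
    """Empirical deviation frequency of a [0,1] sample mean versus
    Hoeffding's bound 2*exp(-2*n*delta**2)."""
    rng = random.Random(seed)
    mu = 0.5  # E[X] for Uniform(0, 1)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - mu) >= delta:
            hits += 1
    empirical = hits / trials
    bound = 2 * math.exp(-2 * n * delta * delta)
    return empirical, bound
```

The empirical frequency always sits below the bound, often by a wide margin, since Hoeffding's inequality is distribution-free.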

The smaller the regret value, the better the algorithm's performance. Define the optimal user set S*; R(B) represents the regret value of the user recruitment algorithm under the budget limit B. The formula is as follows: R(B) = Q_{S*}(B) − E[Q_A(B)], where Q_{S*}(B) is the total perceived quality achieved by the optimal set and Q_A(B) is the total perceived quality achieved by the algorithm.

However, it is unreasonable to compare the optimal strategy directly with the recruitment results, because the optimal solution cannot be computed in limited time even when the relevant parameters and budget are known, so we propose the concept of the ε-regret value.

Definition 3. Let R_ε(B) represent the ε-regret value of the algorithm under the budget B limit, defined as R_ε(B) = (1 − ε)·Q_{S*}(B) − E[Q_A(B)].
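As a minimal illustration, assuming the common approximation-regret convention R_ε(B) = (1 − ε)·Q_opt − E[Q_alg] (the paper's exact scaling may differ), the ε-regret is a one-line computation:

```python
def epsilon_regret(opt_quality, alg_quality, eps):
    """epsilon-regret sketch: gap between a (1 - eps) fraction of the
    optimal cumulative perceived quality and the algorithm's cumulative
    quality. The (1 - eps) scaling is an assumed convention."""
    return (1 - eps) * opt_quality - alg_quality
```

A regret near zero means the algorithm tracks the (unattainable in practice) optimal recruitment closely.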

Theorem 1. The ε-regret value of the CMAB-based recruitment algorithm with known user costs is bounded, as derived below.

Proof. First, introduce a counter for each user that is incremented at each update, and define the maximum/minimum perceived-quality gaps between the ε-optimal participant set and the non-ε-optimal participant sets.

Next, analyze the expected value of the counter after its final update, where o(B) is the total number of rounds executed by the user recruitment algorithm under the budget B limit; this analysis yields an upper bound on the counter's expectation.

Based on this, the total number of selections of non-ε-optimal users will not exceed a specific bound. Since the total number of recruitment rounds is also random, we then analyze o(B), which admits both upper and lower limits.

Finally, according to the expected value of the counter and the upper and lower bounds of o(B), the ε-regret value R_ε(B) is obtained. This completes the proof.

4. Experimental Results and Analysis

The experiments in this article select users with unknown perceived quality under a limited budget and verify the ability of the CMAB-based recruitment algorithm with known user costs to learn users' perceived quality and to select high-quality users. At the same time, to verify the efficiency of the algorithm, greedy, ε-first, random selection, and α-optimal algorithms are used for comparison experiments; the specific settings are as follows. The algorithm in this article is referred to as CMAB-UCB.

4.1. Data Settings

This paper uses a widely used real data set: taxi traces in Rome over one month, which include the date and time of passenger pick-up and drop-off, the locations of the taxi and passengers, the distance traveled, and the number of passengers. Because the full data set is too large and the total computation cost too high, we first preprocessed it: (1) A rectangular area of 10 km × 10 km is selected as the spatial limit. (2) Within the area, m passenger positions in the data set are selected as the perception tasks issued by the simulated platform, with value range [200, 1000]; n taxis are used as simulated mobile users, with value range [60, 200]. In this experiment, the default values of m and n are set to 800 and 150, respectively, and the number of tasks performed by each mobile user lies in [6, 15]. (3) Each perception task adopts disk coverage; that is, a disk of radius 250 meters is drawn with the task as its center, and if a user is within the range of the disk, the related task can be performed. The expected value of perceived quality represents the frequency of user i's visits to the location; the expected value of the cost parameter is a randomly generated number in the range (0, 1), and Gaussian and uniform distributions are used to set the perceived quality values and costs. We also set up a real noise-monitoring crowdsensing experiment to obtain the perceptual quality values of different users and then fed the measured data into the simulation to further evaluate the performance of our proposed algorithm. Supported by mobile devices, participants collect the surrounding environmental noise: 20 mobile devices (Android smartphones) act as crowdsensing users and run the NoiseTube application to monitor and collect environmental noise.
The noise source of the experiment was a mobile phone that continuously played music, and it was carried out in a room of 12 meters × 8 meters. 20 mobile devices were placed around the noise source, and three professional decibel testers were placed in the room to monitor the noise level in it.
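The disk-coverage rule from setup step (3) can be written directly; coordinates in metres within the 10 km × 10 km study area are an assumption of this sketch:

```python
import math

def can_perform(task_xy, user_xy, radius=250.0):
    """Disk-coverage rule: a user may perform a task if they lie within
    a 250 m disk centred on the task location. task_xy and user_xy are
    (x, y) positions in metres."""
    dx = task_xy[0] - user_xy[0]
    dy = task_xy[1] - user_xy[1]
    return math.hypot(dx, dy) <= radius
```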

Based on previous experimental experience, we simulate incorrect perception behavior as it occurs in the natural environment. When users collect noise, they may unconsciously put their mobile phones in their pockets or briefcases, making the recorded noise level lower than the actual value. Moreover, the quality of the collected data is also related to the user: if the user's enthusiasm is not high, there is a high probability that the user will not follow the correct measurement method and will keep the mobile phone in a pocket. Therefore, to realistically simulate user behavior, we group users by different probabilities: A1–A4 strictly follow the measurement method and hold the intelligent device in the open air; A5–A9 follow the measurement method most of the time but put the phone in a bag 10%–20% of the time; A10–A15 have a 70%–80% probability of using the correct method and holding the phone in the air; A16–A18 are assumed not to be active in sensing tasks and handle the phone according to their own needs, with a 50% probability of measuring correctly; A19 and A20 do not care about the task and have low reputation scores, with an assumed 10%–20% probability of adopting the correct measurement method. Our experiment lasted 720 minutes, with each round of the crowdsensing task lasting 1 minute, and we collected more than 800,000 pieces of data in total. Based on the user classification, we manually change the measurement method of different devices according to the predetermined probabilities; for example, for device A12, we reset its noise measurement method every 12 minutes, exposing the phone to the air with a probability of 70%–80% and putting it in a bag otherwise. The data are further processed according to the obtained data quality: the data values received by the intelligent devices and the actual values are normalized to obtain the distribution of the perceived quality value of the data.
At the same time, different reputation scores are set for inactive users.
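The grouped measurement behavior described above can be simulated as below; where the text gives a probability range, the midpoint is used as an assumption:

```python
import random

# Probability that each device group follows the correct measurement
# method (phone held in the air), from the experiment description;
# range midpoints are assumptions.
GROUP_P_CORRECT = {
    "A1-A4": 1.00,    # strictly follow the method
    "A5-A9": 0.85,    # phone in the bag 10-20% of the time
    "A10-A15": 0.75,  # correct method with 70-80% probability
    "A16-A18": 0.50,  # low enthusiasm, 50% correct
    "A19-A20": 0.15,  # low reputation, 10-20% correct
}

def simulate_round(rng):
    """One 1-minute perception round: returns, per device group, whether
    the device was held correctly (True) or left in a pocket/bag (False)."""
    return {g: rng.random() < p for g, p in GROUP_P_CORRECT.items()}
```

Running one such round per minute for 720 minutes reproduces the schedule used to generate the quality-labelled data.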

4.2. Algorithm Comparison

To highlight the efficiency of the UCB-based user recruitment algorithm, four algorithms were set up for comparison experiments, namely, greedy, ε-first, random selection, and α-optimal. The greedy algorithm chooses the most profitable users every time, fully exploiting the current estimates. In the ε-first algorithm, the platform divides the budget into two stages: in the first stage (an ε·B budget), the platform randomly selects Y users to perform the corresponding tasks; in the second stage (the remaining (1 − ε)·B budget), the platform always selects the Y users who performed best in the first stage. In the experiment, we set the value of ε to 0.1. In the random selection algorithm, the platform randomly selects Y users to perform the corresponding perception task in each round. In the α-optimal algorithm, the platform is assumed to know all unknown parameter values in advance; on this basis, it always selects an approximately optimal group of users in each round.
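A minimal sketch of the ε-first baseline follows, assuming unit user costs and a callable that returns a (possibly noisy) reward sample for a user; both are simplifications for illustration:

```python
import random

def epsilon_first(users, rewards, budget, Y, eps=0.1, rng=random.Random(0)):
    """epsilon-first baseline: spend eps*B exploring with uniformly
    random Y-user picks, then exploit the remaining (1-eps)*B on the Y
    users with the best observed average reward. rewards(i) returns a
    reward sample for user i; each pick costs one budget unit."""
    n = len(users)
    totals, counts = [0.0] * n, [0] * n
    explore_budget = eps * budget
    spent = 0.0
    while spent + Y <= explore_budget:        # exploration phase
        for i in rng.sample(range(n), Y):
            totals[i] += rewards(i)
            counts[i] += 1
            spent += 1.0
    means = [totals[i] / counts[i] if counts[i] else 0.0
             for i in range(n)]
    best = sorted(range(n), key=lambda i: means[i], reverse=True)[:Y]
    quality = 0.0
    while spent + Y <= budget:                # exploitation phase
        for i in best:
            quality += rewards(i)
            spent += 1.0
    return best, quality
```

Unlike CMAB-UCB, ε-first never revises its choice after the exploration phase, which is why it can lock onto a suboptimal set when the exploration budget is small.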

4.3. Experimental Process Analysis

For Algorithm 2, the regret value has already been derived, so the next step is to evaluate the algorithm's performance in four respects: the number of people recruited in each round Y, the number of candidate users A, the number of posted tasks P, and the budget. First, the experimental conditions are set: the budget B takes the values 5000, 10000, 15000, and 20000, i.e., the range 5000–20000 yuan. Next, the parameters under different budgets are varied: first, the number of recruits per round is set to Y = 20–80 in steps of 10; second, the number of candidate users is set to A = 60–200 in steps of 20; third, the number of posted tasks is set to P = 200–1000 in steps of 200. The settings are summarized in Table 2.

4.4. Algorithm Comparison

For situations 1–4, as shown in Figures 2–4, when the number of recruits in each round is 20–80, the difference in the perceived quality of the five algorithms is not very large. Still, the UCB algorithm outperforms the greedy, ε-first, and random selection algorithms and is close to the α-optimal algorithm, because the more people the platform recruits in each round, the fewer the total rounds, as shown in Figure 5, which means that the pool of selectable users is expanded and it is easy to select high-quality users. However, as the number of recruits increases significantly, individual gains decrease, user enthusiasm is likely to fall, participation drops, and the perceived quality obtained decreases. Therefore, it is necessary to choose an appropriate Y to maximize the perceived quality under limited costs.

As shown in Figures 6–8, the CMAB-UCB algorithm gets closer to the α-optimal algorithm as the number of recruited users increases, and the perceived quality it obtains is significantly greater than that of the other three algorithms. This means that the CMAB-UCB algorithm fully exploits its selection strategy and learns each user's perceived quality in every round to pick qualified high-quality users, while the other three algorithms easily fall into local optima.

As shown in Figures 9–11, as the number of tasks increases, the perceived quality of the α-optimal algorithm first increases and then decreases, reaching its maximum when the number of tasks is 1000, because the more tasks there are, the more users are recruited and the average recruitment cost decreases; however, users with higher perceived quality also demand a higher price, so in this case the algorithm prioritizes users with lower perceived quality but lower cost.

5. Concluding Remarks

This paper addresses the problem of recruiting users to maximize perceived quality under a limited budget, for both known and unknown user costs. For the setting where the perceived quality of tasks is constantly changing, a user recruitment algorithm based on the CMAB model is proposed. The mobile crowdsensing platform dynamically adjusts its selection according to the feedback of users' perceived quality in each round and selects qualified, high-quality candidates in the next round, so that user recruitment obtains a better average perceived quality over long-term selection. However, the possibility of users providing false results for higher profit is not considered in this recruitment model, and further research will address the problem of users submitting erroneous task results to get paid.

Data Availability

The data used to support the findings of this study have not been made available because the organization has confidentiality measures.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61472136 and 61772196), the Natural Science Foundation of Hunan Province, China (2020JJ4249), and the Degree and Graduate Education Reform Research Project of Hunan Province, China (2020JGYB234).