Abstract

In spatial crowdsourcing, mobile workers are recruited to perform tasks tied to a specific location and time. Task assignment has been a focus of the research community due to its importance in spatial crowdsourcing. However, most of the existing efforts focus on optimizing the task requester’s goal and do not consider the requirements and constraints of workers. Unlike the existing methods, we consider task assignment comprising the goals of both the workers and the requesters, that is, the trade-off between the reward and travel cost of a worker and the delay of a task. We formulate the delay-sensitive task assignment (DSTA) problem, which optimizes the combined goal of minimizing the travel cost and maximizing the reward of a worker while reducing the time a task waits before it is assigned. A DSTA algorithm based on a greedy strategy (DSTA-G) is proposed to address the problem. To improve the efficiency, we introduce the Geohash algorithm and propose a new solution, DSTA-GH, which is scalable to large-scale datasets. Moreover, the task assignment method that only considers the worker’s objective (TAW) is proposed as the baseline. Finally, extensive experiments offer evidence that DSTA-G and DSTA-GH outperform the baseline in terms of task delay, travel cost, and reward, and that DSTA-GH takes only about 0.03% of the CPU time of DSTA-G.

1. Introduction

Ubiquitous mobile devices and mobile networks contribute considerably to the novel paradigm of accomplishing spatial tasks [1], such as taking photos [2], counting crowds [3], and monitoring traffic conditions [4]. This novel paradigm is called spatial crowdsourcing (SC) [5], which recruits mobile workers to travel towards specified locations and perform spatial tasks.

In spatial crowdsourcing, the users comprise requesters and workers. The location-specific tasks are published online by task requesters. The workers accept the tasks assigned by the SC server. In addition, the workers can also choose the tasks voluntarily. The optimization of task assignments to the workers for achieving the desired objectives is a well-known research topic. Task assignment, that is, the process of assigning tasks to the workers, aims at finding suitable workers (or tasks) for a task (or worker) based on the requirements and the features of both sides [6, 7].

Most of the existing works on task assignment are devoted to satisfying the task requester, for example, by maximizing the number of assigned tasks, minimizing the reward, or maximizing the difference between the earned reward and the cost of the worker. However, considering the expectations of only one stakeholder may discourage the other from participating in the crowdsourcing task, consequently hindering the development of crowdsourcing markets (https://www.waze.com, https://picasa.google.com). In the real world, the task requesters expect their posted tasks to be completed as soon as possible. On the other hand, the workers usually weigh the cost and reward of a task in order to decide whether to accept or reject it. The major target of the crowdsourcing markets is to promote their businesses by attempting to satisfy both stakeholders in task assignments. In this work, we consider the problem of finding the tasks for a worker by considering the trade-off between the reward and the cost. In addition, we give selection priority to the delayed tasks.

The location feature is the most critical feature of spatial crowdsourcing. All the existing spatial task assignment methods consider the acceptable distance of workers or tasks. Please note that the process of computing the distance between the tasks and workers on large-scale datasets is time-consuming. If the tasks enclosed in a worker’s accepting region can be determined directly, the task assignment efficiency improves. In order to accomplish this, geospatial indexing technology is used in spatial crowdsourcing. For example, Uber uses the H3 spatial index (https://eng.uber.com/h3/). However, unlike the existing works, which mainly find the nearby worker (or task), we want to improve the search efficiency while maintaining the performance of task allocation.

We first consider the task assignment problem with the objectives corresponding to the worker and the requester. In particular, given a dynamically arriving worker and a set of available tasks, the goal of this work is to find a subset of tasks that meets the worker’s requirements for travel cost and reward, such that the task delay is minimized. Then, we extend the proposed solution to large-scale datasets. The major contributions of this work are as follows:
(1) We first consider the task assignment from the worker’s perspective in order to address the trade-off between the travel cost and the reward. A multiobjective weighted sum optimization solution, that is, TAW, is proposed.
(2) We formulate the delay-sensitive task assignment (DSTA) problem, which aims at minimizing the average task latency and optimizing the combined goals of minimizing the travel cost and maximizing the reward of the worker. DSTA-G is proposed to address the problem.
(3) We propose a heuristic that exploits spatial knowledge for space clustering, thus reducing the search space. In particular, we introduce the Geohash method into DSTA-G and propose a novel delay-sensitive task assignment method based on Geohash (DSTA-GH). The proposed method is scalable to large-scale datasets.
(4) We conduct extensive experiments to test the performance and efficiency of the proposed methods.

The rest of this article is organized as follows.

In Section 2, we present the related works and compare them with the proposed works. In Section 3, we present the formal definition of the DSTA problem. In Section 4, we describe the TAW algorithm, DSTA-G algorithm, and the improved DSTA-GH. In Section 5, extensive experiments are conducted to evaluate the proposed algorithms. In Section 6, we discuss the advantages and limitations of the proposed algorithms. Finally, in Section 7, we conclude our work.

2. Related Work

Recently, with the rapid development of mobile devices and mobile communications, spatial crowdsourcing [8, 9] has become a popular distributed problem-solving paradigm. Various real applications have emerged, such as Postmates (https://postmates.com/), Gigwalk (https://www.gigwalk.com/), and TaskRabbit (https://www.taskrabbit.com/). In a spatial crowdsourcing market, any user available in the network is a potential task requester uploading tasks and/or a worker performing tasks. With large numbers of tasks and workers, obtaining high-quality matches between the workers and tasks has become a research hotspot in recent years. There are two matching modes [9], namely, worker selected tasks (WST) and server assigned tasks (SAT). In the WST mode [7, 10, 11], the online workers choose the tasks without coordinating with the SC server. In the SAT mode [5, 6], the SC server assigns the tasks to the nearby available workers. In this work, we consider the task assignment problem in the SAT mode.

Recently, there are various works presented in the literature that focus on the task assignment in the SAT mode. The authors in [5] considered the time-sensitive tasks and attempted to minimize the travel cost of workers, further minimizing the incentive cost. The authors also considered the delay-tolerance scenarios and proposed the task assignment algorithm GGA-U aimed at minimizing the number of workers by preferably selecting the workers with a high task acceptance ratio. The authors in [12] assumed that the worker’s familiarity level regarding the target location directly influences the completion quality of the task. The task assignment method proposed by the authors aims at optimizing bi-objectives, namely, the workers’ familiarity and the recruitment cost. In [13], the authors considered that fairness affects the workers’ initiative to participate in a task. The authors presented a task assignment method to minimize the unfairness and maximize the reward earned by workers. The authors in [14] considered a budget-aware task assignment method to maximize the quality of the completed task. The authors in [15] also considered the task budget and the quality of workers. By employing a “task bundling” strategy, the real-time, budget-aware task package allocation for spatial crowdsourcing (RB-TPSC) is proposed to maximize the number of task assignments [9] and maximize the expected quality of results from workers under limited budgets. The authors in [11] assumed that the worker’s preference on task category affects their behavior. Therefore, a preference-aware task assignment method was proposed. According to this method, the quality of task completion is influenced by the workers’ preferences. Please note that the worker’s preference changes with time. Considering the dynamic nature of workers and tasks, the authors in [16] proposed an online task assignment DRR-UCB to maximize task reliability and minimize travel costs. 
Instead of providing a locally optimal allocation solution based on a given trade-off among multiple objectives under some constraints, a few works [17–20] attempted to give all the Pareto-optimal solutions. In addition, the privacy issues in spatial task assignment are discussed in [21, 22].

A comparison of a few related works is presented in Table 1. Except for the work presented in [16], which considered the requester’s objective to maximize the task reliability and the worker’s objective to minimize travel costs, most of the existing task assignment algorithms are designed for the requester. The work presented in [13] only considered the worker’s objective.

Unlike the aforementioned works, we first consider a worker’s trade-off between cost and reward and propose a task assignment algorithm that optimizes both simultaneously. Second, we address the task assignment by considering the task delay together with the worker’s trade-off. Third, we consider the scalability of the task assignment algorithm on large-scale datasets.

3. Problem Formulation

In this section, we formulate the problem. The symbols used in this work are presented in Table 2.

Assume that the task set $T = \{t_1, t_2, \ldots, t_n\}$ comprises the available tasks, where $n = |T|$ and $t_j$ is the task with ID $j$. The location of the task is $l_{t_j} = (x_{t_j}, y_{t_j})$, where $x_{t_j}$ and $y_{t_j}$ denote the longitude and latitude of the task $t_j$, respectively. $m_{t_j}$ denotes the maximum number of workers needed by the task $t_j$, which has been released at the time $a_{t_j}$.

The worker set $W = \{w_1, w_2, \ldots\}$ comprises the available workers, where the worker $w_i$ is the worker with ID $i$. The location of the worker is $l_{w_i} = (x_{w_i}, y_{w_i})$, where $x_{w_i}$ and $y_{w_i}$ denote the longitude and latitude of the worker $w_i$, respectively. We assume that the worker $w_i$ only accepts the tasks within the area with center $l_{w_i}$ and radius $R_{w_i}$. The notation $c_{w_i}$ represents the maximum number of tasks that the worker $w_i$ can undertake, which is known as the capacity of the worker $w_i$.

Definition 1. (Worker-task coverage). Given $d(w_i, t_j)$ being the distance between $w_i$ and $t_j$, and $a_{w_i}$ being the arrival time of $w_i$, let $C_{w_i}$ denote the task coverage set of $w_i$, such that for every $t_j \in C_{w_i}$,

$$d(w_i, t_j) \le R_{w_i}, \qquad (1)$$

$$a_{t_j} \le a_{w_i}. \qquad (2)$$

We then say that the worker $w_i$ covers the task $t_j$. The $k$th task accepted by the worker $w_i$ is represented as $t_{i,k}$, which is released at the time $a_{t_{i,k}}$. The distance between two locations is computed by using the Haversine formula [23]. In Figure 1, a smaller time means earlier arrival. The tasks that are released before the worker arrives and that lie within the radius of the worker form the coverage set of that worker.

Definition 2. (Task delay). Given a task $t_j$, if $t_j$ is assigned to $w_i$, then the delay of the task is defined as

$$delay(t_j) = a_{w_i} - a_{t_j}. \qquad (3)$$

We assume that a worker is assigned tasks once he/she arrives, which means that the assignment time of an assigned task is the arrival time of the worker to whom it is assigned. Therefore, the task delay is defined as the interval between the release time and the assignment time.

Definition 3. (Reward for one assignment). Given a task $t_j$ with budget $b_{t_j}$ and the worker $w_i$, if $t_j$ is assigned to $w_i$, the reward of the assignment is defined as the average value of the task’s budget:

$$r(w_i, t_j) = \frac{b_{t_j}}{m_{t_j}}, \qquad (4)$$

where $r(w_i, t_j)$ represents the reward that the task $t_j$ pays to the worker $w_i$.

In this work, we consider a real-time task assignment scenario where a worker is assigned tasks once he/she arrives. We assume that a worker may accept multiple tasks simultaneously and that a task may be assigned to multiple workers. Our objective is to minimize the task delay and maximize the workers’ trade-offs between cost and reward, which is a multiobjective optimization problem (MOP) [24]. The weighted sum [25] is a classical method to solve multiobjective optimization problems due to its simple calculation.

Definition 4. (DSTA problem). Given the set of tasks $T$ and the worker $w_i$, DSTA finds the subset of tasks $A_{w_i} \subseteq C_{w_i}$ that maximizes the objective in (5). This objective is built on the worker’s trade-off between the travel cost and the reward. A preference parameter of the worker, $\alpha$, determines whether the worker cares more about the travel cost or about the reward.

In addition to the goals of the worker, the delay requirements of the task are also considered. When a worker has the same trade-off for several tasks, the tasks with longer waiting times should be assigned earlier. Conversely, when multiple tasks have the same delay, we want to assign the tasks for which the worker has a higher trade-off. Therefore, we use a delay-dependent term to discount the worker’s trade-off. In terms of a worker-task pair, (5) is maximized by maximizing the worker’s trade-off and minimizing the task delay simultaneously.

Note that different objectives have different natures. For example, the reward is a positive attribute, but distance is a negative attribute. This is because the final goal is positively affected by the reward, but negatively affected by the distance. Additionally, as different objectives have different value ranges, the objectives with high values dominate the weighted sum [25]. Thus, the objectives must be normalized before applying some of the MOP methods. In this work, the distance, reward, and delay are normalized as expressed in (6), (7), and (8).
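
The quantities in Definitions 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not the authors’ implementation: the class and field names (`Task`, `Worker`, `radius_km`, and so on) and the Earth radius constant are assumptions introduced here.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


@dataclass
class Task:
    lat: float
    lon: float
    budget: float        # total budget of the task
    max_workers: int     # maximum number of workers the task needs
    release_time: float


@dataclass
class Worker:
    lat: float
    lon: float
    radius_km: float     # acceptable travel radius
    capacity: int        # maximum number of tasks the worker can take
    arrival_time: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def coverage(worker, tasks):
    """Tasks released before the worker arrives and within the worker's radius."""
    return [
        t for t in tasks
        if t.release_time <= worker.arrival_time
        and haversine_km(worker.lat, worker.lon, t.lat, t.lon) <= worker.radius_km
    ]


def delay(worker, task):
    """Interval between the task's release and the worker's arrival."""
    return worker.arrival_time - task.release_time


def reward(task):
    """Reward of one assignment: the task's budget averaged over its workers."""
    return task.budget / task.max_workers
```

The coverage test is the expensive part, since it evaluates one Haversine distance per available task for every arriving worker.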

4. The Proposed Algorithms

4.1. TAW Algorithm

The task assignment method that only considers the worker’s objective (TAW) is used as the baseline algorithm. The TAW algorithm solves the task assignment problem by only optimizing the reward and traveling cost of a worker, that is, the objective (5) in Definition 4 without the delay part. First, we determine the candidate tasks covered by the worker $w_i$ (Line 2). By computing the worker’s trade-off, we rate the tasks that should be assigned (Lines 3 to 6). Next, we sort them in descending order of score (Line 7). Finally, we assign the top $c_{w_i}$ tasks to the worker, where $c_{w_i}$ is the capacity of the worker (Lines 8 to 11). The computational complexity of this process is $O(|T|)$.

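Algorithm 1: TAW.
Input: a task set $T$, the worker $w_i$
Output: the assignment result $A_{w_i}$
(1) $A_{w_i} \leftarrow \emptyset$, $S \leftarrow \emptyset$
(2) Compute the coverage set $C_{w_i}$ by using (1) and (2)
(3) For each $t_j \in C_{w_i}$:
(4)  If $m_{t_j} > 0$:
(5)   Rate $t_j$ by the worker’s trade-off and get the score $s_j$
(6)   $S \leftarrow S \cup \{(t_j, s_j)\}$
(7) Sort $S$ in descending order of score
(8) For $t_j \in S$ and $c_{w_i} > 0$: #Select the top $c_{w_i}$ tasks.
(9)  $A_{w_i} \leftarrow A_{w_i} \cup \{t_j\}$
(10)  $c_{w_i} \leftarrow c_{w_i} - 1$
(11)  $m_{t_j} \leftarrow m_{t_j} - 1$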
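
The rate, sort, and select steps of the baseline can be sketched in Python as below. This is a sketch under stated assumptions rather than the authors’ code: the min-max normalization, the weighted-sum form `alpha * (1 - distance) + (1 - alpha) * reward`, and the names `min_max` and `taw_assign` are all introduced here for illustration, and the candidates are assumed to satisfy the coverage conditions already.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def taw_assign(candidates, capacity, alpha=0.5):
    """Pick up to `capacity` tasks with the best cost/reward trade-off.

    candidates: list of (task_id, distance_km, reward) tuples.
    alpha: assumed weight on travel cost in [0, 1]; 1 - alpha weights reward.
    """
    if not candidates or capacity <= 0:
        return []
    dist = min_max([c[1] for c in candidates])
    rew = min_max([c[2] for c in candidates])
    # Distance is a negative attribute, so it enters as (1 - normalized distance).
    scored = [
        (alpha * (1.0 - d) + (1.0 - alpha) * r, c[0])
        for c, d, r in zip(candidates, dist, rew)
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [task_id for _, task_id in scored[:capacity]]
```

With `alpha = 1.0` the sketch degenerates to picking the nearest tasks, and with `alpha = 0.0` to picking the best-paid ones, which is the behavior a preference parameter is meant to control.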

Next, by considering the location, budget, and release time constraints, we optimize the task assignment based on the distance, reward, and task delay. We propose two algorithms, that is, the delay-sensitive task assignment algorithm based on a greedy strategy (DSTA-G) and the improved version of DSTA-G based on Geohash (DSTA-GH).

4.2. DSTA-G

In this section, we propose a basic solution, DSTA-G, to solve the DSTA problem, which is presented in Algorithm 2. The task set $T$ and the worker $w_i$ are taken as input, and the output is the task assignment result of $w_i$.

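Algorithm 2: DSTA-G.
Input: a task set $T$, the worker $w_i$
Output: the assignment result $A_{w_i}$
(1) $A_{w_i} \leftarrow \emptyset$, $S \leftarrow \emptyset$
(2) Compute the coverage set $C_{w_i}$ by using (1) and (2)
(3) For each $t_j \in C_{w_i}$:
(4)  If $m_{t_j} > 0$:
(5)   Rate $t_j$ by (5) and get the score $s_j$
(6)   $S \leftarrow S \cup \{(t_j, s_j)\}$
(7) Sort $S$ in descending order of score
(8) For $t_j \in S$ and $c_{w_i} > 0$: #Select the top $c_{w_i}$ tasks.
(9)  $A_{w_i} \leftarrow A_{w_i} \cup \{t_j\}$
(10)  $c_{w_i} \leftarrow c_{w_i} - 1$
(11)  $m_{t_j} \leftarrow m_{t_j} - 1$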

Please note that Algorithm 2 differs from Algorithm 1 only in the method of rating the candidate tasks. In Algorithm 2, the candidate tasks are rated by computing the score in (5) (Line 5). The computational complexity of Algorithm 2 is $O(|T|)$.
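
A minimal sketch of delay-sensitive rating is given below. The multiplicative discount `gamma ** (1 - normalized_delay)`, the min-max normalization, and the names `dsta_g_assign` and `gamma` are assumptions made purely for illustration; they are not taken from (5). The sketch only reproduces the intended ordering: for equal trade-offs the longest-waiting task ranks first, and for equal delays the task with the higher trade-off ranks first.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dsta_g_assign(candidates, capacity, alpha=0.5, gamma=0.5):
    """Greedy delay-sensitive selection for one arriving worker.

    candidates: list of (task_id, distance_km, reward, delay) tuples that
    already satisfy the coverage conditions.
    alpha: assumed weight on travel cost; gamma: assumed discount base in (0, 1].
    """
    if not candidates or capacity <= 0:
        return []
    dist = min_max([c[1] for c in candidates])
    rew = min_max([c[2] for c in candidates])
    wait = min_max([c[3] for c in candidates])
    scored = []
    for c, d, r, w in zip(candidates, dist, rew, wait):
        tradeoff = alpha * (1.0 - d) + (1.0 - alpha) * r
        # Tasks that waited less are discounted more strongly.
        scored.append((tradeoff * gamma ** (1.0 - w), c[0]))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [task_id for _, task_id in scored[:capacity]]
```

Only the scoring line differs from a trade-off-only baseline, which mirrors the statement that the two algorithms differ in how candidate tasks are rated.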

4.3. Improving DSTA-G

Algorithm 2 searches all the available tasks to obtain the most suitable tasks for a single worker. The main cost is computing the distance between the tasks and the worker. However, in real-world applications, many workers arrive within a short time. This means that a lot of worker-task matching takes place almost simultaneously. In this scenario, the scalability of Algorithm 2 on large-scale datasets is worth considering. The CPU costs of computing the distance of every worker-task pair are compared for two datasets, that is, (1) 1877 workers and 835 tasks and (2) 18203 workers and 151849 tasks. The results are about 43.54 seconds and 19923.28 seconds on dataset (1) and dataset (2), respectively. This shows that the scalability of Algorithm 2 mainly depends on the size of the search space.

In order to reduce the search space in location-based service (LBS) [26], spatial indexing structures, such as R-tree [27], Quad-Tree [28], and Geohash [29], are widely used for clustering location data. The R-tree is dynamically updated when a data element is inserted or deleted, so maintaining an R-tree incurs extra cost. The Quad-Tree relies on prior knowledge of the data distribution, so it cannot be flexibly applied to future data. In contrast to R-tree and Quad-Tree, Geohash overcomes these shortcomings and has been widely used in cases where one-dimensional indexes are required to process the spatial data [30–32].

4.3.1. Geohash Fundamentals

Geohash is used to encode a location <longitude, latitude> [33]. First, the latitude interval (−90, 90) is repeatedly bisected. The same is performed for the longitude interval (−180, 180). At each step, the left subinterval is labeled with 0, and the right subinterval is labeled with 1. A partition example of the latitude value of 39.9232 is shown in Table 3. The number of iterations depends on the required precision. Second, the two bit sequences are interleaved, so that the even positions hold longitude bits and the odd positions hold latitude bits. Then, one character is used to encode every 5 bits according to Base32. For example, the location with latitude 33.84236 and longitude −117.998 is encoded with the string “9qh0,” where the geohash length is 4.
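
The encoding steps above can be sketched in Python as follows. The function name `geohash_encode` is hypothetical; the Base32 alphabet is the standard geohash alphabet, which omits the letters a, i, l, and o.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, length=4):
    """Encode a (lat, lon) pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    use_lon = True  # even bit positions (0, 2, 4, ...) carry longitude
    while len(bits) < 5 * length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:          # right half -> bit 1
                bits.append(1)
                lon_lo = mid
            else:                   # left half -> bit 0
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        value = 0
        for b in bits[i:i + 5]:     # pack 5 bits into one Base32 symbol
            value = (value << 1) | b
        chars.append(BASE32[value])
    return "".join(chars)
```

For instance, `geohash_encode(33.84236, -117.998, 4)` returns `"9qh0"`; note that the longitude is a west longitude and therefore negative.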

Every string code represents a rectangular area (Figure 2). As shown in Table 4, a longer code means higher precision and a smaller area. When the geohash length is 4, the width and height of the area are about 36.25 km and 19.55 km, respectively.

Figure 2 illustrates that a location and its nearest neighbors may fall into two different blocks. For example, the nearest neighbor of a location in the block “9qh0” may be located in the adjacent block “9qh1.” If the search space is limited to the block “9qh0,” such adjacent tasks are excluded from the search space, although they may be better suited to the worker. In order to overcome this shortcoming, the scan area is extended from the central block to its eight neighbors. This approach guarantees correct results, although the efficiency is reduced slightly.

4.3.2. DSTA-GH Algorithm

In order to improve the efficiency of task assignment, we propose a heuristic that reduces the search space. As discussed, owing to its simple computation and stable structure, Geohash is used in the preselection phase of task assignment. The resulting delay-sensitive task assignment based on Geohash (DSTA-GH) is described in Algorithm 3.

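Algorithm 3: DSTA-GH.
Input: a task set $T$, the worker $w_i$
Output: the assignment result $A_{w_i}$
(1) $A_{w_i} \leftarrow \emptyset$, $S \leftarrow \emptyset$
(2) $g \leftarrow$ geohash code of the location of $w_i$
(3) $G \leftarrow \{g\} \cup$ the codes of the 8 blocks adjacent to $g$
(4) Determine the candidate task set $T' \subseteq T$ based on the geohash codes in $G$
(5) Compute the coverage set $C_{w_i}$ over $T'$ by using (1) and (2)
(6) For each $t_j \in C_{w_i}$:
(7)  If $m_{t_j} > 0$:
(8)   Rate $t_j$ by (5) and get the score $s_j$
(9)   $S \leftarrow S \cup \{(t_j, s_j)\}$
(10) Sort $S$ in descending order of score
(11) For $t_j \in S$ and $c_{w_i} > 0$: #Select the top $c_{w_i}$ tasks.
(12)  $A_{w_i} \leftarrow A_{w_i} \cup \{t_j\}$
(13)  $c_{w_i} \leftarrow c_{w_i} - 1$
(14)  $m_{t_j} \leftarrow m_{t_j} - 1$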

Algorithm 3 mainly includes two phases: (1) The preselection phase (Lines 2 to 4). In this phase, the Geohash algorithm is used to determine the candidate task set, in which every task is located in the block mapped from the worker or in one of its 8 adjacent blocks. (2) The refining phase (Lines 5 to 14). This phase is similar to Algorithm 2. The desirable tasks in the candidate task set are assigned to the worker.

Let us consider the computational complexity of Algorithm 3. In the preselection phase, the complexity of encoding a location depends primarily on the geohash length, which is a constant $k$. Determining the candidate tasks has a complexity of $O(|T|)$ (Line 4). Therefore, the preselection has a computational complexity of $O(|T|)$. The complexity of the refining phase is linear in the size of the candidate task set, which is at most $|T|$. Therefore, the total computational complexity of Algorithm 3 is $O(|T|)$.

Both Algorithms 2 and 3 run in linear time. However, in Algorithm 2, for every oncoming worker, the distances between all the available tasks and the worker are computed to determine the tasks within the radius of the worker. In Algorithm 3, dividing the available tasks into blocks is performed only once. Then, for every oncoming worker, the distance computation is limited to the few tasks within the block where the worker is located and its 8 neighbors. When tasks are assigned to a large number of workers concurrently, Algorithm 3 therefore greatly reduces the computation time as compared to Algorithm 2.
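
The preselection idea can be sketched in Python as below: tasks are bucketed by geohash once, and each arriving worker only looks up its own block and the 8 surrounding blocks. This is a sketch under assumptions, not the authors’ implementation. The function names are hypothetical, the neighbors are found by re-encoding shifted block centers (one of several possible techniques), and a dictionary lookup is used instead of a scan over all tasks.

```python
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def encode(lat, lon, length=4):
    """Standard geohash: interleave longitude/latitude bisection bits."""
    box = {"lat": [-90.0, 90.0], "lon": [-180.0, 180.0]}
    point = {"lat": lat, "lon": lon}
    code, value = [], 0
    for i in range(5 * length):
        axis = "lon" if i % 2 == 0 else "lat"
        mid = sum(box[axis]) / 2
        bit = 1 if point[axis] >= mid else 0
        box[axis][1 - bit] = mid  # bit 1 raises the lower bound, bit 0 lowers the upper
        value = (value << 1) | bit
        if i % 5 == 4:
            code.append(BASE32[value])
            value = 0
    return "".join(code)


def bounds(code):
    """Return (lat_lo, lat_hi, lon_lo, lon_hi) of the block named by `code`."""
    box = {"lat": [-90.0, 90.0], "lon": [-180.0, 180.0]}
    i = 0
    for ch in code:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            axis = "lon" if i % 2 == 0 else "lat"
            bit = (value >> shift) & 1
            box[axis][1 - bit] = sum(box[axis]) / 2
            i += 1
    return (*box["lat"], *box["lon"])


def block_and_neighbors(code):
    """The block itself plus its (up to) 8 surrounding blocks."""
    lat_lo, lat_hi, lon_lo, lon_hi = bounds(code)
    lat_c, lon_c = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
    dlat, dlon = lat_hi - lat_lo, lon_hi - lon_lo
    cells = set()
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            lat = lat_c + i * dlat
            if not -90.0 <= lat <= 90.0:
                continue  # no block beyond the poles
            lon = (lon_c + j * dlon + 180.0) % 360.0 - 180.0  # wrap at +/-180
            cells.add(encode(lat, lon, len(code)))
    return cells


def build_index(tasks, length=4):
    """Group task ids by geohash once; tasks: iterable of (task_id, lat, lon)."""
    index = defaultdict(list)
    for task_id, lat, lon in tasks:
        index[encode(lat, lon, length)].append(task_id)
    return index


def preselect(index, worker_lat, worker_lon, length=4):
    """Candidate task ids in the worker's block and its neighbors."""
    home = encode(worker_lat, worker_lon, length)
    return [tid for cell in sorted(block_and_neighbors(home))
            for tid in index.get(cell, [])]
```

Building the index costs one encoding per task, after which each worker touches at most nine buckets; the distance test then runs only on the returned candidates.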

5. Experiment Evaluation

In this section, extensive experiments are conducted to evaluate the performance of the proposed approach.

5.1. Metrics

In the experiments, we measure the performance of each approach according to the following metrics:
(1) Average travel cost: This metric is equal to the average travel distance of the matched worker-task pairs.
(2) Average reward: This metric measures the average reward obtained by workers.
(3) Average delay: This metric measures the average waiting time of a task before being assigned.
(4) Successful match: This metric measures the successful assignment and is defined as the total number of matched “worker-task” pairs.
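
As a rough illustration, the four metrics can be computed from a list of matched pairs as sketched below. The function name `evaluate`, the tuple layout, and the choice to average every metric over the matched worker-task pairs are assumptions made here.

```python
def evaluate(matches):
    """Summarize matched worker-task pairs.

    matches: list of (distance_km, reward, delay) tuples, one per matched pair.
    All averages are taken over matched pairs (an assumption of this sketch).
    """
    n = len(matches)
    if n == 0:
        return {"avg_travel_cost": 0.0, "avg_reward": 0.0,
                "avg_delay": 0.0, "successful_matches": 0}
    return {
        "avg_travel_cost": sum(m[0] for m in matches) / n,
        "avg_reward": sum(m[1] for m in matches) / n,
        "avg_delay": sum(m[2] for m in matches) / n,
        "successful_matches": n,
    }
```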

5.2. Datasets

The synthetic data are obtained from Gowalla in October 2010. This dataset includes 151849 tasks and 18203 workers located in the United States. There are four attributes of this dataset, namely, Task ID, Task’s Location, Worker ID, and Worker’s Location. The radius of the worker is set to 30 km. The other attributes are generated from a uniform distribution over a range. Their exact values are not essential because we focus on the following questions under a given radius range of a worker: (1) whether optimizing the desired objectives of both sides is superior to optimizing one side and (2) whether the task assignment method works efficiently on a large-scale dataset.

Moreover, since the acceptable distance of a worker is usually within a few tens of kilometers, the geohash length is set as 4 (Table 4).

5.3. Experimental Results

In order to evaluate the three algorithms, the experiments assume that a large number of workers should be assigned tasks. The workers are divided into 50 groups in order of arrival time, and the groups are processed concurrently by multiple processes.

As compared to TAW, the task assignment approaches that consider the task delay and the worker’s trade-off between reward and travel cost simultaneously, that is, DSTA-G and DSTA-GH, significantly reduce the waiting time of tasks (Figure 3) and the travel cost of workers (Figure 4). Moreover, the reward obtained by the workers also increases (Figure 5).

Next, as shown in Figure 6, the number of successful matches of each algorithm at different radii is evaluated. Each label on the X-coordinate indicates the radius R. For example, R10 indicates the radius R = 10 km. The Y-coordinate is the number of matches. In general, the number of matches increases with the radius. The number of successful matches under TAW is the same as that under DSTA-G, because both search the whole dataset for suitable tasks for the worker. Compared with the DSTA-G algorithm, the number of matches of the DSTA-GH algorithm grows more slowly. This is because the DSTA-GH algorithm preliminarily limits the scope of task allocation through Geohash clustering. Specifically, since the length of the Geohash encoding is 4, the search scope is limited to the worker’s block and its 8 neighbors, each of which is about 36.25 km wide and 19.55 km high. Therefore, when the radius R is greater than 20 km, the number of successful matches of the DSTA-GH algorithm still increases with R, but the growth is slower than that of the DSTA-G algorithm.

The CPU cost of the total assignments of 50 groups is shown in Figure 7. DSTA-G takes slightly more time as compared to TAW since DSTA-G considers the extra delay of a task. As compared to DSTA-G, DSTA-GH reduces the computation time by about 99.97%, which shows that the improved DSTA-GH works well on large-scale datasets.

6. Discussions

In this section, we discuss the advantages and limitations of the proposed approach:
(1) Effective and Efficient. The proposed DSTA-G and DSTA-GH algorithms optimize the objectives of both the requester and the worker. These algorithms have a linear computational complexity.
(2) Scalability. By exploiting the spatial index to reduce the search range, the proposed DSTA-GH algorithm significantly improves the efficiency of indexing records, which enables it to work efficiently on large-scale datasets.
(3) Limitation. The area of a geohash block varies greatly even if the length of the geohash string increases or decreases by only 1. Therefore, the exact length of the geohash string is difficult to determine. However, the range of a block labeled by a geohash string of a given length can be computed. One possible approach to determining the candidate blocks is to choose a string of suitable length, determine the block of a given location, and then extend the blocks to the surrounding neighbors. The suitable level and direction of the extension need further analysis.

7. Conclusions

In this work, we focus on the task assignment approach by optimizing the objectives of both the task requester and the worker. In addition, considering the inefficiency of various spatial task assignment approaches on large-scale datasets, we introduced the geospatial index technique and presented a new solution to improve the efficiency of the task assignment algorithm by improving the efficiency of indexing records.

Data Availability

The synthetic data are obtained from Gowalla in October 2010. This dataset includes 151849 tasks and 18203 workers located in the United States. There are four attributes of this dataset: Task ID, Task’s Location, Worker ID, and Worker’s Location. The radius of the worker is set to 30 km. The other attributes are generated from a uniform distribution over a range. Their exact values are not essential because we focus on the following questions under a given radius range of a worker: (1) whether optimizing the desired objectives of both sides is superior to optimizing one side and (2) whether the task assignment method works efficiently on a large-scale dataset.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was funded by the Guangxi Key Laboratory of Trusted Software (Grant no. kx201727), the Natural Science Foundation of China (Grant nos. 62066010, 61966009, and U1811264), and the Natural Science Foundation of Guangxi Province (Grant nos. 2020GXNSFAA159055, 2019GXNSFBA245049, and 2018GXNSFDA281045).