Fairness plays a vital role in crowd computing because it attracts and retains workers. The power of crowd computing stems from the large number of workers potentially available to provide high quality of service and reduce costs. An important challenge in today's crowdsourcing market is the task allocation of crowdsourcing workflows. Requester-centric task allocation algorithms aim to maximize the completion quality of the entire workflow and minimize its total cost, which can be discriminatory toward workers. A crowdsourcing workflow therefore needs to balance two objectives, namely, fairness and cost. In this study, we propose an alternative greedy approach with four heuristic strategies to address this issue. In particular, the proposed approach monitors the current status of workflow execution and uses heuristic strategies to adjust the parameters of task allocation. We design a two-phase allocation model to accurately match tasks with workers. The F-Aware allocates each task to the worker that maximizes fairness and minimizes cost. We conduct extensive experiments to quantitatively evaluate the proposed algorithms in terms of running time, fairness, and cost by using a custom objective function on the WorkflowSim, a well-known cloud simulation tool. Experimental results based on real-world workflows show that the F-Aware outperforms the other solutions in finding the tradeoff between fairness and cost and is 1% better than the best competitor algorithm.

1. Introduction

In China, crowdsourcing has made rapid progress in various fields in the past years. Zhubajie (http://www.zbj.com) has established itself as a crowdsourcing leader with more than 22 million active workers. This company covers a range of online and offline services, including tutoring and logo and product design. Didi Chuxing (DiDi) is another representative example. DiDi is China’s leading mobile transportation platform and provides a full range of application-based transportation services (including taxi; express, premier, and deluxe rides; bus; designated driving; enterprise solutions; bike and e-bike sharing; automobile solutions; and food delivery) for numerous people. Tens of millions of drivers find flexible work opportunities on the DiDi platform, which provides more than 10 billion passenger trips a year. Crowdsourcing has become a fast, convenient, and cost-effective mode of research and production for obtaining flexible and cheap resources. Various organizations flexibly outsource work, such as collaborative sensing [1, 2] and human-powered online security [3, 4], to a global pool of workers on a temporary basis. Therefore, addressing fairness in crowdsourcing systems is a relevant modern issue in ethics.

In crowdsourcing systems, tasks are posted to a crowdsourcing platform, and these tasks are solved by a large group of workers registered at the platform. The Amazon Mechanical Turk (MTurk) [5] is a successful crowdsourcing system that enables individuals or companies to harness collective intelligence from a global workforce to accomplish various tasks, such as human intelligence tasks (HITs). Employers (known as requesters) recruit employees (known as workers) to execute HITs and reward the employees for their labor.

Fairness is important for requesters because workers are more likely to engage with HITs published by fair requesters. Fairness is also a key point for crowdsourcing platforms because workers’ satisfaction is negatively correlated with turnover, and fairness has a positive effect on workers’ job satisfaction. Workers’ expectations of a platform’s fairness affect the likelihood of their participation even more than considerations of self-interest do [6, 7]. For example, workers expect to be allocated a fair number of tasks, proportional to their availability for matching tasks. Fairness is an essential concept for sustaining a powerful crowd with substantial worker participation. Therefore, fairness should be considered when allocating tasks to workers.

Solving complex tasks by using crowdsourcing platforms has emerged in recent years. A collaborative crowdsourcing model is an alternative approach for solving complex tasks on crowdsourcing platforms. In this model, a complex task is generally decomposed into a series of interrelated subtasks, which are organized as a workflow. However, in this scenario, task design and allocation remain challenging. We have proposed a general method that enables requesters and workers to work collaboratively to improve the quality of task design; a related paper has been accepted by a journal. This study focuses on fairness-aware task allocation in a collaborative crowdsourcing model.

Traditionally, the optimization of the task allocation problem has focused on cost instead of fairness. More challenges arise when researching this problem for crowdsourcing workflows than for simple HITs. The first is that current crowdsourcing platforms, such as Zhubajie and MTurk, cannot directly execute a workflow because a workflow describes a logical structure across tasks. Moreover, the maximal fairness of the whole workflow, instead of one simple task, needs to be considered. Thus, research on fairness-aware task allocation in such scenarios is limited.

We propose an alternative fairness-aware task allocation approach to complete the workflows with minimal cost and maximal fairness before the deadline to bridge the gap. In particular, the proposed approach aims to monitor the current status of all tasks that are not allocated to workers during the workflow execution and dynamically update the parameter values of allocation algorithms.

This study investigates a practical and important problem, that is, dynamic fairness-aware task allocation, to maximize fairness and minimize cost in crowdsourcing workflows. The principal contributions of this work are summarized as follows:

(i) The fairness-aware task allocation problem in crowdsourcing workflows is formulated as a constraint optimization problem based on the customized fairness. A generic two-phase task allocation model inspired by [8] is designed to cover various crowdsourcing workflow scenarios based on the new fairness criterion.

(ii) Four heuristic strategies are proposed to solve the current problem. A well-designed fairness-aware solution called the F-Aware is introduced to minimize the overall cost and target maximum fairness.

(iii) Performance is assessed using the workflow simulation tool named the WorkflowSim with different workflow benchmarks. Results show that the F-Aware outperforms other solutions in finding the tradeoff between fairness and cost. The F-Aware performs 1% better than the best competitor algorithm, Max_Fairness.

The motivation of the research is to find the trade-off between two objectives, that is, minimal cost and maximal fairness, while allocating crowd tasks to workers. Furthermore, all tasks must be completed before the deadline. The ideas presented are beneficial to solve the problem of task allocation in the crowd context based on the workflow. The methods presented can promote the prosperity of the crowd platform. In addition, the results presented can be applied to other task allocation problems, in which the distributor wants to enforce the fairness among participants.

The remaining parts of this paper are organized as follows. The related work is explored in Section 2. The task allocation problem in the crowdsourcing workflow scenario is formulated in Section 3. The two-phase task allocation model and the F-Aware algorithm are explained in detail in Section 4. The experimental evaluation is presented in Section 5. The conclusion is provided in Section 6.

2. Related Work

The combination of humans and computers to accomplish tasks that neither can do alone has attracted considerable attention from academic and industrial circles [9]. This idea dates back to the 1960s with the publication of “Man–Computer Symbiosis” by Licklider [10]. Tim Berners-Lee proposed the concept of a social machine in 2009 and regarded the cooperation between machines and humans as the next direction of web application development [11]. The term “crowdsourcing” was coined by Jeff Howe in 2006 [12]. The MTurk is a pioneering crowdsourcing system and has been successfully used to solve multiple simple tasks. Solving complex tasks by leveraging crowdsourcing systems remains challenging [13]. An alternative approach is to bring the workflow into the solution of crowdsourcing complex tasks. In particular, a complex task is decomposed into a series of interdependent subtasks that are relatively easy to solve. Workflows are used to express the logical structures across subtasks.

Promoting fair treatment is of utmost importance for effective and capable crowdsourcing systems. If workers believe that the environment that they are working within is fair, we can expect an improvement in the quality of workers’ answers, which is beneficial for requesters, and an increase in retaining and recruiting additional workers, which is beneficial for crowdsourcing platforms [14]. However, existing efforts give much attention to other objectives instead of fairness especially in crowdsourcing workflows.

For the design of crowdsourcing workflows, Kittur et al. proposed a solution to decompose a complex task on the basis of the MapReduce mechanisms [15]. In their method, task designers must specify the execution sequence of subtasks. Little et al. explored an iterative workflow paradigm for solving complex tasks, including image description, copyediting, handwriting recognition, and sorting [16, 17], and improved the quality of results by using an iterative algorithm in which the number of iterations is determined by the budget. However, requesters are required to divide each task by hard coding before the task is posted on a third-party crowdsourcing system, such as MTurk. Dai et al. improved the iterative workflow model from the aspect of workflow control [18]. Their model can autonomously control workflows without human intervention and yields good results. Lin et al. proposed the idea of multiple workflows based on a probabilistic graphical model and dynamically switched across these workflows [19]. Their experiments demonstrated good results for named-entity recognition. Bernstein et al. introduced a novel idea of multiple-phase workflows and designed a find–fix–verify crowd programming pattern, which splits tasks into a series of generation and review stages for complex crowdsourcing writing [20]. Kulkarni et al. developed the price-divide-solve (PDS) algorithm to guide workers by converting large and complex tasks into microtasks that are appropriate for crowd markets. Zheng et al. proposed a general workflow technology by using a state machine based on recursive decomposition approaches, wherein varying types of crowdsourcing applications can be developed, to meet the requirement of solving diverse tasks [21]. Xiong et al. extended Zheng’s work by proposing a workflow framework called the SmartCrowd for complex crowdsourcing tasks [22]. Wu et al. presented the Service4Crowd, a highly flexible and extensible process management platform for crowdsourcing based on service-oriented architectures [23]. They indicated that the platform can provide a one-stop solution for requesters. Inspired by the above developments, the authors of this study proposed an alternative method that engages workers in task design to enable requesters to achieve a high-quality design of complex tasks.

The design and the allocation of tasks have achieved remarkable progress but remain open issues in crowdsourcing workflow scenarios. Task allocation aims to maximize task quality and minimize cost by allocating tasks to appropriate workers [24]. A task allocation method is crucial for allocating tasks to the most appropriate workers. Vaughan et al. explored the problem of allocating heterogeneous tasks to workers with different and unknown skill sets in crowdsourcing markets [25]. Khazankin et al. proposed solutions by extending standards, such as the web service-level agreement, to ensure quality between crowd consumers and the crowdsourcing platform and provided a skill-based crowd scheduling algorithm on the basis of negotiated agreements [26]. Karger et al. considered a general model of crowdsourcing tasks to minimize the total price [27]. Boutsis et al. explored the most efficient allocation of tasks to workers to achieve their successful completion under real-time constraints [28, 29]. Unlike with simple tasks, ensuring the quality of complex tasks requires ensuring the quality of the entire workflow. Several studies have been conducted to determine when and how to publish tasks on the crowdsourcing platform so as to complete workflows with minimum cost and without missing deadlines. Khazankin et al. formulated the problem as a time-constrained optimization problem [30]. Tang et al. adopted heuristic strategies to improve Khazankin et al.’s work [31].

Fairness in crowdsourcing has been considered in providing fair wages and detecting malicious workers. Franke et al. first highlighted the importance of fairness in the context of crowdsourcing and demonstrated that workers’ expectations of fairness have a strong effect on their decision to participate in solving tasks [7]. Faullant et al. supported this argument by further exploring how different types of fairness, that is, distributive fairness (a fair amount and distribution of the offered reward) and procedural fairness (fair procedures to determine the winners), affect workers’ intentions to participate in future crowdsourcing tasks and their loyalty toward the platform [14]. Recently, researchers have provided incentive strategies to improve workers’ perceived fairness [32, 33]. McInnis et al. have viewed wage discrimination as the wrongful rejection of work, an unfair compensation amount, or delayed payment [34]. Studies that address malicious workers through task assignment and worker reputation have focused on the quality, reliability, and total cost of workers’ contributions [35]. These schemes are requester-centric and do not guarantee fair task allocation to workers. The problems in some particular crowdsourcing fields have been discussed, but solutions for fair task allocation in crowdsourcing workflows remain very limited [36].

The abovementioned studies show that research on task allocation that considers the characteristics of workers has attracted increasing attention. The results in the existing literature provide references and guidelines for the current research on the fair task allocation of crowdsourcing workflows.

3. Problem Formulation

This study considers a crowdsourcing workflow system. Employers (i.e., requesters) recruit employees (i.e., workers or machines) to complete specific tasks and provide them with varying values of wages (i.e., reward). In this section, the fairness-aware task allocation architecture is designed for crowdsourcing workflows, and some preliminary definitions are presented.

3.1. Crowdsourcing Workflow System

A complex task, which comes from a requester, is designed as a workflow that consists of interdependent subtasks. The focus is to distribute these subtasks to workers or machines while considering cost and fairness. In other words, an allocation strategy is designed and proposed. Figure 1 illustrates the basic structure of the system, in which fairness is considered a major factor in picking workers or machines. In this scenario, a complex task submitted by a requester is first converted into a workflow. Then, the crowdsourcing platform identifies the nominees for each subtask to be allocated. Nominees who volunteer to accept the subtask are confirmed as candidates. Finally, the platform allocates the subtask to a candidate in accordance with the allocation strategy. The goal of this study is to optimize this allocation strategy.

The basic definitions are as follows.

Definition 1. Requesters are people who submit complex tasks to the crowdsourcing platform. The set of requesters is represented as $R = \{r_1, r_2, \ldots, r_{|R|}\}$, where $r_i$ denotes the i-th requester.

Definition 2. Workers are people who complete tasks and earn money. The set of workers is represented as $W = \{w_1, w_2, \ldots, w_{|W|}\}$, where $w_i$ denotes the i-th worker. Each worker $w_i$ is associated with a set of attributes $\langle an_i, cn_i, ap_i, sv_i, lr_i \rangle$. $an_i$ is the number of tasks allocated to worker $w_i$. $cn_i$ is the number of accepted tasks. $ap_i$ is the probability of accepting a task. $sv_i = (sv_{i1}, \ldots, sv_{ik})$ is a skill vector, where $sv_{ij}$ is a Boolean value that identifies whether the worker has the j-th skill, and $lr_i$ is the lowest reward for completing a task.

Definition 3. Machines are a group of computing nodes that consist of a series of computing services. The set of machines is represented as $M = \{m_1, m_2, \ldots, m_{|M|}\}$, where $m_i$ denotes the i-th machine. Each machine $m_i$ is associated with a set of attributes $\langle am_i, cm_i, mp_i, cf_i, mlr_i \rangle$. $am_i$ is the number of tasks allocated to machine $m_i$. $cm_i$ is the number of accepted tasks. $mp_i$ is the probability of accepting a task. $cf_i = (cf_{i1}, \ldots, cf_{ik})$ is a configuration vector, where $cf_{ij}$ is a Boolean value that identifies whether the machine has the j-th configuration, and $mlr_i$ is the lowest reward for completing a task.

Definition 4. Tasks can be defined as a set $T = \{t_1, t_2, \ldots, t_{|T|}\}$, where $t_i$ denotes the i-th task. Each task $t_i$ is associated with a set of attributes $\langle type_i, btype_i, st_i, rt_i, dt_i, et_i, reward_i, cv_i \rangle$. $type_i$ is the type of task used to classify a task as either machine or human (0 or 1, respectively). $btype_i$ is the business type of task $t_i$, such as UI design or programming. $st_i$ is the start time of task $t_i$. $rt_i$ is the receiving time of task $t_i$. $dt_i$ is the time required to finish task $t_i$. $et_i$ is the end time of task $t_i$. $st_i$ and $et_i$ change dynamically with the progress of the workflow. $reward_i$ tells the worker the amount of reward he will receive after successfully completing the task on time. Initially, $reward_i$ is not assigned a value until available workers or machines are found. $cv_i$ is the condition vector that must be satisfied to complete task $t_i$.

Definition 5. Fairness can be defined as an indicator of the degree to which two similar workers or machines have a similar probability of obtaining the same task. Two different workers or machines are similar if they have similar availabilities; in other words, their parameters are similar in the given context.

3.2. Workflow Model

The workflow can be represented by a directed graph that indicates the dependency of data and the order of execution across tasks. Therefore, the workflow can be defined as a quaternion $G = (V, E, st, et)$, where $V$ denotes the collection of task nodes and $|V|$ corresponds to the number of vertices in graph $G$. Each node in the graph represents a human or machine task, which is the smallest allocation unit. A directional edge $(t_i, t_j) \in E$ in the graph denotes the order of execution between tasks $t_i$ and $t_j$: $t_j$ cannot be executed until $t_i$ is completed. Therefore, task $t_i$ is called the predecessor of task $t_j$, and task $t_j$ is the successor of task $t_i$. In this study, $pre(t_i)$ denotes the predecessor set of task $t_i$, and $suc(t_i)$ denotes the successor set of task $t_i$. The start time is denoted as $st$, and the end time is denoted as $et$.
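The graph structure described above can be sketched as a small Python class; the names `Workflow`, `succ`, and `pred` are illustrative and not taken from the paper:

```python
from collections import defaultdict

class Workflow:
    """A workflow as a DAG: tasks are nodes, directed edges give execution order."""
    def __init__(self):
        self.succ = defaultdict(list)   # task -> list of successor tasks
        self.pred = defaultdict(list)   # task -> list of predecessor tasks
        self.tasks = set()

    def add_edge(self, t_i, t_j):
        # t_j cannot start until its predecessor t_i is completed
        self.tasks.update((t_i, t_j))
        self.succ[t_i].append(t_j)
        self.pred[t_j].append(t_i)

# A small diamond-shaped workflow: t1 fans out to t2 and t3, which join at t4.
wf = Workflow()
wf.add_edge("t1", "t2")
wf.add_edge("t1", "t3")
wf.add_edge("t2", "t4")
wf.add_edge("t3", "t4")
```

With this representation, `wf.pred` and `wf.succ` play the roles of the predecessor and successor sets of each task.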

3.3. Measurement of Fairness

In this context, fairness, as given in Definition 5, is the probability that two similar workers or machines accomplish the same task. Thus, two factors, namely, the number of allocated tasks ($an_i$) and the number of accepted tasks ($cn_i$), interpreted in Definitions 2 and 3, are considered for the measurement. Each worker or machine is associated with a local allocation ratio, $lar^w_i$ or $lar^m_i$, respectively. A worker's and a machine's local allocation ratios can be calculated using the two following equations, respectively:

$lar^w_i = an_i / cn_i$,  (1)

$lar^m_i = am_i / cm_i$.  (2)

The fairness of the allocation strategy can be defined as the proximity of the local allocation ratios of similar workers or machines. Accordingly, the fairness of each participant while crowdsourcing a workflow can be calculated using the two following equations:

$f^w_i = 1 - \frac{1}{|S_i|} \sum_{w_j \in S_i} |lar^w_i - lar^w_j|$,  (3)

$f^m_i = 1 - \frac{1}{|S_i|} \sum_{m_j \in S_i} |lar^m_i - lar^m_j|$,  (4)

where $|S_i|$ is the number of similar workers or machines of the i-th participant.
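Under one reading of the definitions above, in which the local allocation ratio is the share of a participant's accepted (volunteered-for) tasks that were actually allocated to them, the measurement can be sketched as follows; the function names and the exact ratio formula are assumptions:

```python
def local_allocation_ratio(allocated, accepted):
    """Share of the tasks a participant accepted that were actually
    allocated to them; 0 if they accepted none."""
    return allocated / accepted if accepted > 0 else 0.0

def participant_fairness(ratio, similar_ratios):
    """Fairness of one participant: proximity of their local allocation
    ratio to those of similar participants (1.0 = identical ratios)."""
    if not similar_ratios:
        return 1.0
    gap = sum(abs(ratio - r) for r in similar_ratios) / len(similar_ratios)
    return 1.0 - gap

# Two similar workers with identical ratios are treated perfectly fairly.
r1 = local_allocation_ratio(allocated=4, accepted=8)   # 0.5
r2 = local_allocation_ratio(allocated=3, accepted=6)   # 0.5
f = participant_fairness(r1, [r2])                     # 1.0
```

The larger the spread of ratios within a similarity group, the lower the fairness value of each member.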

3.4. Formalization

In collaborative crowdsourcing scenarios, the procedure of fairness-aware task allocation can be formally described as follows. Given a workflow $G$, a set of workers $W$, and a set of machines $M$, the tasks in the workflow are allocated to workers or machines so as to maximize the overall fairness of the allocation strategy and minimize the total cost under the constraint of completing the whole workflow before the deadline.

Overall fairness: let $F$ be the overall fairness, which can be expressed as

$F = \frac{1}{|W| + |M|} \left( \sum_{i=1}^{|W|} f^w_i + \sum_{j=1}^{|M|} f^m_j \right)$,  (5)

where $f^w_i$ and $f^m_j$ denote the fairness of the i-th worker and the j-th machine, respectively.

Total cost: let $C$ be the total cost, which can be expressed as

$C = \sum_{t_i \in V} reward_i$,  (6)

where $reward_i$ is the reward paid for task $t_i$.

On the basis of equations (5) and (6), we formulate the fundamental research problem of the fair allocation of the workflow among workers and machines as an objective function.

Maximize

$F$ and minimize $C$,  (7)

subject to

$et \le D$,  (8)

where $et$ is the end time of the whole workflow and $D$ denotes its deadline.
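The aggregate quantities of this formulation (overall fairness as a mean over participants, total cost as a sum of rewards, and the deadline constraint) can be sketched as plain aggregations; the function names are illustrative:

```python
def overall_fairness(participant_fairness_values):
    """Mean fairness across all workers and machines."""
    vals = list(participant_fairness_values)
    return sum(vals) / len(vals) if vals else 0.0

def total_cost(rewards):
    """Sum of the rewards paid for every task in the workflow."""
    return sum(rewards)

def feasible(completed_time, deadline):
    """Deadline constraint: the whole workflow must finish in time."""
    return completed_time <= deadline

F = overall_fairness([0.9, 0.8, 1.0])
C = total_cost([0.3, 0.2, 0.4])
ok = feasible(completed_time=20, deadline=24)
```

An allocation is acceptable only if `feasible` holds; among feasible allocations, the search prefers high `F` and low `C`.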

4. Allocation Model

In this section, the two-phase allocation model is described in detail. In this case, a requester, whose responsibility is given in Definition 1, submits a complex task with some constraint conditions. The task is then converted into a workflow. Some of the task parameters interpreted in Definition 4 are estimated from the log of completed tasks; the other parameters change with the completion progress of the workflow. The platform has knowledge of the workers' and machines' parameters because these parameters are registered on the platform in advance.

First, to allocate a task, the appropriate set of workers and machines needs to be found in accordance with the task's constraint conditions. These workers and machines are called nominees. A worker is nominated for the task if the worker's conditions, task accept ratio, and available vector satisfy the task's constraints; the same holds for a nominated machine. In other words, the platform searches for participants among the given sets of workers and machines under the constraint conditions. Then, similar nominees are grouped together, and the task's reward is assigned a value determined by the distribution of the different groups' rewards. Candidates are found by progressively multicasting the task to similar nominees in batches until at least one candidate is found. Another parameter is determined by the number of members of the two similar groups.
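The nominee search and grouping of this first phase can be sketched as follows, assuming a worker is represented as a dictionary with `id`, `skills`, and `accept_prob` fields; the field names and the identical-skill-vector notion of similarity are illustrative assumptions:

```python
def nominees(workers, required_skills, min_accept_prob=0.0):
    """Phase 1a: workers whose skill vector covers the task's required
    skills and whose accept probability is high enough."""
    out = []
    for w in workers:
        has_skills = all(w["skills"][i] for i in required_skills)
        if has_skills and w["accept_prob"] >= min_accept_prob:
            out.append(w)
    return out

def group_by_similarity(participants, key=lambda w: tuple(w["skills"])):
    """Phase 1b: similar nominees (here: identical skill vectors) share a group."""
    groups = {}
    for w in participants:
        groups.setdefault(key(w), []).append(w)
    return list(groups.values())

workers = [
    {"id": "w1", "skills": [1, 1, 0], "accept_prob": 0.9},
    {"id": "w2", "skills": [1, 1, 0], "accept_prob": 0.4},
    {"id": "w3", "skills": [1, 0, 0], "accept_prob": 0.8},
]
noms = nominees(workers, required_skills=[0, 1], min_accept_prob=0.5)
groups = group_by_similarity(workers)
```

The task would then be multicast to the nominee groups in batches until at least one member volunteers and becomes a candidate.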

Second, one worker or one machine is picked from the candidates. The main goal of this step is to ensure that the workflow can be completed before its deadline while minimizing the total cost and maximizing the overall fairness. Thus, the longest path of the workflow is calculated first to satisfy the deadline. Let $P$ denote the set of all paths in the workflow from the start node to the end node, and let $p_{max}$ denote the longest path in $P$. Two virtual nodes, $v_s$ and $v_e$, are added to the path for convenience and represent the start and the end node, respectively. The discovery process of the longest path is described using the pseudocode in Algorithm 1. In this context, Algorithm 1 is used to check $p_{max}$ against the deadline.

Input: given workflow G
Output: the longest path p_max
(1) Let p_max = {v_s}
(2) Let current_node = v_s
(3) While current_node != v_e Do
(4)  Let S = suc(current_node)
(5)  Let longest_time = 0
(6)  Let next_node be null
(7)  For i = 1 to |S| Do
(8)   If dt(S[i]) > longest_time Then
(9)    longest_time = dt(S[i])
(10)   next_node = S[i]
(11)  End If
(12) End For
(13) Add next_node to p_max
(14) Let current_node = next_node
(15) End While
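The greedy walk of Algorithm 1 can be transliterated into Python roughly as follows, assuming each node records its required completion time (`dt`) and its successor list (`succ`); the fallback for a zero-duration end node is an addition for robustness, not part of the pseudocode:

```python
def find_longest_path(succ, dt, start, end):
    """Greedy transliteration of Algorithm 1: from each node, step to the
    successor with the largest completion time until the end node."""
    path = [start]
    current = start
    while current != end:
        next_node, longest_time = None, 0
        for s in succ[current]:
            if dt.get(s, 0) > longest_time:
                longest_time = dt.get(s, 0)
                next_node = s
        if next_node is None:
            # all successors have zero duration (e.g., the virtual end node)
            next_node = succ[current][0]
        path.append(next_node)
        current = next_node
    return path

succ = {"vs": ["t1", "t2"], "t1": ["ve"], "t2": ["ve"], "ve": []}
dt = {"t1": 5, "t2": 3, "ve": 0}
path = find_longest_path(succ, dt, "vs", "ve")   # ['vs', 't1', 've']
```

Note that this greedy step-by-step choice follows the pseudocode as written; it picks the locally longest successor at each node rather than exhaustively comparing all start-to-end paths.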

An off-the-shelf constraint solver can always be found to solve the current problem. However, high computational complexity is inevitable when using such solvers. Thus, heuristic strategies are used in this study instead of constraint solvers. The simplest allocation strategy is random selection, which is also used as a baseline in the following experimental evaluation. Picking the candidate that maximizes local fairness is beneficial to maximizing the overall fairness. However, this strategy may require an additional budget. Similarly, low-cost strategies may result in unfairness to workers.

A fairness-aware allocation algorithm called the F-Aware, described in Algorithm 2, is designed to cope with this issue. The algorithm greedily allocates each task to the most desirable participant. The details are as follows. For each task in the workflow, some parameters are manually initialized in accordance with the characteristics of the task, and others are estimated from the log of completed tasks. The F-Aware first calls Algorithm 1 to obtain the longest path and checks whether the workflow can be completed before the deadline. Second, the algorithm finds the successor nodes of the start node, searches nominees for each node, and confirms the corresponding candidates. Third, the algorithm sorts the candidates in decreasing order of their worker or machine fairness indicator. Finally, the current task is allocated to the top candidate. The parameters of the remaining tasks are updated after the task is finished. The process continues until all nodes are visited.

Input: given workflow G
Output: {totalCost, overallFairness, completedTime}
(1) For each task in G, initialize its parameters
(2) While |V| > 0 Do
(3)  Get p_max by calling findLongestPath(G)
(4)  If p_max can be completed before the deadline Then
(5)   Let current_node = v_s
(6)   Let S = suc(current_node)
(7)   For i = 1 to |S| Do
(8)    Get the list of nominees for S[i]
(9)    Group nominees by similarity
(10)   Set the reward for each group
(11)   Get candidates by multicasting to nominees by group
(12)   Calculate the overallFairness
(13)   Calculate the fairness indicator for all candidates
(14)   Sort the candidate list by the indicator
(15)   Select the top candidate for S[i]
(16)   Remove S[i] from V
(17)   totalCost += the reward of S[i]
(18)   completedTime = the end time of S[i]
(19)  End For
(20)  Update parameters of remaining tasks after finishing S
(21)  End If
(22) End While
(23) Return {totalCost, overallFairness, completedTime}
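The candidate-ranking and selection step of Algorithm 2 can be sketched as follows; the combined key, which prefers an under-allocated participant first and a cheaper one second, is an illustrative assumption rather than the paper's exact indicator:

```python
def pick_candidate(candidates):
    """Sketch of the ranking/selection step: favor the candidate whose
    local allocation ratio is lowest (they have received less than their
    fair share so far), breaking ties by the lowest asked reward."""
    def key(c):
        ratio = c["allocated"] / c["accepted"] if c["accepted"] else 0.0
        return (ratio, c["reward"])   # fairer first, then cheaper
    return min(candidates, key=key)

cands = [
    {"id": "w1", "allocated": 5, "accepted": 5, "reward": 0.2},
    {"id": "w2", "allocated": 1, "accepted": 5, "reward": 0.3},
]
best = pick_candidate(cands)   # w2: far below its fair share of allocations
```

Allocating to `w2` nudges the local allocation ratios of similar workers closer together, which is exactly what raises the fairness measure.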

5. Performance Evaluation

In this section, we first analyze the complexity of the allocation algorithm by using Big O notation. Then, we evaluate the solutions through experiments on the basis of real-world workflows. An objective function combining all optimization goals is given as equation (9) to compare the various solutions easily and effectively:

$O = F \cdot e^{-\lambda C}$.  (9)

The objective function has the same tendency as $F$ because the goal is to maximize $F$. However, the exponential part decreases with the total cost $C$. The parameter $\lambda$ determines the importance of the total cost $C$, where $\lambda \ge 0$. Only the fairness objective is considered when $\lambda$ is assigned 0. Increasing attention is paid to the total cost as this parameter increases.
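Under the form suggested by this description (fairness discounted exponentially by cost), the objective can be sketched as follows; the exact functional form is an assumption reconstructed from the text:

```python
import math

def objective(F, C, lam):
    """Sketch of the combined objective: grows with overall fairness F and
    decays exponentially with total cost C; lam = 0 ignores cost entirely."""
    return F * math.exp(-lam * C)

# lam = 0 reduces the objective to fairness alone.
fairness_only = objective(0.9, 10.0, 0.0)
```

For a fixed fairness value, a cheaper allocation always scores at least as well, and the penalty for cost grows with `lam`.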

Two sets of experiments are conducted. In the first set, the same dataset at different sizes is used to compare the F-Aware with three competitor algorithms in terms of runtime, fairness, cost, and the combination of fairness and cost. Random selection is considered the baseline. The strategy that most benefits the fairness objective is the second competitor; the intuition behind comparing against it is to check whether a reasonable cost can still be reached. The last strategy most benefits the total-cost objective; the intuition behind comparing against it is to check whether an economic yet fair allocation can be found. The second set of experiments evaluates performance on the basis of different datasets with the same scale.

5.1. Algorithm Complexity

Recall that the allocation algorithm, that is, Algorithm 2, consists of a loop that includes Algorithm 1 and a nested loop. Let $n = |V|$ be the number of tasks; the complexity of the outer loop is then $O(n)$ in the worst case. For Algorithm 1, the complexity is $O(n)$ in the worst case. Thus, the complexity of Algorithm 2 is $O(n^2)$.

5.2. Experimental Setup

Setting up experimental environments for the crowdsourcing workflow is expensive and difficult with limited resources. Thus, the simulation tool, WorkflowSim, is utilized to achieve our goals. The WorkflowSim is an extension based on the simulation tool, CloudSim [37]. This tool is used to simulate workflow management and scheduling in a dynamic cloud environment. The WorkflowSim has better accuracy and wider support than existing solutions in terms of supporting DAG and simulating scientific workflows in distributed environments [38, 39].

The proposed algorithms are implemented on the basis of the WorkflowSim. The class CondorVM is extended to simulate a worker; the corresponding class is named Worker. A machine is also implemented on the basis of the class CondorVM, in the Machine class. The class Job is extended to simulate a crowd task in the class CrowdTask. The class CrowdWorkflowParser, which reads real workflow datasets from external files to generate suitable crowd tasks, is rewritten. The package org.workflowsim.crowdsourcing.scheduling includes the above algorithms. The basic crowdsourcing framework in the context of crowdsourcing workflows is implemented in these classes, that is, CrowdWorkflowPlanner, CrowdWorkflowClusteringEngine, CrowdBasicClustering, CrowdReclusteringEngine, CrowdWorkflowEngine, CrowdWorkflowScheduler, and CrowdFailureGenerator. Readers can find the code at the following URL: https://github.com/hhluci/WorkflowSim-1.0.

Five real workflow applications from the Pegasus project are chosen for the two sets of experiments:

(i) CyberShake: this workflow is used by the Southern California Earthquake Center to classify earthquake alarms [40]

(ii) Epigenomics: this workflow uses DNA sequence lanes to generate multiple lanes of DNA sequences [41]

(iii) Montage: this workflow, created by NASA/IPAC, stitches together multiple input images to create custom mosaics of the sky [42]

(iv) Inspiral: this workflow is used to generate and analyze gravitational waveforms from the data collected during the coalescing of compact binary systems [43]

(v) Sipht: this workflow is used in bioinformatics to search for small nontranslated bacterial regulatory RNAs [44]

The mentioned datasets are processed to meet the current requirements after obtaining them from external storage.

5.3. Experimental Results

In the first set of experiments, the runtime, fairness, and cost are observed as a function of the workflow size. The experiments are performed on the CyberShake workflows with 30, 50, 100, and 1000 tasks. The lowest reward of a worker or a machine is assigned using a normal distribution whose mean is 0.3 cents and whose lower and upper limits are 0.1 and 0.6 cents, respectively [45]. The parameters in Definitions 2 and 3 are initialized as follows: the probabilities of accepting a task are initialized using a normal distribution whose mean is 0.5 and whose lower and upper limits are 0.0 and 1.0, respectively, and the skill and configuration vectors are randomly initialized as Boolean vectors with 10 elements. These workloads are scheduled on a heterogeneous cloud infrastructure with 14 physical hosts, 10 virtual machines, and 10 virtual workers, which are created using different types of virtual machines in terms of MIPS, RAM, and BW. The parameters in Definition 4 are initialized as follows: one timing parameter is assigned by a uniform distribution; another is initialized by a uniform distribution whose lower and upper limits are 0 and 24 h, respectively; a third parameter comes from the data file, and another is set to 1.0; the reward is determined by the member number of the similar group.

Figure 2 presents the related results. In the figures, the x-axis represents the number of tasks to allocate, whereas the y-axis represents different metrics; different series represent different allocation algorithms. Figure 2(a) plots the running time as a function of the number of tasks. As shown in the figure, the difference between the algorithms in terms of runtime is negligible on the small datasets. Figures 2(b) and 2(c), which plot the overall fairness and the total cost, respectively, as functions of the number of tasks, are discussed together because they are complementary. As shown in Figure 2(b), the F-Aware and the Max_Fairness are better than the Random and the Min_Cost, and their advantages are highlighted on the large dataset. The fairness of the F-Aware is slightly lower than that of the Max_Fairness: for 1000 tasks, the Max_Fairness reaches 0.98, whereas the F-Aware is 0.06 less than its competitor. However, the F-Aware greatly surpasses the Max_Fairness in cost, as shown in Figure 2(c). The Max_Fairness costs 376 cents to complete a workflow with 1000 tasks, whereas the F-Aware costs only 347 cents. By comparison, one hundred tasks can be completed for 29 cents given that the average cost of a task is about 0.3 cents. This finding exhibits the remarkable difference between the two algorithms on this metric.

Figure 2(d) presents the performance of the algorithms in terms of the objective, equation (9), as a function of fairness and cost. In the experiment, is set to 1.0 to observe the balanced outcome of fairness and cost. Notably, the objective function decreases as the task count increases. The reason is that the cost grows faster than the fairness with increasing task count, and this growing cost drives the objective function down. As shown in Figure 2(d), the F-Aware performs better than the Max_Fairness in terms of the objective, with a maximum gap of 1% between the two algorithms. The Random and the Min_Cost allocation algorithms show low balance.
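Equation (9) itself is not reproduced in this section. Purely as an illustration, a weighted objective of the hypothetical form below exhibits the qualitative behavior described above: a faster-growing cost term drags the objective down, and a cheaper allocation can edge out a slightly fairer one. The weight `lam`, the normalization constant `max_cost`, and the F-Aware fairness value 0.92 are assumptions, not values from the paper.

```python
def objective(fairness, cost, max_cost=400.0, lam=1.0):
    """Hypothetical fairness/cost tradeoff objective: reward fairness,
    penalize normalized cost. Illustrative only; equation (9) in the
    paper defines the actual form."""
    return fairness - lam * (cost / max_cost)

# Max_Fairness: fairness 0.98, cost 376 cents (reported values).
# F-Aware: fairness 0.92 (assumed, 0.06 less), cost 347 cents (reported).
f_aware = objective(0.92, 347)
max_fair = objective(0.98, 376)
# Under this weighting the cheaper F-Aware beats the fairer Max_Fairness.
```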

The Max_Fairness is the best competitor of the F-Aware when only fairness is considered, and only the Min_Cost can match the F-Aware when only the cost is considered. When both factors are considered, the F-Aware is the best option.

In the second set of experiments, the same metrics as in the first set are measured on different datasets of the same scale: the CyberShake_1000, the Epigenomics_997, the Inspiral_1000, the Montage_1000, and the Sipht_1000. Figure 3 presents the related results. Figure 3(a) shows the runtime on the different workflows. The differences between the algorithms are small; the F-Aware performs slightly better than the Max_Fairness, whereas the Min_Cost performs poorly in terms of runtime. Figure 3(b) illustrates the fairness on the different workflows. The F-Aware and the Max_Fairness are superior to the Min_Cost and the Random on all datasets, and the Max_Fairness performs approximately 6% better than the F-Aware. However, a requester may pay more by using the Max_Fairness than by using the F-Aware, as shown in Figure 3(c). Figure 3(d) confirms that the F-Aware can find the tradeoff between fairness and cost, increasing fairness while maintaining a low cost. In summary, the F-Aware is better than the other algorithms when fairness and cost are considered together.

6. Conclusion

Requester-centric task allocation algorithms are discriminatory for workers, and this study presents strategies to bridge this gap. Although the study is valuable for addressing crowdsourcing problems in general, the strategies presented here are directed at crowdsourcing workflows. A combined objective function is designed to maximize the overall fairness and minimize the total cost. In the two-phase allocation model, a set of nominees is first identified for each task on the basis of participant availability; candidates are then determined from the nominees by using a batch progressive strategy, with the batch size given by the size of the similar group. Once the candidates for a task are identified, the problem reduces to a multiobjective optimization problem. Four heuristic strategies are designed to solve this problem while avoiding its computational complexity. These strategies are tested on the WorkflowSim by using scientific workflow benchmarks. The evaluation shows that the F-Aware outperforms other solutions in finding the tradeoff between fairness and cost and performs better than its best competitor algorithm, the Max_Fairness.
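The two-phase model summarized above can be sketched as follows. All names, the threshold mechanism, and the scoring function here are illustrative assumptions; the paper's algorithms define the exact procedure.

```python
def two_phase_allocate(tasks, workers, available, score, group_size,
                       threshold=0.0):
    """Sketch of a two-phase allocation.
    Phase 1: nominees are the workers available for the task.
    Phase 2: candidates are examined batch by batch, with the batch size
    equal to the size of the similar group; the task goes to the first
    batch member whose combined fairness/cost score clears the threshold."""
    allocation = {}
    for task in tasks:
        # Phase 1: filter by availability.
        nominees = [w for w in workers if available(w, task)]
        chosen = None
        # Phase 2: progressive batches of candidates.
        for i in range(0, len(nominees), group_size):
            batch = nominees[i:i + group_size]
            best = max(batch, key=lambda w: score(w, task), default=None)
            if best is not None and score(best, task) >= threshold:
                chosen = best
                break
        allocation[task] = chosen  # None if no batch clears the threshold
    return allocation

# Toy usage: w1 is unavailable, so the first batch is [w2, w3] and
# w3 (score 0.9) is chosen for both tasks.
alloc = two_phase_allocate(
    ["t1", "t2"], ["w1", "w2", "w3", "w4"],
    available=lambda w, t: w != "w1",
    score=lambda w, t: {"w2": 0.4, "w3": 0.9, "w4": 0.7}[w],
    group_size=2, threshold=0.5)
```

Scanning nominees in batches lets the allocator stop early once an acceptable candidate appears, instead of scoring every available worker for every task.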

The experimental evaluation shows that the tradeoff between fairness and cost can be found for a crowdsourcing workflow. Heuristic strategies can effectively increase the overall fairness and maintain a proper cost. The long-term effects of fairness-aware task allocation strategies in real crowdsourcing workflow platforms should be evaluated in future studies.

Data Availability

The data used to support the findings of this study are included within the article.

Disclosure

Any opinions, findings, conclusions, and recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the sponsors.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank everyone in the research group, including Song Wenai, Liu Zhongbao, and Cai Xingwang, for their suggestions and ideas. This work was supported by the Natural Science Foundation of Shanxi Province of China under Grant 201801D121151.