Review Article

Creation of Reliable Relevance Judgments in Information Retrieval Systems Evaluation Experimentation through Crowdsourcing: A Review

Table 7

Run-time methods.


Majority voting (MV): MV is a straightforward and widely used method that filters out incorrect results by keeping, for each task, the answer chosen by the majority of workers [31, 61, 68] (a minimal sketch is given after the table).

Expectation maximization (EM) algorithm: The EM algorithm measures worker quality by estimating the correct answer for each task from the labels supplied by different workers using maximum likelihood. The algorithm alternates between two phases: (i) the correct answer for each task is estimated from the multiple labels submitted by different workers, weighted by the current estimate of each worker's quality, and (ii) each worker's responses are compared with the inferred correct answers in order to re-estimate that worker's quality [69] (a sketch of this iteration follows the table).

Naive Bayes (NB): Building on EM, NB models the biases and reliability of individual workers and corrects for them in order to raise the quality of the workers' results. A small amount of gold standard training data labeled by experts is used to estimate and correct the individual biases of workers; the idea is to recalibrate workers' answers so that they agree more closely with the experts. On average, four non-expert labels per example were needed to emulate expert-level label quality, which improves annotation quality [68] (a bias-correction sketch follows the table).

Observation of the pattern of responses: Examining the pattern of answers is another effective way of filtering unreliable responses, since some untrustworthy workers answer in a regular pattern, for example, selecting the first choice for every question (a simple detector is sketched after the table).

Probabilistic matrix factorization (PMF): PMF induces a latent feature vector for each worker and each example and uses them to infer the unobserved worker assessments for all examples [70]. PMF is a standard method in collaborative filtering; the crowdsourcing data are converted into a collaborative filtering problem so that the missing labels of each worker can be predicted [71, 72] (a small factorization sketch follows the table).

Expert review: Experts are employed to evaluate the workers [73].

Contributor evaluation: Workers are evaluated according to quality factors such as their reputation, experience, or credentials, and the requester accepts tasks only from workers who meet these criteria. For instance, Wikipedia accepts articles written by administrators without further evaluation [27]; similarly, tasks submitted by workers with a high approval rate, or by master workers, are accepted without further checks.

Real-time support: Requesters give workers feedback on the quality of their work in real time while they are completing the task. This helps workers to revise their work, and the results showed that self-assessment and external feedback improve task quality [48]. Another real-time approach was proposed by Kulkarni et al. [74], in which requesters can follow the workers' workflows while the tasks are being solved. A tool called Turkomatic was presented that recruits workers to carry out tasks for requesters; while workers are doing the task, requesters are able to monitor the process and view the status of the task in real time.
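
To make the table concrete, the following are small, self-contained sketches of several of the methods above. They are illustrative only: the data layout (lists of (worker, task, label) tuples), function names, and parameter values are assumptions, not the exact formulations used in the cited studies. Majority voting can be sketched as follows.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Aggregate (worker, task, label) tuples by keeping, for each task,
    the label chosen by the largest number of workers."""
    votes = defaultdict(Counter)
    for worker, task, label in labels:
        votes[task][label] += 1
    # most_common(1) returns the label with the highest vote count per task
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}

# Example: three workers judge two topic-document pairs for relevance
labels = [
    ("w1", "t1", "relevant"), ("w2", "t1", "relevant"), ("w3", "t1", "non-relevant"),
    ("w1", "t2", "non-relevant"), ("w2", "t2", "non-relevant"), ("w3", "t2", "relevant"),
]
print(majority_vote(labels))  # {'t1': 'relevant', 't2': 'non-relevant'}
```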
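A simplified, one-parameter-per-worker variant of the EM procedure described above (in the spirit of Dawid and Skene) alternates between inferring a posterior over the true label of each task and re-estimating each worker's accuracy; the full model in [69] may differ in its details.

```python
from collections import defaultdict

def em_one_coin(labels, classes, n_iter=20):
    """Simplified EM for label aggregation: each worker has a single
    accuracy parameter, and the true label of each task is latent.

    labels: list of (worker, task, label) tuples.
    Returns (posterior over classes for each task, accuracy per worker)."""
    by_task = defaultdict(list)                  # task -> [(worker, label), ...]
    for w, t, l in labels:
        by_task[t].append((w, l))
    workers = {w for answers in by_task.values() for w, _ in answers}
    k = len(classes)

    acc = {w: 0.8 for w in workers}              # initial worker accuracies
    prior = {c: 1.0 / k for c in classes}        # initial class prior

    for _ in range(n_iter):
        # E-step: posterior over the true label of each task, given the
        # current worker accuracies and class prior
        post = {}
        for t, answers in by_task.items():
            scores = {}
            for c in classes:
                p = prior[c]
                for w, l in answers:
                    p *= acc[w] if l == c else (1.0 - acc[w]) / (k - 1)
                scores[c] = p
            z = sum(scores.values()) or 1.0
            post[t] = {c: s / z for c, s in scores.items()}

        # M-step: a worker's accuracy is the expected fraction of tasks on
        # which their label matches the inferred true label
        hit, seen = defaultdict(float), defaultdict(int)
        for t, answers in by_task.items():
            for w, l in answers:
                hit[w] += post[t][l]
                seen[w] += 1
        acc = {w: hit[w] / seen[w] for w in workers}
        prior = {c: sum(p[c] for p in post.values()) / len(post) for c in classes}

    return post, acc
```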
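The NB-style recalibration can be sketched as estimating a per-worker confusion matrix from a small gold standard set and then combining worker labels with a naive Bayes rule; the data layout and the smoothing constant are again assumptions.

```python
from collections import defaultdict

def estimate_confusions(gold, classes, smoothing=1.0):
    """Estimate each worker's confusion matrix P(reported | true label) from
    a small gold standard set of (worker, reported_label, true_label) tuples."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for w, reported, true in gold:
        counts[w][true][reported] += 1.0
    conf = {}
    for w, per_true in counts.items():
        conf[w] = {}
        for true in classes:
            row = per_true[true]
            # additive smoothing keeps probabilities non-zero for unseen cells
            total = sum(row[r] + smoothing for r in classes)
            conf[w][true] = {r: (row[r] + smoothing) / total for r in classes}
    return conf

def nb_combine(task_labels, conf, classes, prior=None):
    """Combine the labels given by several workers on one task with a naive
    Bayes rule, treating workers as independent given the true label.
    Assumes every worker in task_labels appears in the gold-derived conf."""
    prior = prior or {c: 1.0 / len(classes) for c in classes}
    scores = {}
    for c in classes:
        p = prior[c]
        for w, reported in task_labels:
            p *= conf[w][c][reported]
        scores[c] = p
    z = sum(scores.values()) or 1.0
    return {c: s / z for c, s in scores.items()}
```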
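A simple response-pattern check might flag workers whose answers are dominated by a single option position, such as always selecting the first choice; the thresholds below are arbitrary illustrations.

```python
from collections import defaultdict

def flag_patterned_workers(responses, min_tasks=10, threshold=0.9):
    """Flag workers whose answers are dominated by one option position,
    e.g. always picking the first choice (a common spammer pattern).

    responses: iterable of (worker, chosen_option_index) tuples.
    Returns the set of workers whose most frequent option position accounts
    for at least `threshold` of their answers."""
    picks = defaultdict(list)
    for worker, option_index in responses:
        picks[worker].append(option_index)
    flagged = set()
    for worker, choices in picks.items():
        if len(choices) < min_tasks:
            continue  # too few answers to judge a pattern
        most_common = max(set(choices), key=choices.count)
        if choices.count(most_common) / len(choices) >= threshold:
            flagged.add(worker)
    return flagged
```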
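Finally, the PMF idea of learning a latent vector per worker and per example can be sketched with a stochastic gradient descent factorization over numeric labels; this is a generic matrix factorization, not the exact probabilistic model of [70, 71, 72].

```python
import numpy as np

def pmf_sgd(observed, n_workers, n_items, rank=5, lr=0.01, reg=0.05,
            n_epochs=200, seed=0):
    """Learn a latent vector per worker (U) and per example (V) so that
    U[w] . V[i] approximates the observed numeric labels, then use the
    factorization to predict the labels a worker did not provide.

    observed: list of (worker_index, item_index, numeric_label) triples."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_workers, rank))
    V = 0.1 * rng.standard_normal((n_items, rank))
    for _ in range(n_epochs):
        for w, i, y in observed:
            u = U[w].copy()
            err = y - u @ V[i]                   # prediction error on this cell
            U[w] += lr * (err * V[i] - reg * u)  # gradient step with L2 penalty
            V[i] += lr * (err * u - reg * V[i])
    return U, V

# Prediction for any worker-example pair (w, i): U[w] @ V[i]
```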