Recommendation of Crowdsourcing Tasks Based on Word2vec Semantic Tags

Pan, Qingxian; Dong, Hongbin; Wang, Yingjie; Cai, Zhipeng; Zhang, Lizong

doi:https://doi.org/10.1155/2019/2121850

Wireless Communications and Mobile Computing


Special Issue

Algorithm Optimization for Wireless Mobile Applications of Smart Cities


Research Article | Open Access

Volume 2019 | Article ID 2121850 | https://doi.org/10.1155/2019/2121850


Qingxian Pan,¹,² Hongbin Dong,¹ Yingjie Wang,² Zhipeng Cai,³ and Lizong Zhang⁴

Guest Editor: Michele Nogueira

Received 01 Nov 2018

Revised 18 Feb 2019

Accepted 03 Mar 2019

Published 24 Mar 2019

Abstract

Crowdsourcing is a clear demonstration of collective intelligence, and the key to completing a crowdsourcing task well is to allocate the appropriate task to the appropriate worker. Most crowdsourcing platforms currently let workers find tasks through task search and lack personalized task recommendation. This paper proposes a tag-semantic task recommendation model based on deep learning. The similarity of word vectors is computed, and a semantic tag similarity matrix database is built using Word2vec. A task recommendation model based on semantic tags is then established to achieve personalized recommendation of crowdsourcing tasks. By computing the similarity of tags, the relevance between tasks and workers is obtained, which improves the robustness of task recommendation. Comparison experiments on the Tianpeng web dataset verify the effectiveness and applicability of the proposed model.

1. Introduction

Deep learning was proposed by Geoffrey Hinton et al. in 2006. This method simulates the neural networks of the human brain to model and realize multiple levels of abstraction [1, 2]. Also in 2006, Jeff Howe, a reporter for the American magazine Wired, proposed the concept of crowdsourcing [3]. As a new kind of business model, crowdsourcing has attracted widespread attention in various fields and has become a hot topic in computer science research. A crowdsourcing system is made up of task requesters, a crowdsourcing platform, and workers [4]. The crowdsourcing process includes designing tasks, publishing tasks, selecting tasks, sensing tasks, submitting solutions, and integrating solutions. Among these, task selection is the key phase: completing a crowdsourcing task well depends on the appropriate worker selecting the appropriate task at the appropriate time [5].

Popular crowdsourcing platforms rely on keyword-based task search to let workers find the tasks they prefer [6]. However, with the rapid development of crowdsourcing, information overload has become increasingly serious, and it is increasingly difficult for workers to find the tasks they prefer. Recommender systems are an effective means of solving this problem and are used on many e-commerce platforms, such as Alibaba, Amazon, and Netflix [7]. Many problems in recommender systems remain unsolved, however, such as similarity calculation, low recommendation accuracy, data sparseness, and cold start. In brief, improving the accuracy and reliability of recommender systems has received increasing attention from scholars.

However, there is little research on personalized task recommendation in crowdsourcing, and task selection relies on the hobbies and expertise of workers. Few crowdsourcing platforms actively recommend tasks. This paper studies a crowdsourcing task recommendation model based on Word2vec semantic tags in order to achieve personalized recommendation of crowdsourcing tasks [8].

The main contributions of this paper are as follows:

(1) Compute the similarity of word vectors and build the semantic tag similarity matrix database based on Word2vec deep learning.

(2) Study a task recommendation model based on semantic tags to achieve personalized recommendation of crowdsourcing tasks. The similarity between tasks and workers is computed from the semantic tag similarity matrix.

(3) Conduct experiments on the Tianpeng web dataset. The experimental results show that the model is feasible and effective. The model can also be applied in other fields by using different semantic databases.

This paper is organized as follows. Section 2 reviews related work. Word2vec is discussed in Section 3. The task recommendation model and its realization method based on semantic tags are studied in Section 4. The comparison experiments and the analysis of the experimental results are presented in Section 5. The conclusion is given in Section 6.

2. Related Works

To discuss the work related to crowdsourcing task recommendation, we introduce related work on crowdsourcing and on recommender systems in turn.

2.1. Crowdsourcing

In 2006, Jeff Howe first proposed the concept of crowdsourcing [3]: a company or institution outsources tasks formerly performed by employees to an unspecified public network in a free and voluntary manner. As crowdsourcing technology developed, different definitions of crowdsourcing appeared. Chen et al. [9] summarized 40 different crowdsourcing definitions. Feng et al. [10] defined crowdsourcing according to its basic features: crowdsourcing is a distributed problem-solving mechanism open to the Internet public, which completes tasks that are difficult for computers alone by integrating computers with the unknown public on the Internet [11].

Crowdsourcing has been successfully applied in language translation, image recognition, intelligent transportation, software development, entry interpretation, tourism photography, and other fields, and has become an embodiment of group wisdom [12, 13]. A crowdsourcing system is made up of task requesters, the crowdsourcing platform, and workers. The crowdsourcing workflow includes task design by the requester, task publication, task selection by workers, task solving, answer submission, and answer integration. The workflow is shown in Figure 1. Public participation is the basis of crowdsourcing, and the key to completing crowdsourcing tasks with high quality is to recommend appropriate tasks to appropriate workers at the appropriate time [14].

2.2. Recommender Systems

With the arrival of the big data era, information overload has become increasingly serious, and finding the most useful information is increasingly difficult. Recommender systems are an effective means of solving these problems [15]. However, recommender systems have some inherent defects, such as low accuracy, data sparseness, cold start, the drawbacks of centralized systems, similarity calculation, and vulnerability to attack. In addition, many recommender systems are deployed in business systems whose purpose is to sell more goods and maximize profit rather than to recommend the best commodities to users. In brief, the credibility and accuracy of recommender systems need to be improved, which has attracted the attention of scholars. Yang et al. [16] proposed a recommender system based on transfer learning. Chen et al. [17] proposed a recommender system based on context. Tang et al. [18] studied a recommender system based on crossing knowledge. Liu [19] and Zhou et al. [20] studied recommender systems for social recommendation. Combining Markov models with the social attributes of users, Wang et al. [21] proposed a probability-based recommendation model to recommend items to users.

Crowdsourcing task recommendation is mainly considered from the perspective of the crowdsourcing platform. Based on a task discovery model, the platform recommends related tasks according to the preferences of workers [5]. The main crowdsourcing platforms basically rely on task search and rarely adopt task recommendation [22]. Some task recommendation methods have been studied on the basis of traditional recommendation methods, including content-based recommendation, collaborative filtering, and hybrid recommendation algorithms. Ambati et al. [23] proposed using the historical information of tasks and workers for task recommendation. Yuen et al. [24] proposed a worker-task recommendation model that combines the historical information of workers with their browsing history. Deng et al. [25] studied the problem of maximizing task selection for spatiotemporal tasks.

3. Word2vec

In 2003, Bengio et al. [26] proposed the Neural Network Language Model (NNLM), which is built on three layers. NNLM computes the probability of the next word given a context, and word vectors are a byproduct of training. Word2vec, proposed by Google in 2013 [27], is a deep-learning-based tool for computing the similarity of word vectors. It converts words into word vectors and computes similarity from the cosine between word vectors. The tool takes segmented text as input, and the output word vectors can be used for many natural language processing (NLP) tasks, such as clustering, finding synonyms, and part-of-speech analysis.
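As a minimal sketch of the cosine comparison described above, the snippet below computes the cosine between toy word vectors using only the Python standard library. The three-dimensional vectors and the tag words are made-up illustrations rather than trained Word2vec output, and the function name `cosine_similarity` is a hypothetical helper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors (an assumption)
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "word vectors" (made-up numbers, not trained embeddings).
vectors = {
    "design": [0.9, 0.1, 0.3],
    "logo": [0.8, 0.2, 0.4],
    "translation": [0.1, 0.9, 0.2],
}

sim_close = cosine_similarity(vectors["design"], vectors["logo"])
sim_far = cosine_similarity(vectors["design"], vectors["translation"])
print(f"design~logo: {sim_close:.3f}, design~translation: {sim_far:.3f}")
```

Vectors that point in similar directions score close to 1, which is the property the tag similarity matrix relies on.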

Word2vec uses a word vector representation based on distributed representation, which was proposed by Hinton in 1986 [28]. The basic idea is to map each word into a $K$-dimensional real vector by training ($K$ is a hyperparameter of the model) and to judge the semantic similarity between words according to the distance between their vectors (such as cosine similarity or Euclidean distance). It uses a three-layer neural network: input layer, hidden layer, and output layer. Its core technique is Huffman coding according to word frequency, so that words with similar frequencies activate essentially the same content in the hidden layer. The higher the frequency of a word, the fewer hidden-layer nodes it activates, which effectively reduces the computational complexity.
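The effect of Huffman coding by word frequency can be illustrated with a short sketch. It builds Huffman codes from made-up word counts with the standard-library `heapq` module. The function name `huffman_codes` and the counts are hypothetical, and this is not the Word2vec implementation itself, only the coding idea it relies on.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Return {word: bit string} for a Huffman tree built from word counts."""
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(f, next(tiebreak), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single word still needs a one-bit code.
        return {w: "0" for w in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix their codes with 0 / 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Made-up word counts for illustration.
word_counts = {"the": 50, "task": 20, "worker": 15, "logo": 5, "huffman": 1}
codes = huffman_codes(word_counts)
for word in sorted(codes, key=lambda w: -word_counts[w]):
    print(word, word_counts[word], codes[word])
```

Frequent words end up near the root with short codes, so fewer inner nodes are visited when they are predicted.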

Compared with Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA), Word2vec uses the context of words, which makes the semantic information richer. Word2vec has two training models, CBOW (Continuous Bag-of-Words) and Skip-gram, which are shown in Figure 2. Both models include an input layer, a projection layer, and an output layer. The CBOW model predicts the current word from its known context, and the Skip-gram model predicts the context from the current word.

Figure 2: (a) CBOW model; (b) Skip-gram model.
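The difference between the two training models can be made concrete by listing the training examples each would draw from one sentence. The sketch below is an illustration only: the function names, the window size, and the sample sentence are assumptions, and no network is trained.

```python
def context_windows(tokens, window=2):
    """Yield (center_word, context_words) for every position in tokens."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield center, left + right

def cbow_examples(tokens, window=2):
    """CBOW: the context words (input) predict the center word (target)."""
    return [(ctx, center) for center, ctx in context_windows(tokens, window)]

def skipgram_examples(tokens, window=2):
    """Skip-gram: the center word (input) predicts each context word (target)."""
    return [(center, c)
            for center, ctx in context_windows(tokens, window)
            for c in ctx]

sentence = ["design", "a", "company", "logo"]
cbow = cbow_examples(sentence, window=1)
skip = skipgram_examples(sentence, window=1)
print(cbow)
print(skip)
```

CBOW produces one example per position with the whole context as input, while Skip-gram produces one example per (center, context word) pair.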

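In this paper, the objective function of CBOW is expressed by

$$p\left(w \mid \mathrm{Context}(w)\right) = \prod_{j=2}^{l^{w}} p\left(d_{j}^{w} \mid \mathbf{x}_{w}, \theta_{j-1}^{w}\right), \tag{1}$$

where $\mathbf{x}_{w}$ is the word vector at the root node of the Huffman tree, $\mathrm{Context}(w)$ is the context of word $w$, that is, the collection of surrounding words, $l^{w}$ is the number of nodes on the path $p^{w}$, $d_{j}^{w}$ is the Huffman code of the word $w$, and $\theta_{j}^{w}$ are the vectors corresponding to the nonleaf nodes on the path $p^{w}$. The logistic regression probability at a node of the Huffman tree is given by (2), and the corresponding probability of each code bit is given by (3):

$$\sigma\left(\mathbf{x}_{w}^{\top}\theta_{j-1}^{w}\right) = \frac{1}{1 + e^{-\mathbf{x}_{w}^{\top}\theta_{j-1}^{w}}}, \tag{2}$$

$$p\left(d_{j}^{w} \mid \mathbf{x}_{w}, \theta_{j-1}^{w}\right) = \begin{cases} \sigma\left(\mathbf{x}_{w}^{\top}\theta_{j-1}^{w}\right), & d_{j}^{w} = 0, \\ 1 - \sigma\left(\mathbf{x}_{w}^{\top}\theta_{j-1}^{w}\right), & d_{j}^{w} = 1. \end{cases} \tag{3}$$

To state the meaning of the logistic regression probability clearly, we combine (2) and (3) to obtain the value of $p\left(d_{j}^{w} \mid \mathbf{x}_{w}, \theta_{j-1}^{w}\right)$, which is

$$p\left(d_{j}^{w} \mid \mathbf{x}_{w}, \theta_{j-1}^{w}\right) = \left[\sigma\left(\mathbf{x}_{w}^{\top}\theta_{j-1}^{w}\right)\right]^{1 - d_{j}^{w}} \left[1 - \sigma\left(\mathbf{x}_{w}^{\top}\theta_{j-1}^{w}\right)\right]^{d_{j}^{w}}. \tag{4}$$

To avoid the value of $p\left(w \mid \mathrm{Context}(w)\right)$ becoming too small, the log-likelihood function is used as the objective function; thus, (1) can be converted into

$$\mathcal{L} = \sum_{w \in \mathcal{C}} \log p\left(w \mid \mathrm{Context}(w)\right), \tag{5}$$

where $\mathcal{C}$ denotes the corpus. Combining (4) and (5), the objective function becomes

$$\mathcal{L} = \sum_{w \in \mathcal{C}} \sum_{j=2}^{l^{w}} \left\{ \left(1 - d_{j}^{w}\right) \log \sigma\left(\mathbf{x}_{w}^{\top}\theta_{j-1}^{w}\right) + d_{j}^{w} \log\left[1 - \sigma\left(\mathbf{x}_{w}^{\top}\theta_{j-1}^{w}\right)\right] \right\}. \tag{6}$$

Therefore, (6) is the objective function of CBOW in this paper. Word2vec uses stochastic gradient ascent to optimize this objective function.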
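The path probability described above can be sketched numerically. The snippet multiplies one sigmoid decision per inner node along a word's Huffman path. It assumes the convention that code bit 0 contributes the sigmoid value and bit 1 contributes its complement; the function names and all numbers are hypothetical illustrations, not values from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def path_probability(x_w, code_bits, node_vectors):
    """Probability of a word as a product of binary decisions on its path.

    x_w          : context vector fed in at the root of the Huffman tree
    code_bits    : Huffman code bits of the word (list of 0/1), root first
    node_vectors : one parameter vector per non-leaf node on the path
    """
    prob = 1.0
    for d, theta in zip(code_bits, node_vectors):
        s = sigmoid(sum(a * b for a, b in zip(x_w, theta)))
        # Assumed convention: bit 0 -> sigma, bit 1 -> 1 - sigma.
        prob *= s if d == 0 else (1.0 - s)
    return prob

# Toy tree with three leaves whose codes are 0, 10 and 11 (made-up parameters).
x = [0.5, -1.0]
theta_root = [0.2, 0.4]
theta_inner = [-0.3, 0.1]

p_0 = path_probability(x, [0], [theta_root])
p_10 = path_probability(x, [1, 0], [theta_root, theta_inner])
p_11 = path_probability(x, [1, 1], [theta_root, theta_inner])
print(p_0, p_10, p_11, p_0 + p_10 + p_11)
```

Because every inner node splits its probability mass between its two children, the leaf probabilities sum to 1 without an explicit normalization over the vocabulary.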

4. The Tasks Recommendation Model and Realization Method Based on Semantic Tags

4.1. Basic Model Frame and Mathematical Computation Model


The core of the model is the tag similarity matrix. The model uses the tag similarity matrix to compute the similarity between workers and tasks, produces the worker-task similarity matrix, and realizes task recommendation or worker recommendation. In the model, the tag similarity matrix is computed with Word2vec. The worker-tag matrix is derived from the work history of each worker, registration information, and so on. The task-tag matrix is derived from the task description, task classification, and so on.

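Define the tag similarity matrix $S = (s_{ij})_{n \times n}$, where $n$ is the number of tags. $S$ is a symmetric matrix, that is, $s_{ij} = s_{ji}$; $s_{ij}$ represents the similarity of tag $i$ and tag $j$, $1 \le i, j \le n$, and its value is computed with the Word2vec tool. Define the worker-tag matrix $W = (w_{ik})_{m \times n}$, where $1 \le i \le m$ and $1 \le k \le n$.

We define the task-tag matrix $T = (t_{jk})_{q \times n}$, where $1 \le j \le q$ and $1 \le k \le n$.

Therefore, the worker-task similarity matrix is obtained by

$$R = W S T^{\top}, \tag{7}$$

where $W$ is the worker-tag matrix, $S$ is the tag similarity matrix, and $T^{\top}$ is the transpose of the task-tag matrix. Through (7), the relationship between workers and tasks can be obtained.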
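The matrix product described above (worker-tag matrix, times tag similarity matrix, times the transposed task-tag matrix) can be sketched with plain Python lists. The names `W`, `S`, `T`, and `R`, the helper functions, and all numeric values are assumptions chosen for illustration; binary tag indicators are used here only to keep the example small.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def worker_task_similarity(W, S, T):
    """Return W . S . T^T, one row per worker and one column per task."""
    return matmul(matmul(W, S), transpose(T))

# Toy data: 3 tags, 2 workers, 2 tasks (all values are made up).
S = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
W = [[1, 0, 0],   # worker 0 carries tag 0
     [0, 0, 1]]   # worker 1 carries tag 2
T = [[0, 1, 0],   # task 0 carries tag 1
     [0, 0, 1]]   # task 1 carries tag 2

R = worker_task_similarity(W, S, T)
for row in R:
    print(row)
```

In this toy case worker 0 and task 0 share no tag literally, yet they still receive a high score because tags 0 and 1 are semantically close in `S`. This is the benefit of routing the comparison through the tag similarity matrix instead of exact tag matching.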

4.2. Basic Flow

The main steps of the proposed recommendation model are as follows: (1) computing the word vectors with Word2vec; (2) computing the similarity of the word vectors; (3) generating the tag similarity matrix; (4) obtaining the worker-tag matrix and the task-tag matrix; (5) computing the worker-task similarity matrix; (6) standardization and normalization; (7) task and worker recommendation. The tag similarity matrix is generated with the Word2vec tool. The worker-task similarity is computed with the method introduced in the previous section. This section mainly introduces the standardization and normalization methods.

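Standardization method: the norm of a vector $\mathbf{x}$ is defined as follows: $\|\mathbf{x}\|_{p} = \left(\sum_{i} |x_{i}|^{p}\right)^{1/p}$.

To normalize $\mathbf{x}$ to unit norm, a mapping from $\mathbf{x}$ to $\mathbf{x}'$ is established so that the norm of $\mathbf{x}'$ is 1. The proof is as follows:

$$\|\mathbf{x}'\|_{p} = \left(\sum_{i} \left|\frac{x_{i}}{\|\mathbf{x}\|_{p}}\right|^{p}\right)^{1/p} = \frac{\left(\sum_{i} |x_{i}|^{p}\right)^{1/p}}{\|\mathbf{x}\|_{p}} = 1, \tag{8}$$

where $\mathbf{x}'$ is given by

$$\mathbf{x}' = \frac{\mathbf{x}}{\|\mathbf{x}\|_{p}}. \tag{9}$$

To make the data standardized and general, the standardized data $\mathbf{x}'$ are then normalized so that they fall in the interval $[0, 1]$. The conversion formula is

$$x_{i}'' = \frac{x_{i}' - \min}{\max - \min}, \tag{10}$$

where $\min$ is the minimum in $\mathbf{x}'$ and $\max$ is the maximum in $\mathbf{x}'$.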
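The two post-processing steps can be sketched as follows: scaling a vector to unit norm, then min-max rescaling into [0, 1]. The sketch assumes a p-norm with p = 2 by default and a standard min-max mapping; the function names and the handling of zero or constant vectors are choices made here for illustration.

```python
def unit_normalize(x, p=2):
    """Scale x so that its p-norm equals 1 (zero vectors are returned unchanged)."""
    norm = sum(abs(v) ** p for v in x) ** (1.0 / p)
    if norm == 0.0:
        return list(x)
    return [v / norm for v in x]

def min_max_scale(x):
    """Map values linearly onto [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]
    return [(v - lo) / (hi - lo) for v in x]

row = [3.0, 4.0, 0.0]          # one made-up row of a worker-task matrix
unit = unit_normalize(row)     # Euclidean norm of the result is 1
scaled = min_max_scale(unit)   # smallest entry -> 0.0, largest entry -> 1.0
print(unit, scaled)
```

After both steps every score lies in [0, 1], which is what makes a fixed threshold such as 0.6 comparable across rows.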

5. Experiment and Simulation

In this section, we conduct comparison experiments on a simulation dataset and on a real dataset. The real dataset was crawled from the Tianpeng web site.

In the experiments, text8 is used as the training corpus, and the experimental environment is a dual-core Intel Core(TM) i5-337U CPU @ 1.8 GHz with 8 GB of memory.

5.1. The Experiments Conducted on Simulation Dataset

In this group of comparison experiments, the training parameters are shown in Table 1.

The tag similarity matrix obtained after training is shown in Table 2. The elements of the matrix indicate the similarities between tags.

This group of experiments uses 100 workers, 50 tasks, and 2000 tags. The worker-tag matrix is generated randomly and is shown in Table 3; its elements represent the similarities between workers and tags. The task-tag matrix is shown in Table 4; its elements indicate the similarities between tasks and tags. After the worker-task matrix is computed, it is standardized and normalized, and the result is shown in Table 5. The elements in Table 5 are the similarities between workers and tasks.

Recall, precision, and F-measure are commonly used evaluation indexes [29]. The three indexes are computed by (11), (12), and (13). According to (11), (12), and (13), the F-measure is a comprehensive index that considers both recall and precision.

The threshold is set to 0.55, 0.6, and 0.65, respectively, and the recall, precision, and F-measure of the 50 tasks are obtained. The results for recall, precision, and F-measure are shown in Figures 3, 4, and 5, respectively. In these figures, the x-coordinate indicates the tasks of the task-tag matrix T, and the y-coordinates are the recall rate, the precision rate, and the F-measure, respectively. The experimental results show that, overall, a threshold of 0.6 performs better than the other two thresholds.
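How a threshold turns similarity scores into a recommendation, and how the three indexes are then computed, can be sketched as below. The sketch uses the standard definitions of precision and recall and assumes the F-measure is their harmonic mean; the function name, worker ids, scores, and ground-truth set are all hypothetical.

```python
def evaluate(scores, relevant, threshold):
    """Precision, recall and F-measure for one task.

    scores    : {worker_id: similarity in [0, 1]}
    relevant  : set of worker ids that truly fit the task (ground truth)
    threshold : workers scoring at or above this value are recommended
    """
    recommended = {w for w, s in scores.items() if s >= threshold}
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure

# Made-up similarity scores of four workers for a single task.
scores = {"w1": 0.9, "w2": 0.7, "w3": 0.58, "w4": 0.3}
relevant = {"w1", "w2", "w4"}

for threshold in (0.55, 0.6, 0.65):
    print(threshold, evaluate(scores, relevant, threshold))
```

Lowering the threshold admits more workers, which tends to raise recall at the cost of precision. This trade-off is what the comparison across thresholds explores.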

In addition, we compare the proposed method with the task search method. The experimental result is shown in Figure 6, where the x-coordinate indicates the tasks of the task-tag matrix T and the y-coordinate is the number of workers. The proposed method finds more workers than task search, which demonstrates its effectiveness. Moreover, potential workers can be found by lowering the threshold, which can be used to analyze potential users.

5.2. The Experiments Conducted on Tianpeng Dataset

The data collected from the Tianpeng web site were used to form a training corpus, and the resulting tag similarity matrix is shown in Table 6.

We select 510 workers and 371 tasks from the Tianpeng dataset as experimental objects and use them in comparison experiments to verify the effectiveness of the proposed model. In these experiments, the threshold is 0.6, and 20 tasks are randomly selected as recommended objects. The results are compared with bipartite graph matching and a greedy algorithm in terms of recall, precision, and F-measure.

The comparison on recall is shown in Figure 7. The x-coordinate indicates the tasks of the task-tag matrix T, and the y-coordinate is the recall rate. The result shows that the proposed recommendation model achieves the best recall compared with the greedy algorithm and bipartite graph matching. In addition, the proposed model is more stable as T changes.

Figure 8 shows the experimental result on precision. Similarly, the x-coordinate indicates the tasks of the task-tag matrix T, and the y-coordinate is the precision rate. The average precision of the proposed recommendation model is better than that of the other two algorithms. From Figure 8, it can be seen that the proposed model achieves the best precision compared with the greedy algorithm and bipartite graph matching.

According to the F-measure result shown in Figure 9, the proposed recommendation model also achieves the best F-measure. Because the F-measure is a comprehensive index that considers both recall and precision, we can infer that the proposed model performs best overall compared with the greedy algorithm and bipartite graph matching.

The comparison shows that the proposed method is significantly better than bipartite graph matching and the greedy algorithm on recall and F-measure, while its precision is sometimes higher and sometimes lower. Because the goal is to recommend a task to as many workers who could complete it as possible, including potential workers, a lower precision is acceptable for recommendation purposes. The proposed method therefore has practical significance and application value.

6. Conclusion

Crowdsourcing is a clear demonstration of group wisdom. It has been applied in many fields as a new business model, and in recent years it has become a hot research topic in computer science. The key to successful crowdsourcing is to recommend tasks to appropriate workers. This paper proposes a recommendation method based on the tag similarity matrix. The method uses Word2vec to generate the tag similarity matrix and then computes the similarity between workers and tasks. The comparison experiments show that the method is effective and feasible. The recommendation method can be extended to other fields by using different corpora.

Because the success of crowdsourcing depends on the participation rate of workers, topics such as reputation mechanisms, preference evolution, and privacy protection of workers have become hot topics in crowdsourcing research. Future research will focus on improving the accuracy of recommender systems by combining them with reputation, preference evolution, and historical information.

Data Availability

The Tianpeng dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants No. 61472095, No. 61502410, and No. 61572418, the China Postdoctoral Science Foundation under Grant No. 2017M622691, the National Science Foundation (NSF) under Grants No. 1704287, No. 1252292, and No. 1741277, and the Natural Science Foundation of Sichuan Province under Grant No. 2018HH0075.

References

[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2] Y. Wang, Z. Cai, G. Yin, Y. Gao, X. Tong, and G. Wu, “An incentive mechanism with privacy protection in mobile crowdsourcing systems,” Computer Networks, vol. 102, pp. 157–171, 2016.
[3] J. Howe, “The rise of crowdsourcing,” Wired Magazine, vol. 14, no. 6, pp. 1–4, 2006.
[4] Z. Cai and X. Zheng, “A private and efficient mechanism for data uploading in smart cyber-physical systems,” IEEE Transactions on Network Science and Engineering, p. 1, 2018.
[5] Y. Hu, Y. Wang, Y. Li, and X. Tong, “An incentive mechanism based on multi-attribute reverse auction in mobile crowdsourcing,” Sensors, vol. 18, no. 10, p. 3453, 2018.
[6] J. Li, Z. Cai, J. Wang, M. Han, and Y. Li, “Truthful incentive mechanisms for geographical position conflicting mobile crowdsensing systems,” IEEE Transactions on Computational Social Systems, vol. 5, no. 2, pp. 324–334, 2018.
[7] R. Katarya and O. P. Verma, “Recent developments in affective recommender systems,” Physica A: Statistical Mechanics and its Applications, vol. 461, pp. 182–190, 2016.
[8] K. W. Church, “Emerging trends: Word2Vec,” Natural Language Engineering, vol. 23, no. 1, pp. 155–162, 2017.
[9] X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz, “Pairwise ranking aggregation in a crowdsourced setting,” in Proceedings of the Sixth ACM International Conference, pp. 193–202, Rome, Italy, February 2013.
[10] J. Feng, G. Li, and J. Feng, “A survey on crowdsourcing,” Chinese Journal of Computers, vol. 38, pp. 1713–1726, 2015.
[11] Z. Duan, W. Li, and Z. Cai, “Distributed auctions for task assignment and scheduling in mobile crowdsensing systems,” in Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 635–644, GA, USA, June 2017.
[12] Y. Wang, Z. Cai, X. Tong, Y. Gao, and G. Yin, “Truthful incentive mechanism with location privacy-preserving for mobile crowdsourcing systems,” Computer Networks, vol. 135, pp. 32–43, 2018.
[13] Y. Wang, Y. Li, Z. Chi, and X. Tong, “The truthful evolution and incentive for large-scale mobile crowd sensing networks,” IEEE Access, vol. 6, pp. 51187–51199, 2018.
[14] J. L. Cai, M. Yan, and Y. Li, “Using crowdsourced data in location-based social networks to explore influence maximization,” in Proceedings of the 35th Annual IEEE International Conference on Computer Communications, 2016.
[15] P. Resnick and H. R. Varian, “Recommender systems,” Communications of the ACM, vol. 40, no. 3, pp. 56–58, 1997.
[16] W. Pan and Q. Yang, “Transfer learning in heterogeneous collaborative filtering domains,” Artificial Intelligence, vol. 197, pp. 39–55, 2013.
[17] G. Chen and L. Chen, “Recommendation based on contextual opinions,” UMAP 2014, LNCS 8538, pp. 61–73, 2014.
[18] L. Liu, J. Tang, J. Han, and S. Yang, “Learning influence from heterogeneous social networks,” Data Mining and Knowledge Discovery, vol. 25, no. 3, pp. 511–544, 2012.
[19] J. Tang, X. Hu, and H. Liu, “Social recommendation: a review,” Social Network Analysis and Mining, vol. 3, no. 4, pp. 1113–1133, 2013.
[20] L. Lü, M. Medo, C. H. Yeung, Y. Zhang, Z. Zhang, and T. Zhou, “Recommender systems,” Physics Reports, vol. 519, no. 1, pp. 1–49, 2012.
[21] Y. Wang, G. Yin, Z. Cai, Y. Dong, and H. Dong, “A trust-based probabilistic recommendation model for social networks,” Journal of Network and Computer Applications, vol. 55, pp. 59–67, 2015.
[22] L. Zhang, Z. Cai, and X. Wang, “FakeMask: a novel privacy preserving approach for smartphones,” IEEE Transactions on Network and Service Management, vol. 13, no. 2, pp. 335–348, 2016.
[23] V. Ambati, S. Vogel, and J. Carbonell, “Towards task recommendation in micro-task markets,” in Proceedings of the 25th AAAI Workshop in Human Computation, pp. 80–83, CA, USA, 2011.
[24] M. C. Yuen, I. King, and K. S. Leung, “Probabilistic matrix factorization in task recommendation in crowdsourcing systems,” in Proceedings of the 19th International Conference on Neural Information Processing, pp. 516–525, Springer, Doha, Qatar, 2012.
[25] D. Deng, C. Shahabi, and U. Demiryurek, “Maximizing the number of worker's self-selected tasks in spatial crowdsourcing,” in Proceedings of the 21st ACM SIGSPATIAL International Conference, pp. 1–10, FL, USA, November 2013.
[26] J. Turian, L. Ratinov, and Y. Bengio, “Word representations: a simple and general method for semi-supervised learning,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 384–394, Uppsala, Sweden, July 2010.
[27] Y. Yao, X. Li, X. Liu et al., “Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model,” International Journal of Geographical Information Science, vol. 31, no. 4, pp. 825–848, 2017.
[28] R. Wang, H. Zhao, B.-L. Lu, M. Utiyama, and E. Sumita, “Bilingual continuous-space language model growing for statistical machine translation,” IEEE Transactions on Audio, Speech and Language Processing, vol. 23, no. 7, pp. 1209–1220, 2015.
[29] L. Li, G. Liu, and Q. Liu, “Advancing iterative quantization hashing using isotropic prior,” in Proceedings of the International Conference on Multimedia Modelling, pp. 174–184, Springer International Publishing, 2016.

Copyright

Copyright © 2019 Qingxian Pan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
