Computational Intelligence and Neuroscience
Volume 2016, Article ID 1638936, 13 pages
http://dx.doi.org/10.1155/2016/1638936
Research Article

Improving Feature Representation Based on a Neural Network for Author Profiling in Social Media Texts

1Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), Mexico City, Mexico
2Instituto Politécnico Nacional (IPN), Escuela Superior de Ingeniería Mecánica y Eléctrica Unidad Zacatenco (ESIME-Zacatenco), Mexico City, Mexico

Received 30 January 2016; Revised 19 July 2016; Accepted 14 August 2016

Academic Editor: Francesco Camastra

Copyright © 2016 Helena Gómez-Adorno et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in Proceedings of the 31st International Conference on Machine Learning (ICML '14), pp. 1188–1196, Beijing, China, 2014.
  2. R. Socher, C. C.-Y. Lin, C. D. Manning, and A. Y. Ng, “Parsing natural scenes and natural language with recursive neural networks,” in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 129–136, Bellevue, Wash, USA, July 2011.
  3. I. Brigadir, D. Greene, and P. Cunningham, “Adaptive representations for tracking breaking news on Twitter,” http://arxiv.org/abs/1403.2923.
  4. C. Yan, F. Zhang, and L. Huang, “DRWS: a model for learning distributed representations for words and sentences,” in PRICAI 2014: Trends in Artificial Intelligence, D.-N. Pham and S.-B. Park, Eds., vol. 8862 of Lecture Notes in Computer Science, pp. 196–207, Springer, 2014.
  5. V. K. Rangarajan Sridhar, “Unsupervised text normalization using distributed representations of words and phrases,” in Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing (NAACL '15), pp. 8–16, Association for Computational Linguistics, Denver, Colo, USA, June 2015.
  6. F. Jiang, Y. Liu, H. Luan, M. Zhang, and S. Ma, “Microblog sentiment analysis with emoticon space model,” in Social Media Processing, pp. 76–87, Springer, Berlin, Germany, 2014.
  7. D. Pinto, D. Vilariño, Y. Alemán, H. Gómez-Adorno, N. Loya, and H. Jiménez-Salazar, “The soundex phonetic algorithm revisited for SMS text representation,” in Text, Speech and Dialogue: 15th International Conference, TSD 2012, Brno, Czech Republic, September 3–7, 2012. Proceedings, vol. 7499 of Lecture Notes in Computer Science, pp. 47–55, Springer, Berlin, Germany, 2012.
  8. J. Atkinson, A. Figueroa, and C. Pérez, “A semantically-based lattice approach for assessing patterns in text mining tasks,” Computación y Sistemas, vol. 17, no. 4, pp. 467–476, 2013.
  9. D. Das and S. Bandyopadhyay, “Document level emotion tagging: machine learning and resource based approach,” Computación y Sistemas, vol. 15, pp. 221–234, 2011.
  10. F. Rangel, F. Celli, P. Rosso, M. Potthast, B. Stein, and W. Daelemans, “Overview of the 3rd author profiling task at PAN 2015,” in Proceedings of the CLEF Labs and Workshops, vol. 1391 of Notebook Papers, CEUR Workshop Proceedings, Toulouse, France, September 2015.
  11. T. Baldwin, “Social media: friend or foe of natural language processing?” in Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation (PACLIC '12), pp. 58–59, November 2012.
  12. E. Clark and K. Araki, “Text normalization in social media: progress, problems and applications for a pre-processing system of casual English,” Procedia—Social and Behavioral Sciences, vol. 27, pp. 2–11, 2011.
  13. I. Hemalatha, G. P. S. Varma, and A. Govardhan, “Preprocessing the informal text for efficient sentiment analysis,” International Journal of Emerging Trends & Technology in Computer Science, vol. 1, no. 2, pp. 58–61, 2012.
  14. E. Haddi, X. Liu, and Y. Shi, “The role of text pre-processing in sentiment analysis,” Procedia Computer Science, vol. 17, pp. 26–32, 2013.
  15. S. Roy, S. Dhar, S. Bhattacharjee, and A. Das, “A lexicon based algorithm for noisy text normalization as pre-processing for sentiment analysis,” International Journal of Research in Engineering and Technology, vol. 2, no. 2, pp. 67–70, 2013.
  16. Z. Ben-Ami, R. Feldman, and B. Rosenfeld, “Using multi-view learning to improve detection of investor sentiments on twitter,” Computación y Sistemas, vol. 18, no. 3, pp. 477–490, 2014.
  17. G. Sidorov, M. Ibarra Romero, I. Markov, R. Guzman-Cabrera, L. Chanona-Hernández, and F. Velásquez, “Detección automática de similitud entre programas del lenguaje de programación Karel basada en técnicas de procesamiento de lenguaje natural,” Computación y Sistemas, vol. 20, no. 2, pp. 279–288, 2016 (Spanish).
  18. F. Rangel, P. Rosso, M. Koppel, E. Stamatatos, and G. Inches, “Overview of the author profiling task at PAN 2013,” in Proceedings of the Cross Language Evaluation Forum Conference (CLEF '13), vol. 1179, Notebook Papers, September 2013.
  19. F. Rangel, P. Rosso, I. Chugur et al., “Overview of the 2nd author profiling task at PAN 2014,” in Proceedings of the CLEF 2014 Labs and Workshops, vol. 1180 of Notebook Papers, pp. 898–927, Sheffield, UK, 2014.
  20. S. Nowson, J. Perez, C. Brun, S. Mirkin, and C. Roux, “XRCE personal language analytics engine for multilingual author profiling,” in Proceedings of the Working Notes of CLEF 2015—Conference and Labs of the Evaluation Forum, vol. 1391, CEUR, Toulouse, France, 2015.
  21. S. Ait-Mokhtar, J. Chanod, and C. Roux, “A multi-input dependency parser,” in Proceedings of the 7th International Workshop on Parsing Technologies (IWPT '01), pp. 201–204, Beijing, China, October 2001.
  22. P. Przybyla and P. Teisseyre, “What do your look-alikes say about you? Exploiting strong and weak similarities for author profiling,” in Proceedings of the Conference and Labs of the Evaluation Forum, vol. 1391 of Working Notes of CLEF, CEUR, 2015.
  23. S. Maharjan, P. Shrestha, and T. Solorio, “A simple approach to author profiling in MapReduce,” in Proceedings of the Working Notes of CLEF 2014—Conference and Labs of the Evaluation Forum, vol. 1180, pp. 1121–1128, CEUR, Sheffield, UK, September 2014.
  24. Y. Alemán, N. Loya, D. Vilariño, and D. Pinto, “Two methodologies applied to the author profiling task,” in Proceedings of the Working Notes of CLEF 2013-Conference and Labs of the Evaluation Forum, vol. 1179, CEUR, 2013.
  25. A. Bartoli, A. D. Lorenzo, A. Laderchi, E. Medvet, and F. Tarlao, “An author profiling approach based on language-dependent content and stylometric features,” in Proceedings of the Working Notes of CLEF 2015—Conference and Labs of the Evaluation Forum, vol. 1391, CEUR, 2015.
  26. A. P. Garibay, A. T. Camacho-González, R. A. Fierro-Villaneda, I. Hernandez-Farias, D. Buscaldi, and I. V. M. Ruíz, “A random forest approach for authorship profiling,” in Proceedings of the Conference and Labs of the Evaluation Forum, vol. 1391 of Working Notes of CLEF, CEUR, 2015.
  27. J. Marquardt, G. Farnadi, G. Vasudevan et al., “Age and gender identification in social media,” in Proceedings of the Working Notes of CLEF 2014-Conference and Labs of the Evaluation Forum, vol. 1180, pp. 1129–1136, CEUR, 2014.
  28. D. I. H. Farias, R. Guzman-Cabrera, A. Reyes, and M. A. Rocha, “Semantic-based features for author profiling identification,” in Proceedings of the Working Notes of CLEF 2013—Conference and Labs of the Evaluation Forum, vol. 1179, CEUR, 2013.
  29. L. Flekova and I. Gurevych, “Can we hide in the web? Large scale simultaneous age and gender author profiling in social media,” in Proceedings of the Working Notes of CLEF 2013—Conference and Labs of the Evaluation Forum, vol. 1179, CEUR, 2013.
  30. S. Goswami, S. Sarkar, and M. Rustagi, “Stylometric analysis of bloggers' age and gender,” in Proceedings of the 3rd International Conference on Weblogs and Social Media (ICWSM '09), San Jose, Calif, USA, May 2009.
  31. A. A. C. Diaz and J. M. G. Hidalgo, “Experiments with SMS translation and stochastic gradient descent in Spanish text author profiling,” in Proceedings of the Working Notes of CLEF 2013—Conference and Labs of the Evaluation Forum, vol. 1179, CEUR, 2013.
  32. R. Socher, A. Perelygin, J. Wu et al., “Recursive deep models for semantic compositionality over a sentiment treebank,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '13), pp. 1631–1642, Association for Computational Linguistics, 2013.
  33. T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” Computing Research Repository, https://arxiv.org/abs/1301.3781.
  34. F. Rangel, P. Rosso, B. Verhoeven, W. Daelemans, M. Potthast, and B. Stein, “Overview of the 4th author profiling task at PAN 2016: cross-genre evaluations,” in Proceedings of the Working Notes Papers of the CLEF 2016 Evaluation Labs, CEUR Workshop Proceedings, CLEF and CEUR-WS.org, Évora, Portugal, 2016.
  35. J. Posadas-Durán, H. Gómez-Adorno, I. Markov et al., “Syntactic n-grams as features for the author profiling task,” in Proceedings of the Working Notes of CLEF 2015—Conference and Labs of the Evaluation Forum, Toulouse, France, 2015.
  36. G. Sidorov, H. Gómez-Adorno, I. Markov, D. Pinto, and N. Loya, “Computing text similarity using tree edit distance,” in Proceedings of the Annual Conference of the North American Fuzzy Information Processing Society (NAFIPS) held jointly with 5th World Conference on Soft Computing (WConSC '15), pp. 1–4, IEEE, Redmond, Wash, USA, August 2015.
  37. R. Snow, B. O'Connor, D. Jurafsky, and A. Y. Ng, “Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), pp. 254–263, Association for Computational Linguistics, 2008.
  38. V. Camacho-Vázquez, G. Sidorov, and S. N. Galicia-Haro, “Construcción de un corpus marcado con emociones para el análisis de sentimientos en Twitter en español,” Escritos: Revista del Centro de Ciencias del Lenguaje, In press.
  39. L. Padró and E. Stanilovsky, “Freeling 3.0: towards wider multilinguality,” in Proceedings of the Language Resources and Evaluation Conference (LREC '12), ELRA, 2012.
  40. T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Proceedings of the 27th Annual Conference on Neural Information Processing Systems, vol. 26 of Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, Nev, USA, December 2013.
  41. S. Argamon, M. Koppel, J. W. Pennebaker, and J. Schler, “Automatically profiling the author of an anonymous text,” Communications of the ACM, vol. 52, no. 2, pp. 119–123, 2009.
  42. L. Buitinck, G. Louppe, M. Blondel et al., “API design for machine learning software: experiences from the scikit-learn project,” in Proceedings of the ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pp. 108–122, Prague, Czech Republic, September 2013.
  43. I. Markov, H. Gómez-Adorno, G. Sidorov, and A. Gelbukh, “Adapting cross-genre author profiling to language and corpus,” in Proceedings of the Working Notes of CLEF 2016-Conference and Labs of the Evaluation Forum, vol. 1609 of CEUR Workshop Proceedings, pp. 947–955, 2016, CLEF and CEUR-WS.org.
  44. F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
  45. J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.