BioMed Research International
Volume 2014 (2014), Article ID 240403, 6 pages
http://dx.doi.org/10.1155/2014/240403
Research Article

Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks

1Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
2School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
3Department of Medical Informatics, Second Military Medical University, Shanghai 200433, China

Received 23 November 2013; Revised 25 January 2014; Accepted 3 February 2014; Published 6 March 2014

Academic Editor: Bing Zhang

Copyright © 2014 Buzhou Tang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. U. Leser and J. Hakenberg, “What makes a gene name? Named entity recognition in the biomedical literature,” Briefings in Bioinformatics, vol. 6, no. 4, pp. 357–369, 2005.
  2. J.-D. Kim, T. Ohta, Y. Tsuruoka, Y. Tateisi, and N. Collier, “Introduction to the bio-entity recognition task at JNLPBA,” in Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pp. 70–75, Stroudsburg, Pa, USA, 2004.
  3. L. Smith, L. K. Tanabe, R. Ando et al., “Overview of BioCreative II gene mention recognition,” Genome Biology, vol. 9, supplement 2, article S2, 2008.
  4. R. Gaizauskas, G. Demetriou, and K. Humphreys, “Term Recognition and Classification in Biological Science Journal Articles,” in Proceedings of the Computational Terminology for Medical and Biological Applications Workshop of the 2nd International Conference on NLP, pp. 37–44, 2000.
  5. K. Fukuda, A. Tamura, T. Tsunoda, and T. Takagi, “Toward information extraction: identifying protein names from biological papers,” Pacific Symposium on Biocomputing, pp. 707–718, 1998.
  6. D. Proux, F. Rechenmann, L. Julliard, V. V. Pillet, and B. Jacq, “Detecting gene symbols and names in biological texts: a first step toward pertinent information extraction,” Genome Informatics Work. Genome Informatics, vol. 9, pp. 72–80, 1998.
  7. C. Nobata, N. Collier, and J. Tsujii, “Automatic term identification and classification in biology texts,” in Proceedings of the 5th NLPRS, pp. 369–374, 1999.
  8. L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
  9. S. Zhao, “Named entity recognition in biomedical texts using an HMM model,” in Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pp. 84–87, Stroudsburg, Pa, USA, 2004.
  10. A. McCallum, D. Freitag, and F. Pereira, “Maximum entropy Markov models for information extraction and segmentation,” in Proceedings of the 17th International Conference on Machine Learning, pp. 591–598, 2000.
  11. J. Finkel, S. Dingare, H. Nguyen, M. Nissim, C. Manning, and G. Sinclair, “Exploiting context for biomedical entity recognition: from syntax to the web,” in Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its applications (JNLPBA '04), 2004.
  12. J. D. Lafferty, A. McCallum, and F. C. N. Pereira, “Conditional random fields: probabilistic models for segmenting and labeling sequence data,” in Proceedings of the 18th International Conference on Machine Learning, pp. 282–289, San Francisco, Calif, USA, 2001.
  13. B. Settles, “Biomedical named entity recognition using conditional random fields and rich feature sets,” in Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pp. 104–107, Stroudsburg, Pa, USA, 2004.
  14. C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
  15. L. Si, T. Kanungo, and X. Huang, “Boosting performance of bio-entity recognition by combining results from multiple systems,” in Proceedings of the 5th International Workshop on Bioinformatics, pp. 76–83, New York, NY, USA, 2005.
  16. N. Ponomareva, P. Rosso, F. Pla, and A. Molina, Conditional Random Fields Vs. Hidden Markov Models in a Biomedical Named Entity Recognition Task, 2007.
  17. F. Liu, Y. Chen, and B. Manderick, “Named entity recognition in biomedical literature: a comparison of support vector machines and conditional random fields,” in Enterprise Information Systems, J. Filipe, J. Cordeiro, and J. Cardoso, Eds., pp. 137–147, Springer, Berlin, Germany, 2009.
  18. H. Liu, Z.-Z. Hu, J. Zhang, and C. Wu, “BioThesaurus: a web-based thesaurus of protein and gene names,” Bioinformatics, vol. 22, no. 1, pp. 103–105, 2006.
  19. O. Bodenreider, “The Unified Medical Language System (UMLS): integrating biomedical terminology,” Nucleic Acids Research, vol. 32, no. supplement 1, pp. D267–D270, 2004.
  20. J. Turian, L. Ratinov, and Y. Bengio, “Word representations: a simple and general method for semi-supervised learning,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 384–394, Stroudsburg, Pa, USA, July 2010.
  21. P. P. Kuksa and Y. Qi, “Semi-supervised bio-named entity recognition with word-codebook learning,” in Proceedings of the SIAM International Conference on Data Mining (SDM '10), pp. 25–36, Columbus, Ohio, USA, April 2010.
  22. K. Lund and C. Burgess, “Producing high-dimensional semantic spaces from lexical co-occurrence,” Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
  23. T. Hofmann, “Probabilistic latent semantic analysis,” in Proceedings of the Uncertainty in Artificial Intelligence (UAI ’99), pp. 289–296, 1999.
  24. D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
  25. P. Kanerva, J. Kristoferson, and A. Holst, “Random indexing of text samples for latent semantic analysis,” in Proceedings of the 22nd Annual Conference of the Cognitive Science Society, pp. 103–106, 2000.
  26. D. R. Hardoon, S. Szedmak, O. Szedmak, and J. Shawe-Taylor, Canonical Correlation Analysis: An Overview with Application to Learning Methods, 2007.
  27. P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. D. Pietra, and J. C. Lai, “Class-based n-gram models of natural language,” Computational Linguistics, vol. 18, pp. 467–479, 1992.
  28. Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, “A neural probabilistic language model,” Journal of Machine Learning Research, vol. 3, no. 6, pp. 1137–1155, 2003.
  29. Y. Bengio, H. Schwenk, J.-S. Senécal, F. Morin, and J.-L. Gauvain, “Neural probabilistic language models,” in Innovations in Machine Learning, D. E. Holmes and L. C. Jain, Eds., pp. 137–186, Springer, Berlin, Germany, 2006.
  30. T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent neural network based language model,” in Proceedings of the 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All (INTERSPEECH '10), pp. 1045–1048, September 2010.
  31. R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, “Natural language processing (almost) from scratch,” Journal of Machine Learning Research, vol. 12, pp. 2493–2537, 2011.
  32. T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” CoRR, vol. abs/1301.3781, 2013.
  33. B. Tang, H. Cao, Y. Wu, M. Jiang, and H. Xu, “Clinical entity recognition using structural support vector machines with rich features,” in Proceedings of the ACM 6th International Workshop on Data and Text Mining in Biomedical Informatics, pp. 13–20, New York, NY, USA, 2012.
  34. B. Tang, H. Cao, Y. Wu, M. Jiang, and H. Xu, “Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features,” BMC Medical Informatics and Decision Making, vol. 13, no. supplement 1, p. S1, 2013.
  35. B. Tang, Y. Wu, M. Jiang, Y. Chen, J. C. Denny, and H. Xu, “A hybrid system for temporal information extraction from clinical text,” Journal of the American Medical Informatics Association, 2013.
  36. R. K. Ando, “BioCreative II gene mention tagging system at IBM Watson,” in Proceedings of the 2nd BioCreative Challenge Evaluation Workshop, pp. 101–104, Madrid, Spain, 2007.
  37. K. Ganchev, K. Crammer, F. Pereira et al., “Penn/UMass/CHOP Biocreative II systems,” in Proceedings of the 2nd BioCreative Challenge Evaluation Workshop, pp. 119–124, 2007.