BioMed Research International
Volume 2014 (2014), Article ID 240403, 6 pages
Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
¹Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
²School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
³Department of Medical Informatics, Second Military Medical University, Shanghai 200433, China
Received 23 November 2013; Revised 25 January 2014; Accepted 3 February 2014; Published 6 March 2014
Academic Editor: Bing Zhang
Copyright © 2014 Buzhou Tang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.