BioMed Research International
Volume 2014 (2014), Article ID 240403, 6 pages
http://dx.doi.org/10.1155/2014/240403
Research Article

Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks

¹Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
²School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
³Department of Medical Informatics, Second Military Medical University, Shanghai 200433, China

Received 23 November 2013; Revised 25 January 2014; Accepted 3 February 2014; Published 6 March 2014

Academic Editor: Bing Zhang

Copyright © 2014 Buzhou Tang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Biomedical Named Entity Recognition (BNER), which extracts important entities such as genes and proteins, is a crucial step in biomedical natural language processing. Various machine learning-based approaches have been applied to BNER tasks and have shown good performance. In this paper, we systematically investigated three different types of word representation (WR) features for BNER: clustering-based representation, distributional representation, and word embeddings. We selected one algorithm for each of the three types of WR features and applied them to the JNLPBA and BioCreAtIvE II BNER tasks. Our results showed that all three WR algorithms were beneficial to machine learning-based BNER systems. Moreover, combining different types of WR features further improved BNER performance, indicating that they are complementary to each other. When all three types of WR features were combined, the F-measure on the BioCreAtIvE II GM and JNLPBA corpora improved by 3.75% and 1.39%, respectively, compared with systems using baseline features. To the best of our knowledge, this is the first study to systematically evaluate the effect of three different types of WR features on BNER tasks.
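The abstract does not name the three algorithms or say how their output reaches the tagger. The Python sketch below illustrates one common way WR features can be appended to baseline token features for a machine learning-based sequence labeler. It rests on several assumptions that are not from the source. The clustering-based representation is assumed to be a Brown-style hierarchical bit-string, used through its prefixes. The distributional vectors are discretized by nearest-centroid assignment. Embedding dimensions are used directly as real-valued features. All function names and the toy resources are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Union

# Feature values: strings are categorical features, floats are real-valued ones.
Feature = Union[str, float]
Vector = List[float]


def clustering_features(word: str, cluster_paths: Dict[str, str],
                        prefix_lengths: Sequence[int] = (4, 6, 10)) -> Dict[str, Feature]:
    """Clustering-based WR (assumed Brown-style): prefixes of a word's
    hierarchical cluster bit-string act as coarse-to-fine word classes."""
    path = cluster_paths.get(word.lower())
    if path is None:
        return {"clu": "UNK"}
    return {f"clu_p{n}": path[:n] for n in prefix_lengths}


def nearest_centroid(vec: Vector, centroids: List[Vector]) -> int:
    """Index of the closest centroid by squared Euclidean distance."""
    best, best_dist = 0, math.inf
    for i, centroid in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(vec, centroid))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def distributional_features(word: str, context_vectors: Dict[str, Vector],
                            centroids: List[Vector]) -> Dict[str, Feature]:
    """Distributional WR: a context-based vector, discretized here (an
    assumption) by assigning the word to its nearest precomputed centroid."""
    vec = context_vectors.get(word.lower())
    if vec is None:
        return {"dist": "UNK"}
    return {"dist": f"c{nearest_centroid(vec, centroids)}"}


def embedding_features(word: str, embeddings: Dict[str, Vector]) -> Dict[str, Feature]:
    """Word embeddings: each dimension becomes a real-valued feature."""
    vec = embeddings.get(word.lower())
    if vec is None:
        return {}
    return {f"emb{i}": float(x) for i, x in enumerate(vec)}


def token_features(tokens: List[str], i: int, cluster_paths: Dict[str, str],
                   context_vectors: Dict[str, Vector], centroids: List[Vector],
                   embeddings: Dict[str, Vector]) -> Dict[str, Feature]:
    """Baseline features for token i, extended with all three WR feature types."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "w": word.lower(),
        "all_caps": str(word.isupper()),
        "has_digit": str(any(ch.isdigit() for ch in word)),
    }
    feats.update(clustering_features(word, cluster_paths))
    feats.update(distributional_features(word, context_vectors, centroids))
    feats.update(embedding_features(word, embeddings))
    return feats


# Toy, hypothetical resources; real ones would be induced from unlabeled text.
cluster_paths = {"il-2": "1101001011", "gene": "0110"}
context_vectors = {"il-2": [0.9, 0.1], "gene": [0.1, 0.8]}
centroids = [[1.0, 0.0], [0.0, 1.0]]
embeddings = {"il-2": [0.25, -0.5]}

sentence = ["IL-2", "gene", "expression"]
sentence_feats = [token_features(sentence, i, cluster_paths, context_vectors,
                                 centroids, embeddings) for i in range(len(sentence))]
for tok, f in zip(sentence, sentence_feats):
    print(tok, f)
```

Each token's dictionary could then be passed to a CRF-style sequence labeler that accepts per-token feature dictionaries. Words missing from a resource fall back to an `UNK` class or simply get no embedding features. The sketch shows both ways a dense vector can enter a discriminative model: discretized into a cluster ID, or used dimension by dimension. Which choice the study made for each WR type is not stated in the abstract.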