Critical Assessment of Information Extraction Systems in Biology

Blaschke, Christian; Hirschman, Lynette; Yeh, Alexander; Valencia, Alfonso

doi:https://doi.org/10.1002/cfg.337

International Journal of Genomics

Meeting review | Open Access

Volume 4 | Article ID 301037 | https://doi.org/10.1002/cfg.337

Christian Blaschke,¹ Lynette Hirschman,² Alexander Yeh,² and Alfonso Valencia¹

Received 18 Sept 2003

Revised 24 Sept 2003

Accepted 25 Sept 2003

Abstract

An increasing number of groups are now working in the area of text mining, focusing on a wide range of problems and applying both statistical and linguistic approaches. However, it is not possible to compare the different approaches, because there are no common standards or evaluation criteria; in addition, the various groups are addressing different problems, often using private datasets. As a result, it is impossible to determine how well the existing systems perform, and particularly what performance level can be expected in real applications. This is similar to the situation in text processing in the late 1980s, prior to the Message Understanding Conferences (MUCs). With the introduction of a common evaluation and standardized evaluation metrics as part of these conferences, it became possible to compare approaches, to identify those techniques that did or did not work and to make progress. This progress has resulted in a common pipeline of processes and a set of shared tools available to the general research community. The field of biology is ripe for a similar experiment. Inspired by this example, the BioLINK group (Biological Literature, Information and Knowledge [1]) is organizing a CASP-like evaluation for the text data-mining community applied to biology. The two main tasks specifically address two major bottlenecks for text mining in biology: (1) the correct detection of gene and protein names in text; and (2) the extraction of functional information related to proteins based on the GO classification system. For further information and participation details, see http://www.pdg.cnb.uam.es/BioLink/BioCreative.eval.html

Copyright

Copyright © 2003 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
