Review Article

Biomedical Relation Extraction: From Binary to Complex

Table 4

Available annotated corpora for binary relation extraction in the biomedical domain.

Corpus name General description URL

GENIA 2,000 MEDLINE abstracts with more than 400,000 words and almost 100,000 annotations for biological terms. http://www.nactem.ac.uk/genia/genia-corpus

LLL05 80 sentences in the training set including 106 examples of genic interactions without coreferences and 165 examples of interactions with coreferences. http://genome.jouy.inra.fr/texte/LLLchallenge/

BioCreAtIvE II Training data is derived from the content of the IntAct and MINT databases. The test set consists of PubMed article abstracts. http://www.biocreative.org

AIMed 225 MEDLINE abstracts (200 abstracts describing interactions between human proteins and around 1000 tagged interactions). ftp://ftp.cs.utexas.edu/pub/mooney/bio-data

BioInfer 1100 sentences annotated with protein names, their relationships, and PPI annotations. http://mars.cs.utu.fi/BioInfer/

HPRD50 50 abstracts referenced by the Human Protein Reference Database including 266 relation instances. http://www.hprd.org