| Corpus | Text mining evaluation task | Brief description |
|---|---|---|
| JNLPBA (Joint Workshop on NLP in Biomedicine and Its Applications) [18] | Gene/protein concept extraction | The corpus consists of 2,000 PubMed abstracts as training data and 404 PubMed abstracts as test data. |
| BioCreAtIvE 2004 Task 1A dataset [19] | Gene/protein concept extraction | The corpus consists of 15,000 PubMed sentences as training data and 5,000 PubMed sentences as test data. |
| BioCreAtIvE 2 Gene Mention (GM) dataset [20] | Gene/protein concept extraction | The corpus consists of 15,000 PubMed sentences as training data and 5,000 PubMed sentences as test data. |
| AIMED [21] | Protein-protein interaction | The corpus consists of 225 PubMed abstracts containing 1,987 sentences with 4,075 protein mentions. |
| HPRD50 (Human Protein Reference Database) [22] | Protein-protein interaction | The corpus consists of sentences describing protein-protein interactions, drawn from 50 PubMed abstracts. |
| BioInfer (Bio Information Extraction Resource) [23] | Protein, gene, and RNA relationships | The corpus consists of 1,100 sentences annotated with concept names, relationships, and syntactic dependencies. |
| IEPA (Interaction Extraction Performance Assessment) [24] | Protein-protein interaction | The corpus consists of more than 200 PubMed sentences annotated with protein-protein interactions. |
| BioCreAtIvE 2.5 Elsevier Corpus [25] | Protein-protein interaction | The corpus consists of 61 PubMed articles as training data and 62 PubMed articles as test data. |
| BC4GO Corpus [26] | Gene Ontology | The corpus consists of 200 PubMed articles annotated with 1,356 distinct GO terms. |
| GREC Corpus [27] | Gene regulation and gene expression events | The corpus consists of 240 PubMed abstracts with annotations of gene regulation and gene expression events. |
| GETM [28] | Gene expression events | The corpus consists of 150 PubMed abstracts with annotations of gene expression events. |
| AnEM [29] | Tissues, cells, developing anatomical structures, and cellular components | The corpus consists of 500 PubMed sentences with annotations of a variety of biomedical concepts. |
| CellFinder Corpus [30] | Anatomical parts, cell lines, cell types, species, and cell components | The corpus consists of 10 annotated full-text PubMed articles. |