Research Article

Biomarker Identification for Prostate Cancer and Lymph Node Metastasis from Microarray Data and Protein Interaction Network Using Gene Prioritization Method

Table 6

Data and Text Mining Gene Prioritization Methods.

Method | Brief description | Reported results

Gene seeker | Gathers gene expression and phenotypic data for human and mouse from nine databases. Relies on the assumption that disease genes are likely to be expressed in the tissues affected by that disease [6]. | Offers a web service that finds disease-related genes for the input genetic localisation and phenotypic/expression terms.

eVOC | Co-occurrence of the disease name in PubMed abstracts. It selects disease genes according to expression profiles [5]. | Tested on 417 candidate genes, using 17 known disease genes. It retrieved 15 of the 17 known disease genes and shrank the candidate set by 63.3%.

DPG | Basic sequence information [8]. | The authors concluded that disease proteins tend to be long, conserved, phylogenetically extended, and without close paralogues.

Prospectr | Basic sequence information [10]. | Enriched the list of disease genes twofold 77% of the time, fivefold 37% of the time, and twentyfold 11% of the time.

Suspects | Extension of Prospectr that incorporates GO [9, 15]. | On average, the target gene was in the top 31.23% of the resulting ranked list.

MedSim | GO enrichment and functional comparison [13]. | Achieved a performance of up to 0.90 on its ROC curve.

Limitations | Generally imposed by the source data, which carry little knowledge about the disease. For instance, GO terms include a brief description of the corresponding biological function of the genes, but only 60% of all human genes have associated GO terms, and these may be inconsistent due to differences in curators' judgement [16].
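
To make the reported figures in the table easier to compare, the following minimal sketch shows how recall, candidate-set reduction, and fold enrichment could be computed from the eVOC numbers (417 candidates, 17 known disease genes, 15 retrieved, 63.3% reduction). The function names are hypothetical, and the sketch assumes that the 17 known disease genes are part of the 417 candidates and that fold enrichment is defined as the ratio of the known-gene fraction after and before prioritization. The cited studies may define their metrics differently.

```python
def recall(found_known: int, total_known: int) -> float:
    """Fraction of known disease genes that survive prioritization."""
    return found_known / total_known


def remaining_candidates(total_candidates: int, reduction_fraction: float) -> int:
    """Size of the candidate set after it shrinks by the given fraction."""
    return round(total_candidates * (1.0 - reduction_fraction))


def fold_enrichment(found_known: int, selected: int,
                    total_known: int, total_candidates: int) -> float:
    """Known-gene density in the selected set relative to the full set.

    Assumed definition for illustration only.
    """
    return (found_known / selected) / (total_known / total_candidates)


# Figures taken from the eVOC row of the table.
total_candidates = 417
total_known = 17
found_known = 15
reduction = 0.633

selected = remaining_candidates(total_candidates, reduction)
evoc_recall = recall(found_known, total_known)
evoc_enrichment = fold_enrichment(found_known, selected,
                                  total_known, total_candidates)

print(f"candidates remaining: {selected}")
print(f"recall: {evoc_recall:.3f}")
print(f"fold enrichment: {evoc_enrichment:.2f}")
```

Under these assumptions, a method that keeps most known disease genes while discarding a large share of candidates yields a fold enrichment above 1, which is the sense in which the Prospectr row reports twofold, fivefold, and twentyfold enrichment.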