Research Article

Gene Prioritization of Resistant Rice Gene against Xanthomonas oryzae pv. oryzae by Using Text Mining Technologies

Algorithm 1

Gene prioritization algorithm.
Step 1. Collect NCBI literature in the rice research field to form the text database, using the keywords “rice”, “Event”,
  “Binding”, “Catabolism”, “Expression”, “Localization”, “phosphorylation”, “regulation”, “transcript”, and “Xoo”.
Step 2. Build a phrase dictionary of candidate terms.
Step 3. Evaluate the relevance between each dictionary term and the text database by computing TF*IDF over the total text data set.
Step 4. Rank the important terms by their TF*IDF scores.
Step 5. Retrieve proteins from NCBI whose annotations include the top-ranked terms.
Step 6. Rank the candidate proteins using the built-in, sequence-based classifier [17].
Step 7. Use the Conserved Domain Database (CDD) and Gene Ontology (GO) to verify the prioritization results.
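
The term-scoring part of the algorithm (Steps 3 to 5) can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names `tokenize`, `rank_terms`, and `filter_annotations` are hypothetical, the dictionary is assumed to contain single-word terms only, TF is assumed to be the relative frequency of a term in the keyword-specific literature, and IDF is assumed to be the natural logarithm of the total number of documents divided by the number of documents containing the term. Retrieval from NCBI is replaced by an in-memory mapping from protein identifier to annotation text.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into words, dropping simple punctuation."""
    words = (w.strip('.,;:()"').lower() for w in text.split())
    return [w for w in words if w]


def rank_terms(target_docs, all_docs, dictionary):
    """Score dictionary terms by TF*IDF and return (term, score) pairs, best first.

    Assumption: TF is measured in target_docs, IDF over all_docs.
    """
    vocab = {t.lower() for t in dictionary}
    # Term counts restricted to dictionary terms in the target literature.
    counts = Counter(
        w for doc in target_docs for w in tokenize(doc) if w in vocab
    )
    total = sum(counts.values()) or 1
    doc_tokens = [set(tokenize(d)) for d in all_docs]
    n_docs = len(all_docs)

    scores = {}
    for term in vocab:
        tf = counts[term] / total
        df = sum(1 for toks in doc_tokens if term in toks)
        idf = math.log(n_docs / df) if df else 0.0
        scores[term] = tf * idf
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def filter_annotations(annotations, top_terms):
    """Return protein IDs whose annotation text mentions any top-ranked term."""
    return [
        pid
        for pid, text in annotations.items()
        if any(term in text.lower() for term in top_terms)
    ]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    target = [
        "Xa21 kinase confers resistance to Xoo in rice",
        "The kinase domain is required for resistance",
    ]
    background = ["Rice yield depends on nitrogen", "Drought tolerance in rice"]
    ranked = rank_terms(
        target, target + background, ["kinase", "resistance", "rice", "nitrogen"]
    )
    top = [term for term, _ in ranked[:2]]
    proteins = {"P1": "receptor kinase", "P2": "sucrose transporter"}
    print(ranked)
    print(filter_annotations(proteins, top))
```

In this sketch, a term that appears in every document receives an IDF of zero, so common words such as “rice” are ranked below terms that are specific to the resistance-related literature. Steps 6 and 7 depend on the external classifier [17] and on the CDD and GO resources, so they are not sketched here.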