Research Article

Tyrosine Kinase Ligand-Receptor Pair Prediction by Using Support Vector Machine

Figure 1

Flow of dataset generation. Interacting pairs from the Database of Interacting Proteins (DIP) were filtered to keep only those containing a protein returned by a UniProtKB [10] keyword and EC number search (“tyrosine kinase” AND EC:2.7.10.1 AND reviewed: yes). To remove redundancy, the 174 hits obtained from the interacting pairs of receptor tyrosine kinases and ligands were clustered with BLASTclust, and one representative protein was taken from each cluster. The BLASTclust criterion was a sequence identity of 80% or above over 100% of the amino acid sequence. Clustering yielded 34 receptor tyrosine kinases and 67 ligand proteins, from which 95 interacting pairs were obtained as the final positive data for protein-protein interaction. Negative data (2183 pairs) were generated artificially by excluding the 95 positive pairs from all 2278 (34 × 67) combinations of the retrieved receptor tyrosine kinases and ligands.
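
The negative-data step described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_negative_pairs` and the placeholder identifiers (`RTK_0`, `LIG_0`, ...) are hypothetical, and the 95 positive pairs are chosen arbitrarily only to reproduce the counts stated above.

```python
from itertools import product


def build_negative_pairs(receptors, ligands, positive_pairs):
    """Return every receptor-ligand combination that is not a known positive pair."""
    positives = set(positive_pairs)
    return [pair for pair in product(receptors, ligands) if pair not in positives]


# Placeholder identifiers (hypothetical), sized to match the counts in the caption.
receptors = [f"RTK_{i}" for i in range(34)]
ligands = [f"LIG_{j}" for j in range(67)]

# All receptor-ligand combinations: 34 * 67 = 2278.
all_pairs = list(product(receptors, ligands))

# Assumption: 95 arbitrary pairs stand in for the DIP-derived positive data.
positive_pairs = all_pairs[:95]

negative_pairs = build_negative_pairs(receptors, ligands, positive_pairs)

print(len(all_pairs), len(positive_pairs), len(negative_pairs))  # 2278 95 2183
```

Because the negatives are simply the non-annotated combinations, some of them may be true but unrecorded interactions; the sketch reproduces only the counting logic (2278 - 95 = 2183).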