Journal of Applied Mathematics
Volume 2012, Article ID 897289, 23 pages
http://dx.doi.org/10.1155/2012/897289
Research Article

Selecting Negative Samples for PPI Prediction Using Hierarchical Clustering Methodology

¹ Department of Computer Architecture and Computer Technology, University of Granada, 18017 Granada, Spain
² Department of Applied Mathematics, University of Granada, 18017 Granada, Spain

Received 12 September 2011; Accepted 17 November 2011

Academic Editor: Venky Krishnan

Copyright © 2012 J. M. Urquiza et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. V. Deshmukh, C. Cannings, and A. Thomas, “Estimating the parameters of a model for protein-protein interaction graphs,” Mathematical Medicine and Biology, vol. 23, no. 4, pp. 279–295, 2006.
  2. D. J. Higham, G. Kalna, and J. K. Vass, “Spectral analysis of two-signed microarray expression data,” Mathematical Medicine and Biology, vol. 24, no. 2, pp. 131–148, 2007.
  3. J. F. Rual, K. Venkatesan, T. Hao et al., “Towards a proteome-scale map of the human protein-protein interaction network,” Nature, vol. 437, no. 7062, pp. 1173–1178, 2005.
  4. C. Huang, F. Morcos, S. P. Kanaan, S. Wuchty, D. Z. Chen, and J. A. Izaguirre, “Predicting protein-protein interactions from protein domains using a set cover approach,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 1, pp. 78–87, 2007.
  5. A. J. González and L. Liao, “Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines,” BMC Bioinformatics, vol. 11, article 537, 2010.
  6. U. Stelzl, U. Worm, M. Lalowski et al., “A human protein-protein interaction network: a resource for annotating the proteome,” Cell, vol. 122, no. 6, pp. 957–968, 2005.
  7. K. Venkatesan, J. F. Rual, A. Vazquez et al., “An empirical framework for binary interactome mapping,” Nature Methods, vol. 6, no. 1, pp. 83–90, 2009.
  8. P. Braun, M. Tasan, M. Dreze et al., “An experimentally derived confidence score for binary protein-protein interactions,” Nature Methods, vol. 6, no. 1, pp. 91–97, 2009.
  9. J. Yu and R. L. Finley, “Combining multiple positive training sets to generate confidence scores for protein-protein interactions,” Bioinformatics, vol. 25, no. 1, pp. 105–111, 2009.
  10. R. Jansen, H. Yu, D. Greenbaum et al., “A Bayesian networks approach for predicting protein-protein interactions from genomic data,” Science, vol. 302, no. 5644, pp. 449–453, 2003.
  11. A. Patil and H. Nakamura, “HINT—a database of annotated protein-protein interactions and their homologs,” Biophysics, vol. 1, pp. 21–24, 2005.
  12. I. Kim, Y. Liu, and H. Zhao, “Bayesian methods for predicting interacting protein pairs using domain information,” Biometrics, vol. 63, no. 3, pp. 824–833, 2007.
  13. M. Deng, S. Mehta, F. Sun, and T. Chen, “Inferring domain-domain interactions from protein-protein interactions,” Genome Research, vol. 12, no. 10, pp. 1540–1548, 2002.
  14. I. Iossifov, M. Krauthammer, C. Friedman et al., “Probabilistic inference of molecular networks from noisy data sources,” Bioinformatics, vol. 20, no. 8, pp. 1205–1213, 2004.
  15. L. V. Zhang, S. L. Wong, O. D. King, and F. P. Roth, “Predicting co-complexed protein pairs using genomic and proteomic data integration,” BMC Bioinformatics, vol. 5, article 38, 2004.
  16. Y. Liu, I. Kim, and H. Zhao, “Protein interaction predictions from diverse sources,” Drug Discovery Today, vol. 13, no. 9–10, pp. 409–416, 2008.
  17. A. Ben-Hur and W. S. Noble, “Kernel methods for predicting protein-protein interactions,” Bioinformatics, vol. 21, no. 1, pp. i38–i46, 2005.
  18. R. A. Craig and L. Liao, “Improving protein-protein interaction prediction based on phylogenetic information using a least-squares support vector machine,” Annals of the New York Academy of Sciences, vol. 1115, pp. 154–167, 2007.
  19. A. Patil and H. Nakamura, “Filtering high-throughput protein-protein interaction data using a combination of genomic features,” BMC Bioinformatics, vol. 6, article 100, 2005.
  20. F. Azuaje, H. Wang, H. Zheng, O. Bodenreider, and A. Chesneau, “Predictive integration of gene ontology-driven similarity and functional interactions,” in Proceedings of the 6th IEEE International Conference on Data Mining, pp. 114–119, 2006.
  21. C. M. Deane, L. Salwiński, I. Xenarios, and D. Eisenberg, “Protein interactions: two methods for assessment of the reliability of high throughput observations,” Molecular and Cellular Proteomics, vol. 1, no. 5, pp. 349–356, 2002.
  22. R. Saeed and C. Deane, “An assessment of the uses of homologous interactions,” Bioinformatics, vol. 24, no. 5, pp. 689–695, 2008.
  23. M. Pellegrini, D. Haynor, and J. M. Johnson, “Protein interaction networks,” Expert Review of Proteomics, vol. 1, no. 2, pp. 239–249, 2004.
  24. P. Uetz, L. Giot, G. Cagney et al., “A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae,” Nature, vol. 403, no. 6770, pp. 623–627, 2000.
  25. T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, and Y. Sakaki, “A comprehensive two-hybrid analysis to explore the yeast protein interactome,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 8, pp. 4569–4574, 2001.
  26. A. C. Gavin, M. Bösche, R. Krause et al., “Functional organization of the yeast proteome by systematic analysis of protein complexes,” Nature, vol. 415, no. 6868, pp. 141–147, 2002.
  27. Y. Ho, A. Gruhler, A. Heilbut et al., “Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry,” Nature, vol. 415, no. 6868, pp. 180–183, 2002.
  28. A. J. M. Walhout, R. Sordella, X. Lu et al., “Protein interaction mapping in C. elegans using proteins involved in vulval development,” Science, vol. 287, no. 5450, pp. 116–122, 2000.
  29. S. Li, C. M. Armstrong, N. Bertin et al., “A map of the interactome network of the metazoan C. elegans,” Science, vol. 303, no. 5657, pp. 540–543, 2004.
  30. L. Giot, J. S. Bader, C. Brouwer et al., “A protein interaction map of Drosophila melanogaster,” Science, vol. 302, no. 5651, pp. 1727–1736, 2003.
  31. E. Formstecher, S. Aresta, V. Collura et al., “Protein interaction mapping: a Drosophila case study,” Genome Research, vol. 15, no. 3, pp. 376–384, 2005.
  32. T. Bouwmeester, A. Bauch, H. Ruffner et al., “A physical and functional map of the human TNF-α/NF-κB signal transduction pathway,” Nature Cell Biology, vol. 6, no. 2, pp. 97–105, 2004.
  33. Y. Qi, J. Klein-Seetharaman, and Z. Bar-Joseph, “Random forest similarity for protein-protein interaction prediction from multiple sources,” in Proceedings of the Pacific Symposium on Biocomputing, pp. 531–542, 2005.
  34. H. Yu, P. Braun, M. A. Yildirim et al., “High-quality binary protein interaction map of the yeast interactome network,” Science, vol. 322, no. 5898, pp. 104–110, 2008.
  35. X. Wu, L. Zhu, J. Guo, D. Y. Zhang, and K. Lin, “Prediction of yeast protein-protein interaction network: insights from the Gene Ontology and annotations,” Nucleic Acids Research, vol. 34, no. 7, pp. 2137–2150, 2006.
  36. H. Wang, F. Azuaje, O. Bodenreider, and J. Dopazo, “Gene expression correlation and gene ontology-based similarity: an assessment of quantitative relationships,” in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '04), pp. 25–31, 2004.
  37. The Gene Ontology Consortium, “The gene ontology (GO) database and informatics resource,” Nucleic Acids Research, vol. 32, pp. D258–D261, 2004.
  38. J. Yu, M. Guo, C. J. Needham, Y. Huang, L. Cai, and D. R. Westhead, “Simple sequence-based kernels do not predict protein-protein interactions,” Bioinformatics, vol. 26, no. 20, Article ID btq483, pp. 2610–2614, 2010.
  39. H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information: criteria of Max-Dependency, Max-Relevance, and Min-Redundancy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.
  40. P. Block, J. Paern, E. Hüllermeier, P. Sanschagrin, C. A. Sotriffer, and G. Klebe, “Physicochemical descriptors to discriminate protein-protein interactions in permanent and transient complexes selected by means of machine learning algorithms,” Proteins: Structure, Function and Genetics, vol. 65, no. 3, pp. 607–622, 2006.
  41. M. J. Mizianty and L. Kurgan, “Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences,” BMC Bioinformatics, vol. 10, article 414, 2009.
  42. E. Camon, M. Magrane, D. Barrell et al., “The gene ontology annotation (GOA) database: sharing knowledge in UniProt with gene ontology,” Nucleic Acids Research, vol. 32, pp. D262–D266, 2004.
  43. U. Güldener, M. Münsterkötter, G. Kastenmüller et al., “CYGD: the comprehensive yeast genome database,” Nucleic Acids Research, vol. 33, pp. D364–D368, 2005.
  44. A. Stein, R. B. Russell, and P. Aloy, “3did: interacting protein domains of known three-dimensional structure,” Nucleic Acids Research, vol. 33, pp. D413–D417, 2005.
  45. B. Boeckmann, A. Bairoch, R. Apweiler et al., “The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003,” Nucleic Acids Research, vol. 31, no. 1, pp. 365–370, 2003.
  46. R. Roslan, R. M. Othman, Z. A. Shah et al., “Utilizing shared interacting domain patterns and Gene Ontology information to improve protein-protein interaction prediction,” Computers in Biology and Medicine, vol. 40, no. 6, pp. 555–564, 2010.
  47. H. M. Berman, J. Westbrook, Z. Feng et al., “The protein data bank,” Nucleic Acids Research, vol. 28, no. 1, pp. 235–242, 2000.
  48. The UniProt Consortium, “The universal protein resource (UniProt),” Nucleic Acids Research, vol. 35, pp. D193–D197, 2007.
  49. R. D. Finn, J. Tate, J. Mistry et al., “The Pfam protein families database,” Nucleic Acids Research, vol. 36, no. 1, pp. D281–D288, 2008.
  50. P. A. Estévez, M. Tesmer, C. A. Perez, and J. M. Zurada, “Normalized mutual information feature selection,” IEEE Transactions on Neural Networks, vol. 20, no. 2, pp. 189–201, 2009.
  51. G. John, R. Kohavi, and K. Pfleger, “Irrelevant features and the subset selection problem,” in Proceedings of the International Conference on Machine Learning, pp. 121–126, 1994.
  52. J. Bins and B. A. Draper, “Feature selection from huge feature sets,” in Proceedings of the 8th International Conference on Computer Vision, vol. 2, pp. 159–165, July 2001.
  53. T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, Hoboken, NJ, USA, 2nd edition, 2006.
  54. S. Kullback, Information Theory and Statistics, Dover Publications, Mineola, NY, USA, 1997.
  55. C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
  56. L. J. Herrera, H. Pomares, I. Rojas, A. Guillén, A. Prieto, and O. Valenzuela, “Recursive prediction for long term time series forecasting using advanced models,” Neurocomputing, vol. 70, no. 16–18, pp. 2870–2880, 2007.
  57. A. Statnikov, L. Wang, and C. F. Aliferis, “A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification,” BMC Bioinformatics, vol. 9, article 319, 2008.
  58. T. Wu, C. Lin, and R. C. Weng, “Probability estimates for multi-class classification by pairwise coupling,” Journal of Machine Learning Research, vol. 5, pp. 975–1005, 2004.
  59. J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, Least Squares Support Vector Machines, World Scientific Publishing Company, 2003.
  60. I. Rojas, H. Pomares, J. González et al., “Analysis of the functional block involved in the design of radial basis function networks,” Neural Processing Letters, vol. 12, no. 1, pp. 1–17, 2000.
  61. T. Zhang, “Statistical behavior and consistency of classification methods based on convex risk minimization,” The Annals of Statistics, vol. 32, no. 1, pp. 56–85, 2004.
  62. C. Chang and C. Lin, LIBSVM: a Library for Support Vector Machines, 2001, http://www.csie.ntu.edu.tw/main.php.
  63. J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, University of California Press, Berkeley, Calif, USA, 1967.
  64. A. Guillén, H. Pomares, J. González, I. Rojas, O. Valenzuela, and B. Prieto, “Parallel multiobjective memetic RBFNNs design and feature selection for function approximation problems,” Neurocomputing, vol. 72, no. 16–18, pp. 3541–3555, 2009.
  65. A. Guillén, D. Sovilj, A. Lendasse, F. Mateo, and I. Rojas, “Minimising the delta test for variable selection in regression problems,” International Journal of High Performance Systems Architecture, vol. 1, no. 4, pp. 269–281, 2008.
  66. A. Kumar, S. Agarwal, J. A. Heyman et al., “Subcellular localization of the yeast proteome,” Genes and Development, vol. 16, no. 6, pp. 707–719, 2002.
  67. F. Browne, H. Wang, H. Zheng, and F. Azuaje, “Supervised statistical and machine learning approaches to inferring pairwise and module-based protein interaction networks,” in Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, pp. 1365–1369, 2007.
  68. H. Zheng, H. Wang, and D. H. Glass, “Integration of genomic data for inferring protein complexes from global protein-protein interaction networks,” IEEE Transactions on Systems, Man, and Cybernetics Part B, vol. 38, no. 1, pp. 5–16, 2008.
  69. F. Browne, H. Wang, H. Zheng, and F. Azuaje, “A knowledge-driven probabilistic framework for the prediction of protein-protein interaction networks,” Computers in Biology and Medicine, vol. 40, no. 3, pp. 306–317, 2010.
  70. J. Fogarty, R. S. Baker, and S. E. Hudson, “Case studies in the use of ROC curve analysis for sensor-based estimates in human computer interaction,” in Proceedings of the Graphics Interface (GI ’05), pp. 129–136, Canadian Human-Computer Communications Society, Victoria, Canada, 2005.
  71. J. A. Hanley and B. J. McNeil, “A method of comparing the areas under receiver operating characteristic curves derived from the same cases,” Radiology, vol. 148, no. 3, pp. 839–843, 1983.
  72. T. Jiang and A. E. Keating, “AVID: an integrative framework for discovering functional relationship among proteins,” BMC Bioinformatics, vol. 6, article 136, 2005.