BioMed Research International
Volume 2014, Article ID 348725, 12 pages
http://dx.doi.org/10.1155/2014/348725
Research Article

Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

1RADICAL, ECE, Rutgers University, New Brunswick, NJ 08901, USA
2Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA
3Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA

Received 6 March 2014; Accepted 8 May 2014; Published 9 June 2014

Academic Editor: Daniele D’Agostino

Copyright © 2014 Anjani Ragothaman et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. R. Chen and M. Snyder, “Systems biology: personalized medicine for the future?” Current Opinion in Pharmacology, vol. 12, no. 5, pp. 623–628, 2012.
  2. X. Feng, X. Liu, Q. Luo, and B.-F. Liu, “Mass spectrometry in systems biology: an overview,” Mass Spectrometry Reviews, vol. 27, no. 6, pp. 635–660, 2008.
  3. S. C. Schuster, “Next-generation sequencing transforms today's biology,” Nature Methods, vol. 5, no. 1, pp. 16–18, 2008.
  4. T. S. Furey, “ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions,” Nature Reviews Genetics, vol. 13, no. 12, pp. 840–852, 2012.
  5. R. Chen and M. Snyder, “Promise of personalized omics to precision medicine,” Wiley Interdisciplinary Reviews: Systems Biology and Medicine, vol. 5, no. 1, pp. 73–82, 2013.
  6. E. R. Mardis, “Next-generation DNA sequencing methods,” Annual Review of Genomics and Human Genetics, vol. 9, pp. 387–402, 2008.
  7. Z. Wang, M. Gerstein, and M. Snyder, “RNA-Seq: a revolutionary tool for transcriptomics,” Nature Reviews Genetics, vol. 10, no. 1, pp. 57–63, 2009.
  8. J. Shendure and E. L. Aiden, “The expanding scope of DNA sequencing,” Nature Biotechnology, vol. 30, no. 11, pp. 1084–1094, 2012.
  9. J. D. McPherson, “Next-generation gap,” Nature Methods, vol. 6, no. 11, supplement, pp. S2–S5, 2009.
  10. A. S. Juncker, L. J. Jensen, A. Pierleoni et al., “Sequence-based feature prediction and annotation of proteins,” Genome Biology, vol. 10, no. 2, article 206, 2009.
  11. Y. Loewenstein, D. Raimondo, O. C. Redfern et al., “Protein function annotation by homology-based inference,” Genome Biology, vol. 10, no. 2, article 207, 2009.
  12. J. Skolnick, J. S. Fetrow, and A. Kolinski, “Structural genomics and its importance for gene function analysis,” Nature Biotechnology, vol. 18, no. 3, pp. 283–287, 2000.
  13. A. M. Schnoes, S. D. Brown, I. Dodevski, and P. C. Babbitt, “Annotation error in public databases: misannotation of molecular function in enzyme superfamilies,” PLoS Computational Biology, vol. 5, no. 12, Article ID e1000605, 2009.
  14. J. Skolnick and M. Brylinski, “FINDSITE: a combined evolution/structure-based approach to protein function prediction,” Briefings in Bioinformatics, vol. 10, no. 4, pp. 378–391, 2009.
  15. J. A. Capra, R. A. Laskowski, J. M. Thornton, M. Singh, and T. A. Funkhouser, “Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure,” PLoS Computational Biology, vol. 5, no. 12, Article ID e1000585, 2009.
  16. F. Glaser, Y. Rosenberg, A. Kessel, T. Pupko, and N. Ben-Tal, “The ConSurf-HSSP database: the mapping of evolutionary conservation among homologs onto PDB structures,” Proteins: Structure, Function and Genetics, vol. 58, no. 3, pp. 610–617, 2005.
  17. M. Brylinski, “Unleashing the power of metathreading for evolution/structure-based function inference of proteins,” Frontiers in Genetics, vol. 4, article 118, 2013.
  18. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
  19. T. Gunarathne, T.-L. Wu, J. Y. Choi, S. Bae, and J. Qiu, “Cloud computing paradigms for pleasingly parallel biomedical applications,” Concurrency Computation Practice and Experience, vol. 23, no. 17, pp. 2338–2354, 2011.
  20. S. Jha, D. S. Katz, A. Luckow, A. Merzky, and K. Stamou, “Understanding scientific applications for cloud environments,” in Cloud Computing: Principles and Paradigms, p. 664, 2011.
  21. R. C. Taylor, “An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics,” BMC Bioinformatics, vol. 11, supplement 12, article S1, 2010.
  22. J. Gurtowski, M. C. Schatz, and B. Langmead, “Genotyping in the cloud with crossbow,” in Current Protocols in Bioinformatics, chapter 15, unit 15.3, 2012.
  23. M. C. Schatz, “CloudBurst: highly sensitive read mapping with MapReduce,” Bioinformatics, vol. 25, no. 11, pp. 1363–1369, 2009.
  24. M. Baker, “Next-generation sequencing: adjusting to data overload,” Nature Methods, vol. 7, no. 7, pp. 495–499, 2010.
  25. J. Kim, S. Maddineni, and S. Jha, “Advancing next-generation sequencing data analytics with scalable distributed infrastructure,” Concurrency Computation Practice and Experience, vol. 26, no. 4, pp. 894–906, 2014.
  26. P. K. Mantha, N. Kim, A. Luckow, J. Kim, and S. Jha, “Understanding MapReduce-based next-generation sequencing alignment on distributed cyberinfrastructure,” in Proceedings of the 3rd International Emerging Computational Methods for the Life Sciences Workshop (ECMLS '12), pp. 3–12, ACM, June 2012.
  27. M. Brylinski and D. Lingam, “eThread: a highly optimized machine learning-based approach to meta-threading and the modeling of protein tertiary structures,” PLoS ONE, vol. 7, no. 11, Article ID e50200, 2012.
  28. M. Brylinski and W. P. Feinstein, “Setting up a metathreading pipeline for high-throughput structural bioinformatics: eThread software distribution, walkthrough and resource profiling,” Journal of Computer Science and Systems Biology, vol. 6, no. 1, pp. 001–010, 2012.
  29. A. Luckow, M. Santcroos, A. Merzky, O. Weidner, P. Mantha, and S. Jha, “P*: a model of pilot-abstractions,” in Proceedings of the IEEE 8th International Conference on E-Science (e-Science '12), pp. 1–10, Chicago, Ill, USA, October 2012.
  30. A. Luckow, L. Lacinski, and S. Jha, “SAGA BigJob: an extensible and interoperable Pilot-Job abstraction for distributed applications and systems,” in Proceedings of the 10th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid '10), pp. 135–144, Melbourne, Australia, May 2010.
  31. SAGA BigJob, http://saga-project.github.io/BigJob/.
  32. A. Luckow, M. Santcroos, O. Weidner, A. Zebrowski, and S. Jha, “Pilot-data: an abstraction for distributed data,” CoRR, http://arxiv.org/abs/1301.6228.
  33. S. Maddineni, J. Kim, Y. El-Khamra, and S. Jha, “Distributed application runtime environment (DARE): a standards-based middleware framework for science-gateways,” Journal of Grid Computing, vol. 10, no. 4, pp. 647–664, 2012. View at Publisher · View at Google Scholar · View at Scopus