Computational Biology Journal
Volume 2013, Article ID 707540, 12 pages
http://dx.doi.org/10.1155/2013/707540
Research Article

SCBI_MapReduce, a New Ruby Task-Farm Skeleton for Automated Parallelisation and Distribution in Chunks of Sequences: The Implementation of a Boosted Blast+

1Supercomputación y Bioinformática-Plataforma Andaluza de Bioinformática (SCBI-PAB), Universidad de Málaga, 29071 Málaga, Spain
2Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, 29071 Málaga, Spain
3Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, 29071 Málaga, Spain

Received 21 June 2013; Revised 18 September 2013; Accepted 19 September 2013

Academic Editor: Ivan Merelli

Copyright © 2013 Darío Guerrero-Fernández et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. C. Huttenhower and O. Hofmann, “A quick guide to large-scale genomic data mining,” PLoS Computational Biology, vol. 6, no. 5, Article ID e1000779, 2010.
  2. M. C. Schatz, B. Langmead, and S. L. Salzberg, “Cloud computing and the DNA data race,” Nature Biotechnology, vol. 28, no. 7, pp. 691–693, 2010.
  3. D. Patterson, “The trouble with multi-core,” IEEE Spectrum, vol. 47, no. 7, pp. 28–53, 2010.
  4. C. Camacho, G. Coulouris, V. Avagyan et al., “BLAST+: architecture and applications,” BMC Bioinformatics, vol. 10, article 421, 2009.
  5. S. Gálvez, D. Díaz, P. Hernández, F. J. Esteban, J. A. Caballero, and G. Dorado, “Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment,” Bioinformatics, vol. 26, no. 5, pp. 683–686, 2010.
  6. H. Lin, X. Ma, W. Feng, and N. F. Samatova, “Coordinating computation and I/O in massively parallel sequence search,” IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, pp. 529–543, 2011.
  7. T. Nguyen, W. Shi, and D. Ruden, “CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping,” BMC Research Notes, vol. 4, article 171, 2011.
  8. T. Rognes, “Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation,” BMC Bioinformatics, vol. 12, article 221, 2011.
  9. X.-L. Yang, Y.-L. Liu, C.-F. Yuan, and Y.-H. Huang, “Parallelization of BLAST with MapReduce for long sequence alignment,” in Proceedings of the 4th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP '11), pp. 241–246, IEEE Computer Society, December 2011.
  10. B. Langmead, M. C. Schatz, J. Lin, M. Pop, and S. L. Salzberg, “Searching for SNPs with cloud computing,” Genome Biology, vol. 10, no. 11, article R134, 2009.
  11. M. Needham, R. Hu, S. Dwarkadas, and X. Qiu, “Hierarchical parallelization of gene differential association analysis,” BMC Bioinformatics, vol. 12, article 374, 2011.
  12. M. K. Gardner, W.-C. Feng, J. Archuleta, H. Lin, and X. Ma, “Parallel genomic sequence-searching on an ad-hoc grid: experiences, lessons learned, and implications,” in Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, vol. 1, pp. 1–14, 2006.
  13. L. Yu, C. Moretti, A. Thrasher, S. Emrich, K. Judd, and D. Thain, “Harnessing parallelism in multicore clusters with the All-Pairs, Wavefront, and Makeflow abstractions,” Cluster Computing, vol. 13, no. 3, pp. 243–256, 2010.
  14. M. K. Chen and K. Olukotun, “The Jrpm system for dynamically parallelizing Java programs,” in Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA '03), pp. 434–445, San Diego, Calif, USA, June 2003.
  15. P. Haller and M. Odersky, “Scala Actors: unifying thread-based and event-based programming,” Theoretical Computer Science, vol. 410, no. 2-3, pp. 202–220, 2009.
  16. J. Armstrong, R. Virding, C. Wikström, and M. Williams, Concurrent Programming in ERLANG, Prentice Hall, 2nd edition, 1996.
  17. W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, Mass, USA, 2nd edition, 1999.
  18. L. Dagum and R. Menon, “OpenMP: an industry-standard API for shared-memory programming,” IEEE Computational Science & Engineering, vol. 5, no. 1, pp. 46–55, 1998.
  19. Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, and K. Chen, “Survey of MapReduce frame operation in bioinformatics,” Briefings in Bioinformatics. In press.
  20. R. C. Taylor, “An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics,” BMC Bioinformatics, vol. 11, supplement 12, p. S1, 2010.
  21. J. Lin, “MapReduce is good enough?” Big Data, vol. 1, no. 1, pp. 28–37, 2013.
  22. D. Thain, T. Tannenbaum, and M. Livny, “Distributed computing in practice: the Condor experience,” Concurrency and Computation: Practice and Experience, vol. 17, no. 2-4, pp. 323–356, 2005.
  23. S. Pellicer, G. Chen, K. C. C. Chan, and Y. Pan, “Distributed sequence alignment applications for the public computing architecture,” IEEE Transactions on Nanobioscience, vol. 7, no. 1, pp. 35–43, 2008.
  24. J. Hill, M. Hambley, T. Forster et al., “SPRINT: a new parallel framework for R,” BMC Bioinformatics, vol. 9, article 558, 2008.
  25. J. Li, X. Ma, S. Yoginath, G. Kora, and N. F. Samatova, “Transparent runtime parallelization of the R scripting language,” Journal of Parallel and Distributed Computing, vol. 71, no. 2, pp. 157–168, 2011.
  26. F. Berenger, C. Coti, and K. Y. J. Zhang, “PAR: a PARallel and distributed job crusher,” Bioinformatics, vol. 26, no. 22, pp. 2918–2919, 2010.
  27. M. Aldinucci, M. Torquati, C. Spampinato et al., “Parallel stochastic systems biology in the cloud,” Briefings in Bioinformatics. In press.
  28. A. Matsunaga, M. Tsugawa, and J. Fortes, “CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications,” in Proceedings of the 4th IEEE International Conference on eScience (eScience '08), pp. 222–229, IEEE Computer Society, Washington, DC, USA, December 2008.
  29. W. Lu, J. Jackson, and R. Barga, “AzureBlast: a case study of developing science applications on the cloud,” in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10), pp. 413–420, ACM, Chicago, Ill, USA, June 2010.
  30. P. D. Vouzis and N. V. Sahinidis, “GPU-BLAST: using graphics processors to accelerate protein sequence alignment,” Bioinformatics, vol. 27, no. 2, pp. 182–188, 2011.
  31. C. S. Oehmen and D. J. Baxter, “ScalaBLAST 2.0: rapid and robust BLAST calculations on multiprocessor systems,” Bioinformatics, vol. 29, no. 6, pp. 797–798, 2013.
  32. J. Aerts and A. Law, “An introduction to scripting in Ruby for biologists,” BMC Bioinformatics, vol. 10, article 221, 2009.
  33. S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai, “The impact of performance asymmetry in emerging multicore architectures,” SIGARCH Computer Architecture News, vol. 33, no. 2, pp. 506–517, 2005.
  34. L. Jostins and J. Jaeger, “Reverse engineering a gene network using an asynchronous parallel evolution strategy,” BMC Systems Biology, vol. 4, article 17, 2010.
  35. O. Thorsen, B. Smith, C. P. Sosa et al., “Parallel genomic sequence-search on a massively parallel system,” in Proceedings of the 4th Conference on Computing Frontiers (CF '07), pp. 59–68, Ischia, Italy, May 2007.
  36. M. Armbrust, A. Fox, R. Griffith et al., “A view of cloud computing,” Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.
  37. C.-L. Hung and Y.-L. Lin, “Implementation of a parallel protein structure alignment service on cloud,” International Journal of Genomics, vol. 2013, Article ID 439681, 8 pages, 2013.