BioMed Research International
Volume 2014, Article ID 134023, 13 pages
http://dx.doi.org/10.1155/2014/134023
Review Article

Managing, Analysing, and Integrating Big Data in Medical Bioinformatics: Open Problems and Future Perspectives

1 Bioinformatics Research Unit, Institute for Biomedical Technologies, National Research Council of Italy, Segrate, 20090 Milan, Italy
2 Bioinformatics and High Performance Computing Research Group (BIO-HPC), Computer Science Department, Universidad Católica San Antonio de Murcia (UCAM), 30107 Murcia, Spain
3 Department of Computer Science and Engineering, Center for Research Computing, University of Notre Dame, P.O. Box 539, Notre Dame, IN 46556, USA
4 Advanced Computing Systems and High Performance Computing Group, Institute of Applied Mathematics and Information Technologies, National Research Council of Italy, 16149 Genoa, Italy

Received 18 June 2014; Accepted 13 August 2014; Published 1 September 2014

Academic Editor: Carlo Cattani

Copyright © 2014 Ivan Merelli et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. Y. Genovese and S. Prentice, “Pattern-based strategy: getting value from big data,” Gartner Special Report G00214032, 2011.
  2. The Big Data Research and Development Initiative, http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf.
  3. R. M. Durbin, G. R. Abecasis, R. M. Altshuler et al., “A map of human genome variation from population-scale sequencing,” Nature, vol. 467, pp. 1061–1073, 2010.
  4. B. J. Raney, M. S. Cline, K. R. Rosenbloom et al., “ENCODE whole-genome data in the UCSC genome browser (2011 update),” Nucleic Acids Research, vol. 39, no. 1, pp. D871–D875, 2011.
  5. International Human Genome Sequencing Consortium, “Initial sequencing and analysis of the human genome,” Nature, vol. 409, pp. 860–921, 2001.
  6. The Genome 10K Project, https://genome10k.soe.ucsc.edu/.
  7. The 100,000 Genomes Project, http://www.genomicsengland.co.uk/.
  8. E. C. Hayden, “The $1,000 genome,” Nature, vol. 507, no. 7492, pp. 294–295, 2014.
  9. A. Sboner, X. J. Mu, D. Greenbaum, R. K. Auerbach, and M. B. Gerstein, “The real cost of sequencing: higher than you think!,” Genome Biology, vol. 12, no. 8, article 125, 2011.
  10. E. Pinheiro, W.-D. Weber, and L. A. Barroso, “Failure trends in a large disk drive population,” in Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07), pp. 17–29, 2007.
  11. IBM, Big data and analytics, Tools and technologies for architects and developers, http://www.ibm.com/developerworks/library/bd-archpatterns1/index.html?ca=drs.
  12. Oracle and Big Data, http://www.oracle.com/us/technologies/big-data/index.html.
  13. Microsoft, Reimagining your business with Big Data and Analytics, http://www.microsoft.com/enterprise/it-trends/big-data/default.aspx?Search=true#fbid=SPTCZLfS2h_.
  14. J. Torres, Big Data Challenges in Bioinformatics, 2014, http://www.jorditorres.org/wp-content/uploads/2014/02/XerradaBIB.pdf.
  15. “The GÉANT pan-European research and education network,” http://www.geant.net/Pages/default.aspx.
  16. The Internet2 Network, http://www.internet2.edu/research-solutions/case-studies/xsedenet-advanced-network-advancing-science/.
  17. Renee Boucher Ferguson, “Big Data: Service From the Cloud,” http://sloanreview.mit.edu/article/big-data-service-from-the-cloud/.
  18. T. D. Thanh, S. Mohan, E. Choi, K. SangBum, and P. Kim, “A taxonomy and survey on distributed file systems,” in Proceedings of the 4th International Conference on Networked Computing and Advanced Information Management (NCM '08), vol. 1, pp. 144–149, Gyeongju, Republic of Korea, September 2008.
  19. The IBM General Parallel File System, http://www-03.ibm.com/software/products/en/software.
  20. The OpenSFS and Lustre Community Portal, http://lustre.opensfs.org/.
  21. “The Top500 List,” 2014, http://www.top500.org/.
  22. IBM Elastic Storage, http://www-03.ibm.com/systems/platformcomputing/products/gpfs/.
  23. The Apache Hadoop Project, 2014, http://hadoop.apache.org/.
  24. S. Ghemawat, H. Gobioff, and S. Leung, “The google file system,” in Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP '03), pp. 29–43, October 2003.
  25. The Message Passing Interface, 2014, http://www.mpi-forum.org/.
  26. S. J. Plimpton and K. D. Devine, “MapReduce in MPI for large-scale graph algorithms,” Parallel Computing, vol. 37, no. 9, pp. 610–632, 2011.
  27. X. Lu, F. Liang, B. Wang, L. Zha, and Z. Xu, “DataMPI: extending MPI to hadoop-like big data computing,” in Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS '14), 2014.
  28. G. Ostrouchov, D. Schmidt, W.-C. Chen, and P. Patel, “Combining R with scalable libraries to get the best of both for big data,” in International Association for Statistical Computing Satellite Conference for the 59th ISI World Statistics Congress, pp. 85–90, 2013.
  29. The Hadoop Ecosystem Table, 2014, http://hadoopecosystemtable.github.io/.
  30. List of institutions that are using Hadoop for educational or production uses, 2014, http://wiki.apache.org/hadoop/PoweredBy.
  31. The Genome Analysis Toolkit, 2014, http://www.broadinstitute.org/gatk/.
  32. A. McKenna, M. Hanna, E. Banks et al., “The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data,” Genome Research, vol. 20, no. 9, pp. 1297–1303, 2010.
  33. H. Nordberg, K. Bhatia, K. Wang, and Z. Wang, “BioPig: a Hadoop-based analytic toolkit for large-scale sequence data,” Bioinformatics, vol. 29, no. 23, pp. 3014–3019, 2013.
  34. Q. Zou, X. B. Li, W. R. Jiang, Z. Y. Lin, G. L. Li, and K. Chen, “Survey of MapReduce frame operation in bioinformatics,” Briefings in Bioinformatics, vol. 15, no. 4, pp. 637–647, 2014.
  35. R. C. Taylor, “An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics,” BMC Bioinformatics, vol. 11, no. 12, article S1, 2010.
  36. “Hadoop Bioinformatics Applications on Quora,” http://hadoopbioinfoapps.quora.com/.
  37. M. Niemenmaa, A. Kallio, A. Schumacher, P. Klemelä, E. Korpelainen, and K. Heljanko, “Hadoop-BAM: directly manipulating next generation sequencing data in the cloud,” Bioinformatics, vol. 28, no. 6, pp. 876–877, 2012.
  38. L. Pireddu, S. Leo, and G. Zanetti, “Seal: a distributed short read mapping and duplicate removal tool,” Bioinformatics, vol. 27, no. 15, pp. 2159–2160, 2011.
  39. The Apache Hive data warehouse software, 2014, http://hive.apache.org/.
  40. M. Krzywinski, I. Birol, S. J. Jones, and M. A. Marra, “Hive plots-rational approach to visualizing networks,” Briefings in Bioinformatics, vol. 13, no. 5, pp. 627–644, 2012.
  41. “The Apache Pig platform,” http://pig.apache.org/.
  42. The Disco project, 2014, http://discoproject.org/.
  43. P. J. A. Cock, T. Antao, J. T. Chang et al., “Biopython: freely available Python tools for computational molecular biology and bioinformatics,” Bioinformatics, vol. 25, no. 11, pp. 1422–1423, 2009.
  44. “The Apache Storm system,” http://storm.incubator.apache.org/.
  45. I. Merelli, H. Pérez-Sánchez, S. Gesing, and D. D'Agostino, “Latest advances in distributed, parallel, and graphic processing unit accelerated approaches to computational biology,” Concurrency and Computation: Practice and Experience, vol. 26, no. 10, pp. 1699–1704, 2014.
  46. D. D'Agostino, A. Clematis, A. Quarati et al., “Cloud infrastructures for in silico drug discovery: economic and practical aspects,” BioMed Research International, vol. 2013, Article ID 138012, 19 pages, 2013.
  47. U. Rencuzogullari and S. Dwarkadas, “Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations,” in Proceedings of the 8th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 72–81, June 2001.
  48. P. E. C. Compeau, P. A. Pevzner, and G. Tesler, “How to apply de Bruijn graphs to genome assembly,” Nature Biotechnology, vol. 29, no. 11, pp. 987–991, 2011.
  49. J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein, S. J. M. Jones, and I. Birol, “ABySS: a parallel assembler for short read sequence data,” Genome Research, vol. 19, no. 6, pp. 1117–1123, 2009.
  50. M. Garland, S. Le Grand, J. Nickolls et al., “Parallel computing experiences with CUDA,” IEEE Micro, vol. 28, no. 4, pp. 13–27, 2008.
  51. The Open Computing Language standard, 2014, https://www.khronos.org/opencl/.
  52. M. Garland and D. B. Kirk, “Understanding throughput-oriented architectures,” Communications of the ACM, vol. 53, no. 11, pp. 58–66, 2010.
  53. GPU applications for Bioinformatics and Life Sciences, http://www.nvidia.com/object/bio_info_life_sciences.html.
  54. “The Intel Xeon Phi coprocessor performance,” http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html.
  55. H. Li and R. Durbin, “Fast and accurate short read alignment with Burrows-Wheeler transform,” Bioinformatics, vol. 25, no. 14, pp. 1754–1760, 2009.
  56. R. D. Finn, J. Clements, and S. R. Eddy, “HMMER web server: interactive sequence similarity searching,” Nucleic Acids Research, vol. 39, no. 2, pp. W29–W37, 2011.
  57. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
  58. J. Fang, H. Sips, L. Zhang, C. Xu, and A. L. Varbanescu, “Test-driving intel Xeon Phi,” in Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering (ICPE'14), pp. 137–148, 2014.
  59. E. Afgan, D. Baker, N. Coraor, B. Chapman, A. Nekrutenko, and J. Taylor, “Galaxy CloudMan: delivering cloud compute clusters,” BMC Bioinformatics, vol. 11, supplement 12, article S4, 2010.
  60. L. Dai, X. Gao, Y. Guo, J. Xiao, and Z. Zhang, “Bioinformatics clouds for big data manipulation,” Biology Direct, vol. 7, article 43, 2012.
  61. C. Cava, F. Gallivanone, C. Salvatore, P. della Rosa, and I. Castiglioni, “Bioinformatics clouds for high-throughput technologies,” in Handbook of Research on Cloud Infrastructures for Big Data Analytics, chapter 20, IGI Global, 2014.
  62. N. Cannata, M. Schröder, R. Marangoni, and P. Romano, “A Semantic Web for bioinformatics: goals, tools, systems, applications,” BMC Bioinformatics, vol. 9, supplement 4, article S1, 2008.
  63. H. Wache, T. Vögele, U. Visser et al., “Ontology-based integration of information a survey of existing approaches,” in Proceedings of the Workshop on Ontologies and Information Sharing (IJCAI '01), pp. 108–117, 2001.
  64. “OWL Web Ontology Language Overview,” W3C Recommendation, 2004, http://www.w3.org/TR/owl-features/.
  65. E. Antezana, W. Blondé, M. Egaña et al., “BioGateway: a semantic systems biology tool for the life sciences,” BMC Bioinformatics, vol. 10, no. 10, article S11, 2009.
  66. The GO File Format Guide, http://geneontology.org/book/documentation/file-format-guide.
  67. A. West, “ML, RDF, OWL and LSID: Ontology integration within evolving “omic” standards,” in Proceedings of the Invited Talk International Oracle Life Sciences and Healthcare User Group Meeting, Boston, Mass, USA, 2006.
  68. X. Wang, R. Gorlitsky, and J. S. Almeida, “From XML to RDF: how semantic web technologies will change the design of “omic” standards,” Nature Biotechnology, vol. 23, no. 9, pp. 1099–1103, 2005.
  69. S. Stephens, D. LaVigna, M. DiLascio, and J. Luciano, “Aggregation of bioinformatics data using Semantic Web technology,” Journal of Web Semantics, vol. 4, no. 3, pp. 216–221, 2006.
  70. N. Sioutos, S. D. Coronado, M. W. Haber, F. W. Hartel, W. Shaiu, and L. W. Wright, “NCI Thesaurus: a semantic model integrating cancer-related clinical and molecular information,” Journal of Biomedical Informatics, vol. 40, no. 1, pp. 30–43, 2007.
  71. J. Golbeck, G. Fragoso, F. Hartel, J. Hendler, J. Oberthaler, and B. Parsia, “The National Cancer Institute's thésaurus and ontology,” Web Semantics, vol. 1, no. 1, pp. 75–80, 2003.
  72. A. Kozlenkov and M. Schroeder, “PROVA: rule-based java-scripting for a bioinformatics semantic web,” in Data Integration in the Life Sciences, vol. 2994 of Lecture Notes in Computer Science, pp. 17–30, Springer, Berlin, Germany, 2004.
  73. R. D. Stevens, H. J. Tipney, C. J. Wroe et al., “Exploring Williams-Beuren syndrome using myGrid,” Bioinformatics, vol. 20, supplement 1, pp. i303–i310, 2004.
  74. J. A. Blake and C. J. Bult, “Beyond the data deluge: data integration and bio-ontologies,” Journal of Biomedical Informatics, vol. 39, no. 3, pp. 314–320, 2006.
  75. F. Viti, I. Merelli, A. Calabria et al., “Ontology-based resources for bioinformatics analysis,” International Journal of Metadata, Semantics and Ontologies, vol. 6, no. 1, pp. 35–45, 2011.
  76. J. D. Osborne, J. Flatow, M. Holko et al., “Annotating the human genome with disease ontology,” BMC Genomics, vol. 10, supplement 1, article S6, 2009.
  77. I. Merelli, A. Calabria, P. Cozzi, F. Viti, E. Mosca, and L. Milanesi, “SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS,” BMC Bioinformatics, vol. 14, supplement 1, article S9, 2013.
  78. E. Mosca, R. Alfieri, I. Merelli, F. Viti, A. Calabria, and L. Milanesi, “A multilevel data integration resource for breast cancer study,” BMC Systems Biology, vol. 4, article 76, 2010.
  79. E. Mosca, R. Alfieri, F. Viti, I. Merelli, and L. Milanesi, “Nervous system database (NSD): data integration spanning molecular and system levels,” in Proceedings of the Frontiers in Neuroinformatics Conference Abstract: Neuroinformatics, 2009.
  80. F. Viti, I. Merelli, A. Caprera, B. Lazzari, A. Stella, and L. Milanesi, “Ontology-based, tissue microarray oriented, image centered tissue bank,” BMC Bioinformatics, vol. 9, supplement 4, article S4, 2008.
  81. I. Merelli, P. Cozzi, D. D'Agostino, A. Clematis, and L. Milanesi, “Image-based surface matching algorithm oriented to structural biology,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 4, pp. 1004–1016, 2011.
  82. R. Alfieri, I. Merelli, E. Mosca, and L. Milanesi, “The cell cycle DB: a systems biology approach to cell cycle analysis,” Nucleic Acids Research, vol. 36, no. 1, pp. D641–D645, 2008.
  83. R. G. Côté, P. Jones, R. Apweiler, and H. Hermjakob, “The ontology lookup service, a lightweight crossplatform tool for controlled vocabulary queries,” BMC Bioinformatics, vol. 7, article 97, 2006.
  84. The Gene Ontology Consortium, “The gene ontology’s reference genome project: a unified framework for functional annotation across species,” PLoS Computational Biology, vol. 5, no. 7, Article ID e1000431, 2009.
  85. The KEGG Orthology Database, http://www.genome.jp/kegg/ko.html.
  86. A. Chang, M. Scheer, A. Grote, I. Schomburg, and D. Schomburg, “BRENDA, AMENDA and FRENDA the enzyme information system: New content and tools in 2009,” Nucleic Acids Research, vol. 37, no. 1, pp. D588–D592, 2009.
  87. J. Bard, S. Y. Rhee, and M. Ashburner, “An ontology for cell types,” Genome Biology, vol. 6, no. 2, article R21, 2005.
  88. D. A. Natale, C. N. Arighi, W. C. Barker et al., “Framework for a protein ontology,” BMC Bioinformatics, vol. 8, supplement 9, article S1, 2007.
  89. S. J. Nelson, M. Schopen, A. G. Savage, J. Schulman, and N. Arluk, “The MeSH translation maintenance system: structure, interface design, and implementation,” Studies in Health Technology and Informatics, vol. 107, part 1, pp. 67–69, 2004.
  90. C. A. Orengo, A. D. Michie, S. Jones, D. T. Jones, M. B. Swindells, and J. M. Thornton, “CATH: a hierarchic classification of protein domain structures,” Structure, vol. 5, no. 8, pp. 1093–1108, 1997.
  91. The Bio2RDF Project, 2014, http://bio2rdf.org/.
  92. F. Belleau, M. Nolin, N. Tourigny, P. Rigault, and J. Morissette, “Bio2RDF: towards a mashup to build bioinformatics knowledge systems,” Journal of Biomedical Informatics, vol. 41, no. 5, pp. 706–716, 2008.
  93. S. Jupp, J. Malone, J. Bolleman et al., “The EBI RDF platform: linked open data for the life sciences,” Bioinformatics, vol. 30, no. 9, pp. 1338–1339, 2014.
  94. The Open PHACTS platform, http://www.openphacts.org.
  95. The AtlasRDF-R Package, 2014, https://github.com/jamesmalone/AtlasRDF-R.
  96. L. Torterolo, I. Porro, M. Fato, M. Melato, A. Calanducci, and R. Barbera, “Building science gateways with enginframe: a life science example,” in Proceedings of the International Workshop on Portals for Life Sciences (IWPLS '09), 2009.
  97. Liferay, 2014, https://www.liferay.com/.
  98. J. Goecks, A. Nekrutenko, J. Taylor et al., “Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences,” Genome Biology, vol. 11, no. 8, article R86, 2010.
  99. D. Blankenberg, G. V. Kuster, N. Coraor et al., “Galaxy: a web-based genome analysis tool for experimentalists,” Current Protocols in Molecular Biology, 2010.
  100. B. Giardine, C. Riemer, R. C. Hardison et al., “Galaxy: a platform for interactive large-scale genome analysis,” Genome Research, vol. 15, no. 10, pp. 1451–1455, 2005.
  101. Drupal, https://www.drupal.org/.
  102. Joomla, 2014, http://www.joomla.org/.
  103. Django, 2014, https://www.djangoproject.com/.
  104. K. Megy, S. J. Emrich, D. Lawson et al., “VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics,” Nucleic Acids Research, vol. 40, no. 1, pp. D729–D734, 2012.
  105. R. Dooley, M. Vaughn, D. Stanzione, S. Terry, and E. Skidmore, “Software-as-a-service: the iPlant foundation API,” in Proceedings of the 5th IEEE Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS '12), 2012.
  106. R. Ananthakrishnan, K. Chard, I. Foster, and S. Tuecke, “Globus platform-as-a-service for collaborative science applications,” Concurrency and Computation: Practice and Experience, 2014.
  107. W. Allcock, “GridFTP: Protocol Extensions to FTP for the Grid,” Global Grid Forum GFD-R-P.020 Proposed Recommendation, 2003.
  108. Dropbox, 2014, https://www.dropbox.com.
  109. K. G. Helmer, J. L. Ambite, J. Ames et al., “Enabling collaborative research using the Biomedical Informatics Research Network (BIRN),” Journal of the American Medical Informatics Association, vol. 18, pp. 416–422, 2011.
  110. A. Hajnal, Z. Farkas, and P. Kacsuk, “Data avenue: remote storage resource management in WS-PGRADE/ gUSE,” in Proceedings of the 6th International Workshop on Science Gateways (IWSG 2014), June 2014.
  111. P. Kacsuk, Z. Farkas, M. Kozlovszky et al., “WS-PGRADE/ gUSE generic DCI gateway framework for a large variety of user communities,” Journal of Grid Computing, vol. 10, no. 4, pp. 601–630, 2012.
  112. A. Shoshani, A. Sim, and J. Gu, “Storage resource managers: essential components for the Grid,” in Grid Resource Management, Kluwer Academic Publishers, 2003.
  113. Amazon Simple Storage Service, http://aws.amazon.com/s3.
  114. “Integrated Rule-Oriented Data System,” 2014, https://www.irods.org.
  115. J. Krüger, R. Grunzke, S. Gesing et al., “The moSGrid science gateway: a complete solution for molecular simulations,” Journal of Chemical Theory and Computation, vol. 10, no. 6, pp. 2232–2245, 2014.
  116. Apache Lucene project, 2014, http://lucene.apache.org/.
  117. F. Hupfeld, T. Cortes, B. Kolbeck et al., “The XtreemFS architecture: a case for object-based file systems in Grids,” Concurrency and Computation: Practice and Experience, vol. 20, no. 17, pp. 2049–2060, 2008.
  118. S. Razick, R. Močnik, L. F. Thomas, E. Ryeng, F. Drabløs, and P. Sætrom, “The eGenVar data management system-cataloguing and sharing sensitive data and metadata for the life sciences,” Database, 2014.
  119. I. Foster, C. Kesselman, G. Tsudik, and S. Tuecke, “A security infrastructure for computational grids,” in Proceedings of the 5th ACM Conference on Computer and Communications Security, pp. 83–92, 1998.
  120. T. Hey, S. Tansley, and K. Tolle, The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, Redmond, Wash, USA, 2009.
  121. Era7 Bioinformatics, 2014, http://era7bioinformatics.com.
  122. DNAnexus, 2014, https://dnanexus.com/.
  123. Seven Bridges Genomics, https://www.sbgenomics.com.
  124. EagleGenomics, 2014, http://www.eaglegenomics.com.
  125. MaverixBio, 2014, http://www.maverixbio.com.
  126. Illumina, 2014, http://www.illumina.com.
  127. BGI, http://www.genomics.cn/en/index.
  128. E. H. Downs, “UF hires bioinformatics expert,” 2014, https://m.ufhealth.org/news/2014/uf-hires-bioinformatics-expert.
  129. McKinsey & Company, “How big data can revolutionize pharmaceutical R&D,” http://www.mckinsey.com/insights/health_systems_and_services/how_big_data_can_revolutionize_pharmaceutical_r_and_d.
  130. Medill Reports, 2014, http://news.medill.northwestern.edu/chicago/news.aspx?id=228875.
  131. PatientsLikeMe, http://www.patientslikeme.com/.