International Journal of Genomics


Research Article | Open Access

Volume 2007 | Article ID 035604 | https://doi.org/10.1155/2007/35604

B. Jayashree, Manindra S. Hanspal, Rajgopal Srinivasan, R. Vigneshwaran, Rajeev K. Varshney, N. Spurthi, K. Eshwar, N. Ramesh, S. Chandra, David A. Hoisington, "An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms", International Journal of Genomics, vol. 2007, Article ID 035604, 7 pages, 2007. https://doi.org/10.1155/2007/35604

An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms

Academic Editor: Peter Little
Received 02 May 2007
Revised 20 Jul 2007
Accepted 19 Oct 2007
Published 02 Dec 2007

Abstract

The large amounts of EST sequence data available for individual species, and for several species within a genus, provide a ready source for identifying intra- and interspecies single nucleotide polymorphisms (SNPs). For model organisms the data are particularly plentiful, given the high redundancy of the deposited ESTs. Several bioinformatics tools can be used to mine these data; however, using them requires a certain level of expertise: the tools must be run sequentially, with format conversion between them, and steps such as clustering and assembly of sequences become time-intensive even for moderately sized datasets. We report here a pipeline of open source software, extended to run on multi-CPU architectures, that mines large EST datasets for SNPs and identifies restriction sites for assaying them, so that cost-effective cleaved amplified polymorphic sequence (CAPS) assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We validated the pipeline by mining chickpea ESTs for interspecies SNPs, developing CAPS assays for SNP genotyping, and confirming the restriction digestion pattern at the sequence level.
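The abstract combines two computational steps: calling SNPs from redundant, assembled ESTs, and checking whether one SNP allele creates or abolishes a restriction site so that a CAPS assay can tell the alleles apart. The sketch below illustrates both ideas under stated assumptions and is not the ICRISAT pipeline's code. It assumes the reads are already clustered, assembled and trimmed to equal, ungapped length; the three-enzyme table, the function names (`candidate_snps`, `consensus`, `site_covers`, `caps_enzymes`) and the `min_minor_count` threshold are hypothetical and chosen for illustration only.

```python
from collections import Counter

# Hypothetical enzyme subset: name -> recognition site (5'->3').
# All three sites are palindromic, so scanning one strand is enough here.
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "TaqI": "TCGA"}


def candidate_snps(reads, min_minor_count=2):
    """Return (column, major_base, minor_base) for polymorphic columns.

    Assumes `reads` are equal-length, ungapped sequences in the same frame
    (i.e. after clustering and assembly). A column is reported only if the
    second most common base occurs in at least `min_minor_count` reads, a
    simple redundancy filter against one-off sequencing errors.
    """
    snps = []
    for col in range(len(reads[0])):
        counts = Counter(r[col] for r in reads if r[col] in "ACGT")
        if len(counts) < 2:
            continue  # monomorphic column
        (major, _), (minor, minor_n) = counts.most_common(2)
        if minor_n >= min_minor_count:
            snps.append((col, major, minor))
    return snps


def consensus(reads):
    """Majority base at each column."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def site_covers(seq, motif, col):
    """True if some occurrence of `motif` in `seq` overlaps position `col`."""
    start_min = max(0, col - len(motif) + 1)
    start_max = min(col, len(seq) - len(motif))
    return any(seq[s:s + len(motif)] == motif for s in range(start_min, start_max + 1))


def caps_enzymes(ref, snp, enzymes=ENZYMES):
    """Enzymes whose site spanning the SNP is present in exactly one allele."""
    col, major, minor = snp
    allele_a = ref[:col] + major + ref[col + 1:]
    allele_b = ref[:col] + minor + ref[col + 1:]
    return [name for name, motif in enzymes.items()
            if site_covers(allele_a, motif, col) != site_covers(allele_b, motif, col)]


if __name__ == "__main__":
    # Toy data: three reads carry T at column 5 (inside GAATTC), two carry G.
    reads = ["CCGAATTCGG"] * 3 + ["CCGAAGTCGG"] * 2
    ref = consensus(reads)
    for snp in candidate_snps(reads):
        print(snp, caps_enzymes(ref, snp))  # prints (5, 'T', 'G') ['EcoRI']
```

In this toy example the T allele keeps an EcoRI site that the G allele destroys, so EcoRI digestion of the amplicon would separate the two genotypes. A realistic implementation would also need to handle alignment gaps, base-quality scores, degenerate and non-palindromic recognition sites, and whether the resulting fragment sizes differ enough to resolve on a gel; the sketch omits all of these.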

Copyright © 2007 B. Jayashree et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.