Artificial Evolution Methods in the Biological and Biomedical Sciences
Research Article | Open Access
Edgar D. Arenas-Díaz, Helga Ochoterena, Katya Rodríguez-Vázquez, "Multiple Sequence Alignment Using a Genetic Algorithm and GLOCSA", Journal of Artificial Evolution and Applications, vol. 2009, Article ID 963150, 10 pages, 2009. https://doi.org/10.1155/2009/963150
Multiple Sequence Alignment Using a Genetic Algorithm and GLOCSA
Abstract
Algorithms that minimize putative synapomorphy in an alignment cannot be implemented directly, because they would select trivial cases with concatenated sequences, which imply a minimum number of events to be explained (e.g., a single insertion/deletion would suffice to explain the divergence between two sequences). Indirect measures that approach parsimony are therefore needed. In this paper, we present in detail a Global Criterion for Sequence Alignment (GLOCSA), which uses a scoring function to rate multiple alignments globally, with the aim of producing matrices that minimize the number of putative synapomorphies. We also present a Genetic Algorithm that uses GLOCSA as its objective function to refine alignments previously generated by other existing alignment tools (we recommend MUSCLE). We show that, in the example cases, our GLOCSA-guided Genetic Algorithm (GGGA) improves the GLOCSA values, resulting in alignments that imply fewer putative synapomorphies.
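The trivial case described above can be illustrated with a small sketch. It is not GLOCSA and not the authors' measure: `count_events` is a hypothetical helper, and its counting convention (distinct residues per column as substitutions, each maximal gap run in a row as one indel event) is an assumption made only for this illustration. Under that assumption, placing two unrelated sequences end to end removes every substitution at the cost of a couple of gap runs, so a direct minimizer would prefer it to a biologically plausible alignment.

```python
def count_events(alignment):
    """Naive event count for an alignment given as equal-length strings.

    Assumed convention (illustrative only, not GLOCSA):
    - a column with k distinct residues (gaps ignored) costs k - 1 substitutions;
    - each maximal run of '-' in a row counts as one indel event.
    Returns a (substitutions, indels) tuple.
    """
    rows = list(alignment)
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all rows must have the same length")

    substitutions = 0
    for col in zip(*rows):
        residues = {c for c in col if c != "-"}
        substitutions += max(len(residues) - 1, 0)

    indels = 0
    for row in rows:
        in_gap = False
        for c in row:
            if c == "-" and not in_gap:
                indels += 1  # a new gap run starts here
            in_gap = (c == "-")

    return substitutions, indels


# Two divergent sequences aligned column by column, without gaps.
ungapped = ["ACGTAC", "TGCATG"]
# The same sequences "concatenated": no residue ever shares a column.
concatenated = ["ACGTAC------", "------TGCATG"]

print(count_events(ungapped))      # (6, 0)
print(count_events(concatenated))  # (0, 2)
```

The concatenated arrangement scores fewer total events (2 versus 6) even though it carries no information about homology, which is why an indirect criterion is needed as the objective function.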
Copyright
Copyright © 2009 Edgar D. Arenas-Díaz et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.