Special Issue: Artificial Evolution Methods in the Biological and Biomedical Sciences
Research Article | Open Access
Multiple Sequence Alignment Using a Genetic Algorithm and GLOCSA
Algorithms that directly minimize putative synapomorphy in an alignment cannot be implemented, because they would select trivial alignments of concatenated sequences: such alignments imply the minimum number of events to be explained (e.g., a single insertion/deletion would suffice to explain the divergence between two sequences). Indirect measures that approximate parsimony are therefore needed. In this paper, we present in detail a Global Criterion for Sequence Alignment (GLOCSA), a scoring function that rates multiple alignments globally, with the aim of producing matrices that minimize the number of putative synapomorphies. We also present a Genetic Algorithm that uses GLOCSA as its objective function to refine alignments previously generated by existing alignment tools (we recommend MUSCLE). We show that, in the example cases, our GLOCSA-guided Genetic Algorithm (GGGA) improves the GLOCSA values, resulting in alignments that imply fewer putative synapomorphies.
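The refinement scheme described above (a genetic algorithm that starts from an existing alignment and is guided by a global scoring function) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the GLOCSA formula or the GGGA operators, so `column_score` is a simple stand-in objective (pairwise identities per column), and `mutate`, `crossover`, and `refine` are hypothetical names for generic gap-shifting, row-exchange, and elitist selection steps. It is not the authors' implementation.

```python
import random

GAP = "-"


def column_score(alignment):
    """Stand-in objective (NOT GLOCSA): count identical non-gap pairs per column."""
    score = 0
    for column in zip(*alignment):
        residues = [c for c in column if c != GAP]
        for i in range(len(residues)):
            for j in range(i + 1, len(residues)):
                if residues[i] == residues[j]:
                    score += 1
    return score


def mutate(alignment, rng):
    """Shift one gap in one row by swapping it with a neighboring character.

    The residue order of every sequence and the alignment length are preserved.
    """
    rows = [list(r) for r in alignment]
    row = rng.choice(rows)
    gaps = [i for i, c in enumerate(row) if c == GAP]
    if gaps:
        i = rng.choice(gaps)
        j = i + rng.choice((-1, 1))
        if 0 <= j < len(row):
            row[i], row[j] = row[j], row[i]
    return ["".join(r) for r in rows]


def crossover(parent_a, parent_b, rng):
    """Uniform row crossover: each aligned row comes from one of the two parents."""
    return [rng.choice((ra, rb)) for ra, rb in zip(parent_a, parent_b)]


def refine(seed_alignment, score=column_score, generations=200,
           pop_size=20, rng_seed=0):
    """Refine a seed alignment with an elitist genetic algorithm."""
    rng = random.Random(rng_seed)
    population = [list(seed_alignment) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            a, b = rng.choice(population), rng.choice(population)
            offspring.append(mutate(crossover(a, b, rng), rng))
        # Keep the best individuals among parents and offspring (elitism),
        # so the best score never decreases from one generation to the next.
        population = sorted(population + offspring, key=score, reverse=True)[:pop_size]
    return population[0]


if __name__ == "__main__":
    # Hypothetical seed alignment, e.g. as produced by an external aligner.
    seed = ["ACGT-A", "A-GTCA", "ACG-TA"]
    best = refine(seed)
    print(column_score(seed), column_score(best))
    print("\n".join(best))
```

Because the objective is passed in as the `score` parameter, a GLOCSA implementation could in principle be substituted for `column_score` without changing the loop. The seed alignment is kept in the initial population, which mirrors the idea of refining, rather than replacing, the output of a tool such as MUSCLE.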
Copyright © 2009 Edgar D. Arenas-Díaz et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.