Artificial Evolution Methods in the Biological and Biomedical Sciences
Multiple Sequence Alignment Using a Genetic Algorithm and GLOCSA
Algorithms that minimize putative synapomorphy in an alignment cannot be implemented directly, because trivial alignments in which the sequences are simply concatenated would be selected: they imply the minimum number of events to be explained (e.g., a single insertion/deletion would suffice to explain the divergence between two sequences). Therefore, indirect measures that approach parsimony need to be implemented. In this paper, we present in detail a Global Criterion for Sequence Alignment (GLOCSA), which uses a scoring function to rate multiple alignments globally, aiming to produce matrices that minimize the number of putative synapomorphies. We also present a Genetic Algorithm that uses GLOCSA as its objective function to refine alignments previously generated by existing alignment tools (we recommend MUSCLE). We show that, in the example cases, our GLOCSA-guided Genetic Algorithm (GGGA) improves the GLOCSA values, resulting in alignments that imply fewer putative synapomorphies.
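The trivial case mentioned above can be illustrated with a minimal sketch. The cost used here, `naive_event_count`, is a hypothetical stand-in invented for this illustration (variable columns plus gap runs); it is not the GLOCSA function, and the helper names and example sequences are assumptions as well. The sketch only shows why a direct count of events would favor placing each sequence in its own block of columns.

```python
from itertools import groupby

GAP = "-"


def count_gap_runs(row):
    """Count maximal runs of gap characters in one aligned row."""
    return sum(1 for is_gap, _ in groupby(row, key=lambda c: c == GAP) if is_gap)


def naive_event_count(alignment):
    """Hypothetical naive cost: variable columns plus gap runs.

    This is NOT GLOCSA; it is an assumed direct event count used only
    to show why such a count rewards the trivial alignment.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    variable_columns = 0
    for column in zip(*alignment):
        residues = {c for c in column if c != GAP}
        if len(residues) > 1:  # more than one residue type: a change to explain
            variable_columns += 1
    gap_runs = sum(count_gap_runs(row) for row in alignment)
    return variable_columns + gap_runs


def staggered(seqs):
    """Place each sequence in its own block of columns (the concatenated case)."""
    total = sum(len(s) for s in seqs)
    rows, offset = [], 0
    for s in seqs:
        rows.append(GAP * offset + s + GAP * (total - offset - len(s)))
        offset += len(s)
    return rows


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TGCATGCA"]  # made-up divergent sequences
    print("ungapped :", naive_event_count(seqs))
    print("staggered:", naive_event_count(staggered(seqs)))
```

Under this assumed count, the staggered alignment scores lower than the ungapped one because no column ever contains two different residues, which is the behavior an indirect criterion has to avoid rewarding.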