ISRN Artificial Intelligence
Volume 2012 (2012), Article ID 178658, 6 pages
http://dx.doi.org/10.5402/2012/178658
Research Article

Simulated Annealing with Previous Solutions Applied to DNA Sequence Alignment

Facultad de Sistemas, Universidad Autónoma de Coahuila, Saltillo, Coahuila, 25280 México, Mexico

Received 1 July 2012; Accepted 25 July 2012

Academic Editors: M. F. Abbod, M. Arif, and P. Trunfio

Copyright © 2012 Ernesto Liñán-García and Lorena Marcela Gallegos-Araiza. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

A new algorithm for solving the sequence alignment problem is proposed, named SAPS (Simulated Annealing with Previous Solutions). The algorithm is based on classical Simulated Annealing (SA), which simulates the heating and cooling of a metal to solve an optimization problem. SAPS is implemented to obtain pairwise and multiple sequence alignments. To select the current solution, SAPS chooses at random from solutions previously generated within the Metropolis Cycle. This simple change increases the quality of the solutions to the problem of aligning genomic sequences with respect to the classical Simulated Annealing algorithm. Some SAPS parameters are tuned by an analytical method, and others are tuned experimentally. SAPS generated higher-quality results than classical SA. The instances used are specific genes of the AIDS virus.

1. Introduction

Sequence alignment is one of the most important and challenging problems in computational biology and bioinformatics [1, 2]. Finding the optimal alignment of a set of sequences is known to be an NP-complete problem [3]. Sequence alignment is an important tool for measuring the similarity of two or more sequences. It is classified as a combinatorial optimization problem [4], which is solved by using computer algorithms. These algorithms make it possible to represent, process, and compare genetic information in order to determine evolutionary relationships among living beings [3]. A sequence alignment highlights areas of similarity among sequences, which may indicate functional or evolutionary relationships among genes or proteins [5].

The problem of sequence alignment is to obtain the maximum alignment of a set of genomic sequences, which is denoted as $S = \{s_1, s_2, \ldots, s_n\}$; each sequence of this set is formed by the alphabet $\Sigma = \{A, C, G, T\}$. The solution to this problem is represented by $S'$, which denotes a set with the alphabet $\Sigma' = \Sigma \cup \{-\}$, where $-$ is the gap symbol. $S'$ represents the optimal alignment of $S$.

Exact algorithms have been applied to solve the sequence alignment problem; dynamic programming is one of the most widely used [6, 7]. The disadvantage of exact algorithms is that, although they generate optimal solutions for small problems, they become inefficient for large ones. For this reason, several metaheuristics have been applied to obtain suboptimal alignments [8], for example, the Ant Colony Algorithm [9], Simulated Annealing [10, 11], and Genetic Algorithms [12], among others. The disadvantage of metaheuristics is that they do not guarantee optimal solutions, but the solutions they generate can be very close to the optimal solution within a reasonable processing time.

The proposed algorithm is a modified version of classical Simulated Annealing. SAPS includes a new way to select the current solution after the Metropolis Cycle is finished. In general, SAPS generates better solutions to the sequence alignment problem than classical Simulated Annealing. SAPS was tested with different genes of the AIDS virus.

This paper is organized as follows: in Section 2, the classical simulated annealing algorithm is described. In Section 3, the SAPS algorithm is explained in detail. In Section 4, the analytical tuning method is described. In Section 5, the implementation of SAPS is detailed. In Section 6, the experimentation and results are described. Finally, Section 7 discusses the conclusions.

2. Classical Simulated Annealing

Classical Simulated Annealing is an algorithmic process that simulates the gradual cooling of a metal for crystallization. The algorithm usually starts at a high temperature value, and this parameter is then decreased until a final temperature is reached. The final temperature is typically very close to zero [13, 14]. The temperature value is decreased from the initial temperature to the final temperature through a cooling function. Several cooling functions have been used in the simulated annealing algorithm [15–18]; the most common one is defined by $T_{k+1} = \alpha T_k$. This function decreases the temperature value by a factor $\alpha$, which lies in the range $0.70 \le \alpha < 1$. Gradual cooling is applied when $\alpha$ is very close to 1, and fast cooling is applied when $\alpha$ is very close to 0.70.
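As a small illustration of the geometric cooling function described above, the following Python sketch lists the temperatures visited for a slow and a fast cooling factor. The function name `cooling_schedule` and the numeric values (an initial temperature of 100 and a final temperature of 1) are assumptions chosen for the example; they are not taken from the experiments.

```python
def cooling_schedule(t_initial, t_final, alpha):
    """Yield T, alpha*T, alpha^2*T, ... while the value stays above t_final."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if t_final <= 0.0:
        raise ValueError("t_final must be positive, or the loop never ends")
    temperature = t_initial
    while temperature > t_final:
        yield temperature
        temperature *= alpha


# Hypothetical values: compare gradual cooling with fast cooling.
slow = list(cooling_schedule(100.0, 1.0, 0.95))
fast = list(cooling_schedule(100.0, 1.0, 0.70))
print(len(slow), len(fast))
```

A factor close to 1 visits many more temperature levels than a factor close to 0.70, which is why the former is described as gradual cooling.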

Classical Simulated Annealing has two cycles. The first is named the Temperature Cycle; within this cycle, the temperature value is decreased by a cooling function. The second is named the Metropolis Cycle, and it is applied to generate, accept, or reject solutions for the problem to be optimized. Algorithm 1 shows the pseudocode of classical Simulated Annealing. The initial and final temperature values are set (see line 1). These values are obtained analytically (see Section 4) or experimentally. It is recommended that the initial temperature be as high as possible and the final temperature be as close to zero as possible. The initial solution of the problem to be optimized is created (see line 2), and the current solution $S_i$ is set to it. $T$ is set to the initial temperature (see line 3). The temperature cycle is executed from the initial temperature to the final temperature (see lines 4–18). The Metropolis Cycle then starts (see lines 5–16) and is repeated the number of times specified by the stop criterion. Within the Metropolis Cycle, a new solution $S_j$ is created by applying a small perturbation to the current solution (see line 6). The difference between the costs of these two solutions ($S_i$ and $S_j$) is obtained. If the difference is less than or equal to zero (see line 8), the new solution is accepted (see line 9). If the difference is greater than zero, the Boltzmann probability is calculated (see line 11). If the Boltzmann probability is higher than a random value between 0 and 1 (see line 12), then the new solution is accepted (see line 13). After the Metropolis Cycle is completed, the temperature value is decreased (see line 17).

Algorithm 1: Pseudocode of classical Simulated Annealing.
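The two-cycle structure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 1: the function and parameter names (`simulated_annealing`, `perturb`, `cycle_length`), the fixed Metropolis Cycle length, and the toy cost function are all hypothetical, and the line numbers cited in the text do not apply to this sketch.

```python
import math
import random


def simulated_annealing(initial, cost, perturb, t_initial, t_final,
                        alpha=0.95, cycle_length=50, rng=None):
    """Minimise cost(solution) with a classical two-cycle annealing loop."""
    rng = rng or random.Random()
    current, current_cost = initial, cost(initial)
    temperature = t_initial
    while temperature > t_final:                 # temperature cycle
        for _ in range(cycle_length):            # Metropolis cycle
            candidate = perturb(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            if delta <= 0:
                # Equal or better: always accept.
                current, current_cost = candidate, candidate_cost
            elif math.exp(-delta / temperature) > rng.random():
                # Worse: accept with the Boltzmann probability.
                current, current_cost = candidate, candidate_cost
        temperature *= alpha                     # geometric cooling
    return current, current_cost


# Toy usage (assumed problem): find the integer minimising (x - 3)^2.
def toy_cost(x):
    return (x - 3) ** 2


def toy_step(x, rng):
    return x + rng.choice((-1, 1))


solution, value = simulated_annealing(20, toy_cost, toy_step,
                                      t_initial=10.0, t_final=0.01,
                                      alpha=0.9, rng=random.Random(0))
print(solution, value)
```

At high temperature the loop accepts many worsening moves and wanders; as the temperature approaches the final value, worsening moves are almost never accepted and the search settles near a minimum.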

Algorithm 2 shows the pseudocode of SA applied to obtain solutions to the problem of aligning two or more genomic sequences. The Simulated Annealing algorithm is modified so that it can be applied to the sequence alignment problem. The values of the initial and final temperatures are tuned by using an analytical method (see lines 1-2). The cooling factor $\alpha$ is set to a value very close to 1 (see line 3). The current solution is set to the original solution (see line 4). The similarity of this solution is calculated by comparing the sequences base by base (see line 5). The variable $T$ is set to the initial temperature (see line 6). The Metropolis Cycle length is set to an initial value (see line 7). This cycle has an increasing length: it has a low value at high temperatures, and it is increased as the temperature is decreased. The length of the Metropolis Cycle is increased by a factor $\beta$, where $\beta$ must be greater than 1. The temperature cycle is executed (see lines 8–29) while $T$ is greater than the final temperature $T_f$. Within this cycle, an iteration counter is set to 1 (see line 9), and within the Metropolis Cycle, this counter is incremented (see line 25).

Algorithm 2: Pseudocode of SA applied to sequence alignment.

The Metropolis Cycle is executed (see lines 10–26). At the end of the Metropolis Cycle, the temperature is decreased (see line 27), and the Metropolis Cycle length is increased (see line 28). Within the Metropolis Cycle, new solutions $S_j$ are generated by modifying the current solution $S_i$. This is done by adding or removing gaps in the DNA sequences (see line 11). The similarity of each new solution is calculated (see line 12), and the difference of similarities between $S_i$ and $S_j$ is calculated (see line 13). This difference is denoted by $\Delta Z$. A new solution is accepted when it is better than the current solution, so the current solution is replaced by the new one (see line 15). When a new solution is of lower quality than the current solution, it is accepted according to the Boltzmann probability (see line 22). This probability is directly related to the current value of the temperature and the quality difference between $S_i$ and $S_j$. The Boltzmann probability is calculated by the equation $P = e^{-\Delta Z / T}$, where $\Delta Z$ is taken as the size of the deterioration. As the temperature value is decreased, the probability $P$ of accepting a worse solution is decreased; it lies in the range $0 < P < 1$.
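The three alignment-specific ingredients described above (a base-by-base similarity, a perturbation that adds or removes a gap, and Boltzmann acceptance) might look as follows in Python. This is a hedged sketch: the names `similarity`, `perturb`, and `accept`, the gap character `-`, the equal chance of adding versus removing a gap, and the choice to ignore unpaired trailing positions are all assumptions made for illustration, since the text does not specify them.

```python
import math
import random

GAP = "-"


def similarity(a, b):
    """Count aligned positions holding the same base (a gap never matches).

    Assumption: positions beyond the shorter sequence are ignored.
    """
    return sum(1 for x, y in zip(a, b) if x == y and x != GAP)


def perturb(sequence, rng):
    """Return a copy of the sequence with one gap added or one gap removed."""
    chars = list(sequence)
    gap_positions = [i for i, c in enumerate(chars) if c == GAP]
    if gap_positions and rng.random() < 0.5:
        del chars[rng.choice(gap_positions)]               # remove a gap
    else:
        chars.insert(rng.randrange(len(chars) + 1), GAP)   # add a gap
    return "".join(chars)


def accept(delta, temperature, rng):
    """delta = new similarity - current similarity (similarity is maximised)."""
    if delta >= 0:
        return True
    # Worse solution: Boltzmann probability exp(-|delta| / T).
    return math.exp(delta / temperature) > rng.random()


# One leading gap realigns the second sequence with the first.
print(similarity("ACGTAC", "CGTAC"), similarity("ACGTAC", "-CGTAC"))
```

The perturbation only ever touches gap characters, so the bases of each sequence and their order are preserved across moves.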

3. Simulated Annealing with Previous Solutions

In order to generate high-quality solutions to sequence alignment, the classical SA was modified; the SAPS algorithm is this modified version. The change concerns how the current solution is selected once the execution of the Metropolis Cycle is done. During the execution of the Metropolis Cycle, the best solutions are stored in a set of best solutions.

The best of all solutions created in this cycle is stored separately as the best solution, and the original sequence is stored as the original solution. After the Metropolis Cycle is finished, the current solution is randomly selected from the set of best solutions, the best solution, or the original solution.

The Metropolis Cycle length $L$ of SAPS grows from an initial value to a final value. At high temperatures, $L$ is set to a small value, and as the temperature is decreased, the Metropolis Cycle length is increased up to its maximum $L_{\max}$; when the final temperature $T_f$ is reached, $L_{\max}$ is reached too. Thus, an increasing number of solutions is created as the temperature is decreased: at high temperatures a small number of solutions are created, and as the temperature is decreased, the number of solutions is increased by a factor $\beta$, where $\beta > 1$.

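Algorithm 3 shows the pseudocode of SAPS. Some lines of code were added to SA: for example, at line 5, the best solution and the original solution are set to the current solution; at line 19, the new solution is added to the set of best solutions; and at line 31, the current solution is chosen from the set of best solutions, the best solution, or the original solution.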

Algorithm 3: Pseudocode of SAPS.
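The SAPS idea can be sketched in Python as a small change to an annealing loop: after each Metropolis Cycle, the current solution is redrawn from previously generated solutions, and the cycle length grows as the temperature falls. The sketch below is an interpretation, not a transcription of Algorithm 3. In particular, the names (`saps`, `cycle_set`, `pool`), the rule that accepted non-worsening solutions form the set of best solutions, and the uniform draw among one member of that set, the best solution, and the original solution are assumptions made for illustration.

```python
import math
import random


def saps(original, score, perturb, t_initial, t_final,
         alpha=0.95, l_initial=2, beta=1.1, rng=None):
    """Maximise score(solution) with a SAPS-style loop (illustrative sketch)."""
    rng = rng or random.Random()
    current, current_score = original, score(original)
    best, best_score = current, current_score
    temperature, length = t_initial, float(l_initial)
    while temperature > t_final:
        cycle_set = []                           # good solutions of this cycle
        for _ in range(int(length)):             # Metropolis cycle
            candidate = perturb(current, rng)
            candidate_score = score(candidate)
            delta = candidate_score - current_score
            if delta >= 0:
                current, current_score = candidate, candidate_score
                cycle_set.append(candidate)
                if candidate_score > best_score:
                    best, best_score = candidate, candidate_score
            elif math.exp(delta / temperature) > rng.random():
                current, current_score = candidate, candidate_score
        # SAPS step: restart from a previously generated solution.
        pool = [best, original]
        if cycle_set:
            pool.append(rng.choice(cycle_set))
        current = rng.choice(pool)
        current_score = score(current)
        temperature *= alpha                     # cool down
        length *= beta                           # lengthen the next cycle
    return best, best_score


# Toy usage (assumed problem): maximise -(x - 3)^2 over the integers.
def toy_score(x):
    return -((x - 3) ** 2)


def toy_step(x, rng):
    return x + rng.choice((-1, 1))


best, best_score = saps(20, toy_score, toy_step, t_initial=10.0,
                        t_final=0.01, alpha=0.9, rng=random.Random(0))
print(best, best_score)
```

Keeping the original solution in the pool lets the search return to its starting point, while the best solution and the set of best solutions pull it back toward promising regions.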

4. Analytical Tuning Method

Some parameters of SAPS are tuned by the analytical method [19–22]. For example, the initial temperature is calculated from the maximum deterioration of the instance (denoted by $\Delta Z_{\max}$) and the probability of accepting a solution at high temperature. On the other hand, the final temperature is calculated from the minimum deterioration of the instance (denoted by $\Delta Z_{\min}$) and the probability of accepting a solution at low temperature.

Analytical tuning based on the Boltzmann distribution can be helpful for setting the initial temperature [21]. At high temperatures, the probability of accepting any new solution is very close to 1 ($P_A(\Delta Z_{\max}) \approx 1$), so even the maximal deterioration of the cost function is admitted. The initial temperature ($T_i$) is associated with the maximum deterioration admitted and the defined acceptance probability $P_A(\Delta Z_{\max})$.

Let $S_i$ be the current solution and $S_j$ a new proposed one, and let $Z(S_i)$ and $Z(S_j)$ be the costs associated with $S_i$ and $S_j$, respectively; the maximum and minimum deteriorations are expressed as $\Delta Z_{\max}$ and $\Delta Z_{\min}$, respectively. Then, the probability of accepting a new solution with the maximum deterioration is defined by $P_A(\Delta Z_{\max}) = \exp(-\Delta Z_{\max} / T_i)$. This equation is basically the Boltzmann distribution, which is applied for calculating $T_i$. This temperature value is defined by $T_i = -\Delta Z_{\max} / \ln P_A(\Delta Z_{\max})$. Similarly, the final temperature ($T_f$) is established according to the probability $P_A(\Delta Z_{\min})$ of accepting a new solution with the minimum deterioration. The equation to calculate the final temperature is $T_f = -\Delta Z_{\min} / \ln P_A(\Delta Z_{\min})$.
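Both temperature formulas come from solving the Boltzmann acceptance probability $P = \exp(-\Delta Z / T)$ for $T$. The Python sketch below works through that arithmetic. The function name `temperature_for` and all numeric inputs are assumptions chosen for illustration: a maximum deterioration of 50 accepted with probability 0.95, and a minimum deterioration of 1.0 accepted with probability 0.10. With the latter pair the result is about 0.43.

```python
import math


def temperature_for(deterioration, acceptance_probability):
    """Solve P = exp(-deterioration / T) for the temperature T."""
    if deterioration <= 0:
        raise ValueError("deterioration must be positive")
    if not 0.0 < acceptance_probability < 1.0:
        raise ValueError("acceptance probability must lie strictly in (0, 1)")
    return -deterioration / math.log(acceptance_probability)


# Hypothetical instance values, not taken from the experiments.
t_initial = temperature_for(50.0, 0.95)   # worst move accepted almost always
t_final = temperature_for(1.0, 0.10)      # smallest worsening rarely accepted
print(round(t_initial, 2), round(t_final, 2))
```

An acceptance probability close to 1 makes the logarithm close to zero and therefore yields a high initial temperature, whereas a small acceptance probability yields a final temperature near zero.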

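Other parameters of SAPS are calculated by applying a function analogous to the cooling function; for example, the Metropolis Cycle length is calculated by applying $L_{k+1} = \beta L_k$. The incremental factor of this cycle is also calculated and is defined by $\beta = \exp\left((\ln L_{\max} - \ln L_1) / n\right)$, where $L_1$ and $L_{\max}$ are the initial and maximum cycle lengths and $n$ is the number of temperature steps from the initial to the final temperature.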

The analytical method determines the Metropolis Cycle length with a simple Markov model [22]. At high temperatures, only a few iterations are required because the stochastic equilibrium is quickly reached; at low temperatures, however, a more exhaustive exploration is needed, so a larger $L$ is used. Let $L_1$ be the Metropolis Cycle length at $T_i$, and let $L_{\max}$ be the maximum Metropolis Cycle length.

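Let $T_k$ be decreased by the cooling function ($T_{k+1} = \alpha T_k$), and let $L_k$ be calculated by the following equation: $L_{k+1} = \beta L_k$, where $\beta$ is the rate of increment of the Metropolis Cycle ($\beta > 1$); so $T_k$ and $L_k$ have the initial values $T_i$ and $L_1$, and the last Metropolis Cycle length is equal to $L_{\max}$. The functions $T_{k+1} = \alpha T_k$ and $L_{k+1} = \beta L_k$ are applied successively in Simulated Annealing from $T_i$ to $T_f$; consequently, $T_f$ and $L_{\max}$ are obtained by $T_f = \alpha^n T_i$ and $L_{\max} = \beta^n L_1$, respectively. Here $n$ is the number of steps from $T_i$ to $T_f$; so we can get $n$ and $\beta$ as follows: $n = (\ln T_f - \ln T_i) / \ln \alpha$ and $\beta = \exp\left((\ln L_{\max} - \ln L_1) / n\right)$.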
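The relations above can be checked numerically. The Python sketch below computes the number of cooling steps and the cycle growth factor from the two end points of each geometric schedule, then verifies that applying the schedules for that many steps lands on the final temperature and the maximum cycle length. The function name `cycle_parameters` and all numeric inputs are illustrative assumptions.

```python
import math


def cycle_parameters(t_initial, t_final, alpha, l_initial, l_max):
    """Return (n, beta) such that T_f = alpha^n * T_i and L_max = beta^n * L_1."""
    n = (math.log(t_final) - math.log(t_initial)) / math.log(alpha)
    beta = math.exp((math.log(l_max) - math.log(l_initial)) / n)
    return n, beta


# Hypothetical inputs for illustration only.
t_i, t_f, alpha = 100.0, 0.43, 0.95
l_1, l_max = 2, 300
n, beta = cycle_parameters(t_i, t_f, alpha, l_1, l_max)

# Round trip: both schedules reach their end points after n steps.
print(n, beta, t_i * alpha ** n, l_1 * beta ** n)
```

In general $n$ is not an integer, so an implementation would typically round it; a cooling factor close to 1 gives a large $n$ and hence a growth factor only slightly above 1.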

5. Implementation

SAPS was tested with most of the genes of the human and simian HIV viruses. The nine genes of the human virus were compared with the nine genes of the simian virus; for example, the gene named “env” of human HIV was aligned with the gene “env” of simian HIV, the gene named “gag” of human HIV was aligned with the gene “gag” of simian HIV, and so on. The information on the virus genes is shown in Table 1. The initial temperature, the final temperature, and the Metropolis Cycle parameters are tuned by the analytical method. The factor $\beta$ is higher than 1 but very close to 1. The values of these parameters are shown in Table 2. In this table, the values of the initial temperatures are high; these values are related to the maximum deterioration and the probability of accepting solutions at high temperatures. The final temperature has a value very close to zero (0.43) because the minimum deterioration is equal to 1.0. The initial and maximum Metropolis Cycle lengths, $L_1$ and $L_{\max}$, have the values 2 and 300, respectively.

Table 1: HIV genes of human and simian.
Table 2: Values of parameters.

6. Experimentation and Results

Table 3 shows the results of the experiments: the average similarity and the standard deviation for the genes of both viruses (human HIV and simian HIV). The results show that the average obtained by SAPS is of better quality than the average obtained by classical SA. Table 4 shows that the SAPS processing time is generally better than the processing time of SA.

Table 3: Results of quality solutions.
Table 4: Results of processing time.

7. Conclusions

In this paper, a new approach is proposed to make the classical Simulated Annealing algorithm more efficient at solving the problem of aligning genomic sequences. This approach is called SAPS. After the Metropolis Cycle is completed, the current solution is selected randomly from the set of best solutions, the best solution, and the initial solution. This change to classical simulated annealing improved its efficiency in solving the sequence alignment problem. The parameters of the SA and SAPS algorithms, specifically the initial temperature, final temperature, and Metropolis Cycle length, were tuned using an analytical tuning method.

This approach to tuning the parameters depends directly on the instance being tested. By preprocessing the instance, the minimum and maximum deteriorations are calculated. With these values and the probability of acceptance, the initial and final temperatures are calculated.

References

  1. R. M. Karp, “Mapping the genome: some combinatorial problems arising in molecular biology,” in Proceedings of the 25th Annual ACM Symposium on the Theory of Computing, pp. 278–285, May 1993.
  2. E. S. Lander, R. Langridge, and D. M. Saccocio, “Mapping and interpreting biological information,” Communications of the ACM, vol. 34, no. 11, pp. 33–39, 1991.
  3. L. Wang and T. Jiang, “On the complexity of multiple sequence alignment,” Journal of Computational Biology, vol. 1, no. 4, pp. 337–348, 1994.
  4. C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Dover Publications, Mineola, NY, USA, 1998.
  5. J. Setubal and J. Meidanis, Introduction to Computational Molecular Biology, PWS Publishing, 1997.
  6. O. Gotoh, “An improved algorithm for matching biological sequences,” Journal of Molecular Biology, vol. 162, no. 3, pp. 705–708, 1982.
  7. S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” Journal of Molecular Biology, vol. 48, no. 3, pp. 443–453, 1970.
  8. I. Ö. Bucak and V. Uslan, “An analysis of sequence alignment: Heuristic algorithms,” in Proceedings of the 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC'10), pp. 1824–1827, September 2010.
  9. L. Chen, L. Zou, and J. Chen, “An efficient ant colony algorithm for multiple sequences alignment,” in Proceedings of the 3rd International Conference on Natural Computation (ICNC '07), pp. 208–212, August 2007.
  10. J. Kim, S. Pramanik, and M. J. Chung, “Multiple sequence alignment using simulated annealing,” Computer Applications in the Biosciences, vol. 10, no. 4, pp. 419–426, 1994.
  11. S.-M. Chen and C.-H. Lin, “Multiple DNA sequence alignment based on genetic simulated annealing techniques,” International Journal of Information and Management Sciences, vol. 18, no. 2, pp. 97–111, 2007.
  12. C. Notredame, D. G. Higgins, and J. Heringa, “T-coffee: a novel method for fast and accurate multiple sequence alignment,” Journal of Molecular Biology, vol. 302, no. 1, pp. 205–217, 2000.
  13. V. Cerny, “Thermodynamical approach to the traveling salesman problem: an efficient simulation algorithm,” Journal of Optimization Theory and Applications, vol. 45, no. 1, pp. 41–51, 1985.
  14. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671–680, 1983.
  15. E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines: a Stochastic Approach to Combinatorial Optimization And Neural Computing, Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley & Sons, Chichester, UK, 1989.
  16. L. Ingber, “Simulated annealing: practice versus theory,” Mathematical and Computer Modelling, vol. 18, no. 11, pp. 29–57, 1993.
  17. U. Kjærulff, “Optimal decomposition of probabilistic networks by simulated annealing,” Statistics and Computing, vol. 2, no. 1, pp. 7–17, 1992.
  18. P. J. Van Laarhoven and E. H. L. Aarts, Simulated Annealing: Theory and Applications, Kluwer Academic Publishers, 1987.
  19. J. Frausto-Solis, E. F. Román, D. Romero, X. Soberon, and E. Liñán-García, “Analytically tuned simulated annealing applied to the protein folding problem,” Lecture Notes in Computer Science, vol. 4488, no. 2, pp. 370–377, 2007.
  20. J. Frausto-Solis, X. Soberon-Mainero, and E. Liñán-García, “MultiQuenching annealing algorithm for protein folding problem,” Lecture Notes in Computer Science, vol. 5845, pp. 578–589, 2009.
  21. J. Frausto-Solís, H. Sanvicente-Sánchez, and F. Imperial-Valenzuela, “Andymark: an analytical method to establish dynamically the length of the Markov chain in simulated annealing for the satisfiability problem,” Lecture Notes in Computer Science, vol. 4247, pp. 269–276, 2006.
  22. H. Sanvicente-Sánchez and J. Frausto-Solís, “A method to establish the cooling scheme in simulated annealing like algorithms,” Lecture Notes in Computer Science, vol. 3045, pp. 755–763, 2004.