Multiphase Simulated Annealing Based on Boltzmann and Bose-Einstein Distribution Applied to Protein Folding Problem

Frausto-Solis, Juan; Liñán-García, Ernesto; Sánchez-Hernández, Juan Paulo; González-Barbosa, J. Javier; González-Flores, Carlos; Castilla-Valdez, Guadalupe

doi:https://doi.org/10.1155/2016/7357123

Advances in Bioinformatics

On this page

Abstract Introduction Experimental Results Conclusions References Copyright Related Articles

Research Article | Open Access

Volume 2016 | Article ID 7357123 | https://doi.org/10.1155/2016/7357123

Multiphase Simulated Annealing Based on Boltzmann and Bose-Einstein Distribution Applied to Protein Folding Problem

Juan Frausto-Solis,¹Ernesto Liñán-García,²Juan Paulo Sánchez-Hernández,³J. Javier González-Barbosa,¹Carlos González-Flores,²and Guadalupe Castilla-Valdez¹

Academic Editor: F. Fdez-Riverola

Received24 Nov 2015

Revised05 Apr 2016

Accepted19 Apr 2016

Published20 Jun 2016

Abstract

A new hybrid Multiphase Simulated Annealing Algorithm using Boltzmann and Bose-Einstein distributions (MPSABBE) is proposed. MPSABBE was designed for solving the Protein Folding Problem (PFP) instances. This new approach has four phases: (i) Multiquenching Phase (MQP), (ii) Boltzmann Annealing Phase (BAP), (iii) Bose-Einstein Annealing Phase (BEAP), and (iv) Dynamical Equilibrium Phase (DEP). BAP and BEAP are simulated annealing searching procedures based on Boltzmann and Bose-Einstein distributions, respectively. DEP is also a simulated annealing search procedure, which is applied at the final temperature of the fourth phase, which can be seen as a second Bose-Einstein phase. MQP is a search process that ranges from extremely high to high temperatures, applying a very fast cooling process, and is not very restrictive to accept new solutions. However, BAP and BEAP range from high to low and from low to very low temperatures, respectively. They are more restrictive for accepting new solutions. DEP uses a particular heuristic to detect the stochastic equilibrium by applying a least squares method during its execution. MPSABBE parameters are tuned with an analytical method, which considers the maximal and minimal deterioration of problem instances. MPSABBE was tested with several instances of PFP, showing that the use of both distributions is better than using only the Boltzmann distribution on the classical SA.

1. Introduction

In genetics DNA, RNA, and proteins are the basic elements for many researches. DNA is a molecule that contains genetic instructions, which are involved in protein synthesis process [1]. This molecule represents a complete set of hereditary information of any organism. DNA has four different nucleotides, which are adenine, cytosine, guanine, and thymine. This molecule is divided into genes, and a gene is a sequence of nucleotides that express a protein. A functional protein is conformed in an approximated geometrical model of the global minimum energy [2, 3]. This is a dinamic process where the lowest free energy of the protein plus the solvent can be reasonably approximated by the minimum free energy found by Monte Carlo, conformational space annealing, genetic algorithms, and some deterministic methods [3, 4]. In fact, there are some examples, such as insulin alphalytic [5, 6] with natural conformations whose energy is not minimal. This structure is usually named Native Structure (NS). In addition, the free energy of an NS conformation depends on the interaction among the atoms and their relative positions.

Protein Folding Problem (PFP) is an enormous challenge and important problem in bioinformatics, medicine, and other areas [7]. The function of a protein is directly related to its three-dimensional structure, and misfolded proteins can cause a variety of diseases. The aim of this problem is to find the natural tertiary structure of a protein using only a target sequence. A protein can take a high number of different conformational structures from its primary structure to its NS. The computational problem involved to find the NS is known as Protein Folding Problem. Because PFP is an NP-hard problem [8], heuristic methods avoiding the generation of all possible states of the protein are commonly used. In order to find an NS, computational methods search structures on a huge space of possible solutions. These methods can obtain several structures very close to the NS. A particular class of these methods is known to be ab initio which looks for the NS using only the protein’s amino acid sequence.

As a consequence, to solve PFP, new metaheuristics are applied, where simulated annealing (SA) [9, 10] is one of the most successful [11–13]. Currently, classical SA applies a Boltzmann distribution in order to accept bad solutions and escape from local minima. However, to generate high-quality solutions for PFP, new and more efficient SA have been designed; one of them, named Chaotic Multiquenching Annealing Algorithm (CMQA), has obtained very good results for proteins such as -enkephalin, proinsulin, T0549, T0335, and T0281 or 1PLXW, 1T0C, 2K5E, SR384, and 1A19, in PDB format, respectively. There are three central phases of this algorithm [14]: (i) Multiquenching Phase (MQP), (ii) Annealing Phase (AP), and (iii) Dynamical Equilibrium Phase (DEP). All of these phases are explained in the paper; for this introduction all we need to know is that each phase is designed with an annealing approach looking for finding the best configuration of the previous one. At the beginning of the process, MQP improves a random configuration through an annealing procedure executed at extremely very high temperatures; AP searches for a better solution than that of MQP with an annealing search applied at high temperatures, and, finally, DEP is applied at low temperatures looking for a better solution than that obtained by AP. As the classical SA, all of these phases apply Boltzmann distribution for accept bad solutions. However, Bose-Einstein distribution can also be used for escape from local minima [15]. Nevertheless, algorithms using these two distributions in different ranges of temperatures have not been published for PFP.

In this paper, a new SA algorithm named MPSABBE (Multiphase Simulated Annealing based on Boltzmann and Bose-Einstein distributions) is introduced. MPSABBE applies the Boltzmann and Bose-Einstein distributions at high and low temperatures, respectively. The paper shows that using both distributions the quality solution is improved. This paper is organized as follows. In Section 2, PFP is described. In Section 3, the classical SA and MPSABBE algorithms are explained. In Section 4, the SA applied for solving PFP is detailed. In Section 4, all the four MPSABBE’s phases are presented. In Section 5, analytical tuning methods SA and MPSABBE are described. In Section 6, experimental results are shown. Finally, in Section 7, the conclusions of this research are discussed.

2. Protein Folding Problem

PFP is related to the questions of how and why a protein is folded into its NS. The proteins adopt an extreme number of possible conformations [16], which depends on the number of amino acids and the number of conformations by each amino acid. The essential concept introduced by Levinthal is that the PFP is a random search problem. This general idea means that all conformations of a protein (except the native state) are equally likely. Thus, it is more efficient to find the native state by a random search. PFP is an interdisciplinary problem that involves molecular biology, biophysics, computational biology, and computer science. In the ab initio case, NS prediction requires different mechanisms that lead the searching process to a biological three-dimensional structure. As was previously mentioned, this process requires only the amino acids’ sequence. PFP is an enormous challenge and is very hard to find the NS of a protein because the space of possible conformations of the protein is in general extremely large. For all practical purposes, PFP can be defined as follows.

Given(i)a sequence of amino acids that represents the primary structure of a protein,(ii)an energy function , where the variables represent dihedral angles,

find the following:(i)the Native Structure such that represents the lowest energy value, where(ii)the solution defines the best three-dimensional configuration.

Force fields are used to represent the energy of a protein; some of the most common are AMBER [17], CHARMM [18], ECEPP/2 [19–21], ECEPP/3 [22], and GROMACS [23]. These force fields compute energy components, for instance, the electrostatic energy, the torsion energy, the hydrogen bond energy, and the Lennard-Jones energy. In this paper ECEPP/2 force field is used.

The atoms of a protein are represented in three-dimensional cartesian coordinates. There are four types of torsion angles or dihedral angles as follows:(i)The angle between the amino group and the alpha carbon is referred to as Phi (). This angle represents the angle between the amino group (or NH₂) of the amino acid and the alpha Carbon in the sequence; specifically, it represents the bond angle between atom of amino group and the central carbon ().(ii)The dihedral angle between the alpha carbon and the carboxyl group is referred to as Psi (). Psi represents the angle between the carboxyl () group of the amino acid and the central carbon () of the same amino acid. In particular, Psi measures the angle of the covalent bond between of the carboxyl group and the central carbon ().(iii)For every amino acids sequence, an omega angle () is defined for each two consecutive amino acids ; specifically, it is the angle of the covalent bond between the atom of amino acid and carbon of the carboxyl group amino acid .(iv)And, finally, each Chi angle () is defined between the two planes conformed by two consecutive carbon atoms in the radical group.

The variables of the problem are all of these four angles which are in the range . In the simulations conducted in this research work, these angles are set with discrete values. Some variables have well-defined ranges as is the case of Psi and Phi angles whose ranges are defined by the Ramachandran plot [24]. The Phi angle is defined in the ranges and . The Psi angle is defined in three ranges , , and . Finally, the omega angle is fixed at 180 degrees.

3. Simulated Annealing Algorithm

3.1. Simulated Annealing Based on Boltzmann Distribution

Simulated Annealing (SA) Algorithm is a probabilistic method proposed by Kirkpatrick et al. [9] and Černý [10] and is an adaptation of the Metropolis algorithm, which is a Monte Carlo method [25]. SA is based on the gradual metal cooling for crystallization. This algorithm works by emulating the physical process where a metal is heating at very high temperature and then cooled very slowly until its frozen state. When this process happens, the metal is crystallized with the lowest energy configuration. SA is an algorithm that has been used for finding the optimal solution or close to it for different NP-hard problems including biological problems such as sequence alignment [26–28], phylogenetic trees [29], and PFP [30]. From a theoretical point of view, SA converges to the optimal solution or close to the lowest free energy [31]. However, classical SA is not able to find the lowest energy because energy barriers are too high for SA and cannot escape from local minima. As a consequence, variants of this method are proposed [14, 30].

Simulated annealing usually starts at a very high initial temperature (). Through a cooling function, the temperature value is gradually reduced from to , which usually is very close to zero [9, 10]. There are several cooling functions used in SA [31–36], for example,

The most common function is (1). This function reduces the temperature parameter by factor, which is commonly in the range of . A slow cooling is applied when is very close to 1, while a fast cooling is applied when is around 0.70.

The classical SA has two cycles as is shown in Algorithm 1; the first one is named temperature cycle and is used to decrease the value of the temperature with a specific cooling function. The second cycle is named metropolis cycle and it generates, accepts, or rejects solutions of the problem to be optimized. The initial and final temperature values are set (see lines (1)-(2)). These values are obtained by an analytical (see Section 5) or experimental way: should be as high as possible, while should be close to zero. An initial solution () is required in SA; this solution is generated (see line (3)) and is set to . At the beginning of the process, the parameter is set at the initial temperature (see line (4)). The temperature cycle is executed from to (see lines (5)–(19)). Then the metropolis cycle is repeated (see lines (6)–(17)) a certain number of times until a stop condition, which is explained later in this paper. A new solution () is generated within the metropolis cycle by applying a small perturbation to the current solution (see line (7)). The difference between these two solutions ( and ) is calculated (see line (8)). In practice, SA can be stopped when the probability of accepting a new solution is negligible. For a minimization problem, if this difference is less than or equal to zero (see line (9)), the new solution is accepted (see line (10)). When this difference is greater than zero, the Boltzmann distribution is applied. Then, a Boltzmann probability is calculated using (4) in line (12). If this probability is higher than a random value between 0 and 1 (see line (13)), then the new solution is accepted (see line (14)):

(1) Setting initial temperature ()
(2) Setting final temperature ()
(3) Generate from Initial Solution ()
(4)
(5) While () do //Temperature Cycle
(6) While (stop condition) //Metropolis Cycle
(7) Generate by applying a perturbation to
(8) Obtain difference between and
(9) If (difference ≤ 0) then
(10) Accept
(11) else
(12) Boltzmann Probability =
(13) If (Boltzmann Probability > random(0, 1)) then
(14) Accept
(15) end if
(16) end if
(17) end while
(18) Decrease by a cooling function
(19) end while
(20) Shown better solution ()

After the metropolis cycle is completed, the temperature value is reduced by a cooling function (see line (18)). For a maximization problem, if the difference of is greater than zero, the new solution is accepted; else can be rejected or accepted depending on the Boltzmann probability value.

3.2. Simulated Annealing Based on Bose-Einstein Distribution

Statistical Mechanics (SM) study the overall behavior of a system consisting of a large number of particles whose behavior is unpredictable. SM uses statistics and probability theory and thermodynamic principles. According to SM, the occurrence of each future result is determined by a probabilistic function such as Boltzmann and Bose-Einstein distributions. In addition, only the most probable behavior of the system in thermal equilibrium at a given temperature is observed [37]. Bose-Einstein distribution is obtained by finding the most probable distribution, that is, solving the problem defined by maximizing the most probable distribution, subject to the following constraints: the number of particles (defined by the summation of particles in each microstate) is constant and the total energy (defined by the summation of individual energies of each microstate) is constant. The problem is solved using Lagrange multipliers. The parameters and are defined as lagrage multiplier of and , respectively [38]. Then the Bose-Einstein distribution applied for low and very low temperatures is defined by

Then particles behavior can be modeled by Bose-Einstein distribution defined in (6). This equation defines the acceptance probability distribution of a new configuration of particles:where is the temperature parameter, is related to the constraint of the total of particles in the system, and is the Boltzmann constant. However, at very high temperatures Bose-Einstein distribution practically becomes the Boltzmann distribution. Nevertheless, at low and very low temperatures, the particles behave differently and they tend to congregate at the same lowest energy state; the result is known as a Bose-Einstein condensate [39]; as a consequence, the system can be modeled by Bose-Einstein distribution. Section 4 presents a new SA applying both Boltzmann and Bose-Einstein distributions for accepting bad solutions for high and low temperatures.

3.3. Simulated Annealing Applied to Solve Protein Folding Problem

The classical Simulated Annealing Algorithm can be implemented to solve the Protein Folding Problem [40] as is shown in the pseudocode of Algorithm 2. The initial and final temperature (see lines (1)-(2)) can be calculated according to the instance of the problem by applying the analytical method parameters of Section 5; that means that the protein should be preprocessed.

(1) Tune initial temperature () by analytical method
(2) Tune final temperature () by analytical method
(3) Setting cooling factor ()
(4) is created by modifying the internal angles of protein
(5)
(6) Calculate energy of protein applying a Force Field
(7)
(8) While () do //Temperature Cycle
(9) While (stop condition) //Metropolis Cycle
(10) Create new solution () by modifying internal angles of the protein
(11) Calculate Energy of proteins using a force field funtion
(12) Obtain difference of energies between these two proteins
(13) If (difference ≤ 0) then
(14)
(15) If energy() > energy() then
(16)
(17) end if
(18) else
(19) Boltzmann Probability =
(20) If (Boltzmann Probability > random(0, 1)) then
(21)
(22) end if
(23) end if
(24) end while
(25) Decrease by cooling function ()
(26) end while
(27) Show the best solution of PFP ()

Applying the cooling function (1), the cooling factor value is required. The temperature value is reduced very slowly; thus, must be very close to 1 (see line (3)). In order to reduce very fast the temperature, the cooling factor is set very close to 0.70. An initial solution of PFP is created, which is set to the current solution (see line (4)). The internal angles of the initial solution are modified at random. At this point, the best solution is (see line (5)). The energy of is calculated by applying a force field function (see line (6)). Before starting the temperature cycle, the initial is loaded into variable in line (7). Then the temperature cycle starts (see lines (8)–(26)) with a logic condition ( greater than in line (8)). Inside of temperature cycle, the metropolis cycle is executed (see lines (9)–(24)). After this cycle is completed, the value of the temperature is decreased (see line (25)).

Inside the metropolis cycle, a new solution of Protein Folding Problem is generated by modifying the previous solution . This is done by modifying the internal angles of the protein (see line (10)). The energy of the protein is calculated (see line (11)), and the difference of energies (i.e., between and ) is determined (see line (12)). This difference is denoted by . The new solution is accepted when the new solution is better than the previous one; thus, the current solution is replaced by (see line (14)). When a new solution is worse than the current solution, it can be accepted using the Boltzmann distribution (see line (21)). The probability of this distribution (or acceptance probability) is directly related to the current value of the temperature and the difference of energy between and . This probability is calculated by (4). As the temperature value is reduced, the acceptance probability decreases.

4. MPSABBE Algorithm

4.1. General Description

MPSABBE is a hybrid algorithm, which has four phases (see Figure 1). These phases are (i) Multiquenching Phase (MQP) applied from extremely high to high temperatures, (ii) Boltzmann Annealing Phase (BAP), which is executed from high to low temperatures, (iii) Bose-Einstein Annealing Phase (BEAP) from low to very low temperatures, and finally (iv) Dynamical Equilibrium Phase (DEP) which applies an annealing process at extremely low temperatures using Bose-Einstein distribution.

In order to accept worse solutions, BAP and BEAP apply Boltzmann and Bose-Einstein distributions, respectively. This is done with the aim of escaping from local minima. DEP is an extension of BEAP, where the stochastic equilibrium is dynamically detected. This is done by using a regression method into the metropolis cycle; the iterations’ number is considered as the independent variable and the energy value of each iteration as the dependent variable. The equilibrium detection criterion is the slope of the energy function into the metropolis cycle. The four phases MQP, BAP, BEAP and DEP are executed in the temperatures range shown in Table 1. The initial and final temperatures and are determined using the analytical tuning method of Section 6. The other temperatures are determined using a variability criterion, such as the variability being larger where the temperature is higher.

4.2. MQP Phase of MPSABBE

MQP has several subphases. It starts at an extremely high initial temperature (), which is obtained by an analytical method [41]. This phase is finished when a threshold temperature () is reached. MQP uses the cooling function given bywhere is a decrement factor of the temperature parameter, in the range , and defines how fast each MQP subphase is decreased. A very low value will decrease the temperature very fast. Besides, is defined as

The parameter is defined by (9), where , and it defines a quadratic decrement of the temperature. Notice that converges to zero and (7) is equivalent to (10):

The transition between two subphases is based on parameter; it occurs when converges to zero (). When is very close to zero, a new MQP subphase is started and is set to its initial value. This process continues until the temperature is reached. In Figure 2, the MQP phase is shown. In this phase, several subphases are shown. When a subphase is started, the parameter is set to its initial value.

In Algorithm 3, the MQP pseudocode of MPSABBE is shown. At setting section (see lines (4)–(6)), the initial temperature is calculated by an analytical method. The final temperature of this phase () is set to an initial value, determined in an experimental way. In line five, the variable is set to the initial temperature. The factors and are set to their initial values. The initial solution is generated (see line (8)). The energy of this solution is calculated, and and are set to and , respectively.

(1) MQP Procedure( )
(2) Begin
(3) //Setting section
(4) = Initial Temperature calculated by analytical method
(5) = Initial value,
(6) = initial value, = Initial value
(7) //Creation of initial solution
(8) = Create the initial solution,
(9) ,
(10) Repeat //External Cycle
(11) Repeat //Internal Cycle (Metropolis Cycle)
(12) = Perturbation () //Uniform perturbation
(13) Difference =
(14) If Difference ≤ 0 Then
(15)
(16)
(17) elseif Then //Boltzmann Probability
(18)
(19)
(20) end if
(21) If then //save
(22)
(23)
(24) end if
(25) Until Metropolis Cycle is Finished
(26) =
(27) If very close to 0 Then
(28) = initial value
(29) end if
(30)
(31) Until > //External Cycle
(32) End procedure

The external cycle is started at line (10), and this is finished at line (31). This internal cycle generates solutions of PFP and accepts or rejects solutions using the Boltzmann distribution. The temperature parameter is decreased into this cycle by applying a cooling function (see line (30)). In this cycle, is set by (9) (see line (26)). When is very close to zero, this variable is set to its initial value (see line (28)). The Temperature value is calculated by (7).

After the external cycle is started, the metropolis cycle is started too. This cycle generates new solutions of PFP. A new solution is obtained by applying a small perturbation to the current solution (see line (12)). The difference between the energies of and is calculated (see line (13)). If this difference is less than zero (see line (14)), then the new solution is accepted. is replaced by (see line (15)). is replaced by (see line (16)). If the difference of energies between these solutions is larger than zero, then the Boltzmann probability is applied (see line (17)). If this probability is larger than a random number between 0 and 1 (see line (17)), then the new solution is accepted (see line (18)). The is replaced by (see line (19)). If is less than (see line (21)) then is set to (see line (22)). The is replaced by (see line (23)).

4.3. BAP Phase of MPSABBE

In Algorithm 4, pseudocode of BAP is shown. BAP is based on simulated annealing. The temperature parameter is decreased by () or (). On the other hand, the length of metropolis cycle is determined by (21) or (27), respectively. In the internal cycle of the BAP, new solutions for the instance are generated. In this cycle, a better solution than a previous one is always accepted. However, worse solutions are accepted or rejected by applying the Boltzmann distribution (4). The length of the Markov chain (i.e., the internal cycle length) is determined by (21), where the increment is calculated with (22). The initial temperature was set to a threshold value, which was the final temperature of MQP phase. The final temperature of BAP phase is very close to zero.

(1) BAP Phase( )
(2) Begin
(3) (Final temperature of MQP Phase)
(4) = Final Temperature of this phase
(5) = initial value (very close to one)
(6) = Value calculated by analytical method
(7) CM = Initial value
(8) While () do
(9)
(10) while () do
(11) = perturbation system()
(12) Difference =
(13) If (Difference ≤ 0) then
(14)
(15)
(16) ElseIf ( then
(17)
(18)
(19) End if
(20) If then //save
(21)
(22)
(23) end if
(24)
(25) end while
(26) or
(27)
(28) End while
(29) End

4.4. BEAP Phase of MPSABBE

In Algorithm 5, pseudocode of BEAP is shown. Again the external cycle decreases its temperature value according to the cooling functions (1) or (2). This time, the metropolis cycle length is constant, and it is equal to the maximum length of the last metropolis cycle in BAP phase. In this second cycle, the Bose-Einstein distribution is applied for accepting worse solutions.

(1) BEAP Phase( )
(2) Begin
(3) = Threshold value; Determine
(4) = value very close to zero
(5) = initial value
(6) = Value calculated by analytical method
(7) CM = Initial value
(8) While () do
(9)
(10) while () do
(11) = perturbation system()
(12)
(13) If () then
(14)
(15) ElseIf (( then
(16)
(17) End if
(18)
(19) end while
(20) or
(21) End while
(22) End

4.5. DEP Phase of MPSABBE

In Algorithm 6, the DEP goal is to detect the stochastic equilibrium by determining the iteration where the slope of the energy function remains very close to zero. In order to do that, let us define the next variables: (a) the number of the iterations in the metropolis cycle and (b) the energy found for the algorithm in iteration . Using a standard least squares method, the slope for iterations is defined bywhich becomeswhere

(1) DEP Phase( )
(2) Begin
(3) While () do
(4)
(5) = ,
(6) while () do
(7) = perturbation_system()
(8) If = total_clausule then stop( )
(9)
(10) If () then
(11)
(12) ElseIf () then
(13)
(14) End if
(15)
(16)
(17)
(18) end while
(19) or
(20)
(21)
(22)
(23) End while
(24) End

Notice that the complexity of the computation of (12) is . This equation contains only summations; thus, it is less complex than (11). These summations are computed using simple data structures. and are only constants for a particular value.

5. Analytical Tuning Method

5.1. Parameters Setting Based on Boltzmann Distribution

Parameters of MPSABBE are tuned by the analytical method [42]. The initial temperature is defined by the maximum difference named maximum decrement , which is calculated using a sample of random protein structures at the highest temperature range. In this sample, the energy of two consecutive protein structures defines a simple decrement of energy , and is the maximum difference in the sample. On the other hand, the final temperature is calculated by applying the minimum deterioration (i.e., minimum decrement) of a sample of protein structures taken at low temperatures. Analytical tuning based on Boltzmann distribution can be helpful for setting up the initial temperature. The probability of accepting any new solution is near to one () at high temperatures, so the decrement of the cost function is maximal. The initial temperature () is associated with the maximum deterioration admitted and the defined acceptance probability .

Let be the current solution and a new proposed one, and and are the costs associated to and , respectively. The maximum and minimum deteriorations are and , respectively; then probability of accepting a new solution with the maximum deterioration is defined byThis equation is basically the Boltzmann distribution, which is applied for calculating . This temperature value is defined bySimilarly, the final temperature () is established according to the probability of accepting a new solution with the minimum deterioration. The equation for calculating the final temperature is defined byThere are other parameters of MPSABBE that are calculated by applying a particular cooling function; for example, the metropolis cycle length is calculated by applying

The analytical method determines the metropolis cycle length with a simple Markov model [42]; at high temperatures, only a few iterations are required because, in this condition, the stochastic equilibrium is reached very fast. Nevertheless, at low temperatures, a more exhaustive exploration is needed and should be as largest as possible. Let be at the temperature and let be the maximum metropolis cycle length. Let the temperature be decreased by the cooling function (17) and let be calculated bywhere is the increment coefficient of metropolis cycle (), so and is the initial value. The markov chain length of the last metropolis cycle is equal to . Functions (17) and (18) are consecutively applied in simulated annealing from to ; consequently and are obtained by (19) and (20), respectively,where is the steps number from to .

Notice that the increment coefficient can be calculated if the initial length and the maximum length value are available. As is well known the former can simply be set close to one, while the second depends on the exploration level established in the algorithm as follows.

Thus, the number of times that the metropolis cycle is executed can be simply obtained by using (21). Once is determined the increment of the metropolis cycle length can be calculated by (22):

5.2. Parameters Setting Based on Bose-Einstein Distribution

The initial and final temperatures can be calculated by applying the Bose-Einstein distribution. Then, the probability of accepting a new solution with the maximum deterioration is defined by (23). Consequently, the initial and final temperatures are calculated with (24) and (25), respectively,

Let be decreased by the cooling function (2). Thus, is calculated by As a consequence, and are calculted by

Notice that the increment coefficient can be calculated if the initial and maximum metropolis length and are available [42]. As is well known the former can simply be set close to one, while the second depends on the exploration level established in the algorithm. Therefore, for any solution, the value of depends on the size of neighborhood . Thus, and , where is the rejection probability for a solution . The parameter ranges from 1 to 4.6; the larger value of assures a good exploration level in the neighborhood of at the final temperature. Hence, different exploration levels can be applied. When we explore with values of 63%, 86%, 95%, or 99%, the exploration levels are , or , respectively. Because can be very large for PFP instances, it is important to apply a particular process for detecting the stochastic equilibrium; this is done in DEP phase of MPSABBE that detects efficiently the stochastic equilibrium. The next section explains all MPSABBE phases and the performance of using Boltzmann and Bose-Einstein distribution.

6. Experimental Results

MPSABBE is tested with five instances of PFP, which are -enkephalin, proinsulin, T0549, T0335, and T0281. These instances have different sequence’s length and a different number of variables (dihedral angles). The smallest sequence is -enkephalin, which has five amino acids and 19 variables. The largest sequence is a hypothetical protein (CASP T0281), which has 90 amino acids and 458 variables. The proinsulin instance has 31 amino acids and 132 variables; the 2K5E (CASP T0549) has 73 amino acids and 343 variables. The instance Bacillus subtilis (CASP T0335) has 85 amino acids and 450 variables. The dihedral angles used in the simulations were phi (), psi (), omega (), and Chi (). The initial and final temperature are tuned analytically. In MQP, parameters and are set with 0.85 and 0.999, respectively. In each subphase of MQP the final value of is set to 0.001.

In Table 2, the results of -enkephalin obtained with MPSABBE algorithm are shown. In this table, we show the traditional average energy, processing time in minutes, and the average of the traditional RMSD (Root-Mean-Square Deviation) [43]. The RMSD was calculated using TM-Align [44]. The best average solution for -enkephalin is −5.0634 kcal/mol with 0.8427 minutes of processing time, and the average RMSD obtained was 0.361 Å (Angstroms). The RMSD is a measure which represents a structural alignment between two proteins (target and solution). The target used in this paper was taken from Protein Data Bank (PDB). An RMSD near to zero is taken as a perfect structural alignment between both proteins. The RMSD is commonly used in protein folding to represent how a new obtained solution by simulation is structurally similar to the target solution. In this case, in Figure 3, the graphic of energy and RMSD for each solution is shown. In this graphic, all energies of -enkephalin calculated by MPSABBE are plotted. This is a solution with poor quality because there are better solutions in the literature; the energy found by MPSABBE was −7.2787 kcal/mol. In Figure 4, the graphics of landscape of -enkephalin is shown. The results obtained in the literature for this case by using ECEPP/2 and with fixed at 180 or variable were −10.72 [20] and −12.90 [43, 45], respectively. Examining the features of MPSABBE the exploration ability is not good enough; thus, the algorithm requires improvement. Figure 3 shows all solutions generated by MPSABBE; the curve enveloping the number of solutions in Figure 3 is only a descriptive tool to illustrate that the optimal solution is reached when the RMSD is too small; however, this is not really a very good stop condition. Notice that the best result obtained with the classical simulated annealing in the literature using Boltzmann distribution was only −5 kcal/mol [43], while the best result obtained in this case for MPSABBE using Bose-Einstein distribution was −7.2787 kcal/mol.

In Table 3, the results of proinsulin obtained with MPSABBE algorithm are shown. The best average solution for this instance is −122.4350 kcal/mol with 20.7302 minutes of processing time, the average RMSD is 3.127 Å. This solution was obtained with . In Figure 5 the graphic of energy and RMSD for each solution is shown. In this Figure, some energies of proinsulin calculated by MPSABBE are plotted. The best solution found by MPSABBE was −142.7586 kcal/mol. In Figure 6, the landscape of proinsulin is shown.

In Table 4, the results of T0549 instance obtained with MPSABBE algorithm are shown. The best average solution for this instance is −257.0625 kcal/mol with 106.6151 minutes of processing time, the average RMSD is 4.30 Å. This solution was obtained with . In Figure 7, the energy and RMSD for each solution are shown. In this figure, some energies of T0549 instance calculated by MPSABBE are plotted. The best solution found was −317.2117 kcal/mol. In Figure 8, the landscape of T0549 is shown.

In Table 5, the results of T0335 instance obtained with MPSABBE algorithm are shown. The best average solution for this instance is −378.6827 kcal/mol with 202.2453 minutes of processing time; the average RMSD is 3.5793. This solution was obtained with . In Figure 9, the energy and RMSD for each solution are shown. In this figure, some energies of T0335 instance calculated by MPSABBE are plotted. The best solution was −427.2939 kcal/mol. In Figure 10, the landscape of T0335 is shown.

In Table 6, the results of T0281 instance obtained with MPSABBE algorithm are shown. The best average solution for this instance is −322.3821 kcal/mol with 187.5070 minutes of processing time; the average RMSD is 4.5 Å. This solution was obtained with . In Figure 11, the graphic of energy and RMSD for each solution are shown. In this figure, some energies of T0281 instance calculated by MPSABBE are plotted. The best solution found was −380.1765 kcal/mol. In Figure 12, the landscape of T0281 is shown.

Figures 13–15 show the graphs of energy, which are obtained from consecutive solutions in the cycle of metropolis in specific executions. These figures correspond to the results of energies obtained from the MPSABBE algorithm with -enkephalin, proinsulin, and T0281 instances, respectively.

6.1. Test Hypothesis

In Table 7, the average and deviation of energy and time for each instance applying MPSABBE algorithm are shown. The null hypothesis is defined as , which means that the average energy of MPSABBE () for each instance is less than or equal to CMQA () [14]. The alternative hypothesis is defined as . In Table 8, the average and standard deviation of energy and time for each instance applying the proposed algorithm are shown. The average processing times are used for testing the null hypothesis, which is defined as , which means that the average processing time of MPSABBE () is less than or equal to the average processing time of CMQA (). The alternative hypothesis is defined as . In Table 9, the values obtained for -student are shown; these values were calculated by applying the average and standard deviation of energy and execution time from Tables 7 and 8.

The value of -student is −2.6363 (Table 9). The critical value is 1.645. The statistic test determined that the null hypothesis is accepted; thus, MPSABBE generates better quality solution than CMQA, when these approaches are applied with -enkephalin instance. Therefore, the null hypothesis is rejected, and the average energy of MPSABBE () for -enkephalin instance is less than or equal to CMQA (). For processing execution time, the value of the statistic test (-student) is −2.4022. Thus, MPSABBE (applied to -enkephalin instance) uses less processing execution time than CMQA.

When the proinsulin instance is applied, the value of the statistic test (-student) is −0.4272; thus, MPSABBE generates better quality solution than CMQA. For processing execution time, the value of the statistic test (-student) is −3.0368. MPSABBE (applied to proinsulin instance) uses less processing execution time than the average processing time of CMQA. When the T0549 instance is applied, the value of the statistic test (-student) is 0.3522, so that MPSABBE generates better quality solution than CMQA. For processing time, the value of the statistic test (-student) is −4.0444. The MPSABBE (applied to T0549 instance) uses less processing execution time than CMQA. When the T0335 instance is applied, the value of the statistic test (-student) is 0.5573, so that MPSABBE generates better quality solution than CMQA. For the processing execution time, the value of the statistic test (-student) is −3.5832. The MPSABBE (applied to T0335 instance) uses less processing time than CMQA. When the T0281 instance is applied, the value of the statistic test (-student) is 1.0515; thus, MPSABBE generates better quality solution than CMQA. For processing execution time test, the value of the statistic test (-student) is −4.0533. Then MPSABBE (applied to T0281 instance) uses less processing execution time than CMQA. Therefore, MPSABBE generates the better quality solution and uses less processing execution time than CMQA in all instances.

Notice that the improvement obtained when the two distributions are used is better when the protein is smaller. For instance, for -enkephalin and proinsulin (with five and thirty-one amino acids) MPSABBE surpass CMQA by 13.73 and 1.1243%, respectively; otherwise for T0549, T0335, and T0281 (with 73, 85, and 90 amino acids), these figures were −1.12, −2.13, and −3.75%, respectively. Thus, the new algorithm obtains better results for small proteins than the classical SA.

7. Conclusions

In this paper, a new Simulated Annealing Algorithm named MPSABBE for Protein Folding Problem is presented. This algorithm includes Bose-Einstein and Boltzmann distributions in SA. Traditionally, for PFP, SA only uses the Boltzmann distribution function as the acceptance probability of bad solutions. MPSABBE was compared to a classical SA for protein folding which only applies Boltzmann distribution. According to the experimentation, the new algorithm is more efficient by the use of the two distributions when the proteins are small. The quality of the solutions obtained by the new approach is not always the best alternative, although the difference of the quality solution is only 2 to 5% for the worse cases. Besides, the new approach can overtake the classical quality solution of SA by one to ten percent while execution time is in general lower.

Competing Interests

The authors declare that they have no competing interests.

References

B. Lewin, Genes VIII, Pearson Prentice Hall, 2004.
C. B. Anfinsen, “Principles that govern the folding of protein chains,” Science, vol. 181, no. 4096, pp. 223–230, 1973.
View at: Publisher Site | Google Scholar
A. Liwo, C. Czaplewski, S. Ołdziej, and H. A. Scheraga, “Computational techniques for efficient conformational sampling of proteins,” Current Opinion in Structural Biology, vol. 18, no. 2, pp. 134–139, 2008.
View at: Publisher Site | Google Scholar
J. Lee, H. A. Scheraga, and S. Rackovsky, “New optimization method for conformational energy calculations on polypeptides: conformational space annealing,” Journal of Computational Chemistry, vol. 18, no. 9, pp. 1222–1232, 1997.
View at: Publisher Site | Google Scholar
J. L. Sohl, S. S. Jaswal, and D. A. Agard, “Unfolded conformations of α-lytic protease are more stable than its native state,” Nature, vol. 395, no. 6704, pp. 817–819, 1998.
View at: Publisher Site | Google Scholar
Z. Wang, J. Mottonen, and E. J. Goldsmith, “Kinetically controlled folding of the serpin plasminogen activator inhibitor 1,” Biochemistry, vol. 35, no. 51, pp. 16443–16448, 1996.
View at: Publisher Site | Google Scholar
J. Ngo, J. Marks, and M. Karplus, “Computational complexity, protein structure prediction, and the levinthal paradox,” in The Protein Folding Problem and Tertiary Structure Prediction, pp. 433–506, Springer, Berlin, Germany, 1994.
View at: Publisher Site | Google Scholar
J. T. Ngo and J. Marks, “Computational complexity of a problem in molecular structure prediction,” Protein Engineering, vol. 5, no. 4, pp. 313–321, 1992.
View at: Publisher Site | Google Scholar
S. Kirkpatrick, J. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671–680, 1983.
View at: Publisher Site | Google Scholar | MathSciNet
V. Černý, “Thermodynamical approach to the traveling salesman problem: an efficient simulation algorithm,” Journal of Optimization Theory and Applications, vol. 45, no. 1, pp. 41–51, 1985.
View at: Publisher Site | Google Scholar | MathSciNet
K. T. Simons, C. Kooperberg, E. Huang, and D. Baker, “Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions,” Journal of Molecular Biology, vol. 268, no. 1, pp. 209–225, 1997.
View at: Publisher Site | Google Scholar
K. W. Kaufmann, G. H. Lemmon, S. L. Deluca, J. H. Sheehan, and J. Meiler, “Practically useful: what the R osetta protein modeling suite can do for you,” Biochemistry, vol. 49, no. 14, pp. 2987–2998, 2010.
View at: Publisher Site | Google Scholar
D. Simoncini and K. Y. J. Zhang, “Efficient sampling in fragment-based protein structure prediction using an estimation of distribution algorithm,” PLoS ONE, vol. 8, no. 7, Article ID e68954, 2013.
View at: Publisher Site | Google Scholar
J. Frausto-Solís, E. Liñan-García, M. Sánchez-Pérez, and J. P. Sánchez-Hernández, “Chaotic multiquenching annealing applied to the protein folding problem,” The Scientific World Journal, vol. 2014, Article ID 364352, 11 pages, 2014.
View at: Publisher Site | Google Scholar
E. A. B. Cole, “Integral evaluation in semiconductor device modelling using simulated annealing with Bose-Einstein statistics,” International Journal of Numerical Modelling: Electronic Networks, Devices and Fields, vol. 20, no. 4, pp. 197–215, 2007.
View at: Publisher Site | Google Scholar
C. Levinthal, “Are there pathways for protein folding?” Journal de Chimie Physique, vol. 65, no. 1, pp. 44–45, 1968.
View at: Google Scholar
J. W. Ponder and D. A. Case, “Force fields for protein simulations,” Advances in Protein Chemistry, vol. 66, pp. 27–85, 2003.
View at: Publisher Site | Google Scholar
B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, “CHARMM: a program for macromolecular energy, minimization, and dynamics calculations,” Journal of Computational Chemistry, vol. 4, no. 2, pp. 187–217, 1983.
View at: Publisher Site | Google Scholar
F. A. Momany, R. F. McGuire, A. W. Burgess, and H. A. Scheraga, “Energy parameters in polypeptides. VII. Geometric parameters, partial atomic charges, nonbonded interactions, hydrogen bond interactions, and intrinsic torsional potentials for the naturally occurring amino acids,” The Journal of Physical Chemistry, vol. 79, no. 22, pp. 2361–2381, 1975.
View at: Publisher Site | Google Scholar
F. Eisenmenger and U. H. E. Hansmann, “Variation of the energy landscape of a small peptide under a change from the ECEPP/2 force field to ECEPP/3,” The Journal of Physical Chemistry B, vol. 101, no. 16, pp. 3304–3310, 1997.
View at: Publisher Site | Google Scholar
F. Eisenmenger, U. H. E. Hansmann, S. Hayryan, and C.-K. Hu, “[SMMP] A modern package for simulation of proteins,” Computer Physics Communications, vol. 138, no. 2, pp. 192–212, 2001.
View at: Publisher Site | Google Scholar
G. Némethy, K. D. Gibson, K. A. Palmer et al., “Energy parameters in polypeptides. 10. Improved geometrical parameters and nonbonded interactions for use in the ECEPP/3 algorithm, with application to proline-containing peptides,” Journal of Physical Chemistry, vol. 96, no. 15, pp. 6472–6484, 1992.
View at: Publisher Site | Google Scholar
H. J. C. Berendsen, D. van der Spoel, and R. van Drunen, “GROMACS: a message-passing parallel molecular dynamics implementation,” Computer Physics Communications, vol. 91, no. 1–3, pp. 43–56, 1995.
View at: Publisher Site | Google Scholar
G. N. Ramachandran, C. M. Venkatachalam, and S. Krimm, “Stereochemical criteria for polypeptide and protein chain conformations. III. Helical and hydrogen-bonded polypeptide chains,” Biophysical Journal, vol. 6, no. 6, pp. 849–872, 1966.
View at: Publisher Site | Google Scholar
N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, “Equation of state calculations by fast computing machines,” The Journal of Chemical Physics, vol. 21, no. 6, pp. 1087–1092, 1953.
View at: Publisher Site | Google Scholar
E. Liñán-García and L. M. Gallegos-Araiza, “Simulated annealing with previous solutions applied to DNA sequence alignment,” ISRN Artificial Intelligence, vol. 2012, Article ID 178658, 6 pages, 2012.
View at: Publisher Site | Google Scholar
J. Kim, S. Pramanik, and M. J. Chung, “Multiple sequence alignment using simulated annealing,” Computer Applications in the Biosciences: CABIOS, vol. 10, no. 4, pp. 419–426, 1994.
View at: Publisher Site | Google Scholar
C. H. Shyi-Ming, “Multiple dna sequence alignment based on genetic simulated annealing techniques,” Information and Management Sciences, vol. 18, no. 2, pp. 97–111, 2007.
View at: Google Scholar
J.-M. Richer, E. Rodriguez-Tello, and K. E. Vazquez-Ortiz, “Maximum parsimony phylogenetic inference using simulated annealing,” in EVOLVE—A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation II, O. Schütze, C. A. Coello Coello, A.-A. Tantar et al., Eds., vol. 175 of Advances in Intelligent Systems and Computing, pp. 189–203, Springer, Berlin, Germany, 2013.
View at: Publisher Site | Google Scholar
D. J. Wales and H. A. Scheraga, “Global optimization of clusters, crystals, and biomolecules,” Science, vol. 285, no. 5432, pp. 1368–1372, 1999.
View at: Publisher Site | Google Scholar
E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing, vol. 37 of Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley and Sons, 1989.
View at: MathSciNet
L. Ingber, “Simulated annealing: practice versus theory,” Mathematical and Computer Modelling, vol. 18, no. 11, pp. 29–57, 1993.
View at: Publisher Site | Google Scholar | Zentralblatt MATH | MathSciNet
U. Kjærulff, “Optimal decomposition of probabilistic networks by simulated annealing,” Statistics and Computing, vol. 2, no. 1, pp. 7–17, 1992.
View at: Publisher Site | Google Scholar
P. J. M. Van Laarhoven and E. H. L. Aarts, Simulated Annealing: Theory and Applications, vol. 37, Springer Netherlands, 1987.
Y. Nourani and B. Andresen, “A comparison of simulated annealing cooling strategies,” Journal of Physics A: Mathematical and General, vol. 31, no. 41, pp. 8373–8385, 1998.
View at: Publisher Site | Google Scholar | Zentralblatt MATH
Y. Shen, S. Kiatsupaibul, Z. B. Zabinsky, and R. L. Smith, “An analytically derived cooling schedule for simulated annealing,” Journal of Global Optimization, vol. 38, no. 3, pp. 333–365, 2007.
View at: Publisher Site | Google Scholar | MathSciNet
R. Eisberg, R. Resnick, and J. D. Sullivan, “Quantum physics of atoms, molecules, solids, nuclei and particles,” Physics Today, vol. 28, no. 12, p. 51, 1975.
View at: Publisher Site | Google Scholar
F. T. Wall, “Alternative derivations of the statistical mechanical distribution laws,” Proceedings of the National Academy of Sciences of the United States of America, vol. 68, no. 8, pp. 1720–1724, 1971.
View at: Publisher Site | Google Scholar
E. A. Cornell and C. E. Wieman, “Bose-einstein condensation in a dilute gas: the first 70 years and some recent experiments,” in Proceedings of the International Symposium on Frontiers of Science in Celebration of the 80th Birthday of C. N. Yang, H.-T. Nieh, Ed., pp. 183–222, World Scientific, Singapore, 2003.
View at: Google Scholar
J. Frausto-Solís, M. Sánchez-Pérez, E. Lińan-García,, and J. P. Sánchez-Hernández, “Threshold temperature tuning simulated annealing for protein folding problem in small peptides,” Computational & Applied Mathematics, vol. 32, no. 3, pp. 471–482, 2013.
View at: Publisher Site | Google Scholar | MathSciNet
J. Frausto-Solís, H. Sanvicente-Sánchez, and F. Imperial-Valenzuela, “ANDYMARK: an analytical method to establish dynamically the length of the Markov chain in simulated annealing for the satisfiability problem,” in Simulated Evolution and Learning, T.-D. Wang, X. Li, S.-H. Chen et al., Eds., vol. 4247 of Lecture Notes in Computer Science, pp. 269–276, Springer, Berlin, Germany, 2006.
View at: Publisher Site | Google Scholar
H. Sanvicente-Sánchez and J. Frausto-Solís, “A method to establish the cooling scheme in simulated annealing like algorithms,” in Proceedings of the International Conference on Computational Science and Its Applications (ICCSA '04), vol. 3045, pp. 755–763, Springer, Assisi, Italy, 2004.
View at: Publisher Site | Google Scholar
A. Nayeem, J. Vila, and H. A. Scheraga, “A comparative study of the simulated-annealing and Monte Carlo-with-minimization approaches to the minimum-energy structures of polypeptides: [Met]-enkephalin,” Journal of Computational Chemistry, vol. 12, no. 5, pp. 594–605, 1991.
View at: Publisher Site | Google Scholar
Y. Zhang and J. Skolnick, “TM-align: a protein structure alignment algorithm based on the TM-score,” Nucleic Acids Research, vol. 33, no. 7, pp. 2302–2309, 2005.
View at: Publisher Site | Google Scholar
Z. Li and H. A. Scheraga, “Monte Carlo-minimization approach to the multiple-minima problem in protein folding,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 19, pp. 6611–6615, 1987.
View at: Publisher Site | Google Scholar | MathSciNet

Copyright

Copyright © 2016 Juan Frausto-Solis et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies

Views

3588

Downloads

1047

Citations