Abstract

The 46- and 69-residue BLN model proteins both exhibit frustrated folding to β-barrel structures. We study the effect of varying the strength of nonnative interactions on the corresponding energy landscapes by introducing a parameter 𝜆, which scales the potential between the BLN (𝜆=1) and Gō-like (𝜆=0) limits. We study the effect of varying 𝜆 on the efficiency of global optimisation using basin-hopping and genetic algorithms. We also construct disconnectivity graphs for these proteins at selected values of 𝜆. Both methods indicate that the potential energy surface is frustrated for the original BLN potential but rapidly becomes less frustrated as 𝜆 decreases. For values of 𝜆0.9, the energy landscape is funnelled. The fastest mean first encounter time for the global minimum does not correspond to the Gō model: instead, we observe a minimum when the favourable nonnative interactions are still present to a small degree.

1. Introduction

Proteins are biopolymers constructed from a sequence of amino acid residues. The potential energy landscapes of proteins have many degrees of freedom and include important contributions between pairs of residues that are distant in sequence, but close to each other in space. Despite this complexity, many globular proteins fold to a well-defined the native state. According to the thermodynamic hypothesis, this structure is the global free energy minimum for a given sequence [1]. Frustration occurs when there are low-lying structures separated by high barriers [2]. All the favourable interactions between pairs of residues cannot be accommodated at the same time, which can lead to energetic frustration, where there are several low-lying structures with different patterns of contacts. Geometric frustration occurs when the interconversion of two low-lying structures requires the breaking of several favourable contacts.

A systematic way to simplify the potential energy surface for a protein is to include only attractive interactions between pairs of residues that are in contact in the native state, which constitutes a Gō model [3]. Various on- and off-lattice Gō models have been investigated by different authors to study a range of different proteins. In spite of the simplified potential, these models have proved capable of reproducing certain aspects of protein dynamics and thermodynamics [411]. Using a Gō model tends to lead to funnelled energy landscapes [12], with very little frustration. For some proteins, neglecting nonnative interactions can have a significant influence on the energy landscape [13].

United atom representations introduce a further level of coarse-graining, which can speed up simulations significantly, at the cost of atomistic detail. The simplest coarse-grained model is the HP model, in which each protein residue is represented by a single hydrophobic (H) or polar (P) bead and is constrained to lie on a regular lattice [14, 15]. The BLN model is an off-lattice generalisation of the HP model with three types of bead: hydrophobic (B), hydrophilic (L), and neutral (N). The 46-residue sequence [12, 1633] B9N3(LB)4N3B9N3(LB)5L and the 69-residue sequence [3438] B9N3(LB)4N3B9N3(LB)4N3B9N3(LB)5L were designed to exhibit frustrated folding and have several alternate 𝛽-barrel structures that are separated by large energy barriers. Disconnectivity graphs [39] for both of these proteins exhibit energy landscapes comprising several folding funnels [12, 38]. Using a Gō potential for these two proteins changes the nature of their energy landscapes, and they both exhibit single funnels with very little frustration [12, 38].

Intermediate potentials can be generated using a parameter, 𝜆, which scales the strength of the nonnative interactions between the Gō (𝜆=0) and BLN (𝜆=1) limits. The folding thermodynamics of the 46-residue BLN protein have been investigated using this scaled BLN potential [23, 32, 33], showing that most of the frustration is only present for values of 𝜆0.9. The introduction of salt bridges (gatekeepers) to the 46-residue protein also produces energy landscapes of intermediate character [27, 28].

In the present work, we study the effect of varying 𝜆 on the ease of global optimisation of the 46- and 69-residue BLN proteins using a basin-hopping algorithm and a genetic algorithm. We also construct disconnectivity graphs to compare the energy landscapes of the proteins for different values of 𝜆.

2. Computational Methods

The protein structures were modelled using the following BLN potential [12, 21, 26, 28]:𝑉BLN=12𝐾𝑟𝑁1𝑖=1𝑅𝑖,𝑖+1𝑅𝑒2+12𝐾𝜃𝑁2𝑖=1𝜃𝑖𝜃𝑒2+𝜖𝑁3𝑖=1𝐴𝑖1+cos𝜙𝑖+𝐵𝑖1+3cos𝜙𝑖+4𝜖𝑁2𝑁𝑖=1𝑗=𝑖+2𝐶𝑖𝑗𝜎𝑅𝑖𝑗12𝐷𝑖𝑗𝜎𝑅𝑖𝑗6,(1) where 𝑅𝑖𝑗 is the distance between two beads 𝑖 and 𝑗. The first term is a harmonic bond restraint with 𝐾𝑟=231.2𝜖𝜎2 and 𝑅𝑒=𝜎. The second term is a bond angle restraint with 𝐾𝜃=20 rad−2 and 𝜃𝑒=1.8326 rad. The third term involves torsional angles, 𝜙, defined by four successive beads. If two or more of these beads are N, then 𝐴=0 and 𝐵=0.2. For all other sequences, 𝐴=𝐵=1.2. The final term introduces pairwise nonbonded interactions. If one residue is L and the other is L or B, then 𝐶=2/3 and 𝐷=1. If either of the residues is N, then 𝐶=1 and 𝐷=0. If both residues are B, then 𝐶=1, but the value of 𝐷 depends on the presence of the contact in the native state of the protein. For native contacts, 𝐷=1. For nonnative contacts, 𝐷=𝜆, where 0<𝜆<1. The case where 𝜆=1 is the original BLN potential and 𝜆=0 is the Gō potential.

Native contacts are defined as all pairs of residues where 𝑅𝑖𝑗 is less than a fixed cut-off distance in the native state (global minimum) of the protein. When 𝜆1, the value of this cut-off radius will influence the energy landscape. Here, we use 1.167𝜎 for consistency with previous work [12, 28, 38].

Global optimisation was performed using the basin-hopping approach [4244] and a Lamarckian genetic algorithm [38, 45], which are both implemented in the GMIN program [46]. Each algorithm involves local energy minimisation after each structural perturbation. This minimisation transforms the potential energy surface into the basins of attraction of local minima [47] and removes downhill barriers. The search parameters for both algorithms were optimised in previous work for BLN proteins [38], and these parameters were used without adjustment for all searches presented here (Table 1). The GMIN input files used for these searches are included in the supplementary data (see Supplementary Material available online at http://dx.doi.org/10.1155/2012/192613).

The genetic algorithm represents each structure with a genome consisting of the torsion angles in the backbone of the protein. Offspring structures are generated by one-point crossover from two parent structures. Mutants are generated by making a copy of an existing structure (parent or offspring) and replacing one of the torsion angles. To prevent stagnation of the genetic algorithm searches, a restart operator was used. If an entire generation of offspring contains no solutions that are fitter than any of the parent structures, a new epoch is started with a new random population. For the 69-residue protein, the fittest structure from each epoch survives into the next epoch.

All conformational searches were run until the global minimum structure was found. We report the mean time taken to encounter this structure in conformational searches from randomised starting points to compare the exploration of the energy landscape as a a function of 𝜆. Searches were performed for values of 𝜆 between 0 and 1 in steps of 0.1, with additional points at 𝜆=0.95 and 𝜆=0.99. The initial structures for this benchmarking were generated using two alternative methods: either random placement of the residues inside a sphere of radius 3𝜎, or random assignment of the backbone dihedral angles. Full details of all of the global optimisation runs are available as supplementary data.

The disconnectivity graphs for the model proteins were constructed from databases of stationary points generated using the PATHSAMPLE program, [48] which organises independent pathway searches using OPTIM [49]. All the transition state searches in OPTIM were conducted in Cartesian coordinates [50] using a quasicontinuous interpolation scheme to avoid chain crossings, with local maxima accurately refined to transition states by hybrid eigenvector-following [5153]. Successive pairs of local minima were selected for connection attempts within OPTIM using the missing connection algorithm [54]. Disconnectivity graphs [39] will be illustrated for both the 46- and 69-residue scaled BLN proteins with 𝜆 values of 0, 0.5, 0.9, and 1.

We also study the effect of 𝜆 on key structures of the BLN proteins. These structures were reminimised using values of 𝜆 between 0 and 1 in steps of 0.1. Pathways between pairs of interesting minima were studied by Dijkstra analysis [55] in PATHSAMPLE [48], with the discrete paths [56] that make the largest contribution to the steady-state rate constant [56, 57] presented here.

With a few exceptions, all of the stationary points of the BLN model proteins are chiral. However, the BLN potential includes no chiral terms, so each structure has an enantiomer with the same energy. When evaluating the optimisation algorithms, we accept convergence to either of the enantiomers of the global minimum. When looking at the pathways, it is important to use the same chirality for both structures, otherwise much longer paths result. For some of the trapped structures, pathways to both enantiomers of the global minimum can be viable.

3. Results

3.1. BLN-46

Searches for 𝜆=0 (Gō potential) find the global minimum much more rapidly than when 𝜆=1 (BLN potential), as one would expect for a more funnelled energy landscape [2, 5861]. However, the number of steps required varies nonlinearly between these two extremes and behaves differently for each search algorithm. When optimising with the GA, the mean first encounter time decreases rapidly from 𝜆=1 to 𝜆=0.9 and then more slowly to a minimum at 𝜆=0.5 (Figure 1). After this minimum, there is a small increase in the required time as 𝜆 decreases to 0. This result is consistent with previous observations that the introduction of some nonnative interactions can assist the folding of some proteins [62]. Below 𝜆=0.9, almost all searches find the global minimum within the first epoch of the GA. For larger values of 𝜆, several searches require two or more epochs, leading to much more variation in the first encounter time. The choice of the random starting configurations for the initial population of the GA makes little difference to the mean first encounter time.

In basin-hopping searches, the choice of starting structures makes a large difference to the efficiency of the optimisation. When starting from residues randomly distributed inside a sphere, for values of 𝜆<0.7, 95% of the searches find the global minimum rapidly. The remaining searches become trapped and require several thousand attempted Monte Carlo moves to escape (Figure 2). In this trap, the first, third, and fourth strands are correctly packed, but the second is wrapped around the outside of the protein (Figure 3). Searches with larger values of 𝜆 do not become trapped in this basin, which suggests that the nonnative interactions are important in stabilising the intermediates between this structure and the global minimum.

The trap configuration lies 12.4𝜖 above the global minimum when 𝜆=0 and becomes more unfavourable for larger values of 𝜆 (Table 2). The fastest escape route from this trap involves unthreading of the N-terminus from the loop made by the second strand (Table 2). The energy of the highest transition state on this pathway relative to the trapped state increases from 𝜆=0 to 𝜆=0.9 before levelling off. The highest transition state on this pathway lies above the barrier to interconversion of the two enantiomers of the global minimum. For searches starting from a random set of torsion angles, this trapping is much less frequent and is only seen in 3 of the 700 searches performed where 0𝜆0.6. By retaining some notion of connectivity, these initial structures cover less of the configurational space than the entirely random starting points. However, the complete coverage of conformational space comes at the cost of including more unstable structures, such as the trap seen here.

The five lowest minima in the BLN-46 protein span an energy range of less than 𝜖 (Figure 4). The two most stable minima are in the same basin, and both have all of the BB contacts from the native state. Across the range of 𝜆, the relative energies of these minima are within 0.1𝜖 of each other, with the second-best minimum becoming slightly more stable as 𝜆 decreases and moving below the former global minimum when 𝜆<0.3 [12]. The next three minima are stabilised by some nonnative contacts and become less stable relative to the global minimum as 𝜆 decreases. In the region around 𝜆=0.5, these structures cease to be minima and fall into the basins of attraction [41] of the two lowest energy structures.

The disconnectivity graphs within 7𝜖 of the global minimum for 𝜆=0 and 𝜆=0.5 are funnelled and almost indistinguishable (Figure 5). When 𝜆=0.9, some frustration appears in the low-energy regions of the energy landscape, but it is still mostly funnelled. Almost all of the frustration is introduced between 𝜆=0.9 and 𝜆=1, where several alternate 𝛽-barrel structures are separated by barriers of 4 to 5𝜖. This organisation is consistent with the increase in the mean first encounter times seen for global optimisation with 𝜆>0.9 and agrees with previous studies of the thermodynamics of the 46-residue protein [32, 33], where 𝜆=0 and 𝜆=0.5 were found to be good folders, 𝜆=0.9 an intermediate folder, and 𝜆=1 a poor folder.

3.2. BLN-69

The behaviour of the GA for the 69-residue protein is similar to that for the 46-residue protein, with the fastest search time found at 𝜆=0.5. When optimising with basin-hopping on the 69-residue protein, there are several slow searches between 𝜆=0.4 and 𝜆=0.8 (Figure 6). There are multiple trap structures, and the one that is seen most frequently, which is responsible for the slowest searches, is formed from three strands from the left-handed barrel and three strands from right-handed barrel (Figure 7). This structure is a six-stranded 𝛽-barrel similar to the global minimum, but with two sets of interstrand contacts swapped (1–6 and 3-4 in the global minimum compared to 1–4 and 3–6 in the trap).

Conversion from the above structure to the global minimum proceeds either by inversion of the three strands at the N-terminus or of the three strands at the C-terminus. The barriers to these two mechanisms are different and vary with 𝜆 (Table 3). The barrier for the fastest pathway for inversion at the C-terminus becomes larger with increasing 𝜆. However, the barrier for inversion of the N-terminus varies much less with 𝜆. In the region where 0.5𝜆0.7, the barriers to both routes out of the trap are relatively high, which is a possible explanation for the slow basin-hopping optimisation for these values of 𝜆. This is doubtless an oversimplification when we consider that there are multiple trap structures.

For the 69-residue BLN protein, the energies of the five lowest minima span less than 0.4𝜖 (Figure 8). One structure lies in the same funnel as the global minimum, and its relative energy increases from 0.2𝜖 to 1.6𝜖 when 𝜆 decreases from 1 to 0. The other three structures occupy different funnels from the global minimum, with several nonnative contacts, and their stability decreases steeply with decreasing 𝜆. Unlike the 46-residue protein, the global minimum structure remains the same for all values of 𝜆. The low-energy region of disconnectivity graphs for values of 𝜆 between 0 and 0.9 are mostly funnelled (Figure 9). Almost all of the frustration in this region of the potential energy surface appears for 𝜆>0.9.

4. Conclusions

Much of the energetic frustration in the BLN proteins is removed once the potential contains a 10% contribution from the Gō function. When looking at geometric frustration in higher-energy traps, the effect of 𝜆 is less predictable. The removal of nonnative interactions can stabilise or destabilise the transition states that must be crossed to escape from these traps. Measures of the landscape complexity [30] could provide a useful way to understand the influence of nonnative interactions and will be considered in future work.

Acknowledgments

The authors acknowledge the Engineering and Physical Sciences Research Council, UK (EPSRC), for funding under Programme Grant EP/I001352/1. The calculations described in this paper were performed using the University of Birmingham’s BlueBEAR HPC service, which was purchased through HEFCE SRIF-3 funds (see http://www.bear.bham.ac.uk/).

Supplementary Materials

Example input files for GMIN and mean first encounter times for all GMIN searches are available as Supplementary Material.

  1. Supplementary Material
  2. Supplementary Material