Abstract

We present a newly adapted Brownian-Dynamics (BD)-based protein docking method for predicting native protein complexes. The approach includes global BD conformational sampling, compact complex selection, and local energy minimization. In order to reduce the computational costs for energy evaluations, a shell-based grid force field was developed to represent the receptor protein and solvation effects. The performance of this BD protein docking approach has been evaluated on a test set of 24 crystal protein complexes. Reproduction of experimental structures in the test set indicates the adequate conformational sampling and accurate scoring of this BD protein docking approach. Furthermore, we have developed an approach to account for the flexibility of proteins, which has been successfully applied to reproduce the experimental complex structure from the structure of two unbounded proteins. These results indicate that this adapted BD protein docking approach can be useful for the prediction of protein-protein interactions.

1. Background

Protein-protein interactions are involved in most cellular process. In order to gain a better understanding of the function of a protein, it is necessary to know how it interacts with other proteins at the molecular level. Both experimental and computational methodologies are used to achieve this goal. Experimental techniques, such as X-ray crystallography and nuclear magnetic resonance (NMR), have been used to determine an increasing number of protein structures that are deposited into the Protein Data Bank (PDB). However, determining the structures of protein complexes is more difficult by using these techniques. In the PDB, the number of protein complexes as biologically functional units is relatively small (11,331 (PDB search criteria: chain type: protein; entity type: protein; number of entities >1; October 14, 2011)) compared to the total number of protein structures (70,852) [1, 2]. As for the computational tools, a successful protein docking method can be used to predict the three-dimensional structures of protein complexes and provide theoretical understanding of protein-protein interactions at the atomic level. The most challenging problems for the computational approaches are extensive enough conformational sampling, accurate scoring, and inclusion of the flexibility of proteins [3, 4].

A variety of search algorithms were developed and implemented in protein docking programs, such as fast Fourier transforms (FFT), geometric matching, and the Monte Carlo technique [3]. FFT-based methods, which are used in ZDOCK, FTDOCK [5, 6], and so forth, use correlations to search the sampling space and evaluate the putative complexes with grid representation [7]. These methods initially assess the geometrical surface complementarity and further expand to include electrostatics and hydrophobic complementarity. The Monte Carlo methods, which are used in RosettaDock, ICM [8, 9], and so forth, conduct stochastic rotation/translation searches from random initial structures around the known or hypothetical binding site and thus explore only certain regions of the surface [10]. The Brownian dynamics (BD) method, which is similar to the force-biased Monte Carlo approach [1113], has been used in the past to predict protein-protein interactions [1420]. In earlier work, we used BD to successfully simulate the recognition between scorpion toxins and potassium channels [21, 22]. Recently, our previous BD prediction of the interaction between scorpion toxin Lq2 and KcsA potassium channel has been verified by the potassium channel-charybdotoxin complex structure (PDB: 2A9H), which was determined by NMR studies [21, 23]. The sequence identity between the KcsA of our model and the experimental structure is 97%. Scorpion toxins Lq2 and charybdotoxin (CTX) share 78% sequence identity, and the RMSD of Cα atoms between Lq2 and CTX is 1.62 Å. The RMSD of our predicted KcsA-Lq2 model and experimental KcsA-CTX structure is 2.85 Å (based on the backbones of Lq2 and CTX only, two KcsA structures are superimposed), which shows excellent agreement between the BD predicted and experimentally solved complex structures. Our BD results indicate that the strong electrostatic interactions between scorpion toxins and potassium channels are the main driving force for their recognition and association.

In the original version of the BD program (MacroDox), only electrostatic interactions between two proteins are included [24]. However, for protein docking simulations it is critical to include the short-range interaction energy terms to rank the final complexes correctly. Accordingly, we have introduced additional local energy minimizations after BD simulations by using a force field, which includes electrostatics, van der Waals, and desolvation energy terms. We used the Coulomb potential with a sigmoidal distance-dependent dielectric permittivity to model the electrostatic energy term, a 12-6 Lennard-Jones potential to model the van der Waals energy term, and the general approach of Wesson and Eisenberg [25] to model the desolvation term. The 12-6 Lennard-Jones potential parameters of four united atom types (C, N, O, and S) were obtained from the AUTODOCK program. The atomic solvation parameters were calculated based on the absolute partial charge on the atom. These energy terms were introduced by the corresponding potential maps. We previously used a similar grid-based force field for protein loop predictions [26].

In order to systematically evaluate the performance of our newly adapted BD program for protein-protein docking simulations, we applied the BD approach on a test set of 24 crystal protein complexes from PDB, which was used by other groups for protein docking approach evaluations [27]. The results from redocking experiments showed that the root mean square deviation (RMSD) between the predicted structures and the crystal structures is within 2 Å. We have further developed the approach to account for the flexibility of proteins, which has been successfully applied to reproduce the experimental complex structure from two unbounded proteins. These results indicate that this adapted BD protein docking approach can be useful for prediction of protein-protein interactions.

2. Methods

2.1. Global BD Sampling

The program package MacroDox, version 3.2.2 [24], was used to assign charges on proteins, solve the linearized Poisson-Boltzmann equation, and run the BD simulations. The BD algorithm for this program has been detailed by Northrup et al. [28, 29].

For the BD simulations of protein-protein interactions, there are only two solute particles, the receptor and ligand proteins treated as rigid bodies. In this study, the receptor of the complex was defined as protein I and was kept fixed with its center of mass (COM) at the origin while the ligand was defined as protein II of which translational and rotational motions were simulated (Figure 1). In each trajectory, protein II starts with a random orientation and position on the b-surface, which was defined as the sum of the maximum radii of the two proteins plus 2 Å. Protein II was subjected to three forces: electrostatic, the random Brownian force, and the frictional force due to solvent viscosity. During the simulation, the motion of the complex(es) satisfying the reaction criteria for encounter complex formation is retained as an effective trajectory [30]. Within such an effective trajectory, two configurations which have lowest electrostatics interaction energy and shortest distance between protein I and protein II, respectively, are recorded as two independent sampled conformations. The trajectory was terminated when protein II either escaped out of the c-surface (5 Å outside of the b surface) or was running longer than 20 ns.

2.2. Structure Refinement by Local Energy Minimization

Local energy minimizations were conducted for all putative protein complexes obtained from the BD simulations. We developed a shell-based grid force field based on our earlier work [26], which includes van der Waals (vdW), electrostatics, and hydrophobic potentials to represent protein I (united atom types) and solvation effect. Figure 2 displays the two-shell grid force field generated from protein I of 1AY7. The inner shell includes 948,240 grids with 0.5 Å resolution representing electrostatics, vdW, and hydrophobic potentials, 15 Å width from the surface of protein I. The outer shell includes 962,432 grids with 0.75 Å resolution representing electrostatics potentials, 30 Å width from the surface of protein I.

The van der Waals interactions are described by the 12-6 Lennard-Jones potentials: where the and Lennard-Jones parameters of four atom types (united atoms C, N, O, and S) are from the AUTODOCK program [31]. Since the real vdW potential (“real” stands for the standard vdW potential as defined in (1)) is extremely sensitive to the distance , we also used a soft potential which replaces the energy value calculated at by the lowest value calculated from ,  Å, and  Å using (1). As shown in Figure 3, this soft potential has the effect of smoothing the potential energy surface and thus helping protein II to cross local energy barriers during the energy minimization.

2.2.1. Electrostatics

Coulomb potential , with a sigmoidal distance-dependent dielectric function was used to model solvent screening, based on the work of Mehler and Solmajer [32]: where is partial charge and , (the dielectric constant of bulk water at 25°C), , , and are parameters. The original parameter set has been modified to produce better results when comparing with the GB model [33]. When the distance between two charges is <1.32 Å, a dielectric constant of 8 is used [26].

2.2.2. Desolvation

The general approach of Wesson and Eisenberg was used [25], and the atomic solvation parameters were calculated based on the absolute partial charge on the atom: where is the index of atoms in the ligand, the index of atoms in the receptor, , linear regression coefficient or weight for the desolvation free energy term, the solvation term for atom , atomic solvation parameter, charge-based atomic solvation parameter, partial atomic charge on atom , the atomic fragmental volume of atom , the distance between atom and atom (in Å), and the Gaussian distance constant = 3.5 Å. The parameters are obtained from the AUTODOCK4 program [34]. Only nonpolar carbon atoms (the absolute value of charge is <0.2) were used for desolvation energy calculations.

The rigid energy minimizations of the protein complexes are based on the downhill simplex method [35] using the newly developed force field [26, 30]. Six variables (three translation values and three rotation values of protein II relative to protein I) are allowed to change during the energy minimizations.

2.3. Accounting for Protein Flexibility by MD Simulations

To account for protein flexibility, we conducted molecular dynamics simulations on each individual protein complex obtained from the BD docking with low interaction energies. The complexes were subjected to 100 steps of the steepest descent (SD) energy minimizations followed by 2ps simulated annealing (SA) from 300 K to 50 K, and additional 100 steps of SD energy minimizations by using the generalized Born (GB) solvation module in the CHARMM program [36]. The optimized complex structures were rescored by the same force field we developed here, but in the explicit form (i.e., without resorting to the grids and without smoothing the LJ potential) with cutoff values of 8 Å and 15 Å for van der Waals and electrostatic interactions, respectively. The top 200 low energy complex structures obtained from rigid body energy minimization by soft energy potential were selected to conduct the flexible protein MD simulations in this study.

2.4. Computational Costs

All simulations were conducted on a Sun Linux Beowulf cluster (2.6 GHz Opteron and 1TB RAM (4 GB–32 GB per node)) at the Center for High Performance Computing (CHiPC) at Virginia Commonwealth University. Computational costs are closely related to the size of both the protein receptor and ligand. For BD sampling, one million runs can be completed within 20 h/CPU for this test set (average residue number is about 500 in the protein complexes). For the soft optimization, it usually takes 12 h using 40 CPUs by splitting the trajectory into 40 subtrajectories. For the real optimization, only the top 500 complexes, ranked by soft interaction energies, are selected to be further calculated, which usually takes 1 h per CPU. For each complex the post-MD-simulated annealing refinement process takes about 30 minutes per CPU.

3. Results

3.1. Docking Bound Complexes

To evaluate the performance of our newly developed BD protein docking approach, we first applied it for a redocking experiment on a test set of 24 protein-protein complex crystal structures which was previously used by several other groups for protein docking method evaluations [27]. The computational protocol of the BD protein docking method including conformational sampling, compact complex selection, and energy minimization is presented in Figure 4. Firstly, the experimental crystal complex structures were subjected to 200 steps of the steepest descent (SD) followed by 2000 steps of the adopted basis Newton-Raphson Method (ABNR) energy minimizations using the generalized born (GB) solvation module in the CHARMM program [36] to remove steric overlap that produces bad contacts. Then BD simulations were conducted to explore the configuration space of the protein ligand relative to the protein receptor by translational and rotational motions to form the protein complexes. The intermolecular forces and torques between proteins were given by the sum of electrostatic and excluded volume forces [24].

For each system, one million independent BD simulations were conducted that usually generated about 0.9 million protein complexes. To reduce computational costs, a distance-based criterion was introduced to filter the complexes from the BD sampling. The protein complexes whose distance between the surface of protein I and the COM of the protein II is smaller than the maximum radius of protein II were retained. This filtering step usually yielded around 0.5 million of these “compact complexes,” which were further optimized using precalculated shell-based grid soft potentials (see Figure 3). Then, the optimized protein complexes were ranked by the soft protein interaction energies. The top 500 complexes with the lowest interaction energies were selected and followed by structural optimization using the shell-based grid real potentials. The optimized complex structures were reranked and the one, with the lowest interaction energies were selected as the final predicted protein complexes.

The predicted results from the newly developed BD protein docking approach are listed in Table 1. The RMSDs from the corresponding crystal structures of all the redocked complexes are within 2 Å, ranging from 0.18 to 1.98 Å (the RMSDs are based on the Cα atoms of the ligands since the receptors are always fixed). With the single exception of 2PCB (see Section 4), in each system studied, the near-native conformation was ranked first based on our scoring function.

To further verify the shell-based grid force field, we also optimized the crystal structures with the real potential map (unsmoothed vdW potential) (Table 1). One can see that the interaction energies of crystal complexes are comparable to those of our redocked complexes. On the other hand, we analyzed the correlation between RMSD and interaction energy for each complex. Taking 1CHO as an example, the plot of interaction energies versus RMSD is shown in Figure 5. After the optimization with the grid-based real potentials the complex with the lowest interaction energy has the smallest RMSD to the crystal structure. Both aspects indicate that this force-field-based scoring function can discriminate between the crystal conformation and other decoy conformations.

3.2. Test BD Protein Docking Approach on Unbound Proteins

To further evaluate BD docking approach for real protein-protein interactions, we also conducted docking studies using the unbound structures of the two proteins in a complex. Crystal structures for proteins, Trypsin and BPTI, are available in both bound (Trypsin/BPTI, PDB : 2PTC) and unbound (Trypsin, PDB : 5PTP; BPTI, PDB : 1BPI) forms, which provide us an opportunity to further evaluate our BD approach. 90,079 complexes were obtained from BD docking simulations and followed by rigid body local energy minimization by the soft potential. The predicted complex with the lowest RMSD (2.9 Å) from the crystal structure complex is ranked as the 142nd by interaction energy. It suggests that even though the BD approach is able to sample the native configuration of the complex correctly, it cannot be selected based on the energy ranking. It is not surprising since the flexibility of both proteins was not considered during the BD docking. We conducted MD-simulated annealing simulations for the top 200 complexes with the lowest interaction energies obtained from the BD docking simulations and followed by rescoring with the same force field we developed here (see Section 2). The energy ranking for the structure with the lowest RMSD has been significantly improved to the 4th with the RMSD of 3.2 Å from the crystal structure. The optimized complex structure of Trypsin/BPTI superimposed with the crystal complex structure is shown in Figure 6.

4. Discussion

4.1. BD Sampling and Electrostatics Correlation

Sampling of the complex conformation is the critical step in protein-protein docking process, since neither scoring function evaluation nor following optimization refinement would be enough if correct or near-native conformations cannot be produced by sampling. In our docking procedure, several significant modifications of sampling parameters were made compared to the traditional BD simulations. As described in Section 2 (Figure 1), the radius of b-surface is 2 Å larger than the sum of the maximum radii of the two proteins; c-surface is 5 Å from the b-surface. Both are much smaller than default values in the MacroDox (default values: the radius of b-surface is 20 Å larger than the sum of the maximum radii of the two proteins and the radius of c-surface is 200 Å). Also, 20 ns was set as the longest simulation time for each BD run. This choice of parameters significantly reduced BD simulation time per run making it possible to perform 1 million BD runs for each complex in reasonable computation time.

BD approach has been used to predict the protein-protein interactions using electrostatics forces as the biasing guide in addition to Brownian stochastic motions. Adding electrostatics forces to the docking/diffusion process has the advantage of speeding up the encounter of the receptor and the complex formation. When the attractive electrostatics is the dominant interaction at the binding site, it can even produce the near-native conformations with only a few thousand BD runs [21, 22, 37, 38]. On the other hand, when the vdW interaction or repulsive electrostatics at the binding site is the dominant force, one million BD runs would be a reasonable number based on our calculations. It could cover the entire sampling space around the receptor (Figure 7(a)). From the column of Sequence number/total number in Table 1 one can see where the best solution comes from. It also indicates that one million BD runs are sufficient for sampling in most cases. One exception is 1VFB where three million BD runs were needed due to the repulsive electrostatics interaction of 1.61 kcal mol−1 between the receptor and ligand based on our calculations.

Cluster analysis was also conducted to illustrate the electrostatics effect on BD sampling within this test set. As shown in Figures 8(a) and 8(b), the more negative (stronger) the electrostatics interaction between the receptor and ligand is, the more of the conformations will fall into the cluster of the near native; and the near-native conformations are produced increasingly early as well. Generally when the electrostatic interaction energy is lower than −5 kcal mol−1, the BD sampling procedure produces the near-native conformations faster and more easily than in other cases.

4.2. Shell-Based Grid Force Field

The grid potential maps generated from the receptor were defined in a shell around the surface of the molecule (see Section 2, Figure 2). This shell-based grid map has the advantage of largely reducing the number of grids compared to the 3D box grid map. Taking 1AY7 as an example, the box-shape potential map would include 1,739,781 and 1,737,808 grids for the inner and outer boxes, respectively; while in the same condition the shell-based map only takes half the number of grids, 948,240 and 962,432 for the inner and outer shells, respectively. In practice, the computer memory limits the number of grids involved in the calculation. For example, in earlier work, only part of the protein surface could be covered for sampling with the box-shape map [27]. Therefore, the binding site had to be identified with the aid of other programs or experimental data. Our shell-based potential map is able to cover the whole surface area of the receptor for conducting a global search, which is necessary when clear information is unavailable for the binding sites of protein receptors.

We used a force-field-based scoring function, which includes vdW, electrostatics, and hydrophobic potentials. We found that when the smoothed vdW energy term (soft potential) was used for the energy optimization, it was much easier for a ligand to find the global energy minima due to its smooth potential energy surface. As seen from Figure 7(b), the RMSD of the ligands between the BD sampling and the crystal structure was 11.5 Å, which was reduced to 2.5 Å after the energy minimization by using the soft potential. Moreover, although the interaction energies obtained from the soft potential function are not as accurate as the real ones, they can still be used for preranking of the predicted complexes. The top 500 preranked complexes were selected and followed by a further energy minimization using the real potential function, which was found to be adequate and effective for the predictions of native protein complexes.

The scoring function exhibits excellent performance in the redocking test set. Only exception was in finding the near-native conformation for the 2PCB. It was ranked the 9th, but the top 8 conformations formed a single cluster with RMSD around 30 Å from the crystal structure. A similar problem existed in Fernández-Recio’s result [27]. We have checked the crystal structure of 2PCB and found an abnormal phenomenon that OD2 of Asp34 of Cyt c Peroxidase and OE2 of Glu90 of Cytochrome c is apart only by 2 Å. It was speculated that occasional crosslinking between these two proteins might account for this phenomenon [39]. Such a short distance between two negatively charged oxygen atoms may lead to higher interaction energy, and thus other decoy conformations appear to have more favorable interaction energy.

In addition to the RMSDs, which were calculated based on all the Cα atoms of ligands, we also calculated the RMSDs for the ligands based on the interface Cα atoms of the predicted structures and compared them with the results from other groups (Table 2). The ligand interface residues were defined as having at least one atom within 5 Å of an atom on the receptor molecule. RMSDs based on interface Cα atoms in this test set are within 1 Å except for 2PCB (1.18 Å). 18 of 24 complex RMSDs are smaller than those from the ICM method [27].

4.3. Crystal Packing

Considering that protein structures in the test set were determined by X-ray crystallography, it is possible that the protein complexes could be affected by crystal packing. We reconstructed the crystal packing protein structures for all the proteins in the test set by using the Swiss PDB Viewer program [42]. We found that in 12 out of the 24 crystal structures, ligands were located at the interface of multiple subunits in the crystal packing, and thus native conformations could be affected by the crystal packing. To test this possibility, we conducted a comparison study in which crystal packing information was included by adding additional related subunits. However, 2 complexes out of 12 packing structures had to be excluded because the additional subunits block the entrances of protein ligands to access the binding sites of protein receptors. Thus, 10 complexes were completely redocked under the condition of crystal packing. The results (Table 3) show that RMSDs of global Cα atoms range from 0.26 to 1.44 Å. Comparing the results without crystal packing, there are no significant differences in RMSDs, which indicates that the crystal packing could be ignored for protein docking method evaluations based on the current test set.

4.4. Prediction of Unbound Proteins

When applying BD sampling approach to predict the complex structure staring from the structures of the unbound proteins Trypsin and BPTI, we found that BD approach was able to sample the near-native conformations, which indicates that the current BD approach is efficient and feasible for real protein complex sampling. However, the near-native poses often have unfavorable interaction energies due to minor steric clashes and therefore cannot be selected based on the energy ranking. This is almost an inevitable consequence arising from rigid body docking process. Although the soft potential used in the current BD approach includes implicitly some partial protein flexibility, it is still not enough to rank the sampled conformations correctly. To achieve high-accuracy prediction, protein flexibility, including flexible side-chains and backbones, has to be considered in either the sampling stage or the postrefinement process. Our new approach based on MD-simulated annealing is a postrefinement process, which accounts for the flexibility of both proteins, and shows very promising results.

Prediction of a correct protein-protein complex relies on sampling the near-native conformation and discriminating it from other decoy conformations. Our BD sampling is validated to sample the near-native conformation within 1 million runs in most cases and works best for the electrostatic attractive protein complexes. We recommend performing 3 million runs for electrostatically unfavorable systems or for any system where electrostatic information is unavailable. Another critical step in our study is using MD-simulated annealing method to account the protein flexibility, which can significantly improve the ranking of near native conformations. In a real case of protein-protein study, the top 500 or more conformations from the soft energy ranking list may need to be included for flexible protein refinement by the MD-simulated annealing simulations.

5. Conclusions

We have developed an adapted Brownian dynamics method to predict the structures of protein complexes. It has three steps. In the first step, global BD sampling, one million independent BD simulations are conducted to explore the entire possible conformational spaces of the protein complexes. In step two, the compact complexes selection, a distance-based criterion is used to select compact complexes. In step three, local energy minimization, all compact complexes are optimized using grid-based soft potentials followed by grid-based real potentials. The complexes with the lowest interaction energies are selected as final predicted protein complexes. To reduce the computational costs for energy evaluations, a grid-based force field was developed to represent protein receptors and solvation effect. The performance of this newly developed BD protein docking approach was evaluated on a test set of 24 crystal protein complexes which was previously used by other groups. The results show that the RMSDs between the predicted and the crystal structures are within 2 Å, which are close to the accuracy of the ICM results from Fernández-Recio et al. [9]. However, compared with the ICM approach, the advantage of our newly adapted BD method is its global sampling, which does not require knowledge of the binding sites of the protein receptors before conducting the BD simulations. Shell-based potential maps can significantly reduce the computer memory requirements compared to rectangular box-based potential maps, which make the global sampling possible. The predicted near-native conformations have the lowest interaction energies for all complexes in the test set except the 2PCB. The results indicate that the grid-based force field scoring function we developed can discriminate the crystal conformation from other sampled conformations. One million BD runs are usually required for the global sampling. However, in the case of positive electrostatics interaction between the protein receptor and ligand, more BD runs are needed. On the other hand, BD sampling has a distinct advantage when stronger electrostatic attraction (<−5 kcal mol−1) exists between the ligand and receptor proteins.

In the current adapted BD protein docking method, both the protein receptor and ligand are treated as rigid bodies. The flexibility during docking is partially considered by using the grid-based soft potentials. We found that a postrefinement process based on the MD-simulated annealing to account for the flexibility of proteins shows us very promising results and can be also useful if combined with other rigid docking approaches to account for protein conformational changes after the rigid docking process. In our future work, more elegant techniques, such as local move Monte Carlo (LMMC) approach for loop sampling, [26] could also be implemented to account for the flexibility of proteins for mimicking the ligand-induced effect.

Acknowledgments

The authors thank Professor S. H. Northrup for his permission to adapt the MacroDox3.2.2 program and for his helpful discussions. The computations were supported by the Center for High Performance Computing (CHiPC) at Virginia Commonwealth University (VCU). This work was supported by National Institute Health Grants (DC008996 and S10RR027411 to M. Cui) and Teragrid Supercomputing Grant (TG-MCB090019 to M. Cui).