Abstract

Since in most cases biological macromolecular systems including solvent water molecules are remarkably large, the computational costs of performing ab initio calculations for the entire structures are prohibitive. Accordingly, QM calculations that are jointed with MM calculations are crucial to evaluate the long-range electrostatic interactions, which significantly affect the electronic structures of biological macromolecules. A UNIX-shell-based interface program connecting the quantum mechanics (QMs) and molecular mechanics (MMs) calculation engines, GAMESS and AMBER, was developed in our lab. The system was applied to a metalloenzyme, azurin, and PU.1-DNA complex; thereby, the significance of the environmental effects on the electronic structures of the site of interest was elucidated. Subsequently, hybrid QM/MM molecular dynamics (MD) simulation using the calculation system was employed for investigation of mechanisms of hydrolysis (editing reaction) in leucyl-tRNA synthetase complexed with the misaminoacylated tRNALeu, and a novel mechanism of the enzymatic reaction was revealed. Thus, our interface program can play a critical role as a powerful tool for state-of-the-art sophisticated hybrid ab initio QM/MM MD simulations of large systems, such as biological macromolecules.

1. Introduction

Ab initio quantum mechanical (QM) calculation is an important tool to investigate functional mechanisms of biological macromolecules based on their three-dimensional and electronic structures. The system size which QM calculations can treat is usually up to a few hundred atoms despite of huge sizes of biomacromolecules including solvent water molecules. Accordingly, isolated models of interesting parts of proteins (e.g., catalytic centers) have been studied in QM calculations. However, the ignored regions in the QM calculations, that is, proteins and solvent, surrounding the catalytic centers, have been shown to contribute to the regulation of electronic structures and geometries of the regions of interest [1โ€“12]. This implies that such QM calculations using the isolated models of the catalytic centers have difficulty in the understanding of biological systems.

To overcome these difficulties, quantum mechanics/molecular mechanics (QM/MM) calculations are utilized, in which the system is divided into QM and MM regions. Here, QM regions correspond to active sites to be investigated, and they are described quantum mechanically. MM regions correspond to the remainder of the system and are described molecular mechanically. The pioneer work of the QM/MM method was accomplished by Warshel and Levitt [12, 13], and since then, there has been much progress on development of QM/MM algorithm and applications to biological systems [14โ€“25]. Here, we describe the procedure of a novel interface program connecting general software packages to perform QM/MM calculations, and its applications for investigating environmental effects of electronic structures of active centers and mechanisms of enzymatic reactions. This could be a solid basis for further applied scientific fields such as biological and medical molecular design.

2. Energy Expression of QM/MM Calculation

In QM/MM method, an entire system is divided into two into QM region, which is described by quantum mechanics, and MM region described by molecular mechanics. Due to the presence of the interactions between the QM and MM regions, the total energy of the entire system can be formally written as follows:๐ธ(ES)=๐ธ(QM)+๐ธ(MM)+๐ธ(QM,MM).(1) The energy term of QM-MM interactions, ๐ธ(QM,MM), enables a more realistic description of the system, compared with isolated QM calculations. With respect to the treatment of the electrostatic interaction between the QM and MM regions, QM/MM methods are divided into two groups, subtractive and additive schemes. Subtractive schemes consist of the three steps as follows; (1) an MM calculation on the entire system, (2) a QM calculation on the QM region, and (3) an MM calculation on the QM region. Then, QM/MM energy of the entire system can be formulated as follows:๐ธsub(ES)=๐ธMM(ES)+๐ธQM(QM)โˆ’๐ธMM(QM).(2) The subscript indicates the type of calculation (QM or MM calculation), and the region on which the calculation is performed is described in parentheses. An advantage of the subtractive schemes is simplicity. Explicit descriptions of interactions between QM and MM regions are not required. In addition, artifacts which might be caused by using link atom schemes to cap the truncated bonds at the QM-MM boundary (described below) can be avoided. On the other hand, disadvantages of the subtractive schemes are the following: (1) force fields are required for describing the QM region which often includes ligands and intermediate structures of enzymatic reactions; in general, reliable force fields of the molecules are not prepared, and additional QM calculations should be carried out for the parameterization every time a new system is studied; (2) the electrostatic interactions between the QM and MM regions are described at molecular mechanics level; that is, the interactions are calculated by the Coulomb interactions between fixed atomic charges in the QM and MM regions. Such description cannot represent polarization of the QM region induced by environment surrounding the QM region.

In contrast, the additive schemes can take into account the polarization effects. The energy for the additive schemes is given inโ€‰โ€‰(3)๐ธadd(ES)=๐ธMM(MM)+๐ธQM(QM)+๐ธQM/MM(QM,MM).(3) A characteristic feature of this scheme is the presence of the energy term with respect to the interactions between QM and MM regions, described as follows:๐ธQM/MM(QM,MM)=๐ธelecQM/MM(QM,MM)+๐ธbondedQM/MM(QM,MM)+๐ธvdWQM/MM(QM,MM).(4) To calculate electrostatic interactions, that is, the first term in the left side of (4), one-electron integrals in the QM Hamiltonian incorporating MM partial charges can be used ๎๐ปelecQM/MM(QM,MM)=โˆ’๐‘๎“๐‘–๐‘€๎“๐‘—๐‘ž๐‘—||๐ซ๐‘–โˆ’๐‘๐‘—||+๐ฟ๎“๐‘˜๐‘€๎“๐‘—๐‘ž๐‘—๐‘„๐‘˜||๐‘๐‘˜โˆ’๐‘๐‘—||.(5) The symbols ๐‘ž๐‘— are the MM partial charges located at ๐‘๐‘—. ๐‘„๐‘˜ are the nuclear charges of the QM atoms at ๐‘๐‘˜, and ๐ซ๐‘– represents electron positions. ๐‘, ๐‘€, and ๐ฟ represent number of electrons, MM atoms to be incorporated into the one-electron integrals, and QM atoms, respectively. Using the additive scheme, the electronic structures of the QM region are affected by the charge distribution of the environment. An advanced approach is to consider polarization of the MM region by the QM region (i.e., to allow the partial charges to be changed according to changes in the electronic structure of the QM region). However, polarizable force fields with broader applications have not yet emerged, despite many efforts to account for the polarization effects [26, 27]. A solution of this problem was recently proposed, and the scheme was applied to molecular dynamics (MD) calculation involving Na+-๐œ‹ interaction, since the polarization effect is important to describe this type of interaction [28].

Note that a problem arises in the case that a covalent bond crosses the boundary between QM and MM regions (e.g., the side chain of an amino acid is assigned to the QM region, while its backbone is assigned to the MM region). Obviously, electronic valence state of the QM subsystem should be closed in the QM Hamiltonian. A traditional approach to reconstruct a saturated valence structure is link atom method [29]. Each bond between such boundary atoms in the QM and MM regions is capped by a hydrogen atom; this link atom is placed along the bond vector between the QM and MM atoms at a distance of ~1โ€‰ร…. This new particle is treated as a QM atom. Although its simplicity is an advantage of this method, introducing the additional atom which does not exist in the real system causes several problems. A serious one is overpolarization problem, where the electronic structure of the QM region is unphysically polarized by the โ€œstrongโ€ partial charge of the boundary MM atom in the case of using the additive scheme, since the link atom is spatially very close to the boundary atoms [21, 29]. In the cases of calculations where small, localized basis sets are employed, the unphysical polarization effects are less serious. However, the effects are more severe in the cases where flexible basis sets are used, such as larger basis sets including polarization and diffuse functions and, in particular, plane waves [21].

Several schemes have been devised to avoid the overpolarization issue: (1) not computing the one-electron integrals regarding the link atoms and MM partial charges [13, 30], (2) assigning zero to the partial charges of the MM atoms replaced with the link atoms [31], and (3) introducing distributed charges by using Gaussian functions centered at the positions of the MM atoms close to the QM region [32, 33]. Generalized hybrid orbitals (GHOs) method is a major method to overcome the disadvantage of the link atom method, which originats from local self-consistent field (LSCF) method [34โ€“40]. In this method, โ€œauxiliaryโ€ and โ€œactiveโ€ orbitals are located on the MM boundary atoms, respectively. By incorporating the active orbital, which is directed towards the QM boundary atoms into SCF calculation, the valency of the QM subsystem is satisfied, while the auxiliary orbitals are fixed during the SCF calculation.

3. Computer Programs for QM/MM Calculations

To conduct QM/MM calculation, several programs are available. For example, for MM calculations, AMBER [41, 42] and CHARMM [43], both of which are widely used program packages, include components for QM calculation in their main packages. A widely used software package for QM calculation, Gaussian [44], has the MM components for QM/MM calculation. An effective approach to carry out QM/MM calculation is to connect existing QM and MM software packages by an interface program. ChemShell [45], QoMMM [46], and PUPIL [47] belong to this category. An advantage of this approach is to enable us to utilize characteristic program modules that each package has (e.g., efficient algorithms for expanding conformational spaces sampled in molecular dynamics simulations implemented in MM packages and algorithms to find optimal reaction paths and stationary points on potential energy surface implemented in QM packages).

We recently developed an interface program to couple AMBER [48] and GAMESS [49] for QM/MM hybrid calculations [11]. Thereby, we can employ various ab initio calculations as the QM part in QM/MM calculations, while just semiempirical methodologies are available in AMBER. Figure 1 shows a schematic architecture of our calculation system. Roles of the interface program are (i) to control the sequence of steps for the QM/MM optimization and MD simulation and (ii) to exchange information between those two program packages. At the first stage of a calculation cycle, a single-point calculation is performed in GAMESS regarding a coordinate set of a system. This yields two types of forces, that is, (1) the forces on QM atoms from the QM-QM and QM-MM interactions and (2) the forces on the MM atom derived from the QM-MM interactions. To calculate the second force, the contribution from the QM atoms to the MM atoms is calculated by integrating the interaction between the partial charges of MM atoms and the electron density at each grid point of the quantum region, as shown inโ€‰โ€‰(6)๐นMM๐‘—=๐‘๎“๐‘–๐ซ๐‘–๐‘—๐‘ž๐‘—||๐‘Ÿ๐‘–๐‘—||3๐‘‘๐‘ž๐‘–.(6) In the equation, ๐‘ž๐‘š is a partial charge in a MM atom and ๐‘ is the total number of electron density points. ๐‘‘๐‘ž๐‘— designates a small volume on the electron density grid, ๐‘‘๐‘ž๐‘—=๐œŒ๐‘—๐‘‘๐‘ฅ๐‘‘๐‘ฆ๐‘‘๐‘ง. This manner of the calculation of the forces on the MM atoms is more accurate than the manner that such forces are calculated by approximating QM charges using Mulliken and RESP charges. With respect to the coordinate set, AMBER is responsible for calculating (1) forces from van der Waals (vdW) interactions between all atoms of the system and (2) forces on MM atoms from MM-MM interactions. After those calculations, the interface program combines the forces calculated by GAMESS and AMBER. At this step, mapping of the relationship between QM and MM atoms is required, since the manner of atom numbering in one program package is different from that of the other one. In order to reduce the overhead cost regarding this process (in particular, the computational time required would be increased, when treating a solvated biomacromolecules over 100,000 atoms), we employ the combination of UNIX shell and C programming. Finally, using the combined forces, AMBER updates the coordinate set in the case of geometry optimization and MD simulation. The issues on the treatments of electrostatic interactions between QM and MM regions are important in the calculation efficiency and have also been discussed previously [50, 51].

4. Investigation of Environmental Effects on Electronic Structure of QM Region

In this section, we discuss the effects of the environmental structures on the electronic state of the active centers of some biological macromolecules. First, we take a metalloenzyme as an example of such an analysis. Metalloenzymes are involved in various biological functions such as electron transfer, binding of dioxygen, storage of metals, turnover of substrates, and configuration of protein structure [52, 53] and are studied in various research fields, such as biochemistry, inorganic chemistry, biophysics, bioengineering, and medical sciences. Understanding the coordination geometry and electronic structure of metal active sites in the metalloenzymes is of fundamental biophysical importance to gain insight into the structure/function correlation of these biological macromolecules. Azurin is one of the metalloenzymes, whose biological function is electron transfer. The copper ion bound to its active site is coordinated by five amino acid residues, a cystein (Cys), two histidines (His), methionine (Met), and a backbone carbonyl oxygen of glycine (Gly) (Figure 2) [54].

We performed QM/MM calculations of the active site of azurin using both of the subtractive and additive schemes to assess the polarization effects of the environment on the geometry and electronic properties of the active center and compared those two results [55, 56]. The spin-unrestricted Hartree-Fock (UHF)/density functional theory (DFT) hybrid all-electron calculation with the B3LYP functional was adopted as the QM Hamiltonian. The detailed theoretical methodologies are previously described [11]. The comparison of the optimized structures obtained by the two schemes showed that the additive scheme gave the electronic property more consistent with spectroscopic experiment than the subtractive scheme. In the experiments, the contribution of 3p-orbital character of sulphur atom (S) of Cys112 was 45% in the SOMO (Figure 3) [57]. In our QM/MM calculation, the spin polarization was significantly improved in model I (50%) compared to model II (57%). This indicates that the explicit inclusion of electrostatic interaction by the QM Hamiltonian is essential for the calculation of the copper site.

Further, we found that some occupied MOs are affected by the environment. For example, with respect to the MOs dominantly occupied by 2p orbital of O atom of Gly45 (the backbone carbonyl), their energy levels in models I and II are different, as shown in Figure 4 [56]. In terms of the geometries, different descriptions of the copper coordination geometry were obtained, particularly for the coordinative bonds including a large dipole: Cuโ€“O (Gly45) and Cuโ€“S (Cys112). Thus, it is suggested that modification of the QM Hamiltonian to interact with the long-distance partial point charges of the environment is crucial for an accurate QM/MM description of the geometrical and electronic properties of metal active sites.

Next, to quantify the effects of the surrounding MM atoms on the electronic structures of the QM region, we take a solvated protein-DNA complex as the second example, that is, PU.1 in a complex with the target double-stranded DNA molecules (Figure 5). PU.1 is a member of the Ets family of proteins, which consists of transcriptional factors that regulate gene expression for biological growth and development. They share a conserved domain composed of around 85 amino acid residues, which binds with the consensus DNA sequence, 5โ€ฒ-GGAA-3โ€ฒ, as the core sequence in the DNA. As the calculation of azurin mentioned above, we employed the two schemes, where the QM region was assigned to the consensus DNA sequence and the amino acid residues which recognize the sequence. The comparison of the molecular orbitals (MOs) obtained in the additive scheme with those obtained in the subtractive scheme revealed that the order of some MOs is shifted and that the surrounding MM atoms have significant impacts on the energy of MOs [11]. This result also emphasizes the crucial importance of considering the long-range electrostatic effects from MM atoms and suggests that biological functions involved in active sites are modulated through long-range electrostatic interactions with the surrounding regions. Actually, it was reported experimentally that Bam HI, which binds the specific DNA core sequence, is supposed to protect the native electronic states of guanine bases from damages (caused by radicals) by regulating the oxidation potential of the DNA [58]. This suggests that protein can modulate the electronic states of DNA molecules, thereby playing a crucial role in the biological functions.

Furthermore, with respect to the PU.1-DNA complex, we investigated relationship between energies of MOs of DNA bases and the DNA recognition by PU.1 using the hybrid QM/MM calculations [59]. In the complex, PU.1 masks some DNA bases from the solvent through hydrogen bonds and van der Waals contacts. Accordingly, the magnitude of the masking by the protein is, in general, corresponding to the recognition mode of DNA and the protein. The analysis using the hybrid QM/MM calculations revealed that the MO energies of some DNA bases strongly correlate with the magnitude of masking of the bases from the solvent by the protein. Moreover, in the complex, PU.1 causes a variation in the magnitude of the masking among DNA bases through directly recognizing the bases and inducing structural changes of the DNA structure from the canonical one. In this way, the strong correlation of the MO energies and the magnitude of masking DNA bases from the solvent by protein is the first evidence showing the close quantitative relationship between recognition modes of bases and the energy levels of the corresponding MOs. This also means that the electronic state of each base is highly regulated by the DNA recognition of the protein. Other biological macromolecular systems can be expected to also possess modulation mechanisms, suggesting that this provides a novel basis for the understanding of the regulation functions of biological macromolecular systems.

5. Investigations of Enzymatic Reactions: A Novel Catalytic Mechanism by a Hybrid Ribozyme/Protein Catalyst

The following study is an example of applications of hybrid QM/MM MD simulation to enzymatic reactions. Aminoacyl-tRNA synthetases play a critical role in decoding genetic information through catalyzing attachment of their cognate amino acid to 3โ€ฒ-end of the specific tRNA. The fidelity of translation is assured by their strict discrimination of the cognate amino acids from noncognate ones. However, several systems have difficulties in the strict discrimination of their specific amino acids, producing misaminoacylated tRNAs [60, 61]. To avoid this error, the enzymes have editing function to hydrolyze the mis-products by themselves, but detailed mechanisms are not known due to the absence of experimental structures of aaRSs in complex with tRNA bound to the active site of the editing reaction. The crystal structure of Thermus thermophilus leucyl-tRNA synthetase (LeuRS) in complex with tRNALeu bound to the connective polypeptide (CP) 1 domain on which the active site of the editing reaction is located was determined [62]. LeuRS is one of aaRSs possessing the editing function. Exploiting this crystal structure, we performed structural modeling of the LeuRSโ€ขtRNALeu complex, for which an incognate amino acid, valine, is attached to A76 of tRNALeu [63], and identified the water molecule corresponding to the nucleophile of the hydrolysis (Figure 6) [64].

Furthermore, we elucidated the mechanism for the nucleophilic water to approach the electrophilic atom, that is, the carbonyl carbon of the misattached valine: the hydrogen atom attached to the O3โ€ฒ atom of the ribose of A76 (3โ€ฒ-HO) is required to be translocated for the nucleophilic water to approach the carbon. In other words, the 3โ€ฒ-HO plays roles as a โ€œgateโ€ to regulate the access of the nucleophile (Figure 6(b)) [64]. Then, in order to investigate changes in electronic structures on the gate opening, we performed hybrid QM/MM calculations using the above-mentioned interface program.

As a result, in both of opened and closed states, the lowest unoccupied molecular orbital (LUMO) is localized on the carbonyl group of the substrate, which contains an antibonding character regarding the Cโ€“O2๎…ž bond. This indicates that the carbonyl carbon atom can accept electrons from the nucleophile and that the electron donation results in the cleavage of the bond. Furthermore, with respect to the nucleophile, the energy difference between LUMO and the MO which includes a major contribution from the valence p-orbitals of the oxygen atom of the nucleophile decreases on gate opening. This indicates that the nucleophilic attack is facilitated on gate opening. Thus, the initial phase of the editing reaction in the LeuRSโ€ขtRNALeu complex was identified in terms of the changes of the electronic structures.

Moreover, to elucidate the editing mechanisms, we performed hybrid QM/MM MD simulations [65]. We examined four possible schemes of the reaction and identified the most feasible pathway, in which the nucleophilic water is activated by the 3โ€ฒ-OH of tRNALeu and the O2๎…ž atom is capped by a hydrogen atom of the water. This means that the editing is actually a self-cleavage reaction of the tRNA; that is, editing is a ribozymal reaction. However, it should be noted here that the protein also contributes to the reaction, by (i) stabilizing the transition state and (ii) increasing the catalytic power of the 3โ€ฒ-OH to activate the water through the formation of the hydrogen bond networks. Furthermore, the importance of opening the H-gate was also confirmed in the hybrid QM/MM MD simulations. Considering the contributions of the protein moiety to this ribozymal reaction, we concluded that editing is achieved by a โ€œhybrid ribozyme/protein catalyst,โ€ which represents a novel enzyme category.

Recently, a similar theoretical analysis of the editing mechanism was conducted by employing our fully-solvated structural model [63, 64], and the author reported almost the same reaction mechanisms as described above [66]. The small difference seems to be another alternative initial reaction pathway that was observed in the more recent study. However, this difference found in the more recent study would be due to the inappropriate QM regions lacking the crucial atoms, and so on (also see Simulations 2 and 4 and the discussions in Hagiwara et al., [65]).

The above-mentioned result is consistent with the widely accepted, but unproven, scenario of the evolution of life, which proposes that proteins previously just acted as chaperones and subsequently replaced such functional ribozymes [67]. Further, we could propose a broader scenario by introducing a novel stage of evolution, where tRNA itself acquired the ability to play the catalytic roles and employed the proteins to enhance the catalysis; this can be called the โ€œhybrid catalyst world,โ€ which previously existed on the earth and played a critical role in generating our modern world of life.

6. Conclusions

To connect the QM and MM calculation engines, GAMESS and AMBER, a UNIX shell-based interface program, was developed in our group. Overhead costs of the UNIX shell which arise from treating huge amount of information such as number of atoms of a solvated system of the biological system were avoided using C programs and modifications of the AMBER and GAMESS source code. The GAMESS and AMBER engines used in our system are highly efficient for parallelization; therefore, our system is suitable for simulations of large molecular systems. This system was applied to a metalloenzyme, azurin, and PU.1-DNA complex, and the significance of the environmental effects on the electronic structures of the site of interest was revealed. Subsequently, the system was employed for investigation of mechanisms of editing reaction in LeuRS, and the productive enzymeยทsubstrate complex was identified as an initial state of the enzymatic reaction. Then, the reaction pathways of the editing reaction occurring in the complex of LeuRS and mis-aminoacylated tRNALeu were investigated, and thereby the catalytic mechanism was identified. The revealed mechanism is consistent with the biochemical experimental result. This mechanism led us to finding a novel type of enzyme, that is, a hybrid ribozyme/protein catalyst. Thus, our interface program can play a critical role as a powerful tool for sophisticated hybrid QM/MM MD simulations of large systems, such as biological macromolecules. Such a state-of-the-art theoretical methodology could also be a solid basis for applied scientific fields such as pharmaceutical molecular design.

Acknowledgments

This work was partly supported by grants-in-aid from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) under Contract nos. 21340108 and 23654129. Computations were performed using computer facilities under the โ€œInterdisciplinary Computational Science Programโ€ at the Center for Computational Sciences, University of Tsukuba, and the Computer Center for Agriculture, Forestry, and Fisheries Research, MAFF, Japan, and the Supercomputer Center, Institute for Solid State Physics, University of Tokyo. J. Kang was supported by the Research Fellowship of the Japan Society for the Promotion of Science.