Abstract

Modern biological sciences are becoming more and more multidisciplinary. At the same time, theoretical and computational approaches gain in reliability and their field of application widens. In this short paper, we discuss recent advances in the areas of solution nuclear magnetic resonance (NMR) spectroscopy and molecular dynamics (MD) simulations that were made possible by the combination of both methods, that is, through their synergistic use. We present the main NMR observables and parameters that can be computed from simulations, and how they are used in a variety of complementary applications, including dynamics studies, model-free analysis, force field validation, and structural studies.

1. Introduction

Structure and dynamics studies of proteins and other biological macromolecules involving the use of multiple techniques are rapidly becoming the norm rather than the exception. In addition to a wide array of experimental possibilities, structural studies benefit from theoretical techniques whose predictive abilities have increased tremendously due to both methodological developments and the rapid increase of available computer power. The joint use of nuclear magnetic resonance (NMR) spectroscopy [1] and molecular dynamics (MD) simulations [2] is becoming commonplace due to their high complementarity: while NMR yields highly quantitative data on dynamic processes, these data suffer from not being easily linked to unambiguously identified motions. On the other hand, MD simulations unambiguously describe atomic motions, but they are predictions impaired by force-field limitations and model approximations. Hence, combining the strengths of experimental data from NMR and simulation data from MD yields a more reliable understanding of dynamics in terms of quantities and physical description of motions, respectively. This has an impact on our ability to study a variety of biological systems, from Alzheimer’s disease-related amyloid peptides [3, 4] to the catalytic properties of antibiotic resistance enzymes [5].

Solution NMR spectroscopy allows the study of proteins in terms of both structure and dynamics. For proteins, labelling with 13C and 15N is typically performed and allows the observation, at atomic resolution, of most atoms, since naturally abundant 1H is also NMR-active. Because such isotope labelling has no effect on protein structure and function, NMR can be considered a pseudo-label-free technique. Most NMR dynamics studies focus on N–H groups, which allows observation of a specific probe for most residues, with the notable exception of prolines and N-termini. A set of different experiment types are then available to probe motions on different timescales, ranging from the ps to several days (see Figure 1). However, NMR, as many other spectroscopic or bulk techniques, is limited by averaging [6] and experimental observations are thus driven by population statistics. Therefore, even though quantitative dynamics information can be derived from NMR, most of the time this information is limited to determining timescales (rates) of motions, rather than giving a direct physical description of these motions.

MD simulations are a structural bioinformatics technique that uses Newtonian physics to describe the dynamics of a system (such as a protein immersed in water molecules) at the atomic level. Atoms are represented by charged point masses. The force field (the expression of the potential energy as a function of atomic positions) is used to compute forces in the system and generate atomic positions and velocities along a trajectory. Modern force fields and MD implementations and techniques were recently reviewed by Guvench and MacKerell [2]. Timescales amenable to MD simulations have increased by many orders of magnitude in recent years, making MD a powerful probe of macromolecular dynamics (see Figure 1).

Case [7], in a seminal 2002 article, predicted important developments in the combined use of MD and NMR, with a focus on spin relaxation experiments applied to the study of both global rotational motions and the local dynamics of individual spins. In this short paper, different aspects of the joint use of NMR and MD are discussed. While this document is not an exhaustive listing of all the work done in the field of combined NMR and MD, it should give the reader a broad picture of how these methods can be used synergetically. Computational docking using NMR constraints, although not MD per se, is also discussed briefly. Our focus is on solution NMR, and therefore joint applications of MD and solid-state NMR spectroscopy (ssNMR) are not reviewed herein. (Readers interested in MD-ssNMR studies will find the recent work of Romo et al. [8] an interesting literature starting point).

In the next section, we overview methods for calculating NMR parameters and observables from MD trajectories: spin relaxation rates, order parameters, and conformational exchange. Then, applications of combined NMR and MD for dynamics studies are presented, starting with a selection of biological models that were developed using both MD and NMR. Specific applications are then treated: model-free analysis, force field validation, the special case of backbone energetics, and long-timescale simulations. The last section summarizes the use of MD simulations in structural NMR studies, including structure determination and ligand docking. We conclude with perspectives on future usage of joint MD and NMR.

2. NMR Observables and Parameters from MD Simulations

2.1. Spin Relaxation Rates

One of the most popular approaches to the study of protein dynamics using NMR is the recording of spin relaxation rates 𝑅 1 and 𝑅 2 , as well as steady-state heteronuclear nuclear overhauser effect (NOE). 𝑅 1 is the longitudinal relaxation rate, that is, the rate with which nuclear magnetization returns to equilibrium after being perturbed. 𝑅 2 is the transverse relaxation rate, that is, the rate with which spin magnetization coherence is lost after being created. The steady-state heteronuclear NOE corresponds to the cross-relaxation between two dipolar-coupled spins [9]. These relaxation observables depend on stochastic motions on discrete timescales, which are combinations of the Larmor frequencies of the nuclear spins involved. These data can be analyzed directly, although their interpretation rarely allows the identification of the exact underlying motion.

MD can be used to calculate 𝑅 1 , 𝑅 2 , and NOE values. One approach is to compute data from spectral density values themselves computed from the partitioned correlation function reporting on the reorientation of the bond vector under study (the vector linking two bonded atoms for which motions are observed through NMR relaxation, e.g., N–H, N–C, or C–H bond vectors) [10, 11]. Alternatively, reduced spectral density values can be calculated from the experimental data and compared to spectral density values [12] extracted from MD, although the data content from such an alternative approach is reduced compared to the more direct one. Of course, these approaches, as others presented below, suffer from the limited length of trajectories, which introduces a discontinuity into the calculated correlation function [11]. This is particularly true when bond vector reorientation converges slowly, that is, the vector experiences high mobility. Although MD can be used to help in the interpretation of spin relaxation data directly, the analysis of such data is generally performed within a mathematical framework such as the model-free formalism (also known as the Lipari-Szabo formalism [13, 14], see below).

2.2. Order Parameters

In the model-free formalism (reviewed by Morin [15]), spin relaxation rates are transformed into more easily interpretable parameters such as the squared generalized order parameter ( 𝑆 2 ), a quantitative indicator of local order, and the conformational exchange parameter ( 𝑅 e x ), which is a semiquantitative indicator of μs-ms motions (treated in the next section). 𝑆 2 order parameters range from zero to one and are a measure of the spatial restriction of a chemical bond, in this case the N–H vector of the peptide plan. At a value of 1, there is no internal motion whatsoever in the bond vector. At 0, the bond is not constrained in any way relatively to others.

MD simulations can be used to calculate 𝑆 2 [5, 1619]. When calculated values fit well with experimental data, MD simulations can be used with confidence for interpreting underlying motions. The classic approach to MD-derived 𝑆 2 is to compute the internal autocorrelation function for the bond reorientation. The plateau value extracted from this function at an infinite time corresponds to 𝑆 2 [17, 20] (see Figure 2). This plateau is reached when simulations converge (i.e., when the accessible conformational space is adequately sampled). Otherwise, the function is unconverged, leading to a possible 𝑆 2 overestimation. Moreover, if a single local motion dominates the trajectory, the decay to the plateau is exponential, and the area under the curve corresponds to the internal correlation time 𝜏 , the timescale of motion characterized by 𝑆 2 (see Figure 2). In a variation of this approach, the internal autocorrelation function does not use the MD data in a chronological manner, but rather randomized, with the function directly decaying to the plateau ( 𝑆 2 ) value [5, 21]. Obviously, convergence cannot be assessed with this approach. However, it allows the combination of multiple trajectories in order to increase conformational sampling. 𝑆 2 can also be computed from spherical harmonics summation [17, 21], another technique that allows the use of multiple trajectories. Agreement between spherical harmonics and the autocorrelation function was used by Buck et al. [17] to assess convergence.

The model-free formalism was derived by assuming separation of global and local dynamics, that is, separation of timescales for the global diffusion of the protein and the local motions of the bond vector [13, 14]. For folded globular proteins, this timescale separation is generally respected. However, for unfolded proteins, where the global motions can be much faster and coupled to local movements, such separation might not be true. (It was later shown [22] that timescale separability is not an issue in most cases; a notable exception is when local motions considerably alter the overall shape of the protein.) In order to mitigate the issue of timescale separability, Prompers and Brüschweiler [16] introduced a method called isotropic reorientational eigenmode dynamics (iRED), which relies on the use of MD simulations for the analysis of experimental NMR spin relaxation data. In addition to being completely unaffected by timescale separability (and thus well suited for the study of unfolded proteins), this method can also evaluate if global and internal motions are statistically separable. The method uses principal component analysis (PCA) [23] evaluated from MD simulations in order to extract reorientational eigenmodes and amplitudes. The distribution of eigenvalues allows the assessment of timescale separation. Finally, iRED parameters are fitted to experimental data, allowing the extraction of dynamics information, such as 𝑆 2 values. The method was applied to the iron responsive element RNA by Showalter et al. [24], who could demonstrate that global and local motions are not statistically separable, but rather correlated for this small macromolecule. Maragakis et al. [19] also proposed a solution to the separability problem, where the full autocorrelation function (as opposed to the local one) is used, thereby recoupling global and local motions to simulate the effect of global tumbling on NMR relaxation. Finally, Johnson et al. [25] created different helical ensembles by modification of 𝜙 / 𝜓 angles in order to compare different approaches to the extraction of order parameters for multiple different backbone bond vectors. Their approach highlighted the fact that multiple analytical methods applied to the same system can yield a complementary and self-consistent picture of reorientational motions. In other words, extracting 𝑆 2 using different approaches allows inferences on the underlying motions in action, whereas a single 𝑆 2 value (from a single approach) does not provide information on the physical nature of this motion.

Whereas order parameters extracted from spin relaxation data (using either the model-free formalism or the iRED approach) report on the ps-ns timescale, residual dipolar couplings (RDC, couplings observable for weakly aligned samples) can be used to extract a broader timescale 𝑆 2 ranging from ps to ms [2628] (see Figure 1). The difference between spin relaxation and RDC 𝑆 2 can highlight the presence of slower μs-ms movements. Motions on this physiologically relevant timescale can improve the understanding of protein function as they might be coupled to activity [29]. As for the analysis of model-free order parameters, the analysis of RDC 𝑆 2 data also can benefit greatly from combination with MD. Indeed, Salmon et al. [30] very recently reviewed this particular application, highlighting the fact that amplitudes of motions, as well as their distribution, can reliably be extracted from the two techniques (in this case, with accelerated MD simulations, discussed further below), allowing cross-validation of both techniques and increasing the confidence with which these data can be used for the understanding of protein motions.

2.3. Conformational Exchange ( 𝑅 e x )

μs-ms motions, often related to protein functions such as enzyme catalysis [29], can be indirectly studied using a combination of order parameters extracted from spin relaxation and RDC (see above). A more direct approach is the extraction of a conformational exchange term ( 𝑅 e x , also called chemical exchange) either from model-free analysis of relaxation data or from relaxation dispersion experiments. 𝑅 e x arise from μs-ms motions because these affect the magnitude of the effective transverse relaxation rate 𝑅 2 . 𝑅 e x values depend on timescales of motion, chemical shift differences between different conformers, and population weight of these conformers. 𝑅 e x sums with the pure 𝑅 2 (reporting on ps-ns motions) to give the observed 𝑅 2 [31]. However, interpreting 𝑅 e x in terms of physical motions is not trivial (with some exceptions [3235]). MD simulations could be useful in interpreting experimental 𝑅 e x parameters. Unfortunately, the μs-ms timescale is currently barely accessible to all-atom MD. This might soon change as simulations reaching the ms timescale are now being produced [36] (see Figure 1), although on special purpose hardware that is not commonly available. Once this timescale is routinely simulated, it will be possible to compare 𝑅 e x values with observed slow motions qualitatively, and possibly quantitatively by correlating 𝑅 e x values with timescales of motion for conformers with distinct chemical shifts. Such comparisons will be facilitated by the high accuracy with which chemical shifts can now be predicted from 3D structure (reviewed by Wishart [37]).

Alternatively, accelerated molecular dynamics (AMD) techniques can be used to probe the 𝜇 s-ms timescale. AMD are a variety of techniques designed to efficiently sample low-energy conformations. This implies escaping energy minima, often by artificially reducing the energy barrier between low-energy conformations. AMD techniques were reviewed by Elber [38]. Markwick et al. [21] showed that AMD can be used together with NMR spectroscopy to study multiple timescale motions in proteins. They used AMD simulations of GB3 protein to quickly sample motions on the ms timescale. Classic MD simulations were then performed using AMD conformations as starting points, providing insights into dynamics at different timescales (ms versus ns). NMR relaxation and RDC data available for that protein were in agreement with their findings.

3. Dynamics Studies

There are many examples in the literature of practical applications of NMR and MD for the study of biological systems. MD simulations are often used to facilitate the interpretation of NMR dynamic experiments, and to find new leads that can then be pursued experimentally; MD can also be used to perform experiments that are inaccessible to NMR and to focus on a single molecule rather than an ensemble average. In turn, NMR validates simulations of specific systems and thus their predictive power. Examples are discussed below.

The 36 residue villin headpiece helical subdomain (HP36) was studied in its unfolded state by Wickstrom et al. [39]. Local structure in the unfolded state is thought to be important to protein folding. However, it is difficult to measure experimentally because those structures convert rapidly to the folded state. An approach to study local structure is the use of peptide fragments. HP36 is made of three 𝛼 helices. Corresponding fragments (HP-1, 2, and 3) were studied by NMR, where a small helical propensity was detected in HP-1 and 3. However, the presence of multiple conformations at equilibrium leads to ambiguities in data interpretation. To palliate limitations of the experimental methods, the fragments were simulated using replica-exchange molecular dynamics (REMD), an accelerated sampling technique [40]. REMD facilitates overcoming energy barriers and thus makes it possible to analyze folding, an event that would normally happen on a timescale much longer than that accessible to standard simulations (see Figure 1). Cluster analysis of the fragment conformations shows that there is a local stabilized structure in HP-1 identical to the helix observed in complete HP36. Fragments HP-2 and HP-3, on the other hand, show little helicity. Cooperative folding in HP36 could, therefore, start by local helix formation in the HP-1 region. These simulations are validated by the agreement between experimental and simulated J-couplings.

Using implicit solvent, Kent et al. [3] developed a model of the aggregation of β-amyloid fragments (Aβ). Since Aβ aggregates on μs or longer timescales, simulations of such atomistic models are expensive. Using NMR order parameters for peptide Aβ(10–35) and a 10 ns explicit solvent simulation of the same fragment as reference data, they benchmarked the accuracy of force fields CHARMM, OPLS-AA, and Amber combined with their Shen-Freed implicit solvent model. Order parameters varied considerably with force field, but CHARMM22 displayed good agreement with experiment. After a 30 ns equilibration time, their Aβ model exhibits high flexibility, consistent with experiment, and maintains the strand-loop-strand motif that promotes aggregation through a solvent-exposed hydrophobic patch. Further work on A 𝛽 peptides using both MD and NMR was performed by Fawzi et al. [4]. They used new high-field experiments on the A 𝛽 (21–30) fragment to validate simulations using the Amber ff99SB force field with the TIP4P-Ew water model. They found that 13C relaxation parameters are in good agreement with predicted values. Their explicit solvent model describes accurately the structure and dynamics of the complete Aβ(21–30) peptide.

Ferner et al. [41] studied the dynamics of two YNMG RNA tetraloops at different temperatures using both NMR and MD simulations. This represents a particular challenge for both techniques. With NMR, careful choice of temperature-dependent parameters (such as bond length and CSA) is required for model-free analysis. MD force fields are parametrized to reproduce room temperature physical properties, so the actual effect of simulation temperature on protein dynamics is hard to predict. The thermal stability and biological function of both loops is known to be temperature dependent. Even though agreement between experimental and MD-derived 𝑆 2 decreases as the temperature is increased above room temperature, the authors are still able to observe, both in experiment and simulations, the same loss of stacking interactions between first and third bases that leads to RNA melting. RNA dynamics were also studied by a joint MD-NMR approach by Frank et al. [42]. They generated ensembles of RNA conformations for two variants of the transactivation response element (TAR). They did so by selecting snapshots from MD trajectories according to the fit between simulated and experimental RDCs. The dynamic ensembles obtained showed that ligand binding to TAR involves an adaptative recognition mechanism and important conformational changes in RNA.

Fisette et al. [5] used NMR relaxation and order parameters to validate β-lactamase TEM-1 simulations. In turn, simulations and PCA were used to interpret relaxation data in terms of physical motions. Multiple simulations were shown to be necessary to sample the conformational space of flexible loops. Simulations show their limits in the catalytically-relevant Ω loop, where motions on slow timescales ( 𝜇 s-ms) that promote substrate gating cause discrepancies between NMR and MD 𝑆 2 . In more recent work (Fisette et al., in preparation), effects of substrate binding on TEM-1 and PSE-4 class A β-lactamases were studied by MD simulations, overcoming a limitation of NMR experiments: rapid enzyme turnover makes relaxation experiments in presence of saturating substrate levels impossible; the only experimental workaround is the use of covalent inhibitors or variant enzymes, which would significantly affect dynamics around the active site.

Interested in the dynamics of a small flexible RNA fragment, Musselman et al. [43] used a domain-elongation approach to perform different NMR spin relaxation experiments, and compared their results with MD-predicted relaxation ( 𝑅 1 , 𝑅 2 , and NOE), as well as model-free ( 𝑆 2 and 𝜏 ) parameters. The domain-elongation approach developed by Zhang et al. [44] consisted in adding several Watson-Crick base pairs on one side of the RNA under study using unlabelled (i.e., NMR-invisible) nucleotides to slow down global diffusion and, thus, both decouple local and global motions and allow a higher sensitivity to fast dynamics. For Musselman et al. [43], this approach was useful in allowing a defined reference frame for analysis of MD, an otherwise difficult task with such a flexible entity where local and global motions are coupled. The RNA displayed a complex mixture of local and global dynamics on multiple timescales. Whereas the fast motions were detected by both spin relaxation and MD, slower motions were indicated by MD as well as RDC data.

3.1. MD-Guided Model-Free Analysis

When applying the model-free formalism, a model selection step is necessary to decide on the inclusion of different parameters and timescales, the basic idea being to identify the most statistically-relevant model and parameter set yielding a good fit of the experimental data. Model selection protocols and parameter accuracy of the model-free formalism were studied by Chen et al. [11] using MD simulations. They generated a 10 ns trajectory of dihydrofolate reductase ternary complex and computed relaxation parameters ( 𝑅 1 , 𝑅 2 , and NOE) from it. Afterwards, they used model-free analysis and compared different model selection approaches to extract dynamic parameters from the artificial relaxation data. These parameters were then compared to those computed directly from the trajectory using autocorrelation functions. Simulations thus served as a reference where the motions are perfectly known, while the model-free analysis had to reconstruct the spectral density from 𝑅 1 , 𝑅 2 , and NOE, measurements that provide only partial information on the dynamics. Their results showed that, as proposed by d’Auvergne and Gooley [45], the use of not only Bayesian Information Criteria (BIC) [46] but also Akaike Information Criteria (AIC) [47] in the case where multiple magnetic field data are recorded are superior compared to the then generally used step-up hypothesis approach involving F-tests [48]. Indeed, these information theoretical criteria provide a better balance between bias and variance (less under- and over-fitting) and lead to more reliable model selection, while also being less computer intensive.

MD simulations can also play a direct role in model selection, as was recently shown by Fisette et al. [5]. In this approach, model selection proceeds as usual with statistical tests for the comparison of experimental and back-calculated values, but is helped for cases where the tests yield a similar statistical score for different models (i.e., when different models reproduce equally the experimental data). In these ambiguous situations, projections of the orientation of the bond vector in the course of MD simulations are used to evaluate the motion type and decide on the best model. Even though this approach should, in principle, allow the correct identification of local motions, it suffers from two problems with regards to the 𝑅 e x parameter. The first is obviously the timescale probed by the trajectory. The second is the case where the 𝑅 e x contribution arises not from motions in the bond vector itself, but rather from movements in nearby atoms that affect the electronic environment of the bond vector on the 𝜇 s-ms timescale; such motions are undetected in the projection of a given bond vector.

3.2. Force Field Validation

If we are to trust the predictive capabilities of a given force field, that force field should reproduce experimental observations within its range of application. The classical, additive force fields used for MD simulations of macromolecules are developed using quantum mechanics (QM) properties (geometry, vibrational modes, solvation energies, etc.) and experimental data. Unfortunately, the direct fitting of a force field to proteins or even small peptides is impossible since such systems are not tractable by QM calculations at the required theory level. Instead, final parameter sets are used for protein simulations that are then compared to experimental results. Hen egg white lysozyme [17, 18, 4952] and ubiquitin [19, 5053] are often used as model systems since they are small and very well characterized experimentally. NMR observables used for force field validation include root-mean-square deviation (RMSD) from NMR structural ensembles [49], NOE (measurement of inter-proton distances) [49, 54], scalar spin-coupling constants [52, 55, 56], and chemical shifts [57, 58]. Direct comparisons of spin relaxation parameters [53] are a lot less frequent. Instead, relaxation-derived quantities such as 𝑆 2 [5, 1719, 49, 50, 53, 5961] are used. RDC-derived order parameters [52, 60, 62] are increasingly used for long-timescale simulations.

A detailed validation of a GROMOS force field parameter set (45A3) by Soares et al. [49] used a variety of NMR data and two 3.5 ns simulations of lysozyme. They used a crystallographic structure as the simulations starting point. Interestingly, trajectory RMSD shows that the trajectories stay closer to the X-ray structure than to the lowest-energy NMR conformer. This is probably indicative of trajectory dependence on initial conditions, a topic that was further studied by combined NMR-MD studies, and which will be discussed later on. Soares et al. [49] also compared predicted NOE with experimental values. Their results suggested the new GROMOS parameter set violated experimental NOE constraints slightly more than the previous set (43A1). However, studies by Zagrovic and van Gunsteren [54] later showed that comparing NMR and MD-predicted NOE may not be a good indicator of simulations quality due to the nonlinear effect of r 3 or r 6 averaging [63] and focus on experimentally observed NOE that produces artificially good agreement between the two techniques. Predicted spin-coupling constants were also used by Soares et al. [49] to assess force field quality. Both three-bond 3 J 𝐻 𝑁 𝛼 - and 3 J 𝛼 𝛽 -coupling constants converged to experimental results for most residues. For some, however, trajectory length was not sufficient to achieve a meaningful comparison. Lack of conformational space sampling in MD simulations to reproduce NMR observables is also an important issue, which will be presented below.

3.3. Backbone Energetics

An important limitation of MD simulations is that, as longer timescales become accessible, force field quality, integrator stability, and other problems that were not apparent before can be revealed and negatively impact simulations quality. N–H-squared order parameters 𝑆 2 were instrumental in validating the 𝜙 and 𝜓 dihedral angle corrections necessary for common protein force fields when simulations began probing the many nanoseconds timescale. Before that correction, classical force fields would overestimate the flexibility of secondary structures. In some cases [64], short helical peptides would fold as 𝜋 helices rather than the experimentally proven 𝛼 helices. Cross-map term corrections (CMAP) were developed by MacKerell Jr et al. [65] for the CHARMM force field to account for secondary structure stability while keeping other parameters unchanged. CMAP is a detailed grid of the dihedral potential of 𝜙 and 𝜓 angles that was integrated to the CHARMM22 force field as a supplemental term. A 25 ns lysozyme trajectory acquired by Buck et al. [17] compared predicted N–H 𝑆 2 to experimental ones. Results showed that CHARMM22/CMAP predicted 𝑆 2 that were consistent with experiment, while in absence of CMAP secondary structures were much too flexible.

Similar corrections were performed for the AMBER force field: in the ff99SB parameter set, 𝜙 and 𝜓 dihedral angles were reparametrized [50] to achieve a better balance of secondary structures. Validation included a comparison of experimental and predicted 𝑆 2 for lysozyme and ubiquitin, showing much better agreement for the new parameters. Showalter and Brüschweiler [53] also used ubiquitin simulations to validate AMBER ff99SB, but compared experimental and back-calculated relaxation data in addition to relaxation-derived order parameters. Showalter et al. [59] also showed that the modified backbone energetics in ff99SB improved the description of amino acid methyl side-chain rotation, suggesting a direct relation between main- and side-chain dynamics.

Even with this correction, however, modern force fields for proteins are still scrutinized through comparison with NMR results, revealing limitations. For instance, discrepancies in predicted and experimental J-couplings led Best et al. [55] to suggest that current force fields might be too helical as they overpopulate the 𝛼 conformation of polyalanine peptides in water. Another study of short peptides by Wickstrom et al. [56] used NMR scalar coupling constants as reference data and also showed shortcomings in Amber ff99SB for Ala3 and Ala5 peptides and the TIP3P and TIP4P-Ew water models. Another NMR-MD comparison by Trbovic et al. [60] highlighted deficiencies in the OPLS-AA and AMBER (ff99SB and ff03 parameter sets) force fields. Using over 50 short (2.4 ns) simulations of the B3 immunoglobulin-binding domain of streptococcal protein G (GB3), they compared MD-predicted 𝑆 2 to experimental values determined by both spin relaxation and RDC experiments. Their results indicate that the three force fields overestimate the flexibility of amide N–H vectors at the borders of secondary structure elements and in loops. They suggest that an imbalance between hydrogen bonding and other terms may be the cause of the problem. Even though these studies concluded the impact on the accuracy of typical protein simulations is small, NMR-MD studies of small peptides may lead to force field improvements in the future.

3.4. Diffusion and Long-Timescale Simulations

Molecular dynamics rely on the ergodic hypothesis for simulating physical parameters; adequate conformational space sampling on the timescales of the biological processes being studied is required. This is rarely possible for large macromolecules, as is clearly shown by trajectory dependence on starting coordinates. For lysozyme, Koller et al. [18] showed that starting structure had a significant impact on MD-derived order parameters. Since those are increasingly used to assess force field quality, caution must be taken to acquire trajectories at least an order of magnitude longer than the solute global tumbling time. Genheden et al. [61] also studied the effects of starting structures, trajectory length, and 𝑆 2 calculation method on MD-derived 𝑆 2 . Using the carbohydrate-binding domain of galectin 3 in the free and lactose-bound states, they compared predicted order parameters with experimental data from NMR spin relaxation. They conclude that the use of multiple trajectories (ten) of a length of 125% of global tumbling (10 ns in their case) yields the best results. In both studies, the improvement obtained from longer or multiple trajectories was significant only for a small number of residues that are either very flexible (with 𝑆 2 < 0 . 7 ) or involved in slow dynamic events. Similar results were observed by Fisette et al. [5] in a joint MD-NMR study of TEM-1 β-lactamase.

With MD trajectories now routinely reaching 100 ns, the global tumbling of small proteins is becoming an object of study in simulations. Global tumbling is easily measurable using solution NMR spectroscopy. Wong and Case [51] made an extensive study of three water models (SPC/E, TIP4P-Ew and TIP3P) and four proteins (ubiquitin, binase, lysozyme, GB3). Extracting the diffusion tensor for these solutes required simulations of 20 to 100 times longer than the rotational tumbling. Their study highlights how current water models make protein diffusion too fast in simulations. The diffusion constant of the popular TIP3P model is known to be about twice that of water at room temperature, but even for SPC/E and TIP4P-Ew models, whose diffusion constants are nearer experimental values, the rotational diffusion constants of all studied proteins were too large when compared to NMR results. In the first published work involving a μs trajectory, Maragakis et al. [19] also observed a diffusion constant about twice the experimental value for ubiquitin using SPC water molecules. Now that even longer timescales are becoming accessible, theoretical approaches need to be improved once again, this time with optimized water models. However, major changes in water models are likely to require modifications to existing forcefields, since these potentials are interdependent. Water models and their limitations were reviewed by Guillot [66].

Finally, simulation stability on the μs timescale has received attention in recent work by Lange et al. [62]. They used NMR RDC experiments, which are sensitive to dynamics from ps to ms, and compared them with μs simulations of ubiquitin and GB3 for a variety of force fields. Interestingly, they noted that past 25 ns, increasing trajectory length improves RDC fit only marginally. In some cases, longer trajectories yield worse synthetic RDCs, suggesting that the accumulation of force-field imprecision outweights the increased conformational space sampling and the protein behavior starts to deviate from its native state. PCA [23] was used on all MD ensembles to find correlations between RDC mismatches and structural changes. They showed that after several hundred ns, high free energy states are overrepresented and transitions to nonnative states happen at a higher frequency. They also demonstrated that the use of particle mesh Ewald (PME) for the treatment of electrostatics is necessary to obtain an accurate RDC fit. Recently, two new sets of parameters for the Amber force field were developed using NMR observables, with the explicit goal of improving the quality of long-timescale simulations. The first one, named ff99SB-ILDN, was developed by Lindorff-Larsen et al. [52] and is limited to reparametrization of 𝜒 1 dihedral angles. They used simulations of small model peptides to identify the residue types with rotamer populations that differed most from expectations based on statistical analysis of the Protein Data Bank (PDB). Full scans of the potential energy surface (PES) of the 𝜒 1 angles of these residues were then performed. A new torsion term in the Amber force field was introduced and replaced the previous one, and parameters for that term were fitted using the QM results. Afterwards, validation was performed with globular proteins (lysozyme, BPTI, ubiquitin, and GB3) by comparing calculated 3 J scalar couplings and side-chain RDCs to experimental data from the literature, showing noticeable improvements over ff99SB. The second set of parameters is the work of Li and Brüschweiler [57, 58] and uses NMR chemical shifts from the BioMagResBank (BMRB) [67] as a reference for direct force field parametrization. This is a novel approach, as NMR observables are usually used for validation only, because systematic exploration of the parameter space is too expensive when simulating entire proteins. Here, the authors used an initial trajectory that was reweighted for each new parameter set using Boltzmann’s relation. Parameter fitting was done by comparing theoretical and experimental chemical shifts of C, Cα, and Cβ atoms of each residue and adjusting the potential of backbone dihedral angles 𝜙 and 𝜓 . Very recently, Li and Brüschweiler [68] extended their method to use RDC in a similar way, either alone or in combination with chemical shifts. The resulting parameter sets perform better for reproducing NMR data and exhibit lower RMSD between trajectories and X-ray crystallographic structures.

As the timescales amenable to MD simulations continue to increase, and ms simulations become possible [36], NMR will most certainly continue to serve as a powerful experimental reference for force field validation and development.

4. Structural Studies

Protein structure determination with NMR data, since its first uses, has been helped by computational approaches, such as distance geometry optimization [69]. Simulated annealing approaches have been used for the calculation of structures based on NMR constraints (NOEs, paramagnetic relaxation enhancements (PREs), chemical shifts, coupling constants, RDCs, etc.) [70]. The approach consists of incorporating the experimental constraints and weighing them with a given force field in simulations performed at high effective temperature (to reduce energy barriers and allow the structure to relax and minimize its energy and the violation of experimental constraints). The temperature is then lowered slowly until a converged structure satisfying both experiment and physical constraints is obtained. In the end, a family (group) of low energy conformers (generally 10–40) is used to represent the structure. This approach and more modern implementations have been available in programs such as CYANA (DYANA) [7174], XPLOR-NIH [75, 76], CNS [77, 78], and ARIA [79, 80]. Recently, MD has, however, been used in an increasing number of approaches different from strict NMR-constraints-based structure calculation, with some examples given below.

Three-dimensional protein structure prediction has long been one of the ultimate goals of theorists. A very successful approach has been with the program ROSETTA [81] from the Baker lab which implements ab initio approaches. Whereas first implementations were solely based on theory (not using experimental data), newer implementations use raw NMR data such as chemical shifts (CS-ROSETTA) [82, 83], a combination of backbone chemical shifts and unassigned NOE data (CS-DP-ROSETTA) [84], or a combination of backbone chemical shifts, amide proton distances (from NOE data) and RDC data (CS-RDC-ROSETTA) [85]. This combination of ab initio approaches with easily obtained NMR data (mostly excluding the tedious assignment of side-chain resonances) allows rapid and accurate structure predictions to be made, a good compromise to full structure determination.

NMR data has also been used in conjunction with already available structures for helping the docking of ligands to potential binding sites [86]. The HADDOCK (High Ambiguity Driven protein-protein DOCKing) approach has been particularly successful in this [87, 88], first by using solely chemical shift perturbation data, but later including RDC data [89], as well as diffusion anisotropy data extracted from spin relaxation experiments [90] or pseudocontact shifts [91]. Morelli et al. [92] proposed a combined MD-NMR approach for the study of protein complexes. Named restrained soft-docking, the method uses chemical shift perturbation as a guide for a specific docking algorithm (BiGGER). Flexibility of amino acid side chains at the protein-protein interface are taken into account, but global fluctuations are disallowed, making the method most appropriate for small and medium complexes of globular proteins. Results showed excellent agreement (RMSD) with experimentally elucidated complexes for a selection of complexes: EIN/HPr, Barnase/Barnstar, Tom20/Presequence, Cyt c/Ccp. These complexes have interfaces that range from about 1000 to 2200  Å 2 . The HSQC NMR experiment generally used to determine chemical shift perturbation typically requires about an hour. Combined with the performance of the docking algorithm, this makes the technique a powerful tool for structural genomics projects and the search for protein-protein complexes that can facilitate the identification of lead compounds for drug design. Developments of combined NMR and docking methods were reviewed by Morelli and Rigby [93].

The description of unfolded proteins is a particularly challenging area where, again, NMR and MD are very complementary. In this context, NMR observables are used for the validation of ensembles of conformers which are aimed at representing the situation of unfolded proteins in solution. Chemical shifts [94], scalar couplings [95], RDCs [96, 97], and PRE effects [98, 99] are different NMR observables which can be effectively used in such approaches. Different techniques exist such as the flexible-meccano algorithm by Bernadó et al. [100], in which random conformational sampling based on amino acid propensity and side-chain volume is used for the construction of typically 100,000 conformers. Another random sampling approach is that of Jha et al. [101] using a self-avoiding statistical coil model based on backbone conformational frequencies found for a subset of protein structures deposited in the Protein Data Bank (PDB). The ENSEMBLE approach [102, 103] uses Monte Carlo approaches and MD simulations in order to create relatively small ensembles of conformers for which the populations weights are then optimized using experimental data. The maximum occurrence approach (termed MaxOcc) [104, 105] also uses weights (in this case defined as the maximum fraction of time a conformer can exist such that experiment data is well reproduced). The originality of this approach lies in the fact that only one conformer is evaluated within the ensemble in order to avoid overfitting of the limited experimental data. Such approaches that model the distribution of conformers are very useful for the study of unfolded proteins. In particular, the combination with NMR is very effective at characterizing the persistence of secondary structures as well as hydrophobic clusters in unfolded states [106109].

5. Conclusions

The variety of applications covered in this paper, the importance of NMR developments such as RDC, and the increase of the timescales tractable by MD simulations indicate that the joint usage of MD and NMR has a bright future. NMR will be indispensable in the development and refinement of force fields and water models able to accurately simulate both global and local protein motions on the μs-ms timescales. MD is already an integral part of NMR structure determination, and simulations will become commonplace as a complement to NMR dynamics from relaxation or RDC to facilitate data interpretation (see Figure 3). The overall availability of both experimental and simulated data in databases such as the BMRB [67] for NMR and the Dynameomics repository [110, 111] for MD will facilitate further comparisons and enhance synergy. Moreover, approaches aimed at helping experimentalists use further computational approaches (see, for example, the worldwide e-Infrastructure for NMR and structural biology: WeNMR—http://www.wenmr.eu/) will also be beneficial.

Obviously, not only NMR and MD can be combined synergetically. Many other techniques such as small angle X-ray scattering (SAXS), electron microscopy (EM), electron paramagnetic relaxation (EPR), circular dichroism (CD), and mass spectrometry (MS) can be used together (see Figure 3) to enhance their total information content as recently reviewed by Cowieson et al. [112] and exemplified with the recent study of RANTES/CCL5 oligomers by Wang et al. [113].

Acknowledgments

The authors are grateful to Simon Bernèche for constructive comments. The EMBO is acknowledged for a postdoctoral fellowship to S. Morin.