1. Introduction

Metabolic engineering involves the adjustment of metabolic and regulatory processes to improve desired cellular behaviors, such as the production of proteins and chemicals. Since cellular metabolic and regulatory networks are often large and complex, the construction and analysis of computational models of these networks can be useful for identifying current network states and evaluating the effects of network perturbations on desired phenotypes. This special issue includes papers that illustrate how computational approaches can be used in metabolic engineering. Here, we provide a brief overview of several established computational approaches that can be used to aid in the engineering of metabolic networks, while describing some of the exciting recent advances in these fields.

2. Descriptive Approaches for Identifying Intracellular Metabolic States

A variety of experimental measurements can be used to quantify the state of metabolic and regulatory networks, including flux analysis, gene expression, protein expression, metabolite concentration, enzyme activity, uptake and secretion rates, and transcription factor-DNA binding assays. Computational models and approaches can be useful for integrating and analyzing such datasets to quantify metabolic fluxes and uncover their regulatory properties.

2.1. 13C Labeling Experiments and Metabolic Flux Analysis

The ability to quantitatively map intracellular fluxes using metabolic flux analysis (MFA) is critical for identifying pathway bottlenecks and elucidating network regulation in biological systems, especially in engineered cells with nonnative metabolic capacities [1, 2]. 13C-MFA experiments involve feeding isotopically labeled substrates to cells, tissues, or whole organisms and subsequently measuring the patterns of isotope incorporation that occur in intracellular metabolites or secreted products [3]. Both mass spectrometry (MS) and nuclear magnetic resonance (NMR) can be used to quantify the relative abundance of different “isotopomers” (i.e., isotope isomers) associated with each measured biomolecule. Because different metabolic pathways often give rise to distinct isotope labeling patterns, isotopomer abundances can be used to infer the relative fluxes through these pathways [4]. As a result, isotopomer measurements obtained from MS or NMR, in combination with direct measurements of extracellular uptake or secretion rates, can be computationally analyzed to reconstruct comprehensive flux maps describing intracellular metabolism, which is the essence of MFA. A complete flux map is thus the phenotypic equivalent of the transcriptional map obtained from DNA microarrays, and tracking flux changes in response to targeted perturbations can provide important information about the distribution of kinetic and regulatory controls in metabolism [5].

The standard approach for computing a flux map involves a nonlinear least-squares regression to minimize the lack-of-fit between (i) experimentally measured and (ii) computationally simulated data. The latter are derived by solving a “forward problem”, which involves solving the isotopomer and metabolite balance equations for a particular set of flux parameters in order to calculate the relative abundance of all observable metabolite labeling patterns. Because the forward problem must be solved up to hundreds or even thousands of times to achieve an optimal fit, a great deal of effort has been placed on developing improved strategies to simulate the isotopic labeling produced by a particular network state. The Elementary Metabolite Unit (EMU) approach was developed by Antoniewicz et al. [6] to address precisely this problem. Through a novel decomposition of the isotopomer network, the algorithm systematically identifies the minimal set of variables required to simulate the available labeling measurements. The EMU approach has achieved 10-fold reductions in the system sizes required to simulate 13C labeling in both medium- and large-scale networks [6, 7]. While these gains are impressive, the true power of the EMU approach is in its ability to unlock entirely new isotope labeling strategies that were previously outside the reach of computational tractability. For instance, the number of isotopomer variables required to simulate mixed 13C, 18O, and 2H labeling in even a small-scale network of gluconeogenesis (24 reactions, 14 intracellular metabolites) climbs into the millions, while the number of EMU balances is in the mere hundreds [6].

Typically, MFA relies on the assumption of both metabolic and isotopic steady state. Achieving this situation experimentally involves (i) equilibrating the system in a stable metabolic state, (ii) introducing an isotopically labeled substrate without perturbing the metabolic steady state, (iii) allowing the system to establish a new isotopic steady state that reflects the underlying metabolic fluxes, and (iv) measuring isotopic labeling in the fully equilibrated system. Depending on the relative speed of metabolic and isotopic dynamics, however, other experimental scenarios can be envisioned. If the isotopic labeling responds quickly to any metabolic changes in the system, quasi-stationary MFA can be applied to obtain a series of snapshots that describe the variation in network fluxes over time [8, 9]. Conversely, if labeling occurs slowly but metabolism is maintained in a fixed state, isotopically nonstationary MFA (INST-MFA) can be used to estimate fluxes [10]. Finally, when measurements are obtained under both metabolically and isotopically nonstationary conditions, a fully dynamic modeling approach is required to estimate fluxes [11]. In the current issue, Lequeux et al. (G. Lequeux et al. “Dynamic metabolic flux analysis demonstrated on cultures where the limiting substrate is changed from carbon to nitrogen and vice versa”) have taken a different dynamic MFA approach by obtaining transient measurements of 10 different extracellular metabolites during the shift of E. coli cells from nitrogen- to carbon-limitation, or vice versa. Numerically differentiating these extracellular measurements allowed the authors to estimate dynamically changing uptake and secretion fluxes, which provided the measurements necessary to estimate all intracellular fluxes in their model as a function of time. This has the advantage of avoiding the complications imposed by measuring and fitting transient isotope labeling data, but at the same time provides limited redundancy to validate assumptions on cofactor balancing and respiration efficiency that must be invoked in order to close the system of balance equations [12].

By applying the EMU approach, Young et al. [13] have recently developed computational routines that achieve more than 5,000-fold speedup relative to prior INST-MFA algorithms. This opens the door to several practical applications that were previously intractable due to the computational complexity of INST-MFA, where the isotopomer balances are described by differential rather than algebraic equations. INST-MFA is ideally suited to systems that label slowly due to the presence of large intermediate pools or pathway bottlenecks. This approach not only avoids the additional time and cost of feeding isotope tracers over extended periods [14], but may become absolutely necessary in cases where the system cannot be held in a fixed metabolic state long enough to allow isotopic labeling to fully equilibrate. As a result, INST-MFA is expected to become an indispensible tool for extending MFA approaches to studies of mammalian systems [1517], industrial bioprocesses [10, 18], and other scenarios where obtaining a strict isotopic steady state may be impractical. Another emerging application of INST-MFA is its application to photoautotrophic metabolism, which is the process by which plants, algae, and cyanobacteria use light energy to fix carbon dioxide into complex organic molecules. Because photoautotrophs assimilate carbon solely from CO2, feeding 13CO2 will produce a uniform steady-state labeling pattern that is insensitive to fluxes. Thus, conventional steady-state 13C-MFA is incapable of quantifying autotrophic metabolic fluxes [19]. However, transient measurements of isotope incorporation following a step change from unlabeled to labeled CO2 can be used to estimate fluxes by applying INST-MFA [20]. This approach enables comprehensive flux analysis of photoautotrophic metabolism, complementing previous 13C-MFA studies of plants [21] and cyanobacteria [22] that were limited to heterotrophic or mixotrophic culture conditions, with sugar as the major carbon source. Taken together, these advances illustrate how combined progress in both analytical capabilities and computational techniques are driving MFA applications toward larger, more dynamic, and more complex biochemical networks, which encompass a growing variety of plant, animal, and microbial systems.

2.2. Using Metabolite and Gene/Protein Expression Measurements

In addition to 13C-MFA experiments, other large-scale measurements can be made which capture cellular metabolic states. These include measurements of metabolite concentrations and gene or protein expression. Metabolite concentrations can now be quantified using mass spectroscopy for hundreds of metabolites in a single condition. For example, Bennett and colleagues used LC-MS/MS to quantify over 100 metabolite concentrations in E. coli cells grown under three different conditions [23]. These metabolite concentrations can be analyzed to identify potential metabolic bottlenecks by evaluating enzyme saturation and estimating Gibbs-free energy changes of reactions. Bennett and colleagues compared measured metabolite concentrations to reported Michaelis-Menten kinetic parameters ( ) to determine whether individual reaction rates are substrate (where ) or enzyme limited (where ). Their analysis found that most substrate concentrations were higher than the reported values (83%) indicating that for many reactions the rates are limited by enzyme levels [23]. Metabolite concentrations can also be used to estimate thermodynamic properties of metabolic reactions since the change in Gibbs-free energy for a reaction ( ) is dependent on substrate and product concentrations [24, 25]. The values can be estimated from measured metabolite concentrations and then used to distinguish between those reactions that are operating close to equilibrium ( ), and those reactions that are far from equilibrium ( ) whose rates may be limited by regulation via enzyme kinetics [25]. Thermodynamic analysis using metabolite concentrations has been preformed for both E. coli and S. cerevisiae to identify these types of reactions [23, 25, 26]. Together the analysis of metabolite concentrations using both kinetic (if available) and thermodynamic information can be useful to identify metabolic bottlenecks or rate limiting reactions.

In addition to metabolite concentrations, gene expression and protein expression measurements can also be used to help elucidate metabolic fluxes and their regulation. Recent modeling efforts have used these types of measurements to improve predictions of fluxes through metabolic reactions [5, 2729]. With these approaches, gene expression data is used to place restrictions on flux values or flux changes. The GIMME method uses an expression threshold and prevents flux through reactions associated with genes whose expression is below the threshold. In this case, an inconsistency score is minimized, where penalties depend on the magnitude of the flux and how far the expression is below the threshold [27]. Another method, proposed by Shlomi and colleagues, instead groups reactions into high, medium, and low sets based on expression levels of associated genes [29]. A flux distribution is then identified that has flux through as many reactions in the high set as possible, and no flux through as many reactions as possible in the low set. In the third method (E-flux), the relative expression levels of genes from a given condition are used to place constraints on the upper limits flux values can take [28]. Another approach, developed by Moxley and colleagues, instead uses the changes in expression level between two conditions to estimate the changes in fluxes between two conditions [5]. In this case, the predicted flux change depends on the expression change (ΔmRNA) and some additional model parameters. All of these methods have been applied to gene expression measurements, but they likely can also be used with quantitative proteomics measurements.

3. Predictive Approaches for Improving Cellular Phenotypes

The modeling approaches described in the previous sections can provide descriptions of the metabolic fluxes by analyzing different types of experimental measurements. Once these metabolic states are known, other modeling approaches can be used to identify which environmental and/or genetic perturbations would improve cellular phenotypes, such as the production of desired chemicals. These predictive approaches include pathway-based and optimization-based methods, that take into account the structure and stoichiometry of metabolic networks, as well as, kinetic modeling approaches that also account for enzyme kinetics.

3.1. Pathway-Based Approaches

Identification of the relevant pathways of a metabolic network is essential for finding effective metabolic engineering strategies. These pathways can also help derive minimal media requirements for an organism and assess the robustness and redundancy of key metabolic pathways. In this special issue, F. Llaneras and J. Picó review and compare four established methods used to identify relevant metabolic pathways: extreme currents, elementary modes, extreme pathways, and minimal generators (F. Llaneras and J. Picó, “Which metabolic pathways generate and characterize the flux space? A comparison among elementary modes, extreme pathways and minimal generators”). The authors recommend elementary modes for determining the metabolic impact of gene knockouts and the required pathways of a network for producing desired chemicals. The calculation of elementary modes requires knowledge of reaction stoichiometry and reversibility, making the recent modeling advances in biochemical reaction thermodynamics important to this analysis. Several algorithms and software packages have been developed for calculating the elementary modes of a metabolic network, including Metatool [30, 31], FluxAnalyzer [32], and functionalities built into OptFlux [33]. However, the problem of combinatorial explosion when calculating elementary modes for large complex metabolic networks has been documented [34]. This has paved the way for clustering algorithms such as the Agglomeration of Common Motifs (ACoM) [35] in an attempt to give biological meaning to elementary modes and define relatedness between reactions. Application of elementary modes to rational metabolic engineering approaches has now enabled strain design algorithms such as those used by OptFlux [33] and the genetic algorithm-based method compiled by Boghigian et al. [36]. Elementary flux modes have been successfully used to engineer strains with a variety of desired phenotypes, including sugar coutilization [37], ethanol [37, 38], and carotenoid production [39]. Z. Chen et al., in this issue, use elementary modes to find strategies for improving the conversion of glycerol into succinate by considering the effects of oxygen utilization and genetic alterations (Z. Chen et al., “Elementary mode analysis for the rational design of efficient succinate conversion from glycerol by Escherichia coli”). Random sampling of flux distributions provides another way of investigating possible flux distributions through metabolic networks [40, 41], and A. De Martino et al. in this issue use both structural analysis and sampling to explore the robustness of human red blood cell (RBC) metabolism (A. De Martino et al., “Optimal fluxes, reaction replaceability, and response to enzymopathies in the human red blood cell”).

3.2. Optimization-Based Approaches

Alternatives to pathway-based approaches include optimization-based methods, which can also identify mutations that would improve desired phenotypes (e.g., increased production yields). Here, the models use the same stoichiometric and thermodynamic constraints as those used in pathway-based approaches, but solutions are identified which maximize or minimize a stated objective. To predict how metabolic fluxes will change in response to a genetic perturbation, a number of different approaches have been developed, including flux balance analysis (FBA, reviewed in [42]), minimization of metabolic adjustment (MOMA) [43], regulatory on/off mechanism (ROOM) [44], regulated flux balance analysis (rFBA and SR-FBA) [45, 46], and probabilistic regulation of metabolism (PROM) [47]. Most of these approaches are used to predict the immediate behavior of knockout strains [43, 44], with the exception being FBA, whose predictions more closely resemble strain behavior after cells have undergone adaptive evolution [48]. MOMA has been successfully used to engineer strains with increased production of a variety of products including lycopene [49, 50], valine [51], threonine [52], and polylactic acid [53].

To identify those genetic strategies that are predicted to have the highest chemical production levels, a large number of possible strategies must be considered. Bilevel approaches can be used to identify which are the best production strategies given a maximum number of genetic alterations without having to generate and store predictions for all possible strategies. These bilevel methods can be solved using integer programming and/or genetic algorithms [5457]. A variety of such bilevel methods have been proposed which use FBA, SR-FBA, and MOMA to predict mutant strain behaviors, and these approaches can consider genetic changes involving gene deletions, altered gene expression, or altered transcriptional regulation [54, 56, 5861]. These methods have been used to design strains for a variety of chemicals [55, 6264]. A more recent bilevel approach (OptFORCE) uses optimization to identify how metabolic fluxes must change to improve metabolite production, which is independent of any assumptions about what functions are used to predict cellular behavior [60].

3.3. Kinetic Modeling Approaches

Both types of models and methods described in the last two sections do not take enzyme kinetics into account. As a result, they are unable to predict how changes in kinetic properties, enzyme, and metabolite concentrations would affect fluxes through metabolic pathways. Kinetic models are needed to make these types of predictions since they capture the dependence of fluxes on metabolite and enzyme concentrations. These types of models can be analyzed to identify which changes are needed for improving cellular phenotypes. The classical framework for elucidating parameters responsible for the control of metabolic fluxes is metabolic control analysis (MCA), developed in the early 1970s independently by Kacser and Burns [65] and Heinrich and Rapoport [66]. Recently, Visser et al. developed an alternative approach called linlog kinetics [67, 68]. Here, all rate equations are modeled with the same basic mathematical structure in which the relationship between rates and enzyme levels is linear, while for metabolite levels, a linear combination of logarithmic terms is used. Young et al. [69] have recently expanded the cybernetic modeling framework of Ramkrishna [70] to incorporate metabolic pathway concepts derived from elementary mode analysis. This led to models that could predict both local and global control properties of metabolic networks in response to either dynamic environmental shifts or stable genetic manipulations. These models were applied to predict phenotypes of several recombinant E. coli strains, and they were found to provide good agreement with experimental data. Furthermore, because of the dynamic nature of these models, they were capable of simulating responses that are not readily addressed by purely stoichiometric models (e.g., allosteric or kinetic effects of intermediate metabolites, enzyme overexpressions and partial knockdowns, and time-dependent culture conditions). In this special issue, A. Yachie-Kinoshita et al. review the history of kinetic models for human red blood cells (RBCs) and describe an RBC metabolic model implemented in the E-cell simulation environment (A. Yachie-Kinoshita et al., “A metabolic model of human erythrocytes: practical application of the E-Cell Simulation Environment”). They discuss how this E-cell RBC model can be applied to predict RBC responses to hypoxic environments and long-term cold storage and identify enzymes whose altered activity could improve storage conditions for RBCs.

4. Concluding Remarks

A growing number of computational tools that facilitate the evaluation and improvement of strains for metabolic engineering are currently being developed and expanded. These methods can account for a wide range of experimental measurements to provide an improved understanding of metabolic states and current limitations, and they can be used to identify new engineering strategies for improved chemical production. The collection of papers in this special issue highlight several recent advances and underscore the emerging applications of these computational tools.

Jennifer L. Reed
Ryan S. Senger
Maciek R. Antoniewicz
Jamey D. Young