Abstract

Characterizing the capabilities, key dependencies, and response to perturbations of genome-scale metabolic networks is a basic problem with important applications. A key question concerns the identification of the potentially most harmful reaction knockouts. The integration of combinatorial methods with sampling techniques to explore the space of viable flux states may provide crucial insights on this issue. We assess the replaceability of every metabolic conversion in the human red blood cell by enumerating the alternative paths from substrate to product, obtaining a complete map of he potential damage of single enzymopathies. Sampling the space of optimal steady state fluxes in the healthy and in the mutated cell reveals both correlations and complementarity between topologic and dynamical aspects.

1. Introduction

Understanding metabolic activity from the underlying genotype is one of the most addressed problems in computational biology. Of particular interest is the issue of the identification of the reactions that are indispensable for an organism to survive, grow or perform a specific function in a given growth medium or, conversely, of the potentially most harmful knock-outs or enzymopathies. Several experimental protocols are able to assess the essentiality of gene products (and hence of the corresponding metabolic reactions), ranging from individual knock-outs to transposon mutagenesis and RNA interference [15]. Computational approaches on the other hand might provide important clues on the system-level organization by investigating genome-scale network reconstructions.

The functional modularity of metabolic networks suggests that topological aspects may provide a key to identify a class of essential pathways [6, 7]. However the metabolic genotype only constitutes the frame on the top of which the dynamic phenotype is built. The essentiality of a metabolic pathway will in general depend on both structural considerations based on the network reconstruction from genomic information, and on the “model of metabolism” defined on it, for example, on the corresponding steady state fluxes. In E.coli, phenotypical essentiality of metabolic genes has been associated with a reduced allowed variability of the corresponding fluxes, suggesting that dynamically stiff reactions may constitute an evolutionarily robust backbone of metabolism conserved over different species [8].

Here we attempt a more thorough integration of topological and dynamical views to obtain a more comprehensive insight into a metabolic network's organization, efficiency, and ability to respond to perturbations. We will first associate the essentiality of a reaction with a measure of its topological replaceability by enumerating the alternative paths from substrate to product along the network edges, with the rationale that from a purely structural viewpoint more replaceable reactions are less likely to be crucial nodes of the network. Then we will validate and compare the essentiality map thus obtained with the metabolic phenotype resulting from the definition of a general constraint-based model for metabolic flux prediction. We shall see that dynamical and structural measures of essentiality may offer complementary views of a reaction network's robustness.

We carry out our analysis on the metabolic network of the human red blood cell (hRBC), one of the most studied complexes in systems biology, from the earliest mathematical models of single biochemical pathways [9, 10] to the currently available genome-scale reconstructions [11]. The reason for this choice lies essentially in its limited size. On the one hand, it allows to compute reaction replaceabilities exactly by a suitable modification of Johnson's algorithm for counting loops in a directed graph [12]. On the other, it allows for the efficient application of various sampling methods to the space of viable flux states [8, 13]. The latter is a vital ingredient to address many important properties of erythrocytes. Indeed for some organisms under certain conditions it is reasonable to assume that the metabolic activity is aimed at maximizing a subset of the metabolic reactions (or a function of them) associated with a certain biological function. In such cases the relevant flux configuration can be computed by standard optimization algorithms. For example, E. coli's metabolism has been shown to maximize biomass production under evolutionary pressure [14], but after a genetic knockout it responds with a minimum rearrangement of fluxes [15]. While the production of the cofactors ensuring the maintenance of osmotic balance and the release of oxygen may be argued to be their metabolic goal, erythrocytes do not generically allow for such a simplification. Information-rich directions in flux space must be retrieved by coupling the underlying constraints on fluxes with other types of analyses. Much understanding has indeed been obtained from the uniform sampling of feasible states [13, 16, 17] and by functional studies, like the computation of extreme pathways [18], of metabolic regulatory structures [19, 20] and of metabolic pools [21]. These aspects combined make hRBCs a key benchmark for both theories of metabolism and computational tools.

It is worth noting that the detailed structural information we derive (i.e., the full map of alternative paths for each substrate/product pair) cannot be retrieved by other methods. Unluckily, computation times still prevent scaling the approach we employ up to networks larger than a few hundred nodes. More refined algorithms are currently being developed to overcome this limitation.

2. Approach

2.1. Structural Analysis

Given a reaction network, we want to compute, for any pair of metabolites and that are, respectively, substrate and product in a reaction (this situation will be indicated by ), the number of alternative pathways, excluding reaction , of length allowing for the conversion , see Figure 1. The rationale is that a reaction performing a metabolite conversion for which (or, more properly, is large will be more easily substituted, in case of an enzymopathy or a knockout, than one for which the above quantity is small.

Finding paths connecting two points of a directed network is a long-studied problem in computer science. The focus is usually on locating the shortest paths or the fastest way to find any path. Enumerating all the distinct paths between two vertices is however a less confronted issue. In our case it is crucial to avoid overcounting, for example, due to self-intersecting paths. Therefore we shall resort to an exhaustive algorithm. We will identify the substitutive paths using the following trick: for each pair of metabolites such that , revert fictitiously. This results in a new graph where an auxiliary edge replaced the edge , see again Figure 1. Counting the number of alternative reaction chains producing from then comes down to computing the number of directed cycles, that is, non self-intersecting directed closed paths along the edges of the new graph, passing through the fictitious edge . Thanks to the limited size of the hRBC network it is possible to solve this enumeration problem exactly via Johnson's algorithm [12], briefly described in the following section. can now be trivially inferred. For simplicity, will denote here the number of reactions in the alternative pathway ( in Figure 1).

2.2. Flux Analysis

The space of viable fluxes will be defined through a constraint-based approach which relies on more general assumptions than flux-balance analysis (FBA, [22]). FBA is the standard method to model steady-state reaction networks where mass balance constraints are imposed to every metabolite. For a reaction network with reactions and metabolites, let us denote by and , respectively, the matrices of output and input stoichiometric coefficients. The stoichiometric matrix is given by . Letting denote a vector of fluxes (with properly chosen bounds ), the concentrations of metabolites vary in time according to , where stands for the net cellular uptake of metabolite ( if is a global output of metabolism, if is consumed by the organism, if is mass-balanced). Assuming a steady state, the concentrations are constant in time (i.e., ) and vectors satisfying , or represent flux configurations ensuring that each metabolite meets its production or consumption constraints at fixed concentrations. As is typically larger than , the system is underdetermined and feasible flux states form a convex set of dimension embedded in the -dimensional space of fluxes. In absence of a selection criterion that allows to pick one solution out of this set (as e.g., a maximum biomass principle), a uniform sampling of the solution space should be carried out. When is sufficiently small (as for hRBCs), this can be achieved effectively, albeit at a considerable computational cost, by Monte Carlo methods [13, 16] or by message-passing procedures [17].

Here we will consider a different but related flux scheme based on Von Neumann's (VN) model of reaction networks [8]. In the VN framework, one fixes the environment through a small set of intakes on nutrients and defines a self-consistent flux problem where the network chooses, given a target growth rate, how much of the nutrients to use and which metabolites are globally produced. Mass balance then emerges as a property of the solutions for some metabolites.

The equations describing the VN model have been studied by statistical mechanics methods in [23, 24]. For an intuitive derivation, note that the quantities and represent, respectively, the total output and the total input of each metabolite for a given flux vector . Then a flux vector such that , with some constant , describes a network state where metabolites are being produced at a rate at least equal to , since for each of them the total output is at least times the total input. It is simple to see that as increases the volume of such flux vectors shrinks continuously (for every flux vector is a solution). In particular, there exists a value of , representing the maximum metabolic production rate compatible with the stoichiometric constraints, above which no suitable flux vectors exist. The presence of conserved metabolic pools [25] implies [26], so that in metabolic networks optimal steady state fluxes correspond to the solutions of

The solutions of (2) do not coincide with those of (1) even for . Interestingly, a finite volume of (optimal) flux states turns out to satisfy the above constraints [8]. This trait is at odds with both the behavior of the solutions of (2) for a random reaction network (where a single solution survives at [24]) and with the optimization that is usually coupled to FBA (where typically a single flux state maximizes the objective function), and points to the robustness of metabolic phenotypes. For E.coli, in particular, the solutions of (2) have been shown to reproduce both the large-scale organization of fluxes and the individual measured rates. In addition, fluxes with the smallest solution-to-solution fluctuations, representing the most susceptible parts of the network, turn out to be strongly correlated with E.coli's phenomenologically essential genes [8]. The main technical advantage in using the VN scheme lies in the fact that its solution space can be sampled uniformly at very modest computational costs even for genome-scale models. The algorithm allowing for this, which has been recently applied to sample E.coli's solution space [8], is detailed in the following section. Its running times for hRBCs are negligible.

3. Methods

3.1. Reconstructed Network

We consider the hRBC metabolic network studied in [16], a map of which is shown in Figure 2; Table 1 lists reactions and the corresponding abbreviations. The network comprises three main pathways, namely, glycolysis (reactions 1–13), the pentose phosphate (PP) pathway (14–21) and the adenosine metabolism, with a total of metabolites linked by reactions: 49 internal reactions (34 of which come from the splitting of 17 reversible processes), 3 auxiliary fluxes to maintain the osmotic equilibrium and the redox state of the cell (ATPase, NADHase, NADPHase) and 7 uptake reactions to guarantee the intake of the necessary nutrients (GLU, ADE, ADO, INO), and of the cytosol elements (O, H, ). The forward and backward parts of reversible reactions are treated separately throughout this study, both in the structural and in the flux analysis.

3.2. Structural Analysis

Structural vulnerabilities are identified by analyzing the loop structure of a modified metabolic reaction network, created from the original one by inverting—in turn—the direction of the single reaction for which we want to compute the replaceability, as explained in Figure 1. The fastest known exact algorithm (for the worst case scenario) of this cycle enumeration problem for a directed graph was introduced by Johnson [12]. We shall now shortly describe its key ideas, referring to [12] for a pseudocode.

Given a directed graph with vertices and edges, the algorithm is designed to build non self-intersecting paths from a root vertex to itself, loading them onto stacks. The main ingredients allowing for an optimal exploration of the graph are (a) a smart choice of the root vertex, and (b) an efficient method to avoid duplicating cycles and repeating searches on the same portions of the graph. To achieve this, vertices are initially ordered in a lexicographic sequence, and the algorithm only selects as roots those nodes that are the “least” vertex (in the initial ordering) of at least one cycle. The algorithm described in [27] guarantees to find such vertices in operations. Moreover, to avoid self-intersections, each time a node is loaded onto a stack it is also given a “blocked” status. It was proven by Johnson that if a vertex stays blocked as long as every path from to the root vertex intersects the current path at a vertex other than , the algorithm outputs all cycles exactly once. By sufficiently delaying the unblocking of each of these vertices and by keeping track of the portions of the graph that have been searched holding the current stack, the maximum time that can elapse between two consecutive cycle outputs can be reduced to . The same holds for the time window before the first cycle is delivered and for the one after the output of the last cycle. Hence, the total time needed to list the, say, cycles of the graph is . In our case, each fictitious reaction reversal generates a new graph, so that computing the complete substitutability map for a network of reactions requires a time of the order . For practical reasons, we perform this analysis on the bipartite metabolic network (as in Figure 1) rather than the reduced network of Figure 2. This implies that in our case .

One can in principle consider different measures of replaceability of a metabolic conversion . The quantity , counting the total number of paths alternative to from to of any length, is perhaps the most obvious option. Taking into account the fact that, typically, longer detours can be less convenient than shorter ones from an energetic viewpoint one could instead consider -weighted functions like , with the caveat that shorter pathways might require more ATP than longer ones. -based and -based rankings of metabolic conversions are rather different. They are fully available from http://chimera.roma1.infn.it/SYSBIO. To focus on the basics, here we limit ourselves to identifying three key reaction groups that are independent of the replaceability measure used: (a)the group of reactions such that each substrate-product pair involved in them can be substituted (this is putatively the part of the network that is most robust to enzymopathies); (b)the group of reactions that cannot be substituted, corresponding to the most harmful enzymopathies; (c) the group of reversible reactions that are only replaceable in one direction, corresponding to the situation in which a conversion can only be substituted in one direction in case of a knockout.

All essentiality maps we show relate to this classification. Note that, for topological reasons, intakes are not replaceable.

3.3. Flux Analysis

Optimal flux vectors, that is, solutions of (2), are computed by the algorithm introduced in [24] based on [28]. The idea is to modify fluxes iteratively until all inequalities in (2) are satisfied. Specifically, for a fixed (with in our case) define and let denote the rows of , for . Let also, for each iteration step , be the flux vector at step and At each , the algorithm runs as follows. If , update fluxes according to and iterate in . Else, if stop, that is, is a solution.

Convergence to a solution is rigorously ensured for all , and can be approximated with the desired resolution by iterating the above process for increasing values of [24]. To guarantee that solutions are well defined one can either resort to setting fixed upper bounds on 's or, as we do, impose a linear constraint of the form on the solutions (this is equivalent to singling out one flux as the reference unit for the other fluxes). It is convenient to initialize the algorithm with a random vector . Different initial points generate trajectories to different solutions at and the sampling of the solution space thus obtained turns out to be uniform [8].

Contrary to FBA, the solution space of VN's model is generically not a polytope. Indeed much useful information can be retrieved from its shape. As a means to characterize it we employ the average overlap between different optimal flux vectors, defined as follows. Let and denote two distinct solution vectors of (2) and, for each flux , let This quantity, called the “overlap” between solutions and , equals if flux takes on the same value in solutions and and decreases as the values differ more and more. Averaging over different pairs of solutions provides a measure of the allowed variability of flux (smaller variability corresponds to larger average overlap), complementary to the standard deviation of the resulting flux distribution. The complexity of the solution space can then be roughly understood by distinguishing narrower directions with larger overlap or less variable fluxes from broader ones. It is reasonable to think that a cell will be more sensitive to perturbations (e.g., knockouts) of fluxes with larger overlap. Analyzing the susceptibility of the solution space to perturbations along the directions identified by different fluxes then allows to extract a list of the potentially more deleterious perturbations, in analogy with previous work on E.coli [8].

3.4. Response to Enzymopathies

In order to test the hRBC network against enzymopathies, we can focus on two types of perturbations. One can first employ a structural criterion: the knockout of a metabolic conversion that is less easily “substituted” is more likely to be deleterious for the cell than the knockout of a highly replaceable conversion. As said above, we concentrate here on a coarse-grained view of replaceability based on classifying reactions into the groups (a), (b), and (c) defined above, with groups (b) and (c) containing potentially essential reactions. The second criterion is based on fluxes: fluxes with smaller allowed variability (i.e., larger overlap) in the healthy cell are more likely to be essential links of the network than fluxes whose value can be changed over a larger range without losing optimality.

As is to be expected, the essentiality maps produced in these ways have a large degree of similarity, and reactions in the group (b) discussed above coincide with the physiologically most critical parts of hRBC's metabolism. The simplest way to simulate an enzymopathy on flux is to constrain its value below a certain upper bound . Deficiencies can be partial, that is, of a smaller degree, the closer is to the upper limit of the allowed range in the healthy cell, or total if . Such constraints cause in principle a modification of the solution space along the direction which in turn cascades on the entire volume, modifying the optimal states of the metabolic network.

4. Results

4.1. Structural Analysis

The substitutability map derived from the loop analysis is displayed in Figure 3. (For the sake of simplicity we exclude the highly replaceable currency exchange fluxes from this discussion.) The most replaceable core of the network lies in the PP pathway (reactions 17–21), which constitutes the main source of NADPH, the key metabolite that in erythrocytes limits the accumulation of peroxides protecting the cell from hemolysis. The high reliability coming with replaceability partly explains the reason why this group of reactions plays a central role not just as an auxiliary pathway for glycolysis, see the following analysis of fluxes. Unreplaceable reactions are instead lined up along glycolysis (numbers 1,6,8–13), in the bridge between glycolysis and the PP-pathway (14 and 16) or in auxiliary modules (22, 27, 29; the ADEAMP conversion in 25 is also not replaceable being directly linked to the ADE uptake). The physiologically most deleterious knockouts (HK, PK, and G6PDH) all belong to this group. For instance, deficiency in the level of G6PDH is the basis of different types of hemolytic anemias, including favism, and is also linked to malaria resistance [29]. Finally, there is a group of reversible reactions (numbers 2, 4, 7, 15, 23, 26) that can be replaced only in one direction. Note however that the last three of these could still be replaced in case of an enzymopathy if a proper medium is selected. For instance, if reaction 15 is removed, it could be substituted by an alternative chain of reactions provided 6PGC is externally supplied. This is instead not possible for reaction 4 and possibly 23 (depending on the directionality of reaction 26), as a knockout in these cases would necessarily result in a net production of FDP and R1P.

4.2. Flux Analysis

The flux distribution corresponding to optimal states in the healthy and enzyme deficient hRBC are displayed reaction by reaction in Figure 4, obtained by sampling 10000 solutions of (2), while a pictorial representation of the optimal flux states is given in Figure 5. For the healthy cell (black line in Figure 4 and top left panel in Figure 5) the large flux backbone is formed by the second part of glycolysis (crucial for ATP, NADH, and 23DPG production) and the PP pathway (NADPH production). The latter gives a substantial contribution to the former, not just as salvage way. The adenosine metabolism shows instead lower flux values. In addition to GLU, which is the fundamental substrate for hRBCs, the INO uptake plays an important role as an alternative way to the PP pathway. It is worth stressing that these solutions imply a net production of 23DPG, the crucial regulator for oxygen release, which is obtained without any imposed constraint. This picture is strongly reminiscent of the first eigenpathway obtained by extreme pathways analysis in [19], though the thermodynamic constraints and production requirements used in [19], including one on 23DPG, are more strict than the self-consistent analysis presented here. Comparing the distributions with FBA studies on the same system [16], one notices instead a general rearrangement of fluxes in the network apart from glycolysis. A close inspection reveals that such a rearrangement is mostly quantitative, as preferred reaction directions are generically preserved, the noteworthy exception being the RPI flux, that in the VN solution strengthens the PP pathway with respect to the FBA solution. This scenario is not surprising in view of the basic difference between FBA and the VN approach. It should be kept in mind however that the a priori constraints on flux variability are quite more strict in FBA than they are in the VN model, and the flux distribution appear to be particularly sensitive to the assumed upper and lower bounds for the fluxes.

In Figure 6 we report the overlap map of the hRBC. Comparing this with Figure 3 one sees that the large overlap backbone (signaling dynamically stiff fluxes) coincides to a large degree with the structurally most vulnerable parts of the network. Note that the overlap of reactions 2, 4, 15 and 26 is larger in the direction that cannot be replaced, further pointing to a higher susceptibility, and that currency reactions (31–35) belong to the most constrained part of the network. Revealingly, however, topological and dynamical characterizations prove to be complementary in some cases. This is seen, for example, from reaction 3, which is flux-constrained but also highly replaceable, so that the damage due to removal is limited even in presence of a small allowed dynamical range. (A similar picture holds for reaction 23.)

To conclude, we remind that in our framework uptake fluxes are optimized variables not fixed by boundary conditions. In the optimal state five of the uptakes have a limited allowed variability, implying rather severe constraints on the cell's environment.

4.3. Response to Enzymopathies

We have simulated the most studied enzymopathies by constraining the flux of the corresponding reaction. Generically speaking, the hRBC metabolism displays a large resilience against partial perturbations. Indeed, we have observed appreciable differences in relevant cellular functions compared to the nondeficient case only under full enzyme deficiencies, as also observed in [11] within a standard FBA optimization approach. Even under the most serious enzyme deficiencies the network appears to be able to maintain the production of ATP, NADH and NADPH almost constant, see also [30]. We focus here on PK and G6PDH deletions. As shown in Figure 4, the alterations in the flux distributions are not particularly striking and indeed we do not observe global flux rearrangements on the network's scale. The G6PDH enzymopathy appears to only cause local changes, confirming the structural predictions, the overlap calculations and also in agreement with clinical observations [31]. The response to PK knockout is instead more marked. The synoptic analysis of Figure 5 shows that in general the response to the perturbation consisted in a drop of the GLU uptake, and in a reduction of the glycolytic flux, while the Rapoport-Leubering shunt (reactions 8-9) for the production of 23DPG remains particularly stable, as does the adenosine metabolism. For the glycolytic deficiencies PK and HK we further observe an increase of the INO uptake to sustain the PP pathway and allow for the second part of glycolysis, and with it the production of ATP and NADH, to take place. Detailed flux configurations corresponding to the next most severe enzymopathies (HK, EN, PGK and PGM) are available from http://chimera.roma1.infn.it/SYSBIO.

5. Final Remarks

In this work we compare two robustness measures for biochemical networks, one based on structural properties (the reaction replaceability), the other based on dynamical stiffness (the overlaps). The former can be exactly assessed by enumerating the alternative paths joining substrates and products of a given reaction in a network. The latter depends on both the network topology and the model defined on it. Within VN's frame, we found that unreplaceable reactions mostly correspond to processes with a smaller allowed flux variability. In such directions, reaction removals as well as constraints on the fluxes are expected to be generically harmful. Reactions with limited (but non zero) replaceability tend to have instead smaller overlap, so that while the reaction is difficult to substitute still its flux can be largely adjusted. In an evolutionary perspective [32], the former pathways appear as “frozen”, and perturbations at these nodes will require large-scale flux rearrangement, while a mutation affecting the latter group may be neutral and could be preserved across generations. Interestingly, some reactions have both a large overlap and a large replaceability. These, albeit structurally robust, are dynamically constrained and should be considered as essential pathways of the metabolism as well. Integrating dynamical and structural characterizations may thus provide a rather complete picture of the emerging network robustness. The fact that topological and dynamical essentiality may not coincide could also prove to be important in view of the present challenges to understand the dynamical basis of topological modularity [33].

Extended flux state sampling was achieved here by an algorithm that is easily scalable to larger networks, see [8]. Exhaustive structural analysis instead was made possible by the small size of hRBC's metabolic network, which has served here as a model system to test basic concepts and algorithms. The use of the same procedure on a larger network, such as E. coli's, is likely to be prevented by CPU time growth. However, message-passing algorithms, designed specifically to solve combinatorial optimization or counting problems—albeit approximately—on graphical models, may be a suitable replacement [34].

Acknowledgment

The authors would like to express their pleasure to thank S. Jain for pointing them to [7] and D. Segrè for important comments and suggestions. This work was supported by the IIT (Italian Institute of Technology) through the Seed Project DREAM.