Complexity

Volume 2018, Article ID 5980636, 14 pages

https://doi.org/10.1155/2018/5980636

## Dynamical Criticality in Gene Regulatory Networks

^{1}Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, via Campi 213a, 41125 Modena, Italy^{2}European Centre for Living Technology, San Marco 2940, 30124 Venezia, Italy^{3}Institute for Systems Biology, Seattle 401 Terry Ave N, Seattle, WA 98109, USA

Correspondence should be addressed to Marco Villani; ti.erominu@inalliv.ocram

Received 22 April 2018; Revised 3 August 2018; Accepted 27 August 2018; Published 4 October 2018

Academic Editor: Eulalia Martínez

Copyright © 2018 Marco Villani et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A well-known hypothesis, with far-reaching implications, is that biological evolution should preferentially lead to states that are dynamically critical. In previous papers, we showed that a well-known model of genetic regulatory networks, namely, that of random Boolean networks, allows one to study in depth the relationship between the dynamical regime of a living being’s gene network and its response to permanent perturbations. In this paper, we analyze a huge set of new experimental data on single gene knockouts in *S. cerevisiae*, laying down a statistical framework to determine its dynamical regime. We find that the *S. cerevisiae* network appears to be slightly ordered, but close to the critical region. Since our analysis relies on dichotomizing continuous data, we carefully consider the issue of an optimal threshold choice.

#### 1. Introduction

The idea that dynamically critical states possess some peculiar advantages, so that they tend to be selected under evolutionary dynamics, has relevant implications in biology [1, 2] as well as in evolutionary computation and in the design of artificial systems [3]. This idea, which will be referred to as the *criticality hypothesis* in this paper, has been suggested by several authors [1–6] and has played a prominent role in the search for general principles in complexity science [7–9]. It is based on the observation that, in nonlinear dynamical systems, the same set of equations can lead to very different behaviours (depending upon the values of some parameters).

Let us consider in particular dissipative deterministic dynamical systems, whose long-term behaviour is dominated by the system attractors. While different types of attractors are observed in different systems, they can generally be classified as either ordered or disordered (chaotic): for example, in time-continuous systems, the former includes fixed points and limit cycles, while the latter is associated with chaotic dynamics. In many systems, there are regions of parameter space corresponding to ordered behaviours, while parameter values in other regions correspond to disordered behaviours. The separatrices between ordered and disordered regions of parameter space are referred to as *dynamically critical regions*, and the states belonging to these regions are called *dynamically critical states* [10, 11]. Critical regions have also been said to be “at the edge of chaos” [1, 5].

According to the criticality hypothesis, systems that are in critical states should be able to perform better than the others, when interacting with an external *environment*, because ordered systems can be too rigid and incapable to react to changes, while chaotic systems can be too susceptible to small changes to robustly lead to a reliable behaviour [1, 2]. The above statement requires, of course, that there should be a method to evaluate the quality of a system with respect to an environment, i.e., its fitness (biological or artificial).

Studies of artificial evolutionary systems have shown that the criticality hypothesis cannot be true for every kind of system and every kind of task, while it can hold for large classes of systems and tasks [3]. A very important question is whether the hypothesis is valid for biological systems: in this case, the hypothesis translates into the claim that biological evolution has modified the system parameters in such a way that they are likely to be found in critical regions (evolution “towards the edge of chaos”, see Torres-Sosa et al. [12] and Hidalgo et al. [13]). In biology, two possible amendments to the criticality hypothesis have been proposed:

*Hypothesis 1. *Living systems are likely to be found in the ordered region, but close to the critical boundary [1, 2] in order to be able to react to environmental stimuli and/or, on a longer timescale, to favour evolvability [14].

*Hypothesis 2. *In biological systems, the discontinuities are considerably smoother than the sharp ones observed in physical systems [15] and thus different regimes should still be observed, but they might be separated by a broader transition region.

For recent reviews of the criticality hypothesis, see Mora and Bialek [16] and Roli et al. [17].

Testing the criticality hypothesis in real biological systems, by analyzing the activation behaviour of various genes in cells that are subject to a permanent perturbation, is the topic of the present paper. In order to relate the experimental data with “abstract” dynamical regimes, it is necessary to make use of models: interestingly, our analysis exploits a model that has played a prominent role in the development of complexity science, namely, Random Boolean Networks (RBNs for short, see Kauffman [1, 2, 18], Bastolla and Parisi [19], and Aldana et al. [20]).

In previous papers [21–23], we analyzed the behaviour of the RBN dynamical model for gene regulatory networks, comparing its outcomes with experimental results concerning the change in gene expression levels due to the knockout of single genes in the yeast *S. cerevisiae* [24]. The major outcomes of these studies were (i) the demonstration that even strongly simplified models like RBNs can accurately describe the actual size distribution of the perturbations and (ii) the identification of a way to experimentally test the hypothesis that living beings are preferentially found in dynamically critical states.

The criticality hypothesis can be tested by studying the changes induced by a single-gene knockout (the permanent *silencing* of a single gene) on the expression levels of all the other genes. The genes whose expression levels are significantly modified by the knockout will be referred to as *affected*, and the set of such affected genes will be called an *avalanche* (see also Section 4). The total number of genes that belong to an avalanche is its *size*, and, in those previous works, it has been shown that the size distribution of avalanches provides information about the dynamical regime of the gene regulatory network.

We recently achieved new results concerning both the behaviour of the model and the outcomes of its comparison with experimental data. This has been possible because both new experimental data became available [25], and new theoretical results were derived. In this paper, we describe these new theoretical results and we apply them to the now available larger set of experimental data.

Specifically, we extend the analysis of the RBN model behaviour by providing an analytical formula for the theoretical distribution of noninterfering avalanches (defined in Section 4) of any size, while, in our previous papers, the analytical distribution was limited to the smallest avalanches, leaving the study of larger ones to simulation only. Moreover, we report numerical results for self-interfering avalanches (also defined in Section 4).

On the empirical side, using the recent experimental data of [25], we obtain a maximum likelihood estimate (MLE) of the Derrida exponent ; this is a parameter used to characterize the dynamical regimes of RBNs [26, 27] that fully determines the theoretical avalanche distribution, which makes inference possible from avalanche data. We also assess the uncertainty accompanying our estimate by means of a jackknife estimate of bias and standard error.

In order to compare a Boolean model with real-valued data like those obtained from microarray experiments, it is necessary to introduce a threshold: the smallest change in the expression level of a gene in the model is a switch from 0 to 1, or vice versa, so we need to dichotomize the experimental data. This can be done by considering a gene as modified only if the ratio of its expression level in the knocked-out network to that in the wild-type network is larger than a certain threshold, or smaller than its reciprocal.

In our previous studies, fairly high threshold values (≥4) had been considered, based upon some heuristic or rule-of-thumb inspired by experimental biologists. Here, taking advantage of the analytical formula for avalanches of any size, we can study how the MLE of varies as a function of the threshold. If the plot of MLE vs. threshold had shown a large plateau, one might have claimed that the actual value of the Derrida parameter is scarcely affected by the choice of the threshold in a wide range. Such a plateau does not show up, because the MLE monotonously decreases as the threshold increases, but the picture that emerges from our data analysis is perhaps even more interesting.

With no claim of generalization beyond the data of Kemmeren et al. [25], we have two heuristic arguments that suggest an optimal threshold in a neighbourhood of 3. The first argument is based on the shape of the avalanche size distribution: on general grounds, it is to be expected that smaller avalanches are more frequent than larger ones but, if we use too small a threshold, we observe an initial increase of the empirical frequency as the size grows. The are no plausible reasons for this, but it can be argued that, when the threshold is very small, one actually observes the properties of the noise present in the data; so it might be appropriate to choose a threshold slightly larger than the smallest value that does not display this unphysical behaviour. In this way, we find 2.5 as the minimal threshold value. Note that this argument does not make use of the RBN model, but it is based only upon the data plus a reasonable assumption. The second argument is based on measuring the RBN model fit to the data: here, one observes that the discrepancy between the estimated theoretical distribution and the empirical distribution in the data achieves its minimum at either 3 or 3.5, depending on whether it is measured by the total variation distance or the Kullback–Leibler divergence (see Section 6 for details).

The two arguments point to an interval of plausible threshold values between 2.5 and 3.5, and, while they may not be decisive, they stand together in favour of threshold values within that particular region. With the value 3 chosen as threshold, the MLE of the Derrida parameter lies in the ordered region, not far from the order-disorder boundary, but with the critical value well beyond the margin of error. Furthermore, a supplemental Bayesian analysis assigns negligible posterior probability to the criticality hypothesis. This was one of the scenarios initially suggested by Kauffman [1], and it was also supported by the old analyses from our group [22] and other [28] group. The new data appear to confirm the validity of such a scenario, although clearly no definitive conclusion can be drawn from the analysis of a single organism.

Note that our analysis is not based on a direct analysis of the behaviour in time of the *S. cerevisiae*, which would require dynamical experimental data, but relies on a dynamical model to draw inferences, from static data, on the dynamical regime of the microorganism. As discussed above, it is indeed possible to determine the avalanche size distribution in the model as a function of the Derrida parameter and to compare it with the size distribution that is actually observed in *S. cerevisiae*. This comparison allows one to draw meaningful, albeit indirect, conclusions on the dynamical regime of the biological system.

The outline of the paper is as follows. In Section 2, we provide a general introduction to our modeling framework, while, in Section 3, we specifically review the RBN model. In Section 4, we introduce the definition of noninterfering avalanches and we show that their distribution is correctly approximated by an analytical formula whose derivation is postponed to Section 5. In Section 6, we compare the analytical formula derived in Section 5 with the experimental data collected by Kemmeren et al. [25]. Finally, in Section 7, we summarize our main findings and highlight some open questions.

#### 2. Generic Properties of Dynamical Systems

We are interested in stylized models that allow us to explore the generic properties of biological systems, in particular gene regulatory networks. Of course, it is perfectly legitimate to concentrate on a particular hypothesis or a specific genetic-metabolic circuit and to develop a comprehensive model, well-suited to study the details of the system under examination, according to the *system biology* approach [29]. However, these comprehensive models are not suitable for highlighting the general principles of biological organization, which apply to all living beings or at least to their broad classes. We know that there are some principles of this kind, including biological evolution and cellular organization (see chap. 2 of Serra and Villani [30] and the references therein). The study of this kind of phenomena is indeed the challenge of *complex systems* biology [31], which looks for general principles in biological systems.

The criticality hypothesis is a noteworthy example of candidate general principle. As discussed in Section 1, it can be tested by comparing models of biological systems with data, e.g., models of gene regulatory networks with actual gene expression data. The use of data for this purpose is different from the more common use of the same data to infer information about the interactions between specific genes. In testing the criticality hypothesis, interest lies in global properties of gene expression data, like the avalanche size distribution or some information-theoretic measures [32, 33]. These features of the data are compared to the corresponding predictions from a stylized model, which can be ordered, chaotic, or critical.

A suitable stylized model for gene regulatory networks is an ensemble of networks. In this paper, we obtain an ensemble of networks by generating them from a random system (with random connections and random Boolean functions) while keeping some of its features fixed (e.g., the average number of connections per node). By comparing experimental data to the properties of an ensemble of networks, it is possible to draw inferences on the values of the parameters that define the ensemble (random system). It is therefore possible to compare the distribution of avalanches in real organisms to that in RBN ensembles with different values of the Derrida parameter, and this comparison provides us with information on whether real cells are critical or not. This is a very interesting example of the way in which stylized models can be used to find generic properties, which cannot be read directly from the data, but can be inferred from a comparison between patterns in the data and model predictions.

In recent years, our knowledge about the gene regulatory network of *S. cerevisiae* has considerably improved [25, 34–36] thanks also to initiatives such as DREAM [37, 38]. Nevertheless, we do not know enough details to derive definitive conclusions about its general topology. Indeed, one of the broadest and more accurate reconstructions available of the gene regulatory network of *S. cerevisiae*, summarized by Ma et al. [39] and more comprehensively described in further references therein, shows the presence of very few well-connected genes immersed in a huge “sea” of poorly connected nodes. However, leaving aside the very few well-connected genes, the distribution of connections does not show a very heavy tail, and the subnetwork connecting the well-connected genes is very sparse. Therefore, neither Erdös-Rényi nor scale-free topologies [40, 41] appear to be entirely appropriate. In this paper, we stick to the simplicity of Erdös-Rényi networks, but discuss scale-free alternatives in Section 7.

#### 3. Random Boolean Networks

We provide below a synthetic description of the RBN model and its main properties, referring the reader to Kauffman [1, 2], Bastolla and Parisi [19], and Aldana et al. [20], for more detailed accounts. Several variants of the model have been presented and discussed [42] and even used in knockout experiments [43, 44], but we will here restrict our attention to the *classical* model. A classical RBN is a stochastic dynamical system composed of nodes, corresponding to genes, which can take either the value 0 (inactive) or the value 1 (active). Let be the activation value of node at time , and let be the vector of activation values of all the genes. The relationships between genes are represented by directed links and Boolean functions, which model the response of each node to the values of its input nodes. In a classical RBN, each node has the same number of incoming connections, and its , say, input nodes are chosen at random with uniform probability among the remaining nodes. In this way, the number of outgoing connections of any given node approximately follows a Poisson distribution with mean for large . The Boolean functions can be chosen in different ways: in this paper, we will only examine some cases where they are chosen at random with uniform probability in a predefined set of allowed transition functions.

In the so-called *quenched* model, both the topology and the Boolean function associated to each node do not change in time. The network dynamics are discrete and synchronous, so fixed points and cycles are the only possible asymptotic states for finite networks. Note that a single RBN can have, and usually has, more than one attractor. The model shows two main dynamical regimes, ordered and disordered, depending upon the degree of connectivity and upon the Boolean functions: typically, the average cycle length grows as a power law of the number of nodes in the ordered region and diverges exponentially in the disordered region [1]. The dynamically disordered region also shows sensitive dependence upon the initial conditions, not observed in the ordered one.

It should be mentioned that some interesting analytical results have been obtained by the so-called *annealed* approach [27] in which the topology and the Boolean functions associated to the nodes change at each time step. Several results for annealed networks appear to hold also for the corresponding ensembles of quenched networks, as it has been verified using numerical simulations. Although the annealed approximation may be useful for analytical investigations [20, 45, 46], in this work, we will always be concerned with quenched RBNs, which are obviously closer to real gene regulatory networks, where the topology and the regulatory effects do not undergo continuous changes.

A very important aspect concerns how to determine and measure the dynamical regime of RBNs: while several procedures have been proposed, an interesting and well-known method directly measures the spreading of perturbations through the network. This measure involves two parallel runs of the same system, whose initial states differ for only a small fraction of the nodes. This difference is usually measured by means of the Hamming distance , defined as the number of genes that have different activation values on the two runs at time ; the measure is performed on many different initial condition realizations, so one actually considers the average value , but we will omit below the somewhat pedantic brackets. If the two runs converge to the same state, i.e., , as , then the dynamics of the system are robust with respect to small perturbations (a signature of the ordered regime), while if grows in time (at least initially), then the system is in the disordered regime. The critical states are those where initially remains constant.

If a single node is perturbed, the average number of different nodes at the following time step will be equal to the average number of connections per node times the probability that a node changes value if one of its input changes. This quantity, known as the Derrida parameter [26, 27, 47], determines the dynamical regime of the network: if , at each step, more nodes will be changed, and the perturbation will grow (chaotic regime); if , after a few steps, the perturbation will die out (ordered regime). The value is critical (separating chaos from order). The presence of a phase transition in RBNs at a critical value of the Derrida parameter is also confirmed by information-theory-based analyses of their dynamics [17, 33, 48].

In the classical model of RBNs, Boolean functions are often chosen at random among all those with inputs, but a detailed study of tens of actual genetic control circuits [49] has shown that, in real biological systems, only so-called canalizing functions are found: a function is said to be canalizing if there is at least one value of one of its inputs that uniquely determines its output. It may therefore be interesting to consider models where only canalizing functions are allowed. Moreover, if we associate the value 0 to inactivity, a node that is always 0 will never show its presence, so it may also be interesting to consider models where the null function is excluded [23].

#### 4. Perturbations in Random Boolean Networks

As discussed in Section 1, we can compare what happens in the wild-type and in the knocked-out RBN. In the beginning, a single node (the knocked-out one, also called the root of the perturbation) differs in the two cases, so the initial size of the avalanche is 1. Let be the set of nodes that directly receive input from the root: if no node in changes its value, the avalanche stops at the root, and it turns out to be of size 1; otherwise, the perturbation spreads, and it can later reach nodes that receive inputs from nodes in (possibly through other intermediate nodes). After transients have died out, one can compare the two cases and see how many nodes take different values at least once: these are the nodes that have been *affected* by the perturbation, and their number is the *size* of the *avalanche* induced in that particular network by that particular knockout.

The above definition of avalanche is applicable to simulated data from individual realizations of RBNs. Experimental data obtained by microarray technologies, such as those collected by Hughes et al. [24] and Kemmeren et al. [25] are aggregated data recording gene expression changes in groups of similar cells, and an experimental avalanche will consist of those genes whose expression has changed after a single gene deletion. We will assume that a gene whose expression has changed in the real data corresponds to an affected node in the dynamical model, so we will directly compare the size of the avalanche in the model to the size of the experimental avalanche. For a proper comparison, as already noted in Section 1, it is necessary to dichotomize the experimental data; we will discuss in detail the choice of the dichotomization threshold in Section 6. The comparisons discussed in this section concern only the behaviours of two different models, the wild-type and the knocked-out RBN.

As discussed in detail by Serra et al. [22], some simplifying assumptions can be introduced, which are very well satisfied by the *S. cerevisiae* case.

*Assumption 1. *The network is sparse, i.e., both the number of incoming and outgoing links per node is much smaller than the total number of nodes.

*Assumption 2. *The avalanche size is small with respect to the total number of nodes.

A straightforward consequence of these hypotheses is that the probability that a gene, whose value has changed, feeds back its change into one of its input genes is negligible, and so is the probability that more input nodes of a gene change their values. Under these assumptions, the spreading of an avalanche can be described by a tree, since in these cases, every affected node can be regarded as the root of a subavalanche, which evolves independently of the other subavalanches. We will refer to avalanches of this type as *noninterfering* avalanches, while a *self-interfering* avalanche will be an avalanche where at least one node has two changed inputs (in the knocked-out network with respect to the wild-type one).

In synthesis, if Assumptions 1 and 2 above hold, the number of self-interfering avalanches is very small, and it suffices to consider noninterfering avalanches. From now on, in this paper, the term “avalanche” will be used for noninterfering avalanches unless otherwise explicitly stated. It can be shown [22] that, in the case where self-interfering avalanches are excluded, the probability that the knockout of a randomly chosen node gives rise to an avalanche of size , say depends only upon the probability distribution of the number of outgoing connections. In the classical RBN model considered in this paper, where every node has incoming links chosen at random with uniform probability among all other nodes, when the network is large, as anticipated in Section 3, the outdegree distribution is known to be approximately Poissonian: where denotes the probability that a node chosen at random has outgoing connections and is the average number of outgoing links; note that , because every outgoing link of a node is an input connection for another node. For instance, one finds in the simple case of RBNs with two inputs.

More specifically, it can be shown [22] that the avalanche size distribution only depends on the *probability generating function* of the outdegree distribution computed at the probability that a node chosen at random is not affected when exactly one of its inputs is affected:
in the Poissonian approximation. Serra et al. [22], for RBNs with two inputs, found when all Boolean functions are allowed, when only canalizing functions are allowed, or when the null function is also excluded.

It is immediate to check that depends upon and only through . This quantity is the product of the average number of connections per node times the probability that a node changes value if one of its input changes, that is, the Derrida parameter introduced in Section 3 (see also Serra et al. [22]). It follows that determines the avalanche size distribution, and we will show in Section 5 that upon knockout of a single gene (for noninterfering avalanches). For instance, in the case of RBNs with two inputs, we have (critical value) when all Boolean functions are allowed, when only canalizing functions are allowed, and when the null function is also excluded (two instances of ordered regime).

Figure 1 shows that (3) does well approximate the size distribution of avalanches simulated from RBNs with 1000 nodes, two inputs per node, and all Boolean functions allowed (with uniform probability). On the other hand, one can see from Figure 2(b) that the distribution of avalanches in simulated networks with 20 nodes (two inputs per node and only canalizing functions allowed) is not satisfactorily described by (3). This is unsurprising, because we know that Assumptions 1 and 2 above break down for small networks, where self-interference comes into play. Indeed, as illustrated in Figure 2(a), once the simulated avalanches that actually interfere with themselves have been identified and removed, the remaining ones are no longer at odds with the theoretical formula (see also Di Stefano et al. [21]). We obtained similar results for networks of different sizes: as it should be expected, *ceteris paribus*, the fraction of self-interfering avalanches is a monotonous decreasing function of the network size .