Abstract

From the viewpoint of thermodynamics, gene transcription is a nonequilibrium process and therefore necessarily consumes free energy. Moreover, the regulatory molecules present on the core promoter of a gene often interact in a dynamic, highly combinatorial, and possibly energy-dependent manner, leading to a complex promoter structure. This raises the question of how gene transcription with a general promoter topology consumes free energy. We propose a biophysically intuitive approach to calculating the energy consumption (quantified by the entropy production rate) of a gene transcription process. We then show that increasing the number of ON or OFF states of a promoter can reduce both the energy consumption of the gene system and the Fano factor of mRNA and that, in contrast to other modes of regulation, the cooperative binding of transcription factors to DNA sites always reduces energy consumption but amplifies the mRNA noise. While the proposed approach is general, the qualitative results obtained can in turn be used to infer complex promoter structure.

1. Introduction

Gene expression is complex. Apart from the flow of genetic information described by the central dogma of biology, gene expression involves other dynamical subprocesses such as switching between transcriptionally active and inactive states [1, 2], recruitment of transcription factors (TFs) [3, 4], and feedback regulation [5, 6]. All these processes are biochemical and give rise to stochastic fluctuations in mRNA abundance. This stochasticity (often referred to as gene expression noise) is important for the maintenance of cellular functions and the generation of cell phenotypic diversity. Revealing gene expression mechanisms using stochastic models is a significant step toward understanding intracellular processes but is also a challenging task.

Many gene models, such as stochastic telegraph models [7–9], the three-stage model [10], and gene models with feedback of various forms [11–14], have been proposed to study the stochastic mechanisms of gene expression from different viewpoints. Although these models have successfully interpreted some biological phenomena observed in experiments [12–14], they assume that a gene promoter has only one transcriptionally active (ON) state and one transcriptionally inactive (OFF) state, with transitions between the two. This assumption is not reasonable in many situations. In fact, even in bacterial cells, promoters that are viewed as simple can exist in a surprisingly large number of regulatory states. For example, if the PRM promoter of phage lambda in E. coli is regulated by two different TFs binding to two sets of three operators that can be brought together by looping out the intervening DNA, the number of regulatory states of the PRM promoter is up to 128 [15]. Eukaryotic promoters are more complex still, since they involve additional processes such as nucleosomes competing with or being removed by TFs [16]. Apart from conventional regulation by TFs, eukaryotic promoters can also be regulated epigenetically via histone modifications [17, 18]. All these factors may lead to complex promoter topology or complex promoter kinetics.

On the other hand, gene transcription, which depends on promoter structure, is a nonequilibrium process from the viewpoint of thermodynamics. It has been shown that promoter kinetics regulated by TFs and/or other unspecified molecules can be expressed in terms of free energy [19–23]. This constitutes a generalization of thermodynamic methods, extending both the range of systems that can be represented (i.e., including energy-consuming systems such as gene transcription with complex promoter structure) and the type of metrics that can be predicted (i.e., including measures of dynamic and stochastic properties) [23–25]. The usual thermodynamic formulation of cooperative and competitive association and dissociation of TFs [26] is equivalent to assigning a Gibbs free energy to each promoter state. This representation allows one to predict equilibrium steady states (by applying a Boltzmann factor) and has been widely used to investigate the mean behavior of prokaryotic regulation [27]. However, the representation also has drawbacks: it limits the analysis to energetically closed systems and, because it carries no kinetic information, precludes any investigation of the stochastic aspects of gene transcription. In short, the question of free energy consumption in gene transcription remains unsolved, although it has attracted attention in recent years.

In this paper, we introduce an extra set of energy values (i.e., the free energy of the activation barrier for each reaction involved in the promoter kinetics). Although these values are difficult to access experimentally [28] and only approximate realistic cases, they can be represented in a matrix whose elements are known functions of the kinetic parameters of the promoter [23, 28, 29]. Consequently, the steady-state energy consumption rate (characterizing the energetic cost of promoter kinetics) can be easily calculated. To show the effectiveness of this method, we analyze several gene models with representative promoter structures and derive analytical results for the corresponding energy consumption rates, which are verified numerically.

2. Models, Methods, and Theory

First, we briefly introduce a biological prototype of gene expression (Figure 1(a)). To start the expression of a gene (a DNA sequence), transcription activators generally first recruit transcription factors and histone kinases to the promoter, followed by histone acetyltransferase complexes and other complexes. The histones are then modified so as to recruit RNA polymerase II and general transcription factors to the DNA, so that transcription is initiated and activated [30, 31]. This process may be accompanied by repressors that inhibit transcription, until the entire transcription initiation complex leaves the DNA sequence and the system returns to its initial state. We then map this biological prototype onto a theoretical model of gene expression (Figure 1(b)).

We point out that Figure 1 depicts only one example of gene expression, in which we assume that the DNA is transcribed only when the gene is in the ON state (that is, no transcription occurs in the OFF state). Realistic cases of gene expression (in particular, transcription) would be more complex.

In the following analysis, we separately consider stochastic gene expression models with four kinds of promoter structure: (1) single ON and multi-OFF states, (2) single OFF and multi-ON states, (3) transcription factor dual repression, and (4) transcription factor dual promotion. We calculate the free energy consumption in each of these four cases and analyze the relationship between gene expression and energy dissipation.

2.1. A General Theory
2.1.1. Approximate Calculation of Probability Distribution

Recall that, for a general chemical reaction system, if we let be the probability that the system is in state at time , then the corresponding chemical master equation takes the formwhere is the probability flux with being the transition probability from state to state . Now, consider a gene model at the transcription level, where the promoter structure is general; that is, the promoter may have arbitrarily many transcriptional activity (active or inactive) states and there are transitions among these states (Figure 1).

Assume that the promoter has N states in total, including ON states and OFF states. Let represent the probability that the gene dwells in state . According to the total probability principle, we have the identity . Moreover, based on the promoter structure, we can directly write the following master equation for variables :where is an matrix, describing the transitions between promoter activity states and is an -dimensional vector. Note that is actually an -matrix (i.e., the sum of every row is equal to zero), implying that equation (2) at steady state has infinitely many solutions. However, we also have the conservation condition . Therefore, all can be uniquely determined if the initial conditions are given. In particular, the steady state of equation (2) can be uniquely determined. Denote by the steady state of .
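The uniqueness argument above can be illustrated numerically. The following is a minimal sketch, assuming a ring-shaped promoter whose generator matrix has rows summing to zero; the function names `ring_generator` and `steady_state` and all rate values are hypothetical and are not taken from the paper. One balance equation is replaced by the normalization condition so that the otherwise singular linear system has a unique solution.

```python
import numpy as np

def ring_generator(forward, backward):
    """Generator matrix Q (each row sums to zero) for a ring of N promoter states.

    forward[i]  : rate of the jump i -> i+1 (mod N)
    backward[i] : rate of the jump i+1 -> i (mod N)
    """
    n = len(forward)
    q = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        q[i, j] += forward[i]
        q[j, i] += backward[i]
    # Diagonal entries make every row sum to zero.
    q[np.diag_indices(n)] = -q.sum(axis=1)
    return q

def steady_state(q):
    """Solve p Q = 0 together with sum(p) = 1.

    Q is singular, so one balance equation is replaced by the
    normalization condition to pin down a unique solution.
    """
    n = q.shape[0]
    a = q.T.copy()
    a[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(a, b)

# Hypothetical rates for a four-state ring (illustrative values only).
Q = ring_generator(forward=[1.0, 2.0, 0.5, 1.5], backward=[0.3, 0.7, 1.1, 0.4])
p_ss = steady_state(Q)
print(p_ss, p_ss.sum())
```

The replacement trick works whenever the promoter graph is connected, which is an assumption of this sketch rather than something the code checks.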

Next, let represent the concentration of mRNA, which is a continuous variable, and let be the probability that the system is in state . Then, the chemical master equation corresponding to the above gene model can be expressed aswhere is an diagonal matrix describing transcription (here, we have assumed that there are transcription exits or ON states), is an diagonal matrix describing degradation (all will be assumed to be the same and the common degradation rate will be denoted by ), and is an -dimensional column vector. The total probability is . We point out that solving equation (3), or even its steady-state equation, is usually difficult. We therefore give analytical approximations of the factorial probabilities . In eukaryotes, promoter switching is generally slow relative to transcription [27, 31], so throughout this paper we assume that the time scale of promoter switching is slower than that of transcription; the probability distribution of mRNA can then be obtained as a weighted sum of the steady-state probability distributions associated with the individual promoter states. If the gene is only in OFF state , where , then the mRNA is degraded but not produced, implying that the mRNA concentration follows an exponential distribution; then , where may be understood as a weight. If the gene is only in ON state , then the mRNA is both produced and degraded, implying that the mRNA concentration follows a Poisson distribution. For a sufficiently large mean, the Poisson distribution can be approximated by a normal distribution. Therefore, the steady-state probability distribution in ON state can be approximated aswhere is the mean, is the variance, and . Then, the total mRNA probability distribution can be approximated as

This explicit expression is in good agreement with the distribution obtained by the Gillespie stochastic simulation algorithm (Figure 2), where we take a four-state model with two OFF and two ON states as a representative and all parameters are taken from experimental data [31]. This implies that the above approximation is effective. In other words, the total probability density is equal to the sum of the individual probability densities at the discrete promoter states.
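For readers who wish to reproduce this kind of comparison, a Gillespie simulation of a multistate promoter can be sketched as follows. This is an illustrative sketch only: the function name `gillespie_mrna`, the two-state example, and all rate values are assumptions and are not the parameters used for Figure 2. The sketch treats mRNA as a discrete copy number and returns the time-weighted distribution of counts.

```python
import numpy as np

def gillespie_mrna(q, tx_rates, delta, t_end, rng):
    """Simulate promoter switching plus mRNA birth and death.

    q        : promoter generator matrix, q[i, j] = rate of i -> j for i != j
    tx_rates : transcription rate in each promoter state (0 for OFF states)
    delta    : per-molecule mRNA degradation rate
    Returns (counts, weights): the time-weighted distribution of mRNA counts.
    """
    n_states = q.shape[0]
    state, m, t = 0, 0, 0.0
    dwell = {}  # total time spent at each mRNA count
    while t < t_end:
        switch = q[state].copy()
        switch[state] = 0.0  # drop the (negative) diagonal entry
        rates = np.concatenate([switch, [tx_rates[state], delta * m]])
        total = rates.sum()
        dt = min(rng.exponential(1.0 / total), t_end - t)
        dwell[m] = dwell.get(m, 0.0) + dt
        t += dt
        if t >= t_end:
            break
        k = int(rng.choice(rates.size, p=rates / total))
        if k < n_states:
            state = k   # promoter switches state
        elif k == n_states:
            m += 1      # transcription event
        else:
            m -= 1      # degradation event
    counts = np.array(sorted(dwell))
    weights = np.array([dwell[c] for c in counts])
    return counts, weights / weights.sum()

# Hypothetical two-state (OFF/ON) example with illustrative rates.
Q = np.array([[-1.0, 1.0],
              [1.0, -1.0]])
counts, weights = gillespie_mrna(Q, tx_rates=[0.0, 10.0], delta=1.0,
                                 t_end=2000.0, rng=np.random.default_rng(0))
sim_mean = float(counts @ weights)
print(sim_mean)  # should lie near 10 * 0.5 / 1 = 5, up to sampling error
```

The histogram `(counts, weights)` can be overlaid on any analytical approximation of the mRNA distribution; the sketch assumes every promoter state has at least one outgoing transition.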

2.1.2. On Free Energy Consumption

Next, we calculate the free energy consumption of this system, for which we provide an effective method. We first introduce several definitions. We define the entropy of the system as [28, 32] (i.e., the so-called Shannon entropy). Then, based on equation (1), the entropy generation rate (i.e., the derivative of with respect to time ) can be decomposed as follows (see Appendix A for details):where , that is, the so-called entropy flux rate, whereas , that is, the so-called entropy production rate, which is an exact measure of the free energy consumption of the underlying system. is also called the dissipation rate of free energy [23, 32–34] and is the focus of this paper. These concepts and results are general, but the key to obtaining the free energy consumption is how the probability is obtained from equation (1), since this equation is frequently very difficult to solve. Thus, the quantity of interest is also difficult to obtain. In the following, we consider only the steady state.
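At steady state, the entropy production rate of a finite-state Markov chain can be evaluated directly once the stationary probabilities are known. The sketch below uses the standard Schnakenberg form, summing (net flux) × ln(forward flux / backward flux) over reversible transitions; the function names and the three-state example are hypothetical, and the result is expressed in units of the Boltzmann constant per unit time. Pairs of states with a one-way transition are skipped here, since their contribution is formally infinite; that handling is an assumption of this sketch.

```python
import numpy as np

def steady_state(q):
    """Stationary distribution p with p Q = 0 and sum(p) = 1."""
    n = q.shape[0]
    a = q.T.copy()
    a[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(a, b)

def entropy_production_rate(q, p):
    """Schnakenberg entropy production rate for generator q and distribution p."""
    n = q.shape[0]
    epr = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if q[i, j] > 0.0 and q[j, i] > 0.0:
                fwd = p[i] * q[i, j]  # probability flux i -> j
                bwd = p[j] * q[j, i]  # probability flux j -> i
                epr += (fwd - bwd) * np.log(fwd / bwd)
    return epr

# Driven three-state ring: clockwise rate 2, counterclockwise rate 1.
Q_driven = np.array([[-3.0, 2.0, 1.0],
                     [1.0, -3.0, 2.0],
                     [2.0, 1.0, -3.0]])
# Detailed-balance ring: all rates equal, so no net cycle flux.
Q_eq = np.array([[-2.0, 1.0, 1.0],
                 [1.0, -2.0, 1.0],
                 [1.0, 1.0, -2.0]])

epr_driven = entropy_production_rate(Q_driven, steady_state(Q_driven))
epr_eq = entropy_production_rate(Q_eq, steady_state(Q_eq))
print(epr_driven, epr_eq)  # ln 2 for the driven ring, 0 at equilibrium
```

For the driven ring the stationary distribution is uniform, each edge carries a net flux of 1/3, and each edge contributes (1/3) ln 2, which gives ln 2 in total.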

In order to calculate the dissipation rate of free energy defined above, that is, , we define the state of the gene system as , which is an vector. Thus, every factorial probability depends, in principle, on state . Since the gene is in only one state at any moment, notation may be rewritten as where only one component of vector is not equal to zero. Unlike the traditional method, which calculates directly from the expression of (i.e., in one-dimensional state space), we calculate in the whole state space consisting of . For this, we first write the dissipation rate of free energy defined above in the following form [21, 23, 28]:where A and B represent the microscopic states of the underlying gene system in the -dimensional state space and represents the transition probability from state A to state B. Then, we decompose into two parts:where represents the free energy dissipation along the hyperplane in the -dimensional state space ( will be called the free energy dissipation of promoter), whereas represents the free energy dissipation along the -direction in the state space ( will be called the free energy dissipation of transcription). From the physical viewpoint, this decomposition is reasonable and intuitive. Moreover, the decomposition has been verified by our numerical simulations (see the following sections). Therefore, we only need to calculate and separately.

Note that can be expressed aswhere the sum is over the finite states since the promoter states are finite if the factorial probabilities () are known. Moreover, the term can be directly given based on the transition between promoter states; for example, for the transition module of the form , we have . For factorial probability , we will give a physically intuitive yet effective method to estimate .

On the other hand, can be expressed aswhere represents the free energy consumption of transcription when the gene is in state . Since we have assumed that the promoter has only transcription exits, the only s would not be equal to zero whereas the other s are all equal to zero.

Before doing the calculation, we make the following preparations. For variable , we have the master equation.

In order to calculate , we first write the following differential equations for continuous variable :

The steady state of is denoted by . Then, we can write the corresponding Fokker-Planck equation aswhere . Recall that for the Fokker-Planck equationthe corresponding free energy dissipation takes the form (see Appendix B of this paper or [35, 36])

In our case, equation (12) becomes

From equation (9a) for the free energy dissipation of promoter and equation (14) for the free energy dissipation of transcription , we can see that the key to calculating the free energy dissipation of the whole system is knowing the probability , which can be approximated by equation (5).

Many factors affect the promoter, giving rise to many promoter models. Owing to space limitations, we choose four typical models of particular interest: two kinds of multistate promoter model [3, 8, 30] and two kinds of cooperative transcription factor binding model [31]. Because the complexity of a promoter can be mapped onto a multistate model, it is of interest to understand the significance of multiple promoter states and of cooperation between transcription factors from different angles. In this paper, we study them from the new perspective of free energy consumption.

2.2. Case 1: One ON and Multiple OFFs

The promoter structure is shown schematically in Figure 3(a), where we assume that the promoter has one transcriptionally active (ON) state and inactive (OFF) states, which together form a loop with transitions among these states. For convenience, we list all the reactions of the gene model as follows:where represents mRNA, are the transition rates between different states of the promoter, represents the transcription rate in the ON state, and represents the degradation rate. These rates are assumed to be constants.

Let represent the probability that the gene dwells in ON state and represent the probability that the gene dwells in OFF state . For the promoter described in Figure 3(a), transcription matrix in equation (3) of the above section takes the formwhere elements in the empty place are zero. The steady-state equation corresponding to equation (3) is , where . Solving this algebraic equation combined with the conservative condition yields , where is a constant depending on transition rates and . Note that there is only one ON state, implying that . Therefore, we have only one transcription rate, denoted by . Thus, from equation (9), we obtain the mean of the stationary mRNA level given by .

According to the general method described in the above section, we know that the probability that the gene is in ON state can be approximated as , where and . The probabilities that the gene is in OFF states can be approximated as , where . Therefore, the total probability is given by

The variance of the probability distribution is given by

Thus, the mRNA noise and Fano factor are given analytically by

Next, we give the analytical expression of free energy dissipation of the whole system, . First, consider the free energy consumption of promoter, . Note that equation (9a) in Section 2.1 becomes

Using the expressions of and given above, we have

Therefore, we finally arrive atwhere and are two quantities depending on transition rates between promoter states. Quantity is exactly the heat consumption per unit time of the annular flow [28, 37] between promoter states.
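The interpretation of the promoter dissipation as the heat carried by the cyclic flux can be checked numerically for a single loop. The following sketch assumes a ring of N ≥ 3 promoter states with hypothetical rates (none are from the paper) and compares the edge-by-edge entropy production with the product of the steady-state cycle flux and the cycle affinity, that is, the logarithm of the ratio of the product of forward rates to the product of backward rates.

```python
import numpy as np

# Hypothetical rates for a four-state promoter ring (illustrative only).
forward = np.array([1.0, 2.0, 0.5, 1.5])   # rate of i -> i+1 (mod N)
backward = np.array([0.3, 0.7, 1.1, 0.4])  # rate of i+1 -> i (mod N)
n = forward.size
idx = np.arange(n)
nxt = (idx + 1) % n

# Generator matrix with rows summing to zero.
q = np.zeros((n, n))
q[idx, nxt] = forward
q[nxt, idx] = backward
q[np.diag_indices(n)] = -q.sum(axis=1)

# Stationary distribution: p Q = 0 with sum(p) = 1.
a = q.T.copy()
a[-1, :] = 1.0
rhs = np.zeros(n)
rhs[-1] = 1.0
p = np.linalg.solve(a, rhs)

# Edge fluxes; on a single ring they are all equal at steady state.
edge_fwd = p * forward
edge_bwd = p[nxt] * backward
flux = edge_fwd - edge_bwd

# Cycle affinity: ln( prod(forward) / prod(backward) ).
affinity = np.log(forward).sum() - np.log(backward).sum()

epr_cycle = flux[0] * affinity                          # flux times affinity
epr_edges = np.sum(flux * np.log(edge_fwd / edge_bwd))  # edge-by-edge sum
print(epr_cycle, epr_edges)
```

The two numbers agree because the stationary probabilities cancel when the logarithms are summed around the loop; when the forward and backward rate products are equal, the affinity and hence the dissipation vanish.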

Then, consider the free energy consumption of transcription, . Note that in equation (13) is given by . Also, note that the probability distribution is given by equation (17). According to equation (13), we thus have

2.3. Case 2: One OFF and Multiple ONs

Here, we consider another representative promoter structure, where the promoter has one OFF state and ON states (Figure 3(b)). All the biochemical reactions are listed as follows:where are the transition rates between different states of promoter, represents the transcription rate in ON state , and represents the degradation rate.

Let represent the probability that the gene dwells in OFF state and represent the probability that the gene dwells in ON state . For the promoter described in Figure 3(b), transcription matrix in equation (3) of Section 2.1 takes the formwhere elements in the empty place are zero. The steady-state equation corresponding to equation (3) is , where . Solving this algebraic equation combined with the conservative condition yields , where is a constant depending on transition rates , , and . Note that there is only one OFF state but there are transcription rates, denoted by . Thus, from equation (9), we obtain the mean of the stationary mRNA level given by .

According to the general method described in Section 2.1, we know that the probabilities that the gene is in ON states can be approximated as , where and . The probability that the gene is in OFF state can be approximated as , where . Therefore, the total probability is given by

The variance of the probability distribution is given by

Thus, the mRNA noise and Fano factor are given analytically by

Next, we give the analytical expression of free energy dissipation of the whole system, . First, consider the free energy consumption of promoter, . Note that equation (9a) in Section 2.1 becomes

Therefore, we finally arrive atwhere and are two quantities depending on transition rates between promoter states. Quantity is exactly the heat consumption per unit time of the annular flow [28, 37] between promoter states.

Then, consider the free energy consumption of transcription, . Note that the probability distribution is given by equation (17). According to equation (13), we thus have

2.4. Case 3: Dual Repression Model

Studies [38–40] show that when the promoter sequence contains multiple binding sites, the promoter is controlled by multiple transcription factors and enzymes, some of which are difficult to dissociate once bound; this is called cooperative binding. Cooperative binding is common, so what benefits does it offer organisms, and why do these molecules cooperate with each other? We answer these questions from the perspective of favorable gene expression and energy saving. In this section, we take the cooperative binding of two repressors (dual repression) as an example, as shown in Figure 4(a), where represents the binding rate, is the dissociation rate, is the transcription rate, and is the degree of cooperative binding. According to the experimental data [38, 39], represents independent binding and represents cooperative binding.

Figure 4(a) is a special case of Figure 3(a), so the quantities of interest for this system, such as the mean, variance, noise, Fano factor, probability distribution, and energy consumption, are obtained by setting n = 3 in formulas (15)–(23). We do not go into details here. What, then, are the biological functions of the cooperative binding of repressors? An explanation based on the numerical results is given in Section 3.

2.5. Case 4: Dual Promotion Model

Studies [27] show that a promoter can be activated by multiple transcription factors. How does the cooperation of these activators affect gene expression? We answer this question from the perspective of favorable gene expression and energy saving. Here, we take the cooperative binding of two activators (dual promotion) as an example, as shown in Figure 4(b), where is the binding rate, is the dissociation rate, and are the transcription rates, where , is the enhancement factor, and is the degree of cooperative binding. According to the experimental data [31], represents independent binding and represents cooperative binding.

Figure 4(b) is a special case of Figure 3(b), so the quantities of interest for this system, such as the mean, variance, noise, Fano factor, probability distribution, and energy consumption, are obtained by setting n = 3 in formulas (24)–(31). We do not go into details here. What, then, are the biological functions of the cooperative binding of activators? An explanation based on the numerical results is given in Section 3.

3. Numerical Results

In principle, the analytical formulas above show how different promoter structures affect gene expression (including the mRNA distribution, average expression level, noise intensity, Fano factor, and free energy consumption), but these results are implicit rather than direct. Here, we carry out numerical simulations to obtain intuitive results, examining how multiple promoter states affect free energy consumption and what the significance of cooperative binding is.

3.1. Free Energy Consumption of Multistate Promoters and Its Effect on Gene Expression

Promoter structure is complex. Promoter regulation involves many biochemical processes and interactions, such as splicing, chromatin remodeling, DNA methylation, nucleosome occupation, TATA box strength, transcription factor concentration, binding site number, and lncRNA regulation. We map this complexity onto a multistate promoter model. From an evolutionary perspective, organisms tend to evolve in the direction most favorable to their survival and development. Why, then, is the promoter structure of eukaryotes much more complex than that of prokaryotes? We offer some explanation from the perspective of free energy consumption.

In the multi-OFF promoter model of Figure 3(a), we set the transcription rate as , the degradation rate as , and the switching rates between promoter states as , . These parameters are all taken from experimental data [31]. In the multi-ON promoter model of Figure 3(b), we set the transcription rates as , the degradation rate as , and the switching rates between promoter states as , . These parameters are also taken from experimental data [31]. Note that the switching rates of promoters are affected by various factors and may change over time. In this section, we focus on the influence of multiple promoter states on gene expression, so we assume that all switching rates are equal and consider only the influence of the number of states.

According to the results shown in Figure 5, as the number of ON states increases, the mean mRNA level increases while the noise and Fano factor decrease; as the number of OFF states increases, the mean mRNA level decreases and the noise increases, but the Fano factor decreases. That is, in both the multi-ON and multi-OFF models, the Fano factor decreases as the number of promoter states increases. Previous work [31] shows that the Fano factor affects cell variability and that a smaller Fano factor can reduce it, so we suggest that multiple promoter states can reduce the variability of cells.
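Trends of this kind can be cross-checked against the exact moments of the discrete chemical master equation, which are available in closed linear-algebra form for any promoter topology with constant rates. The sketch below is not the continuous approximation used in this paper: it solves the standard steady-state moment equations for a discrete mRNA copy number. The function name `mrna_moments` and all rate values are hypothetical, and the noise is taken to be the variance divided by the squared mean, which is an assumption about the definition.

```python
import numpy as np

def mrna_moments(q, tx_rates, delta):
    """Exact steady-state mean, noise, and Fano factor of mRNA.

    q        : promoter generator (rows sum to zero), q[i, j] = rate of i -> j
    tx_rates : transcription rate in each promoter state
    delta    : mRNA degradation rate
    """
    n = q.shape[0]
    a = q.T  # column convention: a[i, j] = rate of j -> i
    # Stationary promoter distribution: a p = 0 with sum(p) = 1.
    lhs = a.copy()
    lhs[-1, :] = 1.0
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    p = np.linalg.solve(lhs, rhs)
    lam = np.diag(tx_rates)
    eye = np.eye(n)
    # Partial first moments: (delta I - a) m1 = lam p.
    m1 = np.linalg.solve(delta * eye - a, lam @ p)
    # Partial second factorial moments: (2 delta I - a) f2 = 2 lam m1.
    f2 = 2.0 * np.linalg.solve(2.0 * delta * eye - a, lam @ m1)
    mean = m1.sum()
    var = f2.sum() + mean - mean ** 2
    return mean, var / mean ** 2, var / mean

# Two-state check with hypothetical rates; state 0 = OFF, state 1 = ON.
k_on, k_off, mu, delta = 0.5, 1.5, 10.0, 1.0
Q = np.array([[-k_on, k_on],
              [k_off, -k_off]])
mean, noise, fano = mrna_moments(Q, [0.0, mu], delta)
print(mean, noise, fano)
```

For the two-state case the output can be compared with the well-known telegraph-model result, Fano factor = 1 + μ k_off / ((k_on + k_off)(k_on + k_off + δ)); passing larger ring generators to the same function gives the multi-ON and multi-OFF cases.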

According to the results shown in Figure 5, in terms of the total energy of the system, the more promoter states there are, the lower the free energy consumption, in both the multi-ON and multi-OFF models. This may offer another explanation of why organisms have evolved complex promoters. However, for the average energy consumption (defined as the total energy consumption of the system divided by the mean mRNA level, i.e., the average energy required to produce one mRNA), the multi-ON model reduces the average energy consumption, whereas the multi-OFF model increases it. It is well known that multiple OFF promoter states can result in bursty production of mRNA and that bursty gene expression often leads to phenotypic diversity, which makes organisms more adaptable to their environment. In other words, the multi-OFF model increases the average energy consumption of the system, but this cost is accompanied by the realization of biological functions.

3.2. Free Energy Consumption of Dual Repression or Promotion and Its Effect on Gene Expression

Studies [27, 31, 38, 39] show that a promoter can be regulated by multiple transcription factors. These transcription factors are either repressors or activators, and many of them cooperate with each other, so what are the benefits of this cooperation for gene expression? In this section, we analyze its biological function through numerical results, from the perspective of favorable gene expression and free energy consumption.

In the dual repression promoter model of Figure 4(a), we set the transcription rate as , the degradation rate as , the binding rate of transcription factors as , and the dissociation rate of transcription factors as ; these parameters are all taken from experimental data [27, 38, 39]. According to the results shown in Figure 6, the cooperative binding of two repressors ( represents cooperative binding, green line) reduces the total free energy consumption but increases the average free energy consumption compared with independent binding ( represents independent binding, red line). In addition, the cooperative binding of two repressors significantly reduces the mean but increases the noise and Fano factor.

In the dual promotion model of Figure 4(b), we set the transcription rates as where , the degradation rate as , the binding rate of transcription factors as , and the dissociation rate of transcription factors as ; these parameters are all taken from experimental data [27, 38, 39]. From Figure 7, we observe that the cooperative binding of two activators ( for cooperative binding, yellow line) reduces the total energy consumption and increases the average energy consumption compared with independent binding ( for independent binding, red line). In addition, the cooperative binding of two activators significantly increases the mean, but the noise and Fano factor also increase.

From Figures 6 and 7, we conclude that cooperative binding amplifies the regulatory effect: the cooperative binding of two repressors strengthens repression, while the cooperative binding of two activators strengthens activation, as can be seen from the comparison of the means. Whatever its kind, cooperative binding always reduces the total free energy consumption and increases the noise and Fano factor; in other words, cooperative binding lowers free energy consumption at the cost of greater cell-to-cell variability.

4. Discussions

All living organisms not only have the ability to collect information about their environment but also adjust their internal physiological states in response to environmental changes. This common property also includes single cells’ ability to respond to various possible changes in their environment by regulating their gene expression patterns. As a matter of fact, much of this regulation occurs at the transcription initiation level and is mediated by physical or chemical interactions between TFs and DNA sites, leading to many transcriptionally active and inactive states that form complex promoter structure. In addition, in prokaryotic and eukaryotic cells, the association and dissociation of most regulatory molecules (e.g., TFs) involve cooperation and competition with the other regulatory molecules bound to the DNA sites (or the promoter). Many important biological events occurring in gene expression such as DNA looping, chromatin open/closed state, and DNA methylation are also important factors impacting gene expression and are influenced by the regulatory molecules present on the promoter in a dynamic, highly combinatorial, and possibly energy-dependent manner. The combination of all these aspects also takes place in RNA polymerase recruitment and provides the promoter with various possible levels of transcriptional competency, far from the binary vision of all-or-nothing active and inactive genes.

In this paper, we have analyzed the dynamics of a single-gene promoter with an arbitrarily complex structure, focusing on the calculation of energetic cost (quantified by the energy consumption rate). Importantly, we have developed an analytical yet effective method for calculating the free energy consumed in gene transcription, which not only greatly reduces computational complexity but also provides an intuitive understanding of free energy dissipation in gene transcription. In particular, the derived formulas for the energy consumption rate provide useful information on the global behavior of the underlying gene system.

Although our calculation framework is general, we have not considered the effect of feedback regulation on promoter kinetics and gene expression, even though feedback regulation is ubiquitous in gene regulatory networks. In our framework, feedback corresponds to the case where every component of the promoter matrix describing promoter kinetics is a function of the system’s state. Correspondingly, the gene promoter may exhibit more complex behavior. How free energy consumption should be calculated in this case, and how energy consumption affects gene expression, remain unexplored. Extending the computational method for free energy consumption proposed in this paper to the case of feedback regulation should not be difficult.

Finally, modern systems biology has brought gene networks to the front of the stage, expecting complexity to arise from the interactions among many genes. It now seems that more attention should be paid to the single nodes of these networks, since the spontaneous stochastic kinetics of promoters is a non-negligible source of complexity. The connection between network and expression behaviors is worth further study from the viewpoint of energy consumption.

Appendixes

A. The General Theory of Entropy Generation Rate

For a Markovian biochemical reaction network, let represent the transition rate for state to state in state space, and let be the probability that the system is in state at time . Then, the corresponding chemical master equation takes the formwhereis the probability flow. Using this flow, if we define the entropy flux rate as

then the generation rate of the entropy is given by

In fact, according to the definition of Shannon entropy, we know that the system’s entropy is (which should be rewritten as the form of integral if is a continuous variable). The derivative of with regard to yields

Since , we have

Using , we can derive

Note that

Since , we thus prove equation (A.3).
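For orientation, the standard form of this derivation can be written out as follows. The notation is assumed here for illustration and may differ from the symbols used in the equations of this appendix: $k_{ij}$ denotes the transition rate from state $i$ to state $j$, $P_i$ the probability of state $i$, $J_{ij}$ the net probability flow from $i$ to $j$, $e_p$ the entropy production rate, and $h_d$ the entropy flux rate. The sign convention for $h_d$ is one common choice.

```latex
\begin{aligned}
\frac{dP_i}{dt} &= -\sum_{j} J_{ij}, \qquad J_{ij} = k_{ij}P_i - k_{ji}P_j, \\
S &= -\sum_{i} P_i \ln P_i, \\
\frac{dS}{dt} &= -\sum_{i} \frac{dP_i}{dt}\,\ln P_i
   \qquad \Bigl(\text{using } \sum_{i} \frac{dP_i}{dt} = 0\Bigr) \\
 &= \sum_{i,j} J_{ij} \ln P_i
  = \frac{1}{2}\sum_{i,j} J_{ij} \ln\frac{P_i}{P_j}
   \qquad \bigl(\text{using } J_{ij} = -J_{ji}\bigr) \\
 &= \underbrace{\frac{1}{2}\sum_{i,j} J_{ij}\ln\frac{k_{ij}P_i}{k_{ji}P_j}}_{e_p \,\ge\, 0}
  \;-\; \underbrace{\frac{1}{2}\sum_{i,j} J_{ij}\ln\frac{k_{ij}}{k_{ji}}}_{h_d}.
\end{aligned}
```

Each term of $e_p$ is nonnegative because $J_{ij}$ and $\ln\bigl(k_{ij}P_i/(k_{ji}P_j)\bigr)$ always have the same sign; at steady state $dS/dt = 0$, so $e_p = h_d$.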

B. Entropy Generation Rate Theory Corresponding to Fokker‐Planck Equation

Consider the following Fokker-Planck equation:

The entropy of the system is

Differentiating both sides with regard to time yields

Note that equation (B.1) may be rewritten aswhere

Using equation (B.4), it follows from equation (B.3) that

Thus,where represents the generation rate of the entropy and represents the entropy flux rate.
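A compact version of this argument in assumed notation is sketched below. For simplicity, the sketch takes a one-dimensional Fokker-Planck equation with drift $F(x)$ and a constant diffusion coefficient $D$, and assumes that boundary terms vanish; these choices, and the symbols $e_p$ and $h_d$, are assumptions made for illustration and may differ from the form used in equations (B.1)–(B.4).

```latex
\begin{aligned}
\frac{\partial P}{\partial t} &= -\frac{\partial J}{\partial x}, \qquad
J(x,t) = F(x)\,P - D\,\frac{\partial P}{\partial x}, \\
S &= -\int P \ln P \, dx, \\
\frac{dS}{dt} &= -\int \frac{\partial P}{\partial t}\,\ln P \, dx
  = \int \frac{\partial J}{\partial x}\,\ln P \, dx
  = -\int \frac{J}{P}\,\frac{\partial P}{\partial x}\, dx \\
 &= \underbrace{\int \frac{J^{2}}{D\,P}\, dx}_{e_p \,\ge\, 0}
  \;-\; \underbrace{\int \frac{F\,J}{D}\, dx}_{h_d},
\end{aligned}
```

where the last step substitutes $\partial P/\partial x = (F P - J)/D$, which follows from the definition of $J$.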

Data Availability

The data used in the study are included in the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Foundation of Guangdong Province (2017A030310590) and National Natural Science Foundation of China (11701117, 11631005, and 11901114), Key Research Platform and Research Project of Universities in Guangdong Province (2017KQNCX081 and 2018KQNCX244), Guangzhou General Project of Science and Technology Innovation (20190401010), and the Opening Project of Guangdong Province Key Laboratory of Computational Science at the Sun Yat-sen University (2018001).