Advances in Bioinformatics
Volume 2010 (2010), Article ID 749848, 17 pages
doi:10.1155/2010/749848
Research Article

Modelling Nonstationary Gene Regulatory Processes

1Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany
2Biomathematics and Statistics Scotland, JCMB, The King's Buildings, Edinburgh EH93JZ, UK

Received 18 December 2009; Accepted 29 April 2010

Academic Editor: Yves Van de Peer

Copyright © 2010 Marco Grzegorcyzk et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

An important objective in systems biology is to infer gene regulatory networks from postgenomic data, and dynamic Bayesian networks have been widely applied as a popular tool to this end. The standard approach for nondiscretised data is restricted to a linear model and a homogeneous Markov chain. Recently, various generalisations based on changepoint processes and free allocation mixture models have been proposed. The former aim to relax the homogeneity assumption, whereas the latter are more flexible and, in principle, more adequate for modelling nonlinear processes. In our paper, we compare both paradigms and discuss theoretical shortcomings of the latter approach. We show that a model based on the changepoint process yields systematically better results than the free allocation model when inferring nonstationary gene regulatory processes from simulated gene expression time series. We further cross-compare the performance of both models on three biological systems: macrophages challenged with viral infection, circadian regulation in Arabidopsis thaliana, and morphogenesis in Drosophila melanogaster.

1. Introduction

An important objective in systems biology is to infer regulatory networks from postgenomic data. Bayesian networks have been widely applied as a popular tool to this end—see, for example, [1]—and novel fast Markov chain Monte Carlo (MCMC) algorithms can be applied to systematically search the space of network structures for those that are most consistent with the data [2, 3]. One reason for the popularity of Bayesian networks has been the tractability of the marginal likelihood of the network structure. This term describes how well the model structure explains the data. Its computation is usually challenging, as it requires the solution of an integral over the entire parameter space. To obtain a closed-form expression, two probabilistic models with their respective conjugate prior distributions have been employed: the multinomial distribution with the Dirichlet prior (BDe) and the Gaussian distribution with the normal-Wishart prior (BGe). These approaches are restricted in that they either require a data discretisation (BDe: Bayesian Dirichlet equivalence score) or can only capture linear relationships (BGe: Bayesian Gaussian equivalence score). Recently, a generalisation of the BGe model based on a combination of a mixture model and the allocation sampler has been proposed [4], the Bayesian Gaussian Mixture (BGM) model. In the BGM model, data points are assigned to different compartments (subsets of the data) with the allocation sampler [5]. Model parameters (and their distributions) are allowed to differ between compartments, while information is shared among the compartments via a common network structure. Given the network structure each compartment is modelled separately and independently with the Gaussian BGe model.

The present work proposes a modification of the BGM model, which is more suitable for dynamic gene regulatory networks and gene expression time series. The proposed model, which we refer to as the B G M 𝐷 model, replaces the free allocation model by a multiple changepoint process, as, for example, employed in [6], to divide the time series into segments and thereby take the time structure into account.

In a preliminary study [7], we focused on synthetic data from small network domains. We found that the new model avoids spurious self-feedback loops if feedback loops, such as 𝑋 ( 𝑡 ) 𝑋 ( 𝑡 + 1 ) (e.g., in molecular biology: transcription factors regulating their own transcription), are not excluded from the analysis. In this paper, we present the mathematical details of the new B G M 𝐷 model in greater depth, and we demonstrate that B G M 𝐷 yields better inference results for real biological systems. To this end, we analyse two gene expression time series from macrophages and Arabidopsis thaliana and cross-compare the inference results of the BGM model and the new changepoint process B G M 𝐷 model.

Furthermore, we combine both models (the free allocation model and the changepoint process) with discrete Bayesian network methodology and we compare the performance of both models on inferring the morphogenetical stages of muscle development in Drosophila melanogaster from binary gene expression time series.

We note that our modelling paradigm is complementary to other recently proposed approaches. A nonlinear nondiscretised model based on heteroscedastic regression has been proposed in [8]. However, this approach no longer allows the marginal likelihood to be obtained in closed-form and requires a restrictive approximation (the Laplace approximation, that is, an approximation based on a 2nd order Taylor series approximation of the log likelihood) to be adopted. Another nonlinear model based on node-specific Gaussian mixture models has been proposed in [9, 10]. Again, the marginal likelihood is intractable. The authors resort to the Bayesian information criterion (BIC) of [11] for model selection, which is only a good approximation to the marginal likelihood in the limit of very large data sets. A more flexible approach based on changing graphs between changepoints has been proposed in [12, 13]. Conceptually, the assumption of changing networks is reasonable for some biological scenarios, like morphogenesis, where gene-regulatory processes have been measured over a long period of time. However, for cellular processes on a shorter time scale, it is questionable whether it is the network structure rather than just the strength of the regulatory interactions that changes with time. For example, it is not the principle ability of a transcription factor to potentially bind to the promoter of a gene and thereby initiate transcription (i.e., not the network structure), but the extent to which this happens (i.e., the regulatory interaction strength). We therefore argue that, especially for short time series, it is more appropriate to leave the network structure invariant among segments and to allow the interaction parameters to vary with time by modelling the time segments (between changepoints) separately and independently with the Gaussian BGe model. The systematic sharing of information among segments via a common network structure also reduces potential problems with model overflexibility and overfitting, which a more flexible approach that allows for different network structures is susceptible to when the time series are short.

2. Methods

2.1. The Dynamical BGe Network

Dynamical Bayesian networks (DBNs) are flexible models for representing probabilistic relationships among interacting variables 𝑋 1 , , 𝑋 𝑁 . The graph 𝒢 of a DBN describes the relationships between the variables (nodes), which have been measured at equidistant time points 𝑡 = 1 , , 𝑚 , in the form of conditional probability distributions. An edge pointing from 𝑋 𝑖 to 𝑋 𝑗 means that the realisation of 𝑋 𝑗 at time point 𝑡 , symbolically: 𝑋 𝑗 ( 𝑡 ) , is influenced by the realisation of 𝑋 𝑖 at time point 𝑡 1 , symbolically: 𝑋 𝑖 ( 𝑡 1 ) . Effectively there is a bipartite graph structure between two time steps 𝑡 and 𝑡 + 1 so that the acyclicity constraint—known from static Bayesian networks—is guaranteed to be satisfied. Therefore, in principle, each node can be its own parent node in DBNs. Such self-feedback loops 𝑋 𝑛 ( 𝑡 1 ) 𝑋 𝑛 ( 𝑡 ) model autocorrelations and it depends on the application whether such self-feedback loops should be valid edges or ruled out altogether to focus the inference on the interactions between genes. 𝜋 𝑛 = 𝜋 𝑛 ( 𝒢 ) denotes the parent (node) set of node 𝑋 𝑛 in 𝒢 ; that is, the set of all nodes from which an edge points to node 𝑋 𝑛 in 𝒢 . Given a data set 𝐃 , where 𝐃 𝑛 , 𝑡 and 𝐃 ( 𝜋 𝑛 , 𝑡 ) are the 𝑡 th realisations 𝑋 𝑛 ( 𝑡 ) and 𝜋 𝑛 ( 𝑡 ) of 𝑋 𝑛 and 𝜋 𝑛 , respectively, DBNs are based on the following homogeneous Markov chain expansion: = 𝑃 ( 𝐃 𝒢 , 𝜽 ) 𝑁 𝑚 𝑛 = 1 𝑡 = 2 𝑃 𝑋 𝑛 ( 𝑡 ) = 𝐃 𝑛 , 𝑡 𝜋 𝑛 ( 𝑡 1 ) = 𝐃 ( 𝜋 𝑛 , 𝑡 1 ) , 𝜽 𝑛 , ( 1 ) where 𝜽 is the total parameter vector, composed of subvectors 𝜽 𝑛 . 𝜽 𝑛 specifies the 𝑛 th local conditional distributions 𝑃 ( 𝑋 𝑛 ( 𝑡 ) 𝜋 𝑛 ( 𝑡 1 ) , 𝜽 𝑛 ) in the factorisation. The BGe model specifies the distributional form 𝑃 ( 𝐃 𝒢 , 𝜽 ) as multivariate Gaussian distribution, and it assumes a normal-Wishart distribution as prior distribution 𝑃 ( 𝜽 | 𝒢 ) [14]. The local probability distributions 𝑃 ( 𝑋 𝑛 ( 𝑡 ) | 𝜋 𝑛 ( 𝑡 1 ) , 𝜽 𝑛 ) are then given by conditional linear Gaussian distributions. For discrete data 𝐃 the BDe model specifies the distributional form of the likelihood 𝑃 ( 𝐷 | 𝒢 , 𝜽 ) as a set of multinomial distributions, and assumes a Dirichlet distribution as the prior 𝑃 ( 𝜽 𝒢 ) for the unknown parameters [15]. In principle, the BDe model yields a higher modelling flexibility than the BGe model, but BDe requires a data discretisation that usually incurs a substantial information loss. Under fairly weak conditions imposed on the parameter vector (prior independence and modularity) and the prior distribution 𝑃 ( 𝜽 ) (conjugacy), so that 𝑃 ( 𝜽 | 𝒢 ) = 𝑁 𝑛 = 1 𝑃 ( 𝜽 𝑛 | 𝜋 𝑛 ) , where 𝜋 𝑛 = 𝜋 𝑛 ( 𝒢 ) , the parameters can be integrated out analytically. For both scoring metrics BGe and BDe [16], the marginal likelihood then satisfies the same expansion rule as the Bayesian network with fixed parameters: 𝑃 ( 𝐃 𝒢 ) = 𝑃 ( 𝐃 𝒢 , 𝜽 ) 𝑃 ( 𝜽 𝒢 ) 𝑑 𝜽 = 𝑁 𝑛 = 1 Ψ 𝐃 𝜋 𝑛 𝑛 , ( 2 ) where Ψ 𝐃 𝜋 𝑛 𝑛 = 𝑚 𝑡 = 2 𝑃 𝑋 𝑛 ( 𝑡 ) = 𝐃 𝑛 , 𝑡 𝜋 𝑛 ( 𝑡 1 ) = 𝐃 ( 𝜋 𝑛 , 𝑡 1 ) , 𝜽 𝑛 𝜃 × 𝑃 𝑛 𝜋 𝑛 𝑑 𝜽 𝑛 , ( 3 ) and 𝐃 𝜋 𝑛 𝑛 = { ( 𝐃 𝑛 , 𝑡 , 𝐃 𝜋 𝑛 , 𝑡 1 ) 2 𝑡 𝑚 } denotes the subset of the data pertaining to node 𝑋 𝑛 and its parent set 𝜋 𝑛 . For the Gaussian BGe model the (local) factors Ψ ( 𝐃 𝜋 𝑛 𝑛 ) in (3) can be computed in closed-form, according to (15) and (24) in Geiger and Heckerman [14]. For the discrete multinomial BDe model, the (local) factors can be computed in closed form according to (19) in [16].

2.2. Gaussian Mixture Bayesian Network Model

In the Gaussian BGe model, the local distributions 𝑃 ( 𝑋 𝑛 ( 𝑡 ) | 𝜋 𝑛 ( 𝑡 1 ) , 𝜽 𝑛 ) are conditional linear Gaussian distributions. We generalise the BGe model by the introduction of a latent allocation vector 𝐕 , which assigns the data points to 𝐾 different mixture components, where 𝐾 is inferred from the data by applying changepoint birth and death moves, along the lines of the changepoint model (e.g., see [6, 17]). As in the BGM model [4], conditional on the latent vector 𝐕 , a separate BGe score can be computed for each of the 𝐾 mixture components.

The allocation vector 𝐕 of size 𝑚 1 describes the allocation of the time points 𝑡 = 2 , , 𝑚 to the 𝐾 components. 𝐃 ( 𝑉 , 𝑘 ) denotes all time points that are allocated to component 𝑘 . The posterior probability of 𝒢 , 𝐕 , and 𝐾 is proportional to the joint distribution 𝑃 ( 𝒢 , 𝐕 , 𝐾 𝐃 ) = 𝑃 ( 𝒢 , 𝐕 , 𝐾 , 𝐃 ) 𝑃 ( 𝐃 ) 𝑃 ( 𝒢 , 𝐕 , 𝐾 , 𝐃 ) , ( 4 ) and the joint distribution can be factorised as follows: 𝑃 ( 𝒢 , 𝐕 , 𝐾 , 𝐃 ) = 𝑃 ( 𝐾 ) 𝑃 ( 𝐕 𝐾 ) 𝑃 ( 𝒢 ) 𝑃 ( 𝐃 𝒢 , 𝐕 , 𝐾 ) , ( 5 ) where 𝑃 ( 𝐃 𝒢 , 𝐕 , 𝐾 ) = 𝐾 𝑘 = 1 𝑃 𝐃 ( 𝐕 , 𝑘 ) = 𝒢 𝐾 𝑁 𝑘 = 1 𝑛 = 1 Ψ 𝐃 ( 𝐕 , 𝑘 ) , 𝜋 𝑛 𝑛 , ( 6 ) and 𝐃 ( 𝑉 , 𝑘 ) , 𝜋 𝑛 𝑛 = { ( 𝐃 𝑛 , 𝑡 , 𝐃 𝜋 𝑛 , 𝑡 1 ) 𝑡 { 2 , , 𝑚 } 𝐕 ( 𝑡 ) = 𝑘 } denotes the realisations of node 𝑋 𝑛 and its parent set 𝜋 𝑛 for those time points that have been allocated to the 𝑘 th component. It can be seen from these equations that 𝐕 acts as a filter which divides the data 𝐃 into 𝐾 different compartments 𝐃 ( 𝐕 , 𝑘 ) ( 𝑘 = 1 , , 𝐾 ), for which separate independent BGe scores can be computed in closed-form using (2) and (3). The BGM counterpart of (3) is given by: Ψ 𝐃 ( 𝐕 , 𝑘 ) , 𝜋 𝑛 𝑛 = 𝑡 𝐕 ( 𝑡 ) = 𝑘 𝑃 𝑋 𝑛 ( 𝑡 ) = 𝐃 𝑛 , 𝑡 𝜋 𝑛 ( 𝑡 1 ) = 𝐃 ( 𝜋 𝑛 , 𝑡 1 ) , 𝜽 𝑛 𝜽 × 𝑃 𝑛 𝜋 𝑛 𝑑 𝜽 𝑛 . ( 7 ) For instance, if we have 𝑚 = 1 1 time points and one changepoint between 𝑡 6 and 𝑡 7 so that 𝐕 assigns the time points 𝑡 2 , , 𝑡 6 to the first and the remaining time points 𝑡 7 , , 𝑡 1 1 to the second segment, then separate local BGe scores are computed for the data subsets 𝐃 ( 𝐕 , 1 ) , 𝜋 𝑛 𝑛 = { ( 𝐃 𝑛 , 𝑡 , 𝐃 𝜋 𝑛 , 𝑡 1 ) | 2 𝑡 6 } and 𝐃 ( 𝐕 , 2 ) , 𝜋 𝑛 𝑛 = { ( 𝐃 𝑛 , 𝑡 , 𝐃 𝜋 𝑛 , 𝑡 1 ) 7 𝑡 1 1 } , according to (15) and (24) in Geiger and Heckerman [14].

When a data compartment 𝐃 ( 𝐕 , 𝑘 ) is empty, then we set the factors Ψ ( 𝐃 ( 𝐕 , 𝑘 ) , 𝜋 𝑛 𝑛 ) equal to 1 ( 𝑛 = 1 , , 𝑁 ). For 𝑃 ( 𝒢 ) , we take a uniform distribution over all graph structures subject to a fan-in restriction of | 𝜋 𝑛 | . For 𝑃 ( 𝐾 ) , we take a truncated Poisson distribution with 𝜆 = 1 restricted to 1 𝐾 𝐾 as prior. We note that the MCMC inference scheme, which we will discuss in Section 2.3, does not sample 𝐕 directly, but is based on local modifications of 𝐕 based on changepoint birth, death, and reallocation moves. That is, different from the free allocation in the BGM model [4], we change the assignment of data points to components via a changepoint process (e.g., see [6, 17]). This reduces the complexity of the allocation space substantially and incorporates our prior knowledge that adjacent time points are likely to be assigned to the same component. We will refer to the new changepoint BGM model as the B G M 𝐷 model.

We identify 𝐾 with 𝐾 1 changepoints: 𝑏 1 , , 𝑏 𝐾 1 on the continuous interval [ 2 , 𝑚 ] , and for notational convenience we introduce the pseudo-changepoints 𝑏 0 = 2 and 𝑏 𝐾 = 𝑚 . The observation at time point 𝑡 is assigned to the 𝑘 th component, symbolically 𝐕 ( 𝑡 ) = 𝑘 , if 𝑏 𝑘 1 𝑡 < 𝑏 𝑘 . Following Green [17], we assume that the changepoints are distributed as the even-numbered order statistics of 𝐿 = 2 ( 𝐾 1 ) + 1 points 𝑢 1 , , 𝑢 𝐿 uniformly and independently distributed on the interval [ 2 , 𝑚 ] . The motivation for this prior, instead of taking 𝐾 1 uniformly distributed points, is to encourage a priori an equal spacing between the changepoints, that is, to discourage mixture components that contain only a short compartment of the time series. We note that the even-numbered order statistics prior on the changepoint locations induces a prior distribution on the allocation vector 𝐕 . Deriving a closed-form expression is involved but not required, as our MCMC inference scheme does not sample 𝐕 directly. The inference is based on local modifications of 𝐕 . In particular, we employ three different move types: (i) an additional component ( 𝐾 𝐾 + 1 ) can be generated by setting a new changepoint on the interval [ 2 , 𝑚 ] (changepoint birth move), (ii) the number of components can be decremented ( 𝐾 𝐾 1 ) by removing one of the existing changepoints (changepoint death move), and finally, (iii) the allocation vector 𝐕 can be changed without affecting the number of components 𝐾 by changing the position of one of the existing changepoints (changepoint reallocation move). For the acceptance probabilities of these local moves, only 𝑃 ( 𝐕 𝐾 ) ratios, which are straightforward to compute, are required.

2.3. Multinomial Mixture Bayesian Network Model

The discrete multinomial BDe model can be generalised in analogy to the continuous Gaussian Bge model. As before, the allocation vector 𝐕 divides the data into 𝐾 compartments, and each compartment of the data is modelled separately and independently with the multinomial BDe scoring metric. We note that closed-form solutions for the Ψ ( ) -terms in (7) can be obtained with (19) in [16]. We will refer to the free individual allocation model BGM with the BGe score being replaced by the discrete multinomial BDe score as the Bayesian Discrete Mixture (BDM) model, and accordingly we will refer to the discrete BDe counterpart of B G M 𝐷 as the B D M 𝐷 model. We note that the B D M 𝐷 model is similar to the 𝑛 𝑠 𝐵 𝐷 𝑒 model of Robinson and Hartemink [12], except that B D M 𝐷 leaves the inferred network structures invariant in time to allow for more information sharing among segments. We will employ the discrete counterparts BDM and B D M 𝐷 for an independent comparison of the suitability of the free allocation model and the changepoint process for inferring dynamic gene-regulatory processes (from discrete data).

2.4. MCMC Inference

We now describe an MCMC inference algorithm that can be used to obtain a sample { 𝒢 𝑖 , 𝐕 𝑖 , 𝐾 𝑖 } 𝑖 = 1 , , 𝐼 from the posterior distribution 𝑃 ( 𝒢 , 𝐕 , 𝐾 | 𝐃 ) . Our algorithm combines the structure MCMC algorithm for Bayesian networks [18] with the changepoint model (e.g., see [6, 17]), and draws on the fact that conditional on the allocation vector 𝐕 , separate BGe scores 𝑃 ( 𝐃 ( 𝐕 , 𝑘 ) | 𝒢 ) can be computed for the 𝐾 data compartments. Note that this approach is equivalent to the idea underlying the allocation sampler [5]. The resulting algorithm is effectively an RJMCMC sampling scheme in the discrete space of network structures and latent allocation vectors, where the Jacobian in the acceptance criterion is always 1 and can be omitted. With probability 𝑃 = . 5 , we perform a traditional structure MCMC move on the current graph 𝒢 𝑖 and leave the latent vector 𝐕 and the number of mixture components 𝐾 unchanged, symbolically: 𝐕 𝑖 + 1 = 𝐕 𝑖 and 𝐾 𝑖 + 1 = 𝐾 𝑖 . A new candidate graph 𝒢 𝑖 + 1 is randomly drawn out of the set of graphs 𝒩 ( 𝒢 𝑖 ) that can be reached from the current graph 𝒢 𝑖 by deletion or addition of one single edge. The proposed graph 𝒢 𝑖 + 1 is accepted with probability 𝐴 𝒢 𝑖 + 1 𝒢 𝑖 𝑃 = m i n 1 , 𝐃 𝒢 𝑖 + 1 , 𝐕 𝑖 , 𝐾 𝑖 𝑃 𝐃 𝒢 𝑖 , 𝐕 𝑖 , 𝐾 𝑖 𝑃 𝒢 𝑖 + 1 𝑃 𝒢 𝑖 | | 𝒩 𝒢 𝑖 | | | | 𝒩 𝒢 𝑖 + 1 | | , ( 8 ) where | | is the cardinality, and the marginal likelihood terms have been specified in (6). The graph is left unchanged, symbolically 𝒢 𝑖 + 1 = 𝒢 𝑖 , if the move is not accepted. We note that the network reconstruction will be based on the marginal posterior probabilities of the individual edges, which can be estimated for each edge from the MCMC sample 𝒢 1 , , 𝒢 𝐼 by the fraction of graphs in the sample that contain the edge of interest 𝐸 𝑖 , 𝑗 = 1 𝐼 𝐼 𝑘 = 1 𝐼 𝑖 , 𝑗 𝒢 𝑘 , ( 9 ) where 𝐼 𝑖 , 𝑗 ( ) is the indicator function with 𝐼 𝑖 , 𝑗 ( 𝒢 𝑘 ) = 1 if there is an edge from 𝑋 𝑖 to 𝑋 𝑗 in 𝒢 𝑘 .

With the complementary probability 1 𝑃 we leave the graph unchanged: 𝒢 𝑖 + 1 = 𝒢 𝑖 , and perform a move on ( 𝐕 𝑖 , 𝐾 𝑖 ) . We change the current number of components 𝐾 𝑖 via a changepoint birth or death move, or the allocation vector 𝐕 𝑖 by a changepoint reallocation move along the lines of the Reversible Jump Markov chain Monte Carlo algorithm (RJMCMC) algorithm [17].

The changepoint birth (death) move increases (decreases) 𝐾 𝑖 by 1 and may also have an effect on 𝐕 𝑖 . The changepoint reallocation move leaves 𝐾 𝑖 unchanged and may have an effect on 𝐕 𝑖 . If with probability ( 1 𝑃 ) a changepoint move on ( 𝐾 𝑖 , 𝐕 𝑖 ) is performed, we randomly draw the move type. Under fairly mild regularity conditions (ergodicity), the MCMC sampling scheme converges to the desired posterior distribution if the equation of detailed balance is fulfilled [17]. The condition of detailed balance implies that for each move a complementary move is defined, and that the acceptance probability depends on the proposal probability of the complementary move. The moves presented below are designed such that there is a unique complementary death move for each birth move and vice versa. Moreover, each reallocation move can be reversed by a single (complementary) reallocation move. The acceptance probabilities for these three changepoint moves ( 𝐾 𝑖 , 𝐕 𝑖 ) ( 𝐾 𝑖 + 1 , 𝐕 𝑖 + 1 ) are of the following form [17]: 𝑃 𝐴 = m i n 1 , 𝐃 𝒢 𝑖 , 𝐕 𝑖 + 1 , 𝐾 𝑖 + 1 𝑃 𝐃 𝒢 𝑖 , 𝐕 𝑖 , 𝐾 𝑖 × 𝑅 × 𝐵 , ( 1 0 ) where 𝑅 = 𝑃 ( 𝐕 𝑖 + 1 | K 𝑖 + 1 ) 𝑃 ( 𝐾 𝑖 + 1 ) / 𝑃 ( 𝐕 𝑖 | 𝐾 𝑖 ) 𝑃 ( 𝐾 𝑖 ) is the prior probability ratio, and 𝐵 is the inverse proposal probability ratio. The exact form of these factors depends on the move type.

(i) For a changepoint reallocation ( 𝑟 ), we randomly select one of the existing changepoints 𝑏 𝑗 { 𝑏 1 , , 𝑏 𝐾 1 } , and the replacement value 𝑏 𝑗 is drawn from a uniform distribution on [ 𝑏 𝑗 1 , 𝑏 𝑗 + 1 ] where 𝑏 0 = 2 and 𝑏 𝐾 = 𝑚 . Hence, the proposal probability ratio is one, the prior probabilities 𝑃 ( 𝐾 𝑖 + 1 ) = 𝑃 ( 𝐾 𝑖 ) cancel out, and the remaining prior probability ratio 𝑃 ( 𝐕 𝑖 + 1 | 𝐾 𝑖 + 1 ) / 𝑃 ( 𝐕 𝑖 | 𝐾 𝑖 ) can be obtained from page 720 in Green's RJMCMC paper [17] 𝑅 𝑟 = 𝑏 𝑗 + 1 𝑏 𝑗 𝑏 𝑗 𝑏 𝑗 1 𝑏 𝑗 + 1 𝑏 𝑗 𝑏 𝑗 𝑏 𝑗 1 , 𝐵 𝑟 = 1 . ( 1 1 ) If there is no changepoint ( 𝐾 𝑖 = 1 ) the move is rejected and the Markov chain is left unchanged.

(ii) If a changepoint birth move ( 𝑏 ) on 𝐾 𝑖 is proposed, the location of the new changepoint 𝑏 is randomly drawn from a uniform distribution on the interval [ 2 , 𝑚 ] ; the proposal probability for this move is 𝑏 𝐾 𝑖 / ( 𝑚 2 ) , where 𝑏 𝐾 𝑖 is the ( 𝐾 𝑖 -dependent) probability of selecting a birth move. The reverse death move, which is selected with probability 𝑑 ( 𝐾 𝑖 + 1 ) , consists in discarding randomly one of the ( 𝐾 𝑖 1 ) + 1 = 𝐾 𝑖 changepoints. The inverse proposal probability ratio is thus given by 𝐵 = 𝑑 ( 𝐾 𝑖 + 1 ) ( 𝑚 2 ) / ( 𝑏 𝐾 𝑖 𝐾 𝑖 ) . The prior probability ratio is given by the expression at the bottom of page 720 in Green's RJMCMC paper [17] slightly modified to allow for the fact that 𝐾 components correspond to 𝐾 1 changepoints, and we obtain 𝑅 𝑏 = 𝑃 𝐾 𝑖 + 1 𝑃 𝐾 𝑖 2 𝐾 𝑖 2 𝐾 𝑖 + 1 ( 𝑚 2 ) 2 𝑏 𝑗 + 1 𝑏 𝑏 𝑏 𝑗 𝑏 𝑗 + 1 𝑏 𝑗 , 𝐵 𝑏 = 𝑑 ( 𝐾 𝑖 + 1 ) ( 𝑚 2 ) 𝑏 𝐾 𝑖 𝐾 𝑖 . ( 1 2 ) For 𝐾 𝑖 = 𝐾 the birth of a new changepoint is invalid and the Markov chain is left unchanged.

We note that the proposal probabilities 𝑏 𝐾 and 𝑑 ( 𝐾 + 1 ) for birth and death moves can be chosen as follows: 𝑏 𝐾 = 𝑐 m i n 1 , 𝑃 ( 𝐾 + 1 ) , 𝑑 𝑃 ( 𝐾 ) ( 𝐾 + 1 ) = 𝑐 m i n 1 , 𝑃 ( 𝐾 ) , 𝑃 ( 𝐾 + 1 ) ( 1 3 ) where 𝑐 is a constant that can be chosen as large as possible subject to the constraint 𝑏 𝐾 + 𝑑 𝐾 0 . 9 for 𝐾 = 1 , , 𝐾 . This choice yields both a certain acceptance rate of the MCMC sampling scheme [17] and a simple prior probability (Hastings) ratio. From 𝑏 𝐾 𝑖 𝑃 ( 𝐾 𝑖 ) = 𝑑 ( 𝐾 𝑖 + 1 ) 𝑃 ( 𝐾 𝑖 + 1 ) it follows that the ratio 𝑑 ( 𝐾 𝑖 + 1 ) / 𝑏 𝐾 𝑖 in the expression 𝑅 𝑏 cancels out against the prior ratio 𝑃 ( 𝐾 𝑖 + 1 ) / 𝑃 ( 𝐾 𝑖 ) in the expression 𝐵 𝑏 , and the prior probability ratio simplifies to 𝑅 𝑏 𝐵 𝑏 = 2 2 𝐾 𝑖 + 1 𝑏 ( 𝑚 2 ) 𝑗 + 1 𝑏 𝑏 𝑏 𝑗 𝑏 𝑗 + 1 𝑏 𝑗 . ( 1 4 )

(iii) A changepoint death move ( 𝑑 ) is the reverse of the birth move, and we obtain 𝑅 𝑑 𝐵 𝑑 = ( 𝑚 2 ) 2 2 𝐾 𝑖 𝑏 1 𝑗 + 1 𝑏 𝑗 𝑏 𝑗 + 1 𝑏 𝑏 𝑏 𝑗 . ( 1 5 )

3. Data

We have evaluated the proposed B G M 𝐷 model on various synthetic data sets. For illustration purposes, we present results obtained for two studies with small networks where the nonlinearity was implemented by a sinusoidal transformation. In a second study, we focus on the two-gene expression time series from macrophages and Arabidopsis thaliana, which have been used for evaluating the BGM model. More details on the experimental settings can be found in the paper on BGM [4]. In a third study, we switch from the Gaussian BGe score to the discrete multinomial BDe score to infer discrete nonstationary time series from Drosophila melanogaster.

3.1. Small Synthetic Networks

The first synthetic network consists of two nodes 𝑋 and 𝑌 and possesses two edges. Node 𝑋 has a recurrent feedback loop, symbolically 𝑋 ( 𝑡 ) 𝑋 ( 𝑡 + 1 ) , and 𝑋 acts as a regulator of node 𝑌 , symbolically 𝑋 ( 𝑡 ) 𝑌 ( 𝑡 + 1 ) . We consider the scenario of a nonlinear regulatory influence that 𝑋 exerts on 𝑌 , whereby we implement the nonlinearity by a sinusoid transformation of 𝑋 ( 𝑡 ) 's signal on 𝑌 ( 𝑡 + 1 ) . The state-space equations are given by 𝑋 ( 𝑡 ) = 𝑋 ( 𝑡 1 ) + 𝑐 + 𝑐 𝑋 𝜀 𝑋 𝑌 ( 𝑡 ) , ( 𝑡 ) = s i n ( 𝑋 ( 𝑡 1 ) ) + 𝑐 𝑌 𝜀 𝑌 ( 𝑡 ) , ( 1 6 ) where 𝑐 , 𝑐 𝑋 , and 𝑐 𝑌 are constants, 𝜀 𝑋 ( . ) , 𝜀 𝑌 ( . ) are iid normally distributed random variables.

The second synthetic network is a generalisation of the two node domain where three nodes 𝑌 1 , 𝑌 2 , and 𝑌 3 are regulated by 𝑋 . The relationships are again realised by sinusoids, whereby we shift the periods: 𝑌 𝑖 ( 𝑡 ) = s i n ( 𝑋 ( 𝑡 1 ) + 𝜏 𝑖 𝜋 ) + 𝑐 𝑌 𝜀 𝑌 , 𝑖 ( 𝑡 ) with 𝜏 1 = 0 , 𝜏 2 = 2 / 3 , and 𝜏 3 = 4 / 3 . For both networks we set the drift term 𝑐 = 2 𝜋 / 4 0 to ensure that (on average) the complete period [ 0 , 2 𝜋 ] of the sinusoid is involved, and we generate 𝑚 = 4 1 observations for four different combinations of the coefficients 𝑐 𝑋 and 𝑐 𝑌 .

3.2. Bone Marrow Derived Macrophages

Interferons (IFNs) play a pivotal role in the innate and adaptive mammalian immune response against infection, and central research efforts therefore aim to elucidate their regulatory interactions [19]. For the present study, we have analysed gene expression time series from bone marrow-derived macrophages, which were sampled at 𝑚 = 2 5 × 3 0 minute time intervals. The macrophages were subjected to three external conditions: ( 1 ) infection with Cytomegalovirus (CMV), ( 2 ) treatment with Interferon Gamma (IFN 𝛾 ), and ( 3 ) infection with Cytomegalovirus after pretreatment with IFN 𝛾 (CMV+IFN 𝛾 ). Samples derived from the macrophages were hybridised to Agilent mouse genome arrays. We focus on 𝑁 = 3 Interferon-regulatory factors Irf1, Irf2, and Irf3. These factors are the key regulators in the response of the macrophage cell to pathogens. The macrophages data sets used in the study are publicly available from http://www.bioss.ac.uk/associates/marco/supplement/.

3.3. Circadian Regulation in Arabidopsis thaliana

We have also analysed two-gene expression time series from Arabidopsis thaliana cells, which were sampled at 𝑚 = 13 × 2 hour time intervals with Affymetrix microarray chips. The expressions were measured twice independently under experimentally generated constant light condition, but differed with respect to the prehistories. In the first experimental scenario, 𝑇 2 0 , the plants were entrained in a 10 h : 10 h light/dark-cycle, while the plants in the second experimental setting, 𝑇 2 8 , were entrained in 14 h : 14 h light/dark-cycle. The analysis focuses on 𝑁 = 9 genes, namely LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3, which are known to be involved in circadian regulation [20, 21]. The Arabidopsis data sets used in the study are publicly available from http://www.bioss.ac.uk/associates/marco/supplement/.

3.4. Muscle Development in Drosophila melanogaster

The gene expressions in Drosophila melanogaster cells were sampled at 𝑚 = 6 7 time-steps during four different morphological stages of life: embryonic, larval, pupal, and adult stages. Since these phases cover time periods of different lengths, gene expression profiles were collected at nonequidistant time-points. The true morphological transitions occur at time points 𝑡 3 2 (embryonic to larval), 𝑡 4 2 (larval to pupal), and 𝑡 6 0 (pupal to adult) [22]. Like other researchers [12], we focus our analysis on 𝑁 = 1 3 genes involved in muscle growth and muscle development: EVE, GFL, TWI, MLC1, SLS, MHC, PRM, ACTN, UP, MYO61F, and MSP300. The quantile-discretised binary data set of these 13 Drosophila genes is available from Robinson and Hartemink [12].

4. Simulations and Evaluation

We impose a fan-in restriction of size = 3 on the maximal number of parent nodes per node as done in related articles [2, 4]. For the Gaussian BGe model the hyperparameters of the normal-Wishart prior were chosen maximally uninformative subject to certain regularity constraints [14]. In the second study, the hyperparameters of the discrete multinomial BDe model were chosen as explained in Heckerman and Geiger [16], whereby the total prior precision parameter was set to 1.

We set 𝐾 = 1 0 , and the burn-in and the sampling-phase of MCMC runs were set to 500,000 iterations each, and we sampled every 1,000 iterations during the sampling-phase. For each data set, we started 5 independent MCMC simulations from different initialisations, and we computed the potential scale reduction factor P S R F [23] based on the marginal edge posterior probabilities. As we observed, a sufficient degree of convergence for all data sets ( P S R F < 1 . 2 ), we report only the results of the empty-seeded (graphs without any edges) MCMC runs. We note that each single MCMC simulation (even for the bigger domains Arabidopsis with 𝑁 = 9 genes and Drosophila with 𝑁 = 1 3 genes) was accomplished within a few hours using Matlab© code on a SunFire X4100M2 machine with MAD Opteron 2224 SE dual-core processor.

More generally, we note that the computational complexity in network structure space is 𝑁 𝑘 = 0 ( 𝑁 𝑘 ) 𝑂 ( 𝑁 + 1 ) , where 𝑁 is the number of nodes and is the maximum number of parent nodes. The complexity in changepoint configuration space is of order 𝑚 𝐾 , where 𝐾 is the number of changepoints, and 𝑚 is the number of time points. Hence, the problem is of polynomial complexity in 𝑁 and 𝑚 , and not exponential complexity (i.e., it is not NP-hard). Polynomial complexity—as opposed to exponential complexity—does not impose any principled restrictions on the network size. However, the practical feasibility will depend on varying factors, like the efficiency of the software implementation and the capacity of parallel clusters. It also depends on the information content of the data, as increasing the number of nodes with limited number of experimental replications will increase the intrinsic uncertainty of inference. The practical decision on how many nodes and how large a network to consider will therefore usually be based on some preliminary data analysis.

We note that we allowed for self-feedback loops for the synthetic data only. For the three real applications to biological systems, we ruled out self-feedback loops altogether to enable direct comparability with the results reported for the BGM model [4].

If two independent data sets 𝐃 and 𝐃 are available for a network domain, predictive probabilities of the models BGM and B G M 𝐷 can be estimated straightforwardly [4]. For example, having sampled { 𝒢 𝑖 , 𝐕 𝑖 , 𝐾 𝑖 } 𝑖 = 1 , , 𝐼 from the posterior distribution 𝑃 ( 𝒢 , 𝐕 , 𝐾 | 𝐃 ) via the MCMC inference scheme described above, the predictive probability of B G M 𝐷 can be estimated by 𝑃 𝐃 𝐃 , B G M 𝐷 = 1 𝐼 𝐼 𝐾 𝑖 = 1 𝑖 𝑘 = 1 Ψ 𝒢 𝑖 , 𝐃 ( 𝐕 𝑖 , 𝑘 ) 𝐃 ( 𝐕 𝑖 , 𝑘 ) , ( 1 7 ) where Ψ 𝒢 𝑖 , 𝐃 ( 𝐕 𝑖 , 𝑘 ) 𝐃 ( 𝐕 𝑖 , 𝑘 ) = 𝑁 𝑛 = 1 𝑡 𝐕 𝑖 ( 𝑡 ) = 𝑘 𝑃 𝑋 𝑛 ( 𝑡 ) = 𝐃 𝑛 , 𝑡 𝜋 𝑛 ( 𝑡 1 ) = 𝐃 ( 𝜋 𝑛 , 𝑡 1 ) , 𝜽 𝑛 𝜽 × 𝑃 𝑛 𝜋 𝑛 , 𝐃 ( 𝐕 𝑖 , 𝑘 ) 𝑑 𝜽 𝑛 . ( 1 8 ) That is, when computing the BGe score for the compartment 𝐷 ( 𝐕 𝑖 , 𝑘 ) with (6) and (7), the local prior distributions 𝑃 ( 𝜽 𝑛 | 𝜋 𝑛 ) in (7) are replaced by posterior distributions 𝑃 ( 𝜽 𝑛 | 𝜋 𝑛 , 𝐃 ( 𝐕 𝑖 , 𝑘 ) ) . This results in a straightforward modification of the BGe score as follows. In (13) in Geiger and Heckerman [14], those training data that have been allocated to component 𝑘 , symbolically 𝐃 ( 𝐕 , 𝑘 ) , are included in the conditioning part of the distribution, and the sufficient statistics are adjusted accordingly.

5. Results

5.1. Inference on Synthetic Data

For the synthetic sinusoid data, the true underlying network topologies are known so that the network reconstruction accuracy can be assessed via the area under the ROC (receiver operator characteristic) curve: AUC; this is a standard criterion that has been applied in numerous related articles (e.g., [24]). Figure 1 shows histograms of the average AUC scores obtained from the marginal edge posterior probabilities for the sinusoid networks with 𝑁 = 2 and 𝑁 = 4 nodes. Figure 1 is laid out as matrix, in which the rows and columns correspond to different 𝑐 𝑋 and 𝑐 𝑌 coefficients (noise levels). We note that an increase of 𝑐 𝑋 reduces the autocorrelation of 𝑋 , while increasing 𝑐 𝑌 blurs the functional dependence of 𝑌 ( 𝑡 + 1 ) on 𝑋 ( 𝑡 ) . The autocorrelation of 𝑌 is jointly influenced by both parameters. From the histograms it can be seen that the novel B G M 𝐷 model leads to a better network reconstruction accuracy in terms of average AUC values than the standard BGe model and the BGM model in the majority of scenarios. Further investigations showed that BGe and BGM yield lower AUC scores, since they tend to infer spurious self-feedback loops on node 𝑌 ( 𝑁 = 2 ) or on the nodes 𝑌 𝑖 ( 𝑁 = 4 ), respectively.

fig1
Figure 1: AUC histograms—Cross-method comparison on synthetic sine data. AUC scores for the synthetic sine network data with 𝑁 = 2 nodes (a) and 𝑁 = 4 nodes (b). The figure is laid out as a matrix, where rows and columns correspond to different noise levels 𝑐 𝑋 (rows) and 𝑐 𝑌 (columns). In each histogram, the white bar shows the average AUC score for the BGe model. The grey bar shows the average AUC score of the BGM model, and the black bar shows the AUC score for the proposed B G M 𝐷 model. Each histogram shows averages and standard deviations obtained from 50 independent data instantiations.

This trend can be visualised by histograms of the average edge posterior probabilities. As an example, Figure 2 shows the average marginal edge posterior probabilities of the four possible edges for the 𝑁 = 2 nodes sinusoid network with 𝑐 𝑋 = 0 . 5 and 𝑐 𝑌 = 0 . 5 . Consistently, all three models under comparison assign the highest posterior probability to the true self loop 𝑋 𝑋 and the lowest posterior probability to the false edge 𝑌 𝑋 . But BGe and BGM favour the spurious feed-back loop 𝑌 𝑌 over the true edge 𝑋 𝑌 while the proposed B G M 𝐷 suppresses the false self-feedback loop and assigns a higher edge posterior probability to the true edge 𝑋 𝑌 . This shows that B G M 𝐷 yields a higher network reconstruction accuracy (see Figure 1), as it is less susceptible to inferring spurious self-feedback loops (see Figure 2).

fig2
Figure 2: Edge Posterior Probabilities—Cross-method comparison on synthetic sine data. The figure shows three histograms of the inferred marginal edge posterior probabilities in the sinusoid network with 𝑁 = 2 nodes and 𝑐 𝑋 = 0 . 5 and 𝑐 𝑌 = 0 . 5 as obtained with BGe (a), BGM (b), and B G M 𝐷 (c). In each histogram, the four bars represent the four possible edges: Left: self-loop 𝑋 𝑋 (true); centre left: 𝑋 𝑌 (true); centre right: self-loop 𝑌 𝑌 (false); right: 𝑌 𝑋 (false). Each bar shows the average marginal posterior probability, averaged over 50 independent data instantiations. It is seen that BGe and BGM have a high propensity for learning the spurious feedback loop 𝑌 𝑌 (centre right white bars). B G M 𝐷 (right histogram) assigns a higher posterior probability to the correct edge 𝑋 𝑌 (centre left black bar) and suppresses the spurious feedback loop 𝑌 𝑌 (centre right white bar)
5.2. Inference on Macrophages Data

For the macrophages data the BGM model inferred a biologically plausible state change in the host macrophage brought about by infection (CMV) or immune activation (IFN 𝛾 ), and a less pronounced state change in the combined condition CMV+IFN 𝛾 [4]. We compare these findings with results obtained with the novel B G M 𝐷 model. The fraction of sampled states for which two time points 𝑡 𝑖 and 𝑡 𝑗 are allocated to the same component 𝑘 ( 1 𝑘 𝐾 ) can be used as a connectivity measure 𝐶 ( 𝑡 𝑖 , 𝑡 𝑗 ) , and the resulting temporal connectivity structures are displayed graphically as heat maps in Figure 3. All six heatmaps in Figure 3 reflect the two-stage nature of the gene-regulatory processes in the host macrophages: the first part (time points ( 𝑡 2 , , 𝑡 6 ) ) and the last part of the three time series (time points ( 𝑡 7 , , 𝑡 2 5 ) ) are allocated to different components. For all three conditions, a stronger separation between the two regulatory states is inferred by the B G M 𝐷 model (see Figures 3(d), 3(e), and 3(f)). It appears that the B G M 𝐷 inference results are more consistent, as even for the combined condition (CMV + IFN 𝛾 ) a clear trend towards a dichotomous regulatory process can be found (see Figure 3(d)). This finding (stronger separation) is consistent with our conjecture that the novel B G M 𝐷 assigns neighbouring time-points to the same compartment more likely a priori. Interestingly, the BGM inference outlier at time point 𝑡 9 in Figure 3(b) yields a certain trend for a subdivision of the second compartment ( 𝑡 7 , , 𝑡 2 5 ) by the B G M 𝐷 model. Instead of one outlying time point two substages ( 𝑡 7 , , 𝑡 1 0 ) and ( 𝑡 1 1 , , 𝑡 2 5 ) are inferred (see Figure 3(e)). To provide statistical evidence that the new B G M 𝐷 model does not overfit the data, we compute predictive probabilities for the B G M 𝐷 model and compare them with those reported for the BGM model [4].

fig3
Figure 3: Heat maps—macrophages data. Graphical heat map presentation of the temporal connectivity structure for the macrophage gene expression time series. (a), (b), and (c): Heat matrices for experiments CMV (a), IFNG 𝛾 (b), and CMV+IFN 𝛾 (c) inferred with the BGM model. (d), (e), and (f): Heat matrices for experiments CMV (d), IFNG 𝛾 (e), and CMV+IFN 𝛾 (f) inferred with the novel B G M 𝐷 model. Each heat map indicates the estimated posterior probability of two time points being assigned to the same compartment (mixture component). The probabilities are represented by a grey shading, where white corresponds to a probability of 1 , and black corresponds to a probability of 0 . The numbers on the axes represent the time points of the time course experiment.

To this end, we treat the three experiments as independent replications, and check via predictive probabilities whether the superiority of the proposed B G M 𝐷 model can be confirmed statistically. Table 1 gives the predictive probabilities reported in the BGM paper [4] and those obtained with the proposed B G M 𝐷 model. As the predictive probabilities for B G M 𝐷 are systematically better than those of BGM, we conclude that the B G M 𝐷 model yields more stable inference results, that is, a better generalisation performance.

tab1
Table 1: Predictive Probabilities: Macrophages data. Logarithmic predictive probabilities for the macrophage data: l o g 𝑒 ( 𝑃 ( 𝐃 | 𝐃 ) ) for BGe, BGM (as reported earlier [4]) and the new B G M 𝐷 model. The standard deviations of the logarithmic probabilities are given in brackets. We note that the BGM values could be confirmed by our reanalysis of the data: the deviations were smaller than one standard deviation.
5.3. Inference on Arabidopsis thaliana Data

For the Arabidopsis thaliana data, the BGM model also inferred a biologically plausible two-stage process [4]. In this application, the two stages are likely to be related to the diurnal nature of the dark-light cycle influencing the circadian genes. The plants were subjected to different prehistories, related to different lengths of the artificial, experimentally controlled light-dark cycle. The plants in experimental scenario 𝑇 2 8 were entrained in an increased day length of 14 hours light followed by 14 hours darkness and in experiment 𝑇 2 0 the plants were entrained in a decreased day length of 10 hours light followed by 10 hours darkness. As an effect of these two entrainments, a phase shift in the gene-regulatory processes between these two experiments was expected [4]. The BGM model inferred a certain trend for a phase shift of the changepoint (subjective day to subjective night) of about 4–6 hours as a consequence of the increased day length. The heat maps in Figures 4(a) and 4(b) show that the connected blocks (compartments) of the time series are shifted along the diagonal by 2-3 time-points (4–6 hours). The B G M 𝐷 model infers the same trend but with a stronger separation score between these compartments (see Figures 4(c) and 4(d)). We note that the B G M 𝐷 model is based on changepoints so that compartments once left cannot be revisited. That is, while the BGM model tends to allocate the first time points ( 𝑡 2 , 𝑡 3 ) and the last time points ( 𝑡 9 , , 𝑡 1 3 ) in experiment T 2 8 to one single component (light grey shading in the top right and bottom left area of the heat map in the top centre panel of Figure 4), the B G M 𝐷 model has to allocate the last time points ( 𝑡 9 , , 𝑡 1 3 ) to an additional third component, as the first compartment ( 𝑡 2 , 𝑡 3 ) cannot be reused after the transition to the second compartment ( 𝑡 4 , , 𝑡 8 ) .

fig4
Figure 4: Heat maps. Arabidopsis data. Graphical heat map representations of the temporal connectivity structures for the Arabidopsis thaliana data. (a) and (b): heat matrices for experiments 𝑇 2 0 (a) and 𝑇 2 8 (b) inferred with the BGM model. (c) and (d): heat matrices for experiments 𝑇 2 0 (c) and 𝑇 2 8 (d) inferred with the novel B G M 𝐷 model. Each heat map indicates the posterior probability of two time points being assigned to the same compartment (mixture component). The probabilities are represented by a grey shading, where white corresponds to a probability of 1 , and black corresponds to a probability of 0 . The numbers on the axes represent the time points of the time course experiment.

As for the macrophages data, predictive probabilities can be computed by treating the two experiments 𝑇 2 0 and 𝑇 2 8 as independent replications. Table 2 gives the predictive probabilities reported for the BGM model [4] and those obtained with the new B G M 𝐷 model. In consistency with the results for the macrophages data, the resulting two predictive probabilities for B G M 𝐷 are better than those of the BGM model. A scatter plot 𝐸 𝑖 , 𝑗 ( 𝑇 2 0 ) versus 𝐸 𝑖 , 𝑗 ( 𝑇 2 8 ) of the inferred (marginal) posterior probabilities of the individual edges for the B G M 𝐷 model inference is given in Figure 5. As the individual edge posterior probabilities 𝐸 𝑖 , 𝑗 do not differ drastically, we extract a network structure from the averaged probabilities: 𝐸 𝑖 , 𝑗 𝐸 = ( 𝑖 , 𝑗 ( 𝑇 2 0 𝐸 ) + 𝑖 , 𝑗 ( 𝑇 2 8 ) ) / 2 .

tab2
Table 2: Predictive probabilities: Arabidopsis thaliana data. Logarithmic predictive probabilities for the Arabidopsis thaliana data: l o g 𝑒 ( 𝑃 ( 𝐃 | 𝐃 ) ) for BGe, BGM (as earlier reported [4]) and the new B G M 𝐷 model. The standard deviations of the logarithmic probabilities are given in brackets. We note that the BGM values could be confirmed by our reanalysis of the data: the deviations were smaller than one standard deviation.
749848.fig.005
Figure 5: B G M 𝐷 scatter plot. Arabidopsis data. Scatter plot of the marginal edge posterior probabilities inferred with the proposed B G M 𝐷 model. In the plot the marginal edge posterior probabilities for time series 𝑇 2 0 : 𝐸 𝑖 , 𝑗 ( 𝑇 2 0 ) (horizontal axis) are plotted versus the marginal edge posterior probabilities for time series 𝑇 2 8 : 𝐸 𝑖 , 𝑗 ( 𝑇 2 8 ) (vertical axis).

Figure 6 shows the gene interaction network that is predicted when keeping all edges with marginal posterior probability above 0.5. There are two groups of genes. Empty circles in the figure represent morning genes (i.e., genes whose expression peaks in the morning), shaded circles represent evening genes (i.e., genes whose expression peaks in the evening). There are several directed edges pointing from the group of morning genes to the evening genes such that each evening gene is regulated by at least one morning gene. Moreover, the two genes LHY and CCA1 seem to play a central role. This result is consistent with biological findings, where the morning genes were found to activate the evening genes, with LHY and CCA1 being central regulators [25]. Our reconstructed network also contains edges pointing into the opposite direction, from the evening genes back to the morning genes. This finding is consistent with biological observations [25], where the evening genes were found to inhibit the morning genes via negative feedback loops. In the reconstructed network, there are 9 edges (drawn in black) originating from the four morning genes while only 7 edges (drawn in grey) originate from the group of five evening genes. Biologically, this means that the activity of the morning genes is stronger than the activity of the group of evening genes and that the regulatory mechanisms are dominated by the morning genes in the network topology. This finding is consistent with the fact that following the light-dark cycle entrainment, the experiments were carried out in constant-light condition, resulting in a higher activity of the morning genes overall. Within the group of evening genes, the reconstructed network contains an edge between GI and TOC1. This interaction has been confirmed independently [26]. Hence, while a proper evaluation of the reconstruction accuracy is currently unfeasible—like many related studies, we lack a gold-standard owing to the unknown nature of the true interaction network—our study suggests that the essential features of the reconstructed network are biologically plausible and consistent with the literature.

749848.fig.006
Figure 6: Predicted regulatory network of nine circadian genes in Arabidopsis thaliana. From the averaged marginal edge posterior probabilities (average of 𝐸 𝑖 , 𝑗 ( 𝑇 2 0 ) and 𝐸 𝑖 , 𝑗 ( 𝑇 2 8 ) ) of the proposed B G M 𝐷 model inference results a regulatory network can be extracted. Empty circles represent morning genes. Shaded circles represent evening genes. Edges indicate predicted interactions with an inferred marginal posterior probability greater than 0.5. Edges are black (grey) if they refer to a morning (evening) gene as regulator.
5.4. Inference on Drosophila melanogaster Data

For an independent comparison of the free allocation model and the changepoint model, we carried out an analysis similar to Robinson and Hartemink [12] on the binary Drosophila muscle development gene expression time series. This time series can be analysed with the discrete counterparts BDM and B D M 𝐷 of our Gaussian Mixture models; see Section 2.3 for details. The graphical heat map representations in Figure 7 show that the BDM model does not infer the morphological stages of Drosophila melanogaster. Almost all time-points are strongly connected (white shading) and no separated blocks of connected time points have been inferred. Only a few time-points are allocated (as outliers) to other mixture components (black vertical and horizontal lines).

fig7
Figure 7: Heat maps for the discrete Drosophila data. Graphical heat map representations of the temporal connectivity structures for the Drosophila melanogaster data. (a) Heat matrix inferred with the BDM model. (b) Heat matrix inferred with the novel changepoint variant B D M 𝐷 . Each heat map indicates the posterior probability of two time points being assigned to the same compartment (mixture component). The probabilities are represented by a grey shading, where white corresponds to a probability of 1 , and black corresponds to a probability of 0 . The numbers on the axes represent the time points of the time course experiment. Both axes have been ticked at the three real morphological stage transitions: embryonic to larval ( 𝑡 3 1 𝑡 3 2 ), larval to pupal ( 𝑡 4 1 𝑡 4 2 ), and pupal to adult ( 𝑡 5 9 𝑡 6 0 ).

The novel B D M 𝐷 infers a clear block structure with different separated compartments. The connectivity structure corresponds well to the first two morphological transitions (i) embryonic to larval ( 𝑡 3 1 𝑡 3 2 ) and (ii) larval to pupal ( 𝑡 4 1 𝑡 4 2 ), whereby the separation between the embryonic and the larval stage is less pronounced (grey shading) than the separation between the larval and the pupal stage (black shading). The exact third morphological transition pupal to adult ( 𝑡 5 9 𝑡 6 0 ) has not been inferred but it can be seen that two changepoints occur during the pupal stage before the third transition to the adult stage. Figure 8 shows a graphical presentation of the changepoint location posterior probabilities inferred with B D M 𝐷 and the same trends become obvious: the first two stage transitions have been inferred correctly while two further changepoints occur before the morphological stage transition from pupal to adult. We note that Robinson and Hartemink did find the same trends with their 𝑛 𝑠 𝐵 𝐷 𝑒 model and they conclude that the third premature transition in the gene regulatory process can be explained biologically, since the gene expression program governing the transition from pupal to adult morphology should be active well before the time of the real morphological transition [12].

749848.fig.008
Figure 8: B D M 𝐷 transition posterior probabilities for Drosophila data. Graphical representation of the posterior probabilities of the changepoint locations inferred with the novel B D M 𝐷 . The transition posterior probabilities (vertical axis) are plotted against the time axis (horizontal axis). The time axis has been ticked at the three real morphological stage transitions: embryonic to larval ( 𝑡 3 1 𝑡 3 2 ), larval to pupal ( 𝑡 4 1 𝑡 4 2 ), and pupal to adult ( 𝑡 5 9 𝑡 6 0 ). We note that the BDM model is based on a free individual allocation of time points so that a transition posterior probability plot cannot be interpreted properly.

6. Discussion

Our empirical results have shown that the proposed B G M 𝐷 model performs consistently better than the BGM model. A possible explanation could be related to the latent space complexity of the models. For two components, we have 𝑚 2 changepoint locations, but 2 𝑚 1 free allocations. This difference in latent space complexity gets aggravated for more components. The difference in performance between B G M 𝐷 and BGM could therefore be a consequence of the different degrees of convergence of the MCMC simulations. However, our convergence diagnostics based on potential scale reduction factors [23] did not indicate any significant difference in the convergence. It therefore appears that the higher latent space complexity of the BGM model does not adversely affect the MCMC convergence for gene expression time series of the length investigated in our study. This suggests that another explanation for the better performance of B G M 𝐷 over BGM has to be found.

We will discuss that the performance difference is most likely a consequence of the different prior probabilities intrinsic to the models, which determine the factor 𝑅 in (10). Since both models BGM and B G M 𝐷 employ the same Poisson distribution with parameter 𝜆 = 1 truncated to the interval [ 1 , 𝐾 ] as prior for the number of mixture components, the difference between the two models BGM and B G M 𝐷 is imposed by the prior distribution of the allocation vector 𝑃 ( 𝐕 | 𝐾 ) . While the BGM model is based on a free allocation, the B G M 𝐷 model takes the time structure into account and employs a changepoint process. In this section we describe the differences between the two models in detail by three theoretical examinations, and these examinations reveal trends that appear to be immediately related to some of our empirical findings.

In the first study, we compare the (temporal) connectivity structures that are introduced by the prior 𝑃 ( 𝐕 | 𝐾 ) . To this end, we infer the prior distribution 𝑃 ( 𝐾 ) 𝑃 ( 𝐕 | 𝐾 ) of both models BGM and B G M 𝐷 via MCMC simulations that are purely prior-driven. That is, we employ an empty data set (without any data points) so that all acceptance probabilities depend on the prior probability ratios only, as 𝑃 ( 𝐃 = | ( 𝒢 , 𝐕 , 𝐾 ) ) = 1 for all combinations of 𝒢 , 𝐕 , and 𝐾 . Note that sampling network structures via edge operation moves on the graph 𝒢 becomes obsolete, because (i) the graphs do not have any effect on the likelihoods for an empty data set, and (ii) the graph priors cancel out in the prior probability ratio if a uniform graph prior 𝑃 ( 𝒢 ) = c o n s t is used. We set 𝑚 = 2 6 ; note that in this theoretical consideration based on empty data, 𝑚 only determines the length of the allocation vector.

After running 10 independent MCMC simulations with 𝑚 = 2 6 to infer the prior distribution 𝑃 ( 𝐾 ) 𝑃 ( 𝐕 | 𝐾 ) , we can compute the average prior connectivity strengths from the sampled allocation vectors. As before, the fraction of sampled allocation vectors for which two time points 𝑡 𝑖 and 𝑡 𝑗 are allocated to the same component 𝑘 ( 1 𝑘 𝐾 ) can be used as a connectivity measure 𝐶 ( 𝑡 𝑖 , 𝑡 𝑗 ) . Figure 9 shows heat maps of the inferred prior connectivity structure for the free allocation BGM model and the proposed changepoint process B G M 𝐷 model. The heat maps confirm our earlier conjecture that the proposed B G M 𝐷 model, which takes the time structure of the data into account, allocates neighbouring time points to the same compartment more likely a priori than the BGM model. More precisely, it can be seen from Figure 9(a) that the prior connectivity strength 𝐶 ( 𝑡 𝑖 , 𝑡 𝑗 ) is the same for all 𝑡 𝑖 and 𝑡 𝑗 with 𝑡 𝑖 𝑡 𝑗 in the BGM model. On the contrary, for the B G M 𝐷 model (Figure 9(b)), the connectivity strength 𝐶 ( 𝑡 𝑖 , 𝑡 𝑗 ) decreases with the temporal distance between 𝑡 𝑖 and 𝑡 𝑗 : for three time points 𝑡 𝑖 , 𝑡 𝑗 and 𝑡 𝑘 with 𝑡 𝑖 < 𝑡 𝑗 < 𝑡 𝑘 we have 𝐶 ( 𝑡 𝑖 , 𝑡 𝑗 ) < 𝐶 ( 𝑡 𝑖 , 𝑡 𝑘 ) . This finding explains wh