Abstract
In this paper we investigate Monte Carlo methods for the approximation of the posterior probability distributions in stochastic kinetic models (SKMs). SKMs are multivariate Markov jump processes that model the interactions among species in biological systems according to a set of usually unknown parameters. The tracking of the species populations together with the estimation of the interaction parameters is a Bayesian inference problem for which Markov chain Monte Carlo (MCMC) methods have been a typical computational tool. Specifically, the particle MCMC (pMCMC) method has been shown to be effective, while computationally demanding method applicable to this problem. Recently, it has been shown that an alternative approach to Bayesian computation, namely, the class of adaptive importance samplers, may be more efficient than classical MCMClike schemes, at least for certain applications. For example, the nonlinear population Monte Carlo (NPMC) algorithm has yielded promising results with a low dimensional SKM (the classical predatorprey model). In this paper we explore the application of both pMCMC and NPMC to analyze complex autoregulatory feedback networks modelled by SKMs. We demonstrate numerically how the populations of the relevant species in the network can be tracked and their interaction rates estimated, even in scenarios with partial observations. NPMC schemes attain an appealing tradeoff between accuracy and computational cost that can make them advantageous in many practical applications.
1. Introduction
Stochastic kinetic models (SKMs) are multivariate systems that model interactions among species in ecological, biological, and chemical problems, according to a set of unknown rate parameters [1]. The aim of this paper is to study computational methods for the approximation of the posterior distribution of the rate parameters and the populations of all species, provided a set of discrete, noisy observations is available. This inference problem has been traditionally addressed in a Bayesian framework using Markov chain Monte Carlo (MCMC) schemes [1–4]. In [5], a particle MCMC (pMCMC) method [6] has been applied to LotkaVolterra and prokaryotic autoregulatory models. The pMCMC technique relies on a sequential Monte Carlo (SMC) approximation of the posterior distribution of the populations to compute the MetropolisHastings (MH) acceptance ratio.
However, MCMC methods in general and pMCMC in particular suffer from a number of problems. The convergence of the Markov chain is hard to assess and the final set of samples presents correlations which can greatly reduce its efficiency. Besides, MCMC methods do not (easily) allow for parallel implementation and turn out to be computationally intensive, even for relatively simple models. A classical technique to reduce the complexity of the existing MCMC methods when applied to SKMs is the socalled diffusion approximation of the underlying stochastic jump process [7]. However, the diffusion approximation breaks down in lowconcentration scenarios [5]. Other techniques, including linear noise approximations [8–11] and Gaussianprocess estimators of the likelihood [12], have also been proposed in the past few years to reduce the computational cost. The parameters of the MCMC proposal are also hard to choose and they determine the performance of the algorithm.
An appealing alternative to MCMC methods is the class of the population Monte Carlo (PMC) algorithms [13]. PMC is an adaptive importance sampling (IS) scheme [14] that yields a discrete approximation of a target probability distribution. The PMC algorithm has important advantages with respect to MCMC techniques. It provides independent samples and asymptotically unbiased estimates at all iterations, which avoids the need of a convergence period. Additionally, PMC may be easily parallelized.
On the other hand, the main weakness of IS and PMC is their low efficiency in high dimensional problems, due to the wellknown degeneracy problem [15]. The recently proposed nonlinear PMC (NPMC) scheme [16, 17] mitigates this difficulty by computing nonlinear transformations of the importance weights (IWs), in order to smooth their variations and avoid degeneracy. In [17] a rigorous proof of convergence of the nonlinear importance sampling scheme as the number of Monte Carlo samples increases is presented. Similarly to the pMCMC method in [5], the NPMC algorithm resorts to an SMC approximation of the posterior distribution of the populations to compute the IWs.
In this paper we apply the NPMC method to the estimation of both the parameters and the unobserved populations in a complex SKM modelling an autoregulatory feedback network with several species and noisy observations. We present numerical results to compare the performance of the stateoftheart pMCMC and the proposed NPMC in two scenarios of different dimension and with two different observation models. We show that the NPMC method outperforms the pMCMC method for the same computational cost.
The rest of the paper is organized as follows. In Section 2 we present an introduction to the basics of SKMs and the usual solutions to this Bayesian inference problem. In Sections 3 and 4 we describe the pMCMC and NPMC methods, respectively, when applied to the approximation of posterior distributions in SKMs. In Section 5 we numerically compare the performance of pMCMC and NPMC schemes when applied to an autoregulatory feedback network, an SKM that can be interpreted as a generalization of the paradigmatic LotkaVolterra model. Finally, Section 6 is devoted to the conclusions.
2. Bayesian Inference for Stochastic Kinetic Models
2.1. Stochastic Kinetic Models
A SKM is a multivariate continuoustime jump process modelling the interactions among different species or molecules in ecological, biological, and chemical systems [1].
Consider an ecological interaction network that describes the time evolution of the population of species related by means of rate equations where , are the random constant rate parameters and and , , , denote the coefficients of the population of the th species, before and after the th interaction, respectively. We will refer to and as reactant and product coefficients, respectively, by analogy with terminology commonly used in biochemical reactions. A matrix of size contains the reactant coefficients and, similarly, contains the product coefficients . The stoichiometry matrix of size is defined as . The vector contains the rate parameters.
Let , , denote the nonnegative, integer population of species at time , and let denote the state of the system at this time instant. Let denote the state of the system at discrete time instants , , i.e., where denotes a timediscretization period. We denote by the vector containing the population of all species at consecutive discrete time instants, i.e., .
The th rate equation describes an interaction that takes place stochastically according to its instantaneous rate or hazard function where the product of binomial coefficients represents the number of combinations in which the th interaction can occur, as a function of the population of the reactant species with population , . We additionally define the vector . The waiting time to the next interaction is exponentially distributed with parameter , and the probability of each type of interaction is given by .
See, e.g., [1] for a detailed description and discussion of the class of SKMs.
2.2. Bayesian Inference for SKMs
We consider the logtransformed rate parameters , where , , with prior pdf . The prior pdf (for simplicity of notation, in this section we use to denote the pdfs in the model. We write conditional pdfs as , and joint densities as This is an argumentwise notation, hence denotes the distribution of possibly different from of the initial population vector is denoted by . We assume that a linear combination of the populations of a subset of species is observed at discrete time instants corrupted by Gaussian noise, i.e.,where is the observation matrix with dimensions and is a multivariate Gaussian noise component with zero mean and covariance matrix . We denote the complete observation vector with dimension DN × 1 as .
The dynamical behavior of an arbitrary SKM may be described in terms of the following set of equations (that describe a statespace model with unknown parameters): where and denote the transition pdf and the likelihood function, respectively. The Gillespie algorithm [18] (see Appendix A for a detailed description) allows to perform exact forward simulations of arbitrary SKMs, drawing samples from the transition densities , , given a set of lograte parameters and an initial population . While the assumption Gaussian observational noise may not be adequate for some applications of SKMs, it has been employed by other authors in the past [5, 11] and, in particular, it has been used for computer experiments involving the prokaryotic autoregulation model [5] that we study numerically in Section 5.
In this paper, we aim at obtaining a Monte Carlo approximation of the full joint posterior distribution of the lograte parameters and the populations , with densitygiven the prior distributions and , the transition pdf , and the likelihood function constructed from (3). We note that this statespace model can represent the SKM reaction equations of Section 2.1, yet it is abstract enough to make it applicable to other, relatively similar, models. For example, the assumption of mass action kinetics is not explicitly needed for this representation; hence other reaction rates could be incorporated into the statespace model (and the inference algorithms to be derived from it).
We are also interested in computing approximations of the posterior marginals of the rate parameters and the species populations as well as their moments (e.g., the posterior mean), which are of the form where is a real, integrable function.
Bayesian inference based on exact stochastic simulations from generated via the Gillespie algorithm often becomes practically intractable even for models of modest complexity [7]. Thus, it is very common to resort to a continuous approximation of the underlying stochastic process, which is known as the diffusion approximation [1]. This approximation is known to be poor in lowconcentration scenarios and thus should be avoided for models involving species with a very low population. Alternatively, in [3] the authors propose a solution based on a moment closure approximation of the stochastic process. Other techniques for the approximate modelling of the underlying jump process that have become relatively popular in the past few years include Gaussian processes [12] and the linear noise approximation [8–10].
This inference problem has been traditionally addressed using MCMC methods, and IS based schemes have been avoided due to their inefficiency in high dimensional spaces [1]. In [2] various MCMC algorithms are evaluated in datapoor scenarios. In [5] a likelihoodfree pMCMC scheme [6] is applied to this problem. This method is, to the best of our knowledge, the most powerful, yet computationally expensive, method provided so far for this kind of applications.
In [16] a NPMC scheme is proposed for the approximation of the marginal posterior pdf . The performance of the NPMC method is tested in a simple SKM known as predatorprey model [20], providing excellent results with a low computational cost.
In this paper we compare the performances of the pMCMC and the NPMC methods in the approximation of the full joint posterior pdf in (5), which allows performing Bayesian inference for the rate parameters and the full sample path , including unobserved components.
3. Particle MCMC for SKM’s
The particle marginal MetropolisHastings (PMMH) algorithm is a pMCMC method originally proposed in [6] for Monte Carlo sampling from the full posterior distribution . The PMMH scheme suggests a proposal mechanism of the form . To be specific, a new candidate in the parameter space, , is drawn from an arbitrary proposal distribution , while the new candidate in the variable space, , is generated using an approximation of the posterior marginal constructed by means of an SMC algorithm (i.e., a particle filter) with particles and denoted . The probability of accepting the proposed pair is where is an unbiased approximation of the marginal likelihood of (i.e., ), computed, again, by way of a particle filter with particles. The PMMH algorithm is reproduced in Algorithm 1, and the SMC approximations of and are described in Appendix B. Full details can be found in [6]. Note that the forward simulation of the stochastic process in the particle filter may be performed exactly with the Gillespie algorithm or using a diffusion approximation.

In [5] the proposal is selected as a Gaussian random walk , ignoring the correlation structure of the target distribution. The variance has to be tuned and partly determines the performance of the algorithm (see [21] for some practical advice on the choice of this parameter). Low values of result in a poor exploration of the space of , while high values yield low acceptance rates. In both situations the resulting chain presents highly correlated samples and slow mixing properties. To reduce the correlation among samples it is common to thin the obtained chain, keeping a subset of equally spaced samples and discarding the rest. After removing the initial burnin samples and thinning the output, we obtain a Markov chain with correlated samples. Then, we may construct a sample approximation of the marginal posterior distributions of the parameters and the populations , as respectively, where and denote the unit delta measure centered at and , respectively. The approximation of the full joint posterior has the form
4. Nonlinear PMC for SKM’s
The PMC method [13] is an iterative IS scheme that generates a sequence of proposal pdf’s , , that approximate a target pdf along the iterations. The NPMC scheme is proposed in [16] and it introduces nonlinearly transformed IWs (TIWs) in order to mitigate the numerical problems caused by degeneracy in the proposal update scheme.
4.1. NPMC Targeting
We first consider as a target density the marginal posterior pdf of the parameters given the observation vector , i.e., . As in [16], we construct the proposal pdf , , as a Gaussian approximation of the target pdf obtained at the previous iteration , whose mean and covariance parameters are selected to match the moments of the previous sample set. The NPMC algorithm is displayed in Algorithm 2. Details and some simple convergence results can be found in [16]. See [17] for a rigorous analysis.

Similar to the pMCMC algorithm, in the NPMC implementation the densities and required in steps 2 and 3 are replaced by their SMC approximations, which are given in Appendix B. The NPMC method may also use either exact or approximate samples of the stochastic process, depending on the computational capabilities.
For the clipping procedure performed in step 4 we consider, at each iteration , a permutation of the indices in such that and choose a clipping parameter . We select a threshold value and clip the largest IWs, , . This transformation leads to flat TIWs in the region of interest of , allowing for a robust update of the proposal. The performance of the algorithm shows little sensitivity to the selection of the clipping parameter [16]. For simplicity, step 5 performs multinomial resampling.
At each iteration of the NPMC algorithm we may construct a discrete approximation of the posterior probability distribution described by the pdf , based on the set of samples and TIWs, as The choice of a Gaussian approximation of the proposal in step 6 is arbitrary. We have adopted it here for simplicity. Other families of pdfs can be used without modifying the rest of the algorithm.
4.2. NPMC Targeting
The NPMC method proposed in [16] may be readily applied to the approximation of the full joint posterior , in an manner equivalent to the pMCMC algorithm. We consider a sampling mechanism of the form , where samples are again generated from the latest proposal and are drawn from the SMC approximation obtained via particle filtering (the iteration index has been omitted for simplicity). Then, the standard, unnormalized IW associated with the pair is computed as and is independent of . This reveals that, when samples are drawn from , the algorithm yields a discrete approximation of the posterior distribution of the unobserved populations constructed as
Even though the proposed NPMC and the pMCMC require very similar computations for each pair of samples of and thus have an equivalent computational cost, the NPMC has a set of important advantages with respect to its MCMC counterpart. PMC methods provide independent sets of samples at all iterations and do not require a burnin period. On the other hand, the nonlinearity applied in the NPMC mitigates weight degeneracy, which is the main problem arising in conventional IS based methods, dramatically increasing its efficiency in high dimensional problems. As a consequence, we claim that the total number of samples, (and thus, the computational complexity), required by the NPMC can be significantly lower than that of pMCMC, . Additionally, contrary to pMCMC, which requires a careful choice of the proposal tuning parameter, the proposed method does not require the accurate fitting of any parameters.
The NPMC method processes a set of i.i.d. samples at each iteration, requiring a low number of iterations (around 10 for the type of problems addressed here) for convergence to the target distribution. The bulk of the computational cost of NPMC, as well as of pMCMC, is the SMC approximation of the likelihood function. In the pMCMC algorithm the samples are processed sequentially (one after the other), and this process cannot be parallelized. On the contrary, at each iteration of the NPMC method, the process of drawing samples and computing the associated IWs can be performed independently for each sample . Thus, steps 1, 2, and 3 can be easily parallelized, reducing the total execution time up to that of a single sample . On the other hand, steps 4, 5, and 6 require the complete set of samples and weights and must be performed in a centralized manner. However, these computations have a negligible cost in comparison with a single likelihood approximation. Thus, the parallelization of the NPMC method can allow for a reduction in execution time up to a factor . Note that we refer here to the parallelization of the PMC method and not of the SMC filter used to approximate the likelihood function.
An extensive numerical comparison of pMCMC versus NPMC for a complex SKM that can be interpreted as a generalization of the paradigmatic LotkaVolterra model is presented in Section 5.
5. Example: An Autoregulatory Feedback Network
In this section, we compare the performance of the pMCMC and the NPMC methods when applied to the problem of approximating the posterior distributions of the lograte parameters and the populations in a complex SKM given some observed data . This model is significantly more involved than the standard predatorprey system. It has been introduced in [7] and further analyzed in [1, 5] for biochemical processes. In particular, it has also been considered as a model for representing the mechanisms for autoregulation in prokaryotes. The model is minimal in terms of the level of details included and offers a simplistic view of the mechanisms involved in gene autoregulation. However, it contains many of the interesting features of general autoregulatory feedback networks and it does provide sufficient detail to capture the network dynamics.
5.1. Prokaryotic Autoregulatory Model
The prokaryotic autoregulatory model is a SKM that involves chemical species and reaction equations, , given by [7]
We construct the dimensional vector containing the population of all species at time instant as . Thus, we obtain a stoichiometry matrix of the form and the hazard vector is given bywhere the time dependance of the population of each species is omitted for notational simplicity.
This model involves a conservation law given by the relation , where is the number of copies of this gene in the genome. We could use this relation to remove from the model, replacing any occurrences of the latter in the hazard function with , but in this paper we abide by the notation in equation (15). Further details of this model can be found in [1].
5.2. Simulation Setup
We have selected most of the simulation parameters following [5]. The true vector of rate parameters which we aim to estimate has been set to which yields logtransformed rate parameters The initial populations and the conservation constant have been set to and , respectively. In all the simulations in this paper we have performed exact sampling from the stochastic model with the Gillespie algorithm to obtain the likelihood approximation via particle filtering. The number of particles for the SMC approximation has been set to for all the simulations (both the pMCMC and the NPMC algorithms converge (as and , respectively) even for fixed [6, 17]. We have tested numerically that for performance hardly improves, while running times obviously become larger).
Independent uniform priors have been taken for each . Opposite to [5], the initial populations are assumed unknown for the inference algorithm, which employs independent Poisson priors , with the parameters set to the true initial populations, that is, , . Following [5], the conservation constant is assumed to be known in the simulations.
We consider two different observation scenarios. In the complete observation (CO) scenario we assume that all species , , are observed at regular time intervals of length and corrupted by Gaussian noise with variance (assumed to be known). Thus, the observation matrix is of the form and the observations are given by In the CO case the complete vector of observations has dimension .
In the partial observation scenario (PO) only a linear combination of the proteins is observed, also contaminated by Gaussian noise, i.e., the observation matrix is given by (with dimension ) and the observations are generated as In the PO case, a vector of scalar observations with dimension is constructed as .
We remark that the assumption of the observation noise terms (either or ) displaying a Gaussian distribution is not needed to apply neither the pMCMC method nor the NPMC algorithm. Both techniques can be applied as long as the joint likelihood of the state and the parameters can be evaluated, numerically and pointwise (either exactly or approximately; see [22]). In particular, it is possible to plug a Gaussianprocess estimate of the likelihood [12] into the NPMC algorithm while running the rest of the algorithm in the same way as described in Section 4 (and still rely on the theoretical guarantees introduced in [22]).
5.3. Performance Evaluation
To evaluate the performance of the pMCMC and the NPMC methods we compute, in all the simulation runs, the mean square error (MSE) attained by the sample set that approximates the marginal posterior of , generated by both schemes.
For the pMCMC method, we compute the MSE of each parameter based on the size final output (after removing the burnin period and thinning), as
For the NPMC method, we compute the MSE associated with each parameter , , based on the unweighted sample set at the th iteration , as where is the th component of the mean vector and the variance term is the component of matrix .
We have chosen the MSE as a figure of merit because the observations are generated synthetically and, therefore, we have access to ground truth values for the rate parameters that can be used to compute MSEs. A direct comparison in terms of the posterior distributions (e.g., using the KullbackLeibler divergence or the total variation distance) is not possible because the posterior distribution of conditional on the observations is intractable.
However, the MSE cannot be computed in real problems, where the true parameters are unknown. To monitor the stability and the efficiency of the two sampling schemes based on the generated sample alone, we resort to the socalled normalized effective sample size (NESS), which is often defined differently for MCMC and IS schemes [23].
In the MCMC literature, the NESS gives the relative size of an i.i.d. (independent and identically distributed) sample with the same variance as the current sample and thus indicates the loss in efficiency due to the use of a Markov chain [23]. For pMCMC we compute the NESS from the final chain (after removing the burnin period and thinning) as where is the average autocorrelation function (ACF) at lag . For the computation of the NESS, we truncate when .
For IS methods, the NESS may be interpreted as the relative size of a sample generated from the target distribution with the same variance as the current sample. Even when high values of the NESS do not guarantee a low approximation error, the NESS is often used as an indicator of the numerical stability of the algorithm [24]. It cannot be evaluated exactly but we may compute an approximation of the NESS at each iteration of the NPMC scheme based on the set of TIWs as
In the next subsection we present, together with other numerical results, some comparisons involving the NESS for the PMCMC and PMC methods. The comparison should be taken with some caution because, although the interpretation of the NESS is the same for both techniques, the estimators have a different form and they are derived in a different way. Nevertheless, we believe that the NESS is a good numerical indicator of degeneracy for the two schemes, because it directly reflects poor mixing (high correlation) of the Markov chain and concentration of the weight in importance samplers, and we plot it with that purpose. See [25] for more details on the NESS and related statistics.
5.4. Simulation Results
We consider two simulation scenarios in which a different number of parameters is estimated.
5.4.1. Estimation of a Single Rate Parameter
In this section we present numerical results regarding the approximation of the posterior distribution of a single rate parameter and the populations , when the rest of parameters , are assumed known.
We compare the pMCMC and the NPMC methods in this simple scenario in order to show that the two algorithms perform almost equivalently in low dimensional problems. This is in a clear contrast with the results presented in Section 5.4.2, which show that the NPMC method can be more efficient in higher dimensional settings.
We have performed independent simulation runs of the pMCMC and the NPMC schemes in the CO and the PO scenarios, with independent population and observation vectors in each simulation . Both in the CO and the PO cases, the same true population trajectories were used, but the observations in the CO scenario, , and in the PO scenario, , differ. The number of observation times has been set to and exactly the same data has been used for NPMC and pMCMC.
Following [5], as a proposal pdf in the pMCMC scheme we consider a Gaussian random walk update with variance . We have performed simulations with different values and the best results were obtained with , which are presented here. A total number of iterations have been run in each simulation. A final sample of size has been obtained from each Markov chain by discarding a burnin period of samples and thinning the output by a factor of 9.
In the NPMC scheme, the number of iterations has been set to , the number of samples per iteration is and the clipping parameter is . In this way, the computational effort of the two methods is approximately the same, as they both generate samples in the space of .
In Figure 1 the final MSE obtained by the pMCMC (left) and the NPMC (right) algorithms for each simulation run is depicted versus the final NESS, in the CO and the PO scenarios. Note that the NESS is computed differently for pMCMC and NPMC. It can be observed that both algorithms perform similarly in this case, with an equivalent computational cost. Both algorithms attain on average lower MSE values in the CO scenario, as expected. However, the NESS also takes lower values in the CO case, which indicates a worse mixing of the Markov chains in the pMCMC algorithm and also higher degeneracy in the NPMC algorithm.
In Figure 2 the evolution of the NESS (left) and the MSE (right) along the iterations of the NPMC algorithm is represented, for the CO and the PO scenarios. It can be observed that both measures attain a steady value by the 5th iteration, both in the CO and the PO case, which suggests that actually less iterations are sufficient for this problem. Again, we observe that in the CO scenario both the NESS and the MSE reach lower values.
Figure 3 (left) plots the average ACF of the final pMCMC sample, after removing the burnin period and thinning the Markov chain by a factor of 9. Particularly high correlations are present in the CO case, leading to a poor NESS. Related to the ACF, the average sample acceptance probability in the pMCMC scheme in the PO scenario is 0.091, while in the CO scenario it is only 0.0034, which means that 910 samples are accepted out of in the CO case and only 34 in the CO case.
In Figure 3 (right) the final pdf estimates of the average simulation runs represented as big circles and crosses in Figure 1 are represented in the CO and the PO scenario, for the pMCMC and the NPMC schemes. For the pMCMC method we have built a Gaussian approximation of the posterior density based on the final MCMC sample . For the NPMC method, this approximation corresponds to the proposal pdf for the next iteration , i.e., , where the mean and variance terms and are computed as in It can be observed in Figure 3 (right) that very similar results are obtained by both algorithms in this scenario. The final MSE values obtained by the pMCMC and the NPMC methods, averaged over simulation runs, are shown in Table 1, together with the MSE corresponding to the prior distribution.
Figure 4 depicts the posterior mean of the populations, , obtained with pMCMC (left) as and with NPMC (right) as in the PO scenario. The results correspond to the particular simulation runs (different for pMCMC and NPMC) identified with big squares in Figure 1 and whose posterior approximations, , are shown in Figure 3 (right). It can be observed that, in the PO scenario, the trend of the population of all the species is reasonably identified, even though only a linear combination of the proteins is observed. In the CO scenario the populations of all species are accurately estimated and are not shown for conciseness. Note that the populations of all species are very low, which suggests that the diffusion approximation may perform poorly in this scenario.
The results presented in this section reveal a very similar performance of the two methods in this simple scenario, which suggests that the parameters of pMCMC have been properly tuned. Also in terms of computational complexity pMCMC and NPMC perform very similarly. The execution time per samples (one NPMC iteration and pMCMC iterations) for the pMCMC scheme is 312 seconds, while for NPMC it is 325 seconds, both in the CO and in the PO cases, on a 3GHz Intel Core 2 Duo CPU, with 4 GB of RAM. The stochastic forward simulation of the prokaryotic model with the Gillespie algorithm has been implemented in C, and the rest of the code in Matlab R2014b. Out of the execution time per iteration of the NPMC method, only 0.06 second correspond to batch steps 4, 5, and 6, and the rest of operations can be easily parallelized, as previously discussed.
However, the pMCMC method does not allow for an easy parallelization, provides a set of highly correlated samples (especially in the CO scenario) and requires the setting of the proposal variance , the burnin and the thinning parameters, which may not be straightforward, and determines the performance of the algorithm. On the contrary, the NPMC scheme provides uncorrelated sets of samples at each iteration and does not require the precise fitting of any parameters. Additionally, the computer simulations suggest that the convergence of the NPMC algorithm may be assessed observing the evolution of the NESS, which usually reaches a steady value simultaneously with the MSE.
5.4.2. Estimation of All the Parameters
In this section we present simulation results to evaluate the performance of the pMCMC and the NPMC schemes in the approximation of the posterior distribution of the rate parameters and the populations of all species, , assuming that all the rate parameters are unknown, again in the CO and the PO scenarios.
In this case, observation times are assumed for all the simulations. Again, independent simulation runs of each algorithm have been performed. The NPMC scheme has been run for iterations, with samples per iteration and clipping parameter . The pMCMC scheme has been run with iterations in each simulation run, a burnin period of iterations, and thinning the output by a factor of . With this setup the computational effort is approximately the same in the two schemes. The variance of the random walk proposal has again been set to , since this value seems to yield the best performance.
In Figure 5 the MSE (in logarithmic scale), averaged over the parameters , attained by the pMCMC (left) and the NPMC (right) algorithms, is represented versus the NESS, in the CO and PO scenarios. Simulation runs which attained a final MSE close to the global average value are indicated with big circles (CO) and squares (PO) on both plots. It can be observed that the pMCMC method performs similarly in both scenarios, in terms of MSE and NESS, yielding poor results in both cases. On the contrary, the NPMC method provides significantly better MSE results in the CO scenario, where a larger amount of information is available. The NPMC method does not present degradation due to the high degeneracy occurring in the CO scenario.
Figure 6 depicts the evolution along the iterations of the NESS (left) and the MSE (right) averaged over independent simulation runs for the NPMC algorithm. Both indices converge to a steady value in a low number of iterations also in this complex scenario. As expected, a significantly higher final MSE is attained in the extremely datapoor PO scenario.
In Figure 7 (left) the average ACF attained by the pMCMC in the CO and the PO cases is represented. Even after thinning the output, the sample correlation is extremely high in both scenarios, which leads to a very low NESS. The acceptance rate is also very low and very long chains are required to obtain reasonable results. In the PO scenario samples are accepted on average in a simulation run of samples (acceptance rate 0.0029). In the CO case, only samples are accepted on average (rate 0.0015).
Figure 7 (right) depicts the final Markov chain provided by the pMCMC method (after removing the burnin period and thinning the output) in the average simulation run represented with a big square in Figure 5 (left). It can be observed that the mixing of the chain is very poor, with a total number of accepted samples of 46 (close to the average). Many other simulations, in both the PO and the CO scenarios, provide even lower number of accepted samples and thus very inconsistent results.
Figure 8 depicts the final Gaussian approximations of the marginal posteriors , , obtained by the pMCMC and the NPMC methods, in the CO and PO scenarios, for the average simulation runs represented as big circles and squares in Figure 5. We can observe that the NPMC method provides a significantly better approximation of the lograte parameters in the CO scenario, where a larger amount of data is available, which is also clear from Figure 5 (right). However, the pMCMC on average performs similarly in both scenarios, due to the low efficiency of the pMCMC sampling scheme when the dimension of the problem (either or ) increases.
In Table 2 the MSE of each parameter averaged over independent simulation runs is shown, as obtained with the pMCMC and the NPMC schemes, for the CO and the PO experiments. In the CO case, NPMC provides homogeneous results for all parameters. On the contrary, in the PO case, some of the parameters (specially and ) are significantly poorly estimated, presenting a final MSE close to the initial value (which corresponds to the prior knowledge). The pMCMC scheme presents significantly higher MSE values than NPMC in both observation scenarios and for all parameters .
Figure 9 depicts the population posterior mean corresponding to the average simulation runs of the pMCMC and the NPMC methods in the PO scenario, represented as big squares in Figure 5. Again, the NPMC method provides more accurate estimates of the unobserved populations than the pMCMC method, especially for . In the CO scenario both methods provide good approximations of the populations of all species.
6. Conclusion
We have investigated the use of Monte Carlobased Bayesian computation methods for approximating posterior distributions of the parameters and the species populations in stochastic kinetic models. Specifically, we have applied both particle Markov chain Monte Carlo (pMCMC) and nonlinear population Monte Carlo (NPMC) methods, which rely on different sampling and approximation schemes. Both pMCMC and NPMC methods resort to a sequential Monte Carlo approximation of the posterior populations to estimate the unknown interaction parameters. However, the pMCMC constructs a Markov chain of correlated parameter samples while the NPMC algorithm is based on an importance sampling scheme with nonlinearly transformed weights to avoid degeneracy and the numerical problems typically arising in importance samplers when applied to high dimensional problems.
We have compared the performance of the two schemes for a challenging autoregulatory feedback network with 5 species and 8 unknown interaction rate parameters. Both methods have been applied without resorting to approximations of the underlying jump process (e.g., the diffusion approximation). We have shown how the NPMC algorithm outperforms the pMCMC method and requires only a moderate computational cost. Besides, the proposed method has a set of important features, common to all PMC schemes, as the sample independence, ease of parallelization, and compared to MCMC schemes, there is no need for convergence (burnin) periods.
Let us note that there is room for improvement of both algorithms in practical applications. To be specific, the pMCMC scheme we have assessed is constructed around a random walk Metropolis algorithm, but other MCMC schemes can be adopted; hence it is possible to further optimize its performance. The same is true for the NPMC method, however. In this case, the step to optimize in practice is the choice of the importance function. In our comparison we have used multivariate Gaussian proposals (because they are straightforward to fit) but other, more efficient proposals can indeed be used within the proposed framework. Also note that a computationally simpler approach to inference in some SKMs is the use of sequential Monte Carlo approximate Bayesian computation (SMCABC) techniques [26, 27]. In the SMCABC framework the statespace model (for ) is only used for forward simulation. There is no attempt to approximate the likelihood of , which is replaced by some simple summary statistic. SMCABC methods are attractive when computational cost is a major concern, but they can be expected to deliver an inferior performance (as they do not fully exploit the probabilistic model, when it is available).
Appendix
A. Gillespie Algorithm
The Gillespie algorithm [18], which is displayed in Algorithm 3, allows generating exact forward simulations of arbitrary SKMs, by drawing samples from the transition density , , given a set of rate parameters and an initial population . The algorithm can be run up to a number of reactions or for a given time interval .

B. Sequential Monte Carlo Approximation of and
In this appendix we provide details on the approximation of the posterior and the likelihood . For a given vector of lograte parameters , the standard particle filter (see, e.g., [19]) displayed in Algorithm 4 is run.

An approximation of the posterior may be constructed from the final set of samples and weights as the discrete random measure
The likelihood can be approximated in turn as
In order to obtain a sample from the approximation in the pMCMC or the NPMC schemes, we just draw a sample out of the set according to their IWs .
Data Availability
All used data are synthetic. The code is freely available upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research has been partially supported by the Spanish Ministry of Economy and Competitiveness (Projects TEC201569868C21R ADVENTURE and TEC201786921C21R CAIMAN). Inés P. Mariño also acknowledges support from the grant of the Ministry of Education and Science of the Russian Federation Agreement no. 074022018330.