#### Abstract

Source term reconstruction methods attempt to calculate the most likely source parameters of an atmospheric release given measurements, including both location and release amount. However, source term reconstruction is vulnerable to uncertainties. In this paper, a method combining Bayesian inference with the backward atmospheric dispersion model is developed for robust source term reconstruction. The backward model is used to quantify the relationship between the source and measurements and to reduce the search range of the Bayesian inference. A Markov chain Monte Carlo method is used to sample from the multidimensional parameter space of the source term. The source location and release rate are estimated simultaneously, and the posterior probability distribution is produced by applying Bayes’ theorem. The proposed method is applied to a set of real concentration data from the ETEX-I experiment. The results demonstrate that the source location is estimated to be −2.86° ± 1.01°E, 48.25° ± 0.33°N, and the release rate is estimated to be 20.16 ± 3.56 kg/h. The true source location is correctly estimated to be within a one standard deviation interval, and the release rate is correctly determined to be within a three standard deviation interval.

#### 1. Introduction

The accurate and timely evaluation of pollutant source (biological, chemical, and radioactive materials) parameters plays an important role in the emergency response. In nuclear accidents, such as the nuclear accident in Fukushima in 2011 and the Ruthenium-106 plume over Europe accident in 2017, the source locations or release rates are unknown. A major concern in these events is how to locate the pollution source and estimate the release rate. An effective way is to reconstruct the source by using environmental monitoring data and a dispersion model. Such methods have been applied to wind tunnel experiments [1, 2], tracer experiments [3, 4], and real accidents such as nuclear accidents [5, 6] and radionuclide emissions under normal conditions [7].

Among these studies, two notable methods include Bayesian inference and the backward atmospheric model. Compared with deterministic optimization methods which provide a single optimal solution, Bayesian inference can provide a probability distribution of the source parameters based on the prior information of unknown parameters and monitoring data, but this method is time consuming for enormous sampling [8]. The backward atmospheric model can rapidly provide the possible source region for each measurement site, but it relies on further analysis to quantitatively determine the position and release rate of the source [9].

However, there are few reports on the combination of Bayesian inference and a backward atmospheric model.

For robust source term reconstruction, a new method combining Bayesian inference with the backward run model FLEXPART is proposed in this study. The source-receptor relationship and the possible source region are obtained through backward simulations. A Bayesian method combined with MCMC sampling is used to estimate the source term position and release rate simultaneously, together with their posteriori distributions. The proposed method is validated using the ETEX-I experiment by comparing both the source parameters and the concentration simulations.

#### 2. Methodology

The source term reconstruction method based on Bayesian inference is a method that combines the prior information of unknown parameters with measurements to transform them into posterior information, and it has been applied to different scenarios [10–12]. The following sections briefly summarize Bayesian inference for source reconstruction and the derivation of the source-receptor relationship.

##### 2.1. Bayesian Inference for Source Reconstruction

The aim of Bayesian inference for source reconstruction is to determine the a posteriori probability distribution of source parameters. The governing equation is Bayes’ theorem, which can be expressed as follows:where *S* is the state vector formed by the source parameter and *M* is a vector formed by the measurements.

The terms in equation (1) can be identified as follows. First, *P*(*S*) is the prior probability for the source parameters prior to any knowledge. It mainly comes from previous data, historical experience, and subjective judgment. The prior probability distribution includes uniform and normal distribution. Second, *P*(*M* | *S*) is the likelihood function and is the probability given the source parameter *S*, which indicates the fitness between the modeled and measured concentrations. Here, we define the likelihood function as the normal distribution function.where *N* is the number of monitoring data, is the *i*th monitoring data, is the *i*th data predicted by the atmospheric transport model for a known source *S*, which will be described in Section 2.2, and is the error between and . Third, *P*(*M*) is the integration of *P*(*M* | *S*) *P*(*S*) at all source parameters and is referred to as the evidence. It can be simply a normalization constant. Finally, *P*(*S* | *M*) is the posterior distribution and the answer for the source reconstruction.

##### 2.2. Source-Receptor Relationship (SRR)

The determination of the likelihood function requires the computation of *C*_{i}, i.e., the simulated concentrations. Because Bayesian inference requires a repeated computation of *C*_{i}, a direct simulation using an atmospheric dispersion model can be highly computationally demanding. An alternative method is to calculate a matrix representing the relationship between the release source and the concentration at the stations and to use matrix-vector multiplication instead. This matrix is called a source-receptor relationship. The advantage of calculating an SRR is that the calculation of the simulated concentration can avoid the numerical modeling process, which usually takes several minutes or hours. There are two ways to obtain the SRR: one is based on the source, using forward simulation. A forward simulation can establish the relationship between one source (*x*_{0}, *y*_{0}, *q*_{0}) and a series of monitoring data *C* (*C*_{1,}*C*_{2}*,…, C*_{N}); the other is based on the receptor, using backward simulation [13]. By reversing the sign of the advection, a large number of particles are released at the receptor and transported backward in time. The residence time in the grid is proportional to the contribution of a source in the grid cell. One backward simulation can establish the relationship between one monitoring dataset *C*_{i} and a series of source data through the residence time *t*_{r}. When we fix the source parameters, the concentration *C*_{i} at a station can be computed as follows:

*V* is the grid volume, which is calculated based on the grid setting parameters, including the grid resolution and level heights, and d*t* is the time resolution in the ATM. A more detailed description of the backward simulation can be found in [13].

In the case of unknown source term locations, the possible number of source term locations is generally much larger than the number of monitoring data. More simulations are needed in the forward mode than in the backward mode. Under this condition, the backward simulation is more efficient than the forward simulation in computational cost.

#### 3. Sampling Method and Convergence

Markov chain Monte Carlo (MCMC) algorithms [14] are commonly used to generate random samples to obtain the posterior probability density function of source parameters. Many MCMC algorithms have been developed, including the Metropolis–Hastings (M-H) algorithm and Gibbs algorithm. Currently, the M-H algorithm is commonly used in source term reconstruction research; during one M-H sampling, parameters are disturbed simultaneously.

The process of the M-H algorithm is as follows:(1)Initiation: generate the initial point *I* = (*x*_{0}, *y*_{0}, *q*_{0}) depending on the prior probability.(2)Perturbation: the new source *I*′ = (*x*_{0} + d*x*, *y*_{0} + d*y*, *q*_{0} + d*q*) is obtained by unbiased perturbation.(3)Accept or reject: the likelihood functions and are calculated and compared, and the ratio is obtained. At the same time, a random number *r* between [0, 1] is generated. When *R* ≤ Δ*P*, *I*′ is accepted; otherwise, *I*′ is rejected.(4)Iteration: repeat Steps 2 and 3 until the convergence condition is reached. In practice, the convergence condition is reached when the posterior distribution is stable. For MCMC sampling with multiple chains, the between-chain variance *B* and within-chain variance *W* are generally used to evaluate the convergence of sampling results [15]. For *m* Markov chains of length *n*, the variance between chains *B* iswhere

The within-chain variance *W* iswhere

The convergence value iswhere

The posterior distribution of the source parameters can be obtained by statistical analysis of the stable results after the chains reach the convergence condition.

Therefore, the implementation process of source term reconstruction can be shown in Figure 1.

#### 4. Application

##### 4.1. Benchmark

In this section, the proposed methods are used to evaluate the source term location and release rate of ETEX-I.

ETEX-I is an atmospheric tracer experiment organized by many European countries [16]. It was carried out from October to November 1994. The tracer used was PMCH. The source location was Monterfil, Brittany, France (−2.0083°E, 48.058°N). A total of 168 monitoring stations were set up in 17 countries, and 3104 available monitoring data points were obtained. The monitoring stations and the release point are shown in Figure 2.

To investigate the ability of source term reconstruction with less monitoring data, only 11 stations with 58 monitoring data points were selected, including detection and no-detection data points. The distribution of selected stations is also shown in Figure 2.

##### 4.2. Atmospheric Transport Model and Meteorological Data

The atmospheric transport model used in this work is FLEXPART V8.2 [17]. The meteorological data are NCEP Climate Forecast System Reanalysis (CFSR) 6-hour products [18], a global gridded dataset with a spatial resolution of 0.5 ° × 0.5 and temporal resolution of 6 h.

The FLEXPART model is a Lagrangian particle diffusion model developed by Norwegian Atmospheric Research Institute. It can run in both forward and backward modes. This work mainly uses the backward mode. Based on the site locations and sampling information of the measurements, the backward simulation of each monitoring data point is carried out, and parts of the results are shown in Figure 3. The backward simulations show that the possible source region moves from the stations to the true source point.

##### 4.3. Prior Distribution of Source Term Parameters

According to Section 4.2, the backward simulation results, the possible sources are mainly distributed in Central and Western Europe, Southern England, and the surrounding maritime areas. Based on these results, the sampling space of source term parameters is set as follows: longitude (−10° ∼ 10°E), latitude (45° ∼ 55°N), and release rate (0 ∼ 100 kg/h). In this paper, assume the prior distribution of parameters is uniform. According to the prior distribution density function, 1000 source terms are generated as the initial points of the MCMC chain.

#### 5. Results

##### 5.1. Convergence

Source terms are iterated according to the sampling method in Section 3, and the iterative result of location parameters of one chain is shown in Figure 4. The initial point is located at (7.5°E, 53.25°N). During the iterative calculation of Figure 1, the refined source term gradually approaches the true source location and finally stabilizes near the true source location.

The evolution of the mean value and standard deviation of the estimated source parameters is calculated and shown in Figure 5. After 3000 iterations, the mean values of all three parameters become stable. The standard deviations also show obvious reductions for all three estimated parameters. For latitude, the standard deviation is 0.33° at the steady state, which is the smallest among the three parameters. The standard deviation of the longitude is slightly larger, which is 1.01° at the steady state. This phenomenon demonstrates that the uncertainties in the estimation of the two location parameters can be different. For the release rate, the standard deviation is 3.56 kg/h.

**(a)**

**(b)**

**(c)**

Based on the convergence method in Section 3, the convergence values of source term parameters are calculated. Figure 6 shows that, among the three parameters of longitude, latitude, and release rate, latitude converges the fastest, and the release rate converges the slowest. The relatively better convergences of the location parameters may be attributed to the better initial values provided by the backward simulations in Section 4.2.

##### 5.2. Estimated Results

Figure 7 displays the posterior probability distributions of the parameters. The sampling results and true source parameters based on the prior distribution of source terms are also shown in Figure 7. The posterior probability distribution reveals that the mean value, the standard deviation, and the most likely value of the source parameters are −2.86° ± 1.01°E (−2.25°E), 48.24° ± 0.33°N (48.25°N), and 20.16 ± 3.56 kg/h (21.0 kg/h). Figure 7(a) shows the longitude values’ posterior distribution. The distribution has two peaks to the west of the true point, with deviations of 0.25° and 1.5°. The two peaks are highly influenced by the wind field, which is mainly from west to east. Figure 7(b) shows the posterior distribution of latitude values. The majority of the values are within 0.5° of the true value. Figure 7(c) shows the release rate values of the accepted states. The release rate was correctly determined to be within a three standard deviation interval. The difference between the estimated release rate and the true value mainly comes from the model error. The statistics of the a posteriori distribution are listed in Table 1. Generally, the most likely values are closer to the true values than the mean values. These results indicate that the most likely estimate may be more reliable for source reconstruction in a real nuclear accident. Compared with the location, the release rate shows relatively larger errors and standard deviations.

**(a)**

**(b)**

**(c)**

##### 5.3. Comparison of Simulations and Measurements

The observed concentrations and concentrations simulated by using different source terms were compared to validate the proposed source reconstruction methods.(i)FWD_t: forward simulation with the true source term(ii)FWD_m: forward simulation with the mean source term in Table 1(iii)FWD_p: forward simulation with the most probability source term in Table 1

Figure 8 shows the plumes modeled by different source terms at different times. It is noticeable that the plume shapes are similar to each other for the same times, indicating that the meteorological input plays a dominant role in the simulation. Although the two estimates have relatively large differences in their longitude (Table 1), the concentration distributions are generally much the same, especially for the latter times (the middle and right columns of Figure 8). This phenomenon implies that the simulation is not that sensitive to the displacement of the source positions. However, the high-concentration area of the simulation using the true source term is quite different from that using the most likely and mean estimate, indicating that the release rate differences have a larger influence on the simulation. These differences in sensitivities to the source coordinate and the release rate may partially explain different deviations of the source reconstruction for these two variables (Table 1). At some measurements in the center of the plume, the simulation with the true release rate exhibits even larger deviations than the simulations using the other two source term estimates (the first column in Figure 8). This phenomenon indicates that the air dispersion model may have its own deviations, and such deviations are cancelled out by the deviations in the most likely estimate and the mean estimate to some degree. For those measurements outside the plume, the uncertainties in the meteorological input can be the major source of these deviations.

Figure 9 compares the temporal profiles of the observed and simulated concentrations at representative sites. For all 4 sites, the simulated timings of the peaks generally show deviations within a 3 h interval compared with the monitoring results, whereas the deviations of the simulated peak values are within 50% at these four sites. For more than 70% of stations, the deviations in the estimation of the peak values are within a factor of 5. There is also a difference in the number of peaks between the simulations and measurements. Taking station D06 as an example, a small peak is shown behind the main peak in the measurement data, while it is not shown in the simulated data. Again, it is noticeable that the simulated profiles using the true release rate show the most obvious deviations at D05 and D13, which significantly overestimate the number and magnitude of the peaks. For the other two source terms, the simulated peak timing is basically the same, but the peak values are different. Thus, in addition to the release rate, it is also necessary to reduce the uncertainties in the input meteorological fields and the physical processes of the dispersion model to improve the simulation results. This improvement shall be our future work.

**(a)**

**(b)**

**(c)**

**(d)**

Table 2 compares the quantitative metrics of the simulations using three different source terms. The simulations using the true release rate exhibit the worst metrics except for the correlation *R*, implying that the model uncertainties may be further reduced. For the other two source terms, their performances are quite close to each other, except FB.

#### 6. Conclusion and Discussion

This paper introduces a method to reconstruct the source parameters given a set of measurements. The method combines Bayesian inference (MCMC sampling) and ATM backward simulation and provides the a posteriori probability distributions of the source term. The method has been applied to a tracer experiment (ETEX-I) that occurred in 1994 in Europe. The relationship between the source and receptor is obtained by using the backward run mode of FLEXPART, which reduces the calculation requirement of the likelihood function in a large number of iterative samplings and improves the calculation efficiency. The reconstructed results are in good agreement with the true parameters of the source term. The position uncertainty was within 1.0°, and the deviation between the release rate and the real rate was within 3 times the standard deviation. After 3000 iterations, the method converges to a steady state, providing not only a single solution but also a posteriori probability distribution. Meanwhile, we find that the validation between the simulation using different source terms and the observation demonstrates that the major uncertainties come from the air dispersion model, including both the meteorological input and the physical process models. The reduction of these uncertainties can improve the source reconstruction accuracy, which will be addressed in our future work.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.