Recent Theory and Applications on Inverse Problems 2014View this Special Issue
Research Article | Open Access
Abel Palafox, Marcos A. Capistrán, J. Andrés Christen, "Effective Parameter Dimension via Bayesian Model Selection in the Inverse Acoustic Scattering Problem", Mathematical Problems in Engineering, vol. 2014, Article ID 427203, 12 pages, 2014. https://doi.org/10.1155/2014/427203
Effective Parameter Dimension via Bayesian Model Selection in the Inverse Acoustic Scattering Problem
We address a prototype inverse scattering problem in the interface of applied mathematics, statistics, and scientific computing. We pose the acoustic inverse scattering problem in a Bayesian inference perspective and simulate from the posterior distribution using MCMC. The PDE forward map is implemented using high performance computing methods. We implement a standard Bayesian model selection method to estimate an effective number of Fourier coefficients that may be retrieved from noisy data within a standard formulation.
Inverse problems are typically ill-posed and analytical solutions are seldom available. Approaches to inverse problems in the interface of applied mathematics, statistics, and scientific computing represent a setting with a myriad of tools for robust solutions, for a number of reasons including its sound theoretical setting for uncertainty quantification [1, 2], prediction, and decision making. See  for a recent review. Although these interdisciplinary approaches are becoming prevalent [4–8], topics such as knowledge representation in complex systems  and effective dimension are not as mature yet. Sparsity promoting regularization and Bayesian methods have received attention recently [8, 10].
In this paper we consider a nonlinear acoustic scattering inverse problem as a case study. The rationale is that the well posedness of the direct problem and robust numerical methods has been comprehensively studied [11, 12]. This fact allows us to address the inverse problem with a reliable direct problem solver.
The propagation of acoustic waves in a homogeneous isotropic medium with constant speed of sound is governed by the linear wave equation: for a velocity potential and the speed of sound in the medium. For monochromatic time-harmonic waves with frequency we have where the space dependent term satisfies the Helmholtz equation. Consider the following: where is called the wave number. Given an obstacle of compact support (), its forward scattering problem is governed by the Helmholtz equation in . The total wave is a superposition of the incident wave and the scattered wave , and it is subject to a boundary condition on . The boundary condition may be of type (Dirichlet), (Neumann), or (impedance). Also, the scattered waves satisfy the Sommerfeld radiation condition and are referred to as radiating solutions of the Helmholtz equation.
Given an incident wave , the obstacle boundary uniquely determines the scattered wave , and every scattered wave which is a radiating solution of the Helmholtz equation has the asymptotic behavior of an outgoing spherical wave. Consider the following: where and are the scattering amplitude or far field pattern. We denote by the boundary to far field mapping: The mapping from the obstacle boundary to the far field scattering amplitude is injective [11, 12].
In this work we address the problem of estimating the boundary given noisy measurements of the far field data . This paper is organized as follows. For the sake of making this paper self-contained, in Section 2 we briefly review useful results used throughout the paper. The acoustic layer potential and integral equation method are presented in Section 2.1. We discuss the numerical solution of the forward mapping and its implementation in Section 2.2. In Section 2.3 we present in detail the Bayesian formulation of the inverse problem along with the probability distributions involved. The inverse problem results and the effective dimension analysis are presented in Section 3. We consider the effective dimension exploration as the main contribution of this work.
2. Materials and Methods
2.1. Single Layer and Double Layer Potentials
The fundamental solution of the Helmholtz equation (3) is for , where denotes the norm and is the Hankel function of first kind and order zero. Given an integrable function , the integral operators with being the unit normal vector to directed to the exterior of , are called acoustic single layer and acoustic double layer potentials for density , respectively. Both, the single and double layer potentials are radiant solutions of the Helmholtz equation in . The single layer potential is continuous in (including the boundary ) and consequently it is defined as in (8) also for . The double layer potential is only continuous in (not including ); nonetheless is defined in the boundary as its corresponding limit when approaches , namely, see  for details.
The direct problem is formulated in the form of a combined potential using the Dirichlet boundary condition: where is the incident wave, is a coupling factor (in our case we set ), and and are the acoustic potentials defined above.
The three potentials (single layer, double layer, and combined potential) reproduce the same far field pattern given a boundary . However, the combined potential operator in (11) is the sum of the identity and a compact operator, and therefore its singular values are bounded away from zero. Consequently, the combined potential is more stable regarding numerical implementation and then we use the combined potential in the remainder.
The far field pattern for the combined potential is given by The corresponding far field pattern for the single-layer potential is given by
The integral (12) can be evaluated numerically using the trapezoidal rule after solving the integral equation (11) for . It is known  that the trapezoidal rule has high accuracy when integrating over periodic functions and therefore is a sensible method to use in this case.
2.2. Numerical Solution of the Forward Map
As a test case we consider the two-dimensional kite-shaped domain shown in Figure 1. The parametric representation for the boundary is given by .
The combined potential equation (11) gives rise to a linear system upon discretization: where is the density, , and the matrix has the form where is the identity matrix and is the number of knots used to discretize the boundary . and are the discrete kernels corresponding to single layer potential and double layer potential, respectively (for details see ).
Matrix in (15) is square, nonsymmetric, dense, complex, and well conditioned. Mathematical software such as Matlab or Python can be used to evaluate the direct problem. However, it must be taken into account that a big number of evaluations of the forward mapping must be done in a Bayesian statistics approach.
Consequently, in order to have an efficient numerical machinery to evaluate the direct problem we have implemented a parallel version of the general conjugate residuals method (GCR) to solve the linear system (14). The GCR method was programmed in C on a graphics processing unit (GPU). Of note, a serial version of the GCR method programmed in C is slow, even compared with matlab and python solvers of the linear systems; see Figure 2. The implementation details for parallel computing scheme are presented on Appendix B. Although the evaluation of the far field pattern (12) is not computationally expensive, it can be done in parallel. Details are shown also in Appendix B.
The far field pattern value obtained for the kite in the directions closely resembles results previously reported in . The convergence of Nÿstrom method, used to numerically evaluate the integral operators, is illustrated on Figure 3. We use as a reference solution the far field pattern value with wave number , obtained with the combined potential for points, and we plot in logarithmic scale for -axis the relative errors for far field patterns obtained with combined potential (solid line in Figure 3) and single layer potential (dashed line in Figure 3) for number of points .
2.3. Bayesian Formulation
First let us remember the Bayes rule: The posterior density quantifies the uncertainty of the parameter to be recovered from data . The prior density expresses the information regarding the unknown parameter . is the likelihood function, that is, the measurement error distribution assumed, conditional on . The density is a normalizing constant. Below we describe these density functions for our inverse problem.
We consider the inverse problem defined by the nonlinear forward mapping (6) from the point of view of Bayesian statistics. In Bayesian statistical inversion theory, the solution of the inverse problem is the conditional probability of parameters, given the data (posterior probability). We assume that the data are noisy measurements of far field pattern and that the boundary may be described by a vector of parameters as follows.
We construct a prior model based on a parameterization for considering that it is a simple -periodic curve and . We express as follows: where and Thus the radius of is expressed in terms of a Fourier series. The vector of parameters contains the Fourier series coefficients in (18). For a boundary that is defined by a vector of coefficients we denote the forward mapping by . We refer to as the finite vector of Fourier coefficients to represent a boundary .
2.3.1. Posterior Distribution
The posterior distribution does not have to our knowledge a closed form. Consequently, we resort to Markov chain Monte Carlo (MCMC) simulation methods to analyze the posterior distribution. We use the t-walk , a general purpose sampling algorithm for continuous distributions, in order to sample satisfactorily from the posterior distribution. We describe briefly the t-walk algorithm and implementation details in Appendix A.
The t-walk algorithm evaluates the energy function (minus log of the nonnormalized posterior). Consider the following: for an arbitrary constant (i.e., minus log of the nonnormalized posterior) on each iteration. In practice the constant is not used and the energy is evaluated as minus log of the product likelihood times the prior one.
The output simulations of the t-walk are vectors of Fourier coefficients for fixed, each one of which defines a boundary through (18). We refer subsequently to the simulations , obtained with the t-walk, as MCMC simulations or posterior samples and we refer to a simulated boundary as the curve which is defined by a simulated vector .
2.3.2. Prior Distribution
We establish the prior model as follows. Suppose that corresponds to a star shaped domain. Then admits a representation (17)-(18). Furthermore, the mean square error for approximating by is given by Further details of this consequence of Parseval inequality are described in .
Due to smoothness of , the amplitude of the Fourier coefficients decays quickly along the series. In fact Note that is the position of the coefficient in the Fourier series. Therefore the norm of the coefficients is by definition of bounded by for some positive constant .
We pose a normal distribution for each pair of coefficients as follows: with mutually independent. We note that the variance was set to . The rationale is that with the variance in this way, we guarantee that the pair satisfies the inequality (24) with probability higher than . Moreover the variance decreases when increasing position , that is, for high order coefficients along the Fourier series. Consequently, this modeling incorporates the smoothness of on the prior distribution.
The coefficient in (18) is related to the size of the scattering object. We assume that
It is not clear how to set the constant in distribution (25). This constant can be seen as a scaling factor and the parameter controls the spread of the distribution by . A single constant can be chosen for all prior distributions (25) independently of the value of the index . For this aim we sample coefficients from the distributions (25) varying the value of . We add the condition on the radius (18) to guarantee that the sample curves do not intersect themselves. The sampled curves are presented on Figure 4. As we observe, using the values of and (see Figures 4(a) and 4(b)) we obtain very smooth curves. On the other hand, Figures 4(c) and 4(d) show curves with more oscillations. For our purposes, a value of is chosen.
Assume that the measurements of the far field pattern have additive Gaussian noise with for a boundary . The likelihood is given by the noise distribution assuming independent measurements. We assume that the measurements are taken on evenly spaced directions .
3. Results and Discussion
In order to avoid committing an “inverse crime”  in our numerical experiments we proceed as follows. First we generate synthetic noisy far field measurements from the kite-shaped object (see Figure 1). Second, this kite is not obtained from any finite Fourier series expansion and it is therefore not included in any of our models. This forces our methodology to cope with diverse shapes, being the finite Fourier series only an approximation. Third, synthetic data generation is done using the single-layer potential approach, whereas for MCMC steps we evaluate the forward mapping using the combined potential. Of note, both the single and the combined potential approaches approximate the same far field pattern when is large enough, however they are numerically different.
3.1. Inverse Problem Results
For our experiments we discretize the boundary of the kite-shaped object with 256 points. The far field pattern was computed on 16 evenly spaced points. We use 16 incident waves and we set the coupling parameter to be equal to the wave number . The synthetic data are generated with additive Gaussian noise whit (equivalently SNR = 0.97). We generate 100,000 MCMC simulations of the posterior distribution for each number of coefficients (). The transient stage of the MCMC is taken into account and we remove the first 5,000 iterations in each case. We highlight that it takes about 7 hours to produce 100,000 MCMC simulations on a Python implementation. With our parallel implementation the execution time is reduced to about 1 hour.
Due to space constrains, we present only marginal posterior probability distributions of cosine terms for coefficients in Figure 5. As we observe, for each coefficient, the corresponding distribution is unimodal with low variation. The same feature is presented for coefficients for cases (not shown).
We present in Figure 6 the probability region (in gray) for the number of Fourier coefficients for cases . For this aim we draw 1000 simulated boundaries subsampled with their corresponding IAT factor in each case. The true kite-shaped boundary is shown as a red dashed line. From Figure 6 it is apparent that as we increase the number of coefficients the samples have more oscillations and the probability region becomes imprecise. This fact exhibits the numerical instability of Fourier-based representation for the solutions of the inverse problem of interest.
Also, in Figure 6 we show what we refer to as MAP boundary (dashed line) and mean boundary (continuous line) for each value of . The MAP boundary is the curve defined by the simulated vector that has the lowest energy value (maximum a posteriori). The mean boundary is the curve defined by the mean of each coefficient of simulated vectors . MAP boundaries and mean boundaries seem to approximate the true curve with approximately the same quality. In fact, we cannot qualitatively distinguish the best approximation from coefficients to . This issue gives rise to the analysis presented on the next section.
3.2. Effective Dimension
We refer to effective dimension as the number of parameters that can be properly retrieved from a noisy data set. In the context of this paper, effective dimension is related to the uncertainty quantification of parameters, that is, identifying the number of Fourier coefficients on the representation given by (17)–(20), required to approximate properly the boundary of the kite (see Figure 1) from synthetic far field measurements. As an aside comment we want to mention that effective dimension can be seen as an indicator for a parsimonious approximation regarding computational cost. However this feature is not used here due to the fact that effective dimension analysis is performed as a posterior procedure. In this section we have changed the notation for the distributions to be consistent with .
We define a collection of models given by with . is a vector of random variables. A realization of is a vector defined in (20), and is the corresponding posterior distribution for model . For simplicity we use as index to indicate the model corresponding to Fourier coefficients.
We use the super-model approach; that is, a new model is defined where is the indicator of the model and .
The posterior probability for model of explaining the observed far field pattern is given by where is the prior probability of model and is the corresponding normalizing constant (also known as the marginal likelihood) for model . A straightforward way to perform model selection is by computing (32) for all the models and choosing the model with highest probability. Indeed, the difficulty regarding the computation of (32) is evaluating the normalizing constant . To perform our MCMC only the nonnormalized posterior is needed and therefore was not previously calculated (this is the usual case when doing MCMC in a fixed number of parameters in a Bayesian application). Nevertheless, the MCMC simulations may be used to estimate and provide information to explore the effective dimension. This is explained below.
The ratio of two normalizing constants is defined as follows: where . Considering a uniform prior distribution for the models, that is, , model is better than model when , and they are indistinguishable if and is better than when . Then, the best model has the highest normalizing constant value.
A consistent estimator of the ratio is where are completely known importance sampling densities for and , respectively, and the sets and are samples of the posterior distribution for models and , respectively, (see  for details).
We observe from (34) that for a model , the product is the corresponding likelihood times prior distribution. This product is in fact the energy (21) using a suitable constant . Then the resulting sample from MCMC posterior simulations is used to compute (34), as a byproduct of the former with marginal computational cost added. Some details about approximating normalizing constant from MCMC simulations are discussed in .
The importance sampling density must be comparable to the product (35). We propose to numerically estimate the density for every model using a Kernel density estimation (KDE)  of the posterior distribution using the MCMC sample. A rough estimate may be used, for the sole purpose of estimating .
The kernel density estimator has the form where the kernel is a bounded density on and is the bandwidth. For this estimator we define a Gaussian kernel and a bandwidth where and is the standard deviation for each coefficient on the model . We compute for each model then all the models can be compared together. Then effective dimension corresponds to the model with the highest normalizing constant, equivalently the lowest value of minus logarithm of the normalizing constant.
We present in Figure 7 the minus logarithm of the estimated normalizing constants (39) for all models (red line). As we see in this figure, the lowest value is obtained with coefficients. Also we present in the same Figure the distance (norm of the pointwise difference) between the radius of the kite and the radius of the simulated boundaries. The continuous line shows the difference with respect to the mean boundaries. The dashed line corresponds to the distances with respect to the map boundaries. The boxplot in the same figure corresponds to distances of the last 1000 simulations of the posterior. These last simulations are taken and subsampled with their corresponding IAT factor. These boundaries are in fact the probability region shown in Figure 6.
Based on the Bayes factors, the best model corresponds to (we recall that ). Of note, there are MAP boundaries (e.g. ) with lower mean squared error compared with the MAP for . Also the mean boundaries from to have lower distances to the MAP. However as we observe on the boxplot, the mean of the error decreases from 3 to 11 Fourier coefficients and then oscillations appear, increasing spread and increasing error from 13 to 33 coefficients. Despite the low error of MAP and mean boundaries for models with 13 to 33 coefficients, due to the spread shown in the boxplot, the best model is certainly given by Bayes’ factor (i.e., for ). A qualitative analysis that agrees with these results is also discussed in Section 3.
Fourier series is a classical representation for smooth periodic closed curves. Furthermore, we have used a well understood and high order numerical method for forward mapping evaluation. However, these two elements by themselves are not enough to have confidence regarding the solution of the inverse problem. The quality of the solution depends also on the number of coefficients of the Fourier series used for the representation.
Our effective dimension discussion relies on MCMC methods to provide a quantitative strategy to determine the number of parameters that can be retrieved given a data set.
In this work we address the acoustic inverse scattering problem with a classical Fourier-based representation of the solution. We pose the inverse problem as a Bayesian inference problem and use the output of a MCMC method (namely, the t-walk) for our effective dimension results. For the corresponding direct problem we have used the classical layer potential approach, which was solved in a fast and reliable manner with a robust numerical method and parallel computing. Using Fourier series to represent solutions allows for a straightforward formulation that incorporates the smoothness of the solutions into the prior distribution. On the other hand, the finite Fourier representation is numerically unstable. Although other approaches to represent the scattering obstacle are applicable (e.g., wavelet basis which correspond to Besov priors), a fundamental question remains: How much information can be retrieved, within the representation, from a noisy data set?
The main contribution of this paper is the effective dimension method, which is a quantitative method to estimate and quantify the uncertainty of the estimable parameters given a noisy data set. Given a parametric representation of the solution of the inverse problem, the effective dimension method is implemented via Bayesian model selection where the normalizing constant for each model is approximated using the MCMC output. Of note, the effective dimension method is applicable regardless of the parametric representation of the solution.
A. The t-Walk
The t-walk (for “traverse” or “thoughtful” walk) is a MCMC sampler for arbitrary continuous distributions that require no tuning. The t-walk maintains two independent points in the sample space and all moves are based on four proposals (walk, traverse, hop, and blow) that are accepted with a standard Metropolis-Hastings acceptance probability on the product space. These moves produce an efficient sampling algorithm that is invariant to scale and approximately invariant to affine transformations of the state space.
For an objective function (e.g., posterior distribution) , , and , a new objective function is defined as in the corresponding product space . Then two proposals are considered as follows: where is defined by one of the following four moves.
(i) Walk Move. The walk move is defined by the function for , where are i.i.d. r.v. with density with , and with . In this case the Hastings ratio is for both cases on (A.1).
(ii) Traverse Move. The traverse move is defined by the function where is a r.v. with density which is a mixture of densities and can be easily sampled as follows: with and . The acceptance ratios are for first and second cases on (A.1), respectively.
(iii) Hop Move. The hop move is defined by the function with and . For this proposal
(iv) Blow Move. The blow move is defined by the function with and . For this proposal
The numerical implementation of the algorithm only requires the user to define three functions.(i)Initialization. A function to generate the two different initial guesses and .(ii)Support. Defines the support of the target function. Points outside of the support are rejected.(iii)Energy. In this function the minus logarithm of the target density is evaluated.
The algorithm is available for download from Andres Christen’s personal web page http://www.cimat.mx/-jac/twalk and it has been implemented on C, C++, Matlab, R, and Python languages.
B. Parallel Computing
Although MCMC methods are by definition serial procedures, the high computational cost can be reduced by performing each evaluation of the objective function in a parallel computing scheme when possible. In this appendix we present a way to solve the direct problem for combined potential and Nÿstrom method.
We recall the linear system matrix for combined potential and Nÿstrom method: where are kernels for double layer potential and single layer potential (for details see ). The matrix coefficient corresponds to where for and for , , and , with , , and .
The evaluation of each entry of the matrix involves the numerical evaluation of Bessel functions which is computationally costly. On the other hand, each entry of the matrix is independent. Then the matrix setup is recommended to be performed in a parallel scheme in order to reduce the computing time. That is, given a number of knots we define a grid of threads in CUDA of size and each entry is evaluated in a different thread.
In order to solve the linear system (B.1) we choose the GCR method which is described in Algorithm 1. The dot products involved are performed based on the cascading algorithm of the optimizing parallel reduction in CUDA sample from Nvidia CUDA toolkit documentation (see web page http://docs.nvidia.com/cuda).
The most demanding part is computing the vector since a set of coefficients should be computed on each iteration. For this aim, we use CUDA blocks and we compute the corresponding within each block by using cascading algorithm.
When using multiple incident waves, the term becomes a matrix. This is equivalent to solve as many linear systems as incident waves. The same algorithm is performed using a CUDA block for each incident wave excepting the computing of which is done for a single incident wave at time. The computing of the far field pattern can be done in parallel by approximating the integral (12) for the multiple incident waves. However this latter evaluation is not expensive and it takes no advantage of the parallel computing.
In our experiments, this parallel computing implementation has a performance enhancement of about 7x speedup over the original. An implementation with a more effective method than this parallel version of GCR is left as future work.
Conflict of Interests
The authors declare that they have no conflict of interests regarding the publication of this paper.
The authors would like to thank Editor Fatih Yaman and anonymous referees for their constructive and detailed remarks which have helped to improve this paper in a major way. This research was supported by Grant GTO-2011-C04-168676 Guanajuato-CONACyT.
- T. Oden, R. Moser, and O. Ghattas, “Computer predictions with quantified uncertainty, part I,” SIAM News, vol. 43, no. 9, 2010.
- T. Oden, R. Moser, and O. Ghattas, “Computer predictions with quantified uncertainty, part II,” SIAM News, vol. 43, no. 10, pp. 1–4, 2010.
- C. Fox, H. Haario, and J. Christen, Bayesian Theory and Applications, Inverse Problems, 2013.
- S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721–741, 1984.
- J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, vol. 160 of Applied Mathematical Sciences, Springer, New York, NY, USA, 2005.
- A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM, Philadelphia, Pa, USA, 2005.
- L. Biegler, G. Biros, O. Ghattas et al., Eds., Large-Scale Inverse Problems and Quantification of Uncertainty, Wiley Series in Computational Statistics, John Wiley & Sons, New York, NY, USA, 2011.
- V. Kolehmainen, M. Lassas, K. Niinimäki, and S. Siltanen, “Sparsity-promoting Bayesian inversion,” Inverse Problems, vol. 28, no. 2, Article ID 025005, 2012.
- U. Grenander and M. I. Miller, “Representations of knowledge in complex systems,” Journal of the Royal Statistical Society, vol. 56, no. 4, pp. 549–603, 1994.
- C. A. Zarzer, “On Tikhonov regularization with non-convex sparsity constraints,” Inverse Problems, vol. 25, no. 2, Article ID 025006, 13 pages, 2009.
- D. L. Colton and R. Kress, Integral Equation Methods in Scattering Theory, vol. 57, John Wiley & Sons, New York, NY, USA, 1983.
- D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory, vol. 93 of Applied Mathematical Sciences, Springer, Berlin, Germany, 2nd edition, 1998.
- G. Dalquist and A. Bjorck, Numerical Methods in Scientific Computing, vol. 102, SIAM, 2008.
- M. Chen, Q. Shao, and J. G. Ibrahim, Monte Carlo methods in Bayesian Computation, Springer Series in Statistics, Springer, New York, NY, USA, 2000.
- M. A. Pinsky, Introduction to Fourier Analysis and Wavelets, vol. 102 of Graduate Studies in Mathematics, American Mathematical Society, 2002.
- P. de Valpine, “Improved estimation of normalizing constants from Markov chain MONte Carlo output,” Journal of Computational and Graphical Statistics, vol. 17, no. 2, pp. 333–351, 2008.
- B. Silverman, Density Estimation for Statistics and Data Analysis, Monographs on Statistics and Applied Probability no. 26, Chapman & Hall, New York, NY, USA, 1986.
Copyright © 2014 Abel Palafox et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.