Abstract

Linearized multiparameter inversion is a model-driven variant of amplitude-versus-offset analysis, which seeks to separately account for the influences of several model parameters on the seismic response. Previous approaches to this class of problems have included geometric optics-based (Kirchhoff, GRT) inversion and iterative methods suitable for large linear systems. In this paper, we suggest an approach based on the mathematical nature of the normal operator of linearized inversion—it is a scaling operator in phase space—and on a very old idea from linear algebra, namely, Cramer's rule for computing the inverse of a matrix. The approximate solution of the linearized multiparameter problem so produced involves no ray theory computations. It may be sufficiently accurate for some purposes; for others, it can serve as a preconditioner to enhance the convergence of standard iterative methods.

1. Introduction

The linearized inverse problem in reflection seismology aims at recovering model perturbations from data perturbations, assuming a known reference or background model. Both the reference model and the model perturbations consist of material parameter fields (functions of position $x$) that characterize wave propagation. As the background model is considered known, we will suppress it from the notation and use the word “model” and the symbol $m$ and similar symbols to refer to model perturbations. The linearized scattering operator $F$ maps a model perturbation to the corresponding perturbation of the predicted data.

With these conventions, the inverse problem may be stated as follows: given observed data $d$, find a model perturbation $m$ so that
$$F m \simeq d. \tag{1}$$
Interpreting (1) in a least squares sense yields the normal equations
$$F^{*} F m = F^{*} d. \tag{2}$$
Here $F^{*}$ is the adjoint (transpose) of $F$: it is a migration operator (to be precise, the result of reverse time migration, properly construed). The migration operator maps the data space to model space. We will refer to the migration output $m_{\mathrm{mig}} = F^{*} d$ as the migrated image, acknowledging its typical role in seismic processing. $N = F^{*} F$ is the normal operator, or Hessian (we will use these terms interchangeably).

An example of this setup is linear acoustics; we will treat this example explicitly in this paper. The model consists of the velocity and density (or impedance and density, or any other equivalent combination) perturbation fields. The forward map samples the solution of the linearized (Born approximation) acoustic wave equation for a source (right-hand side) indexed by source position, at receiver positions, possibly with additional filtering to emulate antenna patterns induced by source and receiver arrays.

In simulating seismic data generation with models such as linear acoustics, waves typically propagate over hundreds of wavelengths, and model fields must resolve features on the wavelength scale. The normal equations thus represent millions, or billions, of equations in the same number of unknowns, ruling out the possibility of solving them by means of direct matrix methods such as Gaussian elimination. This paper presumes that the computations implementing the application of the normal operator are carried out by “wave equation” methods, that is, finite difference or finite element simulation. Thus each application is an expensive, large-scale computational procedure, a fact which places practical limits on the number of steps taken in an iterative scheme to solve the normal equations (2). For example, Krylov subspace methods (such as the conjugate gradient method) require at least one application of the normal operator per iteration.

We present an efficient method to approximate the solution of the normal equations that requires only a few applications of the normal operator (one for one-parameter inversion, two for two-parameter inversion, and six for three-parameter inversion). This method leverages the properties of the normal operator: under some conditions (background model parameter fields slowly varying on the wavelength scale, diving wave energy eliminated from data, data polarized by propagating phases), it is a special type of matrix-valued spatially varying filter [19]. This particular type of spatially varying filter, called a pseudodifferential operator in the mathematical literature [10], preserves the discontinuities (events) in the model to which it is applied while effecting a dip- and frequency-dependent scaling of the amplitudes. Its application also mixes events when several model parameters (density, $P$- and $S$-wave velocities, etc.) are present: thus a density event will show up in the velocity component of acoustic migration output, and vice versa, for example.

In this work, we shall show how to separate the events corresponding to various model parameters by means of several applications of the normal operator to permuted image vectors, so that the result differs from an inversion of the data by an overall spatially dependent filter, common to all components. We have previously solved the problem of estimating and correcting for such a filter [11, 12]: it is the same problem as occurs in the solution of a single-parameter linearized inverse problem. Combining the two techniques, we recover an accurate approximate inversion for all parameters, within constraints which we shall illustrate.

We review our approach to single-parameter inversion (based on constant density acoustic modeling, for instance) in Appendix B. It belongs to the genre of scaling methods [13–16], which approximate the Hessian or its inverse by estimating a filter of some sort from a single application of the normal operator to a migrated image or some other test image. In previous work on single-parameter inversion, the second author pointed out that the pseudodifferential nature of the Hessian makes possible a very simple and accurate approximation, provided that the reflectivity dip is unique at every imaging location [16]. Our approach to single-parameter inversion without assuming uniqueness of dip (Appendix B) carries this idea further; it is similar to that described by Guitton [15], but with different constraints on filter structure, motivated by the theory of the pseudodifferential normal operator cited above. The core of our method is the Pseudodifferential Operator Algorithm [17], an efficient algorithm for application of filters of the required type, which we review in Appendix A. Several other authors have explored similar pseudodifferential scaling methods based on other fast application algorithms [18, 19].

Note that data-adaptive scaling methods such as those cited in the previous paragraph differ in an essential way from scaling by an approximate diagonal of the Hessian [20, 21]. As noted by Rickett [14], a dip-independent scaling cannot well approximate an inverse Hessian in general, as the appropriate amplitude correction (from migrated image to inversion) depends on dip. On the other hand, these methods also differ from the so-called Kirchhoff or Generalized Radon Transform or Ray-Born inversion (e.g., [22–25]): both are asymptotic and approximate (either implicitly or explicitly) the normal operator as a pseudodifferential operator, but Generalized Radon Transform inversion involves explicit geometric optics computations, while scaling methods do not. Migration deconvolution methods [26] also approximate the Hessian, or its inverse, but differ from scaling methods in a similar way: migration deconvolution filters are constructed from approximate (layered) Green's function computations, rather than by data-adaptive fitting of an a priori limited filter class. Several authors have explored construction of localized filter representations of the Hessian or inverse Hessian using a more general class of Green's functions [27, 28]; again, scaling methods do not require precomputation of a Green's function database.

It has long been recognized that the offset (or scattering angle) dependence of a reflection event encodes material parameter changes across a reflector. Approximations to this relation motivated by the analysis of reflection and transmission in layered media (the Zoeppritz equations and linearizations thereof, [29, 30]) lead to amplitude-versus-offset (“AVO”) analysis techniques [31]. Linearized inversion for one or more material parameters might be viewed as a quantitative counterpart, that is, as AVO inversion, and attacked via iterative linear system solvers [32–34]. Herrmann [18] uses a single-parameter inversion approach resembling ours to precondition iterative inversion for faster convergence; we envision a similar use of our techniques. Note that migration deconvolution has also been used as an aid to quantitative AVO [35]. As already mentioned, migration deconvolution relies on layered modeling to obtain approximate Green's functions. Pseudodifferential scaling is limited in other ways but does not require the modeled wave dynamics used to construct the approximate Hessian to be close to those of layered media (and does not require the construction of Green's functions).

Several studies have analyzed the conditioning, or error propagation properties, of multiparameter inversion [34, 36–38]. We will not discuss this important issue, except to note that characteristic conditioning behavior will be evident in our examples: for linearized acoustic inversion from surface data, velocity or impedance perturbations are considerably better resolved than are density perturbations.

We have used only synthetic data in the work reported here, and rather simple synthetic data at that. In fact, the very few published examples of inversion (in the sense of noise-level data fit) of exploration-scale data [39–41] strongly suggest that the success of linearized multiparameter inversion is sensitive to other aspects of the inverse problem: source pulse and radiation pattern estimation, background field approximation, accurate modeling of the physics (including elasticity, attenuation, etc.), and very careful data preprocessing. In this work, we chose to avoid these issues and focus only on the mathematical/computational issue of multiparameter inversion, given model-consistent data.

2. Theory and Methods

Recall that the aim of this paper is to solve the normal equations
$$N m = m_{\mathrm{mig}} \tag{3}$$
for a model vector $m$ consisting of $n$ component material parameter perturbation fields $m_1, \dots, m_n$. The migration output $m_{\mathrm{mig}} = F^{*} d$ is a vector of the same type as the model, that is, an $n$-vector of components, which we shall call images, as they typically contain visual evidence of reflectors.

Since both input and output of the normal operator consist of $n$-vectors of material parameter fields, it is natural to represent $N$ as an $n \times n$ block matrix of operators $N_{ij}$ mapping material parameter fields of type $j$ to those of type $i$. Since material parameter fields are positive, it is possible to use relative perturbations to parametrize the model (thus $\delta\rho/\rho$ rather than $\delta\rho$, for instance), in which case $N$ and its blocks are nondimensional. We chose not to nondimensionalize the problem, as the techniques to be described below automatically produce results with appropriate units.

The Hessian may be treated as a matrix of spatially varying filters:
$$(N m)_i(x) = \sum_{j=1}^{n} \int dk \; e^{i k \cdot x}\, \eta_{ij}(x, k)\, \hat{m}_j(k). \tag{4}$$
The filter coefficient $\eta_{ij}(x, k)$ defining the $(i,j)$ component filter depends on both the spatial position $x$ and the wavevector $k$; $\hat{m}_j$ denotes the spatial Fourier transform of the $j$th model component $m_j$. Denote by $N_{\eta}$ the operator defined by the matrix $\eta = (\eta_{ij})$ of filter coefficients via (4).

Under certain conditions, the normal operator is a matrix of spatially varying filters of a special type, known as pseudodifferential, to be described below. Beylkin [1] and Rakesh [2] established the first results of this type, which were systematically extended by Nolan and Symes [4], Ten Kroode et al. [6], De Hoop and Bleistein [5], Stolk [42], Stolk and De Hoop [43], and others. We summarize the theory as follows: the Hessian is well approximated by a pseudodifferential operator provided that (i) the material parameters in the background model vary smoothly on the scale of a wavelength (since the theory is asymptotic in frequency, the technical assumption is that they are smooth, that is, infinitely differentiable, but the practical meaning is as stated here); (ii) diving wave energy is not present in the data or has been muted or dip-filtered out; (iii) the data has been polarized into propagating phase components.

These conditions are likely to be essential: either major reflectors in the background model or nonpolarized multiphase data lead to Hessians which produce nonphysical reflector images at some distance from their sources, hence cannot be well-approximated by their near-diagonal behavior. Since pseudodifferential operators are nearly local, in a sense to be made precise below, Hessians producing major reflector shifts cannot be approximated by them. For that matter, no near-diagonal approximation to the Hessian could be accurate in such cases, so these same limitations would seem to apply to all scaling methods.

Pseudodifferential operators are distinguished from other types of spatially varying filters by strong constraints on the filter coefficients $\eta_{ij}(x, k)$, which must exhibit polynomial-like growth in the wavenumber $k$ as $|k| \to \infty$. More precisely, we require that a real $\mu$ exist so that each mixed partial derivative in $k$ of (multi-)order $\alpha$ grows like $|k|^{\mu - |\alpha|}$ as $|k| \to \infty$:
$$\left| \partial_{k}^{\alpha}\, \eta_{ij}(x, k) \right| \le C_{\alpha} \left( 1 + |k| \right)^{\mu - |\alpha|}. \tag{5}$$
That is, $\eta_{ij}$ grows like $|k|^{\mu}$ for large $|k|$, and every mixed partial derivative in $k$ decreases the order of growth by the order of the derivative. It is also required that spatial derivatives do not increase the order of growth of the coefficient; we will not write down precisely how this additional constraint is imposed, referring the interested reader to Taylor [10] for a more complete account.

Condition (5) also describes the growth behavior of polynomials in $k$, which are the filter coefficients of differential operators. Other functions of $k$, for instance $|k|$ itself, also satisfy these conditions, so the type of filter described by (5) is more general than differential operators. These considerations led to the name “pseudodifferential operator” being applied to this type of filter. In fact, it is possible to approximate any pseudodifferential operator arbitrarily well by the composition of a differential operator and a power of the Laplacian (i.e., a filter with coefficient of the form $p(x, k)\, |k|^{2s}$, with $p$ polynomial in $k$). In this sense, pseudodifferential operators are nearly local, that is, nearly diagonal.

By convention, filter coefficients obeying the growth rules described here are called symbols, and we shall use this terminology as well to distinguish these special filter coefficients from the Fourier representation of general filters. The number $\mu$ figuring in the constraint (5) is the order of the operator; it determines the extent to which application of the operator changes the growth (or decay) rate of its input in the Fourier domain, for large wavenumber. Note that order is inclusive, rather than precise: if an operator is of order $\mu$, it is of order $\mu'$ for any $\mu' \ge \mu$. However, order does differentiate pseudodifferential operators by the size of their effect on oscillatory data, just as is the case for differential operators. The work cited at the beginning of this section shows how the order of the Hessian depends on the dimension of the model and the source-receiver geometry. For example, for constant density acoustics in 2D, the order is 1; for 3D and a full range (i.e., $2\pi$) of azimuths, the order is 2.

Equation (5) is a very strict constraint on the structure of the pseudodifferential class of spatially varying filters, so it should not be a surprise that the behavior of such filters is also highly constrained. Very important in the sequel are the composition properties: if $A$ is an ($n \times n$ matrix) pseudodifferential operator of order $\mu_1$, and $B$ is another of order $\mu_2$, then (1) the product $AB$ (in other words, composition) is again a pseudodifferential operator of order $\mu_1 + \mu_2$, and the symbol of the product differs from the product of the symbols by a symbol of order $\mu_1 + \mu_2 - 1$; (2) consequently, if $n = 1$ (scalar operators), then the commutator $AB - BA$ is of order $\mu_1 + \mu_2 - 1$.

That is, to leading order in wavenumber, the product of pseudodifferential operators is simply obtained by multiplying their symbols, and scalar operators commute. These properties single out pseudodifferential operators amongst general spatially varying filters. Proofs of these and other facts about pseudodifferential operators may be found in Taylor [10], for example.

The theory cited above also showed that the normal operator is “partly invertible”; that is, at each point in the subsurface, the Hessian scales spatial Fourier components with a certain range of dips by a positive multiple of $|k|^{\mu}$. It follows from point (2) above that an operator whose symbol is the reciprocal of that of the normal operator in this range of dips, and simply set to zero (with suitable tapering) outside this range, will function as an approximate inverse Hessian. This construction gives an approximate inversion for the single-parameter problem (e.g., for constant-density acoustics) and illustrates the utility of these very special filters; it is explained briefly in Appendix B, and in more detail in [11, 12].
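To make the tapered-reciprocal construction concrete, here is a minimal 1D numpy sketch; all choices (the stand-in symbol $|k|$, the band edges, the taper width) are illustrative, not taken from the paper. An operator with symbol $|k|$ is undone, for band-limited zero-mean input, by a filter whose symbol is the reciprocal inside an “illuminated” range of wavenumbers and tapered to zero outside it:

```python
import numpy as np

n = 256
k = 2 * np.pi * np.fft.fftfreq(n)
s = np.abs(k)                                   # stand-in symbol of N (order 1)

# reciprocal of the symbol inside the "illuminated" band, tapered to zero outside
kmax = 0.6 * np.abs(k).max()
taper = np.clip((kmax - np.abs(k)) / (0.1 * kmax), 0.0, 1.0)
sinv = np.where(s > 1e-12, taper / np.maximum(s, 1e-12), 0.0)

# band-limited, zero-mean test model: spectrum confined well inside the band
rng = np.random.default_rng(0)
mhat = rng.standard_normal(n) * (np.abs(k) < 0.4 * np.abs(k).max())
mhat[0] = 0.0                                   # the k = 0 component is not invertible
m = np.fft.ifft(mhat).real

Nm = np.fft.ifft(s * np.fft.fft(m)).real        # apply the stand-in normal operator
m_rec = np.fft.ifft(sinv * np.fft.fft(Nm)).real # apply the approximate inverse
assert np.allclose(m_rec, m)
```

Wavenumber content outside the taper is simply not recovered; in the paper's setting, those are the dips not illuminated by the acquisition geometry.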

Multiparameter scattering results in a matrix Hessian, mixing influences between the various parameters. In some cases, the Hessian is a matrix of pseudodifferential operators. Acoustic scattering has only one ($P$-wave) phase, so its Hessian has this property [44], as does that of elastic scattering with suitably polarized data [43]. The close relation between the matrix of symbols and the normal operator which it defines again allows us to calculate an approximate inverse via symbol computations.

To this end, recall the definition of the adjugate, or classical adjoint, of an $n \times n$ matrix $A$, denoted by $\mathrm{adj}(A)$: it is the transpose of the matrix of cofactors of $A$. Explicitly,
$$\mathrm{adj}(A)_{ij} = (-1)^{i+j} \det M_{ji}, \tag{6}$$
in which the minor $M_{ji}$ is obtained from $A$ by dropping the $j$th row and $i$th column. The cofactor $(-1)^{i+j} \det M_{ji}$ is the number on the right-hand side of (6). In our case, the transpose of the cofactor matrix may be ignored, as the matrix and thus its adjugate are symmetric (since the Hessian is symmetric).

The significance of the adjugate is this: when the matrix $A$ is invertible, the adjugate is proportional to the inverse:
$$A^{-1} = \frac{1}{\det A}\, \mathrm{adj}(A). \tag{7}$$
This relation is Cramer's rule and is the critical observation leading to our approximation method. Note that
$$\mathrm{adj}(A)\, A = (\det A)\, I, \tag{8}$$
where $I$ is the $n \times n$ identity matrix. If we define the adjugate of $N = N_{\eta}$ to be $\mathrm{adj}(N) := N_{\mathrm{adj}(\eta)}$ (i.e., the matrix pseudodifferential operator with symbol matrix $\mathrm{adj}(\eta)$), then the multiplicative order properties of pseudodifferential operators described above imply that, with slight abuse of notation,
$$\mathrm{adj}(N)\, N \simeq (\det N)\, I. \tag{9}$$
The equation above features another abuse of notation, with $\det N := N_{\det \eta}$, the scalar operator with symbol $\det \eta$. The power of (8) is revealed when applied to (3):
$$\mathrm{adj}(N)\, m_{\mathrm{mig}} = \mathrm{adj}(N)\, N\, m \simeq (\det N)\, m. \tag{10}$$
Equation (10) reduces the inversion of $N$ to two steps: (i) application of $\mathrm{adj}(N)$, followed by (ii) inversion of the scalar operator $\det N$.
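The matrix identity $\mathrm{adj}(A)\, A = (\det A)\, I$ is easy to verify numerically. The numpy sketch below (with a made-up symmetric matrix) builds the adjugate from cofactors exactly as in (6):

```python
import numpy as np

# Toy check of Cramer's rule identity adj(A) A = det(A) I for a symmetric
# 3x3 matrix; the adjugate is built from cofactors as in (6).
def adjugate(A):
    n = A.shape[0]
    C = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            M = np.delete(np.delete(A, i, axis=0), j, axis=1)  # minor M_ij
            C[i, j] = (-1) ** (i + j) * np.linalg.det(M)       # cofactor
    return C.T  # adjugate = transpose of the cofactor matrix

A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
lhs = adjugate(A) @ A
rhs = np.linalg.det(A) * np.eye(3)
assert np.allclose(lhs, rhs)
```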

Note also that while $m$ may be dimensionally heterogeneous, $\mathrm{adj}(N)\, m_{\mathrm{mig}}$ has redimensionalized all components so that their units all bear the same relation to the units of the components of $m$. Inversion of $\det N$ removes this common unit factor and correctly dimensionalizes the model estimate.

For inversion of $\det N$, we resort to a method similar to the one we previously developed for the one-parameter case (reviewed in Appendix B). First apply the normal operator again to the left-hand side of (10), to form
$$N\, \mathrm{adj}(N)\, m_{\mathrm{mig}} \simeq N\, (\det N)\, m \simeq (\det N)\, N\, m = (\det N)\, m_{\mathrm{mig}}. \tag{11}$$
Here, we have used the fact that scalar pseudodifferential operators approximately commute with matrices of pseudodifferential operators to commute $\det N$ and $N$.

Equation (11) shows how to find a pseudodifferential approximate inverse $Q$ to $\det N$, by minimizing an objective function, all parts of which are readily computable, over a suitable class of pseudodifferential operators $Q$:
$$\min_{Q}\, \left\| Q\, N\, \mathrm{adj}(N)\, m_{\mathrm{mig}} - m_{\mathrm{mig}} \right\|^{2}. \tag{12}$$
From (9), a solution $Q$ will approximately invert $\det N$, and is scalar, since $\det N$ is. Since $Q$ removes from the left-hand side of (10) a scalar filter, it plays a similar role to the scaling factors encountered in the various scaling methods mentioned in the introduction, so we will also refer to $Q$ as a scaling factor, although it is a pseudodifferential operator.
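The simplest instance of this fitting step replaces the pseudodifferential class by a single scalar $q$; the closed-form least squares solution below (with synthetic stand-in vectors, not the method's actual filter class) illustrates the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(1)
c = rng.standard_normal(1000)                    # stand-in for m_mig
b = 2.5 * c + 0.01 * rng.standard_normal(1000)   # stand-in for N adj(N) m_mig:
                                                 # here simply 2.5 * m_mig + noise

# least squares: q = argmin_q || q*b - c ||^2  =>  q = <c, b> / <b, b>
q = np.dot(c, b) / np.dot(b, b)
assert abs(q - 1.0 / 2.5) < 1e-2
# q would then be applied to adj(N) m_mig to form the scaled model estimate
```

The actual method fits, in the same least squares sense, the coefficients of a dip- and frequency-dependent pseudodifferential filter (Appendix B) rather than a single scalar.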

The reader will recall that $N$ is only partially invertible: for a range of model wavevectors $k$, the symbol determinant $\det \eta(x, k)$ is asymptotic to a positive multiple of a power of $|k|$, and to a lower power outside of this “illuminated” cone of model wavenumbers. The method explained in Appendix B constructs $Q$ as an approximation to the reciprocal of the symbol of $\det N$ in the illuminated wavenumber cone, tapered to zero outside of this cone.

The algorithm explained in Appendix B is iterative and requires at least one application of the trial scaling factor $Q$ at each iteration. An efficient method for numerical application of pseudodifferential operators is therefore required; we use an algorithm introduced by Bao and Symes [17], which we review in Appendix A.

Having constructed $Q$, we approximate the solution of the normal equations by
$$\tilde{m} = Q\, \mathrm{adj}(N)\, m_{\mathrm{mig}}. \tag{13}$$

Equations (12) and (13) show that computation of the adjugate action $\mathrm{adj}(N)\, m_{\mathrm{mig}}$ is key to this approach, so we examine it more carefully. We will address the two parameter case ($n = 2$) explicitly, both for inspiration and because it is this case for which we present examples in the next section. Writing out (6) for $n = 2$, the adjugate of $N$ is
$$\mathrm{adj}(N) = \begin{pmatrix} N_{22} & -N_{12} \\ -N_{12} & N_{11} \end{pmatrix}. \tag{14}$$
It appears to be necessary to compute the various components of $N$ separately: for example, if the underlying theory is acoustic scattering, then one block might represent the map from input to output density perturbations, and so on, so that each block would appear to require a complete modeling/migration cycle.

In fact, it is possible to compute $\mathrm{adj}(N)\, m_{\mathrm{mig}}$ using only applications of the entire operator $N$, thus both reducing the cost of the application and eliminating the need to write special component-to-component migration codes. In the $n = 2$ case, only one application of $N$ is required! In fact, here, with $J$ the so-called symplectic matrix,
$$J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \tag{15}$$
a direct computation from (14) shows that
$$\mathrm{adj}(N) = J^{T}\, N\, J. \tag{16}$$
Note that units are implicitly changed (as they are in the application of the adjugate operator) and that in practice it is presumed that the discrete representations of the two parameters are the same, so that, for example, density can be swapped for velocity and vice versa.

Equation (16) implies that the application of the adjugate to the migrated image requires one application of $N$, rather than the four that a naive implementation would suggest. Thus, approximating the inverse for $n = 2$ requires two applications of the normal operator, plus a small amount of additional data manipulation.
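The identity (16) can be checked on any symmetric $2 \times 2$ system; in this numpy sketch (matrix and image values made up), the single matrix-vector product stands in for the one expensive application of $N$:

```python
import numpy as np

J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])   # symplectic matrix

# Symmetric 2x2 "Hessian" acting on a two-component image; in practice each
# application of N is one expensive modeling/migration cycle.
N = np.array([[3.0, 1.2],
              [1.2, 2.0]])
m_mig = np.array([0.7, -0.4])

# adj(N) m_mig via one application of N: J^T N J m_mig
adj_apply = J.T @ (N @ (J @ m_mig))

# direct adjugate of a symmetric 2x2: [[N22, -N12], [-N12, N11]]
adj_direct = np.array([[N[1, 1], -N[0, 1]],
                       [-N[0, 1], N[0, 0]]]) @ m_mig
assert np.allclose(adj_apply, adj_direct)
```

Note that the swap-with-sign $J$ acts componentwise on the image vector, so it costs essentially nothing next to the wave-equation application of $N$.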

The situation is more complicated for $n = 3$, but a similar result holds: it is possible to compute the application of the adjugate using five applications of the normal operator, a considerable improvement over the twenty-four that the naive algorithm based on the definition (6) would appear to require; note that separate application of every component $N_{ij}$ by wave-equation methods is just as expensive as application of the entire operator $N$. See Nammour [38] for details. We conjecture that a similar drastic reduction is possible for general $n$.

As a final note, we can show explicitly for $n = 2$ the sense in which equation (9) is an approximation, while equation (8) is exact. Denoting the commutator of two operators by $[A, B] = AB - BA$, a direct computation using (14) gives
$$\mathrm{adj}(N)\, N - (\det N)\, I = \begin{pmatrix} [N_{22}, N_{11}] & [N_{22}, N_{12}] \\ [N_{11}, N_{12}] & 0 \end{pmatrix},$$
with $\det N = N_{11} N_{22} - N_{12} N_{12}$. The theory of pseudodifferential operators shows that the commutator of two pseudodifferential operators is of lower order than their composition, that is, relatively negligible for highly oscillatory input fields. Using $\simeq$ to mean “differs by a lower order error”,
$$\mathrm{adj}(N)\, N \simeq (\det N)\, I.$$

3. Numerical Examples: Layered Variable Density Acoustics

As a first application of the two-parameter inversion, we construct a variable density acoustics model perturbation consisting of a thin oscillatory velocity layer and a thin oscillatory density layer in a different place (see Figures 1 and 2). The background model is homogeneous in both velocity and density. The two thin layers are treated as perturbations.

We simulated reflection data for this model using the IWAVE software developed by The Rice Inversion Project in linearized (Born) modeling mode [45]. The model extends around 1.7 km in depth and 6.5 km horizontally. One source is placed in the middle, and receivers are laid out to create offsets ranging over [−2.7 km, 2.7 km], spaced 20 m apart and 40 m below the (absorbing) surface. The isotropic point source wavelet was a 2.5–5–15–20 Hz trapezoidal zero-phase bandpass filter. Three seconds of data were recorded at 151 receiver positions. The boundary conditions were absorbing on all sides of the domain; in particular, free-surface effects were not modeled. Since the model is layered, only a single source gather need be modeled.

All (reverse-time) migrations were also carried out with IWAVE [46]. The output is a layered model (i.e., two depth traces, one each for the velocity and density perturbations), which we shall display as layered fields.

Migrating the Born data shows how migration mixes the effects of the two model components in the two components of the migrated images (Figure 3). We shall write the vector of migrated images as $m_{\mathrm{mig}}$, with velocity and density components, to remain consistent with our earlier notation. This example, albeit simple, highlights a new challenge of multiparameter inversion. For one-parameter inversion, the events in the migrated image correspond to events in the true model. In multiparameter inversion, events in the migrated images may correspond to an event in one or more of the components of the model. It is virtually impossible to tell that these migrated images correspond to a model with separate events for velocity and density without successful inversion.

Applying the scheme outlined above, we form $\mathrm{adj}(N)\, m_{\mathrm{mig}} = J^{T} N J\, m_{\mathrm{mig}}$. The result, shown in Figure 4, demonstrates how one application of the normal operator effectively separated the contributions of the velocity and density events. It remains to effect an amplitude correction by approximating an inverse to $\det N$. To this end, we are required to form $N\, \mathrm{adj}(N)\, m_{\mathrm{mig}}$, shown in Figure 5.

The final step corrects the amplitudes of $\mathrm{adj}(N)\, m_{\mathrm{mig}}$ by undoing the effect of $\det N$, which yields an approximate inverse. This final step complements the separation obtained earlier with an amplitude correction, $\tilde{m} = Q\, \mathrm{adj}(N)\, m_{\mathrm{mig}}$. Figure 6 shows that the approximate inverse compares favorably with the true model. An interesting observation on this result is that the velocity model is better recovered than the density model: traces of the velocity event in the density model are more apparent than those of the density event in the velocity model. This observation is in accordance with the theoretical fact that the recovery of velocity in variable density acoustics is better conditioned than the recovery of density. The inversion is successful in that the inverted model succeeds in fitting the target data (see Figure 7).

4. Discussion and Conclusion

We have presented a method to approximate the solution of the multiparameter linearized inverse problem via an extension of Cramer's rule to matrices of pseudodifferential operators. The method consists of two steps: (1) reducing the multiparameter problem to a one-parameter problem, which yields an amplitude scaling of the solution; (2) correcting the amplitudes of the result from the previous step to approximate the solution.

The application of the work flow to a layered variable density acoustics example shows how the effects of different material parameters, indistinguishable in the migrated images, are separated in the first step. The amplitude correction step successfully yields an approximate solution to the linearized inverse problem.

The work flow presented above applies without modification to any model. Results of application to models more complex than the layered model described above will be presented elsewhere. We point out that other aspects of the physics of seismic data generation will need to be accommodated if this (or any other) algorithm is to extract accurate results from field data. For example, as shown already by Minkoff and Symes [39], even when the Born approximation is adequate, the radiation pattern of the source has a first-order effect on the variation of amplitude with angle and must therefore be included in the parameters to be estimated.

We have extended the Pseudodifferential Operator Algorithm (Appendix A) to three spatial dimensions (3D), which allows approximate inversion of 3D models. The symbol is represented by a truncated spherical harmonics expansion. The formulation of the method in this case remains intact, though the cost of modeling and migration increases. This extension of the method is described in detail, with examples, in Nammour [38].

The explicit derivation analogous to (16) for more parameters (namely $n = 3$) is similar to the one presented here but requires tedious algebraic manipulations. The $n = 3$ case becomes relevant in the linearized inverse problem for linear elasticity, for example. Six applications of the normal operator are then required to approximate the solution of the linearized inverse problem. See Nammour [38] for a detailed discussion.

We end by reminding the reader that the approach to approximate inversion pursued here relies on the validity of the pseudodifferential characterization of the Hessian. As mentioned in the theory section, two of the three conditions (background model smooth on the wavelength scale, polarized data) are essential for any near-diagonal Hessian approximation, adaptive or not, to be effective. These limitations seem likely to be acceptable in some exploration settings, for example, inversion of marine data for suprasalt sediment properties with soft sea floor, but may seriously obstruct this type of technique in other settings.

Appendices

A. The Pseudodifferential Operator Algorithm

Bao and Symes [17] present an algorithm to numerically apply pseudodifferential operators, which we use to represent the scaling factor $Q$ in (12). We discuss this algorithm in this section, as it is central to the feasibility of the method presented here.

This discussion is restricted to 2D, so we may write $x = (x_1, x_2)$ and $k = (k_1, k_2)$. Recall that a pseudodifferential operator $A$ is characterized by its symbol $a(x, k)$ and defined by
$$(A m)(x) = \int dk \; e^{i k \cdot x}\, a(x, k)\, \hat{m}(k), \tag{A.1}$$
where $a$ is the principal symbol, homogeneous of degree $\mu$ in $k$, and $\hat{m}$ is the Fourier transform of $m$.

Thus, writing $k = |k| (\cos\theta, \sin\theta)$ and using the homogeneity of $a$, we have
$$a(x, k) = |k|^{\mu}\, a(x, \theta), \tag{A.2}$$
where, with slight abuse of notation, $a(x, \theta) := a(x, (\cos\theta, \sin\theta))$.

Notice that $a(x, \theta)$ is periodic and smooth in $\theta$, and hence it admits a rapidly converging Fourier expansion. We thus truncate the Fourier series, approximating the symbol by its first $2p + 1$ Fourier modes,
$$a(x, \theta) \approx \sum_{\nu = -p}^{p} c_{\nu}(x)\, e^{i \nu \theta}. \tag{A.3}$$
Plugging (A.3) into (A.1), we obtain
$$(A m)(x) \approx \sum_{\nu = -p}^{p} c_{\nu}(x) \int dk \; e^{i k \cdot x}\, |k|^{\mu}\, e^{i \nu \theta(k)}\, \hat{m}(k). \tag{A.4}$$

Fourier transform theory identifies $|k|^{\mu}$ as the symbol of $(-\Delta)^{\mu/2}$, a power of the square root of the negative Laplacian, and $\cos\theta = k_1 / |k|$ and $\sin\theta = k_2 / |k|$ as, respectively, the symbols of $-i \partial_{x_1} (-\Delta)^{-1/2}$ and $-i \partial_{x_2} (-\Delta)^{-1/2}$.

Sampling the field $m$ on a regular grid $x_{jl}$, and evaluating $|k_{pq}|$ and $\theta_{pq} = \arg(k_{pq})$ on the corresponding discrete wavenumber grid, yields unaliased discretizations of the symbols of the square root of the negative Laplacian and of the angular factors $e^{i \nu \theta}$.

Equation (A.4) suggests the following algorithm to estimate $A m$ [17]. All Fourier transforms refer to a discrete Fourier transform. (1) Compute $\hat{m}$, the discrete Fourier transform of $m$. (2) Initialize $(A m)(x_{jl}) = 0$ for all grid points $x_{jl}$. (3) For $\nu = -p, \dots, p$: (a) compute $u_{\nu} = \mathrm{DFT}^{-1} \left[\, |k|^{\mu}\, e^{i \nu \theta(k)}\, \hat{m}(k)\, \right]$ for all $j$ and $l$; (b) accumulate $(A m)(x_{jl}) \mathrel{+}= c_{\nu}(x_{jl})\, u_{\nu}(x_{jl})$. End.
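A compact numpy realization of this loop in 2D follows; the grid size, order $\mu = 1$, mode count $p = 2$, and the smooth coefficient fields $c_{\nu}$ are all illustrative. It checks the FFT-based application against the direct sum over wavenumbers, which it should match to rounding error:

```python
import numpy as np

nz, nx, mu, p = 16, 16, 1.0, 2
rng = np.random.default_rng(2)
m = rng.standard_normal((nz, nx))

kz = 2 * np.pi * np.fft.fftfreq(nz)
kx = 2 * np.pi * np.fft.fftfreq(nx)
KZ, KX = np.meshgrid(kz, kx, indexing="ij")
kabs = np.hypot(KZ, KX)
theta = np.arctan2(KX, KZ)                        # angle of the wavevector

# illustrative smooth coefficient fields c_nu(x)
Z, X = np.meshgrid(np.arange(nz) / nz, np.arange(nx) / nx, indexing="ij")
coeff = {nu: (1.0 + 0.5 * np.cos(2 * np.pi * (Z + 0.3 * nu * X))) / (1 + abs(nu))
         for nu in range(-p, p + 1)}

# FFT-based application: one inverse FFT per retained Fourier mode
mhat = np.fft.fft2(m)
Am = np.zeros((nz, nx), dtype=complex)
for nu in range(-p, p + 1):
    filt = kabs ** mu * np.exp(1j * nu * theta)   # vanishes at k = 0 since mu > 0
    Am += coeff[nu] * np.fft.ifft2(filt * mhat)

# direct summation over all wavenumbers at each grid point, for verification
Am_direct = np.zeros((nz, nx), dtype=complex)
for iz in range(nz):
    for ix in range(nx):
        sym = sum(coeff[nu][iz, ix] * kabs ** mu * np.exp(1j * nu * theta)
                  for nu in range(-p, p + 1))
        phase = np.exp(1j * (KZ * iz + KX * ix))
        Am_direct[iz, ix] = np.sum(phase * sym * mhat) / (nz * nx)

assert np.allclose(Am, Am_direct)
```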

A straightforward discretization of (A.1) has a computational complexity of $O(N^2)$ in the number $N$ of grid points. The algorithm described above uses the FFT (Fast Fourier Transform) and thus exhibits a complexity of $O((2p+1)\, N \log N)$. The appeal of this approach is that $p$ is independent of $N$. In fact, applications to reflection seismology require that the symbol be smooth and slowly varying in $\theta$; thus, it may be captured accurately by a modest number of Fourier modes or, more explicitly, a small $p$.

The dependence on dip is captured in the angle variable $\theta$, and the method allows us to capture multiple dip events by increasing $p$.

B. The One-Parameter Case

In this appendix, we review the method developed in [11, 12] to construct an approximate inverse Hessian in the one-parameter case. The aim is to solve
$$N m = m_{\mathrm{mig}}, \tag{B.1}$$
where $N = F^{*} F$ and $m_{\mathrm{mig}} = F^{*} d$. We seek a pseudodifferential scaling factor $Q$ and formulate its recovery as an optimization problem. Given the migrated image $m_{\mathrm{mig}}$ and the remigrated image $N m_{\mathrm{mig}}$,
$$Q = \arg\min_{Q}\, \left\| Q\, N\, m_{\mathrm{mig}} - m_{\mathrm{mig}} \right\|^{2}. \tag{B.2}$$
The scaling factor is chosen from a class of pseudodifferential operators represented numerically using the PsiDO algorithm (Appendix A). In this setting, the scaling factor approximates the action of the inverse of the normal operator on the migrated image $m_{\mathrm{mig}}$. More specifically,
$$m = N^{-1} m_{\mathrm{mig}}, \qquad N\, (Q\, m_{\mathrm{mig}}) \approx Q\, (N\, m_{\mathrm{mig}}) \approx m_{\mathrm{mig}}. \tag{B.3}$$
The first of these equations expresses the true solution; the approximate equalities follow from (B.2) and the fact that pseudodifferential operators approximately commute. Defining $\tilde{m} = Q\, m_{\mathrm{mig}}$ thus yields an approximation to the true model $m$. Equation (B.3) shows that the scaling factor approximates the action of the inverse of the normal operator on the migrated image, and it is only in that sense that $Q$ approximates $N^{-1}$.

The scaling factor is represented explicitly by the PsiDO algorithm,
$$(Q\, m)(x) = \sum_{\nu = -p}^{p} c_{\nu}(x) \int dk \; e^{i k \cdot x}\, |k|^{\mu}\, e^{i \nu \theta(k)}\, \hat{m}(k). \tag{B.4}$$

We enforce the continuity of the coefficients $c_{\nu}$ using a parsimonious basis technique. Let $\{B_s\}$ be a set of smooth shape functions (cubic B-splines, for example). Write
$$c_{\nu}(x) = \sum_{s} \gamma_{\nu s}\, B_s(x) \tag{B.5}$$
and plug into the objective function,
$$J(\gamma) = \sum_{j,l} \left( \sum_{\nu, s} \gamma_{\nu s}\, B_s(x_{jl})\, u_{\nu}(x_{jl}) - m_{\mathrm{mig}}(x_{jl}) \right)^{2}, \tag{B.6}$$
in which $u_{\nu}$ denotes the $\nu$th angularly filtered copy of the remigrated image, as in step (3a) of the PsiDO algorithm applied to $N m_{\mathrm{mig}}$. We have denoted by $J$ the objective function in (B.2), and used the fact that the images $m_{\mathrm{mig}}$ and $N m_{\mathrm{mig}}$ are discretized on a grid $x_{jl}$, to write out the norm explicitly.

We use limited memory BFGS (lBFGS) to minimize the objective function (B.2), which requires the gradient to be supplied by the user. The gradient is easily derived as
$$\frac{\partial J}{\partial \gamma_{\nu s}} = 2 \sum_{j,l} B_s(x_{jl})\, u_{\nu}(x_{jl}) \left( \sum_{\nu', s'} \gamma_{\nu' s'}\, B_{s'}(x_{jl})\, u_{\nu'}(x_{jl}) - m_{\mathrm{mig}}(x_{jl}) \right), \tag{B.7}$$
with $u_{\nu}$ the $\nu$th angularly filtered copy of the remigrated image $N m_{\mathrm{mig}}$ and the parenthesized factor the residual. We have also explored enforcing the positivity of the scaling factor explicitly, as well as other symmetries of the problem, which leads to slightly different representations of the scaling factor and, in turn, a different objective function gradient. For more details on this point, and a discussion of the ability of the method to resolve multiple dip events, please consult [12].
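Since the objective is quadratic in the coefficients $\gamma$, its gradient has the standard normal-equations form $2 B^{T} (B \gamma - y)$. A finite-difference check on a generic instance (all arrays synthetic, with `B` standing in for the basis-times-filtered-image matrix) is a cheap guard against sign and factor-of-two errors when supplying the gradient to lBFGS:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((50, 8))   # columns: basis functions times filtered image samples
y = rng.standard_normal(50)        # target: migrated image samples
gamma = rng.standard_normal(8)     # basis coefficients

def J(g):
    r = B @ g - y
    return r @ r

grad = 2 * B.T @ (B @ gamma - y)   # analytic gradient supplied to lBFGS

# central finite-difference check of the analytic gradient
eps = 1e-6
fd = np.array([(J(gamma + eps * e) - J(gamma - eps * e)) / (2 * eps)
               for e in np.eye(8)])
assert np.allclose(grad, fd, atol=1e-4)
```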

Acknowledgments

This work was supported in part by the National Science Foundation under grant DMS 0620821, and by the sponsors of The Rice Inversion Project. The authors are grateful to an anonymous reviewer and to Guest Editor Sergey Fomel for thoughtful critiques.